A lexical analyser, or scanner, is a program that groups sequences of characters into lexemes and
outputs (to the syntax analyser) a sequence of tokens. Here:
(a) Tokens are symbolic names for the entities that make up the text of the program; e.g.
if for the keyword if, and id for any identifier. These make up the output of the
lexical analyser.
(b) A pattern is a rule that specifies when a sequence of characters from the input constitutes a
token; e.g. the sequence i, f for the token if, and any sequence of alphanumerics starting
with a letter for the token id.
(c) A lexeme is a sequence of characters from the input that matches a pattern (and hence
constitutes an instance of a token); for example, if matches the pattern for if, and foo123bar
matches the pattern for id.
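The three definitions above can be made concrete with a minimal sketch in Python. It is an illustration only, not a prescribed implementation: the table TOKEN_PATTERNS, the function tokenize, the token name ws for whitespace, and the use of regular expressions to write the patterns are all assumptions made for this example. Each entry pairs a token with its pattern, and the function returns each token together with the lexeme that matched.

```python
import re

# Hypothetical token table: (token, pattern) pairs, tried in order.
# The keyword pattern comes before id so that "if" is not reported as id.
TOKEN_PATTERNS = [
    ("ws", re.compile(r"[ \t\n]+")),              # whitespace: matched, not emitted
    ("if", re.compile(r"if\b")),                  # the sequence i, f (as a whole word)
    ("id", re.compile(r"[A-Za-z][A-Za-z0-9]*")),  # a letter followed by alphanumerics
]


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the given input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)  # try the pattern at the current position
            if match:
                if name != "ws":
                    tokens.append((name, match.group()))
                pos = match.end()  # continue after the lexeme just read
                break
        else:
            # No pattern matched, so the input contains an unrecognised character.
            raise ValueError(f"no pattern matches at position {pos}: {text[pos]!r}")
    return tokens


print(tokenize("if foo123bar"))
```

For the input if foo123bar, the sketch reports the lexeme if as an instance of the token if and the lexeme foo123bar as an instance of the token id. The word-boundary check in the keyword pattern is a design choice of this sketch: without it, an identifier such as iffy would be split into the keyword if followed by fy.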
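Consider the following example. The first line below is a fragment of program text; each line
after it describes a pattern matched by part of that text: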
program foo(input,output);var x:integer;begin
p, r, o, g, r, a, m
newlines, spaces, tabs
letter followed by seq. of alphanumerics
a left parenthesis
i, n, p, u, t
o, u, t, p, u, t
a right parenthesis
v, a, r