|
INTRODUCTION:
Lexical analysis involves scanning the program to be compiled and recognizing
the tokens that make up the source statements. Scanners, or lexical analyzers,
are usually designed to recognize keywords, operators, and identifiers, as well
as integers, floating-point numbers, character strings, and other similar items
that are written as part of the source program. The exact set of tokens to be
recognized depends, of course, on the programming language being compiled and
the grammar used to describe it.
A sequence of input characters that comprises a single token is called a
lexeme. A lexical analyzer can insulate a parser from the lexeme representation
of tokens. The following are the functions that lexical analyzers perform.
Removal of white space and comments:
Many languages allow “white space” to appear between tokens. Comments can
likewise be ignored by the parser and translator, so they may also be treated
as white space. If white space is eliminated by the lexical analyzer, the
parser will never have to consider it.
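As an illustration only, the following C sketch skips white space and C-style
block comments before the next token. It works on an in-memory string to stay
short; the function name skip_blanks and the choice of /* ... */ as the comment
syntax are assumptions made for this example, not details taken from the
analyzer described in this report.

```c
#include <ctype.h>

/* Hypothetical helper: advance past white space and block comments.
   Returns a pointer to the first character of the next token, or to the
   terminating '\0'. *line is incremented for every newline skipped. */
const char *skip_blanks(const char *p, int *line)
{
    for (;;) {
        if (isspace((unsigned char)*p)) {
            if (*p == '\n')
                (*line)++;
            p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;                     /* step over the opening delimiter */
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) {
                if (*p == '\n')
                    (*line)++;
                p++;
            }
            if (*p != '\0')
                p += 2;                 /* step over the closing delimiter */
        } else {
            return p;                   /* start of a token, or end of input */
        }
    }
}
```

An unterminated comment simply consumes the rest of the input here; a real
analyzer would probably report it as an error.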
Constants:
An integer constant is a sequence of digits. Integer constants can be allowed
either by adding productions to the grammar for expressions or by creating a
token for such constants. The job of collecting digits into integers is
generally given to the lexical analyzer because numbers can be treated as
single units during translation. The lexical analyzer passes both the token and
its attribute to the parser.
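A minimal sketch of this idea in C is given below. The token code TOK_NUM, the
struct num_token, and the function scan_integer are hypothetical names chosen
for the example. The token code tells the parser that an integer constant was
seen, and the attribute carries its value.

```c
#include <ctype.h>

enum { TOK_NUM = 256 };   /* hypothetical token code for integer constants */

struct num_token {
    int  type;    /* token code, here always TOK_NUM */
    long value;   /* attribute: numeric value of the constant */
};

/* Collects a run of digits starting at *pp into one integer.
   On success, fills *tok, advances *pp past the digits, and returns 1.
   Returns 0 and leaves *pp unchanged if *pp does not start with a digit.
   Overflow is not checked in this sketch. */
int scan_integer(const char **pp, struct num_token *tok)
{
    const char *p = *pp;
    long value = 0;

    if (!isdigit((unsigned char)*p))
        return 0;
    while (isdigit((unsigned char)*p)) {
        value = value * 10 + (*p - '0');
        p++;
    }
    tok->type = TOK_NUM;
    tok->value = value;
    *pp = p;
    return 1;
}
```

For the input 31+28, a first call would yield the pair (TOK_NUM, 31) and leave
the cursor on the + character.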
Recognizing identifiers and keywords:
Languages use identifiers as names of variables, arrays, and functions. A
grammar for a language often treats an identifier as a token. Many languages
use fixed character strings such as begin, end, if, and so on, as punctuation
marks or to identify certain constructs. These character strings, called
keywords, generally satisfy the rules for forming identifiers.
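Because keywords have the same shape as identifiers, one common approach is to
scan the word first and then look it up in a keyword table. The C sketch below
shows that approach under stated assumptions: the token codes, the three-entry
table, and the names is_keyword and scan_word are all illustrative, and the
identifier rule (letter or underscore, then letters, digits, or underscores) is
the usual C rule rather than one stated in this report.

```c
#include <ctype.h>
#include <string.h>

enum { TOK_ID = 300, TOK_KEYWORD = 301 };   /* hypothetical token codes */

/* Illustrative keyword table; a real one lists every reserved word. */
static const char *const keywords[] = { "begin", "end", "if" };

int is_keyword(const char *lexeme)
{
    size_t i;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return 1;
    return 0;
}

/* Copies the identifier-shaped lexeme at *pp into buf (capacity n),
   advances *pp past it, and returns TOK_KEYWORD or TOK_ID.
   Returns 0 if *pp does not start with a letter or underscore. */
int scan_word(const char **pp, char *buf, size_t n)
{
    const char *p = *pp;
    size_t len = 0;

    if (n == 0 || !(isalpha((unsigned char)*p) || *p == '_'))
        return 0;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (len + 1 < n)
            buf[len++] = *p;   /* characters beyond the buffer are dropped */
        p++;
    }
    buf[len] = '\0';
    *pp = p;
    return is_keyword(buf) ? TOK_KEYWORD : TOK_ID;
}
```

A linear search is adequate for a short table; a sorted table with binary
search or a hash table would be the usual choice for a larger keyword set.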
SYSTEM ANALYSIS:
The operating system used is MS-DOS.
MS-DOS:
MS-DOS is a single-user, single-process, single-processor operating system.
Because the device-dependent code is confined to one layer, porting MS-DOS has
theoretically been reduced to writing the BIOS code for the new hardware. At
the command level it provides a hierarchical file system, input/output
redirection, pipes, and filters. User-written commands can be invoked in the
same way as the standard system commands, which makes it appear as if the basic
system functionality has been extended.
Being a single-user system, it provides only file protection and access
control. Disk space is allocated in clusters of consecutive sectors. The
command language of MS-DOS has been designed to allow the user to interact
directly with the operating system using a CRT terminal. The principal use of
the command language is to initiate the execution of commands and programs. A
user program can be executed under MS-DOS by simply typing the name of its
executable file at the DOS command prompt.
The programming language used here is C.
SYSTEM DESIGN:
Process:
The lexical analyzer is the first phase of a compiler. Its main task is to read
the input characters and produce as output a sequence of tokens that the parser
uses for syntax analysis. This interaction is summarized schematically in
fig. a. Upon receiving a “get next token” command from the parser, the lexical
analyzer reads input characters until it can identify the next token.
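The “get next token” interaction can be sketched in C as a function that the
parser calls whenever it needs another token. Everything named here (the
tok_kind codes, struct lexer, struct lex_token, and get_next_token) is an
assumption made for the example, and the sketch reads from an in-memory string
instead of an input file to keep it short.

```c
#include <ctype.h>

/* Hypothetical token kinds; this report does not list its own codes. */
enum tok_kind { T_EOF, T_WORD, T_NUMBER, T_SYMBOL };

struct lexer {
    const char *pos;     /* next unread input character */
};

struct lex_token {
    enum tok_kind kind;
    const char *start;   /* first character of the lexeme */
    int length;          /* number of characters in the lexeme */
};

/* Called by the parser each time it needs one more token. Reads
   characters only until the current token is complete. */
struct lex_token get_next_token(struct lexer *lx)
{
    struct lex_token t;

    while (isspace((unsigned char)*lx->pos))
        lx->pos++;

    t.start = lx->pos;
    if (*lx->pos == '\0') {
        t.kind = T_EOF;
    } else if (isalpha((unsigned char)*lx->pos) || *lx->pos == '_') {
        t.kind = T_WORD;
        while (isalnum((unsigned char)*lx->pos) || *lx->pos == '_')
            lx->pos++;
    } else if (isdigit((unsigned char)*lx->pos)) {
        t.kind = T_NUMBER;
        while (isdigit((unsigned char)*lx->pos))
            lx->pos++;
    } else {
        t.kind = T_SYMBOL;   /* any other single character */
        lx->pos++;
    }
    t.length = (int)(lx->pos - t.start);
    return t;
}
```

A parser would typically initialize a struct lexer with the start of the input
and call get_next_token in a loop until it receives T_EOF.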
Sometimes, lexical analyzers are divided into a cascade of two phases, the
first called “scanning” and the second “lexical analysis”. The scanner is
responsible for doing simple tasks, while the lexical analyzer proper does the
more complex operations.
The lexical analyzer that we have designed takes its input from an input file.
It reads one character at a time from the input file and continues to read
until the end of the file is reached. It recognizes valid identifiers and
keywords and reports the token values of the keywords.
It also identifies header files, #define statements, numbers, special
characters, and various relational and logical operators, and it ignores white
space and comments. It writes its output to a separate file, specifying the
line number.
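Relational and logical operators need one character of lookahead, because <
and <= begin with the same character. The C sketch below shows one possible way
to handle this with a table in which the two-character operators are listed
first, so that the longest match wins. The table contents, the operator names,
and the function match_operator are assumptions for illustration and are not
taken from the implementation described here.

```c
#include <string.h>

struct op_entry {
    const char *text;   /* operator spelling */
    const char *name;   /* descriptive name to print in the output */
};

/* Two-character operators come first so that the longest match wins. */
static const struct op_entry operators[] = {
    { "<=", "less-or-equal" },  { ">=", "greater-or-equal" },
    { "==", "equal" },          { "!=", "not-equal" },
    { "&&", "logical-and" },    { "||", "logical-or" },
    { "<",  "less-than" },      { ">",  "greater-than" },
    { "!",  "logical-not" }
};

/* If an operator from the table starts at p, stores its name in *name
   and returns its length (1 or 2). Returns 0 if no operator matches. */
size_t match_operator(const char *p, const char **name)
{
    size_t i;
    for (i = 0; i < sizeof operators / sizeof operators[0]; i++) {
        size_t n = strlen(operators[i].text);
        if (strncmp(p, operators[i].text, n) == 0) {
            *name = operators[i].name;
            return n;
        }
    }
    return 0;
}
```

The returned length tells the caller how many characters to consume before
scanning the next token.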
BLOCK DIAGRAM:

|