electrofriends.com  

...bringing innovative minds together       | HOME | ABOUT US | ARTICLES | SOURCE CODES | PROJECTS | EBOOKS |  FEEDBACK |  

Lexical analyzer

Modules and Algorithms:

Functions:

Verify(): The lexical analyzer uses the verify operation to determine whether there is an entry for a lexeme in the symbol table.

Data structures:

Structure: A structure in C is a collection of variables that holds logically related data items of similar and/or dissimilar data types. Each variable in the structure represents one item and is called a member or field of the structure. Complex data can be represented in a more meaningful way using structures, which makes them one of the most important features available in C.

An array of structures is basically an array of records. Each and every record has the same format, consisting of logically related entities of similar or dissimilar data types.
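As a minimal sketch of how such an array of records might serve as the symbol table, the declaration below pairs each lexeme with a token value. The names `struct symbol`, `symbol_table` and `symbol_count` are assumptions made for illustration; the token values are the ones that appear in the sample output further down (int = 1, printf = 5, scanf = 6, void = 7, if = 8).

```c
#include <stddef.h>

/* One symbol-table record: a lexeme and the numeric token value
   reported for it. The field names are illustrative assumptions. */
struct symbol {
    const char *lexeme;
    int token_value;
};

/* An array of structures: every record has the same layout.
   The values mirror those shown in the sample output. */
static const struct symbol symbol_table[] = {
    { "int",    1 },
    { "printf", 5 },
    { "scanf",  6 },
    { "void",   7 },
    { "if",     8 },
};

/* Number of records, computed from the array itself so it stays
   correct when entries are added or removed. */
static const size_t symbol_count =
    sizeof symbol_table / sizeof symbol_table[0];
```

Each record is reached by index, for example `symbol_table[3].lexeme` is "void" and `symbol_table[3].token_value` is 7.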

 

Algorithms:

Procedure main

begin

          if symb='#' then
          begin
                   advance to next character in input file
                   if symb='i' then
                   begin
                             advance to next character in input file
                             while symb!='\n' do
                             begin
                                       advance to next character in input file
                             end {while}
                             print symb is a preprocessor directive
                   end {if symb='i'}
                   if symb='d' then
                   begin
                             advance to next character in input file
                             while symb!=' ' do
                             begin
                                       advance to next character in input file
                             end {while}
                             advance to next character in input file
                             print symb is a constant
                             advance to next character in input file
                             while symb!='\n' do
                             begin
                                       advance to next character in input file
                             end {while}
                   end {if symb='d'}
          end {if symb='#'}

          if symb is a letter or symb='_' then
          begin
                   advance to next character in input file
                   while symb is a digit or a letter or symb='_' do
                   begin
                             advance to next character in input file
                   end {while}
                   call function verify to check whether the lexeme is an identifier or a keyword
          end {if}

          if symb='+' then
          begin
                   advance to next character in input file
                   if symb='+' then
                             print symb is ++ operator
                   else
                   begin
                             ungetc symb to the input file
                             print symb is + operator
                   end {else}
          end {if}

          if symb='-' then
          begin
                   advance to next character in input file
                   if symb='-' then
                             print symb is -- operator
                   else
                   begin
                             ungetc symb to the input file
                             print symb is - operator
                   end {else}
          end {if}

          if symb='|' then
          begin
                   advance to next character in input file
                   if symb='|' then
                             print symb is logical or operator
                   else
                   begin
                             ungetc symb to the input file
                             print symb is bitwise or operator
                   end {else}
          end {if}

          if symb='*' then
          begin
                   print symb is a multiplication operator
          end {if}

          if symb='?' then
          begin
                   print symb is a conditional operator
          end {if}

          if symb='!' or symb='>' or symb='<' then
          begin
                   advance to next character in input file
                   if symb='=' then
                             print symb is a relational operator
                   else
                   begin
                             ungetc symb to the input file
                             print symb is an operator
                   end {else}
          end {if}

          if symb='=' then
          begin
                   advance to next character in input file
                   if symb='=' then
                             print symb is equal to operator
                   else
                   begin
                             ungetc symb to the input file
                             print symb is assignment operator
                   end {else}
          end {if}

          if symb='&' then
          begin
                   advance to next character in input file
                   if symb='&' then
                             print symb is a logical and operator
                   else
                             print '&' followed by symb is an address operator
          end {if}

          if symb='/' then
          begin
                   advance to next character in input file
                   if symb='*' then
                   begin
                             advance to next character in input file
                             set prev to ' '
                             while not (prev='*' and symb='/') do
                             begin
                                       set prev to symb
                                       advance to next character in input file
                             end {while}
                   end {if}
                   else if symb='/' then
                   begin
                             advance to next character in input file
                             while symb!='\n' do
                             begin
                                       advance to next character in input file
                             end {while}
                   end {else if}
                   else
                   begin
                             ungetc symb to the input file
                             print symb is a division operator
                   end {else}
          end {if}

          if symb is a digit then
          begin
                   advance to next character in input file
                   while symb is a digit or symb='.' do
                   begin
                             advance to next character in input file
                   end {while}
                   print symb is a number
          end {if}

          if symb='"' then
          begin
                   advance to next character in input file
                   while symb!='"' do
                   begin
                             advance to next character in input file
                   end {while}
                   print symb is a string
          end {if}

          if symb='{' then
                   print open brace
          if symb='}' then
                   print close brace
          if symb='[' then
                   print open bracket
          if symb=']' then
                   print close bracket
          if symb='(' then
                   print open parenthesis
          if symb=')' then
                   print close parenthesis

end {procedure main}
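The one-character lookahead used for the doubled operators in procedure main can be sketched in C as follows. This is only an illustration under stated assumptions: the function name `scan_operator` and the returned description strings are hypothetical, and only '+', '-', '|' and '=' are covered. The calls `fgetc` and `ungetc` are the standard C library routines.

```c
#include <stdio.h>

/* Classify an operator whose first character `first` has already been
   read from `in`. One more character is read as lookahead. If it does
   not repeat `first`, it is pushed back with ungetc() so that the next
   scanning step sees it again. Returns NULL for characters this sketch
   does not handle. */
const char *scan_operator(int first, FILE *in)
{
    int next = fgetc(in);
    const char *doubled = NULL;  /* description when the character repeats */
    const char *single = NULL;   /* description when it stands alone */

    switch (first) {
    case '+': doubled = "increment operator";  single = "plus operator";       break;
    case '-': doubled = "decrement operator";  single = "minus operator";      break;
    case '|': doubled = "logical or operator"; single = "bitwise or operator"; break;
    case '=': doubled = "equal to operator";   single = "assignment operator"; break;
    default:
        /* Not an operator handled here: give the lookahead back. */
        if (next != EOF)
            ungetc(next, in);
        return NULL;
    }

    if (next == first)
        return doubled;          /* both characters consumed */

    if (next != EOF)
        ungetc(next, in);        /* lookahead belongs to the next token */
    return single;
}
```

A caller would read one character with `fgetc`, pass it to `scan_operator`, and continue reading from the same stream; for the input `a=a+1` the '+' is reported as a plus operator and the '1' is still available to the number-scanning step.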

procedure verify
begin
        scan the symbol table to check whether the encountered lexeme exists
        if it exists then
            return its token value
        else
            return the token value for an identifier
end {procedure}
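A possible C rendering of procedure verify is sketched below. It assumes a small keyword table searched linearly with `strcmp`, and it assumes that any lexeme not found in the table is reported with token value 18, which is the value the sample output shows for identifiers. The names `struct keyword`, `keywords` and `IDENTIFIER_TOKEN` are illustrative and do not come from the original source files.

```c
#include <stddef.h>
#include <string.h>

/* Assumed from the sample output: identifiers are reported as 18. */
#define IDENTIFIER_TOKEN 18

struct keyword {
    const char *lexeme;
    int token_value;
};

/* Token values taken from the sample output. */
static const struct keyword keywords[] = {
    { "int",    1 },
    { "printf", 5 },
    { "scanf",  6 },
    { "void",   7 },
    { "if",     8 },
};

/* Scan the table for `lexeme`. Return the stored token value when an
   entry exists, otherwise the identifier token value. */
int verify(const char *lexeme)
{
    size_t count = sizeof keywords / sizeof keywords[0];
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(lexeme, keywords[i].lexeme) == 0)
            return keywords[i].token_value;
    }
    return IDENTIFIER_TOKEN;
}
```

With this table, `verify("void")` yields 7 and `verify("a_")` yields 18, matching lines 4 and 6 of the sample output.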


USER MANUAL

The code for the modules appears in two files: lex.c and output.c. The file lex.c contains the main source code of the lexical analyzer, and the input to the lexical analyzer is contained in test.c. Under the DOS operating system, the program is compiled with Alt+F9 and executed with Ctrl+F9. The output, i.e. the token types, is stored in the output file output.txt.

Sample Input:

#include<stdio.h>
#include<stdlib.h>
#define abc 100
void main()
{
          int a_,b=30;
          printf("enter 2 no.s\n"); // printf statement
          scanf("%d%d",&a,&b);
          /* scanf
          statement*/
          if(a<20)
          a=a+1;
}

Sample Output:

LINE NO         TOKENS
-----------------------------------------------
    1:          #include<stdio.h> is a header file
    2:          #include<stdlib.h> is a header file
    3:          #define statement: abc is a constant
    4:          void: token value : 7
                main :identifier, token value : 18
                (: open parenthesis
                ): close parenthesis
    5:          {: open brace
    6:          int: token value : 1
                a_ :identifier, token value : 18
                , : comma
                b :identifier, token value : 18
                =: assignment operator
                30 is a number
                ; : semi colon
    7:          printf: token value : 5
                (: open parenthesis
                enter 2 no.s\n : is a string
                ): close parenthesis
                ;: semi colon
    8:          scanf: token value : 6
                (: open parenthesis
                %d%d : is a string
                ,: comma
                &a: address operator
                , : comma
                &b: address operator
                ): close parenthesis
                ;: semi colon
    9:
    10:
    11:         if: token value : 8
                (: open parenthesis
                a :identifier, token value : 18
                <: less than operator
                20 is a number
                ): close parenthesis
    12:         a: token value : 18
                =: assignment operator
                a: token value : 18
                +: plus operator
                1 is a number
                ;: semi colon
    13:         }: close parenthesis


CONCLUSION:

Generally, when syntactic analysis is carried out, the parser calls upon the scanner to tokenize the input as needed. The lexical analyzer designed by us, however, is an independent program. It takes as input a file containing a C program. Therefore, the parser cannot call the designed scanner as and when required.

Consider as an example an array ch[20]. The designed lexical analyzer will tokenize 'ch' as an identifier, '[' as an opening bracket, '20' as a number, and ']' as a closing bracket. But the parser might require an expression such as a[5] to be identified as an array. Similarly, a number of cases may arise where the parser has to identify a token in a different manner than the one specified and designed. Hence, we conclude that the lexical analyzer so designed is an independent program which is not flexible.
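For contrast, the sketch below shows the kind of on-demand interface the conclusion has in mind, where a parser asks the scanner for one token at a time and can then recognize identifier '[' number ']' as an array reference. It is not part of the designed program: `next_token`, `is_array_reference`, the `token_type` enumeration and the choice to scan a string rather than a file are all assumptions made to keep the example short.

```c
#include <ctype.h>
#include <stddef.h>

enum token_type {
    TOK_END, TOK_IDENTIFIER, TOK_NUMBER,
    TOK_OPEN_BRACKET, TOK_CLOSE_BRACKET, TOK_OTHER
};

struct token {
    enum token_type type;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
};

/* Return the token that starts at *cursor and advance *cursor past it.
   A parser calls this each time it needs another token. */
struct token next_token(const char **cursor)
{
    const char *p = *cursor;
    struct token t;

    while (isspace((unsigned char)*p))
        p++;

    t.start = p;
    if (*p == '\0') {
        t.type = TOK_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        t.type = TOK_IDENTIFIER;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p))
            p++;
        t.type = TOK_NUMBER;
    } else {
        t.type = (*p == '[') ? TOK_OPEN_BRACKET
               : (*p == ']') ? TOK_CLOSE_BRACKET
               : TOK_OTHER;
        p++;
    }
    t.length = (size_t)(p - t.start);
    *cursor = p;
    return t;
}

/* Parser-side check: identifier '[' number ']' and nothing else. */
int is_array_reference(const char *text)
{
    const char *cursor = text;

    return next_token(&cursor).type == TOK_IDENTIFIER
        && next_token(&cursor).type == TOK_OPEN_BRACKET
        && next_token(&cursor).type == TOK_NUMBER
        && next_token(&cursor).type == TOK_CLOSE_BRACKET
        && next_token(&cursor).type == TOK_END;
}
```

Here the scanner still reports only simple tokens; the decision that `ch[20]` is an array reference is made by the caller, which is possible because the caller controls when each token is requested.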




  Copyrights © 2005-2007 electrofriends.com, All rights reserved. webmaster@electrofriends.com