CS 3361 | Fall 2020 | Assignment #3 Lexical Analyzer

Assignment #3

Lexical Analyzer

Develop the C or C++ source code required to solve the following problem.

Problem

Develop a lexical analyzer in C or C++ that can identify lexemes and tokens found in a source code file provided by the user. Once the analyzer has identified the lexemes of the language and matched them to a token group, the program should print each lexeme / token pair to the screen.

The source code file provided by the user will be written in a new programming language called “DanC” and is based upon the following grammar (in BNF):

P ::= S
S ::= V:=E | read(V) | write(V) | while C do S od | S;S
C ::= E < E | E > E | E = E | E <> E | E <= E | E >= E
E ::= T | E + T | E - T
T ::= F | T * F | T / F
F ::= (E) | N | V
V ::= a | b | … | y | z | aV | bV | … | yV | zV
N ::= 0 | 1 | … | 8 | 9 | 0N | 1N | … | 8N | 9N

Your analyzer should accept the source code file as a required command line argument and display an appropriate error message if the argument is not provided or the file does not exist. The command to run your application will look something like this:

Form: danc_analyzer <path_to_source_file>

Example: danc_analyzer test_file.danc

Lexeme formation is guided using the BNF rules / grammar above. Your application should output each lexeme and its associated token. Invalid lexemes should output UNKNOWN as their token group. The following token names should be used to identify each valid lexeme:

Lexeme  Token        Lexeme  Token       Lexeme  Token
:=      ASSIGN_OP    +       ADD_OP      do      KEY_DO
<       LESSER_OP    -       SUB_OP      od      KEY_OD
>       GREATER_OP   *       MULT_OP     V       IDENT
=       EQUAL_OP     /       DIV_OP      N       INT_LIT
<>      NEQUAL_OP    read    KEY_READ    (       LEFT_PAREN
<=      LEQUAL_OP    write   KEY_WRITE   )       RIGHT_PAREN
>=      GEQUAL_OP    while   KEY_WHILE   ;       SEMICOLON

Additional Solution Rules

Your solution must conform to the following rules:

1) Your solution should treat spaces, tabs, and end-of-line characters as delimiters between lexemes. It should ignore these characters rather than report them as lexemes, and it should not require them between lexemes of different types.

a. Example: “while i<=n do”

i. This line will generate 5 lexemes “while”, “i”, “<=”, “n”, and “do”.

ii. This means the space between “while” and “i” separated the two lexemes but wasn’t a lexeme itself.

iii. This also means that no space is required between the lexemes “i”, “<=”, and “n”.

2) Your solution should print out “DanC Analyzer :: R<#>” on the first line of output. The double colon “::” is required for correct grading of your submission.

3) Your solution must be tested to ensure compatibility with the GNU C/C++ compiler version 5.4.0.

4) Lexemes that do not match a known token should be reported as an “UNKNOWN” token. This should not stop execution of your program or generate an error message.

Hints

1) Draw inspiration by looking at the lexical analyzer code discussed and distributed in class.

2) Start by focusing on writing the program in your usual C/C++ development environment.

3) Once your solution is correct, then work on testing it in Linux using the appropriate version of the GNU compiler (gcc).

4) Linux/Makefile tutorials:

a. Linux Video walkthrough: http://www.depts.ttu.edu/hpcc/about/training.php#intro_linux

b. Linux Text walkthrough: http://www.ee.surrey.ac.uk/Teaching/Unix/

c. Makefile tutorial: https://www.tutorialspoint.com/makefile/index.htm

What to turn in to BlackBoard

A zip archive (.zip) containing the following files:

• ___Assignment3.c / ___Assignment3.cpp

o C/C++ Source code file

o Example: Eric_Rees_R123456_Assignment3.c

• Makefile

o A makefile for compiling your C/C++ file.

o This makefile must work in the HPCC environment to compile your source code file and output an executable named danc_analyzer.

Example Execution

The example execution below was run on Quanah, one of the HPCC clusters. It shows all the commands used to compile and execute my analyzer. Bold text comes from the Linux OS, red text shows the commands I typed and executed, and blue text shows the output of each step.

quanah:/assignment_3$ make clean

rm -f danc_analyzer

quanah:/assignment_3$ make

gcc -o danc_analyzer Eric_Rees_R123456_Assignment3.c

quanah:/assignment_3$ ./danc_analyzer test.danc

DanC Analyzer :: R123456

f IDENT

:= ASSIGN_OP

1 INT_LIT

; SEMICOLON

i IDENT

:= ASSIGN_OP

1 INT_LIT

; SEMICOLON

read KEY_READ

( LEFT_PAREN

n IDENT

) RIGHT_PAREN

; SEMICOLON

while KEY_WHILE

i IDENT

<= LEQUAL_OP

n IDENT

do KEY_DO

f IDENT

:= ASSIGN_OP

f IDENT

* MULT_OP

i IDENT

; SEMICOLON

i IDENT

:= ASSIGN_OP

i IDENT

+ ADD_OP

1 INT_LIT

od KEY_OD

; SEMICOLON
