COMPILER DESIGN
LAB MANUAL
YEAR 2010-11
III/IV B. Tech CSE II Semester
INDEX

S.No  Contents                          Page No.
1     Objectives of the lab             3
2     Requirements                      4
3     Lab Syllabus & Programs (JNTU)    5
4     Introduction About Lab            9
5     Solutions for Programs            15
6     Topics beyond the Lab Syllabus    34
7     References                        35
REQUIREMENTS

Hardware Requirements:
Processor : Pentium I
RAM       : 64 MB
Hard Disk : 100 MB

Software Requirements:
Lex and Yacc tools (Linux utilities)
Language  : C/C++

System Configuration on which lab is conducted:
Processor        : P-IV (166 MHz)
RAM              : 64 MB
HDD              : 100 MB
Monitor          : 14" Color
Keyboard         : Multimedia
Operating System : Windows XP
Mouse            : Scroll
Consider the following mini language, a simple procedural high-level language, only operating on integer data, with a syntax looking vaguely like a simple C crossed with Pascal. The syntax of the language is defined by the following BNF grammar:

<program> ::= <block>
<block> ::= { <variabledefinition> <slist> } | { <slist> }
<variabledefinition> ::= int <vardeflist> ;
<vardeflist> ::= <vardec> | <vardec> , <vardeflist>
<vardec> ::= <identifier> | <identifier> [ <constant> ]
<slist> ::= <statement> | <statement> ; <slist>
<statement> ::= <assignment> | <ifstatement> | <whilestatement> | <block> | <printstatement> | <empty>
<assignment> ::= <identifier> = <expression> | <identifier> [ <expression> ] = <expression>
<ifstatement> ::= if <bexpression> then <slist> else <slist> endif | if <bexpression> then <slist> endif
<whilestatement> ::= while <bexpression> do <slist> enddo
<printstatement> ::= print ( <expression> )
<expression> ::= <expression> <addingop> <term> | <term> | <addingop> <term>
<bexpression> ::= <expression> <relop> <expression>
<relop> ::= < | <= | == | >= | > | !=
<addingop> ::= + | -
<term> ::= <term> <multop> <factor> | <factor>
<multop> ::= * | /
<factor> ::= <constant> | <identifier> | <identifier> [ <expression> ] | ( <expression> )
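For illustration, here is a small program conforming to this grammar; the variable names and the factorial computation are our own example, not part of the syllabus:

```
{ int n, f;
  n = 5;
  f = 1;
  while n > 1 do
    f = f * n;
    n = n - 1
  enddo;
  print(f)
}
```

Note how the grammar requires ; between statements in a <slist> but not after the last one, and how the while body is itself a <slist> terminated by enddo.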
A compiler is system software that converts a high-level language into a low-level language. Human beings cannot program in the machine language (low-level language) understood by computers, so we program in a high-level language, and the compiler is the software that bridges the gap between the user and the computer. It is a very complicated piece of software; the first compiler took 18 man-years to build. To manage this complexity, compilation is divided into six phases:
Lexical Analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
This course is a thorough introduction to compiler design, focusing on more low-level and systems aspects rather than high-level questions such as polymorphic type inference or separate compilation. You will be building several complete end-to-end compilers for successively more complex languages, culminating in a mildly optimizing compiler for a safe variant of the C programming language to x86-64 assembly language.

Goals

After this course you should know how a compiler works in some depth. In particular, you should understand the structure of a compiler, and how the source and target languages influence various choices in its design. It will give you a new appreciation for programming language features and the implementation challenges they pose, as well as for the actual hardware architecture and the runtime system in which your generated code executes. Understanding the details of typical compilation models will also make you a more discerning programmer. You will also understand some specific components of compiler technology, such as lexical analysis, grammars and parsing, type-checking, intermediate representations, static analysis, common optimizations, instruction selection, register allocation, code generation, and runtime organization. The knowledge gained should be broad enough that if you are ...

Department of CSE, Padmasri Dr. BVRIT
An overview of Lex

For a trivial example, consider a program to delete from the input all blanks or tabs at the ends of lines.

%%
[ \t]+$    ;

is all that is required. The program contains a %% delimiter to mark the beginning of the rules, and one rule. This rule contains a regular expression which matches one or more instances of the characters blank or tab (written \t for visibility, in accordance with the C language convention) just prior to the end of a line. The brackets indicate the character class made of blank and tab; the + indicates "one or more ..."; and the $ indicates "end of line," as in QED. No action is specified, so the program generated by Lex (yylex) will ignore these characters. Everything else will be copied. To change any remaining string of blanks or tabs to a single blank, add another rule:

%%
[ \t]+$    ;
[ \t]+     printf(" ");
The finite automaton generated for this source will scan for both rules at once, observing at the termination of the string of blanks or tabs whether or not there is a newline character, and executing the desired rule action. The first rule matches all strings of blanks or tabs at the end of lines, and the second rule all remaining strings of blanks or tabs. Lex can be used alone for simple transformations, or for analysis and statistics gathering on a lexical level. Lex can also be used with a parser generator to perform the lexical analysis phase; it is particularly easy to interface Lex and Yacc [3]. Lex programs recognize only regular expressions; Yacc writes parsers that accept a large class of context free grammars, but require a lower level analyzer to recognize input tokens. Thus, a combination of Lex and Yacc is often appropriate. When used as a preprocessor for a later parser generator, Lex is used to partition the input stream, and the parser generator assigns structure to the resulting pieces. The flow of control in such a case (which might be the first half of a compiler, for example) is shown in Figure 2. Additional programs, written by other generators or by hand, can be added easily to programs written by Lex.
lexical rules            grammar rules
      |                        |
      v                        v
  +-------+               +--------+
  |  Lex  |               |  Yacc  |
  +-------+               +--------+
      |                        |
      v                        v
  +-------+               +---------+
  | yylex |--- tokens --->| yyparse |---> Parsed input
  +-------+               +---------+
Yacc users will realize that the name yylex is what Yacc expects its lexical analyzer to be named, so that the use of this name by Lex simplifies interfacing. Lex generates a deterministic finite automaton from the regular expressions in the source. The automaton is interpreted, rather than compiled, in order to save space. The result is still a fast analyzer.
The Yacc library (-ly) should be loaded before the Lex library, to obtain a main program which invokes the Yacc parser. The generation of the Lex and Yacc programs can be done in either order.
AIM - 1: Design a lexical analyzer. The lexical analyzer should ignore redundant spaces, tabs and new lines. It should also ignore comments. Although the syntax specification states that identifiers can be arbitrarily long, you may restrict the length to some reasonable value.
ALGORITHM:
We make use of two functions. lookup() takes a string as argument and checks its presence in the symbol table; if the string is found it returns its address, else it returns NULL. insert() takes a string as its argument, inserts it into the symbol table, and returns the corresponding address.

Step1: Start
Step2: In function scan, declare lexbuf[50] and tokenvalue[10] of char
begin
store the scanned character in c
if c is a blank or tab, then do nothing
else if c is \n, then line_number := line_number + 1
else if c is a digit, then tokenvalue := value of the digit and successive digits; return tokentype
Step3: p := lookup(lexbuf); if p = NULL, then p := insert(lexbuf)
Test Data:
Valid Data Set    : Input should be a file name
InValid Data Set  : Not applicable
Limiting Data Set : Not applicable

INPUT file xx.txt:
{
int t1,t2
t1=2
t2=t1*3/2;
if t2>5 then
print(t2);
else
{
int t3;
t3=99;
t2=-25
print(-t1+t2+t3);
}
endif
OUTPUT SOB int ID SEPE ID ENDST ID ASSIGN ENDST ID ASSIGN ID MULOP DIVOP ENDST If ID GT NUM Then print OPENPRA ID CLOSEPRA ENDST else SOB int ID ENDST ID ASSIGN NUM ID ASSIGN SUBOP NUM ENDST OPENPRA SUBOP ID ADDOP ID MULOP ID CLOSEPRA ENDST EOB endif
VIVA QUESTIONS
1. What is a lexical analyzer?
2. Which compiler is used for the lexical analyzer?
3. What is YACC?
4. What is the output of the lexical analyzer?
5. What is a LEX source program?
6. What is the role of a lexical analyzer in the compilation process?
7. Define the terms: token, pattern and lexeme.
8. Give an example of a lexical error.
9. Give examples of non-tokens in a C program.
10. What other tasks are performed by the lexical analyzer besides dividing text into tokens?
11. Can we combine the lexical analysis phase with parsing? If not, why?
12. What are the error recovery strategies used by the lexical analyzer?
13. What are the advantages of having lexical analysis as a separate phase?
14. Find the number of tokens in the following code: if ( x > y) z=0;
AIM - 2: Implement the lexical analyzer using JLex, flex, lex or other lexical analyzer generating tools. ALGORITHM: Step1: Start
Step2: Declare the definitions for the given language tokens like digit, letter, white space, delimiters, etc.

digit    [0-9]
letter   [A-Za-z]
delim    [ \t\n]
ws       {delim}+
ID       {letter}({letter}|{digit})*
integer  {digit}+

Step3: Define the translation rules:

%%
{ws}        { /* skip white space */ }
"if"        { printf("keyword"); }
"else"      { printf("keyword"); }
{ID}        { printf("identifier"); }
{integer}   { printf("integer"); }
"&&"|"<"|">"|"<="|">="|"=="|"!="   { printf("logical operator"); }
"+"|"-"|"*"|"/"                    { printf("arithmetic operator"); }
Test Data:
Valid Data Set    : Input should be a file name
InValid Data Set  : Not applicable
Limiting Data Set : Not applicable
VIVA QUESTIONS
1. What is Parsing?
2. Construct the parsing table for the grammar:
   S  -> iCtSS1 | a
   S1 -> eS | epsilon
   C  -> b
3. What is a Token?
4. What is JLex?
5. What is Flex?
6. What is the advantage of using the Lex tool over a hand-written lexical analyzer module?
7. What is the default C file name Lex produces after compiling a Lex program to a C program?
8. Give a lex program to count the number of lines in a text.
9. Give a lex program to delete all blanks or tabs at the ends of lines.
10. What routines are compulsory in the subroutine section?
11. What is the output of the Lex compiler?
12. Write the general form of the translation rule section of a LEX program.
13. Explain the purpose of each section of a LEX program.
14. What are the three different sections of a LEX program?
15. What is LEX?
AIM - 3:
ALGORITHM
Step1: Start
Step2: declare w[10] as char and z as an index
Step3: enter the string with $ at the end
Step4: if A(w[z]) then increment z and check for B(w[z]); if it satisfies, increment z and check for d; if d is present then increment z and check for D(w[z])
Step5: if step 4 is satisfied then the string is accepted, else the string is not accepted
Step6: give the grammar A -> bc | ab in the function A(int k)
Step7: describe the grammar B -> c | d in the function B(int k)
Step8: similarly describe the grammar D -> d | abcd
Step9: if steps 6, 7, 8 are satisfied accordingly, the string is accepted; else the string is not accepted
Step10: Stop

Test Data:
Valid Data Set    : Input should be a file name
InValid Data Set  : Not applicable
Limiting Data Set : Not applicable

INPUT file aa.txt:
{
int a, ba[10];
a=a*a;
OUTPUT
Successful parsing
VIVA QUESTIONS
1. What is a Predictive parser?
2. How many types of analysis can we do using a Parser?
3. What is a Recursive Descent Parser?
4. How many types of Parsers are there?
5. What is an LR Parser?
6. What is meant by LL(1)?
7. What are the requirements for an LL(1) parser?
8. Is the LL(1) parser a top-down parser or a bottom-up parser?
9. Can every unambiguous grammar be parsed by LL(1)?
10. What is the advantage of eliminating left recursion?
11. What is the advantage of left factoring?
12. Is every LL(1) grammar unambiguous?
13. What is an ambiguous grammar?
14. How to eliminate left recursion for A -> Aα | β?
15. How to left factor the grammar A -> αβ1 | αβ2 | αβ3?
AIM - 4:
ALGORITHM
Step1: Start
Step2: Initially the parser has s0 on the stack, where s0 is the initial state, and w$ is in the input buffer
Step3: Set ip to point to the first symbol of w$
Step4: Repeat forever: let S be the state on top of the stack and a the symbol pointed to by ip
Step5: If action[S, a] = shift S1, then push S1 onto the top of the stack and advance ip to the next input symbol
Step6: Else if action[S, a] = reduce A -> B, then pop 2*|B| symbols off the stack; let S1 be the state now on top of the stack and push A and goto[S1, A]
Step7: Output the production A -> B
Step8: Else if action[S, a] = accept, then return
Step9: Else call Error()
Step10: Stop
Test Data:
Valid Data Set    : Input should be a file name
InValid Data Set  : Not applicable
Limiting Data Set : Not applicable

$ lex parser.l
$ yacc -d parser.y
$ cc lex.yy.c y.tab.c -ll -ld
$ ./a.out
OUTPUT
2+3
5.000
VIVA QUESTIONS
1. What is LALR parsing?
2. What is a Shift-reduce parser?
3. What are the operations of a Parser?
4. What is the use of a parsing table?
5. What is bottom-up parsing?
6. What is meant by an LALR parser?
7. What is the other name of the LR parser?
8. What are the advantages of bottom-up parsers?
9. Can you design an LR parser for a left recursive grammar?
10. Compare LALR(1) with LR(1).
11. Compare LALR(1) with SLR(1).
12. Compare the table sizes of LALR(1) and SLR(1).
13. For a given grammar, if SLR(1) has n1 states, LALR(1) has n2 states and LR(1) has n3 states, what is the relation between the numbers of states?
14. How is the number of states in LALR(1) reduced?
15. Which type of items are used by LR(1)?
AIM - 5: Convert the BNF rules into YACC form and write code to generate an abstract syntax tree.

ALGORITHM
Step1: Start
Step2: Declare the declarations as a header: %{ #include <ctype.h> %}
Step3: %token DIGIT
Step4: Define the translation rules for line, expr, term, factor:

line   : expr '\n'        { printf("\n %d \n", $1); }
expr   : expr '+' term    { $$ = $1 + $3; }
term   : term '*' factor  { $$ = $1 * $3; }
factor : '(' expr ')'     { $$ = $2; }
%%

Step5: Define the supporting C routines
Step6: Stop
Test Data:
Valid Data Set    : Input should be a file name
InValid Data Set  : Not applicable
Limiting Data Set : Not applicable

$ lex int.l
$ yacc -d int.y
OUTPUT
Three-address code table for the input program (columns POS 0-16, Arg1, Arg2, Result), using the temporaries t1-t6 for the intermediate results of the assignments and conditional tests over a, b and c.
VIVA QUESTIONS
1. What is an Abstract Syntax tree?
2. What are BNF Rules?
3. What is DAG representation?
4. How are LALR(1) states generated?
5. In which condition does the user have to supply more information to YACC?
6. What is YACC?
7. What is the routine yyerror()?
8. How does it handle errors?
9. What is the default C file name after compiling a yacc program to a C program?
10. How does YACC use the lexical analyzer?
11. What are the three different sections in a YACC program?
12. What is the role of the yylex() function in a YACC program?
13. What is the output of the yacc compiler?
14. YACC implements which type of parser?
15. What is yyparse()?
AIM - 6: Write a program to generate machine code from the abstract syntax tree generated by the parser. The following instruction set may be considered as target code.
ALGORITHM
Step1: Start
Step2: For every three-address statement of the form x = y op z
Step3: begin
Step4: Call getreg() to obtain the location L in which the computation y op z should be performed
Step5: Obtain the current location of the operand y by consulting its address descriptor; if the value of y is currently both in a memory location and in a register, prefer the register. If the value of y is not currently available in L, generate an instruction MOV y, L
Step6: Generate the instruction OP z, L and update the address descriptor of x to indicate that x is now available in L; if L is a register, update its descriptor to indicate that it will contain the run-time value of x
Step7: If the current values of y and/or z are in registers, we have no further use for them, and they are not live at the end of the block, then alter the register descriptors to indicate that, after the execution of the statement x = y op z, those registers will no longer contain y and/or z
Step8: Store all results
Step9: Stop

Test Data:
Valid Data Set    : Input should be a file name
InValid Data Set  : Not applicable
Limiting Data Set : Not applicable
INPUT  : three-address statements over the variables a, b and c
OUTPUT : the corresponding target code listing
VIVA QUESTIONS
1. What is target code?
2. What is machine code?
3. What is a cross compiler?
4. Give an example of a cross compiler.
5. What is the difference between syntax and semantics?
6. What is the role of the intermediate code generator in the compilation process?
7. What are the different intermediate code representations?
8. What are the different representations used to implement three-address code statements?
9. What is a three-address code statement?
10. What is polish notation?
11. What is the advantage of generating intermediate code?
12. What is a DAG?
13. What is the advantage of a DAG?
14. Write three-address code for -(a+b)+(c+d)+(a+b+c).
15. What is the difference between a tree and a DAG?
IDENTIFIER PROBLEM STATEMENT
An identifier is an entity which starts with a letter and can then contain both letters and digits; no other special character is allowed in an identifier. While checking for an identifier we normally use a DFA, as is done in the lexical analysis phase, which works with regular grammars. The DFA for an identifier consists of three states. The first state accepts only letters and moves to the second state when it gets a letter. The second state can accept both letters and digits and comes back to itself when it gets one. The second state can also accept a terminating symbol (delimiter), which leads it to the third state, identifying the string as an identifier.
TO IDENTIFY WHETHER A GIVEN STRING IS AN IDENTIFIER OR NOT
TO CHECK WHETHER A STRING IS A KEYWORD OR NOT
TO FIND WHETHER A STRING IS A CONSTANT OR NOT
REFERENCES

BOOKS:
lex & yacc, John R. Levine, Tony Mason, and Doug Brown, O'Reilly Media.

WEBSITES: