
Pune Vidyarthi Griha’s

COLLEGE OF ENGINEERING, NASHIK.

“LANGUAGE TRANSLATOR”

By
Prof. Anand N. Gharu
(Assistant Professor)
PVGCOE Computer Dept.

22nd Jan 2018


CONTENTS :-
1. Role of lexical analysis
2. Parsing, tokens, patterns, lexemes and lexical errors
3. Regular definitions for language constructs & strings
4. Sequences, comments & transition diagrams for recognition of tokens, reserved words & identifiers
5. Introduction to Compiler & Interpreters
6. General model of Compiler
7. Compare compiler and interpreter
8. Use of interpreter & component of interpreter
9. Overview of Lex & YACC Specifications.
What’s a compiler?
• All computers only understand machine language

[Figure: the text "This is a program" shown next to its machine-language form, 10000010010110100100101……]

• Therefore, high-level language instructions must be translated into machine language prior to execution
What’s a compiler?
• Compiler
A piece of system software that translates high-level languages into machine language

program.c:
    while (c != 'x')
        if (c == 'a' || c == 'e' || c == 'i')
            printf("Congrats!");
        else if (c != 'x')
            printf("You Loser!");

[Figure: program.c → Compiler (gcc -o prog program.c) → prog (10000010010110100100101……); the figure also shows the output "Congrats!"]
Compiler
• Compiler:-
• These are system programs that automatically translate a high-level language program into a machine language program.

[Figure: Source program (high-level language program) → Compiler → Target program (machine language program); the diagram also contains a box labelled "Database"]
Types of Compiler
• Cross Assembler:-
• These are system programs that automatically translate an assembly language program compatible with M/C A into a machine language program compatible with M/C A, while the cross assembler itself runs on a different machine, M/C B.

[Figure: Source program (assembly language program compatible with M/C A) → Cross Assembler (on M/C B) → Target program (machine language program compatible with M/C A)]
Types of compiler
• Cross Compiler:-
• These are system programs that automatically translate an HLL program compatible with M/C A into a machine language program compatible with M/C A, but the underlying machine on which the compiler runs is M/C B.

[Figure: Source program (HLL program) → Cross Compiler (on M/C B) → Target program (machine language program compatible with M/C A)]
Types of Compiler
Interpreter
- It is a language translator that executes the source program line by line without translating it into machine language.

- It does not generate object code.
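The line-by-line execution described above can be illustrated with a minimal sketch in C. The three-command language (`add N`, `mul N`, `neg`) and the names `run_line` and `run_program` are assumptions made for illustration only; they do not come from the slides.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical three-command language acting on one accumulator:
 *   add N   -> acc = acc + N
 *   mul N   -> acc = acc * N
 *   neg     -> acc = -acc
 * Returns 1 if the line was executed, 0 if it was not understood. */
int run_line(const char *line, long *acc)
{
    long n;
    char op[8];

    if (sscanf(line, "%7s", op) != 1)
        return 0;                      /* blank line */
    if (strcmp(op, "neg") == 0) {
        *acc = -*acc;
        return 1;
    }
    if (sscanf(line, "%7s %ld", op, &n) != 2)
        return 0;                      /* operand missing */
    if (strcmp(op, "add") == 0) { *acc += n; return 1; }
    if (strcmp(op, "mul") == 0) { *acc *= n; return 1; }
    return 0;                          /* unknown command */
}

/* Interpret a whole program: each statement is executed as soon as it is
 * read, and no object code is produced. Stops at the first bad line and
 * returns the number of lines executed. */
int run_program(const char *const lines[], int count, long *acc)
{
    int i;

    for (i = 0; i < count; i++)
        if (!run_line(lines[i], acc))
            break;
    return i;
}
```

Starting from an accumulator of 2, the program `add 3`, `mul 4`, `neg` leaves -20 in the accumulator. An error on line k is only discovered after lines 1 to k-1 have already run, which is typical of interpreters.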


Compiler vs Interpreter

[Comparison table not recovered; the surviving fragment names C++ and Visual Basic.]


Phases of compiler
Structure of Compiler
• Any compiler must perform two major tasks
o Analysis of the source program
o Synthesis of a machine-language program

[Figure: Compiler divided into Analysis and Synthesis]
Structure of Compiler
[Figure: Source Program (Character Stream) → Scanner → Tokens → Parser → Syntactic Structure → Semantic Routines → Intermediate Representation → Optimizer → Code Generator → Target machine code. Symbol and Attribute Tables are used by all phases of the compiler.]
Structure of Compiler
Scanner (Lexical Analysis)
The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens).
• RE (Regular Expression)
• NFA (Non-deterministic Finite Automata)
• DFA (Deterministic Finite Automata)
• LEX
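As a rough illustration of a scanner grouping characters into tokens, here is a hand-written sketch in C. The token kinds and the names `struct token` and `next_token` are assumptions for this example; a scanner generated by Lex would be table-driven rather than written this way.

```c
#include <ctype.h>
#include <stddef.h>

/* Token kinds for this sketch only; a real scanner has many more. */
enum token_kind { TOK_IDENT, TOK_NUMBER, TOK_SYMBOL, TOK_END };

struct token {
    enum token_kind kind;
    const char *start;   /* first character of the lexeme */
    size_t length;       /* number of characters in the lexeme */
};

/* Read the next token starting at *cursor and advance *cursor past it. */
struct token next_token(const char **cursor)
{
    const char *p = *cursor;
    struct token t;

    while (isspace((unsigned char)*p))      /* skip blanks between tokens */
        p++;

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_END;
    } else if (isalpha((unsigned char)*p)) {
        t.kind = TOK_IDENT;                 /* letter(letter|digit)* */
        while (isalnum((unsigned char)*p))
            p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;                /* digit+ */
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        t.kind = TOK_SYMBOL;                /* any other single character */
        p++;
    }
    t.length = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

Calling `next_token` repeatedly on `"count1 = 42;"` yields an identifier, the symbol `=`, a number, the symbol `;`, and finally `TOK_END`.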
Structure of Compiler
Parser (Syntax Analysis)
Given a formal syntax specification (typically as a context-free grammar [CFG]), the parser reads tokens and groups them into units as specified by the productions of the CFG being used.
As syntactic structure is recognized, the parser either calls corresponding semantic routines directly or builds a syntax tree.
• CFG (Context-Free Grammar)
• BNF (Backus-Naur Form)
• GAA (Grammar Analysis Algorithms)
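The idea of a parser that calls semantic routines directly as each production is recognized can be sketched with a small recursive-descent parser in C. The grammar in the comment, the `struct parser` type and the function names are assumptions chosen for illustration; the slides do not prescribe a parsing method at this point.

```c
#include <ctype.h>

/* Assumed grammar (not from the slides):
 *   expr -> term { ('+' | '-') term }
 *   term -> digit { digit }
 */
struct parser {
    const char *p;   /* next unread character */
    int ok;          /* cleared when a syntax error is found */
};

static void skip_blanks(struct parser *ps)
{
    while (*ps->p == ' ')
        ps->p++;
}

/* term -> digit { digit }; the semantic routine builds the number's value. */
static long parse_term(struct parser *ps)
{
    long value = 0;

    skip_blanks(ps);
    if (!isdigit((unsigned char)*ps->p)) {
        ps->ok = 0;
        return 0;
    }
    while (isdigit((unsigned char)*ps->p))
        value = value * 10 + (*ps->p++ - '0');
    return value;
}

/* expr -> term { ('+' | '-') term }; each recognised production
 * immediately runs its semantic routine (add or subtract). */
static long parse_expr(struct parser *ps)
{
    long value = parse_term(ps);

    for (;;) {
        skip_blanks(ps);
        if (*ps->p == '+') {
            ps->p++;
            value += parse_term(ps);
        } else if (*ps->p == '-') {
            ps->p++;
            value -= parse_term(ps);
        } else {
            return value;
        }
    }
}

/* Parse a whole string; returns 1 and stores the value on success. */
int parse(const char *text, long *result)
{
    struct parser ps = { text, 1 };

    *result = parse_expr(&ps);
    skip_blanks(&ps);
    return ps.ok && *ps.p == '\0';
}
```

Here `parse("12 + 30 - 2", &v)` succeeds with `v` equal to 40, while `parse("1 + ", &v)` reports a syntax error by returning 0. A parser that builds a syntax tree would allocate a node in each routine instead of computing a value.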
Structure of Compiler
Semantic Routines
• Perform two functions
o Check the static semantics of each construct
o Do the actual translation
• The heart of a compiler
• Syntax Directed Translation
• Semantic Processing Techniques
• IR (Intermediate Representation)
Structure of Compiler
Optimizer
The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code.
This phase can be very complex and slow.
Peephole optimization, loop optimization, register allocation, code scheduling
• Register and Temporary Management
• Peephole Optimization
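To make "functionally equivalent but improved IR code" concrete, the sketch below applies a very small peephole pass to a made-up intermediate representation. The one-operand IR, `struct ir_instr`, `ir_eval` and `peephole` are all hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical one-operand IR acting on a single accumulator. */
enum ir_op { IR_LOAD, IR_ADD, IR_MUL };

struct ir_instr {
    enum ir_op op;
    long operand;
};

/* Run the IR; used here only to show that optimisation keeps the result. */
long ir_eval(const struct ir_instr *code, size_t n)
{
    long acc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (code[i].op) {
        case IR_LOAD: acc = code[i].operand;  break;
        case IR_ADD:  acc += code[i].operand; break;
        case IR_MUL:  acc *= code[i].operand; break;
        }
    }
    return acc;
}

/* Peephole pass: look at one instruction at a time and drop those that
 * cannot change the accumulator (add 0, mul 1). The surviving instructions
 * are compacted in place; the new length is returned. */
size_t peephole(struct ir_instr *code, size_t n)
{
    size_t in, out = 0;

    for (in = 0; in < n; in++) {
        int is_noop = (code[in].op == IR_ADD && code[in].operand == 0) ||
                      (code[in].op == IR_MUL && code[in].operand == 1);
        if (!is_noop)
            code[out++] = code[in];
    }
    return out;
}
```

For the sequence load 5, add 0, mul 1, add 3, mul 2 the pass keeps three instructions, and `ir_eval` returns 16 both before and after. Real peephole optimizers look at short windows of several instructions and know many more patterns.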
Structure of Compiler
Code Generator
• Interpretive Code Generation
• Generating Code from Tree/Dag
• Grammar-Based Code Generator
Structure of Compiler
[Figure: compiler pipeline]
Scanner [Lexical Analyzer]
  ↓ Tokens
Parser [Syntax Analyzer]
  ↓ Parse tree
Semantic Process [Semantic Analyzer]
  ↓ Abstract Syntax Tree w/ Attributes
Code Generator [Intermediate Code Generator]
  ↓ Non-optimized Intermediate Code
Code Optimizer
  ↓ Optimized Intermediate Code
Code Generator
  ↓ Target machine code
Lexical Analysis
Syntax Analysis
Semantic Analysis
Intermediate Code Generation
Code Optimization
Code Generation
Compiler writing tools
• Compiler generators or compiler-compilers
o E.g. scanner and parser generators
o Examples: Yacc, Lex
Overview of Lex & Yacc
• Lex:
  ◦ Theory.
  ◦ Execution.
  ◦ Example.
• Yacc:
  ◦ Theory.
  ◦ Description.
  ◦ Example.
• Lex & Yacc linking.
• Demo.
Lex
• lex is a program (generator) that generates lexical analyzers; it is widely used on Unix.
• It is mostly used with the Yacc parser generator.
• Written by Eric Schmidt and Mike Lesk.
• It reads an input stream specifying the lexical analyzer and outputs source code implementing the lexical analyzer in the C programming language.
• Lex reads patterns (regular expressions) and produces C code for a lexical analyzer that scans for text matching them, such as identifiers.
STRUCTURE OF LEX
Lex
◦ A simple pattern: letter(letter|digit)*
• Regular expressions are translated by lex to a computer program that mimics an FSA (finite state automaton).
• This pattern matches a string of characters that begins with a single letter followed by zero or more letters or digits.
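The pattern letter(letter|digit)* corresponds to a two-state automaton plus a reject state. The following C sketch writes that automaton out by hand; the state names and the function `matches_identifier` are assumptions for illustration, and the code Lex actually generates is table-driven and looks different.

```c
#include <ctype.h>

/* States of a hand-written automaton for letter(letter|digit)*. */
enum state { START, IN_IDENT, REJECT };

/* Returns 1 if the whole string matches letter(letter|digit)*, else 0. */
int matches_identifier(const char *s)
{
    enum state st = START;

    for (; *s != '\0' && st != REJECT; s++) {
        unsigned char c = (unsigned char)*s;

        switch (st) {
        case START:                       /* first character must be a letter */
            st = isalpha(c) ? IN_IDENT : REJECT;
            break;
        case IN_IDENT:                    /* then any letters or digits */
            st = isalnum(c) ? IN_IDENT : REJECT;
            break;
        case REJECT:
            break;
        }
    }
    return st == IN_IDENT;                /* IN_IDENT is the accepting state */
}
```

With this sketch, `x1` and `count` are accepted, while `1x`, `a_b` and the empty string are rejected.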
Lex

• Some limitations: Lex cannot be used to recognize nested structures such as parentheses, since it only has states and transitions between states and no stack to remember the nesting depth.

• So, Lex is good at pattern matching, while Yacc is for more challenging tasks such as nested structures.
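The limitation on nested structures becomes clearer when looking at what a balanced-parentheses check needs. The C sketch below (the function name `parens_balanced` is an assumption) relies on a depth counter that can grow without bound, which a fixed set of states cannot represent.

```c
/* Returns 1 if every '(' in s is closed by a later ')', else 0.
 * The depth counter plays the role of a stack, which is exactly the
 * memory a finite automaton lacks. */
int parens_balanced(const char *s)
{
    long depth = 0;

    for (; *s != '\0'; s++) {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            if (depth == 0)
                return 0;     /* ')' with no matching '(' */
            depth--;
        }
    }
    return depth == 0;        /* anything left open is an error */
}
```

For example, `(a(b)c)` is balanced, while `(()` and `)(` are not. A parser generated by Yacc keeps a stack and can therefore handle this kind of nesting through grammar rules.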
Lex

• Pattern Matching Primitives.

Lex

• Pattern Matching examples.
Lex
• The input structure to Lex:

... definitions section ...
%%
... rules section ...
%%
... C code section (subroutines) ...

• ECHO is an action and predefined macro in lex that writes the text matched by the pattern.
Lex

Lex predefined variables.

Lex

• Whitespace must separate the defining term and the associated expression.

• Code in the definitions section is simply copied as-is to the top of the generated C file and must be bracketed with “%{” and “%}” markers.

• Substitutions in the rules section are surrounded by braces ({letter}) to distinguish them from literals.
Yacc
• Theory:
◦ Yacc reads the grammar and generates C code for a parser.

◦ Grammars are written in Backus-Naur Form (BNF).

◦ A BNF grammar is used to express context-free languages.

◦ To parse an expression, do the reverse of derivation: reduce the expression back to a single nonterminal.

◦ This is known as bottom-up or shift-reduce parsing.

◦ A stack (LIFO) stores the symbols during parsing.
STRUCTURE OF YACC
Yacc
• Input to yacc is divided into three sections.

... definitions ...
%%
... rules ...
%%
... subroutines ...
Yacc
• The definitions section consists of:
◦ token declarations.
◦ C code bracketed by “%{” and “%}”.

• The rules section consists of:
◦ BNF grammar.

• The subroutines section consists of:
◦ user subroutines.
Yacc & Lex Together
• The grammar:
program -> program expr | ε
expr -> expr + expr | expr - expr | id

• program and expr are nonterminals.

• id is a terminal (a token returned by lex).

• An expression may be:
o the sum of two expressions.
o the difference of two expressions.
o or an identifier.
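The expr rules of the grammar above can be recognized bottom-up with an explicit stack. The C sketch below is a hand-written illustration of shift and reduce steps, not the parser Yacc would generate (Yacc produces a table-driven LALR(1) parser). Encoding tokens as the characters `i`, `+` and `-`, and the name `shift_reduce_accepts`, are assumptions of this example. It reduces as soon as a right-hand side appears on top of the stack, which groups operators to the left.

```c
#include <string.h>

#define STACK_MAX 64

/* Tokens as single characters (an assumption of this sketch):
 *   'i' = id, '+' and '-' = operators. 'E' on the stack stands for expr.
 * Returns 1 if the token string reduces to a single expr, else 0. */
int shift_reduce_accepts(const char *tokens)
{
    char stack[STACK_MAX];
    size_t top = 0;               /* number of symbols on the stack */
    int reduced;

    for (; *tokens != '\0'; tokens++) {
        if (strchr("i+-", *tokens) == NULL || top == STACK_MAX)
            return 0;             /* unknown token or stack full */
        stack[top++] = *tokens;   /* shift */

        do {                      /* reduce while a right-hand side is on top */
            reduced = 0;
            if (stack[top - 1] == 'i') {
                stack[top - 1] = 'E';               /* expr -> id */
                reduced = 1;
            } else if (top >= 3 && stack[top - 1] == 'E' &&
                       (stack[top - 2] == '+' || stack[top - 2] == '-') &&
                       stack[top - 3] == 'E') {
                top -= 2;                           /* expr -> expr op expr */
                stack[top - 1] = 'E';
                reduced = 1;
            }
        } while (reduced);
    }
    return top == 1 && stack[0] == 'E';
}
```

The token string `i+i-i` is accepted because the stack ends up holding a single `E`; `i+` and `+i` are rejected because symbols that cannot be reduced remain on the stack.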
Lex file

Yacc file

Linking Lex & Yacc
Thank You
Gharu.anand@gmail.com
