
Modern Compiler Design

T1 - Overview

Mooly Sagiv and Eran Yahav


School of Computer Science
Tel-Aviv University

yahave@post.tau.ac.il
http://www.cs.tau.ac.il/~yahave

1
Who
Eran Yahav
Schreiber Open-space
Tel: 6405358
yahave@post.tau.ac.il
Wednesday 14:00-16:00
http://www.cs.tau.ac.il/~yahave

2
What

Compiler:
txt (source text) -> Frontend (analysis) -> Semantic Representation -> Backend (synthesis) -> exe (executable code)

3
Say What?

Compiler:
txt (source text) -> Frontend (analysis) -> Semantic Representation -> Backend (synthesis) -> exe (executable code)

In more detail:
txt (Turkish Coffee) -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST -> Symbol Table etc. -> Inter. Rep. (IR) -> Code Gen. -> exe (executable code)

4
How
txt (Turkish Coffee) -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST -> Symbol Table etc. -> Inter. Rep. (IR) -> Code Gen. -> exe (executable code)

Tools: JLex (lexical analysis), javaCup (syntax analysis), Java, GC Lib, Assembler

5
How II

 Groups of 3-4 students


 Submit assignments on schedule
 In case of doubt – ask questions

6
Why?

 Useful techniques and algorithms


 Lexical analysis / parsing
 Semantic representation
 …
 Register allocation
 Understand programming languages better
 Understand internals of compilers

7
Today
txt (Turkish Coffee) -> Lexical Analysis -> Syntax Analysis (Parsing) -> AST -> Symbol Table etc. -> Inter. Rep. (IR) -> Code Gen. -> exe (executable code)

Goals:
•Understand project scope
•Learn how to use JLex

8
Turkish Coffee
 (extended) subset of Java
 Main features
 Object oriented
• Objects, virtual method calls, but no overloading
 Strongly typed
• Primitives for int, boolean, string
• Reference types, array types
 Dynamic allocation and Garbage Collection
• Heap allocation, automatic deallocation
 Run-time checks
• Null references, array bounds, negative array size
(Adapted with permission from Cornell course material by Radu Rugina)
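As a rough illustration of the three run-time checks listed above, the sketch below triggers each one in plain Java, where a failed check surfaces as an exception. Turkish Coffee has no exceptions, so how its runtime reports a failed check is not shown here. The class name RuntimeChecksDemo and the helper check are hypothetical.

```java
class RuntimeChecksDemo {
    // Runs the action and returns the simple name of the exception it
    // throws, or "none" if it completes normally.
    static String check(Runnable action) {
        try {
            action.run();
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Null reference check: reading a field through a null reference.
    static String nullReference() {
        int[] missing = null;
        return check(() -> { int n = missing.length; });
    }

    // Array bounds check: valid indices of a are 0..2.
    static String arrayBounds() {
        int[] a = new int[3];
        return check(() -> { a[3] = 1; });
    }

    // Negative array size check: the size is only known at run time.
    static String negativeSize() {
        int size = -1;
        return check(() -> { int[] b = new int[size]; });
    }

    public static void main(String[] args) {
        System.out.println(nullReference());
        System.out.println(arrayBounds());
        System.out.println(negativeSize());
    }
}
```

None of these errors can be ruled out by the type checker alone, which is why the compiler has to emit code that performs the checks while the program runs.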

9
Good News

 No “static” modifier
 No interfaces
 No method overloading
(but still allow overriding)
 No exceptions
 No packages
 No multiple files to handle
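To make the overloading/overriding distinction from the list above concrete, here is a minimal Java sketch. The classes Animal, Dog and OverrideDemo are hypothetical names chosen for illustration.

```java
class Animal {
    String speak() { return "..."; }
}

class Dog extends Animal {
    // Overriding: same name and parameter list as Animal.speak().
    // The call is resolved at run time from the object's dynamic type.
    @Override
    String speak() { return "Woof"; }

    // Overloading would be a second method with the same name but a
    // different parameter list, e.g. String speak(int times).
    // Turkish Coffee does not allow that, so none is declared here.
}

class OverrideDemo {
    static String demo() {
        Animal pet = new Dog();  // static type Animal, dynamic type Dog
        return pet.speak();      // virtual call, dispatches to Dog.speak()
    }
}
```

Without overloading, a method name identifies at most one method per class, which simplifies method lookup in the compiler; overriding still requires virtual dispatch.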

10
Better News

 Project to be implemented in Java


 Turkish Coffee language is still rich enough for
doing interesting things

11
Jumping into the water
/** Sort the array a[] in ascending order
 ** using an insertion sort.
 */
void sort(int a[], int size) {
    for (int i = 1; i < size; i++) {
        // a[0..i-1] is sorted
        // insert a[i] in the proper place
        int x = a[i];
        int j;
        for (j = i-1; j >= 0; --j) {
            if (a[j] <= x)
                break;
            a[j+1] = a[j];
        }
        // now a[0..j] are all <= x
        // and a[j+2..i] are > x
        a[j+1] = x;
    }
} // sort
12
Jumping into the water

class HelloTest {
    public static void main(String[] args) {
        Hello greeter = new Hello();
        greeter.speak();
    }
}
class Hello {
    void speak() {
        System.out.println("I know Java, really!");
    }
}

(see http://www.cs.wisc.edu/~solomon/cs537/java-tutorial.html)

13
Jumping into the water
class Pair { int x, y; }

C++                     Java
Pair origin;            Pair origin = new Pair();
Pair *p, *q, *r;        Pair p, q, r;
origin.x = 0;           origin.x = 0;
p = new Pair;           p = new Pair();
p -> y = 5;             p.y = 5;
q = p;                  q = p;
r = &origin;            N/A

(see http://www.cs.wisc.edu/~solomon/cs537/java-tutorial.html)

14
Jumping into the water

// C++: manual deallocation can leave a dangling pointer
p = new Pair();
// ...
q = p;
// ...
delete p;
q->x = 5; // oops! q still points to the freed object
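For contrast, a minimal Java sketch of the same aliasing pattern. Java has no delete; the closest analogue is dropping a reference, and the garbage collector keeps the object alive as long as any reference to it remains. The class AliasDemo is a hypothetical name.

```java
class Pair { int x, y; }

class AliasDemo {
    static int demo() {
        Pair p = new Pair();
        Pair q = p;   // q and p now refer to the same object
        p.y = 5;      // visible through q as well
        p = null;     // drops one reference; the object is still reachable via q
        q.x = 5;      // safe: the object cannot be reclaimed while q refers to it
        return q.x + q.y;
    }
}
```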

15
Jumping into the water

 Download recent SDK


 Download JLex
 Download javaCup
 Use of Eclipse is recommended
 Java
 On-line tutorial
 Books (e.g., Java Tutorial 519.836)
• Bruce Eckel’s Thinking in Java

16
Lexical Analysis with JLex

 JLex – lexical analyzer generator


 Input: spec file
 Output: a lexical analyzer
 A Java program
spec -> JLex -> .java -> javac -> lexical analyzer
text -> lexical analyzer -> tokens
17
JLex Spec File
User code
 Copied directly to Java file
 Possible source of javac errors down the road
%%
JLex directives
 Define macros, state names
DIGIT= [0-9]
LETTER= [a-zA-Z]
YYINITIAL (the initial state)
%%
Lexical analysis rules
 Optional state, regular expression, action
 How to break input to tokens
 Action when token matched

{LETTER}({LETTER}|{DIGIT})*

18
User Code
package TC.Lexer;

import TC.Error.*;
import TC.Parser.sym;


any lexer-helper Java code

19
JLex Directives

 Directives - control JLex internals


• %char
• %line
• %class class-name
• %cup
 State definitions
 %state state-name
 Macro definitions
 Macro-name = regex

20
Regular Expressions

$        end of a line
.        (dot) any character except the newline
"..."    quoted text is matched literally (metacharacters lose their special meaning)
{name}   macro expansion
*        zero or more repetitions
+        one or more repetitions
?        zero or one repetition
(...)    grouping within regular expressions
[...]    class of characters: any one character enclosed in the brackets
a-b      range of characters (inside a class)
[^...]   negated class: any one character not enclosed in the brackets

21
Example Macros

ALPHA=[A-Za-z_]
DIGIT=[0-9]
ALPHA_NUMERIC={ALPHA}|{DIGIT}
IDENT={ALPHA}({ALPHA_NUMERIC})*
NUMBER=({DIGIT})+
WHITE_SPACE=([\ \n\r\t\f])+
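The macros above compose by textual substitution. The sketch below mimics that with plain string concatenation and java.util.regex, as an analogy only: JLex has its own regular-expression syntax and does not use java.util.regex, and the class MacroDemo and its matches helper are hypothetical.

```java
import java.util.regex.Pattern;

class MacroDemo {
    // Each macro is spliced textually into the macros that use it,
    // mirroring JLex's {name} expansion.
    static final String ALPHA = "[A-Za-z_]";
    static final String DIGIT = "[0-9]";
    static final String ALPHA_NUMERIC = "(" + ALPHA + "|" + DIGIT + ")";
    static final String IDENT = ALPHA + ALPHA_NUMERIC + "*";
    static final String NUMBER = DIGIT + "+";
    static final String WHITE_SPACE = "[ \\n\\r\\t\\f]+";

    // True if the whole text is matched by the expanded macro.
    static boolean matches(String macro, String text) {
        return Pattern.matches(macro, text);
    }

    public static void main(String[] args) {
        System.out.println(matches(IDENT, "x_1"));
        System.out.println(matches(IDENT, "1x"));
        System.out.println(matches(NUMBER, "042"));
    }
}
```

The parentheses around ALPHA_NUMERIC matter in this string-based version: without them the alternation would bind incorrectly once the macro is spliced into IDENT.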

22
Lexical Analysis Rules

 Rule structure
 [states] regexp { action }

 Longest match wins: the rule matching the longest string has priority

 More than one match of the same length: priority goes to the rule appearing first!
 Important: the rules given in a JLex specification should match all possible input!
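The two priority rules above can be sketched with a tiny hand-written matcher. This is an illustration of the disambiguation policy, not of how JLex is implemented (JLex generates a table-driven automaton). The class RulePriorityDemo and the three rules are hypothetical; each pattern is greedy, so lookingAt yields the longest prefix that rule can match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RulePriorityDemo {
    // Rules in specification order: the keyword rule comes before IDENT.
    static final String[] NAMES = { "IF", "IDENT", "NUMBER" };
    static final Pattern[] RULES = {
        Pattern.compile("if"),
        Pattern.compile("[A-Za-z_][A-Za-z_0-9]*"),
        Pattern.compile("[0-9]+"),
    };

    // Returns "NAME:lexeme" for the token at the start of input,
    // or "ERROR" if no rule matches there.
    static String firstToken(String input) {
        int bestLength = 0;
        String bestName = "ERROR";
        for (int i = 0; i < RULES.length; i++) {
            Matcher m = RULES[i].matcher(input);
            // Strictly greater: on a tie the earlier rule is kept.
            if (m.lookingAt() && m.end() > bestLength) {
                bestLength = m.end();
                bestName = NAMES[i];
            }
        }
        return bestLength == 0 ? "ERROR" : bestName + ":" + input.substring(0, bestLength);
    }
}
```

With these rules, the input "if" matches both IF and IDENT at length 2, so the earlier IF rule wins, while "iffy" goes to IDENT because that match is longer. An input such as "+" matches no rule, which is the situation the last bullet warns about.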

23
Action Body

 Java code
 Can use special methods and vars
 yytext()
 yyline,yychar (when enabled)
 Lexer state transition
 yybegin(state-name)
 YYINITIAL

24
More on Lexer States

 Tokenize differently according to context

// this condition checks if x > y


if (x>y) {…
}

Example
“if” is a keyword token when in program text
“if” is part of comment text when inside a comment
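A minimal hand-written sketch of the same idea: a scanner with two states that treats "if" as a keyword only outside // comments. The state names follow the slides; the class LexerStateDemo and the method countIfKeywords are hypothetical, and a JLex-generated lexer would express this with state-prefixed rules rather than explicit branching.

```java
class LexerStateDemo {
    enum State { YYINITIAL, COMMENTS }

    // Counts occurrences of the keyword "if" that appear outside // comments.
    static int countIfKeywords(String source) {
        State state = State.YYINITIAL;
        int count = 0;
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (state == State.COMMENTS) {
                // Inside a comment everything is skipped until end of line.
                if (c == '\n') state = State.YYINITIAL;
                i++;
            } else if (source.startsWith("//", i)) {
                state = State.COMMENTS;
                i += 2;
            } else if (Character.isLetter(c) || c == '_') {
                // Scan a whole identifier-like word (longest match).
                int start = i;
                while (i < source.length()
                        && (Character.isLetterOrDigit(source.charAt(i)) || source.charAt(i) == '_')) {
                    i++;
                }
                if (source.substring(start, i).equals("if")) count++;
            } else {
                i++;  // any other character is ignored in this sketch
            }
        }
        return count;
    }
}
```

For the example on this slide, the "if" inside the comment is skipped in the COMMENTS state and only the "if" in the program text is counted.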

25
<YYINITIAL> {NUMBER} {
return new Symbol(sym.NUMBER, new Token(yytext(), yyline,yychar));
}
<YYINITIAL> {WHITE_SPACE} { }

<YYINITIAL> "+" {
return new Symbol(sym.PLUS, new Token(yytext(), yyline, yychar));
}
<YYINITIAL> "-" {
return new Symbol(sym.MINUS, new Token(yytext(), yyline, yychar));
}
<YYINITIAL> "*" {
return new Symbol(sym.TIMES, new Token(yytext(), yyline, yychar));
}

...

<YYINITIAL> "//" { yybegin(COMMENTS); }


<COMMENTS> [^\n] { }
<COMMENTS> [\n] { yybegin(YYINITIAL); }
<YYINITIAL> . { return new Symbol(sym.error, null); }

26
Putting it all together –
count number of lines
File: lineCount
import java_cup.runtime.*;
%%
%cup
%{
private int lineCounter = 0;
%}

%eofval{
System.out.println("line number=" + lineCounter);
return new Symbol(sym.EOF);
%eofval}

NEWLINE=\n
%%
<YYINITIAL>{NEWLINE} {
lineCounter++;
}
<YYINITIAL>[^{NEWLINE}] { }

27
Putting it all together –
count number of lines

lineCount (spec) -> java JLex.Main lineCount -> lineCount.java
lineCount.java, Main.java, sym.java -> javac *.java -> lexical analyzer
text -> lexical analyzer -> tokens

JLex and javaCup must be in the CLASSPATH


28
Running the Lexer
import java.io.*;
import java_cup.runtime.*;

public class Main {
    public static void main(String[] args) {
        Symbol currToken;
        try {
            FileReader txtFile = new FileReader(args[0]);
            Yylex scanner = new Yylex(txtFile);
            do {
                currToken = scanner.next_token();
                // do something with currToken
            } while (currToken.sym != sym.EOF);
        } catch (Exception e) {
            // keep the original exception as the cause so the failure can be diagnosed
            throw new RuntimeException("IO Error (brutal exit)", e);
        }
    }
}

(Just for testing Lexer as stand-alone program)

29


sym.java File

public class sym {


public static final int EOF = 0;

}

• Defines the symbol constant ids
• Tells the parser which token the lexer returned
• The actual values don't matter, as long as they are distinct
• In the future it will be generated by javaCup
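Until javaCup generates the file, a hand-written stand-in could look like the sketch below. The constant names are the ones used in the lexer rules earlier in these slides; the numeric values are arbitrary placeholders, except that javaCup-generated files use 0 for EOF and 1 for error. The class is declared without public here so the sketch compiles from any file name; in the project it would be a public class in its own sym.java.

```java
// Hand-written stand-in for the file javaCup will generate later.
final class sym {
    public static final int EOF = 0;
    public static final int error = 1;
    // Placeholder values: any distinct integers would do.
    public static final int NUMBER = 2;
    public static final int PLUS = 3;
    public static final int MINUS = 4;
    public static final int TIMES = 5;

    private sym() { }  // constants only, never instantiated
}
```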

30
Common Pitfalls

 Classpath
 Path to executable
 Define environment variables
 JAVA_HOME
 CLASSPATH
 Note the use of . (dot) as part of package
name / directory structure
 e.g., JLex.Main

31
Assignment 1

 class Token
 At least - id, value, line
 Should extend java_cup.runtime.Symbol
 Numeric token Ids in sym.java
• Will be later generated by javaCup
 class Compiler
 class LexicalError

 Don’t forget to regenerate the Lexer and recompile the Java code whenever you change the spec

 You need to download and install both JLex and javaCup

32
Token Class

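import java_cup.runtime.Symbol;

public class Token extends Symbol {
    public int id;
    public int line;

    // value lives in the inherited Symbol.value field
    public Token(int id, int line, Object value) {
        super(id, value);
        this.id = id;
        this.line = line;
    }
}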

33
(some of the) JLex directives to
be used

%cup (integrate with cup)


%line (count lines)
%char (count chars)
%type Token (pass type Token)
%class Lexer (gen. lexer class)

34
http://www.cs.tau.ac.il/~yahave

35
