
Table of Contents

1. Introduction
2. Chapter 01
i. Lecture (2014.08.28)
ii. Lecture (2014.09.02)
iii. Lecture (2014.09.04)
iv. Reading Questions
3. Chapter 02
i. Lecture (2014.09.09)
ii. Lecture (2014.09.11)
iii. Lecture (2014.09.16)
iv. Lecture (2014.09.18)
v. Lecture (2014.09.25)
vi. Reading Questions 2.1
vii. Reading Questions 2.2
4. Chapter 03
i. Lecture (2014.10.02)
ii. Lecture (2014.10.07)
5. Chapter 04
i. Lecture (2014.10.28)
6. Pop Quizzes
7. Exam review
i. Midterm Review

CS-3304 : Comparative Languages


This is my notebook for CS-3304 (Fall '14)

Chapter 01
Introduction to Programming Language Design

Lecture (2014.08.28)
Programming Language Groups
Object Oriented Languages (OO): C++, Java, Objective C
Functional Languages: Lisp (Only Recursion, No Iteration)
Recursion => additional Overhead (Runtime Stack)
Logic Languages: Prolog

2 Kinds of Languages
Imperative (Procedural): Java, C, C++
Concerned with HOW the computer does something
i.e. YOU GO
command based
Functional (Declarative):
Concerned with WHAT the computer is doing

Complete Language Traits


Sequencing: ( 1, 2, 3... )
Looping ( While this Do something )
Decisions ( if this Do something )

Programming Language Evolution


Fortran
Used for numerical computation
No if control statements
Used GOTOs :
~if something --> GOTO
Think assembly
COBOL
Good for report generation
Algol
Introduced if control statements
designed by a committee (TOO MUCH STUFF!)

Important Concepts in Software Engineering


Decomposition (Breaking down a problem into smaller pieces)
Information Hiding (hide implementation and details)
Encapsulation vs. Information Hiding
Mutually Exclusive???
Encapsulation:
Think structs of records. An object that holds multiple data members within it. Holding different data within
one object.
Information Hiding:
The user does not need to know all available information, just be able to directly access relevant data within
through pre-determined methods or access points.

Evaluating a Language

4 Main Criteria (and 1 extra):


Readability - simplicity, orthogonality, control statements, data types & structs
Writability - simplicity, orthogonality, abstraction, expressivity
Reliability - type checking, exception handling, aliasing, readability & writability
Cost - training, creation, compilation, execution, maintenance
Portability (the +1 extra criterion)
Orthogonality (defn): two separate ideas that can be used together
ex. Ternary Operator :

cond ? T : F

Writeable but not very Readable


Aliasing (defn): something with more than one name
ex. Pass by Reference

func A( ) {
    var x;
    ...
    B( x );        // pass by reference
}

func B( c ) {
    // c === x : the var c points to the var x, not to its value
}
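A minimal C sketch of the same idea, using a pointer parameter (C passes by value, so the address is passed explicitly; function names A and B just mirror the pseudocode above):

#include <stdio.h>

/* B receives the address of A's local variable, so inside B
 * the name c and the name x refer to the same storage. */
void B(int *c) {
    *c = 42;              /* writes through the alias */
}

void A(void) {
    int x = 0;
    B(&x);                /* pass by reference: c aliases x */
    printf("%d\n", x);    /* prints 42 */
}

int main(void) {
    A();
    return 0;
}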

Language Design Trade-Offs


Flexibility vs. Safety
dynamic vs. static type binding
Readability vs. Writability
ex: cond ? T : F
Reliability vs. Cost of Compilation
dynamic vs. static type checking
Reliability vs. Cost of Execution

Lecture (2014.09.02)
Orthogonality - two different features can be used in any combination, the combinations all make sense, and the
meaning of a given feature is consistent, regardless of the other features with which it is combined.
Readability vs. Writability
ie. the ternary operator

cond ? T : F

Arrays and Structs : we can use arrays inside a struct and we can have an array of structs. Both can exist
independently of each other.

4 Main Criteria of a Programming Language


Readability - can humans easily read the code and understand it?
Writeability - can humans easily write the code?
Reliability - static type checking for type compatibility at compile time *
Cost - design, compilation, training, poor reliability

Trade Offs
Writeability vs. Readability: the ternary operator

cond ? T : F

Flexibility vs. Safety: Dynamic Types vs. Static Types
Reliability vs. Cost of Execution: C arrays with no OOB checking (fast execution, low reliability); see the C sketch below
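A minimal C illustration of the missing bounds check (the out-of-range read is undefined behavior, shown deliberately):

#include <stdio.h>

int main(void) {
    int a[4] = {1, 2, 3, 4};

    /* C performs no bounds checking: this compiles and runs,
     * but reads past the end of the array (undefined behavior).
     * The language trades reliability for execution speed --
     * no per-access bounds test is emitted. */
    printf("%d\n", a[10]);
    return 0;
}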

Primary Influences on Language Design


Computer Architecture
Imperative Languages (Von Neumann Architecture)
fetch -> decode -> execute
memory -> bus -> CPU
IO -> Memory -> CPU -> Output
1950s - 1960s:
machine efficiency was paramount
Fortran: the first high-level programming language; not very efficient
best for numerical calculations
no existing if statement structure -> used GOTOs
1960s: people --> readability
1970s: data abstraction, higher level thinking
1980s: OO languages
ex. Smalltalk
Everything is an object!
Lots of overhead as a result *

Language Categories
Procedural/Imperative
DO THIS!
Command based
HOW the computer is to do it
Functional/Applicative/Declarative
Composition
f(g(x)) : sending the result of g(x) to f as input (see the C sketch after this list)
WHAT the computer is to do
Logic
ex. Prolog
Establishes rules and facts and then reasons based on that
Problem Oriented/Application Specific
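A C rendering of the composition idea (in a functional language composition is a first-class operation, but the data flow is the same):

#include <stdio.h>

int g(int x) { return x * 2; }
int f(int x) { return x + 1; }

int main(void) {
    /* composition: the result of g(x) is sent to f as input */
    printf("%d\n", f(g(10)));   /* prints 21 */
    return 0;
}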

3G/4G Languages

Complete Languages
sequencing
decision
iteration

Compilation vs. Interpretation


Compiler is NOT in memory during execution
Interpreter IS in memory during execution


Hybrid Implementations
source code is translated into intermediate code, which is fed to the virtual machine alongside the input during execution


Library of Routines and Linking


Preprocessor
#include directives are pre-processed before compilation
conditional compilation (i.e. #ifdef)

Macros
filters parameters into separate code and serves up the result
allows for more flexible inputs than a function: more variety for parameters, less overhead
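A hedged C sketch contrasting a macro with a function (MAX is an illustrative name, not from lecture):

#include <stdio.h>

/* A macro is expanded textually by the preprocessor before
 * compilation: no call overhead, and it accepts any operand
 * types that support '>', unlike a function fixed to one type. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void) {
    printf("%d\n", MAX(3, 7));       /* works for ints    */
    printf("%f\n", MAX(2.5, 1.0));   /* and for doubles   */
    return 0;
}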

JIT (Just-In-Time) Compilation


delays compilation until the last possible moment
compiles pieces of code during runtime
allows for self generating code to be written, then compiled during runtime
portability (ie: Java JVM)
reuse compiled code for SPEED! No need to re-compile


Lecture (2014.09.04)
Concerning #include
Both the #include headers and the source program go through the preprocessor
#include pulls in .h header files

DLL (Dynamic Link Library)


links external calls dynamically during execution
saves space, as library code is not copied into every binary

The -static Option


links the complete code into the executable
ensures the resulting binary will run on any compatible machine, with no external library dependencies

The Compiler
a program is really just a sequence of characters
First 3 phases (Front End), final 3 phases (Back End)

6 Phases of Compilation
1. Scanner (Lexical Analysis)
input: character stream
output: token stream

Reads in characters and breaks them into tokens to pass to the parser
Main purpose is to simplify the task of the parser by decreasing the size
Optional tuning can remove whitespace and comments from the code and can also tag tokens with line and
column numbers for debugging later
2. Parser (Syntax Analysis)

input: tokens
output: parse tree

Organises tokens into a structure called a Parse Tree, representing higher level constructs
Each node in the tree is a construct, and its children are its constituents
3. Semantic Analysis and Intermediate Code Generation

input: parse tree
output: AST (abstract syntax tree) or other intermediate form


Discovers meaning in the program
Typically builds a symbol table structure to map identifiers to information known about them
Enforces static semantic rules (beyond the context-free grammar) using the symbol table, like identifier declaration, function call arguments, method returns etc...

4. Machine Independent Code Improvement (optional)


input: AST (abstract syntax tree) or other intermediate form
output: Modified Intermediate Form

Makes very high-level improvements that will optimize performance on any machine
5. Target Code Generation

input: Modified Intermediate Form
output: Target Language Code

Translates the intermediate form into the target language (usually assembler or binary)
6. Machine Specific Code Improvement (optional)

input: Target Language Code
output: Modified Target Language Code

The compiler can make very specific low-level code improvements based on the architecture of the machine to improve performance


DFA (Deterministic Finite Automaton)


used to recognize tokens

Reading 01
1. What is the difference between machine language and assembly language?
Machine Language is a series of bits that directly controls a processor, where as assembly language is a collection
of mnemonics that can be better understood by humans and can be translated into machine code.

2. In what way(s) are high-level languages an improvement on assembly language? In what circumstances does it
still make sense to program in assembler?
High level languages can be more easily read and understood by humans where numerical calculations are more
similar to mathematical formulae. High level languages are machine independent.
It would make sense to program in assembler when working very close to the hardware, for instance with embedded
systems where memory and hardware is limited.

3. Why are there so many programming languages?


Evolution: things change and we're constantly finding better ways to do things
Special Purposes: many languages are designed for very specific use
Personal Preferences: different people like different things
4. What makes a programming language successful?
Expressive Power: the ability for a programmer to write clear, concise and maintainable code
Ease of Use for Novices: a low learning curve, quick to pick up
Ease of Implementation: simple, versatile, portable and free
Standardization: international standards and standard libraries
Open Source: freely available and open languages lead to high adoption
Excellent Compilers: good compilers generate fast code quickly
Economics, Patronage and Inertia: backing of large, established companies helps propel a language's use

5. Name three languages in each of the following categories: von Neumann, functional, object-oriented. Name two
logic languages. Name two widely used concurrent languages.
Von Neumann (follows the concept of stored program computing):
C
Ada
Fortran
Functional:
Lisp/Scheme
ML
Haskell
Object-Oriented:
Java

C++
Objective C
Eiffel
Logic:
Prolog
SQL
XSLT
Excel
Concurrent:
Ada
Erlang
Java
Rust
6. What distinguishes declarative languages from imperative languages?
Declarative languages focus more on WHAT the computer is doing, more so from the programmer's point of view,
whereas imperative languages focus on HOW the computer should do it. The distinction between the two
classifications is still very fuzzy.

7. What organization spearheaded the development of Ada?


Ada, Cobol -> US DOD
C -> Bell Labs
PL/I -> IBM
C# -> Microsoft
8. What is generally considered the first high-level programming language?
Fortran is widely considered to be the first high-level programming language, then Lisp and Algol

9. What was the first functional language?


Taking their inspiration from Lambda Calculus, a computational model based on the recursive definition of functions.
Lisp was one of the first, then ML and Haskell.

10. Why aren't concurrent languages listed as a category in Figure 1.1?


Most concurrent programs are written using special library packages or compilers in conjunction with a sequential
language such as Fortran or C.

11. Explain the distinction between interpretation and compilation. What are the comparative advantages and
disadvantages of the two approaches?
Interpretation is done during program execution, reading the program line by line whereas compilation will translate
the high level source code into a target program (generally assembly language) to be run later by the OS.
Interpretation can provide better diagnostics and error messages as well as variable names depending on the input
because the code is analyzed during runtime. Interpretation can also delay decisions about program implementation
until runtime, known as Late Binding. By comparison compilation generally provides better performance as
decisions can be made prior to program execution, whereas interpreted code will need to make various decisions
during runtime.

12. Is Java compiled or interpreted (or both)? How do you know?


Java is technically both compiled and interpreted. The original source code is compiled into bytecode that is then
interpreted by the JVM (Java Virtual Machine) during execution.

13. What is the difference between a compiler and a preprocessor?


A preprocessor removes whitespace and comments and generally cleans up the code before compilation; a compiler,
by contrast, performs a thorough analysis and nontrivial transformation of the program.

14. What was the intermediate form employed by the original AT&T C++ compiler?
The AT&T compiler originally generated C code as an intermediate.

15. What is P-code?


Intermediary code generated by Pascal compilers; a stack-based code similar to Java bytecode, used for bootstrapping.

16. What is bootstrapping?


Bootstrapping is the process of using a simple implementation of something to build progressively more sophisticated
versions. For example writing a Pascal compiler that generates P-code, and running that compiler through the
Pascal compiler to generate a machine language version of the compiler.

17. What is a just-in-time compiler?


A Just-In-Time compiler will delay compilation until runtime where certain lines are compiled just before they are run.
Example being Java which will employ just-in-time compilation translating Java bytecode into machine language
immediately before execution.

18. Name two languages in which a program can write new pieces of itself on the fly.
Both Lisp and Prolog programs can write new pieces of themselves during runtime execution, translating newly
generated code into machine language to optimize the code.

19. Briefly describe three unconventional compilers whose purpose is not to prepare a high-level program for
execution on a microprocessor.
Compilers for text formatting languages like TeX and troff translate high-level document descriptions into commands for
printers and phototypesetters.
Query language compilers for languages like SQL will translate code into primitive operations on files.
Compilers for logic-level circuit specifications into photographic masks for computer chips.

20. List six kinds of tools that commonly support the work of a compiler within a larger programming environment.
Text Editors
Pretty-Printers

Style Checkers
Configuration Management Tools (track dependencies)
Perusal Tools
Profilers
21. Explain how an IDE differs from a collection of command-line tools.
When errors occur in the code, breakpoints can be set without explicitly invoking a debugger, and the line where the
error occurred may be highlighted in the IDE, allowing the programmer to make changes to the code without explicitly
invoking an editor. Rerunning the program can be done without explicitly re-building or invoking the compiler. Basically,
the functionality of many different command-line utilities is integrated into one environment, without having to run
each of them separately.

22. List the principal phases of compilation, and describe the work performed by each.
Scanner (Lexical Analysis)
Input (Character Stream)
Output (Token Stream)
Parser (Syntax Analysis)
Input (Token Stream)
Output (Parse Tree)
Semantic Analysis and Intermediate Code Generation
Input (Parse Tree)
Output (Abstract Syntax Tree or other intermediate form)
Machine-Independent Code Improvement (optional)
Input (Abstract Syntax Tree or other intermediate form)
Output (Modified Intermediate Form)
Target Code Generation
Input (Modified Intermediate Form)
Output (Target Language i.e. Assembler Language)
Machine-Specific Code Improvement (optional)
Input (Target Language i.e. Assembler Language)
Output (Modified Target Language)

23. Describe the form in which a program is passed from the scanner to the parser; from the parser to the
semantic analyzer; from the semantic analyzer to the intermediate code generator.
Scanner to Parser - the Scanner simplifies the input for the parser by tokenization, removing whitespace etc.
Parser to Semantic Analyzer - the Parser generates the Parse Tree, representing higher-level constructs, from which
the Semantic Analyzer will discover the meaning of the code. The Semantic Analyzer will typically build and
maintain the Symbol Table for mapping identifiers, enforcing static semantic rules etc.
Semantic Analyzer to the Intermediate Code Generator - the Semantic Analyzer passes some form of intermediate
or syntax tree from the Front End to the Back End; the Intermediate Code Generator traverses it, thereby
generating some intermediate form of code.

24. What distinguishes the front end of a compiler from the back end?
The main difference between the front end and back end of the compiler has to do with the intermediary code, which
is the form of code accepted by the back end. This allows for multiple systems to share a back end as different
systems could produce the same intermediary code.
The front end serves to determine the meaning of the source program, where the back end serves to construct the
equivalent target program.
The front end takes in source code as input, outputting intermediate code, whereas the back end takes in
intermediate code and outputs the target program.

25. What is the difference between a phase and a pass of compilation? Under what circumstances does it make
sense for a compiler to have multiple passes?
Phases serve to discover information about the program for use in later phases. A pass is a phase or set of phases
that is run prior to moving through the compilation any further, it is serialized from the rest of the compilation.
A compiler can have multiple passes so that the code space could be reused after one pass was complete, to
minimize memory usage.

26. What is the purpose of the compiler's symbol table?


The symbol table is a data structure that serves as a repository of information about identifiers that can be used
during compilation. For instance, it can be used by the semantic analyzer to enforce rules not caught by the
context-free grammar or the parse tree.
For example:
Every identifier is declared before it is used.
No identifier is used in an inappropriate context (calling an integer as a subroutine, adding a string to an integer,
referencing a field of the wrong type of struct, etc.)
Subroutine calls provide the correct number and types of arguments.
Labels on the arms of a switch statement are distinct constants.
Any function with a non-void return type returns a value explicitly.

27. What is the difference between static and dynamic semantics?


Semantic rules that can be checked at compile time are known as Static Semantics.
Whereas Dynamic Semantics are rules that must be checked at run time;
such as:
Variables are never used in an expression unless they have been given a value
Pointers are never dereferenced unless they refer to a valid object
Array subscript expressions lie within the bounds of the array
Arithmetic operations do not overflow

28. On modern machines, do assembly language programmers still tend to write better code than a good compiler
can? Why or why not?
Generally speaking, a good compiler can outperform a human's assembly code on modern machines. A good
compiler's code improver can choose when to keep variables in registers for extended periods during runtime, and
can schedule instructions for modern processors that execute several instructions simultaneously.

Chapter 02

Lecture (2014.09.09)
DFA (Deterministic Finite Automaton)
Deterministic - only one outcome
Finite - it has an end. A finite set of states
Automaton - it runs on its own, it 'automates'
can be implemented with a "transition table"

State \ Event | alpha | numeric | space | End

Conway Diagrams: circles and arrows in Pascal manual


DFA for a variable name:

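A table-driven sketch in C of a DFA that accepts identifiers of the form alpha (alpha | digit)*; the state and event encodings here are my own, not from lecture:

#include <ctype.h>
#include <stdio.h>

/* States: 0 = start, 1 = in identifier (accepting), 2 = dead.
 * Events: 0 = alpha, 1 = digit, 2 = anything else. */
static const int next_state[3][3] = {
    /*            alpha  digit  other */
    /* start */ {   1,     2,     2  },
    /* ident */ {   1,     1,     2  },
    /* dead  */ {   2,     2,     2  },
};

static int event_of(char ch) {
    if (isalpha((unsigned char)ch)) return 0;
    if (isdigit((unsigned char)ch)) return 1;
    return 2;
}

int is_identifier(const char *s) {
    int state = 0;
    for (size_t i = 0; s[i]; i++)            /* one table lookup per char */
        state = next_state[state][event_of(s[i])];
    return state == 1;                       /* accepting state */
}

int main(void) {
    printf("%d %d\n", is_identifier("ab2d"), is_identifier("2abd"));  /* 1 0 */
    return 0;
}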

Regular Grammar
ex: ab2d ===> alpha+ (alpha|int)*
start ===> alpha Y
===> alpha alpha Y
===> alpha alpha int Y
===> alpha alpha int alpha Y
===> alpha alpha int alpha END*

Parser
tokens from the scanner, are they in the correct order?
do the tokens make logical sense?
discovers the structure of the program
uses a PDA (Push Down Automaton) for recognizing valid/invalid structure in the program
the PDA is characterized by a context-free grammar/language

similar to a stack: *think push-pop for parenthesis counting

while (tokenStream.hasNext()) {
    token = tokenStream.getNext();
    if (token != ')')
        pda.push(token);
    else                     // token == ')'
        tos = pda.pop();     // match the opening parenthesis
}
return pda.isEmpty();        // valid iff nothing is left unmatched

Parse Tree and Context-Free Grammar


iteration statement ---> while (expression) statement
statement ---> compound statement
          ---> x
compound statement ---> begin statementList end

Deriving a Terminal in a Production to Non-Terminals


iteration statement ---> while (expression) statement
is ---> w (e) s
   ---> w (e) B { statementList } E
   ---> w (e) B { sl; s } E
   ---> w (e) B { s; s } E
   ---> w (e) B { s; x } E
   ---> w (e) B { x; x } E

Constant Folding
example: stored constant expression

let a : int = 5
m * a + 10 / b

In this case the compiler knows to store a + 10 as 15, therefore

m * a + 10 / b

is stored as

m * 15 / b

example: reduction in strength

a ** 2 becomes a * a

Machine Specific Optimizations


during target code generation
x = y + z ---- STR R1, addr(x)
m = x + 2 ---- LDR R1, addr(x)   <- redundant: x is already in R1, so this load can be removed
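A small C sketch of both ideas (m and b are placeholder variables; the "folded" line shows what the compiler effectively emits, following C precedence rather than the board shorthand):

#include <stdio.h>

int main(void) {
    const int a = 5;
    int m = 3, b = 2;

    /* As written by the programmer: */
    int r1 = m * a + 10 / b;

    /* After constant folding: a is known to be 5 at compile time,
     * so the compiler can emit the expression with 5 substituted. */
    int r2 = m * 5 + 10 / b;

    /* Reduction in strength: replace an expensive operation with a
     * cheaper equivalent, e.g. a ** 2 becomes a * a. */
    int s = a * a;

    printf("%d %d %d\n", r1, r2, s);   /* 20 20 25 */
    return 0;
}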

Lecture (2014.09.11)
BNF (Backus-Naur Form)
used to express grammars
productions of non-terminal -> terminal values to derive into sentential form

Programming History
1950's
Fortran : Formula Translation
the first compiled, high-level language
optimizations introduced for different machines/code
both high and low level
Lisp: List Processes (Functional Language) 1958
dynamic scoping, not static scoping
scope determined at runtime, not compile time
recursion
garbage collection
runtime stack in use
dynamic memory management
1960's Elaboration and Analysis
Algol: the first "universal" language
block structure
call by value
stack-based arrays
COBOL: Common Business Oriented Language
Good for report generation
expressions in LHS : Too much orthogonality
APL
BASIC
Time Sharing Terminals
SNOBOL
Pattern matching language
character manipulation
PL/I
The Kitchen Sink Language
Too Much!
1970's
SIMULA 67
classes
inheritance
data abstraction
Pascal
C
systems
efficiency
Prolog
first logic language

AI oriented
Smalltalk
first OO Language
interpreted
sloooow
1980's
C++
OOP
Haskell and Miranda
both "functional" languages
1990's: Python, JavaScript and Perl
scripting languages
dynamic

Syntax and Lexical Analysis


Recognizer
compilers
grammars
BNF
using grammar to build/compile the program
Generators
derivations
parse trees
Ex: The Java Language is comprised of ALL potential Java programs and sentences
How to determine if a word is a palindrome?

algorithm for W#W.r (where W.r = W in reversed order):


* push all chars in W onto a stack until # is reached
* then pop the stack and compare char-by-char to W.r
* i.e. reverse W and compare it to W.r
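A minimal C sketch of this stack algorithm (fixed-size stack; assumes the input fits):

#include <stdio.h>

/* Recognize W#Wr: push W onto a stack, then on seeing '#'
 * pop and compare against the rest of the input (Wr).
 * The '#' marker is what lets a deterministic PDA do this. */
int is_w_hash_wr(const char *s) {
    char stack[256];
    int top = 0;
    size_t i = 0;

    while (s[i] && s[i] != '#')       /* push all of W */
        stack[top++] = s[i++];
    if (s[i] != '#') return 0;        /* no marker found */
    i++;                              /* skip '#' */

    while (s[i] && top > 0)           /* pop and compare to Wr */
        if (stack[--top] != s[i++]) return 0;

    return s[i] == '\0' && top == 0;  /* both sides exhausted */
}

int main(void) {
    printf("%d %d\n", is_w_hash_wr("abc#cba"), is_w_hash_wr("abc#abc"));  /* 1 0 */
    return 0;
}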

DFA (Deterministic Finite Automaton)


regular language
PDA (Push Down Automaton)
DCFL: deterministic context-free languages
(CFL (DCFL))

Lecture (2014.09.16)
Basic Definitions
Syntax - the form or structure of the expressions, statements and program units
Semantics - the meaning of the expressions, statements and program units
Sentence - a sentence is a string of characters over some alphabet
Language - a set of sentences. i.e. Java is the set of all programs that can be written in Java
Lexeme - the lowest level syntactical unit of a language
Token - a form of lexeme (i.e. an identifier)

Grammar
Context-Free Grammar G : G = {N, T, S, P}
N = Non-Terminals
identifiers, statements, keywords
T = Terminals
the "alphabet" of the grammar
S = Start Symbol
(a non-terminal)
P = Productions
or a set of productions
A set S of Terminals (Alphabet Characters): S = {a, b, c}
using the Kleene Plus:
S+ = {a, b, c, aa, bb, cc, ab, ac, ba, bc, ca, cb, aaa, ...}
Tokens are the alphabet of the Parser (Context-Free Grammar)
a Grammar G contains a finite, non-empty set of rules

Recognizers and Generators


Recognizers
Ex. Compiler
will tell you whether a given sentence is in the language
Generators
Ex. BNF, Grammar
will generate an arbitrary sentence in the language

Grammars
definition: A Grammar is a language generator meant to describe the syntax of natural languages
4-Level Hierarchy
lvl 0: Regular
lvl 1: Context-Free
lvl 2: Context Sensitive

lvl 3: Phrase Structured

(PSL (CS (CFL (Reg))))

Examples (note: CAPITAL => Non-Terminal, lowercase => Terminal)


Regular Language

A -> Ab
-> c

Context-Free Language

A -> BCd
-> Dm
-> x

Context Sensitive

cAb -> cDmb

only in THIS context can A -> Dm


when flanked by 'c' and 'b'
CS can have a terminal on the LHS
Phrase Structured
there are no restrictions on productions
3-Levels of Context-Free Languages (CFL)
lvl 0: Deterministic CFL (DCFL)
lvl 1: Context-Free Language (CFL)
lvl 2: Non-Deterministic CFL (NCFL)

BNF (Backus-Naur Form)


derive a production into sentential form using recursion
ex. Left Recursive (recursive elements are kept LEFT)
A => Ab
A => Abb
A => Abbb

ex. Right Recursive (recursive elements are kept RIGHT)


A => bA
A => bbA
A => bbbA

BNF is only concerned with syntax (form & structure), no semantics (meaning)
the Parser can easily be based directly on the BNF
the BNF based Parser is easy to maintain
Abstractions: used to represent classes of syntactic structures, also called Non-Terminal symbols

ex: <while_stmt> -> while <logic_expr> do <stmt>

abstraction: non-terminal symbol can have more than 1 RHS

Lecture (2014.09.18)
Ambiguity
ambiguity can be based on associativity or precedence
subtraction is NOT associative
stratification forces precedence/associativity
ex: (9^(5^(4))) : right to left
a grammar is ambiguous if two distinct parse trees / LL or LR derivations produce the same sentence

Derivation Sequences
ex. Grammar Rules: Productions

<id_list> -> id
          -> id, <id_list>

ex. Derivation Sequence

<id_list> => id, <id_list>             (sentential form)
          => id, id, <id_list>         (sentential form)
          => id, id, id, <id_list>     (sentential form)
          => id, id, id, id            (sentence)

production rules use -> whereas derivations use =>
ONLY non-terminals appear on the LHS

Parsing
Top-Down Parser: works by deriving a left-most derivation

ex. Left Most Derivation


<N> => <N>, <N>
    => T, <N>
    => T, T, <N>

Bottom-Up Parser: works back from the reverse of a right-most derivation

ex. Right Most Derivation


<N> => <N>, <N>
    => <N>, T
    => <N>, T, T

Parse Trees
Grammar G:

E -> E - E
E -> id

Left Most Derivation

E => E - E
  => id - E
  => id - E - E
  => id - id - E
  => id - id - id

generated parse tree:

       E
     / | \
    E  -  E
    |    /|\
   id   E - E
        |   |
       id   id

Right Most Derivation

E => E - E
=> E - id
=> E - E - id
=> E - id - id
=> id - id - id

generated parse tree:

       E
     / | \
    E  -  E
   /|\    |
  E - E   id
  |   |
 id   id

Lecture (2014.09.25)
The Parser Problem
Types of Parser
Top-Down Parser
(LL) Left-to-right, Left-most derivation.
produces the parse tree from the root
order of a left-most derivation
table-driven implementation
cannot use a left recursive grammar
Bottom-Up Parser
LR: Left-to-right, Right-most derivation.
produces the parse tree from the leaves
reverse order of a right-most generation
Recursive Descent Parser
a top-down parser
(LL) Left-to-right, Left-most derivation.
coded implementation

Bottom Up Parse (LR)


Right-Most Derivation

E => E + T
=> E + T * F
=> E + T * id
=> E + F * id
=> E + id * id
=> T + id * id
=> F + id * id
=> id + id * id

Reversal of the Right-Most Derivation --> LR Parse

Stack        | Input         | Rule
-------------+---------------+------------
(empty)      | id + id * id  |
id           | + id * id     | <= (shift)
F            | + id * id     | F -> id
T            | + id * id     | T -> F
E            | + id * id     | E -> T
E +          | id * id       | <= (shift)
E + id       | * id          | <= (shift)
E + F        | * id          | F -> id
E + T        | * id          | T -> F
E + T *      | id            | <= (shift)
E + T * id   |               | <= (shift)
E + T * F    |               | F -> id
E + T        |               | T -> T * F
E            |               | E -> E + T

Grammar transformations

A  -> A α
   -> β

becomes

A  -> β A'
A' -> α A'
   -> ε

Example

Grammar
E -> E + T
  -> T

transformation:

E -> E + T  matches  A -> A α  with  E == A,  + T == α,  T == β

A  -> β A'   ==   E  -> T E'
A' -> α A'   ==   E' -> + T E'
   -> ε      ==   E' -> ε

Top-Down Parse
Grammar

E -> E + T    ==>   E  -> T E'
  -> T        ==>   E' -> + T E' | ε

T -> T * F    ==>   T  -> F T'
  -> F        ==>   T' -> * F T' | ε

F -> id       ==>   F  -> id

We see that this Grammar is Left Recursive, and must be transformed in order to do a Top-Down Parse
Leftmost Derivation

sentence: id + id * id, parsed with the transformed grammar (E -> T E')


Leftmost Derivation        Stack (Top-Down Parse)
E => T E'                  [ E' T
  => F T' E'               [ E' T' F
  => id T' E'              [ E' T'          (match id)
  => id + T E'             [ E' T +         (T' -> ε, E' -> + T E')
  => id + F T' E'          [ E' T' F        (match +)
  => id + id T' E'         [ E' T'          (match id)
  => id + id * F T' E'     [ E' T' F        (T' -> * F T')
  => id + id * id T' E'    [ E' T'          (match id)
  => id + id * id E'       [ E'             (T' -> ε)
  => id + id * id          [                (E' -> ε)
$$ (input exhausted, stack empty: accept)

Reading Questions 2.1


1. What is the difference between syntax and semantics?
Syntax refers to the structure of a language, and the rules that govern the way it is written. Semantics is the meaning
behind the language, or a sentence/program written in that language.
2. What are the three basic operations that can be used to build complex regular expressions from simpler
regular expressions?
i. Concatenation
ii. Alternation
iii. Kleene Closure
3. What additional operation (beyond the three of regular expressions) is provided in context-free grammars?
Recursion is added in a context-free grammar (language)
4. What is Backus-Naur form? When and why was it devised?
For the definition of the Algol-60 programming language, John Backus and Peter Naur developed the notation for
context-free grammars known as Backus-Naur form.
5. Name a language in which indentation affects program syntax.
Python uses indentation to denote a set of expressions like curly-braces would in Java or C
6. When discussing context-free languages, what is a derivation? What is a sentential form?
A derivation is the breaking down of a production in terms of non-terminals and terminals into a more specific form
based on the rules of the language. A sentential form is any string of terminals and non-terminals derivable from the
start symbol; once only terminal values remain, the sentential form is a sentence.
7. What is the difference between a right-most derivation and a left-most derivation?
A right-most derivation expands the right-most non-terminal at each step, while a left-most derivation expands the left-most non-terminal.
8. What does it mean for a context-free grammar to be ambiguous?
A context-free grammar is ambiguous when two or more distinct parse trees can be generated that have the same
frontier. That is to say that a single sentence can be constructed in two different ways.
9. What are associativity and precedence? Why are they significant in parse trees?
Associativity deals with how operations of the same precedence are handled in the absence of parentheses. Precedence is
an order of operations determining which operations will be performed before others in some pre-determined hierarchy.

Reading Questions 2.2


1. List the tasks performed by the typical scanner.
i. Read in the character stream
ii. Groups characters into tokens
iii. Remove whitespace and comments
iv. Saves the text of identifiers, strings and numeric literals
v. Notes line and column numbers for different tokens for later debugging
2. What are the advantages of an automatically generated scanner, in comparison to a handwritten one? Why do
many commercial compilers use a handwritten scanner anyway?
Handwritten automata tend to use nested case statements, while most automatically generated automata use tables.
Tables can be difficult to write by hand, but easier than code to create from within a program. Handwritten scanners
using nested statements (switches) can be easier to debug.
3. Explain the difference between deterministic and non-deterministic finite automata. Why do we prefer the
deterministic variety for scanning?
4. Outline the constructions used to turn a set of regular expressions into a minimal DFA.
5. What is the longest possible token rule?
The scanner returns to the parser only when the next character cannot be used to continue the current token.
The scanner will always save the longest possible token and not separate out characters from an identifier or digits
from a numeric value. For example, "Foobar" is always "Foobar" and not "Foo" and "Bar". Just like 3.14 is not '3', '.'
and '14'
6. Why must a scanner sometimes peek at upcoming characters?
A scanner must sometimes look ahead to make decisions on what constitutes the end of a token and the beginning
of another based on subsequent characters.
7. What is the difference between a keyword and an identifier?
A keyword is a special reserved word that has specific meaning, such as "if" or "while". These keywords differ from
identifiers that can be used to signify different variables.
8. Why must a scanner save the text of tokens?
9. How does a scanner identify lexical errors? How does it respond?
If a piece of code does not comply with the syntactical rules of the language, it is said to be a lexical error. The
scanner can identify these through the use of the DFA. The scanner will generally hold onto these lexical errors and
continue scanning the remaining code for further errors that can all be displayed back to the programmer for
debugging purposes.
10. What is a pragma?

Chapter 03

Lecture (2014.10.02)
Top Down Parsing
Cannot be used when:
1. Left-Recursive (either directly or indirectly)

Direct
Indirect
E -> E + X
E -> X + T
X -> E * F

2. a. Not Pairwise-Disjoint
b. Common Prefixes

Direct
A -> bcD
A -> bxM

Indirect
A -> bX
X -> cD
-> xM

Recursive Descent Parsing


ex. the left-recursive rule E -> E + T cannot be coded directly:

E() {
    E();        // left recursion: E() calls itself before consuming any input
    if '+'
        T();
}
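A minimal recursive-descent sketch in C for the transformed grammar E -> T E', E' -> + T E' | ε, T -> id (single-letter ids; all names are illustrative):

#include <stdio.h>

/* One function per non-terminal; no left recursion remains,
 * so each call consumes input before recursing. */
static const char *input;
static int ok = 1;

static void T(void);
static void Eprime(void);

static void E(void) { T(); Eprime(); }

static void Eprime(void) {
    if (*input == '+') {     /* E' -> + T E' */
        input++;
        T();
        Eprime();
    }                        /* else E' -> epsilon */
}

static void T(void) {
    if (*input >= 'a' && *input <= 'z')
        input++;             /* T -> id (a single letter here) */
    else
        ok = 0;              /* syntax error */
}

int main(void) {
    input = "a+b+c";
    E();
    printf("%s\n", ok && *input == '\0' ? "accept" : "reject");
    return 0;
}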

Names, Scope and Binding


Dynamic vs. Static Type Checking
Dynamic
Pros: provides flexibility
Cons: costly (run-time type checking)
Static
Pros: safety & faster execution time
Cons: no flexibility (strict)
Scope
local method parameters memory locations change with each call to the method. These variables live on the
stack
Variable Attributes:
1. Address (in memory)
2. Type (int, double, String)
3. Scope (local, global, static, method)
4. Lifetime (how long is it valid?)
Binding

the association between an attribute and an entity


Binding Times
Static Binding :
language design
language implementation
compile time
load time
(i.e. loading static variables to memory locations)
Dynamic Binding
runtime
(i.e. method parameters, method variables on the stack)
Binding Time

{Language Design}   - (Early)   (High Safety, High Efficiency, Low Flexibility)
        |
        |
      (time)
        |
        |
{Runtime}           + (Late)    (Low Safety, Low Efficiency, High Flexibility)

Dynamic Type Checking - every access of a dynamically typed variable has to be checked for validity at
runtime (low efficiency)
Type Binding
Static Type Binding

var X: int = 22

Implicit Declaration

X := 1.2

Dynamic Type Binding


Coercion (widening hierarchy, narrowest to widest):
Bin -> Int -> Float -> Double
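For example, in C an int operand widens implicitly when the context demands it (a small sketch):

#include <stdio.h>

int main(void) {
    int i = 7;

    /* Widening coercion: the int is implicitly converted up the
     * hierarchy (int -> double) because the other operand is a
     * double; no cast is written by the programmer. */
    double d = i / 2.0;

    printf("%f\n", d);   /* 3.500000 */
    return 0;
}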

Strongly Typed Languages


Type errors are ALWAYS caught, whether the type is statically or dynamically bound
Fortran 77 : Equivalence - the ability to look at the value in one memory location two different ways

i.e. a char or an int: one value can be looked at either as a char or an int
(*within the range of ASCII or UNICODE chars*)
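A C union gives a rough analogue of EQUIVALENCE (a sketch; the char interpretation is byte-order dependent):

#include <stdio.h>

/* One memory location viewed two different ways: reading a member
 * other than the one last written reinterprets the same bytes. */
union cell {
    int  as_int;
    char as_char;
};

int main(void) {
    union cell c;
    c.as_int = 65;                            /* within ASCII range  */
    printf("%d %c\n", c.as_int, c.as_char);   /* 65 A (little-endian) */
    return 0;
}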

Lecture (2014.10.07)
Scope
Static scope rules specify that the referencing environment depends on the lexical nesting of program blocks in
which names are declared.
Dynamic scope rules specify that the referencing environment depends on the order in which declarations are
encountered at run time.

Binding
deep binding - the early binding of the referencing environment, at the time the routine is first passed as a parameter,
and then restoring that environment when the routine is finally called.
shallow binding - the late binding of the referencing environment of a subroutine that has been passed as a
parameter. The referencing environment of the passed routine is not created until the routine is actually called

Chapter 04: Semantics

Lecture (2014.10.28)
Semantics
meaning
characterized in terms of "annotation" through (decorating) a parse tree or syntax tree
Static Semantics
vs.
Dynamic Semantics
Attempts to describe the meaning of a statement or program
Two Common Approaches:
Operational Semantics
meaning in terms of its implementation on a real or virtual machine
"change of state" defines meaning
aka: translational semantics
i.e. high-level code -> assembly code
Advantages
May be simple, intuitive for small examples
Good if used informally
Disadvantages
No mathematical rigor
too complex for large problems
Denotational Semantics
Based on recursive function theory
Static Rules
enforced by the compiler at compile time
ex. Static Type Checking
Dynamic Rules
enforced at runtime (via compiler-generated checks)
ex. Array Bounds Checking

Attribute Grammars
Computational cousins w/ Semantic Functions
Serves to define the semantics of a program
Attribute Rules
best thought of as definitions, not assignments
not meant to be evaluated at a particular time / in a particular order
Evaluating Attribute Rules
process: Annotation or "decorating" the parse tree
value of the expression will be the val attribute of the root
Synthesized Attributes
calculated from the attributes below (Child Nodes)
Inherited Attributes
come from the "top-down"
defined (or computed) in terms of attributes at the parent and/or siblings of that node
contextual information flows from the top or the side
Example Attribute Grammar (snippet)

Grammar Rule:       E1 -> E2 + T
Semantic Function:  E1.val = E2.val + T.val

S-Attributed Grammar
uses only synthesized attributes
attribute flow is purely "bottom-up"

arguments to semantic functions are always attributes of symbols on the RHS of the current production
return value is placed in LHS attribute
L-Attributed Grammar
Use both Synthesized and Inherited attributes
Support attribute evaluation in a single, left-to-right pass over the input
Symbol table information is commonly passed by means of inherited attributes
Inherited attributes of the root of the parse tree can be used to represent external environment
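A small C sketch of bottom-up (synthesized) attribute evaluation over an expression tree, mirroring E1.val = E2.val + T.val from the snippet above; the node layout is my own:

#include <stdio.h>
#include <stdlib.h>

/* Purely S-attributed evaluation: each node's val attribute is
 * computed from its children's, so attribute flow is strictly
 * bottom-up ("decorating" the tree). */
typedef struct node {
    char op;                 /* '+', '*', or 0 for a leaf */
    int val;                 /* the synthesized attribute  */
    struct node *left, *right;
} node;

static int decorate(node *n) {
    if (n->op == 0) return n->val;        /* leaf: val is given   */
    int l = decorate(n->left);            /* synthesize children  */
    int r = decorate(n->right);
    n->val = (n->op == '+') ? l + r : l * r;
    return n->val;
}

int main(void) {
    /* tree for id(2) + id(3) * id(4) */
    node a = {0, 2, NULL, NULL}, b = {0, 3, NULL, NULL}, c = {0, 4, NULL, NULL};
    node mul = {'*', 0, &b, &c};
    node add = {'+', 0, &a, &mul};
    printf("%d\n", decorate(&add));       /* 14 */
    return 0;
}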

Pop Quiz 1
Q: Give an example of a trade off between reliability and cost of execution.
A:
Exception handling - You sacrifice cost of execution by enabling exception handling but increase program reliability,
particularly important in the case of embedded systems.
Recursive methods - Recursive methods may often be more reliable than iterative methods, but you will sometimes
sacrifice performance using a recursive method as there can be much more overhead involved in pushing multiple
method calls to the runtime stack.
Dynamic typing - Utilizing dynamic type checking will favor the cost of execution but sacrifice reliability if the program
runs into a type check error during runtime.
Array bounds - Bound checking on arrays. If the bounds are checked invalid memory accesses can be caught but
slow down execution time (it has to check the bounds for every access to the array).

Pop Quiz 2
Q: What is orthogonality in regard to programing languages?
A: Orthogonality is when two distinct features in a language can be used together in a way that enhances what each
individual feature can do.
ex: Using an array of structs in C. Structures are flexible in what they can contain, and arrays provide an easy way of
traversing constructs. When used together, an array of structures allows easy traversal of structures which may contain
any type of information.
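A small C sketch of the array-of-structs example (names are illustrative):

#include <stdio.h>

/* Orthogonality in practice: structs and arrays are independent
 * features, and combining them (an array of structs) behaves
 * exactly as each feature does alone. */
struct point { int x, y; };

int main(void) {
    struct point path[3] = {{0, 0}, {1, 2}, {3, 5}};

    for (int i = 0; i < 3; i++)          /* easy traversal ...        */
        printf("(%d, %d)\n", path[i].x, path[i].y);  /* ... of flexible records */
    return 0;
}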

Pop Quiz 3
Q: What does the compiler do on its first pass?
A: It tokenizes the source code.
ex: The first stage of a compiler is called the scanner. It takes a character stream (source code) and performs a lexical
analysis on it. When it's done, it outputs a token stream.

Pop Quiz 4
Q: What do we use to implement a DFA (Deterministic Finite-State Automaton)?
A: A transition table
ex: A transition table has all the states on one axis (vertical), and all the possible inputs on the other axis (horizontal).
Then each entry on each row holds the value for the next state taking into consideration the state it started with and the
input it obtained.

Pop Quiz 5
Q: For the sentences within the Language Grammar, what are they from the perspective of the software?
A: The sentences are every possible program that can be written in that language. In other words, a sentence is an entire

program.

Pop Quiz 6
Q: Using the Context Free Grammar given on slide 8, construct a valid sentence. That is, a sentence containing only
terminals.
A: a = b + c;

Pop Quiz 7
Q: What are the four components of a grammar?
A: Terminals (T), Non-terminals (N), a start symbol (S), and productions or "rules" (P)

G = {S, N, T, P}

Pop Quiz 8
Q: What is the distinctive characteristic of associativity in a grammar?
A: A recursive production rule in the grammar, where an LHS non-terminal is included in the RHS as well.

Ex: <expr> -> <expr> + const | const

Pop Quiz 9
Q: What characteristic of a grammar can keep it from being top down parsable?
A:
Left recursive (indirect and direct)
Common prefixes, i.e. not pairwise-disjoint (A -> bcD, A -> bxM)

Pop Quiz 10
Q: Why does having static binding and dynamic type checking not make sense?
A: It would be inefficient to have to go back and do dynamic type checking at runtime when static type binding was
performed during compile time and all relevant information was available then.

Pop Quiz 11
Q: What is strong typing? How does casting affect strong typing?
A: Strong typing is when all type errors are caught, whether at compile time or at runtime. Casting allows you to
circumvent the type of a variable (explicit coercion) and considerably weakens strong typing.

Pop Quiz 12

Q: What are the two attributes of an Attribute Grammar?


A: Inherited (top down), and synthesized (bottom up)

Midterm Review
Chapter 1
Programming Language Classification
Imperative (procedural)
concerned with HOW the computer does something
Command Based: "YOU GO"
Von Neumann
C : Block Structure
based on computer architecture: memory and processor *
OO
C++
Java
Information Hiding
Functional/Applicative
concerned with WHAT the computer is doing
computational model based on the recursive definition of functions
Lisp (Only Recursion, No Iteration)
Logic/Declarative
based on first order predicate logic
setup a logical environment (rules) for the language
Compilation
generates a binary file to be read by the hardware
compiler is NOT in memory at runtime
Interpretation
Association List
IS in memory at runtime
no translation
internal table
Hybrid
Java
generates intermediate Java byte code
JIT
translate ONLY what I need to
once translated, I dont need to translate it again
portability (ie: Java JVM)
Phases of Compilation
1. Scanner (lexical analysis)
reads in char stream, outputs tokens
2. Parser (Syntactical analysis)
drives the process
asks scanner for a token
generates the parse tree
discovers the structure of the program
3. Semantic Analysis (Intermediate Code Generation) *
4. Machine Independent Code Optimization (Optional)
constant folding
stores repeated computations of constants in memory for repeated retrieval
reduction of strength *

5. Target Code Generation


generates the lowest level of code to be read by the machine
6. Machine-Specific Code Improvement (Optional)
make improvements to the code based on the system architecture of a particular machine
7. Symbol Table used at all levels
8. first 3 levels : Front End
9. last 3 levels : Back End
10. Front End is "portable"
11. Back End depends on the Hardware : Machine Specific Optimizations
Lexical and Syntactical Analysis (Intro)
detects basic syntactic units (tokens)
implements a DFA (Deterministic Finite-State Automaton) *
Language Evaluation Criteria
DR ARTHUR!!!
Writability
C is writeable, but not very readable
cryptic looking code
Reliability
COBOL is readable
Cost *
Tradeoffs among the above
implicit declaration vs. explicit declaration
implicit writability > reliability
invalid type at runtime
sharkies nippin' at your feet

Chapter 2
Grammars
Developed by NOAM CHOMSKY in the 50's
regular grammars
context free grammars
Lexical Analysis
Scanner
Uses DFA
saves complexity for later
tuning (remove whitespace and comments)
Parsers
discovers the structure of the program
uses a PDA
PDA: characterized by a context-free grammar
Grammars (4-Tuple)
(S, P, N, T)
S: Start Symbol
P: Productions
N: Non-Terminals
T: Terminals
Regular and Context Free Grammars
Regular Grammar

Used for Lexical Analysis


DFA (Deterministic Finite-State Automaton)
or FSM (Finite-State Machine)
Context Free Grammar
Derivations
-> used for grammar rules (productions)
=> used for derivations
Parse Trees
a hierarchical representation of a derivation
Associativity
Recursion is needed (Left or Right)
between operators of the same precedence
Precedence
Stratification of the Grammar
highest precedence is at the deepest level of stratification
Ambiguity
1. distinct parse trees that have the same frontier

2. two distinct LL or LR derivations that generate the same sentence


Left Factoring *
Lexical Analysis (Scanner)
Regular Grammar
S -> A
A -> aA
-> aB
B -> bB
-> cB
-> aC
C -> aC
-> empty

Regular Expression *
FSA Implementation
Programmers method
Table Driven
State -> char -> Next State
Parsing
DR. ARTHUR!!!
(Deterministic) Context Free Grammars
e.g. compilers only need one PDA
deterministic requires one PDA
non-deterministic requires multiple PDAs
WWr => more than 1 PDA (non-deterministic)
W#Wr => 1 PDA (deterministic)
Top Down Parsing
LL
Cannot have Left Recursion in Top-Down Parsing
Left-To-Right Leftmost Generation

defined:
A -> Aa
-> b
let:
A -> bA'
A' -> aA'
-> empty

Recursive Descent
uses if statements and methods for each Non-Terminal
sub-program for each non-terminal
Bottom Up Parsing
LR
reverse of Left-To-Right Rightmost (LR) Generation
PDA Implementation
Stack/Input (Parser Configuration) *

Chapter 3
Binding Times
Early Binding Times
language definition
Static/Dynamic
Static
occurs during compilation
Dynamic
occurs during runtime
Binding Explicit / Implicit Binding
Explicit *
Implicit *
Types & Memory
explicit binding to memory (Lifetime)
java: new
C: malloc()
Scope
Dynamic
looks to verify references in the caller
follows static links
at Runtime
Static
scope is determined by compiler
scope is based on the lexical structure
Nested Blocks *
Dynamic & Static Links in Activation Record *
Referencing Environment
local variables
reusing var name (var hiding)
"hole" in the global vars lifetime
Main procedure var is NOT in scope of nested procs
Names / Scope

Aliasing
one thing with two different names
example: pass by reference
A.x ... B.x : both are var x
Overloading
function name overloading
inner processes having local variables
Variable Hiding
local variable in inner procedure hides global var of the same name
Name Qualification
A.x
B.x
Type Binding and Type Checking
DR ARTHUR!!!
Dynamic / Static
Type Binding
Static *
Dynamic *
Type Checking
Static *
Dynamic
Name vs Structure Compatibility (or Equivalence)
