
ASSIGNMENT 01/02

Name: Ajay Kumar
Registration No: 571013064
Learning Center: GLACE
Learning Center Code: 02815
Subject: MC0073
Semester: III
Module No:
Date of Submission:
Marks Awarded:

Signature of Evaluator

Signature of Center Coordinator

Directorate of Distance Education, Sikkim Manipal University, II Floor, Syndicate House, Manipal 576 104

Assignment Set 1

1. Describe the following with respect to Language Specification:
A) Programming Language Grammars
B) Classification of Grammars
C) Binding and Binding Times

Ans:

A) Programming Language Grammars

The lexical and syntactic features of a programming language are specified by its grammar. This section discusses key concepts and notions from formal language grammars. A language L can be considered to be a collection of valid sentences. Each sentence can be looked upon as a sequence of words, and each word as a sequence of letters or graphic symbols acceptable in L. A language specified in this manner is known as a formal language. A formal language grammar is a set of rules which precisely specify the sentences of L. It is clear that natural languages are not formal languages due to their rich vocabulary. However, PLs are formal languages.

Terminal symbols, alphabet and strings

The alphabet of L, denoted by the Greek symbol Σ, is the collection of symbols in its character set. We will use lower case letters a, b, c, etc. to denote symbols in Σ. A symbol in the alphabet is known as a terminal symbol (T) of L. The alphabet can be represented using the mathematical notation of a set, e.g.

   Σ = {a, b, ..., z, 0, 1, ..., 9}

Here the symbols {, ',' and } are part of the notation. We call them metasymbols to differentiate them from terminal symbols. Throughout this discussion we assume that metasymbols are distinct from the terminal symbols. If this is not the case, i.e. if a terminal symbol and a metasymbol are identical, we enclose the terminal symbol in quotes to differentiate it from the metasymbol. For example, the set of punctuation symbols of English can be defined as {:, ;, ',', ...}, where ',' denotes the terminal symbol comma.

A string is a finite sequence of symbols. We will represent strings by Greek symbols α, β, γ, etc. Thus α = axy is a string over Σ. The length of a string is the number of symbols in it. Note that the absence of any symbol is also a string, the null string ε.
The concatenation operation combines two strings into a single string, and is used to build larger strings from existing strings. Given two strings α and β, the concatenation of α with β yields a string formed by putting the sequence of symbols forming α before the sequence of symbols forming β. For example, if α = ab and β = axy, then the concatenation of α and β, represented as α.β or simply αβ, gives the string abaxy. The null string can also participate in a concatenation: α.ε = ε.α = α.

Nonterminal symbols

A nonterminal symbol (NT) is the name of a syntax category of a language, e.g. noun, verb, etc. An NT is written as a single capital letter, or as a name enclosed between < and >, e.g. A or < Noun >. During grammatical analysis, a nonterminal symbol represents an instance of the category. Thus, < Noun > represents a noun.

Productions

A production, also called a rewriting rule, is a rule of the grammar. A production has the form

   <nonterminal symbol> ::= <string of Ts and NTs>

and defines the fact that the NT on the LHS of the production can be rewritten as the string of Ts and NTs appearing on the RHS. When an NT can be rewritten as one of many different strings, the symbol | (standing for 'or') is used to separate the strings on the RHS, e.g.

   < Article > ::= a | an | the

The string on the RHS of a production can be a concatenation of component strings, e.g. the production

   < Noun Phrase > ::= < Article > < Noun >

expresses the fact that a noun phrase consists of an article followed by a noun.

Each grammar G defines a language L(G). G contains an NT called the distinguished symbol or the start NT of G. Unless otherwise specified, we use the symbol S as the distinguished symbol of G. A valid string of L(G) is obtained by using the following procedure:

1. Let α = S.
2. While α is not a string of terminal symbols:
   (a) Select an NT appearing in α, say X.
   (b) Replace X by a string appearing on the RHS of a production of X.

Example: Grammar (1.1) defines a language consisting of noun phrases in English:

   < Noun Phrase > ::= < Article > < Noun >
   < Article > ::= a | an | the
   < Noun > ::= boy | apple

< Noun Phrase > is the distinguished symbol of the grammar; 'the boy' and 'an apple' are some valid strings in the language.

Definition (Grammar): A grammar G of a language L(G) is a quadruple (Σ, SNT, S, P), where Σ is the alphabet of L(G), i.e. the set of Ts, SNT is the set of NTs, S is the distinguished symbol, and P is the set of productions.

Derivation, reduction and parse trees

A grammar G is used for two purposes: to generate valid strings of L(G) and to recognize valid strings of L(G). The derivation operation helps to generate valid strings, while the reduction operation helps to recognize valid strings. A parse tree is used to depict the syntactic structure of a valid string as it emerges during a sequence of derivations or reductions.

Derivation: Let a production of grammar G be of the form A ::= π, and let σ be a string such that σ = γAθ. Then replacement of A by π in σ constitutes a derivation according to that production. We use the notation α => β to denote direct derivation of β from α, and α =>* β to denote transitive derivation of β from α (i.e. derivation in zero or more steps). Thus, A => π only if A ::= π is a production of G, and A =>* σ if A => ... => σ.
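The derivation procedure described above can be sketched in code. The following is a minimal illustration (the dictionary layout and function names are our own, not from the source): it derives strings of grammar (1.1) by repeatedly rewriting the leftmost nonterminal.

```python
# A tiny sketch of leftmost derivation for grammar (1.1).
# Nonterminals are the dictionary keys; everything else is a terminal.
GRAMMAR = {
    "<Noun Phrase>": [["<Article>", "<Noun>"]],
    "<Article>":     [["a"], ["an"], ["the"]],
    "<Noun>":        [["boy"], ["apple"]],
}

def derive(start, choose=lambda alts: alts[0]):
    """Rewrite the leftmost NT until only terminals remain."""
    form = [start]
    while any(sym in GRAMMAR for sym in form):
        i = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i+1] = choose(GRAMMAR[form[i]])   # replace the NT by one RHS
    return " ".join(form)

print(derive("<Noun Phrase>"))   # always picks the first alternative
```

Passing a different `choose` function selects other alternatives, so the same loop can enumerate every valid string of the language.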
We can use this notation to define a valid string according to a grammar G as follows: σ is a valid string according to G only if S =>* σ, where S is the distinguished symbol of G.

Example: Derivation of the string 'the boy' according to grammar (1.1) can be depicted as

   < Noun Phrase > => < Article > < Noun > => the < Noun > => the boy

A string α such that S =>* α is a sentential form of L(G). The string α is a sentence of L(G) if it consists of only Ts.

Example: Consider the grammar G:

   < Sentence > ::= < Noun Phrase > < Verb Phrase >

   < Noun Phrase > ::= < Article > < Noun >
   < Verb Phrase > ::= < Verb > < Noun Phrase >
   < Article > ::= a | an | the
   < Noun > ::= boy | apple
   < Verb > ::= ate

The following strings are sentential forms of L(G):

   < Noun Phrase > < Verb Phrase >
   the boy < Verb Phrase >
   < Noun Phrase > ate < Noun Phrase >
   the boy ate < Noun Phrase >
   the boy ate an apple

However, only 'the boy ate an apple' is a sentence.

Reduction: To determine the validity of the string 'the boy ate an apple' according to the grammar, we perform the following reductions:

   Step  String
         the boy ate an apple
   1     < Article > boy ate an apple
   2     < Article > < Noun > ate an apple
   3     < Article > < Noun > < Verb > an apple
   4     < Article > < Noun > < Verb > < Article > apple
   5     < Article > < Noun > < Verb > < Article > < Noun >
   6     < Noun Phrase > < Verb > < Article > < Noun >
   7     < Noun Phrase > < Verb > < Noun Phrase >
   8     < Noun Phrase > < Verb Phrase >
   9     < Sentence >

The string is a sentence of L(G) since we are able to construct the reduction sequence from 'the boy ate an apple' to < Sentence >.

Parse trees: A sequence of derivations or reductions reveals the syntactic structure of a string with respect to G. We depict the syntactic structure in the form of a parse tree. Derivation according to the production A ::= α gives rise to an elemental parse tree with A as the parent node and the symbols of α as its children.

B) Classification of Grammars

Grammars are classified on the basis of the nature of the productions used in them (Chomsky, 1963). Each grammar class has its own characteristics and limitations.

Type 0 grammars: These grammars, known as phrase structure grammars, contain productions of the form α ::= β, where both α and β can be arbitrary strings of Ts and NTs. Such productions permit arbitrary substitution of strings during derivation or reduction, hence they are not relevant to the specification of programming languages.

Type 1 grammars: These grammars are known as context sensitive grammars because their productions specify that derivation or reduction of strings can take place only in specific contexts.
A Type-1 production has the form

   α A β ::= α π β

Thus, a string π in a sentential form can be replaced by A (or vice versa) only when it is enclosed by the strings α and β. These grammars are also not particularly relevant for PL specification, since recognition of PL constructs is not context sensitive in nature.

Type 2 grammars: These grammars impose no context requirements on derivations or reductions. A typical Type-2 production is of the form

   A ::= π

which can be applied independent of its context. These grammars are therefore known as context free grammars (CFGs). CFGs are ideally suited for programming language specification.

Type 3 grammars: Type-3 grammars are characterized by productions of the form

   A ::= tB | t   or   A ::= Bt | t

Note that these productions also satisfy the requirements of Type-2 grammars. The specific form of the RHS alternatives, namely a single T or a string containing a single T and a single NT, gives some practical advantages in scanning. Type-3 grammars are also known as linear grammars or regular grammars. These are further categorized into left-linear and right-linear grammars depending on whether the NT in the RHS alternative appears at the extreme left or extreme right.

Operator grammars

Definition (Operator grammar (OG)): An operator grammar is a grammar none of whose productions contain two or more consecutive NTs in any RHS alternative. Thus, nonterminals occurring in an RHS string are separated by one or more terminal symbols. All terminal symbols occurring in the RHS strings are called operators of the grammar.

C) Binding and Binding Times

Definition (Binding): A binding is the association of an attribute of a program entity with a value. Binding time is the time at which a binding is performed. Thus the type attribute of a variable var is bound to a type when its declaration is processed, and the size attribute of that type is bound to a value sometime prior to this binding. We are interested in the following binding times:

1. Language definition time of L
2. Language implementation time of L
3. Compilation time of P
4. Execution init time of proc
5. Execution time of proc

where L is a programming language, P is a program written in L and proc is a procedure in P. Note that language implementation time is the time when a language translator is designed. The preceding list of binding times is not exhaustive; other binding times can be defined, viz. binding at the linking time of P.

The language definition of L specifies binding times for the attributes of various entities of programs written in L. Binding of the keywords of Pascal to their meanings is performed at language definition time. This is how keywords like program, procedure, begin and end get their meanings. These bindings apply to all programs written in Pascal. At language implementation time, the compiler designer performs certain bindings. For example, the size of type integer is bound to n bytes, where n is a number determined by the architecture of the target machine. Binding of the type attributes of variables is performed at the compilation time of P. The memory addresses of the local variables info and p of procedure proc are bound at every execution init time of proc. The value attributes of variables are bound (possibly more than once) during an execution of proc. The memory address of p↑ is bound when the procedure call new(p) is executed.
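The difference between binding times can be observed even in a dynamic language. The following sketch (our own illustration, not from the source) contrasts a binding made once when a definition is processed with one made afresh at every execution:

```python
# Binding-time illustration: a Python default argument is bound once,
# when the 'def' statement is processed (analogous to an "init-time"
# binding), while the global G is re-bound dynamically at every call.
G = 1

def f(x=G):          # default of x bound NOW, to the current value of G (1)
    return x + G     # G itself is looked up again at each call

G = 10
print(f())           # the default x is still 1, but G is now 10
```

Running `f()` after rebinding G returns 11, showing that the two attributes of the "same" name were bound at different times.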

Static and dynamic bindings

Definition (Static binding): A static binding is a binding performed before the execution of a program begins.

Definition (Dynamic binding): A dynamic binding is a binding performed after the execution of a program has begun.

2. What are data formats? Explain ASCII data formats.

Ans: Data format in information technology can refer to any of the following:

- Data type: a constraint placed upon the interpretation of data in a type system
- Signal (electrical engineering): a format for signal data used in signal processing
- Recording format: a format for encoding data for storage on a storage medium
- File format: a format for encoding data for storage in a computer file
- Container format (digital): a format for encoding data for storage by means of standardized audio/video codecs
- Content format: a format for converting data to information
- Audio format: a format for processing audio data
- Video format: a format for processing video data

ASCII data formats

ASCII is an acronym for the American Standard Code for Information Interchange. Pronounced ask-ee, ASCII is a code for representing English characters as numbers, with each character assigned a number from 0 to 127. For example, the ASCII code for uppercase M is 77. Most computers use ASCII codes to represent text, which makes it possible to transfer data from one computer to another.

Text files stored in ASCII format are sometimes called ASCII files. Text editors and word processors are usually capable of storing data in ASCII format, although ASCII format is not always the default storage format. Most data files, particularly if they contain numeric data, are not stored in ASCII format. Executable programs are never stored in ASCII format.

The standard ASCII character set uses just 7 bits for each character. There are several larger character sets that use 8 bits, which gives them 128 additional characters. The extra characters are used to represent non-English characters, graphics symbols, and mathematical symbols. Several companies and organizations have proposed extensions for these 128 characters. The DOS operating system uses a superset of ASCII called extended ASCII or high ASCII. A more universal standard is the ISO Latin 1 set of characters, which is used by many operating systems, as well as Web browsers.

3. Explain the following with respect to the design specifications of an Assembler:
A) Data Structures
B) Pass 1 & Pass 2 Assembler flow chart

Ans:

A) Data Structures

The second step in our design procedure is to establish the databases that we have to work with.

Pass 1 Data Structures
1. Input source program.
2. A Location Counter (LC), used to keep track of each instruction's location.
3. A table, the Machine-Operation Table (MOT), that indicates the symbolic mnemonic and length (two, four, or six bytes) for each instruction.
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and the action to be taken for each pseudo-op in pass 1.
5. A table, the Symbol Table (ST), that is used to store each label and its corresponding value.
6. A table, the Literal Table (LT), that is used to store each literal encountered and its corresponding assigned location.
7. A copy of the input, to be used by pass 2.

Pass 2 Data Structures
1. A copy of the source program input to pass 1.
2. The Location Counter (LC).
3. A table, the Machine-Operation Table (MOT), that indicates for each instruction the symbolic mnemonic, length (two, four, or six bytes), binary machine opcode and format of the instruction.
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and the action to be taken for each pseudo-op in pass 2.
5. A table, the Symbol Table (ST), prepared by pass 1, containing each label and its corresponding value.
6. A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.
7. A workspace, INST, that is used to hold each instruction as its various parts are assembled together.
8. A workspace, PRINT LINE, used to produce a printed listing.
9. A workspace, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.
10. An output deck of assembled instructions in the format needed by the loader.

Format of Data Structures

The third step in our design procedure is to specify the format and content of each of the data structures.
Pass 2 requires a machine-operation table (MOT) containing the name, length, binary code and format; pass 1 requires only the name and length. Instead of using two different tables, we construct a single MOT. The machine-operation table (MOT) and the pseudo-operation table (POT) are examples of fixed tables: their contents are not filled in or altered during the assembly process. The following depicts the format of the MOT, with 6 bytes per entry:

   Mnemonic opcode   Binary opcode    Instruction length   Instruction format   Not used
   (4 bytes,         (1 byte,         (2 bits, binary)     (3 bits, binary)     (3 bits)
   characters)       hexadecimal)

   "Abbb"            5A               10                   001
   "AHbb"            4A               10                   001
   "ALbb"            5E               10                   001
   "ALRb"            1E               01                   000
   ...

(b represents a blank)
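The fixed MOT and the pass-1 symbol table can be sketched as simple lookup structures. The following toy illustration uses the opcodes from the table above; the byte lengths and function names are our own simplification:

```python
# Sketch of the fixed machine-op table and pass-1 LC processing.
# MOT maps mnemonic -> (binary opcode, instruction length in bytes).
MOT = {
    "A":   (0x5A, 4),   # Add                  (length code 10 -> 4 bytes)
    "AH":  (0x4A, 4),   # Add Halfword
    "AL":  (0x5E, 4),   # Add Logical
    "ALR": (0x1E, 2),   # Add Logical Register (length code 01 -> 2 bytes)
}

def pass1(statements):
    """Assign a location to every label using the MOT lengths."""
    lc, symtab = 0, {}
    for label, mnemonic in statements:
        if label:
            symtab[label] = lc          # bind the label to the current LC
        lc += MOT[mnemonic][1]          # advance LC by the instruction length
    return symtab

symtab = pass1([("START", "A"), (None, "AH"), ("LOOP", "ALR")])
```

Because the MOT is fixed, the same dictionary serves both passes; pass 1 consults only the length field, while pass 2 would also read the binary opcode.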

B) Pass 1 & Pass 2 Assembler flow chart

Pass Structure of Assemblers

Here we discuss two pass and single pass assembly schemes.

Two pass translation: Two pass translation of an assembly language program can handle forward references easily. LC processing is performed in the first pass and symbols defined in the program are entered into the symbol table. The second pass synthesizes the target form using the address information found in the symbol table. In effect, the first pass performs analysis of the source program while the second pass performs synthesis of the target program. The first pass constructs an intermediate representation (IR) of the source program for use by the second pass. This representation consists of two main components: data structures, e.g. the symbol table, and a processed form of the source program. The latter component is called intermediate code (IC).

Single pass translation: LC processing and construction of the symbol table proceed as in two pass translation. The problem of forward references is tackled using a process called backpatching. The operand field of an instruction containing a forward reference is left blank initially. The address of the forward referenced symbol is put into this field when its definition is encountered. Look at the following program and its assembled form (address and machine code shown on the right):

            START   101
            READ    N              101)  + 09 0 113
            MOVER   BREG, ONE      102)  + 04 2 115
            MOVEM   BREG, TERM     103)  + 05 2 116
   AGAIN    MULT    BREG, TERM     104)  + 03 2 116
            MOVER   CREG, TERM     105)  + 04 3 116
            ADD     CREG, ONE      106)  + 01 3 115
            MOVEM   CREG, TERM     107)  + 05 3 116
            COMP    CREG, N        108)  + 06 3 113
            BC      LE, AGAIN      109)  + 07 2 104
            MOVEM   BREG, RESULT   110)  + 05 2 114
            PRINT   RESULT         111)  + 10 0 114
            STOP                   112)  + 00 0 000
   N        DS      1              113)
   RESULT   DS      1              114)
   ONE      DC      '1'            115)  + 00 0 001
   TERM     DS      1              116)
            END
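The single-pass handling of a forward reference such as ONE can be sketched as follows. This is a toy model (the dictionary-based code store and function names are our own, not the actual assembler format):

```python
# Sketch of backpatching with a Table of Incomplete Instructions (TII).
symtab = {}   # symbol -> address, filled in as definitions are seen
tii = []      # (instruction address, symbol) pairs awaiting patching
code = {}     # address -> [opcode, register, operand address or None]

def emit(addr, opcode, reg, symbol):
    if symbol in symtab:                  # backward reference: resolved now
        code[addr] = [opcode, reg, symtab[symbol]]
    else:                                 # forward reference: leave blank
        code[addr] = [opcode, reg, None]
        tii.append((addr, symbol))

def define(symbol, addr):
    """Incremental patching: fix all pending references to symbol."""
    symtab[symbol] = addr
    for iaddr, sym in list(tii):
        if sym == symbol:
            code[iaddr][2] = addr         # insert the operand address
            tii.remove((iaddr, sym))

emit(102, "MOVER", 2, "ONE")   # ONE not yet defined: TII gets (102, ONE)
define("ONE", 115)             # definition seen: instruction 102 is patched
```

After the definition is processed, the instruction at 102 holds the operand address 115 and the TII is empty, mirroring the assembled output in the listing above.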

In the above program, the instruction corresponding to the statement MOVER BREG, ONE can only be partially synthesized, since ONE is a forward reference. Hence only the instruction opcode and the address of BREG are assembled, to reside in location 102. The need for inserting the second operand's address at a later stage is indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair (<instruction address>, <symbol>), e.g. (102, ONE) in this case. By the time the END statement is processed, the symbol table contains the addresses of all symbols defined in the source program, and TII contains information describing all forward references. The assembler can now process each entry in TII to complete the concerned instruction. For example, the entry (102, ONE) is processed by obtaining the address of ONE from the symbol table and inserting it in the operand address field of the instruction with assembled address 102. Alternatively, entries in TII can be processed in an incremental manner: when the definition of some symbol symb is encountered, all forward references to symb can be processed immediately.

Design of a Two Pass Assembler

Tasks performed by the passes of a two pass assembler are as follows:

Pass I:
1. Separate the symbol, mnemonic opcode and operand fields.
2. Build the symbol table.
3. Perform LC processing.
4. Construct the intermediate representation.

Pass II: Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate representation, while Pass II processes the intermediate representation to synthesize the target program. The design details of assembler passes are discussed after introducing advanced assembler directives and their influence on LC processing.

4. Explain the following: a) Lexical Analysis b) Syntax Analysis.

Ans:

Lexical Analysis

The lexical analyzer is the interface between the source program and the compiler. The lexical analyzer reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens. Each token represents a sequence of characters that can be treated as a single logical entity. Identifiers, keywords, constants, operators, and punctuation symbols such as commas and parentheses are typical tokens. There are two kinds of tokens: specific strings, such as IF or a semicolon, and classes of strings, such as identifiers, constants, or labels.

Syntax Analysis

The parser has two functions. It checks that the tokens appearing in its input, which is the output of the lexical analyzer, occur in patterns that are permitted by the specification for the source language. It also imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler. The second aspect of syntax analysis is to make explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together.

5. Describe the process of Bootstrapping in the context of Linkers.

Ans: In computing, bootstrapping refers to a process where a simple system activates another, more complicated system that serves the same purpose. It is a solution to the chicken-and-egg problem of starting a certain system without the system already functioning. The term is most often applied to the

process of starting up a computer, in which a mechanism is needed to execute the software program that is responsible for executing software programs (the operating system).

Bootstrap loading

The discussions of loading up to this point have all presumed that there's already an operating system, or at least a program loader, resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is: how is the first program loaded into the computer?

In modern computers, the first program the computer runs after a hardware reset invariably is stored in a ROM known as the bootstrap ROM, as in "pulling one's self up by the bootstraps." When the CPU is powered on or reset, it sets its registers to a known state. On x86 systems, for example, the reset sequence jumps to the address 16 bytes below the top of the system's address space. The bootstrap ROM occupies the top 64K of the address space, and ROM code then starts up the computer. On IBM-compatible x86 systems, the boot ROM code reads the first block of the floppy disk into memory, or if that fails, the first block of the first hard disk, into memory location zero and jumps to location zero. The program in block zero in turn loads a slightly larger operating system boot program from a known place on the disk into memory, and jumps to that program, which in turn loads in the operating system and starts it. (There can be even more steps, e.g., a boot manager that decides from which disk partition to read the operating system boot program, but the sequence of increasingly capable loaders remains.)

Why not just load the operating system directly? Because you can't fit an operating system loader into 512 bytes. The first level loader typically is only able to load a single-segment program from a file with a fixed name in the top-level directory of the boot disk.
The operating system loader contains more sophisticated code that can read and interpret a configuration file, uncompress a compressed operating system executable, and address large amounts of memory (on an x86 the loader usually runs in real mode, which means that it's tricky to address more than 1MB of memory). The full operating system can turn on the virtual memory system, load the drivers it needs, and then proceed to run user-level programs.

Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs.

None of this matters much to the application-level programmer, but it becomes more interesting if you want to write programs that run on the bare hardware of the machine, since then you need to arrange to intercept the bootstrap sequence somewhere and run your program rather than the usual operating system. Some systems make this quite easy (just stick the name of your program in AUTOEXEC.BAT and reboot Windows 95, for example); others make it nearly impossible. It also presents opportunities for customized systems. For example, a single-application system could be built over a Unix kernel by naming the application /etc/init.

Software Bootstrapping & Compiler Bootstrapping

Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g. ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler

for a higher-level language, and so on, until one can have a graphical IDE and an extremely high-level programming language.

Compiler Bootstrapping

In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include gcc, GHC, OCaml, BASIC, PL/I and, more recently, the Mono C# compiler.

6. Describe the procedure for design of a Linker.

Ans:

Design of a linker

Relocation and linking requirements in segmented addressing: The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of a segmented addressing structure reduces the relocation requirements of a program.

Implementation Example: A Linker for MS-DOS

Example: Consider a program written in the assembly language of the Intel 8088. The ASSUME statement declares the segment registers CS and DS to be available for memory addressing. Hence all memory addressing is performed by using suitable displacements from their contents. The translation-time address of A is 0196. In statement 16, a reference to A is assembled as a displacement of 196 from the contents of the CS register. This avoids the use of an absolute address, hence the instruction is not address sensitive. Now no relocation is needed if segment SAMPLE is to be loaded with address 2000 by a calling program (or by the OS). The effective operand address would be calculated as <CS> + 0196, which is the correct address 2196. A similar situation exists with the reference to B in statement 17. The reference to B is assembled as a displacement of 0002 from the contents of the DS register. Since the DS register would be loaded with the execution-time address of DATA_HERE, the reference to B would be automatically relocated to the correct address. Though the use of segment registers reduces the relocation requirements, it does not completely eliminate the need for relocation.
Consider statement 14:

   MOV AX, DATA_HERE

which loads the segment base of DATA_HERE into the AX register preparatory to its transfer into the DS register. Since the assembler knows DATA_HERE to be a segment, it makes provision to load the higher order 16 bits of the address of DATA_HERE into the AX register. However, it does not know the link-time address of DATA_HERE, hence it assembles the MOV instruction in the immediate operand format and puts zeroes in the operand field. It also makes an entry for this instruction in RELOCTAB so that the linker will put the appropriate address in the operand field. Inter-segment calls and jumps are handled in a similar way. Relocation is somewhat more involved in the case of intra-segment jumps assembled in the FAR format. For example, consider the following program:

   FAR_LAB  EQU  THIS FAR   ; FAR_LAB is a FAR label
            JMP  FAR_LAB    ; A FAR jump

Here the displacement and the segment base of FAR_LAB are to be put in the JMP instruction itself. The assembler puts the displacement of FAR_LAB in the first two operand bytes of the instruction, and makes a RELOCTAB entry for the third and fourth operand bytes, which are to hold the segment base address. A statement like

   ADDR_A DW OFFSET A

(which defines an address constant) does not need any relocation, since the assembler can itself put the required offset in the bytes. In summary, the only RELOCTAB entries that must exist for a program using segmented memory addressing are for the bytes that contain a segment base address. For linking, however, both the segment base address and the offset of the external symbol must be computed by the linker. Hence there is no reduction in the linking requirements.
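Link-time processing of RELOCTAB entries can be sketched as follows. This is a simplified model (our own, not the MS-DOS linker's actual record format): each entry just marks a word that holds a segment base, and the linker adds the load-time base into it.

```python
# Sketch of link-time patching of segment-base words via RELOCTAB.
def relocate(code, reloctab, load_base):
    """Add the load-time segment base into every word RELOCTAB points at.

    code     -- list of assembled words (translation-time values)
    reloctab -- offsets of the words that hold a segment base address
    """
    patched = list(code)
    for offset in reloctab:
        patched[offset] += load_base   # only these words are address sensitive
    return patched

# MOV AX, DATA_HERE was assembled with zeroes in its operand word (offset 1);
# the linker fills in the base once the load address (say 0x2000) is known.
patched = relocate([0xB8, 0x0000, 0x8ED8], reloctab=[1], load_base=0x2000)
```

Only the word flagged in RELOCTAB changes; the opcode bytes around it are load-address independent, which is exactly why segmented addressing keeps the relocation table short.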

Assignment Set 2

1. Write short notes on the video controller.

Ans: A video controller is used to control the operation of the display device. A fixed area of system memory is reserved for the frame buffer, and the video controller is given direct access to the frame buffer memory.
2. Write about Deterministic and Non-Deterministic Finite Automata with suitable examples.

Ans:

Deterministic finite automata

Definition: A deterministic finite automaton (DFA) is a 5-tuple (S, Σ, T, s, A) consisting of:

- an alphabet (Σ)
- a set of states (S)
- a transition function (T : S × Σ → S)
- a start state (s ∈ S)
- a set of accept states (A ⊆ S)

The machine starts in the start state and reads in a string of symbols from its alphabet. It uses the transition function T to determine the next state, using the current state and the symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept the string; otherwise it is said to reject the string. The set of strings it accepts forms a language, which is the language the DFA recognizes.

Non-Deterministic Finite Automata (NFA)

A non-deterministic finite automaton (NFA) is a 5-tuple (S, Σ, T, s, A) consisting of:

- an alphabet (Σ)
- a set of states (S)
- a transition function (T : S × (Σ ∪ {ε}) → P(S))
- a start state (s ∈ S)
- a set of accept states (A ⊆ S)

where P(S) is the power set of S and ε is the empty string. The machine starts in the start state and reads in a string of symbols from its alphabet. It uses the transition relation T to determine the next state(s), using the current state and the symbol just read or the empty string. If, when it has finished reading, it is in an accepting state, it is said to accept the string; otherwise it is said to reject the string. The set of strings it accepts forms a language, which is the language the NFA recognizes. A DFA or NFA can easily be converted into a GNFA, and the GNFA can then be converted into a regular expression by reducing the number of states until S = {s, a}.
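The 5-tuple definition can be written out directly in code. The following sketch (our own illustration) encodes the even-number-of-0s machine that is described next:

```python
# DFA accepting binary strings that contain an even number of 0s.
def run_dfa(string, T, start, accept):
    state = start
    for symbol in string:
        state = T[(state, symbol)]   # deterministic: exactly one next state
    return state in accept           # accept iff we end in an accept state

# Transition function T : S x Sigma -> S as a dictionary.
T = {("S1", "0"): "S2", ("S1", "1"): "S1",
     ("S2", "0"): "S1", ("S2", "1"): "S2"}

print(run_dfa("1001", T, "S1", {"S1"}))   # two 0s, an even number
```

An NFA simulator would differ only in tracking a *set* of current states, since T then maps to subsets of S.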

Deterministic Finite State Machine

The following example describes a deterministic finite state machine M with a binary alphabet, which determines whether the input contains an even number of 0s.

M = (S, Σ, T, s, A), where:

   Σ = {0, 1}
   S = {S1, S2}
   s = S1
   A = {S1}

The transition function T can be visualized as a directed graph, and is defined as follows:

   T(S1, 0) = S2
   T(S1, 1) = S1
   T(S2, 0) = S1
   T(S2, 1) = S2

Simply put, the state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state will show whether the input contained an even number of 0s or not. There are two main methods for deciding where a finite state machine generates its outputs: the Moore machine and the Mealy machine, named after their respective inventors.

3. Write a short note on:
A) C Preprocessor for GCC version 2

Ans: The C preprocessor is a macro processor that is used automatically by the C compiler to transform your program before actual compilation. It is called a macro processor because it allows you to define macros, which are brief abbreviations for longer constructs. The C preprocessor provides four separate facilities that you can use as you see fit:

- Inclusion of header files. These are files of declarations that can be substituted into your program.
- Macro expansion. You can define macros, which are abbreviations for arbitrary fragments of C code, and then the C preprocessor will replace the macros with their definitions throughout the program.
- Conditional compilation. Using special preprocessing directives, you can include or exclude parts of the program according to various conditions.
- Line control. If you use a program to combine or rearrange source files into an intermediate file which is then compiled, you can use line control to inform the compiler of where each source line originally came from.

B) Conditional Assembly

ANSI Standard C requires the rejection of many harmless constructs commonly used by today's C programs. Such incompatibility would be inconvenient for users, so the GNU C preprocessor is configured to accept these constructs by default. Strictly speaking, to get ANSI Standard C you must use the options `-trigraphs', `-undef' and `-pedantic', but in practice the consequences of having strict ANSI Standard C make it undesirable to do this.
Conditional Assembly: This means that some sections of the program may be optional, either included or not in the final program, depending upon specified conditions. A reasonable use of conditional assembly would be to combine two versions of a program: one that prints debugging information during test executions for the developer, and another version for production operation that displays only results of interest to the average user. A program fragment that assembles the instructions to print the AX register only if Debug is true is given below. Note that true is any non-zero value.

Here is a conditional statement in C programming; the following tests the expression `BUFSIZE == 1020', where `BUFSIZE' must be a macro.
#if BUFSIZE == 1020
printf ("Large buffers!\n");
#endif /* BUFSIZE is large */

4. Write about different Phases of Compilation.
Ans: Phases of Compiler
A compiler takes as input a source program and produces as output an equivalent sequence of machine instructions. This process is so complex that it is not reasonable, either from a logical point of view or from an implementation point of view, to consider the compilation process as occurring in one single step. For this reason, it is customary to partition the compilation process into a series of subprocesses called phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. The first phase, called the lexical analyzer, or scanner, separates characters of the source language into groups that logically belong together; these groups are called tokens. The usual tokens are keywords, such as DO or IF; identifiers, such as X or NUM; operator symbols, such as <= or +; and

punctuation symbols such as parentheses or commas. The output of the lexical analyzer is a stream of tokens, which is passed to the next phase, the syntax analyzer, or parser. The tokens in this stream can be represented by codes which we may regard as integers. Thus, DO might be represented by 1, + by 2, and identifier by 3. In the case of a token like identifier, a second quantity, telling which of the identifiers used by the program is represented by this instance of the token, is passed along with the integer code for identifier.
The syntax analyzer groups tokens together into syntactic structures. For example, the three tokens representing A + B might be grouped into a syntactic structure called an expression. Expressions might further be combined to form statements. Often the syntactic structure can be regarded as a tree whose leaves are the tokens. The interior nodes of the tree represent strings of tokens that logically belong together.
The intermediate code generator uses the structure produced by the syntax analyzer to create a stream of simple instructions. Many styles of intermediate code are possible. One common style uses instructions with one operator and a small number of operands. These instructions can be viewed as simple macros like the macro ADD2. The primary difference between intermediate code and assembly code is that the intermediate code need not specify the registers to be used for each operation.
Code optimization is an optional phase designed to improve the intermediate code so that the ultimate object program runs faster and/or takes less space. Its output is another intermediate code program that does the same job as the original, but perhaps in a way that saves time and/or space.
The final phase, code generation, produces the object code by deciding on the memory locations for data, selecting code to access each datum, and selecting the registers in which each computation is to be done.
Designing a code generator that produces truly efficient object programs is one of the most difficult parts of compiler design, both practically and theoretically. The table-management, or bookkeeping, portion of the compiler keeps track of the names used by the program and records essential information about each, such as its type (integer, real, etc.). The data structure used to record this information is called a symbol table. The error handler is invoked when a flaw in the source program is detected. It must warn the programmer by issuing a diagnostic, and adjust the information being passed from phase to phase so that each phase can proceed. It is desirable that compilation be completed on flawed programs, at least through the syntax-analysis phase, so that as many errors as possible can be detected in one compilation. Both the table-management and error-handling routines interact with all phases of the compiler.

Lexical Analysis
The lexical analyzer is the interface between the source program and the compiler. The lexical analyzer reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens. Each token represents a sequence of characters that can be treated as a single logical entity. Identifiers, keywords, constants, operators, and punctuation symbols such as

commas and parentheses are typical tokens. There are two kinds of tokens: specific strings such as IF or a semicolon, and classes of strings such as identifiers, constants, or labels.
Syntax Analysis
The parser has two functions. It checks that the tokens appearing in its input, which is the output of the lexical analyzer, occur in patterns that are permitted by the specification for the source language. It also imposes on the tokens a tree-like structure that is used by the subsequent phases of the compiler. The second aspect of syntax analysis is to make explicit the hierarchical structure of the incoming token stream by identifying which parts of the token stream should be grouped together.
Intermediate Code Generation
On a logical level the output of the syntax analyzer is some representation of a parse tree. The intermediate code generation phase transforms this parse tree into an intermediate language representation of the source program called three-address code.
Three-Address Code
One popular type of intermediate language is what is called three-address code. A typical three-address code statement is
A := B op C
where A, B and C are operands and op is a binary operator.
Code Optimization
Object programs that are frequently executed should be fast and small. Certain compilers have within them a phase that tries to apply transformations to the output of the intermediate code generator, in an attempt to produce an intermediate-language version of the source program from which a faster or smaller object-language program can ultimately be produced. This phase is popularly called the optimization phase. A good optimizing compiler can improve the target program by perhaps a factor of two in overall speed, in comparison with a compiler that generates code carefully but without using the specialized techniques generally referred to as code optimization.
There are two types of optimization used: local optimization and loop optimization.
Code Generation
The code generation phase converts the intermediate code into a sequence of machine instructions. A simple-minded code generator might map the statement A := B + C into the machine code sequence:

LOAD B
ADD C
STORE A
However, such a straightforward macro-like expansion of intermediate code into machine code usually produces a target program that contains many redundant loads and stores and that utilizes the resources of the target machine inefficiently. To avoid these redundant loads and stores, a code generator might keep track of the run-time contents of registers. Knowing what quantities reside in registers, the code generator can generate loads and stores only when necessary. Many computers have only a few high-speed registers in which computations can be performed particularly quickly. A good code generator would therefore attempt to utilize these registers as efficiently as possible. This aspect of code generation, called register allocation, is particularly difficult to do optimally.

5. What is a MACRO? Discuss its expansion in detail with a suitable example.
Ans: Macro Definition and Expansion
Definition: macro

A macro name is an abbreviation which stands for some related lines of code. Macros are useful for the following purposes:
To simplify and reduce the amount of repetitive coding
To reduce errors caused by repetitive coding
To make an assembly program more readable
A macro consists of a name, a set of formal parameters and a body of code. The use of the macro name with a set of actual parameters is replaced by code generated from its body. This is called macro expansion. Macros allow a programmer to define pseudo-operations, typically operations that are generally desirable, are not implemented as part of the processor instruction set, and can be implemented as a sequence of instructions. Each use of a macro generates new program instructions; the macro has the effect of automating the writing of the program.

Macros can be defined and used in many programming languages, like C, C++, etc. Example: macros in C programming. Macros are commonly used in C to define small snippets of code. If the macro has parameters, they are substituted into the macro body during expansion; thus, a C macro can mimic a C function. The usual reason for doing this is to avoid the overhead of a function call in simple cases, where the code is lightweight enough that function call overhead has a significant impact on performance. For instance,
#define max(a, b) a > b ? a : b
defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing,
z = max(x, y);
becomes
z = x > y ? x : y;
While this use of macros is very important for C, for instance to define type-safe generic data types or debugging tools, it is also slow, rather inefficient, and may lead to a number of pitfalls. C macros are capable of mimicking functions, creating new syntax within some limitations, as well as expanding into arbitrary text (although the C compiler will require that text to be valid C source code, or else comments), but they have some limitations as a programming construct. Macros which mimic functions, for instance, can be called like real functions, but a macro cannot be passed to another function using a function pointer, since the macro itself has no address.
In programming languages such as C or assembly language, a macro is a name that defines a set of commands that are substituted for the macro name wherever the name appears in a program (a process called macro expansion) when the program is compiled or assembled. Macros are similar to functions in that they can take arguments and in that they are calls to lengthier sets of instructions. Unlike functions, macros are replaced by the actual commands they represent when the program is prepared for execution; function instructions are copied into a program only once.
Macro Expansion
A macro call leads to macro expansion. During macro expansion, the macro statement is replaced by a sequence of assembly statements.

Figure 1.1: Macro expansion on a source program.
Example

In the above program a macro call, INITZ, is shown in the middle of the figure. Every macro begins with the MACRO keyword and ends with ENDM (end macro). Whenever a macro is called, its entire body of code is substituted into the program at the point of the call during assembly. So the result of the macro expansion is shown on the rightmost side of the figure.
Macro calling in high-level programming languages (C programming):
#define max(a, b) a > b ? a : b
main() {
int x, y, z;
x = 4;
y = 6;
z = max(x, y);
}
The above program was written using C programming statements. It defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing, z = max(x, y); becomes z = x > y ? x : y;. After macro expansion, the whole code would appear like this:
#define max(a, b) a > b ? a : b
main() {
int x, y, z;
x = 4;
y = 6;
z = x > y ? x : y;
}

6. What is linking? Explain dynamic linking in detail.
Ans: Linking
Linking is the process where object modules are connected together (linked) to form one large program. That is, if you have three source code files called myprogram.c, searchfunction.c, and printreport.c, you would compile them separately into myprogram.obj, searchfunction.obj, and printreport.obj. The linker will take the object modules, along with any library code that needs to be included, and create one .exe file, resolving external variables and function calls.
Dynamic Linking
Dynamic linking defers much of the linking process until a program starts running. It provides a variety of benefits that are hard to get otherwise:

Dynamically linked shared libraries are easier to create than statically linked shared libraries.

Dynamically linked shared libraries are easier to update than statically linked shared libraries.
The semantics of dynamically linked shared libraries can be much closer to those of unshared libraries.
Dynamic linking permits a program to load and unload routines at runtime, a facility that can otherwise be very difficult to provide.

There are a few disadvantages, of course. The runtime performance costs of dynamic linking are substantial compared to those of static linking, since a large part of the linking process has to be redone every time a program runs. Every dynamically linked symbol used in a program has to be looked up in a symbol table and resolved. (Windows DLLs mitigate this cost somewhat, as we describe below.) Dynamic libraries are also larger than static libraries, since the dynamic ones have to include symbol tables.
Beyond issues of call compatibility, a chronic source of problems is changes in library semantics. Since dynamic shared libraries are so easy to update compared to unshared or static shared libraries, it's easy to change libraries that are in use by existing programs, which means that the behavior of those programs changes even though "nothing has changed". This is a frequent source of problems on Microsoft Windows, where programs use a lot of shared libraries, libraries go through a lot of versions, and library version control is not very sophisticated. Most programs ship with copies of all of the libraries they use, and installers often will inadvertently install an older version of a shared library on top of a newer one, breaking programs that are expecting features found in the newer one. Well-behaved applications pop up a warning before installing an older library over a newer one, but even so, programs that depend on semantics of older libraries have been known to break when newer versions replace the older ones.