
System Programming & Compiler Design Manual

SYSTEM PROGRAMMING
&
COMPILER DESIGN
LAB MANUAL

FOR VI SEMESTER
COMPUTER SCIENCE & ENGINEERING

DEPARTMENT OF COMPUTER SCIENCE


S.J.B.INSTITUTE OF TECHNOLOGY
BENGALURU
Compiled by: Darshan.K.R

S.J.B.I.T

Dept. of C.S.E


1. Lex Introduction
The word lexical in the traditional sense means pertaining to words. In terms of
programming languages, words are objects like variable names, numbers, keywords etc. Such
words are traditionally called tokens.
A lexical analyzer, or lexer for short, takes as input a string of individual characters and
divides this string into tokens. Additionally, it filters out whatever separates the tokens (the
so-called white-space), i.e., lay-out characters (spaces, newlines etc.) and comments.
The lexical analyzer is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
This interaction, summarized schematically in Fig. 1.1, is commonly implemented by making
the lexical analyzer a subroutine or a coroutine of the parser.
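
The division of a character string into tokens can be sketched by hand in C before turning to Lex. The following is only an illustration under stated assumptions: the names token_kind, token and next_token are hypothetical and do not come from Lex, the token classes are limited to identifiers, numbers and single-character symbols, and comments are not filtered.

```c
#include <ctype.h>
#include <stddef.h>

/* Token classes for this sketch (hypothetical names, not part of Lex). */
enum token_kind { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_SYMBOL };

struct token {
    enum token_kind kind;
    const char *start;   /* first character of the lexeme */
    size_t length;       /* number of characters in the lexeme */
};

/* Return the next token found at *cursor and advance *cursor past it. */
struct token next_token(const char **cursor)
{
    const char *p = *cursor;
    struct token t;

    while (isspace((unsigned char)*p))      /* filter out white-space */
        p++;

    t.start = p;
    if (*p == '\0') {
        t.kind = TOK_EOF;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        t.kind = TOK_IDENT;                 /* letter or _ starts a name */
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
    } else if (isdigit((unsigned char)*p)) {
        t.kind = TOK_NUMBER;                /* run of digits */
        while (isdigit((unsigned char)*p))
            p++;
    } else {
        t.kind = TOK_SYMBOL;                /* any other single character */
        p++;
    }
    t.length = (size_t)(p - t.start);
    *cursor = p;
    return t;
}
```

A parser would call next_token() repeatedly, which mirrors the "get next token" request of Fig. 1.1. Lex generates a routine of this kind automatically from regular expressions.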

[Figure: the source program enters the LEXICAL ANALYZER, which passes a token to the PARSER
each time the parser asks it to "get next token"; both use the SYMBOL TABLE.]

Fig. 1.1 Interaction of lexical analyzer with parser


1.1 A LANGUAGE FOR SPECIFYING LEXICAL ANALYZERS

Several tools have been built for constructing lexical analyzers from special purpose
notations based on regular expressions.
In this section, we describe a particular tool, called Lex, that has been widely used to
specify lexical analyzers for a variety of languages. We refer to the tool as the Lex compiler, and
to its input specification as the Lex language.
Lex is generally used in the manner depicted in Fig. 1.2. First, a specification of a lexical
analyzer is prepared by creating a program lex.l in the Lex language. Then, lex.l is run through
the Lex compiler to produce a C program lex.yy.c. The program lex.yy.c consists of a tabular
representation of a transition diagram constructed from the regular expressions of lex.l,
together with a standard routine that uses the table to recognize lexemes. The actions
associated with the regular expressions in lex.l are pieces of C code and are carried over directly
to lex.yy.c. Finally, lex.yy.c is run through the C compiler to produce an object program
a.out, which is the lexical analyzer that transforms an input stream into a sequence of tokens.


[Figure: Lex source program lex.l -> Lex compiler -> lex.yy.c
         lex.yy.c -> C compiler -> a.out
         input stream -> a.out -> sequence of tokens]

Fig. 1.2 Creating a lexical analyzer with Lex


Lex specifications
A Lex program consists of three parts:
declarations
%%
translation rules
%%
auxiliary procedures
The declarations section includes declarations of variables, constants, and regular definitions.
The translation rules of a Lex program are statements of the form
R1 {action1}
R2 {action2}
...
Rn {actionn}
where each Ri is a regular expression and each actioni is a program fragment
describing what action the lexical analyzer should take when pattern Ri matches a lexeme.
Typically, actioni will return control to the parser. In Lex the actions are written in C; in
general, however, they can be in any implementation language.
The third section holds whatever auxiliary procedures are needed by the actions.

1.2 THE ROLE OF PARSER


The parser obtains a string of tokens from the lexical analyzer and verifies that the
string can be generated by the grammar for the source language. We expect the parser to report
any syntax errors in an intelligible fashion. It should also recover from commonly occurring
errors so that it can continue processing the remainder of its input.
We know that programs can contain errors at many different levels. For example, errors can
be
1. Lexical, such as misspelling an identifier, keyword, or operator.
2. Syntactic, such as an arithmetic expression with unbalanced parentheses.
3. Semantic, such as an operator applied to an incompatible operand.
4. Logical, such as an infinitely recursive call.
Often much of the error detection and recovery in a compiler is centered around the syntax
analysis phase.

1.3 LEXICAL CONVENTIONS


The notations for specifying tokens:
1. . Matches any single character except the newline (\n).
2. * Matches zero or more copies of the preceding expression.
3. [ ] A character class which matches any one character within the brackets.
4. ^ Matches the beginning of a line as the first character of a regular expression.
As the first character inside [ ] it negates the character class instead.
5. $ Matches the end of a line as the last character of a regular expression.
6. \ Used to escape a metacharacter.
7. + Matches one or more occurrences of the preceding regular expression.
For example, [0-9]+ matches 12 and 9 but not an empty string.
8. ? Matches zero or one occurrence of the preceding regular expression.
9. | Matches either the preceding regular expression or the following expression. For example,
are|is|because matches any one of the three words.
10. / Matches the preceding regular expression but only if it is followed by the regular
expression after the slash.
11. ( ) Groups a series of regular expressions together into a new regular expression.
12. Blanks between tokens are optional, with the exception that keywords must be
surrounded by blanks, newlines, the beginning of the program, or the final dot.
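
Several of these operators can be tried out directly from C with the POSIX regular expression routines regcomp() and regexec(). This is a hedged sketch, not part of Lex: the helper name matches_whole is hypothetical, and POSIX extended regular expressions share the operators . * [ ] ^ $ + ? | ( ) with Lex but do not support Lex-only features such as the trailing context operator /.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches the extended regular expression
   pattern, 0 if it does not, and -1 if the pattern cannot be compiled.
   matches_whole is a hypothetical helper written for this sketch. */
int matches_whole(const char *pattern, const char *text)
{
    char anchored[256];
    regex_t re;
    int rc;

    /* Wrap the pattern as ^(pattern)$ so it must cover the entire text. */
    if (snprintf(anchored, sizeof anchored, "^(%s)$", pattern)
            >= (int)sizeof anchored)
        return -1;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For instance, matches_whole("[0-9]+", "12") returns 1 while matches_whole("[0-9]+", "") returns 0, which agrees with the description of + in item 7.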


PART-A

LEX and YACC Programs


Execute the following programs using LEX:
1a. Program to count the number of characters, words, spaces and lines in a given
input file.
%{
int ch=0,sp=0,wd=0,ln=0;
%}
%%
[\n] {ln++;} /*For counting the lines*/
[^\t\n ]+ {ch+=yyleng,wd++;} /*For the characters and words*/
" " {sp++;} /*For counting the spaces*/
%%
int main(int argc,char *argv[])
{
++argv,--argc;
if(argc>0)
yyin=fopen(argv[0],"r");
else
yyin=stdin;
yylex();
printf("Number of characters:%d\n",ch);
printf("Number of spaces:%d\n",sp);
printf("Number of words:%d\n",wd);
printf("Number of lines:%d\n",ln);
}

Procedure for Execution

Save the lex program with a .l extension.
Create a data file using an editor and type in its content:
[root@localhost ~]# vi data.txt
December
August
Then build and run the lexer:
[root@localhost ~]# lex filename.l
[root@localhost ~]# cc lex.yy.c -ll
[root@localhost ~]# ./a.out data.txt

OUTPUT
Number of characters:14
Number of spaces:0
Number of words:2
Number of lines:2


Theoretical Explanation
The first section, the definition section, introduces any initial C program code we want copied
into the final program, i.e., it declares the variables that count characters, lines, spaces and words.
%{
int ch=0,ln=0,sp=0,wd=0;
%}
The next section is the rules section. Each rule is made of two parts: a pattern and an action,
separated by white space. The lexer that lex generates will execute the action when it
recognizes the pattern.
1. To count the lines the pattern is [\n]. After the lexer recognizes the [\n] pattern it
performs the action. Here the action is to increment the line count, so the action is
represented by {ln++;}
2. To count the spaces the pattern is " ". After the lexer recognizes the pattern it performs
the action. Here the action is to increment the space count, so the action is represented by
{sp++;}
3. To count the words and characters the regular expression is [^ \t\n]+. Here ^, as the first
character inside the brackets, negates the character class, so [^ \t\n] matches any character
other than a space, tab or newline. The + symbol matches one or more occurrences of the
preceding regular expression, so the whole pattern matches one word. Lex provides an
internal variable yyleng which contains the length of the string the lexer recognized. So the
action adds yyleng to the character count and increments the word count:
{ch+=yyleng,wd++;}
The next section is the procedures section. When a program is invoked, execution starts
from the function main( ). Parameters can be passed to main( ) whenever the program is
invoked; these are called command line parameters.
To access the command line parameters the function main should have the following format:
Syntax:
main(int argc, char *argv[ ])
{
--------
}

The function main( ) takes two arguments, argc and argv, where argc is an integer
variable and argv is an array of strings. argc indicates the number of parameters
passed and argv holds the parameters themselves; argv[0] is the name of the program, which
is why the program skips it with ++argv,--argc before using the file name.
A file must be opened before it is read or written.
Syntax: fopen(char *filename, char *mode)
Return values
-file pointer if successful.
-NULL if unsuccessful.
A lex lexer reads its input from the standard I/O file yyin. The default value of yyin is stdin,
since the default input source is standard input. If you want to change the source you must
assign to yyin explicitly.
yylex(): You call yylex( ) to start or resume scanning. If a lex action does a return to
pass a value to the calling program, the next call to yylex( ) will continue from the point
where it left off. All the code in the rules section is copied into yylex( ).
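
For comparison, the same counting can be written by hand in plain C. This is a minimal sketch that assumes the input is already in memory as a string; the names counts and count_text are hypothetical and are not produced by lex.

```c
/* Counters corresponding to ch, wd, sp and ln in program 1a. */
struct counts { int ch, wd, sp, ln; };

/* Count as the three lex rules do: a maximal run of characters other than
   space, tab and newline is one word, and each of its characters counts. */
struct counts count_text(const char *s)
{
    struct counts c = {0, 0, 0, 0};
    int in_word = 0;              /* 1 while inside a run matched by [^\t\n ]+ */

    for (; *s != '\0'; s++) {
        if (*s == '\n') {         /* rule [\n] */
            c.ln++;
            in_word = 0;
        } else if (*s == ' ') {   /* rule " " */
            c.sp++;
            in_word = 0;
        } else if (*s == '\t') {  /* tabs match no rule and are not counted */
            in_word = 0;
        } else {                  /* rule [^\t\n ]+ */
            c.ch++;
            if (!in_word) {
                c.wd++;
                in_word = 1;
            }
        }
    }
    return c;
}
```

Calling count_text("December\nAugust\n") gives 14 characters, 2 words, 0 spaces and 2 lines, the same figures as the sample run of the lex program.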


1b. Program to count the numbers of comment lines in a given C program. Also
eliminate them and copy the resulting program into separate file.
%{
int comment=0;
%}
%%
"/*"[\n]*.*[\n]*"*/" {comment++;} /*For counting the comment*/
"/*"[\"*/"]* { fprintf(yyout," "); } /*For eliminate */
%%
int main()
{
char infile[256],outfile[256];
printf("Enter the input filename:\n");
scanf("%s",infile);
printf("Enter the output filename:\n");
scanf("%s",outfile);
yyin=fopen(infile,"r");
yyout=fopen(outfile,"w");
yylex();
printf("Number of comment lines in the given file: %d\n",comment);
}

Procedure for Execution

Save the lex program with a .l extension.
Create a C file using an editor and write a program that contains comment lines:
[root@localhost ~]# vi sum.c
/*To find the sum of two numbers*/
#include<stdio.h>
void main()
{
int a=1,b=2;
printf("%d",a+b);
/*To display the sum*/
}
[root@localhost ~]# lex filename.l
[root@localhost ~]# cc lex.yy.c -ll
[root@localhost ~]# ./a.out

OUTPUT
Enter the input filename:
sum.c
Enter the output filename:
add.c
Number of comment lines in the given file: 2
Finally, view the content of the output file using the command
[root@localhost ~]# cat add.c

#include<stdio.h>
void main()
{
int a=1,b=2;
printf("%d",a+b);
}


Theoretical Explanation
The first section, the definition section, introduces any initial C program code we want copied
into final program. i.e. Take a variable for representing comment line.
%{
int comment=0;
%}
The next section is the rules section. Each rule is made of two parts: a pattern and an action,
separated by white space. The lexer that lex generates will execute the action when it
recognizes the pattern.
1. A C style comment is enclosed between the characters /* and */. To count the
comments the pattern is "/*"[\n]*.*[\n]*"*/". After the lexer recognizes this pattern it
performs the action. Here the action is to increment the comment count, so the action is
represented by {comment++;}. Because this action does not echo the matched text, the
comment does not reach the output file. Note that . (dot) matches any character except the
newline, so the pattern covers a comment whose text lies on one line, optionally with
newlines just after /* and just before */.
2. The second pattern is "/*"[\"*/"]*. Inside the brackets the metacharacter \ escapes the
character that follows it. When the lexer recognizes this pattern, the action
fprintf(yyout," ") writes a blank to the output file in its place. All input that matches neither
pattern is copied to the output file unchanged.
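
The two lex patterns above handle comments whose text fits on one line. As a hedged illustration of the general task, the following hand-written C routine removes every /* ... */ comment, including those that span several lines. The name strip_comments is hypothetical, and the sketch assumes that /* never occurs inside a string or character literal.

```c
/* Copy src to dst without its C comments and return how many comments
   were removed. dst must have room for at least strlen(src) + 1 bytes.
   Assumption: comment delimiters inside string literals are not handled. */
int strip_comments(const char *src, char *dst)
{
    int comments = 0;

    while (*src != '\0') {
        if (src[0] == '/' && src[1] == '*') {
            src += 2;                          /* skip the opening delimiter */
            while (*src != '\0' && !(src[0] == '*' && src[1] == '/'))
                src++;                         /* skip the comment text */
            if (*src != '\0')
                src += 2;                      /* skip the closing delimiter */
            comments++;                        /* unterminated ones count too */
        } else {
            *dst++ = *src++;                   /* ordinary program text */
        }
    }
    *dst = '\0';
    return comments;
}
```

Unlike the lex program, this routine counts comments rather than comment lines, so a comment that spans three lines still adds one to the count.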

fprintf() function
The function is similar to printf( ) except that it takes a file pointer as its first argument.
The prototype of fprintf is:
Syntax : fprintf(fp, control string, list)
fp : file pointer associated with the file.



2a. Program to recognize a valid arithmetic expression and to recognize the identifiers
and operators present. Print them separately.
%{
int count=0,ids=0,bracket=0;
%}
%%
[+] {printf("+");count++;}
[-] {printf("-");count++;}
[*] {printf("*");count++;}
[/] {printf("/");count++;}
[a-zA-Z0-9]+ {ids++;} /*For recognizing the identifiers*/
[(] {bracket++;}
[)] {bracket--;}
%%
int main()
{
printf("Enter the Arithmetic expression:\n");
yylex();
printf("Number of Operators=%d\n",count);
printf("Number of Identifiers=%d\n",ids);
if(count>=ids||bracket!=0||ids==1)
printf("Invalid expression\n");
else
printf("Valid expression\n");
}

Procedure for Execution

Save the lex program with a .l extension.
[root@localhost ~]# lex filename.l
[root@localhost ~]# cc lex.yy.c -ll
[root@localhost ~]# ./a.out

OUTPUT
Enter the Arithmetic expression:
2+3*4
+* (Press Ctrl d)
Number of Operators=2
Number of Identifiers=3
Valid expression

Enter the Arithmetic expression:
2+3*4+*- (Press ctrl d)
Number of Operators=3
Number of Identifiers=3
Invalid expression

Theoretical Explanation
The first section, the definition section, introduces any initial C program code we want
copied into final program. i.e. Take variables for representing the operators, identifiers and
for brackets.
%{
int count=0,ids=0,bracket=0;
%}

The next section is the rules section. Each rule is made of two parts: a pattern and an action,
separated by white space. The lexer that lex generates will execute the action when it
recognizes the pattern.
1. To recognize the operators the patterns are [+], [-], [*] and [/]. After the lexer recognizes
one of these patterns it performs the action. Here the action is to display the operator and to
count it, for example {printf("+");count++;}
2. To recognize the identifiers the pattern is [a-zA-Z0-9]+. After the lexer recognizes the pattern
it performs the action. Here the action is to increment the identifier count.
3. To recognize the brackets the patterns are [(] and [)]. After the lexer recognizes the pattern it
performs the action. Here the actions are to increment and to decrement the bracket
count respectively, so balanced brackets leave the count at zero.
The function main( ) reports the expression as invalid when the number of operators is not
smaller than the number of identifiers, when the brackets are unbalanced, or when there is
only one identifier.



2b. Program to recognize whether a given sentence is simple or compound.
%{
int flag=0;
%}
%%
" and " |
" or " |
" but " |
" because " |
" than " |
" nevertheless " {flag=1;} /*Words which lead to a compound statement*/

%%
int main()
{
printf("Enter the sentence:\n");
yylex();
if(flag==1)
printf("Given sentence is compound statement\n");
else
printf("Given sentence is simple statement\n");
}

Procedure for Execution

Save the lex program with a .l extension.
[root@localhost ~]# lex filename.l
[root@localhost ~]# cc lex.yy.c -ll
[root@localhost ~]# ./a.out

OUTPUT
Enter the sentence:
abc is alphabet
abc is alphabet (Press Ctrl d)
Given sentence is simple statement
[root@localhost ~]# ./a.out
Enter the sentence:
abc or 123 are not equal
abc123 are not equal (Press Ctrl d)
Given sentence is compound statement

Theoretical Explanation
The first section, the definition section, introduces any initial C program code we want
copied into final program. i.e. Take variables for representing the words which leads to a
compound statement.
%{
int flag=0;
%}

The next section is the rules section. Each rule is made of two parts: a pattern and an action,
separated by white space. The lexer that lex generates will execute the action when it
recognizes the pattern.
1. The patterns list a few words which lead to a compound statement. The | after each
pattern means that it shares the action of the next rule. After the lexer recognizes one of
these words in the given sentence it performs the action, which sets flag to 1.


3. Program to recognize and count the number of identifiers in a given input file.
%{
int ids=0;
%}
%%
(([0-9]+[a-zA-Z]*)|([_][0-9]*)|([`!@#$%^&*-+=][a-zA-Z]+)) {;}
([a-zA-Z]|[_])(([0-9]|[a-zA-Z]|[_])*) {printf("%s\t",yytext);ids++;}
%%
int main(int argc,char *argv[])
{
--argc,++argv;
if(argc>0)
yyin=fopen(argv[0],"r");
printf("Identifiers in the given file are\n");
yylex();
printf("Number of identifiers are %d",ids);
}

Procedure for Execution

Save the lex program with a .l extension.
Create a data file using an editor and type in its content:
[root@localhost ~]# vi data.txt
bangalore
_bangalore
12bangalore
+bangalore
banga_lore
[root@localhost root]# lex filename.l
[root@localhost root]# cc lex.yy.c -ll
[root@localhost root]# ./a.out data.txt

OUTPUT
Identifiers in the given file are
bangalore
_bangalore
banga_lore
Number of identifiers are 3


Theoretical Explanation
The first section, the definition section, introduces any initial C program code we want
copied into final program. i.e. Take variables for representing the identifiers.
%{
int ids=0;
%}

The next section is the rules section. Each rule is made of two parts: a pattern and an action,
separated by white space. The lexer that lex generates will execute the action when it
recognizes the pattern.
1. Invalid identifiers start with a number or a symbol. To recognize an invalid identifier that
starts with a number the pattern is ([0-9]+[a-zA-Z]*). To recognize an invalid identifier that
consists of an underscore followed only by digits the pattern is ([_][0-9]*). To
recognize an invalid identifier that starts with a symbol the pattern is
([`!@#$%^&*-+=][a-zA-Z]+).

So the combined regular expression is

(([0-9]+[a-zA-Z]*)|([_][0-9]*)|([`!@#$%^&*-+=][a-zA-Z]+)).

After the lexer recognizes the pattern it performs the action. Here the matched text is simply
discarded so that it does not appear in the output, so the action is just { ; }.
2. A valid identifier starts with a letter or an underscore, followed by any number of letters,
digits or underscores. To recognize valid identifiers the pattern is
([a-zA-Z]|[_])(([0-9]|[a-zA-Z]|[_])*). After the lexer recognizes the pattern it performs the
action. Here the action is to display the valid identifier and to count it.
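
The rule for a valid identifier can also be checked by a few lines of hand-written C. This is a minimal sketch for a single word held in a string; the function name is_valid_identifier is hypothetical.

```c
#include <ctype.h>

/* Return 1 if s is a letter or underscore followed by any number of
   letters, digits or underscores, and 0 otherwise. */
int is_valid_identifier(const char *s)
{
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return 0;                  /* bad first character or empty string */
    for (s++; *s != '\0'; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return 0;              /* a later character is not allowed */
    return 1;
}
```

With the words of the sample data file, bangalore, _bangalore and banga_lore are accepted while 12bangalore and +bangalore are rejected.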

yytext
Whenever a lexer matches a token, the text of the token is stored in the null terminated string
yytext. In some implementations of lex, yytext is a character array declared by:
extern char yytext[ ];
The contents of yytext are replaced each time a new token is matched. If yytext[ ] is an
array, any token which is longer than yytext will overflow the end of the array and cause the
lexer to fail in some hard to predict way. In AT&T lex, the standard size for yytext[ ] is 200
characters.


2. YACC Introduction
The UNIX utility yacc (Yet Another Compiler Compiler) parses a stream of tokens, typically
generated by lex, according to a user-specified grammar.

2.1 Structure of a yacc file


A yacc file looks much like a lex file:
definitions
%%
rules
%%
code
Definitions: All code between %{ and %} is copied to the beginning of the resulting C file.
Rules: A number of combinations of pattern and action; if the action is more than a single
command it needs to be in braces.
Code: This can be very elaborate, but the main ingredient is the call to yyparse, the parser,
which in turn calls yylex, the lexical analyzer. If the code segment is left out, a default main
from the yacc library is used, which only calls yyparse.

Definition section
There are three things that can go in the definitions section:
C code: Any code between %{ and %} is copied to the C file. This is typically used for
defining file variables, and for prototypes of routines that are defined in the code segment.
Definitions: The definition section of a lex file was concerned with characters; in yacc this is
tokens.
Example: %token NUMBER
These token definitions are written to a .h file when yacc compiles this file.
Associativity rules: These handle the associativity and priority of operators.

2.2 Lex Yacc interaction


Conceptually, lex parses a file of characters and outputs a stream of tokens; yacc accepts a
stream of tokens and parses it, performing actions as appropriate. In practice, they are more
tightly coupled.
If your lex program is supplying a tokenizer, the yacc program will repeatedly call the yylex
routine. The lex rules will probably function by calling return every time they have parsed a
token.
If lex is to return tokens that yacc will process, they have to agree on what tokens there are.
This is done as follows:
For example
1. The yacc file will have the token definition %token NUMBER in the definitions section.
2. When the yacc file is translated with yacc -d, a header file y.tab.h is created that has
definitions like
#define NUMBER 258
3. The lex file can then include y.tab.h and execute return NUMBER, and the yacc program can
match on this token.

2.3 Rules section


The rules section contains the grammar of the language you want to parse. This looks like
statement : INTEGER '=' expression
| expression
;
expression : NUMBER '+' NUMBER
| NUMBER '-' NUMBER
;

This is the general form of context-free grammars, with a set of actions associated with each
matching right-hand side. It is a good convention to keep non-terminals (names that can be
expanded further) in lower case and terminals (the symbols that are finally matched) in upper
case.
The terminal symbols get matched with return codes from the lex tokenizer. They are
typically defines coming from %token definitions in the yacc program, or character values.

2.4 Compiling and running a simple parser


On a UNIX system, yacc takes your grammar and creates y.tab.c, the C language parser, and
y.tab.h, the include file with the token number definitions.
Lex creates lex.yy.c, the C language lexer. You need only compile them together with the yacc
and lex libraries. The libraries contain usable default versions of all of the supporting
routines, including a main( ) that calls the parser yyparse( ) and exits.
[root@localhost ~]# lex filename.l               #makes lex.yy.c
[root@localhost ~]# yacc -d filename.y           #makes y.tab.c and y.tab.h
[root@localhost ~]# cc lex.yy.c y.tab.c -ll      #compile and link C files
[root@localhost ~]# ./a.out


Execute the following programs using YACC:
4a. Program to recognize a valid arithmetic expression that uses operators +, -, * and
/.
Lex part
%{
#include "y.tab.h"
%}
%%
[a-zA-Z][a-zA-Z0-9]* {return ID;}
[0-9]+ {return NUMBER;}
. {return yytext[0];}
\n {return 0;} /*Logical EOF*/

%%

Yacc part
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token NUMBER ID /*token definition*/
%left '+' '-' /*Operator precedences*/
%left '*' '/' /*Operator precedences*/
%%
expr : expr '+' expr /*Grammar*/
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '(' expr ')'
| NUMBER
| ID
;

%%
int main()
{
printf("Enter the Expression\n");
yyparse();
printf("Valid Expression\n");
}
int yyerror()
{
printf("Expression is invalid\n");
exit(0);
}

Procedure for Execution

Save the lex program with a .l extension.
Save the yacc program with a .y extension.
[root@localhost ~]# lex filename.l
[root@localhost ~]# yacc -d filename.y
[root@localhost ~]# cc lex.yy.c y.tab.c -ll
[root@localhost ~]# ./a.out

OUTPUT
Enter the Expression
+23
Expression is invalid

Enter the Expression


2+3-4
Valid Expression

Theoretical Explanation


LEXER(lex code)
The parser needs a lexer to feed it tokens. The yyparse( ) parser is the high level routine, and
calls the lexer whenever it needs a token from the input. As soon as the lexer finds a token of
interest to the parser, it returns to the parser, returning the token code as its value. Yacc
defines the token names in the parser as C preprocessor names in y.tab.h so the lexer can use them.
1. Strings of digits are numbers. To match a string of digits the pattern is [0-9]+. The action
is to return the token NUMBER.
2. Strings of letters and digits that start with a letter are identifiers. To match an identifier
the pattern is [a-zA-Z][a-zA-Z0-9]*. The action is to return the token ID.
3. . (dot) matches any single character except the newline. Whenever a lexer matches a token,
the text of the token is stored in the null terminated string yytext. This rule says to return any
character otherwise not handled as a single character token to the parser. Character tokens are
usually punctuation such as parentheses, semicolons and single-character operators. If the
parser receives a token that it doesn't know about, it generates a syntax error, so this rule lets
you handle all of the single-character tokens easily while letting yacc's error checking catch
and complain about invalid input.
4. A newline (\n) character returns an end of input token (number zero) to tell the parser that
there is no more to read.

PARSER(Yacc code)
The token definition for the numbers and identifiers is

%token NUMBER ID

Yacc lets you specify the operator precedences explicitly.

%left '+' '-'
%left '*' '/'
Each of these declarations defines a level of precedence. They state that + and - are left
associative and have the lower precedence level, while * and / are left associative and have
the higher precedence level.
The grammar for a valid expression is as follows
expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '(' expr ')'
| NUMBER
| ID
;

yyparse(): The entry point to a yacc-generated parser is yyparse( ). Whenever your
program calls yyparse( ), the parser attempts to parse an input stream. The parser returns a
value of zero if the parse succeeds and non-zero if not.

yyerror(): Whenever a yacc parser detects a syntax error, it calls yyerror( ) to report the
error to the user.



4b. Program to recognize a valid variable, which starts with a letter, followed by any
number of letters or digits.
Lex part
%{
#include"y.tab.h"
%}
%%
[0-9] {return DIG;}
[a-z] {return LET;}
. {return yytext[0];}
\n {return 0;} /*Logical EOF*/

%%

Yacc part
%token LET
%token DIG
%%
stmt:id {printf("Valid identifier \n");}
;
id: letter next
| letter {;}
;
next: letter next
| digit next
| letter
| digit {;}
;
letter: LET {;}
;
digit: DIG {;}
;
%%
int main()
{
printf("Enter an identifier:");
yyparse();
}
int yyerror()
{
printf("Not a valid identifier\n");
exit(0);
}

Procedure for Execution

Save the lex program with a .l extension.
Save the yacc program with a .y extension.
[root@localhost ~]# lex filename.l
[root@localhost ~]# yacc -d filename.y
[root@localhost ~]# cc lex.yy.c y.tab.c -ll
[root@localhost ~]# ./a.out

OUTPUT
Enter an identifier:ab12
Valid identifier

Enter an identifier:12dc
Not a valid identifier

Theoretical Explanation
LEXER(lex code)
1. To match a digit the pattern is [0-9]. The action is to return the token DIG.
2. To match a letter the pattern is [a-z]. The action is to return the token LET.

PARSER
The grammar to recognize a valid identifier
statement: id
;
id: letter next
| letter {;}
;
next: letter next
| digit next
| letter
| digit {;}
;
letter: LET {;}
;
digit: DIG {;}



5a. Program to evaluate an arithmetic expression involving operators +, -, * and /.
Lex Part
%{
#include <stdlib.h>
#include"y.tab.h"
extern int yylval;
%}
%%
[0-9]+ {yylval=atoi(yytext); return(NUM);}
[ \t] ; /*Ignore the whitespace*/
. {return yytext[0];}
\n {return 0;}

%%

Yacc part
%token NUM
%left '+''-'
%left '*''/'
%%
stmt : expr { printf("Result:%d\n",$1);return 0; }
;
expr : expr '+' expr {$$=$1+$3;}
| expr '-' expr {$$=$1-$3;}
| expr '*' expr {$$=$1*$3;}
| expr '/' expr {$$=$1/$3;}
| '(' expr ')' {$$=$2;}
| NUM {$$=$1;}
;
%%
int main()
{
printf("Enter the expression\n");
yyparse();
}
int yyerror()
{
printf("Invalid input\n");
exit(0);
}

Procedure for Execution

Save the lex program with a .l extension.
Save the yacc program with a .y extension.
[root@localhost ~]# lex filename.l
[root@localhost ~]# yacc -d filename.y
[root@localhost ~]# cc lex.yy.c y.tab.c -ll
[root@localhost ~]# ./a.out

OUTPUT
Enter the expression
2+3
Result:5

Enter the expression
2*3+4
Result:10

Theoretical Explanation
Lexer
1. A string of digits is a number; whitespace is ignored. Whenever the lexer returns a token to
the parser, if the token has an associated value, the lexer must store the value in yylval before
returning. We explicitly declare yylval.

Parser
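The grammar is as follows
expr : expr '+' expr {$$=$1+$3;}
| expr '-' expr {$$=$1-$3;}
| expr '*' expr {$$=$1*$3;}
| expr '/' expr {$$=$1/$3;}
| '(' expr ')' {$$=$2;}
| NUM {$$=$1;}
;
In each action $$ is the value of the left-hand side, and $1, $2, $3 are the values of the first,
second and third symbols of the right-hand side. A parenthesized expression therefore takes
the value of the expression inside the parentheses, which is $2.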
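
To see what the generated parser computes, the same evaluation can be sketched as a hand-written recursive-descent evaluator in C. This is a different technique from the table-driven parser that yacc produces, shown only as an illustration: the precedence of * and / over + and - is expressed by splitting the grammar into expr, term and factor instead of by %left declarations, and the names evaluate, parse_expr, parse_term and parse_factor are hypothetical.

```c
#include <ctype.h>

static const char *pos;   /* next unread character of the expression */
static int ok;            /* cleared to 0 when a syntax error is found */

static int parse_expr(void);

static void skip_blanks(void)
{
    while (*pos == ' ' || *pos == '\t')
        pos++;
}

/* factor : NUM | '(' expr ')' */
static int parse_factor(void)
{
    int value = 0;

    skip_blanks();
    if (*pos == '(') {
        pos++;
        value = parse_expr();
        skip_blanks();
        if (*pos == ')')
            pos++;
        else
            ok = 0;                        /* missing closing parenthesis */
    } else if (isdigit((unsigned char)*pos)) {
        while (isdigit((unsigned char)*pos))
            value = value * 10 + (*pos++ - '0');
    } else {
        ok = 0;                            /* operand expected */
    }
    return value;
}

/* term : factor { ('*' | '/') factor }   (left associative) */
static int parse_term(void)
{
    int value = parse_factor();

    for (;;) {
        skip_blanks();
        if (*pos == '*') {
            pos++;
            value *= parse_factor();
        } else if (*pos == '/') {
            int divisor;
            pos++;
            divisor = parse_factor();
            if (divisor == 0)
                ok = 0;                    /* treat division by zero as an error */
            else
                value /= divisor;
        } else {
            return value;
        }
    }
}

/* expr : term { ('+' | '-') term }   (left associative) */
static int parse_expr(void)
{
    int value = parse_term();

    for (;;) {
        skip_blanks();
        if (*pos == '+') {
            pos++;
            value += parse_term();
        } else if (*pos == '-') {
            pos++;
            value -= parse_term();
        } else {
            return value;
        }
    }
}

/* Evaluate text; return 1 and store the value in *result on success,
   or return 0 if the expression is not valid. */
int evaluate(const char *text, int *result)
{
    pos = text;
    ok = 1;
    *result = parse_expr();
    skip_blanks();
    if (*pos != '\0')
        ok = 0;                            /* unread input left over */
    return ok;
}
```

For the two sample inputs, evaluate("2+3", &r) stores 5 and evaluate("2*3+4", &r) stores 10 in r.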



5b. Program to recognize strings aaab, abbb, ab and a using the grammar
(a^n b^n, n >= 0).
Lex part
%{
#include"y.tab.h"
%}
%%
a {return A;}
b {return B;}
. {return yytext[0];}
\n {return yytext[0];}

%%

Yacc part
%token A B
%%
str : s '\n' {return 0;}
;
s : A s B
|
;

%%
int main()
{
printf("Type the string\n");
yyparse();
printf("Valid string");
}
int yyerror()
{
printf("Invalid string");
exit(0);
}

Procedure for Execution

Save the lex program with a .l extension.
Save the yacc program with a .y extension.
[root@localhost ~]# lex filename.l
[root@localhost ~]# yacc -d filename.y
[root@localhost ~]# cc lex.yy.c y.tab.c -ll
[root@localhost ~]# ./a.out

OUTPUT
Type the string
aaabbb
Valid string

Type the string


aaab
Invalid string

Theoretical Explanation
LEXER
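1. To match the string a the pattern is a. The action is to return the token A.
2. To match the string b the pattern is b. The action is to return the token B.

PARSER
The grammar is as follows:
str : s '\n'
;
s : A s B
|
;
The empty alternative of s corresponds to n = 0, and each use of A s B adds one a in front
and one b behind, so s generates exactly the strings with n a's followed by n b's.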
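
The language accepted by this grammar can be cross-checked with a short hand-written C function that simply counts the two runs of letters. This is a sketch for illustration; the name is_anbn is hypothetical and the function works on a string without the trailing newline.

```c
/* Return 1 if s consists of n a's followed by exactly n b's (n >= 0),
   and 0 otherwise. */
int is_anbn(const char *s)
{
    int a = 0, b = 0;

    while (*s == 'a') {        /* leading run of a's */
        a++;
        s++;
    }
    while (*s == 'b') {        /* following run of b's */
        b++;
        s++;
    }
    return *s == '\0' && a == b;   /* nothing may follow, counts must agree */
}
```

As in the sample run, aaabbb is accepted and aaab is rejected.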



6. Program to recognize the grammar (a^n b, n >= 10).
Lex part
%{
#include"y.tab.h"
%}
%%
a {return A;}
b {return B;}
. {return yytext[0];}
\n {return yytext[0];}

%%

Yacc Part
%token A B
%%
str: s'\n' {return 0;}
s : x B ;
x : x A
| ;
%%
int main()
{
printf("Type the string\n");
yyparse();
printf("Valid string");
}
int yyerror()
{
printf("Invalid string");
exit(0);
}

Procedure for Execution

Save the lex program with a .l extension.
Save the yacc program with a .y extension.
[root@localhost ~]# lex filename.l
[root@localhost ~]# yacc -d filename.y
[root@localhost ~]# cc lex.yy.c y.tab.c -ll
[root@localhost ~]# ./a.out

OUTPUT
Type the string
aaaaab
Valid string

Type the string


aabb
Invalid string

Theoretical Explanation
LEXER
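1. To match the string a the pattern is a. The action is to return the token A.
2. To match the string b the pattern is b. The action is to return the token B.

PARSER
The grammar is as follows:
str : s '\n'
;
s : x B
;
x : x A
|
;
As written, x derives any number of A tokens, including none, so this grammar accepts
a^n b for every n >= 0; the sample run above accordingly accepts aaaaab.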
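
The lower bound on n stated in the title can be made explicit in a small hand-written C check. This is only a sketch: the function name is_anb and the parameter min_n are hypothetical, and passing 0 for min_n reproduces what the grammar above accepts.

```c
/* Return 1 if s consists of n a's followed by a single b with n >= min_n,
   and 0 otherwise. */
int is_anb(const char *s, int min_n)
{
    int n = 0;

    while (*s == 'a') {        /* count the leading a's */
        n++;
        s++;
    }
    /* exactly one b must follow, and then the string must end */
    return n >= min_n && s[0] == 'b' && s[1] == '\0';
}
```

With min_n set to 0 the strings of the sample run behave as shown there (aaaaab is valid, aabb is invalid); with min_n set to 10, aaaaab is rejected because it has only five a's.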



INTRODUCTION TO SHELL PROGRAMMING
A set of commands that are taken together as a single unit within a file and executed at a
stretch is called a shell program or a shell script.
A shell script is named just like all other files. However, by convention a shell script name uses
the .sh extension. A shell program runs in the interpretive mode, that is, one statement is
executed at a time.
Example:
#!/bin/sh
echo "Today's date is : `date`"     #to display the date
echo "My shell : $SHELL"
example.sh
The first line is the interpreter line. Here, this line specifies the shell we are using, i.e. the
Bourne shell.
A shell script is executed by using the shell command sh as shown below.
[root@localhost ~]# sh example.sh
Today's date is : Sat Jan 27 09:10:18 IST 2004
My shell : /bin/sh
or else, after giving the script execute permission with chmod +x example.sh, use
[root@localhost ~]# ./example.sh

3.1 Comments
In shell scripts comments are written using the hash (#) character as the first character of the
comment line.
3.2 The read Command
The read command or statement is the shell's internal tool for taking input from the user,
i.e., making scripts interactive. It is used with one or more variables. Input supplied through
the standard input is read into the variables.
#!/bin/sh
echo Enter the value of x
read x
echo The value of x is : $x
value.sh
When you use a statement like read x, the script pauses at that point to take input from the
keyboard. Whatever you enter is stored in the variable x. Since this is a form of assignment,
no $ is used before x. To display the value we have to use the $ symbol along with the variable name.
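The use of read with more than one variable can be sketched as follows. The function name read_pair and the sample input are hypothetical and are not part of the manual; a here-document stands in for keyboard input so that the example runs unattended.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): read one line from standard
# input and split it into two variables.
read_pair() {
    # The first word goes to "first"; everything left over goes to "rest".
    read first rest
    echo "first=$first rest=$rest"
}

# A here-document supplies the line that would normally be typed.
read_pair <<EOF
alpha beta gamma
EOF
```

When there are more words than variables, the last variable receives the remainder of the line.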

3.3 Special parameters Used by Shell.

Variable        Significance
$#              Number of arguments specified in command line
$*              Complete set of positional parameters as a single string
$0              Name of the file or executed command
$1,$2,$3...     Positional parameters representing command line arguments
$?              Exit status of last executed command
$$              PID of current shell
$!              PID of the last background job

3.4 Using command line


#!/bin/sh
echo The program Name:$0
echo The number of arguments specified is: $#
echo The arguments are :$*
echo Value of $1 and $2
command.sh
[root@localhost ~]# sh command.sh Nokia Motorola    #command line
The program Name:command.sh
The number of arguments specified is: 2
The arguments are :Nokia Motorola
Value of Nokia and Motorola
The first argument is read by the shell into the parameter $1, the second argument into $2. We
can use more positional parameters in this way up to $9 (and, using the shift command, you
can go beyond).
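Going beyond $9 with shift can be sketched as below. The function name print_all and the argument values are hypothetical; each shift discards $1 and moves every remaining argument down by one position, so $1 always holds the next unprocessed argument.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): print every argument on its
# own line, however many there are.
print_all() {
    # $# shrinks by one on every shift, so the loop ends when no
    # arguments are left.
    while [ $# -gt 0 ]
    do
        echo "$1"
        shift
    done
}

# Eleven arguments, i.e. more than $1..$9 can address directly.
print_all a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11
```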
3.5 The if Conditional
This is the simplest of all the branching control structures. It has the following general
formats:

Format 1
if command is successful
then
execute commands
fi

Format 2
if command is successful
then
execute commands
else
execute commands
fi

Format 3
if command is successful
then
execute commands
elif command is successful
then
commands
else
commands
fi

Every if is closed with a corresponding fi, and you'll encounter an error if one is not present.
If command succeeds, the sequence of commands following it is executed. If command
fails, then the else statement (if present) is executed.
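Format 3 can be illustrated with a small sketch in which the exit status of grep drives the branches. The function name classify and the patterns are hypothetical; grep -q prints nothing and only reports through its exit status whether the pattern matched.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): classify a string using the
# exit status of grep as the condition of if and elif.
classify() {
    if echo "$1" | grep -q '^[0-9][0-9]*$'
    then
        echo "number"
    elif echo "$1" | grep -q '^[A-Za-z][A-Za-z]*$'
    then
        echo "word"
    else
        echo "other"
    fi
}

classify 42
classify abc
classify a1
```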
3.6 Using test and [ ] to evaluate expressions.
When you use if to evaluate expressions, you need the test statement because the true or
false values returned by expressions can't be directly handled by if. test uses certain
operators to evaluate the condition and returns either a true or false exit status, which is then
used by if for making decisions. The form [ expression ] is a shorthand for test expression.
Numerical comparison operators used by test
Operator    Meaning
-eq         Equal to
-ne         Not equal to
-gt         Greater than
-ge         Greater than or equal to
-lt         Less than
-le         Less than or equal to

The operators begin with a - (hyphen), followed by a two-letter string. The operators are quite
mnemonic; -eq implies equal to, -lt less than and so on.
Example:
[root@localhost ~]# a=8 ; b=9                       # ; is used to combine the commands
[root@localhost ~]# test $a -eq $b ; echo $?
1                                                   # False
[root@localhost ~]# test $a -lt $b ; echo $?
0                                                   # True
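The numerical operators can be combined with if as in the following sketch, which uses both the [ ] form and the test form. The function name compare is hypothetical and not part of the manual.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): compare two integers with the
# numerical operators of test.
compare() {
    if [ "$1" -eq "$2" ]        # bracket form
    then
        echo "equal"
    elif test "$1" -lt "$2"     # the same check written with test
    then
        echo "less"
    else
        echo "greater"
    fi
}

compare 8 9
compare 9 9
compare 10 9
```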

3.7 expr :Computation


expr can perform basic arithmetic operations (+, -, *, /, %).
[root@localhost ~]# a=2 ; b=9
[root@localhost ~]# c=`expr $a + $b` ; echo $c
11
The operands must be enclosed on either side by whitespace. For multiplication we have to
use \* (escaping technique) to prevent the shell from interpreting * as a metacharacter.
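The escaping of * can be seen in a short sketch. The function names product and remainder are hypothetical; the arguments are quoted only to keep the helpers safe for arbitrary input.

```shell
#!/bin/sh
# Hypothetical helpers (not from the manual) built on expr.
product() {
    # \* stops the shell from expanding * into file names.
    expr "$1" \* "$2"
}
remainder() {
    expr "$1" % "$2"
}

product 2 9      # prints 18
remainder 9 2    # prints 1
```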
3.8 while Looping
The while statement repeatedly performs a set of instructions as long as the control command
returns a true exit status. The general syntax is:
while condition is true
do              # do is a keyword
commands
done            # done is a keyword
The commands enclosed by do and done are executed repeatedly as long as condition
remains true.
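A minimal sketch of a while loop, assuming a hypothetical function name countdown: the condition is re-evaluated before every pass, and expr updates the counter as in section 3.7.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): count down from $1 to 1.
countdown() {
    n=$1
    # The body runs as long as [ ... ] returns a true (zero) exit status.
    while [ "$n" -gt 0 ]
    do
        echo "$n"
        n=`expr $n - 1`
    done
}

countdown 3
```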
3.9 The case conditional
In a case statement, the command list whose pattern matches the expression is executed.
The general syntax of the case statement is as follows:
case expression in
pattern1)command1;;
pattern2)command2;;
pattern3)command3;;
...
esac
case first matches expression with pattern1. If the match succeeds, it executes command1. If
the match fails, pattern2 is tried, and so forth. Each command list is terminated with a
pair of semicolons, and the entire construct is closed with esac (reverse of case).
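The following sketch shows case choosing a branch by file name pattern. The function name source_kind and the messages are hypothetical; the final * pattern acts as a default that matches anything left over.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): pick a description from the
# file name extension.
source_kind() {
    case "$1" in
        *.c) echo "C source" ;;
        *.l) echo "Lex specification" ;;
        *.y) echo "Yacc specification" ;;
        *)   echo "unknown" ;;
    esac
}

source_kind lex.yy.c
source_kind filename.l
source_kind notes.txt
```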
3.10 eval command
The eval command makes the shell scan the command line once more, that is, a
second time, and only then execute it.
Example:
[root@localhost ~]# b=a
[root@localhost ~]# c=b
[root@localhost ~]# eval echo \$$c
a
The first two statements in this example are assignment statements. When the shell comes
across the third statement, because of eval, it first scans the statement once for any possible
pre-evaluation or substitution. Here, because of the metacharacter \, the first $ is overlooked and
the next variable $c gets evaluated, resulting in b. After this evaluation the third statement is
equivalent to echo $b. This statement is then executed as usual by the shell, giving a as
the answer.
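The two scans can be sketched with a helper that picks a positional parameter by number. The function name nth_arg is hypothetical; the braces in \${$n} are an addition not used in the example above and are there so that positions above 9 are also read correctly.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): print the argument whose
# position is given as the first parameter.
nth_arg() {
    n=$1
    shift                     # drop the position; $1 is now the first value
    # First scan: \$ stays a literal $ and $n is replaced, giving "${2}".
    # Second scan (done by eval): "${2}" is expanded and echoed.
    eval echo \"\${$n}\"
}

nth_arg 2 x y z    # prints y
```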

PART B
UNIX Programming
1a. Non-recursive shell script that accepts any number of arguments and prints them in the
reverse order (for example, if the script is named rargs, then executing rargs A B C should
produce C B A on the standard output).
#!/bin/sh
c=$#
echo "The arguments in reverse order are:"
while [ $c -ne 0 ]
do
eval echo \$$c
c=`expr $c - 1`
done

Procedures for Execution


Save the file with .sh extension.(Not Mandatory)
[root@localhost ~]# sh filename.sh one two three

OUTPUT
The arguments in reverse order are:
three
two
one
[root@localhost ~]# sh filename.sh A B C
The arguments in reverse order are:
C
B
A

Theoretical Explanation:
The first line is the interpreter line, informing that we are using the Bourne shell. We can use any
other available shell, such as the Korn shell (ksh), the Bash shell (bash) or the C shell (csh).
The special shell parameter $# holds the number of arguments passed on the command line.
Assume you are passing three arguments X Y Z. The value of $# will then be 3.
The while loop repeats its body as long as the control command returns a true exit status.
The eval command makes the shell scan the command line once more, that is, a
second time, and only then execute it. Here, because of the metacharacter \,
the first $ is overlooked and the next variable $c gets evaluated, resulting in 3. After this
evaluation the statement is equivalent to echo $3. So the value of the positional parameter $3 is
displayed, and in the same way the remaining positional parameters are displayed in turn.
With more than nine arguments, \${$c} should be used in place of \$$c, because the shell reads
$10 as $1 followed by 0.
While evaluating the expression with expr, the operands must be enclosed on either side by whitespace.

1b. C program that creates a child process to read commands from the standard input
and execute them (a minimal implementation of a shell like program). You can
assume that no arguments will be passed to the commands to be executed.
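#include<stdio.h>
#include<stdlib.h>
#include<unistd.h>
#include<sys/types.h>
#include<sys/wait.h>
int main(void)
{
    int i = 0;
    char cmd[64];
    pid_t x = fork();               /* To create a child process */
    if (x < 0)
    {
        perror("fork");
        return 1;
    }
    if (x == 0)
    {
        do
        {
            printf("Child process has been created\n");
            printf("Enter the command to be executed\n");
            if (scanf("%63s", cmd) != 1)    /* width limit protects cmd */
                break;
            system(cmd);
            printf("Enter 1 to continue and 0 to exit\n");
            if (scanf("%d", &i) != 1)
                break;
        } while (i != 0);
        exit(0);                    /* the child ends here */
    }
    wait(NULL);                     /* the parent waits for the child */
    return 0;
}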

Procedures for Execution


Save the file with .c extension
[root@localhost ~]# cc filename.c
[root@localhost ~]# ./a.out

OUTPUT
Child process has been created
Enter the command to be executed
date
Tue Jan 20 16:17:01 IST 2009
Enter 1 to continue and 0 to exit
1
Child process has been created
Enter the command to be executed
cal
January 2009
Su Mo Tu We Th Fr Sa
1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31
Enter 1 to continue and 0 to exit

Theoretical Explanation:
All processes in a UNIX system, except the very first process (process 0), which is created by
the system boot code, are created via the fork system call.
The fork system call is used to create a child process. The function prototype of fork is:
#include <unistd.h>
pid_t fork(void);
Returns: 0 in child, process ID of child in parent, -1 on error
The new process created by fork is called the child process. This function is called once but
returns twice. The only difference in the returns is that the return value in the child is 0,
whereas the return value in the parent is the process ID of the new child. The reason the
child's process ID is returned to the parent is that a process can have more than one child, and
there is no function that allows a process to obtain the process IDs of its children. The reason
fork returns 0 to the child is that a process can have only a single parent, and the child can
always call getppid to obtain the process ID of its parent. (Process ID 0 is reserved for use
by the kernel, so it's not possible for 0 to be the process ID of a child.)
System function
The system function executes a command string from within a program. The executed command
shares the standard input and standard output of the calling process. The function prototype of
the system function is:
#include <stdlib.h>
int system(const char *cmdstring);
Returns: (see below)

If cmdstring is a null pointer, system returns nonzero only if a command processor is
available. Because system is implemented by calling fork, exec, and waitpid, there are
three types of return values.
1. If either the fork fails or waitpid returns an error other than EINTR, system returns -1
with errno set to indicate the error.
2. If the exec fails, implying that the shell can't be executed, the return value is as if the
shell had executed exit(127).
3. Otherwise, all three functions (fork, exec, and waitpid) succeed, and the return value
from system is the termination status of the shell, in the format specified for waitpid.
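Seen from the command line, system(cmdstring) behaves much like running sh -c cmdstring and collecting the exit status of that shell. The sketch below is a shell-level analogy, not the C implementation; the function name run_like_system is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): run a command string in a
# separate shell and report that shell's exit status.
run_like_system() {
    sh -c "$1"
    echo "status=$?"
}

run_like_system 'exit 3'    # prints status=3
run_like_system 'true'      # prints status=0
```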
Wait function
When a process terminates, either normally or abnormally, the kernel notifies the parent by
sending the SIGCHLD signal to the parent. Because the termination of a child is an
asynchronous event (it can happen at any time while the parent is running), this signal is the
asynchronous notification from the kernel to the parent. The parent can choose to ignore this
signal, or it can provide a function that is called when the signal occurs: a signal handler. The
default action for this signal is to be ignored. A process that calls wait blocks if all of its
children are still running, returns immediately with the termination status of a child if one has
already terminated, or returns immediately with an error if it has no child processes. The
function prototype of the wait function is:
#include <sys/wait.h>
pid_t wait(int *statloc);
Returns: process ID if OK, -1 on error
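The same parent and child relationship can be observed from the shell, where & starts a child process, $! holds its process ID and the wait builtin collects its termination status. This is a shell-level sketch of the idea, not the C interface; the function name run_child is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): start a child that exits with
# the given status, wait for it, and report the status the parent received.
run_child() {
    ( exit "$1" ) &          # the subshell is the child process
    child=$!                 # process ID of the child just started
    wait "$child"            # blocks until that child has terminated
    echo "child exit status: $?"
}

run_child 0
run_child 7
```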


2a. Shell script that accepts two file names as arguments, checks if the permissions for
these files are identical and if the permissions are identical, outputs the common
permissions, otherwise outputs each file name followed by its permissions.
#!/bin/sh
perm1=`ls -l "$1"|cut -c 1-10`
perm2=`ls -l "$2"|cut -c 1-10`
if [ "$perm1" = "$perm2" ]
then
echo "Files have same permissions"
echo "The Files permission is $perm1"
else
echo "The Files have different permission "
echo "$1 has permission $perm1"
echo "$2 has permission $perm2"
fi

Procedures for Execution


Save the file with .sh extension.(Not Mandatory)
Create two files using the editor command
[root@localhost ~]# vi data1.txt
[root@localhost ~]# vi data2.txt
The default permission of files will be -rw-r--r--(644)
To Change the permission use the chmod command
[root@localhost ~]# sh filename.sh data1.txt data2.txt
OUTPUT
Files have same permissions
The Files permission is -rw-r--r--
[root@localhost root]# chmod 777 data1.txt
[root@localhost root]# sh filename.sh data1.txt data2.txt
The Files have different permission
data1.txt has permission -rwxrwxrwx
data2.txt has permission -rw-r--r--

Theoretical Explanation:
Listing the file attributes is done with the ls -l (long) option. This option displays all the 7
attributes of a file: its permissions, links, owner, group owner, size, last modification time
and filename.
[root@localhost ~]# ls -l
total 45
-rw-r--r-- 1 chan sjbit 1765 Jan 27 10:89 data.txt

To extract a specific field we use the simple filter cut with the -c option and a list
of column numbers. Ranges can be specified using the hyphen.
To extract the permission field we use cut -c 1-10.
Here we are using the pipeline mechanism to send the output of ls -l to the simple filter cut.
A pipe is a general mechanism by which the output of one program is passed as the
input to another program directly, without using any temporary files in between. In a pipeline,
the command on the left of the | must write to standard output and the one on the right must read
from standard input.
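The column extraction can be tried on a fixed line of text, which makes the result independent of the local ls output. The function name perm_field and the sample listing line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): take one line in "ls -l"
# layout and keep only the first ten columns, the permission field.
perm_field() {
    # printf writes the line to the pipe; cut reads it from standard input.
    printf '%s\n' "$1" | cut -c 1-10
}

perm_field "-rw-r--r-- 1 chan sjbit 1765 Jan 27 10:09 data.txt"
```

The call above prints -rw-r--r--, which is the value the script stores in perm1 and perm2.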
2b.C program to create a file with 16 bytes of arbitrary data from the beginning and
another 16 bytes of arbitrary data from an offset of 48. Display the file contents to
demonstrate how the hole in file is handled.
#include<stdio.h>
#include<stdlib.h>
#include<fcntl.h>
#include<unistd.h>
#include<sys/types.h>
int main()
{
    char buf1[]="abcdefghijklmnop";
    char buf2[]="ABCDEFGHIJKLMNOP";
    int fd=creat("data.txt",0644);      /*creating a file*/
    if(fd<0)
    {
        perror("creat");
        return 1;
    }
    write(fd,buf1,16);                  /*writing the data*/
    lseek(fd,48,SEEK_SET);              /*To set the offset*/
    write(fd,buf2,16);
    close(fd);
    system("vi data.txt");
    return 0;
}

Procedures for Execution


Save the file with .c extension
[root@localhost ~]# cc filename.c
[root@localhost ~]# ./a.out

OUTPUT
abcdefghijklmnop^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@ABCDEFGHIJKLMNOP

Theoretical Explanation:
creat Function
A new file can be created by calling the creat function. The function prototype of creat
function
#include <fcntl.h>
int creat(const char *pathname, mode_t mode);

Returns: file descriptor opened for write-only if OK, -1 on error

write Function
Data is written to an open file with the write function.
#include <unistd.h>
ssize_t write(int filedes, const void *buf, size_t nbytes);

Returns: number of bytes written if OK, -1 on error

lseek Function
Every open file has an associated "current file offset," normally a non-negative integer that
measures the number of bytes from the beginning of the file. Read and write operations
normally start at the current file offset and cause the offset to be incremented by the number
of bytes read or written. By default, this offset is initialized to 0 when a file is opened, unless
the O_APPEND option is specified.
An open file's offset can be set explicitly by calling lseek. The function prototype of the lseek
function is:
#include <unistd.h>
off_t lseek(int filedes, off_t offset, int whence);

Returns: new file offset if OK, -1 on error

The interpretation of the offset depends on the value of the whence argument.

If whence is SEEK_SET, the file's offset is set to offset bytes from the beginning of the
file.
If whence is SEEK_CUR, the file's offset is set to its current value plus the offset. The
offset can be positive or negative.
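If whence is SEEK_END, the file's offset is set to the size of the file plus the offset. The
offset can be positive or negative.
If the offset is set beyond the current end of the file, the next write extends the file and leaves a
hole. Bytes in the hole that were never written are read back as 0, which is why vi displays
them as ^@ in the output above.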
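The same file layout can be reproduced from the shell as a quick check. This sketch swaps in dd with its seek operand in place of the lseek call; the function name make_holey and the temporary file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): write 16 bytes at offset 0 and
# 16 bytes at offset 48 of the file named by $1.
make_holey() {
    printf 'abcdefghijklmnop' > "$1"
    # bs=1 makes seek count in bytes; conv=notrunc keeps the first 16 bytes.
    printf 'ABCDEFGHIJKLMNOP' | dd of="$1" bs=1 seek=48 conv=notrunc 2>/dev/null
}

demo=${TMPDIR:-/tmp}/hole_demo.$$
make_holey "$demo"
wc -c < "$demo"     # total size of the file in bytes
od -c "$demo"       # the unwritten range is shown as \0 bytes
rm -f "$demo"
```

od -c is used here instead of vi because it prints the null bytes explicitly and does not need an interactive terminal.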

3a. Shell function that takes a valid directory name as an argument and recursively
descends all the subdirectories, finds the maximum length of any file in that hierarchy
and writes this maximum value to the standard output.
#!/bin/sh
echo "Enter the directory name"
read dirname
ls -lR $dirname|tee file1|cut -c 38-43,57-64 > file2    #column numbers are dependent on system
cat file1
sort -n file2 > file3
echo "Maximum size file is"
tail -1 file3

Procedures for Execution


[root@localhost root]# sh filename.sh
Enter the directory name
run

OUTPUT
run:
total 224
-rw-r--r--  1 root root    17 Jan 20 07:36 1
-rw-r--r--  1 root root    28 Jan 24 07:28 100
-rw-r--r--  1 root root   618 Jan 24 19:03 1a.l
-rw-r--r--  1 root root   614 Jan 24 19:02 1a.l~
-rw-r--r--  1 root root   477 Jan 24 07:26 1b.l
-rw-r--r--  1 root root   476 Jan 24 07:26 1b.l~
-rwxr-xr-x  1 root root 22195 Jan 24 19:55 a.out
drwxr-xr-x  2 root root  4096 Jan 25 11:45 d
-rw-r--r--  1 root root 35781 Jan 24 19:55 lex.yy.c
drwxr-xr-x  2 root root  4096 Jan 25 11:45 o
-rw-r--r--  1 root root  9610 Jan 24 19:55 y.tab.c
-rw-r--r--  1 root root    76 Jan 24 19:55 y.tab.h

run/d:
total 0

run/o:
total 0
Maximum size file is
35781 lex.yy.c

Theoretical Explanation:
To take the input from the user we use the read statement with a variable.
To descend recursively through all the subdirectories we use the ls command with the -R option;
-l is used for long listing. The output of the ls command is piped to tee, which saves a copy in
file1 and passes the same data on through the pipe to the simple filter cut. To extract
specific columns we use the -c option with a list of column numbers, delimited by a comma.
To extract the file size field we use the range 38-43, and to extract the file name we
use 57-64. The result is then redirected to another file, file2.
The cat command displays the content of file1. The content of file2 is then sorted
numerically using sort with the -n option, and the sorted output is redirected to another
file, file3. The last line of file3 therefore has the maximum value, which we extract using the
filter tail with the -1 option.
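The final sort and tail step can be tried on fixed input, independently of the column positions of ls. The function name largest and the sample lines are hypothetical; tail -n 1 is the portable spelling of tail -1.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): read "size name" lines from
# standard input and print the line with the largest size.
largest() {
    # sort -n orders by the leading number, so the maximum ends up last.
    sort -n | tail -n 1
}

printf '%s\n' '17 1' '35781 lex.yy.c' '9610 y.tab.c' '4096 d' | largest
```

For the four sample lines the call prints 35781 lex.yy.c.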
3b. C program that accepts valid file names as command line arguments and for each
of the arguments, prints the type of the file ( Regular file, Directory file, Character
special file, Block special file, Symbolic link etc.)
#include<stdio.h>
#include<sys/stat.h>
#include<sys/types.h>
#include<unistd.h>
int main(int argc,char *argv[])
{
    int i;
    struct stat buf;
    for(i=1;i<argc;i++)
    {
        printf("%s\n",argv[i]);
        if(lstat(argv[i],&buf)<0)       /* lstat does not follow symbolic links */
        {
            perror("lstat");
            continue;
        }
        if(S_ISCHR(buf.st_mode))
            printf("File is a character file\n");
        else if(S_ISBLK(buf.st_mode))
            printf("File is a block file\n");
        else if(S_ISREG(buf.st_mode))
            printf("File is regular file\n");
        else if(S_ISDIR(buf.st_mode))
            printf("File is a directory file\n");
        else if(S_ISLNK(buf.st_mode))
            printf("Symbolic link file\n");
        else if(S_ISFIFO(buf.st_mode))
            printf("File is a FIFO\n");
        else if(S_ISSOCK(buf.st_mode))
            printf("File is a socket\n");
    }
    return(0);
}

Procedures for Execution


To create a block special file, use the mknod command with type b and any major
and minor number
Example: mknod data1 b 12 34 (data1 is the filename)
To create a character special file, use the mknod command with type c and any
major and minor number
Example: mknod data2 c 12 34 (data2 is the filename)
For a symbolic link use the command ln -s.
Example: ln -s x.c data3 (data3 is the symbolic link file for x.c)
[root@localhost ~]# cc filename.c
[root@localhost ~]# ./a.out x.c data1 data2 data3

OUTPUT
x.c
File is regular file
data1
File is a block file
data2
File is a character file
data3
Symbolic link file

Theoretical Explanation:
lstat Functions
#include <sys/stat.h>
int lstat(const char *restrict pathname, struct stat *restrict buf);

Returns: 0 if OK, -1 on error


Given a pathname, the stat function returns a structure of information about the named file.
The fstat function obtains information about the file that is already open on the descriptor
filedes. The lstat function is similar to stat, but when the named file is a symbolic link,
lstat returns information about the symbolic link, not the file referenced by the symbolic
link.
The second argument is a pointer to a structure that we must supply. The function fills in the
structure pointed to by buf. The definition of the structure can differ among implementations,
but it could look like
struct stat {
mode_t     st_mode;     /* file type & mode (permissions) */
ino_t      st_ino;      /* i-node number (serial number) */
dev_t      st_dev;      /* device number (file system) */
dev_t      st_rdev;     /* device number for special files */
nlink_t    st_nlink;    /* number of links */
uid_t      st_uid;      /* user ID of owner */
gid_t      st_gid;      /* group ID of owner */
off_t      st_size;     /* size in bytes, for regular files */
time_t     st_atime;    /* time of last access */
time_t     st_mtime;    /* time of last modification */
time_t     st_ctime;    /* time of last file status change */
blksize_t  st_blksize;  /* best I/O block size */
blkcnt_t   st_blocks;   /* number of disk blocks allocated */
};

File Types
Most files on a UNIX system are either regular files or directories, but there are additional
types of files. The types are:
1. Regular file. The most common type of file, which contains data of some form. There
is no distinction to the UNIX kernel whether this data is text or binary. Any
interpretation of the contents of a regular file is left to the application processing the
file.
2. Directory file. A file that contains the names of other files and pointers to
information on these files. Any process that has read permission for a directory file
can read the contents of the directory, but only the kernel can write directly to a
directory file.
3. Block special file. A type of file providing buffered I/O access in fixed-size units to
devices such as disk drives.
4. Character special file. A type of file providing unbuffered I/O access in variable-sized
units to devices. All devices on a system are either block special files or
character special files.
5. FIFO. A type of file used for communication between processes. It's sometimes
called a named pipe.
6. Socket. A type of file used for network communication between processes. A socket
can also be used for non-network communication between processes on a single host.
7. Symbolic link. A type of file that points to another file.
The type of a file is encoded in the st_mode member of the stat structure. We can determine
the file type with the macros shown below. The argument to each of these macros is the
st_mode member from the stat structure.
File type macros in <sys/stat.h>
Macro         Type of file
S_ISREG()     regular file
S_ISDIR()     directory file
S_ISCHR()     character special file
S_ISBLK()     block special file
S_ISFIFO()    pipe or FIFO
S_ISLNK()     symbolic link
S_ISSOCK()    socket
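The shell offers matching file tests, which give a quick way to cross-check the output of the C program. The sketch below is a shell counterpart and not the macros themselves; the function name file_type is hypothetical. The -L test comes first because the other tests follow symbolic links, much as stat does in contrast to lstat.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): name the type of the file
# given as $1 using the file tests of the test command.
file_type() {
    if [ -L "$1" ]; then echo "symbolic link"
    elif [ -f "$1" ]; then echo "regular file"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -c "$1" ]; then echo "character special file"
    elif [ -b "$1" ]; then echo "block special file"
    elif [ -p "$1" ]; then echo "FIFO"
    elif [ -S "$1" ]; then echo "socket"
    else echo "unknown or missing"
    fi
}

file_type /
file_type /dev/null
```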

4a. Shell script that accepts file names specified as arguments and creates a shell script
that contains these files as well as the code to recreate them. Thus, if the script
generated by your script is executed, it would recreate the original files.
#! /bin/bash
echo "#to unbundled ,bash this file"
for i
do
echo "echo $i 1>&2"
echo "cat >$i << 'End of $i'"
cat "$i"
echo "End of $i"
done

Procedures for Execution


Save the file with .sh extension (Not mandatory)
Create two files using the editor command and write the content of the file
[root@localhost ~]# vi data1.txt
[root@localhost ~]# vi data2.txt
Redirect the contents of files to another file
[root@localhost ~]# sh filename.sh data1.txt data2.txt > data3.txt
See the Content of the file using cat command.
[root@localhost ~]# cat data3.txt

OUTPUT
#to unbundled ,bash this file
echo data1.txt 1>&2
cat >data1.txt << 'End of data1.txt'
Unix                                    /*Content of data1.txt*/
End of data1.txt
echo data2.txt 1>&2
cat >data2.txt << 'End of data2.txt'
shell programming                       /*Content of data2.txt*/
End of data2.txt

4b. C program to do the following: using fork() create a child process. The child
process prints its own process ID and the ID of its parent and then exits. The parent process
waits for its child to finish (by executing wait()) and prints its own process ID and
the ID of its child process and then exits.
#include<stdio.h>
#include<stdlib.h>
#include<sys/types.h>
#include<sys/wait.h>
#include<unistd.h>
int main()
{
    int status;
    pid_t pid,ppid,mpid;
    pid=fork();
    if(pid<0)
    {
        printf("Error in forking child\n");
        exit(1);
    }
    if(pid==0)
    {
        ppid=getppid();
        printf("I am the child and my parent id %d\n",(int)ppid);
        mpid=getpid();
        printf("Child process\n");
        printf("My own id is %d\n",(int)mpid);
        exit(0);            /* the child exits here */
    }
    wait(&status);          /* the parent waits for the child to finish */
    mpid=getpid();
    printf("My id is %d and my child id is %d\n",(int)mpid,(int)pid);
    return 0;
}

Procedures for Execution


Save the file with .c extension
[root@localhost ~]# cc filename.c
[root@localhost ~]# ./a.out

OUTPUT
I am the child and my parent id 16599
Child process
My own id is 16600
My id is 16599 and my child id is 16600

Theoretical Explanation:
Every process has a unique process ID, a non-negative integer. Because the process ID is the
only well-known identifier of a process that is always unique, it is often used as a piece of
other identifiers, to guarantee uniqueness.
Process ID 0 is usually the scheduler process and is often known as the swapper. No program
on disk corresponds to this process, which is part of the kernel and is known as a system
process. Process ID 1 is usually the init process and is invoked by the kernel at the end of the
bootstrap procedure.
In addition to the process ID, there are other identifiers for every process. The following
functions return these identifiers.
#include <unistd.h>
pid_t getpid(void);
Returns: process ID of calling process

pid_t getppid(void);
Returns: parent process ID of calling process
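In the shell, the special parameter $$ and the variable PPID play roles similar to getpid and getppid for the shell process itself. The sketch below is a shell counterpart, not the C calls; the function name show_ids is hypothetical and the numbers printed differ from run to run.

```shell
#!/bin/sh
# Hypothetical helper (not from the manual): print the process ID of the
# shell running the script and the process ID of its parent.
show_ids() {
    echo "pid=$$ ppid=$PPID"
}

show_ids
```

Note that inside a subshell $$ still reports the process ID of the original shell.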


