
A COURSE IN THE UNIX OPERATING SYSTEM
Martin Wynne
© 1992-1997
Department of Linguistics and Modern English Language
Lancaster University
Bailrigg
Lancaster
email: M.Wynne@lancaster.ac.uk


Chapters
1. About UNIX
2. Logging in and out
3. UNIX filestore
4. UNIX commands
5. Doing more
6. Communications
7. File permissions
8. Standard input and output
9. An introduction to the ex line editor
10. Regular expressions
11. Processing large text corpora
12. Introduction to the vi screen editor
13. Text formatting
14. More on the shell
15. Shell programming

Appendices:
Appendix A - Command summary
Appendix B - Example scripts

About this course


This course is intended to serve as a general introductory guide to the Unix operating
system. The course was written to be given to students for self-study, with some
tutorial support. It is intended however that it can be used entirely as a self-study
course if necessary. While the course is intended for those learning to use Unix for
language processing, the material covered is of interest to any users who wish to
explore Unix and develop their own simple applications. The language processing
techniques introduced here should be of interest to any user who wishes to handle text
files.

The course contains a series of exercises, to be found at the end of most chapters, plus
some practice material inside the chapters. It is important that you do these as you
work through the course. Not only do they serve to consolidate what has been covered
in the text, they should lead you to find out more. You will learn much more if you
adopt an active, curious and critical approach to Unix. So try things at the keyboard,
and don't be afraid to get things wrong - it is an important part of the learning process.

Most of the information given in this course should be relevant for most versions of
Unix. However, students should be prepared to encounter local variations.

Information about the implementation of Unix on Lancaster University machines
is separated from the main text and included in boxes like this, so that the main text
may remain as widely applicable as possible.


CHAPTER 1 - ABOUT UNIX


What is Unix?
Unix is a computer operating system. An operating system is the software that
provides the interface between the hardware of a computer system and the
applications programs that are used on it. Simply put, the operating system provides
the link between the hardware of the computer and the user. Popular operating
systems include DOS (used on PCs) and VM/CMS (used on mainframes, now
becoming rare). Unix is available on a wide variety of computer systems, including
personal computers, workstations, mainframes and supercomputers. It was developed
for, and is particularly well-suited to, multi-user systems, but is now also run on
'stand-alone' machines.

A little history
Unix was first developed in the early 1970s at Bell Laboratories in the USA. It was
originally developed as a system to be used by the staff in the laboratories, and it was
principally intended to provide an operating system that people would enjoy using. It
was designed for users who were largely computer scientists, which may explain
some of the more arcane and apparently unfriendly features of Unix, such as the
obscure sounding command names. AT&T (the owners of Bell Laboratories) made
Unix available at nominal cost to academic users, with whom it became popular. This
helped to create a market for Unix, at a time when technological changes had
themselves created a need for a portable multi-user operating system. As a result Unix
began to be adopted by non-academic users in the 1980s as it became commercially
available. Several standards are now being worked out, and Unix is steadily becoming
the standard operating system in many environments.

What's special about Unix?


Unix has the following advantages:

Portability
Unix is written in the high level language C. This makes it easy to install on new
computing systems. Applications written to run on a Unix system will hopefully run on
any Unix system, regardless of the hardware.

Popularity
Unix is available on many widely-used systems. It is very widely used and it has
become the de facto standard for academic users, and for all multi-user applications.

Power
A wide and growing range of applications software is available. Unix provides a
range of tools that can be combined and manipulated to perform such a wide variety
of jobs that users of the system can very often carry out sophisticated tasks without
writing programs in a programming language.

Standardisation
Although there are many versions of Unix, these are already largely compatible, and
official standards are currently being defined.


Different Unix systems


There are many different versions of Unix, as well as some Unix 'lookalikes'. The most
widely used are:
· System V (distributed by the original developers, AT&T)
· AIX (IBM)
· Berkeley BSD (from the University of California, Berkeley)
· SunOS, now known as Solaris (from the makers of Sun workstations)
· Xenix (a PC version of Unix).

Unix features
The kernel and the shell
The Unix operating system consists basically of the kernel and the shell. The kernel is
the part that carries out basic operating system functions such as accessing files, allocating
memory and handling communications.

A shell provides the user interface to the kernel. A number of shells are available on
the Unix operating system including the Bourne shell and the C shell. The shell is
basically an extensive program that runs all the time that you are logged on to the
computer, and provides an interactive interface between the user and the computer
functions. The C shell is the default shell for interactive work on many Unix systems.
It will be covered in this document.

See chapter 14 below for more details.

Graphical User Interfaces


Graphical User Interfaces (usually written GUIs and pronounced 'gooeys') provide an
alternative user interface to shells such as the C shell and Bourne shell.

GUIs provide a replacement to the command line interface based on the use of icons,
menus and a mouse. Using GUIs, applications software from different suppliers can
have a consistent interface, which reduces the time needed to master new applications.

If you have access to a workstation or a powerful PC with the necessary software, you
may wish to attempt to master a GUI, especially if you are already used to using a PC
windows environment. You can create shells within a GUI environment and continue
to work through this course, as well as having more utilities available to you. Using a
GUI will not be covered in this course however, since the lack of a standard means it
is not clear which should be taught. Furthermore, the commands and utilities taught
here deal with Unix at a more fundamental level than GUI interfaces provide, and
what you will learn here will give you an insight into how Unix actually works, and
give you access to the full power and flexibility of Unix. The skills learned here should be
of use in many different applications and environments.


On-line tutorials should be available with GUI implementations and provide an
introduction to their use.

Text processing
Standard Unix implementations offer a variety of text editors and formatters.

Editors
It is essential that a Unix user becomes reasonably proficient in the use of at least one
editor if they want to manipulate text files. Most users nowadays with experience of
word-processors prefer a screen editor, and these generally provide the friendliest
interface. There are good reasons however for learning to use the Unix line editor, as
its use involves learning a great deal about the way that Unix commands and
programs deal with strings, texts, contexts, etc. In this course we will therefore look
in some detail at ex, the enhanced Unix line editor, and at the other text processing
utilities that have been built on the basic ex functions. You are recommended to look
ahead to the chapter on ex as soon as you have need of a text editor. If you find using
ex impractical, use a screen editor. The standard Unix screen editor is vi, but as this is
built on ex, you need some knowledge of ex to make use of the majority of its
functions, which are in any case very different from those of a modern editor. Some
version of emacs is usually available on Unix systems, and it may be best for you to
use this. Emacs is available on other systems, such as DOS, and is a good general
purpose editor.

On the other hand, the advantage of using vi is that it is always available in basically
the same form on any Unix system, so if you learn vi, you know that you will always
be able to use a screen editor on any Unix system. What is more, once you have
learned about ex, you will be able to exploit some of the power of vi without much extra
effort. It is therefore certainly worth having at least a basic familiarity with vi, and
many users use it as their preferred editor. The decision about which editors to use
and when depends on your own needs and preferences. If you want to use a screen
editor straight away, use emacs, or whatever is available on your system. At a later
date, a little effort to learn vi could be well rewarded.

Further reading: Documentation on vi is available on-line (type man vi) and in the
SunOS manuals. There is a prose introduction in chapter 24 of Nishinuma (1987).

If you are using some type of windows program then there will be a simple interactive
screen editor (such as textedit with OpenWindows) available with the program, and
this will be more suitable than emacs for simple tasks.

Text formatters
Unix has its own text formatters (principally nroff and troff) and systems will often
support other documentation software, such as TeX. Many users will have no use for
these, and will prefer to use a word-processor. A short introduction to nroff is given in
Chapter 13.


CHAPTER 2 - LOGGING IN AND OUT
Logging In
When you have established contact with the Unix system, the login prompt will be
displayed. You must give your username followed by your password:

login: lnp3jb
Password: secret1 -the password is not in fact displayed when you type it

The username can be up to 8 characters in length. Unix usernames contain only
lowercase characters, and it is important that you type your username in lower case (if
you don't, you will be permitted to log in, but the shell will then not recognise case
differences). The password must normally contain between 6 and 8 characters. On
some Unix systems the password must contain at least 1 non-alphabetic character.

System messages
When you log in a number of system messages may be displayed. The more filter
will be used to control the output if the file contains more than a screenful of
information. Just press the space bar to see the next screenful if it says 'more' at the
bottom of the screen.

The message:

You have new mail


indicates that electronic mail has been sent to your mailbox.

The prompt
When your login procedure is completed you should see the system prompt. This
indicates that the shell is running and is awaiting instructions from the user. The
prompt can take many forms, and you can change it later on if you want to. Often the
prompt will contain the % character, and a number in brackets. This number will
represent the number of a command, and can be used to recall commands already
issued. It may also display the name of the machine or system that you are logged onto.
Some users prefer to have the name of the current working directory displayed in their
prompt. For convenience, in this document, the % character will be used to represent
the prompt.


Changing your password


Users will be assigned their username and initial password by the Unix system
administrator. You are advised to change the initial password to one that you will find
easy to remember. Use the passwd command to change your password:
% passwd -where '%' is the prompt
Changing password for lnp5mw
Old password: -type in your old password
New password: -type in your new password
Retype new password: -and again, to make sure
%

Notice that your passwords are not displayed.

Logging out
When you have finished your Unix session you must log out from the system. To do
this give the command:

% logout

You should always wait for the message confirming that you have logged out.

Problems?
On some Unix systems you may receive the message:

logout: command not known

If this happens you should type:

exit

You may occasionally get the message:

There are stopped jobs

If this happens simply give the logout command again.

PRACTICE

Log in to the Unix system using your username and password. Change your password
using the passwd command. You may find that the system will not change your
password immediately. In this case you may have to use your old password next time
that you log on.


CHAPTER 3 - THE UNIX FILESTORE

File hierarchy
Unix has a hierarchical tree-like filestore. The filestore contains files and directories,
as illustrated in the diagram below.

An illustration of a fragment of the Unix filestore hierarchy.

The top-level directory is known as the root. Beneath the root are several system
directories. The root is designated by the / character.

The directories below the root are designated by the pathnames:

/bin /etc /usr

Confusingly, the / character is also used as a separator in pathnames. So, from the
figure above, the directory lnp5jb can be referred to by the pathname
/home/sunserv1_b/lnp5jb. Historically, user directories were often kept in the
directory /usr. However, it is often desirable to organise user directories in a
different manner.

Users have their own directory in which they can create and delete files, and create
their own sub-directories. For example:

/user/ei/eib035

belongs to someone who has the username eib035.

Some typical system directories below the root directory:

/bin    contains many of the programs which will be executed by users
/etc    files used by system administrators
/dev    hardware peripheral devices
/lib    system libraries
/usr    normally contains applications software
/home   home directories for different systems


The current directory


This refers to your actual location in the filestore hierarchy. When you log in the
current directory is set to the home directory. You can then change current directory,
effectively moving around the filestore tree structure. The current directory is also
called the "current working directory" and the "working directory". The current
directory can be referred to in pathnames by the . character (a full stop).

Changing current directory


The command cd is used to change your current directory. For example:

% cd bin

will move you from your current directory, down one "branch" to the directory bin, if
such a directory exists. Typing cd with no arguments takes you to your home
directory.

Display current directory


The command pwd is used to display your current directory. For example:

% pwd
/home/sunserv1_b/lnp5jb/bin
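The two commands can be tried out safely in a scratch directory. The following is only a sketch: it is written for a Bourne-type shell (sh) rather than the C shell, and it assumes that the mktemp command, which is not covered in this course, is available to create an empty temporary directory. The variable names scratch and where are invented for illustration.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)

cd "$scratch"          # make the scratch directory the current directory
mkdir bin              # create a subdirectory to move into
cd bin                 # move down one "branch" to bin
where=$(pwd)           # record the current directory; the pathname ends in /bin
echo "$where"
```

The pathname printed will differ from system to system, but its last component should always be bin.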

Pathnames
Files and directories may be referred to by their absolute pathname. For example:

/home/sunserv1_b/lnp5jb/bin/hello

Files and directories may also be referred to by a relative pathname. For example, if
your current directory is /home/sunserv1_b/lnp5jb, the above file can be referred
to as:

bin/hello
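A short experiment can show that the two kinds of pathname reach the same file. This is a sketch under stated assumptions: it uses a Bourne-type shell (sh), a scratch directory made with mktemp (not covered in this course), and a file called hello whose contents are invented for the purpose.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its
# absolute pathname.
scratch=$(mktemp -d)
mkdir "$scratch/bin"
echo "hello there" > "$scratch/bin/hello"

cd "$scratch"                              # make the scratch directory current
by_absolute=$(cat "$scratch/bin/hello")    # absolute pathname: starts at the root, /
by_relative=$(cat bin/hello)               # relative pathname: starts at the current directory
echo "$by_absolute"
echo "$by_relative"
```

Both cat commands display the same text, because both pathnames identify the same file.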

The home directory


Each user has a home directory. They will be attached to this directory when they log
in. Jenny Brown's home directory is:

/home/sunserv1_b/lnp5jb

The symbol ~ can be used to refer to the home directory. If Jenny Brown wishes to
refer to her file she can give:

~/bin/hello


rather than typing the long form:

/home/sunserv1_b/lnp5jb/bin/hello

The symbol ~ can also refer to the home directories of other users. For example
Jenny can refer to a file in John Smith's home directory using:

~lnp5js/test.dat
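To see what ~ stands for on your own account, you can ask the shell to echo it. This sketch assumes a shell that supports the ~ notation (the C shell does, as do most modern Bourne-type shells) and that the HOME variable, which is not discussed in this chapter, is set to your home directory, as is normally the case.

```shell
home_by_tilde=$(echo ~)      # the shell replaces ~ with the pathname of your home directory
home_by_variable=$HOME       # assumption: HOME normally holds the same pathname
echo "$home_by_tilde"
```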

The parent directory


The parent directory is the directory above the current directory. The parent directory
can be referred to by the .. characters (two full stops). For example to refer to the file
test.dat in the parent directory:

../test.dat
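The . and .. notations can be tried together in a scratch directory. The sketch below assumes a Bourne-type shell (sh) and the mktemp command (not covered in this course); the directory sub and the contents of test.dat are made up for the example.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
mkdir "$scratch/sub"
echo "some data" > "$scratch/test.dat"

cd "$scratch/sub"                 # the current directory is now sub
from_below=$(cat ../test.dat)     # .. refers to the parent of sub
cd ..                             # move up to the parent directory
from_here=$(cat ./test.dat)       # . refers to the current directory
echo "$from_below"
echo "$from_here"
```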

Linking files
The ln command can be used to link files and directories across the filestore system.
The symbolic link function (ln -s) is the most useful. This enables a file or directory
to appear to be in a particular directory when it is in fact stored somewhere else. This
can save the user from having to type out long pathnames for frequently used files or
directories. For example, if you want to use the files in /usr/games regularly, you
can set up a symbolic link to this directory. If Jenny Brown is in her home directory
and types:

% ln -s /usr/games fun

this will create what appears to be a new directory below her home directory, entitled
fun. When she does cd fun she will move to /usr/games. If she now does pwd, the
current directory will appear as /home/sunserv1_b/lnp5jb/fun. Some things may
be a little surprising however: the parent directory, for example, will be that of the
original file or directory.
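The difference between where a link appears to be and where it really points can be seen with a small experiment. This is a sketch only: it assumes a Bourne-type shell (sh), the mktemp command, and the -P option of pwd, none of which are covered in this course. The directory games stands in for /usr/games.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
mkdir "$scratch/games"            # stands in for /usr/games
cd "$scratch"
ln -s "$scratch/games" fun        # fun is a symbolic link to the games directory

cd fun
apparent=$(pwd)                   # the route you took: the pathname ends in /fun
actual=$(pwd -P)                  # with links resolved: the pathname ends in /games
echo "$apparent"
echo "$actual"
```

Some shells report the resolved pathname by default, so the first result may differ on your system.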

Exercises
1. Check which directory you are currently in. If necessary, move to your home
directory. (Remember: cd will do this from anywhere).
2. Move to the root directory. ("Move to..." means "change your current working
directory to...". It is useful to picture the process as movement around the tree
structure.)
3. Work your way down one directory at a time to your home directory.
4. Experiment with using relative and absolute pathnames; show how the two
can produce the same results.
5. Explore your system's filestore. Try to get into the home directory of someone
else you know! (You may not be able to view their files.)


CHAPTER 4 - UNIX COMMANDS

Unix commands have the general format:

command [options] [item]

Items in brackets are optional, and words in italics are generic identifiers (i.e. options
must be replaced by a particular option, e.g. -a).

Note that:

Commands are case sensitive. The command ls is different from LS. In fact LS is not
recognised as a valid command.

Command options consist of a single character. The command to list all the files in a
directory is ls -a and could not be ls -all (the latter would have to mean a
combination of options.)

Command options can usually be combined or listed separately. For example:

ls -al or ls -a -l

The command item is given last. This is very often a file name. For example:

ls -a file1.f not ls file1.f -a
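You can check for yourself that combined and separate options are equivalent. The sketch below assumes a Bourne-type shell (sh) and a scratch directory created with mktemp, neither of which is covered in this chapter; the if test at the end is shell programming of the kind introduced much later in the course.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
cd "$scratch"
echo "some text" > file1.f

combined=$(ls -al)       # both options after a single hyphen
separate=$(ls -a -l)     # the same options listed separately
if [ "$combined" = "$separate" ]; then
    echo "ls -al and ls -a -l give the same listing"
fi
```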

The echo command

The echo command 'echoes' its argument to the standard output. This means that in its
simplest form it prints something out on screen. For example:

% echo Hello - you type
Hello - response from the shell
%

Who is logged on?

The command who gives a list of logged on users:

% who
root console Jan 4 10:34
men6matw ttyp1 Jan 6 09:45 (ecusun1)
cbl6nd ttyp2 Jan 6 10:10 (cblslcd)
cbl6ar ttyp3 Jan 6 16:03 (cblsuna)
csc6ea ttyp4 Jan 6 14:15 (csuna1)
root ttyp5 Jan 6 10:40 (sun032)
ecl6rsh ttyp6 Jan 6 15:39
csc6ea ttyp8 Jan 6 14:15 (csuna1)
lnp5mw ttyUf Jan 6 16:16
lnp5jb ttyp3 Jan 6 15:20 (sun051)

Also try the command finger. This command gives the full name of logged in users.

PRACTICE

Type finger to get information on yourself and other users.

Creating a directory
The mkdir command is used to create directories. The format of this command is:
% mkdir directory_name

Jenny Brown stores her Unix scripts in a directory called scripts beneath her home
directory. In order to create this directory she uses the command:

% mkdir scripts

Deleting a directory
The rmdir command is used to delete directories. The format of this command is:
% rmdir directory_name

Jenny Brown stores files for project work in a directory called proj. When the project
has been completed she deletes the directory using the command:

% rmdir proj

Note that the directory must be empty before it can be deleted.
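The rule that a directory must be empty can be demonstrated as follows. This is a sketch, assuming a Bourne-type shell (sh) and the mktemp command, which is not covered in this course. The file name notes and the variables first_try and second_try are invented for illustration, and the rm command used to empty the directory is described later in this chapter.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
cd "$scratch"
mkdir proj
echo "draft" > proj/notes

# First attempt: proj still contains a file, so rmdir refuses to delete it.
if rmdir proj 2>/dev/null; then first_try=removed; else first_try=refused; fi

rm proj/notes                     # empty the directory
rmdir proj                        # now the directory can be deleted
if [ -d proj ]; then second_try=still_there; else second_try=removed; fi
echo "$first_try, then $second_try"
```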

Listing contents of a directory


The command ls is used to list the contents of a directory. For example:
% ls
file1 bin test.f test

Notice that directories are listed as well as files. To list all files, including hidden
files, give the command:

% ls -a
.cshrc file1 bin test.f test

Hidden files begin with . (a full stop). Hidden files are normally system files, and will
normally include the following:

% ls -a
.cshrc .forward .history .login .logout

· .cshrc contains commands that are executed every time you start off a C-shell,
including when you log in
· .forward enables you to redirect your mail to another computer
· .history contains a record of previously executed commands
· .login contains commands that are executed at login time
· .logout contains commands that are executed at logout time

The purpose of some hidden files.

To identify directories in a listing give the command:

% ls -F
file1 bin/ test.f test

Notice how the directory is identified by the slash (/) character.
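The effect of the -a and -F options can be tried on a directory whose contents you know. The sketch below assumes a Bourne-type shell (sh) and the mktemp command (not covered in this course); the .cshrc created here is an empty stand-in in a scratch directory, not your real start-up file.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
cd "$scratch"
echo "text" > file1
echo "# settings" > .cshrc        # a hidden file: its name begins with a full stop
mkdir bin

all=$(ls -a)                      # includes .cshrc, and also the entries . and ..
marked=$(ls -F)                   # the directory is shown as bin/
echo "$all"
echo "$marked"
```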

Deleting files

Files can be deleted using the rm command. For example:

% rm test.f

Displaying files

The command cat is used to display the contents of a file on the screen.

For example:

% cat file1

Creating files

The command cat can also be used to create a file. For example:

% cat > test.f
When typing in a new file
the input must be terminated by
^D

NOTE ^D means press the <ctrl> and the d keys simultaneously. Be careful not to
type ^D when you have the shell prompt, because this might log you out. Normally
you would use an editor for creating files. This example is given since it illustrates
how to create a small file without needing to learn the use of an editor.

Copying files

The command cp is used to copy a file. It takes the format:

% cp old_file new_file

For example:

% cp file1 file2


Renaming files

The command mv is used to rename a file.

For example:

% mv file2 temp

changes the name of file2 to temp.

Moving files

The command mv is also used to move a file to a new location in the filestore
hierarchy. For example:

% mv file2 bin

moves the file file2 into the subdirectory bin.
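Copying, renaming and moving can be followed step by step in a scratch directory. This is a sketch, assuming a Bourne-type shell (sh) and the mktemp command, which is not covered in this course; the file contents are invented.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
cd "$scratch"
mkdir bin
echo "first draft" > file1

cp file1 file2        # copy: file1 and file2 now hold the same text
mv file2 temp         # rename: file2 disappears and temp appears
mv temp bin           # move: temp now lives inside the directory bin
ls -F                 # lists bin/ and file1
ls bin                # lists temp
```

Note that mv treats its last argument as a destination directory only if a directory of that name already exists; otherwise the file is simply renamed.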

Overwriting files

Commands such as rm and cp can be dangerous if not used with care. The command:

% cp file1 file2

will overwrite the existing contents of file2 if a file of that name already exists. If you have spelled the name of
the new file incorrectly you may accidentally overwrite the contents of a file. Using
the wildcard symbol * with the command rm can also be very dangerous. The
command:

% rm test*

will delete all files starting with test. However if you inadvertently type an extra space
(do not try this!):

% rm test * -do not try this!

the file test will be deleted if it exists. Then all other files in the directory will be
deleted! Often no warning will be given.

To prevent accidental deletion of files you can use the -i option with commands such
as rm. The format of the command is:

% rm -i file

You will be asked to confirm that files are to be deleted. You may find that this is set
as the default on your system.
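The danger of overwriting is easy to demonstrate on files that do not matter. The sketch below assumes a Bourne-type shell (sh) and the mktemp command (not covered in this course); the file contents are invented.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
cd "$scratch"
echo "rough notes" > file1
echo "precious results" > file2

cp file1 file2                    # no warning: the old contents of file2 are lost
after=$(cat file2)
echo "$after"
```

The -i option is accepted by cp and mv as well as rm, and makes them ask for confirmation before an existing file is overwritten.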


Wildcards

Wildcard characters can be used to identify directory and file names. The wildcard
character * is used to refer to any combination of characters. For example:

% ls * - refers to all files
% cat test* - refers to all files starting with 'test',
e.g. 'test', 'testing', 'test.c', etc.

The wildcard character ? is used to refer to a single character. For example:

% ls test? - refers to files starting with 'test' followed by a
single character e.g. 'test1', 'test2', 'testz', etc.
% cat test.? - refers to all files starting with 'test' with a
single character after the full stop, e.g. 'test.c', 'test.f'
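Because it is the shell that replaces a wildcard with the matching names, the echo command can be used to see what a pattern matches before it is given to a command such as rm. The following sketch assumes a Bourne-type shell (sh) and the mktemp command; the for loop that creates the sample files is shell programming not covered until later in the course, and the file names are invented.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
cd "$scratch"
for name in test test1 test2 testing test.c test.f other; do
    echo "sample" > "$name"       # create seven small files
done

star=$(echo test*)        # every name beginning with test (six of them here)
one=$(echo test?)         # test followed by exactly one character
dot=$(echo test.?)        # test. followed by exactly one character
echo "$star"
echo "$one"
echo "$dot"
```

Typing echo test* before rm test* is a cheap way of checking that the pattern matches only the files you intend to delete.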

Exercises
1. Display your current working directory using the pwd command.
2. Make a directory called exercises.
3. Change your directory to the directory exercises. Display the current
working directory.
4. Return to your home directory.
5. List the contents of your directory. Use the -l, -a and -F options and compare
the output.
6. Change your directory to the directory exercises. Create a file called
example1 using the cat command containing the following text:

water, water everywhere
and all the boards did shrink;
water, water everywhere,
Nor drop to drink

7. List the contents of your directory. Use the -l option to obtain a long listing.


CHAPTER 5 - DOING MORE

Viewing files with the more command


The command more is used to display the contents of a file on the screen. The
command is particularly useful for viewing long files since the display stops at the
bottom of the screen. The following is a listing of a program in the Icon programming
language:

% more lookup.icn
# program to look up words (given at the terminal) in the
# computer usable version of the OALD
# last change 18.12.91
# set global parameters
global k
# main body
procedure main()
# input word to be searched for
write("Give me a word: \n")
word:=read()
# this the important line - call the 'lookup' procedure
if not write(lookup(word)) then write("Not found in the
dictionary.")
end
procedure lookup(voc)
# connect to the dictionary
(dict:=open("/home/sunserv1_a/ecl6rsh/oald.mitton/cuv2")) |
stop("can't open the dictionary")
# lookup algorithm
every k:=1 to *voc do {
--More-- (75%)

The message at the bottom of the screen means that 75% of the file has been viewed
so far. (The amount shown on screen will depend on the type of terminal you are
using.) You can now do the following:

To continue viewing press the space bar

To view the next line press <RETURN>

To quit press the <q> key

To jump to the next occurrence of a string of characters type /string

For a list of valid commands press the <h> key.


Viewing files with the pg command


The pg command, an alternative to more, is also available on some systems:

% pg lookup.icn
# program to look up words (given at the terminal) in the
# computer usable version of the OALD
# last change 18.12.91

# set global parameters
global k

# main body
procedure main()
# input word to be searched for
write("Give me a word: \n")
word:=read()
# this the important line - call the 'lookup' procedure
if not write(lookup(word)) then write("Not found in the
dictionary.")
end

procedure lookup(voc)
# connect to the dictionary
(dict:=open("/home/sunserv1_a/ecl6rsh/oald.mitton/cuv2")) |
stop("can't open the dictionary")
# lookup algorithm
every k:=1 to *voc do {
bit:=bite(voc)

Commands can be typed to the ':' prompt at the bottom of the screen: Type <RETURN>
to view the next screen. Type <h> for a list of valid commands.

PRACTICE

If you have a file longer than 20 lines use pg to view it. Compare the use of pg with
more. Use them both on the file /etc/passwd, and find the listing for your own
username.


Searching for strings in files


The command grep is used to search a file for a string of characters. For example, to
search the file lookup.icn for the character '#' (which designates comments in the
program), use the command below. The quotes ensure that the shell passes the #
character to grep unchanged:

% grep '#' lookup.icn
# program to look up words (given at the terminal) in the
# computer usable version of the OALD
# last change 18.12.91
# set global parameters
# main body
# input word to be searched for
# this the important line - call the 'lookup' procedure
# connect to the dictionary
# lookup algorithm

A lot of pattern matching operations can be carried out with grep. The following
example shows the use of a regular expression. In this example, the search is
restricted to lines beginning with the 'p' character.

% grep "^p" lookup.icn
procedure main() -output starts here
procedure lookup(voc)
procedure bite(voc2)

You will learn more about pattern matching expressions later.
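The two kinds of search can be tried on a small file of your own. This sketch assumes a Bourne-type shell (sh), the mktemp command, and the printf command, none of which are covered in this course; sample.icn is an invented four-line stand-in for lookup.icn. In a regular expression the ^ character anchors the match to the beginning of the line.

```shell
# Assumption: mktemp -d creates an empty scratch directory and prints its name.
scratch=$(mktemp -d)
cd "$scratch"
# Build the sample file: printf writes each argument on a line of its own.
printf '%s\n' '# main body' 'procedure main()' '   write("Give me a word: ")' 'end' > sample.icn

comments=$(grep '#' sample.icn)      # lines containing the # character
procedures=$(grep '^p' sample.icn)   # lines beginning with p
echo "$comments"
echo "$procedures"
```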

Control characters
The actual key sequences for the following operations can vary between
different systems and different terminals. The most commonly used key sequences are
described below. If it is different on your system, remember the correct sequence and
use it whenever the key sequences below are referred to later in the text. Where
possible the operation itself is named (e.g. end-of-file), and not just the key sequence.

Deleting the last character typed

If you make a typing mistake you can delete the last character typed by using your
delete key, which is usually the one marked <DEL> or <DELETE>.

Deleting the entire line

If you make many typing mistakes you can delete the entire line by typing ^U.

NOTE Remember ^U means "press <CTRL> and <u> keys simultaneously".

Sending an interrupt

If you wish to terminate the execution of a command type ^C.


Sending an end-of-file character

In many Unix commands you need to finish your input with an end-of-file character.
The default end-of-file character is ^D.

Printing on paper
This is usually called 'obtaining hard copy output', as distinct from output to the
screen or a file. The command lpr sends a file to the line printer:

% lpr file1

Note that the command lp is used on some Unix systems. The command:

% lpr -Pprinter file

is used to submit the file to a specific printer.

The locally developed command printers can be used to obtain a list of printers.

Getting help
The command man is used to display help on the syntax of Unix commands.

The format of this command is:

% man [option] [file]

For example to obtain help information on the who command, type:

% man who

The keyword option -k keyword is used to display a list of help files associated with
the keyword. For example to display a list of all man files associated with password
type the command:

% man -k password
getpass(3) read a password
passwd(1) change login password
passwd(5) password file

The command man automatically invokes the more program for viewing files. You
can use the normal more commands to continue viewing.


If you have any problems that can't be solved by referring to the manual, please
consult your supervisor or the Advisory Service. The Help Desk can be contacted in
person in the User Access Area, on the telephone on extension 5366, or by email to
helpdesk. Also the LUCS Unix system operators can be contacted on telephone
extension 5380. With non-urgent problems, an email message to your supervisor is
usually the most efficient way of getting help. (See next chapter on how to use email.)

Exercises
1. Display a list of logged on users.

2. Obtain further information for a particular user using the finger command.

3. Use the man command to obtain further information on the finger command.

4. Use the man -k command to find what manual entries there are related to
passwords.

5. Use the grep command to search the file example1 for occurrences of the string
'water'.

6. Use man and the keyword option to find out more information on communications
and e-mail in Unix.

7. Print out a file on paper.

CHAPTER 6 - COMMUNICATIONS

Mail
The mail command enables the user to send and receive electronic mail messages to
and from other users, both on the local Unix system and on remote systems.

This is the basic mail command. Enhanced versions, such as programs that run under
a windows program (e.g. mailtool), or screen-based versions of mail (e.g. elm) may
be available, and you will probably find them preferable to mail. If so, much of the
following can safely be ignored. Remember however that some version of mail will
definitely be available on any Unix system that you use.

Sending mail

To send a message to a user on your system, type:

% mail username

The cursor will move to the next line, and you will get a Subject: prompt. You can
now type in the subject of your message, and then press <RETURN>. The cursor will
go to the start of the next line and there will be no prompt. You now type in the text of
your message. Terminate each line with <RETURN>. When you have finished the
text of the message, type an end-of-file character (usually ^D), or a full stop (.) on a
line by itself. You should now return to your normal shell prompt. If the message is
dispatched successfully, you will hear no more about it. The following is an example of
the mail command in action:

% mail lnp6ttld
Subject: UNIX course
I don't think I'll ever be able to get the students
in the UNIX course to understand how to use e-mail.
^D
%

Entering the text of the message by this method is a rather crude process. Errors on
the line being typed can be erased with your delete key, but once you have pressed
<RETURN>, a line cannot be edited. A message may be aborted by pressing ^C
twice.

PRACTICE

Send yourself a message. (You will find out where it has gone in the next section.)

Subcommands while entering mail

There are several commands you can type while entering mail. Those that begin with a
tilde (~) must be typed at the start of a line:

<CTRL/Z> will cancel the message, and leave the text in a file named dead.letter.

~e invokes a text editor to edit your message.

~v invokes a screen editor to edit your message.

~f reads the contents of the message you have just read into your message text.

~r file reads the contents of file into your message text.

While this method is quick and easy to use, and quite adequate for short and simple
messages, many users prefer to first create a file containing the text of the message,
and then mail this file to the intended recipient. This enables you to use any system
editor and formatter to create the message, and you do not need to send it
immediately.

The following sequence shows how to send a file note containing the text of a
message to another user.

% mail lnp6ttld < note

To understand fully how this works see the section on 'Re-direction of standard
input' in Chapter 8 below.

In this example the message will not contain a subject heading, unless one has already
been included as the first line of the file note. There is a -s option with the mail
command, that can be used to include a subject header, as follows:

% mail -s UNIX lnp6ttld < note

The string following the -s is the subject; in this case, the subject is "UNIX".

Receiving mail

If new mail is waiting for you when you login, you will see the message:

You have new mail

To start the mail program type the command:

% mail

Each message is summarised on a numbered list. The current message is marked with
a "> " character. The mail prompt character is "& ". Type the number of the message
you want to read, or just press <RETURN> to read through the list. The list of mail
headers will look something like this:

% mail
Mail version SMI 4.0 Thu Oct 11 12:59:09 PDT 1990 Type ? for help.
"/usr/spool/mail/lnp5jb": 2 messages 2 new
>N 1 lnp5mw Thu Jan 9 15:10 11/262 hello
N 2 lnp5js Thu Jan 9 15:11 10/287 party
&

This tells Jenny Brown that she has two messages, one from user lnp5mw, and one
from lnp5js. The date and time at which the messages were received is also listed,
and so is the subject header (the last item on each line - here 'hello' and 'party'). The
following commands can be entered to the mail prompt:

d Mark the current message for deletion

d n Mark message number n for deletion

u n Undelete message number n

s file Save the current message in file with the mail header and mark for deletion

w file Save the current message in file without the mail header and mark for deletion

r Reply to the current message

q Quit mail, removing deleted messages from your system mailbox. Undeleted
messages that have been read are normally stored in your personal mailbox (see
below)

x Exit mail, leaving your mailbox untouched, i.e. messages deleted in this session are
restored

h Show list of message headers

? List the useful mail commands

! command Execute specified shell command

- Re-read previous message.

m recipient Send mail to named recipient

Files used by mail

~/mbox Your personal mailbox, located in your home directory. This is where
messages that you have saved are stored, unless you specified another location when
you saved them. You can access this file by issuing the command:

% mail -f mbox

~/.mailrc A file that can hold commands for mail to obey when it starts up.

PRACTICE

See if you have received any mail. If you have, save a message to your mailbox file.
Send yourself another message, and this time discard it. Send a message to another
user.

Sending mail to remote users


The following also applies to the elm mail program.

Sending mail to users on other computer systems is simple using mail. Simply type
the full address of the remote user where the system username is used above. For
example:

% mail lnp5mw@uk.ac.leeds.gps

or

% mail -s Hello ecl6rsh@uk.ac.leeds.cms1 < note

These two examples use the two ways of sending mail shown above.

It is also possible to use mail to look at folders of mail that you have already
received. To do this type:

% mail -f folder_name

and it will treat the messages in the folder as incoming mail.

Sending on-line messages


As you have seen, messages sent using mail are received in a special buffer, and it is
up to the recipient when to look at them and what to do with them. It is also possible
to send a message that will simply appear on the screen of the recipient, if they are
logged on. This is less useful than mail for the following reasons:

mail can be used irrespective of whether the recipient is logged on or not.

mail messages can be stored by the recipient. This means that files can be transferred
by mail, and a record of transactions can be kept.

On-line messages can be confused with whatever the recipient has on screen and can
easily disrupt what they are doing. They can be very annoying!

On the other hand, on-line messages do have the advantage of obtaining the
immediate attention of another user, and it is possible to have an interactive
conversation. Bearing these facts in mind, use the following command with caution!

write

The write command is used to send on-line messages to another user on the same
machine.

The format of the write command is as follows:

% write username
text of message
^D

After typing the command, you enter your message, starting on the next line,
terminating with the end-of-file character. The recipient will then hear a beep, then
receive your message on screen, with a short header attached. The following is a
typical exchange. User lnp5jb types:

% write lnp8zz
Hi there - want to go to lunch?
^D
%

User lnp8zz will hear a beep and the following will appear on his/her screen:

Message from lnp5jb on sun050 at 12:42


Hi there - want to go to lunch?
EOF

If lnp8zz wasn't logged on, the sender would see the following:

% write lnp8zz
lnp8zz not logged in.

SunOS has the talk command. This has several advantages over write. Firstly, talk
can call other machines on a network. Secondly, talk provides a clearer interface for
the exchange of messages, dividing the screen into two windows for the interlocutors.
Type

talk username@machine

to start a conversation.

PRACTICE

Try to have an extended on-line conversation with another user.

You can stop messages being flashed up on your screen if you wish. To turn off direct
communications type:

% mesg n

It will remain off for the remainder of your session, unless you type:

% mesg y

to turn the facility back on. Typing just mesg lets you know whether it is on or off.

Remote logins
It is possible to log on to another machine on a Unix network, provided that you have
permission to do so. To do this use the rlogin command. Type:

rlogin machine

and you will be asked for your password. It may be necessary for you to do this to
make on-line communications with another user easier.

Exercises
1. Send a message to another user on your Unix system, and get them to reply.

2. Create a small text file and send it to another user.

3. When you receive a message, save it to a file other than your mailbox. (Remember
you can always send yourself a message if you don't have one.)

4. Send a message to a user on a different computer system.

5. Send a note to your course tutor telling him that you can use mail now.

CHAPTER 7 - FILE PERMISSIONS

What are file permissions?


The Unix file security system can prevent unauthorised users from reading or altering
files.

Every file and directory has specific permissions associated with it, giving different
categories of user certain permissions to look at or change a file, and to run executable
files.

NOTE Executable files are files containing commands that can themselves be
executed as if the file itself were a command.

The file permissions can be displayed using the command:

% ls -l [filename]

For example, to display the permissions on the file lookup.icn, type the command:

% ls -l lookup.icn
-rw-r--r-- 1 lnp5jb 777 Dec 18 lookup.icn

The first set of characters in the output from the command (-rw-r--r--) gives the
permissions. The username in the middle of the line (lnp5jb) is the owner of the file.
This is the user who created the file. The following fields tell you the number of
characters in the file, the date it was last modified and the name of the file.

Note that the first character specifies the file type. This is normally one of the
following:

- indicates an ordinary file

d indicates a directory

The following nine characters represent permissions for different classes of users.
Users on a Unix system are assigned to a group or groups, which might correspond to
a particular department, or research group in the real world. Members of a particular
group can be allowed access to files belonging to other members of the group.

The second, third and fourth characters in the permissions string represent
permissions that apply to the owner of the file. The next three characters apply to
members of the owner's group. The last three apply to all other users. The file in this
example therefore has rw- for the owner, r-- for the group and r-- for others.

The three characters corresponding to each class of user each represent a different
type of permission. The first character represents 'read' permission. This means that a
user has permission to open a file and view the contents. If there is an r in this
position then that class of users has read permission. In this example all users have
read permission. In this, and in every case, a horizontal bar character (-) means that
permission is denied.

The second position represents 'write' permission (the right to make changes to a file).
In the example, only the owner has write permission. Normally, you will not want
others to be allowed to make changes to your files, so write permission is only
allowed to the owner.

The third position represents 'execute permission'. This means permission to 'execute',
or run, a file that works like a command. In this example no-one has execute
permission for the file lookup.icn (it is an Icon program, and it would have to be
compiled before it could be executed, so execute permission would be useless). To
summarise the above, this is how the permissions string is divided up:

-             rw-    r--    r--
type of file  owner  group  others

Here is another example, this time an executable file:

-rwxr-x--x 1 lnp5jb 562 Jan 10 hello

This tells us that hello is a file; the owner is lnp5jb, the owner has read, write and
execute permission; the group has read and execute permission; others just have
execute permission.
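The division of the permissions string can also be checked mechanically. The following is only a sketch, written for the Bourne shell (sh) rather than the C shell whose % prompt is used in this course. It uses the standard cut command, which is not otherwise covered here, and the variable names are chosen purely for illustration.

```shell
# Split an ls -l permissions string into its four parts with cut.
perms="-rwxr-x--x"                              # the string shown for the file hello
ftype=$(printf '%s\n' "$perms" | cut -c1)       # character 1: file type
owner=$(printf '%s\n' "$perms" | cut -c2-4)     # characters 2-4: owner
group=$(printf '%s\n' "$perms" | cut -c5-7)     # characters 5-7: group
others=$(printf '%s\n' "$perms" | cut -c8-10)   # characters 8-10: others
echo "type=$ftype owner=$owner group=$group others=$others"
```

The owner part comes out as rwx, the group part as r-x and the others part as --x, which agrees with the description of hello given above.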

PRACTICE

What are the default permissions for your files and directories? Are they all the same?

When you copy a file what file permissions does the new file have?

Changing file permissions


The command chmod is used to change the permissions on a file. The format of this
command is:

% chmod mode filename

For example, to add read permission for the group to the file file1, give the
command:

% chmod g+r file1

chmod modes
In the command:

% chmod mode filename

the mode consists of three elements:

who

operator

permissions

The following options are possible:

who:
u user (owner)

g group

o other

a all

operators:
- remove permission

+ add permission

= assign permission

permissions:
r read

w write

x execute

For example:

chmod o-rw file1.f

removes read and write permissions from others.

chmod u+x test

adds execute permission to the owner.
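The effect of each chmod can be seen by listing the file before and after. The sketch below is written for the Bourne shell (sh), not the C shell of the % prompt. It assumes that the mktemp command is available to make a scratch directory, and it joins several who/operator/permission clauses with commas, a standard chmod feature that is not described above. The file name file1 is just an example.

```shell
# Work in a scratch directory so that no real files are touched.
dir=$(mktemp -d) && cd "$dir"
echo "some text" > file1
chmod u=rw,g=,o= file1                  # start with read and write for the owner only
before=$(ls -l file1 | cut -c1-10)      # keep just the permissions string
chmod g+r file1                         # add read permission for the group
after=$(ls -l file1 | cut -c1-10)
chmod u+x file1                         # add execute permission for the owner
final=$(ls -l file1 | cut -c1-10)
echo "$before -> $after -> $final"
```

The three strings should read -rw-------, -rw-r----- and -rwxr-----, showing that + changes only the class and permission named and leaves the rest alone.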

Permissions for directories


Read, write and execute permissions are set for directories as well as files. Read
permission means that the user may see the contents of a directory (e.g. use ls for this
directory.) Write permission means that a user may create or delete files in the directory.
Execute permission means that the user may enter the directory (i.e. make it his
current directory.)
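Directory permissions are set with chmod in exactly the same way. This is a small sketch in Bourne shell (sh) syntax, assuming mktemp is available. The directory name shared is hypothetical, and the -d option of ls (list the directory itself rather than its contents) is not covered in the text above.

```shell
dir=$(mktemp -d) && cd "$dir"
mkdir shared
# Owner: list, create and enter. Group: list and enter. Others: nothing.
chmod u=rwx,g=rx,o= shared
listing=$(ls -ld shared | cut -c1-10)   # -d lists the directory itself
echo "$listing"
```

The listing begins with d, marking a directory, and is followed by rwx for the owner, r-x for the group and --- for others.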

Exercises
1. Try to move to the home directory of someone else in your group. There are several
ways to do this, and you may find that you are not permitted to enter certain
directories. See what files they have, and what the file permissions are. (Remember
that you can protect your own files from prying eyes, or from interference.)

2. Try to copy a file from another user's directory to your own.

3. Set permissions on all of your files and directories to those that you want. You may
want to give read permission on some of your files and directories to members of your
group.

CHAPTER 8 - STANDARD INPUT AND OUTPUT

Standard input
Input to Unix commands is normally given from the keyboard. For example you can
use the cat command interactively:

% cat
Hello - you type
Hello - response
there - you type
there - response
^D - you type
%

Note that input from the keyboard is terminated with the end-of-file character, usually
^D. For another example consider the spell command, which is the Unix spelling
checker:

% spell - you type
Input to the spell ulitity - you type
is typed at the keyboard - you type
^D - you type
ulitity - response

The spell command outputs words that are incorrectly spelled in the input.

Standard output
Output from Unix commands is normally displayed on the screen. For example:

% spell
Input to the spell ulitity
is typed at the keyboard
^D
ulitity - output

PRACTICE

Try out the spell checker. See how it copes with British spellings (remember it's an
American system), proper nouns, hyphens and recently coined vocabulary.

Re-direction of standard input


It is possible to redirect standard input so that the input is taken from a file. Imagine
you wish to check for spelling errors in a report. A text can be put into the file report,
which can be fed into the spell command:

% cat > report
Input to the spell ulitity
can come from a file
^D
% spell < report
ulitity

The < character is used to re-direct the input from the file report to the command
spell. The general format for re-direction of standard input is:

command < filename

Another common use of re-direction of standard input is to mail a file to another user.
The command:

% mail lnp8zz < report

will mail the file report to local user lnp8zz.
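Any command that reads standard input can be fed from a file in this way. Since spell and mail may not be set up on every system, the sketch below uses sort instead. It is written for the Bourne shell (sh) and assumes mktemp is available. The file fruit and its contents are invented for the example.

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'pear\napple\nfig\n' > fruit     # a small three-line file
sorted=$(sort < fruit)                  # sort reads the file on its standard input
echo "$sorted"
```

The lines come back in alphabetical order (apple, fig, pear), just as if they had been typed at the keyboard and ended with ^D.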

Re-direction of standard output


You do not always want the output from a Unix command to be displayed on the
screen. It has already been shown how it is possible to direct the output from the cat
command to a file. Imagine you want a list of your files and directories kept in a file.
You would use the command:

% ls > filelist

The > character is used to re-direct the output from the command to the file called
filelist. The general format for re-direction of standard output is:

% command > filename

Note that output directed to the file /dev/null is effectively discarded. This is the
system 'wastebasket'.
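A short sketch of the wastebasket at work, in Bourne shell (sh) syntax. The tr command is used only to strip the padding spaces that some versions of wc print.

```shell
echo "this line is thrown away" > /dev/null
size=$(wc -c < /dev/null | tr -d ' ')   # count the characters held in /dev/null
echo "characters kept: $size"
```

However much is written to /dev/null, reading it back always gives nothing, so the count is 0.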

Another example involves directing the output of echo to a file:

% echo "Hello there" > greeting

This would normally overwrite any existing contents of the file greeting. Study the
following sequence:

% echo "Hello there" > greeting
% cat greeting
Hello there
% echo "This instead" > greeting
% cat greeting
This instead

It is possible to append output to a file, rather than overwriting it, by using the >>
operator. For example:

% echo "Hello there" > greeting
% cat greeting
Hello there
% echo "and goodbye" >> greeting
% cat greeting
Hello there
and goodbye

Look carefully at the difference between these two examples.

Re-direction of input and output


It is possible to re-direct both standard input and output. If you have a report
containing many spelling mistakes you may wish to keep a list of the mistakes in a
file. You can do this using the following command:

% spell < report > errors
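The same shape of command works with any filter. In the sketch below, grep stands in for spell, which may not be installed, so the file errors receives the whole matching line rather than just the misspelt word. It is written for the Bourne shell (sh) and assumes mktemp is available.

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'Input to the spell ulitity\ncan come from a file\n' > report
grep ulitity < report > errors          # input from report, output to errors
cat errors
```

Nothing appears on the screen from grep itself. Its output is only seen when errors is displayed with cat.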

Piping
Output from one command can be sent ('piped') to the input of another command
using the | character:

command1 | command2

A common use for pipes is to control the output of large files to the screen. It is
possible to send output to the more command so that only one screenful at a time is
output. If the command

% ls -l

is used to give a long listing of all files and directories there may be too many lines to
see them all at once on the screen. (If you don't have many files, move to /etc where
there should be plenty.) Output from ls -l can be piped to more as follows:

% ls -l /etc | more

You can then use the usual more commands to control the output.

In the output from ls -l, directories are identified by the d character at the start of
each line. A list of just the directories can be obtained by piping the output of this
command to the grep command, giving grep a pattern which selects only lines
that have the d character at the start of the line. The command is:

% ls -l | grep "^d"
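To see this pipe at work on known contents, the following sketch builds a scratch directory holding two directories and one ordinary file. It is in Bourne shell (sh) syntax and assumes mktemp is available. The names docs, data and notes are invented.

```shell
dir=$(mktemp -d) && cd "$dir"
mkdir docs data                         # two directories
echo "hello" > notes                    # one ordinary file
dirs=$(ls -l | grep "^d")               # keep only the lines beginning with d
count=$(ls -l | grep "^d" | wc -l | tr -d ' ')
echo "$dirs"
echo "directories: $count"
```

Only the lines for data and docs survive the grep, so the count is 2. The line for notes begins with - and is filtered out.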

The commands sort and grep are often used when piping. For example:

% cat phonenos | sort | lpr

will send an alphabetically sorted list of the phone numbers contained in the file
phonenos to the line printer. The command:

% cat phonenos | grep leeds | sort | lpr

will send a sorted list of phone numbers containing the string 'leeds' to the line printer.
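The pipeline can be tried without a printer by leaving off the final lpr. This is a sketch in Bourne shell (sh) syntax, assuming mktemp is available. The names and numbers in phonenos are made up.

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'smith leeds 431234\njones york 625111\nadams leeds 439876\n' > phonenos
result=$(cat phonenos | grep leeds | sort)   # select the leeds lines, then sort them
echo "$result"
```

The two leeds entries are printed with adams before smith. The command grep leeds phonenos | sort gives the same result without cat, because grep can read a named file directly.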

Exercises
1. Put a listing of the files in your directory into a file called filelist. (Then delete it!)

2. Create a text file containing a short story, then use the spell program to check the
spelling of the words in the file.

3. Redirect the output of the spell program to a file called errors.

4. Type the command ls -l and examine the format of the output. Pipe the output of
the command ls -l to the word count program wc to obtain a count of the number of
files in your directory.

CHAPTER 9 - AN INTRODUCTION TO THE EX LINE EDITOR

What's ex for?
Editors available on Unix include:

ed basic line editor

ex line editor

vi screen editor

emacs screen editor

Ex is an enhanced and more friendly version of ed. Vi is a screen-based version of ex.


Line editors are really a relic of an earlier age in computing, and most users have no
practical use for one nowadays. However, you may occasionally have to use ex, if for
some reason you can't run a screen editor on your terminal. It is covered here mainly
to teach something else, namely, the way that Unix handles texts. This is perhaps
most transparent when you are using ex. Ex forces the user to use complicated pattern
matching operations to do things that are comparatively easy with a screen editor,
such as correcting small typing errors in the text. While taking this approach
may at times seem unnecessarily difficult, it should be remembered that what follows
here is just a stepping stone to other Unix utilities, such as vi (which you are far more
likely to want to use as an editor than ex), and commands that use regular expressions,
such as grep, sed and awk. Learning to use ex involves skills necessary for getting the
most out of these utilities.

Using ex
Starting ex

The command ex is used to invoke the editor. The format of this command is:

% ex [filename]

A filename can be supplied if you wish to edit an existing file.

% ex oldfile
"oldfile" 10 lines 465 characters
:

Alternatively the filename may be used as the name of a new file:

% ex newfile
"newfile" [Newfile]
:

Notice that the prompt for ex commands is the ':' character.

Adding Text

To enter text simply type the command a (short for append), and then type in the text,
as follows:

:a
This is the text

Input is terminated by typing a full stop ('.') on a new line:

:a
This is just one line of text
.
:

The command i is used to insert text before the current line.

Saving Your Data

The command w (short for 'write') is used to save your data. The format of this
command is:

:w [filename]

If no filename is specified, the filename given when ex was invoked will be used.
E.g.:

:w test.f
test.f 50 lines 576 characters
:

The number of lines and characters in the file will be displayed.

Quitting the Editor

The command q (short for 'quit') is used to quit the editor. Note that if changes have
been made to the file and have not been saved the editor will respond with a warning
message:

No write since last change (:quit! overrides)

The command quit! (or just q!) must be given if you wish to quit without saving
your changes.

Displaying Lines in the File

The p command (for 'print') is used to display lines in the file. The format of this
command is:

:[line_range] p

If no range is supplied the current line is displayed.

Pressing <RETURN> is equivalent to moving on to and displaying the next line. With
small files it is possible to display the entire file by pressing <RETURN> until the end
of the file is reached.

Line Ranges

Ranges of lines that can be given to edit commands include:

Absolute line number

6 refers to line 6

1,6 refers to lines 1 to 6

Relative line numbers

-2 refers to 2 lines before the current line

+3 refers to 3 lines after the current line

-2,+3 refers to a range from 2 lines before the current line to 3 lines after the current
line

Special symbols

$ refers to the last line in the file e.g. $p to display last line, 1,$p to display entire file

. refers to the current line e.g. .,$p to display from the current line to the end

Examples:

6d - deletes the sixth line
1,6d - deletes the first six lines
1,$d - deletes all lines
3a - appends text after line three
.,+10w new - saves the current line and the ten lines after it to a file called new

The = operator gives the line number, with the last line the default, so typing = gives
you the number of lines in a text. The number of the current line is obtained by typing
.=.
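The same absolute line numbers and the $ symbol are understood by sed, the non-interactive relative of ex, which makes it easy to experiment with ranges from the shell prompt. The following is only a sketch: it uses sed as a stand-in for ex, is written for the Bourne shell (sh), and assumes mktemp is available. The file sample is invented. Relative addresses such as -2 and +3 are left out because not every version of sed accepts them.

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'one\ntwo\nthree\nfour\nfive\n' > sample
first_two=$(sed -n '1,2p' sample)   # like :1,2p in ex
last=$(sed -n '$p' sample)          # like :$p
kept=$(sed '1,3d' sample)           # like :1,3d, except that sample itself is not changed
total=$(sed -n '$=' sample)         # like := (the number of the last line)
echo "$first_two"; echo "$last"; echo "$kept"; echo "$total"
```

Unlike ex, sed writes its result to standard output and leaves the file alone, so nothing needs to be saved or undone afterwards.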

Deleting Lines

The d command is used to delete lines. The format of this command is:

:[line_range] d

If no line number is given the current line will be deleted. It is possible to supply a
range of lines. For example:

:1,$d

will delete the entire file.

Searching

Searches are carried out by including the search string in slashes ('/'):

/string/

The search will start at the current line.

:/Jane/
This is Jane's file

The special characters '^' and '$' can be used to assist the search. For example:

/^This/ will find a line beginning with 'This'
/file$/ will find a line ending in 'file'

The last string searched for is the default string. This means that you can repeat a
search just by typing //.

Reverse Searches

Reverse searches are carried out by including the search string in question marks ('?'):

:?string?

The search will start at the current line and search backwards through the file.

Making Substitutions

The s command is used to make substitutions. The format of this command is:

:[line_range]s/old_string/new_string/

If no line number is given substitutions will be made only on the current line. For
example:

:s/old/new/

will substitute the first occurrence of the string 'old' with 'new' on the current line. The
command:

:.,$s/old/new/

will substitute the first occurrence of the string 'old' with 'new' in every line from the
current line to the end of the file.

Global Substitutions

The g suffix (for 'global') is added to the s command to make multiple substitutions on a line. For
example:

:s/old/new/g

will substitute all occurrences of the string 'old' with 'new' on the current line. The
command:

:1,$s/old/new/g

will substitute all occurrences of the string 'old' with 'new' in the file.
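The difference made by the g suffix can be seen from the shell with sed, which takes the same s command. This is a sketch in Bourne shell (sh) syntax, with sed standing in for ex. The sample sentence is invented.

```shell
line="the old man sold the old boat"
first=$(echo "$line" | sed 's/old/new/')    # first occurrence only
all=$(echo "$line" | sed 's/old/new/g')     # every occurrence on the line
echo "$first"
echo "$all"
```

The first command changes only the first 'old'. The second changes all three, including the one inside 'sold', which becomes 'snew'. This is a reminder that the s command works on strings of characters, not on words.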

Search strings can also be used in conjunction with the s command in order to carry out
more sophisticated global changes. The line range preceding a substitution string may
include a search for the string to be changed. For example:

:g/old/s//new/g

This means 'search globally for "old", then replace every occurrence with "new"'.
Remember the null string (in s//) stands for the last RE, in this case the RE 'old'. This
has the same effect as:

:1,$s/old/new/g

Additional ex facilities

Additional commands available using the ex editor include:

c replaces lines

t transfers lines

m moves lines

j joins lines

l shows invisible characters

f gives the name of the file being edited

r inserts named file

e edits named file

u undo last change

The commands m and t above work in a similar way, in that they require two line
addresses, one before and one after the command. The address in front refers to the
source and the address after the destination. If either is omitted, the current line is
assumed. Line addresses may be ranges, allowing blocks of text to be moved. Here
are a few examples of commands:

:.m2

This moves the current line to a position after line 2.

:1,.m$

This moves a block (line 1 to the current line) to the end of the text.

:1,.t$

This copies the block to the end of the text, leaving the original block untouched.

Exercises
1. Create a file using ex. Put the text of a message in the file and then mail it to
someone (see chapter on mail).

2. Use ex to explore the file /etc/passwd. Search for your own listing, and those of
others in your group. (You won't be able to save changes to the file).

3. Find a text file to which you have access and copy it to your home directory. Try
making some changes to it.

CHAPTER 10 - REGULAR EXPRESSIONS

What are regular expressions?


A regular expression (RE) is a string of characters that can be used to match a set of
character strings. For example, to globally search for all occurrences of the word
"and" would require a search for "and", "And", "AnD", "AND", etc. Without regular
expressions finding all possible occurrences of "and" would require eight separate
searches. Using an RE the search could be done with one command.
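As a sketch of the single command, the RE below uses the square-bracket sets described later in this chapter to allow either case in each of the three positions. It is written for the Bourne shell (sh) and assumes mktemp is available. The file words and its contents are invented.

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'salt and pepper\nAnd then\nAND gate\nsandwich\nor else\n' > words
matches=$(grep '[Aa][Nn][Dd]' words | wc -l | tr -d ' ')   # count the matching lines
echo "lines containing and/And/AND...: $matches"
```

Four of the five lines match. Note that 'sandwich' is among them: the RE finds the three letters wherever they occur, not only where they form a separate word.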

Regular expressions are used by many Unix utilities, including:

ed

ex

vi

grep

sed

awk
(The awk utility interprets a special-purpose programming language that makes it
possible to handle simple data-reformatting jobs easily with just a few lines of code.
Awk is not covered in this course, but the GAWK Manual is a good guide to its use.)

Regular expressions are used in searches and substitutions.

Character strings

A character string is the simplest regular expression, which simply matches the string
itself. For example:

/hello/ - matches 'hello'
s/hello/goodbye/ - matches 'hello' and makes a substitution

Matching single characters

The '.' character is used to match a single character. For example:

/p.t/ - matches 'p' and 't' separated by a single character, e.g. 'pit',
'put', 'pot', etc.
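The same RE can be tried with grep. This is a sketch in Bourne shell (sh) syntax. The word list is invented.

```shell
found=$(printf 'pit\nput\npt\npart\nspot\n' | grep 'p.t')
echo "$found"
```

The words pit, put and spot are printed. 'pt' fails because there is no character between the p and the t, and 'part' fails because there are two.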

Sets of characters

The expression /[RE]/ is used to match a set of characters in a single character position.
For example:

/x[ab2X]y/ - matches any of the following:

xay
xby
x2y
xXy

In the expression /[RE]/ a range of characters can be specified. For example:

[a-z] - matches any single lower case letter
[0-9] - matches any single digit

Note however:

[0-57] - matches any one of the following: 0 1 2 3 4 5 7

i.e. 0-5 and 7. Sets of characters can be combined:

[a-d5-8X-Z] - matches any one of the following: a b c d 5 6 7 8 X Y Z

It is possible to specify a set of characters which are not to be matched in the RE. For
example:

[^0-9] - matches any single character which is not a digit
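Two of these sets can be checked with grep. This is a sketch in Bourne shell (sh) syntax. The input lines are invented.

```shell
# Each digit is on a line of its own, so a line is printed only if that digit is in the set.
in_set=$(printf '%s\n' 0 1 2 3 4 5 6 7 8 9 | grep '[0-57]')
# Lines containing at least one character that is not a digit.
non_digit=$(printf 'a1\n22\nb\n' | grep '[^0-9]')
echo "$in_set"
echo "$non_digit"
```

The first command prints 0 to 5 and 7, confirming that [0-57] is not the range 0 to 57. The second prints a1 and b but not 22, which consists of digits only.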

Anchors

An anchor is used to match a RE found at a particular position. For example:

/^RE/ - matches RE at the start of a line
/RE$/ - matches RE at the end of a line
/^RE$/ - matches RE as the whole line

Note that there are two separate uses of the '^' operator. One is as the start of line
anchor, and the other as the 'logical not' operator. The latter function only applies
inside square brackets.
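A sketch of the three anchored forms using grep, in Bourne shell (sh) syntax and assuming mktemp is available. The file lines and its contents are invented.

```shell
dir=$(mktemp -d) && cd "$dir"
printf "This is Jane's file\nfile this\nfile\n" > lines
starts=$(grep '^This' lines)      # lines beginning with This
ends=$(grep 'file$' lines)        # lines ending in file
whole=$(grep '^file$' lines)      # lines consisting of file and nothing else
echo "$starts"; echo "$ends"; echo "$whole"
```

'file this' contains the string file but is matched by neither file$ nor ^file$, because the string is not at the end of the line.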

Repetitions

Multiple occurrences of REs can be specified. For example:

a* - matches 0 or more occurrences of 'a'
aa* - matches 1 or more occurrences of 'a'
.* - matches any string of characters
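The patterns introduced so far can be tried out with grep. The following is a minimal sketch, assuming a small made-up sample file (its contents and the variable names are illustrative only, not part of the course material):

```shell
# Build a small sample file (hypothetical data) in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'xay' 'xcy' 'x2y' 'room 101' '2001' 'pit stop' > "$tmpdir/sample"

# Set of characters: x, then any one of a b 2 X, then y.
set_matches=$(grep 'x[ab2X]y' "$tmpdir/sample")

# Anchors plus repetition: lines made up of one or more digits and nothing else.
whole_number=$(grep '^[0-9][0-9]*$' "$tmpdir/sample")

# Negated set plus anchors: lines containing no digit at all.
no_digits=$(grep '^[^0-9]*$' "$tmpdir/sample")

# Single character match: 'p' and 't' separated by any one character.
dot_count=$(grep -c 'p.t' "$tmpdir/sample")

echo "$set_matches"
echo "$whole_number"
echo "$no_digits"
echo "$dot_count"
rm -rf "$tmpdir"
```

Single quotes are used around each pattern so that the shell passes characters such as '*', '[' and '$' to grep unchanged.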


Remembered regular expressions

A null RE stands for the last RE. For example:

:/[Tt]he.*car/p
The blue car exploded with a roar.
:s//(The blue car)/p
(The blue car) exploded with a roar.

The '&' character in a replacement string stands for the most recently matched string.
For example:

:/[Tt]he.*car/p
The blue car exploded with a roar.
:s//(&)/p
(The blue car) exploded with a roar.

Sub-expressions

A sub-expression in an RE can be marked and then referred to later in the same command.

\(string\) - defines an RE sub-expression
\n - refers to the nth RE sub-expression

NOTE The backslash is the escape character for REs. This means it neutralises the
special meanings of special characters. (In \( and \) it works the other way round,
giving a special meaning to ordinary brackets.) For example:

:p
A line of text
:s/\(line\).*\(text\)/\2 \1/p
A text line
:*

Repetition

It is possible to specify an exact number of occurrences of an RE. For example:

c\{4\} - matches exactly 4 c's
c\{4,\} - matches 4 or more c's
c\{2,4\} - matches between 2 and 4 c's

For example, to find a line containing 5 consecutive digits:

/[0-9]\{5\}/
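The same constructions work outside the editor. As a rough sketch, the commands below feed single lines to sed and grep through a pipe; the variable names and the two short test lines given to grep are made up for illustration. Note that the replacement string contains a space between \2 and \1 so that the two words stay separate.

```shell
# Sub-expressions: swap the two remembered strings.
swapped=$(printf '%s\n' 'A line of text' | sed 's/\(line\).*\(text\)/\2 \1/')

# '&' stands for whatever the search pattern matched.
bracketed=$(printf '%s\n' 'The blue car exploded with a roar.' \
    | sed 's/[Tt]he.*car/(&)/')

# Counted repetition: only the line with five digits in a row is selected.
five_digits=$(printf '%s\n' 'tel 12345' 'id 1234' | grep '[0-9]\{5\}')

echo "$swapped"
echo "$bracketed"
echo "$five_digits"
```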

A summary of special characters


Special characters in the search string

^ start of line anchor (or NOT operator inside [ ])

$ end of line anchor

. any character

* preceding character repeated any number of times (including none)

\ escape character

[ ] contains set or range of characters

Special characters in the replacement string

& string matched in search string

\ escape character

Note that regular expressions can also be used with grep. (It gets its name from the
editor command g/RE/p which means 'globally search for RE and print it'.) This
opens up many new possibilities for the use of grep. Unix commands that use regular
expressions often make the use of an editor redundant.

PRACTICE

Obtain a listing of the members of your group from the password file using grep.

Introduction to sed
sed is a non-interactive stream editor for text. The command to invoke
sed is:

sed [-n] [-e command] [-f edfile] [input_file]

For example:

sed "s/UNIX/Unix/g" thesis > thesis.new

This will process the file thesis line by line, outputting each line to the file
thesis.new and replacing each occurrence of the string "UNIX" with "Unix".

In the above example every line of thesis will be output to thesis.new, irrespective
of whether it has been changed or not. This is because the default output for sed is
every line of the input. Using the -n option suppresses the default output, so that
only lines which are explicitly printed are output. This means that no lines at all
would be output by the following command:

sed -n "s/UNIX/Unix/g" thesis > thesis.new


since a change but no output has been specified. If a print command is added, as
follows:

sed -n "s/UNIX/Unix/gp" thesis > thesis.new

then only those lines in which "UNIX" had been changed to "Unix" would be output.
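The three variants can be compared side by side. This is a small sketch, assuming a three-line stand-in for the file thesis (the contents are invented for the purpose):

```shell
# A three-line stand-in for the file 'thesis' (hypothetical contents).
tmpdir=$(mktemp -d)
printf '%s\n' 'UNIX is old.' 'Nothing here.' 'I like UNIX and UNIX.' > "$tmpdir/thesis"

# Default behaviour: every input line is output, changed or not.
all_lines=$(sed "s/UNIX/Unix/g" "$tmpdir/thesis")

# -n without a print command: nothing is output.
no_lines=$(sed -n "s/UNIX/Unix/g" "$tmpdir/thesis")

# -n with the p flag: only lines where a substitution was made are output.
changed_only=$(sed -n "s/UNIX/Unix/gp" "$tmpdir/thesis")

echo "$all_lines"
echo "$changed_only"
rm -rf "$tmpdir"
```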

As the examples above show, the -e option is not necessary when there is only
one editor command. It is possible to specify more than one command, and in this
case each must be preceded by -e. For example:

% sed -e "s/a/A/" -e "s/b/B/" file1 > file2

This command will carry out the two substitutions on each line of file1.

The -f option enables the user to use a file containing editor commands, instead of
typing out a series of commands with the -e option.
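As an illustration, the following sketch runs the same two substitutions once with -e and once from a command file. The file names file1 and edfile and their contents are assumptions made for the example.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'a cab' 'bob' > "$tmpdir/file1"

# The command file holds one editor command per line.
printf '%s\n' 's/a/A/' 's/b/B/' > "$tmpdir/edfile"

# Both forms apply the two substitutions, in order, to each line.
with_e=$(sed -e "s/a/A/" -e "s/b/B/" "$tmpdir/file1")
with_f=$(sed -f "$tmpdir/edfile" "$tmpdir/file1")

echo "$with_e"
echo "$with_f"
rm -rf "$tmpdir"
```

Without the g flag each substitution changes only the first match on a line, which is why the second 'a' in 'a cab' is left alone.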

sed examples

The sed command to list only files (exclude directories) is:

% ls -l | sed -n "/^-/p"
-rw------- 1 lnp5jb 1765 mbox
-rw------- 1 lnp5jb 320 example1

The pattern /^-/ selects lines beginning with '-', which is how ls -l marks ordinary
files.

The sed command to extract a list of usernames from the password file is:

% sed "s/:.*//" /etc/passwd | more

This deletes everything from the first ':' to the end of each line of the password
file, leaving just the username.
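Both examples can be tried safely on scratch data. The sketch below assumes a two-line imitation of a password file and a temporary directory holding two files and one sub-directory; all names and entries are invented.

```shell
tmpdir=$(mktemp -d)

# A two-line imitation of /etc/passwd (hypothetical entries).
printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
    'lnp5jb:x:501:50:J Bloggs:/home/lnp5jb:/bin/csh' > "$tmpdir/passwd"

# Keep only the text before the first colon on each line.
usernames=$(sed "s/:.*//" "$tmpdir/passwd")

# Add one more file and a directory, then list ordinary files only:
# in 'ls -l' output their lines begin with '-'.
: > "$tmpdir/mbox"
mkdir "$tmpdir/texts"
files_only=$(ls -l "$tmpdir" | sed -n "/^-/p")

echo "$usernames"
echo "$files_only"
rm -rf "$tmpdir"
```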

Exercises
1. Reproduce the effects of the above sed examples using grep instead. Note that
grep is generally better for searches, such as this, while sed can be used to make
changes to files.

2. Find the system's games directory and type quiz function ed-command to do the
ed commands quiz. Don't worry if there are a couple of things that you haven't come
across. Try it again and see if you improve your score.


CHAPTER 11 - PROCESSING LARGE
TEXT CORPORA

This section will focus on exploiting large files containing linguistic material with the
use of the commands already covered plus many more.

Compressed files
Often large files are compressed to save disk space. If this is the case then the user
must make the file revert to its original format in order to be able to do anything with
it. A popular compressing command is called, simply, compress. The command:

% compress filename

will cause the file to be replaced by a compressed file with a .Z suffix. The command
uncompress will cause it to revert to its original format. It is often not necessary to
uncompress a file to use it. In fact, the file will often be owned by someone else, and
you would have to copy it and then uncompress it, using up a great deal of disk space
and processor time. It is often better to use zcat, which sends the uncompressed
contents of a compressed file to the standard output, while leaving the compressed
version of the file in the filestore.
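The idea of searching a compressed file without unpacking it can be sketched as follows. This example assumes gzip rather than compress, since compress is not installed on every system; gzip produces a .gz suffix instead of .Z, and gzip -dc plays the role of zcat here. The file name and contents are made up.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'the cat sat' 'on the mat' 'a dog barked' > "$tmpdir/corpus"

# Compress: the original file is replaced by corpus.gz.
gzip "$tmpdir/corpus"
listing=$(ls "$tmpdir")

# Search the contents without recreating the uncompressed file on disk.
hits=$(gzip -dc "$tmpdir/corpus.gz" | grep 'the')

# The compressed file is still the only file in the directory.
listing_after=$(ls "$tmpdir")

echo "$listing"
echo "$hits"
rm -rf "$tmpdir"
```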

PRACTICE

Try compressing and uncompressing some of your own files.

Find a large compressed file on your system and search it for some appropriate string
using grep without uncompressing the file.

Some useful commands for processing text files


The following is a summary of some useful commands for processing text files, some
of which you have met already, some of which are new to you. Both have been
included so that this section can easily be used for reference purposes. Not all of these
commands are standard Unix, so they may not all work in the way you expect (or at
all) on your system. For the same reasons, their syntax is somewhat incongruous and
some use different input and output conventions. Not all are included in the command
summary in the appendix below. See the relevant manual pages for more details.

sort sort into alphabetical order


sort -n sort into numerical order

sort -m merge sorted files into one sorted file

sort -r sort into reverse order (highest first)

sort -c check a file is already sorted

uniq remove duplicate lines (or partly-duplicate lines)

uniq -d output only duplicate lines

uniq -c count identical lines (or lines with identical fields)

grep find lines containing given string or pattern

grep -v find lines not containing given string or pattern

grep -c count lines containing given string or pattern

grep -n give line numbers of lines containing...

fgrep same as grep except that it does not recognise regular expressions

egrep same as grep except that it recognises the full set of REs (grep only recognises
certain special characters)

wc -c count characters

wc -w count words

wc -l count lines

NOTE

wc -l file will output the number of lines in the file, and the file name.

wc -l < file just gives the bare line count.

head -17 output first 17 lines

tail -17 output last 17 lines

tail +30 output from line 30 onwards (written tail -n +30 on newer systems)

cut -f3 delete all but third field of each line

cut -f3,5 delete all but third and fifth fields of each line

cut -f3-5,7 delete all but 3rd, 4th, 5th, 7th fields of each line


cut -c2-4,6-8 delete all but 2nd, 3rd, 4th, 6th, 7th and 8th characters

cut -f2 -d":" delete all but the second field, where ":" is the field delimiter (tab is the
default)

paste combines files horizontally; corresponding lines are appended

paste -d">" pastes with delimiter defined as ">" (tab is default). The special
characters "\n" (newline) and "\0" (null string) may be used.

cat concatenates files vertically (appends files to one another)

cat -n precedes each line with a line number in the output

cat -b as above, but does not number blank lines

cat -s reduces any number of successive blank lines to one blank line

tr "abc-e" "kmx-z" translates a, b, c, d, e to k, m, x, y, z respectively.

tr -d "xy" deletes all occurrences of x and y

tr -s "a" "b" translates all a to b and reduces any string of consecutive b to just one
b.

To go down to the character, rather than field, level, sed is simplest for line by line
processing. sed looks for patterns, so is not very good with column or field positions.
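A few of the commands summarised above are shown in action below. This is only a sketch on invented data: the file names, field contents and variable names are all assumptions.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'lnp5jb:x:501:users' 'root:x:0:wheel' > "$tmpdir/fields"

# cut by field: keep the 2nd and 4th colon-separated fields.
second_and_fourth=$(cut -d":" -f2,4 "$tmpdir/fields")

# cut by character position: keep the first three characters of each line.
first_three=$(cut -c1-3 "$tmpdir/fields")

# paste: join corresponding lines of two files with '>' between them.
printf '%s\n' 'one' 'two' > "$tmpdir/left"
printf '%s\n' '1' '2' > "$tmpdir/right"
pasted=$(paste -d">" "$tmpdir/left" "$tmpdir/right")

# tr: translate a, b, c, d, e to k, m, x, y, z.
translated=$(printf '%s\n' 'abcde' | tr "abc-e" "kmx-z")

# tr -s: translate a to b, then squeeze runs of b to a single b.
squeezed=$(printf '%s\n' 'baaad' | tr -s "a" "b")

echo "$second_and_fourth"
echo "$first_three"
echo "$pasted"
echo "$translated"
echo "$squeezed"
rm -rf "$tmpdir"
```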

uniq needs an already-sorted file. A common idiom is

sort | uniq

to produce a sorted list of all the different lines in a file. With the -c option, uniq
pads its counts with leading blanks, so its output is awkward to use in a pipeline with
another command such as cut.
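One way round the spacing problem is to strip the leading blanks with sed before passing the counts on. The sketch below works on a made-up six-word list and should be read as one possible approach rather than the only one.

```shell
# A made-up list of words, one per line.
words=$(printf '%s\n' 'the' 'cat' 'the' 'mat' 'the' 'cat')

# Sorted list of the different lines.
distinct=$(printf '%s\n' "$words" | sort | uniq)

# Frequency list: count identical lines, remove the padding that uniq -c
# puts in front of each count, then sort with the highest count first.
counts=$(printf '%s\n' "$words" | sort | uniq -c | sed 's/^ *//' | sort -rn)

# With the padding gone, cut can pick out the word field reliably.
ranked_words=$(printf '%s\n' "$counts" | cut -d" " -f2)

echo "$distinct"
echo "$counts"
echo "$ranked_words"
```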

tr is useful for converting blanks to newlines (hence converting a text to a vertical list
of words, which can then be sorted, counted etc.). The command:

% tr " " "\012" < filename

will do this. 012 is the octal code for the linefeed character. With the -s option, tr
is also useful for reducing strings of blanks or tabs to single characters. 011 is the
octal code for the tab character.
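Combining the two ideas gives a one-word-per-line list even when words are separated by several blanks or by tabs. A minimal sketch, using an invented input line:

```shell
# Input with a double blank and a tab between words (hypothetical text).
# Blanks (and tabs, octal 011) become newlines (octal 012); -s then
# squeezes each run of newlines to one, so no empty lines are produced.
word_list=$(printf 'the  cat\tsat\n' | tr -s ' \011' '\012\012')

echo "$word_list"
```

The second string repeats \012 so that both characters in the first string have an explicit counterpart.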

PRACTICE

Try out the following pipeline on a text file:


tr " " "\012" < input_file | sort | uniq > output_file

Using language corpora


A corpus (plural corpora) is a collection of language data. The corpora with which we
will be concerned here are electronic, that is they are stored in a computer. Corpora
may contain data about written or spoken language. They usually contain texts from
one language, but they may also be multilingual. Corpora are usually designed and
collated for a specific purpose. Many of the major corpora in use today aim to be
representative of different domains of language use, and can facilitate comparative
studies. For example, the average length of words in academic texts and newspaper
reports could be compared by measuring words in texts from these two domains.
Computers obviously make this type of number-crunching (or word-crunching)
activity much easier than it would be if you had to count words and letters in a printed
text. Corpora are particularly useful for checking the intuitions that we have and the
generalisations that are made about language use.

Unix commands can be used to extract information from language corpora. The
commands learned in this course can be issued directly or combined into simple
scripts for this purpose.
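The word-length comparison mentioned above can be sketched with wc and tr. The two one-line "texts" and the function name avg_len are invented for the example, and the function assumes that words are separated by blanks and that only the letters A-Z and a-z count towards word length.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'the cat sat on the mat' > "$tmpdir/news"
printf '%s\n' 'linguistic corpora facilitate comparison' > "$tmpdir/academic"

# avg_len FILE: print the mean word length, truncated to one decimal place.
avg_len() {
    words=$(wc -w < "$1")
    # Delete everything except letters, then count what is left.
    letters=$(tr -cd 'A-Za-z' < "$1" | wc -c)
    # The shell only does integer arithmetic, so work in tenths.
    tenths=$(( letters * 10 / words ))
    echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

news_avg=$(avg_len "$tmpdir/news")
academic_avg=$(avg_len "$tmpdir/academic")

echo "news: $news_avg"
echo "academic: $academic_avg"
rm -rf "$tmpdir"
```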

Types of Corpora
There are many types of corpora, defined by the types of language that they represent
and the formats in which that information is stored. Unix commands for handling
strings are sufficiently flexible to handle many different formats. Users however need
to be sensitive to the arcane minutiae of the format and markup of the different
corpora that they use. The ex command l (typed as :l in vi) can be used to view hidden
characters (such as tabs and trailing spaces) in a file.

The LOB and Brown corpora

Brown and LOB are parallel corpora, with very similar formats and tagging. Brown,
which was constructed first, represents different types of written American English.
LOB represents the same categories of British English. All words are lemmatised and
given a word class tag. Here is a sample from the so-called 'vertical tagged' version of
Brown:

^N01002001 ----- ----- -----
N01002010 - NP Alastair
N01002020 - BEDZ was
N01002030 - AT a
N01002040 - NN bachelor
N01002041 - . .
^N01002042 ----- ----- -----
N01002050 - ABN all
N01002060 - PP$ his
N01002070 - NN life
N01002080 - PP3A he


N01002090 - HVD had
N01002100 - BEN been
N01002110 - VBN inclined
N01002120 - TO to
N01003010 - VB regard
N01003020 - NNS women
N01003030 - IN as
N01003040 - PN something
N01003050 - WDTR which
N01003060 - MD must
N01003070 - RB necessarily
N01003080 - BE be
N01003090 - VBN subordinated
N01003100 - IN to
N01004010 - PP$ his

And the 'untagged' version of the same passage, plus the following lines:

N01 0010 DAN MORGAN TOLD HIMSELF HE WOULD FORGET Ann Turner. He
N01 0020 was well rid of her. He certainly didn't want a wife who was fickle
N01 0030 as Ann. If he had married her, he'd have been asking for trouble.
N01 0040 But all of this was rationalization. Sometimes he woke up in
N01 0050 the middle of the night thinking of Ann, and then could not get back
N01 0060 to sleep. His plans and dreams had revolved around her so much and for
N01 0070 so long that now he felt as if he had nothing. The easiest thing would
N01 0080 be to sell out to Al Budd and leave the country, but there was
N01 0090 a stubborn streak in him that wouldn't allow it. The best antidote
N01 0100 for the bitterness and disappointment that poisoned him was hard
N01 0110 work. He found that if he was tired enough at night, he went to sleep

Users can choose the version (from those available to them) which includes the
information that they need. If you are only interested in word frequencies, then the
grammatical information encoded in the tagged version is redundant, and the
untagged version can be used. If however you are looking for the word 'set' used as a
noun, then it would be necessary to use a tagged version, so that this word can be
differentiated from 'set' used as a verb or adjective.
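A search of this kind on the vertical tagged format might look like the sketch below. It assumes that the fields are separated by single blanks exactly as in the printed sample (the real files may use fixed columns or tabs), and the two lines containing 'set' are invented for the illustration; they are not taken from the corpus.

```shell
tmpdir=$(mktemp -d)
# Three lines in the style of the sample above, plus two made-up lines.
printf '%s\n' \
    'N01002040 - NN bachelor' \
    'N01002070 - NN life' \
    'N01003010 - VB regard' \
    'N09001010 - NN set' \
    'N09001020 - VB set' > "$tmpdir/tagged"

# All words tagged NN: select the lines, then delete up to the last blank.
nouns=$(grep ' NN [^ ]*$' "$tmpdir/tagged" | sed 's/.* //')

# How many times does 'set' occur with the tag NN?
set_as_noun=$(grep -c ' NN set$' "$tmpdir/tagged")

echo "$nouns"
echo "$set_as_noun"
rm -rf "$tmpdir"
```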


Processing LOB and Brown

The Susanne corpus

This corpus uses a section of the Brown corpus and marks it up with syntactic
information.

N01:0010a - YB <minbrk> - [Oh.Oh]
N01:0010b - NP1m DAN Dan [O[S[Nns:s.
N01:0010c - NP1s MORGAN Morgan .Nns:s]
N01:0010d - VVDv TOLD tell [Vd.Vd]
N01:0010e - PPX1m HIMSELF himself [Nos:i.Nos:i]
N01:0010f - PPHS1m HE he [Fn:o[Nas:s.Nas:s]
N01:0010g - VMd WOULD will [Vdc.
N01:0010h - VV0v FORGET forget .Vdc]
N01:0010i - NP1f Ann Ann [Nns:o.
N01:0010j - NP1s Turner Turner .Nns:o]Fn:o]S]
N01:0010k - YF +. - .
N01:0010m - PPHS1m He he [S[Nas:s.Nas:s]
N01:0020a - VBDZ was be [Vsb.Vsb]
N01:0020b - RR well well [Tn:e[R:h.R:h]
N01:0020c - VVNt rid rid [Vn.Vn]
N01:0020d - IO of of [Po:u.
N01:0020e - PPHO1f her she .Po:u]Tn:e]S]
N01:0020f - YF +. - .
N01:0020g - PPHS1m He he [S[Nas:s.Nas:s]
N01:0020h - RR certainly certainly [R:m.R:m]
N01:0020i - VDD did do [Vde.
N01:0020j - XX +n<apos>t not .
N01:0020k - VV0v want want .Vde]
N01:0020m - AT1 a a [Ns:o101.
N01:0020n - NN1c wife wife .
N01:0020p - PNQSr who who [Fr[Nq:s101.Nq:s101]

The London-Lund corpus

This corpus differs from the others that we have looked at because it is a transcription
of spoken English. Intonation is marked.

1 1 1 10 1 1 B 11 ((of ^Spanish)) . graph\ology#/

1 1 1 20 1 1 A 11 ^w=ell# ./

1 1 1 30 1 1 A 11 ((if)) did ^y/ou _set _that# - /

1 1 1 40 1 1 B 11 ^well !J\oe and _I#/

1 1 1 50 1 1 B 11 ^set it betw\een _us#/

1 1 1 60 1 1 B 11 ^actually !Joe 'set the :p\aper#/

1 1 1 70 1 1 B 20 and *((3 to 4 sylls))*/

1 1 1 80 1 1 A 11 *^w=ell# ./


1 1 1 90 1 1 A 11 "^m/\ay* I _ask#/

1 1 1 100 1 1 A 11 ^what goes !\into that paper n/ow#/

1 1 1 110 1 1 A 11 be^cause I !have to adv=ise# ./

1 1 1 120 1 1 A 21 ((a)) ^couple of people who are !d\oing [dhi: @]/

1 1 1 130 1 1 B 11 well ^what you :d\/o#/

1 1 1 140 1 2 B 12 ^is to - - ^this is sort of be:tween the :tw\/o of /

1 1 1 140 1 1 B 12 _us# /

1 1 1 150 1 1 B 11 ^what *you* :d\/o#/

1 1 1 160 2 1 B 23 is to ^make sure that your 'own . !c\andidate/

1 1 1 170 1 1 A 11 *^[\m]#*/

1 1 1 160 1 2(B 13 is . *.* ^that your . there`s ^something that your /

1 1 1 160 1 1(B 13 :own candidate can :h\/andle# - -/

CUVOALD

This acronym stands for the Computer Usable Version of the Oxford Advanced
Learner's Dictionary. There are in fact two versions. The most useful is usually in a
file called cuv2.dat, which contains 68742 words including inflected forms and proper
nouns. It is most often of use as a wordlist, but the file also contains a phonemic
transcription and a part-of-speech tag for every word. Here is a sample of cuv2.dat:

verbs v3bz Kj
verdancy 'v3dnsI L@
verdant 'v3dnt OA
verdict 'v3dIkt K6
verdicts 'v3dIkts Kj
verdigris 'v3dIgrIs L@
verdure 'v3dj@R L@
verge v3dZ I2,K6 3A
verged v3dZd Ic,Id 3A
verger 'v3dZ@R K6
vergers 'v3dZ@z Kj
verges 'v3dZIz Ia,Kj 3A
verging 'v3dZIN Ib 3A
verifiable 'verIfaI@bl OA
verification ,verIfI'keISn M6
verifications ,verIfI'keISnz Mj
verified 'verIfaId Hc,Hd 6A
verifies 'verIfaIz Ha 6A
verify 'verIfaI H3 6A
verifying 'verIfaIIN Hb 6A
verily 'ver@lI Pu
verisimilitude ,verIsI'mIlItjud M6


verisimilitudes ,verIsI'mIlItjudz Mj
veritable 'verIt@bl OA
verities 'verItIz Mj
verity 'verItI M8
vermicelli ,v3mI'selI L@
vermiform 'v3mIfOm OA
vermilion v@'mIlI@n M6,OA

The coding conventions for the phonemic and syntactic tags are explained in a file
that comes with the dictionary. Some examples of applications that use the dictionary
can be found in the appendix of this course.
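Using the dictionary as a wordlist could be sketched as follows. The four lines are copied from the sample above into a scratch file; the sketch assumes that the word is followed by a blank, as it appears in print, whereas the real cuv2.dat may use fixed-width columns, in which case cut -c would be the better tool.

```shell
tmpdir=$(mktemp -d)
# Four lines copied from the sample into a scratch file.
printf '%s\n' \
    "verdict 'v3dIkt K6" \
    "verge v3dZ I2,K6 3A" \
    "verify 'verIfaI H3 6A" \
    "verifying 'verIfaIIN Hb 6A" > "$tmpdir/cuv2.sample"

# Plain wordlist: delete everything from the first blank onwards.
wordlist=$(sed 's/ .*//' "$tmpdir/cuv2.sample")

# Words beginning with 'veri'.
veri_words=$(grep '^veri' "$tmpdir/cuv2.sample" | sed 's/ .*//')

echo "$wordlist"
echo "$veri_words"
rm -rf "$tmpdir"
```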

Other texts

Corpus building is currently a growth area, and there are many, many more corpora as
well as the above examples. Currently available or under construction are a number of
very large corpora, comprehensive corpora aiming to cover all registers of English,
international English corpora, corpora of different languages and specialised corpora
covering a single well-defined domain of language.

Exercises
1. Find a large text file with a fixed field format (e.g. the Brown or LOB corpora) and
inspect the format. Use zcat to view it if necessary.

3. Use cut to strip away the reference material and leave just the text field.

4. Use tr to strip away any tags that are actually in the text (e.g. attached to the
words), so that you are left with just the words.

5. Make a sorted wordlist from the file.

6. Combine the above commands in a shell script so that you have a small program
for extracting a wordlist.


CHAPTER 12 - INTRODUCTION TO
THE VI SCREEN EDITOR

What is vi
Vi is a screen editor. This means that you can see part of the file in a window on the
screen, and editing operations can be controlled by moving a cursor around the text on
screen.

Vi works in a different way from the editing functions of modern word processors. Its
effective use requires a considerable amount of expertise on the part of the user. The
user must have the ability to remember and manipulate opaquely named one-letter
commands that can be combined in an arbitrary variety of different ways.

Vi is a screen-based version of ex. Its lack of user-friendliness is largely a result of
this. In many ways it still works like a line editor, with complicated commands typed
in by the user.

The main enhancements on ex are the window, which enables you to constantly view
part or all of the file, the visible cursor and the commands that can be issued without
moving to the command line. Once you have learned to start vi, you will probably not
need to use ex again. Everything that you have learned with ex, you can do with vi.
What is more, with vi you have a window and the possibility to use interactive
commands. The only time that you might want to use ex now is if you have trouble
running a screen-based utility on your terminal.

Using vi
The next section lists the commands needed to start and use vi. In this section, the key
concepts underpinning the use of vi are explained so that you can understand what is
happening when you use it.

The first thing to understand is that there are three modes:

command mode

insert mode

last line mode (or command line mode)

You start in command mode. The commands listed below for moving the cursor and
changing the file are entered in command mode. To enter a command simply type it at
the keyboard. What you type will not appear anywhere on screen. To abandon a
command you have started, you can type <ESC>. If you are not sure which mode you
are in at any time you can type <ESC> and return to command mode. When you leave
the other modes you return to command mode.

Insert mode is used to enter text. Insert mode is entered by issuing one of a variety
of commands that involve entering text. Insert mode must be exited in order to issue
more commands. A common mistake is to attempt to enter a command while in insert
mode, which results in the command appearing on screen as part of the text.

Last line mode is entered from command mode, and enables the user to type a
command on the last line of the screen. Any ex command can be used in this way,
simply by typing ':' followed by the command. The current line will be that where the
cursor is positioned.

When you start vi you will see a screen similar to the one below. If you are starting a
new file, or the file you are editing is less than 18 lines long, then the empty lines in
the window will be marked by the '~' (tilde) character.

This is a small file called 'vi.prac'.
This is the second and last line.
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
~
"vi.prac" 2 lines 103 characters
A typical vi screen

Note that it is necessary to press return at the end of each line of text that you enter.
Otherwise, vi will interpret all of your text as a single line!

PRACTICE

Create a new file, enter several lines of text and save it.

Edit an existing file that you have, making several changes.


vi reference
vi modes
· command Normal and initial state. <ESC> cancels partial command
· insert entered by the following commands: a, A, i, I, o, O, c, C, s,
S, R. Terminates with <ESC> (or ^C ).
· last line entered by :, /, ? or !. Input is read and echoed at the bottom of the
screen. Commands executed by <RETURN> or <ESC>, terminated by ^C.

Entering and leaving vi


· % vi file edit file
· % vi +n file edit starting at line n
· % vi + file edit starting at end
· % vi +/RE/ file edit starting at RE
· % view file read only mode
· ZZ exit from vi, saving changes (same as :wq)
· ^Z stop vi process, for later resumption

Some simple commands

The following are examples of some compound commands, using the operators listed
later.

· dw delete word
· de delete word leaving punctuation
· dd delete line
· 4dd delete 4 lines
· xp transpose characters
· cwtext <ESC> change word to text

File manipulation

The following are all last line mode commands, so must be preceded by a colon.

· w save changes
· wq save and quit
· q quit
· q! quit, discarding changes
· e file edit file
· e! re-edit current file, discarding changes
· w file write to file
· w! file overwrite file
· ! command execute shell command, then return
· f show current file and line


Positioning within the file


· ^F forward one screenful
· ^B back one screenful
· ^D scroll down half screen
· ^U scroll up half screen
· nG go to line n (last line default)
· /RE/ go to next occurrence of RE
· % find matching bracket

Marking
· `` return to previous cursor position
· mx mark position with x
· `x go to mark x

Line positioning
· H top line of window (home)
· M middle line of window
· L last line of window
· + next line, at first non-white character
· - previous line, at first non-white character
· <RETURN> same as +
· j next line, same column (same as down arrow)
· k previous line, same column (same as up arrow)

Character positioning
· 0 beginning of line
· ^ first non-white in line
· $ end of line
· <SPACE> forward (same as right arrow)
· fx find x forwards in current line
· Fx find x backwards in current line
· ; repeat last find command in the same direction
· , repeat last find command in the opposite direction
· n| go to column n

Words, sentences, paragraphs


· w forward to start of next word (delimited by non-alphanumeric character)
· b back to start of last word
· e forward to end of next word
· W as w, with word delimited by blank only
· B as b, with word delimited by blank only
· E as e, with word delimited by blank only
· ) forward to start of next sentence
· ( back to start of last sentence
· } forward to start of next paragraph
· { back to start of last paragraph


Corrections during insert


· ^H erase last character (or your usual delete key)
· ^W erase last word
· \ escape character
· <ESC> ends insert; back to command mode
· ^C ends insert

Insert and replace commands


· a append after cursor
· i insert before cursor
· A append at end of line
· I insert before first non-blank
· o open line below current line
· O open line above current line
· rx replace single character with x
· R replace characters

Operators

The following can be doubled to apply to a line and also preceded by a number to
indicate a number of lines. They can be combined with positional commands (e.g. d$
to delete to end of line).

· d delete
· c change
· y yank

Miscellaneous operations
· x delete character
· X delete character to left of cursor
· C change rest of line (same as c$).
· D delete rest of line (same as d$)
· J join lines
· Y yank (copy) lines

Yank and put


· p put back after cursor
· P put back before cursor
· "xp put from buffer x
· "xy yank to buffer x
· "xd delete to buffer x


Undo, redo and retrieve


· u undo last change
· U restore current line
· . repeat last command
· "np retrieve nth last delete


CHAPTER 13 - TEXT FORMATTING

There are text formatting facilities available with all Unix implementations. They will
not be investigated in any detail here. Many users will prefer to use a PC-based word
processing package for document production. Those that want to format text on Unix
will have vastly differing needs, and it would be impossible to go into all of the
possibilities here. A flavour of the simpler programs is given here, and users can look
elsewhere for more extensive documentation.

pr
This is a filter that will format a text, giving a choice of columns, page width, length
etc. It is not capable of sophisticated formatting for document production.

nroff
The simplest of the proper formatters is nroff. You can format a plain text file with
nroff, by simply typing:

% nroff text_file

Formatting commands can be inserted into text files. Some simple commands:

.ce centre text
.ll line length
.pl page length
.po page offset (left margin)
.sp blank line

These commands may be followed by a numerical argument, which will make the
command apply to the specified number of lines, e.g. .sp 3 to leave three blank lines.
Formatting commands must be placed at the beginning of a line to be recognised as
such. Normally they appear as the only text on a line. Commands are normally
composed of lower-case characters. Here is an example of a text containing some
nroff instructions:

.ce
This is the title
.sp 2
And this is the text, which
will be formatted and justified when I run nroff. You will see
that the line
breaks will change, and the text will look tidier. That is what
formatting is all about.
.sp
That was a blank line.

The following is what the output from this file would look like:


                        This is the title


And this is the text, which will be formatted and justified when I
run nroff. You will see that the line breaks will change, and the
text will look tidier. That is what formatting is all about.

That was a blank line.

nroff macros
Macros are a special type of nroff command, identified by being in upper-case
characters. Standard macro libraries can be invoked by using option flags with the
nroff command, e.g.:

nroff -ms filename

for the standard macros. Other macro libraries can be invoked by the me, mn and mv
options. Here are some standard macros:

.FS footnote starts
.FE footnote ends
.ND no date
.TL title
.PP start paragraph

The .PP tag, for example, is the equivalent of the following sequence of ordinary
nroff instructions:

.sp 5
.ce 1
.sp 5

It is possible to write your own macros.

More details on nroff can be found in the manual.


CHAPTER 14 - MORE ON THE
SHELL

General
The role of the shell

A Unix shell is used to:

evaluate the command line. For example:

% car nofile
car: Command not found

Here the shell looks for a command called car. Since it cannot find this command it
gives an error message.

perform variable substitution. For example:

% echo "In directory $HOME"
In directory /home/sunserv1_b/lnp5jb

Here the shell variable $HOME is evaluated and displayed.

handle pipelines. For example:

% who | wc -l

Here the output from who is piped through to the wc command which displays a count
of the number of lines in its input.
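The three jobs described above can be observed from a script as well as at the prompt. This is a sketch in Bourne shell syntax; the command name no_such_command_xyz and the variable textdir are deliberately invented, and two fixed lines stand in for the output of who.

```shell
# Evaluating the command line: a command the shell cannot find.
# The shell reports the failure through exit status 127.
if no_such_command_xyz nofile 2>/dev/null; then
    status=0
else
    status=$?
fi

# Variable substitution: the shell replaces $textdir before echo runs.
textdir=/home/texts
message=$(echo "In directory $textdir")

# Pipelines: two lines are piped to wc -l in place of 'who'.
line_count=$(printf '%s\n' 'user1 tty1' 'user2 tty2' | wc -l | tr -d ' ')

echo "$status"
echo "$message"
echo "$line_count"
```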

Types of shells

A number of shells are available for Unix systems, including:

Bourne shell

C shell

Korn shell

Graphical User Interface (GUI) shells


The Bourne shell, which was developed by Steve Bourne at Bell Laboratories, is one
of the oldest shells and, as such, has gained a lot of popularity. It is widely used for
shell programming because of its efficiency and because it is available on all Unix
systems.

The C shell provides sophisticated interactive capabilities lacking in the Bourne shell.
The C shell, which was developed at the University of California, Berkeley, has a
syntax which resembles the C language. Features of the C shell include a command
history buffer, command aliases and file name completion.

However, the C shell does not allow efficient shell programs (also known as scripts) to
be written. Because C shell programs are written in a style similar to the C
programming language, people who are unfamiliar with C may find the C shell
difficult to program in.

The Korn shell combines the best features of the Bourne and C shells. Korn scripts are
95% upwardly compatible with Bourne scripts. The Korn shell interactive features
include:

in-line editing

command editing

job control

Graphical User Interface (GUI) shells provide an iconic interface to Unix. GUI shells
require the use of workstations (or powerful microcomputers) which perform part of
the processing locally. The use of GUIs such as X-Windows is likely to become
increasingly important in the near future. GUIs currently available include:

Sun View A Sun-specific GUI

Open Look GUI standard supported by Sun

Motif GUI standard supported by other suppliers

Vista eXceed Available on PCs; similar in style to Motif

There is a battle currently taking place in the market-place to establish the standard
GUI.

Recommended shells

The Bourne shell is the oldest shell, and is widely used. The C shell has more utilities
however and is probably more widely used now.


The default shell for interactive shells at Leeds is the C shell. The Bourne shell is the
default for shell programs.

However the Bourne shell is recommended for shell programs. The Korn shell is not
widely available and is not a standard part of Unix, but is perhaps the best option if
available, unless you want to do a lot of C programming. You can change your default
login shell using the command:

% chsh username /bin/sh       Bourne shell
% chsh username /bin/csh      C shell
% chsh username /bin/ksh      Korn shell

Warning! You probably don't want to try these commands now.

C shell features
The history mechanism

The history mechanism enables previously typed Unix commands to be re-invoked and
edited. There are two forms. One is the quick substitution, which acts only on the
immediately preceding command, e.g.:

% car message
car: Command not found
% ^r^t
This is the message file

This command replaces the first occurrence of 'r' with 't' in the last command.

A list of previously entered commands can be displayed using the history command:

% history
1 cd texts
2 vi lookup
3 who
4 history

Commands can be re-entered using the number. For example:

% !2

will re-execute the second command (vi lookup). It is possible to add extra options
to commands re-executed. For example to redirect output from the who command to a
file called list we could give the command (for the above list):

% !3 > list


You may also edit previous commands e.g:

% !2:s/vi/cat/
cat lookup

although it is usually easier to re-type the whole command. The last command may be
referred to as !!, and you can count back using !-2, !-3 etc.

File name completion

Within the C shell when a file name is used in a command it is possible to specify
only as many characters as will uniquely identify the file, and then press the <ESC>
key to complete the filename:

% ls
mbox message
% cat me<ESC>
This is the message file

When you type <ESC>, the file name will be extended to 'message' on screen.

Command aliases

Command aliases provide a way of customising commands. For example:

% alias dir ls
% dir
mbox message

Note that command aliases are only valid during the execution of the current shell. It
is normal practice to include alias definitions in your .cshrc file.

The following aliases could be useful to shorten long command names:

alias hh history
alias ll 'ls -al'
alias q logout

The quotes around ls -al are necessary because of the space in the command. This
tells the shell that it is all one command.


PRACTICE

Put the above aliases in your .cshrc file. Think of some other aliases that you would
use, such as shortened versions of commands or different names for commands that
you will find easier to remember.

C shell startup files

Certain files are executed automatically.

These are:

.cshrc file

Executed whenever a new C shell is spawned

Useful for specifying command aliases

Since C shells may be spawned automatically by certain system commands (such as
the mail system or a compiler) this file should NOT contain commands which send
output to your terminal.

Contains the list of directories that are searched for commands. A line in the .cshrc file
will give a value to the PATH system variable. The user can add pathnames to this list.
It is conventional to store any of your own commands or shell scripts that you will use
frequently in a directory called bin, and to add ~/bin to your search path.

.login file

Executed when you login.

Used for setting system-wide variables, such as your terminal type.

Can be used to display information, such as who is logged on, or news from the
system managers.

Shell processes
A process is an executing program. To display a list of processes use the ps command:

% ps
PID TTY TIME COMMAND
23268 ttyp1 0:01 ps
22520 ttyp1 0:00 csh


The PID is the Process Identifier. The TIME field gives the amount of CPU time
used by the process.

Background processes

Normally processes run interactively (in the foreground), but they may also be run in
the background, to enable the user to do something else while a process is running
(this is known as 'multitasking'). This is usually useful when you are running a very
long job. To run a command in the background use the & character at the end of the
command line, as follows:

% command &

Note that output from command will still be sent to standard output. If you fail to
redirect standard output it will be sent to your terminal where it is likely to be
confused with output from your interactive process.

For example, to sort logged on users using a background process give the command:

% who | sort > sortedwho &

Note that this would normally be a very short process and you would not in fact need
to run it in the background.
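
The same pattern can be sketched for the Bourne shell. This is only an illustration: the file name bgdemo.$$ and the three words of input are made up, and the special parameters ! and ? (listed under 'Special shell variables') are used to keep track of the job.

```sh
# Hypothetical output file; $$ (the shell's own process ID) keeps the name unique.
outfile=${TMPDIR:-/tmp}/bgdemo.$$

# Run a small pipeline in the background with its standard output
# redirected, so that nothing is written to the terminal.
(echo pear; echo apple; echo fig) | sort > "$outfile" &
bgpid=$!              # process ID of the background job

# ... other commands could be run here while the job is working ...

wait $bgpid           # pause until the background job has finished
bgstatus=$?           # exit status of the job, as reported by wait
sorted=`cat "$outfile"`
rm -f "$outfile"
echo "$sorted"
```

The wait command is only needed when a later command depends on the output of the background job.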

Controlling processes

You may wish to terminate a background process. To do this you must first find
out its process ID (PID) using ps:

% ps
PID TTY TIME COMMAND
23397 ttyp1 0:01 who
23268 ttyp1 0:02 ps
22520 ttyp1 0:00 csh

Then use the kill command to terminate your process.

For example:

% kill 23397

If the process continues use the -9 argument:

% kill -9 23397
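
The sequence can be tried safely in a Bourne shell script with a harmless job. In this sketch sleep 60 merely stands in for a long-running command, and the variable names are invented for the example.

```sh
sleep 60 &                      # stands in for a long-running job
pid=$!                          # its process ID (ps would show the same number)

kill $pid                       # ask the process to terminate

# wait collects the exit status of the job; it is non-zero here
# because the job was killed rather than ending normally.
if wait $pid 2>/dev/null
then ended=normally
else ended=abnormally
fi

# Signal 0 sends nothing; it only tests whether the process still exists.
if kill -0 $pid 2>/dev/null
then state=running
else state=gone
fi
echo "The job ended $ended and is now $state"
```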

Another way of displaying your background processes is to use the jobs command:

% jobs
[1] + Running who | sort > sortedwho


The background process (or 'job') has been assigned the number 1, and this can be
used to refer to it instead of the process ID. The job number is usually identified by
preceding it with the '%' (per cent) character, so as to differentiate it from a process
ID. So, for example, the command:

% kill %1

should kill this process. A job may also be stopped using ^Z if it is running
interactively (you have already met this as a way of stopping vi). A stopped job can
be resumed by simply typing its job number (e.g. %1 to run it in the foreground, or %1
& to run it in the background).

NOTE There are also the C shell commands fg and bg, which will bring a job to the
foreground and push it to the background respectively.

Controlling Processes After Logging Off

If you create a background process and log off, the background process will continue
to execute. If you log in again and use the ps or jobs command to display your
background processes, the original background process will not be displayed. The
same will happen if you switch between windows when you are using a GUI. This is
because these commands, by default, will only display processes that have been
created (or 'spawned') by the current login process.

To display background processes spawned by a previous login session you will have
to use the command:

% ps -fu lnp5jb
UID     PID   PPID  C   STIME     TTY    TIME   COMMAND
lnp5jb  7759  7757  0   10:37:21  ttyw7  0:00   who > sortedwho
lnp5jb  5058  5057  0   09:57:02  ttyw5  10:03  longjob
lnp5jb  7760  7758  18  10:37:21  ttyv4  0:00   -csh [csh]
lnp5jb  7798  7760  6   10:37:42  ttyv4  0:00   ps -fu lnp5jb

Special characters
Certain characters have a special meaning to the shell. The backslash (\) is known as
the escape character. A character following an escape character is not given its usual
meaning by the shell. For example:

% echo "This is a very long message \
which is longer than 1 line"

In this example, because the command could not fit on one line, the \ character was
typed IMMEDIATELY before the <RETURN> key was pressed. This indicates that the
<RETURN> does not have its usual meaning, which is to terminate the command. The
double quotes character (") is used to group words together as a single expression.
The back quote (`) is used to identify a string which is to be executed as a command
and replaced by its output, rather than displayed as it stands. For example:

% echo "Users logged on are: `who`"


PRACTICE

Try this with and without the backquotes around who. Try it with date.
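
The three characters can be seen working together in the short Bourne shell sketch below. The variable names and strings are invented for the illustration, and the behaviour shown is that of the Bourne shell, which may differ in detail from the C shell.

```sh
# Back quotes: the enclosed command is run and its output is substituted.
user=`basename /home/sunserv1_b/lnp5jb`

# Double quotes group words into one string; variables are still expanded.
msg="Home directory of $user"

# A backslash removes the special meaning of the character after it ...
price="Costs \$5"

# ... including <RETURN>: the two lines below form a single string.
long="one \
two"

echo "$msg"
echo "$price"
echo "$long"
```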

Shell parameters
Parameters can be set interactively in the C shell by using the set command:

% set jenny=/home/sunserv1_b/lnp5jb

To then use a parameter:

% cd $jenny
% pwd
/home/sunserv1_b/lnp5jb

The variable name is preceded by the $ prefix, to indicate that it is a variable. Curly
brackets can be used to delimit the variable name if other characters need to come
straight after it. For example:

% cat ${jenny}/test.dat

Note that the syntax for the Bourne shell is slightly different. You are most likely to
use parameters in shell scripts, which may well be executed by a Bourne shell.
The basic difference is that the set command is not used:

$ jenny=/home/sunserv1_b/lnp5jb
$ cd $jenny
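
A slightly longer Bourne shell sketch shows where the curly brackets matter. The names datafile and backup are made up for the example, and no files are actually read or written.

```sh
jenny=/home/sunserv1_b/lnp5jb     # no "set", and no spaces around the =

# "/" cannot be part of a variable name, so brackets are optional here.
datafile=$jenny/test.dat

# Brackets are needed here: $jenny_old would name a different variable.
backup=${jenny}_old

echo $datafile
echo $backup
```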

Special shell variables

CDPATH informs the shell where to search for the relative pathnames

HOME the name of your home directory

MAIL the pathname of the file where your mail is placed

PATH the list of directories searched for commands

PS1 the primary prompt string

PS2 the secondary prompt string


! the process number of the last process run in the background

# the number of positional parameters

$ the process number of the current shell

? the exit status of the last command run (0 if it was completed successfully, non-zero
otherwise).

You can see the values of these variables by typing set (with no arguments).
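
The four single-character parameters are read by putting a $ in front of them, like any other variable. The following Bourne shell sketch collects each one; the variable names and the commands run are chosen only for illustration.

```sh
shellpid=$$                       # $: process number of the current shell

sleep 1 &
bgpid=$!                          # !: process number of the last background job
wait $bgpid

ls / > /dev/null
okstatus=$?                       # ?: 0, because ls completed successfully

# #: the number of positional parameters, here seen by a one-line
# script that is given three arguments ("demo" is just its name).
nparams=`sh -c 'echo $#' demo one two three`

echo "shell $shellpid, job $bgpid, status $okstatus, $nparams parameters"
```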


CHAPTER 15 - SHELL PROGRAMMING

Shell commands can be stored in a file which can be executed when required. A file
containing shell commands is known as a script. For example:

% cat > list              - create the file
pwd
ls
^D
% chmod u+x list          - give execute permission
% list                    - execute the script
/home/sunserv2_a/lnp5jb
mbox message list bin

Control structures
As mentioned earlier you are strongly recommended to carry out all shell
programming in the Bourne Shell. This does not mean that you have to be running a
Bourne Shell when you start a script. By default, all shell scripts are normally
executed by the Bourne shell, whatever your normal interactive shell. This is possible
because when you run a script a new shell is started ('spawned' according to the
jargon) to run the commands. You can add (as the first line):

#! /bin/sh

to ensure that it is a Bourne Shell script. A C shell script would begin:

#! /bin/csh

Command parameters

The Bourne shell is capable of using parameters (see the section on parameters in the
previous chapter.) These may be defined by the attribution operator =, by the read
command and by the for command. The Bourne shell also interprets parameters
which are given as arguments to the command that executes the shell script. Such
parameters are 'positional parameters', which means that they are interpreted as a list
structure. This can be seen in the simple example below:


% ex simple               - first create the script
"simple" [New file]
:a
echo $1
echo $2
echo $3
.
:wq
"simple" [New file] 3 lines, 24 characters
% chmod u+x simple        - make it executable
% simple one two three    - execute it
one
two
three
%

The three arguments given to the script ('one', 'two' and 'three') are read in by the
script as variables named 1, 2 and 3, and so are referred to in the script as $1, $2 and
$3 respectively. The special parameter * refers to all of the parameters, and the special
parameter # refers to the number of parameters.

% ex simple2              - create a new script
"simple2" [New file]
:a
echo $*
echo $#
.
:wq
"simple2" [New file] 2 lines, 16 characters
% chmod u+x simple2
% simple2 one two three
one two three
3
%
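
The positional and special parameters can also be tried without an editor. The sketch below writes a small script with a 'here document' (an assumption of convenience; ex would do just as well), runs it with three arguments and then removes it. The file name params.$$ is made up.

```sh
script=${TMPDIR:-/tmp}/params.$$

# Write a three-line script; the quoted 'EOF' stops the shell from
# expanding $1, $* and $# while the file is being created.
cat > "$script" <<'EOF'
echo "first: $1"
echo "all: $*"
echo "count: $#"
EOF

# Run the script with a Bourne shell, giving it three arguments.
result=`sh "$script" one two three`
rm -f "$script"
echo "$result"
```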

read

The read command enables parameter values to be entered interactively by the user
while the script is running. It is usual to provide a prompt for the user, as in the script
listed below (called greeting):

echo "What's your name?"
read name
echo "Hello, $name"

This can give the following results:

% greeting                - execute the script
What's your name?         - output
Jenny                     - the shell waits for your input
Hello, Jenny              - output
%
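
Because read simply takes a line from standard input, the input does not have to come from the keyboard. As a sketch, the Bourne shell fragment below supplies the line from a 'here document'; the names and values are invented for the illustration.

```sh
# The line between <<EOF and EOF becomes the standard input of read.
read name <<EOF
Jenny
EOF
greeting="Hello, $name"
echo "$greeting"

# With two variables, the words of the line are shared out in order.
read first second <<EOF
Jenny Smith
EOF
echo "First name: $first"
echo "Surname: $second"
```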

More than one parameter can be given to the read command, usually separated by
one or more spaces, as in the following script (called count):

echo How far can you count?
read first second third
echo $first $second $third

which can run to give the following:

% count
How far can you count?
1 2 3                     - user input
1 2 3                     - script output

PRACTICE

See what happens with this script if you give it fewer than three parameters. Try it with
more than three - is this what you expected? Can you explain this?

Try changing the script so that it echoes each parameter on a different line. This
should show what is going on.

Control structures
Sometimes it is useful to use control structures (like you find in programming
languages), for example specifying that a command is only carried out under certain
conditions, or that it does the same thing to a list of arguments. The shell provides
control of flow with the following statements:

if structured control branching

case multiway branching

for looping over a list of commands

while conditional looping

until conditional looping

if...then...else...fi

This structure allows conditional branching. It takes the following form:

if command_list_1
then command_list_2
[else command_list_3]     - this clause is optional
fi

Note that it is usual practice to indent the subordinate clauses, to make the script
easier to read, but it is not necessary. This structure depends on the exit status of
command_list_1. Every time a command runs it returns an exit status of 0 (also known
as a 'true' result) if it completes its run successfully, or a non-zero value such as 1
('false') if it fails to end normally. The command_list_2 is executed if and only if the
exit status of the last command in command_list_1 is 0 (or true). The command_list_3
is executed if and only if the exit status of the last command in command_list_1 is
non-zero (or false).

The test command is often used to generate an exit result. Equivalence operators
may also be used such as = (equals) or != (not equal to). The following example
shows the script trio in action:


% cat trio
if test $# = 3
then echo "There are three parameters."
fi
% trio one two three
There are three parameters.
% trio one two
% - No output
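
Any command can supply the exit status that if examines, not only test. As a sketch, the fragment below uses grep, which exits with status 0 when it finds its pattern and non-zero when it does not; the text searched and the variable names are invented for the example.

```sh
line="An introduction to the Unix operating system"

# grep succeeds (exit status 0) because the line contains "Unix".
if echo "$line" | grep Unix > /dev/null
then found=yes
else found=no
fi

# grep fails (non-zero exit status) because "Windows" does not occur.
if echo "$line" | grep Windows > /dev/null
then other=yes
else other=no
fi

echo "Unix: $found, Windows: $other"
```

The output of grep is sent to /dev/null because only its exit status is of interest here.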

test
The test command can be used in its simplest form to test if a string exists (more
exactly, if it is a 'null string' or not), as in the following script:

% cat test.1
echo "Type something please:"
read a
if test $a
then echo "Thank you"
else echo "Thanks for nothing"
fi
% test.1
Type something please:
Hello
Thank you
% test.1
Type something please:
Thanks for nothing
%

There are also several options that can be used in a command of the form:

test [options ] filename

The following options are available:

-d true if a file is a directory

-h true if a file is a symbolic link

-x true if file exists and is executable

-l tests the length of a string

-f true if the file exists and is an ordinary file

-r true if the file can be read

-s true if the file exists and is not empty

-w true if the file can be written to

= is equal to


!= is not equal to

There are also the following arithmetic operators which apply to integer values:

-eq is equal to

-ne is not equal to

-gt is greater than

-ge is greater than or equal to

-lt is less than

-le is less than or equal to

Note that the above operators are all for use with the test command, and cannot be
used independently.
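
A few of these options and operators are tried out in the Bourne shell sketch below. The temporary file name and the variable names are made up; the last two tests show the difference between comparing strings with = and comparing integer values with -eq.

```sh
f=${TMPDIR:-/tmp}/testdemo.$$
echo hello > "$f"                 # create a small file to test

if test -f "$f"; then isfile=yes;   else isfile=no;   fi
if test -d "$f"; then isdir=yes;    else isdir=no;    fi
if test -s "$f"; then nonempty=yes; else nonempty=no; fi
rm -f "$f"

# As strings, 007 and 7 are different; as integers they are equal.
if test 007 = 7;   then same_string=yes; else same_string=no; fi
if test 007 -eq 7; then same_number=yes; else same_number=no; fi

echo "file: $isfile, directory: $isdir, not empty: $nonempty"
echo "same string: $same_string, same number: $same_number"
```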

case

When more than two directions for the control of flow are needed, if clauses may be
nested, but the case structure provides a more elegant way of doing this. The case
structure is of the form:

case string in
pattern ) command_list_1 ;;
pattern ) command_list_2 ;;
...
pattern ) command_list_N ;;
esac

The shell attempts to match the string with each pattern in turn. When a pattern that
matches string is found, the appropriate command list is executed, and the case
command is then terminated.

The case command is often used to give the user a choice of options, as in the
following:

% cat pick
echo "Type one of the following:"
echo " 1 - who am I?"
echo " 2 - who is logged on?"
echo " 3 - date"
echo " 4 - calendar"
read n
case $n in
1) whoami ;;
2) who ;;
3) date ;;
4) cal ;;
esac


Study the following, rather more complex, example:

% cat test.2
echo "Give me a letter:"
read letter
case $letter in
[aeiou]) echo "That's a vowel!";;
[b-df-hj-np-tv-z]) echo "That's a consonant!";;
[A-Z]) echo "I said lower case!";;
[1-9]) echo "I said a letter, not a number!";;
*) echo "What's that?"
esac
echo "Thank you and goodbye"
% test.2
Give me a letter:
a
That's a vowel!
Thank you and goodbye
% test.2
Give me a letter:
x
That's a consonant!
Thank you and goodbye
% test.2
Give me a letter:
;
What's that?
Thank you and goodbye
%

Note that the last pattern in this case clause will match anything if a match has not
already been found.
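
The patterns in a case clause are the same kind of patterns that the shell uses for file names, so * can stand for any run of characters. The sketch below sorts three made-up names by their endings; it borrows the for loop, which is described in the next section, simply to try several names in turn.

```sh
summary=
for name in UK.txt UK.sol lookup
do
    case $name in
    *.txt) kind=text ;;           # any name ending in .txt
    *.sol) kind=solutions ;;      # any name ending in .sol
    *)     kind=other ;;          # anything not matched above
    esac
    summary="$summary $name:$kind"
done
echo $summary
```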

for

The for command can be used to apply a list of commands to a series of variables. It
has the general form:

for variable [in wordlist]
do
command_list
done

The wordlist is a series of strings separated by spaces. The variable takes the value of
each of these strings consecutively and then runs the command list. Here is an example:

for n in one two three four five six seven
do
echo $n
done

This script will output the list of words ('one', 'two', etc.)
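
The square brackets in the general form show that the 'in wordlist' part may be left out. In that case the Bourne shell takes the positional parameters of the script as the word list. The sketch below writes such a script with a 'here document' (a convenience assumed for the example, with a made-up file name) and runs it with three arguments.

```sh
script=${TMPDIR:-/tmp}/args.$$

# The script numbers each of its arguments; note the for with no "in".
cat > "$script" <<'EOF'
n=0
for word
do
    n=`expr $n + 1`
    echo "$n: $word"
done
EOF

listing=`sh "$script" one two three`
rm -f "$script"
echo "$listing"
```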

while

The while command allows a sequence of commands to be executed repeatedly while
certain conditions are met. It takes the form:

while command_list_1
do
command_list_2
done

If command_list_1 is exited successfully, then command_list_2 is executed. This
process continues until command_list_1 fails. Here is an example:

flag=y
while test $flag = y
do
echo Do it again?
read flag
done

The loop will be repeated while the value of the variable flag remains 'y'.
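
A while loop does not need input from the user. As a further sketch, the loop below uses test with an arithmetic operator as its condition and the expr command to do the sums; the variable names are invented for the example.

```sh
n=1
sum=0
while test $n -le 5               # repeat for as long as n is 5 or less
do
    sum=`expr $sum + $n`          # add n to the running total
    n=`expr $n + 1`               # move on to the next number
done
echo "The sum of 1 to 5 is $sum"
```

The loop ends when the test fails, that is, as soon as n reaches 6.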

until

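The until command tests for the opposite condition to the while command. It takes
the same form, with until in place of while: command_list_2 is executed repeatedly
until command_list_1 succeeds. The following does much the same as the example
with while above, except that it repeats until the user types 'n', rather than for as
long as the user types 'y':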

flag=y
until test $flag = n
do
echo Do it again?
read flag
done


Exercises
1. Write a script called hello which outputs the following:

your username

the time and date

who is logged on

Also output a line of asterisks (*********) after each section.

2. Put the command hello into your .login file so that the script is executed every
time that you log on.

3. Write a script that will count the number of files in each of your subdirectories.


APPENDIX A - COMMAND SUMMARY
alias assigns an alias for commands, files or devices. Only available in the C shell.

cat concatenates (joins) files and lists the result. Often used to direct the contents of a
single file to the standard output.

Qualifiers:

-n numbers the lines in the file(s)

-s eliminates consecutive blank lines

Example: % cat file1 file2 > file3

cd [directory] changes current working directory. (Default is home directory.)

Example: % cd /usr/etc

chmod mode file changes permissions of files and directories. Mode consists of three
elements: [ugoa] [+-=] [rwxs]

Example: % chmod g+r project (adds read permission to group)

cmp compares two files and prints the line number and character where they differ.

Example: % cmp file1 file2

comm compares two files for common lines.

-1 suppresses lines that only occur in file1

-2 suppresses lines that only occur in file2

-3 suppresses lines that occur in both files

cp makes a copy of a file.

Qualifiers:

-i interactive mode (to protect destination file if it already exists)


Example: % cp -i file1 file2

date gives time and date

diff lists differences in two files or directories.

Qualifiers

-b ignores trailing blanks

-e prints ed changes needed to make files identical

ed accesses the ed line editor

grep searches a file for a pattern (see chapter 15)

head -n Prints first n lines

jobs lists the background jobs.

Qualifier:

-l displays process id

kill terminates background job

ln -s sets up a symbolic link to a file or directory.

Example: ln -s /usr/games fun

ls lists files in a directory (default current directory)

Qualifiers:

-a all files (including hidden files)

-c in order of creation time

-g give group identity

-l in long format

-s gives the size of each file in blocks

-t sorted by modification time

-u sorted by last access time

mail receives and sends mail


mkdir creates a directory

more lists the contents of a file a page at a time.

mv moves a file. Often used to simply rename a file.

Qualifiers:

-i interactive mode to protect destination file if it already exists

passwd changes your password

pg pager available on some systems

pr formats and outputs a file.

Qualifiers:

-ln where n is the page length (default 66)

-wn where n is the page width (default 72)

-n where n is the number of columns

-hstring defines the header name as string

pwd displays name of current directory

rm deletes files

Qualifiers:

-i interactive prompt to protect files

rmdir delete directory (only works on empty directories).

sort sorts and merges files.

Qualifiers:

-b ignores blanks

-d dictionary order

-f fold upper to lower case

-i ignores characters outside the printable ASCII set

-n sorts numbers by value


-o directs output to a file

-r sorts in reverse order

spell checks spelling in a file

tail n lists the last n lines of a file if n is negative, or starts listing on the nth line, if n
is positive

time displays the execution time of a command

unalias removes a previously defined alias

vi accesses the vi screen editor

wc counts the number of lines, words and characters in a file

Qualifiers:

-c counts only characters

-w counts only words

-l counts only lines

who who is logged on

write direct communications to users on the same machine


APPENDIX B - EXAMPLE SCRIPTS

1. Some examples of commands in action


read
echo How far can you count?
read first second third
echo $first
echo $second
echo $third

for
for x in 1 2 3 4 5 6 7 8 9
do
echo $x
echo -n "Do you want to continue: (y/n) "
read f
if test $f = n
then break
fi
done
echo Done

case
echo "Give me a letter:"
read l
case $l in
[aeiou]) echo "That's a vowel!";;
[b-df-hj-np-tv-z]) echo "That's a consonant!";;
[A-Z]) echo "I said lower case!";;
[1-9]) echo "I said a letter, not a number!";;
*) echo "What's that?" ;;
esac
echo "Thank you and goodbye."

until
n=1
until test $n = 10
do
echo $n
n=`expr $n + 1`
done


while
flag=y
while test $flag = y
do
echo Do it again?
read flag
done

if
echo -n "Give me a number: "
read n
echo
if test $n
then
    echo The number is $n
    if test $n -gt 100
    then
        echo "That's a big number!"
    else
        if test $n -le 100
        then
            echo "That's a nice number!"
        fi
    fi
    if test $n = 69
    then
        echo "That's a bit rude!"
    fi
fi
echo
echo Byee!

while
while echo 'Give me a word:'; read name
do
echo "Hello , " $name
done

test
echo "Type something please:"
read a
if test -d $a
then echo "Thank you"
else echo "Thanks for nothing"
fi
if test $# = 3
then echo "There are three parameters"
fi


sed

Can you work out what this Bourne shell command does?

sed -e "s/./&\\
/g" file_name | tr A-Z a-z | sort | uniq -c | sort -r
Try it!

2. A few useful algorithms


Incrementing a variable
n=1
until test $n = 10
do
echo $n
n=`expr $n + 1`
done

Using flags
flag=y
until test $flag = n
do
echo Do it again?
read flag
done

flag=y
while test $flag = "y"
do
echo Do it again?
read flag
done

3. Using a dictionary
The following script will look up regular expressions in the Computer-Usable Version
of the Oxford Advanced Learners Dictionary (CUVOALD). It was designed to be of
use to crossword players, who know the number of letters in a word, and have some
of the letters.

x
echo "This program looks up in the OALD words you don't know in the crossword."
echo
echo "Type your word with periods ('.') for the letters you do not know"
echo "and do not type spaces in words (e.g. busstop):"
echo "(type ^c to interrupt)"
echo
read re
echo
cut -f1 /home/gps_20/ecl6rsh/cif/ctape/oald.mitton/cuv2 \
| tr -d " " | grep "^$re$"
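
The pipeline at the end of the script can be tried without the dictionary. In the sketch below three made-up lines stand in for the dictionary file: only the layout (a headword, a tab, then the rest of the entry) is assumed, and 'entry-1' and so on are placeholders, not real CUVOALD fields.

```sh
# Three stand-in dictionary lines: headword, tab, rest of entry.
sample=`printf 'bishop\tentry-1\nbus stop\tentry-2\nbiscuit\tentry-3\n'`

re="b.s..p"                       # six letters: b, ?, s, ?, ?, p

# cut keeps the first field, tr removes spaces inside headwords, and
# grep keeps only words that match the pattern from start (^) to end ($).
matches=`echo "$sample" | cut -f1 | tr -d " " | grep "^$re$"`
echo "$matches"
```

Only 'bishop' is printed: 'busstop' and 'biscuit' both contain seven letters, and the ^ and $ in the pattern do not allow any extra ones.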


lookup

This script is a more complex version of the above, that makes full use of all the
information in CUVOALD. See Chapter 11 for more information on the dictionary.

comment="For looking up words in the oald, giving full listing of the entry"
echo "What are you looking for:"
echo
echo "a - a word"
echo "b - a pattern that is part of a word"
echo "c - a transcription"
echo "d - a partial transcription"
echo "e - a word class tag"
echo "f - a pattern to match any of the above categories"
echo
read select
echo
echo "Type in the word or pattern:"
read re
echo
echo Looking for $re
dict="/home/gps_20/ecl6rsh/cif/ctape/oald.mitton/cuv2"
case $select in
a) grep "^$re " $dict | more ;;
b) grep ".*$re" $dict | more ;;
c) grep " $re " $dict | more ;;
d) grep " .*$re" $dict | more ;;
e) grep "$re[,$]" $dict | more ;;
f) grep $re $dict | more ;;
esac
echo
echo -n "Press any key to continue, or CTRL-C to stop: "
read n
lookup

4. A cloze test
The following script runs a cloze test. This is a vocabulary test for language learners.
The student is presented with a text from which several words are missing, and has
to guess the words from the context. This probably represents the limits to which
Bourne shell scripts can be used for tasks normally done with a programming
language. It would be interesting to compare this script with a program in C or Icon.
Note that the different modules (the text, the missing words and the script) are
separate files. This means that more texts can easily be added by the teacher.


cloze
clear
# Print nine blank lines to move the title down the screen
n=1
until test $n = 10
do
    echo
    n=`expr $n + 1`
done
echo " ********************"
echo " **** CLOZE TEST ****"
echo " ********************"
echo
echo
echo ' Type CONTROL-C to exit at any time'
sleep 3
echo
# Keep asking until a name has been typed
until test $name
do
    echo -n 'Please type in your name: '
    read name
done
echo
# Keep asking until a text has been chosen
until test $text
do
    echo 'Please choose a text. Type one of the following:'
    ls *.txt | sed "s/\.txt//"
    echo
    echo -n ': '
    read text
done
# Work on a private copy of the text, named after the student
cp ${text}.txt $HOME/$name.tmp
clear
cat $HOME/$name.tmp
echo
echo "You must guess the missing words."
echo "Read the entire text, then press return."
echo "-----------------Press return-----------------"
read rubbish
n=1
until test $n = 9
do
    # The answer to gap n is line n of the solutions file
    ans=`head -$n ${text}.sol | tail -1`
    flag=
    until test $flag
    do
        clear
        cat $HOME/$name.tmp
        echo
        echo -n "Guess word number ${n} (just type RETURN to give up): "
        read guess
        if test $guess
        then sleep 1
        else echo 'The answer is ' $ans ; break
        fi
        if test $guess = $ans
        then echo Right! ; sleep 1 ; flag=y
        else echo 'Wrong!' ; sleep 2 ; continue
        fi
    done
    # Fill gap n in the student's copy with the correct word
    sed "s/\[$n\]/$ans/" $HOME/$name.tmp > tmp.$$
    mv tmp.$$ $HOME/$name.tmp
    n=`expr $n + 1`
done
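
The two key steps of the script, picking out line n of the solutions file and filling gap n in the text, can be tried on their own. The sketch below uses a made-up temporary file holding the first three lines of UK.sol and one sentence from UK.txt; head -n and tail -n are the standard spellings of the head -$n and tail -1 used in the script.

```sh
sol=${TMPDIR:-/tmp}/demo.sol.$$
printf 'ranks\nneeds\nprivate\n' > "$sol"     # first three lines of UK.sol

n=2
# The first n lines, then the last of those: this is line n of the file.
ans=`head -n $n "$sol" | tail -n 1`
rm -f "$sol"

text="producing about 60% of food [$n] with only 1% of the labor force"

# [ and ] are special in regular expressions, so they are escaped with \
filled=`echo "$text" | sed "s/\[$n\]/$ans/"`
echo "$filled"
```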


UK.txt
The UK is one of the world's great trading powers and financial
centers, and its economy [1] among the four largest in Europe. The
Thatcher government halted the expansion of welfare measures and
promoted extensive reprivatization of the government economic sector.
Agriculture is intensive, highly mechanized, and efficient by European
standards, producing about 60% of food [2] with only 1% of the labor
force. Industry is a mixture of public and [3] enterprises, employing
about 27% of the work force and generating 22% of GDP. The UK is an
energy-rich nation with large coal, natural gas, and oil reserves;
primary energy production accounts for 12% of GDP, one of the highest
shares of any industrial nation. In mid-1990 the economy fell into [4]
after eight years of strong economic expansion, which had raised
national output by one quarter. Britain's inflation rate, which has
been consistently well [5] those of her major trading partners, is
expected to decline in 1991. Between 1986 and 1990 unemployment fell
from 11% to about 6%, but it is now [6] rapidly because of the economic
slowdown. As a major trading nation, the UK will continue to be greatly
affected by world boom or [7], swings in the international oil market,
productivity trends in domestic industry, and the terms on which the
economic integration of [8] proceeds.

UK.sol
ranks
needs
private
recession
above
rising
recession
Europe

The source of this material is


http://www.comp.lancs.ac.uk/computing/users/eiamjw/unix/index.html
