
Introduction To Scientific Computing

Introduction
Key commands and concepts in this chapter

Command or topic: Concept
Computing: What is it? A brief history.
Bits and bytes: Binary digit. All information is written down in bits.
Computer architecture: Hardware and software
Data streams and languages: Input, manipulate, output
Apple graphical user interface, GUI: How to use the Apple Mac
Command line interface, CLI: The terminal
Vi editor: The making of a file
man: Commands; help pages (manual)
vimtutor: Tutorial on the editor, vi
[CTRL+d]: Exit a terminal session
login: Gaining access to the terminal
passwd: Every login needs a passwd
logout: Exiting your session

INTRODUCTION
Computing is the process of using computers (hardware, software, and related technology) to solve a given problem or complete a task. We probably all perform some form of computing every day. Our cell phones are the most obvious example, but music downloads, gaming, email, GPS, programmable thermostats, credit and debit cards, and key cards all involve some form of computing. Computing can also be thought of as the manipulation of information. Thanks to the rapid development of various technologies, we live in an age of excess information: information about science, art, religion, banking, business, entertainment, politics, weather, sports, people, societies, history, and so on.
You will find that no matter what discipline you choose to study, you will need computing skills to maneuver through your field: from the liberal arts major gathering data from archeological digs, to the particle physicists crunching data obtained from the Large Hadron Collider, to the genomics and bioinformatics research of today's biology-related majors. Academics across the world are realizing they must start students early in developing computational thinking.
In this class you will begin to learn the process of computing through three different computing languages. Each section will build on the previous one, but you will learn the basics first. First you will learn about the UNIX shell, which allows you to give commands to the UNIX operating system. Then you will learn Python, a very popular language that has proven to be quite useful for today's computing problems. The last language is very intuitive and widely used in some scientific and engineering communities. Throughout the semester you will build a project that follows the process of obtaining data, bringing it into a computer, manipulating the data, and finally presenting the results. Topics for this project are wide ranging but should focus on something you are interested in understanding further.



A brief history
When did computers come to be? When did they first appear on the scene? Some would argue that an abacus is a computer. Most would say the first design of a programmable computer was the Analytical Engine, a mechanical device programmed with punch cards, designed by Charles Babbage in the 1800s. Ada Lovelace, a contemporary of Babbage, is credited with writing the first computer program, an algorithm intended to run on the Analytical Engine.
A significant development leading up to today's computers was the move from mechanical devices, first to vacuum tubes, which control the flow of current through a circuit so that the current can be switched on or off, and then to transistors, which miniaturized this on/off switching of a signal. The contribution of John von Neumann, the idea of storing the instructions (the program) in memory alongside the data, was also key. And finally, around the same time period, Alan Turing described an abstract machine, now called the Turing machine, which follows a finite set of simple steps (a primitive set of instructions) and can carry out any computation that can be described by an algorithm. These three developments in the first half of the 20th century were foundational to the development of the computer as we know it today.

What do bits have to do with it?
Computers are composed of switches that can be turned on or off. To store and manipulate data on computers, the data must be turned into so-called binary notation. A bit, short for binary digit, is a single digit that can have the value 1 or 0, for yes or no, on or off. Any whole number can be expressed as a binary number, written in the base-2 numeral system.

Numbers are (just) bits (Excerpted from ref. Leo Reyzin)
How is 10100101 a number? Each position of the number is treated as a power of 2. From right to left, the positions are worth $1 = 2^0$, $2 = 2^1$, $4 = 2^2$, $8 = 2^3$, and so on. You then add up the values of the positions where the bit equals 1; the other positions are multiplied by zero and therefore contribute nothing. In the example,
$10100101_2 = 1\times2^0 + 0\times2^1 + 1\times2^2 + 0\times2^3 + 0\times2^4 + 1\times2^5 + 0\times2^6 + 1\times2^7 = 165.$
We actually use this concept every day: the number $165 = 5\times1 + 6\times10 + 1\times100$, or $5\times10^0 + 6\times10^1 + 1\times10^2$. In the first example we are using the binary (or base 2) system, and in the second we are using the more familiar decimal (or base 10) system to describe the quantity 165.
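The positional rule above can be checked with a few lines of Python, the second language of this course. This is a minimal sketch: the function name digits_to_number is our own illustrative choice (it only handles the digits 0 through 9, so bases up to 10), and Python's built-in int(text, base) and bin() are shown for comparison.

```python
def digits_to_number(digits, base):
    """Add up digit * base**position, counting positions from the right (position 0)."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total

print(digits_to_number("10100101", 2))   # binary: 1 + 4 + 32 + 128
print(digits_to_number("165", 10))       # decimal: 5 + 60 + 100
print(int("10100101", 2))                # Python's built-in conversion
print(bin(165))                          # back to binary: '0b10100101'
```

The same function handles both examples because binary and decimal differ only in the base.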

Text is (just) bits
In fact, letters can be represented by a numerical code, and numbers, as we now know, are bits. The table of numerical codes for letters and other symbols is called ASCII (American Standard Code for Information Interchange). Standard ASCII uses the codes 0 through 127; extended versions of the table go up to 255, because it was decided at some point (apparently, at IBM in 1962) that a computer would work with chunks of 8 bits at a time. Eight bits are called a byte (the smallest amount the computer can byte off).
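To see that text is just numbers, a short Python sketch can print the ASCII code of each letter in decimal and as 8 bits. The word "Hello" is only an illustrative input; ord, chr, and str.encode are standard Python built-ins.

```python
for letter in "Hello":
    code = ord(letter)                          # ASCII code of the letter
    print(letter, code, format(code, "08b"))    # letter, decimal code, 8 bits

print(chr(72))                          # from a code back to a letter: 'H'
print(list("Hello".encode("ascii")))    # the bytes stored for the word
```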

Seeing bits of a file
Writing out individual bits is a pain, so we have a shorthand that writes 4 bits at a time. Writing with 0s and 1s is called binary; our good old digits 0 through 9 are called decimal; adding the letters A through F (standing for the values 10 through 15) gives hexadecimal, in which one digit stands for exactly 4 bits. A byte can therefore be represented by 8 bits or by two hexadecimal digits. There is software that allows you to see the bytes of a file, typically displaying each byte in hexadecimal notation. For example, we can look at a plain-text (.txt) file and see the ASCII code of each letter. Bits are also hard to get rid of. Some programs merely hide deletions, so the original text is still in the file. Some just black it out. Both scenarios have caused problems for the government in recent years.
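The following Python sketch imitates what a byte viewer shows for a tiny plain-text file containing "Hi!" (an illustrative string): each byte in decimal, as 8 bits, and as two hexadecimal digits. On the Mac command line, tools such as xxd or hexdump -C give a similar view of a real file.

```python
text = "Hi!"
raw = text.encode("ascii")       # the bytes a plain-text file would contain

for byte in raw:
    # each byte: decimal value, 8 bits, and the 2 hexadecimal digits
    print(byte, format(byte, "08b"), format(byte, "02x"))

print(raw.hex(" "))              # compact hex view, one pair of digits per byte
```

Note how each hexadecimal pair corresponds to one group of 8 bits, split into two groups of 4.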


Pictures are (just) bits
We perceive color because our eyes have sensors (called cones) for red, green, and blue light. Thus, to display a color to a human, we need to specify the intensity of its red, green, and blue (RGB) components. We can use numbers for that. Typically, we use one byte (i.e., an intensity between 0 and 255) per component, thus using three bytes, or 24 bits, to encode a color. See, e.g., http://colormixers.com/mixers/cmr/. Because there are 24 bits, this representation allows for $2^{24} = 16{,}777{,}216$ possible colors. Another way to see the number of possible colors is to use the multiplication principle: each of the three components has 256 possible values independent of the other components; thus, there are $256 \times 256 \times 256 = 16{,}777{,}216$ possible combinations of the three components. The number of pixels in an image and the number of bits used for each pixel determine the file size of an uncompressed image.
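The arithmetic above can be sketched in Python. The helper names rgb_to_hex and raw_image_bytes are our own, and the 1920 x 1080 image size is a hypothetical example; real image files are usually compressed, so they are smaller than this raw estimate.

```python
def rgb_to_hex(red, green, blue):
    """Write one 24-bit color as six hex digits, two per 8-bit component."""
    return f"#{red:02X}{green:02X}{blue:02X}"

def raw_image_bytes(width, height, bits_per_pixel=24):
    """Size of an uncompressed image: pixels times bits per pixel, in bytes."""
    return width * height * bits_per_pixel // 8

print(2 ** 24, 256 * 256 * 256)      # both count the possible 24-bit colors
print(rgb_to_hex(255, 165, 0))       # an orange: red 255, green 165, blue 0
print(raw_image_bytes(1920, 1080))   # a hypothetical 1920 x 1080 photo
```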

Sound is (just) bits
Sound is just variations in air pressure acting on our eardrums. We can view it as a graph of pressure as a function of time, sample the graph at discrete time points, and record each sampled value as a number, which gives a binary representation of the sound.
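A minimal Python sketch of sampling: it assumes an idealized pure tone (a sine wave standing in for the pressure graph), a hypothetical rate of 8000 samples per second, and stores each sample as one byte. Real audio formats typically use more samples per second and more bits per sample.

```python
import math

SAMPLE_RATE = 8000      # samples per second (hypothetical choice)
FREQUENCY = 440.0       # pitch of a pure tone, in Hz (hypothetical)
NUM_SAMPLES = 80        # 80 samples at 8000 per second = 10 milliseconds

samples = []
for n in range(NUM_SAMPLES):
    t = n / SAMPLE_RATE                                # discrete time point, in seconds
    pressure = math.sin(2 * math.pi * FREQUENCY * t)   # idealized pressure, -1 to 1
    level = round((pressure + 1) / 2 * 255)            # quantize to an integer 0..255
    samples.append(level)

sound = bytes(samples)    # the sampled sound is now just a sequence of bytes
print(len(sound), list(sound[:8]))
```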

How Many Bits
Using prefixes from the metric system, a kilobit (kb) is 1,000 bits, and a kilobyte (kB) is 1,000 bytes, or 8,000 bits (note that lowercase b usually means bit, and uppercase B usually means byte in such abbreviations). Similarly, a megabyte is a million ($10^6$) bytes, a gigabyte is a billion (a thousand million, or $10^9$) bytes, a terabyte is a trillion (a thousand billion, or $10^{12}$) bytes, and a petabyte is a quadrillion (a thousand trillion, or $10^{15}$) bytes. However, because in computer science powers of two come much more naturally than powers of ten, and because $2^{10} = 1{,}024$ is close to $10^3 = 1{,}000$, people often refer to a kilobyte as $2^{10} = 1{,}024$ bytes, a megabyte as $2^{20} = 1{,}048{,}576$ bytes, a gigabyte as $2^{30}$ (a little over a billion) bytes, a terabyte as $2^{40}$ (a little over a trillion) bytes, and a petabyte as $2^{50}$ (a little over a quadrillion) bytes. It doesn't make a difference in an order-of-magnitude calculation.
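A short Python sketch comparing the two meanings of each prefix shows how the gap grows with the prefix while staying well within the same order of magnitude:

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]

sizes = {}
for n, prefix in enumerate(PREFIXES, start=1):
    decimal = 10 ** (3 * n)     # metric meaning: 10^3, 10^6, ...
    binary = 2 ** (10 * n)      # computer-science meaning: 2^10, 2^20, ...
    sizes[prefix] = (decimal, binary)
    print(f"{prefix}byte: {decimal:,} vs {binary:,} bytes "
          f"({binary / decimal - 1:.1%} larger)")

print(1000 * 8, "bits in a metric kilobyte")
```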


Computer Architecture: what does a computer consist of?



We are not going to focus on what makes up a computer; however, a basic understanding of the pieces required is useful. The physical computer (hardware) today consists of a CPU (arithmetic logic unit, ALU; control unit; and processor registers), memory, and inputs and outputs. The design of the CPU and the way it is combined with the other components determine the speed of the computer, that is, how fast it can perform instructions. Instructions to the computer are given by software. Each instruction is broken into smaller pieces consisting of simple operations the ALU can perform very quickly. The CPU carries out each instruction in a synchronized sequence of steps called a machine cycle: the instruction is fetched from memory, the ALU performs the operations, and the result is sent back to memory. The CPU clock speed (the number of clock cycles per second) is usually measured in GHz (billions of cycles per second). Another common measure, especially for scientific computing, is the number of floating-point operations performed per second, measured in flops. For example, the fastest computer in 2015 was the Tianhe-2 supercomputer in China, with a theoretical peak performance of about 55 quadrillion flops.
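Clock speed and cycle length are reciprocals of each other. A quick Python check, assuming a hypothetical 3 GHz processor:

```python
clock_hz = 3.0e9                   # hypothetical 3 GHz processor
cycle_time_ns = 1e9 / clock_hz     # length of one clock cycle in nanoseconds
cycles_per_ms = clock_hz / 1000    # clock cycles in one millisecond

print(f"one cycle lasts {cycle_time_ns:.3f} ns")
print(f"{cycles_per_ms:,.0f} cycles per millisecond")
```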

The heart of the computer, the CPU
1. CPU: the central processing unit, which performs the instructions
2. Memory: stores information (both data and instructions)
3. Inputs and outputs (keyboard, monitor, printer, mouse)

Principal components of a CPU include the arithmetic logic unit (ALU) that performs arithmetic and logic operations,
processor registers, i.e. a small amount of storage that can be accessed very quickly, and a control unit that fetches
instructions from memory and "executes" them.
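The fetch-and-execute behavior of the control unit can be imitated with a toy machine in Python. This is a hypothetical sketch, not a real instruction set: the instructions LOAD, ADD, STORE, and HALT, the two registers A and B, and the 8-cell memory are invented for illustration, and the program is kept in a separate list rather than in main memory to keep the code short.

```python
def run(program, memory_size=8):
    """Run a tiny hypothetical program; return (registers, memory)."""
    registers = {"A": 0, "B": 0}    # a small, fast storage area inside the CPU
    memory = [0] * memory_size      # main memory cells
    pc = 0                          # program counter: index of the next instruction
    while True:
        instruction = program[pc]   # fetch
        pc += 1
        op, *args = instruction     # decode
        if op == "LOAD":            # execute: put a constant into a register
            reg, value = args
            registers[reg] = value
        elif op == "ADD":           # ALU operation: dst = dst + src
            dst, src = args
            registers[dst] = registers[dst] + registers[src]
        elif op == "STORE":         # write a register back to main memory
            reg, addr = args
            memory[addr] = registers[reg]
        elif op == "HALT":
            return registers, memory
        else:
            raise ValueError(f"unknown instruction: {op}")

program = [
    ("LOAD", "A", 2),
    ("LOAD", "B", 3),
    ("ADD", "A", "B"),
    ("STORE", "A", 0),
    ("HALT",),
]
registers, memory = run(program)
print(registers, memory)
```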
In addition to the small amount of memory on the CPU (registers) that is accessed quickly, there is the so-called main memory (RAM), which is connected to the CPU via a set of wires called the memory bus. When the computer is switched off, information stored in the main memory is lost. For more permanent storage, one uses the hard drive, sometimes also called secondary storage. Hard drive storage is cheaper than main memory and much slower to access. Information can also be stored on removable storage devices such as CDs and USB drives.
Most of the main memory can be thought of as cells, each storing one bit (0 or 1). The memory cells are grouped into words of fixed length, for example 1, 2, 4, 8, 16, 32, 64, or 128 bits. Each word can be accessed by its address. The bus does more than connect the CPU with RAM: it also connects devices (keyboard, monitor or display, printer, mouse, hard drive, etc.) to the CPU/memory chain. Because so much information must be moved, multiple buses are often used, and in fact a dedicated device exists to manage them.
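Words and addresses can be sketched in Python with a bytearray standing in for memory. This assumes byte-addressed memory and 32-bit (4-byte) words, as on many modern machines; the helpers write_word and read_word are hypothetical names for illustration.

```python
memory = bytearray(16)   # 16 bytes of hypothetical main memory, all zeros
WORD_BYTES = 4           # assume 32-bit words (4 bytes each)

def write_word(address, value):
    """Store an unsigned integer as one word starting at a byte address."""
    memory[address:address + WORD_BYTES] = value.to_bytes(WORD_BYTES, "little")

def read_word(address):
    """Read back the word starting at a byte address."""
    return int.from_bytes(memory[address:address + WORD_BYTES], "little")

write_word(4, 165)
print(read_word(4), list(memory[4:8]))

for bits in (8, 16, 32, 64):
    print(f"a {bits}-bit word can hold {2 ** bits:,} different values")
```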

The instructions (software)
Software consists of files whose contents the computer interprets as instructions. There are many levels of instructions. The lowest-level instructions, executed directly by the CPU, are in a language called machine code. This code consists of patterns of 0s and 1s that the computer can understand directly. However, writing machine code is tedious and error prone. The human-readable version of machine code is assembly code. Finally, there are higher-level languages, such as Fortran, C, Java, Matlab, Python, etc. These higher-level languages need to be translated into machine code, i.e., the code that a computer can understand. Depending on how this translation is achieved, a programming language can be compiled, interpreted, or both.

Various levels of languages and examples
Lowest level: machine code
Low level: assembly code
Higher level, compiled: Fortran, C
Higher level, interpreted: Matlab, Python, Mathematica
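Python, listed above as interpreted, is in practice an example of "both": the standard CPython interpreter first compiles source code into an intermediate bytecode (not machine code) and then interprets that bytecode. The standard-library dis module lets you look at it. The function area is an illustrative example, and the exact instruction names printed differ between Python versions.

```python
import dis

def area(width, height):
    return width * height

# CPython has already compiled the function's source into bytecode;
# dis prints those instructions (names vary between Python versions).
dis.dis(area)

# The interpreter then executes the bytecode.
print(area(3, 4))
```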

The operating system is the software that makes the computer work. It performs common system tasks that all users take advantage of (data input and output, displaying output, managing files), allowing the programmer to focus on the specific problem to be programmed. The operating system is stored on the hard disk and loaded into memory when the computer is turned on.
The operating system consists of two parts: a kernel and a shell.

The kernel is the core of the operating system; it manages everything. It sends instructions to the hardware and handles the behind-the-scenes steps that allow user requests to be carried out as machine code. Its basic functions include memory management, input/output, device control, and networking.

The shell is the user-friendly connection to the kernel. The shell is a command line interpreter that gives the programmer (user) access to the kernel. In other words, it is the interface that translates what you want the computer to do into low-level calls to the kernel. The shell is launched when the user starts a terminal session.
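Part of what the shell does when you type a command can be imitated from Python: subprocess.run asks the operating system (and ultimately the kernel) to start the program date as a new process, waits for it to finish, and collects what it printed. This is a sketch for illustration; it assumes a UNIX-like system where the date command is available.

```python
import subprocess

# Ask the operating system to run the program `date` in a new process,
# much as the shell does when you type `date` at the prompt.
result = subprocess.run(["date"], capture_output=True, text=True)

print("exit status:", result.returncode)   # 0 means the command succeeded
print("output:", result.stdout.strip())
```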


Programming languages
There are many languages in the world. Similarly, there are many programming languages. While different languages have different syntax, some basic programming concepts are used by all of them. We will talk about some of these basic concepts in this course, and you will see how they appear in all three languages that we will be working with.
It must be said that some languages are more intuitive than others! Because several languages can often do similar tasks, a choice can be made between speed of execution and speed of programming.

Comparison of languages
Language          Syntax
Italian           Ciao, mondo
Danish            hej verden
Spanish           hola mundo
Romanian          Bună ziua, lume
Haitian Creole    alo mond
Bash              echo Hello World
JavaScript        alert("Hello, world!");
Pascal            Write('Hello, World!');
Python            print("Hello, World!")
Ruby              puts "Hello, World!"


Data streams
The key concept in computing is that of a data stream: data -> manipulation -> output. In this scenario information is
made available to the computing system, the data are manipulated in some manner, and the final data are exported to
the end user.
INPUT -> MANIPULATION -> OUTPUT
DNA -> cellular machinery -> protein
18-year-old -> JHU (lots of late nights, sweat, and $$$) -> bachelor's degree
File -> command -> information to screen

We will see how this flow of information is central at every level of computing. At the lowest level data is accessed
and sent along the bus to the CPU to have instructions performed and results sent back to memory. The same
treatment of data will occur at the higher programming levels you will access this semester. You will decide the
instructions, provide the data, and request the output. The computer will carry out the instructions.
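The input, manipulation, output pattern can be sketched in Python as three small steps. The function names and the three numbers are hypothetical illustrations; io.StringIO stands in for a data file so the example is self-contained.

```python
import io

def read_numbers(stream):
    """INPUT: read one number per line from a text stream."""
    return [float(line) for line in stream if line.strip()]

def manipulate(values):
    """MANIPULATION: compute simple summary statistics."""
    return {"count": len(values), "mean": sum(values) / len(values), "max": max(values)}

def report(summary):
    """OUTPUT: format the results as text for the screen."""
    return f"count={summary['count']} mean={summary['mean']:.2f} max={summary['max']}"

data_file = io.StringIO("12.5\n14.0\n9.5\n")   # hypothetical data standing in for a file
print(report(manipulate(read_numbers(data_file))))
```

Keeping the three stages in separate functions makes it easy to swap one stage, for example reading from a real file instead, without touching the others.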

1. Getting to know your Mac computer. Vi editor



1.1 GETTING STARTED - Login and the Apple Environment*
Log into the Mac computer using your JHED ID and password. Typically, the first thing one should do upon using a new system is to change one's password. There is no need to do this if you use a personal computer or a JHED-authenticated system; in the latter case, passwords are managed within the framework set up by JHU-IT. If you were accessing a typical multi-user UNIX system, you would change your password with the command passwd. You are probably familiar with the use of passwords. How should a user keep track of passwords? How should one choose a password? These are rhetorical questions. Basically, use common sense and keep the information safe.
You will notice that Apple Mac computers are intuitive; that is what they have always been known for. Steve Jobs, co-founder of Apple Computer, also had a keen eye for aesthetics. Under his guidance, the organization of documents and programs on a computer became a visual experience with icons. This environment is called the Finder. The Finder lets you visually access practically everything on the Mac, including applications, hard disks, files, folders, and DVDs. You can use the Finder to organize your files and folders as you want, search for items anywhere on your Mac, delete things you don't want, and more. You will notice that the Finder gives you immediate access to a folder with your name on it. You can look at the folders one tier above it and at the folders below it. This arrangement is known as the file hierarchy. As you move up or down a folder, you are moving through, or traversing, the file system.
At the bottom of your screen are several icons sitting on the Dock. The Dock can be customized to contain applications you use often; for example, a web browser is usually found there. Launch the browser by clicking its icon in the Dock (a single click is enough) and go to the jhu.edu website. Log in and click on Blackboard. Files needed for class will be obtained from Blackboard. Once downloaded, you will find them in your Downloads folder. Go ahead and download today's files. Check that they were downloaded. Then, using the mouse, drag the downloaded files to your USB flash drive.
To see your files, click the Finder icon in the Dock, then click your Home folder in the sidebar. Items in the Finder sidebar are grouped into categories (Favorites, Shared, and Devices), just like the Source list in iTunes. The Favorites portion contains links to folders including Desktop, Documents, Movies, Music, Pictures, AirDrop, and Applications. The Shared portion contains computers that are connected to your computer through the network. The Devices portion contains the mounted, accessible volumes you have, such as a hard disk, USB flash drive, network volume, DVD, and so forth.

1. Window close, minimize, and zoom buttons - To close the window, click the round red button in the upper-left corner. If you don't want to close the window but want it out of your way, click the round yellow button to minimize the window to the Dock. If a window is full of stuff, you can resize it by dragging the lower-right corner to make it bigger, or click the round green button to maximize the window's size.
2. Finder window View buttons:
   o Icon view - Used to display the contents of your folder as a series of icons. In Icon view, you can view live icon previews that you can use to thumb through a multipage document or watch a QuickTime movie.
   o List view - Used to display your folder in a spreadsheet-style manner. Each folder can be expanded by clicking on the disclosure triangle just to the left of the folder. You can easily sort by file name, date modified, and so forth. Choose Show View Options from the View menu to add or remove attribute columns. You can change the sorting from ascending to descending order and back again by clicking on the attribute column title.
   o Column view - Used to display the hierarchy of your folders, where each column represents a folder.
   o Cover Flow view - Used to display the contents of your folder just like the Cover Flow used in iTunes. You can see live previews of images, documents, and movies, and can thumb through documents and movies.
3. Action Menu - Quick access to Finder functions for highlighted items, such as Get Info, Move to Trash, and Services.
4. Item Arrangement button - In any view, you can organize the window by clicking the item arrangement button and choosing one of the ways to group items.
5. Search Field - Start typing a word or phrase and Spotlight will search your Mac for any matches.
6. Right pane - The contents of a selected folder are shown in this pane.
7. Pane edge - Drag to resize.
8. Devices - A device connected to your computer, such as a DVD, USB device, or your Time Machine backup disk.
9. Sidebar - Items are grouped into categories: Favorites, Shared, and Devices. The top portion, Favorites, contains quick access to All My Files, Applications, Desktop, Documents, Downloads, Movies, Music, and Pictures.
10. Back / Forward buttons - As you move to different places in the Finder window, you can use the back button to return one step back and the forward button to go forward.
The contents of the selected folder or volume appear in the right pane. Depending on what view your Finder window
is set to, this pane may look a little different from the one in these pictures. The window above is shown in Icons view.
In Icons view, you can navigate by double-clicking folders to view the contents inside.
Viewing a window as a list or columns
If you'd like to change how folder and volume contents appear in the right pane, click one of the view buttons in the
toolbar (item 2 above). For example, when you click the list view button, the Finder window transforms into this:


List view enables you to see more content in the window than icons view, and displays some extra file and folder
information, such as the last date the item was modified, the file size, and what kind of item it is. As with Icons view,
you can navigate through your stuff by simply double-clicking folders until you find what you're looking for.
If you'd rather see your stuff displayed in a more hierarchical fashion, click the Columns view button. In Columns
view, the right pane splits into multiple columns to display your computer's file and folder organization. Instead of
double-clicking folders to see what's inside, select a folder in any column, click on it once, and its contents will appear
in another column to its right. If you really start digging down deep into your folder, you can drag the bottom-right
corner of the Finder window to expand it and see how many layers you've traveled.
Getting around
When you click All My Files, all the files on your computer that you have created or downloaded, such as documents, photos, music, and movies, are displayed, grouped into categories by file type.
When you put stuff on your desktop, technically it's stored in your user account's Desktop folder, even though it
appears on your desktop. When you bring music into iTunes, your music files get stored in the iTunes folder in your
user's Music folder. Likewise, iPhoto stores pictures in the Pictures folder and iMovie stores movies in the Movies
folder. You can get quick access to your folders from any Finder window's sidebar.
The Users folder stores all the content for each user account on your Mac; each user has a separate Home folder that's named after his or her user account name. The Documents, Downloads, Movies, Music, and Pictures folders in the Favorites section of the sidebar are subfolders within your Home folder.


Find stuff fast
You can find items quickly using the search field in the Finder window. This field uses the Spotlight technology to scour the disk volume you select. Combine Cover Flow with Spotlight and you've got a powerful search tool: move your mouse pointer over the results to flip through them visually.
Just start typing in the search field: Spotlight dynamically displays results in the Finder window, and search suggestions that match your criteria pop up below the search field as you type. You can choose where you want Spotlight to look by clicking an item in the header that appears just below the search field in the Finder window, such as Servers, This Mac, or Home. Spotlight will scour the location you select and organize its search results by kind.
Search tokens
The Finder includes tokens, a smart way to filter your searches. When you select a suggestion, a token is created. Instantly your search is filtered, and you see only the files that meet those criteria. You can change the scope of the token by clicking its arrow. If you want to narrow your search even more, you can use multiple tokens together.


Search for specific file types: For example, if you wanted to find all the JPEG (.jpg) images on your Mac, type .jpg in the search field, and you'll immediately see results pop into the window. Click This Mac in the header to find all JPEGs on your entire hard disk. Not only will Spotlight display thumbnails (small previews) of your JPEG images, it will also list other items that match your criteria, such as documents that contain the text .jpg. You can then open a file right from that Finder window.
*To get more comfortable within the Apple-Mac environment itself see:
http://www.apple.com/support/macbasics/tour/



File Location
On an Apple computer certain folders hold specific items. Files downloaded from the internet are usually found in the Downloads folder, or sometimes in the Documents folder; you, the user, can change this in preferences. It is important to be able to locate the Applications folder. The Applications folder contains all the applications installed on your Mac. It also includes a Utilities folder, with apps designed to support different functions of your Mac.


1.2 The Terminal



An interface is a mechanism that is used to tell the computer what to do. Up to now we have been interacting with the
computer using the graphical user interface, the GUI. In this mode a user visually decides what to do with files and
apps. The Finder allows one to see the connections between different objects (files, directories, applications, etc). GUI
applications make use of the mouse to manipulate files.
Another way to tell the computer what to do is through the command line interface, CLI. This is a mechanism where the user types in specific commands. Find the Terminal program on your Mac: it is in the Utilities folder inside the Applications folder. The Terminal icon is also located on the Dock. This is the software program you will use the most during this semester.
When you click the Terminal icon, you launch a shell. The shell is the connection between what you, the user, want the computer to do and the bits that are processed at the hardware level. It is a wrapper that connects what you type in the terminal window (high-level commands) to the actual computing engine, the kernel (low-level commands). You can begin by typing a command at the cursor, the small dark box after the prompt.
Try telling the computer to do something by typing a few commands at the prompt (in the examples below, the prompt is represented by a $ sign):
$ date
Mon Jun 2 21:09:14 EDT 2014

$ cal
     June 2014
Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30


Imagine what is happening. What is the stream of information that is being accessed? When you type date, the computer reads the value of the system clock into memory, formats the data, and then prints the values to the screen. How do the above commands fit into the data stream concept of computing? The input in this case (the current year, month, day, hour, minute, and second) is information stored on the computer but hidden from the user until accessed. The data are manipulated (formatted) by the command date. Finally, the result is sent to the output, the screen.

INPUT -> MANIPULATION -> OUTPUT
File -> command -> information to screen
Hidden information -> date -> today's date representation
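The same stream can be reproduced in Python with the standard datetime and calendar modules. This is an approximation for illustration: the format string only roughly mimics date's output (it omits the time zone and pads the day with a zero), and firstweekday=6 makes weeks start on Sunday, as cal does.

```python
import calendar
from datetime import datetime

now = datetime.now()                            # INPUT: read the system clock
text = now.strftime("%a %b %d %H:%M:%S %Y")     # MANIPULATION: format the values
print(text)                                     # OUTPUT: send the result to the screen

# A month calendar laid out like `cal`, with weeks starting on Sunday (6)
june_2014 = calendar.TextCalendar(firstweekday=6).formatmonth(2014, 6)
print(june_2014)
```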


Commands in UNIX follow the above stream. The stream is: input followed by manipulation followed by output. In
fact, pretty much all of programming fits into the data stream concept. The key to good programming is to understand
what is possible with a stream and the available information. Much of what we will be doing in this course is looking
at possibilities. From the possibilities you can then attack the problems that need to be solved.
To open a new terminal, press [CMD+t] for a new tab or [CMD+n] for a new separate window (CMD is the Command key, and + means "together with"; do not type a + sign). Let's compare the environment of the terminal with that of the desktop. In the terminal window there is a symbol called a prompt. Immediately after the prompt is the cursor location. This is the point where text will be displayed as you start to type on the keyboard.
1. Type ls in the terminal window. Now use the mouse to open up your Home folder on the desktop. What files are
in the folder? Are they the same as in the terminal window?
2. Type open notes.pdb in the terminal window. Compare this to double clicking on the same file in the Finder.
Is the result the same?
Can you identify which is a GUI command and which is a CLI command?

Logout
When you are finished with your terminal session, you should type logout or exit. If you type logout, a sequence of cleanup commands will be invoked from your logout file. These might be commands that clear the history and clear the screen for security purposes. However, if you type exit, quit Terminal, or close a terminal window, no major problems will occur; you just may not run every part of the logout sequence.
When all else fails to close a terminal session, press CTRL-d. This will return you to the previous shell if you were using one (yes, you can have more than one shell running from a single terminal session) or perform a logout if you are in your login shell. See the appendix for a table of CTRL commands.

1.3 Man pages
UNIX has help pages, collectively called the manual, that provide quick answers about command usage. The manual pages, or man pages, are very useful for both the novice and the well-seasoned UNIX guru. They are accessible through a utility (program or app) invoked (run or executed) by the UNIX command man. To see the manual page for the command date, type:
$ man date
Press q to leave the manual page and return to the prompt.
1.4 Editors and vi
Text editors are needed to type the plain-text files, such as programs, that the computer reads and runs. A text editor is simply a program used to alter files containing just plain text (we also use the phrase ASCII characters to describe plain text). You are probably most used to text that has had some sort of formatting applied to it, for example bold or italics, special fonts, margins, and images. Files that contain special formatting often depend on a particular program or computer. Plain-text files do not: any computer can read them, which makes them portable, i.e., they can be transferred to other computers without being converted.
Editors come in many flavors. A fast editor is critical: it needs to be fast in handling data entry and manipulation, and it needs to handle memory efficiently. GUI environments offer word processors such as Microsoft Word and OpenOffice, which produce formatted files rather than plain text. Most computers also have a default minimalist text editor, but such editors are generally slow to work with. Throughout the history of programming, several UNIX text editors have stayed the course and remain (with many improvements) the text editors of choice. In my opinion, probably the most universal editor for UNIX systems is vi (and its modern descendant, vim). vim will be our editor of choice. The popular Emacs editor is a definite second, followed by ed. You can read all about text editors on Wikipedia; you might also be amused by the discussion under the topic "editor wars". There are many other free editors available, some with an extensive following. Still other GUI and proprietary editors may interest you for intuitive editing. I am certain that after using vi for a few weeks you will prefer it over other choices. It has been heralded as the fastest editor available (once you internalize some tricky keystrokes). Lastly, I must add that some integrated development environments (IDEs) are quite useful. These applications combine multiple programming tools, including an editor, within a graphical interface. However, since there are many situations where your IDE might not be available, you will have to resort to vi from time to time. vi ships with every UNIX installation I know of. It is universal.
We will be using the vim editor (vim stands for Vi IMproved). It has features that allow scripting and customization for particular programming languages, e.g., Python. A vim tutorial, vimtutor, is part of the vim/UNIX distribution. The first few days of this course will focus on learning the basics of vi usage, and the first thing to do is to complete vimtutor. Please remember it is assumed you will use vim for all editing of files, so it is very important that you become comfortable in vi. Most of the exercises will depend on vi, and the first exercises are geared toward helping you understand basic file manipulations.

vimtutor
At the command prompt type vimtutor:
$ vimtutor

To exit, type :q and press <enter> (use :q! if vim warns about unsaved changes).

At the end of class today you can continue to work through vimtutor. Let's go through some basics together now.

vi, visual editor
To start vi, type vi at the command prompt:
$ vi

vi may be used in three different modes to alter files: command mode, insert mode and last line mode. When vi is started without customization, it runs in command mode. In command mode, the keys of the keyboard are interpreted as commands that act on the text of the file. For example, you can type dd and the current line of text will be deleted. In insert mode you are simply entering text into the file, and it appears wherever the cursor is located. For example, you can type hello world, just as you would type a document in MS Word. From command mode, type i (for insert) to get into insert mode. You will know that you are in insert mode when you see -- INSERT -- at the bottom of your terminal. To go back to command mode, press the Esc key. Finally, in last line mode you type commands at the bottom of the screen. To get into last line mode from command mode, type :. From this mode you can, for example, save the current state of the file by typing w (for write) followed by the Enter key.
Below is a list of some basic vi commands that you should get familiar with. You will use many of them often, although not all of them. Note that you can move around a file, up or down a line or left and right on a given line, by using the arrow keys.

Summary of basic, well used, commands for vi:
Basic commands (those with a : in front are last line mode commands):

i       insert
a       append
R       overwrite
:       last line mode
:wq     write (save) and quit vi
:q!     quit vi without saving
~       at the start of a screen line, a tilde marks lines beyond the end of the file

Basic editing commands:

                        change      delete      copy (yank)
one word                cw          dw          yw
two words               2cw         2dw         2yw
three words back        3cb         3db         3yb
one line                cc          dd          yy or Y
to end of line          c$ or C     d$ or D     y$
to beginning of line    c0          d0          y0
single character        r           x           yl

put text                            p or P
undo previous action                u
undo all the changes on a line      U

Moving around in a hurry (the arrow keys are slower):

h, j, k, l      equivalent to the arrow keys (left, down, up, right)
ctrl-F          forward one screen
ctrl-B          back one screen
0               to the beginning of the line
$               to the end of the line
ctrl-G          display the current line number and file status
G               move to the end of the file
nG              move to line number n
gg              move to the first line

Moving by text blocks:

w               next word
b               previous word
e               end of word
(               beginning of the current (or previous) sentence
)               beginning of the next sentence
{               beginning of the previous paragraph
}               beginning of the next paragraph

Moving by searches:

/text<RETURN>   search forward for text
?text<RETURN>   search backward for text
n               repeat search in the same direction
N               repeat search in the opposite direction

Practice with Vi
Let's practice some vi commands. At the UNIX prompt type:
$ vi
Now add some text.

In order to insert text into a file, type i and begin typing. Press Return to end a line and keep typing. Use the Backspace key to correct errors. To end insert mode and return to command mode, press the Esc key.
Add the following to your file:
Example 1:

There are several ways to insert text other than the "i" command. The "a" command inserts text beginning after the current cursor position.
Enter the insert mode using the a command and add two words to your file:
Example 1: hi bye

Press the esc key.


To begin inserting text at the line below the current one, use the o command and type a second line:
Example 1: hi bye
Example 2: hi bye

Now try to insert the word and between the two words on the second line.
Example 1: hi bye
Example 2: hi and bye

Key to remember: at any given time you are in one of three modes:
command     commands such as "i", "a", or "o" are valid
insert      insertion of text, followed by the Esc key to return to command mode
last line   extended commands can be entered
Deleting text
The "x" command deletes the character under the cursor.
Delete the word bye on the second line (press x once for each letter).
Example 1: hi bye
Example 2: hi and

The "dd" command deletes an entire line


Delete the second line
Example 1: hi bye

"dw" deletes a word.

Delete the word bye:
Example 1: hi

If you make a mistake, press Esc followed by u (undo).

Type u as many times as necessary to return to the original 2 lines you typed.
Example 1: hi bye
Example 2: hi and bye

Changing text
The "R" command replaces text by overwriting it.
Put your cursor on the first word bye and type R (that is, Shift-r).
Now edit the file to contain:
Example 1: hi and bye
Example 2: hi and bye

The "r" command replaces the single character under the cursor.
The "~" command changes the case of the letter under the cursor from upper to lower case, and vice versa.
More Practice with Vi

Type some text into a file:
$ vi file_vi.txt
I LOVE COFFEE
I love coffee
I really love espresso

Try each of the following commands on the text.


Moving Commands
"h", "j", "k", and "l" move the cursor left, down, up, and right. This comes in handy when your arrow keys aren't working correctly, and that will happen!
"w" moves the cursor to the beginning of the next word; "b" moves it to the beginning of the previous word.
"0" (that's a zero) moves the cursor to the beginning of the current line; "$" moves it to the end of the line.
"G"            go to the end of the file
"10G"          go to line 10 in the file
"1G" or "gg"   move to the beginning of the file

You can couple moving commands with other commands, such as deletion:
"d$" deletes everything from the cursor to the end of the line
"dG" deletes everything from the current line to the end of the file
Saving files and quitting vi
When you type ":", the cursor moves to the last line on the screen; you are now in last line mode, where certain extended commands are available. Remember that you must press Return after a command entered in last line mode.
":q!" quits vi without saving changes to the file
":wq" saves the file and then exits vi
"ZZ" (from command mode, without the ":") also saves and exits, like ":wq"
":w" saves the file without quitting vi

More Practice with Vi


Begin vimtutor in class. Before you leave you should be comfortable with lesson 1 in vimtutor.



2. Data and files


Key commands and concepts in this chapter

Concepts
Data               Stored characters
File extensions    A suffix of a file that suggests its content
Field separators   Used to separate columns
Readme files       User descriptions of file contents and use
File type          binary, ascii
Commands
cat, head, tail, more, less    Viewing files
lp, lpr                        Printing files

2.1 Data

What is data? We started this course by telling you that one purpose of this course is to give the researcher a
set of tools which will enable them to analyze data sets. But what are these datasets? Yes, lots of numbers or
letters that have meaning to the researcher. To the computer hardware they are just bits of value 0 or 1.
What we as computer programmers are interested in is the abstraction layer above the bits. The abstraction
layer includes combinations of bits to give character types, data types, and file types. Files of data may be just
plain characters (plain text) or more often they are formatted in some particular way that must be
understood by the end user or program. If the file can be opened by a text editor so that it is readable text
then the file type is traditionally called an ASCII file. If it is not readable it is binary. Binary files may contain
text characters but the files are unintelligible by merely looking at the text. They require an interpreter to
make sense of the content. You can read more about ASCII and binary files here:
http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/BitOp/asciiBin.html

Character types
Character types are alphabetic characters, numbers, punctuation marks, white space, and control characters (tabs or returns, for example). They are found on your keyboard. In addition, other languages use characters of their own. Characters are encoded as numbers for the computer to interpret. ASCII is the traditional form of character encoding, assigning a unique number to each character. Because ASCII defines only 128 characters, more modern encodings (such as Unicode's UTF-8) are in use today. You can see the ASCII table here:
http://www.asciitable.com
To understand the ASCII table you need to understand the different numeral systems: binary, decimal, octal and hexadecimal.
http://www.rapidtables.com/math/number/Numeral_system.htm
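To see the numeral systems in action, here is a minimal bash sketch that prints the ASCII code of a character in decimal, octal, hexadecimal and binary. The function names ascii_codes and to_binary are made up for this example. It relies on a feature of the shell's printf: an argument that begins with a single quote is converted to the numeric code of the character that follows.

```shell
#!/usr/bin/env bash
# to_binary: HYPOTHETICAL helper that prints a number from 0 to 255 as 8 bits.
to_binary() {
  local n=$1 bits="" i
  for i in 7 6 5 4 3 2 1 0; do
    bits="$bits$(( (n >> i) & 1 ))"   # pick out bit i, most significant first
  done
  echo "$bits"
}

# ascii_codes: HYPOTHETICAL helper that shows one character's code in four bases.
ascii_codes() {
  local ch="$1" dec
  dec=$(printf '%d' "'$ch")   # leading ' means "code of the next character"
  printf 'char=%s dec=%d oct=%o hex=%x bin=%s\n' \
    "$ch" "$dec" "$dec" "$dec" "$(to_binary "$dec")"
}

ascii_codes A   # char=A dec=65 oct=101 hex=41 bin=01000001
ascii_codes a   # char=a dec=97 oct=141 hex=61 bin=01100001
```

Notice that upper and lower case letters differ by exactly one bit (the 32s bit), which you can confirm in the ASCII table.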

Data types
Data may be of different types. Besides individual characters, combinations of characters can be assigned a type. Common data types are characters, strings, integers, floating point numbers, and Boolean values:

abc123    string (a composite of alphanumerics)
1234      integer
3.1416    floating point or real number
1 or 0    Boolean (sometimes given as false or true)

The distinction becomes important when a programmer manipulates the data, for example when summing a column of numbers or computing an average value. Each type also takes up a specific amount of memory, depending on the computing language and the computer system.
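The difference between integer and floating point data shows up as soon as you compute an average. A minimal sketch, assuming bash and awk are available (the function names int_average and float_average are hypothetical): bash arithmetic works only with integers and silently drops the fraction, while awk computes with floating point numbers. The values 34, 19 and 11 come from the sample data file shown under File types.

```shell
#!/usr/bin/env bash
# int_average: HYPOTHETICAL helper using bash arithmetic (integers only).
int_average() {
  local sum=0 count=0 v
  for v in "$@"; do
    sum=$(( sum + v ))
    count=$(( count + 1 ))
  done
  echo $(( sum / count ))   # integer division: the fractional part is dropped
}

# float_average: HYPOTHETICAL helper using awk (floating point numbers).
float_average() {
  printf '%s\n' "$@" | awk '{ s += $1 } END { print s / NR }'
}

int_average 34 19 11     # 21
float_average 34 19 11   # 21.3333
```

Both functions add the same three numbers (64); only the type used for the division differs.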

File types
A collection of data might be kept in a file whose contents look like the following:

1  10  34  13  18  19
2  11  19  15  12  10
3  12  11  13  11  11

The data may have been entered by you or downloaded from a web source. The file is often formatted with
specific data in columns (or fields). In this particular case the data is plain text and can be read by a text
editor. Plain text is also called ASCII data. The file above is ASCII data. The following two datasets are also
ASCII data. They both contain alpha characters and numeric characters.
Readme.txt
U.S. HISTORICAL CLIMATOLOGY NETWORK MONTHLY (USHCN) Version 2.5.0
(Last Updated: 10 October, 2012)
1. INTRODUCTION
1.1 OVERVIEW
In October 2012, a revision to the USHCN version 2 datasets was
released as version 2.5. The version 2.5 processing steps are essentially
the same as in version 2.0, but the increase in version number change reflects


temp_se.dat
USH00011084  1980  1197 d  977 d   1511 d  1819 d  2319 d  2663E   2831
USH00011084  1981  598     1141    1362    2100    2112    2741    2835
USH00011084  1982  1010    1194    1740    1908    2303    2665    2707
USH00011084  1983  780b    1064    1317a   1666a   2266    2513    2804
USH00011084  1984  861     1195    1554    1905a   2278    2604    2689a
USH00011084  1985  676a    1111E   1776E   1985E   2319E   2688a   2662E
USH00011084  1986  1040b   1381a   1587E   1855E   2351E   2747E   2878a
USH00011084  1987  978E    1208E   1558E   1779E   2366E   2612E   2716E
USH00011084  1988  864a    1046E   1512E   1953E   2178E   2633E   2705b
USH00011084  1989  1429E   1283E   1684E   1804E   2135i   2573b   2651d

Binary data is another type of data. Binary data are made of bits, each of which has one of two values, 0 or 1 (such a value may be called Boolean when the two states represent false or true). ASCII data really are binary data that have been encoded (translated) into characters that we understand. Some programs can read one or the other kind of data and some can read both. An executable file (an application) is a binary file. When an executable file is viewed in a text editor, some of its bytes will be translated into characters we recognize, some into seldom seen characters, and others into blank characters, since the text editor has no character representation for those particular byte values. Below is an example of a portion of a screen dump of the binary file /bin/bash.
????

??
?

??
?????? 8__PAGEZERO?__TEXT__text__TEXT??~?
?__symbol_stub__TEXT?????__stub_helper__TEXT????__cstr
ing__TEXT??c???__const__TEXT?__unwind_info__TEXT?H??__DATA?`__nl_symbol_ptr__DATA??__la_symbol_pt
r__DATA??j__data__DATA?0@?__const__DATA?
??__bss__DATA? |__common__DATA8
8_?!INKED44_
?du
??"?0 pp
?D
h
P??h +
/usr/lib/dyld?H8f?23~v>??$


Below is a hexdump (conversion of binary to hexadecimal) of the same binary file.
$ hexdump /bin/bash
0000000 ca fe ba be 00 00 00 02 00 00 00 07 00 00 00 03
0000010 00 00 10 00 00 09 9e b0 00 00 00 0c 01 00 00 07
0000020 80 00 00 03 00 09 b0 00 00 0a aa a0 00 00 00 0c
0000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0001000 ce fa ed fe 07 00 00 00 03 00 00 00 02 00 00 00
0001010 11 00 00 00 b0 05 00 00 85 00 20 01 01 00 00 00
0001020 38 00 00 00 5f 5f 50 41 47 45 5a 45 52 4f 00 00
0001030 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
0001040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
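You can produce a similar dump yourself, and plain text turns out to be bytes too. The output format of hexdump differs between systems, so this sketch uses od, the POSIX dump tool, with -An (no offset column) and -tx1 (each byte as a two-digit hexadecimal number). hex_bytes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# hex_bytes: HYPOTHETICAL helper that prints the bytes of a string in hex,
# on one line, separated by single spaces.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

hex_bytes 'Hi!'                  # 48 69 21
hex_bytes "$(printf 'a\tb')"     # 61 09 62  (09 is the tab control character)
hex_bytes "$(printf 'a\nb')"     # 61 0a 62  (0a is the newline/return character)
```

Each byte is simply the character's ASCII code written in base 16, so 48 is the decimal 72 you will find for H in the ASCII table.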

File extensions
Data files are often given a filename with an extension (suffix) that describes the type of data enclosed. A file containing numerical data may also be plain text. In this case an extension of txt might be used to indicate that the contents are text data, so the file would be called file.txt. You will also often find the file extension dat used for text data files. Many programs (applications) can read text files. Some files have other extensions. For example:

.xlsx   a file with this extension is for use with the application MS Excel.
.docx   a file with this extension is for use with the application MS Word.

Try to read an MS Word file in vi, or look at it with less.



User defined file types
At the user or programmer abstraction level you may define your own data types. How do you want to store
your data? How will your program interpret your data? One file that should accompany all programs is a
readme file. This file contains everything a user should know to utilize your set of data and programs.
Readme files
A readme file is a text file that describes the contents of a set of data files and programs. These files describe the data contained in the columns or rows (also called fields) and the character (comma, space, semi-colon) that separates (delimits) the fields. In addition, keywords are often used to give access to specific attributes of the data. For example, DATE might describe the date the data were collected and AUTHOR would identify the author of the file. A readme file for the above data might look like this:
DATE: 2013.09.09
AUTHOR: ??
MEDIA: ascii flat
FIELDS:
1 index
2 trial number
3:6 sample counts


2.2 FIELDs and separators
Big data sets require storage and formats that permit easy data extraction. When a program extracts portions of data from a file according to a set of rules, to be printed or used for another purpose, we call that parsing the file. In the following example the data file has 5 fields separated by 2 spaces. The fields are:

1. First name
2. Last name
3. Birth year
4. Career
5. Area of focus

Jane  Bolden  1932  author  economics
John  Talbot  1945  poet  english

Here is another example:


Jane,Bolden,1932,author,economics
John,Talbot,1945,poet,english

In this example the field separator is a comma (,).
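Once the separator is known, standard tools can parse the file. A short sketch using cut (which selects numbered fields, with -d naming the separator) and awk (whose -F option sets the separator). The two records are the example above, held in a shell variable so the snippet needs no data file.

```shell
#!/usr/bin/env bash
# The comma-separated example records, stored in a variable for this sketch.
people_csv='Jane,Bolden,1932,author,economics
John,Talbot,1945,poet,english'

# cut: -d sets the separator, -f picks fields by number (last name, birth year).
printf '%s\n' "$people_csv" | cut -d, -f2,3
# Bolden,1932
# Talbot,1945

# awk: -F sets the separator; the fields are then $1, $2, $3, ...
printf '%s\n' "$people_csv" | awk -F, '{ print $1, "was born in", $3 }'
# Jane was born in 1932
# John was born in 1945

# With no -F, awk splits on runs of blanks, so the two-space version works too.
printf 'Jane  Bolden  1932  author  economics\n' | awk '{ print $2 }'
# Bolden
```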



We will use field separators in several different ways this semester. The first is the use of a colon to separate the fields of a header line in files. Headers like this make it possible to parse (extract particular information from) the files you create, for purposes such as searching and making tables of contents.

2.3 First files: commands_unix.txt AND computing_timeline.txt
Despite the fact that you are probably not entirely comfortable with vi yet, we will push forward with editing files. You can work on vimtutor on your own. We will start with a commands file that will be kept throughout the course. After you get this file started you can work on the second file, computing_timeline.txt.
Throughout this course you will be making individual reference pages. Today you will start commands_unix.txt. This file will be a reference that reduces your need to reinvent the wheel every 6 months. Commands can be complicated, and it may take a while to come up with a command with the right options, inputs and outputs. A reference file that can easily be searched will save you an immense amount of time later.
Begin by typing:
$ man vi

This will bring up the manual page for vi. Now let's search the man page and figure out how to start vi so that it edits a file named commands_unix.txt.
Try it:
Now that you know how to invoke vi with an argument (the filename), create another file called computing_timeline.txt. For now, write it as an empty file. Once you have written the file, can you figure out where the file is located in the Finder?

The first line: Class header
Headers are often used in files to describe what is in them. A header consists of one or more lines of text and is distinguishable from the rest of the file. In our case a # will indicate the header lines. We will also use a field separator (:) with the fields JHED, date, filename, and short description.
#jhed:date:file name:description

For example:
#cfitch1:20140110:commands_unix.txt:reference commands used in bash shell.

Add a header at the beginning of the file containing a # followed by user name, date, filename and purpose, all separated by the field separator (:). Use today's date, written as year, month and day with no separators. This is the date format to use in this class whenever a date is needed. I will reference it as 20xxyyzz.
Edit a new document named header containing the class header.
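Because the header fields are separated by colons, a header line can be both generated and parsed by the shell. A sketch, assuming a hypothetical helper make_header and using date +%Y%m%d, which prints today's date in the YYYYMMDD form used in this class. cfitch1 is the JHED from the example above; substitute your own.

```shell
#!/usr/bin/env bash
# make_header: HYPOTHETICAL helper that builds a class header line,
#   #jhed:date:file name:description
# date +%Y%m%d prints today's date as YYYYMMDD, e.g. 20140110.
make_header() {
  local jhed="$1" file="$2" description="$3"
  printf '#%s:%s:%s:%s\n' "$jhed" "$(date +%Y%m%d)" "$file" "$description"
}

header=$(make_header cfitch1 commands_unix.txt 'reference commands used in bash shell.')
echo "$header"   # #cfitch1:<today's date>:commands_unix.txt:reference commands used in bash shell.

# Parsing it back: with : as the separator (-d:), field 3 is the file name.
echo "$header" | cut -d: -f3   # commands_unix.txt
```

This only works cleanly if no field contains a colon itself, which is one reason to choose separators carefully.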
The second line: Usage line
Another typical statement in a file or program is the usage statement. A usage statement helps the author or
user to remember how to use the file or program. Six months after writing code or editing a file a usage
statement can be quite useful. Add the following line as the second line of the file:
#format command:definition:last usage:notes


The rest of the file: File content
Now you can begin to add individual commands you have used to your file. Using vi, append the following 3
lines to the file:
man:manual pages displayed:man man:
date:date sent to screen:date:
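This is where the searchable format pays off. The sketch below writes a small commands_unix.txt in a throwaway directory (with a here-document instead of vi, only so the example is self-contained) and then searches it with grep. The file contents are the example header and lines from this section.

```shell
#!/usr/bin/env bash
# Build a small commands_unix.txt in a throwaway directory, then search it.
cd "$(mktemp -d)" || exit 1

# A here-document writes the lines between <<'EOF' and EOF into the file.
cat > commands_unix.txt <<'EOF'
#cfitch1:20140110:commands_unix.txt:reference commands used in bash shell.
#format command:definition:last usage:notes
man:manual pages displayed:man man:
date:date sent to screen:date:
EOF

# The ^ anchors the pattern to the start of a line, so this finds the date entry.
grep '^date:' commands_unix.txt
# date:date sent to screen:date:

# Skip the header lines (grep -v keeps non-matching lines), then keep field 1.
grep -v '^#' commands_unix.txt | cut -d: -f1
# man
# date
```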


2.4 Viewing files: cat, head and tail, more, less


The command cat displays the contents of a file in the terminal.

$ cat commands_unix.txt

You can now add another command to your commands file.


cat:concatenate and print files, view content:cat commands_unix.txt:viewing files

Two other useful commands for viewing files are more and less. To figure out the difference between cat, more and less it is best to generate a slightly longer file. Make a file called long.txt using vi:
$ vi long.txt
On the first line enter the number 1, on the second line 2, and so on, until you reach 25. The beginning of the file should look like this:
1
2
3

Now try viewing this file with cat, more and less. Can you figure out the difference? Do you think the expression "less is more" applies in this case?
Can you guess what the commands head and tail will do? Use the man pages to find out how to invoke the head command. Using the head command, print to the screen just the first two lines of the file commands_unix.txt.

Go ahead and add the commands head and tail to the commands_unix.txt file.
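If typing 25 lines by hand gets tedious, the file can also be generated. A sketch using seq, which prints a sequence of numbers (the > sends its output into long.txt), followed by head and tail. It uses the -n form of the line-count option (head -n 2), which is the POSIX spelling; the shorter head -2 also works on most systems.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)" || exit 1   # work in a throwaway directory

seq 1 25 > long.txt   # the numbers 1 to 25, one per line

head -n 2 long.txt    # the first two lines: 1 and 2
tail -n 3 long.txt    # the last three lines: 23, 24 and 25
wc -l < long.txt      # the number of lines: 25
```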

2.5 General format of UNIX commands
Most UNIX commands are lowercase, but beware: UNIX is case-sensitive. The general format of UNIX commands is:
command [options] [arguments]
Remember this format; most commands follow it. In the example
$ date
the command is date, and no options or arguments are given. In the example
$ cal 2014
an argument (2014) is given to the command cal. Arguments tell a command what to work on, here the year whose calendar should be shown, while options change how the command behaves. In this example no options are given.
$ head -2 commands_unix.txt
Here head is the command, -2 is an option and commands_unix.txt is an argument.
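To see the format from the programmer's side, here is a sketch of a hypothetical command, myhead (not a real UNIX utility), written as a bash function. It uses the shell built-in getopts to read its options first; whatever remains afterwards are the arguments.

```shell
#!/usr/bin/env bash
# myhead: a HYPOTHETICAL command (not a real UNIX utility) that shows how a
# program separates options from arguments.
# Usage: myhead [-n count] file
myhead() {
  local count=10 opt OPTIND=1
  # getopts reads the leading options; "n:" means -n expects a value.
  while getopts 'n:' opt; do
    case "$opt" in
      n) count="$OPTARG" ;;
      *) echo 'usage: myhead [-n count] file' >&2; return 2 ;;
    esac
  done
  shift $(( OPTIND - 1 ))   # drop the options; only the arguments remain
  if [ "$#" -ne 1 ]; then
    echo 'usage: myhead [-n count] file' >&2
    return 2
  fi
  head -n "$count" "$1"
}

cd "$(mktemp -d)" || exit 1
seq 1 5 > five.txt
myhead -n 2 five.txt   # option -n 2, argument five.txt: prints 1 and 2
myhead five.txt        # no option, so the default count (10) applies: prints 1 to 5
```

Real commands differ in the details, but most follow this pattern: options (with their values) first, arguments last.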

2.6 Printing files
Printing a file is really another form of viewing a file. Instead of viewing the file on the screen, you send its contents to an output device called a printer. (When you view a file in the terminal, the output device is the screen.) These are some printing commands:
cat, lp, lpr
You can look at the man pages for each to decide how to print a file. For command line options, see the man pages or the following link:
http://www.cups.org/documentation.php/doc-1.3/options.html


2.7 Editing a file: computing_timeline.txt

What do you type to edit this file?

Enter the following text into the empty document.
#JHED:20130130:computing_timeline.txt:Timeline of computer history for intro to computing
#usage year:topic:description:reference or source

19760000:Bill Gates:Cofounded Microsoft with P. Allen, licensed DOS to IBM brought huge
...success:woopidoo.com

NOTE: When you see an ellipsis (...) at the beginning of a line in this document, it merely means the line is continued in the actual file but cannot be shown as such in this document.

Write the file in vi and take a look at the file contents:
Use [CMD+N] to bring up a new terminal window.
Use the command cat to print the file to the screen.
$ cat computing_timeline.txt

TRY IT (print your computing_timeline.txt file)

2.8 Saving your data

You must regularly make copies of your files. Files can become corrupt and unreadable. In addition, these computers can be overwritten at any time. You should not trust these computers as the only place your files are stored.

For today you will use the GUI drag and drop mechanism to save data to your flash drive. We will soon learn the UNIX commands to do this.

Your flash drive should be visible in the Finder. Just click on the file you want to save and drag it to the flash drive icon.

2.9 Help
Try using the man pages on other commands we learned today. If you are confused by what the man pages are telling you, use Google to look up the commands.


3. Navigating the computer


Key commands and concepts in this chapter

Concepts
Directory hierarchy               The system map
Traversing the directory tree     Moving around the system
Hidden files                      Files that are not visible with the default viewing commands
Saving data to flash drives       Copying data for backup or transport
Commands
pwd, cd, ls                       Commands to view and maneuver through the directory tree
touch, mkdir, cp, mv              Creation of files and directories
rm, rmdir                         Deletion of files and directories


3.1 Directory hierarchy
You might have heard (or figured out on your own) that files on a computer system are organized in a tree-like structure of directories (sometimes called folders), which may contain files and other directories. The starting point of the tree is called the root directory. The root directory may contain files and subdirectories, which may contain other files and subdirectories. All other directories on the computer branch from this topmost layer, root. Its location is designated by a single forward slash, /.




3.2 Your login directory
When you start a terminal session you will automatically find yourself in your login, or home, directory. In the directory structure your login directory is below the Users directory, which in turn is under the root:
/Users/login


3.3 Print working directory: pwd
To find the absolute (full) pathname of your current position in the directory tree, type pwd:
$ pwd
/Users/login
The output is your present working directory.


3.4 Changing directory
When you first log in you are in your personal home directory. In order to move around the file system you must tell the cd command where to go:
$ cd directory

A directory can be specified in many ways. The most basic way is by using its absolute pathname, which is specified starting from the root directory, /:
$ cd /Users
$ pwd

To change directories you can also use relative paths. For example, when you type
$ cd ../
you will go up one directory relative to your current working directory. If you are currently in the /Users directory this should take you to the root. Which command do you need to use to figure out your current working directory?
Now go back to the /Users directory by using a relative path:
$ cd Users
Note that there is no leading / here, because we are using a relative pathname and you are already in the / directory.
There are several ways to return to your home directory. You can use the absolute path:
$ cd /Users/login
or take two steps:
$ cd /Users
$ cd login
There is a special shortcut to your home directory, the tilde (~):
$ cd ~
The simplest approach to get to your home directory is cd with no argument:
$ cd
Return to the root directory:
$ cd /
$ pwd
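A safe way to practise absolute and relative paths is in a throwaway directory. This sketch builds a miniature Users/login tree inside a directory created by mktemp -d (so nothing real is touched) and moves around it with both kinds of path.

```shell
#!/usr/bin/env bash
# mktemp -d creates a new, empty directory and prints its (absolute) path.
base=$(mktemp -d)
mkdir -p "$base/Users/login"   # a miniature copy of /Users/login

cd "$base/Users" || exit 1     # absolute path: begins with /
pwd                            # .../Users
cd login                       # relative path: no leading /, looked up from here
pwd                            # .../Users/login
cd ..                          # .. always means "the parent directory"
pwd                            # .../Users
cd ./login                     # . means "this directory"
cd ../../Users/login           # relative paths can chain .. and names
pwd                            # .../Users/login
```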

3.5 Listing the content of a directory
To find out what files are in your current directory type:
$ ls
ls provides a listing of the contents of the current directory. Try listing the content of your home directory:
$ cd
$ ls
Return to the root directory:
$ cd /
$ ls
What files are in the root directory?

Adding an argument to the ls command will list the contents of the directory given as the argument:
$ ls /Users
will list the contents of the directory Users.

3.6 Hidden files
Hidden files have names that start with a dot. They can be seen by adding the option -a to the ls command:
$ ls -a
This command lists all files in the current directory, including hidden files. Hidden files are necessary but can produce clutter for everyday computing tasks, so they are listed only when the option -a is used. In fact, the Mac OS X Finder does not show these files to the user at all. Usually they are configuration files and are generally not to be altered. To make a file hidden, begin its filename with a period (.).
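A quick experiment, run in a throwaway directory created by mktemp -d, shows the difference. The filenames are arbitrary.

```shell
#!/usr/bin/env bash
# Compare ls with and without -a in a throwaway directory.
cd "$(mktemp -d)" || exit 1
touch visible.txt .hidden_config   # the leading dot hides the second file

ls        # shows only visible.txt
ls -a     # also shows . (this directory), .. (its parent) and .hidden_config
```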




3.7 File and directory creation: touch, mkdir, cp, and mv
Files can be created within vi, or you can edit an existing file. File creation can be accomplished in many ways; one is the command touch. Look at the man page for touch, then create a file:
$ man touch
$ touch file1
Where is this file located? Can you find it in the Finder?

Now use ls to see if the file exists. Finally, edit the file using vi and enter your favorite food to eat:
$ ls file1
$ vi file1
Within vi, enter insert mode and type your text:
oranges
Write the file and quit vi: press the Esc key, then type :wq (the colon needs Shift) and press Enter.
To create a directory (an empty one) use the mkdir command:
$ mkdir WORK
Files can be copied and moved throughout the system. To copy a file use the command cp:
$ cp file1 file2
To move a file use the command mv. Moving a file to a new name in the same directory renames it:
$ mv file1 WORK
$ ls WORK
$ mv file2 file3
$ ls
$ mv WORK/file1 .
$ ls
To move or copy something into your current working directory, use a dot (.) as the destination, as in the example above. Now see what your HOME directory contains:
$ ls -a /Users/login

Make another directory DATA within your login directory. We will use directories to organize your code and data.
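The whole sequence can be replayed in a throwaway directory. In this sketch, echo oranges > file1 stands in for typing the word in vi (> sends a command's output into a file); everything else follows the steps above.

```shell
#!/usr/bin/env bash
# Replay the touch/mkdir/cp/mv steps of this section in a throwaway directory.
cd "$(mktemp -d)" || exit 1

touch file1              # create an empty file
echo oranges > file1     # stands in for typing "oranges" in vi
mkdir WORK               # create an empty directory
cp file1 file2           # copy: file1 and file2 now both exist
mv file1 WORK            # move file1 into WORK
mv file2 file3           # same directory, new name: a rename
mv WORK/file1 .          # "." (the current directory) is the destination
mkdir DATA

ls                       # DATA, WORK, file1 and file3 (order depends on locale)
cat file3                # oranges
```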

3.8 Deleting files: rm and rmdir
To remove unwanted files use rm. To remove unwanted directories use rmdir; in rmdir's case, the directory must first be emptied of all files.
TRY IT:
$ rm file1
Now type the command to remove the directory WORK.
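A sketch, in a throwaway directory, of what happens when rmdir meets a non-empty directory. (rm -r removes a directory together with everything in it in one step, and there is no undo, so use it with care.)

```shell
#!/usr/bin/env bash
# rmdir refuses to delete a directory that still contains files.
cd "$(mktemp -d)" || exit 1
mkdir WORK
touch WORK/file1

if rmdir WORK 2>/dev/null; then
  echo "WORK removed"
else
  echo "rmdir failed: WORK is not empty"   # this is the branch that runs
fi

rm WORK/file1   # empty the directory first...
rmdir WORK      # ...and now rmdir succeeds
ls              # prints nothing: the directory is gone
```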

3.9 Saving data

How will you back up your data?
It will be your responsibility to be sure your data are backed up. You should use a flash drive. At various times we may also use JShare, a server, email or some combination of these. One should always have backup mechanisms in place. Notice the use of the plural.
If you do not have a flash drive yet, please email your files to yourself in case they are deleted.
The following is an example of using the terminal to copy data to your flash drive. Where you see filename, insert the name of the file you want to save. Where you see FLASH, insert the name of your flash drive. NOTE: If the name of your flash drive has a space in it, click on its icon and rename it.
1. Insert your USB FLASH drive.
2. From your home directory type:
$ ls /Volumes
Macintosh HD FLASH
3. Using the output from the ls command, which gives your flash drive name, type:
$ cp filename /Volumes/FLASH/
OR (for PC users)
$ cp filename /cygdrive/f/
To copy a directory type:
$ cp -r directory_name /Volumes/FLASH/
OR (for PC users)
$ cp -r directory_name /cygdrive/f/
4. Finally, if your flash drive is named my_flash, your commands to back up a file, e.g., computing_timeline.txt, to and from the flash drive are respectively:
$ cp /Users/jhed/computing_timeline.txt /Volumes/my_flash/
$ cp /Volumes/my_flash/computing_timeline.txt /Users/jhed/
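Copying a file over and over to the same name on the flash drive keeps only the latest copy. A hypothetical helper, backup_file (not a standard command), sketches one way to keep dated copies: it checks that the file and the destination exist, then copies the file under a name prefixed with today's date in the class YYYYMMDD format. /Volumes/my_flash is the example drive name from step 4.

```shell
#!/usr/bin/env bash
# backup_file: HYPOTHETICAL helper (not a standard command).
# Usage: backup_file file destination_directory
# Copies file to destination_directory/YYYYMMDD_file so older copies survive.
backup_file() {
  local src="$1" dest_dir="$2" name
  if [ ! -f "$src" ]; then
    echo "backup_file: no such file: $src" >&2
    return 1
  fi
  if [ ! -d "$dest_dir" ]; then
    echo "backup_file: no such directory: $dest_dir (is the flash drive plugged in?)" >&2
    return 1
  fi
  name=$(basename "$src")   # strip any leading directories from the name
  cp "$src" "$dest_dir/$(date +%Y%m%d)_$name"
}

# Example, using the drive name from step 4:
# backup_file computing_timeline.txt /Volumes/my_flash
```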

4. The Unix operating system. Your first program


Key commands and concepts in this chapter

Concepts
UNIX operating system        Multi-user, multi-tasking operating system
Shells                       A user interface for accessing the operating system
File system                  Organization of files
Processes                    A task that is being run (executed)
Programming and scripting    Placing multiple commands in a file for execution
Commands
ps, kill                     Commands for viewing and killing processes
tty                          Terminal name
echo                         Prints arguments to screen
source                       Used to run bash scripts

The goal of this chapter is to provide you with a brief introduction to the UNIX operating system. We will explore how the hardware is able to function merely from user input, whether that input comes through the GUI or the CLI.

4.1 Unix

Unix is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, developed in the 1970s at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others (Wikipedia). There are multiple flavors of Unix, and there are many systems which are UNIX-like in their architecture. Notable among these are Mac OS X and the GNU/Linux distributions.
The Unix operating system consists of many utilities along with the master control program, the kernel. In
computing, the kernel is a computer program that manages I/O (input/output) requests from software, and
translates them into data processing instructions for the central processing unit and other electronic
components of a computer (Wikipedia).
A UNIX kernel consists of many kernel subsystems like process management, memory management, file
management, device management and network management.
Some key features of the UNIX architecture concept are:

UNIX systems use a centralized operating system kernel which manages system and process activities.

UNIX systems are multiuser and multitasking: multiple processes can run at the same time, or within
small time slices and nearly at the same time, and any process can be interrupted and moved out of
execution by the kernel.

Files are stored on disk in a hierarchical file system, with a single top location throughout the system
(root, or "/"), with both files and directories, subdirectories, sub-subdirectories, and so on below it.

4.2 Unix file system


The Unix operating system comes with a lot of files that are necessary for the computer to run. Some of these files are executable. An executable file is a computer file that contains instructions in a form that a computer's operating system or application can understand and follow. Computers must use executable files to carry out the tasks that you give to them; every application you run starts off with an executable program.
A number of directories exist beneath root. Flavors of UNIX vary somewhat in the location and naming of these directories, for example /home instead of /Users. Most of these differences are merely related to system organization. Some typical directories beneath the root directory include:

/bin               User command files. Must be present for the system to boot and run
/usr/bin           User command files not required for the system to boot
/sbin              Executable files, usually for system administration
/usr               Used for miscellaneous purposes, shared by multiple users
/dev               Device files. Here the computer keeps a list of all devices it understands
/etc               System-wide configuration files
/Users or /home    User home directories
/Volumes or /mnt   Mount points for temporarily mounted file systems
/opt               Add-on application software packages

4.3 Processes
The basic operation of the UNIX operating system (and most others today) revolves around processes and files.
A program that is currently running (i.e., being executed) is termed a process. Each process is given a unique
identifier, its process ID (PID) number. Processes have owners and permissions. On a UNIX system, many
processes are running at any given time. Processes are started by the kernel at boot time, by the operating
system itself, and by users. Each process is spawned from another process (its parent), except for the initial
process. At any moment some processes are completing while others are starting. The commands ps and
top are used to look at running processes.
The initial process started at boot time is launchd on Mac OSX (init on many other UNIX systems). At the
terminal type
$ ps -p 1

This command tells us that the 1st process (PID=1) was launched and its name is launchd. At the terminal
type
$ ps aux

You can now see a list of all currently running processes, each with its own PID. Some processes have
been launched by PID 1 and some by other processes.
You can see all of your currently running processes by typing:
$ ps -fu username

The output will be something like:

  UID   PID  PPID   C STIME   TTY           TIME CMD
  502   788   787   0 Mon12PM ttys000    0:00.02 -bash
  502 75485 75484   0 Fri11AM ttys001    0:00.04 -bash
    0   795   533   0 Mon12PM ttys002    0:00.03 login -pfl fitch /bin/bash
  502   797   795   0 Mon12PM ttys002    0:00.07 -bash
  502  2986  2985   0 Mon01PM ttys003    0:00.12 -bash
  502  6271  6270   0 Mon06PM ttys004    0:00.07 -bash

where the columns indicate:

UID      user ID
PID      process ID
PPID     parent process ID
C        CPU utilization
STIME    start time
TTY      terminal the process is attached to (the shell session it was spawned from)
TIME     cumulative CPU time used
CMD      command in execution (a leading dash indicates a login shell)


To kill (or stop) a shell process, look for the processes with a dash and choose the process number that you
want to end. Be sure that the PID is not for your current terminal (tty). Then issue the following command:
$ kill processid

Replace processid with the process ID number you identified.



TRY IT:
Launch a new terminal session.
Determine the terminal name. Use the command tty.
$ tty
/dev/ttys007

The terminal name is ttys007.

Determine its PID. Use the command ps.
$ ps -t ttys007
  PID TTY           TIME CMD
36979 ttys007    0:00.02 login -pf fitch
36980 ttys007    0:00.19 -bash
89132 ttys007    0:00.00 ps -t ttys007

Using the command kill, terminate the newly made terminal session.
$ kill 36980

OR
$ kill -KILL 36980

An interactive bash shell ignores the default TERM signal sent by a plain kill, so the first form may appear to
do nothing. The -KILL option (equivalently -9) forces the process to exit. See the man page on kill for
additional information.
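To practice ps and kill without risking a real terminal session, the sketch below starts a harmless background sleep and then terminates it. It goes slightly beyond the text: the & (run in the background), $! (PID of the most recent background job), wait (collect the finished process) and kill -0 (test whether a PID still exists) are standard bash features assumed here for illustration.

```shell
#!/bin/bash
# Start a harmless background process to practice on.
sleep 300 &
pid=$!                  # $! holds the PID of the last background job
echo "started sleep with PID $pid"

# Look at just that process.
ps -p "$pid"

# Send the default TERM signal; sleep exits when it receives it.
kill "$pid"
wait "$pid"             # collect the finished process (bash may print "Terminated")

# kill -0 sends no signal; it only checks whether the PID still exists.
# (2> /dev/null hides the error message; redirection is covered in Chapter 8.)
if kill -0 "$pid" 2> /dev/null; then
    echo "process $pid is still running"
else
    echo "process $pid has exited"
fi
```

Run it with bash script_name; the last line should report that the process has exited.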

4.4 Shells



User access to the kernel is through the shell. What is a shell? A shell is the higher level language that
interfaces with the kernel. You have already used the UNIX shell: whenever you type in a command, say pwd,
you are invoking it. The shell is a command line interpreter; the commands we have been typing on the
command line are interpreted by the shell.
There are several shell flavors, such as sh, bash, tcsh, or csh, which are all transferable to any computer
running UNIX. Our class will be using the bash shell. Of course, you may download any shell you wish from
the internet, install it, and run it. Use the ls command to look into directory /bin and see if there are
some shell executable files in that directory.
On some systems the command chsh (chsh -s path_to_shell) will permanently change your shell
preference. Outside of this course, you may find yourself having to (or wanting to) use a different shell.
Remember they are basically the same idea with a few different commands and syntax requirements. New
shells developed out of programmers' frustrations. It is not that one is necessarily better or worse; they meet
different needs.

4.5 Programming and scripting
Our exploration into computing will entail exposure to a variety of commands the shell can interpret. One can
combine these commands and include them in a file. This saves time and prevents errors in typing when
needing to repeat commands. A program is a file filled with a collection of commands. You will begin by
writing your first program! In this case the file will contain shell commands that you have learned. For an
interpreted language, such programs are also called scripts. Thus your first program will be a shell script.

4.6 echo command and your first script
It is customary for a first program in a new programming language to include commands that print Hello
World to the screen. In bash, this is done with the command echo.
TRY IT:
$ echo Hello
$ echo World!
$ echo Hello World!


Type the command echo Hello World! in a file called script_1.bash.
TRY IT:
$ vi script_1.bash

Run the commands within your bash script using the command source or, alternatively, its shorthand, a single dot (.):
$ source script_1.bash
$ . script_1.bash



4.7 Adding comments to your scripts
You can prevent some lines from being executed by adding a comment card. In bash a comment card is a
hash tag (pound, #) symbol. Add the class header to your script as a comment.

#JHED:20140904:script_1.bash:my first bash script

Add some additional comments to your script:



# This is my first script in bash
# hash tags are comments in bash
echo Hello World!
# This is a comment

Note that the scripts that follow are commented in two ways:

1. General comments, where the first character of the line is a comment card, the # sign for bash.
2. Comments from me as to what the code is doing. These are optional for you to enter, though it is
   probably a good idea to add them. These also include a comment card, #, but it is not in the first
   character position. When an unquoted # begins a word anywhere on a line, bash disregards
   everything that follows it on the same line.
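A minimal sketch of the comment rule: a # that starts a word begins a comment, while a # inside quotes or in the middle of a word is ordinary text. The strings used here are only examples.

```shell
#!/bin/bash
# A whole-line comment: bash skips this line entirely.
echo Hello World!        # a trailing comment: only the text after # is ignored
echo "a # inside double quotes is printed"
echo a#b                 # no comment in a#b: the # is in the middle of the word
```

The output is three lines: Hello World!, then the quoted sentence including its #, then a#b.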



#JHED:20140904:script_1.bash:my first bash script
# This is my first script in bash
# hash tags are comments in bash
echo Hello World!
# This is a comment


Run the script again:
$ source script_1.bash


4.8 Adding #!/bin/bash as the 1st line to your script
The combination of characters #! is called a shebang. When #! are the first two characters of a script, the
line is no longer just a comment. Usually the #! characters are followed by the location of the program to be
used to run the script. For the bash shell, we will use

#!/bin/bash

First line of your scripts:
Note that #!/bin/bash should be the very first line of your script. This first line is not necessary if you source
the file, because the current shell then runs the commands itself.
Second line:
In our scripts, the second line should be a comment stating the following information:

#login name:20120215:script_1.bash: my first bash script

Third and following commented lines:
Tailored to your requirements. These can contain notes and descriptions.
#script_1 is a sample template for bash scripts
#CAF
#20190914

Add #!/bin/bash to your new script:


#!/bin/bash
#JHED:20140904:script_1.bash:my first bash script
# This is my first script in bash
# hash tags are comments in bash
echo Hello World!
# This is a comment

Make a directory named CODE. Move your newly made script to your CODE directory.

5. Variables, meta-characters and expansion


Key commands and concepts in this chapter
Concepts
Flow of information in command processing
Meta-characters                  Characters with special meaning in bash
Filename expansion               Characters for matching filenames in bash
Variables, variable expansion    Symbolic name for information or value
Quoting variables                Controlling the special meaning of characters or strings
Arithmetic expansion             Evaluation of arithmetic operations in bash


5.1 Bash command processing sequence
When you type a command in your terminal, the bash shell performs a series of operations. For example,
when you type:
$ echo Hello
Hello

the shell needs to identify what is a command and what is an argument. Then it needs to find where the
command is located, execute the command and, in some cases, produce some sort of output. In practice,
things are even more complicated, since there are some special characters, called meta-characters, that have a
special meaning to the shell. For example, we learned about the pound sign #, which tells the shell to ignore
the characters that follow that sign, and treat them as a comment. The shell also performs something called
expansion, i.e., it replaces some characters with some other characters. We will talk about meta-characters
and expansion in this chapter. A general recipe that the bash shell follows when processing a file or your
commands from the terminal is presented below.

Input processing: take characters from the terminal or a file and break them into lines. The lines are
sequences of characters terminated by newline characters.

Lexical analysis and parsing: identify meta-characters and separate the stream of characters into words.
Words are sequences of characters separated by meta-characters.

Expansion: some characters/words can be expanded into some other characters.

Command execution: the set of expanded words is decomposed into a command name and a set of
arguments. Commands are executed.

Exit status: the shell will report if an error occurred in command execution.
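The last step, the exit status, can be seen directly: bash stores the status of the most recent command in the special variable $?, where 0 means success and any other number means an error. This sketch assumes /no/such/directory does not exist on your machine; the exact non-zero value can differ between systems.

```shell
#!/bin/bash
# A command that succeeds: its exit status is 0.
# (> /dev/null discards the listing; redirection is covered in Chapter 8.)
ls / > /dev/null
ok_status=$?
echo "ls / exited with status $ok_status"

# A command that fails: its exit status is non-zero.
ls /no/such/directory
bad_status=$?
echo "ls /no/such/directory exited with status $bad_status"
```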




5.2. Meta-characters and expansion

Meta-characters are characters with a special meaning. You have already met the pound sign #. In this chapter
we will familiarize ourselves with several other meta-characters: ~ $ & * ( ) [ ] { } ; < > / \ ? !

5.2. Expansion
The shell performs several different types of expansion. Those include the filename and pathname expansion,
brace expansion, variable expansion, and arithmetic expansion.

5.3 Filename and pathname expansion (*, ?, [])
The shell allows the use of meta-characters for pattern matching in filenames (filename expansion). It will
supply the list of all filenames matching the given pattern.
* matches zero or more characters
? matches exactly one character
[xyz] matches any one of the characters within the brackets
[!xyz] matches any one character except the ones contained within the brackets
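The four patterns can be tried side by side. This minimal sketch works in a scratch directory made with mktemp -d (not otherwise used in this text), so only the hypothetical files data1.dat, data2.dat, data10.dat and notes.txt can match; the order in which names are printed may depend on your locale settings.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1    # empty scratch directory; stop if cd fails
touch data1.dat data2.dat data10.dat notes.txt

echo *.dat           # * : any run of characters, so all three .dat files
echo data?.dat       # ? : exactly one character, so data1.dat data2.dat
echo data[12].dat    # [12] : one character that is 1 or 2
echo data[!1].dat    # [!1] : one character that is not 1, so data2.dat
```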


The all important wildcard, *
You might be used to using the mouse or a search to select multiple files in a graphical interface. Within the
UNIX environment, groups of files can be accessed by using special characters. The wildcard, *, matches any
number of characters.
Example
$ ls *.dat

The ls command above will list all files ending in .dat in the current working directory.

Example
Type the following commands:
$ mkdir example
$ cd example
$ touch feb86 jan12.89 jan19.89 jan26.89 jan5.89 jan85 jan86 jan87 jan88 mar88
$ touch memo1 memo10 memo2 memo2.sv

This will create files for use in the examples below.
Guess the output from the following commands:

echo *

echo m[a-df-z]*

echo jan*

echo ?????

echo jan?? feb?? mar??

echo *[!0-9]

echo [A-Z]*

echo *.*

echo *89


echo [fjm][ae][bnr]*


Wildcards can be used with relative paths as well as with absolute paths.
Example
$ ls ../*
$ ls /Users/login/*

If you use absolute paths in your wildcard, bash will expand the wildcard to a list of absolute paths. Otherwise,
bash will use relative paths in the subsequent word list.


5.4 Brace expansion ({})

Brace expansion looks similar to filename expansion, but it works differently. Pathname and filename
expansion only match existing files and directories. Brace expansion simply generates a list of words,
whether or not files with those names exist, so you can use it to create new files and directories as well as
to refer to existing ones.
{a..z} expands to the characters a to z (lowercase) as a list
{1..9} expands to the numbers 1 to 9 as a list
{1,2,3} expands to each of the comma-separated items within the braces

Example
$ touch file_{1..3}
$ ls file_{1,2,3}
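Using echo shows exactly what each brace expression turns into before any command sees it. A minimal sketch, run in a scratch directory created with mktemp -d (an assumption of this example), with file names chosen only for illustration:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1    # empty scratch directory; stop if cd fails

echo {a..e}              # a b c d e
echo file_{1..3}         # file_1 file_2 file_3
echo file_{1,2,3}.txt    # file_1.txt file_2.txt file_3.txt

# Braces do not need existing files, so they can create new ones:
touch file_{1..3}
ls file_{1,2,3}          # lists the three new files
```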


5.5 Variables and variable expansion
What is a variable? You may recall from algebra that a number can be referenced by a character or a string:

x=4
y=5

In UNIX, variables can be set to numbers and strings (a string is any finite sequence of characters, i.e., letters,
numerals, symbols and punctuation marks). Once you have made a variable, the shell uses a meta-character
to recognize it as such. To see what the variable holds we use the echo command and a $. The $ sign tells
the interpreter to expand the contents of the variable: it points (references) to the contents of the specified
variable held in memory. This is another meta-character because in the shell it has a special meaning.

$ x=4
$ echo $x
4
$ v1=test
$ echo $v1
test
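The same steps work in a script. One detail worth knowing (standard bash behavior, not stated above): there must be no spaces around the = sign, because x = 4 would make bash try to run a command named x. The variable y below is only for illustration.

```shell
#!/bin/bash
x=4          # no spaces around =
v1=test
echo $x      # 4
echo $v1     # test
y=$x         # a variable can be set from another variable's value
echo $y      # 4
```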


5.6 Quotes
For extremely detailed information on how quotes should be used in bash, you may want to look at the
"QUOTING" section in the bash man page. The existence of special character sequences that get "expanded"
(replaced) with other values complicates how strings are handled in bash.
Enclosing characters in single quotes (') preserves the literal value of each character within the quotes. A
single quote may not occur between single quotes, even when preceded by a backslash. Enclosing characters
in double quotes (") preserves the literal value of all characters within the quotes, with the exception of $, `,
\ and, when history expansion is enabled (as it is by default in an interactive shell), !.
TRY IT:

Another way to explore the use of quotes is to make a variable HW. Type echo $HW after each of these
assignments:
HW=Hello World # error: bash runs World as a command; to include two words separated by a space you must use quotes
HW='Hello World'
HW="Hello World"
HW="Hello World!"
HW='Hello World!'

5.7 Escape characters
Escape characters are used to remove the special meaning from a single character. A non-quoted backslash, \,
is used as an escape character in Bash.

echo $HW

echo \$HW
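A side-by-side sketch of single quotes, double quotes and the backslash. It is written as a script because history expansion of ! is off in non-interactive shells, so the exclamation mark in the value causes no error here.

```shell
#!/bin/bash
HW='Hello World!'
echo '$HW'      # single quotes: nothing is expanded, prints $HW
echo "$HW"      # double quotes: $ still expands, prints Hello World!
echo \$HW       # backslash escapes the $, prints $HW
echo "\$HW"     # a backslash also works inside double quotes, prints $HW
```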

5.8 Arithmetic expansion


You can perform simple integer math using shell syntax. Simply enclose the particular arithmetic expression
between a "$((" and a "))", and bash will evaluate the expression. Here are some integer examples:
$ echo $(( 100 / 4 ))
25
$ myvar="56"
$ echo $(( $myvar + 12 ))
68
$ echo $(( $myvar - $myvar ))
0
$ myvar=$(( $myvar - 1 ))
$ echo $myvar
55
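A few more integer cases, as a minimal sketch. Because the arithmetic is integer-only, division drops any remainder; the % operator (a standard bash arithmetic operator not mentioned above) returns that remainder.

```shell
#!/bin/bash
myvar=56
echo $(( myvar + 12 ))     # 68: inside $(( )) the $ on a name is optional
echo $(( 7 / 2 ))          # 3: integer division, the .5 is dropped
echo $(( 7 % 2 ))          # 1: remainder of 7 / 2
echo $(( 2 * (3 + 4) ))    # 14: parentheses group as in algebra
```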


5.9 One line of code for multiple commands ( ; , && )


Just like putting several lines of commands in a file and running the file, you can separate commands by
semicolons all on one line. In this manner you would not have to write a file to run several commands.
So, if you want to run several commands, one right after the other, type them on a single line separated by
semicolons:
$ touch file1; mv file1 DIR1; vi DIR1/file1

What happens if you type something incorrectly? The next command is still executed.
So, if the command touch fails, the sequence of commands will continue.
$ touch file1; mv file1 DIR1; vi DIR1/file1

Depending on the files available, an error will occur for each failed command. The logical AND operator (&&)
causes the following command to execute only if the preceding command succeeded.
$ touch file1 && mv file1 DIR1 && vi DIR1/file1
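The difference between ; and && is easiest to see with a command that is sure to fail. This sketch assumes a scratch directory from mktemp -d and a hypothetical missing_file that does not exist; vi is left out so the script runs without stopping for input.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1   # scratch directory; stop if cd fails
mkdir DIR1

# ; runs the next command even though ls fails.
ls missing_file; echo "printed after ;"

# && stops at the first failure, so this echo never runs.
ls missing_file && echo "never printed"

# When every step succeeds, each one runs in turn.
touch file1 && mv file1 DIR1 && ls DIR1
```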

6. The environment
Key commands and concepts in this chapter
Concepts
Environment variable usage    Make use of variables to automate tasks and to shorten path designation
The PATH                      Sequential lookup path
rc files, bashrc              Customization of every login automated
Commands
env                           Display environment
alias                         Rename a command
export                        Make a variable active in the environment
find                          Locate files in the file system


6.1 UNIX Environment Setup
When working on a UNIX system an environment is created. The environment consists of variables that are set
through several different means. Most are set by the system and within your own startup files. Others may
be set within user programs. When a shell process starts it inherits the environment from its parent process.
You can see all environment variables of the current shell by typing
$ env

One of the variables listed is HOME. Where does this variable point? The variable HOME provides yet another
way to return to your home directory. Thus,
$ ls $HOME

lists the contents of your home directory. If the variable HOME contained an incorrect path, an error
would be generated.
$ pwd
/Users/Shared
$ cd $HOME
$ pwd
/Users/fitch


Another variable is SHELL. This tells the computer what shell you wish to work in. What shell are you
currently using? If you are not in the bash shell you need to change shells.

6.2 Creating environment variables
Environment variables are a great way to save time typing lengthy paths. There are many other uses for
environment variables, as we shall see. Here is an example:
$ cd $HOME
$ cd CODE
$ mkdir PYTHON

Make an environment variable that points to this new directory.
If bash shell:
$ PYTHON=$HOME/CODE/PYTHON
$ export PYTHON

Repeat the above but for a directory in the CODE directory named UNIX.
$ mkdir $HOME/CODE/UNIX
$ UNIX=$HOME/CODE/UNIX
$ export UNIX

How do these environment variables work?
$ echo $UNIX
$ ls $UNIX
$ cd $UNIX
$ cd ~
$ ls -latr

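Does the following command give an error?
$ cd $CODE

It does not, but it probably does not do what you intended. The environment variable CODE has not been
created yet, so $CODE expands to nothing and the command becomes a plain cd, which takes you to your home
directory. Create the environment variable CODE that points to the CODE directory. Once made correctly, the
following command will move your bash programs into the UNIX directory:
$ mv $CODE/*bash $UNIX

Make another environment variable, DATA, that points to a DATA directory in your home directory.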


What exactly does the export command do? What is it useful for? The export command marks a variable so
that it is copied into the environment of every child process started from the current shell. Open a new shell.
Will the same sequence of commands work in the new shell, or do you first have to define the variable again
and export it? You will find the variable is not available. The way to have it available automatically is to add
the commands to your .bashrc file. An example of a .bashrc file is on Blackboard, named bashrc.orig.
The commands in this configuration file are executed when you open a new shell.
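The effect of export can be checked with a child process. In this sketch, bash -c starts a child shell that runs one command; the single quotes stop the parent from expanding $UNIX, so the child has to find the value in its own environment. The path mirrors the UNIX variable from this section.

```shell
#!/bin/bash
unset UNIX                      # start clean, in case UNIX is already set

UNIX=$HOME/CODE/UNIX            # a plain shell variable
bash -c 'echo "child sees: [$UNIX]"'    # child sees: []

export UNIX                     # now copied into child environments
bash -c 'echo "child sees: [$UNIX]"'    # child sees: [<your home>/CODE/UNIX]
```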

6.3 alias
The alias command gives a new name to a command (together with any options). For example, instead of typing
your favorite options for the ls command, such as:
$ ls -lrt

you can make an alias for it:
$ alias lrt='ls -lrt'

Try running the lrt command.

Unintended removal of files can occur. Recovering these files is often possible but certainly not convenient.
Instead of using the default options for the rm command, you can make an alias to the command with the
desired options. A useful option for the rm command is -i, which requires confirmation before a file is deleted.

alias rm='rm -i'

Having this alias as part of your .bashrc file can help to minimize the risk of unintentional deletion.
$ touch test
$ rm test
remove test?

Now you can enter y and the file will be removed, or n and it will not be removed.
Add the alias to your bash configuration file (.bashrc).
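Aliases are meant for interactive use, which is why they belong in .bashrc. To try one inside a script, bash first needs its expand_aliases option turned on (shopt -s expand_aliases, a bash setting not covered in this text). The rm alias is left out of this sketch because rm -i would stop the script to wait for answers.

```shell
#!/bin/bash
# Scripts do not expand aliases by default; this turns expansion on
# so the sketch behaves like an interactive shell.
shopt -s expand_aliases

alias lrt='ls -lrt'
alias lrt          # prints: alias lrt='ls -lrt'

cd "$(mktemp -d)" || exit 1    # scratch directory; stop if cd fails
touch older newer              # two sample files
lrt                            # runs ls -lrt
```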
6.4 PATH
If you type a command, the shell must be able to find the command. For example, if you type ls, the shell must
know where to find it. It does this by looking through a list of directories, in order, defined in an environment
variable. This variable is called PATH. The PATH variable contains a colon-separated list of directories to be
searched for executable programs.
$ echo $PATH

You will notice that the output contains directories that hold binary executable files. You can add your own
directories to this list. For example, you can add your scripts directory to this list:
$ export PATH=$PATH:$HOME/CODE/UNIX

This command adds the new directory to the end of the current list of directories:
$ echo $PATH
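A minimal sketch of the lookup in action. It assumes a throwaway directory (from mktemp -d) standing in for $HOME/CODE/UNIX and a hypothetical script named hello_path; tr and the pipe | (covered in Chapter 8) only print one PATH entry per line, and type -P is a bash builtin option that reports where a command was found.

```shell
#!/bin/bash
# Show each PATH entry on its own line (tr turns colons into newlines).
echo "$PATH" | tr ':' '\n'

# A hypothetical scripts directory in a temporary location.
bin_dir=$(mktemp -d)
printf '#!/bin/bash\necho Hello from hello_path\n' > "$bin_dir/hello_path"
chmod u+x "$bin_dir/hello_path"        # scripts must be executable (Chapter 7)

# Append the directory to PATH; now the script runs by name from anywhere.
export PATH=$PATH:$bin_dir
hello_path               # Hello from hello_path
type -P hello_path       # full path of the file bash found
```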



6.5 find
The find command is useful for locating specific files. The first argument to the find command is the
directory to be searched. The following command will find all files under the root directory (/) that are
named filename:
$ find / -name filename

To find out if the file nobel.txt is in your home directory, type:
$ find $HOME -name nobel.txt
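A self-contained sketch of find on a small, made-up directory tree (built under a temporary directory so nothing in your home is touched). Note the quotes around '*.dat': they stop the shell from expanding the wildcard itself, so find receives the pattern and matches it against every file name it visits.

```shell
#!/bin/bash
top=$(mktemp -d)                          # temporary top-level directory
mkdir -p "$top/DATA/NOAA" "$top/CODE"
touch "$top/DATA/nobel.txt" "$top/DATA/NOAA/temp_se.dat"

find "$top" -name nobel.txt               # exact file name
find "$top" -name '*.dat'                 # quoted wildcard, expanded by find
```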

6.6 Customization: runtime configuration files


To customize your shell environment you must edit some configuration scripts called startup files. In bash
we will be editing these configuration scripts: .bash_profile and .bashrc. While .bash_profile is
executed when you log in (a login shell), .bashrc is executed by interactive shells that are not login shells,
for example when you type bash in your terminal window. On Mac OSX, Terminal opens each new window as a
login shell, so .bash_profile is the file read there. In both files users can specify and export various
environment variables, such as the PATH variable. Aliases can also be defined in these scripts. Placing
environment variables in these scripts avoids the arduous task of entering the commands every time you log in.
To avoid duplicating the setup of variables and aliases, we will only be placing them in the .bashrc file. In
.bash_profile we will simply source the .bashrc file.
We have provided two files for you on Blackboard: bashrc and bash_profile. Download these two
files. Use the find command to locate the files. Make a copy of the files inside your $HOME directory. Make
these files hidden (remember section 3 where we talked about hidden files; use the copy command to add
the dot before the filename, e.g., cp bashrc .bashrc). You should now see the hidden startup files .bashrc
and .bash_profile in your home directory.
Add the following lines to your .bashrc file:
export UNIX=$HOME/CODE/UNIX
export CODE=$HOME/CODE
export DATA=$HOME/DATA
export PYTHON=$HOME/CODE/PYTHON

We will go over this exercise in detail during class.
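The "source .bashrc from .bash_profile" arrangement can be simulated without touching your real startup files. This sketch builds a throwaway home directory with mktemp -d; the if [ -f ... ] test shown for .bash_profile is a common convention and an assumption here, so the file provided on Blackboard may be written differently.

```shell
#!/bin/bash
demo_home=$(mktemp -d)          # throwaway "home"; real files stay untouched

# .bashrc holds the exports (values from this section).
cat > "$demo_home/.bashrc" <<'EOF'
export CODE=$HOME/CODE
export UNIX=$HOME/CODE/UNIX
EOF

# .bash_profile only hands over to .bashrc, so settings live in one place.
cat > "$demo_home/.bash_profile" <<'EOF'
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
EOF

# Simulate a login: read .bash_profile with HOME pointing at the demo.
HOME=$demo_home
. "$HOME/.bash_profile"
echo "$UNIX"                    # <demo_home>/CODE/UNIX
```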

7. Security, backing up, grep


Key commands and concepts in this chapter
Concepts
Security and permissions    Management of access to files and directories
File size                   How much disk space are files using
Backing up                  Copying your files to another location
Managing file space         Assuring adequate and reliable memory is available
Backing up                  Prevention of lost files
Commands
chmod                       Change permissions
rsync, scp                  Copy files to remote locations
tar, gzip                   Compress and pack files
ls -l, du, wc               File size
df                          Free disk space
grep                        Find and extract a pattern from a file

7.1 Security: file permissions, users, groups and root


If you start to roam around the file system you will soon find out you are unable to go just anywhere you
please. This is because there are some security measures in place. These began not to prevent hackers from
manipulating your system but to prevent inadvertent destruction by the owner. When systems became
multi-user and networked, additional security measures were needed. We will only touch the surface on how
to secure your system.
The first level of security is contained in settings associated with files. The -l option (long format) for the
command ls lists these settings for a given file.
$ pwd
$DATA/NOAA
$ ls -l
total 0
drwxr-xr-x   6 fitch  staff       204 Jan 21 21:14 NOAA
drwx-w----   2 fitch  staff        68 Jan 21 15:15 lab.20120131
$ cd NOAA
$ ls -l
total 28672
-rw-r--r--   1 fitch  staff      1648 Jan 21 18:30 temp_md_1.dat
-rw-r--r--@  1 fitch  staff      1648 Jan 21 17:04 temp_md_1.dat.orig
-rw-r--r--   1 fitch  staff  14668127 Jan 19 16:03 temp_se.dat

Displayed for the temp_se.dat file is:

-rw-r--r--       file mode (1+9+1 characters)
1                number of links
fitch            owner name
staff            group name
14668127         number of bytes in the file
Jan 19 16:03     modification time
temp_se.dat      pathname

The file mode printed with the -l option consists of the file type (character 1), permissions [owner (characters
2-4), group (characters 5-7), and other (characters 8-10)], and extended attributes if applicable (character 11).

File type - We will only be concerned if the file is a regular file (-) or a directory (d).

temp_se.dat is a regular file

Permissions - The next three fields are three characters each: owner permissions, group permissions,
and other permissions. Each field has three character positions:
1. If r, the file is readable; if -, it is not readable.
2. If w, the file is writable; if -, it is not writable.
3. If x, the file is executable; if -, it is not executable.

For temp_se.dat:
the owner can read and write the file
the group can only read the file
others can only read the file.

On Mac OSX, an @ after the mode marks extended attributes and a + marks an access control list; these often result from transferring or downloading files.

7.2 Changing permissions, chmod
The command
$ ls -l filename

will list the current permission settings on a file:
$ ls -l
-rw-r--r--  1 fitch  staff  16 Jan 22 18:12 data1

The file, data1, gives 10 characters for the current mode. The first indicates the file type. The next 9
characters are 3 for owner, 3 for group, and 3 for world (or others). Here the owner has read and write
privileges; the group and world have only read permissions.
To change permissions of a file, use the chmod command.
u user, g group, o others, a all
r read, w write (for a directory: create and delete files in it), x execute (and access directory)
+ add permission, - take away permission
For example, to make a file readable, writable, and executable for only the owner and anyone in the owner's
group, use ug+rwx,o-rwx for options.
$ ls -lt
-rw-r--r--  1 fitch  staff  16 Jan 22 18:12 data1
$ chmod ug+rwx,o-rwx data1
$ ls -lt
-rwxrwx---  1 fitch  staff  16 Jan 22 18:12 data1

Try making your script_1.bash file executable. Now try running this file like this:
$ ./script_1.bash
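A minimal sketch of the exercise above, run on a fresh copy of script_1.bash in a scratch directory (mktemp -d) so your own file is not changed. The exact default mode depends on your umask; -rw-r--r-- is typical.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1    # scratch directory; stop if cd fails
printf '#!/bin/bash\necho Hello World!\n' > script_1.bash

ls -l script_1.bash        # typically -rw-r--r--: no x, so it cannot run
./script_1.bash            # fails with "Permission denied"

chmod u+x script_1.bash    # add execute permission for the owner (u)
ls -l script_1.bash        # typically -rwxr--r--
./script_1.bash            # Hello World!
```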



7.3 File size and directory size, ls, du


The size of a file, or the amount of storage (in bytes) taken up by the file, is determined by the number of
characters in the file. To see this value you can use the ls command. If each character takes 8 bits, or 1 byte,
your total file size should reflect the number of characters.
Make an empty file and check its size using the ls command.
Now add 1 character to the file using vi. Did the size change?

when the file was empty: file size = 0 bytes
with one character: file size = 2 bytes
with two characters: file size = 3 bytes

Why is there an extra character? It is due to the vi editor placing a newline character at the end of each line.
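The same sizes can be produced without vi. This sketch uses printf, which writes exactly the bytes it is given (\n is the newline character vi adds), in a scratch directory from mktemp -d; the file names are only examples.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1      # scratch directory; stop if cd fails
touch empty.txt                  # 0 bytes
printf 'a' > one.txt             # 1 byte: the character only
printf 'a\n' > one_line.txt      # 2 bytes: the character plus a newline, as vi writes it
ls -l empty.txt one.txt one_line.txt
```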

For each directory whose contents are displayed by the ls command, the total number of 512-byte blocks
used by the files in the directory is displayed on a line by itself, immediately before the information for the
files in the directory.

TRY IT:
How much space does your HOME directory use? (find the total using the long format)

How would you produce a listing of files by increasing size? Try man ls.

Another way to check file and directory size is the command du.
$ cd $DATA
$ du -sh

Can you make sense of the output?


Listing the size of files in UNIX can be a bit confusing. There are several ways to go about it (ls and du,
for example). Depending on the command used, the output may be rounded up, and it may refer either to the
actual file size or to the disk space reserved for the file. These considerations can lead to large
discrepancies in the values reported for large files and/or many files.

7.4 Quotas
As you use your computer you will need to check quotas or keep track of your total disk usage. The command
df shows how much disk space is used and available on each mounted file system:
$ df -h

If you have limited disk space you may need to start archiving data.

7.5 clear, wc

clear will clear your screen of all text

wc (word count) will print summary counts for a file

EXAMPLE
$ wc $HOME/.bashrc

What does the output of wc correspond to? Use the man page to help you decipher the output.

7.6 Backing up
The simplest way to back up data is to use the copy command, cp.
For example, make a local backup directory, BU:
$ mkdir $HOME/BU
$ cp filename $HOME/BU/filename

FLASH drive backups:
To make a backup on a different local disk use cp as follows:
$ mkdir /Volumes/FLASH/BU_MAC
$ cp -R $HOME/DATA /Volumes/FLASH/BU_MAC/

Remote drive backups:


For copies to be made over the internet, use remote copy, rcp, instead of cp, or preferably secure copy,
scp, instead of rcp.
scp -rp gbpc13@10.160.112.218:file1 ./        # use password xxxxxx
scp -rp ./file1 gbpc13@10.160.112.218:        # use password xxxxxx
scp -rp gbpc13@10.160.112.218:nobel.tar.gz ./

Use the man pages to find out what the -r and -p options do.
More complete backups can be used with an incremental backup program. The UNIX rsync command is
awesome at doing backups and even remote copies. It is similar to scp but has a ton more bells and whistles.
If you make changes to the files it will just replace the parts that were changed, speeding up the file transfer
significantly. As with all UNIX commands many options govern the use of the command. Read the man page
to get an idea of the power in rsync.

Example 1.
To back up your DATA directory to a mounted disk (this could be your flash drive, another mounted disk,
or another place in the current file system):
$ mkdir /Volumes/mounted_disk/BU_MAC
$ rsync -rp $HOME/DATA/ /Volumes/mounted_disk/BU_MAC/
$ rsync -rp $HOME/CODE /Volumes/mounted_disk/BU_MAC/
$ rsync -rp $HOME/.bash* /Volumes/mounted_disk/BU_MAC/

To restore these files from the mounted disk:
$ rsync -rp /Volumes/mounted_disk/BU_MAC/*.* $HOME/

Example 2.
To back up your DATA directory to a remote disk:
$ rsync -rp $HOME/DATA login@remote_system:

And to restore these files from the remote disk:
$ rsync -rp login@remote_system:DATA/ $HOME/DATA/


7.7 Compression and archiving
If your files are large you might want to compress them. To compress a file use gzip (it replaces filename
with the compressed file filename.gz):
$ gzip filename

To uncompress a file use gunzip:
$ gunzip filename.gz

An archive is a single file that contains any number of individual files plus information to allow them to be
restored to their original form by one or more extraction programs. To archive a folder use tar (tape archive).
Here c stands for create:
$ tar -cvf filename.tar folder_name

To un-archive (untar) a file, use the same command, tar, but with different options (x for extract):
$ tar -xvf filename.tar
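A full round trip (archive, compress, uncompress, extract) on a small made-up folder, sketched in a scratch directory from mktemp -d. The folder is deleted before extraction only to show that tar really brings it back.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1           # scratch directory; stop if cd fails
mkdir folder_name
echo "sample data" > folder_name/data.txt

tar -cvf filename.tar folder_name     # c: create an archive of the folder
gzip filename.tar                     # compress: filename.tar becomes filename.tar.gz
ls                                    # filename.tar.gz  folder_name

gunzip filename.tar.gz                # back to filename.tar
rm -r folder_name                     # remove the original to prove the restore
tar -xvf filename.tar                 # x: extract folder_name/ again
cat folder_name/data.txt              # sample data
```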

7.8 grep
The command grep is a VERY useful command. Its basic function is to extract from a file the lines that contain a
given word or pattern. The grep command allows you to search one file or multiple files for lines that contain a
pattern. If it finds that pattern on any line, the line will be sent to the output (the screen).
For example, consider this file called file1.txt:
1902,physics
1923,medicine
1945,literature
1960,chemistry
1979,medicine
1988,physics
1998,economics
1998,peace
2005,medicine
2007,peace


We can use grep to extract the lines of file1.txt that contain physics.

$
grep physics file1.txt
1902,physics
1988,physics


If you want to extract lines that do NOT match a pattern, try using the -v option with grep. Try it out on file1.txt.
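A sketch that recreates file1.txt in a scratch directory (mktemp -d) and runs grep on it. The -c option, which prints the number of matching lines instead of the lines themselves, is a standard grep option not mentioned above.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1    # scratch directory; stop if cd fails
# Recreate file1.txt from the example above.
cat > file1.txt <<'EOF'
1902,physics
1923,medicine
1945,literature
1960,chemistry
1979,medicine
1988,physics
1998,economics
1998,peace
2005,medicine
2007,peace
EOF

grep physics file1.txt         # the two physics lines
grep -v medicine file1.txt     # the seven lines without medicine
grep -c peace file1.txt        # 2
```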







8. Data stream manipulation


Key commands and concepts in this chapter
Concepts
standard input, output, error    Streams of data
redirection                      Specify where output goes from commands
pipe                             Send output from one command to another as input
Commands
cat                              Writes files to standard output

8.1 Standard input, output and error (stdin, stdout, stderr)


Many commands such as ls, echo or pwd produce some sort of output. This output is sent to a
special file called standard output (stdout). By default, standard output is sent to the screen. For example, if you
use the command echo
$ echo Hello
Hello

an output will be generated and shown on screen. In this case, the command echo was executed successfully.
However, if you type echos, a different output will be produced
$ echos Hello
-bash: echos: command not found

In this case the output shows that the command was not executed successfully. Status messages (such as
errors) are sent to another special file called standard error (stderr). Standard error is also sent to the
screen by default.
Many programs take input from a device called standard input (stdin) which is attached to the keyboard.

Description        Name      Default     Produced from
Standard input     stdin     keyboard    keyboard or program
Standard output    stdout    screen      program
Standard error     stderr    screen      program


8.2 More about the cat command
The command cat sends its output by default to standard output, the screen. Try running cat by itself,
i.e., type cat and hit enter
$ cat

Now type some text and press [return].


What does the cat command do?
The command takes standard input from the keyboard (the text you type) and catenates it to standard output.
The result is that it repeats what you typed. This does not seem very useful; however, the next command will
perhaps give you some ideas as to how to make it useful. The command cat will keep displaying what you
typed until you press [CTRL+D].

8.3 Redirect (>, >>, <), [CTRL+D]
It is often useful to capture the output from a command into a file. This is accomplished with the redirect
sign, >, which redirects the standard output of a command from the screen to a file.
TRY IT:
$ echo 'Hello World!' > welcome.txt
$ cat welcome.txt

OR, instead of running cat by itself, redirect its output to a file (press [CTRL+D] to
exit)
$ cat > welcome2.txt
Hello World!
[CTRL+D]
$ cat welcome2.txt
Hello World!


What does a double redirect, >>, do? It appends to a file instead of overwriting it.
$ cat >> welcome.txt
The END
[CTRL+D]
$ cat welcome.txt
Hello World!
The END


Try the following:
$ cat > file
# using the cat command, create a file for input
a[CTRL+D] [CTRL+D]
# input one character, 'a', then press [CTRL+D] twice:
# the first passes 'a' to cat without a newline, the second signals end of file and stops cat.

Check the file size with ls -l. You should obtain exactly 1 byte for the size of this file. Earlier, using vi, we saw the
editor add one character (the end-of-line character) to the file. Here cat adds no end-of-line character, so the
file size is purely one character.
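To check the one-byte claim without typing [CTRL+D], this sketch writes the same content with printf (covered in section 9.6), which, unlike echo, adds no newline unless asked. The file names are made up; wc -c < file prints only the byte count (on macOS the number is padded with spaces).

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# printf adds no newline unless the format contains \n.
printf 'a' > one_char
printf 'a\n' > char_and_newline

# wc -c counts bytes; reading through < prints the number without a file name.
wc -c < one_char           # 1 byte: just "a"
wc -c < char_and_newline   # 2 bytes: "a" plus the end-of-line character vi would add
```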
More examples:
$ echo 'hello world' > output
$ cat < output

The first line writes "hello world" to the file "output"; the second reads it back and writes it to standard
output (normally the terminal). Here the < redirect makes the file "output" the standard input of the command cat.

8.4 Redirect standard error and standard out
Standard error can also be redirected to a file. The 2 is the file descriptor number of standard error
(standard input is 0 and standard output is 1):

command 2> error.log

Both stderr and stdout to a file:

command &> file.out

Standard error to the bit bucket:

command 2> /dev/null

Both stderr and stdout to the bit bucket:

command &> /dev/null

The command above redirects standard output and standard error together into /dev/null, which is a place you can dump anything
you don't want to see or keep at the moment. The &> form is a bash shortcut; the longer, portable form is
command > /dev/null 2>&1, which first redirects standard output to /dev/null and then redirects standard error into standard
output (the & in front of the 1 marks it as a file descriptor rather than a file named 1). This command will not produce
any output on the screen; it is very quiet. Files that the command itself writes are still written.

TRY IT:
Use the find command at the root level to locate your DATA directory. Some directories are not accessible by
your user login. So as not to see these errors mixed with the desired output, you can send all errors to /dev/null,
which is basically a garbage can for unwanted data (the bit bucket).

$ find / -name DATA 2> /dev/null
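A short sketch of the two output streams going to different places. It assumes /no/such/dir does not exist on your machine, so ls writes an error for it while still listing /; the exact wording of the error differs between macOS and Linux.

```shell
#!/bin/bash
# Work in a throwaway directory so the log files do not clutter your own.
cd "$(mktemp -d)" || exit 1

# One argument that exists (/) and one assumed not to (/no/such/dir).
# Standard output (fd 1) goes to out.log, standard error (fd 2) to err.log.
ls / /no/such/dir > out.log 2> err.log

echo "--- captured standard output (first lines) ---"
head -n 3 out.log
echo "--- captured standard error ---"
cat err.log

# Portable form of &>: stdout to both.log, then stderr to wherever stdout goes.
ls / /no/such/dir > both.log 2>&1
```

Note that the order matters in the last line: writing 2>&1 before > both.log would point standard error at the screen (where stdout was at that moment) instead of the file.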

8.5 Pipe, |
A pipe in UNIX is a way to take the output from one command and treat it as the input to another command.
$ ls | wc

The output of the command ls is piped (|) to another command, wc, which counts the lines, words and characters it receives.


TRY IT:
How many files are in the directory /bin?

EXAMPLE
Check whether the keyword TERM appears in your environment variables. Which of them are part of the environment? We
can write a pipe stream to answer this question:
$ env | grep TERM


The output of env is piped (|) to the command grep. You may notice that several lines contain the keyword
TERM. Maybe now you only want to show the one that also contains the keyword color. You can use the pipe
multiple times to extract exactly the information that you want:

$ env | grep TERM | grep color
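Because env output differs from machine to machine, this sketch feeds a few made-up lines (the sample variable and its contents are hypothetical) through the same pipeline, so each stage's effect is predictable.

```shell
#!/bin/bash
# Stand-in for the output of env; these lines are invented for the example.
sample='TERM=xterm
COLORTERM=truecolor
TERM_PROGRAM=Apple_Terminal
HOME=/home/student'

# One pipe: keep only the lines mentioning TERM (three of them).
printf '%s\n' "$sample" | grep TERM

# Two pipes: of those, keep only the ones that also mention color.
printf '%s\n' "$sample" | grep TERM | grep color

# Counting instead of printing: wc -l counts the lines that reach it.
printf '%s\n' "$sample" | grep TERM | wc -l
```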


8.6 Command substitution
It is useful to know how to take the output from a command and store it in a variable.
mypwd=`pwd`
echo $mypwd
The command to be executed is enclosed in back quotes. Those are not standard single quotes, but instead
come from the keyboard key that normally sits above the Tab key.
Bash also provides a second way to do exactly the same thing: mypwd=$(pwd). Using command substitution,
we can place any command or pipeline of commands between ` ` (or inside $( )) and assign its output to a variable.
Here's an example of how to use a pipeline with command substitution:
EXAMPLES:
sshfiles=`ls /etc | grep ssh`
echo $sshfiles
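Both forms of command substitution side by side. This is a sketch that assumes only that /etc exists (true on macOS and Linux); the variable names are illustrative.

```shell
#!/bin/bash
# Older back-quote form and newer $( ) form store the same text.
mypwd=`pwd`
mypwd2=$(pwd)
echo "back quotes:  $mypwd"
echo "dollar-paren: $mypwd2"

# $( ) nests without extra escaping; the innermost command runs first.
here=$(basename "$(pwd)")
echo "current directory name: $here"

# A pipeline works inside $( ) as well.
nfiles=$(ls /etc | wc -l)
echo "/etc holds $nfiles entries"
```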


9. Text manipulation
sort, uniq, cut, paste, lam, tr, sed
Key commands and concepts in this chapter
Commands
sort - Sort content of a file
uniq - Omit repeated lines
cut, paste, lam - Cut portions of lines, or merge lines
tr - Translate characters
sed - Stream editor
printf - Command for formatting output

Now that you are able to work with files, it is time to manipulate the data within the files. Just as in a word
processing program, you can manipulate text in a file with UNIX commands. We will
explore commands that pull out words or phrases and commands that replace them with different text. We
can overwrite the files or send the edited text to a new file. You will find out how to pull out columns and
rows and how to put various parts back together.

9.1 sort
The sort command will sort the contents of a file. Options let you sort on a specific column or a sequence of
columns, and sort numerically or alphabetically. The -t option is useful when columns are separated by
something other than blank spaces. What is the -k option for?
Make a file f1.txt containing the following:
1 3
1 3
2 6
6 1
7 2
12 12
7 12
3 12
8 4
2 2

Use the command sort to sort based on the first column.
$ sort -k1 f1.txt

Does this work? Check the man page to see what you need to add.
Use the command sort to sort based on the second column.
$ sort -k2 f1.txt

Now try to figure out what the option -n does.


If columns in a file are separated with a comma, you would add the -t option and specify the field separator:
$ sort -t, filename
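A sketch contrasting text and numeric ordering on the f1.txt data. Two assumptions: LC_ALL=C is set so comparisons are plain byte order (other locales may order ties differently), and -k1,1 (key = field 1 only) is used instead of -k1, which would let the key run to the end of the line.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # plain byte-order comparisons, so the output is the same everywhere

# The same ten rows as f1.txt above.
printf '%s\n' '1 3' '1 3' '2 6' '6 1' '7 2' '12 12' '7 12' '3 12' '8 4' '2 2' > f1.txt

# Field 1 compared as text: "12" sorts before "2".
sort -k1,1 f1.txt

# Adding -n compares field 1 as a number, so 12 comes last.
sort -n -k1,1 f1.txt

# Key on the second field instead, numerically.
sort -n -k2,2 f1.txt
```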


9.2 uniq

uniq - report or omit repeated lines. Because uniq only compares adjacent lines, its input is usually sorted first:

$ sort -k1 f1.txt | uniq
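Why the sort comes first, shown on a made-up fruit list (fruit.txt is hypothetical):

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
export LC_ALL=C

# Duplicates that are not all next to each other.
printf '%s\n' apple pear apple apple plum pear > fruit.txt

uniq fruit.txt            # only the adjacent "apple apple" collapses: 5 lines remain
sort fruit.txt | uniq     # sorting first brings all duplicates together: 3 lines
sort fruit.txt | uniq -c  # -c prefixes each remaining line with its count
```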

9.3 cut, paste, lam


Whereas the grep command selects lines, you can extract specific columns using cut:
$ cut -c 1-5 welcome.txt

Remember, the first line of file welcome.txt contains the following text:

Hello World!

You can use paste and lam to put columns back together.
$ cut -c 1-5 welcome.txt > first
$ cut -c 6 welcome.txt > int
$ cut -c 7-11 welcome.txt > last

What is the difference between paste and lam?


Now try using paste and lam to put the words back together and produce
Hello World
as well as
World Hello
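lam ships with BSD-derived systems such as macOS but is not part of GNU coreutils, so on most Linux systems only paste is available. The sketch below uses a hypothetical comma-separated file to show cut -d/-f (fields instead of characters) and paste -d (a chosen separator instead of a tab), without giving away the Hello World exercise.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Hypothetical comma-separated file: year,subject
printf '%s\n' 1902,physics 1911,chemistry 1988,physics > prizes.csv

# -d sets the field delimiter, -f picks fields.
cut -d, -f1 prizes.csv > years
cut -d, -f2 prizes.csv > subjects

# paste joins the files line by line; -d ' ' uses a space instead of a tab.
paste -d ' ' subjects years
```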
9.4 tr

The tr command does translation. It takes two sets of characters and replaces occurrences of the characters in
the first set with the corresponding characters from the second set. For example,
$ echo ana
$ echo ana | tr 'a' 'd'
$ echo ana | tr 'an' 'da'

Use tr to translate all lowercase characters to uppercase.

What does the following command do?

$ tr a-z A-Z

tr reads the standard input (the keyboard), writes the translated text to the standard output (the screen),
and stops when it receives "end of file" ([CTRL+D]).

In UNIX, we can redirect both the input and the output of commands. Try this:
$ tr a-z A-Z > file
type a word with lowercase letters
type [CTRL+D]
type cat file

Now use the file you just made as input to tr:

$ tr A-Z a-z < file

What is the result?
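The same translations driven by a pipe rather than the keyboard, so they run unattended. The -d (delete) option is an extra not covered above, and LC_ALL=C is an assumption that keeps the a-z ranges to plain ASCII letters.

```shell
#!/bin/bash
export LC_ALL=C

# Lowercase to uppercase, reading from a pipe instead of the keyboard.
echo 'Hello World' | tr 'a-z' 'A-Z'

# The reverse direction, as in tr A-Z a-z < file.
echo 'Hello World' | tr 'A-Z' 'a-z'

# -d deletes the listed characters instead of translating them.
echo 'a1b2c3' | tr -d '0-9'
```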



9.5 sed
sed is a stream editor which allows non-interactive editing (vi is interactive). This means you can edit
a file with vi-like commands right on the command line or within a script. Many pages have been written
describing the use of sed. It is a very powerful stream editor that can take the place of grep. We will use it only
briefly here.
The simplest use is search and replace. Use cat to make a file coffee.txt
$ cat > coffee.txt
I love tea
[CTRL+D]
Now use sed to replace the word tea with coffee
$ sed 's/tea/coffee/' coffee.txt
I love coffee

The s function performs substitution of text.


sed edits its input line by line and is quite useful for doing single edits without invoking an interactive editor.
It writes the result to standard output and leaves the original file unchanged; redirect the output to a new file to keep it.
sed 's/text/replace/g' file
The g appended after the last slash tells sed to replace all occurrences of text. Without the g, only
the first instance on a given line would be replaced.
To delete a word, use
sed 's/word//g' file
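A sketch of the difference the g flag makes, using an echo pipe as input instead of coffee.txt (the sample sentence is made up):

```shell
#!/bin/bash
line='tea for two and tea for me'

echo "$line" | sed 's/tea/coffee/'    # first match on the line only
echo "$line" | sed 's/tea/coffee/g'   # g: every match on the line
echo "$line" | sed 's/tea //g'        # an empty replacement deletes the text
```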

9.6 printf
printf stands for "print formatted". With the printf command you have more control over the appearance of your
output. The general format when using printf is:

printf <format> arguments

You can try running printf with just a format string and no arguments:

$ printf "Hello world!"

What happens? If you want your bash prompt to appear on a new line, you need to add a newline character, \n.
You can also tell printf that the argument is a string:

$ printf "%s \n" "Hello world!"

Here %s means that the argument is a string, and \n adds a new line.
Try removing the quotes around the argument:

$ printf "%s \n" Hello world!

What happened?
You can also specify to include two strings before the new line.

$ printf "%s %s \n" Hello world!

Some additional widely used format specifiers are:

%d or %i - decimal (as opposed to, e.g., hexadecimal) integer
%f - floating point (real number)
You can specify exactly how wide your output will be, or how many digits you want to keep after the decimal point. Here
are some examples:
$ printf "%f %d\n" 100.1 10
$ printf "%1.2f %d\n" 10 100.10
$ printf "%1.2f %4d\n" 10 100.10
$ printf "%s %s \n" Hello world!
$ printf "%10s %s \n" Hello world!
$ printf "%10s %5s \n" Hello world!
$ printf "%-10s %5s \n" Hello world!


To see a detailed description and available formats, use the man pages. Several examples can be found at
https://www.gnu.org/software/gawk/manual/html_node/Format-Modifiers.html
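A few more width and precision specifiers, with | markers so the padding is visible. LC_ALL=C is set as an assumption so the decimal point is a period in every locale.

```shell
#!/bin/bash
export LC_ALL=C

# %8.3f: at least 8 characters wide, 3 digits after the decimal point.
printf '%8.3f|\n' 3.14159

# %-8s left-justifies in 8 columns; %5s right-justifies in 5.
printf '%-8s|%5s|\n' Hello ab

# %03d pads an integer with zeros to width 3.
printf '%03d\n' 7

# With more arguments than specifiers, the format is reused.
printf '%s\n' one two three
```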

9.7 Useful miscellaneous UNIX commands: diff, date, who, finger, history
There are many useful commands (programs) that are packaged with UNIX. We have looked at several during
class sessions. Spending time roaming the man pages or the GNU website manuals might trigger new
inspiration about how to tackle a problem. Often you might be trying to reinvent a wheel that has already been built and
optimized. Additionally, some commands have stripped-down versions that are optimized for speed. So if
things are getting sluggish, it is worth investigating whether a faster tool exists.

Here are a few notable commands:

Use diff to compare 2 files
Use date to print the current date and time
Use whoami to see which user you are logged in as at this terminal session
Use who to see who is logged in
Use finger to get information about a user
Use history to view and access your command history (saved in ~/.bash_history):
history       displays your command history
!!            repeats the last command
!3            repeats command 3
!command      repeats the most recent command starting with the given command

10. Advanced text processing with AWK


Key commands and concepts in this chapter
Concepts
awk - Programming language that takes action on patterns,
and is also a tool for processing rows and columns

10.1 AWK

Another text processing tool often used for command-line data extraction is AWK. AWK is a programming
language which takes action on patterns. A number of people also use the scripting features of AWK. The
language was developed at Bell Labs in the 1970s (a powerhouse of computer architecture design). The
name derives from its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. Similar to grep, AWK
processes text, but it is not limited to whole lines: it can also split each line into fields.
Since AWK is a programming language, AWK commands can be executed from the command line or from an AWK
program. To invoke awk within bash:
awk 'pattern {action}' file


10.2 AWK example

To learn how awk works, make a file called example1.dat that contains 3 lines and 4 columns (fields)
separated by white space:
1 a d one
2 b a two
3 c b three


To search for pattern b, invoke awk like this:

$ awk '/b/ {print}' example1.dat
2 b a two
3 c b three


Here b is the pattern, and print is the action. The search pattern is enclosed between two slash (/) characters.
In this case, awk works like the grep command, i.e., one could have simply done:

$ grep b example1.dat
2 b a two
3 c b three


However, awk can read a line and split it into fields. With awk, one can search for a pattern in a specific field,
and one can print specific fields. For example, if one only wants to print a line that contains the letter b in the
second column (field), one would do the following (inside awk, a string such as "b" must be in double quotes):
$ awk '$2 == "b" {print}' example1.dat
2 b a two


Note that here, the character $ specifies a field (the meaning of $ is different in awk and in bash). Now try printing
out only the line that contains a b in the third column. If one wanted to print lines where numbers in the first
column are greater than 2, one would do

$ awk '$1 > 2 {print}' example1.dat
3 c b three


With awk, one can also print only specific columns (fields). For example, to print only the third column:

$ awk '{print $3}' example1.dat
d
a
b


Note that no search pattern was given. One can combine a search for patterns with printing only certain
columns. For example, if the number in the first column is equal to 2, print the third column:
$ awk '$1 == 2 {print $3}' example1.dat
a


Now try running awk on file example1.dat with the following actions and patterns.
Examples of actions:

Print Every Line

awk '{print}' OR awk '{print $0}'

Print Certain Fields

awk '{print $1, $3}'

NF, The Number of Fields

awk '{print NF, $1, $NF}'

Computing and Printing

awk '{print $4, $1 + $1}'

Printing Line Numbers

awk '{print NR, $0}'

Putting Text in the Output

awk '{print "total pay for", $1, "is", $2 * $3}'

Examples of search patterns. Here the specific signs signify:

> (greater than); < (less than); >= (greater than or equal); <= (less than or equal); == (equal); != (not equal)

awk '$1 > 2 {print $4, 5*$1}'

awk '$1 >= 2'

awk '$1*$1 != 1'

awk '$4 == "two"'

Selection patterns can be combined with parentheses and the logical operators AND (&&), OR (||), and
NOT (!):

awk '$2 >= 1 || $3 != 200'

awk 'NR < 5 && ($1 <= 1 || $3 == "b")'

awk 'NF != 5 {print $0, "number of fields is not equal to 5"}'

awk '$1 < 10 {print $0, "value is too small"}'

awk '$1 > 10 {print $0, "value is too large"}'

awk '!($2 == "a" || $3 == "b")'
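Running some of the combined patterns against the three-line example1.dat, recreated here in a temporary directory so the sketch is self-contained:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# example1.dat exactly as described above: 3 lines, 4 fields.
printf '%s\n' '1 a d one' '2 b a two' '3 c b three' > example1.dat

# AND: line number below 5 and field 1 at least 2; print field 4.
awk 'NR < 5 && $1 >= 2 {print $4}' example1.dat

# OR: field 2 is "a" or field 3 is "b" (strings need double quotes).
awk '$2 == "a" || $3 == "b"' example1.dat

# NOT: every line for which the OR above is false.
awk '!($2 == "a" || $3 == "b")' example1.dat
```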


Overview:

Often awk is used for single command line editing of a file.

Only two types of data in awk: numbers and strings of characters.

Awk reads one line at a time and splits each line into fields.

A field is a sequence of characters that doesn't contain any blanks or tabs.

The first field in the current input line is called $1, the second $2, and so on up to $NF, the last field.

The entire line is referenced as $0.

The number of fields can vary from line to line.

Use print to print the output

Use printf for nicely formatted output
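One step beyond the patterns above, as a sketch: an END block, which awk runs once after the last line, together with awk's own printf for formatted output. It assumes the same example1.dat; awk variables such as sum start at zero, so no initialization is needed.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
export LC_ALL=C   # keep the decimal point a period
printf '%s\n' '1 a d one' '2 b a two' '3 c b three' > example1.dat

# The main block runs once per line; END runs once after the last line.
awk '{ sum += $1 } END { printf "lines: %d  sum: %d  mean: %.2f\n", NR, sum, sum / NR }' example1.dat
```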

11. Flow control: looping with For


For loop, counting, script arguments
Key commands and concepts in this chapter
Concepts
loops - programming tool that enables repeated execution of commands
for loop - operates on lists of items and repeats a set of commands for every item in a list
Commands
for; do; done - Basic elements of a for loop


Flow control is the specification of the order in which certain commands are executed. In the next three lessons we
will talk about loops and about making choices. The main use of loops is to automate repetitive tasks. We will
talk about the so-called for and while loops.

11.1 The For loop
For loops will go sequentially through a list, and repeat execution of commands for each item in the list. The
list can be a set of numbers, words or files. The general format of the for loop in bash is:

for variable in 1 2 3 ... N
do
command1
command2
done

In the format above, the iteration is over a list of numbers 1..N. Below we will see examples of for loops over a
list of numbers, words and filenames. Multiple commands can be included in a loop.

11.2 Loop examples
Example 1. Edit a script and name it for_1.bash. This loop will iterate over a list of words.
#!/bin/bash
#login name:20120915:for_1.bash:
#for_1 is a sample for loop in bash script
# i is a variable that will be assigned once for every entry in a given
#list (Moon, Stars, Sun)
# for each i value the code between "do" and "done" will be executed.
for i in Moon Stars Sun
do
echo $i
done

Run the script:
$ . for_1.bash

What exactly happened? The "for i" part of the "for" loop defined a new variable (also called a loop control
variable) called "i", which was successively set to the values "Moon", "Stars", and "Sun". After each
assignment, the body of the loop (the code between the "do" ... "done") was executed once. In the body, we
referred to the loop control variable "i" using standard variable expansion syntax, i.e., $i, like any other
variable.
Example 2: Edit a script and name it for_2.bash. Now the iteration will be over a list of numbers.

#!/bin/bash
#login name:20120915:for_2.bash:echo a list of numbers
for i in 1 2 3 4
do
echo number $i
done


Try changing the name of variable i into x. Is there any difference?

Example 3: Edit a script and name it for_3.bash. The iteration will be over a list of files. We will also include
two different commands within the loop.

#!/bin/bash
#login name:20120915:for_3.bash:list and display a list of files
for i in for_1.bash for_2.bash
do
ls $i
cat $i
done



11.3 Using shell expansion and quotes in for loops
The for loop always takes a list after the word "in". In the first case we specified three English words,
in the second case four numbers, and in the third case two files. You can use shell expansion to
specify the lists.
Example 4: This example uses the * wildcard to specify a list of files.

#!/bin/bash
#login name:20190215:for_4.bash:description
for i in *.dat
# using filename expansion.
# i is a variable that will be assigned
# once for every .dat file in the current directory
do
head -2 $i # display first 2 lines of all *.dat files
done


Example 5: In the case below we are using a brace expansion to generate a list of numbers from 1 to 10.
#!/bin/bash
#login name:20190215:script_5.bash:description
#Usage: ./script_5.bash
for i in {1..10}
# using brace expansion.
# i is a variable that will be assigned
# once for every number in the brace expansion
do
echo number $i
done

Example 6: In the case below we are using quotes to limit the list from three elements to just one
#!/bin/bash
#login name:20190215:script_6.bash:description
for i in "Moon Stars Sun"
do
echo $i
done
exit


Compare this example to example 1.
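To make the comparison with Example 1 concrete, this sketch counts the passes each version of the loop makes (the counter names are illustrative):

```shell
#!/bin/bash
# Unquoted list: the shell splits it into three words.
n_unquoted=0
for i in Moon Stars Sun
do
    n_unquoted=$(( n_unquoted + 1 ))
done
echo "unquoted list: $n_unquoted passes"

# Quoted list: the quotes keep it as one word, so the body runs once.
n_quoted=0
for i in "Moon Stars Sun"
do
    n_quoted=$(( n_quoted + 1 ))
    echo "i holds: $i"
done
echo "quoted list: $n_quoted pass"
```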
11.4 Script arguments
A script can have arguments passed to it from the command line. Within the program they are treated as variables.
These arguments are referenced within the program as follows:
"$1" will expand to the 1st argument, as called from the command line
"$2" to the 2nd
"$3" to the 3rd
"$@" refers to all command-line arguments, each kept as a separate word.
Example 7: In this example, arguments to the script are specified from the command line. Instead of the list, we
specify "$@".
#!/bin/bash
#login name:20120215:for_7.bash:description
#usage: bash for_7.bash Sun Moon Stars
for i in "$@"
do
echo $i
done
exit


When we run this program we have to specify the arguments. Because the script ends with exit, run it with bash
rather than sourcing it with ". for_7.bash"; a sourced exit would close your current shell:
$ bash for_7.bash Sun Moon Stars


Try running this script with different arguments, including different words, numbers, files, and a mixture of
words, numbers and files.
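A sketch of why "$@" is quoted. Here a shell function, show_args (a made-up name), stands in for for_7.bash, since inside a function "$@" and $# refer to the function's arguments the same way they refer to a script's. The second call passes an argument that contains a space.

```shell
#!/bin/bash
# show_args stands in for for_7.bash.
show_args() {
    echo "number of arguments (\$#): $#"
    for i in "$@"        # quoted: each argument stays one item
    do
        echo "item: $i"
    done
}

show_args Sun Moon Stars
show_args "Morning Star" Moon   # two arguments; the first contains a space
```

With an unquoted $@, the second call would loop three times, splitting "Morning Star" into two items.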
11.5 Counting
Counting is a task often done in programming. Here are different syntax styles to use for counting.
count=0
#define variable count and set it to 0
for i in a b c
do
count=$(( count + 1 )) # increase variable count by 1
done
echo $count

Here is a modification of the counting loop.


count=0
#define variable count and set it to 0
for i in a b c
do
count=$(( count + 1 )) # increase variable count by 1
echo variable i is $i
# show variable i
echo variable count is $count # show variable count
done


Program to calculate a sum
sum=0
for i in 2 5 8
do
sum=$(( sum + $i ))
done

And a modification of this program


sum=0
for i in 2 5 8
do
sum=$(( sum + $i ))
echo variable i is $i
echo variable sum is $sum
done
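Combining the counting and summing loops gives a mean. Bash arithmetic works only with integers, so the division drops any remainder (7 / 2 would give 3); this sketch uses values whose mean is a whole number.

```shell
#!/bin/bash
count=0
sum=0
for i in 2 5 8
do
    count=$(( count + 1 ))   # one more item seen
    sum=$(( sum + i ))       # inside $(( )) the $ before a name is optional
done
mean=$(( sum / count ))      # integer division: any remainder is dropped
echo "count=$count sum=$sum mean=$mean"
```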

12. Flow control: branching with if


Key commands and concepts in this chapter
Concepts
if conditionals - Used to determine the course of action depending on whether a condition is true or false
String comparison - Convenient test of variables and data to direct the programming path
nesting - Putting one loop or conditional inside another
while loop - Continue doing an action as long as a condition is true
Commands
if, then, else, elif, fi - Basic commands of if statements
true, false - return a fixed exit status (0 and 1, respectively)
test - Tests a condition and sets the exit status
while, do, done - Basic commands of while loop

12.1 If then else statements


Conditional expressions, such as if-then-else statements, perform different computations or actions
depending on whether a programmer-specified condition evaluates to true or false.
In the general form of an if-then-else statement, the conditions are tested in order; the action belonging to the first true condition is done, and the else action is done if none of them is true.

if [ condition 1 ]; then
action 1
elif [ condition 2 ]; then
action 2
else
action 3
fi

EXAMPLES
Let's make a variable called age and set it to 20:
age=20

We can pose a simple condition: if the age is greater than or equal to 18, the person can vote. Based on this
condition, we can perform an action of printing out a statement:

if [ $age -ge 18 ]
then
echo I can vote # this will get executed if the condition is met
fi

Here -ge means greater than or equal. Try running this simple if statement. We can make this more
complicated by including another action that gets executed if the condition is NOT met. This is achieved with
an if-else statement:

if [ $age -ge 18 ]
then
echo I can vote # this will get executed if the condition is met
else
echo I cannot vote # this will get executed if the condition is NOT met
fi


Try running this. Now change the variable age to 17 and try running the if-else statement again. What
happens?
Things can get more complicated by including multiple conditions. Maybe, if a person is exactly 18 years old,
we can print a statement saying that the person just turned 18. This can be performed with an if-elif-else
statement:

if [ $age -gt 18 ]
then
echo I can vote
elif [ $age -eq 18 ] # for elif we have to specify another condition
then
echo I just turned 18 and I now can vote
else
echo I cannot vote
fi


Here -gt means greater than, while -eq means equal. Try running this for different values of variable age.
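To try several ages in one run, the sketch below wraps the if-elif-else statement in a function; vote_status is a hypothetical name, and quoting "$age" is an added precaution against an empty value.

```shell
#!/bin/bash
# vote_status: the if-elif-else above, taking the age as its first argument.
vote_status() {
    local age=$1
    if [ "$age" -gt 18 ]; then
        echo "I can vote"
    elif [ "$age" -eq 18 ]; then
        echo "I just turned 18 and I now can vote"
    else
        echo "I cannot vote"
    fi
}

for age in 20 18 17
do
    echo "$age: $(vote_status "$age")"
done
```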

12.3 Specifying test condition, [
To test if a condition is true or false, the square bracket is used. The test condition will have an exit status of 0
when the expression inside the square brackets is true, and 1 when it is false. The exit status is checked in the
following way:
$ echo $?

For example, a correctly executed command will have an exit status of 0:
$ pwd; echo $?
(your current directory is printed here)
0

However, an incorrectly executed command will have an exit status that is different from 0:
$ ped; echo $?
-bash: ped: command not found
127

Thus if variable age is set to 17, the test condition [ $age -gt 18 ] gives a non-zero exit status, i.e. false:
$ age=17; [ $age -gt 18 ]; echo $?
1

If the age is changed to 20, the same test condition will give an exit status of zero, i.e. true:
$ age=20; [ $age -gt 18 ]; echo $?
0


Arithmetic values can be compared using the following comparison operators: -eq, -ne, -lt, -le, -gt, or -ge,
meaning equal, not equal, less than, less than or equal, greater than, and greater than or equal, respectively.
Note that for the test condition to work, you must have a blank space right after the opening square
bracket [ and right before the closing one ].
You can compare strings for equality, inequality, or whether the first string sorts before or after the second
one using the operators =, !=, <, and >, respectively. Note: the < and > operators are also used by the shell for
redirection, so you must escape them using \< or \>.
Example:
$ [ "abc" != "def" ];echo $?

The above string test returns a zero. Guess what the next string tests will produce:
[ "abc" != "def" ];echo $?
[ "abc" \< "def" ];echo $?
[ "abc" \> "def" ];echo $?
[ "abc" \< "abc" ];echo $?
[ "abc" \> "abc" ];echo $?


In bash, one can also perform tests on files and directories. For example:

-f file1

will give true if file1 exists and is a regular file; while


-d file1

will give true if file1 exists and is a directory.



Example:
If the file exists, cat its contents; otherwise, let the user know it does not exist. This is not so useful for one file, but for a lot
of files it is quite useful.
if [ -f filename ]; then
# If the file exists
cat filename
# then cat its contents
else
echo "The file filename does not exist."
fi
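A sketch combining -d and -f with elif, run in a temporary directory on files it creates itself (data.txt, subdir and missing.txt are illustrative names, and describe is a made-up helper):

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
echo 'some data' > data.txt
mkdir subdir

# describe: report what kind of path it was given.
describe() {
    if [ -d "$1" ]; then
        echo "$1 is a directory"
    elif [ -f "$1" ]; then
        echo "$1 is a regular file"
    else
        echo "$1 does not exist"
    fi
}

for path in data.txt subdir missing.txt
do
    describe "$path"
done
```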

TRY IT:
Try it with a dataset from your DATA directory.
Find the definitions for the following options (use a bash resource):
-a, -d, -r, -s, -w, -n


12.5 Nested conditional
We have looked at for loops and if conditionals; now let's nest an if statement inside a for loop:
#!/bin/bash
for myfile in /etc/r* # the first conditional (the loop)
do
if [ -d "$myfile" ] ; then # the second, nested conditional
echo "$myfile (dir)"
else
echo "$myfile"
fi
done

The above code looped over each file in /etc that began with an "r". To do this, bash first took our wildcard
/etc/r* and expanded it, before executing the loop. Nested inside the loop, the "-d" conditional operator was
used to perform two different actions, depending on whether myfile was a directory or not. If it was, then the
directory name followed by (dir) was shown on screen; otherwise only the name was shown.

12.6 The while loop
A "while" statement will execute as long as a particular condition is true, and has the following format:
while [ condition ]
do
statements
done
This conditional, while, can be used to loop a certain number of times, as in the following example. This code
will loop exactly 10 times:
i=0
while [ $i -ne 10 ]
do
echo $i
i=$(( $i + 1 )) # this statement increments the variable by one
done

In the above example, the while loop makes use of arithmetic expansion to increment the variable i by one,
and eventually cause the test condition to be false, and the loop to terminate. What would have happened if
the increment statement was omitted?
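Another while loop with a counter, this time accumulating a running total; a sketch in which the bound of 10 is arbitrary:

```shell
#!/bin/bash
# Add up 1 through 10 with a while loop.
i=1
total=0
while [ $i -le 10 ]
do
    total=$(( total + i ))
    i=$(( i + 1 ))   # without this line the condition would never become false
done
echo "sum of 1..10 = $total"
```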

13. Regular expressions


Key commands and concepts in this chapter
Concepts
Literals - Actual characters
Metacharacters - Characters used for pattern matching
Regular expressions (RegEx) - symbolic notations used to identify patterns in text
Commands
grep - global regular expression print

Regular expressions (RegEx) are a metalanguage used for pattern matching. If you have a particular criterion
you are trying to select for (for example, lines that contain xyz), you will want to make use of regular expressions. The notation
allows a user to match and select data strings. RegEx are not used by the shell directly but find their use in
many UNIX utilities: vi, less, sed, egrep, grep, awk, python and others. In fact, the name grep stands for global
regular expression print. Some annoying differences persist between the different flavors. We will try to
keep it simple here, and you can learn more as you find the need. Note: the metacharacters used to match
filenames (i.e., for filename expansion) in the shell (*, ?, [ ]) do not have the same meaning as they do in
regular expressions, which are used to match text.
There are two key ideas at work in regular expressions (RegEx):

ordinary characters, i.e., just the actual character

special characters or metacharacters

Both of these were introduced earlier in the course but in the context of filename expansion. Here we will use
the same concepts to extract information from files. An ordinary character is literally just that, the character.
It is a literal. If a character performs an action or has a meaning beyond its literal nature it is a
metacharacter.

TABLE: Regular Expression Characters

Operator  Function                          Syntax       Result
.         Any single character              x.z          xyz (x, any character, then z)
                                            hop.ins      any character in place of k
                                            a..          a followed by any two characters
^         Beginning of string (line)        ^Hopkins     Hopkins at beginning of line
$         End of string (line)              N$           lines ending in N (e.g., all nitrogen atoms)
                                            Hopkins$     Hopkins at end of line
                                            x$           x only if it is the last character on the line
                                            ^abcd$       a line containing just the characters abcd
                                            ^$           a line that contains no characters
[ ]       Single character within           [O]          any line containing O
          the brackets                      [Tt]         lower or uppercase t
                                            [a-z]        lowercase letter
                                            [a-zA-Z]     any alphabetic character
                                            [Hh]opkins   Hopkins or hopkins
[^ ]      Negation character: single        [^A-Z]       any character that is not an uppercase letter
          character not contained in        [^0-9]       any nonnumeric character
          the brackets                      [^a-zA-Z]    any nonalphabetic character

There are additional RegEx metacharacters. To familiarize yourself with them you can look up this webpage:
http://regexone.com/lesson/

EXAMPLE:
Make a file dirlist.txt in the following way:
$ ls /bin/ > dirlist.txt
$ ls /usr/bin/ >> dirlist.txt
$ ls /sbin/ >> dirlist.txt
$ ls /usr/sbin/ >> dirlist.txt

Now look at that file with one of the file viewing commands.
What is the difference between matching zip and .zip?
$ grep zip dirlist.txt
$ grep '.zip' dirlist.txt

Now try the metacharacters that will match zip at the beginning and end of a line:
$ grep '^zip' dirlist.txt
$ grep 'zip$' dirlist.txt
$ grep '^zip$' dirlist.txt

Now try matching or NOT matching a single character:
$ grep '[bg]zip' dirlist.txt
$ grep '[^bg]zip$' dirlist.txt

Finally, try figuring out what happens now:
$ grep '[A-Z]' dirlist.txt
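Since the contents of dirlist.txt depend on your system, this sketch runs the same kinds of patterns on a short made-up list (names.txt), where the matches can be checked by eye. The patterns are in single quotes so the shell does not interpret [ ] or $ itself.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
# A short stand-in for dirlist.txt.
printf '%s\n' zip unzip gzip bzip2 funzip zipinfo > names.txt

grep '^zip' names.txt      # starts with zip: zip, zipinfo
grep 'zip$' names.txt      # ends with zip: zip, unzip, gzip, funzip
grep '^zip$' names.txt     # exactly zip
grep '[bg]zip' names.txt   # b or g right before zip: gzip, bzip2
grep '[^bg]zip' names.txt  # some other character right before zip: unzip, funzip
```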

Appendices and Tables


Accessing a Computer and Obtaining Software

This course will use the computers maintained by Krieger Arts and Sciences Information and Technology
department, IT. These computers (Apple computers) are set up with all the software, in current versions, that we will
use during the course. You only need an up-to-date JHED ID.
LOCATIONS:
Undergraduate Teaching Laboratories (UTL)
Krieger Hall
Access to the UTL building itself and to the computer lab (G98) occurs via the ISIS roster. If at any time you
are unable to access either the building or the computer lab please send an email to me. Computer related
issues can be directed toward http://kriegerit.jhu.edu/support.html.
The computer lab will have a TA available several times a week who can help most evenings with your UNIX,
Python, Matlab, and Mathematica related questions. The TAs are a valuable resource for you. We have
arranged to have several available so that you can have one-on-one interactions often. We want you to ask
questions and interact with us. Please make use of these resources.
You may wish to set up your personal or lab computers to have the same environment or close to it. In
addition to help at http://kriegerit.jhu.edu/support.html, the following links are provided to help you obtain the
necessary software. Should you need any additional help, the TA or I can help you. It is important to
stay with the class, and valuable time can be lost when students get sidetracked setting up their computers.
Rarely is the process streamlined. I advise you to rely on and work on computers maintained by IT in the early
weeks of class. Once your personal computer is set up properly, you can use it for homework and practice.

UNIX
Most computers used by students have a version of UNIX on their systems. If you do not have it check out the
following link to see what options you may have:
http://www.tech-faq.com/where-to-download-UNIX.html
Bash shell
If UNIX is installed on your system you likely have bash available as well. If not check out the following link:
http://www.gnu.org/s/bash/
Python
Python programming language is free. You may download the latest version, 3.3.2
http://www.python.org/getit/
Matlab
JHU has a site license for your use:
http://www.it.johnshopkins.edu/services/software/matlab/
Mathematica
JHU has a site license for your use:
http://www.it.johnshopkins.edu/services/software/slic/

Download NOAA temperature dataset
Steps to get the data file:

$ cd $HOME/DATA/NOAA

open the ftp client:
$ ftp

go to the main site:
ftp> open ftp.ncdc.noaa.gov

login:
anonymous

password:
anonymous

change to the following directory:
ftp> cd pub/data/ushcn/v2.5

download the tar files:
ftp> mget *

The files should appear in your current working directory.

The file you will use has FLs and tavg in its name.

Resources

http://www.gnu.org/software/bash/manual/bashref.html
"Bash by example: Part 1" on developerWorks.
"Bash by example: Part 2" on developerWorks.
http://www.oreillynet.com/linux/cmd (linux)
http://www.linuxfocus.org/English/July1998/article53.html (vi, egrep)
tldp.org (Bash, Linux)
The AWK Programming Language. Aho, Kernighan, and Weinberger. 1988.
http://www.thegeekstuff.com
the "Bash Cookbook" (Albing, Vossen, & Newham)

Reference Tables


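Filename expansion characters
The following characters are used by the shell to match pattern expressions to filenames. The shell
substitutes the names of all matching files for the pattern.

Character      Meaning                                             Example / result
*              Match any number of characters (0 or more)          ls t* lists all files beginning with t
?              Match a single character                            ls file? lists all files beginning with file and
                                                                   ending in any one character
[abc]          Match any one of the characters within brackets
[!abc]         Match any character except those within brackets
~              Current user's $HOME
~loginid       $HOME for loginid (for listing files in the $HOME
               of the specified login)
{1,2,3}.dat    Brace expansion of a list                           1.dat 2.dat 3.dat
{1..5}.dat     Brace expansion of a sequence                       1.dat 2.dat 3.dat 4.dat 5.dat
{a..c}.dat     Brace expansion of a letter sequence                a.dat b.dat c.dat

Brace expansion generates these names whether or not the files exist; *, ? and [ ] match only existing
filenames.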
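The patterns in the table can be tried safely in a scratch directory. This sketch uses made-up file names (file1, fileA, t1 and so on); echo shows what each pattern expands to without running any other command.

```bash
#!/usr/bin/env bash
# Sketch: filename expansion in a scratch directory with made-up file names.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

touch file1 file2 fileA file10 t1 t2

star=$(echo t*)             # * matches any number of characters: t1 t2
question=$(echo file?)      # ? matches exactly one character: file1 file2 fileA (not file10)
bracket=$(echo file[12])    # [12] matches one character from the set: file1 file2
negated=$(echo file[!12])   # [!12] matches one character not in the set: fileA
list=$(echo {1,2,3}.dat)    # brace expansion builds names even though no .dat files exist
sequence=$(echo {1..5}.dat)
letters=$(echo {a..c}.dat)

printf '%s\n' "$star" "$question" "$bracket" "$negated" "$list" "$sequence" "$letters"
```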

Table of special (meta) character usage

Character   Type                Example / usage and notes
' '         single quotes*      myvar='This is an environment variable!'
                                1. No spaces around =.
                                2. Quotes are not needed unless the value has multiple words.
                                3. Disables shell expansion.
" "         double quotes       "text to print"
                                "expand the variable myvar: $myvar"
$           dollar sign         $myvar (variable expansion)
                                "$myvar"_foo1
                                1. Use quotes or braces to separate the variable name from other text.
{}          curly braces        ${myvar}_foo2
                                touch {t1,t2,t3}
                                1. Similar to double quotes.
                                2. Performs the command on each comma-separated element in the braces.
``          back quotes         idir=`pwd`
                                Execute the enclosed command.
$( )        parentheses         idir=$(pwd)
                                Execute the enclosed command.
$#          misc                Expands to the number of arguments.
$0          misc                Expands to the string bash in an interactive shell, or to the name of the
                                calling script.
-eq         boolean             Logical equals (integers).
-lt         boolean             Logical less than (integers).
            variables           Made global, i.e. can be accessed outside of the script in the current shell;
                                use local to distinguish.
*           wildcard            ls t*
                                Wildcard expansion.
>, <        redirect            Stream input or output elsewhere.
|           pipe                Take the output from the last command and use it as input to the next command.
;           command separator   Used to put multiple commands on one line.
&&          logical operator    [ -z "$PS1" ] && echo done
                                Logical AND; can be used with a test and a command.
[           test                Test the condition that follows. Used in if statements.
return      return              Return from a function (or a sourced script).
exit        exit                Exit from the shell.
#           terminal prompt     Often used as the prompt for the root user.
$, :, >     terminal prompts    Ordinary prompts.
./          current directory   Search in the current directory, e.g. ./program-name.
../         parent directory    Traverse the directory tree up one level from the current directory.

*Single quotes disable a bash feature called expansion, in which special characters and sequences of
characters are replaced with values.
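A minimal sketch of the quoting and expansion rules above, built on the table's own myvar example. The variable names single, double, quoted, braced and unbraced, and the function count_args, are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch of the quoting and expansion rules; variable and function names are illustrative.
myvar='This is an environment variable!'   # no spaces around =; quotes needed for several words

single='$myvar'              # single quotes: no expansion, the text stays $myvar
double="value: $myvar"       # double quotes: $myvar is expanded
quoted="$myvar"_foo1         # quotes end the variable name before _foo1
braced=${myvar}_foo2         # braces do the same job
unbraced=$myvar_foo1         # expands a different, unset variable named myvar_foo1, so it is empty

idir=$(pwd)                  # command substitution
idir_old=`pwd`               # older back-quote form of the same thing

count_args() { echo $#; }    # $# is the number of arguments passed
nargs=$(count_args a b c)

printf '%s\n' "$single" "$double" "$quoted" "$braced" "[$unbraced]" "$idir" "$nargs"
```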


Key Mappings
Key binding   Function
Tab           Complete filename up to the next non-unique character
Ctrl-c        Kill foreground process, i.e. cancel command or interrupt program
Ctrl-z        Suspend foreground process; type fg to resume
Ctrl-d        Terminate input, or exit shell (function can depend on context)
Ctrl-s        Suspend output
Ctrl-u        Clear the command line
Ctrl-q        Resume output
Ctrl-o        Discard output
Ctrl-l        Clear screen

Redirects
Operator     Function                                                   Usage
>            Redirect standard output to file                           command > output
>>           Redirect standard output and append to file
<            Redirect standard input from file                          command < input
                                                                        grep abc < file.dat
<<           Redirect standard input from the lines that follow, up to
             a delimiter word. Used for here documents
>!           Redirect standard output and overwrite file
>>!          Redirect standard output to file or append to file
>&           Redirect standard output/error to file
>>&          Redirect standard output/error and append to file
<<<          Redirect a word to the standard input of a command         cat <<< 'Hello World!'
>>&!         Redirect standard output/error to file or append to file,
             even if the noclobber option is set
|            Pipe standard output to standard input
|&           Pipe standard output/error to standard input
`command`    Replace command with its output

Note: >!, >>!, >>& and >>&! are C shell (csh/tcsh) forms. In bash, use >| to overwrite a file even when the
noclobber option is set, and &>> to append both standard output and standard error to a file.
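The bash operators from the table can be exercised in a scratch directory. This sketch sticks to bash syntax (so it merges the two output streams with 2>&1 rather than a C shell form), and the file names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: bash redirection operators in a scratch directory (file names are hypothetical).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo "abc line one" > file.dat        # > creates (or overwrites) file.dat
echo "xyz line two" >> file.dat       # >> appends to it
echo "abc line three" >> file.dat

# < feeds file.dat to grep; | passes grep's output to wc; tr strips padding some wc versions add
matches=$(grep abc < file.dat | wc -l | tr -d ' ')

# << starts a here document: standard input comes from the lines up to EOF
heredoc=$(cat <<EOF
first
second
EOF
)

herestring=$(cat <<< 'Hello World!')   # <<< sends one word (here, a quoted string) to standard input

ls no_such_file > both.log 2>&1       # standard output and standard error into one file

echo "$matches"; echo "$heredoc"; echo "$herestring"; cat both.log
```

Note that the here string is quoted: without quotes, cat <<<Hello World! would read only Hello and treat World! as a file name.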


Test conditions for Strings, Files and Integers
String Condition Tests
Example: [ condition ]

$ [ "a" \< "d" ]; echo $?
0

Since a comes before d, "a" is less than "d", so the test is true and its exit status ($?) is 0. The same
reasoning makes [ a \< b ] true.

Operator              True if
string1 = string2     string1 matches string2
string1 != string2    string1 does not match string2
string1 == string2    string1 is equal to string2
string1 < string2     string1 is less than string2 (sorts first; inside [ ] write \<)
string1 > string2     string1 is greater than string2 (sorts last; inside [ ] write \>)
-n string1            string1 is not null
-z string1            string1 is null
&&                    Logical AND
||                    Logical OR
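A sketch of the string tests that captures each exit status in a variable; the variable names and values are made up for illustration. Quoting every variable keeps a test from breaking when a value is empty.

```bash
#!/usr/bin/env bash
# Sketch: string tests; $? holds each test's exit status (0 = true, 1 = false).
a="apple"
b="banana"
empty=""

[ "$a" = "apple" ];  eq_status=$?     # 0: the strings match
[ "$a" != "$b" ];    ne_status=$?     # 0: the strings differ
[ "$a" \< "$b" ];    lt_status=$?     # 0: "apple" sorts before "banana"
[ "$b" \< "$a" ];    gt_status=$?     # 1: "banana" does not sort before "apple"
[ -z "$empty" ];     z_status=$?      # 0: the string is null
[ -n "$a" ];         n_status=$?      # 0: the string is not null

# && joins two tests; quoting keeps [ -z "$empty" ] valid even though the value is empty
if [ -n "$a" ] && [ -z "$empty" ]; then
  both="yes"
else
  both="no"
fi

echo "$eq_status $ne_status $lt_status $gt_status $z_status $n_status $both"
```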


File Condition Tests
Example: [ condition ]
Operator              True if
-a file               file exists (-e file does the same and is preferred)
-d file               file exists and is a directory
-f file               file exists and is a regular file (e.g. is not a directory)
-r file               you have read permission on file; -w and -x test for write and execute permission
                      respectively
-s file               file exists and is not empty
file1 -nt file2       file1 is newer than file2
file1 -ot file2       file1 is older than file2
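The file tests can be checked against files created on the spot. The names results, empty.txt, full.txt and old.txt are hypothetical, and touch -t is used only to give old.txt an earlier timestamp.

```bash
#!/usr/bin/env bash
# Sketch: file tests on files created in a scratch directory (names are hypothetical).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir results
: > empty.txt                        # an empty regular file
echo "data" > full.txt               # a non-empty regular file
touch -t 202001010000 old.txt        # a file dated 1 Jan 2020, 00:00

[ -d results ];           d_status=$?        # 0: exists and is a directory
[ -f results ];           f_dir_status=$?    # 1: a directory is not a regular file
[ -f full.txt ];          f_status=$?        # 0: a regular file
[ -s empty.txt ];         s_empty_status=$?  # 1: exists but is empty
[ -s full.txt ];          s_full_status=$?   # 0: exists and is not empty
[ -r full.txt ];          r_status=$?        # 0: you can read it
[ full.txt -nt old.txt ]; nt_status=$?       # 0: full.txt is newer than old.txt
[ -e missing.txt ];       e_status=$?        # 1: no such file

echo "$d_status $f_dir_status $f_status $s_empty_status $s_full_status $r_status $nt_status $e_status"
```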


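Integers
Integer variables can take the following conditionals in addition to those for strings. For example,
[ 3 -gt 2 ] and [ 3 \> 2 ] are both true, but they are not equivalent: -gt compares numbers, while \> compares
strings character by character, so [ 10 -gt 9 ] is true but [ 10 \> 9 ] is false.

Operator   Meaning
-lt        Less than
-gt        Greater than
-le        Less than or equal to
-ge        Greater than or equal to
-eq        Equal to
-ne        Not equal to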
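The difference between numeric and string comparison shows up as soon as the two numbers have different numbers of digits. A minimal sketch:

```bash
#!/usr/bin/env bash
# Sketch: -gt compares numbers, while \> compares strings character by character.
[ 3 -gt 2 ];  num_small=$?    # 0: 3 is greater than 2
[ 3 \> 2 ];   str_small=$?    # 0: "3" also sorts after "2"
[ 10 -gt 9 ]; num_big=$?      # 0: 10 is greater than 9
[ 10 \> 9 ];  str_big=$?      # 1: "10" sorts before "9" because "1" comes before "9"
echo "$num_small $str_small $num_big $str_big"
```

For numbers, use the -lt, -gt, -le, -ge, -eq and -ne operators rather than the string operators.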


Glossary terms
Reference: http://www.tldp.org/LDP (The Linux Documentation Project)
BASH
The Bourne Again Shell, which is based on the Bourne shell (sh), the original command interpreter.
Bourne Shell
The Bourne shell is the original UNIX shell (command execution program, often called a command
interpreter) that was developed at AT&T. Named for its developer, Stephen Bourne, the Bourne shell is
also known by its program name, sh. The shell prompt (character displayed to indicate readiness for input)
used is the $ symbol. The Bourne shell family includes the Bourne, Korn, bash, and zsh shells. Bourne
Again Shell (bash) is the free version of the Bourne shell distributed with Linux systems. Bash is similar to
the original, but has added features such as command line editing. Its name is sometimes spelled as Bourne
Again SHell, the capitalized Hell referring to the difficulty some people have with it.
CLI
A CLI (command line interface) is a user interface to a computer's operating system or an application in
which the user responds to a visual prompt by typing in a command on a specified line, receives a response
back from the system, and then enters another command, and so forth. The MS-DOS Prompt application in
a Windows operating system is an example of the provision of a command line interface. Today, most users
prefer the graphical user interface (GUI) offered by Windows, Mac OS, BeOS, and others. Most
of today's UNIX-based systems offer both a command line interface and a graphical user interface.
daemon
A process lurking in the background, usually unnoticed, until something triggers it into action. For
example, the update daemon wakes up every thirty seconds or so to flush the buffer cache, and the
sendmail daemon awakes whenever someone sends mail.
environment variable
A variable that is available to any program that is started by the shell.
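Strictly, a shell variable reaches programs started by the shell only after it has been exported. A minimal sketch, with the hypothetical variable names plain_var and shared_var:

```bash
#!/usr/bin/env bash
# Sketch: only exported variables reach programs started by the shell.
# The variable names plain_var and shared_var are hypothetical.
plain_var="not exported"
export shared_var="exported"

# Start a child bash and ask it for each variable; :-unset prints "unset" if it is missing.
child_plain=$(bash -c 'echo "${plain_var:-unset}"')
child_shared=$(bash -c 'echo "${shared_var:-unset}"')
echo "child sees plain_var: $child_plain"
echo "child sees shared_var: $child_shared"
```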
filesystem
The methods and data structures that an operating system uses to keep track of files on a disk or partition;
the way the files are organized on the disk. Also used to describe a partition or disk that is used to store the
files or the type of the filesystem.
GUI
Graphical User Interface. The use of pictures rather than just words to represent the input and output of a
program. A program with a GUI runs under some windowing system (e.g. the X Window System,
Microsoft Windows, Acorn RISC OS, NEXTSTEP). The program displays certain icons, buttons, dialogue
boxes etc. in its windows on the screen and the user controls it mainly by moving a pointer on the screen
(typically controlled by a mouse) and selecting certain objects by pressing buttons on the mouse while the
pointer is pointing at them. Though Apple Computer would like to claim they invented the GUI with their
Macintosh operating system, the concept originated in the early 1970s at Xerox's PARC laboratory.
hard link
A directory entry that maps a filename to an inode number. A file may have multiple names, or hard
links. The link count gives the number of names by which a file is accessible. Hard links cannot give a
directory multiple names and cannot link names in different filesystems.
init
The init process is the first user-level process started by the kernel. init has many important duties, such as
starting getty (so that users can log in), implementing run levels, and taking care of orphaned processes. init is
one of those programs that are absolutely essential to the operation of a Linux system, but that you can still
mostly ignore. Usually, you only need to worry about init if you hook up serial terminals or dial-in (not
dial-out) modems, or if you want to change the default run level. When the kernel has started (has been loaded
into memory, has started running, and has initialized all device drivers and data structures), it finishes its
own part of the boot process by starting a user-level program, init. Thus, init is always the first process (its
process number is always 1). The kernel looks for init in a few locations that have been historically used for
it, but the proper location for it is /sbin/init. If the kernel can't find init, it tries to run /bin/sh, and if
that also fails, the startup of the system fails. When init starts, it completes the boot process by doing a
number of administrative tasks, such as checking filesystems, cleaning up /tmp, starting various services, and
starting a getty for each terminal and virtual console where users should be able to log in. After the system is
properly up, init restarts getty for each terminal after a user has logged out (so that the next user can log
in). init also adopts orphan processes: when a process starts a child process and dies before its child, the
child immediately becomes a child of init. This is important for various technical reasons, and knowing it makes
process lists and process tree graphs easier to understand. init itself is not allowed to die; you can't kill
init even with SIGKILL. There are a few variants of init available. Many Linux distributions have used sysvinit
(written by Miquel van Smoorenburg), which is based on the System V init design; most current distributions use
systemd instead. The BSD versions of UNIX have a different init. The primary difference is run levels: System V
has them, BSD doesn't.
inode
An inode is the data structure in which the filesystem keeps a file's administrative information. When you view
inode information with ls -i, ls prints the file's inode number (its i-number). You can use this information to
tell whether two files are really the same file with different names (links). A file has several components: a
name, contents, and administrative information such as permissions and modification times. The administrative
information is stored in the inode (over the years, the hyphen fell out of "i-node"), along with essential
system data such as how long the file is, where on the disk its contents are stored, and so on. There are three
times in the inode: the time that the contents of the file were last modified (written); the time that the file
was last used (read or executed); and the time that the inode itself was last changed, for example to set the
permissions. Altering the contents of the file does not affect its usage time, and changing the permissions
affects only the inode change time. It is important to understand inodes, not only to appreciate the options on
ls, but because in a strong sense the inodes are the files. All the directory hierarchy does is provide
convenient names for files. The system's internal name for the file is its i-number: the number of the inode
holding the file's information.
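The link between names and inodes can be seen directly. This sketch (file names hypothetical) makes a hard link, compares the inode numbers that ls -i prints, and reads the link count from ls -l; it assumes the scratch directory is on a filesystem that supports hard links.

```bash
#!/usr/bin/env bash
# Sketch: two names (hard links) for one inode. File names are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo "temperature data" > original.txt
ln original.txt second_name.txt                   # hard link: a second directory entry, same inode

inode1=$(ls -i original.txt | awk '{print $1}')   # ls -i prints the inode number first
inode2=$(ls -i second_name.txt | awk '{print $1}')
links=$(ls -l original.txt | awk '{print $2}')    # second column of ls -l is the link count
echo "inode numbers: $inode1 $inode2; link count: $links"

rm original.txt                                   # removes one name; the inode still has the other
content=$(cat second_name.txt)
```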
kernel
Part of an operating system that implements the interaction with hardware and the sharing of resources.
libraries
A library is a collection of compiled object files. All useful programs refer to symbols they do not define
themselves (e.g. printf or write), but a finished executable should have no undefined symbols. These references
are resolved by pulling object files from libraries into the executable.
link
A symbolic link (an alias in Mac OS, a shortcut under Windows) is a file that points to another file; this is a
commonly used tool. A hard link, rarely created by the user, is a filename that points to the same inode, and so
the same block of data, as one or more other filenames.
man page
Every version of UNIX comes with an extensive collection of online help pages called man pages (short for
manual pages). The man pages are the authoritative documentation about your UNIX system. They contain
complete information about both the kernel and all the utilities.
NFS
Network File System (NFS) is the UNIX equivalent of Server Message Block (SMB). It is a way through which
different machines can import and export local files between each other. Like SMB, though, traditional NFS
sends information unencrypted, so it's best to limit its usage to within your local network.
operating system
Software that shares a computer system's resources (processor, memory, disk space, network bandwidth,
and so on) between users and the application programs they run. Controls access to the system to provide
security.
PATH
The shell looks for commands and programs in a list of file paths stored in the PATH environment variable.
An environment variable stores information in a place where other programs and commands can access it.
Environment variables store information such as the shell that you are using, your login name, and your
current working directory. To see a list of all the environment variables currently defined, type 'env' (or
'printenv') at the prompt; 'set' lists these together with all other shell variables. When you type a command
at the shell prompt, the shell looks for that command's program file in each directory listed in the PATH
variable, in order. The first program found matching the command you typed is run. If the command's program
file is not in a directory listed in your PATH environment variable, the shell returns a "command not found"
error. By default, the shell does not look in your current working directory or your home directory for
commands. This is really a security mechanism so that you don't execute programs by accident. What if a
malicious user put a harmful program called ls in your home directory? If you typed ls and the shell looked for
the fake program in your home directory before the real program in the /bin directory, what do you think would
happen? If you thought bad things, you are on the right track. Since your PATH doesn't have the current
directory as one of its search locations, programs in your current directory must be called with an absolute
path or a relative path such as './program-name'.
To see which directories are part of your PATH, enter this command:
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11
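A sketch of how PATH is searched, using a throwaway script with the hypothetical name hello_path: it can be run through a relative path right away, but only by name after its directory has been added to PATH. The change affects only the shell running the sketch.

```bash
#!/usr/bin/env bash
# Sketch: a command is found by name only if its directory is in PATH.
# The script name hello_path is hypothetical; the PATH change lasts only for this shell.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/bin"
cd "$workdir" || exit 1

printf '#!/bin/sh\necho hello from PATH\n' > bin/hello_path
chmod +x bin/hello_path

command -v hello_path > /dev/null; before=$?   # non-zero: bin is not in PATH yet
out_relative=$(./bin/hello_path)               # a relative path works without PATH

PATH="$workdir/bin:$PATH"                      # put the new directory first in the search list
command -v hello_path > /dev/null; after=$?    # 0: the shell now finds it by name
out_name=$(hello_path)

echo "before: $before, after: $after"
```

Putting the new directory first means it wins over any system command with the same name; appending it ("$PATH:$workdir/bin") makes it the last resort instead.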
pipes and sockets
Special files that programs use to communicate with one another. They are rarely seen, but you might be
able to see a socket or two in the /dev/ directory.
process identifier
The unique number assigned to every process running in the system. It is shown as PID in the heading of the
ps command output.
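A sketch of working with process identifiers from the shell: $$ holds the shell's own PID and $! the PID of the most recent background job, which can then be passed to ps or kill. The 30-second sleep is only a stand-in for a long-running program.

```bash
#!/usr/bin/env bash
# Sketch: $$ is this shell's PID; $! is the PID of the last background job.
# sleep 30 stands in for any long-running program.
shell_pid=$$
sleep 30 &
bg_pid=$!

ps -p "$bg_pid"                        # the PID column shows the same number
kill -0 "$bg_pid"; running=$?          # 0: a process with that PID exists (kill -0 sends no signal)
kill "$bg_pid"                         # terminate the process by its PID
wait "$bg_pid" 2> /dev/null            # collect its exit status
kill -0 "$bg_pid" 2> /dev/null; after_kill=$?   # non-zero: the process is gone

echo "shell PID: $shell_pid, background PID: $bg_pid"
```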
SSH
The Secure Shell, or SSH, provides a way of running command line and graphical applications, and
transferring files, over an encrypted connection; anyone intercepting the traffic sees only unreadable data. It
is both a protocol and a suite of small command line applications, which can be used for various functions. SSH
replaces the old Telnet application and can be used for secure remote administration of machines across the
Internet. However, it also has other features. SSH makes it easier to run graphical applications remotely by
setting up X permissions automatically (X forwarding, e.g. ssh -X). If you can log into a machine, it allows you
to run a graphical application on it, unlike Telnet, which requires users to understand the X authentication
mechanisms that are manipulated through the xauth and xhost commands. SSH also has built-in compression, which
allows your graphical applications to run faster over slow networks. SCP (Secure Copy) and SFTP (Secure FTP)
allow transfer of files over the remote link, either via SSH's own command line utilities or graphical tools
like Gnome's GFTP. Like Telnet, SSH is cross-platform. You can find SSH servers and clients for Linux, UNIX and
all flavours of Windows, BeOS, PalmOS, Java and embedded OSes used in routers.
STDERR
Standard error. A special type of output used for error messages. The file descriptor for STDERR is 2.
STDIN
Standard input. User input is read from STDIN. The file descriptor for STDIN is 0.
STDOUT
Standard output. The output of scripts is usually to STDOUT. The file descriptor for STDOUT is 1.
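The file descriptor numbers 0, 1 and 2 are what appear in redirections. This sketch uses a made-up function, report, that writes one line to each output stream, then separates and merges the streams.

```bash
#!/usr/bin/env bash
# Sketch: file descriptors 0, 1 and 2 in redirections. The function report is hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

report() {
  echo "result line"          # written to STDOUT (file descriptor 1)
  echo "warning line" >&2     # written to STDERR (file descriptor 2)
}

report 1> out.txt 2> err.txt  # keep the two streams in separate files
report > all.txt 2>&1         # send STDERR to wherever STDOUT currently points
read -r first_line 0< out.txt # read STDIN (file descriptor 0) from a file

echo "$first_line"
```

The order matters in report > all.txt 2>&1: STDOUT is pointed at the file first, and STDERR is then made a copy of it.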
symbolic link or soft link
A special filetype, which is a small pointer file, allowing multiple names for the same file. Unlike hard
links, symbolic links can be made for directories and can be made across filesystems. Commands that
access the file being pointed to are said to follow the symbolic link. Commands that access the link itself do
not follow the symbolic link.
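A sketch of a symbolic link to a directory, which a hard link cannot provide. The layout mirrors the DATA/NOAA folder used for the temperature download, but sample.txt and its contents are hypothetical. readlink reads the link itself, while cat follows it.

```bash
#!/usr/bin/env bash
# Sketch: a symbolic link to a directory. sample.txt and its contents are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p DATA/NOAA
echo "station data" > DATA/NOAA/sample.txt
ln -s DATA/NOAA noaa                 # soft link to a directory (a hard link cannot do this)

target=$(readlink noaa)              # readlink reads the link itself: DATA/NOAA
content=$(cat noaa/sample.txt)       # cat follows the link to the real file
[ -L noaa ]; is_link=$?              # 0: noaa is a symbolic link
[ -d noaa ]; is_dir=$?               # 0: -d follows the link and finds a directory
echo "$target / $content / $is_link / $is_dir"
```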
ZSH
Zsh was developed by Paul Falstad as a replacement for both the Bourne and C shell. It incorporates
features of all the other shells (such as file name completion and a history mechanism) as well as new
capabilities. Zsh is considered similar to the Korn shell. Falstad intended to create in zsh a shell that would
do whatever a programmer might reasonably hope it would do. Zsh is popular with advanced users. Along
with the Korn shell and the C shell, the Bourne shell remains among the three most widely used and is
included with all UNIX systems. The Bourne shell is often considered the best shell for developing scripts.


Glossary commands
alias
Create a shell alias for a command.
apropos
Search the whatis database for strings.
apt-get
APT package handling utility.
aspell
Spell checker.
(g)awk
Pattern scanning and processing language.
bash
Bourne Again SHell.
bg
Run a job in the background.
cat
Concatenate files and print to standard output.
cd
Change directory.
chattr
Change file attributes.
chgrp
Change group ownership.
chmod
Change file access permissions.
chown
Change file owner and group.
compress
Compress files.
cp
Copy files and directories.
cut
Remove sections from each line of file(s).
date
Print or set system date and time.
df
Report file system disk usage.
diff
Find differences between two files.
du
Estimate file space usage.
echo
Display a line of text.
egrep
Extended grep.
eject
Unmount and eject removable media.
exec
Replace the current shell with a command (no new process is created).
exit
Exit current shell.
export
Mark variables (or, with -f, functions) for export to the environment of child processes.
fg
Bring a job into the foreground.
file
Determine file type.
find
Find files.
ftp
Transfer files (unsafe unless an anonymous account is used!).
grep
Print lines matching a pattern.
gzip
Compress or expand files.
head
Output the first part of files.
help
Display help on a shell built-in command.
info
Read Info documents.
init
Process control initialization.
kill(all)
Terminate process(es).
less
A pager like more, with additional features.
ln
Make links between files.
logout
Close current shell.
lp
Send requests to the LP print service.
lpc
Line printer control program.
lpq
Print spool queue examination program.
lpr
Offline print.
lprm
Remove print requests.
ls
List directory content.
man
Read man pages.
mkdir
Create directory.
more
Filter for displaying text one screen at a time.
mv
Move or rename files.
pr
Convert text files for printing.
printenv
Print all or part of environment.
ps
Report process status.
pwd
Print present working directory.
rm
Remove a file.
rmdir
Remove a directory.
scp
Secure remote copy.
set
Display, set or change variable.
sh
Open a standard shell.
shutdown
Bring the system down.
sleep
Wait for a given period.
sort
Sort lines of text files.
ssh
Secure shell.
stty
Change and print terminal line settings.
su
Switch user, substitute user, super user
tail
Output the last part of files.
tar
Archiving utility.
top
Display top CPU processes.
touch
Change file timestamps.
ulimit
Control resources.
uncompress
Decompress compressed files.
uniq
Remove duplicate lines from a sorted file.
uptime
Display system uptime and average load.
vi(m)
Start the vi (improved) editor.
vimtutor
The Vim tutor.
w
Show who is logged on and what they are doing.
wc
Print the number of bytes, words and lines in files.
which
Shows the full path of (shell) commands.
who
Show who is logged on.
who am i
Print effective user ID.
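The exec entry above can be checked directly: exec replaces the running shell process with the command, so later commands never run and the process ID does not change. A minimal sketch:

```bash
#!/usr/bin/env bash
# Sketch: exec replaces the running shell with a command instead of starting a new process.
output=$(bash -c 'echo "before exec"; exec echo "from exec"; echo "never printed"')
printf '%s\n' "$output"

# The PID is unchanged across exec: the shell process becomes the new program.
pids=$(bash -c 'echo $$; exec bash -c "echo \$\$"')
first_pid=$(printf '%s\n' "$pids" | sed -n 1p)
second_pid=$(printf '%s\n' "$pids" | sed -n 2p)
echo "PID before exec: $first_pid, after exec: $second_pid"
```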

UNIX Quick Reference


The command files you have created all semester are useful for quick reference. The file
UNIX_quick_reference.pdf contains some handy UNIX commands all on one page. The file can be downloaded
from Blackboard. The reference sheet was put together by a group in Minnesota.

The End
