Published 2005-08-17
Table of Contents
1. Text Encoding and Word Counting
   Discussion
      Files
      Text Encoding
      Internationalization (i18n)
      Revisiting cat, head, and tail
      The wc (Word Count) Command
   Examples
      Example 1. Counting Characters
      Example 2. Invisible Characters Are Important, Too
      Example 3. What's My Line?
      Example 4. I Want It All
      Example 5. Linux, Dos, and Macintosh Files
      Example 6. Counting Users
      Example 7. Counting Processes
   Online Exercises
      Specification
      Deliverables
      Hints
   Questions
2. Finding Text: grep
   Discussion
      Searching Text File Contents using grep
      Show All Occurrences of a String in a File
      Searching in Several Files at Once
      Searching Directories Recursively
      Inverting grep
      Getting Line Numbers
      Limiting Matching to Whole Words
      Ignoring Case
   Examples
      Example 1. Finding Simple Character Strings
      Example 2. In That Case
      Example 3. Matching Whole Words
      Example 4. Combining grep and xargs
   Online Exercises
      Specification
      Deliverables
   Questions
3. Introduction to Regular Expressions
   Discussion
      Introducing Regular Expressions
      Regular Expressions, Extended Regular Expressions, and the grep Command
      Anatomy of a Regular Expression
      Taking Literals Literally
      Wildcards
      Common Modifier Characters
      Anchored Searches
      Coming to Terms with Regex Grouping
      Escaping Meta-Characters
      Summary of Linux Regular Expression Syntax
      Regular Expressions are NOT File Globbing
      Where to Find More Information About Regular Expressions
   Examples
      Example 1. Literal Searches
      Example 2. Range Expressions
      Example 3. REGEX Modifiers
      Example 4. Anchored Searches
      Example 5. REGEX Term Grouping
      Example 6. Is elvis in the House?
      Example 7. Searching for Telephone Numbers
   Online Exercises
      Specification
      Deliverables
   Questions
4. Everything Sorting: sort and uniq
   Discussion
      The sort Command
      The uniq Command
   Examples
      Example 1. Sorting the Output of ps aux
      Example 2. Using sort and uniq to Collect Information on Running Processes
   Online Exercises
      Specification
      Deliverables
   Questions
5. Extracting and Assembling Text: cut and paste
   Discussion
      The cut Command
      The paste Command
   Examples
      Example 1. Handling Free-Format Records
      Example 2. Living With Fixed-Format Records
      Example 3. Using (and Misusing) a Space as a Delimiter
      Example 4. Examples of Pasting
   Online Exercises
      Specification
      Deliverables
   Questions
rha030-3.0-0-en-2005-08-17T07:23:17-0400
Copyright (c) 2003-2005 Red Hat, Inc. All rights reserved. For use only by a student enrolled in a Red Hat Academy course taught at a Red Hat
Academy. Any other use is a violation of U.S. and international copyrights. No part of this publication may be photocopied, duplicated,
stored in a retrieval system, or otherwise duplicated whether in electronic or print format without prior written consent of Red Hat, Inc.
If you believe Red Hat course materials are being used, copied, or otherwise improperly distributed please email training@redhat.com
or phone toll-free (USA) +1 866 626 2994 or +1 (919) 754 3700.
1. Text Encoding and Word Counting
When storing text, computers transform characters into a numeric representation. This process is
referred to as encoding the text.
In order to accommodate the demands of a variety of languages, several different encoding techniques
have been developed. These techniques are represented by a variety of character sets.
The oldest and most prevalent encoding technique is known as the ASCII character set, which still
serves as the lowest common denominator among other techniques.
The wc command counts the number of characters, words, and lines in a file. When applied to
structured data, the wc command can become a versatile counting tool.
The cat command has options that allow representation of nonprinting characters such as NEWLINE.
The head and tail commands have options that allow you to print only a certain number of lines or a
certain number of bytes (one byte usually correlates to one character) from a file.
Discussion
In this Workbook, we begin looking at various tools for searching, sorting, extracting, and manipulating
text. Because Linux, and Unix before it, has a strong tradition of storing data in human-readable text
formats, these tools should be thought of as aiding not only composition, but data manipulation in
general.
Files
What are Files?
Linux, like most operating systems, stores information that needs to be preserved outside of the context
of any individual process in files. (In this context, and for most of this Workbook, the term file is meant in
the sense of regular file.) Linux (and Unix) files store information using a simple model: information is
stored as a single, ordered array of bytes, starting at the first and ending at the last. The number of bytes
in the array is the length of the file.
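To see the byte-array model directly, the following minimal shell sketch writes a short file and inspects it. It assumes the standard mktemp, wc, and od utilities; the variable name f is arbitrary. The -An switch suppresses od's offset column, and -tu1 prints each byte as an unsigned decimal number.

```shell
# Create a scratch file (the variable name f is arbitrary)
f=$(mktemp)
# Store three bytes: 'h', 'i', and a line feed
printf 'hi\n' > "$f"
# The length of the file is simply the number of bytes in the array
wc -c < "$f"
# Walk the array from the first byte to the last, shown as characters...
od -c "$f"
# ...and as unsigned decimal byte values (104, 105, 10)
od -An -tu1 "$f"
```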
What type of information is stored in files? Here are but a few examples.
The characters that compose the book report you want to store until you can come back and finish it
tomorrow are stored in a file called (say) ~/bookreport.txt.
The individual colors that make up the picture you took with your digital camera are stored in the file
(say) /mnt/camera/dcim/100nikon/dscn1203.jpg.
The characters which define the usernames of users on a Linux system (and their home directories,
etc.) are stored in the file /etc/passwd.
The specific instructions which tell an x86 compatible CPU how to use the Linux kernel to list the files
in a given directory are stored in the file /bin/ls.
What is a Byte?
At the lowest level, computers can only answer one type of question: is it on or off? And what is "it"? When
dealing with disks, it is a magnetic domain which is oriented up or down. When dealing with memory
chips, it is a transistor which either has current or doesn't. Both of these are too difficult to picture
mentally, so we will speak in terms of light switches that can be either on or off. To your computer, the
contents of your file are reduced to what can be thought of as an array of (perhaps millions of) light
switches. Each light switch can be used to store one bit of information (is it on, or is it off?).
Using a single light switch, you cannot store much information. To be more useful, an early convention
was established: group the light switches into bunches of 8. Each series of 8 light switches (or magnetic
domains, or transistors, ...) is a byte. More formally, a byte consists of 8 bits. Each permutation of ons and
offs for a group of 8 switches can be assigned a number. All switches off, we'll assign 0. Only the first
switch on, we'll assign 1; only the second switch on, 2; the first and second switch on, 3; and so on. How
many numbers will it take to label each possible permutation for 8 light switches? A mathematician will
quickly tell you the answer is 2^8, or 256. After grouping the light switches into groups of eight, your
computer views the contents of your file as an array of bytes, each with a value ranging from 0 to 255.
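The numbering scheme above is ordinary binary counting: each switch is worth twice the one before it. The sketch below assumes a POSIX shell; bits_to_value is a hypothetical helper, and it writes the first switch as the rightmost character of the pattern, the way binary numbers are usually written.

```shell
# bits_to_value: hypothetical helper that turns a pattern of 0s and 1s
# into the integer label for that pattern (rightmost character = first switch)
bits_to_value() {
    pattern=$1
    value=0
    while [ -n "$pattern" ]; do
        first=${pattern%"${pattern#?}"}   # leftmost character of what remains
        pattern=${pattern#?}              # drop that character
        value=$(( value * 2 + first ))    # shift earlier bits left, add this bit
    done
    echo "$value"
}

bits_to_value 00000000   # all switches off: 0
bits_to_value 00000001   # only the first switch on: 1
bits_to_value 00000010   # only the second switch on: 2
bits_to_value 00000011   # first and second switches on: 3
bits_to_value 11111111   # all switches on: 255
# Number of distinct patterns for 8 switches, 2^8: 256
echo $(( 1 << 8 ))
```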
Data Encoding
In order to store information as a series of bytes, the information must somehow be converted into a
series of values ranging from 0 to 255. Converting information into such a format is called data encoding.
What's the best way to do it? There is no single best way that works for all situations. Developing the
right technique to encode data, one that balances the goals of simplicity, efficiency (in terms of CPU
performance and on-disk storage), resilience to corruption, etc., is much of the art of computer science.
As one example, consider the picture taken by a digital camera mentioned above. One encoding
technique would divide the picture into pixels (dots), and for each pixel, record three bytes of
information: the pixel's "redness", "greenness", and "blueness", each on a scale of 0 to 255. The first
three bytes of the file would record the information for the first pixel, the next three bytes the second
pixel, and so on. A picture format known as "PNM" does just this (plus some header information, such as
how many pixels are in a row). Many other encoding techniques for images exist, some just as simple,
many much more complex.
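As a rough illustration, the binary color member of this family (a PPM file, identified by the magic number P6) is simple enough to write by hand with printf. The two-pixel image and the temporary file below are made up for the example; the octal escapes \377 and \200 are the byte values 255 and 128.

```shell
img=$(mktemp)
# Header: magic number, width and height, maximum value per color channel
printf 'P6\n2 1\n255\n' > "$img"
# Pixel 1: white (255,255,255); pixel 2: mid-gray (128,128,128)
printf '\377\377\377\200\200\200' >> "$img"
# 11 header bytes + 2 pixels x 3 bytes = 17 bytes
wc -c < "$img"
```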
Text Encoding
Perhaps the most common type of data which computers are asked to store is text. As computers have
developed, a variety of techniques for encoding text have been developed, from the simple in concept
(which could encode only the Latin alphabet used in Western languages) to complicated but powerful
techniques that attempt to encode all forms of human written communication, even attempting to include
ASCII
One of the oldest, and still most commonly used, techniques for encoding text is called ASCII encoding.
ASCII encoding simply takes the 26 lowercase and 26 uppercase letters which compose the Latin
alphabet, the 10 digits, and common English punctuation characters (those found on a keyboard), and maps
them to integers between 0 and 127, as outlined in the following table.
Table 1-1. ASCII Encoding of Printable Characters
Integer Range   Character
33-47           Punctuation: !"#$%&'()*+,-./
48-57           Digits: 0-9
58-64           Punctuation: :;<=>?@
65-90           Uppercase letters: A-Z
91-96           Punctuation: [\]^_`
97-122          Lowercase letters: a-z
123-126         Punctuation: {|}~
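You can check the table from the shell. This sketch relies on a rule of the POSIX printf utility: a numeric argument that begins with a quote character is converted to the code of the character that follows it. The od utility then dumps a whole string as decimal byte values.

```shell
# Print the ASCII value of a few characters from each range of the table
for c in '!' 0 9 A Z a z '~'; do
    printf '%s = %d\n' "$c" "'$c"
done
# Going the other way: the octal escape \101 (decimal 65) prints an A
printf '\101\n'
# Dump a whole string as decimal byte values: 72 105 33
printf 'Hi!' | od -An -tu1
```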
What about the integers 0 - 32? These integers are mapped to special keys on early teletypes, many of
which have to do with manipulating the spacing on the page being typed on. The following characters are
commonly called "whitespace" characters.
Table 1-2. ASCII Encoding of Whitespace Characters
Integer   Character   Common Name       Common Representation
8         BS          Backspace         \b
9         HT          Tab               \t
10        LF          Line Feed         \n
12        FF          Form Feed         \f
13        CR          Carriage Return   \r
32        SPACE       Space Bar
127       DEL         Delete
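Whitespace characters are stored as ordinary bytes, which od can reveal. In this sketch, printf's backslash escapes (\t, \r, \n, \b, \f) produce the corresponding whitespace bytes; od -c prints them using the common representations from the table, and od -An -tu1 prints their integer values.

```shell
# A tab between two letters, ended by a carriage return and a line feed
printf 'a\tb\r\n' | od -c
# The same bytes as decimal values: 97 9 98 13 10
printf 'a\tb\r\n' | od -An -tu1
# Backspace, form feed, and space are bytes too: 8 12 32
printf '\b\f ' | od -An -tu1
```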
Others of the first 32 integers are mapped to keys which did not directly influence the "printed page", but
instead sent "out of band" control signals between two teletypes. Many of these control signals have
special interpretations within Linux (and Unix).
Table 1-3. ASCII Encoding of Control Signals
Integer   Character   Common Name           Common Representation
4         EOT         End of Transmission
7         BEL         Bell                  \a
27        ESC         Escape
Generating Control Characters from the Keyboard: Control and whitespace characters can be
generated from the terminal keyboard directly using the CTRL key. For example, an audible bell can
be generated using CTRL-G, while a backspace can be sent using CTRL-H, and we have already
mentioned that CTRL-D is used to generate an "End of File" (or "End of Transmission"). Can you
determine how the whitespace and control characters are mapped to the various CTRL key
combinations? For example, what CTRL key combination generates a tab? What does CTRL-J
generate? As you explore various control sequences, remember that the reset command will restore
your terminal to sane behavior, if necessary.
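If you want to check your answers, one convention worth testing is that, for the letter keys, CTRL produces the letter's uppercase ASCII value minus 64. The helper ctrl_code below is hypothetical: it only models that arithmetic in a POSIX shell and does not read the keyboard.

```shell
# ctrl_code: hypothetical helper modeling CTRL+letter as (uppercase code - 64)
ctrl_code() {
    upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    code=$(printf '%d' "'$upper")
    echo $(( code - 64 ))
}
# Compare the results with the integers in the whitespace and control tables
for key in D G H I J M; do
    printf 'CTRL-%s -> %d\n' "$key" "$(ctrl_code "$key")"
done
```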
What about the values 128-255? ASCII encoding does not use them. The ASCII standard only defines
the first 128 values of a byte, leaving the remaining 128 values to be defined by other schemes.
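Some of the most common of these schemes, all defined by the ISO 8859 family of standards, are listed
below.
Name      Formal Name   Description
Latin-1   ISO 8859-1    Western European
Latin-2   ISO 8859-2    Central and Eastern European
Arabic    ISO 8859-6    Latin/Arabic
Greek     ISO 8859-7    Latin/Greek
Latin-9   ISO 8859-15   Western European, with the Euro sign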
All of these character encoding schemes use a common technique. They preserve the first 128 values of a
byte to encode traditional ASCII, and use the remaining 128 values to encode glyphs unique to the
particular encoding. For example, ISO 8859-1 (Latin-1) uses the value 196 to encode a Latin capital A
with an umlaut (Ä), while ISO 8859-7 (Greek) uses the value 196 to encode the Greek capital letter
Delta (Δ), but both use the value 101 to encode a Latin lowercase e.
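Assuming the iconv utility is installed and your terminal uses UTF-8, the sketch below feeds the same byte (196, written as the octal escape \304) to iconv twice, telling it to read the byte first as Latin-1 and then as Greek. The result is converted to UTF-8 only so that the terminal can display it.

```shell
# Byte 196 read as ISO 8859-1 (Latin-1): a Latin capital A with an umlaut
printf '\304\n' | iconv -f ISO-8859-1 -t UTF-8
# The very same byte read as ISO 8859-7 (Greek): a capital Delta
printf '\304\n' | iconv -f ISO-8859-7 -t UTF-8
# Byte 101 (octal \145) is a lowercase e in both, as in plain ASCII
printf '\145\n' | iconv -f ISO-8859-1 -t UTF-8
printf '\145\n' | iconv -f ISO-8859-7 -t UTF-8
```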
Notice a couple of implications about ISO 8859 encoding.
Unicode (UCS)
In order to overcome the limitations of ASCII and ISO 8859-based encoding techniques, a Universal
Character Set has been developed, commonly referred to as UCS, or Unicode. The Unicode standard
acknowledges the fact that one byte of information, with its ability to encode 256 different values, is
simply not enough to encode the variety of glyphs found in human communication. Instead, the Unicode
standard, in its UCS-4 form, uses 4 bytes to encode each character. Think of 4 bytes as 32 light switches. If we were to again
label each permutation of on and off for 32 switches with integers, the mathematician would tell you that
you would need 4,294,967,296 (over 4 billion) integers. Thus, 4 bytes leave room for over 4 billion glyphs
(nearly enough for every person on the earth to have their own unique glyph; the user prince would
approve). In practice, the standard reserves a much smaller range of about 1.1 million code points, which is
still ample.
What are some of the features and drawbacks of Unicode encoding?
Scale
The Unicode standard will easily be able to encode the variety of glyphs used in human
communication for a long time to come.
Simplicity
The Unicode standard does have the simplicity of a sledgehammer. The number of bytes required to
encode a set of characters is simply the number of characters multiplied by 4.
Waste
While the Unicode standard is simple in concept, it is also very wasteful. The ability to encode 4
billion glyphs is nice, but in reality, much of the communication that occurs today uses less than a
few hundred glyphs. Of the 32 bits (light switches) used to encode each character, the first 20 or so
would always be "off".
ASCII Non-compatibility
For better or for worse, a huge amount of existing data is already ASCII encoded. In order to convert
fully to Unicode, that data, and the programs that expect to read it, would have to be converted.
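The fixed four-byte form survives today as the UTF-32 encoding. This sketch assumes an iconv that supports the name UTF-32BE (big-endian UTF-32, written without a byte order mark). It shows both the simplicity (five characters become exactly 20 bytes) and the waste (three of every four bytes are zero for these characters).

```shell
# Every character becomes four bytes, most of them zero: 00 00 00 65 ...
printf 'elvis' | iconv -f ASCII -t UTF-32BE | od -An -tx1
# Simplicity: the byte count is the character count times 4 (5 x 4 = 20)
printf 'elvis' | iconv -f ASCII -t UTF-32BE | wc -c
# Scale: the number of patterns 32 switches can hold (4294967296)
echo $(( 1 << 32 ))
```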
The Unicode standard is an effective standard in principle, but in many respects it is ahead of its time,
and perhaps forever will be. In practice, other techniques have been developed which attempt to preserve
the scale and versatility of Unicode, while minimizing waste and maintaining ASCII compatibility. What
must be sacrificed? Simplicity.
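UTF-8, the code set that appears in locale names such as en_US.UTF-8 later in this chapter, is one such technique: ASCII characters keep their single-byte values, while other characters take two to four bytes. The sketch below writes UTF-8 byte sequences directly as octal escapes, so it does not depend on your keyboard; byte_count is a hypothetical helper.

```shell
# byte_count: hypothetical helper reporting how many bytes a string occupies
byte_count() {
    printf '%s' "$1" | wc -c | tr -d ' '
}
letter_a=$(printf 'A')              # U+0041: one byte, identical to ASCII
a_umlaut=$(printf '\303\204')       # U+00C4: Latin capital A with umlaut
delta=$(printf '\316\224')          # U+0394: Greek capital Delta
euro=$(printf '\342\202\254')       # U+20AC: the Euro sign
for ch in "$letter_a" "$a_umlaut" "$delta" "$euro"; do
    printf '%s takes %s byte(s)\n' "$ch" "$(byte_count "$ch")"
done
```

The price of this compactness is the lost simplicity mentioned above: a program can no longer find the Nth character by multiplying N by a fixed width.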
Internationalization (i18n)
As this Workbook continues to discuss many tools and techniques for searching, sorting, and
manipulating text, the topic of internationalization cannot be avoided. In the open source community,
en_US.UTF-8
[elvis@station elvis]$ chmod 666 /etc/passwd
chmod: Beim Setzen der Zugriffsrechte für /etc/passwd: Die Operation ist nicht erlaubt
More subtly, the choice of a particular language has implications for sorting orders, numeric formats, text
encoding, and other issues.
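Locale names such as en_US.UTF-8 take the form LL_CC.enc, whose components play the following roles.
Component   Role
LL          Language, as a two-letter ISO 639 language code (for example, en)
CC          Country, as a two-letter ISO 3166 country code (for example, US)
enc         Code set, that is, the text encoding (for example, UTF-8)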
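In a locale name such as en_US.UTF-8, the language, country, and code set are separated by an underscore and a dot, so the name can be taken apart with ordinary shell parameter expansion. The split_locale function below is a hypothetical sketch that assumes the simple LL_CC.enc form; names such as C or POSIX, or names carrying an @modifier, are outside what it handles.

```shell
# split_locale: hypothetical helper for names of the form LL_CC.enc
split_locale() {
    name=$1
    ll=${name%%_*}       # everything before the first underscore
    rest=${name#*_}      # everything after it
    cc=${rest%%.*}       # everything before the first dot
    enc=${rest#*.}       # everything after it
    echo "$ll $cc $enc"
}
split_locale en_US.UTF-8       # en US UTF-8
split_locale de_DE.iso88591    # de DE iso88591
```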
The locale command can be used to examine your current configuration (as can echo $LANG), while
locale -a will list all settings currently supported by your system. The extent of the support for any given
language will vary.
The following tables list some selected language codes, country codes, and code set specifications.
Table 1-6. Selected ISO 639 Language Codes
Code   Language
de     German
el     Greek
en     English
es     Spanish
fr     French
ja     Japanese
zh     Chinese

Table 1-7. Selected ISO 3166 Country Codes
Code   Country
CA     Canada
CN     China
DE     Germany
ES     Spain
FR     France
GB     Britain (UK)
GR     Greece
JP     Japan
NG     Nigeria
US     United States

Table 1-8. Selected Code Sets
Code Set    Encoding
utf8        UTF-8
iso88591    ISO 8859-1 (Latin-1)
iso885915   ISO 8859-15 (Latin-9)
iso88596    ISO 8859-6 (Arabic)
iso88592    ISO 8859-2 (Latin-2)
See the gettext info pages (info gettext, or pinfo gettext) for a complete listing.
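Revisiting cat, head, and tail
Table 1-9. Command Line Switches for the cat Command
Switch   Effect
-E       Display a $ at the end of each line, marking the line feed.
-T       Display tab characters as ^I.
-v       Display control characters (other than tabs and line feeds) as ^n, with n indicating the CTRL
         sequence for the nonprinting character.
-A       Same as -vET: display all of the above.
-t       Same as -vT.
-e       Same as -vE.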
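The following sketch shows what each switch reveals on a small, made-up input. It assumes GNU coreutils cat, where all of these switches are available; the comments describe what you should see.

```shell
# A tab-separated line with a normal Linux line ending
printf 'one\ttwo\n' | cat -E     # only the line end is marked: one<TAB>two$
printf 'one\ttwo\n' | cat -T     # only the tab is marked: one^Itwo
printf 'one\ttwo\n' | cat -A     # both are marked: one^Itwo$
# A carriage return is a control character, so -v (and -A) show it as ^M
printf 'one\r\n' | cat -v        # one^M
```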
In the following example, the cat command is used to display the contents of the /etc/hosts
configuration file.
[student@station student]$ cat /etc/hosts
When the -A command line switch is added, the whitespace structure of the file becomes evident: tabs are
replaced with ^I, and line feeds are decorated with $.
[student@station student]$ cat -A /etc/hosts
elvis$
blondie$
prince$
madonna$
Had this file been created on a Microsoft or Macintosh operating system, and copied into Linux, the files
would look like the following.
[student@station student]$ cat -A musicians.dos
elvis^M$
blondie^M$
prince^M$
madonna^M$
[student@station student]$ cat -A musicians.mac
elvis^Mblondie^Mprince^Mmadonna^M[student@station student]$
Linux (and Unix) text files generally follow the convention that every line, including the last one, ends
with a line feed character. After cat displays the file musicians.mac, which does not contain any
conventional Linux line feed characters, the bash prompt appears at the end of the output instead of at the
start of a new line.
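To experiment with the three line-ending conventions yourself, you can recreate the files. This is a minimal sketch, assuming bash and GNU coreutils; the file contents come from the transcripts above, but building them with printf escapes (\n for line feed, \r for carriage return) is an assumption about how you might produce them, not how the originals were made.

```shell
#!/bin/bash
# Work in a scratch directory so nothing else is disturbed.
cd "$(mktemp -d)" || exit 1

# Linux convention: each line ends with a line feed (LF, \n).
printf 'elvis\nblondie\nprince\nmadonna\n' > musicians
# DOS/Windows convention: each line ends with carriage return + line feed (\r\n).
printf 'elvis\r\nblondie\r\nprince\r\nmadonna\r\n' > musicians.dos
# Classic Macintosh convention: each line ends with a bare carriage return (\r).
printf 'elvis\rblondie\rprince\rmadonna\r' > musicians.mac

cat -A musicians.dos   # each line shown as name^M$
cat -A musicians.mac   # a single output line with no trailing $
echo                   # move the prompt back to the start of a line
```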
Table 1-10. Command Line Switches for the head Command
Switch       Effect
-N, -n N     Display the first N lines of the file.
-c N         Display the first N bytes (characters) of the file.
Switch       Effect
-N, -n N     Display the last N lines of the file. If N is prefixed with a +, as in tail -n +N, display
             the remainder of the file, starting at the Nth line.
-c N         Display the last N bytes (characters) of the file.
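A quick way to check the head and tail switches is to feed them numbered lines. This sketch assumes bash and GNU coreutils (for seq). It uses tail -n +N, the POSIX spelling of the older tail +N form that appears in some transcripts in this chapter; recent GNU versions of tail may treat a bare +N as a file name.

```shell
#!/bin/bash
# seq 1 10 prints the numbers 1 through 10, one per line.
seq 1 10 | head -n 3       # 1, 2, 3 (the first three lines)
seq 1 10 | tail -n 3       # 8, 9, 10 (the last three lines)
seq 1 10 | tail -n +8      # 8, 9, 10 (from line 8 to the end)
printf 'abcdef\n' | head -c 3; echo   # abc (the first three bytes)
```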
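Switch     Results
-c         Count characters (bytes).
-l         Count lines.
-w         Count words.
filename   The file (or files) to count. If no file is named, wc reads standard input.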
When used without any command line switches, wc reports the number of lines, words, and characters, in
that order. Command line switches can be combined to return any combination of line count, word count,
or character count.
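The following is a minimal sketch of the switch combinations, assuming bash and GNU wc; the file name sample is hypothetical. Note that with GNU wc, -c counts bytes, while the separate -m switch counts characters in the current locale; the two differ for multibyte (for example UTF-8) text.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
# Two lines, three words, 14 characters (including the two line feeds).
printf 'one two\nthree\n' > sample

wc sample        # lines, words, characters, then the filename
wc -l sample     # line count only
wc -lw sample    # switches combine: lines and words
wc -c < sample   # reading standard input: just the number, no filename
```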
Examples
Example 1. Counting Characters
To count the characters in a file, just run wc -c:
[student@station student]$ echo hello | wc -c
6
In addition to the five letters in the word, the line also has a NL (newline) character at the end.
14
Keep in mind that spaces and TABs count as characters, too. Remember our typewriter analogy? Both
the spacebar and the TAB key require keystrokes; each character in a text file corresponds to a press of a
typewriter key.
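To see that whitespace and the trailing newline are counted, compare a few one-line inputs. A minimal sketch, assuming bash (whose builtin echo accepts -n) and GNU wc:

```shell
#!/bin/bash
# "a b" (letter, space, letter) plus echo's trailing line feed: 4 characters.
echo 'a b' | wc -c
# The same letters separated by a TAB instead of a space: still 4 characters.
printf 'a\tb\n' | wc -c
# echo -n suppresses the trailing line feed: 3 characters.
echo -n 'a b' | wc -c
```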
student]$
student]$
student]$
student]$
3 foo
station]$
student]$
student]$
student]$
31 x
      4       4      29 musicians
      4       4      33 musicians.dos
      0       4      29 musicians.mac
      8      12      91 total
For the file musicians.mac, which does not contain any conventional Linux line feed characters, the
number of lines is reported as 0, because wc counts lines by counting line feed characters.
In the above output, why does the file musicians.dos have 33 characters, while musicians and
musicians.mac have only 29?
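One way to investigate the question, and to convert such files to the Linux convention, is the tr command. This sketch assumes bash and GNU coreutils; the output names from-dos and from-mac are hypothetical. Many systems also ship a dedicated converter such as dos2unix, but the sketch sticks to tr, which is always present.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf 'elvis\r\nblondie\r\nprince\r\nmadonna\r\n' > musicians.dos
printf 'elvis\rblondie\rprince\rmadonna\r' > musicians.mac

# DOS -> Linux: delete every carriage return (\r).
tr -d '\r' < musicians.dos > from-dos
# Classic Mac -> Linux: translate each carriage return into a line feed.
tr '\r' '\n' < musicians.mac > from-mac

# Compare line, word, and character counts before and after conversion.
wc musicians.dos from-dos musicians.mac from-mac
```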
[student@station student]$ ps aux
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  1384   76 ?        S    Sep28   0:04 init [
root         2  0.0  0.0     0    0 ?        SW   Sep28   0:00 [keventd]
root         3  0.0  0.0     0    0 ?        SW   Sep28   0:00 [kapmd]
root         4  0.0  0.0     0    0 ?        SWN  Sep28   0:00 [ksoftirqd_CPU0]
...
The tail command, with its ability to print the remainder of a file starting from a specified line, can be
used to remove the header line.
[student@station student]$ ps aux | tail -n +2
root         1  0.0  0.0  1384   76 ?        S    Sep28   0:04 init [
root         2  0.0  0.0     0    0 ?        SW   Sep28   0:00 [keventd]
root         3  0.0  0.0     0    0 ?        SW   Sep28   0:00 [kapmd]
root         4  0.0  0.0     0    0 ?        SWN  Sep28   0:00 [ksoftirqd_CPU0]
root         9  0.0  0.0     0    0 ?        SW   Sep28   0:00 [bdflush]
...
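The header-skipping idiom can be checked on a small stand-in for ps output. A minimal sketch, assuming bash and GNU coreutils; the three-line sample is made up. It uses tail -n +2, which works on both older and current systems, whereas recent GNU versions of tail may reject the older tail +2 form.

```shell
#!/bin/bash
# A stand-in for ps output: one header line followed by two data lines.
sample=$'USER PID\nroot 1\nroot 2\n'

printf '%s' "$sample" | tail -n +2           # prints only the two data lines
printf '%s' "$sample" | tail -n +2 | wc -l   # counts them: 2
```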
The following short script combines ps aux, tail -n +2, and wc to create a new command called nprocs.
When the script is made executable and placed in the ~/bin directory (which is part of the standard
executable search PATH), it becomes available from the command line.
[student@station student]$ cat nprocs
#!/bin/bash
# Count processes: list them all, drop the ps header line, count the remaining lines.
ps aux | tail -n +2 | wc -l
[student@station student]$ mkdir bin
[student@station student]$ mv nprocs bin
[student@station student]$ chmod a+x bin/nprocs
[student@station student]$ nprocs
86
Online Exercises
Lab Exercise
Objective: Use the wc command as a counting tool.
Estimated Time: 10 mins.
Specification
1. Create the file ~/gplwords.txt, which contains the number of words (as reported by the wc
command) in the file /usr/share/doc/redhat-release-4ES/GPL as its only word.
2. Create the file ~/localusers.txt, which contains the number of locally defined users as its only
word.
3. Statically compiled libraries conventionally live in the /usr/lib directory, and have names that
start with lib and end with a .a extension. Create the file ~/usrlibs.txt, which contains the number
of files whose name follows this convention in the /usr/lib directory as its only word. (Do not
include subdirectories.)
4. Create an executable script called ~/bin/nrecent. The script should expect a single argument,
which is the name of a directory. Upon execution, the script should return a single number, which is
the number of files in the directory which have been modified in the last 24 hours. The script should
generate no error messages about inaccessible directories on the standard error stream.
If you have implemented the exercises correctly, you should be able to reproduce output akin to the
following. (Do not be concerned if your actual numbers differ from those listed below).
[student@station student]$ head *.txt
22
Deliverables
1. A file called ~/gplwords.txt, which contains the number of words found in the file
/usr/share/doc/redhat-release-4ES/GPL.
2. A file called ~/localusers.txt, which contains the number of locally defined users on the Linux system.
3. A file called ~/usrlibs.txt, which contains the number of files that begin with lib and end with .a found in the
/usr/lib directory.
4. An executable script called ~/bin/nrecent, which expects the name of a directory as its single argument.
Upon execution, the script should return a single number, which is the number of files under the specified
directory that have been modified in the last 24 hours.
Hints
For the file ~/localusers.txt, recall that local users are defined in the /etc/passwd file, one user
per line.
For the script ~/bin/nrecent, recall that $1 dereferences to a bash script's first argument. Consider
using the find command to generate a list of files that match the criteria, and then count the number of
lines (or words) in the output. You might want to use the /etc or /var/log directories to test your
script.
Questions
1. Create an empty file using the touch foo command. How many characters does it contain?
( ) a. 0
( ) b. 2
( ) c. 1
( ) d. The wc command does not work on empty files.
( ) e. None of the above
2. Create a file using the echo > foo command. How many characters does it have?
( ) a. 2
( ) b. 0
( ) c. 1
( ) d. The wc command does not work on empty files.
( ) e. None of the above
3. Create a file using echo -e "\n\n\n\n" > foo; how many words does it have?
( ) a. 2
( ) b. 1
( ) c. 4
( ) d. 5
( ) e. 0
4. Which of the following command lines would generate a single word output, which is the sum of the number of
words found in the files /etc/services and /etc/hosts?
( ) a. cat /etc/services /etc/hosts | wc -w
( ) b. wc -w < /etc/hosts /etc/services
( ) c. wc -w /etc/hosts /etc/services
( ) d. A and C
( ) e. All of the above
5. Which of the following command lines would generate a single word output, which is the number of users logged
into the local machine (as reported by the w command)?
( ) a. w | wc -u
( ) b. w | tail -3 | wc -w
( ) c. w | tail +3 | wc -l
( ) d. w | tail +USER | wc -c
( ) e. None of the above
Use the following transcript to answer the next two questions.
[student@station student]$ cat /etc/adjtime
4 /etc/group
[student@station student]$ ls -l /etc/group
-rw-r--r--    1 root     root
Notes
1. While this may seem an obvious way to do things, some operating systems take more elaborate
approaches. The Macintosh operating system, for example, stores each file as two separate collections
of information: a data fork and a resource fork.
grep is a command that prints lines that match a specified text string or pattern.
grep -v prints lines that do NOT match a specified text string or pattern.
Many other command line switches allow users to specify grep's output format.
Discussion
Searching Text File Contents using grep
In an earlier Lesson, we saw how the wc program can be used to count the characters, words and lines in
text files. In this Lesson we introduce the grep program, a handy tool for searching text file contents for
specific words or character sequences.
The name grep comes from the command g/re/p in the ed line editor: globally search for a regular
expression and print the matching lines. What, you may well ask, is a regular expression, and why should
you care? We will provide a more formal definition of regular expressions in a later Lesson, but for now
it is enough to know that a regular expression is simply a way of describing a pattern, or template, to
match some sequence of characters. A simple regular expression would be "Hello", which matches exactly
five characters: "H", "e", two consecutive "l" characters, and a final "o". More powerful search patterns
are possible and we shall examine them in the next section.
The figure below gives the general form of the grep command line:
Figure 2-1. Form of the grep commands
There are actually three different names for the grep tool (see Note 1):
fgrep
Does a fast search for simple patterns. Use this command to quickly locate patterns without any
wildcard characters, useful when searching for an ordinary word.
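Switch          Effect
-c              Print only a count of the matching lines, instead of the lines themselves.
-h              Do not prefix matching lines with the filename (when searching several files).
-e expression   Use expression as a search pattern. (Helpful for specifying several alternate patterns.)
-i              Ignore case when matching.
-l              Print only the names of files that contain a match.
-n              Prefix each matching line with its line number.
-q              "Quiet". Do not write anything to standard out. Instead, exit with a zero exit status if
                any match is found.
-r              Search the files in directories recursively.
-w              Match only whole words.
-C 2            Include two lines of context before and after each matched line. (Other numbers may be
                given in place of 2.)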
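Several of the switches in the table can be tried on a small file. This is a minimal sketch, assuming bash and GNU grep; the file name fruit and its contents are hypothetical.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf 'red apple\ngreen pear\nRed cherry\n' > fruit

grep -c red fruit              # 1: only "red apple" matches (case matters)
grep -ci red fruit             # 2: -i ignores case, so "Red cherry" matches too
grep -n pear fruit             # 2:green pear
grep -e apple -e pear fruit    # lines matching either pattern
if grep -q cherry fruit; then  # -q prints nothing; only the exit status matters
    echo "found cherry"
fi
```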
Reading the file, we see that it does indeed contain the letters "even". This manual method breaks down
on a large file, because we could easily miss one word among several thousand, or even several hundred
thousand, words. We can use the grep tool to search through the file for us automatically:
[student@station student]$ grep even file
[student@station student]$ echo "Every cat has one more tail than no cat." > general
[student@station student]$ echo "No cat has nine tails." > specific
[student@station student]$ echo "Therefore, every cat has ten tails." > fallacy
[student@station student]$ grep cat general specific fallacy
Perhaps we are more interested in just discovering which file mentions the word nine than actually
seeing the line itself. Adding the -l switch to the grep line does just that:
[student@station student]$ grep -l nine general specific fallacy
specific
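The effect of the output-format switches on a multi-file search can be reproduced with the three files from the transcript. A minimal sketch, assuming bash and GNU grep, working in a scratch directory:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
echo "Every cat has one more tail than no cat." > general
echo "No cat has nine tails." > specific
echo "Therefore, every cat has ten tails." > fallacy

grep nine general specific fallacy      # specific:No cat has nine tails.
grep -l nine general specific fallacy   # specific
grep -h nine general specific fallacy   # No cat has nine tails.
grep -c cat general specific fallacy    # one count per file, e.g. general:1
```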
/etc/sysconfig/network-scripts/ifcfg-eth0:DEVICE=eth0
/etc/sysconfig/networking/devices/ifcfg-eth0:DEVICE=eth0
/etc/sysconfig/networking/profiles/default/ifcfg-eth0:DEVICE=eth0
This shows a common use of grep as a filter to simplify the outputs of other commands.
If only the names of the files were of interest, the output can be simplified with the -l command line
switch.
[student@station student]$ grep -rl eth0 /etc/sysconfig 2>/dev/null
/etc/sysconfig/network-scripts/ifup-aliases
/etc/sysconfig/network-scripts/ifup-ipv6
/etc/sysconfig/network-scripts/ifcfg-eth0
/etc/sysconfig/networking/devices/ifcfg-eth0
/etc/sysconfig/networking/profiles/default/ifcfg-eth0
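The recursive search can be rehearsed on a small directory tree. This sketch assumes bash and GNU grep; the conf directory and its files are hypothetical stand-ins loosely modeled on /etc/sysconfig. As in the transcript, 2>/dev/null discards error messages about unreadable files.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
mkdir -p conf/net
echo 'DEVICE=eth0'      > conf/net/ifcfg-eth0
echo 'DEVICE=lo'        > conf/net/ifcfg-lo
echo 'HOSTNAME=station' > conf/network

grep -r eth0 conf 2>/dev/null    # conf/net/ifcfg-eth0:DEVICE=eth0
grep -rl eth0 conf 2>/dev/null   # conf/net/ifcfg-eth0
```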
Inverting grep
By default, grep shows only the lines matching the search pattern. Usually, this is what you want, but
sometimes you are interested in the lines that do not match the pattern. In these instances, the -v
command line switch inverts grep's operation.
[student@station student]$ head -n 4 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:
daemon:x:2:2:daemon:/sbin:
adm:x:3:4:adm:/var/adm:
[student@station student]$ grep -v root /etc/passwd | head -n 3
bin:x:1:1:bin:/bin:
daemon:x:2:2:daemon:/sbin:
adm:x:3:4:adm:/var/adm:
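The -v switch also combines with -c and with multiple -e patterns. A minimal sketch, assuming bash and GNU grep; passwd.sample is a hypothetical file holding the four /etc/passwd lines shown above.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
# The four sample lines from the transcript, in /etc/passwd format.
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'bin:x:1:1:bin:/bin:' \
              'daemon:x:2:2:daemon:/sbin:' \
              'adm:x:3:4:adm:/var/adm:' > passwd.sample

grep -v root passwd.sample              # every line except the root entry
grep -vc root passwd.sample             # 3: -c counts the non-matching lines
grep -v -e root -e adm passwd.sample    # drop lines matching either pattern
```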
12526:dictionary
You might also want to combine the -n switch with the -r switch when searching all the files below a
directory:
[student@station station]$ fgrep -nr dictionary /usr/share/dict
/usr/share/dict/linux.words:12526:dictionary
The cat
sat on
the mat
at home.
Suppose we wanted to retrieve all lines containing the word at. If we try the command:
[student@station student]$ fgrep at rhyme
The cat
sat on
the mat
at home.
Do you see what happened? We matched the at string, whether it was an isolated word or part of a
larger word. The grep command provides the -w switch to imply that the specified pattern should only
match entire words.
[student@station student]$ grep -w at rhyme
at home.
The -w switch considers a sequence of letters, numbers, and underscore characters, surrounded by
anything else, to be a word.
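The word rule can be checked on the rhyme file and on two extra test strings. A minimal sketch, assuming bash and GNU grep; it shows that an underscore counts as part of a word, while a hyphen does not.

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf 'The cat\nsat on\nthe mat\nat home.\n' > rhyme

grep -c at rhyme     # 4: "at" also appears inside cat, sat, and mat
grep -w at rhyme     # at home.
# "_" is a word character, "-" is not, so only the second line matches.
printf 'at_home\nat-home\n' | grep -w at    # at-home
```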
Ignoring Case
The string "Bob" has a meaning quite different from the string "bob". However, sometimes we want to
find either one, regardless of whether the word is capitalized or not. The grep -i command solves just
this problem.
Look again at our nursery rhyme:
[student@station student]$ cat rhyme
The cat
sat on
the mat
at home.
See if the file contains the word the, all in lowercase letters:
[student@station student]$ grep the rhyme
the mat
[student@station student]$ grep -in the rhyme
1:The cat
3:the mat
Notice that we also used the -n switch to add the line numbers to the output.
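The -i and -n switches can be compared side by side on the rhyme file. A minimal sketch, assuming bash and GNU grep:

```shell
#!/bin/bash
cd "$(mktemp -d)" || exit 1
printf 'The cat\nsat on\nthe mat\nat home.\n' > rhyme

grep the rhyme        # the mat (case-sensitive)
grep -i the rhyme     # The cat, the mat
grep -in the rhyme    # 1:The cat, 3:the mat
grep -ic the rhyme    # 2: switches combine freely
```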
Examples
Example 1. Finding Simple Character Strings
Verify that your computer has the system account lp, used for the line printer tools. Hint: the file
/etc/passwd contains one line for each user account on the system.
[student@station student]$ grep lp /etc/passwd
lp:x:4:7:lp:/var/spool/lpd:
[student@station student]$ grep LP /etc/passwd
Nothing was matched because the pattern does not match the case of the account name. Search again,
ignoring case:
[student@station student]$ grep -i LP /etc/passwd
lp:x:4:7:lp:/var/spool/lpd:
honey
honeybee
honeycomb
honeycombed
honeydew
honeymoon
honeymooned
Evidently, the dictionary contains several words using the string honey as a root word. We can limit
the matching to whole words by using the grep -w command. The grep command considers a word to be
a group of letters, digits, or underscores surrounded by anything else. The beginning and end of a line
also count as "anything else", so the first or last word on a line is recognized correctly. Try to look up
honey in the dictionary again:
[student@station student]$ grep -w honey /usr/share/dict/words
honey
/usr/share/doc/vim-common-6.1/docs/message.txt
/usr/share/doc/vim-common-6.1/docs/options.txt
/usr/share/doc/vim-common-6.1/docs/os_risc.txt
/usr/share/doc/vim-common-6.1/docs/tags
/usr/share/doc/vim-common-6.1/docs/todo.txt
/usr/share/doc/vim-common-6.1/docs/various.txt
/usr/share/doc/vim-common-6.1/docs/version5.txt
You would now like to open each of these files in the gedit text editor, so that you can make your edits.
You pipe the results of your search into the gedit command.
[student@station student]$ grep -ril commandline /usr/share/doc/vim* | gedit
The gedit editor opens, but with an empty buffer titled "untitled". This is not what you had meant! You
had wanted gedit to open the filenames that the grep command supplied on stdin, not stdin itself.
Unfortunately, that's not how gedit works. gedit (like most text editors) expects filenames to be supplied
as arguments on the command line, not on stdin.
Fortunately, there is a standard Linux (and Unix) utility that helps in just such situations: xargs. The
xargs command reads stdin and appends any words found there to the supplied command line, as
additional arguments. The following example should clarify. With your knowledge of the xargs
command, you modify your previous approach:
[student@station student]$ grep -ril commandline /usr/share/doc/vim* | xargs gedit
Now, the gedit editor opens up with multiple buffers, one for each file output by the grep command.
Figure 2-2. Using xargs to Convert Standard In into Arguments for gedit
Notice that you never had to type in the individual file names. The words supplied on stdin were
exchanged for arguments on the command line, thus the name xargs. Nice.
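The conversion xargs performs is easy to observe with echo standing in for gedit. A minimal sketch, assuming bash and GNU findutils xargs:

```shell
#!/bin/bash
# Three words arrive on stdin, one per line; xargs turns them into arguments.
printf 'alpha\nbeta\ngamma\n' | xargs echo           # alpha beta gamma
# -n 1 runs the command once per word instead of once for all of them.
printf 'alpha\nbeta\ngamma\n' | xargs -n 1 echo item:
```

If file names might contain spaces, GNU grep's -Z (--null) switch together with xargs -0 keeps each name intact, for example: grep -rilZ commandline /usr/share/doc/vim* | xargs -0 gedit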
Online Exercises
Lab Exercise
Objective: Use the grep command to find occurrences of specified text.
Estimated Time: 10 mins.
Specification
1. Create the file ~/bashusers.txt, which contains lines from the /etc/passwd file which contain
the text /bin/bash.
2. Create the file ~/nostdhome.txt, which contains only lines from the /etc/passwd file which do
not contain the text home (implying that the associated user has a nonstandard home directory).
3. Create the file ~/ansiterms.txt, which contains every line from the /etc/termcap file which
contains the text ansi, using a case insensitive search. (In other words, ansi, ANSI, Ansi, and AnSi
would all count as matches).
4. Create the file ~/mayhemnum.txt, which contains the line number of the word mayhem from the
file /usr/share/dict/words as its only word.
Deliverables
1. The file ~/bashusers.txt, which contains lines from the /etc/passwd file which contain the text /bin/bash.
2. The file ~/nostdhome.txt, which contains lines from the /etc/passwd file which do not contain the text
home.
3. The file ~/ansiterms.txt, which contains every line from the /etc/termcap file which contains the text
ansi, using a case insensitive search.
4. The file ~/mayhemnum.txt, which contains the line number of the word mayhem from the file
/usr/share/dict/words as its only word.
5. The file ~/firstredhat.txt, which contains an alphabetically sorted list of all files underneath the
/usr/share/firstboot directory that contain the text redhat, using a case insensitive comparison. The files
should be listed one per line using absolute references.
Questions
1. Which of the following command lines would list lines from the file /etc/group which contain the text elvis?
( ) a. grep /etc/group elvis
( ) b. echo elvis | grep /etc/group
( ) c. echo /etc/group | grep elvis
( ) d. A and C
( ) e. None of the above
2. To allow the search pattern HELLO to match both hello and HELLO, you would use the grep command with
which command line switch?
( ) a. -i
( ) b. -r
( ) c. -w
( ) d. -k
Notes
1. When the original grep program was written, computers did not have much memory, so having small
programs was very desirable. Having a single program that could efficiently implement the three
different types of searching was not possible unless the program were to be extraordinarily large, so
the functions were separated into three programs. When the GNU project was started, computers
could easily handle larger programs, so all three searching techniques were built into a single
program but all three program names were kept for compatibility with other UNIX-like systems.
2. There are some files under /etc/sysconfig that ordinary users cannot read. We have used the
2>/dev/null idiom to have all error messages be ignored.
Regular expressions are a standard Unix syntax for specifying text patterns.
Regular expressions are understood by many commands, including grep, sed, vi, and many scripting
languages.
Within regular expressions, ^ and $ specify the beginning and end of a line.
Discussion
Introducing Regular Expressions
In the previous chapter you saw grep used to match either a whole word or part of a word. This by itself
is very powerful, especially in conjunction with switches like -i and -v, but it is not appropriate for
all search scenarios. Here are some examples of searches that the grep usage you've learned so far would
not be able to do:
First, suppose you had a file that looked like this:
[biafra@station]$ cat people_and_pets.txt
==========================
Name: Joe Green
Age: 36
Pets:
    Name: Aida
    Age: 5
    Species: Cat
    -----------
    Name: Hawn
    Age: 1
    Species: Goldfish
==========================
Name: Sarah Jane
Age: 29
Pets:
What if you wanted to pull out just the names of the people in people_and_pets.txt? A command like
grep -w Name: would match the Name: line for each person, but also the Name: line for each
person's pet. How could we match only the Name: lines for people? Well, notice that the lines for pets'
names are all indented, meaning that those lines begin with whitespace characters instead of text. Thus,
we could achieve our goal if we had a way to say "Show me all lines that begin with Name:".
Another example: Suppose you and a friend both witnessed a hit-and-run car accident. You both got a
look at the fleeing car's license plate and yet each of you recalls a slightly different number. You read the
license number as "4I35VBB" but your friend read it as "413SV88". It seems that what you read as an I
in the second character, your friend read as a 1. Similar differences appear in your interpretations of
other parts of the license like 5 vs S and BB vs 88. The police, having taken both of your
statements, now need to narrow down the suspects by querying their database of license plates for plates
that might match what you saw.
One solution might be to do separate queries for "4I35VBB" and "413SV88" but doing so assumes that
one of you is exactly right. What if the perpetrator's license number was actually "4135VB8"? In other
words, what if you were right about some of the characters in question but your friend was right about
others? It would be more effective if the police could query for a pattern that effectively said: "Show me
all license numbers that begin with a 4, followed by an I or a 1, followed by a 3, followed by a 5
or an S, followed by a V, followed by two characters that are each either a B or an 8".
Query scenarios like these can be solved using regular expressions. While computer scientists sometimes
use the term "regular expression" (or "regex" for short) to describe any method of describing complex
patterns, in Linux and many programming languages the term refers to a very specific set of special
characters used for solving problems like the above. Regular expressions are supported by a large
number of tools including grep, vi, find and sed.
To introduce the usage of regular expressions, let's look at some solutions to the two problems introduced
earlier. Don't worry if these seem a bit complicated; the remainder of the unit will start from scratch and
cover regular expressions in great detail.
A regex that could solve the first problem, where we wanted to say "Show me all lines that begin with
Name:" might look like this:
[biafra@station]$ grep '^Name:' people_and_pets.txt
...that's it! Regular expressions are all about the use of special characters, called metacharacters, to
represent advanced query parameters. The caret ("^"), as shown here, means "Lines that begin with...".
Note, by the way, that the regular expression was put in single quotes. This is a good habit to get into
early on, as it prevents bash from interpreting special characters that were meant for grep.
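To see the anchor at work, the following minimal sketch recreates a cut-down copy of people_and_pets.txt (only the Name: lines, with the pet lines indented as in the listing above) in a scratch directory, then compares an unanchored search with the anchored one. The file contents are assumed sample data, not the full file.

```shell
#!/bin/bash
# Work in a scratch directory so no real file is overwritten.
cd "$(mktemp -d)" || exit 1

# Cut-down, hypothetical copy of people_and_pets.txt: the pet entries
# are indented, so their lines do not begin with "Name:".
printf '%s\n' \
    'Name: Joe Green' \
    '        Name: Aida' \
    '        Name: Hawn' \
    'Name: Sarah Jane' > people_and_pets.txt

# Unanchored: all four lines contain "Name:".
grep 'Name:' people_and_pets.txt

# Anchored: only the two lines that begin with "Name:" (the people).
grep '^Name:' people_and_pets.txt
```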
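OK, so what about the second problem? That one involved a much more complicated query: "Show me
all license numbers that begin with a 4, followed by an I or a 1, followed by a 3, followed by a 5
or an S, followed by a V, followed by two characters that are each either a B or an 8". A regex that
could solve it might look like this:
4[I1]3[5S]V[B8]{2}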
Wow, that's pretty short considering how long it took to write out what we were looking for! Only two
types of regex metacharacters are used here: square brackets ([]) and curly braces ({}). When two or
more characters are shown within square brackets, it means "any one of these". So [B8] near the end of
the expression means "B or 8". When a number is shown within curly braces, it means "this many of
the preceding character". Thus, [B8]{2} means "two characters that are each either a B or an 8".
Pretty powerful stuff!
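The sketch below tries the license-plate regular expression against a handful of made-up plate numbers. It uses grep -E, which behaves like egrep and is needed so that {2} is read as a repetition count; the plate list is hypothetical.

```shell
#!/bin/bash
# Hypothetical plate numbers; only the first three fit the witnesses' pattern.
plates='4I35VBB
413SV88
4135VB8
4135VBC
5I35VBB'

# 4, then I or 1, then 3, then 5 or S, then V, then two characters from {B, 8}.
echo "$plates" | grep -E '4[I1]3[5S]V[B8]{2}'
```

Because the pattern is not anchored, a longer string that merely contains a matching plate would also be listed; adding ^ and $, or grep's -x option, restricts matches to whole lines.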
Now that you've gotten a taste of what regular expressions are and how they can be used, let's start from
scratch and cover them in depth.
Wildcards
The "dot" wildcard
The character . is used as a placeholder to match any single character. In the following example, the
pattern matches any occurrence of the literal characters x and s separated by exactly two other
characters.
[student@station student]$ grep "x..s" /usr/share/dict/words | head -5
antitoxins
axers
axles
absenteeism
Achaean
Achaeans
acquaint
acquaintance
[student@station student]$ egrep '[aeiou][^aeiou][aeiou][^aeiou][aeiou][^aeiou]' /usr/share/dict/words | head -5
abased
abasement
abasements
abases
abasing
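The two searches above can be reproduced on a small scale. This sketch assumes a short, made-up word list in place of /usr/share/dict/words, so its output is predictable on any system.

```shell
#!/bin/bash
# Hypothetical word list standing in for /usr/share/dict/words.
words='axles
axis
toxins
abased
abbess'

# "." matches any single character: x, any two characters, then s.
echo "$words" | grep 'x..s'          # axles, toxins

# [aeiou] matches one vowel; [^aeiou] matches one character that is not a vowel.
echo "$words" | grep -E '[aeiou][^aeiou][aeiou][^aeiou][aeiou][^aeiou]'   # abased
```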
Range Expressions vs. Character Classes: Old School and New School
Another way to express a character range is by giving the start- and end-letters of the sequence this way:
[a-d] would match any character from the set a, b, c or d. A typical usage of this form would be
[0-9] to represent any single digit, or [A-Z] to represent all capital letters.
How are the characters ordered? For example, does uppercase C come before or after lowercase b?
Recall the discussion about character encoding from the first Lesson. The encoded value of the letter is
Character Class   Meaning                  ASCII Equivalent
[:alnum:]         alphanumeric             A-Za-z0-9
[:alpha:]         alphabet character       A-Za-z
[:blank:]         space or tab
[:digit:]         numeric digit            0-9
[:lower:]         lowercase letters        a-z
[:punct:]         punctuation character
[:space:]         whitespace character
[:upper:]         uppercase letter         A-Z
Character classes avoid problems you may run into when using regular expressions on systems that use
different character encoding schemes or locales, where letters are ordered differently. For example,
suppose you were to run the command:
[elvis@station]$ grep '[A-Z]' /usr/share/dict/words
On a Red Hat Enterprise Linux system using a UTF-8 locale such as en_US.UTF-8, this can match nearly
every word in the file, not just those that contain capital letters as one might assume. This is because
range expressions follow the locale's collation (sorting) order rather than the raw encoded values, and
that order interleaves uppercase and lowercase letters (aAbBcC...), so [A-Z] also covers most lowercase
letters. On older systems, or in the traditional C locale, characters are ordered by their encoded (ASCII)
values, where all capital letters come before all lowercase letters. On such systems [A-Z] is equivalent
to [ABC...Z].
Character classes avoid this pitfall. You can run:
[elvis@station]$ grep '[[:upper:]]' /usr/share/dict/words
on any system, regardless of the locale or encoding scheme being used, and it will match only lines that
contain capital letters.
For more details about the predefined character classes, consult the grep manual page. For more
information on character encoding schemes under Linux, refer back to chapter 8.3. To learn about how
character encoding schemes are used to support other languages in Red Hat Enterprise Linux, begin with
the locale manual page.
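The difference can be checked directly. In this minimal sketch the two sample words are assumptions; setting LC_ALL=C for a single grep forces byte-value ordering, while [[:upper:]] gives the same answer in any locale. What a bare [A-Z] matches in your own locale may vary with the locale and the grep version, so the sketch does not rely on it.

```shell
#!/bin/bash
# Hypothetical sample: one capitalized word, one all-lowercase word.
sample='Aaron
apple'

# In the C locale, ranges follow byte values, so [A-Z] means only A through Z.
echo "$sample" | LC_ALL=C grep '[A-Z]'      # Aaron

# [[:upper:]] means "uppercase letter" whatever the locale.
echo "$sample" | grep '[[:upper:]]'         # Aaron
```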
b?
The question mark (?) means either one or none: the literal character is considered to be
optional in the searched text. For example, the regex pattern ab?c matches the strings ac and
abc, but not abbc.
b*
The asterisk (*) modifier means any number (including zero) of the preceding literal
character. The regex pattern ab*c matches the strings ac, abc, abbc, and so on.
b+
The plus (+) modifier means one or more, so the regex pattern b+ matches a non-empty
sequence of b's. The regex pattern ab+c matches the strings abc and abbc, but does not
match ac.
b{m,n}
The brace modifier is used to specify a range of between m and n occurrences of the preceding
character. The regex pattern ab{2,4}c would match abbc, abbbc, and abbbbc, but not
abc or abbbbbc.
b{n}
With only one integer, the brace modifier is used to specify exactly n occurrences of the preceding
character. For example, the regex pattern ab{3}c matches abbbc, but not abbc or abbbbc.
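To compare the modifiers side by side, the sketch below runs each pattern from the list above against the same hypothetical strings (ac through abbbbbc) using grep -E, the equivalent of egrep, which treats ?, +, and {} as modifiers.

```shell
#!/bin/bash
# Hypothetical test strings with zero to five b's between a and c.
samples='ac
abc
abbc
abbbc
abbbbc
abbbbbc'

# Print each pattern followed by every sample it matches, on one line.
for pattern in 'ab?c' 'ab*c' 'ab+c' 'ab{2,4}c' 'ab{3}c'; do
    printf '%-9s %s\n' "$pattern" \
        "$(echo "$samples" | grep -E "$pattern" | paste -sd ' ' -)"
done
```

For instance, the ab{2,4}c line lists only abbc, abbbc, and abbbbc, and the ab{3}c line lists only abbbc.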
In the following example, egrep prints lines from /usr/share/dict/words that contain an a (capital or
lowercase), which might or might not be followed by a lowercase b, and which is then definitely
followed by a lowercase a.
[student@station student]$ egrep '[Aa]b?a' /usr/share/dict/words | head -5
Aarhus
Aaron
Ababa
aback
abaft
The following example prints lines that contain the letters al, followed by zero or more occurrences of
any character (the . wildcard with the * modifier), followed by the letters bra.
[student@station student]$ egrep 'al.*bra' /usr/share/dict/words
algebra
algebraic
algebraically
algebras
Notice we found variations on the words algebra and calibrate. For the former, the .* expression
matched ge, while for the latter, it matched the letter i.
The expression .*, which is interpreted as "0 or more of any character", shows up often in regex
patterns, acting as the "stretchable glue" between two patterns of significance.
As a subtlety, we should note that the modifier characters are greedy: they always match the longest
possible input string. For example, given the regex pattern:
t.*e
instead of just the. When used in simple searches, such as with grep, the difference is usually
insignificant. When regular expressions are used in find-and-replace operations, however, as is done
with many text editors, the difference becomes significant.
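Because the input string for the t.*e example is not shown here, this sketch assumes a sample line of its own. grep -o prints only the text that matched, which makes the greedy match visible, and sed's s command shows how much text a find-and-replace would swap out.

```shell
#!/bin/bash
# Hypothetical sample line with several t...e candidates.
line='the cat sat on the mat'

# .* is greedy: the match runs from the first "t" to the LAST "e".
echo "$line" | grep -o 't.*e'        # the cat sat on the

# In a find-and-replace, greediness decides how much text is replaced.
echo "$line" | sed 's/t.*e/X/'       # X mat

# Workaround: exclude "e" from the repeated part, so the match
# has to stop at the first "e" after the "t".
echo "$line" | sed 's/t[^e]*e/X/'    # X cat sat on the mat
```

The [^e]* form is a common workaround in basic and extended regular expressions, which have no non-greedy modifier: by refusing to step over an e, the match has to end at the first one.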
Anchored Searches
Four additional search modifier characters are available:
^foo
A caret (^) matches the beginning of a line. Our example ^foo matches the string foo only
when it is at the beginning of a line.
foo$
A dollar sign ($) matches the end of a line. Our example foo$ matches the string foo only at
the end of a line, immediately before the newline character.
\<foo\>
By themselves, the less than sign (<) and the greater than sign (>) are literals. Escaping them with
the backslash character turns them into anchors meaning beginning of a word and end of a
word, respectively. Thus the pattern \<cat\> matches the word cat but not the word
catalog.
You will frequently see both ^ and $ used together. The regex pattern ^foo$ matches a line that
contains only foo, and would not match that line if it contained anything else, even a space.
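The four anchors can be tried on a few sample lines. The lines in this sketch are hypothetical; note that \< and \> are GNU extensions, so other grep implementations may not accept them.

```shell
#!/bin/bash
# Hypothetical sample lines for trying each anchor.
sample='cat
catalog
the cat sat
bobcat
foo
 foo'

echo "$sample" | grep '^cat'       # cat, catalog
echo "$sample" | grep 'cat$'       # cat, bobcat
echo "$sample" | grep '\<cat\>'    # cat, the cat sat
echo "$sample" | grep '^foo$'      # foo (but not " foo")
```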
abbreviation
abbreviations
abduction
abductions
aberration
[student@station student]$ egrep 'ion$' /usr/share/dict/words | head -5
abbreviation
abduction
aberration
abjection
ablation
aqueous
dequeue
dequeued
dequeues
dequeuing
[student@station student]$ egrep '(o|e){2}.*ee' /usr/share/dict/words
bookkeeper
bookkeepers
bookkeeping
Chattahoochee
doorkeeper
Escaping Meta-Characters
Sometimes you need to match a character that would ordinarily be interpreted as a regular expression
wildcard or modifier character. To temporarily disable the special meaning of these characters, simply
escape them using the backslash (\) character. For example, the regex pattern cat. would match the
letters cat followed by any character, as in cats or catchup. To match only the literal text cat., as at
the end of a sentence, use the regex pattern cat\. so that the period is not interpreted as a wildcard
character.
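A quick check of the cat. example, using three hypothetical lines:

```shell
#!/bin/bash
# Hypothetical lines: two words beginning with "cat", one sentence ending in "cat."
lines='cats
catchup
the cat.'

# Unescaped, "." is a wildcard, so all three lines match.
echo "$lines" | grep 'cat.'

# Escaped, "\." is a literal period, so only "the cat." matches.
echo "$lines" | grep 'cat\.'
```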
Note one distracting exception to this rule. When the backslash character precedes a < or >
character, it enables the special interpretation (anchoring the beginning or ending of a word) instead of
disabling the special interpretation. Shudder. It even gets worse - see the footnote at the bottom of the
following table.
Character(s)     Role       Regex Syntax           Interpretation
.                wildcard   basic                  match any single character
[abc], [a-z]     wildcard   basic                  inclusion range
[^abc], [^a-z]   wildcard   basic                  exclusion range
?                modifier   extended               match 0 or 1 of preceding term
*                modifier   basic                  match 0 or more of preceding term
+                modifier   extended               match 1 or more of preceding term
{m,n}            modifier   extended               match m to n occurrences of the preceding term
{n}              modifier   extended               match exactly n occurrences of the preceding term
^                anchor     basic                  anchor the beginning of a line
$                anchor     basic                  anchor the end of a line
\<               anchor     basic                  mark beginning of a word
\>               anchor     basic                  mark end of a word
(...)            grouping   basic                  group terms together
(... | ...)      grouping   extended               match either alternative
\                escape     extended (basic) [a]   remove the special meaning of the next character
Notes:
a. When using extended regular expressions, the backslash (usually) strips special interpretation from
the following character. Red Hat Enterprise Linux uses GNU extensions when parsing basic regular
expressions, however, which use the backslash to enable extendedish interpretation of the following
character. For example, the expression e\{3\} would match eee when using basic regular
expressions. Shudder-shudder.
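The footnote is easiest to believe after trying it. This sketch contrasts basic syntax (plain grep) with extended syntax (grep -E, equivalent to egrep) on the same two strings; the strings are arbitrary.

```shell
#!/bin/bash
# Basic syntax (plain grep): a bare "{" is literal, while \{3\}
# means "exactly three of the preceding character".
echo 'eee'  | grep 'e\{3\}'       # prints eee
echo 'e{3}' | grep 'e{3}'         # prints e{3} (literal braces)

# Extended syntax (grep -E): the roles are swapped.
echo 'eee'  | grep -E 'e{3}'      # prints eee
echo 'e{3}' | grep -E 'e\{3\}'    # prints e{3} (escaped braces are literal)
```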
File Globbing
In the following example, the first argument is a regular expression, specifying text which starts with an
l and ends with .conf, while the second argument is a file glob which specifies all files in the /etc
Take a close look at the second line of output. Why was it matched by the specified regular expression?
In a similar vein, when specifying regular expressions on the bash command line, care must be taken to
quote or escape the regex meta-characters, lest they be expanded away by the bash shell with unexpected
results. In all of the examples found in this discussion, the first argument to the egrep command is
protected with single quotes for just this reason.
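This sketch shows why the quoting matters. It uses printf in place of grep so you can see exactly which word the command receives; the file name x.conf is hypothetical and is created in a scratch directory.

```shell
#!/bin/bash
# Work in an empty scratch directory, then create one file whose name
# happens to fit the bracket expression below (hypothetical file name).
cd "$(mktemp -d)" || exit 1
touch x.conf

# Unquoted: bash treats [xyz].conf as a file glob and replaces it with
# the matching file name before the command runs. A grep given this
# word unquoted would receive the pattern "x.conf" instead.
printf '%s\n' [xyz].conf        # x.conf

# Quoted: the text reaches the command exactly as typed.
printf '%s\n' '[xyz].conf'      # [xyz].conf
```

If no file matches a glob, bash by default leaves the word unchanged, which is why an unquoted pattern often appears to work until a matching file happens to exist in the current directory.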
Examples
Example 1. Literal Searches
Now that we understand regular expressions in more detail, let us revisit some earlier examples and see
them in a new light.
Given the file rhyme that contains the text:
[student@station student]$ cat rhyme
the at in at.
The regular expression \<at\> matches only the individual word at.
student]$
student]$
student]$
student]$
student]$
echo
echo
echo
echo
grep
car
far
[student@station student]$ echo ac > file
[student@station student]$ echo abc >> file
[student@station student]$ echo abbc >> file
[student@station student]$ echo abbbc >> file
The question mark (?) matches zero or one occurrence of the preceding specifier; that is, the specifier is optional:
[student@station student]$ egrep 'ab?c' file
ac
abc
The plus sign (+) matches one or more of the preceding specifier:
[student@station student]$ egrep 'ab+c' file
abc
abbc
abbbc
The asterisk (*) matches any number of occurrences, including zero, of the preceding specifier:
[student@station student]$ egrep 'ab*c' file
ac
abc
abbc
abbbc
[student@station student]$ echo i am sam > file
[student@station student]$ echo sam i am >> file
[student@station student]$ echo am i sam >> file
[student@station student]$ echo sam >> file
[student@station student]$ egrep '^sam' file
sam i am
sam
[student@station student]$ egrep 'sam$' file
i am sam
am i sam
sam
[student@station student]$ egrep '^sam$' file
sam
Where ^ and $ anchor to lines, the anchors \< and \> match the beginnings and ends of words:
[student@station student]$ egrep '\<am\>' file
i am sam
sam i am
am i sam
Perhaps we did not match the greeting because we forgot to add the period after the abbreviation. This
regex pattern would match either way:
^Dear (Dr|Mr|Ms)\.?
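The pattern can be tested against a few made-up greetings with grep -E (equivalent to egrep).

```shell
#!/bin/bash
# Hypothetical letter openings to test against.
greetings='Dear Dr. Jekyll,
Dear Mr Hyde,
Dear Ms. Marple,
Dear Sir,
Dear Mrs. Hudson,'

# (Dr|Mr|Ms) matches one title; \.? makes the period after it optional.
echo "$greetings" | grep -E '^Dear (Dr|Mr|Ms)\.?'

# Requiring a space after the optional period also rejects "Mrs.".
echo "$greetings" | grep -E '^Dear (Dr|Mr|Ms)\.? '
```

Notice that Dear Mrs. Hudson matches the first pattern: \.? may match nothing, and nothing after it stops Mr from matching the start of Mrs. Requiring a space after the optional period, as in the second command, excludes it.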
#!/bin/bash
if [ ! $# == 1 ]; then
In this stanza, the script ensures it was passed exactly one argument.
This line contains the interesting regular expression. The grep command will look for a line which
begins with the argument, followed by a colon (:). Recalling the structure of the /etc/passwd file,
only usernames satisfy these conditions.
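The regular expression at the heart of such a script can be sketched as a small shell function. This is a hypothetical illustration, not blondie's script: the function name user_exists, its optional second argument, and the sample passwd lines are all assumptions. The core is the anchored pattern ^name: described above.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the username in $1 is the first field
# of a passwd-format file ($2, defaulting to /etc/passwd).
user_exists() {
    local passwd_file=${2:-/etc/passwd}
    # ^ anchors the name to the start of the line; the trailing ":" stops
    # "elv" from matching the longer name "elvis".
    grep -q "^$1:" "$passwd_file"
}

# Demo against a small made-up file rather than the system's /etc/passwd.
demo=$(mktemp)
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
              'elvis:x:500:500::/home/elvis:/bin/bash' > "$demo"

for name in elvis barney elv; do
    if user_exists "$name" "$demo"; then
        echo "$name: found"
    else
        echo "$name: not found"
    fi
done
```

One caveat of this approach: the username is inserted into the regular expression as-is, so a name containing metacharacters such as . would be treated as a pattern rather than as literal text.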
Saving the file, and making it executable, blondie tries out the script on the (existing) user elvis and
(non-existing) user barney.
[blondie@station blondie]$ mv inhouse bin/
[blondie@station blondie]$ chmod a+x bin/inhouse
[blondie@station blondie]$ inhouse elvis
./hwdata-0.75/COPYING:
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
./hwdata-0.75/COPYING:
Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA 02111-1307 USA
./libart_lgpl-2.3.11/COPYING:
59 Temple Place, Suite 330,
Boston, MA 02111-1307 USA
...
After observing the first few lines, elvis realizes that his regex pattern is too general. He is matching zip
codes as well as phone numbers. He refines his pattern, specifying that the characters immediately before
and after the number must not be digits.
[elvis@station doc]$ egrep -r '[^[:digit:]][[:digit:]]{3}(-| )[[:digit:]]{4}[^[:digit:]]' .
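To see why the refinement helps, this sketch runs the original and refined patterns over three hypothetical lines: one with a phone number, one with the ZIP+4 code seen in the output above, and one with a plain ZIP code.

```shell
#!/bin/bash
# Hypothetical sample text: a phone number, a ZIP+4 code, and a plain ZIP.
sample='call 555-1234 today
Boston, MA 02111-1307 USA
Milpitas, CA 95035'

# Original idea: three digits, a dash or space, then four digits.
# This also matches "111-1307" inside the ZIP+4 code.
echo "$sample" | grep -E '[[:digit:]]{3}(-| )[[:digit:]]{4}'

# Refined: the match must also have a non-digit on each side,
# so the digits cannot be part of a longer number.
echo "$sample" | grep -E '[^[:digit:]][[:digit:]]{3}(-| )[[:digit:]]{4}[^[:digit:]]'
```

One trade-off of the refined pattern: because it requires a non-digit on each side, it no longer matches a phone number that sits at the very beginning or end of a line.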
This time, elvis's search proceeds much better, until he hits the file esound.ps. This file contains
PostScript, which routinely uses numbers written in ASCII text to specify coordinates. Knowing that he
was not looking for a PostScript file, elvis devises a way to exclude all files that end with the .ps
extension from his search. He first uses the find command to list every file in the directory. He next
greps the output down to the files that do not end in .ps. He then uses the xargs command to feed these
filenames into his original grep command as arguments. Because his files are now being specified
individually as command line arguments, he no longer needs to grep recursively.
[elvis@station doc]$ find . | egrep -v '\.ps$' | xargs egrep \
'[^[:digit:]][[:digit:]]{3}(-| )[[:digit:]]{4}[^[:digit:]]'
After observing a few more lines of output, elvis realizes he should also exclude files that end in .fig and
.pdf from his search, as they also contain many ASCII numbers and are cluttering his output.
Modifying the regular expression in his first grep command, he repeats his search.
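The exclusion step on its own can be tried against a list of hypothetical file names standing in for find's output, using grep -Ev (equivalent to egrep -v):

```shell
#!/bin/bash
# Hypothetical file names, as find might print them.
files='./doc/esound.ps
./doc/README
./doc/manual.pdf
./doc/diagram.fig
./doc/jfif.ps.gz'

# -v keeps only the lines that do NOT match; $ ties the extension
# to the end of the name.
echo "$files" | grep -Ev '\.(ps|pdf|fig)$'   # ./doc/README, ./doc/jfif.ps.gz
```

Anchoring the alternation with $ matters: without it, a name such as jfif.ps.gz would be dropped too, because it contains .ps.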
...
Now that the search seems to be going well, elvis revises the output formatting, asking grep not to
display filenames (-h) and to give 2 lines of context around each phone number (-C2).
[elvis@station doc]$ find . | egrep -v '\.(ps|pdf|fig)$' |
xargs egrep -h -C2 '[^[:digit:]][[:digit:]]{3}(-| )[[:digit:]]{4}[^[:digit:]]'
it's much cheaper and includes a great deal of useful explanatory material.)
In the USA, copies of the standard may be ordered from ANSI Sales at (212)
642-4900, or from Global Engineering Documents at (800) 854-7179. (ANSI
doesn't take credit card orders, but Global does.) It's not cheap: as of
1992, ANSI was charging $95 for Part 1 and $47 for Part 2, plus 7%
-1778 McCarthy Blvd.
Milpitas, CA 95035
phone (408) 944-6300, fax (408) 944-6314
A PostScript version of this document is available by FTP at
ftp://ftp.uu.net/graphics/jpeg/jfif.ps.gz. There is also a plain text
-The Free Software Foundation sells tapes and CD-ROMs
containing Bash; send electronic mail to
\f(CRgnu@prep.ai.mit.edu\fP or call \f(CR+1-617-876-3296\fP
for more information.
.PP
-# -# Aharon (Arnold) Jones
arnold@sleeve.com [ <<=== NOTE: NEW ADDRESS!! ]
# P.O. Box 354
Home Phone: +972 8 989-0381
Fax: +1 603 761-6761
# Nof Ayalon
Cell Phone: +972 51 227-545
(See www.efax.com)
# D.N. Shimshon 97784
Laundry increases exponentially in the
-The Ohio State University
http://www.math.ohio-state.edu/~nevai/
231 West Eighteenth Avenue
http://www.math.ohio-state.edu/~jat/
Columbus, Ohio 43210-1174
1-614-292-5310 (Office/Answering Device)
The United States of America
1-614-292-1479 (Math Dept Fax)
-,-*~^~*-,._.,-*~^~*-,._.,-*~^~*-,._.,-*~^~*-,._.,-*~^~*-,
Joe Farwell
| phone 610-843-6020
| Platinum technology
Systems Administrator | vmail 800-123-9096 x7512 | 620 W. Germantown Pike
joe@platinum.com
| fax
610-872-6021
| Plymouth Meeting,Pa,19462
~*-,._.,-*~^~*-,._.,-*~^~*-,._.,-*~^~*-,._.,-*~^~*-,._.,-*~
delay needs to be calibrated using outside sources.
...
Note that names and numbers have been altered in this output.
All told, elvis ends up with 289 "hits", which he can skim in a reasonable amount of time.
Online Exercises
Lab Exercise
Objective: Use regular expressions to search for patterns of text.
Estimated Time: 45 mins.
Specification
1. Create a short executable bash script named ~/bin/ispython, which expects a single argument,
which is a filename. If the supplied filename's first line is exactly #!/usr/bin/python (nothing
more, nothing less), the script should print the number 1. Otherwise, the script should print the
number 0.
2. You are looking for files in the /etc directory (but not subdirectories) that contain a standard United
States long distance phone number, written using the pattern of 1-###-###-####, where each # is
replaced with a numeric digit. Collect the filenames of every file in the /etc directory which
contains such a pattern of numbers, and place them in the file ~/etcphone.txt, one file name per
line, sorted alphabetically, using absolute references.
3. The file /usr/share/doc/bash-*/NEWS contains many itemized lists, with list items marked by
lines whose first characters are a series of one or more letters, followed by a period and space, as in
the following:
y.   New prompting expansions: \a, \e, \H, \T, \@, \v, \V.
z.
Create the following files, each of which contains the number which answers the specified question
as its single word.
filename           question
newsitems.txt      How many lines begin with a series of one or more letters, followed by a period?
newsitems23.txt    How many lines begin with a series of two or three letters, followed by a period?
newsitems2.txt     How many lines begin with a series of exactly two letters, followed by a period?
newsitems3.txt     How many lines begin with a series of exactly three letters, followed by a period?
4. The file /usr/share/dict/words contains a collection of common dictionary words, stored one
per line. Both common words and proper names are included, each appropriately capitalized.
Using only the egrep command, determine which words start with a capital letter followed only by
vowels. Do not include single letter words. (For the purposes of this exercise, consider vowels as
only the letters A, E, I, O, or U, both uppercase and lowercase.)
List these words, one per line and sorted alphabetically, in the file ~/vowel2.txt.
Deliverables
1. A script called ~/bin/ispython, which, when executed with a single filename as an argument, will print 1 if
the specified file's first line is exactly #!/usr/bin/python. Otherwise, the script should print 0 (hint: This can
be accomplished by combining the head and grep commands).
2. The file ~/etcphone.txt, which contains a list of all files in the /etc directory (but not subdirectories) which
contain the pattern 1-###-###-####, where each # is replaced by a numeric digit. The files should be listed as
absolute references, one per line, alphabetized.
3. The files ~/newsitems.txt, ~/newsitems23.txt, ~/newsitems2.txt, and ~/newsitems3.txt, each
of which contains a single number as its only word. The number should be the answer to the respective
question about the file /usr/share/doc/bash-*/NEWS in the table above.
4. The file ~/vowel2.txt, which contains an alphabetically sorted list of all words from
/usr/share/dict/words which start with a capital letter followed only by vowels. (Exclude single letter
words).
Questions
In all of the following questions, the term regular expression implies extended regular expression syntax.
1. Which of the following characters is a regular expression literal character?
( ) a. ?
( ) b. _
( ) c. *
rha030-3.0-0-en-2005-08-17T07:23:17-0400
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/
# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
/etc/crontab?
10. Which of the following lines would match the regular expression?
( ) a. Resent-Cc: elvis@example.com
( ) b. Original-Resent-Bcc: elvis@example.com
( ) c. To: elvis@example.com
( ) d. All of the above
( ) e. A and C only
(It could have been worse... the following regular expression is also found in the procmailrc(5) man
page.)
(^(Mailing-List:|Precedence:.*(junk|bulk|list)|To: Multiple
recipients of |(((Resent-)?(From|Sender)|X-Envelope-From):|>?From
)([^>]*[^(.%@a-z0-9])?(Post(ma?(st(e?r)?|n)|office)|(send)?Mail(er)?
daemon|m(mdf|ajordomo)|n?uucp|LIST(SERV|proc)|NETSERV|o(wner|ps)
|r(e(quest|sponse)|oot)|b(ounce|bs\.smtp)|echo|mirror|s(erv(ices?|er)
|mtp(error)?|ystem)|A(dmin(istrator)?|MMGR|utoanswer))(([^).!:a-z0-9][-_a-z0-9]*)?[%@>\t ][^<)]*(\(.*\).*)?)?$([^>]|$)))
Discussion
In previous Workbooks, we have introduced the sort command in its simplest form: a tool for arranging
the lines of a file or output from a command alphabetically. This Lesson will present the sort command
in more detail.
DEVICE=/dev/psaux
FULLNAME="Generic - 2 Button Mouse (PS/2)"
MOUSETYPE="ps/2"
XEMU3="yes"
XMOUSETYPE="PS/2"
If called with arguments, the arguments are interpreted as (possibly multiple) filenames to be sorted. If
called without arguments, the sort command sorts whatever it reads from standard input.
Effect
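-b, --ignore-leading-blanks    Ignore leading blanks.
-d, --dictionary-order         Consider only blanks and alphanumeric characters.
-f, --ignore-case              Fold lowercase characters to uppercase (ignore case).
-g, --general-numeric-sort     Compare according to general numerical value (handles floating-point notation such as 1.5e3).
-n, --numeric-sort             Compare according to string numerical value.
-r, --reverse                  Reverse the result of comparisons.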
As an example, madonna is examining the file sizes of all files that start with an m in the /var/log
directory.
[madonna@station madonna]$ ls -s1 /var/log/m*
20 /var/log/maillog
3104 /var/log/maillog.1
1552 /var/log/maillog.2
1952 /var/log/maillog.3
1236 /var/log/maillog.4
4 /var/log/messages
384 /var/log/messages.1
636 /var/log/messages.2
216 /var/log/messages.3
560 /var/log/messages.4
[madonna@station madonna]$ ls -s /var/log/m* | sort
1236 /var/log/maillog.4
1552 /var/log/maillog.2
1952 /var/log/maillog.3
20 /var/log/maillog
216 /var/log/messages.3
3104 /var/log/maillog.1
384 /var/log/messages.1
4 /var/log/messages
560 /var/log/messages.4
636 /var/log/messages.2
Without being told otherwise, the sort command sorted the lines alphabetically (with 1952 coming
before 20). Realizing this is not what she intended, madonna adds the -n command line switch.
[madonna@station madonna]$ ls -s /var/log/m* | sort -n
4 /var/log/messages
20 /var/log/maillog
216 /var/log/messages.3
384 /var/log/messages.1
560 /var/log/messages.4
636 /var/log/messages.2
1236 /var/log/maillog.4
1552 /var/log/maillog.2
1952 /var/log/maillog.3
3104 /var/log/maillog.1
Better, but madonna would prefer to reverse the sort order, so that the largest files come first. She adds
the -r command line switch.
[madonna@station madonna]$ ls -s /var/log/m* | sort -nr
3104 /var/log/maillog.1
1952 /var/log/maillog.3
1552 /var/log/maillog.2
1236 /var/log/maillog.4
636 /var/log/messages.2
560 /var/log/messages.4
384 /var/log/messages.1
216 /var/log/messages.3
20 /var/log/maillog
4 /var/log/messages
Why ls -1?: Why was the -1 command line switch given to the ls command in the first example, but
not the others? By default, when the ls command is writing to a terminal, it groups the filenames into
multiple columns for readability. When its standard output is a pipe or a file, however, it prints one
file per line. The -1 command line switch forces one-file-per-line output to a terminal as well.
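The three orderings above (plain, -n, and -nr) can be imitated in a few lines of Python. This is only a sketch of the comparisons involved, not how sort is implemented: plain sorting compares lines character by character, so "1952" precedes "20", while converting the leading field to an integer first (roughly what sort -n does) gives numeric order. The helper name leading_number is hypothetical, and the data is copied from madonna's listing.

```python
# Block sizes and names copied from madonna's "ls -s1 /var/log/m*" listing
lines = [
    "20 /var/log/maillog",
    "3104 /var/log/maillog.1",
    "1552 /var/log/maillog.2",
    "1952 /var/log/maillog.3",
    "1236 /var/log/maillog.4",
    "4 /var/log/messages",
    "384 /var/log/messages.1",
    "636 /var/log/messages.2",
    "216 /var/log/messages.3",
    "560 /var/log/messages.4",
]

def leading_number(line):
    """Hypothetical helper: numeric value of the first field (roughly sort -n)."""
    return int(line.split()[0])

# Plain sort: character-by-character comparison, so "1952" sorts before "20"
alphabetical = sorted(lines)

# Numeric sort (roughly sort -n): compare the values of the leading numbers
numeric = sorted(lines, key=leading_number)

# Reversed numeric sort (roughly sort -nr): largest files first
reverse_numeric = sorted(lines, key=leading_number, reverse=True)

for line in reverse_numeric:
    print(line)
```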
Effect
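-k, --key=POS                  Sort using the key that starts at field POS (fields are numbered from 1); unless an end position is given, the key extends to the end of the line.
-t, --field-separator=SEP      Use the character SEP, rather than the transition from non-blank to blank, to separate fields.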
-rw------- 1 root root 1260041 Sep 14 04:05 /var/log/maillog.4
-rw------- 1 root root 1581750 Sep 28 06:15 /var/log/maillog.2
-rw------- 1 root root 1993522 Sep 22 10:16 /var/log/maillog.3
-rw------- 1 root root 216885 Sep 22 10:22 /var/log/messages.3
-rw------- 1 root root 31187 Oct 5 06:05 /var/log/maillog
-rw------- 1 root root 3172217 Oct 5 04:05 /var/log/maillog.1
-rw------- 1 root root 387345 Oct 5 04:07 /var/log/messages.1
-rw------- 1 root root 567049 Sep 14 04:08 /var/log/messages.4
-rw------- 1 root root 644859 Sep 28 06:22 /var/log/messages.2
-rw------- 1 root root 651 Oct 5 05:40 /var/log/messages
Now that the sizes are no longer reported at the beginning of each line, the lines are no longer ordered by size. She
repeats her sort, this time using the -k command line switch to sort her output by the 5th column (the file size), producing the
desired output.
[madonna@station madonna]$ ls -l /var/log/m* | sort -n -k5
-rw------- 1 root root 651 Oct 5 05:40 /var/log/messages
-rw------- 1 root root 31187 Oct 5 06:05 /var/log/maillog
-rw------- 1 root root 216885 Sep 22 10:22 /var/log/messages.3
-rw------- 1 root root 387345 Oct 5 04:07 /var/log/messages.1
-rw------- 1 root root 567049 Sep 14 04:08 /var/log/messages.4
-rw------- 1 root root 644859 Sep 28 06:22 /var/log/messages.2
-rw------- 1 root root 1260041 Sep 14 04:05 /var/log/maillog.4
-rw------- 1 root root 1581750 Sep 28 06:15 /var/log/maillog.2
-rw------- 1 root root 1993522 Sep 22 10:16 /var/log/maillog.3
-rw------- 1 root root 3172217 Oct 5 04:05 /var/log/maillog.1
360/360     720  9 2 40 0 0x2A 0x02 0xDF 0x50
1200/1200  2400 15 2 80 0 0x1B 0x00 0xDF 0x54
360/720     720  9 2 40 1 0x2A 0x02 0xDF 0x50
720/720    1440  9 2 80 0 0x2A 0x02 0xDF 0x50
720/1440   1440  9 2 80 0 0x2A 0x02 0xDF 0x50
360/1200    720  9 2 40 1 0x23 0x01 0xDF 0x50
720/1200   1440  9 2 80 0 0x23 0x01 0xDF 0x50
1440/1440  2880 18 2 80 0 0x1B 0x00 0xCF 0x6C
1440/1200  2880 18 2 80 0 ???? ???? ???? ???? # ?????
1680/1440  3360 21 2 80 0 0x0C 0x00 0xCF 0x6C # ?????
cbm1581    1600 10 2 80 2 0x2A 0x02 0xDF 0x2E
800/720    1600 10 2 80 0 0x2A 0x02 0xDF 0x2E
She next sorts the data numerically, using the 5th column as her key.
[madonna@station madonna]$ grep "^[[:alnum:]]" /etc/fdprm | sort -n -k5
360/1200    720  9 2 40 1 0x23 0x01 0xDF 0x50
360/360     720  9 2 40 0 0x2A 0x02 0xDF 0x50
360/720     720  9 2 40 1 0x2A 0x02 0xDF 0x50
1200/1200  2400 15 2 80 0 0x1B 0x00 0xDF 0x54
1440/1200  2880 18 2 80 0 ???? ???? ???? ???? # ?????
1440/1440  2880 18 2 80 0 0x1B 0x00 0xCF 0x6C
1680/1440  3360 21 2 80 0 0x0C 0x00 0xCF 0x6C # ?????
720/1200   1440  9 2 80 0 0x23 0x01 0xDF 0x50
720/1440   1440  9 2 80 0 0x2A 0x02 0xDF 0x50
720/720    1440  9 2 80 0 0x2A 0x02 0xDF 0x50
800/720    1600 10 2 80 0 0x2A 0x02 0xDF 0x2E
cbm1581    1600 10 2 80 2 0x2A 0x02 0xDF 0x2E
Her data is successfully sorted using the 5th column, with the formats specifying 40 tracks grouped at the
top, and 80 tracks grouped at the bottom. Within these groups, however, she would like to sort the data
by the 3rd column. She adds an additional -k command line switch to the sort command, specifying the
third column as her secondary key.
[madonna@station madonna]$ grep "^[[:alnum:]]" /etc/fdprm | sort -n -k5 -k3
360/1200    720  9 2 40 1 0x23 0x01 0xDF 0x50
360/360     720  9 2 40 0 0x2A 0x02 0xDF 0x50
360/720     720  9 2 40 1 0x2A 0x02 0xDF 0x50
720/1200   1440  9 2 80 0 0x23 0x01 0xDF 0x50
720/1440   1440  9 2 80 0 0x2A 0x02 0xDF 0x50
720/720    1440  9 2 80 0 0x2A 0x02 0xDF 0x50
800/720    1600 10 2 80 0 0x2A 0x02 0xDF 0x2E
cbm1581    1600 10 2 80 2 0x2A 0x02 0xDF 0x2E
1200/1200  2400 15 2 80 0 0x1B 0x00 0xDF 0x54
1440/1200  2880 18 2 80 0 ???? ???? ???? ???? # ?????
1440/1440  2880 18 2 80 0 0x1B 0x00 0xCF 0x6C
1680/1440  3360 21 2 80 0 0x0C 0x00 0xCF 0x6C # ?????
Now the data has been sorted primarily by the fifth column. For rows with identical fifth columns, the
third column has been used to determine the final order. An arbitrary number of keys can be specified by
adding more -k command line switches.
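The multiple-key ordering can be mimicked in Python by sorting on a tuple. This is a rough sketch that assumes whitespace-separated fields and the C locale (plain character-code order). One detail it reproduces: when all keys compare equal, GNU sort falls back to comparing entire lines, which is why the last element of each tuple is the line itself. The names FDPRM and fdprm_key are hypothetical; the rows are the twelve /etc/fdprm entries shown above.

```python
# The twelve /etc/fdprm entries from madonna's listing
FDPRM = """\
360/360     720  9 2 40 0 0x2A 0x02 0xDF 0x50
1200/1200  2400 15 2 80 0 0x1B 0x00 0xDF 0x54
360/720     720  9 2 40 1 0x2A 0x02 0xDF 0x50
720/720    1440  9 2 80 0 0x2A 0x02 0xDF 0x50
720/1440   1440  9 2 80 0 0x2A 0x02 0xDF 0x50
360/1200    720  9 2 40 1 0x23 0x01 0xDF 0x50
720/1200   1440  9 2 80 0 0x23 0x01 0xDF 0x50
1440/1440  2880 18 2 80 0 0x1B 0x00 0xCF 0x6C
1440/1200  2880 18 2 80 0 ???? ???? ???? ???? # ?????
1680/1440  3360 21 2 80 0 0x0C 0x00 0xCF 0x6C # ?????
cbm1581    1600 10 2 80 2 0x2A 0x02 0xDF 0x2E
800/720    1600 10 2 80 0 0x2A 0x02 0xDF 0x2E
"""

rows = FDPRM.splitlines()

def fdprm_key(line, *fields):
    """Hypothetical helper: numeric values of the given 1-based fields,
    followed by the whole line as a last-resort tie-breaker."""
    parts = line.split()
    return tuple(int(parts[n - 1]) for n in fields) + (line,)

# Roughly "sort -n -k5": tracks only, ties broken by the whole line
by_tracks = sorted(rows, key=lambda line: fdprm_key(line, 5))

# Roughly "sort -n -k5 -k3": tracks first, then sectors per track
by_tracks_then_sectors = sorted(rows, key=lambda line: fdprm_key(line, 5, 3))

for line in by_tracks_then_sectors:
    print(line)
```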
Specifying the Field Separator
The above examples have demonstrated how to sort data using a specified field as the sort key. In all of
the examples, fields were separated by whitespace (i.e., a series of spaces and/or tabs). Often in Linux
(and Unix), some other method is used to separate fields. Consider, for example, the /etc/passwd file.
[madonna@station madonna]$ head /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
news:x:9:13:news:/etc/news:
The lines are structured into seven fields each, but the fields are separated using a : instead of
whitespace. With the -t command line switch, the sort command can be instructed to use some specified
character (such as a :) to separate fields.
In the following, madonna uses the sort command with the -t command line switch to sort the first 10
lines of the /etc/passwd file by home directory (the 6th field).
[madonna@station madonna]$ head /etc/passwd | sort -t: -k6
bin:x:1:1:bin:/bin:/sbin/nologin
news:x:9:13:news:/etc/news:
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
halt:x:7:0:halt:/sbin:/sbin/halt
daemon:x:2:2:daemon:/sbin:/sbin/nologin
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
The user bin, with a home directory of /bin, is now at the top, and the user mail, with a home directory
of /var/spool/mail, is at the bottom.
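The effect of -t: -k6 can be approximated in Python as well, again assuming the C locale. Because -k6 is given no end position, the key runs from the start of the 6th field to the end of the line (the home directory followed by the login shell), and the whole line serves as the last-resort tie-breaker. The names PASSWD and key_from_field are hypothetical; the data is the ten /etc/passwd lines shown above.

```python
PASSWD = """\
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
news:x:9:13:news:/etc/news:
"""

def key_from_field(line, sep, n):
    """Hypothetical helper: the text from field n (1-based) to the end of the
    line, like sort -t SEP -kN when no end position is given."""
    return sep.join(line.split(sep)[n - 1:])

lines = PASSWD.splitlines()

# Roughly "sort -t: -k6": home directory (and everything after it) as the key
by_home = sorted(lines, key=lambda line: (key_from_field(line, ":", 6), line))

for line in by_home:
    print(line)
```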
Summary
In summary, we have seen that the sort command can be used to sort structured data, using the -k
command line switch to specify the sort field (perhaps more than once), and the -t command line switch
to specify the field delimiter.
The -k command line switch can receive more sophisticated arguments, which serve to specify character
positions within a field, or customize sort options for individual fields. See the sort(1) man page for
details.
-c, --count            Prefix each line with the number of its occurrences; this is the length of the run.
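-f, --skip-fields=n    Avoid comparing the first n fields.
-i, --ignore-case      Ignore case.
-s, --skip-chars=n     Avoid comparing the first n characters.
-u, --unique           Only print unique (unrepeated) lines.
-w, --check-chars=n    Compare no more than n characters in lines.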
In order to understand the uniq command's behavior, we need repetitive data on which to operate. The
following Python script simulates 100 rolls of three six-sided dice, writing the sum of each roll on its own
line. The user madonna makes the script executable, and then records the output in a file called trial1.
[madonna@station madonna]$ cat three_dice.py
#!/usr/bin/python
from random import randint
# Roll three six-sided dice 100 times, printing each sum on its own line
for i in range(100):
    print(randint(1, 6) + randint(1, 6) + randint(1, 6))
[madonna@station madonna]$ chmod 755 three_dice.py
[madonna@station madonna]$ ./three_dice.py > trial1
[madonna@station madonna]$ wc trial1
    100     100     260 trial1
10
10
10
13
8
8
10
10
8
6
4
5
6
7
8
9
10
11
12
13
Without any command line switches, the uniq command has removed duplicate entries, reducing the
data from 100 lines to only 15. (Because uniq only collapses adjacent duplicate lines, the data must be
sorted first.) It is easy for madonna to see that the data looks reasonable: every possible sum of three
six-sided dice is represented, with the exception of 3. Because only one combination of the dice yields
a sum of 3 (all ones), she expects it to be a relatively rare occurrence.
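Just how rare a sum of 3 should be can be checked with a short Python calculation, offered as an illustration rather than part of the exercise. It enumerates all 6 × 6 × 6 = 216 equally likely outcomes and counts the ways to reach each sum. Only one outcome gives 3, so in 100 rolls a 3 is expected fewer than once on average (100 × 1/216 ≈ 0.46). The names outcomes, ways, and expected are illustrative.

```python
from collections import Counter
from itertools import product

# Every equally likely outcome of rolling three six-sided dice (216 in total)
outcomes = list(product(range(1, 7), repeat=3))

# Number of outcomes that produce each possible sum (3 through 18)
ways = Counter(sum(roll) for roll in outcomes)

# Expected number of occurrences of each sum in 100 rolls
expected = {total: 100 * count / len(outcomes) for total, count in ways.items()}

for total, count in sorted(ways.items()):
    print(f"{total:2d}: {count:2d} ways, about {expected[total]:.1f} expected in 100 rolls")
```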
      1 4
      4 5
      6 6
     10 7
     10 8
     13 9
     13 10
      9 11
     13 12
      4 13
      8 14
      4 15
      1 16
      2 17
      2 18
As would be expected (by a statistician, at least), the largest and smallest sums have relatively few
occurrences, while the intermediate sums occur more often. The counts in the first column sum to 100,
confirming that the uniq command accounted for every line.
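Why the data must be sorted before it reaches uniq can be shown with a rough Python analogue built on itertools.groupby, which, like uniq, groups only adjacent equal items. The function name uniq_c and the sample data are hypothetical, and sorting with key=int stands in for sort -n.

```python
from itertools import groupby

def uniq_c(lines):
    """Rough analogue of uniq -c: (count, line) for each run of adjacent equal lines."""
    return [(len(list(run)), line) for line, run in groupby(lines)]

rolls = ["10", "10", "8", "10", "8"]

# Unsorted input: equal values that are not adjacent are counted separately
print(uniq_c(rolls))                   # [(2, '10'), (1, '8'), (1, '10'), (1, '8')]

# Sorted numerically first, like "sort -n | uniq -c"
print(uniq_c(sorted(rolls, key=int)))  # [(2, '8'), (3, '10')]
```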
5
6
7
8
9
10
11
12
13
14
15
17
18
Examples
Example 1. Sorting the Output of ps aux
The user madonna is examining the processes running on her local machine. She is familiar with the ps
aux command, which tables information about every running process.
[madonna@station madonna]$ ps aux | head -4
USER
root
root
root
VSZ
1380
0
0
RSS
76
0
0
TTY
?
?
?
STAT
S
SW
SW
START
02:05
02:05
02:05
TIME
0:04
0:00
0:00
COMMAND
init [
[keventd]
[kapmd]
Title
Role
USER
PID
%CPU
%MEM
VSZ
Title
Role
RSS
The user madonna would like to order processes in terms of some of these parameters. She first orders
the processes by their virtual memory size, sorting numerically, and listing in descending order. Notice
the use of the tail +2 command, which prints its input starting from the second line, to remove the header
line from the list of processes.
[madonna@station madonna]$ ps aux | tail +2 | sort -rn -k5 | head
gdm       1074  0.0  5.2 37828 13396 ?     S    02:06   0:01 /usr/bin/gdmgreeter
madonna   1844  0.0  0.2 19436   632 pts/0 S    03:42   0:00 sort -rn -k5
apache     909  0.0  0.5 18320  1464 ?     S    02:06   0:00 /usr/sbin/httpd
apache     908  0.0  0.5 18320  1464 ?     S    02:06   0:00 /usr/sbin/httpd
apache     907  0.0  0.5 18320  1464 ?     S    02:06   0:00 /usr/sbin/httpd
apache     906  0.0  0.5 18320  1464 ?     S    02:06   0:00 /usr/sbin/httpd
apache     905  0.0  0.5 18320  1464 ?     S    02:06   0:00 /usr/sbin/httpd
apache     904  0.0  0.5 18320  1464 ?     S    02:06   0:00 /usr/sbin/httpd
apache     903  0.0  0.5 18320  1464 ?     S    02:06   0:00 /usr/sbin/httpd
apache     902  0.0  0.5 18320  1468 ?     S    02:06   0:00 /usr/sbin/httpd
The gdmgreeter (which manages logins for the X graphical environment) and httpd daemon (which
implements the Apache Web Server) are the largest processes on her machine, in terms of the amount of
memory they are requesting. (Note also that the sort command itself made an appearance.)
Next, madonna sorts the output by the sixth column, which tables the resident memory sizes of the
processes.
[madonna@station madonna]$ ps aux | tail +2 | sort -rn -k6 | head
gdm
root
root
xfs
elvis
madonna
root
root
madonna
root
1074
1066
914
978
1664
1748
1662
1746
1752
885
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
S
R
S
S
S
S
S
S
S
S
02:06
02:06
02:06
02:06
03:31
03:31
03:31
03:31
03:31
02:06
0:01
0:00
0:00
0:00
0:00
0:00
0:00
0:00
0:00
0:00
/usr/bin/gdmgreet
/usr/X11R6/bin/X
cupsd
xfs -droppriv
/usr/sbin/sshd
/usr/sbin/sshd
/usr/sbin/sshd
/usr/sbin/sshd
-bash
/usr/sbin/httpd
Interestingly, a different collection of processes makes the top of the list, including the X server and
several instances of the sshd daemon (which implements the Secure Shell service). Presumably, these are
the processes that are currently active.
Next, madonna sorts by the third column, relative CPU activity.
[madonna@station madonna]$ ps aux | tail +2 | sort -rn -k3 | head
elvis
elvis
blondie
xfs
smmsp
rpc
1744 33.8
1745 33.7
1826 33.3
978 0.0
864 0.0
586 0.0
0.1
0.1
0.1
0.9
0.0
0.0
R
R
R
S
S
S
03:31
03:31
03:32
02:06
02:06
02:05
6:01
6:00
5:45
0:00
0:00
0:00
cat /dev/zero
cat /dev/zero
cat /dev/zero
xfs -droppriv
sendmail: Queue
portmap
914
9
894
885
0.0
0.0
0.0
0.0
S
SW
S
S
02:06
02:05
02:06
02:06
0:00
0:00
0:00
0:00
cupsd
[bdflush]
crond
/usr/sbin/httpd
Her machine is not seeing much current activity, with the exception of three different cat processes,
which seem to be evenly dividing her CPU.
Specifies
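cmd      The command being run, including its arguments
pid      The process ID
state    The one-character process state (for example, R for running or S for sleeping)
user     The name of the user who owns the process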
As some examples of using the -o command line switch, madonna first tables processes with their
process ID, the user who owns the process, and the command that is running.
[madonna@station madonna]$ ps -e -o pid,user,cmd | head -5
  PID USER     CMD
    1 root     init [
    2 root     [keventd]
    3 root     [kapmd]
    4 root     [ksoftirqd_CPU0]
  PID S
    1 S
    2 S
    3 S
    4 S
Now that she has built up some familiarity with the ps command and the -o command line switch, she is
ready to begin asking some questions. She first wants to know who is running processes on the machine,
and how many processes they are running. She tables all processes, listing only the username of who
      8 apache
      2 blondie
      1 daemon
      3 elvis
      1 gdm
      5 madonna
     48 root
      1 rpc
      1 smmsp
      1 xfs
She would prefer the output to be ordered by the number of processes, so she adds one more sort to the end of the pipe.
[madonna@station madonna]$ ps -e -o user | tail +2 | sort | uniq -c | sort -rn
     48 root
      8 apache
      6 madonna
      3 elvis
      2 blondie
      1 xfs
      1 smmsp
      1 rpc
      1 gdm
      1 daemon
Now madonna easily sees that root and apache are running the most processes (presumably daemons in the
background), followed by madonna, elvis, and blondie (presumably interactive users). How many of
these processes are currently running, and how many are sleeping? Using a similar trick, but this time
listing the process state instead of the user owner, she comes up with her answer.
[madonna@station madonna]$ ps -e -o state | tail +2 | sort | uniq -c | sort -rn
73 S
5 R
Most of the processes on her machine (73) are sleeping, while relatively few (5) are running (which
implies they are actively using the CPU).
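For comparison, the sort | uniq -c | sort -rn idiom corresponds roughly to counting with Python's collections.Counter and then ordering the results by count. The sketch below is only an analogy; count_and_rank is a hypothetical name, and the sample data simply reproduces the proportions madonna observed. Sorting (count, value) pairs with reverse=True also reverses the order of tied entries, much as sort -r reverses its last-resort comparison (which is why xfs precedes daemon among the users running one process).

```python
from collections import Counter

def count_and_rank(values):
    """Rough analogue of "sort | uniq -c | sort -rn": (count, value) pairs,
    highest count first; ties fall in reverse alphabetical order."""
    counts = Counter(values)
    return sorted(((count, value) for value, count in counts.items()), reverse=True)

# Process states in the proportions madonna observed (73 sleeping, 5 running)
states = ["S"] * 73 + ["R"] * 5

for count, state in count_and_rank(states):
    # Mimic the uniq -c layout: count right-aligned in a 7-character column
    print(f"{count:7d} {state}")
```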
Online Exercises
Lab Exercise
Objective: Use the sort and uniq commands to manage information efficiently.
Estimated Time: 45 mins.
Specification
1. The file /etc/fstab is used to predefine mount points on your system. The third column of this file
specifies the filesystem type of the device to be mounted.
Sort the contents of this file in alphabetically ascending order, using the third column as your
primary key. Store the output in the newly created file ~/fstab.byfs
2. The file /proc/modules lists currently loaded kernel modules, along with the module size (the
second column) and a current usage count (the third column).
Sort the contents of this file in numerically descending order, using the usage count (third column)
as your primary key, and the module size (second column) as your secondary key. Store the results
in the file ~/modules.byuc
3. Sort the /etc/passwd file in alphabetically ascending order, using the user's login shell as your
primary key. Store the results in the newly created file passwd.bylogin
4. The stat command uses the --format command line switch to specify its output format. As seen in
the stat(1) man page (or stat --help), the following command line will list the permissions of a file
in octal notation.
[student@station student]$ stat --format="%a" /etc/passwd
644
Use this command to list the permissions on all files (and directories, etc.) in the /etc/ directory
(but not subdirectories). Use the sort and uniq commands to reduce this information into a simple
table, with the first column being the number of times that the octal mode specified in the second
column occurs. The table should be sorted in numerically descending order, using the number of
occurrences (the first column) as your primary key. Store the table in a newly created file called
~/etcmodes.txt
If completed correctly, your table should have a form similar to the following. (Do not be concerned
if the actual values of your table differ.)
[student@station student]$ cat etcmodes.txt
    127 644
     63 755
     16 600
     15 777
      6 640
      5 664
      3 444
      2 400
      1 775
      1 750
      1 440
5. The df command lists currently mounted disk partitions, along with the current disk usage. The
fourth column of this command's output lists the amount of available space in blocks.
Create an executable script called ~/bin/avail. When executed, the script should list available
partitions (the output of the df command), sorted in numerically descending order, using the amount
of available space (the fourth column) as the primary key. The header line generated from the df
command should be stripped from the output.
Deliverables
1. The file fstab.byfs, which contains the contents of the /etc/fstab file sorted in alphabetically ascending order,
using the third column as the primary key.
2. The file modules.byuc, which contains the contents of the /proc/modules file sorted in numerically
descending order, using the third column as the primary key, and the second column as the secondary key.
3. The file passwd.bylogin, which contains the contents of the /etc/passwd file sorted in alphabetically
ascending order, using the user's login shell as the primary key.
4. The file etcmodes.txt, which tables the octal permissions (modes) of all files in the /etc directory. The
second column of the table should be the octal mode, and the first column the number of files to which the mode
applies. The table should be sorted in numerically descending order, using the first column as a primary key.
5. The executable script ~/bin/avail, which when executed (without arguments) displays the output of the df
command sorted in numerically descending order, using the fourth column as its primary key. The header line
produced by the df command should be stripped from the output.
If you have performed the exercises correctly, you should be able to generate output similar to the following. Do not
be concerned if your actual values differ.
[student@station student]$ head -5 fstab.byfs modules.byuc passwd.bylogin etcmodes.txt
/mnt/camera
/dev/pts
/home/elvis/case
/
/boot
15096
70784
51924
13504
8680
auto
noauto,user
0 0
devpts gid=5,mode=620 0 0
ext2
noauto,loop,user,exec 0 0
ext3
defaults
1 1
ext3
defaults
1 2
5131108
127616
124427
4505312
0
26268
365144
127616
91735
93% /
0% /dev/shm
23% /boot
Questions
1. Which of the following are legitimate invocations of the sort command?
( ) a. sort -k5 -t: data
( ) b. sort -n data
( ) c. sort -rn -k3 data
( ) d. sort -k2 < data
( ) e. All of the above
2. Which of the following would sort the file data in numerically descending order, using the third column as the
primary key?
( ) a. sort -rn -k3 data
( ) b. sort -r -3 data
( ) c. sort -n -p3 data
( ) d. All of the Above
( ) e. A or C
3. Which of the following command lines would sort the file /etc/passwd in numerically ascending order, using
the third : separated field as a primary key?
( ) a. sort --numeric-sort -k3 -t: /etc/passwd
( ) b. sort -nk3 -t: /etc/passwd
( ) c. sort -n k3,: /etc/passwd
( ) d. All of the above
( ) e. A and B
Use the output from the following command to answer the next 2 questions.
[student@station hwdata]$ head -20 MonitorsDB
#
# Monitor information for use by Xconfigurator
# Supplies similar information to the data in the /usr/X11R6/lib/X11/Cards
4. Which of the following bash command lines would sort the listed monitors in alphabetically ascending order by
EISA ID?
( ) a. grep -v # MonitorsDB | sort -t; -k3
( ) b. grep -v \# MonitorsDB | sort -t\; -k3
( ) c. grep -v \# MonitorsDB | sort -n -t\; -k2
( ) d. grep -v \# MonitorsDB | sort -r -t\; -k2
( ) e. None of the above
5. Which of the following bash command lines would sort the listed monitors in numerically descending order,
using the first value of the monitor's horizontal "sync frequency" as the primary key?
( ) a. grep -v \# MonitorsDB | sort -nr -t\; -k4
( ) b. grep -v # MonitorsDB | sort -r -t; -k4
( ) c. grep -v # MonitorsDB | sort -t; -k4
( ) d. grep -v \# MonitorsDB | sort -t\; -k4
( ) e. None of the above
Use the output from the following command to answer the next 3 questions.
[student@station log]$ ls -l
-rw------drwxr-xr-x
-rw------drwxr-xr-x
-rw-r--r-drwxr-xr-x
drwx------
1
2
1
2
1
2
2
root
servlet
root
lp
root
root
root
root
servlet
root
sys
root
root
root
2533
4096
9795
4096
5761
4096
4096
1
1
1
1
2
1
2
2
1
1
1
2
1
2
1
1
1
root
root
root
postgres
root
root
root
root
root
root
root
squid
root
root
root
root
root
root
19136220 Oct 6 03:31 lastlog
root
184228 Oct 6 05:08 maillog
root
20899 Oct 6 04:51 messages
postgres
0 Apr 1 2003 pgsql
rha
4096 Aug 27 10:58 rha
root
22166 Oct 6 04:10 rpmpkgs
root
4096 Oct 6 02:10 sa
root
4096 Apr 5 2003 samba
root
41382 Aug 21 15:47 scrollkeeper.log
root
1161 Oct 6 03:31 secure
root
0 Oct 5 04:07 spooler
squid
4096 Aug 18 07:05 squid
root
0 Oct 5 04:08 up2date
root
4096 Feb 3 2003 vbox
root
0 Oct 5 04:08 vsftpd.log
utmp
122880 Oct 6 03:31 wtmp
root
39015 Oct 6 05:56 xorg.0.log
6. Which of the following command lines would reorder this output in numerically descending order, using the file
size (the fifth column) as the primary key?
( ) a. ls -l | grep -v ^t | sort -rn -k5
( ) b. ls -l | grep -v ^t | sort -n -k4
( ) c. ls -l | grep -v ^t | sort -r -k5
( ) d. ls -l | grep -v ^t | sort -t: -k5
( ) e. None of the above
7. Which of the following command lines would reorder this output in numerically ascending order, using the link
count (the second column) as a primary key, and the file size (the fifth column) as the secondary key?
( ) a. ls -l | grep -v ^t | sort -rn -k5,2
( ) b. ls -l | grep -v ^t | sort -n -k2 -k5
( ) c. ls -l | grep -v ^t | sort -n -k5 -k2
( ) d. ls -l | grep -v ^t | sort -t: -k5 -k2
( ) e. None of the above
8. Which of the following command lines would reorder this output in alphabetically ascending order, using the
group owner (the fourth column) as the primary key, and the filename (the ninth column) as the secondary key?
( ) a. ls -l | grep -v ^t | sort -r -k9 -k4
( ) b. ls -l | grep -v ^t | sort -t- -k4,9
( ) c. ls -l | grep -v ^t | sort -k4 -k9
( ) d. ls -l | grep -v ^t | sort -rn -k4 -k9
The cut command extracts text from text files, based on columns specified by bytes, characters, or
fields.
Discussion
In this Lesson, we explore two commands that are used to extract columns from a stream of text, or
assemble columns into a wider stream: cut and paste.
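Option     Effect
-b list    Select only the bytes specified by list.
-c list    Select only the characters specified by list.
-f list    Select only the fields specified by list.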
The list arguments are actually a comma-separated list of ranges. Each range can take one of the
following forms.
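Table 5-2. Range Specifications
Range    Meaning
N        Only the Nth item (byte, character, or field).
N-       From the Nth item through the end of the line.
N-M      From the Nth item through the Mth item, inclusive.
-M       From the beginning of the line through the Mth item.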
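Each range form can be tried on any short string. The following is a minimal sketch using an arbitrary sample string (abcdefgh is our choice, not part of the lesson); the comment on each line shows what the command should print.

```shell
#!/bin/bash
# Try each range form on an arbitrary sample string.
s='abcdefgh'

echo "$s" | cut -c3        # N    only the 3rd character          -> c
echo "$s" | cut -c3-       # N-   3rd character to end of line    -> cdefgh
echo "$s" | cut -c2-4      # N-M  2nd through 4th character       -> bcd
echo "$s" | cut -c-3       # -M   start of line through 3rd       -> abc
echo "$s" | cut -c1,3,5-6  # a comma-separated list of ranges     -> acef
```

The same range syntax applies unchanged to -b (bytes) and -f (fields).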
           CPU0
  0:    4477340          XT-PIC  timer
  1:      25250          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  3:       7344          XT-PIC  ehci-hcd
  5:     310187          XT-PIC  usb-uhci, ohci1394
  8:          1          XT-PIC  rtc
 10:        166          XT-PIC  usb-uhci, eth1
 11:    6575295          XT-PIC  usb-uhci, eth0, Audigy
 12:     544632          XT-PIC  PS/2 Mouse
 14:      80379          XT-PIC  ide0
 15:     341407          XT-PIC  ide1
NMI:          0
ERR:          0
Because the characters in the file are formatted into columns, the cut command can extract particular
regions of interest. If just the IRQ line and the number of interrupts were of interest, the rest of the file
could be cut away, as in the following example. (Note the use of the grep command to first reduce the
file to just the lines pertaining to interrupt lines.)
[student@rosemont student]$ grep '[[:digit:]]:' /proc/interrupts | cut -c1-15
  0:    4512997
  1:      27954
  2:          0
  3:       7344
  5:     312095
  8:          1
 10:        166
 11:    6629756
 12:     545523
 14:      81025
 15:     344239
Alternatively, if only the device drivers bound to particular IRQ lines were of interest, multiple ranges of
characters could be specified.
[student@rosemont student]$ grep '[[:digit:]]:' /proc/interrupts | cut -c1-5,34-
  0: timer
  1: keyboard
  2: cascade
  3: ehci-hcd
  5: usb-uhci, ohci1394
  8: rtc
 10: usb-uhci, eth1
 11: usb-uhci, eth0, Audigy
 12: PS/2 Mouse
 14: ide0
 15: ide1
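Fixed character positions work here because every line of the file uses the same column layout. The sketch below builds one such line with printf. The field widths (a 5-character IRQ label, a count right-aligned in columns 6-15, and the device name starting in column 34) are an assumption inferred from the cut ranges used above, not taken from kernel documentation.

```shell
#!/bin/bash
# Hypothetical line in the assumed /proc/interrupts layout:
#   columns 1-5    "%3d: "  IRQ number, colon and a space
#   columns 6-15   "%10d"   interrupt count, right-aligned
#   columns 17-31  "%15s"   controller type, right-aligned
#   column 34 on   "%s"     device name(s)
line=$(printf '%3d: %10d %15s  %s' 0 4477340 XT-PIC timer)

echo "$line" | cut -c1-15       # label and count -> "  0:    4477340"
echo "$line" | cut -c1-5,34-    # label and name  -> "  0: timer"
```

If a count ever grew wider than its 10 columns, the later columns would shift and these fixed ranges would no longer line up.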
If the character specifications were reversed, can the cut command be used to rearrange the ordering of
the data?
[student@rosemont student]$ grep '[[:digit:]]:' /proc/interrupts | cut -c34-,1-5
  0: timer
  1: keyboard
  2: cascade
...
The answer is no. Each selected character appears only once, in the same order it appears in the source, even if the range
specifications are overlapping or rearranged.
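A quick check of this rule on an arbitrary sample string:

```shell
#!/bin/bash
# cut prints each selected character once, in its original order,
# no matter how the ranges in the list are ordered or overlap.
echo 'abcdef' | cut -c4-,1-2    # -> abdef  (not "defab")
echo 'abcdef' | cut -c1-3,2-4   # -> abcd   (the overlapping "bc" appears once)
```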
Option                      Effect
-d DELIM                    Use DELIM to separate fields on input, instead of the default TAB character.
-s                          Do not print lines that do not contain the delimiter character (useful for
                            stripping comments and headers).
--output-delimiter=STRING   On output, use the text specified by STRING instead of the input field
                            delimiter.
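These switches can be combined. Here is a minimal sketch on made-up colon-separated records preceded by a header line that contains no delimiter. It assumes GNU cut, which provides --output-delimiter.

```shell
#!/bin/bash
# Made-up input: a header line with no ":" followed by two records.
input='Account listing
elvis:x:500:500
blondie:x:501:501'

# Without -s, the delimiter-free header is passed through unchanged.
echo "$input" | cut -d: -f1,3

# -s drops lines without the delimiter; --output-delimiter joins the
# selected fields with a space instead of ":".
echo "$input" | cut -d: -s -f1,3 --output-delimiter=' '
```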
For example, the file /usr/share/hwdata/pcitable lists over 3000 vendor IDs and device IDs
(which can be probed from PCI devices), and the kernel modules and text strings which should be
associated with them, separated by tabs.
[student@rosemont hwdata]$ head -15 pcitable
#
#
#
#
0x0675  0x1700  "unknown"  "Dynalink|IS64PH ISDN Adapter"
0x0675  0x1702  "hisax"    "Dynalink|IS64PH ISDN Adapter"
0x09c1  0x0704  "unknown"  "Arris|CM 200E Cable Modem"
0x0e11  0x0001  "ignore"   "Compaq|PCI to EISA Bridge"
0x0e11  0x0002  "ignore"   "Compaq|PCI to ISA Bridge"
0x0e11  0x0046  "cciss"    "Compaq|Smart Array 64xx"
The following example extracts the third and fourth columns, using the default TAB character to separate
fields. Note the use of the -s command line switch, which effectively strips the header lines (which do not
contain any TABs).
[student@rosemont hwdata]$ cut -s -f3,4 pcitable | head
"unknown"  "Dynalink|IS64PH ISDN Adapter"
"hisax"    "Dynalink|IS64PH ISDN Adapter"
"unknown"  "Arris|CM 200E Cable Modem"
"ignore"   "Compaq|PCI to EISA Bridge"
"ignore"   "Compaq|PCI to ISA Bridge"
"cciss"    "Compaq|Smart Array 64xx"
"unknown"  "Compaq|NC7132 Gigabit Upgrade Module"
"unknown"  "Compaq|NC6136 Gigabit Server Adapter"
"tmspci"   "Compaq|Netelligent 4/16 Token Ring"
"ignore"   "Compaq|Triflex/Pentium Bridge, Model 1000"
As another example, suppose we wanted to obtain a list of the most commonly referenced kernel
modules in the file. We could use a similar cut command, along with tricks learned in the last Lesson, to
obtain a quick listing of the number of times each kernel module appears.
[student@rosemont hwdata]$ cut -s -f3 pcitable | sort | uniq -c | sort -rn | head
   1988 "unknown"
    148 "ignore"
     83 "aic7xxx"
     70 "gdth"
     37 "e100"
     37 "Card:ATI Rage 128"
     36 "3c59x"
     24 "Card:ATI Mach64"
     21 "tulip"
     20 "agpgart"
Many of the entries are obviously unknown, or intentionally ignored, but we do see that the aic7xxx SCSI
driver, and the e100 Ethernet card driver, are commonly used.
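The same counting pipeline can be tried without the real pcitable. The sketch below uses a hypothetical miniature with the same TAB-separated layout (vendor, device, module, description) and one comment line that contains no TAB. The IDs and card names are made up.

```shell
#!/bin/bash
# printf reuses its format for every group of four arguments, producing
# one TAB-separated row per group.
rows=$(printf '%s\t%s\t%s\t%s\n' \
    0x0001 0x0001 '"e100"'    '"Card A"' \
    0x0001 0x0002 '"ignore"'  '"Card B"' \
    0x0002 0x0001 '"e100"'    '"Card C"' \
    0x0003 0x0001 '"unknown"' '"Card D"' \
    0x0003 0x0002 '"e100"'    '"Card E"')
table="# comment line without TABs
$rows"

# Field 3 (module name); -s skips the comment. Then count and rank.
echo "$table" | cut -s -f3 | sort | uniq -c | sort -rn
```

The first line of output should report "e100" with a count of 3. The order of modules with equal counts depends on sort's tie-breaking and locale, so it should not be relied on.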
| wc -c
Accounting for these bytes, we have one byte each for the letters f and r and one byte for the newline that was
appended to the output, leaving two bytes for the non-ASCII character.
When using cut -c, that character is considered a single character, but when using cut -b, it is
considered two bytes, as in the following example.
[elvis@station elvis]$ echo f | cut -c 1-2
f
[elvis@station elvis]$ echo f | cut -b 1-2
f?
The first time, the cut command counted the two bytes used to encode the character as a single character, but
the second time, it treated them as two bytes. As a result, the character was "cut in half" by the cut
command, and the terminal was not able to display it correctly.
Usually, cut -c is the proper way to use the cut command; cut -b is only necessary in specialized technical
situations.
Note: Notice the inconsistent nomenclature between wc and cut. With wc -c, the wc command
really returns the number of bytes contained in a string, while cut -c measures text in characters.
GNU wc does make the distinction between characters and bytes, but with a different switch: wc -m
counts characters.
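To see the byte/character difference without relying on the terminal, the sketch below uses é (our choice for illustration), which UTF-8 encodes as the two bytes 0xC3 0xA9 (written here as octal escapes). It only exercises byte-oriented behavior. Whether cut -c treats é as one character depends on the build of cut: upstream GNU coreutils documentation has long described -c as behaving like -b, so that case is deliberately left out.

```shell
#!/bin/bash
# "fér": f (1 byte) + é (2 bytes, octal \303\251) + r (1 byte).
word=$(printf 'f\303\251r')

# wc -c counts bytes: 4 for the word plus 1 for the newline echo adds.
echo "$word" | wc -c

# cut -b 1-2 keeps "f" and only the first byte of é; od shows the raw
# bytes (66 = f, c3 = first half of é, 0a = newline added by cut).
echo "$word" | cut -b 1-2 | od -An -tx1
```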
File-1 Line 1
File-1 Line 2
File-2 Line 1
File-2 Line 2
File-2 Line 3
If we had more than two files, the first lines of all the files would be joined to form the first line of the output. The
second output line would contain the second lines of each input file, in the order we gave them
on the command line. As a convenience, the filename - can be supplied on the command line. For this
"file", the paste command reads from standard input.
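Table 5-4. Command Line Switches for paste
Option          Description
-d list         Separate the pasted columns with the characters in list, used in turn, instead of
                the default TAB character. The list is recycled if necessary.
-s, --serial    Transpose the result, so that all lines of the first file are pasted into a
                single line, all lines of the second file are pasted into the next single
                line, etc.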
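The behavior described above, including the - filename and the -s switch, can be sketched with two small files. The file names and contents are hypothetical, and the commands assume GNU paste.

```shell
#!/bin/bash
# Scratch directory with two small hypothetical files of unequal length.
cd "$(mktemp -d)" || exit 1
printf 'A1\nA2\n' > file-a
printf 'B1\nB2\nB3\n' > file-b

# Default: line N of each file is joined with a TAB; once file-a runs
# out of lines, it contributes an empty field.
paste file-a file-b

# "-" reads standard input, here placed between the two files.
printf 'S1\nS2\nS3\n' | paste file-a - file-b

# -s: all lines of file-a form one output line, all lines of file-b the next.
paste -s file-a file-b

# -s with -d: a column read from standard input becomes a comma list.
printf '1\n2\n3\n' | paste -s -d, -
```

The last form is a common idiom for turning a column of values into a single delimited line.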
Examples
Example 1. Handling Free-Format Records
In a free-format record layout, input record items are identified by their field position on the line, not by their
character position. By default, cut expects fields to be separated by exactly one TAB character, but any
character that does not appear in the data items themselves may be used instead (with -d). Each occurrence of the
delimiter separates a field.
Our favorite example file /etc/passwd has fields separated by exactly one colon (:) character. Field 1
is the account name and field 7 gives the shell program used. Using the cut command, we could output a
new file with just the account name and the shell name:
[student@station student]$ cut -d: -f1,7 /etc/passwd
root:/bin/bash
bin:/sbin/nologin
daemon:/sbin/nologin
adm:/sbin/nologin
...
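A self-contained variant of this command runs on two sample lines in passwd format instead of the live /etc/passwd (the sample accounts are assumptions). Its second command shows one way, using GNU cut's --output-delimiter switch, to obtain comma-separated output such as the listing that follows.

```shell
#!/bin/bash
# Two sample lines in /etc/passwd format, so the output does not depend
# on the accounts present on the local system.
passwd_sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# Fields 1 (account name) and 7 (login shell), still joined by ":".
echo "$passwd_sample" | cut -d: -f1,7

# GNU cut can write a different separator on output, e.g. a comma.
echo "$passwd_sample" | cut -d: -f1,7 --output-delimiter=,
```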
root,/bin/bash
bin,/sbin/nologin
daemon,/sbin/nologin
adm,/sbin/nologin
lp,/sbin/nologin
...
abc123
def456
hij789
lkm012
c1
f4
j7
m0
Noticing that the words are separated by single spaces, the cut command can easily extract
the third and fifth words (which contain the mount point and filesystem type, respectively). The
command must be supplied with the -d " " command line switch, which instructs it to treat spaces as
field delimiters.
/ ext3
/proc proc
/proc/bus/usb usbdevfs
/boot ext3
/dev/pts devpts
/dev/shm tmpfs
/misc autofs
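The same field extraction can be sketched on a single line in the style of mount output. The line below is made up for illustration.

```shell
#!/bin/bash
# One line in the style of mount output (sample values).
line='/dev/hda3 on / type ext3 (rw)'

# Words are separated by single spaces: word 3 is the mount point and
# word 5 the filesystem type.
echo "$line" | cut -d' ' -f3,5    # -> / ext3
```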
Filesystem           1K-blocks
/dev/hda3              5131108
/dev/hda1               124427
none                    127616
Filesystem
/dev/hda3
/dev/hda1
none
Apparently not. The cut command is using a space as its field delimiter. The catch is that the cut
command does not collapse multiple spaces into a single delimiter, but treats each space individually. Where does
the fifth "field" occur in the df command's output? Somewhere about halfway between the first two
columns. The cut command dutifully prints the first and the (empty) fifth field.
Unfortunately, this is a commonly encountered limitation of the cut command. Fortunately, we will find
techniques in a later Lesson that can be used to overcome it.
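The limitation is easy to reproduce on a two-word string. The sketch also shows tr -s, one common workaround. It is not necessarily the technique the later Lesson presents.

```shell
#!/bin/bash
# Two spaces separate "a" from "b", so cut sees an empty field between them.
echo 'a  b' | cut -d' ' -f2                 # prints an empty line
echo 'a  b' | cut -d' ' -f3                 # -> b

# Workaround: squeeze each run of spaces to a single space before cutting.
echo 'a  b' | tr -s ' ' | cut -d' ' -f2     # -> b
```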
Recall that the list argument to paste's -d switch can contain more than one character. This can be used to provide a
different delimiter between each pair of columns written to the output. The list characters are recycled
if necessary:
[student@station student]$ paste -d+-/ file-1 file-2 file-1 file-2 file-1
File-1 Line 1+File-2 Line 1-File-1 Line 1/File-2 Line 1+File-1 Line 1
File-1 Line 2+File-2 Line 2-File-1 Line 2/File-2 Line 2+File-1 Line 2
File-1 Line 3+File-2 Line 3-File-1 Line 3/File-2 Line 3+File-1 Line 3
Online Exercises
Lab Exercise
Objective: Use cut and paste to manage text.
Estimated Time: 25 mins.
Specification
1. Use the cut command to extract a list of usernames and login shells from the /etc/passwd file,
where the resulting usernames and login shells are separated by a single space. Sort the resulting list
in ascending alphabetical order, using the login shell as the primary key, and the username as a
secondary key. Store the result in the newly created file ~/usershells.txt.
2. The file /proc/cpuinfo contains information about your system's detected CPU. Use the cut
command to extract only the values, not the names or the : that is used to separate the names from
the values. Store the resulting list of values in the newly created file ~/cpuvalues.txt.
3. The file /etc/sysconfig/init is used to define parameters which configure your machine's
startup method. Parameters are defined using the same syntax used by the bash shell, i.e.,
NAME=value.
Use some combination of the grep and cut commands to generate a list of the parameter names
found in this file, one name per line. Do not include the parameter values or the = which is used to
separate them, or any of the comment or empty lines found in the original file. Sort the names in
alphabetically ascending order, and store them in the newly created file ~/initparams.txt.
4. The following script can be used to print a series of 10 random numbers.
#!/bin/bash
for i in $(seq 10); do
    echo $RANDOM
done
Create the script in a file of your choosing, and make the file executable. Execute the script 5
separate times, each time recording the output in a file named ~/trial1, ~/trial2, ~/trial3,
etc.
Create a file called titles, which contains the words run1, run2, ... run10, one per line.
Use the paste command to combine the files named titles, trial1, trial2, trial3, trial4,
and trial5, in that order, into a file called trials. Use the default TAB character to separate the
columns.
If you have completed the lab correctly, you should be able to generate output similar to the following.
Do not be concerned if some of the values differ.
[student@station student]$ head -4 usershells.txt cpuvalues.txt initparams.txt titles trial[15] trials
27486
27496
7089
12188
12465
13136
7467
31152
14282
15835
7969
12746
32029
1474
8347
31709
Deliverables
1. The file usershells.txt, which contains a list of all users and login shells defined in the /etc/passwd file,
separated by a space. The lines should be sorted in alphabetically ascending order, using login shells as the
Questions
1. Which of the following command lines would extract characters 10-20 from each line of the file /etc/xpdfrc?
( ) a. cut -c10-20 /etc/xpdfrc
( ) b. cut -c10,20 /etc/xpdfrc
( ) c. cut -f10-20 /etc/xpdfrc
( ) d. cut -f10,20 /etc/xpdfrc
( ) e. None of the above
2. Which of the following command lines would extract the second and fourth fields from the /etc/group file?
Recall that the /etc/group file uses a : to separate fields.
( ) a. cut -d: -f2,4 /etc/group
( ) b. cut -f:2,4 /etc/group
( ) c. cut -t: -f2-4 /etc/group
( ) d. cut -t: -f2,4 /etc/group
( ) e. None of the above
The file Web defines a palette of colors by listing RGB (Red, Green, and Blue) values for each color, one triplet per
line. Use the following transcript to answer the next 2 questions.
[student@station student]$ cat Web
GIMP Palette
# Netscape -- GIMP Palette file
255 255 255
255 255 204
255 255 153
255
255
255
204
204
102
051
000
255
204
3. Which of the following command lines would extract the green values (i.e., the second column), omitting the two
header lines?
( ) a. tail +3 Web | cut -d" " -f2
( ) b. cut -s -f2 Web
( ) c. grep "[[:digit:]]" Web | cut -c5-7
( ) d. All of the Above
( ) e. A and C only
4. Which of the following command lines would extract the red and blue values (i.e., the first and third columns),
separating them with a : instead of a space on output? The two header lines should again be omitted.
( ) a. tail +3 Web | cut -d: -f1,3
( ) b. tail +3 Web | cut -c1-3,9-11 -o:
( ) c. tail +3 Web | cut -d" " -f1,3 --output-delimiter=:
( ) d. All of the Above
( ) e. B and C only
The file /usr/share/gimp/1.2/gradients/Abstract_1 defines gradient parameters by listing 13 numbers,
separated by spaces. Use the following transcript to answer the next 2 questions.
[student@station gradients]$ cat Abstract_1
GIMP Gradient
6
0.000000 0.286311
0.572621 0.657763
0.716194 0.734558
0.749583 0.784641
0.824708 0.853088
0.876461 0.943172
0.572621
0.716194
0.749583
0.824708
0.876461
1.000000
0.269543
0.215635
0.040368
0.680490
0.553909
1.000000
0.259267
0.407414
0.833333
0.355264
0.351853
0.000000
1.000000
0.984953
0.619375
0.977430
0.977430
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
0.215635
0.040368
0.680490
0.553909
1.000000
1.000000
0.407414
0.833333
0.355264
0.351853
0.000000
1.000000
0.984953
0.619375
0.977430
0.977430
1.000000
0.000000
5. Which of the following command lines would extract the even numbered columns, omitting the first two header
lines?
( ) a. tail +3 Abstract_1 | cut -d" " -c2,4,6,8,10,12
( ) b. tail +3 Abstract_1 | cut -d" " -f2-12:2
( ) c. tail +3 Abstract_1 | cut -d" " -f2,4,6,8,10,12
( ) d. All of the Above
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
0
0
0
0
0
0
7. Which of the following command lines would reliably display only the starting memory address from each range?
( ) a. cut -d- -f1 /proc/iomem
( ) b. cut -c 1-8 /proc/iomem
( ) c. cut -d: -f1 /proc/iomem | cut -d- -f1
( ) d. A and C
( ) e. None of the Above
The file /proc/mounts lists all currently mounted devices, along with their mount points, file systems, and mount
options, each separated by spaces.
[student@station student]$ cat /proc/mounts
rootfs / rootfs rw 0 0
/dev/root / ext3 rw 0 0
/proc /proc proc rw 0 0
usbdevfs /proc/bus/usb usbdevfs rw 0 0
/dev/hda1 /boot ext3 rw 0 0
none /dev/pts devpts rw 0 0
none /dev/shm tmpfs rw 0 0
automount(pid780) /misc autofs rw 0 0
The diff command supports a wide variety of output formats, which can be chosen using various
command line switches. The most commonly used of these is the unified format.
The diff command can be told to ignore certain types of differences, such as changes in white space or
capitalization.
When comparing directories, the diff command can be told to ignore files whose filenames match
specified patterns.
Discussion
The diff Command
The diff command is designed to compare two files that are similar, but not identical, and generate output
that describes exactly how they differ. The diff command is commonly used to track changes to text files,
such as reports, web pages, shell scripts, or C source code. Companion utilities also exist, so that given one
version of a file and the output of the diff command comparing it to another version,
the file can be brought up to date automatically. The most notable of these is the patch command.
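Here is a minimal sketch of that round trip, assuming GNU diff and patch are installed. The file names notes.orig, notes.new and notes.work are hypothetical.

```shell
#!/bin/bash
# Scratch directory; all file names here are hypothetical.
cd "$(mktemp -d)" || exit 1
printf 'login with email adress\nunchanged line\n' > notes.orig
sed 's/adress/address/' notes.orig > notes.new     # the "edit"

# Record the differences (diff exits with status 1 when files differ).
diff -u notes.orig notes.new > notes.patch

# Bring an untouched copy up to date using nothing but the patch.
cp notes.orig notes.work
patch notes.work < notes.patch
cmp -s notes.work notes.new && echo "notes.work now matches notes.new"
```

Only notes.patch needs to travel to whoever holds a copy of notes.orig. This is why diffs are a common way to distribute changes.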
We first introduce the diff command by way of example. In the open source community, documentation
generally sacrifices correctness of spelling or grammar for timeliness, as demonstrated in the following
README.pam_ftp file.
[blondie@station blondie]$ cat README.pam_ftp
print
comma
could
allow
debug messages
separated list of users which
login only with email adress
invalid email adresses
Options for:
auth:
for authentication it provides pam_authenticate() and
pam_setcred() hooks.
Noticing that the words address and addresses are misspelled, blondie sets out to make two changes: first,
correcting the misspelled words, and second, appending a line recording her revisions. She first
makes a copy of the file with the .orig extension appended, and then makes her edits.
[blondie@station blondie]$ cp README.pam_ftp README.pam_ftp.orig
[blondie@station blondie]$ nano README.pam_ftp
She now uses the diff command to compare the two revisions of the file.
[blondie@station blondie]$ diff README.pam_ftp.orig README.pam_ftp
11,12c11,12
<                       could login only with email adress
<       "ignore"        allow invalid email adresses
---
>                       could login only with email address
>       "ignore"        allow invalid email addresses
18a19
> Spelling corrections applied by blondie, 22 Sep 2003
Without yet going into detail about diff's syntax, we see that the command has identified the differences
between the two files, which is the essence of the diff command. The diff command is so commonly
used that its output is often referred to as a noun, as in "Here's the diff between those two files".
"ignore"
print
comma
could
allow
debug messages
separated list of users which
login only with email adress
invalid email adresses
Options for:
auth: for authentication it provides pam_authenticate() and
pam_setcred() hooks.
James Anderson <james@anderson.us>, 17. June 1999
--- 8,19 ---"debug"
"users="
!
!
"ignore"
print
comma
could
allow
debug messages
separated list of users which
login only with email address
invalid email addresses
Options for:
auth: for authentication it provides pam_authenticate() and
pam_setcred() hooks.
James Anderson <james@anderson.us>, 17. June 1999
+ Spelling corrections applied by blondie, 22 Sep 2003
As the output shows, the context diff includes several lines of surrounding context around each change. Changes are annotated using ! to mark lines that have changed, + to mark lines that have been added, and - to mark lines that have been removed. Because the surrounding context is recorded, utilities working from a context diff can automatically detect when an administrator accidentally tries to apply the same update to a file twice.
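A minimal sketch of the context format on a pair of hypothetical throwaway files (names and contents are illustrative assumptions). The first two lines of diff -c output name each file along with its modification time, so the sketch discards them with tail to keep the remaining lines predictable.

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/old.txt"
printf 'one\n2\nthree\nfour\n' > "$workdir/new.txt"

# Skip the two header lines (file names plus timestamps).
context_out=$(diff -c "$workdir/old.txt" "$workdir/new.txt" | tail -n +3)
rm -r "$workdir"
printf '%s\n' "$context_out"
```

The changed line appears in both halves marked with !, while the appended line appears only in the second half, marked with +.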
Unified diff (diff -u)
The unified diff is generated by specifying the -u or -U N command line switches. (The second
form is used to specify that exactly N lines of context should be generated.) Rather than duplicating
lines of context, the unified diff attempts to record changes all in one stanza, creating a more
compact, and arguably more readable, output.
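[blondie@station blondie]$ diff -u README.pam_ftp.orig README.pam_ftp
--- README.pam_ftp.orig ...
+++ README.pam_ftp ...
@@ -8,11 +8,12 @@
 Recognized arguments:
         "debug"         print debug messages
         "users="        comma separated list of users which
-                        could login only with email adress
-        "ignore"        allow invalid email adresses
+                        could login only with email address
+        "ignore"        allow invalid email addresses

 Options for:
 auth:   for authentication it provides pam_authenticate() and
         pam_setcred() hooks.

 James Anderson <james@anderson.us>, 17. June 1999
+Spelling corrections applied by blondie, 22 Sep 2003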
Rather than identifying a line as "changed", the unified diff annotates that the original version
should be deleted, and the new version added.
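The same idea applied to the unified format, again as a sketch with hypothetical file names and contents. Note how the changed line appears as a - line immediately followed by a + line, and how the hunk header @@ -1,3 +1,4 @@ gives the starting line and line count in each file.

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/old.txt"
printf 'one\n2\nthree\nfour\n' > "$workdir/new.txt"

# Lines 1-2 of -u output (the --- and +++ lines) carry timestamps; skip them.
unified_out=$(diff -u "$workdir/old.txt" "$workdir/new.txt" | tail -n +3)
rm -r "$workdir"
printf '%s\n' "$unified_out"
```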
Side by side diff (diff -y)
The previous three formats were designed to be easily processed by other utilities, such as the ed editor or the patch utility. In contrast, the "side by side" format is intended to be read by humans. As the name implies, the two versions of the file are displayed side by side, with annotations in the middle that help identify changes. The following example requests a side by side diff using the -y command line switch, and further specifies with -W80 that the output should be formatted to 80 columns.
[blondie@station blondie]$ diff -y -W80 README.pam_ftp.orig README.pam_ftp
Recognized arguments:                   Recognized arguments:
        "debug"         print debug m           "debug"         print debug m
        "users="        comma separat           "users="        comma separat
                        could login o |                         could login o
        "ignore"        allow invalid |         "ignore"        allow invalid

Options for:                            Options for:
auth:   for authentication it provide   auth:   for authentication it provide
        pam_setcred() hooks.                    pam_setcred() hooks.

James Anderson <james@anderson.us>, 1   James Anderson <james@anderson.us>, 1
                                      > Spelling corrections applied by blond
While the output would be more effective using a wide terminal, it does provide an intuitive feel for
the differences between the two files.
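A small sketch of the side by side format, using hypothetical files and a deliberately narrow width of 40 columns (-W 40) so each half stays short. The exact spacing depends on diff's column arithmetic, so the interesting part is the gutter: | marks a changed line and > marks a line found only in the second file.

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/old.txt"
printf 'one\n2\nthree\nfour\n' > "$workdir/new.txt"

# Narrow output: each half is only a few columns wide.
side_out=$(diff -y -W 40 "$workdir/old.txt" "$workdir/new.txt")
rm -r "$workdir"
printf '%s\n' "$side_out"
```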
Quiet diff (diff -q)
The quiet diff merely reports whether two files differ, not the nature of the differences.
[blondie@station blondie]$ diff -q README.pam_ftp.orig README.pam_ftp
Files README.pam_ftp.orig and README.pam_ftp differ
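Because diff -q also sets the usual exit status, it is handy in shell scripts that only need a yes or no answer. A minimal sketch, assuming three hypothetical files created just for the demonstration:

```shell
# Hypothetical scratch files: a.txt and b.txt match, c.txt does not.
workdir=$(mktemp -d)
printf 'same\n' > "$workdir/a.txt"
printf 'same\n' > "$workdir/b.txt"
printf 'different\n' > "$workdir/c.txt"

# Exit status 0 means "no differences", so it can drive an if statement.
if diff -q "$workdir/a.txt" "$workdir/b.txt" > /dev/null; then
    ab_result=same
else
    ab_result=different
fi

# Run from inside $workdir so the message uses short names.
quiet_out=$(cd "$workdir" && diff -q a.txt c.txt)
rm -r "$workdir"

echo "a.txt vs b.txt: $ab_result"
echo "$quiet_out"
```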
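Switch                  Effect
-c                      Generate a context diff, with 3 lines of context.
-C, --context[=N]       Generate a context diff, with N lines of context.
-u                      Generate a unified diff, with 3 lines of context.
-U, --unified[=N]       Generate a unified diff, with N lines of context.
-N                      Treat absent files as empty.
-y, --side-by-side      Generate output in side by side format.
-W, --width=N           Format side by side output to at most N columns.
--left-column           Print only the left column when using the side by side format.
-q, --brief             Report only whether the files differ.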
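Switch                               Effect
-B, --ignore-blank-lines             Ignore changes that only insert or delete blank lines.
-i, --ignore-case                    Ignore differences in case.
-I, --ignore-matching-lines=regex    Ignore changes whose inserted or deleted lines all match regex.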
[blondie@station blondie]$ cat cal.txt
   September 2003
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30

[blondie@station blondie]$ cat cal_edited.txt
====================
==== This Month ====
====================

   September 2003
Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6
 7  8  9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30
The file cal_edited.txt differs from cal.txt in two respects. First, a four-line header was added to the top. Second, an extra (empty) line was added to the bottom. An "ordinary" diff reports both changes.
[blondie@station blondie]$ diff cal.txt cal_edited.txt
0a1,4
> ====================
> ==== This Month ====
> ====================
>
9a14
>
With the -B command line switch, however, the diff command ignores the new, empty line at the bottom.
[blondie@station blondie]$ diff -B cal.txt cal_edited.txt
0a1,4
> ====================
> ==== This Month ====
> ====================
>
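With the -I command line switch, the diff command can be told to also ignore changes to lines that begin with a =. Since the remaining change consists only of such lines and a blank line, diff now reports nothing.
[blondie@station blondie]$ diff -B -I "^=" cal.txt cal_edited.txt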
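The -B and -I switches can be tried on a pair of hypothetical miniature files that mimic the calendar edit: one header line starting with = added at the top, and one blank line added at the bottom. This is a sketch under those assumptions, not a copy of blondie's files.

```shell
# Hypothetical scratch files, created only for this demonstration.
workdir=$(mktemp -d)
printf 'x\ny\n' > "$workdir/orig.txt"
printf '== header ==\nx\ny\n\n' > "$workdir/edited.txt"

# Plain diff reports both the header line and the trailing blank line.
plain_out=$(diff "$workdir/orig.txt" "$workdir/edited.txt")
# -B drops the change that consists only of a blank line.
blank_out=$(diff -B "$workdir/orig.txt" "$workdir/edited.txt")
# -I '^=' additionally drops the change whose lines all start with =.
both_out=$(diff -B -I '^=' "$workdir/orig.txt" "$workdir/edited.txt")
rm -r "$workdir"

printf 'plain:\n%s\n-B:\n%s\n-B -I:\n%s\n' "$plain_out" "$blank_out" "$both_out"
```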
Recursive diffs
The diff command can act recursively, descending two similar directory trees and annotating any
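differences. The following table lists command line switches relevant to diff's recursive behavior.
Table 6-3. Command Line Switches for Using diff Recursively
Switch                      Effect
-r, --recursive             Recursively compare any subdirectories found.
-x, --exclude=pattern       Exclude files that match pattern.
-X, --exclude-from=file     Exclude files that match any pattern listed in file.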
As an example, blondie is examining two versions of a project called vreader. The project involves Python scripts which convert calendar information from the vcal format to an XML format. She has downloaded two versions of the project, vreader-1.2.tar.gz and vreader-1.3.tar.gz, and expanded each of the archives into her local directory.
[blondie@station blondie]$ ls
vreader-1.2
vreader-1.2.tar.gz
vreader-1.3
vreader-1.3.tar.gz
conv_db.pyc
datebook.out.xml
datebook.xml
templates/
-- datebook.xml
vreader.py
To summarize the differences between the two versions, she runs a recursive diff on the two directories.
[blondie@station blondie]$ diff -r vreader-1.[23]
The diff command recurses through the two directories, and notes the following differences.
1. The two binary files vreader-1.2/conv_db.pyc and vreader-1.3/conv_db.pyc differ.
Because they are not text files, however, the diff command does not try to annotate the differences.
2. The file vreader-1.3/datebook.out.xml has no counterpart in the vreader-1.2 directory.
3. The files vreader-1.2/templates/datebook.xml and
vreader-1.3/templates/datebook.xml differ, and diff annotates the changes.
4. The files vreader-1.2/vreader.py and vreader-1.3/vreader.py differ, and diff annotates
the changes.
Often, when comparing more complicated directory trees, some files are expected to change and others are not. For example, the file conv_db.pyc is compiled Python code, automatically generated from the Python script conv_db.py. Because blondie is not interested in differences between the compiled versions of the file, she uses the -x command line switch to exclude it from her comparison. Likewise, she is not interested in files ending in .xml, so she excludes them with an additional -x command line switch.
[blondie@station blondie]$ diff -r -x "*.pyc" -x "*.xml" vreader-1.[23]
*.pyc
*.xml
*.py
Because blondie included *.py in her list of file patterns to exclude, the diff command is left with
nothing to say.
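To experiment with -r and -x without downloading anything, the sketch below builds two tiny hypothetical trees (v1 and v2, with made-up file names) and adds -q so that each difference is reported on a single line.

```shell
# Hypothetical directory trees, created only for this demonstration.
workdir=$(mktemp -d)
mkdir -p "$workdir/v1/sub" "$workdir/v2/sub"
printf 'a\n' > "$workdir/v1/sub/notes.txt"
printf 'b\n' > "$workdir/v2/sub/notes.txt"
printf 'x\n' > "$workdir/v1/conv.pyc"
printf 'y\n' > "$workdir/v2/conv.pyc"
printf 'new\n' > "$workdir/v2/extra.txt"

# Run from inside $workdir so the reported paths are short.
all_out=$(cd "$workdir" && diff -rq v1 v2)
# Quote the pattern so the shell passes *.pyc to diff unexpanded.
filtered_out=$(cd "$workdir" && diff -rq -x '*.pyc' v1 v2)
rm -r "$workdir"

printf '%s\n' "$all_out" "----" "$filtered_out"
```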
Examples
Example 1. Using diff to Examine New Configuration Files
After updating her sendmail RPM package, blondie notices that she has a new configuration file in her
/etc/mail directory, sendmail.cf.rpmnew. She would like to see how this file compares to her
already existing configuration file, /etc/mail/sendmail.cf. She uses diff to summarize the
differences.
[blondie@station blondie]$ diff /etc/mail/sendmail.cf /etc/mail/sendmail.cf.rpmnew
19,21c19,21
< ##### built by root@station.example.com on Tue Apr 1 15:09:38 EST 2003
< ##### in /etc/mail
< ##### using /usr/share/sendmail-cf/ as configuration include directory
---
> ##### built by bhcompile@daffy.perf.redhat.com on Wed Sep 17 14:45:22 EDT 2003
> ##### in /usr/src/build/308253-i386/BUILD/sendmail-8.12.8/cf/cf
> ##### using ../ as configuration include directory
40d39
<
101c100
< DSnimbus.example.com
---
> DS
She is satisfied that the new version of the configuration file differs only in some comment lines and in the absence of the local setting (DSnimbus.example.com) that she had added to her version of the file.
68a69
> desktop:x:80:80:desktop:/var/lib/menu/kde:/sbin/nologin
Apparently, a new system user has recently been added, probably as a result of adding new software
using an RPM package file.
After editing her copy of the file, she would like to submit her changes to the person who coordinates
changes to the vreader project. In the open source community, this person is usually referred to as the
maintainer of the project. Rather than sending a full copy of her version, she records the differences
between her version and the original in a file called vreader-1.3.blondie.patch.
[blondie@station blondie]$ diff -ru vreader-1.3 vreader-1.3.local > vreader-1.3.blondie.patch
She now emails only the patch file to the project maintainer, who can easily use the patch command to apply the changes to a pristine copy of the project.
[blondie@station blondie]$ mail -s "my changes" maintainer@patch.org < vreader-1.3.blondie.patch
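The round trip between contributor and maintainer can be sketched end to end in a scratch directory. Everything here is hypothetical (the proj-1.0 tree and its single file stand in for vreader-1.3), and the sketch assumes the patch utility is installed. The -p1 switch tells patch to strip the first directory component (proj-1.0/ or proj-1.0.local/) from the file names recorded in the diff.

```shell
workdir=$(mktemp -d)
(
    cd "$workdir" || exit 1
    # Contributor: a pristine tree plus a locally edited copy.
    mkdir proj-1.0
    printf 'hello\nworld\n' > proj-1.0/main.txt
    cp -R proj-1.0 proj-1.0.local
    printf 'hello\nthere\n' > proj-1.0.local/main.txt

    # Record the changes as a recursive unified diff;
    # -N also covers files that exist in only one tree.
    diff -ruN proj-1.0 proj-1.0.local > proj.patch

    # Maintainer: apply the patch to a fresh pristine copy.
    cp -R proj-1.0 maintainer
    cd maintainer && patch -p1 < ../proj.patch > /dev/null
)
patched=$(cat "$workdir/maintainer/main.txt")
wanted=$(cat "$workdir/proj-1.0.local/main.txt")
rm -r "$workdir"
printf '%s\n' "$patched"
```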
Online Exercises
Lab Exercise
Objective: Use the diff command to track changes to files.
Estimated Time: 10 mins.
Specification
1. Use the diff command to annotate the differences between the files
/usr/share/doc/pinfo-0*/COPYING and /usr/share/doc/mtools-3*/COPYING, using the
context sensitive format. Record the output in the newly created file ~/COPYING.diff. When
specifying the filenames on the command line, list the pinfo file first, and use an absolute reference
for both.
2. Create a local copy of the directory /usr/share/gedit-2, using the following command (in your
home directory).
[student@station student]$ cp -a /usr/share/gedit-2 .
To your local copy of the gedit-2 directory, make the following changes.
a. Remove any two files.
b. Create an arbitrarily named file somewhere underneath the gedit-2 directory, with arbitrary
content.
c. Using a text editor, delete three lines from any file in the gedit-2/taglist directory.
Once you have finished, generate a recursive "diff" between /usr/share/gedit-2 and your copy,
gedit-2. Record the output in the newly created file ~/gedit.diff. When specifying the
directories on the command line, specify the original copy first, and use an absolute reference for
both. Do not modify the contents of your gedit-2 unless you also reconstruct your file
~/gedit.diff.
Deliverables
1. The file ~/COPYING.diff, which contains a context sensitive "diff" of the files
/usr/share/doc/pinfo*/COPYING and /usr/share/doc/mtools*/COPYING, where the pinfo version
of the file is used as the original, and each file is specified using an absolute reference.
2. The file ~/gedit.diff, which contains a recursive "diff" of the directories /usr/share/gedit-2 and
~/gedit-2. Both directories should be specified using absolute references, and the system directory should be
used as the original.
Questions
1. Which of the following command lines would generate a "diff" of the two files using the context sensitive format?
( ) a. diff -y origfile newfile
vreader-1.3/datebook.xml?
student]$
student]$
student]$
student]$
student]$
6. Which of the following command lines would report no differences between the files cal.txt and cal2.txt?
( ) a. diff --no-white-space cal.txt cal2.txt
( ) b. diff -B cal.txt cal2.txt
( ) c. diff -w cal.txt cal2.txt
( ) d. diff -I cal.txt cal2.txt
( ) e. None of the above
7. Which of the following command lines would report no differences between the files cal.txt and cal3.txt?
( ) a. diff -I "^world" cal.txt cal3.txt
( ) b. diff --ignore-regex "world$" cal.txt cal3.txt
( ) c. diff -i "world" cal.txt cal3.txt
( ) d. diff -r cal.txt cal3.txt
( ) e. None of the above
In its most basic form, the tr command performs byte for byte substitutions.
Using the -d command line switch, the tr command will delete specified characters from a stream.
Using the -s command line switch, the tr command will squeeze a series of repeated characters in a
stream into a single instance of the character.
Discussion
The tr Command
The tr command is a versatile utility that performs character translations on streams. Translating can mean replacing one character with another, deleting characters, or "squeezing" characters (collapsing repeated sequences of a character into one). Each of these uses is examined in the following sections.
Unlike all of the previous commands in this section, the tr command does not expect filenames as
arguments. Instead, the tr command operates exclusively on the standard in stream, reserving command
line arguments to specify transformations.
The following table specifies the various ways of invoking the tr command.
Table 7-1. Invocation Syntax for the tr Command
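Syntax              Effect
tr SET1 SET2        Translate each character in SET1 into the corresponding character in SET2.
tr -d SET           Delete all characters found in SET.
tr -s SET           Squeeze repeated occurrences of any character found in SET into a single instance.
tr -s SET1 SET2     Translate characters in SET1 into SET2, then squeeze all characters found in SET2.
tr -ds SET1 SET2    First delete all characters found in SET1, then squeeze all characters found in SET2.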
Character Specification
As the above table makes clear, the tr command makes extensive use of characters defined in sets. The
syntax for defining a range of characters is based upon the range specifier found in regular expressions.
The following expressions may be used when specifying characters.
Table 7-2. Specifying Characters for the tr Command
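Syntax          Character(s)
literal         The character itself.
\n              The new line character.
\r              The carriage return character.
\t              The tab character.
\\              The \ character.
A-Z             All characters from A through Z, in ascending order.
[:alnum:]       All letters and digits.
[:alpha:]       All letters.
[:blank:]       All horizontal white space (spaces and tabs).
[:digit:]       All digits.
[:lower:]       All lowercase letters.
[:print:]       All printable characters, including space.
[:punct:]       All punctuation characters.
[:space:]       All horizontal or vertical white space.
[:upper:]       All uppercase letters.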
The table is not meant to be a complete list. Consult the tr(1) man page, or tr --help, for more
information.
abczyxghi
Notice that in the output, the character d is replaced with the character z, e is replaced with the
character y, and f is replaced with the character x. The ordering of the sets is important. The third
letter from the first set is replaced with the third letter from the second set.
What happens if the two sets have unequal lengths? The second set is extended to the length of the first by repeating its last character.
abcyyxghi
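A short sketch of how set positions line up, using hypothetical sets chosen for illustration. It assumes GNU tr, which pads a shorter second set by repeating its final character, so xy behaves like xyyy against defg.

```shell
# Equal-length sets: d->z, e->y, f->x, position by position.
equal_out=$(echo abcdefghi | tr def zyx)

# SET2 is shorter: it is padded to "xyyy", so d->x and e, f, g all become y.
padded_out=$(echo abcdefghi | tr defg xy)

echo "$equal_out"
echo "$padded_out"
```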
A classic example of the tr command is to translate text into all upper case or all lower case letters. The
"old school" syntax for such a translation would use character ranges.
[madonna@rosemont madonna]$ cat /etc/hosts
As mentioned in the Lesson on regular expressions, however, range specifications can produce odd
results when various character sets are considered. The "new school" approach is to use character classes.
[madonna@rosemont madonna]$ tr [:lower:] [:upper:] < /etc/hosts
Recalling that the ordering of the character ranges is important to the tr command, the character classes
would need to generate consistently ordered ranges. Only the [:lower:] and [:upper:] character classes
are guaranteed to do so, implying that they are the only classes appropriate for use when using tr for
character translation.
abcghi
[madonna@station madonna]$ echo hark, I hear an elephant! | tr -d [:upper:][:punct:]
hark  hear an elephant
aaabcdddeeefggg
If called with the -s command line switch and two arguments, the tr command performs substitutions (as if -s had not been specified), but then squeezes any repeated characters from the second set.
[madonna@station madonna]$ echo "aaabbbcccdddeeefffggg" | tr -s bcf xye
aaaxydddeggg
Notice that this is essentially the same as performing the two operations separately.
[madonna@station madonna]$ echo "aaabbbcccdddeeefffggg" | tr bcf xye
aaaxxxyyydddeeeeeeggg
[madonna@station madonna]$ echo "aaabbbcccdddeeefffggg" | tr bcf xye | tr -s xye
aaaxydddeggg
Lastly, the tr command can be called with both the -s and -d command line switches. In this case, the tr
command expects two arguments. The tr command will first delete the first set of characters, and then
squeeze the second set.
[madonna@station madonna]$ echo "aaabbbcccaaadddeeefffggg" | tr -ds bcf ae
adddeggg
Note the order of operations carefully. This command is essentially the same as a delete (tr -d) followed
by a squeeze (tr -s).
[madonna@station madonna]$ echo "aaabbbcccaaadddeeefffggg" | tr -d bcf
aaaaaadddeeeggg
[madonna@station madonna]$ echo "aaabbbcccaaadddeeefffggg" | tr -d bcf | tr -s ae
adddeggg
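The equivalences described above can be checked mechanically. This sketch reuses the longer input string from the -ds example for both cases, and compares each combined invocation with its two-step pipeline.

```shell
input=aaabbbcccaaadddeeefffggg

# -s with two sets: translate, then squeeze the characters of SET2.
s_one=$(echo "$input" | tr -s bcf xye)
s_two=$(echo "$input" | tr bcf xye | tr -s xye)

# -ds: delete the characters of SET1, then squeeze those of SET2.
ds_one=$(echo "$input" | tr -ds bcf ae)
ds_two=$(echo "$input" | tr -d bcf | tr -s ae)

echo "-s:  $s_one  $s_two"
echo "-ds: $ds_one  $ds_two"
```

Note that squeezing only collapses adjacent repeats: in the -s case the two runs of a stay separate because x and y sit between them.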
Complementing Sets
Other than -s and -d, there are only two command line switches which modify tr's behavior, tabled below.
Table 7-3. Command Line Switches for the tr Command
Switch                  Effect
-c, --complement        Complement SET1 before operating (i.e., use the set of characters excluded by SET1).
-t, --truncate-set1     First truncate SET1 to the length of SET2.
As a quick example of the -c command line switch, the following deletes every character that is not a
vowel or a white space character from standard in.
[madonna@station madonna]$ echo aaabbbcccdddeee | tr -cd aeiouAEIOU[:space:]
aaaeee
Why did madonna get two very different results from the same command line? If you don't know the answer, and even if you do, you should protect arguments to the tr command with quotes.
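One likely explanation, sketched here under a stated assumption: if the current directory contains a file with a one-character name such as e, the shell treats an unquoted [:lower:] as a filename pattern (a bracket expression matching any one of :, l, o, w, e, r) and hands tr the matching filename instead of the class. The sketch creates such a file in a temporary directory to show the effect.

```shell
workdir=$(mktemp -d)
# Hypothetical stray file whose name matches the glob [:lower:].
touch "$workdir/e"

# Unquoted: the shell expands [:lower:] to "e", so tr deletes only e.
unquoted=$(cd "$workdir" && echo hello | tr -d [:lower:])
# Quoted: tr receives the class itself and deletes every lowercase letter.
quoted=$(cd "$workdir" && echo hello | tr -d '[:lower:]')
# Quoting keeps case conversion safe in the same directory.
upper=$(cd "$workdir" && echo 'Hello, World' | tr '[:lower:]' '[:upper:]')
rm -r "$workdir"

printf 'unquoted: %s\nquoted: [%s]\nupper: %s\n' "$unquoted" "$quoted" "$upper"
```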
Examples
Example 1. Using tr to Clean Up the df Command
Recall that a few Lessons ago, when discussing the cut command and its ability to extract fields of text from a stream, we tried to use cut to extract the first and fifth fields from the df command's output, specifying a space as the field delimiter.
[madonna@station madonna]$ df
Filesystem
/dev/hda3
/dev/hda1
none
1K-blocks
5131108
124427
127616
Filesystem
/dev/hda3
/dev/hda1
none
Now, she can use the cut command to easily extract the appropriate columns.
[madonna@station madonna]$ df | tr -s " " | cut -d" " -f1,5
Filesystem Use%
/dev/hda3 93%
/dev/hda1 23%
none 0%
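The reason tr -s helps can be seen on a pair of hypothetical lines shaped like df output (the device name and numbers are made up). After squeezing, each run of spaces becomes a single delimiter, so cut's field numbers line up with the visible columns.

```shell
# Two made-up lines padded with runs of spaces, like df's columns.
sample='Filesystem   1K-blocks   Used Use%
/dev/sda1          100     42  42%'

# Squeeze runs of spaces into one, then cut on the single space.
squeezed=$(printf '%s\n' "$sample" | tr -s ' ' | cut -d' ' -f1,4)
printf '%s\n' "$squeezed"
```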
She would prefer the text to use the Unix convention (a single new line character). She uses the tr
command to delete all instances of the carriage return character, storing the result in the file 2city12unix.txt.
[madonna@station madonna]$ tr -d "\r" < 2city12.txt > 2city12unix.txt
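Quoting matters here. A sketch with made-up text: when \r is quoted, tr receives the backslash escape and removes carriage returns; when it is left unquoted, the shell strips the backslash and tr deletes the letter r instead.

```shell
# Made-up DOS-style text: each line ends in carriage return + new line.
unix_text=$(printf 'line one\r\nline two\r\n' | tr -d '\r')

# Unquoted \r reaches tr as a plain "r", so every letter r disappears.
oops=$(echo carrier | tr -d \r)

printf '%s\n' "$unix_text"
echo "$oops"
```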
In order to confirm that the conversion happened appropriately, she performs a couple of checks. She first
examines the file with cat -A, and notes that the ^M characters have been removed.
[madonna@station madonna]$ head -5 2city12unix.txt | cat -A
16364
16364
32728
She notes that the difference in the number of characters (bytes) in the two files is the same as the
number of lines in the files (787603 - 771239 = 16364). This is appropriate if the tr command deleted
one character per line, as expected.
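The byte-count check can be reproduced on a small hypothetical DOS file. This sketch assumes every line, including the last, ends in a carriage return and new line pair, so the number of bytes removed should equal the number of lines.

```shell
workdir=$(mktemp -d)
# Hypothetical 3-line DOS file.
printf 'alpha\r\nbeta\r\ngamma\r\n' > "$workdir/dos.txt"
tr -d '\r' < "$workdir/dos.txt" > "$workdir/unix.txt"

# $(( )) normalizes any leading spaces that some wc versions print.
dos_bytes=$(( $(wc -c < "$workdir/dos.txt") ))
unix_bytes=$(( $(wc -c < "$workdir/unix.txt") ))
lines=$(( $(wc -l < "$workdir/unix.txt") ))
rm -r "$workdir"

# One carriage return per line should have been removed.
removed=$((dos_bytes - unix_bytes))
echo "removed $removed bytes from $lines lines"
```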
Finally, because she is no longer interested in keeping the DOS formatted version of the file, she renames
the file 2city12unix.txt to 2city12.txt.
[madonna@station madonna]$ mv 2city12unix.txt 2city12.txt
Good writing often requires that authors avoid overusing certain key words. The user madonna would like to put Charles Dickens to the test. She first uses a text editor to extract the opening paragraph from the text.
[madonna@station madonna]$ cat para1
She would now like to count how often particular words are used. In order to use the uniq -c command, she needs to rearrange the text so that the words appear one per line. She outlines the following plan.
1. Delete all punctuation marks.
2. Convert all uppercase characters into lowercase, so that It and it are considered the same word.
3. Convert every space character into a new line character, and squeeze multiple new line characters into one.
She begins implementing her plan one step at a time, so that she can observe the intermediate results.
[madonna@station madonna]$ tr -d [:punct:]
it
it
it
it
it
was
was
was
was
was
the
the
the
the
the
it
was
the
best
of
At this point, madonna is comfortable enough with sort and uniq to finish off the process.
[madonna@station madonna]$ tr -d '[:punct:]' < para1 | tr '[:upper:]' '[:lower:]' |
tr -s ' ' '\n' | sort | uniq -c | sort -rn | head -5
     14 the
     12 of
     11 was
     10 it
      4 we
Inspired by her progress, she next repeats the technique on the entire text. (The process took about 8
seconds on a 700MHz processor.)
[madonna@station madonna]$ tr -d '[:punct:]' < 2city12.txt | tr '[:upper:]' '[:lower:]' |
tr -s ' ' '\n' | sort | uniq -c | sort -rn | head -5
   8082 the
   4967 and
   4061 of
   3517 to
   2952 a
Example 4. Rot13
In the early days of Usenet newsgroups, people adopted a convention for obscuring text called rot13.
Suppose you were posting a joke, and wanted to include the punch line, but did not want the punch line to
be immediately obvious. The punch line could be transformed by rotating each letter by 13 places, so that
a would become n, b would become o, and z would become m, as in the following example.
Q: Why did the chicken cross the road?
A: Gb trg gb gur bgure fvqr.
How would someone find the answer? By piping the text through a rot13 translator implemented with tr.
[madonna@station madonna]$ echo "Gb trg gb gur bgure fvqr." | tr A-Za-z N-ZA-Mn-za-m
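The command above prints To get to the other side. Because rot13 maps each half of the alphabet onto the other half, applying it twice returns the original text, so the same tr command both encodes and decodes. Below is a minimal sketch that wraps it in a reusable function. The name rot13 is only an illustrative choice.

```shell
# rot13: rotate every ASCII letter 13 places, preserving case.
# A-M map to N-Z, and N-Z wrap around to A-M (likewise for lowercase);
# all other characters pass through unchanged.
rot13() {
    tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

echo "Gb trg gb gur bgure fvqr." | rot13           # decode the punch line
echo "To get to the other side." | rot13 | rot13   # two rotations give back the original
```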
Online Exercises
Lab Exercise
Objective: Gain familiarity with the tr command.
Estimated Time: 10 mins.
Specification
1. The /etc/passwd file uses colons as a field delimiter. Create the file ~/passwd.tsv, which is a
copy of the /etc/passwd file converted to use tabs as field delimiters (i.e., every : is converted
to a tab).
2. Create the file ~/file_roller.converted, which is a copy of the file
/usr/share/file-roller/glade/file_roller.glade, with the following transformations.
a. Convert all tabs to spaces.
b. Convert double quotes (") to single quotes ('). (Do not use backticks (`).)
3. Create a file called ~/openssl.converted, which is a copy of the file
/usr/share/ssl/openssl.cnf, with the following transformations.
a. All comment lines (lines whose first non-whitespace character is a #) are removed.
b. All empty lines are removed.
c. All upper case letters are folded into lower case letters.
d. All digits are replaced with the underscore character (_).
Deliverables
1. The file ~/passwd.tsv, which is a copy of the /etc/passwd file with tabs substituted for colons.
2. The file ~/file_roller.converted, which is a copy of the file
/usr/share/file-roller/glade/file_roller.glade, with all tabs converted to spaces, and all double
quotes converted to single quotes.
Questions
1. Which of the following command lines would convert all ASCII carriage return characters in the file text.mac
to ASCII new line characters?
( ) a. tr -c CR LF text.mac
( ) b. tr \r \n text.mac
( ) c. tr CR LF < text.mac
( ) d. tr \r \n < text.mac
( ) e. None of the above
2. Which of the following command lines would squeeze a series of repeated space characters in the file df.out
into a single space?
( ) a. tr --squeeze \s df.out
( ) b. tr -s " " df.out
( ) c. tr -ds " " < df.out
( ) d. tr -s SPC < df.out
( ) e. None of the above
3. Which of the following command lines would delete the trailing slash (/) from the string etc/?
( ) a. echo etc/ | tr -d [:punct:]
( ) b. tr -d / etc/
( ) c. tr -d [:letter:] < /etc/fstab
( ) d. echo etc/ | tr -cd /
( ) e. None of the above
In the following transcript, madonna is trying to save the ls(1) man page, and then edit it, only to find that it is full of
control characters and other mess.
[madonna@station madonna]$ man ls > ls.man.out
LS(1)
FSF
$
$
$
N^HNA^HAM^HME^HE$
ls - list directory contents$
$
S^HSY^HYN^HNO^HOP^HPS^HSI^HIS^HS$
l^Hls^Hs [_^HO_^HP_^HT_^HI_^HO_^HN]... [_^HF_^HI_^HL_^HE]...$
$
LS(1)$
4. Which of the following commands would effectively remove all of the ^H control sequences from the file
ls.man.out?
( ) a. tr -d ^H < ls.man.out
( ) b. tr -cd [:lower:] ls.man.out
( ) c. tr -d [:punct:] ls.man.out
( ) d. tr -cd [:print:][:space:] < ls.man.out
( ) e. None of the above
After successfully removing the ^H control sequences and storing the results in the file ls.man.noh, madonna is
still left with a mess to clean up.
[madonna@station madonna]$ tail +5 ls.man.noh | head
NNAAMMEE
ls - list directory contents
SSYYNNOOPPSSIISS
llss [_O_P_T_I_O_N]... [_F_I_L_E]...
DDEESSCCRRIIPPTTIIOONN
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of --ccffttuuSSUUXX nor ----ssoorrtt.
Mandatory arguments to long options are mandatory for short options
5. Which of the following command lines would remove all underscores from the file ls.man.noh?
( ) a. tr -d [:alnum:] < ls.man.noh
( ) b. tr -d _ < ls.man.noh
( ) c. tr -cd _ ls.man.noh
( ) d. tr -d _ ls.man.noh
( ) e. None of the above
After successfully removing the _ characters, and storing the results in ls.man.noh_, madonna is still frustrated by
a large number of "doubled" letters and hyphens (-).
NNAAMMEE
ls - list directory contents
SSYYNNOOPPSSIISS
llss [OPTION]... [FILE]...
DDEESSCCRRIIPPTTIIOONN
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of --ccffttuuSSUUXX nor ----ssoorrtt.
Mandatory arguments to long options are mandatory for short options
6. Which of the following command lines would convert the doubled letters and hyphens into a single instance?
( ) a. tr -s [:alpha:] ls.man.noh_
( ) b. tr -s [:lower:-] ls.man.noh_
( ) c. tr -cs [:alpha:]- < ls.man.noh_
( ) d. tr -s [:alpha:-] < ls.man.noh_
( ) e. tr -s [:alpha:]- < ls.man.noh_
She successfully removes the doubled characters, and stores the results in the file ls.man.clean.
[madonna@station madonna]$ tail +5 ls.man.clean | head
NAME
ls - list directory contents
SYNOPSIS
ls [OPTION]... [FILE]...
DESCRIPTION
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuSUX nor -sort.
7. After applying the correct answer from the previous question, what potential inaccuracies are in the text?
( ) a. Any line originally containing an underscore has been deleted from the text.
( ) b. Any line containing repeated punctuation characters now has only a single instance of the character.
( ) c. Any word originally containing a sequence of repeated letters now has only a single instance of the letter.
( ) d. Any line originally containing a bracket ([ ]) or colon has been deleted from the text.
( ) e. There should be no distortions to the original text.
2city12.txt
ls.man.clean
ls.man.noh
ls.man.noh_
ls.man.out
para1
ebbbe
10. Which of the following expressions could replace the expression ????????
( ) a. tr -s acde
( ) b. tr -s acd e
( ) c. tr -d acd e
( ) d. tr -d acd | tr -s e
( ) e. None of the above
Notes
1. Project Gutenberg is based at the web site http://gutenberg.net.
2. This example is inspired by a similar example found in the coreutils info page (info coreutils).
The aspell -l command performs a non-interactive spell check on the standard input stream.
The aspell dump command can be used to view the system's master dictionary or a user's personal dictionary.
The commands aspell create personal and aspell merge personal can be used to create or append to a
user's personal dictionary from a word list.
Discussion
In the Red Hat Enterprise Linux distribution, the aspell utility is the primary utility for checking the
spelling of text files. In this Lesson, we learn how to use aspell to interactively spell check a file and
customize the spell checker with a personal dictionary.
Using aspell
When running aspell, the first argument (other than possible command line switches) is interpreted as a
command, telling aspell what to do. The following commands are supported by aspell.
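Table 8-1. Aspell commands
Command                        Action
-l, list                       Read text from standard input and list each misspelled word (a non-interactive spell check).
config                         Display aspell's current configuration settings.
dump master|personal|repl      Dump the master dictionary, the personal dictionary, or the replacement list to standard output.
create master|personal|repl    Create a master dictionary, personal dictionary, or replacement list from a word list read on standard input.
merge master|personal|repl     Merge a word list read on standard input into the master dictionary, personal dictionary, or replacement list.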
The following table lists some of the more common command line switches that are used with the aspell
command.
Switch                         Effect
-W N, --ignore=N               Ignore words of N characters or fewer.
--ignore-case                  Ignore case when checking words.
-p, --personal=filename        Use the word list filename for the personal word list.
-x, --dont-backup              Do not create a backup file.
Hey Elvis!
I heard that you were about to take the lab test for the string
procesing workbook in Red Hat Academy. IIRC, its prety
straightforward, if you've been keeping up with the exercises.
LOL, Prince
Before sending the message, prince uses aspell -c to perform an interactive spell check.
[prince@station prince]$ aspell -c toelvis
Upon execution, the aspell command opens an interactive session, highlighting the first word it recognizes as
misspelled.
Hey Elvis!
I heard that you were about to take the lab test for the string
procesing workbook in Red Hat Academy. IIRC, its prety
straightforward, if you've been keeping up with the exercises.
LOL, Prince
=====================================================================
1) processing          6) preceding
2) precessing          7) professing
3) precising           8) promising
4) proceeding          9) proposing
At this point, prince has a "live" keyboard, meaning that single key presses take effect immediately, without his
needing to press the RETURN key. He may choose from the following options.
Use Suggested Replacement
The aspell command does its best to suggest replacements for the misspelled word from its
dictionary. If one of the suggestions is correct (as it is in this case), prince can select it
by simply pressing the numeric key associated with it.
Ignore the Word
Pressing i makes aspell ignore this instance of the word and move on. Pressing capital I
causes aspell to ignore all instances of the word in the current file.
Replace the Word
If aspell was not able to generate an appropriate suggestion, prince may press r to replace
the word manually. When he is finished, aspell picks up again, first rechecking the specified replacement. If
he uses capital R instead, aspell remembers the replacement and automatically replaces other instances of
the misspelled word.
Add the Word to the Personal Dictionary
If prince would like aspell to learn a new word, so that it will not be flagged when checking future
files, he may press a to add the word to his personal dictionary.
Exit aspell
By pressing x, prince can immediately exit the interactive aspell session. Any spelling corrections
already made will be saved.
As prince proceeds through the interactive session, aspell flags procesing, prety, IIRC, and LOL as
misspelled. For the first two, prince accepts aspell's suggestions for the correct spelling. The last two
"words" are abbreviations that prince commonly uses in his emails, so he adds them to his personal
dictionary. Unfortunately, because its is a legitimate word, aspell does not report prince's misuse of it.
When finished, prince has two files: the corrected version of toelvis, and an automatically
generated backup of the original, toelvis.bak.
[prince@station prince]$ ls
toelvis
toelvis.bak
4c4
< processing workbook in Red Hat Academy.
---
> processing workbook in Red Hat Academy.
procesing
IIRC
prety
LOL
The aspell utility lists the four words it would flag as misspelled. After the interactive spell check, prince
performs a non-interactive spell check on his backup of the original file.
[prince@station prince]$ aspell -l < toelvis.bak
procesing
prety
Because the words IIRC and LOL were added to prince's personal dictionary, they are no longer flagged
as misspelled.
LOL
IIRC
153675
[prince@station prince]$ aspell dump master | grep "^add.*ion$"
addiction
addition
adduction
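The regular expression ^add.*ion$ matches words that begin with add (^ anchors the start of the line) and end with ion ($ anchors the end), with .* allowing any run of characters in between. The sketch below runs the same grep against a small stand-in word list rather than aspell's master dictionary, so it does not depend on aspell being installed. The word list and the helper name word_list are made up for illustration.

```shell
# word_list (hypothetical): a stand-in for "aspell dump master", one word per line.
word_list() {
    printf '%s\n' addiction addition adduction additional address ration add
}

# ^add  -> the word must start with "add"
# .*    -> followed by any run of characters (possibly none)
# ion$  -> and must end with "ion"
word_list | grep '^add.*ion$'
```

Only addiction, addition, and adduction are printed. additional fails the ion$ anchor, and ration fails the ^add anchor.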
The aspell command can also automatically create a personal dictionary (if it doesn't already exist), or
merge into it (if it does) using words read from standard input. Suppose prince has a previous email
message in which he used many of his usual abbreviations. He would like to add all of the
abbreviations found in that email to his personal dictionary. He first uses aspell -l to extract the flagged words
from the original message.
[prince@station prince]$ aspell -l < good_email.txt
FWIW
After observing the results, he decides to add all of these words to his personal dictionary, using aspell
merge personal. When he finishes, he again dumps his (expanded) personal dictionary.
[prince@station prince]$ aspell -l < good_email.txt | aspell merge personal
[prince@station prince]$ aspell dump personal
TTFN
AFK
LOL
RSN
IIRC
FWIW
Getting Help
Where would prince expect to find help for the aspell command?
[prince@station prince]$ man aspell
A reasonable first guess, but in this case wrong. Like most commands, aspell will generate a usage
summary when called with the --help command line switch. Additional documentation can be found in
the /usr/share/doc/aspell-0*/man-text/ directory (as simple text files), or in
/usr/share/doc/aspell-0*/man-html/ (in HTML format). The following command, when executed
from an X terminal, will start prince off in the HTML-based documentation.
[prince@station prince]$ mozilla /usr/share/doc/aspell-0.33.7.1/man-html/index.html
Examples
Example 1. Adding Service Names to aspell's Personal Dictionary
The user prince frequently answers questions related to Linux's networking services in his emails,
and aspell consistently flags the conventional service names as misspelled words. He would like to add
Using the less pager to browse the file services.maybe, he finds many duplicate entries. He makes life
easier for himself (and eventually aspell) by regenerating the list, removing duplicates.
[prince@station prince]$ aspell -l < /etc/services | sort | uniq > services.maybe
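The sort in this pipeline is required: uniq only discards a line when it is identical to the line immediately before it, so duplicates must first be brought together. POSIX sort also offers -u, which sorts and removes duplicates in one step. Below is a small sketch comparing the approaches on made-up input. The word list and the helper name words are illustrative. LC_ALL=C is set so that sort and uniq both compare raw bytes.

```shell
# words (hypothetical): unsorted input whose duplicates are not adjacent.
words() {
    printf '%s\n' smtp ftp telnet smtp ssh ftp smtp
}

export LC_ALL=C

echo "uniq without sort (non-adjacent duplicates survive):"
words | uniq

echo "sort | uniq:"
words | sort | uniq

echo "sort -u:"
words | sort -u
```

The first listing still contains all seven lines. The last two both print ftp, smtp, ssh, and telnet once each.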
Browsing the file again, prince is satisfied that the list contains words he would rather not have flagged as
misspelled. He adds the word list to his personal dictionary.
[prince@station prince]$ aspell merge personal < services.maybe
Online Exercises
Lab Exercise
Objective: Use the aspell command to perform routine spell checks.
Estimated Time: 10 mins.
Setup
In order to prepare for this Exercise, remove any personal dictionaries (or replacement lists) you have
accumulated using the following command.
[student@station student]$ rm .aspell*
Specification
1. Generate a list of all words that aspell flags as misspelled in all files underneath the
/etc/sysconfig directory and its subdirectories. The list should be sorted in ascending alphabetical
order, and duplicate words should be removed. Store the list (one word per line) in the file
~/sysconfig.spell.txt.
20c20
< Maarten Litmaath called which-v6, he was using -i as option
---
> Maarten Smith called which-v6, he was using -i as option
59c59
<         to explicity search for normal binaries, while using
---
>         to explicitly search for normal binaries, while using
73c73
<         ful to explicity search for normal binaries, while
---
>         ful to explicitly search for normal binaries, while
[student@station student]$ aspell dump personal
stdout
usr
csh
texinfo
gcc
stdin
Deliverables
1. The file ~/sysconfig.spell.txt, which contains an alphabetically ascending sorted list of all words flagged
by aspell as misspelled found in all files underneath the /etc/sysconfig directory, and its subdirectories. The
file should not contain duplicate words.
2. The file ~/README, which is a copy of the file /usr/share/doc/which-*/README, which has been spell
checked with the aspell command. The word explicity should be replaced with explicitly, and the word Litmaath
with Smith.
3. An aspell personal dictionary that contains exactly the words gcc, stdout, texinfo, stdin, usr, and csh.
Questions
1. Which of the following command lines would start an interactive aspell spell check on the file report.txt?
( ) a. aspell report.txt
( ) b. aspell -c report.txt
( ) c. aspell -c < report.txt
( ) d. aspell < report.txt
( ) e. None of the above
2. Which of the following command lines would start a non-interactive aspell spell check on the file report.txt?
( ) a. aspell -l report.txt
( ) b. aspell < report.txt
( ) c. aspell -b report.txt
( ) d. aspell -l < report.txt
( ) e. None of the above
3. Which of the following cannot be performed when aspell flags an unrecognized word during an interactive spell
check?
( ) a. The unrecognized word can be added to the system's master dictionary.
( ) b. The unrecognized word can be added to the user's personal dictionary.
( ) c. The unrecognized word can be replaced from a list of suggested replacements.
( ) d. The unrecognized word can be manually replaced by the user.
( ) e. All of the above actions can be performed.
4. Assuming the file mywords.txt contains a series of whitespace-separated words, which of the following
command lines could be used to add the words to a user's personal dictionary?
( ) a. aspell merge mywords.txt
( ) b. aspell merge personal mywords.txt
( ) c. aspell merge personal < mywords.txt
( ) d. aspell merge < mywords.txt
( ) e. None of the above
Using the -p command line switch, the fmt command will only reformat text that begins with the
specified prefix, preserving the prefix.
The split command can be used to split a single file into multiple files based on either a number of
lines or a number of bytes.
Discussion
The fmt Command
Motivation for the fmt Command
Hopefully, the Lessons encountered so far in this Workbook have demonstrated the powerful ways that
text can be manipulated using basic Linux (and Unix) command line utilities. Because Linux provides
such a useful toolkit of text manipulation commands, the data that people handle is often left as simple
text. The /etc/passwd file is the classic example. Rather than embedding user definitions in some
database that requires a custom utility for access, they are defined in a simple text file that anyone with
knowledge of the grep command can search.
The widespread use of simple text editors follows naturally from the prevalence of simple text files.
We emphasize again that text editors are not word processors. Elaborate word processing
applications, such as OpenOffice or AbiWord, generally store information using elaborate markup or
binary formatting to define fonts, colors, and other such details about the text's appearance. In contrast,
simple text editors such as nano, vim, or gedit store just the data: what you see is what you get. As a
result, editing a text file with a text editor gives users much more control and predictability.
One side effect of the variety of text editors in Linux, and in particular the coexistence of text editors and
word processors, is the inconsistencies with which word wrapping is handled. To a word processor, and to
many HTML-based text entry forms, new line characters are usually considered beneath the
concern of users. A user begins typing text without ever using the RETURN key, and the application
decides when to wrap a line and where to insert a new line character. While this is not a problem, and is
perhaps even desirable, when writing a letter to a friend, it can cause significant problems when editing a
line-based configuration file (such as the /etc/passwd file, the /etc/hosts file, the /etc/fstab file,
and so on).
As an example of the inconsistencies of various text editors, the user elvis tries a simple experiment. He
types the first sentence from the previous paragraph using four different applications: the nano text
What result does wc show? The four different applications used four different conventions for displaying
and saving the simple text sentence (five, if you include the binary OpenOffice format).
[elvis@station elvis]$ wc side_effect.* 2>/dev/null
      1      31     188 side_effect.gedit
      1      31     188 side_effect.gvim
     16     109    4950 side_effect.ooffice.sxw
      0      31     187 side_effect.ooffice.txt
      3      31     190 side_effect.nano
     21     233    5703 total
The nano text editor was the only application that implemented word wrapping by default. Although
elvis never hit the return key, three ASCII new line characters were inserted. The gedit and gvim
applications were consistent with Linux (and Unix) convention: they did not insert new line characters in
the middle of the text, but they would not let a text file end without a terminating new line character.
Although consistent with each other in terms of how the file was stored, they differed in how the text was
presented to the user: gedit wrapped the text at word boundaries, while gvim wrapped the text only when
it could fit no more on a line. Like gedit, the OpenOffice application wrapped the text while displaying
it, but did not add the conventional Linux new line to the end of the file when saving it to disk. We can't
even begin to discuss why the OpenOffice standard format took nearly 5000 bytes of binary data to store
about 200 characters.
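The 0 in the lines column for side_effect.ooffice.txt follows from how wc counts lines: it counts new line characters, not lines of visible text. Below is a quick sketch with printf. The sample strings are arbitrary.

```shell
# wc -l counts new line characters, so text without a final new line
# contributes nothing to the line count.
printf 'one sentence, no final new line' | wc -l   # wc -l reports 0
printf 'one sentence, final new line\n' | wc -l    # wc -l reports 1
printf 'wrapped\nby\nnano\n' | wc -l               # wc -l reports 3
```

The same reasoning explains why the nano file shows 3 lines and 190 bytes: each new line that nano inserted adds one to both counts.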
All of this is to say that how an application handles word wrapping is not obvious to the casual
user, and word wrapping often causes problems when text written with one utility is read with
another.
One side effect of the variety of text editors in Linux, and in particular the c
oexistence of text editors and word processors, is the inconsistencies with whic
h word wrapping is handled.
[elvis@station elvis]$ fmt side_effect.gvim
31
| wc
188
The cat command, true to its nature, performed no formatting on the file when it displayed it. The lines
wrapped at 80 characters only because of the terminal that was displaying them. The fmt command, on the
other hand, wrapped the text at word boundaries so that no line was over 75 characters in length.
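Switch                  Effect
-w, --width=N, -N       Fill output lines up to a maximum of N characters (default 75).
-p, --prefix=STRING     Reformat only lines beginning with STRING, reattaching STRING to each
                        reformatted line.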
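The following sketch, assuming GNU fmt and awk, reproduces the difference between the raw file and fmt's output: a single 187-character line goes in, and no output line exceeds the default width of 75 (or 40, with -w 40). The file name side_effect.txt is hypothetical.

```shell
#!/bin/sh
# Sketch: fmt re-wraps a single long line at word boundaries.
# side_effect.txt is a hypothetical copy of elvis's sentence.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '%s\n' 'One side effect of the variety of text editors in Linux, and in particular the coexistence of text editors and word processors, is the inconsistencies with which word wrapping is handled.' > side_effect.txt

# Print the length of the longest line in a file (0 for an empty file).
longest() { awk '{ if (length($0) > m) m = length($0) } END { print m + 0 }' "$1"; }

fmt side_effect.txt > fmt75.txt        # default maximum width (75)
fmt -w 40 side_effect.txt > fmt40.txt  # explicit maximum width of 40
cat fmt75.txt
echo "longest line: $(longest side_effect.txt) before, $(longest fmt75.txt) after fmt"
```

Only the line breaks change; the 31 words themselves pass through fmt untouched.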
As a rule, each DbFoo object has exactly one underlying DB_FOO struct
(defined in db.h) associated with it. In some cases, we inherit directly
from the DB_FOO structure to make this relationship explicit. Often,
the underlying C layer allocates and deallocates these structures, so
there is no easy way to add any data to the DbFoo class. When you see
a comment about whether data is permitted to be added, this is what
is going on. Of course, if we need to add data to such C++ classes
in the future, we will arrange to have an indirect pointer to the
DB_FOO struct (as some of the classes already have).
Suppose a programmer edited the comment, adding the following few words on the second line.
[elvis@station elvis]$ cat cxx_comment.txt
//
// As a rule, each DbFoo object has exactly one underlying DB_FOO struct
// (defined in db.h) associated with it. In some cases, but we really don't
expect many of them, we inherit directly
// from the DB_FOO structure to make this relationship explicit. Often,
// the underlying C layer allocates and deallocates these structures, so
// there is no easy way to add any data to the DbFoo class. When you see
// a comment about whether data is permitted to be added, this is what
// is going on. Of course, if we need to add data to such C++ classes
// in the future, we will arrange to have an indirect pointer to the
// DB_FOO struct (as some of the classes already have).
//
Because each line of the text begins with // and ends with an ASCII new line character, fitting the edited
line back into 80 characters by hand would involve pushing some words onto the next line, which would
then also need to be reformatted, and so on down the comment.
Fortunately, the fmt command with the -p command line switch makes life much easier.
[elvis@station elvis]$ fmt -70 -p"// " cxx_comment.txt
//
//
//
//
//
//
//
//
//
//
//
The fmt command did all of the hard work, and preserved the prefix characters.
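To try the -p switch without elvis's file, the following minimal sketch builds a hypothetical comment.cc whose second line has grown too long, and reflows it with -w 70 (equivalent to the -70 used above). Lines that do not begin with the prefix, such as the line of code at the bottom, pass through unchanged.

```shell
#!/bin/sh
# Sketch: reflow a C++ comment with fmt -p while leaving code alone.
# comment.cc and its contents are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > comment.cc <<'EOF'
//
// As a rule, each object has exactly one underlying struct associated with it. In some cases, but not many, we inherit directly
// from the structure to make this relationship explicit.
//
int untouched_code = 0;
EOF

# Only lines that start with the prefix are reflowed; fmt writes the
# prefix back onto each line it produces and keeps lines within 70 columns.
fmt -w 70 -p '// ' comment.cc > comment_fmt.cc
cat comment_fmt.cc
```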
 1066  9594 47929 pointless.txt
this is line number 1062 of a pointless file.
this is line number 1063 of a pointless file.
this is line number 1064 of a pointless file.
this is line number 1065 of a pointless file.
this is line number 1066 of a pointless file.
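Each line of pointless.txt has nine words, so 1066 lines give 1066*9 = 9594 words. The lesson does not say how the file was created; the following sketch assumes a seq and sed one-liner, which produces a file with exactly the counts shown above.

```shell
#!/bin/sh
# Sketch: regenerate a file shaped like pointless.txt. The seq/sed
# generator is an assumption; the lesson does not say how the file was made.
dir=$(mktemp -d)
cd "$dir" || exit 1
seq 1 1066 | sed 's/.*/this is line number & of a pointless file./' > pointless.txt
wc pointless.txt
tail -n 5 pointless.txt
```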
Now elvis uses the split command to divide the file into smaller files, each of 200 lines.
[elvis@station elvis]$ split -200 pointless.txt sub_pointless_
[elvis@station elvis]$ wc sub_pointless_a*
  200  1800  8892 sub_pointless_aa
  200  1800  9000 sub_pointless_ab
  200  1800  9000 sub_pointless_ac
  200  1800  9000 sub_pointless_ad
  200  1800  9001 sub_pointless_ae
   66   594  3036 sub_pointless_af
 1066  9594 47929 total
this is line number 796 of a pointless file.
this is line number 797 of a pointless file.
this is line number 798 of a pointless file.
this is line number 799 of a pointless file.
this is line number 800 of a pointless file.
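The following sketch, again assuming the hypothetical seq/sed generator, performs the same 200-line split and then checks that concatenating the pieces restores the original file byte for byte.

```shell
#!/bin/sh
# Sketch: split a 1066-line file into 200-line pieces and rejoin them.
# The input is regenerated with a hypothetical seq/sed one-liner.
dir=$(mktemp -d)
cd "$dir" || exit 1
seq 1 1066 | sed 's/.*/this is line number & of a pointless file./' > pointless.txt

split -l 200 pointless.txt sub_pointless_   # the lesson's -200 is the older spelling
wc -l sub_pointless_*

# The glob expands in alphabetical order (aa, ab, ...), which is the order
# split wrote the pieces, so cat reassembles the original exactly.
cat sub_pointless_* > rejoined.txt
cmp pointless.txt rejoined.txt && echo "rejoined file is identical"
```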
Switch                  Effect
-l, --lines=N, -N       Split input into files of N lines (default 1000).
-b, --bytes=N           Split input into files of N bytes.
-C, --line-bytes=N      Split input into files of at most N bytes, but perform split at the end
                        of a line.
-a, --suffix-length=N   Use suffixes of N letters (default 2).
Notes:
a. When specifying N, a single letter suffix can be included which acts as a multiplier: b=512,
k=1024, and M=1024*1024.
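The difference between -b and -C (--line-bytes) is easiest to see on a tiny file. In this sketch (words.txt is hypothetical), -b 10 cuts every 10 bytes even in the middle of a word, while -C 10 writes pieces of at most 10 bytes that always end with a new line.

```shell
#!/bin/sh
# Sketch: compare split -b (cut at exactly N bytes) with split -C
# (at most N bytes, cut only at the end of a line). words.txt is hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'one\ntwo\nthree\nfour\nfive\nsix\nseven\neight\n' > words.txt   # 40 bytes

split -b 10 words.txt bytes_    # pieces of exactly 10 bytes (last may be shorter)
split -C 10 words.txt lines_    # pieces of up to 10 bytes, each ending in a new line

wc -c bytes_* lines_*
```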
Splitting Standard In
In the previous Lesson, we saw that aspell's master dictionary can be dumped using the following
command.
[elvis@station elvis]$ aspell dump master | wc
 153675  153675 1502478
The user elvis would like to store a copy of the dictionary, but he would like to break it down into files of
100 lines each. Because 153675 lines at 100 lines per file require 1537 files (the last holding only 75 lines),
the default two-letter suffixes, which allow only 26*26 = 676 names, would run out of letters, so he bumps
the suffix length up to 3 (26*26*26 = 17576 names). Because he also wants to specify the string dict_ as a
prefix, he must supply two arguments, so he uses the special filename - to cause split to read from
standard in.
[elvis@station dict]$ aspell dump master | split -100 -a3 - dict_
[elvis@station dict]$ ls
dict_aaa  dict_ahl  dict_aow  dict_awh  dict_bds  dict_bld  dict_bso  dict_bzz
dict_aab  dict_ahm  dict_aox  dict_awi  dict_bdt  dict_ble  dict_bsp  dict_caa
dict_aac  dict_ahn  dict_aoy  dict_awj  dict_bdu  dict_blf  dict_bsq  dict_cab
...
dict_ahb  dict_aom  dict_avx  dict_bdi  dict_bkt  dict_bse  dict_bzp  dict_cha
dict_ahc  dict_aon  dict_avy  dict_bdj  dict_bku  dict_bsf  dict_bzq  dict_chb
dict_ahd  dict_aoo  dict_avz  dict_bdk  dict_bkv  dict_bsg  dict_bzr  dict_chc
dict_ahe  dict_aop  dict_awa  dict_bdl  dict_bkw  dict_bsh  dict_bzs
dict_ahf  dict_aoq  dict_awb  dict_bdm  dict_bkx  dict_bsi  dict_bzt
dict_ahg  dict_aor  dict_awc  dict_bdn  dict_bky  dict_bsj  dict_bzu
dict_ahh  dict_aos  dict_awd  dict_bdo  dict_bkz  dict_bsk  dict_bzv
dict_ahi  dict_aot  dict_awe  dict_bdp  dict_bla  dict_bsl  dict_bzw
dict_ahj  dict_aou  dict_awf  dict_bdq  dict_blb  dict_bsm  dict_bzx
dict_ahk  dict_aov  dict_awg  dict_bdr  dict_blc  dict_bsn  dict_bzy
    100     100     788 dict_aaa
    100     100     790 dict_aab
    100     100    1008 dict_aac
...
    100     100    1215 dict_cha
    100     100    1206 dict_chb
     75      75     917 dict_chc
 153675  153675 1502478 total
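The suffix arithmetic can be checked without aspell. The following sketch assumes seq output as a stand-in for the dictionary dump: 153675 lines split 100 at a time need 1537 files, more than the 676 two-letter names, and the 1537th name with three-letter suffixes is dict_chc. Newer GNU split releases may lengthen the default suffix on their own when two-letter names run out, but passing -a 3 explicitly does not depend on that.

```shell
#!/bin/sh
# Sketch: check the suffix arithmetic for 153675 lines split 100 at a time.
# seq output stands in for "aspell dump master" (hypothetical data).
dir=$(mktemp -d)
cd "$dir" || exit 1

lines=153675
per_file=100
# Round up: a partial final piece still needs its own file.
echo "files needed: $(( (lines + per_file - 1) / per_file ))"   # 1537 > 26*26 = 676

# "-" makes split read standard input; -a 3 allows 26^3 = 17576 names.
seq 1 "$lines" | split -l "$per_file" -a 3 - dict_
ls dict_* | wc -l        # pieces written
ls dict_* | tail -n 1    # last name in alphabetical order
wc -l < dict_chc         # lines left for the final piece
```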
Examples
Example 1. Using fmt to Clean Email
While using the mutt terminal-based mailer, elvis saves and then views the following email message.
[elvis@station elvis]$ cat email.txt
The email is composed of different included sections, each of which was presumably written by a
different author using a different text editor. The first few lines are fine, but then the last included
comment is all one long line.
Before replying, elvis cleans up the message using the fmt command.
[elvis@station elvis]$ fmt -p"> >> " -w60 email.txt
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
Notice that the fmt command operated only on lines that began with the > >> prefix. (In this case,
there was only one such line.) The rest of the text was left alone.
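The following sketch recreates the situation with a hypothetical three-line email.txt. Only the line that begins with > >> is reflowed to a width of 60; the header line and the single-level quote are copied through unchanged.

```shell
#!/bin/sh
# Sketch: reflow only the "> >> " lines of a saved message.
# email.txt and its contents are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > email.txt <<'EOF'
From: someone@example.com
> I think the meeting should move to Tuesday.
> >> This included comment was written with an editor that never inserted a newline, so it arrives as one very long line that wraps badly in a terminal.
EOF

# Lines that do not begin with the prefix are copied through unchanged.
fmt -p '> >> ' -w 60 email.txt > email_fmt.txt
cat email_fmt.txt
```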
In the first Lesson, the PNM format was mentioned as a simple example of encoding images. The picture
is first reduced to an array of dots ("pixels"), and then the color of each pixel is encoded into three bytes
of raw data, its "redness", "greenness", and "blueness", each as a value from 0 to 255.
A few lines of ASCII text are prepended to the file to identify the format, the number of pixels in each row,
the number of rows, and the "depth" of the image. (The depth is the largest value used to encode a color
component. Using the scheme described in the previous paragraph, the image would have a depth of 255.)
After a little experimenting with the head command, elvis determines that his image file consists of four
lines of ASCII text, followed by binary data.
[elvis@station elvis]$ head -4 clouds.pnm
P6
The text P6 probably acts as magic. Magic is the term for specific strings (or bytes) that identify
(often binary) file formats. A collection of "magic" identifiers is cataloged in the file
/usr/share/magic. (For the curious, try grep P6 /usr/share/magic.)
Apparently, any line in the ASCII header that begins with a # is interpreted as a comment.
These two numbers probably identify the number of pixels in a row and the number of rows in the
image. His image is an array of 256x256 pixels.
The last number defines the depth of the image, elvis assumes.
P6
# CREATOR: The GIMP's PNM Filter Version 1.0
256 256
255
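The lesson does not show the commands that produced clouds.hdr and clouds.dat. One way to do it, assuming a header of exactly four lines, is to take the first four lines with head and everything from line five onward with tail -n +5; because tail counts lines from the top of the file, this works even if the binary data happens to contain new line bytes. The 16x16 image built below is hypothetical.

```shell
#!/bin/sh
# Sketch: separate a PNM file into its ASCII header and its raw pixel data.
# Assumes, as elvis found for clouds.pnm, a header of exactly four lines.
# demo.pnm is a hypothetical 16x16 all-black image built on the spot.
dir=$(mktemp -d)
cd "$dir" || exit 1
{
    printf 'P6\n# CREATOR: hypothetical demo\n16 16\n255\n'
    head -c $((16 * 16 * 3)) /dev/zero     # 768 bytes: 3 bytes per pixel
} > demo.pnm

head -n 4 demo.pnm > demo.hdr    # lines 1-4: the header
tail -n +5 demo.pnm > demo.dat   # line 5 onward: the binary data
wc -c demo.hdr demo.dat
```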
[elvis@station elvis]$ wc clouds.dat
While the line and word counts reported by the wc command are meaningless for binary data, the number of
"characters" is really the number of bytes in the file. Performing a quick calculation, elvis determines
that an image of 256x256 pixels, with each pixel requiring 3 bytes of data, should be 256*256*3=196608
bytes in length. The wc command's character count agrees.
Next, elvis uses the split command to divide the image's raw data into four slices, each 196608/4=49152
bytes in size.
[elvis@station elvis]$ split -b49152 clouds.dat clouds_
[elvis@station elvis]$ wc clouds.dat clouds_* 2>/dev/null
     0      8 196608 clouds.dat
     0      1  49152 clouds_aa
     0      1  49152 clouds_ab
     0      1  49152 clouds_ac
     0      8  49152 clouds_ad
     0     19 393216 total
Now elvis has four slices of raw data from the original image. Because each row of 256 pixels takes
256*3=768 bytes, each 49152-byte slice holds exactly 49152/768=64 complete rows, one fourth of the original
number of rows. He uses a text editor to update the header information to reflect this change, and stores
the updated header in the file clouds.newhdr.
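The row arithmetic can be checked directly. This sketch uses a block of zero bytes as a hypothetical stand-in for clouds.dat and confirms that each 49152-byte slice holds a whole number of 768-byte rows.

```shell
#!/bin/sh
# Sketch: check the slice arithmetic for a 256x256 image with 3 bytes per pixel.
# A run of zero bytes stands in for elvis's clouds.dat (hypothetical data).
dir=$(mktemp -d)
cd "$dir" || exit 1
width=256
height=256
head -c $((width * height * 3)) /dev/zero > clouds.dat   # 196608 bytes

slice=$((width * height * 3 / 4))     # 49152 bytes per slice
row=$((width * 3))                    # 768 bytes per row of pixels
echo "rows per slice: $((slice / row)), leftover bytes: $((slice % row))"

split -b "$slice" clouds.dat clouds_
wc -c clouds_*
```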
[elvis@station elvis]$ diff -u clouds.hdr clouds.newhdr
As the diff command reveals, elvis's only edit was to change the number which defines the number of
rows from 256 to 256/4=64. Now elvis creates 4 new PNM image files by prepending the modified
header to the split image data. When finished, he views his images with the "Eog of GNOME" viewer,
eog.
[elvis@station elvis]$ cat clouds.newhdr clouds_aa > clouds_row1.pnm
[elvis@station elvis]$ cat clouds.newhdr clouds_ab > clouds_row2.pnm
[elvis@station elvis]$ cat clouds.newhdr clouds_ac > clouds_row3.pnm
[elvis@station elvis]$ cat clouds.newhdr clouds_ad > clouds_row4.pnm
[elvis@station elvis]$ eog clouds_row*
Why would elvis want to use command line tools? One answer is precision. Most graphical image
editors use mouse selections to perform these types of operations, which can lead to frustration when
trying to perform exacting edits. The second answer is automation. Suppose elvis had 283 images on
which he needed to perform the same operation. The process used above could easily be automated by
recording the commands in a bash script. (While the need for this level of precision or automation is
hard to imagine when handling abstract images, consider someone who might be handling images
routinely created by a medical imaging device.)
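As an illustration of that kind of automation, here is a sketch of a shell function that repeats elvis's steps for every PNM file in a directory. It is not from the lesson: the function names slice_pnm and make_pnm are hypothetical, and it assumes every image has the same four-line header layout as clouds.pnm and a height divisible by four.

```shell
#!/bin/sh
# Sketch: automate elvis's slicing steps for every PNM file in a directory.
# Assumptions (hypothetical, modeled on clouds.pnm): each input is a P6 file
# whose header is exactly four lines (magic, one comment, "W H", maxval)
# and whose height is divisible by 4.

slice_pnm() {
    img=$1
    base=${img%.pnm}
    set -- $(head -n 3 "$img" | tail -n 1)    # header line 3: "width height"
    width=$1
    rows=$(($2 / 4))
    # New header: identical except that line 3 now gives a quarter of the rows.
    head -n 4 "$img" | sed "3s/.*/$width $rows/" > "$base.newhdr"
    # Raw data starts on line 5; each slice holds rows*width pixels of 3 bytes.
    tail -n +5 "$img" | split -b $((width * rows * 3)) - "${base}_"
    n=1
    for slice in "${base}"_a?; do
        cat "$base.newhdr" "$slice" > "${base}_row$n.pnm"
        n=$((n + 1))
    done
}

# Demo: build two small black images (hypothetical data), then slice them all.
make_pnm() {    # usage: make_pnm FILE WIDTH HEIGHT
    {
        printf 'P6\n# CREATOR: hypothetical demo\n%d %d\n255\n' "$2" "$3"
        head -c $(($2 * $3 * 3)) /dev/zero
    } > "$1"
}

dir=$(mktemp -d)
cd "$dir" || exit 1
make_pnm a.pnm 8 4
make_pnm b.pnm 2 8
for f in *.pnm; do      # the glob is expanded once, before any output exists
    slice_pnm "$f"
done
ls
```

Applied to 283 images, the same loop would simply call slice_pnm 283 times. A real script would also need to cope with headers that carry more or fewer comment lines, which this sketch deliberately ignores.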
Online Exercises
Lab Exercise
Objective: Effectively use the fmt and split commands.
Estimated Time: 10 mins.
Specification
1. Use the grep command to print every word in the file /usr/share/dict/words which contains
the text ee. Use the fmt command to reformat the output into lines of (the default) 75 characters
width. Store the result in the file ee_lines.txt.
2. The file /usr/share/doc/bash*/loadables/cut.c contains a couple of large sections of
comment text, whose lines all begin with the text *. Use the fmt command to reformat only the
comment text to a width of 40 characters. Store the result in the file ~/cut40.c.
If performed correctly, you should be able to reproduce results similar to the following.
[student@station student]$ tail -n +62 cut40.c | head
3. The file /usr/share/zoneinfo/zone.tab lists the locations of cities used to identify timezones
and locales. Use the split command to split this file into files of 80 lines each (except, of course, for
the last file, which will collect the remainder). The new files should exist in your home directory,
and all have the form ~/zone_aa, where the letters aa iterate with each file.
Deliverables
1. The file ee_lines.txt, which contains every word from the file /usr/share/dict/words which contains
the text ee, reformatted to a width of 75 characters per line.
2. The file ~/cut40.c, which contains the contents of the file /usr/share/doc/bash-*/loadables/cut.c,
where all lines beginning with the characters * have been reformatted to a width of 40 characters.
3. The contents of the file /usr/share/zoneinfo/zone.tab, split into files of 80 lines each, with each
Questions
1. Which of the following command lines would reformat the contents of the file email.txt to a width of 40
characters?
( ) a. fmt -w40 email.txt
( ) b. format -w40 email.txt
( ) c. fmt -W40 email.txt
( ) d. format -W40 email.txt
( ) e. None of the above
2. Which of the following command lines would reformat all comment lines within the shell script conv.sh (all
lines that begin with #) to a width of 40 characters?
( ) a. fmt -p# -w40 conv.sh
( ) b. fmt -p\# -w40 conv.sh
( ) c. fmt --prefix=# -w40 conv.sh
( ) d. fmt --pre=# -w40 conv.sh
( ) e. None of the above
3. Which of the following command lines would reformat the contents of the file letter.txt to a width of 75
characters per line?
( ) a. fmt -75 letter.txt
( ) b. fmt < letter.txt
( ) c. fmt --width=75 letter.txt
( ) d. All of the above
( ) e. A and C only