Advanced Unix Commands: Sed and AWK.

sed and awk are two very powerful tools that enable a user to manipulate files efficiently.

sed: a text editor that works on full streams of text
AWK: an output formatting language

Sed
Sed is a stream editor (hence the name), designed to work on a stream of text according to rules set by the user beforehand.
For example, the output of the ls command (a directory listing) is a stream of text that can be piped through sed and edited. In addition, sed can work on files. If you have a group of files with similar content and need to make the same edit to the contents of all of them, sed makes that very easy.

Sed example
The syntax for the utility is: sed [options] '{command}' [filename]
In the following example, you combine the contents of two files while at the same time substituting another name for aftab in both files.

1. Create two files, each with a list of first names, in vi:
$ vi names1
aftab
lucy
preeti
neeti
shetty

$ vi names2
aftab
nancy
pallni
rony
dolly

The Substitute Command

's/{old value}/{new value}/'

2. At the command line, enter and run the following command:
$ sed 's/aftab/raj/g' names1 names2 > names3

3. Display the contents of the third file to see the resulting list of names:
$ cat names3
raj
lucy
preeti
neeti
shetty
raj
nancy
pallni
rony
dolly

Note: g specifies that sed should substitute globally. Without that trailing g, if the name aftab happened to appear on the same line twice, only the first occurrence would be substituted.
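Here is a minimal sketch of steps 1 to 3 for readers who want to try them without vi. It assumes a POSIX-style shell and the common (but not POSIX) mktemp utility for a scratch directory. The last two commands use a hypothetical line containing aftab twice to show what the g flag changes.

```shell
# Work in a throwaway directory so no existing files are touched.
cd "$(mktemp -d)" || exit 1

# Step 1: the two name lists, one name per line.
printf '%s\n' aftab lucy preeti neeti shetty > names1
printf '%s\n' aftab nancy pallni rony dolly > names2

# Step 2: substitute in both files and save the combined result.
sed 's/aftab/raj/g' names1 names2 > names3

# Step 3: display the new file.
cat names3

# Effect of the g flag on a line that contains the name twice.
echo 'aftab and aftab' | sed 's/aftab/raj/'    # first match only
echo 'aftab and aftab' | sed 's/aftab/raj/g'   # every match
```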

Sed Using the -e Option


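Multiple commands may be specified in a single -e argument by separating them with semicolons. The quotes keep the shell from treating each ; as the end of a shell command:
$ sed -e 's/aftab/raj/; s/nancy/fancy/' names1 names2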

There are three ways to provide a series of editing instructions for sed to process at the command line.
One way is to use the semicolon, as in the previous example, to separate editing instructions. Another is to precede each individual editing command with the -e switch, like this:
$ sed -e 's/aftab/raj/g' -e 's/nancy/fancy/g' names1 names2

A third option is to use the multiple-line entry capability of the shell. The following shows how that looks in the Korn shell (and other Bourne-type shells such as sh and bash), but not in the C shell. The > is the shell's continuation prompt:
$ sed '
> s/aftab/raj/
> s/nancy/fancy/' names1 names2
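To check that the three styles are equivalent, the sketch below feeds the same hypothetical three-line input (standing in for names1 and names2) through each form. All three should print raj, fancy, rony.

```shell
# Hypothetical input standing in for names1 and names2.
input='aftab
nancy
rony'

# 1. Commands separated by semicolons inside one quoted script.
printf '%s\n' "$input" | sed 's/aftab/raj/; s/nancy/fancy/'

# 2. One -e option per editing command.
printf '%s\n' "$input" | sed -e 's/aftab/raj/' -e 's/nancy/fancy/'

# 3. A newline inside the quoted script (sh, ksh, bash; not csh).
printf '%s\n' "$input" | sed '
s/aftab/raj/
s/nancy/fancy/'
```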

Sed record field


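sed can also be used to change field delimiters from one character to another. For example, the following changes all tabs to spaces. GNU sed accepts \t for a tab; with other versions, type a literal tab between the first two slashes:
sed 's/\t/ /g'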
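If your sed does not understand \t, one portable workaround is to let printf produce the tab character and splice it into the script. This is a sketch rather than the only option, and the sample line is hypothetical.

```shell
# printf '\t' yields a literal tab, so this works even with a sed
# that does not recognize \t inside a pattern.
tab=$(printf '\t')
printf 'name\tuid\tshell\n' | sed "s/$tab/ /g"
```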

Sometimes you don't want to change every occurrence that appears in a file. At times, you only want to make a change if certain conditions are met, for example, following a match of some other data.

To illustrate, consider the following text file:
$ cat sample
one 1
two 1
three 1
one 1
two 1
two 1
three 1

Suppose that it would be desirable for "1" to be substituted with "2", but only on lines containing the word "two" rather than on every line.
This can be accomplished by specifying a pattern to match before giving the substitute command:
$ sed '/two/ s/1/2/' sample
one 1
two 2
three 1
one 1
two 2
two 2
three 1

And now, to make it even more accurate:
$ sed '
> /two/ s/1/2/
> /three/ s/1/3/' sample
one 1
two 2
three 3
one 1
two 2
two 2
three 3

Bear in mind once again that only the display is changed. If you look at the original file, it is the same as it always was. You must save the output to another file to make the changes permanent. The fact that changes are not made to the original file is a true blessing in disguise: it lets you experiment with the file without causing any real harm, until you get the right commands working exactly the way you expect and want them to.
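The following sketch reproduces the sample file and both conditional substitutions. It assumes mktemp is available for a scratch directory. The final cat confirms that sample itself is unchanged, and the last line shows one way to keep the result.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' 'two 1' 'two 1' 'three 1' > sample

# Substitute only on lines that contain "two".
sed '/two/ s/1/2/' sample

# Two conditional substitutions, one per line of the script.
sed '
/two/ s/1/2/
/three/ s/1/3/' sample

# The file itself is untouched; redirect the output to keep the changes.
cat sample
sed '/two/ s/1/2/; /three/ s/1/3/' sample > sample.new
```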

Sed Files
Of course, none of the three methods just described is practical when it comes time to enter a long list of editing commands for sed on the command line. To provide a large series of commands, sed can read them from a file whose name is given as a single command-line argument. This is done using the -f option: the file named after -f is simply a text file with a series of actions to be performed in sequence.

Sed Files example


Create a new file with vi called sed.scr and list a series of editing instructions for sed, one per line:
$ vi sed.scr
s/aftab/raj/
s/nancy/fancy/
s/rony/johny/

Then run sed with the script file:
$ sed -f sed.scr names1 names2 > names3
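Here is a non-interactive version of the sed.scr example, under the same assumptions as before (a POSIX-style shell plus mktemp). A here-document stands in for editing the script in vi.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' aftab lucy preeti neeti shetty > names1
printf '%s\n' aftab nancy pallni rony dolly > names2

# One editing command per line; sed applies them in order to each input line.
cat > sed.scr <<'EOF'
s/aftab/raj/
s/nancy/fancy/
s/rony/johny/
EOF

sed -f sed.scr names1 names2 > names3
cat names3
```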

Restricting Lines
 The default is for the editor to look at, and for editing to take place on, every line that is input to the stream editor.  This can be changed by specifying restrictions preceding the command.

For example, to substitute "1" with "2" only in the fifth and sixth lines of the sample file's output, the command would be:
$ sed '5,6 s/1/2/' sample
one 1
two 1
three 1
one 1
two 2
two 2
three 1

Prohibiting the Display


The default is for sed to display on the screen (or write to a file, if so redirected) every line from the original file, whether or not it is affected by an edit operation. The -n option overrides this behavior: it suppresses sed's automatic printing, so no lines are displayed whatsoever, whether they were changed by the edit or not.
For example:
$ sed -n -f sedlist sample

In the above example, nothing is displayed on the screen. Doesn't this negate the whole purpose of the edit? Why is this useful? It is useful only because -n can be overridden by the print command (p). To illustrate, suppose the script file were modified to resemble the following, where the p flag prints each line whose substitution succeeded:
$ cat sedlist
/two/ s/1/2/p
/three/ s/1/3/p

Then this would be the result of running it:
$ sed -n -f sedlist sample
two 2
three 3
two 2
two 2
three 3
In this case, only the lines affected by the edit are displayed.

Another use of -n is to print only a set range of lines. For example, to print only lines two through six while making no other editing changes:
$ sed -n '2,6p' sample
two 1
three 1
one 1
two 1
two 1
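The sketch below recreates sample and the modified sedlist in a scratch directory (mktemp assumed), then runs both -n examples. Only the substitutions carrying the p flag, and the explicit 2,6p range, produce output.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' 'two 1' 'two 1' 'three 1' > sample

# The script file: the p flag prints a line only if its substitution succeeded.
printf '%s\n' '/two/ s/1/2/p' '/three/ s/1/3/p' > sedlist

sed -n -f sedlist sample   # only the changed lines
sed -n '2,6p' sample       # lines 2 through 6, unedited
```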

Deleting Lines
Substituting one value for another is far from the only function that can be performed with a stream editor. There are many more possibilities, and the second-most-used function, in my opinion, is delete. Delete works in the same manner as substitute, except that it removes the specified lines. (If you want to remove a word and not a line, don't think of deleting; think of substituting nothing for it, as in s/cat//.) The syntax for the command is: '{what to find} d'

To remove all of the lines containing "two" from the sample file:
$ sed '/two/ d' sample
one 1
three 1
one 1
three 1

To remove the first three lines from the display:
$ sed '1,3 d' sample
one 1
two 1
two 1
three 1

Expressions
There are several things to keep in mind with the stream editor as they relate to regular expressions in general, and as they apply to deletions in particular:
The caret (^) signifies the beginning of a line. Thus, sed '/^two/ d' sample would delete a line only if "two" were its first three characters.
The dollar sign ($) represents the end of a line inside a pattern. Thus, sed '/two$/ d' sample would delete a line only if "two" were its last three characters. Used on its own as an address, $ means the last line of the input.
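To see the anchors at work, the following sketch uses hypothetical lines where "two" appears at the start, middle, and end. It then uses $ on its own as an address for the last line.

```shell
# Hypothetical input: "two" at the start, middle, and end of a line.
printf '%s\n' 'two at the start' 'one two three' 'ends in two' |
  sed '/^two/ d'    # removes only the first line

printf '%s\n' 'two at the start' 'one two three' 'ends in two' |
  sed '/two$/ d'    # removes only the last line

# As an address, $ on its own means the last line of input.
printf '%s\n' 'one 1' 'two 1' 'three 1' | sed '$d'
```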

Appending and Inserting Text


Text can be appended to the end of a file by using sed with the a (append) command. Every line of the appended text except the last ends in a backslash. This is done in the following manner:
$ sed '$a\
> This is where we stop\
> the test' sample
one 1
two 1
three 1
one 1
two 1
two 1
three 1
This is where we stop
the test

Within the command, the dollar sign ($) signifies that the text is to be appended after the last line of the file.
To append the lines after the third line (so that they become the fourth and fifth lines) instead of at the end, the command becomes:
$ sed '3a\
> This is where we stop\
> the test' sample
one 1
two 1
three 1
This is where we stop
the test
one 1
two 1
two 1
three 1
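The append examples can be run directly from a script, as in this sketch. The sample lines are a shortened hypothetical version of sample. The sketch also shows the i (insert) command, the counterpart of a, which places text before the addressed line rather than after it.

```shell
# Shortened, hypothetical version of the sample file.
sample_lines() {
  printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1'
}

# a\ adds text after the addressed line; every text line but the last ends in \.
sample_lines | sed '3a\
This is where we stop\
the test'

# i\ inserts text before the addressed line instead.
sample_lines | sed '3i\
Inserted before line 3'
```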

Reading and Writing Files


The ability to redirect the output has already been illustrated, but files can also be read in and written out while the editing commands run. For example, to perform the substitutions and write lines one through three to a file called sample3:
$ sed '/two/ s/1/2/ ; /three/ s/1/3/ ; 1,3 w sample3' sample
one 1
two 2
three 3
one 1
two 2
two 2
three 3

$ cat sample3
one 1
two 2
three 3
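Here is a sketch of the w example in a scratch directory (again assuming mktemp). The w command treats everything after it on the line as the file name, which is why it comes last in the one-line script.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' > sample

# Substitute, then copy lines 1-3 (as edited) into sample3.
sed '/two/ s/1/2/; /three/ s/1/3/; 1,3 w sample3' sample
cat sample3
```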

The Change Command


In addition to substituting entries, it is possible to change whole lines from one value to another. The thing to keep in mind is that substitute affects only the matched characters, whereas change functions like delete in that it affects the entire line:
$ sed '/two/ c\
> We are no longer using two' sample
one 1
We are no longer using two
three 1
one 1
We are no longer using two
We are no longer using two
three 1

Change All but...


To delete all lines that contain the phrase "two", the operation is:
$ sed '/two/ d' sample
one 1
three 1
one 1
three 1
And to delete all lines except those that contain the phrase "two", the syntax becomes:
$ sed '/two/ !d' sample
two 1
two 1
two 1
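You can try both whole-line operations on a short hypothetical excerpt of sample:

```shell
# Hypothetical excerpt of the sample file.
sample_lines() {
  printf '%s\n' 'one 1' 'two 1' 'three 1'
}

# c\ replaces each whole matching line with the given text.
sample_lines | sed '/two/ c\
We are no longer using two'

# ! negates the address: delete every line that does NOT contain "two".
sample_lines | sed '/two/ !d'
```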

Quitting Early
The default is for sed to read through an entire file and stop only when the end is reached. You can stop processing early, however, by using the quit command (q). The q command takes a single address, and processing continues until a line matching that address is reached. sed then prints that line (unless -n is in effect) and exits.

For example, to perform the substitutions only on the first five lines of a file and then quit:
$ sed '/two/ s/1/2/; /three/ s/1/3/; 5q' sample
one 1
two 2
three 3
one 1
two 2

The address preceding the quit command can be a line number, as shown, or a pattern match like the following:
$ sed '/two/ s/1/2/; /three/ s/1/3/; /three/q' sample
one 1
two 2
three 3

You can also use the quit command as a substitute for head, to view a given number of lines from the top of a file. The head command lets you specify how many of the first lines of a file you want to see (the default is ten). Current versions of head accept any count with -n, but if your head is limited, sed is not. For example, to see the first 110 lines of a file:
sed 110q filename
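The following sketch checks the 110q behavior on 200 generated lines. It uses awk to generate them, so seq is not required. It then compares the result with head -n, which POSIX head supports, and ends with a pattern-addressed quit on hypothetical sample lines.

```shell
# Generate 200 numbered lines.
numbers() {
  awk 'BEGIN { for (i = 1; i <= 200; i++) print i }'
}

numbers | sed 110q | wc -l       # sed stops after line 110
numbers | head -n 110 | wc -l    # same count using head

# Quit right after the first line that matches a pattern.
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' | sed '/three/q'
```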

More examples
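To use the sed command as a filter in a pipeline, enter:
$ pr chap2 | sed "s/Page *[0-9]*$/(&)/" | enq
Here pr formats chap2, sed encloses the page number at the end of each header line in parentheses, and enq (the AIX print-queue command) sends the result to the printer. In the replacement, & stands for the text that was matched; in the pattern, $ anchors the match to the end of the line.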
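To see what & does without pr or a printer, this sketch uses a hypothetical header line ending in a page number in place of pr output, and leaves out enq:

```shell
# A hypothetical pr-style header line plus an ordinary text line.
printf '%s\n' 'Mar 3 10:15 2024  chap2  Page 1' 'Some body text' |
  sed 's/Page *[0-9]*$/(&)/'   # & is replaced by the matched "Page 1"
```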

AWK
AWK offers a more general computational model for processing a file. A typical example of an AWK program is one that transforms data into a formatted report. The data might be a log file generated by a Unix program such as traceroute, and the report might summarize the data in a format useful to a system administrator. Or the data might be extracted from a text file with a specific format, such as /etc/passwd in the following examples. In other words, AWK is a pattern-matching program.

Use AWK
Try out this awk command at the command line:
$ awk '{ print $0 }' /etc/passwd
The results will look something like the following:
root:x:0:0:root:/root:/bin/ksh
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

AWK takes two inputs: a command, set of commands, or command file, and data or a data file.
As with sed, the command or command file contains pattern-matching instructions that AWK uses as a guideline for processing the data or data file. In this example, AWK isn't processing any data but is simply reading the contents of the /etc/passwd file and sending the data unfiltered to standard output, much like the cat command.

Extracting with AWK


The real working power of AWK is in extracting parts of data from a larger formatted body.
Using the /etc/passwd file again, the following command takes two of the fields from each entry in the /etc/passwd file and creates a more human-friendly output:
$ awk -F: '{ print "username: " $1 "\t\t\t user id: " $3 }' /etc/passwd

By default, AWK treats blanks (spaces and tabs) as the delimiter for the incoming data. To change this, the -F switch is used to denote a different field separator; the colon, for example, is the field separator in the /etc/passwd file. The character directly following the -F switch (here the colon, which may also be quoted, as in -F':') denotes the delimiter in use. $1, $2, and so on refer to the first, second, and later fields; $0 represents the whole line.
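You can try the -F: extraction without touching the real /etc/passwd by piping in two hypothetical passwd-style lines:

```shell
# Two hypothetical /etc/passwd-style lines (fields separated by colons).
printf '%s\n' 'root:x:0:0:root:/root:/bin/ksh' 'bin:x:1:1:bin:/bin:/sbin/nologin' |
  awk -F: '{ print "username: " $1 "\t\t\t user id: " $3 }'
```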

Use an AWK File


1. Use vi to enter the following and save the file as print.awk:
BEGIN { FS = ":" }
{ printf "username: %s\t\t\t user id: %s\n", $1, $3 }

2. Execute awk as follows:
$ awk -f print.awk /etc/passwd

Use an AWK File explanation


The script as executed performs the same function as the previous example; the difference is that the commands reside within a file, with a slightly different format. Because AWK is a structured programming language, there is a general format to the layout of the file:

1. Beginning commands, which are executed only once, before any input is read, are set into a block starting with the word BEGIN. The block is contained in braces exactly as the example shows:
BEGIN { FS = ":" }

2. Pattern-matching commands are blocks of commands that are executed once for each and every line in the data file. Here's an example:
{ printf "username: %s\t\t\t user id: %s\n", $1, $3 }
3. Ending commands, a block of commands introduced by the word END, are executed only once, when the end of the input is reached:
END { printf "All /etc/passwd processing done\n" }
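Putting the three parts together, a complete print.awk might look like the following sketch. It runs the script against a small hypothetical copy of the password file in a scratch directory (mktemp assumed).

```shell
cd "$(mktemp -d)" || exit 1

# The three-part layout: BEGIN block, per-line block, END block.
cat > print.awk <<'EOF'
BEGIN { FS = ":" }
{ printf "username: %s\t\t\t user id: %s\n", $1, $3 }
END { printf "All /etc/passwd processing done\n" }
EOF

# Run against a hypothetical two-line copy instead of the real /etc/passwd.
printf '%s\n' 'root:x:0:0:root:/root:/bin/ksh' 'sync:x:5:0:sync:/sbin:/bin/sync' > passwd.sample
awk -f print.awk passwd.sample
```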

Use an AWK example


To display the lines of a file that are longer than 72 characters, enter:
awk 'length > 72' chapter1
This selects each line of the chapter1 file that is longer than 72 characters and writes these lines to standard output: because no action is specified, the default action of printing the line is used.

To display all lines between the words start and stop, including "start" and "stop", enter:
awk '/start/,/stop/' chapter1
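You can check both one-liners against a hypothetical chapter1 that holds one 80-character line between start and stop lines. Here printf '%080d' 0 is used only to generate that long line (80 zeros), and mktemp is assumed for the scratch directory.

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical chapter1: a long line between "start" and "stop".
printf 'intro\nstart\n%s\nstop\nafter\n' "$(printf '%080d' 0)" > chapter1

awk 'length > 72' chapter1      # only the 80-character line
awk '/start/,/stop/' chapter1   # start, the long line, stop
```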

To run an awk program file, sum2.awk, that processes the file chapter1, enter:
awk -f sum2.awk chapter1
The following program, sum2.awk, computes the sum and average of the numbers in the second column of the input file, chapter1:
{ sum += $2 }
END { print "Sum: ", sum; print "Average:", sum/NR; }

In the above example, the first action adds the value of the second field of each line to the variable sum. All variables are initialized to the numeric value 0 (zero) when first referenced. The pattern END before the second action causes those actions to be performed after all of the input file has been read. NR, which is used to calculate the average, is a special variable holding the number of records that have been read.
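Consider a hypothetical three-line chapter1 whose second column holds 10, 20, and 30. The sum2.awk program should report a sum of 60 and an average of 20. The sketch passes the program inline instead of using -f. Note that print puts a space between its arguments, so "Sum: " is followed by two spaces.

```shell
# Hypothetical two-column data standing in for chapter1.
printf '%s\n' 'alpha 10' 'beta 20' 'gamma 30' |
  awk '{ sum += $2 } END { print "Sum: ", sum; print "Average:", sum/NR }'
```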

To print the first two fields in opposite order, enter:
awk '{ print $2, $1 }' chapter1

The following awk program, sum3.awk, run as
awk -f sum3.awk chapter2
prints the first two fields of the file chapter2 with input fields separated by a comma and/or blanks and tabs, then adds up the first column and prints the sum and average. Its field separator is a comma optionally followed by blanks or tabs, or else a run of blanks or tabs:

BEGIN { FS = ",[ \t]*|[ \t]+" }
{ print $1, $2 }
{ s += $1 }
END { print "sum is", s, "average is", s/NR }
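The sketch below feeds the sum3.awk program (inline rather than via -f) some hypothetical lines. They mix the separators the program is meant to accept: a comma with and without a following blank, a blank, and a tab.

```shell
# Hypothetical lines using ", ", ",", a blank, and a tab as separators.
printf '10, 20\n30 40\n50,60\n70\t80\n' |
  awk 'BEGIN { FS = ",[ \t]*|[ \t]+" }
       { print $1, $2 }
       { s += $1 }
       END { print "sum is", s, "average is", s/NR }'
```

Each line should come out as two blank-separated numbers, followed by a total line reporting a sum of 160 and an average of 40.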

Patterns
There are four types of patterns used in the awk command language syntax:
1. Regular expressions
2. Relational expressions
3. Combinations of patterns
4. BEGIN and END patterns

Regular Expressions
The extended regular expressions used by the awk command are similar to those used by the egrep (grep -E) command. The simplest form of an extended regular expression is a string of characters enclosed in slashes. For example, suppose a file named testfile had the following contents:
smawley, andy
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis

Entering the following command line:
awk '/smi/' testfile
would print to standard output all records that contain an occurrence of the string smi. In this example, the program '/smi/' for the awk command is a pattern with no action. The output is:
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis

The following special characters are used to form extended regular expressions:
+ Specifies that a string matches if one or more occurrences of the character or extended regular expression that precedes the + (plus) are within the string. The command line:
awk '/smith+ern/' testfile
prints to standard output any record that contains a string with the characters smit, followed by one or more h characters, and then ending with the characters ern. The output in this example is:
smithern, harry
smithhern, anne

? Specifies that a string matches if zero or one occurrences of the character or extended regular expression that precedes the ? (question mark) are within the string. The command line:
awk '/smith?/' testfile
prints to standard output all records that contain the characters smit, followed by zero or one instance of the h character. The output in this example is:
smith, alan
smithern, harry
smithhern, anne
smitters, alexis

| Specifies that a string matches if either of the strings separated by the | (vertical line) is within the string. The command line:
awk '/allen|alan/' testfile

prints to standard output all records that contain the string allen or alan. The output in this example is:
smiley, allen
smith, alan

() Groups strings together in regular expressions. The command line:
awk '/a(ll)?(nn)?e/' testfile

prints to standard output all records with the string ae or alle or anne or allnne. The output in this example is:
smiley, allen
smithhern, anne

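{m} Specifies that a string matches if exactly m occurrences of the pattern are within the string. The command line:
awk '/l{2}/' testfile
prints to standard output:
smiley, allen
Interval expressions ({m}, {m,}, and {m,n}) are part of POSIX extended regular expressions, but some awk versions do not support them; gawk before version 4.0 required the --re-interval option.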

{m,} Specifies that a string matches if at least m occurrences of the pattern are within the string. The command line:
awk '/t{2,}/' testfile
prints to standard output:
smitters, alexis

{m,n} Specifies that a string matches if between m and n, inclusive, occurrences of the pattern are within the string (where m <= n). The command line:
awk '/er{1,2}/' testfile
prints to standard output:
smithern, harry
smithhern, anne
smitters, alexis

[String] Signifies that the regular expression matches any of the characters specified by String within the square brackets. The command line:

awk '/sm[a-h]/' testfile
prints to standard output all records with the characters sm followed by any character in alphabetical order from a to h. The output in this example is:
smawley, andy

[^String] A ^ (caret) within the [ ] (square brackets) and at the beginning of the specified string indicates that the regular expression does not match any characters within the square brackets. Thus, the command line:

awk '/sm[^a-h]/' testfile
prints to standard output:
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis

~, !~ Signifies a conditional statement that a specified variable matches (tilde) or does not match (tilde, exclamation point) the regular expression. The command line:

awk '$1 ~ /n/' testfile
prints to standard output all records whose first field contains the character n. The output in this example is:
smithern, harry
smithhern, anne

^ Signifies the beginning of a field or record.

The command line:
awk '$2 ~ /^h/' testfile
prints to standard output all records with the character h as the first character of the second field. The output in this example is:
smithern, harry

$ Signifies the end of a field or record.

The command line:
awk '$2 ~ /y$/' testfile
prints to standard output all records with the character y as the last character of the second field. The output in this example is:
smawley, andy
smithern, harry

. (period) Signifies any one character except the terminal new-line character at the end of a line. The command line:
awk '/a..e/' testfile
prints to standard output all records with the characters a and e separated by exactly two characters. The output in this example is:
smawley, andy
smiley, allen
smithhern, anne

* (asterisk) Specifies zero or more occurrences of the preceding character or expression, so .* matches any sequence of zero or more characters. The command line:
awk '/a.*e/' testfile
prints to standard output all records with the characters a and e separated by zero or more characters. The output in this example is:
smawley, andy
smiley, allen
smithhern, anne
smitters, alexis

\ (backslash) The escape character. When it precedes any of the characters that have special meaning in extended regular expressions, the escape character removes that special meaning. For example, the regular expression /a\/\// would match the string a//, since the backslashes negate the usual meaning of the slash as the delimiter of the regular expression. To specify the backslash itself as a character, use a double backslash.
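You can check several of the operators above in one pass over testfile. The sketch recreates the file in a scratch directory (mktemp assumed). It leaves out the interval expressions, which not every awk supports. The comments name the expected matches by their first field.

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'smawley, andy' 'smiley, allen' 'smith, alan' \
  'smithern, harry' 'smithhern, anne' 'smitters, alexis' > testfile

awk '/smith+ern/' testfile      # one or more h: smithern, smithhern
awk '/allen|alan/' testfile     # alternation: smiley, smith
awk '/a(ll)?(nn)?e/' testfile   # grouping: smiley, smithhern
awk '/sm[^a-h]/' testfile       # negated bracket: every line but smawley
awk '$2 ~ /y$/' testfile        # match on one field: smawley, smithern
awk '/a..e/' testfile           # two characters between: smawley, smiley, smithhern
```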

Recognized Escape Sequences


The awk command recognizes most of the escape sequences used in C language conventions, as well as several that are used as special characters by the awk command itself. The escape sequences are:

\"   Double quotation mark (")
\/   Slash (/) character
\\   Backslash (\) character
\a   Alert character
\b   Backspace character
\n   New-line character
\r   Carriage-return character
\t   Tab character
\v   Vertical tab
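Here is a small sketch using three of these escape sequences (tab, new-line, and double quotation mark) inside an awk string:

```shell
# \t, \n and \" are interpreted by awk inside the string constant.
awk 'BEGIN { printf "col1\tcol2\n\"quoted\"\n" }'
```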

Relational Expressions
The relational operators < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), == (equal to), and != (not equal to) can be used to form patterns. For example, the pattern $1 < $4 matches records where the first field is less than the fourth field.

The relational operators also work with string values. For example: $1 != "q" matches all records where the first field is not a q.
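You can see both kinds of comparison on a few hypothetical four-field records. The first pattern compares numerically when both fields look like numbers. The second compares strings.

```shell
# Hypothetical four-field records.
printf '%s\n' '1 a b 5' '9 c d 2' 'q e f 7' |
  awk '$1 < $4'       # only "1 a b 5"

printf '%s\n' '1 a b 5' '9 c d 2' 'q e f 7' |
  awk '$1 != "q"'     # every record except the one starting with q
```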

Combinations of Patterns
Patterns can be combined in three ways:

Ranges are specified by two patterns separated with a , (comma). Actions are performed on every record starting with the record that matches the first pattern, and continuing through and including the record that matches the second pattern. For example:

/begin/,/end/ matches the record containing the string begin, and every record between it and the record containing the string end, including the record containing the string end.

Parentheses ( ) group patterns together. The Boolean operators || (or), && (and), and ! (not) combine patterns into expressions that match if they evaluate true; otherwise they do not match. For example, the pattern:
$1 == "al" && $2 == "123"
matches records where the first field is al and the second field is 123.
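You can try the range, Boolean, and parenthesized patterns on a handful of hypothetical records:

```shell
# Hypothetical two-field records.
recs() {
  printf '%s\n' 'x 1' 'begin 2' 'al 123' 'end 4' 'al 999'
}

recs | awk '/begin/,/end/'                # begin 2, al 123, end 4
recs | awk '$1 == "al" && $2 == "123"'    # al 123
recs | awk '!($1 == "al" || $1 == "x")'   # begin 2, end 4
```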
