sed and awk are two very powerful tools that enable a user to manipulate files in an efficient manner
sed: a text editor that works on full streams of text
AWK: an output formatting language
Sed
Sed is a stream editor (thus the name), and it is designed to work on a specified stream of text according to rules set by the user beforehand
For example, the output of the ls command produces a stream of text (a directory listing) that can be piped through sed and edited. In addition, sed can work on files. If you have a group of files with similar content and need to make a particular edit to the contents of all these files, sed will enable you to do that very easily.
Sed example
The syntax for the utility is:
sed [options] '{command}' [filename]
Here you combine the contents of two files while at the same time performing a substitution for the name aftab in both files.
1. Create two files, each with a list of first names, in vi:
$ vi names1
aftab
lucy
preeti
neeti
shetty
$ vi names2
aftab
nancy
pallni
rony
dolly
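The substitution step itself is not shown above, so the following is only a minimal sketch of how it might look. It assumes one name per line in each file, and the replacement name ashish is made up for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
tmp=$(mktemp -d) && cd "$tmp"

# Recreate the two name lists, one name per line (layout assumed).
printf '%s\n' aftab lucy preeti neeti shetty > names1
printf '%s\n' aftab nancy pallni rony dolly > names2

# sed reads both files as one stream and applies the substitution to each line.
# "ashish" is a hypothetical replacement name.
combined=$(sed 's/aftab/ashish/' names1 names2)
printf '%s\n' "$combined"
```

Naming both files on the command line is what combines them: sed treats the files as a single input stream, so the output is the edited contents of names1 followed by those of names2.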
Sed Files
Of course, no matter which of the three methods just described is used, none is practical when it comes time to enter a long list of editing commands for sed on the command line. To provide a large series of commands, sed can read the editing instructions from a file that is named as a single command-line argument. This is done using the -f option. The file denoted with the -f argument is simply a text file with a series of actions to be performed in sequence.
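A minimal sketch of the -f option follows. The script name edits.sed is hypothetical, and the contents of the sample file are inferred from the outputs shown elsewhere in these slides.

```shell
tmp=$(mktemp -d) && cd "$tmp"

# Sample data (contents inferred, not given explicitly in the slides).
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' 'two 1' 'two 1' 'three 1' > sample

# edits.sed is a hypothetical script file: one sed command per line.
cat > edits.sed <<'EOF'
/two/ s/1/2/
/three/ s/1/3/
EOF

# Each command in the file is applied, in order, to every input line.
scripted=$(sed -f edits.sed sample)
printf '%s\n' "$scripted"
```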
Restricting Lines
The default is for the editor to look at, and for editing to take place on, every line that is input to the stream editor. This can be changed by specifying restrictions preceding the command.
For example, to substitute "1" with "2" only in the fifth and sixth lines of the sample file's output, the command would be:
$ sed '5,6 s/1/2/' sample_one
one 1
two 1
three 1
one 1
two 2
two 2
three 1
Deleting Lines
Substituting one value for another is far from the only function that a stream editor can perform. There are many more possibilities, and the second-most-used function, in my opinion, is delete. Delete works in the same manner as substitute, only it removes the specified lines. (If you want to remove a word and not a line, don't think of deleting; think of substituting it with nothing: s/cat//.) The syntax for the command is:
'{what to find} d'
To remove all of the lines containing "two" from the sample file:
$ sed '/two/ d' sample
one 1
three 1
one 1
three 1
To remove the first three lines from the display:
$ sed '1,3 d' sample
one 1
two 1
two 1
three 1
expressions
There are several things to keep in mind with the stream editor as they relate to regular expressions in general, and as they apply to deletions in particular:
The caret (^) signifies the beginning of a line, thus
sed '/^two/ d' sample
would only delete the line if "two" were the first three characters of the line.
The dollar sign ($) represents the end of the file when used as an address, or the end of a line inside an expression, thus
sed '/two$/ d' sample_one
would delete the line only if "two" were the last three characters of the line.
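The two anchors, and the use of $ as an address, can be compared side by side. This is a small sketch; the file name demo and its contents are made up for the illustration.

```shell
tmp=$(mktemp -d) && cd "$tmp"

# Hypothetical four-line file: "two" starts one line and ends another.
printf '%s\n' 'one 1' 'two 1' 'three 1' 'not two' > demo

starts=$(sed '/^two/ d' demo)   # deletes only "two 1"
ends=$(sed '/two$/ d' demo)     # deletes only "not two"
last=$(sed '$ d' demo)          # as an address, $ selects the last line of input

printf '%s\n' "$starts" '---' "$ends" '---' "$last"
```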
Quitting Early
The default is for sed to read through an entire file and stop only when the end is reached. You can stop processing early, however, by using the quit command (q). The quit command accepts a single address rather than a range, and processing will continue until the condition calling the quit command is satisfied.
For example, to perform substitution only on the first five lines of a file and then quit:
$ sed '/two/ s/1/2/; /three/ s/1/3/; 5q' sample
one 1
two 2
three 3
one 1
two 2
The entry preceding the quit command can be a line number, as shown, or a find/matching command like the following:
$ sed '/two/ s/1/2/; /three/ s/1/3/; /three/q' sample
one 1
two 2
three 3
You can also use the quit command to view lines beyond a standard number and add functionality that exceeds that of head. For example, the head command allows you to specify how many of the first lines of a file you want to see; the default number is ten, and some older implementations accept only values from one to ninety-nine. If you want to see the first 110 lines of a file on such a system, you cannot do so with head (current versions accept head -n 110), but you can with sed:
sed 110q filename
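A quick way to check the behaviour of a numeric quit address is to feed sed a known number of lines. In this sketch the 200 numbered input lines are generated with awk purely as test data.

```shell
# Generate 200 numbered lines, then let sed print the first 110 and quit.
first=$(awk 'BEGIN { for (i = 1; i <= 200; i++) print i }' | sed 110q)

# Count the lines that survived and show the last one.
printf '%s\n' "$first" | wc -l
printf '%s\n' "$first" | tail -n 1
```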
More examples
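To use the sed command as a filter in a pipeline, enter:
pr chap2 | sed "s/Page *[0-9]*$/(&)/" | enq
Here & represents the matched text, and $ anchors the match at the end of the line, so each page number at the end of a line is enclosed in parentheses before the file is printed.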
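The effect of & can be tried without pr or enq. The sketch below substitutes two made-up lines for the paginated output; only the first ends in a page number.

```shell
# Made-up stand-in for pr output.
marked=$(printf '%s\n' 'Chapter 2 Page 7' 'Body text' |
  sed 's/Page *[0-9]*$/(&)/')   # & is replaced by whatever the pattern matched

printf '%s\n' "$marked"
```

The second line contains no match, so sed passes it through unchanged.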
AWK
AWK offers a more general computational model for processing a file. A typical example of an AWK program is one that transforms data into a formatted report. The data might be a log file generated by a Unix program such as traceroute, and the report might summarize the data in a format useful to a system administrator. Or the data might be extracted from a text file with a specific format, such as the following example. In other words, AWK is a pattern-matching program.
Use AWK
Try out this awk command at the command line:
$ awk '{ print $0 }' /etc/passwd
The results will look something like the following:
root:x:0:0:root:/root:/bin/ksh
bin:x:1:1:bin:/bin:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
AWK takes two inputs: a command, set of commands, or command file, and the data or a data file.
As with sed, the command or command file contains pattern-matching instructions that AWK uses as a guideline for processing the data or data file. In this example, AWK isn't processing any data but is simply reading the contents of the /etc/passwd file and sending the data unfiltered to standard out, much like the cat command.
BEGIN {FS = ",|[ \t]"}
{print $1, $2}
{s += $1}
END {print "sum is", s, "average is", s/NR}
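The program above can be stored in a command file and run with awk -f. This is a sketch only: the file names sum.awk and data, and the three input records, are assumptions made for the illustration.

```shell
tmp=$(mktemp -d) && cd "$tmp"

# sum.awk is a hypothetical command file holding the program.
cat > sum.awk <<'EOF'
BEGIN { FS = ",|[ \t]" }   # a comma, a space, or a tab separates fields
{ print $1, $2 }           # print the first two fields of every record
{ s += $1 }                # accumulate the first field
END { print "sum is", s, "average is", s/NR }   # NR is the record count
EOF

# Made-up data using each of the three separators once.
printf '10,a\n20 b\n30\tc\n' > data

report=$(awk -f sum.awk data)
printf '%s\n' "$report"
```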
Patterns
There are four types of patterns used in the awk command language syntax:
Regular Expressions
Relational Expressions
Combinations of Patterns
BEGIN and END Patterns
Regular Expressions
The extended regular expressions used by the awk command are similar to those used by the grep or egrep command. The simplest form of an extended regular expression is a string of characters enclosed in slashes. For example, suppose a file named testfile had the following contents:
smawley, andy
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
Entering the following command line:
awk '/smi/' testfile
prints to standard output all records that contain an occurrence of the string smi. In this example, the program '/smi/' for the awk command is a pattern with no action. The output is:
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
The following special characters are used to form extended regular expressions:
+
Specifies that a string matches if one or more occurrences of the character or extended regular expression that precedes the + (plus) are within the string. The command line:
awk '/smith+ern/' testfile
prints to standard output any record that contains the characters smit, followed by one or more h characters, followed by the characters ern. The output in this example is:
smithern, harry
smithhern, anne
?
Specifies that a string matches if zero or one occurrences of the character or extended regular expression that precedes the ? (question mark) are within the string. The command line:
awk '/smith?/' testfile
prints to standard output all records that contain the characters smit, followed by zero or one instance of the h character. The output in this example is:
smith, alan
smithern, harry
smithhern, anne
smitters, alexis
|
Specifies that a string matches if either of the strings separated by the | (vertical line) is within the string. Spaces inside the slashes are part of the pattern, so none are placed around the | here. The command line:
awk '/allen|alan/' testfile
prints to standard output all records that contain the string allen or alan. The output in this example is:
smiley, allen
smith, alan
()
Groups strings together in regular expressions. The command line:
awk '/a(ll)?(nn)?e/' testfile
prints to standard output all records with the string ae or alle or anne or allnne. The output in this example is:
smiley, allen
smithhern, anne
{m}
Specifies that a string matches if exactly m occurrences of the pattern are within the string. The command line:
awk '/l{2}/' testfile
prints to standard output:
smiley, allen
{m,}
Specifies that a string matches if at least m occurrences of the pattern are within the string. The command line:
awk '/t{2,}/' testfile
prints to standard output:
smitters, alexis
{m,n}
Specifies that a string matches if between m and n, inclusive, occurrences of the pattern are within the string (where m <= n). The command line:
awk '/er{1,2}/' testfile
prints to standard output:
smithern, harry
smithhern, anne
smitters, alexis
[String]
Signifies that the regular expression matches any of the characters specified by the String variable within the square brackets. The command line:
awk '/sm[a-h]/' testfile
prints to standard output all records with the characters sm followed by any character in alphabetical order from a to h. The output in this example is:
smawley, andy
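[^String]
A ^ (caret) within the [ ] (square brackets) and at the beginning of the specified string indicates that the regular expression matches any character except those within the square brackets. Thus, the command line:
awk '/sm[^a-h]/' testfile
prints to standard output all records with the characters sm followed by any character other than a through h. The output in this example is:
smiley, allen
smith, alan
smithern, harry
smithhern, anne
smitters, alexis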
~,!~
Signifies a conditional statement that a specified variable matches (tilde) or does not match (exclamation point, tilde) the regular expression. The command line:
awk '$1 ~ /n/' testfile
prints to standard output all records whose first field contains the character n. The output in this example is:
smithern, harry
smithhern, anne
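The negated operator selects the complementary set of records. A short sketch, recreating testfile in a scratch directory:

```shell
tmp=$(mktemp -d) && cd "$tmp"

# Recreate the testfile used throughout this section.
printf '%s\n' 'smawley, andy' 'smiley, allen' 'smith, alan' \
  'smithern, harry' 'smithhern, anne' 'smitters, alexis' > testfile

# !~ keeps the records whose first field does NOT contain an n.
without_n=$(awk '$1 !~ /n/' testfile)
printf '%s\n' "$without_n"
```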
^
Signifies the beginning of a field or record. The command line:
awk '$2 ~ /^h/' testfile
prints to standard output all records with the character h as the first character of the second field. The output in this example is:
smithern, harry
$
Signifies the end of a field or record. The command line:
awk '$2 ~ /y$/' testfile
prints to standard output all records with the character y as the last character of the second field. The output in this example is:
smawley, andy
smithern, harry
. (period)
Signifies any one character except the terminating new-line character. The command line:
awk '/a..e/' testfile
prints to standard output all records with the characters a and e separated by two characters. The output in this example is:
smawley, andy
smiley, allen
smithhern, anne
* (asterisk)
Signifies zero or more occurrences of the preceding character or expression; combined with the period, .* signifies zero or more of any characters. The command line:
awk '/a.*e/' testfile
prints to standard output all records with the characters a and e separated by zero or more characters. The output in this example is:
smawley, andy
smiley, allen
smithhern, anne
smitters, alexis
\ (backslash)
The escape character. When preceding any of the characters that have special meaning in extended regular expressions, the escape character removes any special meaning for the character. For example, the command line:
/a\/\//
would match the pattern a//, since the backslashes negate the usual meaning of the slash as a delimiter of the regular expression. To specify the backslash itself as a character, use a double backslash.
\"   Double-quotation mark
\/   Slash character
\\   Backslash character
\a   Alert character
\b   Backspace character
\n   New-line character
\r   Carriage-return character
\t   Tab character
\v   Vertical tab
Relational Expressions
The relational operators < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), == (equal to), and != (not equal to) can be used to form patterns. For example, the pattern:
$1 < $4
matches records where the first field is less than the fourth field.
The relational operators also work with string values. For example: $1 != "q" matches all records where the first field is not a q.
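Both kinds of comparison can be tried on a few made-up records. In this sketch the file name rows and its contents are hypothetical.

```shell
tmp=$(mktemp -d) && cd "$tmp"

# Hypothetical records with four whitespace-separated fields.
printf '%s\n' '1 a b 5' '9 c d 2' 'q e f 3' > rows

# Numeric comparison: keep records whose first field is less than the fourth.
less=$(awk '$1 < $4' rows)

# String comparison: keep records whose first field is not the letter q.
notq=$(awk '$1 != "q"' rows)

printf '%s\n' "$less" '---' "$notq"
```

When one side of a comparison does not look like a number, as with q in the third record, awk compares the two values as strings.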
Combinations of Patterns
Patterns can be combined using three options:
Ranges are specified by two patterns separated with a , (comma). Actions are performed on every record starting with the record that matches the first pattern, and continuing through and including the record that matches the second pattern. For example:
/begin/,/end/
matches the record containing the string begin, and every record between it and the record containing the string end, including the record containing the string end.
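A range pattern with no action prints every record in the range. A minimal sketch with five made-up input lines:

```shell
# Only the records from the first "begin" through the next "end" are printed.
ranged=$(printf '%s\n' 'skip' 'begin here' 'middle' 'the end' 'after' |
  awk '/begin/,/end/')

printf '%s\n' "$ranged"
```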
Parentheses ( ) group patterns together.
The boolean operators || (or), && (and), and ! (not) combine patterns into expressions that match if they evaluate true, otherwise they do not match. For example, the pattern:
$1 == "al" && $2 == "123"
matches records where the first field is al and the second field is 123.
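The three boolean operators can be compared on the same input. In this sketch the file name pairs and its three records are made up for the illustration.

```shell
tmp=$(mktemp -d) && cd "$tmp"

# Hypothetical two-field records.
printf '%s\n' 'al 123' 'al 456' 'bo 123' > pairs

both=$(awk '$1 == "al" && $2 == "123"' pairs)    # both conditions must hold
either=$(awk '$1 == "al" || $2 == "123"' pairs)  # at least one must hold
negated=$(awk '!($1 == "al")' pairs)             # parentheses group, ! inverts

printf '%s\n' "$both" '---' "$either" '---' "$negated"
```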