
Regular Expressions                      linuxreg.txt <2002 01 01>
-------------------                      -------------------------

Regular expressions are patterns for searching for a string occurrence
within text. They are extremely useful in several major GNU/Linux or
UNIX filter programs such as grep, awk and lex. They look similar to
the 'glob' filename constructs but have a separate syntax and work on
'text strings' rather than file names. The core regular expression
syntax is largely consistent across programs written for GNU/Linux or
UNIX, although some operators are available only in particular tools.

NOTE: DOS system users can become familiar with regular expressions by

adding clone utilities of grep and awk. These can be downloaded from

the txtutil directory of the SimTel msDOS Repository (www.simtel.net).

The Metacharacters

------------------

Most punctuation characters have a meaning other than their literal value.
If you need to use the literal value, precede the metacharacter with a
backslash.

The metacharacters are \ ^ $ . [ ] | ( ) * + ? -
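
As a minimal sketch of escaping (the sample lines are made up for
illustration), compare an unescaped * with an escaped one in grep:

```shell
# Hypothetical sample input: three short lines.
sample='2*3
223
2+3'

# Unescaped: 2* means "zero or more 2s", so any line containing a 3 matches.
unescaped=$(printf '%s\n' "$sample" | grep '2*3')

# Escaped: \* is a literal asterisk, so only the first line matches.
escaped=$(printf '%s\n' "$sample" | grep '2\*3')

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
```

The single quotes keep the shell from interpreting the backslash before
grep sees it.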

Special Escape Sequences

------------------------

\b (backspace) \f (formfeed) \n (newline or lf) \r (carriage return)


\t (tab) and \ddd (octal number) are special sequences in C and Unix.

Some Common Forms of Regular Expressions

----------------------------------------

.         a single occurrence of any character except newline (\n)
*         zero or more repetitions of the character that precedes it
+         one or more repetitions of the character that precedes it
?         zero or one repetition of the character that precedes it
[abc]     any one of a set of given characters
[a-j]     any one of a CONSECUTIVE range of given characters (ASCII ordered)
[^a-j]    any one character NOT in the given character range
\x        UNIX escape used for special metacharacters such as * and ?
"x"       quoting method (as in lex) used for special metacharacters such as * and ?
^pattern  matches pattern occurrence at beginning of line ONLY
pattern$  matches pattern occurrence at end of line ONLY

Extended Regular Expressions

----------------------------

|         allows alternative choices for patterns
\<        matches beginning characters of a word
          (eg. \<he matches her, here etc)
\>        matches the end characters of a word
          (eg. ing\> matches pending, reading, etc)
/         trailing context (lex only): matches the pattern on the left
          only if it is followed by the pattern on the right
          (eg. am/pl matches the am of amplitude and ample but not of ampere)

Examples of Regular Expressions

-------------------------------

John                    matches John but not john
[Jj]ohn                 matches John and john
joh?n                   matches john and jon
[a-zA-Z]                matches any single alphabetic character
[a-zA-Z]+               matches any alphabetic word
x[0-9a-fA-F]            matches any single hex digit preceded by the hex marker x
[a-zA-Z_][0-9a-zA-Z_]*  matches any identifier that can include digits
                        only if they are not the first character.
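
A few of these patterns can be tried directly with grep. This is only a
sketch: the sample words are hypothetical, and the ^ and $ anchors on the
identifier pattern are an addition so that whole lines are tested.

```shell
# Hypothetical sample words, one per line.
names='John
john
jon
Joan'

# [Jj]ohn works in basic syntax; ? needs extended syntax (grep -E).
either_case=$(printf '%s\n' "$names" | grep '[Jj]ohn')
optional_h=$(printf '%s\n' "$names" | grep -E 'joh?n')

# Anchoring with ^ and $ (an addition for this sketch) tests each whole
# line against the identifier pattern, so 9lives is rejected.
idents=$(printf '%s\n' count1 _tmp 9lives | grep -E '^[a-zA-Z_][0-9a-zA-Z_]*$')

printf '%s\n' "$either_case" "$optional_h" "$idents"
```

Without the anchors, 9lives would also be reported, because grep accepts
a match anywhere in the line (here, lives).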

Precedence

----------

Concatenation has higher precedence than alternation (|), so a grouping
operator is needed to override it in some cases or to aid readability.
Parentheses (round brackets) are the grouping operator for regular
expressions. For example

then|ten matches either then or ten


the(n|t)en matches either thenen or theten
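
The two expressions above can be checked with grep -E. In this sketch the
word list is hypothetical, and -x is used so that only whole-line matches
count:

```shell
# Hypothetical word list, one word per line.
words='then
ten
thenen
theten'

# -x requires the whole line to match, which makes the grouping visible.
alternation=$(printf '%s\n' "$words" | grep -E -x 'then|ten')
grouped=$(printf '%s\n' "$words" | grep -E -x 'the(n|t)en')

printf 'then|ten:\n%s\n' "$alternation"
printf 'the(n|t)en:\n%s\n' "$grouped"
```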

More Complicated Examples

-------------------------

Consider the regular expression for a quoted string which normally can't
contain quotes within the string. A legitimate regular expression is

\"[^\"]*\" ie. a quote followed by zero or more reps of anything but
a quote followed by a quote.

\".*\" does not work reliably as .* is greedy: it runs on to the last
quote on the line, so two quoted strings on one line are matched as one.
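
The difference between the two expressions shows up when a line holds
more than one quoted string. This sketch assumes a grep that supports the
-o option (print only the matched text), which GNU and BSD grep provide
but which is not required of every grep; the sample line is hypothetical.

```shell
# Hypothetical input line containing two quoted strings.
line='say "hi" and "bye"'

# -o prints only the matched text (assumed available, see above).
# Inside single quotes the double quote needs no backslash.
per_string=$(printf '%s\n' "$line" | grep -o '"[^"]*"')
greedy=$(printf '%s\n' "$line" | grep -o '".*"')

printf 'negated class:\n%s\n' "$per_string"
printf 'dot-star:\n%s\n' "$greedy"
```

The negated class yields each quoted string separately, while the
dot-star form yields one match running from the first quote to the last.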

Your Graduation Exercise

------------------------

You may consider yourself a master of regular expressions if you can

decipher what the next expression is intended to match. A guru degree

(magna cum laude) is awarded for stating the cases it fails on and

what the corrected expression should be! Contact the registrar at

ve3ll@rac.ca [Russell's Academy of Computing] to submit work.

\"(\ \ \)*([^\"]+(\ \ \")*)*[^\"]*\"


grep

----

grep searches and reports on regular expression matches within a file.
These pattern searches are line oriented and do not wrap. An example of
a simple search is grep "the" textfile | more. This search is case
sensitive and will find more than the word 'the', such as therefore.
To make it a bit more productive use grep -i "the " ie. case insensitive
to catch a sentence beginning with 'The', with the space in the pattern
eliminating most extraneous finds. Note that pipes and redirection
can be used to send the results to other utilities or to save them in a
file.

grep allows the use of extended regular expression patterns by using

either egrep or grep -E (note switch must be in uppercase).

grep allows fast pattern matching when you are not using metacharacters.
For example fgrep "billy" or grep -F "billy" will return the same results
as grep "billy" but usually quicker as no special character checking is done.
Once again note that the switch must be in uppercase.
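
The searches described above can be sketched on a few hypothetical lines
of text, fed to grep on standard input rather than from a file:

```shell
# Hypothetical sample text.
text='therefore it rained
The end is near
see the cat'

# Case sensitive: finds "the" inside "therefore" but misses "The".
case_sensitive=$(printf '%s\n' "$text" | grep 'the')

# Case insensitive with a trailing space: catches "The", drops "therefore".
with_space=$(printf '%s\n' "$text" | grep -i 'the ')

# -F treats the pattern as a plain string, so the dot is not a wildcard.
fixed=$(printf '%s\n' 'a.c' 'abc' | grep -F 'a.c')

printf '%s\n' "$case_sensitive" "$with_space" "$fixed"
```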

awk

---

The awk (and its GNU clone gawk) filter/report generator utility provides

facilities for processing text. It is fully programmable, providing loops,

conditionals, variables, and math operators with a notation similar to C.

A program written in awk accepts input either from the standard input or
redirected from data files. awk processes input on a record by record
basis (delimited by record separators {newline by default}). Each record
is broken up into fields (delimited by field separators {runs of spaces
or tabs by default}).

awk program zero:

$ ls -l | awk '{print $0}'

total 254

-rw-r--r-- 1 awkuser awkgrp 1184 Mar 30 11:21 file1

-r--r--r-- 1 awkuser awkgrp 2418 Mar 30 11:21 file2

-r--r--r-- 1 awkuser awkgrp 112530 Mar 30 11:21 file3

-rw-r--r-- 1 awkuser awkgrp 1005 Mar 30 11:21 file4

-rw-rw-r-- 1 awkuser awkgrp 30 Mar 30 11:22 file5

This command reads input piped from the ls utility and displays it.

This example illustrates:

<>An awk program can be given on the command line within single quotes.

<>awk accepts input from standard input and displays to standard output.

<>awk statements are enclosed in curly braces.

<>$0 indicates the whole record.

NOTE: Each field in a record is referred to as $1, $2, $3, and so on,

where $1 is the first field, $2 is the second field etc.

$ ls -l | awk '{print $5, $9}'

would print fields five and nine of the data piped from ls.
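
Field references can be tried without ls by piping in a fixed record.
The record below is a hypothetical stand-in for real input:

```shell
# Hypothetical one-line record with three white-space separated fields.
record='alpha beta gamma'

# $0 is the whole record; $1 and $3 are individual fields.
whole=$(printf '%s\n' "$record" | awk '{ print $0 }')
swapped=$(printf '%s\n' "$record" | awk '{ print $3, $1 }')

printf '%s\n' "$whole" "$swapped"
```

The comma in the print statement separates the two fields with a single
space on output.
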
awk program one:

Create a program file first.awk containing the following:

#!/bin/awk -f

BEGIN { print ("Starting program execution") ; }

NR == 2 { print ("Second record is " $0) ; }

END { print ("Finished processing"); }

Save this file and change its permission so that this program can be run.

chmod a+x first.awk

Now execute this program :

$ ls -l | ./first.awk

Starting program execution

Second record is -rw-r--r-- 1 awkuser awkgrp 1184 Mar 30 11:21 file1

Finished processing

In first.awk, line 1 shows that it is an awk program. The next 3 lines in

the program have a condition on the left side, whereas the right side has

the statements to be run if the condition is true. The patterns BEGIN and

END are used to capture control before the first input line has been read

and after the last input line has been read respectively. These keywords

do not combine with any other patterns. The third line is an example of

how a pattern can be specified. Awk provides a number of predefined

variables. Some of them are :


NR Number of the record being processed

NF Number of fields in this record

FS Field separator

RS Record separator

FILENAME Name of the current input file
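
A short sketch of these variables in use, on hypothetical input. $NF
(the field whose number is NF, ie. the last field) is standard awk
although it is not listed above, and -F: is the command-line way of
setting FS:

```shell
# NR, NF and the last field ($NF) for each of two records.
counts=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }')

# Setting FS with -F: splits on colons instead of white space.
second=$(printf 'x:y:z\n' | awk -F: '{ print $2 }')

printf '%s\n' "$counts" "$second"
```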

Statements in awk are in the form of pattern { action }

If the pattern matches the current record, then the action statements are
executed. If there is no action, the matching line is printed. If there is
no pattern, the action is performed on every input line. Patterns are a
combination of relational expressions and regular expressions. A relational
expression is of the form:

expression op expression

where an op is any of the following operators

<    less than
<=   less than or equal to
>    greater than
>=   greater than or equal to
==   equals
!=   not equal to
~    matches (contains a match for the regular expression)
!~   does not match


Regular expressions are as in egrep. In patterns, they must be surrounded

by slashes. Isolated regular expressions in a pattern apply to the entire

line.
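
The kinds of pattern described above can be sketched as follows. The
records are hypothetical, laid out like /etc/passwd entries cut down to
login:password:userid:groupid.

```shell
# Hypothetical passwd-style records.
users='root:x:0:0
alice:x:1001:100
bob:x:1002:100
adm:x:3:4'

# Relational pattern: userid of 1000 or more.
high_ids=$(printf '%s\n' "$users" | awk -F: '$3 >= 1000 { print $1 }')

# Isolated regular expression: tested against the entire line.
a_lines=$(printf '%s\n' "$users" | awk -F: '/^a/ { print $1 }')

# Regular expression applied to one field with ~, combined with &&.
a_high=$(printf '%s\n' "$users" | awk -F: '$1 ~ /^a/ && $3 >= 1000 { print $1 }')

printf '%s\n' "$high_ids" "$a_lines" "$a_high"
```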

An action is a sequence of statements. A conditional statement is of the form

if ( expression ) statement [ else statement ]

Looping expressions are of the form :

while ( expression ) statement

do statement while ( expression )

for ( expression ; expression ; expression ) statement

If multiple statements are to be specified, they should be enclosed between
{ and } braces.
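
As a small sketch of these statements working together (the input numbers
and the threshold of 6 are arbitrary choices for illustration), the
following sums the fields of each record and labels the total:

```shell
totals=$(printf '1 2 3\n4 5\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++)      # loop over every field in the record
        sum += $i
    if (sum > 6) {                 # braces group multiple statements
        label = "big"
        print sum, label
    } else
        print sum, "small"
}')

printf '%s\n' "$totals"
```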

Sample awk Programs

-------------------

Print lines longer than 72 characters:

length > 72

Print first two fields in opposite order:

{ print $2, $1 }

Add the fifth column, print sum and average:

ls -l | awk ' NR > 1 { s += $5; n++ }    # skip the "total" line
END { if (n > 0) print "sum is", s, " average is", s/n }'

Display Username, login, userid and groupid from the password file:

awk -F: '

BEGIN { printf("%s:%s:%s:%s\n", "Username", "login","Userid","Groupid");}

{ printf("%s:%s:%s:%s\n", $5, $1,$3,$4);}' /etc/passwd

Scan the password file and get the next available userid:

#!/bin/ksh

PASSWDFILE=/etc/passwd

# sort numerically on the third field (the userid)
sort -t : -k 3,3n $PASSWDFILE |

awk -F: '

BEGIN { previous = 100 ; }

$3 > 100 {

if ($3 - previous > 1) exit ;   # gap found; END still runs after exit

previous = $3 ;

}

END { print previous + 1 ; }'   # first unused userid above 100
