Reg Exp

Regular Expressions   linuxreg.txt <2002 01 01>
-------------------------------------------------
Regular expressions are patterns for finding occurrences of a string
within text. They are extremely useful in several major GNU/Linux and
UNIX filter programs such as grep, awk and lex. They look similar to
the 'glob' filename wildcards used by the shell, but they have a separate
syntax and work on text strings rather than on file names. The core
notation is shared by most programs written for GNU/Linux and UNIX,
although the details vary from tool to tool (basic versus extended syntax).
NOTE: DOS users can become familiar with regular expressions by
installing clones of grep and awk. These can be downloaded from
the txtutil directory of the SimTel MS-DOS Repository (www.simtel.net).
The Metacharacters
------------------
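Many punctuation characters have a special meaning rather than their literal
value. To match the literal character, precede the metacharacter with a
backslash (\). The metacharacters are \ ^ $ . [ ] | ( ) * + ? -
(the - is special only inside [ ], and | ( ) + ? belong to the extended
syntax described below).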
Special Escape Sequences
------------------------
\b (backspace), \f (formfeed), \n (newline or lf), \r (carriage return),
\t (tab) and \ddd (octal number) are special sequences in C and in many
Unix tools such as awk.
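One quick way to see these escapes at work is an awk string literal, since awk follows the C conventions for \t and octal \ddd. A minimal sketch, assuming a POSIX awk on the PATH; \101 is the octal ASCII code for the letter A.

```shell
#!/bin/sh
# awk string literals understand C-style escapes:
# \t is a tab and \101 is the octal code for the ASCII letter A.
out=$(awk 'BEGIN { print "\101\tB" }')
printf '%s\n' "$out"
```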
Some Common Forms of Regular Expressions
----------------------------------------
.         any single character except newline (\n)
*         zero or more repetitions of the character (or group) that precedes it
+         one or more repetitions of the character (or group) that precedes it
          (extended syntax)
?         zero or one occurrence of the character (or group) that precedes it
          (extended syntax)
[abc]     any one of the given set of characters
[a-j]     any one character in a CONSECUTIVE range (ASCII order in the C locale)
[^a-j]    any character NOT in the given set or range
\x        backslash escape: matches the metacharacter x literally, e.g. \* or \?
"x"       quoting method (lex): "*" matches a literal *; most other tools use
          a bracket expression such as [*] instead
^pattern  matches pattern at the beginning of a line ONLY
pattern$  matches pattern at the end of a line ONLY
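The basic forms above can be tried with grep -c, which counts matching lines. A small sketch, assuming GNU grep and a few made-up sample lines; + and ? are left out because plain grep treats them literally unless -E is given.

```shell
#!/bin/sh
# Made-up sample lines, one per line.
sample='cat
cot
ct
caat
the cat sat
scat'

# .  : exactly one character between c and t
dot=$(printf '%s\n' "$sample" | grep -c 'c.t')
# *  : zero or more a's between c and t
star=$(printf '%s\n' "$sample" | grep -c 'ca*t')
# [ ]: either a or o between c and t
cls=$(printf '%s\n' "$sample" | grep -c 'c[ao]t')
# ^ and $: the whole line must be exactly "cat"
anchored=$(printf '%s\n' "$sample" | grep -c '^cat$')

echo "c.t=$dot ca*t=$star c[ao]t=$cls ^cat\$=$anchored"
```

With this input, ca*t also matches ct and caat, while c.t needs exactly one character in between.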
Extended Regular Expressions
----------------------------
|         alternation: matches either the pattern on its left or the one on its right
\<        anchors the match to the beginning of a word (GNU grep, vi)
          (eg. \<he matches he, her, here etc, but not the)
\>        anchors the match to the end of a word
          (eg. ing\> matches pending, reading, etc)
r/s       trailing context (lex): matches r only when it is followed by s
          (eg. am/pl matches the am in amplitude and ample but not in ampere)
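A sketch of alternation and the word anchors with grep, assuming GNU grep (which supports \< and \> as an extension) and made-up input. The lex trailing-context operator / is not shown, since grep treats / as an ordinary character.

```shell
#!/bin/sh
words='her
here
the
pending
reading
ingot'

# |  : alternation (extended syntax, so grep -E)
alt=$(printf '%s\n' "$words" | grep -E -c 'her|the')
# \< : "he" only at the start of a word, so "the" does not count
start=$(printf '%s\n' "$words" | grep -c '\<he')
# \> : "ing" only at the end of a word, so "ingot" does not count
end=$(printf '%s\n' "$words" | grep -c 'ing\>')

echo "alt=$alt start=$start end=$end"
```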
Examples of Regular Expressions
-------------------------------
John                    matches John but not john
[Jj]ohn                 matches John and john
joh?n                   matches john and jon
[a-zA-Z]                matches any single alphabetic character
[a-zA-Z]+               matches a run of one or more letters (an alphabetic word)
x[0-9a-fA-F]            matches a single hex digit preceded by the hex marker x
[a-zA-Z_][0-9a-zA-Z_]*  matches an identifier: a letter or underscore followed
                        by letters, digits or underscores (so digits are allowed
                        anywhere except as the first character)
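Several of these examples can be checked in one go. The sketch below assumes GNU grep; -x makes the pattern match only whole lines, and LC_ALL=C is set so that ranges such as a-z follow ASCII order. The token list is made up for illustration.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # ASCII ordering for ranges like a-z

tokens='count
_tmp1
9lives
x_9
a-b'

# Keep only lines that are entirely an identifier (-x = whole line).
idents=$(printf '%s\n' "$tokens" | grep -E -x '[a-zA-Z_][0-9a-zA-Z_]*' | tr '\n' ' ')
# [Jj]ohn: either case for the first letter only
johns=$(printf 'John\njohn\nJOHN\n' | grep -c '[Jj]ohn')
# joh?n (extended syntax): the h is optional
jon=$(printf 'john\njon\njoan\n' | grep -E -c 'joh?n')

echo "identifiers: $idents"
echo "johns=$johns jon=$jon"
```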
Precedence
----------
Concatenation binds more tightly than alternation (|), though less tightly
than the repetition operators * + ?, so a grouping operator is sometimes
needed to override it, or simply to aid readability. Parentheses (round
brackets) are the grouping operator for regular expressions. For example:
then|ten        matches either then or ten
the(n|t)en      matches either thenen or theten
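The effect of grouping shows up clearly when whole lines must match (grep -x). A sketch assuming GNU grep -E, with made-up input lines:

```shell
#!/bin/sh
input='then
ten
thenen
theten'

# Concatenation binds tighter than |: this means (then)|(ten)
alt=$(printf '%s\n' "$input" | grep -E -x 'then|ten' | tr '\n' ' ')
# Parentheses limit the alternation to a single letter
grouped=$(printf '%s\n' "$input" | grep -E -x 'the(n|t)en' | tr '\n' ' ')

echo "then|ten   -> $alt"
echo "the(n|t)en -> $grouped"
```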
More Complicated Examples
-------------------------
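Consider the regular expression for a quoted string, which normally can't
contain quotes within the string. A legitimate regular expression is
\"[^\"]*\" i.e. a quote, followed by zero or more repetitions of anything
but a quote, followed by a quote (written here in lex notation; with grep
the quotes need no backslash: "[^"]*").
\".*\" does not work because .* is greedy: it matches as much as possible,
so the match runs from the first quote on the line to the last one and can
swallow several strings, and the text between them, in a single match.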
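The greedy behaviour of .* is easy to see with grep -o, which prints only the matched text. A sketch assuming GNU grep (-o is not in POSIX) and a made-up input line:

```shell
#!/bin/sh
line='say "hello" and "bye"'

# Greedy: .* runs to the LAST quote, swallowing both strings.
greedy=$(printf '%s\n' "$line" | grep -o '".*"')
# Negated class: [^"]* cannot cross a quote, so each string matches separately.
strict=$(printf '%s\n' "$line" | grep -o '"[^"]*"' | tr '\n' '|')

echo "greedy: $greedy"
echo "strict: $strict"
```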
Your Graduation Exercise
------------------------
You may consider yourself a master of regular expressions if you can
decipher what the next expression is intended to match. A guru degree
(magna cum laude) is awarded for stating the cases it fails on and
what the corrected expression should be! Contact the registrar at
ve3ll@rac.ca [Russell's Academy of Computing] to submit work.
\"(\ \ \)*([^\"]+(\ \ \")*)*[^\"]*\"

grep
----
grep searches a file for lines that match a regular expression and reports
them. These pattern searches are line oriented: a match cannot span a line
break. An example of a simple search is
$ grep "the" textfile | more
This search is case sensitive and will find more than the word 'the', for
example therefore. To make it a bit more productive use
$ grep -i "the " textfile
i.e. case insensitive, so that it also catches a sentence beginning 'The',
while the space in the pattern eliminates most extraneous finds. Note that
pipes and redirection can be used to send the results to other utilities or
to save them in a file.
grep accepts extended regular expression patterns when invoked as egrep or
as grep -E (note that the switch must be in uppercase).
When a pattern contains no metacharacters, a fixed-string search can be
used instead: fgrep "billy" or grep -F "billy" returns the same results as
grep "billy", but can be quicker because the pattern is not parsed as a
regular expression. With -F, any metacharacters in the pattern are matched
literally. Once again note that the switch must be in uppercase.
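The effect of -i, of the trailing space, and of -F can be compared by counting matching lines with -c. A sketch with made-up text, assuming GNU grep:

```shell
#!/bin/sh
text='The cat sat.
We rested, therefore we slept.
Bathe the dog.
a.b
axb'

plain=$(printf '%s\n' "$text" | grep -c 'the')        # case sensitive: hits "therefore"
nocase=$(printf '%s\n' "$text" | grep -i -c 'the ')   # any case, followed by a space
fixed=$(printf '%s\n' "$text" | grep -F -c 'a.b')     # -F: '.' is a literal dot
regex=$(printf '%s\n' "$text" | grep -c 'a.b')        # '.' matches any character

echo "plain=$plain nocase=$nocase fixed=$fixed regex=$regex"
```

The space does not remove every false hit: "Bathe " still contains "the ", which is why the text above says "most" extraneous finds.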
awk
---
The awk filter/report generator utility (and its GNU clone gawk) provides
facilities for processing text. It is fully programmable, providing loops,
conditionals, variables, and math operators with a notation similar to C.
A program written in awk accepts input either from the standard input or
from data files named on the command line. awk processes input record by
record (records are delimited by the record separator, newline by default).
Each record is broken up into fields (delimited by the field separator, by
default any run of spaces and tabs).
awk program zero:
$ ls -l | awk '{print $0}'
total 254
-rw-r--r-- 1 awkuser awkgrp 1184 Mar 30 11:21 file1
-r--r--r-- 1 awkuser awkgrp 2418 Mar 30 11:21 file2
-r--r--r-- 1 awkuser awkgrp 112530 Mar 30 11:21 file3
-rw-r--r-- 1 awkuser awkgrp 1005 Mar 30 11:21 file4
-rw-rw-r-- 1 awkuser awkgrp 30 Mar 30 11:22 file5
This command reads input piped from the ls utility and displays it.
This example illustrates:
<>an awk program can be given on the command line within single quotes.
<>awk accepts input from standard input and writes to standard output.
<>awk actions are enclosed in curly braces.
<>$0 refers to the whole record.
NOTE: Each field in a record is referred to as $1, $2, $3, and so on,
where $1 is the first field, $2 is the second field etc.
$ ls -l | awk '{print $5, $9}'
prints fields five and nine (the file size and name) of the data piped from ls.
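To see the field numbering without depending on the current directory, the sketch below feeds awk a single made-up line in ls -l layout. It assumes a POSIX awk and the default whitespace field splitting.

```shell
#!/bin/sh
# A fixed, made-up line in "ls -l" layout.
listing='-rw-r--r-- 1 awkuser awkgrp 1184 Mar 30 11:21 file1'

whole=$(printf '%s\n' "$listing" | awk '{ print $0 }')         # the whole record
size_name=$(printf '%s\n' "$listing" | awk '{ print $5, $9 }') # size and name
count=$(printf '%s\n' "$listing" | awk '{ print NF }')         # number of fields

echo "$size_name ($count fields)"
```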
awk program one:
Create a program file first.awk containing the following (change /bin/awk
on the first line if awk is installed elsewhere, e.g. /usr/bin/awk):
#!/bin/awk -f
BEGIN { print ("Starting program execution") ; }
NR == 2 { print ("Second record is " $0) ; }
END { print ("Finished processing"); }
Save this file and change its permission so that this program can be run.
chmod a+x first.awk
Now execute this program:
$ ls -l | ./first.awk
Starting program execution
Second record is -rw-r--r-- 1 awkuser awkgrp 1184 Mar 30 11:21 file1
Finished processing
In first.awk, line 1 (the #! line) tells the system to run the file with
awk -f. Each of the next 3 lines has a pattern on the left side and, on the
right side, the statements to run when the pattern is true. The patterns
BEGIN and END run their actions before the first input line has been read
and after the last input line has been read, respectively. These keywords
do not combine with any other patterns. The third line (NR == 2) is an
example of how an ordinary pattern can be specified. awk provides a number
of predefined variables. Some of them are:

NR        Number of the record being processed (records read so far)
NF        Number of fields in this record
FS        Field separator (default: runs of spaces and tabs)
RS        Record separator (default: newline)
FILENAME  Name of the current input file
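A short sketch of NR and NF, using -F: to set FS to a colon. The passwd-style lines are made up (not a real system file), and a POSIX awk is assumed.

```shell
#!/bin/sh
data='root:x:0:0
alice:x:1000:1000:Alice
bob:x:1001'

# FS is ":" (-F:); print record number, field count and first field,
# then the total number of records once all input has been read.
report=$(printf '%s\n' "$data" | awk -F: '
    { print NR ": " NF " fields, first=" $1 }
    END { print "total records: " NR }')

printf '%s\n' "$report"
```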
Statements in awk are of the form  pattern { action }
If the pattern matches, the action statements are executed. If there
is no action, the matching line is printed. If there is no pattern, the
action is performed on every input line. Patterns are built from
relational expressions and regular expressions. A relational expression
has the form:
expression op expression
where op is any of the following operators:
<     less than
<=    less than or equal to
>     greater than
>=    greater than or equal to
==    equal to
!=    not equal to
~     matches (the right-hand side is a regular expression)
!~    does not match
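A sketch of relational and match operators used as patterns, on made-up name/number pairs; a POSIX awk is assumed.

```shell
#!/bin/sh
data='alice 1000
bob 42
carol 1001
dave 7'

# Relational pattern: second field greater than or equal to 1000
big=$(printf '%s\n' "$data" | awk '$2 >= 1000 { print $1 }' | tr '\n' ' ')
# ~ : first field matches the regular expression (starts with b or c)
bc=$(printf '%s\n' "$data" | awk '$1 ~ /^[bc]/ { print $1 }' | tr '\n' ' ')
# !~ : first field does not match it
others=$(printf '%s\n' "$data" | awk '$1 !~ /^[bc]/ { print $1 }' | tr '\n' ' ')

echo "big: $big / b or c: $bc / others: $others"
```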

Regular expressions are as in egrep. In patterns, they must be surrounded
by slashes. A regular expression standing on its own as a pattern, such as
/the/, is matched against the entire line.
An action is a sequence of statements. A conditional statement has the form
if ( expression ) statement [ else statement ]
Looping statements have the forms:
while ( expression ) statement
do statement while ( expression )
for ( expression ; expression ; expression ) statement
If multiple statements are to be specified, they must be enclosed between
{ and } braces.
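The if, while and for forms can be combined in a single action. A small sketch, assuming a POSIX awk, that reads one number per line, reports whether it is positive, builds a countdown with while and sums 1..n with for:

```shell
#!/bin/sh
out=$(printf '3\n0\n' | awk '{
    n = $1
    if (n > 0) { print n " is positive" } else { print n " is not positive" }
    # while: build a countdown from n to 1
    s = ""
    while (n > 0) { s = s n " "; n-- }
    # for: add up the numbers 1 to $1
    total = 0
    for (i = 1; i <= $1; i++) total += i
    print "countdown: " s "sum: " total
}')
printf '%s\n' "$out"
```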
Sample awk Programs
-------------------
Print lines longer than 72 characters:
length > 72
Print first two fields in opposite order:
{ print $2, $1 }
Add up the fifth column (file size), print sum and average, skipping the
"total" line that ls -l prints first:
ls -l | awk '$1 != "total" { s += $5; n++ }
END { if (n > 0) print "sum is", s, " average is", s/n }'
Display user name (the comment field), login, user ID and group ID from the password file:
awk -F: '
BEGIN { printf("%s:%s:%s:%s\n", "Username", "login","Userid","Groupid");}
{ printf("%s:%s:%s:%s\n", $5, $1,$3,$4);}' /etc/passwd
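Scan the password file and report unused user IDs above 100 (the first free
one in each gap, and the one after the highest):
#!/bin/ksh
PASSWDFILE=/etc/passwd
# Sort numerically on the third (UID) field; POSIX sort uses -k, not +2 -3.
sort -t: -k3,3n "$PASSWDFILE" |
awk -F: '
BEGIN { previous = 100 ; }
$3 > 100 {
    if ($3 - previous > 1) {
        print "Values between " previous " and " $3 ;
        print previous + 1 ;
    }
    previous = $3 ;
}
END { print "Next userid after the highest: " previous + 1 ; }'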
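A sketch of the gap-finding idea, run on a small made-up passwd excerpt instead of /etc/passwd. It assumes that ordinary user IDs start above 100 and uses the POSIX sort -k key syntax; it prints the first free ID in each gap and the ID after the highest one.

```shell
#!/bin/sh
# Made-up passwd excerpt (name:pw:uid:gid); not a real system file.
passwd='root:x:0:0
alice:x:101:100
bob:x:102:100
carol:x:105:100
dave:x:106:100'

free=$(printf '%s\n' "$passwd" | sort -t: -k3,3n | awk -F: '
BEGIN { previous = 100 }
$3 > 100 {
    if ($3 - previous > 1) print previous + 1   # first unused UID in a gap
    previous = $3
}
END { print previous + 1 }                       # first UID above the highest one
' | tr '\n' ' ')

echo "free UIDs: $free"
```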
