
Introduction to Perl

What is Perl?
Practical Extraction and Report Language
Perl is a portable scripting language.
No separate compile step is needed.
Runs on Windows, UNIX, Linux and Cygwin.
Fast and easy text processing capability.
Fast and easy file handling capability.
Written by Larry Wall.
Perl is a high-level, general-purpose, interpreted, dynamic programming language.

About Perl: Practical Extraction and Report Language


1987
Larry Wall develops Perl. Perl is usually described as a scripting language rather than a compiled programming language. Wall's original intent was to develop a scripting language more powerful than Unix shell scripting, but not as tedious as C. Perl is an interpreted language: there is no explicitly separate compilation step. Rather, the interpreter reads the whole file, converts it to an internal form and executes it immediately.

1989
October 18: Perl 3.0 is released under the GNU General Public License (GPL)

1991
March 21: Perl 4.0 is released under the GPL and the new Perl Artistic License

Now
Perl 5 is the current release line; the language designed as Perl 6 has since been renamed Raku

How to Access Perl


To install at home:
Perl comes by default on Linux, Cygwin and Mac OS X
www.perl.com has RPMs for Linux
www.activestate.com has binaries for Windows

Resources For Perl


Books:

Learning Perl
By Randal L. Schwartz and co-authors. Published by O'Reilly.

Programming Perl
By Larry Wall, Tom Christiansen and Jon Orwant. Published by O'Reilly.

Web Site

http://safari.oreilly.com
Contains both Learning Perl and Programming Perl in ebook form

Web Sources for Perl


Web

www.perl.com
www.perldoc.com
www.perl.org
www.perlmonks.org

The Basic Hello World Program


Find where Perl is installed:
which perl
Create the program in an editor:
pico hello.pl
Program:
#!/path/perl -w
print "Hello World!\n";

Save this as hello.pl
Give it executable permissions:
chmod a+x hello.pl
Run it as follows:
./hello.pl

Hello World Observations

The .pl extension is optional but is commonly used.
The first line, #!/usr/local/bin/perl, tells UNIX where to find Perl.
-w switches on warnings.

Variables and Their Content

Numerical Literals

6            Integer
12.6         Floating point
1e10         Scientific notation
6.4E-33      Scientific notation
4_348_348    Underscores instead of commas for long numbers

String Literals

"There is more than one way to do it!"
'Just don\'t create a file called -rf.'
"Beauty?\nWhat's that?\n"
"Real programmers can write assembly in any language."

Types of Variables
Types of variables:
Scalar variables : $a, $b, $c
Array variables : @array
Hash variables : %hash
File handles : STDIN, STDOUT, STDERR

$a = 5;         # now an integer
$a = "perl";    # now a string

Operators on Scalar Variables


Numeric and Logic Operators
Typical : +, -, *, /, %, ++, --, +=, -=, *=, /=, ||, &&, ! etc.
Not typical : ** for exponentiation

String Operators
Concatenation : . (similar to strcat)
$first_name = "First MCA";
$last_name = "Bharathiar University";
$full_name = $first_name . " " . $last_name;

Equality Operators for Strings


Equality / Inequality : eq and ne

$language = "Perl";

if ($language == "Perl")    # wrong: == compares numbers
if ($language eq "Perl")    # right: eq compares strings

Use eq / ne rather than == / != for strings

Relational Operators for Strings


Operation                   Numeric   String
Greater than                >         gt
Greater than or equal to    >=        ge
Less than                   <         lt
Less than or equal to       <=        le
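The difference between the numeric and string operators is easiest to see by comparing the same pair of values both ways. The following is a minimal sketch; the helper name compare_both is hypothetical and only used for illustration.

```perl
use strict;
use warnings;

# Compare the same pair of values numerically and as strings.
# compare_both is a hypothetical helper, not a Perl built-in.
sub compare_both {
    my ($x, $y) = @_;
    my $numeric = ($x < $y)  ? "less" : "not less";   # numeric comparison
    my $string  = ($x lt $y) ? "less" : "not less";   # character-by-character comparison
    return ($numeric, $string);
}

my ($num, $str) = compare_both(9, 10);
print "numeric: 9 is $num than 10\n";   # numeric: 9 is less than 10
print "string:  9 is $str than 10\n";   # string:  9 is not less than 10
```

As strings, "9" sorts after "10" because the character 9 comes after the character 1, which is why lt and < can disagree.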

String Functions
Convert to upper case
$name = uc($name);

Convert only the first char to upper case
$name = ucfirst($name);

Convert to lower case
$name = lc($name);

Convert only the first char to lower case
$name = lcfirst($name);

A String Example Program


#!/usr/local/bin/perl
$var1 = "mCA";
$var2 = "first mca";
$var3 = "MCA";
print ucfirst($var1);        # Prints 'MCA'
print uc($var2);             # Prints 'FIRST MCA'
print lcfirst(uc($var3));    # Prints 'mCA'

Variable Interpolation
Perl looks for variables inside double-quoted strings and replaces them with their value:

$stooge = "Larry";
print "$stooge is one of the three stooges.\n";

Produces the output:
Larry is one of the three stooges.

This does not happen when you use single quotes:
print '$stooge is one of the three stooges.\n';

Produces the output:
$stooge is one of the three stooges.\n

Character Interpolation
List of character escapes that are recognized when using double quoted strings:

\n    newline
\t    tab
\r    carriage return

Common Example :
print "Hello\n";    # prints Hello and then a return

Numbers and Strings are Interchangeable


If a scalar variable looks like a number and Perl needs a number, it will use it as a number:

$a = 4;           # a number
print $a + 18;    # prints 22
$b = "50";        # looks like a string, but ...
print $b - 10;    # will print 40!

Control Structures: Loops and Conditions

Loops and Conditions


If ... else ... Statements
Unless ... else Statements
While Loop
Until Loop
For Loops

if...else statement
if...else statement: use this statement to execute some code if a condition is true and other code if the condition is false.

if (condition) {
    code to be executed if condition is true;
} else {
    code to be executed if condition is false;
}

If ... else ... statements


if ( $weather eq "Rain" ) {
    print "Umbrella!\n";
} elsif ( $weather eq "Sun" ) {
    print "Sunglasses!\n";
} else {
    print "Anti Radiation Armor!\n";
}

Unless ... else Statements


Unless statements are the opposite of if ... else statements.

unless ($weather eq "Rain") {
    print "Dress as you wish!\n";
} else {
    print "Umbrella!\n";
}

And again remember the braces are required!

while loop syntax

while ( expression ) {
    Single statement or Block of statements;
}

While Loop
Example :

$i = 0;
while ( $i < 1000 ) {    # use <, not lt, to compare numbers
    print "$i\n";
    $i++;
}

Until Loop
The until loop evaluates an expression repeatedly until a specific condition is met.

Example:

$i = 0;
until ($i == 1000) {
    print "$i\n";
    $i++;
}

For Loops

Syntax 1:
for ( $i = 0; $i < 1000; $i=$i+2 ) {
    print "$i\n";
}

Syntax 2:
for $i (0..1000) {
    print "$i\n";
}

Moving around in a Loop


next: skips the rest of the current iteration
last: terminates the loop.
What is the output for the following code snippet:

for ( $i = 0; $i < 10; $i++) {
    if ($i == 1 || $i == 3) { next; }
    elsif ($i == 5) { last; }
    else { print "$i\n"; }
}

Output:
0
2
4

Exercise
Use a loop structure and code a program that produces the following output:

A
AA
AAA
AAAB
AAABA
AAABAA
AAABAAA
AAABAAAB
..

TIP: $chain = $chain . "A";

Exercise
#!/usr/bin/perl
for ($i=0, $j=0; $i<100; $i++) {
    if ($j==3) { $chain .= "B"; $j=0; }    # every fourth character is a B
    else       { $chain .= "A"; $j++; }
    print "$chain\n";
}

Exercise
for ($i=0; $i<100; $i++) {
    $v = rand 100;
    #print "Patient $i $v\n";
    printf "Patient %d %.2f\n\n", $i, $v;
    # %s : strings
    # %d : integer
    # %f : floating point
}

Collections Of Variables: Arrays

Arrays
An array variable is denoted by the @ symbol:
@array = ( "Larry", "Curly", "Moe" );

To access the whole array, use the whole array:
print "@array";    # prints : Larry Curly Moe

Array indexes start at 0!
To access one element of the array, use $
Why? Because every element in the array is a scalar
print "$array[0]\n";    # prints : Larry

Arrays cont ...


To find the index of the last element in the array:
print $#array;    # prints 2 in the previous example

Note another way to find the number of elements in the array:
$array_size = @array;

#!/usr/bin/perl
print "content-type: text/html \n\n";    # HTTP HEADER

# DEFINE AN ARRAY
@coins = ("Quarter","Dime","Nickel");
# PRINT THE ARRAY
print "@coins";
print "<br />";
print @coins;

PERL - Slicing Array Elements

There is no specific slice() function for slicing up elements of an array. Instead Perl allows us to create a new array with elements of another array using array indexing.

myrangefriend.pl:
#!/usr/bin/perl
print "content-type: text/html \n\n";    # HTTP HEADER
# SEQUENTIAL ARRAY
@nums = (1..200);
@slicenums = @nums[10..20,50..60,190..200];    # index 200 is past the end, so the last slot is undefined
print "@slicenums";

Output of myrangefriend.pl:
11 12 13 14 15 16 17 18 19 20 21 51 52 53 54 55 56 57 58 59 60 61 191 192 193 194 195 196 197 198 199 200
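Slices, the last index and the element count can be checked together on a small scale. This is a sketch only; the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my @nums = (1 .. 200);

# An array slice uses @ because it yields a list of elements.
my @slice = @nums[10 .. 12, 50 .. 52];
print "@slice\n";                  # 11 12 13 51 52 53

my $last_index = $#nums;           # index of the last element
my $count      = scalar @nums;     # number of elements
print "last index $last_index, count $count\n";   # last index 199, count 200
```

Because indexes start at 0, index 10 holds the value 11, and the count is always one more than the last index.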

Sorting Arrays
Perl has a built in sort function
Two ways to sort:
Default : sorts in standard string comparison order
    sort LIST
Usersub : create your own subroutine that returns an integer less than, equal to or greater than 0
    sort USERSUB LIST

Numerical Sorting Example


#!/usr/local/bin/perl -w
@unsortedArray = (3, 10, 76, 23, 1, 54);
@sortedArray = sort numeric @unsortedArray;
print "@unsortedArray\n";    # prints 3 10 76 23 1 54
print "@sortedArray\n";      # prints 1 3 10 23 54 76
sub numeric { return $a <=> $b; }
# Numbers: $a <=> $b : -1 if $a < $b,  0 if $a == $b, 1 if $a > $b
# Strings: $a cmp $b : -1 if $a lt $b, 0 if $a eq $b, 1 if $a gt $b

String Sorting Example


#!/usr/local/bin/perl -w
@unsortedArray = ("Raja", "Mani", "Kannan");
@sortedArray = sort { lc($a) cmp lc($b) } @unsortedArray;
print "@unsortedArray\n";    # prints Raja Mani Kannan
print "@sortedArray\n";      # prints Kannan Mani Raja

Array Operations
push(@ARRAY, LIST)
    add the LIST to the end of the @ARRAY
pop(@ARRAY)
    remove and return the last element of @ARRAY
unshift(@ARRAY, LIST)
    add the LIST to the front of @ARRAY
shift(@ARRAY)
    remove and return the first element of @ARRAY
scalar(@ARRAY)
    return the number of elements in the @ARRAY

Check 04_arrayOps.pl

Arrays and Loops


Foreach allows you to iterate over an array
Example:
foreach $element (@array) {
    print "$element\n";
}
This is similar to :
for ($i = 0; $i <= $#array; $i++) {
    print "$array[$i]\n";
}

Sorting with Foreach


The sort function sorts the array and returns the list in sorted order.
Example :

@array = ( "Larry", "Curly", "Moe" );
foreach $element (sort @array) {
    print "$element ";
}

Prints the elements in sorted order:
Curly Larry Moe

Arrays
If you want to assign the first value of an array into a scalar, the script would be :
($result)=@array;
To assign the first two elements of an array into scalar values :
($result1,$result2)=@array;
Without the parentheses, assigning an array to a scalar variable gives the number of elements :
$result=@array;

Arrays
To get the last index number in an array, put $# in front of the array name :
$result = $#array;
To get the number of elements from it, the amount will have to be adjusted by one :
$result = $#array+1;
To copy one array (@array2) to another array (@array1) :
@array1 = @array2;
To add a new value to the beginning of an array, the unshift command is used :
unshift(@array, "newelement");

Arrays
To add a new value to the end of an array there are two options :
push(@array, "newelement");
@array=(@array, "newelement");

Next is combining two arrays into a new array :
@newarray=(@firstarray,@secondarray);

To remove the first value of an array the shift command is used :
shift(@array);
You can store that removed value into a scalar at the same time too :
$result=shift(@array);

Arrays
To remove the last element of an array :
pop(@array);
To remove the last element of an array and store it in a scalar :
$result=pop(@array);
To replace a specific element in an array :
$array[number]=$newelement;

# sort lexically
@articles = sort @files;
# same thing, but with explicit sort routine
@articles = sort {$a cmp $b} @files;
# same thing in reversed order
@articles = sort {$b cmp $a} @files;
# sort numerically ascending
@articles = sort {$a <=> $b} @files;
# sort numerically descending
@articles = sort {$b <=> $a} @files;
# sort using explicit subroutine name
sub byage {
    $age{$a} <=> $age{$b};    # presuming integers
}
@sortedclass = sort byage @class;
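The sorting variants above can be compared side by side on one small list. This is a minimal sketch; the %age data and the by_age name are assumptions made up for the example.

```perl
use strict;
use warnings;

my @values = (3, 10, 76, 23, 1, 54);

my @as_strings = sort @values;                 # default: string comparison
my @ascending  = sort { $a <=> $b } @values;   # numeric, ascending
my @descending = sort { $b <=> $a } @values;   # numeric, descending

print "@as_strings\n";   # 1 10 23 3 54 76
print "@ascending\n";    # 1 3 10 23 54 76
print "@descending\n";   # 76 54 23 10 3 1

# Sorting names by a value looked up in a hash (illustrative data).
my %age = (Raja => 32, Raman => 30, Mathan => 34);
sub by_age { $age{$a} <=> $age{$b} }
my @by_age = sort by_age keys %age;
print "@by_age\n";       # Raman Raja Mathan
```

The default sort puts 10 before 3 because it compares character by character, which is why numeric data needs the <=> comparison.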

Manipulating Arrays

Manipulating Arrays
Split a string into words and put into an array
Split into characters
Array to space separated string
Array of characters to string
Join with any character you want
Join with multiple characters
To append to the end of an array
To remove the last element of the array (LIFO)
To remove the first element of the array
To prepend to the beginning of an array

Strings to Arrays : split


Split a string into words and put into an array
@array = split( /;/, "Larry;Curly;Moe" );
@array = ("Larry", "Curly", "Moe");    # creates the same array as we saw previously

Split into characters
@stooge = split( //, "curly" );    # array @stooge has 5 elements: c, u, r, l, y

Split cont..
Split on any character
@array = split( /:/, "10:20:30:40" );    # array has 4 elements : 10, 20, 30, 40
Split on multiple white space
@array = split( /\s+/, "this is a test" );    # array has 4 elements : this, is, a, test

Arrays to Strings
Array to separated string
@array = ("Larry", "Curly", "Moe");
$string = join( ";", @array );    # string = Larry;Curly;Moe
Array of characters to string
@stooge = ("c", "u", "r", "l", "y");
$string = join( "", @stooge );    # string = curly

Joining Arrays cont


Join with any character you want
@array = ( 10, 20, 30, 40 );
$string = join( ":", @array );    # string = 10:20:30:40
Join with multiple characters
@array = ( 10, 20, 30, 40 );
$string = join( "->", @array );    # string = 10->20->30->40
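A round trip through split and join shows how the two functions mirror each other. The sketch below reuses the stooges data from the slides; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $record = "Larry;Curly;Moe";

my @names    = split /;/, $record;        # string to list
my $rejoined = join "->", @names;         # list back to string
print "$rejoined\n";                      # Larry->Curly->Moe

my @words = split /\s+/, "this  is   a test";   # runs of whitespace count as one separator
print scalar(@words), "\n";               # 4

my @chars = split //, "curly";            # empty pattern splits into characters
print join(",", @chars), "\n";            # c,u,r,l,y
```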

Arrays as Stacks and Lists


To append to the end of an array :
@array = ( "Larry", "Curly", "Moe" );
push (@array, "Shemp" );
print $array[3];    # prints Shemp

To remove the last element of the array (LIFO)
$element = pop @array;
print $element;    # prints Shemp
@array now has the original elements (Larry, Curly, Moe)

Arrays as Stacks and Lists


To prepend to the beginning of an array
@array = ( "Larry", "Curly", "Moe" );
unshift @array, "Shemp";
print $array[3];    # prints Moe
print $array[0];    # prints Shemp

To remove the first element of the array
$element = shift @array;
print $element;    # prints Shemp
The array now contains only : Larry, Curly, Moe
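The four operations pair up naturally: push and pop work on the end of the array, unshift and shift on the front. A short sketch, with variable names chosen for illustration:

```perl
use strict;
use warnings;

my @stack = ("Larry", "Curly", "Moe");

push @stack, "Shemp";            # add at the end
my $popped = pop @stack;         # remove from the end (LIFO)
print "$popped\n";               # Shemp

unshift @stack, "Shemp";         # add at the front
my $shifted = shift @stack;      # remove from the front
print "$shifted\n";              # Shemp

print "@stack\n";                # Larry Curly Moe
```

Using push with shift instead gives first-in, first-out (queue) behaviour.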

Associative Arrays
In an associative array (called a hash in Perl), each ID key is associated with a value. When storing data about specific named values, a numerical array is not always the best way to do it. With associative arrays we can use the values as keys and assign values to them.

%ages = ( "Raja" => 32, "Raman" => 30, "Mathan" => 34 );

The same hash can be filled one key at a time:

$ages{'Raja'} = "32";
$ages{'Raman'} = "30";
$ages{'Mathan'} = "34";

print "Raja is " . $ages{'Raja'} . " years old.";

Multidimensional Arrays
A Perl array is a data type that allows you to store a list of items. Arrays with more dimensions are built as arrays of array references.

Two-dimensional Arrays
Three-dimensional Arrays

Two-dimensional Arrays

@shop = ( ["rose", 1.25, 15],
          ["daisy", 0.75, 25],
          ["orchid", 1.15, 7] );

Three-dimensional Arrays

@shop = ( [ ["rose", 1.25, 15], ["daisy", 0.75, 25], ["orchid", 1.15, 7] ],
          [ ["rose", 1.25, 15], ["daisy", 0.75, 25], ["orchid", 1.15, 7] ],
          [ ["rose", 1.25, 15], ["daisy", 0.75, 25], ["orchid", 1.15, 7] ] );

Multi Dimensional Arrays


To add more dimensions to your arrays:

@tab = ( ["Monday","Tuesday"], ["Morning","Afternoon","Evening"] );

$a = $tab[0][0];                   # $a eq "Monday"
@tab2 = ("midnight", "Twelve");
$tab[2] = \@tab2;                  # integrate tab2 as the last row of tab
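In Perl, each row of a two-dimensional array is a reference to another array, and elements are reached with two subscripts. The sketch below assumes the flower data from the slides means name, price and quantity; the stock_value helper is hypothetical.

```perl
use strict;
use warnings;

# Each row is an anonymous array: [name, price, quantity] (assumed meaning).
my @shop = (
    ["rose",   1.25, 15],
    ["daisy",  0.75, 25],
    ["orchid", 1.15,  7],
);

print "$shop[1][0]\n";    # daisy  (row 1, column 0)

# Hypothetical helper: total price * quantity over all rows.
sub stock_value {
    my ($rows) = @_;
    my $total = 0;
    $total += $_->[1] * $_->[2] for @$rows;
    return $total;
}

printf "%.2f\n", stock_value(\@shop);    # 45.55
```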

String functions in Perl


Chop
Chomp
Length
Substring
Reversing a String

Chop
The chop function is used to "chop off" the last character of a string variable.

$who_are_you = "you";
chop ($who_are_you);
print "You are $who_are_you!";

Output:
You are yo!

Chomp
If we use the chomp command on that last line instead, it won't remove the "u": chomp only removes a trailing newline.

$who_are_you = "you";
chomp ($who_are_you);
print "You are $who_are_you!";

Output:
You are you!

Length
The length function simply gives you back the number of characters in a string variable.

$ice = "cold";
$length_ice = length ($ice);

$length_ice is now 4

Substring
The substring function is a way to get a portion of a string value.

$ice = "cold";
$age = substr($ice, 1, 3);    # 3 characters starting at offset 1
print "$age";

Output:
old
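The string functions in this part can be tried on one value. This is a minimal sketch with illustrative variable names.

```perl
use strict;
use warnings;

my $line = "cold\n";

chomp(my $chomped = $line);    # copy, then remove the trailing newline only
my $chopped = $chomped;
chop $chopped;                 # remove the last character, whatever it is

print "$chomped\n";                    # cold
print "$chopped\n";                    # col
print length($chomped), "\n";          # 4
print substr($chomped, 1, 3), "\n";    # old
print scalar reverse($chomped), "\n";  # dloc
```

reverse only reverses the letters of a string in scalar context, which is why scalar is spelled out inside print.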

Lower and Upper

# reverse word order
$string = 'Yoda said, "can you see this?"';
@allwords = split(" ", $string);
$revwords = join(" ", reverse @allwords);
print $revwords, "\n";

Output:
this?" see you "can said, Yoda

$gnirts = reverse($string);    # reverse letters in $string
@sdrow = reverse(@words);      # reverse elements in @words

Paragraphs

Hashes
Hashes are complex list data, like arrays, except they link a key to a value. To define a hash, we use the percent (%) symbol before the name.

%coins = ("Quarter", 25, "Dime", 10, "Nickel", 5);
print %coins;

Output (the order of hash elements is not guaranteed):
Nickel5Dime10Quarter25

Hashes using while loop


%coins = ( "Quarter" , 25,
           "Dime" , 10,
           "Nickel", 5 );
while (($key, $value) = each(%coins)) {
    print $key.", ".$value."<br />";
}

Output:
Nickel, 5
Dime, 10
Quarter, 25

Sorting Hashes by Key


%coins = ( "Quarter" , 25,
           "Dime" , 10,
           "Nickel", 5 );
foreach $key (sort keys %coins) {
    print "$key: $coins{$key}<br />";
}

Output:
Dime: 10
Nickel: 5
Quarter: 25

Sorting Hashes by Value


%coins = ( "Quarter" , .25,
           "Dime" , .10,
           "Nickel", .05 );
foreach $value (sort { $coins{$a} <=> $coins{$b} } keys %coins) {    # <=> because the values are numbers
    print "$value $coins{$value}<br />";
}

Output:
Nickel 0.05
Dime 0.1
Quarter 0.25
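Sorting a hash always means sorting its keys, either by the keys themselves or by the values they point to. A small sketch using the coins data; the variable names are illustrative.

```perl
use strict;
use warnings;

my %coins = (Quarter => 0.25, Dime => 0.10, Nickel => 0.05);

# Sort the keys alphabetically.
my @by_key = sort keys %coins;

# Sort the keys by their numeric values.
my @by_value = sort { $coins{$a} <=> $coins{$b} } keys %coins;

print "@by_key\n";      # Dime Nickel Quarter
print "@by_value\n";    # Nickel Dime Quarter

for my $name (@by_value) {
    printf "%-8s %.2f\n", $name, $coins{$name};
}
```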

Special Variables
Global Scalar Special Variables.
Global Array Special Variables.
Global Hash Special Variables.
Global Special Filehandles.
Global Special Constants.
Regular Expression Special Variables.
Filehandle Special Variables.

$@    $EVAL_ERROR                     The Perl syntax error message from the last eval command.
$$    $PROCESS_ID or $PID             The pid of the Perl process running this script.
$<    $REAL_USER_ID or $UID           The real user ID (uid) of this process.
$>    $EFFECTIVE_USER_ID or $EUID     The effective user ID of this process.
$(    $REAL_GROUP_ID or $GID          The real group ID (gid) of this process.

argument
A piece of data supplied to a program, subroutine, function, or method to tell it what it's supposed to do. Also called a "parameter".

Subroutines

A subroutine is a named block of code
Separate from the main part of the program
Usually put subroutines at end of file
Subroutines can take arguments and return values
print(), chomp(), chop() are built-in subroutines
If the subroutine returns a meaningful value, it is also called a function.

Defining a subroutine
sub header {
    print "-" x 79, "\n";
    print "December Sales Report\n";
    print "-" x 79, "\n";
}

command-line arguments
./argv.pl 1 2 3 4
or
perl argv.pl 1 2 3 4

Output:
thanks, you gave me 4 command-line arguments: 1 2 3 4

printf( qq{<%10d>\n}, 12);     prints <        12>
printf( qq{<%-10d>\n}, 12);    prints <12        >

%s: a string
%d: an integer
%f: a floating point number in decimal notation
%e: a floating point number in scientific notation

How do I read command-line arguments with Perl?


Arguments are the values you pass to a Perl script.
With Perl, command-line arguments are stored in the array named @ARGV.
$ARGV[0] contains the first argument, $ARGV[1] contains the second argument, etc.
$#ARGV is the subscript of the last element of the @ARGV array.
The number of arguments on the command line is $#ARGV + 1.

argument
Reading command line arguments from perl
Passing arguments
Accessing Function Parameters
Setting Default Values for Function Parameters
Passing Values by Reference
Returning More Than One Value

Reading command line arguments from perl


perl my_script.pl 34 66 dallas

for my $arg (@ARGV) {
    print qq{You passed in $arg\n};    # qq{} interpolates; qw{} would not
}

Output:
You passed in 34
You passed in 66
You passed in dallas

Accessing Function Parameters


# add two numbers together
sub add {
    my ($x, $y) = @_;    # parameters arrive in @_
    return $x + $y;
}
$total = add(2, 2);      # 4

Setting Default Values for Function Parameters


sub wrap_html_tag {
    my ($string, $tag) = @_;
    $tag = 'b' unless defined $tag;    # default value
    return "<$tag>$string</$tag>";
}

Passing Values by Reference


sub wrap_html_tag {
    my ($string_ref, $tag) = @_;       # call as wrap_html_tag(\$string)
    $tag = 'b' unless defined $tag;
    $$string_ref = "<$tag>$$string_ref</$tag>";
}

Returning More Than One Value


sub averages {
    my (@stats) = @_;
    ...
    return ($median, $mean, $mode);
}
($median, $mean, $mode) = averages(@stats);
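In Perl, all four ideas (reading parameters, default values, passing by reference and returning several values) rest on the @_ array and on references. The sketch below is one way to write them; mean_and_median, wrap_html_tag and wrap_in_place are illustrative names, and the statistics helper assumes a non-empty list of numbers.

```perl
use strict;
use warnings;

# Returns two values as a list. Assumes at least one number is passed.
sub mean_and_median {
    my (@stats) = @_;
    my @sorted = sort { $a <=> $b } @stats;
    my $sum = 0;
    $sum += $_ for @sorted;
    my $mean = $sum / @sorted;
    my $mid  = int(@sorted / 2);
    my $median = @sorted % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
    return ($mean, $median);
}

# Default value: $tag falls back to 'b' when the caller omits it.
sub wrap_html_tag {
    my ($string, $tag) = @_;
    $tag = 'b' unless defined $tag;
    return "<$tag>$string</$tag>";
}

# Pass by reference: the caller's variable is changed in place.
sub wrap_in_place {
    my ($string_ref, $tag) = @_;
    $$string_ref = wrap_html_tag($$string_ref, $tag);
    return;
}

my ($mean, $median) = mean_and_median(9, 2, 4);
print "$mean $median\n";                  # 5 4
print wrap_html_tag("hello"), "\n";       # <b>hello</b>
my $text = "hello";
wrap_in_place(\$text, "i");
print "$text\n";                          # <i>hello</i>
```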

file
A named collection of data, usually stored on disk in a directory in a filesystem. Roughly like a document, if you're into office metaphors. In modern filesystems, you can actually give a file more than one name. Some files have special properties, like directories and devices.

More about file management


Opening files / directories
open(INFILE,"myfile"): reading
open(OUTFILE,">myfile"): writing
open(OUTFILE,">>myfile"): appending
open(INFILE,"someprogram |"): reading from program
open(OUTFILE,"| someprogram"): writing to program
opendir(DIR,"mydirectory"): open directory

Operations on an open file handle
$a = <INFILE>: read a line from INFILE into $a
@a = <INFILE>: read all lines from INFILE into @a
$a = readdir(DIR): read a filename from DIR into $a
@a = readdir(DIR): read all filenames from DIR into @a
read(INFILE,$a,$length): read $length characters from INFILE into $a
print OUTFILE "text": write some text in OUTFILE

Close files / directories
close(FILE): close a file
closedir(DIR): close a directory

Other file management commands


binmode(HANDLE): change file mode from text to binary
unlink("myfile"): delete file myfile
rename("file1","file2"): change name of file file1 to file2
mkdir("mydir"): create directory mydir
rmdir("mydir"): delete directory mydir
chdir("mydir"): change the current directory to mydir
system("command"): execute command command
die("message"): exit program with message message
warn("message"): warn user about problem message

Example
open(INFILE,"myfile") or die("cannot open myfile!");

Other: about $_
Holds the content of the current variable
Examples:
while(<INFILE>)     # $_ contains the current line read
foreach (@array)    # $_ contains the current element in @array

File Handlers
Opening a File:
open (SRC, "my_file.txt");
Reading from a File:
$line = <SRC>;    # reads up to a newline character
Closing a File:
close (SRC);

File Handlers cont...


Opening a file for output:
open (DST, ">my_file.txt");

Opening a file for appending:
open (DST, ">>my_file.txt");

Writing to a file:
print DST "Printing my first line.\n";

Safeguarding against opening a non existent file:
open (SRC, "file.txt") || die "Could not open file.\n";

File Test Operators


Check to see if a file exists:

if ( -e "file.txt" ) {
    # The file exists!
}

Other file test operators:
-r    readable
-x    executable
-d    is a directory
-T    is a text file

Quick Program with File Handles


Program to copy a file to a destination file

#!/usr/bin/perl -w
open(SRC, "file.txt") || die "Could not open source file.\n";
open(DST, ">newfile.txt") || die "Could not open destination file.\n";
while ( $line = <SRC> ) {
    print DST $line;
}
close SRC;
close DST;
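The same copy can also be written with lexical file handles and the three-argument form of open, which keeps the mode separate from the file name. This is a sketch of that style, not the slide's program: copy_file is a hypothetical helper, and the demo uses temporary files from the core File::Temp module so that it does not touch real data.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: copy $src to $dst line by line.
sub copy_file {
    my ($src, $dst) = @_;
    open(my $in,  '<', $src) or die "Could not open $src: $!\n";
    open(my $out, '>', $dst) or die "Could not open $dst: $!\n";
    while (my $line = <$in>) {
        print {$out} $line;
    }
    close $in;
    close $out or die "Could not close $dst: $!\n";
    return;
}

# Demo with throwaway temporary files.
my ($src_fh, $src_name) = tempfile(UNLINK => 1);
print {$src_fh} "first line\nsecond line\n";
close $src_fh;

my (undef, $dst_name) = tempfile(UNLINK => 1);
copy_file($src_name, $dst_name);
```

Including $! in the die message reports why the open failed, for example a missing file or a permissions problem.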

Some Default File Handles


STDIN : Standard Input
$line = <STDIN>;    # takes input from stdin
STDOUT : Standard Output
print STDOUT "File handling in Perl is sweet!\n";
STDERR : Standard Error
print STDERR "Error!!\n";

The <> File Handle


The empty file handle takes the command line file(s) or STDIN:

$line = <>;

If the program is run as ./prog.pl file.txt, this will automatically open file.txt and read the first line.
If the program is run as ./prog.pl file1.txt file2.txt, this will first read in file1.txt and then file2.txt ... you will not know when one ends and the other begins.

The <> File Handle cont...


If the program is run as ./prog.pl, the program will wait for you to enter text at the prompt, and will continue until you enter the EOF character (CTRL-D in UNIX).

Example Program with STDIN


Suppose you want to determine if you are one of the three stooges:

#!/usr/local/bin/perl
%stooges = (larry => 1, moe => 1, curly => 1);
print "Enter your name: ? ";
$name = <STDIN>;
chomp $name;
if ($stooges{ lc($name) }) {
    print "You are one of the Three Stooges!!\n";
} else {
    print "Sorry, you are not a Stooge!!\n";
}

Combining File Content


Given the two following files:

File1.txt
1
2
3

and

File2.txt
a
b
c

Write a program that takes the two files as arguments and outputs a third file that looks like:

File3.txt
1
a
2
b
3
c

Tip: ./mix_files File1.txt File2.txt File3.txt

Combining File Content


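#!/usr/bin/perl
open (F, $ARGV[0]) || die "Could not open $ARGV[0]\n";
open (G, $ARGV[1]) || die "Could not open $ARGV[1]\n";
open (H, ">$ARGV[2]") || die "Could not open $ARGV[2]\n";
# read one line from each file in turn until either file runs out
while ( defined($l1 = <F>) && defined($l2 = <G>) ) {
    print H "$l1$l2";
}
close (F);
close (G);
close (H);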

Chomp and Chop


Chomp : function that deletes a trailing newline from the end of a string.
$line = "this is the first line of text\n";
chomp $line;    # removes the new line character
print $line;    # prints "this is the first line of text" without returning

Chop : function that chops off the last character of a string.
$line = "this is the first line of text";
chop $line;
print $line;    # prints "this is the first line of tex"

pattern
A template used in pattern matching. Patterns are subject to an additional level of interpretation as a regular expression.

pattern matching
Taking a pattern, usually a regular expression, and trying the pattern various ways on a string to see whether there's any way to make it fit.

Pattern Matching
Introduction
Copying and Substituting Simultaneously
Matching Letters
Matching Words
Commenting Regular Expressions
Finding the Nth Occurrence of a Match
Matching Multiple Lines
Reading Records with a Pattern Separator
Extracting a Range of Lines
Matching Shell Globs as Regular Expressions
Speeding Up Interpolated Matches
Testing for a Valid Pattern
Honoring Locale Settings in Regular Expressions
Approximate Matching
Matching from Where the Last Pattern Left Off
Greedy and Non-Greedy Matches
Detecting Duplicate Words
Expressing AND, OR, and NOT in a Single Pattern
Matching Multiple-Byte Characters
Matching a Valid Mail Address
Matching Abbreviations
Program: urlify
Program: tcgrep
Regular Expression Grabbag

Introduction
Other languages typically offer pattern matching through library calls:

match( $string, $pattern );
subst( $string, $pattern, $replacement );

In Perl, pattern matching is built into the language:

$meadow =~ m/sheep/;      # True if $meadow contains "sheep"
$meadow !~ m/sheep/;      # True if $meadow doesn't contain "sheep"
$meadow =~ s/old/new/;    # Replace "old" with "new" in $meadow

Given the pattern /ovine/ and a string to match this against, the engine searches the string for an "o" that is immediately followed by a "v", then by an "i", then by an "n", and then finally by an "e".

three aspects
Greed
Eagerness
Backtracking

Greed
Greed is the principle that if a quantifier (like *) can match a varying number of times, it will prefer to match as long a substring as it can.

Eagerness is the notion that the leftmost match wins. The engine is very eager to return you a match as quickly as possible, sometimes even before you are expecting it.

Consider the match "Fred" =~ /x*/. Does "Fred" contain any x's? No, yet the match succeeds: /x*/ doesn't truly mean "any x's". Formally, it means zero or more of them, and in this case, zero sufficed for the eager matcher.

eagerness
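$string = "good food";
$string =~ s/o*/e/;

Which of these is the result?
good food
geod food
geed food
geed feed
ged food
ged fed
egood food

Answer: egood food. The earliest place where zero or more o's can match is the empty string at the very start, so the eager engine substitutes there.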

Pattern-Matching Modifiers

non-overlapping matches

Special Variables
$string = "And little lambs eat ivy";
$string =~ /l[^s]*s/;
print "($`) ($&) ($')\n";

Output:
(And ) (little lambs) ( eat ivy)

Copying & Substituting Simultaneously


Instead of:
$dst = $src;
$dst =~ s/this/that/;

use:
($dst = $src) =~ s/this/that/;

Matching Letters (matching regular letters)


if ($var =~ /^[A-Za-z]+$/) {
    # it is purely alphabetic
}

if ($var =~ /^[^\W\d_]+$/) {
    print "var is purely alphabetic\n";
}

Matching Words (separates one word from the next,)


/\S+/                 # as many non-whitespace bytes as possible
/[A-Za-z'-]+/         # as many letters, apostrophes, and hyphens
/\b([A-Za-z]+)\b/     # usually best
/\s([A-Za-z]+)\s/     # fails at ends or w/ punctuation

Commenting Regular Expressions


comments outside the pattern,
comments inside the pattern with the /x modifier,
comments inside the replacement part of s///,
alternate delimiters.

Finding the Nth Occurrence of a Match


$_ = "One fish two fish red fish blue fish";
$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        print "The third fish is a $1 one.\n";
        # Warning: don't `last' out of this loop
    }
}

Output:
The third fish is a red one.
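The counting loop can be wrapped in a subroutine so the text and the wanted position become parameters. This is a sketch; nth_fish is a hypothetical name. Because the match runs against a fresh copy of the text on each call, returning early from the loop does not leave a stale match position behind.

```perl
use strict;
use warnings;

# Hypothetical helper: the word before the Nth "fish", or undef if there is none.
sub nth_fish {
    my ($text, $want) = @_;
    my $count = 0;
    while ($text =~ /(\w+)\s+fish\b/gi) {    # /g in scalar context steps through matches
        return $1 if ++$count == $want;
    }
    return;
}

my $text  = "One fish two fish red fish blue fish";
my $third = nth_fish($text, 3);
print "The third fish is a $third one.\n";   # The third fish is a red one.
```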

Matching Multiple Lines


You have a string containing more than one line, but the special characters . (any character but newline), ^ (start of string), and $ (end of string) don't seem to work for you. This might happen if you're reading in multiline records or the whole file at once.

Regular Expressions cont..


A regular expression characterizes a regular language
Examples in UNIX:
ls *.c      Lists all the files in the current directory that are postfixed '.c'
ls *.txt    Lists all the files in the current directory that are postfixed '.txt'

The Match Operator


What would our program then look like:

if ($word =~ m/ing/) { print "$word\n"; }

Regular Expressions Types


Regular expressions are composed of two types of characters:

Literals
    Normal text characters
Metacharacters
    Special characters
    Add a great deal of flexibility to your search

Metacharacters
Match more than just characters
Match line position
^    start of a line    ( caret )
$    end of a line      ( dollar sign )

Match any characters in a list : [ ... ]
Example :
/[Bb]ridget/    matches Bridget or bridget
/Mc[Ii]nnes/    matches McInnes or Mcinnes

Our Simple Example Revisited


Now suppose we only want to match words that end in 'ing' rather than just contain 'ing'. How would we change our regular expression to accomplish this:

Previous Regular Expression: $word =~ m/ing/
New Regular Expression: $word =~ m/ing$/

Ranges of Regular Expressions


Ranges can be specified in Regular Expressions
Valid Ranges
[A-Z]       Upper Case Roman Alphabet
[a-z]       Lower Case Roman Alphabet
[A-Za-z]    Upper or Lower Case Roman Alphabet
[A-F]       Upper Case A through F Roman Characters
[A-z]       Valid but be careful: it also matches the punctuation characters that sit between Z and a
Invalid Ranges
[a-Z]       Not Valid
[F-A]       Not Valid

Ranges cont ...


Ranges of digits can also be specified
[0-9]    Valid
[9-0]    Invalid
Negating Ranges
/[^0-9]/     Match anything except a digit
/[^a]/       Match anything except an a
/^[^A-Z]/    Match anything that starts with something other than a single upper case letter
First ^ : start of line
Second ^ : negation

Our Simple Example Again


Now suppose we want to create a list of all the words in our text that do not end in 'ing'. How would we change our regular expression to accomplish this:

Previous Regular Expression: $word =~ m/ing$/
New Regular Expression: !($word =~ m/(ing)$/)

Matching Interrogations

$string =~ /([^.?]+\?)/
$string =~ /[.?]([A-Z0-9][^.?]+\?)/
$string =~ /([\w\s]+\?)/

Removing HTML Tags

$string =~ s/\<[^>]+\>/ /g
g: substitute EVERY instance

Literal Metacharacters
Suppose that you actually want to look for all strings that contain '$' in your text.

Use the \ symbol:
/\$/    Regular expression to search for a literal $

What do the following regular expressions match?

/[ABCDEFGHIJKLMNOP$]\$/
/[A-P$]\$/

Both match any line that contains (A-P or $) followed by $

Patterns provided in Perl


Some Patterns
\d    [0-9]
\w    [a-zA-Z0-9_]
\s    [ \r\t\n\f]      (white space pattern)
\D    [^0-9]
\W    [^a-zA-Z0-9_]
\S    [^ \r\t\n\f]
Example : (19\d\d) looks for any year in the 1900's

Using Patterns in our Example


Commonly words are not separated by just a single space but by tabs, returns, etc. Let's modify our split function to incorporate multiple white space:

#!/usr/local/bin/perl
while (<>) {
    chomp;
    @words = split /\s+/, $_;
    foreach $word (@words) {
        if ($word =~ m/ing$/) { print "$word\n"; }
    }
}

Word Boundary Metacharacter


Regular expression to match the start or the end of a 'word' : \b
Examples:

/Jeff\b/      Match Jeff but not Jefferson
/Carol\b/     Match Carol but not Caroline
/Rollin\b/    Match Rollin but not Rolling
/\bform/      Match form or formation but not Information
/\bform\b/    Match form but neither information nor formation

DOT Metacharacter
The DOT metacharacter, '.', symbolizes any character except a new line

/b.bble/    Would possibly return : bobble, babble, bubble
/.oat/      Would possibly return : boat, coat, goat

Note: remember '.*' usually means a bunch of anything; this can be handy but also can have hidden ramifications.

PIPE Metacharacter
The PIPE metacharacter is used for alternation

/Bridget (Thomson|McInnes)/    Match Bridget Thomson or Bridget McInnes but NOT Bridget Thomson McInnes
/B|bridget/                    Match B or bridget
/^(B|b)ridget/                 Match Bridget or bridget at the beginning of a line

Our Simple Example


Now with our example, suppose that we want to not only get all words that end in 'ing' but also 'ed'. How would we change our regular expression to accomplish this:

Previous Regular Expression: $word =~ m/ing$/
New Regular Expression: $word =~ m/(ing|ed)$/

The ? Metacharacter
The metacharacter, ?, indicates that the character immediately preceding it occurs zero or one time
Examples:

/worl?ds/     Match either 'worlds' or 'words'
/m?ethane/    Match either 'methane' or 'ethane'

The * Metacharacter
The metacharacter, *, indicates that the character immediately preceding it occurs zero or more times
Example :

/ab*c/    Match 'ac', 'abc', 'abbc', 'abbbc' etc.

Matches any string that contains an a, possibly followed by a sequence of b's, followed by a c.

Sometimes called Kleene's star

Our Simple Example again


Now suppose we want to create a list of all the words in our text that end in 'ing' or 'ings'. How would we change our regular expression to accomplish this:

Previous Regular Expression: $word =~ m/ing$/
New Regular Expression: $word =~ m/ings?$/

Exercise

For each of the strings (1)-(5), say which of the patterns (1)-(12) it matches. Where there is a match, what would be the values of $MATCH, $1, $2, etc.?

Strings:
1) the quick brown fox jumped over the lazy dog
2) The Sea! The Sea!
3) (.+)\s*\1
4) 9780471975632
5) C:\DOS\PATH\NAME

Patterns:
1) /[a-z]/
2) /(\W+)/
3) /\W*/
4) /^\w+$/
5) /[^\w+$]/
6) /\d/
7) /(.+)\s*\1/
8) /((.+)\s*\1)/
9) /(.+)\s*((\1))/
10) /\DOS/
11) /\\DOS/
12) /\\\DOS/

Exercise

Answers to the exercise above.

Strings matched by each pattern (the entries for the later patterns are incomplete):
1,2,3    1,2,3,5    1,2,3,5    4    1,2,3,5    3,4    2,
2    5    5    5

Patterns matched by each string (strings 1 to 5):
1,2,3,5    1,2,3,5,7,9    1,2,3,5,6    3,4,6    2,3,5,10,11,12

Modifying Text With Regular Expressions

Modifying Text
Match
Up to this point, we have seen attempts to match a given regular expression
Example : $variable =~ m/regex/

Substitution
Takes match one step further : if there is a match, then replace it with the given string
Example : $variable =~ s/regex/replacement/

$var =~ s/Cedric/Notredame/g;

Substitution Example
Suppose when we find all our words that end in 'ing' we want to replace the 'ing' with 'ed'.

#!/usr/local/bin/perl -w
while (<>) {
    chomp $_;
    @words = split /\s+/, $_;
    foreach $word (@words) {
        if ($word =~ s/ing$/ed/) { print "$word\n"; }
    }
}
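The substitution step can be separated from the input loop so it is easy to try on a single line. In this sketch, ing_to_ed is a hypothetical helper; it relies on s/// returning true only when it actually replaced something.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words of $line whose trailing "ing"
# was replaced by "ed"; words without that ending are skipped.
sub ing_to_ed {
    my ($line) = @_;
    my @changed;
    for my $word (split /\s+/, $line) {
        push @changed, $word if $word =~ s/ing$/ed/;
    }
    return @changed;
}

my @result = ing_to_ed("jumping over the ringing bell");
print "@result\n";    # jumped ringed
```

The replacement is purely textual, so it gives "ringed" rather than "rang"; the regular expression knows nothing about grammar.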

Special Variable Modified by a Match


$target = "I have 25 apples";
$target =~ /(\d+)/;

$&    => "25"         A copy of the text matched by the regex
$`    => "I have "    A copy of the target text before the match
$'    => " apples"    A copy of the target text after the match
$1, $2, $3, etc.    $1 = 25    The text matched by the 1st, 2nd, etc. set of parentheses. Note : $0 is not included here
$+    A copy of the highest numbered $1, $2, $3, etc.
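The match variables can be collected in one place to see which part of the target each one holds. This is a sketch only; match_parts is a hypothetical helper that returns the text before the match, the match itself, the text after it, and the first captured group.

```perl
use strict;
use warnings;

# Hypothetical helper: (prematch, match, postmatch, first capture),
# or an empty list when the pattern does not match.
sub match_parts {
    my ($target, $regex) = @_;
    return unless $target =~ $regex;
    return ($`, $&, $', $1);
}

my @parts = match_parts("I have 25 apples", qr/(\d+)/);
print join("|", @parts), "\n";    # I have |25| apples|25
```

$` holds what comes before the match and $' what comes after it; the backtick and the apostrophe are easy to mix up.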

Our Simple Example once again


Now let's revise our program to find all the words that end in 'ing' without splitting our line of text into an array of words:

#!/usr/local/bin/perl -w
while (<>) {
    chomp $_;
    while ($_ =~ /([A-Za-z]*ing\b)/g) {    # while, not if, so every match on the line is found
        print "$&\n";
    }
}

Example
#!/usr/local/bin/perl
$exp = <STDIN>;
chomp $exp;
if ($exp =~ /^([A-Za-z+\s]*)\bcrave\b([\sA-Za-z]+)/) {
    print "$1\n";
    print "$2\n";
}

Run the program with the string : I crave to rule the world!
Results:
I
 to rule the world
(the trailing "!" is not printed because it is not in the character class)

Example
#!/usr/local/bin/perl
$exp = <STDIN>;
chomp $exp;
if ($exp =~ /\bcrave\b/) {
    print "$`\n";
    print "$&\n";
    print "$'\n";
}

Run the program with the string : I crave to rule the world!
Results:
I
crave
 to rule the world!
