
Regular Expressions and Their Usages in Web User Inputs


By Tom Xian

Points of Regular Expressions


What is a Regular Expression?
A text pattern that describes a set of strings.
Examples:
Phone number: 408-376-6280
\d{3}-\d{3}-\d{4} or (\d{3}-){2}\d{4}

Email: txian@ebay.com
\w+@\w+(\.\w+)*

IP address: 192.169.0.33
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
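The three examples above can be tried out with a few lines of Java. This is only a minimal sketch: the class name QuickExamples, the OCTET helper string, and the fullMatch method are hypothetical names chosen for illustration, and the IP pattern is written with balanced parentheses and [0-9] ranges throughout.

```java
import java.util.regex.Pattern;

class QuickExamples {
    // Backslashes are doubled because these are Java string literals.
    static final Pattern PHONE = Pattern.compile("(\\d{3}-){2}\\d{4}");
    static final Pattern EMAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)*");

    // One part of a dotted IP address, 0 to 255 (helper name is an assumption).
    static final String OCTET = "(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)";
    static final Pattern IPV4 = Pattern.compile("(" + OCTET + "\\.){3}" + OCTET);

    // True only when the whole input matches the pattern.
    static boolean fullMatch(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(PHONE, "408-376-6280"));   // true
        System.out.println(fullMatch(EMAIL, "txian@ebay.com")); // true
        System.out.println(fullMatch(IPV4, "192.169.0.33"));    // true
        System.out.println(fullMatch(IPV4, "192.169.0.333"));   // false
    }
}
```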

Points of Regular Expressions


Supported by many programming languages,
including Java, C++, Perl, Python, JavaScript,
and .NET.
Build string patterns quickly and precisely.
Well suited to validating user input in
Web/HTML forms.
Supports Unicode.

Pattern Matching
Simply put, a string matches when at least one
of its substrings fits the defined pattern
(expression).
"The cat captured a mouse yesterday" contains
the pattern cap.
Gr[ae]y will match Gray or Grey, but
not Graey or Graay.
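Substring matching of this kind can be sketched in Java with Matcher.find(), which succeeds as soon as any part of the text fits the pattern. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class SubstringMatch {
    // True when at least one substring of text matches the regex.
    static boolean contains(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(contains("cap", "The cat captured a mouse yesterday")); // true
        System.out.println(contains("Gr[ae]y", "Gray"));  // true
        System.out.println(contains("Gr[ae]y", "Grey"));  // true
        System.out.println(contains("Gr[ae]y", "Graey")); // false
        System.out.println(contains("Gr[ae]y", "Graay")); // false
    }
}
```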

Regex Engine Internals


A regex engine is the piece of software that
performs the matching between a regular
expression and a text string.
A regex-directed engine always returns
the leftmost match.
He captured a catfish for his cat.
When applying cat as the expression, the cat in catfish is
the first match, not the final word cat.
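The leftmost-match rule can be observed by asking the matcher where its first hit begins. A minimal sketch, with hypothetical class and method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostMatch {
    // Index where the first (leftmost) match starts, or -1 if there is none.
    static int firstMatchStart(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String text = "He captured a catfish for his cat.";
        int start = firstMatchStart("cat", text);
        // Prints the text from the first match onward: it begins with "catfish".
        System.out.println(text.substring(start));
    }
}
```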

Character Sets and Meta Characters


Brackets [abc] represent any one character
inside the brackets.
The meta characters inside brackets are -, \, ^, and ].
Use - inside the brackets for a range.
[0-9], any single digit
[a-zA-Z], any letter

Use ^ as the first character inside the brackets to negate the set.
[^0-9], meaning any character except a digit.

\w, meaning word character, [a-zA-Z0-9_]
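The bracket forms above can be checked one character at a time. The sketch below uses Pattern.matches, which requires the whole input to match; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class CharacterSets {
    // True when the one-character string s belongs to the given character class.
    static boolean inClass(String charClass, String s) {
        return Pattern.matches(charClass, s);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[abc]", "b"));    // true: b is listed
        System.out.println(inClass("[0-9]", "7"));    // true: 7 is in the range
        System.out.println(inClass("[a-zA-Z]", "5")); // false: not a letter
        System.out.println(inClass("[^0-9]", "x"));   // true: negated set
        System.out.println(inClass("\\w", "_"));      // true: underscore is a word character
    }
}
```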

Character Sets and Meta Characters


The familiar ?, +, and *
?, makes the item before it optional.
Colou?r, for Colour or Color.
(021-)?32174568, for 021-32174568 or 32174568

+, one or more occurrences of the item before it.
1+, for 1, 11, 111, ...

*, zero or more occurrences of the item before it.
A*, for the empty string, A, AA, AAA, ...

\W, for a non-word character, [^a-zA-Z0-9_]

Dot (.), representing any single character except the line terminator (\n for the Unix family, and \r\n for
Windows).
Equivalent to [^\n] on Unix, or [^\r\n] on Windows.
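A short Java sketch of the quantifiers and the dot follows. The class and method names are hypothetical; note that in Java the dot excludes line terminators by default.

```java
import java.util.regex.Pattern;

class Quantifiers {
    // True when the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("Colou?r", "Color"));            // true
        System.out.println(whole("Colou?r", "Colour"));           // true
        System.out.println(whole("(021-)?32174568", "32174568")); // true
        System.out.println(whole("1+", ""));                      // false: needs at least one 1
        System.out.println(whole("A*", ""));                      // true: zero occurrences allowed
        System.out.println(whole(".", "\n"));                     // false: dot skips newline
    }
}
```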

Character Sets and Meta Characters


Anchors, ^ and $
^ for the beginning of the string
^a matches the leading a in a, ab, or aaa

$ for the end, matching right after the last character in the string.
x$ matches in the word relax, not in boxes

Word boundaries, \b
\beBay\b, eBay as a whole word, not eBays.

Alternation, |
(live|die)
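Anchors, word boundaries, and alternation are easiest to see with a substring search, since they constrain where a match may sit. A minimal sketch with hypothetical names:

```java
import java.util.regex.Pattern;

class AnchorsAndBoundaries {
    // True when the regex matches somewhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("^a", "ab"));                     // true: a at the start
        System.out.println(found("^a", "ba"));                     // false: a is not first
        System.out.println(found("x$", "relax"));                  // true: x at the end
        System.out.println(found("x$", "boxes"));                  // false: x is in the middle
        System.out.println(found("\\beBay\\b", "I use eBay now")); // true: whole word
        System.out.println(found("\\beBay\\b", "eBays"));          // false: no boundary after y
        System.out.println(found("(live|die)", "to die for"));     // true: second alternative
    }
}
```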

Character Sets and Meta Characters


Repetition {n,m}, besides + and *.
d{3}, for ddd.
d{1,3}, for d, dd, or ddd, not dddd
{0,} is the same as *.
{1,} is the same as +.
Example: telephone numbers with the pattern ddd-dddd
or ddddddd
(\d{3}-\d{4})|\d{7}

Email of the form someone@somewhere
\w+@\w+
More generically, ([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)*@([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)*
\w+(\.\w+)*@\w+(\.\w+)*
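The counted repetitions and the two examples can be exercised as follows. This is an illustrative sketch; the class and method names are hypothetical, and the email pattern is written without a space after the @.

```java
import java.util.regex.Pattern;

class Repetition {
    // True when the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(whole("d{1,3}", "ddd"));   // true
        System.out.println(whole("d{1,3}", "dddd"));  // false: at most three
        String phone = "(\\d{3}-\\d{4})|\\d{7}";
        System.out.println(whole(phone, "376-6280")); // true
        System.out.println(whole(phone, "3766280"));  // true
        String email = "\\w+(\\.\\w+)*@\\w+(\\.\\w+)*";
        System.out.println(whole(email, "abc.xyz@ebay.com.cn")); // true
    }
}
```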

Predefined Character Classes


. Any character (may or may not match line terminators)
\d A digit: [0-9]
\D A non-digit: [^0-9]
\s A whitespace character: [ \t\n\x0B\f\r]
\S A non-whitespace character: [^\s]
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]

Using Back-references
In Perl, the text matched by a parenthesized group can be reused
later in the same expression with \1 through \9.
Example
([a-c])x\1x\1 will match axaxa, bxbxb and cxcxc.
Here \1 stands for the text captured by the first group, ([a-c]),
not for the pattern itself, so axbxc does not match.
([a-z])x([0-9])y\1y\2 will match ax0yay0, but not ax0yby3.
\1 for the text matched by ([a-z])
\2 for the text matched by ([0-9])
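Java supports the same \1 to \9 syntax, which makes it easy to confirm that a back-reference repeats the captured text and not the group's pattern. A minimal sketch with hypothetical names:

```java
import java.util.regex.Pattern;

class BackReferences {
    // True when the whole input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String one = "([a-c])x\\1x\\1";
        System.out.println(whole(one, "axaxa")); // true: a is repeated
        System.out.println(whole(one, "axbxc")); // false: b and c differ from the captured a

        String two = "([a-z])x([0-9])y\\1y\\2";
        System.out.println(whole(two, "ax0yay0")); // true: \1 is a, \2 is 0
        System.out.println(whole(two, "ax0yby3")); // false: b and 3 were never captured
    }
}
```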

Case Studies
Someone's English name
Definition: first name (middle name) last name
(Sr.|Jr.)
Example: George W. Bush
[A-Z]\w*[ \t]+(([A-Z]\w*|[A-Z]\.?)[ \t]+)?[A-Z]\w*([ \t]+(Sr\.|Jr\.))?

Credit Card Number
Definition: dddd-dddd-dddd-dddd or 16 digits
Expression: ((\d{4}-){3}\d{4})|(\d{16})
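Both case studies can be tried in Java. The sketch below makes one assumption worth flagging: the name pattern groups each optional part (middle name, suffix) together with its separating whitespace, so that a name without a middle name still matches. Class, constant, and method names are hypothetical.

```java
import java.util.regex.Pattern;

class NameAndCard {
    // Assumed variant: optional parts carry their own whitespace.
    static final Pattern NAME = Pattern.compile(
        "[A-Z]\\w*[ \\t]+(([A-Z]\\w*|[A-Z]\\.?)[ \\t]+)?[A-Z]\\w*([ \\t]+(Sr\\.|Jr\\.))?");

    // Four dash-separated groups of four digits, or sixteen digits in a row.
    static final Pattern CARD = Pattern.compile("((\\d{4}-){3}\\d{4})|(\\d{16})");

    static boolean isName(String s) { return NAME.matcher(s).matches(); }
    static boolean isCard(String s) { return CARD.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isName("George W. Bush"));         // true
        System.out.println(isName("George Bush"));            // true
        System.out.println(isName("george bush"));            // false: no capitals
        System.out.println(isCard("1234-5678-9012-3456"));    // true
        System.out.println(isCard("1234567890123456"));       // true
        System.out.println(isCard("1234-56789012-3456"));     // false: mixed format
    }
}
```

The credit card pattern checks format only; it says nothing about whether the number is a valid card.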

Case Studies
Birth Date
Definition: (m)m/(d)d/yyyy
Example: 02/29/1964
Month: 1, 2, ..., 9, 10, 11, 12, 01, 02, etc.
[1][0-2]|0?[1-9]

Day: 1, 2, ..., 9, 10, 11, ..., 31, 01, 02, etc.
[1-9]|0?[1-9]|1[0-9]|2[0-9]|3[01]
simplified to 0?[1-9]|[12][0-9]|3[01]

Practical birth year, for someone born after BC (years 0000 through 2005)
000\d|00\d\d|0\d\d\d|1\d\d\d|200[0-5]
simplified to 0\d{3}|1\d{3}|200[0-5]
and then to [01]\d{3}|200[0-5]

Final version:
([1][0-2]|0?[1-9])/(0?[1-9]|[12][0-9]|3[01])/([01]\d{3}|200[0-5])
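The final date pattern can be tested as below. The sketch also shows a limitation: the pattern checks each field's range separately, so it cannot reject a day that does not exist in the given month. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class BirthDate {
    static final Pattern DATE = Pattern.compile(
        "([1][0-2]|0?[1-9])/(0?[1-9]|[12][0-9]|3[01])/([01]\\d{3}|200[0-5])");

    // True when the whole input has the shape (m)m/(d)d/yyyy with in-range fields.
    static boolean looksLikeBirthDate(String s) {
        return DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBirthDate("02/29/1964")); // true
        System.out.println(looksLikeBirthDate("2/9/2005"));   // true: leading zeros optional
        System.out.println(looksLikeBirthDate("13/01/1990")); // false: no month 13
        System.out.println(looksLikeBirthDate("12/31/2006")); // false: year above 2005
        System.out.println(looksLikeBirthDate("02/31/1964")); // true, although Feb 31 does not exist
    }
}
```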

Case Studies
IP v4 address in dot notation

Definition: 0-255.0-255.0-255.0-255
Example: 192.168.0.24
0-255: \d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5]
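Overall
(\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.(\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])){3}
Using a back-reference
(\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.\1){3}
Note that \1 repeats the text the first group matched, so this form matches only addresses whose four parts are identical, such as 10.10.10.10, and not 192.168.0.24.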
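In Java the 0-255 sub-pattern can be kept in a string and reused by concatenation, which gives the brevity a back-reference cannot. The sketch below is illustrative only; OCTET and the other names are hypothetical, and each octet is wrapped in its own parentheses so the dot applies to every alternative.

```java
import java.util.regex.Pattern;

class Ipv4 {
    // One part of the address, 0 to 255, as its own group.
    static final String OCTET = "(\\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

    // Reuse the sub-pattern text: four octets separated by dots.
    static final Pattern BY_CONCATENATION =
        Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    // Reuse the captured text: \1 must equal whatever the first octet matched.
    static final Pattern BY_BACK_REFERENCE =
        Pattern.compile(OCTET + "(\\.\\1){3}");

    static boolean isIpv4(String s) {
        return BY_CONCATENATION.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.0.24")); // true
        System.out.println(isIpv4("256.1.1.1"));    // false: 256 is out of range
        System.out.println(BY_BACK_REFERENCE.matcher("10.10.10.10").matches());  // true
        System.out.println(BY_BACK_REFERENCE.matcher("192.168.0.24").matches()); // false
    }
}
```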

Case Studies
URL
Definition: protocol://host_string[:port]/URI
Port has value range from 1 to 65535
Example: http://cgi5.ebay.com/ws/isapi.dll?sellitem
Protocol: http|ftp|rmi|t3|https
Host_string: \w+(\.\w+)*
Port number: 6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]|[1-5]?\d{1,4}
URI: ///
(/[a-zA-Z_0-9\?\.%&=@]+)*
Overall
(http|ftp|rmi|t3|https)://\w+(\.\w+)*(:(6[0-4]\d{3}|65[0-4]\d{2}|655[0-2]\d|6553[0-5]|[1-5]?\d{1,4}))?(/[a-zA-Z_0-9\?\.%&=@]+)*
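A long pattern like this one is easier to read when assembled from named pieces. The sketch below is one possible arrangement, not a definitive URL validator: all names are hypothetical, and the PORT piece is an assumption that spells out the ranges 60000-64999, 65000-65499, 65500-65529 and 65530-65535 before falling back to shorter numbers.

```java
import java.util.regex.Pattern;

class UrlPattern {
    static final String PROTOCOL = "(http|ftp|rmi|t3|https)";
    static final String HOST = "\\w+(\\.\\w+)*";
    // Assumed port ranges; the inner group keeps the colon outside the alternation.
    static final String PORT =
        "(6[0-4]\\d{3}|65[0-4]\\d{2}|655[0-2]\\d|6553[0-5]|[1-5]?\\d{1,4})";
    static final String URI = "(/[a-zA-Z_0-9\\?\\.%&=@]+)*";

    static final Pattern URL =
        Pattern.compile(PROTOCOL + "://" + HOST + "(:" + PORT + ")?" + URI);

    static boolean isUrl(String s) {
        return URL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isUrl("http://cgi5.ebay.com/ws/isapi.dll?sellitem")); // true
        System.out.println(isUrl("https://ebay.com:8080/ws"));  // true
        System.out.println(isUrl("http://ebay.com:65200/ws"));  // true: within 65000-65499
        System.out.println(isUrl("http://ebay.com:65536/ws"));  // false: port too large
    }
}
```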

Java API
http://java.sun.com/j2se/1.4.2/docs/api/index.html
java.util.regex.Pattern
java.util.regex.Matcher

Sample Java Code using v1.4


import java.util.regex.*;

public class regExp
{
    // Define the patterns.
    // phone: ddddddd, or ddd-ddd-dddd
    private final static Pattern phonePattern =
        Pattern.compile("\\d{7}|(\\d{3}-){2}\\d{4}");
    // email: someone@somewhere, abc.xyz@ebay.com.cn ...
    private final static Pattern emailPattern =
        Pattern.compile("\\w+(\\.\\w+)*@\\w+(\\.\\w+)*");

    // Returns true when the whole string is a phone number.
    static boolean isPhone(String testString)
    {
        if(testString == null)
            return false;
        Matcher m = phonePattern.matcher(testString);
        return m.matches();
    }

    // Returns true when the whole string matches the given pattern.
    static boolean isMatchedPattern(Pattern pat, String testString)
    {
        if(pat == null || testString == null)
            return false;
        Matcher m = pat.matcher(testString);
        return m.matches();
    }

    public static void main(String[] args)
    {
        if(args.length == 0)
        {
            System.out.println("No arg.");
        }

        else if(args.length == 1)
        {
            if(isPhone(args[0]))
                System.out.println("matched phone number pattern");
            else
                System.out.println("not matched phone number pattern");
            if(isMatchedPattern(emailPattern,args[0]))
                System.out.println("matched email pattern");
            else
                System.out.println("not matched email pattern");
        }

        else if(args.length == 2)
        { // args[0] is pattern, args[1] is test string
            if(Pattern.matches(args[0], args[1]))
                System.out.println(args[1] +" is matched pattern," + args[0]);
            else
                System.out.println(args[1] +" is not matched pattern," + args[0]);
        }
    }
}
