Email: txian@ebay.com
\w+@\w+(\.\w+)*
IP address: 192.169.0.33
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
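A minimal Java sketch, using java.util.regex from the Java API listed at the end, that checks the two sample strings above against their patterns. The class name IntroExamples and the out-of-range address 192.169.0.256 are illustrative assumptions. The IP pattern is written with its parentheses balanced.

```java
import java.util.regex.Pattern;

// Hypothetical helper class: checks the sample email and IP strings.
final class IntroExamples {
    // Email: word characters, '@', then dot-separated groups of word characters.
    static final Pattern EMAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)*");

    // IPv4: three "octet." groups, then a final octet.
    // Each octet is 250-255, 200-249, or [01]?[0-9][0-9]? (0-199, leading zeros allowed).
    static final Pattern IP = Pattern.compile(
        "((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)");

    // matches() succeeds only if the WHOLE string fits the pattern.
    static boolean isEmail(String s) { return EMAIL.matcher(s).matches(); }
    static boolean isIp(String s) { return IP.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isEmail("txian@ebay.com"));  // true
        System.out.println(isIp("192.169.0.33"));       // true
        System.out.println(isIp("192.169.0.256"));      // false: 256 is out of range
    }
}
```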
Pattern Matching
Simply speaking, a string matches a pattern (regular expression) if it contains at least one substring that matches the pattern.
For example, the string "The cat captured a mouse yesterday" contains the pattern cap (inside "captured").
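In java.util.regex, this "contains a matching substring" sense corresponds to Matcher.find(), while Matcher.matches() requires the entire string to match. A small sketch contrasting the two; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: substring search vs. whole-string match.
final class SubstringMatch {
    // True if SOME substring of text matches regex.
    static boolean contains(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // True only if the ENTIRE text matches regex.
    static boolean matchesWhole(String text, String regex) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String s = "The cat captured a mouse yesterday";
        System.out.println(contains(s, "cap"));      // true: "cap" occurs in "captured"
        System.out.println(matchesWhole(s, "cap"));  // false: the whole string is not "cap"
    }
}
```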
Gr[ae]y will match Gray or Grey, but not Graey or Graay.
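The character class [ae] stands for exactly one character, which is why Graey and Graay fail. A quick sketch checking the four words with whole-string matching; the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the character class [ae].
final class CharClassDemo {
    // [ae] matches exactly ONE character, either 'a' or 'e'.
    static final Pattern GRAY_OR_GREY = Pattern.compile("Gr[ae]y");

    static boolean matchesWord(String word) {
        return GRAY_OR_GREY.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Gray", "Grey", "Graey", "Graay"}) {
            // Gray and Grey match; Graey and Graay have two characters where one is allowed.
            System.out.println(w + " -> " + matchesWord(w));
        }
    }
}
```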
Word boundaries, \b
\beBay\b matches eBay as a whole word, but not the eBay inside eBays.
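\b is a zero-width assertion: it consumes no characters and only checks that a word character sits on one side and a non-word character (or the start or end of the input) on the other. A sketch using find() on two made-up sentences; the class name and sample text are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for word boundaries.
final class WordBoundaryDemo {
    // \b on both sides: "eBay" must not be preceded or followed by a word character.
    static final Pattern EBAY_WORD = Pattern.compile("\\beBay\\b");

    // find() searches for the pattern anywhere in the text.
    static boolean mentionsEbay(String text) {
        return EBAY_WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(mentionsEbay("I sold it on eBay."));   // true
        System.out.println(mentionsEbay("The best eBays ever"));  // false: 's' follows "eBay"
    }
}
```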
Alternation, |
(live|die) matches either live or die.
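A sketch that collects every occurrence of (live|die) in a string with a find() loop. The sample strings are made up; note that without \b the pattern also finds "live" inside "delivered".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for alternation.
final class AlternationDemo {
    // (live|die): match "live" or "die"; the parentheses also capture it as group 1.
    static final Pattern LIVE_OR_DIE = Pattern.compile("(live|die)");

    // Collects every occurrence, left to right.
    static List<String> findAll(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = LIVE_OR_DIE.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("live free or die hard"));  // [live, die]
        System.out.println(findAll("delivered"));               // [live]
    }
}
```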
Email of the form someone@somewhere:
\w+@\w+
More generically: ([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)*@([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)*
or, using \w (which also accepts _): \w+(\.\w+)*@\w+(\.\w+)*
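A sketch comparing the three email patterns with whole-string matching. The sample addresses are invented; they show that the simple form rejects dots and that the letters-and-digits form rejects an underscore, which \w accepts.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the email patterns.
final class EmailPatterns {
    // someone@somewhere: no dots on either side.
    static final Pattern SIMPLE = Pattern.compile("\\w+@\\w+");
    // Letters and digits only, dot-separated parts on both sides of '@'.
    static final Pattern GENERIC = Pattern.compile(
        "([a-zA-Z0-9]+)(\\.[a-zA-Z0-9]+)*@([a-zA-Z0-9]+)(\\.[a-zA-Z0-9]+)*");
    // Same shape using \w, which additionally accepts '_'.
    static final Pattern GENERIC_W = Pattern.compile("\\w+(\\.\\w+)*@\\w+(\\.\\w+)*");

    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String dotted = "john.smith@mail.example.com";
        System.out.println(full(SIMPLE, dotted));                       // false
        System.out.println(full(GENERIC, dotted));                      // true
        System.out.println(full(GENERIC, "first_last@example.com"));    // false: '_' not allowed
        System.out.println(full(GENERIC_W, "first_last@example.com"));  // true
    }
}
```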
Using Back-reference
In Perl, the text matched by a parenthesized subexpression (a group) can be referred to again later in the same expression with a back-reference, \1 to \9.
Example
([a-c])x\1x\1 will match axaxa, bxbxb and cxcxc.
Here \1 stands for the text that the first group, ([a-c]), actually matched, so axbxc does not match.
([a-z])x([0-9])y\1y\2 will match ax0yay0, but not ax0yby3:
\1 repeats the letter matched by ([a-z])
\2 repeats the digit matched by ([0-9])
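Because a back-reference repeats the captured text rather than re-applying the group's expression, ([a-z])x([0-9])y\1y\2 accepts ax0yay0 but rejects ax0yby3. A sketch checking both back-reference examples with java.util.regex, which supports the same \1 syntax; the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for back-references.
final class BackReferenceDemo {
    // \1 must repeat the exact character captured by ([a-c]).
    static final Pattern SAME_LETTER = Pattern.compile("([a-c])x\\1x\\1");
    // \1 repeats the letter from ([a-z]); \2 repeats the digit from ([0-9]).
    static final Pattern LETTER_DIGIT = Pattern.compile("([a-z])x([0-9])y\\1y\\2");

    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(full(SAME_LETTER, "axaxa"));     // true
        System.out.println(full(SAME_LETTER, "axbxc"));     // false: \1 is "a"
        System.out.println(full(LETTER_DIGIT, "ax0yay0"));  // true
        System.out.println(full(LETTER_DIGIT, "ax0yby3"));  // false: "b" != "a", "3" != "0"
    }
}
```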
Case Studies
Someone's English name
Definition: first name, optional middle name or initial, last name, optional suffix (Sr. or Jr.)
Example: George W. Bush
[A-Z]\w*[ \t]+(([A-Z]\w*|[A-Z]\.)[ \t]+)?[A-Z]\w*([ \t]+(Sr\.|Jr\.))?
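A sketch of the name check in Java. In this version the whitespace after the middle name sits inside the optional group and the suffix carries its own leading whitespace, so a two-part name such as John Smith also matches (a pattern with two mandatory [ \t]+ runs would reject it). The class name and sample names other than George W. Bush are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the English-name case study.
final class NameMatcher {
    // first name, optional (middle name or initial + whitespace), last name,
    // optional (whitespace + "Sr." or "Jr.")
    static final Pattern NAME = Pattern.compile(
        "[A-Z]\\w*[ \\t]+(([A-Z]\\w*|[A-Z]\\.)[ \\t]+)?[A-Z]\\w*([ \\t]+(Sr\\.|Jr\\.))?");

    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isName("George W. Bush"));      // true
        System.out.println(isName("John Smith"));          // true: no middle name
        System.out.println(isName("George W. Bush Jr."));  // true
        System.out.println(isName("george bush"));         // false: lowercase initials
    }
}
```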
Birth Date
Definition: (m)m/(d)d/yyyy
Example: 02/29/1964
Month: 1 to 12, with an optional leading zero for 1 to 9 (1, 2, ..., 9, 01, 02, ..., 09, 10, 11, 12)
[1][0-2]|0?[1-9]
Final version:
([1][0-2]|0?[1-9])/(0?[1-9]|[12][0-9]|3[01])/([01]\d{3}|200[0-5])
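A sketch that applies the final date pattern and pulls out the month, day and year through the three capturing groups. The helper name and sample dates other than 02/29/1964 are assumptions. Note that the pattern only checks ranges, not the calendar: 02/30/1964 is accepted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the birth-date case study.
final class BirthDate {
    // (month)/(day)/(year); years 0000-1999 and 2000-2005, as in the pattern above.
    static final Pattern DATE = Pattern.compile(
        "([1][0-2]|0?[1-9])/(0?[1-9]|[12][0-9]|3[01])/([01]\\d{3}|200[0-5])");

    // Returns {month, day, year} as strings, or null when the text is not a date.
    static String[] parts(String s) {
        Matcher m = DATE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] p = parts("02/29/1964");
        System.out.println(p[0] + " " + p[1] + " " + p[2]);  // 02 29 1964
        System.out.println(parts("13/01/1990") == null);      // true: no month 13
        System.out.println(parts("02/30/1964") != null);      // true: calendar not checked
    }
}
```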
IPv4 address in dotted-decimal notation
Definition: 0-255.0-255.0-255.0-255
Example: 192.168.0.24
0-255: \d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5]
Overall
(\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.(\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])){3}
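The inner parentheses around the octet matter: without them, alternation has the lowest precedence, so \. attaches only to the first alternative and dot-less strings slip through. A sketch contrasting the correctly grouped pattern with a mis-grouped one; the class name and the string 1100200250 are illustrative assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the IPv4 case study.
final class Ipv4Overall {
    // Correct: "\." followed by a whole octet (0-255), repeated three times.
    static final Pattern IPV4 = Pattern.compile(
        "(\\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\\.(\\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])){3}");

    // Mis-grouped: "\." belongs only to the \d{1,2} alternative.
    static final Pattern MISGROUPED = Pattern.compile(
        "(\\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\\.\\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5]){3}");

    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(full(IPV4, "192.168.0.24"));      // true
        System.out.println(full(IPV4, "256.1.1.1"));         // false
        System.out.println(full(MISGROUPED, "1100200250"));  // true: 1 + 100 + 200 + 250, no dots
        System.out.println(full(IPV4, "1100200250"));        // false
    }
}
```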
Using back-reference
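Note: a back-reference repeats the text the group captured, not its expression, so (\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.\1){3} matches only addresses whose four numbers are identical, such as 10.10.10.10. To reuse the expression itself, Perl 5.10 and later offer the subroutine call (?1): (\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])(\.(?1)){3}. java.util.regex has no equivalent.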
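Since a back-reference repeats captured text, (octet)(\.\1){3} only accepts addresses like 10.10.10.10. In Java, one way to reuse the octet expression is plain string concatenation when building the pattern. A sketch showing both; the class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: back-reference vs. reusing the expression text.
final class OctetReuse {
    static final String OCTET = "(\\d{1,2}|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

    // \1 repeats the TEXT captured by the first group.
    static final Pattern BACKREF = Pattern.compile(OCTET + "(\\.\\1){3}");

    // Reuse the EXPRESSION instead by inserting the same string again.
    static final Pattern REUSED = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    static boolean full(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(full(BACKREF, "10.10.10.10"));   // true: all four octets equal
        System.out.println(full(BACKREF, "192.168.0.24"));  // false
        System.out.println(full(REUSED, "192.168.0.24"));   // true
    }
}
```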
URL
Definition: protocol://host_string[:port]/URI
Port: a number from 1 to 65535
Example: http://cgi5.ebay.com/ws/isapi.dll?sellitem
Protocol: http|ftp|rmi|t3|https
Host_string: \w+(\.\w+)*
Port number: 6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3}
URI: zero or more path segments, each starting with /
(/[a-zA-Z_0-9\?\.%&=@]+)*
Overall
(http|ftp|rmi|t3|https)://\w+(\.\w+)*(:(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3}))?(/[a-zA-Z_0-9\?\.%&=@]+)*
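A sketch of the URL check in Java, under two stated choices: the port alternation covers exactly 1 to 65535 without leading zeros (an alternation such as 6[0-4]\d{3}|654\d{2}|6552\d|6553[0-5]|[1-5]?\d{1,4} skips 65000-65399 and 65500-65519 and accepts 0), and the colon sits outside the port group so it applies to every alternative. The class name and the localhost URLs are illustrative assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the URL case study.
final class UrlMatcher {
    // Port 1-65535 without leading zeros.
    static final String PORT =
        "(6553[0-5]|655[0-2]\\d|65[0-4]\\d{2}|6[0-4]\\d{3}|[1-5]\\d{4}|[1-9]\\d{0,3})";

    // protocol://host[:port] followed by zero or more /segment parts.
    static final Pattern URL = Pattern.compile(
        "(http|ftp|rmi|t3|https)://\\w+(\\.\\w+)*(:" + PORT + ")?(/[a-zA-Z_0-9\\?\\.%&=@]+)*");

    static final Pattern PORT_ONLY = Pattern.compile(PORT);

    static boolean isUrl(String s) { return URL.matcher(s).matches(); }
    static boolean isPort(String s) { return PORT_ONLY.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isUrl("http://cgi5.ebay.com/ws/isapi.dll?sellitem"));  // true
        System.out.println(isUrl("http://localhost:8080/index.html"));            // true
        System.out.println(isUrl("http://localhost:65536/"));                     // false
        System.out.println(isPort("65000"));                                      // true
    }
}
```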
Java API
http://java.sun.com/j2se/1.4.2/docs/api/index.html
java.util.regex.Pattern
java.util.regex.Matcher
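The two classes work together: Pattern.compile turns a regular expression into a reusable Pattern, and pattern.matcher(text) returns a Matcher that performs the search. A minimal sketch, assuming the introductory email pattern and a made-up sample sentence, that collects every match with find() and group():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for the Pattern/Matcher API.
final class RegexApiDemo {
    // Compile once and reuse; Pattern instances are immutable.
    static final Pattern EMAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)*");

    // Collects every email-like substring, left to right.
    static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);  // a Matcher holds the state of one search
        while (m.find()) {
            found.add(m.group());         // group() is the whole match (group 0)
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [txian@ebay.com, someone@somewhere]
        System.out.println(extractEmails("Contact txian@ebay.com or someone@somewhere."));
    }
}
```

Keeping the compiled Pattern in a static field avoids recompiling it on every call; per the Pattern documentation, Pattern instances are safe to share between threads, while Matcher instances are not.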