PATTERN MATCHING (OR HOW TO DO A LOOK LIKE)
You can test a string against a pattern (known as a REGULAR EXPRESSION) if you want to ask "does this string look like this pattern?".
Regular expressions comprise a number of elements (of 6 basic types I'll tell you about in a minute) and are matched from left to right ... the regular expression is compared against the string element by element and if it's still "yes, that matched" when the comparison gets to the end, you have a match.
Using PHP's "ereg" function as an example ...
if (ereg("ham",$teststring)) { ... says look for the string "h", "a", "m" within $teststring, and return a true value if it occurs and a false value if it does not occur.
All letters and digits (so including h, a and m) are "literals" - the first of the basic types that you can put in a regular expression.
THE SIX BASIC TYPES ARE:
-> literals
-> character groups
-> anchors (a.k.a. zero width assertions)
-> counts
-> groupings
-> alternations
And we'll look at them one by one.
1. Literals. A character specified in the regular expression is matched exactly against the same character in the teststring. All letters and digits that appear in a regular expression (unless within some other type) are literals, as are many of the special characters such as % ! @ & - _ = < > / , : " ' and ; (this is NOT a complete list). If you want other special characters to match exactly, you must precede them with a \ (to say "I really want a ...") and remember that you should use single not double quoted strings (PHP) for your regular expression, so that PHP's double quote handling doesn't pick up the backslash before the regular expression handler sees it! Example:
if (ereg('@hotmail\.com',$teststring)) { ... will match and perform the block if the $teststring variable contains "@hotmail.com". The \ is needed before the "." as "." is NOT one of the special characters that's taken as a literal.
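Here's a minimal sketch of that same test, written in Python rather than PHP so that it runs on a current install. That's my assumption, not part of the original example: Python's re.search, like ereg, looks for the pattern anywhere within the string, and the function name looks_like_hotmail is just made up for illustration.

```python
import re

def looks_like_hotmail(teststring):
    # The backslash makes "." a literal dot rather than "any one character".
    return re.search(r'@hotmail\.com', teststring) is not None

print(looks_like_hotmail("rupert@hotmail.com"))     # True
print(looks_like_hotmail("rupert@hotmail.com.au"))  # True
print(looks_like_hotmail("rupert@hotmailxcom"))     # False
```

The r'...' raw string does the same job as PHP's single quotes: it stops the language itself from interpreting the backslash.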
Note that this example WOULD match "rupert@hotmail.com.au" as it contains the required sequence of characters!
2. Character Groups. Written between square brackets, these match one character from $teststring against any ONE character from the group. So [aeiouAE] would match a lower case a, e, i, o, u or a capital A or E. You can use a "-" within a character group to specify a range of characters (so [a-z] is any lower case letter), and use a ^ directly after the [ to match any character EXCEPT the one(s) listed. There are other character groups too (once again, I'm giving you the concept) - note especially that "." matches any one character.
Example:
if (ereg('c[aeiou][^t]',$teststring)) { .... will match
-> a letter c
-> a lower case vowel (a, e, i, o or u)
-> and any character which is NOT a lower case t.
So it WILL match can, cog and cup but NOT bog, cat, cot or cut. It WILL also match acorn as this contains the sequence you're looking for WITHIN the string.
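If you'd like to check that list for yourself, here's a small sketch using Python's re module (my choice of language for a runnable example; the pattern is the one from the PHP line above, and the name has_c_vowel_not_t is hypothetical).

```python
import re

def has_c_vowel_not_t(teststring):
    # c, then one lower case vowel, then one character that is not a t.
    return re.search(r'c[aeiou][^t]', teststring) is not None

for word in ["can", "cog", "cup", "acorn", "bog", "cat", "cot", "cut"]:
    print(word, has_c_vowel_not_t(word))
```

The first four words should report True and the last four False. Note that [^t] still has to match a character, which is why plain "cat" fails rather than matching "ca" on its own.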
3. Anchors. By default, regular expression matches are made anywhere within the teststring - the previous example matched "acorn", for example. If you apply an anchor - you use ^ to indicate "start of string" and $ to indicate "end of string" - then you can limit your match to the start or end ... and if you do both, you're specifying a regular expression that must match the whole string.
Example:
if (ereg('^c.t$',$teststring)) { .... will match a string that starts with a c, followed by any other character, followed by a t. And at that point the teststring must end - in other words, the teststring has to be exactly 3 characters long. This will match cat, cot, cxt and even c*t. It will NOT match Scot, cats or scattergram.
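The anchored pattern can be tried out with a short sketch like this one, again in Python as an assumed stand-in for the PHP call (is_c_any_t is a made-up name). One difference I'm aware of: Python's $ will also match just before a single trailing newline, so strip your input first if that matters to you.

```python
import re

def is_c_any_t(teststring):
    # ^ pins the match to the start and $ to the end, so only
    # three-character strings of the form c, anything, t get through.
    return re.search(r'^c.t$', teststring) is not None

for word in ["cat", "cot", "cxt", "c*t", "Scot", "cats", "scattergram"]:
    print(word, is_c_any_t(word))
```

The first four words report True; Scot, cats and scattergram report False because something extra sits before the c or after the t.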
4. Counts. Each literal, character group (and anchor) that you've seen so far matches once against the teststring. By adding a count AFTER a literal or character group, you can specify that you want it to match a different number of times. The counts that you'll find used time and time again are:
? previous item occurs 0 or 1 times ("perhaps a")
+ previous item occurs 1 or more times ("some")
* previous item occurs 0 or more times ("perhaps some")
Example:
if (ereg('^https?://',$teststring)) { ... will match a teststring starting with http, that MAY be followed by an "s". Then the following characters (whether or not there was an s) must be ://. As there is no $ anchor at the end, the match will be successful whatever else follows in the teststring.
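To see all three counts at work, here's a sketch in Python (an assumed substitute for the PHP call; the ab+c and ab*c patterns and both function names are my own illustrations, not taken from the example above).

```python
import re

def starts_with_web_scheme(teststring):
    # "s?" means the s may appear once or not at all.
    return re.search(r'^https?://', teststring) is not None

def whole_string_matches(pattern, teststring):
    # Anchor both ends so the effect of the count is easy to see.
    return re.search('^' + pattern + '$', teststring) is not None

print(starts_with_web_scheme("http://www.wellho.net"))    # True
print(starts_with_web_scheme("https://www.wellho.net"))   # True
print(starts_with_web_scheme("httpss://www.wellho.net"))  # False
print(whole_string_matches(r'ab+c', "ac"))     # False: + needs at least one b
print(whole_string_matches(r'ab*c', "ac"))     # True: * allows no b at all
print(whole_string_matches(r'ab*c', "abbbc"))  # True
```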
5. Groupings. If you want your counts to apply to more than one character, you can use round brackets around the section to which the count applies.
Example:
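if (ereg('^https?://(www\.)?wellho.net',$teststring)) { ... will match a test string starting with http:// or https://, that may be followed by www. (either all 4 of those characters or none of them) and it will then be followed by wellho.net. (Strictly, the "." in wellho.net should be written \. as well; left unescaped it matches any one character, so "wellhoXnet" would also get through.)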
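A sketch of the grouped pattern in Python (my assumed stand-in for PHP; is_wellho_url is a hypothetical name). I've escaped the dot in wellho\.net here, which is a small tightening of the pattern as shown in the PHP line.

```python
import re

# (www\.)? treats the four characters "www." as one optional unit.
SITE = re.compile(r'^https?://(www\.)?wellho\.net')

def is_wellho_url(teststring):
    return SITE.search(teststring) is not None

print(is_wellho_url("http://wellho.net"))                  # True
print(is_wellho_url("https://www.wellho.net/index.html"))  # True
print(is_wellho_url("http://ww.wellho.net"))               # False
```

The last one fails because "ww." is neither all of "www." nor none of it.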
6. Alternation. The "|" character in a regular expression means "or" over a wider scope than the character group - [http][ftp] would match any letter h or t or p followed by any letter f or t or p, but (http|ftp) would match either "http" or "ftp". Note that it's sensible to group the alternatives with round brackets if you're not sure of how far the | will go.
Example:
if (ereg('^https?://(www\.)?wellho.net(/|$)',$teststring)) { ... will match exactly what the previous example matched ... EXCEPT that it must either be followed by a further /, or end at that point.
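Here's a sketch of both points about alternation, in Python as an assumed stand-in for PHP (the function name is made up, the dot in wellho\.net is escaped, and the wellho.net.example.com address is invented purely as a test value).

```python
import re

def is_wellho_site(teststring):
    # After wellho.net there must be either a "/" or the end of the string.
    return re.search(r'^https?://(www\.)?wellho\.net(/|$)', teststring) is not None

print(is_wellho_site("http://www.wellho.net"))              # True
print(is_wellho_site("http://www.wellho.net/resources"))    # True
print(is_wellho_site("http://www.wellho.net.example.com"))  # False

# Alternation picks whole words; character groups pick single characters.
print(re.search(r'^(http|ftp)$', "ftp") is not None)   # True
print(re.search(r'^[http][ftp]$', "ftp") is not None)  # False: only 2 characters allowed
print(re.search(r'^[http][ftp]$', "hf") is not None)   # True
```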
I hope those examples help you in your first steps with regular expressions - you are limited only by your imagination in what you can do, and there are many many more elements that I haven't introduced you to within the basic types. We do run a complete course on regular expressions ;-) ...
SOME FURTHER NOTES:
No partial matches - in other words, if a match fails then you get a false back rather than a message to tell you that "it matched but only up to this point".
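Different flavours - regular expression handlers and functions come in a number of different flavours; PHP has two of them (ereg, which I've used here, and preg). At the level I've got to so far, most of the features are common ground. Note that the ereg functions were deprecated in PHP 5.3 and removed in PHP 7, so on a current PHP you'll need the preg equivalents (such as preg_match, which wants delimiters around the pattern).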
Language Syntax - different syntax / calling functions are used within regular expressions in different languages.
Case - the examples shown above are case sensitive. In PHP, eregi is a case insensitive alternative and other languages also provide a way of ignoring case.
Captures - having matched, you sometimes want to refer to the part of the teststring that matched specific parts of the regular expression. In order to capture part of the incoming string, you should use a set of grouping brackets to indicate the 'interesting bit'. How you can refer back to it later is function / language specific.
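The last two notes (case and captures) can be sketched like this. It's in Python, as an example of one language's way of doing it rather than the PHP way; the function names and the URL pattern are my own illustrations.

```python
import re

def contains_ham_any_case(teststring):
    # re.IGNORECASE plays the role that eregi plays in PHP.
    return re.search(r'ham', teststring, re.IGNORECASE) is not None

def split_url(teststring):
    # Three sets of grouping brackets give three captures: 1, 2 and 3.
    found = re.search(r'^(https?)://(www\.)?([^/]+)', teststring)
    if found is None:
        return None  # no partial results: it matched or it didn't
    return found.group(1), found.group(3)

print(contains_ham_any_case("Birmingham"))            # True
print(contains_ham_any_case("HAMPSHIRE"))             # True
print(split_url("https://www.wellho.net/resources"))  # ('https', 'wellho.net')
print(split_url("ftp://wellho.net"))                  # None
```

In Python the captures are read back by number with group(); other languages have their own way of getting at them.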