Using group expressions in regular expression pattern matching

I’ve used group expressions in regexes many times, but only for replacement. Yesterday I learned that they can also be used for matching.

For example, let’s say you have the text:

Banananananas don’t grow in Mississississippi because banananas are afraid of getting turned into Missississippi’s famous bananana pudding.

The following regular expression will find instances of iss or an that are repeated more than twice.

(iss|an)\1\1+

You can use \1\1 as the replacement (or $1$1 in Dreamweaver, which uses backslashes to identify groups in match expressions, but dollar signs to represent groups in replace expressions) to turn the misspelled words into Mississippi and banana(s).

Another example might be applying consistent formatting to phone numbers or dates.

Phone numbers
Let’s say you usually use 555-555-1212 as the format for phone numbers and sometimes you use 555.555.1212, but the new trend is to use spaces instead of dashes or dots as separators:

Find: ([\d]{3})([-\.])([\d]{3})\2([\d]{4})
Replace: \1 \3 \4

Dates
Let’s say you usually use 12/5/2013 as the format for dates, dabbled with 12.5.2013, but now you’ve decided that dashes are clearer:

Find: ([\d]{1,2})([\./])([\d]{1,2})\2([\d]{4})
Replace: \1-\3-\4

In both cases you could just repeat the bracketed character class, but then you could end up matching strings you didn’t intend to:

  • 555-555.1212
  • 12.5/2013

Leave a Reply

Your email address will not be published. Required fields are marked *