PowerShell Basics #4: Matching and capturing with regular expressions

Using regular expressions on Windows hasn’t been particularly easy, as the standard command-line tools have provided very little support for these powerful beasts. Unix variants, on the other hand, have long had loads of regex support on the command line, including classic tools such as grep, sed and awk.

PowerShell changes things here. It brings the full power of .NET regexes to the table, but makes them more easily accessible through some syntactic sugar.

Basic searches

Searching for text matching a given regex is very straightforward. Suppose you had the following lines of text in a file, probably representing addresses where your web service has been accessed from:

192.168.0.52
172.0.0.1
www.mysite.example
10.4.4.1
intranet.contoso.com
192.168.0.55
www.foobar.example

You might, for example, want to get only the lines that look like an IP address. For this, a very simple regular expression will do. It can be applied to the data with the -match operator.

type trafficsources.txt | where { $_ -match "^(\d{1,3}\.){3}\d{1,3}$" }
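If you'd like to try this without creating the file first, here is a minimal sketch that runs the same filter over an in-memory array. The variable names ($sources, $ipPattern, $ipLines) are made up for illustration, and the pattern sits in single quotes so that PowerShell leaves the $ character alone.

```powershell
# Sample data copied from the listing above, held in memory instead of trafficsources.txt
$sources = @(
    '192.168.0.52'
    '172.0.0.1'
    'www.mysite.example'
    '10.4.4.1'
    'intranet.contoso.com'
    '192.168.0.55'
    'www.foobar.example'
)

# Same regex as above: four groups of 1-3 digits separated by dots
$ipPattern = '^(\d{1,3}\.){3}\d{1,3}$'

# Keep only the lines that look like a dotted-quad address
$ipLines = @($sources | Where-Object { $_ -match $ipPattern })
$ipLines
```

Keep in mind that the pattern only checks the shape of the line, so something like 999.999.999.999 would pass the filter too.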

The -match operator provides case-insensitive matching. Therefore, -match "W+" would match the lines with “www” on them. If you need case-sensitivity, use -cmatch. If you want to make the case-insensitivity explicit, you can also use -imatch. If you want to look for lines that don't match the pattern, you can use the -notmatch, -inotmatch and -cnotmatch operators.
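The differences between these operators are easy to see side by side. The following is a small sketch using three of the sample lines; the variable names are hypothetical, and each result is wrapped in @() only so that an empty result is still an array.

```powershell
$lines = 'www.mysite.example', 'intranet.contoso.com', '10.4.4.1'

# Case-insensitive (the default): the lowercase "www" satisfies W+
$insensitive = @($lines | Where-Object { $_ -match 'W+' })

# Case-sensitive: none of the lines contains an uppercase W
$sensitive = @($lines | Where-Object { $_ -cmatch 'W+' })

# Negated: everything that does not contain a w in either case
$notMatching = @($lines | Where-Object { $_ -notmatch 'W+' })

"match:    $($insensitive.Count) line(s)"
"cmatch:   $($sensitive.Count) line(s)"
"notmatch: $($notMatching.Count) line(s)"
```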

Capturing things

One common use for regular expressions is to capture pieces of data from the matching string for further processing. For example, imagine you had the following set of contacts in a text file called emails.txt:

Jouni Heikniemi <jouni@domain.example>
John Doe <john@contoso.example>
jane@mysite.example
phyllida@fabrikam.example ("Phyllida von Huber")

Now, imagine you want to extract the email addresses. First off, you could pick up just the lines with addresses by doing something like type emails.txt | where { $_ -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+\b" }. This wouldn't be strictly necessary if every line in the file contained an email address, but it helps in the next step, so you might do the filtering anyway.

Now, what -match does behind the scenes when it is applied to a single string is populate an automatic variable called $matches, which contains the contents of all the capturing groups the regular expression has. Our regex doesn't define any groups yet, but element 0 always holds the whole text the regex matched. Therefore, we can do this:

PS D:\temp> type emails.txt |
  where { $_ -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+\b" } |
  foreach { $matches[0] }
jouni@domain.example
john@contoso.example
jane@mysite.example
phyllida@fabrikam.example
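One detail worth knowing is that a failed match leaves $matches untouched, which is why filtering before reading $matches[0] is the safer habit. Here is a minimal sketch of that behavior on single strings; the variable names ($line, $emailPattern, $address, $failed, $stale) are invented for this example.

```powershell
$line = 'John Doe <john@contoso.example>'
$emailPattern = '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+\b'

# A successful match on a single string fills $matches; element 0 is the whole match
if ($line -match $emailPattern) {
    $address = $matches[0]
}

# A failed match returns $false and does not clear $matches,
# so reading it here still yields the address from the previous match
$failed = 'no address here' -match $emailPattern
$stale = $matches[0]

"address: $address"
"failed:  $failed"
"stale:   $stale"
```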

In a more complex scenario, you’d definitely want to use named groups to clarify the matching. For example, suppose you wanted to pick the username and domain parts separately from the addresses. While you’re at it, you might as well construct some objects from the matches to get a cleaner view and help you manipulate the results.

PS D:\temp> $contacts = type emails.txt |
  where { $_ -match "\b(?<username>[A-Z0-9._%+-]+)@(?<domain>[A-Z0-9.-]+\.[A-Z]+)\b" } |
  foreach { new-object PSObject -prop @{
      UserName=$matches['username']; Domain=$matches['domain'] } }
PS D:\temp> $contacts | format-table

Domain           UserName
------           --------
domain.example   jouni
contoso.example  john
mysite.example   jane
fabrikam.example phyllida

PS D:\temp> $contacts[0].Domain
domain.example
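As a variation, the same extraction can be sketched with a foreach loop and the [pscustomobject] type accelerator. This assumes PowerShell 3.0 or later, where [pscustomobject] keeps the properties in the order they are written, so UserName is listed before Domain. The contact lines are held in an array here instead of emails.txt, and the variable names are only illustrative.

```powershell
# Contact lines copied from emails.txt above
$emails = @(
    'Jouni Heikniemi <jouni@domain.example>'
    'John Doe <john@contoso.example>'
    'jane@mysite.example'
    'phyllida@fabrikam.example ("Phyllida von Huber")'
)

# Same named groups as above: username before the @, domain after it
$pattern = '\b(?<username>[A-Z0-9._%+-]+)@(?<domain>[A-Z0-9.-]+\.[A-Z]+)\b'

$contacts = foreach ($line in $emails) {
    # Only read $matches after a successful match on this line
    if ($line -match $pattern) {
        [pscustomobject]@{
            UserName = $matches['username']
            Domain   = $matches['domain']
        }
    }
}

$contacts | Format-Table
```

Because the results are real objects, they can be sorted or filtered further, for example with $contacts | Sort-Object Domain.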

There you go! These examples should give you quite a good idea of how to use regular expressions to match and parse data. In the next installment of the series, I will look into using regular expressions to do textual replacements.

February 13, 2010. Posted in: .NET