PowerShell Basics #4: Matching and capturing with regular expressions
Using regular expressions on Windows hasn’t been particularly easy, as the standard command-line tools have provided very little support for these powerful beasts. On the other hand, various flavors of Unix have had loads of support for regexes on the command line, including classic tools such as grep, sed and awk.
PowerShell changes things here. It brings the full power of .NET regexes to the table, but makes them more easily accessible through some syntactic sugar.
Basic searches
Searching for text that matches a given regex is very straightforward. Suppose you had the following lines of text in a file called trafficsources.txt, perhaps representing the addresses your web service has been accessed from:
192.168.0.52
172.0.0.1
www.mysite.example
10.4.4.1
intranet.contoso.com
192.168.0.55
www.foobar.example
You might, for example, want to get only the lines that look like an IP address. For this, a very simple regular expression will do. It can be applied to the data with the -match operator.
type trafficsources.txt | where { $_ -match "^(\d{1,3}\.){3}\d{1,3}$" }
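To try the filter without creating trafficsources.txt, you can feed the sample lines from an array instead. This is a minimal sketch that assumes PowerShell 2.0 or later. The variable names `$sources` and `$ips` are only for illustration. It also shows that the pattern checks only the shape of an address, not whether each octet is in range.

```powershell
# The sample lines from trafficsources.txt, inlined as an array
$sources = @(
    "192.168.0.52"
    "172.0.0.1"
    "www.mysite.example"
    "10.4.4.1"
    "intranet.contoso.com"
    "192.168.0.55"
    "www.foobar.example"
)

# Keep only lines made of four dot-separated groups of 1-3 digits
$ips = $sources | where { $_ -match "^(\d{1,3}\.){3}\d{1,3}$" }
$ips

# Shape, not value: an address with out-of-range octets still passes
"999.999.999.999" -match "^(\d{1,3}\.){3}\d{1,3}$"
```

If you need real validation, you would have to add checks on top of this. For a quick filter over log-like data, the simple pattern is usually enough.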
The -match operator is case-insensitive. Therefore, -match "W+" would match the lines with “www” on them. If you need case sensitivity, use -cmatch. If you want to be explicit about case insensitivity, you can also use -imatch. To find the lines that don't match the pattern, use the -notmatch, -inotmatch and -cnotmatch operators.
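Here is a quick side-by-side of these operators on the same sample data. This is a sketch that inlines the lines from trafficsources.txt. The variable names are illustrative.

```powershell
$sources = @("192.168.0.52", "172.0.0.1", "www.mysite.example", "10.4.4.1",
             "intranet.contoso.com", "192.168.0.55", "www.foobar.example")

# -match ignores case, so "W+" finds the lowercase "www"
$insensitive = $sources | where { $_ -match "W+" }

# -imatch is the explicitly case-insensitive spelling of the same thing
$explicit = $sources | where { $_ -imatch "W+" }

# -cmatch is case-sensitive: no line contains an uppercase W
# (@() ensures .Count works even when nothing matches)
$sensitive = @($sources | where { $_ -cmatch "W+" })

# -notmatch keeps the lines that do NOT match the pattern
$others = $sources | where { $_ -notmatch "W+" }

"insensitive: $($insensitive -join ', ')"
"sensitive:   $($sensitive.Count) line(s)"
"others:      $($others -join ', ')"
```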
Capturing things
One common use for regular expressions is to capture pieces of data from the matching string for further processing. For example, imagine you had the following set of contacts in a text file called emails.txt:
Jouni Heikniemi <jouni@domain.example>
John Doe <john@contoso.example>
jane@mysite.example
phyllida@fabrikam.example ("Phyllida von Huber")
Now, imagine you want to extract the email addresses. First off, you could pick up just the lines with addresses by doing something like this:
type emails.txt | where { $_ -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+\b" }
Because -match is case-insensitive, the [A-Z] character classes match lowercase letters too. The filtering wouldn't be strictly necessary if every line in the file contained an email address, but it helps in the next step, so you might do it anyway.
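If you think of this filter as PowerShell's take on grep, the Select-String cmdlet is the closest built-in equivalent. Like -match, it is case-insensitive by default, and -CaseSensitive changes that. This sketch pipes inlined sample lines into it instead of reading emails.txt. With the real file, you would use `Select-String -Pattern $pattern -Path emails.txt`. The last sample line, which has no address, is a hypothetical addition for illustration.

```powershell
$lines = @(
    'Jouni Heikniemi <jouni@domain.example>'
    'John Doe <john@contoso.example>'
    'jane@mysite.example'
    'phyllida@fabrikam.example ("Phyllida von Huber")'
    'no address on this line'    # hypothetical non-matching line
)
$pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+\b"

# Select-String emits one MatchInfo object per matching line;
# its Line property holds the original text of that line
$hits = $lines | Select-String -Pattern $pattern
$hits | foreach { $_.Line }
```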
Behind the scenes, -match populates an automatic variable called $matches. This is a hashtable holding the text captured by each capturing group in the regular expression. Our regex doesn't have any capturing groups yet, but entry 0 always holds the entire text the regex matched. Therefore, we can do this:
PS D:\temp> type emails.txt |
where { $_ -match "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+\b" } |
foreach { $matches[0] }
jouni@domain.example
john@contoso.example
jane@mysite.example
phyllida@fabrikam.example
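The where clause earns its keep here because a failed -match on a single string does not clear $matches. The variable keeps whatever the last successful match left in it. The sketch below shows what goes wrong if you skip the filter. The line without an address is a hypothetical addition.

```powershell
$lines = @(
    'John Doe <john@contoso.example>'
    'no address on this line'    # hypothetical non-matching line
)
$pattern = "\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]+\b"

# Without filtering: the second line fails to match, but $matches still
# holds the first line's result, so that address is reported twice
$unfiltered = $lines | foreach { $null = $_ -match $pattern; $matches[0] }

# With filtering: only lines that actually matched reach foreach
$filtered = $lines | where { $_ -match $pattern } | foreach { $matches[0] }

"unfiltered: $($unfiltered -join ', ')"
"filtered:   $($filtered -join ', ')"
```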
In a more complex scenario, you’d definitely want to use named groups to clarify the matching. For example, suppose you wanted to pick out the username and domain parts separately from the addresses. While you’re at it, you might as well construct some objects from the matches to get a cleaner view and help you manipulate the results.
PS D:\temp> $contacts = type emails.txt |
where { $_ -match "\b(?<username>[A-Z0-9._%+-]+)@(?<domain>[A-Z0-9.-]+\.[A-Z]+)\b" } |
foreach { new-object PSObject -prop @{ UserName=$matches['username']; Domain=$matches['domain'] } }
PS D:\temp> $contacts | format-table

Domain           UserName
------           --------
domain.example   jouni
contoso.example  john
mysite.example   jane
fabrikam.example phyllida

PS D:\temp> $contacts[0].Domain
domain.example
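There are two limitations in the approach above. First, -match reports only the first match in each line. Second, a plain hashtable passed to New-Object does not guarantee property order, which is likely why Domain comes out before UserName. If you are on PowerShell 3.0 or later, a sketch like the following addresses both. [regex]::Matches returns every match in a string, and the [pscustomobject] accelerator keeps properties in the order written. The second sample line, with two addresses, is hypothetical.

```powershell
$pattern = "\b(?<username>[A-Z0-9._%+-]+)@(?<domain>[A-Z0-9.-]+\.[A-Z]+)\b"
$lines = @(
    'Jouni Heikniemi <jouni@domain.example>'
    'John Doe <john@contoso.example>, jane@mysite.example'   # hypothetical: two addresses
)

$contacts = foreach ($line in $lines) {
    # Unlike -match, [regex]::Matches is case-sensitive by default,
    # so pass IgnoreCase to keep [A-Z] matching lowercase letters
    foreach ($m in [regex]::Matches($line, $pattern, 'IgnoreCase')) {
        # [pscustomobject] (PowerShell 3.0+) preserves property order
        [pscustomobject]@{
            UserName = $m.Groups['username'].Value
            Domain   = $m.Groups['domain'].Value
        }
    }
}

$contacts | format-table
```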
There you go! These examples should give you a good idea of how to use regular expressions to match and parse data. In the next installment of the series, I will look into using regular expressions to do textual replacements.
February 13, 2010
Jouni Heikniemi
Tags: PowerShell · Posted in: .NET