PowerShell Basics #3: Manipulating data in text files

To continue my series of PowerShell Basics posts, I’ll cover some basic features around manipulating text file data.

First off, the problem is really two separate problems: Handling text file IO and manipulating the contents. PowerShell provides decent tools for both, and since everything is just .NET objects, you can run any .NET string manipulations you desire.

Seeing what’s in there: Get-Content

First, you usually want to see what’s in a file. The type command familiar from DOS still works; it’s now an alias for a cmdlet called Get-Content. Using it is exactly as straightforward as you’d expect: type my.txt shows the contents of my.txt.

But again, “shows” in a very loose sense. Get-Content returns objects, namely the strings that make up the lines of the file. It is the PowerShell UI that actually shows these, and you can manipulate them how you want. For example, to know the number of lines in a file, just take a count:

PS D:\temp> (type my.txt).Count
3

Naturally, this also gives you line-by-line access to the file.

PS D:\temp> $lines = type my.txt
PS D:\temp> $lines[0]
Foo
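
Because the lines are ordinary strings in an array, you can loop over them like any other collection. Here is a minimal sketch that numbers each line; the file name gc-demo.txt and its contents are made up for the example, and the file is created in the temp directory so nothing real is touched.

```powershell
# Create a throwaway sample file (hypothetical name) in the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'gc-demo.txt'
Set-Content -Path $path -Value 'Foo', 'Bar', 'Baz'

# @(...) guarantees an array even when the file has only one line.
$lines = @(Get-Content -Path $path)

# Build "number: text" strings from the line objects.
$numbered = for ($i = 0; $i -lt $lines.Count; $i++) {
    '{0}: {1}' -f ($i + 1), $lines[$i]
}
$numbered
```

The @(...) wrapper is a defensive habit: without it, a one-line file gives you a single string rather than an array.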

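If you just need the first few lines, you can use the -TotalCount parameter as a head (for you Unix-minded) or top (for you SQL people) operator. In PowerShell 2.0 it could be abbreviated as -t, but from PowerShell 3.0 on Get-Content also has a -Tail parameter, which makes -t ambiguous, so spell the name out.

PS D:\temp> type my.txt -TotalCount 2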
Foo
Bar

Note that -TotalCount is really just an optimization: you could always get the first two lines by manipulating the resulting array with an expression like (type my.txt)[0..1], but that can be really slow with a half-gig log file. With -TotalCount, only the necessary lines are read; with array manipulation, the whole file is loaded first.
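
To make the difference concrete, here is a small sketch comparing the two approaches on a generated file. The file name head-demo.txt and the 1000 sample lines are assumptions for the example; on a file this small you will not notice any speed difference, the point is only that both give the same result.

```powershell
# Generate a sample file (hypothetical name) with 1000 lines.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'head-demo.txt'
Set-Content -Path $path -Value (1..1000 | ForEach-Object { "line $_" })

# Asks Get-Content for the first two lines only.
$head = Get-Content -Path $path -TotalCount 2

# Loads all 1000 lines into an array first, then slices out the first two.
$slice = (Get-Content -Path $path)[0..1]

$head
```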

Writing stuff: Set-Content, Add-Content

Typically, your scripts may want to write data into a text file. There are a few approaches to this. First of all, if you just want to output a few lines of text to a file, use the Set-Content cmdlet, which really has no equivalent in the cmd.exe world. It takes a file name and a group of objects, and writes those objects into the file, replacing any existing content. In the example below, we just pass it an implicitly created string array.

PS D:\temp> Set-Content animals.txt "cat", "dog", "giraffe"
PS D:\temp> type animals.txt
cat
dog
giraffe

If you want to add new lines to the end of an existing file, use Add-Content, which works the same way but appends the new lines instead of overwriting:

PS D:\temp> Add-Content animals.txt "elephant", "cow"
PS D:\temp> type animals.txt
cat
dog
giraffe
elephant
cow
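
Both cmdlets also accept their input from the pipeline. The following sketch shows the overwrite-versus-append difference side by side; the file name animals-demo.txt is made up so that your real animals.txt stays untouched.

```powershell
# Work on a throwaway file (hypothetical name) in the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'animals-demo.txt'

# Set-Content creates the file, or replaces whatever was in it.
'cat', 'dog' | Set-Content -Path $path

# Add-Content keeps the existing lines and appends to them.
'giraffe' | Add-Content -Path $path
$afterAppend = @(Get-Content -Path $path)      # cat, dog, giraffe

# A second Set-Content throws the old content away.
'zebra' | Set-Content -Path $path
$afterOverwrite = @(Get-Content -Path $path)   # zebra only
```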

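You can also use piping and array operations to edit the files. For example, to truncate a text file to just its last four lines (an operation that might make sense for a log file), you could do the following. The parentheses make PowerShell read the whole file before Set-Content starts writing to it.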

PS D:\temp> (type animals.txt)[-4..-1] | set-content animals.txt
PS D:\temp> type animals.txt
dog
giraffe
elephant
cow

As you see, the cat got cut out. The “-4..-1” thing is again a feature of PowerShell array indexing: negative indices refer to the elements of an array from the end of the array, i.e. [-4..-1] means “a range of elements from the fourth-last to the last one”.
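
If negative indices feel unfamiliar, it may help to try them on a plain array first. This sketch uses an in-memory list instead of a file; Select-Object -Last is shown as an alternative way to get the same tail, not as something the original example relies on.

```powershell
$animals = 'cat', 'dog', 'giraffe', 'elephant', 'cow'

# -1 is the last element, -2 the one before it, and so on.
$last = $animals[-1]                              # cow

# A range of negative indices picks the last four elements.
$lastFour = $animals[-4..-1]                      # dog, giraffe, elephant, cow

# Select-Object -Last produces the same tail through the pipeline.
$viaSelect = $animals | Select-Object -Last 4

$lastFour
```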

Sorting the data

When you need the data sorted, you want the Sort-Object cmdlet. It can do a whole lot more for you as well, but for sorting text, very simple things suffice. To get the items into dictionary order, just pipe the whole thing to sort (an alias for Sort-Object).

PS D:\temp> type animals.txt | sort
cow
dog
elephant
giraffe

If you need descending order, just apply -Descending (or -desc for short). Also, PowerShell supports culture-specific sorting and defaults to the culture of the current thread. For example, if you need sorting by German rules, add -Culture de-DE. To force sorting by the US English rules, -Culture en-US does the trick.
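
As a quick illustration, here are both switches applied to in-memory lists. The German sample words are invented for the example, and the exact order of the culture-specific sort depends on the collation rules installed on your system, so no expected output is given for it.

```powershell
$animals = 'dog', 'giraffe', 'elephant', 'cow'

# Reverse dictionary order.
$descending = $animals | Sort-Object -Descending   # giraffe, elephant, dog, cow

# Sort by German collation rules, regardless of the current thread culture.
$german = 'Zebra', 'Äpfel', 'Apfel' | Sort-Object -Culture 'de-DE'

$descending
```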

Since Sort-Object sorts objects (here, strings), you can also sort by any property of the System.String class. The most likely application for this is sorting by string length:

PS D:\temp> type animals.txt | sort Length
cow
dog
giraffe
elephant
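
Sort-Object also accepts several sort keys at once, which is handy when many lines share the same length. The sketch below sorts by length first and falls back to the string itself for ties; using the script block { $_ } as the second key is just one way to express "the string itself".

```powershell
$animals = 'dog', 'giraffe', 'elephant', 'cow'

# First key: the Length property. Second key: the string value itself,
# which decides the order of equally long strings such as cow and dog.
$byLength = $animals | Sort-Object -Property Length, { $_ }

$byLength   # cow, dog, giraffe, elephant
```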

The variable assignment syntax

There is one more trick to learn here. While the previous syntax is reasonably clean, there are a few scenarios where an even terser syntax makes sense. PowerShell allows you to access the contents of text files as if they were variables. The catch is that it requires you to use the full path.

Anyway, you want to get the count of lines in your hosts file? Just type

${c:\windows\system32\drivers\etc\hosts}.Count

and you’re set.

What if you need to load the animals, append a few plants and throw the stuff into living.txt?

PS D:\temp> $living = ${d:\temp\animals.txt}
PS D:\temp> $living += "rose"
PS D:\temp> $living += "pineapple"
PS D:\temp> ${d:\temp\living.txt} = $living
PS D:\temp> type living.txt
dog
giraffe
elephant
cow
rose
pineapple
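
The path inside ${...} has to be written out literally, so this syntax does not help when the file name is only known at run time. For that case, here is a sketch of the same load, append and save sequence using the cmdlets from earlier in this post; the file names animals-src.txt and living-demo.txt are invented for the example.

```powershell
# Paths are built at run time, which the ${...} syntax cannot do.
$dir    = [System.IO.Path]::GetTempPath()
$source = Join-Path $dir 'animals-src.txt'
$target = Join-Path $dir 'living-demo.txt'

# Prepare the sample input file.
Set-Content -Path $source -Value 'dog', 'giraffe', 'elephant', 'cow'

# Load, append two plants, and write the result to the target file.
$living = @(Get-Content -Path $source)
$living += 'rose'
$living += 'pineapple'
Set-Content -Path $target -Value $living

Get-Content -Path $target
```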

There’s a lot more you can do. I’ll get to regular expressions, grouping and whatnot quite soon, so stay tuned!

February 5, 2010 · Jouni Heikniemi · 2 Comments
Posted in: .NET