CSV parser for C#
Need to parse CSV (Comma Separated Values) files in C#? There are many solutions starting from the OLE DB adapter, but here's an easy-to-use CSV Parser written in pure C#: CSVReader.cs. Now, here's a quick tutorial.
First, let's review the rules of CSV. Each line in a text file represents a record. The fields on each line are separated by commas. If a field starts with a double quote ("), the field ends when the next quote is encountered. If you need to embed a quote inside a quoted field, double it (""). Take for example the following trivial CSV file:
my fields,go,here
John said: "Don't move","""I won't"", he replied"
The first line parses into three separate fields ("my fields", "go", "here"). The second one is trickier, but it produces two values. Note that the quotes in the first field (John said: "Don't move") do not mark field boundaries. The behavior would be different if a double quote started the field, as it does for the second field ("I won't", he replied). This is why the quotes don't need doubling in the first field.
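To make the rules concrete, here is a minimal sketch of a splitter for a single line. It is not the code in CSVReader.cs: the class name CsvRulesSketch and the method SplitLine are made up for illustration, and the choice to drop any text between a closing quote and the next comma is an assumption of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration only; not part of CSVReader.cs.
public static class CsvRulesSketch
{
    // Splits one line into fields following the rules described above.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        int pos = 0;
        while (true)
        {
            var field = new StringBuilder();
            if (pos < line.Length && line[pos] == '"')
            {
                // Quoted field: read until the closing quote.
                pos++; // skip the opening quote
                while (pos < line.Length)
                {
                    if (line[pos] == '"')
                    {
                        if (pos + 1 < line.Length && line[pos + 1] == '"')
                        {
                            field.Append('"'); // "" stands for one embedded quote
                            pos += 2;
                            continue;
                        }
                        pos++; // closing quote
                        break;
                    }
                    field.Append(line[pos++]);
                }
                // Assumption: text between the closing quote and the next comma is dropped.
                while (pos < line.Length && line[pos] != ',') pos++;
            }
            else
            {
                // Unquoted field: quotes are ordinary characters here.
                while (pos < line.Length && line[pos] != ',')
                    field.Append(line[pos++]);
            }
            fields.Add(field.ToString());
            if (pos >= line.Length) break;
            pos++; // skip the comma
        }
        return fields;
    }

    public static void Main()
    {
        string line = "John said: \"Don't move\",\"\"\"I won't\"\", he replied\"";
        foreach (string f in SplitLine(line))
            Console.WriteLine("CSV field: " + f);
    }
}
```

Run on the second line of the sample file, the sketch yields two fields: John said: "Don't move" and "I won't", he replied.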
Now, the CSVReader class can be used to read the file like this:
using (CSVReader csv = new CSVReader(@"c:\myfile.csv"))
{
    string[] fields;
    while ((fields = csv.GetCSVLine()) != null)
    {
        Console.WriteLine("New CSV line begins");
        foreach (string field in fields)
            Console.WriteLine("CSV field: " + field);
    }
}
And as you can guess, the code produces output like this:
New CSV line begins
CSV field: my fields
CSV field: go
CSV field: here
New CSV line begins
CSV field: John said: "Don't move"
CSV field: "I won't", he replied
As usual, feedback and/or bug reports are welcome.
October 23, 2004 · Jouni Heikniemi · 38 Comments
Posted in: .NET
38 Responses
Frank Zehelein - November 16, 2004
Great software! Thank you for sharing it!
A very small improvement could be to make the separator character adjustable:
public char Separator
{
    get
    {
        return separator;
    }
    set
    {
        separator = value;
    }
}
#region Private variables
private char separator = ',';
…
int nextComma = data.IndexOf(separator, fromPos);
Jouni - November 16, 2004
Yeah Frank, I was thinking about the same thing as well. I'll probably devise something like that for the next JHLib release of the CSVParser.
Akshay - November 19, 2004
Good stuff, my compliments on some well-written code.
I realise this might sound trivial, but just to cover my bases:- under what license can I include and re-distribute your code?
Jouni - November 19, 2004
The newest version of the code was posted in the JHLib collection. I suggest you use JHLib's version, as it has been corrected according to FxCop rules and suggestions (i.e. it fits the generic Microsoft library conventions better). The changes are fairly trivial, though – if you've already taken this code, you're not missing out on much (apart from possible future updates).
JHLib's license statement applies here, too: "JHLib is free. It is not released under any formal license such as GPL; it's just plainly and simply free. You can do whatever you wish with the code; I don't offer support or carry responsibility for anything related to the source or the binaries. That said, I'm naturally interested in feedback and suggestions, as well as your own code changes. Also, it would be nice to hear if you start using the library somewhere – it always gives ideas for further development."
(from http://www.heikniemi.net/jhlib/)
Feel free to mail me if you need any further help.
robin - November 24, 2004
hi jouni!
nice class! i am using it in a customer project. one requirement was that there can be spaces between commas and data fields. since your class got confused by that, i improved it a little to handle that requirement. if you are interested, send me an email and i will send you the modified version.
best regards,
robin
Dane Paul - December 16, 2004
Yeah, this is a really good class. I made a simple modification for it to accept a string from the code and parse that string only. Everything works perfect now. Thanks a bunch.
Baz - January 4, 2005
One problem I see with your class is that if I want to read the Nth line, I need to parse N-1 lines even if I don't care about them.
dan hoang - January 19, 2005
Easy to use, but I found a problem parsing the following line. It can be solved by adding the one line of code marked below.
"Distance Offset = -21.466 m,,,,,,,,,,,,,,,,,,,,"
// If we're at the end of the string, let's consider this a field that
// only contains the quote
if (fromPos == data.Length-1) {
    startSeparatorPosition = fromPos; // Add this line
    fromPos++;
    return "\"";
}
Flack - February 4, 2005
Hello,
I am trying to use your code here to parse a certain line. The line itself comes from a user selecting some rows in an open Excel file and dropping them onto my form.
Anyway, the line looks like this:
"""TE,ST""",q,1,",",",/, ,","
This line corresponds to the row in Excel, which has these values (each line represents a value from A1 to H1):
"TE,ST"
q
1
"
,
/
,
Now, when this line is parsed, I get back 7 values, as follows:
"TE,ST"
q
1
,
,/, ,
"
Is there any way that your code could be changed to handle this case correctly, or is it too complicated?
Thanks for the help.
Jouni - February 5, 2005
I've heard the same being said by somebody else. A guy whose name I don't remember sent me email a couple of months ago and told me he changed the parser to accept Excel-originated CSV without a hitch. He promised to send me the updated source but never did.
Thanks for the test case though; I'm pretty sure the code can be changed to do what you wish. I have plans to collect up all the recent cases and make fixes so that they all work. No promises on the timeframe though, I'm pretty busy these days. If you hack the code yourself, please do mail me the source if you can. :-)
Rob Mello - April 14, 2005
Nice work Jouni.
Abi - April 18, 2005
Nice piece of work.
If I want to parse only from the third line of my CSV file, how would I do it using this code? Any suggestions?
Thanks,
-Abi
PaLoMo2 - April 26, 2005
BUG: I have exported a multiline textbox field and the result is this:
Name,FamilyName,Tel,Note
John,Holmes,5552522,"test note
note note"
I think the problem is the "\n" character, and that the file reader can only read one line at a time.
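One possible way to handle the multi-line case reported above is to join physical lines into one logical record before splitting it into fields. The sketch below is an assumption about how that could be done, not a change to CSVReader.cs: MultiLineRecordSketch and ReadRecord are hypothetical names, and the odd-quote-count test is a heuristic that breaks if an unquoted field contains a single stray quote.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of CSVReader.cs.
public static class MultiLineRecordSketch
{
    private static int CountQuotes(string s)
    {
        int n = 0;
        foreach (char c in s)
            if (c == '"') n++;
        return n;
    }

    // Reads one logical record, joining physical lines while a quoted
    // field is still open (odd number of double quotes seen so far).
    // Returns null at end of input.
    public static string ReadRecord(TextReader reader)
    {
        string line = reader.ReadLine();
        if (line == null) return null;

        var record = new StringBuilder(line);
        int quotes = CountQuotes(line);
        while (quotes % 2 == 1)
        {
            string next = reader.ReadLine();
            if (next == null) break; // unterminated quote: return what we have
            record.Append('\n').Append(next);
            quotes += CountQuotes(next);
        }
        return record.ToString();
    }

    public static void Main()
    {
        string csv = "Name,FamilyName,Tel,Note\n" +
                     "John,Holmes,5552522,\"test note\nnote note\"\n";
        using (var reader = new StringReader(csv))
        {
            string record;
            while ((record = ReadRecord(reader)) != null)
                Console.WriteLine("Record: " + record.Replace("\n", "\\n"));
        }
    }
}
```

The joined record still contains the embedded line break, so whatever splits it into fields has to accept newlines inside quoted fields.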
Dick Walker - May 15, 2005
Hi,
thanks for the code. It doesn't seem to cope with double quotes within double quotes. See the example below: the 5th field is split into two.
"20000083","HEATHER SMITH MAIL RETURNED","","18 DURBAN WAY 10 APR'97","MINTO "LEFT ADDRESS"","MADE CLASS 9","NSW","2566","","","","",0.00
Chuck King - May 19, 2005
Cool code…thanks!
If you want to make sure you won't be there a good while, change the sample code to use a stringbuilder, something like:
private void button1_Click(object sender, System.EventArgs e)
{
    using (CSVReader csv = new CSVReader(@"c:\Test1.csv"))
    {
        string[] fields;
        int linenumber = 0;
        System.Text.StringBuilder sb = new System.Text.StringBuilder();
        while ((fields = csv.GetCSVLine()) != null)
        {
            linenumber++;
            sb.Append("CSV Line Number " + linenumber.ToString() + " begins ********************\n");
            foreach (string field in fields)
                sb.Append("--------- CSV field: " + field + "\n");
        }
        txt1.Text += sb.ToString();
    }
}
E - July 29, 2005
Doesn't support records spread out across multiple lines. :|
Nemanja - September 24, 2005
Just stumbled upon this code, 'cause I'm too lazy to write a CSV reader from scratch.
Line:
if (i < data.Length - 1 && data[i + 1] == '"')
should be changed to:
if (i < data.Length - 1 && (data[i + 1] == '"' || data[i - 1] == '\\'))
This way the reader can recognize quotes embedded in a string more accurately. Hope this helps…
Lee Newman - November 11, 2005
Thanks!!
Craig - January 16, 2006
Hi,
I've got this change to catch the End of File character.
public string[] GetCSVLine()
{
    string data = reader.ReadLine();
    if (data == null) return null;
    if (data.Length == 1) return null; // EOF char
    if (data.Length == 0) return new string[0];
    ArrayList result = new ArrayList();
    ParseCSVFields(result, data);
    return (string[])result.ToArray(typeof(string));
}
Craig
Chris Walker - March 14, 2006
Just downloaded this and wanted to say THANKS! I was going to build this class myself and you just saved me some time!
Woot! You Rock!
Aleksey Sokolovskiy - May 9, 2006
Thank you very much! It's really a time saver. The code works perfectly.
Steven - May 16, 2006
Thanks a mil for the code. Very helpful in teaching me how to code better too.
Mark - August 3, 2006
Anyone know how to hack this to use a file uploaded by a user via FileUpload… yet without saving the file to the server?
Grant Merwitz - August 15, 2006
Great class, saved me a lot of time.
Thanks, u rock!
Grant Merwitz - August 15, 2006
Mark,
I would suggest using the constructor that reads a stream "public CSVReader(Stream s)",
try using the FileUpload's stream attribute.
HTH
AA - October 20, 2006
I get an error when attempting to open a csv file that is already open in notepad. Just wanted to check if the code is able to handle that in some way.
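If the error comes from the other program holding the file open, one thing to try is opening the file with a permissive share mode and handing the resulting stream to the Stream-based constructor mentioned above. This is only a sketch under that assumption: SharedReadSketch and OpenShared are hypothetical names, and whether it helps depends on how the other program opened the file.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of CSVReader.cs.
public static class SharedReadSketch
{
    // Opens the file for reading while allowing other processes to keep
    // it open for reading and writing.
    public static Stream OpenShared(string path)
    {
        return new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "my fields,go,here\n");

        // Simulate another program that has the file open for writing.
        using (var other = new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite))
        using (var reader = new StreamReader(OpenShared(path)))
        {
            Console.WriteLine(reader.ReadLine());
        }
        File.Delete(path);
    }
}
```

The stream returned by OpenShared could then be passed to a reader that accepts a Stream instead of a file name.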
Anthony Main - October 23, 2006
I have just found a bug in your reader (I have yet to investigate a fix).
If a field in the CSV contains data split over multiple lines, it returns an array with only the elements up to that field.
Craig - November 20, 2006
Wow, thanks a lot, really really helpful!
mandar - March 2, 2007
Simple CSV parser/reader function
http://www.codeproject.com/useritems/Basic_CSV_Parser_Function.asp
Michael - March 25, 2007
Dude, your code is awesome – thanks a stack. Needed to get a test app out very quickly, and it's really saved a lot of time.
one thing I noticed, though – any fields after the first one which are encapsulated in double quotes are returned with those double quotes – eg:
"FRP002", "Frozen Peas", "340g", 23.16
will be returned as:
FRP002
"Frozen Peas"
"340g"
23.16
I fixed that on my side with a routine that checks for a pair of double quotes (to get around the kind of problem), and if it doesn't find a pair it takes off the first and last quotes, i.e. (if strHeader is the string being returned for that value):
if (strHeader.Length>4)
{
    if (strHeader.StartsWith("\"\"") == false)
    {
        if (strHeader.StartsWith("\"") == true)
        {
            strHeader = strHeader.Substring(1);
        }
    }
    if (strHeader.EndsWith("\"\"") == false)
    {
        if (strHeader.EndsWith("\"") == true)
        {
            strHeader = strHeader.Substring(0, strHeader.Length-1);
        }
    }
}
… and then strHeader will have been "cleaned" of the extra pair of quotation marks…
Michael - October 15, 2007
For an alternate approach, I ended up using a regex to handle the splitting of a single line read from the CSV file:
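// Requires: using System; using System.Text.RegularExpressions;
public static string[] SplitCsv(string values)
{
    // Split on commas that are followed by an even number of quotes,
    // i.e. commas that are not inside a quoted field.
    Regex regex = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
    string[] result = regex.Split(values);
    return Array.ConvertAll(result, delegate(string s)
    {
        //remove start and end quote if it exists
        if (s.StartsWith("\""))
            s = s.Substring(1, s.Length - 2);
        //unescape quotes
        return s.Replace("\"\"", "\"");
    });
}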
Erick Rivas - December 6, 2007
I hope you don't mind. I uploaded the CSVReader project up to ohloh.net, with a couple of minor bug fixes and enhancements.
Paul Sanders - December 10, 2007
Very useful – saved me a lot of time. Thanks very much for sharing.
Paul Sanders
http://www.alpinesoft.co.uk
mo - January 9, 2008
thank you it works nice
David Kemp - January 10, 2008
I'm trying this with:
A,"test","test" something
Excel (which is the user's default 'benchmark' test) imports this as
A
test
"test" something
Your library seems to parse this as:
A
test
test
something
I can't seem to see which is better, or worse, but I'd like it if it were configurable.
Job - February 13, 2008
Very nice software, but I'm getting an "out of memory" error.
while (pos < data.Length)
{
    result.Add(ParseCSVField(data, ref pos));
}
The result array is not large enough for my data.
The specs say the capacity will be adjusted automatically.
What can I do?