.net String vs. StringBuilder – concatenation performance
Most people have a gut feeling about when to use StringBuilder for concatenation and when to just add strings together with the + operator. But in exactly which situations is each approach better? When the question comes up, people often offer overly simple rules such as "5 concatenations". Does that really hold for the vast majority of cases? Being the skeptic I am, I decided to test it and settle the question once and for all.
The basic setting is this: StringBuilder.Append is faster than String + String. However, new StringBuilder() takes time. So the question is: how many Append calls does it take for the speed benefit to exceed the construction cost of the StringBuilder? Ideally, the answer would be a single magic number. Unfortunately, in practice it isn't.
Here are the simplified conclusions. They shouldn't be taken literally, because situations vary and there's a code readability issue as well (most people read String + String more easily than sb.Appends). Regardless, for most cases these rules do provide the correct answer from a performance perspective.
- If you have no idea of the resulting string size, use StringBuilder if you have at least 7 concatenations.
- If you can roughly estimate the resulting string size (to within about 30%), use StringBuilder if you have at least 5 concatenations.
- If you can estimate the resulting string size accurately, use StringBuilder if you have at least 3 concatenations.
- Under no conditions is StringBuilder faster for fewer than 3 concatenations.
- StringBuilder beats strings for 10+ concatenations in every practical situation.
- The longer the strings are, the more final string size estimations will help you (but accuracy becomes more critical).
I don't expect you to believe me any more than any other information source on the net. But to back up my claims a bit, I'll discuss the background of these results next.
How do string concatenations and StringBuilder work?
String objects in .net are immutable. Once a string has been created, its value can't be changed. When you type s = s + "foo";, you actually discard the old s and create a new string object containing the result of the concatenation. When this is repeated several times, you end up constructing many temporary string objects.
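As a small illustration of the immutability point, the sketch below keeps a reference to the original string, performs s = s + suffix, and returns both so they can be compared. It assumes a current C# compiler (it returns a tuple), and ImmutabilityDemo and AppendWithPlus are hypothetical names chosen for this example only.

```csharp
static class ImmutabilityDemo
{
    // Hypothetical helper: returns the string before and after "s = s + suffix"
    // so a caller can verify that the original object was left untouched.
    public static (string Before, string After) AppendWithPlus(string start, string suffix)
    {
        string s = start;
        string before = s;   // second reference to the original string object
        s = s + suffix;      // builds a brand-new string; 'before' still holds the old value
        return (before, s);
    }
}
```

With non-empty inputs such as "bar" and "foo", Before is still "bar", After is "barfoo", and the two are different objects.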
StringBuilder, on the other hand, represents a mutable string. The class contains quite a few methods for changing the contents of the string, including appending new strings to the end, which is by far the most common operation. Internally, StringBuilder reserves a buffer of memory that is usually only partially used at first. Appended strings that fit into the buffer are simply copied in and the string length is updated. If the new resulting string wouldn't fit into the buffer, a new, larger buffer is allocated and the old contents are moved into it. Either way, the StringBuilder object itself stays the same: no new builder or intermediate result string is created per append, and the buffer is replaced only occasionally.
The sore points of StringBuilder are the construction cost (which practically always makes the "magic number" at least 3) and the cost of allocating a new buffer when the resulting string would exceed the current buffer size. The latter explains why prior knowledge (or a good estimate) of the resulting string size helps so much: StringBuilder can allocate a sufficient buffer just once.
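The sketch below shows the StringBuilder side of the picture: Append modifies the builder in place and returns that same instance, while Capacity grows as needed to hold the content. BuilderDemo and AppendKeepsSameBuilder are hypothetical names. Exact capacity values after growth depend on the runtime (later .NET versions changed how StringBuilder manages its buffer internally), so the sketch only relies on Capacity being at least Length.

```csharp
using System.Text;

static class BuilderDemo
{
    // Hypothetical helper: appends 'pieces' copies of 'piece' to one builder and
    // reports whether every Append call returned that same builder instance.
    public static bool AppendKeepsSameBuilder(string piece, int pieces, out StringBuilder sb)
    {
        sb = new StringBuilder();            // default settings (16-char initial buffer)
        bool sameInstance = true;
        for (int i = 0; i < pieces; i++)
        {
            // Append mutates the builder and returns it (useful for chaining).
            StringBuilder returned = sb.Append(piece);
            sameInstance &= object.ReferenceEquals(returned, sb);
        }
        return sameInstance;
    }
}
```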
Running the performance tests
Testing this is actually pretty simple. Choose a string operation, implement it both ways and repeat it for sufficiently many iterations while measuring the execution time. There are basically two factors involved: the length of the strings being handled and the number of concatenations. Few real-world scenarios use a fixed number of concatenations with fixed-length strings, so the most realistic test case would perform real-world concatenations. However, such a scenario is hard to construct because it tends to add non-string operations to the loops, which distorts the timing.
Pure string concatenation loops are rare in any case, so even if you manage to speed up your string operations by 50%, your software as a whole is very unlikely to speed up that much. The point is this: if you want the absolute best performance, measure it yourself, in your real-world scenario. That said, a fair amount of testing on some of my applications has convinced me that the simple rules outlined above hold up even with fairly varied material.
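If you want to follow the advice to measure in your own scenario, a minimal timing helper along these lines may be a starting point. Timing.Measure and its warm-up call are hypothetical additions, not the harness used for the figures in this article; System.Diagnostics.Stopwatch requires .NET 2.0 or later, so it was not available on the .NET 1.1 setup described here.

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Hypothetical helper: runs 'action' once untimed (so JIT compilation is
    // not measured), then 'iterations' more times under a Stopwatch.
    public static TimeSpan Measure(Action action, int iterations)
    {
        action();                            // warm-up call, not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

To compare approaches, call Measure with the same iteration count for each variant of your real code and compare the returned TimeSpans; a single run is noisy, so repeating the comparison a few times is advisable.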
So, my test was essentially a loop of string concatenations, with each iteration appending another string of predetermined length and content to a temp variable. I mostly varied the number of concatenations (iterations of the loop) to find the cutoff point, but I also played with the string length. All tests were repeated 10 million times by an outer loop to provide better sampling. Everything was run on my AMD Athlon 2800+ with 1 GB of memory, XP Pro and .net Framework 1.1.
The following source snippet shows the basic versions of the testing loops:
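```csharp
using System;
using System.Text;

static class ConcatBenchmark
{
    // String version: each += creates a new string and copies the old contents into it.
    static void StringVersion(string[] args)
    {
        string s2 = new String('x', Int32.Parse(args[0])); // piece to append; length from args[0]
        int loops = Int32.Parse(args[1]);                  // concatenations per run, from args[1]
        for (int j = 0; j < 10000000; j++)
        {
            string s = "";
            for (int i = loops; i > 0; --i)
                s += s2;
        }
    }

    // StringBuilder version: appends go into the builder's internal buffer.
    static void StringBuilderVersion(string[] args)
    {
        string s2 = new String('x', Int32.Parse(args[0]));
        int loops = Int32.Parse(args[1]);
        for (int j = 0; j < 10000000; j++)
        {
            StringBuilder sb = new StringBuilder();
            for (int i = loops; i > 0; --i)
                sb.Append(s2);
            sb.ToString(); // end with a String, like the String version
        }
    }
}
```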
The extra ToString call at the end of the StringBuilder version is there to level the playing field: the first version's end result is a String, so the same should hold for the second one. Leaving the ToString out had only a marginal effect on the results: while it made an 8% difference with a single concatenation, the effect quickly died out as the number of operations increased.
Finding the magic number
I started with 10-character strings, running from 1-50 concatenations (each repeated 10 million times as outlined above). The result is the chart below, displaying the relative execution times against the number of iterations (1-15). Absolute execution times aren't shown since they're hardly relevant.
The blue line is the performance of the pure String approach. It looks linear at first sight, but it isn't. If the String approach had to allocate space for X chars (where X is the length of the string being added, 10 here) per loop iteration, the time requirement would grow in a linear way. However, the amount of memory needed – and also, the amount of existing data being copied to the newly constructed string object – increases with every iteration. For Nth iteration, the String version allocates space for N*X chars. Thus, every iteration is slower than the previous one, and the String time curve steepens quickly as N grows.
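Summing the per-iteration allocation described above gives the total number of characters the String version allocates for N concatenations of X-char strings; the quadratic term is why the blue curve bends upward:

$$\sum_{k=1}^{N} kX = X\,\frac{N(N+1)}{2}$$

For example, with X = 10 and N = 15 this is 10 · 15 · 16 / 2 = 1200 characters allocated to produce a 150-character result. This counts characters only and ignores per-object overhead.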
The red line is StringBuilder at its default settings. If you add a trendline, SB actually scales fairly linearly with the total length N*X. The bumps in the line are caused by buffer reallocations. Knowing how StringBuilder works in .net helps here: the default buffer size is 16 chars, and the buffer doubles each time it overflows. With X = 10, it's no big surprise that the bumps appear at iterations 2 (exceeding 16 chars), 4 (exceeding 32), 7 (exceeding 64) and 13 (exceeding 128).
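To check the bump positions, here is a tiny model of the growth policy as described above: start at 16 characters and double on overflow. It only simulates the capacity arithmetic; it is not the runtime's actual code, and newer .NET versions grow StringBuilder differently. The rule of jumping straight to the required length when doubling isn't enough is an assumption about an edge case the article doesn't exercise, and GrowthModel and ReallocationPoints are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class GrowthModel
{
    // Simplified model (NOT the runtime's implementation): when an append would
    // overflow the buffer, capacity doubles, or (assumption) jumps straight to the
    // required length if doubling is still not enough. Returns the 1-based append
    // numbers that trigger a reallocation, plus the final capacity.
    public static List<int> ReallocationPoints(int initialCapacity, int pieceLength,
                                               int pieces, out int finalCapacity)
    {
        var points = new List<int>();
        int capacity = initialCapacity;
        int length = 0;
        for (int n = 1; n <= pieces; n++)
        {
            length += pieceLength;
            if (length > capacity)
            {
                capacity = Math.Max(capacity * 2, length);
                points.Add(n);
            }
        }
        finalCapacity = capacity;
        return points;
    }
}
```

With initialCapacity = 16, pieceLength = 10 and pieces = 15, the model reports reallocations at appends 2, 4, 7 and 13 and a final capacity of 256. Starting from 100 instead gives a single reallocation (at append 11), starting from 150 gives none, and 50 appends of 50 chars into a 2499-char buffer give one reallocation at the last append, growing to 4998 characters.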
As you can see, SB first beats String at six concatenations. However, the memory allocation bump at 7 concatenations makes SB slower than pure strings again. From there on, the results are clear: even though the bump at 13 concatenations is considerable, it is still well below the blue line. The exact figures aren't that relevant, though, because the bump locations depend heavily on how many chars have been gathered so far. With most typical strings, the cutoff point falls somewhere between 4 and 8.
The power of estimations
The green line represents a StringBuilder initialized to the size of the final string (using StringBuilder's int-taking constructor). As you can see, this is the fastest approach by a very clear margin, and its cutoff is at three concatenations! The obvious drawback is that you have to know the buffer size beforehand, which you usually can't. In the cases where you do know it (such as this simple fixed-length scenario), it's blazingly fast. At 50 concatenations of 10-char strings, it's 550% faster than pure String-based concatenation and 35% faster than an uninitialized StringBuilder. The differences tend to grow as the amount of data increases.
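For reference, here are the three variants compared in the chart, written as hypothetical helper methods (ThreeWays and its method names are not from the original test code): plain +=, a StringBuilder with default settings, and a StringBuilder sized exactly with the int-taking constructor. All three produce identical strings; only the allocation pattern differs.

```csharp
using System.Text;

static class ThreeWays
{
    // 1. Plain String concatenation: a new string is created on every +=.
    public static string WithPlus(string piece, int count)
    {
        string s = "";
        for (int i = 0; i < count; i++)
            s += piece;
        return s;
    }

    // 2. StringBuilder with default settings (small initial buffer that grows).
    public static string WithDefaultBuilder(string piece, int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(piece);
        return sb.ToString();
    }

    // 3. StringBuilder pre-sized to the exact final length via the
    //    int-taking constructor (the "green line" case).
    public static string WithSizedBuilder(string piece, int count)
    {
        var sb = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++)
            sb.Append(piece);
        return sb.ToString();
    }
}
```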
The good thing is this: even a rough estimate of the resulting string size helps. If you overestimate the string size, you allocate extra memory, but you avoid mid-loop buffer expansions. The extra memory allocation will slow you down at some point, but the effect may be negligible. If you underestimate the string size, you'll hit a buffer reallocation at some point. However, you've most likely still skipped the early reallocations.
For example, if you're generating a 150 char string in 10 char increments (but you don't know these characteristics beforehand), initializing the StringBuilder with default values causes four buffer reallocations (16 -> 32, 32 -> 64, 64 -> 128, 128 -> 256). While initialization to 150 (or any larger value) would avoid the allocations altogether, even an initialization to a rough estimate such as 100 will help: you'll have only one realloc happening.
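The 150-character example can also be tried against the real StringBuilder: with an initial capacity of 150 the buffer never has to grow, while a capacity of 100 must grow at some point. EstimateDemo.Build is a hypothetical helper; since the exact capacity reached after growth is runtime-specific, the sketch only checks whether growth happened.

```csharp
using System.Text;

static class EstimateDemo
{
    // Hypothetical helper: builds a 150-char string from 15 ten-char pieces,
    // starting from the given initial capacity, and returns the builder so
    // its Capacity can be inspected afterwards.
    public static StringBuilder Build(int initialCapacity)
    {
        var sb = new StringBuilder(initialCapacity);
        string piece = new string('x', 10);
        for (int i = 0; i < 15; i++)
            sb.Append(piece);
        return sb;
    }
}
```

Build(150) should end with Capacity still at 150, whereas Build(100) ends with a Capacity above 100 because the buffer had to be enlarged once the content passed 100 characters.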
The moral of the story: Estimate whenever you reasonably can. Even a bad estimation will usually provide 10-20% benefit over a StringBuilder constructed with the default values. However, if your strings are very long, you'll want to read the following chapter first.
The effect of the string length
How about string lengths? Varying the component string length (X above) with a default StringBuilder has surprisingly little effect. For fairly short strings the cutoff point is usually a bit lower, but this is largely because more short strings fit into the default 16-char buffer. In any case, the absolute gain is usually irrelevant, since concatenating short strings is very fast regardless of the method used.
The pure String-based concatenation slows down as the number of chars in the string grows. The worst scenario is many additions of short strings at the end of a long string. For example, when 2 chars are added to the end of a 500-char string, 99.6% of the memory allocated goes to the old part of the string (500 of 502 chars). Duh!
For StringBuilders, later buffer reallocs are slower, of course. More memory needs to be allocated and more old content needs to be moved around. So, the longer your strings become, the more you'll gain by estimating. For 50 catenations of 50-char strings, a perfect estimation gets you a 50% speed benefit over a StringBuilder with default settings!
However, there's a catch. As the memory allocations grow, the accuracy of your estimate matters more and more. Suppose we have the previously discussed 50×50-char case, resulting in a final size of 2500 characters. The following table lists the execution times with different initial capacities. Times are relative to the default settings, so the default is 100%; smaller figures mean faster execution (less time).
Initial capacity (chars) | Relative time |
---|---|
16 (default) | 100 % |
50 | 97 % |
2000 | 88 % |
2499 | 104 % |
2500 | 49 % |
3000 | 53 % |
4000 | 62 % |
5000 | 103 % |
10000 | 268 % |
As you can see, if you guess the final size of the resulting string exactly, you're very fast: only 49% of the default execution time. However, make the buffer one character too small (2499 in this example), and you've just ruined your performance: adding the last element doubles the buffer to 4998 characters, which carries quite a lot of overhead. In the other direction, even a 60% overallocation at 4000 characters is pretty fast (only 62% of the default execution time). Unfortunately, that costs memory, and with strings several megabytes in size, you probably can't afford that luxury.
On the other hand, you also saw that even a slight underallocation eventually wastes memory. Nor is the default approach perfect: always doubling the buffer tends to allocate extra space, too. So, unless you can make a perfect estimate, slight overallocation might be both the fastest and the most memory-efficient approach.
Guessing is hard, but luckily the consequences of a bad guess aren't usually catastrophic. If you can avoid massive overallocation, you're not likely to do much worse than the default settings. In any case, the execution time without StringBuilder is 712% on the scale above; it's pretty unlikely you could do worse than that. :-)
Conclusions
StringBuilder performance is a tricky thing. In the last chapter you saw that a StringBuilder with a perfect size estimate can be about 15 times faster than normal string concatenation (49% vs. 712% of the default StringBuilder time). But earlier in the article you also saw that even a default StringBuilder beats normal string concatenation by a clear margin once the cutoff point of 4-8 concatenations is passed.
Except for the most critical string handling loops, optimizing the process to the point of making perfect estimates usually isn't worth it. For reasons of code clarity, you might even want to avoid StringBuilder when the number of concatenations is only slightly over the cutoff point and the operation isn't critical to the millisecond level. For example, constructing a ten-part SQL statement is likely to be faster with StringBuilder, but the speed difference is negligible compared to the execution time of that statement. That said, once you become familiar with the StringBuilder class, you'll be reading sb.Appends just like you read plus signs.
August 22, 2004
· Jouni Heikniemi · 29 Comments
Posted in: .NET
29 Responses
Antti-Juhani Kaijanaho - August 22, 2004
Obviously, all measurements are heavily dependent on what you measure. I'd expect the concrete numbers to change between different implementations of Java, but I would expect the trends to stay the same.
My rule of thumb is to use StringBuffer when building strings from a variable number of components (essentially, when building a string in a loop). Otherwise, I tend to use String.
An - August 22, 2004
Or was that Java? You don't mention the language, and that does look a little strange to be Java, but sufficiently similar to have fooled me.
Jouni Heikniemi - August 22, 2004
Heh. It's C# – the post was in .net category and the post does mention .net Framework, but perhaps that's not clear enough. I added ".net" to the post title as well.
It would be interesting to see similar benchmarks run on Java. Even though numbers are bound to be different, I believe the same principles apply to both worlds.
Antti-Juhani Kaijanaho - August 22, 2004
Well, in principle dotnet can run Java :) And my point about multiple implementations holds for C# too, since there is at least Mono.
In fact, it was the dotnet references that made me suspect my initial assumption.
(BTW, at least to me saying "dotnet" is clearer than writing it with a real period:)
Jemm's Blog - August 23, 2004
StringBuilder challenges String in concatenation benchmark
TrIpLeZoNe - June 2, 2005
Here's EXACTLY what I'm going to discuss and…
ghenz - June 14, 2005
Jouni Heikniemi,
Nicely explained in plain English…
Thanks.
Niktu - September 12, 2005
From what I've read, Java's string concatenation operator '+' internally uses StringBuffer for an expression that concatenates many strings. So the only useful application of StringBuffer would be when you concatenate strings in a loop, or when you can't do it in one expression (because for some weird reason you need to do other operations in between concatenations…)
I was wondering whether that is the case with C#. It seems a rather obvious optimization, but I couldn't find any mention of it
(in essence, would:
string str = "dsadsa" + "dsadsad" + someString +
"dadasdsad" + someOtherString + "dsaddsasad" + "dsadsadas" + "sadjhsadsadhsa" + someEndingString + "ehh, i got tired";
be done internally with a StringBuilder in C#, like with StringBuffer in Java?)
Atul Yadav - December 6, 2005
Please tell me the difference between String (capital S) and string (small s) in C#.
Jouni - December 6, 2005
No functional difference. The other one (string with a small s) is a language specific alias for String (which is a class name from the Base Class Library).
mmarian - June 5, 2006
very nice!
Craig Fisher - June 22, 2006
I'm curious about when a string is created. Is it created when you put "" around it, or only at the end?
Does example 1 make two strings and then combine them into one? So, are these equivalent?
1.
string a = "abcdef" + "ghijkl";
2.
string a = "abcdef";
a += "ghijkl";
How do those compare with:
1.
StringBuilder str = new StringBuilder();
str.Append("abcdef" + "ghijkl");
2.
StringBuilder str = new StringBuilder();
str.Append("abcdef");
str.Append("ghijkl");
Niktu - June 29, 2006
Craig, I've found that the best way to answer such questions is to compile the examples and look at the resulting bytecode with Reflector…
… essentially, when you do an assignment like:
String a = "sdas" + "dsada" + myInt.ToString() + "dsadsad"
it gets efficient treatment: an internal function appends the parts together in one go
(it doesn't construct as many intermediate objects as:
a += "sdas"; a += "dsads"; a += myInt.ToString(); …)
… so unless you concatenate in a loop, or have screenful of concatenations frequently separated with other processing (or overuse += where + and line break would do just as fine :) you can give StringBuilder a rest :P
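Regarding Craig's and Niktu's questions, here is a small C# sketch of the cases, assuming the standard C# compiler behavior: two literals joined with + form a constant expression that the compiler folds into one literal, and a single expression with non-constant parts compiles to one String.Concat call, whereas each += statement performs its own concatenation. FoldingDemo and its methods are hypothetical names; inspecting the compiled IL (with Reflector, as suggested above, or a similar tool) is the way to confirm this for a given compiler.

```csharp
static class FoldingDemo
{
    // Both operands are literals, so the compiler folds this into the single
    // constant "abcdefghijkl"; no concatenation happens at run time.
    public static string Folded() => "abcdef" + "ghijkl";

    // With a non-constant operand, the whole expression becomes one
    // String.Concat call instead of a chain of intermediate strings.
    public static string OneExpression(int n) => "sdas" + "dsada" + n.ToString() + "dsadsad";

    // Three += statements: three separate concatenations, each of which
    // produces a new string.
    public static string StepByStep(int n)
    {
        string a = "sdas";
        a += "dsada";
        a += n.ToString();
        a += "dsadsad";
        return a;
    }
}
```

Because folded constants are ordinary string literals, Folded() returns the same interned object as the literal "abcdefghijkl" elsewhere in the program.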
Erik Molekamp - August 29, 2006
Very good article. Thank you!
I've always had the gut feeling that StringBuilder would be more efficient in many cases, but didn't have the data to back it up and convince my colleagues to use it.
What about the performance of AppendFormat?
Which is faster:
— 3 Appends —
sb.Append("fixed string 1");
sb.Append(stringVariable);
sb.Append("fixed string 2");
— or AppendFormat —
sb.AppendFormat("fixed string 1{0}fixed string 2", stringVariable);
Erik
Tomato - September 20, 2006
Very nice and clearly explained article!
JB - October 26, 2006
Hi, as you seem to be quite experienced with these kinds of problems, I wonder whether the principles taught here regarding memory allocation are applicable in Java when considering String vs. StringBuffer (our servers still use 1.4, so StringBuilder isn't available yet).
Manish - December 20, 2006
Very good and helpful article. I want to thank you for it.
Harry - January 31, 2007
Thank you for this benchmark; it will help me finish one of my projects more quickly.
keith holdaway - February 16, 2007
Is this relevant to Java when comparing the StringBuffer and StringBuilder classes?
Nirbhay Kumar Singh - March 5, 2007
This is a very, very excellent comparison between string and StringBuilder.
Monica - March 14, 2007
Thanks so much for this–I had an app that was doing a large number of concatenations. In this case I knew what the final string length would be. I preset the starting capacity on all the stringbuilders used in the app–it actually visibly improved speed!
nupur nag - June 6, 2007
sir,
How do I concatenate the values of different buttons in a single textbox in C#.net?
Scott Bateman - August 4, 2007
Agreed that there are certain situations where the StringBuilder is useful, but in general I think it is not necessary and probably overused by most developers.
Check out my reasoning here:
http://codeslammer.wordpress.com/2007/07/07/do-not-use-the-stringbuilder/
Peter - August 10, 2007
Nobody should be using single concats over a bounded list of strings. Consider that most concatenations are bounded and in those cases StringBuilder cannot ever perform better than String.Concat(). The concat code above is flawed in thinking that one would use single concats over and over. Compare the StringBuilder code with this version of concat:
for (int j = 0; j 0;)
sargs[i] = s2;
String.Concat(sargs);
}
This concat runs almost twice as fast as StringBuilder at any loop count. And this is not just true of loops, as long as the counts are bounded, Concat is always faster.
There are definitely times when you should use StringBuilder, but general concatenation of strings is not one of them – no matter the number of strings being concatenated.
Peter - August 10, 2007
Entry form garbled the pasted code, second try:
for (int j = 0; j 0;)
sargs[i] = s2;
String.Concat(sargs);
}
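Peter's code did not survive the comment form, so here is a hedged reconstruction of the idea he describes rather than his actual code: collect the pieces in an array and join them with a single String.Concat call. ConcatArray.Build is a hypothetical name. In the .NET implementations I'm aware of, String.Concat(string[]) sums the lengths first and allocates the result once, which is why it can compete with a StringBuilder when the set of pieces is known up front.

```csharp
static class ConcatArray
{
    // Hypothetical helper: gathers 'count' copies of 'piece' into an array,
    // then joins them with one String.Concat call (one final allocation).
    public static string Build(string piece, int count)
    {
        var parts = new string[count];
        for (int i = 0; i < count; i++)
            parts[i] = piece;
        return string.Concat(parts);
    }
}
```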
David Cumps - September 16, 2007
I've taken the liberty to do some additional testing into the memory usage of various methods.
Might be useful: http://blog.cumps.be/string-concatenation-vs-memory-allocation/
Nachi - January 4, 2008
Nice article, and a useful one.
Rahul Patil - April 14, 2008
Good article,
but the differences are not clearly explained.
Point #1: "If the new resulting string wouldn't fit into the buffer, a new buffer is allocated and the old contents are moved in. In no case new objects need to be created."
This doesn't distinguish StringBuilder from String, because the same thing happens with the String class too.
So a clearer distinction should be given.
subhash - April 16, 2008
How are Strings more useful than StringBuffer?