How slow is database access, really?
Many modern applications use relational databases. Performance optimization discussions often end with the statement “If you’re hitting a database, nothing else really matters”. Is that really so?
Note: This post is written to provide a rough idea of the performance impact magnitudes involved in database access. There is no substitute for local measurements.
To test the hypothesis, I created an application that does a very-lightweight, yet useful database operation: retrieve the current date from the server. Make it 1000 times.
for (int i = 0; i < 1000; ++i) { SqlConnection conn = new SqlConnection(@"..."); conn.Open(); SqlCommand cmd = new SqlCommand("SELECT getdate()", conn); DateTime serverTime = (DateTime)cmd.ExecuteScalar(); conn.Close(); }
Well then, practice your estimation skills. How many seconds does it take to run that snippet of code?
It varies (but I’d be delighted to hear your guess in the comments!). In my case, it took about four seconds. That’s four milliseconds per datetime query, and that’s over a slowish WLAN to a SQL Server 2008 running in a fairly low-power Hyper-V host. If I run the loop twice or thrice as many times, I do get down to something like 2.8-3.3 ms/request, but the difference is too small to be relevant in a rough benchmark like this.
Is getdate() a good example function?
A fair point. However, I think it is, for much of the time. First of all, comparing any more complex queries in abstract is very hard – you’ll always have to measure locally. Also, a very big percentage of extremely simple queries execute in practically zero duration on the SQL Server, providing an experience that is pretty close to a simplistic query such as SELECT getdate().
Then again, the 4 ms figure may be heavily underestimated if your extremely simplistic query executes in 0 ms, but happens to return a 500-kilobyte result set. In that case, transporting all the data will make a difference. If you just return a few fields on a single row, the execution time will not grow exponentially, and the comparison stays pretty valid.
OK, so is 4 ms per query a lot? What else could you do with that?
Well, it’s quite a lot in a sense. It’s reasonably hard to consume four milliseconds in your own code, unless you’re doing some really heavy-duty processing.
If you took the time from the client’s clock using DateTime.Now, it would be about 2700 times faster.
Reading a 30-byte file from the local hard disk takes about 1/20th of the time it takes to hit the database, but that’s when you’re hitting the same file all the time – perhaps a reasonable scenario if you decided to cache the database return value onto the disk, but not always. If you write a file of the same size, that takes about 1/3 of the time it takes to hit the database. Of course, all this varies heavily on the media.
Parsing XML can be notoriously slow. If you use the DOM parsing model (System.Xml.XmlDocument), parsing a reasonably heavy-structure 50-kilobyte document takes about one millisecond. Using XmlReader would speed up the works, naturally, so you could actually parse quite a bit of XML in the time you retrieved the current time from the database.
The conclusions
Of course, you can waste 4 ms easily by manipulating some images, calling a few web services, calculating pi to a few thousand decimals and so on. But since the vast majority of applications don’t do these things, the database delay actually is pretty significant.
For most interactive applications running on the client, the user would never notice a 4 ms delay. It’s relevant on a web server though, because occupying a thread for additional 4 ms on every request reduces maximum throughput. If your application could spit out HTML at 2 ms/request rate, throwing a 4 ms delay will cut its performance by two thirds. Then again, very few web sites actually need all that performance – even 6 ms / request is a really, really fast rate for replying to web requests.
But to sum this up, this myth is actually valid: Database hits are expensive enough to dominate almost everything else. So if you have an application with a performance problem, forget about fine-tuning your foreach loops and concentrate on the database access first.
March 23, 2010
· Jouni Heikniemi · 3 Comments
Tags: performance · Posted in: .NET, Misc. programming
3 Responses
Jaba - March 24, 2010
Latency, the mood killer. Here's a view from the OSS side of the road.
One of the latest buzzwords is memcached, http://memcached.org/. It truly shines in the environments it belongs to, such as Facebook and Flickr.
memcached takes tons of load off from the database servers, but not without a small gotcha. For a simple queries and small databases a query from an actual database can very well be faster without memcached, since memcached brings in some additional latency. Many site owners do install memcached even for small setups, thinking it will bring in a performance boost ("yay! mysqld process now only takes 1% of cpu, my site must be faster") without realizing that the additional latency (still in the millisecond range you mentioned, though) has made their site a bit slower.
But if you have a lot of concurrent requests and your site is dynamic, memcached pretty soon starts to make sense. It allows one to have a lot more concurrent connections without starting to get slower, unlike the actual database servers, which might serve well until X concurrent clients, but then starts to get slower and slower. Of course memcached has its limits in concurrency, too, but the limit is far higher than with traditional databases and caching with memcached is drop dead simple.
Jouni - March 24, 2010
Yeah. Caching, distributed or otherwise, adds an entirely different perspective into the whole DB perf mix. For the Microsoft Platform, AppFabric caching is essentially the same as memcached, both from a usage and scalability perspective.
A totally different question is, how much distributed caching actually benefits your typical site. At least here in Finland, most web applications do quite well with localized caching at least from a performance standpoint.
Jaba - March 24, 2010
In Finland, whatever we do, is peanuts compared to Facebook et cetera. I admin some of the busiest sites in Finland, such as hs.fi and iltasanomat.fi, and their traffic is absolutely nothing compared to Really Busy Sites.
But still we do gain something thanks to memcached.
Traffic of one site might not be something to brag about or even not to make your database sweat. But if you have a centralized database cluster handling traffic for several popular sites, things starts to matter. One site might not kill your brand new database server, two sites might only make it sweat a bit, but the third might finally be too much for it.
In the shared database servers scenario memcached makes a lot sense, too. If you have a shared database for several services, it's soon faster to memcached all of them instead of letting all of them to query directly to database.
Leave a Reply