Valio.fi deep dive #4: Solving the ORM dilemma
In my series on valio.fi deep dives, I’ll return to the original fields of Offbeat and discuss some backend solutions. We won’t dive into the code today, but will take a look at some hard architectural choices.
Essentials of your ORM decision
We chose to build a reasonably large web site using NHibernate. A few members of the Microsoft developer crowd immediately asked: “Why not Entity Framework?” Meanwhile, a few of our developer friends had a lengthy discussion deeming ORMs worthless and unnecessarily complex.
There are really two subquestions here:
- Should we use OR mapping to make data access more straightforward?
- If yes, which OR mapper should we use?
Let me be absolutely clear on this one: I think both those questions are important to consider honestly. Many object-oriented developers feel that automated object/relation mapping is an essential part of developer productivity. I say that’s a valid point, but somewhat self-supported: ORM is important if you aren’t good enough at performing data access without it.
In this post, I’ll go over our decision making process, and then look back at where our use of NHibernate succeeded and failed. Mind you, the preceding emphasis is not a style issue: We take full responsibility for both our successes and failures, and do not intend to blame or praise NHibernate unnecessarily. It was just lying around, we used it.
To ORM or not to ORM?
This question was at the top of the list a few years ago. Now ORMs are mature enough to be the default choice for most developers, thus eliminating the question. Although the option of working without an ORM was debated, we sort of wanted one anyway. Our decision was based on a few facts:
- We had a non-trivial (although not particularly complex by any measure) business model, consisting of a few dozen entities.
- We had reasonably straightforward query requirements, mostly based on foreign key navigation.
- We had a mostly-read model, which ORM could naturally support by providing features such as identity mapping and caching.
- We had relatively few complex write operations and thus thought we’d get away without encountering the worst parts of mappings.
All these are good signs for ORM use. We weren’t entirely right in our predictions when making these decisions, but looking at it afterwards, I’m not sure we could have done better.
Picking the titans to fight
The next step was to compare Entity Framework to NHibernate.
Wait, why exactly these two? Well, there are two factors: First, we had NHibernate experience available. Second, EF was the Microsoft Option. We also had experience in other ORMs besides NHibernate and EF, but after considering our maturity and expertise in each, we decided to limit ourselves to these two.
“The Microsoft Option”? It’s usually the official platform-integrated benchmark solution. Razor for view engines, EF for ORM, WCF for REST implementations and so on. Often not the best option, but the best known one. It should always be one of the options when picking a technology – being well-known is a big bonus, plus your software design never passes even a cursory review without you being questioned about ignoring the default option.
Don’t confuse that with picking the Microsoft Option blindly. We as a community of Microsoft developers are plagued by a disease that encourages choosing the MSFT solution every time, regardless of factual benefits. That is just as bad as ignoring the Microsoft option without thought.
Being included in the platform is a merit that justifies being shortlisted into any technology comparison, but definitely not winning them.
The fact that we chose only those two products for comparison seems arbitrary. This is an important point. It is entirely possible that there could have been a better solution we just didn’t know of. We decided to limit ourselves to these two because we considered the cost of exploring other options greater than the potential benefit. For any given technology, the benefit of evaluating a new option is about this:
B = p * ( AE – LC ) – EC
In the equation, the benefit B comes from subtracting the learning cost LC from the added efficiency AE and multiplying the result by p, the probability of this technology actually being picked. Furthermore, you need to subtract the evaluation cost EC from the benefit (conceivably EC and LC would partly merge, but we’ll keep it simple for now).
Now, consider this: For an OR mapper, sufficient evaluation of an unknown variant takes at least a day. The trivial stuff you do in an hour, but going through all the complex scenarios (custom field types, xml handling, transactions, caching, cascading deletes, …) takes at least a day. The learning cost of doing all that properly is at least 5-10 days, particularly in a team of multiple developers. Even if we considered that some Acme ORM would be a contestant equally favorable to EF and NHibernate (i.e. p = 1/3), we would get:
B = 1 / 3 * (AE – 5) – 1
Quickly looking at it, we realize that with AE < 8, we would probably gain no benefit at all. Even if we estimated (before evaluating anything!) that a new technology saved us a whole month (20 man-days) of effort, our estimated benefit would be 1/3 * (20-5) –1 = 4 days. Would four days actually offset spending the one day in evaluation? And if we chose that, we’d gain 15 days but take a whole bunch of risk in exchange. How well can future generations of maintenance programmers benefit from this choice? And mind you, 1 day for eval, 5 for learning and 1/3 for usage probability are very optimistic estimates.
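To make the arithmetic above easier to play with, here is a minimal Python sketch of the same formula. The function and parameter names are my own for illustration, and the break-even helper is simply the equation solved for AE when B = 0; all figures are in man-days, as in the text.

```python
def evaluation_benefit(p, added_efficiency, learning_cost, evaluation_cost):
    """Expected benefit of evaluating a new technology: B = p * (AE - LC) - EC."""
    return p * (added_efficiency - learning_cost) - evaluation_cost


def break_even_efficiency(p, learning_cost, evaluation_cost):
    """The AE at which B becomes zero: AE = LC + EC / p."""
    return learning_cost + evaluation_cost / p


if __name__ == "__main__":
    # The optimistic estimates from the text: p = 1/3, LC = 5 days, EC = 1 day.
    p, lc, ec = 1 / 3, 5, 1
    print("Break-even AE:", round(break_even_efficiency(p, lc, ec), 2))
    print("Benefit at AE = 20:", round(evaluation_benefit(p, 20, lc, ec), 2))
```

Plugging in less optimistic values (say, a learning cost of 10 days or a lower p) shows how quickly the break-even point moves away.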
I’m very much pro-evaluation when choosing technologies, but you have to know when to stop. Considering that we didn’t plan to do many man-months of data management effort anyway, the potential gains never felt important enough to warrant the exploration effort. Thus, we focused on things that we found more productive.
The discussion above has nothing to do with ORMs per se, but ORMs are a good demonstration of this principle: since each stack does roughly the same things, knowing one of them relatively well is an extremely strong argument for picking it up and getting to real work.
Entity Framework vs. NHibernate, real arguments
We chose NHibernate for a few reasons. The key arguments were:
- NHibernate is more mature (and at the time the decisions were made in September 2010, the difference was even greater)
- NHibernate was better equipped for distributed caching (EFCachingProvider labels itself as a “sample”) and manipulation of fetching strategies.
- NHibernate + FluentNHibernate provided a reasonably approachable way to configure various complexities; EF’s approach of hiding some of this in a design diagram isn’t entirely without problems.
We did give EF serious consideration – finding people with NHibernate competence isn’t trivial, and as such, choosing NHibernate is a hindrance to further development. However, given the constraints we had, we had more confidence in choosing the technology with a longer history (particularly on the Java side).
Regrets?
Our data layer architect Lauri* spent considerable time tweaking and tuning the data access mechanics – but most of this was done to create a smooth way to organize DAL code and make it accessible through the IoC mechanism we had. The bulk of that work was unrelated to NHibernate in itself, though much of it would have been different if we had opted for raw data access instead.
The most significant architectural headaches we had with NHibernate involved some of the more complicated structures. We have a class hierarchy modeled as everything-in-one-table, with a discriminator field and an XML column for subclass data storage. Getting this to work wasn’t exactly smooth, nor was parsing the same column into two distinct fields. I may get back to this in more detail if there’s interest.
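For readers unfamiliar with the pattern, the following is a rough Python sketch of what a single-table hierarchy with a discriminator and an XML column looks like when mapped by hand. It is not our NHibernate mapping: the class names, the row layout and the XML element names are all hypothetical, chosen only to show how one column ends up feeding several distinct fields.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ContentItem:
    """Base class; these columns are shared by every row in the table."""
    id: int
    title: str


@dataclass
class Article(ContentItem):
    body: str
    author: str


@dataclass
class Video(ContentItem):
    url: str
    duration_seconds: int


def _text(root, tag):
    """Return the text of a child element, or an empty string if it is absent."""
    node = root.find(tag)
    return node.text if node is not None and node.text is not None else ""


def hydrate(row):
    """Turn one (id, discriminator, title, xml_data) row into the right subclass."""
    item_id, kind, title, xml_data = row
    root = ET.fromstring(xml_data)
    if kind == "article":
        # Two distinct fields are parsed out of the same XML column.
        return Article(item_id, title, body=_text(root, "body"), author=_text(root, "author"))
    if kind == "video":
        return Video(item_id, title, url=_text(root, "url"),
                     duration_seconds=int(_text(root, "duration") or 0))
    raise ValueError(f"Unknown discriminator value: {kind!r}")
```

Doing this by hand is a dozen lines; persuading an ORM to do the same thing declaratively, in both directions, is where the time went.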
Performance problems are one of the most common fears with ORMs. For us, query performance wasn’t really a problem – reckless use of lazy loading did result in some issues, but this is more a result of underconstrained data usage than a problem related to choice of ORM itself. We have a few scenarios where NHibernate’s query cache manipulations actually slow down some operations, and we had to work around that. Most of the time, NHibernate’s fairly intuitive system of specifying fetching strategies worked like a charm.
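The lazy loading issue mentioned above is the classic "N+1 queries" pattern. Here is a small, ORM-free Python sketch using the standard sqlite3 module; the recipe/ingredient tables and the query-counting wrapper are hypothetical and only meant to show how the number of round trips differs between lazy, per-parent loading and a single eager join.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE recipe (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE ingredient (
            id INTEGER PRIMARY KEY,
            recipe_id INTEGER REFERENCES recipe(id),
            name TEXT
        );
        INSERT INTO recipe (id, name) VALUES (1, 'Pancakes'), (2, 'Porridge'), (3, 'Water');
        INSERT INTO ingredient (recipe_id, name) VALUES
            (1, 'Milk'), (1, 'Flour'), (2, 'Oats'), (2, 'Milk');
    """)
    return conn


def load_lazily(db):
    """One query for the parents, then one more per parent: N+1 in total."""
    recipes = db.query("SELECT id, name FROM recipe ORDER BY id")
    return {
        name: [row[0] for row in db.query(
            "SELECT name FROM ingredient WHERE recipe_id = ? ORDER BY id", (recipe_id,))]
        for recipe_id, name in recipes
    }


def load_eagerly(db):
    """A single join fetches parents and children in one round trip."""
    rows = db.query(
        "SELECT r.name, i.name FROM recipe r "
        "LEFT JOIN ingredient i ON i.recipe_id = r.id ORDER BY r.id, i.id")
    result = {}
    for recipe_name, ingredient_name in rows:
        items = result.setdefault(recipe_name, [])
        if ingredient_name is not None:  # recipes without ingredients yield a NULL
            items.append(ingredient_name)
    return result
```

Both functions return the same data; the difference is that the lazy variant issues one extra query per recipe, which is harmless for three rows and painful for three thousand.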
Perhaps the most head-scratching was caused by cascading deletes/updates and ordered lists with index values in the database. Each of those required special trickery and effort, and particularly complex update scenarios were considerably difficult to verify as working correctly. These are the definite drawbacks of using an ORM – in these cases, the abstraction gets in your way far more than it benefits you. Of course, such extremes do not occur in the vast majority of the program code.
*) We never appointed a data layer architect. He just grew up to that role. We like it that way.
Lessons learned
NHibernate did a good job, but it did exhibit all the problems of an ORM. Man, that stuff is hard! I mean, both EF and NHibernate make it very easy to whip up the simple demo application. If you have a reasonably straightforward business application with few concurrency issues and low performance requirements, chances are an ORM will totally save you. If you have a high-volume application with severe execution time constraints, you will probably encounter details that you never wanted to hear about.
I think the cost of using direct data access (data readers, commands and the like) grows roughly linearly with project complexity. ORMs are much easier for a long time, but once you get to the hard end of the scale, their intricacies really start hurting you. I think this is particularly true for Entity Framework; for NHibernate, the availability of the source code provides some relief – or just an extra measure of confidence – for the most extreme hacking scenarios.
Don’t pick an ORM to make things easier unless you’re dead certain that you’ll stay in the comfort zone of your competence. Of course, if you’re the resident EF/NHibernate surgeon, you’ll enjoy the benefits far further than a team full of newbies.
Personally, I wouldn’t change our basic decision to use NHibernate. However, I would be more judicious about our options in each particular point of the application – using an ORM somewhere does not necessarily mean it is the best thing to do everywhere. Also, I would make sure the ORM understanding is better disseminated inside the team – although not everybody needs to be able to fine-tune those fetch optimizations and lazy loads, everybody should understand the issues clearly enough to spot possible problems in reviews and other pre-deploy phases.
Up next, a week of vacation. Later in June, we’ll look into the back-end in more detail.
May 27, 2011
· Jouni Heikniemi · One Comment
Tags: data access, Entity Framework, NHibernate, ORM, valio.fi · Posted in: .NET