Valio.fi deep dive #6: Features of our custom CMS

In the last post, I touched on the choice of using a CMS product or writing your platform yourself. We picked the custom platform approach, and this time I’ll tell you what that led to.

What’s in a Content Management System?

Wikipedia defines CMS in a very clumsy and overgeneric way. Let’s not go there. Given my last post’s definitions of applications and sites, I’ll just list a few key tenets of a modern CMS:

  • Administrators must be able to produce and publish content.
  • The content will consist of text and images, which will be mixed relatively freely.
  • The administrators must be able to define a page structure (and the URIs) for the content.
  • Typically, the administrators must be able to maintain a navigation hierarchy and cross-linkage (tagging, “similar content” highlighting etc.) between the pages.
  • The system must be ready to accept user feedback (comments, likes, whatever).

These are typical features for complex blog engines. Full-blown CMSes often add features like workflows, extensibility frameworks, versioning and whatever.

Dissecting Valio’s content

For Valio, we had two content sources: First, the actual database content derived from Valio systems (recipes, product information) and second, content produced and entered at the site level. Most pages rely mostly on one of the two sources, but almost all contain pieces of the other type.

Case 1: The Recipe page

Let’s look into a typical recipe page with some annotations first (click for larger size):

[Image: annotated recipe page (valio-reseptisivu-exp)]

There are four distinct regions marked in the image.

First, we have elements from the page template, the actual HTML. This contains everything that is not in a box in the image: There are fragments of template HTML here and there. For us, the template is technically a set of recursively contained MVC views and partial views. They take some structural metadata as a model and thus render the correct elements (including breadcrumb paths, possible highlight elements and so on).

Second, there is the recipe itself. In the Valio case, this is a typical example of “application data” – the recipes are imported from an internal LOB system designed specifically for recipe maintenance, and the actual recipe data has minimal editing tools on the site; the exception is the users’ ability to create and edit their own recipes, but that’s a slightly different scenario – and at any rate, it is a way to modify the business data, not maintain the site content.

Third, there are the recipe links on the right side of the page. These links are definitely a spot for maintenance: content editors are free to customize whatever is shown with each recipe. However, due to the volume of the data, hand-crafted maintenance cannot be the only option. Thus, we generate appropriate links automatically from business data whenever there are no more specific requirements. This is clearly an example of a CMS-style function, although with some business understanding thrown in.

Fourth, there is the commenting feature. This is standard user generated content, and most definitely a CMS-like element.

All in all, recipes are a very app-like thing from a content perspective, although with some CMS elements added. Consider this in the context of your own development work: how would you add features like these (commenting support, link generation, the ability to customize the highlights)?

Before looking at our approach, let’s slice down a totally different kind of page.

Case 2: A product article

[Image: annotated product article page (Valio-artikkelisivu-exp)]

Articles represent the other extreme from recipes entirely: they are not app-like, since their content is mostly entered on the site itself.

Now, look at the article page image – a typical, albeit very short, example of its kind. You’ll notice a couple of things: the page has a template (the unboxed parts), and it has two boxes – columns, or let’s just call them zones.

Why zones? Well, if you look closer at the article page (click to get a larger picture), you’ll notice that the zones are split by dashed lines. Those dashed lines represent individual fragments of content. We call them widgets.

Now, this sounds an awful lot like a CMS – perhaps even SharePoint with its Web Parts and Web Part Zones. In fact, our CMS functionality is very similar. We have a couple of dozen different widgets, and you can drop and rearrange them into zones. Widgets can also be parameterized – some only trivially, others extensively.
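To make the widget/zone composition concrete, here’s a minimal sketch of how it could be modeled. The class and member names are illustrative assumptions, not our actual types.

```csharp
using System.Collections.Generic;

// Illustrative sketch only – names and details are assumptions, not the actual Valio code.
public abstract class Widget
{
    // Parameterization varies by widget type: some have a couple of settings, others many.
    public IDictionary<string, string> Parameters { get; private set; }

    protected Widget()
    {
        Parameters = new Dictionary<string, string>();
    }

    // Each widget renders its own HTML fragment into the hosting zone.
    public abstract string Render();
}

public class Zone
{
    public Zone(string name)
    {
        Name = name;                   // e.g. "MainContent" or "RightColumn"
        Widgets = new List<Widget>();  // ordered; editors can drop and rearrange widgets here
    }

    public string Name { get; private set; }
    public IList<Widget> Widgets { get; private set; }
}
```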

Let’s quickly run over the widgets on the article to the right. On the main content zone, the first one is a picture widget. You can simply attach an image or several to it, and create either a static image or a gallery. The introduction text is just a text widget (very much a DHTML editor on the administrative end), but set to use a layout that renders the default text with the intro font.

Below that, there’s a short run of text – it’s another text widget, this time with another layout. And further down, there’s yet another widget: a recipe highlight. This one shows a specific, predefined recipe with a largish image and the key ingredients. The layout for the recipe highlight has been defined for the site, but the data is pure business data, not designed or edited for the site.

On the right-hand side, there’s a Link widget (the link to the “Piimät” category) and an Article Highlight widget set to show three highlights – some of them may be editor-customized, while the rest are automatically filled by metadata-driven searches. Then there’s another recipe highlight, but with a very different layout from the one in the main zone. And finally, there’s a three-item Product Highlight widget.

The building blocks finally take shape

[Image: CMS structure diagram]

After explaining all this, let’s look at the big picture. Our key concept is a resource – you might more easily grasp it as a page. Each resource has a URI and some content.

Ok, but how is that content laid out? Each resource also has a type, and we have a few dozen of them. The most understandable ones are those like Recipe, ThemeArticle, Product and FrontPage. Each of these types defines a template, which consists of two things: an HTML template that defines the raw markup plus the available zones, and a default set of content (prepopulated widgets in zones etc.). In addition to the template, the resource type also defines the code needed to execute a page – in practice, a controller.
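As a rough sketch (again with assumed names, and reusing the Zone type from the widget sketch above), the relationship could look something like this:

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch only; not the actual Valio types.
public class Resource
{
    public string Uri { get; set; }           // e.g. "/reseptit/uusi-pata" (illustrative)
    public ResourceType Type { get; set; }    // Recipe, ThemeArticle, Product, FrontPage, ...
    public IList<Zone> Zones { get; set; }    // the page content: widgets arranged in zones
}

public class ResourceType
{
    public string Name { get; set; }                 // e.g. "Recipe"
    public string TemplateView { get; set; }         // raw markup plus the available zones
    public IList<Zone> DefaultContent { get; set; }  // prepopulated widgets for new resources
    public Type ControllerType { get; set; }         // the code that executes the page
}
```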

A resource template contains a set of widgets for a typical layout scenario, but often writers will create additional widgets to flesh out the article: they’ll want to include sidebar elements, perhaps use product galleries, embed video or whatever.

App-like resources such as recipes are different. First of all, these resources are typically born when an integration task creates them. Suppose Valio devises a new recipe for a meat stew. As they enter it into their recipe database (an operational system beyond our control) and the publication time passes, the Recipe resource is automatically spawned. The resource is populated with a reference to the business entity (The New Stew), and recipe data such as ingredients and preparation instructions is shown properly.
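A hedged sketch of that integration step might look roughly like the following; the repository interface, the imported-recipe shape and the URI scheme are all assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of spawning a Recipe resource once an imported recipe's publication
// time has passed. IResourceRepository and ImportedRecipe are assumed shapes, not real APIs.
public interface IResourceRepository
{
    bool ExistsForBusinessEntity(int businessEntityId);
    void Add(Resource resource, int businessEntityId);   // stores the link to the LOB entity
}

public class ImportedRecipe
{
    public int Id { get; set; }
    public string Slug { get; set; }
    public DateTime PublicationTime { get; set; }
}

public class RecipeImportJob
{
    private readonly IResourceRepository _resources;
    private readonly ResourceType _recipeType;

    public RecipeImportJob(IResourceRepository resources, ResourceType recipeType)
    {
        _resources = resources;
        _recipeType = recipeType;
    }

    public void SpawnResourceIfDue(ImportedRecipe recipe, DateTime now)
    {
        if (recipe.PublicationTime > now) return;                    // not public yet
        if (_resources.ExistsForBusinessEntity(recipe.Id)) return;   // already spawned

        var resource = new Resource
        {
            Uri = "/reseptit/" + recipe.Slug,                   // illustrative URI scheme
            Type = _recipeType,
            Zones = new List<Zone>(_recipeType.DefaultContent)  // start from the template defaults
        };
        _resources.Add(resource, recipe.Id);
    }
}
```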

But that’s not the end of the story. Even with these relatively self-sufficient app-like pages, the page still has widgets. Although the content editor has only limited influence over the app-driven part of the page, the right-hand columns in particular are open to customization. The templates define these widgets as auto-populating: typically, “find three to five items that match the current resource’s metadata”. But using the admin view, the content manager can define a custom search (ignoring the local metadata) or even specify the actual search results themselves. If the admin only wants to specify one thing to highlight, the rest can still be populated through automation.
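The auto-population logic is conceptually simple. A minimal sketch, assuming a hypothetical search abstraction:

```csharp
using System.Collections.Generic;

// Hypothetical sketch: fill a highlight widget from editor-picked items first,
// then top up the remaining slots with a metadata-driven search.
public interface IContentSearch
{
    Resource GetById(int id);
    IEnumerable<Resource> FindByMetadata(Resource context, int maxResults);
}

public class HighlightPopulator
{
    public IList<Resource> Populate(Resource current, IList<int> editorPickedIds,
                                    int slotCount, IContentSearch search)
    {
        var items = new List<Resource>();
        foreach (var id in editorPickedIds)        // the admin's explicit choices, if any
            items.Add(search.GetById(id));

        if (items.Count < slotCount)               // e.g. "three to five items"
            items.AddRange(search.FindByMetadata(current, slotCount - items.Count));

        return items;
    }
}
```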

Auxiliary features

The previously described elements take care of the main content production and editing workflows. Resources enable us to semi-seamlessly arrange content from varying sources. But there is more to all these resources, and even this list isn’t exhaustive.

At one end, we have user participation and user-driven content production. For the sake of simplicity, let’s split this into two. First, there is the custom recipe editing, which is a huge topic in itself. In a nutshell, the recipe editor creates the business entities (recipes, ingredients etc.), whips up a new Recipe-type resource and links all these things together for display. The second, more approachable part is everything else: the ability to comment, like and vote on things. We record all this data – as well as popularity information such as view counts – per resource, allow moderation and content blocking on a resource level and so on.

Another feature provided by resources is the preview toolset. Each resource has a copy of itself, called the draft, that only content managers can see. In fact, you can’t even edit the published resources – you always edit the draft. We then have two actions to help further: Publish, which replaces the published version with a copy of the draft, and Revert, which replaces the draft with a copy of the public version.
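In code, Publish and Revert boil down to copying between the two versions. A minimal sketch, with the cloning mechanism left as an injected assumption:

```csharp
using System;

// Hypothetical sketch of the draft/published pair. The copy function is injected because
// the actual cloning mechanism isn't shown here.
public class EditableResource
{
    private readonly Func<Resource, Resource> _deepCopy;

    public Resource Published { get; private set; }  // what site visitors see
    public Resource Draft { get; private set; }      // what content managers always edit

    public EditableResource(Resource published, Resource draft, Func<Resource, Resource> deepCopy)
    {
        Published = published;
        Draft = draft;
        _deepCopy = deepCopy;
    }

    // Publish: replace the published version with a copy of the draft.
    public void Publish()
    {
        Published = _deepCopy(Draft);
    }

    // Revert: replace the draft with a copy of the public version.
    public void Revert()
    {
        Draft = _deepCopy(Published);
    }
}
```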

Conclusion

As I write this, the features listed sound simple. In fact, they are, and they make up a reasonably usable CMS. That doesn’t, however, mean the implementation was trivial: there were quite a few difficult compromises made during the design process. From a purely technical standpoint, the system is very incomplete in terms of features and tools. From a practical usability and efficiency standpoint, I think we hit a pretty good middle ground: we catered to the key business needs without bloating the codebase (or the budget).

In a further post, I will finally cover the topic that originally drove me to write about the whole CMS design issue: the database implementation of resources. Yes @tparvi, I heard you, I just wanted to make sure I can focus on the technical specifics once we get there :-) Up next: Inheritance hierarchies, custom XML columns with NHibernate, and semi-reflected widget construction. Oh yes.

November 5, 2011 · Jouni Heikniemi · 3 Comments
Posted in: Web

What’s new in .NET Framework 4.5? [poster]

.NET Framework 4.5 had its CTP released at Build, and RTM is coming next year. The key improvement areas are asynchronous programming, performance and support for Windows 8/WinRT – but worry not, it’s not all about those new thingies.

Instead of just listing it all out, here’s a poster you can hang on your wall and explore. The ideal print size is a landscape A3. If you want it all in writing, follow the links at the end of this post. Click on the image for a larger version.

[UPDATE 2011-11-16: I have changed the poster to include changes in F# 3.0.]

[UPDATE 2012-03-07: The poster has been updated for .NET 4.5 Beta release. Also, the poster is being delivered to TechDays Finland 2012 participants - the new updated version is equal to the one available in print.]

More information

Check out these links:

 

If you prefer to have the poster in Finnish, we have published it on the ITpro.fi Software development expert group site.

Any feedback is naturally welcome, and I’ll make a reasonable effort to fix any errors. Enjoy!

October 29, 2011 · Jouni Heikniemi · 34 Comments
Posted in: .NET

A lesson in problem solving: Never assume the report is correct

A while ago, a colleague of mine reported that our OData services were functioning improperly. I fell for it and started looking for the issue. I never should have. Not that soon.

“The OData services don’t seem to expand entities properly when JSON formatting is used.”

I was like “Huh?”. We had a bunch of OData endpoints powered by Windows Communication Foundation Data Services, and everything had worked fine. Recently, the team using the interfaces switched from the default Atom serialization to JSON in order to cut down data transfer and thus improve performance. And now they’re telling me that entity expansion, a feature native to OData itself, is dependent on the transport format of the data. Really?

The alarm bells should have been ringing, but they were silent. I went on and found nothing by Googling. Having spent a whole 15 minutes wondering about this, I then went on trying it myself. Since WCF only provides JSON output through content negotiation, I had to forge an HTTP header to do this. So I went on typing:

PS D:\> wget -O- "--header=Accept:application/json" "http://....svc/Products/?$expand=Packages"

And to my surprise, the resulting JSON feed really did not contain the expanded entities. Could it be that .NET had a bug this trivial? Baffled, I was staring at my command line when it suddenly hit me.

Can you, dear reader, spot the error?

 

 

 

The problem is that the expand parameter doesn’t get properly sent. See, I’m crafting the request in PowerShell, and to the shell, $expand looks like a variable reference. It then gets replaced with the value of the variable (undefined), resulting in a request to “http://…svc/Products/?=Packages”. No wonder WCFDS isn’t expanding the entities! Of course, we don’t see this with Atom, since we typically do Atom requests from a browser, which has no notion of variable expansion.

So I ran up to my colleague to verify he wasn’t falling victim to the same misconception. He was issuing the request from the bash shell on Mac OS X, but bash’s variable interpolation rules are roughly the same as PowerShell’s, so he was seeing the same issue. So everything actually worked exactly as it should; we were just asking for the wrong thing.

If I had tried removing the --header part from the request, I would instantly have spotted that the expansion didn’t work with Atom either, but I didn’t. Why? Because I was paying too much attention to the problem report’s JSON part, thinking the expansion for Atom worked automatically, and neglecting to check the connection between the two. Next time, I’ll be more analytical.

October 23, 2011 · Jouni Heikniemi · One Comment
Posted in: General

Valio.fi deep dive #5: Content Management Systems as platforms

After a brief hiatus, it’s time to look at the Valio case again. This time, I’ll explain a few decisions behind our content management model. I’m sure some of this sounds familiar to almost every web site developer, although most probably won’t dive as deep as we did.

The setup

So you’re developing an ASP.NET MVC web application. You whip up Visual Studio and crank out a few controllers, views and whatnot. Ka-zoom, you have the working skeleton version available in a day or so. At this point, what you’re actually doing is exposing your key business objects – typically the database – through a server and a browser.

A few days or weeks later you ship the app, and immediately a business user tells you that “Actually, that segment of text is wrong. The correct one is…”. You quickly update a view. Two weeks and thirty updates later, you grow tired. Not only do they want to change the text, but they also want to swap in new images. And one day, you’ll get a mail asking you for a chance to slip in an additional information page. At a specified URI, of course. “Can I have it added to the navigation, too?”

At this point, you’re likely to have a few content delivery related customizations across your codebase. Your route table has a handful of entries specifically for campaign and info pages. Your version control history for the Views directory shows an increasingly high percentage of checkins related to one-off customization requests.

There will be a day when you find yourself asking: Should I just have started with a content management system (CMS), using extensibility hooks to implement my business functionality?

The problem

All web sites require some kind of CMS functionality: the ability to edit typical fragments of web content. Because of this, almost all web sites are based on a CMS. And there are so many of them; see the Wikipedia list.

At the other end of the spectrum, most web applications have very limited needs for content editing. Almost all have some requirements in this regard: even the most closed warehouse management tool often has a small area reserved for administrative announcements.

Most web projects fall between these two extremes. A typical e-commerce application is mostly an application, but it definitely needs content input: product descriptions, images, special offers etc. all need to be designed, tested and typed in on the go, without needing a developer’s input. A complex and diverse collection of content (such as Valio) is by definition a site and needs to be editable, but it also has a huge load of functionality – some of which may be considerably easier to implement using plain programming tools, not a CMS framework.

Picking our sides

When planning the technology strategy for the Valio project, we knew we were in trouble no matter what we picked.

There were requirements for considerable application-like functionalities including user-produced complex content (recipes), APIs for third parties, and demanding search rules. Simultaneously, we knew we would have to entertain a diverse group of content producers, from experienced web editors to amateur moderators.

If we chose a CMS…

  • We would have at least a stub implementation for most of our CMS functionalities.
  • We would implement all our logic using the CMS’s extensibility API. This might range from moderately acceptable to extremely painful, but we probably wouldn’t know before we tried.
  • We would have an authentication / user profile system available out of the box. However, most profile systems are not designed to be very extendable; in particular, most don’t support complex integration with external directories.

If we wrote a custom application…

  • We would have a relatively straightforward task implementing all the custom things. Hard things are still hard, but we’d be able to estimate them with reasonable confidence.
  • We would have to write all the CMS tools by hand. Given our reasonably complex set of requirements (page previews, limited versioning, web part–like composability, customizable URIs, page templates), we knew this was going to be quite a few lines of code.
  • We could do whatever we wanted with authentication. Of course, that meant doing it all by hand, from square one.

 

As you probably know by now, we picked the custom application route. The CMS side had its perks, but we decided against it for the following reasons:

  • The client had a fairly particular vision of the content management model. While they didn’t have stated requirements for the administration UI, we knew there were quite a few practical requirements, particularly regarding bulk moderation, that CMS packages typically don’t implement sufficiently.
  • While we had lots of CMS experience, we also had the requirement of using Microsoft technology, and an inner desire to use MVC to enable very fine-grained HTML output management. We also preferred an open source approach to ensure we could deliver everything we wanted. That left us in a situation where none of us had experience with a CMS that would match the requirements. Since evaluating CMS extensibility is very time-consuming, we didn’t want the additional risk of it eating up our precious days.
  • On the other hand, having lots of CMS experience (including developing two of them) gave us a head start on designing the necessary infrastructure. Thus, we felt less intimidated by the challenge of creating our own tooling.

At the crossroads?

If you’re facing the same choice, I wouldn’t blindly recommend following us. We have been met with plenty of criticism and surprised faces when telling this story. Many people consider custom app writing a symptom of the NIH syndrome, particularly in a field as well established as CMSes. Also, it is a non-trivial exercise even if you have a good idea of what you’re doing.

The key lesson here is to play to your strengths, and choose by the project type. If you have a team with experience in a particular CMS and you have complex content management requirements, that particular CMS is likely to be a good idea. Then again, if all your users need is a single bulletin board for announcements, taking on a CMS framework is probably a hugely unnecessary piece of extra baggage.

However, one important thing is schedule predictability. If custom code and a CMS seem equally strong, consider any pre-baked system a risk in terms of change management: you quite likely cannot predict all its design constraints beforehand.

For example, the requirements for Like button throttling in the Valio case were discovered reasonably late in the project, as was the moderation workflow for users’ recipes. Most CMSes don’t offer smooth customizability in scenarios like this, and thus a small requirement can suddenly result in a larger refactoring, perhaps writing a module that replaces a part of the CMS itself. You would also be excessively optimistic in thinking that relatively obscure elements such as community content workflows would – or even could – be defined before the project.

The diagram below illustrates some of the design aspects and their weight in a totally customized scenario as well as a CMS-based one.

[Image: design aspects and their weight in a custom scenario vs. a CMS-based one]

Not all CMS platforms are equal. The right end of the axis represents the versatile but hard-to-extend CMS solutions like SharePoint. The middle section of the chart represents CMS stacks that are more like toolkits for do-it-yourself site development; there are plenty of open source CMSes that are unfinished enough to be called such.

The conclusion

We picked what we picked because of 1) the requirements we had and 2) the people we were. Neither is irrelevant, and this is a key takeaway: A different team might pick an entirely different solution for the same requirements, and they might quite well be right.

You won’t have an easy time deciding on this: it’s a complex architectural choice, and estimating the actual impact of either option is reasonably hard. In the next post, I’ll discuss the key technical choices of our CMS implementation (on class/table level) to give you an idea of what sort of challenges you might be facing and help you in the process of gauging your cliff.

October 20, 2011 · Jouni Heikniemi · 2 Comments
Posted in: Web

Looking back at TechDays Finland 2011

Pretty close to half a year ago, Microsoft held the largest annual developer + IT Pro event in Finland, TechDays 2011. In six more months, it’s happening again.

As I was considering the various topics I might talk about, it always took me back to thinking about the years I’ve been talking there. What did people like? What kind of topic would interest people? How can I be better? My talks generally seem to fill a room of 100-200 people, but what do I have to say that’s worth 100-200 hours of Finnish developers’ time?

The TechDays speakers have had almost nonexistent visibility into the feedback gathered from the event. Well, that is, until now. Allow me to present the TechDays 2011 Feedback Infographic: (click on the image for additional resolution)

 

[Image: TechDays 2011 feedback infographic (Td2011-Feedback)]

Challenges for TechDays 2012 speakers

There are plenty of conclusions one can draw from the data. The one that struck me the most is that people want highly practical information, but not without the theory. Case presentations beat pure theory hands-down, but the most popular sessions were the ones with high-energy presenters, a sound theoretical basis and a continuous pummeling of practical demos.

As I’m left pondering this, I want to offer three TD2012 challenges for fellow Finnish speakers:

  • Sessions in Finnish scored 0.27 (on a scale of 1..5, that’s a lot) lower than sessions in English. Many attendees seem to gravitate towards the heavily rehearsed tracks coming from foreign travelling speakers. This must stop. Even though I think many of the travellers are absolutely great, Finnish technology professionals must be able to be more relevant and interesting*.
  • Developer sessions scored a 3.61, which isn’t particularly bad, but it’s not good either. Many developer sessions are far too much based on the equivalent PDC/TechEd/Build sessions, perhaps even slides. Let me repeat: people want practical information and your experience with the theory. Let’s do better this year.
  • Read Scott Berkun’s Confessions of a Public Speaker and either Presentation Zen or Slideology. Reduce the amount of verbiage in your decks and talks, and replace it with raw energy.

*) While berating the state of Finnish speaking, I must take my hat off to Sami Laiho, a local Windows MVP. As stated in the infographic, the man did five different presentations, scored a 4.43 average (which would, had Sami’s presentations been a single track, have made it the most popular track of all TechDays), and pulled off the most popular presentation with a stunning 4.58 score. Oh, and the 4.58 was done in the first slot of day 2, which speakers typically shun, as the previous night’s attendee party is considered to hamper the contact with the audience. Blah blah.

Data disclaimer

A few words on the data used:

This post is based on a data dump of TechDays 2011 feedback I have received from Microsoft. No personally identifiable information was ever transmitted to me. The infographic has been cleared for publication by Microsoft, but such publication probably indicates no endorsement or broader approval. I do not have permission to redistribute the raw data, so questions for further info may or may not get addressed.

This blog post contains conclusions and opinions, which are naturally mine. On the other hand, the infographic is based on objective, raw event data with the following exceptions:

  • Some questions have been grouped together for better illustration (namely, the food and venue/organization ones).
  • I have manually divided the presentations into “type” categories (theory, practice, case, lecture series). This grouping is non-official.
  • Grouping presenters was done by me. Deeming people “professional speakers” or “nobodies” involved subjective analysis.
  • The textual description and the visualization are mine. The original data sheet nominates no kings, nor were pizza slices actually available.

Thanks for listening – and let’s hear your thoughts on TechDays 2011 and its feedback :-)

October 10, 2011 · Jouni Heikniemi · 18 Comments
Posted in: General

9 things about one Offbeat year

One year and a couple of days ago we kicked off Offbeat Solutions, our consulting company of four people. We are still the same four, but a far bigger four: we know more, we’ve done more, we’ve seen more. Here are a few insights from the first year.

3 things we nailed

Customer satisfaction. We took deliberate risks in picking our projects, but didn’t overdo it. We managed to deliver challenging things with a positive attitude, and people around us seem quite happy. We wanted to become trusted advisors to our clients, and for most, we did.

Not hiring anyone. It would have been easy (and even profitable) to bring new people along; we had exactly the right cases for that, and we could have done so with acceptable costs. But had we done that, we wouldn’t have this strong unity, the ability to take on future uncertainty with extreme trust in ourselves. There will be a time for new hires, but it is not yet.

Setting up the office late enough. We discussed the office thing early on, but executed right at the 12-month mark. And damn, am I happy about it. Having worked together for a year, we had a reasonably good idea of what we’d need to be productive (and have fun). Had we gone for a place instantly, we would not have had the resources to make it good. We probably still wouldn’t have moved out, simply because of the effort it takes. We could’ve been cheaper on this, but probably not much more efficient. More on the office thing in a later post.

3 things we sucked at

Teamwork rigor. This may come as a surprise to you, particularly if you worked with us. Still, internally, we feel that we haven’t been at the top of our game in working together. Sure, we beat the average organization hands down, but we find ourselves insufficiently seeking support from each other – even when we obviously have a great chance. One of our 2011-2012 goals is to kill all procrastination by not only allowing, but demanding requests for assistance when things seem to slow down.

Technology breadth. We wanted to focus on working with the Microsoft stack, but take in the best parts from other technology families. Sure, we leverage lots of open source and external tools, but we’re nowhere near good enough. We write great ASP.NET MVC apps, but after our second year, we should have zero problem writing the same with Node.js, Rails, Akka or whatnot. And even inside the Microsoft stack, there’s plenty to learn.

Sharing with the community. We had the vision of being a storyteller, a company whose experiences would be publicly shared. We did some of that (first and foremost, our posts on valio.fi), but we were nowhere near as loud as we wanted to be. Given that we consider the year a great learning experience, it’s simply a shame that we haven’t found the time to share more.

3 things that made the year worth living

Work/life balance. Can’t say it was perfect, and we certainly worked too much at one time. Still, we have exploited our freedom of choosing where and when we work. There were frustrations, but few of them were really caused by too many hours.

Sufficient diversity. We did quite a few things. We worked on CRM strategies and security reviews. We wrote code and helped manage projects. We tackled SharePoint, Azure and PHP all alike. We worked with startups, SMBs, large enterprises, NGOs and the government. All great learning experiences.

The fun of it. We developed a culture – of course, not really intentionally. We developed a (bad) taste in office music, we printed out big pictures of legendary artists for the walls – and naturally, we started wearing military uniforms made of rubber during code reviews. We were supposed to be offbeat, right?

Summary

It’s been a blast, but we’ve grown rather critical about ourselves. Our start was way smoother than we expected, and that gave us a somewhat false feeling of security.

We’ve been fighting it for months now. It would be easy to bury ourselves in a few big customers, learn to understand their business deeply and become irreplaceable. But then, a few years down the line, we’d find ourselves fat, lazy and unlearned. Pretty much the opposite of what we originally planned.

Plan for 2011-2012: Work on the most difficult and ugly things we can find. Win ourselves every day. We’ll be back to tell you how we’re doing.

September 2, 2011 · Jouni Heikniemi · One Comment
Posted in: Entrepreneurship

SANKO-tapahtuma: Funktionaalinen ohjelmointi ja F#

This post is an exception to the English majority of my writings: a summary of the Finnish .NET User Group’s event on F# and functional programming. In summary, a great event!

The fourth SANKO event was held on June 9 at the offices of the software company Reaktor in downtown Helsinki. Tuomas Hietanen and Rami Karjalainen opened by sharing their experiences with functional programming and introducing the F# language. After three hours of solid presentations, the evening continued with food and sauna. In total, around thirty people attended.

Tuomas’s and Rami’s presentations could be summarized as follows: they saw functional programming as an essential, positive trend and considered F# a language to be taken very seriously, at least for implementing business logic and data. They had long worked together on an insurance-industry project in which F# played a central role.

The audience was strongly interested in the topic, and Tuomas ended up explaining function theory and other principles at considerable length. Hands-on experience with F# itself was rare, but many had practiced functional programming in C# – that is, heavy use of LINQ. For practicing functional programming, F# came out ahead of C# thanks to, among other things, its stronger type inference (automatic deduction of types, which means less redundant repetition of types and fewer generics angle brackets in the code) and its interactivity. In between the presentations there was also a discussion weighing the value of test-driven development (TDD) against whether an interactive development model based on a REPL tool is in fact more efficient.

Tuomas’s presentation: slides and a sample application

Rami’s materials: slides and code

You can get to know SANKO activities on Facebook and LinkedIn. F# also has a user group of its own.

Thanks to all participants and presenters – and of course to Reaktor for sponsoring. And hey, Reaktor’s presence in the .NET field means that a software house repeatedly awarded as Finland’s best workplace is now open to Microsoft specialists as well. That’s a significant thing, and I’m not saying it just because I got good food. :-)

June 26, 2011 · Jouni Heikniemi · 165 Comments
Posted in: General

Valio.fi deep dive #4: Solving the ORM dilemma

In my series on valio.fi deep dives, I’ll return to the original fields of Offbeat and discuss some backend solutions. We won’t dive into the code today, but will take a look at some hard architectural choices.

Essentials of your ORM decision

We chose to build a reasonably large web site using NHibernate. A few members of the Microsoft developer crowd immediately asked: “Why not Entity Framework?” Meanwhile, a few of our developer friends had a lengthy discussion deeming ORMs worthless and unnecessarily complex.

There are really two subquestions here:

  1. Should we use OR mapping to make data access more straightforward?
  2. If yes, which OR mapper should we use?

Let me be absolutely clear on this one: I think both those questions are important to consider honestly. Many object-oriented developers feel that automated object/relation mapping is an essential part of developer productivity. I say that’s a valid point, but somewhat self-supported: ORM is important if you aren’t good enough at performing data access without it.

In this post, I’ll go over our decision making process, and then look back at where our use of NHibernate succeeded and failed. Mind you, the preceding emphasis is not a style issue: We take full responsibility for both our successes and failures, and do not intend to blame or praise NHibernate unnecessarily. It was just lying around; we used it.

To ORM or not to ORM?

This question was at the top of the list a few years ago. Now ORMs are mature enough to be the default choice for most developers, thus eliminating the question. Although the option of working without an ORM was debated, we sort of wanted one anyway. Our decision was based on a few facts:

  1. We had a non-trivial (although not particularly complex by any measure) business model, consisting of a few dozen entities.
  2. We had reasonably straightforward query requirements, mostly based on foreign key navigation.
  3. We had a mostly-read model, which ORM could naturally support by providing features such as identity mapping and caching.
  4. We had relatively few complex write operations and thus thought we’d get away without encountering the worst parts of mappings.

All these are good signs for ORM use. We weren’t entirely right in our predictions when making these decisions, but looking at it afterwards, I’m not sure we could have done better.

Picking the titans to fight

The next step was to compare Entity Framework to NHibernate.

Wait, why exactly these two? Well, there are two factors: First, we had NHibernate experience available. Second, EF was the Microsoft Option. We also had experience in other ORMs besides NHibernate and EF, but after considering our maturity and expertise in each, decided to limit ourselves to these two.

“The Microsoft Option”? It’s usually the official platform-integrated benchmark solution. Razor for view engines, EF for ORM, WCF for REST implementations and so on. Often not the best option, but the best known one. It should always be one option when picking a technology – being well-known is a big bonus, plus your software design never passes even a cursory review without you being questioned about ignoring the default option.

Don’t confuse that with picking the Microsoft Option blindly. We as a community of Microsoft developers are plagued by a disease that encourages choosing the MSFT solution every time, regardless of factual benefits. That is just as bad as ignoring the Microsoft option without thought.

Being included in the platform is a merit that justifies being shortlisted into any technology comparison, but definitely not winning them.

The fact that we chose only those two products for comparison seems arbitrary. This is an important point. It is entirely possible that there could have been a better solution we just didn’t know of. We decided to limit ourselves to these two because we considered the cost of exploring other options greater than the potential benefit. For any given technology, the benefit of evaluating a new option is about this:

B = p * ( AE – LC ) – EC

In the equation, the benefit B comes from subtracting the learning cost LC from the added efficiency AE and multiplying the result with p, the probability of this technology actually being picked. Furthermore, you need to subtract the evaluation cost EC from the benefit (although conceivably EC and LC would somehow merge, but we’ll keep it simple now).

Now, consider this: For an OR mapper, sufficient evaluation of an unknown variant takes at least a day. The trivial stuff you do in an hour, but going through all the complex scenarios (custom field types, xml handling, transactions, caching, cascading deletes, …) takes at least a day. The learning cost of doing all that properly is at least 5-10 days, particularly in a team of multiple developers. Even if we considered that some Acme ORM would be a contestant equally favorable to EF and NHibernate (i.e. p = 1/3), we would get:

B = 1 / 3 * (AE – 5) – 1

Quickly looking at it, we realize that with AE < 8, we would probably gain no benefit at all. Even if we estimated (before evaluating anything!) that a new technology saved us a whole month (20 man-days) of effort, our estimated benefit would be 1/3 * (20-5) –1 = 4 days. Would four days actually offset spending the one day in evaluation? And if we chose that, we’d gain 15 days but take a whole bunch of risk in exchange. How well can future generations of maintenance programmers benefit from this choice? And mind you, 1 day for eval, 5 for learning and 1/3 for usage probability are very optimistic estimates.
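For concreteness, here is the same estimate expressed as code, using the same symbols as above:

```csharp
// The evaluation-benefit estimate from the text, with the same symbols (B, p, AE, LC, EC).
public static class TechnologyEvaluation
{
    public static double EstimatedBenefit(double p, double addedEfficiency,
                                          double learningCost, double evaluationCost)
    {
        return p * (addedEfficiency - learningCost) - evaluationCost;
    }
}

// With the optimistic numbers from the text (p = 1/3, AE = 20, LC = 5, EC = 1):
// TechnologyEvaluation.EstimatedBenefit(1.0 / 3, 20, 5, 1) yields roughly 4 days of expected benefit.
```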

I’m very much pro-evaluation when choosing technologies, but you have to know when to stop. Considering that we didn’t plan to do many man-months of data management effort anyway, the potential gains never felt important enough to warrant the exploration effort. Thus, we focused on things that we found more productive.

The discussion above has nothing to do with ORMs per se, but ORMs are a good demonstration of this principle: since each stack does roughly the same things, knowing one of them relatively well is an extremely strong argument for picking it up and getting to real work.

Entity Framework vs. NHibernate, real arguments

We chose NHibernate for a few reasons. The key arguments were:

  • NHibernate is more mature (and at the time the decisions were made in September 2010, the difference was even greater)
  • NHibernate was better equipped for distributed caching (EFCachingProvider labels itself as a “sample”) and manipulation of fetching strategies.
  • NHibernate + FluentNHibernate provided a reasonably approachable way to configure various complexities; EF’s approach of hiding some of this in a design diagram isn’t entirely without problems.
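To give an idea of what that fluent configuration looks like, here’s a hedged FluentNHibernate mapping for a hypothetical entity – illustrative of the style only, not our actual mappings:

```csharp
using System.Collections.Generic;
using FluentNHibernate.Mapping;

// Hypothetical entity and mapping, shown only to illustrate the fluent configuration style.
public class Recipe
{
    public virtual int Id { get; protected set; }
    public virtual string Name { get; set; }
    public virtual IList<Ingredient> Ingredients { get; set; }
}

public class Ingredient
{
    public virtual int Id { get; protected set; }
    public virtual string Name { get; set; }
}

public class RecipeMap : ClassMap<Recipe>
{
    public RecipeMap()
    {
        Table("Recipe");
        Id(x => x.Id);
        Map(x => x.Name);
        HasMany(x => x.Ingredients)
            .KeyColumn("RecipeId")
            .Cascade.AllDeleteOrphan();   // cascade behaviour spelled out in code
        Cache.ReadWrite();                // opt the entity into the second-level cache
    }
}
```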

We did give EF serious consideration – finding people with NHibernate competence isn’t trivial, and as such, NHibernate is a hindrance to further development. However, given the constraints we had, we had more confidence in choosing the technology with a longer history (particularly on the Java side).

Regrets?

Our data layer architect Lauri* spent considerable time tweaking and tuning the data access mechanics – but most of this was done to create a smooth way to organize DAL code and make it accessible through the IoC mechanism we had. The bulk of that work was unrelated to NHibernate in itself, though much of it would have been different if we had opted for raw data access instead.

The most significant architectural headaches we had with NHibernate involved some of the more complicated structures. We have a class hierarchy modeled with an everything-in-one-table approach, with a discriminator field and an XML column for subclass data storage. Getting this to work wasn’t exactly smooth, nor was parsing the same column into two distinct fields. I may get back to this in more detail if there’s interest.
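As a rough illustration (not our actual code), a single-table hierarchy with a discriminator can be expressed in FluentNHibernate along these lines; the XML column is mapped as a plain string here, whereas the real implementation parses it through a custom user type:

```csharp
using FluentNHibernate.Mapping;

// Hypothetical single-table hierarchy: one table, a discriminator column, and an XML
// column carrying subclass-specific data (mapped as a string to keep the sketch simple).
public class WidgetData
{
    public virtual int Id { get; protected set; }
    public virtual string SerializedData { get; set; }   // the XML payload
}

public class TextWidgetData : WidgetData { }

public class WidgetDataMap : ClassMap<WidgetData>
{
    public WidgetDataMap()
    {
        Table("Widget");
        Id(x => x.Id);
        DiscriminateSubClassesOnColumn("WidgetType");
        // The real case parses this column into distinct typed fields via a custom
        // NHibernate user type; here it stays a raw string.
        Map(x => x.SerializedData).Column("Data");
    }
}

public class TextWidgetDataMap : SubclassMap<TextWidgetData>
{
    public TextWidgetDataMap()
    {
        DiscriminatorValue("Text");
    }
}
```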

Performance problems are one of the most common fears with ORMs. For us, query performance wasn’t really a problem – reckless use of lazy loading did result in some issues, but this is more a result of underconstrained data usage than a problem related to choice of ORM itself. We have a few scenarios where NHibernate’s query cache manipulations actually slow down some operations, and we had to work around that. Most of the time, NHibernate’s fairly intuitive system of specifying fetching strategies worked like a charm.
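To illustrate what specifying a fetching strategy means in NHibernate terms, here’s a hedged example using the hypothetical Recipe entity from the mapping sketch above – a join fetch avoids the extra round trips that careless lazy loading causes:

```csharp
using NHibernate;
using NHibernate.Criterion;

// Hedged example: load a recipe and its ingredients in one round trip instead of
// relying on lazy loading. Recipe is the hypothetical entity from the sketch above.
public static class RecipeQueries
{
    public static Recipe GetWithIngredients(ISession session, int id)
    {
        return session.CreateCriteria<Recipe>()
            .Add(Restrictions.IdEq(id))
            .SetFetchMode("Ingredients", FetchMode.Join)  // join-fetch the collection
            .UniqueResult<Recipe>();
    }
}
```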

Perhaps the most head scratching was caused by cascading delete/updates and ordered lists with index values in the database. Each of those required special trickery and effort, and particularly complex update scenarios were considerably difficult to verify as working correctly. These are the definite drawbacks in using an ORM – the abstraction really gets in your way far more than it benefits. Of course, such extremes do not typically occur in the vast majority of the program code.

*) We never appointed a data layer architect. He just grew up to that role. We like it that way.

Lessons learned

NHibernate did a good job, but it did exhibit all the problems of an ORM. Man, that stuff is hard! I mean, both EF and NHibernate make it very easy to whip up the simple demo application. If you have a reasonably straightforward business application with few concurrency issues and low performance requirements, chances are an ORM will totally save you. If you have a high-volume application with severe execution time constraints, you will probably encounter details that you never wanted to hear about.

I think the cost of using direct data access (data readers, commands and the like) is relatively linear with respect to project complexity. ORMs are much easier for a long time, but once you get to the hard end of the scale, their intricacies really start hurting you. I think this is particularly true for Entity Framework; for NHibernate, the availability of the source code provides some relief – or just an extra measure of confidence – for the most extreme hacking scenarios.

Don’t pick an ORM to make things easier unless you’re dead certain that you’ll stay in the comfort zone of your competence. Of course, if you’re the resident EF/NHibernate surgeon, you’ll enjoy the benefits far further than a team full of newbies.

Personally, I wouldn’t change our basic decision to use NHibernate. However, I would be more judicious about our options in each particular point of the application – using an ORM somewhere does not necessarily mean it is the best thing to do everywhere. Also, I would make sure the ORM understanding is better disseminated inside the team – although not everybody needs to be able to fine-tune those fetch optimizations and lazy loads, everybody should understand the issues clearly enough to spot possible problems in reviews and other pre-deploy phases.

 

Up next, a week of vacation. Later in June, we’ll look into the back-end in more detail.

May 27, 2011 · Jouni Heikniemi · One Comment
Posted in: .NET

Valio.fi featured on the Vierityspalkki blog

A step aside from the deep dives!

The valio.fi project is now also presented in the famous Finnish Vierityspalkki.fi blog. If you can read Finnish, check out my guest post detailing the site and project from a general web developer (non-backend) perspective.

Meanwhile, deep dive #4 is cooking. The topic will be OR mapping, and I’ll post it next week. My apologies for the delay in between – you wouldn’t want the details (dental surgery and all).

May 20, 2011 · Jouni Heikniemi · No Comments
Posted in: Web

Valio.fi deep dive #3: Review tooling

After my previous post on review policy, let’s have a look at the tools we used for reviewing code.

I have found there to be two approaches to reviewing code. Let’s do a quick comparison first and discuss the tools of the trade next.

Patch-based review

  • The key concept is that a developer prepares a set of changes and publishes them for review. The reviewer focuses on the changes (i.e. “the patch”).
  • Reviews typically happen through a system (anything from a work-item-attached unidiff file to a full web-based system).
  • Comments on the review are often formalized documents, email messages or, if a specialized system is used, its comment entries.

Social review

  • The developer presents the code to the reviewer. Some do this in a meeting room where multiple people review simultaneously; most of the time it happens pairwise at the dev’s workstation.
  • Reviews tend to focus on understanding the code as a whole, and discussion about the code happens in-review. Written documents are sometimes produced, but most organizations rely on verbal communication and the developer’s personal notes.

 

Mind you, this is not review theory, but a very important split when looking at the tools. Most projects can easily benefit from the social approach because it requires less tooling and technical sophistication. My experience is that neither catches everything, and the ideal approach might involve both – but reviewing everything twice is often too expensive in terms of time.

We did both, and the social one was easy

Social reviews happen naturally when a team communicates well. The rigidity of the team organization usually determines if social reviews are specifically scheduled, if comments and their responses etc. are tracked, what kind of audit trail is required to accept a social review from a project management perspective and so on.

We used a reasonably liberal system. We required reviews (see the previous post), but accepted socials just like more formal, patch-based reviews. Either form of review was sufficient as long as the reviewer and developer felt like it. And we had no tools for social reviews. We had no document templates, no scheduling, no way to track the comments.

It worked for a reasonably well-bonded team of nine, and it might work for you. Don’t use lack of tooling as an excuse to not review socially; if tools are necessary for you, you’ll see the need to find them as you go.

Patch-based is a different beast

When you adopt patch-based reviewing, pay attention to the tools. I’ve done this for years with the pure unix diff/patch approach, and would not recommend it for development today. There are tools available, and looking at them is a good idea™. We used Crucible, but let me be crystal clear: We are tool pragmatists. We used Crucible because it was easily available, relatively cheap and worked for us. We don’t claim it was the best solution even for us, much less you. If you have the liberty of choosing your tools, do check it out – but also look at the competition.

But before you go on a spree of tool comparison, let me remind you that your review abilities, processes and social capacity far outweigh your selection of tools in determining your success with code reviews. If you’re setting up a review system from scratch, you might spend a day finding the appropriate tooling, but spend four more thinking about what, how and why you are reviewing.

Enough with the intro, go Crucible

The rest of the post will now discuss our patch-based reviews and the use of Crucible (with which the repository browser FishEye is tightly integrated). First, I’ll walk you through the typical elements of a review, then discuss some of the tooling experiences.

As discussed before, we were in the post-checkin camp. Therefore, our reviews started with the developer committing the code to our version control (Subversion). We had Crucible and FishEye set up to poll the repository at regular intervals. Thus, on our Crucible web site, we continuously saw the stream of checkins listed.

[Screenshot: the stream of checkins in Crucible/FishEye]

For any of the checkins, one could create a review and pick the desired reviewers. There are multiple ways to configure and use Crucible; we used an approach that forced the author to pick the reviewers, but the reviewers could ask for more people to join. Although we encouraged an open review culture where anyone could review anything, we felt it necessary to identify the reviewers responsible – this would give everybody a clear idea of what’s on their plate (“Oh, I have these four things to review today”).

[Screenshot: creating a review and picking the reviewers]

In Crucible, each review request contains the files to be reviewed. Since Crucible is integrated with the version control, these files are picked right from the source control tree, and can be either diffs (changes between two versions) or whole files. In any case, the files are shown in the browser, with coloring for the diff elements.

[Screenshot: a reviewed file shown as a colored diff]

The whole point of reviewing is to get feedback on the code. Crucible’s feedback system is based on two concepts: comments focused on a certain code block (one or multiple lines) and comments on the whole review. The former is an approach to very specific feedback. Crucible also allows replying to the comments, thus generating nested comment/discussion trees.

[Screenshot: inline comments and replies on a review]

Side note: As you can see from the screenshots, we had English as the code language but Finnish as the working language, and reviews were conducted in Finnish. We didn’t find this to be a problem. For us, a code comment is documentation for the future developer (potentially not knowing Finnish); a review comment is discussion between the team now (which is totally Finnish).

We make no recommendation on this – apart from keeping the commenting/reviewing bar as low as possible. Finns generally comment more eagerly in Finnish, but your mileage may vary.

The review process is fairly simple: Every requested reviewer goes through all the files, adds his comments and marks himself as complete. Once every reviewer has completed the review, the author goes through the comments, replies to them, fixes the code and finally closes the review. Rounds of discussion can ensue in between, but are often better handled outside Crucible (see below).

Experiences and best practices

Keep your reviews small. In fact, keep your checkins small. If your checkins are small, your reviews are too. Fix one aspect of code in a single checkin, and then review that. Even though Crucible encourages the model of picking a source control checkin and clicking “Create Review”, you can and should split checkins for review. I’ve written about this in detail back in 2004, so why repeat myself?

Review layers of trust or reuse separately. For example, if you develop a feature that needs new general-use tools you implemented in the tools library, consider reviewing the tools separately. Reviewing the tools on their own forces more focus on their general usability, since a time-pressed reviewer can’t get away with just checking how they function at the specific call site. However, you don’t necessarily want to take this too far; reviewing the user interface and its related data access in one go is probably a good idea.

Review unit tests. If your code has unit tests, include them in the review – and review them. In fact, quite a lot of attention should be paid to the tests, as they are a good indicator of various code issues. The reviewer should understand the tests thoroughly and then analyze how thorough they are – are all error paths reasonably tested? This approach provides an excellent way to spot bugs that you would very easily skip when just reading the actual implementation code.

Don’t make reviews your discussion board. Prefer short comments, debate face-to-face and then summarize the result in a short comment. Chatting through Crucible is inefficient, and while reviews can have considerable short-term documentary value, this value is not improved by including all discussion. Including the relevant results of a discussion provides the best balance.

Combine review methodologies to reduce pain. An author about to check in a batch of changes should consider ways to improve the review experience. Since diff-based reviewing is based on line-by-line change tracking, certain types of changes make diffs very hard to read. For example, renaming a member in project-wide use causes scattered diffs for hundreds, maybe thousands of code lines. Such a diff is very cumbersome to review properly (i.e. verify that nothing else has changed); it’s usually much better to do the rename together to eliminate the need for after-the-fact reviewing.

Don’t be afraid to take another spin. Every now and then, a review results in considerable code rewriting. This doesn’t mean the original code sucked; sometimes it’s just that the review reveals things that were not accounted for (a similar construct elsewhere in the project, a new requirement or whatever). If the code must significantly change because of the review, reviewers should be comfortable asking for another round (“make these changes and hit me with another review request”). Some may consider this unnecessary supervision and lack of trust for the code author, but they are the ones who misunderstand the key reasons behind reviews. Review until you have confidence in the code being fixed properly.

 

Up next: A dive into the backend technology, for a change.

May 4, 2011 · Jouni Heikniemi · 3 Comments
Posted in: Misc. programming