March 31, 2005

The spring is here (and web security)

The bicycling season is now officially open for me. I aim to go to work by bike for half of the year (April - September). Although the first days are filled with pain and fatigue, the positive effects are showing already. The dose of oxygen before work is a great performance booster, and of no less importance is the relaxing ride home after work. What a great way to forget your stress!

Now that work was mentioned: I had an interesting 7-hour meeting on application security with a nice professional group of people from various companies. While the basics of web application security are rather simple, they're surprisingly poorly known to most developers. The dangers of XSS are not fully realized, and most people simply don't understand the huge risks involved in SQL injections. There is a tremendous need for more readable, compact information. Framework support is necessary but not sufficient by itself.

Web security in a nutshell: Most vulnerabilities are the programmer's fault. A trivial slip causes catastrophes. You cannot buy a product to fix what the coder has broken. The only way to really improve things is by boosting the security skills of the organization. Learning is mandatory. Those working with me are thus forewarned. :-)
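
To make the "trivial slip" concrete, here is a minimal sketch in Python (not the code discussed at the meeting; the table, the function names and the attack string are all made up for illustration). It contrasts a query built by string concatenation with a parameterized one, and shows HTML escaping as the most basic XSS defence.

```python
import html
import sqlite3

def find_user_unsafe(conn, name):
    # Vulnerable: user input is pasted straight into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, name):
    # Parameterized: the value travels separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

def render_greeting(name):
    # Escape user-supplied text before placing it into HTML.
    return "<p>Hello, " + html.escape(name) + "!</p>"

# Hypothetical in-memory database with two users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

attack = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, attack)))  # the injected OR clause matches every row
print(len(find_user_safe(conn, attack)))    # the whole string is treated as a name
print(render_greeting("<script>alert(1)</script>"))
```

The only difference between the two query functions is one line, which is exactly why no product can patch it afterwards: the fix has to happen where the query is written.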

Posted by Jouni Heikniemi at 07:45 PM | Comments (0) | General

March 28, 2005

Feeds are now complete

As requested by quite a few, the RSS and Atom feeds on this blog now have the full content available. I don't actually even understand why MT only picks a short excerpt by default, but there's nothing a little template editing couldn't accomplish.

Posted by Jouni Heikniemi at 08:04 AM | Comments (0) | General

March 27, 2005

Extended responsibilities

When I started working at Blue Meteorite last summer, I foresaw myself as building the background architecture for web sites. It didn't go exactly that way. But, after several diversions and detours (great ones, though!) I'm slowly back in the web business, having spent the first six or so months with requirements analysis and Windows Forms apps.

To cut a long story short, I have been appointed Development Manager in charge of our publishing framework (or content management system, if you will), Meteor. My main task is to work with the product's chief architect to set the future direction for the system, both from a conceptual and a technology perspective. Of course, it's both a time consumer and an enormous source of challenge and inspiration.

The near future should certainly be a time of learning for me. Although I've been working with the web and communication both in the publishing business and the public sector, the perspective of a software vendor is rather new. Particularly, trying to find the balance between product development and customized service provision is a worthy challenge. I expect to know much more in a few months... and really look forward to it.

Posted by Jouni Heikniemi at 10:00 PM | Comments (0) | General

March 25, 2005

How to acquaint yourself with masses of code?

During the last few months, I've been delving into a huge lump of source code, trying to understand its structure and come up with ways to make the application more structured and expandable. I'm blessed: the code itself is actually rather decent, although the passage of time has certainly deteriorated the structure somewhat (this happens with all code anyway). It's been another great learning process, so I'll share some of the fruit: Jouni's five steps towards turning a huge mass of code into something you actually understand:

1. Scan through the directory structure

Print out a directory tree of the source. Run quickly through the code and make notes on what's in each directory (which classes, what sort of functionality). If you already have this sort of a document, great - but don't use it, do this yourself. Compare with the existing notes and see what you've missed. Be careful, as the old document might be somewhat outdated. Spend no more than one minute per file.

You should end up with a working knowledge of what's in the tree and what's not, although you probably cannot remember it all. Turn your structure document into an electronic form (unless it's there already) or make sure the existing document is up to date.
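
If no tool for printing the tree is at hand, a few lines of script will do. The following is a minimal sketch in Python, assuming the source lives in an ordinary directory on disk; the function name directory_tree is made up for this example.

```python
import os

def directory_tree(root):
    """Return the tree under root as a list of indented lines."""
    lines = []
    root = os.path.abspath(root)
    for current, dirs, files in os.walk(root):
        dirs.sort()  # sorting in place makes os.walk visit subdirectories in order
        rel = os.path.relpath(current, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        lines.append("  " * depth + os.path.basename(current) + "/")
        for filename in sorted(files):
            lines.append("  " * (depth + 1) + filename)
    return lines

if __name__ == "__main__":
    # Print the tree of the current directory; leave room for notes when printing.
    print("\n".join(directory_tree(".")))
```

The output is plain text, so it doubles as the starting point for the electronic structure document: paste it into a file and write your one-line notes next to each directory.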

2. Gather the requirements, bug reports and ideas for the code

Ask around, take a look at mail archives and whatever you have. Try to find out what's wrong with the code, what kind of structural issues have been bothering people, how the application as a whole should develop and so on. Particularly, try to identify the "can-of-worms-bugs" - the ones which everybody constantly talks about but which never get fixed. Understanding the needs is important even if you're not planning on massive developments, as it often also relays information on how people actually use the code.

If you have a decent bug reporting system and people actually use it, you're going to have an easy time here.

3. Review the codebase

This one takes time, but it's worth it. Go through every class and every method (perhaps not every line, but almost) of the code. Try to identify the major problems you found during the last step on the source code level. If you've heard explanations on why something is difficult to fix, try to understand the reasoning yourself. Make notes to support your memory.

Try to spot patterns. This is very hard unless you're an experienced programmer, but do it anyway. Try to identify the sorts of operations that repeat themselves throughout the codebase. Pay particular attention to these: Is the code required for a frequent task readable? Is it error-prone? Is it easily expandable? For example, if your frequent event is "open a database connection", how do you do it? Do you pass DB connection info and credentials around every time? What if you need to pass a special timeout value - could you do it? Is exception handling done the same way every time? Is it done in any reasonable way?
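
As an illustration of what a healthy answer to those questions can look like, here is a minimal sketch in Python with sqlite3 (the codebase discussed in this post is C#, so treat this purely as an analogy). The helper name open_connection and the default timeout are hypothetical. The point is that connection details, the special timeout and the error handling live in exactly one place.

```python
import sqlite3
from contextlib import contextmanager

DEFAULT_TIMEOUT_SECONDS = 5.0  # hypothetical project-wide default

@contextmanager
def open_connection(database=":memory:", timeout=DEFAULT_TIMEOUT_SECONDS):
    """Open a connection, commit on success, roll back on error, always close."""
    conn = sqlite3.connect(database, timeout=timeout)
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()

# A caller that needs a special timeout passes it in; nothing else changes.
with open_connection(timeout=30.0) as conn:
    conn.execute("CREATE TABLE notes (body TEXT)")
    conn.execute("INSERT INTO notes VALUES (?)", ("reviewed Upload.cs",))
    print(conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0])
```

If the code under review instead repeats the connect/try/close dance in every method, each with slightly different error handling, you have found one of the patterns worth writing down.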

If you have special expertise, you can use this review round to spot other issues: security flaws, performance bottlenecks, globalization concerns, whatever. But! Don't make the mistake of thinking you'd find all the problems - for example, an exhaustive search for security flaws cannot be combined with an introductory review of an existing codebase. The same goes for most other non-mechanical hunts for code issues.

Allocate sufficient time. For me, it takes an hour to effectively go through 20-50 k of C# code, depending on the complexity of the operations involved. Your speed will vary a lot based on your experience and working habits. I repeat: This part takes ridiculous amounts of time. However, it will provide you with knowledge that considerably helps you in the next steps.

4. Build the code

If we lived in a perfect world, this task would always be trivial. Automated build environments should turn this task into a no-op (by forcing the code solution/project/package to be very independent of any configuration), but it rarely happens. Even with a decent autobuild, there's often some work required in setting up your personal build environment. Be analytic. Why the requirements? Could they be removed? Is the process of doing a clean build straightforward enough?

Although the process of creating a build isn't particularly strongly related to the code itself, the inter-package relations (such as those implied by project references in Visual Studio solutions or whatever your build environment has) tend to become more clear by looking at the build process. Also, if you don't have a module dependency graph (which modules require which parts to build), draw one. Any format is acceptable as long as it's accurate. Again, verify that any existing documents are up-to-date before relying on them.
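
Once the dependencies are written down, even as a plain dictionary, a script can check them for cycles and derive a valid build order. This is a minimal sketch using Python's standard graphlib module (Python 3.9 or later); the module names are invented for the example and do not describe any real product.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical modules: each key lists the modules it needs in order to build.
dependencies = {
    "WebSite": {"Publishing", "Common"},
    "Publishing": {"DataAccess", "Common"},
    "DataAccess": {"Common"},
    "Common": set(),
}

def build_order(deps):
    """Return modules in an order where every dependency precedes its users."""
    return list(TopologicalSorter(deps).static_order())

try:
    print(build_order(dependencies))
except CycleError as error:
    # A cycle means two modules need each other to build: worth a note of its own.
    print("Circular dependency:", error.args[1])
```

A cycle reported here is usually a more interesting finding than the build order itself, since it tells you which modules cannot be understood or built in isolation.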

If the build produces warnings, note them. See if you can figure out the bad practices behind them. If the build produces errors, you're in for a world of hurt. Find a way to fix the issues now or you'll regret it. Return here when you're done. If a breaking build is everyday stuff in your dev team and you can't convince them into changing the habit, go on. It can't stop you, but you're still going to suffer. You have been warned.

After this step, you should have a working understanding (not just knowledge!) of how the software is composed (the modules) and how the modules themselves work (from the code review). You could've built the software earlier - actually, most of us do it as the first step. That may work, if you have a strong build environment and everything goes fine. But if you get errors and the build fails, it'll get frustrating. On the other hand, if you already know how the code works, it's likely that some exploration of the build errors will become a good learning experience.

5. Get a feel for the development

Pick the most trivial of the issues you identified in step 2. Fix it. Make sure everything works thereafter. Repeat a few times, depending on the complexity of the bugs. The purpose of this exercise is not to enhance the product, but to provide a better understanding of how the software gets developed. If the bugs you fixed were more than typo fixes, you should've written or changed at least a few dozen lines of code. Get somebody from the old dev team to review your changes so you'll get feedback.

The more sophisticated your build environment is, the more there is to learn in this phase. Take a look at all the metrics your code changes produced. If you have code churn analysis or unit tests, there will probably be quite a few interesting reports to scroll through, perhaps even some tests to write.

Once you're past this phase, there's little mental virginity left in your head for this project. Therefore, this is the last possible moment of writing up the ideas you got during the process. What frustrated you? Which parts of the code looked most dubious? Which patterns and practices felt uncomfortable? Raise discussion. Propose better approaches. File bugs. You won't have the same edge later on. Beware of the cynicism that naturally comes from the senior members of the development team.

Next up: Jouni's five steps to fixing the issues found in this process ;-) (no, not really, but I _will_ try to post more notes later on)

Posted by Jouni Heikniemi at 08:54 AM | Comments (2) | Misc. programming

March 20, 2005

Http file upload with parameters

An update to my old code sample for HTTP uploads: you can now post form variables with the files. The new version of Upload.cs contains a method overload that allows you to pass in a StringDictionary of POST parameters. Some basic instructions for use are available at the old post. This file will replace the current Upload.cs in JHLib - until the next release, you can also take a look at the current version of the library for some usage examples of the upload code in general.
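
For readers curious about what "form variables with the files" means on the wire, here is a minimal sketch of a multipart/form-data body in Python. It is not a port of Upload.cs and makes no claim about how that class works internally; the function name encode_multipart and its argument shapes are invented for the example, and it does not escape quotes in names or filenames.

```python
import uuid

def encode_multipart(fields, files):
    """Build a multipart/form-data body.

    fields: dict of form-variable name -> string value
    files:  dict of form-field name -> (filename, content_bytes)
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form variables: one part each, with only a name.
    for name, value in fields.items():
        part = ("--%s\r\n"
                'Content-Disposition: form-data; name="%s"\r\n\r\n'
                "%s\r\n" % (boundary, name, value))
        parts.append(part.encode("utf-8"))
    # Files: same structure, plus a filename and a content type.
    for name, (filename, content) in files.items():
        header = ("--%s\r\n"
                  'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
                  "Content-Type: application/octet-stream\r\n\r\n"
                  % (boundary, name, filename))
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # The closing delimiter carries two extra dashes.
    parts.append(("--%s--\r\n" % boundary).encode("utf-8"))
    return "multipart/form-data; boundary=" + boundary, b"".join(parts)

content_type, body = encode_multipart(
    {"title": "Report"}, {"file": ("a.txt", b"hello")})
print(content_type)
print(body.decode("utf-8"))
```

The returned content type goes into the request's Content-Type header and the bytes become the POST body; form variables and files differ only in the headers of their parts.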

Posted by Jouni Heikniemi at 11:20 AM | Comments (11) | .net

Blog comments rather dead for a few days

Uh... I accidentally blacklisted the string ".." on 16th March and didn't realize it until yesterday night, so any blog comments having two consecutive full stops got flagged as spam and rejected. Sorry (and thanks to Max for notifying me of this)!

I could go into a lengthy rant on how software should try to keep the user from making such mistakes, particularly as this one was added by MT-Blacklist's automatic blacklist addition... but since it would sound like trying to pass the blame, I'll just shut up and suffer. Rule number one: Never confirm without reading what you're confirming.

Posted by Jouni Heikniemi at 07:48 AM | General

March 19, 2005

How my review requests vanished

When I left for my vacation almost three weeks ago, I had several review requests for Bugzilla patches pending. And this morning, when I finally set out to handle them all, they had vanished. Admittedly that's rather easy for me, although I wouldn't have minded doing some either. I did take one from the "from the wind" queue, and it was wondrously easy :-)

Slightly related: MaxKA filed bug 286822 ("Process Problem: Review From the Wind Never Happens") recently, explaining the issue that reviews never get done unless you choose the reviewer right - and choosing the right one requires extensive knowledge not publicly available for any new Bugzilla contributor. Max's effort of compiling a list of reviewer competencies is worthy, but the list will be difficult to maintain. Still, I can hardly imagine the situation would be worse than it is now.

One thing about "From the Wind" (requesteeless) review requests is that they often represent the most unreadable patches we ever get (all this is IMO, of course). The list tends to have a load of patches from people with no Bugzilla dev experience who have just hacked something up for their site and then posted the initial patch. Rather often, when you deny review, the patch vanishes forever as the author doesn't care enough about getting the code integrated. That said, it's exactly this group of Bugzilla users we should pay more attention to, as they do provide feedback in a way more constructive manner than the people just complaining about missing features.

However, taking into account the rather high quality requirements for new Bugzilla code, it'll be a hard task for a new developer to get his code in if the only support he gets is in the form of reviews (which can be great support at times, admittedly). It would be great to see the efforts behind Max's list evolve into a sort of tutoring program where special attention would be paid to writing clear and instructive review comments (unlike the vague and offensive ones we sometimes do now ;-)) and providing guidance even in the writing phase of the patch.


I should've said this quite some time ago already, but here goes: My sincere thanks to the "new" (well, it's certainly relative) group of developers that have shown up on the Bugzilla scene. It's great to see the product evolve even though most of the old team has vanished from sight. It's great to cvs up one's test installation every now and then and see the changes trickle in. Many great things have been done. MaxKA, LpSolit, Travis, Wurblzap, Wicked, Glob and everybody else I forgot to mention, keep it up! Perhaps your efforts will make my conscience throb painfully enough to force me to actually code something one day. :-)

Posted by Jouni Heikniemi at 08:03 AM | Comments (1) | Bugzilla

March 18, 2005

Uh-huh

The Boot Camp has been survived - I'm still alive and now equipped with a fancy diploma, a hefty stack of hands-on lab exercises, a little bit of new knowledge and not much else. It would be unfair to say I was disappointed by the course - but pretty close. Two days of a new platform, and all they did was show us a bunch of demos and make us work on some (admittedly really good) exercise tasks.

Trying to keep the rant short: It's ridiculous to just pour out as much information as you can during a two-day course. It'll never sink into anyone's head in that time. When people don't learn, they'll blame themselves ("I guess I was told that, I just didn't understand"), so it's basically a good deal for the business. But how about spending some more time actually teaching the issues and making sure people really remember something after the course is over? What's the worth in stating "ASP.net 2 has a provider architecture" if people have no idea of its impact on development practices?

Knowledge is relatively easy to distribute, understanding far less so. If I go and get tutored by professionals, I'd really expect more than just recitals of press releases and published tutorials. But here at work again, I'm still facing the same basic problem: Knowledge does very little as long as you don't truly understand how a feature should be used. Dabbling in new technology is cool, but making it work in a production application requires so much more. Hey... It's called experience. Too bad only few educators are ready to share theirs.

To avoid being totally negative, many Whidbey-related things handled (well, "mentioned" would be more accurate) during the course make great subjects for some blog posts. Also, I finally look forward to clearing my task backlog during the weekend. I'll be back.

Posted by Jouni Heikniemi at 07:43 AM | Comments (0) | General

March 13, 2005

Converting your girlfriend into a CSS zealot

Can't be done? Think again! Go check out John Allsopp's blog and read his girlfriend's confession on sexually transmitted CSS knowledge, the effect of accessibility arguments and all the related stuff (now you couldn't help getting curious, right?). Categorized as "funny, worth the 10 minutes".

Apart from being funny, the story does remind us of certain facts: First, things such as standards often look pretty distant and irrelevant. They may be, until you come face-to-face with them the first time. And the time will come, for all of us. After that, you'll see the flaws everywhere. No wonder the attitude towards standards is often either complete ignorance or religious zeal. There are no shades of gray.

Second, simple arguments often make the difference. The tables versus CSS debate has been hashed out often enough to wear out most of the key points made by the CSS camp. But I still love that PDA/cellphone point. It's so irresistible. It has a certain charm - in spite of the fact that most modern sites never lure a single PDA visitor before their next redesign. Somebody ought to craft a "How to convince your manager of CSS benefits" discussion flow chart. I'll bet a pre-written script would work just fine 90% of the time.

Thanks for the original link go out to Heidi.

Posted by Jouni Heikniemi at 09:32 PM | Comments (0) | Web

Up next: Boot camp

I'm going to spend tomorrow and Tuesday on a course called ".NET 2.0 Boot camp" which should pretty much cover everything Whidbey in two days (20 hours total). Of course, the truth is that only a handful of new features will be decently explored, but I look forward to it nonetheless. While playing with Whidbey's new class library is quite doable at home, try setting up a Visual Studio Team System test environment with all three required servers. Nah... This is where I love people setting up the sandbox for me. ;-)

VSTS is definitely the part I most want to hear about. I haven't much touched it so far, but in 48 hours I expect to be considerably more knowledgeable. VSTS certainly marks a change in our set of possibilities - the only question is: is it too late for most of us? The industry has already selected other source control applications, work item trackers, unit test frameworks and whatnot. It'll be interesting to see if VSTS can win our hearts. For new businesses, it will certainly provide many interesting alternatives for arranging the daily processes.

If there's still anything left of me after this, I'll be sure to post a recap.

Posted by Jouni Heikniemi at 08:00 PM | Comments (0) | .net

March 12, 2005

Domain specific languages

A colleague of mine pointed me at the presentation slides of Microsoft Architecture Days held during my holiday. Of particular interest for me was the presentation on Domain Specific Languages (DSL). The slides are in English, but the presentation itself (available as a video clip) is in Finnish. Although I had heard of DSL before, this was my first real introduction to the theme - and a good one!

In case you don't know what DSM (Domain Specific Modeling) is, here's a very short recap: It's all about crafting a modeling tool (usually a visual one) for your specific need. It's different from UML-based modeling tools in several aspects. First, DSLs discuss the problem, not the solution. Instead of modeling with code concepts such as "This is a class", you model with domain terminology along the lines of "This is a login dialog". You then need a custom implementation - a code generator turning your model into a target language such as C# - for your modeling objects.

Second, the DSL has the semantics and behavior of your domain; if you're building a banking application, you use the terminology from the banking world. No, not just that; you use terminology from your banking application. This is different from most CASE constructs where the modeling language attempted to be universal, naturally killing much of the customizability. In the DSL world, you control every aspect of it. There won't be a 3rd party vendor stating that your dialog components must look like this, and the generation process can produce any artifacts you desire, ranging from compilable source to resource XML files and whatever.

Third, successful DSL cases have a common characteristic: All code is generated from the DSM, and there is no need for roundtripping (reverse engineering your code-level changes back into the model). You don't customize the end-language result - just as you don't hex edit the exe files produced by your compiler. You customize the generator to produce the right results.
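
To give the idea some shape, here is a deliberately tiny sketch in Python. It is not taken from the presentation or from any real DSL tool: the model format, the field kinds and the generated class layout are all assumptions made up for this example. The model speaks in domain terms ("a login dialog with these fields"), and the generator alone knows about the target language.

```python
# A hypothetical domain model: it says what the dialog is, not how it is coded.
login_dialog = {
    "dialog": "Login",
    "fields": [
        {"name": "UserName", "kind": "text"},
        {"name": "Password", "kind": "secret"},
    ],
}

# The generator owns all knowledge of the target language (C# here).
# Both kinds map to TextBox in this sketch; a real generator would also
# emit masking for "secret" fields.
CONTROL_FOR_KIND = {"text": "TextBox", "secret": "TextBox"}

def generate_csharp(model):
    """Turn a dialog model into C# source text."""
    lines = ["public class %sDialog : Form" % model["dialog"], "{"]
    for field in model["fields"]:
        control = CONTROL_FOR_KIND[field["kind"]]
        lines.append("    private %s %s = new %s();"
                     % (control, field["name"], control))
    lines.append("}")
    return "\n".join(lines)

print(generate_csharp(login_dialog))
```

If the generated class needs to look different, you change generate_csharp and regenerate everything; nobody edits the output by hand, which is why no roundtripping is needed.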

Thus, domain specificity allows both a higher level of abstraction and more customizability. The drawback is, of course, that you need to tailor a language and a code generator for your purpose, and this is often a non-trivial task. It's not a particularly new issue though; for example NetHack has had a map-generation language implemented with lex/yacc for a long time. Most of us wouldn't touch those monsters though, so it's all about tools. Luckily, there's going to be DSL support in Visual Studio Team System - one more reason to wait for Whidbey. There are plenty of modeling tools for other platforms, so it's not a .NET thing.


So how does this affect my daily job?

The need for higher abstraction - describing problems instead of struggling with code concepts - is everywhere. Right now, we're resolving much of the issues by trying to build sophisticated class frameworks to support the development needs. It's an important path to take, but it certainly falls short. One of the problems is that as long as we work on the code level, we cannot avoid some of the necessary evils: even the best classes need instantiation, using directives, method calls, event handlers and whatnot. Another one is that even with a great framework, a lot of code is still needed for a complex application. Unless extreme discipline for code management is used, an application with a lot of code is rarely very readable. Also, domain-level problems ("Calculate the shipping costs for X") tend to vanish in the depths of code (lines and lines filled with logic such as "ShippingCostManager.GetWeatherVariables(x, WeatherController.CurrentWeather)").

Be it web page design, Bugzilla development, business reporting applications or whatever, this need for higher level modeling exists for most situations. DSM won't solve all the problems, and crafting your own language probably won't be the right way for simple one-shot applications. But for complex product families and software with long lifecycles, DSM has been reported to give a 3-10 times increase in productivity. It's certainly a promise worth more investigation - the companies behind those sorts of assertions were Nokia and Lucent, among others (see the presentation slides for more details).

For the working environment, DSLs could change quite a few things. It takes the expert programmers and architects to build the language, but thereafter much of the development is doable even with lesser technical skills. Used correctly, this might work as a way to get the client's domain expertise to a much better use. Possibly, it might even allow software companies to empower the visionary people with lesser technical ability. It can dramatically reduce the effort needed to guide new employees into the development; a problem-oriented development approach hides much of the technical details inherent in all complex applications. It could help architects to enforce the patterns they now distribute as documents and presentations; with a DSL, you probably couldn't do the wrong things.

It would be irresponsible to conclude such a positive post without a pessimistic note. There is no silver bullet. But objectively taken, NetHack's map generation language certainly makes much of a difference in creating action-packed special levels for the game. There, the small DSL used is an enormous time saver. If the language generation tools advance well beyond the infamous lex/yacc pair, the initial cost of using a DSL could drop drastically. The concept of high abstractions and custom languages makes most architects wet their pants in excitement. The need for this technology is evident. As I already said, it's all about tools.

Additional info: DSMForum.org, Microsoft DSL Tools page.

Posted by Jouni Heikniemi at 08:56 AM | Comments (0) | Misc. programming

March 11, 2005

Slowly returning to life

The winter holiday season is slowly coming to its end, and perhaps not a moment too soon. Not that I didn't enjoy all the traveling and stuff - I certainly did and the time off was great - but the backlog is getting ridiculous. My Bugzilla review queue is at its all-time high, there are about 150 reasonably important emails to be handled and quite a few other things to do before I'm back in normal operation.

A few quick notes from my holidays (and the books I read):

Pelican Brief was decent. I still need to read a few more books from the Clancy/Grisham/etc. clan to actually be able to rank them, but I look forward to it. Some light literature was a nice break from the routine. Well, how about the more routine IT stuff then?

Writing Secure Code is a must read for everybody in software development. Despite being Windows-oriented, many of the concepts described in the book contribute to a healthy generic understanding of many security issues. Fully reading and understanding the book would take weeks, but even a few hours of browsing it will give you many good ideas on improving the security and privacy aspects of your application. Definitely recommended.

UML Distilled, 3rd Ed. was another interesting read. I wouldn't embrace it as warmly as I did the security book, but it's a good read nonetheless. The innate heaviness of UML gets thrown away on the introductory pages, and the book itself is a critical overview of the most useful elements of UML. Well, that's certainly something different from the usual reference tomes. If you already know how to model things and want to do it in UML, read this. If you don't, you'll need additional guidance - this is not a "learn how to model using UML in 21 days" book.


And finally, a realization dawned during one particularly nice and sunny winter day: I really should blog more. It's not about being able to write more things for you to read, but rather about having the reason and the context to think and re-think the daily dilemmas in life. I spent years convincing myself that thinking is actually work at its best, and it's perfectly fine to spend time just mulling things over (instead of producing an instant concrete result). Next I need to convince myself that thinking tools - such as this blog - are an equally acceptable way of promoting happiness and efficiency.

Posted by Jouni Heikniemi at 08:51 PM | Comments (0) | General