April 24, 2005

Paving the road for CSS layouts

Not like it's really news anymore, but the IE team published some details on IE7 beta 1. Oh all right, they only said it'll have PNG alpha support as well as CSS compatibility fixes, but at least that's something the web community wanted to hear.

Putting aside the discussion on whether the new version of IE will win back converts from Firefox and Opera, I find the most interesting aspect to be the new possibilities for CSS layouts. They are quite doable already, but with IE's enhanced support, even the most stubborn table-defenders are going to have to reconsider their position. With the bugs gone, CSS layouts will no longer be the tricky magic they used to be. Rather, they'll hopefully be an everyday alternative for projects ranging from trivial to extremely complex.

One more important aspect: The slow browser upgrade/adoption rate has traditionally been a hindrance to the evolution of browsers. I believe this particular problem is subsiding. Here the key factor is called automatic updates. Predictions are dangerous, but I'll make one: When IE7 is out, most of IE6 will vanish in a relatively short period of time. We'll then have a set of IE5.x browsers (and some IE6s too, of course) running on older platforms with less aggressive update schemes than Windows XP - and a majority of users with IE7 or Firefox. Now guess which family of browsers is going to be the next Netscape 4?

Posted by Jouni Heikniemi at 09:38 AM | Comments (0) | Web

April 16, 2005

Localizing linked images in Word (and horrible interfaces)

A couple of weeks back, I was struggling with a set of Word documents that had external links to images (equivalent to the HTML img element). I desperately needed to get the image links broken and the binary image data included in the document. There were thousands of documents, so an automated solution was required.

Now, what's wrong with the following Word automation code (pseudocode, but it's the same in C#, VBA and whatever)? More specifically, why did it make the images vanish? For the record, practically the same code has been seen to work elsewhere.

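// Deliberately shows the bug discussed below: nothing waits for the download.
foreach (Shape s in document.Shapes) {
  if (s.LinkFormat != null) {
    // In Word's object model both members live on the LinkFormat object.
    s.LinkFormat.SavePictureWithDocument = true;
    s.LinkFormat.BreakLink();
  }
}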

It took us quite a few hours to figure this out, so I'll spill it for you. It's all about delays. When the links point to an HTTP URI, setting SavePictureWithDocument to true seems to start a background thread that actually retrieves the images from the remote server. But since the download takes some time (albeit very little), BreakLink is run and the document closed before the download completes. Thus, the images disappear.

What made debugging worse was the fact that the code worked well when stepped through in a debugger: the delays of me hitting the Step button were enough for the HTTP download to complete. I believe you can imagine the confusion until we figured this out. In the end, though, the ugly hack was easy: just add a short delay (we used two seconds) before the BreakLink call. The correct fix would've been significantly more complex, so I never got that far for a one-shot application.
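The race described above can be reproduced in a small simulation. This is only a sketch in Python, not Word automation code: the LinkedPicture class, its method names and the wait_for_download helper are all hypothetical stand-ins, and nothing here implies that Word exposes a similar way to wait for the download.

```python
import threading
import time


class LinkedPicture:
    """Toy stand-in for a linked picture; not the real Word object model."""

    def __init__(self, download_seconds=0.2):
        self.download_seconds = download_seconds
        self.embedded_data = None
        self.link_broken = False
        self._download = None

    def save_picture_with_document(self):
        # Start the fetch in the background and return immediately,
        # which is the behavior the post suspects Word of.
        self._download = threading.Thread(target=self._fetch)
        self._download.start()

    def _fetch(self):
        time.sleep(self.download_seconds)  # simulated HTTP latency
        if not self.link_broken:
            self.embedded_data = b"image bytes"

    def break_link(self):
        self.link_broken = True

    def wait_for_download(self):
        # Hypothetical helper: block until the background fetch is done.
        if self._download is not None:
            self._download.join()


def localize_racy(picture):
    picture.save_picture_with_document()
    picture.break_link()         # runs long before the download finishes
    picture.wait_for_download()  # only so the demo can inspect the result
    return picture.embedded_data


def localize_waiting(picture):
    picture.save_picture_with_document()
    picture.wait_for_download()  # let the image arrive first
    picture.break_link()
    return picture.embedded_data
```

In this toy model localize_racy loses the image while localize_waiting keeps it. A fixed two-second sleep is a cruder version of the same idea: it works only as long as every download happens to finish within the delay.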

Now, which part of the SavePictureWithDocument or BreakLink API documentation warns us about this behavior? You guessed it: None whatsoever.

Most coders aren't particularly familiar with threading and parallel computing issues. Any API that starts background threads without explicit request from the programmer is a risk to software stability. To some extent I can understand this in Word's case, but still, the behavior of the API is far from optimal. If the user has just set SavePictureWithDocument and then calls BreakLink, how likely is it that the user just wanted to trigger the HTTP request but not use its results? BreakLink could quite well block in this case until the requests were done (although granted, this might be an issue with serious server lag). At any rate, a documentation entry on these issues would be highly welcome.
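The suggestion that BreakLink could block, together with the server-lag caveat, might look roughly like this from the API implementer's side. Again, this is a Python sketch under assumptions: the BlockingLinkedPicture class and the timeout parameter are invented for illustration and say nothing about how Word is actually built.

```python
import threading
import time


class BlockingLinkedPicture:
    """Toy model of an API whose break_link waits for a pending download."""

    def __init__(self, download_seconds):
        self._download_seconds = download_seconds
        self._done = threading.Event()
        self._done.set()  # nothing is pending yet
        self.embedded_data = None

    def save_picture_with_document(self):
        self._done.clear()
        threading.Thread(target=self._fetch, daemon=True).start()

    def _fetch(self):
        time.sleep(self._download_seconds)  # simulated HTTP latency
        self.embedded_data = b"image bytes"
        self._done.set()

    def break_link(self, timeout=5.0):
        # Wait for the pending download, but give up after `timeout` seconds
        # so that a lagging server cannot hang the caller forever.
        if not self._done.wait(timeout):
            raise TimeoutError("image download did not finish in time")
        return self.embedded_data
```

With this design the naive caller gets the image without knowing about the background thread, and a stalled server surfaces as an explicit error instead of a silently missing picture.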

Just another story from a land where lack of proper documentation (and surprisingly few hits from Google, too!) costs quite a few working hours.

Posted by Jouni Heikniemi at 08:54 AM | Comments (1) | Misc. programming

April 10, 2005

Code quality, part IV: Nameless abstractions are just complexity

I recently bumped into this:

foreach my $module ( grep( !/^Perl$/, $installed->modules() ) )
{
    if ($module =~ /DBD::(.*)$/) {
        $installed_DBD_modules{lc($1)} = 1;
    }
}

For the non-Perl-literate: The code block runs through the list of Perl's external code modules and collects the names of installed database drivers into a hash table. We now focus on the grep( !/^Perl$/, $installed->modules() ) part, which is the enumeration we're looping through. The modules() call (from an external code library) returns not just the list of installed modules, but also "Perl" itself. The grep statement is there to remove this entry, thus allowing the code inside the loop to concentrate on the external modules only.

I see the grep statement as an abstraction layer. It hides (or attempts to hide) the fact that the data source also returns Perl itself, not just external code modules. However, as it is implemented currently, I view the grep part as being nothing but unnecessary complexity: removing it wouldn't affect the result at all, as the in-loop regex wouldn't match "Perl" anyway!

This particular abstraction layer has one problem: It doesn't really abstract out anything, as people cannot understand its operation without parsing the line in their heads. Now, if the grep statement and its contents were pushed out into a method such as "GetInstalledModuleNames()", things would turn out entirely differently. Also, it would make the code more easily maintainable as changes to the filtering method wouldn't affect the call site (or perhaps several of them) anymore.
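The extraction could be sketched as follows. The sketch is in Python rather than Perl, and the function names and the sample module list are made up for illustration; the entries argument stands in for whatever $installed->modules() returns.

```python
import re


def get_installed_module_names(entries):
    """Return external module names, dropping the "Perl" entry itself."""
    # Counterpart of grep( !/^Perl$/, ... ) in the original snippet.
    return [name for name in entries if name != "Perl"]


def get_installed_dbd_drivers(entries):
    """Collect lowercased driver names from modules named DBD::<driver>."""
    drivers = set()
    for module in get_installed_module_names(entries):
        match = re.search(r"DBD::(.*)$", module)
        if match:
            drivers.add(match.group(1).lower())
    return drivers


# Hypothetical sample data, not taken from a real installation.
sample = ["Perl", "DBI", "DBD::mysql", "DBD::SQLite"]
drivers = get_installed_dbd_drivers(sample)
```

The loop now reads as "for each installed module" without the reader having to decode the filter, and a change to the filtering rule touches only get_installed_module_names.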

Therefore, naming is essential. Often, naming means extracting code into methods. Don't avoid it. It's not a sin to have a method that gets called only once. Neither is it bad practice to have a method with but a few - or even just one - lines of code. Code length is irrelevant; readability rules. Most complex expressions (regexes, heavy boolean conditions, non-trivial string concatenations etc.) are detrimental to code readability if embedded in a block of other code. Not so when they're well secluded in their own little methods.

Rule of thumb: Never mix trivial and non-trivial code in a single method. "GetInstalledModuleNames()" is always easy to understand, but its implementation may range from very easy (like here) to quite difficult (do the same in C++ for example!). The name is what makes a potentially complex operation trivial.

Posted by Jouni Heikniemi at 09:10 AM | Comments (4) | Misc. programming