August 31, 2004

.net Framework 1.1 SP1 out

Microsoft .NET Framework 1.1 Service Pack 1 is out. It's available in two variants: one for Windows 2000/XP and another for Windows Server 2003. Too bad the knowledge base links on the download pages don't work, so the fixes listing is a mystery so far. The overview says: "The primary focus of Microsoft .NET Framework 1.1 Service Pack 1 (SP1) is improved security. In addition, the service pack includes roll-ups of all reported customer issues found after the release of the Microsoft .NET Framework 1.1. Of particular note, SP1 provides better support for consuming WSDL documents, Data Execution prevention and protection from security issues such as buffer overruns."

Oh, and there's an SP3 for .net Framework 1.0, too. Right here.

Edit next morning: The KB articles are out now and linked to the download pages. The Win2000/XP list for 1.1 SP1 contains 66 bugs. I never encountered any of them, but there are a few pretty serious ones included.

Posted by Jouni Heikniemi at 10:44 PM | Comments (0) | .net

August 29, 2004

Bugzilla and bug numbering per product

Every couple of months somebody pops up in the netscape.public.mozilla.webtools newsgroup and asks if he can make his Bugzilla number bugs so that the first bug for each product is #1. Most of the time the reasoning goes along the lines of "If a customer files bug 100 now and bug 200 a week later, he'll think the product is awfully buggy." Often, it continues with "Our bosses want the numbering to work that way".

Here are some of the recent threads: 1, 2, 3, 4, and here are some of the common answers:

Short and sweet: You can't do that with Bugzilla (unless you make significant changes to the codebase), and the team is not planning to implement support for that in the foreseeable future. If you absolutely must do it, have several installations of Bugzilla or be prepared for heavy-duty Perl hacking. That's really painful, I know. I'll start caring if you can convince me I'm wrong in what I say next.


The longer and even sweeter answer spans the rest of this blog post.

Any conclusions drawn from bug numbers are based on misconceptions. The only conclusion you can safely draw is that bug 2 was filed after bug 1 and before bug 3, but that's it. If you know the bug reporting patterns and habits of the organization using Bugzilla, you can make educated guesses. Unless you know them, forget it. Bug numbers do not relate to bug counts in any relevant way.

Bugzilla has an extremely flexible query engine for a reason. You can use it to set up queries such as "Non-enhancement bugs in product X". Queries of that sort are considerably better at measuring bug counts than pure bug numbers. You should tell your customers and bosses this. If they don't believe you, try again after a while.

If they still don't believe, I have this one tip for you: Read the manual page on MySQL's AUTO_INCREMENT and make your bug numbers start from an arbitrary number with at least 7 digits. That greatly reduces the possibility that anyone would even consider them to represent any sort of counts. It's a surprise if they even notice any relation between the numbers of different reports.

Are fewer bug reports better?

The desire to control bug numbering is often a sign of more serious attitude problems related to software quality management. Before you address the question of bug numbering in your organization, consider the much broader question of bug filing culture first.

"Bugs" in Bugzilla are actually "work ticket suggestions". While they can be real bugs, they can also be enhancement requests, project management tokens or even help requests. Or duplicate reports thereof. In the grand scheme of things, different products are just another source of randomness here. Should enhancement requests also get their own numbering?

The purpose of a bug number is to act as an identifying token that can be used as a reference both inside and outside the bug reporting tool. Binding the bug number to some of the bug's characteristics has great inherent potential for confusion. The most obvious problem scenario arises when you have to move a bug between products. If you're absolutely convinced that will never happen, you're either wrong or going to have little problem using separate Bugzilla installations anyway.

Sometimes - surprisingly often - a very fine-grained bug filing culture is best for the development. At my previous job we had a few projects where bugs were filed in great numbers. This wasn't an indication of the product's bugginess but of its complexity: The more the number of code lines grew, the more importance was placed on properly designing and splitting any task. The simpler the work items were, the easier it was to make sure they were done correctly. Usually that meant more small bugs instead of a few big ones.

If your organization is struggling with bosses and customers that talk about needs for product-specific bug numbering, it is likely you still don't have real experience with systematic bug tracking. Don't make too many assumptions about your future bug-filing culture. While some organizations only add bug reports provided by customers, it's quite possible for a team to suddenly consider it beneficial to file their internal findings (usually tracked as TODO/FIXME code comments) in Bugzilla. If that happens, you have a choice to make: Do we allow developers to use this tool even if it makes the bug count grow? If not, what do we gain?


How do you want your bugs served - reported or hidden?

As a final word of caution, I'll say this: Even a query with the finest filtering can't usually replace professional human evaluation of the bugs listed. In the end, nobody cares about bug reports - it's bugs that people loathe. Take company A with a motivated testing team and an open bug policy. Take company B with the same product, no testing department and no centralized bug listing at all. B might have report counts that look much better than A's, but the illusion won't last long - A's product is bound to be better. Even if B suddenly got a testing department equivalent to A's, it would likely deliver a worse product - i.e. one based on less-informed decisions. The number of bug reports correlates with the amount of information, not the number of bugs.

If you strive to manipulate your visible bug material, are you doing it for the right reasons? Are you using the right methods? Closing your eyes or requiring certain severity for bugs "published" in Bugzilla doesn't make the issues go away. If you can't let your paying customers have an honest view of your product's state, forget about letting the customer have any access to your bug database and spend the resources on making your product better.

Don't shoot the messenger, shoot the bugs.

Posted by Jouni Heikniemi at 07:23 PM | Comments (2) | Bugzilla

Gmail is (nothing but) a good webmail

The people at Google seem to have a golden touch. They create a positive media hype with almost everything they do, including Gmail.

Every now and then they have lived up to the hype. But having used Gmail for some time now, I'm saying the biggest cool thing in it is the Google logo. Ok, you have decent threading and a lightning-fast search. However, I have yet to see the person who actually uses webmail with enough mail to significantly benefit from these features. The Gmail UI is good, but it's not good enough for me to switch over - I still need the richness of a native GUI app. Gmail's famous JS hacks are evolutionary, not revolutionary.

But to be honest, I don't think I've ever seen a better webmail. Gmail rocks, but it just doesn't rock enough to become an alternative to traditional desktop mail apps. Not for people like me.

Then again, if you're going to get a webmail account anyway and can get an invitation, I'll recommend Gmail anytime. Or, if you're a Linux geek, you'll certainly want it. Why, you ask? Two words: Gmail filesystem.

Posted by Jouni Heikniemi at 02:57 PM | Comments (0) | Web

August 27, 2004

Firefox as the default browser: opening issues

Firefox 0.9 has a bug that causes errors such as "The system cannot find the file specified" when urls are opened using the default browser (i.e. by just starting a new process with the url as the program name). If you're using .net, System.Diagnostics.Process.Start("http://www.heikniemi.net/") nets you a System.ComponentModel.Win32Exception from System.Diagnostics.Process.StartWithShellExecuteEx. Scary, huh?
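If your application opens URLs this way, it may be worth guarding the call so a misbehaving default browser can't crash you. The following is only a minimal sketch: the UrlLauncher class, the TryOpen name and the injectable launcher parameter are made up for illustration (the latter exists just so the failure path can be exercised without a browser). Whether the page still opens when the exception is thrown is not something this code checks.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;

public static class UrlLauncher
{
    // Opens a URL with the registered default browser.
    // Returns false instead of throwing when the shell reports a failure.
    public static bool TryOpen(string url)
    {
        return TryOpen(url, StartWithShell);
    }

    // Overload with an injectable launcher (hypothetical seam, for testing only).
    public static bool TryOpen(string url, Action<string> launcher)
    {
        try
        {
            launcher(url);
            return true;
        }
        catch (Win32Exception)
        {
            // This is the exception type the post describes for the failing case.
            return false;
        }
    }

    private static void StartWithShell(string url)
    {
        ProcessStartInfo info = new ProcessStartInfo(url);
        info.UseShellExecute = true; // let the shell resolve the default browser
        Process.Start(info);
    }
}
```

Calling UrlLauncher.TryOpen("http://www.heikniemi.net/") then gives you a bool to act on instead of an unhandled exception.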

Luckily it's just a Firefox bug, bug 246078 to be exact. It has been fixed, but at the moment no release available has the fix (apart from nightly builds). There is a 3rd party registry hack that fixes the issue.

Slightly related, Firefox has recently started opening external URIs in new windows. This is being discussed in bug 172962, and right now it looks like Firefox 1.0 is going to have a user option which allows you to select the behavior between "Last active window/tab", "New tab in last active window" and "New window". Yay!

Posted by Jouni Heikniemi at 05:24 PM | Comments (0) | Web

August 22, 2004

.net String vs. StringBuilder - concatenation performance

Most people have a gut feeling about when to use StringBuilder for concatenation and when to just add strings together with the + operator. But what are the exact situations in which each of the approaches is better? When the question gets asked, people often give out overly simple rules such as "5 catenations". Is that really correct for the vast majority of cases? Of course, being the dubious me, I decided to test it and resolve the question once and for all.

The basic setting is this: StringBuilder.Append is faster than String + String. However, new StringBuilder() requires time. Now the question is: How many Append calls are required to have the speed benefit exceed the construction cost of the StringBuilder? Ideally, the answer would be just one magic number. Unfortunately, in practice it isn't.

Here are the simplified conclusions. They shouldn't be taken literally, because situations vary and there's a code readability issue as well (most people read String + String more easily than sb.Appends). Regardless, for most cases these rules do provide the correct answer from a performance perspective.

  • If you have no idea of the resulting string size, use StringBuilder if you have at least 7 concatenations.
  • If you can roughly (within about 30%) estimate the resulting string size, use StringBuilder if you have at least 5 concatenations.
  • If you can estimate the resulting string size with good accuracy, use StringBuilder if you have at least 3 concatenations.
  • Under no conditions is StringBuilder faster for fewer than 3 concatenations.
  • StringBuilder beats strings for 10+ concatenations in every practical situation.
  • The longer the strings are, the more final string size estimations will help you (but accuracy becomes more critical).

I don't expect you to believe me any more than any other information source on the net. But to back up my claims a bit, I'll discuss the background of these results next.

How do string concatenations and StringBuilder work?

String objects in .net are immutable. Once the string has been created, the value can't be changed. When you type s = s + "foo";, you actually discard the old s and create a new string object containing the result of the concatenation. When repeated several times, you end up constructing many temporary string objects.
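To see the immutability in action, here is a small sketch (the ImmutabilityDemo class and its method name are made up for illustration). It checks whether the variable refers to a different object after the concatenation.

```csharp
using System;

public static class ImmutabilityDemo
{
    // Returns true if "s = s + suffix" left s pointing at a different object
    // than the original string. For a non-empty suffix this should be the case,
    // because the original string cannot be modified in place.
    public static bool ConcatCreatesNewObject(string original, string suffix)
    {
        string s = original;
        s = s + suffix; // allocates a new string; 'original' is left untouched
        return !object.ReferenceEquals(s, original);
    }
}
```

ImmutabilityDemo.ConcatCreatesNewObject("bar", "foo") returns true: the old "bar" object still exists unchanged, and s points to a brand new "barfoo".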

StringBuilder, on the other hand, represents a mutable string. The class itself contains quite a few methods to change the contents of the string. This includes appending new strings to the end - the most common operation by far. Internally, StringBuilder reserves a buffer of memory which is used only partially at first (usually). Concatenations that fit into the buffer are just pasted in and the string length is changed. If the new resulting string wouldn't fit into the buffer, a new buffer is allocated and the old contents are moved in. Either way, no new temporary string object is created for each append.

The sore points of StringBuilder are the construction cost (which makes the "magic number" practically always at least 3) and the cost of allocating a new buffer when the resultant string would exceed the current buffer size. The latter one explains why the preknowledge (or a good estimation) of the resultant string size helps so much: StringBuilder can just allocate a sufficient buffer once.

Running the performance tests

Testing this is actually pretty simple. Choose a string operation, implement it both ways and run sufficiently many iterations while measuring the execution time. There are basically two factors involved: the length of the strings being handled and the number of concatenations. Few real-world scenarios use a fixed number of concatenations with fixed-length strings, so a very realistic test case would do real-world concatenations. However, constructing such a scenario isn't easy as it tends to add non-string operations into the loops, thus messing up timing.

Pure string concatenation loops are very rare in any case, so even if you're able to speed up your string operations by 50%, it's very unlikely your software will speed up that much. The point here is this: if you want absolutely the best performance, measure it yourself - in your real-world scenario. However, a fair amount of testing on some of my applications has convinced me that the simple rules outlined above actually do hold up even with fairly varying material.

So, my test was essentially a loop of string concatenations with each iteration appending another string of predetermined length and content to a temp variable. I mostly varied the number of concatenations (iterations of the loop) to find out the cutoff point, but I also played with the string length. All tests were repeated 10 million times by an outer loop to provide better sampling. Everything was run on my AMD Athlon 2800+ with 1 GB of memory, XP Pro and .net Framework 1.1.

The following source snippet shows the basic versions of the testing loops:

// String version

string s2 = new String('x', Int32.Parse(args[0]));
int loops = Int32.Parse(args[1]);

for (int j = 0; j < 10000000; j++) {
  string s = "";
  for (int i = loops; i > 0; --i)
    s += s2;
}

// StringBuilder version

string s2 = new String('x', Int32.Parse(args[0]));
int loops = Int32.Parse(args[1]);

for (int j = 0; j < 10000000; j++) {
  StringBuilder sb = new StringBuilder();
  for (int i = loops; i > 0; --i)
    sb.Append(s2);
  sb.ToString();
}

The extra ToString call at the end of the StringBuilder version is there to level the field for the approaches: the first one's end result is a String, so it should be the same for the last one as well. Leaving that ToString out had a marginal effect on the results: while it did make an 8% difference with a single concatenation, the effect quickly died as the number of operations increased.

Finding the magic number

I started with 10-character strings, running from 1-50 concatenations (each repeated 10 million times as outlined above). The result is the chart below, displaying the relative execution times against the number of iterations (1-15). Absolute execution times aren't shown since they're hardly relevant.

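[Chart gr1.png: relative execution time against the number of concatenations (1-15) for pure String (blue line), default StringBuilder (red line) and StringBuilder initialized to the final size (green line)]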

The blue line is the performance of the pure String approach. It looks linear at first sight, but it isn't. If the String approach had to allocate space for X chars (where X is the length of the string being added, 10 here) per loop iteration, the time requirement would grow in a linear way. However, the amount of memory needed - and also, the amount of existing data being copied to the newly constructed string object - increases with every iteration. For Nth iteration, the String version allocates space for N*X chars. Thus, every iteration is slower than the previous one, and the String time curve steepens quickly as N grows.
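The growth described above can be put into numbers with a small sketch. It follows the simplified model from the text (iteration k allocates k*X chars), not the actual runtime, which may for instance skip the allocation when the left-hand side is still empty. The ConcatCost class and its method names are made up for illustration.

```csharp
public static class ConcatCost
{
    // Chars allocated by N repeated "s += piece" steps under the model above:
    // iteration k allocates k * X chars, so the total is X * N * (N + 1) / 2.
    public static long StringApproachChars(int concatenations, int pieceLength)
    {
        long total = 0;
        for (int k = 1; k <= concatenations; k++)
            total += (long)k * pieceLength;
        return total;
    }

    // Chars in the final result, i.e. the minimum any approach has to write.
    public static long FinalChars(int concatenations, int pieceLength)
    {
        return (long)concatenations * pieceLength;
    }
}
```

For 50 concatenations of 10-char strings, StringApproachChars(50, 10) gives 12750 chars allocated in total for a final result of only 500 chars, which is why the blue curve keeps steepening.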

The red line is StringBuilder at its basic settings. If you add a trendline, SB actually performs fairly linearly with increasing N*X. The bumps in the line are caused by the buffer allocations. Now, knowing how StringBuilder works in .net helps here: The default buffer size is 16 chars, and it's doubled each time it overflows. Remembering that X is 10 here, it's no big surprise that the bumps appear at 2 (after 16 chars), 4 (32), 7 (64) and 13 (128) iterations.
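The bump positions can be reproduced with a few lines of code. This is a model of the behavior described above (16 chars to start with, doubling on overflow), not a peek into the real class; newer runtimes may well grow their buffers differently. BufferGrowthModel and ReallocationPoints are hypothetical names, and the "jump straight to the required size if doubling isn't enough" rule is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

public static class BufferGrowthModel
{
    // Returns the 1-based append numbers at which the modelled buffer has to grow.
    public static List<int> ReallocationPoints(int initialCapacity, int pieceLength, int appends)
    {
        List<int> points = new List<int>();
        int capacity = initialCapacity;
        int length = 0;
        for (int i = 1; i <= appends; i++)
        {
            length += pieceLength;
            if (length > capacity)
            {
                // Assumed rule: double the buffer, or use the required size if that is larger.
                capacity = Math.Max(capacity * 2, length);
                points.Add(i);
            }
        }
        return points;
    }
}
```

ReallocationPoints(16, 10, 15) yields 2, 4, 7 and 13, matching the bumps in the red line.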

As you can see here, the first time the SB result is below the String result is at six concatenations. The memory alloc bump at 7 concats makes SB slower than pure strings once more, but after that the results are clear: even though the bump at 13 catenations is considerable, it's still well below the blue line. The exact figures aren't that relevant, though, since the bump locations are closely tied to the number of chars gathered so far. With most normal strings the cutoff point is somewhere between 4 and 8.

The power of estimations

The green line represents a StringBuilder initialized to the size of the final string (using the StringBuilder's int-taking constructor). As you can see, this is the fastest approach by a very clear margin. And, as you can see, the cutoff is at three catenations! The obvious drawback here is that you have to know the buffer size beforehand, which you usually can't do. For the cases where you do know it (such as this simple fixed-length scenario), it's blazingly fast. At 50 catenations with 10-char strings, it's 550% faster than pure String-based catenations and 35% faster than an uninitialized StringBuilder. The differences tend to grow as the size of the data increases.
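In code, pre-sizing is a one-line change. The following sketch mirrors the fixed-length test scenario where the final size is known exactly; PresizedBuilder and Repeat are illustrative names, not part of the original test program.

```csharp
using System.Text;

public static class PresizedBuilder
{
    // Repeats 'piece' 'count' times using a builder sized for the exact result.
    public static string Repeat(string piece, int count)
    {
        int finalLength = piece.Length * count;            // exact size is known here
        StringBuilder sb = new StringBuilder(finalLength); // the int-taking constructor
        for (int i = 0; i < count; i++)
            sb.Append(piece);
        return sb.ToString();
    }
}
```

PresizedBuilder.Repeat("0123456789", 50) builds the 500-char string from the test without ever outgrowing its initial buffer.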

The good thing is this: even a rough estimation of the resulting string size helps. If you overestimate the string size, you're allocating extra memory, but you're avoiding mid-loop buffer expansions. The extra memory allocation will slow you down at some point, but the effect may be negligible. If you underestimate the string size, you're going to have a buffer reallocation at some point. However, it's very likely you've still skipped the early reallocations.

For example, if you're generating a 150 char string in 10 char increments (but you don't know these characteristics beforehand), initializing the StringBuilder with default values causes four buffer reallocations (16 -> 32, 32 -> 64, 64 -> 128, 128 -> 256). While initialization to 150 (or any larger value) would avoid the allocations altogether, even an initialization to a rough estimate such as 100 will help: you'll have only one realloc happening.

The moral of the story: Estimate whenever you reasonably can. Even a bad estimation will usually provide 10-20% benefit over a StringBuilder constructed with the default values. However, if your strings are very long, you'll want to read the following chapter first.

The effect of the string length

How about string lengths? Varying the string component length (X above) with a default StringBuilder has actually pretty little effect. For fairly short strings, the cutoff point is usually a bit lower, but this is largely caused by the fact that more short strings fit into the default StringBuilder buffer of 16 chars. However, the absolute gain here is usually irrelevant since the concatenations on short strings are very fast regardless of the method used.

The pure String-based concatenation slows down as the number of chars in the string grows. The worst scenario is many additions of short strings at the end of a long string. For example, when 2 chars get added at the end of a 500 char string, 99.6% of the memory allocated is for the old part of the string. Duh!

For StringBuilders, later buffer reallocs are slower, of course. More memory needs to be allocated and more old content needs to be moved around. So, the longer your strings become, the more you'll gain by estimating. For 50 catenations of 50-char strings, a perfect estimation gets you a 50% speed benefit over a StringBuilder with default settings!

However, there's a catch. As the memory allocations grow, the accuracy of your estimation plays a bigger and bigger role. Suppose we have the previously discussed 50x50 char string, resulting in a final size of 2500 chars. Now, the following table lists the execution times with different estimations. Times are relative to the default settings, so that the default is indicated by 100%; smaller figures mean faster execution (less time).

Initial buffer size    Time
16 (default)           100 %
50                      97 %
2000                    88 %
2499                   104 %
2500                    49 %
3000                    53 %
4000                    62 %
5000                   103 %
10000                  268 %

As you can see, if you can guess the final size of the resultant string exactly, you're very fast - only 49% of the default execution time. However, make the buffer one char too small (2499 in this example), and you've just ruined your performance. Adding the last element doubles the buffer to 4998 chars, which has quite a lot of overhead in it. In the other direction, even a 60% overalloc at 4000 chars is pretty fast (only 62% of the original execution time). Unfortunately that costs memory, and with strings at the sizes of several megabytes, you probably can't afford that luxury.
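The memory side of the table can be estimated with the same doubling model the article describes. Treat this as a rough sketch: CapacitySlack and its methods are made-up names, the growth rule (double, or take the required size if that is larger) is an assumption, and the code counts buffer capacity only, not time.

```csharp
using System;

public static class CapacitySlack
{
    // Final modelled buffer capacity after 'appends' pieces of 'pieceLength' chars.
    public static int FinalCapacity(int initialCapacity, int pieceLength, int appends)
    {
        int capacity = initialCapacity;
        int length = 0;
        for (int i = 0; i < appends; i++)
        {
            length += pieceLength;
            if (length > capacity)
                capacity = Math.Max(capacity * 2, length); // assumed growth rule
        }
        return capacity;
    }

    // Chars reserved in the buffer but never used by the final string.
    public static int UnusedChars(int initialCapacity, int pieceLength, int appends)
    {
        return FinalCapacity(initialCapacity, pieceLength, appends) - pieceLength * appends;
    }
}
```

For the 50x50 case, an initial size of 2499 ends at a capacity of 4998 (2498 chars unused), 2500 ends at exactly 2500, and in this model the default 16 ends at 3200.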

On the other hand, you also saw that even slight underallocation wastes RAM eventually. Nor is the default approach perfect: always doubling the buffer tends to allocate extra space, too. So, slight overallocation might be both the fastest and the most memory-sparing approach unless you can make a perfect estimate.

Guessing is hard, but luckily the consequences of a bad guess aren't usually catastrophic. If you can avoid massive overallocation, you're not likely to do much worse than the default settings. In any case, the execution time without StringBuilder is 712% on the scale above; it's pretty unlikely you could do worse than that. :-)

Conclusions

StringBuilder performance is a tricky thing. In the last chapter you saw that the StringBuilder with perfect size estimation can be 15 times faster than normal string concatenation. But earlier in the article you also saw that even the default StringBuilder beats normal string catenation by a clear margin once the cutoff point of 4-8 concatenations is passed.

Except for the most critical string handling loops, optimizing the process to the point of making perfect estimations isn't usually worth it. For reasons of code clarity you might even want to avoid using StringBuilder when the number of concatenations is only slightly over the cutoff point and you're working with an operation that's not critical to the millisecond level. For example, constructing a ten-part SQL statement is likely to be faster with StringBuilder, but the speed difference is negligible when compared to the execution time of that statement. Then again, once you become familiar with the StringBuilder class, you'll be reading sb.Appends just like you read plus signs.

Posted by Jouni Heikniemi at 02:30 PM | Comments (35) | .net

August 21, 2004

Money for bugs

Microsoft's Gunnar Kudrjavets mentions the Mozilla Security Bug Bounty (get $500 by finding and reporting a security bug) and ponders the effect of money in the bug reporting process. Naturally, Gunnar's concept of a bug-finding competition with some teams getting money and some not is pretty far from everyday life. In truth, few people search for bugs. The reward would have to be really disproportionate to change that. But if you bump into a bug, the additional incentive just might make you report it.

Thinking of it in terms of probability, if your chance of getting the bounty is 1/100 per hour of work done, the expected value is $5/hour. Not many people would bother - at least not those with enough ability to actually go looking for security bugs. However, once you've discovered something that you think could qualify as a security bug (by pure chance, in your daily use), things change. If you can write the bug report in half an hour and there's a 50-50 chance of getting the prize, you've just netted yourself an expected value of $250 in 30 minutes. Most of us would take that opportunity.
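The arithmetic behind those two figures is just bounty times probability. A tiny sketch, with BountyMath and its method names invented for illustration and the probabilities taken from the paragraph above:

```csharp
public static class BountyMath
{
    // Expected payout of one attempt: bounty times the chance of winning it.
    public static double ExpectedValue(double bounty, double winProbability)
    {
        return bounty * winProbability;
    }

    // The same expected payout expressed per hour of work spent on the attempt.
    public static double ExpectedPerHour(double bounty, double winProbability, double hours)
    {
        return ExpectedValue(bounty, winProbability) / hours;
    }
}
```

ExpectedPerHour(500, 0.01, 1) is the $5/hour of the bug hunter, while ExpectedValue(500, 0.5) is the $250 of someone who already has a likely bug in hand and only needs half an hour to report it.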

So, paying for bugs mostly encourages reporting, not finding. However, from the software developers' perspective (and for a product with a sufficiently large user base) those two things end up being very close - it's likely every bug will be encountered by someone, and it's just a question of which ones get reported.

To return to Gunnar's original thoughts on the competition: I'd like to see it tested as well. I'm not certain money would make that much of a difference. The people most adept at finding serious bugs are probably more thrilled by the competition than by any money involved. Thus, my hypothesis is that if Gunnar's teams are top-of-the-field in the technical sense, money will be less of an object. If the teams consist of people who are less driven by ESR's hacker attitude, money might affect motivation enough to make a difference in the results. And of course, in the long term, if we're talking about how much QA organizations get paid for their daily job, money will become a motivating factor even for the technically most advanced teams.

Posted by Jouni Heikniemi at 10:08 AM | Comments (0) | Misc. programming

August 19, 2004

Trapping Enter key in Windows Forms TextBox

Suppose you want to do something special when the user hits enter in one of your Windows Forms app's TextBoxes? "Easy!", you say, thinking about hooking up a KeyDown event handler - until you try it and find out it doesn't actually work. And then you go Googling, just like I did earlier today.

And yeah, it's true, you can't catch keys reserved for form navigation with KeyDown or KeyPress events. Well, that blows. Of course, a quick search of the web turns up quite a few workarounds, so you're saved. Since I figured this out already, let me save you some time.

When a key is hit in a TextBox control, a method named IsInputKey gets called on the control. That method takes a Keys enumeration value as a parameter and returns a bool - true if that particular key is an "input key", false if not. False also means that the form handler will take care of (or ignore) the keystroke instead of passing it over to the control. And you guessed it, TextBox.IsInputKey returns false for Keys.Enter (this is different if you have a multiline textbox and AcceptsReturn enabled, but let's not go there now).

So, your problem is solved once you make your TextBoxes accept enter as an input key. The answer is subclassing the textbox, i.e. creating your own control. That's not really as bad as it sounds. In fact, you can get away with just this:

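using System.Windows.Forms;

// A TextBox that treats Enter as a regular input key,
// so that KeyDown/KeyPress handlers get to see it.
public class EnterTextBox : TextBox {
  protected override bool IsInputKey(Keys key) {
    if (key == Keys.Enter)
      return true;
    return base.IsInputKey(key);
  }
}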

Now start using your new textbox controls. If you're using no IDE at all or you have Visual Studio, just go and substitute new EnterTextBox() for new System.Windows.Forms.TextBox(). VS's Form Designer seems to handle this quite nicely. If you're using #develop, you can't use that shortcut - #develop forms designer will wipe away your textboxes. Luckily your EnterTextBox will have appeared in the Custom Components section of the toolbox, so it's only a matter of dragging some controls.

As you can guess, the following event handler now works properly:

void TextBox1_KeyDown(object sender, KeyEventArgs e) {
  if (e.KeyCode == Keys.Enter) {
    e.Handled = true;
    MessageBox.Show("Enter hit in textbox1!");
  }
}

Remember to set Handled to true to signal the textbox that the keystroke has actually been dealt with already.

If you did the Googling part, you probably saw there are a few other solutions to this as well. One of them involves overriding ProcessDialogKey, and another common one is overriding the WndProc handler. They both get notifications for enter hits regardless of what IsInputKey returns. If you have a static definition for what enter should mean (for example, always clear the field), you could use one of those approaches. However, if the action required varies by field, you should use the solution above to override IsInputKey instead - that way you can hook into the KeyDown event to customize behavior on an instance-by-instance basis.

Posted by Jouni Heikniemi at 09:29 PM | Comments (18) | .net

August 16, 2004

Continuous integration

Yet another article (this time by Martin Fowler) preaching on the benefits of continuous integration. For those of you not familiar with it, it's a software development principle that says you must have an automated build process that verifies the functionality of the source code several times a day. So, at all times, code in your source control must compile and must pass the predefined simple tests (like the application must start properly etc.). If it stops passing, recent changes get backed out until your build works again.

It was only a couple of years ago that I first heard about continuous integration - and was shocked. "Not everybody is doing this already? Duh." For me, having a continuously working build of the source has always been a natural approach, and working on Bugzilla (and with the Mozilla project's autobuild tools such as Tinderbox and Bonsai) just strengthened the habit.

Even if you don't have an automated build process, get rid of the separate concept of "integration". Every time you check in to your repository, make sure your build works. With a small team, it works pretty well even without the automation as long as everybody is feeling responsible and checks in some code every day. And if you haven't read anything on continuous integration yet, go check out Martin's article. It's good, even though it contains little new for those already familiar with the concepts.

Posted by Jouni Heikniemi at 10:04 PM | Comments (0) | Misc. programming

August 15, 2004

Spam in blog comments (on MovableType)

Gerv blogged about trouble with comment spam in his weblog. I think most of us have suffered from this at some point. IP blocks tend to help with one group of spammers, but also weed out some possible real commenters. Also, IP lists are a maintenance headache.

Googling for other solutions (for the MovableType engine specifically) reveals a really varied bunch, ranging from referrer-based blocking to url-based blacklisting, and including image-based keywords that a user must type before being able to add the comment. There's also a miscellaneous set of seven tips to avoid spam.

I'm pretty certain many of those are applicable and do their job, but so far, I'm also thinking many of them are overengineered. My theory is that there are so many MT based blogs out there that almost any non-trivial customization is sufficient to thwart most of the comment spam. Even if you just added a text field saying "Type 'foobar' into this field: [ ]" and required foobar on comment submission, I don't think most spammers would bother to create a custom rule just for your blog - it's easier to spend the energy looking for new blogs. Perhaps even just renaming mt-comments.cgi helps?

Anyway, suffering from the problem myself, I decided to start from the easy end of the solution spectrum: I added a hidden field and a single-line check for its content in mt-comments.cgi. A few days into it, I haven't seen spam yet. See an older Burningbird blog entry for details.

It's quite likely that at some point an MT spam script will parse the HTML form and fill my hidden fields correctly. But that's ok, I've got a bag of tricks left even with the hidden field stuff, and once that approach is done with, I'll just throw in more logic. But until I get my first post-hack spam comment, I believe a simple solution goes pretty far here. Whichever approach you pick, the point is doing it yourself: any out-of-the-box solution will be worked around because the gain is big enough. Any personal solution is much more likely to be left alone.

Posted by Jouni Heikniemi at 09:17 AM | Web

August 14, 2004

Browsing Whidbey assemblies

What's new in Windows Forms for Whidbey? What did they change for System.Xml? How about the ASP.net object infrastructure? There's a good amount of Whidbey-related news out there (particularly in the MS blogs), but for another kind of view try the .Net2TheMax .NET Browser. It's a web site that allows you to wander through the Whidbey Beta 1 assembly hierarchy and see what has been changed or added since 1.1. For example, take a look at this list of changes in the string class.

The UI is not perfect and you have to have an idea of the internal structure of .net (for example, most of the interesting stuff in the System namespace lives in the mscorlib assembly instead of the more logically named assembly called System), but a little bit of exploring will certainly give you an idea of what's new - and it'll also give you good keywords for Googling for further information. Happy hunting!

Posted by Jouni Heikniemi at 08:49 PM | Comments (0) | .net

August 10, 2004

XP SP2: Web related changes

If you're a web developer, you'll definitely want to browse through Microsoft's document of browsing-related changes in XP SP2.

For starters, there is the IE popup blocker. Popups will be allowed if opened by a link clicked by the user, but other than that, no more JScript window.opens for untrusted domains. By the way, a likely side effect is a rush towards interstitial advertising ("you will be redirected to the requested content after watching these ads for 5 seconds") as popups start losing their effect.

An equally drastic change is the new set of limitations on popup window size and positioning. Popup windows can no longer extend above the top or below the bottom of the parent control, and they must overlap the parent window horizontally. They will stay with the parent window if the parent moves, and they will always appear above their parent windows - so no more hiding or faking dialog boxes. I'll just quote some of the most important other rules:

"Windows that are outside the viewable screen when they are opened are positioned onto the viewable area."
"Windows that are larger than the viewable screen when they are opened are resized to the viewable area."
"Scripts cannot move a window off-screen"
"Scripts cannot resize windows such that the title bar, address bar, or status bar cannot be seen."
"When creating a window, the definition of the fullscreen=yes specification is changed to mean “show the window as maximized,” which will keep the title bar, address bar, and status bar visible."
"Internet Explorer has been modified to not turn off the status bar for any windows. The status bar is always visible for all Internet Explorer windows."

In addition to popups, security zones have a bunch of new tweaks thrown in. Add-ons can be easily disabled. And to introduce another breaking change, "the Internet Explorer will now attempt to rename downloaded files in the Internet Explorer cache to have matching content types and extensions to protect against files that mislead the user about their type."

Most of the changes are pretty irrelevant for legit applications, but in case you had a web UI that was designed to be shown in a small window and then open up image thumbnails (or news articles or whatever) in a larger popup, you're probably in for a redesign - preferably a quick one. Another likely breaking scenario involves quirky webmail apps relying on setting MIME type to application/octet-stream to force download (instead of inline opening) of attachments.
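For the attachment case, the standard way to ask a browser to download a file rather than open it inline is the Content-Disposition header, with the real MIME type left intact. Below is a minimal Python sketch of building such headers; `attachment_headers` is an invented helper, and whether this alone is enough for every browser and SP2 configuration is an assumption you should test.

```python
import mimetypes


def attachment_headers(filename):
    """Build response headers that request a download with the real type."""
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is None:
        # Unknown extension: fall back to the generic binary type.
        content_type = "application/octet-stream"
    # Strip characters that would break the quoted filename parameter.
    safe_name = filename.replace('"', "").replace("\r", "").replace("\n", "")
    return [
        ("Content-Type", content_type),
        ("Content-Disposition", 'attachment; filename="%s"' % safe_name),
    ]


for name, value in attachment_headers("report.pdf"):
    print("%s: %s" % (name, value))
```

The point of the design is that the declared type stays truthful, so there is nothing for the browser to second-guess, while the download request is expressed separately.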

It'll be interesting to see the number of web apps that'll blow up when SP2 starts spreading. And it feels so good to have Firefox installed. :-)

Posted by Jouni Heikniemi at 08:58 PM | Comments (0) | General

August 09, 2004

XP SP2 for developers

Short and sweet: Windows XP Service Pack 2 is out. Here's the lengthy list of changes and here's some documentation for developers.

Posted by Jouni Heikniemi at 10:50 PM | Comments (1) | General

A view on bug report voting systems: 0/1 vs. 0-x

MSDN Product Feedback Center Blog explains how bug votes and Dr. Watson reports affect bug priorities inside Microsoft. The story discusses the differences between "this is important for me"-style vote (0/1 votes) versus the current MSDN style (where you can give a bug an importance score of 1..5 - or a 0-x voting system from a more generic perspective).

The aim of both approaches is to get a community opinion on which bugs are the most critical ones. Dr. Watson's error reports (just like the similar ones from, e.g., the Mozilla Quality Feedback Agent) act as 0/1 votes - either a user experiences a crash or not. So, for crash bugs, Watson measures the number of occurrences in the user base. For non-critical bugs, a good measure is harder to find, since you usually cannot get automated problem reporting.

A 0/1 vote on bugs is slightly problematic, because sorting by the vote count doesn't really tell you about the severity of the issue. An enhancement request for a really visible part of the product can outrank a dataloss bug in a less popular module. Even if you trim out enhancement requests (RFEs), you're still left with a pretty flat view of purely quantitative measurements.

The bad part here is that a 0-x voting system is usually no less troublesome. While you do get qualitative opinions on the importance, they're likely to be misleading. Most normal people vote maximum importance on any bug they've encountered regardless of its objective severity. A particularly serious bunch of reporters can probably constrain their voting habits a bit, but with big publicity products 0-x voting easily degrades into a 0/5 system.

Don't believe that? Go to PFC and search for bugs reported in the last couple of days, filtering by "rating 5". When I tried, 33 of 88 bugs had a rating of 5. Sure, at least 20 of those drop from the 5.0 rating because somebody else considers them irrelevant and votes 1 - but not all of them. People aren't very active at downvoting reports they don't consider important, even though (at least theoretically) the best way to make your own bug report stand out would be to vote 1 on every other report.

Eventually the system ends up with numerous irrelevant reports at the top of the list, and loss of confidence in the whole rating system. How about filtering by vote count then? Sure, but you'll be losing information _and_ gathering a base of rating 5 bugs which you'll never see on your radar. Doesn't really look good from a customer perspective.
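The two failure modes are easy to reproduce with toy numbers. The following Python sketch uses entirely made-up bugs and ratings (nothing here comes from PFC data), and the function names are invented; it just sorts the same reports by vote count and by average rating.

```python
# Made-up ratings (1..5) per bug, purely for illustration.
ratings = {
    "dataloss in rarely used module": [5, 5],
    "enhancement to main toolbar": [5, 5, 5, 5, 4, 5],
    "typo in about box": [5],
}


def by_vote_count(bugs):
    """0/1 view: every rating counts as one vote, most votes first."""
    return sorted(bugs, key=lambda name: len(bugs[name]), reverse=True)


def by_mean_rating(bugs):
    """0-x view: average rating, highest first."""
    return sorted(bugs, key=lambda name: sum(bugs[name]) / len(bugs[name]),
                  reverse=True)


print(by_vote_count(ratings))
print(by_mean_rating(ratings))
```

By vote count, the popular enhancement beats the dataloss bug; by average rating, the single-vote typo ties with the dataloss bug at 5.0. Neither ordering reflects severity.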

Oh all right, I'm pretty pessimistic here. Still, past years have strengthened my belief in one model: 0/1 voting and an easy way to post verbal comments. You'll get whining ("Fix this, it's the main reason my friends won't use X!!!"), but the freetext entry also gathers the most informative views and the best arguments - that's far more effective and credible qualitative analysis than a 0-5 vote. It's harder (less automated) for the developers, but definitely better for the product.

Posted by Jouni Heikniemi at 09:32 PM | Comments (1) | General

August 07, 2004

ISV Buddies: towards a more open Microsoft

Earlier in the summer, Microsoft launched the ISV Buddy Program - that is, a program that allows Independent Software Vendors to get a contact person inside Microsoft. The contact person is an "insider", providing the ISV with quick access to resources and inside information. For Microsoft, it's a sort of extended support and, of course, good publicity. It also gives them plenty of information about their customers - a valuable asset indeed. Somasegar's blog entry gives some additional views on the subject (from an MS perspective).

Again, Microsoft deserves some praise on this move. Although there is fairly limited experience of the program so far, the concept is pure gold. Even without a formal program, many of us have built good personal contacts with employees at clients, subcontractors, administration - and even big software corporations like Microsoft. And all of us probably understand the importance of such relations - or at least you will once you've received some help in solving your tough problems.

Another view of this is "In Open source projects, you don't have to find yourself an insider - you can be one yourself". That's true, and a really good point. For some tools, becoming the insider and the professional yourself is the best solution. However, for other pieces of software, you just can't invest the time to get yourself all that knowledge by yourself. For many pieces of open source software you can find decent support, but for most of them, you can't find reliable support. At least for free.

The debate between open and closed source aside, any step Microsoft takes towards personal contacts and responsible customer support is good. Even though the issues with closed source remain, it's another step towards a much more open approach. And to sum it up, I'll make a bold claim: for most developers, the key benefits of open source lie more in the open development process than the availability of the source itself.

Posted by Jouni Heikniemi at 08:53 PM | Comments (0) | .net

August 04, 2004

The pain of big code reviews

Some ranting on code reviews in general:

Bugzilla's bug 185090 is nearing completion, and I certainly hope I've finished my part by finally granting review to the 70 kbyte patch - to no less than the 13th iteration of it. For those of you not familiar with the Mozilla family review process, a short recap: Every change to the codebase must be reviewed by at least one designated code reviewer. The review is usually conducted by looking at the cvs diff ("patch"), pointing out issues and rejecting the patch until it's clean.

The patch I finally r+'d was far from the biggest in Bugzilla (or even my) history. Nor was 13 iterations an extreme number compared to the general group of patches for major enhancements. Still, doing line-by-line review for 70 k of code several times in a row is hell on earth. It's always easy to come up with lots of comments for the first patch, but after several iterations most people (including me on most days) simply cannot focus enough to effectively review the code as a whole time after time. You just become numb.

Reviewing interdiffs ("patches of patches") doesn't work very well for larger changes except when the interdiff is trivial. It's extremely easy to miss issues that way, and reviewing code readability in context is hard if not impossible. So in the end, the only way to effectively review is by looking at the patch, testing, and looking more at the patch.

Of course, there's no perfect solution. But having done quite a few fairly big (100+k) code reviews both for Bugzilla and my former employer, I'm pretty certain there are very few features that require such an amount of new logic in one patch. So once again, the key is small iterative changes. Apart from find/replace changes such as renaming identifiers, almost every change can be split into a few easily reviewable patches (10-20 k is pretty nice).

But if so, why don't people split their patches more eagerly? For many development cultures, I think it's a matter of false beliefs about time. Getting a review from your colleague (at work) or someone from the dev group (for Bugzilla/Mozilla/etc. development) can take days, weeks at worst. "Well, in that case it's most effective to have them review as much as possible in a single run, right?" No.

Long review queues are, to a large extent, caused by the fact that a thorough review of a 100k patch takes at least a couple of hours. Since you can't allocate that big a time slice easily, you tend to slip on reviewing. But it's usually far easier to find time for half a dozen 30 minute sessions than to allocate a single 2-hour slot - and it's very much easier to go through half a dozen 20k patches than a single 100k one. Also, let me remind you that the number of iterations required for a positive review rises very sharply in relation to the patch size.

To play some number games, assume that a reviewer will be able to go through five 20k patches or one 100k patch in a week. Assume that a feature can be implemented either in a single 100k giant or iteratively in seven 20k patches. And finally, assume that it takes 6 iterations for the 100k patch to be ready, while a 20k patch can be checked in after getting three rounds of review. Well, getting 6 reviews for the 100k patch takes 6 weeks. Getting 3 reviews for seven 20k patches each takes just a bit over four weeks (3x7/5)!
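The same number game as a few lines of Python, in case you want to plug in your own figures. The `weeks_needed` function is just an illustrative name, and the model is deliberately naive: it assumes the reviewer's weekly capacity is the only bottleneck and ignores the time the author spends between rounds.

```python
def weeks_needed(patch_count, rounds_per_patch, reviews_per_week):
    """Total review rounds divided by the reviewer's weekly capacity."""
    return patch_count * rounds_per_patch / reviews_per_week


# One 100k patch: 6 rounds, and the reviewer manages one such review per week.
big = weeks_needed(patch_count=1, rounds_per_patch=6, reviews_per_week=1)
# Seven 20k patches: 3 rounds each, and the reviewer manages five per week.
small = weeks_needed(patch_count=7, rounds_per_patch=3, reviews_per_week=5)

print(big)    # 6.0
print(small)  # 4.2
```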

The example above is pretty conservative. In practice, I'd pick _ten_ 20k patches anytime instead of a single 100k lump. Also, the 3-6 balance in the iterations required is unrealistic: the difference is usually more drastic. But I guess this shows the point. Next time somebody asks for review on a big patch, my first aim is to find a reason I can deny review based on a "You can do this in smaller steps" type argument.

Ps. At this point, it's fair to admit I didn't come up with a decent way to split the patch that started all this ranting. But for many megabugs the answer is pretty apparent, and those are the ones that should get hacked into mincemeat.

Posted by Jouni Heikniemi at 09:35 PM | Comments (0) | Misc. programming

August 01, 2004

New string stuff in Whidbey

The .net framework 2.0 comes with quite a few exciting new features. One of the less visible, yet very useful enhancements is String.Split's ability to split by a group of strings; in 1.1 you could only split by single characters. In 2.0, you can split "a=b and c=d" by the string "and".

You can also give Split a new StringSplitOptions enum parameter where you can specify options related to splitting. The only option existing at the moment (2.0 beta 1) is RemoveEmptyEntries. Naturally it just removes empty elements which occur if separators immediately follow each other. I'm pretty surprised the framework team didn't bother with "CaseInsensitive" and "TrimResult" options, too: CaseInsensitivity exists in many string functions now (with 2.0 the support got added to StartsWith and EndsWith, too), and trimming all the split results is often useful when splitting natural text by words (which tend to have spacing around them).

As it is, case insensitive string splits must still be done using Regex.Split, which of course allows you to make the Regex case insensitive - and it's easy to make the Regex ignore the spacing as well. The following code sample is a good illustration:

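string text = "a=b OR c=d and e=f";

// Split by whole strings; the comparison is case-sensitive, so "OR" is not a separator.
// The released 2.0 overload taking a string[] also requires a StringSplitOptions argument.
text.Split(new string[] { "and", "or" }, StringSplitOptions.None);
// produces "a=b OR c=d ", " e=f" (note the whitespace)

// Regex lives in System.Text.RegularExpressions.
Regex.Split(text, "\\s*(?:and|or)\\s*", RegexOptions.IgnoreCase);
// produces "a=b", "c=d", "e=f"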

Regular expressions save the world again, but a properly equipped Split would be so much easier to use...

For a listing of other new stuff in the framework class library 2.0, see the article on BCLTeam blog.

Posted by Jouni Heikniemi at 10:35 PM | Comments (0) | .net

Transitions

The last few days have been nothing short of hectic. That was to be expected, though: final tasks at the old job, preparing for the new one. My medium-scale farewell party was fun - and included a monster-scale Swiss schnitzel (with ham, cheese and feta baked in - and yes, those two are lemon halves). And after this one day of rest, it's time to start all over again, in an organization I barely know, mostly working with people I have never even talked to, doing things that are familiar to some extent, but nevertheless pretty different from what I did before. Well, at least it's a challenge.

But the relentless flow of IT news doesn't pause simply because I'm busy. There's a good deal of new, interesting stuff to delve into. Off the top of my head: Comega (some call it C# 3.0) compiler preview, Dundas Chart 4.1 beta, Bugzilla 2.18rc2... Uh.

Posted by Jouni Heikniemi at 08:32 AM | Comments (1) | General