I've spent a few days this fall studying the technological dimensions of the new Office 2007 family - which, incidentally, went RTM on Monday (no binaries yet!). Not being an experienced SharePoint developer myself, immersion into the world of Microsoft Office SharePoint Server (MOSS) has been quite a trip.
Many oldschool SPS developers seem enthusiastic about the new possibilities of MOSS. I share some of that; the product certainly shows a lot of potential. But I'm getting more and more worried about the developer support that's going to be available when the first really ambitious projects hit the shore. For example, I was trying to add a custom entry to a popup menu - a well-supported and documented development scenario. Googling with the strongly related term, ECBItem, returned two results. Neither of the documents helped. And zero newsgroup matches.
Also: I've attended some of the most esteemed Office 2007 courses held in Finland so far. For the most part, the instructors haven't been able to help with many of the practical development questions. There are white papers and even books around, but practical information is far less common.
The examples above are just a few isolated incidents, but they do tell a story: When developing with MOSS 2007, it's easy to stray into areas where you're pretty much alone. Sure, Microsoft's support will give you a hand and intensive Googling may help - but you're also exposing yourself to a lot of beta-level - or even SPS2003-age - material. Most of MOSS's potential is relatively easy to unleash, but stretching its limits requires a lot of effort at the moment. It's easy to underestimate the amount of work required for some tasks - beware! The new world does come with its own dangers.
Not like it's anything new, but I'll repeat it anyway: There's much too much going on in Microsoft world. Of course, at the time of PDC05 all the news channels naturally pick up... but still, this is getting ridiculous. I'm a fast reader and in a position where information flows are rather organized and readable - but I still have serious trouble staying even remotely up-to-date with everything going on in the Microsoft software development world.
There are plenty of Whidbey things I still don't fully grasp - well actually, that's true even for .NET 1.1! And what is Microsoft doing? Out comes another set of new APIs, a few standards extensions, WPF (former Avalon), WCF (former Indigo) and a load of other stuff. Well let's not forget IE7, Office 12 and of course, Windows Vista.
There was a time when saying "Those who know Microsoft's new products don't understand a bit about how to use them" was mostly backed by envy. But... these days I'm starting to believe it. To fully leverage an enormous technological wave such as Whidbey you need to have a huge amount of knowledge - and experience - under your belt. Actually understanding the possibilities of WPF (just as a single example!) requires a lot of time - time you can't spend in real projects.
Perhaps there's still a little slice of envy behind the words. Of course, it's nice to be on top of the technology geyser. On the other hand, most developers aren't going to be. The majority of programmer job applicants still don't have managed code experience. The other extreme is writing their first WPF applications. We're going to have an "impedance mismatch" far greater than the one the LINQ project is trying to solve.
I've always known bad code bites you back in a number of ways, but this one was new. See the following code snippet:
string lastHeading = DateTime.MinValue.ToShortDateString();
foreach (Item i in GetItems()) {
    if (i.Date.ToShortDateString() != lastHeading) {
        lastHeading = i.Date.ToShortDateString();
        Console.WriteLine(lastHeading);
    }
    Console.WriteLine(" " + i.Text);
}
So what does it do? Quite nicely, it takes care of printing date headers for printed items in date order. That is, under the heading "2005-09-14" are printed the entries related to that particular date.
Now, you could ask, "What's wrong with that code then?" Yes, that's not entirely obvious. The issue here is that comparisons related to the "lastHeading" are string comparisons when in fact it's DateTimes being compared. Thus, this example violates the rule of "Always use the most correct and exact data type". However, the current approach could also be argued for by saying it's actually the "heading semantics" that rule here, thus making strings the correct data type - but I won't be going there. Not now.
Well, why did this sort of structure from somebody's code fail on me today? Believe it or not, it was because of the initialization to DateTime.MinValue.ToShortDateString(). Now, there's nothing fundamentally wrong with that - it's a surefire way to force the header to be printed (although setting the string to empty or null would be more logical here if we consider this to be purely string data).
Well, we got an ArgumentNullException with an interesting error message. Why's that? Because somebody ran the software using the Arabic culture (ar-SA), which uses the Hijri calendar system. Unfortunately, the Hijri calendar doesn't have a way to format anything as low as DateTime.MinValue. Uh... What a really awkward reason to fail.
The moral of the story: Date operations (DateTime, TimeSpan et al.) are by nature culture independent. Display formatting for them rarely is, so never assume too much. And use the correct data types.
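As a minimal sketch of the "correct data type" variant, the loop can track the last printed day as a DateTime and only format when a heading is actually emitted. The Item class and the Render method below are hypothetical stand-ins (the original Item and GetItems() are not shown), and the fixed "yyyy-MM-dd" invariant format is an assumption chosen so that the output doesn't depend on the user's calendar.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for the Item type used in the snippet above.
public class Item {
    public DateTime Date;
    public string Text;
    public Item(DateTime date, string text) { Date = date; Text = text; }
}

public static class HeadingDemo {
    // Returns the lines that would be printed: a date heading, then its items.
    public static List<string> Render(IEnumerable<Item> items) {
        List<string> lines = new List<string>();
        DateTime? lastDay = null;  // null means "no heading printed yet"
        foreach (Item i in items) {
            // Compare dates as dates; no culture-dependent formatting involved.
            if (lastDay == null || i.Date.Date != lastDay.Value) {
                lastDay = i.Date.Date;
                // Format only for display, with an explicit culture.
                lines.Add(lastDay.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture));
            }
            lines.Add(" " + i.Text);
        }
        return lines;
    }
}
```

Nothing here ever formats DateTime.MinValue, so the sentinel value that caused the failure simply goes away.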
People just don't get it. Every week I bump into a living example of a person who has enormous misconceptions about what's going on in the web development world. If you can read Finnish, take a look at 2kMediat.com's ASP.NET tutorial. Among other things, it states that (translation mine) "The most important difference between ASP and ASP.NET programming comes from a single issue: ASP.NET doesn't allow using several programming languages on the same page."
Apart from the fact that it's technically incorrect, it shows a fundamental misunderstanding of the whole technology palette that .NET represents. "The most important"? How many of you guys ever even wrote an ASP page with several languages embedded? How many of you even knew it was possible? And these are the guys posting tutorials...
My point? Microsoft is pushing out .NET 2.0 with the new ASP.NET 2 platform. We have another great wave of great technology coming about, but most people don't even faintly grasp the dimensions of the current .NET Framework. My prediction is that this will have two key consequences:
1) The rift between hobby coders (most commonly using PHP) and the professionals (ASP.NET, modern Java) will be broader. As shifting from a hobbyist status to a web coder pro will be harder, there will be increasing demand for proper education. Since it won't be sufficiently available, many professional solutions will be created using inappropriate tools and lacking knowledge, falling back to what we saw in today's example.
2) Even though the chasm is wide, many will jump. We'll have loads of "experienced ASP.NET coders" who have no idea what they're really doing. The control model provides more than enough tools for creating a total mess - The Daily WTF will certainly thrive and Microsoft's platform will get the blame for bad code.
Mr. Time, please prove me wrong.
While visiting the Assembly event, I also paid a visit to Microsoft's sponsor stand demonstrating the Visual Web Developer 2005 product. As many of you already know, VWD is a light "Express" version of Visual Studio specifically aimed at ASP.NET site development. For now, it is still unclear whether the express versions will be just cheap or totally free when they're finally ready, but at least the betas are freely downloadable from Microsoft's web site.
VWD is great. I've seen several demos of it and built a few web apps (for testing only, so far) with the full Whidbey Visual Studio. It rocks. However, there's one key problem with VWD: While you can construct rather complex web applications with ease that surpasses many of the competing technologies (PHP, Servlets), it's an entirely different task to get them hosted! Of course, you can get your own server on the web, but unless you can sacrifice your home computer or pay for a dedicated machine, it's far less convenient.
Now that's not saying there wouldn't be ASP.NET hosting services available. There are, but the selection is rather small and starting the service is far less convenient than for the mainstream hosts. This is locally compounded by the fact that ASP.NET provision (for consumer-payable prices) is pretty much unavailable in Finland, and shopping across the ocean is always a hindrance for some.
I talked about this to Microsoft's Aali Alikoski who agreed - but of course, had no solution available. It seems Microsoft is working on the issue, but it's admittedly a rather difficult one. Even I (with relatively little Linux/Apache experience) would rather take on the task of setting up a multi-user PHP hosting box than an ASP.NET one. Even if I were allowed to use Mono. ;-)
Let's hope there's a solution coming for this. Since professionals (without problems of this sort) tend to use full Visual Studio, VWD is in a tough spot: if sites produced with it can be run nowhere, it's not too likely to become a success story.
I get lots of spam all the time. Most of it tries to sell me medication for issues I don't have. But today, I got a message I didn't even recognize as spam at first. It looked just like another product announcement - .NET components this time - coming from PureComponents. However, the alarming thing is this: I never ever register anywhere with the address the mail was sent to (it's only used in newsgroups). Neither have I ever even heard of PureComponents before. True enough, the message doesn't say "You're getting this because you have subscribed..." blahblah, the usual stuff with legit subscribed product offerings. It simply offers an unsubscription link.
I'm not even going to bother using it. I'll just delete the mail outright. Any company sending mail to addresses collected from newsgroups will not have my (or my employer's or any client's as far as I am concerned) money totally regardless of the quality of their products.
So why is this incident worth a post? It was one of the first spam messages related to programming; one of the first ones that actually could have made me buy something. I hope it won't get more common. It shouldn't; software developers are a critical bunch of nerds. Sending unsolicited mail to them is... well, short-sighted. At best.
If you're into developing .NET APIs, take a look at Microsoft's excellent video series Designing .NET Class Libraries. Although aimed at Microsoft's internal developers, the videos provide quite a bit of thought food for anyone seriously into developing class libraries.
A funny part of the sessions is that they have Word transcripts of the presentation. I don't know if they're done by a speech recognition app or a typist with really bad English, but as the content grows more and more technical, the scripts grow exponentially worse. Calling a queue just "Q" is acceptable, but how's turning "C-sharp has" into "Segents"? What about "V Two of the Dialnet Framework" for "v2 of the .NET Framework"? Whatever the source, they could've worked a bit more on these.
I finally got to the point of installing Whidbey beta 2 on my home computer. I was afraid of this in advance as I knew the uninstallation tools for pre-beta CTP releases have traditionally been abysmal. Well, it was easier than I expected... until I got to the point of installing SQL Server 2005 April CTP. Ok, even there everything was a smooth ride until the installation of maintenance tools (Management Studio et al.).
That's where the installer failed. No error message, nothing. The progress indicator was at 100% already, the text was "Updating component registrations". Then boom! The indicator started scrolling backwards and several "rolling back xxx" style texts appeared. The summary log mentioned that we're dealing with error code 2147944003, which wasn't particularly helpful. Luckily (?) the installation did dump a hefty 5.6 megabytes of detailed setup log. After a couple of hours of debugging and testing (hey Microsoft, it would've helped if the error log had contained the error code at some point...), I found out the cause to be this:
DDSet_Status: BeginTransaction()->IHxRegisterSession::CreateTransaction() returned 8004036e
Now why? Well, luckily those method names were the first thing to return some useful Google hits, so things started progressing. Among the results returned was this, which promptly advised me to find a hidden file called Rgstrtn.lck on my C drive and nuke it. Done, and everything works fine. The lock file is apparently used by the HTML help system, and a previously failed installation had locked the help system into a state where no installation with a help component could succeed.
Unless you've been missing all the recent news on Microsoft development world, you must have heard we now have .NET Framework 2.0 beta 2 out as well as beta 2 for Visual Studio 2005 family. So what?
One thing you should understand is that a beta 2 coming from Microsoft is not just another beta. While beta 1 releases of stuff can be flaky and have a non-finished feature set, beta 2 is often a sign of quality rapidly approaching the release levels. If you haven't tested Whidbey yet, you should now seriously consider it. A good way to start is downloading the VS2005 Express Edition.
You don't have to like .NET, but the release of the first truly mature version of the Framework will be one of the most important changes in the Windows development world for quite some time, so you shouldn't just ignore it. Give it a spin if you want to consider yourself an aware professional. Doing it in the beta phase will even give you more time to get acquainted with it before you have to.
An update to my old code sample for HTTP uploads: you can now post form variables with the files. The new version of Upload.cs contains a method overload that allows you to pass in a StringDictionary of POST parameters. Some basic instructions for use are available at the old post. This file will replace the current Upload.cs in JHLib - until the next release, you can also take a look at the current version of the library for some usage examples of the upload code in general.
I'm going to spend tomorrow and Tuesday on a course called ".NET 2.0 Boot camp" which should pretty much cover everything Whidbey in two days (20 hours total). Of course, the truth is that only a handful of new features will be decently explored, but I look forward to it nonetheless. While playing with Whidbey's new class library is quite doable at home, try setting up a Visual Studio Team System test environment with all three required servers. Nah... This is where I love people setting up the sandbox for me. ;-)
VSTS is definitely the part I most want to hear about. I haven't much touched it so far, but in 48 hours I expect to be considerably more knowledgeable. VSTS certainly marks a change in our set of possibilities - the only question is: is it too late for most of us? The industry has already selected other source control applications, work item trackers, unit test frameworks and whatnot. It'll be interesting to see if VSTS can win our hearts. For new businesses, it will certainly provide many interesting alternatives for arranging the daily processes.
If there's still anything left of me after this, I'll be sure to post a recap.
I've been toying with Whidbey quite a lot lately. One thing I didn't fully appreciate from reading the spec at first was partial classes. Sure, the concept of being able to divide a class in several files is great, but the full realization of its practical benefits kept me waiting until I designed my first complex form with VS.NET 2005.
#region Using directives
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Windows.Forms;
#endregion
namespace MyClient
{
    partial class MainForm : Form
    {
        public MainForm()
        {
            InitializeComponent();
        }
    }
}
This is what I get when I pick "View code" for the form. Where are the control declarations? Where's that InitializeComponent() method maintained by the form designer? That's right, they're somewhere I can't see, and that's right where they belong. In practice, they're in MainForm.Designer.cs, which is totally maintained by the designer, as opposed to my MainForm.cs, which is totally maintained by me (except for VS.NET adding event handler stubs when I request so).
If I had had this little facet of separation at my disposal for the last few years, I would've been a much happier man. Not to say the current (2003) Form designer model doesn't work, but it's certainly clumsy and error-prone. The borders of machine-generated code and human creation are always weak spots. Now we're one step further from confusing the designed form and the integrated logic. An excellent innovation, that.
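The split itself can be sketched without Windows Forms. In the example below, Greeter is a hypothetical class, and both parts are shown in one listing only for brevity; in a real project they would live in separate files, the way MainForm.cs and MainForm.Designer.cs do. The compiler merges the parts into a single class, so the hand-written part can call a method that only exists in the "generated" part.

```csharp
using System;

// Part 1: plays the role of the tool-maintained file (think MainForm.Designer.cs).
public partial class Greeter {
    private string greeting;

    private void InitializeComponent() {
        greeting = "Hello";
    }
}

// Part 2: plays the role of the hand-written file (think MainForm.cs).
public partial class Greeter {
    public Greeter() {
        // Defined in the other part, but it is the same class after compilation.
        InitializeComponent();
    }

    public string Greet(string name) {
        return greeting + ", " + name;
    }
}
```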
PS. I've done quite a lot recently, even though you can't tell by reading the blog. "Test-Driven Development in Microsoft .NET" is among the titles I've recently read - and heartily recommend. More on that soon, I hope.
When using enumerations, .NET languages automatically assign the numeric values for your enums. For example:
public enum MyColor {
    Red,
    Green,
    Blue
}
This yields an enum with the numeric representation of Red = 0, Green = 1, Blue = 2. All this is fine - the numeric versions don't really matter normally. But one thing you should realize is that external assemblies referring to this enum do not get the symbolic names compiled in, but rather the numeric values. Now, if you have A.DLL defining that enumeration and B.DLL calling a method in A.DLL (say, "SetColor(MyColor)") with something like ClassFromA.SetColor(MyColor.Blue), B.DLL doesn't get the name "Blue" embedded into its binary, but rather just "SetColor(2)".
Now go and change A's enum to have values of { Red, Mauve, Green, Blue }. The enum gets numbered from 0 (Red) to 3 (Blue). Everything works fine within A, but unless B is recompiled from source with references to the new version of A, the SetColor call looks like it actually had the parameter of MyColor.Green when it arrives in A.DLL. This is an extremely nasty way of introducing hard-to-find bugs in your code.
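The effect can be simulated in a single file. This is only a sketch: ColorV1 and ColorV2 are hypothetical names standing in for the old and new versions of MyColor in A.DLL, and the cast through int stands in for the constant that was baked into B.DLL at compile time.

```csharp
using System;

// MyColor as it looked when B.DLL was compiled.
public enum ColorV1 { Red, Green, Blue }

// MyColor after Mauve was inserted in the middle.
public enum ColorV2 { Red, Mauve, Green, Blue }

public static class EnumShiftDemo {
    // B.DLL only carries the number, so the new A.DLL reinterprets it.
    public static ColorV2 AsSeenByNewA(ColorV1 compiledIntoB) {
        int raw = (int)compiledIntoB;   // what is actually stored in B's binary
        return (ColorV2)raw;            // what the new A.DLL makes of it
    }
}
```

Here EnumShiftDemo.AsSeenByNewA(ColorV1.Blue) yields ColorV2.Green, since both have the numeric value 2.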
Thus, my recommendation: Always manually number all enumerations that have scope outside the defining compilation unit. An easier approach is to just always add the new values to the end of the enumeration, but it won't allow you to remove values from the enum unless you add more manual numbering. I prefer manual numbering even though you have to be careful to keep the numbers unique, as it allows you to sort the enumeration definition in any way without affecting the actual result. So here's an example:
public enum MyColor {
    Red = 1,
    Green = 2,
    Blue = 3
}
If you wish to have them sorted by name in your definition code, go for it:
public enum MyColor {
    Blue = 3,
    Green = 2,
    Red = 1
}
The enumeration values should be treated as one-time-identifiers - once they're assigned to a value and a DLL version is used for compilation of other libraries, the semantics of an enum value shouldn't be changed. So you shouldn't remove Green and reuse the numeric value 2 later on - it will cause the problem identified above. Leave gaps in the numbering and always use a new value for new items.
Also note that most solutions regarding enum value serialization (saving custom configuration files, calling external services with enum-type params etc.) have the same effects as the external-DLL scenario. So, even if your enum only had scope inside one executable, you may be surprised when loading a config file from an older version of the software that had enum values stored in it. Be careful out there!
I desperately needed to find out where Adobe Photoshop Album 2.0 stores its database files (damn software that doesn't allow configuring this!). Well, a quick search of the obvious places didn't reveal a thing, so I wrote a few lines of code to monitor the files being accessed. The following code is a trivial console program to monitor what's going on in your C: drive. Don't you just love the ease of .NET?
using System;
using System.IO;
namespace fswatchtest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Watch the whole C: drive, including all subdirectories
            FileSystemWatcher fswC = new FileSystemWatcher(@"c:\");
            fswC.IncludeSubdirectories = true;
            fswC.Changed += new FileSystemEventHandler(fsw_Changed);
            // No events are raised until this is set
            fswC.EnableRaisingEvents = true;
            // Keep the process alive until Enter is pressed
            Console.ReadLine();
        }
        static void fsw_Changed(object sender, FileSystemEventArgs e)
        {
            // Print the full path of each changed file
            Console.WriteLine(e.FullPath);
        }
    }
}
A quick tip from recent experience: If you're writing your own Windows Forms controls and trying to unit test them, data binding issues are likely to pop up at some point. When you put the controls on the form, data binding works like magic. But when you're instantiating the controls outside the form context, you suddenly realize data binding is totally dead. Whazzup?
It's all about binding context. Without going into details here, your controls need one. Normally, the form has a single binding context that gets set to all the controls on it (I think this happens during the autogenerated Controls.Add call in InitializeComponent, although I haven't really verified this). Well, just do a MyControl.BindingContext = new BindingContext(). If you need to simulate several controls that work together as if they were on the same form, assign the same BindingContext object to all of them.
I'll start my planned series of code quality related posts with a .NET specific issue. I was planning on a more generic rant for a starter, but I've decided to try to blog things as they cross my mind - otherwise my list of topics will just keep growing (it's about three pages on 10-point Arial now) and I'll never post anything. So here we go!
The .NET System.Collections.ArrayList class represents a mutable-size collection of untyped objects (fwiw, very similar to Java's Vector). In contrast, .NET has typed but inflexible normal arrays (for example, int[] in C#). Now, as you're writing your application logic methods and returning lists of things, using an ArrayList is certainly a tempting alternative. Often, you need to use an ArrayList to construct a list-like result (e.g. when reading from a firehose DB cursor, you couldn't preallocate an array of proper size anyway). So, since converting to a typed array is somewhat cumbersome (something like (int[])myArrayList.ToArray(typeof(int))), why should I bother? Why not just return the ArrayList?
Well, the apparent reason is that "untyped containers suck". At some point, a user of the method is likely to cast the objects in the container into the wrong type (Were those values ints or longs? Did that method return Persons or Employees?), causing an exception. Good use of comments - particularly XML ones - will help, but won't resolve the issue. When returning objects, doing the actual conversion is easy. Thus, the rule is simple: Don't return ArrayLists. Return typed arrays. It's extremely rare for the caller to need to mutate the returned collection, and when that need arises, they can just use new ArrayList(myArray) to pull the array's contents into a newly created list.
What about when you're taking a list of something as a parameter? Again, don't use ArrayLists. Actually, I can think of only one situation where you'd want to do this: When you're passing an ArrayList and intend to modify it. Most of the time you can safely avoid modifying the parameter by taking a read-only parameter and returning a new array instance. When you can't, you should pass an IList instead of an ArrayList - if you're ready to abandon strong typing, then why not reap the benefit of allowing more flexible input? An IList parameter will happily take not just ArrayList, but also other containers that have list semantics.
For read-only (usually foreach-only) lists of items - such as a typical public decimal GetSalarySumByEmployeeIDs(ArrayList employeeIDs) - typed arrays are again one of the best choices. Replace ArrayList with int[], and you just can't pass Employee objects by accident. However, if all your existing code uses ArrayLists, forcing all callsites to start converting their ArrayLists into arrays may prove to be too much of a strain. In this case, use IEnumerable. It's exactly as type-(un)safe as ArrayList, but allows passing of both arrays and ArrayLists (and a host of other containers).
To sum it up: ArrayList is useful as an internal structure, but sucks as an interface or public method signature element. Avoid using it publicly. Whenever you can, replace ArrayLists with typed arrays. If you can't, replace them with an interface that provides sufficient functionality. If you're on Whidbey, replace read-only items (both return values and in-parameters) with IEnumerable<T> and modifiable parameters possibly with IList<T>.
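A minimal sketch of these rules on Whidbey follows. SalaryDemo, GetEmployeeIDs and the hard-coded IDs are hypothetical, and the "salary" is a made-up formula so that the example runs without a database. The ArrayList stays internal, the return value is a typed array, and the read-only parameter is an IEnumerable<int>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class SalaryDemo {
    // ArrayList is fine as an internal building block...
    public static int[] GetEmployeeIDs() {
        ArrayList ids = new ArrayList();
        ids.Add(3);   // imagine these coming from a forward-only DB cursor
        ids.Add(5);
        ids.Add(8);
        // ...but the public signature exposes a typed array.
        return (int[])ids.ToArray(typeof(int));
    }

    // Read-only input: accepts int[], List<int> and anything else enumerable.
    public static decimal GetSalarySumByEmployeeIDs(IEnumerable<int> employeeIDs) {
        decimal sum = 0m;
        foreach (int id in employeeIDs) {
            sum += id * 1000m;  // hypothetical salary lookup
        }
        return sum;
    }
}
```

With this signature, passing a collection of Employee objects by accident is a compile-time error instead of an InvalidCastException at runtime.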
I've just pushed out JHLib 1.1, a new version of my tool library. The most relevant addition is the support for a Csv writer, but included is also the custom separator fix for the Csv reader requested by so many readers. There's also a class for easier number -> hex string conversions. I've also added unit tests for the class library (although they're slightly incomplete for the Internet classes - good ideas on unit testing a POP3 Client, anyone?).
Now, that concludes my plan item #3. Two down, three left...
Regarding some of my promises before Christmas, I've now hit number 2 - my Whidbey installation with VS.NET 2005 and Yukon betas is up. Not completely, but sufficiently to allow me to do whatever I wish to test at this point. Feels good to have everything back up - I accidentally disintegrated my Whidbey about a month ago, but everything seems to run again. What a great use for Christmas :-)
Now, I'll go spending some real Xmas time again. But when the ever watchful eye of the family is looking elsewhere, I'll restart my hacking engine. Until then, happy holidays to everybody!
Microsoft's Channel9 published a two-part interview with Jason Zander, the Product Unit Manager for CLR. The half-hour discussion revolved around most of the possible .NET-related topics, including an interesting and insightful overview of .NET's overall structure. Without rehashing everything said in the interview (part 1, part 2), a couple of performance-related things are worth a note.
Deterministic finalization was just quickly mentioned in the video, but the issue was again picked up in the comments. For those of you who don't know, deterministic finalization is the ability to exactly define when the destruction of an object takes place. This feature is missing in .NET (where the garbage collector finalizes the objects at some point of time), something that caused real panic and anger at one time, especially since people with C++ and VB backgrounds were so used to it.
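For context, the closest thing .NET offers to deterministic cleanup is the IDisposable pattern combined with C#'s using statement. It doesn't control when the memory is reclaimed, but it does guarantee that Dispose() runs at the end of the block. The sketch below uses hypothetical names (TrackedResource, DisposeDemo) and records events in a list so the ordering is visible.

```csharp
using System;
using System.Collections.Generic;

// A hypothetical resource that records when it is opened and released.
public class TrackedResource : IDisposable {
    private readonly List<string> log;

    public TrackedResource(List<string> log) {
        this.log = log;
        log.Add("opened");
    }

    // Called deterministically at the end of a using block.
    public void Dispose() {
        log.Add("disposed");
    }
}

public static class DisposeDemo {
    public static List<string> Run() {
        List<string> log = new List<string>();
        using (TrackedResource r = new TrackedResource(log)) {
            log.Add("working");
        }   // r.Dispose() runs here, even if an exception was thrown
        log.Add("after using");
        return log;
    }
}
```

The object itself is still collected whenever the garbage collector gets around to it; only the cleanup call is deterministic.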
The classic Brian Harry Manifesto on Deterministic Finalization was mentioned again. It had been a couple of years since I last read that - but it's still great. I mean, even if you don't know that much about reference counting or tracing, you will see a host of problems in a totally different light after reading that. But almost as much fun as Brian's posting are the replies to it (follow the link above and keep clicking "Next in topic").
For example, Brian's "We feel that it is very important to solve the cycle problem without forcing programmers to understand, track down and design around these complex data structure problems." got quickly answered with "The game's over if people are being shielded from having to actually design their software". Further down the thread, Microsoft's approach to .NET memory management was labeled as 'hand-holding, design-free "programming"'.
Later on, somebody says "It's downright *laughable* that Microsoft is worried about performance when their target is the internet!" All right, that's the spot where, at the latest, you're going to have to smile. All of this took place in October 2000, only a bit more than four years ago. And what do we have now? First, two of the foremost modern programming platforms (Java and .NET) are both based on garbage collection with a non-deterministic finalization scheme. Second, performance on the Internet is an everyday challenge now, one that many people worry about. Not just network throughput, but the scalability of ASP.NET applications, Web Service endpoints and whatnot.
I won't spoil the fun of reading the thread by quoting more - there are some real gems of human arrogance down there. Sure, not everybody will ever pick up .NET or Java, and some programming tasks are still left to be done with C++. But still, how far down the "New patterns suck" road can you go? It's a pleasure to realize how much less tech-oriented programming tasks are today. I mean, creating enterprise business applications is hard enough as it is, even without worrying about cyclic GC issues or C++ style memory management. I'd much rather spend my time doing class diagrams in Visio than crafting elaborate refcounting architectures.
"4A 4F 55 4E 49 20 48 45 49 4B 4E 49 45 4D 49"? Today, among other things, I wrote a C# method that converts byte arrays to their hex representations. That's very simple actually - ToString does most of the grunt work, but some parameterizations help in customizing things.
public static string ToHexString(
    byte[] bytes, bool spacesBetweenBytes, bool upperCase) {
    StringBuilder sb =
        new StringBuilder(bytes.Length*(spacesBetweenBytes ? 3 : 2));
    string byteFormat =
        "{0:" + (upperCase ? 'X' : 'x') + "2}" + (spacesBetweenBytes ? " " : "");
    foreach (Byte b in bytes)
        sb.AppendFormat(byteFormat, b);
    // Cut off the last space if we were using spacesBetweenBytes
    if (spacesBetweenBytes && bytes.Length > 0)
        sb.Length--;
    return sb.ToString();
}
If the bytes parameter is an array of bytes 123 and 234, the hex representations by the combinations of the two boolean params are as follows:
| ToHexString w/ {123, 234} | spacesBetweenBytes false | spacesBetweenBytes true |
| upperCase false | 7bea | 7b ea |
| upperCase true | 7BEA | 7B EA |
I believe you have no trouble guessing what this overload does:
public static string ToHexString(int num, bool spacesBetweenBytes, bool upperCase) {
    return ToHexString(
        BitConverter.GetBytes(num),
        spacesBetweenBytes,
        upperCase
    );
}
Update 2004-11-21 9:00 UTC+2: I made the int version use BitConverter instead of implicitly setting the byte order (endianness). Sorry about the edit; I must've been asleep when writing the entry.
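Two related framework facilities are worth a short sketch. BitConverter.ToString produces an upper-case, dash-separated hex string out of the box, and BitConverter.IsLittleEndian tells you which byte order GetBytes will use on the current machine. The HexDemo class and its method names are hypothetical; reversing the array to get most-significant-byte-first output is just one way to make the int version independent of the platform's byte order.

```csharp
using System;

public static class HexDemo {
    // Built-in alternative: "7B-EA" style output, dashes swapped for spaces.
    public static string BuiltInHex(byte[] bytes) {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    // Bytes of an int with the most significant byte first,
    // regardless of the byte order of the machine running the code.
    public static byte[] BigEndianBytes(int num) {
        byte[] bytes = BitConverter.GetBytes(num);
        if (BitConverter.IsLittleEndian) {
            Array.Reverse(bytes);
        }
        return bytes;
    }
}
```

For the bytes 123 and 234, BuiltInHex returns "7B EA", matching the upperCase/spacesBetweenBytes cell of the table above.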
Another interesting MSDN article has been published. Using CLR Integration in SQL Server 2005 is a broad (and verbose) description of key aspects of CLR/T-SQL integration. Among the questions answered are:
[...] SELECT THIRD_BIGGEST(Age) FROM Person?)
Grab a cup of coffee (or whichever poison you prefer) and allocate at least half an hour if you intend to read the article in one go. It's heavy but definitely worth it. Some of the examples - such as the RSS reader implemented as a TVF - sound ridiculous at first, but think about it. All of this is going to change the way we program against databases and design our architectures. The start of the revolution is only a year away.
I finally finished (yeah right!) my personal Regex testing tool. With REpad, you can easily test both match/capture-type regexps and replacements. And when you've polished your regexp, you can easily do some string conversions (from/to C# strings and regex literals) through the context menu.
If you need this sort of tool, get the binaries or the sources. And once more, feedback is welcome.
I've finally gathered some of the example code I've posted on this blog as well as some other snippets from my code library. They are now available as a free code library, JHLib. The library currently contains the CSV Parser, ProperCase algorithm, HTTP upload code and a Pop3 client with an Rfc822 compliant header parser. Also, there's a demo application for each of those sections.
I'll be adding more code to the library as I have the time (there's still a lot of quick-and-dirty stuff I'd never want to show anybody :-)). You can stay up-to-date with the library changes by reading this blog; I'll be certain to post about any updates.
The lost treasures of .net class library, part I: The System.Collections.Specialized namespace has a few excellent classes that can help you (at least until you get generics, that is). Too bad many coders have never even heard of them. Here's a short introduction:
HybridDictionary and ListDictionary are just perf-tweaked Hashtable implementations. Fine, but not earth-shattering. But it gets better: StringCollection is essentially a typed ArrayList, or a flexible-size string array, if you will. StringDictionary is a Hashtable with the key strongly typed as a string. NameValueCollection is excellent for configurations and some other situations (by the way, it's actually used for this purpose by .net).
For some do-it-yourself-spirit, there's also NameObjectCollectionBase, a base class for your own string-keyed typed Hashtables. See the help for an example. And last but not least, CollectionsUtil with static method shortcuts for creating case-insensitive SortedLists and Hashtables.
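To make the introduction a bit more concrete, here's a minimal sketch of three of these classes in action. The class name SpecializedCollectionsDemo, the BuildSettings helper and the sample keys and values are made up for illustration; only the collection types themselves come from the framework.

```csharp
using System;
using System.Collections.Specialized;

// Hypothetical demo class; names and sample data are illustrative only.
public static class SpecializedCollectionsDemo
{
    // Builds a NameValueCollection roughly the way a settings section might be loaded.
    public static NameValueCollection BuildSettings()
    {
        NameValueCollection settings = new NameValueCollection();
        settings.Add("server", "localhost");
        settings.Add("tag", "alpha");
        settings.Add("tag", "beta"); // several values may share one key
        return settings;
    }

    public static void Main()
    {
        // StringCollection: a resizable list that only accepts strings.
        StringCollection names = new StringCollection();
        names.Add("Jouni");
        names.AddRange(new string[] { "John", "Jane" });
        Console.WriteLine(names.Count + " names, contains John: " + names.Contains("John"));

        // StringDictionary: string keys and string values; keys are lowercased internally.
        StringDictionary colors = new StringDictionary();
        colors["Sky"] = "blue";
        Console.WriteLine(colors["sky"]);

        // NameValueCollection: the indexer joins multiple values with commas,
        // GetValues returns them separately.
        NameValueCollection settings = BuildSettings();
        Console.WriteLine(settings["tag"]);
        foreach (string value in settings.GetValues("tag"))
            Console.WriteLine("tag value: " + value);
    }
}
```

Note the case-insensitive key handling of StringDictionary; it can be a pleasant surprise or a nasty one depending on what you store in it.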
Need to parse CSV (Comma Separated Values) files in C#? There are many solutions starting from the OLE DB adapter, but here's an easy-to-use CSV Parser written in pure C#: CSVReader.cs. Now, here's a quick tutorial.
First, let's recite the rules of CSV: Each line in a text file represents a record. The fields on each line are separated by commas. If a field starts with a double quote ("), the field ends when the next quote is encountered. If you need to embed a quote inside a quoted field, double it (""). Take for example the next trivial CSV file:
my fields,go,here
John said: "Don't move","""I won't"", he replied"
The first line parses into three separate fields ("my fields", "go", "here"). The second one is trickier, but it produces two values. You need to note that the quotes in the first field (John said: "Don't move") do not mean field boundaries. The behavior would be different if a double quote started the field, as it does for the second field ("I won't", he replied). This is why the quotes don't need doubling for the first field.
Now, the CSVReader class can be used to read the file like this:
using (CSVReader csv = new CSVReader(@"c:\myfile.csv")) {
string[] fields;
while ((fields = csv.GetCSVLine()) != null) {
Console.WriteLine("New CSV line begins");
foreach (string field in fields)
Console.WriteLine("CSV field: " + field);
}
}
And as you can guess, the code produces output like this:
New CSV line begins
CSV field: my fields
CSV field: go
CSV field: here
New CSV line begins
CSV field: John said: "Don't move"
CSV field: "I won't", he replied
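If you're curious how the quoting rules above play out in code, here's a minimal sketch of a single-line parser. It is not the code from CSVReader.cs: the CsvLineSketch class and its ParseLine method are hypothetical names, the sketch assumes C# 2.0 or later (for List<string>), and it ignores quoted fields that span several lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration of the CSV rules; not the actual CSVReader implementation.
public static class CsvLineSketch
{
    public static string[] ParseLine(string line)
    {
        List<string> fields = new List<string>();
        int i = 0;
        while (true)
        {
            StringBuilder field = new StringBuilder();
            if (i < line.Length && line[i] == '"')
            {
                // Quoted field: read until the closing quote, treating "" as one literal quote.
                i++;
                while (i < line.Length)
                {
                    if (line[i] == '"')
                    {
                        if (i + 1 < line.Length && line[i + 1] == '"')
                        {
                            field.Append('"');
                            i += 2;
                        }
                        else
                        {
                            i++; // closing quote
                            break;
                        }
                    }
                    else
                    {
                        field.Append(line[i]);
                        i++;
                    }
                }
                // Ignore anything between the closing quote and the next comma.
                while (i < line.Length && line[i] != ',') i++;
            }
            else
            {
                // Unquoted field: quotes inside it are ordinary characters.
                while (i < line.Length && line[i] != ',')
                {
                    field.Append(line[i]);
                    i++;
                }
            }
            fields.Add(field.ToString());
            if (i >= line.Length) break;
            i++; // skip the comma
        }
        return fields.ToArray();
    }

    public static void Main()
    {
        string line = "John said: \"Don't move\",\"\"\"I won't\"\", he replied\"";
        foreach (string field in ParseLine(line))
            Console.WriteLine("CSV field: " + field);
    }
}
```

The key design point is that the quote check happens only at the start of a field, which is exactly why the first field of the second line needs no doubling.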
As usual, feedback and/or bug reports are welcome.
Yet another snippet from my evergrowing code library: How to programmatically upload (possibly multiple) files to an HTML form. I won't post the full code here, but you can download a zip file with a Visual Studio solution including a test app. The upload code is in a separate file (HttpFileUpload.cs), so it's easy to use even without VS.
The HttpFileUpload class provides three methods: UploadFile, UploadByteArray and Upload. The first two call the last one and are just shortcuts to the generic behavior of the upload code. Let's only take a look at the generic method here; the other two are trivial to figure out by reading the in-code documentation.
The Upload method takes four parameters: the target url, a cookie container, a credential cache for passing logon information and the objects to be uploaded (as a params array of UploadSpec objects). UploadSpec objects can be constructed by passing either the pathname or the byte array. Note that even with the byte array form, you still have to specify a fictional filename for the receiving end (although leaving it empty isn't usually fatal). Also, you'll have to specify the name of the form field into which the uploaded data should be stuffed.
An example call:
byte[] someByteData = GetSomeBytes();
HttpFileUpload.Upload(
"http://localhost/myupload.cgi",
null,
null,
new HttpFileUpload.UploadSpec(@"c:\windows\win.ini", "file1"),
new HttpFileUpload.UploadSpec(someByteData, "myfile.exe", "file2")
);
I acknowledge there are still some additional features that might be needed in the future. In case I end up enhancing the library, I'll post an update. Meanwhile, feel free to use the code for whatever needs you have; feedback is naturally welcome.
Edit 2005-03-20: The code has been updated to support form variable posting. See the new post on the subject.
Ooph. A ridiculously busy two-week period is coming to an end, and not a single day too early. It's been work, studies and lots of pre-scheduled free-time activity in a fairly rough mixture. I really need the weekend this time.
One thing I'd like to pick and show from the spare hours of the last week: Dundas Gauge was released. I was part of the beta program, so I've been toying around with the product for quite some time already. If you don't already know Dundas, go take a look. They produce data visualization components - first charts and now gauges. The software is from the high end of both the price and quality scale.
As for me, I was direly in need of a spare time coding project. Even though we've already got a load of all sorts of system visualization apps, I just had to write my own. This one has full XML configurability and can bind any sort of gauge to any sort of performance counter or other data source. Ah, if I only had the time to finish it. If I do, you can be certain I'll let you know.
For some perverted reason, I HAD to try to write the best propercasing algorithm on Earth. This one does all of the following (highlights bolded):
jouni heikniemi -> Jouni Heikniemi
jouni von lederhosen -> Jouni von Lederhosen
THE EYE OF THE TIGER -> The Eye of the Tiger
1250 MHZ -> 1250 MHz
RoNaLD MCDoNaLD, USa -> Ronald McDonald, USA
Enough babble, the code is up next.
// CONFIGURATION:
// The following words will always be in lower case (except in the start of the string)
static string[] lowerCaseWords = { "of", "the", "and", "or", "a", "an", "von" };
// The following prefixes will cause their next character to be uppercased
// Note: Keep the first character uppercase when defining these; all else must be in lowercase
static string[] upperCasePrefixes = { "Mc", "O'" };
// The following words will be always presented in the case they have here.
static string[] fixedCaseWords = { "USA", "NATO", "MHz" };
/// <summary>
/// Converts the given string into ProperCase.
/// </summary>
/// <param name="original">The original string, f.e. "THE EYE OF THE TIGER"</param>
/// <returns>The string converted into ProperCase, f.e. "The Eye of the Tiger"</returns>
public static string ProperCase(string original) {
if (original == null || original.Length == 0) return "";
// Run the original through the massage word-by-word
string result =
Regex.Replace(original.ToLower(), @"\b(\w+)\b", new MatchEvaluator(HandleSingleWord));
// Always uppercase the first character
return Char.ToUpper(result[0]) + (result.Length > 1 ? result.Substring(1) : "");
}
// This helper method properizes (sp?) the case of a single word (regex match)
// NOTE: The input is in all lowercase as forced by the ProperCase method.
private static string HandleSingleWord(Match m) {
string word = m.Groups[1].Value;
// Is this word defined as all-lowercase?
foreach (string lcw in lowerCaseWords)
if (word == lcw)
return word;
// Is this word defined as a fixed-case word?
foreach (string fcw in fixedCaseWords)
if (String.Compare(word, fcw, true) == 0)
return fcw;
// Ok, this is a normal word; uppercase the first letter
if (word.Length == 1)
return Char.ToUpper(word[0]).ToString();
word = Char.ToUpper(word[0]) + word.Substring(1);
// Check if this word starts with one of the uppercasing prefixes
// Note: Only one of the uppercasing prefixes is applied
foreach (string ucPrefix in upperCasePrefixes)
if (word.StartsWith(ucPrefix) && word.Length > ucPrefix.Length)
return word.Substring(0, ucPrefix.Length) +
Char.ToUpper(word[ucPrefix.Length]) +
(word.Length > ucPrefix.Length + 1
? word.Substring(ucPrefix.Length + 1)
: "");
return word;
}
Afterwards, I spotted a tiny programming error. I don't think it's going to be seen in any production application, but it can produce a slightly wrong result in a certain situation. Can you spot it?
Starting new tasks (other programs) in .net is actually very easy. You just have to find the Process class, which somewhat surprisingly resides in the System.Diagnostics namespace. Once you're there, a simple call will go far: Process.Start("notepad.exe") does what you'd expect. And because the Windows shell has some built-in logic for various cases, Process.Start("http://www.google.com/") pops up your default browser, too. Now that's cool.
What if you want to run something as if you had typed it into the command interpreter's prompt? The solution is equally easy, you just have to know a couple of things. First, the location (full pathname) of the command interpreter is stored in the environment variable COMSPEC. Second, cmd.exe (the default command interpreter) accepts a /c parameter meaning "Run the rest of the command line and then exit" - exactly what we want. The following helper method packs up this functionality:
public static void ExecThroughCmdShell(string command) {
System.Diagnostics.Process.Start(
Environment.GetEnvironmentVariable("COMSPEC"),
" /c " + command
);
}
Call the method with syntax like ExecThroughCmdShell("dir /s /p c:\\windows"); to have a new shell window pop up. If you need to wait for the task to return before continuing, the Start method returns a Process object which has a WaitForExit method just for this purpose.
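For the waiting scenario, a minimal sketch could look like the following. The CmdShell class, the BuildStartInfo and ExecAndWait names and the fallback to plain "cmd.exe" when COMSPEC is missing are my assumptions for this example; the framework pieces (ProcessStartInfo, Process.Start, WaitForExit, ExitCode) are the real ones. It is obviously Windows-specific.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper; only the System.Diagnostics types are framework-provided.
public static class CmdShell
{
    // Builds the start info separately so it can be inspected without launching anything.
    public static ProcessStartInfo BuildStartInfo(string command)
    {
        string shell = Environment.GetEnvironmentVariable("COMSPEC");
        if (string.IsNullOrEmpty(shell))
            shell = "cmd.exe"; // assumption: fall back to the default interpreter name
        ProcessStartInfo info = new ProcessStartInfo(shell, "/c " + command);
        info.UseShellExecute = false; // start the process directly instead of through the shell
        return info;
    }

    // Runs the command, blocks until it finishes and returns its exit code.
    public static int ExecAndWait(string command)
    {
        using (Process p = Process.Start(BuildStartInfo(command)))
        {
            p.WaitForExit();
            return p.ExitCode;
        }
    }

    public static void Main()
    {
        int exitCode = ExecAndWait("dir c:\\windows");
        Console.WriteLine("Exit code: " + exitCode);
    }
}
```

With UseShellExecute set to false, a console application will typically see the command's output in its own window instead of a new one; drop that line if you prefer the popup behavior described above.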
Microsoft .NET Framework 1.1 Service Pack 1 is out. It's available in two variants: one for Windows 2000/XP and another for Windows Server 2003. Too bad the knowledge base links on the download pages don't work, so the fixes listing is a mystery so far. The overview says: "The primary focus of Microsoft .NET Framework 1.1 Service Pack 1 (SP1) is improved security. In addition, the service pack includes roll-ups of all reported customer issues found after the release of the Microsoft .NET Framework 1.1. Of particular note, SP1 provides better support for consuming WSDL documents, Data Execution prevention and protection from security issues such as buffer overruns."
Oh, and there's a SP3 for .net Framework 1.0, too. Right here.
Edit next morning: The KB articles are out now and linked to the download pages. There's a list of 66 bugs in the Win2000/XP 1.1 SP 1 list. I never encountered any of them, but there are a few pretty serious ones included.
Most people have a gut feeling about when to use StringBuilder for concatenation and when to just add strings together with the + operator. But what are the exact situations in which each of the approaches is better? When the question gets asked, people often give out overly simple rules such as "5 catenations". Is that really correct for the vast majority of cases? Of course, being the dubious me, I decided to test it and resolve the question once and for all.
The basic setting is this: StringBuilder.Append is faster than String + String. However, new StringBuilder() requires time. Now the question is: How many Append calls are required to have the speed benefit exceed the construction cost of the StringBuilder? Ultimately, the answer would be just one magic number. Unfortunately, in practice it isn't.
Here are the simplified conclusions. They shouldn't be taken literally, because situations vary and there's a code readability issue as well (most people read String + String more easily than sb.Appends). Regardless, for most cases these rules do provide the correct answer from a performance perspective.
I don't expect you to believe me any more than any other information source on the net. But to back up my claims a bit, I'll discuss the background of these results next.
String objects in .net are immutable. Once the string has been created, the value can't be changed. When you type s = s + "foo";, you actually discard the old s and create a new string object containing the result of the concatenation. When repeated several times, you end up constructing many temporary string objects.
StringBuilder, on the other hand, represents a mutable string. The class itself contains quite a few methods to change the contents of the string. This includes appending new strings to the end - the most common operation by far. Internally, StringBuilder reserves a buffer of memory which is used only partially at first (usually). Concatenations that fit into the buffer are just pasted in and the string length is changed. If the new resulting string wouldn't fit into the buffer, a new buffer is allocated and the old contents are moved in. In neither case do new string objects need to be created.
The sore points of StringBuilder are the construction cost (which makes the "magic number" practically always at least 3) and the cost of allocating a new buffer when the resultant string would exceed the current buffer size. The latter one explains why the preknowledge (or a good estimation) of the resultant string size helps so much: StringBuilder can just allocate a sufficient buffer once.
Testing this is actually pretty simple. Choose a string operation, implement it using both ways and repeat sufficiently many iterations while measuring the execution time. There are basically two factors involved: the length of the strings being handled and the number of concatenations. Few real-world scenarios use a fixed number of concatenations with fixed-length strings, so a very realistic test case would do real-world concatenations. However, constructing such a scenario isn't easy as it tends to add non-string operations into the loops, thus messing up timing.
Pure string concatenation loops are very rare in any case, so even if you're able to speed up your string operations by 50%, it's very unlikely your software will speed up that much. The point here is this: if you want absolutely best performance, measure it yourself - in your real-world scenario. However, a fair amount of testing on some of my applications has convinced me that the simple rules outlined above actually do hold up even with fairly varying material.
So, my test was essentially a loop of string concatenations with each iteration appending another string of predetermined length and content to a temp variable. I mostly varied the number of concatenations (iterations of the loop) to find out the cutoff point, but I also played with the string length. All tests were repeated 10 million times by an outer loop to provide better sampling. Everything was run on my AMD Athlon 2800+ with 1 GB of Memory, XP Pro and .net Framework 1.1.
The following source snippet shows the basic versions of the testing loops:
// String version
string s2 = new String('x', Int32.Parse(args[0]));
int loops = Int32.Parse(args[1]);
for (int j = 0; j < 10000000; j++) {
string s = "";
for (int i = loops; i > 0; --i)
s += s2;
}
// StringBuilder version
string s2 = new String('x', Int32.Parse(args[0]));
int loops = Int32.Parse(args[1]);
for (int j = 0; j < 10000000; j++) {
StringBuilder sb = new StringBuilder();
for (int i = loops; i > 0; --i)
sb.Append(s2);
sb.ToString();
}
The extra ToString call at the end of the StringBuilder version is there to level the field for the approaches: the first one's end result is a String, so it should be the same for the last one as well. Leaving that ToString out had a marginal effect on the results: while it did make an 8% difference with a single concatenation, the effect quickly died as the number of operations increased.
I started with 10-character strings, running from 1-50 concatenations (each repeated 10 million times as outlined above). The result is the chart below, displaying the relative execution times against the number of iterations (1-15). Absolute execution times aren't shown since they're hardly relevant.
The blue line is the performance of the pure String approach. It looks linear at first sight, but it isn't. If the String approach had to allocate space for X chars (where X is the length of the string being added, 10 here) per loop iteration, the time requirement would grow in a linear way. However, the amount of memory needed - and also, the amount of existing data being copied to the newly constructed string object - increases with every iteration. For the Nth iteration, the String version allocates space for N*X chars. Thus, every iteration is slower than the previous one, and the String time curve steepens quickly as N grows.
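To put numbers on that growth, here's a small worked calculation. It simply sums the N*X allocation sizes described above; the PlusConcatCost class and TotalCharsAllocated method are names invented for this sketch, and the model ignores object headers and other runtime overhead.

```csharp
using System;

// Hypothetical helper that adds up the allocation sizes of repeated += concatenation.
public static class PlusConcatCost
{
    // Iteration k creates a fresh string of k * x chars, so the total is
    // x * (1 + 2 + ... + n), which equals x * n * (n + 1) / 2.
    public static long TotalCharsAllocated(int x, int n)
    {
        long total = 0;
        for (int k = 1; k <= n; k++)
            total += (long)k * x;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(TotalCharsAllocated(10, 15)); // prints 1200
        Console.WriteLine(TotalCharsAllocated(10, 50)); // prints 12750
    }
}
```

So building a 500-char string from fifty 10-char pieces allocates 12750 chars worth of strings along the way, more than 25 times the size of the final result.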
The red line is StringBuilder at its basic settings. If you add a trendline, SB actually performs fairly linearly with increasing N*X. The bumps in the line are caused by the buffer allocations. Now, knowing how StringBuilder works in .net helps here: The default buffer size is 16 chars, and it's doubled each time it overflows. Remembering that X is 10 here, it's no big surprise that the bumps appear at 2 (after 16 chars), 4 (32), 7 (64) and 13 (128) iterations.
As you can see here, the first time the SB result is below the String result is at six concatenations. However, the memory alloc bump at 7 concats makes SB again slower than pure strings. After that, the results are clear: even though the bump at 13 catenations is considerable, it's still well below the blue line. The exact figures aren't that relevant, though, since the bump locations are closely tied to the number of chars gathered so far. With most normal strings, the cutoff point is somewhere between 4 and 8.
The green line represents a StringBuilder initialized to the size of the final string (using the StringBuilder's int-taking constructor). As you can see, this is the fastest approach by a very clear margin, and the cutoff is at three catenations! The obvious drawback here is that you have to know the buffer size beforehand, which you usually can't do. For the cases you do know it (such as this simple fixed-length scenario), it's blazingly fast. At 50 catenations with 10-char strings, it's 550% faster than pure String-based catenations and 35% faster than an uninitialized StringBuilder. The differences tend to grow as the size of the data increases.
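If you'd like to compare the three approaches on your own machine, a rough harness might look like this. It's a sketch rather than the exact program behind the chart: the ConcatComparison class, its method names and the repeat count are my choices here, and Stopwatch requires .net 2.0 or later. Don't expect the figures to match the ones quoted above; they depend heavily on the runtime and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical timing harness; names and repeat counts are illustrative.
public static class ConcatComparison
{
    public static string WithPlus(string piece, int count)
    {
        string s = "";
        for (int i = 0; i < count; i++)
            s += piece;
        return s;
    }

    public static string WithDefaultBuilder(string piece, int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
            sb.Append(piece);
        return sb.ToString();
    }

    public static string WithPresizedBuilder(string piece, int count)
    {
        // The final length is known exactly, so the buffer is reserved once.
        StringBuilder sb = new StringBuilder(piece.Length * count);
        for (int i = 0; i < count; i++)
            sb.Append(piece);
        return sb.ToString();
    }

    public static void Main()
    {
        string piece = new string('x', 10);
        const int count = 50;
        const int repeats = 100000;

        Stopwatch sw = Stopwatch.StartNew();
        for (int j = 0; j < repeats; j++) WithPlus(piece, count);
        Console.WriteLine("String +:         " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        for (int j = 0; j < repeats; j++) WithDefaultBuilder(piece, count);
        Console.WriteLine("Default builder:  " + sw.ElapsedMilliseconds + " ms");

        sw = Stopwatch.StartNew();
        for (int j = 0; j < repeats; j++) WithPresizedBuilder(piece, count);
        Console.WriteLine("Presized builder: " + sw.ElapsedMilliseconds + " ms");
    }
}
```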
The good thing is this: even a rough estimation of the resulting string size helps. If you overestimate the string size, you're allocating extra memory, but you're avoiding mid-loop buffer expansions. The extra memory allocation will slow you down at some point, but the effect may be negligible. If you underestimate the string size, you're going to have a buffer operation at some point. However, it's very likely you've still skipped early reallocations.
For example, if you're generating a 150 char string in 10 char increments (but you don't know these characteristics beforehand), initializing the StringBuilder with default values causes four buffer reallocations (16 -> 32, 32 -> 64, 64 -> 128, 128 -> 256). While initialization to 150 (or any larger value) would avoid the allocations altogether, even an initialization to a rough estimate such as 100 will help: you'll have only one realloc happening.
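The reallocation counts are easy to verify with a small simulation. The sketch below models only the policy described in this article (start at the given capacity, double on overflow, or jump straight to the needed size if doubling isn't enough); it does not inspect a real StringBuilder, and newer runtimes may well manage their buffers differently. BufferGrowthSketch and GrowthPoints are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical simulation of a doubling buffer; not tied to any actual runtime internals.
public static class BufferGrowthSketch
{
    // Returns the 1-based append numbers at which the simulated buffer had to grow.
    public static List<int> GrowthPoints(int initialCapacity, int pieceLength, int pieces)
    {
        List<int> points = new List<int>();
        int capacity = initialCapacity;
        int length = 0;
        for (int i = 1; i <= pieces; i++)
        {
            length += pieceLength;
            if (length > capacity)
            {
                // Assumed policy: double, unless even that is too small.
                capacity = Math.Max(capacity * 2, length);
                points.Add(i);
            }
        }
        return points;
    }

    public static void Main()
    {
        // 150 chars in 10-char increments with three different initial capacities.
        Console.WriteLine(GrowthPoints(16, 10, 15).Count);  // prints 4
        Console.WriteLine(GrowthPoints(100, 10, 15).Count); // prints 1
        Console.WriteLine(GrowthPoints(150, 10, 15).Count); // prints 0
    }
}
```

With the default capacity the growth points come out as appends 2, 4, 7 and 13, which matches the bumps visible in the red line.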
The moral of the story: Estimate whenever you reasonably can. Even a bad estimation will usually provide 10-20% benefit over a StringBuilder constructed with the default values. However, if your strings are very long, you'll want to read the following chapter first.
How about string lengths? Varying the string component length (X above) with a default StringBuilder has actually pretty little effect. For fairly short strings, the cutoff point is usually a bit lower, but this is largely caused by the fact that more short strings fit into the default StringBuilder buffer of 16 chars. However, the absolute gain here is usually irrelevant since the concatenations on short strings are very fast regardless of the method used.
The pure String-based concatenation slows down as the number of chars in the string grows. The worst scenario is many additions of short strings at the end of a long string. For example, when 2 chars get added at the end of a 500 char string, 99.6% of the memory allocated is for the old part of the string. Duh!
For StringBuilders, later buffer reallocs are slower, of course. More memory needs to be allocated and more old content needs to be moved around. So, the longer your strings become, the more you'll gain by estimating. For 50 catenations of 50-char strings, a perfect estimation gets you a 50% speed benefit over a StringBuilder with default settings!
However, there's a catch. As the memory allocations grow, the accuracy of your estimate plays a bigger and bigger role. Suppose we have the previously discussed 50x50 char string, resulting in a final size of 2500 chars. Now, the following table lists the execution times with different estimations. Times are relative to the default settings, so that the default is indicated by 100%; smaller figures mean faster execution (less time).
| Initial buffer size | Time |
|---|---|
| 16 (default) | 100 % |
| 50 | 97 % |
| 2000 | 88 % |
| 2499 | 104 % |
| 2500 | 49 % |
| 3000 | 53 % |
| 4000 | 62 % |
| 5000 | 103 % |
| 10000 | 268 % |
As you can see, if you can guess the final size of the resultant string, you're very fast - only 49% of the default execution time. However, make the buffer one char too small (2499 in this example), and you've just ruined your performance. Adding the last element doubles the buffer to 4998 chars, which has quite a lot of overhead in it. In the other direction, even a 60% overalloc at 4000 chars is pretty fast (only 62% of the original execution time). Unfortunately that costs memory, and with strings at the sizes of several megabytes, you probably can't afford that luxury.
On the other hand, you also saw that even slight underallocation wastes RAM eventually. Neither is the default approach perfect: always doubling the buffer tends to allocate extra space, too. So, slight overallocation might be both the fastest and the most memory-sparing approach unless you can do a perfect estimate.
Guessing is hard, but luckily the consequences of a bad guess aren't usually catastrophic. If you can avoid massive overallocation, you're not likely to do much worse than the default settings. In any case, the execution time without StringBuilder is 712% on the scale above; it's pretty unlikely you could do worse than that. :-)
StringBuilder performance is a tricky thing. In the last chapter you saw that the StringBuilder with perfect size estimation can be 15 times faster than normal string concatenation. But earlier in the article you also saw that even the default StringBuilder beats normal string catenation by a clear margin once the cutoff point of 4-8 concatenations is passed.
Except for the most critical string handling loops, optimizing the process to the point of making perfect estimations isn't usually worth it. For reasons of code clarity you might even want to avoid using StringBuilder when the amount of concatenations is only slightly over the cutoff point and you're working with an operation that's not critical to the millisecond level. For example, constructing a ten-part SQL statement is likely to be faster with StringBuilder, but the speed difference is negligible when compared to the execution time of that statement. Though, once you become familiar with the StringBuilder class, you'll be reading sb.Appends just like you read plus signs.
Suppose you want to do something special when the user hits enter in one of your Windows Forms app's TextBoxes? "Easy!", you say, thinking about hooking up a KeyDown event handler - until you try it and find out it doesn't actually work. And then you go Googling, just like I did earlier today.
And yeah, it's true, you can't catch keys reserved for form navigation with KeyDown or KeyPress events. Well, that blows. Of course, a quick search of the web turns up quite a few workarounds, so you're saved. Since I figured this out already, let me save you some time.
When a key is hit in a TextBox control, a method named IsInputKey gets called on the control. That method takes a Keys enumeration value as a parameter and returns a bool - true if that particular key is an "input key", false if not. False also means that the form handler will take care of (or ignore) the keystroke instead of passing it over to the control. And you guessed it, TextBox.IsInputKey returns false for Keys.Enter (this is different if you have a multiline textbox and AcceptsReturn enabled, but let's not go there now).
So, your problem is solved once you make your TextBoxes accept enter as an input key. The answer is subclassing the textbox, i.e. creating your own control. That's not really as bad as it sounds. In fact, you can get away with just this:
public class EnterTextBox : TextBox {
protected override bool IsInputKey(Keys key) {
if (key == Keys.Enter)
return true;
return base.IsInputKey(key);
}
}
Now start using your new textbox controls. If you're using no IDE at all or you have Visual Studio, just go and substitute new EnterTextBox() for new System.Windows.Forms.TextBox(). VS's Form Designer seems to handle this quite nicely. If you're using #develop, you can't use that shortcut - #develop forms designer will wipe away your textboxes. Luckily your EnterTextBox will have appeared in the Custom Components section of the toolbox, so it's only a matter of dragging some controls.
As you can guess, the following event handler now works properly:
void TextBox1_KeyDown(object sender, KeyEventArgs e) {
if (e.KeyCode == Keys.Enter) {
e.Handled = true;
MessageBox.Show("Enter hit in textbox1!");
}
}
Remember to set Handled to true to signal the textbox that the keystroke has actually been dealt with already.
If you did the Googling part, you probably saw there are a few other solutions to this as well. One of them involves overriding ProcessDialogKey, and another common one is overriding the WndProc handler. They both get notifications for enter hits regardless of what IsInputKey returns. If you have a static definition for what enter should mean (for example, always clear the field), you could use one of those approaches. However, if the action required varies by field, you should use the solution above to override IsInputKey instead - that way you can hook into the KeyDown event to customize behavior on an instance-by-instance basis.
What's new in Windows Forms for Whidbey? What did they change for System.XML? How about the ASP.net object infrastructure? There's a good amount of Whidbey related news out there (particularly in the MS blogs), but for another kind of view try the .Net2TheMax .NET Browser. It's a web site that allows you to wander through Whidbey Beta 1 assembly hierarchy and see what has changed or been added since 1.1. For example, take a look at this list of changes in the string class.
The UI is not perfect and you have to have an idea of the internal structure of .net (for example, most of the interesting stuff in System namespace lies in the mscorlib instead of the more logically named assembly called System), but a little bit of exploring will certainly give you an idea on what's new - and it'll also give you good keywords for Googling for further information. Happy hunting!
Earlier in the summer, Microsoft launched the ISV Buddy Program - that is, a program that allows Independent Software Vendors to get a contact person inside Microsoft. The contact person is an "insider", providing the ISV with quick access to resources and inside information. For Microsoft, it's a sort of extended support and of course, good publicity. And of course, it gives them plenty of information about their customers - a valuable asset indeed. Somasegar's blog entry gives some additional views on the subject (from a MS perspective).
Again, Microsoft deserves some praise on this move. Although there is fairly limited experience of the program so far, the concept is pure gold. Even without a formal program, many of us have built good person-level contacts with employees at clients, subcontractors, administration - and even big software corporations like Microsoft. And all of us probably understand the importance of such relations - or at least you will once you've received some help for solving your tough problems.
Another view of this is "In Open source projects, you don't have to find yourself an insider - you can be one yourself". That's true, and a really good point. For some tools, becoming the insider and the professional yourself is the best solution. However, for other pieces of software, you just can't invest the time to get yourself all that knowledge by yourself. For many pieces of open source software you can find decent support, but for most of them, you can't find reliable support. At least for free.
The debate between open and closed source aside, any step Microsoft takes towards personal contacts and responsible customer support is good. Even though the issues with closed source remain, it's another step towards a much more open approach. And to sum it up, I'll make a bold claim: for most developers, the key benefits of open source lie more in the open development process than the availability of the source itself.
The .net framework 2.0 comes with quite a few exciting new features. One of the less visible, yet very useful enhancements is String.Split's ability to split by a group of strings; in 1.1 you could only split by single characters. In 2.0, you can say "a=b and c=d".Split("and").
You can also give Split a new StringSplitOptions enum parameter where you can specify options related to splitting. The only option existing at the moment (2.0 beta 1) is RemoveEmptyEntries. Naturally it just removes empty elements which occur if separators immediately follow each other. I'm pretty surprised the framework team didn't bother with "CaseInsensitive" and "TrimResult" options, too: CaseInsensitivity exists in many string functions now (with 2.0 the support got added to StartsWith and EndsWith, too), and trimming all the split results is often useful when splitting natural text by words (which tend to have spacing around them).
As it is, case insensitive string splits must still be done using Regex.Split, which of course allows you to make the Regex case insensitive - and it's easy to make the Regex ignore the spacing as well. The following code sample is a good illustration:
string text = "a=b OR c=d and e=f";
text.Split(new string[] { "and", "or" });
// produces "a=b OR c=d ", " e=f" (note the whitespace)
Regex.Split(text, "\\s*(?:and|or)\\s*", RegexOptions.IgnoreCase);
// produces "a=b", "c=d", "e=f"
Regular expressions save the world again, but a properly equipped Split would be so much easier to use...
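In the meantime, the missing "TrimResult" behavior is easy enough to approximate with a small helper. The sketch below is my own illustration (SplitHelpers and SplitAndTrim are made-up names) and it uses the Split(string[], StringSplitOptions) overload found in the released 2.0 framework, which may differ from what beta 1 offers. It's still case sensitive, so the Regex route remains the answer for mixed-case separators.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper approximating a "TrimResult" option for String.Split.
public static class SplitHelpers
{
    public static string[] SplitAndTrim(string text, params string[] separators)
    {
        string[] raw = text.Split(separators, StringSplitOptions.None);
        List<string> result = new List<string>();
        foreach (string part in raw)
        {
            // Trim first, then drop empties; RemoveEmptyEntries alone would
            // keep parts that consist of whitespace only.
            string trimmed = part.Trim();
            if (trimmed.Length > 0)
                result.Add(trimmed);
        }
        return result.ToArray();
    }

    public static void Main()
    {
        foreach (string part in SplitAndTrim("a=b and c=d and  e=f", "and"))
            Console.WriteLine("[" + part + "]");
    }
}
```

Keep in mind that a plain string separator also matches inside longer words, so natural text with words like "band" would need the word-boundary handling that only a regex gives you.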
For a listing of other new stuff in the framework class library 2.0, see the article on BCLTeam blog.
My article Implementing Perl-style list operations using C# 2.0 is now public on CodeProject. Go read if you want new tools for your array/list toolbox.
Microsoft has announced the product line overview of Visual Studio 2005. Since professional developers are already used to the full VS experience, the most interesting part is the Express product line, aimed at "beginning programmers and non-professional developers". Although the licensing conditions and final pricing remain to be published, it's rumoured that the price tag would be in the 49 - 99 $ range or 40 - 85 euros approximately.
I've been testing the Visual C# 2005 Express Edition Beta, and I must say this: If the price ends up at the lower end of that rumoured range, it's going to be a hit. And it's going to hit the competition, Borland mostly. The VS line has gathered such a community behind it (take a look at CodeProject or GotDotNet just for examples) that it's going to be increasingly hard for Delphi to compete. I'm not even mentioning C#Builder here -- last I tried it, I quickly fled back to my text editor (not even VS at the time).
Even though Microsoft talks about beginners, I don't think most people are going to run out of features on VC#2005 Express even if they were quite able programmers indeed. I mean, these feature tables are just an illusion: for the most part, they don't contain rows that have "Yes Yes Yes" - elementary features such as syntax highlighting or IntelliSense are taken for granted. Now, if we compare VC#2005EE to, say, Turbo Pascal 5.5 of the early 90s, VC# is packed with features nobody ever even dreamed about back then. Yet, people created massively complex software with those tools. For most everyday programming tasks, we've crossed the border where the programming IDE stopped being a hindrance quite a long time ago - it's now a question of the developer's ability to develop and handle immense abstract structures.
By no means am I trying to say there's no longer a need for tool development. VS2005 is much better than 2003, and there's still much room for improvement. But still, looking at feature charts makes you unnecessarily greedy. Those tools don't usually make you a better programmer, which is - in the end - The Thing required to successfully create and maintain pieces of non-trivial software, be it for commercial or non-commercial purposes.
The next logical step here will be providing the Express tools for free. I'm looking forward to it. But even now, I'd be ready to toss 50 of my own euros to get the dev environment VC#2005EE provides.
After installing .net framework 2.0 beta 1 a while ago, I've been wanting to perf test C# generics to see if the speed increase over untyped containers is noticeable. I'm surprised at how little effect they had in the tests, but on the positive side, the Whidbey framework seems considerably faster than 1.1 anyway.
Testing part 1 was done by compiling the following on both 2.0 and 1.1:
using System;
using System.Collections;

class MyClass : IComparable {
    public readonly int x;
    public MyClass(int x) { this.x = x; }
    // Untyped comparison: the argument has to be cast back to MyClass.
    public int CompareTo(object o) {
        return this.x.CompareTo(((MyClass)o).x);
    }
}

class Program {
    static void Main(string[] args) {
        ArrayList a = new ArrayList();
        Random r = new Random();
        for (int i = 0; i < 1000000; ++i)
            a.Add(new MyClass(r.Next()));
        a.Sort();
    }
}
The code simply creates custom objects and sorts them. On framework 1.1, running the program took about 4.3 seconds (average of 10 repetitions). On framework 2.0, the exact same source produced an executable with a running time of 2.7 seconds - that's a 37% improvement just by switching the framework version!
At this point, I was expecting quite a lot from generics. So I changed the code a bit:
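The post doesn't say how the runs were timed. One way to get an "average of 10 repetitions" figure from inside the process is sketched below. It assumes System.Diagnostics.Stopwatch, which is new in 2.0, so a 1.1 build would need something like Environment.TickCount instead. TimingSketch, RunOnce and AverageMilliseconds are illustrative names, and the numbers it prints are not comparable to whole-program timings, which also include process startup.

```csharp
using System;
using System.Collections;
using System.Diagnostics;

class MyClass : IComparable
{
    public readonly int x;
    public MyClass(int x) { this.x = x; }
    public int CompareTo(object o) { return this.x.CompareTo(((MyClass)o).x); }
}

class TimingSketch
{
    // The workload from the test: fill an ArrayList with random objects and sort it.
    static void RunOnce(int count)
    {
        ArrayList a = new ArrayList();
        Random r = new Random();
        for (int i = 0; i < count; ++i)
            a.Add(new MyClass(r.Next()));
        a.Sort();
    }

    // Hypothetical helper: average wall-clock time of several runs, in milliseconds.
    public static long AverageMilliseconds(int count, int repetitions)
    {
        long totalMs = 0;
        for (int rep = 0; rep < repetitions; ++rep)
        {
            Stopwatch sw = Stopwatch.StartNew();
            RunOnce(count);
            sw.Stop();
            totalMs += sw.ElapsedMilliseconds;
        }
        return totalMs / repetitions;
    }

    static void Main()
    {
        Console.WriteLine("Average: {0} ms", AverageMilliseconds(1000000, 10));
    }
}
```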
using System;
using System.Collections.Generic;

class MyClass : IComparable<MyClass> {
    public readonly int x;
    public MyClass(int x) { this.x = x; }
    // Typed comparison: no cast needed.
    public int CompareTo(MyClass o) {
        return this.x.CompareTo(o.x);
    }
    public bool Equals(MyClass o) { return this.x == o.x; }
}

class Program {
    static void Main(string[] args) {
        Random r = new Random();
        List<MyClass> l = new List<MyClass>();
        for (int i = 0; i < 1000000; ++i)
            l.Add(new MyClass(r.Next()));
        l.Sort();
    }
}
The IComparable now uses a generic typed version, and I've replaced the untyped ArrayList with the generic List<MyClass>.
For a second round of tests, I added a loop that reads the sorted container through with foreach:
long sum = 0;
foreach (MyClass mc in a) { sum += mc.x; }
With the loop in place, .net framework 1.1 used about 4.6 seconds; 2.0 did it in 2.9 secs and the generic version clocked 2.5 seconds. So, reading the array through in a foreach loop didn't really make a difference - the relative speed differences stayed the same.
One shouldn't be too disappointed with the performance of generics, though: The speed increase from 1.1 to the generics version is a whopping 46% - the fact that even untyped containers got quite a speedup doesn't really make generics worse. Couple that with the better syntax and less risk for nasty runtime errors, and I think we'll find generics quite useful indeed.
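The "less risk for nasty runtime errors" part is easy to demonstrate. The sketch below is an illustration only (TypeSafetyDemo and TrySum are made-up names): an ArrayList happily accepts a value of the wrong type and fails only when the element is cast back, whereas the equivalent mistake on a List<int> is rejected by the compiler.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class TypeSafetyDemo
{
    // Hypothetical helper: sums an untyped list, reporting failure instead of throwing.
    public static bool TrySum(ArrayList items, out int sum)
    {
        sum = 0;
        try
        {
            foreach (int n in items) // hidden cast from object to int per element
                sum += n;
            return true;
        }
        catch (InvalidCastException)
        {
            return false;
        }
    }

    static void Main()
    {
        ArrayList untyped = new ArrayList();
        untyped.Add(42);
        untyped.Add("forty-two"); // compiles fine: ArrayList stores object

        int sum;
        if (!TrySum(untyped, out sum))
            Console.WriteLine("ArrayList: failed at run time");

        List<int> typed = new List<int>();
        typed.Add(42);
        // typed.Add("forty-two"); // would not compile: a string is not an int
        Console.WriteLine("List<int>: " + typed.Count + " item(s)");
    }
}
```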
The articles for MSDN Magazine 8/2004 are out. Nothing particularly dazzling at this time, unless you're interested in SQL Server Reporting Services, Genetic Algorithms, Sharepoint, ADO.net, ASP.net and Windows Forms internals and, uh...
I guess I just summarized my main problem with MSDN Magazine: the breadth and depth of the articles in every freakin' issue is baffling. Although it's nice to have all that information sitting in your bookshelf even if you don't actively read all of it, month by month it's becoming harder - not to mention more time-consuming - to digest even a fraction of the contents. But I guess this is the way things are now; or, as Joel On Software puts it: "No developer with a day job has time to keep up with all the new development tools coming out of Redmond, if only because there are too many dang employees at Microsoft making development tools!"