Code quality, part IV: Nameless abstractions are just complexity

I recently bumped into this:

foreach my $module ( grep( !/^Perl$/, $installed->modules() ) )
if ($module =~ /DBD::(.*)$/) {
$installed_DBD_modules{lc($1)} = 1;

For the non-perl-literate ones: The code block runs through the list of Perl's external code modules and collects the names of installed database drivers into a hash table. We now focus on the grep( !/^Perl$/, $installed->modules() ) part, which is the enumeration we're looping through. The modules() call (from an external code library) returns not just the list of installed modules, but also "Perl" itself. The grep statement is there to remove this, thus allowing the code inside the loop to just concentrate on the external modules only.

I see the grep statement as an abstraction layer. It hides (or attempts to hide) the fact that the data source also returns perl itself, not just external code modules. However, as it is implemented currently, I view the grep part as being nothing but unnecessary complexity – removing it wouldn't affect the result at all as the in-loop regex wouldn't match "Perl" anyway!

This particular abstraction layer has one problem: It doesn't really abstract out anything, as people cannot understand its operation without parsing the line in their heads. Now, if the grep statement and its contents were pushed out into a method such as "GetInstalledModuleNames()", things would turn out entirely differently. Also, it would make the code more easily maintainable as changes to the filtering method wouldn't affect the call site (or perhaps several of them) anymore.

Therefore, naming is essential. Often, naming means extracting code into methods. Don't avoid it. It's not a sin to have a method that gets called only once. Neither is it bad practice to have a method with but a few – or even just one – lines of code. Code length is irrelevant, readability rules. Most complex expressions (regexes, heavy boolean conditions, non-trivial string catenations etc.) are detrimental to code readability if embedded in a block of other code. Not so when they're well secluded in their own little methods.

Rule of thumb: Never mix trivial and non-trivial code in a single method. "GetInstalledModuleNames()" is always easy to understand, but its implementation may range from very easy (like here) to quite difficult (do the same in C++ for example!). The name is what makes a potentially complex operation trivial.

April 10, 2005 В· Jouni Heikniemi В· One Comment
Posted in: Misc. programming

One Response

  1. Anonymous (coward) - May 8, 2005

    How do you like
    grep {
    /DBD::(.*)$/ && ($installed_DBD_modules{lc($1)} = 1)
    } $installed->modules();
    Though one wonders why it would go into a hash with a counter. Can a module be installed more than once?
    I suppose there is a reason why this (which doesn't remove "DBD::" or convert to lowercase) wouldn't have been enough:
    @DBD = grep /DBD::/ $installed->modules();