Error-handling: a fable in code

One of my favorite domains to review in existing applications, because it tends to be so error-ridden, is … error-handling. Too many programmers regard a language’s exception-handling syntax as a solution rather than just a mechanism, so error-handling tends to be misguided or at least neglected. A little more attention in this area often pays off with far greater end-user satisfaction.

Perhaps the hardest part of handling errors is simply remembering that it is programming. I encounter many coders who appear to believe that it’s someone else’s job. In fact, handling errors should be a routine part of defining and fulfilling requirements. Here’s a parable about what often happens with even a single line of code:

True-life rework

An application needs to read a configuration file:

        fp = open(CONF, "r")
    

While this is Python, what happens next is equally likely in Java, JavaScript (maybe with cookies), Perl, C#, or other common languages. At this point, the application “works”, and attention moves on from this particular line to more pressing matters …

… until the day CONF goes missing, and an end-user sees a traceback on her screen. That is clearly not acceptable, and someone quickly rushes

          try:
              fp = open(CONF, "r")
          except:
              pass
      

into production while hunting down CONF. It turns out that the user had launched the application from a bookmark no one had considered (or had disabled cookies, customized the installation in an unexpected way, or done any of the other things end-users do). Organizational folklore concluded that the error was “fixed”, and someone elsewhere coded in protection against the bookmarking …

… until the next time an end-user was clever enough to re-create a similar situation. This time, instead of a traceback appearing, a distant part of the application broke down. Eventually, after too much debugging effort, the code in the vicinity of CONF was upgraded to

            try:
                fp = open(CONF, "r")
            except:
                alert_user()
                return
        

Business returns to normal …

… until the day an end-user sees a warning on his screen about bookmarks (or cookies, or missing initialization), and is more frustrated than ever, because he has already done what the warning advises. After more painful debugging, someone discovers there’s a rare possibility that CONF hasn’t been properly assigned. The coders begin to realize the hazard of a “naked except”, and qualify their handlers more carefully:

            try:
                fp = open(CONF, "r")
            except NameError:
                alert_user_about_initialization()
                return
            except IOError:
                alert_user()
                return
        

Problem solved …

… until the day a sysadmin rationalizes networking in the back office, and a critical file-share ends up with unexpected permissions. An end-user sees a warning about a condition that has nothing to do with the firewall change, and is utterly frustrated until someone recognizes that IOError covers a multitude of causes. Soon our CONF reader looks something like

            try:
                fp = open(CONF, "r")
            except NameError:
                alert_user_about_initialization()
                return
            except IOError as e:
                if e.errno == 13:    # EACCES: permission denied
                    alert_about_networking()
                elif e.errno == 2:   # ENOENT: no such file or directory
                    alert_user()
                return
            except:
                last_ditch_alert()
        

It’s still not done; this is far from the end. The last iteration above, of what started as a single line, would eventually turn up at least two more as-yet-undiagnosed problems.
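The “naked except” hazard that the coders ran into midway through the fable, and that the last iteration reintroduces in its final clause, is easy to reproduce. Here is a minimal Python 3 sketch. The function names and the bookmark message are hypothetical, and CONF is left unassigned on purpose:

```python
# Hypothetical reproduction of the "naked except" hazard. CONF is
# deliberately never assigned, so open(CONF, "r") raises NameError,
# a programming error, before any file is touched.


def read_conf_bare():
    """Early-iteration style: a bare except treats every failure alike."""
    try:
        with open(CONF, "r") as fp:
            return fp.read()
    except:  # deliberately bare, to show the hazard
        # Misleading: the real cause is an unassigned name, not a bookmark.
        return "Please check your bookmark."


def read_conf_qualified():
    """Catch only the I/O failures this code knows how to explain."""
    try:
        with open(CONF, "r") as fp:
            return fp.read()
    except OSError:  # IOError is an alias of OSError in Python 3
        return "Please check your bookmark."


print(read_conf_bare())  # the NameError is disguised as a user problem
try:
    read_conf_qualified()
except NameError as e:
    print("programming error surfaced:", e)
```

Qualifying the handler does not fix the unassigned name. It only lets the NameError reach a developer instead of reaching the user as a bookmark problem.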

Something is clearly wrong. Reaching this point took multiple upset end-users and too many late-night debugging sessions, and the “hot spot” of the initial open still is not “bullet-proof”.
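The numeric errno tests in the last iteration can be written more legibly in Python 3. There, open() raises subclasses of OSError, and IOError survives only as an alias. FileNotFoundError covers ENOENT, which is the e.errno == 2 case. PermissionError covers EACCES, which is the == 13 case, as well as EPERM.

The sketch below assumes a hypothetical read_conf() that returns a problem category instead of calling the alert functions. It makes the distinctions explicit, but it does not make the open “bullet-proof”:

```python
import errno


def read_conf(path):
    """Return (text, problem); exactly one of the two is None.

    `problem` is a short, hypothetical category that a caller could map to a
    user-facing message. Only OSError and its subclasses are caught, so
    programming errors such as NameError still propagate.
    """
    try:
        with open(path, "r") as fp:
            return fp.read(), None
    except FileNotFoundError:  # errno.ENOENT, the "e.errno == 2" case
        return None, "missing"
    except PermissionError:  # errno.EACCES (the "== 13" case) or errno.EPERM
        return None, "permission"
    except OSError as e:  # every other OS-level cause, named rather than dropped
        return None, "os-error " + errno.errorcode.get(e.errno, "unknown")


text, problem = read_conf("/no/such/dir/app.conf")
print(problem)
```

Because nothing catches more than OSError, an unassigned name or a typo still raises instead of turning into a misleading warning. The caller still has to decide what each category means to the user.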

This is the point in a tale where I like to present a solution with almost miraculous powers. For this problem, though, there isn’t one. In fact, “error-handling” is so thorny that I’ve already collected a book’s worth of material on the subject and its remedies. There are plenty of tips along the way (no bare except-s, for instance), and articles like “Robust exception handling” do a good job of explaining the basics. Even so, the general problem simply lacks a magical solution.

IT organizations need to recognize that “error-handling” demands its own analysis, requirements definition, testing, and maintenance. Customers pay for positive features, of course, not for nicely handled errors. Features and functionality need to come first. Still, in any particular session with an application, a majority of the user’s time, or at least attention, can go to its error-handling. Improvements in error-handling represent a great opportunity to eliminate distractions so that users can appreciate functionality. Often, the best way to help users see the value of the features in your programs is to make sure errors are handled professionally.
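One way to give error-handling its own requirements and tests is to write one test per failure mode. The sketch below shows how that might look with the standard unittest and unittest.mock modules, but it is an assumption, not a prescription. The names load_settings() and ConfigError are hypothetical. mock.patch stands in for the missing files and unexpected permissions that are hard to arrange on demand:

```python
import errno
import unittest
from unittest import mock


class ConfigError(Exception):
    """Hypothetical application error carrying a user-facing message."""


def load_settings(path):
    """Hypothetical loader: return the file's text or raise ConfigError."""
    try:
        with open(path, "r") as fp:
            return fp.read()
    except FileNotFoundError as e:
        raise ConfigError(f"Configuration file {path!r} was not found.") from e
    except PermissionError as e:
        raise ConfigError(
            f"No permission to read {path!r}; check the file-share settings."
        ) from e


class LoadSettingsErrorPaths(unittest.TestCase):
    """Each failure mode is treated as a requirement with its own test."""

    def test_missing_file_reported_as_missing(self):
        err = FileNotFoundError(errno.ENOENT, "no such file")
        with mock.patch("builtins.open", side_effect=err):
            with self.assertRaisesRegex(ConfigError, "not found"):
                load_settings("app.conf")

    def test_permission_problem_not_reported_as_missing(self):
        err = PermissionError(errno.EACCES, "denied")
        with mock.patch("builtins.open", side_effect=err):
            with self.assertRaisesRegex(ConfigError, "No permission"):
                load_settings("app.conf")

    def test_programming_errors_propagate(self):
        # A bug such as an unassigned name must not be swallowed.
        with mock.patch("builtins.open", side_effect=NameError("CONF")):
            with self.assertRaises(NameError):
                load_settings("app.conf")
```

Run the tests with python -m unittest followed by the module name. The third test records a requirement that the fable’s early fixes violated: programming errors must not be silently turned into user warnings.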
