Monday, October 22, 2007

Bugs. Infestation. Vulnerability. Gauging your Level.

A good way to start a war among software developers is to state that it's impossible to write bug-free software for anything more complicated than a "Hello World" program. And that goes for claiming the reverse as well. Though "Helo World" has even had its problems at times.

However, there are Bug Infestation Vulnerability Levels (BIVLs) that correspond to software development practices, and these can be a useful shorthand for identifying just how much you're wanting to work at, and spend on, bug eradication.

(First off, we do need to make some assumptions, like that the hardware works, and that the compiler correctly compiles, or that the virtual machine or interpreter correctly executes the byte codes or other constructs making up the program.)

OK, so let's start with ...

BIVL 0: Invulnerable.

Believe it or not, by leveraging formal mathematical techniques like Z Notation and the B Method, and by it now being practical to perform program verification to formally prove correctness, it is actually possible to write defect-free non-trivial applications, and incredibly enough, still actually be seriously productive in doing it, at least in a corporate, mission-critical environment.

But then when you think about it, why shouldn't you have a decent productivity rate when employing such formal practices? Once the code is written, compiled, and formally proven correct--meaning zero defects--there's no debugging (because there's no bugs, get it?), so there's no tracking down problems, no devising fixes, and no patch integration. Kinda knocks a lot of time off the test phases, eh?

Now whether what you write conforms to your requirements is another matter, but as far as software implementation goes, this is as close to defect-free software development as you're going to get, and it's no longer just for trivial applications.

BIVL 1: Best Effort

Despite all the proven benefits in saving time and effort involved with the use of formal methods and proving program correctness, it remains limited to a very, very small number of projects, usually just those with extraordinary safety or security requirements.

Frankly, the rest of us programmers just want to do some design and then get coding. For those who are really passionate about writing high quality software, but just can't get into the formal stuff, obsessing about doing everything else possible to keep bugs out of the code results in BIVL 1 software. That means:
  • Using a programming language that helps prevent or flush out errors
  • Employing assertions, design contracts, or the equivalent
  • Code inspections
  • Expecting any invocation of any external service to fail sooner or later
  • Distrusting the validity of everything that comes in from an external interface. Or from any code you didn't personally write. Or from code that you did personally write.
Making a BIVL 1 Best Effort is not defined in terms of the programmer's "best effort", i.e., the best they themselves are inherently capable of. It's actually putting into practice those objective programming practices listed above (along with others) that result in making the Best Effort possible to produce high quality software. Doing this conscientiously can even make a Terrible Programmer look pretty good.

BIVL 2: I've got to get this done tonight.

This, unfortunately, is the norm. You gotta get the code out the door, or over to test, and there just isn't time to add code to validate every external interface (which should all be working correctly anyway), or figure out what kinds of assertions to put in the code, and no one has any time to inspect anything; but hey, you're a good programmer, or at least "extremely adequate". Sure, you check for the stuff you know could fail, like a file not found, or losing a connection, but beyond that the likelihood of a function failing really is exceedingly rare.

And besides, all software has bugs.

At this point click over to your favorite software project failure statistics.

The obvious irony of this is that part of the reason there isn't time to put in all the checking and validation stuff is because of having to spend time fixing bugs that, well, could've been caught sooner and fixed more easily if more error checking and validation (and paranoia) had been incorporated earlier in the development cycle. So valuable time is spent going back and fixing bugs, and retesting, and reintegrating, and redelivering, and reviewing and prioritizing the never-ending stream of bug reports.

Why is it there's never time to do it right, but there's always time to do it over?

BIVL 3: "Hey, at least it didn't crash..."

If you're going to write code this crappy, why even bother? What are you hoping to accomplish?

One guy I worked with spent a week developing some aero data analysis functions, which then ended up on the shelf for a few months. He moved on to another department, and so when it came time to integrate these functions into the full system another developer picked them up. This latter developer decided to hand run some of the functions just as a sanity check, and kept getting incorrect results. When he dug into the code he discovered that the analysis was all wrong, and that the functions never came up with the right results. He ended up taking another week or so to rewrite it, this time verifying that the outputs were actually correct, instead of just non-zero. The original developer? I think he was ladder-climbing into management--probably just as well.

Maybe your software does happen to work correctly, but only when absolutely nothing unexpected happens, when all services and interfaces work properly and return only valid data, when nothing is at (or beyond) the boundary conditions, when all inputs fall within the realm of "normal" use.

And if something off-kilter does occur? Well, you have no idea how the software is going to react. Maybe it will crash, maybe it will start acting flaky, and maybe there will be absolutely no indication at all that something has been going awry for a long, long time.

What did you think was going to happen?

Summarizing the Bug Infestation Vulnerability Levels:
  • 0 - I have proven that nothing can go wrong (really, no joke).
  • 1 - I know everything that can go wrong.
  • 2 - I don't know what all can go wrong, but I'm pretty sure when it's going right.
  • 3 - Is this thing on?
The amount of time spent testing and integrating always exceeds the time spent coding. It's a tired refrain, but true nonetheless: Spending more time in coding, and conscientiously coding defensively, reduces test and integration time, and therefore overall development time.

Driving down your BIVL can actually reduce development time by markedly reducing the rework portion of the development schedule. There are techniques to drive that level down that result in increased code quality. Some are more formal than others, but the main requirement is recognizing that good software is difficult to write, and then being mentally prepared and disciplined and equipped with the right techniques and tools to manage that difficulty.

Good code is doable, and doable efficiently.

What's your Level?

No comments: