What does $1265 of bugs look like?
Four months ago, I announced Tarsnap bug bounties ranging from $1 to $2000; and yesterday I released version 1.0.30 of the Tarsnap client code, which I'm calling the "bug bounty edition". Over four months I awarded 211 bounties totalling $1265; running the bug bounties has been a very interesting experience, and I'll be writing later about some of the lessons I've learned from it, but first I'd like to answer a simpler question: What does $1265 of bugs look like?
To start with, there's a lot of spelling mistakes: 53, by my count. Within a few hours of my announcing the bug bounties, someone had run a spell checker against all of the comments in the Tarsnap source code and found about half of these; the rest, consisting mostly of cases where one English word was replaced by another ("field" had turned into "filed", "from" into "fro", "its" into "it's", etc.) trickled in over the following months. While the automated spell checker doesn't serve my stated goal of "get people to look at the code", the fact that so many "wrong English word" typos were reported clearly demonstrates that people were looking at the code.
Then there's all the style bugs — about 70 instances of them. A multi-line comment with the final */ on the last line of text instead of on a new line. Comments describing a function foo() instead of foo(void). Single-sentence comments which weren't punctuated as sentences. A multiple-sentence comment with one space between sentences instead of two. Spaces at the start of a line where there should only be tabs. Whitespace at the end of a line. A surplus blank line. All of them trivial except perhaps to OCD sufferers — but all of them signs of careful reading.
Next is a group of 60 issues which I call "harmless bugs". In these, the code is absolutely wrong, but it's wrong in a way which really doesn't matter. A variable is declared and assigned a value but never used again: obviously wrong, but it's not going to cause any problems. A memory allocation is leaked in an error path which will result in the process exiting a few microseconds later. A timestamp of (time_t)(INT64_MIN), approximately 280 billion years before the Big Bang, is not correctly recorded in an archive. The memcpy function could be asked to copy zero bytes from NULL, which according to the C99 standard has undefined behaviour even though every C library in the world treats it as a no-op. This group of bugs took me the most time to handle, as there were a lot of them and in each case I had to look at the surrounding code to make sure that the error was in fact harmless.
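To make the memcpy case concrete, here is a minimal sketch of one way to avoid the undefined behaviour. It is not taken from the Tarsnap source: copy_bytes is a hypothetical helper, and the only assumption is that a zero-length copy should never reach memcpy at all.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an illustration, not Tarsnap code): copy len bytes
 * from src to dst.  C99 requires both pointers passed to memcpy to be
 * valid even when len is zero, so skip the call entirely in that case;
 * a NULL src or dst with len == 0 then never reaches memcpy.
 */
void
copy_bytes(void *dst, const void *src, size_t len)
{

	if (len > 0)
		memcpy(dst, src, len);
}
```

A caller holding an empty buffer represented as (NULL, 0) can pass it to copy_bytes without relying on how a particular C library treats memcpy(dst, NULL, 0).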
Finally there's 11 bugs which users could actually stumble across. Some of these are obscure — three relate to the handling of mtree files via @archive directives, and I'd be surprised if anyone has ever used Tarsnap's @archive directive with an mtree file — but a few have either been stumbled across already or are plausible enough that I'm very glad to have caught them before anyone stumbled across them:
- If Tarsnap fails to read a directory — e.g., if a hard disk is in the process of failing while Tarsnap is running — it didn't handle the error properly and could silently fail to archive files. (If we can't read a disk, we can't archive it; but we should at least exit with an error message!)
- If the directory Tarsnap is run from becomes inaccessible while Tarsnap is running and two or more paths are specified to be archived, Tarsnap could end up archiving the wrong files. (Tarsnap needs to return to where it started before it starts to archive the next path; if it can't do this, it should exit with an error message rather than looking for files in the wrong place.)
- If Tarsnap is killed (via ^C) at exactly the wrong moment after creating or deleting an archive, its cache directory could get into a state where future attempts to use Tarsnap would result in it immediately exiting with an error message. Ironically this bug lay in some code which was trying very carefully to ensure that Tarsnap's local state couldn't be damaged by a system crash happening at the wrong moment; the fix was trivial once I realized that the link(2) system call wasn't idempotent.
- If the tarsnap-keygen command was run with --machine '', i.e., an empty machine name, it would fail — as it should — but only after spending five minutes trying to register with the server — which it shouldn't. The fix, of course, was to detect and reject an invalid name on the client side, rather than sending it to the server and watching it reject the request as being nonsense.
So what does $1265 of bugs look like? A handful of serious bugs, and a handful more which are serious but sufficiently obscure that they would likely have gone for years without causing problems; sixty places where code was wrong but in ways which simply didn't matter; and well over a hundred things which were wrong yet didn't actually affect the compiled code.
But most importantly, $1265 of bugs gives me the peace of mind of knowing that I'm not the only person who has looked at the Tarsnap code, and if there are more critical bugs like the security bug I fixed in January, they've escaped more than just my eyeballs. Worth the money? Every penny.