What does $1265 of bugs look like?
Four months ago, I announced Tarsnap bug bounties ranging from $1 to $2000; and yesterday I released version 1.0.30 of the Tarsnap client code, which I'm calling the "bug bounty edition". Over four months I awarded 211 bounties totalling $1265; running the bug bounties has been a very interesting experience, and I'll be writing later about some of the lessons I've learned from it, but first I'd like to answer a simpler question: What does $1265 of bugs look like?
To start with, there's a lot of spelling mistakes: 53, by my count. Within a few hours of my announcing the bug bounties, someone had run a spell checker against all of the comments in the Tarsnap source code and found about half of these; the rest, consisting mostly of cases where one English word was replaced by another ("field" had turned into "filed", "from" into "fro", "its" into "it's", etc.) trickled in over the following months. While the automated spell checker doesn't serve my stated goal of "get people to look at the code", the fact that so many "wrong English word" typos were reported clearly demonstrates that people were looking at the code.
Then there's all the style bugs — about 70 instances of them. A multi-line comment with the final */ on the last line of text instead of on a new line. Comments describing a function foo() instead of foo(void). Single-sentence comments which weren't punctuated as sentences. A multiple-sentence comment with one space between sentences instead of two. Spaces at the start of a line where there should only be tabs. Whitespace at the end of a line. A surplus blank line. All of them trivial except perhaps to OCD sufferers — but all of them signs of careful reading.
Next is a group of 60 issues which I call "harmless bugs". In these, the code is absolutely wrong, but it's wrong in a way which really doesn't matter. A variable is declared and assigned a value but never used again: obviously wrong, but it's not going to cause any problems. A memory allocation is leaked in an error path which will result in the process exiting a few microseconds later. A timestamp of (time_t)(INT64_MIN), approximately 280 billion years before the Big Bang, is not correctly recorded in an archive. The memcpy function could be asked to copy zero bytes from NULL, which according to the C99 standard has undefined behaviour even though every C library in the world treats it as a no-op. This group of bugs took me the most time to handle, as there were a lot of them and in each case I had to look at the surrounding code to make sure that the error was in fact harmless.
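To make the memcpy case concrete, here is a minimal sketch of one way such a call can be made well-defined. The wrapper name memcpy_maybe_empty is invented for this illustration and is not taken from the Tarsnap source; the only assumption is that a zero-length copy should simply do nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical wrapper (illustrative only; not from the Tarsnap source):
 * copy len bytes from src to dst, skipping the call entirely when len is
 * zero so that memcpy is never handed a NULL pointer.
 */
void *
memcpy_maybe_empty(void * dst, const void * src, size_t len)
{

	/* Non-empty copies still require valid pointers. */
	assert(len == 0 || (dst != NULL && src != NULL));

	/* C99 leaves memcpy(dst, NULL, 0) undefined, so avoid calling it. */
	if (len > 0)
		memcpy(dst, src, len);

	return (dst);
}
```

A caller holding an empty buffer (a NULL pointer with a length of zero) can then pass it along without a special case at every call site.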
Finally there's 11 bugs which users could actually stumble across. Some of these are obscure — three relate to the handling of mtree files via @archive directives, and I'd be surprised if anyone has ever used Tarsnap's @archive directive with an mtree file — but a few have either been stumbled across already or are plausible enough that I'm very glad to have caught them before anyone stumbled across them:
- If Tarsnap failed to read a directory (e.g., if a hard disk was in the process of failing while Tarsnap was running), it didn't handle the error properly and could silently fail to archive files. (If we can't read a disk, we can't archive it; but we should at least exit with an error message!)
- If the directory Tarsnap is run from becomes inaccessible while Tarsnap is running and two or more paths are specified to be archived, Tarsnap could end up archiving the wrong files. (Tarsnap needs to return to where it started before it starts to archive the next path; if it can't do this, it should exit with an error message rather than looking for files in the wrong place.)
- If Tarsnap is killed (via ^C) at exactly the wrong moment after creating or deleting an archive, its cache directory could get into a state where future attempts to use Tarsnap would result in it immediately exiting with an error message. Ironically this bug lay in some code which was trying very carefully to ensure that Tarsnap's local state couldn't be damaged by a system crash happening at the wrong moment; the fix was trivial once I realized that the link(2) system call wasn't idempotent.
- If the tarsnap-keygen command was run with --machine '', i.e., an empty machine name, it would fail — as it should — but only after spending five minutes trying to register with the server — which it shouldn't. The fix, of course, was to detect and reject an invalid name on the client side, rather than sending it to the server and watching it reject the request as being nonsense.
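To illustrate what "link(2) isn't idempotent" means in practice: calling link a second time with the same arguments fails with EEXIST, so code which may be re-run after an interruption has to treat "the link is already there" as success. The sketch below is one common way of doing that; it is not the actual Tarsnap fix, and the names link_idempotent and link_twice_demo, the scratch directory under /tmp, and the file names are all made up for the example. It also assumes that src is a regular file rather than a symbolic link.

```c
#define _XOPEN_SOURCE 700

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the actual Tarsnap code): create dst as a hard
 * link to src, treating "dst already exists and is the same file as src"
 * as success so that the operation can safely be repeated.
 */
int
link_idempotent(const char * src, const char * dst)
{
	struct stat sb_src, sb_dst;

	if (link(src, dst) == 0)
		return (0);
	if (errno != EEXIST)
		return (-1);

	/* dst exists; succeed only if it is already a link to src. */
	if (lstat(src, &sb_src) || lstat(dst, &sb_dst))
		return (-1);
	if ((sb_src.st_dev == sb_dst.st_dev) &&
	    (sb_src.st_ino == sb_dst.st_ino))
		return (0);

	/* Some other file is in the way. */
	errno = EEXIST;
	return (-1);
}

/* Demonstration in a scratch directory; returns 0 on success. */
int
link_twice_demo(void)
{
	char dir[] = "/tmp/linkdemo.XXXXXX";
	char src[64], dst[64];
	int fd, rc;

	if (mkdtemp(dir) == NULL)
		return (-1);
	snprintf(src, sizeof(src), "%s/state.new", dir);
	snprintf(dst, sizeof(dst), "%s/state", dir);
	if ((fd = open(src, O_CREAT | O_WRONLY | O_EXCL, 0600)) == -1)
		return (-1);
	close(fd);

	/* A bare link(2) works once and then fails with EEXIST. */
	rc = link(src, dst);
	assert(rc == 0);
	rc = link(src, dst);
	assert(rc == -1 && errno == EEXIST);

	/* The wrapper reports success when the link is already in place. */
	rc = link_idempotent(src, dst);

	/* Clean up the scratch files. */
	unlink(dst);
	unlink(src);
	rmdir(dir);

	return (rc);
}
```

Comparing device and inode numbers is what distinguishes "we already did this" from "an unrelated file is in the way"; the latter is still reported as an error.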
So what does $1265 of bugs look like? A handful of serious bugs, and a handful more which are serious but sufficiently obscure that they would likely have gone for years without causing problems; sixty places where code was wrong but in ways which simply didn't matter; and well over a hundred things which were wrong yet didn't actually affect the compiled code.
But most importantly, $1265 of bugs gives me the peace of mind of knowing that I'm not the only person who has looked at the Tarsnap code, and if there are more critical bugs like the security bug I fixed in January, they've escaped more than just my eyeballs. Worth the money? Every penny.
What I meant to say...
Yesterday morning I was interviewed on Floss Weekly about my Tarsnap backup service and my work bringing FreeBSD to Amazon EC2. This was the first non-print news interview I've done since high school and the first live interview I've ever done, and it was a very interesting experience. I have considerable experience speaking at conferences, including two talks about FreeBSD on EC2, but I was surprised at how different being interviewed was: Rather than covering topics methodically, we jumped around a lot, making it hard for me to keep track of what I had said and what still needed saying.
Watching the video this morning I was struck by the number of times that I forgot to mention something or started to say something but lost track of where the sentence was headed before I got to the end of it; so without further ado, here's the "what I meant to say" errata for the interview:
What I said: I wanted to have the ability to run FreeBSD [in EC2],
particularly for my Tarsnap project [...] and eventually I managed to
get it working.
What I should have said: Even though I've gotten FreeBSD running
in EC2, I don't have the Tarsnap server code running on FreeBSD yet,
since FreeBSD/EC2 is still quite new. I'd be happy using FreeBSD/EC2
for many things, but for a backup service there needs to be a
very high bar for "make sure it works" before using something, and I
don't think FreeBSD/EC2 has had enough testing for me to be confident
that it's at that point yet.
What I said: The way EC2 works is, it's rent by the hour virtual
machines; you can go in there, say "I want five machines, ten machines,
a hundred machines for the next hour"; or rather you say "I want to
start a hundred machines right now" and then you can stop them at some
later point and then you only get charged for the number of hours that
you're actually using them.
What I should have said: There's also "spot" instances, where you
say how much you're willing to pay per hour of instance time, and at any
moment whoever is willing to pay the most for an instance gets it —
these are usually cheaper than the "on-demand" rate, but they carry the
risk that your instance will be killed off if someone places a higher bid.
A lot of people use these for batch jobs; I've personally found that these
are useful for testing FreeBSD images, since I could cheaply spin up 100
instances at a time to see how frequently certain panics occurred.
What I said: [Tarsnap] runs on basically everything POSIXy; so
FreeBSD, NetBSD, OpenBSD, a whole bunch of Linuxes, OpenSolaris [...]
cygwin as well, it works fine there; there's a few other more obscure
UNIXes that people have tried it on that it works fine on, I can't
remember all of them.
What I should have said: OS X is one of the "obscure UNIXes"
where people run Tarsnap.
What I said: The version of [bsdtar and libarchive] which Tarsnap
is currently using doesn't have very good Windows support; but newer
versions of those have good Windows support, so when I get around to it
I will be updating to a newer libarchive, at which point it should be
fairly straightforward to support Windows [...]
What I should have said: Also, I'm a FreeBSD developer, not a
Windows developer, so when I started work on Tarsnap I decided it was
better to start with what I knew — I'd rather have a thousand
happy UNIX users than a million Windows users who are unhappy because I
did things wrong when trying to write code for an unfamiliar environment.
What I said: The Tarsnap server code is not public [...]
What I should have said: ... but parts of it are. In particular,
I wrote my Kivaloo data
store because I wanted a better key-value store for the Tarsnap
server to use, and I released that under the BSD license at the end of
March. With the Tarsnap server code, just like the client code, I'll
open source whatever pieces I think can be independently useful.
What I said: As far as EC2 goes, I've been working on my own [...]
What I should have said: ... but there have been several great
developers working on the underlying FreeBSD/Xen codebase, and without
all the work they've done over the years there's no way I would ever
have been able to do the final few bugfixes and packaging gunk needed
to get FreeBSD into EC2. [This was a classic case of forgetting where
a sentence was going — I started the sentence thinking "I'll talk
about EC2 specifically and then contrast it with FreeBSD/Xen generally"
but got lost midway.]
Like most people, I've always frowned on the use of "talking points" by politicians; it seems that they discourage deep discussions of issues in favour of repeating set phrases ad nauseam, and when the talking points are prepared by political parties rather than by the speaker they can stifle any attempt to uncover the speaker's personal views. Today I've learned the other side: In a live interview, it's very easy to lose track of things and forget to mention important points. I don't know when I'm going to get an opportunity to do another interview like this, but when the time comes, I'm going to be prepared with a list of topics and important facts about them, so that I don't fail to mention vital work done by other developers, struggle to remember the name of a TV show, or forget that OS X is a flavour of UNIX.