Think before you code
In the past ten days, I haven't written a single line of code for tarsnap. And I'm proud of it.
There has been a trend lately, particularly where internet startups are concerned, to measure success by the number of lines of code written. People talk about the all-important "first-mover advantage" and about things moving at "internet speed", while Paul Graham -- co-founder of Viaweb and of the Y Combinator startup incubator -- lists "Release Early" as #1 on his list of lessons for startups to learn.
In a recent article about Y Combinator, Paul Graham pointed to the fact that two people had written 40,000 lines of code in three months as a sign that they were doing something right; he went on to point out that "you never see that in a big company". To me this number is, if anything, a sign that things are going horribly wrong: Anyone who is consistently writing more than 5,000 lines of code per month is either (a) not working on a problem which is difficult enough to be interesting (any half-competent programmer can write binary searches, quicksorts, and depth-first graph traversals at a rate of thousands of lines of code per day); (b) utterly incompetent (we've all seen people who can replace ten lines of working code with a thousand lines of buggy code); or (c) going to have to throw out and rewrite most of their code once they realize that it doesn't solve the problem which they needed to solve -- with this realization most likely coming after their first release.
Of these three scenarios, I think it is the third which is the most dangerous, and -- with apologies to Fred Brooks -- I'm going to call it the "zeroth-system effect". In his book "The Mythical Man-Month", Fred Brooks notes that after a first system which works but has very few features typically comes a second system which is bloated with all the features which were considered but not included in the first system (and as a result of this, the second system invariably takes longer and is more expensive than expected). The zeroth system is what comes before the first system: Not only does it have very few features, but it doesn't work.
Of course, when I say that the zeroth system doesn't work, I don't mean that it doesn't successfully do anything useful; rather, I mean that it contains such fundamental bugs as to cast doubt upon whether the authors had any understanding of the field in which they were operating. An example of this is the recent cross-site scripting vulnerability in reddit: Cross-site scripting attacks were widely known and understood long before reddit was created, and the classic example of where XSS vulnerabilities can occur -- sites where users can post comments to be read by other users -- exactly matches this vulnerability.
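For readers who haven't seen the classic comment-posting case, here is a minimal sketch in Python of the standard defence: escaping user-supplied text before it is placed into HTML. The function name and markup are hypothetical, chosen only for illustration; this says nothing about how reddit's code actually worked.

```python
import html


def render_comment(author: str, body: str) -> str:
    """Return an HTML fragment for a user comment.

    Both fields come from untrusted users, so both are escaped before
    being interpolated into markup (hypothetical template, for illustration).
    """
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body, quote=True)
    return f'<div class="comment"><b>{safe_author}</b>: {safe_body}</div>'
```

A comment body such as `<script>alert(1)</script>` comes out as inert text rather than as a script tag which other users' browsers would execute.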
So what have I been doing for the past ten days? Improving my understanding of the field in which I'm about to write code -- namely, cryptography and cryptographic protocols. I'm about to start writing the client-server protocol code for tarsnap, and I want it to be secure: Based on the work I've already done, there's no danger of an attacker (assuming he hasn't stolen keys, can't factor 2048-bit integers, can't break AES, etc.) being able to decrypt or forge tarsnap backups, but the client-server protocol needs to be secure in order to prevent an attacker from (a) impersonating a client and deleting backups, (b) impersonating a client and wasting said client's money, or (c) impersonating a server and causing a client to think that data is being backed up (or that a backup is being deleted) when it isn't. Part of what I've concluded is that given the state of cryptographic libraries (i.e., there isn't anything available which is immune to side channel attacks) I'm going to use Elgamal key agreement instead of transporting session keys over RSA: Elgamal keys can be generated far more quickly (since they can all use the same modulus), so by bounding the rate at which information leaks through side channels and by frequently generating new keys, I can prevent any server key from being disclosed via a side channel attack. If I had jumped into writing code immediately, I wouldn't have been able to make such a reasoned decision.
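The reason Elgamal keys are cheap to generate can be seen in a toy sketch of the key agreement, written in Python. Everything here is an assumption made for illustration: the 127-bit modulus is far too small to be secure, the generator was not chosen with any care, the function names are hypothetical, and nothing below is constant-time or taken from tarsnap.

```python
import hashlib
import secrets

# Toy parameters, shared by every key. NOT secure: a real deployment would
# use a carefully chosen group with a modulus of 2048 bits or more.
P = 2**127 - 1  # a Mersenne prime, used here only to keep the numbers small
G = 3


def generate_keypair():
    """Generate (private, public); with a fixed modulus this is one modexp."""
    x = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
    return x, pow(G, x, P)


def derive_session_key(shared: int) -> bytes:
    """Hash the shared group element down to a 256-bit session key."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def client_agree(server_public: int):
    """Client side: pick an ephemeral key and compute the shared secret."""
    k, ephemeral_public = generate_keypair()
    shared = pow(server_public, k, P)
    return ephemeral_public, derive_session_key(shared)


def server_agree(server_private: int, ephemeral_public: int) -> bytes:
    """Server side: recover the same shared secret from the client's value."""
    shared = pow(ephemeral_public, server_private, P)
    return derive_session_key(shared)
```

Because generating a key pair is a single modular exponentiation rather than a search for new primes, the server can discard its key and call `generate_keypair()` again as often as it likes, which is what makes frequent rekeying practical.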
"Write code" is definitely important. "Release early", too. But more important than either of those is "Understand the problem you're trying to solve"; and most important of all: "Do it right".
Miscellaneous updates
In the past month, I have sat down to write entries here several times, only to end up deciding that I didn't have enough to say about the topic in question. Well, not enough times four probably is enough, so here's the miscellaneous news/musings update for the month.
On May 16 - 19 I was in Ottawa attending BSDCan'07, meeting all of the FreeBSD developers whom I hadn't seen since BSDCan'06, and giving a talk about FreeBSD Portsnap (my slides are available in PDF format). My talk was unofficially subtitled "a case study in black magic" --- unofficially since the conference T-shirts were already being printed when I came up with the subtitle --- and I very much hope that's what people take away from my talk (and from reading the slides, for those people who didn't attend the talk itself): Lessons which can be learned from Portsnap and applied to other problems. As usual, BSDCan was fantastic --- Dan is an amazing conference organizer --- and I'm sure BSDCan'08 will be even better next year.
While I was at BSDCan, many people asked me if I had followed up on my earlier musings concerning Encrypted snapshotted remote backups. The answer is yes: I decided to work on this instead of taking a job with a company which I can't name due to an NDA. At BSDCan, the status of my work was that I had a really great offline non-encrypted snapshotted backup system and was in the middle of putting the bits together for the encryption; the status is now that I have what I consider to be the world's greatest offline encrypted snapshotted backup system, and am working on the "online" code.
Speaking of cryptography, the past few weeks have reminded me why I don't like OpenSSL very much. Among its many other problems, OpenSSL:
- Doesn't have any documented mechanism for exporting and importing "raw" RSA key parameters (i.e., without base 64 encoding or similar nonsense). In my code I'm reaching into a "struct rsa_st" to access the BIGNUM fields directly, and hoping that OpenSSL won't change so much as to make this stop working any time soon.
- Doesn't have any documented mechanism for differentiating between "internal OpenSSL error" and "RSA signature is invalid" (or, less importantly, between "internal OpenSSL error" and "RSA-encrypted message is invalid"). In my code I'm looking at the error code returned by ERR_get_error, but this is not a documented solution (and is not guaranteed to work in the future -- OpenSSL has added new error codes as part of security patches in the past).
- Doesn't have any documentation at all for the AES_* family of functions. It's not hard to figure out how to use them, but still...
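To make the first complaint concrete, this is roughly what I mean by "raw" key parameters: the integers themselves as big-endian bytes, with nothing but a length in front. The sketch below is in Python and covers only a public key (n, e); the wire format and function names are hypothetical, and this is neither OpenSSL's API nor the code I'm actually using.

```python
import struct


def _pack_int(value: int) -> bytes:
    """Encode a non-negative integer as a 4-byte length plus big-endian bytes."""
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return struct.pack(">I", len(raw)) + raw


def _unpack_int(blob: bytes, offset: int):
    """Decode one length-prefixed integer; return (value, next offset)."""
    (length,) = struct.unpack_from(">I", blob, offset)
    start = offset + 4
    end = start + length
    if end > len(blob):
        raise ValueError("truncated key blob")
    return int.from_bytes(blob[start:end], "big"), end


def export_raw_public_key(n: int, e: int) -> bytes:
    """Serialize the RSA modulus and public exponent with no text encoding."""
    return _pack_int(n) + _pack_int(e)


def import_raw_public_key(blob: bytes):
    """Parse a blob produced by export_raw_public_key back into (n, e)."""
    n, offset = _unpack_int(blob, 0)
    e, offset = _unpack_int(blob, offset)
    if offset != len(blob):
        raise ValueError("trailing bytes after key")
    return n, e
```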
To conclude with a completely non-technical note, it seems that 2006/07 is the year of Colin's friends leaving Vancouver: Within a 12-month window, I have friends moving to London, Japan, Montreal, and Winnipeg. They all have perfectly good reasons for leaving, and they're all coming back to Vancouver sooner or later; but it's still a bit disconcerting seeing so many of my mid-20ish friends scattering.