More about encrypted backup
Richard Bejtlich linked to my last post here and several people have emailed me to suggest existing services or utilities which I might want to use. I have considered all of these; for a variety of reasons none of them satisfy my needs.
The first suggestion I received (from several different people) was Mozy Remote Backup. There are a few reasons I don't want to use this:
- Mozy doesn't support FreeBSD.
- It's not clear how efficient Mozy is at handling modifications in files. Mozy's overview talks about "Block-level differential backup", and their FAQ mentions that Mozy "Backs up only blocks that have changed"; but their Affiliate Program page mentions byte-level incremental backups, which are far better. The fact that Mozy's website isn't even self-consistent about such an important detail doesn't exactly inspire confidence either.
- Mozy advertises their "448-bit symmetric encryption"; this brings to mind Bruce Schneier's Snake Oil Warning Sign #5: Ridiculous key lengths (you'll have to scroll down a bit to find it), and an insightful observation I encountered a while back (I can't remember who made this observation, sadly): "Beyond 256 bits, the security of a system using symmetric encryption tends to be inversely proportional to the length of the key advertised". Indeed, there are a couple of points which give me direct cause to be skeptical about Mozy's security. First, they use Schneier's Blowfish cipher instead of his more recent Twofish cipher; while Blowfish was a remarkably solid block cipher for its time, there is no justification for using it now instead of Twofish or AES (Rijndael). Second, and far more disturbingly, Mozy recommends that you encrypt your data not with a key which you alone hold, but instead with a default Mozy key -- which essentially means that your data would not be encrypted at all. After such a monumental blunder, the fact that Mozy offers the option of encrypting data with my own key isn't going to make me trust them to get anything right as far as security is concerned.
- Finally, I don't like Mozy's tiered pricing. Computers are very good at keeping track of how much disk space / bandwidth / cell phone airtime / electricity / etc. people are using and sending out appropriate bills at the end of a month. Being asked to decide ahead of time whether I will want 5GB of storage, 30GB of storage, or 60GB of storage simply makes me think that Mozy's business model is dependent upon people paying for far more storage than they actually use.
The next suggestion I received (again, multiple times) was duplicity. This inspires rather more confidence than Mozy, but still has one critical limitation: It operates within the traditional model of "full backup + incremental backups" instead of a snapshotted model. This means that you can't delete a backup without making all the incremental backups taken after that point useless; I wouldn't be able to, for example, have hourly backups for the past week, daily backups for the past month, weekly backups for the past year, and monthly backups beyond that. It also slows down the process of recovering from the backups, since you would have to download the full backup and all of the incremental backups thereafter instead of simply restoring the latest snapshot directly.
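To make the retention schedule above concrete, here is a minimal sketch of how a client might decide which snapshots to keep, assuming every snapshot can be deleted independently of the others. The function name `snapshots_to_keep`, the bucket boundaries (7, 30 and 365 days), and the choice to keep the earliest snapshot in each bucket are all hypothetical; nothing here is taken from duplicity or any other existing tool.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(timestamps, now):
    """Thin a list of snapshot times: hourly for a week, daily for a
    month, weekly for a year, monthly beyond that (assumed boundaries)."""
    keep = {}
    for ts in sorted(timestamps):
        age = now - ts
        if age <= timedelta(days=7):
            bucket = ("hour", ts.year, ts.month, ts.day, ts.hour)
        elif age <= timedelta(days=30):
            bucket = ("day", ts.year, ts.month, ts.day)
        elif age <= timedelta(days=365):
            iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
            bucket = ("week", iso[0], iso[1])
        else:
            bucket = ("month", ts.year, ts.month)
        # Timestamps are visited oldest first, so the earliest snapshot
        # in each bucket is the one that survives.
        keep.setdefault(bucket, ts)
    return sorted(keep.values())
```

Everything not returned by this function could be deleted. Under a "full backup + incremental backups" model that deletion would break every later incremental, which is exactly the limitation described above.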
The third popular suggestion I received was Box Backup. The "Programmers(sic) Notes" included are a bit difficult to understand; it sounds like boxbackup does use some very complicated magic with its "encrypted rsync" to allow some old bits of files to be removed, but I'm not sure if this includes intermediate versions of backed-up files or only the versions which are the oldest at the time. The latter possibility is fine if you only really care about having a backup of the most recent version of everything, but it's not useful if you want (as I do) lots of recent backups but far less frequent older backups. Box Backup also leaks more information than I'm comfortable with; it allows the 0wner of the system on which the backups are being stored to identify:
- The structure of the directory tree,
- The number of files in each directory,
- Approximately how large each file is, and
- Which files have been modified.
UPDATE: See my more recent post for a clarification about Box Backup.
In short, I'm still not aware of any utilities or services which satisfy my backup wants. Any other suggestions? Please let me know.
Encrypted snapshotted remote backup
As most readers will be aware, I'm currently unemployed; as many of you have guessed, this is related more to having too many options than too few. In order to help me decide what I should do next, I'm looking for feedback from you, my readers: If an encrypted snapshotted remote backup service was available, would you pay to use it?
That's a lot of buzzwords; here's what I mean by them:
- Encrypted: When you configure the client code on your system, you provide a symmetric key. All of your data is encrypted with that key before it leaves your system; this would include not only file contents, but also metadata (file names, ownership, permissions, flags, the directory structure, et cetera). Obviously you would have to keep a copy of that key somewhere safe.
- Snapshotted: Every backup you performed would behave like a full backup; but the storage space used by many snapshots would be equal to the space required by one snapshot plus the differences between the snapshots. You would be able to delete or restore from any of the snapshots efficiently regardless of how many other snapshots you had taken. (FreeBSD users will note that this is essentially the same behaviour as snapshots exhibit on the UFS2 file system.)
- Remote: The backups would be sent over the Internet, thereby protecting them from fire, theft, or any of the many other events which cause problems when a system and its backups are sitting next to each other on a table.
- Backup: You probably already know what this means. You would run "tarsnap -c ..." to create a snapshot, "tarsnap -t ..." to view the files in a snapshot (e.g., so that you could figure out when you accidentally deleted a file), and "tarsnap -x ..." to extract all or part of a snapshot. Unlike tar, of course, you would also be able to run "tarsnap -d ..." to delete a snapshot.
- Service: Instead of buying hardware for storing backups yourself, you would pay per GB of bandwidth and per GB-month of storage for having your backups stored remotely. The price would most likely be around $0.25/GB-month of storage and $0.25/GB of bandwidth -- slightly higher than Amazon's Simple Storage Service, but of course S3 isn't an encrypted snapshotted backup service. :-)
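One way to get the "snapshotted" behaviour described in the list above is to store each distinct block of data once and count how many snapshots refer to it. The sketch below is only an illustration of that idea, not a description of how the service would actually be built: the class `SnapshotStore`, its method names, and the tiny fixed `BLOCK_SIZE` are all hypothetical, and it leaves out encryption entirely (a real client would encrypt blocks and metadata before anything left the system).

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and unrealistically small, to keep the example readable


class SnapshotStore:
    def __init__(self):
        self.blocks = {}     # digest -> block bytes, stored once
        self.refcount = {}   # digest -> number of references from snapshots
        self.snapshots = {}  # snapshot name -> {path: [digest, ...]}

    def create(self, name, files):
        """Record a snapshot of {path: bytes}; only new blocks use space."""
        if name in self.snapshots:
            raise ValueError("snapshot already exists: " + name)
        manifest = {}
        for path, data in files.items():
            digests = []
            for i in range(0, len(data), BLOCK_SIZE):
                block = data[i:i + BLOCK_SIZE]
                d = hashlib.sha256(block).hexdigest()
                self.blocks.setdefault(d, block)
                self.refcount[d] = self.refcount.get(d, 0) + 1
                digests.append(d)
            manifest[path] = digests
        self.snapshots[name] = manifest

    def extract(self, name):
        """Rebuild every file in a snapshot without touching other snapshots."""
        return {path: b"".join(self.blocks[d] for d in digests)
                for path, digests in self.snapshots[name].items()}

    def delete(self, name):
        """Drop a snapshot; free only blocks no other snapshot still uses."""
        for digests in self.snapshots.pop(name).values():
            for d in digests:
                self.refcount[d] -= 1
                if self.refcount[d] == 0:
                    del self.refcount[d]
                    del self.blocks[d]

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())
```

With this layout, two snapshots of a file that differ in one block cost one full copy plus that one block, and deleting the older snapshot leaves the newer one fully restorable. Fixed-size blocks are the simplest choice for a sketch, but they cope badly with data inserted in the middle of a file, so a real client would likely need something smarter.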
As a result of my background in algorithms and security, and my experience with bsdiff and portsnap, I think I'm ideally suited to produce such a service (and more importantly, the client code which would contain all the intelligence required for it -- given that all the data would be encrypted before being sent to the backup server, there isn't much opportunity for intelligence on the server side).
If such a service existed, I would certainly use it; this should not be very surprising, since this entire idea originated with me asking myself what I would like to see in a perfect backup system. I have no desire, however, to spend a long time creating such a service if I would be its sole user -- particularly given the aforementioned employment opportunities available. So please let me know if you would use such a service; I'd also be interested to hear how many systems and what total volume of data you would want to back up, as well as any other ideas you might have for what features "a perfect backup system" should have.
UPDATE: Tarsnap now exists and is in public beta.
Note to employers
Consider the following two offers:
- "We would like to hire you for the position of FooBar to work on Baz; you'll be working with Dr. A, Mr. B, and Mrs. C. We're willing to pay you $XXX,XXX per year, plus a YY% annual bonus and ZZZ shares in the company."
- "We would like to hire you for the position of FooBar; we'll pay you $XXX,XXX per year, plus a YY% annual bonus and ZZZ shares in the company."
If you want to hire me, offer me a job. Don't simply offer me money; it won't work.
FreeBSD Update
On August 31st, I committed the FreeBSD Update 2.0 build and client code to the FreeBSD CVS repositories. I'll be MFCing the client code to RELENG_6 tomorrow so that FreeBSD Update can be part of FreeBSD 6.2-RELEASE. This is the largest part of what I wanted to get done over the summer, so I'm glad that I've finished it.
Unfortunately, this ended up taking rather longer than I expected; part of this, as I mentioned earlier, is due to the slow feedback time of testing build code which takes two hours or more, while part of it is because it took me much longer than it should have taken to sort out how the client-side "IgnorePaths", "UpdateIfUnmodified", "AllowAdd", "AllowDelete", and "KeepModifiedMetadata" options should interact with each other. The questions of "What happens when users decide to delete sendmail?" and "What if a security or errata update creates a new file, and then a later update modifies that file?" caused me particular headaches. On the positive side, spending a long time thinking about issues like this means that I'm fairly confident that FreeBSD Update can gracefully handle whatever crazy situations users throw at it.
As a result of spending longer than I expected on FreeBSD Update, as well as spending time producing code for upgrading between FreeBSD versions -- expect to see more of this as the FreeBSD 6.2 release cycle starts -- I haven't had time for some of the work I wanted to get done on Portsnap. Even though my paid 4 months of FreeBSD development are over, I expect to continue spending most of my time working to get this done, at least for the immediate future: I have enough money left over from the summer that I don't need to get a new job immediately, and I'm still waiting for one or more job offers before I decide what I'll be doing for the next year, so I might as well take advantage of the time.