Pipelined HTTP GET utility
While writing portsnap, I found myself in need of a utility for
performing pipelined HTTP. This differs from "normal" HTTP in that the
client can have several requests "in flight" at once on a single
connection instead of waiting for each response before sending the
next request. This can dramatically increase performance when a large
number of small files need to be downloaded. (This was the case with
portsnap, where downloading 300 files of 200 bytes each is not
unusual.)
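To make the idea concrete, here is a minimal sketch of the request side of pipelining: every GET request is written back to back into one buffer, which a client would then send over a single connection before reading any responses. The helper names (append_get, build_pipeline) and the fixed-size buffer are assumptions made for illustration; this is not the code phttpget uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append one GET request for `path` on `host` to buf, which has capacity
 * `cap` and currently holds *len bytes.  Returns 0 on success, or -1 if
 * the request does not fit.  Hypothetical helper, for illustration only.
 */
static int
append_get(char *buf, size_t cap, size_t *len, const char *host,
    const char *path)
{
	int n;

	assert(*len < cap);
	n = snprintf(buf + *len, cap - *len,
	    "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", path, host);
	if (n < 0 || (size_t)n >= cap - *len) {
		buf[*len] = '\0';	/* discard the truncated request */
		return (-1);
	}
	*len += (size_t)n;
	return (0);
}

/*
 * Build one buffer holding a request for every path, back to back.
 * A pipelining client sends the whole buffer on one connection and then
 * reads the responses, which HTTP/1.1 servers return in request order.
 */
static int
build_pipeline(char *buf, size_t cap, const char *host,
    const char *const *paths, size_t npaths)
{
	size_t len = 0, i;

	if (cap == 0)
		return (-1);
	buf[0] = '\0';
	for (i = 0; i < npaths; i++)
		if (append_get(buf, cap, &len, host, paths[i]) != 0)
			return (-1);
	return (0);
}
```

Calling build_pipeline with the paths "/a" and "/b" produces two complete requests in a row. A non-pipelined client would instead send the first request, wait for its response, and only then send the second, paying one round trip per file.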
As a result, I've written a minimalist pipelined HTTP client and added
it as an experimental feature to the latest version of portsnap; but
I've also decided to package it separately here in case anyone else
finds themselves in need of such code. Version 0.2 of phttpget is
available here.
Note that phttpget is currently extremely minimalist. In particular:
- Phttpget can only issue GET requests.
- Phttpget cannot download files larger than 2 GB (but this can be
  easily changed -- search for INT_MAX and replace it with something
  bigger).
- Phttpget blithely ignores HTTP errors and redirects... in fact, if
  the HTTP status code is anything other than 200, phttpget will skip
  over that file and move on to the next file.
- Phttpget ignores timestamps provided by the server. When it creates a
  file, the file's timestamp will be set to the current date, not the
  date provided by the server.
- Phttpget creates downloaded files in the current directory, with
  names equal to the final segment of the download path (i.e., if it
  downloads http://www.example.com/foo/bar/baz then it will create a
  file named baz in the current directory). Phttpget makes no attempt
  to check for symlinks or other nastiness. Do not use phttpget if any
  other user can write to your current directory!
- If you already have a file where phttpget wants to create a file, it
  will silently remove the existing file.
- I wrote phttpget in about 28 hours, and finished under 12 hours ago.
  It has had very little testing and probably still contains lots of
  bugs. (12 hours later: bugcount--. Version 0.1 had a deadlock when
  fetching a very large number of files due to a missing "break"; this
  is fixed in version 0.2.)
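Two of the behaviours listed above, skipping any response whose status is not 200 and naming the output file after the final path segment, can be sketched as follows. The function names (status_code, local_name) are hypothetical and the parsing is deliberately simple; phttpget's own code may well do this differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the three-digit status code from a status line such as
 * "HTTP/1.1 200 OK", or -1 if the line is malformed.
 */
static int
status_code(const char *line)
{
	const char *p;
	char *end;
	long code;

	assert(line != NULL);
	if (strncmp(line, "HTTP/", 5) != 0)
		return (-1);
	if ((p = strchr(line, ' ')) == NULL)
		return (-1);
	code = strtol(p + 1, &end, 10);
	/* Require exactly three characters, then a separator or the end. */
	if (end != p + 4 || code < 100 || code > 599)
		return (-1);
	if (*end != ' ' && *end != '\r' && *end != '\0')
		return (-1);
	return ((int)code);
}

/*
 * Return a pointer to the final segment of a URL or path, i.e. whatever
 * follows the last '/'.  The result is empty if the URL ends in '/'.
 */
static const char *
local_name(const char *url)
{
	const char *slash;

	assert(url != NULL);
	slash = strrchr(url, '/');
	return (slash != NULL ? slash + 1 : url);
}
```

A caller following the rules above would write a response body to local_name(url) only when status_code(line) == 200, and otherwise discard the body and move on to the next response. Note that this sketch does nothing about the symlink and overwrite hazards described in the list.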
If you use this code, please contact me, both because I'm interested
to know about these things and so that I can notify you in the
unlikely event of a security issue being discovered.