FreeBSD + Vantec NexStar 3
A few days ago, I added an entry to the FreeBSD Developers Want List indicating that I would like to have a large hard drive and USB-attachable enclosure, in order to permit me to perform backups in a more sane manner. Santa (a.k.a. Daniel Seuffert) provided me with a 250GB Seagate Barracuda 7200.9 SATA2 hard drive and a Vantec NexStar 3 USB 2.0 enclosure, and since several people were curious as to how well this hardware was supported by FreeBSD, I thought I should provide a brief report.
The good: It works. I installed the drive into the enclosure, plugged in the power, and plugged the USB cable into my Dell D600 laptop, and FreeBSD 6.0-RELEASE-p4 recognized it immediately:
Jan 27 19:07:15 hexahedron kernel: umass0: Sunplus Technology Inc. USB to Serial-ATA bridge, rev 2.00/c4.fd, addr 2
Jan 27 19:07:15 hexahedron kernel: da0 at umass-sim0 bus 0 target 0 lun 0
Jan 27 19:07:15 hexahedron kernel: da0: <ST325082 4AS > Fixed Direct Access SCSI-2 device
Jan 27 19:07:15 hexahedron kernel: da0: 40.000MB/s transfers
Jan 27 19:07:15 hexahedron kernel: da0: 238475MB (488397168 512 byte sectors: 255H 63S/T 30401C)
I could then read and write to /dev/da0 just like I would any other 250GB hard drive. I could partition it, label it, create filesystems on it -- everything Just Worked.
The bad: It is a bit slow. USB 2.0 can, in theory, transmit data at 60 MB/s (480 Mbit/s), while according to StorageReview the Seagate drive has a transfer rate varying from 34.4 MB/s to 62.0 MB/s. Yet the transfer rate I obtained via USB was constant at approximately 25 MB/s across the entire drive.
FreeBSD's diskinfo -c explains the reason for the poor performance: command overhead. In contrast to my laptop's hard drive, where there is an overhead cost of 97 microseconds for a single read request, the USB-attached drive has an overhead cost of 730 microseconds. I imagine that this increased cost is largely due to the USB<-->SATA translation, but also partly due to my laptop's poor interrupt routing -- the USB controller is sharing IRQ 11 with several other devices, and the FreeBSD kernel needs to pick up the Giant lock to handle each interrupt.
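To see why a fixed per-command cost matters so much, here is a minimal sketch in Python of a simple model in which each read request pays a fixed overhead plus the time to move its data. The two overhead figures are the ones quoted above; the 64 KiB request size, the use of USB 2.0's theoretical 60 MB/s as the raw rate in both cases, and the function name effective_rate are assumptions made purely for illustration. The model ignores the drive's own limits and is not meant to reproduce the measured 25 MB/s.

```python
def effective_rate(request_bytes, overhead_s, raw_bytes_per_s):
    """Effective throughput in bytes/s when every request pays a fixed overhead.

    Model: time per request = overhead + request_bytes / raw_bytes_per_s.
    """
    time_per_request = overhead_s + request_bytes / raw_bytes_per_s
    return request_bytes / time_per_request


# Assumptions (not from the post): 64 KiB requests, and the same raw rate
# for both cases so that only the overhead term differs.
REQUEST_BYTES = 64 * 1024
RAW_RATE = 60e6  # USB 2.0 theoretical maximum, bytes per second

# Per-request overheads as reported by diskinfo -c in the post.
for label, overhead in (("laptop drive", 97e-6), ("USB drive", 730e-6)):
    rate = effective_rate(REQUEST_BYTES, overhead, RAW_RATE)
    print(f"{label}: {rate / 1e6:.1f} MB/s")
```

Under this model the overhead term dominates more as requests get smaller, so larger requests would be expected to narrow the gap between the two cases.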
The ugly: FreeBSD doesn't handle removal of drives very gracefully. When I unplug the USB cable, FreeBSD recognizes that the device is gone -- but if there is a filesystem mounted from the device, that filesystem remains mounted. FreeBSD doesn't want to unmount the filesystem, since it thinks the underlying device is busy; but at the same time you (obviously) can't do anything with that filesystem. If you ask FreeBSD to forcibly unmount the filesystem -- or if FreeBSD shuts down, at which point it forcibly unmounts every filesystem -- then it will panic.
I imagine that this could be fixed by teaching the kernel to forcibly unmount filesystems at the point when their underlying device is being removed (but before freeing the data structures associated with the device), but I'm not comfortable enough in the FreeBSD kernel to try to make that sort of change myself. In any case, there is a very simple answer to unplugging the drive while it has a filesystem mounted: Don't do that!
Canadian election results trivia.
Now that the results of the 39th Canadian general election are (mostly) in, I have looked through the numbers (helpfully provided by Elections Canada in CSV format) and pulled out some of the more interesting statistics:
- Closest ridings: In the riding of Parry Sound--Muskoka, the Conservative candidate, Tony Clement, is currently ahead of the Liberal candidate, Andy Mitchell, by 21 votes. Other close ridings (within 250 votes) are Louis-Hebert (Conservative 103 votes ahead of Bloc Quebecois), Desnethe--Missinippi--Churchill River (Liberal 106 votes ahead of Conservative), Winnipeg South (Conservative 110 votes ahead of Liberal), Glengarry--Prescott--Russell (Conservative 210 votes ahead of Liberal), and St. Catharines (Conservative 244 votes ahead of Liberal).
- Widest margins of victory: The widest margin of victory is in Crowfoot, where the Conservative candidate, Kevin Sorenson, is 39,134 votes ahead of the NDP candidate, Ellen Parker. The top 14 margins of victory are all Conservative wins in Alberta; the only other margin of victory of 25,000 votes or more is in Beauce, where the Conservative candidate, Maxime Bernier, is 25,918 votes ahead of the Bloc Quebecois candidate, Patrice Moore.
- Votes cast: Thanks to a growing population and increased voter turnout, 14,816,000 votes were cast, exceeding the previous record (13,667,671 votes cast, in the 1993 federal election) by over a million.
- Votes received by the winning party: The Conservative party received 5,371,000 votes, the third-largest total ever, after the Progressive Conservative party in 1984 (6,278,818 votes) and the Liberal party in 1993 (5,647,952 votes).
- Votes received by the second-place party: The Liberal party received 4,477,000 votes, the second-largest total ever for a losing party, after the Liberal party in 1979 (which received 4,595,319 votes -- almost half a million more than Joe Clark's Progressive Conservatives -- but came second in the number of seats).
- Proportion of votes received by the winning party: The Conservative party received 36.3% of the votes cast, the second-lowest proportion ever for a winning party, after the Progressive Conservative party in 1979 (which received 35.89% of the popular vote and formed a short-lived minority government, in spite of the Liberal party receiving 40.11% of the popular vote).
- The Green party did not win any seats, but did come second in the riding of Wild Rose, where Sean Maw trails the Conservative candidate, Myron Thompson, by 33,558 votes. The Green party came third in two ridings, Bruce--Grey--Owen Sound and Calgary West, and fourth in 223 ridings.
Note to media and blogs: Feel free to republish the above (in part or in whole), giving credit to Colin Percival or a link to this post.
Garbage collection is evil.
For several days I've been wrestling with a peculiar performance problem in Maple. In the Quadratic Sieve code I'm currently writing, I use external C code to perform the sieving -- that is, I have a QuadraticSieveSieveInterval() function which I wrote in C and call from Maple -- and the relations are collected and filtered in Maple. This allows me to keep the amount of C code needed to a minimum by using Maple for some of the messy initialization (e.g., computing modular square roots).
The performance problem arose in the "collecting relations in Maple" part. My code is roughly as follows:
while (Nrels < NumberOfRelationsWanted) do
    rels := QuadraticSieveSieveInterval( ... );
    for rel in [rels] do
        Nrels := Nrels + 1;
        rtab[Nrels] := rel;
    od;
od;
With NumberOfRelationsWanted equal to 30000, I noticed something very odd: If I commented out the "rtab[Nrels] := rel" line -- that is, if I counted the relations, but didn't store them -- then the code would be faster by roughly 150 seconds. However, after collecting all the relations, I could copy them all into a new hash table in under one second. Somehow adding the relations to a table while they were being generated was 200 times slower than adding the relations to a table after they had been generated.
After some exploration of Maple's profiling capabilities, I noticed that an unexpected function was (according to the profiler) using 15% of the total CPU time: the garbage collector. This made me immediately suspicious, since the most significant difference (aside from the much greater time taken) between throwing the relations away and collecting them in a table is that collecting them means that the total memory usage increases over time. With a bit more searching, I found a kernel option, "gcfreq", which controls the frequency with which Maple's garbage collector is called. The default value is "every million words allocated"; I changed this to "every hundred million words allocated", and suddenly my code was 160s faster -- even with the "store the relation in a table" operation which had been peculiarly slow, my code was now faster than it had been without that operation before.
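The same kind of experiment can be sketched outside Maple. The example below is a Python analogue, not Maple's mechanism: CPython's cyclic collector starts a collection when the count of container allocations minus deallocations exceeds a threshold, and gc.set_threshold raises that threshold much as gcfreq was raised above. The names collect_relations and timed, the shape of the stored objects, and the chosen sizes are all hypothetical, and whether the difference is noticeable depends on the Python version and workload.

```python
import gc
import time


def collect_relations(n):
    """Store n small container objects in a dict, mimicking rtab[Nrels] := rel."""
    table = {}
    for i in range(n):
        table[i] = (i, [i, i + 1])  # tuple and list are both tracked by the collector
    return table


def timed(n, threshold0):
    """Time collect_relations(n) with the generation-0 threshold set to threshold0."""
    old = gc.get_threshold()
    gc.set_threshold(threshold0, old[1], old[2])
    try:
        start = time.perf_counter()
        table = collect_relations(n)
        elapsed = time.perf_counter() - start
    finally:
        gc.set_threshold(*old)  # always restore the previous settings
    return len(table), elapsed


default0 = gc.get_threshold()[0]
for label, threshold0 in (("default threshold", default0),
                          ("raised threshold", 1_000_000)):
    count, elapsed = timed(200_000, threshold0)
    print(f"{label}: stored {count} items in {elapsed:.3f}s")
```

Timing the same loop under both settings is the point of the sketch: it is a cheap way to check whether the collector, rather than the table insertion itself, is responsible for a slowdown.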
I'm not sure quite why my code was causing the garbage collector to perform so poorly, but it might be related to the combination of very small memory allocations (used by Maple) and rather large memory allocations (in the sieving code itself). Whatever the cause, it's worth remembering that while garbage collection isn't always slow, it certainly can be slow and should be investigated as a possible cause of unexplained poor performance. J.K. Rowling remarked, via a character in the second Harry Potter book, that one should "never trust anything that can think for itself, if you can't see where it keeps its brain"; in much the same vein, I would suggest that one should never trust a programming language if you can't see where and how it allocates and deallocates memory.