The implicit constant in the Quadratic Sieve.
The Quadratic Sieve is, in a generic sense, the second-best general-purpose algorithm for factoring large integers. It takes time roughly exponential in the square root of the length of the integer to be factored (the square root of the logarithm of the length is, for all practical purposes, a constant), while the Number Field Sieve takes time roughly exponential in the cube root of the length; however, the Quadratic Sieve has a smaller constant factor inside the exponent, with the result that the Number Field Sieve only becomes faster than the Quadratic Sieve somewhere around 110--140 digits.

Rather than talking about the Quadratic Sieve, however, I should really talk about a Quadratic Sieve, since there are several variations. First came the Multiple Polynomial Quadratic Sieve (which should really be called the Multiple Arithmetic Progression Quadratic Sieve), which uses the same polynomial f(x) = x^2 - N as normal, but instead of taking values of x close to the square root of N, takes values of x with x^2 < 2 N and within arithmetic progressions such that x^2 - N is a multiple of the square of a large prime. This provided a significant improvement since, by using different arithmetic progressions, the length of the sieving interval -- and thus the size of the squarefree parts which we hope will be smooth -- can be kept within reasonable limits.
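To make the basic setup concrete, here's a toy sketch of relation collection with f(x) = x^2 - N: build a factor base of small primes modulo which N is a quadratic residue, then look for values of x near sqrt(N) for which x^2 - N is smooth over that base. (All parameters and helper names here are my own illustrative choices, and trial division stands in for the actual sieving -- a real implementation sieves with logarithms rather than dividing.)

```python
from math import isqrt

def legendre(a, p):
    # Euler's criterion: a^((p-1)/2) mod p is 1 iff a is a QR mod odd prime p
    return pow(a, (p - 1) // 2, p)

def factor_base(N, bound):
    # Primes p <= bound for which x^2 = N (mod p) is solvable; other primes
    # can never divide x^2 - N, so sieving with them would be wasted work.
    primes = []
    is_prime = [True] * (bound + 1)
    for p in range(2, bound + 1):
        if is_prime[p]:
            for m in range(p * p, bound + 1, p):
                is_prime[m] = False
            if p == 2 or legendre(N % p, p) == 1:
                primes.append(p)
    return primes

def smooth_relations(N, fb, M):
    # Check x in [sqrt(N)+1, sqrt(N)+M) for fb-smooth values of f(x) = x^2 - N.
    x0 = isqrt(N) + 1
    relations = []
    for x in range(x0, x0 + M):
        v = x * x - N
        r = v
        for p in fb:
            while r % p == 0:
                r //= p
        if r == 1:                      # v factored completely over fb
            relations.append((x, v))
    return relations

N = 87463                               # small toy composite
fb = factor_base(N, 30)
rels = smooth_relations(N, fb, 200)
```

Once enough relations have been collected, linear algebra over GF(2) finds a subset whose product is a perfect square, yielding a congruence of squares mod N; that back end is the same for all the QS variants discussed below.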
A further improvement can be obtained by reducing the overhead costs of the sieving. Taking arithmetic progressions x = a X + b with different values of a, b as MPQS does, there is an "initialization" stage needed for each pair (a, b) where some computations are performed for each prime in the factor base. This cuts into the performance benefits achieved by MPQS, since it means that you can't let the length M of the sieving interval get too small, or the size F of the factor base get too large, without spending most of your time in the initialization stage. The Self-Initializing Quadratic Sieve (which should really be called the Amortized Initialization Quadratic Sieve) takes the value a to be a product of k + 1 primes, and then works with the 2^k different values of b which result in x^2 - N being a multiple of a. Since most of the initialization work depends only upon a, this provides a very significant reduction in the initialization cost -- thereby permitting smaller sieve lengths M to be used and providing a speedup of roughly a factor of 2.
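The 2^k values of b come from the Chinese Remainder Theorem: if a is a product of k + 1 factor-base primes, each prime contributes a choice of +/- square root of N, and fixing the sign of one root leaves 2^k distinct combinations. Here's a toy sketch of that step (my own code, not from any real implementation -- in particular, a brute-force modular square root stands in for Tonelli-Shanks, which is what one would actually use):

```python
from itertools import product

def sqrt_mod(n, p):
    # Brute-force square root of n mod a small prime p; assumes one exists.
    n %= p
    for r in range(p):
        if r * r % p == n:
            return r
    raise ValueError("no square root mod %d" % p)

def siqs_b_values(N, q_primes):
    # a = product of the k+1 chosen primes; CRT-combine a choice of +/- root
    # mod each prime into b with b^2 = N (mod a).  Fixing the sign of the
    # first root leaves 2^k distinct values of b modulo a.
    a = 1
    for q in q_primes:
        a *= q
    roots = [sqrt_mod(N, q) for q in q_primes]
    bs = []
    for signs in product([1, -1], repeat=len(q_primes) - 1):
        b = 0
        for q, r, s in zip(q_primes, roots, (1,) + signs):
            m = a // q                       # CRT basis: m*inv(m) = 1 (mod q),
            b += s * r * m * pow(m, -1, q)   # = 0 mod the other primes
        bs.append(b % a)
    return a, bs

# k+1 = 3 primes mod which N = 87463 is a quadratic residue, so k = 2
a, bs = siqs_b_values(87463, [3, 13, 17])
```

The point of the construction is that the per-prime initialization data (roots of N mod each factor-base prime, and the inverse of a) depends only on a, so switching from one b to the next costs almost nothing.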
We can do better. Even with SIQS, there is a tradeoff between the length M of the sieving interval and the size F of the factor base: When sieving with a prime, there is an overhead cost (reading the prime from memory, comparing the offset to the sieve bounds, etc.) independent of the number of integers within the sieving range which are multiples of the prime in question. If M < F then this overhead will be entirely wasted in some cases -- we'll have to spend time considering primes which never appear as a factor of a value within the sieving interval. In order to avoid upstaging myself, I'm going to stop writing here, but on Friday I'll be giving a talk to the Computer Algebra Group at Simon Fraser University where I will explain how the sieving interval can be further decreased without suffering excessive overhead costs; I expect this to provide another factor of 2 speedup.
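The fixed per-prime cost is easy to see in a toy version of the sieve's inner loop (my own illustrative sketch): for every factor-base prime we pay to fetch the prime and compute its first offset into the interval, even when the prime is larger than M and hits the interval zero or one times.

```python
import math

def sieve_interval(logs, primes, roots, start):
    # logs: length-M array accumulating approximate log contributions for
    # the interval [start, start + M).
    M = len(logs)
    for p, r in zip(primes, roots):
        # Fixed overhead paid once per prime: computing the first offset --
        # even if the loop below then executes zero times (p > M).
        off = (r - start) % p
        lg = round(math.log2(p))
        for i in range(off, M, p):
            # Per-hit cost: roughly M/p additions on average.
            logs[i] += lg
    return logs
```

With M < F, many primes in the factor base are larger than M, so for them the loop body above never runs and the per-prime overhead is pure waste -- which is exactly the regime the talk is about.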
Now to the interesting point: A factor of 2 speedup in the Quadratic Sieve is more significant than it sounds. It allows you to factor integers 3-4 digits longer in the same amount of time; that much isn't very impressive. It also moves the tradeoff point between the Quadratic Sieve and the Number Field Sieve -- the point where GNFS takes over as the fastest algorithm available -- upwards by roughly 15 digits. Gain a couple of factors of two, and the tradeoff point is around 155 digits -- that is, 512 bits -- and the Quadratic Sieve becomes interesting for "cryptography-sized" integers. (Of course, nobody should be using 512-bit RSA keys any more -- but a significant number of SSL sites do.)
I would be surprised if the Quadratic Sieve ever again set a record-breaking factorization -- at the moment, GNFS has an advantage of over a factor of ten -- but in light of the relative simplicity of the Quadratic Sieve (remember, simple code is fast code!) I think it deserves more attention than it has received lately.
New portsnap buildbox.
Just a quick update: Portsnap builds are now running on the new buildbox I mentioned last week, and everything seems to be working properly. Each build (including building an INDEX for each of 4.x, 5.x, and 6.x) takes slightly under an hour, so a ports commit performed at time X will be reflected in the tree distributed by portsnap somewhere between time X + 50 minutes and time X + 2 hours.
Portsnap server problems.
I received an email two weeks ago from Jacques Vidrine, FreeBSD's Security Officer Emeritus: Some new donated machines were now available for use by the FreeBSD Security Team. After upgrading to FreeBSD 6.0, I started the process of rewriting my portsnap build code to run on FreeBSD 6.x instead of FreeBSD 4.x.

None too soon, as it turns out. I received a rather disturbing email yesterday from my current portsnap buildbox:
Could not create directory '/root/.ssh'.
Host key verification failed.

When I investigated further, it turned out that the hard drive was dying: /root/ and a number of binaries in /bin/ had shuffled off this mortal coil. A few months ago, I encountered a problem with some files on /tmp/ and fixed the problem via newfs; sadly, that's not an option when the problem is on the root partition.
I've managed to get that system partially functional: Portsnap builds are running again, but without INDEX builds. This means that portsnap users will be receiving up-to-date ports trees together with rather out-of-date INDEX files. (But hopefully no more than 24 hours out of date -- I'm generating INDEX files on a different system and feeding them to my portsnap buildbox.)
Meanwhile, I'm going to try to get the new portsnap buildbox up and running as soon as possible...