EC2 boot time benchmarking
Last week I quietly released ec2-boot-bench, a tool for benchmarking EC2 instance boot times. This tool is BSD licensed, and should compile and run on any POSIX system with OpenSSL or LibreSSL installed. Usage is simple; give it AWS keys and tell it what to benchmark:
usage: ec2-boot-bench --keys <keyfile> --region <name> --ami <AMI Id> --itype <instance type> [--subnet <subnet Id>] [--user-data <file>]
and it outputs four values: how long the RunInstances API call took, how long it took EC2 to get the instance from "pending" state to "running" state, how long it took once the instance was "running" before port TCP/22 was "closed" (i.e. sending a SYN packet got a RST back), and how long it took from when TCP/22 was "closed" to when it was "open" (i.e. sending a SYN got a SYN/ACK back):
RunInstances API call took: 1.543152 s
Moving from pending to running took: 4.904754 s
Moving from running to port closed took: 17.175601 s
Moving from port closed to port open took: 5.643463 s
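To make the "closed" versus "open" distinction concrete, here is a minimal Python sketch of how a client can tell the two states apart. It is not the code ec2-boot-bench uses: the function names, timeouts, and polling interval are my own assumptions, and it relies on an ordinary connect() call (a full handshake) rather than inspecting individual SYN, RST, and SYN/ACK packets.

```python
import socket
import time


def probe_port(host, port, timeout=1.0):
    """Classify a TCP port as 'open', 'closed', or 'no-response'.

    'open'        -> connect() succeeded (our SYN got a SYN/ACK back)
    'closed'      -> connection refused (our SYN got a RST back)
    'no-response' -> timeout or unreachable (no TCP/IP stack answering yet)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        return "closed"
    except OSError:
        # Covers timeouts and "host unreachable" style errors.
        return "no-response"
    finally:
        sock.close()


def wait_for_state(host, port, wanted, interval=0.1, deadline=120.0):
    """Poll until probe_port() reports `wanted`; return elapsed seconds."""
    start = time.monotonic()
    while time.monotonic() - start < deadline:
        if probe_port(host, port) == wanted:
            return time.monotonic() - start
        time.sleep(interval)
    raise TimeoutError(f"{host}:{port} never became {wanted!r}")
```

Calling wait_for_state(ip, 22, "closed") and then wait_for_state(ip, 22, "open") against a freshly launched instance would approximate the last two values above, with resolution limited by the polling interval and probe timeout.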
Once I finished writing ec2-boot-bench, the natural next step was to run some tests — in particular, to see how FreeBSD compared to other operating systems used in EC2. I used the c5.xlarge instance type and tested FreeBSD releases since 11.1-RELEASE (the first FreeBSD release which can run on the c5.xlarge instance type) along with a range of Linux AMIs mostly taken from the "quick launch" menu in the AWS console. In order to perform an apples-to-apples comparison, I passed a user-data file to the FreeBSD instances which turned off some "firstboot" behaviour — by default, FreeBSD release AMIs will update themselves and reboot to ensure they have all necessary security fixes before they are used, while Linuxes just leave security updates for users to install later:
>>/etc/rc.conf
firstboot_freebsd_update_enable="NO"
firstboot_pkgs_enable="NO"
For each of the AMIs I tested, I ran ec2-boot-bench 10 times, discarded the first result, and took the median values from the remaining 9 runs. The first two values (the time taken for a RunInstances API call to successfully return, and the time taken after RunInstances returns before a DescribeInstances call says that the instance is "running") are consistent across all the AMIs I tested, at roughly 1.5 and 6.9 seconds respectively. The numbers we need to look at for comparing AMIs are therefore just the last two values reported by ec2-boot-bench: the time before the TCP/IP stack is running and has an IP address, and the time between that point and when sshd is running.
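The "discard the first run, take the median of the rest" procedure can be sketched in a few lines of Python. This is an illustration rather than the script I actually used; parse_run and summarize are hypothetical names, and the parser assumes the tool prints one "... took: N s" measurement per line.

```python
import re
from statistics import median

# Matches lines such as "Moving from pending to running took: 4.904754 s".
LINE_RE = re.compile(r"^(.+?) took:\s*([0-9.]+) s[ \t]*$", re.MULTILINE)


def parse_run(output):
    """Turn the output of one run into {phase description: seconds}."""
    return {label: float(secs) for label, secs in LINE_RE.findall(output)}


def summarize(runs, discard=1):
    """Drop the first `discard` runs, then take the median of each phase."""
    kept = runs[discard:]
    if not kept:
        raise ValueError("no runs left after discarding")
    return {label: median(run[label] for run in kept) for label in kept[0]}
```

Feeding summarize() a list of ten parse_run() results yields one median per phase over the last nine runs. Using the median rather than the mean keeps a single unusually slow boot from skewing the reported figure.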
The results of my testing are as follows:
AMI Id (us-east-1) | AMI Name | running to port closed (s) | closed to open (s) | total (s) |
ami-0f9ebbb6ab174bc24 | Clear Linux 34640 | 1.23 | 0.00 | 1.23 |
ami-07d02ee1eeb0c996c | Debian 10 | 6.26 | 4.09 | 10.35 |
ami-0c2b8ca1dad447f8a | Amazon Linux 2 | 9.55 | 1.54 | 11.09 |
ami-09e67e426f25ce0d7 | Ubuntu Server 20.04 LTS | 7.39 | 4.65 | 12.04 |
ami-0747bdcabd34c712a | Ubuntu Server 18.04 LTS | 10.64 | 4.30 | 14.94 |
ami-03a454637e4aa453d | Red Hat Enterprise Linux 8 (20210825) | 13.16 | 2.11 | 15.27 |
ami-0ee02acd56a52998e | Ubuntu Server 16.04 LTS | 12.76 | 5.42 | 18.18 |
ami-0a16c2295ef80ff63 | SUSE Linux Enterprise Server 12 SP5 | 16.32 | 6.96 | 23.28 |
ami-00be86d9bba30a7b3 | FreeBSD 12.2-RELEASE | 17.09 | 6.22 | 23.31 |
ami-00e91cb82b335d15f | FreeBSD 13.0-RELEASE | 19.00 | 5.13 | 24.13 |
ami-0fde50fcbcd46f2f7 | SUSE Linux Enterprise Server 15 SP2 | 18.13 | 6.76 | 24.89 |
ami-03b0f822e17669866 | FreeBSD 12.0-RELEASE | 19.82 | 5.83 | 25.65 |
ami-0de268ac2498ba33d | FreeBSD 12.1-RELEASE | 19.93 | 6.09 | 26.02 |
ami-0b96e8856151afb3a | FreeBSD 11.3-RELEASE | 22.61 | 5.05 | 27.66 |
ami-70504266 | FreeBSD 11.1-RELEASE | 25.72 | 4.39 | 30.11 |
ami-e83e6c97 | FreeBSD 11.2-RELEASE | 25.45 | 5.36 | 30.81 |
ami-01599ad2c214322ae | FreeBSD 11.4-RELEASE | 55.19 | 4.02 | 59.21 |
ami-0b0af3577fe5e3532 | Red Hat Enterprise Linux 8 | 13.43 | 52.31 | 65.74 |
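The "total" column is simply the sum of the two columns before it. Adding the roughly constant 1.5 and 6.9 seconds of EC2-side time mentioned earlier gives an approximate figure for the whole journey from calling RunInstances to sshd answering. The Python sketch below does that arithmetic for a few rows copied from the table; the end-to-end numbers it prints are estimates built from those rounded medians, not separate measurements.

```python
# Roughly constant EC2-side times reported earlier in this post (seconds).
RUN_INSTANCES_S = 1.5
PENDING_TO_RUNNING_S = 6.9

# (AMI name, running to port closed, closed to open), copied from the table.
ROWS = [
    ("Clear Linux 34640", 1.23, 0.00),
    ("Debian 10", 6.26, 4.09),
    ("Amazon Linux 2", 9.55, 1.54),
    ("FreeBSD 13.0-RELEASE", 19.00, 5.13),
]


def end_to_end(closed_s, open_s):
    """Estimate seconds from the RunInstances call until sshd accepts."""
    total = RUN_INSTANCES_S + PENDING_TO_RUNNING_S + closed_s + open_s
    return round(total, 2)


for name, closed_s, open_s in ROWS:
    boot_total = closed_s + open_s  # the "total" column in the table
    estimate = end_to_end(closed_s, open_s)
    print(f"{name:22s} total {boot_total:6.2f} s, end to end ~{estimate:.2f} s")
```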
In the race to accept incoming SSH connections, the clear winner (no pun intended) is Intel's Clear Linux, which boots to a running sshd in a blistering 1.23 seconds after the instance enters the "running" state. After Clear Linux is a roughly three-way tie between Amazon Linux, Debian, and Ubuntu; it's good to see that Ubuntu's boot performance has improved over the years, dropping from 18 seconds in 16.04 LTS to 15 seconds in 18.04 LTS and then to 12 seconds with 20.04 LTS. After the Amazon Linux / Debian / Ubuntu cluster come SUSE Linux and FreeBSD; here, interestingly, SUSE 12 is faster than SUSE 15, while FreeBSD 12.2 and 13.0 (the most recent two releases) are noticeably faster than older FreeBSD.
Finally in dead last place comes Red Hat — which brings up its network stack quickly but takes a very long time before it is running sshd. It's possible that Red Hat is doing something similar to the behaviour I disabled in FreeBSD, in downloading and installing security updates before exposing sshd to the network — I don't know enough to comment here. (If someone reading this can confirm that possibility and has a way to disable that behaviour via user-data, I'll be happy to re-run the test and revise this post.)
UPDATE: Turns out that Red Hat's terrible performance was due to a bug which was fixed in the 2021-08-25 update. I tested the new version and it now lands in the middle of the pack of Linuxes rather than lagging far behind.
Needless to say, FreeBSD has some work to do to catch up here; but measurement is the first step, and indeed I already have work in progress to further profile and improve FreeBSD's boot performance, which I'll write about in a future post.
If you find this useful, please consider supporting my work either via my FreeBSD/EC2 Patreon or by sending me contributions directly. While my work on the FreeBSD/EC2 platform originated from the needs of my Tarsnap online backup service, it has become a much larger project over the years and I would be far more comfortable spending time on this if it weren't taking away so directly from my "paid work".