How to port your OS to EC2

I've been the maintainer of the FreeBSD/EC2 platform for about 7.5 years now, and as far as "running things in virtual machines" goes, that remains the only operating system and the only cloud which I work on. That said, from time to time I get questions from people who want to port other operating systems into EC2, and being a member of the open source community, I do my best to help them. I realized a few days ago that rather than replying to emails one by one it would be more efficient to post something publicly; so — for the benefit of the dozen or so people who want to port operating systems to run in EC2, and the curiosity of maybe a thousand more people who use EC2 but will never build AMIs themselves — here's a rough guide to building EC2 images.

Prerequisites

Before we can talk about building images, there are some things you need:

Your OS needs to run on x86 hardware. 64-bit ("amd64", "x86-64") is ideal, but I've managed to run 32-bit FreeBSD on "64-bit" EC2 instances so at least in some cases that's not strictly necessary.
You almost certainly want to have drivers for Xen block devices (for all of the pre-Nitro EC2 instances) or for NVMe disks (for the most recent EC2 instances). Theoretically you could make do without these since there's some ATA emulation available for bootstrapping, but if you want to do any disk I/O after the kernel finishes booting you'll want to have a disk driver.
Similarly, you need support for the Xen network interface (older instances), Intel 10 GbE SR-IOV networking (some newer but pre-Nitro instances), or Amazon's "ENA" network adapters (on Nitro instances), unless you plan on having instances which don't communicate over the network. The ENA driver is probably the hardest thing to port, since as far as I know there's no way to get your hands on the hardware directly, and it's very difficult to do any debugging in EC2 without having a working network.
Finally, the obvious: You need to have an AWS account, and appropriate API access keys.

Building a disk image

The first step to building an EC2 AMI is to build a disk image. This needs to be a "live" disk image, not an installer image; but if you have a "live USB disk" image, that's almost certainly going to be the place to start. EC2 instances boot with a virtual BIOS so a disk image which can boot from a USB stick is almost certainly going to boot — at least as far as the boot loader — in EC2.

You're going to want to make some changes to what goes into that disk image later, but for now just build a disk image.
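Since the virtual BIOS starts by loading the first sector of the disk, one cheap sanity check before uploading anything is to confirm that the image's first sector ends with the 0x55 0xAA boot signature. This is only a minimal sketch: the function name is made up for illustration, and passing the check shows that the image isn't obviously missing a boot sector, not that it will boot.

```shell
#!/bin/sh
# Hypothetical helper: succeed if bytes 510-511 of the image ($1) are 55 aa,
# the signature a BIOS expects at the end of the boot sector.
has_boot_signature() {
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Example:
# has_boot_signature disk.img && echo "boot signature present"
```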

Building an AMI

I wrote a simple tool for converting disk images into EC2 AMIs: bsdec2-image-upload. It uploads a disk image to Amazon S3; makes an API call to import that disk image into an EBS volume; creates a snapshot of that volume; then registers an EC2 AMI using that snapshot.

To use bsdec2-image-upload, you'll first need to create an S3 bucket for it to use as a staging area. You can call it anything you like, but I recommend that you

Create it in a "nearby" region (for performance reasons), and
Set an S3 "lifecycle policy" which deletes objects automatically after 1 day (since bsdec2-image-upload doesn't clean up the S3 bucket, and those objects are useless once you've finished creating an AMI).
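As a sketch of those two recommendations, assuming you have the AWS CLI installed (the web console works just as well), the following creates a staging bucket and attaches a rule which expires objects after one day. The bucket name, region, rule ID, and function names are placeholders.

```shell
#!/bin/sh
# Print an S3 lifecycle configuration which expires every object after one day.
# The rule ID is an arbitrary placeholder.
lifecycle_json() {
    cat <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-staging-objects",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": 1 }
    }
  ]
}
EOF
}

# Create the staging bucket ($1) in a region ($2) and attach the rule above.
# Note: us-east-1 rejects an explicit LocationConstraint, so drop that
# argument if us-east-1 is your "nearby" region.
create_staging_bucket() {
    aws s3api create-bucket --bucket "$1" --region "$2" \
        --create-bucket-configuration "LocationConstraint=$2" &&
    aws s3api put-bucket-lifecycle-configuration --bucket "$1" \
        --lifecycle-configuration "$(lifecycle_json)"
}

# Example (placeholder names):
# create_staging_bucket my-ami-staging us-west-2
```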

You'll also need to create an AWS key file in the format which bsdec2-image-upload expects:

ACCESS_KEY_ID=...
ACCESS_KEY_SECRET=...
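If your keys are already in the environment variables used by the AWS CLI, a small helper along these lines can write the key file with permissions restricted to your own user. The helper name is hypothetical; only the two field names come from the format above.

```shell
#!/bin/sh
# Hypothetical helper: write the key file ($1) from the standard AWS CLI
# environment variables, readable only by the current user.
write_keyfile() {
    (
        umask 077
        printf 'ACCESS_KEY_ID=%s\nACCESS_KEY_SECRET=%s\n' \
            "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" > "$1"
    )
}

# Example:
# write_keyfile awskeys
```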

Having done that, you can invoke bsdec2-image-upload:

# bsdec2-image-upload disk.img "AMI Name" "AMI Description" aws-region S3bucket awskeys

There are three additional options you can specify:

--sriov: Mark the AMI as supporting the Intel 10 GbE SR-IOV network interface.
--ena: Mark the AMI as supporting the Amazon ENA network interface.
--public: Copy the image out to all the EC2 regions and mark the AMIs as public.

After it uploads the image and registers the AMI, bsdec2-image-upload will print the AMI IDs for the relevant region(s). (Either for every region, or just for the single region where you uploaded it.)

Go ahead and create an AMI now, and try launching it.

Boot configuration

Odds are that your instance started booting and got as far as the boot loader launching the kernel, but at some point after that things went sideways. Now we start the iterative process of building disk images, turning them into AMIs, launching said AMIs, and seeing where they break. Some things you'll probably run into here:

EC2 instances have two types of console available to them: A serial console and a VGA console. (Or rather, emulated serial and emulated VGA.) If you can have your kernel output go to both consoles, I recommend doing that. If you have to pick one, the serial console (which shows up as the "System Log" in EC2) is probably more useful than the VGA console (which shows up as "instance screenshot") since it lets you see more than one screen of logs at once; but there's a catch: Due to some bizarre breakage in EC2 (which I've been complaining about for ten years) the serial console is very "laggy". If you find that you're not getting any output, wait five minutes and try again.
You may need to tell your kernel where to find the root filesystem. On FreeBSD we build our disk images using GPT labels, so we simply need to specify in /etc/fstab that the root filesystem is on /dev/gpt/rootfs; but if you can't do this, you'll probably need to have different AMIs for Nitro instances vs. non-Nitro instances since Xen block devices will typically show up with different device names from NVMe disks. On FreeBSD, I also needed to set the vfs.root.mountfrom kernel environment variable for a while; this is no longer needed on FreeBSD, but something similar may be needed on other systems.
You'll need to enable networking, using DHCP. On FreeBSD, this means placing ifconfig_DEFAULT="SYNCDHCP" into /etc/rc.conf; other systems will have other ways of specifying network parameters, and it may be necessary to specify settings separately for the Xen network device, the Intel SR-IOV network interface, and the Amazon ENA interface so that you'll have the necessary configuration across all EC2 instance types. (On FreeBSD, ifconfig_DEFAULT takes care of specifying the network settings which should apply for whatever network interface the kernel finds at boot time.)
You'll almost certainly want to turn on SSH, so that you can connect into newly launched instances and make use of them. Don't worry about setting a password or creating a user to SSH into yet — we'll take care of that later.
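For FreeBSD, the settings above can be sketched as a helper which writes them into an image whose root filesystem is mounted somewhere. The mount point and function name are hypothetical, and the loader.conf and sshd_enable lines are standard FreeBSD settings which I'm assuming apply to your image; other systems will need their own equivalents.

```shell
#!/bin/sh
# Sketch: apply the boot-time settings described above to a FreeBSD image
# whose root filesystem is mounted at $1 (a hypothetical mount point).
configure_image() {
    root=$1
    mkdir -p "$root/etc" "$root/boot"

    # Send kernel output to both the serial and VGA consoles.
    cat >> "$root/boot/loader.conf" <<'EOF'
boot_multicons="YES"
console="comconsole,vidconsole"
EOF

    # Find the root filesystem by GPT label rather than by device name.
    printf '/dev/gpt/rootfs\t/\tufs\trw\t1\t1\n' >> "$root/etc/fstab"

    # DHCP on whichever interface the kernel finds, plus the SSH daemon.
    cat >> "$root/etc/rc.conf" <<'EOF'
ifconfig_DEFAULT="SYNCDHCP"
sshd_enable="YES"
EOF
}

# Example:
# configure_image /mnt
```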

At this point, you should be able to launch an EC2 instance, get console output showing that it booted, and connect to the SSH daemon. (Remember to allow incoming connections on port 22 when you launch the EC2 instance!)
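Because the serial console can take minutes to show anything, it's convenient to poll for output rather than checking by hand. Here is a minimal sketch of a retry loop; the function name and the instance ID in the example are placeholders, and I'm assuming the AWS CLI prints "None" in text mode when there is no console output yet.

```shell
#!/bin/sh
# Run a command ($3 onwards) up to $1 times, sleeping $2 seconds between
# attempts, until it prints something; then print that output and succeed.
wait_for_output() {
    attempts=$1; delay=$2; shift 2
    while [ "$attempts" -gt 0 ]; do
        out=$("$@" 2>/dev/null)
        # ASSUMPTION: "None" is what the AWS CLI prints for an empty result.
        if [ -n "$out" ] && [ "$out" != "None" ]; then
            printf '%s\n' "$out"
            return 0
        fi
        attempts=$((attempts - 1))
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example: try every five minutes, up to five times.
# wait_for_output 5 300 aws ec2 get-console-output \
#     --instance-id i-0123456789abcdef0 --query Output --output text
```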

EC2 configuration

Now it's time to make the AMI behave like an EC2 instance. To this end, I prepared a set of rc.d scripts for FreeBSD. Most importantly, they

Print the SSH host keys to the console, so that you can verify that they are correct when you first SSH in. (Remember, verifying SSH host keys is more important than flossing every day.)
Download the SSH public key you want to use for logging in, and create an account (by default, "ec2-user") with that key set up for you.
Fetch EC2 user-data and process it via configinit to allow you to configure the system as part of the process of launching it.

If your OS has an rc system derived from NetBSD's rc.d, you may be able to use these scripts without any changes by simply installing them and enabling them in /etc/rc.conf; otherwise you may need to write your own scripts using mine as a model.
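If you do end up writing your own, the core of the SSH key handling is small. The following is a rough sketch and not a copy of the FreeBSD scripts: it assumes curl is available, that the instance permits unauthenticated (IMDSv1-style) requests to the EC2 metadata service, and that the account and its home directory already exist. The function names are made up.

```shell
#!/bin/sh
# ASSUMPTION: unauthenticated metadata requests are allowed; instances which
# require IMDSv2 need a session token to be fetched first.
METADATA=http://169.254.169.254/latest/meta-data

# Print the public key selected when the instance was launched.
fetch_public_key() {
    curl -sf "$METADATA/public-keys/0/openssh-key"
}

# Append the key read from stdin to authorized_keys under home directory $1.
# (A real script would also chown the files to the login user.)
install_authorized_key() {
    mkdir -p "$1/.ssh" && chmod 700 "$1/.ssh" &&
    cat >> "$1/.ssh/authorized_keys" &&
    chmod 600 "$1/.ssh/authorized_keys"
}

# Print host key fingerprints from directory $1, e.g. to the console.
print_host_keys() {
    for key in "$1"/ssh_host_*_key.pub; do
        [ -f "$key" ] && ssh-keygen -l -f "$key"
    done
}

# Example, run as root on first boot:
# fetch_public_key | install_authorized_key /home/ec2-user
# print_host_keys /etc/ssh > /dev/console
```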

Firstboot scripts

A feature I added to FreeBSD a few years ago is the concept of "firstboot" scripts: These startup scripts are only run the first time a system boots. The aforementioned configinit and SSH key fetching scripts are flagged this way — so if your OS doesn't support the "firstboot" keyword on rc.d scripts you'll need to hack around that — but EC2 instances also ship with other scripts set to run on the first boot:

FreeBSD Update will fetch and install security and critical errata updates, and then reboot the system if necessary.
The UFS filesystem on the "boot disk" will be automatically expanded to the full size of the disk — this makes it possible to specify a larger size of disk at EC2 instance launch time.
Third-party packages will be automatically fetched and installed, according to a list in /etc/rc.conf. This is most useful if configinit is used to edit /etc/rc.conf, since it allows you to specify packages to install via the EC2 user-data.
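To illustrate that last item, here is a sketch of generating user-data which asks for a package to be installed at first boot. It rests on two assumptions worth checking against your installed versions: that configinit appends user-data beginning with a ">>/path" line to that file, and that the package list variable is named firstboot_pkgs_list. The package itself is just an example.

```shell
#!/bin/sh
# Write an EC2 user-data file ($1) asking configinit to append two lines to
# /etc/rc.conf. ASSUMPTIONS: the ">>/path" append convention and the
# firstboot_pkgs_list variable name; apache24 is only an illustration.
write_user_data() {
    cat > "$1" <<'EOF'
>>/etc/rc.conf
firstboot_pkgs_list="apache24"
apache24_enable="YES"
EOF
}

# Example: generate the file, then pass it as user-data at launch time.
# write_user_data user-data.txt
```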

While none of these are strictly necessary, I find them to be extremely useful and highly recommend implementing similar functionality in your systems.
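If your OS lacks a "firstboot" keyword, one way to hack around it is a sentinel file which ships in the disk image and is deleted once the first-boot scripts have run (FreeBSD's own mechanism is likewise keyed off a sentinel file, /firstboot by default). A minimal sketch, with hypothetical paths and function name:

```shell
#!/bin/sh
# Run every executable in directory $2, but only if the sentinel file $1
# exists; then delete the sentinel so that later boots do nothing.
run_firstboot() {
    sentinel=$1; scripts=$2
    [ -e "$sentinel" ] || return 0      # not the first boot: do nothing
    for s in "$scripts"/*; do
        [ -x "$s" ] && "$s"
    done
    rm -f "$sentinel"                   # never run these again
}

# Example, called from your startup sequence:
# run_firstboot /firstboot /etc/firstboot.d
```

This sketch ignores failures in individual scripts and doesn't handle the "reboot if necessary" case; a fuller version would want both.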

Support my work!

I hope you find this useful, or at the very least interesting. Please consider supporting my work in this area; while I'm happy to contribute my time to supporting open source software, it would be nice if I had money coming in which I could use to cover incidental expenses (e.g., conference travel) so that I didn't end up paying to contribute to FreeBSD.

Posted at 2018-07-14 06:30

Daemonic Dispatches