Thoughts on Glacier pricing
While writing my last blog post I spent a long time looking at the pricing of Amazon's new Glacier archival storage service; or more precisely, the pricing of the "Data Retrieval" component. After much thought, I have come to the conclusion that Glacier's pricing is incomprehensible, broken, and fundamentally un-Amazonian.
Let's take "incomprehensible" first. The Glacier pricing page states (in a footnote, no less!) that:
You can retrieve up to 5% of your average monthly storage (pro-rated daily) for free each month. If you choose to retrieve more than this amount of data in a month, you are charged a retrieval fee starting at $0.01 per gigabyte.
Ok, that's a start (although, as it turns out, not an entirely accurate one), but "starting at" is hardly precise; let's head over to the Glacier FAQ for some more details:
Each month you can retrieve up to 5% of the data you store in Glacier for free. This allowance is pro-rated daily. For example, in a 30 day month, you can retrieve approximately 0.17% of your stored data for free daily (5% / 30 days = 0.17% per day). This means if you store 12 terabytes of data you can retrieve 20.5 gigabytes a day for free. You are charged a retrieval fee when your retrievals exceed your daily allowance.
That's good to know, although it contradicts the pricing page: assuming the FAQ is accurate, exceeding your daily allowance is enough to run up a bill, even if you were well within your allowance for the month. Moving on:
If, during a given month, you do exceed your daily allowance, we calculate your fee based upon the peak hourly usage from the days in which you exceeded your allowance. [...] Next we subtract your free allowance from the peak hourly retrieval for the month. To determine the amount of data you get for free, we look at the amount of data retrieved during your peak day and calculate the percentage of data that was retrieved during your peak hour. We then multiply that percentage by your free daily allowance. [...] We then subtract your free allowance from your peak usage to determine your billable peak. [...] The amount you pay is your billable peak, multiplied by the number of hours in the month, multiplied by the retrieval fee.
Leaving aside the confusion of interleaving the pricing algorithm with an example (which I have elided, along with the embarrassing lapse in arithmetic which led to 20.5 / 24 being equal to 0.82 instead of 0.85), there is a glaring lack of definition: What is the "peak hourly usage"?
In most AWS services, this is straightforward: Add up the requests you issued in each hour. Glacier is unlike most AWS services, however: its retrieval requests "typically complete within 3-5 hours". Is an hour's usage defined to be the requests issued in that hour? The requests completed in that hour? The portions completed in that hour? In a thread on the AWS forum, an Amazonian states that you can "expect the total amount of data retrieved for a single retrieval to be spread evenly across four hours for the purposes of working out the cost of the retrieval", but it's not clear what the word "expect" means here: Is this how the pricing is defined, or merely a rule of thumb? Either way, the web forum is not where this information belongs: The Glacier pricing page ought to have enough information to allow Amazon Web Services users to figure out how much they will end up paying, just like the pages for all the other AWS services do.
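To make the ambiguity concrete, here is a minimal Python sketch of two of the possible readings of "hourly usage": counting a retrieval entirely in the hour it was issued, versus spreading it evenly across four hours as the forum post suggests. The function name `hourly_usage` and its parameters are hypothetical, and neither reading is confirmed as the actual billing rule; for simplicity the sketch also assumes a retrieval never runs past midnight.

```python
def hourly_usage(requests, spread_hours=1):
    """Bucket one day's retrieval requests into 24 hourly totals (in GB).

    requests: list of (start_hour, gigabytes) pairs, start_hour in 0..23.
    spread_hours: number of consecutive hours each request is divided across.
      1 -> the whole request counts in the hour it was issued (assumption).
      4 -> the "spread evenly across four hours" reading from the forum.
    """
    if spread_hours < 1:
        raise ValueError("spread_hours must be at least 1")
    usage = [0.0] * 24
    for start_hour, gigabytes in requests:
        # Simplifying assumption: a request must finish before midnight.
        if not 0 <= start_hour <= 24 - spread_hours:
            raise ValueError("request must start and finish within the day")
        share = gigabytes / spread_hours
        for hour in range(start_hour, start_hour + spread_hours):
            usage[hour] += share
    return usage


requests = [(13, 4.0)]  # a single 4 GB retrieval issued at 1 PM
issued = hourly_usage(requests, spread_hours=1)
spread = hourly_usage(requests, spread_hours=4)
print(max(issued))  # 4.0 GB in the peak hour
print(max(spread))  # 1.0 GB in the peak hour
```

The same request produces a peak hour four times larger under one reading than under the other, and since the fee is driven by the peak hour, the choice of definition matters a great deal.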
Next up: "broken". It's clear that the purpose of this retrieval charge is to bill people for their peak usage. I don't know all the details of how Amazon Glacier is implemented, but whether it's hard drives which are spun down most of the time or tape robots with a limited number of drives, Amazon has a limited amount of read throughput. Problem is, Amazon isn't always charging for peak usage. Consider someone who has 9 TB of data stored, and downloads data on two days: On day 1, he downloads 24 GB, at a rate of exactly 1 GB per hour; on day 2, he downloads 10 GB all at once. This hypothetical user has a free daily retrieval allowance of 15 GB, so his burst of 10 GB in a single hour on day 2 is ignored; instead, he is billed based on the 1 GB/hour he downloaded on day 1 (which, after the free retrieval usage is considered, ends up as 0.375 GB/hour of billed usage, for a cost of $2.70). Of course, this is a somewhat contrived example, and in many cases the "peak hours" used for computing a customer's bill will, in fact, be their peak hours; but as far as reliably capturing peak usage goes, the Glacier pricing model falls quite neatly into the "almost but not quite" category.
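The numbers in this example can be reproduced with a short Python sketch of the FAQ's algorithm as quoted above. This is one reading of that text, not Amazon's actual billing code: the name `retrieval_fee` and its parameters are hypothetical, hourly usage is simply supplied as 24 numbers per day (sidestepping the question of how it is measured), a month is taken to be 30 days, and 9 TB is treated as 9000 GB to match the 15 GB allowance used here.

```python
def retrieval_fee(stored_gb, hourly_gb_by_day, days_in_month=30,
                  free_percent=5, fee_per_gb=0.01):
    """Retrieval fee in dollars under one reading of the Glacier FAQ.

    hourly_gb_by_day: list of days, each a list of 24 hourly GB totals.
    """
    daily_allowance = stored_gb * free_percent / (100 * days_in_month)
    # Only days on which the daily allowance was exceeded are considered.
    over_days = [hours for hours in hourly_gb_by_day
                 if sum(hours) > daily_allowance]
    if not over_days:
        return 0.0
    # Assumption: the "peak day" is the over-allowance day holding the
    # largest single hour.
    peak_day = max(over_days, key=max)
    peak_hour = max(peak_day)
    # Free allowance attributed to the peak hour, in proportion to its
    # share of that day's retrievals.
    free_share = daily_allowance * peak_hour / sum(peak_day)
    billable_peak = peak_hour - free_share
    return billable_peak * 24 * days_in_month * fee_per_gb


day1 = [1.0] * 24            # 24 GB at exactly 1 GB per hour
day2 = [10.0] + [0.0] * 23   # 10 GB all in a single hour
fee = retrieval_fee(9000, [day1, day2])
print(f"${fee:.2f}")  # $2.70
```

Day 2 never enters the calculation because its 10 GB total is under the 15 GB daily allowance, even though its single-hour burst is ten times larger than anything on day 1.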
There's another problem with the Glacier retrieval charges which is far more serious, albeit rather subtle: The pricing is non-concave. Much to the annoyance of economists (who like to see strictly convex pricing models), aside from "freemium" and other limited-usage "trial" plans, all the services we use are concave. The common term for this is "volume discounts", but in the context of Amazon Web Services, what it really means is this: If you have multiple accounts and sign up for Consolidated Billing, you might end up paying less (due to moving into a higher volume / lower price tier for S3 storage or outgoing bandwidth) but you should never end up paying more. Glacier violates this property.
Consider two Amazon Glacier users, Alice and Bob, who each have 600 GB of data stored. For simplicity, we'll assume that every month has 30 days, so that each of them has a daily free retrieval allowance of 1 GB. On the first day of each month, Alice downloads 1 GB of data at 1 PM, while Bob downloads 1 GB of data at 1 PM and another 1 GB of data at 2 PM; for the rest of the month, neither of them downloads any data from Glacier. Alice's bill is simple: She's within her free retrieval allowance every day, so she just pays $6.00 for the storage. Bob is also paying $6.00 each month for storage, but he has to pay for retrieval bandwidth as well, since he downloaded 2 GB on a day when his free daily retrieval allowance was only 1 GB. His peak hour was 1 GB out of the 2 GB for the day, so he gets 0.5 GB of his free retrieval allowance attributed to that hour; the remaining 0.5 GB is his billable peak hourly retrieval rate, so he ends up paying $3.60 for retrieval, yielding a total bill of $9.60 for the month.
Between them, Alice and Bob were paying a total of $15.60 each month; but now they decide that arranging illicit liaisons cryptographically is too much work, and decide to bring their relationship into the open and get married. A few days after their honeymoon, they sign up for Amazon Consolidated Billing. They have a combined 1200 GB of data stored, so they're paying $12.00 for storage — that part hasn't changed — and they get a daily free retrieval allowance of 2 GB instead of two separate allowances of 1 GB. Their peak hour is now 2 GB out of a total of 3 GB for the day, so of their free retrieval allowance, 1.33 GB is attributed to that peak hour, leaving them with a billable peak hourly retrieval rate of 0.67 GB/hour — which works out to $4.80 of retrieval bandwidth. By signing up for Consolidated Billing, they increased their combined bill from $15.60 up to $16.80. (Epilogue: They each blame the other for the increased bill they're receiving from Amazon, leading to a breakdown in their relationship, and they get divorced a few months later. It was inevitable anyway: Cryptographers never really trust anyone.)
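The Alice and Bob arithmetic can be checked with a small Python sketch. It follows one reading of the FAQ algorithm quoted earlier and is not Amazon's billing code: `retrieval_fee`, `day`, and `STORAGE_PRICE` are hypothetical names, months are assumed to have 30 days, and the storage price of $0.01 per GB-month is the one implied by the $6.00 figure for 600 GB.

```python
def retrieval_fee(stored_gb, hourly_gb_by_day, days_in_month=30,
                  free_percent=5, fee_per_gb=0.01):
    """Retrieval fee in dollars under one reading of the Glacier FAQ."""
    daily_allowance = stored_gb * free_percent / (100 * days_in_month)
    over_days = [hours for hours in hourly_gb_by_day
                 if sum(hours) > daily_allowance]
    if not over_days:
        return 0.0
    peak_day = max(over_days, key=max)  # assumption: day with largest hour
    peak_hour = max(peak_day)
    free_share = daily_allowance * peak_hour / sum(peak_day)
    return (peak_hour - free_share) * 24 * days_in_month * fee_per_gb


def day(retrievals):
    """Build 24 hourly GB totals from a {hour: gigabytes} mapping."""
    hours = [0.0] * 24
    for hour, gigabytes in retrievals.items():
        hours[hour] += gigabytes
    return hours


STORAGE_PRICE = 0.01  # dollars per GB-month, implied by $6.00 for 600 GB

alice_day = day({13: 1.0})            # 1 GB at 1 PM
bob_day = day({13: 1.0, 14: 1.0})     # 1 GB at 1 PM and 1 GB at 2 PM
combined_day = [a + b for a, b in zip(alice_day, bob_day)]

separate = (600 * STORAGE_PRICE + retrieval_fee(600, [alice_day])
            + 600 * STORAGE_PRICE + retrieval_fee(600, [bob_day]))
consolidated = 1200 * STORAGE_PRICE + retrieval_fee(1200, [combined_day])

print(f"separate: ${separate:.2f}")          # separate: $15.60
print(f"consolidated: ${consolidated:.2f}")  # consolidated: $16.80
```

Merging the accounts doubles the allowance, but it also concentrates two thirds of the day's retrievals into a single hour, and that second effect is what pushes the combined bill above the sum of the separate ones.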
Finding the worst case is an interesting optimization problem which I leave as an exercise to the reader; suffice it to say that if Alice and Bob can choose arbitrary data retrieval patterns, they can arrange that Consolidated Billing will cost them 76.8% more than being each billed separately. As a pricing model, I'm calling this broken.
Finally, "fundamentally un-Amazonian". It may seem presumptuous for me to tell Amazon what is or is not un-Amazonian, but I do have some justification: I've been using Amazon Web Services (and evangelizing it within my corner of the open source and startup worlds) since 2006, which is a fair bit longer than most Amazon Web Services employees. More importantly, however, I remember a line which might well have appeared in some form in every single Amazon Web Services sales pitch in history:
With traditional hosting, you have to provision to meet peak demand. Amazon Web Services saves you money by letting you rapidly scale up and down to match your needs, without paying for excess unused capacity.
Glacier's data retrieval pricing throws that out the window: Unlike every other service Amazon provides, with Glacier you're not paying for your net usage throughout each month. Instead, you're paying for your peak hour, and once that peak hour has come and gone, you're still paying for that peak until the end of the month, no matter how low your usage might fall.
Glacier is a fantastic service. The pricing for data retrieval sucks. You guys can do better.