What are the highest-impact Cloud services? Storage would be near the top of any list. Where by “Storage” I mean what S3 does: Blobs-of-bytes storage that is effectively unlimited in capacity, credibly more durable than anything you could build yourself, and easily connected to the world, either directly or through a CDN. I think we’re entering a period of churn where there’s going to be serious competition on storage price and performance. Which, by the way, is crucially relevant to the Fediverse.
Let’s start with AWS, since they invented the modern Storage concept. The most important thing about S3 is this: There appear to be zero credible reports of S3 data loss. Given the number of objects it holds, and the number of years it’s held them, that’s remarkable.
It’s a safe place to store your data. Yeah, the API is a little klunky, and the latency can be high, and the hardwired bucket/object hierarchy is annoying, and so are the namespace issues. And it’s not cheap.
But it’s safe. And fast enough to be useful. And safe. And dead easy to connect up to a CDN. And did I mention that it’s safe?
S3… · AWS, to their credit, aren’t resting on their laurels. Here is a good Register interview with Andy Warfield, one of the lead S3 engineers and also a really good person. He’s talking about another variation on the basic S3 service, called “Express”, which has more filesystem-y semantics, higher performance, but (reading between the lines) a little less durability? (Also, more expensive.)
What’s notable about S3 isn’t this particular feature, but the fact that AWS keeps rolling out new ones. So it’s a moving target for the emerging competition.
…but cheaper… · In recent years, and especially over the last few months, alternatives and competitors to S3 keep crossing my radar. A bunch of them have a premise that’s essentially “S3-compatible, but cheaper”: Backblaze B2, DigitalOcean Spaces, Wasabi, IDrive e2, Cloudflare R2, and Telnyx Cloud Storage. I’m sure I’ve missed some.
…and faster! · Some of the products claim to be way faster. Which matters if it’s true, but so far I don’t know of any popular benchmarking standards, so I’d take the numbers with a grain of salt. If I really cared, for a big project, I’d want to try it with my own code.
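Trying it with your own code can start very small. Here is a minimal sketch of a latency harness, assuming you can wrap a single storage request (a GET of a representative object, say) in a zero-argument Python callable. The name `measure_latencies` and the `time.sleep` stand-in are illustrative assumptions, not part of any provider's tooling.

```python
import statistics
import time
from typing import Callable


def measure_latencies(op: Callable[[], object], runs: int = 50) -> dict:
    """Call op() `runs` times and summarize wall-clock latency in milliseconds."""
    if runs < 2:
        raise ValueError("need at least 2 runs to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        op()  # in real use: one request against the store under test
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "runs": runs,
        "min_ms": samples[0],
        "p50_ms": cuts[49],
        "p99_ms": cuts[98],
        "max_ms": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in for a real request; replace with a call to your own client.
    stats = measure_latencies(lambda: time.sleep(0.001), runs=20)
    for key, value in stats.items():
        print(key, value)
```

Run the same callable against each candidate from the machines and regions you actually deploy in, and compare the tail (p99) as well as the median, since a vendor's headline number rarely tells you about the tail.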
Here are a few of those:
S2 · See Designing serverless stream storage. This is still more a research project than a product, but I drop it in here because it says that access to S3 Express made it possible. Its claim to fame appears to be higher performance.
Tigris · Tigris offers what they describe as “Globally Distributed S3-Compatible Object Storage”. I think the best description of what that means is by Xe Iaso of Fly.io, in Globally Distributed Object Storage with Tigris. It’s not just well-written, it’s funny. Apparently Fly.io bundles Tigris in, with command-line and billing integration.
Bunny · “The fastest object storage, replicated to the edge” is their big claim.
CDN? · Bunny sounds like it’s partly a CDN. And it’s not the only one. Which makes obvious sense; if you want to deliver the stuff you’re storing to users around the world at scale, you’re going to be hooking your storage and CDN together anyhow. So those lines are going to stay blurry.
Compatibility and intellectual property · S3 compatibility is an issue. It’s interesting that AWS has apparently decided not to defend the S3 API as intellectual property, and so these things cheerfully claim 100% plug-compatibility. And when they don’t have it, they apologize (that apology looks unusually far under the covers; I enjoyed reading it).
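In practice, plug-compatibility usually means you keep your existing S3 client code and change only the endpoint it talks to. A minimal sketch, assuming the third-party `boto3` library is installed; the `Provider` type, the helper names, and the `https://s3.example-provider.test` URL are made up for illustration, and a real provider's documentation will give the actual endpoint (and possibly a region name).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    endpoint_url: Optional[str]  # None means "use the AWS default endpoint"


AWS = Provider("aws-s3", None)
# Hypothetical S3-compatible provider; the URL is a placeholder, not a real service.
ALT = Provider("hypothetical-alt", "https://s3.example-provider.test")


def client_kwargs(provider: Provider, access_key: str, secret_key: str) -> Dict[str, str]:
    """Build keyword arguments for boto3.client("s3", ...)."""
    kwargs = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    if provider.endpoint_url is not None:
        # The only provider-specific difference in this sketch.
        kwargs["endpoint_url"] = provider.endpoint_url
    return kwargs


def make_client(provider: Provider, access_key: str, secret_key: str):
    """Create an S3 client pointed at the chosen provider."""
    import boto3  # third-party; imported lazily so the helpers above work without it

    return boto3.client("s3", **client_kwargs(provider, access_key, secret_key))
```

Everything after `make_client` (uploads, downloads, listings) would be the same calls you already make against S3, which is exactly where any compatibility gaps would show up.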
Durability? · They may claim compatibility, but mostly do not claim equivalent durability. I’ll be honest; if I were picking one, that would worry me. I’d need to see pretty full disclosure of how the services work under the covers.
Unknowns · I just mentioned durability, which is a technology issue. The other big unknowns are about business, not technology. First of all, can you sustainably make money selling storage at a price that undercuts AWS’s? I haven’t the vaguest idea.
Second, is this a threat to AWS? There is a vast amount of data that is never gonna migrate off S3 because who’s got the time for that, but if the competition really can save you a lot of money that could hit S3’s growth hard, and Amazon wouldn’t like that. Who knows what might happen?
Now let’s change the subject.
Fediverse storage · I’ll use myself as a Fediverse example. As I write this, my @[email protected] Mastodon account has just over 18K followers, distributed across 3K-and-change instances. So whenever I post a picture or video, each of those instances fetches it and then keeps its own copy, if only in a short-lived cache.
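To get a feel for the scale, here is a back-of-the-envelope sketch. The 3,000 instances come from the "3K-and-change" figure above; the 2 MB media size is purely an assumption for illustration, and `fanout_cost` is a made-up helper name.

```python
def fanout_cost(instances: int, media_bytes: int) -> dict:
    """Compare per-instance copies of one media file against a single shared copy."""
    return {
        # Each instance fetches the file once from the origin server.
        "origin_fetches": instances,
        # Each instance keeps its own copy, at least for a while.
        "bytes_stored_per_instance_copies": instances * media_bytes,
        # A shared store would need to hold only one copy.
        "bytes_stored_shared": media_bytes,
    }


if __name__ == "__main__":
    result = fanout_cost(instances=3_000, media_bytes=2_000_000)  # 2 MB is assumed
    print(result)
```

Under those assumptions, one 2 MB picture becomes 3,000 fetches and about 6 GB of aggregate storage across the network, versus 2 MB if everyone shared a single copy.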
All these files are immutable and identical. Smell an opportunity? Yeah, me too. Someone needs to build an object-store/CDN combo (I’ve already heard people say “FDN”). The API should cater to Mastodon’s quirks. You could split the cost equally or deal it out in proportion to traffic, but either way, I think there’d be big cost savings for nearly every instance.
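The two billing options can be sketched in a few lines. This is only an illustration of the arithmetic; the instance names, traffic figures, and function names are invented, and a real FDN would need to decide what "traffic" means (bytes served, requests, or something else).

```python
from typing import Dict, List


def split_equally(total_cost: float, instances: List[str]) -> Dict[str, float]:
    """Every instance pays the same share of the bill."""
    if not instances:
        raise ValueError("need at least one instance")
    share = total_cost / len(instances)
    return {name: share for name in instances}


def split_by_traffic(total_cost: float, traffic: Dict[str, float]) -> Dict[str, float]:
    """Each instance pays in proportion to the traffic it generated."""
    if not traffic:
        raise ValueError("need at least one instance")
    total_traffic = sum(traffic.values())
    if total_traffic == 0:
        # Nobody used anything this period; fall back to an equal split.
        return split_equally(total_cost, list(traffic))
    return {name: total_cost * used / total_traffic for name, used in traffic.items()}


if __name__ == "__main__":
    # Hypothetical monthly traffic in GB for two made-up instances.
    traffic = {"big.example": 300.0, "small.example": 100.0}
    print(split_equally(100.0, list(traffic)))
    print(split_by_traffic(100.0, traffic))
```

With a 100.0 bill and those made-up numbers, the equal split charges each instance 50.0, while the proportional split charges 75.0 and 25.0. The proportional version is kinder to small instances, which is probably what you want if the goal is to keep them viable.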
Furthermore, it doesn’t feel technically challenging. If I were still at AWS, I’d be working on a PR/FAQ right now. Well, except that, since everything is S3-compatible and CDNs are commoditized, it would be plausible (and attractive) to build your FDN in a way that doesn’t tie you to any particular infrastructure provider.
Someone has already started on this: see Jortage Communal Cloud, which is small as yet but pointing in the right direction.
Fun times! · The storage world is a market with no monopolist, where providers are competing on price, performance, and durability. Be still my beating heart.