Mastodawn

werner

S3 Files and the changing face of S3

https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html

Andy Warfield writes about the hard-won lessons in dealing with data friction that led to S3 Files

All Things Distributed

mgaunard 1d ago

Zero mention of s3fs, which has already done this for decades.

luke5441 1d ago

A more solid (especially when it comes to caching) solution would be appreciated.

I thought that would be their https://github.com/awslabs/mountpoint-s3 . But there's no mention of that one either.

S3 Files does have the advantage of a "shared" cache via EFS, but that would probably also make the cache slower.

GitHub - awslabs/mountpoint-s3: A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.

rowanG077 1d ago

I was thinking: "No way this has existed for decades". But the earliest I can find it existing is 2008. Strictly speaking not decades but much closer to it than I expected.

huntaub 1d ago

This is pretty different than s3fs. s3fs is a FUSE file system that is backed by S3.

This means that all of the non-atomic operations that you might want to do on S3 (including edits to the middle of files, renames, etc.) are run on the machine running s3fs. As a result, if your machine crashes, it's not clear what's going to show up in your S3 bucket or whether it would corrupt things.
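
To make the crash risk concrete, here is a minimal sketch. It assumes a rename is emulated client-side as a copy followed by a delete, which is the usual approach when the store only offers whole-object operations. `FakeObjectStore`, `ClientCrash`, and `rename` are hypothetical names for illustration; nothing here is s3fs's actual code.

```python
class FakeObjectStore:
    """In-memory stand-in for a store with whole-object PUT/COPY/DELETE only."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = bytes(data)

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        del self.objects[key]


class ClientCrash(Exception):
    """Simulates the client machine dying partway through an operation."""


def rename(store, src, dst, crash_after_copy=False):
    # Assumption: "rename" is two separate requests issued by the client.
    store.copy(src, dst)
    if crash_after_copy:
        raise ClientCrash("machine died between COPY and DELETE")
    store.delete(src)


store = FakeObjectStore()
store.put("report.tmp", b"final contents")
try:
    rename(store, "report.tmp", "report.csv", crash_after_copy=True)
except ClientCrash:
    pass

# Both keys survive the crash, a state a real file system rename never exposes.
leftover_keys = sorted(store.objects)
print(leftover_keys)
```

After the simulated crash the bucket holds both `report.tmp` and `report.csv`, and nothing in the store records that a rename was in flight.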

s3fs is also slow for the same reason: the next stop after your machine is S3, which isn't suitable for many file-based applications.

What AWS has built here is different. Using EFS as the middle layer means that there's a safe, durable place for your file system operations to go while they're being assembled into object operations. It also means that the performance should be much better than s3fs (it's talking to SSDs where data is 1ms away instead of HDDs where data is 30ms away).
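
As a rough back-of-the-envelope sketch of what that latency gap means, the snippet below takes the 1ms and 30ms figures from the comment above and multiplies them by a hypothetical count of strictly sequential operations. The operation count is an assumption, and real workloads overlap requests, so treat the result as an upper bound on the difference rather than a benchmark.

```python
def total_seconds(num_ops, latency_ms):
    """Wall-clock time for num_ops operations issued strictly one after another."""
    return num_ops * latency_ms / 1000


NUM_OPS = 10_000  # hypothetical number of small, dependent file operations

ssd_backed = total_seconds(NUM_OPS, 1)     # 1ms per operation, as quoted above
direct_to_s3 = total_seconds(NUM_OPS, 30)  # 30ms per operation, as quoted above

print(f"SSD-backed layer: {ssd_backed:.0f}s, direct to S3: {direct_to_s3:.0f}s")
```

Under these assumptions the same workload takes 10 seconds against the SSD-backed layer and 300 seconds when every operation goes straight to S3.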

[delayed]

> changes are aggregated and committed back to S3 roughly every 60 seconds as a single PUT

Single PUT per file I assume?

LazyMans 1d ago

Based on docs, correct.
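
A minimal sketch of what "one PUT per file per interval" could look like, assuming writes are buffered per key and flushed on a timer. `FakeBucket` and `WriteAggregator` are hypothetical names, and the 60-second default only mirrors the quoted sentence; this is not how S3 Files is actually implemented.

```python
class FakeBucket:
    """In-memory stand-in for a bucket that counts PUT requests."""

    def __init__(self):
        self.objects = {}
        self.put_count = 0

    def put(self, key, data):
        self.objects[key] = data
        self.put_count += 1


class WriteAggregator:
    """Buffers file writes and flushes each dirty file as one whole-object PUT."""

    def __init__(self, bucket, interval_s=60.0):
        self.bucket = bucket
        self.interval_s = interval_s
        self.files = {}      # key -> bytearray holding the current contents
        self.dirty = set()   # keys modified since the last flush
        self.last_flush = 0.0

    def write(self, key, offset, data):
        buf = self.files.setdefault(key, bytearray())
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data
        self.dirty.add(key)

    def tick(self, now):
        """Flush if the interval has elapsed; return the number of PUTs issued."""
        if now - self.last_flush < self.interval_s:
            return 0
        flushed = 0
        for key in sorted(self.dirty):
            self.bucket.put(key, bytes(self.files[key]))
            flushed += 1
        self.dirty.clear()
        self.last_flush = now
        return flushed


bucket = FakeBucket()
agg = WriteAggregator(bucket)
agg.write("a.log", 0, b"hello ")
agg.write("a.log", 6, b"world")
agg.write("a.log", 0, b"J")   # edit at the start of the file
agg.write("b.log", 0, b"x")

early = agg.tick(now=30.0)    # interval not yet elapsed
late = agg.tick(now=60.0)     # interval elapsed, dirty files are flushed
print(early, late, bucket.put_count)
```

Four writes across two files result in two PUTs. In this sketch, anything written since the last flush exists only in the buffer, which is why the durability of the middle layer matters.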

gonzalohm 1d ago

I cannot 100% confirm this, but I believe AWS insisted a lot on NOT using S3 as a file system. Why the change now?

LazyMans 1d ago

They found a way to make money on it by putting a cache in front of it. Less load for them, better performance for you. Maybe you save money, maybe you don't.

yandie 1d ago

It appears that they put an actual file system in front of S3 (AWS EFS, basically) and then perform transparent syncing. The blog post discusses a lot of caveats, such as consistency and object naming (inconsistencies are emitted as events to customers).

Having been a fan of S3 for such a long time, I'm really a fan of the design. It's a good compromise and kudos to whoever managed to push through the design.

PunchyHamster 1d ago

Because people will use it as a filesystem regardless of the original intent, because it is a very convenient abstraction. So they might as well do it in an optimal and supported way, I guess?

gervwyk 1d ago

Any recommendations for a Lambda-based SFTP server setup?

PunchyHamster 1d ago

Eagerly awaiting the first blog post where developers didn't read the eventually consistent part, lost data, and made some "genius" workaround with the help of the LLM that got them into that spot in the first place.