See this graph? It's my last employer's AWS monthly bill from January 2023 to January 2026, but just their Aurora databases' storage, IO and backups. I made a few fixes in the couple months before I left a year ago, but it took a year to show how much money the fixes saved. 🧵

I should've probably waited until next month to ask for the graph, as there's probably a bit more to drop still.

Their biggest Aurora database server had a massively inflated ibdata1 file due to a zombie query on the replica that ran for months, before I started in 2022. Aurora replicas share the same storage as the primaries, and you can't just shrink that file. Newer Aurora/MySQL servers don't let long queries inflate that file.

I think 85% of the data files were wasted space, and backups were file snapshots so the backups were inflated too.
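To put a rough number on that: if a snapshot copies the whole file, wasted space included, its size scales with 1 / (1 - wasted fraction). A minimal sketch of the arithmetic follows. Only the 85% estimate comes from the thread; the `live_data_gb` figure and the function name are made up for illustration.

```python
def snapshot_inflation(wasted_fraction: float) -> float:
    """How many times larger a file-level snapshot is than the live data in it."""
    if not 0 <= wasted_fraction < 1:
        raise ValueError("wasted_fraction must be in [0, 1)")
    return 1 / (1 - wasted_fraction)


# Assumed figure, for illustration only; the thread gives no absolute sizes.
live_data_gb = 300

factor = snapshot_inflation(0.85)  # 0.85 is the thread's rough estimate
print(f"inflation factor: {factor:.2f}x")
print(f"each snapshot stores ~{live_data_gb * factor:.0f} GB "
      f"to protect {live_data_gb} GB of live data")
```

So at roughly 85% waste, every snapshot is billed at between six and seven times the size of the data it actually protects.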

And the backups were more expensive than anything else with the databases, which I didn't look at before because the backups weren't my job (bit weird but ok).

Anyway, I did a few things in my final months, which I probably wrote about before:

1. Forced an investigation into the backups, which found a way to save money while keeping the same number of backups.

2. Rebuilt the main database so that it no longer has the inflated ibdata1 file of wasted and expensive space. 🧵

Number 1 was difficult in that I was overstepping and I wasn't allowed to make changes. But figuring out where we were overpaying for backups, within the backup configurations, halved our Aurora bill.

Number 2 was technically difficult, partly because the support engineer suggested an AWS database cloning tool, which I wasted time on until I saw that it didn't really work. I would've had to rebuild indexes, and it just wouldn't copy certain datatypes.

Oh, and getting a snapshot SQL backup wasn't possible due to some Aurora thing I forget now. MySQL stuff that doesn't work in the Aurora fork. So I had to take an inconsistent SQL backup of the database to build a new replica (with its own storage), start up binary log replication, and run a bunch of comparisons and data fixes to make sure the new replica's data was an exact copy.
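The thread doesn't say how the comparisons were done. One common approach is to checksum rows in primary-key ranges on both servers and only dig into the ranges that differ. The sketch below shows that idea on in-memory toy rows rather than a live database. The function names, the chunk size and the sample data are all hypothetical, not the scripts actually used.

```python
import hashlib


def chunk_checksums(rows, chunk_size=1000):
    """Map each primary-key range (chunk) to a checksum of the rows in it.

    `rows` is an iterable of tuples whose first element is an integer
    primary key. Rows are hashed in primary-key order so both sides
    produce the same digest for the same data.
    """
    digests = {}
    for row in sorted(rows, key=lambda r: r[0]):
        chunk = row[0] // chunk_size
        digest = digests.setdefault(chunk, hashlib.sha256())
        digest.update(repr(row).encode("utf-8"))
    return {chunk: d.hexdigest() for chunk, d in digests.items()}


def mismatched_chunks(primary_rows, replica_rows, chunk_size=1000):
    """Return the chunk numbers whose checksums differ or exist on one side only."""
    a = chunk_checksums(primary_rows, chunk_size)
    b = chunk_checksums(replica_rows, chunk_size)
    return sorted(c for c in a.keys() | b.keys() if a.get(c) != b.get(c))


# Toy data standing in for rows fetched from each server (hypothetical).
primary = [(1, "alice"), (2, "bob"), (1500, "carol"), (2500, "dave")]
replica = [(1, "alice"), (2, "bob"), (1500, "caro1"), (2500, "dave")]

print(mismatched_chunks(primary, replica))
```

Only the chunk holding the drifted row is reported, so a data fix can target that key range instead of re-copying the whole table.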

This was stuff I'd done a bunch of times in my career, but Aurora's limitations made it harder. 🧵

Well, I'd done a bunch of this in my career, but actually "shrinking the ibdata1 file" was a bucket list thing, because I'm very boring.

Anyway, we eventually failed over to the new replica (making it the primary) and the new backups were immediately smaller. But it took a year for the old, big backups to eventually be deleted, which is why the full impact of my changes took a year to show up.
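The slow drop is what a rolling retention window would produce: each month one inflated backup leaves the window and one small backup enters it. Here is a minimal sketch of that, assuming one backup per month kept for 12 months. The schedule, the retention period and the sizes are all assumptions for illustration; the thread only says the old backups took about a year to go.

```python
def stored_backup_gb(months, retention_months, old_size_gb, new_size_gb, fix_month):
    """Total backup storage held at the end of each month.

    One backup is taken per month and kept for `retention_months`.
    Backups taken before `fix_month` have the old (inflated) size,
    backups taken from `fix_month` onward have the new size.
    """
    totals = []
    for month in range(months):
        kept = range(max(0, month - retention_months + 1), month + 1)
        totals.append(
            sum(old_size_gb if m < fix_month else new_size_gb for m in kept)
        )
    return totals


# All figures below are assumed, not taken from the real bill.
totals = stored_backup_gb(
    months=30, retention_months=12,
    old_size_gb=2000, new_size_gb=300, fix_month=12,
)

for month, gb in enumerate(totals):
    print(f"month {month:2d}: {gb:6d} GB retained")
```

Under these assumptions the retained total falls a little every month after the fix and only reaches its floor once the last inflated backup has left the retention window, which is the staircase shape a bill graph like this would show.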

The company has added more database servers since I left, too, yet the fix is still a huge improvement. 80%, maybe more?

Maybe it's weird to care anymore, since I assume I'm never working as a database engineer again, but I like to have visual proof of the work I did.

And it's not like the AWS "devops guru" AI was ever going to state "by the way, you're overpaying on backups and your data files are mostly wasted space". At least, it never said that to us.

I know there are AWS spend reduction consulting companies, but I don't know if they're aware of the ways that RDS and Aurora can overcharge, since it requires MySQL knowledge.

Saving the company money isn't generally the purpose of an engineer, but I think it should be when cloud providers are set up to let you accidentally overpay for things.

I also made other database changes over the years there to make queries faster, lessen IO, etc. The big change at the end was something I asked for over a year earlier but had to wait for permission to do.

@giflian
From ~$3k down to $750.

Nice work!

@DaveMasonDotMe more like $600, and that's including new servers they've since added. And maybe missing a month more of deletes. Not that that's important.