So, I was thinking about this batshite solution I encountered a few years ago. An industrial fab facility had a SAP server, a big honking beast of a machine with 256 GB of RAM, and this was 2019, so yeah, money was spent.

But the database (MySQL) was, ostensibly, too slow. Their SAP consultants' solution?

On boot, the system would take 128 GB of RAM and create a ramdisk mounted on /mysql. It would then copy /var/lib/mysql (or wherever MySQL was stored) into /mysql, and start mysqld with flags pointing it at the ramdisk.
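
Reconstructed as a sketch, that boot-time dance would have looked something like this (the tmpfs size, paths, and flags are assumptions on my part, not the actual script):

```shell
#!/bin/sh
# Hypothetical reconstruction of the boot-time ramdisk setup.
# The size, paths, and flags here are assumptions.

# Carve 128 GB out of RAM and mount it where the database will live.
mount -t tmpfs -o size=128g tmpfs /mysql

# Copy the persistent data directory into RAM, preserving ownership/perms.
rsync -a /var/lib/mysql/ /mysql/

# Start mysqld against the ramdisk instead of the real datadir.
mysqld --datadir=/mysql &
```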

Yes, this was absolutely insane, and tremendously fragile. But I digress. Every night at 0300, mysqld was shut down and /mysql was rsynced to permanent storage. So, if they lost power in the middle of the day, they lost potentially hours of work that had to be manually re-entered.
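
And the nightly write-back, sketched as the 0300 cron job (again, paths and commands are my assumptions):

```shell
#!/bin/sh
# Hypothetical sketch of the nightly write-back (cron: 0 3 * * *).
# Everything written since the last run exists only in RAM until this finishes.

systemctl stop mysqld                       # cleanly stop the in-RAM instance
rsync -a --delete /mysql/ /var/lib/mysql/   # copy the ramdisk back to disk
systemctl start mysqld
```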

The punchline to this joke is that backing up the ramdisk to disk took about 40-45 minutes. The UPS backing this server had about half that in runtime. So, it never did anything even remotely useful. In fact, it sometimes led to some terrible corruption that had to be painstakingly repaired. Yes, they did have a script triggered by the UPS going on battery. It never finished in time.

Anyways...... My brain was thinking about this, and decided..... CAN WE MAKE THIS WORK?!?!?!?!!111???/one?!//111

I think we can, but I would like to state that this is insane, and probably _really_ expensive for fairly little gain.

But hear me out. The system boots from a single-disk ZFS stripe, creates a RAM disk, and adds it to the zpool as a mirror. Obviously, this would only work for smaller disks or shitloads of RAM, and really only benefits read performance.
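
On Linux, a minimal sketch of that idea might look like this (pool and device names are assumptions; brd exposes RAM as a block device):

```shell
# Sketch only: mirror a single-disk zpool into RAM. Names are assumptions.
modprobe brd rd_nr=1 rd_size=134217728   # one ~128 GiB RAM block device (size in KiB)
zpool attach tank /dev/sda /dev/ram0     # turn the stripe into a disk+RAM mirror
zpool status tank                        # resilver copies the disk into RAM

# Caveat: /dev/ram0 evaporates on reboot, so every boot the pool comes up as
# a degraded mirror and the RAM half has to be re-created and re-attached.
```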

Would I do this? Probably only to see if I could make it work, I doubt it is actually a useful solution. But, it popped into my head just now, and I had to get it out, just so my brain would stop thinking about it.

Thoughts?

@nuintari

I wonder how much time and engineering effort has been spent on speeding up databases that were just never properly indexed

@gloriouscow Right?

And hell...... there was just one 1TB SAS disk behind it. Just improving disk I/O would have been trivially easy.

But the budget for solutions was $0, because, "we already paid for this, it should just work better." Also, the MSP I worked for at the time was absolutely terrified of taking ownership of anything Linux based.

The only reason I was even authorized to touch the machine was to install some virus scanner that they decided was required. I learned a ton about shit SAP solutions and made it work where several other employees had failed miserably. All because I can read the systemd docs and change some startup order.

MSP employees are almost always pathetic.

@gloriouscow

"get this, so instead of properly indexing the system, what would happen if we made it more fragile, more complex, more exposed to data loss, and more expensive in the process?"

The whole story pains me at multiple levels. You have my sympathy, @nuintari

@gumnos @gloriouscow Shit like this is why I can't do MSP work. This kind of bullshit is _everywhere_.

@nuintari I've looked into similar stuff with mixing SSDs and spinning rust for "fast read slow write" stuff, and the consensus seems to be "ZFS really doesn't like different disks in a vdev having different performance characteristics" because it won't account for that when scheduling read IO, so you'll get incredibly inconsistent performance.

(I'm not familiar with MySQL's innards as much as with postgres, but you'd think that if the box has more than enough RAM to fit the whole damn database as a ramdisk, you'd be able to tune it to operate more like redis, keeping the entire thing in memory so you never have to touch disk for reads... I know postgres can be fairly easily tuned to be very greedy with regards to RAM usage)

@becomethewaifu Yup, the original solution was absolutely terrible in so many ways. I too prefer PostgreSQL, but I know enough about MySQL that I could have fixed this right. But the MSP I was working for at the time had no spine for big, scary Linux stuff.

Yet another reason I hate working for MSPs.

@nuintari Indeed. I've heard enough stories that I'm glad I got in at [redacted], an incredibly boring corporate job where I don't have much in the way of sysadmin responsibilities, but our admins are generally Quite Competent, aside from sticking their heads in the dirt with regards to IPv6...

I seem to remember StackExchange posting something about how, as part of their performance tuning, they eventually figured out it was faster to just give the database boxes a TB of RAM to fit the indexes in memory than to run a separate cache server, as it took the same amount of time to simply query the cache as it did to render the page 'from nothing' with an in-memory index...

@becomethewaifu I _want_ to be doing sysadmin/networkadmin duties, but I want to do them WELL.

In the current era of abject laziness and surrender to the AI gods, doing quality work is dead.

I'm considering changing professions. Modern IT is completely terrible. Another profession might also be terrible, but at least I'll be too green to know any better.

@becomethewaifu @nuintari RAM > clever crap.

A couple decades ago I remember being parachuted into a customer site that was having performance problems. It took a day or so to get through my checklist of known performance issues to search for, but I eventually found that their database was capped at using 500MB of RAM on a 16GB machine. I changed one setting to allow the DB to use up to 8GB, and told their ops folks to restart the server overnight.

The client was over the moon that I fixed this issue they had been suffering with for weeks, in a little more than 48h.

(And I moved the DB RAM check to the top 10 of my performance checklist.)

@JustinDerrick @becomethewaifu Well yes, this is of course actual sanity.

There are so many solutions to this problem that can survive a reboot better and will perform just as well.

@nuintari
Doesn't mysql have log shipping or equivalent? Have it ship the logs to the slow disk, at least it would be consistent and the recovery point would be whatever the log queue depth was at the time.
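
It does: the binary log. A hedged sketch of keeping durability on the slow disk while the datadir sits in RAM (paths and values are assumptions, not a tested config):

```shell
# Sketch: datadir in RAM, binlog + InnoDB redo logs on the persistent SAS disk.
# Paths and values are assumptions.
cat >> /etc/my.cnf <<'EOF'
[mysqld]
datadir                   = /mysql              # the ramdisk
log-bin                   = /var/log/mysql/bin  # binary log on real disk
innodb_log_group_home_dir = /var/log/mysql      # redo logs on real disk
sync_binlog               = 1                   # fsync the binlog per commit
EOF
```
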
@FritzAdalis Yes..... but do you expect real solutions from modern SAP consultants? Or hell, from SAP consultants from ANY era?
@nuintari
Or from any enterprise consultant, really.
@FritzAdalis There is a special place in hell for ERP/SAP consultants.
@nuintari As a curious aside, Linux MD RAID has the concept of a “write-mostly” member, which mostly doesn’t serve any read I/Os issued to the array, only writes. (Unless the array becomes degraded.)
Obviously traditional RAID has other issues…
With ZFS I guess you could just clone the persistent pool in the RAM disk, then use snapshots to periodically transfer the changes back to the persistent storage pool?
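
Both variants can be sketched in a few commands (device, pool, and dataset names are assumptions):

```shell
# MD RAID flavour: the RAM member serves reads, the disk member mostly absorbs writes.
modprobe brd rd_nr=1 rd_size=134217728       # /dev/ram0, ~128 GiB (size in KiB)
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/ram0 --write-mostly /dev/sda1     # sda1 is the write-mostly half

# ZFS flavour: run from a RAM pool, periodically send deltas back to disk.
zfs snapshot ram/db@now
zfs send -i ram/db@prev ram/db@now | zfs receive disk/db
```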

@pmdj I mean, this was a pointless mental exercise whose primary goal was to get this awful idea out of my head.

Clone to RAM + regular snapshots is just the same issue as the original shit solution, done more often. Given the original problem was ~100GB sitting in a RAM disk, periodic snapshots would probably be slower than just using the SAS disk under the hood.

@nuintari That really depends on the access patterns. For mostly tiny writes, I’d expect the periodic sync to clearly outperform the raw disk-based approach. For mostly-reading loads you’d of course be better off with a large (ARC) cache.

@pmdj Yeah, in short, this is a terrible solution. There is almost always a better way than such weak ass hacks.

It wasn't even a real mental exercise, it was an effort to get a dumbass idea out of my head.

@nuintari Run two jails. One runs the godawful-mysql-on-ramdisk setup. The other runs mysql against the disk. Run the ramdisk mysql as the primary and make the normal mysql a replicating secondary. It's still dumb, but you get only a short delay on writing back out to disk and your UPS shutdown script can just nuke the primary and attempt a clean shutdown on the secondary to be able to (hopefully) cleanly shutdown while there's power.
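
A rough sketch of that two-instance layout (ports, server IDs, and credentials are made up; the replication syntax is MySQL 8.0.22+, and GTID auto-positioning is assumed to be enabled):

```shell
# Sketch: RAM-backed primary on 3306, disk-backed replica on 3307.
# Ports, IDs, user, and password are assumptions.
mysqld --datadir=/mysql         --port=3306 --server-id=1 --log-bin=primary-bin &
mysqld --datadir=/var/lib/mysql --port=3307 --server-id=2 &

# Point the disk replica at the RAM primary:
mysql --port=3307 -e "
  CHANGE REPLICATION SOURCE TO SOURCE_HOST='127.0.0.1', SOURCE_PORT=3306,
    SOURCE_USER='repl', SOURCE_PASSWORD='secret', SOURCE_AUTO_POSITION=1;
  START REPLICA;"
```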

@overeducatedredneck This isn't far off from a proper solution.

1) better database tuning
2) better disk backend
3) proper fucking replication

heaven forbid, eh?

@nuintari Exactly. I mean, if you can hold all the DB files in half the RAM, then the OS and database server should be caching most of it in RAM anyway.

At least once in my life, I improved database performance by using more RAM and CPU and worse (cheaper) disks.
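
The boring version of "use more RAM" is basically one my.cnf knob; the 192G figure below is an assumption for a 256 GB box:

```shell
# Sketch: let InnoDB cache the working set instead of faking it with a ramdisk.
cat >> /etc/my.cnf <<'EOF'
[mysqld]
innodb_buffer_pool_size = 192G       # hold data + indexes in RAM
innodb_flush_method     = O_DIRECT   # avoid double-buffering via the page cache
EOF
```
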

@overeducatedredneck Yeah, the entire solution lacked proper understanding.
@nuintari @overeducatedredneck Wait - but can't you put a giant pile of ARC on RAM? Which means the reads will be properly quick (assuming there's regular DB walks) and then probably make a ramdisk write cache (or wildly fast disk). Let ZFS handle the cache flushes and the longterm storage, and let your RAM handle the reads and writes?
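
Sizing ARC up is roughly a one-liner (the ~160 GiB value is an assumption, and tunable names vary by platform and version):

```shell
# FreeBSD (older sysctl name; newer releases use vfs.zfs.arc.max):
sysctl vfs.zfs.arc_max=171798691840   # ~160 GiB of ARC

# Linux OpenZFS equivalent:
echo 171798691840 > /sys/module/zfs/parameters/zfs_arc_max
```
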
RAM disks are great! Amiga Workbench shipped with one by default in an era where floppy disks were still the norm. IMHO, that was the apogee of personal computing and if you didn't enjoy it when it was fresh, you missed out. It doesn't look the same through a retro lens.

Throwing a zpool into a RAM disk? I mean, I dunno, maybe it would work? You can try I guess.

Those consultants spending gobs of money on 256GB of RAM (though in 2019, I don't think it was that insanely expensive? I knew of folks operating with terabytes of RAM years earlier than that, and in 2020 someone was trying to get me to work at some SAP related gig where they had 40TB of RAM on some system, but their friggin job application couldn't handle UTF-8 and that just seemed like bad news.) not spending a comparatively small amount on a sufficient UPS though? What a nightmare.

Albeit, I'm spoiled: circa 2002-2006 my employer had an entire building-wide UPS, with a 600 kW diesel generator on automatic start (and it would do a self-test once a week! Sure beat the sorts of situations I dealt with at earlier employers). Oh yeah, and we were adjacent to a hospital (during a crisis, hospitals are the last to lose grid power and the first to have it restored). So, I have experienced good power designs, and they're a far cry from what you described.

@teajaygrey So, as for 256 GB of RAM being a lot in 2019: it was, for a company that was comparatively small. Sub-500 employees. Also openly hostile to IT expenses. You see this a LOT in smaller orgs. Everything is relative.

As for big-ass UPSes: yes, I have spent the vast majority of the last 30 years of my professional career working for ISPs. When the network is your product, you tend to have sufficient power to keep said product running through the worst the local Edison can toss at you. The kind of battery systems where, if you touch them, you will be thrown against the far wall, but will be dead before you hit it. Plus the gigantic genny out back.

But this was a gig for a local MSP that did work for smaller, local companies. It was during my brief stint where I tried to take an "easier" job.

It turns out, I absolutely hate doing such mediocre work. I couldn't even convince this client to spring for a generator, let alone a better UPS. I HATE MSP work. Solutions like the aforementioned nonsense are the best you can hope for.

I was NOT saying this was a good idea. I was wondering if their shit solution could be improved upon with minimal impact.

@nuintari I think your mirror idea would work, but keep in mind that if you reach the maximum depth of your write cache, the physical disk will hold up writes to the RAM disk until things get caught up.

Something I don’t get: how can it take 40 to 45 minutes to back up, say, 128 gigs? That’s only slightly faster than free-128-gig-USB-stick-from-Microcenter speeds. With a decent NVMe SSD, that should take, say, 128 seconds or so.
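
For what it's worth, the implied throughput works out to roughly loaded single-spindle speed, which fits the "one 1TB SAS disk" mentioned upthread:

```shell
# Back-of-envelope: what rate does "128 GiB in 45 minutes" actually imply?
bytes=$((128 * 1024 * 1024 * 1024))
secs=$((45 * 60))
echo "$((bytes / secs / 1024 / 1024)) MiB/s"   # prints: 48 MiB/s
```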

RAM disks are great for certain uses. I’ve booted machines with no disks, created a RAM disk, installed #NetBSD onto it, chrooted into it, then run entirely in the RAM disk for months. It’s a great solution to certain problems.

(edit: RAM disk <-> physical disk hold up)

@nuintari Run live/replica on ramdisk/normal disk instead? Two mysql instances on same server, RAM one serves everything, disk one gets the replication commits only. That way you get fast reads and fast commits.

Really, just use NVMe, but that's hardware and outside the constraints.