I'm looking for a little help working through some database scaling issues with #WriteFreely.

On our Write.as instance, we're up to 4.25 million rows in our `posts` table, and it's causing problems for some individual blogs with 9,000+ posts.

I have an idea of what some bottleneck queries are. But wondering if there are easy db optimizations we might make (indexes, etc.), and if anyone can help identify what they might be.

Boosts appreciated!

@matt hmm, it might be easier to have a structured conversation investigating perf issues in a forum where it's easier to share log files and the like. Is there a GitHub issue tracking this already?
@gnomon Great point -- I've just opened an issue here with some things I've tried: https://github.com/writefreely/writefreely/issues/741
Rendering a blog is slow for giant databases · Issue #741 · writefreely/writefreely

Describe the bug On our Write.as instance, our posts table has over 4.25 million rows. For an individual blog with 9,000+ posts, rendering the blog index page is taking a long time (4 seconds or mo...

@matt

It might not be a quick fix, but spinning off older entries into their own quasi-archive tables could help, especially if they're ones that are rarely accessed. You'd need a routing mechanism somewhere to detect when to jump from the main table to the archives.
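A rough sketch of that idea in MySQL, assuming a hypothetical `posts` table with a `created` timestamp (the names here are placeholders, not WriteFreely's actual schema):

```sql
-- Hypothetical sketch: peel entries older than two years off into an
-- archive table. Table and column names are assumptions.
CREATE TABLE posts_archive LIKE posts;

INSERT INTO posts_archive
SELECT * FROM posts
WHERE created < NOW() - INTERVAL 2 YEAR;

DELETE FROM posts
WHERE created < NOW() - INTERVAL 2 YEAR;
```

In practice you'd want to run the copy/delete in small batches to avoid long locks, and the app would need that routing layer to decide which table to query.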

And do you have something like Redis in place for caching common / recent queries? That could help for material that's frequently accessed but doesn't change often.

@info Yeah, I definitely think splitting up the table could make sense, if I can figure out along what lines to do it.

I don't have Redis in place, but am using Varnish on the frontend, which has generally helped reduce database load across the system -- it's just that this blog in particular can still run slow.

@matt

Always the hardest part with DB design, segmenting records.

Do you have traffic stats on what gets the most requests? It could be a scenario where the most recent material stays in the main or live table(s), for kind of obvious reasons.

The older but high activity segments get their own table(s) with some advance caching.

Low traffic / nobody ever looks at it? Archive. Not worth the bandwidth or overhead.

Which database do you use?
@tweets MySQL with InnoDB
I recently dealt with a full-text InnoDB database of more than 8 million rows. A good starting point is MySQLTuner:
https://github.com/major/MySQLTuner-perl

The most important settings in the context of scaling issues are indexes and innodb_buffer_pool_size. The first is a question of your read/write ratio, the second a question of hardware. Maybe these are some starting points. :)
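For example, a my.cnf fragment along these lines (the value is purely illustrative; MySQLTuner will suggest one based on your actual workload and RAM):

```ini
[mysqld]
# Rough rule of thumb on a dedicated DB host: 50-75% of available RAM,
# so the working set of index and data pages stays in memory.
innodb_buffer_pool_size = 8G
```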
GitHub - major/MySQLTuner-perl: MySQLTuner is a script written in Perl that will assist you with your MySQL configuration and make recommendations for increased performance and stability.

@matt Unfortunately I know nothing about your stack or your code (other than it being MySQL-backed). However, here's the general troubleshooting list of things that make a database-backed web application slow:

1. Too many queries: the app is constantly pulling things from the DB that could be cached, or doing joins in app code

2. Unoptimized requests: some DB requests need better queries, better data structures, or indexes

...

@matt ... both of the above could be found via query logs and metrics.

3. Inadequate hardware: if your DB is 150GB and you have 1GB of RAM, you will always be fighting performance.

4. Locking: app is designed so that some requests block other requests.

5. Misconfiguration: DB, platform, or other parts of the stack are configured in a way that limits throughput.

...

@matt ... if this were Postgres, I could give you some queries to hunt for tables that need indexes, but it's not. Hopefully some MySQL folks will speak up with the same.
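For what it's worth, here's a hedged MySQL-flavored sketch of the kind of check point 2 calls for; the `posts` column names are guesses, not the actual schema:

```sql
-- Does the blog index query use an index, or a full scan + filesort?
EXPLAIN SELECT id, title
FROM posts
WHERE blog_id = 42
ORDER BY created DESC
LIMIT 10;

-- If EXPLAIN reports type: ALL and "Using filesort", a composite index
-- matching the filter and the sort order usually helps:
CREATE INDEX idx_posts_blog_created ON posts (blog_id, created);
```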
@matt One very specific tip: you should look into partitioning your largest tables.
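A sketch of what range partitioning could look like in MySQL, on a hypothetical schema (note that MySQL requires the partitioning column to appear in every unique key, hence the composite primary key):

```sql
CREATE TABLE posts_partitioned (
    id      BIGINT NOT NULL,
    blog_id BIGINT NOT NULL,
    created DATETIME NOT NULL,
    content MEDIUMTEXT,
    PRIMARY KEY (id, created)
)
PARTITION BY RANGE (YEAR(created)) (
    PARTITION p2019 VALUES LESS THAN (2020),
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Queries that filter on `created` then only touch the relevant partitions (partition pruning), which keeps index scans on a huge table much smaller.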
@fuzzychef Appreciate all the tips! I definitely think partitioning is ultimately the way to go. But went into a little more detail in this issue: https://github.com/writefreely/writefreely/issues/741