The issue is the database got corrupted
"videoRedundancy" failed: PQgetResult() failed.
pg_dump: error: Error message from server: ERROR: could not access status of transaction 3004004288
DETAIL: Could not open file "pg_xact/0B30": No such file or directory.
pg_dump: error: The command was: COPY public."videoRedundancy" (id, "expiresOn", "fileUrl", url, strategy, "videoFileId", "actorId", "createdAt", "updatedAt", "videoStreamingPlaylistId") TO stdout;
@omnipotens @kainoa @esm first of all i'd suggest restoring from a backup if you have one; if you don't for whatever reason, you should set up an auto backup script that uses pg_dumpall after this
give a restart a try; if that doesn't fix it you could try messing with pg_dump flags, --data-only and --schema-only are good ones to try (separately)
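for the auto backup, something like this in /etc/cron.d/ would do (path and schedule are just an example, make sure the directory exists and postgres can write to it):
0 3 * * * postgres pg_dumpall | gzip > /var/backups/postgres/all-$(date +\%F).sql.gz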
When carrying out pg_dump on a (timescale) database running within a local container I have the following: pg_dump: warning: there are circular foreign-key constraints on this table:
@omnipotens @kainoa @esm hmmmm, it seems like it keeps failing on the videoRedundancy table specifically; I'm not too familiar with peertube but after looking at the docs I think it might just be possible to recreate that table
do the actual postgres db daemon logs say anything whenever you attempt to pg_dump? we may be able to isolate and recreate what's affected
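(on systemd distros journalctl -u postgresql usually has them - the unit name can differ, e.g. postgresql@15-main on debian/ubuntu - otherwise look for a log file under /var/log/postgresql/ or in the data directory's log/ subdir)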
@omnipotens @kainoa @esm about the same then
are you able to get the indexes in the videoRedundancy table, in either the current db or the backup? if not then i'd try dropping that specific table and restoring it from the november backup
@omnipotens @kainoa @esm ah, do you have access to some sort of UI like phpMyAdmin, Adminer, or DBeaver? if not, then try running this in the postgres shell (sudo -u postgres psql) to try and get the indexes on the table:
SELECT indexname, indexdef FROM pg_indexes WHERE tablename = "peertube_prod.public.videoRedundancy";
"peertube_prod.public.videoRedundancy" with 'videoRedundancy'@omnipotens @kainoa @esm hmmm, so the table isn't completely toast, it just seems like the data is messed up
after backing up the current data directory as wolf480pl mentioned earlier (not a dump, just copying the /var/lib/pgsql or /var/lib/postgresql directory outright), give pg_restore --data-only -d peertube_prod -t videoRedundancy <november.dump> a shot (replace with the path to the database backup from november)
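for the data directory copy, something along these lines should do (assuming systemd and a stock debian-ish path, adjust to your setup):
sudo systemctl stop postgresql
sudo cp -a /var/lib/postgresql /var/lib/postgresql.bak
sudo systemctl start postgresql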
@omnipotens @esm @kainoa @esm
That pg_xact (formerly known as pg_clog) directory contains the transaction status log - basically a record of which transaction IDs committed and which aborted (so it's different from the WAL) - and it seems like it's very important for postgres...
The warning about circular dependencies seems like a red herring, but the missing file in pg_xact seems like a severe issue.
I suggest you first make a copy of your postgres data directory (the one that contains subdirs like base, pg_xlog, etc.).
1/
And the backup is so that if you lose data while trying to fix it, you can try again.
The problem looks similar to the error described here:
https://dba.stackexchange.com/questions/325104/error-could-not-access-status-of-transaction-could-not-open-file-pg-xact-0029
Except there, restarting the database fixed it.
If this is something that postgres's builtin recovery process can fix, a restart should help.
It's also possible that dumping that table just takes too long and triggers the issue from that link, but then I'd expect the number in the filename to increase.
yeah, restarting it did not help, tried that already.
Do you have a backup of the postgres data directory?
Looks like the last one available was Nov 2022
Make another one now.
In addition to the one you already have.
Yes, it's corrupted (apparently), but if we try fixing it and make it worse, this will let you try again.
@omnipotens @esm @kainoa @esm
Ideas for *after* you've made a backup of the postgres data directory:
- make sure nothing other than your psql is connected to postgres (check pg_stat_activity to see what is connected; example query after this list)
- disable triggers [1] (example after this list)
- try getting a single row from the table:
select * from "videoRedundancy" limit 1;
If this doesn't cause the error, try selecting all the rows (no limit); if that works, try pg_dump
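Rough examples for the pg_stat_activity and triggers points above (run in psql; the quoted table name assumes peertube's schema):
select pid, usename, application_name, state from pg_stat_activity;
alter table "videoRedundancy" disable trigger all;
(and alter table "videoRedundancy" enable trigger all; once you're done)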
I am starting to think it is hosed
@omnipotens @esm @kainoa @esm
There are levels of hosed.
I'm no expert but I suspect most of the data from that table is still there, the question is how much effort you're willing to spend to recover it.
If you're getting this error on a 1-row select from that table even if that's the first query after restart, then I don't know how to proceed, but I suspect someone on the postgres mailing list does.
Ok so we followed up in DMs and it turned out that:
1. The transaction ID from the error message was higher than txid_current() - it was in the future. It couldn't have been real, it must've been postgres reading garbage where it'd expect a transaction ID, similar to the "garden-variety data corruption" described in [1].
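(i.e. comparing the number from the error message with what select txid_current(); returns on that database)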
/cont
[1]: https://www.postgresql.org/message-id/11462.1238032941%40sss.pgh.pa.us
2. Looking at the table "heap" file (base/<db-oid>/<tbl-oid>) with hexdump, and searching for the little-endian hex encoding of the corrupted txid, it appeared only in one place.
Compared to rows seen earlier in the file, it seemed to sit 16 bytes before where a row should start. Possibly the row pointers at the start of a page got borked during a hardware failure while the rows were being moved around.
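Roughly what that search looks like, using the txid from the original error (quick and dirty - it can miss a match that falls across hexdump's column or line breaks):
printf '%08x\n' 3004004288    # b30d77c0, i.e. bytes c0 77 0d b3 on disk (little-endian)
hexdump -C base/<db-oid>/<tbl-oid> | grep 'c0 77 0d b3'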
/cont
3. Trying to add fake pg_xact files like described in the "pg_clog problems" slide of [2] didn't help - it found another corrupted txid, one that appeared in many places in the file. The number of rows that I could select with `select * from <table> limit <n>` before getting an error did not go up. I suspected all rows on that page had wrong offsets.
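For reference, the dummy-segment trick as I understood it (double-check against the slides before trusting my recipe): a pg_xact segment is 256 kB, 2 status bits per txid, and 0x55 bytes mean "committed", so with postgres stopped, from inside the data directory:
dd if=/dev/zero bs=256k count=1 | tr '\0' '\125' > pg_xact/0B30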
/cont
[2]: https://thebuild.com/presentations/worst-day-fosdem-2014.pdf
4. Overwriting the whole page with zeros (stopping postgres, copying the file to the side, dd-ing into it with conv=notrunc and the right bs/seek/count, verifying, copying the file back) as advised in the "really bad heap damage" slide of [2] did make it possible to read more rows. There was another error with a different corrupted txid, which was also unique, and much much later in the file. Zeroing out that page too allowed reading the rest of the table.
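The dd step was something along these lines (placeholders to fill in; 8192 is the default postgres page size, <page-no> is the damaged page's number counted from 0):
dd if=/dev/zero of=<working-copy-of-heap-file> bs=8192 seek=<page-no> count=1 conv=notrunc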
5. I did save the copy of those pages before zeroing them out. They contain string fields with URLs, which are easy to spot and could be used as an anchor to guess correct row offsets and extract data from those rows. Maybe even extract correct txids, but I wouldn't know how to check if these rows are live or dead, so could end up with extra data that shouldn't be there.
Also, the bitfields in the row pointers confused me, maybe if I understood them it'd be possible to fix those two pages in-place. Or maybe not.
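(if anyone wants to poke at those saved pages, a lazy first pass is something like strings <saved-page> | grep http to pull the URLs out)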
Main takeaway: it's usually recoverable, if you're not working under stress.
Yesterday I helped linuxrocks.online folks fix their peertube's postgres db after a corruption and an ominous "ERROR: could not access status of transaction <number>" error.
Turns out postgres's physical data layout isn't all that complicated.
See upthread for details, but reply here as I don't know if folks above want this in their mentions.