Alice Averlong🏳️‍⚧️Aug 27, 2023

okay so the chunk IDs seem to be related to different types of chunk handlers
chunk IDs 0-31 use a 3-parameter callback, and 32-47 use a 4-parameter callback

Alice Averlong🏳️‍⚧️Aug 27, 2023

you could have just made them all take 4 parameters and just have some of them ignore the 4th parameter but NO we gotta make everything complicated so that foone's little brain can't handle it
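
A minimal sketch of that split, assuming nothing about what the parameters actually mean. The handler names and argument meanings here are made up for illustration; only the 0-31 / 32-47 ID ranges come from the thread.

```python
# Hypothetical handlers: the names and parameter meanings are invented.
def handle3(chunk_id, data, size):
    return ("3-param", chunk_id, size)

def handle4(chunk_id, data, size, extra):
    return ("4-param", chunk_id, size, extra)

def dispatch(chunk_id, data, extra=None):
    """Pick the callback shape based on the chunk ID range."""
    if 0 <= chunk_id <= 31:
        return handle3(chunk_id, data, len(data))
    if 32 <= chunk_id <= 47:
        return handle4(chunk_id, data, len(data), extra)
    raise ValueError(f"no handler for chunk ID {chunk_id}")

print(dispatch(5, b"ab"))
print(dispatch(36, b"abc", extra=7))
```

The simpler design from the post would give every handler the 4-parameter shape and let the 0-31 handlers ignore `extra`, so `dispatch` would not need to branch at all.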

Alice Averlong🏳️‍⚧️Aug 27, 2023

you'd think the programmers of an Azumanga Daioh game, of all games, would realize that the eventual reverse engineer hacking their game might be an Osaka, and would not over-complicate it

Alice Averlong🏳️‍⚧️Aug 27, 2023

oh hello. Someone left the output of a tool on the disc!
Data Pack2 by OOTUKA, Technosoft Co LTD, eh?

Alice Averlong🏳️‍⚧️Aug 27, 2023

that's very interesting. Technosoft had nothing to do with this game... they didn't even exist anymore when it came out.

Alice Averlong🏳️‍⚧️Aug 27, 2023

but given the 1996-1998 dates, I'm guessing they made this tool for one of their PS1 games they released in that period, and it later got used by Ganbarion for Azumanga Donjara Daioh and the One Piece games

Alice Averlong🏳️‍⚧️Aug 27, 2023

Shuji Yoshida is credited as "Library Program" on all three games I know that use PAC files.
It's possible he's OOTUKA.

Alice Averlong🏳️‍⚧️Aug 27, 2023

or it might mean he made the APF files

Alice Averlong🏳️‍⚧️Aug 27, 2023

okay so the output of that tool is kinda handy.
because while it's not 100% correct (they changed shit after this file was made), it's still partially correct: azending.pac DOES include endto.pac, in its entirety

Alice Averlong🏳️‍⚧️Aug 27, 2023

and it looks like there's a 32 or 36 byte header before the file. So maybe the PAC files are concatenated subfiles with headers right before them

Alice Averlong🏳️‍⚧️Aug 27, 2023

okay it's a 54-byte header.
so PAC is a lazy TAR clone

Alice Averlong🏳️‍⚧️Aug 27, 2023

I just need to write a script to decode it. but my brain isn't working now

Alice Averlong🏳️‍⚧️Aug 28, 2023

the weird thing is that the text file suggests the PAC files contain filenames, but I don't see them. Now, there IS a stretch of bytes that could be a filename, but I can't seem to decode it as anything sensible:
B3 A5 A3 B2 A5 B4 6E B0 A1 A3

Alice Averlong🏳️‍⚧️Aug 28, 2023

it does decode as shift-jis (which the text file was encoded as) but turns into:
ｳ･｣ｲ･ｴnｰ｡｣

which I don't think makes any sense

Alice Averlong🏳️‍⚧️Aug 28, 2023

and if you decode it as utf-16, the most reasonable encoding for windows computers at the time, you end up with ꖳ늣뒥끮ꎡ, which makes even less sense.

I'm pretty sure they didn't name the files in their Azumanga Daioh game in a mix of Mande, Korean, and Sino-Tibetan scripts
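
For anyone following along, both decoding attempts can be reproduced with the standard Python codecs. This is only a sketch: `utf-16-le` is an assumption (little-endian, no BOM), picked because that is the byte order Windows uses.

```python
raw = bytes.fromhex("B3 A5 A3 B2 A5 B4 6E B0 A1 A3")

# Shift-JIS: bytes 0xA1-0xDF are single-byte half-width katakana,
# so this decodes without an error but reads as gibberish.
as_sjis = raw.decode("shift_jis")

# UTF-16 little-endian: every pair of bytes becomes one 16-bit code unit,
# so the 10 bytes turn into 5 characters.
as_utf16 = raw.decode("utf-16-le")

print(as_sjis)
print(as_utf16)
```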

Alice Averlong🏳️‍⚧️Aug 28, 2023

but by matching up the filenames with the text file (azmem.txt) and what subfiles are definitely inside azending.pac, that pile of gibberish is supposed to mean "secret.pac"

Alice Averlong🏳️‍⚧️Aug 28, 2023

wait

Alice Averlong🏳️‍⚧️Aug 28, 2023

maybe this means something.
the "C" in "SECRET" is encoded the same as the "C" in "PAC"
And note that the A in PAC is encoded as A1, which is only 2 less than the A3 which C is encoded as.
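
One way to check a hunch like this is to line the mystery bytes up against the name they are supposed to spell and look at the per-character differences. A small sketch, assuming the lowercase spelling "secret.pac" is the right plaintext:

```python
encoded = bytes.fromhex("B3 A5 A3 B2 A5 B4 6E B0 A1 A3")
expected = b"secret.pac"  # the name azmem.txt suggests for this entry

# Difference between each encoded byte and the ASCII value it should be.
deltas = [e - p for e, p in zip(encoded, expected)]

print(deltas)
print(set(deltas))  # a single value here would mean a constant offset
```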

Alice Averlong🏳️‍⚧️Aug 28, 2023

what encoding puts ABCDEF at A1 and up, though?

Alice Averlong🏳️‍⚧️Aug 28, 2023

answer: nothing python 3.11 can encode to.

Maybe this isn't an encoding. Maybe this is encryption.

Alice Averlong🏳️‍⚧️Aug 28, 2023

it's just the ascii value + 64

Alice Averlong🏳️‍⚧️Aug 28, 2023

B3 A5 A3 B2 A5 B4 6E B0 A1 A3
subtract 64 from each letter

>>> ''.join(chr(x-64) for x in [0xB3,0xA5,0xA3,0xB2,0xA5,0xB4,0x6E,0xB0,0xA1,0xA3])
'secret.pac'

Alice Averlong🏳️‍⚧️Aug 28, 2023

also the 54-byte header thing was wrong. it's variable length, because of course it is!

Alice Averlong🏳️‍⚧️Aug 28, 2023

okay so, PAC:
the header for the file itself is 16 bytes.
Then each chunk starts with a null-terminated string, encoded with that silly +64 ASCII mode.
Then there's another NUL byte, then 32 bytes of per-chunk header, then the raw chunk data.
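
A rough sketch of a reader for the layout as guessed at this point in the thread (later posts revise it). The function names are hypothetical, and nothing here interprets the 32 header bytes, since the post does not say what is in them.

```python
FILE_HEADER_SIZE = 16  # size of the header for the PAC file itself

def decode_name(raw: bytes) -> str:
    """Undo the +64 encoding by subtracting 64 from every byte."""
    return "".join(chr(b - 64) for b in raw)

def read_chunk_header(pac: bytes, pos: int):
    """Parse one chunk header at pos under the first-guess layout.

    Returns (name, per_chunk_header, offset_of_chunk_data).
    """
    end = pac.index(0, pos)        # NUL that terminates the name
    name = decode_name(pac[pos:end])
    pos = end + 2                  # skip the terminator and the extra NUL
    header = pac[pos:pos + 32]     # 32 bytes of per-chunk header, left uninterpreted
    return name, header, pos + 32

# Synthetic test data, not bytes from a real PAC file.
fake = (bytes(FILE_HEADER_SIZE)
        + bytes(c + 64 for c in b"secret.pac") + b"\x00\x00"
        + bytes(range(32)) + b"DATA")
name, header, data_off = read_chunk_header(fake, FILE_HEADER_SIZE)
print(name, len(header), fake[data_off:])
```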

Alice Averlong🏳️‍⚧️Aug 28, 2023

ugh.
the +64 ascii string thing doesn't work for all files. some of them end up negative

Alice Averlong🏳️‍⚧️Aug 28, 2023

34 B6?

THAT DOESN'T MAKE ANY SENSE

Alice Averlong🏳️‍⚧️Aug 28, 2023

way too short to be a filename and it's also -12, 118 after decoding

Alice Averlong🏳️‍⚧️Aug 28, 2023

HOW DO YOU HAVE NEGATIVE ASCII INDEXES

Alice Averlong🏳️‍⚧️Aug 28, 2023

if we assume it loops around and thus this should be F4 76, it's not valid shift-jis, but in utf-16 it'd be 直, which... makes little sense.
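
The arithmetic for those two bytes, as a quick sketch. The wrap-around is an assumption (treating the subtraction as 8-bit arithmetic), not something the game is confirmed to do.

```python
raw = bytes([0x34, 0xB6])

# Plain integer subtraction can go below zero.
plain = [b - 64 for b in raw]

# Assumed wrap-around: keep the result inside one unsigned byte.
wrapped = bytes((b - 64) % 256 for b in raw)

print(plain)                        # [-12, 118]
print(wrapped.hex(" "))             # f4 76
print(wrapped.decode("utf-16-le"))  # 直
```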

Alice Averlong🏳️‍⚧️Aug 28, 2023

I changed my code to ignore that sometimes the filenames make no sense, but then it errors after that: apparently the filenames not decoding ALSO breaks the variable-length headers. Interesting.

Alice Averlong🏳️‍⚧️Aug 28, 2023

interesting: logo.pac goes "40 3F 00 00 A7 AC AF A7 AF 9F 70 71 6E B4 A9 AD 60 D4"

so my code was stopping after 40 3F.
but A7 AC AF A7 ... looks more like a filename

Alice Averlong🏳️‍⚧️Aug 28, 2023

and it encodes as "glogo_01.tim \x94"
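
That result can be reproduced by skipping the first four bytes and applying the same minus-64 step to the rest. Treating exactly four bytes as a prefix is an assumption based on where the filename-looking bytes start.

```python
raw = bytes.fromhex("40 3F 00 00 A7 AC AF A7 AF 9F 70 71 6E B4 A9 AD 60 D4")

# Assumed split: 4 bytes of something else, then the encoded name.
prefix, name_bytes = raw[:4], raw[4:]

decoded = "".join(chr(b - 64) for b in name_bytes)
print(repr(decoded))   # 'glogo_01.tim \x94'
```

The first 12 characters look like a real filename; the trailing space and `\x94` are whatever bytes happen to follow it.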

Alice Averlong🏳️‍⚧️Aug 28, 2023

so I must be missing something, like some out-of-band file length indicator

Alice Averlong🏳️‍⚧️Aug 28, 2023

got it. the first 2-4? bytes of the PAC are a list of how many 4-byte words come before the filename.
the 40 3f 00 00 before the filename in LOGO.PAC isn't part of the filename, it's part of the header.

Alice Averlong🏳️‍⚧️Aug 28, 2023

I can't figure out how it's determining when filenames end, though.

Maybe it's assuming they all have extensions and all extensions are 3 letters long?

Alice Averlong🏳️‍⚧️Aug 28, 2023

that makes some of the files make sense and some of the others not make sense!

Alice Averlong🏳️‍⚧️Aug 28, 2023

oh god there is compression

Alice Averlong🏳️‍⚧️Aug 28, 2023

not all files are compressed. but some are

Alice Averlong🏳️‍⚧️Aug 28, 2023

found the code where it parses the PAC headers.
It's terrible as expected.

The pre-pac header stuff gives you a pointer into each header, but then the fun part is that the pointer is not to the beginning, it's to the middle. So it looks things up by indexing forward AND backward

Alice Averlong🏳️‍⚧️Aug 28, 2023

so the filename starts at the offset of, uh, negative 28

Alice Averlong🏳️‍⚧️

and here's how it determines the ending: it's until it hits a 0, OR the filename ends up being 12 characters long.

FUCK
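
A sketch of that termination rule: stop at a zero byte or after 12 characters, whichever comes first. `read_name` and `ptr` are hypothetical names, and the 8-bit wrap on the subtraction is an assumption; the 12-character cap and the negative-28 offset come from the posts above.

```python
MAX_NAME = 12      # stop after 12 characters even without a terminator
NAME_OFFSET = -28  # filename position relative to the header pointer

def read_name(buf: bytes, start: int) -> str:
    """Read a +64-encoded name: stop at a 0 byte or after MAX_NAME characters."""
    chars = []
    for b in buf[start:start + MAX_NAME]:
        if b == 0:
            break
        chars.append(chr((b - 64) % 256))  # assumed 8-bit wrap
    return "".join(chars)

logo = bytes.fromhex("40 3F 00 00 A7 AC AF A7 AF 9F 70 71 6E B4 A9 AD 60 D4")
print(read_name(logo, 4))       # glogo_01.tim (hits the 12-character cap)

secret = bytes.fromhex("B3 A5 A3 B2 A5 B4 6E B0 A1 A3 00 FF")
print(read_name(secret, 0))     # secret.pac (hits the 0 byte)

# With a pointer into the middle of a header, the name would start 28 bytes back.
ptr = 4 - NAME_OFFSET           # made-up pointer value for this demo
print(read_name(logo, ptr + NAME_OFFSET))
```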

Alice Averlong🏳️‍⚧️Aug 28, 2023

someday I'm gonna reverse engineer a game and not want to timetravel back to its creation and ask them WHAT THE FUCK at gunpoint

sometimes I won't even ask, I'll just start shooting

Alice Averlong🏳️‍⚧️Aug 28, 2023

so I'm just gonna take all my current PAC parsing code and throw it out and replace it with the nonsense of the actual code.

that was my fatal mistake: I was writing parsing code assuming this shit made any fucking sense

Alice Averlong🏳️‍⚧️Aug 28, 2023

also I think there's a mistake in this code OR ghidra is decoding it incorrectly.
it seems to be trying to ensure all filenames are uppercase, but because it's wrong, it is corrupting all non-lowercase characters.

Alice Averlong🏳️‍⚧️Aug 28, 2023

they might not have noticed if they apply the same "uppercase" transformation when trying to load filenames, because both would be corrupted in the same way
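
To illustrate why a bug like that could go unnoticed, here is a purely hypothetical version of it. The thread does not say what the actual mistake is; this sketch guesses at an "uppercase" step that subtracts 0x20 from every character instead of only from a-z.

```python
def buggy_upper(name: str) -> str:
    # Hypothetical bug: subtract 0x20 unconditionally, not just for a-z.
    return "".join(chr((ord(c) - 0x20) % 256) for c in name)

stored = buggy_upper("glogo_01.tim")   # mangled when the archive is indexed
lookup = buggy_upper("glogo_01.tim")   # mangled the same way when a file is requested

print(repr(stored))       # letters come out uppercase, '_', digits and '.' are wrecked
print(stored == lookup)   # True
```

As long as both sides go through the same broken function, the comparison still succeeds, so the game would load files correctly and nobody would see the corruption.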

Alice Averlong🏳️‍⚧️Aug 28, 2023

okay so now I've got working filenames, offsets, lengths, and compressed lengths. So I can find out what files are where and if they're compressed. I can't uncompress them yet.

Alice Averlong🏳️‍⚧️Aug 28, 2023

I have located the decompression routine.
now to try to figure out what the fuck it does

Alice Averlong🏳️‍⚧️Aug 28, 2023

this decompression routine is big-endian.

on a little-endian system.

Alice Averlong🏳️‍⚧️Aug 28, 2023

WHERE DID THEY GET THIS

Alice Averlong🏳️‍⚧️Aug 28, 2023

it seems it's loading 16bit lengths, then using the top 15 bits? with the lowest bit as a flag?

I don't recognize this. I don't think it's DEFLATE
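
A sketch of what reading one such field might look like, going only off the description above: a 16-bit big-endian word, split into a 15-bit value and a 1-bit flag. `read_token` is a hypothetical name, and what the value and flag mean is not known at this point.

```python
import struct

def read_token(buf: bytes, pos: int):
    """Read one 16-bit big-endian word and split it into (value, flag, next_pos)."""
    (word,) = struct.unpack_from(">H", buf, pos)  # ">" = big-endian, "H" = unsigned 16-bit
    value = word >> 1   # top 15 bits
    flag = word & 1     # lowest bit
    return value, flag, pos + 2

value, flag, nxt = read_token(bytes([0x01, 0x03]), 0)
print(value, flag, nxt)   # 129 1 2
```

Reading the same two bytes little-endian (`"<H"`) would give 0x0301 instead of 0x0103, which is why the byte order matters here.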

Alice Averlong🏳️‍⚧️Aug 28, 2023

so looking at this code, it doesn't seem to involve huffman encoding. there's no tables, just some look-back with a sliding (I think?) window.

So this is just a slightly fancy RLE, I think?

Matteꙮ Italia Aug 28, 2023

@foone some LZ77 variant maybe?
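
For reference, the core of an LZ77-style decoder is just literals plus "copy N bytes from D bytes back". The toy below uses a made-up token format and is not the game's format; it only shows how a look-back copy that overlaps its own output behaves like RLE.

```python
def lz_decode(tokens):
    """Decode a toy token stream.

    Each token is either ("lit", bytes) or ("copy", distance, length).
    """
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out += tok[1]
        else:
            _, distance, length = tok
            # Copy byte by byte so a copy may overlap the bytes it is producing.
            for _ in range(length):
                out.append(out[-distance])
    return bytes(out)

print(lz_decode([("lit", b"A"), ("copy", 1, 5)]))      # b'AAAAAA'
print(lz_decode([("lit", b"abc"), ("copy", 3, 6)]))    # b'abcabcabc'
```

With distance 1 the copy repeats a single byte, which is exactly run-length encoding; larger distances repeat longer patterns.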

Alice Averlong🏳️‍⚧️Aug 28, 2023

I'm gonna try to bypass figuring out the compression right now by just stuffing the ghidra code into a C program and calling it from python

Alice Averlong🏳️‍⚧️Aug 28, 2023

Bingo. it works!

Mostly. my output file is always 64mb but that's because I don't have a good way to tell how big it should be
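
A sketch of the calling pattern with `ctypes`, including trimming the oversized buffer once the real length is known. Everything about the C side is an assumption: the thread does not give the function's name or signature, so this assumes `(dst, src, src_len)` and uses `ctypes.memmove` as a stand-in so the example runs without a compiled library. The real thing would load the compiled Ghidra output with `ctypes.CDLL` instead.

```python
import ctypes

def run_decompressor(func, src: bytes, capacity: int, real_size=None) -> bytes:
    """Call a C-style routine into an oversized buffer, then optionally trim it."""
    out = ctypes.create_string_buffer(capacity)  # zero-filled output buffer
    func(out, src, len(src))                     # assumed (dst, src, src_len) signature
    data = out.raw                               # always `capacity` bytes long
    return data if real_size is None else data[:real_size]

# Stand-in for the real routine: memmove just copies src_len bytes into dst.
fake_decompress = ctypes.memmove

padded = run_decompressor(fake_decompress, b"texture", 64)
trimmed = run_decompressor(fake_decompress, b"texture", 64, real_size=7)
print(len(padded), trimmed)
```

Without a known output size the result is always `capacity` bytes, which is the 64mb situation; an uncompressed length from the PAC header would be the natural value to pass as `real_size`.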

Alice Averlong🏳️‍⚧️Aug 28, 2023

even more bingo.
I have textures now.

Alice Averlong🏳️‍⚧️Aug 28, 2023

there's some weird-ass shit going on here. like, the datafiles have some PAC chunks with type 36.
As far as I can tell, there's no code that handles chunk-36.
So the only way that makes sense is if part of the game dynamically loads code which then registers a chunk-36 parser

Alice Averlong🏳️‍⚧️Aug 28, 2023

so each character is stored in a PAC file under viewer\ (inside AZU.APF)
So like, the first Chiyo-chan is chi_v.pac
That PAC file contains 91 texture images and a 95 kilobyte GMD file, which seems to contain all the geometry AND animations.

Alice Averlong🏳️‍⚧️Aug 28, 2023

so the next step is to figure out how GMD works.
Fortunately I know where the function that parses it starts

Alice Averlong🏳️‍⚧️Aug 28, 2023

the code also seems to handle decoding two versions of the GMD format, but I can only find one in use in the datafiles.

Maybe they used the other in the One Piece games, and just never dropped support?

Alice Averlong🏳️‍⚧️Aug 28, 2023

oh hello. I took a quick look at the second of the One Piece games, and it turns out they did the same thing as Azumanga and included the executable inside the APF file... but they did it twice, and the two don't match!

Alice Averlong🏳️‍⚧️Aug 28, 2023

and it looks like the second One Piece game uses .TMD files, which have a completely different header than the two supported by azumanga

Alice Averlong🏳️‍⚧️Aug 28, 2023

ahh. it looks like the GMD format gets loaded recursively, and I bet that's why there are a total of 3 (not 2, as I suspected) different versions of it.
I bet versions 1 and 0 appear inside version 2

h100gfld Aug 28, 2023

@foone

Is this encryption dot gif

The Doctor Aug 30, 2023

@foone That sounds like a BOFH excuse.

Studio 8502 Aug 30, 2023

@drwho @foone Random fun fact: Every WME64 unit I sell is tested by connecting via Telnet to the BOFH excuse server (telnet://bofh.jeffballard.us:666) before shipping.

HarJIT Aug 28, 2023

@foone Gives me flashbacks to https://ccsids.net/3-3220-050/sc354.html

The strange case of Code Page 354 | Code page information

endrift 🏳️‍⚧️Aug 28, 2023

@foone where does Factor 5 fit in here

Alice Averlong🏳️‍⚧️Aug 28, 2023

@endrift they know what they fucking did

FozzTexx Aug 28, 2023

@foone sounds like something they copied from CP/M

Paul Lalonde Aug 28, 2023

@foone DOS filenames - 8.3, so 12 chars. Ugly.

Sekoia Aug 29, 2023

@foone isn't there a C function that does exactly that? fnread or smth? Technically that saves a byte, but... it's a singular byte. Just null-terminate it.