why the heck does the PS1 have a "NoFunction" syscall?

I know about NOPs, but why a NOP syscall?

it has been zero days since I crashed an emulator

ahh, fucking MIPS.
How do you get a full 32bit address into a register?

MOV EAX,800771DC ?

NO GET THAT X86 BULLSHIT OUT OF HERE.

lui v0,0x8007
addiu v0,v0,0x6e50
addiu v0,v0,0x404

that's an address encoding that'll put some hair on your chest!

you may notice the math here doesn't make sense. I agree that it doesn't make sense. but it seems to work. Something is very wrong
ahh no it's just a confusing loop.
that address doesn't equal 800771DC, it's 80077254
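The arithmetic is easy to check by simulating the instructions. This is a minimal Python sketch. The helper names are made up, and it only models the register math: 32-bit wraparound, plus the fact that `addiu` sign-extends its 16-bit immediate. The `split_address` helper shows how a full address like 800771DC would normally be split into a `lui`/`addiu` pair, including the +1 bump to the high half when the low half is 0x8000 or above.

```python
# Sketch: simulate MIPS `lui` + `addiu` to see which 32-bit value ends up in v0.
# Only the register arithmetic is modeled, not the CPU.

MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Interpret a 16-bit immediate as signed, the way addiu does."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def lui(imm: int) -> int:
    """lui rt, imm: put imm in the upper 16 bits and zero the lower 16."""
    return (imm & 0xFFFF) << 16


def addiu(reg: int, imm: int) -> int:
    """addiu rt, rs, imm: add the sign-extended immediate, wrapping at 32 bits."""
    return (reg + sign_extend16(imm)) & MASK32


def split_address(addr: int) -> tuple[int, int]:
    """Split addr into (hi, lo) so that addiu(lui(hi), lo) == addr."""
    lo = addr & 0xFFFF
    # If lo has its top bit set, addiu will treat it as negative, so bump hi by one.
    hi = ((addr + 0x8000) >> 16) & 0xFFFF
    return hi, lo


# The three instructions from the disassembly:
v0 = lui(0x8007)
v0 = addiu(v0, 0x6E50)
v0 = addiu(v0, 0x404)
print(hex(v0))  # 0x80077254
```

So the pair of adds really does land on 80077254, not 800771DC; a direct load of 800771DC would be `lui 0x8007` followed by a single `addiu 0x71DC`.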
You gotta love when it turns out a game is just spewing debugging info on the normally invisible serial terminal, so you just need to connect to see it
remember when writing code that parses data formats, always make sure it's a complex mess of dynamic callbacks indexed on magic bytes that you do arithmetic on. never just have a big switch table or a bunch of if-thens.
this won't make your program any better but it will absolutely give headaches to the poor reverse engineers trying to figure out your file formats 21 years later
so I'm trying to figure out the PAC format used inside the APFrs files used by Azumanga Donjara Daioh and One Piece: Grand Battle! 1/2.
it has at least 11 types of sub-chunks, of which I know four: SDFC, VH, VB, and SEP.
The other 7? unknown.
however, those are only the ones known at compile time: there's a lookup table for the chunk types, and I know that at least at one point, it registers and unregisters two more.
I can't be sure yet whether those two are overriding existing chunk types or are entirely new ones
partially because the chunk numbers aren't used as-is. They seem to be adjusted at runtime. So like, some chunks are 0-31, but chunks 32 and up get 32 subtracted from them? It's confusing

or... every callback is registered in pairs, and the second callback is at the same number as the first, +16, and in all cases, it's set to NULL?

WHAT EVEN IS THIS

okay so the chunk IDs seem to be related to different types of chunk handlers
chunk IDs 0-31 use a 3-parameter callback, and 32-47 use a 4-parameter callback
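Read literally, that means two separate handler tables. Here is a hypothetical Python model of that dispatch. Every name in it (`register_handler`, `dispatch`, `HANDLERS_3`, `HANDLERS_4`) is invented; only the two ID ranges and the "subtract 32" adjustment come from the disassembly, and it does not try to model the paired +16/NULL registrations.

```python
# Hypothetical model of the two-tier chunk-handler table.
# Only the ID ranges (0-31 vs 32-47) and the "subtract 32" step come from the game code.
from typing import Callable, Optional

Handler3 = Callable[[int, int, int], int]
Handler4 = Callable[[int, int, int, int], int]

HANDLERS_3: list[Optional[Handler3]] = [None] * 32  # chunk IDs 0-31
HANDLERS_4: list[Optional[Handler4]] = [None] * 16  # chunk IDs 32-47, stored at ID - 32


def register_handler(chunk_id: int, fn) -> None:
    """Put fn in whichever table its chunk ID belongs to."""
    if 0 <= chunk_id < 32:
        HANDLERS_3[chunk_id] = fn
    elif 32 <= chunk_id < 48:
        HANDLERS_4[chunk_id - 32] = fn
    else:
        raise ValueError(f"chunk id {chunk_id} out of range")


def dispatch(chunk_id: int, a: int, b: int, c: int, d: int = 0) -> Optional[int]:
    """Call the handler for chunk_id; 3-parameter handlers never see d."""
    if 0 <= chunk_id < 32:
        fn3 = HANDLERS_3[chunk_id]
        return fn3(a, b, c) if fn3 else None
    if 32 <= chunk_id < 48:
        fn4 = HANDLERS_4[chunk_id - 32]
        return fn4(a, b, c, d) if fn4 else None
    raise ValueError(f"chunk id {chunk_id} out of range")


register_handler(5, lambda a, b, c: a + b + c)          # lands in HANDLERS_3[5]
register_handler(40, lambda a, b, c, d: a * b * c * d)  # lands in HANDLERS_4[8]
print(dispatch(5, 1, 2, 3, 99))  # 6 (the 4th argument is dropped)
print(dispatch(40, 1, 2, 3, 4))  # 24
```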
you could have just made them all take 4 parameters and just have some of them ignore the 4th parameter but NO we gotta make everything complicated so that foone's little brain can't handle it
you'd think the programmers of an Azumanga Daioh, of all games, would realize that the eventual reverse engineer hacking their game might be an Osaka, and would not overcomplicate it
oh hello. Someone left the output of a tool on the disc!
Data Pack2 by OOTUKA, Technosoft Co LTD, eh?
that's very interesting. Technosoft had nothing to do with this game... they didn't even exist anymore when it came out.
but given the 1996-1998 dates, I'm guessing they made this tool for one of their PS1 games they released in that period, and it later got used by Ganbarion for Azumanga Donjara Daioh and the One Piece games
Shuji Yoshida is credited as "Library Program" on all three games I know that use PAC files.
It's possible he's OOTUKA.
or it might mean he made the APF files
okay so the output of that tool is kinda handy.
because while it's not 100% correct (they changed shit after this file was made), it's still partially correct: azending.pac DOES include endto.pac, in its entirety
and it looks like there's a 32 or 36 byte header before the file. So maybe the PAC files are concatenated subfiles with headers right before them
okay it's a 54-byte header.
so PAC is a lazy TAR clone
I just need to write a script to decode it. but my brain isn't working now
the weird thing is that the text file suggests the PAC files contain filenames, but I don't see them. Now, there IS a stretch of bytes that could be a filename, but I can't seem to decode it as anything sensible:
B3 A5 A3 B2 A5 B4 6E B0 A1 A3

it does decode as shift-jis (which the text file was encoded as) but turns into:
ｳ･｣ｲ･ｴnｰ｡｣

which I don't think makes any sense

and if you decode it as utf-16, the most reasonable encoding for Windows computers at the time, you end up with ꖳ늣뒥끮ꎡ, which makes even less sense.

I'm pretty sure they didn't name the files in their Azumanga Daioh game in a mix of Mande, Korean, and Sino-Tibetan scripts
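Both failed decodings can be reproduced with Python's standard codecs. This sketch assumes little-endian UTF-16, the byte order Windows uses.

```python
# Reproduce the two failed decodings of the mystery filename bytes.
raw = bytes.fromhex("B3 A5 A3 B2 A5 B4 6E B0 A1 A3")

# Shift-JIS: 0xA1-0xDF are half-width katakana/punctuation; 0x6E is just "n".
as_sjis = raw.decode("shift_jis")

# UTF-16 little-endian: the ten bytes become five 16-bit code units.
as_utf16 = raw.decode("utf-16-le")

print(as_sjis)
print(as_utf16)
for ch in as_utf16:
    print(f"U+{ord(ch):04X}")
```

The code points come out as U+A5B3 (Vai), U+B2A3, U+B4A5 and U+B06E (precomposed Hangul syllables), and U+A3A1 (Yi), which is where the three-script mix comes from.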

but by matching up the filenames with the text file (azmem.txt) and what subfiles are definitely inside azending.pac, that pile of gibberish is supposed to mean "secret.pac"
wait
maybe this means something.
the "C" in "SECRET" is encoded the same as the "C" in "PAC"
And note that the A in PAC is encoded as A1, which is only 2 less than the A3 which C is encoded as.
what encoding puts ABCDEF at A1 and up, though?

answer: nothing python 3.11 can encode to.

Maybe this isn't an encoding. Maybe this is encryption.

it's just the ascii value + 64

B3 A5 A3 B2 A5 B4 6E B0 A1 A3
subtract 64 from each letter

>>> ''.join(chr(x-64) for x in [0xB3,0xA5,0xA3,0xB2,0xA5,0xB4,0x6E,0xB0,0xA1,0xA3])
'secret.pac'
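The same trick as a reusable pair of functions. This is a minimal sketch; the names are made up, and wrapping modulo 256 is an assumption that only matters for bytes below 0x40. Latin-1 is used because it maps every byte value to exactly one character.

```python
# Sketch of the "+64" filename obfuscation: each byte is the ASCII code plus 0x40.
# Assumption: values wrap modulo 256 (only relevant for raw bytes below 0x40).


def decode_name(raw: bytes) -> str:
    """Undo the +64 offset; latin-1 turns each resulting byte into one character."""
    return bytes((b - 0x40) % 256 for b in raw).decode("latin-1")


def encode_name(name: str) -> bytes:
    """Apply the +64 offset to an ASCII name."""
    return bytes((b + 0x40) % 256 for b in name.encode("ascii"))


print(decode_name(bytes.fromhex("B3 A5 A3 B2 A5 B4 6E B0 A1 A3")))  # secret.pac
print(encode_name("secret.pac").hex(" ").upper())  # B3 A5 A3 B2 A5 B4 6E B0 A1 A3
```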

also the 54-byte header thing was wrong. it's variable length, because of course it is!
okay so, PAC:
the header for the file itself is 16 bytes.
Then each chunk starts with a null-terminated string, encoded with that silly +64 ASCII mode.
Then there's another NUL byte, then 32 bytes of per-chunk header, then the raw chunk data.
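Taken at face value, that provisional layout can be parsed like this. It's a sketch of the model as stated at this point in the thread, which turns out to be incomplete; the function name and return shape are invented. It only reads the first chunk, because this model says nothing about where a chunk's data ends.

```python
# Sketch of the provisional PAC layout:
#   16-byte file header
#   per chunk: +64-encoded name, NUL terminator, one more NUL, 32-byte chunk header, data

FILE_HEADER_SIZE = 16
CHUNK_HEADER_SIZE = 32


def read_first_chunk_header(data: bytes) -> tuple[str, bytes, int]:
    """Return (decoded name, 32-byte chunk header, offset where chunk data starts)."""
    pos = FILE_HEADER_SIZE
    end = data.index(0, pos)  # the name is NUL-terminated
    name = bytes((b - 0x40) % 256 for b in data[pos:end]).decode("latin-1")
    pos = end + 1  # skip the terminator
    if data[pos] != 0:
        raise ValueError("expected a second NUL after the name")
    pos += 1
    header = data[pos:pos + CHUNK_HEADER_SIZE]
    if len(header) != CHUNK_HEADER_SIZE:
        raise ValueError("truncated chunk header")
    return name, header, pos + CHUNK_HEADER_SIZE
```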
ugh.
the +64 ascii string thing doesn't work for all files. some of them end up negative

34 B6?

THAT DOESN'T MAKE ANY SENSE

way too short to be a filename and it's also -12, 118 after decoding
HOW DO YOU HAVE NEGATIVE ASCII INDEXES
if we assume it loops around and thus this should be F4 76, it's not valid shift-jis, but in utf-16 it'd be 直, which... makes little sense.
@foone 64 is just a single bit unset if set..... I wonder do you just invert that bit maybe? Or something else incredibly silly.
@Doridian the problem is that it doesn't make sense with the dot. it could almost be toggling the top two bits, but then the . turns into ®
@foone Maybe numbers aren't included so it's supposed to be 4v?
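The bit-flip idea and the +64 rule agree on letters and disagree on the dot, which a quick check makes visible. For any byte in 0x40-0x7F, adding 0x40 is the same as flipping bits 6 and 7; for 0x2E ('.') bit 6 is clear, so the two rules diverge. This sketch uses Latin-1 to display the results.

```python
# Compare the +64 rule with XOR-ing the top two bits (0xC0).
secret = bytes.fromhex("B3 A5 A3 B2 A5 B4 6E B0 A1 A3")

plus64 = bytes((b - 0x40) % 256 for b in secret).decode("latin-1")
xor_c0 = bytes(b ^ 0xC0 for b in secret).decode("latin-1")

print(plus64)  # secret.pac
print(xor_c0)  # secret®pac  (0x6E ^ 0xC0 = 0xAE, which is ® in Latin-1)

# The "numbers aren't offset" reading of 34 B6: keep 0x34 as-is, shift only 0xB6.
odd = bytes.fromhex("34 B6")
print(chr(odd[0]) + chr(odd[1] - 0x40))  # 4v
```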
I changed my code to ignore that sometimes the filenames make no sense, but then it errors after that: apparently the filenames not decoding ALSO breaks the variable-length headers. Interesting.

interesting: logo.pac goes "40 3F 00 00 A7 AC AF A7 AF 9F 70 71 6E B4 A9 AD 60 D4"

so my code was stopping after 40 3F.
but A7 AC AF A7 ... looks more like a filename

and it decodes as "glogo_01.tim \x94"
so I must be missing something, like some out-of-band file length indicator
got it. the first 2-4? bytes of the PAC are a list of how many 4-byte words come before the filename.
the 40 3f 00 00 before the filename in LOGO.PAC isn't part of the filename, it's part of the header.

I can't figure out how it's determining when filenames end, though.

Maybe it's assuming they all have extensions and all extensions are 3 letters long?

that makes some of the files make sense and some of the others not make sense!
oh god there is compression
@foone we need a foone out of context bot on mastodon for things like this.
@jensen my old twitter joke was always that that bot already exists, and it's at @foone
not all files are compressed. but some are

found the code where it parses the PAC headers.
It's terrible as expected.

The pre-pac header stuff gives you a pointer into each header, but then the fun part is that the pointer is not to the beginning, it's to the middle. So it looks things up by indexing forward AND backward

so the filename starts at the offset of, uh, negative 28

and here's how it determines the ending: it's until it hits a 0, OR the filename ends up being 12 characters long.
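Here's that rule as a small sketch. Assumptions: the 0 check is on the raw, still +64-encoded byte, and the function and constant names are hypothetical. If the logo.pac bytes sit where this rule expects them, the 12-character cap would also explain the stray " \x94": the name gets cut off right before the trailing 60 D4.

```python
# Sketch of the filename rule: the entry pointer lands mid-header, the name starts
# 28 bytes before it, and it ends at the first 0 byte or after 12 characters.
# Assumption: the 0 check happens on the raw (+64-encoded) byte.

NAME_OFFSET = -28
MAX_NAME_LEN = 12  # the length of a full 8.3 DOS name


def read_entry_name(data: bytes, entry_ptr: int) -> str:
    start = entry_ptr + NAME_OFFSET
    if start < 0:
        raise ValueError("entry pointer too close to the start of the data")
    out = []
    for b in data[start:start + MAX_NAME_LEN]:
        if b == 0:
            break
        out.append((b - 0x40) % 256)  # undo the +64 obfuscation
    return bytes(out).decode("latin-1")
```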

FUCK

someday I'm gonna reverse engineer a game and not want to timetravel back to its creation and ask them WHAT THE FUCK at gunpoint

sometimes I won't even ask, I'll just start shooting

so I'm just gonna take all my current PAC parsing code and throw it out and replace it with the nonsense of the actual code.

that was my fatal mistake: I was writing parsing code assuming this shit made any fucking sense

also I think there's a mistake in this code OR ghidra is decoding it incorrectly.
it seems to be trying to ensure all filenames are uppercase, but because it's wrong, it is corrupting all non-lowercase characters.
they might not have noticed if they apply the same "uppercase" transformation when trying to load filenames, because both would be corrupted in the same way
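A hypothetical illustration of how that could go unnoticed. This assumes the bug is an "uppercase" step missing its a-z range check and works on plain ASCII; the game's actual bug may differ. Subtracting a constant modulo 256 is a one-to-one mapping, so distinct names stay distinct and lookups still match as long as both sides use the same broken function.

```python
# Hypothetical: an "uppercase" routine that forgot its a-z range check.


def correct_upper(name: bytes) -> bytes:
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in name)


def buggy_upper(name: bytes) -> bytes:
    # Subtracts 0x20 from every byte, so '.', '_' and digits get mangled too.
    return bytes((b - 0x20) % 256 for b in name)


stored = buggy_upper(b"secret.pac")     # what would end up in the archive index
requested = buggy_upper(b"secret.pac")  # what the loader would compute at lookup time

print(correct_upper(b"secret.pac"))  # b'SECRET.PAC'
print(stored)                        # b'SECRET\x0ePAC'
print(stored == requested)           # True: both sides are corrupted the same way
```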
okay so now I've got working filenames, offsets, lengths, and compressed lengths. So I can find out what files are where and if they're compressed. I can't uncompress them yet.
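Those four fields fit in a small record. This is a hypothetical sketch: the field names are invented, and treating "compressed length differs from length" as the compression flag is an assumption, not something confirmed from the game code.

```python
# Hypothetical record for one PAC directory entry.
from dataclasses import dataclass


@dataclass(frozen=True)
class PacEntry:
    name: str
    offset: int
    length: int             # assumed: size after decompression
    compressed_length: int  # assumed: size as stored in the archive

    @property
    def is_compressed(self) -> bool:
        # Assumption: an uncompressed entry stores the same value in both fields.
        return self.compressed_length != self.length

    def raw_bytes(self, archive: bytes) -> bytes:
        """Slice the stored (possibly still compressed) bytes out of the archive."""
        return archive[self.offset:self.offset + self.compressed_length]


archive = bytes(range(32))
entry = PacEntry("secret.pac", offset=4, length=10, compressed_length=6)
print(entry.is_compressed)              # True
print(entry.raw_bytes(archive).hex())   # 040506070809
```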
I have located the decompression routine.
now to try to figure out what the fuck it does
The strange case of Code Page 354 | Code page information

@foone sounds like something they copied from CP/M
@foone DOS filenames - 8.3, so 12 chars. Ugly.
@foone isn't there a C function that does exactly that? fnread or smth? Technically that saves a byte, but... it's a singular byte. Just null-terminate it.
@foone Are you sure you're not trying to rescue the file system on an ancient MFM hard drive found under a burned out car?
@foone It could be graphic characters, or maybe they just ignore the msb
@foone oh I had the idea for this a while back
@foone the forbidden codepoints