Not much more to see on CD 2, let's see the last one.
CD 3 contains data4.cab (as expected) and the same stuff CD 1 and CD 2 did, which is ie6pl and the Shockwave installer.
Oddly enough, there seems to be no difference in size or file count between extracting only the files found on CD 1 and extracting those found on all the CDs.
Now all that's left to do is to try and decipher those .vol files i kept seeing around while having no idea what's in them.
Well, 7zip didn't decompress it.
binwalk seems to be doing something, that's cool.
Filling my smaller disk to its brim is what it was doing.
To clear things up: it's a small 256GiB disk with no compression, for system boot and other tasks that need a fast disk but not lots of storage. My usual working directory is on it, but it looks like i need to move it to the compressed 1TB ZFS drive and maybe tell binwalk something.
This "something" turned out to be the -r flag (delete carved files).
I've found the file containing the actual definitions and am binwalk-ing it, but binwalk has no idea whatsoever about filenames, so either i find out how to get them or i'm going to have to find a way to automagically name them.
All those files seem to have a <script> tag containing an object definition in some ancient form of JS as a way of storing metadata (i copy-pasted it into node.js and it worked, despite the variable name beginning with a $). This is cool because it lets me write a script that automagically renames those files to their titles and/or copies the entries to an SQL db.
After i figure out how to fix the encoding, of course.
As is commonly known, it's the web browser that knows and understands all the weird character encodings. So i asked my librewolf to tell me, and it says it's windows-1250. I told that to iconv and it sure did convert the sample file i have, which is cool.
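For reference, the same conversion can be sketched in a few lines of Python. This is only an illustration, not the iconv call used above; the function name is made up, and "cp1250" is the name Python's codec registry uses for windows-1250.

```python
def cp1250_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1250 bytes as UTF-8 (hypothetical helper name)."""
    # Decode with the legacy code page first, then encode the text as UTF-8.
    return raw.decode("cp1250").encode("utf-8")


# Made-up sample: a Polish pangram as it would be stored in windows-1250.
sample = "zażółć gęślą jaźń".encode("cp1250")
converted = cp1250_to_utf8(sample)
print(converted.decode("utf-8"))
```

If the browser's guess had been wrong, the decode step would either raise an error or produce obviously wrong letters, which makes this a cheap way to check an encoding guess on one sample file.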
The binwalk put to decode definitions is still cooking of course.
I wrote a quick and dirty PHP script that uses DOM and nodejs to get the title from one of those files and then renames the file to said title. Now i think nodejs should actually output JSON with all the data, and PHP should put it all into an SQLite database while renaming those files to some kind of unique ID. But this is for tomorrow because it's late, binwalk is still carving and i've gtg sleep.
It is tomorrow now, let's continue. The computer claims there are 122822 files extracted, taking up a total of 568M, which is fine.
Now i'm using (on a copy of course) parallel and iconv to convert all those files from windows-1250 encoding to UTF-8, because windows encodings do not spark joy.
Turns out that common linux utilities such as ls and mv have a limit on their input arguments (it's really the kernel capping the total size of the argument list handed to a program), and 122822 filenames is over it.
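One way around the argument limit is to never put the filenames on a command line at all and walk the directory from inside a program instead. A minimal sketch, assuming all extracted files sit in one flat directory; the function name and the demo file names are made up for illustration.

```python
import os
import tempfile


def convert_dir(src_dir, dst_dir, encoding="cp1250"):
    """Copy every file from src_dir to dst_dir, re-encoded as UTF-8.

    Filenames come from os.scandir(), so no huge argument list is built.
    """
    os.makedirs(dst_dir, exist_ok=True)
    count = 0
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            with open(entry.path, "rb") as f:
                raw = f.read()
            # errors="replace" keeps going on bytes the code page leaves undefined.
            text = raw.decode(encoding, errors="replace")
            with open(os.path.join(dst_dir, entry.name), "wb") as f:
                f.write(text.encode("utf-8"))
            count += 1
    return count


# Tiny demo on throwaway files (not the real extracted data).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    dst = os.path.join(tmp, "dst")
    os.makedirs(src)
    for i in range(3):
        with open(os.path.join(src, f"{i}.html"), "wb") as f:
            f.write("żółw".encode("cp1250"))
    print(convert_dir(src, dst), "files converted")
```

On the shell side, feeding names through find or a pipe into parallel or xargs sidesteps the limit for the same reason: the names never have to fit into a single argument list.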
The program works. It's not fast (because of the calls to node.js, i suppose) but it's getting metadata and text out of those files.
Now all i need is to make it put everything into an sqlite db, and to make dinner. Sadly, dinner will have to be made before code.
After being interrupted multiple times, i've finished the php script that copies data from those files to the sql db. It's not fast (because of node.js and popen()) but it will copy everything eventually (or it won't, and will crash instead). It didn't crash on the first couple thousand, so it's gonna be ok i guess.
I didn't have time for this for a while, and the code crashed in the middle of the night, which sucks. Now i've found two possible culprits and eliminated them, and also added that much-needed printf() that shows which file it's working on right now. It should have been there a long time ago but i just didn't add it, for reasons unknown. I'm gonna explore it further when i have some time again.
I've got all of those definitions in a sqlite db! Finally!
Got around to fiddling with it again. Now i'm gonna try to recover images from a file called d-bmp.vol, which was on CD 2. Last time i tried, i did it on the small and fast SSD that my system lives on, and it somehow managed to fill all the free space on it. Now i'm doing it on my big 1TB disk with ZFS and compression; it's got plenty of free space so it shouldn't fill up.
Well, i'm gonna let binwalk cook and just go to sleep. It's still not even near done and i'm sleepy.
It's got 0x10ed693c files that weigh 210MiB as reported by du.
I pasted the wrong number: that big number is the size of the d-bmp.vol file. Now it's unpacked and it's got 5888 files and 254MiB (as reported by du).
There do seem to be references to images in those HTML files, just not where i initially thought. Images are indeed referred to by their MD5 hashes (which means my guess was good), but the reference sits in a metadata field. The mysterious hash-like things in the html comments near the end of the files that contain descriptions still remain a mystery.
I found that out in a kind of reverse fashion: i picked a random image (a GIF with a simple schematic of a transformer) and ran grep -R over all the html files. It did indeed find one that references a transformer, which is good.
Some (at least one) of those "image" files are flash files, b/c hell yeah.
I've made some statistics: there are 305 "Macromedia Flash" files, 222 GIFs and 5346 JPEGs. One file is unknown.
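A tally like this can be made by sniffing the first few bytes of each file. The sketch below is an assumption about how one could do it, not the tool that produced the numbers above. The signatures themselves are the well-known ones: GIF files start with GIF87a or GIF89a, JPEGs with the bytes FF D8 FF, and Flash files with FWS (uncompressed) or CWS (zlib-compressed).

```python
from collections import Counter

# (signature, label) pairs; labels are arbitrary names picked for this sketch.
MAGIC = [
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"FWS", "flash"),
    (b"CWS", "flash"),
]


def sniff(header: bytes) -> str:
    """Guess a file type from its leading bytes."""
    for magic, kind in MAGIC:
        if header.startswith(magic):
            return kind
    return "unknown"


def count_kinds(headers) -> Counter:
    """Tally the guessed types over an iterable of file headers."""
    return Counter(sniff(h) for h in headers)


# Made-up headers standing in for the first bytes of carved files.
demo = [b"GIF89a\x01\x00", b"\xff\xd8\xff\xe0\x00\x10JFIF", b"FWS\x05", b"\x00\x01\x02"]
print(count_kinds(demo))
```

On real data the headers would come from reading the first 8 or so bytes of every carved file; anything landing in "unknown" is a candidate for the kind of metadata file described next.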
I've found that mysterious file, and it seems to contain metadata in a format i can't recognize or understand from its hexdump. It seems to contain a bunch of things that look like md5 hashes. I'd already guessed that, and i'm quite sure that my 12 lines of PHP are the faster way to name the files.
This suggests that the other .vol files might have a similar structure. There are three of them: one with definitions that i have already mostly figured out, and one with (i suppose) videos, which is, again, more sensible to auto-rename with my script than to try to decipher this magic meta file.
I'm looking at those files, and there seem to be two independent ways of adding images to definition text. One is the magic JS Link() function and the other is the <img> html tag, whose src attribute is indecipherable to me. After looking closely at a good example, i think i understand what's in that magic html comment at the bottom of each definition file: it contains the file's ID and hash. The encyclopedia's internal IDs are quite valuable to me because the articles reference each other using them.
I've modified my script that copies data from the html files to the sqlite database. It now saves those IDs from the comments and treats links differently: for links whose type isn't "txt" it displays the Link() function's parameters as JSON instead of just ignoring them. This should let me make some of the image and other references work. I still have to figure out how to make <img> embedding work.
I've found (in another mysterious .vol) a bunch of GIFs with rendered mathematical equations. Its mystery file seems to contain the string that one file i checked earlier (the one about the quadratic function) uses in its <img> tag. The bad part is that i have no idea how to determine which GIF it is, and i still have no idea how the possibly-metadata file works.
This strange metadata file format is finally beginning to make sense. When it's bedtime, of course. Its "record" is hella long (0x24 bytes!) and contains the file's address within the .vol file, a string offset relative to the beginning of the first string in the latter part of the file, and a bunch of stuff i have yet to decipher. I will make a C program that parses this shit and properly unpacks those .vol files, which is possibly what i should have done in the first place.
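To make the record idea concrete, here is a rough sketch of a parser for fixed-size 0x24-byte records. Only the record size and the two kinds of fields come from the description above. Everything else is a guess made for illustration: the field order, their 32-bit little-endian width, the NUL-terminated strings, and the strings_base parameter are all hypothetical.

```python
import struct

RECORD_SIZE = 0x24  # record length, taken from the text

# HYPOTHETICAL layout: u32 file address, u32 string offset,
# then 28 bytes that are not deciphered yet.
RECORD = struct.Struct("<II28s")
assert RECORD.size == RECORD_SIZE


def parse_records(blob: bytes, count: int, strings_base: int):
    """Return (name, file_address) pairs for `count` records at the start of blob.

    strings_base is the offset of the first string in the string area;
    names are assumed to be NUL-terminated (an assumption, not a known fact).
    """
    out = []
    for i in range(count):
        file_addr, str_off, _unknown = RECORD.unpack_from(blob, i * RECORD_SIZE)
        start = strings_base + str_off
        end = blob.index(b"\x00", start)
        out.append((blob[start:end].decode("ascii", errors="replace"), file_addr))
    return out


# Synthetic metadata blob: two records followed by two names.
records = RECORD.pack(0x100, 0, bytes(28)) + RECORD.pack(0x200, 6, bytes(28))
blob = records + b"a.gif\x00b.gif\x00"
print(parse_records(blob, count=2, strings_base=len(records)))
```

With real data, the record count and the start of the string area would have to be worked out from the file itself, and the 28 unknown bytes almost certainly hold something useful (a size or a hash, perhaps).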
I wrote a C script, full of magic, and it worked: it renamed all those files with equations and stuff. That was like two days ago and i failed to report on it. Now i'm gonna make a shrimple website so i can show my friends that i've indeed managed to extract it from all the cursedness. First of all i have to transfer all my definitions from the sqlite database to mysql so i can use it in a normal web setting (not that i particularly like mysql).
Also, i'm installing phpmyadmin from apt b/c i'm too lazy to get the current version from the site.
Apparently phpmyadmin doesn't support 100MB imports, so i have to do cat dump.sql | sudo mysql enc
Apparently the definition column's type needs to be LONGTEXT, not just TEXT (mysql's TEXT tops out at 65,535 bytes), which was totally fine in sqlite.
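Before picking a MySQL column type it can help to check how long the longest value in the SQLite column really is. A small sketch of that check follows; the table and column names ("definitions", "body") are placeholders, not the real schema. The size limits are the documented MySQL ones: 255 bytes for TINYTEXT, 65,535 for TEXT, 16,777,215 for MEDIUMTEXT and 4,294,967,295 for LONGTEXT.

```python
import sqlite3

LIMITS = [
    ("TINYTEXT", 255),
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
]


def mysql_text_type(max_bytes: int) -> str:
    """Smallest MySQL text type that can hold max_bytes bytes."""
    for name, limit in LIMITS:
        if max_bytes <= limit:
            return name
    raise ValueError("too large for any MySQL text type")


def longest_value_bytes(conn, table: str, column: str) -> int:
    """Byte length of the longest value in table.column (0 if empty)."""
    # Casting to BLOB makes LENGTH() count bytes instead of characters.
    sql = f'SELECT MAX(LENGTH(CAST("{column}" AS BLOB))) FROM "{table}"'
    return conn.execute(sql).fetchone()[0] or 0


# Demo on an in-memory database with a placeholder schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE definitions (body TEXT)")
conn.execute("INSERT INTO definitions VALUES (?)", ("x" * 70_000,))
size = longest_value_bytes(conn, "definitions", "body")
print(size, mysql_text_type(size))
```

This picks the smallest type that fits; LONGTEXT works too, it just allows more than strictly needed.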

Soo after not touching the whole encyclopedia thingy for a long time i randomly decided to try to reverse engineer its executables.

I semi randomly picked an .exe. The file is called MEP_2003.exe and i can't remember any context for it because i don't

I'm just looking through the strings and i see hardcoded scripts. yea
Ok, it seems to be the program i'm looking for. Ofc i didn't tell you what it is that i'm looking for. It's how they get their filenames for inline images from those .vol files.
And (of course) the .vol decoding magic is not in this .exe itself, since it only imports functions that seem related.
There's a .dll with a Very Informative Name, 20sys_R.dll, that exports those functions.
Actually, it should be possible to make a C program that loads this .dll and uses it to decompress my files. Or i might look at it long and hard enough to decipher the metadata structure.
I'm too sleepy to do any of it rn.

okok, i do be back until dinner needs my attention, and i might write down more or less exactly what i'm doing so my posts no one reads have some educational value.

I've opened the 20sys_R.dll in iaito (my reverse engineering program of choice, they are all fine, most ppl use Ghidra as far as i know)

and i'm looking at the exports. The program shows me demangled symbols, which is very good because they're more readable; they're actually the program's guesses at the function definitions.
Since those are exports, they have names assigned by the developer (and therefore human-readable) and not by iaito, which is very nice for me.