Mastodawn

CD 3 contains data4.cab (as expected) and the same stuff as CDs 2 and 1 did, namely ie6pl and a Shockwave installer.

Oddly enough, there seems to be no difference in size or file count between extracting only those found on CD 1 and extracting those found on all the CDs.

montar Mar 28

Now all that's left to do is to try and decipher those .vol files i kept seeing around without having any idea what's in them.

montar Mar 28

Well, 7zip didn't decompress it.

montar Mar 28

binwalk seems to be doing something, that's cool.

montar Mar 28

Filling my smaller disk to its brim is what it was doing.

montar Mar 28

To clear things up: it's a small 256GiB disk with no compression, for system boot and other tasks that require a fast disk but not lots of storage. My usual working directory is on it, but it looks like i need to move it to the compressed 1TB ZFS drive and maybe tell binwalk something.

montar Mar 28

This "something" turns out to be the -r flag (delete carved files).

montar Mar 28

I've found and am binwalk-ing the file containing the actual definitions, but binwalk has no idea whatsoever about filenames. Either i find out how to get them or i'm going to have to find a way to automagically name them.

montar Mar 28

All those files seem to have a <script> tag holding an object definition in some ancient form of JS as a way of storing metadata (i've copy-pasted it into node.js and it worked, despite the variable name beginning with a $). This is cool because it allows me to write a script that automagically renames those files to their titles and/or copies the entries to an SQL db.
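
A minimal sketch of the first half of that idea, pulling the raw <script> text out of one file, written in Python rather than the PHP + node.js combination used in this thread. The sample document, the `$meta` variable name and the `title` field are made up for illustration; the real files' names are not shown here.

```python
from html.parser import HTMLParser


class ScriptGrabber(HTMLParser):
    """Collects the text found inside every <script> element."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append.
        if self.in_script:
            self.scripts[-1] += data


def script_bodies(html_text):
    """Return the stripped text of each <script> block in html_text."""
    parser = ScriptGrabber()
    parser.feed(html_text)
    parser.close()
    return [s.strip() for s in parser.scripts]


# Hypothetical sample file content (names invented for this sketch).
SAMPLE = (
    "<html><head><script>"
    'var $meta = {title: "Transformator"};'
    "</script></head><body>...</body></html>"
)

if __name__ == "__main__":
    print(script_bodies(SAMPLE))
```

Turning that JS object literal into data still needs a JS engine or a tolerant parser, which is presumably why node.js was handy for the job.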

montar Mar 28

after i figure out how to fix the encoding of course

montar Mar 29

As is commonly known, it's the web browser that knows and understands all the weird character encodings. So i've asked my librewolf to tell me and it says it's windows-1250. I've told that to iconv and it sure did convert the sample file i have, which is cool.

montar Mar 29

The binwalk run i set to carving the definitions is still cooking, of course.

montar Mar 29

I wrote a quick and dirty PHP script that uses DOM and nodejs to get the title from one of those files and then renames the file to said title. Now i think nodejs should actually output JSON with all the data, and PHP should put it all into a SQLite database while renaming those files to some kind of unique ID. But this is for tomorrow, because it's late, binwalk is still carving and i've gtg sleep.

montar Mar 29

It is tomorrow now, let's continue. The computer claims there are 122822 files extracted, taking up a total of 568M, which is fine.

montar Mar 29

Now i'm using (on a copy of course) parallel and iconv to convert all those files from the windows-1250 encoding to UTF-8, because windows encodings do not spark joy.
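
For comparison, a rough Python sketch of the same conversion step (not the parallel + iconv pipeline used here). The function name and the directory arguments are placeholders; `cp1250` is Python's codec name for windows-1250. It walks the directory itself, so no long argument list is involved, and it writes to a separate output directory so the originals stay untouched.

```python
import os


def convert_tree(src_dir, dst_dir, src_encoding="cp1250"):
    """Re-encode every regular file in src_dir to UTF-8, writing into dst_dir.

    Returns the number of files converted.
    """
    os.makedirs(dst_dir, exist_ok=True)
    converted = 0
    with os.scandir(src_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            with open(entry.path, "rb") as f:
                raw = f.read()
            # A few byte values are undefined in cp1250; replace them
            # with U+FFFD instead of aborting the whole run.
            text = raw.decode(src_encoding, errors="replace")
            out_path = os.path.join(dst_dir, entry.name)
            # newline="" keeps the original line endings as they are.
            with open(out_path, "w", encoding="utf-8", newline="") as f:
                f.write(text)
            converted += 1
    return converted
```

Unlike the parallel version this runs on one core, which should still be fine for a few hundred megabytes of small text files.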

montar Mar 29

Turns out that common linux utilities such as ls and mv can't be handed 122822 filenames at once: there's a limit on how long a command's argument list can be, and a glob over that many files goes past it.

montar Mar 29

The program works. It's not fast (because of the calls to node.js i suppose) but it's getting metadata and text from those files.

montar Mar 29

Now all i need is to make it put it all into the sqlite db, and to make dinner. Sadly, dinner will have to be made before code.

montar Mar 29

After being interrupted multiple times, i've finished the PHP script that copies data from those files to the SQL db. It's not fast (because of node.js and popen()) but it will copy everything eventually (or it won't and will crash instead). It didn't crash on the first couple thousand, so it's gonna be ok i guess.
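
A small sketch of what the database side of such a copy step can look like, here with Python's built-in sqlite3 module instead of PHP. The table name, the column names and the record shape are invented for this example; the actual schema is not shown in the thread.

```python
import json
import sqlite3


def open_db(path):
    """Open (or create) the database with a hypothetical definitions table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS definitions ("
        " id INTEGER PRIMARY KEY,"
        " source_file TEXT UNIQUE,"
        " title TEXT,"
        " meta_json TEXT,"
        " body TEXT)"
    )
    return db


def store_all(db, records):
    """Insert (source_file, title, meta_dict, body) tuples in one transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.executemany(
            "INSERT OR REPLACE INTO definitions"
            " (source_file, title, meta_json, body) VALUES (?, ?, ?, ?)",
            (
                (name, title, json.dumps(meta, ensure_ascii=False), body)
                for name, title, meta, body in records
            ),
        )
```

Keying rows on the source file name means a run that crashed halfway can simply be restarted without creating duplicates.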

montar Mar 31

I didn't have time for this for a while, and the code crashed in the middle of the night, which sucks. Now i've found two possible culprits and eliminated them, and also added the much-needed printf() that shows which file it's working on right now. That should have been there a long time ago, but i just didn't do it for reasons unknown. I'm gonna explore it further when i have some time again.

montar Apr 1

I've got all of those definitions in a sqlite db! Finally!

montar Apr 1

Got around to fiddling with it again. Now i'm gonna try to recover images from a file called d-bmp.vol which was on CD 2. Last time i tried, i did it on the small and fast SSD that i've got my system on, and it somehow managed to fill all the free space on it. Now i'm doing it on my big 1TB disk with ZFS and compression. It's got plenty of free space so it shouldn't fill up.

montar Apr 1

Well i'm gonna let binwalk cook and just go to sleep, it's still not even near done and i'm sleepy.

montar Apr 1

It's got 0x10ed693c files that weigh 210MiB as reported by du.

montar Apr 2

I pasted the wrong number: that big number is the d-bmp.vol file size. Now it's unpacked and it's got 5888 files and 254MiB (as reported by du).

montar Apr 2

There do seem to be references to images in those HTML files, but not the kind i initially thought there were. Images are indeed referred to by their MD5 hashes (which means my guess was good), but they're referenced through a field in the metadata. The mysterious hash-like things in the HTML comments near the end of the files which contain descriptions still remain a mystery.

montar Apr 2

I found that out in a kind of reverse fashion: i picked a random image (a GIF with a simple schematic of a transformer) and ran grep -R on all the html files. It did indeed find one referencing a transformer, which is good.
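
That reverse lookup can be sketched in Python roughly like this. It assumes the reference is the MD5 of the image file's contents, written as hex somewhere in the text files; the thread does not spell out exactly which string was grepped for, so treat that as a guess. Both function names are made up for this sketch.

```python
import hashlib
import os


def md5_of_file(path):
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def files_mentioning(needle, root):
    """Yield paths under root whose bytes contain needle (ASCII, any case)."""
    needle = needle.lower().encode("ascii")
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if needle in f.read().lower():
                    yield path
```

Usage would be along the lines of `list(files_mentioning(md5_of_file("some.gif"), "html_dir"))`, with both paths being placeholders.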

montar Apr 2

Some (at least one) of those "image" files are flash files, b/c hell yeah

montar Apr 2

I've made some statistics: there are 305 "Macromedia Flash" files, 222 GIFs and 5346 JPEGs; one file is unknown.
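
Statistics like these can be approximated by looking only at the first few bytes of each file. A sketch in Python, checking just the three formats named above; a real identification tool knows far more signatures, and the function names here are made up.

```python
import os
from collections import Counter

# Leading bytes ("magic numbers") of the three formats mentioned above.
SIGNATURES = [
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"FWS", "flash"),  # uncompressed SWF
    (b"CWS", "flash"),  # zlib-compressed SWF
]


def sniff(header):
    """Classify a file by its leading bytes; anything else is 'unknown'."""
    for magic, kind in SIGNATURES:
        if header.startswith(magic):
            return kind
    return "unknown"


def count_types(directory):
    """Count the regular files in directory by sniffed type."""
    counts = Counter()
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                with open(entry.path, "rb") as f:
                    counts[sniff(f.read(8))] += 1
    return counts
```

Anything landing in the "unknown" bucket is worth a hexdump, which is how an odd file like the one here stands out.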

montar Apr 2

I've found that mysterious file and it seems to contain metadata in a format i can't recognize or understand from its hexdump. It seems to contain a bunch of things that look like MD5 hashes. I'd already guessed that, and i'm quite sure that my 12 lines of PHP are the faster way to name the files.

montar Apr 2

This suggests that the other .vol files might have a similar structure. There are three of them, including one with definitions that i have already mostly figured out and one with (i suppose) videos, which, again, is more sensible to auto-rename with my script than to try to make sense of this magic meta file.

montar Apr 2

I'm looking at those files, and there seem to be two independent ways of adding images to the definition text. One is the magic JS Link() function and the other is the <img> html tag, whose src attribute is indecipherable to me. After looking closely at a good example, i think i understand what's in that magic html comment at the bottom of each definition file: it contains the file's ID and hash. The encyclopedia's internal IDs are quite valuable to me because those articles reference each other using them.

montar Apr 2

I've modified my script that copies data from the html files to the sqlite database. Now it saves those IDs from the comments and treats links differently: for links whose type isn't "txt", it outputs the Link() function's parameters as JSON instead of just ignoring them. This should allow me to make part of the image and other references work. I still have to figure out how to make <img> embedding work.

montar Apr 2

I've found (in another mysterious .vol) a bunch of GIFs with rendered mathematical equations. Its mystery file seems to contain the string that one file i checked earlier (the one about the quadratic function) uses in its <img> tag. The bad part is i have no idea how to determine which GIF it is, and i still have no idea how the possibly-metadata file works.

montar Apr 3

This strange metadata file format is finally beginning to make sense. When it's bedtime of course. Its "record" is hella long (0x24 bytes!) and contains the file's address within the .vol file, a string offset relative to the beginning of the first string in the latter part of the file, and a bunch of stuff i have yet to decipher. I will make a C program that parses this shit and properly unpacks those .vol files, which is possibly what i should have done in the first place.
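
Parsing fixed-size records like that could look roughly as follows, sketched in Python rather than C. Only the record size (0x24), the presence of a file address and the string offset relative to the first string come from the description above. The field order, the 32-bit little-endian fields, the NUL-terminated names, the cp1250 name encoding and the records starting at offset 0 are all assumptions made up for this example.

```python
import struct

RECORD_SIZE = 0x24  # record length as described above

# ASSUMPTION: nine little-endian uint32 fields per record; the real
# layout is not documented here and may well differ.
RECORD = struct.Struct("<9I")
assert RECORD.size == RECORD_SIZE


def parse_records(blob, count, strings_start):
    """Return (address, name, unknown_fields) for each of count records.

    blob          -- the whole metadata file as bytes
    count         -- number of records, assumed to start at offset 0
    strings_start -- offset of the first string in the latter part of blob
    """
    out = []
    for i in range(count):
        fields = RECORD.unpack_from(blob, i * RECORD_SIZE)
        # ASSUMPTION: field 0 is the address, field 1 the string offset.
        address, name_off = fields[0], fields[1]
        start = strings_start + name_off
        end = blob.index(b"\0", start)  # assumes NUL-terminated names
        name = blob[start:end].decode("cp1250", errors="replace")
        out.append((address, name, fields[2:]))
    return out
```

Keeping the undeciphered fields around in the result makes it easier to compare them across records later and spot patterns.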

montar Apr 6

I wrote a C script, full of magic, and it worked: it renamed all those files with equations and stuff. That was like two days ago and i failed to report on it. Now i'm gonna make a shrimple website so i can show my friends that i've indeed managed to extract it from all the cursedness. First of all i have to transfer all my definitions from the sqlite database to mysql so i can use it in a normal web setting (not that i particularly like mysql).

montar Apr 6

Also i'm installing phpmyadmin from apt b/c i'm too lazy to get current version from site.

montar Apr 6

Apparently phpmyadmin doesn't support 100MB imports so i have to do cat dump.sql | sudo mysql enc