Mastodawn

I'm afraid it's Java: there are lots of .jar files in there. I dislike Java.

Apparently Java is used by small HTML files that contain little more than an <applet> tag, which uses Java to display a JPEG. Inside <applet> there's a parameter that points to an .ivr file, which contains some magic metadata and a file name ending in .jpg. This metadata does not look useful to me.

montar 4d ago

Also, the names of those files seem to be their MD5 hashes. You know, to make it more painful to look at.
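
A quick way to test that guess is sketched below. It assumes the name (minus extension) would be the MD5 of the file's own bytes, which is only one possible reading; the function names are made up for illustration.

```python
import hashlib
import re
from pathlib import Path

# 32 hex digits is the shape of an MD5 digest written as text.
MD5_HEX = re.compile(r"^[0-9a-fA-F]{32}$")


def looks_like_md5(name: str) -> bool:
    """True if the file name without its extension is 32 hex digits."""
    return bool(MD5_HEX.match(Path(name).stem))


def name_matches_content(path: Path) -> bool:
    """True if the name (without extension) equals the MD5 of the file's bytes.

    This is an assumption to check, not a known property of these CDs.
    """
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    return path.stem.lower() == digest
```

If `looks_like_md5` is true but `name_matches_content` is false, the names are hashes of something else (a title, an ID) or not MD5 at all.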

montar 4d ago

There are 144 sets of html, ivr and jpg files. Those are pictures of well-known places like castles, rustic town squares etc. Also, despite being JPEGs, they don't seem too compressed.

montar 4d ago

In another directory there's a bunch of html files with applet tags pointing to jar files. I've unpacked one file and found this. I must remind you that it's an encyclopedia or something.

montar 4d ago

Also, I'm beginning to suspect that the directory "ie6pl" contains the Polish version of the famous Internet Exploder 6.

montar 4d ago

Looks like CD 1 doesn't contain the actual encyclopedia anyway. I'm going to move on to the next ones.

montar 4d ago

CD 2 seems to contain far fewer files. It's still being copied over to the archive PC's hard drive, and then I'll copy it to my main workstation. Funny how reading data from a CD takes longer than rsync-ing it over my network.

montar 4d ago

At first glance I see that the ie6pl directory seems to be on this CD too (dunno why) and that there's data3.cab and not much more. It's still copying.

Edit: typo

montar 4d ago

Ok, I've got those files. It appears that the CD contains an Adobe Shockwave installer and the ie6pl directory that (presumably) contains IE6. Also there's data3.cab, let's unpack it and see.

montar 4d ago

I had to copy data1.hdr from CD 1, because it looks like I need the header file to extract those InstallShield "cabinet" files.

montar 4d ago

It didn't really work: pretty much no files came out, which is strange. I'll collect all the .cab files from all three CDs in a separate directory and extract them all later.

montar 4d ago

not much more to see on CD 2, let's see the last one.

montar 4d ago

CD 3 contains data4.cab (as expected) and the same stuff as CDs 1 and 2: ie6pl and the Shockwave installer.

montar 4d ago

Oddly enough, there seems to be no difference in size or file count between extracting only the .cab files found on CD 1 and extracting those found on all three CDs.

montar 4d ago

Now all that's left to do is to try and decipher those .vol files I kept seeing around without having any idea what's in them.

montar 4d ago

Well, 7zip didn't decompress it.

montar 4d ago

binwalk seems to be doing something, that's cool.

montar 4d ago

Filling my smaller disk to its brim is what it was doing.

montar 4d ago

To clear things up: it's a small 256GiB disk with no compression, for system boot and other tasks that require a fast disk but not lots of storage. My usual working directory is on it, but it looks like I need to move it to the compressed 1TB ZFS drive and maybe tell binwalk something.

montar 4d ago

This "something" turns out to be the -r flag (delete carved files).

montar 4d ago

I've found the file containing the actual definitions and am binwalk-ing it, but binwalk has no idea whatsoever about filenames. Either I find out how to get them or I'm going to have to find a way to automagically name them.

montar 4d ago

All those files seem to have a <script> tag with an object definition in some ancient form of JS as a way of storing metadata (I've copy-pasted it into node.js and it worked, despite the variable name beginning with a $). This is cool because it allows me to write a script that automagically renames those files to their titles and/or copies the entries to a SQL db.
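
Pulling the <script> contents out of each file is the first step of such a script. A minimal sketch with Python's standard HTML parser follows; the class and function names are made up, and the sample markup in the usage note is hypothetical since the real file layout isn't shown in this thread. Evaluating the JS object itself is left to node.js, as in the post.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the raw text found inside every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data


def extract_scripts(html: str) -> list[str]:
    """Return the stripped text of each <script> element, in document order."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return [s.strip() for s in parser.scripts]
```

For a hypothetical page like `<script>var $meta = {title: "Zamek"};</script>`, `extract_scripts` would hand back that one statement as a string, ready to be passed to node.js.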

montar 4d ago

after i figure out how to fix the encoding of course

montar 4d ago

As is commonly known, it's the web browser that knows and understands all the weird character encodings. So I've asked my LibreWolf to tell me, and it says it's Windows-1250. I've told that to iconv and it sure did convert the sample file I have, which is cool.
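
The same conversion can be sketched in Python, where the Windows-1250 codec is called `cp1250`. This is an illustration of what the iconv call does, not the command used in the thread; the function name is made up.

```python
def cp1250_to_utf8(raw: bytes) -> bytes:
    """Re-encode Windows-1250 bytes as UTF-8.

    Decoding is strict, so bytes the codec does not define raise
    UnicodeDecodeError instead of being silently mangled.
    """
    text = raw.decode("cp1250")
    return text.encode("utf-8")
```

Strict decoding is a deliberate choice here: a file that fails to convert is probably not Windows-1250 after all, which is worth knowing before 122822 files get rewritten.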

montar 4d ago

The binwalk put to decode definitions is still cooking of course.

montar 4d ago

I've written a quick and dirty PHP script that uses DOM and nodejs to get the title from one of those files and then rename the file to said title. Now I think nodejs should actually output JSON with all the data, and PHP should put it all into a SQLite database while renaming those files to some kind of unique ID. But this is for tomorrow because it's late, binwalk is still carving and I've gtg sleep.

montar 4d ago

It is tomorrow now, let's continue. The computer claims there are 122822 files extracted, taking up a total of 568M, which is fine.

montar 4d ago

Now I'm using (on a copy of course) parallel and iconv to translate all those files from the Windows-1250 encoding to UTF-8, because Windows encodings do not spark joy.

montar 4d ago

Turns out that common Linux utilities such as ls and mv can't be handed an arbitrarily long argument list, and 122822 file names is over the limit.
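
The limit is on the total size of the argument list the kernel accepts when starting a process, so it bites when the shell expands a glob over that many names. One way around it is to never build the giant list at all. A sketch of that idea, with made-up function names: walk the directory lazily and hand out the paths in fixed-size batches.

```python
import os
from itertools import islice
from typing import Iterable, Iterator


def iter_files(directory: str) -> Iterator[str]:
    """Yield paths of regular files in a directory, one at a time."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.path


def batched(paths: Iterable[str], size: int) -> Iterator[list[str]]:
    """Group an iterable of paths into lists of at most `size` items."""
    it = iter(paths)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Each batch is small enough to pass to an external command, or the files can simply be handled inside the loop without spawning anything.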

montar 4d ago

The program works. It's not fast (because of the calls to node.js, I suppose) but it's getting metadata and text from those files.

montar 4d ago

Now all I need is to make it put it all into the SQLite db and make dinner. Sadly, dinner will have to be made before code.

montar 4d ago

After being interrupted multiple times, I've finished the PHP script that copies data from those files to the SQL db. It's not fast (because of node.js and popen()) but it will copy everything eventually (or it won't and will crash instead). It didn't crash on the first couple thousand, so it's gonna be ok I guess.
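
The database side of that job can be sketched in a few lines of Python instead of PHP. The table layout (`entries` with `title` and `body`) is purely hypothetical, since the actual schema isn't shown in this thread, and the function names are made up.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open the database and create the (hypothetical) entries table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def insert_entries(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Insert (title, body) pairs in a single transaction."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised part-way through.
    with conn:
        conn.executemany("INSERT INTO entries (title, body) VALUES (?, ?)", rows)
```

Inserting a whole batch per transaction, rather than committing row by row, usually keeps SQLite from becoming the slow part, and a crash mid-batch leaves no half-written batch behind.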

montar 2d ago

I didn't have time for this for a while, and the code crashed in the middle of the night, which sucks. Now I've found two possible culprits and eliminated them. I've also added that much-needed printf() that shows which file it's working on right now; it should have been there a long time ago but I just didn't do that for reasons unknown. I'm gonna explore it further when I have some time again.

montar 1d ago

I've got all of those definitions in a sqlite db! Finally!

montar 23h ago

Got around to fiddling with it again. Now I'm gonna try to recover images from a file called d-bmp.vol, which was on CD 2. Last time I tried, I did it on the small and fast SSD that I've got my system on, and it somehow managed to fill all the free space on it. Now I'm doing it on my big 1TB disk with ZFS and compression; it's got plenty of free space so it shouldn't fill up.

montar 22h ago

Well, I'm gonna let binwalk cook and just go to sleep. It's still not even near done and I'm sleepy.

montar 22h ago

It's got 0x10ed693c files that weigh 210MiB as reported by du.

montar 12h ago

I pasted the wrong number: that big number is the d-bmp.vol file size. Now it's unpacked and it's got 5888 files and 254MiB (as reported by du).
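
For scale, the hex number 0x10ed693c from the earlier post can be converted to something readable. This is just arithmetic on that one value:

```python
# File size of d-bmp.vol as quoted in the thread, in hexadecimal.
size_bytes = int("10ed693c", 16)
size_mib = size_bytes / 2**20  # 1 MiB = 2**20 bytes

print(size_bytes)           # 283994428
print(round(size_mib, 1))   # 270.8
```

So the .vol file is about 270.8 MiB.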

montar 3h ago

There seem to be references to images in those HTML files, but not where I initially thought. Images are indeed referred to using their MD5 hashes (which means my guess was good), but they're referenced through a field in the metadata. The mysterious hash-like things in HTML comments near the end of the files that contain descriptions still remain a mystery.