I'm afraid it's Java: there are lots of .jar files in there. I dislike Java.
Apparently Java is used by small HTML files that contain not much more than an <applet> tag, which uses Java to display a JPEG. In that <applet> tag there's a parameter that points to an .ivr file, which contains some magic metadata and a file name that ends with .jpg. This metadata does not look useful to me.
Also, the names of those files seem to be their MD5 hashes. You know, to make it more painful to look at.
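A quick way to test that guess, sketched in Python. This is only an illustration, not what was actually run here: the helper names are made up, and it assumes "their MD5 hashes" means the MD5 digest of each file's own contents.

```python
import hashlib
import re
from pathlib import Path

# 32 hexadecimal digits is what an MD5 digest looks like when printed.
MD5_HEX = re.compile(r"^[0-9a-fA-F]{32}$")


def looks_like_md5(name: str) -> bool:
    """True if the file name, minus its extension, is 32 hex digits."""
    return bool(MD5_HEX.match(Path(name).stem))


def name_matches_content(path: Path) -> bool:
    """True if the file name (minus extension) is the MD5 of the file's bytes."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    return path.stem.lower() == digest
```

If looks_like_md5() is true but name_matches_content() is false, the names are probably hashes of something else (a title, an internal ID) rather than of the file contents.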
There are 144 sets of HTML, .ivr and .jpg files. Those are pictures of well-known places like castles, rustic town squares etc. Also, despite being JPEGs, they don't seem too compressed.
In another directory there's a bunch of HTML files with applet tags pointing to .jar files. I've unpacked one file and found this. I must remind you that it's an encyclopedia or something.
Also, I'm beginning to suspect that the directory "ie6pl" contains the Polish version of the famous Internet Exploder 6.
Looks like CD 1 doesn't contain the actual encyclopedia anyway. I'm going to move on to the next ones.
CD 2 seems to contain far fewer files. It's still being copied over to the archive PC's hard drive and then I'll copy it to my main workstation. Funny how reading data from a CD takes longer than rsync-ing it over my network.

At first glance I see that the ie6pl directory seems to be on this CD too (dunno why) and that there's data3.cab and not much more. It's still copying.

OK, I've got those files. It appears that the CD contains an Adobe Shockwave installer and the ie6pl directory that (presumably) contains IE6. There's also data3.cab, let's unpack it and see.
I had to copy data1.hdr from CD 1 because it looks like I need the headers for those InstallShield "cabinet" files to extract them.
It didn't really work: there were pretty much no files, which is strange. I'll collect all the .cab files from all three CDs in a separate directory and extract them all later.
Not much more to see on CD 2, let's see the last one.
CD 3 contains data4.cab (as expected) and the same stuff as CDs 1 and 2, which is ie6pl and the Shockwave installer.
Oddly enough, there seems to be no difference in size or file count between extracting only the .cab files found on CD 1 and extracting those found on all the CDs.
Now all that's left to do is to try and decipher those .vol files I kept seeing around without having any idea what's in them.
Well, 7zip didn't decompress it.
binwalk seems to be doing something, that's cool.
Filling my smaller disk to its brim is what it was doing.
To clear things up: it's a small 256GiB disk with no compression, for system boot and other tasks that require a fast disk but not lots of storage. My usual working directory is on it, but it looks like I need to move it to the compressed 1TB ZFS drive and maybe tell binwalk something.
This "something" turns out to be the -r flag (delete carved files).
I've found the file containing the actual definitions and am binwalk-ing it, but binwalk has no idea whatsoever about filenames. Either I find out how to get them or I'm going to have to find a way to automagically name them.
All those files seem to have a <script> tag that has an object definition in some ancient form of JS as a way of storing metadata (I've copy-pasted it into node.js and it worked despite the variable name beginning with a $). This is cool because it allows me to write a script that automagically renames those files to their titles and/or copies the entries to an SQL db.
After I figure out how to fix the encoding, of course.
As is commonly known, it's the web browser that knows and understands all the weird character encodings. So I asked my LibreWolf to tell me, and it says it's Windows-1250. I told that to iconv and it sure did convert the sample file I have, which is cool.
The binwalk run that's extracting the definitions is still cooking, of course.
I've written a quick and dirty PHP script that uses DOM and nodejs to get the title from one of those files and then rename the file to said title. Now I think nodejs should actually output JSON with all the data, and PHP should put it all into an SQLite database while renaming those files to some kind of unique ID. But this is for tomorrow because it's late, binwalk is still carving and I've gtg sleep.
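For illustration, here is roughly what the "get the <script> body out of the file" half looks like in Python. It's a sketch, not the PHP script described above: the class and function names are invented, and it stops at returning the raw script text, because evaluating the JS object still needs something like node.

```python
from html.parser import HTMLParser
from typing import List


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_script = False
        self.scripts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # The parser may deliver a script body in several chunks,
        # so append to the current entry instead of replacing it.
        if self._in_script:
            self.scripts[-1] += data


def extract_scripts(html: str) -> List[str]:
    """Return the bodies of all <script> tags, in document order."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts
```

The returned text could then be handed to node (or any JS engine) to turn the object definition into JSON, which is the split of work the plan above describes.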
It is tomorrow now, let's continue. The computer claims there are 122822 files extracted, taking up a total of 568M, which is fine.
Now I'm using (on a copy of course) parallel and iconv to translate all those files from the Windows-1250 encoding to UTF-8, because Windows encodings do not spark joy.
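The same conversion can be sketched in Python, where the Windows-1250 codec is called "cp1250". This is an alternative to the parallel + iconv approach, not a description of it, and the function names are made up.

```python
from pathlib import Path


def recode(data: bytes, source_encoding: str = "cp1250") -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the data contains a byte that is
    undefined in the source encoding, which is a useful sanity check.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: Path, dst: Path, source_encoding: str = "cp1250") -> None:
    """Write a UTF-8 copy of src to dst, leaving src untouched."""
    dst.write_bytes(recode(src.read_bytes(), source_encoding))
```

Writing to a separate destination keeps the originals intact, same idea as working on a copy.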
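Turns out that common Linux utilities such as ls and mv have a limit on how many arguments you can pass them (strictly speaking it's the kernel's limit on the total size of the argument list), and 122822 file names is over it.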
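One way around that limit is to never build one giant argument list in the first place: walk the directory lazily and hand the files over in fixed-size batches. A minimal sketch, with invented helper names and an arbitrary batch size:

```python
import os
from itertools import islice
from typing import Iterable, Iterator, List


def iter_files(directory: str) -> Iterator[str]:
    """Yield paths of regular files in a directory, one at a time."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                yield entry.path


def in_batches(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group any iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch is small enough to pass to an external command, or the files can simply be processed in the loop without spawning anything.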
The program works. It's not fast (because of the calls to node.js, I suppose) but it's getting metadata and text from those files.
Now all I need is to make it put it all into an SQLite db and make dinner. Sadly, dinner will have to be made before code.
After being interrupted multiple times, I've finished the PHP script that copies data from those files to the SQL db. It's not fast (because of node.js and popen()) but it will copy everything eventually (or it won't and will crash instead). It didn't crash on the first couple thousand, so it's gonna be OK I guess.
I didn't have time for this for a while, and the code crashed in the middle of the night, which sucks. Now I've found two possible culprits and eliminated them. I've also added that much-needed printf() that shows which file it's working on right now; it should have been there a long time ago but I just didn't do that for reasons unknown. I'm gonna explore it further when I have some time again.
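For the database half, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (definitions, source_file, title, body) are invented for the example; the real script is PHP and its schema isn't shown here.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str, str]  # (source_file, title, body)


def open_db(path: str) -> sqlite3.Connection:
    """Open the database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS definitions ("
        " source_file TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def store(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Insert a batch of rows inside a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT OR REPLACE INTO definitions (source_file, title, body)"
            " VALUES (?, ?, ?)",
            rows,
        )
```

Keying rows on the source file name with INSERT OR REPLACE means a rerun after a crash overwrites what was already stored instead of duplicating it, and committing per batch rather than per row tends to be much faster in SQLite.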
I've got all of those definitions in a sqlite db! Finally!
Got around to fiddling with it again. Now I'm gonna try to recover images from a file called d-bmp.vol, which was on CD 2. Last time I tried, I did it on the small and fast SSD that I've got my system on, and it somehow managed to fill all the free space on it. Now I'm doing it on my big 1TB disk with ZFS and compression. It's got plenty of free space so it shouldn't fill up.
Well, I'm gonna let binwalk cook and just go to sleep. It's still not even near done and I'm sleepy.
It's got 0x10ed693c files that weigh 210MiB as reported by du.
I pasted the wrong number: that big number is the d-bmp.vol file size. Now it's unpacked and it's got 5888 files and 254MiB (as reported by du).
There seem to be references to images in those HTML files, but not the ones I initially thought. Images are indeed referred to by their MD5 hashes (which means my guess was good), but they're referenced using a field in the metadata. The mysterious hash-like things in the HTML comments near the end of the files that contain descriptions still remain a mystery.
I found out about that in a kind of reverse fashion: I picked a random image (a GIF with a simple schematic of a transformer) and ran grep -R on all the HTML files. It did indeed find one referencing a transformer, which is good.
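The same reverse lookup, sketched in Python for anyone without grep at hand. The function name and the default "*" pattern are assumptions (carved files may have no .html extension), and it compares raw bytes so the file encoding doesn't matter.

```python
from pathlib import Path
from typing import List


def files_mentioning(root: Path, needle: str, pattern: str = "*") -> List[Path]:
    """Return files under root whose raw bytes contain the needle.

    The needle is expected to be plain ASCII, e.g. the hex name of an
    extracted image file.
    """
    needle_bytes = needle.encode("ascii")
    hits = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file() and needle_bytes in path.read_bytes():
            hits.append(path)
    return hits
```

Calling it with the image's file name (minus extension) as the needle should list the definition files that reference that image.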
Some (at least one) of those "image" files are Flash files, b/c hell yeah.
I've made some statistics: there are 305 "Macromedia Flash" files, 222 GIFs and 5346 JPEGs; one file is unknown.
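Counts like these can be reproduced by looking at the first few bytes of each file. The sketch below checks only the three well-known signatures (JPEG starts with FF D8 FF, GIF with "GIF87a" or "GIF89a", Flash with "FWS", "CWS" or "ZWS"); it is far cruder than a real file-type detector, and the function names are made up.

```python
from collections import Counter
from pathlib import Path


def classify(header: bytes) -> str:
    """Guess a file type from its leading bytes."""
    if header[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    if header[:3] in (b"FWS", b"CWS", b"ZWS"):
        return "flash"
    return "unknown"


def count_kinds(directory: Path) -> Counter:
    """Tally the guessed types of all regular files in a directory."""
    counts: Counter = Counter()
    for path in directory.iterdir():
        if path.is_file():
            with path.open("rb") as handle:
                counts[classify(handle.read(6))] += 1
    return counts
```

Anything that lands in "unknown" is worth a look in a hex viewer.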
I've found that mysterious file and it seems to contain metadata in a format I can't recognize or understand from its hexdump. It seems to contain a bunch of things that look like MD5 hashes. I've already guessed that, and I'm quite sure that my 12 lines of PHP are the faster way to name the files.
This suggests that the other .vol files might have a similar structure. There are three of them: one with definitions, which I have already mostly figured out, and one with (I suppose) videos, which again is more sensible to auto-rename with my script than to try to make sense of this magic meta file.
I'm looking at those files, and there seem to be two independent ways of adding images to the definition text. One is by using a magic JS Link() function and the other is by an <img> HTML tag whose src attribute is indecipherable to me. After looking closely at a good example, I think I understand what's in that magic HTML comment at the bottom of each definition file: it contains the definition's ID and hash. The encyclopedia's internal IDs are quite valuable to me because those articles reference each other using them.