Okay, I've got a page (128 words) of #PDP8 code that will generate a truncated Soundex hash of any ASCII or 6-bit TEXT string.

Since I have only 12 bits to store it in, and the consonant classes run from 1 to 6, each digit takes 3 bits. I'm using 5 bits to store the initial, so I get two digits plus a final bit that just records whether there was a third digit or not.
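Here is a rough Python model of that 12-bit layout, just to make the bit budget concrete (5 + 3 + 3 + 1 = 12). The field order, with the initial in the top five bits and the flag in bit 0, is my assumption because the post doesn't spell it out. `pack` and `unpack` are hypothetical names.

```python
# Assumed layout, high bit to low (field order is a guess, not from the PAL code):
#   5 bits: initial letter as a 6-bit TEXT code (A = 01 ... Z = 32 octal, so it fits in 5)
#   3 bits: first Soundex digit (0 if absent)
#   3 bits: second Soundex digit (0 if absent)
#   1 bit : set if the word had a third digit
def pack(initial: str, digits: str) -> int:
    """Pack an initial and its Soundex digits into one 12-bit word."""
    code = ord(initial.upper()) - 0o100          # 6-bit TEXT: '@' = 00, 'A' = 01, ...
    if not 1 <= code <= 0o32:
        raise ValueError("initial must be a letter A-Z")
    d = [int(c) for c in digits[:2]] + [0, 0]    # missing digits pack as 0
    word = (code << 7) | (d[0] << 4) | (d[1] << 1)
    if len(digits) > 2:
        word |= 1                                # remember that a third digit existed
    return word


def unpack(word: int) -> tuple[str, str, bool]:
    """Reverse of pack: (initial, two digits, had-a-third-digit flag)."""
    initial = chr(((word >> 7) & 0o37) + 0o100)
    d1 = (word >> 4) & 0o7
    d2 = (word >> 1) & 0o7
    return initial, f"{d1}{d2}", bool(word & 1)
```

For example, `pack("R", "163")` (Soundex `R163`, as in "Robert") gives `0o4435`, and unpacking that returns `("R", "16", True)`. The third digit is lost, but the flag records that it existed.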

I coded this up in awk first to test it, with a giant `switch` statement (only in gawk, alas) mapping letters to digits. But in PAL assembler for the PDP-8, I decided instead to make a table like so:

```
ALPHBT, 2170; /CBA@
2173; /GFED
2277; /KJIH
7554; /ONML
2621; /SRQP
7173; /WVUT
0272; /[ZYX
0000; /_^]\
```

Basically, the characters in this section of the character set are packed into the table in reverse order, one octal digit per character. The top three bits of a character's 5-bit value select which word of the table to load, and the bottom two give the number of 3-bit right shifts to do before masking off the 3 least-significant bits. So the letter D is `04`, which means we load word `1`, shift it `0` times, and pull out the octal digit `3` (which is correct!)
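The lookup can be checked in a few lines of Python. The eight words are copied from `ALPHBT` above. Reading `7` as "vowel, or H/W/Y" and `0` as "not a letter" is my interpretation of the table rather than something stated in the post, and `classify` is a hypothetical name.

```python
# The eight ALPHBT words from the PAL table, as octal literals.
ALPHBT = [0o2170, 0o2173, 0o2277, 0o7554, 0o2621, 0o7173, 0o0272, 0o0000]


def classify(ch: str) -> int:
    """Return the octal digit the table holds for a character in the @..._ block.

    1-6 are Soundex classes; 7 appears for A E H I O U W Y; 0 means not a letter.
    """
    c = ord(ch.upper()) - 0o100          # 6-bit TEXT value: '@' = 00, 'A' = 01, ...
    if not 0 <= c <= 0o37:
        return 0                         # outside the table's range entirely
    word = ALPHBT[c >> 2]                # top three bits pick the table word
    shift = c & 0o3                      # bottom two bits pick the octal digit
    return (word >> (3 * shift)) & 0o7   # shift right 3 per step, keep 3 low bits
```

Running every letter through `classify` reproduces the standard Soundex classes, with `A E H I O U W Y` all coming back as `7`.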

Most of the code is tests on various bit patterns, to see whether we need to keep shifting or whether we care at all (is it even a letter? Is it a vowel? Is it the same digit we saw last time? `H` and `W` are special cases...)
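Those tests roughly correspond to the usual Soundex rules, sketched here in Python with a dictionary instead of the packed table: vowels reset the "last digit" so a repeated class after a vowel is coded again, `H` and `W` don't, and the initial's own class suppresses an immediate repeat. Dropping non-letters up front is an assumption; the PAL code may well stop at them instead. `soundex_digits` is a hypothetical name.

```python
# Standard Soundex consonant classes; letters not listed (vowels, H, W, Y) are uncoded.
CLASSES = {
    **dict.fromkeys("BFPV", 1), **dict.fromkeys("CGJKQSXZ", 2),
    **dict.fromkeys("DT", 3), "L": 4, **dict.fromkeys("MN", 5), "R": 6,
}


def soundex_digits(word: str, max_digits: int = 3) -> tuple[str, str]:
    """Return (initial, up to max_digits Soundex digits), without zero padding.

    Assumption: non-letters are simply dropped before encoding.
    """
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return "", ""
    initial = letters[0]
    last = CLASSES.get(initial, 0)   # the initial's class suppresses a direct repeat
    digits = ""
    for c in letters[1:]:
        if c in "HW":
            continue                 # H and W are transparent: 'last' is kept
        code = CLASSES.get(c, 0)
        if code and code != last:
            digits += str(code)
            if len(digits) == max_digits:
                break
        last = code                  # a vowel sets last = 0, so a repeat after it counts
    return initial, digits
```

This returns `("L", "2")` for both `LOCK` and `LOOK` (`L200` once padded) and `("S", "323")` for both `SOUTHEAST` and `SOUTHWEST`. The 12-bit hash then keeps at most two of these digits.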

The 128 words include all temporary variables and the pretty-printer. I'm still deciding if I like that `1` in the last place, or if I should show it as a `+` or something. I think it's misleading for folks who actually know Soundex.

I'm also relatively confident that there are optimisations I could make on the test logic. So much of it is "store, reload, load a comparator, add, skip on condition, etc etc" that there's bound to be room for a few dirty tricks. If I could, I'd fit a routine to tokenise an entire string of words into an array of hashes, and print the set.

(And yes, I could make that awk version more portable by leaning on regexes, I suppose)

Well, now that I'm done with my postgraduate degree, I've finally had time to come back to this project and fix it up.

I changed the encoding, because Soundex can't distinguish between `SOUTHEAST` and `SOUTHWEST` (both come out as `S323`). So I now use the most significant bit of the 12-bit hash to mean "this is Soundex, not raw text". When it's raw text, you get one or two characters directly in the 12-bit word.
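A minimal sketch of the flagged encoding in Python. It assumes raw mode is used for tokens of one or two letters and that the Soundex form keeps the 5-bit initial plus two digits in the low 11 bits; the post only says the top bit is the flag. Raw mode can leave bit 11 clear for free because 6-bit TEXT codes for `@` through `_` run from `00` to `37` octal, so the first character's top bit is always zero. `raw_word` and `soundex_word` are hypothetical names.

```python
def raw_word(token: str) -> int:
    """Pack a 1- or 2-letter token as 6-bit TEXT, first character in the high half."""
    if not 1 <= len(token) <= 2:
        raise ValueError("raw mode holds one or two characters")
    codes = [ord(c) - 0o100 for c in token.upper()]
    if not all(1 <= c <= 0o37 for c in codes):
        raise ValueError("raw mode here only accepts 6-bit codes 01-37 octal")
    codes.append(0)                      # a lone character gets a 00 second half
    return (codes[0] << 6) | codes[1]    # bit 11 stays clear since codes[0] <= 0o37


def soundex_word(initial: str, digits: str) -> int:
    """Hypothetical flagged layout: bit 11 set, 5-bit initial, then two 3-bit digits."""
    d = [int(c) for c in digits[:2]] + [0, 0]
    return 0o4000 | ((ord(initial.upper()) - 0o100) << 6) | (d[0] << 3) | d[1]
```

Without the flag, `S`, `SE` and `SW` all reduce to Soundex `S000`, which this layout would store as `soundex_word("S", "") == 0o6300`. In raw mode they become `0o2300`, `0o2305` and `0o2327`.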

So this means that `S`, `SE`, `SW`, etc. are all distinguished cheaply. I've analysed the dictionaries of Hibernated, Trinity, Moonmist, and the Cloak of Darkness demo, and the collision rate is (I think) acceptable, given the other factors in the game grammar that can disambiguate verbs and objects.

So, why am I doing this? What good could possibly come from a "soundex without enough digits" scheme?

Well, coding on the PDP-8 (or 12 or LINC) has taught me that the algorithms we think of as being for "small machines" are all relative. People nowadays think absolutely nothing of a process casually allocating more memory than any computer I owned in the 80s or 90s. And so you'll find people touting efficient, low-resource algorithms and bragging that they make do with a scant few kilobytes of memory for their index tables or trees. Such frugality!

Well, the PDP-8 most commonly came with 8k words of core memory. Spending 6k of that on a dictionary of every word the player might type leaves no room for anything else. The LINC only had 2k of core, and spent most of its time swapping to its random-access tapes!

So a couple of years ago I found a paper describing a text compression algorithm whose decompressor I'm confident I can write in a page (128 machine words) of code, and I had the idea of using a modified Soundex encoding to replace the input dictionary.

So I'll have a pass that tokenises all input into a string of these hashes, and the code for an object will include those single-machine-word hashes in a list of "these are the words you can use to refer to me". And grammar objects will help disambiguate verbs that collide (such as `L200`, which maps to both `LOCK` and `LOOK`, but nobody types `LOCK AT DOOR` or `LOOK DOOR WITH KEY`, so that's fine).
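Here's a rough Python sketch of the tokenising pass and the grammar trick. Everything in it is hypothetical: the bit layout (raw 6-bit TEXT for one- or two-letter words, otherwise the flag bit, a 5-bit initial and two digits), the `LOOK_OR_LOCK` verb slot, and a door object that answers to `DOOR` or `GATE`. It only illustrates the idea that objects carry sets of hashes, and that a following `AT` or `WITH` is enough to split `LOOK` from `LOCK`.

```python
import re
from typing import Optional

# Standard Soundex consonant classes; anything not listed (vowels, H, W, Y) is uncoded.
CLASSES = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
           **dict.fromkeys("DT", "3"), "L": "4", **dict.fromkeys("MN", "5"), "R": "6"}


def word_hash(word: str) -> int:
    """Hypothetical 12-bit hash: raw 6-bit TEXT for 1-2 letters, else flag + initial + 2 digits."""
    w = word.upper()
    if len(w) <= 2:
        codes = [ord(c) - 0o100 for c in w] + [0]
        return (codes[0] << 6) | codes[1]
    digits, last = "", CLASSES.get(w[0], "")
    for c in w[1:]:
        if c in "HW":
            continue                       # H and W don't break a run of same-class letters
        code = CLASSES.get(c, "")
        if code and code != last:
            digits += code
        last = code
    d = [int(x) for x in digits[:2]] + [0, 0]
    return 0o4000 | ((ord(w[0]) - 0o100) << 6) | (d[0] << 3) | d[1]


def tokenise(line: str) -> list[int]:
    """Turn a line of player input into a list of one-word hashes."""
    return [word_hash(t) for t in re.findall(r"[A-Za-z]+", line)]


# Hypothetical game data: one shared verb hash, plus the prepositions that settle it.
LOOK_OR_LOCK = word_hash("LOOK")                       # identical to word_hash("LOCK")
AT, WITH = word_hash("AT"), word_hash("WITH")
DOOR_WORDS = {word_hash("DOOR"), word_hash("GATE")}    # "words you can use to refer to me"


def parse_verb(hashes: list[int]) -> Optional[str]:
    """Pick LOOK or LOCK from the rest of the sentence; None if still ambiguous."""
    if not hashes or hashes[0] != LOOK_OR_LOCK:
        return None
    if AT in hashes[1:]:
        return "LOOK"
    if WITH in hashes[1:]:
        return "LOCK"
    return None


def mentions(hashes: list[int], words: set[int]) -> bool:
    """True if any input hash is in an object's vocabulary set."""
    return any(h in words for h in hashes)
```

With this, `LOOK AT THE DOOR` parses as `LOOK` and `LOCK DOOR WITH KEY` as `LOCK`, and both mention the door, even though the verb hashes are identical.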

And one advantage is that this will make the confusions a little more understandable. Yeah, it can be hard for humans to hear the difference between "lock" and "look" in a noisy room, so it can be part of the charm that the game is a little fuzzy and less picky about how you spell words, but also has trouble telling `BLOOD` from `BLADE` without more information.

Incidentally, I would absolutely love to have a PiDP8 to demonstrate all this on, once I get it working. If anyone here has one that they either never got working or no longer use, hit me up. I'm happy to supply my own raspi for it.

I'm finding the challenge of "package this one function in one 128-word page" really fun to meet. It makes the code relocatable (so I could load it as an overlay into any page of memory) and ensures that I'm not stepping on any "globals" in the zero page. I still need to take care with the core frame registers from time to time, but I think I already have a regimen for that anyway.

I had a busy week, so I haven't been able to sit down and bang out any code for the ASCII text compression system, but that's definitely next. I'm basing it on https://doi.org/10.1093/comjnl/24.4.324 (EDIT: the author has put this up at http://www.jackpike.co.uk/36.Text%20compression%20using%20a%204%20bit%20coding%20scheme.pdf), but tuned for my 12-bit words.

One feature I worked out last night while drifting off to sleep: if I make 0000 the "grab the next two nybbles as ASCII" symbol, it will natively handle a string of unpacked ASCII. Unpacked ASCII stores one 8-bit character per 12-bit word, so the top nybble of every word is already 0000, and the only cost is a little computational overhead (peanuts compared to waiting for the teletype ready signal!)
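A Python sketch of why that works. The 15-character table for codes 1 to 15 is made up for illustration (Pike's paper has its own assignments), and `decode` and `nybbles` are hypothetical names. The decoder reads each 12-bit word as three nybbles, high first, treating `0` as an escape for one 8-bit character. Hex is used for the packed test words because nybbles line up with hex digits.

```python
# Hypothetical 4-bit table: codes 1-15 stand for these characters (NOT Pike's actual table).
COMMON = " ETAOINSHRDLUCM"


def nybbles(words):
    """Yield the three 4-bit nybbles of each 12-bit word, most significant first."""
    for w in words:
        yield (w >> 8) & 0xF
        yield (w >> 4) & 0xF
        yield w & 0xF


def decode(words) -> str:
    out = []
    stream = nybbles(words)
    for n in stream:
        if n == 0:
            # Escape: the next two nybbles (possibly in the next word) are one 8-bit character.
            hi, lo = next(stream, None), next(stream, None)
            if hi is None or lo is None:
                break                    # stream ended mid-escape; treat as padding
            out.append(chr((hi << 4) | lo))
        else:
            out.append(COMMON[n - 1])
    return "".join(out)
```

`decode([ord(c) for c in "HELLO"])` returns `"HELLO"`: every unpacked word starts with a zero nybble, so each one is read as escape plus character. Mixed streams work too; `decode([0x320, 0x583])` gives `"TEXT"`, with the escaped `X` spanning a word boundary.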

I'm debating keeping "Conbak" as the name for this. It's more pronounceable than most of the Spells of Quendor, and never appeared in any games (but it does have a bonus reference to 12 in the lore). I was thinking maybe "Constructing Bitty Adventures Kit" as a backronym if I do.

I'm using Vince Slyngstad's pal assembler, as at some point I hope to optimise code for @tastytronic's PDP-12, but my target minimum platform is a PDP-8/i with 8k of core and at least one DECTape drive. No EAE necessary (although it was an *extremely* common option for 12s).

And believe me, I would LOVE to have an 8/e, even without the EAE! The `BSW` instruction is SO USEFUL for keeping code size down. But the 12 doesn't have it, so I'm being cautious.
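To make the code-size point concrete, here is a Python model of pulling the left character out of a word of packed 6-bit TEXT, once with `BSW` and once with the rotates an 8/I or 12 has. The models follow the documented instruction behaviour (`BSW` swaps the two 6-bit halves of AC; `RTR` rotates the 13-bit link-plus-AC right two places), but the function names are mine, and the comments ignore the mask word that `AND` needs in memory.

```python
def bsw(ac: int) -> int:
    """Model of the PDP-8/e BSW: exchange the two 6-bit halves of the 12-bit AC."""
    return ((ac & 0o77) << 6) | ((ac >> 6) & 0o77)


def rtr(link: int, ac: int) -> tuple[int, int]:
    """Model of RTR: rotate the 13-bit (link, AC) pair right by two places."""
    v = (link << 12) | ac
    v = ((v >> 2) | (v << 11)) & 0o17777
    return v >> 12, v & 0o7777


def high_char_with_bsw(word: int) -> int:
    return bsw(word) & 0o77           # BSW; AND mask


def high_char_with_rotates(word: int, link: int = 0) -> int:
    ac = word
    for _ in range(3):                # RTR; RTR; RTR: rotate (link, AC) right 6 places
        link, ac = rtr(link, ac)
    return ac & 0o77                  # AND mask; the old link bit was rotated out of range
```

Both return `0o23` (`S`) for the packed word `0o2305` (`SE`). The rotate version costs three `RTR`s where the 8/e needs one `BSW`. Whatever was in the link ends up in the seventh bit from the right, which the mask throws away.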

I'll have to look up which models had the `MQ` instructions available even without EAE, as that makes a handy two-word stack and gives you a machine-native `OR` instruction even without all the `MUL` circuitry hooked up.
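On machines without the `MQ` instructions, an OR has to be built from the base instruction set. Here is a Python model of one common idiom using only what every PDP-8 has (`CMA`, `AND`, `TAD`): compute A AND (NOT B), then add B. The PAL line in the docstring is my sketch of that idiom from the instruction semantics, not the author's code, and `A` and `B` are placeholder labels.

```python
MASK = 0o7777  # one 12-bit word


def or_via_and_tad(a: int, b: int) -> int:
    """OR built from CMA, AND and TAD.

    PAL sketch (A and B are placeholder memory labels):
        CLA; TAD B; CMA; AND A; TAD B
    """
    not_b = ~b & MASK              # TAD B; CMA  -> AC = NOT B
    masked = a & not_b             # AND A       -> AC = A AND NOT B
    return (masked + b) & MASK     # TAD B       -> no shared bits, so no carries
```

Because `A AND NOT B` and `B` never share a set bit, the final `TAD` can't carry, so the result always equals A OR B. With the MQ instructions, `MQA` does the OR of MQ into AC in a single step.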

My one complaint is that I can't get SIMH to build the pdp8 simulator on my Alpine system, even with gcompat installed. I'll need to bash about and figure out what includes aren't working.

(or, you know...someone could sell me their used PiDP8)
@spacehobo @scruss or just use the software on a Pi? The switches and blinking lights are entirely optional.
@GrantMeStrength @scruss I have simh running on my old ubuntu system. What I want are the switches and lights and later on maybe a silent 700 to demonstrate some of the physicality of using a PDP-8 on a portable exhibit.
@spacehobo @GrantMeStrength there are kits with front panels out there that use the Harris/Intersil CMOS pdp-8 chipset. They're about the same speed as an 8e, but don't support double precision
@scruss @GrantMeStrength I mean, if anyone has one of those pre-assembled and happy to sell, I'd go for that in a heartbeat!
@scruss @GrantMeStrength Although, again, I'm hoping to use OS/8 so it would need to support/emulate TU55/TU56 tape drives.

@spacehobo @GrantMeStrength they're all disk-based via CompactFlash cards. They typically run a late version of OS/8

Getting a prebuilt one might be tricky. I have a working one but I never got round to the front panel

@scruss @GrantMeStrength Yeah, see, I know exactly how well any kit-of-parts project would go for me, so I'm looking for pre-built, used, neglected models.
@scruss @GrantMeStrength But these have no EAE; is that the distinction? That's the only "double-precision" feature set I know of in an 8's CPU.

@spacehobo yes, it seems so. Fortran double precision statements don't work on the SBC6120, while they do on the PiDP8.

While I'd be willing to let mine go, it wouldn't be cheap: the kit (via a friend) was expensive, and it wasn't as easy a build as the instructions make out. Also the shipping from Toronto would be lolno.

Not sure how nicely the PiDP8 supports a real mark-parity DEC-style serial port that'll talk to a terminal properly, either.

@scruss SIMH does a good job, and I can handle serial terminals on Linux
@scruss @GrantMeStrength https://uk.rs-online.com/web/c/switches/toggle-switches-slide-switches/toggle-switches/?selectedNavigation=attributes.Features=Illuminated ← Also I'd probably go down a rabbit hole of using these illuminated RS PRO toggle switches instead of separate switches and lights...