In 1981, Intel released the iAPX 432, calling this "micro-mainframe" one of the most important advances in computing since the 1950s. But it was a flop, costing Intel $100 million. An unexpected side-effect, though, was the 8086 processor. 1/n
The 432 processor put object-oriented programming and storage allocation in hardware. This ambitious processor was split across two chips: the Instruction Decoding Unit decoded instructions into micro-instructions. The Microinstruction Execution Unit executed them.
I took die photos of the first chip, the 43201. This chonky half-a-processor is twice the size of the 8086 processor and doesn't even execute instructions. It has 3.8× the transistors (110,000 vs 29,000) and has 6× the microcode (64 Kb vs 11 Kb).
Why a separate chip just to decode instructions? The 432's instructions are absurdly complicated. An instruction is from 6 to 321 bits long and can start anywhere in a byte. Decoding instructions needed a complex state machine complete with subroutines.
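To get a feel for what bit-aligned decoding costs, here's a minimal Python sketch of a bit-granular fetch. The field widths and encoding here are illustrative only, not the 432's actual format:

```python
def read_bits(mem: bytes, bit_pos: int, width: int) -> tuple[int, int]:
    """Read `width` bits starting at absolute bit offset `bit_pos`.

    Because instructions aren't byte-aligned, every fetch must track a
    bit-granular pointer and mask/shift across byte boundaries.
    Returns (value, new_bit_pos).
    """
    value = 0
    for i in range(width):
        byte_index, bit_index = divmod(bit_pos + i, 8)
        bit = (mem[byte_index] >> bit_index) & 1  # LSB-first bit order
        value |= bit << i
    return value, bit_pos + width

# Example: pull a hypothetical 6-bit field starting 3 bits into the stream
mem = bytes([0b1011_0110, 0b0000_0001])
opcode, pos = read_bits(mem, 3, 6)  # the field straddles a byte boundary
```

Note that a single field can straddle a byte boundary, which is why the decoder needs its bit instruction pointer and shifting hardware rather than a simple byte fetch.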
This complicated block diagram from a 432 patent shows what's inside the decoder chip. To summarize: instructions enter at the left (ACD) and micro-instructions exit at the right (µI bus). Microprogram ROM is in the center. The Composer Matrix extracts chunks of instructions.
I partially reverse-engineered the die photo to label it with approximate functional blocks. The top half is the microcode ROM and the state-machine PLAs (programmable logic arrays). The bottom half disassembles the instruction stream and shuffles pieces around.
This closeup of the microcode ROM shows the vertical select and output data lines and the zig-zag polysilicon select lines. Bits are stored by putting a transistor at each zig-zag, or not. Changing the focus shows the underlying transistor pattern and thus the microcode bits.
Why so much microcode? The basic operations and addressing modes took 250 micro-instructions; the other 3.7K implemented floating point and the "sophisticated object-oriented functions" of the system. The 432 was one of the first processors to implement the (then-draft) IEEE-754 floating-point standard, still used today.
Binary decoders select rows and columns in the ROM. Each column matches a binary number: 0000, 0001, 0010, etc. The boxes indicate transistors, attached to a 0 or 1 line. The low-bit transistors (red) alternate every column, orange alternate every two columns, etc.
Since instructions aren't aligned with bytes, a 32-bit shifter called the "Composer Matrix" shifts the word to extract each instruction field. Diagonal control lines energize transistors to select an arbitrary shift.
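As a sketch of what those diagonal control lines accomplish, here's a toy barrel shifter in Python (positions shown as distinct integers rather than actual bit values; the real Composer Matrix is 32 bits wide):

```python
def barrel_shift(bits: list[int], shift: int) -> list[int]:
    """One-step rotate: energizing diagonal control line `shift`
    connects input position (i + shift) to output position i through
    a pass transistor, so any shift amount takes a single step
    instead of one bit position per clock."""
    n = len(bits)
    return [bits[(i + shift) % n] for i in range(n)]

# Rotating an 8-position word by 3 in one step
out = barrel_shift([0, 1, 2, 3, 4, 5, 6, 7], 3)
```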
A PLA-based state machine steps through the chunks of the instruction, running microcode routines as needed. The bit instruction pointer keeps track of the location in a byte. (A jump instruction can end up in the middle of a byte.)
An interesting circuit is the charge pump, an analog circuit on this digital chip. It has an oscillator and capacitor to generate a negative voltage. This negative bias on the silicon improves performance. The charge pumps are almost identical to the ones in the 8086 processor.
As the 432 project fell behind schedule, Intel realized they urgently needed something to sell. Intel quickly threw together a stopgap processor to tide them over until the 432 was ready. That stopgap, the 8086 (1978), was a huge success and lives on in the x86 architecture.
The iAPX 432 was finally released in 1981 to great fanfare. But its performance was dismal, 1/4 the speed of the 8086, making the 432 a flop.
The paper that killed the 432 was "A Performance Evaluation of the Intel iAPX 432". I recently realized that one of the paper's co-authors was my former officemate https://twitter.com/Bob_Mayo.
https://archive.org/details/PerformanceEvaluationOfTheIntelAPX432
The big computer architecture debate of the 1980s was RISC vs CISC, pitting Reduced Instruction Set Computers against Complex Instruction Set Computers. RISC processors were simple but fast with lots of registers, moving complexity to software. Instructions were easy to decode.
Built just before RISC, the 432 took CISC to the extreme, putting everything possible into hardware rather than software: objects, garbage collection, etc. Intel called it the Silicon Operating System. With no user-visible registers, instructions were stack and memory-based.
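A register-less, stack-based instruction model looks roughly like this toy Python sketch (operation names invented for illustration; this is not the 432's actual encoding):

```python
class StackMachine:
    """Toy model of a register-less ISA like the 432's: operands
    live on an operand stack in memory, not in user-visible
    registers, so arithmetic instructions name no registers at all."""

    def __init__(self) -> None:
        self.stack: list[int] = []

    def push(self, value: int) -> None:
        # e.g. a "move operand to top of stack" instruction
        self.stack.append(value)

    def add(self) -> None:
        # ADD pops its two operands and pushes the sum
        b, a = self.stack.pop(), self.stack.pop()
        self.stack.append(a + b)

m = StackMachine()
m.push(2)
m.push(3)
m.add()
```

The cost of this design is that every operand reference goes through memory, one reason the 432's performance suffered against register-based designs.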
Minimizing the "semantic gap" between high-level languages and assembly language was a big thing back then. The 432 was designed for the Ada language with instructions to perform high-level operations. The Ada compiler was $30,000; we're spoiled now by open-source compilers.
What if the 432 had won? Computing would be very different. Many security problems wouldn't exist. You can't have a buffer overflow because every data structure is a separate object with memory segment size enforced in hardware. You can't smash the stack or make bad pointers.
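The idea can be sketched as a hypothetical Python model in which every access goes through a length-checked descriptor, the check the 432 performed in hardware (class and names invented for illustration):

```python
class SegmentFault(Exception):
    """Stands in for the hardware fault the 432 raised on a bad access."""

class HardwareObject:
    """Toy model of a 432-style object: each data structure is its own
    segment, and the 'hardware' checks every offset against the segment
    length, so an out-of-bounds write faults instead of silently
    overflowing into a neighboring structure."""

    def __init__(self, length: int) -> None:
        self._length = length
        self._data = [0] * length

    def write(self, offset: int, value: int) -> None:
        if not 0 <= offset < self._length:  # the check the 432 did in hardware
            raise SegmentFault(f"offset {offset} outside segment of length {self._length}")
        self._data[offset] = value

buf = HardwareObject(16)
buf.write(15, 0xFF)      # last valid offset: fine
try:
    buf.write(16, 0xFF)  # classic off-by-one overflow attempt
except SegmentFault:
    pass                 # trapped, not silently corrupting memory
```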
The 432 was designed around fault-tolerant multiprocessing. One chip could validate another and fail over if necessary. Computers would be much more reliable if the 432 had won.
There aren't many die photos of the 432 chipset, but I made this summary from various sources. The 43201 and 43202 form the "General Data Processor". The 43203 was an attached I/O co-processor. The Bus Interface and Memory Control were for fault-tolerant multiprocessor systems.
Taking the Instruction Decoding Unit die photos was a bit tricky because the die is encased in a paperweight. Thanks to moylecroft for loaning me the paperweight.
Chips-on-board photo by @brouhaha (CC BY-SA 2.0) https://commons.wikimedia.org/wiki/File:Intel_SBC_432_100_board,_component_side_(15162604484).jpg
Die photos of 43204/43205 from Intel/CHM.
@kenshirriff Knowing the ebb and flow of generic vs specific computing hardware, I'm pretty sure the model would have been dropped into a more generic, largely software driven system eventually and we'd have all the security problems in any case.. =)

@kenshirriff In 1985 the ARM1 was revolutionary in having a barrel-shifter "for free" in the instruction set, which cost a huge amount of die area but was impressively flexible compared to the 1-bit-per-clock shifts of 68k and x86.

Here they have the same structure and area cost, but its utility is almost completely invisible to the programmer! The 432 was wild...

@kenshirriff The 43201 microcode can be dumped electrically without decap, and I've done that for a release 1 C43201. Unfortunately the 43201 and 43202 contain many PLAs which (AFAIK) can't be dumped other than by decap and photomicrograph.
@brouhaha Have you decoded any of the 43201 microcode?
@kenshirriff Only a tiny amount. I'll try to find time to put what I've got into a GitHub repo.
Most of the microcode implements the high-level instructions, which are quite complex. I haven't been able to figure out any of those yet.
@kenshirriff Here's a disassembler for 43201 microcode:
https://github.com/brouhaha/iapx432-gdp-uc-dis
There are numerous issues with using it:
* the entry points are uncertain, because they come from PLAs
* not all 432 instructions execute microcode
* the execution of some instructions involves other sequencers, especially floating point operations in the 43202
* because of changes in the various architecture releases, some microinstructions may differ
@kenshirriff If the 43203 Interface Processor has any microcode ROM dump mechanism, or other test modes, they are not documented, and probably would be triggered by a supervoltage on a pin.
@kenshirriff ah this answers my question 😎
@kenshirriff Every machine needs a "herring fault" line.
@kenshirriff I assume there's some sort of incredible advantage to having bit aligned instructions that offsets all the additional complexity on top of byte aligned ones? You save some bits of storage, sure, but...
@Lalufu Bit-alignment was supposed to improve instruction density so you could get more instructions with fewer memory accesses. But it turned out that the 432's instruction density wasn't as good as regular processors in most cases, so it was a bad idea.

@kenshirriff

I have a few of these I’ve been meaning to decap. Learning that the instruction size ‘varies from 6 to 321’ bits long is making me reconsider. It feels like breaking the seal on some sort of cursed tomb.

Yeah, it's *probably* not haunted. But why take the risk? 🤔

@kenshirriff I can guess microcode top-right, what’s top-left ? 2 large blocks of cache ?
@sxpert My labeled die photo is later in the thread. The top left block is microcode. The top right blocks are PLAs (Programmable Logic Arrays) for the decoding state machine.
https://oldbytes.space/@kenshirriff/110231913532961112
@sxpert @kenshirriff *Cache*?! In the 1970s?! I don't know what it is but I'll place money on it not being cache.

@kenshirriff

The Rational R1000 Ada computer we have in Datamuseum.dk is the same basic idea, but it worked out: IBM bought Rational for a couple of billion dollars in the 1990s.

Some really amazing software probably made all the difference.

@kenshirriff A late friend of mine was on the 432 team at Intel. He never put it on his resume. It's a shame that it had such a stigma; I think Intel struggles with anything that's not x86, like the 432 and Itanium.
@dogzilla @kenshirriff Didn't i860 do well for a while?
@lopta @kenshirriff They had some design wins at a couple of the UNIX server/workstation vendors; not huge, but a lot better than the 432. As I recall, it was a pretty good design, just not enough to outcompete the x86. Sun even had x86 workstations for a bit.
@dogzilla @kenshirriff More than once, if you count the Sun 386i.
@kenshirriff This might be the first time I've actually seen a photo of iAPX 432!
@kenshirriff This thing sounds absolutely bonkers.
@kenshirriff I saw this brochure (I think?) at the time. It was heavily Ada-oriented, with built-in blocking message-passing operations. Crazy. I *think* we may have been evaluating it for a project but, as you say, it was a flop.
@kenshirriff I have a delightful collection of old microchips...but an iAPX 432 continues to elude me. Are there any of these out in the wild?
@kenshirriff I guess if it led to 8086 it was an important advance in computing even if it itself was a flop
@kenshirriff interesting, didn't know this system before, thanks for the explanations