why do I keep hacking 16bit DOS games? do I hate myself?

32bit programs are SO MUCH EASIER to RE, because when you see an address, you know what it means. 0x12345678 always means 0x12345678!

16bit games are full of MOV AX, [1234h] and it's like, WHAT'S DS AT THIS POINT? WHICH 1234?
there are 65536 possible memory addresses it could be, one for every value of DS!

not to mention that there's more than one way to address a given part of memory.

in 32bit and 64bit code, if you see 0x12345678, you know that some code that writes to 0x12335662 doesn't change it.

not so in 16bit games. you have plenty of ways to refer to the same address.

This is why 16bit x86 is SO much more annoying than 8-bit computers.
with 8-bit computers, you have 16-bit addresses, because 256 bytes is rarely enough memory. So they work by having some addresses which are longer. simple, right? so instead of an 8bit number, you have a 16bit number.

16bit x86 does this as well. 16 bits of address only gets you 64KB, and that's just not enough. So you expand it to 24bits or 32bits, for "long addresses", right? same as you do on 8bit computers?

NOPE

segmented addressing, the solution they use, is not as simple as just adding some more bits. you get a 16bit segment and a 16bit offset.

so that's just a weird way of explaining a 32bit number, right?
NOPE

no, you combine 16bits and 16bits and get... 20 bits.

it's a 20bit address.

so what, they ignore all but the bottom 4 bits of the segment?

NO THAT WOULD MAKE SENSE

instead the full 16bit segment is used, but it's turned into a 20bit address by shifting it 4 bits over and adding in the offset.

So it's the TOP 4 bits of the segment that push the address past 16 bits, not the bottom 4.

Okay that's fine, but wait, I said adding. Not "replacing".

Yes, all 16bits are used. So the address 0000:0000 is (linear) 0x0, and 0001:0000 is (linear) 0x10
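
a minimal Python sketch of that shift-and-add, assuming plain real mode and ignoring any wraparound at 1MB. the function name `linear` is made up for illustration:

```python
def linear(segment: int, offset: int) -> int:
    """Real-mode address math: shift the segment left 4 bits, then ADD the offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both fit in 16 bits")
    # Addition, not bit replacement: the low 12 bits of the shifted segment
    # overlap the offset. No wraparound is applied here.
    return (segment << 4) + offset


print(hex(linear(0x0000, 0x0000)))  # 0x0
print(hex(linear(0x0001, 0x0000)))  # 0x10
```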

which also means that 0001:0000 and 0000:0010 are both linear 0x10.

So you can get pointer aliasing even though both pointers HAVE DIFFERENT VALUES
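
to see how bad the aliasing gets, here's a rough sketch that brute-forces every segment:offset pair landing on one linear address. `aliases` is a hypothetical helper, and it assumes no wraparound past 1MB:

```python
def aliases(linear_addr: int) -> list:
    """Return every (segment, offset) pair whose shift-and-add equals linear_addr."""
    pairs = []
    for segment in range(0x10000):
        offset = linear_addr - (segment << 4)
        if 0 <= offset <= 0xFFFF:  # the offset has to fit in 16 bits
            pairs.append((segment, offset))
    return pairs


print(aliases(0x10))            # [(0, 16), (1, 0)]
print(len(aliases(0x12345)))    # 4096
```

so an address in the middle of memory like 0x12345 has 4096 different spellings, and comparing two far pointers field by field tells you nothing until you convert both to linear.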

And if that wasn't bad enough, there's also the A20 gate nonsense. Now, the A20 gate was added with the 286, for backwards compatibility with how the 8086/8088 worked, which is that memory wrapped.
so not only are 0001:0000 and 0000:0010 the same address, so is FFFF:0020!
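
a small sketch of the wrapping, modeling the gate as "address line 20 forced to zero". `physical` and `a20_enabled` are made-up names, and this only models real-mode addresses, nothing about how the gate is actually switched:

```python
def physical(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address with an optional A20 mask applied."""
    addr = (segment << 4) + offset  # can reach 0x10FFEF, which needs 21 bits
    if not a20_enabled:
        addr &= ~(1 << 20)          # bit 20 forced low, so it wraps like an 8086
    return addr


print(hex(physical(0xFFFF, 0x0020, a20_enabled=False)))  # 0x10
print(hex(physical(0xFFFF, 0x0020, a20_enabled=True)))   # 0x100010
```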

but don't worry, for the 286 they wanted to add more than 1 megabyte of RAM, which is the max you can address with a 20bit address, so they added the ability to disable address wrapping.

on the keyboard controller.

@foone That's a PC specific hack. The CPU didn't bother. It's in much later processors because the caches moved so the emulation had to move to the CPU to keep PC compatibility madness happy.

On segments: you can have segments in 32bit mode. In fact Linux used them extensively for thread local storage and the kernel equivalent thereof, as well as user space addressing.

The other funny thing about the 286 is that it has MMU features that can only be replicated on relatively modern 64bit x86 parts!

@etchedpixels @foone I vaguely remember that the 486 has an A20M# pin, which tells it whether address wraparound is enabled for its internal cache.

I guess at some point, x86 grew features to handle the translation entirely internally ("had to move to the CPU")?

@cr1901 @foone On the 386SL, for example, it's port 0xEE/EF internally on the CPU, as the pieces got more integrated and the keyboard controller ended up in the SoC (or in many cases became an emulation trap for the USB controller). Modern systems quite often don't even have an emulated AT-style keyboard controller.