Alice Averlong🏳️‍⚧️Dec 12, 2023

why do I keep hacking 16bit DOS games? do I hate myself?

32bit programs are SO MUCH EASIER to RE, because when you see an address, you know what it means. 0x12345678 always means 0x12345678!

Alice Averlong🏳️‍⚧️Dec 12, 2023

16bit games are full of MOV AX, [1234h] and it's like, WHAT'S DS AT THIS POINT? WHICH 1234?

Alice Averlong🏳️‍⚧️Dec 12, 2023

there's 65536 possible memory addresses it could be!

Alice Averlong🏳️‍⚧️Dec 12, 2023

not to mention that there's more than one way to address a given part of memory.

in 32bit and 64bit code, if you see 0x12345678, you know that some code that writes to 0x12335662 doesn't change it.

not so in 16bit games. you have plenty of ways to refer to the same address.

Alice Averlong🏳️‍⚧️Dec 12, 2023

This is why 16bit x86 is SO much more annoying than 8-bit computers.
with 8-bit computers, you have 16-bit addresses, because 256 bytes is rarely enough memory. So they work by having some addresses which are longer. simple, right? so instead of an 8bit number, you have a 16bit number.

Alice Averlong🏳️‍⚧️Dec 12, 2023

16bit x86 does this as well. 16bits of ram is only 64kb, and that's just not enough. So you expand it to 24bits or 32bits, for "long addresses", right? same as you use in 8bit computers?

NOPE

Alice Averlong🏳️‍⚧️Dec 12, 2023

segmented addressing, the solution they use, is not as simple as just adding some more bits. a 16bit segment and a 16bit offset.

so that's just a weird way of explaining a 32bit number, right?
NOPE

Alice Averlong🏳️‍⚧️Dec 12, 2023

no, you combine 16bits and 16bits and get... 20 bits.

it's a 20bit address.

Alice Averlong🏳️‍⚧️Dec 12, 2023

so what, they ignore all but the bottom 4 bits of the segment?

NO THAT WOULD MAKE SENSE

Alice Averlong🏳️‍⚧️Dec 12, 2023

instead the full 16bit segment is used, but it's turned into a 20bit address by shifting it 4 bits over and adding in the offset.
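
As a rough sketch of that shift-and-add rule (the function name `linear_address` is made up for illustration, and this ignores any wrapping at the 1 MB boundary):

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: shift the 16-bit segment left 4 bits, then add the 16-bit offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


print(hex(linear_address(0x0000, 0x0000)))  # 0x0
print(hex(linear_address(0x0001, 0x0000)))  # 0x10
print(hex(linear_address(0x1234, 0x5678)))  # 0x179b8
```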

Alice Averlong🏳️‍⚧️Dec 12, 2023

So it's the TOP 4 bits that are important, not the bottom 4.

Okay that's fine, but wait, I said adding. Not "replacing".

Yes, all 16bits are used. So the address 0000:0000 is (linear) 0x0, and 0001:0000 is (linear) 0x10

Alice Averlong🏳️‍⚧️Dec 12, 2023

which also means that 0001:0000 and 0000:0010 are both linear 0x10.

So you can get pointer aliasing even though both pointers HAVE DIFFERENT VALUES
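
To get a feel for how bad the aliasing is, here is a small sketch that lists every segment:offset pair reaching a given linear address. The helper name `aliases` is hypothetical, and it assumes the usual real-mode rule (segment shifted left 4 bits plus offset) with no wraparound past 1 MB:

```python
def aliases(linear: int) -> list[tuple[int, int]]:
    """Every (segment, offset) pair whose real-mode translation equals `linear`."""
    pairs = []
    for segment in range(0x10000):
        offset = linear - (segment << 4)
        if 0 <= offset <= 0xFFFF:  # the offset must fit in 16 bits
            pairs.append((segment, offset))
    return pairs


for seg, off in aliases(0x10):
    print(f"{seg:04X}:{off:04X}")  # prints 0000:0010, then 0001:0000

print(len(aliases(0x12345)))  # 4096
```

Addresses away from the very bottom of memory have up to 4096 different segment:offset spellings, so comparing two far pointers by value tells you nothing until both are normalized to linear form.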

Alice Averlong🏳️‍⚧️Dec 12, 2023

And if that wasn't bad enough, there's also the A20 gate nonsense. Now, the A20 gate was added with the 286, for backwards compatibility with how the 8086/8088 worked, which is that memory wrapped.

Alice Averlong🏳️‍⚧️Dec 12, 2023

so not only are 0001:0000 and 0000:0010 the same address, so is FFFF:0020!
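
A hedged sketch of the wraparound: segment FFFF plus a large enough offset overflows 20 bits, and with address line 20 held low the result lands back at the bottom of memory. The function name `physical_address` and the `a20_enabled` flag are illustrative, not any real API:

```python
def physical_address(segment: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address as seen on the bus, with or without the A20 line masked."""
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)  # can reach 0x10FFEF (21 bits)
    if not a20_enabled:
        linear &= ~(1 << 20)  # A20 forced low: addresses wrap at 1 MB like on an 8086
    return linear


print(hex(physical_address(0xFFFF, 0x0020, a20_enabled=False)))  # 0x10
print(hex(physical_address(0xFFFF, 0x0020, a20_enabled=True)))   # 0x100010
```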

Alice Averlong🏳️‍⚧️Dec 12, 2023

but don't worry, for the 286 they wanted to add more than 1 megabyte of RAM, which is the max you can address with a 20bit address, so they added the ability to disable address wrapping.

on the keyboard controller.

The Penguin of Evil

@foone That's a PC specific hack. The CPU didn't bother. It's in much later processors because the caches moved so the emulation had to move to the CPU to keep PC compatibility madness happy.

On segments: you can have segments in 32bit mode. In fact, Linux used them extensively for thread local storage and the kernel equivalent thereof, as well as user space addressing.

Another funny thing about the 286 is that it has MMU features that can only be replicated on relatively modern 64bit x86 parts!

William D. Jones Dec 12, 2023

@etchedpixels @foone I vaguely remember that the 486 has an #A20 pin, which tells it whether address wraparound is enabled for its internal cache.

I guess at some point, x86 grew features to handle the translation entirely internally ("had to move to the CPU")?

The Penguin of Evil Dec 13, 2023

@cr1901 @foone On the 386SL, for example, it's port 0xEE/EF internally on the CPU, as the pieces got more integrated and the keyboard controller ended up in the SoC (or in many cases became an emulation trap for the USB controller). Quite often, modern systems don't even have an emulated AT-style keyboard controller.