
Tube🌿Time Aug 16, 2023

i'll need to figure out what is up with the MADE24 line. could be that the pin doesn't actually do that. the HDD pinout is one that i reverse engineered a while back, so it might be a mistake.

this could also explain the damage to the PC, perhaps the card tried to write to the data bus when it was not supposed to and damaged the output drivers of some other chip.

Tube🌿Time Aug 18, 2023

huh, the "MADE24" line is controlled by bit 7 of register 96. I wonder what that is.

Tube🌿Time Aug 18, 2023

lol, this is the active high CHRESET! i was wondering why that line seemed to be missing.

Tube🌿Time Aug 23, 2023

moving on to the Teensy interface. i had to choose the IO pins carefully so i can make a 16-bit parallel IO port.

Tube🌿Time Aug 23, 2023

got the Teensy interface up and running. i'm using direct IO port access on the Teensy 4.1. take a look at core_pins.h in the Teensy header files. basically you can read from GPIOx_PSR and write to GPIOx_DR.

i also had to add a short delay to create some setup time for the FPGA--the Teensy 4.1 is a hair too fast lol
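
A minimal sketch of the masking arithmetic behind a 16-bit parallel port built on one 32-bit GPIO register. It assumes, purely for illustration, that the 16 data lines map to bits 16..31 of a single port; the thread does not say which port or bits were chosen, and `merge_port16` / `extract_port16` are hypothetical helper names.

```cpp
#include <cstdint>

// Hypothetical layout: the 16 data lines sit on bits 16..31 of one GPIO port.
// The actual port and bit positions used in the project are not stated.
constexpr unsigned kPortShift = 16;
constexpr std::uint32_t kPortMask = 0xFFFFu << kPortShift;

// Compute the value to store in GPIOx_DR so that only the 16 port bits change
// and the other pins on the same GPIO port keep their current output state.
constexpr std::uint32_t merge_port16(std::uint32_t dr, std::uint16_t value) {
    return (dr & ~kPortMask) | (static_cast<std::uint32_t>(value) << kPortShift);
}

// Pull the 16-bit word back out of a GPIOx_PSR readback.
constexpr std::uint16_t extract_port16(std::uint32_t psr) {
    return static_cast<std::uint16_t>((psr & kPortMask) >> kPortShift);
}
```

On the Teensy itself, the write would look roughly like `GPIOx_DR = merge_port16(GPIOx_DR, word);` followed by the short setup delay before strobing the FPGA, and the read like `extract_port16(GPIOx_PSR)`, with `x` being whichever port the pins actually land on.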

Tube🌿Time Sep 3, 2023

bidirectional registers now work! i can write a command from the PC to the Teensy, and i can write a response from the Teensy and read it from the PC. there are also status flags showing when new data is available. it may not seem like much, but this is huge progress.
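
A rough software model of the mailbox idea described above: each direction has a data register plus a "new data available" flag that is raised on write and dropped on read. This is a sketch under assumptions, not the FPGA logic; the struct names and the command/response values are made up for illustration.

```cpp
#include <cstdint>

// One direction of a mailbox: a data register plus a "full" flag.
struct Mailbox {
    std::uint16_t data = 0;
    bool full = false;

    // Writer side: store a word and raise the flag.
    constexpr void write(std::uint16_t value) {
        data = value;
        full = true;
    }

    // Reader side: fetch the word and drop the flag.
    constexpr std::uint16_t read() {
        full = false;
        return data;
    }
};

// Command path (PC -> Teensy) and response path (Teensy -> PC).
struct MailboxPair {
    Mailbox command;
    Mailbox response;
};

// Walk one command/response exchange and check the flags at each step.
constexpr bool mailbox_roundtrip() {
    MailboxPair mb{};
    mb.command.write(0x0612);                      // host posts a command (made-up value)
    if (!mb.command.full) return false;            // device sees "new data available"
    std::uint16_t cmd = mb.command.read();         // device picks the command up
    if (mb.command.full) return false;             // flag drops once it has been read
    mb.response.write(static_cast<std::uint16_t>(cmd ^ 0xFFFF));  // made-up reply
    if (!mb.response.full) return false;           // host sees a pending response
    return mb.response.read() == 0xF9ED && !mb.response.full;
}
```

In real hardware the two sides run on different clocks, so the flags also have to be synchronized between the two interfaces; this model ignores that entirely.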

Tube🌿Time Sep 4, 2023

excellent progress today. I've been able to implement the "Get Diagnostic Status" command. it transfers the command block and handles the returning status block as well as the flags and interrupts. best of all, it works on real hardware using my diagnostic program!

Tube🌿Time Sep 4, 2023

OK why does pin 1 start halfway down the edge of this chip???

my best guess is that the die is rotated to a 45 degree angle. anyway i want to dump the contents so i can analyze the drive firmware.

Tube🌿Time Sep 5, 2023

no 28-pin TSOP socket, oh well

Tube🌿Time Sep 5, 2023

now i'm knee deep in Ghidra listings. this code probably runs the entire hard drive, not just the host interface.

Tube🌿Time Sep 6, 2023

this sort of reverse engineering is very much like solving a challenging puzzle. you push and push until you can deduce something based on what you already know, then you pivot, taking that new knowledge and pushing on that until you learn even more.

Tube🌿Time Sep 6, 2023

so last night I identified the power on self test routines by inspection. it's not too hard to identify a checksum routine or a memory test routine. this helped me fill in the memory map.

also, the POR test function stores the results at a particular memory location, and the codes match up with the POR error codes in the DBA-ESDI spec! the next step is to search the whole ROM for any instructions that read this memory location--this should identify the functions that generate the status block.

Tube🌿Time Sep 8, 2023

drive firmware is turning into a bit of a slog so i switched over to the IBM BIOS. having a spec is nice, but the code will cover a bunch of corner cases.

Tube🌿Time Sep 16, 2023

managed to reverse engineer enough that I was able to read the defect map out of one of the original hard drives. sounds easy but the process uses DMA.

Tube🌿Time Sep 23, 2023

working through a nasty timing hazard with the mailbox flags on the command port. sometimes you write data and the "data available" flag never gets set.

so now i am digging through the logic that yosys generated to see if it even makes sense.

Tube🌿Time Sep 24, 2023

having good test programs is important. here's the status interface register dropping values. the Teensy program is just writing an incrementing number, and the diagnostics program is checking for gaps.
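
The checking side of such a test can be sketched as below. It assumes 16-bit values that wrap around and that the captured values are available as an array; `count_dropped` and the sample data are hypothetical, not taken from the actual diagnostics program.

```cpp
#include <cstddef>
#include <cstdint>

// Count how many values are missing from a stream that should increment by
// one each time. 16-bit wraparound is assumed, so 65535 -> 0 is not a gap.
// Note: a repeated value is indistinguishable from 65535 dropped values here.
constexpr std::size_t count_dropped(const std::uint16_t* samples, std::size_t n) {
    std::size_t dropped = 0;
    for (std::size_t i = 1; i < n; ++i) {
        // Modulo-65536 distance from the previous sample; 1 means nothing was lost.
        std::uint16_t step = static_cast<std::uint16_t>(samples[i] - samples[i - 1]);
        dropped += static_cast<std::uint16_t>(step - 1);
    }
    return dropped;
}

// Made-up capture: wraps cleanly through 65535 -> 0, then loses 2 and 3.
constexpr std::uint16_t kExample[7] = {65533, 65534, 65535, 0, 1, 4, 5};
```

Calling `count_dropped(kExample, 7)` reports the two missing values, while the first five samples on their own report none.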

Tube🌿Time Sep 26, 2023

got that all sorted out. it was a synchronization issue with the flags between the two interfaces.

this is the "seek" command successfully completing! this is a *major* step since it requires 4 working mailboxes and interrupts.

Tube🌿Time Sep 27, 2023

another important step today--i got the data port and data port mailbox flags working. it can also detect 8-bit vs 16-bit transfers. getting very close to working PIO transfers.

Tube🌿Time Sep 28, 2023

nice! I managed to get PIO data transfers working well enough for the buffer test routine to pass.

Tube🌿Time Sep 28, 2023

ok this is fantastic--I've managed to transfer my first actual sector! it's just using PIO and the data is not from a real filesystem, but this is another big step forward!

Tube🌿Time Sep 29, 2023

DMA on Micro Channel is really hard. i'm running a bunch of simulations first, making adjustments to the logic as needed.

so many moving parts.

Tube🌿Time Sep 30, 2023

wow, got four bytes to transfer successfully over DMA! not sure why it got stuck after that.

Tube🌿Time Sep 30, 2023

just ran the same test again and it transferred the whole sector over DMA!!

so at least read transfers are working partially. writes just hang the machine after transferring half a sector. it's probably time for the logic analyzer.

Tube🌿Time Oct 3, 2023

not sure why I always end up in front of a logic analyzer, but here we are.

Tube🌿Time Oct 3, 2023

several issues. the first one: during a host to device write, the card holds the arbitration bus too long. it should release immediately after the second arb/gnt pulse

Tube🌿Time Oct 3, 2023

had a theory and it reproduces in simulation. the transfer request flag isn't getting cleared soon enough. la_dma_selected is what can clear this flag and it is changed on the falling edge of cmd, which is too late to catch the ARB/GNT pulse.

Tube🌿Time Oct 3, 2023

yes, that solves the crashing problem. but data isn't getting transferred correctly, so I've got more work to do.

Tube🌿Time Oct 3, 2023

weirdly enough, it works the second try!!! something on the host was prematurely turning off DMA. maybe a bug in difdiag.

Tube🌿Time Oct 4, 2023

so the interrupt_detected flag is supposed to be set in the irq14 handler, and it is *supposed* to be set only when DMA is done. but somehow interrupt_detected is set without the IRQ handler ever being called! then the DMA operation is torn down prematurely.

Tube🌿Time Oct 4, 2023

using the logic analyzer, i proved that the irq14 handler never gets called. the only code that *ever* sets the interrupt_detected flag exists in this handler. it's declared volatile so it can't be cached in a register.

Tube🌿Time Oct 4, 2023

I wrote the flag value out to an unused IO port, 0x4F, so I can see it on the logic analyzer. a neat trick!

Tube🌿Time Oct 4, 2023

so i don't know how this flag is getting set. my hack is to preemptively clear the flag right before starting DMA, and so far, it seems to be working.

i think this code was "working" with the real ESDI drive because that one uses burst mode DMA and it finishes up very quickly, before the irq14wait routine can exit early.

Tube🌿Time Oct 4, 2023

decided to look at the real drive. and guess what--it's not using burst mode. the POS registers have it turned off by default. it's also slow to read the data from the spinning disk, so IBM must have figured that it wasn't really necessary.

Tube🌿Time Oct 5, 2023

now I'm reading up on accessing SD cards from the Teensy 4.1. looks like SdFat is the library? could it be so easy?

Tube🌿Time Oct 7, 2023

turns out it's easy but I had to reformat the SD card using the official sdcard.org utility. anyway, I've read the first sector from a real disk image!

Tube🌿Time Oct 7, 2023

the drive now gives the POS ID. let's try to boot!

Tube🌿Time Oct 7, 2023

hmm 01048200 is a drive select acknowledgement error.

Tube🌿Time Oct 7, 2023

the BIOS runs faster than the DIFDIAG utility, and so it seems like it is hitting a timing problem that i didn't hit before.

my drive code seems to randomly hang up and not respond correctly.

Tube🌿Time Oct 7, 2023

it's occasionally getting a spurious end-of-interrupt command which is really odd and points to an issue with the mailboxes (again, sigh).

but it's SO DARN CLOSE. it's transferring sectors from the IML region in the disk image.

Tube🌿Time Oct 8, 2023

figured out one problem. the disk boot routines slam the drive with an ATN and the first command word in 5.5us. the Teensy code takes too long to see the ATN and clears the command register full flag, which drops the first word. oops.

so it *almost* boots now. in fact it successfully loads the IML sectors from the hidden partition on the drive, and no longer throws an I999... error code!

Tube🌿Time Oct 8, 2023

my drive doesn't implement this weird feature called pseudo RBAs--it's a way to artificially limit the maximum possible block address, presumably so they can hide the partition data.

i suspect the BIOS checks this, so i'll have to implement it. ugh. that means i need to figure out this incomprehensible diagram.

Tube🌿Time Oct 8, 2023

holy crap it's booting I can't believe it sdfadfsdfsdfsffasdf

Tube🌿Time Oct 8, 2023

well, it's working well enough to run qbasic. right now the drive is read-only.

Tube🌿Time Oct 14, 2023

i think i need to dig into the 01290200 cache error that has been coming up. i'm concerned that an issue with my DBA-ESDI card has caused it, but i'm not sure.

Tube🌿Time Oct 14, 2023

looks like the cache is inside the CPU. i can't find any cache chips on the motherboard.

Tube🌿Time Oct 14, 2023

see? no cache or memory chips. the larger devices are probably semicustom gate array parts that IBM was fond of using. doubt they contain any cache memory.

Tube🌿Time Oct 14, 2023

looks like the error is generated by an NMI that gets tripped when the cache is being set up. could be a number of causes but in general it is an issue with the internal CPU cache.

Tube🌿Time Oct 14, 2023

could also be this test of the DMA controller which is also included in the same set of tests and triggers the same error code, for some reason.

Tube🌿Time Oct 14, 2023

this gives me an idea.

Tube🌿Time Oct 14, 2023

pulling the CMOS battery...

Tube🌿Time Oct 14, 2023

hmm, the error still comes up. so i just tried what i *should have tried* at the start -- the 700 series diagnostic disk.

Tube🌿Time Oct 14, 2023

when the diagnostic detects the cache error, it asks if you have replaced the CPU card. i *lied to it* and said that I had, so when it asked if i wanted to keep the cache disabled, i said "N".

Tube🌿Time Oct 14, 2023

aaaand that fixed it! we're now booting to DOS off my DBA-ESDI disk replacement.

Tube🌿Time Oct 14, 2023

so here's what i think happened:
1. my early version of the FPGA code had a typo that caused the BURST# line to be held low
2. this caused the DMA controller to get stuck and time out during the cache test, presumably a very early CPU test that checks for cache coherency.
3. this error is *sticky* and gets written to some nonvolatile memory (perhaps not CMOS since i couldn't clear it by pulling the battery.)

Tube🌿Time Oct 14, 2023

this is all very good because i know the root cause and it's not something terrible like data bus contention, and it's thankfully not permanent damage.

Tube🌿Time Oct 16, 2023

it boots windows 3.1 now. it was trying to run a weird hdd power saving mode command I hadn't implemented. it also complains about the swap file because the filesystem is read only still.

Tube🌿Time Oct 16, 2023

so about that write issue: it's an off-by-two error somewhere. two bytes being a single 16-bit word, so it's really an off-by-one error.

Tube🌿Time Oct 17, 2023

figured it out and fixed it. i forgot to set the "transfer request" flag to kick off DMA.

in another routine, it sees that this flag is clear and assumes that a word has already been read using DMA, so it reads a crap value and then sets the transfer request flag again to start the next DMA transfer. that "crap value" pushes the valid data forward by one word.
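
The failure mode described above can be reproduced in a toy model. This is only a sketch of the reasoning, with hypothetical names (`DmaModel`, `words_in_place`) and made-up data; it is not the actual Teensy or FPGA code.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint16_t kStale = 0xDEAD;   // leftover latch contents (made up)
constexpr std::uint16_t kSector[4] = {0x1111, 0x2222, 0x3333, 0x4444};

struct DmaModel {
    std::size_t host_pos = 0;
    std::uint16_t latch = kStale;
    bool transfer_request = false;

    // Host side: a pending request is answered with the next word.
    constexpr void service() {
        if (transfer_request && host_pos < 4) {
            latch = kSector[host_pos++];
            transfer_request = false;
        }
    }

    // Device side: a clear flag is taken to mean "a word has arrived",
    // so the latch is read and the next transfer is requested.
    constexpr std::uint16_t take_word() {
        std::uint16_t w = latch;
        transfer_request = true;
        return w;
    }
};

// Receive four words and count how many land at the right index.
constexpr std::size_t words_in_place(bool kick_first) {
    DmaModel dma{};
    if (kick_first) dma.transfer_request = true;   // the initial kick that was missing
    std::size_t good = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        dma.service();
        if (dma.take_word() == kSector[i]) ++good;
    }
    return good;
}
```

Without the initial kick, the first `take_word()` returns the stale latch value and every real word arrives one slot late, so none of them line up; with the kick, all four do.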

Tube🌿Time Oct 18, 2023

on to the next issues: randomly the ATN register mailbox flag gets set but the data in it is stale. also, the status interface register will randomly get read from by the host.

I think these are two facets of the same problem: the mailbox flags sometimes respond when you access a register that they are not supposed to be monitoring!

Tube🌿Time Oct 19, 2023

the mystery deepens. according to the logic analyzer, temp_atn_set never goes high. reg_atn_set (for crossing clock domains) is always 000. flag_atn is only set to 1 on this single line of code!

and yet, somehow, it magically flips to a 1.
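
For readers unfamiliar with the pattern, here is a generic software model of a three-stage synchronizer with rising-edge detection, the usual way to carry a "set" pulse across clock domains. The "000" value suggests reg_atn_set is a three-bit register of this kind, but that is an assumption; the actual HDL is not shown in the thread, and `Sync3` and `count_pulses` are hypothetical names.

```cpp
#include <cstddef>

// Model of a 3-stage shift-register synchronizer. Bit 0 is the newest sample.
struct Sync3 {
    unsigned stages = 0;

    // One destination-clock tick: shift in the asynchronous input, then report
    // a rising edge. Returns true for exactly one tick per 0 -> 1 transition.
    constexpr bool tick(bool async_in) {
        stages = ((stages << 1) | (async_in ? 1u : 0u)) & 0x7u;
        // Only the two older stages are compared; in hardware the newest
        // stage may still be metastable.
        return (stages & 0x6u) == 0x2u;
    }
};

// Feed a sequence of input samples and count the detected rising edges.
constexpr int count_pulses(const bool* in, std::size_t n) {
    Sync3 s{};
    int pulses = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (s.tick(in[i])) ++pulses;
    }
    return pulses;
}

constexpr bool kInput[8] = {false, true, true, true, false, false, true, true};
constexpr bool kIdle[4] = {false, false, false, false};
```

In this model, an input that never goes high keeps the stages at 000 and produces no pulses, which is why a flag that still flips to 1 under those conditions points at something outside the synchronizer.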

Chuck Oct 17, 2023

@tubetime of course it is. 🙂 Thinking it was an off by two error was off by one. 😂

dumb future Oct 16, 2023

@tubetime This was really enjoyable to follow along with! Congrats on the progress

Ted Spence Oct 14, 2023

@tubetime simulating an ESDI disk? Now I need to read up on it. I remember when ESDI vs SCSI was a serious question

Tube🌿Time Oct 14, 2023

@Tedspence not standard ESDI--this is DBA-ESDI.

jgeorge Oct 15, 2023

@tubetime @Tedspence is this “IBM” ESDI? I have a bunch of systems that use those big ass IBM 5.25” ESDI disks and those disks currently have like a 90% failure rate. This could rescue many machines! Please sign me up for your newsletter!

Tube🌿Time Oct 15, 2023

@jgeorge only if those are DBA-ESDI, not regular ESDI.

Eloy. @ EH23 Oct 15, 2023

@tubetime wow a ThinkPad 700C with a working screen! I've seen multiple of them, screen was always very bad.

Tube🌿Time Oct 15, 2023

@eloy it needed recapping

Fritz Adalis Oct 14, 2023

@tubetime
@kenshirriff just posted about these CPUs. They have most of the support circuitry in the cpu itself.

bitsavers.org Oct 14, 2023

@FritzAdalis @tubetime @kenshirriff

I just did some digging around and I've not been able to find any good on-line documentation for the 486SL series. it had a separate data book, ISBN 9781555121921, and there are ancient pointers to a pdf of it on intel's long dead ftp site
https://web.archive.org/web/20180512055622/http://www.rcollins.org/intel.doc/486Manuals.html

Tube🌿Time Oct 15, 2023

@bitsavers @FritzAdalis IBM made some changes too. see https://www.vogons.org/viewtopic.php?t=71435&start=20

IBM 486SLC/2 Mega Thread (was Weird idea: 83mhz overdrive for 386 sx.) - Page 2 \ VOGONS

bitsavers.org Oct 15, 2023

@tubetime @FritzAdalis
http://ps-2.kev009.com/pcpartnerinfo/ctstips/b01a.htm

486 Processors - Questions and Answers

Brian Swetland Oct 14, 2023

@tubetime Inconvenient!

Marsh Ray Oct 9, 2023

@tubetime “It’s working well enough to run QBASIC”

Love it 😂

Ian Hanschen Oct 9, 2023

@tubetime this is incredibly rad

bitsavers.org Oct 8, 2023

@tubetime
woot!

Samantaz Fox Oct 9, 2023

@tubetime Woohoo! Congrats!

bitsavers.org Oct 8, 2023

it doesn't look too bad, you just have the choice to save it back to the config area or not with set max-rba (which it probably never does out of the factory). did you clone a disk with the config area?

Tube🌿Time Oct 8, 2023

@bitsavers there's no easy way to image that. it'd require reverse engineering the drive firmware.

Vlad Vukicevic Oct 14, 2023

@tubetime "No command can not access" so.. uh.. commands can access? 🤯

Tube🌿Time Oct 14, 2023

@vvuk the document is full of typos and errors.

Vlad Vukicevic Oct 15, 2023

@tubetime Your project is an ESDI drive emulator, right? (specific drive type, but ESDI interface) Would it work in another computer that had an ESDI controller and understood the IBM drive?

(I assume so, but then you mentioned microchannel which confused me -- I'm assuming the "creaky old IBM laptop" interface is ESDI?)

Vlad Vukicevic Oct 15, 2023

@tubetime just saw your comments about this being DBA-ESDI. Time to read about the difference...

Vlad Vukicevic Oct 15, 2023

@tubetime Oh. https://x.com/foone/status/1369553555313418241?s=46&t=I51G_PgUJMKTHP86YlGk5g

foone🏳️‍⚧️ on X

so IBM invented ESDI DBA: Direct Bus Attachment. It's ESDI-like but it's actually MICROCHANNEL IN THE HARD DRIVE!
