Tube🌿Time Oct 3, 2023

yes, that solves the crashing problem. but data isn't getting transferred correctly, so I've got more work to do.

Tube🌿Time Oct 3, 2023

weirdly enough, it works on the second try!!! something on the host was prematurely turning off DMA. maybe a bug in DIFDIAG.

Tube🌿Time Oct 4, 2023

so the interrupt_detected flag is supposed to be set in the irq14 handler, and it is *supposed* to be set only when DMA is done. but somehow interrupt_detected is set without the IRQ handler ever being called! then the DMA operation is torn down prematurely.

Tube🌿Time Oct 4, 2023

using the logic analyzer, i proved that the irq14 handler never gets called. the only code that *ever* sets the interrupt_detected flag exists in this handler. the flag is declared volatile, so it can't be cached in a register.

Tube🌿Time Oct 4, 2023

I wrote the flag value out to an unused IO port, 0x4F, so I can see it on the logic analyzer. a neat trick!

Tube🌿Time Oct 4, 2023

so i don't know how this flag is getting set. my hack is to preemptively clear the flag right before starting DMA, and so far, it seems to be working.

i think this code was "working" with the real ESDI drive because that one uses burst mode DMA and it finishes up very quickly, before the irq14wait routine can exit early.
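
a minimal sketch of why clearing the flag first helps, written as a Python toy model and not the actual host code. the function name `transfer`, the `state` dict, and the word count are made up for illustration; only `interrupt_detected` and the irq14 handler come from the posts above. the wait loop exits as soon as the flag reads as set, so a stale flag ends the transfer before any data moves.

```python
def transfer(state, total_words, clear_first):
    """Toy model of a DMA transfer that waits on interrupt_detected.

    state       -- dict holding the shared flag (hypothetical stand-in for the
                   volatile variable in the real code)
    total_words -- number of words the DMA is expected to move (must be >= 1)
    clear_first -- apply the hack: clear the flag right before starting DMA
    Returns the number of words moved before the wait loop exited.
    """
    if total_words < 1:
        raise ValueError("total_words must be at least 1")
    if clear_first:
        state["interrupt_detected"] = False

    moved = 0
    # Stand-in for the irq14wait routine: it gives up waiting the moment the
    # flag is set, whether or not the handler was the one that set it.
    while not state["interrupt_detected"]:
        moved += 1
        if moved == total_words:
            # Stand-in for the irq14 handler firing when DMA is really done.
            state["interrupt_detected"] = True
    return moved


stale = {"interrupt_detected": True}   # flag already set from some earlier event
print(transfer(dict(stale), 256, clear_first=False))
print(transfer(dict(stale), 256, clear_first=True))
```

in this model the stale flag makes the first call return after zero words, while the second call (with the preemptive clear) moves all 256. it says nothing about *who* sets the flag early, which is still the open question.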

Tube🌿Time Oct 4, 2023

decided to look at the real drive. and guess what--it's not using burst mode. the POS registers have it turned off by default. it's also slow to read the data from the spinning disk, so IBM must have figured that it wasn't really necessary.

Tube🌿Time Oct 5, 2023

now I'm reading up on accessing SD cards from the Teensy 4.1. looks like SdFat is the library? could it be so easy?

Tube🌿Time Oct 7, 2023

turns out it's easy but I had to reformat the SD card using the official sdcard.org utility. anyway, I've read the first sector from a real disk image!

Tube🌿Time Oct 7, 2023

the drive now gives the POS ID. let's try to boot!

Tube🌿Time Oct 7, 2023

hmm 01048200 is a drive select acknowledgement error.

Tube🌿Time Oct 7, 2023

the BIOS runs faster than the DIFDIAG utility, and so it seems like it is hitting a timing problem that i didn't hit before.

my drive code seems to randomly hang up and not respond correctly.

Tube🌿Time Oct 7, 2023

it's occasionally getting a spurious end-of-interrupt command which is really odd and points to an issue with the mailboxes (again, sigh).

but it's SO DARN CLOSE. it's transferring sectors from the IML region in the disk image.

Tube🌿Time Oct 8, 2023

figured out one problem. the disk boot routines slam the drive with an ATN and the first command word in 5.5us. the Teensy code takes too long to see the ATN and clears the command register full flag, which drops the first word. oops.

Tube🌿Time Oct 8, 2023

so it *almost* boots now. in fact it successfully loads the IML sectors from the hidden partition on the drive, and no longer throws an I999... error code!

Tube🌿Time Oct 8, 2023

my drive doesn't implement this weird feature called pseudo RBAs--it's a way to artificially limit the maximum possible block address, presumably so they can hide the partition data.

i suspect the BIOS checks this, so i'll have to implement it. ugh. that means i need to figure out this incomprehensible diagram.

Tube🌿Time Oct 8, 2023

holy crap it's booting I can't believe it sdfadfsdfsdfsffasdf

Tube🌿Time Oct 8, 2023

well, it's working well enough to run qbasic. right now the drive is read-only.

Tube🌿Time Oct 14, 2023

i think i need to dig into the 01290200 cache error that has been coming up. i'm concerned that an issue with my DBA-ESDI card has caused it, but i'm not sure.

Tube🌿Time Oct 14, 2023

looks like the cache is inside the CPU. i can't find any cache chips on the motherboard.

Tube🌿Time Oct 14, 2023

see? no cache or memory chips. the larger devices are probably semicustom gate array parts that IBM was fond of using. doubt they contain any cache memory.

Tube🌿Time Oct 14, 2023

looks like the error is generated by an NMI that gets tripped when the cache is being set up. could be a number of causes but in general it is an issue with the internal CPU cache.

Tube🌿Time Oct 14, 2023

could also be this test of the DMA controller which is also included in the same set of tests and triggers the same error code, for some reason.

Tube🌿Time Oct 14, 2023

this gives me an idea.

Tube🌿Time Oct 14, 2023

pulling the CMOS battery...

Tube🌿Time Oct 14, 2023

hmm, the error still comes up. so i just tried what i *should have tried* at the start -- the 700 series diagnostic disk.

Tube🌿Time Oct 14, 2023

when the diagnostic detects the cache error, it asks if you have replaced the CPU card. i *lied to it* and said that I had, so when it asked if i wanted to keep the cache disabled, i said "N".

Tube🌿Time Oct 14, 2023

aaaand that fixed it! we're now booting to DOS off my DBA-ESDI disk replacement.

Tube🌿Time Oct 14, 2023

so here's what i think happened:
1. my early version of the FPGA code had a typo that caused the BURST# line to be held low
2. this caused the DMA controller to get stuck and time out during the cache test, presumably a very early CPU test that checks for cache coherency.
3. this error is *sticky* and gets written to some nonvolatile memory (perhaps not CMOS since i couldn't clear it by pulling the battery.)

Tube🌿Time Oct 14, 2023

this is all very good because i know the root cause and it's not something terrible like data bus contention, and it's thankfully not permanent damage.

Tube🌿Time Oct 16, 2023

it boots windows 3.1 now. it was trying to run a weird hdd power saving mode command I hadn't implemented. it also complains about the swap file because the filesystem is read only still.

Tube🌿Time Oct 16, 2023

so about that write issue: it's an off-by-two error somewhere. two bytes being a single 16-bit word, so it's really an off-by-one error.

Tube🌿Time Oct 17, 2023

figured it out and fixed it. i forgot to set the "transfer request" flag to kick off DMA.

in another routine, it sees that this flag is clear and assumes that a word has already been read using DMA, so it reads a crap value and then sets the transfer request flag again to start the next DMA transfer. that "crap value" pushes the valid data forward by one word.
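
the shift is easy to reproduce in a toy model. this is a Python sketch under assumptions, not the Teensy firmware: `receive`, `STALE`, and the single-word latch are hypothetical names, and the handshake is reduced to one flag that alternates between "request pending" and "word waiting".

```python
STALE = 0xDEAD  # hypothetical leftover value sitting in the latch before any DMA


def receive(host_words, count, kick_off):
    """Toy model of receiving `count` words over DMA from the host.

    kick_off=True  -- the transfer request flag is set before the loop starts
    kick_off=False -- the bug: the flag was never set to start the first DMA
    """
    pending = iter(host_words)
    latch = STALE
    transfer_request = kick_off
    received = []

    while len(received) < count:
        if transfer_request:
            # DMA engine services the request: fetch one word, clear the flag.
            latch = next(pending)
            transfer_request = False
        else:
            # A clear flag is taken to mean "a word is already in the latch",
            # so read it and request the next one.
            received.append(latch)
            transfer_request = True
    return received


print([hex(w) for w in receive([0x1111, 0x2222, 0x3333, 0x4444], 4, kick_off=True)])
print([hex(w) for w in receive([0x1111, 0x2222, 0x3333, 0x4444], 4, kick_off=False)])
```

without the kick-off, the first read happens before any DMA has run, so the stale latch value lands in slot 0 and every valid word ends up one position later.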

Tube🌿Time Oct 18, 2023

on to the next issues: randomly the ATN register mailbox flag gets set but the data in it is stale. also, the status interface register will randomly get read from by the host.

I think these are two facets of the same problem: the mailbox flags sometimes respond when you access a register that they are not supposed to be monitoring!

Tube🌿Time Oct 19, 2023

the mystery deepens. according to the logic analyzer, temp_atn_set never goes high. reg_atn_set (for crossing clock domains) is always 000. flag_atn is only set to 1 on this single line of code!

and yet, somehow, it magically flips to a 1.

Tube🌿Time Oct 19, 2023

looking at the generated logic, i see no explanation either. temp_atn_set (aka sd_cmd, my test point) never goes high. no glitches, no nothing. to set the flop, EN must be high and R must be low, and a clock edge must occur.

Tube🌿Time Oct 19, 2023

there's a glitch! that's why I missed it before, it's only 2ns. this is the signal from the MCA bus clock domain, and it's getting picked up in my other clock domain's edge detector.

Tube🌿Time Oct 19, 2023

and i believe this is the cause. this line right here. each signal, la_*, is an output from a flip flop latched by the micro channel bus cmd line. however, this line of code creates some combinational logic--there's a timing hazard here...

Tube🌿Time Oct 19, 2023

the problem? the line (la_addr == REG_ATN) creates a bunch of gates that are slightly slower than the simple AND gates in the previous part of the line.

so la_mca_op=1, ~la_s0_w_l=1, and (la_addr == REG_ATN) *is also a 1 for a very short time!!!* this is because the previous value of la_addr WAS REG_ATN.

what i need to do is take that entire wire and turn it into a latch (a reg) and clock it on cmd.
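
the hazard can be sketched as a toy timing model in Python (not the actual Verilog). the value of `REG_ATN`, the 1 ns sampling step, and the function name are assumptions; the 2 ns comparator lag is picked to match the glitch width seen on the logic analyzer. the fast AND inputs are treated as already settled at t=0, while the address comparator still reports the previous address until its delay has elapsed.

```python
REG_ATN = 0x3  # hypothetical register address, not taken from the real design


def atn_write_trace(prev_addr, new_addr, compare_delay_ns=2, span_ns=6):
    """Return the decoded 'write to REG_ATN' wire, sampled once per ns.

    t=0 is the moment the la_* flip flops update. la_mca_op and ~la_s0_w_l
    are assumed to be 1 immediately; the (la_addr == REG_ATN) comparator is
    modelled as lagging by compare_delay_ns.
    """
    trace = []
    for t in range(span_ns):
        la_mca_op = 1
        write_cycle = 1  # stands in for ~la_s0_w_l
        # The slow comparator still reflects the old address during the lag.
        addr_seen = prev_addr if t < compare_delay_ns else new_addr
        addr_match = int(addr_seen == REG_ATN)
        trace.append(la_mca_op & write_cycle & addr_match)
    return trace


# Previous cycle addressed REG_ATN, current cycle addresses something else.
print(atn_write_trace(prev_addr=REG_ATN, new_addr=0x5))
```

in this model the wire is high for the first two samples and then drops, even though the current cycle never addresses REG_ATN. an edge detector in another clock domain can catch that pulse if its clock edge happens to land inside it.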

Tube🌿Time Oct 19, 2023

so here's the solution: all the signals in the MCA bus domain go to a latch clocked in that domain (the first "always" block).

then *without any combinational logic* the output of that latch goes *directly* to another latch (the second "always" block) located in the main clock domain.

(i have another flip flop in main clock domain just for detecting the edge)
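
a rough Python model of the main-clock-domain half of that arrangement, assuming the MCA-domain latch output is glitch-free by the time it is sampled. `edge_pulses`, `sync`, and `prev` are hypothetical names; the real design is Verilog and this only mimics its cycle-by-cycle behaviour.

```python
def edge_pulses(samples):
    """Model two flops in the main clock domain.

    samples -- value of the MCA-domain latch output at each main clock edge
    sync    -- flop that captures the latch output directly (no logic between)
    prev    -- extra flop used only to detect the rising edge
    Returns one pulse value per main clock cycle.
    """
    sync = 0
    prev = 0
    pulses = []
    for s in samples:
        # Both flops update on the same clock edge from their pre-edge inputs.
        prev, sync = sync, s
        # Rising edge: the captured value is 1 now and was 0 a cycle ago.
        pulses.append(sync & (1 - prev))
    return pulses


print(edge_pulses([0, 1, 1, 1, 0]))
```

a level that stays high for several main clock cycles yields a single one-cycle pulse, which is what the mailbox flag wants. the model does not cover metastability; it only shows the edge detection once the input is a clean registered signal.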