OK, OK, ok, story time.

Way back when (early 90s), when Omni was consulting for McCaw Cellular (or AT&T Wireless, not sure which it was at the time), we were working on NeXTSTEP apps for sales, customer care, and such for cell phones, nationwide. We'd occasionally get crash reports. I don't even remember how those got back to us in the days before automated collection and reporting, but eventually we were able to reproduce the crash.

Back then NeXT was using gcc as the system compiler and it turns out that the `new[]` C++ operator would allocate room for the stuff you asked for, plus an extra word at the front of the block, where it would store the count (and then give you the shifted address). Except at some point that changed because it was silly and that redundant count was removed. Except that *also* `delete[]` still took the pointer given and loaded the word *before* it to get the count (and then did nothing with it). Given enough hours, you'd eventually have `delete[]` look off into a previous, unallocated page and get a stern talking-to from the MMU.
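For anyone who hasn't run into this kind of hidden count word, here is a rough C++ sketch of the layout described above. It is an illustration only, not what gcc actually emitted back then: `alloc_with_cookie`, `read_cookie`, and `free_with_cookie` are made-up names, and `malloc` stands in for whatever the runtime allocator really did.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical helper: allocate room for `count` ints plus one leading
// word that stores the count, then hand back the shifted address.
int *alloc_with_cookie(std::size_t count) {
    void *raw = std::malloc(sizeof(std::size_t) + count * sizeof(int));
    if (raw == nullptr) throw std::bad_alloc();
    std::size_t *cookie = static_cast<std::size_t *>(raw);
    *cookie = count;                             // count lives in front of the data
    return reinterpret_cast<int *>(cookie + 1);  // caller sees only the data
}

// Hypothetical helper: load the word *before* the pointer. This is only
// valid if the block really was allocated with the leading word. If the
// allocator stopped writing it, this load touches memory the block does
// not own, possibly on a preceding, unmapped page.
std::size_t read_cookie(const int *p) {
    assert(p != nullptr);
    return *(reinterpret_cast<const std::size_t *>(p) - 1);
}

// Hypothetical helper: undo the shift and release the whole block.
void free_with_cookie(int *p) {
    if (p == nullptr) return;
    std::free(reinterpret_cast<std::size_t *>(p) - 1);
}
```

The crash in the story corresponds to calling something like `read_cookie` on a pointer that never had the leading word. Most of the time the load lands on mapped memory and nothing happens, which is why it took hours of running to hit.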

Having discovered this, and not having a way to patch the compiler or system libraries, I instead wrote a Perl script to process the assembly output of the compiler, find instances of this, and fix them. I hand-verified that each fix was correct, and for as long as the hack was needed, every compiled file went through the script, until we got new tools that fixed the problem for real.

Duct tape and baling wire, y'all.

@tjw On the day Opteron was supposed to tape out, a colleague discovered a logic bug. After some analysis, we figured out we could fix it by disconnecting a wire from one gate and attaching it to another. But running through our design flow would take days, and the ripple effect of changing connectivity could cause more problems. So I loaded the chip mask into VIM and modified the polygons directly, then we taped it out.

Don’t remember for sure, but I don’t think we told management 🙂

@cmaier @tjw Now that's what I call proper low-level debugging 👍

@cmaier @tjw here is a hw story for you. A chip comes back. It won't come out of reset. They find it was because someone messed up and left in a connection that was only supposed to be there during full-chip testing; it was supposed to be connected to ground rather than to always-on.
The few chips that came back they were able to salvage by zapping the connection to the right location.

I don't have the full details, as I heard this secondhand and I am just a software guy. This was also 8 years ago, so this was still on a newish process, I think 45nm.

@cmaier @tjw “…and this, children, is why we use vim instead of vi.”

Do you remember what the bug would’ve affected?

@AnachronistJohn @tjw nope. I think it was the load/store unit though.
@cmaier Amazing. It would have taken me years to sleep again.
@tjw Nah. We were cowboys. There were only a couple dozen people total working on that chip.
@cmaier cool! What format was the mask data file?
@tomf It was a long time ago, but I believe it would have been .def, since I was just changing the wires and not anything below M1. Probably once I edited it we’d have launched LVS just to be safe, and that would have re-generated the gds. Our whole flow was text files back then, with little binary databases that tools would create on their own as needed.