Out of the oven, BGAs all look good under side view optical microscopy (best I can do without X-ray).
Two 0402s needed touchup with an iron due to poor wetting; they were 33 ohm resistors from a reel I've had since 2014 so they might be starting to oxidize too much for my ROL0 flux to handle.
Tomorrow I'll populate the through hole connectors then start the bringup process.
All soldered up and ready to start bringup!
Later today after my little lab assistant goes to bed, that is. She's still a year or two from being ready to take readings off test points for me... Being able to speak in full sentences is probably a prerequisite.
These are just quick phone pics, I'll do some beauty shots with the A7R and macro lens later.
Fit testing the thermal solution. Looks mostly good, but not permanently mounting it yet. If I find problems early on it'll be easier to rework without a heatsink in the way.
I provisioned for two fans but we'll start with one and see how it goes.
The QDR-II+ heatsink is somewhat sheltered by the RS232 jack and probably won't see much airflow, but heatsinking it was more of a "just in case" vs the FPGA and main PHY which will definitely need it. So I think I'll be OK.
Kid is asleep so it's back to the lab for me.
After a bit of cable management we've got the first signs of life out of the board.
Applied 12V power to the input and it's drawing 3.6 mA. This is normal and expected, as all power rails are supposed to be off at this point other than the raw input and the 3.3V standby rail driven by an LDO to power the supervisor.
Next step is to put some code on the supervisor and start bringing up more power rails.
Spent a little while updating my STM32 peripheral library for the L031 (this was my first design using it) but I now have the PLL active and a blinky running at 16 MHz from flash.
Now to get a serial console up so I can get some more debug output besides a single LED...
Ok, UART is alive. Next step is to bring up a timer, then I'll have enough stuff working on the supervisor that I can begin actual power rail testing.
Can you tell I spend a lot of time in IDA? :P
Timer and logging framework are up. Ready to actually move forward with bringup.
So far the only rails that are active are 12V0_RAW (unregulated 12V prior to the main load switch) and 3V3_SB (3.3V standby for the supervisor), which is *very* in spec - averaging 3.30027V.
Next rail is 12V0, the core 12V power feed for all of the other DC-DC converters. This is driven by a load switch which limits slew rate so that I don't pull too much inrush current.
This is the first rail that's under software control from the supervisor.
It came up just fine and measures 11.9979V. Total power draw from the input climbed to 17 mA which doesn't sound unreasonable for five big DC-DC bricks.
Next is 1V0, the core power supply for the FPGA, QSGMII PHY, and SGMII PHYs. This is a big one with a lot of load on it, so lots of room for something to go wrong.
It came up perfectly as well, sitting at about 1.00015V. Overall input power draw is around 100 mA at 12V so an extra 83 mA. Assuming 90% conversion efficiency this means the board is pulling about 896 mA on 1V0 at idle!
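The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the 90% efficiency is the same guess used in the post, and `load_current_ma` is a made-up helper name.

```python
def load_current_ma(input_delta_ma, vin=12.0, vout=1.0, efficiency=0.90):
    """Estimate rail load current from the rise in input current.

    efficiency is an assumed converter efficiency, not a measured value.
    """
    input_power_mw = input_delta_ma * vin          # extra power drawn at the input
    output_power_mw = input_power_mw * efficiency  # what reaches the rail
    return output_power_mw / vout                  # mA on the output rail


# Input current went from 17 mA to roughly 100 mA when 1V0 came up.
delta_ma = 100 - 17
print(round(load_current_ma(delta_ma)))  # 896
```

At idle this is mostly static current on the 1.0V loads; the real figure depends on what the converter efficiency actually is at such a light load.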
In the interests of limiting potential damage to the expensive prototype if there's a short, the supervisor is pretty aggressive with timing and rail monitoring. If it commands a rail to come up and it fails to give PGOOD after 2 ms, it will automatically panic and shut down all power, then print a diagnostic message to the UART.
Unfortunately, the streak has come to an end with 1V2 which failed to come up within the (admittedly aggressive) 2 ms timeout. The automatic shutdown did its job and I don't think anything fried.
Next step: toss some probes down and see what's going on with that rail.
Looks like the 2ms timeout might just be too aggressive. It seems like the rail (blue) is coming up just fine then the MCU gets antsy and shuts it down before it's come up all the way.
But hey, it was a good test of my protections!
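The failure mode is easy to model in a few lines. This is a toy simulation, not the supervisor firmware: the ramp times are invented, and turning rails off in reverse order is an assumption about what "shut down all power" looks like.

```python
def bring_up(rails, timeout_ms):
    """Enable rails in order; panic if any rail's PGOOD takes too long.

    rails: list of (name, ms_until_pgood) with hypothetical ramp times.
    Returns (ok, log).
    """
    log = []
    enabled = []
    for name, pgood_after_ms in rails:
        enabled.append(name)
        log.append(f"enable {name}")
        if pgood_after_ms > timeout_ms:
            # No PGOOD in time: kill everything that is on (reverse order assumed)
            for rail in reversed(enabled):
                log.append(f"disable {rail}")
            log.append(f"PANIC: {name} no PGOOD after {timeout_ms} ms")
            return False, log
        log.append(f"{name} PGOOD")
    return True, log


# Made-up ramp times: 1V2 is healthy but slower than the others.
rails = [("12V0", 1.0), ("1V0", 1.2), ("1V2", 3.0), ("1V8", 1.5)]

ok_short, log_short = bring_up(rails, timeout_ms=2.0)
ok_long, log_long = bring_up(rails, timeout_ms=10.0)
print(ok_short, log_short[-1])
print(ok_long)
```

With the short timeout the slow rail is indistinguishable from a shorted one, which is exactly what the scope capture suggests happened here.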
1V8 is next. This is the core rail for the QDR-II+ SRAM and also runs (through a load switch which is currently off) most of the single ended digital I/Os on the board.
It came up fine, measures 1.792V, and the board is pulling 135 mA from the 12V input.
Still performing nominally but I need to be up early-ish tomorrow to do family weekend things so this is probably as far as I'm going to get.
Tomorrow I need to bring up the 1V8_IO, 2V5, 3V3, and Vref/Vtt rails and verify that all of the analog rails filtered off the core ones have correct voltages.
Then I can hook up to the JTAG on the main MCU and the FPGA, load some blinkies, and begin the fun part of the bringup process!
But first, time to open a support case with STMicro for the six datasheet errata I found while bringing up the supervisor firmware.
Just *once* I want to do a design with a new digital chip of nontrivial complexity and not have to do this. Plz?
Time to bring up another rail. 1V8_IO is slightly lower than 1V8 (1.7886V) due to voltage drop across the load switch, but this is well within acceptable limits.
Pulling 188.9 mA (2.3W) on the 12V input.
Tried to bring up Vref / Vtt for the QDR-II+ but I'm seeing 1.8V instead of 900 mV which isn't good.
This is <= VCCIO so I don't think I damaged any of the input buffers on the RAM. (The FPGA is definitely fine since these pins aren't even configured as Vref inputs yet and the bank is powered by 1.8V).
But I either have a PCB assembly problem or something wrong in the schematic. Time to do some digging...
Not great, seeing 1.8V on Vtt even with the Vtt regulator disabled (but with 1V8_IO on).
With 1V8_IO disabled but 1V8 on, I'm also seeing 1.8V on Vtt. But Vref isn't showing much of anything in that state.
Yep, LP2996 datasheet says that AVIN is supposed to come up first.
This is a bit of a conundrum because the FPGA has VCCAUX driven by 1V8 and VCCO driven by 3V3.
And it wants VCCAUX to come up before VCCO.
But we're allowed to do the opposite (VCCO > VCCAUX + 2.625V) for up to Tvvco2vccaux (300-800ms depending on temperature) per power cycle, with a total of 240K power cycles. This might lead to some glitching on 3.3V GPIOs on the FPGA but I think that will be OK in this use case.
Yay for fixing hardware problems in software! I'll just bring up 3.3V before 1.8 and we should be OK.
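The two competing requirements can be written down as ordering constraints and checked against a candidate sequence. A rough sketch only: it assumes AVIN sits on 3V3 (which the fix implies), treats the FPGA preference as "soft" because the datasheet tolerates the reverse order for a limited time, and the rail orders are illustrative rather than the real firmware sequence.

```python
# (earlier, later, hard): `earlier` should be up before `later`.
CONSTRAINTS = [
    ("3V3", "1V8", True),   # LP2996: AVIN first (assumed AVIN on 3V3)
    ("1V8", "3V3", False),  # FPGA: VCCAUX before VCCO preferred, reverse tolerated briefly
]


def check_order(order, constraints=CONSTRAINTS):
    """Return (hard_violations, soft_violations) for a power-up order."""
    pos = {rail: i for i, rail in enumerate(order)}
    hard, soft = [], []
    for earlier, later, is_hard in constraints:
        if pos[earlier] > pos[later]:
            (hard if is_hard else soft).append((earlier, later))
    return hard, soft


old_order = ["12V0", "1V0", "1V2", "1V8", "3V3", "2V5"]  # illustrative
new_order = ["12V0", "1V0", "1V2", "3V3", "1V8", "2V5"]  # illustrative

print(check_order(old_order))
print(check_order(new_order))
```

The two constraints contradict each other, so no order satisfies both; the fix is to pick the order that only breaks the soft one and keep the 3V3-to-1V8 gap well inside the allowed window.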
For LATENTRED I'll switch AVIN to run on 3V3_SB at which point everything should be OK.
With this sequencing fix, 3V3 comes up fine (slightly low, 3.2804V, but that's acceptable) and the board is now drawing 207 mA / 2.5W at the input.
And Vtt is now showing zero volts with the regulator disabled, which is what we expect.
With the regulator enabled (and 1V8_IO enabled) we show 900.39 mV on Vtt and 897.33 mV on Vref, while drawing 227 mA (2.7W) at the input.
This is a bit more of a Vref-Vtt delta than I'd like but it shouldn't be enough to cause problems.
Final power rail is 2V5, which runs a lot of analog stuff in the PHY.
This came up fine as well, although also a bit low: 2.4834V.
Now pulling 293 mA (3.5W) at the input.
This is all of the core power rails done. Now I just have to add a few lines of code to release the FPGA and MCU resets and I'll be ready to start bringup of them.
This was a close shave. Almost couldn't fit both JTAG cables next to each other.
I verified non-interference of the board side male connector but forgot the female IDC connector overhung on the sides.
Bringup is going pretty well I think.
Maybe could use a bit more kapton tape?
Gradually bringing up firmware on the main MCU. UART, uptime timer, and config variable database are running, about to work on the link to the FPGA.
But first I need to do a bit of independent testing on the FPGA.
Just loaded a test bitstream on the FPGA and verified the LEDs all work. And the supervisor is able to see when the FPGA is up.
Next step, I think, will be getting the MCU and FPGA to talk to each other.
Got a stripped down version of the base FPGA bitstream running.
It's super nice having all of the data from different instrumentation all coming to one place in ngscopeclient so I can have a single dashboard to look at everything.
After fixing a few PEBKAC issues, MCU and FPGA are talking over quad SPI.
But the data coming back is shifted by a nibble or two from what I expect. Not yet sure if timing or logic issue.
Should have put test points on the QSPI bus but silly me thought that since it worked last time, I'd be fine with PHY layer stuff and could just use an ILA on the FPGA...
And they're talking properly! That's it for tonight, I have to be awake in five hours...
I'll probably work on thermal stuff after work, since that affects the health of the rest of the board. The tachometer output of the fan goes to the FPGA (for... reasons) so I need to implement a speed monitoring block and make it output RPM values over QSPI to the MCU.
Then I need to add a PWM generator on the MCU, and bring up an I2C bus to poll the four temperature sensors around the PCB.
Also I found a new design oversight.
I have monitors for the supervisor on every regulator PGOOD pin so I can detect and shut down if a rail starts sagging due to overcurrent etc.
But I don't have an ADC pin on the 12V input so I can't detect a failure of input power and sequence rails off properly. All I can do is wait until one rail trips out of regulation then panic shutdown the rest (without proper sequencing delays since this is indistinguishable from a short).
I2C4 isn't happy. Trying to read the MAC address EEPROM and getting hung up sending an I2C start bit. The register is supposed to be self cleared in hardware and I'm not seeing it ever clear.
So either there's a peripheral setup issue (nothing jumps out at me in a quick register dump) or something is wrong in hardware (SDA or SCL stuck/open).
Unfortunately this bus is on internal and back side routing exclusively (again, should have put a top side test point on... Derp). So I'm gonna have to rip off some tape and invert the board when I get home from work and see what's really going on.
Started a google doc with a live "things to do better next time" list. So far all are minor annoyances or things I can work around without having to bodge the board. (Anyone have a self hosted, lightweight suggestion for this kind of thing? Etherpad or something?)
https://docs.google.com/document/d/10j4HWuMBLfLvX5Notvezs26lcIxuNnWbeJlv_JciUEA/edit?usp=drivesdk
The I2C4 issue smells like a soldering issue so far, but I'll know more when I get home and land probes on the bus.
My main bench scope is out for service still so I'll need to use the 16 GHz monster to troubleshoot my I2C. Miiiiiight be slight overkill...
(I could also use the PicoScope but it's on the other side of the bench, not sure if probes will reach all the way over here)
* LP2996 needs to be powered by 3v3_SB so AVIN is up before PVIN
* Provide 2 way comms bus (i2c?) from super to main mcu for querying rail status and requesting warm reboots/shutdowns
* Move supervisor to stm32l031 qfn48 package (need to buy some) to get more IO capacity
* Hook FPGA done pin to main MCU...
Back from work and debugging the I2C issues.
I2C1 (temp sensors) is giving NAKs to any bus access while I2C4 (mac addr eeprom) hangs trying to send a start bit.
Probing I2C1 at the pins of the temp sensors shows SDA stuck at 0 while SCL is floating high as expected. Wonder if I have a bad solder connection on the pullups?
Time to pull some tape and cables off the board and get it back under the microscope.
OK, that explains everything.
Misread the alt function table and had PB6-PB9 set to AF4.
Turns out that while AF4 is I2C4 on some other pins, on PB8/PB9 it's... I2C1.
So I had two sets of pins muxed to the same peripheral and Bad Things(tm) happened, including traffic going out the wrong pins (gee, I wonder why it never got acked...)
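This class of mistake is cheap to catch with a script run over the pin configuration. A minimal sketch follows. The `AF_TABLE` entries are illustrative and only encode what the post implies (AF4 on all four pins lands on I2C1); a real check would load the table from the datasheet or vendor pin database.

```python
from collections import defaultdict

# Illustrative (pin, AF number) -> signal entries, NOT copied from a datasheet.
AF_TABLE = {
    ("PB6", 4): "I2C1_SCL",
    ("PB7", 4): "I2C1_SDA",
    ("PB8", 4): "I2C1_SCL",
    ("PB9", 4): "I2C1_SDA",
}


def find_mux_conflicts(pin_config, af_table=AF_TABLE):
    """Return {signal: [pins]} for every signal routed to more than one pin."""
    users = defaultdict(list)
    for pin, af in pin_config.items():
        signal = af_table.get((pin, af))
        if signal is not None:
            users[signal].append(pin)
    return {sig: pins for sig, pins in users.items() if len(pins) > 1}


# The buggy configuration: all four pins set to AF4.
buggy = {"PB6": 4, "PB7": 4, "PB8": 4, "PB9": 4}
print(find_mux_conflicts(buggy))
```

Running something like this at build time, or as an assert during GPIO init, would flag the duplicate before any probes come out.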
Yep, this looks more sane.
The FPGA -> MCU QSPI link probably needs some timing tweaks still; it works at 25.6 MHz but when I try to bump it up to 32 or 42.6 MHz I start seeing results shifted by a nibble.
Will troubleshoot that later, I don't need more than 100 Mbps of MCU-FPGA throughput now (if ever).
Next step will be building the fan tachometer in the FPGA, I think.
Tachometer core on the FPGA builds OK but is giving values that are way off the ~5k RPM I measured for the fan with a scope.
Not yet sure why. The tach block integrates N (currently 16) cycles of the waveform, measuring period against a stable reference clock, then converts frequency from Hz to RPM.
I have a dead time (currently 1000 clocks at 187.5 MHz, so 5.3 us) after each toggle for debouncing which might be too short. Or maybe it's a math error converting from Hz to RPM. I'll find out tomorrow.
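For reference, the conversion the tach block has to perform looks roughly like this in software. The function names are invented, and `pulses_per_rev=2` is an assumption (a common figure for PC-style fans) rather than something read off this fan's datasheet.

```python
def tach_rpm(ref_clocks, cycles=16, ref_hz=187.5e6, pulses_per_rev=2):
    """RPM from the reference-clock count spanning `cycles` tach periods.

    pulses_per_rev is an assumed fan property.
    """
    period_s = ref_clocks / ref_hz / cycles  # average period of one tach cycle
    freq_hz = 1.0 / period_s
    return freq_hz * 60.0 / pulses_per_rev


def dead_time_us(clocks=1000, ref_hz=187.5e6):
    """Debounce dead time in microseconds."""
    return clocks / ref_hz * 1e6


print(round(dead_time_us(), 2))       # 5.33
print(round(tach_rpm(18_000_000)))    # 5000
```

At 5000 RPM and two pulses per revolution the tach period is 6 ms, about a thousand times longer than the dead time, so the dead time only has to outlast edge ringing.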
Turns out that while I did have a small math error (two *pulses* per revolution on the green wire, not two *toggles* per revolution), the main error was actually in my bit-serial divider IP.
Which I had written back in grad school for my thesis, and it worked great on that CPU because I happened to have the inputs stable from when a divide was issued until it retired. The interface spec called for the divider to register the inputs on the first cycle, but one line of code used the unregistered value instead. Oops!
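To make the bug concrete, here is a software model of a bit-serial divider that latches its operands when the divide is issued. It is a generic restoring divider written for illustration, not the actual IP; the hardware bug would correspond to `step()` reading a live input port instead of the latched `self.divisor`.

```python
class BitSerialDivider:
    """Restoring unsigned divider producing one quotient bit per step()."""

    def __init__(self, width=32):
        self.width = width
        self.busy = False

    def start(self, dividend, divisor):
        # Register both operands on the first cycle; every later
        # cycle must use only these latched copies.
        self.dividend = dividend & ((1 << self.width) - 1)
        self.divisor = divisor
        self.rem = 0
        self.quot = 0
        self.bit = self.width - 1
        self.busy = True

    def step(self):
        # Shift in the next dividend bit, MSB first
        self.rem = (self.rem << 1) | ((self.dividend >> self.bit) & 1)
        self.quot <<= 1
        if self.rem >= self.divisor:
            self.rem -= self.divisor
            self.quot |= 1
        self.bit -= 1
        if self.bit < 0:
            self.busy = False


div = BitSerialDivider()
div.start(1000, 7)
while div.busy:
    div.step()
print(div.quot, div.rem)  # 142 6
```

If the caller holds its inputs stable for the whole divide (as the old CPU happened to), both versions give the same answer, which is how a bug like this can hide for years.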
Anyway, I now have working fan tachometers (no PWM outputs yet, so they're always at max RPM), plus I can read the FPGA sensors using the XADC, and the I2C sensors scattered around the board.
The STM32 also has an on-die temp sensor which I'm not using yet, but I think that's the only missing bit.
None of the Ethernet PHYs or power supply components have die temperature sensors on them to my knowledge. The SFP+ may have a sensor on its I2C bus, but I haven't brought that up yet (that will come much later).
Also tweaked a few timing settings on the quad SPI and I'm now getting reliable performance at 42.66 MHz (170.64 Mbps). That's as fast as I can go without either changing my FPGA-side QSPI IP to not require 4x oversampling, or moving it out of the RAM controller clock domain into something faster (which would then necessitate a lot more CDC blocks on the core fabric SFRs).
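The throughput numbers follow directly from four data lanes per clock. A quick sketch of that arithmetic, plus the sample clock implied by 4x oversampling (assuming "4x" means four fabric samples per SCK period):

```python
def qspi_mbps(sck_mhz, lanes=4):
    """Raw quad SPI throughput: one bit per lane per SCK cycle."""
    return sck_mhz * lanes


def min_sample_clock_mhz(sck_mhz, oversample=4):
    """Fabric clock needed to take `oversample` samples per SCK period."""
    return sck_mhz * oversample


for sck in (25.6, 42.66):
    print(sck, round(qspi_mbps(sck), 2), round(min_sample_clock_mhz(sck), 2))
```

This is raw bus rate; command and address overhead on each transaction will bring the usable figure down.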
While the sensors are brought up in that they work and I have functions that read them, there are no commands in the CLI to read them later on (yet). So for now all you can get is single-point measurements during boot.
So now there's a few directions I can go for what to bring up next:
* PWM outputs for the fans
* Warm reboot request between main MCU and supervisor
* RGMII management interface
* SFP+ uplink
* SGMII edge ports
* QSGMII edge ports
* QDR-II+ SRAM
I'm thinking the RAM might be good to do next since it's fairly self contained and easy to test in isolation.
While waiting for a RAM test bitstream, wired up a test fixture for sniffing and verifying traffic on the SFP+.
It's just two back to back optics connected through 6 dB RF splitters with the other leg of each going to the scope.
And it's a good thing I checked.
Apparently this wall port is spitting out 1000base-X traffic, not 10Gbase-R.
Time to go fix that before I think about bringing up the 10GbE on this board!
Aha, that would do it. PP4/34 is connected via an obviously temporary patch cable to a 1000base-SX optic on one of my 1G switches. And there's a cable coming off my 10G core switch dangling right next to it.
I must have needed a 1000baseX test signal a while back and forgot to reconnect the cable.
And getting nice looking 10Gbase-R idles coming off the switch now.
The line coming off the LATENTPINK board is flatlined, which is unsurprising as the FPGA design loaded on it doesn't yet bring up any of the transceivers.
It seems all of my simulation testing paid off, possibly? My homebrewed QDR-II+ controller seems to have worked on the first attempt in real hardware!
It uses a fair bit of juice (unsurprisingly, given all of the SSTL signals). Power consumption jumped from 5.5W to 8.2W (2.7W delta) when I loaded the new bitstream, but everything is still happy (FPGA Tj is at 39.5C and seems to be stable).
This is running the RAM at 375 MHz (750 MT/s), comfortably less than the 450 MHz (900 MT/s) speed grade limit. But that's all I need to get 24 Gbps of throughput, which is the requirement for this board to saturate 14x 1 Gbps + 1x 10 Gbps links.
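The throughput requirement is simple to sanity check. The sketch below only derives how many payload bits each transfer must carry to hit the target; it does not assume a particular bus width for the part.

```python
def required_gbps(n_1g=14, n_10g=1):
    """Aggregate line rate of all ports in Gbps."""
    return n_1g * 1 + n_10g * 10


def bits_per_transfer(gbps, mega_transfers_per_s):
    """Payload bits each transfer must carry to sustain `gbps`."""
    return gbps * 1000 / mega_transfers_per_s


need = required_gbps()
mts = 2 * 375  # DDR: two transfers per 375 MHz clock
print(need, mts, bits_per_transfer(need, mts))  # 24 750 32.0
```

Since QDR has separate read and write ports, that figure applies to each direction independently, which suits a buffer where every frame is written once and read once.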
No MIG, no PHASERs, no weird MEMORY_QDR mode on the ISERDES to sample on CQ and CQ# rising edges.
Just using IDDRs clocked by a 90 degree PLL shifted version of CQ/CQ# fed to a single IBUFDS.
Next step will be to write a full BIST core so I can get more confidence than "I poked two addresses in the VIO and it seemed OK".
Started bringing up the SFP+ interface.
The MCU now correctly detects optic insertion/removal and toggles TX_DISABLE a short time after the optic is inserted.
So far RX_LOS is ignored and I don't do anything with the RS pins. The DOM logging is just a test, I won't actually dump all the sensors every time an optic is inserted long term. That will be under "show interface transceiver" or similar (along with lots more details).
But something is wrong, the transmit data seems very unstable and I'm not seeing anything that makes sense.
I think this might just be the optic sending noise with the FPGA either not transmitting at all, or transmitting gibberish. My logic analyzer in the FPGA fabric is failing to arm because Vivado isn't seeing a clock.
Well that explains the implementation warning I was getting about an "invalid clock configuration" that I had been chasing for a while but never found the root cause of.
The transceiver quad PLL had a typo in one setting so it wasn't locking. That explains a lot.
Now linking up and seeing broadcasts on the sandbox network.
SFP+ link/activity LEDs on the board don't currently do anything, so that will probably be the next TODO item.
Note that the eye patterns in the screenshot are taken off the SFP+ mid-span tap, so while they can be used as a reasonable proxy for jitter in the actual waveform, they won't show small reflections or vertical eye closure present on the actual DUT. At some point I'll probably land probes on the actual differential pairs on the PCB, but for the moment it looks to be clean enough I doubt there's any problem there.
@jpm Yeah. There's a ton of EEPROM and monitoring fields I'm not logging yet, this is just a start to sanity check that I can talk to it and get plausible values.
My focus at this stage is board bringup not feature-complete firmware development. Once I verify a subsystem isn't obviously broken I move on to the next.
@azonenberg I can't get over how quickly you're powering through this work, how neat your work and work area are, and how you're *also* making the time to take us with you. It's inspirational!
In the meantime, I can't get a DPI panel to work with a Raspberry Pi. Our projects are BASICALLY equivalent, yes indeed!
@j You're seeing the one bench I cleared off (with carefully chosen camera angles to not show the [redacted] from $DAYJOB's client on the adjacent bench).
I assure you there's other parts of the lab that are less pretty right now. With a toddler at home and work keeping me busier than usual, I haven't been keeping up with my usual weekly/monthly maintenance as much as I'd like. So I've been focusing on anything that impacts safety or gets in the way of the stuff I'm actively working on.
I have three GPUs sitting on another bench waiting for me to schedule a maintenance shutdown of the VM server to put them in for ngscopeclient CI testing. They've been sitting there since like April.
@j Nobody can ever accuse me of not dreaming big :)
This project has been in the making since 2012. Many of my other big projects (like ngscopeclient/libscopehal and the probes) originally started as me realizing I needed better tooling to develop/debug it.
The extreme number of networking and networking-adjacent protocol decodes in libscopehal (baseT autonegotiation, 10baseT, 100baseTX, 1000baseX, 10GbaseR, SGMII, QSGMII, RGMII, MDIO, and probably more that escape me off the top of my head) is not an accident.
@jpm Yeah it's https://www.fs.com/products/11552.html?attribute=71429&id=2062755 or something very similar.
It's got DOM, but I haven't read through the relevant part of the spec to get that up yet. I figured I'd do that as part of the broader SFP+ bringup (including the SERDES IP on the FPGA and all of the other stuff).
@azonenberg yep I’ve got similar ones from FS and they're fully supported for DOM.
SFF-8472 looks pretty easy, no more difficult than any other I2C device, and at first glance mostly looks like mapping bits and bytes to descriptive strings or numbers with little calculation involved
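As an illustration of how little math is involved, here is a sketch that decodes the real-time diagnostic fields for an internally calibrated module. The byte offsets and scale factors are the SFF-8472 A2h page values as I understand them (bytes 96-105); `decode_dom` is a hypothetical helper and the sample page is synthetic, not a dump from a real optic. Externally calibrated modules need extra slope/offset handling that is not shown.

```python
import struct


def decode_dom(a2_page: bytes):
    """Decode bytes 96-105 of the A2h diagnostics page (internal calibration)."""
    temp, vcc, bias, tx_pwr, rx_pwr = struct.unpack_from(">hHHHH", a2_page, 96)
    return {
        "temp_c": temp / 256.0,          # signed, 1/256 degC per LSB
        "vcc_v": vcc * 100e-6,           # 100 uV per LSB
        "tx_bias_ma": bias * 2e-3,       # 2 uA per LSB
        "tx_power_mw": tx_pwr * 0.1e-3,  # 0.1 uW per LSB
        "rx_power_mw": rx_pwr * 0.1e-3,  # 0.1 uW per LSB
    }


# Synthetic page for demonstration only
page = bytearray(256)
struct.pack_into(">hHHHH", page, 96, 0x1E80, 33000, 3000, 5000, 4000)
dom = decode_dom(bytes(page))
print(dom)
```

On real hardware the page would come from an I2C read of the diagnostics address rather than a hand-built buffer.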
@azonenberg @cr1901 There's Nextcloud, depending on what you call lightweight.
https://nextcloud.com/blog/nextcloud-introduces-collaborative-rich-text-editor/
@azonenberg I quite like https://hedgedoc.org/ (formerly known as CodiMD/HackMD CE)
It's a realtime collaborative markdown editor, so not WYSIWYG but with a side-by-side realtime preview. I consider that an advantage for most of my use cases. Getting formatted and/or structured content out of an etherpad lite into something else used to be somewhat annoying, but is trivial with markdown.
No experience with hosting it myself, but I also haven't heard any complaints from people I know that host one (who mostly moved away from etherpad lite).
@philpem Yep, not sure about the H735 in particular but it wouldn't surprise me if it had one.
The 500 uV shift in Vcore is cool to see but not surprising at all.