Flop: 63.7 squm
Mux: 28.5 squm
Scan flop (forbidden fruit): 83.4 squm
You know what this means gamers
Also seems like there are no DFFEs in the library. At least mapping those to scan flops with Q connected to SI would save some area, routing and (I think) datapath delay.
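Back-of-envelope on the numbers above, as a rough sketch. It assumes an enable flop would otherwise be built from one plain flop plus one mux per bit (taking the quoted "Mux" to be a 2:1 mux), and that the scan flop's SE pin works as an active-low enable once Q is tied to SI. The 32-bit width and the function name are just for illustration.

```python
# Cell areas quoted above, in square microns.
FLOP_AREA = 63.7
MUX_AREA = 28.5
SCAN_FLOP_AREA = 83.4


def enable_flop_area(bits: int, use_scan_flop: bool) -> float:
    """Area of `bits` enable flops.

    use_scan_flop=False: plain flop + feedback mux per bit (assumed mapping).
    use_scan_flop=True:  scan flop with Q tied to SI, so SE=1 holds the value
                         and SE=0 loads D.
    """
    per_bit = SCAN_FLOP_AREA if use_scan_flop else FLOP_AREA + MUX_AREA
    return bits * per_bit


# Hypothetical 32-bit register, just to put a number on it.
discrete = enable_flop_area(32, use_scan_flop=False)
scan = enable_flop_area(32, use_scan_flop=True)
saving = discrete - scan
print(f"flop+mux: {discrete:.1f}  scan: {scan:.1f}  saving: {saving:.1f} squm")
```

That works out to 8.8 squm saved per bit, a bit under 10% of the flop-plus-mux area, before counting any routing or delay benefit.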
@lofty pointed me to the USE_LIGHTER flag too, which can infer clock gates from DFFE groups.
Really this is stuff to look into when I have a bit more RTL in place, but I'm a bit alarmed by the QoR I'm seeing.
Day 3: added a second smaller Hazard3 (APU) for doing audio processing, with private RAM. There'll be some fixed-function upsampling etc too, generating PWM going out to audio pads. Main CPU can access APU's address space but not the other way round. Debugger sees both. APU is RV32EMBZcaZcb for now.
Also added the RISCBoy SRAM controller. IO routing and timing look like they're going to be very challenging; I'm trying to run a fast parallel bus using essentially all pads on the north, east and south sides of the chip.
Day 4: I debugged the load-bearing YAML and then YOLO'd the PPU into the chip with full force.
1W1R memories had to be remapped as I only have 1RW: scan buffer is fine as I never implemented blending; palette RAM now drops writes concurrent with reads; command processor call stack is just synthesised.
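A minimal behavioural sketch of the palette RAM change, assuming reads simply take priority on the single port and a write arriving in the same cycle is discarded. The class name, depth and dropped-write counter are hypothetical, not taken from the RTL.

```python
from typing import Optional, Tuple


class SinglePortPaletteRam:
    """Toy model of a 1RW RAM standing in for a 1W1R palette RAM."""

    def __init__(self, depth: int = 256) -> None:  # depth is an assumption
        self.mem = [0] * depth
        self.dropped_writes = 0

    def cycle(
        self,
        read_addr: Optional[int] = None,
        write: Optional[Tuple[int, int]] = None,
    ) -> Optional[int]:
        """One clock cycle. A read wins the port; a concurrent write is dropped."""
        if read_addr is not None:
            if write is not None:
                self.dropped_writes += 1  # write lost, memory unchanged
            return self.mem[read_addr]
        if write is not None:
            addr, data = write
            self.mem[addr] = data
        return None


ram = SinglePortPaletteRam()
ram.cycle(write=(3, 0x7FFF))                          # write alone: lands
first = ram.cycle(read_addr=3)                        # read alone: sees it
clash = ram.cycle(read_addr=3, write=(3, 0x1234))     # both: write is dropped
after = ram.cycle(read_addr=3)                        # old value still there
```

In this model software would need to write the palette while the read side is idle, or retry.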
I have to say the "RV32" signal format on Surfer is incredibly based. Not sure how good the ISA coverage is yet but even partial coverage is useful.
I've got it on the 32-bit expanded versions, so I'm not worried about Zcd vs Zcmp/Zcb confusion etc.
One thing I noticed from looking at how the Hazard3 register file is mapped with clock gating inference enabled: register 0 (x0) is present and correct in the netlist even though its output is always squashed in the next pipestage. Oops.
I've always kept the actual regfile code completely uniform to preserve BRAM compatibility, but maybe it's time to make that a bit uglier (in a generate block). I'm also interested in latch-based register files but the constraints could be interesting.
The CG inference looks correct, and I am getting the smallest flop type with one CG per 32-bit register, so nothing to complain about there.
This is the crt0 for my bootrom. Think this has a good amount of silly stuff going on for 30 bytes of code.
Also I'm not going to write ELF patching to fix it but it still annoys me that GCC stacks registers on entry to a `noreturn` function. What are you planning to do with those bro?
Also I realised I was being dumb with the FIR filter implementation so that should come down in area quite a bit.
For a density comparison on the register files, each of the square RAM blocks is 512 x 8, so 4 kbit single-ported.
Tonight on man vs tool
At this point I have the entire timing table from an async SRAM datasheet pasted into TCL comments
It's pretty cool that I can file a GitHub issue about DRCs on standard cells. Feels like progress
https://github.com/wafer-space/gf180mcu-project-template/issues/34
Have 70 ps hold violations. Add 80 ps of hold margin. Result: still have hold violations, and setup is 2 ns worse.
This is why we don't try to fix CTS issues by resizing buffers on the datapath
I finished the design (and probably all the verification) of the streaming SPI read peripheral for the APU. It's fairly simple, the only neat trick is the CPU can reach in and pause the stream so it releases the SPI GPIOs. The CPU can squeeze in a quick access to something else, like the shift register for the buttons, then un-pause the stream and it picks up where it left off.
I don't have a system DMA so having this integrated into the APU is a neat way to support streaming audio samples (ADPCM etc) from flash.
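The pause/resume behaviour described above can be sketched as a toy software model. It only tracks who owns the pins and where the stream is up to. All names here are hypothetical, and it assumes the peripheral keeps its flash address across a pause; how the real hardware restarts the SPI read is not modelled.

```python
from typing import Optional


class StreamingSpiReader:
    """Toy model: streams bytes from flash, can be paused to free the SPI pins."""

    def __init__(self, flash: bytes, start: int = 0) -> None:
        self.flash = flash
        self.addr = start       # next flash address to stream from
        self.paused = False

    def pause(self) -> None:
        self.paused = True      # stream lets go of the SPI GPIOs

    def resume(self) -> None:
        self.paused = False     # stream takes the pins back, addr unchanged

    def owns_pins(self) -> bool:
        return not self.paused

    def next_byte(self) -> Optional[int]:
        """Next streamed byte, or None while paused or at end of data."""
        if self.paused or self.addr >= len(self.flash):
            return None
        byte = self.flash[self.addr]
        self.addr += 1
        return byte


def read_buttons(stream: StreamingSpiReader) -> int:
    """Stand-in for a quick CPU access to the button shift register."""
    assert not stream.owns_pins(), "stream must be paused first"
    return 0b0101  # made-up button state


stream = StreamingSpiReader(bytes([0x10, 0x20, 0x30, 0x40]))
samples = [stream.next_byte(), stream.next_byte()]
stream.pause()
buttons = read_buttons(stream)      # CPU squeezes in its own access
stream.resume()
samples += [stream.next_byte(), stream.next_byte()]
```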
@wren6991 Ah the PPU had completely slipped my mind.
I'm really enjoying this project as a useful visualization of what's possible with the space available in the wafer.space shuttle runs.