Dougall

@dougall

Low-level systems stuff. Reverse engineering, security research, bit twiddling, optimisation, SIMD, uarch. 64-bit ARM enthusiast.

he/they

Blog: https://dougallj.wordpress.com
Twitter: http://twitter.com∕dougallj∕status∕1590357240443437057.ê.cc/twitter.html
Github: https://github.com/dougallj
Cohost: https://cohost.org/dougall

Correction: two instruction NEON float prefix sum.

I guess I'm a bit out of practice, was focusing on the complex ops too much, and two seemed too good to be true.

Three instruction NEON float prefix sum. I'd wanted to abuse FCMLA (floating-point complex multiply accumulate) for non-complex arithmetic for so long, and I finally came up with something :)

With two unnecessary multiplies to save one instruction, this may only work out on Apple CPUs, but it's a bit of fun.

(For loops you can broadcast the carried value with vfmaq_laneq_f32(scan, ones, prev, 3) for three multiplies saving two instructions. LLVM fights you on that, though.)

[oops, see reply]
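
For reference, here is a rough sketch of the conventional baseline these posts are improving on, plus the carry broadcast from the loop note. It is not the FCMLA trick or the two-instruction version (neither is spelled out above); it's the usual two-EXT, two-FADD scan. The names `scan4` and `prefix_sum_f32` are made up for illustration, and the non-AArch64 branch is just a scalar model of the same lane arithmetic so the sketch builds anywhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* Inclusive prefix sum of four lanes the conventional way:
 * two EXT (lane shifts with zero fill) and two FADD. */
static inline float32x4_t scan4(float32x4_t v) {
    const float32x4_t zero = vdupq_n_f32(0.0f);
    /* vextq_f32(zero, v, 3) = {0, v0, v1, v2} */
    float32x4_t s = vaddq_f32(v, vextq_f32(zero, v, 3));
    /* vextq_f32(zero, s, 2) = {0, 0, s0, s1} */
    return vaddq_f32(s, vextq_f32(zero, s, 2));
}

/* In-place inclusive prefix sum; n must be a multiple of 4. */
void prefix_sum_f32(float *data, size_t n) {
    assert(n % 4 == 0);
    const float32x4_t ones = vdupq_n_f32(1.0f);
    float32x4_t prev = vdupq_n_f32(0.0f);
    for (size_t i = 0; i < n; i += 4) {
        float32x4_t scan = scan4(vld1q_f32(data + i));
        /* scan + ones * prev[3]: adds the previous block's total
         * to every lane in one FMLA-by-element. */
        prev = vfmaq_laneq_f32(scan, ones, prev, 3);
        vst1q_f32(data + i, prev);
    }
}

#else

/* Scalar model of the same lane arithmetic for other targets. */
void prefix_sum_f32(float *data, size_t n) {
    assert(n % 4 == 0);
    float carry = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        float *v = data + i;
        /* Step 1: add the vector shifted up by one lane. */
        float s0 = v[0], s1 = v[1] + v[0], s2 = v[2] + v[1], s3 = v[3] + v[2];
        /* Step 2: add the result shifted up by two lanes, then the carry. */
        v[0] = s0 + carry;
        v[1] = s1 + carry;
        v[2] = (s2 + s0) + carry;
        v[3] = (s3 + s1) + carry;
        carry = v[3];
    }
}

#endif
```

Whether the compiler keeps the multiply-by-one as an FMLA or rewrites it into a DUP plus FADD depends on the toolchain, which is presumably the "LLVM fights you" part.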

My first IR die shot, the GameCube's ATI Flipper GPU. I love that you can just look through a solid silicon wafer with the right wavelength of light.

Image flipped to line up with Nintendo's labeled die shot from 2001 (right).

I love to see a reviewer doing microarchitectural testing before release ❤️ Look at that per-core bandwidth!

(Though I think they accidentally used the A17 P-core diagram as the basis for their A19 P-core diagram, otherwise there are a lot of reversions that they didn't comment on.)

https://www.youtube.com/watch?v=Y9SwluJ9qPI

It's strange seeing stars beside a full moon.

The iridescence comes from diffracted specular reflections, so it definitely does look like that, but only when you hold it at an angle relative to a light source such that it glints. That said, it's surprisingly hard to take a photograph of how it "usually" looks, as it's always doing slightly funny things to the light.

But here are my efforts to take less shiny reference photos:

Apple M1

My first die shots. You can see a lot more than I'd imagined without a microscope. This is a 1:1 macro lens, and just changing the angle/size of the light source gives you a ton of control.

(My photographs, die prepared by "IT AI IC Cyber Style Store"/foreverfire2005 on eBay)

Perfection.

Woah, I'm late to the party, but the Fujitsu A64FX uarch docs were actually awesome: https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.8.1.pdf

I don't think I've seen documentation this detailed for a core this complex before. Pipeline stages, bypass penalties, resource allocation and release stages 🤯
