Dougall

@dougall
2.9K Followers
305 Following
1,050 Posts

Low-level systems stuff. Reverse engineering, security research, bit twiddling, optimisation, SIMD, uarch. 64-bit ARM enthusiast.

he/they

Blog: https://dougallj.wordpress.com
Twitter: http://twitter.com∕dougallj∕status∕1590357240443437057.ê.cc/twitter.html
GitHub: https://github.com/dougallj
Cohost: https://cohost.org/dougall

Heinous Type Tom7 Project Summary:

https://youtu.be/M1si1y5lvkk

No one can force me to have a secure website!!!

Wookash did a livestreamed follow-up interview with me a while ago and posted the video today: https://www.youtube.com/watch?v=CPCbjHILJV4

(There's a second part with Q&A forthcoming at some point.)

Compression Isn’t Just About Size | Fabian Giesen

I thought aliasing buffers and compressed textures in order to write to them was a very bad idea, but after playing with it, it actually seems fine?

https://www.ludicon.com/castano/blog/2026/04/writing-to-compressed-textures-in-metal/

Writing to Compressed Textures in Metal

Writing to Compressed Textures in Metal using heap aliasing.

Ignacio Castaño
Andreas Abel added latency, throughput, and port usage data for Emerald Rapids, Meteor Lake, Arrow Lake, and Zen 5 to https://uops.info/table.html 🎉
uops.info - Table

We released two tech talks today going over how to take advantage of the new architecture, features and associated developer tools.

Accelerate your machine learning workloads with the M5 and A19 GPUs

https://developer.apple.com/videos/play/tech-talks/111432/

https://www.youtube.com/watch?v=wgJX1HndGl0

Boost your graphics performance with the M5 and A19 GPUs

https://developer.apple.com/videos/play/tech-talks/111431/

https://www.youtube.com/watch?v=_5yEcJfB6nk

Accelerate your machine learning workloads with the M5 and A19 GPUs - Tech Talks - Videos - Apple Developer

Discover how to take advantage of the M5 and A19 GPUs to accelerate machine learning. Find out how to use the Neural Accelerators inside...

New blog post: A Decade of Slug
This talks about the evolution of the Slug font rendering algorithm, and it includes an exciting announcement: The patent has been dedicated to the public domain.
https://terathon.com/blog/decade-slug.html
@never_released @dougall @saagar @alexr @siracusa New in Xcode 26.4b3: 👋 M5 Pro/Max.
CPUFAMILY_ARM_SOTRA (H17S) contains P-cores and M-cores, and there's now a CLUSTER_TYPE_M enum to go with TYPE_E and TYPE_P.
I guess this deserves to be posted on a regular cadence for the benefit of anyone who hasn't seen it before: https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning
Xerox scanners/photocopiers randomly alter numbers in scanned documents

Please see the „condensed time line“ section (the next one) for a time line of how the Xerox saga unfolded. It for example depicts that I did not push the thing to the public right away, but gave Xerox a lot of time before I did so.

D. Kriesel
Shrunk the machine code a little (https://github.com/pkhuong/tiny_batcher): the x86-64 build now clocks in at 199 bytes (224 on aarch64)! I think x86-64 might benefit from using the high-byte registers (AH, BH, CH, DH), but hopefully simpler tricks can get us down to three cache lines (192 bytes with 64-byte lines) or fewer.
GitHub - pkhuong/tiny_batcher: a size-optimised sorting library for C and C++

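The post doesn't say which network tiny_batcher uses; going by the name, a plausible guess is Batcher's odd-even merge sort. Sorting networks suit size-optimised code because the comparator sequence is fixed in advance and can be generated by a few nested loops rather than data-dependent control flow. The Python sketch below is not the library's implementation (`batcher_pairs` and `network_sort` are hypothetical names); it generates the textbook network for power-of-two sizes and applies it with compare-exchange steps.

```python
def batcher_pairs(n):
    """Comparator pairs (i, j), i < j, of Batcher's odd-even merge sort on n inputs."""
    assert n & (n - 1) == 0, "this sketch only handles power-of-two sizes"
    pairs = []
    p = 1
    while p < n:          # p: size of the sorted runs being merged into runs of 2p
        k = p
        while k >= 1:     # k: distance between the two elements a comparator touches
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    # Only compare elements that belong to the same 2p-sized merge block.
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        pairs.append((i + j, i + j + k))
            k //= 2
        p *= 2
    return pairs


def network_sort(values):
    """Apply the fixed comparator sequence; which pairs get compared never depends on the data."""
    out = list(values)
    for i, j in batcher_pairs(len(out)):
        if out[i] > out[j]:  # compare-exchange
            out[i], out[j] = out[j], out[i]
    return out


print(len(batcher_pairs(8)))                   # 19
print(network_sort([5, 3, 7, 1, 8, 2, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the network is data-oblivious, checking every 0/1 input of a given size is enough to prove it sorts all inputs of that size (the 0-1 principle).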
I mean, you can get fancy with BDDs (binary decision diagrams), but you can write a basic implementation in under 100 lines of Python that is good enough to prove various 64-bit adder circuits logically equivalent in a fraction of a second. It feels like cheating.

https://gist.github.com/rygorous/948308f7d998e5fd4e98344687580338

BDD implementation of some very basic circuit verification proving various 64-bit adder architectures equivalent

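A rough illustration of how little is needed (this is not the code from the gist): a reduced ordered BDD keeps a unique table so that every Boolean function is represented by exactly one node, which turns equivalence checking into comparing node ids. The sketch builds a 64-bit ripple-carry adder and a Kogge-Stone prefix adder and checks that their sum bits coincide. It assumes the interleaved variable order a0, b0, a1, b1, ..., the usual choice because it keeps adder BDDs linear in size. `BDD`, `ripple_carry` and `kogge_stone` are illustrative names.

```python
class BDD:
    """Minimal reduced ordered BDD. Node ids 0 and 1 are the constant terminals."""

    def __init__(self):
        self.nodes = [None, None]  # id -> (var, lo, hi) for internal nodes
        self.unique = {}           # (var, lo, hi) -> id; sharing makes the form canonical
        self.cache = {}            # memo table for ite()

    def top(self, u):
        # Terminals sort after every variable in the order.
        return self.nodes[u][0] if u > 1 else float("inf")

    def mk(self, var, lo, hi):
        if lo == hi:  # both branches agree, so the test is redundant
            return lo
        key = (var, lo, hi)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def var(self, index):
        return self.mk(index, 0, 1)

    def cofactors(self, u, var):
        # (u with var = 0, u with var = 1); nodes that don't test var are unchanged.
        if self.top(u) == var:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u

    def ite(self, f, g, h):
        """If-then-else: (f and g) or (not f and h)."""
        if f == 1 or g == h:
            return g
        if f == 0:
            return h
        if g == 1 and h == 0:
            return f
        key = (f, g, h)
        if key not in self.cache:
            v = min(self.top(f), self.top(g), self.top(h))
            (f0, f1), (g0, g1), (h0, h1) = (self.cofactors(x, v) for x in (f, g, h))
            self.cache[key] = self.mk(v, self.ite(f0, g0, h0), self.ite(f1, g1, h1))
        return self.cache[key]

    def and_(self, f, g):
        return self.ite(f, g, 0)

    def or_(self, f, g):
        return self.ite(f, 1, g)

    def xor(self, f, g):
        return self.ite(f, self.ite(g, 0, 1), g)

    def evaluate(self, u, bit_of_var):
        # Follow one path from u down to a terminal for a concrete input assignment.
        while u > 1:
            var, lo, hi = self.nodes[u]
            u = hi if bit_of_var(var) else lo
        return u


def ripple_carry(bdd, a, b):
    """Sum bits of a + b: one full adder per bit, carry rippling upward."""
    carry, out = 0, []
    for x, y in zip(a, b):
        p = bdd.xor(x, y)
        out.append(bdd.xor(p, carry))
        carry = bdd.or_(bdd.and_(x, y), bdd.and_(p, carry))
    return out


def kogge_stone(bdd, a, b):
    """Sum bits of a + b via a Kogge-Stone parallel-prefix carry network."""
    n = len(a)
    p0 = [bdd.xor(x, y) for x, y in zip(a, b)]
    g, p = [bdd.and_(x, y) for x, y in zip(a, b)], list(p0)
    d = 1
    while d < n:
        g_next, p_next = list(g), list(p)
        for i in range(d, n):
            g_next[i] = bdd.or_(g[i], bdd.and_(p[i], g[i - d]))
            p_next[i] = bdd.and_(p[i], p[i - d])
        g, p, d = g_next, p_next, 2 * d
    # g[i] now covers bits 0..i, i.e. it is the carry into bit i + 1 (carry-in is 0).
    return [p0[0]] + [bdd.xor(p0[i], g[i - 1]) for i in range(1, n)]


N = 64
bdd = BDD()
a = [bdd.var(2 * i) for i in range(N)]      # interleaved order: a0, b0, a1, b1, ...
b = [bdd.var(2 * i + 1) for i in range(N)]
ripple, ks = ripple_carry(bdd, a, b), kogge_stone(bdd, a, b)

# Canonical form: equivalent functions are the same node, so this comparison is the proof.
print(ripple == ks)  # True
```

Switching to a non-interleaved order (all of a, then all of b) is the classic way to watch the same adder BDDs grow exponentially.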