I just published the first version of my illustrated SVE (ARMv9 SIMD) instruction list, with a truly absurd number of instruction diagrams and descriptions.

Check it out at https://dougallj.github.io/asil/

I updated this with an Apple M4 preset, which is the default on the SME page, and sticky table headers (thanks to @amonakov for the suggestion).

It's exciting to get to try out some of these instructions on the new iPad Pro!

@dougall It's a little funny that within hours of publishing this, Arm releases new SIMD instructions for SVE and SME xD

Specifically the FP8 ones

@fclc Hahaha true – it was momentarily up to date!

@dougall To be clear: SVE is an Armv8.2 extension.

It just wasn't popular.

@dougall
I just listened to Kwabena Agyeman talk about SIMD on @embedded . This seems like a great resource and a lot of work!

What did you use for making the diagrams?

@moonling Fun! Yeah, SVE is currently kind of at the opposite end of the space from embedded SIMD extensions like Arm Helium. SVE is currently found in supercomputers and some high-end phones and tablets (but I guess the Arm Cortex-A510 and similar may find their way to embedded spaces at some point).

I just wrote my own Python scripts to generate the SVG code for each, copying previous scripts and adding what I needed for each diagram.
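
As a rough illustration of that kind of script (not the actual code behind the site), here is a minimal sketch that emits SVG for one row of labelled vector lanes. The function name `lane_row_svg`, the lane sizes, and the styling are all hypothetical.

```python
from xml.sax.saxutils import escape


def lane_row_svg(labels, lane_w=40, lane_h=24):
    """Return SVG markup for a single row of labelled lanes (hypothetical helper)."""
    width = lane_w * len(labels)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{lane_h}">'
    ]
    for i, label in enumerate(labels):
        x = i * lane_w
        # One outlined box per lane.
        parts.append(
            f'<rect x="{x}" y="0" width="{lane_w}" height="{lane_h}" '
            f'fill="none" stroke="black"/>'
        )
        # Label centred in the box; escape() keeps the markup well-formed.
        parts.append(
            f'<text x="{x + lane_w / 2}" y="{lane_h / 2}" text-anchor="middle" '
            f'dominant-baseline="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


svg = lane_row_svg(["a3", "a2", "a1", "a0"])
```

Writing `svg` to a file gives a four-lane row; a real diagram would stack several such rows and add arrows between them.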

@dougall nit: how hard would it be to add instruction bitcode for these? I've had to work with a custom JIT engine for work, and whoever wrote it previously did not deign to explain the bajillion magic numbers and masks :/

@crystalmoon Oh no... For SVE specifically?

It's definitely possible, but it's probably not easy. If you click through to "exploration tools" (top-right) it shows the encoding.

The tables are partially generated from Arm's XML, and the XML can be used to generate the magic numbers and masks, for example:

https://gist.githubusercontent.com/SciresM/edee7a2e388480e0df43b7bacf94ca33/raw/ea2864759457ba815205c6506ad632ee9e4aacd9/arm64.py
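
To show what "magic numbers and masks" derived from an encoding diagram look like, here is a minimal sketch. The pattern-string format (`0`, `1`, `x` for each of the 32 bits) is an assumption made for illustration, not the layout of Arm's XML, and the example uses the base A64 ADD (immediate) encoding rather than an SVE instruction. All function names are hypothetical.

```python
def pattern_to_mask_value(pattern):
    """Turn a 32-character bit pattern (MSB first) into a (mask, value) pair."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    mask = value = 0
    for ch in bits:
        mask <<= 1
        value <<= 1
        if ch in "01":      # fixed bit: part of the mask
            mask |= 1
            value |= int(ch)
        elif ch != "x":     # 'x' marks an operand bit
            raise ValueError(f"bad pattern character {ch!r}")
    return mask, value


# ADD (immediate): sf | op=0 | S=0 | 100010 | sh | imm12 | Rn | Rd
ADD_IMM_PATTERN = "x" + "00" + "100010" + "x" * 23


def matches(word, pattern):
    """Disassembler direction: does this instruction word fit the pattern?"""
    mask, value = pattern_to_mask_value(pattern)
    return (word & mask) == value


def encode_add_imm64(rd, rn, imm12):
    """Assembler direction: ADD Xd, Xn, #imm12 with no shift."""
    assert 0 <= rd < 32 and 0 <= rn < 32 and 0 <= imm12 < 4096
    _, value = pattern_to_mask_value(ADD_IMM_PATTERN)
    return value | (1 << 31) | (imm12 << 10) | (rn << 5) | rd


mask, value = pattern_to_mask_value(ADD_IMM_PATTERN)
word = encode_add_imm64(rd=0, rn=1, imm12=1)  # ADD X0, X1, #1
```

The same mask/value pair serves both directions: a decoder tests `(word & mask) == value`, and an encoder ORs the operand fields into `value`.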

I'm not too keen on trying to tie together these two piles of hacks, but it might be possible to make an index... Couldn't you just feed them to a disassembler?

@dougall I am going the other way around, I *am* the assembler in this case 😭

@dougall would be nice to make table headers sticky so they don't scroll out of view. It should be a matter of wrapping them in <thead> and adding

thead {
top: 0;
position: sticky;
}

to CSS. Do you want a pull request with that?

@amonakov Good suggestion, thanks! Probably best to avoid PRs, since the HTML is generated by scripts, but I'll look into it.

(Looks like I need to add the THEADs too)

@dougall those diagrams are really nice. How much of it is automatically generated from the metadata and instruction specs and how much is done by hand?
Or, to put it another way, is there any information that would have made these diagrams easier to generate?

@adreid Thanks! I'm a fan of your work on the specs :)

For each table row I manually list the instructions (e.g. "zip" -> zip1_z_zz, zipq1_z_zz, zipq2_z_zz). This is used to automatically populate the columns for the supported sizes from the spec. I automatically cross-match this with the SIMD intrinsics data (https://developer.arm.com/architectures/instruction-sets/intrinsics/), and I scan/pattern-match the starts of the ASL to automatically find the requirements (e.g. "HaveSVE() && HaveSVE2BitPerm() && NonStreamingSVEEnabled()").
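
The requirement scan can be pictured with a minimal sketch like the one below. It is not the actual script: it assumes the condition is already available as a string like the example above, and `find_requirements` is a hypothetical name.

```python
import re


def find_requirements(condition):
    """Return the names of the zero-argument predicates called in an ASL condition."""
    return re.findall(r"\b(\w+)\(\)", condition)


reqs = find_requirements(
    "HaveSVE() && HaveSVE2BitPerm() && NonStreamingSVEEnabled()"
)
# reqs == ['HaveSVE', 'HaveSVE2BitPerm', 'NonStreamingSVEEnabled']
```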

@adreid The diagrams and descriptions aren't automated based on the specification (except for the page title and the link to the documentation). The ASL was extremely useful, and it's likely that the majority of simpler instructions could have been done by symbolic execution of the ASL, and it could have been a starting point for more complex instructions. It'd be fun to try this, but I'm not confident in my ability to pull it off.
@adreid (I was also thinking of these as "study notes" – part of the goal was to spend time thinking about each instruction to improve my understanding.)

@adreid One maybe-practical change I'd consider would be pulling the intrinsics data into the machine-readable specs.

If I were to use symbolic execution, I think one of the trickier parts would be broadcasts, like in SUDOT above. Specifying these in terms of a broadcast function, followed by an element-wise operation might be preferable? But I'd want to try the symbolic execution approach before recommending something like that.
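
A minimal sketch of the "broadcast function, followed by an element-wise operation" idea, assuming a simplified model of an indexed 4-way dot product. This is not the architectural definition of SUDOT: each group is a tuple of four small integers, a segment is four groups (128 bits), accumulator wrap-around is ignored, and the signed/unsigned distinction is invisible because Python integers are unbounded. All names are hypothetical.

```python
def broadcast_group(groups, index, groups_per_segment=4):
    """Replace every group with the group at `index` within its own segment."""
    out = []
    for i in range(len(groups)):
        seg_base = (i // groups_per_segment) * groups_per_segment
        out.append(groups[seg_base + index])
    return out


def dot4_accumulate(acc, a_groups, b_groups):
    """Purely element-wise step: acc[i] += dot(a_groups[i], b_groups[i])."""
    return [
        acc_i + sum(x * y for x, y in zip(a, b))
        for acc_i, a, b in zip(acc, a_groups, b_groups)
    ]


a = [(1, 1, 1, 1), (2, 2, 2, 2), (3, 3, 3, 3), (4, 4, 4, 4)]
b = [(1, 0, 0, 0), (0, 1, 0, 0), (1, 2, 3, 4), (0, 0, 0, 1)]

# Indexed form = broadcast of group 2, then the plain element-wise operation.
result = dot4_accumulate([0, 0, 0, 0], a, broadcast_group(b, index=2))
```

Split this way, a symbolic executor would only need to recognise the broadcast once, and the remaining step has the same shape as the non-indexed instruction.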

@dougall thanks for the explanation.
I think the symbolic execution approach would work well. And probably not too hard to implement based on the interpreter in ASLi.
@dougall I know that the Intel intrinsic specs are indexed by the unique name of the instruction form (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=0), and the XED database (https://intelxed.github.io) connects the instruction form with its binary encoding. The x86 spec I’m working on gives the instruction semantics (not released yet).
@dougall I also came across some pages that link Arm intrinsics with instructions:
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#widening-addition
https://developer.arm.com/architectures/instruction-sets/intrinsics/vaddl_s8
Hopefully there is a machine-readable version of this data, but scraping the HTML wouldn’t be too painful if not.

@dougall nice 😎 any plans to follow up with SME?

@stsquad Thanks!

Yeah, that's always been the plan. Unfortunately, I lost a bit of momentum as I couldn't find much by way of interesting uses for SME. It's not bad, it's just mostly for matrix multiplication.

I did start trying to figure out how to do the diagrams nicely. I have a few ideas, but trying to depict the ZA register in an intuitive-but-not-overwhelming way definitely adds to the challenge.