So I took @warrenm's MetalSlug (https://github.com/metal-by-example/MetalSlug) - itself an implementation of the https://sluglibrary.com paper…

Ported it to https://metalsprockets.com

Optimised it so it renders all text in a single draw call (from an original 40,000 draw calls…)

And then entered the matrix?

Every glyph is vector. Whoa.

(Code needs a cleanup - it got messy but will put it online soon)

A completely vector-based terminal view eh @Migueldeicaza? Or did the work you did recently use Slug?
@schwa wow, my thing uses rasterized bitmaps and the atlas. This is so cool
@Migueldeicaza Not at all optimised - but concept works. 40fps on an M5 Max isn't great.
@schwa omg this is sublime. I love this

@Migueldeicaza Ok - 60fps, should easily manage 120fps when I am not plugged into that speed bump of an Apple Studio Display.

(Big difference running release vs debug - those Swift arrays are slooow. And a couple of obvious optimisations).

Not sure if SwiftTerm needs this but can look into contributing.

@schwa @Migueldeicaza have you all tried any of the noncopyable containers we've been working on in https://github.com/apple/swift-collections?tab=readme-ov-file#basiccontainers-module for your hot paths
@joe @schwa oh I had not! I need to pay more attention, which ones there?
@joe @schwa oh RigidArray! Will check it out! That’s exactly what I need
@Migueldeicaza @schwa `UniqueArray` (which grows dynamically but has unique ownership) and `RigidArray` (which grows only up to a prespecified capacity limit) would be the closest to drop-in-replacement for existing `Array` usage
@joe @Migueldeicaza In my case I don't think the speed boost would warrant the work. But I'll be reaching for them for another couple of projects that will benefit.
@schwa @Migueldeicaza another possibly less-disruptive thing you might experiment with when you have the chance is adopting `Span`/`MutableSpan` in places where you're only reading or writing an array's elements in-place

@joe @Migueldeicaza Yeah i'm using (mutable)span in a few places.

I do have a great issue around perf of simd_float4x4 in the (original) unsafebufferpointer. Writing the matrices the proper way (building up the matrix) was ~4fps… doing it just by writing directly to the members… full speed.

I was blaming array when really it was simd.

I'll see if I can reproduce it again and understand it rationally (likely safe vs unsafe access). Was fun.
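
For readers trying to picture the two write styles mentioned above, here is a minimal sketch. It makes no claim about which is faster or why. `Matrix4` is a hypothetical stand-in that mirrors the `columns` tuple layout of `simd_float4x4` so the snippet does not depend on Apple's `simd` module, and `writeByBuilding` / `writeInPlace` are illustrative names, not code from the project.

```swift
// Hypothetical stand-in for simd_float4x4 (same column-tuple layout).
struct Matrix4 {
    var columns: (SIMD4<Float>, SIMD4<Float>, SIMD4<Float>, SIMD4<Float>)
    static let identity = Matrix4(columns: (
        SIMD4<Float>(1, 0, 0, 0), SIMD4<Float>(0, 1, 0, 0),
        SIMD4<Float>(0, 0, 1, 0), SIMD4<Float>(0, 0, 0, 1)))
}

// Style 1: build a complete matrix value, then store the whole thing.
func writeByBuilding(_ buffer: UnsafeMutableBufferPointer<Matrix4>, offsets: [SIMD3<Float>]) {
    for (i, offset) in offsets.enumerated() {
        var m = Matrix4.identity
        m.columns.3 = SIMD4<Float>(offset.x, offset.y, offset.z, 1)
        buffer[i] = m
    }
}

// Style 2: write only the changed member, directly in the buffer.
// Assumes the buffer already holds identity matrices.
func writeInPlace(_ buffer: UnsafeMutableBufferPointer<Matrix4>, offsets: [SIMD3<Float>]) {
    for (i, offset) in offsets.enumerated() {
        buffer[i].columns.3 = SIMD4<Float>(offset.x, offset.y, offset.z, 1)
    }
}

let offsets: [SIMD3<Float>] = [SIMD3<Float>(1, 2, 3), SIMD3<Float>(4, 5, 6)]
let a = UnsafeMutableBufferPointer<Matrix4>.allocate(capacity: offsets.count)
let b = UnsafeMutableBufferPointer<Matrix4>.allocate(capacity: offsets.count)
a.initialize(repeating: .identity)
b.initialize(repeating: .identity)
writeByBuilding(a, offsets: offsets)
writeInPlace(b, offsets: offsets)

// Both styles end up with the same translation column.
let sameResult = a[1].columns.3 == b[1].columns.3
let lastColumn = b[1].columns.3
a.deallocate()
b.deallocate()
print(sameResult)
```

Timing the two functions in a release build would be one way to start reproducing the difference.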

@joe @schwa @Migueldeicaza +1! Noncopyable types are how I've been shipping Swift rewrites that are faster than C
@joe @schwa @Migueldeicaza There’s a reason I manually ported my sample from Swift to Obj-C. Debug performance is absolutely atrocious, and these marginal hacks call into question whether ordinary Swift developers can write idiomatic, performant code. I don’t doubt that performance weenies can write code that’s “faster than C”—now solve it for everyone else.
@warrenm @joe @schwa I am just at the stage in life where I'd rather spend time with the profiler and picking up these idioms than spend days staring at core dumps :-)

@schwa this would be very neat!

Yeah swift arrays are bad, dynamic exclusive access is still my main problem

Ok I guess I wasn't left any choice in the matter.

For those wondering…

What's cool about this is that it uses true vector glyphs. A lot of text in 3D (and in your GPU-accelerated terminals etc.) is rendered by drawing the glyphs to textures at a fixed (usually high) resolution, or by using Signed Distance Field trickery to render them well.

This code is using the slug algorithm that @warrenm ported recently after the paper went public domain.

Much more info here:

https://metalbyexample.com/slug/

I also think it's cool because it uses my http://metalsprockets.com project which makes working with Metal on mac, iOS and visionOS super(*) easy. But I'm biased.

(* for some value of super)

@schwa Ohhh. This looks excellent! I got tired of all the boilerplate I need for even the simplest Metal project.
@schwa @warrenm Love the Vision Pro version. We used SDFs in Mapbox GL to render glyphs from a texture map. Been meaning to check out this new algorithm, as I think I'll have a new version in my future soon.
@incanus it’s not new at all. It just wasn’t public domained
@schwa See, this is just how much I have not checked it out, I didn't even know. Been one of those months.
@schwa I was kind of hoping/expecting the glyphs would pile up on your palm and then you would pour them onto the floor…
@Dhmspector I leave that as an exercise for whoever wants to do it when I push to GitHub.
@schwa good job! I can see the Golden Retriever, Chocolate Labrador, and the Irish Setter.
@schwa Absolutely epic.

@warrenm You did the hard work of porting it to Metal. Thank you so much.

All I did was put the atlas textures into argument buffers, then store in each glyph vertex an index into those texture argument buffers and an index to its model matrix.

Gets rid of the submesh concept and allows everything to go into one draw call (at the expense of more bytes per vertex)
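
A rough CPU-side sketch of that idea, for anyone following along: every glyph becomes one quad whose four vertices each carry an atlas index and a matrix index, so all glyphs can share one vertex buffer and one index buffer. The struct layouts, field widths and names (`GlyphVertex`, `GlyphInstance`, `buildBatch`) are assumptions for illustration, not the actual code, and using one model matrix per glyph is also an assumption.

```swift
// Hypothetical per-vertex layout; the extra indices are the "more bytes per vertex".
struct GlyphVertex {
    var position: SIMD2<Float>   // quad corner in glyph space
    var texCoord: SIMD2<Float>   // corner coordinate for the curve lookup
    var atlasIndex: UInt16       // which atlas texture in the argument buffer
    var matrixIndex: UInt32      // which model matrix in a shared matrix buffer
}

// Hypothetical description of one glyph to place.
struct GlyphInstance {
    var origin: SIMD2<Float>
    var size: SIMD2<Float>
    var atlasIndex: UInt16
}

// Flattens all glyphs into one vertex array and one index array,
// which is what lets a single indexed draw call cover everything.
func buildBatch(_ glyphs: [GlyphInstance]) -> (vertices: [GlyphVertex], indices: [UInt32]) {
    var vertices: [GlyphVertex] = []
    var indices: [UInt32] = []
    vertices.reserveCapacity(glyphs.count * 4)
    indices.reserveCapacity(glyphs.count * 6)
    let corners: [SIMD2<Float>] = [
        SIMD2<Float>(0, 0), SIMD2<Float>(1, 0), SIMD2<Float>(1, 1), SIMD2<Float>(0, 1)]
    for (glyphIndex, glyph) in glyphs.enumerated() {
        let base = UInt32(vertices.count)
        for corner in corners {
            vertices.append(GlyphVertex(
                position: glyph.origin + corner * glyph.size,
                texCoord: corner,
                atlasIndex: glyph.atlasIndex,
                matrixIndex: UInt32(glyphIndex)))
        }
        // Two triangles per quad.
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    }
    return (vertices, indices)
}

let batch = buildBatch([
    GlyphInstance(origin: SIMD2<Float>(0, 0), size: SIMD2<Float>(1, 2), atlasIndex: 0),
    GlyphInstance(origin: SIMD2<Float>(1, 0), size: SIMD2<Float>(1, 2), atlasIndex: 1),
])
print(batch.vertices.count, batch.indices.count)  // prints "8 12"
```

In the shader, the per-vertex indices would then select the texture and matrix, replacing the per-submesh bindings that previously forced separate draw calls.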

@schwa @warrenm This is very cool. I just finished a little side project where I could have used this - I ended up importing .obj with letters to manipulate them in Metal. 😅
@schwa @warrenm that's a cool project to port! The declarative syntax reads much better to my weary eyes than the manual boilerplate :)
@schwa @warrenm That’s pretty dang sweet!
@schwa @warrenm Why Slug and not the Loop/Blinn technique (which I believe is faster)?

@alexr @warrenm Because warren's code caught my attention and i had a "hmm what if" thought at just the right time…

Links?

@schwa @warrenm The PS3's GPU was exceptionally fast at this because the final 16:1 or 4:1 anti-aliasing can be done using the on-board stencil buffer.
@alexr @schwa @warrenm Loop-Blinn is indeed faster, but that speed comes with some serious trade-offs, just as SDFs are even faster but have their own limitations. Compared to Slug, Loop-Blinn is much harder to implement due to the one-curve-per-triangle design. The triangulation phase can be impractical for even moderately complex fonts because you can end up generating hundreds of tris. With Slug, you just need one quad per glyph no matter what.
@alexr @schwa @warrenm The simple triangulation makes Slug better suited for things like decaling onto 3D surfaces where clipping might be necessary. Slug also doesn't need an extra indirection in the vertex shader to reference the per-glyph meshes. And because it considers multiple curves per pixel, the rendering quality is significantly higher at smaller font sizes. You also get at least 256 coverage levels without any need for stencil.
@EricLengyel @alexr @warrenm Thanks for the info and making it public domain. Much appreciated.
@EricLengyel @schwa @warrenm Do the different amounts of vertex vs fragment computation matter for any GPU still worth targeting? We didn’t have Slug back when we did our PS3 work and it came out a whole hell of a lot better than the cargo-culted SDF all the teams were using then.
@alexr @schwa @warrenm When it comes to speed? Probably not. But I have found that a lot of people are still concerned with memory usage, and all that vertex data can add up fast, especially for a CJK font. Also consider that people want to create arbitrary vector graphics at run time, and having to go through that triangulation step on the CPU every time would be prohibitive.
@alexr @schwa @warrenm The GPU in the PS3 can't handle Slug, unfortunately. But it runs great on the PS4!
@EricLengyel @schwa @warrenm Our PS3 solution compressed all the vertex data by converting from TrueType’s 16+16 vertices to a 10-bit Hilbert distance. (I should have patented that.) 10 bits was enough for every full Unicode font we measured. Much less if you didn’t want CJK, etc.
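
For anyone unfamiliar with the term, a Hilbert distance maps a 2D grid cell to its position along a Hilbert curve, so an (x, y) pair collapses to one integer. The post doesn't say how the coordinates were quantized or delta-encoded, so this is only a sketch of the standard mapping, not the PS3 scheme; `hilbertDistance` is a hypothetical name, and the 32×32 grid is chosen simply because its 1024 cells fit in 10 bits.

```swift
// Standard iterative xy -> Hilbert distance on a (2^order x 2^order) grid.
func hilbertDistance(x: Int, y: Int, order: Int) -> Int {
    let n = 1 << order
    var x = x, y = y
    var d = 0
    var s = n / 2
    while s > 0 {
        let rx = (x & s) > 0 ? 1 : 0
        let ry = (y & s) > 0 ? 1 : 0
        d += s * s * ((3 * rx) ^ ry)
        // Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0 {
            if rx == 1 {
                x = n - 1 - x
                y = n - 1 - y
            }
            swap(&x, &y)
        }
        s /= 2
    }
    return d
}

// On a 32x32 grid (order 5) every cell gets a distinct value below 1024.
var seen = Set<Int>()
for y in 0..<32 {
    for x in 0..<32 {
        seen.insert(hilbertDistance(x: x, y: y, order: 5))
    }
}
print(seen.count, seen.max()!)  // prints "1024 1023"
```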
Pretty!

Kinda hating the name collision with the Neo Geo franchise of Metal Slug, but whatever.

Admittedly, for me and Matrix-related visualizations? I probably had the most fun with @[email protected]'s matrixdump (took tcpdump output and shuffled it vertically) but I dunno where that code is anymore. Looks as if his https://monkey.org/~jose/ page is long since deprecated. I used to have that running on a system underneath my desk at a past employer on a tiny 12 or 13" CRT. Some coworkers thought it was a screen saver, but it was actual network traffic. You really did start to see patterns in it after a while too. ^_^


CC: @[email protected]
@schwa @warrenm Finally Vision Pro has its killer app
@numist @warrenm not yet child, not yet…