0 Followers
0 Following
5 Posts
Engineer @ Hasura, working on Data Connectors

Interested in GraphQL, the JVM, TypeScript, Relational DBs (particularly Postgres), and Query Engines.

Github: https://github.com/GavinRay97

Email: [email protected]

Twitter: @GavinRayDev

The least painful C/C++ build tool I've used is xmake

https://github.com/xmake-io/xmake

The reason I like it (beyond ease of use) is that it can spit out CMakeLists.txt and compile_commands.json for IDE/LSP integration, and it also supports installing libraries from Conan, vcpkg, or even Git repos.

set_project("myapp")
set_languages("c++20")

-- pick ONE package source; each exposes fmt under the same alias:
add_requires("conan::fmt/11.0.2", {alias = "fmt"})
-- add_requires("vcpkg::fmt", {alias = "fmt"})
-- add_requires("git://github.com/fmtlib/fmt v11.0.2", {alias = "fmt"})

target("myapp")
    set_kind("binary")
    add_files("src/*.cpp")
    add_packages("fmt")


Then you use it like this:

# Generate compile_commands.json and CMakeLists.txt
$ xmake project -k compile_commands
$ xmake project -k cmake

# Build + run
$ xmake && xmake run myapp

I hadn't heard of this before; a quick search turned up this 2025 post, which suggests a "fused cross-entropy loss" kernel was integrated into PyTorch:

https://pytorch.org/blog/peak-performance-minimized-memory/

> "The integration involves modifying the TransformerDecoder module in torchtune to bypass the linear layer computation, allowing the Liger Fused Linear Cross Entropy Loss to handle the forward projection weights."

Is this the same thing as you discuss above?


It doesn't make sense to me that an embedded VM/interpreter could ever outperform direct code

You're adding a layer of abstraction and indirection, so how is it possible that a more indirect solution can have better performance?

This seems counterintuitive, so I googled it. Apparently it largely boils down to instruction-cache efficiency and branch prediction. The best content I could find was this post, along with some scattered comments from Mike Pall of LuaJIT fame:

https://sillycross.github.io/2022/11/22/2022-11-22/

Interestingly, this is also discussed in a similar blog post about using Clang's recent-ish [[musttail]] tail-call attribute to improve C++ JSON parsing performance:

https://blog.reverberate.org/2021/04/21/musttail-efficient-i...


I found this other comment, by the same user in one of the linked threads from two weeks ago, the easiest to understand, in brief:

https://news.ycombinator.com/item?id=47389233


> At least they're throwing consumers a bone via the ARK deal.

I had to look this up. There's a venture fund that retail investors can buy into with as little as $500, though withdrawals are limited to quarterly windows.

https://www.ark-funds.com/funds/arkvx

The fund is invested in most of the hot tech companies.
