0 Followers
0 Following
5 Posts
Engineer @ Hasura, working on Data Connectors

Interested in GraphQL, the JVM, TypeScript, Relational DBs (particularly Postgres), and Query Engines.

Github: https://github.com/GavinRay97

Email: [email protected]

Twitter: @GavinRayDev

The least painful C/C++ build tool I've used is xmake

https://github.com/xmake-io/xmake

The reason I like it (beyond ease of use) is that it can spit out CMakeLists.txt and compile_commands.json for IDE/LSP integration, and it also supports installing libraries from Conan, vcpkg, or even Git repos.

set_project("myapp")
set_languages("c++20")

-- pick ONE package source; each exposes fmt under the same alias:
add_requires("conan::fmt/11.0.2", {alias = "fmt"})
-- add_requires("vcpkg::fmt", {alias = "fmt"})
-- add_requires("git://github.com/fmtlib/fmt v11.0.2", {alias = "fmt"})

target("myapp")
    set_kind("binary")
    add_files("src/*.cpp")
    add_packages("fmt")


Then you use it like this:

# Generate compile_commands.json and CMakeLists.txt
$ xmake project -k compile_commands
$ xmake project -k cmake

# Build + run
$ xmake && xmake run myapp

I hadn't heard of this before; a quick search turned up this 2025 post, which suggests a "fused cross-entropy loss" kernel was integrated into PyTorch:

https://pytorch.org/blog/peak-performance-minimized-memory/

> "The integration involves modifying the TransformerDecoder module in torchtune to bypass the linear layer computation, allowing the Liger Fused Linear Cross Entropy Loss to handle the forward projection weights."

Is this the same thing as you discuss above?


It doesn't make sense to me that an embedded VM/interpreter could ever outperform direct code

You're adding a layer of abstraction and indirection, so how is it possible that a more indirect solution can have better performance?

This seems counterintuitive, so I googled it. Apparently it largely boils down to instruction-cache efficiency and branch prediction. The best content I could find was this post, along with some scattered comments from Mike Pall of LuaJIT fame:

https://sillycross.github.io/2022/11/22/2022-11-22/

Interestingly, this is also discussed in a similar blog post about using Clang's recent-ish [[musttail]] tail-call attribute to improve C++ JSON parsing performance:

https://blog.reverberate.org/2021/04/21/musttail-efficient-i...


I found this other comment, by the same user in one of the linked threads from two weeks ago, the easiest to understand, in brief:

https://news.ycombinator.com/item?id=47389233


> At least they're throwing consumers a bone via the ARK deal.

I had to look this up. There's a venture fund that retail investors can buy into with as little as $500, though withdrawals are limited to quarterly windows.

https://www.ark-funds.com/funds/arkvx

The fund is invested in most of the hot tech companies.
