Welcome to day three of
#QConLondon! Doing last minute touchups for my talk at 10:30, so probably not going to see the keynote. Will livepost again once I'm back to watching talks
RE: https://bsky.app/profile/did:plc:rvlyeda73kxm7l2weegk73pa/post/3mhao6nupj22k
Ok it's done, gonna drink some tea and get ready. See you on the other side!
Wrote down every question I got as slide annotations and then managed to lose them all. Now trying to write them all down from memory
----
"Using Async/Await for Computational Scheduling", Orson Peters,
#QConLondon
Most people use async/await (AA) for I/O and networking, and that's what LLMs suggest you use it for. This talk is about using it for CPU-intensive work.
Async is effectively user-level cooperative multitasking.
There are several flavors of AA, also called coroutines/promises/futures. This talk's flavor is the "low-level" one found in Rust/C++/Zig, but not Python or Go (whose flavors are different).
Rust AA: Language support for Future trait, Waker type, async fn syntax. Executors define `spawn` and `poll`
A future needs a `poll`, which takes a waker and outputs a Poll enum {Ready(T), Pending}. Waker has an unsafe `new` and a wake function.
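[Annotation: my own minimal sketch of those pieces, not the speaker's code — a single-future executor ("block_on") built from just std's Future/Waker types, using thread park/unpark as the wake mechanism:]

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waker that unparks the blocked thread when the future signals progress.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// A minimal single-future executor: poll, and park until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            // poll returns the enum from the talk: Ready(T) or Pending.
            Poll::Ready(v) => return v,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    assert_eq!(block_on(async { 2 + 2 }), 4);
}
```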
`async fn` transforms fn from returning val to returning a future. Using example of `factorial`, which is a "bad idea only for demonstration"
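[Annotation: my reconstruction of the factorial demo, not the slides' code — same "bad idea, only for demonstration" idea, showing that the async fn's return value is inert until polled. The Noop waker and poll_once helper are mine:]

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Same body as a plain fn, but it now returns a Future<Output = u64>.
async fn factorial(n: u64) -> u64 {
    let mut acc: u64 = 1;
    for i in 2..=n {
        acc *= i;
    }
    acc
}

// A waker that does nothing, just to satisfy poll's signature.
struct Noop;
impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}

// Poll a future exactly once.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(Noop));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    fut.as_mut().poll(&mut cx)
}

fn main() {
    // factorial(5) runs nothing until polled; with no .await inside,
    // the very first poll drives it to completion.
    assert!(matches!(poll_once(factorial(5)), Poll::Ready(120)));
}
```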
Async/Await effectively abstracts away state machines.
By Rust convention, the executor has `spawn` as an entry point, which takes a future and returns a JoinHandle future. This provides parallelism. Can implement mutexes, semaphores, channels, barriers. Also waitgroups and joinsets
#QConLondon
[I think Waitgroups are fork-join and joinsets are "first-to-return"]
Demoing how to use AA to make a semaphore, and how it enables bespoke userspace scheduling. Contrast with threads, which require OS APIs.
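[Annotation: a sketch of what an async semaphore can look like in userspace — my code, not the talk's demo, and not production-grade (no cancellation handling). acquire() is a future that parks by storing its Waker; release() wakes one waiter:]

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Shared state: free permits plus the wakers of parked acquirers.
struct State {
    permits: usize,
    waiters: VecDeque<Waker>,
}

#[derive(Clone)]
struct Semaphore(Arc<Mutex<State>>);

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore(Arc::new(Mutex::new(State { permits, waiters: VecDeque::new() })))
    }
    // acquire() is a future; it resolves once a permit is available.
    fn acquire(&self) -> Acquire {
        Acquire(self.0.clone())
    }
    // release() returns a permit and wakes one parked waiter, if any.
    fn release(&self) {
        let waker = {
            let mut s = self.0.lock().unwrap();
            s.permits += 1;
            s.waiters.pop_front()
        };
        if let Some(w) = waker {
            w.wake();
        }
    }
}

struct Acquire(Arc<Mutex<State>>);

impl Future for Acquire {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut s = self.0.lock().unwrap();
        if s.permits > 0 {
            s.permits -= 1;
            Poll::Ready(())
        } else {
            // No permit: park by storing our waker for release() to call.
            s.waiters.push_back(cx.waker().clone());
            Poll::Pending
        }
    }
}

// Drive two acquires by hand with a no-op waker to show the mechanics.
struct Noop;
impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}

fn demo() -> (bool, bool, bool) {
    let waker = Waker::from(Arc::new(Noop));
    let mut cx = Context::from_waker(&waker);
    let sem = Semaphore::new(1);
    let mut a = std::pin::pin!(sem.acquire());
    let first = a.as_mut().poll(&mut cx).is_ready(); // takes the only permit
    let mut b = std::pin::pin!(sem.acquire());
    let second = b.as_mut().poll(&mut cx).is_ready(); // false: parked
    sem.release(); // wakes the parked waiter
    let third = b.as_mut().poll(&mut cx).is_ready(); // now succeeds
    (first, second, third)
}

fn main() {
    assert_eq!(demo(), (true, false, true));
}
```

Note how none of this touches OS APIs: the parking and waking is all userspace, which is the point of the contrast with threads.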
PROBLEM: blocking the executor. If all threads are busy, I/O can't progress.
That problem is why people say not to mix AA and computation, and/or use `spawn_blocking`, which shunts the blocking work off the async threads until it's done.
This sucks! Easy solution: use two executors! One for I/O, one for CPU.
Used this approach for data processing @ polars (dataframe library)
For streaming operations (data flows in, processed, flows out), use Tokio for async I/O, and a custom executor for CPU scheduling. (Admittedly unnecessary: Tokio can spawn two executors) Fixed pipelines with channels, similar to an actor system.
#QConLondon
Example query: "get sales, parse dates to date types, get cumulative sum of sales, and filter by [weekday]?"
Splits work into low-priority (data inflow) and high-priority (thread-local queues). LIFO best for throughput, as it keeps things in cache. Threads can steal all-but-last tasks
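[Annotation: my toy illustration of the queue discipline described above, not Polars code — the owner pops LIFO from its local queue for cache locality, while thieves take from the other end and leave the last task alone. A plain VecDeque stands in for a real concurrent work-stealing deque:]

```rust
use std::collections::VecDeque;

// A thief takes from the front (oldest tasks), leaving the last
// (hottest-in-cache) task for the owning thread.
fn steal_all_but_last(local: &mut VecDeque<u32>) -> Vec<u32> {
    let n = local.len().saturating_sub(1);
    (0..n).map(|_| local.pop_front().unwrap()).collect()
}

fn main() {
    let mut local: VecDeque<u32> = VecDeque::from([1, 2, 3, 4]);
    assert_eq!(local.pop_back(), Some(4)); // owner pops LIFO
    assert_eq!(steal_all_but_last(&mut local), vec![1, 2]); // thief takes FIFO
    assert_eq!(local.pop_back(), Some(3)); // last task stayed local
}
```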
Data is split into small "morsels". In Polars a morsel is currently 100k rows
Elementwise nodes like filter and format are 1-task-per-thread that loops over input. Nodes are physically connected with 1-element channels.
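[Annotation: my sketch of that pipeline shape, not Polars code — one elementwise node (here a filter) looping over morsels between capacity-1 bounded channels, which give backpressure. Threads + std mpsc stand in for the talk's async tasks and async channels:]

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Push morsels through one elementwise filter node.
fn run_filter(morsels: Vec<Vec<i64>>) -> Vec<Vec<i64>> {
    // Capacity-1 channels: a node can only run ahead by one morsel.
    let (tx_in, rx_in) = sync_channel::<Vec<i64>>(1);
    let (tx_out, rx_out) = sync_channel::<Vec<i64>>(1);

    // The node: one task looping over its input, keeping positive values.
    let node = thread::spawn(move || {
        for morsel in rx_in {
            let kept: Vec<i64> = morsel.into_iter().filter(|&x| x > 0).collect();
            tx_out.send(kept).unwrap();
        }
    });

    // Feed morsels in; dropping tx_in closes the channel, shutting the node down.
    let feeder = thread::spawn(move || {
        for m in morsels {
            tx_in.send(m).unwrap();
        }
    });

    let out: Vec<Vec<i64>> = rx_out.iter().collect();
    feeder.join().unwrap();
    node.join().unwrap();
    out
}

fn main() {
    assert_eq!(run_filter(vec![vec![-1, 2, -3, 4]]), vec![vec![2, 4]]);
}
```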
Zip nodes instead wait on two futures. Pulls on input, pushes to output
Of note is that zipping is real hard with just pull or just push.
Serial-to-parallel requires a distributor, which is double-ended (one distributing to N). Uses a consume token to prevent task stealing [I think]. Global effect, not an actor model
#QConLondon
Parallel-to-serial requires a linearizer, which is so simple he explained it all while I was typing
Example of cumsum: each morsel is summed independently, and then map-summed with the previous morsel's last value. Possible to do without Async/Await, but adding the "streaming" constraint makes it really hard
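[Annotation: the per-morsel cumsum trick, as I understood it, in my own code — a local prefix sum per morsel, offset by a carry that is the running total from all previous morsels:]

```rust
// Each morsel gets a local prefix sum, offset by the carry from the
// previous morsel's last value. Result matches a single-pass cumsum.
fn cumsum_morsels(morsels: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let mut carry = 0;
    morsels
        .iter()
        .map(|m| {
            let mut acc = carry;
            let out: Vec<i64> = m.iter().map(|&x| { acc += x; acc }).collect();
            carry = acc; // last value of this morsel seeds the next
            out
        })
        .collect()
}

fn main() {
    // [1,2 | 3,4] -> [1,3 | 6,10]
    assert_eq!(cumsum_morsels(&[vec![1, 2], vec![3, 4]]), vec![vec![1, 3], vec![6, 10]]);
}
```

In the streaming version the per-morsel sums can run in parallel; only the carry hand-off is serial.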
Alternative architectures for fixed pipeline: fork-join [which I guess are not waitgroups?], divide-conquer, background processing, ad-hoc interleaving/chaining.
AA downsides: Deadlocks, more state/context, hard to mix sync and async, tooling isn't great (flat callstacks), hard onboarding
----
"AI Driven Game Creation", Danielle An,
#QConLondon
AI is changing how games are being made. Going to show demos of breakthroughs vibecoded in the last week. People will play the demos live. Then, all the new problems we got.
[Screen font is real small, may not be able to read everything]
Videogames are a bigger industry than film or music. AI is changing things: everybody is now vibecoding games. Concept art and 3D assets went from taking weeks to being incredibly cheap.
Live demo time! 4 person drawing game. AI/Gemini judged the drawings
[Took a while to work, demo had problems]
Live demo 2: multiplayer game where the NPCs are generated from the prompts, affecting model, physical attributes, personality, and abilities
Demo 3: turning MSPaint graphic maps into nicer looking levels
Demo 4: Crowd going to dynamically update the talk slides as it goes along [???]
[Screen vibrating like crazy]
New kind of game: where AI is integrated into game at all layers, making unique and unpredictable experiences. Raises new problems
Making games is extremely high risk. Now engineers are no longer blocked by artists, artists not blocked by programmers. Iteration speeds way up
#QConLondon
Small teams can bring out projects in days or weeks. But still a lot of hard work. Agents are nondeterministic, meaning they can't provide a consistent experience for players. An update to the LLM can introduce regressions if behavior differs.
Instead of `work -> ship`, now `work -> ship -> monitor -> work`
Vibecoding changes how teams work. When 20 people all vibing one codebase, individual features are cool and the end-to-end system is a mess. Engineers need to BOTH make highly parallel work happen, but still all integrate correctly at the end.
[Can't read slides at all, font too small]
LLMs add huge latency to player experience, which is not fun. LLMs add a lot more breakages to code. Need tons and tons of unit tests
Working in an AI-native way changes the team dynamic. Fewer blockers, faster iteration. Dissolves the division between "junior" and "senior" engineer in favor of AI-comfort
LLMs make code so cheap you can use duplication as a feature, not a bug.
Still need to make a scalable system, and make players and creators happy.
Surprising issue: engineers burning $30k on tokens.
Seeing a lot more prototyping of *board games*, interestingly enough
#QConLondon
----
"Automatically Retrofitting JIT Compilers",
@[email protected],
#QConLondon
About taking existing language implementations and automatically generating just-in-time compilers for improved performance.
Demoing a Mandelbrot in Lua, which takes 3.2 seconds, on standard impl
Created `yklua`, Lua with a JIT, and reran the same thing. Got 0.8 seconds, 4x faster.
"You can bet I cherrypicked this example rotten".
Now running micropython benchmark, 15 seconds. `ykmicropython`, which took about ten days of work, is 2x faster.
Definitions:
- VM: system with ≥1 language implementation
- Interpreter: "simple" language implementation
- JIT compiler: Impl that observes running program and figures out optimization.
Why this project? "People go from 'not caring about performance' to 'it's an existential crisis' in 24 hours"
Often you can eke out some extra performance by dropping in a faster language implementation. Pypy is 3-4x faster than CPython
...There are at least 16 JIT compilers for Python. Almost all are dead.
JITs are *hard*. And expensive. And often incompatible with mainstream implementations
#QConLondon
JITs are optimizations, so they have to embed assumptions about the language to make programs faster. So when the language evolves, the JIT gets left behind. LuaJIT is several versions behind standard Lua.
Can we automatically derive JITs?
Most such languages have C interpreters. That's the source of truth.