Welcome to day three of
#QConLondon! Doing last minute touchups for my talk at 10:30, so probably not going to see the keynote. Will livepost again once I'm back to watching talks
RE: https://bsky.app/profile/did:plc:rvlyeda73kxm7l2weegk73pa/post/3mhao6nupj22k
Ok it's done, gonna drink some tea and get ready. See you on the other side!
Wrote down every question I got as slide annotations and then managed to lose them all. Now trying to write them all down from memory
----
"Using Async/Await for Computational Scheduling", Orson Peters,
#QConLondon
Most people use async/await (AA) for I/O and networking, and that's what LLMs suggest you use it for. This talk is about using it for CPU-intensive work.
Async is effectively user-level cooperative multitasking.
There are several flavors of AA, also called coroutines/promises/futures. This talk's flavor is the "low-level" one found in Rust/C++/Zig, but not Python or Go (whose flavors are different).
Rust AA: Language support for Future trait, Waker type, async fn syntax. Executors define `spawn` and `poll`
A future needs a `poll`, which takes a waker and outputs a Poll enum {Ready(T), Pending}. Waker has an unsafe `new` and a wake function.
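[Annotation: my own minimal sketch of those pieces, not the speaker's code — a single-future executor ("block_on") built from just std's Future/Waker types, using thread park/unpark as the wake mechanism:]

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

// Waker that unparks the blocked thread when the future signals progress.
struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

// A minimal single-future executor: poll, and park until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            // poll returns the enum from the talk: Ready(T) or Pending.
            Poll::Ready(v) => return v,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    assert_eq!(block_on(async { 2 + 2 }), 4);
}
```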
`async fn` transforms fn from returning val to returning a future. Using example of `factorial`, which is a "bad idea only for demonstration"
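[Annotation: my reconstruction of the factorial demo, not the slides' code — same "bad idea, only for demonstration" idea, showing that the async fn's return value is inert until polled. The Noop waker and poll_once helper are mine:]

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Same body as a plain fn, but it now returns a Future<Output = u64>.
async fn factorial(n: u64) -> u64 {
    let mut acc: u64 = 1;
    for i in 2..=n {
        acc *= i;
    }
    acc
}

// A waker that does nothing, just to satisfy poll's signature.
struct Noop;
impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}

// Poll a future exactly once.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(Noop));
    let mut cx = Context::from_waker(&waker);
    let mut fut = std::pin::pin!(fut);
    fut.as_mut().poll(&mut cx)
}

fn main() {
    // factorial(5) runs nothing until polled; with no .await inside,
    // the very first poll drives it to completion.
    assert!(matches!(poll_once(factorial(5)), Poll::Ready(120)));
}
```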
Async/Await effectively abstracts away state machines.
By Rust convention, the executor has `spawn` as an entry point, which takes a future and returns a JoinHandle future. This provides parallelism. Can implement mutexes, semaphores, channels, barriers. Also waitgroups and joinsets
#QConLondon
[I think Waitgroups are fork-join and joinsets are "first-to-return"]
Demoing how to use AA to make a semaphore, and how it enables bespoke userspace scheduling. Contrast with threads, which require OS APIs.
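[Annotation: a sketch of what an async semaphore can look like in userspace — my code, not the talk's demo, and not production-grade (no cancellation handling). acquire() is a future that parks by storing its Waker; release() wakes one waiter:]

```rust
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Shared state: free permits plus the wakers of parked acquirers.
struct State {
    permits: usize,
    waiters: VecDeque<Waker>,
}

#[derive(Clone)]
struct Semaphore(Arc<Mutex<State>>);

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore(Arc::new(Mutex::new(State { permits, waiters: VecDeque::new() })))
    }
    // acquire() is a future; it resolves once a permit is available.
    fn acquire(&self) -> Acquire {
        Acquire(self.0.clone())
    }
    // release() returns a permit and wakes one parked waiter, if any.
    fn release(&self) {
        let waker = {
            let mut s = self.0.lock().unwrap();
            s.permits += 1;
            s.waiters.pop_front()
        };
        if let Some(w) = waker {
            w.wake();
        }
    }
}

struct Acquire(Arc<Mutex<State>>);

impl Future for Acquire {
    type Output = ();
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let mut s = self.0.lock().unwrap();
        if s.permits > 0 {
            s.permits -= 1;
            Poll::Ready(())
        } else {
            // No permit: park by storing our waker for release() to call.
            s.waiters.push_back(cx.waker().clone());
            Poll::Pending
        }
    }
}

// Drive two acquires by hand with a no-op waker to show the mechanics.
struct Noop;
impl Wake for Noop {
    fn wake(self: Arc<Self>) {}
}

fn demo() -> (bool, bool, bool) {
    let waker = Waker::from(Arc::new(Noop));
    let mut cx = Context::from_waker(&waker);
    let sem = Semaphore::new(1);
    let mut a = std::pin::pin!(sem.acquire());
    let first = a.as_mut().poll(&mut cx).is_ready(); // takes the only permit
    let mut b = std::pin::pin!(sem.acquire());
    let second = b.as_mut().poll(&mut cx).is_ready(); // false: parked
    sem.release(); // wakes the parked waiter
    let third = b.as_mut().poll(&mut cx).is_ready(); // now succeeds
    (first, second, third)
}

fn main() {
    assert_eq!(demo(), (true, false, true));
}
```

Note how none of this touches OS APIs: the parking and waking is all userspace, which is the point of the contrast with threads.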
PROBLEM: blocking the executor. If all threads are busy, I/O can't progress.
That problem is why people say not to mix AA and computation, and/or use `spawn_blocking`, which shunts the blocking work off the async threads until it's done.
This sucks! Easy solution: use two executors! One for I/O, one for CPU.
Used this approach for data processing @ polars (dataframe library)
For streaming operations (data flows in, processed, flows out), use Tokio for async I/O, and a custom executor for CPU scheduling. (Admittedly unnecessary: Tokio can spawn two executors) Fixed pipelines with channels, similar to an actor system.
#QConLondon
Example query: "get sales, parse dates to date types, get cumulative sum of sales, and filter by [weekday]?"
Splits work into low-priority (data inflow) and high-priority (thread-local queues). LIFO best for throughput, as it keeps things in cache. Threads can steal all-but-last tasks
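[Annotation: my toy illustration of the queue discipline described above, not Polars code — the owner pops LIFO from its local queue for cache locality, while thieves take from the other end and leave the last task alone. A plain VecDeque stands in for a real concurrent work-stealing deque:]

```rust
use std::collections::VecDeque;

// A thief takes from the front (oldest tasks), leaving the last
// (hottest-in-cache) task for the owning thread.
fn steal_all_but_last(local: &mut VecDeque<u32>) -> Vec<u32> {
    let n = local.len().saturating_sub(1);
    (0..n).map(|_| local.pop_front().unwrap()).collect()
}

fn main() {
    let mut local: VecDeque<u32> = VecDeque::from([1, 2, 3, 4]);
    assert_eq!(local.pop_back(), Some(4)); // owner pops LIFO
    assert_eq!(steal_all_but_last(&mut local), vec![1, 2]); // thief takes FIFO
    assert_eq!(local.pop_back(), Some(3)); // last task stayed local
}
```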
Data is split into small "morsels". In Polars a morsel is currently 100k rows
Elementwise nodes like filter and format are 1-task-per-thread that loops over input. Nodes are physically connected with 1-element channels.
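[Annotation: my sketch of that pipeline shape, not Polars code — one elementwise node (here a filter) looping over morsels between capacity-1 bounded channels, which give backpressure. Threads + std mpsc stand in for the talk's async tasks and async channels:]

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Push morsels through one elementwise filter node.
fn run_filter(morsels: Vec<Vec<i64>>) -> Vec<Vec<i64>> {
    // Capacity-1 channels: a node can only run ahead by one morsel.
    let (tx_in, rx_in) = sync_channel::<Vec<i64>>(1);
    let (tx_out, rx_out) = sync_channel::<Vec<i64>>(1);

    // The node: one task looping over its input, keeping positive values.
    let node = thread::spawn(move || {
        for morsel in rx_in {
            let kept: Vec<i64> = morsel.into_iter().filter(|&x| x > 0).collect();
            tx_out.send(kept).unwrap();
        }
    });

    // Feed morsels in; dropping tx_in closes the channel, shutting the node down.
    let feeder = thread::spawn(move || {
        for m in morsels {
            tx_in.send(m).unwrap();
        }
    });

    let out: Vec<Vec<i64>> = rx_out.iter().collect();
    feeder.join().unwrap();
    node.join().unwrap();
    out
}

fn main() {
    assert_eq!(run_filter(vec![vec![-1, 2, -3, 4]]), vec![vec![2, 4]]);
}
```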
Zip nodes instead wait on two futures. Pulls on input, pushes to output
Of note is that zipping is real hard with just pull or just push.
Serial-to-parallel requires a distributor, which is double-ended (one distributing to N). Uses a consume token to prevent task stealing [I think]. Global effect, not an actor model
#QConLondon
Parallel-to-serial requires a linearizer, which is so simple he explained it all while I was typing
Example of cumsum: each morsel is summed independently, and then map-summed with the previous morsel's last value. Possible to do without Async/Await, but adding the "streaming" constraint makes it really hard
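[Annotation: the per-morsel cumsum trick, as I understood it, in my own code — a local prefix sum per morsel, offset by a carry that is the running total from all previous morsels:]

```rust
// Each morsel gets a local prefix sum, offset by the carry from the
// previous morsel's last value. Result matches a single-pass cumsum.
fn cumsum_morsels(morsels: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let mut carry = 0;
    morsels
        .iter()
        .map(|m| {
            let mut acc = carry;
            let out: Vec<i64> = m.iter().map(|&x| { acc += x; acc }).collect();
            carry = acc; // last value of this morsel seeds the next
            out
        })
        .collect()
}

fn main() {
    // [1,2 | 3,4] -> [1,3 | 6,10]
    assert_eq!(cumsum_morsels(&[vec![1, 2], vec![3, 4]]), vec![vec![1, 3], vec![6, 10]]);
}
```

In the streaming version the per-morsel sums can run in parallel; only the carry hand-off is serial.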
Alternative architectures for fixed pipeline: fork-join [which I guess are not waitgroups?], divide-conquer, background processing, ad-hoc interleaving/chaining.
AA downsides: Deadlocks, more state/context, hard to mix sync and async, tooling isn't great (flat callstacks), hard onboarding
----
"AI Driven Game Creation", Danielle An,
#QConLondon
AI is changing how games are being made. Going to show demos of breakthroughs vibecoded in the last week. People will play the demos live. Then, all the new problems we got.
[Screen font is real small, may not be able to read everything]
Videogames are a bigger industry than film or music. AI is changing things: everybody is now vibecoding games. Concept art and 3D assets went from taking weeks to being incredibly cheap.
Live demo time! 4 person drawing game. AI/Gemini judged the drawings
[Took a while to work, demo had problems]
Live demo 2: multiplayer game where the NPCs are generated from the prompts, affecting model, physical attributes, personality, and abilities
Demo 3: turning MSPaint graphic maps into nicer looking levels
Demo 4: Crowd going to dynamically update the talk slides as it goes along [???]
[Screen vibrating like crazy]
New kind of game: where AI is integrated into game at all layers, making unique and unpredictable experiences. Raises new problems
Making games is extremely high risk. Now engineers are no longer blocked by artists, artists not blocked by programmers. Iteration speeds way up
#QConLondon
Small teams can bring out projects in days or weeks. But still a lot of hard work. Agents are nondeterministic, meaning they can't provide a consistent experience for players. An update to the LLM can introduce regressions if behavior differs.
Instead of `work -> ship`, now `work -> ship -> monitor -> work`
Vibecoding changes how teams work. When 20 people all vibing one codebase, individual features are cool and the end-to-end system is a mess. Engineers need to BOTH make highly parallel work happen, but still all integrate correctly at the end.
[Can't read slides at all, font too small]
LLMs add huge latency to player experience, which is not fun. LLMs add a lot more breakages to code. Need tons and tons of unit tests
Working in an AI-native way changes the team dynamic. Fewer blockers, faster iteration. Dissolves the division between "junior" and "senior" engineer in favor of AI-comfort
LLMs make code so cheap you can use duplication as a feature, not a bug.
Still need to make a scalable system, and make players and creators happy.
Surprising issue: engineers burning $30k on tokens.
Seeing a lot more prototyping of *board games*, interestingly enough
#QConLondon
----
"Automatically Retrofitting JIT Compilers",
@[email protected],
#QConLondon
About taking existing language implementations and automatically generating just-in-time compilers for improved performance.
Demoing a Mandelbrot in Lua, which takes 3.2 seconds, on standard impl
Created `yklua`, Lua with a JIT, and reran the same thing. Got 0.8 seconds, 4x faster.
"You can bet I cherrypicked this example rotten".
Now running micropython benchmark, 15 seconds. `ykmicropython`, which took about ten days of work, is 2x faster.
Definitions:
- VM: system with ≥1 language implementation
- Interpreter: "simple" language implementation
- JIT compiler: Impl that observes running program and figures out optimization.
Why this project? "People go from 'not caring about performance' to 'it's an existential crisis' in 24 hours"
Often you can eke out some extra performance by dropping in a faster language implementation. Pypy is 3-4x faster than CPython
...There are at least 16 JIT compilers for Python. Almost all are dead.
JITs are *hard*. And expensive. And often incompatible with mainstream implementations
#QConLondon
JITs are optimizations, so they have to embed assumptions about the language to make programs faster. So when the language evolves, the JIT gets left behind. LuaJIT is several versions behind standard Lua.
Can we automatically derive JITs?
Most such languages have C interpreters. That's the source of truth.