A future needs a `poll` method, which takes a waker and returns a `Poll` enum (`Ready(T)` or `Pending`). `Waker` has an unsafe `new` and a `wake` function.
`async fn` transforms a function from returning a value to returning a future. Example used: `factorial`, which he calls a "bad idea, only for demonstration".
Async/Await effectively abstracts away state machines.
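A minimal sketch of those pieces in plain std Rust, assuming nothing from the talk itself: a hand-written future polled to completion with a do-nothing waker built through the unsafe `RawWaker` API (the `CountDown` type and `noop_waker`/`poll_until_ready` helpers are illustrative names, not from the talk).

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// A toy future that reports Pending twice before becoming Ready.
struct CountDown(u32);

impl Future for CountDown {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.0 == 0 {
            Poll::Ready(42)
        } else {
            self.0 -= 1;
            // A real future would arrange for this waker to fire later;
            // here we "wake" immediately so the busy loop below repolls.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// A do-nothing Waker built via the unsafe RawWaker constructor.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

/// Poll the future to completion, returning (number of polls, output).
fn poll_until_ready() -> (u32, u32) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = CountDown(2);
    let mut fut = Pin::new(&mut fut); // CountDown is Unpin
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return (polls, v);
        }
    }
}

fn main() {
    let (polls, v) = poll_until_ready();
    println!("ready after {polls} polls: {v}");
}
```

The hand-rolled state machine in `CountDown` is exactly what `async fn` generates for you.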
By Rust convention, the executor's entry point is `spawn`, which takes a future and returns a `JoinHandle` future. This provides parallelism. On top of this you can implement mutexes, semaphores, channels, barriers, waitgroups, and joinsets.
[I think waitgroups are fork-join and joinsets are "first-to-return"]
Demoing how to use async/await ("AA") to make a semaphore, and how it enables bespoke userspace scheduling. Contrast with threads, which require OS APIs.
PROBLEM: blocking the executor. If all threads are busy, I/O can't progress.
That problem is why people say not to mix AA and computation, and/or to use `spawn_blocking`, which runs blocking work on separate threads so the async executor can keep making progress.
This sucks! Easy solution: use two executors! One for I/O, one for CPU.
Used this approach for data processing @ polars (dataframe library)
For streaming operations (data flows in, is processed, flows out), use Tokio for async I/O and a custom executor for CPU scheduling. (Admittedly unnecessary; Tokio can spawn two executors.) Fixed pipelines with channels, similar to an actor system.
Example query: "get sales, parse dates to date types, get the cumulative sum of sales, and filter by weekday"
Splits work into low-priority (data inflow) and high-priority (thread-local queues). LIFO best for throughput, as it keeps things in cache. Threads can steal all-but-last tasks
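A toy sketch of the LIFO-plus-stealing idea (the `LocalQueue` type and its method names are mine, not Polars'; real schedulers use lock-free deques such as crossbeam-deque rather than a plain `VecDeque`):

```rust
use std::collections::VecDeque;

/// A worker's thread-local task queue. The owner pushes and pops at the
/// back (LIFO, keeping recently touched data hot in cache); thieves take
/// from the front, leaving the most recently pushed task for the owner.
struct LocalQueue<T> {
    tasks: VecDeque<T>,
}

impl<T> LocalQueue<T> {
    fn new() -> Self {
        Self { tasks: VecDeque::new() }
    }

    fn push(&mut self, task: T) {
        self.tasks.push_back(task);
    }

    /// Owner takes the most recently pushed task (LIFO).
    fn pop(&mut self) -> Option<T> {
        self.tasks.pop_back()
    }

    /// A thief takes everything except the last task.
    fn steal_all_but_last(&mut self) -> Vec<T> {
        let keep_from = self.tasks.len().saturating_sub(1);
        self.tasks.drain(..keep_from).collect()
    }
}

fn main() {
    let mut q = LocalQueue::new();
    for task in [1, 2, 3] {
        q.push(task);
    }
    let stolen = q.steal_all_but_last(); // thief gets the older tasks
    let own = q.pop();                   // owner keeps the hottest task
    println!("stolen: {stolen:?}, owner pops: {own:?}");
}
```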
Data is split into small "morsels". In Polars a morsel is currently 100k rows.
Elementwise nodes like filter and format run one task per thread, each looping over its input. Nodes are physically connected with 1-element channels.
Zip nodes instead wait on two futures: pull on input, push to output.
Of note: zipping is really hard with just pull or just push.
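A sketch of a zip node using OS threads and bounded 1-element channels in place of async tasks (the `zip_pipeline` function and channel layout are mine for illustration; Polars' real implementation uses async tasks and its own channels). The zip task waits on both inputs at once, which is exactly what is awkward to express in a pure-pull or pure-push design:

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Two producer nodes feed a zip node through 1-element channels; the zip
/// node takes one item from each input, pairs them, and pushes the pair
/// downstream. Stops when either input runs dry.
fn zip_pipeline(a: Vec<i64>, b: Vec<i64>) -> Vec<(i64, i64)> {
    let (tx_a, rx_a) = sync_channel(1);
    let (tx_b, rx_b) = sync_channel(1);
    let (tx_out, rx_out) = sync_channel(1);

    let prod_a = thread::spawn(move || {
        for x in a {
            // Ignore send errors: downstream may have shut down early.
            let _ = tx_a.send(x);
        }
    });
    let prod_b = thread::spawn(move || {
        for y in b {
            let _ = tx_b.send(y);
        }
    });
    let zip = thread::spawn(move || {
        // Wait on both inputs; a recv error means that input is finished.
        while let (Ok(x), Ok(y)) = (rx_a.recv(), rx_b.recv()) {
            if tx_out.send((x, y)).is_err() {
                break;
            }
        }
    });

    let out: Vec<(i64, i64)> = rx_out.into_iter().collect();
    prod_a.join().unwrap();
    prod_b.join().unwrap();
    zip.join().unwrap();
    out
}

fn main() {
    println!("{:?}", zip_pipeline(vec![1, 2, 3], vec![10, 20, 30]));
}
```

The 1-element channels give backpressure: a fast producer blocks until the zip node has consumed its previous morsel.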
Serial-to-parallel requires a distributor, which is double-ended (1 producer distributing to N consumers). Uses a consume token to prevent task stealing [I think]. Global effect, not an actor model.
Parallel-to-serial requires a linearizer, which is so simple he explained it all while I was typing.
Example of cumsum: each morsel is summed independently, then offset by the previous morsel's last value. Possible to do without async/await, but adding the "streaming" constraint makes it really hard.
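The carry logic can be sketched like this (serial for clarity; in the real pipeline each morsel's internal prefix sum runs as an independent task, and only the carried offset is a serial dependency; `streaming_cumsum` is an illustrative name, not Polars' API):

```rust
/// Cumulative sum over a stream of morsels: prefix-sum each morsel
/// independently, then offset it by the running total carried over from
/// the previous morsel's last element.
fn streaming_cumsum(morsels: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let mut carry = 0i64;
    morsels
        .iter()
        .map(|morsel| {
            // Prefix sum within the morsel, starting from the carry.
            let out: Vec<i64> = morsel
                .iter()
                .scan(carry, |acc, &x| {
                    *acc += x;
                    Some(*acc)
                })
                .collect();
            if let Some(&last) = out.last() {
                carry = last;
            }
            out
        })
        .collect()
}

fn main() {
    let morsels = vec![vec![1, 2, 3], vec![4, 5], vec![6]];
    // [1, 3, 6], [10, 15], [21]
    println!("{:?}", streaming_cumsum(&morsels));
}
```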
Alternative architectures for fixed pipelines: fork-join [which I guess is not waitgroups?], divide-and-conquer, background processing, ad-hoc interleaving/chaining.
AA downsides: deadlocks, more state/context, hard to mix sync and async, tooling isn't great (flat call stacks), hard onboarding.
----
"AI Driven Game Creation", Danielle An,
#QConLondon
AI is changing how games are being made. Going to show demos of breakthroughs vibecoded in last week. People will play demo live. Then, all the new problems we got.
[Screen font is real small, may not be able to read everything]
Video games are a bigger industry than film or music. AI is changing things: everybody is now vibecoding games. Concept art and 3D assets went from taking weeks to being incredibly cheap.
Live demo time! 4 person drawing game. AI/Gemini judged the drawings
[Took a while to work, demo had problems]
Live demo 2: multiplayer game where the NPCs are generated from the prompts, affecting model, physical attributes, personality, and abilities
Demo 3: turning MSPaint graphic maps into nicer looking levels
Demo 4: Crowd going to dynamically update the talk slides as it goes along [???]
[Screen vibrating like crazy]
New kind of game: where AI is integrated into game at all layers, making unique and unpredictable experiences. Raises new problems
Making games is extremely high-risk. Now engineers are no longer blocked by artists, and artists no longer blocked by programmers. Iteration.
Small teams can bring out projects in days or weeks. But it's still a lot of hard work. Agents are nondeterministic, meaning they can't provide a consistent experience for players. An update to the underlying LLM can introduce regressions if its behavior differs.
Instead of `work -> ship`, now `work -> ship -> monitor -> work`
Vibecoding changes how teams work. When 20 people are all vibing one codebase, individual features are cool but the end-to-end system is a mess. Engineers need to BOTH make highly parallel work happen AND still integrate it all correctly at the end.
[Can't read slides at all, font too small]
LLMs add huge latency to player experience, which is not fun. LLMs add a lot more breakages to code. Need tons and tons of unit tests
Working in an AI-native way changes the team dynamic. Fewer blockers, faster iteration. Dissolves the division between "junior" and "senior" engineers in favor of AI-comfort.
LLMs make code so cheap you can use duplication as a feature, not a bug.
Still need to make a scalable system, and make players and creators happy.
Surprising issue: engineers burning $30k on tokens.
Seeing a lot more prototyping of *board games*, interestingly enough
----
"Automatically Retrofitting JIT Compilers",
@[email protected],
#QConLondon
About taking existing language implementations and automatically generating just-in-time compilers for improved performance.
Demoing a Mandelbrot in Lua, which takes 3.2 seconds, on standard impl
Created `yklua`, Lua with a JIT, and reran the same thing. Got 0.8 seconds, 4x faster.
"You can bet I cherrypicked this example rotten".
Now running micropython benchmark, 15 seconds. `ykmicropython`, which took about ten days of work, is 2x faster.
Definitions:
- VM: system with ≥1 language implementation
- Interpreter: a "simple" language implementation
- JIT compiler: an implementation that observes the running program and figures out optimizations.
Why this project? "People go from 'not caring about performance' to 'it's an existential crisis' in 24 hours"
Often you can eke out some extra performance by dropping in a faster language implementation. PyPy is 3-4x faster than CPython.
...There are at least 16 JIT compilers for Python. Almost all are dead.
JITs are *hard*. And expensive. And often incompatible with mainstream implementations
JITs are optimizations, so they have to embed assumptions about the language to make it faster. So when the language evolves, the JIT gets left behind. LuaJIT is several versions behind standard Lua.
Can we automatically derive JITs?
Most such languages have C interpreters. That's the source of truth.
The specification effectively becomes "the JIT must have the same semantics as the C interpreter" in order to be compatible.
So we got to "generate a meta-tracing JIT compiler from C interpreters".
"Meta makes me nervous. I have to understand the thing and the metathing"
Tracing: manually record hot loops at run-time
Meta-tracing: record the interpreter executing loops at run-time.
"This is so weird I will look at this from a couple of different directions and hope one makes sense to you."
1: C is AOT (ahead of time) compiled, with ykllvm, to make an executable.
Then at runtime, if a loop in the interpreter happens enough to become "hot", trace it. Start compiling it to get a machine-code version. When done, hand it back to the interpreter. At some point it might need to "decompile".
2: interpreter is a while loop with a `switch GET_OPCODE`
Say we have `OP_JLE` (jump if less than or equal to 0). If a hot loop has `if x { y += 3 }`, tracing saves the opcodes LOOKUP(x), JEQ, LOOKUP(y), etc. Meta-tracing would instead save the C code that processes those opcodes (I think?).
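A toy dispatch loop of that shape, with an opcode-level trace recorder bolted on (the bytecode and the `run` function are invented for illustration; a tracing JIT records roughly this opcode stream, while a meta-tracing JIT records the interpreter's own operations as it handles each opcode):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    LoadConst(i64),   // push a constant
    AddConst(i64),    // add a constant to the top of stack
    JumpIfPos(usize), // jump if top of stack > 0
    Halt,
}

/// The classic interpreter shape: a loop around a `match` on the opcode.
/// Returns the final top-of-stack value plus the recorded opcode trace.
fn run(prog: &[Op]) -> (i64, Vec<Op>) {
    let mut stack: Vec<i64> = Vec::new();
    let mut trace: Vec<Op> = Vec::new(); // what a tracing JIT would record
    let mut pc = 0;
    loop {
        let op = prog[pc];
        trace.push(op);
        match op {
            Op::LoadConst(c) => {
                stack.push(c);
                pc += 1;
            }
            Op::AddConst(c) => {
                *stack.last_mut().unwrap() += c;
                pc += 1;
            }
            Op::JumpIfPos(target) => {
                pc = if *stack.last().unwrap() > 0 { target } else { pc + 1 };
            }
            Op::Halt => return (stack.pop().unwrap(), trace),
        }
    }
}

fn main() {
    // x = 3; while x > 0 { x -= 1 }
    let prog = [Op::LoadConst(3), Op::AddConst(-1), Op::JumpIfPos(1), Op::Halt];
    let (x, trace) = run(&prog);
    println!("x = {x}, trace of {} opcodes", trace.len());
}
```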
[Technical aspects of how tracing is actually done in yk]
So how does yk optimize a program?
1. Inlining
2. Standard(ish) compiler optimisations
3. Interpreter hints, like "this function is idempotent". Hints like these are why yklua was so much faster than lua
Now looking at Lua's OP_ADDI. Take a program that increments by 64, 500k times.
After running the program, look at the trace. Normally Lua would do `x += 64` as three opcode instructions; yklua converted it to a single x64 instruction.
How did it work? Because we added an interpreter annotation that 64 never changed.
Tricky problem: how do we get out of the optimized hot loop?
I.e., getting back from native instructions to AOT code. Involves things like safepoints, stackmaps, and shadow stacks. The goal is to abstract away "being in AOT" vs "being in JIT"; it's transparent.
Problem: the shadow stack is overhead, and stackmaps are a "less loved" feature of LLVM, not as well supported.
now a weird JIT trap:
```
while i > 0 do
x = ...
i = i - 1
end
print(x)
```
If `i` starts below 0, it's possible to start tracing but then never stop, and trace way past the useful point.
yk is not production-ready, but it's way past research-project level. So what's next?
1. More LLVM IR
2. More Optimizations
3. More interpreters (WIP micropython, hopefully CPython soon)
www.github.com/ykjit/yk
----
Closing keynote: "The Free-Lunch Guide to Idea Circularity",
@[email protected],
#QConLondon
In 1858, the Thames was an open-air sewer, and a hot summer led to the "Great Stink". Parliament couldn't use their new building. Invested a lot of money in embankments and pumping stations.
The fundamental problem was a scaling problem: the Thames didn't have enough throughput ("pooput") to carry away the waste.
This was a consequence of a previous architectural decision: eliminating London's cesspits (a pit next to the house to collect waste). The city added plumbing from houses straight to the Thames.
"It was a fundamental tradeoff between centralized stink and disease, and distributed stink and disease"
These kinds of "good ideas that lead to bad consequences" keep coming around, because there's a constant tradeoff between optimizing for the short term and being sustainable.
This matters because the digital world creates more carbon emissions than aviation. Data centers (excluding network traffic) use about as much electricity as South Korea.
Green energy helps, but it can't be the whole solution. We also need to reduce tech energy consumption
Two topics:
1. LightSwitchOps (LSO)
2. Efficient Software
LSO: Architect things to be turned off and on often.
Efficiency: works on Quarkus, a Java runtime with higher throughput and a much lower carbon footprint.
How does it work? Java normally loves delaying stuff to runtime with duck-typing.
An extremely dynamic runtime (with static types) makes sense on local computers with changing environments, but that's not how the cloud works: containers are fixed environments.
We shifted hard from monoliths to distributed microservices architectures. More resilient, but higher latency and more complex.
Now we're seeing enormous investments in centralized AI. Apple, instead, is licensing AIs for "a fraction of the cost it would take to run a datacenter" and putting AI capabilities in its hardware. Then it sells that hardware to us and we run AI computations on our decentralized hardware.
AWS Prime Video reduced costs by 90% by moving from microservices to a monolith. "What does it tell us? The lunch was not free."
"Hype is a necessary ingredient of the current business ecosystem of the tech industry." - Meredith Whittaker
AI leaders say insane things about AI to attract VC money. Goal of most companies is to get to an exit and cash out.
"What attracts investment? You'd think stability, profitability, revenue." But it's actually growth and excitement.
That's why AI promises a "world without developers"