GDC 2015: "Automated Testing and Instant Replays in Retro City Rampage" by Brian Provinciano of Vblank Entertainment https://gdcvault.com/play/1021825/Automated-Testing-and-Instant-Replays

The gist of this presentation was that the speaker got a lot of mileage out of recording button inputs for record/replay functionality. They spent most of their time describing things you could do with it:

- Automated QA
- Automated testing
- Kiosk mode
- Bug reproducibility
- "Ghost mode" for racing games
- Benchmarking
- Trailer creation
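
To make the core idea concrete, here is a minimal sketch of input record/replay, assuming a toy game whose state depends only on the buttons pressed each frame. The `Game` class, the button constants, and the `record`/`replay` helpers are all hypothetical names for illustration, not anything shown in the talk.

```python
from dataclasses import dataclass

# Hypothetical button bitmask; one small integer per frame is the whole recording.
LEFT, RIGHT, JUMP = 1, 2, 4


@dataclass
class Game:
    """Toy deterministic game: state changes only in response to inputs."""
    x: int = 0
    score: int = 0

    def step(self, buttons: int) -> None:
        if buttons & LEFT:
            self.x -= 1
        if buttons & RIGHT:
            self.x += 1
        if buttons & JUMP:
            self.score += self.x


def record(game: Game, live_inputs) -> list:
    """Feed live inputs to the game, keeping a copy of each frame's buttons."""
    recording = []
    for buttons in live_inputs:
        recording.append(buttons)
        game.step(buttons)
    return recording


def replay(recording) -> Game:
    """Start from a fresh game and feed it the recorded buttons frame by frame."""
    game = Game()
    for buttons in recording:
        game.step(buttons)
    return game
```

Because the toy game is deterministic, `replay(recording)` ends in exactly the state the live session reached, and the recording costs one small integer per frame.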

1/4

They spent a few minutes discussing determinism: this approach only works if your engine is deterministic. They listed a few techniques to improve determinism: use explicit random sequence algorithms (rather than rand()), use fixed-point instead of floating-point math, don't rely on OS audio callbacks but instead assume the callback fires when you expect it to (which to me sounds like a bad idea, but ok), use lookup tables instead of transcendental functions, etc.
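
Three of those techniques can be sketched in a few lines. This is an illustration under my own assumptions, not the speaker's code: the `Lcg` class, the 16.16 fixed-point layout, and the 256-entry sine table are hypothetical choices, and the generator constants are the commonly cited Numerical Recipes ones.

```python
import math


class Lcg:
    """Explicit linear congruential generator: same seed, same sequence."""

    def __init__(self, seed: int) -> None:
        self.state = seed & 0xFFFFFFFF

    def next_u32(self) -> int:
        # 32-bit LCG step; the result depends only on the previous state.
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state


FP_SHIFT = 16            # assumed 16.16 fixed-point format
FP_ONE = 1 << FP_SHIFT   # the value 1.0 in fixed point


def fp_mul(a: int, b: int) -> int:
    """Multiply two 16.16 fixed-point numbers using integer math only."""
    return (a * b) >> FP_SHIFT


# Sine lookup table over 256 angle steps per full turn. It is built with
# math.sin here for brevity; a shipped game would more likely bake the table
# into the build as integer constants so no floating point runs at all.
SIN_TABLE = [round(math.sin(2 * math.pi * i / 256) * FP_ONE) for i in range(256)]


def fp_sin(angle: int) -> int:
    """Sine of an angle given in 1/256ths of a turn, as 16.16 fixed point."""
    return SIN_TABLE[angle & 255]
```

The point of all three is the same: every result comes from integer arithmetic on explicit state, so the same inputs give the same outputs on every run.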

2/4

They left my biggest issue unaddressed, though: doesn't every modification you make to the game cause all of your recordings to become meaningless? A recording is only a list of inputs, so once the game logic changes, the same inputs no longer produce the same playthrough. The speaker mentioned that each recording can indicate the version of the game it was made with, and the live game can then simulate bugs that were in that previous version... but this approach doesn't sound scalable to me.
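
A rough sketch of what version-tagged recordings might look like, assuming a made-up game where version 1 shipped with the wrong jump height and version 2 fixed it. Everything here (`Recording`, `jump_height`, the version numbers) is hypothetical and only meant to show the shape of the approach.

```python
from dataclasses import dataclass

CURRENT_VERSION = 2  # hypothetical build number of the live game


@dataclass(frozen=True)
class Recording:
    version: int    # build the recording was made with
    inputs: tuple   # one bool per frame: was jump pressed?


def jump_height(version: int) -> int:
    # Hypothetical bug: version 1 used a jump height of 3, fixed to 4 in version 2.
    # The old value has to stay in the code so old recordings still replay.
    return 3 if version < 2 else 4


def simulate(inputs, version: int = CURRENT_VERSION) -> int:
    """Run the toy game and return the total height gained."""
    height = 0
    for pressed in inputs:
        if pressed:
            height += jump_height(version)
    return height


def replay(recording: Recording) -> int:
    """Replay with the behaviour of the build that made the recording."""
    return simulate(recording.inputs, recording.version)
```

Every behaviour change that affects the simulation needs another branch like the one in `jump_height`, which is exactly the part that worries me about scaling this.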

3/4

I guess this system becomes more useful near the end of the project, when you aren't messing with the game logic much.

Review: 4/10. IMO, the "record button presses" approach works for smaller games/engines, but falls down in larger games/engines for practicality reasons.

There have been previous GDC presentations that describe a more semantic record/replay system (for testing MMOs), where each subsystem has an API, consisting of the nouns and verbs that the subsystem interacts with.
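
For contrast, here is a minimal sketch of what a semantic recording could look like, assuming each recorded entry is a noun/verb command rather than a button press. This is my own illustration, not code from any of those presentations; `Command`, `World`, and the verb names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    noun: str          # what is acted on, e.g. "player"
    verb: str          # what is done to it, e.g. "move_to"
    args: tuple = ()   # verb-specific arguments


class World:
    """Toy subsystem whose only entry point is execute(Command)."""

    def __init__(self) -> None:
        self.positions = {}
        self.log = []  # the recording: every command that was executed
        # Explicit verb table: the subsystem's public API.
        self._verbs = {"spawn": self._place, "move_to": self._place}

    def execute(self, cmd: Command) -> None:
        self.log.append(cmd)
        self._verbs[cmd.verb](cmd.noun, *cmd.args)

    def _place(self, noun: str, x: int, y: int) -> None:
        self.positions[noun] = (x, y)


def replay(log) -> World:
    """Rebuild a world by re-executing recorded commands in order."""
    world = World()
    for cmd in log:
        world.execute(cmd)
    return world
```

A log like this records intent ("move the player to (5, 2)") rather than raw input, so presumably it has a better chance of surviving changes to controls or movement code than a list of button presses does.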

(I realize that maintaining a separate API is a substantial amount of additional work, but I actually think that for larger projects you're going to want this anyway, both for encapsulation and for testing. So I guess that's another reason why I think the button-based record/replay system works better for smaller games than larger ones: the smaller the game, the larger the proportional cost of building a semantic API.)

From what I hear, though, a counterexample to my argument is Starcraft 2, which uses the button record/replay system. Apparently they actually ship multiple versions of the Starcraft 2 engine inside the game, so that old recordings can be replayed with the old engine. Which seems kind of crazy to me. But they are a very successful shipping game! So, what do I know, right?

@GDCPresoReviews well the executable with all the game code in it is a pathetically minuscule part of the game files so there might not be that much overhead to doing it that way 🤔

@Foxwarrior yeah, I’m mostly thinking about cognitive burden on the development team rather than runtime performance though