Wandering down a side road that me to "the original paper" for a topic:

https://blog.zarfhome.com/2025/02/wanderings-diff

People should do this more often, is my conclusion.

Wanderings in the realm of diff

So I was thinking about a text game idea where paragraphs of text change between a handful of possibilities as you make choices. Sort of a cycling choices effect. But maybe instead of just replacing one block of text with another, I could do ...

Zarf Updates

@zarfeblong A different off-the-shelf option (and a different algorithm to explore, too, in its documentation), if you stick with Python, Python is one of the few languages I know with a generalized diff tool in the standard library: https://docs.python.org/3/library/difflib.html

(Years ago I glued pygments and difflib together to prove to my own satisfaction that diffs of syntax highlighting token streams are better character-level diffs in speed and usefulness. https://github.com/WorldMaker/tokdiff)

difflib — Helpers for computing deltas

Source code: Lib/difflib.py This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce information about file differences i...

Python documentation

@max Thanks, I had forgotten about that.

I am definitely not sticking with Python -- if this turns into an actual game, it will be on iOS or Unity or Godot or something, in whichever native language. But more examples is good, of course.

@zarfeblong The 80s paper the documentation comments in difflib links to are also useful because it’s a hash-based algorithm which makes it really to implement with all sorts of crazy hashable data structures and also most of the good move-detection algorithms I’ve seen are hash-based. Hash-based approaches trade off that robustness by being generally “slower” in O-notation, but not necessarily in practice, esp. with move detection.
@max @zarfeblong oh that might be useful for another purpose -- we need tree-oriented diff algorithms for HTML, and I'm guessing that having hashable subtrees might make that work. I'll have to check that out. (I've been surprised that tree-oriented diff algorithms aren't more common, but maybe it's just my literature search skills which are lacking.)

@cscott @zarfeblong Back when I was playing with Python’s difflib I think I also did a spike turning trees into a tuple stream something like (parent, contents) and the output was fine enough.

I arrived at token stream diffing for a number of reasons that “well formed ASTs/trees” were an interesting red herring to good source control diffs. You generally want to be able to source control work in progress as well and syntax highlighting token streams are good at that.