Wandering down a side road that me to "the original paper" for a topic:

https://blog.zarfhome.com/2025/02/wanderings-diff

People should do this more often, is my conclusion.

Wanderings in the realm of diff

So I was thinking about a text game idea where paragraphs of text change between a handful of possibilities as you make choices. Sort of a cycling choices effect. But maybe instead of just replacing one block of text with another, I could do ...

Zarf Updates
Academics in my orbit will blink at me like superb owls, but for the average code monkey, it's not so obvious.
@zarfeblong this practice was hammered into us in my history and classics coursework and I still do it all the time. it's just so useful!
@zarfeblong you might find the references here interesting (besides Myers someone called Heckel from 1978). I got there because Apple’s UIKit has a diffing algorithm that handles moves, but it’s not open-source, and googling for replacements turned this one up.
@zarfeblong oh dear. Adding the url and not just saying “here” helps: https://github.com/ra1028/DifferenceKit
GitHub - ra1028/DifferenceKit: 💻 A fast and flexible O(n) difference algorithm framework for Swift collection.

💻 A fast and flexible O(n) difference algorithm framework for Swift collection. - ra1028/DifferenceKit

GitHub

@zarfeblong SCCS hit 50 years of age recently and there's a paper from the original author on the "weave" format here:
https://www.mrochkind.com/mrochkind/docs/SCCSretro2.pdf

There's a lot of discussion of this on the Unix History mailing list last December: https://www.tuhs.org/pipermail/tuhs/2024-December/ - I know this isn't what you are doing but to my superficial view this appears to relate.

@zarfeblong A different off-the-shelf option (and a different algorithm to explore, too, in its documentation), if you stick with Python, Python is one of the few languages I know with a generalized diff tool in the standard library: https://docs.python.org/3/library/difflib.html

(Years ago I glued pygments and difflib together to prove to my own satisfaction that diffs of syntax highlighting token streams are better character-level diffs in speed and usefulness. https://github.com/WorldMaker/tokdiff)

difflib — Helpers for computing deltas

Source code: Lib/difflib.py This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce information about file differences i...

Python documentation

@max Thanks, I had forgotten about that.

I am definitely not sticking with Python -- if this turns into an actual game, it will be on iOS or Unity or Godot or something, in whichever native language. But more examples is good, of course.

@zarfeblong The 80s paper the documentation comments in difflib links to are also useful because it’s a hash-based algorithm which makes it really to implement with all sorts of crazy hashable data structures and also most of the good move-detection algorithms I’ve seen are hash-based. Hash-based approaches trade off that robustness by being generally “slower” in O-notation, but not necessarily in practice, esp. with move detection.
@max @zarfeblong oh that might be useful for another purpose -- we need tree-oriented diff algorithms for HTML, and I'm guessing that having hashable subtrees might make that work. I'll have to check that out. (I've been surprised that tree-oriented diff algorithms aren't more common, but maybe it's just my literature search skills which are lacking.)

@cscott @zarfeblong Back when I was playing with Python’s difflib I think I also did a spike turning trees into a tuple stream something like (parent, contents) and the output was fine enough.

I arrived at token stream diffing for a number of reasons that “well formed ASTs/trees” were an interesting red herring to good source control diffs. You generally want to be able to source control work in progress as well and syntax highlighting token streams are good at that.