Wandering down a side road that led me to "the original paper" for a topic:
https://blog.zarfhome.com/2025/02/wanderings-diff
People should do this more often, is my conclusion.
@zarfeblong SCCS hit 50 years of age recently and there's a paper from the original author on the "weave" format here:
https://www.mrochkind.com/mrochkind/docs/SCCSretro2.pdf
There was a lot of discussion of this on the Unix History mailing list last December: https://www.tuhs.org/pipermail/tuhs/2024-December/ . I know this isn't what you are doing, but to my superficial view it seems related.
@zarfeblong If you stick with Python, here's a different off-the-shelf option (with a different algorithm to explore in its documentation, too). Python is one of the few languages I know of with a generalized diff tool in the standard library: https://docs.python.org/3/library/difflib.html
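For anyone who hasn't used it, here is a minimal sketch of the two difflib entry points that matter most here. The sample text is made up for illustration. `unified_diff` produces patch-style output, while `SequenceMatcher` works on any sequences of hashable items, not just lines of text. Per the docs, its algorithm is a refinement of Ratcliff and Obershelp's "gestalt pattern matching" rather than the Myers algorithm most diff tools use.

```python
import difflib

# Two versions of a short text, split into lines. keepends=True keeps the
# "\n" on each line so the unified diff output joins cleanly.
old = "the cat sat\non the mat\n".splitlines(keepends=True)
new = "the cat sat\non the red mat\n".splitlines(keepends=True)

# unified_diff yields the familiar "---", "+++", "@@" patch lines.
patch = list(difflib.unified_diff(old, new, fromfile="old", tofile="new"))
print("".join(patch))

# SequenceMatcher accepts any sequences of hashable items; get_opcodes()
# describes how to turn `a` into `b` as equal/replace/insert/delete spans.
sm = difflib.SequenceMatcher(a=old, b=new)
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, old[i1:i2], new[j1:j2])
```

Because `SequenceMatcher` is item-agnostic, the same call works on tokens or tuples, which is what makes it usable beyond plain text.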
(Years ago I glued pygments and difflib together to prove to my own satisfaction that diffs of syntax highlighting token streams are better than character-level diffs in speed and usefulness. https://github.com/WorldMaker/tokdiff)
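A rough sketch of the token-stream idea, not tokdiff's actual code: to stay dependency-free it swaps pygments for the standard library's `tokenize` module, so it only handles Python source. The function names `tokens` and `token_diff` are hypothetical. Each token becomes a hashable `(type name, text)` pair, and difflib diffs the token lists instead of characters.

```python
import difflib
import io
import token
import tokenize

# Layout-only tokens that would add noise to the diff (an assumption;
# a real tool might want to keep some of them).
SKIP = {token.NEWLINE, token.NL, token.INDENT, token.DEDENT, token.ENDMARKER}

def tokens(source):
    """Return (token type name, text) pairs for Python source."""
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type not in SKIP
    ]

def token_diff(old_src, new_src):
    """Yield (tag, old_tokens, new_tokens) for each changed region."""
    a, b = tokens(old_src), tokens(new_src)
    sm = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag != "equal":
            yield tag, a[i1:i2], b[j1:j2]

old = "total = price * qty\n"
new = "total = price * quantity + tax\n"
for change in token_diff(old, new):
    print(change)
```

Here the change comes out as whole tokens (`qty` replaced by `quantity + tax`) rather than scattered character edits inside identifiers.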
@max Thanks, I had forgotten about that.
I am definitely not sticking with Python -- if this turns into an actual game, it will be on iOS or Unity or Godot or something, in whichever native language. But more examples is good, of course.
@cscott @zarfeblong Back when I was playing with Python’s difflib, I think I also did a spike turning trees into a tuple stream, something like (parent, contents), and the output was fine enough.
I arrived at token stream diffing for a number of reasons, one being that “well-formed ASTs/trees” turned out to be an interesting red herring for good source control diffs. You generally want to be able to put work in progress under source control as well, and syntax highlighting token streams are good at that.
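The tree-to-tuple-stream spike might look roughly like this. It is a guess at the shape, not the original code: the `(label, children)` node format and the `flatten`/`tree_diff` names are hypothetical. A pre-order walk emits `(parent, contents)` tuples, and difflib diffs the resulting flat lists.

```python
import difflib

def flatten(node, parent=None):
    """Pre-order walk of a (label, children) tree into (parent, label) tuples."""
    label, children = node
    out = [(parent, label)]
    for child in children:
        out.extend(flatten(child, label))
    return out

def tree_diff(old_tree, new_tree):
    """Return (tag, old_tuples, new_tuples) for each non-equal region."""
    a, b = flatten(old_tree), flatten(new_tree)
    sm = difflib.SequenceMatcher(a=a, b=b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]

old_tree = ("room", [("table", [("lamp", [])]), ("door", [])])
new_tree = ("room", [("table", [("lamp", []), ("book", [])]), ("door", [])])
print(tree_diff(old_tree, new_tree))
```

One caveat with this design: using the parent's label as context gets ambiguous when labels repeat, so a real version would probably need a path or node ID instead.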