I haven't seen anybody mention it or even notice it, like it's just the water we swim in now, but this month marks the fiftieth anniversary of the release of what would become a seminal piece of social software, arguably the single most important one ever created.

Written by Douglas McIlroy and James Hunt and released with the 5th Edition of Unix this month in 1974: diff.

https://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s1/diff1.c

My friend @gvwilson has argued, and I am absolutely ready to believe, that you can divide the entire computational universe into "has diff and patch or doesn't", and that people living without it don't even have the language to recognize how bad they've got it, how many opportunities to share and collaborate have been silently denied them.

Word processors, spreadsheets, slides? "Track changes" is _trash_ by comparison. No programmer would consent to live the way we make office workers live.

I wrote a bit more about it here, but:

https://exple.tive.org/blarg/2024/06/14/fifty-years-of-diff-and-merge/

I have a theory that you can tell if a technology belongs to the people if two strangers can exchange patches, needing nobody's consent but their own.

Licensing, infrastructure, everything else is subordinate to this. If you can't do this, that technology is proprietary.

Fifty Years Of Diff | blarg

@mhoye @gnomon This only works if the strangers understand the technology deeply; otherwise it will quickly be overrun by bad actors.

Almost no one deeply understands all the tech they use on a daily basis - that would require a lifetime of study. This would suggest that there is an important place for proprietary technology, no?

@mhoye this is a really neat rubric. we like it a lot. we wouldn't make it our sole pillar personally, but it's a really important one.
@mhoye @gvwilson I'm sure you're both familiar with Ink & Switch who are trying to solve this problem of “Version control for everything”: https://www.inkandswitch.com/patchwork/notebook/
Patchwork: Version control for everything

Patchwork is a research project about version control software for writers, developers, and other creatives. This lab notebook contains snippets of our prototypes and findings.

@wjt @mhoye @gvwilson was going to share the same! The local-first movement and the use of CRDTs give you structured data and merge, and merge just works because it's semantic: often no reconciliation is necessary.
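Here is a minimal sketch of what "merge just works" can mean. It assumes the simplest kind of structured data: a last-writer-wins map, which is one of the textbook state-based CRDTs. It is a Python illustration, not how Patchwork or any particular local-first library represents documents. The class and method names (`LWWMap`, `put`, `merge`, `value`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """Toy last-writer-wins map (hypothetical): each key keeps its highest-stamped write."""
    replica: str
    clock: int = 0
    entries: dict = field(default_factory=dict)  # key -> (clock, replica, value)

    def put(self, key, value):
        # Advance this replica's logical clock and stamp the write with it.
        self.clock += 1
        self.entries[key] = (self.clock, self.replica, value)

    def merge(self, other):
        # For every key, keep whichever write has the larger (clock, replica) stamp.
        # Tuple comparison is deterministic, so replicas converge whatever the merge order.
        for key, theirs in other.entries.items():
            mine = self.entries.get(key)
            if mine is None or theirs[:2] > mine[:2]:
                self.entries[key] = theirs
        # Never fall behind a clock we have already seen.
        self.clock = max(self.clock, other.clock)

    def value(self):
        return {key: entry[2] for key, entry in self.entries.items()}


alice, bob = LWWMap("alice"), LWWMap("bob")
alice.put("title", "Fifty Years of Diff")
bob.put("author", "mhoye")
alice.merge(bob)
bob.merge(alice)
print(alice.value() == bob.value())
```

In this sketch, edits to different keys merge cleanly in either order. Concurrent writes to the same key are still resolved, but by a fixed rule (higher clock, then replica name) rather than by a person. Real CRDT libraries work hard to make that trade-off less lossy for text.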
@mhoye @gvwilson Getting hold of CVS was such an improvement in how I worked.
@mhoye @gvwilson I have been working with digital humanities scholars for the past week, and I feel boiling outrage that people haven't been given the context or knowledge to do the kind of quick-and-dirty work with text corpora that I can do so fluidly on the *nix command line.

@yvonnezlam @mhoye @gvwilson Oh.

A Shakespeare concordance is a pretty common primer exercise for working with words and sequences and co-occurrences, and the command line already has most of the pieces right there.
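On the command line, the word-counting half of this is usually a pipeline along the lines of `tr -cs 'A-Za-z' '\n' < play.txt | sort | uniq -c | sort -rn` (the file name is just a placeholder). A concordance goes further and keeps each word's surrounding context. The sketch below does that in Python as a keyword-in-context index. The `concordance` function and its `width` parameter (words of context on each side) are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def concordance(text, width=3):
    """Map each lowercased word to keyword-in-context snippets (hypothetical helper)."""
    # Treat runs of letters and apostrophes as words, so "'tis" stays whole.
    words = re.findall(r"[A-Za-z']+", text)
    index = defaultdict(list)
    for position, word in enumerate(words):
        left = " ".join(words[max(0, position - width):position])
        right = " ".join(words[position + 1:position + 1 + width])
        # Bracket the keyword so it stands out from its context.
        index[word.lower()].append(f"{left} [{word}] {right}".strip())
    return dict(index)


lines = concordance("To be, or not to be, that is the question", width=2)
for snippet in lines["be"]:
    print(snippet)
```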

@mhoye @gvwilson

Hmmmm... I use diff and merge all the time when I code, but I've never found them very useful for large prose documents. So much so that when I write LaTeX, I do it in Overleaf and use its track-changes feature.

@mhoye @gvwilson that's a bit of a "say you've never worked with legal documents without ever saying ..." admission. Line-based diffs might be okay for code, but are no use for contracts.
@scruss @gvwilson I don't buy that; any diff/patch pair is necessarily domain-dependent, not necessarily wire-format-dependent. There's no reason a "contract diff" and "contract patch" couldn't understand what contracts need, as opposed to bare text.
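The point that the unit of comparison is a choice, not a law, is easy to illustrate. Python's standard-library `difflib.SequenceMatcher` compares lists of words just as readily as lists of lines. The `word_diff` name and the sample clause below are made up. This is still nowhere near a real "contract diff", because it knows nothing about clauses, defined terms or numbering.

```python
import difflib


def word_diff(old, new):
    """List (operation, old_words, new_words) changes, comparing word by word."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged run of words
        changes.append((tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


print(word_diff(
    "The Supplier shall deliver the goods within 30 days.",
    "The Supplier shall deliver the goods within 45 business days.",
))
```

A line-based diff would report that whole sentence as replaced. The word-level version isolates the one change a reader of the clause actually cares about.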
@mhoye @scruss apologies if I'm misremembering a long-ago lecture, but I believe the ability to report semantically meaningful changes was one of the motivations for developing GML (the predecessor of SGML). I have seen people pull up contract changes in LexisNexis, which I believe uses some derivative of SGML as a storage format (?).
@mhoye @scruss @gvwilson The history section of the Wikipedia article on SGML makes the same comment. One of my first paid programming jobs was to make use of an undocumented SGML parser that was also buggy. That was a hard summer.
@gvwilson @mhoye @scruss this is the sort of thing @timbray might know…
@irvingreid @gvwilson @mhoye @scruss Having recently invested a large amount of work in the construction and editing of legal documents as part of an expert-witness gig working for Uncle Sam, I have opinions. Yes, the legal profession would benefit immensely from a diffing capability that worked as well as what developers have relied on for decades. Yes, one reason the developer tools are so slick is lines-of-text semantics.
Unfortunately the legal profession seems locked in to MS Office.
@timbray @irvingreid @gvwilson @mhoye @scruss I'm surprised. I thought legal was the last holdout for WordPerfect?

@timbray @irvingreid @gvwilson @mhoye Thanks, Tim. Yes, it's all Word, which became lots of (not) fun c. 2016 when older versions produced documents that infected others with bad formatting. Manual retyping was the only fix.

On top of Word, there are unbelievably expensive plugins for Adobe Acrobat Pro for legal PDF management. They like their pages.

(let's hope no-one gets billed $300 / 15 minutes of senior partner time for lawyers learning the never-fit-for-purpose git)

@timbray @irvingreid @gvwilson @mhoye @scruss

PDFs (of Word documents) are how they are bundled and transmitted between firms, courts, etc. That has its own costs, especially when documents are being disclosed: redactions have to be done in the PDFs, and that's a terrible process.

@simon_lucy it sure is! When I found out that the only safe way to redact #PDF was to have the whole page render as a bitmap, a little bit of me died inside

@mhoye @gvwilson

can diff/patch:
1) attach comments to each change, identifying who made the comment and when?

2) show which changes have been accepted by the counterparty?

3) ensure (possibly even enforce) that both parties are working from the same standard document?

These are all things that legal and engineering document exchange has relied on for decades, if not centuries.

@scruss @mhoye @gvwilson I am "unencumbered by detailed knowledge" :-) about this, but I believe the answer is yes. Or if not, one of the 8 million :-) version control systems other than git or Mercurial could do this! Hopefully @timbray would know more?!?
@mhoye @gvwilson I always think of "how can I do diffs on this document" and realize office software is awfully bad at that.

@mhoye @gvwilson

Watching friends and family deal with versions of documents, I am immensely frustrated for them. They don't know what they are missing. I feel like an alien who can't communicate the power of what I have: that I have every incremental version of my work going back years.

@mhoye @gvwilson
While I agree “No programmer would consent to live the way we make office workers live” it’s also true that no office worker would tolerate the way we make programmers work: obscure command-line tools that take months to learn and offer many ways to shoot your foot off.
@mhoye @gvwilson Not just computational. Any kind of document management. Instead of the diff format, people like solicitors and politicians deal with crap like 'In the 4th paragraph, before the word "cake", insert the word "delicious"'.
@mhoye I got curious about diff3 and its part in merging and got a very generous reply from Doug McIlroy himself about some of the history, specifically about Paul Jensen's role.
Where did source code merging come from?

The concept of merging changes is central to modern source code control. Specifically the technology of a three way merge, like diff3, which allows two different developers to change the same file …

Nelson's log
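The core idea of a three-way merge can be sketched in a few dozen lines of Python. This is a simplified illustration, not diff3's actual implementation. It uses difflib to align each edited version against the common ancestor, then walks between "sync points", which are ancestor lines both sides kept. For each region it takes whichever side changed it, or emits conflict markers if both changed it differently. The names `merge3` and `_unchanged` are hypothetical, and real tools handle many more edge cases.

```python
import difflib


def _unchanged(base, other):
    """Map each base line index to its index in `other`, for lines difflib matched."""
    mapping = {}
    for block in difflib.SequenceMatcher(None, base, other).get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def merge3(base, ours, theirs):
    """Line-based three-way merge sketch; returns (merged_lines, conflict_count)."""
    in_ours, in_theirs = _unchanged(base, ours), _unchanged(base, theirs)
    merged, conflicts = [], 0
    i = i_ours = i_theirs = 0
    while True:
        # Find the next base line that both sides left unchanged (a sync point).
        j = i
        while j < len(base) and not (j in in_ours and j in in_theirs):
            j += 1
        # Past the last sync point, the region runs to the end of each file.
        j_ours = in_ours.get(j, len(ours))
        j_theirs = in_theirs.get(j, len(theirs))
        base_chunk = base[i:j]
        ours_chunk = ours[i_ours:j_ours]
        theirs_chunk = theirs[i_theirs:j_theirs]
        if ours_chunk == base_chunk:
            merged.extend(theirs_chunk)  # only theirs changed (or nobody did)
        elif theirs_chunk == base_chunk or ours_chunk == theirs_chunk:
            merged.extend(ours_chunk)  # only ours changed, or both made the same edit
        else:
            conflicts += 1  # both changed this region differently
            merged += ["<<<<<<< ours", *ours_chunk, "=======", *theirs_chunk, ">>>>>>> theirs"]
        if j == len(base):
            return merged, conflicts
        merged.append(base[j])  # the shared sync line itself
        i, i_ours, i_theirs = j + 1, j_ours + 1, j_theirs + 1


print(merge3(["a", "b", "c", "d"], ["a", "B", "c", "d"], ["a", "b", "c", "D"]))
```

Here two developers changed different lines of the same file, and both edits survive. If they had changed the same line in different ways, the region would come back wrapped in conflict markers for a person to resolve.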

@mhoye

I have been thinking about a "History of Unix in N commands" essay or book: which commands would it include? I'm not enough of a historian to be sure, but now "diff" is on my mind, and it's earlier than I knew.

@mhoye No doubt diff is crucial, but the social aspect only took off with Larry Wall's patch(1) in 1985. (I think merge only came with RCS?)
@mhoye Larry Wall's patch probably belongs in the conversation, too. Patch made it (relatively) easy to maintain a fork with local changes while also tracking upstream over time. Still absolutely vital for packaging infrastructure on all major distros. Though I guess it all leads to where we are today, with git providing a superset of all the old tools.
@mhoye thanks for posting. I'd forgotten how, er, intuitive largely uncommented C is.
@mhoye
Bell Labs Piscataway had ~1000 software+documentation people.
In 1973, docs were done on typewriters.
We moved that to PWB/UNIX systems, using MM macros, ~1975.
Updates to software documents typically marked deletions with * and changes with change bars… via manually inserted commands. I said “diff” and wrote the original diffmk(1), a trivial command to automate that.
Thanks Doug & James! Sadly, the latter (who was in the same lab as me ~1979-83; brilliant, and also nice) died recently:
https://en.m.wikipedia.org/wiki/James_W._Hunt
James W. Hunt - Wikipedia

@mhoye Remember when sometimes "diff" would give up and emit the single word "Jackpot"?
@mhoye People don't comment code like this anymore. This source file is amazing; the algorithm is described in detail. I see modern-day Rust code without a single comment, as if the programmers think the code is self-documenting and/or that nobody else will ever look at it and try to understand what it does.
@mok0 @mhoye True, the explanation at the top is really well detailed, but on the other hand the code below is a pile of opaque, single-letter variables and weird abbreviations, as if a math research paper got transcribed directly into C.
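diff is built on computing a longest common subsequence (LCS) of the two files' lines. The sketch below is not the Hunt-McIlroy algorithm in diff1.c, which avoids filling a full table by working from pairs of matching lines. It is the simpler textbook dynamic-programming LCS that Hunt-McIlroy speeds up, written in Python with descriptive names. The `lcs_diff` function and its `"  "`/`"- "`/`"+ "` line prefixes are illustrative choices, not diff's output format.

```python
def lcs_diff(old, new):
    """Edit script from an O(m*n) longest-common-subsequence table (textbook version)."""
    m, n = len(old), len(new)
    # lcs_len[i][j] is the length of the LCS of old[i:] and new[j:].
    lcs_len = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if old[i] == new[j]:
                lcs_len[i][j] = lcs_len[i + 1][j + 1] + 1
            else:
                lcs_len[i][j] = max(lcs_len[i + 1][j], lcs_len[i][j + 1])
    # Walk forward: keep common lines, and delete or insert whichever side
    # keeps the remaining common subsequence longest.
    script, i, j = [], 0, 0
    while i < m and j < n:
        if old[i] == new[j]:
            script.append("  " + old[i])
            i += 1
            j += 1
        elif lcs_len[i + 1][j] >= lcs_len[i][j + 1]:
            script.append("- " + old[i])
            i += 1
        else:
            script.append("+ " + new[j])
            j += 1
    # Whatever is left over on either side is a pure deletion or insertion.
    script += ["- " + line for line in old[i:]]
    script += ["+ " + line for line in new[j:]]
    return script


print("\n".join(lcs_diff(["a", "b", "c"], ["a", "c", "d"])))
```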
@mhoye While diff and patch are important, there are things that predate them. This is creating heroes where none are needed.