Mastodawn

c17r

The Future of Version Control

https://bramcohen.com/p/manyana

Manyana

A Coherent Vision for the Future of Version Control

Bram’s Thoughts

Show thread

bos 5d ago

This is sort of a revival and elaboration of some of Bram’s ideas from Codeville, an earlier effort that dates back to the early 2000s Cambrian explosion of DVCS.

Codeville also used a weave for storage and merge, a concept that originated with SCCS (and thence into Teamware and BitKeeper).

Codeville predates the introduction of CRDTs by almost a decade, and at least on the face of it the two concepts seem like a natural fit.

It was always kind of difficult to argue that weaves produced unambiguously better merge results (and more limited conflicts) than the more heuristically driven approaches of git, Mercurial, et al, because the edit histories required to produce test cases were difficult (at least for me) to reason about.

I like that Bram hasn’t let go of the problem, and is still trying out new ideas in the space.

Show thread

dboreham 5d ago

Note that CRDT isn't "a thing". The CRDT paper provides a way to think about and analyze eventually consistent replication mechanisms. So CRDTs weren't "introduced", only the "CRDT way of discussing replication". Every concrete mechanism described in the CRDT paper is very old, widely used for decades beforehand.

This means that everything that implements eventual consistency (including Git) is using "a CRDT".

Show thread

hrmtst93837 5d ago

If you stretch "CRDT" to mean any old eventually consistent thing, almost every Unix tool morphs into one under a loose enough definition. That makes the term much less useful, because practical CRDTs in 2024 usually mean opaque merge semantics, awkward failure modes, and operational complexity that has very little in common with the ancient algorithms people point at when they say "Git is a CRDT too". "Just Git" is doing a lot of work there.

Show thread

gritzko 5d ago

In 2007 Bram said to me that my Causal Tree algorithm is a variant of weave. Which is broadly correct. In these 20 years, the family of weave-class algos grew quite big. In my 2020 article, I devoted the intro to making their family portrait https://arxiv.org/abs/2002.09511 Could have been a separate article.

Chronofold: a data structure for versioned text

Chronofold is a replicated data structure for versioned text. It is designed for use in collaborative editors and revision control systems. Past models of this kind either retrofitted local linear orders to a distributed system (the OT approach) or employed distributed data models locally (the CRDT approach). That caused either extreme fragility in a distributed setting or egregious overheads in local use. Overall, that local/distributed impedance mismatch is cognitively taxing and causes lots of complexity. We solve that by using subjective linear orders locally at each replica, while inter-replica communication uses a distributed model. A separate translation layer insulates local data structures from the distributed environment. We modify the Lamport timestamping scheme to make that translation as trivial as possible. We believe our approach has applications beyond the domain of collaborative editing.

arXiv.org

Show thread

radarsat1 5d ago

Is it a good thing to have merges that never fail? Often a merge failure indicates a semantic conflict, not just "two changes in the same place". You want to be aware of and forced to manually deal with such cases.

I assume the proposed system addresses it somehow but I don't see it in my quick read of this.

Show thread

recursivecaveat 5d ago

It says that merges that involve overlap get flagged to the user. I don't think that's much more than a defaults difference to git really. You could have a version of git that just warns on conflict and blindly concats the sides.

Show thread

hungryhobbit 5d ago

They address this; it's not that they don't fail, in practice...

the key insight is that changes should be flagged as conflicting when they touch each other, giving you informative conflict presentation on top of a system which never actually fails.

Show thread

bigfishrunning 5d ago

Isn't that how the current systems work though? Git inserts conflict markers in the file, and then emacs (or whatever editor) highlights them

The big red block seems the same as "flagged", unless I'm misunderstanding something

Show thread

rectang 5d ago

I agree. Nevertheless I wonder if this approach can help with certain other places where Git sometimes struggles, such as whether or not two commits which have identical diffs but different parents should be considered equivalent.

In the general case, such commits cannot be considered the same — consider a commit which flips a boolean that one branch had flipped in another file. But there are common cases where the commits should be considered equivalent, such as many rebased branches. Can the CRDT approach help with e.g. deciding that `git branch -d BRANCH` should succeed when a rebased version of BRANCH has been merged?

Show thread

gojomo 5d ago

Should you be counting on confusion of an underpowered text-merge to catch such problems?

It'll fire on merge issues that aren't code problems under a smarter merge, while also missing all the things that merge OK but introduce deeper issues.

Post-merge syntax checks are better for that purpose.

And imminently: agent-based sanity-checks of preserved intent – operating on a logically-whole result file, without merge-tool cruft. Perhaps at higher intensity when line-overlaps – or even more-meaningful hints of cross-purposes – are present.

Show thread

skydhash 5d ago

> It'll fire on merge issues that aren't code problems under a smarter merge, while also missing all the things that merge OK but introduce deeper issues.

That has not been my experience at all. The changes you introduced is your responsibility. If you synchronizes your working tree to the source of truth, you need to evaluate your patch again whether it introduces conflict or not. In this case a conflict is a nice signal to know where someone has interacted with files you've touched and possibly change their semantics. The pros are substantial, and it's quite easy to resolve conflicts that's only due to syntastic changes (whitespace, formatting, equivalent statement,...)

Show thread

gojomo 5d ago

If you're relying on a serialized 'source of truth', against which everyone must independently ensure their changes sanely apply in isolation, the. you've already resigned yourself to a single-threaded process that's slower than what improved merges aim to enable.

Sure, that works – like having one (rare, expensive) savant engineer apply & review everything in a linear canonical order. But that's not as competitive & scalable as flows more tolerant of many independent coders/agents.

Show thread

yammosk 5d ago

And yet after all these year of git supporting no source of truth we still fall back on it. As long as you have an authoritative version and authoritative release then you have one source of truth. Linus imagined everyone contributing with no central authority and yet we look to GitHub and Gitlab to centralize our code. Git is already decentralized and generally we find it impractical.

Show thread

jwilliams 5d ago

Indeed. And plenty of successful merges end up with code that won't compile.

FWIW I've struggled to get AI tools to handle merge conflicts well (especially rebase) for the same underlying reason.

Show thread

layer8 5d ago

Code not compiling is still the good case, because you’ll notice before deployment. The dangerous cases are when it does compile.

Show thread

jwilliams 5d ago

Very true.

I realized recently that I've subconsciously routed-around merge conflicts as much as possible. My process has just subtly altered to make them less likely. To the point of which seeing a 3-way merge feels jarring. It's really only taking on AI tools that bought this to my attention.

Show thread

skydhash 5d ago

I'm surprised to see that some people sync their working tree and does not evaluate their patch again (testing and reviewing the assumptions they have made for their changes).

Show thread

ulrikrasmussen 5d ago

The thing about how merges are presented seems orthogonal to how to represent history. I also hate the default in git, but that is why I just use p4merge as a merge tool and get a proper 4-pane merge tool (left, right, common base, merged result) which shows everything needed to figure out why there is a conflict and how to resolve it. I don't understand why you need to switch out the VCS to fix that issue.

Show thread

crote 5d ago

Seconding the use of p4merge for easy-to-use three-pane merging. Just like most other issues with Git, if your merges are painful it's probably due to terrible native UX design - not due to anything conceptually wrong with Git.

Show thread

TacticalCoder 5d ago

Thirding it except I do it from Emacs. Three side-by-side pane with left / common ancestor / right and then below the merge result. By default it's not like that but then it's Emacs so anything is doable. I hacked some elisp code a great many years ago and I've been using it ever since.

No matter the tool, merges should always be presented like that. It's the only presentation that makes sense.

Show thread

MarsIronPI 5d ago

What tool do you use? Does Magit support it natively?

Show thread

skydhash 5d ago

I think you need to enable 3 way merge by default in git's configuration, and both smerge (minor mode for solving conflicts) and ediff (major mode that encompass diff and patch) will pick it up. In the case of the latter you will have 4 panes, one for version A, another for version B, a third for the result C, and the last is the common ancestor of A and B.

Addendum:
I've since long disabled it. A and B changes are enough for me, especially as I rebase instead of merging.

Show thread

jwr 5d ago

Isn't that what ediff does?

Show thread

roryokane 5d ago

Did you know that VS Code added support for the same four-pane view as p4merge years ago? I used p4merge as my merge tool for a long time, but I switched to VS Code when I discovered that, as VS Code’s syntax highlighting and text editing features are much better than p4merge’s.

I also use the merge tool of JetBrains IDEs such as IntelliJ IDEA (https://www.jetbrains.com/help/idea/resolve-conflicts.html#r...) when working in those IDEs. It uses a three-pane view, not a four-pane view, but there is a menu that allows you to easily open a comparison between any two of the four versions of the file in a new window, so I find it similarly efficient.

Resolve Git conflicts | IntelliJ IDEA

IntelliJ IDEA Help

Show thread

roryokane 5d ago

Even if you don’t use p4merge, you can set Git’s merge.conflictStyle config to "diff3" or "zdiff3" (https://git-scm.com/docs/git-config#Documentation/git-config...). If you do that, Git’s conflict markers show the base version as well:

  <<<<<<< left
  ||||||| base
  def calculate(x):
      a = x * 2
      b = a + 1
      return b
  =======
  def calculate(x):
      a = x * 2
      logger.debug(f"a={a}")
      b = a + 1
      return b
  >>>>>>> right

With this configuration, a developer reading the raw conflict markers could infer the same information provided by Manyana’s conflict markers: that the right side added the logging line.

Git - git-config Documentation

Show thread

ktm5j 5d ago

I'm on my phone right now so I'm not going to dig too hard for this, but you can also configure a "merge tool" (or something like that) so you can use Meld or Kompare to make the process easier. This has helped me in a pinch to work out some confusing merge conflicts.

Show thread

newsoftheday 5d ago

I started using Meld years ago and continue to find people who've never heard of it. It's a pretty good tool.

Show thread

psychoslave 5d ago

That still have an issue with the vocabulary. Things like "theirs/our" is still out of touch but it's already better than a loose spatial analogy on some representation of the DAG.

Something like base, that is "common base", looks far more apt to my mind. In the same vein, endogenous/exogenous would be far more precise, or at least aligned with the concern at stake. Maybe "local/alien" might be a less pompous vocabulary to convey the same idea.

Show thread

kungito 5d ago

After 15 years i still cant remember which is which. I get annoyed every time. Maybe I should invest 15 minutes finally to remember properly

Show thread

IgorPartola 5d ago

Let’s see if I get this wrong after 25 years of git:

ours means what is in my local codebase.

theirs means what is being merged into my local codebase.

I find it best to avoid merge conflicts than to try to resolve them. Strategies that keep branches short lived and frequently merging main into them helps a lot.

Show thread

marcellus23 5d ago

That's kind of the simplest case, though, where "theirs" and "ours" makes obvious sense.

What if I'm rebasing a branch onto another? Is "ours" the branch being rebased, or the other one? Or if I'm applying a stash?

Show thread

IgorPartola 5d ago

> What if I'm rebasing a branch onto another?

Just checkout the branch you are merging/rebasing into before doing it.

> Or if I'm applying a stash?

The stash is in that case effectively a remote branch you are merging into your local codebase. ours is your local, theirs is the stash.

Show thread

clktmr 5d ago

The thing is, you'll typically switch to master to merge your own branch. This makes your own branch 'theirs', which is where the confusion comes from.

Show thread

IgorPartola 5d ago

Not me. I typically merge main onto a feature branch where all the conflicts are resolved in a sane way. Then I checkout main and merge the feature branch into it with no conflicts.

As a bonus I can then also merge the feature branch into main as a squash commit, ditching the history of a feature branch for one large commit that implements the feature. There is no point in having half implemented and/or buggy commits from the feature branch clogging up my main history. Nobody should ever need to revert main to that state and if I really really need to look at that particular code commit I can still find it in the feature branch history.

Show thread

afiori 5d ago

iirc ours is always the commit the merge is starting from. the issue is that with a merge your current commit is the merging commit while with a rebase it is reversed.

I suspect that this could be because the rebase command is implemented as a serie of merges/cherry-picks from the target branch.

[dead]

  git checkout mybranch
  git rebase main

Now git takes main and starts cloning (cherry-picking, as you said) commits from mybranch on top of it. From git's viewpoint it's working on top of main, so if a conflict occurs, main is "ours" and mybranch is "theirs". But from your viewpoint you're still on mybranch, and indeed are left on mybranch when the rebase is complete. (It's a different mybranch, of course; once the rebase is completed, git moves mybranch to point to the new (detached) HEAD.) Which makes "ours" and "theirs" exactly the opposite of what the user expects.

Show thread

XorNot 5d ago

Man do I hate this behavior because it would be really some by just using the branch names rather then "ours" and "theirs"

Show thread

awesome_dude 5d ago

This is one of my pain points, and one time I googled and got the real answer (which is why it's such a pain point).

That answer is "It depends on the context"

> The reason the "ours" and "theirs" notions get swapped around during rebase is that rebase works by doing a series of cherry-picks, into an anonymous branch (detached HEAD mode). The target branch is the anonymous branch, and the merge-from branch is your original (pre-rebase) branch: so "--ours" means the anonymous one rebase is building while "--theirs" means "our branch being rebased".[0]

[0] https://stackoverflow.com/questions/25576415/what-is-the-pre...

What is the precise meaning of "ours" and "theirs" in git?

This might sound like too basic of a question, but I have searched for answers and I am more confused now than before. What does "ours" and "theirs" mean in git when merging my branch into my other

Stack Overflow

Show thread

flutetornado 5d ago

I ended up creating a personal vim plugin for merges one night because of a frustrating merge experience and never being able to remember what is what. It presents just two diff panes at top to reduce the cognitive load and a navigation list in a third split below to switch between diffs or final buffer (local/remote, base/local, base/remote and final). The list has branch names next to local/remote so you always know what is what. And most of the time the local/remote diff is what I am interested in so that’s what it shows first.

Show thread

IshKebab 5d ago

This is better but it still doesn't really help when the conflict is 1000 lines and one side changed one character and the other deleted the whole thing. That isn't theoretical - it happens quite regularly.

What you really need is the ability to diff the base and "ours" or "theirs". I've found most different UIs can't do this. VSCode can, but it's difficult to get to.

I haven't tried p4merge though - if it can do that I'm sold!

Show thread

cxr 5d ago

> I don't understand why you need to switch out the VCS to fix that issue.

For some reason, when it comes to this subject, most people don't think about the problem as much as they think they've thought about it.

I recently listened to an episode on a well-liked and respected podcast featuring a guest there to talk about version control systems—including their own new one they were there to promote—and what factors make their industry different from other subfields of software development, and why a new approach to version control was needed. They came across as thoughtful but exasperated with the status quo and brought up issues worthy of consideration while mostly sticking to high-level claims. But after something like a half hour or 45 minutes into the episode, as they were preparing to descend from the high level and get into the nitty gritty of their new VCS, they made an offhand comment contrasting its abilities with Git's, referencing Git's approach/design wrt how it "stores diffs" between revisions of a file. I was bowled over.

For someone to be in that position and not have done even a cursory amount of research before embarking on a months (years) long project to design, implement, and then go on the talk circuit to present their VCS really highlighted that the familiar strain of NIH is still alive, even in the current era where it's become a norm for people to be downright resistant to writing a couple dozen lines of code themselves if there is no existing package to import from NPM/Cargo/PyPI/whatever that purports to solve the problem.

Show thread

barrkel 5d ago

I don't really get the upside of focus on CRDTs.

The semantic problem with conflicts exists either way. You get a consistent outcome and a slightly better description of the conflict, but in a way that possibly interleaves changes, which I don't think is an improvement at all.

I am completely rebase-pilled. I believe merge commits should be avoided at all costs, every commit should be a fast forward commit, and a unit of work that can be rolled back in isolation. And also all commits should be small. Gitflow is an anti-pattern and should be avoided. Long-running branches are for patch releases, not for feature development.

I don't think this is the future of VCS.

Jujutsu (and Gerrit) solves a real git problem - multiple revisions of a change. That's one that creates pain in git when you have a chain of commits you need to rebase based on feedback.

Show thread

gzread 5d ago

People see that CRDTs have no conflicts and proclaim them as the solution to all problems, not seeing that some problems inherently have conflicts and either can't be represented by CRDTs at all, or that the use of CRDTs resolves conflicts in a way that's worse than if you actually thought about conflict resolution. E.g. that multiplayer text editor that interleaved characters from simultaneous edits.

Show thread

IgorPartola 5d ago

I used to use rebase much more than merge but have grown to be more nuanced over the years:

Merge commits from main into a feature branch are totally fine and easier to do than rebasing. After your feature branch is complete you can do one final main-to-feature-branch merge and then merge the feature branch into main with a squash commit.

When updating any branch from remote, I always do a pull rebase to avoid merge commits from a simple pull. This works well 99.99% of the time since what I have changed vs what the remote has changed is obvious to me.

When I work on a project with a dev branch I treat feature branches as coming off dev instead of main. In this case I merge dev into feature branches, then merge feature branches into dev via a squash commit, and then merge main into dev and dev into main as the final step. This way I have a few merge commits on dev and main but only when there is something like an emergency fix that happens on main.

The problem with always using a rebase is that you have to reconcile conflicts at every commit along the way instead of just the final result. That can be a lot more work for commits that will never actually be used to run the code and can in fact mess up your history. Think of it like this:

1. You create branch foo off main.

2. You make an emergency commit to main called X.

3. You create commits A, B, and C on foo to do your feature work. The feature is now complete.

4. You rebase foo off main and have to resolve the conflict introduced by X happening before A. Let’s say it conflicts with all three of your commits (A, B, and C).

5. You can now merge foo into main with it being a fast forward commit.

Notice that at no point will you want to run the codebase such that it has commits XA or XAB. You only want to run it as XABC. In fact you won’t even test if your code works in the state XA or XAB so there is little point in having those checkpoints. You care about three states: main before any of this happened since it was deployed like that, main + X since it was deployed like that, and main with XABC since you added a feature. git blame is really the only time you will ever possibly look at commits A and B individually and even then the utility of it is so limited it isn’t worth it.

The reality is that if you only want fast forward commits, chances are you are doing very little to go back and extract code out of old versions a of the codebase. You can tell this by asking yourself: “if I deleted all my git history from main and have just the current state + feature branches off it, will anything bad happen to my production system?” If not, you are not really doing most of what git can do (which is a good thing).

Show thread

barrkel 5d ago

I am now wholly bought into the idea of having a feature branch with (A->B->C) commits is an anti-pattern.

Instead, if the feature doesn't work without the full chain of A+B+C, either the code introduced in A+B is orphaned except by tests and C joins it in; or (and preferably for a feature of any significance), A introduces a feature flag which disables it, and a subsequent commit D removes the feature flag, after it is turned on at a time separate to merge and deploy.

Show thread

IgorPartola 5d ago

I treat each feature branch as my own personal playground. There should be zero reason for anyone to ever look at it. Sometimes they aren’t even pushed upstream. Otherwise, just work on main with linear history and feature flags and avoid all this complexity that way.

Just like you don’t expect someone else’s local codebase to always be in a fully working state since they are actively working on it, why do you expect their working branch to be in a working state?

Show thread

hackrmn 5d ago

When you say "unit of work", unit of _which_ work are you referring to? The problem with rebasing is that it takes one set of snapshots and replays them on top of another set, so you end up with two "equivalent" units of work. In fact they're _the same_ indeed -- the tree objects are shared, except that if by "work" you mean changes, Git is going to tell you two different histories, obviously.

This is in contrast with [Pijul](https://pijul.org) where changes are patches and are commutative -- you can apply an entire set and the result is supposed to be equivalent regardless of the order the patches are applied in. Now _that_ is unit of work" I understand can be applied and undone in "isolation".

Everything else is messy, in my eyes, but perhaps it's orderly to other people. I mean it would be nice if a software system defined with code could be expressed with a set of independent patches where each patch is "atomic" and a feature or a fix etc, to the degree it is possible. With Git, that's a near-impossibility _in the graph_ -- sure you can cherry-pick or rebase a set of commits that belong to a feature (normally on a feature branch), but _why_?

Pijul

An easy to use, distributed and fast version control system.

Show thread

barrkel 5d ago

By "unit of work", I mean the atomic delta which can, on its own, become part of the deployable state of the software. The thing which has a Change-Id in Gerrit.

The delta is the important thing. Git is deficient in this respect; it doesn't model a delta. Git hashes identify the tip of a tree.

When you rebase, you ought to be rebasing the change, the unit of work, a thing with an identity separate and independent of where it is based from.

And this is something that the jujutsu / Gerrit model fixes.