Shortcomings of Diff3

Nov 23, 2021

Engineers often find themselves resolving difficult merge conflicts manually.

Merges with conflicts need diffs. Why isn't diff3 (o verview) good enough to do conflict resolution automatically? Without going into the theory, I'll go over some of the issues with diff3. If you're interested in diving deeper, you can read A Formal Investigation of Diff3.

Problems with diff3

Not idempotent. If you run the algorithm over and over again, you can continue to propagate changes. Intuition might say that an algorithm that merges changes together should converge.
Not semantic. It takes no knowledge of structure. That means it doesn't understand programming languages or abstract syntax trees. What if the merge strategy could understand if the output was valid code?
It can't work for CRDTs, operational transforms, or other structured data. This follows from the lack of semantics, but it means that tools like Notion and Google Docs can't use diff3 to merge changes.
Fails for changes that are very similar but textually far from the parent. We would expect two different branches that are very similar to each other to merge easily. Unfortunately, if they are far from the parent branch, it's difficult for diff3 to work.
Not stable. Stability (as defined by the paper above) means that there exists a constant such that for small enough changes, there is a guaranteed small merge.

What's been tried

Semantic merge strategies for specific languages. Here's a tool called SemanticMerge that works for #C, Java, and C. There's also difftastic that supports structural diffs in over 20 different languages. However, difftastic does not generate patches or handle merges.

Patch-based algebras like darcs, which was an alternative version control system to git. You can read about the theory behind how darcs stored patches and resolved conflicts here.

Machine Learning for Merge Conflicts

What if we could train an algorithm to resolve common merge conflicts? We have millions of public merge conflict resolutions on GitHub as a data set. With a little magic, we could probably recreate the original diff'd conflict as well.

It seems like this is the best way to capture semantic differences across different languages – a resolution you would normally only get by parsing a language-specific AST or understanding syntax. Tricky patterns from dependency management conflicts that often live outside the AST in configuration files could be learned and fixed.