One of the most critical features of modern-day version control relies on a poorly understood algorithm – **diff3**
. What is it, how does it work, and why it could be improved?
There are two critical operations that need to know the differences between files:
- Reviewing new changes
- Merging multiple changes
First, version control systems like git
don't actually store the patched differences between files. Instead, git
stores completely new files for every revision and computes the differences at runtime. That's why you can switch between really old revisions and new revisions quickly – otherwise, every patch would need to be sequentially applied.
So how does git
know the differences between files? Comparing two files git
uses a set of algorithms that mostly follow Myers' An O(ND) Difference Algorithm and Its Variations. It's closely related to a common interview question, the longest common subsequence problem.
But one of the most challenging aspects of version control is the 3-way merge. Let's say you, and a coworker independently make different revisions to a common document and then try to reconcile the differences. It's possible that some changes can be merged together automatically, but there might exist conflicts.
One merge tool that git
uses is called diff3
or sometimes kdiff3
, which, as you can guess, produces a merged output of 3 different files that share a common ancestor. diff3
was originally written in 1976 in Version 7 of Unix. The algorithm is a set of heuristics that tries to capture most edge cases of merging text together, but it's not straightforward or easy to understand. Doug McIlroy (or an engineer named Paul Jensen) wrote the first version of diff3
for Unix. You can read McIlroy's paper, An Algorithm for Differential File Comparison.
You can enable printing merge conflicts in diff3
format with the command. This has the benefit of showing you the diff output that includes the common ancestor, which usually helps when resolving conflicts.
$ git config --global merge.conflictstyle diff3
You can browse the source code of diff3
here.