Sinclair Target

Snapshots or Patches? Does It Matter?

Aug 18, 2025

Although the overwhelming majority of people use Git, other version control systems exist. Some are even new and shiny and experimenting with features that Git doesn’t have.

I recently worked my way through this excellent tutorial for Jujutsu. It excited me, because it made me realize that the design space for version control systems is much larger and more unexplored than I had imagined.

But I don’t want to talk about Jujutsu. The docs for Jujutsu mention that some of its features are inspired by an older system called Darcs. I gave Darcs a spin too. Darcs doesn’t have Jujutsu’s pleasant UX but it’s interesting for another reason. Where Git and Jujutsu both model the history of your repository as a tree of point-in-time snapshots (i.e. commits), Darcs instead models it as a tree of patches.

If you’re a long-time Git user and aren’t that familiar with Darcs, it’s not obvious how these models are different. Even though what Git does under the hood is store snapshots, Git shows you patches all the time—when you run git diff, when you encounter merge conflicts, when you stage hunks into your index.

So are these two models equivalent? Some people believe that whether you store snapshots or patches has implications for performance but doesn’t otherwise affect what is possible within your version control system:

If you have a repository with many objects (e.g. thousands or millions of code files,) then a purely snapshot-based versioning system would end up with repositories ballooning in size, as each version of each file has to be stored separately whenever there are changes.

If you have a repository where each file has many revisions (e.g. you have thousands, or say millions of commits), then a purely changeset-based versioning system would struggle with processing speed, as building a commit would require trawling through millions of commits to piece together a complete file.

I get the impression that most VCSes will have some mechanisms in place to mitigate around both of these problems - like Git which isn’t purely snapshot based and “kind of” uses changesets under the hood. But from what I understand, the basic trade-off between these two forms of storage in version control systems is between performance and space.

A common corollary to this line of thinking is the belief that Git became more popular than Darcs in part because it chose the right trade-off and is fast whereas Darcs is slow. This reinforces the idea that the performance characteristics of each model are the most important thing.

Now that I’ve tried Darcs, I think the difference between snapshots and patches goes deeper than performance. But I’m having a hard time articulating exactly why, hence this attempt to work it out in writing.

As best as I can tell, the fundamental difference between the two models comes down to what a “base” is in each system.

What Is a “Base,” Anyway?

In Git, the idea of a “base” is implicit in the git rebase command, but weirdly I’ve never read an explanation of what exactly a “base” is supposed to be. There is no entry for “base” when you run man gitglossary.

If we zoom out a bit, every version control system has to have some notion of a “base.” Version control systems capture changes to the files in our repository over time; each change is defined by 1) the actual change itself, let’s call it the transformation, and 2) the state of our repository prior to the change. This second thing is the “base.”

You might wonder why we need to worry about the base as well as the transformation. This is because some transformations could produce different results or even become meaningless depending on the state of the repository prior to the change. To take one example, the transformation “add foo to line 6 of bar.txt” works if bar.txt exists but doesn’t make much sense if bar.txt does not exist. Our version control system shouldn’t let us get to a place where we’re trying to add lines to a nonexistent file. So the base provides necessary context for the transformation.
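To make the split between transformation and base concrete, here is a minimal sketch in Python. It assumes a toy repository modeled as a dictionary from file names to lists of lines; the names Repo and add_line are made up for illustration and do not correspond to anything in Git or Darcs.

```python
from typing import Dict, List

# Toy model (an assumption for illustration): a repository state is a
# mapping from file name to that file's lines.
Repo = Dict[str, List[str]]


def add_line(base: Repo, path: str, line_no: int, text: str) -> Repo:
    """Apply the transformation "add `text` as line `line_no` of `path`".

    The transformation only makes sense relative to a base in which
    `path` exists and is long enough, so we check that first.
    """
    if path not in base:
        raise ValueError(f"{path} does not exist in the base")
    lines = base[path]
    if not 1 <= line_no <= len(lines) + 1:
        raise ValueError(f"{path} has no position for line {line_no}")
    new_lines = lines[: line_no - 1] + [text] + lines[line_no - 1 :]
    # Return a new state rather than mutating the base.
    return {**base, path: new_lines}


base_with_bar: Repo = {"bar.txt": ["1", "2", "3", "4", "5"]}
after = add_line(base_with_bar, "bar.txt", 6, "foo")
```

Calling add_line on an empty repository raises an error instead of inventing a file, which is the kind of guard the base is there to provide.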

We can imagine some transformations that don’t need context. “Create bar.txt” is one. If we think of this transformation as meaning “ensure bar.txt exists”, then it doesn’t matter what files are in our repository beforehand or what those files contain—if the file doesn’t exist we can create it and if it does exist we can leave it alone.

In general, though, we need to consider the base. Even in this last case, things get more complicated if bar.txt was previously deleted; what we’re really doing now by creating bar.txt is creating a second version of bar.txt. This might be important context for certain operations in our system.

Bases in Git and Darcs

In Git, every commit (excluding the root commit) has at least one parent commit. A parent commit serves as the base for a change. When you run git rebase, or git cherry-pick (which is just a rebase of a single commit), you are swapping out the parent commit and thus swapping out the base of the change introduced in that commit.

The thing to understand about Git is that the base for every change is the state of the entire repository immediately preceding that change. This is a direct consequence of not storing the transformation but instead computing it, on the fly, as the difference between two consecutive snapshots. The base for each change has to be the preceding snapshot, which captures the state of all our files.

We said before that if we make the change “add foo to line 6 of bar.txt” then we need context about the state of bar.txt. What we don’t need is context about other files. In Git, though, even if bar.txt was last modified several dozen commits ago, the immediately preceding commit is the base we get. It doesn’t matter if the immediately preceding commit changed something unrelated, like, say, bim.txt.
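A rough sketch of the snapshot model, under the assumption that a snapshot is just a dictionary from path to file contents (Git's real object model is more involved, and changed_paths is a hypothetical helper, not a Git API). The point is that the change is never stored; it is whatever falls out of comparing a snapshot with its parent.

```python
from typing import Dict, Set

# Toy model (an assumption for illustration): a snapshot maps each path
# to the full contents of that file.
Snapshot = Dict[str, str]


def changed_paths(parent: Snapshot, child: Snapshot) -> Set[str]:
    """Paths that were added, removed, or modified between two snapshots."""
    return {
        path
        for path in parent.keys() | child.keys()
        if parent.get(path) != child.get(path)
    }


s1: Snapshot = {"bar.txt": "a\n", "bim.txt": "x\n"}
s2: Snapshot = {"bar.txt": "a\n", "bim.txt": "y\n"}       # unrelated edit to bim.txt
s3: Snapshot = {"bar.txt": "a\nfoo\n", "bim.txt": "y\n"}  # our edit to bar.txt

change_on_s2 = changed_paths(s2, s3)  # computed against the real parent
change_on_s1 = changed_paths(s1, s3)  # computed against a different base
```

With s2 as the base, the computed change touches only bar.txt. Compare the same snapshot against s1 and the unrelated bim.txt edit becomes part of the change, because the base is always the whole repository.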

In Darcs, the notion of a “base” is more subtle. In Darcs, you don’t make commits; instead, you record patches. A patch captures the transformation part of our change directly and doesn’t require a reference to an immediately preceding snapshot. A patch can almost be thought of as just the transformation part of a change. The base comes in only when we need that extra context, such as when we need a file to exist before we add lines to it; in this case we say that the patch depends on the previous patch that created the file. This previous patch is the base for our change.

The difference between Git and Darcs here is that the patch we depend on could be one we created a long time ago. It doesn’t have to be the most recent patch in our patch history. A patch can even depend on multiple previous patches, but that doesn’t make it a “merge” patch, just a patch with a base composed of multiple other patches.

In Darcs, our patch history is always shown as linear. However, the patches are only partially ordered in reality. We can get a better sense for this partial ordering by viewing the patch dependency tree using darcs show dependencies:

[Figure: a dependency tree in a “hello world” Darcs repository.]

This tree shows how our patches relate to each other, not chronologically but in terms of which transformations have to precede others. In Git, the notion of a “base” isn’t granular enough to capture any of this.
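Here is a small sketch of what a partial order over patches looks like. The patch names and the hand-written dependency table are hypothetical. As far as I understand, Darcs works out dependencies from the contents of the patches themselves, which this sketch does not attempt to model.

```python
from graphlib import TopologicalSorter
from itertools import permutations
from typing import Dict, Sequence, Set

# Hypothetical patches: each name maps to the set of patches it depends on.
deps: Dict[str, Set[str]] = {
    "create bar.txt": set(),
    "create bim.txt": set(),
    "add foo to bar.txt": {"create bar.txt"},
    "edit bim.txt": {"create bim.txt"},
}


def is_valid_order(order: Sequence[str], deps: Dict[str, Set[str]]) -> bool:
    """True if every patch appears after all of the patches it depends on."""
    seen: Set[str] = set()
    for patch in order:
        if not deps[patch] <= seen:
            return False
        seen.add(patch)
    return True


# One linear history that respects the dependencies...
one_order = tuple(TopologicalSorter(deps).static_order())

# ...and every linear history that does (brute force is fine for 4 patches).
valid_orders = [o for o in permutations(deps) if is_valid_order(o, deps)]
```

There are several valid linear orders here, not one, because the bar.txt patches and the bim.txt patches are independent of each other. Any of those orders could be the "linear" history a given repository shows.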

Affordances

This difference in what a “base” is in each system leads to different affordances. Some operations are easier using Git and others are easier using Darcs.

I’ve realized that when I’m using a version control system, whether that’s Git or Darcs, sometimes I want to think in terms of snapshots and other times I want to think in terms of patches. By this I mean sometimes I want to think about specific states that my repository was in at a certain time, and other times I want to think about a transformation I want to apply without considering a specific preceding state.

In Git, thinking in snapshots is easy. If I want to revert my working directory to the state it was in at the v1.2 release, I can just check out that snapshot. If I want to switch to several different states my repository was previously in to search for when a regression was introduced, I can run git bisect and do that. I can run git log to get the full picture of all the states my repository has been in and how they relate to each other.
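The idea behind bisecting over snapshots can be sketched as a binary search. This is a simplification that assumes a strictly linear history in which the first snapshot is good, the last is bad, and the bug never disappears once introduced; git bisect itself has to cope with branching history. The function name first_bad and the sample data are made up.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")


def first_bad(snapshots: Sequence[S], is_good: Callable[[S], bool]) -> int:
    """Index of the first bad snapshot in a linear history.

    Assumes snapshots[0] is good, snapshots[-1] is bad, and that every
    snapshot after the first bad one is also bad.
    """
    lo, hi = 0, len(snapshots) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(snapshots[mid]):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical history: each snapshot is just the text of one file.
history = ["v0", "v1", "v2", "v3 bug", "v4 bug", "v5 bug"]
culprit = first_bad(history, lambda text: "bug" not in text)
```

Each step of the search only needs the ability to jump to a fixed, fully specified state, which is exactly what a snapshot-based history hands you.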

In Darcs, I can do some of these things but it isn’t as natural. When I want to revert my working directory to a previous state, I have to “unwind” several patches to get there. This should produce the same result, but doesn’t map as well to my mental model of just moving to a fixed, previous state. Darcs has an equivalent to git bisect called darcs test --bisect, which can help me find a version of the repository from before the bug was introduced, but not the exact state that the author of the bug was working from when they made their erroneous change. The ordering of patches in the output from darcs log reflects the order in which the patches were applied locally and might not be the same for other people; it’s hard to get a sense for the actual historical development of the codebase over time.

But when I want to think in patches, Git is the system that makes it hard. Something that is a huge pain to do in Git is keep a change you want to have locally and apply it on top of whatever you pull from a remote. Your local change might be a change to a configuration file that hardly ever gets updated; even so, you have to rebase your change on top of unrelated changes every time you pull from the remote. Similarly, when composing a PR, it’s natural to think of each of your commits as a patch you are introducing on top of the main branch. If you want to keep related changes in a single commit, or order the commits so that code that depends on other code gets introduced later in the commit sequence, you find yourself doing a lot of tedious interactive rebasing as you develop the feature.

In Darcs, all these things are natural because Darcs allows you to think of patches as transformations most of the time. The bases have to change less often, so they can stay out of sight. Making a local patch and then pulling unrelated patches from a remote is trivial. Composing a series of patches that implement a single feature would also be easier and wouldn’t require constant rebasing. You would have darcs show dependencies to help you understand the relationship between your patches. A reviewer could look at your feature as a tree of changes rather than a sequence of commits, which might help them better grok your work.

So, yes, while the choice between snapshots and patches has major performance implications, it also determines what is easy and what is hard for users to do within your version control system. It’s probably true that Git became more popular than Darcs because it was faster, among other reasons. But if we’re trying to learn lessons that we could apply to the design of future version control systems, what this choice of models means for the affordances of your system might be the more important lesson.