Darcs Demystified

WARNING: This article is incomplete. In fact, it is only around 25% complete, and in particular it is lacking the whole point. So far it just has the set-up. I showed it to someone as an incomplete draft and that person posted a link to it from their web log, so I'm adding this warning at the top.

In fact, I'm thinking of splitting this into two articles -- a "tutorial" which contains the whole working example and explanation, and a "demystified" which assumes that you already understand that stuff and want to cut to the salient points.

People reading this draft (HELLO, #revctrl!) could just look at these source files: zooko.com/badmerge/.

The darcs revision control system is the most advanced of the new crop of Free Software revision control tools [note: tone this sentence down when giving this article to the authors of alternative tools. ;-)]. (See my Revision Control Quick Reference Guide for a brief summary of these new tools.)

The core on which darcs is built is a novel idea called "patch theory". This idea allows users to manage their changes in a way that is simultaneously more powerful, more convenient, safer and simpler than popular alternatives such as Subversion. Unfortunately, this core idea is not widely understood, and many programmers are understandably cautious about using an automated tool to perform transformations on their source code when they don't understand precisely what these transformations will do.

This document is intended to give such programmers a "working knowledge" of darcs patch theory so that they can confidently predict what darcs will do when they use it to manipulate their source code. This article is built on three simple examples which, when taken together, illustrate the core concept of patch theory and how it differs from other tools such as Subversion. The examples -- shown in highlighted boxes throughout the article -- are complete, working examples so that if you type the commands shown into a computer that has darcs, then you will see the same results. (It is remarkable that in addition to being the most theoretically sophisticated of the current crop of tools, darcs is also the easiest to set up and use.)[this one too.]

The Basics

Like all modern revision control systems, darcs enables you to create new source code in your working directory, to make records of a series of changes and to store those for future reference, and to transfer changes you've made to other computers over a network. One unusual aspect of darcs is that every working directory is a repository -- there is no distinction between "server" and "client" in darcs. Instead, every darcs user has a local darcs "repository" which contains both their current working directory and their log of the complete history of patches which have been applied to this working directory.

Another aspect of darcs is that it is completely patch-based. There is no command to transfer files from one darcs repository to another, only commands to transfer patches from one darcs repository to another. When a patch arrives in a darcs repository, that patch is applied to the repository, causing a change to happen to a file (or to more than one file) in that repository.
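
To make "a patch is applied to the repository" concrete, here is a minimal sketch in C of one very simplified kind of patch: remove some lines at a given position in a file and insert others. Everything in it (the names TextFile, LinePatch and apply_patch, the fixed MAX_LINES limit, and the idea of holding a file as an array of lines) is an assumption made for illustration. It is not how darcs is implemented, and real darcs patches can do more than edit lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 64

/* A text file modelled as an array of lines (hypothetical type). */
typedef struct {
    const char *lines[MAX_LINES];
    size_t count;
} TextFile;

/* One simplified patch: at 0-based line `at`, remove the `n_old` lines
   listed in `old_lines` and insert the `n_new` lines in `new_lines`. */
typedef struct {
    size_t at;
    size_t n_old;
    const char *const *old_lines;
    size_t n_new;
    const char *const *new_lines;
} LinePatch;

/* Apply the patch in place. Returns 0 on success, or -1 (leaving the
   file untouched) if the lines to remove are not found at `at` or the
   result would not fit in MAX_LINES. */
int apply_patch(TextFile *f, const LinePatch *p)
{
    size_t i, tail;

    assert(f != NULL && p != NULL);

    /* The patch only applies if the file contains the old lines here. */
    if (p->at > f->count || p->n_old > f->count - p->at)
        return -1;
    for (i = 0; i < p->n_old; i++)
        if (strcmp(f->lines[p->at + i], p->old_lines[i]) != 0)
            return -1;
    if (p->n_new > MAX_LINES - (f->count - p->n_old))
        return -1;

    /* Shift the lines after the removed region, then copy in the new ones. */
    tail = f->count - p->at - p->n_old;
    memmove(&f->lines[p->at + p->n_new], &f->lines[p->at + p->n_old],
            tail * sizeof f->lines[0]);
    for (i = 0; i < p->n_new; i++)
        f->lines[p->at + i] = p->new_lines[i];
    f->count = f->count - p->n_old + p->n_new;
    return 0;
}
```

In this model, a patch that creates a file's contents is simply one with n_old set to 0 applied to an empty TextFile. A patch whose old lines do not match the file is refused rather than applied.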

Given these basic facts, let's now look at a concrete example of using darcs. We will later extend the example to show how darcs differs from Subversion.

Simple Example One

Simple Example One: Create A File, Share Your Changes

Suppose a user named Professor Applebaum creates a new darcs repository:

$ mkdir /home/applebaum/newrepo
$ cd /home/applebaum/newrepo
$ darcs init

Now she creates a file in this repository:

$ cat > teach_math.c
int square(int x) {
 int y = x;
 for (int i = 0; i < x; i++) y += x;
 return y;
}

Then she tells darcs to track all changes made to this file:

$ darcs add teach_math.c

Before recording a permanent history of these changes, she wants to view them to see if they are right, so she tells darcs to show what changes she has made locally that have not yet been recorded as a patch and added to the darcs repository's permanent collection of patches:

$ darcs diff
diff -rN old-applebaum/teach_math.c new-applebaum/teach_math.c
0a1,5
> int square(int x) {
>  int y = x;
>  for (int i = 0; i < x; i++) y += x;
>  return y;
> }

This output (in the traditional Unix "diff" format) shows that what she has done is to create a new file and insert these five lines into it. Satisfied that this is the change she intended to write, Professor Applebaum tells darcs to record the change. When she does this, darcs prompts her for a patch name with which she can later refer to this patch:

$ darcs record --all --author=applebaum 
What is the patch name? first version of teach_math.c
Do you want to add a long comment? [yn] n
Finished recording patch 'first version of teach_math.c'

Now, suppose another user, named Professor Bartleby, creates a darcs repository:

$ mkdir /home/bartleby/src
$ cd /home/bartleby/src
$ darcs init

Now Professor Bartleby runs "darcs pull /home/applebaum/newrepo". Darcs inspects Professor Applebaum's repository and offers to Professor Bartleby a list of all patches which are in Professor Applebaum's repository and not yet in Professor Bartleby's. Currently that list has only one patch in it:

$ darcs pull /home/applebaum/newrepo

Mon Aug  8 14:18:31 ADT 2005  applebaum
  * first version of teach_math.c
Shall I pull this patch? (1/1) [ynWvpxqadjk], or ? for help: y
Finished pulling and applying.

When the patch named "first version of teach_math.c" is pulled into Professor Bartleby's repository, it is applied to his working directory: it creates a file named "teach_math.c" and populates that file with the five lines of code that Professor Applebaum wrote.
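
The selection step of "darcs pull" described above (offer every patch that is in the other repository and not yet in yours) can be sketched in C as a simple set difference. This is only an illustration under loud assumptions: the function names has_patch and patches_to_offer are hypothetical, and identifying a patch by its name alone is a simplification made for brevity, not a description of how darcs identifies patches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` occurs among the first `n` entries of `patches`. */
static int has_patch(const char *const *patches, size_t n, const char *name)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(patches[i], name) == 0)
            return 1;
    return 0;
}

/* Copies into `out` every patch name in `remote` that is missing from
   `local`, keeping the remote order, and returns how many were copied.
   `out` must have room for `n_remote` entries. */
size_t patches_to_offer(const char *const *remote, size_t n_remote,
                        const char *const *local, size_t n_local,
                        const char **out)
{
    size_t i, n_out = 0;

    assert(out != NULL);

    for (i = 0; i < n_remote; i++)
        if (!has_patch(local, n_local, remote[i]))
            out[n_out++] = remote[i];
    return n_out;
}
```

With Professor Applebaum's repository holding the single patch "first version of teach_math.c" and Professor Bartleby's repository still empty, this sketch offers exactly that one patch, which matches the "(1/1)" prompt in the transcript.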

Simple Example One Continued: Professor Bartleby Writes A Patch

Now Professor Bartleby decides to add a new function to the file. He adds a function named "fast_square" and renames the function "square" to "slow_square". (In order to make this example easy to transcribe, he does this using the "cat" Unix tool rather than using an interactive text editor.)

$ cat > teach_math.c
int fast_square(int x) {
 int y = x;
 return y * y;
}

int slow_square(int x) {
 int y = x;
 for (int i = 0; i < x; i++) y += x;
 return y;
}

Now he inspects his changes using "darcs diff":

$ darcs diff
diff -rN old-bartleby/teach_math.c new-bartleby/teach_math.c
1c1,6
< int square(int x) {
---
> int fast_square(int x) {
>  int y = x;
>  return y * y;
> }
> 
> int slow_square(int x) {

Satisfied, he records the change:

$ darcs record --all --author="bartleby"
What is the patch name? add fast_square, rename square to slow_square
Do you want to add a long comment? [yn] n
Finished recording patch 'add fast_square, rename square to slow_square'

Simple Example One Continued: Applebaum Fixes A Bug

Meanwhile, while Professor Bartleby is writing his new "fast_square" function, Professor Applebaum is inspecting her source code, and she realizes that there is a bug in the original "square" function. It is easy to fix, and Professor Applebaum proceeds to do so:

$ cat > teach_math.c
int square(int x) {
 int y = 0;
 for (int i = 0; i < x; i++) y += x;
 return y;
}
$ darcs diff
diff -rN old-applebaum/teach_math.c new-applebaum/teach_math.c
2c2
<  int y = x;
---
>  int y = 0;

$ darcs record --author=applebaum --all --patch-name="fix bug in square()"
Finished recording patch 'fix bug in square()'
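
For readers who want to see the bug itself, here is a small C sketch that puts the two versions of the function side by side. The names square_original, square_fixed and check_squares are mine, chosen only so that both versions can live in one file. In the article's repository the function is simply called "square". The original starts y at x and then adds x to it x times, so for non-negative x it returns x*x + x instead of x*x.

```c
#include <assert.h>

/* The function as first recorded: y starts at x, so the loop's x
   additions of x give x + x*x. */
int square_original(int x)
{
    int y = x;
    for (int i = 0; i < x; i++) y += x;
    return y;
}

/* The function after the bugfix patch: y starts at 0, so the loop's x
   additions of x give x*x. */
int square_fixed(int x)
{
    int y = 0;
    for (int i = 0; i < x; i++) y += x;
    return y;
}

/* Self-check over small non-negative inputs only; negative inputs and
   overflow are outside the scope of this sketch. */
void check_squares(void)
{
    for (int x = 0; x <= 10; x++) {
        assert(square_fixed(x) == x * x);
        assert(square_original(x) == x * x + x);
    }
}
```

For example, square_original(3) returns 12 while square_fixed(3) returns 9.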

Simple Example One Completed: The Bugfix Meets The New Function

After Professor Applebaum has recorded this bugfix patch, Professor Bartleby runs his "darcs pull" command again to check for new patches in Professor Applebaum's repository:

here's where the rest of the article goes

The Difference -- Where Do The Changed Lines Go?

Now consider the following example. Suppose one darcs user on one computer

Addendum: Darcs in Practice

Troublesome Corner Cases

The unique behavior of darcs is a substantial improvement over the alternative revision control systems. Furthermore, the current darcs implementation is very easy to learn and easy to use. However, I hesitate to recommend the use of darcs for large projects. The only reason to hesitate is that there is an unsolved problem about how darcs ought to manage certain corner cases, and if you happen to hit one of these corner cases then you will have to stop doing your own work for a couple of hours and implement a manual work-around.

Whether the benefits of using darcs outweigh this drawback will depend on the specifics of your project and your development team.

There are two related corner cases that you have to watch out for. The first is when two people separately enter identical or near-identical changes into their separate darcs repositories. For example, if two people each get a copy of a file, put the file into their respective darcs repositories, run "darcs add" on the file, and run "darcs record", then the result will be two separate patches which each add an identical file. If two or three such duplicate changes apply to a single file during a darcs pull, darcs might effectively "lock up" as it tries an exponential number of possible combinations in an attempt to resolve the multiple layers of conflicts.

The second corner case is when there are multiple successive "conflict, merge, reconflict" changes between two people. This happens rarely since, as we have seen, darcs eliminates spurious textual conflicts and requires user intervention only for unavoidable textual conflicts. However, it can happen, and when it does, darcs will again effectively lock up.

If you encounter either of these situations, your work can be saved by removing all of the troublesome patches from all of the darcs repositories in use by your entire development team and re-recording your work as a new darcs patch. After implementing that work-around, no further ill effects will be felt from those troublesome patches.

Handy Tip: "darcs whatsnew"

Note: in practice, you should not use "darcs diff", which is slow on large repositories, but instead "darcs whatsnew", which is fast regardless of the size of the repository and which provides nicer output. "darcs diff" exists primarily for backwards compatibility with the traditional Unix "diff" command, and it was used instead of "darcs whatsnew" in this article only because of its comforting familiarity.

Acknowledgements

Thanks to David Roundy, Graydon Hoare, Tom Lord, Aaron Bentley, Ken Schalk, and Bryan O'Sullivan.


Zooko
Last modified: Thu Sep 1 13:13:09 ADT 2005