A Django site.
November 11, 2008

Hans Fugal
no nic
The Fugue :
» git-push is worse than worthless

Ugh

Let's say you are a web developer, and you do development on your laptop, then when things are nice and shiny you want to push those changes to the webserver. Seems natural enough, right?

git clone server:/var/www/foo
# ...
git pull

# edit stuff
git commit -a -m 'i edited stuff'

Now, let's say you're a (possibly former) darcs/bzr/mercurial user and this time you're using git. Git has git-push. You read the man page like a good little code monkey, and it seems like it does the same or similar thing to darcs push or hg push. It seems like if you want to push your changes to the server, you'd do this:

git push

Am I off in left field or does this not seem 100% rational? But wo be unto the code monkey that utters this unfortunate incantation. Observe:

$ mkdir foo
$ cd foo
$ git init
Initialized empty Git repository in /private/tmp/foo/.git/
$ echo hello > foo.txt
$ git add foo.txt
$ git commit -m 'hello'
Created initial commit bee50da: hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ cd ..
$ git clone foo bar
Initialized empty Git repository in /private/tmp/bar/.git/
$ cd bar
$ echo goodbye >> foo.txt
$ git commit -a -m goodbye
Created commit 99c13c1: goodbye
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push ../foo
Counting objects: 5, done.
Writing objects: 100% (3/3), 248 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To ../foo
   bee50da..99c13c1  master -> master
$ cd ../foo
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   foo.txt
#
$ git diff
$ git diff --cache
error: invalid option: --cache
$ git diff --cached
diff --git a/foo.txt b/foo.txt
index a32119c..ce01362 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1,2 +1 @@
 hello
-goodbye
$ git log
commit 99c13c1e60888ae2c0e221898411e1cd52ad3815
Author: Hans Fugal <hans@fugal.net>
Date:   Mon Nov 10 17:11:57 2008 -0700

    goodbye

commit bee50da72798edc47ddc36dbc4f559f141b1e28b
Author: Hans Fugal <hans@fugal.net>
Date:   Mon Nov 10 17:11:34 2008 -0700

    hello

I promise I didn't fake that. Yes, you saw that correctly—git wants to undo the changes you just committed. If you happen to have a clean working directory, all you need to do to return to sanity is git reset HEAD. If not, heaven help you.

This is totally unacceptable. It's unforgivable on so many levels. At the very least, the manpage should warn you to not push to repositories with working copies. Git should warn you before you push and screw up your repo that it has a working copy checked out. Ideally, git would behave like darcs and update the working copy. Suboptimally, it would behave like mercurial and make it a new revision that you have to manually checkout. But this is simply ridiculous.

So what is the solution? They tell you to use pull. Hello! Is anyone home? My laptop is roving. It's often behind a NATing firewall. I'm supposed to find my public IP address and figure out how to subvert the evil firewalls of the world every time I want to push my changes to the server?

A workaround, and probably the best real-world workflow, is to have a second bare repository or a second branch on the server, push into that, then ssh into the server and pull the changes. I think this page describes how to do that with a second branch, though I'm short on time to actually try it out at the moment.

More of this sickening story in this thread, where you will learn that at least one other person out there has his head screwed on properly, that the developers are more interested in how hooks work (and fail to allow you to do this even if you grok them), and that they've discussed the problem before and decided the correct response is to RTFM (M for minds this time, since the manual was completely unhelpful).

Update: Some of you have been quick to defend git and the design choice of how push behaves. I want to clarify that I don't care so much that push updates the repository but not the working directory. Mercurial works this way too. Not the way I'd do it but it's a valid approach. The problem here is that git push seems like a natural thing to do but screws up your working directory on the remote side. Mercurial doesn't change the working directory, but neither does it silently rebase it and set you up to undo your changes if you're not careful. The problem here is a lack of safety and a lack of warning. They know it's a problem, they've fielded enough "morons". A few words of warning in the man page is all it would take to make me happy.

And now, I have had time to work out a more specific workaround. Here's what I did, and it seems to work well:

# Server setup (set up incoming branch)
server$ git branch incoming
server$ git branch
  incoming
* master
  origin

# Laptop setup (local master to remote incoming)
laptop$ git config remote.origin.push master:incoming

# Everyday usage
laptop$ git push
Counting objects: 5, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 279 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To server:/tmp/foo
   b108a07..a9d3282  master -> incoming

server$ git status
# On branch master
nothing to commit (working directory clean)
server$ git pull . incoming
From .
 * branch            incoming   -> FETCH_HEAD
Updating b108a07..a9d3282
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

You could use hooks to automatically do that git pull . incoming if you liked, making it more like darcs than mercurial.

Updated update: On further thought, the cleanest solution is probably to have a separate master (bare) repository, e.g.

$ mkdir master
$ cd master
$ git init
Initialized empty Git repository in /private/tmp/foo/master/.git/
$ git commit --allow-empty -m initial
Created initial commit 999755e: initial
$ cd ..
$ git clone master live
Initialized empty Git repository in /private/tmp/foo/live/.git/
$ cd live
$ git branch
* master
$ cd ..
$ git clone master laptop
Initialized empty Git repository in /private/tmp/foo/laptop/.git/
$ cd laptop
$ echo hello > foo.txt
$ git add foo.txt
$ git commit -m hello
Created commit 2297bcf: hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ git push
Counting objects: 4, done.
Writing objects: 100% (3/3), 239 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To /tmp/foo/master/.git
   999755e..2297bcf  master -> master
$ cd ../live
$ ls
$ git pull
remote: Counting objects: 4, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /tmp/foo/master/
   999755e..2297bcf  master     -> origin/master
Updating 999755e..2297bcf
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ echo goodbye >> foo.txt
$ git commit -a -m goodbye
Created commit 04f6702: goodbye
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push
Counting objects: 5, done.
Writing objects: 100% (3/3), 248 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To /tmp/foo/master/.git
   2297bcf..04f6702  master -> master
$ cd ../laptop
$ git pull
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /tmp/foo/master/
   2297bcf..04f6702  master     -> origin/master
Updating 2297bcf..04f6702
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

March 31, 2008

Elijah Newren
no nic
Elijah's Blog
» Many different kinds of revision specifiers

Version control systems each use their own method to refer to different versions (also known as ‘revisions’) of the repository. The choice of revision specification often reflects underlying data structures, and the choice of data structures often inhibits or enables various features for the system. Additionally, the methods of displaying and using revision specifiers can also affect the ease with which users can learn and use the new system.

Unfortunately, a full comparison is beyond the scope of this post. I will concentrate on simply introducing the basics and giving a flavor for how things are layed out, which itself is a long enough topic. While conclusions could be drawn with just the data and explanations presented here, I am intentionally avoiding doing so and leaving such to possible later posts. (Besides, bloody taxes and the brain-damaged US tax code have stripped me of any time that I would need to write such additional comparisons.)

Warning: My pictoral representations for each system will be crazier and more complex than usual (and even more lopsidedly complex for some systems than others) in order to keep things short while still showing what is possible.

cvs

Method

See cvs revision numbers and cvs branching basics, particularly figure 2.4 near the end of the branching basics section.

CVS has revision identifiers that are per-file, meaning that repositories at any given time are a combination of many different revisions (one for each file). Ignoring an ugly technical detail about the special revisions 1.1.1.1 and 1.1.1, the first version of a file is numbered 1.1. The next change to the file is recorded as 1.2, the next is 1.3, and so forth. If the user wants to create a branch, based on the 1.3 version of a file, then the branched version is 1.3.2.1. Changing and committing the file on the branch results in 1.3.2.2, then 1.3.2.3, etc. A second branch also created off of 1.3 would be numbered 1.3.4 instead of 1.3.2 (with actual commits numbered 1.3.4.x).

Note that branches are named by a revision with one less number (e.g. 1.4.2 is the name of the branch with commits numbered 1.4.2.x). As such, branch names refer to the beginning of the branch. Each file is branched separately, with per-file revision numbers (it is even possible to branch some files without branching others).

Tags are aliases for a specific version number. Since revisions are per-file, a given tag may refer to different revision numbers for different files (e.g. the ‘v1.0′ tag might refer to version 1.27 of foo.c, 1.36 of bar.h, and 1.218 of foobar.py)

Uniqueness of cvs revisions is not an issue since there is only one repository.

Picture

                       (etc)
                         |
             (etc)   1.4.4.3.2.2
               |         |
            1.4.4.5  1.4.4.3.2.1
               |         |
            1.4.4.4  (1.4.4.3.2)
               |     /
               |   /
            1.4.4.3
               |
  1.4.2.2   1.4.4.2
     |         |
  1.4.2.1   1.4.4.1
     |         |
  (1.4.2)   (1.4.4)
      \       /
       \     /
        \   /
         \ /
         1.4
          |
         1.3
          |
         1.2
          |
         1.1

svn

Method

See svn revisions and working with your branch, particularly figure 4.4 (the branching of one file’s history).

svn uses global revision identifiers, with the first revision being marked as 1, the second as 2, the third as 3, etc.

Branches have an unusual implementation in subversion; they are handled by a namespacing convention: a branch is the combination of revisions within the global repository that exist within a certain namespace. Creating a new branch is done by copying an existing set of files from one namespace to another, recorded as a revision itself.

Tags (an alias for a specific version in history) don’t exist in subversion. Instead, subversion again uses a namespacing convention identical to that done for branches (thus making tags and branches indistinguishable in subversion other than the chosen names), and users are merely discouraged from committing additional changes to files within a tag namespace.

Uniqueness of svn revisions is not an issue since there is only one repository.

Technically, a revision could simultaneously modify any combination of branches and tags by simply committing to all namespaces; however, this is typically discouraged and users only have a certain namespace checked out at a time.

Picture

  trunk   branches/proj-2-22  branches/proj-2-20  tags/RELEASE_2_22_2
   24
                                                        23
                 22
   21
   20
                 19
                                     18
                                     17
   16
   15
                 14
                 13
                 12
   11
                                     10
    9
    8
    7
                                      6
    5
                                      4
    3
    2
    1

bzr

Method

See understanding bzr revision numbers and specifying bzr revisions.

bzr, like svn, uses 1, 2, 3, etc. for revision numbers. However, the revision numbers are always consecutive in a branch. Merged in changes from other branches are given 3 numbers per revision. For example, if changes were merged from a repository that has changes relative to revision 2, the changes would come into the current branch numbered 2.1.1, 2.1.2, 2.1.3, etc. If changes from more than one branch are relative to the same commit, then the middle number is used to distinguish commits from the different branches. Thus one would see another set of changes relative to commit 2 numbered as 2.2.1, 2.2.2, 2.2.3, 2.2.4, etc. (Versions of bzr older than 1.2 used more than 3 numbers in certain cases, but that is no longer true of current versions.) See the picture below to make this clearer.

Branches in bzr are done by creating separate directories (typically with their own repository), though one can set up shared repositories. Each branch will have its own numbering scheme for the revisions it stores, recording the order that the revisions entered that repository. (See below about uniqueness issues.)

Tags in bzr are an alias for a commit, and are stored as part of a branch.

Note that bzr revision numbers are not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both. bzr does store unique identifiers for revisions, known as revid’s (an example of which looks like Matthieu.Moy@imag.fr-20051026185030-93c7cad63ee570df), though they are not shown by default. Users can obtain these unique identifiers by passing the –show-ids flag to bzr log, and these revids can be used in place of the simpler default revision specifiers when prefixed with “revid:”.

Picture

              12
              |
              11
            / | \
          /   |  \
        /     |   \
      10    4.1.5  4.2.2
       \   /  |      |
        \ /   |      |
         9    |    4.2.1
        / \   |   /
       /   \  |  /
       8    4.1.4
       |      |
       7    4.1.3
       | \    |
       |   \  |
       6    4.1.2
       |      |
       5    4.1.1
        \   /
          4
          |
          3
          |
          2
          |
          1

Note: The revision identifiers shown in this picture are dependent on merge order; the revisions 4.1.5, 4.2.1, and 4.2.2 could instead be numbered 4.2.1, 4.1.5 and 4.1.6 respectively if the merges done to obtain revision 11 were done in a different order.

git

Method

See Understanding git history: Commits, and naming git commits.

git uses cryptographic checksums (in particular, sha1sums) of repository contents as revision identifiers. These checksums are 40-character hexadecimal strings (e.g. 621ff6759414e2a723f61b6d8fc04b9805eb0c20). Each revision also knows which revision(s) it was derived from (known as the revision’s parent(s)).

Git can be used with one branch per directory like bzr or hg, but it is more common to have branches stored within the same directory/repository (thus the reason some refer to git as a ‘branch container’). In git, branches record the revision of the most recent commit for the branch; since each commit records its parent(s), a branch consists of its most recent commit plus all ancestors of that commit. When a new commit is made on a branch, the branch just records the new revision. Tags simply record a single revision, much like branches, but tags are not advanced when additional commits are made. tags are not stored as part of a branch or in a revision controlled file, though by default tags that point to commits that are downloaded are themselves downloaded as well.

git revisions are unique by design; if you have the same revision in two different repositories, the revision name for both will be the same.

git does provide more human-meaningful ways of referring to commits, in the form of simple suffixes used to count backwards in history from the tip of a branch (or backwards from a tag or commit). This includes methods for counting relative to different parents, making the suffixes have structural meaning. However, such methods are somewhat hidden; for example, they are not shown in the output of git log. This leaves many users unaware of how to take advantage of them, if they are aware of them at all. (A simple wrapper can get them to be shown, at the cost of a little time; they could be shown at negligible time cost with an integrated solution, but none exists to my knowledge.)

Picture

           650a6f...
              |
           caf806...
          /   |   \         719b9d...
        /     |     \       /
      /       |       \   /
 75cc2c...  147c0a... acac44...
      \       |         |
        \     |         |
         8f50e6...    8147be...
         /    |     /
       /      |   /
  9b39b2... 6e2cde...
    |         |
  01fa22... 1a9d90...
    |    \    |
    |      \  |
 46508c...  b6765c...
    |         |
 1c4e8d...  328638...
       \     /
       6627f7b...
          |
       754b42...
          |    \
          |      \
       d1879f...  fba5d0...
          |
       c962db...

hg

Method

See a hg tour through history, and section 2.4.1, “Changesets, revisions, and talking to other people”.

hg uses a method that may look like a mix of the methods used by git and bzr; it has two distinct methods of referring to each revision. Like git, hg uses sha1sums to refer to revisions (though it abbreviates them to fewer characters by default). Like bzr, hg uses the numbers 1, 2, 3, etc. to refer to revisions. Thus hg has one unique method to refer to revisions and another that is simple and easily manipulatable by users. Each revision (or “changeset” in mercurial’s vocabulary) is of the form revision-number:changeset-identifier (e.g. 3:ff5d7b70a2a9).

Like bzr, branches in hg are typically done by creating separate directories (typically with their own repository). However, it also has named branches for naming branches within a repository, which are somewhat similar to git. (I have been told there are important distinctions between hg named branches and git branches, but I do not fully understand all the details; maybe someone will explain in the comments.)

mercurial has both tags and local tags, with (normal) tags being stored in an .hgtags file that is version controlled, and local tags being stored in a file that is not version controlled nor shared (cloned/pulled/pushed/etc.). Like most other systems, tags in hg are an alias for a specific commit.

The (abbreviated) sha1sum portion of hg revisions (the “changeset identifier”) is unique by design; if you have the same revision in two different repositories, the changeset identifier for both will be the same. The simple number portion of hg revisions (the “revision number”) is not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both.

Picture

             19:c87f92...
                |
             18:650a6f...
               |      \
        15:caf806...   \
         /     |        \
       /       |         \
      /        |          \
13:75cc2c... 14:147c0a... 17:acac44...
      \        /           |
        \     /            |
       12:8f50e6...      16:8147be...
         /    |        /
       /      |      /
9:9b39b2... 11:6e2cde...
    |         |
8:01fa22... 10:1a9d90...
    |    \    |
    |      \  |
5:46508c... 7:b6765c...
    |         |
4:1c4e8d... 6:328638...
       \     /
      3:6627f7b...
          |
      2:754b42...
          |
          |
      1:d1879f...
          |
      0:c962db...

Final notes

Each system uses a different scheme, which have different advantages and disadvantages. Odds are that I am not aware of all the relative merits of these systems yet, though I do know some. Personally, I don’t think any of them are optimal (though I admit that optimality is a somewhat relative term given the inherent trade-offs involved). Unfortunately I’m going off-topic, as I said I wouldn’t be discussing advantages and disadvantages in this post, so I’ll shut my trap here…

March 1, 2008

Elijah Newren
no nic
Elijah's Blog
» Happenings in the VCS world

It has been a long time since my last blog post on VCSes. I am getting back into the swing of things and will be making a few more posts. Besides, Olav doesn’t have enough to do and he wants more of my long rambling posts to digest.

The VCS world is becoming more and more interesting, even if it is also more and more frustrating. I’ll briefly point out a few things I have seen happen in the last few months that look cool, making this VCS post a little bit different than my others.

cvs

Stinking stingy CVS refuses to die…it seems to prefer slowly petrifying over the years or something. It was great a number of years ago, but there’s just so many better tools these days. However, there does appear to be a light at the end of the tunnel. The last place I am forced to use CVS (work) will finally be switching (to subversion) in a couple months. Woohoo!

svn

I haven’t seen any big changes in subversion itself (only one bug fix release has occurred). However, it looks like they are making progress on finally implementing useful merge functionality. This is interesting on a number of levels: (1) this lack of functionality was one of the big reasons subversion sometimes looks like a (very well polished) antique rather than a modern system; will the incorporation of this feature be enough to stave off some of the ongoing defections to other systems?, (2) this may be interesting for those using bzr-svn, hgsvn, or git-svn — are users of such systems going to find it even easier to use their preferred tool?, (3) the main reason svn’s dozen or so ugly renaming bugs (some of which essentially result in corrupted data) have gone almost completely unnoticed is that most are only triggered in merge operations and subversion’s current merge functionality is so primitive and problematic that hardly anyone uses it. Further, svn’s roadmap clearly lists fixing the rename problems in a different release, after the merge fixes are included. Will the extra visibility that one problem will receive due to a different problem being fixed make subversion look more problematic or less? This will be fun to watch.

On a separate note, it is interesting to see that subversion developers are considering adopting some features of distributed VCSes — sometime in the distant future. An easy to miss but interesting nugget from that email is the following:

Fortunately, we’ve pretty much agreed, IIRC, that we’re willing to punt on subdirectory detachability in working copies in order to get performance improvements.

I have often seen svn and cvs proponents argue that as one of the big advantages to those systems, yet it looks like the svn developers are willing to drop it. Very interesting indeed.

hg

Mercurial version 0.9.5 was released since I did my last round of VCS blog posts and it is on my system. hg-0.9.5 has quite a number of improvements; the one that particularly caught my eye was support for subversion as a source SCM in its convert functionality. When I first looked at mercurial, they suggested people use git-svn and then convert from git to hg. To me, that seemed to push people to just use git. It looks like this has changed.

I have often found it somewhat strange that mercurial doesn’t have more active vocal proponents. Usually one hears from the git or bzr proponents, but not so much from mercurial. Yet it has always had many of the advantages of both (and, in some ways seems to have the most svn-like UI, and would seem a more natural transition for svn converts). I guess it’s a case where having most of the advantages or capabilities of other systems (even multiple other systems) yet not clearly standing out in one particular area will rob you of the active advocates that you could otherwise have. Of course, maybe it’s like the linuxjournal reader’s choice awards phenomenon too; the noise or results that others hear may only be indicative of a certain small subset of the community.

bzr

A lot has happened in the Bazaar world. They had their big 1.0 release in mid-December and are now up to bzr-1.2. They have made impressive gains in performance, particularly with their adoption of the pack idea from git, and it appears they have at long last caught up to the leaders in the field in this area.

Near the end of last year, I corresponded about early versions of the “Main Competitors” writeups of the Why Choose Bazaar page, with Ian Clatworthy. I pointed out some advantages of bzr he hadn’t included, mentioned how some bold claims had no accompanying proof, and pointed out some places where he seemed to be unaware of capabilities of other systems or where I disagreed with some of his claims. The final versions seem to have mixed results; part of my feedback was addressed (and more was addressed in follow-ups), but other parts were not. I’m particularly puzzled by the reticence to investigate the existing capabilities of other systems and the willingness to claim features of bzr as advantages without determining whether they are actually unique. Regardless, though, while one does need to individually verify or discard each claim, the writeups are fairly impressive. I probably need to get back in touch with Ian again.

git

I’m so annoyed with Carl right now. He was the one who introduced me to git a number of years ago, and showed me some really cool things about it. I dropped it almost immediately at the time because it was way too hard to use. But, I’ve always been interested in it and made occasional attempts to tame the dragon ever since.

As many are aware, git has made huge strides towards usability in the 1.5 series, and has recently introduced automatic repacking in git-1.5.4. Because of all this work, I made diligent attempts to understand it over the last couple months. In doing so, I finally had the necessary epiphanies to feel I understand it. It turns out I was able to use it productively long before the uncomfortable feeling of I-don’t-really-understand-this-thing was finally expelled. The result? I found that there are several features of git not present in other systems that I am absolutely addicted to, but looking back on the journey I can’t say that it would be worth the effort for others to follow the same path, despite these awesome features. The thing is still too bloody hard to figure out.

One of my desires for my blog posts series was to point out how horrible the git manpages (i.e. the built in help system for git) are for new users, but I felt uncomfortable doing so until I actually understood them. I was not able to understand even the synopsis of the git-diff manpage until a couple weeks ago. And I tried. Hard. Over days, weeks, and months. I read up on reflogs, the index, git’s storage format, the git tutorial and all kinds of other documentation. I feel stupid now, because I was just missing something simple and now seemingly obvious. But from what I can tell, little should-be-obvious-but-aren’t things like this are blocking lots of people from being able to use git.

Long story short: git has become far more usable…mere mortals can actually figure the system out (a big change from earlier versions) if they have an unusually large level of patience and motivation. git has some really awesome features, but I just can’t recommend it to others in its current state.

December 15, 2007

Hans Fugal
no nic
The Fugue :
» Tailor, Mercurial, Leopard

I was getting this error when trying to use Tailor with Mercurial on Leopard:

Common base for tailor exceptions: 'hg' is not a known VCS kind: No module named mercurial

Mercurial was installed, so the error mystified me. Turns out to be a problem with python versions in use. Leopard's python is version 2.5.1, but mercurial from MacPorts depends on python 2.4, and so MacPorts installs python24. So when running tailor with python 2.5 and trying to load the mercurial module which is installed for 2.4, not 2.5, it naturally fails. The workaround is to run tailor with python 2.4 instead, which works fine.

#! /bin/sh
# Save as ~/bin/tailor and chmod +x
tailor=$HOME/src/tailor/tailor
exec python2.4 $tailor

December 11, 2007

Hans Fugal
no nic
The Fugue :
» Darcs2 Prerelease

Levi referred me to a post entitled "How I stopped missing Darcs and started loving Git". It's an interesting read, but ironically the thing that interested me most was a comment mentioning that just yesterday darcs 2.0.0pre1 was released. It looks like some very exciting things are coming down the darcs pipe:

  1. It should no longer be possible to confuse darcs or freeze it indefinitely by merging conflicting changes.
  2. Identical primitive changes no longer conflict.
  3. Darcs get is now much faster, and always operates in a "lazy" fashion, meaning that patches are downloaded only when they are needed.
  4. Darcs now supports caching of patches and file contents to reduce bandwidth and save disk space.
  5. Speed improvements.

The most exciting change is, of course, the elimination of the exponential merge. This is very good news indeed, if it means what I think it means. The second change I listed is also very interesting to me. You'll remember I posited that exact situation in a previous post, and was teased incessantly as a result.

Do read the release notes. If darcs2 is released within a reasonable time frame, it will continue to be a strong contender.

November 21, 2007

Hans Fugal
no nic
The Fugue :
» hgk with MacPorts

If you use mercurial from MacPorts, you need the following in your ~/.hgrc to enable the hgk extension (GUI tree browser, invoked with hg view):

[extensions]
hgk=

[hgk]
path=/opt/local/share/mercurial/contrib/hgk

While you're in there, enable the mq, fetch, and record extensions.

November 17, 2007

Hans Fugal
no nic
The Fugue :
» Mercurial and Darcs

I'm a long-time distributed revision control user and advocate. I think it's the only way to go, but this article isn't about convincing you of that. I'll just take it for granted.

From almost the beginning, I have used darcs. I did try Tom's ahem Lame Arch, which I found to be completely unusable. Darcs hasn't been around as long as arch, but it is in roughly the same generation as other early systems like monotone and bazaar. I think of these as the second wave of distributed revision control (the first wave being, basically, tla). When I read about darcs I was impressed by the theory of patches. When I tried darcs I was very impressed by how easy it is to use. Soon it became second nature and I never looked back.

But I did look forward, and sideways. When collaborating on a project using svn I gave svk a whirl. I don't hesitate to recommend svk if you're otherwise stuck with svn.

When the third wave of distributed revision control came along, ushered in by the Great Linux/BitKeeper Fiasco, I took a cursory look at git. A couple years later (aka earlier this year) I took a closer look. I can see that it has come a long way (in user interface). You don't need "chrome" or whatever they used to call the wrappers like cogito anymore. It's fairly usable. But it's also fairly complicated (that's an understatement, if you left your understatement detector at home). It's perfectly tailored for what Linus and other kernel developers need, but that's not necessarily what the rest of us need. I wouldn't hesitate to nod my head in agreement if you moved from CVS or subversion to git.

Not entirely satisfied with git, and not being able to use darcs (because of the dreaded exponential merge, and general slowness), I looked at Mercurial (Hg from now on). I found that it was fast like git, but easy to use like darcs. The documentation is excellent, probably the best in the game. The underlying theory is easy to grok, and this is important. Darcs is the same way, although the two theories are definitely different. It has served me well in the application where I couldn't use darcs, but I remained a darcs user for other things.

But mercurial has been seeping into my consciousness of late. Rumors surface from time to time of darcs users migrating to mercurial (it seems to be the only one they migrate to). Other people praise it too. I find it plenty easy to use even as a second-class arrow in my quiver. Finally it gains enough ground that I decide to use it for a project where I'm actually exercising the version control system. I have really liked what I've seen. So the rest of this already-long post will be a comparison of darcs and mercurial. I'm not saying either one or even both together are "the best" by any objective measure, though I definitely prefer them over others myself. I don't know that I'll come to any conclusion over whether to use darcs or mercurial in the future (I haven't yet, anyway, though I'm leaning towards mercurial). I just want to get my comparison out there. Note also that I'm probably not going to mention every feature that I love that they both have in common. As such this isn't an evangelism for either one over the rest of the pack. Just a comparison of what distinguishes these two.

First, darcs. Darcs has excellent support for cherry picking. Its user interface is second to none. It is easy to mail a set of patches to another darcs user or to someone in general. The theory of patches is very interesting and flexible, and easy to follow (until strange things happen, when it gets really hard to follow). It's written in Haskell by a physicist, which has got to count for something. darcs record, which will ask you whether you want to check in each hunk that's changed in your working directory, rather than an all-or-nothing commit, is a joy to work with (especially for those of us who seem to lack the discipline to check in features and bugfixes in neat little packages).

On the other hand, darcs uses _darcs to store its metadata, which is probably for cross-platform reasons but is definitely an eyesore. (.darcs would be better). It's annoyingly inconvenient to grab a specific revision (partly because such a concept doesn't really exist) or a specific point of time. Darcs has no idea of branching. Or more precisely, every darcs repository is its own branch and so it leaves that up to you. That's theoretically ok, but I do prefer to keep the clutter of multiple working copies at bay. Combined with not having to rebuild the whole project from scratch and not wasting the disk space for a large working directory, in-place switching to certain branches/revisions like git and hg do is really nice. Darcs doesn't support symlinks or permissions, again for cross-platform reasons. The two real thorns in the side of darcs are the dreaded exponential merge (I have run into it, alas), and the fact that I have a devil of a time getting it to run everywhere I am. Haskell is not that common, and it's a big thing to have to build, e.g. with MacPorts. That's when it will build at all (but that MacPorts rant is for another post). There are binaries for windows and mac, but sometimes they don't work and versions don't match up... it can be a nightmare. It can usually work out, but it can be difficult.

I mentioned cherry picking. One UI flaw in darcs that greatly reduces the real-world utility of its excellent cherry picking support is that you can't tell it that you would prefer to refuse this patch for ever and ever. Every time you pull you have to tell it "no, I don't want that patch". This makes maintaining similar but subtly different branches impractical. A better tool for this job is quilt. In practice, I've done precious little cherry picking. It's always nice to know I can, but after some deep thought on the subject over the last couple days I'm not at all convinced that it's worth supporting in the revision control system. More on that in another post.

Now, mercurial. It's fast, easy to use, and easy to install. It is written in Python (with some C) and works well on all three platforms. The .hg directory is not an eyesore. hg is quick and easy to type. It supports branching. It has a powerful extension mechanism, with many interesting extensions available. The theory is easy to grok and follow, so you're not surprised very often. The implementation seems excellent, from the point of view of both a user and a system administrator. CVS-like abbreviations for minimal typing are handy (like hg ci for hg commit). I've already mentioned the excellent documentation. I might as well mention an excellent Google Tech Talk given by the author of said documentation. hg clone is fast and efficient and uses hardlinks where it can (for the metadata, not the working directory). It has a built in webserver for quick and easy web-based collaboration (for firewall reasons or for an interface to the repo for non-hg users). It's storage efficient, and has the novel and all-important optimization principle of avoiding disk seeks. That hg, written in python, gives git a run for its money shows that the devs know what they're doing when it comes to systems programming. Mercurial Queues is extremely cool and useful.

As for the cons, although the theory is easy to grok and doesn't surprise you much, it will surprise you as a newcomer unless you grok the theory. This is especially true of pull and push which don't update the working directory. Some of the most useful stuff is provided by extensions that are disabled by default (at least they are distributed). fetch, record (for darcs-like hunk-by-hunk recording), mq, bisect, transplant (for cherry picking). It doesn't have the cherry picking abilities of darcs, though there is the record extension (for cherry picking from your working directory) and the transplant extension (for cherry picking proper). I haven't used the latter yet so I don't know if it works well or not. .hgignore is easy to manage, but comes with almost no useful defaults. This is for performance reasons apparently, but I'd be willing to take the hit. Let me override with a minimal .hgignore if performance matters that much on this project. I don't like that I have to type ssh urls out, as ssh://example.com//path/to/repository, and it's too difficult to set the default push/pull location so I end up having to type it more than I'd like. Directories aren't first class, which makes me slightly uneasy. It hasn't caused me grief yet though.

I don't like that you don't get compatible repositories if you start from the same code. For example:

tar xzf foo-1.0.tar.gz
rsync -a foo-1.0 foo-1.0-2
cd foo-1.0
hg init
hg ci -Am 'initial'
cd ../foo-1.0-2
hg init
hg ci -Am 'initial'

At this point you cannot pull or push from one of these repositories to the other, although they are semantically identical. I do this from time to time with darcs.

Last, and while certainly not least it's certainly not the biggest deal, mercurial's website is ugly.

So there you have it. Look for a post on quilt and MQ and some thoughts on cherry picking soon.