A Django site.
November 11, 2008

Hans Fugal
no nic
The Fugue :
» git-push is worse than worthless

Ugh

Let's say you are a web developer, and you do development on your laptop, then when things are nice and shiny you want to push those changes to the webserver. Seems natural enough, right?

git clone server:/var/www/foo
# ...
git pull

# edit stuff
git commit -a -m 'i edited stuff'

Now, let's say you're a (possibly former) darcs/bzr/mercurial user and this time you're using git. Git has git-push. You read the man page like a good little code monkey, and it seems like it does the same or similar thing to darcs push or hg push. It seems like if you want to push your changes to the server, you'd do this:

git push

Am I off in left field or does this not seem 100% rational? But wo be unto the code monkey that utters this unfortunate incantation. Observe:

$ mkdir foo
$ cd foo
$ git init
Initialized empty Git repository in /private/tmp/foo/.git/
$ echo hello > foo.txt
$ git add foo.txt
$ git commit -m 'hello'
Created initial commit bee50da: hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ cd ..
$ git clone foo bar
Initialized empty Git repository in /private/tmp/bar/.git/
$ cd bar
$ echo goodbye >> foo.txt
$ git commit -a -m goodbye
Created commit 99c13c1: goodbye
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push ../foo
Counting objects: 5, done.
Writing objects: 100% (3/3), 248 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To ../foo
   bee50da..99c13c1  master -> master
$ cd ../foo
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified:   foo.txt
#
$ git diff
$ git diff --cache
error: invalid option: --cache
$ git diff --cached
diff --git a/foo.txt b/foo.txt
index a32119c..ce01362 100644
--- a/foo.txt
+++ b/foo.txt
@@ -1,2 +1 @@
 hello
-goodbye
$ git log
commit 99c13c1e60888ae2c0e221898411e1cd52ad3815
Author: Hans Fugal <hans@fugal.net>
Date:   Mon Nov 10 17:11:57 2008 -0700

    goodbye

commit bee50da72798edc47ddc36dbc4f559f141b1e28b
Author: Hans Fugal <hans@fugal.net>
Date:   Mon Nov 10 17:11:34 2008 -0700

    hello

I promise I didn't fake that. Yes, you saw that correctly—git wants to undo the changes you just committed. If you happen to have a clean working directory, all you need to do to return to sanity is git reset HEAD. If not, heaven help you.

This is totally unacceptable. It's unforgivable on so many levels. At the very least, the manpage should warn you to not push to repositories with working copies. Git should warn you before you push and screw up your repo that it has a working copy checked out. Ideally, git would behave like darcs and update the working copy. Suboptimally, it would behave like mercurial and make it a new revision that you have to manually checkout. But this is simply ridiculous.

So what is the solution? They tell you to use pull. Hello! Is anyone home? My laptop is roving. It's often behind a NATing firewall. I'm supposed to find my public IP address and figure out how to subvert the evil firewalls of the world every time I want to push my changes to the server?

A workaround, and probably the best real-world workflow, is to have a second bare repository or a second branch on the server, push into that, then ssh into the server and pull the changes. I think this page describes how to do that with a second branch, though I'm short on time to actually try it out at the moment.

More of this sickening story in this thread, where you will learn that at least one other person out there has his head screwed on properly, that the developers are more interested in how hooks work (and fail to allow you to do this even if you grok them), and that they've discussed the problem before and decided the correct response is to RTFM (M for minds this time, since the manual was completely unhelpful).

Update: Some of you have been quick to defend git and the design choice of how push behaves. I want to clarify that I don't care so much that push updates the repository but not the working directory. Mercurial works this way too. Not the way I'd do it but it's a valid approach. The problem here is that git push seems like a natural thing to do but screws up your working directory on the remote side. Mercurial doesn't change the working directory, but neither does it silently rebase it and set you up to undo your changes if you're not careful. The problem here is a lack of safety and a lack of warning. They know it's a problem, they've fielded enough "morons". A few words of warning in the man page is all it would take to make me happy.

And now, I have had time to work out a more specific workaround. Here's what I did, and it seems to work well:

# Server setup (set up incoming branch)
server$ git branch incoming
server$ git branch
  incoming
* master
  origin

# Laptop setup (local master to remote incoming)
laptop$ git config remote.origin.push master:incoming

# Everyday usage
laptop$ git push
Counting objects: 5, done.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 279 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
To server:/tmp/foo
   b108a07..a9d3282  master -> incoming

server$ git status
# On branch master
nothing to commit (working directory clean)
server$ git pull . incoming
From .
 * branch            incoming   -> FETCH_HEAD
Updating b108a07..a9d3282
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

You could use hooks to automatically do that git pull . incoming if you liked, making it more like darcs than mercurial.

Updated update: On further thought, the cleanest solution is probably to have a separate master (bare) repository, e.g.

$ mkdir master
$ cd master
$ git init
Initialized empty Git repository in /private/tmp/foo/master/.git/
$ git commit --allow-empty -m initial
Created initial commit 999755e: initial
$ cd ..
$ git clone master live
Initialized empty Git repository in /private/tmp/foo/live/.git/
$ cd live
$ git branch
* master
$ cd ..
$ git clone master laptop
Initialized empty Git repository in /private/tmp/foo/laptop/.git/
$ cd laptop
$ echo hello > foo.txt
$ git add foo.txt
$ git commit -m hello
Created commit 2297bcf: hello
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ git push
Counting objects: 4, done.
Writing objects: 100% (3/3), 239 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To /tmp/foo/master/.git
   999755e..2297bcf  master -> master
$ cd ../live
$ ls
$ git pull
remote: Counting objects: 4, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /tmp/foo/master/
   999755e..2297bcf  master     -> origin/master
Updating 999755e..2297bcf
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
$ echo goodbye >> foo.txt
$ git commit -a -m goodbye
Created commit 04f6702: goodbye
 1 files changed, 1 insertions(+), 0 deletions(-)
$ git push
Counting objects: 5, done.
Writing objects: 100% (3/3), 248 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
To /tmp/foo/master/.git
   2297bcf..04f6702  master -> master
$ cd ../laptop
$ git pull
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (3/3), done.
From /tmp/foo/master/
   2297bcf..04f6702  master     -> origin/master
Updating 2297bcf..04f6702
Fast forward
 foo.txt |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

October 24, 2008

Dennis Muhlestein
nonic
All My Brain
» A few reasons braid is better than 40 lines of Rake.

I've been doing a lot of looking at the git subtree merge strategy for tracking remote projects. Ideally, our remote projects would be tracked with submodules. The issue is, we are developing our own libraries for use in our own internal projects. We commonly do major development on the libraries and the [...]

September 15, 2008

Dennis Muhlestein
nonic
All My Brain
» Sharing git branches

I've been learning git lately. Here are a few tips for sharing branches I've collected during the past few weeks. Create a branch In git, branches are stored on your local machine. Even the commonly named "master", is just a branch on your local machine. Master is simply set up to track a branch [...]

September 11, 2008

Scott Paul Robertson
spr
Spr: The Ramblings
» Rambling on Git

So during my UTOSC presentation I add some spare time, and ended up doing a "Git in Five Minutes" demo. I've done a written version for everyone to enjoy. So yes, I've written a Git tutorial.

Oh, and this Git Magic tutorial is pretty good too.

May 19, 2008

Kevin Kubasik
nonic
For Once I Oneder
» Bazaar and its Rockage


So, I think most of the open source world has agreed that the DRCS model fits our working style better than the traditional model pushed by SVN and CVS etc. And in this DRCS world we have rallied around 3 main tools: Bzr, hg, and git. And in an even greater display of complacency we have given those 3 tools quick and general classifications that became obsolete almost a year ago. Bzr is user friendly but slow and technologically inferior, hg is the champion of the middle but with slow development and a lackluster community, git is wicked fast and ‘The Right Way’TM but a pain to use.

Really? Come on guys, those molds were cast almost a year and a half ago, isn’t it time we looked at things again? Git has an entirely new interface, hg has a slew of plugins/extensions, and bzr has a completely new repo format, and network protocol, resulting in a massive speedup. Now I’m not claiming to be some unbiased source, and comparing 3 incredibly robust tools is not my job, but given the amount of support that Git receives from its very vocal supporters makes me feel a need to give props to my favorite DRCS system: Bazaar.

That’s right, Bazaar (or bzr) is awesome. Sure, git is awesome too, and so is mercurial, but I have found myself loving bzr. I’m not going to attack other DRCS tools, I just want to extol the awesomeness that is bzr.

1) Bzr is Python-Tastic! - As a python hacker, being able to utilize a robust API and plugin system is a cool plus, this also generates lots of powerful and complete plugins, which leads me to the next point.
2) Bzr has a ton of plugins! - Plugins like bzr-avahi (allows the discovery of branches on a local network, great for sprints/hackfests), bzr-svn (makes working with upstream repositories easy as pie!), quilt and gtk tools.
3) Bzr works on Windows - Yeah, I’m not a huge fan of accommodating Windows users, but it makes collaboration easier, I don’t have to make my roommate boot into Ubuntu to lend a hand with some CSS bugs.
4) Bzr is easy to share - The ability to push branches to some central repo is a big component of distributed development. While patches work in some cases, most of the time, having access to a branch makes the whole system work better. Both Git and Hg require a bit of work to set up a new repo and push a branch, bzr supports a ton of protocols and can create the target directory/repo with one command. Sharing is easy!
5) Bzr is fast - Maybe others are faster, maybe it could be a million times faster, I dunno. What I do know is the only thing I seem to wait on is my net connection… I realize that many people need more than that. So here you go. http://bazaar-vcs.org/Benchmarks
6) Bzr is small - In my development model (a shared repo with branches inside of it) bzr is compact and aware of disk space, without repositories it might be huge, I dunno.
7) Bzr is clear about whats happening - I can follow what Bzr is trying to do with my code. A branch is a new directory, and I can always see my code. Not only is this comforting/reassuring, but I often utilize IDE’s like Wing, Eclipse, or Monodevelop when working on code, and while they can handle other systems, directories for branches translates to every editor and works well. 8) Bzr is reliable - A massive suite of unit-tests and a commitment to their excellence offers some comfort that I won’t be left holding half of my code in one hand and an ugly binary blob in the other.
9) Most of all, its a feeling. Its hard to explain, but I don’t notice bzr. Its just there, and I just have my code. I rarely take notice of it, and don’t focus on it. I spend 99% of my time coding and every 30 min I enter a terminal for a few seconds to do all my DRCS stuff. Maybe its why people who use Bzr aren’t very vocal about it. Its not a revolution in revision control, and I don’t do a million cool things in it. I just write code, and bzr is there, doing whatever it does.

April 13, 2008

Clint Savage
herlo
Sexy Sexy Penguins » Tech
» Succumbing to the pressure

My T60p.

[clints@herlo-lap ~]$ history|awk ‘{a[$2]++ } END{for(i in a){print a[i] ” ” i}}’|sort -rn|head
144 svn
144 cd
108 ls
104 ./manage.py
101 ssh
69 su
43 screen
26 vim
25 rm
15 ping

[clints@thor ~]$ history|awk ‘{a[$2]++ } END{for(i in a){print a[i] ” ” i}}’|sort -rn|head
266 git
260 make
71 cd
57 ls
55 vim
55 rt
26 rm
19 bin/send-patch
18 grep
16 bin/validate

I guess I love RCS’.

Cheers,

Herlo

March 31, 2008

Elijah Newren
no nic
Elijah's Blog
» Many different kinds of revision specifiers

Version control systems each use their own method to refer to different versions (also known as ‘revisions’) of the repository. The choice of revision specification often reflects underlying data structures, and the choice of data structures often inhibits or enables various features for the system. Additionally, the methods of displaying and using revision specifiers can also affect the ease with which users can learn and use the new system.

Unfortunately, a full comparison is beyond the scope of this post. I will concentrate on simply introducing the basics and giving a flavor for how things are layed out, which itself is a long enough topic. While conclusions could be drawn with just the data and explanations presented here, I am intentionally avoiding doing so and leaving such to possible later posts. (Besides, bloody taxes and the brain-damaged US tax code have stripped me of any time that I would need to write such additional comparisons.)

Warning: My pictoral representations for each system will be crazier and more complex than usual (and even more lopsidedly complex for some systems than others) in order to keep things short while still showing what is possible.

cvs

Method

See cvs revision numbers and cvs branching basics, particularly figure 2.4 near the end of the branching basics section.

CVS has revision identifiers that are per-file, meaning that repositories at any given time are a combination of many different revisions (one for each file). Ignoring an ugly technical detail about the special revisions 1.1.1.1 and 1.1.1, the first version of a file is numbered 1.1. The next change to the file is recorded as 1.2, the next is 1.3, and so forth. If the user wants to create a branch, based on the 1.3 version of a file, then the branched version is 1.3.2.1. Changing and committing the file on the branch results in 1.3.2.2, then 1.3.2.3, etc. A second branch also created off of 1.3 would be numbered 1.3.4 instead of 1.3.2 (with actual commits numbered 1.3.4.x).

Note that branches are named by a revision with one less number (e.g. 1.4.2 is the name of the branch with commits numbered 1.4.2.x). As such, branch names refer to the beginning of the branch. Each file is branched separately, with per-file revision numbers (it is even possible to branch some files without branching others).

Tags are aliases for a specific version number. Since revisions are per-file, a given tag may refer to different revision numbers for different files (e.g. the ‘v1.0′ tag might refer to version 1.27 of foo.c, 1.36 of bar.h, and 1.218 of foobar.py)

Uniqueness of cvs revisions is not an issue since there is only one repository.

Picture

                       (etc)
                         |
             (etc)   1.4.4.3.2.2
               |         |
            1.4.4.5  1.4.4.3.2.1
               |         |
            1.4.4.4  (1.4.4.3.2)
               |     /
               |   /
            1.4.4.3
               |
  1.4.2.2   1.4.4.2
     |         |
  1.4.2.1   1.4.4.1
     |         |
  (1.4.2)   (1.4.4)
      \       /
       \     /
        \   /
         \ /
         1.4
          |
         1.3
          |
         1.2
          |
         1.1

svn

Method

See svn revisions and working with your branch, particularly figure 4.4 (the branching of one file’s history).

svn uses global revision identifiers, with the first revision being marked as 1, the second as 2, the third as 3, etc.

Branches have an unusual implementation in subversion; they are handled by a namespacing convention: a branch is the combination of revisions within the global repository that exist within a certain namespace. Creating a new branch is done by copying an existing set of files from one namespace to another, recorded as a revision itself.

Tags (an alias for a specific version in history) don’t exist in subversion. Instead, subversion again uses a namespacing convention identical to that done for branches (thus making tags and branches indistinguishable in subversion other than the chosen names), and users are merely discouraged from committing additional changes to files within a tag namespace.

Uniqueness of svn revisions is not an issue since there is only one repository.

Technically, a revision could simultaneously modify any combination of branches and tags by simply committing to all namespaces; however, this is typically discouraged and users only have a certain namespace checked out at a time.

Picture

  trunk   branches/proj-2-22  branches/proj-2-20  tags/RELEASE_2_22_2
   24
                                                        23
                 22
   21
   20
                 19
                                     18
                                     17
   16
   15
                 14
                 13
                 12
   11
                                     10
    9
    8
    7
                                      6
    5
                                      4
    3
    2
    1

bzr

Method

See understanding bzr revision numbers and specifying bzr revisions.

bzr, like svn, uses 1, 2, 3, etc. for revision numbers. However, the revision numbers are always consecutive in a branch. Merged in changes from other branches are given 3 numbers per revision. For example, if changes were merged from a repository that has changes relative to revision 2, the changes would come into the current branch numbered 2.1.1, 2.1.2, 2.1.3, etc. If changes from more than one branch are relative to the same commit, then the middle number is used to distinguish commits from the different branches. Thus one would see another set of changes relative to commit 2 numbered as 2.2.1, 2.2.2, 2.2.3, 2.2.4, etc. (Versions of bzr older than 1.2 used more than 3 numbers in certain cases, but that is no longer true of current versions.) See the picture below to make this clearer.

Branches in bzr are done by creating separate directories (typically with their own repository), though one can set up shared repositories. Each branch will have its own numbering scheme for the revisions it stores, recording the order that the revisions entered that repository. (See below about uniqueness issues.)

Tags in bzr are an alias for a commit, and are stored as part of a branch.

Note that bzr revision numbers are not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both. bzr does store unique identifiers for revisions, known as revid’s (an example of which looks like Matthieu.Moy@imag.fr-20051026185030-93c7cad63ee570df), though they are not shown by default. Users can obtain these unique identifiers by passing the –show-ids flag to bzr log, and these revids can be used in place of the simpler default revision specifiers when prefixed with “revid:”.

Picture

              12
              |
              11
            / | \
          /   |  \
        /     |   \
      10    4.1.5  4.2.2
       \   /  |      |
        \ /   |      |
         9    |    4.2.1
        / \   |   /
       /   \  |  /
       8    4.1.4
       |      |
       7    4.1.3
       | \    |
       |   \  |
       6    4.1.2
       |      |
       5    4.1.1
        \   /
          4
          |
          3
          |
          2
          |
          1

Note: The revision identifiers shown in this picture are dependent on merge order; the revisions 4.1.5, 4.2.1, and 4.2.2 could instead be numbered 4.2.1, 4.1.5 and 4.1.6 respectively if the merges done to obtain revision 11 were done in a different order.

git

Method

See Understanding git history: Commits, and naming git commits.

git uses cryptographic checksums (in particular, sha1sums) of repository contents as revision identifiers. These checksums are 40-character hexadecimal strings (e.g. 621ff6759414e2a723f61b6d8fc04b9805eb0c20). Each revision also knows which revision(s) it was derived from (known as the revision’s parent(s)).

Git can be used with one branch per directory like bzr or hg, but it is more common to have branches stored within the same directory/repository (thus the reason some refer to git as a ‘branch container’). In git, branches record the revision of the most recent commit for the branch; since each commit records its parent(s), a branch consists of its most recent commit plus all ancestors of that commit. When a new commit is made on a branch, the branch just records the new revision. Tags simply record a single revision, much like branches, but tags are not advanced when additional commits are made. tags are not stored as part of a branch or in a revision controlled file, though by default tags that point to commits that are downloaded are themselves downloaded as well.

git revisions are unique by design; if you have the same revision in two different repositories, the revision name for both will be the same.

git does provide more human-meaningful ways of referring to commits, in the form of simple suffixes used to count backwards in history from the tip of a branch (or backwards from a tag or commit). This includes methods for counting relative to different parents, making the suffixes have structural meaning. However, such methods are somewhat hidden; for example, they are not shown in the output of git log. This leaves many users unaware of how to take advantage of them, if they are aware of them at all. (A simple wrapper can get them to be shown, at the cost of a little time; they could be shown at negligible time cost with an integrated solution, but none exists to my knowledge.)

Picture

           650a6f...
              |
           caf806...
          /   |   \         719b9d...
        /     |     \       /
      /       |       \   /
 75cc2c...  147c0a... acac44...
      \       |         |
        \     |         |
         8f50e6...    8147be...
         /    |     /
       /      |   /
  9b39b2... 6e2cde...
    |         |
  01fa22... 1a9d90...
    |    \    |
    |      \  |
 46508c...  b6765c...
    |         |
 1c4e8d...  328638...
       \     /
       6627f7b...
          |
       754b42...
          |    \
          |      \
       d1879f...  fba5d0...
          |
       c962db...

hg

Method

See a hg tour through history, and section 2.4.1, “Changesets, revisions, and talking to other people”.

hg uses a method that may look like a mix of the methods used by git and bzr; it has two distinct methods of referring to each revision. Like git, hg uses sha1sums to refer to revisions (though it abbreviates them to fewer characters by default). Like bzr, hg uses the numbers 1, 2, 3, etc. to refer to revisions. Thus hg has one unique method to refer to revisions and another that is simple and easily manipulatable by users. Each revision (or “changeset” in mercurial’s vocabulary) is of the form revision-number:changeset-identifier (e.g. 3:ff5d7b70a2a9).

Like bzr, branches in hg are typically done by creating separate directories (typically with their own repository). However, it also has named branches for naming branches within a repository, which are somewhat similar to git. (I have been told there are important distinctions between hg named branches and git branches, but I do not fully understand all the details; maybe someone will explain in the comments.)

mercurial has both tags and local tags, with (normal) tags being stored in an .hgtags file that is version controlled, and local tags being stored in a file that is not version controlled nor shared (cloned/pulled/pushed/etc.). Like most other systems, tags in hg are an alias for a specific commit.

The (abbreviated) sha1sum portion of hg revisions (the “changeset identifier”) is unique by design; if you have the same revision in two different repositories, the changeset identifier for both will be the same. The simple number portion of hg revisions (the “revision number”) is not unique. If you have the same revision in two different repositories, they will not necessarily have the same revision number in both.

Picture

             19:c87f92...
                |
             18:650a6f...
               |      \
        15:caf806...   \
         /     |        \
       /       |         \
      /        |          \
13:75cc2c... 14:147c0a... 17:acac44...
      \        /           |
        \     /            |
       12:8f50e6...      16:8147be...
         /    |        /
       /      |      /
9:9b39b2... 11:6e2cde...
    |         |
8:01fa22... 10:1a9d90...
    |    \    |
    |      \  |
5:46508c... 7:b6765c...
    |         |
4:1c4e8d... 6:328638...
       \     /
      3:6627f7b...
          |
      2:754b42...
          |
          |
      1:d1879f...
          |
      0:c962db...

Final notes

Each system uses a different scheme, which have different advantages and disadvantages. Odds are that I am not aware of all the relative merits of these systems yet, though I do know some. Personally, I don’t think any of them are optimal (though I admit that optimality is a somewhat relative term given the inherent trade-offs involved). Unfortunately I’m going off-topic, as I said I wouldn’t be discussing advantages and disadvantages in this post, so I’ll shut my trap here…

March 15, 2008

Elijah Newren
no nic
Elijah's Blog
» How NOT to write newbie-friendly documentation

Warning: This is essentially a long rant about the built-in help of git, but with specific problems pointed out and some constructive suggestions buried at the very end.

When I was starting on my PhD in mathematics, there was a time I remember where the professor in one of my classes referred to a problem on our homework assignment as “a calculus problem.” Later during the week, the six or so students in the small class were talking about the homework and one of the brighter students (i.e. not me) trepidly asked, “Did he really call that a calculus problem?” It turns out we were all stumped by the problem, and we were surprised–and scared–that the professor utterly trivialized it. We did all eventually solve the problem, and did ultimately agree that it was a generalization of ideas from calculus, but not until after getting a bit of a scare that we were in over our heads in graduate school.

In a similar way, Carl and others have argued that they just don’t see the big user-visible differences in the models offered by git and other VCSes. Many a git user seems to have had trouble understanding why other systems would be marketed as being easier than git. I am a lot slower than most of them, but I eventually figured it out and I do think that git could be just as easy if tweaked appropriately; however, it sure does seem to place every obstacle possible in the way of users discovering that.

This blog post is dedicated to what I believe is the single biggest obstacle in the way of users learning git: its built-in documentation (which doubles as its manpages, or vice-versa). Let me first point out that the git manpages are stellar from the point of view of comprehensiveness, or from the point of view of those writing their own UI for git. However, in this post I will judge these documents based on their friendliness to users who are new to git (which may not have been one of their design goals for those pages, so such judgement may come across as unfair).

Do not layer concepts — make it all or nothing

Imagine taking a computer user who is bright but unfamiliar with unix or common editors associated with unix, and giving them vi (or vim) to run. Also, imagine making them unable to use vi until they understand every feature and capability of it. Without a way to incrementally improve their productivity (better manuals, or a friend to explain things, etc.), most users will either give up or take an extremely long time to become productive. While this analogy is a bit of a stretch, for the most part this is precisely the way the git manpages are structured[1].

I’ll take push and pull as an example. If you take a look at git’s help for push or pull (or fetch), you’ll note that “refspecs” are featured prominently. Embedded in the explanation of refspecs, you’ll find concepts, terms, and other details such as rebasing, octopus, configuration file syntax and effects of such configuration options, fast-forwarding, nonlinear history, repository storage format, implementation details about git’s repository directory layout, local and remote repositories, remote tracking branches, and merge strategies–all inextricably intertwined (or so you’d think on your first 12 readings). refspecs pack a lot of information into a short amount of space, making them very convenient to the expert; but for new users this just places a mountain in front of them getting started. Since push and pull are the methods of collaborating with other users and refspecs are necessary to their functioning, new users often feel stuck.

In a similar way, when I was starting I thought that I would have to understand the index, reflogs, tree-ish’es, internal repository storage format, and a few other arcane things before I’d ever be able to do anything productive with git diff.

Assume that all details are equally useful

This section could likely be combined with the previous one, but I thought it was worth a special mention. It typically comes across as “this document is overloaded with detail”, though the existence of details is not necessarily the problem (you can have a comprehensive document and have it be understandable; take a look at the cvs, svn, or hg books at red-bean.com). The problem with documentation that assumes all details are equally useful (typically a subconscious assumption) is that it makes it extremely hard for the user to get started. git’s commands have boatloads of flags, and with few exceptions there is not much guidance at all about which are more important for just getting started. Users have to sift through it all and try to figure it out themselves.

The manpages also often specify details before concepts, erring as far as possible on the side of precision. For example, take the opening sentence of the description of the git diff command: “Show changes between two trees, a tree and the working tree, a tree and the index file, or the index file and the working tree.” After reading this, the user can only respond with comments like: “What’s the index file?” (or maybe even “is the ‘index file’ something slightly different than the ‘index’ I read about elsewhere, perhaps just a subset of it or how it behaves?”), “The working tree isn’t a tree?!?”, or, perhaps most likely, “Huh, what?” The git diff help tries, starting from the beginning, to introduce as many concepts as possible (without explanations) and just befuddles the user. For another example, take a look at the description of git checkout. For someone familiar with git, they won’t see anything wrong. In fact, I don’t see anything hard there anymore. However, do you know how many dozen times I tried to parse those two short paragraphs in attempting to learn git? I don’t — I long since lost count.

Require understanding of deeply nested commands

The documentation in git often defers explanation by referring to other (low-level) commands, in a way that makes users feel they have to understand all the low-level commands too.

Here’s an example. From the git push documentation:

The <src> side can be an arbitrary “SHA1 expression” that can be used as an argument to git-cat-file -t.

From the git-cat-file documentation, explaining the -t option:

Instead of the content, show the object type identified by <object>.

Finding the definition of <object> on the git-cat-file page:

The name of the object to show. For a more complete list of ways to spell object names, see “SPECIFYING REVISIONS” section in git-rev-parse

(More complete?!? There was no explanation here at all!) And for kicks, the very beginning of the SPECIFYING REVISIONS section of git-rev-parse:

A revision parameter typically, but not necessarily, names a commit object. They use what is called an extended SHA1 syntax. Here are various ways to spell object names. The ones listed near the end of this list are to name trees and blobs contained in a commit.

If the user gets to this point they are like to ask whether the object from the git-cat-file page refers to a commit object, a different kind of object, or a more general case that includes both. Also, they may ask, if it’s just a commit object, then part of the ensuing explanation is going to be for something else other than what we are interested in…but are we going to be able to differentiate which parts of the explanation that is?

Here are some other examples, all of them taken from the the first paragraph or two of the descriptions of each command. From git-checkout: “It updates the named paths in the working tree from the index file (i.e. it runs git-checkout-index -f -u)” (one has to love following up an unintelligible (to new users) explanation by referring to a lower-level command that assumes the user knows even more). From git-bisect: “This command uses git-rev-list –bisect option to drive the binary search…“. From git-log: “The command takes options applicable to the git-rev-list command to control what is shown and how, and options applicable to the git-diff-tree commands to control how the changes each commit introduce[d] are shown.

Prefer implementational details to user-level concepts

Just one example I know of immediately…open up the git-checkout manpage and you’ll see a section titled “Detached Head”; it’s completely meaningless to new users. Trying to describe the concepts (”working with no active branch”, “working on an unnamed branch”?) would be much better. I think they’ve tried to stomp out issues in this category, so there are not as many as there used to be.

Make your terminology inconsistent

I’ll jump straight to a quick example from git: You put things in the “index” with “add” which is known as “staging”, you remove them with “reset” (or “rm –cached” if it’s the initial commit), you use flags like “–cached” to refer to index stuff in some places, use “HEAD” to avoid it (specifically in diff), and sometimes the flags related to it are “soft” vs “mixed” (which also need to be juggled with “hard” in git-reset). While users eventually learn all the synonyms and related terminology, it really slows learning down.

Use inconsistent conventions

The git manpages do not use consistent conventions. For example, taking a look at the synopsis lines you’ll see “<filepattern>” on git-add, “file” on git-annotate, and “-pNUM” on git-apply; why are non-literal items denoted with angle brackets in one case, uppercase in another, and no special markings in a third case? On the git-push manpage alone, you’ll see “–receive-pack=<git-receive-pack>” and also “–repo=all”; angle brackets in one case, and plain text in another? On the git-diff manpage you’ll see “<commit>{0,2}” and also “<path>…”, while on git-bundle you’ll see “refname…”, and on git-checkout-index you’ll see “[<file>]*”; so denoting more than one item is done with a variety of regexes, an ellipsis combined with angle brackets, or just a plain ellipsis.

The more ways you have of referring to things, the more likely users are to confuse them. One that bit me particularly hard was the “<commit>{0,2}” in the git-diff synopsis. Now, I use regular expressions daily and my usage of them spans emacs, grep, sed, perl, python, and probably other places I’m forgetting…but it still did not occur to me that this syntax in this context happened to be a regex. It looked somewhat similar to reflog notation in git, and I launched into all sorts of reading up on reflogs (and the documentation it depended on) trying to understand this detail. I also read the online tutorial, parts of the git user manual, blog sites, and even mailing list archives. I gave up and moved on, but never felt comfortable that I understood things…until months later when it dawned on me that this was a simple regex.

(And no, I don’t care that my post whining about inconsitent conventions is itself riddled with inconsistent conventions.)

What can be done to fix this?

Well…some people have fled to hg and bzr to avoid these problems in git. As far as built-in documentation, both of them have done well and have very nice documentation for new users; bzr’s is particularly well polished. Adopting one of those systems is a reasonable choice which I can’t fault people for making.

Now, I dislike reaming a project without mentioning something good or at least providing constructive feedback. In addition to spelling out some specific problems throughout this post, I came up with some constructive ideas for improvement: Easy GIT, a brainstorming session about making git more user-friendly, in the form of a usable demonstration. It also doubles as a transition tool, helping users switch from other systems (particularly svn). I don’t know if anyone wants to adopt any of my ideas or documentation, but it’s out there for people to evaluate (even if not yet complete).

I will also note that Govind Salinas had a similar idea and created pyrite (and did so before I started eg, though I didn’t learn of it until after I was using eg daily). So it seems that I am not the only one thinking along these lines…


[1] The online tutorial and user manual of git are different, but sadly users don’t want to read big long tutorials when they are already familiar with another “similar” system; rather they just use ‘git help’ and get the git manpages…and end up more confused than they started.

March 1, 2008

Elijah Newren
no nic
Elijah's Blog
» Happenings in the VCS world

It has been a long time since my last blog post on VCSes. I am getting back into the swing of things and will be making a few more posts. Besides, Olav doesn’t have enough to do and he wants more of my long rambling posts to digest.

The VCS world is becoming more and more interesting, even if it is also more and more frustrating. I’ll briefly point out a few things I have seen happen in the last few months that look cool, making this VCS post a little bit different than my others.

cvs

Stinking stingy CVS refuses to die…it seems to prefer slowly petrifying over the years or something. It was great a number of years ago, but there’s just so many better tools these days. However, there does appear to be a light at the end of the tunnel. The last place I am forced to use CVS (work) will finally be switching (to subversion) in a couple months. Woohoo!

svn

I haven’t seen any big changes in subversion itself (only one bug fix release has occurred). However, it looks like they are making progress on finally implementing useful merge functionality. This is interesting on a number of levels: (1) this lack of functionality was one of the big reasons subversion sometimes looks like a (very well polished) antique rather than a modern system; will the incorporation of this feature be enough to stave off some of the ongoing defections to other systems?, (2) this may be interesting for those using bzr-svn, hgsvn, or git-svn — are users of such systems going to find it even easier to use their preferred tool?, (3) the main reason svn’s dozen or so ugly renaming bugs (some of which essentially result in corrupted data) have gone almost completely unnoticed is that most are only triggered in merge operations and subversion’s current merge functionality is so primitive and problematic that hardly anyone uses it. Further, svn’s roadmap clearly lists fixing the rename problems in a different release, after the merge fixes are included. Will the extra visibility that one problem will receive due to a different problem being fixed make subversion look more problematic or less? This will be fun to watch.

On a separate note, it is interesting to see that subversion developers are considering adopting some features of distributed VCSes — sometime in the distant future. An easy to miss but interesting nugget from that email is the following:

Fortunately, we’ve pretty much agreed, IIRC, that we’re willing to punt on subdirectory detachability in working copies in order to get performance improvements.

I have often seen svn and cvs proponents argue that as one of the big advantages to those systems, yet it looks like the svn developers are willing to drop it. Very interesting indeed.

hg

Mercurial version 0.9.5 was released since I did my last round of VCS blog posts and it is on my system. hg-0.9.5 has quite a number of improvements; the one that particularly caught my eye was support for subversion as a source SCM in its convert functionality. When I first looked at mercurial, they suggested people use git-svn and then convert from git to hg. To me, that seemed to push people to just use git. It looks like this has changed.

I have often found it somewhat strange that mercurial doesn’t have more active vocal proponents. Usually one hears from the git or bzr proponents, but not so much from mercurial. Yet it has always had many of the advantages of both (and, in some ways seems to have the most svn-like UI, and would seem a more natural transition for svn converts). I guess it’s a case where having most of the advantages or capabilities of other systems (even multiple other systems) yet not clearly standing out in one particular area will rob you of the active advocates that you could otherwise have. Of course, maybe it’s like the linuxjournal reader’s choice awards phenomenon too; the noise or results that others hear may only be indicative of a certain small subset of the community.

bzr

A lot has happened in the Bazaar world. They had their big 1.0 release in mid-December and are now up to bzr-1.2. They have made impressive gains in performance, particularly with their adoption of the pack idea from git, and it appears they have at long last caught up to the leaders in the field in this area.

Near the end of last year, I corresponded about early versions of the “Main Competitors” writeups of the Why Choose Bazaar page, with Ian Clatworthy. I pointed out some advantages of bzr he hadn’t included, mentioned how some bold claims had no accompanying proof, and pointed out some places where he seemed to be unaware of capabilities of other systems or where I disagreed with some of his claims. The final versions seem to have mixed results; part of my feedback was addressed (and more was addressed in follow-ups), but other parts were not. I’m particularly puzzled by the reticence to investigate the existing capabilities of other systems and the willingness to claim features of bzr as advantages without determining whether they are actually unique. Regardless, though, while one does need to individually verify or discard each claim, the writeups are fairly impressive. I probably need to get back in touch with Ian again.

git

I’m so annoyed with Carl right now. He was the one who introduced me to git a number of years ago, and showed me some really cool things about it. I dropped it almost immediately at the time because it was way too hard to use. But, I’ve always been interested in it and made occasional attempts to tame the dragon ever since.

As many are aware, git has made huge strides towards usability in the 1.5 series, and has recently introduced automatic repacking in git-1.5.4. Because of all this work, I made diligent attempts to understand it over the last couple months. In doing so, I finally had the necessary epiphanies to feel I understand it. It turns out I was able to use it productively long before the uncomfortable feeling of I-don’t-really-understand-this-thing was finally expelled. The result? I found that there are several features of git not present in other systems that I am absolutely addicted to, but looking back on the journey I can’t say that it would be worth the effort for others to follow the same path, despite these awesome features. The thing is still too bloody hard to figure out.

One of my desires for my blog posts series was to point out how horrible the git manpages (i.e. the built in help system for git) are for new users, but I felt uncomfortable doing so until I actually understood them. I was not able to understand even the synopsis of the git-diff manpage until a couple weeks ago. And I tried. Hard. Over days, weeks, and months. I read up on reflogs, the index, git’s storage format, the git tutorial and all kinds of other documentation. I feel stupid now, because I was just missing something simple and now seemingly obvious. But from what I can tell, little should-be-obvious-but-aren’t things like this are blocking lots of people from being able to use git.

Long story short: git has become far more usable…mere mortals can actually figure the system out (a big change from earlier versions) if they have an unusually large level of patience and motivation. git has some really awesome features, but I just can’t recommend it to others in its current state.

February 10, 2008

Clint Savage
herlo
Sexy Sexy Penguins » Tech
» SCaLE 6x: I’m Here - Saturday in Review

Just left Jono Bacon’s presentation on “The future of the Linux Desktop”. He’s quite an awesome presenter. Afterward, I went down to the exhibit floor and got to say hi to Tom Callaway and actually met Thomas Chung from the Fedora Project. Both of these guys have such exuberance and joy, I love being part of the fedora project.

The next presentation was ‘ifdown -a Now! Becoming productive offline’, by Don Marti. It was awesome! He spent a bit of time talking about git, ikiwiki, blosxom, OfflineIMAP, Mairix and some ssh config rules to help productivity. There is some definite things that will help me become more productive with these tools.

The next presentation I attended was the video codecs presentation, but what was being discussed was stuff I’d already learned. So I headed over to ‘10 Years of GNOME’, with Ken VanDine (also the creator of Foresight Linux). GNOME features are definitely getting cooler, and discussions about Gimmie and the OpenSuse SLAB menu were held. Ken wants to see more involvement in the GNOME project, called GNOME Love. If you love GNOME, they’re making it easy to share the GNOME Love.

I was able to catch the last half of the Second Life presentation as its always been a curiosity to me.  I’m thinking of actually running it and seeing what its all about.  Thanks Liana!

At the end of the day, I skipped the reception in favor of a spirited talk with the folks from BakBone, then spent time talking with the organizers of SCaLE and was able to chat with the developer for the conference management system here.  Looks like they’re open sourcing there django app too, so we might be able to work with them too.

Tom Callaway was in the Fedora BoF, so I was required to go by that at 8pm and annoy him.  Turned out, I spent the next 3+ hours discussing everything from PulseAudio, RPMS, RHCE and PackageKit to Obama, Iraq and Ron Paul and the value system of patents in our nation.  It was a great evening.

Its time to sleep and another day of SCaLE will be upon us.  See you all then…

Cheers,

Herlo

December 11, 2007

Hans Fugal
no nic
The Fugue :
» Darcs2 Prerelease

Levi referred me to a post entitled "How I stopped missing Darcs and started loving Git". It's an interesting read, but ironically the thing that interested me most was a comment mentioning that just yesterday darcs 2.0.0pre1 was released. It looks like some very exciting things are coming down the darcs pipe:

  1. It should no longer be possible to confuse darcs or freeze it indefinitely by merging conflicting changes.
  2. Identical primitive changes no longer conflict.
  3. Darcs get is now much faster, and always operates in a "lazy" fashion, meaning that patches are downloaded only when they are needed.
  4. Darcs now supports caching of patches and file contents to reduce bandwidth and save disk space.
  5. Speed improvements.

The most exciting change is, of course, the elimination of the exponential merge. This is very good news indeed, if it means what I think it means. The second change I listed is also very interesting to me. You'll remember I posited that exact situation in a previous post, and was teased incessantly as a result.

Do read the release notes. If darcs2 is released within a reasonable time frame, it will continue to be a strong contender.

December 8, 2007

Elijah Newren
no nic
Elijah's Blog
» Limbo: Why users are more error-prone with git than other VCSes

Limbo is a term I use but VCS authors don’t. However, that’s because they tend to ignore a certain state that exists in all major VCSes (and give it no name because they tend to ignore it) despite the fact that this state seems to be the largest source of errors. I call this state limbo.

How to make git behave like other VCSes

Most potential git users probably don’t want to read this whole page, and would like their knowledge from usage of other VCSes to apply without learning how the index and limbo are different in git than their previous VCS (despite the really cool extra functionality it brings). This can be done by

  • Always using git diff HEAD instead of git diff
  • and

  • Always using git commit -a instead of git commit

Either make sure you always remember those extra arguments, or come back and read this page when you get a nasty surprise.

The concept of Limbo

VCS users are accustomed to thinking of using their VCS in terms of two states — a working copy where local changes are made, and the repository where the changes are saved. However, the working copy is split into three sets (see also VCS concepts):

  • (explicitly) ignored — files inside your working copy that you explicitly told the VCS system to not track
  • index — the content in your working copy that you asked the VCS to track; this is the portion of your working copy that will be saved when you commit (in CVS, this is done using the CVS/Entries files)
  • limbo — not explicitly ignored, and not explicitly added. This is stuff in your working copy that won’t be checked in when you commit but you haven’t told the VCS to ignore, which typically includes newly created files.

The first state is identical across all major VCSes. The second two states are identical across cvs, svn, bzr, hg, and likely others. But git splits the index and limbo differently.

One could imagine a VCS which just automatically saves all changes that aren’t in an explicitly ignored file (including newly created files) whenever a developer commits, i.e. a VCS where there is no limbo state. None of the major VCSes do this, however. There are various rationales for the existence of limbo: maybe developers are too lazy to add new files to the ignored list, perhaps they are unaware of some autogenerated files, or perhaps the VCS only has one ignore list and developers want to share it but not include their own temporary files in such a shared list. Whatever the reason, limbo is there in all major VCSes.

Changes in limbo are a large source of user error

The problem with limbo is that changes in this state are, in my experience, the cause of the most errors with users. If you create a new file and forget to explicitly add it, then it won’t be included in your commit (happens with all the major VCSes). Naturally, even those familiar with their VCS forget to do that from time to time. This always seems to happen when other changes were committed that depend on the new files, and it always happens just before the relevant developers go on vacation…leaving things in a broken state for me to deal with. (And sure, I return the favor on occasion when I simply forget to add new files.)

A powerful feature of git

Unlike other VCSes, git only commits what you explicitly tell it to. This means that without taking additional steps, the command “git commit” will commit nothing (in this particular case it typically complains that there’s nothing to commit and aborts). git also gives you a lot of fine-grained control over what to commit, more than most other VCSes. In particular, you can mark all the changes of a given file for subsequent committing, but unlike other VCSes this only means that you are marking the current contents of that file for commit; any further changes to the same file will not be included in subsequent commits unless separately added. Additionally, recent versions of git allow the developer to mark subsets of changes in an existing file for commit (pulling a handy feature from darcs). The power of this fine-grained choose-what-to-commit functionality is made possible due to the fact that git enables you to generate three different kinds of diffs: (1) just the changes marked for commit (git diff –cached), (2) just the changes you’ve made to files beyond what has been marked for commit (git diff), or (3) all the changes since the last commit (git diff HEAD).

This fine-grained control can come in handy in a variety of special cases:

  • When doing conflict resolution from large merges (or even just reviewing a largish patch from a new contributor), hunks of changes can be categorized into known-to-be-good and still-needs-review subsets.
  • It makes it easier to keep “dirty” changes in your working copy for a long time without committing them.
  • When adding a feature or refactoring (or otherwise making changes to several different sections of the code), you can mark some changes as known-to-be-good and then continue making further changes or even adding temporary debugging snippets.

These are features that would have helped me considerably in some GNOME development tasks I’ve done in the past.

How git is more problematic

This decision to only commit changes that are explicitly added, and doing so at content boundaries rather than file boundaries, amounts to a shift in the boundary between the index and limbo. With limbo being much larger in git, there is also more room for user error. In particular, while this allows for a powerful feature in git noted above, it also comes with some nasty gotchas in common use cases as can be seen in the following scenarios:

  • Only new files included in the commit
    1. Edit bar
    2. Create foo
    3. Run git add foo
    4. Run git commit

    In this set of steps, users of other VCSes will be surprised that after step 4 the changes to bar were not included in the commit. git only commits changes when explicitly asked. (This can be avoided by either running git add bar before committing, or running git commit -a. The -a flag to commit means “Act like other VCSes — commit all changes in any files included in the previous commit”.)

  • Missing changes in the commit
    1. Create/edit the file foo
    2. Run git add foo
    3. Edit foo some more
    4. Run git commit

    In this set of steps, users of other VCSes will be surprised that after step 4 the version of foo that was commited was the version that existed at the time step 2 was run; not the version that existed when step 4 was run. That’s because step 2 is translated to mean mark the changes currently in the file foo for commit. (This can be avoided by running git add foo again before committing, or running git commit -a for step 4.)

  • Missing edits in the generated patch
    1. Edit the tracked file foo
    2. Run git add foo
    3. Edit foo some more
    4. Run git diff

    In this set of steps, users of other VCSes will be surprised that at step 4 they only get a list of changes to foo made in step 3. To get a list of changes to foo made since the last commit, run git diff HEAD instead.

  • Missing file in the generated patch
    1. Create a new file called foo
    2. Run git add foo
    3. Run git diff

    In this set of steps, users of other VCSes will be surprised that at step 3 the file foo is not included in the diff (unless changes have been made to foo since step 2, but then only those additional changes will be shown). To get foo included in the diff, run git diff HEAD instead.

These gotchas are there in addition to the standard gotcha exhibited in all the major VCSes:

How all the major VCSes are problematic

  • Missing file in the commit
    1. Edit bar
    2. Create a new file called foo
    3. Run vcs commit (where vcs is cvs, svn, hg, bzr…see below about git)

    In this set of steps, the edits in step 1 will be included in the commit, but the file foo will not be. The user must first run vcs add foo (again, replacing vcs with the relevant VCS being used) before committing in order to get foo included in the commit.

    It turns out that git actually can help the user in this case due to its default to only commit what it is explicitly told to commit; meaning that in this case it won’t commit anything and tell the user that it wasn’t told to commit anything. However, since nearly every tutorial on git[*] says to use git commit -a, users include that flag most the time (60% of the time? 98%?). Due to that training, they’ll still get this nasty bug. However, they’re going to forget or neglect this flag sometimes, so they also get the new gotchas above.

[*] Recent versions of the official git tutorial being the only exception I’ve run across. It’s fairly thorough (make sure to also read part two), though it isn’t quite as explicit about the potential gotchas in certain situations.

How bzr, hg, and git mitigate these gotchas (and cvs and svn don’t)

These gotchas can be avoided by always running vcs status (again, replace vcs with the relevant VCS being used) and looking closely at the states the VCS lists files in. It turns out bzr, hg, and git are smart here and try to help the user avoid problems by showing the output of the status command when running a plain vcs commit (at the end of the commit message they are given to edit). This helps, but isn’t foolproof; I’ve somehow glossed over this extra bit of info in the past and still been bit. Also, I’ll often either use the -m flag to specify the commit message on the command line (for tiny personal projects) or a flag to specify taking the commit message from a file (i.e. using -F in most vcses, -l in hg).

November 15, 2007

=Utah Open Source=
Utah Open Source
The Utah Open Source Foundation
» Streaming Audio Tonight: UPHPU and BYU-UUG

Utah Open Source Foundation will be streaming both the Utah PHP User Group and the BYU Unix User Group meetings tonight as we test the power of our new streaming server. We’re looking forward to having you all there, whether at the meeting or from home. Feel free to participate either online or in person. The information is listed below:

BYU-UUG

A Git’s Perspective on Revision Control - Andrew McNabb
November 15th, 2007, 7pm
More Information

Stream Available: http://stream.utosf.org
IRC Channel: #utos-meeting on irc.freenode.net or Go Here

Utah PHP User Group

Building Rich Internet Applications with PHP, REST and ExtJS - Alvaro Carrasco
November 15th, 2007, 7pm
More Information

Stream Available: http://stream.utosf.org
IRC Channel: #uphpu on irc.freenode.net or Go Here at 7pm

Cheers,

Clint Savage
Founder, Utah Open Source Foundation