Usability and terminology in Git

Posted by Martin Vilcans on 23 July 2011

"A rose by any other name would smell just as sweet."

True, but in usability you shouldn't get too poetic with what you call things. It's easier to grasp concepts in a piece of software if it is consistent in what words it uses to describe them. One piece of software that is notoriously bad at consistent wording is Git.

Git can be confusing when you're new to it. Many of its concepts may be foreign if you're used to centralized version control systems, and even more so if you haven't used version control at all before. Unfortunately, Git doesn't make the transition very smooth, as it has some serious usability problems related to terminology. This makes the concepts in Git harder to learn than necessary, even though they actually aren't that complex. That's a shame, since Git is a very nice version control system.

Just to make it clear, I'm a pretty competent Git user. People tend to think that if you complain about the usability of a piece of software, you're just not very good at it. When I say that C++ sucks, for instance, it's based on 15 years of experience with it. So while I personally have no problems (any more) with the usability issues I'll discuss in this post, I can still see that they are problems, as they make it harder to teach and learn Git.

To start working on a git repository, you typically clone an existing one. Easy, just use the clone command and give it the URL of the repository:

$ git clone /path/to/some_project.git

So, what does Git answer to the above command?

Initialized empty Git repository in /home/martin/some_project/.git

WTF? I expected it to create a clone of a repository, not create an empty repository. Well, it did actually create a clone. The confusing message comes from the git init command, which runs as a first step before Git fetches and merges the other repository.
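For the curious, the steps behind a clone can be approximated by hand. This is only a sketch of the idea, not what git clone literally executes: the scratch directory, the repository names source and copy, and the example identity passed to git commit are all made up for illustration.

```shell
set -e
work=$(mktemp -d)    # hypothetical scratch directory
cd "$work"

# Create a small repository to clone from (names are illustrative).
git init -q source
cd source
echo Hello >foo.txt
git add foo.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)    # master or main, depending on configuration
cd ..

# Roughly what "git clone source copy" amounts to:
git init -q copy                 # the step behind the "Initialized empty..." message (silenced by -q)
cd copy
git remote add origin ../source
git fetch -q origin
git checkout -q -b "$branch" "origin/$branch"    # local branch starting at the fetched one

cloned_content=$(cat foo.txt)
echo "$cloned_content"
```

Seen this way, the message is accurate for the first step only. It just describes an intermediate state that the user never asked about.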

Anyway, on to my point about terminology. Let's create a file:

$ echo Hello >foo.txt

...and use the git status command to check the state of the project:

$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	foo.txt
nothing added to commit but untracked files present (use "git add" to track)

So now Git tells us that our new file is "untracked." Sounds reasonable. We haven't added it to version control yet. Git is friendly and tells us to use git add to add the file. OK, let's do that and check the status again:

$ git add foo.txt
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#	new file:   foo.txt
#

We have now "added" the file. It appears under "Changes to be committed." And Git's helpful output from the status command tells us how to "unstage" it.

As a Git user, you'd know that the git add command adds the file to the "index." Contrary to its name, the "index" does not just hold the names of the files to be committed. It actually contains the content of the files as well. It is sometimes called the "staging area" instead, which is a better name, as it reflects what it is for: it is an area where you prepare your next commit. Note that neither the word "index" nor the full term "staging area" has been used in the commands or output so far.
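A quick way to convince yourself that the "index" stores content and not just names is to change a file after adding it. The sketch below assumes a throwaway repository called demo in a scratch directory (both names are made up); the :foo.txt syntax asks Git for the version of the file that is in the index.

```shell
set -e
work=$(mktemp -d)    # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo

echo Hello >foo.txt
git add foo.txt          # snapshot the current content into the index
echo Goodbye >foo.txt    # change the working copy without running "git add" again

staged=$(git show :foo.txt)    # ":path" names the version stored in the index
working=$(cat foo.txt)
echo "index: $staged, working copy: $working"
# prints: index: Hello, working copy: Goodbye
```

The index still holds "Hello" even though the file on disk now says "Goodbye," so a commit at this point would record the older content.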

Let's see what we're going to commit. The following command will show the differences between the last commit and the contents of the "index" a.k.a. the "staging area":

$ git diff --cached

OK, so you add the --cached flag to see the difference between the "index"/"staging area" and the previous commit. So I guess the "index"/"staging area" can be called the "cache" too? Even though it's in no way similar to what you usually mean when you use the word "cache" in a computer context? (In all fairness, --staged is a synonym for --cached, which maps much better to the terminology used in the rest of the user interface.)
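If you want to check that the two flags really are interchangeable, you can compare their output directly. This is a minimal sketch in a throwaway repository; the scratch directory and the name demo are assumptions for illustration.

```shell
set -e
work=$(mktemp -d)    # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo

echo Hello >foo.txt
git add foo.txt

cached=$(git diff --cached)    # the name that refers to the "cache"
staged=$(git diff --staged)    # the name that matches "unstage" in the status output

if [ "$cached" = "$staged" ]; then
    echo "identical output"
fi
```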

If we edit a file that was already in the project, we get:

$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#	modified:   a.txt
#
no changes added to commit (use "git add" and/or "git commit -a")

Now, Git says that the file is modified. It also says that it's "changed but not updated." So, what's up with "updated"? That's the same thing as "staged," right?
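One way to see what "updating" amounts to is to watch the status of a modified file before and after git add. The sketch below uses the machine-readable git status --porcelain format, where the first column describes the index and the second the working directory. The repository name demo, the file contents and the identity passed to git commit are made up for the example.

```shell
set -e
work=$(mktemp -d)    # hypothetical scratch directory
cd "$work"
git init -q demo
cd demo

echo one >a.txt
git add a.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "Add a.txt"

echo two >a.txt                     # edit a file that is already tracked
before=$(git status --porcelain)    # M in the second column: changed, not yet in the index
git add a.txt                       # "update what will be committed"
after=$(git status --porcelain)     # M in the first column: the change is now in the index

printf '[%s]\n[%s]\n' "$before" "$after"
```

The same git add that turned an "untracked" file into a "change to be committed" is what moves this modification into the index, which suggests that "updated" and "staged" describe the same state.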

To sum this up, you "add" an "untracked" file to the "index" (bad name) a.k.a. "staging area" (good name) a.k.a. "cache" (lousy name). Then it becomes a "change to be committed" that can be "unstaged" using the "reset" command. If you change an existing file, it becomes "changed"/"modified" and needs to be "updated" (which you do with "add").

Can you see the problem here?

(Based on Git version 1.7.0.4.)
