Git Blame

Monday, July 22, 2013

Git 1.8.3.4

The latest maintenance release Git v1.8.3.4 is now available at the usual places. This is mostly to propagate documentation fixes and test updates from the master front back to the maintenance track, but there are a handful of minor fixes as well:

The bisect log listed incorrect commits when bisection ends with only skipped ones.
The test coverage framework was left broken for some time.
The test suite for HTTP transport did not run with Apache 2.4.
"git diff" used to fail when core.safecrlf is set and the working tree contents had mixed CRLF/LF line endings. Committing such a content must be prohibited, but "git diff" should help the user to locate and fix such problems without failing.

These fixes are already on the 'master' branch to be included in upcoming Git 1.8.4. Hopefully we can do its zeroth release candidate preview early this week.

Have fun.

Monday, July 15, 2013

Git 1.8.3.3

The third maintenance release for 1.8.3.x series is now available at the usual places. It contains the following fixes that have already been applied to the 'master' branch for 1.8.4.

"git apply" parsed patches that add new files, generated by programs other than Git, incorrectly. This is an old breakage in v1.7.11.
Older cURL wanted piece of memory we call it with to be stable, but we updated the auth material after handing it to a call.
"git pull" into nothing trashed "local changes" that were in the index.
Many "git submodule" operations did not work on a submodule at a path whose name is not in ASCII.
"cherry-pick" had a small leak in its error codepath.
Logic used by git-send-email to suppress cc mishandled names like "A U. Thor" <[email protected]>, where the human readable part needs to be quoted (the user input may not have the double quotes around the name, and comparison was done between quoted and unquoted strings). It also mishandled names that need RFC2047 quoting.
"gitweb" forgot to clear a global variable $search_regexp upon each request, mistakenly carrying over the previous search to a new one when used as a persistent CGI.
The wildmatch engine did not honor WM_CASEFOLD option correctly.
"git log -c --follow $path" segfaulted upon hitting the commit that renamed the $path being followed.
When a reflog notation is used for implicit "current branch", e.g. "git log @{u}", we did not say which branch and worse said "branch ''" in the error messages.
Mac OS X does not like to write(2) more than INT_MAX number of bytes; work it around by chopping write(2) into smaller pieces.
Newer MacOS X encourages the programs to compile and link with their CommonCrypto, not with OpenSSL.

Friday, June 28, 2013

Git 1.8.3.2

The second maintenance release for 1.8.3.x series is now available at the usual places. It contains the following fixes that have already been applied to the 'master' branch for 1.8.4.

Cloning with "git clone --depth N" while fetch.fsckobjects (or transfer.fsckobjects) is set to true did not tell the cut-off points of the shallow history to the process that validates the objects and the history received, causing the validation to fail.
"git checkout foo" DWIMs the intended "upstream" and turns it into "git checkout -t -b foo remotes/origin/foo". This codepath has been updated to correctly take existing remote definitions into account.
"git fetch" into a shallow repository from a repository that does not know about the shallow boundary commits (e.g. a different fork from the repository the current shallow repository was cloned from) did not work correctly.
"git subtree" (in contrib/) had one codepath with loose error checks to lose data at the remote side.
"git log --ancestry-path A...B" did not work as expected, as it did not pay attention to the fact that the merge base between A and B was the bottom of the range being specified.
"git diff -c -p" was not showing a deleted line from a hunk when another hunk immediately begins where the earlier one ends.
"git merge @{-1}~22" was rewritten to "git merge frotz@{1}~22" incorrectly when your previous branch was "frotz" (it should be rewritten to "git merge frotz~22" instead).
"git commit --allow-empty-message -m ''" should not start an editor.
"git push --[no-]verify" was not documented.
An entry for "file://" scheme in the enumeration of URL types Git can take in the HTML documentation was made into a clickable link by mistake.
zsh prompt script that borrowed from bash prompt script did not work due to slight differences in array variable notation between these two shells.
The bash prompt code (in contrib/) displayed the name of the branch being rebased when "rebase -i/-m/-p" modes are in use, but not the plain vanilla "rebase".
"git push $there HEAD:branch" did not resolve HEAD early enough, so it was easy to flip it around while push is still going on and push out a branch that the user did not originally intended when the command was started.
"difftool --dir-diff" did not copy back changes made by the end-user in the diff tool backend to the working tree in some cases.

Friday, June 21, 2013

Fun with various workflows (2)

As I discussed in a separate post, even though Git is a distributed SCM, it supports the centralized workflow well, to help people migrating from traditional SCM systems. But of course, Git serves the distributed workflow well. The one that is used in the Linux kernel development, where you work based on Linus's or a subsystem maintainer's repository, and publish to your own repository to get it pulled by others (including Linus, if your work is very good).

You would first start by cloning from your upstream:

$ git clone git://git.kernel.org/.../git/torvalds/linux.git

The only difference from the initial step in the centralized workflow is,... nothing. You will get a "linux" directory that becomes your working area, where you will have the standard configuration, perhaps not very different from this:

[remote "origin"]
url = git://git.kernel.org/.../torvalds/linux.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master

And your "master" branch, which was copied from the "master" branch of Linus's repository, is ready for you to build your work on it.

The only difference is that you would not "git push" back to Linus's repository. The "git://" protocol will not usually let you push, and even if it did, Linus would not let you write into his repository.

After working on your changes on "master", the way you would push out what you did is to say something like this:

$ git push [email protected]:me/linux.git master

This might get cumbersome to type every time, so you would add another remote, perhaps like this:

[remote "me"]
url = [email protected]:me/linux.git

By defining a short-hand for that URL, you can now say:

$ git push me master

and push out the work you did on your master branch as the master branch of your public repository, so that other people can pull from it.

If you worked on a topic that was forked from Linus's master to enhance a specific feature or fix a specific bug, you may want to say:

$ git checkout -b fix-tty-bug origin/master

... work work work ...

$ git push me fix-tty-bug

to publish the result in your public repository as a branch.

By the way, do you recall the reason why upstream mode was appropriate when using the centralized workflow from the previous post?

While the purpose of the Linus's master branch is to advance the overall state of the Linux kernel to prepare for the next release, the purpose of your topic branch fix-tty-bug is a lot narrower. And you are usually not integrating the work other people did into your work before you push it out. Indeed, you are encouraged to pick one stable point in the official (i.e. Linus's) history, and build on top of it without rebasing or merging things unrelated to what you are trying to achieve yourself.

Unlike in the centralized workflow where you tentatively play the role of integrator and change the purpose of your topic branch into "advance the overall project status" (which is compatible with the purpose of the "master" branch you will be updating with your work in the centralized workflow) immediately before you push it out, the purpose of your topic branch will stay to be the same as the original purpose of the topic until and after you push it out, when you are working with the distributed workflow.

If you started your topic branch, fix-tty-bug, to fix a bug in the tty subsystem and named it after the purpose of the topic branch, it can and should keep the name in your public repository. There is no reason to publish the result as your master branch. You control the branch names in your public repository, and pushing it out as master will only lose information. The branch name fix-tty-bug told what the branch was about. The name master sounds as if you are trying to make everything better, but that is not what you did.

So in general, you would be pushing out your topic branches to your public repository under the same name. You can use the 'current' mode when push your work out, like this:

$ git config push.default current

And then, you can lose that branch name from the command line when you push your work out:

$ git push me

You run the above command while you have your fix-tty-bug branch checked out, and the current branch is pushed out to the destination repository (i.e. me) to update the branch of the same name.

Recently, we added a mechanism to help those who are too lazy to even type "me", i.e. it let you say:

$ git push

To use this, you configure what remote you push to when you do not say from the command line, with a configuration variable, like this:

$ git config remote.pushdefault me

This feature is available in Git 1.8.3 and later.

Thursday, June 20, 2013

Fun with various workflows (1)

Even though Git is distributed, you can still use it for projects that employ the centralized workflow, where there is a single central shared repository. Everybody pulls from it to obtain everybody else's work, and after integrating his own work with others' work, everybody pushes into it so that everybody else can enjoy the fruit of his work.

In the simplest workflow, you can start by cloning from the central repository:
$ git clone our.site.xz:/pub/repo/project.git myproject
and the myproject directory becomes your working area, where you will have the standard configuration, perhaps not very different from this:
[remote "origin"]
url = our.site.xz:/pub/repo/project.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
and your "master" branch, which was copied from the "master" branch of the central shared repository, is ready for you to build your work on it.

If you run "git pull --rebase" (without any other argument), the configuration above left for you by "git clone" will tell Git that you would want to obtain the latest work from the central shared repository, and you would want to rebase your own work on top of their master branch.

If you say "git push" (without any other argument), the current default mode of pushing is to look at your local branches, and look at the branches the repository you are pushing to has, and update the matching branches. In this "simplest" case, you only have the 'master' branch, and the central repository does have its 'master' branch, so you will update its 'master' branch with the work you did on your 'master' branch.

In Git 2.0, this default mode will change to 'simple', which will push only the current branch to the branch at the central repository you integrate with, but only when they have the same name (so the example of working on 'master' and pushing it back to 'master' will still work).

If your project employs the centralized workflow, after learning Git enough to be comfortable with it, you may want to do
$ git config push.default upstream
to choose to always update the branch at the central repository you integrate with, even if the branch names are different. Note that you can do this (or use 'simple' instead of 'upstream'), and indeed you are encouraged to do so, without waiting for Git 2.0.

That will allow you to work on different things on different branches, e.g.
$ git checkout -b my-feature -t origin/master
$ git push
The first "checkout" will create a new "my-feature" branch, that is set to integrate with the master branch from your central repository. When using the upstream mode, you will push "my-feature" back to update the "master" branch over there.

An interesting thing to notice is that in the centralized workflow, because there is no central project maintainer (aka integrator), everybody is responsible for integrating his own work to advance the mainline of the project. The job of integration is indeed distributed when you use centralized workflow. It is a bit funny when you think about it.

But that is exactly why the upstream mode makes sense. In order to fully appreciate it, you need to realize what it means to have forked the "my-feature" branch out of the "master" branch of the central shared repository.

The purpose of the master branch at the shared central repository is to advance the state of the project in general, but the purpose of your local branch, my-feature, is a lot more specialized. It may be to fix this small bug, or add that neat feature. You would only be working on a small part of the project while on that branch.

But because you are the one who plays the top-level integrator role when you run "git pull --rebase" just before you "git push", when that "git pull --rebase" finishes, the tip of your my-feature branch is no longer about your small fix or neat feature. It temporarily becomes about advancing the state of the overall project. And that is the reason you would "git push" it to update the master branch, not the "my-feature" branch, at the central repository. Of course, if you want to publish it as "my-feature", perhaps because you want to show it to others before really updating the shared master branch, you can explicitly say:
$ git push origin my-feature
Pushing my-feature that was forked from and still integrates with their master is not usually what you want to do every time in the centralized workflow, though. In fact, it often is the case that administrators of a project with centralized workflow flown upon people making random branches at their shared central repository willy-nilly (exactly because the central shared repository is a common resource and a feature branch like "my-branch" is often not of general interest).

Common things require less typing, and uncommon things are possible but you need to explicitly tell Git to do so.

The Git core itself is very much agnostic to what workflow you use, and you can also use it for projects that use "I publish my work to my public repository, others interested in my work can pull my work from there, and there is an integrator who pulls and consolidates good work from others and publishes the aggregated whole" distributed workflow. That will be a topic for a separate post.

Monday, June 10, 2013

Git 1.8.3.1

The first maintenance release 1.8.3.1 is out.

This is primarily to push out fixes to two regressions that seems to have affected many people recently. Sorry about that.

With Git 1.8.3, an entry "!dir" in .gitignore to say "This directory's contents is not ignored, unless other more specific entries tells us otherwise" did not work correctly. This regression has been fixed.
With recent Git since 1.7.12.1 or so, "git daemon", when started by the root user and then switched to an unprivileged user, refused to run when ~root/.gitconfig (and XDG equivalent configuration files under ~root/.config/) cannot be read by the unprivileged user. The right way to start the daemon might be to reset its $HOME (where these configuration files are read from) to somewhere the user the daemon runs as, but it is cumbersome to set up. With 1.8.3.1, failure to access these files with EPERM is treated as if these files do not exist, which is not an error.

The release tarballs are available at the usual places:

https://code.google.com/p/git-core/downloads/list
https://www.kernel.org/pub/software/scm/git/

Checking the current branch programatically

The git branch Porcelain command, when run without any argument, lists the local branches, and shows the current branch prefixed with an asterisk, like this:

$ git branch
* master
next
$ git checkout master^0
$ git branch
* (no branch)
master
next

The second one with (no branch) is shown when you are not on any branch at all. It often is used when you are sightseeing the tree of a tagged version, e.g. after running git checkout v1.8.3 or something like that.

To find out what the current branch is, casual/careless users may have scripted around git branch, which is wrong. We actively discourage against use of any Porcelain command, including git branch, in scripts, because the output from the command is subject to change to help human consumption use case.

And in fact, since release 1.8.3, the output when you are not on any branch, has become something like this:
$ git checkout v1.8.3
$ git branch
* (detached from v1.8.3)
master
next

in order to give you (as a human consumer) a better information. If your script depended on the exact phrasing from git branch, e.g.

branch=$(git branch | sed -ne 's/^\* \(.*\/\1/p')
case "$branch" in
'('?*')') echo not on any branch ;;
*) echo on branch $branch ;;
esac

your script will break.

The right way to programatically find the name of the current branch, if any, is not to use the Porcelain command git branch that is meant for the human consumption, but to use a plumbing command git symbolic-ref instead:

if branch=$(git symbolic-ref --short -q HEAD)
then
echo on branch $branch
else
echo not on any branch
fi

Friday, May 24, 2013

Git 1.8.3 and even more leftover bits

The 1.8.3 release has finally been tagged and pushed out to the usual places. Also the release tarballs at kernel.org are back.

For a list of highlights, please see the previous post on -rc2; not much has changed since then.

During the last development cycle including its pre-release feature freeze, a few more interesting topics were discussed, and at this moment there aren't actual patches or design work.

[Previous list of "leftover bits" is here]

"git config", when removing the last variable in a section, leaves an empty section header behind. Anybody who wants to improve this needs to consider ramifications of leaving or removing comments.
Cf. $gmane/219524
[STARTED AND THEN STALLED] Add "git pull --merge" option to explicitly override configured pull.rebase=true. Make "git pull" that does not say how to integrate fail when the result does not fast-forward, and advise the user to say --merge/--rebase explicitly or configure pull.rebase=[true|false]. An unconfigured pull.rebase and pull.rebase that is explicitly set to false would mean different things (the former will trigger the "fast-forward or die" check, the latter does the "pull = fetch + merge".
Cf. $gmane/225326
Teach more commands that operate on branch names about "-" shorthand for "the branch we were previously on", like we did for "git merge -" sometime after we introduced "git checkout -".
Cf. $gmane/230828
Proofread our documentation set, and update to reduce newbie confusion around "remote", "remote-tracking branch", use of the verb "to track", and "upstream".
Cf. $gmane/230786.

Monday, May 13, 2013

Git 1.8.3-rc2

The second and planned-to-be-the-final release candidate for the upcoming 1.8.3 release was tagged today. Also, the release tarballs at kernel.org are back ;-)

Hopefully we can have the final late next week, but we might end up doing another release candidate. Please help testing the rc2 early to make sure you can have a solid release.

There are numerous little fixes, new features and enhancements that cannot be covered in a single article, so I'll only highlight some selected big-picture changes here. For the full list of changes, please refer to the draft Release Notes.

Preparation for 2.0

A lot of work went into preparing the users for 2.0 release that will come sometime next year, which promises large backward-incompatible UI changes:

"git push" that does not say what to push from the command line will not use the "matching" semantics in Git 2.0 (it will use "simple", which pushes your current branch to the branch of the same name only when you have forked from it previously; e.g. "git push origin" done while you are on your "topic" branch that was previously created with "git checkout -t -b topic origin/topic" will push your "topic" branch to origin).

This default change will hurt old-timers who are used to the traditional "matching" (if you have branches A, B and C, and of the other side has branches A and C, then your branches A and C will update their branches A and C, respectively), and they can use "push.default" configuration variable to keep the traditional behaviour. I.e.

$ git config push.default matching

Recent releases since 1.8.0 has been issuing warnings when you do not have any push.default configuration set, and this release continues to do so.
"git add -u" and "git add -A" that is run inside a subdirectory without any other argument to specify what to add will operate on the entire working tree in Git 2.0, so that the behaviour will be more consistent with "git commit -a" (e.g. "edit file && cd subdir && git commit -a" will commit the change to the file you just edited which is outside the directory you ran "git commit" in).

Users can say "git add -u ." and "git add -A ." (the "dot" means "the current directory") to limit the operation to the subdirectory the command is run in with the traditional versions of Git (and this will stay the same in Git 2.0 or later), so there will be no configuration variable to change the default.

The 1.8.3 and later releases do not yet change the behaviour until Git 2.0, and limit these operations to the current subdirectory, but they do notice when you have changes outside your current subdirectory and warn, saying that if you were to type the same command to Git 2.0, you would be adding those other files to your index, and encourages you to learn to add that explicit "dot" if you mean to add changed or all files in the current subdirectory only.
"git add path" has traditionally been a no-op for removed files (e.g. "rm -f dir/file && git add dir" does not record the removal of dir/file to the index), but Git 2.0 will notice and record removals, too.

The 1.8.3 and later releases do not yet change the behaviour until Git 2.0, but they do notice when you have removed files that match the path and warn, saying that if you were to type the same command to Git 2.0, you would be recording their removal, and encourages you to learn to use the --ignore-removal option if you mean to only add modified or new files to the index.

Tightening of command line verification

There are quite a many UI fixes that tie loose ends. Some commands assumed that the users were perfect and would never throw nonsense command line arguments at them, and some operations that need two parameters were happily carried out even when they got three parameters without diagnosing such a command line as an error (the excess one was simply ignored).

Many of them have been updated to detect and die on such errors.

Helping our friends at Emacs land

We expedited the update of the foreign SCM interface to bzr we have in the contrib/ area since 1.8.2, and included a version that is vastly modified from what we had before, with help from some Emacs folks. This code could be a bit rougher than the rest of Git that usually moves slowly and cautiously, but we decided that, given the circumstance, it is more important to push out some improved version early, in order to help our friends in Emacs land, who have been (reportedly) suffering from less than ideal response to the issues they are having with their SCM of choice.

A beginning of a better triangular workflow support

The recommended workflows to collaborate with others are either:

to have your own repository and push your work there while pulling from your upstream to keep up to date, or
to have a central repository where everybody pushes to and pulls from.

The latter was primarily to make those who are coming from centralized version control systems feel at ease, and the repository configuration mechanisms such as "remote.origin.url" variable were designed to help that workflow (there is one "origin" you pull from and push to). The former however is also important, and many people on Git hosting sites (e.g. GitHub) employ this workflow (you pull from one place and push to another place, but they are not the same).

A new configuration mechanism "remote.pushdefault" has been introduced to support such a triangular workflow. After you clone from somebody else's project, that upstream repository will still be your 'origin', but you can add the repository you regularly push to in order to publish your work (and presumably then you will throw a "pull request" at the upstream) as another remote, and set it to this configuration variable. E.g.

$ git clone git://example.com/frotz.git frotz
$ cd frotz
$ git remote add publish ssh://myhost.com/myfrotz.git
$ git config remote.pushdefault publish

After this, you can say "git push" and the push does not attempt to push to your origin (i.e. git://example.com/frotz.git) but to your publish remote (i.e. ssh://myhost.com/myfrotz.git) because of the last configuration.

Tuesday, April 2, 2013

Where do evil merges come from?

A canonical example of where "evil merge" happens in real life (and is a good thing) is to adjust for semantic conflicts. It almost never have anything to do with textual conflicts.

Imagine you and your friend start with the same codebase where a function f() takes no parameter, and has two existing call-sites.

You decide to update the function to take a parameter, and adjust both existing call-sites to pass one argument to the function. Your friend in the meantime added a new call-site that calls f() still without an argument. Then you merge.

It is very likely that you won't see any textual conflict. Your friend added some code to block of lines you did not touch while you two were forked. However, the end result is now wrong. Your updated f() expects one parameter, while the new call-site your friend added does not pass any argument to it.

In such a case, you would fix the new call-site your friend added to pass an appropriate argument and record that as part of your merge.

Consider the line that has that new call-site you just fixed. It certainly did not exist in your version (it came from your friend'd code), but it is not exactly what your friend wrote, either (it did not pass any argument). That makes the merge an "evil merge".

With "git log -c/--cc", such a line will show with double-plus in the multi-way patch output to show that "this did not exist in either parent".