In a previous blog post I said, "If you have encountered a regression and you are building a driver from source, please provide the results of `git-bisect`." There is some feeling that performing a bisect is hard, time consuming, or both. Back in the bad-old-days, that was true... but `git bisect run` changed all that.
Most of performing a bisect is mechanical and repetitious:

1. Build the project.
2. If the build fails, run `git bisect skip`.
3. Run the test.
4. Inspect the results.
5. Run `git bisect good` or `git bisect bad` depending on the test result.
6. While there are more steps to bisect, repeat from step 1.
7. Run `git bisect reset` to restore the tree to its original state.
Some years ago, someone noticed that this seems like a task a computer could do. At least as early as git 1.6.0 (released in 2008), this has been possible using `git bisect run`. Once you get the hang of it, it's surprisingly easy to use.
A Word of Caution
Before actually discussing automated bisects, I want to offer a word of
caution. Bisecting, automated or otherwise, is a process that still requires
the application of common sense. A critical step at the end of bisecting is
manually testing the guilty commit and the commit immediately before the
guilty commit. You should also look at the commit that `git-bisect` claims is guilty. Over the years I have seen many bug reports for Mesa that either
point at commits that only change comments in the code or only change a driver
other than the driver on which the bug was observed.
I have observed these failures to have two causes, although other causes may be possible. With a manual bisect, it is really easy to accidentally run `git bisect good` when you meant `git bisect bad`. I have done this by using up-arrow to go back through the shell command history when I'm too lazy to type the command again. It's really easy to get the wrong one doing that.
The other cause of failure I have observed occurs when multiple problems exist between the known-good commit and the known-bad commit. In this case the bisect will report a commit that was already known to be bad and was already fixed. This false information can waste a lot of time for the developer who is trying to fix the bug. They will spend time trying to determine why the commit reported by the bisect still causes problems when some other commit is the actual cause. The remedy is proper application of common sense while performing the bisect. It's often not enough to just see that the test case fails. The mode of the failure must also be considered.
Automated Bisect Basics
All of the magic in an automated bisect revolves around a single script that you supply. This script analyzes the commit and tells bisect what to do next. There are four things that the script can tell bisect, and each of them is communicated using the script's return code.
Skip this commit because it cannot be analyzed. This is identical to manually running `git bisect skip`. This can be used, for example, if the commit does not build. A script might contain something like:

```shell
if ! make ; then
    exit 125
fi
```

As you might infer from the code example, a return code of 125 instructs bisect to skip the commit.
Accept this commit as good. This is identical to `git bisect good`. A return code of 0 instructs bisect to accept the commit as good.

Reject this commit as bad. This is identical to `git bisect bad`. All tests in the piglit test suite print a string in a specific format when a test passes or fails. This can be used by the script to generate the exit code. For example:

```shell
bin/arb_clear_buffer_object-null-data -auto > /tmp/result.$$
if grep -q 'PIGLIT: {"result": "pass" }' /tmp/result.$$; then
    rm /tmp/result.$$
    exit 0
else
    cat /tmp/result.$$
    rm /tmp/result.$$
    exit 1
fi
```
In this bit of script, the output of the test is printed in the "bad" case. This can be very useful. Bisects of long ranges of commits may encounter failures unrelated to the failure you are trying to bisect. Seeing the output from the test may alert you to failures with unrelated causes.

Looking for simple "pass" or "fail" output from the test may not be good enough. It may be better to look for specific failure messages from the test. As mentioned above, it is important to only report a commit as bad if the test fails due to the problem you are trying to bisect.
Imagine a case where a failure in the `arb_clear_buffer_object-null-data` test on the master branch is being bisected. The particular failure is an incorrect error being generated, and the known-good commit is `HEAD~200`, when the last stable release occurred (on a different branch with a common root). However, `HEAD~110..HEAD~90` contain an unrelated rendering error that was fixed in `HEAD~89`. Since `git-bisect` performs a binary search, it will test `HEAD~100` first and see the wrong failure. Simply looking for test failure would incorrectly identify `HEAD~110` as the problem commit. If the script instead checked for the specific incorrect error message, the correct guilty commit is more likely to be found.

A return code with any value of 1 through 127, excluding 125, instructs bisect to reject the commit as bad.
Stop at this commit and wait for human interaction. This can be used when something really catastrophic happens that requires human attention. Imagine a scenario where the bisect is being performed on one system but tests are run on another. This could be used if the bisect system is unable to communicate with the test system. A return code with any value of 128 through 255 will halt the bisect.
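The "look for the specific failure message" advice can be wrapped up in a small helper. This is only a sketch: the pass string matches the piglit format shown earlier, but the `GL_INVALID_VALUE` message is a placeholder for whatever text your particular failure actually prints.

```shell
#!/bin/bash

# Classify a test log for "git bisect run": print the return code the
# bisect script should use. The error message below is a placeholder;
# substitute the exact text printed by the failure you are chasing.
classify()
{
    local log="$1"

    if grep -q '"result": "pass"' "$log" ; then
        echo 0      # good
    elif grep -q 'Unexpected GL error: GL_INVALID_VALUE' "$log" ; then
        echo 1      # bad: this is the failure being bisected
    else
        echo 125    # skip: broken, but not in the way being bisected
    fi
}

# Typical use at the end of a bisect script:
#     bin/some_test -auto > /tmp/result.$$
#     exit $(classify /tmp/result.$$)
```

Any failure that is not the one being bisected falls through to the skip case, which keeps unrelated breakage from polluting the result.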
All of this can be used to generate a variety of scripts for any sort of complex environment. To quote Alan Kay, "Simple things should be simple, complex things should be possible." For a simple make-based project and an appropriately written test case, an automated bisect script could be as simple as:

```shell
#!/bin/bash

if ! make ; then
    exit 125
fi

# Use the return code from the test case
exec test_case
```
Since this is just a script that builds the project and runs a test, you can easily test the script. Testing the script is a very good idea if you plan to leave the computer when the bisect starts. It would be a shame to leave the computer for several hours only to find it stuck at the first commit due to a problem in the automated bisect script. Assuming the script is called `auto_bisect.sh`, testing the script can be as easy as:

```
$ ./auto_bisect.sh ; echo $?
```
Now all of the human interaction for the entire bisect would be three commands:

```
$ git bisect start bad-commit good-commit
$ git bisect run auto_bisect.sh
$ git bisect reset
```
If there are a lot of commits in `good-commit..bad-commit`, building the project takes a long time, or running the tests takes a long time, feel free to go have a sandwich while you wait. Or play Quake. Or do other work.
Broken Builds
The bane of most software developers' existence is a broken build. Few things are more irritating. With GIT, it is possible to have transient build failures that nobody notices. It's not unheard of for a 20-patch series to break at patch 9 and be fixed at patch 11. This commonly occurs either when people move patches around in a series during development or when reviewers suggest splitting large patches into smaller patches. In either case patch 9 could add a call to a function that isn't added until patch 11, for example. If nobody builds at patch 9, the break goes unnoticed.
The break goes unnoticed until a bisect hits exactly patch 9. If the problem being bisected and the build break are unrelated (and the build break is short lived), the normal skip process is sufficient. The range of commits that don't build will skip. Assuming the commit before the range of commits that don't build and the commit after the range of commits that don't build are both good or bad, the guilty commit will be found.
Sometimes things are not quite so rosy. You are bisecting because there was a problem, after all. Why have just one problem when you can have a whole mess of them? I believe that the glass is either empty or overflowing with steaming hot fail. The failing case might look something like:
```
$ git bisect start HEAD HEAD~20
Bisecting: 9 revisions left to test after this (roughly 3 steps)
[2d712d35c57900fc0aa0f1455381de48cdda0073] gallium/radeon: move printing texture info into a separate function
$ git bisect run ./auto_bisect.sh
running ./auto_bisect.sh
auto_bisect.sh says skip
Bisecting: 9 revisions left to test after this (roughly 3 steps)
[622186fbdf47e4c77aadba3e38567636ecbcccf5] mesa: errors: validate the length of null terminated string
running ./auto_bisect.sh
auto_bisect.sh says good
Bisecting: 8 revisions left to test after this (roughly 3 steps)
[19eaceb6edc6cd3a9ae878c89f9deb79afae4dd6] gallium/radeon: print more information about textures
running ./auto_bisect.sh
auto_bisect.sh says skip
Bisecting: 8 revisions left to test after this (roughly 3 steps)
[5294debfa4910e4259112ce3c6d5a8c1cd346ae9] automake: Fix typo in MSVC2008 compat flags.
running ./auto_bisect.sh
auto_bisect.sh says good
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[1cca259d9942e2f453c65e8d7f9f79fe9dc5f0a7] gallium/radeon: print more info about CMASK
running ./auto_bisect.sh
auto_bisect.sh says skip
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[c60d49161e3496b9e64b99ecbbc7ec9a02b15a17] gallium/radeon: remove unused r600_texture::pitch_override
running ./auto_bisect.sh
auto_bisect.sh says skip
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[84fbb0aff98d6e90e4759bbe701c9484e569c869] gallium/radeon: rename fmask::pitch -> pitch_in_pixels
running ./auto_bisect.sh
auto_bisect.sh says skip
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[bfc14796b077444011c81f544ceec5d8592c5c77] radeonsi: fix occlusion queries on Fiji
running ./auto_bisect.sh
auto_bisect.sh says bad
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[a0bfb2798d243a4685d6ea32e9a7091fcec74700] gallium/radeon: print more info about HTILE
running ./auto_bisect.sh
auto_bisect.sh says skip
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[75d64698f0b0c906d611e69d9f8b118c35026efa] gallium/radeon: remove DBG_TEXMIP
running ./auto_bisect.sh
auto_bisect.sh says skip
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[3a6de8c86ee8a0a6d2f2fbc8cf2c461af0b9a007] radeonsi: print framebuffer info into ddebug logs
running ./auto_bisect.sh
auto_bisect.sh says bad
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[a5055e2f86e698a35da850378cd2eaa128df978a] gallium/aux/util: Trivial, we already have format use it
running ./auto_bisect.sh
auto_bisect.sh says skip
There are only 'skip'ped commits left to test.
The first bad commit could be any of:
19eaceb6edc6cd3a9ae878c89f9deb79afae4dd6
2d712d35c57900fc0aa0f1455381de48cdda0073
84fbb0aff98d6e90e4759bbe701c9484e569c869
c60d49161e3496b9e64b99ecbbc7ec9a02b15a17
1cca259d9942e2f453c65e8d7f9f79fe9dc5f0a7
75d64698f0b0c906d611e69d9f8b118c35026efa
a0bfb2798d243a4685d6ea32e9a7091fcec74700
a5055e2f86e698a35da850378cd2eaa128df978a
3a6de8c86ee8a0a6d2f2fbc8cf2c461af0b9a007
We cannot bisect more!
bisect run cannot continue any more
```
In even more extreme cases, the range of breaks can be even longer. Six or seven is about the most that I have personally experienced.
The problem doesn't have to be a broken build. It could be anything that prevents the test case from running. On Mesa I have experienced problems where a bug that prevents one driver from being able to load or create an OpenGL context persists for a few commits. Anything that prevents the test from running (e.g., not producing a pass or fail result) or causes additional, unrelated failures should be skipped.
Usually the problem is something really trivial. If the problem was fixed, a patch for the problem may even already exist. Let's assume a patch exists in a file named `fix-the-build.patch`. We also know that the build broke at commit `75d6469`, and it was fixed at commit `3a6de8c`. This means that the commits in the range `75d6469^..3a6de8c^` need the patch applied. If you're not convinced that the `^` is necessary, observe the log output:
```
$ git log --oneline 75d6469^..3a6de8c^
a0bfb27 gallium/radeon: print more info about HTILE
1cca259 gallium/radeon: print more info about CMASK
84fbb0a gallium/radeon: rename fmask::pitch -> pitch_in_pixels
19eaceb gallium/radeon: print more information about textures
2d712d3 gallium/radeon: move printing texture info into a separate function
c60d491 gallium/radeon: remove unused r600_texture::pitch_override
75d6469 gallium/radeon: remove DBG_TEXMIP
```
Notice that the bottom commit in the list is the commit where the break is first experienced, and the top commit in the list is not the one where the break is fixed.
Using this information is simple. The bisect script need only determine whether the current commit is in the list of commits that need the patch and conditionally apply it.
```shell
# Get the short-form SHA of the current commit
SHA=$(git log --oneline HEAD^.. | cut -f1 -d' ')

# If the current commit is in the list of commits that need the patch
# applied, do it. If applying the patch fails, even partially, abort.
#
# The <(...) construct runs git-log and uses its output as the input
# to grep. Non-bash shells will probably need to do that manually
# with a temporary file.
if grep --silent "^$SHA " <(git log --oneline 75d6469^..3a6de8c^) ; then
    if ! patch -p1 --forward --silent < fix-the-build.patch ; then
        exit 255
    fi
fi
```
Before exiting, the script must return the tree to its original state. If it does not, checking out the next commit may fail, or applying the patch on the next step will certainly fail. `git-reset` can do most of the work. It just has to be applied everywhere this script might exit. I generally do this using a wrapper function. The simple bisect script from before might look like:
```shell
#!/bin/bash

function report()
{
    git reset --hard HEAD
    exit $1
}

# Get the short-form SHA of the current commit
SHA=$(git log --oneline HEAD^.. | cut -f1 -d' ')

# If the current commit is in the list of commits that need the patch
# applied, do it. If applying the patch fails, even partially, abort.
if grep --silent "^$SHA " <(git log --oneline 75d6469^..3a6de8c^) ; then
    if ! patch -p1 --forward --silent < fix-the-build.patch ; then
        # Just exit here... so that we can see what went wrong
        exit 255
    fi
fi

if ! make ; then
    report 125
fi

# Use the return code from the test case
test_case
report $?
```
This can be extended to any number of patches to fix any number of problems.
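One way to sketch that extension is to pull the range test into a helper and drive it from a list of range/patch pairs. Everything beyond the first pair is illustrative: the second range and both patch file names are hypothetical placeholders.

```shell
#!/bin/bash

# needs_patch RANGE: succeed if the currently checked-out commit is in
# the given commit range.
needs_patch()
{
    local sha
    sha=$(git log --oneline HEAD^.. | cut -f1 -d' ')
    grep --silent "^$sha " <(git log --oneline "$1")
}

# Range/patch pairs; the second pair is invented for illustration.
for entry in \
    "75d6469^..3a6de8c^ fix-the-build.patch" \
    "0123abc^..4567def^ fix-the-other-break.patch"
do
    set -- $entry       # word-split into $1 (range) and $2 (patch)
    if needs_patch "$1" ; then
        if ! patch -p1 --forward --silent < "$2" ; then
            exit 255
        fi
    fi
done
```

Each pass through the loop applies at most one patch, so overlapping ranges stack naturally if a commit needs more than one fix.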
There is one other tip here. If the first bisect attempt produced inconclusive results due to skipped commits, it may not have been wasted effort. Referring back to the previous output, there were two good commits found. These commits can be given to the next invocation of `git bisect start`. This helps reduce the search space from 9 to 6 in this case.
```
$ git bisect start HEAD HEAD~20 622186fbdf47e4c77aadba3e38567636ecbcccf5 5294debfa4910e4259112ce3c6d5a8c1cd346ae9
Bisecting: 6 revisions left to test after this (roughly 3 steps)
[1cca259d9942e2f453c65e8d7f9f79fe9dc5f0a7] gallium/radeon: print more info about CMASK
```
Using the last bad commit can reduce the search even further.
```
$ git bisect start 3a6de8c86ee8a0a6d2f2fbc8cf2c461af0b9a007 HEAD~20 622186fbdf47e4c77aadba3e38567636ecbcccf5 5294debfa4910e4259112ce3c6d5a8c1cd346ae9
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[2d712d35c57900fc0aa0f1455381de48cdda0073] gallium/radeon: move printing texture info into a separate function
```
Note that `git-bisect` does not emit "good" or "bad" information. You have to author your bisect script to emit that information. The `report` function is a good place to do this.
```shell
function report()
{
    if [ $1 -eq 0 ]; then
        echo " auto_bisect.sh says good"
    elif [ $1 -eq 125 ]; then
        echo " auto_bisect.sh says skip"
    else
        echo " auto_bisect.sh says bad"
    fi

    git reset --hard HEAD
    exit $1
}
```
Remote Test Systems
Running tests on remote systems poses additional challenges. At the very least, there are three additional steps: get the built project onto the remote system, start the test on the remote system, and retrieve the result.
For these extra steps, `rsync` and `ssh` are powerful tools. There are numerous blog posts and tutorials dedicated to using `rsync` and `ssh` in various environments, and duplicating that effort is well beyond the scope of this post. However, there is one nice feature relevant to automated bisects that is worth mentioning.
Recall that returning 255 from the script will cause the bisect to halt waiting for human intervention. It just so happens that `ssh` returns 255 when an error occurs. Otherwise it returns the result of the remote command. To make use of this, split the work across two scripts instead of putting all of the test in a single `auto_bisect.sh` script. A new `local_bisect.sh` contains all of the commands that run on the build / bisect system, and `remote_bisect.sh` contains all of the commands that run on the testing system.
`remote_bisect.sh` should (only) execute the test and exit with the same sort of return code as `auto_bisect.sh` would. `local_bisect.sh` should build the project, copy the build to the testing system, and start the test on the testing system. The return code from `remote_bisect.sh` should be directly returned from `local_bisect.sh`. A simple `local_bisect.sh` doesn't look too different from `auto_bisect.sh`:
```shell
#!/bin/bash

if ! make ; then
    exit 125
fi

if ! rsync build_results tester@test.system.com:build_results/; then
    exit 255
fi

# Use the return code from the test case
exec ssh tester@test.system.com /path/to/test/remote_bisect.sh
```
Since `remote_bisect.sh` returns "normal" automated bisect return codes and `ssh` returns 255 on non-test failures, everything is taken care of.
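For completeness, a `remote_bisect.sh` along those lines might look like the following sketch. The test binary and the piglit-style pass string are the same placeholders used earlier; substitute your own test and its output format.

```shell
#!/bin/bash

# run_test COMMAND...: run the test, print its output on failure, and
# return a "normal" bisect code (0 = good, 1 = bad) for ssh to relay.
run_test()
{
    local log
    log=$(mktemp)

    "$@" > "$log" 2>&1
    if grep -q '"result": "pass"' "$log" ; then
        rm -f "$log"
        return 0
    fi

    # Show the failure in the bisect output on the build system.
    cat "$log"
    rm -f "$log"
    return 1
}

# Typical last line of the script (placeholder test binary):
#     run_test bin/arb_clear_buffer_object-null-data -auto
```

Because the function's return code is the script's exit code, `ssh` hands it straight back to `local_bisect.sh`, keeping 255 reserved for connection failures.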
Interactive Test Cases
Automated bisecting doesn't work out too well when the test itself cannot be automated, but there is still some benefit to be had from automating the rest of the process. Optionally applying patches, building the project, sending files to remote systems, and starting the test can all still be automated, though I think "automated" applies only very loosely here. When the test is done, the script should exit with a return code of 255. This will halt the bisect. Run `git bisect good` or `git bisect bad`. Then, run `git bisect run ./auto_bisect.sh` again.
It's tempting to just run `auto_bisect.sh` by hand and skip `git bisect run`. The small advantage to the latter is that skipping build failures will still be automated.
Going further requires making an interactive test case non-interactive. For developers of OpenGL drivers, it is common to need to bisect rendering errors in games. This can be really, really painful and tedious. Most of the pain comes from the task not being automatable. Just loading the game and getting to the place where the error occurs can often take several minutes. These bugs are often reported by end-users who last tested with the previous release. From the 11.0 branch point to the 11.1 branch point on Mesa there were 2,308 commits.
```
$ git bisect start 11.1-branchpoint 11.0-branchpoint
Bisecting: 1153 revisions left to test after this (roughly 10 steps)
[bf5f931aee35e8448a6560545d86deb35f0639b3] nir: make nir_instrs_equal() static
```
When you realize that the bisect will be 10 steps with at least 10 to 15 minutes per step, you may begin to feel your insides die. It's even worse if you accidentally type `git bisect good` when you meant `git bisect bad` along the way.
This is a common problem when testing interactive applications. A variety of tools exist to remove the interactivity from interactive applications. `apitrace` is one such tool. Using `apitrace`, the OpenGL commands from the application can be recorded. This step must be done manually. The resulting trace can then be run at a known good commit, and an image can be captured from the portion of the trace that would exhibit the bug. This step must also be done manually, but the image capture is performed by a command line option to the trace replay command. Now a script can replay the trace, collect a new image, and compare the new image with the old image. If the images match, the commit is good. Otherwise, the commit is bad. This can be error prone, so it's a good idea to keep all the images from the bisect. A human can then examine all the images after the bisect to make sure the right choices were made at each commit tested.
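The compare step of that loop is simple enough to sketch. This assumes apitrace's `glretrace` replayer with its `-s` (snapshot prefix) and `-S` (snapshot callset) options, assumes rendering is deterministic enough for a byte-for-byte compare, and uses placeholder call numbers and file names throughout; a fuzzier image comparison tool could be substituted for `cmp` if needed.

```shell
#!/bin/bash

# compare_snapshot REF NEW: succeed if the two images are identical
# byte-for-byte. Assumes deterministic rendering.
compare_snapshot()
{
    cmp --silent "$1" "$2"
}

# The replay itself, with a placeholder call number, trace, and file
# names. The snapshots are kept, one directory per commit, so a human
# can review every verdict after the bisect:
#
#     make || exit 125
#     dir=bisect-images/$(git rev-parse --short HEAD)
#     mkdir -p "$dir"
#     glretrace -s "$dir/" -S 512000 game.trace || exit 125
#     compare_snapshot reference.png "$dir"/*.png
#     exit $?
```

Treating a failed replay as a skip (125) rather than a bad commit keeps unrelated crashes from derailing the image comparison.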
A full `apitrace` tutorial is beyond the scope of this post. The apitrace documentation has some additional information about using `apitrace` with automated bisects.
What Now?
`git-bisect` is an amazingly powerful tool. Some of the more powerful aspects of GIT get a bad rap for being hard to use correctly: make one mistake, and you'll have to re-clone your tree. Even with the more powerful aspects of `git-bisect`, such as automated bisects, it's actually hard to go wrong. There are two absolutely critical final tips. First, remember that you're bisecting. If you start performing other GIT commands in the middle of the bisect, both you and your tree will get confused. Second, remember to reset your tree using `git bisect reset` when you're done. Without this step, you'll still be bisecting, so see the first tip. `git-bisect` and automated bisects really make simple things simple and complex things possible.
I appreciate that you've taken the time to write up this walkthrough. I have written my own bisection routines for multi-system bisect on Mesa's CI, but I haven't taken the time to learn and use git-bisect directly.
I'm going to use this to set up a single-system bisection for the next long bisection that I have to investigate. The overhead of the CI is significant, and using git-bisect should speed up the process by an order of magnitude.
-Mark