
Git History as a Debug Tool: How I Found a Two-Week-Old Bug Using git bisect

TL;DR / Key Takeaways

  • git bisect does a binary search through your commit history to find exactly which commit introduced a bug, and it works even when you have no idea where to start looking.
  • Silent regressions in trading bots are dangerous because they don't crash, they just quietly stop doing the right thing.
  • Atomic commits with meaningful messages are not a style preference, they are a debugging tool you will need eventually.
  • The Weather Bot v2.1 at Predict & Profit shipped cleaner partly because a bad commit message forced a two-day debug session that should have taken twenty minutes.

Two weeks. That's how long a bad behavior had been sitting in the weather bot before I caught it.

The bot wasn't crashing. The logs looked fine. Trades were going out. It was only when I sat down and looked at the actual trade decision log versus what the ensemble was producing that I noticed something wrong. The agreement filter wasn't blocking the trades it should have been blocking. Three-of-four-ensemble agreement was required. The bot was trading on two-of-four situations and logging them as passing.

No exception. No warning. Just wrong.

The Worst Kind of Bug

Runtime errors are easy. Your process dies, you get a traceback, you fix the line. Silent behavioral regressions are the ones that rot your system from the inside. The bot was running, which meant nothing was obviously broken. But the edge I had built the agreement filter to protect was getting eroded on every cycle.

I knew roughly when I had last manually verified the agreement filter was working correctly. It was about two weeks ago. Since then I had made maybe thirty commits. Some were clean and focused. Some were not.

I was not going to read thirty commits worth of diffs. I opened a terminal and ran git bisect.

How git bisect Works

The idea is simple. You tell git two things: a commit you know is good, and a commit you know is bad. Git checks out a commit roughly halfway between them. You test. You tell git whether that commit is good or bad. Git moves to the midpoint of whichever half contains the bug. Repeat until it narrows to a single commit.

Binary search. Log base 2 steps. Thirty commits becomes five tests.
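
The halving is easy to see in a toy model. This is a minimal sketch, not how git is implemented: it assumes a strictly linear history and that every commit after the first bad one is also bad. The first_bad function and the c0 to c30 commit names are made up for illustration.

```python
def first_bad(commits, is_bad):
    """Binary-search a linear history for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    Returns (first bad commit, number of commits tested).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid      # the bug was introduced at mid or earlier
        else:
            good = mid     # the bug was introduced after mid
    return commits[bad], tests


# Hypothetical history: c0 is the known-good commit, followed by thirty more.
history = [f"c{i}" for i in range(31)]
culprit, tests = first_bad(history, lambda c: int(c[1:]) >= 22)
print(culprit, tests)  # prints: c22 5
```

Wherever the bad commit sits among those thirty, this loop needs at most five tests.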

Start a bisect session like this:

git bisect start
git bisect bad                          # current HEAD is bad
git bisect good a3f1c92                 # last known good commit

Git will tell you something like: Bisecting: 14 revisions left to test after this (roughly 4 steps). Then it checks out a commit in the middle of that range.

You test. If the behavior is correct at that commit:

git bisect good

If it's still broken:

git bisect bad

Git moves to the next midpoint. You keep going until it says: a7d3e81 is the first bad commit.

When you're done, clean up with:

git bisect reset

That returns you to whatever you had checked out before you ran git bisect start.

The Actual Debug Session

For the weather bot, I needed a repeatable way to test the agreement filter without placing real orders. I wrote a quick script that loaded a fixed historical forecast scenario I had saved as a JSON fixture and ran the agreement check against it. Expected output: trade blocked. Simple pass/fail.

# test_agreement.py
import json
import sys

from auto_trader import check_agreement

# Fixed historical scenario: only two of four ensemble members agree.
with open("fixtures/two_ensemble_scenario.json") as f:
    scenario = json.load(f)

# check_agreement returns False when the trade should be blocked.
allowed = check_agreement(scenario["ensemble_results"], min_agreement=3)

if not allowed:
    print("PASS: trade correctly blocked")
else:
    print("FAIL: trade not blocked, agreement filter broken")
    sys.exit(1)

I ran git bisect start, marked HEAD bad, marked a3f1c92 good, and started the session. Five tests later git gave me the bad commit.

The hash was b19f4c3. The commit message was: chore: misc fixes and cleanup.

Of course it was.

What Was Actually Inside That Commit

I ran git show b19f4c3 and looked at the diff. The commit touched four different files. One file was a log formatter. One was a threshold config. One was a comment cleanup in a utility module. And buried in the fourth file, in auto_trader.py, was this:

# Before (correct)
agreements = sum(1 for src in ensemble_results if src["direction"] == majority_direction)
if agreements < self.min_agreement:
    return False  # block the trade

# After (broken, from the bad commit)
agreements = sum(1 for src in ensemble_results if src["direction"] == majority_direction)
if agreements <= self.min_agreement:
    return False  # block the trade

< became <=. One character, and it moves the boundary of the filter by one. With min_agreement set to 3, the original check returns False only when fewer than three sources agree. The changed check also returns False at exactly three, so the bot was no longer enforcing the three-of-four rule it was built around. Nothing raises and nothing looks odd in the logs, because both versions are valid Python and both return a boolean.

It took me about ten minutes to reconstruct the logic once I had the commit. Finding the commit took twenty minutes of bisect.
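
One way to untangle a boundary change like this is to tabulate both comparisons for every possible agreement count instead of reasoning about it in your head. A minimal sketch, assuming four ensemble members and a min_agreement of 3; the blocked helper is hypothetical and only mirrors the condition in the diff above.

```python
import operator

MIN_AGREEMENT = 3


def blocked(agreements, min_agreement, compare):
    """True when the filter takes the `return False` branch and blocks the trade."""
    return compare(agreements, min_agreement)


for n in range(5):
    before = blocked(n, MIN_AGREEMENT, operator.lt)  # agreements < min_agreement
    after = blocked(n, MIN_AGREEMENT, operator.le)   # agreements <= min_agreement
    marker = "  <-- differs" if before != after else ""
    print(
        f"{n} of 4 agree: "
        f"before={'block' if before else 'allow'}, "
        f"after={'block' if after else 'allow'}{marker}"
    )
```

The only row where the two versions disagree is exactly three agreements. That makes it the fixture to save for a check like this.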

Without bisect I would have been reading diffs for hours, if I found it at all.

The Commit Message Problem

Here's the part that should make any working developer uncomfortable.

If that commit had been four separate commits, each with a real message, the bisect session would have been faster and the fix would have been obvious immediately. Something like:

refactor: simplify log formatter output for trade decisions
config: raise min_price threshold to 0.40 after AIGEFS calibration
docs: add inline comments to threshold constants
fix: update agreement filter comparison in auto_trader.py

Four commits. Each one atomic. Each one reviewable in isolation. When bisect lands on the bad one, you see exactly what changed and why.

Instead I had a grab-bag commit that touched everything and explained nothing. The misc fixes message is a lie. Not an intentional lie. Just the kind of lie you tell yourself when you're tired and you want to push and move on. But two weeks later, someone (me) pays for it.

Writing Commits That Help Future You

Atomic commits don't mean one line per commit. They mean one logical change per commit. The diff should be explainable in a single sentence. If you can't write a one-sentence commit message that accurately describes the change, the commit is probably too big.

The format I use now:

<type>: <what changed and why it mattered>

Types: fix, feat, refactor, config, docs, test, chore. The chore type is allowed. chore: update .gitignore to exclude .env files is a fine commit. chore: misc fixes is a crime you commit against your future self.

A few rules I actually follow:

One: if you're touching more than three files, ask whether this should be more than one commit. Usually yes.

Two: if the commit message requires the word "and", it's probably two commits.

Three: write the message before you write the code. Not always possible, but when you can do it, the commit stays focused because you've already defined what it is.
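
Part of this can be checked mechanically. The sketch below is not the hook I run. It is a hypothetical lint_subject helper that enforces the type prefix from the format above and flags the word "and", with a regular expression that is my own guess at the format.

```python
import re

ALLOWED_TYPES = ("fix", "feat", "refactor", "config", "docs", "test", "chore")
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+): (?P<summary>\S.*)$")


def lint_subject(subject):
    """Return a list of warnings for one commit subject line (empty means it passes)."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return ["subject should look like '<type>: <what changed and why it mattered>'"]
    warnings = []
    if match["type"] not in ALLOWED_TYPES:
        warnings.append(f"unknown type '{match['type']}'")
    if re.search(r"\band\b", match["summary"], re.IGNORECASE):
        warnings.append("summary contains 'and'; consider splitting the commit")
    return warnings


print(lint_subject("chore: update .gitignore to exclude .env files"))  # []
print(lint_subject("chore: misc fixes and cleanup"))
print(lint_subject("wip"))
```

A check like this only catches shape, not honesty. "chore: misc fixes" passes it.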

Automating the Bisect Test

For any bot with a meaningful test suite, you can fully automate bisect. Git will run your test script and mark commits good or bad without any manual intervention:

git bisect start
git bisect bad HEAD
git bisect good a3f1c92
git bisect run python test_agreement.py

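Git runs test_agreement.py at each midpoint. If the script exits 0, the commit is good. If it exits with a code from 1 to 127 other than 125, the commit is bad. Exit code 125 tells git to skip a commit that can't be tested, and anything above 127 aborts the bisect. Git bisects the entire range automatically and prints the first bad commit when it's done.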

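For this to work, your test needs to be deterministic and it needs to work on older code without depending on things that didn't exist yet. The script and its fixture also have to be present at every commit git checks out, so keep them untracked or outside the repo rather than committing them at HEAD. Historical fixture data is perfect for this. Live API calls are not.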
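
When the code under test doesn't exist at some older commit, the script can say so with exit code 125, which git bisect run treats as "skip this commit", rather than failing and having that commit marked bad. A minimal sketch, assuming the filter returns False to block a trade; classify, load_check and the stand-in lambdas are hypothetical names for illustration.

```python
GOOD, BAD, SKIP = 0, 1, 125  # exit codes `git bisect run` treats as good, bad, skip


def classify(load_check, ensemble_results, min_agreement=3):
    """Return the exit code for one bisect step.

    load_check is a zero-argument callable that returns the filter function;
    in a real script it would wrap the import of the code under test.
    """
    try:
        check = load_check()
    except (ImportError, AttributeError):
        return SKIP  # the function does not exist at this commit
    allowed = check(ensemble_results, min_agreement=min_agreement)
    # The fixture is a two-of-four scenario, so a correct filter blocks it.
    return BAD if allowed else GOOD


def missing():
    raise ImportError("auto_trader has no check_agreement at this commit")


two_of_four = [{"direction": d} for d in ("up", "up", "down", "flat")]
print(classify(lambda: (lambda results, min_agreement: False), two_of_four))  # 0
print(classify(lambda: (lambda results, min_agreement: True), two_of_four))   # 1
print(classify(missing, two_of_four))                                         # 125
```

In the real script, the last step would be to pass the result of classify to sys.exit so git sees it as the exit code.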

If you don't have a test suite yet, write one test for the behavior you're currently debugging. The debugging session is the motivation. Use it.

What I Changed After This

I started keeping a small fixtures/ directory in the bot repo. JSON snapshots of specific scenarios: the four-ensemble disagreement case, the edge-too-small case, the spread-too-wide case. Each one is a documented known-input that produces a known output. They're not a full test suite. They're bisect ammunition.
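
A fixture runner for this can be very small. The sketch below is an assumption about layout, not my actual files: each JSON fixture carries an ensemble_results list plus a hypothetical expected_allowed field, and check_agreement is a stand-in for the bot's filter so the example runs on its own.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def check_agreement(ensemble_results, min_agreement=3):
    """Stand-in for the bot's filter: True allows the trade, False blocks it."""
    directions = Counter(src["direction"] for src in ensemble_results)
    _, agreements = directions.most_common(1)[0]
    return agreements >= min_agreement


def run_fixtures(fixture_dir):
    """Return the names of fixtures whose actual result differs from the recorded one."""
    failures = []
    for path in sorted(Path(fixture_dir).glob("*.json")):
        scenario = json.loads(path.read_text())
        actual = check_agreement(scenario["ensemble_results"])
        if actual != scenario["expected_allowed"]:
            failures.append(path.name)
    return failures


# Demo with one throwaway fixture: two of four sources agree, so the trade is blocked.
with tempfile.TemporaryDirectory() as tmp:
    fixture = {
        "ensemble_results": [
            {"direction": "up"}, {"direction": "up"},
            {"direction": "down"}, {"direction": "flat"},
        ],
        "expected_allowed": False,
    }
    (Path(tmp) / "two_ensemble_scenario.json").write_text(json.dumps(fixture))
    print(run_fixtures(tmp))  # prints: []
```

An empty list means every recorded scenario still produces its recorded answer.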

I also added a commit-msg hook that blocks commits with messages shorter than fifteen characters. It won't stop bad messages. It will at least stop fix, wip, and asdf.

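#!/bin/bash
# .git/hooks/commit-msg (must be executable: chmod +x .git/hooks/commit-msg)
# Git passes the path of the commit message file as $1.
MSG=$(cat "$1")
if [ ${#MSG} -lt 15 ]; then
  echo "Commit message too short. Write something useful."
  exit 1
fi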

Closing

Two weeks of contaminated trade data because of a one-character diff buried in a misc fixes commit. The bug itself was trivial. Finding it without bisect would have been miserable.

git bisect is not a fancy tool. It's been in git since 2005, and git bisect run has been there since 2007. Most developers I've worked with have heard of it and never used it. Start using it. And start writing commits worth bisecting.