The Performance Review vs. The P&L Ledger: Why I Stopped Letting Someone Else Grade My Work

TL;DR / Key Takeaways

  • Corporate performance reviews measure your ability to manage perception, not your actual output. The P&L ledger of an automated trading system measures only one thing: whether the edge is real.
  • Running automated systems creates a feedback loop that is honest by design. The bot either makes money or it does not. There is no narrative you can construct around it.
  • The discipline required to build a profitable trading bot is exactly the same discipline that made you a good engineer — it just finally works for you instead of your employer.
  • You do not need permission, budget approval, or a favorable manager to validate the work. The market handles that.

I had my first corporate performance review at 22 years old. The manager said I was "meeting expectations." I had just built an inventory tracking system that saved the company six figures annually. Meeting expectations.

I have had roughly 60 more performance reviews since then. In 30 years of enterprise software engineering, I have received two that I would call accurate. The rest were political documents dressed up as feedback.

This is not a complaint. It is an observation. Corporate performance reviews are not designed to measure your actual output. They are designed to manage headcount budgets, justify compensation decisions that were already made before the review happened, and give managers plausible documentation for whatever HR needs. If you are good at politics, you get rated high. If you are bad at politics and good at engineering, your results depend heavily on which manager you drew.

I knew this intellectually for years. I did not feel it in my bones until I started building automated trading systems.

What Honest Feedback Actually Feels Like

The Kalshi weather bot uses GFS ensemble data to find mispriced temperature contracts. It runs a scoring function on four factors: ensemble spread, model confidence, the gap between model probability and market price, and fee efficiency. If the score clears the threshold, the bot places a trade. If not, it passes.
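
A minimal sketch of what that scoring step might look like in Python. The factor names match the description above, but the dataclass, weights, and threshold are illustrative placeholders, not the production values:

# Illustrative scoring sketch -- weights and threshold are placeholders
from dataclasses import dataclass

@dataclass
class Signal:
    ensemble_spread: float    # disagreement across ensemble members, normalized 0..1
    model_confidence: float   # share of members agreeing with the modal outcome, 0..1
    price_gap: float          # |model probability - market price|, 0..1
    fee_efficiency: float     # expected edge remaining after fees, normalized 0..1

WEIGHTS = {
    "ensemble_spread": -0.20,  # penalize member disagreement
    "model_confidence": 0.30,
    "price_gap": 0.35,
    "fee_efficiency": 0.15,
}
THRESHOLD = 0.25  # hypothetical cutoff, tuned against the trade log in practice

def score(sig: Signal) -> float:
    """Weighted sum of the four factors."""
    return sum(WEIGHTS[name] * getattr(sig, name) for name in WEIGHTS)

def should_trade(sig: Signal) -> bool:
    return score(sig) >= THRESHOLD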

Every decision is logged to a SQLite database with a timestamp, the score components, the trade direction, the contract price, and the eventual outcome. After any given cycle I can query that database and know exactly how the system performed. Not how I felt it performed. Not how my manager thought it performed. What actually happened.

-- predictandprofit.io
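-- Daily trade count, wins, and net P&L for the trailing 30 days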
SELECT
    DATE(trade_time) AS trade_date,
    COUNT(*) AS total_trades,
    SUM(CASE WHEN outcome = 'WIN' THEN 1 ELSE 0 END) AS wins,
    ROUND(SUM(pnl_cents) / 100.0, 2) AS net_pnl_dollars
FROM trade_log
WHERE trade_time >= DATE('now', '-30 days')
GROUP BY DATE(trade_time)
ORDER BY trade_date;

That query does not care if I had a difficult quarter. It does not care if my skip-level liked my presentation skills. It does not care about my five-year plan or my leadership potential. It returns numbers. The numbers are what they are.

The first time I ran that query and saw a month where the bot lost money, I did not feel the urge to write an explanation or talk to anyone about context. I looked at the losing trades, found the pattern, adjusted the minimum confidence threshold, and moved on. No politics. No narrative. Just data and iteration.

I had never experienced feedback that clean in 30 years of corporate engineering.

The Mismatch That Burns You Out

Here is the thing nobody says out loud at work: your actual technical output and your measured performance score are almost entirely decoupled. I have watched mediocre engineers get promoted because they were good at framing things in the language management wanted to hear. I have watched excellent engineers stagnate because they were blunt, or worked on unglamorous systems, or had a bad relationship with one person who sat on the promotion committee.

The work and the reward are not connected the way we were told they would be when we entered the field.

Automated trading systems do not have this problem. The edge either exists or it does not. The bot runs GFS ensemble data through 164-member simulations. If the simulated probability and the Kalshi market price are far enough apart, and all the other filters pass, a trade goes in. Over enough trades, if the edge is real, the P&L climbs. If it is not, it does not. There is no performance management cycle to wait for. The verdict arrives in real time.
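
The core comparison reduces to a few lines. A hedged sketch, with stand-in names (member_temps, min_gap) rather than the real pipeline:

# Illustrative: estimate the event probability from ensemble members,
# then check whether the market price disagrees enough to act on.
def ensemble_probability(member_temps: list[float], strike: float) -> float:
    """Fraction of ensemble members at or above the strike temperature."""
    return sum(t >= strike for t in member_temps) / len(member_temps)

def has_edge(member_temps: list[float], strike: float,
             market_price_cents: int, min_gap: float = 0.08) -> bool:
    """min_gap is a placeholder; the real cutoff must clear fees too."""
    model_p = ensemble_probability(member_temps, strike)
    market_p = market_price_cents / 100.0
    return abs(model_p - market_p) >= min_gap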

This sounds obvious. But after decades of waiting six to twelve months to find out how your work was "perceived," getting honest feedback inside of one trading cycle feels like a different reality.

The Discipline Carries Over

What surprised me most when I started building these systems seriously is how directly my engineering discipline from corporate work applies, and how much better it feels to apply it to my own capital.

At QTS Data Centers I design data pipelines that have to run reliably. I think about failure modes, retry logic, data integrity, observability. I have spent years building systems that do not break when you are not watching them.

That is exactly what an automated trading bot requires. The Kalshi bot runs on a local Ubuntu VM. When the Open-Meteo API returns a malformed response, the bot logs it and retries with exponential backoff. When the Kalshi API rate limit triggers, it waits and tries again. When a trade fills partially, it records the partial fill and calculates exposure against the intended position. All of that is standard enterprise reliability engineering, applied to a system that works for me instead of my employer.
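
None of that is exotic. The backoff pattern, as a self-contained sketch where fetch is any callable that hits the API and the delay constants are placeholders:

import random
import time

def fetch_with_backoff(fetch, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fetch(); on failure wait 1s, 2s, 4s, ... plus jitter, then retry.
    Re-raises the last error once attempts are exhausted."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:  # in production, catch the specific API/network errors
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))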

The work is the same. The accountability is totally different.

What the P&L Statement Tells You That a Review Never Will

A performance review tells you what your manager thinks of you at a specific point in time, filtered through their own biases, political pressures, and memory of one quarter of work.

A P&L ledger tells you whether your model of the world is correct.

Those are not comparable things. One measures perception. The other measures reality.

When the Kalshi bot has a losing stretch, I do not assume it is random noise and wait for things to improve. I query the database and look at which market conditions produced the losses. Usually it is a specific pattern: trades entered when ensemble spread was low, meaning the model members agreed but the gap between model probability and market price was too narrow to justify the fee. I tighten the filter. The losses stop.
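
The diagnosis itself is just another query against the same database. A sketch of the kind of breakdown I mean, assuming the trade log carries an ensemble_spread column alongside pnl_cents:

import sqlite3

# Bucket trades by ensemble spread and total the P&L per bucket
# to see which regime is bleeding money. Column names are illustrative.
conn = sqlite3.connect("trade_log.db")
rows = conn.execute("""
    SELECT ROUND(ensemble_spread, 1) AS spread_bucket,
           COUNT(*) AS trades,
           SUM(pnl_cents) / 100.0 AS net_pnl_dollars
    FROM trade_log
    GROUP BY spread_bucket
    ORDER BY spread_bucket
""").fetchall()
for bucket, trades, pnl in rows:
    print(f"spread {bucket}: {trades} trades, ${pnl:.2f}")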

That is a feedback loop that actually closes. Corporate reviews almost never close the loop. You get feedback in December, it describes events from six months ago, and by the time you could act on it the context has changed entirely.

The Honest Part

I want to be clear about something. The bot loses sometimes. The month-one results post I wrote showed a period where the system was underwater. I published those numbers because hiding them would make everything else I write meaningless.

The P&L ledger is honest in the direction of good outcomes and bad ones. That is the whole point. If I only published the winning months, I would be doing what corporate performance reviews do: curating the narrative.

The prediction market space is also genuinely hard. Kalshi takes fees on every contract. The fee formula, 0.07 × C × P × (1 − P), where C is the number of contracts and P is the contract price in dollars, is designed to extract value from traders who do not account for it. Most retail participants do not. The edge from a well-calibrated ensemble model is real, but it is not infinite. You have to earn it every cycle.
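
To make that concrete, a quick sketch of the per-contract arithmetic (rounding omitted; Kalshi rounds fees up to the next cent):

def kalshi_fee_dollars(contracts: int, price_dollars: float) -> float:
    """Fee = 0.07 * C * P * (1 - P), which peaks at P = 0.50."""
    return 0.07 * contracts * price_dollars * (1 - price_dollars)

# 100 contracts at 50 cents: 0.07 * 100 * 0.5 * 0.5 = $1.75 in fees,
# i.e. 1.75 cents per contract that the modeled edge has to clear.
print(f"{kalshi_fee_dollars(100, 0.50):.2f}")  # 1.75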

I have been earning it for over a year now, running real capital through a system I built myself, with no manager telling me if it is good enough.

It is the most honest feedback I have ever received about the quality of my own thinking.


The full source code for the Predict & Profit weather trading bot is $67, including the scoring engine, ensemble data pipeline, Kalshi API integration, and SQLite logging layer.

Get the Source Code — $67

Read the Ebook — $9.99 on Amazon
