
Designing a PostgreSQL Trade Log Schema for an Automated Weather Trading Bot

TL;DR / Key Takeaways

  • Trade logging is not optional. If you cannot query your bot's decisions after the fact, you cannot improve it.
  • The schema needs to capture both the market state at decision time and the outcome after settlement, stored separately.
  • P&L is not just wins minus losses. Fee drag, ensemble accuracy, and filter hit rate are the numbers that tell you whether your edge is real.
  • A clean PostgreSQL schema with the right indexes makes these queries fast enough to run after every trade cycle.

Every tutorial on prediction market bots ends at place_order(). The trade goes in, the tutorial ends, and you are left to figure out what happened. Did the ensemble call it right? What was the fee drag on that market? Was the edge real or noise?

The answer is in your logs. But only if you designed your logs to answer those questions.

I am a data engineer by trade. Thirty years of building pipelines means I take schema design seriously. When I built the Predict and Profit weather trading bot for Kalshi, the logging layer was not an afterthought. It was part of the architecture from the start. This is what that schema looks like and why it is structured the way it is.

The Core Problem With Simple Trade Logs

The naive approach to trade logging is a single table with a row per trade and a profit column at the end. That works until you want to answer anything interesting.

How many trades did the bot reject this cycle and why? What was the average fee as a percentage of expected edge on losing trades versus winning trades? When ensemble spread was above 0.20, did the win rate improve? How often did the model predict Yes and the market settle No?

A flat trade log cannot answer these. You need to separate the decision from the outcome, capture the state of the scoring inputs at the time of the decision, and structure the schema so these questions are cheap to query.

The Schema

The Kalshi weather bot uses three main tables: scored_markets, placed_trades, and trade_outcomes.

-- predictandprofit.io
-- Captures the full scoring state for every market evaluated, whether traded or not
CREATE TABLE scored_markets (
    id              BIGSERIAL PRIMARY KEY,
    evaluated_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    ticker          TEXT NOT NULL,
    market_title    TEXT,
    direction       TEXT NOT NULL CHECK (direction IN ('YES', 'NO')),
    contract_price  NUMERIC(6, 4) NOT NULL,

    -- The four scoring factors
    ensemble_spread     NUMERIC(6, 4),
    ensemble_confidence NUMERIC(6, 4),
    model_market_gap    NUMERIC(6, 4),
    fee_efficiency      NUMERIC(6, 4),

    -- Composite edge score
    edge_score      NUMERIC(6, 4),

    -- Filter outcomes: why was this market skipped?
    passed_filters  BOOLEAN NOT NULL DEFAULT FALSE,
    filter_failed   TEXT,  -- 'min_confidence', 'min_edge', 'min_price', 'contradiction', 'fee_drag', NULL if passed

    -- Ensemble summary at evaluation time
    ensemble_mean   NUMERIC(6, 4),
    ensemble_members INTEGER
);

-- Indexes are created separately (PostgreSQL has no inline INDEX clause in CREATE TABLE)
CREATE INDEX idx_scored_markets_ticker       ON scored_markets (ticker);
CREATE INDEX idx_scored_markets_evaluated_at ON scored_markets (evaluated_at DESC);
CREATE INDEX idx_scored_markets_passed       ON scored_markets (passed_filters, evaluated_at DESC);

-- One row per trade actually placed with Kalshi
CREATE TABLE placed_trades (
    id              BIGSERIAL PRIMARY KEY,
    scored_market_id BIGINT REFERENCES scored_markets(id),
    placed_at       TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    ticker          TEXT NOT NULL,
    direction       TEXT NOT NULL CHECK (direction IN ('YES', 'NO')),
    contracts       INTEGER NOT NULL,
    contract_price  NUMERIC(6, 4) NOT NULL,
    total_cost      NUMERIC(10, 4) NOT NULL,
    kalshi_order_id TEXT UNIQUE NOT NULL,
    status          TEXT NOT NULL DEFAULT 'open' CHECK (status IN ('open', 'settled', 'cancelled'))
);

-- kalshi_order_id is already indexed by its UNIQUE constraint
CREATE INDEX idx_placed_trades_scored_market_id ON placed_trades (scored_market_id);
CREATE INDEX idx_placed_trades_ticker           ON placed_trades (ticker);
CREATE INDEX idx_placed_trades_status           ON placed_trades (status);

-- Populated after market settlement
CREATE TABLE trade_outcomes (
    id              BIGSERIAL PRIMARY KEY,
    placed_trade_id BIGINT REFERENCES placed_trades(id) UNIQUE,
    settled_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    result          TEXT NOT NULL CHECK (result IN ('WIN', 'LOSS', 'PUSH')),
    payout          NUMERIC(10, 4),
    fee_paid        NUMERIC(10, 4),
    net_pnl         NUMERIC(10, 4),

    -- Actual weather outcome for accuracy tracking
    actual_temp     NUMERIC(5, 2),
    ensemble_was_correct BOOLEAN
);

-- placed_trade_id is already indexed by its UNIQUE constraint
CREATE INDEX idx_trade_outcomes_settled_at ON trade_outcomes (settled_at DESC);

Three tables instead of one. This matters.

scored_markets records every market the bot evaluated, not just the ones it traded. This is how you measure filter effectiveness. If you only log placed trades, you have no visibility into what got filtered and why.

placed_trades is the execution record. It links back to the scoring state via scored_market_id so you can always ask: what did the model think when this trade was placed?
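
For example, a join along these lines (an illustrative query, not lifted from the bot's codebase) pulls up the full scoring state behind any placed order:

-- What did the model think when this trade was placed?
SELECT
    pt.ticker,
    pt.placed_at,
    pt.direction,
    pt.contract_price,
    sm.edge_score,
    sm.ensemble_mean,
    sm.ensemble_spread,
    sm.ensemble_confidence
FROM placed_trades pt
JOIN scored_markets sm ON sm.id = pt.scored_market_id
WHERE pt.kalshi_order_id = 'ORDER_ID_HERE';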

trade_outcomes is populated by a separate settlement process that runs after market close. It stores the actual result, the fee paid, and whether the ensemble called it correctly. The ensemble_was_correct flag is computed separately by comparing the ensemble mean direction against the settlement outcome.
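
A minimal sketch of what that settlement step could look like, in the same psycopg2 style as the scoring loop below. The settlement payload fields, helper name, and the $1.00-per-contract payout convention are assumptions for illustration, not the bot's actual code:

# Illustrative settlement sketch; field and function names are assumptions
def record_outcome(conn, trade: dict, settlement: dict) -> None:
    # Binary contracts settle at $1.00 per contract on the winning side
    won = trade["direction"] == settlement["settled_side"]
    result = "WIN" if won else "LOSS"
    payout = trade["contracts"] * 1.0 if won else 0.0
    fee = settlement["fee_paid"]
    net_pnl = payout - trade["total_cost"] - fee

    # The ensemble called it correctly if its mean pointed at the settled side
    predicted_yes = settlement["ensemble_mean"] >= 0.5
    ensemble_was_correct = predicted_yes == (settlement["settled_side"] == "YES")

    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO trade_outcomes (
                placed_trade_id, result, payout, fee_paid, net_pnl,
                actual_temp, ensemble_was_correct
            ) VALUES (%s, %s, %s, %s, %s, %s, %s)
            ON CONFLICT (placed_trade_id) DO NOTHING;
            """,
            (trade["id"], result, payout, fee, net_pnl,
             settlement["actual_temp"], ensemble_was_correct),
        )
        cur.execute(
            "UPDATE placed_trades SET status = 'settled' WHERE id = %s;",
            (trade["id"],),
        )
    conn.commit()

The ON CONFLICT clause leans on the UNIQUE constraint on placed_trade_id, so re-running the settlement job never double-counts an outcome.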

Populating the Tables in Python

The scoring loop populates scored_markets for every market it evaluates, regardless of outcome:

# predictandprofit.io
def log_scored_market(conn, market: dict, scores: dict, decision: dict) -> int:
    sql = """
        INSERT INTO scored_markets (
            ticker, market_title, direction, contract_price,
            ensemble_spread, ensemble_confidence, model_market_gap, fee_efficiency,
            edge_score, passed_filters, filter_failed,
            ensemble_mean, ensemble_members
        ) VALUES (
            %(ticker)s, %(title)s, %(direction)s, %(price)s,
            %(spread)s, %(confidence)s, %(gap)s, %(fee_eff)s,
            %(edge_score)s, %(passed)s, %(filter_failed)s,
            %(ens_mean)s, %(members)s
        )
        RETURNING id;
    """
    params = {
        "ticker":        market["ticker"],
        "title":         market.get("title"),
        "direction":     decision["direction"],
        "price":         market["last_price"],
        "spread":        scores["ensemble_spread"],
        "confidence":    scores["ensemble_confidence"],
        "gap":           scores["model_market_gap"],
        "fee_eff":       scores["fee_efficiency"],
        "edge_score":    scores["composite"],
        "passed":        decision["trade"],
        "filter_failed": decision.get("reason"),
        "ens_mean":      scores["ensemble_mean"],
        "members":       scores["member_count"],
    }
    with conn.cursor() as cur:
        cur.execute(sql, params)
        new_id = cur.fetchone()[0]
    conn.commit()
    return new_id

The decision["reason"] field is set by whichever filter rejected the market. If no filter fired, it is None and passed_filters is True. This gives you a clean rejection reason for every non-traded market.
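
The decision dict itself comes out of the filter stage. A sketch of how that might be assembled, with made-up thresholds and only a subset of the filters named in the schema comment:

# Illustrative filter stage; thresholds and filter subset are assumptions
def apply_filters(scores: dict, market: dict, direction: str) -> dict:
    checks = [
        ("min_confidence", scores["ensemble_confidence"] >= 0.60),
        ("min_edge",       scores["composite"] >= 0.05),
        ("min_price",      market["last_price"] >= 0.10),
        ("fee_drag",       scores["fee_efficiency"] >= 0.50),
    ]
    for reason, passed in checks:
        if not passed:
            # First failing filter becomes the rejection reason
            return {"trade": False, "direction": direction, "reason": reason}
    return {"trade": True, "direction": direction, "reason": None}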

The P&L Queries That Actually Matter

Net P&L is the obvious query. Here is the one I check after every settlement cycle:

-- Net P&L all time
SELECT
    COUNT(*)                                    AS total_trades,
    SUM(CASE WHEN result = 'WIN' THEN 1 ELSE 0 END)  AS wins,
    SUM(CASE WHEN result = 'LOSS' THEN 1 ELSE 0 END) AS losses,
    ROUND(SUM(net_pnl)::NUMERIC, 2)             AS net_pnl,
    ROUND(SUM(fee_paid)::NUMERIC, 2)            AS total_fees,
    ROUND(AVG(net_pnl)::NUMERIC, 4)             AS avg_pnl_per_trade
FROM trade_outcomes;

But the more useful query is filter effectiveness. How much edge am I leaving on the table with each filter, and is each filter pulling its weight?

-- Filter rejection breakdown
SELECT
    filter_failed,
    COUNT(*)                                    AS rejected_count,
    ROUND(AVG(edge_score)::NUMERIC, 4)          AS avg_edge_at_rejection
FROM scored_markets
WHERE passed_filters = FALSE
GROUP BY filter_failed
ORDER BY rejected_count DESC;

If min_confidence is rejecting 60% of markets but those markets have an average edge of 0.08, the filter is working. If they have an average edge of 0.22, you might be setting the threshold too high and leaving money on the table.

Ensemble accuracy by spread bucket is the one I find most interesting:

-- Win rate by ensemble spread quartile
WITH settled_trades AS (
    SELECT
        sm.ensemble_spread,
        to2.result,
        NTILE(4) OVER (ORDER BY sm.ensemble_spread) AS spread_quartile
    FROM placed_trades pt
    JOIN scored_markets sm  ON sm.id = pt.scored_market_id
    JOIN trade_outcomes to2 ON to2.placed_trade_id = pt.id
)
SELECT
    spread_quartile,
    ROUND(MIN(ensemble_spread)::NUMERIC, 4)     AS spread_min,
    ROUND(MAX(ensemble_spread)::NUMERIC, 4)     AS spread_max,
    COUNT(*)                                    AS trades,
    SUM(CASE WHEN result = 'WIN' THEN 1 ELSE 0 END) AS wins,
    ROUND(
        SUM(CASE WHEN result = 'WIN' THEN 1 ELSE 0 END)::NUMERIC
        / COUNT(*), 4
    )                                           AS win_rate
FROM settled_trades
GROUP BY spread_quartile
ORDER BY spread_quartile;

This tells you whether higher ensemble spread actually correlates with better win rates. If it does, the spread filter is a real signal. If it does not, you are adding noise, not edge.

One Index Decision That Matters

All the above queries join across three tables. The foreign key indexes on placed_trades.scored_market_id and trade_outcomes.placed_trade_id are not optional. Without them, settlement reporting queries run a sequential scan on placed_trades every time.

For a bot placing a handful of trades per day, this does not matter. For a bot running across 30 cities and multiple contract expiry dates simultaneously, the table can grow to 50,000 rows in a month. The difference between a 3 ms query and a 400 ms query is whether you bothered to run CREATE INDEX on the join columns.
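
A quick way to sanity-check this after the fact (illustrative; plans will vary with table size and statistics):

-- placed_trade_id is already covered by its UNIQUE constraint;
-- the join from placed_trades back to scored_markets needs its own index
CREATE INDEX IF NOT EXISTS idx_placed_trades_scored_market_id
    ON placed_trades (scored_market_id);

-- Confirm the planner uses index scans rather than sequential scans
EXPLAIN ANALYZE
SELECT pt.ticker, to2.net_pnl
FROM placed_trades pt
JOIN scored_markets sm  ON sm.id = pt.scored_market_id
JOIN trade_outcomes to2 ON to2.placed_trade_id = pt.id
WHERE to2.settled_at > NOW() - INTERVAL '7 days';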

Why PostgreSQL and Not Just a Flat File

Because I want to ask questions I have not thought of yet.

A flat CSV log lets you answer the questions you anticipated. A properly normalized relational schema lets you run ad hoc queries as you learn what matters. Ensemble spread bucketing was not a metric I planned for when I started. I added it after three months of data made it an obvious thing to ask.

Every major insight I have had about this bot came from a query I did not plan in advance. The schema has to be flexible enough to support that. PostgreSQL with proper normalization and indexing gives you that flexibility. A flat file does not.
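
For instance, once the data is sitting in normalized tables, asking "does the edge hold up city by city?" is a five-minute query rather than a new logging feature. This version assumes the ticker embeds a series prefix before the first hyphen; adjust to the actual Kalshi ticker format:

-- Net P&L by ticker series prefix (assumes 'SERIES-...' ticker format)
SELECT
    SPLIT_PART(pt.ticker, '-', 1)        AS series,
    COUNT(*)                             AS trades,
    ROUND(SUM(to2.net_pnl)::NUMERIC, 2)  AS net_pnl
FROM placed_trades pt
JOIN trade_outcomes to2 ON to2.placed_trade_id = pt.id
GROUP BY series
ORDER BY net_pnl DESC;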

The full Predict and Profit bot source code, including the complete PostgreSQL schema and logging integration, is available at predictandprofit.gumroad.com.