Surviving Kalshi API Rate Limits: Exponential Backoff and WebSockets in Python
TL;DR / Key Takeaways
- A Kalshi bot should stream market updates where possible instead of hammering REST polling endpoints.
- Order execution needs exponential backoff for 429 responses and clear handling for open positions.
- Market scans should be staggered, cached, and paused around weather-model upload windows.
- Rate-limit engineering protects the trading edge by keeping the bot reliable when markets are moving.
When you first build a Kalshi trading bot, you will almost certainly write a `while True` loop that polls the REST API every few seconds to check market prices. It will work fine in testing. Then you will add more markets, tighten the scan interval, or run it for a few hours straight — and you will see this:
429 Too Many Requests
If that happens while you have open positions and no retry logic, the bot goes silent. It cannot check whether your orders filled. It cannot cancel positions if the market moves against you. That is not just a bug. For a trading bot, it is a real financial risk.
Here is how I solved it in the Predict & Profit system, in order of what actually matters.
Step 1: Stop polling, start streaming
| Access pattern | API cost | Freshness | Bot risk |
| --- | --- | --- | --- |
| Constant REST polling | High request volume | Depends on interval | Rate limits and stale snapshots |
| WebSocket streaming | Lower REST pressure | Near real time | Requires reconnect handling |
| Cached snapshots | Minimal repeat calls | Good for slow-changing metadata | Must invalidate when markets change |
The most important rate limit fix has nothing to do with backoff. It is eliminating most REST API calls entirely by switching to WebSockets for market data.
Kalshi has a WebSocket API that pushes live order book updates to your connection as they happen. Instead of polling /markets/{ticker}/orderbook every 5 seconds across 50 contracts (that is 600 requests per minute), you open one WebSocket connection and subscribe to the tickers you care about. Updates come in as diffs rather than full snapshots.
```python
# Source: predictandprofit.io
import asyncio
import websockets
import json

KALSHI_WS_URL = "wss://api.elections.kalshi.com/trade-api/ws/v2"

async def subscribe_to_orderbook(tickers: list[str], on_update):
    async with websockets.connect(KALSHI_WS_URL) as ws:
        # Subscribe to orderbook deltas for each ticker
        subscribe_msg = {
            "id": 1,
            "cmd": "subscribe",
            "params": {
                "channels": ["orderbook_delta"],
                "market_tickers": tickers
            }
        }
        await ws.send(json.dumps(subscribe_msg))

        async for message in ws:
            data = json.loads(message)
            if data.get("type") == "orderbook_delta":
                await on_update(data)
```
Once you have this running, you only use the REST API for authenticated actions: placing orders, canceling orders, and checking account balance. That is maybe 5-10 requests per scan cycle instead of hundreds. Rate limit pressure drops by an order of magnitude.
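The table above flags the one cost of streaming: reconnect handling. A dropped connection that nobody notices is worse than polling. Here is a minimal sketch of a reconnect loop with its own backoff — the function name, parameters, and the exception types caught are illustrative assumptions, not part of the Kalshi API:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def stream_with_reconnect(connect_fn, base_delay: float = 1.0,
                                max_delay: float = 60.0, max_retries=None):
    """Keep a streaming coroutine alive, reconnecting with backoff after drops."""
    delay = base_delay
    failures = 0
    while max_retries is None or failures < max_retries:
        try:
            await connect_fn()
            delay = base_delay  # stream ended cleanly; reset the backoff
        except (ConnectionError, OSError) as e:
            failures += 1
            logger.warning(f"Stream dropped: {e}. Reconnecting in {delay:.1f}s")
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
```

Wrapping the subscriber — `stream_with_reconnect(lambda: subscribe_to_orderbook(tickers, on_update))` — means a flaky connection degrades into a short gap in data instead of a silent outage.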
Step 2: Exponential backoff for order execution
Even with WebSockets handling market data, order placement can still hit rate limits during high-activity periods — when many contracts are settling simultaneously and the bot is firing multiple orders close together.
Exponential backoff means: if the server returns a 429, wait before retrying. Double the wait time on each subsequent failure. Cap the maximum wait at some reasonable ceiling. Here is the full implementation I use:
```python
# Source: predictandprofit.io
import time
import requests
import logging

logger = logging.getLogger(__name__)

def retry_with_backoff(
    fn,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0
):
    """
    Calls fn() and retries on 429 or 5xx with exponential backoff.
    Raises after max_retries exhausted.
    """
    delay = base_delay
    for attempt in range(max_retries):
        try:
            response, data = fn()

            if response.status_code == 429:
                if attempt < max_retries - 1:
                    logger.warning(
                        f"Rate limited (attempt {attempt + 1}). "
                        f"Retrying in {delay:.1f}s..."
                    )
                    time.sleep(delay)
                    delay = min(delay * 2, max_delay)
                    continue
                else:
                    raise Exception(f"Rate limit exceeded after {max_retries} retries")

            if response.status_code >= 500:
                logger.warning(f"Server error {response.status_code}, retrying...")
                time.sleep(delay)
                delay = min(delay * 2, max_delay)
                continue

            return response, data

        except requests.exceptions.ConnectionError as e:
            if attempt < max_retries - 1:
                logger.warning(f"Connection error: {e}. Retrying in {delay:.1f}s...")
                time.sleep(delay)
                delay = min(delay * 2, max_delay)
            else:
                raise

    raise Exception("Max retries exhausted")
```
Usage is clean — wrap any REST call in a lambda:
```python
# Source: predictandprofit.io
res, data = retry_with_backoff(
    lambda: authenticated_post("/portfolio/orders", order_body)
)
```
Every order placement in the bot goes through this wrapper. Connection resets, transient server errors, and rate limit responses all get handled automatically without the bot crashing or leaving positions unmanaged.
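One refinement worth knowing about, though it is not in the wrapper above: if several processes get rate-limited at the same instant, pure doubling makes them all retry in lockstep and collide again. "Full jitter" fixes this by randomizing each delay. A sketch (the helper name is mine, not from the source):

```python
import random

def backoff_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 32.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(base * 2**attempt, cap)]."""
    ceiling = min(base_delay * (2 ** attempt), max_delay)
    return random.uniform(0.0, ceiling)
```

Swapping `time.sleep(delay)` for `time.sleep(backoff_delay(attempt))` is enough; the cap still bounds the worst-case wait.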
Step 3: Stagger your scan intervals
If you are evaluating 14 cities with multiple upcoming contracts on each one, do not fire all the API calls simultaneously. Stagger them.
```python
# Source: predictandprofit.io
async def scan_all_markets(tickers: list[str]) -> list[dict]:
    results = []
    for ticker in tickers:
        _, data = authenticated_get(f"/markets/{ticker}")
        results.append(data)
        await asyncio.sleep(0.1)  # 100ms between calls
    return results
At 100ms per call with 100 tickers, you spend 10 seconds on a full scan. That is fine when your ensemble data only updates every 6 hours. Spacing calls like this keeps you averaging well under any rate limit threshold even during heavy scan periods.
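If a sequential scan ever becomes too slow, the alternative is bounded concurrency: fire requests in parallel but cap how many are in flight at once, so the burst rate stays predictable. A sketch using a semaphore — `fetch_market` is an assumed async helper, not part of the source code:

```python
import asyncio

async def scan_all_markets_bounded(tickers, fetch_market, max_concurrent: int = 5):
    """Scan markets with at most max_concurrent requests in flight at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def fetch_one(ticker):
        async with sem:
            return await fetch_market(ticker)

    # gather preserves input order, so results line up with tickers
    return await asyncio.gather(*(fetch_one(t) for t in tickers))
```

The semaphore turns "how hard do I hit the API" into a single tunable number instead of an emergent property of the loop.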
Step 4: Cache what does not change
Market metadata — ticker symbols, contract thresholds, settlement dates — does not change between scan cycles. Do not re-fetch it on every loop.
```python
# Source: predictandprofit.io
from datetime import datetime, timedelta

class MarketCache:
    def __init__(self, ttl_minutes: int = 60):
        self.cache = {}
        self.ttl = timedelta(minutes=ttl_minutes)
        self.last_refresh = None

    def needs_refresh(self) -> bool:
        if self.last_refresh is None:
            return True
        return datetime.now() - self.last_refresh > self.ttl

    def refresh(self, markets: list[dict]):
        self.cache = {m["ticker"]: m for m in markets}
        self.last_refresh = datetime.now()

    def get(self, ticker: str) -> dict:
        return self.cache.get(ticker)
```
With a 60-minute TTL on market metadata, you go from potentially hundreds of metadata fetches per hour to one. Order book data still comes live via WebSocket. You get the best of both.
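Wiring the cache into the scan loop is a three-line guard. A sketch — `fetch_all_markets` is an assumed helper that pulls the full market list over REST, and the cache is passed in rather than global:

```python
def get_market(ticker, cache, fetch_all_markets):
    """Serve metadata from the cache, hitting REST at most once per TTL window."""
    if cache.needs_refresh():
        cache.refresh(fetch_all_markets())  # one REST call per expiry, not per lookup
    return cache.get(ticker)
```

Every lookup after the first within the TTL window is a dict access, not an API call.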
Step 5: Execution kill switch before NOAA update windows
This one is less about rate limits and more about timing. GFS uploads its forecast hours sequentially, creating a window of instability where ensemble data is partially updated. During that window, the bot should not be placing orders.
I shut down new order placement from 5 minutes before each scheduled NOAA release cycle (00z, 06z, 12z, 18z UTC) until shortly after it. Existing positions are not touched — only new order placement is suspended.
```python
# Source: predictandprofit.io
from datetime import datetime, timedelta, timezone

NOAA_RELEASE_HOURS_UTC = [0, 6, 12, 18]
BLACKOUT_MINUTES_BEFORE = 5

def is_in_noaa_blackout() -> bool:
    now = datetime.now(timezone.utc)
    for hour in NOAA_RELEASE_HOURS_UTC:
        release = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        # Also check tomorrow's release at this hour, so the window just
        # before midnight is not missed for the 00z cycle.
        for candidate in (release, release + timedelta(days=1)):
            delta = (candidate - now).total_seconds() / 60
            # Inside the window from BLACKOUT_MINUTES_BEFORE before the
            # release through 2 minutes after it
            if -2 <= delta <= BLACKOUT_MINUTES_BEFORE:
                return True
    return False
```
This is not a rate limit fix, but it is in the same category of "protecting the bot from bad fills at bad times." The rate limit architecture and the data timing architecture need to work together.
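In the order path, working together can be as simple as a guard in front of every placement. A sketch — `guarded_place_order` and its parameters are illustrative names, with the blackout check injected so it composes with `is_in_noaa_blackout` or any other kill switch:

```python
def guarded_place_order(order: dict, place_fn, blackout_fn):
    """Refuse new orders during a blackout window; existing positions are untouched."""
    if blackout_fn():
        return None  # caller treats None as "skipped"; nothing is canceled
    return place_fn(order)
```

Because the guard only short-circuits placement, cancellation and position checks keep flowing during the window.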
The full implementation — WebSocket integration, retry decorator, market cache, and execution kill switch — is included in the Predict & Profit Python source code.
Frequently Asked Questions
Q: Why is WebSocket streaming preferred over REST polling?
A: Streaming reduces repeated REST requests and gives the bot fresher order-book data. REST still has a role for orders and snapshots, but polling every market continuously wastes rate-limit budget.
Q: How does exponential backoff protect order execution?
A: Backoff spaces out retries after 429 responses so the bot does not amplify a temporary rate limit into a sustained outage. It also gives the API time to recover before another order attempt.
Q: Why pause execution around NOAA update windows?
A: During upload windows, old and new model data can be mixed or incomplete. Pausing avoids trades based on stale forecasts that look fresh only because the pipeline updated partially.