March 28, 2026
From 31 to 62: How We Upgraded Our Kalshi Weather Trading Bot with NOAA's AI Ensemble
The problem with weather prediction markets is that they are actually priced fairly well. Kalshi's temperature contracts — “Will DC hit 55°F or higher tomorrow?” — attract real forecasters. The market is not full of people guessing. It is populated by people looking at the same National Weather Service forecast you are. To find edge, you need a better probability model, not just the same one everyone else has.
When we built the first version of Predict & Profit, we used the GFS ensemble: 31 members, all pulled via the Open-Meteo API. It worked well enough to identify trades. But with 31 members, each member shifts the estimated probability by about 3 percentage points, which leaves real uncertainty in the tails of the temperature distribution. That is exactly where Kalshi markets are priced.
What the GFS Ensemble Actually Is
The GFS ensemble, which NOAA runs as the Global Ensemble Forecast System (GEFS), does not produce one forecast. It runs 31 slightly different versions of the same physics model. NOAA perturbs the initial atmospheric conditions across the runs to capture the range of possible states. Instead of a single temperature prediction for Chicago at 2pm tomorrow, you get 31 predictions.
We pull all 31 GFS members via the Open-Meteo API. Their ensemble endpoint returns every member's temperature forecast at specific coordinates, so we can cover 14 Kalshi market cities in a single API call. From there, it is a counting problem. If 26 of 31 members say New York will hit 72°F, the model probability is 26/31, or about 84%. If Kalshi prices the contract at 70%, we have a 14-point edge.
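The counting step can be sketched in a few lines of Python. This is a minimal illustration rather than the bot's actual code: `ensemble_probability` and `edge_points` are hypothetical helper names, and the member values are invented so that they reproduce the New York example (26 of 31 members at or above 72°F).

```python
def ensemble_probability(member_highs_f, threshold_f):
    """Fraction of ensemble members whose forecast high meets the threshold."""
    if not member_highs_f:
        raise ValueError("need at least one ensemble member")
    hits = sum(1 for t in member_highs_f if t >= threshold_f)
    return hits / len(member_highs_f)


def edge_points(model_prob, market_price):
    """Model probability minus market-implied probability, in percentage points."""
    return (model_prob - market_price) * 100


# Illustrative data only: 26 members at or above 72°F, 5 below.
members = [73.5] * 26 + [70.0] * 5
prob = ensemble_probability(members, 72.0)
edge = edge_points(prob, 0.70)
print(f"probability={prob:.2f}, edge={edge:.0f} points")
# probability=0.84, edge=14 points
```

Treating the contract price as the market's implied probability is a simplification here; fees and the bid/ask spread would shrink the usable edge.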
Enter AIGEFS: NOAA's Project EAGLE
In December 2025, NOAA launched the AI Global Ensemble Forecast System (AIGEFS), developed under its Project EAGLE initiative. It is 31 AI-generated forecast members built on Google DeepMind's GraphCast architecture, retrained on NOAA's own historical observational data.
GraphCast is significant. Traditional numerical weather models like GFS work by solving partial differential equations forward through time — computationally expensive and bounded by the resolution of the atmospheric grid. GraphCast uses a graph neural network trained on decades of ERA5 reanalysis data. It learns the statistical relationships between atmospheric states rather than computing fluid dynamics from first principles.
NOAA's AIGEFS runs 31 ensemble members from this AI model, giving the same probabilistic output format as GFS but from a fundamentally different modeling approach. The result is two independent lines of evidence for the same forecast question.
The Technical Challenge: NOMADS Rate Limiting
Our first attempt to pull AIGEFS data from NOAA's NOMADS distribution server hit rate limits immediately. NOMADS is the standard access point for NOAA model output, but the AIGEFS GRIB2 files are large — multiple gigabytes per forecast cycle — and NOMADS enforces strict per-IP request throttling.
The solution was NOAA's public AWS S3 bucket. NOAA publishes AIGEFS output to S3 in near real-time with no rate limits and no authentication required. We parse the GRIB2 files directly with the cfgrib library, extract the 2-meter temperature grid, and bilinearly interpolate it to each city's coordinates. This added complexity to the data pipeline but removed the reliability bottleneck entirely.
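The interpolation step is the easiest part of that pipeline to show in isolation. The sketch below is a plain-Python bilinear interpolation over a regular latitude/longitude grid. It assumes the 2-meter temperature field has already been decoded into a nested list with ascending coordinate axes; the GRIB2 decoding itself is not shown, and `bilinear_interpolate` is a hypothetical name, not a cfgrib function.

```python
from bisect import bisect_right


def bilinear_interpolate(lats, lons, grid, lat, lon):
    """Interpolate grid[i][j] (the value at lats[i], lons[j]) to (lat, lon).

    Assumes lats and lons are strictly ascending and the point is inside the grid.
    Model grids may use 0-360 longitudes or descending latitudes; convert or
    reorder them before calling this function.
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("point lies outside the grid")
    # Lower-left corner of the enclosing cell, clamped so that points on the
    # upper edge still have a neighbor at i + 1 and j + 1.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    # Fractional position inside the cell, from 0 to 1 along each axis.
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on both latitude rows, then between the rows.
    south = grid[i][j] * (1 - tx) + grid[i][j + 1] * tx
    north = grid[i + 1][j] * (1 - tx) + grid[i + 1][j + 1] * tx
    return south * (1 - ty) + north * ty


# Illustrative 2x2 cell (values in °F) with the query point at its center.
lats = [38.75, 39.0]
lons = [-77.25, -77.0]
grid = [[50.0, 52.0],
        [54.0, 56.0]]
print(bilinear_interpolate(lats, lons, grid, 38.875, -77.125))  # 53.0
```

In a real pipeline the same function would be applied once per ensemble member and per city, yielding the list of member temperatures that feeds the probability count.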
Combining Into HGEFS: 62 Members
When you have two independently generated ensembles — one physics-based, one AI — you can combine them into a grand ensemble. NOAA calls this HGEFS: the Hybrid Global Ensemble Forecast System.
The arithmetic is straightforward. Because both ensembles have 31 members, pooling them is the same as averaging the two probabilities: if GFS gives 0.82 and AIGEFS gives 0.79, the combined 62-member probability is about 0.81. Both models agree, so confidence is high.
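As a small sketch of that pooling, assuming every member counts equally (the `pooled_probability` helper is a hypothetical name, and equal member weighting is our simplification rather than a documented NOAA method):

```python
def pooled_probability(prob_a, n_a, prob_b, n_b):
    """Probability from pooling two ensembles, weighting each by its member count."""
    return (prob_a * n_a + prob_b * n_b) / (n_a + n_b)


# GFS at 0.82 and AIGEFS at 0.79, 31 members each.
combined = pooled_probability(0.82, 31, 0.79, 31)
print(f"{combined:.3f}")  # 0.805
```

With equal member counts this reduces to a simple average; the weighted form only matters if one ensemble is missing members for a given cycle.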
The more interesting case is disagreement. If GFS says 0.82 and AIGEFS says 0.55, the combined probability falls to about 0.685, and the divergence itself becomes a signal. When the physics model and the AI model disagree, we flag the opportunity as uncertain and typically skip it. We only execute when both ensembles agree above threshold. In practice, this eliminates roughly half the trades the old system would have taken.
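A minimal sketch of such an agreement gate follows. The function name `should_trade` and all three thresholds (`min_prob`, `max_divergence`, `min_edge`) are placeholder assumptions chosen for illustration; they are not the values the bot uses.

```python
def should_trade(gfs_prob, aigefs_prob, market_price,
                 min_prob=0.75, max_divergence=0.10, min_edge=0.08):
    """Return (decision, reason) for a YES trade given both ensemble probabilities.

    All default thresholds are hypothetical placeholders.
    """
    # Large disagreement between the physics and AI ensembles means "uncertain".
    if abs(gfs_prob - aigefs_prob) > max_divergence:
        return False, "ensembles disagree"
    # Both ensembles must clear the probability threshold on their own.
    if min(gfs_prob, aigefs_prob) < min_prob:
        return False, "below probability threshold"
    # Equal-size ensembles, so the pooled probability is the simple average.
    combined = (gfs_prob + aigefs_prob) / 2
    if combined - market_price < min_edge:
        return False, "edge too small"
    return True, "ok"


print(should_trade(0.82, 0.79, 0.70))  # (True, 'ok')
print(should_trade(0.82, 0.55, 0.50))  # (False, 'ensembles disagree')
```

Checking divergence before anything else means a trade is rejected even when the pooled probability alone would show a large edge, which is the behavior described above.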
NOAA's own research shows HGEFS consistently outperforms both the GFS-only and AIGEFS-only systems on 2-10 day forecasts — exactly the range that matters for Kalshi daily and weekly temperature contracts.
Where We Are Now
Honesty first: the live results through March 28, 2026 show 1 win and 5 losses. Those trades came from the old 31-member GFS-only configuration with looser filters. The 62-member HGEFS system with stricter edge score requirements has been live since March 28 — too early to report meaningful statistics.
What we can say is that the bot is now rejecting more trades. The agreement requirement between GFS and AIGEFS alone eliminates a significant share of the opportunities the old system would have entered. That is by design. We would rather wait for high-confidence setups than force activity.
The full audited P&L publishes after the April 10 CPI settlement. We will post it here regardless of outcome.
Want to run this system yourself? The full Python source code — including the HGEFS pipeline, Kalshi API integration, and edge scoring system — is available now.