
From 31 to 62: How I Upgraded the Kalshi Weather Trading Bot with NOAA's AI Ensemble

TL;DR / Key Takeaways

  • The bot was upgraded from a 31-member GFS ensemble to a 62-member HGEFS hybrid ensemble.
  • AIGEFS adds 31 AI-generated members built on a different modeling approach than physics-based GFS.
  • NOAA S3 ingestion solved the reliability problems caused by large AIGEFS GRIB2 files and NOMADS rate limits.
  • The new system deliberately rejects more trades when GFS and AIGEFS do not agree.

The problem with Kalshi weather prediction markets is that they are actually priced fairly well. Temperature contracts attract real forecasters. The market is not full of people guessing. It is populated by people looking at the same National Weather Service forecast you are. To find edge, you need a better probability model, not just the same one everyone else has.

When I built the first version of Predict & Profit, I used the GFS ensemble — 31 members, all via the Open-Meteo API. It worked well enough to identify trades. But 31 members, while statistically useful, still leave real uncertainty around the tails of temperature distributions. And that is exactly where Kalshi contracts are priced — at the boundary thresholds where a degree or two in either direction decides everything.

The upgrade to 62 members was the right move. Here is how it works and what I found out.

What the GFS ensemble actually is

The Global Forecast System does not run one forecast — it runs 31 slightly different versions of the same physics model. NOAA perturbs the initial atmospheric conditions of each run to capture the range of possible atmospheric states given measurement uncertainty. Instead of a single temperature prediction for Chicago tomorrow afternoon, you get 31 predictions.

I pull all 31 GFS members via the Open-Meteo API. Their ensemble endpoint returns all member forecasts for temperature at specific coordinates, so I can cover 14 Kalshi market cities in a single API call. From there the math is straightforward: 26 out of 31 members say New York will hit 72°F. That is 84% probability. If Kalshi is pricing it at 70 cents, I have a 14-point edge.
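The member-counting arithmetic above can be sketched in a few lines. This is a minimal illustration, not the bot's actual code; the function names (`ensemble_probability`, `edge_in_cents`) are mine:

```python
def ensemble_probability(members_above: int, total_members: int) -> float:
    """Fraction of ensemble members that clear the contract threshold."""
    return members_above / total_members

def edge_in_cents(prob: float, market_price_cents: float) -> float:
    """Model probability minus the market-implied probability, in cents."""
    return prob * 100 - market_price_cents

# The New York example from the text: 26 of 31 members at or above 72°F,
# with Kalshi pricing the contract at 70 cents.
prob = ensemble_probability(26, 31)   # ~0.839
edge = edge_in_cents(prob, 70)        # ~13.9 cents, i.e. a ~14-point edge
```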

Enter AIGEFS: NOAA's Project EAGLE

In December 2025, NOAA launched Project EAGLE — the AI Ensemble Forecast System (AIGEFS). It is 31 AI-generated forecast members built on Google DeepMind's GraphCast architecture, retrained on NOAA's own historical observational data.

GraphCast is significant. Traditional numerical models like GFS work by solving partial differential equations forward through time — computationally expensive and bounded by grid resolution. GraphCast uses a graph neural network trained on decades of ERA5 reanalysis data. It learns the statistical relationships between atmospheric states instead of computing fluid dynamics from first principles. Two completely independent approaches to the same forecast question.

NOAA's AIGEFS runs 31 ensemble members from this AI model, giving the same probabilistic output format as GFS but from a fundamentally different modeling foundation.

The technical challenge: NOMADS rate limiting and the S3 solution

My first attempt to pull AIGEFS data from NOAA's NOMADS distribution server hit rate limits immediately. NOMADS is the standard access point for NOAA model output, but the AIGEFS GRIB2 files are large — multiple gigabytes per forecast cycle — and NOMADS enforces strict per-IP request throttling that makes it impractical for real-time pipeline use.

The solution was NOAA's public AWS S3 Open Data bucket. NOAA publishes AIGEFS output to S3 in near real-time with no rate limits and no authentication required. I parse the GRIB2 files directly using the cfgrib library, extract the 2-meter temperature grid, and interpolate to each city coordinate using bilinear interpolation. It added complexity to the ingestion pipeline but eliminated the reliability bottleneck entirely. The bot now pulls GFS from Open-Meteo and AIGEFS from S3 in parallel, giving both datasets within minutes of each NOAA release cycle.
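In the real pipeline, cfgrib and its companion tooling handle the grid extraction, but the bilinear interpolation step itself is simple enough to sketch in pure Python. This is an illustrative stand-in, assuming a regular lat/lon grid stored row-major; it is not the pipeline's actual code:

```python
def bilinear(grid, lat0, lon0, dlat, dlon, lat, lon):
    """Bilinearly interpolate a regular lat/lon grid at (lat, lon).

    grid[i][j] holds the value at latitude lat0 + i*dlat and
    longitude lon0 + j*dlon.
    """
    # Fractional grid coordinates of the target point.
    fi = (lat - lat0) / dlat
    fj = (lon - lon0) / dlon
    i, j = int(fi), int(fj)
    ti, tj = fi - i, fj - j
    # Weighted average of the four surrounding grid cells.
    return ((1 - ti) * (1 - tj) * grid[i][j]
            + (1 - ti) * tj * grid[i][j + 1]
            + ti * (1 - tj) * grid[i + 1][j]
            + ti * tj * grid[i + 1][j + 1])

# A 1°x1° grid corner at (40N, 90W): the point halfway between the four
# cells is the plain average of their values.
t = bilinear([[10.0, 20.0], [30.0, 40.0]], 40.0, -90.0, 1.0, 1.0, 40.5, -89.5)
# t == 25.0
```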

Combining into HGEFS: the 62-member grand ensemble

| Forecast source | Members | Access path | Why it matters |
| --- | ---: | --- | --- |
| GFS | 31 | Open-Meteo API | Fast physics-based ensemble baseline |
| AIGEFS | 31 | NOAA public S3 GRIB2 files | Independent AI-based forecast family |
| HGEFS | 62 | Combined internal pipeline | Filters trades through cross-model agreement |

When you have two independently generated ensembles — one physics-based, one AI — you can combine them into a grand ensemble. NOAA calls this HGEFS: the Hybrid Global Ensemble Forecast System.

The arithmetic is simple. If GFS gives 0.82 probability and AIGEFS gives 0.79, the combined 62-member probability is approximately 0.81. Both models agree, confidence is high, and I have a clear signal.

The more interesting case is disagreement. If GFS says 0.82 and AIGEFS says 0.55, the combined probability shifts toward 0.68 — but the divergence itself becomes the signal to skip. When the physics model and the AI model cannot agree, the atmosphere is in a state of genuine uncertainty, and I do not want to be taking concentrated binary positions in that environment. The system only executes when both ensembles agree above threshold. In practice, this eliminates roughly half the trades the old 31-member system would have taken. That is the point.
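The combine-then-filter logic above can be sketched as follows. The member-weighted average reproduces the numbers in the text; the two thresholds (`AGREEMENT_GAP`, `MIN_PROB`) are placeholder values I chose for illustration, since the article does not state the exact cutoffs:

```python
GFS_MEMBERS = 31
AIGEFS_MEMBERS = 31
AGREEMENT_GAP = 0.15   # assumed: max allowed divergence between model families
MIN_PROB = 0.70        # assumed: minimum probability each family must clear

def hgefs_signal(p_gfs: float, p_aigefs: float):
    """Combine the two 31-member probabilities and apply the agreement filter.

    Returns (combined_probability, trade_allowed).
    """
    # Member-weighted average; with equal member counts this is the plain mean.
    combined = ((p_gfs * GFS_MEMBERS + p_aigefs * AIGEFS_MEMBERS)
                / (GFS_MEMBERS + AIGEFS_MEMBERS))
    agree = abs(p_gfs - p_aigefs) <= AGREEMENT_GAP
    confident = min(p_gfs, p_aigefs) >= MIN_PROB
    return combined, agree and confident

# Agreement case from the text: 0.82 and 0.79 -> ~0.81, trade allowed.
# Disagreement case: 0.82 and 0.55 -> ~0.68, skipped despite a usable-looking number.
```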

NOAA's own research shows HGEFS consistently outperforms both GFS-only and AIGEFS-only systems on 2-to-10-day forecasts — exactly the range that matters for Kalshi daily and weekly temperature contracts.

Honest early results

Live results through March 28, 2026: 1 win and 5 losses. All of those trades came from the old 31-member GFS-only configuration with looser filters. The 62-member HGEFS system with stricter edge score requirements went live on March 28, 2026 — too early to report meaningful statistics.

What I can say is that the new system is rejecting more trades. The dual-ensemble agreement requirement alone eliminates a significant share of the marginal opportunities the old system would have entered. That is exactly the behavior I designed for. I would rather wait 10 days for a high-confidence setup than force activity on a coin-flip signal.

The full audited P&L publishes after the April 10 CPI settlement. I will post it regardless of outcome.


Want to run this system yourself? The full Python source code — including the HGEFS pipeline, AWS S3 AIGEFS ingestion, Kalshi API integration, and edge scoring system — is available now.

Get the Source Code — $67


Frequently Asked Questions

Q: What problem did the HGEFS upgrade solve?

A: The 31-member GFS-only system left uncertainty around threshold contracts. HGEFS adds independent AI ensemble members and requires cross-model agreement, which filters more weak trades.

Q: Why use NOAA S3 instead of NOMADS for AIGEFS data?

A: AIGEFS GRIB2 files are large and NOMADS throttling makes real-time ingestion unreliable. NOAA S3 provides the same data with better reliability for automated pipelines.

Q: What happens when GFS and AIGEFS disagree?

A: The bot treats disagreement as a risk signal. Even if the combined probability looks usable, the trade is skipped unless both model families clear the agreement threshold.
