A national electricity transmission system operator was struggling with a problem the renewable-energy transition has imposed on every grid in the world: load and generation are both harder to predict than they used to be, and the cost of getting it wrong scales nonlinearly. The operator's existing forecasting stack, a respected commercial product from a major vendor, was delivering day-ahead load forecasts with an error of roughly 2.1%. For a regulated grid operating under tight balancing-cost constraints, that level of error translated into real annual costs, which were ultimately passed through to ratepayers.

The challenge

Grid load forecasting used to be a comparatively well-behaved problem. Demand was driven by economic activity, weather, and calendar patterns. Short-term generation was known, because it came overwhelmingly from dispatchable sources. Forecast error was low and symmetric.

None of that is true anymore. Rooftop solar, distributed storage, electric-vehicle charging, and heat pumps have made the relationship between time-of-day and grid demand much more complex. Weather forecasting error at the five-day horizon translates directly into demand forecasting error. Renewable generation introduces its own layer of uncertainty, particularly at the sub-hourly level.

The operator's incumbent forecasting platform was competent, but it was architected for a previous era. It used classical time-series techniques augmented with weather covariates. It did not ingest high-resolution smart-meter data. It did not systematically integrate uncertainty estimates from the weather forecast itself. Most fundamentally, the vendor's roadmap for these capabilities spanned multiple years.

The constraints

Three constraints shaped the engagement from day one:

  • Data sovereignty. This is a national grid. Data does not leave the country, ever, under any circumstance. No cloud services. On-prem only.
  • Regulator approval. Any change to forecasting methodology had to pass review by the national energy regulator. The methodology had to be explainable, auditable, and demonstrably not worse than the incumbent in any important scenario.
  • Operational continuity. The incumbent forecasting platform had to keep running. Our system had to produce forecasts in parallel, in the same format, through the same APIs, so that if our system went down the operator could instantly fall back to the incumbent.

We designed around all three. The deployment ran on the operator's own data centers, in an air-gapped VLAN. Every model decision was logged with full feature contributions and a plain-language explanation. The output format was byte-for-byte compatible with the existing forecasting platform, so that all downstream systems (balancing, nomination, settlement) saw no change.
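To make the logging requirement concrete, here is a minimal sketch of what one logged decision might look like. It is an illustration, not the production schema: the class name `ForecastAuditRecord`, its fields, the feature names, and the assumption that contributions are additive in MW around a baseline are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ForecastAuditRecord:
    """One logged forecast with its feature contributions (hypothetical schema)."""
    target_time: str          # ISO 8601 start of the forecast interval
    forecast_mw: float        # central forecast for that interval
    baseline_mw: float        # reference value the contributions are measured against
    contributions_mw: dict    # feature name -> additive contribution in MW (assumed additive)

    def is_consistent(self, tolerance_mw: float = 1e-6) -> bool:
        # Under the additivity assumption, baseline + contributions must equal the forecast.
        total = self.baseline_mw + sum(self.contributions_mw.values())
        return abs(total - self.forecast_mw) <= tolerance_mw

    def explanation(self, top_n: int = 2) -> str:
        # Rank features by absolute contribution and describe the largest ones in plain language.
        ranked = sorted(self.contributions_mw.items(),
                        key=lambda item: abs(item[1]), reverse=True)
        parts = []
        for name, mw in ranked[:top_n]:
            direction = "raised" if mw >= 0 else "lowered"
            parts.append(f"{name} {direction} the forecast by {abs(mw):.0f} MW")
        return "; ".join(parts)

    def to_json(self) -> str:
        # Serialize the record plus its explanation as one audit log line.
        payload = asdict(self)
        payload["explanation"] = self.explanation()
        return json.dumps(payload, sort_keys=True)


# Illustrative values only; none of these numbers come from the engagement.
record = ForecastAuditRecord(
    target_time="2024-01-15T18:00:00Z",
    forecast_mw=41250.0,
    baseline_mw=40000.0,
    contributions_mw={"temperature": 900.0, "holiday": -150.0, "behind_meter_solar": 500.0},
)
print(record.to_json())
```

Storing the contributions alongside the forecast, rather than recomputing them later, means the record an auditor reads is the one produced when the forecast was issued.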

What we built

A probabilistic load forecasting system that fused four data sources at high temporal resolution:

  1. Smart-meter aggregates at 15-minute granularity, rolled up to substation level
  2. Numerical weather forecasts from two independent providers, with explicit uncertainty quantification
  3. Calendar and special-event data, including national holidays, sporting events, and known industrial demand patterns
  4. Distributed-generation estimates from a separate model trained on the smart-meter data to infer behind-the-meter solar and storage behavior
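The first input in the list above, the substation-level rollup of 15-minute smart-meter data, can be sketched as follows. This assumes readings arrive as `(meter_id, interval_start, kwh)` tuples and that a meter-to-substation mapping is available; both the data layout and the function names are hypothetical.

```python
from collections import defaultdict


def roll_up_to_substation(readings, meter_to_substation):
    """Sum per-meter energy into (substation, interval_start) buckets.

    readings: iterable of (meter_id, interval_start, kwh) tuples (assumed layout).
    meter_to_substation: dict mapping meter_id -> substation id (assumed to exist).
    """
    totals = defaultdict(float)
    for meter_id, interval_start, kwh in readings:
        substation = meter_to_substation[meter_id]
        totals[(substation, interval_start)] += kwh
    return dict(totals)


def kwh_to_average_mw(kwh, interval_minutes=15):
    """Convert energy over one interval into average power for that interval."""
    hours = interval_minutes / 60
    return kwh / hours / 1000  # kWh / h = kW, then kW -> MW


# Illustrative values only.
readings = [
    ("m1", "00:00", 0.5),
    ("m2", "00:00", 0.25),
    ("m3", "00:00", 1.0),
]
mapping = {"m1": "S1", "m2": "S1", "m3": "S2"}
print(roll_up_to_substation(readings, mapping))
```

Aggregating before modeling also keeps individual household readings out of the forecasting features, which can simplify privacy review.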

The forecasting model itself was an ensemble: a temporal-fusion transformer for the base case, a gradient-boosted model for rapid-response corrections, and a classical SARIMA baseline as a sanity check. The ensemble produced a central forecast plus calibrated confidence intervals at three percentile bands.
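"Calibrated" has a checkable meaning: an interval with a nominal 80% level should contain the actual load about 80% of the time. The sketch below shows two standard checks, empirical coverage and pinball (quantile) loss. It illustrates the concept only; the source does not describe the evaluation code, so the function names and sample numbers are assumptions.

```python
def empirical_coverage(actuals, lowers, uppers):
    """Fraction of actual values that fall inside their forecast interval."""
    if not actuals:
        raise ValueError("need at least one observation")
    hits = sum(1 for a, lo, hi in zip(actuals, lowers, uppers) if lo <= a <= hi)
    return hits / len(actuals)


def pinball_loss(actual, predicted_quantile, q):
    """Pinball loss for one observation at quantile level q (0 < q < 1).

    Under-forecasting is penalized by q, over-forecasting by (1 - q),
    so a high quantile is punished more when the actual exceeds it.
    """
    diff = actual - predicted_quantile
    return q * diff if diff >= 0 else (q - 1) * diff


# Illustrative values only: five intervals, three of which contain the actual.
actuals = [100, 105, 98, 110, 95]
lowers = [95, 100, 99, 100, 90]
uppers = [105, 110, 104, 108, 100]
print(empirical_coverage(actuals, lowers, uppers))
```

Comparing empirical coverage against the nominal level for each band, over a long parallel run, is one way to demonstrate calibration to a reviewer.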

The regulator's question

During review, the energy regulator asked a specific question: "If your model is wrong in a new way that the incumbent would have gotten right, how will you catch that before it affects the balancing market?" We answered by running both models in parallel indefinitely, writing a reconciliation report for every meaningful divergence, and committing to revert to the incumbent forecast automatically if our model's accuracy dropped below the incumbent's for three consecutive days. The regulator approved the deployment on those terms.
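The automatic-revert commitment is simple enough to express as a rule. The sketch below assumes accuracy is compared through a daily error metric where lower is better (for example, daily MAPE) and that a tie does not count as a worse day; the source specifies neither detail, and the function name is hypothetical.

```python
def should_fall_back(daily_errors, consecutive_days=3):
    """Return True once the new model has been worse than the incumbent
    for `consecutive_days` days in a row.

    daily_errors: list of (new_model_error, incumbent_error) pairs, oldest first.
    Lower error is better; equal errors reset the streak (assumption).
    """
    streak = 0
    for new_error, incumbent_error in daily_errors:
        if new_error > incumbent_error:
            streak += 1
            if streak >= consecutive_days:
                return True
        else:
            streak = 0
    return False


# Illustrative values only: one good day, then three worse days in a row.
history = [(1.9, 2.0), (2.2, 2.0), (2.3, 2.1), (2.4, 2.0)]
print(should_fall_back(history))
```

Because the rule depends only on logged errors from the parallel run, it can be audited after the fact with the same data the regulator already receives.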

Results

  • -18%: MAPE vs. incumbent forecast
  • ~$14M: annualized balancing-cost savings
  • 99.97%: forecasting uptime over 12 months
  • 0: fallbacks to incumbent required

An 18% reduction in mean absolute percentage error (MAPE) translated, through the operator's published cost-of-error curves, into approximately $14M per year in avoided balancing-market expenditure. Per the regulatory framework, those savings flow back to ratepayers, not to the operator's shareholders and not to us. That was always the point of the engagement.
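For readers less familiar with the metric, the sketch below shows how MAPE and a relative change in it are computed. The worked number is illustrative: it assumes the roughly 2.1% figure quoted for the incumbent is a MAPE, which the source does not state explicitly.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent."""
    if not actuals or len(actuals) != len(forecasts):
        raise ValueError("inputs must be non-empty and the same length")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts))
    return 100 * total / len(actuals)


def relative_change(new, old):
    """Relative change from old to new; negative means the value went down."""
    return (new - old) / old


# If the incumbent's MAPE were 2.1% (assumption), an 18% relative reduction
# would correspond to a new MAPE of 2.1 * (1 - 0.18).
incumbent_mape = 2.1
new_mape = incumbent_mape * (1 - 0.18)
print(round(new_mape, 3), round(relative_change(new_mape, incumbent_mape), 2))
```

Note that the improvement is relative: it is 18% of the incumbent's error, not 18 percentage points.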

The bigger operational impact, though, came from the calibrated confidence intervals. The incumbent system had produced a single point forecast, which the operator's balancing team had to mentally annotate with their own uncertainty estimates based on experience. The new system produces probabilistic forecasts, which the balancing team uses directly to size reserve requirements. That workflow change has made balancing operations more systematic and less dependent on individual operator experience.
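As a simplified illustration of sizing reserves from a probabilistic forecast, the sketch below derives upward and downward margins from three quantiles. The choice of P10, P50, and P90 and the sizing rule itself are assumptions for illustration; the source does not say which percentile bands or reserve policy the operator uses.

```python
def reserve_requirements(p10_mw, p50_mw, p90_mw):
    """Derive reserve margins from three load quantiles (hypothetical rule).

    Upward reserve covers load coming in above the central forecast,
    downward reserve covers load coming in below it.
    """
    if not (p10_mw <= p50_mw <= p90_mw):
        raise ValueError("quantiles must be ordered: p10 <= p50 <= p90")
    return {
        "upward_mw": p90_mw - p50_mw,
        "downward_mw": p50_mw - p10_mw,
    }


# Illustrative values only: a skewed forecast needs more upward than downward reserve.
print(reserve_requirements(39000.0, 40000.0, 41500.0))
```

A point forecast cannot express the asymmetry in this example, which is one reason the quantiles change how the output is used.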

The regulator's second look

Nine months into production, the regulator conducted a follow-up review. Their questions focused on two things. The first was whether the accuracy improvements had held up under seasonal stress. They had: performance actually improved in winter, when the incumbent's weakness was most visible. The second was whether the explainability of individual forecasts was adequate for scrutiny of specific balancing decisions. It was: the system produces, for any given forecast, a full trace of which features drove which part of the prediction.

The outcome of that review was the regulator's explicit endorsement of the methodology, which has since been cited as a reference case by two other national grid operators considering similar approaches.

What this engagement taught us about regulated utilities

  1. Data sovereignty is not negotiable. Build the on-prem deployment capability first. Cloud-only approaches disqualify you from most of the global regulated-utility market.
  2. Run in parallel, indefinitely. Regulator confidence comes from demonstrable parallel-run performance over seasons, not from model benchmarks on held-out data.
  3. Probabilistic beats point forecasts. The move from point forecasts to calibrated probabilistic forecasts changes how operators use the output, often more than the accuracy improvement itself.
  4. Design the explainability from day one. Retrofitting explainability to a regulated model deployment is nearly impossible. Build it in from the first commit.