A global automotive tier-1 supplier, producing precision-machined components for major OEMs across four continents, was running a first-pass yield rate of roughly 94%. On paper that looked acceptable. On the P&L, the remaining six points represented more than $60 million a year in scrap, rework, and warranty exposure. The CFO wanted a yield program. The plant operations team wanted to be left alone to hit quarterly volume commitments. Both were right.

The challenge

The yield problem was not mysterious — the supplier had instrumented most of its lines extensively. What was missing was the ability to act on the data fast enough. Quality issues were typically caught hours after they started, in the end-of-shift inspection reports. By then, thousands of parts had been produced, and scrap was a foregone conclusion. Root-cause analysis happened days later, in an engineering review meeting, by which point the specific combination of parameters that had caused the defect had long since drifted.

Two earlier initiatives had tried to close this loop. The first, led by the MES vendor, proposed an upgrade to an AI-enabled module — seven-figure cost, eighteen-month program, no guarantee of plant-specific results. The second was a machine-learning pilot run by a data-science consultancy, which built a promising model but could not integrate it into the production-line environment without changes to the MES. Both stalled.

What plant managers told us, in different words at every plant we visited: "Don't touch the MES. Don't touch the PLCs. Don't ask my engineers to change what they do at the line. If you can give us a thirty-second heads-up before a quality issue starts, we'll take it."

What we built

A parallel real-time pipeline that read from every plant's SCADA system, applied a per-plant predictive model, and pushed alerts directly to the shop-floor tablets operators were already using. It did not touch the MES. It did not touch the PLCs. It did not require operators to learn a new tool. When the model predicted a quality deviation was about to emerge, a yellow banner appeared on the operator's existing interface telling them which parameter to check.

Three components:

  • Edge inference per plant. Each plant got a small GPU cluster inside its own network, running models trained on that plant's specific equipment signatures. Data never left the plant. Inference latency stayed under 800 ms even at peak throughput.
  • A central model registry. Cross-plant learnings — new defect patterns, new parameter combinations to watch — were shared centrally and pushed out as model updates to every plant. Every plant benefited from every other plant's experiences.
  • A thin integration layer. Alerts were delivered through a single REST endpoint into the existing operator tablet application, which the MES vendor had built an open API for. No MES changes.

Why this approach worked

Most industrial AI projects fail because they try to become the new system of record. Ours explicitly didn't. The MES remained the single source of truth for production data, for the PLC commands, for the quality records. We added a parallel observation-and-prediction layer that only wrote one kind of data back: early-warning alerts. That's a narrow enough scope that no plant IT team raised serious objections.
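
To make that narrow write-back scope concrete, here is a minimal sketch of what a single early-warning alert could look like as a JSON payload. The field names (plant_id, line_id, parameter, severity, predicted_at) and the build_alert and to_json helpers are assumptions for illustration; they are not the schema used in the engagement or defined by the tablet application's API, and the HTTP call itself is deliberately left out.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class EarlyWarningAlert:
    """Hypothetical alert record: the only kind of data written back."""
    plant_id: str
    line_id: str
    parameter: str     # the parameter the operator should check
    severity: str      # "yellow" matches the banner described in the text
    message: str
    predicted_at: str  # ISO 8601 timestamp in UTC


def build_alert(plant_id, line_id, parameter, now=None):
    """Build an alert; `now` is injectable so the function is testable."""
    now = now or datetime.now(timezone.utc)
    return EarlyWarningAlert(
        plant_id=plant_id,
        line_id=line_id,
        parameter=parameter,
        severity="yellow",
        message=f"Check {parameter} on {line_id}",
        predicted_at=now.isoformat(),
    )


def to_json(alert):
    """Serialize the alert as the body of a single REST request."""
    return json.dumps(asdict(alert), sort_keys=True)
```

Keeping the record immutable and limited to a handful of fields mirrors the point above: the prediction layer has one output type, and nothing in it can be mistaken for a production or quality record.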

The rollout sequence

  1. Weeks 1-6 — First plant, instrumentation review. We started at the plant with the most mature SCADA data and the most supportive plant manager. We mapped every data stream, identified gaps, and quantified the current-state yield baseline in a way everyone agreed to.
  2. Weeks 7-10 — Model development. Supervised classification and time-series anomaly models trained on eighteen months of production data. Critically, we involved the plant's process engineers in feature design — they knew which combinations of parameters mattered and which were spurious correlations.
  3. Weeks 11-12 — Shadow mode. Models running live but alerts invisible to operators. We compared model predictions against actual quality outcomes for two weeks. Calibration was adjusted. False-positive rates tuned down.
  4. Weeks 13-14 — Production. Alerts visible on operator tablets. Incident response process documented. Plant manager dashboard showing prevented-defect tracking.
  5. Months 4-11 — Rollout to remaining 17 plants. Each plant went through a compressed version of the first-plant sequence, typically four to six weeks. Cross-plant model sharing activated after plant 6.
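
The shadow-mode step in the sequence above amounts to replaying model scores against what inspection later found, then choosing an alert threshold that keeps false positives acceptable. The sketch below shows one way to do that comparison. The ShadowResult record, the function names, and the idea of picking the lowest threshold under a false-positive ceiling are illustrative assumptions, not the calibration procedure actually used.

```python
from dataclasses import dataclass


@dataclass
class ShadowResult:
    score: float   # model's predicted probability of a quality deviation
    defect: bool   # what quality inspection actually found


def confusion(results, threshold):
    """Count outcomes as if alerts had fired at `threshold`."""
    tp = fp = fn = tn = 0
    for r in results:
        alerted = r.score >= threshold
        if alerted and r.defect:
            tp += 1
        elif alerted:
            fp += 1
        elif r.defect:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def false_positive_rate(results, threshold):
    _, fp, _, tn = confusion(results, threshold)
    return fp / (fp + tn) if (fp + tn) else 0.0


def recall(results, threshold):
    tp, _, fn, _ = confusion(results, threshold)
    return tp / (tp + fn) if (tp + fn) else 0.0


def lowest_threshold_within(results, candidates, max_fpr):
    """Return the lowest candidate threshold whose false-positive rate
    is at or below `max_fpr`, or None if no candidate qualifies."""
    for t in sorted(candidates):
        if false_positive_rate(results, t) <= max_fpr:
            return t
    return None
```

Because raising the threshold can only reduce the number of alerts, the lowest qualifying threshold is also the one that keeps the most true detections for a given false-positive ceiling.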

Results

  • +3.4 pts: first-pass yield improvement
  • $42M: annualized savings across 18 plants
  • 14 wk: payback period
  • -27%: unplanned line stops

The 3.4-point yield improvement was driven mostly by early intervention — operators catching drift before it produced scrap, rather than after. Unplanned line stops fell as well, because the same model that predicted quality deviations turned out to be surprisingly good at flagging imminent equipment issues a few minutes before they became failures.

The economics generalized well. The first plant paid back the entire program's deployment cost in 14 weeks. Subsequent plants, benefiting from the central model registry, paid back faster — the last plants to go live saw payback in 6-8 weeks.

What didn't work the first time

Our initial alert design was too aggressive. Operators saw a warning banner for any predicted deviation, no matter how marginal, and the noise quickly eroded trust. We rebuilt the alerting logic with three severity tiers and tighter thresholds on the top tier, and operator engagement recovered within weeks. The lesson: every false positive costs you credibility, and credibility is the resource the whole deployment runs on.
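
As a rough illustration of the tiered approach, the sketch below maps a predicted deviation probability to one of three severity tiers and shows a banner only from a chosen tier upward. The tier names, the cutoff values, and the rule for which tiers reach the operator are all hypothetical; the text only establishes that there were three tiers with tighter thresholds on the top one.

```python
from enum import Enum


class Severity(Enum):
    NONE = 0
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical cutoffs on the predicted deviation probability,
# ordered from strictest to loosest. The top tier is deliberately tight.
THRESHOLDS = [
    (0.90, Severity.CRITICAL),
    (0.70, Severity.WARNING),
    (0.50, Severity.INFO),
]


def classify(probability):
    """Map a probability in [0, 1] to a severity tier."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    for cutoff, severity in THRESHOLDS:
        if probability >= cutoff:
            return severity
    return Severity.NONE


def should_show_banner(probability, minimum=Severity.WARNING):
    """Show a banner only at or above the `minimum` tier (assumed policy)."""
    return classify(probability).value >= minimum.value
```

With cutoffs like these, a marginal prediction of 0.55 is still recorded as INFO but never interrupts the operator, which is the behavior the original design lacked.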

We also initially underestimated how much plant-to-plant variation mattered. A model trained on plant A's signatures produced far too many false positives when applied directly to plant B, even though the equipment was nominally the same. We redesigned the pipeline to train per-plant models with shared architecture but plant-specific weights. Accuracy recovered and the cross-plant learning pattern still worked — just at the level of features and patterns, not raw model weights.
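
The split between shared architecture and plant-specific weights can be shown with a deliberately simple stand-in. The sketch below swaps the actual classification and time-series models for a plain z-score baseline: the feature list and scoring logic are shared, while the fitted means and standard deviations belong to one plant. The feature names, the PlantModel class, and the sample readings are invented for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

# Shared "architecture": every plant scores the same features the same way.
FEATURES = ("spindle_temp", "vibration")  # hypothetical feature names


@dataclass
class PlantModel:
    plant_id: str
    means: dict
    stdevs: dict

    @classmethod
    def fit(cls, plant_id, history):
        """Fit plant-specific parameters from that plant's own readings."""
        means = {f: mean([row[f] for row in history]) for f in FEATURES}
        # Fall back to 1.0 when a feature never varies, to avoid dividing by zero.
        stdevs = {f: pstdev([row[f] for row in history]) or 1.0 for f in FEATURES}
        return cls(plant_id, means, stdevs)

    def score(self, reading):
        """Largest absolute z-score across the shared features."""
        return max(abs(reading[f] - self.means[f]) / self.stdevs[f] for f in FEATURES)


# Invented readings: plant B runs about 10 degrees hotter in normal operation.
plant_a = [{"spindle_temp": t, "vibration": v}
           for t, v in [(60, 1.0), (62, 1.2), (64, 1.4)]]
plant_b = [{"spindle_temp": t, "vibration": v}
           for t, v in [(70, 1.0), (72, 1.2), (74, 1.4)]]

model_a = PlantModel.fit("A", plant_a)
model_b = PlantModel.fit("B", plant_b)
normal_for_b = {"spindle_temp": 72, "vibration": 1.2}
```

Scoring normal_for_b with model_a gives a large deviation, while model_b scores it as unremarkable. That is the false-positive pattern described above in miniature: the same logic with the wrong plant's parameters flags ordinary operation.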

What the COO said

"We spent two years trying to buy an AI solution. What we needed was someone willing to build around what we already run, instead of on top of it. The MES didn't change. The PLCs didn't change. The operators' job didn't change. Only the outcomes did."
- Chief Operating Officer, client supplier

Lessons that generalize

Three principles we now default to on industrial engagements:

  1. The operator tablet is the intervention surface. Not a new dashboard, not an email alert, not a Teams channel. Whatever the operators already look at is where the intelligence belongs.
  2. Per-plant models, shared architecture. Plants are more different from each other than most executives believe. Identical lines in different facilities produce subtly different signatures. Respect that.
  3. Instrument the baseline before you build the model. We spent the first six weeks just measuring current-state yield in a way everyone agreed on. That investment paid for itself many times over when it came time to demonstrate ROI.