Every six weeks, another enterprise publishes an AI governance framework. They tend to agree on the principles — fairness, accountability, transparency, human oversight — and to disagree on what to actually do about them. The reason, in our view, is that most of these frameworks are solving the wrong problem. The hard problem is not policy. It is data lineage. And until you solve lineage, you cannot seriously do governance.
The governance-theater problem
We have now seen enough AI governance documents across enough industries to draw a reliable conclusion: most of them are theater. They are internally consistent. They cite the right regulations. They list the right committees. What they do not do is produce, for any specific AI-driven decision, a defensible answer to a simple question: which data shaped this?
That question is the operational core of governance. Every regulatory framework we've read — the EU AI Act, the NIST AI Risk Management Framework, the proposed U.S. federal agency guidance, the emerging MAS and FCA regimes — ultimately reduces, in practice, to accountability for specific decisions. An auditor, a regulator, or a customer's lawyer asks: why was this loan declined? Why was this claim flagged? Why did the model route this transaction to manual review?
If an organization cannot produce, for each of these decisions, a reproducible trace back through every data element, every model version, every preprocessing step, and every feature-engineering transformation, it does not actually have governance. It has a policy document.
Why this problem is harder than it looks
If you've never built a lineage system, tracing data sounds like it ought to be the easy part. It isn't. A typical enterprise AI system touches data at each of these layers:
- Multiple source systems with different update cadences — transactional databases, warehouses, third-party APIs, document stores
- A series of ETL or ELT pipelines, each of which transforms the data in ways large (aggregation, joining, deduplication) and small (null handling, type coercion)
- Feature stores that pre-compute aggregations and enrichments
- Embedding systems that convert text, images, or structured data into vector representations
- Retrieval systems that pull relevant context for a specific query
- Multiple model versions, each trained on slightly different data slices
- Inference-time transformations — prompt templates, tokenization, truncation
To answer "which data shaped this decision," you need to trace backward through all of these. You need to know which version of the data was current at the moment the model ran. You need to know which version of the model was deployed. You need to know which feature engineering was applied, and with what parameters. If retrieval-augmented generation (RAG) was involved, you need to know which specific documents were retrieved and in what order.
Most organizations have maybe two of these layers instrumented. The rest is archaeology.
In about 70% of the enterprise AI deployments we've reviewed, the organization cannot — even with engineering effort — reconstruct the exact data state that produced a specific historical decision. The feature store gets overwritten. The upstream data gets patched. The model gets silently updated. The trace breaks.
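To make "trace backward" concrete, here is a minimal sketch that models lineage as a graph in which each artifact ID points to the artifacts it was directly derived from. The IDs, the `LINEAGE` mapping, and the `archive` set are hypothetical; a real system would back them with a metadata store. The second function shows what a broken trace looks like: upstream artifacts the graph still names but the archive can no longer produce.

```python
from collections import deque

# Hypothetical lineage graph: each artifact ID maps to the IDs it was directly
# derived from. Every ID here is an illustrative placeholder.
LINEAGE: dict[str, list[str]] = {
    "decision:loan-8812": ["model:credit-v3.1", "features:snap-9f2c", "doc:policy-17"],
    "features:snap-9f2c": ["etl-run:v4.2.1-0403"],
    "etl-run:v4.2.1-0403": ["raw:crm-0403", "raw:ledger-0403"],
    "model:credit-v3.1": ["train-slice:2024q1"],
}


def trace_upstream(artifact_id: str, graph: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every artifact that artifact_id depends on."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(graph.get(artifact_id, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # shared ancestors are reported only once
        seen.add(current)
        order.append(current)
        queue.extend(graph.get(current, []))
    return order


def broken_links(artifact_id: str, graph: dict[str, list[str]], archive: set[str]) -> list[str]:
    """Upstream artifacts the graph still names but the archive can no longer produce."""
    return [a for a in trace_upstream(artifact_id, graph) if a not in archive]


if __name__ == "__main__":
    # Illustrative archive: the document, training slice, ETL run record and ledger extract are gone.
    archive = {"model:credit-v3.1", "features:snap-9f2c", "raw:crm-0403"}
    print(trace_upstream("decision:loan-8812", LINEAGE))
    print(broken_links("decision:loan-8812", LINEAGE, archive))
```

Run as a script, it prints all seven upstream artifacts of the example decision, then the four that the illustrative archive can no longer produce. That second list is the point where the trace breaks.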
What real lineage looks like
A genuinely governed AI system — the kind that produces defensible answers to audit questions three years after a decision was made — requires a few specific design choices, built in from the start.
Immutable input snapshots
Every model inference must write, or reference, an immutable snapshot of the exact data that went into it. Not a pointer to "the customer record at that time" — an actual hash-identified artifact that cannot be changed retroactively. This is expensive in storage. It is the price of defensibility.
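One way to get a hash-identified artifact, sketched under simplifying assumptions: serialize the inputs deterministically (sorted keys, fixed separators), hash the bytes with SHA-256, and use the digest as the storage key. `WriteOnceSnapshotStore` is a hypothetical in-memory stand-in for an immutable object store, and sorted-key JSON is a simplification rather than a full cross-language canonicalization scheme.

```python
import hashlib
import json


def canonical_bytes(inputs: dict) -> bytes:
    """Deterministic JSON encoding: sorted keys, no whitespace, UTF-8."""
    return json.dumps(inputs, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def snapshot_hash(inputs: dict) -> str:
    """SHA-256 of the canonical encoding, so key order cannot change the ID."""
    return hashlib.sha256(canonical_bytes(inputs)).hexdigest()


class WriteOnceSnapshotStore:
    """Hypothetical in-memory stand-in for an immutable object store."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, inputs: dict) -> str:
        digest = snapshot_hash(inputs)
        # Identical content maps to the same key; an existing key is never overwritten.
        self._blobs.setdefault(digest, canonical_bytes(inputs))
        return digest

    def get(self, digest: str) -> dict:
        payload = self._blobs[digest]
        # Re-check on read: a blob whose hash no longer matches has been altered.
        if hashlib.sha256(payload).hexdigest() != digest:
            raise ValueError(f"snapshot {digest} failed its integrity check")
        return json.loads(payload)
```

Because the key is derived from the content, a later patch to the upstream record produces a new hash instead of silently changing the snapshot an old decision points to.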
Signed model versions
Every model that produces a decision must be cryptographically signed, version-addressable, and archived for the retention period required by regulation. A model that was deployed for one afternoon in April 2024 must still be recoverable in April 2031.
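A minimal sketch of signing and version-addressing a model artifact. It uses HMAC-SHA256 from the Python standard library purely as a stand-in; a production setup would more likely use an asymmetric signature scheme, so that auditors can verify a model without holding the signing key. `SIGNING_KEY`, `sign_model`, `verify_model`, and `ModelArchive` are hypothetical names.

```python
import hashlib
import hmac

# Assumption: HMAC-SHA256 stands in for a real signature scheme. The key would
# normally live in a KMS or HSM, never in source code.
SIGNING_KEY = b"hypothetical-signing-key"


def sign_model(artifact: bytes, name: str, version: str, key: bytes = SIGNING_KEY) -> dict:
    """Bind name, version, and artifact digest under a single authentication tag."""
    digest = hashlib.sha256(artifact).hexdigest()
    message = f"{name}\n{version}\n{digest}".encode("utf-8")
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"name": name, "version": version, "sha256": digest, "signature": tag}


def verify_model(artifact: bytes, manifest: dict, key: bytes = SIGNING_KEY) -> bool:
    """True only if these bytes are exactly the ones that were signed."""
    expected = sign_model(artifact, manifest["name"], manifest["version"], key)
    return hmac.compare_digest(expected["signature"], manifest["signature"]) and (
        expected["sha256"] == manifest["sha256"]
    )


class ModelArchive:
    """Hypothetical version-addressable archive: one immutable entry per (name, version)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], tuple[bytes, dict]] = {}

    def register(self, artifact: bytes, name: str, version: str) -> dict:
        if (name, version) in self._entries:
            raise KeyError(f"{name}@{version} is already archived; versions are immutable")
        manifest = sign_model(artifact, name, version)
        self._entries[(name, version)] = (artifact, manifest)
        return manifest

    def load(self, name: str, version: str) -> bytes:
        artifact, manifest = self._entries[(name, version)]
        if not verify_model(artifact, manifest):
            raise ValueError(f"{name}@{version} failed signature verification")
        return artifact
```

Refusing to re-register an existing `(name, version)` pair is deliberate. The model that ran for one afternoon keeps its own entry and is not overwritten by the next release.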
Explicit transformation logs
Every data transformation between source and model — every join, filter, aggregation, and enrichment — must be logged with enough detail to be replayed. Not "we ran the pipeline" but "we ran pipeline v4.2.1 with config X, producing output hash Y."
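The "pipeline v4.2.1 with config X, producing output hash Y" record can be sketched as a small frozen dataclass plus a replay check. The example transformation (`drop_low_income`), the field names, and the hashing scheme are illustrative assumptions, and replay only works if the step is deterministic for a given input and config.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable


def content_hash(obj: Any) -> str:
    """SHA-256 of a sorted-key JSON encoding (assumes JSON-serializable data)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class TransformationLogEntry:
    pipeline: str
    version: str
    config_hash: str
    input_hash: str
    output_hash: str


def run_logged(pipeline: str, version: str, fn: Callable[..., Any], config: dict,
               rows: list[dict], log: list[TransformationLogEntry]) -> Any:
    """Run one transformation step and append a replayable record of it."""
    output = fn(rows, **config)
    log.append(TransformationLogEntry(pipeline, version, content_hash(config),
                                      content_hash(rows), content_hash(output)))
    return output


def replay_matches(entry: TransformationLogEntry, fn: Callable[..., Any],
                   config: dict, rows: list[dict]) -> bool:
    """Re-run the step and confirm it reproduces the logged output hash."""
    if content_hash(config) != entry.config_hash or content_hash(rows) != entry.input_hash:
        return False  # not the config or input that was logged
    return content_hash(fn(rows, **config)) == entry.output_hash


def drop_low_income(rows: list[dict], min_income: int) -> list[dict]:
    """Illustrative transformation: drop rows with missing or low income."""
    return [r for r in rows if r.get("income") is not None and r["income"] >= min_income]
```

`replay_matches` trusts the caller to supply the code for the logged `version`. Tying that code to the version string is what signed, archived pipeline releases are for.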
Decision provenance records
Every consequential output of the system (every recommendation, score, classification, or generation) must carry, as metadata, the references needed to reconstruct its full lineage: model version, input snapshot hash, transformation versions, and retrieved-context IDs. That metadata is itself archived, immutably.
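A sketch of a decision provenance record and an append-only archive for it. It assumes a hash chain is an acceptable tamper-evidence mechanism: each entry's hash covers the previous entry's hash, so editing a stored record breaks its link, and hiding that edit would mean recomputing every later hash. That only helps if the latest hash is itself kept somewhere immutable. `DecisionProvenance` and `ProvenanceLedger` are hypothetical names, and the ledger here lives in memory.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first ledger entry


@dataclass(frozen=True)
class DecisionProvenance:
    decision_id: str
    output: str
    model_version: str
    input_snapshot_hash: str
    transformation_versions: tuple[str, ...]
    retrieved_context_ids: tuple[str, ...] = ()  # kept in retrieval order

    def record_hash(self) -> str:
        canonical = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def chain_hash(prev_hash: str, record: DecisionProvenance) -> str:
    return hashlib.sha256((prev_hash + record.record_hash()).encode("utf-8")).hexdigest()


class ProvenanceLedger:
    """Append-only, hash-chained log of provenance records (in-memory sketch)."""

    def __init__(self) -> None:
        self._entries: list[tuple[DecisionProvenance, str]] = []

    def append(self, record: DecisionProvenance) -> str:
        prev = self._entries[-1][1] if self._entries else GENESIS
        entry_hash = chain_hash(prev, record)
        self._entries.append((record, entry_hash))
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain from the start; False at the first link that does not match."""
        prev = GENESIS
        for record, entry_hash in self._entries:
            if chain_hash(prev, record) != entry_hash:
                return False
            prev = entry_hash
        return True
```

Retrieved-context IDs are stored as an ordered tuple because the order in which documents were retrieved is part of what shaped the output.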
Why most organizations skip this
It is expensive. It is slow. It makes the initial deployment feel heavier than it needs to be. And it produces no immediate value — the ROI shows up in the audit you haven't had yet, the regulator letter you haven't received yet, the litigation you aren't defending yet.
The result is a predictable pattern: organizations deploy AI quickly, book the productivity gains, and tell themselves they'll add governance "once things settle down." Things do not settle down. They get further from ground truth every week. By the time the regulator or plaintiff arrives, the lineage debt is unrecoverable.
What we recommend
For any enterprise we work with that is starting an AI program, our consistent recommendation is: build the lineage infrastructure before you deploy the first model. It feels like premature optimization. It isn't. It is the only way to stay out of the "archaeology" mode that most mature AI organizations have resigned themselves to.
Specifically:
- Design the snapshot/logging architecture before designing the first model. It shapes every downstream decision about data pipelines, feature stores, and inference servers.
- Enforce it with tooling, not with policy. Engineers will skip lineage logging if they have to remember to turn it on. Make it structurally impossible to deploy a model that doesn't produce lineage records.
- Store everything immutably, from day one: S3 Object Lock, Azure Immutable Blob Storage, or an on-premises equivalent. Set retention periods to the longest regulatory window that could plausibly apply.
- Budget for the storage cost separately. It is not a trivial number. It is also much cheaper than the alternative.
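Enforcing lineage with tooling rather than policy can be as simple as making one function the only route to a servable model. In this hypothetical sketch, `deploy` refuses a manifest that lacks the required lineage fields and wraps the prediction function so that every call appends a lineage record before the result is returned. `REQUIRED_LINEAGE_FIELDS`, `LineageMissing`, and the `sink` list are assumptions; a real system might enforce the same rule in CI or at the inference-server layer.

```python
import hashlib
import json
from typing import Any, Callable

# Hypothetical minimum lineage metadata a model must ship with before it can serve.
REQUIRED_LINEAGE_FIELDS = ("model_version", "signature", "pipeline_versions", "snapshot_store")


class LineageMissing(Exception):
    """Raised when a model is submitted for deployment without lineage metadata."""


def deploy(predict: Callable[[dict], Any], manifest: dict, sink: list[dict]) -> Callable[[dict], Any]:
    """The only route to a servable model: validate lineage, then wrap predict."""
    missing = [f for f in REQUIRED_LINEAGE_FIELDS if not manifest.get(f)]
    if missing:
        raise LineageMissing(f"refusing to deploy: missing lineage fields {missing}")

    def served(inputs: dict) -> Any:
        snapshot = hashlib.sha256(
            json.dumps(inputs, sort_keys=True, separators=(",", ":")).encode("utf-8")
        ).hexdigest()
        output = predict(inputs)
        # The lineage record is written before the output is returned, so a failed
        # write surfaces as an exception rather than as an untraceable decision.
        sink.append({
            "model_version": manifest["model_version"],
            "pipeline_versions": list(manifest["pipeline_versions"]),
            "input_snapshot_hash": snapshot,
            "output": output,
        })
        return output

    return served
```

The design choice is that engineers never call a model's `predict` directly. The only callable they receive is the one that already logs.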
What this implies for governance committees
If your organization's AI governance committee is spending most of its time on principles, frameworks, and escalation paths — and not on data lineage infrastructure — the committee is solving the wrong problem.
The work that matters is unglamorous: which systems log what, to where, with what retention, with what integrity guarantees. This is not the conversation committees want to have. It is the conversation they need to have. Everything else is downstream of it.
A governance framework without lineage is a promise you cannot keep. It may satisfy an initial regulatory review. It will not survive the first serious audit.