I keep ending up in the same argument. A team has shipped a model into a regulated workflow, the reviewer asks how the system arrived at a given output, and somebody on the team mumbles something about SHAP plots and "we can generate an explanation if anyone asks". That is not an explainability strategy. That is a liability with a Jupyter notebook attached.
If you are building AI for clinical, financial, or public-sector use in 2026, explainability is no longer a research toy or a final-week compliance bolt-on. It is a load-bearing element of the architecture. Treat it that way or do not ship.
Why this stopped being optional
Three forces have collapsed onto the same point.
Regulators. The EU AI Act's high-risk obligations under Articles 13 and 14 — transparency to deployers, and meaningful human oversight — apply to Annex III systems from 2 August 2026. The wording is not subtle: a high-risk system must be designed so that a deployer can actually interpret the output and use it appropriately. A black box that nobody on the receiving end can reason about does not satisfy that bar.
Clinicians. I work daily with the people who would have to sign their name under an AI-assisted decision. They will not. Not without seeing what the model saw, why it weighted what it weighted, and how confident it is. Trust in clinical AI is not built by a slick dashboard; it is built by the doctor being able to challenge the model and watch it back down where it should.
Plaintiffs. An uninterpretable model in a courtroom is a gift to opposing counsel. "The system said so" is not a defence. The discovery process around model behaviour is going to look ugly for any team that did not capture, at inference time, exactly why the system produced what it produced.
The two flavours of explainability that actually ship
Stop thinking about XAI as a single thing. In production there are two distinct patterns, and they answer different questions.
1. Inherently interpretable models for narrow, high-stakes decisions
For the load-bearing decision — is this signal abnormal, does this transaction look like fraud, does this applicant qualify — use a model whose structure a competent reviewer can read. Generalised additive models, sparse linear models, monotonic gradient-boosted trees, decision lists. Constraints on the architecture buy you a property no post-hoc method can guarantee: the explanation is the model.
This was the under-appreciated finding from the DARPA XAI program: in several domains, models built to be interpretable did not lose meaningful accuracy. The folklore that you must trade explainability for performance is, frequently, just folklore.
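To make the constraint concrete, here is a minimal sketch using scikit-learn's HistGradientBoostingClassifier with monotonic constraints. The features, constraint directions, and synthetic data are illustrative, not a real risk model.

```python
# Minimal sketch: a gradient-boosted classifier constrained so the score can
# only move in the direction a reviewer expects. Features, constraint
# directions, and the synthetic data are illustrative.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
# Illustrative features: [transaction_amount, account_age_days, prior_flags]
X = rng.normal(size=(5000, 3))
y = (0.8 * X[:, 0] - 0.5 * X[:, 1] + 1.2 * X[:, 2]
     + rng.normal(scale=0.5, size=5000) > 0).astype(int)

model = HistGradientBoostingClassifier(
    # +1: risk may only rise with amount and prior flags;
    # -1: risk may only fall as the account ages.
    monotonic_cst=[1, -1, 1],
    max_depth=3,
    random_state=0,
)
model.fit(X, y)
```

The point is that the monotonic relationship is a property of the fitted model, not a claim made about it afterwards. The reviewer who insists that risk must never fall as prior flags accumulate can verify that by construction.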
2. Post-hoc explanations and provenance for LLM-driven outputs
For everything an LLM touches — narrative generation, summarisation, retrieval-augmented answers — you cannot crack open the model and read it. So you do the next best thing: capture the receipts. Which prompt, which model and version, which retrieved documents, which tool calls, which features fed the upstream classifier. Stored alongside the output. Immutable.
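What that looks like at the call site is not complicated. A minimal sketch, assuming a hypothetical retriever and LLM client; the field names are mine, not any standard:

```python
# Minimal sketch: build the provenance envelope at the moment of generation.
# `retriever` and `llm` are hypothetical stand-ins for whatever the system uses.
import hashlib
from datetime import datetime, timezone

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def answer_with_receipts(question: str, retriever, llm, model_version: str) -> dict:
    chunks = retriever.retrieve(question)                       # hypothetical API
    prompt = question + "\n\n" + "\n".join(c.text for c in chunks)
    answer = llm.generate(prompt)                               # hypothetical API
    return {
        "answer": answer,
        "provenance": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model_version": model_version,
            "input_sha256": sha256(question),
            "prompt_sha256": sha256(prompt),
            "retrieved_chunks": [{"id": c.id, "score": c.score} for c in chunks],
        },
    }
```

The envelope is persisted with the answer, not written to a log file that rotates away.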
Why post-hoc-only is not enough for regulated diagnostics
At Eir Tec, our clinical platform processes 4-channel home QEEG and produces an analysis a clinician can act on. The temptation is obvious: throw the signal at a deep model, ask an LLM to write the report, ship it. We did not do that, and we will not.
The signal-layer analysis is built around inherently explainable methods. Spectral features, ratios, asymmetry indices, with thresholds and weightings a neurophysiologist can argue with. The LLM sits on top, generating clinician-readable narrative from a structured analysis object. The narrative cites the analysis. The analysis does not depend on the narrative. If the LLM goes off-piste, the underlying numbers are still there, still defensible.
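For flavour, here is the shape of one such feature: band powers from a Welch PSD and a theta/beta ratio against an explicit threshold. The band edges, sampling rate, and threshold below are illustrative placeholders, not our clinical values.

```python
# Minimal sketch: an inherently explainable signal feature. Band definitions,
# sampling rate, and the flag threshold are illustrative, not clinical values.
import numpy as np
from scipy.signal import welch
from scipy.integrate import trapezoid

FS = 256  # Hz, illustrative sampling rate

def band_power(freqs, psd, lo, hi):
    mask = (freqs >= lo) & (freqs < hi)
    return trapezoid(psd[mask], freqs[mask])

def theta_beta_feature(channel: np.ndarray) -> dict:
    freqs, psd = welch(channel, fs=FS, nperseg=FS * 2)
    theta = band_power(freqs, psd, 4.0, 8.0)
    beta = band_power(freqs, psd, 13.0, 30.0)
    ratio = theta / beta
    return {
        "theta_power": float(theta),
        "beta_power": float(beta),
        "theta_beta_ratio": float(ratio),
        "flagged": bool(ratio > 3.0),  # an explicit threshold a reviewer can argue with
    }
```

Every number in that dict is something a neurophysiologist can dispute, and the narrative layer only ever reports those numbers; it does not invent them.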
This matters because of a result the faithfulness literature has been hammering at for two years: when an LLM is asked to explain a decision after the fact, it will often confabulate a plausible rationale that has nothing to do with the actual cause. An explanation generated on demand is not an explanation. It is a story.
The architectural pattern
Every AI output in a high-stakes system should be persisted as a record with at least the following:
- Inputs — the exact data the model saw, hashed and stored, not just a reference that may drift.
- Model identity — name, version, training-data snapshot, evaluation suite ID.
- Contributing evidence — for tabular models, the contributing features and their attributions; for retrieval, the retrieved chunks and their IDs; for tool use, the call graph.
- Calibrated confidence — not raw softmax. A calibrated probability or an explicit uncertainty band, with the calibration method recorded.
- Rationale string — a human-readable explanation, generated at inference time from the structured evidence, not retrofitted later.
This record is part of the data model. It lives in the same transaction as the result. You cannot query the output without being able to query why.
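Mechanically, "same transaction" can be as simple as this sketch, shown with sqlite3 for brevity; the table and column names are illustrative, not a schema I am prescribing.

```python
# Minimal sketch: the output row and its explanation record are written in one
# transaction, so neither can exist without the other. Schema is illustrative.
import json
import sqlite3

conn = sqlite3.connect("decisions.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS results (
    id INTEGER PRIMARY KEY,
    output_json TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS explanations (
    result_id INTEGER NOT NULL REFERENCES results(id),
    input_sha256 TEXT NOT NULL,
    model_version TEXT NOT NULL,
    evidence_json TEXT NOT NULL,        -- attributions / retrieved IDs / tool calls
    calibrated_confidence REAL NOT NULL,
    calibration_method TEXT NOT NULL,
    rationale TEXT NOT NULL             -- generated at inference time
);
""")

def persist(output: dict, record: dict) -> None:
    with conn:  # one transaction: both rows commit together or not at all
        cur = conn.execute(
            "INSERT INTO results (output_json) VALUES (?)",
            (json.dumps(output),),
        )
        conn.execute(
            "INSERT INTO explanations VALUES (?, ?, ?, ?, ?, ?, ?)",
            (
                cur.lastrowid,
                record["input_sha256"],
                record["model_version"],
                json.dumps(record["evidence"]),
                record["calibrated_confidence"],
                record["calibration_method"],
                record["rationale"],
            ),
        )
```

If the explanation insert fails, the result never lands. That one property eliminates the whole class of "we have the output but lost the reasoning" incidents.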
At MediVox, our SOAP-generation flow follows the same shape: every generated section carries a provenance trail back to the source utterances and the model version that produced it. If a clinician edits a line, that is captured too. The audit story is not a nightly job; it is the schema.
Tooling, used judiciously
SHAP and LIME still earn their place for tabular and feature-based models. Captum is the sensible default for PyTorch-based attribution across CNNs, RNNs, and transformers. Grad-CAM for imaging where spatial evidence is what the clinician needs. Use them. Just remember what they are: approximations of model behaviour, with known failure modes. Cross-check with a second method when the stakes warrant it; the literature is full of cases where SHAP and LIME disagree, and the disagreement is the signal.
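Here is a sketch of that cross-check for a fitted tree-ensemble classifier, such as a random forest: mean absolute SHAP values against scikit-learn's permutation importance, with rank disagreements surfaced rather than smoothed over.

```python
# Minimal sketch: cross-check two attribution methods on the same tabular model
# and surface the features where the global rankings diverge.
import numpy as np
import shap
from sklearn.inspection import permutation_importance

def attribution_cross_check(model, X, y, feature_names):
    # Method 1: mean |SHAP| per feature.
    sv = shap.TreeExplainer(model).shap_values(X)
    if isinstance(sv, list):        # some classifiers return one array per class
        sv = sv[1]
    elif sv.ndim == 3:              # others return (samples, features, classes)
        sv = sv[..., 1]
    shap_order = np.argsort(np.abs(sv).mean(axis=0))[::-1]

    # Method 2: permutation importance on held-out data.
    perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    perm_order = np.argsort(perm.importances_mean)[::-1]

    # The disagreement is the signal: large rank gaps deserve a human look.
    for i, name in enumerate(feature_names):
        s_rank = int(np.where(shap_order == i)[0][0])
        p_rank = int(np.where(perm_order == i)[0][0])
        if abs(s_rank - p_rank) >= 2:
            print(f"{name}: SHAP rank {s_rank}, permutation rank {p_rank}, review")
```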
Beyond attribution, the things that actually move the needle in production:
- Model cards that are kept in sync with the deployment, not written once and forgotten.
- Eval suites that test explanation faithfulness, not just answer accuracy. If a retrieved-doc citation can be removed without changing the answer, the citation was decorative.
- Retrieval-augmented logging: every RAG hop captured with the retrieved IDs, scores, and the prompt that consumed them.
- Calibration monitoring as a first-class metric. A miscalibrated 95% confidence is more dangerous than a wrong answer with an honest 60%; a minimal check is sketched below.
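That monitoring does not need a platform. A minimal sketch of expected calibration error over binned predictions; the bin count and alert threshold are illustrative choices.

```python
# Minimal sketch: expected calibration error (ECE) over equal-width probability
# bins. Bin count and the alert threshold are illustrative choices.
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins: int = 10) -> float:
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        if hi == 1.0:
            mask = (y_prob >= lo) & (y_prob <= hi)
        else:
            mask = (y_prob >= lo) & (y_prob < hi)
        if not mask.any():
            continue
        confidence = y_prob[mask].mean()   # what the model claimed
        accuracy = y_true[mask].mean()     # what actually happened
        ece += (mask.sum() / len(y_prob)) * abs(confidence - accuracy)
    return float(ece)

# Example with toy values; in production, feed last period's labels and probabilities.
ece = expected_calibration_error([1, 0, 1, 1, 0], [0.9, 0.2, 0.8, 0.7, 0.4])
if ece > 0.05:  # illustrative alert threshold
    print(f"calibration drift: ECE={ece:.3f}")
```

Track it per model version and per deployment site; a drifting ECE is the early warning that the confidence field in your decision records has quietly stopped meaning anything.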
The closing point
Explainability is not a UX feature. It is part of the data model.
If your explanation is generated when the regulator asks, the regulator is right to be suspicious. Capture it at inference. Persist it. Test it. Version it. Make the system unable to produce an output without producing the receipts.
Build it like that, ship it like that, audit it like that. The teams who do not are going to spend 2027 explaining to a tribunal what their model was thinking. They will not have a good answer.