05. Interpretability basics: opening the judge without pretending to read its soul
~15 min read. We need tools that explain which pieces of evidence pushed the verdict, locally and globally.
Built on the ELI5 in 00-eli5.md. The judge — the model issuing a verdict — should not remain a sealed chamber when people need to understand which evidence mattered.
Picture first: global explanations and local explanations are different windows
Imagine entering the courtroom after a verdict. You can ask two different questions. First, across thousands of cases, which kinds of evidence usually matter most? That is a global question. Second, for this one person, why did the judge say yes or no? That is a local question.
Interpretability tools answer one or both. Feature importance gives a global signal. SHAP gives additive local attributions. LIME builds a tiny local surrogate around one case. Attention maps show where some neural models focused, but they are not the full story. Simple, no? Different tools answer different explanation questions.
same model
│
├── global window ──→ which features matter often?
│
└── local window ──→ why this verdict for this case?
See. Confusion starts when teams mix the windows. A global importance chart cannot fully explain one applicant's rejection. A local explanation for one applicant cannot summarize how the whole judge behaves. Use the right window for the right appeal.
Feature importance: what moves the model most overall?
Feature importance is the coarse map. In tree models, you may use gain or split counts. In black-box settings, permutation importance is the safer choice because it needs only predictions and a held-out set, not model internals. The idea is simple. If you shuffle one feature and model quality drops sharply, that feature mattered.
Picture a courtroom clerk removing one document type from many cases. If verdict quality collapses, that document was important. If little changes, that document was less central.
Worked example. Suppose your risk model has AUROC 0.84. Shuffle debt-to-income ratio across validation cases. AUROC falls to 0.77. Importance drop = 0.84 - 0.77 = 0.07. Now shuffle missed-payment count. AUROC falls to 0.74. Importance drop = 0.10. Now shuffle browser type. AUROC falls to 0.839. Importance drop = 0.001.
Look. Missed-payment count mattered more than browser type. Good. But feature importance does not tell you direction for one person. It does not say whether a high value helped or hurt this individual. And correlated features split credit strangely. If zip code and neighborhood income move together, the model can lean on either one, so importance may bounce between them. The appeal process needs that warning in mind.
baseline AUROC = 0.84
│
├── shuffle missed payments ──→ 0.74 drop 0.10
├── shuffle debt ratio ───────→ 0.77 drop 0.07
└── shuffle browser type ─────→ 0.839 drop 0.001
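The shuffle-and-remeasure idea above can be sketched in plain Python. This is a minimal illustration, not a production audit: the scorer `risk_model`, the synthetic validation set, and the feature names are all hypothetical stand-ins, and the numbers it prints will not match the 0.84 worked example.

```python
import math
import random

def auroc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def risk_model(row):
    # Hypothetical fixed scorer standing in for the judge; it ignores browser_type.
    return 1.5 * row["missed_payments"] + 1.0 * row["debt_ratio"]

def make_validation_set(n=400, seed=7):
    # Synthetic data (an assumption for this sketch): defaults follow the scorer plus noise.
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        row = {
            "missed_payments": rng.randint(0, 3),
            "debt_ratio": rng.random(),
            "browser_type": rng.randint(0, 4),
        }
        p_default = 1.0 / (1.0 + math.exp(-(risk_model(row) - 3.0)))
        rows.append(row)
        labels.append(1 if rng.random() < p_default else 0)
    return rows, labels

def permutation_importance(model, rows, labels, features, n_repeats=5, seed=0):
    """Average AUROC drop when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = auroc(labels, [model(r) for r in rows])
    drops = {}
    for name in features:
        total = 0.0
        for _ in range(n_repeats):
            column = [r[name] for r in rows]
            rng.shuffle(column)
            shuffled = [{**r, name: v} for r, v in zip(rows, column)]
            total += baseline - auroc(labels, [model(r) for r in shuffled])
        drops[name] = total / n_repeats
    return baseline, drops

rows, labels = make_validation_set()
features = ["missed_payments", "debt_ratio", "browser_type"]
baseline, drops = permutation_importance(risk_model, rows, labels, features)
print(f"baseline AUROC = {baseline:.3f}")
for name in sorted(drops, key=drops.get, reverse=True):
    print(f"shuffle {name:<16} drop {drops[name]:.3f}")
```

Averaging over several shuffles is a design choice to smooth out the luck of any single permutation. A feature the scorer never reads, like `browser_type` here, shows a drop of zero.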
SHAP: additive contributions for one verdict
Now let us open one specific case. SHAP asks, "Starting from a baseline prediction, how much did each feature push this case up or down?" That feels close to courtroom language. Each piece of evidence nudges the verdict.
Picture a neutral starting point. Then each evidence item adds or subtracts force. At the end you get the final score. Good. That is why SHAP is popular. It matches human explanation style.
Use a small numeric example. Suppose the baseline default probability in your portfolio is 0.20. For one applicant, SHAP contributions are:
- debt ratio: +0.15
- recent missed payment: +0.25
- stable employment: -0.10
- savings balance: -0.05
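Now add them. Start at 0.20. Add debt ratio. 0.20 + 0.15 = 0.35. Add missed payment. 0.35 + 0.25 = 0.60. Add stable employment. 0.60 - 0.10 = 0.50. Add savings. 0.50 - 0.05 = 0.45. Final predicted default probability = 0.45. One caution: the sum lands on a probability only if the attributions were computed in probability units. Many SHAP setups for classifiers attribute in log-odds instead, where the contributions still add up, but to a log-odds score.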
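The same step-by-step addition can be written as a few lines of Python. It is only a sketch of the arithmetic in the worked example, using the contribution values given there; the helper name `waterfall` is made up for illustration and is not part of any SHAP library.

```python
baseline = 0.20
contributions = [
    ("debt ratio", +0.15),
    ("recent missed payment", +0.25),
    ("stable employment", -0.10),
    ("savings balance", -0.05),
]

def waterfall(baseline, contributions):
    """Return (name, delta, running_total) rows, starting from the baseline."""
    running = baseline
    steps = [("baseline", 0.0, running)]
    for name, delta in contributions:
        running += delta
        steps.append((name, delta, running))
    return steps

steps = waterfall(baseline, contributions)
for name, delta, total in steps:
    print(f"{name:<24}{delta:+.2f} -> {total:.2f}")

final_score = steps[-1][2]
pushed_up = [name for name, delta in contributions if delta > 0]
pulled_down = [name for name, delta in contributions if delta < 0]
```

Splitting the contributions into `pushed_up` and `pulled_down` gives the appeal reviewer the two lists they usually ask for first.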
Simple, no? This lets the appeal process say, "The verdict was not driven by one thing alone." It was a balance of pushes. That is much more useful than a raw score alone.
But remember the caution. SHAP explains the model, not the world. If the judge learned a bad proxy, SHAP will faithfully tell you that the proxy mattered. That does not make the model fair. Interpretability is diagnostic. Not absolution.
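To see the caution concretely, here is a brute-force Shapley calculation on a toy scorer. Everything in it is an assumption made for illustration: the `toy_model` formula, the `zip_risk` proxy feature, and the choice to represent "feature absent" by a single reference applicant. Real SHAP implementations average over background data and use faster algorithms, so treat this as a sketch of the idea only.

```python
from itertools import permutations

def toy_model(row):
    # Hypothetical scorer that has learned to lean on a zip-code proxy.
    return (0.20
            + 0.25 * row["missed_payment"]
            + 0.30 * row["debt_ratio"]
            + 0.15 * row["zip_risk"]
            + 0.10 * row["missed_payment"] * row["zip_risk"])

def shapley_values(model, applicant, reference):
    """Exact Shapley values: average each feature's marginal effect over all orderings."""
    names = list(applicant)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(reference)      # start with every feature "absent"
        previous = model(current)
        for name in order:
            current[name] = applicant[name]   # reveal one feature at a time
            score = model(current)
            totals[name] += score - previous
            previous = score
    return {name: total / len(orders) for name, total in totals.items()}

reference = {"missed_payment": 0, "debt_ratio": 0.0, "zip_risk": 0}
applicant = {"missed_payment": 1, "debt_ratio": 0.5, "zip_risk": 1}

phi = shapley_values(toy_model, applicant, reference)
base_score = toy_model(reference)
final_score = toy_model(applicant)
for name, value in phi.items():
    print(f"{name:<16}{value:+.3f}")
print(f"baseline {base_score:.2f} + contributions {sum(phi.values()):.2f} = {final_score:.2f}")
```

The contributions add up to the gap between the reference score and the applicant's score, and the proxy `zip_risk` receives a clear positive share. The explanation is faithful to the model, which is exactly why it exposes the proxy rather than excusing it.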
LIME and attention visualization: useful, but handle gently
LIME builds a simple local model around one case. It perturbs the input slightly. Then it fits an interpretable surrogate near that point. This can be helpful when exact model internals are hard to access.
But LIME depends on how you perturb the input. Bad perturbations create silly local neighborhoods. Correlated features can break the story. Because the perturbations are sampled at random, two runs may give slightly different explanations. So what to do? Use LIME as a probe, not as sacred truth.
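The perturb-then-fit recipe can be sketched without any library. This is a simplified LIME-style surrogate, not the `lime` package's API: the `black_box` function, the Gaussian noise scale, and the kernel width are all assumptions chosen for illustration.

```python
import math
import random

def black_box(x1, x2):
    # Hypothetical model standing in for the judge: curved in x1, linear in x2.
    return x1 ** 2 + 3.0 * x2

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def local_surrogate(model, point, n_samples=200, noise=0.1, kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around one point; return its slopes."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        offsets = [rng.gauss(0.0, noise) for _ in point]
        neighbor = [p + d for p, d in zip(point, offsets)]
        distance_sq = sum(d * d for d in offsets)
        rows.append([1.0] + offsets)                  # intercept column plus offsets
        targets.append(model(*neighbor))
        weights.append(math.exp(-distance_sq / kernel_width ** 2))  # closer counts more
    k = len(point) + 1
    # Weighted normal equations: (A^T W A) beta = A^T W y
    ata = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
           for i in range(k)]
    aty = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights)) for i in range(k)]
    intercept, *slopes = solve(ata, aty)
    return slopes

case = [1.0, 2.0]
run_a = local_surrogate(black_box, case, seed=0)
run_b = local_surrogate(black_box, case, seed=1)
print("run A slopes:", [round(s, 3) for s in run_a])
print("run B slopes:", [round(s, 3) for s in run_b])
```

Near the point (1, 2) the true local slopes of this toy function are 2 and 3, so both runs should land close to those values while differing slightly from each other. Widening `noise` stretches the neighborhood and lets the curvature in `x1` leak into the slope, which is the "bad perturbations" problem in miniature.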
Attention visualization feels intuitive in language or vision models. People see highlighted tokens and think, "That is the reason." Careful. Attention shows some internal weighting, not a complete causal explanation. The model has residual paths, MLP layers, token interactions, and other circuits. The judge has many chambers, not one spotlight.
input tokens
│
├── attention map ──→ where some focus went
├── MLP states ─────→ hidden transforms still matter
└── residual paths ─→ final verdict uses all of this
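A toy calculation shows why the brightest token in an attention map need not be the one that moved the output most. The tokens, scores, per-token value signals, and residual signal below are invented numbers, and a single scalar stands in for what would be vectors in a real model, so read it as a cartoon of one attention head, not a measurement.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["urgent", "overdue", "thanks"]
attention_scores = [2.0, 0.5, 0.1]   # made-up query-key scores
value_signal = [0.1, 2.0, -0.5]      # made-up evidence each token carries if attended to
residual_signal = 1.2                # made-up signal arriving via the residual path

weights = softmax(attention_scores)
contributions = [w * v for w, v in zip(weights, value_signal)]
attention_output = sum(contributions)
final_signal = residual_signal + attention_output

top_attention = tokens[max(range(len(tokens)), key=lambda i: weights[i])]
top_contribution = tokens[max(range(len(tokens)), key=lambda i: abs(contributions[i]))]

for token, w, c in zip(tokens, weights, contributions):
    print(f"{token:<8} attention {w:.2f}  contribution {c:+.2f}")
print(f"residual {residual_signal:+.2f}  attention total {attention_output:+.2f}")
print(f"most attended: {top_attention}, largest contribution: {top_contribution}")
```

In this setup the heat map would highlight one token while a different token carries the largest contribution, and the residual path outweighs the whole attention output. That is the gap between "where some focus went" and "why the verdict happened."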
Look. A highlighted phrase can be useful. It can help reviewers inspect evidence quickly. It can support user-facing transparency in low-stakes tools. But it is not a complete proof of why the verdict happened. That line belongs in the case record whenever attention visuals are shown.
What interpretability is actually good for
Interpretability helps in four practical ways. First, debugging. You catch proxy features, leakage, and brittle rules. Second, appeals. Humans can review which signals pushed a harmful decision. Third, compliance and governance. Teams can document what the judge appears to rely on. Fourth, trust calibration. Users and operators learn when to question a score.
It does not replace fairness metrics. It does not prove causal truth. It does not guarantee faithful explanations for giant neural models. Still, it is extremely useful. See. If the courtroom stays completely sealed, responsible review becomes nearly impossible.
Where this lives in the wild
- Stripe Radar analyst tools — fraud operations investigator: use feature-attribution views to understand why a payment was blocked or sent to manual review.
- Google Cloud tabular models — ML platform engineer: expose global importance and local attributions so business teams can inspect decision drivers.
- H2O Driverless AI workflows — model risk validator: compare permutation importance and SHAP explanations before approving a model for production use.
- Amazon SageMaker Clarify users — responsible AI engineer: run feature-attribution reports to detect proxy dependence and slice-specific explanation patterns.
- Healthcare risk stratification dashboards — clinical data reviewer: inspect local explanation summaries before escalating or overturning automated prioritization.
Pause and recall
- What is the difference between global and local interpretability?
- In the SHAP example, which features pushed risk up and which pulled it down?
- Why can an accurate explanation of the model still describe an unfair model?
- Why should attention visualization be treated as a partial clue rather than final proof?
Interview Q&A
Q: Why use SHAP for a local lending explanation and not only global feature importance?
A: Because an individual appeal asks which factors pushed this specific verdict, while global importance only summarizes average influence across many cases.
Common wrong answer to avoid: "Because SHAP is always more accurate than the underlying model."
Q: Why prefer permutation importance over raw tree split counts in many audits?
A: Because permutation importance measures actual performance dependence on held-out data, while split counts can overstate noisy or high-cardinality features.
Common wrong answer to avoid: "Because tree models cannot compute internal feature importance at all."
Q: Why is interpretability not the same as fairness?
A: Because explanation tools reveal what the judge relied on, but they do not decide whether those learned patterns are socially acceptable or causally valid.
Common wrong answer to avoid: "Because unfair models are impossible to explain."
Q: Why should teams be cautious about treating attention maps as explanations?
A: Because attention shows one internal weighting mechanism, not the full set of computations that produce the final verdict.
Common wrong answer to avoid: "Because attention weights are random and contain no information."
Apply now (5 min)
Exercise. Take one scored example from a model you know. Write a baseline score and four feature contributions. Add them step by step until you reach the final verdict. Then ask whether any contribution looks like a social proxy.
Sketch from memory. Draw two windows into the judge. Label one global and one local. Under them, write which tool fits each window best: permutation importance, SHAP, or LIME.
Bridge. Classic models at least give us tools like SHAP and LIME. With LLMs, the harder question appears: when is an explanation actually faithful instead of just persuasive? → 06-explainability-llms.md