08. Debiasing techniques — changing the evidence file, the judge, or the verdict rule¶
~16 min read. Fairness mitigation happens before training, during training, and after scoring, with different tradeoffs each time.
Built on the ELI5 in 00-eli5.md. The jury instructions — the fairness rules for the courtroom — can be enforced by reshaping the evidence file, retraining the judge, or adjusting how verdicts are issued.
Picture first: three places to intervene¶
Look at the pipeline. Bias can enter before training. During optimization. Or at the thresholding step. So mitigation can also happen in three places. Pre-processing. In-processing. Post-processing. Simple, no?
evidence file ──→ judge training ──→ score ──→ verdict
│                 │                  │
├── resample      ├── adversarial    └── recalibrate / retune threshold
└── reweight      └── fairness regularizer
No single method is always best. Pre-processing is model-agnostic and often simple. In-processing can directly optimize fairness constraints. Post-processing is fast to deploy but can be contentious when it applies different thresholds to different groups. The right choice depends on legal, product, and operational constraints.
Pre-processing: resampling and reweighting the evidence file¶
Suppose the evidence file contains 900 examples from Group A and 100 from Group B. If you train naively, the loss is dominated by Group A. The judge can look globally strong while underlearning Group B.
Resampling changes the batch mix. You may oversample Group B or undersample Group A. Reweighting changes the loss contribution. Same examples. Different influence.
Worked example. Assume average loss per sample is 0.20 for Group A and 0.40 for Group B. Unweighted total loss contribution is:
- Group A: 900 × 0.20 = 180
- Group B: 100 × 0.40 = 40
- Total = 220
Group B has higher per-sample loss, but still contributes much less overall. Now assign weight 1 to Group A and weight 5 to Group B. Weighted contribution becomes:
- Group A: 900 × 1 × 0.20 = 180
- Group B: 100 × 5 × 0.40 = 200
- Weighted total = 380
See what happened. Group B's share of the total loss rose from 40/220 (about 18%) to 200/380 (about 53%). The optimizer now feels Group B mistakes strongly. That can improve slice performance. But it may also reduce overall accuracy or overfit small noisy groups. So what to do? Tune weights against your jury instructions, not by guesswork.
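The arithmetic in the worked example can be reproduced in a few lines. This is a minimal sketch that assumes the group counts, average losses, and weights given above; the function name `group_loss_contributions` is hypothetical and not taken from any library.

```python
def group_loss_contributions(counts, avg_loss, weights=None):
    """Return (per-group contribution, total, per-group share of total)."""
    if weights is None:
        weights = {g: 1.0 for g in counts}  # unweighted baseline
    contrib = {g: counts[g] * weights[g] * avg_loss[g] for g in counts}
    total = sum(contrib.values())
    share = {g: contrib[g] / total for g in contrib}
    return contrib, total, share


# Numbers from the worked example.
counts = {"A": 900, "B": 100}
avg_loss = {"A": 0.20, "B": 0.40}

_, total_plain, share_plain = group_loss_contributions(counts, avg_loss)
_, total_weighted, share_weighted = group_loss_contributions(
    counts, avg_loss, weights={"A": 1.0, "B": 5.0}
)

print(f"unweighted: total={total_plain:.0f}, Group B share={share_plain['B']:.1%}")
print(f"weighted:   total={total_weighted:.0f}, Group B share={share_weighted['B']:.1%}")
```

The per-sample losses here are a snapshot. In real training they change as the judge updates, so the shares should be recomputed rather than assumed fixed.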
In-processing: teach the judge to forget sensitive clues¶
In-processing methods change training itself. One approach adds a fairness penalty to the loss. Another uses adversarial debiasing. The predictor tries to do the task well. An adversary tries to recover the protected attribute from the learned representation. The predictor is rewarded for task skill and penalized if the adversary can easily infer the sensitive group.
Picture this as two forces on the judge. One says, "predict well." The other says, "do not encode sensitive shortcuts too strongly." That tension can reduce proxy leakage in the representation.
representation h
      │
      ├── task head ─────────→ predict label well
      └── adversary ─────────→ recover group?
              ▲
              └── predictor tries to make this hard
This is elegant. It is also not magic. If the label itself is biased, the model can still learn harmful patterns. If proxies are everywhere, complete removal may crush utility. And fairness penalties require careful optimization. The judge is being pulled by multiple objectives now.
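To make "pulled by multiple objectives" concrete, here is a minimal sketch of the fairness-penalty variant of in-processing. It assumes one particular penalty, the squared gap in mean predicted score between two groups, which is only one of many possible choices. The names `penalized_loss` and `lam` are hypothetical, and the logits are toy values, not results from a real model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def penalized_loss(logits, labels, groups, lam):
    """Return (total, task_loss, gap) for BCE + lam * (mean-score gap)^2.

    Assumes exactly two groups; the gap is the difference in mean
    predicted score between them (a demographic-parity-style penalty).
    """
    probs = [sigmoid(z) for z in logits]
    eps = 1e-12  # guards against log(0)
    task_loss = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    ) / len(labels)

    names = sorted(set(groups))
    if len(names) != 2:
        raise ValueError("this sketch handles exactly two groups")
    means = []
    for name in names:
        scores = [p for p, g in zip(probs, groups) if g == name]
        means.append(sum(scores) / len(scores))
    gap = means[0] - means[1]
    return task_loss + lam * gap ** 2, task_loss, gap


# Toy logits: the judge scores Group A much higher than Group B.
logits = [2.0, 1.0, -1.0, -2.0]
labels = [1, 1, 0, 0]
groups = ["A", "A", "B", "B"]

total_no_penalty, task_loss, gap = penalized_loss(logits, labels, groups, lam=0.0)
total_with_penalty, _, _ = penalized_loss(logits, labels, groups, lam=1.0)
print(f"task loss={task_loss:.3f}, gap={gap:.3f}, penalized total={total_with_penalty:.3f}")
```

With `lam=0.0` the judge only hears "predict well". Raising `lam` makes the score gap more expensive, which is the second force. How large `lam` should be is a tuning decision, and this penalty does nothing about biased labels.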
Post-processing: change the verdict rule after scores exist¶
Sometimes retraining is too slow. Sometimes the score is already calibrated and owned by another team. Then post-processing becomes attractive. You keep the score. You change how it becomes a verdict.
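Suppose one shared threshold gives:
- Group A: TPR 0.85 and FPR 0.25
- Group B: TPR 0.70 and FPR 0.12
The model is harsher on Group B, assuming a positive verdict is the favorable one: it approves fewer of Group B's true positives. Now lower the Group B threshold slightly. Both rates rise for that group, and the new metrics become:
- Group A: TPR 0.85 and FPR 0.25
- Group B: TPR 0.80 and FPR 0.20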
The gap narrows. Good. But maybe calibration across groups shifts. Maybe operations dislike group-specific thresholds. Maybe the legal team objects depending on jurisdiction and use case. So post-processing is practical, but politically and legally sensitive. That belongs in the case record.
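The effect of moving one group's threshold can be checked directly from scores and labels. The sketch below uses hypothetical toy scores for Group B only, so it does not reproduce the 0.70 and 0.80 figures above. It only shows the mechanism: lowering the threshold raises TPR and FPR together. The helper name `tpr_fpr` is an assumption for illustration.

```python
def tpr_fpr(scores, labels, threshold):
    """Return (TPR, FPR) when the verdict is positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


# Hypothetical Group B scores: five true positives, then five true negatives.
scores_b = [0.90, 0.80, 0.70, 0.55, 0.45, 0.60, 0.52, 0.30, 0.20, 0.10]
labels_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

shared = tpr_fpr(scores_b, labels_b, threshold=0.60)   # same threshold as everyone
lowered = tpr_fpr(scores_b, labels_b, threshold=0.50)  # group-specific threshold

print("threshold 0.60 -> (TPR, FPR):", shared)
print("threshold 0.50 -> (TPR, FPR):", lowered)
```

On this toy data TPR moves from 0.6 to 0.8 while FPR moves from 0.2 to 0.4. The score itself is untouched; only the verdict rule changes, which is why both numbers belong in the case record.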
Choosing mitigation without fooling yourself¶
Now what is the trap? Teams often run one mitigation, watch one metric improve, and declare victory. Do not do that. Check the full tradeoff surface. Overall utility. Slice utility. Calibration. Latency or retraining cost. Operational complexity. User appeal burden. Regulatory acceptability.
The appeal process should compare before and after tables. Not only one fairness number. Not only one benchmark. You may fix false negatives and worsen false positives. You may help one group and hurt another. You may reduce binary disparity while making explanations less stable. Yes? Mitigation is engineering under constraints.
A mature team says, "We used reweighting to reduce false negative disparity from 18 points to 7 points, at a three-point drop in overall precision." That is honest. That is much better than, "We debiased the model."
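A before-and-after table like the one behind that sentence can be generated mechanically. This is a minimal sketch: the disparity figures (18 and 7) come from the quote above, while the absolute precision values (80 and 77) are hypothetical placeholders chosen only to give a three-point drop. All values are assumed to be in percentage points, and the function name `before_after_table` is illustrative.

```python
def before_after_table(before, after):
    """Return rows of (metric, before, after, delta) for metrics present in both."""
    return [
        (metric, before[metric], after[metric], after[metric] - before[metric])
        for metric in before
        if metric in after
    ]


# Values in percentage points. Precision levels are hypothetical placeholders.
before = {"fn_disparity_pts": 18, "overall_precision_pts": 80}
after = {"fn_disparity_pts": 7, "overall_precision_pts": 77}

rows = before_after_table(before, after)
for metric, b, a, delta in rows:
    print(f"{metric:<24} before={b:>3} after={a:>3} delta={delta:+d}")
```

A real report would add rows for slice utility, calibration, and cost, so that an improvement on one line cannot hide a regression on another.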
Where this lives in the wild¶
- Fairlearn mitigation workflows — responsible AI engineer: compare reweighting, threshold adjustment, and constraint-based training against explicit fairness metrics.
- Credit underwriting teams using XGBoost — model validator: often start with sample weighting before attempting more complex in-training fairness objectives.
- Computer vision moderation pipelines — ML research lead: rebalance underrepresented classes and lighting conditions before retraining large detectors.
- Hiring-screen vendor audits — compliance analyst: review whether post-processing thresholds reduce harmful rejection disparity without violating policy constraints.
- Healthcare prioritization systems — clinical model owner: test whether fairness penalties improve underserved-patient recall without degrading safety-critical precision too far.
Pause and recall¶
- What are the three major places where debiasing interventions can happen?
- In the weighted-loss example, why did Group B influence training much more after reweighting?
- Why can adversarial debiasing help without solving every fairness problem?
- Why is post-processing attractive operationally but sometimes controversial?
Interview Q&A¶
Q: Why start with reweighting or resampling and not jump straight to adversarial debiasing?
A: Because simpler interventions are easier to audit, implement, and compare, and they often deliver meaningful gains before optimization complexity rises.
Common wrong answer to avoid: "Because adversarial debiasing only works for image models."
Q: Why is a fairness-improved model not automatically a deployment win?
A: Because mitigation changes tradeoffs across utility, calibration, legal acceptability, and operational complexity, not just one disparity number.
Common wrong answer to avoid: "Because fairness always reduces accuracy so deployment is impossible."
Q: Why can post-processing thresholds be effective even without retraining?
A: Because many disparities emerge at the decision boundary, so changing how scores turn into verdicts can materially alter error rates.
Common wrong answer to avoid: "Because thresholds can create new information the model never learned."
Q: Why must teams report before-and-after tables instead of saying a model was debiased?
A: Because every mitigation improves some metrics while potentially worsening others, and responsible review needs the explicit tradeoff surface.
Common wrong answer to avoid: "Because the word debiasing is banned in responsible AI documents."
Apply now (5 min)¶
Exercise. Take a toy imbalanced dataset. Compute how much loss each group contributes before and after weighting. Then write one sentence on how the judge's learning pressure changes.
Sketch from memory. Draw the pipeline with three intervention points. Label pre-processing, in-processing, and post-processing. Under each, write one method name and one risk.
Bridge. So far we mostly discussed tabular or scoring models. LLMs bring a wider kind of harm, where the verdict is text itself and fairness includes representation, stereotype, and allocation effects. → 09-fairness-in-llms.md