13. Honest admission — what AI ethics and fairness still cannot settle cleanly

~15 min read. We have better audits and controls now, but some fairness questions remain structurally open, contested, or underdefined.

Built on the ELI5 in 00-eli5.md. The jury instructions — the fairness rules for the courtroom — still leave hard cases where reasonable people disagree on what the judge should optimize.


Fairness is partly technical and partly political

Here is the honest part. There is no final universal fairness metric that resolves every deployment. Why? Because fairness is partly about math and partly about values. Which errors matter more? Which history counts? Which disparities are acceptable tradeoffs? Who gets to decide?

The judge can satisfy one set of jury instructions and violate another. We saw that already. But the deeper issue is not only metric conflict. It is value conflict. A lender, a civil-rights advocate, a product manager, and an affected user may not rank harms the same way. That disagreement does not disappear by adding one more benchmark.

technical layer         governance layer
┌──────────────────┐    ┌──────────────────┐
│ rates, scores,   │    │ values, rights,  │
│ calibration,     │ ◀─▶│ legitimacy,      │
│ monitoring       │    │ accountability   │
└──────────────────┘    └──────────────────┘

See. A senior answer must acknowledge both layers. Pretending fairness is only optimization is too narrow. Pretending it is only politics is too shallow.
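The technical half of that claim, that the judge can satisfy one set of jury instructions and violate another, can be made concrete with a small sketch. All counts below are invented for illustration, and the names `GroupOutcome`, `group_a`, and `group_b` are hypothetical. The two groups have different base rates (60 of 100 qualified versus 30 of 100), and the model has identical true-positive and false-positive rates in both, yet the selection rates still differ.

```python
from dataclasses import dataclass


@dataclass
class GroupOutcome:
    """Confusion counts for one group. All numbers are hypothetical."""
    true_pos: int
    false_neg: int
    false_pos: int
    true_neg: int

    @property
    def selection_rate(self) -> float:
        # Share of the whole group that receives a positive decision.
        total = self.true_pos + self.false_neg + self.false_pos + self.true_neg
        return (self.true_pos + self.false_pos) / total

    @property
    def true_positive_rate(self) -> float:
        # Share of qualified people who receive a positive decision.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def false_positive_rate(self) -> float:
        # Share of unqualified people who receive a positive decision.
        return self.false_pos / (self.false_pos + self.true_neg)


# Group A: 60 of 100 qualified. Group B: 30 of 100 qualified.
group_a = GroupOutcome(true_pos=48, false_neg=12, false_pos=4, true_neg=36)
group_b = GroupOutcome(true_pos=24, false_neg=6, false_pos=7, true_neg=63)

tpr_gap = abs(group_a.true_positive_rate - group_b.true_positive_rate)
fpr_gap = abs(group_a.false_positive_rate - group_b.false_positive_rate)
selection_gap = abs(group_a.selection_rate - group_b.selection_rate)

print(f"TPR gap:       {tpr_gap:.2f}")        # 0.00
print(f"FPR gap:       {fpr_gap:.2f}")        # 0.00
print(f"Selection gap: {selection_gap:.2f}")  # 0.21
```

The error rates match across groups, but selection rates do not. Which of these gaps should be zero is not something the arithmetic can decide. That choice sits in the governance layer.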

Counterfactual fairness is appealing, but reality is tangled

Many people want a clean question. Would this person receive the same verdict in a world where only the protected trait changed? That sounds elegant. It is the spirit behind counterfactual fairness.

Now what is the problem? Real lives are causally entangled. Protected attributes influence schooling, neighborhood, policing exposure, wealth, health access, language style, and many other features through society itself. So changing one trait "while holding everything else fixed" may describe an impossible world.

Use a tiny example. Candidate A and Candidate B both have a test score of 7. Candidate A attended a school with prestige level 8 and lives in a zip code with opportunity level 9. Candidate B attended a school with prestige level 5 and lives in a zip code with opportunity level 4. The hiring model score is test + school + zip. So A gets 7 + 8 + 9 = 24. B gets 7 + 5 + 4 = 16. The gap is 8 points, even though the test scores are identical.

Now remove the protected attribute column. The gap remains, because the score never read that column directly. School and zip still carry social history. Should the judge ignore them entirely? Maybe. Should it partially adjust them? Maybe. There is no universally agreed knob. That is the honest part.
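The arithmetic in this example can be checked with a minimal sketch. The scores come from the example above. The group labels `group_x` and `group_y` and the `proxy_weight` knob are assumptions added only for illustration. The knob stands in for the "partially adjust" option and is not a recommended method.

```python
# Candidates from the worked example. The "protected" values are placeholders.
CANDIDATES = {
    "A": {"protected": "group_x", "test": 7, "school": 8, "zip": 9},
    "B": {"protected": "group_y", "test": 7, "school": 5, "zip": 4},
}


def score(candidate, proxy_weight=1.0):
    """test + school + zip, with a hypothetical weight on the two proxy features."""
    return candidate["test"] + proxy_weight * (candidate["school"] + candidate["zip"])


def drop_protected(candidate):
    """Return a copy of the record without the protected attribute column."""
    return {key: value for key, value in candidate.items() if key != "protected"}


def gap(candidates, proxy_weight=1.0):
    """Score difference between A and B under a given proxy weight."""
    return score(candidates["A"], proxy_weight) - score(candidates["B"], proxy_weight)


blinded = {name: drop_protected(record) for name, record in CANDIDATES.items()}

print(gap(CANDIDATES))    # 8.0  original model: 24 vs 16
print(gap(blinded))       # 8.0  protected column removed, gap unchanged
print(gap(blinded, 0.5))  # 4.0  proxies half-weighted
print(gap(blinded, 0.0))  # 0.0  proxies ignored entirely
```

The code can produce any gap between 0 and 8 by turning the knob. It cannot say which setting is right, and that is exactly the open question.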

LLM fairness adds culture, language, and context ambiguity

With LLMs, the open problems multiply. Respectful wording varies by culture. Some groups reclaim slurs for in-group use, while the same words remain offensive coming from outsiders. Dialect markers can change tone judgments. The same safety refusal may feel protective in one context and dismissive in another. Benchmarks help. They do not close the gap.

Multilingual fairness is especially hard. A moderation model may behave well in English and poorly in low-resource languages. A help bot may sound warm in one language and robotic in another. A retrieval layer may surface different sources by region, changing the evidence file available to the LLM. The courtroom itself is not linguistically uniform.

Simple, no? Open-ended generation produces open-ended fairness questions. That is why humility matters even more for LLM systems.
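One way the "works in English, fails in low-resource languages" pattern shows up in an audit is a per-language slice of the false positive rate. The sketch below uses invented moderation records. The language tags, the counts, and the function name `false_positive_rate_by_language` are all assumptions for illustration.

```python
from collections import defaultdict

# Each record: (language, is_actually_toxic, model_flagged). Hypothetical data.
records = (
    [("en", False, False)] * 9
    + [("en", False, True)] * 1
    + [("en", True, True)] * 5
    + [("low_resource", False, False)] * 6
    + [("low_resource", False, True)] * 4
    + [("low_resource", True, True)] * 5
)


def false_positive_rate_by_language(rows):
    """Share of benign messages that the model wrongly flagged, per language."""
    benign = defaultdict(int)
    flagged_benign = defaultdict(int)
    for language, is_toxic, flagged in rows:
        if not is_toxic:
            benign[language] += 1
            if flagged:
                flagged_benign[language] += 1
    return {language: flagged_benign[language] / benign[language] for language in benign}


def overall_false_positive_rate(rows):
    """Same rate, pooled across every language."""
    benign_rows = [row for row in rows if not row[1]]
    return sum(1 for row in benign_rows if row[2]) / len(benign_rows)


per_language = false_positive_rate_by_language(records)
print(per_language)                          # {'en': 0.1, 'low_resource': 0.4}
print(overall_false_positive_rate(records))  # 0.25
```

The pooled number hides a fourfold difference between the slices. Even the slice view assumes the "is actually toxic" labels mean the same thing in every language, which is the harder problem described above.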

Participation is necessary, but hard to operationalize

Another honest gap is legitimacy. Who helps write the jury instructions? Only internal teams? Only the legal group? Only the loudest customers? Those are weak proxies for affected people.

Suppose a ranking product gets 50 complaints from heavy users and 5 from small creators. Volume suggests the heavy users matter more. But the creator group may lose actual income when exposure drops. Complaint count alone is not a neutral priority rule.
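A small sketch shows how the priority order flips once severity enters the picture. The complaint counts come from the example above. The severity weights are invented, and choosing them is itself a value judgment, not a measurement.

```python
# Complaint counts from the example above.
complaints = {"heavy_users": 50, "small_creators": 5}

# Hypothetical severity weights: harm per complaint on an arbitrary scale.
# Lost income is weighted far above inconvenience. This is an assumption.
severity = {"heavy_users": 1, "small_creators": 20}


def rank_by_volume(counts):
    """Order groups by raw complaint count, highest first."""
    return sorted(counts, key=counts.get, reverse=True)


def rank_by_weighted_harm(counts, weights):
    """Order groups by complaint count multiplied by a severity weight."""
    return sorted(counts, key=lambda group: counts[group] * weights[group], reverse=True)


print(rank_by_volume(complaints))                   # ['heavy_users', 'small_creators']
print(rank_by_weighted_harm(complaints, severity))  # ['small_creators', 'heavy_users']
```

Neither ranking is neutral. The first silently sets every severity weight to 1. The second makes the weights explicit, which at least lets someone challenge them.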

The appeal process therefore needs participation design. Which communities are consulted? Which harms are visible in metrics and which are only visible in testimony? Who can challenge the judge without insider power?

internal metrics ──→ useful, but incomplete
external complaints ──→ useful, but uneven
affected stakeholders ──→ necessary for legitimacy

Look. Participation is not magic either. Stakeholders disagree. Some groups are underrepresented even in feedback channels. Some harms appear only after long-term exposure. But without participation, the courtroom risks becoming procedurally neat and socially deaf.

What honest senior engineers should say

Say this. We can reduce harm significantly. We can define clearer jury instructions. We can audit slices. We can monitor drift. We can document limits. We can create appeal pathways.

But we cannot promise a perfectly neutral truth across every group, culture, and downstream use. We cannot fully remove all social history from the evidence file. We cannot settle value disagreements with one formula. We cannot guarantee that a polished explanation from the judge is the final moral answer. That is not weakness. That is disciplined honesty.

Look. The mature posture is not cynical. It is serious. Keep improving controls. Keep involving affected stakeholders. Keep documenting tradeoffs. Keep leaving room for human challenge. That is how accountable AI work actually looks.


Where this lives in the wild

  • Hiring-platform fairness councils — responsible AI lead: face unresolved debates about how much social history to remove versus how much job signal to preserve.
  • Multilingual moderation teams — trust and safety researcher: still struggle to align toxicity judgments across dialects, reclaimed language, and low-resource contexts.
  • Credit-model governance groups — model risk executive: must choose between competing fairness metrics when base rates and legal constraints pull in different directions.
  • LLM assistant builders — product ethics owner: cannot fully guarantee that explanations, refusals, and tone feel equally fair across cultures and identities.
  • Public-sector decision support teams — policy accountability reviewer: confront disagreements about legitimacy, appeals, and acceptable automation authority even when the metrics look decent.

Pause and recall

  • Why is fairness partly technical and partly political?
  • In the counterfactual example, why did removing the protected column fail to close the gap?
  • Why does multilingual LLM fairness remain especially difficult?
  • What is the honest senior-level claim about what fairness work can and cannot achieve?

Interview Q&A

Q: Why is it misleading to promise one universal fairness metric for every AI system? A: Because different products prioritize different harms, and many fairness criteria conflict both mathematically and normatively across contexts. Common wrong answer to avoid: "Because fairness metrics are all arbitrary and should be ignored."

Q: Why does removing protected attributes not solve deeper fairness problems? A: Because social history survives in correlated features, labels, and institutional processes that still shape the evidence file. Common wrong answer to avoid: "Because protected attributes are always required as model inputs."

Q: Why is humility a senior answer in AI ethics rather than a weak answer? A: Because responsible deployment is about measurable harm reduction under uncertainty, not about making impossible guarantees for complex socio-technical systems. Common wrong answer to avoid: "Because uncertainty means teams should avoid all concrete mitigations."

Q: Why might the next frontier of fairness work involve retrieval and information access, not only model scoring? A: Because what evidence the system surfaces first shapes both judgment and explanation, so fairness increasingly depends on search, ranking, and source coverage. Common wrong answer to avoid: "Because retrieval systems are neutral by default and only generative models can be biased."


Apply now (5 min)

Exercise. Write one fairness question in your domain that has no clean single-metric answer. Then list two stakeholders who might disagree about the right jury instructions and why.

Sketch from memory. Draw two columns: technical controls and value judgments. Place metrics, monitoring, and documentation on one side. Place legitimacy, rights, and tradeoffs on the other. Then connect them with arrows.


Bridge. We have spent this module asking whether the judge is fair and what evidence shaped the verdict. The next module moves one layer earlier: how do systems retrieve the right evidence in the first place, and what happens when search and ranking decide what the judge gets to see? → 00-eli5.md