07. Bias in feedback¶

Calibration uses feedback as ground truth. The next discipline is awareness of the biases (selection, response, sycophancy) that distort feedback if left uncorrected. Without it, the team optimises against a non-representative signal.

A platform engineer at a Mumbai SaaS company drives product changes based on user feedback. The team is responsive; complaints are addressed. Six months in, an audit reveals that the team has been over-indexing on a vocal 2% of users — the ones who consistently provide explicit feedback. The other 98% have different needs; some features prioritised for the vocal minority have produced no measurable change in the broader user base's satisfaction. The fix is bias awareness: weight feedback by representativeness, triangulate explicit with implicit, periodically survey the silent majority through proactive (representative) sampling.

This chapter is that awareness. Feedback is signal; bias is the noise inside the signal; the discipline is to know which is which.

The three biases¶

Bias	What happens	Distortion
Selection bias	Who responds is not representative of who uses	Over-indexes on responders' preferences
Response bias	Users with extreme reactions respond more than users with moderate ones	Over-indexes on the loud minority
Sycophancy bias	Users praise when they think the system needs encouragement; criticise less than warranted	Inflates positive signal

Each is well documented in user research; agent feedback inherits all three from the broader UX research tradition.

Selection bias¶

Who responds to explicit feedback is a non-random sample of users.

Patterns:

Power users respond more than casual users.
Frustrated users respond more than satisfied users.
Enterprise users respond more (paid contracts; they engage with the platform team).
Tech-comfortable users respond more (they find the buttons).

A platform that treats feedback responders as the user population is reasoning about responders, not users. Decisions optimised for responders may harm the broader population.

Mitigation. Slice feedback by user attributes; compare responder demographics to overall user demographics. Where significant skew exists, weight the feedback accordingly or proactively sample the under-represented populations.
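One common way to "weight the feedback accordingly" is post-stratification: each response is weighted by its segment's share of all users divided by that segment's share of responders. The sketch below is a minimal illustration, not a prescribed implementation; the segment names, the counts, and both function names are hypothetical.

```python
from collections import Counter

def poststratification_weights(population_share, responder_segments):
    """Weight per segment = share of all users / share of responders."""
    counts = Counter(responder_segments)
    total = sum(counts.values())
    weights = {}
    for segment, pop_share in population_share.items():
        resp_share = counts.get(segment, 0) / total
        # A segment with no responders cannot be reweighted;
        # None marks it as a candidate for proactive sampling.
        weights[segment] = pop_share / resp_share if resp_share > 0 else None
    return weights

def weighted_positive_rate(feedback, weights):
    """feedback is a list of (segment, is_positive) pairs."""
    num = den = 0.0
    for segment, is_positive in feedback:
        w = weights.get(segment)
        if w is None:
            continue  # segment unknown or unweightable
        num += w * (1.0 if is_positive else 0.0)
        den += w
    return num / den if den else float("nan")

# Hypothetical data: power users are 10% of users but 60% of responders.
population_share = {"power": 0.10, "casual": 0.90}
feedback = (
    [("power", True)] * 54 + [("power", False)] * 6
    + [("casual", True)] * 16 + [("casual", False)] * 24
)

weights = poststratification_weights(population_share, [s for s, _ in feedback])
raw_rate = sum(p for _, p in feedback) / len(feedback)      # about 0.70
adjusted_rate = weighted_positive_rate(feedback, weights)   # about 0.45
```

With these made-up numbers the raw positive rate is about 70%, while the reweighted rate is about 45%. Reweighting only corrects for the attributes it slices on; it cannot recover segments that never respond at all.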

Response bias¶

Users at the extremes of satisfaction respond more than users in the middle.

A user who is delighted clicks thumbs-up; a user who is angry clicks thumbs-down; a user who is mildly satisfied does nothing. The collected feedback is bimodal; the median user is missing.

Consequence: among responders, the negative-thumbs rate overstates dissatisfaction and the positive-thumbs rate overstates delight, because the merely-okay users never register. Most of the true distribution sits in the middle, largely unobserved.

Mitigation. Combine explicit (which catches extremes) with implicit signals (which cover the median). Treat the explicit rate as a directional signal, not an absolute representation.
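To see why the explicit rate is directional rather than absolute, it helps to compute worst-case bounds: how low or high the true positive share could be if every silent user had responded. The sketch below assumes at most one response per user; the counts and the function name are hypothetical.

```python
def worst_case_bounds(n_users, n_positive, n_negative):
    """Bounds on the share of ALL users who would rate positive."""
    if n_positive + n_negative > n_users:
        raise ValueError("more responses than users")
    silent = n_users - n_positive - n_negative
    lower = n_positive / n_users             # every silent user would rate negative
    upper = (n_positive + silent) / n_users  # every silent user would rate positive
    return lower, upper

# Hypothetical counts: 1,000 users, 40 thumbs-up, 10 thumbs-down.
observed_rate = 40 / (40 + 10)                  # 0.8 among responders
lower, upper = worst_case_bounds(1000, 40, 10)  # 0.04 and 0.99
```

An observed rate of 80% is compatible with anything from 4% to 99% of all users being positive. Implicit signals, which cover the silent users, are what narrow that range.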

Sycophancy bias¶

In agent contexts specifically, users sometimes provide feedback shaped by what they think the agent "wants" or what is socially expected.

Users praise the agent when the response is reasonable, even if not great.
Users hesitate to be harshly critical of a polite AI assistant.
Users may provide positive feedback as encouragement, expecting improvement.

The effect: positive feedback over-states real satisfaction; negative feedback is more reliable as a signal of real problems.

Mitigation. Weight negative feedback as the stronger signal; treat positive feedback as confirmation rather than measurement. For calibration (chapter 06), the disagreement cases on the negative side are more informative than those on the positive side.

Cohort comparison¶

The strongest discipline against bias is cohort comparison. The team compares feedback rates and content across cohorts:

Responders vs all users (selection bias check)
Power users vs casual (engagement-level slicing)
Premium vs free tier (segment slicing)
Recent users vs long-term (temporal slicing)
Mobile vs desktop (channel slicing)

Where cohorts differ significantly, the aggregate is hiding structure. Decisions should then be made per cohort, not only on the aggregate.
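One way to judge whether two cohorts "differ significantly" is a two-proportion z-test on their positive-feedback rates. The sketch below uses only the standard library; the cohort counts and the function name are hypothetical, and the test is one reasonable choice rather than the only one.

```python
import math

def two_proportion_z_test(pos_a, n_a, pos_b, n_b):
    """Return (z, two-sided p-value) for the difference in positive rates."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both cohorts are all-positive or all-negative
    z = (p_a - p_b) / se
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: mobile 120 positive of 400, desktop 270 of 600.
z, p_value = two_proportion_z_test(120, 400, 270, 600)
differs = p_value < 0.05
```

The normal approximation assumes reasonably large cohorts. When many slices are compared at once, some will look significant by chance, so a stricter threshold is prudent.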

Proactive sampling¶

To address selection bias, the team proactively samples representative users for feedback. Methods:

Periodic in-product surveys to a random sample.
Outbound user research (the team contacts a sample for structured feedback).
Customer-success conversations with a planned cross-section of users, not just the ones who complain.

The proactive samples produce smaller volumes but more representative signal. Combine with the larger volume of self-selected feedback for triangulation.
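The trade-off between volume and representativeness can be made concrete. The sketch below draws a reproducible simple random sample of users for a survey and estimates the margin of error for a proportion at a given sample size. The user IDs, sample size, and function names are hypothetical, and the margin assumes every sampled user answers; non-response within the survey reintroduces selection bias.

```python
import math
import random

def draw_survey_sample(user_ids, sample_size, seed):
    """Simple random sample without replacement; the seed makes it auditable."""
    rng = random.Random(seed)
    return rng.sample(list(user_ids), sample_size)

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical population of 50,000 user IDs; survey 400 of them.
all_users = range(50_000)
sample = draw_survey_sample(all_users, 400, seed=7)

# Worst case is p_hat = 0.5: roughly +/- 4.9 percentage points at n = 400.
moe = margin_of_error(0.5, 400)
```

A few hundred representative responses give a usable estimate, which is why a small proactive sample can outweigh a much larger self-selected one.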

Triangulation¶

Three signals — explicit feedback, implicit signals, proactive samples — triangulate the truth. Each has biases; the combination is more robust than any one.

When all three agree, confidence is high. When two agree and one diverges, investigate the divergent one. When all three disagree, the team has a complex situation that needs careful analysis.
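The agreement rule above can be written down as a small function. This is a sketch under assumptions: each signal has already been reduced to a coarse direction, and the `Direction` values, the function name, and the returned messages are all hypothetical.

```python
from enum import Enum

class Direction(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    MIXED = "mixed"

def triangulate(explicit, implicit, proactive):
    """Return (verdict, name of the divergent signal or None)."""
    signals = {"explicit": explicit, "implicit": implicit, "proactive": proactive}
    if len(set(signals.values())) == 1:
        return "high confidence: all three agree", None
    for name, value in signals.items():
        others = [v for n, v in signals.items() if n != name]
        # Two agree and this one stands apart.
        if others[0] == others[1] and value != others[0]:
            return "investigate the divergent signal", name
    return "complex: all three disagree; analyse before deciding", None

# Explicit thumbs positive, abandonment rising, proactive sample positive.
verdict, divergent = triangulate(
    Direction.POSITIVE, Direction.NEGATIVE, Direction.POSITIVE
)
```

Here `divergent` is `"implicit"`, which points the investigation at the abandonment data. Reducing each signal to a single direction is itself a judgment call; the function only encodes what to do once that judgment is made.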

For most product decisions, triangulation is the discipline that prevents bias-driven mistakes.

What bias awareness looks like in practice¶

Concrete behaviours:

The team reports feedback rates with the responder demographics noted.
Dashboards show cohort-sliced metrics, not just aggregates.
Quarterly proactive samples supplement the self-selected feedback.
Major product decisions cite triangulated signals, not single sources.
Calibration (chapter 06) uses cases representative of the user base, not skewed toward responders.

The disciplines are operational; they require time and attention; they prevent the chapter-opening pattern of over-indexing on vocal minorities.

What bias awareness does not solve¶

The fundamental noise in feedback. Even unbiased feedback has variance.
Disagreements between users. Different users have different needs; the team must navigate trade-offs.
The cost of proactive sampling. Representative samples cost time and money; not every platform can sustain them.

Common mistakes¶

Treating feedback as user-representative. Optimising for responders harms the silent majority.

Weighting positive feedback as much as negative. Sycophancy bias inflates positive; negative is the more reliable signal.

Aggregate-only dashboards. Cohort patterns stay hidden.

No proactive sampling. Only self-selected feedback informs decisions.

Single-signal decisions. Major decisions rest on one biased signal; the triangulation discipline is ignored.

Interview Q&A¶

Q1. The team has been driving product changes based on user feedback for six months. The broader user base is not happier. What might be happening?
Selection bias. The feedback comes from a non-representative sample of users (typically power users, frustrated users, and tech-comfortable users). Changes optimised for them may not address the broader user base's needs. The team should compare responder demographics to all-user demographics; weight feedback by representativeness; triangulate with implicit signals (which cover everyone) and proactive sampling (which targets representativeness).
Wrong-answer notes: "the changes were wrong" misses the systemic cause.

Q2. Walk through the three biases in agent feedback.
Selection: who responds is not random; power users, frustrated users, tech-comfortable users respond more. Response: extremes respond more than middle; the merely-okay user is unrepresented. Sycophancy: users praise the agent or moderate their criticism; positive feedback is inflated. Combined effect: feedback over-states extreme reactions and under-represents median users. Mitigations are cohort comparison, implicit-signal triangulation, proactive sampling.
Wrong-answer notes: missing one of the three or conflating them.

Q3. How would you triangulate explicit feedback, implicit signals, and proactive sampling for a major product decision?
Look at all three. If they agree (e.g., positive trend in explicit thumbs, decreasing abandonment, positive proactive-sample responses), confidence is high; ship. If two agree and one diverges (e.g., positive explicit and proactive, but rising abandonment), investigate the divergent one: which segment is abandoning? The aggregate hides important structure. If all three disagree, the situation is complex; do not ship without deeper analysis. Triangulation is the discipline that prevents single-source bias.
Wrong-answer notes: "trust the most recent signal" loses the multi-source value.

Q4. The team's explicit feedback is 80% positive; proactive sample shows mixed; abandonment is rising. What do you do?
Investigate. The three signals diverge. Explicit is selection- and sycophancy-biased upward; proactive is more representative; abandonment is unsentimental about who responds. The likely picture: vocal users are happy; broader users are mixed; some are giving up. Investigate the abandonment cohort: which segment, which cases. The decision is not "trust one signal" but "understand why they diverge and act on the underlying issue." Often the abandoning users are a segment the platform has not been serving well; product changes there are the response.
Wrong-answer notes: "80% positive is enough" misses the signal in the abandonment rise.

What to do differently after reading this¶

Compare responder demographics to all-user demographics; surface the gap.
Slice feedback dashboards by cohort.
Run periodic proactive samples for representative signal.
Triangulate explicit, implicit, and proactive sources for major decisions.
Weight negative feedback as the stronger signal in agent contexts.

Bridge. Bias awareness improves how feedback is used. Privacy ensures the feedback itself is handled responsibly. The next chapter applies the privacy discipline to feedback data. → 08-privacy-in-feedback.md