02. Explicit feedback capture¶

Diagnosis in hand. The first surface to design is explicit feedback — direct user signals about whether the AI's response was good. Thumbs, ratings, comments, structured forms. Each has design trade-offs; the discipline is to pick the right shape for the workload.

A platform engineer at a Chennai SaaS company adds thumbs to the agent's responses. Response rate is 0.3%: three clicks for every thousand responses. The team is disappointed. The engineer investigates. The thumbs appear small, after the response, with no clear value to the user. She redesigns: thumbs are visible inline with the response; an optional comment opens on click; users who provide feedback see a small "thanks, we'll improve" acknowledgement. Response rate jumps to 3%. Still small in absolute terms but 10× higher; the volume is now enough to drive useful weekly analysis.

This chapter is the discipline of explicit feedback. Capture so users actually respond; structure so the response is useful; route so it feeds the pipeline.

The four explicit signals¶

Signal	What it captures	Volume and trade-offs
Thumbs up/down	Binary satisfaction	High-volume; low effort per response; coarse signal
Star rating	Graded satisfaction	Medium-volume; nuance over binary; risk of mid-rating clustering
Free-text comment	The user's specific concern or appreciation	Low volume; high information density
Structured response form	Multi-criteria feedback	Lowest volume; highest specificity; appropriate for high-stakes contexts

Most platforms use a mix: thumbs as the base, optional comment on click, structured forms for specific contexts (post-resolution surveys).

What good thumbs look like¶

A well-designed thumbs implementation:

Visible. Inline with the response, not buried.
Lightweight. One click; no required follow-up.
Optional comment. The user can elaborate but is not required to.
Acknowledged. "Thanks, we'll use this to improve" with a small visual confirmation.
Distinguishable. Up and down are clearly different and clearly mean what the user expects.
Reversible. The user can change their mind within the same session.

The implementation difference between 0.3% and 3% response rate is usually visibility and weight. A thumbs the user has to hunt for is not a thumbs.
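The server side of these properties can be sketched briefly. The following is a minimal illustration, not a reference implementation: `ThumbsStore`, its in-memory dictionaries, and the `record` method are hypothetical names, and a real platform would persist votes elsewhere. It assumes one vote per response per session, so a second click replaces the first (reversible), the comment is stored only when supplied (optional), and the call returns the acknowledgement text for the UI to show.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ThumbsStore:
    """In-memory stand-in for wherever votes are persisted (hypothetical)."""
    # Keyed by (session_id, response_id): at most one vote per response per session.
    votes: Dict[Tuple[str, str], str] = field(default_factory=dict)
    comments: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def record(self, session_id: str, response_id: str, value: str,
               comment: Optional[str] = None) -> str:
        if value not in ("up", "down"):
            raise ValueError("value must be 'up' or 'down'")
        key = (session_id, response_id)
        # Reversible: a later click in the same session overwrites the earlier one.
        self.votes[key] = value
        # Optional comment: stored only when the user chose to write one.
        if comment:
            self.comments[key] = comment
        # Acknowledged: the caller displays this next to the thumbs.
        return "Thanks, we'll use this to improve"


store = ThumbsStore()
store.record("sess_1", "resp_1", "up")
ack = store.record("sess_1", "resp_1", "down", comment="got my account number wrong")
```

Keying on the pair rather than appending every click keeps a change of mind from being counted as two votes.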

What good star ratings look like¶

For platforms where binary is too coarse:

3 to 5 stars. Fewer levels leave less room for ratings to cluster in the middle.
Anchored labels. "1 = wrong; 3 = okay; 5 = perfect." Anchors reduce relative-rating confusion.
Optional reasoning. A comment field on submission.
Avoided when binary suffices. Binary thumbs are faster for the user and the analysis; stars only when the nuance is needed.

The biggest risk with stars is mid-cluster (everyone clicks 3 or 4); the anchored labels mitigate it.
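Whether mid-clustering is actually happening can be checked from the rating distribution. This is a small sketch under stated assumptions: a 5-point scale, "middle" taken to mean 3 or 4 as in the text, and made-up sample data. The function name `mid_cluster_share` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def mid_cluster_share(ratings: Iterable[int],
                      mid_values: Tuple[int, ...] = (3, 4)) -> float:
    """Fraction of ratings that land on the middle values of the scale."""
    counts = Counter(ratings)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no ratings yet, so nothing to report
    return sum(counts[v] for v in mid_values) / total


sample = [3, 4, 3, 5, 4, 3, 1, 4, 3, 4]  # made-up ratings for illustration
share = mid_cluster_share(sample)
```

A share that stays high after anchored labels are added suggests the scale is not buying nuance over binary thumbs.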

What free-text comments tell you¶

Comments are the highest-information signal per response. A user writes:

"It got my account number wrong" — specific failure mode the team should capture.
"This was helpful but I had to ask three times" — implicit context about the multi-turn experience.
"The tone felt cold" — qualitative signal the metrics miss.

Comments are read by humans, not aggregated mechanically. The team's discipline is to actually read them — weekly, by a person with context (product, support, or engineering with domain knowledge). The act of reading is the analysis; the patterns surface from the reader's intuition.

For high-volume comments, classification can structure them — an LLM categorises by failure mode, the categories drive aggregation. The human still reads a sample.
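The classify-then-sample flow might look like the sketch below. Everything named here is an assumption for illustration: `keyword_classifier` is a trivial stand-in for the LLM call, the category names are invented, and `categorise` is a hypothetical helper. The point is the shape: categories feed aggregation, and a random sample still goes to a human reader.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple


def keyword_classifier(comment: str) -> str:
    """Stand-in for the LLM call; the categories are hypothetical."""
    text = comment.lower()
    if "wrong" in text:
        return "factual_error"
    if "tone" in text or "cold" in text:
        return "tone"
    if "three times" in text:
        return "multi_turn_friction"
    return "other"


def categorise(comments: List[str], classify: Callable[[str], str],
               sample_size: int, seed: int = 0) -> Tuple[Dict[str, int], List[str]]:
    """Count comments per category and draw a sample for a human to read."""
    counts = Counter(classify(c) for c in comments)
    rng = random.Random(seed)  # seeded so the weekly sample is reproducible
    reading_sample = rng.sample(comments, min(sample_size, len(comments)))
    return dict(counts), reading_sample


comments = [
    "It got my account number wrong",
    "This was helpful but I had to ask three times",
    "The tone felt cold",
]
counts, to_read = categorise(comments, keyword_classifier, sample_size=2)
```

Passing the classifier in as a callable keeps the aggregation code unchanged when the stand-in is swapped for a real model call.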

Structured response forms¶

For specific contexts where multi-criteria feedback adds value:

Post-resolution surveys for support cases ("did the AI understand your issue? did it provide a useful answer? would you use it again?")
Periodic NPS-style surveys (less informative for AI specifically; more about overall product)
Domain-specific forms (a medical-decision platform may ask about accuracy and trust separately)

These have low response rates (0.1-2%) and high effort per response; the team uses them sparingly, in contexts where the deeper signal warrants the user's time.

When to ask¶

The timing of the feedback prompt matters.

Immediately after the response. The user's memory of the response is fresh; the click is in-context. Highest response rate; signal is per-response.

End of session. A summary "how was your experience?" at the end. Lower per-response specificity; covers the overall conversation.

After resolution. "Did the AI help you resolve your issue?" Tied to outcome, not just response. Most useful for support-style workloads.

Per-response is the default; end-of-session or post-resolution are supplemental for the contexts where they fit.

What not to do¶

Required feedback. Forcing the user to provide feedback before continuing is hostile. Response rate goes up; quality goes down (random clicks to dismiss); long-term user trust erodes.

Manipulative defaults. Pre-checking "this was helpful" or making the down-vote harder to click than the up-vote produces biased data.

Feedback that disappears. The user clicks; nothing visible happens; the user feels the click was performative. The acknowledgement matters.

Unactionable categorisation. Asking "what category of feedback?" with 12 categories produces noise; the user picks one without thinking; the data is low-information.

What to capture with each signal¶

Every feedback event is a structured record. The minimum:

feedback_id: fb_01HNF...
ts: 2026-05-25T11:14:02Z
user_id_hash: <hashed identifier>
session_id: sess_...
response_id: resp_...        # the specific AI response being rated
signal_type: thumbs           # thumbs | rating | comment | form
signal_value: down            # for thumbs/rating
comment_text: "got my account number wrong"  # if provided
client_context:
  page: chat
  feature: support_agent
  device: mobile

The response_id is the load-bearing field — it joins to the audit log of the response, the prompt version, the model used, the input context. Without it, the feedback floats; with it, the feedback is anchored to a specific call.
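To make the join concrete, here is a minimal sketch. The field names follow the record above; the Python types, the shape of `audit_log` (a dictionary keyed by `response_id`), its `prompt_version` and `model` fields, and the helper `join_to_audit_log` are assumptions for illustration, and the IDs are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FeedbackEvent:
    """Mirrors the minimum record above; the types are assumptions."""
    feedback_id: str
    ts: str
    user_id_hash: str
    session_id: str
    response_id: str                      # join key to the audit log
    signal_type: str                      # thumbs | rating | comment | form
    signal_value: Optional[str] = None
    comment_text: Optional[str] = None
    client_context: Dict[str, str] = field(default_factory=dict)


def join_to_audit_log(
    events: List[FeedbackEvent],
    audit_log: Dict[str, Dict[str, str]],
) -> Tuple[List[Dict[str, str]], List[FeedbackEvent]]:
    """Split events into those anchored to a response and those that float."""
    joined: List[Dict[str, str]] = []
    floating: List[FeedbackEvent] = []
    for event in events:
        audit = audit_log.get(event.response_id)
        if audit is None:
            floating.append(event)  # no matching response: cannot be analysed
            continue
        joined.append({
            "feedback_id": event.feedback_id,
            "signal_value": event.signal_value or "",
            **audit,  # prompt version, model, and so on
        })
    return joined, floating


audit_log = {"resp_a": {"prompt_version": "v7", "model": "model-x"}}
events = [
    FeedbackEvent("fb_1", "2026-05-25T11:14:02Z", "hash_1", "sess_1", "resp_a", "thumbs", "down"),
    FeedbackEvent("fb_2", "2026-05-25T11:20:00Z", "hash_2", "sess_2", "resp_missing", "thumbs", "up"),
]
joined, floating = join_to_audit_log(events, audit_log)
```

Tracking the size of `floating` over time is one way to notice when a client has stopped sending the ID.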

How to interpret response rates¶

Typical rates by signal type:

Signal	Typical response rate
Thumbs (well-designed)	1-5%
Star rating	1-3%
Comment (optional)	0.5-2%
Comment (after thumbs)	30-60% of thumbs
Structured form	0.1-2%

Below 0.5% on thumbs suggests a design problem (the control is hard to see or takes too much effort). Above 10% suggests intrusive prompting. The sweet spot is 1-5% with the design feeling natural.
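These thresholds translate directly into a small check that could run against weekly counts. This is a sketch only: the function name and the returned labels are hypothetical, and the bands between 0.5% and 1% and between 5% and 10%, which the text does not classify, are returned as "unclassified".

```python
def diagnose_thumbs_rate(thumbs: int, responses: int) -> str:
    """Map a thumbs response rate onto the bands described in the text."""
    if responses <= 0:
        raise ValueError("responses must be positive")
    rate = thumbs / responses
    if rate < 0.005:
        return "design_problem"   # below 0.5%
    if rate > 0.10:
        return "intrusive"        # above 10%
    if 0.01 <= rate <= 0.05:
        return "sweet_spot"       # 1-5%
    return "unclassified"         # 0.5-1% or 5-10%: the text gives no verdict


before = diagnose_thumbs_rate(3, 1000)    # the 0.3% case from the opening story
after = diagnose_thumbs_rate(30, 1000)    # the 3% case after the redesign
```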

Absolute volumes matter for analysis. 1,000 responses per day at 1% gives 10 thumbs per day; weekly that is 70. Useful but small. 100,000 responses per day at 1% gives 7,000 thumbs per week; large enough for stratified analysis. A small platform may need to aggregate over longer windows.
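The arithmetic above generalises to a quick planning calculation. In this sketch the function names are hypothetical, and the target of 500 feedback events is an arbitrary illustrative figure, not a recommendation from the text.

```python
import math


def weekly_feedback(responses_per_day: int, rate: float) -> float:
    """Expected feedback events per week at a given response rate."""
    return responses_per_day * rate * 7


def days_to_collect(target: int, responses_per_day: int, rate: float) -> int:
    """Days of traffic needed to accumulate `target` feedback events."""
    per_day = responses_per_day * rate
    if per_day <= 0:
        raise ValueError("traffic and rate must be positive")
    return math.ceil(target / per_day)


small = weekly_feedback(1_000, 0.01)      # the small platform from the text
large = weekly_feedback(100_000, 0.01)    # the large platform from the text
window = days_to_collect(500, 1_000, 0.01)  # 500 is an illustrative target
```

For the small platform, reaching the illustrative target takes a window of weeks rather than days, which is why aggregation windows need to stretch.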

Common mistakes¶

Skipping explicit feedback entirely. Assuming "users won't engage" leaves the team operating blind.

Required feedback. Hostile to users; produces noise.

Captured but not joined to the response. No response_id; feedback floats; pipeline cannot use it.

Buried UI. 0.3% response rate; the signal is too thin for analysis.

No acknowledgement. Users feel their click was performative; long-term engagement drops.

Interview Q&A¶

Q1. The thumbs response rate is 0.3%. The team thinks "users don't care to give feedback." What is your view? The response rate is a UX problem, not a user-engagement problem. A redesign (thumbs visible inline with the response, one click, optional comment, acknowledged) typically gets to 1-5%. The 10× improvement is design, not user behaviour. The team should treat the response rate as a metric to improve, not as a verdict on users. Wrong-answer notes: accepting 0.3% as the natural rate leaves the team operating blind indefinitely.

Q2. What goes in a feedback event record, and why is the response_id load-bearing? Minimum: feedback_id, ts, user_id_hash, session_id, response_id, signal_type, signal_value, comment_text, client_context. The response_id is the join key to the audit log — the prompt version, the model used, the input, the system's output. Without it, you know a user gave a thumbs-down somewhere; you cannot correlate to what they were responding to. The join is what makes feedback actionable for the team. Wrong-answer notes: "just capture the rating" misses the join needed for analysis.

Q3. When are structured forms appropriate vs thumbs? Thumbs are the default; structured forms for high-stakes specific contexts. A post-resolution survey on a support workflow asks separately about understanding the issue, providing a useful answer, and overall experience — three signals where one binary would lose nuance. Use forms when the multi-criteria signal warrants the user's effort; do not use forms for routine response feedback where thumbs are sufficient. The trade is signal density vs response rate. Wrong-answer notes: "use forms everywhere for more data" loses response rate without compensating signal value.

Q4. The team is collecting thumbs but not joining them to responses. What do you do? Wire the response_id from the moment the response is rendered. Every AI response in the UI carries an opaque ID; thumbs include the ID; the join enables analysis. Backfill where possible, but feedback already collected without IDs is mostly unanalysable; the going-forward fix is what matters. The pipeline (chapter 05) cannot operate on feedback without anchoring. Wrong-answer notes: "we'll add the ID later" produces months of unusable feedback.

What to do differently after reading this¶

Implement thumbs with the design discipline: visible, lightweight, optional comment, acknowledged.
Capture structured feedback events with response_id as the join key.
Read free-text comments weekly; do not aggregate before reading.
Use structured forms sparingly for high-stakes contexts.
Treat response rate as a metric to improve, not a verdict on users.

Bridge. Explicit feedback is direct, but only 1-5% of responses receive it. Implicit signals (every user's behaviour reveals satisfaction) cover the broader population. The next chapter is the implicit signals discipline. → 03-implicit-signals.md