10. Closing the loop¶

Cadence is the rhythm. Closing the loop is the work the cadences produce. Feedback feeds back into prompts, model decisions, and eval-set additions; the loop is closed when each signal produces an artefact change that improves what users experience.

A platform engineer at a Bengaluru SaaS company has feedback flowing, the pipeline running, the cadences in place. The team's quarterly retrospective reviews three months of operation. Outcomes: 240 new eval cases added; 8 prompt iterations shipped; 1 model migration evaluated based partly on feedback signal; 12 incidents (chapter 11) triggered by feedback patterns. The customer-satisfaction score has moved from a flat 0.86 to a rising 0.89; the negative-feedback rate has dropped from 8% to 5%. The loop is closed; each feedback signal traces to an artefact change and (eventually) to a measurable improvement in user experience.

This chapter is the discipline of closing the loop systematically.

What the loop closes¶

The feedback signal enters; three artefacts change; the user experience updates.

Production feedback (explicit + implicit)
       |
       v
Pipeline (chapter 05)
       |
   +---+---+
   |       |
   v       v
Eval set   Prompt registry / Model selection
   |       |
   |       v
   |   Production behaviour change
   |       |
   |       v
   |   Production feedback shifts (the loop closes)
   |
   v
Future regression-prevention against the new failure modes

The loop has two parts: the immediate change (prompt or model) that affects user experience, and the regression prevention (eval-set growth) that keeps the fixed failure mode from returning unnoticed.
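One way to keep both parts visible is to record them together for each feedback pattern. The sketch below is illustrative only: `LoopTrace` and its field names are hypothetical, and it assumes the tracked signal is a rate where lower is better (for example, a negative-feedback rate).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoopTrace:
    """One feedback pattern traced through both halves of the loop (hypothetical record)."""
    pattern: str                              # e.g. "repeat-ask on support feature"
    immediate_change: Optional[str] = None    # prompt version or model alias that shipped
    eval_case_ids: List[str] = field(default_factory=list)  # regression cases added
    signal_before: Optional[float] = None     # rate before the change (lower is better)
    signal_after: Optional[float] = None      # rate after the change

    def has_both_parts(self) -> bool:
        """True when the immediate change and the regression-prevention cases both exist."""
        return self.immediate_change is not None and len(self.eval_case_ids) > 0

    def is_closed(self) -> bool:
        """Closed only when both artefacts exist and the measured signal improved."""
        if not self.has_both_parts():
            return False
        if self.signal_before is None or self.signal_after is None:
            return False
        return self.signal_after < self.signal_before
```

A trace with a shipped prompt but no eval cases, or with no before/after measurement, reports as open, which matches the definition used in this chapter.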

How feedback shapes prompt changes¶

Feedback patterns suggest where prompts need iteration.

Specific signals:

Repeat-ask on a specific feature → the prompt may not be eliciting useful clarification.
Follow-up clarification with words like "no, I meant" → the prompt may be misclassifying intent.
Negative thumbs with comments about tone → the prompt's tone instructions need refinement.
Abandonment after the first response → the prompt may be producing answers that look right but miss the user's actual need.
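As an illustration of how one of these signals might be surfaced, the sketch below flags follow-up messages that look like corrections. Only the phrase "no, I meant" comes from the text above; the second phrase and the function names are assumptions, and a production detector would likely need more than a regular expression.

```python
import re
from typing import Sequence

# "no, I meant" is from the chapter; "that's not what I asked/meant" is an assumed extra phrase.
_CORRECTION = re.compile(
    r"\b(no,?\s+i\s+meant|that'?s\s+not\s+what\s+i\s+(asked|meant))\b",
    re.IGNORECASE,
)


def is_correction(follow_up: str) -> bool:
    """True when the follow-up message contains a correction phrase."""
    return _CORRECTION.search(follow_up) is not None


def correction_rate(follow_ups: Sequence[str]) -> float:
    """Share of follow-up messages that look like corrections (0.0 for no messages)."""
    if not follow_ups:
        return 0.0
    return sum(is_correction(m) for m in follow_ups) / len(follow_ups)
```

A rising correction rate on one feature is a prompt to pull cases and read them; it suggests intent misclassification but does not prove it.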

For each pattern with sufficient volume, the team:

Pulls representative cases.
Reads the system's responses.
Identifies the prompt aspect that produces the failure.
Drafts a prompt revision.
Tests against the regression eval set (cases including the failure mode).
Canaries the revision per the 13_prompt_lifecycle_operations discipline.
Monitors feedback rate on the new prompt.

The revision is gated by the eval; the deployment is canaried; the production feedback closes the loop by measuring whether the revision actually helps.
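The gating order can be sketched as a small decision function. Everything here is an assumption made for illustration: the `EvalResult` shape, the 0.95 pass-rate threshold, and the rule that a revision must be tested against at least one case representing the failure mode. The feedback rate is assumed to be one where lower is better.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalResult:
    case_id: str
    passed: bool
    targets_failure_mode: bool  # True for cases that represent the pattern being fixed


def eval_gate(results: List[EvalResult], min_pass_rate: float = 0.95) -> bool:
    """Pass only if the overall rate meets the threshold and every targeted case passes."""
    if not results:
        return False
    targeted = [r for r in results if r.targets_failure_mode]
    if not targeted:
        # The revision was never tested against the failure it claims to fix.
        return False
    overall = sum(r.passed for r in results) / len(results)
    return overall >= min_pass_rate and all(r.passed for r in targeted)


def next_step(
    results: List[EvalResult],
    canary_rate: Optional[float] = None,
    baseline_rate: Optional[float] = None,
) -> str:
    """Return the next action: 'revise', 'canary', 'promote', or 'investigate'."""
    if not eval_gate(results):
        return "revise"
    if canary_rate is None or baseline_rate is None:
        return "canary"  # eval passed, no production measurement yet
    return "promote" if canary_rate < baseline_rate else "investigate"
```

The point of the ordering is that production feedback is consulted last: the eval decides whether the revision may be canaried, and the canary's feedback rate decides whether it is promoted.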

How feedback shapes model decisions¶

Less frequent but consequential. Feedback informs:

Model migrations. Pre-migration: the current model's feedback profile is the baseline. Canary: the new model's feedback profile during the canary period is compared against that baseline. Promotion: contingent on the new profile matching or improving on the baseline.
Workload routing. If feedback shows that one model handles a workload poorly, route that workload to a different model (alias mapping in 02_ai_infrastructure/01 chapter 03).
Per-segment routing. Some segments may need a more capable model; feedback per segment surfaces the need.

The model decisions are made by the platform team, with feedback as one input among eval scores and cost-latency considerations.
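The feedback input to a migration decision can be sketched as a per-segment comparison of negative-feedback rates. The function name, the segment labels, and the 0.005 tolerance are assumptions; the sketch covers only the feedback input, not eval scores or cost and latency.

```python
from typing import Dict, List, Tuple


def feedback_supports_promotion(
    baseline: Dict[str, float],
    canary: Dict[str, float],
    max_regression: float = 0.005,
) -> Tuple[bool, List[str]]:
    """Compare per-segment negative-feedback rates (lower is better).

    Returns (supported, regressed_segments). A segment missing from the canary
    profile is treated as regressed, because it was never measured.
    """
    regressed = [
        segment
        for segment, base_rate in baseline.items()
        if canary.get(segment, float("inf")) > base_rate + max_regression
    ]
    return (not regressed, sorted(regressed))
```

For example, a canary that improves one segment from 0.08 to 0.06 but worsens another from 0.05 to 0.07 returns `(False, ["enterprise"])` if the worsened segment is named `enterprise`, which points at per-segment routing rather than a full migration.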

How feedback shapes the eval set¶

The pipeline (chapter 05) produces new cases from feedback. The eval set grows by 10-30 cases per week for an active platform.

Over time, the set reflects:

Failure modes production has actually seen.
Segments production actually serves.
Patterns the team has actively investigated.

The set evolves with production, not in advance of it. Chapter 05 of 01_dataset_golden_set_operations covers the refresh discipline; the feedback pipeline described in this module is one of its primary inputs.
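A minimal sketch of adding feedback-derived cases to an eval set without duplicates follows. The `EvalCase` fields, the `fb-` ID prefix, and the normalisation rule (lowercase, collapsed whitespace) are assumptions for illustration, not the pipeline's actual schema.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class EvalCase:
    case_id: str
    user_input: str
    failure_mode: str
    segment: str


def to_eval_case(user_input: str, failure_mode: str, segment: str) -> EvalCase:
    """Build a case whose ID is stable for the same failure mode and normalised input."""
    normalized = " ".join(user_input.lower().split())
    digest = hashlib.sha256(f"{failure_mode}|{normalized}".encode("utf-8")).hexdigest()[:12]
    return EvalCase(f"fb-{digest}", user_input, failure_mode, segment)


def add_cases(eval_set: Dict[str, EvalCase], candidates: Iterable[EvalCase]) -> int:
    """Add cases not already present; return how many were actually added."""
    added = 0
    for case in candidates:
        if case.case_id not in eval_set:
            eval_set[case.case_id] = case
            added += 1
    return added
```

Deduplicating on a content hash keeps the weekly growth figure honest: ten reports of the same failing input count as one new case, not ten.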

Verifying the loop actually closes¶

The loop closes only if the artefact change improves user experience. The verification:

After a prompt change. Measure the relevant feedback signal (negative-thumbs rate, repeat-ask rate, etc.) over the following weeks. Did it improve?
After a model migration. Compare pre and post feedback profiles. Did the targeted segments improve without regressions elsewhere?
After eval-set additions. Did the added cases catch the next regression in CI?

If the artefact change does not improve the signal, the loop is open — the change addressed something other than the actual cause. Investigate further.
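"Did it improve?" is easier to answer with a check that separates a real shift from weekly noise. The sketch below uses a standard one-sided two-proportion z-test on negative-feedback counts. The choice of test, the 1.645 threshold (roughly a 5% one-sided level), and the sample sizes in the note are assumptions, not part of the chapter's method.

```python
import math


def two_proportion_z(neg_before: int, n_before: int, neg_after: int, n_after: int) -> float:
    """z statistic for a drop in the negative-feedback rate (positive means improvement)."""
    p_before = neg_before / n_before
    p_after = neg_after / n_after
    pooled = (neg_before + neg_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    if se == 0:
        return 0.0  # no variation at all, so no evidence either way
    return (p_before - p_after) / se


def signal_improved(
    neg_before: int, n_before: int, neg_after: int, n_after: int, z_threshold: float = 1.645
) -> bool:
    """True when the drop is larger than the assumed one-sided threshold."""
    return two_proportion_z(neg_before, n_before, neg_after, n_after) > z_threshold
```

With an assumed 2,000 rated responses in each period, a drop from 8% to 5% (160 versus 100 negatives) gives z of roughly 3.8 and counts as an improvement. The same percentages on 100 responses each (8 versus 5) give z of roughly 0.86, which is indistinguishable from noise.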

A loop that consistently fails to close is a discipline problem — the pipeline is producing changes that do not address user needs. Common cause: insufficient case-level investigation; the change addressed a hypothesis, not the actual pattern.

When to act, when to wait¶

Not every signal demands immediate action.

Act: sustained pattern over multiple weeks; high-impact failure mode; high-volume signal; cases that align with strategic priorities.

Wait: noisy week-over-week variation; low-impact failure modes; small-volume signal; cases outside strategic focus.

The cadences and the metrics support the distinction. The discipline is to triage signals by impact, not to act on everything.
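The act-or-wait criteria can be written down as an explicit rule so that triage is consistent between reviewers. This is one possible reading, not the chapter's rule: the thresholds (3 weeks, 50 reports per week) and the requirement of a sustained pattern plus at least two of the other three criteria are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    name: str
    weeks_sustained: int   # consecutive weeks the pattern has been visible
    weekly_volume: int     # feedback events per week matching the pattern
    high_impact: bool      # judged by the team from case-level reading
    strategic: bool        # aligns with current strategic priorities


def triage(signal: Signal, min_weeks: int = 3, min_volume: int = 50) -> str:
    """Return 'act' or 'wait' under the assumed rule described above."""
    sustained = signal.weeks_sustained >= min_weeks
    high_volume = signal.weekly_volume >= min_volume
    supporting = sum([signal.high_impact, high_volume, signal.strategic])
    return "act" if sustained and supporting >= 2 else "wait"
```

A team may reasonably weight the criteria differently, for example acting immediately on any high-impact failure. The value of the sketch is that the rule is explicit and can be revisited.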

The compounding effect over time¶

A team that closes the loop consistently sees:

The eval set grows from imagination to production-shaped reality.
The prompts iterate from initial guesses to refined production tools.
The model decisions move from one-time choices to continuous evaluation.
The user experience improves measurably over months.

The compounding is slow per cycle but large over months. A team that does not close the loop sees the silent decay described in chapter 01: eval-set staleness and prompt aging.

Common mistakes¶

Pipeline produces changes without measurement. No verification of whether the loop closed.

Changes shipped without canary. Production users are the test; feedback degrades before the team notices.

Acting on every signal. Effort scattered across many small changes; no compounding focus.

Not acting on the right signals. Loud minorities drive changes; the broader user base is unaffected.

Loop traced in dashboards but not in artefacts. A claim that "feedback informed our work" with no linked artefact change cannot be verified.

Interview Q&A¶

Q1. Walk through how a feedback pattern becomes a prompt change. Identify the pattern (e.g., repeat-ask rate up 15% for the support feature). Pull representative cases. Read the system's responses; identify what the prompt is or is not doing. Draft a prompt revision targeting the issue. Test against the regression eval, including cases representing the failure mode. If the eval passes, canary the new prompt per 13_prompt_lifecycle_operations. Monitor the production feedback rate on the new prompt. If the rate improves, the loop closed; the change is justified. If not, investigate further. Wrong-answer notes: skipping the case-level investigation produces changes that don't address the actual cause.

Q2. How do you verify that a feedback-driven change actually closed the loop? Measure the relevant signal over the weeks following the change. For a prompt change targeting repeat-ask, the repeat-ask rate should drop. For a model migration targeting a segment, that segment's negative-feedback rate should improve. If the signal does not move, the change addressed the wrong cause; investigate. The loop is open until the production signal shifts in the expected direction. Wrong-answer notes: "we shipped the change" without measurement does not confirm the loop closed.

Q3. The team has shipped 20 prompt iterations this quarter based on feedback. Only 3 produced measurable improvement. What is happening? The pipeline is producing changes without sufficient case-level grounding. Most changes addressed hypotheses about the cause rather than the actual cause; only 3 hit the right diagnosis. The fix is in the triage step (chapter 05): more time reading cases, less time shipping. Deeper investigation per change; fewer changes shipped; higher rate of actual improvement. A high ship rate with low improvement rate is the discipline problem. Wrong-answer notes: "ship faster" or "ship more" compounds the issue.
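The arithmetic behind Q3 can be tracked as a metric. The function below is a hypothetical helper; it assumes each shipped change has already been marked as improved or not by a measurement of its target signal.

```python
from typing import Sequence


def improvement_rate(outcomes: Sequence[bool]) -> float:
    """Share of shipped changes whose target signal measurably improved."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# The quarter from Q3: 20 prompt iterations shipped, 3 with measurable improvement.
quarter = [True] * 3 + [False] * 17
rate = improvement_rate(quarter)  # 3 / 20 = 0.15
```

Reporting this rate next to the ship count makes the discipline problem visible: 20 shipped at 15% is a weaker quarter than 8 shipped at 50%.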

Q4. Should the team act on every feedback signal? No. Triage by impact, volume, and strategic alignment. Sustained patterns with material user impact warrant action. Noisy week-over-week variation, small-volume signals, and cases outside strategic focus often do not. The cadences support the triage; the metrics show the impact; the team's strategic priorities inform what to focus on. Acting on every signal scatters effort with no compounding result. Wrong-answer notes: "all feedback is valuable, all should be acted on" produces effort dilution.

What to do differently after reading this¶

Trace every feedback-driven change through to its impact on the production signal.
Canary changes; do not ship and hope.
Triage signals by impact, volume, strategic alignment; do not act on everything.
Measure the loop's closure; investigate when changes do not produce the expected signal shift.
Document the closed loops; the documentation justifies the discipline's cost.

Bridge. Closing the loop is the routine work. Sometimes the feedback signal indicates not a per-case issue but a systemic problem — a sudden spike, a sustained degradation, a cross-platform pattern. The next chapter is the incident response when feedback shows systemic concerns. → 11-feedback-incident-response.md