
12. Architect checklist

Twenty items covering capture, storage, piping, calibration, debiasing, governance, and response. If you can answer every item with an artefact, the feedback-loop operation is defensible.


Capture (1–6)

1. Explicit feedback. Are thumbs (or an equivalent control) implemented with design discipline: visible, lightweight, with an optional comment, and acknowledged on submission? Is the response rate in the 1-5% range? (Chapter 02.)

2. Implicit signals. Are at least the four core implicit signals (abandonment, repeat-ask, follow-up clarification, copy-rate) captured? (Chapter 03.)

3. Response-id anchoring. Does every feedback event carry the response_id joining to the audit log? (Chapters 02, 04.)

4. Structured schema. Is the feedback schema enforced, with events missing required fields rejected? (Chapter 04.)

5. Storage choice. Is feedback in an OLAP store with denormalised hot fields for fast analytical queries? (Chapter 04.)

6. PII redaction at intake. Are free-text comments scanned and redacted at write time? (Chapter 08.)
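A minimal sketch of items 3, 4, and 6 at the intake boundary. The field names (`response_id`, `user_hash`, `tenant_id`, `signal`, `ts`), the allowed signal set, and both regexes are hypothetical illustrations, not the schema from Chapter 04; the regexes are deliberately crude, and a real deployment would use a dedicated PII detector. Events missing the `response_id` join key or any other required field are rejected, and free-text comments are redacted before the event is stored.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical required fields; response_id is the join key to the audit log.
REQUIRED_FIELDS = ("response_id", "user_hash", "tenant_id", "signal", "ts")
ALLOWED_SIGNALS = {"thumbs_up", "thumbs_down", "abandon", "repeat_ask", "clarify", "copy"}

# Illustrative, deliberately crude patterns; they will over-redact some text.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class SchemaError(ValueError):
    """Raised when an incoming feedback event violates the schema."""


@dataclass(frozen=True)
class FeedbackEvent:
    response_id: str
    user_hash: str
    tenant_id: str
    signal: str
    ts: float
    comment: Optional[str] = None


def redact(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def ingest(raw: dict) -> FeedbackEvent:
    """Validate a raw event and redact its comment at write time."""
    missing = [f for f in REQUIRED_FIELDS if raw.get(f) in (None, "")]
    if missing:
        raise SchemaError(f"missing required fields: {missing}")
    if raw["signal"] not in ALLOWED_SIGNALS:
        raise SchemaError(f"unknown signal: {raw['signal']!r}")
    comment = raw.get("comment")
    return FeedbackEvent(
        response_id=str(raw["response_id"]),
        user_hash=str(raw["user_hash"]),
        tenant_id=str(raw["tenant_id"]),
        signal=raw["signal"],
        ts=float(raw["ts"]),
        comment=redact(comment) if comment else None,
    )
```

Rejecting at intake, rather than cleaning later, means nothing downstream ever sees an unanchored event or an unredacted comment.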


Pipe (7–11)

7. Eval-case pipeline. Is there a weekly process converting negative feedback into eval cases? Are 10-30 new cases per week landing in the set? (Chapters 05, 09.)

8. Judge calibration cycle. Is the judge calibrated against user feedback monthly? Is agreement tracked, and is refinement run when it falls below threshold? (Chapters 06, 09.)

9. Prompt iteration from feedback. Do prompt revisions trace to feedback patterns? Are they canaried and measured? (Chapter 10.)

10. Model decisions informed by feedback. Do model migrations and routing decisions use feedback profiles in addition to eval scores? (Chapter 10.)

11. Loop closure verification. After artefact changes, is the production signal measured to verify the loop closed? (Chapter 10.)
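One way to run the calibration check in item 8, sketched under assumptions: judge verdicts and user ratings are both reduced to good/bad per `response_id`, joined on that key, and compared with Cohen's kappa, which corrects raw agreement for chance. Kappa and the 0.6 threshold are illustrative choices rather than a prescribed metric, and the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CalibrationReport:
    n: int                  # responses rated by both judge and users
    agreement: float        # raw fraction of matching verdicts
    kappa: float            # Cohen's kappa (chance-corrected agreement)
    needs_refinement: bool  # True when kappa is below the threshold


def calibrate(judge: dict[str, bool], users: dict[str, bool],
              min_kappa: float = 0.6) -> CalibrationReport:
    """judge and users map response_id -> True if the response was rated good."""
    shared = sorted(judge.keys() & users.keys())  # join on response_id
    n = len(shared)
    if n == 0:
        raise ValueError("no overlapping response_ids to calibrate on")
    p_o = sum(judge[r] == users[r] for r in shared) / n
    # Chance agreement from each rater's marginal rate of "good" verdicts.
    pj = sum(judge[r] for r in shared) / n
    pu = sum(users[r] for r in shared) / n
    p_e = pj * pu + (1 - pj) * (1 - pu)
    # p_e == 1 only when both raters gave the same single label everywhere.
    kappa = 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
    return CalibrationReport(n, p_o, kappa, kappa < min_kappa)
```

Raw agreement alone can look healthy when most responses are good and both raters default to "good"; kappa discounts that baseline, which is why it is used in this sketch.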


Govern (12–16)

12. Bias awareness. Are responder demographics compared to all-user demographics? Is feedback sliced by cohort? (Chapter 07.)

13. Triangulation. Are explicit, implicit, and proactive-sample signals triangulated for major decisions? (Chapter 07.)

14. Privacy discipline. Are user identifiers hashed? Reverse-lookup held separately? Tenant-tagged? Access-controlled? (Chapter 08.)

15. Retention. Are feedback retention windows defined, automatic, and verified? Does the store participate in right-to-be-forgotten (RTBF) deletion? (Chapter 08.)

16. Cadence. Is the weekly review running consistently? Monthly calibration? Quarterly cross-team retrospective? (Chapter 09.)
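A sketch of the comparison in item 12: for each cohort, divide its share of responders by its share of all users. The cohort labels, the `tolerance` default, and the function name are hypothetical. A ratio well above 1.0 marks a cohort whose voice is amplified in the feedback; well below 1.0, one that is under-heard. Cohorts that appear only among responders are ignored, on the assumption that responders are a subset of all users.

```python
from collections import Counter


def representation(responders: list[str], all_users: list[str],
                   tolerance: float = 0.5) -> dict[str, float]:
    """Return {cohort: responder_share / all_user_share} for cohorts outside tolerance.

    Each list holds one cohort label per user. A ratio of 1.0 means the cohort
    responds in proportion to its size; tolerance=0.5 flags ratios below 0.5
    or above 1.5 (a hypothetical default).
    """
    resp = Counter(responders)
    base = Counter(all_users)
    n_resp, n_base = sum(resp.values()), sum(base.values())
    flagged: dict[str, float] = {}
    for cohort, count in base.items():
        base_share = count / n_base
        resp_share = resp.get(cohort, 0) / n_resp if n_resp else 0.0
        ratio = resp_share / base_share
        if abs(ratio - 1.0) > tolerance:
            flagged[cohort] = ratio
    return flagged
```

A flagged cohort is a prompt to slice the feedback by that cohort before acting on it, not an automatic reweighting.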


Respond (17–20)

17. Feedback alarms. Are negative-feedback rate, implicit-signal anomalies, and calibration-drift alarms wired with thresholds and response policies? (Chapter 11.)

18. Containment-first response. When severity warrants, does the on-call engineer roll back suspect changes before the root cause is fully understood? (Chapter 11.)

19. Postmortem discipline. Are feedback-driven incidents postmortemed with systemic action items? (Chapter 11.)

20. Long-window trend monitors. Are slow degradations (week-over-week trends) monitored in addition to short-window alarms? (Chapter 11.)
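A minimal sketch of items 17 and 20 side by side, assuming daily counts of negative-feedback events and of total rated responses, oldest first. The 8% absolute threshold and the 10% week-over-week rise are hypothetical values; real thresholds come from each product's baseline. The two checks catch different failures: a rate that creeps up a little each week can stay under the short-window threshold indefinitely while the trend check fires.

```python
from dataclasses import dataclass


@dataclass
class AlarmResult:
    short_window_fired: bool
    trend_fired: bool


def check_alarms(daily_neg: list[int], daily_total: list[int],
                 rate_threshold: float = 0.08,
                 weekly_rise: float = 0.10) -> AlarmResult:
    """daily_neg / daily_total: per-day counts, oldest first, at least 14 days."""
    if len(daily_neg) != len(daily_total) or len(daily_total) < 14:
        raise ValueError("need matching per-day series of at least 14 days")

    def rate(neg: list[int], tot: list[int]) -> float:
        return sum(neg) / sum(tot) if sum(tot) else 0.0

    # Short window: the most recent day's negative-feedback rate vs an absolute threshold.
    short = rate(daily_neg[-1:], daily_total[-1:]) > rate_threshold
    # Long window: the last 7 days vs the prior 7 days, as a relative rise.
    this_week = rate(daily_neg[-7:], daily_total[-7:])
    last_week = rate(daily_neg[-14:-7], daily_total[-14:-7])
    trend = last_week > 0 and (this_week - last_week) / last_week > weekly_rise
    return AlarmResult(short, trend)
```

For example, 14 days at 1,000 rated responses per day, with 40 negatives per day in the first week and 50 in the second, keeps the daily rate at 5% (no short-window alarm) while the 25% week-over-week rise trips the trend monitor.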


How to use the checklist

Walk through the items at setup; every red item becomes a work item. By three months, capture (1-6) is green. By six months, the pipeline (7-11) is operational. By nine months, governance and response (12-20) are routine.


Common postmortem-to-checklist mappings

  • "We captured feedback but never used it" → items 7-11 (pipeline)
  • "Real PII in feedback comments" → item 6 (redaction)
  • "Loud minority drove changes; broader users unaffected" → items 12, 13 (bias, triangulation)
  • "Judge drifted from user perception" → items 8 (calibration), 16 (cadence)
  • "Slow degradation reached customer support before we caught it" → items 17, 20 (alarms, trend monitors)
  • "Same incident recurred" → item 19 (postmortem action items)

Interview Q&A

Q1. The team has explicit feedback collection and dashboards. What three items would you build next? Item 3 (response-id anchoring) if not present — the join key is load-bearing. Item 7 (eval-case pipeline) — convert signal to artefact. Item 8 (judge calibration cycle) — align the measurement with reality. These three turn captured signal into operational change. Wrong-answer notes: more capture (item 2) without the pipeline produces more unused data.

Q2. Which item is most under-appreciated? Item 11 (loop closure verification). Teams ship artefact changes based on feedback and consider the work done; without measuring whether the change actually shifted the production signal, they cannot tell whether the loop closed. The discipline of measuring the closure is what produces the compounding improvement over months. Wrong-answer notes: any item is defensible; what distinguishes is the reasoning about confirmed outcomes.

Q3. The team has a small platform and cannot land all twenty items. What sequencing would you use? Capture (1-6) first; pipeline (7-11) second; respond (17-20) third; govern (12-16) as the platform matures. The minimum operational floor is items 1, 3, 4, and 7: capture with an anchor and a schema, plus the pipeline. Without those, the rest has no substrate. Wrong-answer notes: "do everything at once" is unrealistic; the sequencing matters.

Q4. The eval-case pipeline is in place, but the eval set has not grown in a month. What is wrong? The pipeline exists, but triage is not happening: the weekly review (items 7 and 16) is being skipped. The fix is to make the review a ritual, with a standing calendar slot, an accountable owner, and visible output. Without the human triage step, the pipeline runs but produces nothing. Wrong-answer notes: "automate the triage" loses the case-level judgement that makes the pipeline useful.


Bridge. Twenty items. The last chapter is the honest opposite — what feedback loops cannot solve. → 13-honest-admission.md