00. Prompt Ops: The Five-Year-Old Version
Prompts are not strings. They are config. And nobody edits production config at 2 a.m. with no review.
Imagine a bakery with one famous secret recipe.
The recipe lives on a single sheet of paper. The sheet sits on the kitchen counter. Anyone in the bakery can grab a pen and edit it.
One morning a tired cook crosses out "two pinches salt" and writes "two tablespoons salt". He thinks he is fixing a typo. He is not. The next batch of bread is inedible. Customers complain. Sales drop. Reviews tank. Nobody can remember what the recipe said yesterday. Nobody knows who changed it. Nobody can roll back to the version that worked.
That bakery is most AI teams in their first year. The recipe is the prompt. The paper is production. The tired cook is everyone — including yesterday's intern, including the founder, including a script that "just made a small wording fix".
This module is the answer to that bakery.
A tiny bakery picture: what changes
```
BEFORE PROMPT OPS            AFTER PROMPT OPS
─────────────────            ────────────────
prompt on paper              prompt in the registry
   │                            │
   │ anyone can edit            │ only reviewers can ship
   ▼                            ▼
1 version, no history        every version has a SHA
   │                            │
   │ no rollback                │ rollback in one command
   ▼                            ▼
silent regressions           evals block bad changes
   │                            │
   │ "who changed this?"        │ git blame, with timestamps
   ▼                            ▼
Friday evening incident      Tuesday afternoon postmortem
```
The shape on the right is what mature AI teams actually do. Every step in the right-hand column shares one theme: prompts are treated like code. That single shift is most of what this module teaches.
A tinier worked example: the salt change
Pretend our bakery is a customer-support AI. Last week, this is the prompt that shipped:
```
You are a helpful support agent.
Always greet the user by name.
Always end with "Is there anything else I can help with?"
```
Today a well-meaning engineer makes a small edit. She removes the second line because "the greeting feels too formal":
```
You are a helpful support agent.
Always end with "Is there anything else I can help with?"
```
A small change. One line. Looks harmless.
By Friday, complaints arrive. "The bot stopped using my name." "It feels colder." "I felt like I was being routed to a machine."
In the old bakery, the team would now spend two days hunting. Who changed it? When? Why? What did the old version look like?
In the new bakery, every prompt version has a SHA: a content hash, like a git commit hash.
```
prompt = customer_support_greeter
version v17 → SHA a8c3f9...   (had the name greeting)
version v18 → SHA b1d7e4...   (removed the name greeting)
trace 4481  → ran on v18
```
The traces carry the SHA. The complaint trace says v18. The diff between v17 and v18 is one line. The rollback to v17 is one command. Time-to-recovery — 90 seconds.
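To make the SHA, diff, and rollback story concrete, here is a minimal in-memory sketch in Python. It assumes a prompt version's identity is the SHA-256 of its exact text; the `PromptRegistry` class and its `publish`, `rollback`, `diff`, and `trace` methods are hypothetical illustrations, not the API of any particular tool, and a real registry would persist versions to durable storage rather than dictionaries.

```python
import difflib
import hashlib
from dataclasses import dataclass, field


def prompt_sha(text: str) -> str:
    # Content addressing: the same text always hashes to the same SHA.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class PromptRegistry:
    # Hypothetical in-memory registry; a real one would persist to durable storage.
    blobs: dict = field(default_factory=dict)     # sha -> prompt text, never overwritten
    versions: dict = field(default_factory=dict)  # name -> {version label -> sha}
    live: dict = field(default_factory=dict)      # name -> sha currently serving traffic

    def publish(self, name: str, version: str, text: str) -> str:
        history = self.versions.setdefault(name, {})
        if version in history:
            raise ValueError(f"{name} {version} already exists; versions are immutable")
        sha = prompt_sha(text)
        self.blobs.setdefault(sha, text)
        history[version] = sha
        self.live[name] = sha
        return sha

    def rollback(self, name: str, version: str) -> str:
        # Rollback edits no text; it only repoints "live" at an older SHA.
        sha = self.versions[name][version]
        self.live[name] = sha
        return sha

    def diff(self, name: str, old: str, new: str) -> str:
        a = self.blobs[self.versions[name][old]].splitlines()
        b = self.blobs[self.versions[name][new]].splitlines()
        return "\n".join(difflib.unified_diff(a, b, old, new, lineterm=""))

    def trace(self, name: str, trace_id: int) -> dict:
        # Every trace records the exact SHA that ran, so a complaint maps to a version.
        return {"trace_id": trace_id, "prompt": name, "prompt_sha": self.live[name]}


V17 = (
    "You are a helpful support agent.\n"
    "Always greet the user by name.\n"
    'Always end with "Is there anything else I can help with?"'
)
V18 = (
    "You are a helpful support agent.\n"
    'Always end with "Is there anything else I can help with?"'
)

reg = PromptRegistry()
reg.publish("customer_support_greeter", "v17", V17)
reg.publish("customer_support_greeter", "v18", V18)

t = reg.trace("customer_support_greeter", 4481)            # the complaint trace ran on v18
print(reg.diff("customer_support_greeter", "v17", "v18"))  # shows the one-line removal
reg.rollback("customer_support_greeter", "v17")            # recovery is a single call
```

The design choice worth noticing: rollback never rewrites a prompt. It only moves the live pointer back to an older, immutable SHA, which is why it can be one command.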
Same bug, same regression. Two very different Fridays.
The eight placeholders you will see called back
| Placeholder | What it really is |
|---|---|
| the recipe | The prompt itself — system prompt, instructions, few-shot examples |
| the recipe book | The prompt registry — versioned, content-addressed storage |
| the SHA | The content-hash of a specific prompt version — unique, immutable |
| the rollback | Swapping the live recipe back to a previous SHA, in one action |
| the taste test | The eval suite that has to pass before a recipe change can ship |
| the trial bake | Shadow or A/B traffic — running the new recipe alongside the old, comparing outputs |
| the bakery log | Observability — every trace tagged with the exact prompt SHA that ran |
| the customer's recipe | Multi-tenant override — one customer's preferred wording without forking the recipe book |
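The taste test can be sketched the same way. The snippet below is a toy eval gate that assumes a `run_model(prompt, user)` callable; `fake_model` is a hypothetical stand-in that only obeys instructions literally present in the prompt, whereas a real suite would call the model and score its actual outputs. Using the greeting example from earlier, v17 passes the gate and v18 is blocked because the name check fails.

```python
from typing import Callable

# A case pairs hypothetical user context with a check on the model's reply.
Case = tuple[dict, Callable[[str], bool]]


def eval_gate(prompt: str, cases: list[Case],
              run_model: Callable[[str, dict], str],
              min_pass_rate: float = 1.0) -> tuple[bool, float]:
    # The taste test: run every case and block the change if too few pass.
    passed = sum(1 for user, check in cases if check(run_model(prompt, user)))
    rate = passed / len(cases)
    return rate >= min_pass_rate, rate


def fake_model(prompt: str, user: dict) -> str:
    # Hypothetical stand-in for an LLM call: it follows only instructions
    # that appear verbatim in the prompt, so its behavior tracks prompt edits.
    reply = ""
    if "greet the user by name" in prompt:
        reply += f"Hi {user['name']}! "
    reply += "Your order has shipped."
    if "Is there anything else I can help with?" in prompt:
        reply += " Is there anything else I can help with?"
    return reply


CASES: list[Case] = [
    ({"name": "Ana"}, lambda reply: "Hi Ana" in reply),
    ({"name": "Raj"}, lambda reply: reply.endswith("Is there anything else I can help with?")),
]

V17 = (
    "You are a helpful support agent.\n"
    "Always greet the user by name.\n"
    'Always end with "Is there anything else I can help with?"'
)
V18 = (
    "You are a helpful support agent.\n"
    'Always end with "Is there anything else I can help with?"'
)

ok17, rate17 = eval_gate(V17, CASES, fake_model)
ok18, rate18 = eval_gate(V18, CASES, fake_model)
print(f"v17: ship={ok17} pass_rate={rate17}")
print(f"v18: ship={ok18} pass_rate={rate18}")
```

Requiring a 100% pass rate is a choice made for this toy. With noisy model outputs, a real gate might instead compare the candidate's score against the score of the version currently live.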
We will keep returning to these names. The whole module is the rules of running a bakery where the recipe is the most valuable thing you own.
Why this module exists, plainly
Module 07 taught you how to write a good prompt. This module teaches how to run prompts as production assets.
The single biggest source of silent AI regressions is not model upgrades. It is someone changing the prompt while nobody notices. A tired cook. A "small wording fix." A paste from an AI assistant into the wrong window. A merged PR that bundled a prompt edit with a refactor.
Lead engineers are expected to know how to put guardrails around the prompt-edit surface. That is what this module covers.
In interviews, the question often arrives like this:
- "How do you avoid silent regressions when someone edits a prompt?"
- "How do you A/B test a prompt change safely?"
- "How do you handle per-customer prompt overrides without forking the prompt for each customer?"
Most candidates do not know the answer. That is your opening.
What's coming
- 01-prompts-as-code.md: why prompts are load-bearing config, not strings in source.
- 02-the-prompt-registry.md: storage, naming, immutability, content-addressed identity.
- 03-versioning-and-rollback.md: every change has a SHA, a diff, a rollback target.
- 04-review-gates.md: who can edit a prompt, and what review must happen before a prompt ships.
- 05-shadow-and-ab-testing.md: shadow vs split traffic. Ramp percentages. Eval gates.
- 06-prompt-drift-detection.md: did a "small wording fix" change downstream behavior?
- 07-prompt-observability.md: linking traces back to the exact prompt version that ran.
- 08-prompt-eval-suites.md: eval gates that block prompt changes from shipping.
- 09-multi-tenant-prompts.md: per-customer overrides without forking.
- 10-prompt-feature-flags.md: gating prompt rollouts with feature flag systems.
- 11-prompt-incidents-and-rollback.md: when prompts cause regressions, how to roll back fast.
- 12-tooling-landscape.md: Langfuse, Pezzo, PromptLayer, Braintrust, LangSmith, and what each actually solves.
- 13-honest-admission.md: what prompt ops still does not solve.
Bridge. First we need to convince ourselves that a prompt is not a string. It is config. The kind of config you do not edit live. The kind of config that, when mistreated, takes the system down quietly. → 01-prompts-as-code.md