02. The heartbeat that must stop — ReAct loop and the give-up rule¶
~22 min read. A loop without an exit is not an agent — it is a runaway process with a credit card. This file teaches the heartbeat (think → act → observe → decide) and the wall that stops it (iteration caps, token caps, cost caps, time caps, confidence thresholds). They are one mechanism, not two.
Built on 01-architecture-decision-tree.md. We chose the leash. For any leash longer than a single call, the agent runs a loop. This file is about the loop — and the rule that ends it.
The 47-iteration disaster¶
An agent gets a task: "Find the cheapest enterprise plan among our top three competitors and summarise the difference."
It calls web_search. It reads a marketing page. It calls web_search again.
The pricing is behind a "contact sales" form. It cannot fill the form. But it
keeps trying. Different keywords. Different sites. Different rewrites.
47 tool calls later, it has nothing. The token meter reads $18.42. The user sees a polite shrug: "I was unable to complete the task." No notes. No partial findings. Nothing the next attempt can reuse. Pure waste.
Now compare. A senior handyman gets the same task. He tries three searches. He sees that public pricing is missing. He stops. He writes: "Two of three competitors hide enterprise pricing behind sales calls. Here is what I found for the third. Want me to draft an outreach email instead?" Total spend: 60 cents. Total time: 18 seconds.
The difference is not intelligence. It is discipline. The handyman who refuses to stop digging is the dangerous one. The one who knows when to put the wrench down is the professional.
Two questions fall out of this story:
- What rhythm does the agent use while it is looping? (The heartbeat.)
- What rule forces it to stop? (The wall.)
Both are one mechanism. Teach either without the other and you get either a timid agent that never loops, or a runaway agent that never stops.
The heartbeat — think, act, observe, decide¶
Watch this. A support agent gets a ticket: "Update Priya's billing address."
Think: "I first need the account ID before I can update anything."
Act: search_customer(name="Priya")
Observe: returned 3 matches — Priya M., Priya S., Priya Corp.
Think: "Ticket says 'corporate billing'. I want the enterprise one."
Act: search_customer(name="Priya", org_type="enterprise")
Observe: 1 match — account_id=ACC-7741, plan=enterprise
Think: "Now I can update the address."
Act: update_billing_address(account_id="ACC-7741", address={...})
Observe: status=200, billing_address_updated_at=2026-05-12T09:14Z
Think: "Goal complete. Reply to the customer."
See the rhythm? Think. Try. Check. Think again. Four small heartbeats and the job is done. No magic. No giant plan. Just one move, one observation, one update — and repeat until the goal is reached or the wall fires.
The handyman mental model¶
Recall the handyman from first principles. He does not swing first. He looks. He thinks. He picks one tool. He tries. He checks the leak. Then he decides again. ReAct is literally that discipline, written as a loop — plus a hard clock on the wall that tells him when his shift is over.
┌────────────┐
│ Think │ "What is the next best move?"
└─────┬──────┘
▼
┌────────────┐
│ Act │ "Use exactly one tool to reduce uncertainty."
└─────┬──────┘
▼
┌────────────┐
│ Observe │ "Read the actual result. Not the hoped-for one."
└─────┬──────┘
▼
┌────────────────────────────────────────────────────┐
│ STOP CONDITIONS (OR-gate — ANY one fires → stop) │
├────────────────────────────────────────────────────┤
│ goal met? ──┐ │
│ iter > N? ──┤ │
│ cost > $C? ──┤── OR ──→ EXIT LOOP │
│ tokens > T? ──┤ │
│ time > W? ──┤ │
│ conf < F? ──┤ │
│ same err x3? ──┘ │
└─────────────────────────┬──────────────────────────┘
│ none fired
▼
back to Think
Three boxes. One feedback arrow. One wall. That is the whole control system. The genius is not in any single box. The genius is the arrow that goes back up and the wall that sometimes refuses to let it.
Mini-FAQ. "How is ReAct different from plain function calling?" Plain function calling is one swing. The model picks a tool, calls it, returns text. ReAct wraps that swing in a feedback loop — the tool's output goes back to the model, the model decides again, and the process repeats. Function calling is the primitive. ReAct is the program built from it. And a program without a termination condition is a bug.
Walking one iteration in slow motion¶
Think — choose the next move, not the whole future¶
The model writes a short rationale. Sometimes one line. Sometimes three. The point is not a giant monologue. The point is to pick the next uncertainty to reduce.
Good think: "I do not know the order's age. I should fetch it before deciding eligibility." Bad think: "Let me reason about all possible refund scenarios in detail." (paralysis)
Think is cheap. Act is expensive. A two-line plan before a tool call prevents the model from firing the wrong tool with the wrong arguments.
Act — one tool, one uncertainty¶
The agent calls one tool — get_order(id=4481). One call, one targeted
question. Not ten parallel calls hoping something sticks.
Observe — read the payload, not just the status¶
The tool returns:
The agent must read the fields, not just the success flag. The decisive fact
is placed_at. That tells the agent whether the 30-day refund window applies.
A weak agent reads "200 OK" and moves on. A real agent reads the date.
Tools lie. APIs return success on partial data. Search returns empty arrays instead of errors. Files come back with stale content. The observe step turns the raw payload into a fact the next think step can use. Skip it and the loop becomes hopeful, not grounded.
Worked example — fixing a failing checkout test¶
Same coding agent. Same goal across all iterations: "Fix the failing checkout test." Watch how observations steer the plan — and watch where a stop rule would fire if the agent got stuck.
Iteration 1¶
Think: "I do not know why the test fails. Run it first."
Act: run_test(name="checkout_total")
Observe: FAIL — expected=540.00, actual=539.50
assertion at billing/tax.py:tax_rounding
Iteration 2¶
Think: "Mismatch is small. Likely a rounding bug in tax.py."
Act: open_file(path="billing/tax.py")
Observe: line 41: total = floor(amount * rate * 100) / 100
(policy expects round-half-up, not floor)
Iteration 3¶
Think: "Change floor to round, then re-run the test."
Act: edit_file(path="billing/tax.py", line=41,
new="total = round(amount * rate, 2)")
run_test(name="checkout_total")
Observe: PASS — actual=540.00
Iteration 4¶
Four iterations. Each Observe was the seed for the next Think. The agent did not know the fix at iteration zero. It discovered the fix through feedback.
Now imagine the bug were deeper. Imagine the agent opened six files, tried three patches, and the test still failed. Without a wall, iteration 25 becomes iteration 50 becomes iteration 100. The heartbeat must eventually stop.
The wall — seven stop conditions¶
Return to the competitor-pricing agent. It burned $18 over 47 iterations. Watch how each rule would have caught it.
a) Max iterations¶
A counter. Increment per think-act-observe cycle. Stop when it crosses N.
| Agent class | Typical N |
|---|---|
| Chat agents | 5–20 |
| Coding agents (Cursor, Aider, Claude Code) | 50–200 |
| Deep research (OpenAI Deep Research, Perplexity) | 100–500 |
If the pricing agent had max_iterations = 8, it stops at iteration 8.
Savings: ~$15.
for step in range(max_iterations):
action = model.next_action(state)
if action.is_final_answer:
return action.answer
observation = run_tool(action)
state = state.append(observation)
return give_up(state, reason="max iterations reached")
That for line is the whole rule. The give_up function is where the craft
lives — more on that below.
b) Max tokens¶
Sum input + output across every call. Stop when total crosses budget.
- Cheap chat task: 10K tokens.
- Mid-complexity task: 50K–200K tokens.
- Deep research: 500K–2M tokens.
If the pricing agent had max_tokens = 100_000, it stops around iteration 30.
c) Max cost (dollars)¶
Same as tokens, but priced. Useful because input and output rates differ, and tool calls may have their own dollar cost.
- Consumer chat: \(0.10–\)1.
- Coding agent on a hard bug: \(1–\)10.
- Autonomous research: \(5–\)50.
If the pricing agent had max_cost = $2.00, it stops at iteration 6.
d) Max wall-clock time¶
A timer. Useful when tools are slow or the user is waiting.
- Interactive chat: 30–90 seconds.
- Async coding task: 5–15 minutes.
- Overnight research: 1–8 hours.
If the pricing agent had max_wall_time = 60s, it stops at iteration 11.
e) Confidence threshold¶
The model self-rates progress, or a separate critic does. Stop when confidence drops below floor (0.3–0.4 typical).
This rule is noisy. Models are bad at calibration. Use it as one of several,
never alone. If the pricing agent had min_confidence = 0.35, it stops around
iteration 5.
f) "Done" signal detection¶
The model emits a structured signal — a final_answer tool call, a <done>
token, or a stop field. The loop watches for it and exits.
This is the success stop. Not a wall. A graceful finish. If the pricing
agent had emitted final_answer("I cannot find public pricing") after the
third search, the loop ends at iteration 3. Cost: 60 cents.
g) No-progress / repeated-error counter¶
Track whether the last K steps produced new evidence. If observations repeat, stop.
The pricing agent kept calling web_search with cosmetically different keywords
but getting the same shape of result. A hash-equality check over the last 3
observations would catch it at iteration 9.
Full comparison — same agent, seven rules applied alone¶
| Rule | Threshold | Fires at iter | Cost saved vs. $18 baseline |
|---|---|---|---|
max_iterations |
8 | 8 | $15.30 |
max_tokens |
100K | ~30 | $7.10 |
max_cost |
$2.00 | 6 | $16.40 |
max_wall_time |
60s | 11 | $14.20 |
min_confidence |
0.35 | 5 | $16.70 |
done_signal |
n/a | 3 | $17.65 |
no_progress |
3 same hashes | 9 | $14.80 |
The done-signal is the cheapest and most useful — it ends with an answer. Cost cap fires last; it is the safety net, not the strategy. Wire them all as an OR-gate. Any one fires → the loop stops.
The give-up message — what a good stop looks like¶
Stopping is not the same as shrugging. Compare:
Bad stop.
"I was unable to complete the task."
Good stop.
"I tried 6 web searches across Vendor-A, Vendor-B, Vendor-C. Two hide enterprise pricing behind 'contact sales' forms. For Vendor-A I found a tier list: Starter $99/mo, Team $299/mo, Enterprise custom. I am stopping because the remaining two require human outreach. Want me to draft outreach emails, or shall I look at archived pricing on the Wayback Machine?"
The good stop returns: 1. Evidence — what was tried. 2. Partial findings — what was learned. 3. Blocker — why the wall hit. 4. Next-action proposal — what the user can do now.
Design stopping as a visible product behaviour, not a hidden guardrail.
Who decides what — the LLM/orchestrator split¶
The model decides the success stop. It emits final_answer when it believes
the task is done. Every other stop is enforced by the orchestrator.
Why? The model cannot see its own running cost, step count, or wall-clock time. It judges only the local question: "Is my answer ready?" Cost, step, and time enforcement must live in orchestrator code that reads running totals the model cannot access.
LLM decides: "Am I done?" → emit final_answer
Orchestrator: "Has the wall been hit?" → break + give_up(state)
Trusting the LLM to enforce its own cost cap is like trusting a toddler with a candy jar. It is a calibration failure waiting to happen.
Failure modes — where heartbeat and wall break¶
Loop failures (the heartbeat is broken)¶
| Mode | Symptom | Fix |
|---|---|---|
| Skip Observe | Model claims success without reading payload | Force quoting one field from tool output |
| Think-then-forget | Next Think ignores prior Observe | Keep last Observe in context or summarise into scratchpad |
| Planning paralysis | 600-token Think before trivial Act | Cap think tokens (100–200) |
| Runaway iteration | No stopping rule, agent loops 80 times | Hard step cap + no-progress detection |
Wall failures (the stop rules are broken)¶
| Mode | Symptom | Fix |
|---|---|---|
| No cap at all | Infinite loop discovered on billing day | Always set max_iterations |
| Cap too tight | N=5 on coding agent → false failures → users stop trusting | Measure p95 successful task length, add margin |
| Only cost cap | Loop wanders 30 min before cap fires | Add iteration cap as first line of defence |
| No human fallback | Agent stops, prints error, discards work | Surface findings + ask clarifying question |
| Done without verifying | Model emits final_answer but answer is wrong |
Pair done signal with verification (test, critic, schema check) |
| Confidence-only | Miscalibrated model never/always stops | Use confidence as secondary signal only |
| Kill without breadcrumbs | Scratchpad discarded on stop → retry pays full cost again | Always persist scratchpad on stop |
Numbers — what one iteration costs, what one loop costs¶
| Item | Typical range (Claude/GPT-4-class, 2026) |
|---|---|
| Tokens per think | 50–250 |
| Tokens per observe (tool result) | 100–2,000 |
| Wall clock per iteration | 1.5–5 seconds |
| Iterations per task (coding agent) | 3–15 typical, 20–40 hard |
| Iterations per task (support agent) | 2–5 typical |
| Total task tokens | 5k–80k input, 1k–10k output |
| Cost per task (Sonnet-class) | \(0.02–\)0.40 |
| Cost per task (Opus-class) | \(0.20–\)3.00 |
Stack matters. Self-hosted small models run an iteration in 300–800 ms at near-zero cost. Frontier API models run 2–5 s per iteration at real money. Always qualify the budget by model and framework before quoting numbers. The same loop on a cheaper model affords a higher iteration ceiling.
Alternatives to vanilla ReAct¶
ReAct is the default heartbeat. Three close cousins matter.
Plan-and-Execute. Write a full plan first, then execute each step. Observation still happens, but replanning is rare. Wins when the domain is predictable. Weaker when reality keeps surprising. Many production systems blend: plan the skeleton, run ReAct inside each step.
Reflexion. After a failed attempt, the agent writes a self-critique, stores it, retries. Self-feedback layered on top of ReAct. Add it when the agent keeps making the same class of mistake across attempts.
Tree-of-Thoughts. Explore multiple think branches in parallel, score, prune. Expensive — branching multiplies tokens. Useful on puzzles and proofs. Most product agents do not need it.
Rule of thumb. Start with vanilla ReAct + OR-gate stopping. Add Reflexion when errors repeat. Add Plan-and-Execute when domain is predictable. Reach for Tree-of-Thoughts only when search is genuinely needed.
Where this lives in the wild¶
The heartbeat-and-wall pattern is the dominant control system in production agents. A non-exhaustive tour:
- Claude Code — tight Think → Act → Observe loop; 25-turn checkpoint before asking user to confirm continuation; configurable max-turn cap.
- Cursor — agent mode with iteration limit per task; falls back to user on stuck diffs.
- Devin (Cognition) — hundreds of iterations per ticket; wall-clock timeout
and
wait_for_useras graceful stop. - OpenAI Agents SDK —
max_turnsonRunner.run; raisesMaxTurnsExceededwhen hit. - LangGraph —
recursion_limiton compiled graph (default 25); raisesGraphRecursionError. - CrewAI —
max_iterper agent (default 25) andmax_execution_time. - Aider —
--max-chat-history-tokensand stop on repeated identical edits. - GitHub Copilot agent mode — opens files, runs tests, reads failures, edits, loops; explicit session cap.
- Replit Agent — per-task time budget and stop button.
- Bolt.new / StackBlitz — token budget visible as a meter to the user.
- OpenAI Deep Research — multi-hour wall-clock cap with planner sub-agent done signal.
- Perplexity Pro — retrieval → read → decide loop with internal step cap.
The diagram does not change. The toolbelt and the ceiling change.
Interview Q&A¶
Q1. Walk me through one ReAct iteration. A. The model emits a short rationale (Think) about the next uncertainty, calls one tool (Act), then reads the actual returned payload (Observe). The observation becomes the next Think's input. Repeat until the goal is met, the budget is out, or the agent escalates. Common wrong answer: describing it as "the model reasons longer." Length is not the point; feedback is.
Q2. How do you prevent an agent from running forever? A. Multi-condition stop rules wired as an OR-gate. Max iterations bounds loop count. Max cost bounds dollar spend. Wall-clock bounds latency. No-progress bounds thrashing. Done-signal gives graceful exit. None alone is enough; together they catch all waste shapes. Common wrong answer: "Lower the temperature." That reduces variance, not iteration count.
Q3. Should the LLM decide when to stop?
A. Only for the success stop — emitting final_answer. Every other stop is
enforced by the orchestrator. The model cannot see its own running cost or step
count.
Common wrong answer: "Yes, the agent knows when it is done." Models are not
reliable judges of their own cost or progress.
Q4. What is the difference between Act and Observe? A. Act fires a tool. Observe reads the actual response — fields, errors, timestamps — and converts the raw payload into a fact the next Think can use. Skipping Observe leaves the loop hopeful, not grounded. Common wrong answer: collapsing them into one step.
Q5. Cost cap or iteration cap — which matters more? A. Iteration cap fires earlier and prevents tight infinite loops. Cost cap catches the case where one expensive call (200K input tokens) ruins the budget in a single step. Run both. Iteration cap is first line of defence; cost cap is the last. Common wrong answer: "Cost cap subsumes iteration cap." It does not.
Q6. What information must a good give-up message include? A. Evidence (what was tried), partial findings (what was learned), the blocker (why the wall hit), and a next-action proposal (what to do now). Without these, the budget already spent is wasted. Common wrong answer: "Just return an error code."
Q7. What is Reflexion, and when would you add it? A. A self-critique layer over ReAct. After a failed run, the agent writes "what went wrong" and prepends it to the next attempt. Add when the same class of error repeats across attempts. Common wrong answer: confusing Reflexion with chain-of-thought. Reflexion is across runs; CoT is within one call.
Q8. How do you detect no-progress? A. Hash the observation at each step. If 3–5 consecutive observations are equal or near-equal, the loop is thrashing. Stop and surface findings. Common wrong answer: "Check whether confidence is dropping." Confidence is noisy; observation-equality is sharper.
Apply now (10 min)¶
Step 1 — model the exercise. Here is what I would write for the competitor-pricing agent: the loop trace AND its stop policy in one table.
| Iter | Think | Act | Observe | Stop check |
|---|---|---|---|---|
| 1 | Need pricing for Vendor-A | web_search("Vendor-A enterprise pricing") |
Blog post, no hard numbers | iter=1 < 12 ✓ |
| 2 | Try official site | web_search("Vendor-A pricing page") |
Pricing page: Starter $99, Team $299, Enterprise "contact us" | iter=2 < 12 ✓ |
| 3 | Vendor-A done (partial). Try Vendor-B | web_search("Vendor-B enterprise pricing") |
"Contact sales" gatewall | iter=3 < 12 ✓ |
| 4 | Same pattern. Try Vendor-C | web_search("Vendor-C pricing plans") |
"Contact sales" gatewall | iter=4 < 12 ✓ |
| 5 | Two of three are gated. I have partial data. Stop. | submit_final_answer(...) |
— | done_signal fired ✓ |
Five iterations. 60 cents. The stop policy:
| Rule | Value | What it catches |
|---|---|---|
max_iterations |
12 | Run-on loops |
max_cost_usd |
2.00 | Single expensive call |
max_wall_time_s |
90 | Slow tools |
no_progress_window |
3 | Search thrashing |
done_signal |
submit_final_answer |
Graceful success |
human_fallback |
request_user_input |
Graceful failure with question |
persist_on_stop |
scratchpad + last hypothesis | Resumability |
Step 2 — your turn. Pick one real task from your work. Write the same combined table: the loop trace (Think / Act / Observe) AND the stop policy. Mark which stop rule would fire first if the task got stuck.
Step 3 — sketch from memory. Redraw the combined diagram from the "handyman mental model" section: three loop boxes, the feedback arrow, and the OR-gate wall with all seven conditions. If you can do this cold, you understand the heartbeat and the wall as one mechanism.
Operational memory¶
This chapter explained the three-beat heartbeat (Think → Act → Observe → decide) and the OR-gate wall that stops it. The important idea is that these are one mechanism, not two — a loop without an exit is a runaway process, and an exit without a loop is a single call. The tension is persistence vs. waste: the loop must persist long enough to discover the fix (like the checkout-test agent that needed four iterations) but stop before it burns $18 on a task it can never finish (like the pricing agent that looped forty-seven times).
You learned that Observe is the heart of the agent (the check, not the act), that the model decides the success stop while the orchestrator decides every other stop, that the seven conditions are wired as an OR-gate (any one fires → stop), and that a good give-up message surfaces evidence, partial findings, blockers, and a next-action proposal instead of silent failure.
Carry this diagnostic forward: when an agent claims success but the underlying state did not change, the failure was in Observe. When an agent burns budget without progress, the failure was in the wall — either missing, misconfigured, or too generous. Audit Observe first for correctness. Audit the wall first for cost.
Remember:
- Think, Act, Observe — the feedback arrow back to Think is what makes it a loop; the OR-gate wall is what makes it safe.
- Observe reads fields, not status codes; the decisive fact lives in the payload.
- OR-gate, not AND — any one stop condition firing must stop the loop.
- The model decides the success stop (
final_answer); the orchestrator decides every other stop. Numbers in prompts are suggestions, not contracts. - Iteration cap fires earliest and cheapest; cost cap catches single expensive calls; no-progress catches thrashing; confidence is too noisy to be primary.
- A good stop returns evidence + partial findings + blocker + next-action proposal — never just "could not complete."
- Always persist scratchpad on stop; otherwise the next retry pays full cost again.
Bridge. The heartbeat is only as good as the tools it can fire. Vague tool interfaces poison even a perfect ReAct loop — the agent picks the wrong tool, passes the wrong arguments, or cannot tell two tools apart. Next: how we design the contracts between model and world.