04. Behavioral and Lead¶

How to use the STAR bank¶

Behavioral rounds reward specific, owned, measured stories. The interviewer is mapping your answer to a competency. Prepare one real story per archetype below, drawn from your own work, and write it in STAR form:

Situation — one sentence of context.
Task — what you specifically owned.
Action — the 2-4 concrete things you did (not "we").
Result — an explicit outcome, ideally a number.
Tie-in — for an AI role, end each story by connecting the instinct to AI systems (traces, evals, canary gates, rollout).

Fill the template for each archetype. The bracketed prompts are cues: replace each one with your own specifics.

STAR archetypes to prepare¶

1. Production debugging¶

Probes: Can you diagnose a hard, ambiguous failure under pressure?
Story shape: A system intermittently failed; you read logs/traces, isolated a root cause (e.g. a resource conflict or race), fixed it, and added a runbook or guardrail so it couldn't recur.
Template: Situation [the flaky failure] · Task [diagnose + fix, often with limited access] · Action [how you traced it to root cause] · Result [time-to-resolve + a metric drop] · AI tie-in: same instinct for agent failures — read traces, isolate the conflict, fix the root cause.

2. Ambiguity / 0→1¶

Probes: Can you build something with no prior pattern to copy?
Story shape: You owned a greenfield build, compared real options, chose one for defensible reasons, designed it, and shipped — and it became a foundation others built on.
Template: Situation [the unmet need] · Task [build it with no template] · Action [options compared, choice + why, design, ship] · Result [adoption + a metric] · Lead tie-in: the architecture choice others built on for months.

3. A production incident you caused¶

Probes: Ownership, blamelessness, prevention.
Story shape: A change you made broke something; you rolled back fast, wrote a postmortem, found the missing control, and added it.
Template: Situation [what broke] · Task [acknowledge, fix, prevent] · Action [rollback time, postmortem, the new gate/canary] · Result [no repeat] · AI tie-in: same for prompt/model changes — eval-set and canary gates before rollout.

4. Disagreement / pushback¶

Probes: Can you disagree with data instead of ego or title?
Story shape: You argued against a popular choice, wrote a short comparison, acknowledged the other side, and de-risked with an incremental test.
Template: Situation [the contested decision] · Task [let data decide] · Action [written comparison, cost/risk projection, A/B or pilot] · Result [outcome + a metric] · Lead tie-in: disagreed cleanly, derisked with proof.

5. Mentoring an engineer¶

Probes: Do you grow people, not just ship code?
Story shape: Someone struggled to ship; you diagnosed the real blocker (often confidence/perfectionism), coached a tighter ship-iterate loop, and they grew.
Template: Situation [who struggled, how] · Task [improve ramp/confidence] · Action [1:1s, root cause, coaching, scoped wins] · Result [reliable shipping; later mentored others] · Lead tie-in: leadership compounds through people.

6. Cross-functional work (sales / product)¶

Probes: Can you align non-engineering stakeholders without breaking trust?
Story shape: A commitment was made that engineering hadn't scoped; you found a realistic path with trade-offs, aligned the parties, and made it reusable.
Template: Situation [the mismatch] · Task [reach a clear yes/no] · Action [research options, find a path, negotiate timing] · Result [deal/outcome + reuse] · Lead tie-in: stakeholder alignment is a lead-tier responsibility.

7. Production scale¶

Probes: Can you scale without premature complexity?
Story shape: You grew a system in stages, adding multi-tenancy, observability, and autoscaling only as load demanded.
Template: Situation [the growth need] · Task [staged architecture] · Action [what you added at each scale threshold] · Result [scale reached + uptime] · AI tie-in: same phased rollout for AI — manual eval first, automation later.

8. A decision you'd make differently¶

Probes: Self-awareness and a better mental model afterward.
Story shape: You optimized for v1 speed, hit a scaling/architecture wall later, and led the painful correction — walking away with a sharper irreversible-decision lens.
Template: Situation [the early choice] · Task [reflect honestly] · Action [what you'd model differently; the migration you led] · Result [a reusable decision framework] · Lead tie-in: now ask "what does this look like at 100x?" for hard-to-reverse choices.

9. Customer empathy¶

Probes: Do you design for real behavior, not lab assumptions?
Story shape: A product assumed ideal users; you observed real usage, found the gap, and redesigned for actual behavior.
Template: Situation [the wrong assumption] · Task [design for reality] · Action [observed real usage, redesigned] · Result [success-rate jump + customer outcome] · AI tie-in: eval sets must reflect real user behavior, not clean internal demos.

10. Making yourself replaceable¶

Probes: The senior signal — removing yourself as a single point of failure.
Story shape: You were the bottleneck on a critical area; you documented it, ran deep-dives, shifted reviews to others, and moved on to higher-leverage work.
Template: Situation [where you were the SPOF] · Task [remove the dependency] · Action [docs, training, delegated reviews] · Result [team owned it; you moved up] · Lead tie-in: senior signal is making the team less dependent on you.

Question → archetype retrieval¶

Question	Best archetype
Tell me about a production incident	#1 or #3
Tell me about a hard technical decision	#2 or #8
Tell me about a disagreement	#4
Tell me about mentoring someone	#5 or #10
Tell me about working with non-engineering	#6
Tell me about scaling a system	#7
Tell me about a mistake	#3 or #8
Tell me about a customer interaction	#9
Tell me about leading a team	#5, #6, #10

Lead-specific architecture questions¶

How would you architect AI services for a 100-engineer org?¶

Standardize observability: LangSmith, Helicone, or equivalent.
Standardize eval methodology: org-owned gold sets per use case.
Maintain reusable patterns: HITL, retries, checkpointing, escalation.
Decide shared inference platform vs per-team stack.
Add an LLM proxy for budgets, rate limits, and routing.
Document "how we do agents here".
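As a rough illustration of the LLM proxy bullet above, the sketch below shows the admission check such a proxy might run before forwarding a call. Everything here is hypothetical: `TeamPolicy`, `ProxyGate`, the per-team budget and rate-limit fields, and the in-memory counters are assumptions for illustration, not the API of any real proxy product. Routing is left out, and a real deployment would need shared storage rather than process-local dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class TeamPolicy:
    monthly_budget_usd: float      # hypothetical per-team spend cap
    max_requests_per_minute: int   # hypothetical per-team rate limit


@dataclass
class ProxyGate:
    policies: dict
    spend_usd: dict = field(default_factory=dict)
    recent_requests: dict = field(default_factory=dict)

    def admit(self, team: str, estimated_cost_usd: float, now: float):
        """Return (allowed, reason); `now` is a timestamp in seconds."""
        policy = self.policies.get(team)
        if policy is None:
            return False, "unknown team"

        # Rate limit: keep only timestamps inside the last 60 seconds.
        window = [t for t in self.recent_requests.get(team, []) if now - t < 60.0]
        self.recent_requests[team] = window
        if len(window) >= policy.max_requests_per_minute:
            return False, "rate limit exceeded"

        # Budget: reject if this call would push the team over its cap.
        spent = self.spend_usd.get(team, 0.0)
        if spent + estimated_cost_usd > policy.monthly_budget_usd:
            return False, "budget exceeded"

        # Admit: record the request and its estimated cost.
        window.append(now)
        self.spend_usd[team] = spent + estimated_cost_usd
        return True, "ok"
```

In an interview answer, the point of a sketch like this is that budgets and rate limits live in one shared layer, so individual teams do not each reimplement them.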

How do you decide build vs API?¶

Build / self-host when privacy, data residency, scale economics, or custom control demand it.
Use APIs when speed matters and scale economics are not yet proven.
Re-evaluate yearly; the landscape shifts fast.

Walk through a postmortem you ran.¶

Use the production-incident archetype (#3).
Translate the lesson from that story (for example, a config change that shipped without a canary) into prompt/model/eval gate language.

Your AI cost doubled. What now?¶

Find which call or flow changed.
Separate traffic growth from per-call inflation.
Check model choice, prompt length, retry storms, and runaway loops.
Mitigate with routing, caching, retry caps, and budget alarms.
Add feature-level cost dashboards and pre-launch cost reviews.
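The "separate traffic growth from per-call inflation" step above is simple arithmetic, and a small sketch can make it concrete. It assumes you can pull a call count and total spend for two periods; the `Period` type, the function name, and all figures are illustrative, not taken from any billing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Period:
    requests: int          # number of LLM calls in the period (must be > 0)
    total_cost_usd: float  # total spend in the period

    @property
    def cost_per_request(self) -> float:
        return self.total_cost_usd / self.requests


def decompose_cost_change(before: Period, after: Period) -> dict:
    """Split the change in spend into a traffic effect and a per-call effect."""
    # Extra calls, priced at the old per-call cost.
    traffic_effect = (after.requests - before.requests) * before.cost_per_request
    # Per-call cost change, applied to the new call volume.
    per_call_effect = after.requests * (after.cost_per_request - before.cost_per_request)
    return {
        "traffic_effect_usd": traffic_effect,
        "per_call_effect_usd": per_call_effect,
        "total_change_usd": after.total_cost_usd - before.total_cost_usd,
    }


# Hypothetical figures: spend doubled while traffic grew only 20%.
report = decompose_cost_change(Period(100_000, 1000.0), Period(120_000, 2000.0))
```

With these made-up numbers, about $200 of the $1,000 increase comes from traffic and about $800 from per-call inflation, which points the investigation at model choice, prompt length, or retries rather than at growth.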

What is your eval philosophy?¶

Gold sets per use case; no generic benchmark worship.
Track failure modes, not only aggregate accuracy.
Gate prompt/model changes before deploy.
Sample 1-5% of production traffic for human review.
Monitor drift weekly.
Assign an explicit owner per gold set.
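Two of the bullets above lend themselves to a short sketch: gating a change on failure modes rather than aggregate accuracy alone, and sampling a fixed share of production traffic for human review. The names (`CaseResult`, `gate_change`, `sampled_for_review`), the failure-mode labels, and the thresholds are all assumptions for illustration; a real gate would sit in whatever eval tooling the team already uses.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    passed: bool
    failure_mode: Optional[str] = None  # hypothetical label, e.g. "hallucination"


def accuracy(results) -> float:
    return sum(r.passed for r in results) / len(results)


def gate_change(baseline, candidate, max_accuracy_drop: float = 0.01):
    """Return (ok, reasons) for a prompt/model change scored on one gold set."""
    reasons = []
    if accuracy(candidate) < accuracy(baseline) - max_accuracy_drop:
        reasons.append("aggregate accuracy regressed")
    base_modes = Counter(r.failure_mode for r in baseline if not r.passed)
    cand_modes = Counter(r.failure_mode for r in candidate if not r.passed)
    # Block when any single failure mode grows, even if the total looks flat.
    for mode, count in cand_modes.items():
        if count > base_modes[mode]:
            reasons.append(f"failure mode grew: {mode}")
    return (not reasons, reasons)


def sampled_for_review(request_id: str, rate: float = 0.02) -> bool:
    """Deterministically pick roughly `rate` of requests by hashing their id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rate
```

Hashing the request id keeps the sample stable across retries and replays, which is one reasonable choice; random sampling at request time would also work.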

How would you onboard a junior AI engineer?¶

Week 1-2: Read code, traces, and patterns.
Week 3-4: Ship a small bounded change.
Month 2: Own one feature end-to-end.
Month 3: Contribute to architecture.
Weekly mentoring throughout; review load shifts gradually.

Leadership questions¶

Tell me about a project you led end-to-end.¶

Use the 0→1 archetype (#2).
Emphasize architecture choice, execution, shipping, and measurement.

How do you handle a high performer who is a culture problem?¶

Give direct feedback with examples.
Explain team impact, not personal annoyance.
Coach with clear expectations.
Escalate to a performance improvement plan (PIP) if the behavior does not change.
Protect team health over one strong IC.

What is your hiring bar?¶

Proven shipping ability over clever talk.
Self-awareness about gaps.
Curiosity and learning speed.
Clear written communication.
For AI roles: strong production discipline.

What is the worst code review you've given?¶

Admit where tone could have been better.
Show that effective feedback improves work and relationship.

How do you decide what not to do?¶

Define 90-day success first.
Anything not helping that goal becomes a no, defer, or explicit trade-off.
Communicate the no clearly.

How do you communicate to non-technical stakeholders?¶

Lead with the decision.
Focus on capability, cost, risk, and action needed.
Use customer/user language.
Tie to the cross-functional archetype (#6) when useful.

Have you managed an AI-augmented team?¶

If yes: use specific examples.
If not: frame it as an adjacency — same principles: clear scope, short feedback loops, code review, operating discipline.

Mentorship questions¶

How do you scale yourself through people?¶

Use the make-yourself-replaceable archetype (#10).
Emphasize docs, training loops, and delegated ownership.

Tell me about mentoring someone who got promoted.¶

Use the mentoring archetype (#5).
Be explicit about the outcome: reliability, promotion, or mentoring others.

A junior is stuck. What do you do?¶

Diagnose whether the blocker is code, concept, or confidence.
Pair for 30 minutes.
Step back and let them try.
Follow up next day.
Do not just hand over the answer.

AI-specific lead questions¶

Where is the line between AI engineer and AI researcher?¶

AI engineer: applies foundation models in production.
AI researcher: improves/trains models.
Different skills; partial overlap; hire accordingly.

How do you choose a framework to standardize on?¶

Do not force one too early.
Early on, multi-framework fluency is the senior signal.
Once the team reaches about five engineers, pick a default and allow exceptions.
Re-evaluate yearly.

What is your AI safety philosophy?¶

Production reliability is the first safety layer.
HITL for high-stakes actions.
Eval gates before deploy.
Audit logs for sensitive operations.
Privacy / PII handling by regulation and risk level.
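The HITL and audit-log bullets above can be sketched as a thin wrapper around action execution. This is a minimal illustration under stated assumptions: the action names in `HIGH_STAKES_ACTIONS`, the `approve` callback (standing in for a human review step), and the in-memory `audit_log` list are all hypothetical. The log records field names only, not values, as one simple way to keep PII out of the audit trail.

```python
import json
import time
from typing import Callable, Optional

# Hypothetical set of actions that always need a human decision.
HIGH_STAKES_ACTIONS = {"refund", "delete_account", "send_external_email"}


def execute_with_hitl(
    action: str,
    payload: dict,
    run: Callable[[dict], str],
    approve: Callable[[str, dict], bool],
    audit_log: list,
    now: Optional[float] = None,
) -> str:
    """Run `action`, asking a human first when it is high-stakes."""
    needs_review = action in HIGH_STAKES_ACTIONS
    approved = approve(action, payload) if needs_review else True

    # Record every decision, including rejections, before acting.
    audit_log.append(json.dumps({
        "timestamp": time.time() if now is None else now,
        "action": action,
        "payload_fields": sorted(payload),  # field names only, no values
        "needs_review": needs_review,
        "approved": approved,
    }))

    if not approved:
        return "rejected"
    return run(payload)
```

Low-stakes actions skip the reviewer entirely but still leave an audit entry, so the log stays complete even where no human was involved.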

Build vs buy for an agent platform?¶

LangGraph / open orchestration gives control.
Vendor SDKs give speed, with lock-in trade-offs.
Managed platforms help non-core teams move quickly.
Default: open + composable for core product; buy for smaller or non-core needs.

How do you stay current?¶

30 min/day reading: Anthropic engineering, Simon Willison, LangChain/LangGraph notes, curated newsletters.
1 paper/week skim.
1 community event / quarter.
Do not chase every model release.

Closing questions to ask¶

Walk me through a recent production incident. What changed after it?
How does the team handle evals and gold sets?
What is the biggest architecture decision in flight right now?
How do you decide what not to build?
Tell me about someone recently promoted. What did they do?
Where does this team have the most friction with the rest of engineering?
What does success at 6 months look like for this role?
Where does the team disagree with engineering leadership today?

Delivery rules¶

Keep answers to 2-3 minutes max.
Use specific numbers from your own work: users/customers, % cost saved, latency reduction, success-rate gains.
Use first person for your contribution; give team credit where needed.
End with a lesson or changed operating rule.
Pre-write bullets, not scripts.
In Lead rounds, avoid sounding only like an IC or only like a manager.

Anti-patterns¶

Using "we" so much that your role disappears.
Vague stories with no metric or explicit outcome.
Memorized verbatim answers.
Acting like the hero in every story.
Sounding apologetic about the AI pivot.
Asking only compensation questions.
Having no questions for them.