04. Behavioral and Lead¶
How to use the STAR bank¶
Behavioral rounds reward specific, owned, measured stories. The interviewer is mapping your answer to a competency. Prepare one real story per archetype below, drawn from your own work, and write it in STAR form:
- Situation — one sentence of context.
- Task — what you specifically owned.
- Action — the 2-4 concrete things you did (not "we").
- Result — an explicit outcome, ideally a number.
- Tie-in — for an AI role, end each story by connecting the instinct to AI systems (traces, evals, canary gates, rollout).
Fill the template for each archetype. Keep the bracketed prompts; replace them with your specifics.
STAR archetypes to prepare¶
1. Production debugging¶
- Probes: Can you diagnose a hard, ambiguous failure under pressure?
- Story shape: A system intermittently failed; you read logs/traces, isolated a root cause (e.g. a resource conflict or race), fixed it, and added a runbook or guardrail so it couldn't recur.
- Template: Situation
[the flaky failure]· Task[diagnose + fix, often with limited access]· Action[how you traced it to root cause]· Result[time-to-resolve + a metric drop]· AI tie-in: same instinct for agent failures — read traces, isolate the conflict, fix the root cause.
2. Ambiguity / 0→1¶
- Probes: Can you build something with no prior pattern to copy?
- Story shape: You owned a greenfield build, compared real options, chose one for defensible reasons, designed it, and shipped — and it became a foundation others built on.
- Template: Situation
[the unmet need]· Task[build it with no template]· Action[options compared, choice + why, design, ship]· Result[adoption + a metric]· Lead tie-in: the architecture choice others built on for months.
3. A production incident you caused¶
- Probes: Ownership, blamelessness, prevention.
- Story shape: A change you made broke something; you rolled back fast, wrote a postmortem, found the missing control, and added it.
- Template: Situation
[what broke]· Task[acknowledge, fix, prevent]· Action[rollback time, postmortem, the new gate/canary]· Result[no repeat]· AI tie-in: same for prompt/model changes — eval-set and canary gates before rollout.
4. Disagreement / pushback¶
- Probes: Can you disagree with data instead of ego or title?
- Story shape: You argued against a popular choice, wrote a short comparison, acknowledged the other side, and de-risked with an incremental test.
- Template: Situation
[the contested decision]· Task[let data decide]· Action[written comparison, cost/risk projection, A/B or pilot]· Result[outcome + a metric]· Lead tie-in: disagreed cleanly, derisked with proof.
5. Mentoring an engineer¶
- Probes: Do you grow people, not just ship code?
- Story shape: Someone struggled to ship; you diagnosed the real blocker (often confidence/perfectionism), coached a tighter ship-iterate loop, and they grew.
- Template: Situation
[who struggled, how]· Task[improve ramp/confidence]· Action[1:1s, root cause, coaching, scoped wins]· Result[reliable shipping; later mentored others]· Lead tie-in: leadership compounds through people.
6. Cross-functional work (sales / product)¶
- Probes: Can you align non-engineering stakeholders without breaking trust?
- Story shape: A commitment was made that engineering hadn't scoped; you found a realistic path with trade-offs, aligned the parties, and made it reusable.
- Template: Situation
[the mismatch]· Task[reach a clear yes/no]· Action[research options, find a path, negotiate timing]· Result[deal/outcome + reuse]· Lead tie-in: stakeholder alignment is a lead-tier responsibility.
7. Production scale¶
- Probes: Can you scale without premature complexity?
- Story shape: You grew a system in stages, adding multi-tenancy, observability, and autoscaling only as load demanded.
- Template: Situation
[the growth need]· Task[staged architecture]· Action[what you added at each scale threshold]· Result[scale reached + uptime]· AI tie-in: same phased rollout for AI — manual eval first, automation later.
8. A decision you'd make differently¶
- Probes: Self-awareness and a better mental model afterward.
- Story shape: You optimized for v1 speed, hit a scaling/architecture wall later, and led the painful correction — walking away with a sharper irreversible-decision lens.
- Template: Situation
[the early choice]· Task[reflect honestly]· Action[what you'd model differently; the migration you led]· Result[a reusable decision framework]· Lead tie-in: now ask "what does this look like at 100x?" for hard-to-reverse choices.
9. Customer empathy¶
- Probes: Do you design for real behavior, not lab assumptions?
- Story shape: A product assumed ideal users; you observed real usage, found the gap, and redesigned for actual behavior.
- Template: Situation
[the wrong assumption]· Task[design for reality]· Action[observed real usage, redesigned]· Result[success-rate jump + customer outcome]· AI tie-in: eval sets must reflect real user behavior, not clean internal demos.
10. Making yourself replaceable¶
- Probes: The senior signal — removing yourself as a single point of failure.
- Story shape: You were the bottleneck on a critical area; you documented it, ran deep-dives, shifted reviews to others, and moved on to higher-leverage work.
- Template: Situation
[where you were the SPOF]· Task[remove the dependency]· Action[docs, training, delegated reviews]· Result[team owned it; you moved up]· Lead tie-in: senior signal is making the team less dependent on you.
Question -> archetype retrieval¶
| Question | Best archetype |
|---|---|
| Tell me about a production incident | #1 or #3 |
| Tell me about a hard technical decision | #2 or #8 |
| Tell me about a disagreement | #4 |
| Tell me about mentoring someone | #5 or #10 |
| Tell me about working with non-engineering | #6 |
| Tell me about scaling a system | #7 |
| Tell me about a mistake | #3 or #8 |
| Tell me about a customer interaction | #9 |
| Tell me about leading a team | #5, #6, #10 |
Lead-specific architecture questions¶
How would you architect AI services for a 100-engineer org?¶
- Standardize observability: LangSmith, Helicone, or equivalent.
- Standardize eval methodology: org-owned gold sets per use case.
- Maintain reusable patterns: HITL, retries, checkpointing, escalation.
- Decide shared inference platform vs per-team stack.
- Add an LLM proxy for budgets, rate limits, and routing.
- Document "how we do agents here".
How do you decide build vs API?¶
- Build / self-host when privacy, data residency, scale economics, or custom control demand it.
- Use APIs when speed matters and scale economics are not yet proven.
- Re-evaluate yearly; the landscape shifts fast.
Walk through a postmortem you ran.¶
- Use the production-incident archetype (#3).
- Translate the config-canary lesson into prompt/model/eval gate language.
Your AI cost doubled. What now?¶
- Find which call or flow changed.
- Separate traffic growth from per-call inflation.
- Check model choice, prompt length, retry storms, and runaway loops.
- Mitigate with routing, caching, retry caps, and budget alarms.
- Add feature-level cost dashboards and pre-launch cost reviews.
What is your eval philosophy?¶
- Gold sets per use case; no generic benchmark worship.
- Track failure modes, not only aggregate accuracy.
- Gate prompt/model changes before deploy.
- Sample 1-5% of production traffic for human review.
- Monitor drift weekly.
- Assign an explicit owner per gold set.
How would you onboard a junior AI engineer?¶
- Week 1-2: Read code, traces, and patterns.
- Week 3-4: Ship a small bounded change.
- Month 2: Own one feature end-to-end.
- Month 3: Contribute to architecture.
- Weekly mentoring throughout; review load shifts gradually.
Leadership questions¶
Tell me about a project you led end-to-end.¶
- Use the 0→1 archetype (#2).
- Emphasize architecture choice, execution, shipping, and measurement.
How do you handle a high performer who is a culture problem?¶
- Give direct feedback with examples.
- Explain team impact, not personal annoyance.
- Coach with clear expectations.
- Escalate to PIP if behavior does not change.
- Protect team health over one strong IC.
What is your hiring bar?¶
- Proven shipping ability > clever talk.
- Self-awareness about gaps.
- Curiosity and learning speed.
- Clear written communication.
- For AI roles: strong production discipline.
What is the worst code review you've given?¶
- Admit where tone could have been better.
- Show that effective feedback improves work and relationship.
How do you decide what not to do?¶
- Define 90-day success first.
- Anything not helping that goal becomes a no, defer, or explicit trade-off.
- Communicate the no clearly.
How do you communicate to non-technical stakeholders?¶
- Lead with the decision.
- Focus on capability, cost, risk, and action needed.
- Use customer/user language.
- Tie to the cross-functional archetype (#6) when useful.
Have you managed an AI-augmented team?¶
- If yes: use specific examples.
- If not: frame it as an adjacency — same principles: clear scope, short feedback loops, code review, operating discipline.
Mentorship questions¶
How do you scale yourself through people?¶
- Use the make-yourself-replaceable archetype (#10).
- Emphasize docs, training loops, and delegated ownership.
Tell me about mentoring someone who got promoted.¶
- Use the mentoring archetype (#5).
- Be explicit about the outcome: reliability, promotion, or mentoring others.
A junior is stuck. What do you do?¶
- Diagnose whether the blocker is code, concept, or confidence.
- Pair for 30 minutes.
- Step back and let them try.
- Follow up next day.
- Do not just hand over the answer.
AI-specific lead questions¶
Where is the line between AI engineer and AI researcher?¶
- AI engineer: applies foundation models in production.
- AI researcher: improves/trains models.
- Different skills; partial overlap; hire accordingly.
How do you choose a framework to standardize on?¶
- Do not force one too early.
- Multi-framework fluency is the senior signal early.
- Once the team is ~5+, pick a default and allow exceptions.
- Re-evaluate yearly.
What is your AI safety philosophy?¶
- Production reliability is the first safety layer.
- HITL for high-stakes actions.
- Eval gates before deploy.
- Audit logs for sensitive operations.
- Privacy / PII handling by regulation and risk level.
Build vs buy for an agent platform?¶
- LangGraph / open orchestration gives control.
- Vendor SDKs give speed, with lock-in trade-offs.
- Managed platforms help non-core teams move quickly.
- Default: open + composable for core product; buy for smaller or non-core needs.
How do you stay current?¶
- 30 min/day reading: Anthropic engineering, Simon Willison, LangChain/LangGraph notes, curated newsletters.
- 1 paper/week skim.
- 1 community event / quarter.
- Do not chase every model release.
Closing questions to ask¶
- Walk me through a recent production incident. What changed after it?
- How does the team handle evals and gold sets?
- What is the biggest architecture decision in flight right now?
- How do you decide what not to build?
- Tell me about someone recently promoted. What did they do?
- Where does this team friction most with the rest of engineering?
- What does success at 6 months look like for this role?
- Where does the team disagree with engineering leadership today?
Delivery rules¶
- Keep answers to 2-3 minutes max.
- Use specific numbers from your own work: users/customers, % cost saved, latency reduction, success-rate gains.
- Use first person for your contribution; give team credit where needed.
- End with a lesson or changed operating rule.
- Pre-write bullets, not scripts.
- In Lead rounds, avoid sounding only like an IC or only like a manager.
Anti-patterns¶
- Using "we" so much that your role disappears.
- Vague stories with no metric or result.
- No explicit outcome.
- Memorized verbatim answers.
- Acting like the hero in every story.
- Sounding apologetic about the AI pivot.
- Asking only compensation questions.
- Having no questions for them.