03. Interview Rounds

Round types overview

| Round | Typical ask | What to show |
|---|---|---|
| Take-home / project | Build an agent, RAG system, eval harness, or production-shaped workflow in 4-6 hours | Working code, clear README, one explicit architecture choice, at least one eval, trade-offs |
| System design | Design a support agent, doc-Q&A, code-review agent, scheduling workflow, or moderation system | Problem framing, retrieval/prompt/routing choices, eval plan, failure modes, latency/cost budget, production additions |
| Behavioral / deep dive | Translate past work into senior AI signal | Numbers, ownership, architecture choices, incidents, trade-offs |
| Technical / coding | Build a small AI feature live; write robust API-calling code | Clear approach, practical Python, error handling, structured output, fast iteration |
| Framework-specific | LangGraph, Anthropic SDK, or framework internals | Honest depth on tools listed on resume |

What AI interviews test more than classic SWE loops

  • System design for AI workflows.
  • Eval thinking.
  • Failure modes and reliability.
  • Judgment on tools, retrieval, prompting, and cost.
  • Leadership signal for Lead-tier roles.

Prep ramp schedule

By round type

| Round | Best prep |
|---|---|
| Take-home | Practice 2-3 time-boxed mock take-homes |
| System design | Talk through 5-10 designs out loud |
| Behavioral | Memorize 3-4 strong STAR stories with numbers |
| Coding | Practice API-calling code and structured outputs quickly |
| Framework | Read source docs / abstractions for every framework claimed |

Weekly ramp

| Week | Focus |
|---|---|
| 8 | Drill 1 + Drill 4 |
| 9 | Drill 2 + Drill 5; pre-write STAR stories |
| 10 | Drill 3; mock 1 system design + 3 behavioral rounds |
| 11 | Drill 6; mock 2 system designs; Lead-specific Qs |
| 12 | Mix drills; record yourself; mock 1 full loop |
| 13 | Active interviews with tight feedback loop |

Take-home patterns

What hiring managers care about

  1. Something working.
  2. Evals included.
  3. Clear README.
  4. Explicit trade-offs.
  5. Readable code.
  6. Defensible architecture decisions.

Format A — Build an X that does Y

  • Examples:
      • Chatbot over a small docs corpus.
      • RAG system over provided data.
      • Agent using 2-3 tools.
  • Senior move:
      • Add EVAL.md with 10-20 sample queries, expected outputs, actual outputs, pass/fail.

Format B — Improve existing code

  • Run their code first.
  • Establish baseline.
  • Find 3 issues.
  • Fix the 1-2 highest-impact issues.
  • Show before/after metrics when possible.

Format C — Design + implement one critical piece

  • Spend the first hour on the design doc.
  • Implement the most judgment-heavy component.
  • Do not try to build the whole system.
  • Add an explicit trade-offs section.

README template

# [Project Name]

## What this does
- [1-2 bullets]

## Architecture
- [Diagram or 5 bullets]

## How to run
- [Commands]

## Decisions and trade-offs
- [Decision 1]: chose X because Y; trade-off: Z.
- [Decision 2]: ...

## What I'd add for production
- [5-7 bullets]

## Eval results
- [Link to EVAL.md]

## Time spent
- ~5 hours

EVAL template

# Eval Report

## Methodology
- [How the eval was set up]

## Gold queries (10-20)
| Query | Expected | Actual | Pass/Fail |
|---|---|---|---|

## Metrics
| Metric | Score |
|---|---|

## Failure modes identified
1. [Cluster 1]
2. [Cluster 2]

## Hypothesized fixes
1. [Fix 1]
2. [Fix 2]

## What I'd do differently with more time
- [Bullet list]

6-hour take-home breakdown

| Time | Activity |
|---|---|
| 0:00-0:30 | Read prompt 3x; decide architecture; sketch |
| 0:30-1:00 | Set up project; skeleton commits |
| 1:00-3:00 | Build the core feature end-to-end |
| 3:00-3:30 | Break |
| 3:30-4:30 | Add evals; run; capture results |
| 4:30-5:30 | Polish README, EVAL.md, comments |
| 5:30-6:00 | Final review; commit; push; send |

  • If the architecture turns out to be wrong at hour 4, pivot to a simpler version you can still ship.

Sample take-homes to practice

  1. Chatbot over awesome-llm-apps README files.
  2. Agent with 2 tools: web search + calculator.
  3. Structured-output extractor for job postings.
  4. Code-review agent over a small JS/Python repo.

Coding drills

Drill 1 — Streaming chatbot

import anthropic

client = anthropic.Anthropic()

def chat_stream(message: str):
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
    print()

while True:
    msg = input("> ")
    if msg == "quit":
        break
    chat_stream(msg)
  • Extend with conversation history, a system prompt, token counting, and rate-limit retry.
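
One way to start the history and system-prompt extensions is to keep the turns in a small container and build the request from it. This is a minimal sketch, not the drill's official solution: the `Conversation` class, its `max_turns` cap, and the trimming rule are all hypothetical names and choices. It uses only the standard library, so it can be exercised without an API key.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Holds the system prompt plus alternating user/assistant turns."""

    system: str
    model: str = "claude-sonnet-4-6"  # same model string as the drill
    max_tokens: int = 1024
    max_turns: int = 20  # hypothetical cap on stored messages (keep it even)
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
        self.messages.append({"role": role, "content": content})
        # Drop the oldest user/assistant pair so history stays bounded
        # and still begins with a user message.
        while len(self.messages) > self.max_turns:
            del self.messages[:2]

    def request_kwargs(self) -> dict:
        """Keyword arguments intended for client.messages.stream(...)."""
        return {
            "model": self.model,
            "max_tokens": self.max_tokens,
            "system": self.system,
            "messages": list(self.messages),
        }
```

In the drill loop, the idea would be to call `convo.add("user", msg)`, open `client.messages.stream(**convo.request_kwargs())`, collect the streamed text, and then call `convo.add("assistant", reply)` so the next turn sees it.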

Drill 2 — Structured output with Pydantic

from anthropic import Anthropic
from pydantic import BaseModel

class JobPosting(BaseModel):
    company: str
    role: str
    salary_min: int | None
    salary_max: int | None
    must_haves: list[str]
    nice_to_haves: list[str]


def extract_job(text: str) -> JobPosting:
    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[{
            "name": "extract_job_posting",
            "description": "Extract structured job posting fields from text",
            "input_schema": JobPosting.model_json_schema(),
        }],
        tool_choice={"type": "tool", "name": "extract_job_posting"},
        messages=[{"role": "user", "content": f"Extract from:\n\n{text}"}],
    )
    tool_use = next(block for block in response.content if block.type == "tool_use")
    return JobPosting(**tool_use.input)
  • Extend with graceful missing-field handling, validation-retry, and async batch processing.
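
The validation-retry extension can be sketched independently of any SDK. In the block below, `call_model` and `validate` are hypothetical callables supplied by the caller: the first wraps the API call and returns the tool input as a dict, the second raises `ValueError` or `TypeError` on bad data. To my understanding, Pydantic v2's `ValidationError` subclasses `ValueError`, so `JobPosting.model_validate` should fit as `validate`, but verify that against the version you have installed.

```python
from typing import Callable


def extract_with_retry(
    call_model: Callable[[str], dict],
    validate: Callable[[dict], object],
    text: str,
    max_attempts: int = 3,
):
    """Call the model, validate the payload, and retry with the error fed back."""
    prompt = f"Extract from:\n\n{text}"
    last_error = None
    for _ in range(max_attempts):
        payload = call_model(prompt)
        try:
            return validate(payload)
        except (ValueError, TypeError) as exc:
            last_error = exc
            # Tell the model what was wrong so the next attempt can correct it.
            prompt = (
                f"Extract from:\n\n{text}\n\n"
                f"The previous output was invalid: {exc}. Return corrected fields."
            )
    raise ValueError(f"no valid output after {max_attempts} attempts") from last_error
```

Capping attempts matters as much as the retry itself: an extractor that retries forever on an unparseable posting becomes its own failure mode.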

Drill 3 — Tool-using agent

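from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

# @tool uses the docstring as the tool description and raises if it is missing.
@tool
def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression and return the result as text."""
    # Drill-only shortcut: emptying __builtins__ does not make eval() safe
    # on untrusted input. Swap in a real expression parser before shipping.
    try:
        return str(eval(expression, {"__builtins__": {}}))
    except Exception as exc:
        return f"Error: {exc}"

@tool
def web_search(query: str) -> str:
    """Search the web for the query (mocked for this drill)."""
    return f"Mocked search results for: {query}"

llm = ChatAnthropic(model="claude-sonnet-4-6")
agent = create_react_agent(llm, tools=[calculator, web_search])

result = agent.invoke({
    "messages": [{"role": "user", "content": "What's the GDP of India in 2024 plus 100?"}]
})
print(result["messages"][-1].content)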
  • Extend with a third tool, max-iteration cap, tracing, and streaming.
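
Because a calculator tool receives model-generated input, it is worth being able to explain why `eval` is risky and what a safer replacement could look like. The sketch below is a swapped-in technique, not part of the original drill. It walks the parsed syntax tree and allows only numeric literals and basic arithmetic. `safe_eval` is a hypothetical helper name, and exponentiation is left out on purpose, since a huge power can hang the process.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals; reject everything else."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant):
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
                return node.value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))
```

Inside the `calculator` tool, `str(safe_eval(expression))` could stand in for the `eval` call. The existing `except Exception` branch would still turn rejected input, syntax errors, and division by zero into an error string for the agent.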

Drill 4 — Quick eval harness

gold_set = [
    {"query": "What's the capital of France?", "expected": "Paris"},
    {"query": "Calculate 23 * 41", "expected": "943"},
]


def evaluate(agent, gold_set):
    results = []
    for item in gold_set:
        response = agent.invoke({"messages": [{"role": "user", "content": item["query"]}]})
        actual = response["messages"][-1].content
        passed = item["expected"].lower() in actual.lower()
        results.append({
            "query": item["query"],
            "expected": item["expected"],
            "actual": actual,
            "passed": passed,
        })
    return results

results = evaluate(agent, gold_set)
print(f"Pass rate: {sum(r['passed'] for r in results) / len(results):.1%}")
  • Extend with semantic similarity, failure categorization, and markdown-table output.
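
For the failure-categorization and markdown-table extensions, a small post-processing step over the `results` list is usually enough. This is a sketch under assumptions: the bucket names in `categorize` are hypothetical, and the string heuristics are deliberately crude placeholders for whatever failure clusters show up in your own runs.

```python
def categorize(result: dict) -> str:
    """Assign a coarse failure bucket to one eval result."""
    if result["passed"]:
        return "pass"
    actual = result["actual"].strip()
    if not actual:
        return "empty_response"
    if actual.lower().startswith("error"):
        return "tool_or_runtime_error"
    return "wrong_answer"


def to_markdown(results: list) -> str:
    """Render results as a Query/Expected/Actual/Pass-Fail table for EVAL.md."""

    def cell(value) -> str:
        # Collapse whitespace and escape pipes so each result stays on one row.
        return " ".join(str(value).split()).replace("|", "\\|")

    lines = [
        "| Query | Expected | Actual | Pass/Fail |",
        "|---|---|---|---|",
    ]
    for r in results:
        verdict = "Pass" if r["passed"] else f"Fail ({categorize(r)})"
        lines.append(
            f"| {cell(r['query'])} | {cell(r['expected'])} | {cell(r['actual'])} | {verdict} |"
        )
    return "\n".join(lines)
```

Writing `to_markdown(results)` into EVAL.md fills the gold-query table, and the fail buckets give a starting point for the "Failure modes identified" section.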

Drill 5 — Debug an agent that loops forever

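from typing import TypedDict

from langchain_core.tools import tool
from langgraph.graph import END, START, StateGraph

class AgentState(TypedDict):
    messages: list
    iter: int

# 1. Log every iteration (defined first so the graph can reference it)
def agent_fn(state: AgentState) -> dict:
    print(f"Iteration {state['iter']}: {state['messages'][-1]}")
    ...  # call the model and tools here
    return {"iter": state["iter"] + 1}  # without this increment the cap never fires

# 2. Add an iteration cap
graph = StateGraph(AgentState)
graph.add_node("agent", agent_fn)
graph.add_edge(START, "agent")
graph.add_conditional_edges(
    "agent",
    lambda s: "end" if s["iter"] > 10 else "agent",
    {"end": END, "agent": "agent"},  # map router labels to real nodes
)
app = graph.compile()  # invoke with {"messages": [...], "iter": 0}

# 3. Check tool returns
@tool
def my_tool(query: str) -> str:
    """Placeholder tool; logs what it returns on every call."""
    result = f"stub result for: {query}"
    print(f"Tool returned: {result}")
    return result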
  • Common causes: no iteration cap, retrying the same failed approach, a tool that returns the same error every time, a termination condition that never fires.
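
An iteration cap stops a runaway loop but does not say why it happened. One hedged way to surface the "same tool call every time" cause is to count identical calls. `RepeatGuard` below is a hypothetical helper, not a LangGraph feature, and the default threshold of three repeats is an arbitrary choice.

```python
import json
from collections import Counter


class RepeatGuard:
    """Flags when the same tool call (name + arguments) keeps recurring."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def record(self, tool_name: str, args: dict) -> bool:
        """Return True once this exact call has been seen max_repeats times."""
        # Serialize with sorted keys so argument order does not hide a repeat.
        key = (tool_name, json.dumps(args, sort_keys=True, default=str))
        self.seen[key] += 1
        return self.seen[key] >= self.max_repeats
```

The intended use is to call `guard.record(name, args)` before each tool call. When it returns True, end the run or add a message telling the model to try a different approach.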

Drill 6 — Make RAG production-ready

  • Add retries with exponential backoff.
  • Add streaming output.
  • Add eval suite with 10 gold queries.
  • Log prompt, response, tokens, and latency.
  • Add source citation in output.
  • Refuse when context lacks the answer.
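
Three items from the checklist above (backoff retries, source citation, and refusal) can be sketched with the standard library alone. Everything named here is an assumption for illustration: `with_backoff`, `build_prompt`, the `(source, text, score)` chunk shape, the `min_score` threshold, and the refusal wording. A real system would plug in its own retriever, error types, and prompt.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

REFUSAL = "I can't answer that from the provided documents."


def with_backoff(
    fn: Callable[[], T],
    retryable: tuple = (TimeoutError, ConnectionError),
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable errors with exponentially growing, jittered waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Full jitter: wait a random time up to base_delay * 2**attempt.
            sleep(random.uniform(0, base_delay * 2**attempt))
    raise RuntimeError("unreachable")


def build_prompt(question: str, chunks: list, min_score: float = 0.3) -> Optional[str]:
    """Return a citation-ready prompt, or None when no chunk clears min_score.

    chunks is a list of (source, text, score) tuples from a hypothetical retriever.
    """
    relevant = [c for c in chunks if c[2] >= min_score]
    if not relevant:
        return None  # caller should reply with REFUSAL instead of calling the model
    context = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text, _score) in enumerate(relevant, 1)
    )
    return (
        "Answer using only the numbered sources below and cite them as [n]. "
        f"If they do not contain the answer, reply exactly: {REFUSAL}\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Refusing before the model call when retrieval comes back empty saves a request and removes one path to an unsupported answer. The instruction inside the prompt covers the case where chunks were retrieved but do not contain the answer.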

Setup checklist

Before any interview day

  • Re-read your own repos.
  • Skim the company's engineering content.
  • Sleep, water, camera, backup device, clean environment.
  • Leave 30 minutes after each round to write notes.

Before any coding round

  • IDE ready.
  • Anthropic / OpenAI client installed.
  • Sample API key configured if live calls are allowed.
  • LangGraph + LangChain installed.
  • Pydantic available.
  • Water / coffee / restroom done.

During live coding

| Time | Activity |
|---|---|
| First 2 min | Restate problem; confirm understanding |
| Next 5 min | Discuss approach out loud; get a nod |
| Next 25 min | Code; narrate decisions |
| Last 5 min | Run it; discuss what you'd add with more time |

After every round

  • Send thank-you note within 24 hours.
  • Record what you did not know.
  • Feed that into next week's prep.
  • Track the loop in Portfolio.xlsx.

Anti-patterns

Take-home

  • Over-engineering.
  • Skipping the README.
  • Claiming production readiness after 6 hours.
  • Adding features outside scope.
  • Using an unfamiliar framework under time pressure.

Live coding

  • Coding silently.
  • Chasing a perfect solution before you have a working one.
  • Premature optimization.
  • Ignoring evals.
  • Pasting memorized snippets without explanation.

General loop

  • Rehearsing only by reading, never practicing aloud.
  • Skipping mock interviews.
  • Forgetting to debrief.
  • Claiming framework depth you do not have.