03. Interview Rounds¶

Round types overview¶

Round	Typical ask	What to show
Take-home / project	Build an agent, RAG system, eval harness, or production-shaped workflow in 4-6 hours	Working code, clear README, one explicit architecture choice, at least one eval, trade-offs
System design	Design support agent, doc-Q&A, code-review agent, scheduling workflow, moderation system	Problem framing, retrieval/prompt/routing choices, eval plan, failure modes, latency/cost budget, production additions
Behavioral / deep dive	Translate past work into senior AI signal	Numbers, ownership, architecture choices, incidents, trade-offs
Technical / coding	Build a small AI feature live; write robust API-calling code	Clear approach, practical Python, error handling, structured output, fast iteration
Framework-specific	LangGraph, Anthropic SDK, or framework internals	Honest depth on tools listed on resume

What AI interviews test more than classic SWE loops¶

System design for AI workflows.
Eval thinking.
Failure modes and reliability.
Judgment on tools, retrieval, prompting, and cost.
Leadership signal for Lead-tier roles.

Prep ramp schedule¶

By round type¶

Round	Best prep
Take-home	Practice 2-3 time-boxed mock take-homes
System design	Talk through 5-10 designs out loud
Behavioral	Memorize 3-4 strong STAR stories with numbers
Coding	Practice API-calling code and structured outputs quickly
Framework	Read source docs / abstractions for every framework claimed

Weekly ramp¶

Week	Focus
8	Drill 1 + Drill 4
9	Drill 2 + Drill 5; pre-write STAR stories
10	Drill 3; mock 1 system design + 3 behavioral rounds
11	Drill 6; mock 2 system designs; Lead-specific Qs
12	Mix drills; record yourself; mock 1 full loop
13	Active interviews with tight feedback loop

Take-home patterns¶

What hiring managers care about¶

Something working.
Evals included.
Clear README.
Explicit trade-offs.
Readable code.
Defensible architecture decisions.

Format A — Build an X that does Y¶

Examples:
Chatbot over a small docs corpus.
RAG system over provided data.
Agent using 2-3 tools.
Senior move:
Add EVAL.md with 10-20 sample queries, expected outputs, actual outputs, pass/fail.

Format B — Improve existing code¶

Run their code first.
Establish baseline.
Find 3 issues.
Fix the 1-2 highest-impact issues.
Show before/after metrics when possible.

Format C — Design + implement one critical piece¶

Spend the first hour on the design doc.
Implement the most judgment-heavy component.
Do not try to build the whole system.
Add an explicit trade-offs section.

README template¶

# [Project Name]

## What this does
- [1-2 bullets]

## Architecture
- [Diagram or 5 bullets]

## How to run
- [Commands]

## Decisions and trade-offs
- [Decision 1]: chose X because Y; trade-off: Z.
- [Decision 2]: ...

## What I'd add for production
- [5-7 bullets]

## Eval results
- [Link to EVAL.md]

## Time spent
- ~5 hours

EVAL template¶

# Eval Report

## Methodology
- [How the eval was set up]

## Gold queries (10-20)
| Query | Expected | Actual | Pass/Fail |
|---|---|---|---|

## Metrics
| Metric | Score |
|---|---|

## Failure modes identified
1. [Cluster 1]
2. [Cluster 2]

## Hypothesized fixes
1. [Fix 1]
2. [Fix 2]

## What I'd do differently with more time
- [Bullet list]

6-hour take-home breakdown¶

Time	Activity
0:00-0:30	Read prompt 3x; decide architecture; sketch
0:30-1:00	Set up project; skeleton commits
1:00-3:00	Build the core feature end-to-end
3:00-3:30	Break
3:30-4:30	Add evals; run; capture results
4:30-5:30	Polish README, EVAL.md, comments
5:30-6:00	Final review; commit; push; send

If the architecture turns out to be wrong by hour 4, pivot to a simpler version you can still ship.

Sample take-homes to practice¶

Chatbot over awesome-llm-apps README files.
Agent with 2 tools: web search + calculator.
Structured-output extractor for job postings.
Code-review agent over a small JS/Python repo.

Coding drills¶

Drill 1 — Streaming chatbot¶

import anthropic

client = anthropic.Anthropic()

def chat_stream(message: str):
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
    print()

while True:
    msg = input("> ")
    if msg == "quit":
        break
    chat_stream(msg)

Extend with conversation history, a system prompt, token counting, and rate-limit retry.
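Of the extensions listed above, rate-limit retry is the easiest to sketch without touching the SDK. The helper below is a minimal sketch, not a reference solution: `call_with_backoff`, its parameters, and the injectable `sleep` are hypothetical names introduced here. In the drill, `retry_on` would be the SDK's rate-limit exception (for the Anthropic Python SDK, `anthropic.RateLimitError`). The client also retries some errors on its own through its `max_retries` setting, so check that before layering a second loop on top.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2**attempt)
            # Small random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. In the chat loop it could wrap the call as `call_with_backoff(lambda: chat_stream(msg), retry_on=(anthropic.RateLimitError,))`.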

Drill 2 — Structured output with Pydantic¶

from anthropic import Anthropic
from pydantic import BaseModel

class JobPosting(BaseModel):
    company: str
    role: str
    salary_min: int | None
    salary_max: int | None
    must_haves: list[str]
    nice_to_haves: list[str]


def extract_job(text: str) -> JobPosting:
    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=[{
            "name": "extract_job_posting",
            "description": "Extract structured job posting fields from text",
            "input_schema": JobPosting.model_json_schema(),
        }],
        tool_choice={"type": "tool", "name": "extract_job_posting"},
        messages=[{"role": "user", "content": f"Extract from:\n\n{text}"}],
    )
    tool_use = next(block for block in response.content if block.type == "tool_use")
    return JobPosting(**tool_use.input)

Extend with graceful missing-field handling, validation-retry, and async batch processing.
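The validation-retry extension can be sketched independently of the SDK. The function below is an illustrative sketch under stated assumptions: `extract_with_retry`, `call_model`, and `parse` are hypothetical names, not part of Pydantic or the Anthropic SDK. In the drill, `call_model` would wrap the `client.messages.create` call and append the feedback string to the user message, and `parse` could be `JobPosting.model_validate`. Pydantic's `ValidationError` subclasses `ValueError`, so it would be caught by the handler here.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def extract_with_retry(
    call_model: Callable[[Optional[str]], dict],
    parse: Callable[[dict], T],
    max_attempts: int = 3,
) -> T:
    """Ask the model, validate, and on failure re-ask with the error as feedback."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        raw = call_model(feedback)  # feedback is None on the first attempt
        try:
            return parse(raw)
        except ValueError as exc:
            # Pass the validation error back so the next attempt can correct it.
            feedback = f"The previous output failed validation: {exc}"
    raise ValueError(f"No valid output after {max_attempts} attempts. {feedback}")
```

Capping the attempts matters in an interview setting: an unbounded retry loop hides a bad schema or prompt and burns tokens while doing it.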

Drill 3 — Tool-using agent¶

from langgraph.prebuilt import create_react_agent
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

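@tool
def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression and return the result as a string."""
    try:
        # Demo only: eval with empty builtins is not a real sandbox.
        return str(eval(expression, {"__builtins__": {}}))
    except Exception as exc:
        return f"Error: {exc}"

# @tool needs a docstring (or an explicit description); the model sees it as the tool description.
@tool
def web_search(query: str) -> str:
    """Search the web for a query (mocked for this drill)."""
    return f"Mocked search results for: {query}"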

llm = ChatAnthropic(model="claude-sonnet-4-6")
agent = create_react_agent(llm, tools=[calculator, web_search])

result = agent.invoke({
    "messages": [{"role": "user", "content": "What's the GDP of India in 2024 plus 100?"}]
})
print(result["messages"][-1].content)

Extend with a third tool, max-iteration cap, tracing, and streaming.
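The calculator tool in this drill passes model-written text to `eval`, which an interviewer is likely to ask about. One possible alternative, sketched here with only the standard library, walks the parsed expression and allows only numbers and basic arithmetic. `safe_eval` and the operator tables are hypothetical names introduced for illustration. Exponentiation is left out on purpose because expressions such as `9**9**9` can stall the process.

```python
import ast
import operator

# Whitelisted operators; anything else (names, calls, attributes) is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str):
    """Evaluate +, -, *, /, % on numeric literals; raise ValueError otherwise."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))
```

Inside the tool, the body would become `return str(safe_eval(expression))`, with the existing `except Exception` still turning errors into a message the agent can read.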

Drill 4 — Quick eval harness¶

gold_set = [
    {"query": "What's the capital of France?", "expected": "Paris"},
    {"query": "Calculate 23 * 41", "expected": "943"},
]


def evaluate(agent, gold_set):
    results = []
    for item in gold_set:
        response = agent.invoke({"messages": [{"role": "user", "content": item["query"]}]})
        actual = response["messages"][-1].content
        passed = item["expected"].lower() in actual.lower()
        results.append({
            "query": item["query"],
            "expected": item["expected"],
            "actual": actual,
            "passed": passed,
        })
    return results

results = evaluate(agent, gold_set)
print(f"Pass rate: {sum(r['passed'] for r in results) / len(results):.1%}")

Extend with semantic similarity, failure categorization, and markdown-table output.
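The markdown-table extension is small enough to sketch in full. The helper below assumes the `results` list produced by `evaluate` above (dicts with `query`, `expected`, `actual`, `passed`) and emits the same columns as the EVAL template. `to_markdown_table` and `_escape` are hypothetical names introduced here.

```python
from typing import Dict, List


def _escape(value) -> str:
    """Keep a cell on one line and stop pipes from breaking the table."""
    return str(value).replace("|", "\\|").replace("\n", " ")


def to_markdown_table(results: List[Dict]) -> str:
    """Render eval results as a markdown table for EVAL.md."""
    lines = [
        "| Query | Expected | Actual | Pass/Fail |",
        "|---|---|---|---|",
    ]
    for r in results:
        cells = [
            r["query"],
            r["expected"],
            r["actual"],
            "Pass" if r["passed"] else "Fail",
        ]
        lines.append("| " + " | ".join(_escape(c) for c in cells) + " |")
    return "\n".join(lines)
```

Writing the string to `EVAL.md` after each run gives the take-home reviewer the gold-query table without any manual copying.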

Drill 5 — Debug an agent that loops forever¶

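from typing import TypedDict

from langchain_core.tools import tool
from langgraph.graph import END, START, StateGraph


class AgentState(TypedDict):
    messages: list
    iter: int


MAX_ITERATIONS = 10


# 2. Log every iteration, and increment the counter the cap depends on
def agent_fn(state: AgentState) -> dict:
    print(f"Iteration {state['iter']}: {state['messages'][-1]}")
    ...  # call the model / tools here
    return {"iter": state["iter"] + 1}


# 1. Add an iteration cap
graph = StateGraph(AgentState)
graph.add_node("agent", agent_fn)
graph.add_edge(START, "agent")
graph.add_conditional_edges(
    "agent",
    # Route to END once the cap is hit; otherwise loop back to the agent.
    lambda s: END if s["iter"] >= MAX_ITERATIONS else "agent",
)


# 3. Check tool returns
@tool
def my_tool(query: str) -> str:
    """Placeholder tool; replace the body with the real call."""
    result = ...  # the real tool call goes here
    print(f"Tool returned: {result}")
    return result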

Common causes: no iteration cap, the agent retrying the same failed approach, a tool returning the same error every time, and a termination condition that never fires.
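Two of the causes above, a repeated failed approach and the same tool error every time, usually show up as identical tool calls in the trace. A minimal sketch of a detector follows, assuming the calls have already been collected as `(tool_name, args)` pairs. `find_repeated_calls` and the `threshold` parameter are hypothetical names, not a LangGraph API.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


def find_repeated_calls(
    calls: List[ToolCall], threshold: int = 3
) -> List[Tuple[str, Dict[str, Any], int]]:
    """Return (tool, args, count) for every identical call seen >= threshold times."""
    # Serialize args with sorted keys so equal dicts produce the same key.
    counts = Counter((name, json.dumps(args, sort_keys=True)) for name, args in calls)
    return [
        (name, json.loads(key), n)
        for (name, key), n in counts.items()
        if n >= threshold
    ]
```

A non-empty result is a reasonable trigger for stopping the run early or for injecting a message that tells the agent to try a different approach.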

Drill 6 — Make RAG production-ready¶

Add retries with exponential backoff.
Add streaming output.
Add eval suite with 10 gold queries.
Log prompt, response, tokens, and latency.
Add source citation in output.
Refuse when context lacks the answer.
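The last two items, source citation and refusal, can be handled partly before the model is called. The sketch below assumes the retriever returns scored chunks. The `Chunk` dataclass, `build_grounded_prompt`, the `REFUSAL` text, and the `min_score` threshold of 0.5 are all hypothetical choices for illustration, and a real threshold would need tuning against the gold queries.

```python
from dataclasses import dataclass
from typing import List, Optional

REFUSAL = "I could not find that in the provided documents."


@dataclass
class Chunk:
    source: str   # file name or URL shown in the citation
    text: str
    score: float  # retriever similarity; higher means more relevant


def build_grounded_prompt(
    question: str, chunks: List[Chunk], min_score: float = 0.5
) -> Optional[str]:
    """Build a cite-your-sources prompt, or return None when nothing relevant was retrieved."""
    relevant = [c for c in chunks if c.score >= min_score]
    if not relevant:
        return None  # caller should answer with REFUSAL and skip the model call
    context = "\n\n".join(
        f"[{i}] ({c.source})\n{c.text}" for i, c in enumerate(relevant, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "Cite sources as [n] after each claim. "
        f'If the sources do not contain the answer, reply exactly: "{REFUSAL}"\n\n'
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```

Returning `None` instead of calling the model gives a cheap, deterministic refusal path, while the instruction inside the prompt covers the case where chunks were retrieved but do not actually answer the question.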

Setup checklist¶

Before any interview day¶

Re-read your own repos.
Skim the company's engineering content.
Sleep, water, camera, backup device, clean environment.
Leave 30 minutes after each round to write notes.

Before any coding round¶

IDE ready.
Anthropic / OpenAI client installed.
Sample API key configured if live calls are allowed.
LangGraph + LangChain installed.
Pydantic available.
Water / coffee / restroom done.

During live coding¶

Time	Activity
First 2 min	Restate problem; confirm understanding
Next 5 min	Discuss approach out loud; get a nod
Next 25 min	Code; narrate decisions
Last 5 min	Run it; discuss what you'd add with more time

After every round¶

Send a thank-you note within 24 hours.
Record what you did not know.
Feed that into next week's prep.
Track the loop in Portfolio.xlsx.

Anti-patterns¶

Take-home¶

Over-engineering.
Skipping the README.
Claiming production readiness after 6 hours.
Adding features outside scope.
Using an unfamiliar framework under time pressure.

Live coding¶

Coding silently.
Chasing a perfect solution before you have a working one.
Premature optimization.
Ignoring evals.
Pasting memorized snippets without explanation.

General loop¶

Rehearsing only by reading instead of practicing aloud.
Skipping mock interviews.
Forgetting to debrief.
Claiming framework depth you do not have.