09. Career
03. Interview Rounds
Round types overview
| Round | Typical ask | What to show |
|---|---|---|
| Take-home / project | Build an agent, RAG system, eval harness, or production-shaped workflow in 4-6 hours | Working code, clear README, one explicit architecture choice, at least one eval, trade-offs |
| System design | Design support agent, doc-Q&A, code-review agent, scheduling workflow, moderation system | Problem framing, retrieval/prompt/routing choices, eval plan, failure modes, latency/cost budget, production additions |
| Behavioral / deep dive | Translate past work into senior AI signal | Numbers, ownership, architecture choices, incidents, trade-offs |
| Technical / coding | Build a small AI feature live; write robust API-calling code | Clear approach, practical Python, error handling, structured output, fast iteration |
| Framework-specific | LangGraph, Anthropic SDK, or framework internals | Honest depth on tools listed on resume |
What AI interviews test more than classic SWE loops
System design for AI workflows.
Eval thinking.
Failure modes and reliability.
Judgment on tools, retrieval, prompting, and cost.
Leadership signal for Lead-tier roles.
Prep ramp schedule
By round type
| Round | Best prep |
|---|---|
| Take-home | Practice 2-3 time-boxed mock take-homes |
| System design | Talk through 5-10 designs out loud |
| Behavioral | Memorize 3-4 strong STAR stories with numbers |
| Coding | Practice API-calling code and structured outputs quickly |
| Framework | Read source docs / abstractions for every framework claimed |
Weekly ramp
| Week | Focus |
|---|---|
| 8 | Drill 1 + Drill 4 |
| 9 | Drill 2 + Drill 5; pre-write STAR stories |
| 10 | Drill 3; mock 1 system design + 3 behavioral rounds |
| 11 | Drill 6; mock 2 system designs; Lead-specific Qs |
| 12 | Mix drills; record yourself; mock 1 full loop |
| 13 | Active interviews with tight feedback loop |
Take-home patterns
What hiring managers care about
Something working.
Evals included.
Clear README.
Explicit trade-offs.
Readable code.
Defensible architecture decisions.
Examples:
Chatbot over a small docs corpus.
RAG system over provided data.
Agent using 2-3 tools.
Senior move:
Add EVAL.md with 10-20 sample queries, expected outputs, actual outputs, pass/fail.
Run their code first.
Establish baseline.
Find 3 issues.
Fix the 1-2 highest-impact issues.
Show before/after metrics when possible.
Spend the first hour on the design doc.
Implement the most judgment-heavy component.
Do not try to build the whole system.
Add an explicit trade-offs section.
README template
# [Project Name]
## What this does
- [1-2 bullets]
## Architecture
- [Diagram or 5 bullets]
## How to run
- [Commands]
## Decisions and trade-offs
- [Decision 1]: chose X because Y; trade-off: Z.
- [Decision 2]: ...
## What I'd add for production
- [5-7 bullets]
## Eval results
- [Link to EVAL.md]
## Time spent
- ~5 hours
EVAL template
# Eval Report
## Methodology
- [How the eval was set up]
## Gold queries (10-20)
| Query | Expected | Actual | Pass/Fail |
|---|---|---|---|
## Metrics
| Metric | Score |
|---|---|
## Failure modes identified
1. [Cluster 1]
2. [Cluster 2]
## Hypothesized fixes
1. [Fix 1]
2. [Fix 2]
## What I'd do differently with more time
- [Bullet list]
6-hour take-home breakdown
| Time | Activity |
|---|---|
| 0:00-0:30 | Read prompt 3x; decide architecture; sketch |
| 0:30-1:00 | Set up project; skeleton commits |
| 1:00-3:00 | Build the core feature end-to-end |
| 3:00-3:30 | Break |
| 3:30-4:30 | Add evals; run; capture results |
| 4:30-5:30 | Polish README, EVAL.md, comments |
| 5:30-6:00 | Final review; commit; push; send |
If the architecture turns out to be wrong by hour 4, pivot to a simpler version you can still ship.
Sample take-homes to practice
Chatbot over awesome-llm-apps README files.
Agent with 2 tools: web search + calculator.
Structured-output extractor for job postings.
Code-review agent over a small JS/Python repo.
Coding drills
Drill 1 — Streaming chatbot
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def chat_stream(message: str):
    # Stream the reply and print text chunks as they arrive.
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": message}],
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
    print()


# Simple REPL: type "quit" to exit.
while True:
    msg = input("> ")
    if msg == "quit":
        break
    chat_stream(msg)
Extend with conversation history, a system prompt, token counting, and rate-limit retry.
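One way to approach the conversation-history and system-prompt extensions is to keep the message list in a small helper and trim old turns before each call. The sketch below is only an illustration: `Conversation`, `max_turns`, and `as_request_kwargs` are hypothetical names, and dropping the oldest user/assistant pairs is just one simple trimming policy. It makes no API call itself; the returned dict is meant to be passed as keyword arguments to `client.messages.stream(...)` alongside `model` and `max_tokens`.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Holds a system prompt plus alternating user/assistant messages."""

    system: str
    max_turns: int = 10  # hypothetical cap on stored user/assistant pairs
    messages: list[dict] = field(default_factory=list)

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})
        self._trim()

    def _trim(self) -> None:
        # Drop the oldest messages in whole pairs so the list
        # still starts with a user message.
        excess = len(self.messages) - 2 * self.max_turns
        if excess > 0:
            excess += excess % 2  # round up to an even count
            del self.messages[:excess]

    def as_request_kwargs(self) -> dict:
        # Copy the list so later mutation does not affect an in-flight request.
        return {"system": self.system, "messages": list(self.messages)}
```

In the drill's loop, this would mean calling `add_user(msg)`, streaming with `**conv.as_request_kwargs()`, collecting the streamed text, and then calling `add_assistant(...)` with it. Trimming by turn count is cruder than trimming by token count, but it is quick to explain in a live round.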
Drill 2 — Structured output with Pydantic
from anthropic import Anthropic
from pydantic import BaseModel


class JobPosting(BaseModel):
    company: str
    role: str
    salary_min: int | None
    salary_max: int | None
    must_haves: list[str]
    nice_to_haves: list[str]


def extract_job(text: str) -> JobPosting:
    client = Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        # Expose the Pydantic schema as a tool and force the model to call it.
        tools=[{
            "name": "extract_job_posting",
            "description": "Extract structured job posting fields from text",
            "input_schema": JobPosting.model_json_schema(),
        }],
        tool_choice={"type": "tool", "name": "extract_job_posting"},
        messages=[{"role": "user", "content": f"Extract from:\n\n{text}"}],
    )
    # The forced tool call carries the structured fields in its input.
    tool_use = next(block for block in response.content if block.type == "tool_use")
    return JobPosting(**tool_use.input)
Extend with graceful missing-field handling, validation-retry, and async batch processing.
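The validation-retry extension can be sketched without any SDK by injecting the model call and the validator as plain functions. Everything named here (`extract_with_retry`, `call_model`, `validate`, `max_attempts`) is hypothetical. In the drill, `call_model` would wrap the Anthropic request and `validate` could wrap `JobPosting.model_validate`, re-raising its validation error as `ValueError`.

```python
import json
from typing import Callable


def extract_with_retry(
    call_model: Callable[[str], str],
    validate: Callable[[dict], dict],
    text: str,
    max_attempts: int = 3,
) -> dict:
    """Ask for JSON, validate it, and feed validation errors back on failure."""
    prompt = f"Extract from:\n\n{text}"
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return validate(json.loads(raw))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
            # Append the error so the next attempt can correct itself.
            prompt = (
                f"{prompt}\n\nPrevious output was invalid: {exc}. "
                "Return corrected JSON only."
            )
    raise ValueError(f"No valid output after {max_attempts} attempts") from last_error
```

Capping attempts keeps cost bounded, and echoing the validation message back to the model tends to be more useful than retrying the same prompt unchanged.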
Drill 3 — Tool-using agent with LangGraph
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent


@tool
def calculator(expression: str) -> str:
    """Evaluate an arithmetic expression."""
    # NOTE: eval() with empty builtins is not a real sandbox.
    # Acceptable for a drill; use a proper parser before shipping.
    try:
        return str(eval(expression, {"__builtins__": {}}))
    except Exception as exc:
        return f"Error: {exc}"


@tool
def web_search(query: str) -> str:
    """Search the web (mocked for the drill)."""
    return f"Mocked search results for: {query}"


llm = ChatAnthropic(model="claude-sonnet-4-6")
agent = create_react_agent(llm, tools=[calculator, web_search])
result = agent.invoke({
    "messages": [{"role": "user", "content": "What's the GDP of India in 2024 plus 100?"}]
})
print(result["messages"][-1].content)
Extend with a third tool, max-iteration cap, tracing, and streaming.
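Because the calculator tool receives model-generated input, one hardening step worth discussing is replacing `eval` with a whitelist over the parsed syntax tree. The following is a minimal sketch using only the standard library `ast` module; `safe_calculate` is a hypothetical name, and the supported operator set (`+`, `-`, `*`, `/`, unary signs) is an assumption chosen for the drill.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / on numeric literals without calling eval()."""

    def _eval(node: ast.AST) -> float:
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. all end up here.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

Exponentiation is left out on purpose, since an expression like `9**9**9` can consume large amounts of CPU and memory. Inside the tool, `SyntaxError`, `ValueError`, and `ZeroDivisionError` would still need to be caught and returned as an error string.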
Drill 4 — Quick eval harness
gold_set = [
    {"query": "What's the capital of France?", "expected": "Paris"},
    {"query": "Calculate 23 * 41", "expected": "943"},
]


def evaluate(agent, gold_set):
    results = []
    for item in gold_set:
        response = agent.invoke({"messages": [{"role": "user", "content": item["query"]}]})
        actual = response["messages"][-1].content
        # Case-insensitive substring match: crude, but fast to write live.
        passed = item["expected"].lower() in actual.lower()
        results.append({
            "query": item["query"],
            "expected": item["expected"],
            "actual": actual,
            "passed": passed,
        })
    return results


results = evaluate(agent, gold_set)
print(f"Pass rate: {sum(r['passed'] for r in results) / len(results):.1%}")
Extend with semantic similarity, failure categorization, and markdown-table output.
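The failure-categorization and markdown-table extensions might look like the sketch below. It assumes the result dicts have the same keys that `evaluate` produces above (`query`, `expected`, `actual`, `passed`). The bucket names and the `categorize` and `to_markdown` helpers are hypothetical, and the extra Category column goes beyond the four columns in the EVAL template.

```python
def categorize(result: dict) -> str:
    """Assign a coarse failure bucket; the buckets here are illustrative."""
    if result["passed"]:
        return "pass"
    actual = result["actual"].strip()
    if not actual:
        return "empty_response"
    if actual.lower().startswith("error"):
        return "tool_or_runtime_error"
    return "wrong_answer"


def to_markdown(results: list[dict]) -> str:
    """Render results as a table that can be pasted into EVAL.md."""

    def cell(value: str) -> str:
        # Escape pipes and flatten newlines so each result stays on one row.
        return value.replace("|", "\\|").replace("\n", " ")

    lines = [
        "| Query | Expected | Actual | Pass/Fail | Category |",
        "|---|---|---|---|---|",
    ]
    for r in results:
        status = "Pass" if r["passed"] else "Fail"
        lines.append(
            f"| {cell(r['query'])} | {cell(r['expected'])} | "
            f"{cell(r['actual'])} | {status} | {categorize(r)} |"
        )
    return "\n".join(lines)
```

Grouping failures into even two or three buckets makes the "Failure modes identified" section of an eval report much quicker to write.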
Drill 5 — Debug an agent that loops forever
from langchain_core.tools import tool
from langgraph.graph import END, StateGraph

# Sketch only: `...` marks elided code, and in a real file agent_fn (step 2)
# must be defined before the graph that references it.

# 1. Add max_iterations
graph = StateGraph(...)
graph.add_node("agent", agent_fn)
graph.add_conditional_edges(
    "agent",
    # Route to END once the cap is hit; agent_fn must increment state["iter"].
    lambda s: END if s["iter"] > 10 else "agent",
)


# 2. Log every iteration
def agent_fn(state):
    print(f"Iteration {state['iter']}: {state['messages'][-1]}")
    ...


# 3. Check tool returns
@tool
def my_tool(...) -> str:
    result = ...
    print(f"Tool returned: {result}")
    return result
Common causes: no cap, repeated failed approach, same tool error every time, broken termination condition.
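Two of the causes listed above, a missing cap and the same failing tool call repeated, can be caught by a small framework-independent guard. This is a sketch under stated assumptions: `LoopGuard`, `check`, and the default thresholds are hypothetical, and the guard is meant to be called once per tool invocation from whatever loop or graph node drives the agent.

```python
from collections import Counter
from typing import Optional


class LoopGuard:
    """Stops an agent loop on an iteration cap or on repeated identical tool calls."""

    def __init__(self, max_iterations: int = 10, max_repeats: int = 3):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.iterations = 0
        self.calls: Counter = Counter()

    def check(self, tool_name: str, tool_args: dict) -> Optional[str]:
        """Return a stop reason, or None if the loop may continue."""
        self.iterations += 1
        if self.iterations > self.max_iterations:
            return "max_iterations"
        # Sort the args so the same call produces the same key
        # regardless of dict ordering.
        key = (tool_name, tuple(sorted((k, repr(v)) for k, v in tool_args.items())))
        self.calls[key] += 1
        if self.calls[key] >= self.max_repeats:
            return "repeated_call"
        return None
```

Returning a reason string instead of raising lets the caller log why the run stopped and hand the user a partial answer.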
Drill 6 — Make RAG production-ready
Add retries with exponential backoff.
Add streaming output.
Add eval suite with 10 gold queries.
Log prompt, response, tokens, and latency.
Add source citation in output.
Refuse when context lacks the answer.
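The first item in this list, retries with exponential backoff, can be sketched with the standard library alone. `retry_with_backoff` and its parameters are hypothetical names, and the default `retry_on` tuple is a placeholder. For live Anthropic calls it would more likely include the SDK's rate-limit exception, and the SDK client also has its own `max_retries` setting that may already cover simple cases.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: tuple = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the listed exceptions with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting, which is a cheap way to show testing discipline in a take-home.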
Setup checklist
Before any interview day
Re-read your own repos.
Skim the company's engineering content.
Sleep, water, camera, backup device, clean environment.
Leave 30 minutes after each round to write notes.
Before any coding round
IDE ready.
Anthropic / OpenAI client installed.
Sample API key configured if live calls are allowed.
LangGraph + LangChain installed.
Pydantic available.
Water / coffee / restroom done.
During live coding
| Time | Activity |
|---|---|
| First 2 min | Restate problem; confirm understanding |
| Next 5 min | Discuss approach out loud; get a nod |
| Next 25 min | Code; narrate decisions |
| Last 5 min | Run it; discuss what you'd add with more time |
After every round
Send thank-you note within 24 hours.
Record what you did not know.
Feed that into next week's prep.
Track the loop in Portfolio.xlsx.
Anti-patterns
Take-home
Over-engineering.
Skipping the README.
Claiming production readiness after 6 hours.
Adding features outside scope.
Using an unfamiliar framework under time pressure.
Live coding
Coding silently.
Chasing a perfect solution before you have a working one.
Premature optimization.
Ignoring evals.
Pasting memorized snippets without explanation.
General loop
Rehearsing by reading only, never practicing aloud.
Skipping mock interviews.
Forgetting to debrief.
Claiming framework depth you do not have.