03. FastAPI Basics: clean routes for messy AI work

~15 min read. FastAPI gives the front desk shape, validation, and safe request handling.

Built on the ELI5 in 00-eli5.md. The front desk — the entry point for each request — must label the order ticket correctly before cooking starts.

What FastAPI actually gives you

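Picture the request path first. A client sends JSON. FastAPI matches a route. It validates input and resolves dependencies. Only then does it call your handler. The diagram draws validation and dependencies as separate boxes for clarity; what matters is that both finish before your handler runs.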

HTTP request
     │
     ▼
┌──────────────┐
│  front desk  │  route match: /chat
└──────┬───────┘
       ▼
┌──────────────┐
│ Pydantic     │  validate body and query params
└──────┬───────┘
       ▼
┌──────────────┐
│ dependencies │  auth, db session, clients
└──────┬───────┘
       ▼
┌──────────────┐
│ handler      │  build response or stream
└──────────────┘

See. FastAPI is not just a router. It is a contract system. It turns messy input into typed Python objects. That matters a lot for AI products. Prompts, model names, tool settings, and metadata should not drift silently.

The front desk must reject malformed orders early. If max_tokens is a string, if messages is missing, if temperature is outside policy, we should fail cleanly. Not deep inside model code. Simple, no?
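
The three failure cases above can be sketched with Pydantic alone. This is a minimal sketch, assuming Pydantic v2; CompletionRequest, its bounds, and the failing_fields helper are hypothetical names invented for illustration. One detail worth knowing: by default Pydantic coerces a numeric string like "256" into an integer, so the sketch marks max_tokens as strict to get the rejection the paragraph asks for.

```python
from pydantic import BaseModel, Field, ValidationError


class CompletionRequest(BaseModel):
    # Hypothetical model; field names follow the paragraph above.
    messages: list[str] = Field(min_length=1)
    # strict=True stops Pydantic from quietly turning "256" into 256.
    max_tokens: int = Field(default=256, strict=True, gt=0, le=4096)
    # The 0.0 to 1.0 range stands in for "policy"; pick your own bounds.
    temperature: float = Field(default=0.2, ge=0.0, le=1.0)


def failing_fields(payload: dict) -> list[str]:
    """Return the sorted names of the fields that failed validation."""
    try:
        CompletionRequest.model_validate(payload)
    except ValidationError as exc:
        return sorted(str(err["loc"][0]) for err in exc.errors())
    return []


# messages is missing, max_tokens is a string, temperature is out of range.
bad_payload = {"max_tokens": "256", "temperature": 3.0}
print(failing_fields(bad_payload))
print(failing_fields({"messages": ["hi"]}))
```

All three problems are reported in one pass, at the counter, before any model code is involved.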

Routes, models, and dependencies in one worked example

Let us build a tiny chat endpoint. Picture it before code.

client JSON
   │
   ▼
ChatRequest model
   │
   ├── validated fields
   ├── default values
   └── shape guaranteed
   │
   ▼
chat handler
   │
   ├── auth dependency
   ├── llm client dependency
   └── returns ChatResponse

Now the code.

from fastapi import Depends, FastAPI, Header, HTTPException
from pydantic import BaseModel, Field

app = FastAPI()

# Request contract: anything outside these bounds is rejected with a 422.
class ChatRequest(BaseModel):
    message: str = Field(min_length=1, max_length=4000)
    model: str = Field(default="gpt-4o-mini")
    temperature: float = Field(default=0.2, ge=0.0, le=1.0)

# Response contract: only these fields go back to the client.
class ChatResponse(BaseModel):
    answer: str
    model: str

# Shared auth check. Reads the x-api-key header; the hardcoded key is demo only.
async def require_api_key(x_api_key: str = Header()) -> str:
    if x_api_key != "demo-key":
        raise HTTPException(status_code=401, detail="bad key")
    return x_api_key

@app.post("/chat", response_model=ChatResponse)
async def chat(
    body: ChatRequest,
    _: str = Depends(require_api_key),  # resolved before the handler body runs
) -> ChatResponse:
    # Placeholder logic: echo the message instead of calling a model.
    return ChatResponse(answer=f"Echo: {body.message}", model=body.model)

Now step through it. ChatRequest is the shape of the incoming order ticket. If the payload is wrong, FastAPI returns a 422 response whose body lists each field that failed, and your handler never runs. That is good. The bug stops at the counter.

Depends(require_api_key) is how we plug shared checks into the route. Auth is one example. Database session creation is another. Rate-limit context, tracing context, and LLM clients also fit well.

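response_model=ChatResponse is underrated. FastAPI validates whatever the handler returns against that model and drops any field the model does not declare, so the output shape stays stable. Your pass window should not leak accidental internal fields. That matters for AI billing, metadata, and tool output contracts.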

Dependency injection keeps the kitchen modular

Now what is the problem without dependencies? Handlers become giant functions. Each route opens clients. Each route repeats auth. Each route duplicates tracing. Then nobody knows where shared policy lives.

Picture the better structure.

route handler
    │
    ├── get_current_user()
    ├── get_llm_client()
    ├── get_db_session()
    └── get_request_id()

Each dependency does one job. FastAPI wires them together. That makes testing easier too. We can override dependencies in tests. We can swap real LLM clients with fakes. We can insert tenant policy checks once.

Worked example. Suppose only pro-tier users may choose their model, and everyone else gets the small default. We can express that cleanly.

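# Continues the example above: app, ChatRequest, BaseModel, and Depends are already in scope.
class User(BaseModel):
    user_id: str
    tier: str

# Stub for illustration: a real dependency would resolve the user from the request's credentials.
async def get_current_user() -> User:
    return User(user_id="u_1", tier="pro")

@app.post("/generate")
async def generate(body: ChatRequest, user: User = Depends(get_current_user)):
    # Pro users keep the model they asked for; everyone else gets the default.
    chosen_model = body.model if user.tier == "pro" else "gpt-4o-mini"
    return {"model": chosen_model}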

See how readable that is. The front desk assembles the right context. The line cook can focus on business logic. Simple, no?

Pydantic is your schema firewall

AI systems fail from shape drift more often than people admit. One service sends conversation_id. Another expects thread_id. One tool returns score as string. Another expects float. You need a schema firewall. That is what Pydantic gives.

unsafe flow                         safer flow
raw dict ──→ handler                raw dict ──→ model validate ──→ handler
             │                                         │
             └── hidden key errors                     └── clear contract

Look at a practical model.

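from typing import Literal

from pydantic import BaseModel, Field

class EmbedRequest(BaseModel):
    # List length bounds via min_length/max_length assume Pydantic v2.
    texts: list[str] = Field(min_length=1, max_length=128)
    input_type: Literal["query", "document"]  # any other value is rejected
    normalize: bool = True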

Now invalid inputs fail early. Empty list? Rejected. Unknown input_type? Rejected. That prevents weird downstream states.
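
Those rejections can be checked directly, without a server. A minimal sketch, assuming Pydantic v2; is_valid is a hypothetical helper and the payloads are invented.

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class EmbedRequest(BaseModel):
    texts: list[str] = Field(min_length=1, max_length=128)
    input_type: Literal["query", "document"]
    normalize: bool = True


def is_valid(payload: dict) -> bool:
    """True when the payload passes the schema firewall."""
    try:
        EmbedRequest.model_validate(payload)
        return True
    except ValidationError:
        return False


print(is_valid({"texts": ["hello"], "input_type": "query"}))         # well-formed
print(is_valid({"texts": [], "input_type": "query"}))                # empty list
print(is_valid({"texts": ["hello"], "input_type": "image"}))         # unknown input_type
print(is_valid({"texts": ["x"] * 129, "input_type": "document"}))    # too many texts
```

Only the first payload passes. The other three never reach the embedding code.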

Also, schemas help your API docs. FastAPI generates an OpenAPI schema from your routes and models, and by default serves interactive docs at /docs. Frontend teams and SDK generators love this. For AI platforms, that means safer integrations with chat UIs, agents, and internal tools.

FastAPI is a thin layer, not your whole architecture

This point matters at senior level. FastAPI is excellent at request handling and typed contracts. It is not a queue system. It is not a workflow engine. It is not an observability platform. It is not your retry strategy.

So what to do? Use FastAPI for the front desk and route-level orchestration. Keep business logic in services. Keep long work on the prep shelf. Keep schemas explicit. Keep dependencies small. That is the maintainable shape.

When teams fail here, the route file becomes a junk drawer. Prompt templates, DB writes, retry loops, metrics, and policy branches all live together. Then every change is risky. Better design is boring design. That is good engineering.

Where this lives in the wild

OpenAI-compatible internal gateway — platform engineer: FastAPI routes validate model, tenant, and token-limit settings before sending requests downstream.
LangSmith-style tracing API — backend engineer: dependency injection adds request ids, auth context, and tracing spans to every run endpoint.
Perplexity upload service — product engineer: Pydantic models guard document metadata before chunking and indexing pipelines start.
GitHub Copilot enterprise proxy — API engineer: response models prevent accidental leakage of internal scoring fields back to the editor client.
Zapier AI actions API — integrations engineer: auto-generated docs from FastAPI schemas reduce mismatch across hundreds of third-party connectors.

Pause and recall

Why is FastAPI more than just a URL router?
What problem does dependency injection solve in AI service codebases?
Why does response_model matter even when your handler already returns a dict?
In the analogy, what does a well-run front desk do before the kitchen starts work?

Interview Q&A

Q: Why use Pydantic request models instead of raw dictionaries in AI APIs? A: Because typed validation catches schema drift, enforces constraints early, and documents the contract for every caller and internal service. Common wrong answer to avoid: "Because Python dictionaries are slow."

Q: Why prefer dependency injection for clients and auth instead of constructing them inside every route? A: It centralizes shared policy, improves testability, and prevents route handlers from turning into unmaintainable setup code. Common wrong answer to avoid: "Because dependencies make the code shorter, so they are always better."

Q: Why specify a response model when returning JSON from a route? A: It enforces output shape, improves generated docs, and reduces accidental exposure of internal fields or inconsistent data contracts. Common wrong answer to avoid: "Response models only matter for frontend autocomplete."

Q: Why is FastAPI not the full answer to production AI serving? A: Because serving also needs retries, queues, observability, rate limits, graceful shutdown, and external worker systems beyond simple request routing. Common wrong answer to avoid: "Once you picked FastAPI, architecture is basically solved."

Apply now (5 min)

Exercise. Design one request model for a summarization endpoint. Include at least one bounded number field, one enum-like field, and one required text field. Then write one dependency that injects a fake current user.

Sketch from memory. Draw the front desk flow. Show route match, model validation, dependency injection, and handler execution in order.

Bridge. FastAPI can call both sync and async functions. So the next important question is sharp: when should a route be async def, and when should it stay plain def? → 04-async-endpoints.md