09. Serverless patterns: Lambda, Cloud Functions, and workflows that fit

⏱️ Estimated time: 17 min | Level: intermediate

ELI5 callback: In the dragon farm, the barn runs the work, the feeding trough holds the data, the fence limits access, the breeding ground scales the herd, and the ledger stops waste. Today we see where no-server style execution helps AI systems, and where it does not.

1) See the shape clearly

Lambda, Cloud Functions, and workflow orchestration all matter here, but they do not relieve the same pressure. Start with workload shape, not vendor branding. Check startup time, runtime length, and host control. Check who patches the base layer. Check whether scale is steady or bursty. Check whether warm state must survive between invocations. Simple, no?

Serverless functions execute code on demand within tight operational boundaries. Workflow services coordinate retries, branching, and long-running steps. The pattern shines around AI systems more often than inside heavy model serving. Use it for glue, not for every hot path.

So what to do? Write the fit matrix before provisioning anything.

- Prioritise the slowest or costliest path.
- Measure idle time honestly.
- Record operational ownership.
- Record rollback method.
- Record debugging path.
- Record compliance limits.

Good teams choose boring defaults first. Fancy choices can wait.
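"Measure idle time honestly" can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: it assumes you can export invocation start and end times from your own logs, and the function name `idle_fraction` and the sample numbers are hypothetical. Overlapping invocations are merged first so that concurrency does not hide idle time.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s) of one invocation


def idle_fraction(busy: List[Interval], window_start: float, window_end: float) -> float:
    """Return the share of the window in which nothing was running."""
    if window_end <= window_start:
        raise ValueError("window must have positive length")

    # Clip every interval to the window and drop the ones outside it.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in busy
        if end > window_start and start < window_end
    )

    # Merge overlaps so concurrent requests are not counted twice.
    busy_seconds = 0.0
    current_start, current_end = None, None
    for start, end in clipped:
        if current_end is None or start > current_end:
            if current_end is not None:
                busy_seconds += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        busy_seconds += current_end - current_start

    return 1.0 - busy_seconds / (window_end - window_start)


# Hypothetical one-hour window: three invocations, two of them overlapping.
sample = [(0, 600), (300, 900), (1800, 2100)]
print(f"idle share: {idle_fraction(sample, 0, 3600):.2f}")
```

A high idle share on a bursty path is one signal that pay-per-invocation may fit. It is not a full cost model.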

2) Read the decision signals

Use serverless for webhooks, validation, routing, enrichment, and event handlers. Use workflow tools for multi-step batch or approval flows. Avoid pure function runtimes for large warm models and long GPU-heavy jobs. Cold starts, package size, and timeout ceilings matter a lot. State should usually live outside the function. Good serverless design breaks work into short, observable steps.

Now use thresholds, not feelings. If latency is sacred, keep capacity warm and ready. If cost is sacred, chase utilisation carefully. If control is sacred, reduce abstraction. If delivery speed is sacred, buy managed pieces.

Quick decision prompts:

- Can the work finish quickly?
- Can state be stored externally?
- Will traffic spike unpredictably?
- Is package size small enough?
- Does the job need GPU or huge memory?
- Would a workflow engine simplify retries?

One clear 'no' can eliminate a whole option. Trade-offs are normal. Document the fallback path.
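The decision prompts above can be written down as a tiny rule set, which forces the thresholds into the open. This is a minimal sketch, not a recommendation: the `Workload` fields, the function name `choose_runtime`, and both threshold constants are hypothetical placeholders that you should replace with the real limits of your platform. It encodes five of the six prompts; traffic shape is left out to keep the sketch short.

```python
from dataclasses import dataclass

# Hypothetical thresholds: replace with the documented limits of your platform.
MAX_FUNCTION_RUNTIME_S = 60.0
MAX_PACKAGE_MB = 50.0


@dataclass
class Workload:
    max_runtime_s: float      # Can the work finish quickly?
    state_is_external: bool   # Can state be stored externally?
    package_mb: float         # Is package size small enough?
    needs_gpu: bool           # Does the job need GPU or huge memory?
    multi_step: bool          # Would a workflow engine simplify retries?


def choose_runtime(w: Workload) -> str:
    # One clear "no" eliminates the function-based options entirely.
    if w.needs_gpu or w.package_mb > MAX_PACKAGE_MB or not w.state_is_external:
        return "always-on service"
    # Long or multi-step work goes to an orchestrator, not one big handler.
    if w.multi_step or w.max_runtime_s > MAX_FUNCTION_RUNTIME_S:
        return "workflow"
    return "function"


webhook = Workload(max_runtime_s=2, state_is_external=True,
                   package_mb=10, needs_gpu=False, multi_step=False)
print(choose_runtime(webhook))
```

Keeping the rules this explicit makes it easy to review them next to the fallback path.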

3) Map the working path

Serverless often sits at the edges of the platform. It reacts to events and hands off heavy lifting elsewhere. That split keeps costs and complexity reasonable. Now watch the path.

    ┌────────────┐   ┌────────────┐   ┌────────────┐
    │   Event    │──→│  Function  │──→│  Workflow  │
    └────────────┘   └─────┬──────┘   └─────┬──────┘
                           │                │
                           ▼                ▼
                     ┌────────────┐   ┌────────────┐
                     │   Store    │   │   Alert    │
                     └────────────┘   └────────────┘

An event can be an API call, file upload, queue message, or timer. The function should validate, enrich, and route quickly. Longer multi-step flows belong in workflow services with state and retries. Persistent data should live in storage or a database, not in function memory that you merely hope survives. Alerts should fire on failures and unusual duration growth. Serverless works best when the contract of each step is tiny and clear.

At every arrow, ask who retries. At every box, ask who pays. At every store, ask what expires. One metric should sit beside each box. That is how operations stays sane.
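The "validate, enrich, and route" step can be sketched as one short handler. Everything named here is an assumption for illustration: the required fields, the `store` mapping that stands in for a database, and the `start_workflow` callable that stands in for whatever client your workflow service provides.

```python
import time
from typing import Callable, Dict, MutableMapping

REQUIRED_FIELDS = ("document_id", "source")  # hypothetical event contract


def handle_event(
    event: Dict,
    store: MutableMapping,
    start_workflow: Callable[[Dict], str],
) -> Dict:
    # Validate: reject early so bad events never reach the workflow.
    missing = [name for name in REQUIRED_FIELDS if name not in event]
    if missing:
        return {"status": "rejected", "missing": missing}

    # Enrich: add only cheap metadata here, nothing slow.
    enriched = {**event, "received_at": time.time()}

    # Persist outside the function; a dict stands in for a real store.
    store[event["document_id"]] = enriched

    # Route: hand the long multi-step part to the workflow service.
    run_id = start_workflow(enriched)
    return {"status": "accepted", "run_id": run_id}


# Local usage with stand-ins for the store and the workflow client.
fake_store: Dict = {}
result = handle_event(
    {"document_id": "doc-1", "source": "upload"},
    fake_store,
    start_workflow=lambda payload: "run-001",
)
print(result["status"])
```

Passing the store and the workflow client in as arguments keeps the handler testable without any cloud account.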

4) Notice the common traps

- Trying to host giant models inside tiny function packages.
- Ignoring cold starts while promising chat-grade latency.
- Packing too many responsibilities into one handler.
- Relying on function memory for durable state.
- Skipping idempotency on event retries.
- Using serverless where always-on workers would be simpler.

See. Most outages start as silent assumptions. Review these failure modes before launch:

- Cold starts can spike p95 suddenly.
- Retry storms can duplicate downstream work.
- Timeouts can cut multi-step jobs halfway.
- Large dependency bundles can slow deployments and startup.
- Hidden concurrency limits can throttle launches.
- Debugging can get painful without tracing.

Simple, no? Write failure drills for the top three risks. Decide what degrades first. Decide what must never degrade. Review quotas before launch day. Prefer explicit limits over wishful thinking.
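The retry traps above are easier to see in code. Here is a minimal sketch of an idempotent handler, assuming every event carries a unique `event_id` (an assumption, not something every event source guarantees). The `InMemoryKeyStore` class is a hypothetical stand-in for a durable store with an atomic put-if-absent operation; function memory would not survive in a real deployment.

```python
from typing import Callable, Dict


class InMemoryKeyStore:
    """Stand-in for a durable store with an atomic put-if-absent."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}

    def put_if_absent(self, key: str, value: str) -> bool:
        if key in self._seen:
            return False
        self._seen[key] = value
        return True

    def remove(self, key: str) -> None:
        self._seen.pop(key, None)


def process_once(event: Dict, store: InMemoryKeyStore,
                 side_effect: Callable[[Dict], None]) -> str:
    key = event["event_id"]  # assumed to be unique per logical event

    # Claim the key first; a redelivered event finds it already taken.
    if not store.put_if_absent(key, "claimed"):
        return "duplicate"

    try:
        side_effect(event)
    except Exception:
        # Release the claim so a later retry can attempt the work again.
        store.remove(key)
        raise
    return "processed"


store = InMemoryKeyStore()
writes = []
print(process_once({"event_id": "evt-1"}, store, writes.append))
print(process_once({"event_id": "evt-1"}, store, writes.append))
print(len(writes))
```

The second delivery is recognised and skipped, so the downstream write happens once. A real store would also need expiry on the keys, which this sketch leaves out.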

5) Lock the operating routine

Keep each function short, stateless, and observable. Move real state into durable stores. Use workflows when steps branch or run long. Design idempotent handlers for retries. Measure cold start, duration, and concurrency limits. Push heavy inference to dedicated services when needed.

Lock the language across the team. Use the same terms in code, dashboards, and reviews. Review this quick operating list:

- Keep packages small.
- Keep timeouts explicit.
- Keep events versioned.
- Keep tracing enabled.
- Keep retries bounded.
- Keep downstream writes idempotent.

Good platform design keeps the barn, feeding trough, fence, breeding ground, and ledger aligned.

So what to do? Create a one-page runbook. Create a one-page cost note. Create a one-page rollback note. Teach the team the same words. That alignment saves real money.

See. Consistency beats cleverness. Benchmark first; opinions come second. Name the owner of every limit. Prefer reversible choices whenever the future is foggy. Document what changes during incidents. Keep one small default path for newcomers. Automate the boring thing as soon as it stabilises. Vendor docs help, but workload data matters more. Good naming prevents bad tickets. Observe p95, not only averages. Small runbooks beat heroic memory. Teach cost with the same seriousness as latency. Now watch how much confusion disappears.
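Two items from the routine above, measuring cold starts and observing p95, can be sketched together. The module-level flag relies on the common behaviour that module code runs once per fresh runtime instance; treat that as an assumption to verify on your platform. The names `timed_handler` and `p95` are hypothetical, and the percentile uses the simple nearest-rank method.

```python
import time
from typing import Callable, Dict, List

_COLD = True  # module scope runs once per fresh runtime instance


def timed_handler(event: Dict, work: Callable[[Dict], object]) -> Dict:
    """Run the work and report duration plus whether this call was cold."""
    global _COLD
    was_cold, _COLD = _COLD, False

    start = time.perf_counter()
    result = work(event)
    duration_ms = (time.perf_counter() - start) * 1000.0

    return {"result": result, "cold_start": was_cold, "duration_ms": duration_ms}


def p95(samples: List[float]) -> float:
    """Nearest-rank 95th percentile of the samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

In practice you would emit `cold_start` and `duration_ms` as structured log fields or metrics, then compute p95 separately for cold and warm invocations so one does not mask the other.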

Where this lives in the wild

AWS Lambda plus Step Functions. Very common pattern for orchestration around AI and data pipelines.
Google Cloud Functions with Workflows. Useful for event-driven glue and moderate orchestration in GCP.
Azure Functions with Durable Functions. Shows how workflow semantics can sit beside short handlers.
Cloudflare Workers and Workers AI edge handlers. Good example of tiny logic close to users with managed scale.
Webhook processing and file-triggered enrichment pipelines. A classic place where serverless pays for itself quickly.

Pause and recall

What kind of AI work fits serverless well? Say it without looking up vendor names.
Why do giant models rarely fit pure function runtimes? Give one concrete example.
When should a workflow engine join the picture? State the trade-off in one line.
What must every retrying handler guarantee? Mention one failure mode too.

Interview Q&A

Q. When is serverless a strong fit? A. It is strong for short, stateless, bursty work such as webhooks, validation, and routing. Common wrong answer to avoid: It is strong whenever you want lower ops work. Better direction: Mention duration, state, and package size.

Q. Why not serve every model through Lambda-style functions? A. Large models need warm memory, bigger packages, and often hardware that function runtimes do not suit. Common wrong answer to avoid: Auto-scaling solves that automatically. Better direction: Explain cold starts, model loading, and hardware limits.

Q. What is the role of Step Functions or Durable Functions? A. They coordinate retries, branching, waiting, and multi-step state cleanly. Common wrong answer to avoid: They are just nicer cron jobs. Better direction: Tie them to orchestration and fault handling.
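To make the "more than cron" point concrete, here is a toy runner that shows the bookkeeping an orchestrator takes off your hands: ordered steps, bounded retries per step, and recorded state on failure. It is only a sketch for intuition. The name `run_workflow` and the state layout are hypothetical, and it does not reflect how Step Functions or Durable Functions are implemented. Real engines also persist state durably and add backoff between attempts.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[object], object]]


def run_workflow(steps: List[Step], payload: object, max_attempts: int = 3) -> Dict:
    state: Dict = {"payload": payload, "completed": [], "status": "running"}

    for name, step in steps:
        last_error = None
        for _attempt in range(max_attempts):
            try:
                state["payload"] = step(state["payload"])
                state["completed"].append(name)
                break  # step succeeded, move on to the next one
            except Exception as exc:
                last_error = repr(exc)  # remember why the attempt failed
        else:
            # Every attempt failed: stop and record where and why.
            state["status"] = "failed"
            state["failed_step"] = name
            state["error"] = last_error
            return state

    state["status"] = "succeeded"
    return state
```

Because a step may run more than once, each step still has to be safe to repeat.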

Q. What design property makes retries safe? A. Idempotency keeps repeated events from corrupting state or duplicating side effects. Common wrong answer to avoid: Just disable retries. Better direction: Show how event-driven systems fail without idempotency.

Apply now (5 min)

Pick one AI-adjacent task in your system.
Check whether it is short and stateless.
Check whether it can tolerate cold starts.
Check whether it needs durable state.
Decide function, workflow, or always-on service.
Write the idempotency key if events retry.
Write the timeout limit.
Write the metric you would alert on first.
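For the idempotency key step in the list above, one possible approach is to hash the fields that identify the logical event and ignore anything that changes between deliveries. The field names (`source`, `object_id`, `version`, `delivery_attempt`) are hypothetical; pick the ones that are stable in your own events.

```python
import hashlib
import json
from typing import Dict, Sequence


def idempotency_key(event: Dict,
                    fields: Sequence[str] = ("source", "object_id", "version")) -> str:
    """Derive a stable key from the fields that identify the logical event."""
    # Only identifying fields go in; delivery metadata is deliberately ignored.
    stable = {name: event[name] for name in fields}
    # Sorted keys and fixed separators give the same bytes for the same content.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"source": "bucket-a", "object_id": "report.pdf", "version": 3,
         "delivery_attempt": 1}
retry = {**first, "delivery_attempt": 2}
print(idempotency_key(first) == idempotency_key(retry))
```

A redelivered event maps to the same key, while a new version of the same object maps to a different one.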

Bridge. Serverless understood. But what about inference at the edge? → 10