12. Design Patterns for AI: wiring the model switchboard

~14 min read. AI features look magical until the model switchboard trips in production.

Built on the ELI5 in 00-eli5.md. The wiring (behavioral logic) decides how one room hands work to the next through a stable hallway during model lifecycle events.

1) Strategy plus Factory: swap the brain, keep the contract steady

See. Product asks for GPT-4 today and Claude tomorrow. Your business workflow should not collapse because the vendor changed. Strategy pattern helps by standardising one behavior contract. Factory pattern helps by creating the right implementation from config. Together they make model swapping boring. That is exactly what you want. Simple diagram.

┌──────────────┐
│  Summarizer  │
└──────┬───────┘
       │ uses
       ▼
┌──────────────┐
│ ModelStrategy│
├──────┬───────┤
│ GPT4 │ Claude│
└──────┴───────┘
       ▲
       │ created by
┌──────────────┐
│ ModelFactory │
└──────────────┘

Worked example. Request budget is 2 seconds. Prompt has 1400 tokens. Config says premium tenants use GPT-4. Default tenants use Claude Sonnet. Service logic should only call generate. Concrete code-level sketch.

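from abc import ABC, abstractmethod

class ModelStrategy(ABC):
    """One behavior contract that every provider implements."""
    @abstractmethod
    def generate(self, prompt, opts):
        ...

class GPT4Strategy(ModelStrategy):
    def generate(self, prompt, opts):
        # openai_generate stands in for your own OpenAI client wrapper
        return openai_generate(prompt, opts)

class ClaudeStrategy(ModelStrategy):
    def generate(self, prompt, opts):
        # anthropic_generate stands in for your own Anthropic client wrapper
        return anthropic_generate(prompt, opts)

class ModelFactory:
    def build(self, tenant_tier):
        # Premium tenants get GPT-4; every other tier gets Claude
        return GPT4Strategy() if tenant_tier == 'premium' else ClaudeStrategy()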

Simple, no? Business code stays stable while implementations move. That stability is pattern value, not pattern ceremony.
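To see the swap end to end without any vendor SDK, here is a minimal runnable sketch. The two Fake strategies, the TIER_TO_STRATEGY mapping, and the summarize function are hypothetical names invented for illustration; a real setup would load the mapping from config and call real provider clients.

```python
from abc import ABC, abstractmethod


class ModelStrategy(ABC):
    """The single contract that business code depends on."""

    @abstractmethod
    def generate(self, prompt: str, opts: dict) -> str:
        ...


class FakeGPT4Strategy(ModelStrategy):
    # Stand-in for an OpenAI-backed strategy; no network call is made.
    def generate(self, prompt: str, opts: dict) -> str:
        return f"[gpt-4] {prompt[:20]}"


class FakeClaudeStrategy(ModelStrategy):
    # Stand-in for an Anthropic-backed strategy; no network call is made.
    def generate(self, prompt: str, opts: dict) -> str:
        return f"[claude-sonnet] {prompt[:20]}"


# Hypothetical config: tenant tier -> strategy class.
TIER_TO_STRATEGY = {
    "premium": FakeGPT4Strategy,
    "default": FakeClaudeStrategy,
}


class ModelFactory:
    def __init__(self, tier_map: dict):
        self._tier_map = tier_map

    def build(self, tenant_tier: str) -> ModelStrategy:
        # Unknown tiers fall back to the "default" entry.
        strategy_cls = self._tier_map.get(tenant_tier, self._tier_map["default"])
        return strategy_cls()


def summarize(text: str, tenant_tier: str, factory: ModelFactory) -> str:
    # Business code: it knows only the ModelStrategy contract.
    model = factory.build(tenant_tier)
    return model.generate(f"Summarize: {text}", {"timeout_s": 2})
```

Changing which model a tier gets is now a one-line edit to the mapping. The summarize function does not change.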

2) Pipeline pattern: preprocess in small stages, not one mega-method

AI requests rarely go straight to the model. You clean text. You redact secrets. You enrich with retrieval context. You maybe compress a long chat. That sequence is a pipeline. Each stage transforms input and passes output onward. Use small stages so you can test, measure, and reorder them. Diagram first.

┌────────┐   ┌─────────┐   ┌──────────┐   ┌──────────┐
│ input  │ → │ redact  │ → │ retrieve │ → │ compress │
└────────┘   └─────────┘   └──────────┘   └────┬─────┘
                                               │
                                               ▼
                                          ┌─────────┐
                                          │  model  │
                                          └─────────┘

Worked example with numbers. Raw user text is 3200 tokens. Secret scrubber removes 3 API keys. Retriever adds 4 chunks of 180 tokens each, so 720 tokens. Compressor shrinks conversation history from 2200 to 600 tokens. Retrieved context plus compressed history then account for 720 + 600 = 1320 tokens of the final prompt. That is pipeline value you can explain clearly. Concrete code-level sketch.

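// A stage takes the prompt context and returns the transformed context.
type Stage = (ctx: PromptContext) => Promise<PromptContext>

// Run stages in order; each stage sees the previous stage's output.
async function runPipeline(ctx: PromptContext, stages: Stage[]): Promise<PromptContext> {
  let current = ctx
  for (const stage of stages) current = await stage(current)
  return current
}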

Why not one giant method? Because debugging one 150-line preprocessing block is misery. Pipelines also let you add metrics per stage. If retrieval suddenly adds 900 ms, you see it immediately.
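One way to get those per-stage metrics is to time each stage inside the runner. The sketch below is a synchronous Python variant under simplifying assumptions: the PromptContext fields, the toy redact rule (mask anything starting with "sk-"), and the canned retrieve output are all made up for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PromptContext:
    text: str
    chunks: List[str] = field(default_factory=list)
    stage_ms: Dict[str, float] = field(default_factory=dict)


Stage = Callable[[PromptContext], PromptContext]


def redact(ctx: PromptContext) -> PromptContext:
    # Toy scrubber: masks any whitespace-separated token starting with "sk-".
    words = ["[REDACTED]" if w.startswith("sk-") else w for w in ctx.text.split()]
    ctx.text = " ".join(words)
    return ctx


def retrieve(ctx: PromptContext) -> PromptContext:
    # Placeholder retriever: a real one would query a search index.
    ctx.chunks = ["chunk-1", "chunk-2"]
    return ctx


def run_pipeline(ctx: PromptContext, stages: List[Stage]) -> PromptContext:
    for stage in stages:
        start = time.perf_counter()
        ctx = stage(ctx)
        # Record wall-clock time per stage, keyed by the function name.
        ctx.stage_ms[stage.__name__] = (time.perf_counter() - start) * 1000
    return ctx
```

Because timing lives in the runner, every new stage is measured without writing any extra code inside the stage itself.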

3) Observer pattern: watch the model without tangling the generation code

Monitoring should not invade every business branch. Observer pattern lets the main flow publish events. Listeners then handle logging, alerting, tracing, and dashboards. That keeps generation logic readable. Use observers for things like latency spikes, guardrail hits, and token burn. Simple picture.

┌──────────────┐   publish   ┌──────────────┐
│ ModelRunner  │────────────→│   EventBus   │
└──────────────┘             ├──────────────┤
                             │ MetricsSink  │
                             │ AlertSink    │
                             │ AuditSink    │
                             └──────────────┘

Worked example. A call starts at 10:00:00. Model reply returns at 10:00:01.8, so latency is 1.8 seconds. Prompt tokens are 900. Completion tokens are 240. Guardrail flags one unsafe phrase. You publish one ModelCompleted event and one GuardrailTriggered event. Concrete code-level sketch.

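// ModelEvent and the metrics client are assumed to be defined elsewhere.
interface ModelObserver {
  void onEvent(ModelEvent event);
}

class MetricsObserver implements ModelObserver {
  @Override
  public void onEvent(ModelEvent event) {
    // Count one occurrence per event type, e.g. ModelCompleted.
    metrics.increment(event.type());
  }
}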

See the benefit. Adding a new audit sink does not change generation code. It only subscribes to events. That is clean behavioral expansion.
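The worked example can be replayed with a tiny in-process event bus. This is a sketch, not a production design: EventBus, MetricsSink, AuditSink, and publish_worked_example are hypothetical names, and delivery is synchronous with no error isolation between sinks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ModelEvent:
    type: str
    payload: dict


class EventBus:
    def __init__(self) -> None:
        self._subscribers: List[Callable[[ModelEvent], None]] = []

    def subscribe(self, listener: Callable[[ModelEvent], None]) -> None:
        self._subscribers.append(listener)

    def publish(self, event: ModelEvent) -> None:
        # Every subscriber sees every event; the publisher knows none of them.
        for listener in self._subscribers:
            listener(event)


class MetricsSink:
    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def __call__(self, event: ModelEvent) -> None:
        self.counts[event.type] = self.counts.get(event.type, 0) + 1


class AuditSink:
    def __init__(self) -> None:
        self.records: List[ModelEvent] = []

    def __call__(self, event: ModelEvent) -> None:
        # Keep only guardrail hits for later review.
        if event.type == "GuardrailTriggered":
            self.records.append(event)


def publish_worked_example(bus: EventBus) -> None:
    # Numbers from the worked example: 1.8 s latency, 900 + 240 tokens.
    bus.publish(ModelEvent("ModelCompleted", {
        "latency_ms": 1800, "prompt_tokens": 900, "completion_tokens": 240,
    }))
    bus.publish(ModelEvent("GuardrailTriggered", {"flags": 1}))
```

Wiring a new sink is a single bus.subscribe(...) call at startup. The code that publishes events never learns the sink exists.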

4) Chain of Responsibility: route prompts by intent, risk, and cost

Not every prompt should hit the same model path. Some are billing questions. Some need tool use. Some are unsafe and should stop early. Chain of Responsibility passes the request through handlers until one claims it. This keeps routing logic open for extension. Diagram.

┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
│  Safety  │ → │ ToolUse  │ → │ Billing  │ → │ General  │
└────┬─────┘   └────┬─────┘   └────┬─────┘   └────┬─────┘
     │ blocked      │ matched      │ matched      │ fallback
     ▼              ▼              ▼              ▼
   stop        tools path     cheap model   default model

Worked example. Prompt A asks for account refund status. Prompt B asks to run SQL over uploaded CSV. Prompt C contains self-harm language. Safety handler stops C. ToolUse handler routes B to tool-enabled flow. Billing handler routes A to a cheaper fine-tuned model. Everything else goes to default generation. Concrete code-level sketch.

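class Handler:
    """Base link in the chain: claims nothing and passes the request on."""
    def __init__(self, nxt=None):
        self.nxt = nxt  # next handler in the chain, or None at the end

    def handle(self, req):
        # Subclasses override handle() to claim a request; otherwise defer.
        # default_route stands in for the fallback (general model) path.
        return self.nxt.handle(req) if self.nxt else default_route(req)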

Factory often helps here too. It can assemble the chain from config. Premium region may get one extra compliance handler. Consumer region may skip it. So what to do in LLD? Keep handler contracts tiny. Expose route reason for observability. Never hide cost and safety decisions inside random if-else blocks.
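A fuller sketch of the chain, with a route reason on every decision and a small factory that assembles handlers from a list of names. Everything here is illustrative: the keyword checks are toy stand-ins for real classifiers, and the Route type, HANDLERS registry, and build_chain function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Route:
    target: str   # where the request goes
    reason: str   # exposed so observers can log why


class Handler:
    def __init__(self, nxt: Optional["Handler"] = None):
        self.nxt = nxt

    def match(self, prompt: str) -> Optional[Route]:
        return None  # the base handler claims nothing

    def handle(self, prompt: str) -> Route:
        route = self.match(prompt)
        if route is not None:
            return route
        if self.nxt is not None:
            return self.nxt.handle(prompt)
        return Route("default_model", "fallback")


class SafetyHandler(Handler):
    def match(self, prompt):
        # Toy keyword check; a real system would call a safety classifier.
        if "hurt myself" in prompt.lower():
            return Route("stop", "safety_blocked")
        return None


class ToolUseHandler(Handler):
    def match(self, prompt):
        if "sql" in prompt.lower():
            return Route("tools_path", "tool_use_matched")
        return None


class BillingHandler(Handler):
    def match(self, prompt):
        if "refund" in prompt.lower():
            return Route("cheap_model", "billing_matched")
        return None


HANDLERS = {"safety": SafetyHandler, "tools": ToolUseHandler, "billing": BillingHandler}


def build_chain(names: List[str]) -> Handler:
    # Link handlers back to front so the first name is checked first.
    head: Optional[Handler] = None
    for name in reversed(names):
        head = HANDLERS[name](head)
    return head if head is not None else Handler()
```

A region that needs an extra compliance step would register one more handler class and add its name to the configured list. Existing handlers stay untouched.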

Where this lives in the wild

At GitHub, a Copilot platform engineer can use Strategy and Factory to swap model providers without changing IDE workflow code. At Anthropic, an inference platform engineer may attach observers to token, latency, and refusal metrics across deployments. At Perplexity, a retrieval engineer can model preprocessing as a pipeline with chunking, reranking, and citation stages. At Uber, an applied scientist building support automation can route prompts through safety and tool handlers before generation. At LinkedIn, an ML infrastructure engineer may use factories to instantiate region-specific model clients with policy controls.

Pause and recall

Why do Strategy and Factory often appear together in AI systems? What metric becomes easier to isolate when preprocessing is pipeline-based? Why is Observer cleaner than sprinkling log calls everywhere? When does Chain of Responsibility beat one giant router method?

Interview Q&A

Why Strategy not if-else model selection everywhere?

Strategy keeps the contract stable while vendor-specific behavior stays isolated. Common wrong answer to avoid: A few if statements are always simpler than a pattern.

Why Pipeline not one preprocessing function?

Pipeline gives stage-level testability, metrics, and controlled reordering. Common wrong answer to avoid: Breaking steps apart only adds files, not value.

Why Observer not direct metrics calls from ModelRunner?

Observer removes monitoring tangles and lets new sinks subscribe without touching core flow. Common wrong answer to avoid: Logging inside the main method is the cleanest because it is nearby.

Why Chain of Responsibility not one router class?

A chain handles evolving routing rules without turning one class into a dumping ground. Common wrong answer to avoid: Routing is just branching, so patterns are overkill by default.

Apply now (5 min)

Design a chat assistant that serves free and premium users. Write one Strategy interface, one Factory rule, and three pipeline stages. Then add two routing handlers: safety first and tool use second. Sketch from memory: draw the request path from input to pipeline to router to model to observer sinks.

Bridge. Patterns give reusable wiring. Next, we enter the actual AI rooms that need those patterns inside them. → 13-ai-component-lld.md