02. Build, Buy, or Fine-tune: The classic captain decision

~13 min read. The first serious compass test is simple to ask, and costly to ask badly.

Builds on the ELI5 in 00-eli5.md. The compass (a reminder of the decision framework) keeps tool names from pretending to be strategy.

1) Start with the picture, not the vendor list

See. A captain does not begin by admiring engines. The captain first asks where the ship must go.

Then asks how rough the water is. Then asks what cargo must be carried. AI teams should do the same.

Yet many teams begin with tool names. Which foundation model? Which vector store? Which GPU stack? Which fine-tune service? That is backwards.

The course comes first. The compass comes second. Tool choice comes after that.

┌──────────────────────┐
│ Desired product move │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ Decision questions   │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ Evidence collected   │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ Build / Buy / Tune   │
└──────────────────────┘

Look. The wrong first question is, 'Which stack should we adopt?'

The right first questions are these. What behaviour is missing today? How certain are we about demand? What performance gap is blocking value? What compliance or cost constraints already exist? How reversible is the choice?

Simple, no? If uncertainty is high, prefer API-first.

Do not marry infrastructure before evidence arrives.

2) Use a ladder of commitment

So what to do? Match investment to evidence level. That is the whole trick.

Low evidence should get low commitment. High evidence can justify deeper investment. See the ladder.

┌───────────────────────────────┐
│ API-first                     │
│ fastest learning              │
├───────────────────────────────┤
│ RAG / retrieval layer         │
│ adds domain grounding         │
├───────────────────────────────┤
│ Fine-tune                     │
│ changes behaviour more deeply │
├───────────────────────────────┤
│ Self-host or custom stack     │
│ highest control and burden    │
└───────────────────────────────┘

Start with API-first when the problem is still foggy. Why? Because the weather check is still incomplete.

You still do not know real usage shape. You still do not know failure patterns. You still do not know business tolerance.

Then move to RAG when grounding is the main issue. Move to fine-tuning when repeated behaviour gaps survive prompting and retrieval. Move to self-hosting when evidence proves cost, latency, control, or data constraints demand it.

Yes? Notice the order. It is not ideological. It is evidential.

The course decides the destination. The compass decides the step size. The weather check decides how much risk the ship can carry.
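The ladder rules above can be sketched as a small function. This is only an illustration, not a prescribed tool: the `Evidence` fields and the `recommend_rung` name are hypothetical, and each flag stands in for a judgement the crew has already argued through.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    """Rungs of the ladder, ordered from lowest to highest commitment."""
    API_FIRST = 1
    RAG = 2
    FINE_TUNE = 3
    SELF_HOST = 4


@dataclass
class Evidence:
    """Hypothetical evidence flags; the field names are illustrative only."""
    grounding_is_main_gap: bool = False
    behaviour_gaps_survive_prompting_and_retrieval: bool = False
    # True only when cost, latency, control, or data constraints are proven.
    constraints_demand_self_hosting: bool = False


def recommend_rung(evidence: Evidence) -> Rung:
    """Return the deepest rung that the collected evidence justifies."""
    if evidence.constraints_demand_self_hosting:
        return Rung.SELF_HOST
    if evidence.behaviour_gaps_survive_prompting_and_retrieval:
        return Rung.FINE_TUNE
    if evidence.grounding_is_main_gap:
        return Rung.RAG
    # No evidence yet: stay on the lowest-commitment rung.
    return Rung.API_FIRST
```

With no flags set, the function stays on `Rung.API_FIRST`, which mirrors the rule that low evidence gets low commitment.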

3) Decision questions before tool names

Write these questions in one shared place. That shared place becomes operational clarity. It also becomes the start of the ship's log.

Here is a clean question set.

Is the product problem proven or still exploratory?
Is the main gap knowledge, behaviour, latency, cost, or control?
How often will this workload run in production?
What error types are acceptable, and which are fatal?
What data boundaries or regulatory needs constrain deployment?
How reversible is the decision within one quarter?
Who will operate the system after launch?

Look. This question set does two useful things. First, it slows cargo-cult choices.

Second, it helps the crew disagree usefully. A product manager may highlight uncertainty. A staff engineer may highlight operational burden.

A security lead may highlight data movement risk. A finance partner may highlight usage economics. That is healthy.

The crew should not always agree instantly. They should share the same compass. Simple, no?

If they do, the debate becomes structured. If they do not, build vs buy becomes politics with diagrams.
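One way to keep the seven questions in a shared place is a plain record that shows which answers are still missing. The sketch below is a minimal example under that assumption; the `DecisionQuestions` class, its field names, and the `unanswered` helper are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass, fields


@dataclass
class DecisionQuestions:
    """One field per decision question; an empty string means unanswered."""
    problem_status: str = ""                # proven or still exploratory?
    main_gap: str = ""                      # knowledge, behaviour, latency, cost, or control?
    production_frequency: str = ""          # how often will the workload run?
    acceptable_and_fatal_errors: str = ""   # which errors are tolerable, which are fatal?
    data_and_regulatory_limits: str = ""    # data boundaries or regulatory needs
    reversibility_within_quarter: str = ""  # how reversible within one quarter?
    operator_after_launch: str = ""         # who runs the system after launch?


def unanswered(questions: DecisionQuestions) -> list[str]:
    """List the names of questions that still have no written answer."""
    return [
        f.name for f in fields(questions)
        if not getattr(questions, f.name).strip()
    ]
```

A crew could refuse to discuss tool names until `unanswered(...)` returns an empty list. That keeps the debate on the compass rather than the vendor list.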

4) A practical default path for many AI teams

Many teams want a magical final answer. There usually is none. But there is a sane default path.

See the path clearly.

Begin with an external API to learn user demand quickly.
Add retrieval when freshness or domain specificity matters.
Fine-tune only after seeing stable failure patterns.
Self-host only when the evidence survives the weather check.

Why this order? Because most teams over-invest before they understand the task. They optimise the ship engine before confirming the route.

That feels advanced. It is often wasteful. Now one simple scoring picture.

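Give each option a score from 1 to 5 on these dimensions: learning speed, operational burden, unit economics, control, and reversibility. Agree first which end of the scale is favourable, so a high operational burden score is not misread as a strength.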

Then discuss the scores openly. Do not worship the matrix. Use it to sharpen judgement. And always write the final choice into the ship's log. Why option two lost matters. Why option three was delayed matters.

Revisit triggers matter too. Otherwise a temporary choice hardens into accidental doctrine. Yes?

That is how mature teams move. Not by guessing bigger. By committing deeper only when evidence earns it.
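The scoring picture and the ship's log entry can be sketched together. This is a minimal illustration, not a definitive method. It assumes that 5 is always the favourable end of the scale, that all dimensions carry equal weight, and that the example numbers are invented. The names `total`, `rank`, and `LogEntry` are hypothetical.

```python
from dataclasses import dataclass, field

# Dimensions named in the text. Assumption: 5 is always the favourable end,
# so a 5 on operational_burden means a light burden.
DIMENSIONS = (
    "learning_speed",
    "operational_burden",
    "unit_economics",
    "control",
    "reversibility",
)


def total(scores: dict[str, int]) -> int:
    """Sum one option's scores after checking every dimension is 1 to 5."""
    for name in DIMENSIONS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    return sum(scores[name] for name in DIMENSIONS)


def rank(options: dict[str, dict[str, int]]) -> list[str]:
    """Order option names from highest to lowest total score."""
    return sorted(options, key=lambda name: total(options[name]), reverse=True)


@dataclass
class LogEntry:
    """One ship's log entry: the choice, why the rest lost, when to revisit."""
    chosen: str
    why_others_lost: dict[str, str]
    revisit_triggers: list[str] = field(default_factory=list)


# Made-up scores for illustration only; a real crew would debate each number.
example_options = {
    "api_first": dict(zip(DIMENSIONS, (5, 5, 3, 2, 5))),
    "fine_tune": dict(zip(DIMENSIONS, (2, 2, 4, 4, 2))),
}
ranking = rank(example_options)
entry = LogEntry(
    chosen=ranking[0],
    why_others_lost={"fine_tune": "no stable failure pattern observed yet"},
    revisit_triggers=["same failure persists after prompt and retrieval fixes"],
)
```

The total is a conversation starter, not a verdict. The useful part is the `LogEntry`, because it keeps the losing options and the revisit triggers next to the choice.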

Where this lives in the wild

Customer support assistant: a product lead may start with an API model, then add retrieval for policy documents when accuracy gaps appear.
Code generation copilot: a platform engineer may delay fine-tuning until prompt and context improvements stop moving the error curve.
Enterprise search assistant: a solutions architect may choose buy plus RAG first because domain freshness matters more than custom behaviour.
Voice agent for collections: an ML lead may consider self-hosting only after latency and compliance constraints remain stubborn.
Clinical note summariser: an engineering manager may keep the early stack reversible until evaluation evidence becomes stable across specialties.

Pause and recall

Why should build vs buy start with questions instead of tool names?
What usually justifies moving from API-first to RAG?
When does fine-tuning become more reasonable than more prompting?
Which metaphor from this file reminds you to match investment level to evidence level?

Interview Q&A

Q: Why is API-first a sensible default when uncertainty is high?

A: It maximises learning while preserving reversibility. You discover demand, failure types, and economics before carrying heavy operational baggage.

Common wrong answer to avoid: API-first is always the cheapest long-term architecture.

Q: Why is RAG often a better next step than fine-tuning?

A: Many early failures come from missing or stale knowledge, not missing model capability. Retrieval can close that gap with less commitment.

Common wrong answer to avoid: Fine-tuning is the professional choice, and RAG is only a beginner step.

Q: What usually justifies self-hosting?

A: Repeated evidence around control, cost, latency, privacy, or availability constraints. Self-hosting needs a strong operating reason, not ego.

Common wrong answer to avoid: Self-hosting is best because it gives maximum ownership.

Q: Why should the team record why rejected options lost?

A: That protects future reasoning, reduces repeated debates, and shows the crew what evidence changed later.

Common wrong answer to avoid: Recording only the chosen option is enough for documentation.

Apply now (5 min)

Exercise: Pick one AI feature your team is discussing. Answer the seven decision questions from this file. Then place the feature on the ladder: API-first, RAG, fine-tune, or self-host.

Sketch from memory: Draw the ladder of commitment. Next to each rung, write one sentence on when that rung becomes justified.

Bridge. Build vs buy is only one example. Underneath it sits a broader idea: some choices are easy to reverse, and some are not.

→ 03-reversibility-one-way-doors.md