09. Index lifecycle — build, update, and rebuild without downtime¶

~14 min read. Search quality is only half the job. The other half is keeping indexes fresh while users keep querying.

Continues from the first-principles overview in 00-first-principles.md. The loading dock — the pipeline where new parcels arrive and old shelves are rebuilt — is where vector search becomes operations, not theory.

1) The lifecycle stages¶

An index is not born finished. It moves through stages:

embedding generation
validation and deduplication
index build
warmup
production serving
updates and deletes
full rebuild or compaction
retirement

The operating flow looks like this.

raw data -> embeddings -> index build -> shadow test -> cutover -> serve
                                  │                         │
                                  └──── rebuild later ◄─────┘

The loading dock is a conveyor belt. If any stage is sloppy, users feel it later as stale or broken search.
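One way to keep the conveyor belt honest is to model the stages as an explicit state machine, so an index cannot skip warmup or serve traffic straight from a build. The sketch below is a minimal illustration. The stage names come from the list above, but the `ALLOWED` transition table and the `advance` helper are assumptions, not a standard.

```python
from enum import Enum


class Stage(Enum):
    EMBEDDING = "embedding generation"
    VALIDATION = "validation and deduplication"
    BUILD = "index build"
    WARMUP = "warmup"
    SERVING = "production serving"
    UPDATING = "updates and deletes"
    REBUILD = "full rebuild or compaction"
    RETIRED = "retirement"


# Hypothetical transition table; a real system may permit other moves.
ALLOWED = {
    Stage.EMBEDDING: {Stage.VALIDATION},
    Stage.VALIDATION: {Stage.BUILD},
    Stage.BUILD: {Stage.WARMUP},
    Stage.WARMUP: {Stage.SERVING},
    Stage.SERVING: {Stage.UPDATING, Stage.REBUILD, Stage.RETIRED},
    Stage.UPDATING: {Stage.SERVING, Stage.REBUILD},
    Stage.REBUILD: {Stage.WARMUP},
    Stage.RETIRED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Return the target stage, or raise if the move skips a required stage."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

With this table, `advance(Stage.BUILD, Stage.SERVING)` raises, because a freshly built index must pass through warmup first.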

2) Incremental updates vs full rebuilds¶

Some changes are small. New documents arrive. A few old documents are deleted. An HNSW index may accept inserts incrementally. Even then, quality may drift over time.

Other changes are structural. New embedding model. Different metric. Different HNSW parameters. Different IVF centroids. These usually require a rebuild.

In production, rebuilds take time and the live system cannot simply disappear. The zero-downtime pattern is to run dual indexes: build the new index beside the old one, validate it in shadow, and cut traffic over only when it is ready.
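The incremental-versus-rebuild decision can be made mechanical by comparing the live index configuration with the wanted one. The sketch below is an assumption-heavy illustration: `IndexConfig` and `needs_full_rebuild` are hypothetical names, and the four compared fields simply mirror the structural changes listed above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IndexConfig:
    embedding_model: str
    metric: str
    index_family: str
    build_params: tuple  # e.g. (("M", 32),) for HNSW; values are illustrative


def needs_full_rebuild(live: IndexConfig, wanted: IndexConfig) -> list[str]:
    """Return the structural fields that differ.

    An empty list means ordinary inserts and deletes are enough.
    """
    reasons = []
    for field in ("embedding_model", "metric", "index_family", "build_params"):
        if getattr(live, field) != getattr(wanted, field):
            reasons.append(field)
    return reasons


live = IndexConfig("emb-v1", "cosine", "hnsw", (("M", 32),))
wanted = replace(live, embedding_model="emb-v2")
rebuild_reasons = needs_full_rebuild(live, wanted)  # ["embedding_model"]
```

Returning the list of reasons, rather than a bare boolean, gives the release plan something to record.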

3) Worked numerical example: blue-green index cutover¶

Suppose index V1 serves 100% of traffic. It has 10 million vectors. A new embedding model creates V2. You build V2 in parallel.

Latency and recall check:

V1 recall@10 = 0.92, p95 latency = 42 ms
V2 recall@10 = 0.95, p95 latency = 48 ms

Good start. Now shadow traffic for 1,000 real queries. Suppose V2 disagrees with V1 on 220 queries. Manual review finds:

150 are genuine improvements
40 are neutral
30 are regressions

Regression rate is 30 / 1000 = 3%. If that exceeds the launch threshold, you keep tuning. If it passes, move 5% live traffic to V2. Then 25%. Then 100%. That is blue-green rollout.
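The arithmetic and the staged rollout can be written down in a few lines. This is a sketch under stated assumptions: the 5% launch threshold (`MAX_REGRESSION_RATE`) is hypothetical because the text does not fix one, and `shadow_report` and `next_traffic_share` are illustrative names. The traffic steps are the 5%, 25%, and 100% from the example.

```python
def shadow_report(total_queries, improvements, neutral, regressions):
    """Summarize a manually reviewed shadow run as rates over all queries."""
    disagreements = improvements + neutral + regressions
    return {
        "disagreement_rate": disagreements / total_queries,
        "improvement_rate": improvements / total_queries,
        "regression_rate": regressions / total_queries,
    }


ROLLOUT_STEPS = (0.05, 0.25, 1.00)  # traffic shares from the worked example
MAX_REGRESSION_RATE = 0.05          # hypothetical launch threshold


def next_traffic_share(current_share, regression_rate,
                       threshold=MAX_REGRESSION_RATE):
    """Hold the current share if regressions exceed the threshold, else step up."""
    if regression_rate > threshold:
        return current_share  # keep tuning before sending more traffic
    for step in ROLLOUT_STEPS:
        if step > current_share:
            return step
    return current_share  # already at full traffic


report = shadow_report(total_queries=1000, improvements=150,
                       neutral=40, regressions=30)
first_step = next_traffic_share(0.0, report["regression_rate"])
```

For the numbers above, the regression rate is 0.03, which passes the assumed threshold, so the first step moves 5% of live traffic to V2.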

Use this picture as the mental model before the details.

before cutover
query -> router -> V1 only

migration phase
query -> router -> V1 serve
               -> V2 shadow copy

after cutover
query -> router -> V2 serve
V1 kept briefly for rollback

Rollback remains easy. That is the beauty. The warehouse stays open.
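The three router states in the picture map onto a small class. The sketch below is illustrative only: `BlueGreenRouter` is a hypothetical name, indexes are modeled as plain callables, and the shadow call runs synchronously for brevity, whereas a production router would more likely mirror queries asynchronously.

```python
from typing import Callable, Optional

SearchFn = Callable[[str], list]  # stand-in for a real index client


class BlueGreenRouter:
    def __init__(self, serving: SearchFn):
        self.serving = serving
        self.shadow: Optional[SearchFn] = None
        self.previous: Optional[SearchFn] = None
        self.disagreements = 0

    def start_shadow(self, candidate: SearchFn) -> None:
        self.shadow = candidate

    def search(self, query: str) -> list:
        result = self.serving(query)
        if self.shadow is not None:
            try:
                if self.shadow(query) != result:
                    self.disagreements += 1
            except Exception:
                # A failing shadow index is counted but never reaches the user.
                self.disagreements += 1
        return result

    def cutover(self) -> None:
        if self.shadow is None:
            raise RuntimeError("no shadow index to promote")
        # Keep the old index around for the rollback window.
        self.previous, self.serving, self.shadow = self.serving, self.shadow, None

    def rollback(self) -> None:
        if self.previous is None:
            raise RuntimeError("no previous index to roll back to")
        self.serving, self.previous = self.previous, None
```

Users only ever see results from `serving`. The shadow index influences nothing but the disagreement counter until `cutover()` is called.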

4) Deletes, tombstones, and compaction¶

Deletes are annoying. Some indexes cannot cheaply remove vectors in place. So systems use tombstones. A deleted item remains in storage, but search ignores it.

In production, the failure is specific: too many tombstones waste memory and slow traversal. A graph may still carry dead edges. An IVF list may still hold stale codes. The practical rule is to run compaction or periodic rebuilds.

The sketch looks like this.

live ids:   1 2 3 4 5 6 7 8
deleted:      x   x     x
search must skip tombstones
rebuild later removes dead weight

The loading dock needs policies. For example:

if tombstones exceed 10%, rebuild shard
if embedding model changes, full reindex
if latency rises 20%, compact graph or lists

Operational rules beat vague hope.
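The three example policies translate directly into a decision function. This is a minimal sketch: `ShardStats` and `maintenance_action` are hypothetical names, the thresholds are the 10% and 20% from the rules above, and the priority order (model change first, then tombstones, then latency) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ShardStats:
    total_vectors: int
    tombstones: int
    baseline_p95_ms: float
    current_p95_ms: float
    live_embedding_model: str
    target_embedding_model: str


def maintenance_action(s: ShardStats) -> str:
    """Pick one action per shard; the ordering of checks is an assumption."""
    if s.live_embedding_model != s.target_embedding_model:
        return "full_reindex"
    if s.total_vectors and s.tombstones / s.total_vectors > 0.10:
        return "rebuild_shard"
    if s.current_p95_ms > 1.20 * s.baseline_p95_ms:
        return "compact"
    return "none"
```

A scheduler could call this per shard on a timer and log the returned action next to the stats that triggered it.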

5) Reindexing without downtime: the full recipe¶

A mature recipe looks like this.

1. Generate new embeddings into a versioned namespace.
2. Build new index in shadow.
3. Run exact-scan evaluation on sampled queries.
4. Run ANN evaluation on real filtered queries.
5. Warm caches and graph entry points.
6. Shadow live traffic.
7. Canary small live percentage.
8. Monitor recall proxies, latency, and error rates.
9. Flip router.
10. Keep old index for rollback window.
11. Retire old index after confidence period.

That is the production pattern. The route map changes quietly. The user sees only stable search.

A second operational complexity appears during long backfills. Backfills can take hours or days. During that time, new documents keep arriving. The practical rule is to use a delta log: build the big snapshot, then replay incremental writes onto the new index before cutover. Otherwise the new index goes stale before launch.
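Snapshot-plus-replay is easy to get subtly wrong, so a toy version helps. In the sketch below the index is a plain dict standing in for a real build, and the log format `(seq, op, doc_id, vector)` with monotonically increasing sequence numbers is an assumption. The function name `build_with_delta_replay` is hypothetical.

```python
def build_with_delta_replay(snapshot, delta_log, snapshot_seq):
    """Build from a snapshot, then apply writes that arrived after it.

    snapshot:     dict of doc_id -> vector, consistent as of snapshot_seq
    delta_log:    iterable of (seq, op, doc_id, vector) in seq order,
                  where op is "upsert" or "delete"
    snapshot_seq: last sequence number already reflected in the snapshot
    """
    new_index = dict(snapshot)  # stand-in for the expensive index build
    last_applied = snapshot_seq
    for seq, op, doc_id, vector in delta_log:
        if seq <= snapshot_seq:
            continue  # this write is already inside the snapshot
        if op == "upsert":
            new_index[doc_id] = vector
        elif op == "delete":
            new_index.pop(doc_id, None)
        else:
            raise ValueError(f"unknown op: {op}")
        last_applied = seq
    return new_index, last_applied


snapshot = {"a": [0.1], "b": [0.2]}
log = [
    (5, "upsert", "a", [0.1]),   # older than the snapshot, skipped
    (11, "upsert", "c", [0.3]),  # arrived during the backfill
    (12, "delete", "b", None),   # arrived during the backfill
]
index, last_seq = build_with_delta_replay(snapshot, log, snapshot_seq=10)
```

Returning `last_applied` lets the cutover step check that the new index has caught up to the head of the log before the router flips.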

6) Versioning discipline¶

Never name indexes vaguely. Please. Use explicit versions. For example:

docs-emb-v3-hnsw-m32-2026-06
catalog-emb-v5-ivfpq-96x8

The name should reveal embedding model, index family, and build wave. Then rollbacks are sane. Audits are sane. Debugging is sane.

Also version the metadata schema. If the aisle sticker fields (the metadata stored beside each vector) change, query code and index code must agree. A filter on language_code will fail if the new index stores lang. These small mismatches cause ugly incidents.
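Both rules, explicit names and versioned metadata schema, can live in one manifest object that is checked before cutover. The sketch below is hypothetical: `IndexManifest` and `missing_filter_fields` are illustrative names, and the naming template is inferred from the first example name above rather than taken from any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexManifest:
    corpus: str
    embedding_version: str
    index_family: str
    build_tag: str              # parameters and/or build wave
    metadata_fields: frozenset  # field names stored beside each vector

    @property
    def name(self) -> str:
        return (f"{self.corpus}-emb-{self.embedding_version}"
                f"-{self.index_family}-{self.build_tag}")


def missing_filter_fields(manifest: IndexManifest, query_filters: dict) -> set:
    """Return filter keys the index does not store; empty means compatible."""
    return set(query_filters) - manifest.metadata_fields


manifest = IndexManifest("docs", "v3", "hnsw", "m32-2026-06",
                         frozenset({"lang", "tenant"}))
problems = missing_filter_fields(manifest,
                                 {"language_code": "en", "tenant": "t1"})
```

Here `manifest.name` is `docs-emb-v3-hnsw-m32-2026-06`, and `problems` contains `language_code`, which is exactly the mismatch described above, caught before it reaches traffic.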

The loading dock is where discipline lives. Without it, vector search becomes chaos.

6) Why not rebuild in place during live traffic under this workload¶

The tempting alternative is rebuilding in place during live traffic because it keeps the architecture small and makes the first demo look clean. That story is useful for a prototype, but it becomes dangerous once the workload has real scale, filters, freshness pressure, and evaluation data.

In-place rebuilds fail when indexes must change while traffic continues and embeddings and documents keep moving. At that point the system needs an inspectable artifact: a blue-green index release plan covering build, backfill, cutover, tombstones, and rollback. Without it, every bad answer turns into a vague argument about whether embeddings, ANN, metadata filters, lifecycle, or evaluation are guilty.

Option	Works when	Fails when	Cost moves to
rebuilding in place during live traffic	corpus is small or low-risk	indexes must change while traffic continues and embeddings/documents keep moving	latency, recall, or user trust
index lifecycle	the failure can be measured in the index path	traces or baselines are missing	memory, rebuilds, evals, operations

Mini-FAQ. "Is this always worth adding?" No. The RAG-fundamentals rule still applies: add machinery only when a measured workload pressure earns it. If exact search is cheap, if filters are simple, or if evaluation is missing, the clever index can become a more expensive way to stay confused.

7) Production signals — know whether index lifecycle is working¶

Healthy behavior means the blue-green index release plan (build, backfill, cutover, tombstones, and rollback) explains why the returned neighbors changed. In a real incident review, you should be able to point at that artifact and explain why the candidate set changed, not merely say that the database returned something.

The first metrics to watch are cutover error rate and freshness lag. Track them by query family, tenant, corpus slice, and index version, because global averages hide exactly the failures users notice first.
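A tiny example shows why slicing matters. The sketch below uses made-up freshness-lag samples, and `lag_by_slice` is a hypothetical helper. The slice key is a `(tenant, index_version)` tuple, but any of the dimensions named above would work.

```python
from collections import defaultdict
from statistics import mean


def lag_by_slice(samples):
    """Average freshness lag per slice.

    samples: iterable of (slice_key, freshness_lag_seconds)
    """
    buckets = defaultdict(list)
    for key, lag in samples:
        buckets[key].append(lag)
    return {key: mean(lags) for key, lags in buckets.items()}


# Illustrative numbers only.
samples = [
    (("tenant-a", "v2"), 30),
    (("tenant-a", "v2"), 40),
    (("tenant-b", "v2"), 35),
    (("tenant-c", "v2"), 900),  # one tenant's writes are not reaching V2
]
global_lag = mean(lag for _, lag in samples)
per_slice = lag_by_slice(samples)
```

The global mean is 251.25 seconds, which looks like a mild, uniform delay. The per-slice view shows two healthy tenants at 35 seconds and one at 900.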

The misleading metric is database uptime. A vector database can be perfectly available while recall, filtering, freshness, or embedding compatibility is broken, so uptime only proves the warehouse doors opened; it does not prove the scout robot found the right shelf.

The expert graph compares exact baseline recall, p50/p99 latency, filter selectivity, index version, embedding version, and bad-query examples by slice. That graph is the difference between tuning knobs and debugging a retrieval system.

bad retrieval
   -> query vector / filter
   -> index path
   -> candidate neighbors
   -> score and metadata trace
   -> exact baseline or judged list
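The last step of that trace, comparing ANN candidates against an exact baseline, can be sketched with a brute-force scan. This is a toy illustration in pure Python: the corpus, the query, and the `ann_ids` list are made up, and cosine similarity is assumed as the metric. On a real corpus the exact scan would run on a sample, not on every query.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, corpus, k):
    """Brute-force baseline: score every vector, keep the k most similar ids."""
    ranked = sorted(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(ann_ids, exact_ids):
    """Fraction of the exact top-k that the ANN result also returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact & set(ann_ids)) / len(exact)


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
query = [1.0, 0.0]
baseline = exact_top_k(query, corpus, k=2)  # ["a", "b"]
ann_ids = ["a", "c"]                        # pretend ANN output for this query
recall = recall_at_k(ann_ids, baseline)     # 0.5
```

Logging this number alongside index version and embedding version is what lets a review separate an index failure from a geometry failure.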

8) Boundary — where index lifecycle helps and where it does not¶

Use this mechanism when the failure happens inside vector geometry, index traversal, filtering, lifecycle, or serving operations. That is the zone where vector-database machinery can actually change the returned neighbors, the latency curve, or the operational envelope.

Do not expect it to fix cases where the source content is wrong, the embedding model is poor for the domain, or the product definition of relevance is unresolved. Those are upstream or product-definition failures, and better ANN settings will only make the wrong evidence arrive faster.

The common pathology is that teams keep tuning ANN knobs when the real issue is bad chunks, stale data, weak labels, or missing evals. In interviews, call this out explicitly: the index is not the whole retrieval system, it is one stage inside a pipeline that also depends on documents, chunks, labels, and evals.

The scale limit is blunt: every improvement spends something — RAM, disk, build time, query latency, engineering time, or vendor lock-in. The mature answer is not to pick the fanciest mechanism; it is to choose the pressure you are willing to pay for.

9) Wrong model — an index is built once and then left alone¶

The wrong model is attractive because it compresses the system into one easy story, and easy stories feel good in design docs. The trouble is that production vector search is not one story; it is embedding quality, distance metric, ANN index, metadata filters, lifecycle, sharding, vendor operations, and monitoring all interacting under traffic.

If index lifecycle cannot change recall, latency, cost, freshness, or debug visibility, it is not carrying its weight; it is vocabulary without leverage.

10) Failure taxonomy for index lifecycle¶

Geometry failure — the embedding space does not put useful neighbors close enough.
Metric failure — the chosen similarity ruler disagrees with the model or workload.
Index failure — ANN skips relevant vectors or returns unstable candidates.
Filtering failure — metadata filters erase good candidates or violate scope.
Lifecycle failure — stale, mixed-version, or partially rebuilt indexes serve traffic.
Scale failure — fan-out, memory, or rebuild cost breaks the SLO.
Debugging failure — no trace connects query vector, index path, candidates, and final result.

11) Pattern transfer — where this returns later¶

RAG uses vector DBs as the evidence gateway before generation.
Retrieval and ranking supplies the metrics and fusion logic used here.
Data engineering supplies chunk quality, metadata, and embedding-version hygiene.
Production evals decide whether recall and relevance changes actually help users.

12) Design review checklist¶

What pressure is this mechanism relieving: latency, memory, filtering, freshness, scale, or evaluation?
What artifact would you inspect first: vector neighbors, index trace, filter plan, namespace manifest, or exact baseline?
Why is rebuilding in place during live traffic weaker for this workload?
Which slice should improve first?
Which cost rises first: RAM, disk, build time, query latency, or operational complexity?
What rollback signal tells you the index change hurt retrieval?

Where this lives in the wild¶

Pinecone production migrations — vector platform engineer. Teams build a new namespace or index, shadow traffic, then cut over gradually.
Weaviate schema upgrades — search infrastructure engineer. Reindexing accompanies embedding changes, payload schema changes, and rollback planning.
Qdrant collections in SaaS products — SRE plus backend engineer. Blue-green collections and aliases let traffic switch without client-visible downtime.
pgvector inside PostgreSQL — database reliability engineer. New tables or indexes are built concurrently, validated, then swapped into query plans.
FAISS offline-to-online pipelines — ML infra engineer. Snapshot builds plus delta replay keep large indexes fresh during long backfills.
Enterprise RAG — vector DBs store policy, wiki, ticket, and document chunks for semantic retrieval.
Ecommerce search — vectors help with descriptive queries while filters protect catalog scope.
Support copilots — need metadata filters for tenant, product, language, and freshness.
Code search — mixes semantic vectors with exact identifiers and repository permissions.
Recommendation systems — use nearest-neighbor retrieval before ranking models.
Image and multimodal search — embeddings represent images, captions, and cross-modal queries.
Legal discovery — recall and auditability are more important than average latency alone.
Healthcare retrieval — metadata, permissions, and freshness are safety boundaries.
Fraud and anomaly systems — vector similarity finds nearby behavior patterns.
Personalization systems — user and item embeddings need versioned lifecycle management.

Recall checkpoint¶

Which changes usually force a full rebuild rather than an incremental update?
Why do blue-green index rollouts reduce risk?
What problem do tombstones create over time?
Why is delta replay needed during long rebuilds?
Which artifact would you inspect first for index lifecycle?
What query or corpus slice would prove the improvement is real?
What is the first operational cost this mechanism adds?

Interview Q&A¶

Q: Why rebuild a new index beside the old one instead of mutating production in place? A: Because side-by-side builds allow validation, shadow traffic, and instant rollback if quality or latency regresses.

Common wrong answer to avoid: "Because vector indexes cannot be updated." Many can; the issue is safe migration.

Q: Why can incremental updates degrade quality over time even if they are supported? A: Because graph structure, centroids, tombstones, and data distribution can drift away from the assumptions of the original build.

Common wrong answer to avoid: "Only latency changes, not quality." Quality can drift too.

Q: Why keep old indexes after cutover? A: Because rollback must be fast when hidden regressions appear under live traffic.

Common wrong answer to avoid: "For archival curiosity." It is primarily an operational safety measure.

Q: Why version embedding schema and metadata schema together? A: Because query logic, filters, and vector interpretation must match the index contents exactly during rollout.

Common wrong answer to avoid: "Only the model version matters." Schema mismatches can break search just as badly.

Q: What artifact would you inspect first when index lifecycle fails? A: I would inspect the blue-green index release plan (build, backfill, cutover, tombstones, and rollback), then compare it with exact baseline, filter state, index version, and embedding version.

Common wrong answer to avoid: "Just check whether the vector DB is up." Availability does not prove recall, freshness, or relevance.

Q: How do you know the change helped? A: Track cutover error rate and freshness lag on a representative query slice and compare them with latency, memory, build time, and filtered-result behavior.

Common wrong answer to avoid: "The average similarity score increased." Similarity scores are not product-quality metrics by themselves.

Q: When should you avoid this mechanism? A: Avoid it when the corpus is small, exact search is cheap, or the team lacks evaluation data to prove the extra complexity helps.

Common wrong answer to avoid: "Every production AI system needs the most advanced vector index." The right index depends on workload, scale, filters, and operational constraints.

Apply now (10 min)¶

Exercise. Write a mini rollout plan for upgrading from embedding model V1 to V2. Include build, shadow test, canary, cutover, and rollback. Add one tombstone threshold that would trigger compaction.

Sketch from memory. Draw the blue-green router with old and new indexes beside the loading dock. Label where delta replay happens before the final swap.

Reproduce from memory: explain index lifecycle with its pressure, artifact, metric, boundary, and failure mode.

What you should remember¶

Index lifecycle exists because indexes must change while traffic continues and embeddings/documents keep moving. The point is not to memorize a vendor feature; it is to know which workload pressure the mechanism relieves and which cost it creates.

The artifact to inspect is the blue-green index release plan with build, backfill, cutover, tombstones, and rollback. If you cannot inspect it, vector search debugging becomes guesswork.

Remember:

Vector search fails through geometry, metrics, indexes, filters, lifecycle, scale, and monitoring.
Watch cutover error rate and freshness lag by query and corpus slice before trusting global averages.
Exact baselines and judged lists are how you keep ANN tuning honest.
Every vector database choice moves cost between recall, latency, memory, rebuilds, and operations.

Bridge. One index is manageable. Real products need many machines, many shards, and many replicas. So next we study scaling and distribution. → 10-scaling-sharding.md