06. Background Tasks: move long work to the prep shelf

~15 min read. Not every job belongs in the request-response path, especially in AI systems.

Built on the ELI5 in 00-eli5.md. The prep shelf — where slow work waits outside the serving lane — keeps the front desk responsive.

First picture: immediate reply, delayed completion

Look at the picture first. Some requests should finish now. Some work should finish later. Those are different promises.

client request
     │
     ▼
┌──────────────┐
│  front desk  │
└──────┬───────┘
       ├── short work ──→ immediate response
       │
       └── enqueue job ──→ prep shelf ──→ worker handles later

See. If a task takes ten seconds, a minute, or an hour, do not trap the user in one HTTP request. Return an acknowledgment. Store a job id. Let workers continue from the prep shelf. Simple, no?
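Here is a minimal in-process sketch of this shape, using only the Python standard library. The names `submit_job`, `job_status`, `worker`, and the `jobs` dict are hypothetical. The upper-casing stands in for real slow work. The shelf is a `queue.Queue` inside one process, so this is a teaching toy. It shows the acknowledge-now, finish-later contract. It does not show durability.

```python
import queue
import threading
import time
import uuid

# Hypothetical in-memory stand-ins: a dict for job records and a
# queue.Queue for the prep shelf. Neither survives a process restart.
jobs: dict[str, dict] = {}
shelf: "queue.Queue[str]" = queue.Queue()
jobs_lock = threading.Lock()


def submit_job(payload: str) -> str:
    """Front desk: record the job, put it on the shelf, return an id at once."""
    job_id = uuid.uuid4().hex
    with jobs_lock:
        jobs[job_id] = {"status": "queued", "payload": payload, "result": None}
    shelf.put(job_id)
    return job_id


def job_status(job_id: str) -> str:
    """What a polling endpoint would read."""
    with jobs_lock:
        return jobs[job_id]["status"]


def worker() -> None:
    """Worker: take job ids off the shelf and do the slow part later."""
    while True:
        job_id = shelf.get()
        with jobs_lock:
            jobs[job_id]["status"] = "running"
            payload = jobs[job_id]["payload"]
        time.sleep(0.1)  # stand-in for slow work such as parsing or embedding
        with jobs_lock:
            jobs[job_id]["result"] = payload.upper()
            jobs[job_id]["status"] = "succeeded"
        shelf.task_done()


threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    jid = submit_job("parse this document")
    print("acknowledged", jid)  # returned without waiting for the slow work
    shelf.join()  # demo only: block until the worker has drained the shelf
    print(job_status(jid), jobs[jid]["result"])
```

The route only needs `submit_job`: create the record, enqueue, return the id. Everything slow happens on the worker's side of the shelf.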

Typical AI examples are obvious. Large document parsing. Bulk embedding generation. Nightly evaluation runs. Dataset labeling jobs. Video transcription. None belong in the main chat request path.

FastAPI `BackgroundTasks` for small post-response work

FastAPI has a built-in helper called BackgroundTasks. It is useful, but only for light work that runs after the response is sent. Think a notification email. Think an audit log line. Think an analytics write. Think deleting a temporary upload.

from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

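def write_audit_log(user_id: str, action: str) -> None:
    # Stand-in for a small, disposable write such as a log line or analytics event.
    print(f"audit {user_id} {action}")

@app.post("/feedback")
async def save_feedback(background_tasks: BackgroundTasks):
    # Scheduled here, executed after the response is sent, inside this same process.
    background_tasks.add_task(write_audit_log, "u_1", "feedback_submitted")
    return {"status": "accepted"}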

This is handy. But notice the boundary. These tasks run inside the app process, after the response is sent. They are not durable queue jobs. If the process crashes or restarts, pending tasks are lost and nothing retries them. So use this for small, disposable follow-up work. Not for business-critical pipelines.

Durable queues for real long-running AI jobs

Now what is the problem? Teams often misuse BackgroundTasks for huge jobs. Bad move. Long work needs durability, retries, visibility, and worker separation. That is queue territory.

Picture the stronger architecture.

upload request
     │
     ▼
FastAPI route
     │
     ├── save file metadata
     ├── create job record
     └── push message to queue
                    │
                    ▼
              worker process
                    │
        parse ─→ chunk ─→ embed ─→ index
                    │
                    ▼
              update job status

Tools here include Celery, RQ, Dramatiq, Arq, or a cloud queue plus separate workers. The exact tool matters less than the promise. A job should survive app restarts. Its status should be queryable. Retries should be controlled. The serving API should stay lean.
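Here is a minimal sketch of those promises, assuming SQLite as the durable job store. It is not Celery or RQ. `open_store`, `create_job`, `claim_next`, `finish`, and `get_status` are hypothetical names.

The sketch keeps the three promises this way:
- Records live in a file, so they survive a restart.
- Status is a plain query.
- Retries are bounded by `max_attempts`.

One gap remains. A job left in `running` by a crashed worker is never reclaimed here. Real queues handle this with visibility timeouts or heartbeats.

```python
from __future__ import annotations

import os
import sqlite3
import tempfile
import time

# One row per job. Because it lives in a file, it outlives the API process.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    kind TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued',
    attempts INTEGER NOT NULL DEFAULT 0,
    max_attempts INTEGER NOT NULL DEFAULT 3,
    error TEXT,
    updated_at REAL NOT NULL
)
"""


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def create_job(conn: sqlite3.Connection, kind: str, max_attempts: int = 3) -> int:
    with conn:  # commits the insert on success
        cur = conn.execute(
            "INSERT INTO jobs (kind, max_attempts, updated_at) VALUES (?, ?, ?)",
            (kind, max_attempts, time.time()),
        )
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection) -> int | None:
    """Move the oldest queued job to 'running' and count the attempt."""
    with conn:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cur = conn.execute(
            "UPDATE jobs SET status = 'running', attempts = attempts + 1, "
            "updated_at = ? WHERE id = ? AND status = 'queued'",
            (time.time(), row[0]),
        )
    return row[0] if cur.rowcount == 1 else None  # 0 rows: another worker won


def finish(conn: sqlite3.Connection, job_id: int, error: str | None = None) -> str:
    """Record success, or a failure that is re-queued until max_attempts is used up."""
    with conn:
        attempts, max_attempts = conn.execute(
            "SELECT attempts, max_attempts FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if error is None:
            status = "succeeded"
        elif attempts < max_attempts:
            status = "queued"  # controlled retry: back on the shelf
        else:
            status = "failed"  # out of attempts: stop and keep the error
        conn.execute(
            "UPDATE jobs SET status = ?, error = ?, updated_at = ? WHERE id = ?",
            (status, error, time.time(), job_id),
        )
    return status


def get_status(conn: sqlite3.Connection, job_id: int) -> tuple:
    return conn.execute(
        "SELECT status, attempts, error FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "jobs.db")
    conn = open_store(path)
    job_id = create_job(conn, "index_document", max_attempts=2)
    conn.close()  # simulate an API restart
    conn = open_store(path)  # the job record is still there
    print(finish(conn, claim_next(conn), error="vendor timeout"))  # queued
    print(finish(conn, claim_next(conn), error="vendor timeout"))  # failed
    print(get_status(conn, job_id))  # ('failed', 2, 'vendor timeout')
```

The claim uses a conditional `UPDATE ... AND status = 'queued'`. If two workers race for the same row, only one claim succeeds. The loser gets `None` and can simply poll again.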

Worked example. User uploads a 300 MB policy archive. Your route stores the file reference. It returns job_id = 42 in 200 milliseconds. A worker later parses PDFs, extracts text, chunks content, computes embeddings, and writes to the vector store. The UI polls /jobs/42 or receives events. That is the right shape.
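The worker side of that example can be sketched as a straight pipeline that records progress as it goes. The stages (`parse`, `chunk`, `embed`, `index`) are hypothetical stubs. They stand in for a real PDF parser, embedding API, and vector store. A plain dataclass stands in for a job table. `job_view` is roughly what a polling endpoint like /jobs/42 might return.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical stand-ins for a PDF parser, a chunker, an embedding call,
# and a vector-store write. Real versions would call real services.
def parse(files: list[str]) -> list[str]:
    return [f"text of {name}" for name in files]


def chunk(texts: list[str], size: int = 8) -> list[str]:
    return [t[i:i + size] for t in texts for i in range(0, len(t), size)]


def embed(pieces: list[str]) -> list[list[float]]:
    return [[float(len(p))] for p in pieces]  # fake one-dimensional vectors


def index(store: dict, job_id: int, pieces: list[str], vectors: list[list[float]]) -> None:
    for i, (piece, vector) in enumerate(zip(pieces, vectors)):
        store[(job_id, i)] = (piece, vector)


@dataclass
class IngestJob:
    job_id: int
    files: list[str]
    status: str = "queued"
    completed_stages: list[str] = field(default_factory=list)


def run_ingest_job(job: IngestJob, store: dict) -> None:
    """Worker side: parse -> chunk -> embed -> index, recording progress."""
    job.status = "running"
    texts = parse(job.files)
    job.completed_stages.append("parse")
    pieces = chunk(texts)
    job.completed_stages.append("chunk")
    vectors = embed(pieces)
    job.completed_stages.append("embed")
    index(store, job.job_id, pieces, vectors)
    job.completed_stages.append("index")
    job.status = "succeeded"


def job_view(job: IngestJob) -> dict:
    """Roughly what a polling endpoint such as /jobs/42 might return."""
    return {"job_id": job.job_id, "status": job.status, "done": list(job.completed_stages)}


if __name__ == "__main__":
    store: dict = {}
    job = IngestJob(job_id=42, files=["a.pdf", "b.pdf"])
    print(job_view(job))  # what the route can return right away
    run_ingest_job(job, store)  # later, on a worker
    print(job_view(job), len(store))
```

Recording completed stages is cheap. It is what lets the UI say "embedding" instead of showing a bare spinner.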

Job design: idempotency, status, and failure visibility

A queue alone is not enough. You need job design. Three things matter a lot. Idempotency. Status transitions. Error storage.

queued ──→ running ──→ succeeded
   │          │
   │          └────→ failed
   │
   └────→ cancelled
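The diagram translates directly into an allow-list of transitions. This is a minimal sketch, and the names `ALLOWED`, `can_transition`, and `InvalidTransition` are illustrative. It encodes only the edges drawn above. If you model a retry as a state change (say, `failed` back to `queued`), that is an extra edge you would add deliberately.

```python
from __future__ import annotations

# Allowed edges, copied from the diagram: queued -> running | cancelled,
# running -> succeeded | failed. Terminal states have no outgoing edges.
ALLOWED: dict[str, frozenset[str]] = {
    "queued": frozenset({"running", "cancelled"}),
    "running": frozenset({"succeeded", "failed"}),
    "succeeded": frozenset(),
    "failed": frozenset(),
    "cancelled": frozenset(),
}


class InvalidTransition(ValueError):
    """Raised when a job tries to jump to a state the diagram does not allow."""


def can_transition(current: str, new: str) -> bool:
    return new in ALLOWED.get(current, frozenset())


def transition(current: str, new: str) -> str:
    if not can_transition(current, new):
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new


if __name__ == "__main__":
    state = transition("queued", "running")
    state = transition(state, "succeeded")
    try:
        transition(state, "running")
    except InvalidTransition as exc:
        print(exc)  # succeeded -> running is not allowed
```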

Idempotency means a retry should not corrupt state. If embedding batch 7 runs twice, can you detect duplicates? If an email step repeats, is it safe? This matters because retries will happen.
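One common way to make "batch 7 runs twice" harmless is a deterministic key per chunk plus an upsert. This is a minimal sketch with a few assumptions:
- A dict stands in for the vector store.
- Keys combine document id, chunk index, and a content hash. That is an illustrative choice, not a requirement.
- `chunk_key` and `write_batch` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib


def chunk_key(doc_id: str, chunk_index: int, text: str) -> str:
    """Deterministic id: the same chunk of the same document always gets the same key."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    return f"{doc_id}:{chunk_index}:{digest}"


def write_batch(
    store: dict[str, list[float]],
    doc_id: str,
    batch: list[tuple[int, str, list[float]]],
) -> int:
    """Upsert each vector by key and return how many keys were new.

    Replaying the same batch overwrites existing entries instead of adding
    duplicates, so a retried job leaves the store in the same state.
    """
    new_keys = 0
    for chunk_index, text, vector in batch:
        key = chunk_key(doc_id, chunk_index, text)
        if key not in store:
            new_keys += 1
        store[key] = vector
    return new_keys


if __name__ == "__main__":
    store: dict[str, list[float]] = {}
    batch_7 = [(70, "refund policy", [0.1, 0.2]), (71, "late fees", [0.3, 0.4])]
    print(write_batch(store, "doc-9", batch_7))  # 2
    print(write_batch(store, "doc-9", batch_7))  # 0: the replay adds nothing
    print(len(store))  # 2
```

If a re-parse changes a chunk's text, its hash changes too, so the old vector is not overwritten. Pick the key parts to match what "the same chunk" means in your system.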

Status matters for product trust. Users hate black boxes. A job record with queued, running, failed, and succeeded is basic hygiene. Store timestamps too. Store error messages that are safe to show operators.

Visibility matters for debugging. If a worker crashes halfway, which file failed? Which chunk number? Which vendor call timed out? The prep shelf should not be a dark cupboard. It needs labels.
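Sometimes you need to answer "which file, which chunk, which call" after the fact. For that, the worker can write a structured error into the job record instead of a bare string. This is a minimal sketch. `JobRecord`, `process`, and the failing `flaky_embed` call are hypothetical, and the error fields are one reasonable choice among several. The timestamps cover the "store timestamps too" point.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JobRecord:
    job_id: int
    status: str = "queued"
    created_at: float = field(default_factory=time.time)
    started_at: float | None = None
    finished_at: float | None = None
    error: dict | None = None  # operator-facing details; no raw document text


def process(
    job: JobRecord,
    files: dict[str, list[str]],
    embed_chunk: Callable[[str], list[float]],
) -> None:
    """Embed every chunk; on the first failure, record exactly where it happened."""
    job.status, job.started_at = "running", time.time()
    for file_name, chunks in files.items():
        for chunk_no, text in enumerate(chunks):
            try:
                embed_chunk(text)
            except Exception as exc:
                job.status, job.finished_at = "failed", time.time()
                job.error = {
                    "stage": "embed",
                    "file": file_name,
                    "chunk": chunk_no,
                    "type": type(exc).__name__,
                    # Truncated; scrub secrets and user data before storing in practice.
                    "message": str(exc)[:200],
                }
                return
    job.status, job.finished_at = "succeeded", time.time()


def flaky_embed(text: str) -> list[float]:
    """Hypothetical vendor call that times out on one particular chunk."""
    if "table" in text:
        raise TimeoutError("embedding vendor did not respond")
    return [0.0]


if __name__ == "__main__":
    job = JobRecord(job_id=42)
    process(job, {"a.pdf": ["intro", "terms"], "b.pdf": ["scope", "fee table"]}, flaky_embed)
    print(job.status, job.error)
```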

How to split responsibilities cleanly

A healthy AI service often splits into three layers. FastAPI accepts and validates requests. Queue or broker transports long work. Workers do heavy compute and external side effects.

The front desk should not also wash dishes. The prep shelf should not take live customer orders. See. When roles stay clean, scaling and debugging both improve.

One more practical rule. If the user needs the result inside one interaction, keep the route synchronous or streaming. If the work can finish later, make it a job. Do not invent half-pregnant designs where the request hangs for ninety seconds "just in case." That is the worst of both worlds.
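The rule of thumb fits in a few lines. This is a deliberately simplified sketch. It assumes you can answer two yes/no questions about a workflow, and `Mode` and `choose_mode` are illustrative names. Note that there is no branch for "hold the request open for ninety seconds."

```python
from enum import Enum


class Mode(Enum):
    INLINE = "inline"
    STREAM = "stream"
    JOB = "background job"


def choose_mode(needed_within_interaction: bool, has_incremental_output: bool) -> Mode:
    """Finish now (inline or streamed), or become a job. Nothing in between."""
    if not needed_within_interaction:
        return Mode.JOB  # return a job id now; workers finish later
    if has_incremental_output:
        return Mode.STREAM  # e.g. tokens of a chat answer
    return Mode.INLINE  # short work with a single final result


if __name__ == "__main__":
    print(choose_mode(True, True))    # Mode.STREAM  (chat answer)
    print(choose_mode(True, False))   # Mode.INLINE  (classify one message)
    print(choose_mode(False, False))  # Mode.JOB     (nightly evaluation run)
```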

Where this lives in the wild

OpenAI file-processing backend — platform engineer: uploaded files are acknowledged quickly while parsing and indexing continue in workers.
Notion AI import pipeline — backend engineer: large workspace ingestion runs as background jobs with status updates, not blocking HTTP requests.
Scale AI data operations — ML platform engineer: long labeling and evaluation workflows require durable task queues and retryable workers.
Perplexity document upload service — retrieval engineer: embedding and indexing pipelines belong on workers, while the API immediately returns job tracking ids.
Enterprise support analytics platform — data engineer: nightly summarization and sentiment backfills run off the main serving path to protect daytime chat latency.

Pause and recall

When is FastAPI BackgroundTasks a good fit, and when is it the wrong tool?
Why do long AI jobs usually need a durable queue instead of in-process follow-up work?
What three job-design ideas keep background systems reliable?
In the analogy, why should the prep shelf not replace the front desk?

Interview Q&A

Q: Why use a task queue instead of FastAPI BackgroundTasks for document indexing? A: Because indexing is long-lived, failure-prone, and operationally important, so it needs durability, retries, worker isolation, and visible status tracking. Common wrong answer to avoid: "Because BackgroundTasks cannot run Python functions."

Q: Why is idempotency central in background AI pipelines? A: Retries and duplicate deliveries happen, so workers must tolerate replay without double-billing, double-indexing, or corrupting downstream state. Common wrong answer to avoid: "Because idempotency makes the first run faster."

Q: Why return a job id quickly instead of keeping the request open until completion? A: It protects serving capacity, matches user expectations for long workflows, and makes retries, progress reporting, and failure recovery much cleaner. Common wrong answer to avoid: "Users always prefer one request, no matter how long it hangs."

Q: Why should queue workers be operationally separate from API workers? A: Their scaling patterns, failure modes, and resource profiles differ sharply, especially when worker jobs are CPU-heavy or bursty. Common wrong answer to avoid: "Separate workers are only for very large companies."

Apply now (5 min)

Exercise. Pick one AI workflow you know. Decide whether it should be inline, streamed, or backgrounded. Then define a minimal job state machine with four states.

Sketch from memory. Draw the front desk, the prep shelf, and one worker. Show where a job id is created and later updated.

Bridge. Good. We can queue long work now. But inside one request or one worker, we still need smart concurrency patterns so many waits do not stampede at once. → 07-concurrency-patterns.md