Structured Job Extractor — Analysis

What this implementation does

Pydantic schema with required fields (company, role, must_haves) and optional fields (salary_min, salary_max, nice_to_haves).
Retry loop with correction hint: if validation fails, the next prompt includes the error message so the model can self-correct.
JSON-decode resilience: malformed JSON also triggers retry with a hint.
Batch extraction that isolates per-item failures rather than failing the whole batch on one bad posting.

The retry pattern

attempt 1: extract → JSON decode fails or validation fails
         → next prompt includes the error message
attempt 2: extract → likely succeeds (model self-corrects given the error)
attempt N: extract → still failing → raise

This pattern works because LLMs can often fix their own output when told what went wrong. A blind retry (same prompt) often reproduces the same error, whereas a retry that includes the error message typically succeeds in fewer attempts.
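
The loop can be sketched as follows, using only the standard library. The names extract_with_retry, call_llm, validate, require_company, and fake_llm are hypothetical, and the prompt wording is an assumption. The sketch relies on json.JSONDecodeError being a subclass of ValueError, so a single except clause covers both decode and validation failures, provided the validator also raises a ValueError subclass.

```python
import json
from typing import Callable


def extract_with_retry(
    posting: str,
    call_llm: Callable[[str], str],
    validate: Callable[[dict], dict],
    max_attempts: int = 3,
) -> dict:
    """Ask for JSON; on failure, ask again with the error message appended."""
    base_prompt = f"Extract the job posting as JSON:\n{posting}"
    prompt = base_prompt
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            # Decode first, then validate; either step may raise ValueError.
            return validate(json.loads(raw))
        except ValueError as exc:
            last_error = exc
            # Correction hint: tell the model exactly what was wrong.
            prompt = (
                f"{base_prompt}\n\n"
                f"Your previous reply was rejected: {exc}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def require_company(data: dict) -> dict:
    # Hypothetical stand-in for schema validation.
    if "company" not in data:
        raise ValueError("missing required field: company")
    return data


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model: it only answers well after a hint.
    if "rejected" in prompt:
        return '{"company": "Acme"}'
    return "not json"


result = extract_with_retry("Acme is hiring", fake_llm, require_company)
```

Here the first attempt fails to decode, the second prompt carries the decode error, and the second attempt returns the validated dict.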

Why missing compensation is `None`, not `0`

A salary of 0 is meaningful (unpaid); a salary of None means "not disclosed". Conflating them breaks downstream filtering: "show me jobs with salary > 0" would also exclude the not-disclosed postings, which is rarely the intent. The schema enforces the distinction by typing the salary fields as int | None.
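
A small illustration of the filtering problem, using plain dicts with made-up sample data (the roles and figures are invented for the example):

```python
postings = [
    {"role": "volunteer", "salary_min": 0},      # explicitly unpaid
    {"role": "engineer", "salary_min": 90000},   # disclosed
    {"role": "analyst", "salary_min": None},     # not disclosed
]

# With None preserved, the three cases stay distinguishable.
paid = [p["role"] for p in postings
        if p["salary_min"] is not None and p["salary_min"] > 0]
undisclosed = [p["role"] for p in postings if p["salary_min"] is None]

# If None is collapsed to 0, the analyst posting becomes
# indistinguishable from the unpaid one.
conflated = [{**p, "salary_min": p["salary_min"] or 0} for p in postings]
looks_unpaid = [p["role"] for p in conflated if p["salary_min"] == 0]
```

After conflation, looks_unpaid contains both the volunteer and the analyst roles, and no later filter can tell them apart.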

Why batch isolates failures

Suppose a batch of 100 postings in which 3 fail validation. There are two approaches:

Fail-the-whole-batch. Surface the first error; lose the results for the 97 postings that succeeded.
Per-item isolation. Return a list where each item is either a JobPosting or an Exception. Caller decides what to do per item.

Per-item isolation is almost always right for batch processing. Failures are inspected, logged, possibly retried separately; the successful items flow through.
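
Per-item isolation can be sketched like this. The names extract_batch, extract_one, and fake_extract are hypothetical, and plain dicts stand in for JobPosting objects so the example has no dependencies.

```python
from typing import Callable, Union


def extract_batch(
    postings: list[str],
    extract_one: Callable[[str], dict],
) -> list[Union[dict, Exception]]:
    """Return one entry per posting: the result, or the exception it raised."""
    results: list[Union[dict, Exception]] = []
    for posting in postings:
        try:
            results.append(extract_one(posting))
        except Exception as exc:  # isolate the failure to this item
            results.append(exc)
    return results


def fake_extract(posting: str) -> dict:
    # Hypothetical extractor: fails on blank input.
    if not posting.strip():
        raise ValueError("empty posting")
    return {"company": posting.split()[0]}


results = extract_batch(["Acme hiring", "   ", "Globex hiring"], fake_extract)

# The caller decides what to do with each kind of entry.
successes = [r for r in results if not isinstance(r, Exception)]
failures = [(i, r) for i, r in enumerate(results) if isinstance(r, Exception)]
```

Because the result list keeps the input order, each failure's index identifies the posting to log or retry.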

What this exercise teaches

Structured output is not "ask the LLM for JSON" — it's schema + validation + retry.
The correction hint is what makes the retry effective: it targets the specific decode or validation error instead of hoping a blind retry goes differently.
Per-item batch handling beats all-or-nothing.
Pydantic enforces the contract; without it, the caller is responsible for every type check.

Production additions this version doesn't include

Real LLM call. mock_llm_response is the placeholder. In production, swap for openai.chat.completions.create(...) with response_format={"type": "json_object"} or Anthropic's tool-use API.
Tool-use mode. Some providers offer structured-output mode directly (Anthropic's tool use, OpenAI's function calling); when available, prefer them over JSON-mode prompting.
Token cost tracking. Each attempt costs tokens; track per-call cost.
Caching. Identical postings should return cached results.
Per-tenant rate limits.
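
As one example of these additions, caching could be sketched as an in-memory wrapper keyed on a hash of the posting text. The class name CachedExtractor, the misses counter, and the choice of SHA-256 over the raw text are all assumptions for illustration; a production version would likely normalise the text first and use a shared store.

```python
import hashlib
from typing import Callable


class CachedExtractor:
    """Wrap an extractor so identical postings are only extracted once."""

    def __init__(self, extract_one: Callable[[str], dict]) -> None:
        self._extract_one = extract_one
        self._cache: dict = {}
        self.misses = 0  # counts how many times the real extractor ran

    def __call__(self, posting: str) -> dict:
        # Key on the content, so identical text yields the same key.
        key = hashlib.sha256(posting.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._extract_one(posting)
        return self._cache[key]


# Hypothetical extractor used only to exercise the cache.
cached = CachedExtractor(lambda text: {"company": text.split()[0]})
first = cached("Acme is hiring")
second = cached("Acme is hiring")   # served from the cache
third = cached("Globex is hiring")  # new text, so a cache miss
```

Only successful results are stored here, since an exception from the extractor propagates before the cache is written.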

Interview probes

"How do you handle a model that returns malformed JSON?"
"Why retry with a correction hint rather than blind retry?"
"How do you batch-process while keeping per-item success/failure?"
"What's the difference between using JSON mode and tool-use mode?"
"How would you measure extractor accuracy on a labelled set?"