Structured Job Extractor: Analysis
What this implementation does
- Pydantic schema with required fields (`company`, `role`, `must_haves`) and optional fields (`salary_min`, `salary_max`, `nice_to_haves`).
- Retry loop with a correction hint: if validation fails, the next prompt includes the error message so the model can self-correct.
- JSON-decode resilience: malformed JSON also triggers retry with a hint.
- Batch extraction that isolates per-item failures rather than failing the whole batch on one bad posting.
The retry pattern
attempt 1: extract → JSON decode fails or validation fails
→ next prompt includes the error message
attempt 2: extract → likely succeeds (model self-corrects given the error)
attempt N: extract → still failing → raise
This pattern works because LLMs can often fix their own output when told exactly what went wrong. A blind retry with the same prompt tends to reproduce the same error, especially at low temperature. A retry that includes the error message usually succeeds in fewer attempts.
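The loop can be sketched as follows. The sketch assumes a `call_llm(prompt) -> str` callable, which is hypothetical; `mock_llm_response` or a real client would sit behind it. It also assumes a `validate` callable that raises `ValueError` on bad data. The prompt wording, `require_fields`, and the scripted `fake_llm` are illustrative stand-ins.

```python
import json
from typing import Any, Callable

BASE_PROMPT = "Extract the job posting below as a JSON object.\n\n{posting}"


def extract_with_retry(
    call_llm: Callable[[str], str],
    posting: str,
    validate: Callable[[Any], Any],
    max_attempts: int = 3,
) -> Any:
    """Parse and validate the model's JSON; on failure, retry with the error."""
    prompt = BASE_PROMPT.format(posting=posting)
    last_error: Exception | None = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            # json.JSONDecodeError subclasses ValueError, so malformed JSON
            # and validation errors are handled by the same branch.
            return validate(json.loads(raw))
        except ValueError as exc:
            last_error = exc
            # Correction hint: the next prompt carries the concrete error.
            prompt = (
                BASE_PROMPT.format(posting=posting)
                + f"\n\nYour previous answer was rejected.\nAnswer: {raw}\n"
                + f"Error: {exc}\nReturn corrected JSON only."
            )
    raise RuntimeError(f"extraction failed after {max_attempts} attempts") from last_error


# Hypothetical stand-in validator: checks the required keys only.
def require_fields(data: Any) -> dict:
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [k for k in ("company", "role", "must_haves") if k not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


# Scripted fake model: truncated JSON first, valid JSON on the retry.
replies = iter([
    '{"company": "Acme", "role":',
    '{"company": "Acme", "role": "SRE", "must_haves": []}',
])
seen_prompts: list[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return next(replies)


result = extract_with_retry(fake_llm, "Acme is hiring an SRE.", require_fields)
```

With a Pydantic v2 schema, `JobPosting.model_validate` could be passed as `validate` directly, because Pydantic v2's `ValidationError` is a `ValueError` subclass. The `ValueError` catch is deliberately narrow. Other failures, such as network errors, propagate instead of being retried with a correction hint.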
Why missing compensation is None, not 0
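Salary 0 is meaningful (unpaid); salary None means "not disclosed". Conflating them breaks downstream filtering. If undisclosed pay is stored as 0, the filter "show me jobs with salary > 0" silently drops every posting that simply omitted pay, which is rarely the intent. Unpaid roles also become indistinguishable from undisclosed ones. The schema enforces the distinction by typing the salary fields as `int | None`.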
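A small illustration of how the two representations filter differently. The postings are hypothetical, and they use plain dicts rather than `JobPosting` objects to keep the example short.

```python
# Hypothetical postings; salary_min follows the int | None convention.
postings = [
    {"role": "Intern", "salary_min": 0},        # explicitly unpaid
    {"role": "SRE", "salary_min": None},        # not disclosed
    {"role": "Backend", "salary_min": 90_000},
]

# "Exclude unpaid roles": undisclosed postings are kept.
not_unpaid = [p["role"] for p in postings if p["salary_min"] != 0]

# "Disclosed pay above a threshold": the None check must be explicit,
# because None > 50_000 raises TypeError in Python 3.
disclosed_above = [
    p["role"]
    for p in postings
    if p["salary_min"] is not None and p["salary_min"] > 50_000
]

# If "not disclosed" is collapsed to 0, the SRE posting is dropped and
# can no longer be told apart from the unpaid internship.
collapsed = [p["role"] for p in postings if (p["salary_min"] or 0) > 0]

print(not_unpaid)       # ['SRE', 'Backend']
print(disclosed_above)  # ['Backend']
print(collapsed)        # ['Backend']
```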
Why the batch isolates failures
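Suppose a batch of 100 postings in which 3 fail validation. There are two approaches:
- Fail-the-whole-batch. Raise on the first error. Every result extracted so far is discarded and the remaining postings are never processed, even though 97 of the 100 would have succeeded.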
- Per-item isolation. Return a list where each item is either a `JobPosting` or an `Exception`; the caller decides what to do with each item.
Per-item isolation is almost always the right choice for batch processing. Failures can be inspected, logged, and retried separately while the successful items flow through.
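A minimal sketch of per-item isolation. `extract_batch` and `toy_extract` are hypothetical names, and `toy_extract` stands in for the real (retrying) extractor.

```python
from typing import Callable, TypeVar, Union

T = TypeVar("T")


def extract_batch(
    postings: list[str], extract: Callable[[str], T]
) -> list[Union[T, Exception]]:
    """Run extract() on every posting; a failure becomes that item's result."""
    results: list[Union[T, Exception]] = []
    for posting in postings:
        try:
            results.append(extract(posting))
        except Exception as exc:  # deliberately broad: isolate every failure
            results.append(exc)
    return results


# Hypothetical stand-in for the real extractor.
def toy_extract(text: str) -> dict:
    if "role:" not in text:
        raise ValueError(f"no role found in {text!r}")
    return {"role": text.split("role:", 1)[1].strip()}


results = extract_batch(["role: SRE", "lorem ipsum", "role: Backend"], toy_extract)

# Output order matches input order, so failures can be traced back by index.
succeeded = [r for r in results if not isinstance(r, Exception)]
failed = [i for i, r in enumerate(results) if isinstance(r, Exception)]
print(succeeded)  # [{'role': 'SRE'}, {'role': 'Backend'}]
print(failed)     # [1]
```

Keeping the results index-aligned with the inputs is the key design choice here. The caller can log `postings[i]` for each failed index, or resubmit only those.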
What this exercise teaches
- Structured output is not "ask the LLM for JSON" — it's schema + validation + retry.
- The retry's correction hint is the structural defence against recoverable output errors (malformed JSON, schema violations); a blind retry only helps when the failure was random.
- Per-item batch handling beats all-or-nothing.
- Pydantic enforces the contract; without it, the caller is responsible for every type check.
Production additions this version doesn't include
- Real LLM call. `mock_llm_response` is the placeholder. In production, swap it for `openai.chat.completions.create(...)` with `response_format={"type": "json_object"}`, or for Anthropic's tool-use API.
- Tool-use mode. Some providers offer a structured-output mode directly (Anthropic's tool use, OpenAI's function calling); when available, prefer it over JSON-mode prompting.
- Token cost tracking. Each attempt costs tokens, so retries multiply the cost of a posting; track cost per call.
- Caching. Identical postings should return cached results.
- Per-tenant rate limits.
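A minimal sketch of the caching idea. It assumes "identical" means identical after whitespace normalization. `cache_key`, `CachedExtractor`, and the `prompt_version` parameter are hypothetical names. A production cache would more likely be bounded or external (for example Redis) rather than an in-process dict.

```python
import hashlib
from typing import Any, Callable


def cache_key(posting: str, prompt_version: str) -> str:
    # Whitespace is collapsed so trivially reformatted copies share a key;
    # the (hypothetical) prompt version is included so that changing the
    # prompt invalidates old entries.
    normalized = " ".join(posting.split())
    payload = f"{prompt_version}\n{normalized}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class CachedExtractor:
    """In-process, unbounded cache in front of an extractor (sketch only)."""

    def __init__(self, extract: Callable[[str], Any], prompt_version: str = "v1"):
        self._extract = extract
        self._prompt_version = prompt_version
        self._cache: dict[str, Any] = {}
        self.misses = 0  # uncached extractions, i.e. calls that cost tokens

    def __call__(self, posting: str) -> Any:
        key = cache_key(posting, self._prompt_version)
        if key not in self._cache:
            self.misses += 1
            # If extraction raises, nothing is stored, so a failure is
            # retried on the next call instead of being cached.
            self._cache[key] = self._extract(posting)
        return self._cache[key]


extractor = CachedExtractor(lambda text: {"chars": len(text)})
extractor("Acme is hiring  an SRE.")
extractor("Acme is hiring an SRE.")  # same after normalization: cache hit
print(extractor.misses)  # 1
```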
Interview probes
- "How do you handle a model that returns malformed JSON?"
- "Why retry with a correction hint rather than blind retry?"
- "How do you batch-process while keeping per-item success/failure?"
- "What's the difference between using JSON mode and tool-use mode?"
- "How would you measure extractor accuracy on a labelled set?"