
Structured Job Extractor — Analysis

What this implementation does

  • Pydantic schema with required fields (company, role, must_haves) and optional fields (salary_min, salary_max, nice_to_haves).
  • Retry loop with correction hint: if validation fails, the next prompt includes the error message so the model can self-correct.
  • JSON-decode resilience: malformed JSON also triggers a retry, with the decode error as the hint.
  • Batch extraction that isolates per-item failures rather than failing the whole batch on one bad posting.
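
A minimal sketch of the schema contract. To stay runnable with only the standard library, it uses a dataclass plus a hand-written check in place of the Pydantic model; parse_job is a hypothetical name, not taken from the original code. The explicit checks also show what the caller would have to own without Pydantic.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class JobPosting:
    # Stdlib stand-in for the Pydantic model (assumption: no third-party deps).
    company: str
    role: str
    must_haves: list[str]
    salary_min: int | None = None  # None = not disclosed; 0 = unpaid
    salary_max: int | None = None
    nice_to_haves: list[str] = field(default_factory=list)


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(x, str) for x in value)


def parse_job(raw: str) -> JobPosting:
    """Decode model output and check it against the JobPosting contract.

    Raises json.JSONDecodeError on malformed JSON and ValueError on schema
    violations; either message can be fed back to the model as a hint.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for name in ("company", "role"):
        value = data.get(name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name}: required, must be a non-empty string")
    if not _is_str_list(data.get("must_haves")):
        raise ValueError("must_haves: required, must be a list of strings")
    for name in ("salary_min", "salary_max"):
        value = data.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if value is not None and (isinstance(value, bool) or not isinstance(value, int)):
            raise ValueError(f"{name}: must be an integer or null")
    nice = data.get("nice_to_haves", [])
    if not _is_str_list(nice):
        raise ValueError("nice_to_haves: must be a list of strings")
    return JobPosting(
        company=data["company"],
        role=data["role"],
        must_haves=data["must_haves"],
        salary_min=data.get("salary_min"),
        salary_max=data.get("salary_max"),
        nice_to_haves=nice,
    )
```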

The retry pattern

attempt 1: extract → JSON decode fails or validation fails
         → next prompt includes the error message
attempt 2: extract → likely succeeds (model self-corrects given the error)
attempt N: extract → still failing → raise

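This pattern works because LLMs are generally good at fixing their own output when told exactly what went wrong. A blind retry (same prompt) often reproduces the same error, because nothing in the input has changed; a retry that includes the error message gives the model a concrete target and usually succeeds in fewer attempts.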
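
The retry loop can be sketched as follows, assuming the model is any callable that takes a prompt string and returns raw text. The names call_model, extract_with_retry and validate are hypothetical; validate is a minimal stand-in for the Pydantic check. Each failed attempt rebuilds the prompt from the original plus the latest error, so hints do not pile up.

```python
from __future__ import annotations

import json
from typing import Callable

REQUIRED = ("company", "role", "must_haves")


def validate(data: object) -> dict:
    """Hypothetical minimal check standing in for the Pydantic model."""
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in REQUIRED if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return data


def extract_with_retry(
    posting: str,
    call_model: Callable[[str], str],
    max_attempts: int = 3,
) -> dict:
    """Call the model, validate, and on failure re-prompt with the error text."""
    base_prompt = (
        "Extract company, role, must_haves, salary_min, salary_max and "
        "nice_to_haves from this job posting. Reply with a JSON object only.\n\n"
        + posting
    )
    prompt = base_prompt
    last_error: Exception | None = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            # json.JSONDecodeError subclasses ValueError, so one handler covers
            # both malformed JSON and schema violations.
            return validate(json.loads(raw))
        except ValueError as exc:
            last_error = exc
            # Rebuild from the base prompt so hints don't accumulate.
            prompt = (
                f"{base_prompt}\n\nYour previous reply was rejected: {exc}\n"
                "Return a corrected JSON object only."
            )
    raise RuntimeError(f"extraction failed after {max_attempts} attempts") from last_error
```

In a test, call_model can be a scripted fake that returns truncated JSON first and valid JSON second; the second prompt it receives then carries the decode error.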

Why missing compensation is None, not 0

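A salary of 0 is meaningful (unpaid); None means "not disclosed". Conflating them breaks downstream filtering: if undisclosed salaries are stored as 0, a filter like "show me jobs with salary > 0" (meant to drop unpaid roles) also silently drops every posting that simply didn't state a salary, which is rarely the intent. The schema enforces the distinction by typing the salary fields as int | None.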
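
A small sketch of how None keeps filtering honest (Posting and filter_by_min_salary are hypothetical names). In Python, None > 0 raises TypeError, so a filter over int | None is forced to decide explicitly what to do with undisclosed salaries instead of dropping them by accident.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Posting:
    role: str
    salary_min: int | None  # None = not disclosed, 0 = explicitly unpaid


def filter_by_min_salary(
    postings: list[Posting], floor: int, include_undisclosed: bool
) -> list[Posting]:
    """Keep postings with salary_min >= floor; the caller decides about None."""
    kept = []
    for p in postings:
        if p.salary_min is None:
            if include_undisclosed:
                kept.append(p)
        elif p.salary_min >= floor:
            kept.append(p)
    return kept


postings = [Posting("intern", 0), Posting("SRE", None), Posting("backend", 120_000)]

# "salary > 0" for integer salaries, first excluding then including undisclosed ones.
print([p.role for p in filter_by_min_salary(postings, 1, include_undisclosed=False)])  # ['backend']
print([p.role for p in filter_by_min_salary(postings, 1, include_undisclosed=True)])  # ['SRE', 'backend']
```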

Why batch isolates failures

A batch of 100 postings; 3 fail validation. Two approaches:

  • Fail-the-whole-batch. Surface the first error and lose the results for every posting that succeeded (97 of them here).
  • Per-item isolation. Return a list where each item is either a JobPosting or an Exception. Caller decides what to do per item.

Per-item isolation is almost always right for batch processing: failures can be inspected, logged, and retried separately while the successful items flow through.
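
Per-item isolation amounts to catching exceptions inside the loop rather than around it. A sketch, assuming extract_one is any single-posting extractor (extract_batch and split_results are hypothetical names):

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


def extract_batch(
    postings: list[str], extract_one: Callable[[str], T]
) -> list[T | Exception]:
    """Run extract_one on every posting; a failure becomes that item's result."""
    results: list[T | Exception] = []
    for posting in postings:
        try:
            results.append(extract_one(posting))
        except Exception as exc:  # Exception, not BaseException: Ctrl-C still stops the batch
            results.append(exc)
    return results


def split_results(
    results: list[T | Exception],
) -> tuple[list[T], list[tuple[int, Exception]]]:
    """Separate successes from failures, keeping each failure's batch index."""
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [(i, r) for i, r in enumerate(results) if isinstance(r, Exception)]
    return ok, failed
```

With int as a toy extractor, extract_batch(["1", "x", "3"], int) returns [1, ValueError(...), 3], and split_results turns that into ([1, 3], [(1, ValueError(...))]); the index lets the caller retry or log exactly the postings that failed.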

What this exercise teaches

  • Structured output is not "ask the LLM for JSON" — it's schema + validation + retry.
  • The correction hint is what makes a retry useful: it defends against the model repeating the same mistake, which a blind retry does not.
  • Per-item batch handling beats all-or-nothing.
  • Pydantic enforces the contract; without it, the caller is responsible for every type check.

Production additions this version doesn't include

  • Real LLM call. mock_llm_response is the placeholder. In production, swap it for openai.chat.completions.create(...) with response_format={"type": "json_object"}, or for Anthropic's tool-use API.
  • Tool-use mode. Some providers offer structured-output mode directly (Anthropic's tool use, OpenAI's function calling); when available, prefer them over JSON-mode prompting.
  • Token cost tracking. Each attempt costs tokens; track per-call cost.
  • Caching. Identical postings should return cached results.
  • Per-tenant rate limits.
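
For the caching item, a minimal in-memory sketch. It assumes "identical" means the same text after whitespace normalisation, and that cached results should be invalidated when the schema or prompt changes (hence schema_version in the key). ExtractionCache is a hypothetical helper, not part of the original code.

```python
from __future__ import annotations

import hashlib
from typing import Callable


class ExtractionCache:
    """Hypothetical in-memory cache in front of an extractor."""

    def __init__(self, extract: Callable[[str], dict], schema_version: str = "v1") -> None:
        self._extract = extract
        self._schema_version = schema_version
        self._store: dict[str, dict] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, posting: str) -> str:
        # Assumption: postings differing only in whitespace count as identical.
        normalised = " ".join(posting.split())
        # Versioning the key means a schema or prompt change never serves stale results.
        payload = f"{self._schema_version}\n{normalised}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get(self, posting: str) -> dict:
        key = self._key(posting)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        # If extraction raises, nothing is stored, so a failed posting is retried next time.
        result = self._extract(posting)
        self._store[key] = result
        return result
```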

Interview probes

  • "How do you handle a model that returns malformed JSON?"
  • "Why retry with a correction hint rather than blind retry?"
  • "How do you batch-process while keeping per-item success/failure?"
  • "What's the difference between using JSON mode and tool-use mode?"
  • "How would you measure extractor accuracy on a labelled set?"