Skip to content

13. Evaluating ingestion quality — how to know your pipeline is working

~13 min read. The eval most RAG teams skip, and then wonder why retrieval is "broken".


[Stub — to be written]

Outline:

  • The four failure types: text loss, structural loss, hallucination, ordering error
  • A golden-document set: 20-50 hand-curated docs with hand-extracted expected text
  • Character-level recall: did we get the words back
  • Structural recall: did we keep the heading hierarchy
  • Numeric fidelity: did the digits in tables survive
  • Reading-order tests: are paragraphs in the right sequence
  • Diffing extracted output against ground truth
  • Continuous evaluation: pin a few customer docs as canaries
  • The "RAG eval will hide ingestion bugs" trap — why upstream eval matters separately
  • Cross-reference to module 24 (evals in production)