11. Metadata preservation — the boring part that wins production

~10 min read. The difference between "I found an answer" and "I found an answer and can cite it."

[Stub — to be written]

Outline:

Source attribution: document name, page, section, paragraph
The page number paradox — PDF pages, logical pages, printed pages
Heading hierarchy: preserving H1/H2/H3 so chunks know their context
Authorship and document date — for recency-weighted retrieval and trust signals
Version metadata — same document, different revisions
Access-control metadata — which users can see which chunks (cross-ref module 27)
Language tags for multi-lingual corpora
Quality scores from ingestion — "this chunk was OCR'd at 78% confidence"
The metadata schema as a contract between ingestion and retrieval
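
To make the "schema as a contract" idea concrete, here is a minimal sketch of one record that ingestion could write and retrieval could read. It covers the fields named in the outline above. Every name in it (`ChunkMetadata`, `format_citation`, `is_visible_to`, and all field names) is hypothetical and chosen for illustration; the deny-by-default access rule and the 0-to-1 confidence scale are assumptions, not requirements stated by this module.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ChunkMetadata:
    """Hypothetical per-chunk record shared by ingestion and retrieval."""

    # Source attribution
    document_name: str
    pdf_page_index: int                       # 0-based position in the file
    printed_page_label: Optional[str] = None  # label printed on the page, e.g. "iv"
    section: Optional[str] = None
    paragraph_index: Optional[int] = None
    # Heading hierarchy, outermost first (H1, H2, H3)
    heading_path: tuple[str, ...] = ()
    # Authorship, date, and revision
    author: Optional[str] = None
    document_date: Optional[date] = None
    version: Optional[str] = None
    # Access control: groups allowed to see this chunk (assumed deny-by-default)
    allowed_groups: frozenset[str] = frozenset()
    # Language tag, e.g. "en" or "de"
    language: str = "en"
    # Ingestion quality: OCR confidence in [0, 1], None if the chunk was not OCR'd
    ocr_confidence: Optional[float] = None

    def __post_init__(self) -> None:
        # Enforce the contract at write time so retrieval can trust the values.
        if self.pdf_page_index < 0:
            raise ValueError("pdf_page_index must be >= 0")
        if self.ocr_confidence is not None and not 0.0 <= self.ocr_confidence <= 1.0:
            raise ValueError("ocr_confidence must be within [0, 1]")


def format_citation(meta: ChunkMetadata) -> str:
    """Build a human-readable citation, preferring the printed page label."""
    parts = [meta.document_name]
    if meta.version:
        parts.append(f"rev. {meta.version}")
    if meta.heading_path:
        parts.append(" > ".join(meta.heading_path))
    if meta.printed_page_label:
        parts.append(f"p. {meta.printed_page_label}")
    else:
        # Fall back to the 1-based position in the PDF file.
        parts.append(f"PDF page {meta.pdf_page_index + 1}")
    return ", ".join(parts)


def is_visible_to(meta: ChunkMetadata, user_groups: set[str]) -> bool:
    """Return True only if the user shares at least one group with the chunk."""
    return bool(meta.allowed_groups & frozenset(user_groups))


# Example record, using the 78% OCR confidence figure from the outline.
example = ChunkMetadata(
    document_name="handbook.pdf",
    pdf_page_index=13,
    printed_page_label="iv",
    heading_path=("Policies", "Leave"),
    version="3",
    allowed_groups=frozenset({"hr"}),
    ocr_confidence=0.78,
)
```

Storing both `pdf_page_index` and `printed_page_label` is one way to sidestep the page number paradox: the index is always available, while the label is what a reader will actually look for. Validating in `__post_init__` keeps malformed records out of the index, so retrieval does not need defensive checks of its own.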