10. Image and figure handling — what to do with the pictures

~9 min read. Sometimes the figure is the answer. Throwing it away is a silent failure.

[Stub — to be written]

Outline:

The three options: ignore, caption, embed-as-image
Caption generation with a vision-LLM
Storing image bytes alongside text and serving them at retrieval time
Multimodal embeddings (CLIP, SigLIP) — when image-as-vector pays off
Chart and graph extraction — read the values, not just describe the picture
Diagrams and architecture drawings — usually not worth full structured extraction
Figure-text grounding: linking a caption to its figure
The "ignore for v1, revisit for v2" decision — when this is the right call
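
As a rough illustration of how the three options in the outline might be represented at ingestion time, here is a minimal sketch. Everything in it is an assumption made for illustration: the names `FigurePolicy`, `FigureChunk`, and `handle_figure` are hypothetical, the caption is taken as already available (extracted from the document or produced by a captioning step that is not shown), and "embed-as-image" is read here as "keep the image bytes so they can be indexed or served at retrieval time".

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FigurePolicy(Enum):
    """The three options from the outline (hypothetical names)."""
    IGNORE = "ignore"
    CAPTION = "caption"
    EMBED_AS_IMAGE = "embed_as_image"


@dataclass
class FigureChunk:
    figure_id: str                # content hash; links a caption back to its figure
    policy: FigurePolicy
    caption: Optional[str]        # text to index; None if no caption is available
    image_bytes: Optional[bytes]  # kept only when the image itself is retained


def handle_figure(
    image_bytes: bytes,
    caption: Optional[str],
    policy: FigurePolicy,
) -> Optional[FigureChunk]:
    """Apply one policy to one figure; returns None when the figure is dropped."""
    if policy is FigurePolicy.IGNORE:
        # Assumption: dropping means nothing is indexed for this figure.
        return None
    # A stable ID derived from the bytes, so caption text and image stay linked.
    figure_id = hashlib.sha256(image_bytes).hexdigest()[:16]
    keep_bytes = policy is FigurePolicy.EMBED_AS_IMAGE
    return FigureChunk(
        figure_id=figure_id,
        policy=policy,
        caption=caption,
        image_bytes=image_bytes if keep_bytes else None,
    )
```

Because the ID is derived from the image bytes, a caption-only chunk and an image-bearing chunk for the same figure share a `figure_id`, which is one simple way to keep a caption grounded to its figure. A real pipeline would likely also record the page number and bounding box.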