10. Image and figure handling — what to do with the pictures
~9 min read. Sometimes the figure is the answer. Throwing it away is a silent failure.
[Stub — to be written]
Outline:
- The three options: ignore, caption, embed-as-image
- Caption generation with a vision-LLM
- Storing image bytes alongside text and serving them at retrieval time
- Multimodal embeddings (CLIP, SigLIP) — when image-as-vector pays off
- Chart and graph extraction — read the values, not just describe the picture
- Diagrams and architecture drawings — usually not worth full structured extraction
- Figure-text grounding: linking a caption to its figure
- The "ignore for v1, revisit for v2" decision — when this is the right call
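The caption and store-the-bytes options can be combined in one record. Below is a minimal sketch of that record. It assumes the caption and any vision-LLM description were already produced upstream. The record's text is what gets embedded and searched. The image bytes go into a side store keyed by SHA-256, and they are handed back at retrieval time. `FigureChunk`, `FigureStore`, `add_figure` and `serve` are hypothetical names. The in-memory dicts stand in for whatever blob store and chunk table a real pipeline uses, and the embedding and search step is not shown.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FigureChunk:
    """One retrievable figure (hypothetical schema)."""
    chunk_id: str
    text: str          # caption plus optional generated description: the part that is embedded
    image_sha256: str  # key into the blob store
    mime_type: str


@dataclass
class FigureStore:
    """In-memory stand-in for a blob store plus a chunk table."""
    blobs: dict[str, bytes] = field(default_factory=dict)         # sha256 hex -> image bytes
    chunks: dict[str, FigureChunk] = field(default_factory=dict)  # chunk_id -> record

    def add_figure(self, chunk_id, caption, description, image_bytes, mime_type="image/png"):
        """Store one figure; mime_type defaults to PNG purely as an assumption."""
        digest = hashlib.sha256(image_bytes).hexdigest()
        # Identical images (a logo, a reused chart) are stored once.
        self.blobs.setdefault(digest, image_bytes)
        # The description is optional; with none, only the caption is embedded.
        text = f"{caption}\n{description}" if description else caption
        chunk = FigureChunk(chunk_id, text, digest, mime_type)
        self.chunks[chunk_id] = chunk
        return chunk

    def serve(self, chunk_id):
        """Return (text for the prompt, image bytes, MIME type) for a retrieved chunk."""
        chunk = self.chunks[chunk_id]
        return chunk.text, self.blobs[chunk.image_sha256], chunk.mime_type
```

Keying blobs by content hash keeps duplicates out of storage. When a figure chunk is retrieved, the text half can go into the prompt. The bytes from `serve` can go to a UI or to a vision model.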
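Figure-text grounding can often be done with a layout heuristic rather than a model call. The sketch below assumes an upstream layout parser has already produced blocks labelled `figure` or `text`. Each block has a page number and a bounding box in top-left-origin coordinates. `Block`, `link_captions` and the helper functions are hypothetical names. A text block counts as a caption if it starts with "Figure N" or "Fig. N". It is then linked to a figure on the same page that overlaps it horizontally. The figure directly above the caption is preferred, and otherwise the closest one by vertical gap.

```python
import re
from dataclasses import dataclass

# Hypothetical caption cue: "Figure 3", "Fig. 3", "FIG 3" at the start of a block.
CAPTION_RE = re.compile(r"^\s*fig(?:ure)?\.?\s*\d+", re.IGNORECASE)


@dataclass
class Block:
    block_id: str
    kind: str  # "figure" or "text" (labels assumed to come from a layout parser)
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward
    text: str = ""


def horizontal_overlap(a, b):
    """Width of the shared x-range of two boxes (0 if they do not overlap)."""
    return max(0.0, min(a[2], b[2]) - max(a[0], b[0]))


def vertical_gap(fig, cap):
    """Vertical distance between a figure box and a caption box (0 if they overlap)."""
    if cap[1] >= fig[3]:  # caption starts below the figure's bottom edge
        return cap[1] - fig[3]
    if fig[1] >= cap[3]:  # caption ends above the figure's top edge
        return fig[1] - cap[3]
    return 0.0


def link_captions(blocks):
    """Return {caption block_id: figure block_id}, matching greedily in caption order."""
    figures = [b for b in blocks if b.kind == "figure"]
    captions = [b for b in blocks if b.kind == "text" and CAPTION_RE.match(b.text)]
    links, used = {}, set()
    for cap in captions:
        candidates = [
            f for f in figures
            if f.page == cap.page
            and f.block_id not in used
            and horizontal_overlap(f.bbox, cap.bbox) > 0
        ]
        if not candidates:
            continue  # leave the caption unlinked rather than guess
        # Sort key: figures fully above the caption (False) before the rest (True),
        # then by vertical gap.
        best = min(
            candidates,
            key=lambda f: (cap.bbox[1] < f.bbox[3], vertical_gap(f.bbox, cap.bbox)),
        )
        links[cap.block_id] = best.block_id
        used.add(best.block_id)
    return links
```

The sketch never links a caption to a figure on another page. It leaves such a caption unlinked, on the assumption that a missing link is easier to notice downstream than a wrong one. The regex is a weak signal on its own. A body paragraph that opens with "Figure 2 shows..." would also match, so a sturdier version would likely add cues such as font size or distance from the figure.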