10. Image and figure handling — what to do with the pictures

~9 min read. Sometimes the figure is the answer. Throwing it away is a silent failure.


[Stub — to be written]

Outline:

  • The three options: ignore, caption, embed-as-image
  • Caption generation with a vision-LLM
  • Storing image bytes alongside text and serving them at retrieval time
  • Multimodal embeddings (CLIP, SigLIP) — when image-as-vector pays off
  • Chart and graph extraction — read the values, not just describe the picture
  • Diagrams and architecture drawings — usually not worth full structured extraction
  • Figure-text grounding: linking a caption to its figure
  • The "ignore for v1, revisit for v2" decision — when this is the right call
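
As a rough illustration of the first three outline items, the sketch below shows one way the ignore / caption / embed-as-image choice could be made explicit at indexing time, with image bytes stored next to the text and handed back at retrieval. Everything named here (FigurePolicy, FigureStore, Chunk, index_figure, resolve) is hypothetical and invented for this example. The caption is assumed to arrive as a plain string from some upstream step such as a vision-LLM, and the in-memory dict stands in for whatever blob store a real pipeline would use. The embed-as-image branch only keeps the bytes and links them to the chunk; computing a multimodal vector is deliberately left out.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class FigurePolicy(Enum):
    """The three options from the outline (names are illustrative)."""
    IGNORE = "ignore"
    CAPTION = "caption"
    EMBED_AS_IMAGE = "embed_as_image"


@dataclass
class Chunk:
    """A retrievable text chunk, optionally linked to a stored figure."""
    text: str
    figure_id: Optional[str] = None


class FigureStore:
    """Hypothetical in-memory blob store keyed by a content hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, image_bytes: bytes) -> str:
        # Content-addressed id: identical images are stored only once.
        figure_id = hashlib.sha256(image_bytes).hexdigest()[:16]
        self._blobs[figure_id] = image_bytes
        return figure_id

    def get(self, figure_id: str) -> Optional[bytes]:
        return self._blobs.get(figure_id)


def index_figure(image_bytes: bytes, caption: str,
                 policy: FigurePolicy, store: FigureStore) -> List[Chunk]:
    """Turn one figure into zero or one chunks, depending on the policy."""
    if policy is FigurePolicy.IGNORE:
        # The figure is dropped entirely; nothing reaches the index.
        return []
    if policy is FigurePolicy.CAPTION:
        # Only the caption text is indexed; the image itself is discarded.
        return [Chunk(text=caption)]
    # EMBED_AS_IMAGE: keep the bytes and link them to the caption chunk.
    # A real pipeline might also compute an image embedding at this point.
    figure_id = store.put(image_bytes)
    return [Chunk(text=caption, figure_id=figure_id)]


def resolve(chunk: Chunk, store: FigureStore) -> Tuple[str, Optional[bytes]]:
    """At retrieval time, return the chunk text plus its image, if any."""
    if chunk.figure_id is None:
        return chunk.text, None
    return chunk.text, store.get(chunk.figure_id)
```

Keeping the policy as an explicit value means an "ignore for v1" decision is recorded in the pipeline and can be revisited later, and the figure_id link is a simple form of figure-text grounding: the caption chunk always knows which stored image it describes.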