05. Assignment 14: Diffusion Image Generation Pipeline

Companion files: Weekly Plan · Explainer · Study Material · Daily Recall · Revision

Week 14. Build a controlled image generation pipeline with systematic evaluation.

Goal

Build a pipeline that generates images from text prompts using Stable Diffusion, while making the internals legible to yourself.

Your output should prove you understand:
- how denoising works conceptually
- how CFG changes the result
- why latent diffusion is fast enough for real use
- how to reason about quality, diversity, and latency together
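Two of the points above reduce to short arithmetic. The sketch below is an illustration, not any library's implementation: it shows the standard classifier-free guidance (CFG) combination of the unconditional and conditional noise predictions, and the size ratio between a 512x512 RGB image and the 64x64x4 latent that Stable Diffusion 1.5 denoises. The function names are hypothetical, and the toy vectors stand in for real noise predictions.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction and
    move toward (or past) the conditional one by `guidance_scale`."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def latent_compression_ratio(height=512, width=512, downsample=8, latent_channels=4):
    """Ratio of pixel-space values (RGB) to latent-space values.

    Defaults assume the Stable Diffusion 1.5 VAE: 8x spatial downsampling
    and 4 latent channels.
    """
    pixel_values = height * width * 3
    latent_values = (height // downsample) * (width // downsample) * latent_channels
    return pixel_values / latent_values


# Toy noise predictions for a 3-element "latent".
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]

print(cfg_combine(eps_uncond, eps_cond, 1.0))  # [1.0, 1.0, 0.0]: scale 1 is the conditional prediction
print(cfg_combine(eps_uncond, eps_cond, 7.0))  # [7.0, 1.0, 6.0]: larger scales extrapolate past it
print(latent_compression_ratio())              # 48.0
```

The denoiser therefore works on roughly 48 times fewer values per step than a pixel-space model at the same resolution, which is a large part of why latent diffusion is practical.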

Choose an implementation track

Pick one realistic track and ship it.

Track A: local or Colab diffusion

Use Hugging Face diffusers
Run Stable Diffusion 1.5, SDXL, or a turbo variant
Recommended if you want direct exposure to steps, schedulers, and guidance
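For Track A, a generate.py could start from something like the sketch below. It assumes the diffusers `AutoPipelineForText2Image` class and PyTorch are installed; the flag names and defaults are illustrative choices, not requirements of the assignment. The model is a required argument, so pass whichever Hub ID or local path you have access to. Heavy imports are deferred into `generate` so the command-line interface can be inspected on a machine without a GPU stack.

```python
import argparse
import time


def build_parser():
    """Command-line flags for the parameters the assignment asks you to expose."""
    parser = argparse.ArgumentParser(description="Text-to-image generation (sketch)")
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--model", required=True, help="Hub model ID or local path")
    parser.add_argument("--negative-prompt", default=None)
    parser.add_argument("--steps", type=int, default=30)
    parser.add_argument("--guidance-scale", type=float, default=7.5)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--out", default="out.png")
    return parser


def generate(args):
    """Run one generation and return the wall-clock latency in seconds."""
    # Deferred imports: only needed when an image is actually generated.
    import torch
    from diffusers import AutoPipelineForText2Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = AutoPipelineForText2Image.from_pretrained(args.model, torch_dtype=dtype).to(device)

    # A seeded generator makes the initial noise reproducible.
    generator = torch.Generator(device=device).manual_seed(args.seed)

    start = time.perf_counter()
    image = pipe(
        prompt=args.prompt,
        negative_prompt=args.negative_prompt,
        num_inference_steps=args.steps,
        guidance_scale=args.guidance_scale,
        generator=generator,
    ).images[0]
    latency_s = time.perf_counter() - start

    image.save(args.out)
    return latency_s
```

A script entry point would call `generate(build_parser().parse_args())`. Turbo variants are typically run with very few steps and little or no guidance, so check the model card before reusing these defaults.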

Track B: hosted image API with strong evaluation

Use a hosted model if local GPU access is weak
Still log prompt, seed, steps, guidance scale, and latency
Recommended if your bottleneck is hardware, not understanding
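Whichever track you pick, the logging requirement can be met with a few lines of standard-library Python. The sketch below is one possible shape: `RunRecord`, `timed_call`, and `append_record` are hypothetical names, and the JSON Lines file format is an assumption. Fields are optional because a hosted API may not expose a seed or step count; record `None` rather than guessing.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RunRecord:
    prompt: str
    seed: Optional[int]              # None if the API does not expose it
    steps: Optional[int]
    guidance_scale: Optional[float]
    latency_s: float


def timed_call(fn, *args, **kwargs):
    """Call fn and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def append_record(path, record):
    """Append one run as a single JSON line so the log is easy to diff and load."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```

In use, wrap your generation function (local pipeline or API client call) in `timed_call`, build a `RunRecord` from the parameters you sent, and append it to a file such as `results/runs.jsonl`.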

Requirements

Generation pipeline — text → image with configurable parameters
Controlled generation — at least one control method (ControlNet, img2img, or inpainting)
Prompt engineering analysis — systematic prompt variations, not random trial-and-error
Evaluation — at least one automated metric and one human preference review
Documentation — explain the pipeline clearly in your README

Deliverables

generate.py — text-to-image generation with model, steps, guidance scale, seed
controlled.py — controlled generation pipeline
prompt_analysis.py — prompt sweep and comparison utility
eval.py — CLIP score, FID if possible, or structured LLM/human judging
results/prompt_engineering.md — observations and failure cases
README.md — architecture, diffusion explanation, results, lessons
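For eval.py, a CLIP-based score is the cosine similarity between CLIP's image embedding and its text embedding for the prompt. The sketch below assumes the Hugging Face transformers `CLIPModel` and `CLIPProcessor` classes with the `openai/clip-vit-base-patch32` checkpoint; the function names are hypothetical. Some published CLIP scores rescale or clip this value, so only compare numbers computed the same way.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def clip_score(image_path, prompt, model_id="openai/clip-vit-base-patch32"):
    """Image-text similarity for one generated image and its prompt."""
    # Deferred imports: only needed when scoring real images.
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_id)
    processor = CLIPProcessor.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    # Truncation guards against prompts longer than CLIP's text context.
    inputs = processor(
        text=[prompt], images=image, return_tensors="pt", padding=True, truncation=True
    )
    with torch.no_grad():
        outputs = model(**inputs)

    return cosine_similarity(outputs.image_embeds[0].tolist(), outputs.text_embeds[0].tolist())
```

When scoring many images, load the model and processor once outside the loop; they are loaded per call here only to keep the sketch short.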

Experiment matrix

Experiment	What to learn
Guidance scale (1 vs 3 vs 7 vs 15 vs 20)	Creativity vs fidelity trade-off (02_explainer.md Chapter 4)
Steps (4 vs 20 vs 50 vs 100)	Speed vs quality trade-off (02_explainer.md Chapter 5)
Negative prompts (with vs without)	Whether quality and safety improve
Prompt structure (short vs detailed)	What detail actually helps
Seed variation (same prompt, different seeds)	Diversity under fixed intent
ControlNet on/off	How much spatial control you gain
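One way to turn the matrix into runs is to vary one factor at a time around a fixed baseline, so any change in output can be attributed to a single parameter. In the sketch below the guidance and step values come from the table; the baseline values and the four seeds are assumptions, and `one_factor_runs` is a hypothetical helper name.

```python
# Baseline held fixed while one factor varies (values are assumptions).
BASELINE = {"guidance_scale": 7, "steps": 20, "negative_prompt": "", "seed": 0}

# Sweep values for the numeric rows of the experiment matrix.
SWEEPS = {
    "guidance_scale": [1, 3, 7, 15, 20],
    "steps": [4, 20, 50, 100],
    "seed": [0, 1, 2, 3],
}


def one_factor_runs(baseline, sweeps):
    """Return one config per swept value, changing a single key at a time."""
    runs = []
    for name, values in sweeps.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            runs.append({"experiment": name, **config})
    return runs


runs = one_factor_runs(BASELINE, SWEEPS)
print(len(runs))  # 13 configs: 5 guidance + 4 steps + 4 seeds
print(runs[0])
```

The negative-prompt, prompt-structure, and ControlNet rows are on/off or text comparisons and can be added as extra entries in the same structure.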

Suggested execution plan

Day 1

Get a single prompt working end-to-end.
Save prompt, seed, steps, guidance scale, and runtime.

Day 2

Add the controlled generation path.
Verify that the control signal changes composition, not just style.
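A rough, heuristic way to check composition rather than style is to compare binary structure maps: one taken from the control image and one extracted from each output (for example, edge maps from the same detector). The sketch below assumes you already have those maps as equally sized grids of 0/1 values; `mask_iou` is a hypothetical helper, and the 3x3 grids are toy data. A controlled output should overlap the control map noticeably more than an uncontrolled output from the same prompt and seed.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary grids (lists of rows of 0/1)."""
    if len(mask_a) != len(mask_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as identical.
    return intersection / union if union else 1.0


# Toy example: the control map is a vertical line.
control = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
with_control = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]     # output follows the control
without_control = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]  # output ignores it

print(mask_iou(control, with_control))     # 1.0
print(mask_iou(control, without_control))  # 0.2
```

Treat the number as supporting evidence alongside a visual comparison, not as a pass/fail test.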

Day 3

Run the experiment matrix.
Save outputs in a reproducible structure.
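A reproducible structure can be as simple as a file path derived entirely from the run parameters, so the same configuration always maps to the same location. The layout below (experiment folder, short prompt hash, parameter-encoded file name) is one assumed convention, and `run_path` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def run_path(root, experiment, prompt, seed, steps, guidance_scale):
    """Deterministic output path: <root>/<experiment>/<prompt hash>/<params>.png"""
    # A short hash keeps long prompts out of file names while staying stable.
    prompt_id = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:8]
    name = f"seed{seed}_steps{steps}_cfg{guidance_scale}.png"
    return Path(root) / experiment / prompt_id / name


path = run_path("results", "steps", "a red bicycle", seed=0, steps=20, guidance_scale=7.5)
print(path.name)  # seed0_steps20_cfg7.5.png
```

Keep a small index file that maps each prompt hash back to the full prompt text, since the hash alone is not human-readable.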

Day 4

Evaluate outputs.
Write the README so a reviewer can follow your reasoning.
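For the human preference review, pairwise comparisons are easy to collect and summarize. The sketch below assumes each vote is a tuple of two option labels and the winner (or `None` for a tie); the labels and the `win_rates` helper are hypothetical, and the votes are made-up example data.

```python
from collections import Counter


def win_rates(votes):
    """Fraction of comparisons each option won.

    votes: iterable of (option_a, option_b, winner), where winner is
    option_a, option_b, or None for a tie.
    """
    wins = Counter()
    appearances = Counter()
    for option_a, option_b, winner in votes:
        if winner not in (option_a, option_b, None):
            raise ValueError(f"winner {winner!r} is not one of the compared options")
        appearances[option_a] += 1
        appearances[option_b] += 1
        if winner is not None:
            wins[winner] += 1
    return {option: wins[option] / appearances[option] for option in appearances}


# Made-up example votes comparing three guidance settings.
votes = [
    ("cfg3", "cfg7", "cfg7"),
    ("cfg3", "cfg7", "cfg7"),
    ("cfg7", "cfg15", "cfg7"),
    ("cfg3", "cfg15", None),
]
print(win_rates(votes))  # {'cfg3': 0.0, 'cfg7': 1.0, 'cfg15': 0.0}
```

Show reviewers the image pairs in random order and without parameter labels, so the vote reflects the image and not the setting.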

Evaluation rubric

Score yourself honestly.

Dimension	Questions
Correctness	Does the pipeline actually generate coherent outputs?
Control	Does ControlNet/img2img meaningfully affect layout or structure?
Analysis	Did you learn something specific from prompt/step/guidance sweeps?
Explanation	Can a reader understand the system from your README alone?
Production thinking	Did you record latency, memory limits, and trade-offs?
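For the production-thinking row, a single latency number is less useful than a small summary over repeated runs. The sketch below drops warm-up runs (the first call often includes model loading or compilation) and reports the median and a nearest-rank 95th percentile. `latency_summary` is a hypothetical helper, and the sample values are made up.

```python
import math
import statistics


def latency_summary(latencies_s, warmup=1):
    """Drop warm-up runs, then report count, median, and nearest-rank p95."""
    samples = sorted(latencies_s[warmup:])
    if not samples:
        raise ValueError("need at least one sample after warm-up")
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(samples)) - 1
    return {
        "n": len(samples),
        "median_s": statistics.median(samples),
        "p95_s": samples[p95_index],
    }


# Made-up timings: the first run is slow because of warm-up.
print(latency_summary([9.0, 2.0, 2.2, 1.9, 2.1, 2.0]))
# {'n': 5, 'median_s': 2.0, 'p95_s': 2.2}
```

On a CUDA device, PyTorch's `torch.cuda.max_memory_allocated()` gives the peak allocated bytes and can be logged next to these numbers.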

Success criteria

Pipeline generates coherent images from text prompts
Controlled generation clearly changes spatial outcomes
Prompt analysis produces specific, actionable conclusions
README explains the forward process, reverse process, latent space, and CFG in your own words, written from memory
You can answer the retrieval prompts in 02_explainer.md after shipping

Stretch goals

Add a small web UI
Compare two schedulers
Compare one fast model and one slower high-quality model
Add a safety or moderation layer
Benchmark latency across resolutions

Why this matters

Image generation is no longer a novelty feature.

It appears in creative tooling, marketing automation, design workflows, editing, synthetic data, and multimodal products.

A strong AI engineer does not just produce pretty outputs. A strong AI engineer can explain why those outputs happened, how fast they can be served, and what trade-offs were chosen.