05. Assignment 14: Diffusion Image Generation Pipeline
Companion files: Weekly Plan · Explainer · Study Material · Daily Recall · Revision
Week 14. Build a controlled image generation pipeline with systematic evaluation.
Goal
Build a pipeline that generates images from text prompts using Stable Diffusion, while making the internals legible to yourself.
Your output should prove you understand:
- how denoising works conceptually
- how CFG changes the result
- why latent diffusion is fast enough for real use
- how to reason about quality, diversity, and latency together
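Two of these points reduce to a few lines of arithmetic. The sketch below is illustrative only: `cfg_combine` shows the usual classifier-free guidance combination on toy lists rather than real noise tensors, and `latent_compression_ratio` assumes an 8x VAE downsampling factor and 4 latent channels, which matches Stable Diffusion 1.5 but should be checked for other models. Both function names are hypothetical.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: start from the unconditional noise estimate
    and move toward (or past) the text-conditioned one."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def latent_compression_ratio(height=512, width=512, vae_factor=8, latent_channels=4):
    """Ratio of pixel-space values to latent-space values for one image.

    vae_factor and latent_channels are assumptions that match SD 1.5.
    """
    pixel_values = height * width * 3  # RGB
    latent_values = (height // vae_factor) * (width // vae_factor) * latent_channels
    return pixel_values / latent_values


uncond = [0.0, 1.0]  # toy stand-ins for noise predictions
cond = [1.0, 3.0]
print(cfg_combine(uncond, cond, 1.0))  # [1.0, 3.0]: scale 1 reproduces the conditional estimate
print(cfg_combine(uncond, cond, 7.5))  # [7.5, 16.0]: larger scales extrapolate past it
print(latent_compression_ratio())      # 48.0: the denoiser sees 48x fewer values per image
```

The second number is the core of the speed argument: every denoising step operates on the small latent, and the VAE decoder runs once at the end.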
Choose an implementation track
Pick one realistic track and ship it.
Track A: local or Colab diffusion
- Use Hugging Face `diffusers`
- Run Stable Diffusion 1.5, SDXL, or a turbo variant
- Recommended if you want direct exposure to steps, schedulers, and guidance
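For Track A, a minimal starting point might look like the sketch below. It assumes `torch` and `diffusers` are installed, a CUDA GPU is available, and the checkpoint ID shown is the one you want; `GenConfig`, `build_call_kwargs`, and `generate` are hypothetical names, not part of the assignment. The heavy imports are deferred so the config helpers can be exercised without a GPU stack.

```python
from dataclasses import dataclass


@dataclass
class GenConfig:
    prompt: str
    model_id: str = "stable-diffusion-v1-5/stable-diffusion-v1-5"  # assumed checkpoint
    steps: int = 30
    guidance_scale: float = 7.5
    seed: int = 0
    negative_prompt: str = ""


def build_call_kwargs(cfg):
    """Keyword arguments for the pipeline call, minus the seeded generator."""
    kwargs = {
        "prompt": cfg.prompt,
        "num_inference_steps": cfg.steps,
        "guidance_scale": cfg.guidance_scale,
    }
    if cfg.negative_prompt:
        kwargs["negative_prompt"] = cfg.negative_prompt
    return kwargs


def generate(cfg, device="cuda"):
    """Run one text-to-image generation and return a PIL image."""
    import torch  # imported lazily: only needed when actually generating
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(cfg.model_id, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    # A seeded generator makes the initial noise, and so the image, repeatable.
    generator = torch.Generator(device=device).manual_seed(cfg.seed)
    return pipe(generator=generator, **build_call_kwargs(cfg)).images[0]
```

Calling `generate(GenConfig(prompt="a red bicycle")).save("out.png")` would then write one image. Turbo variants typically expect far fewer steps and little or no guidance, so check the model card before reusing these defaults.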
Track B: hosted image API with strong evaluation
- Use a hosted model if local GPU access is weak
- Still log prompt, seed, steps, guidance scale, and latency
- Recommended if your bottleneck is hardware, not understanding
Requirements
- Generation pipeline — text → image with configurable parameters
- Controlled generation — at least one control method (ControlNet, img2img, or inpainting)
- Prompt engineering analysis — systematic prompt variations, not random trial-and-error
- Evaluation — at least one automated metric and one human preference review
- Documentation — explain the pipeline clearly in your README
Deliverables
- `generate.py`: text-to-image generation with model, steps, guidance scale, seed
- `controlled.py`: controlled generation pipeline
- `prompt_analysis.py`: prompt sweep and comparison utility
- `eval.py`: CLIP score, FID if possible, or structured LLM/human judging
- `results/prompt_engineering.md`: observations and failure cases
- `README.md`: architecture, diffusion explanation, results, lessons
Experiment matrix
| Experiment | What to learn |
|---|---|
| Guidance scale (1 vs 3 vs 7 vs 15 vs 20) | Creativity vs fidelity trade-off (02_explainer.md Chapter 4) |
| Steps (4 vs 20 vs 50 vs 100) | Speed vs quality trade-off (02_explainer.md Chapter 5) |
| Negative prompts (with vs without) | Whether quality and safety improve |
| Prompt structure (short vs detailed) | What detail actually helps |
| Seed variation (same prompt, different seeds) | Diversity under fixed intent |
| ControlNet on/off | How much spatial control you gain |
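One way to keep the matrix interpretable is to vary a single parameter at a time around a fixed baseline. The sketch below does that for the numeric rows of the table; the baseline values and the seed list are assumptions chosen for illustration, and `one_factor_runs` is a hypothetical helper name.

```python
BASELINE = {"guidance_scale": 7, "steps": 20, "negative_prompt": "", "seed": 0}

SWEEPS = {
    "guidance_scale": [1, 3, 7, 15, 20],  # values from the experiment matrix
    "steps": [4, 20, 50, 100],            # values from the experiment matrix
    "seed": [0, 1, 2, 3],                 # assumed: any fixed set of seeds works
}


def one_factor_runs(baseline, sweeps):
    """Yield (experiment_name, config) pairs, changing one parameter per run."""
    for name, values in sweeps.items():
        for value in values:
            config = dict(baseline)  # copy so the baseline is never mutated
            config[name] = value
            yield name, config


runs = list(one_factor_runs(BASELINE, SWEEPS))
print(len(runs))  # 13 runs: 5 guidance + 4 steps + 4 seeds
```

A full Cartesian grid over the same values would need 80 runs, so the one-factor design is a deliberate trade: it is cheaper and easier to read, but it cannot show interactions such as high guidance behaving differently at low step counts.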
Suggested execution plan
Day 1
- Get a single prompt working end-to-end.
- Save prompt, seed, steps, guidance scale, and runtime.
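Logging can be as simple as one JSON line per run. The sketch below is one possible shape, using only the standard library; `timed_run` and the JSONL layout are assumptions, and `generate_fn` stands for whatever generation function you end up with, local or hosted.

```python
import json
import time
from pathlib import Path


def timed_run(generate_fn, params, log_path):
    """Call generate_fn(**params), time it, and append one JSON line to log_path."""
    start = time.perf_counter()
    result = generate_fn(**params)
    runtime_s = time.perf_counter() - start

    # Store the exact parameters alongside the runtime so any image can be reproduced.
    record = dict(params, runtime_s=round(runtime_s, 4))
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return result, record
```

With parameters such as `{"prompt": "a red bicycle", "seed": 1, "steps": 20, "guidance_scale": 7.0}`, each call appends a line you can later load into a table for the latency and trade-off discussion.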
Day 2
- Add the controlled generation path.
- Verify that the control signal changes composition, not just style.
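If you pick img2img as the control method, the `strength` parameter is the main lever for how much of the input composition survives. The sketch below assumes the `diffusers` img2img pipeline and a CUDA GPU; `effective_denoising_steps` mirrors how the library is understood to shorten the schedule and should be treated as an approximation, and both function names are hypothetical.

```python
def effective_denoising_steps(num_inference_steps, strength):
    """Approximate number of denoising steps img2img actually runs.

    Low strength adds little noise to the input, so few steps run and the
    original layout is mostly kept; strength 1.0 behaves like text-to-image.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return min(int(num_inference_steps * strength), num_inference_steps)


def img2img(prompt, init_image, strength=0.5, steps=30, guidance_scale=7.5, seed=0,
            model_id="stable-diffusion-v1-5/stable-diffusion-v1-5", device="cuda"):
    """Run one img2img generation; init_image is a PIL image. Model ID is an assumption."""
    import torch  # imported lazily: only needed when actually generating
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    generator = torch.Generator(device=device).manual_seed(seed)
    return pipe(prompt=prompt, image=init_image, strength=strength,
                num_inference_steps=steps, guidance_scale=guidance_scale,
                generator=generator).images[0]


print(effective_denoising_steps(50, 0.5))  # 25
```

Sweeping `strength` with a fixed seed and prompt is a quick way to see where the output stops following the input layout and starts following only the text.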
Day 3
- Run the experiment matrix.
- Save outputs in a reproducible structure.
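A reproducible structure mostly means that the file path can be rebuilt from the run parameters alone. The following is a minimal sketch of one such layout; the directory scheme and the `output_path` helper are assumptions, not a required format.

```python
import hashlib
from pathlib import Path


def output_path(root, experiment, prompt, params):
    """Build a deterministic image path from the experiment name and parameters."""
    # A short hash keeps long prompts out of file names while staying stable.
    prompt_id = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:8]
    # Sorting the keys makes the name independent of dict ordering.
    tag = "_".join(f"{key}-{params[key]}" for key in sorted(params))
    return Path(root) / experiment / prompt_id / f"{tag}.png"


path = output_path("results", "guidance_scale", "a red bicycle", {"steps": 20, "seed": 1})
print(path.name)  # seed-1_steps-20.png
```

Because the prompt is hashed, keep a separate mapping from `prompt_id` to prompt text (for example in the run log) so results stay readable.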
Day 4
- Evaluate outputs.
- Write the README so a reviewer can follow your reasoning.
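The two evaluation pieces can be prototyped before any model is wired in. The sketch below assumes you already have image and text embeddings from a CLIP model as plain lists of floats; the 2.5 weight follows a common CLIPScore convention and is an assumption you may prefer to drop. The preference tally assumes a blind pairwise review where each vote is `"A"`, `"B"`, or `"tie"`. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def clip_style_score(image_emb, text_emb, weight=2.5):
    """Rescaled cosine similarity, clipped at zero, between image and text embeddings."""
    return weight * max(cosine(image_emb, text_emb), 0.0)


def preference_win_rate(votes):
    """Fraction of decided votes won by variant A; ties are excluded."""
    decided = [v for v in votes if v in ("A", "B")]
    if not decided:
        return None  # no signal if every vote was a tie
    return sum(v == "A" for v in decided) / len(decided)


print(clip_style_score([1.0, 0.0], [1.0, 0.0]))   # 2.5 for identical directions
print(preference_win_rate(["A", "A", "B", "tie"]))  # 2 of 3 decided votes went to A
```

Reporting both numbers side by side is useful because they can disagree: an image can match the prompt text closely and still lose the human comparison on aesthetics or artifacts.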
Evaluation rubric
Score yourself honestly.
| Dimension | Questions |
|---|---|
| Correctness | Does the pipeline actually generate coherent outputs? |
| Control | Does ControlNet/img2img meaningfully affect layout or structure? |
| Analysis | Did you learn something specific from prompt/step/guidance sweeps? |
| Explanation | Can a reader understand the system from your README alone? |
| Production thinking | Did you record latency, memory limits, and trade-offs? |
Success criteria
- Pipeline generates coherent images from text prompts
- Controlled generation clearly changes spatial outcomes
- Prompt analysis produces specific, actionable conclusions
- README explains the forward process, reverse process, latent space, and CFG in your own words, written from memory
- You can answer the retrieval prompts in 02_explainer.md after shipping
Stretch goals
- Add a small web UI
- Compare two schedulers
- Compare one fast model and one slower high-quality model
- Add a safety or moderation layer
- Benchmark latency across resolutions
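For the latency benchmark, a small timing harness is usually enough. The sketch below takes any callable `fn(width, height)` and reports the median wall-clock time per resolution; `benchmark` is a hypothetical helper, and the stub in the usage lines stands in for a real generation call.

```python
import statistics
import time


def benchmark(fn, resolutions, repeats=3):
    """Return median wall-clock seconds per (width, height) for fn(width, height)."""
    results = {}
    for width, height in resolutions:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn(width, height)
            timings.append(time.perf_counter() - start)
        # The median is less sensitive than the mean to one slow outlier run.
        results[(width, height)] = statistics.median(timings)
    return results


def fake_generate(width, height):
    """Stub standing in for a real pipeline call."""
    return width * height


timings = benchmark(fake_generate, [(512, 512), (768, 768)], repeats=3)
print(sorted(timings))  # [(512, 512), (768, 768)]
```

With a real model, one untimed warm-up call before benchmarking is usually worthwhile, since the first run often includes weight loading and kernel setup.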
Why this matters
Image generation is no longer a novelty feature.
It appears in creative tooling, marketing automation, design workflows, editing, synthetic data, and multimodal products.
A strong AI engineer does not just produce pretty outputs. A strong AI engineer can explain why those outputs happened, how fast they can be served, and what trade-offs were chosen.