
05. Assignment 14 — Diffusion Image Generation Pipeline

Companion files: Weekly Plan · Explainer · Study Material · Daily Recall · Revision

Week 14. Build a controlled image generation pipeline with systematic evaluation.

Goal

Build a pipeline that generates images from text prompts using Stable Diffusion (or a hosted image model on Track B), while making the internals legible to yourself.

Your output should prove you understand:

  • how denoising works conceptually
  • how classifier-free guidance (CFG) changes the result
  • why latent diffusion is fast enough for real use
  • how to reason about quality, diversity, and latency together
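A minimal numpy sketch of the first two ideas, assuming the standard DDPM noise-prediction formulation. The forward process blends a clean sample with Gaussian noise according to a cumulative schedule alpha_bar_t, and CFG blends an unconditional and a prompt-conditioned noise prediction. The linear beta schedule (1e-4 to 0.02 over 1000 steps) is the original DDPM choice and is used here only for illustration; Stable Diffusion checkpoints ship their own schedule in the scheduler config, and the arrays are random stand-ins for real latents and UNet outputs.

```python
import numpy as np


def linear_alpha_bars(num_steps: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (illustrative DDPM choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0: np.ndarray, noise: np.ndarray, alpha_bar_t: float) -> np.ndarray:
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: start from the unconditional noise prediction and push it
    along the direction the prompt adds, scaled by guidance_scale."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bars = linear_alpha_bars()
    x0 = rng.standard_normal((4, 8, 8))  # stand-in for a clean latent
    noise = rng.standard_normal(x0.shape)
    for t in (0, 250, 500, 999):
        x_t = add_noise(x0, noise, alpha_bars[t])
        # Correlation with x0 falls as t grows: the reverse process must recover more signal.
        corr = np.corrcoef(x0.ravel(), x_t.ravel())[0, 1]
        print(f"t={t:4d}  signal weight={np.sqrt(alpha_bars[t]):.3f}  corr(x0, x_t)={corr:.3f}")

    eps_u, eps_c = rng.standard_normal((2, 4, 8, 8))  # stand-ins for two UNet outputs
    for w in (1.0, 7.0, 15.0):
        guided = cfg_combine(eps_u, eps_c, w)
        # Distance from the unconditional prediction grows linearly with the scale.
        print(f"scale={w:>4}: |guided - uncond| = {np.linalg.norm(guided - eps_u):.2f}")
```

At guidance_scale 1 the combination reduces to the conditional prediction alone, and at 0 to the unconditional one; larger values extrapolate past the conditional prediction, which is why high scales tend to trade diversity for prompt adherence. To our understanding, diffusers' Stable Diffusion pipelines skip the unconditional pass entirely when guidance_scale is 1 or below, which also removes roughly half of the UNet work per step.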

Choose an implementation track

Pick one realistic track and ship it.

Track A — local or Colab diffusion

  • Use Hugging Face diffusers
  • Run Stable Diffusion 1.5, SDXL, or a turbo variant
  • Recommended if you want direct exposure to steps, schedulers, and guidance

Track B — hosted image API with strong evaluation

  • Use a hosted model if local GPU access is weak
  • Still log prompt, seed, steps, guidance scale, and latency
  • Recommended if your bottleneck is hardware, not understanding
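Whichever track you pick, the parameters named above need to land somewhere queryable. A minimal sketch, assuming one JSON object per line (JSONL) is an acceptable log format; `RunRecord`, `timed_call`, `append_record`, and `fake_generate` are hypothetical names, and on a hosted API you would record whichever of these parameters the provider actually exposes.

```python
import json
import tempfile
import time
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class RunRecord:
    prompt: str
    seed: int
    steps: int
    guidance_scale: float
    latency_s: float
    model: str = "unknown"
    negative_prompt: Optional[str] = None


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn and return (result, wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def append_record(path: Path, record: RunRecord) -> None:
    """Append one run as a JSON line, so a crashed sweep keeps everything logged so far."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")


def load_records(path: Path) -> List[RunRecord]:
    with path.open(encoding="utf-8") as f:
        return [RunRecord(**json.loads(line)) for line in f if line.strip()]


if __name__ == "__main__":
    def fake_generate(prompt: str, seed: int) -> str:
        # Hypothetical placeholder for a hosted-API or diffusers call.
        time.sleep(0.01)
        return f"image-for-{prompt}-{seed}"

    log_path = Path(tempfile.mkdtemp()) / "runs.jsonl"
    _, latency = timed_call(fake_generate, "a lighthouse at dusk", seed=7)
    append_record(log_path, RunRecord("a lighthouse at dusk", 7, 30, 7.5, latency))
    print(load_records(log_path))
```

Appending rather than rewriting one big JSON file means a sweep that dies halfway still leaves a usable log.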

Requirements

  1. Generation pipeline — text → image with configurable parameters
  2. Controlled generation — at least one control method (ControlNet, img2img, or inpainting)
  3. Prompt engineering analysis — systematic prompt variations, not random trial-and-error
  4. Evaluation — at least one automated metric and one human preference review
  5. Documentation — explain the pipeline clearly in your README
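For the img2img option in requirement 2, the main knob is `strength`: the input image is noised part of the way along the schedule and only the remaining steps are denoised, so low strength preserves layout and high strength mostly ignores it. A sketch assuming the diffusers `StableDiffusionImg2ImgPipeline`; `effective_img2img_steps` reproduces, to our understanding, the step truncation that pipeline applies (worth checking against your installed version), and `run_img2img` with its default `model_id` is an illustrative assumption.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Denoising steps img2img actually runs for a given strength.

    Mirrors (as we understand it) diffusers' img2img schedule truncation:
    strength=1.0 runs the full schedule from pure noise, strength=0.0 runs none.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


def run_img2img(init_image, prompt: str, strength: float = 0.6, steps: int = 30,
                guidance_scale: float = 7.5, seed: int = 0,
                model_id: str = "stable-diffusion-v1-5/stable-diffusion-v1-5"):
    """Hypothetical controlled.py entry point; model_id is an assumption, swap in your checkpoint."""
    import torch  # lazy imports so the helper above works without a GPU stack
    from diffusers import StableDiffusionImg2ImgPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=dtype).to(device)
    generator = torch.Generator(device=device).manual_seed(seed)
    return pipe(prompt=prompt, image=init_image, strength=strength,
                num_inference_steps=steps, guidance_scale=guidance_scale,
                generator=generator).images[0]


if __name__ == "__main__":
    for s in (0.2, 0.5, 0.8, 1.0):
        print(f"strength={s}: {effective_img2img_steps(50, s)} of 50 steps")
```

This is also why a 50-step img2img run at strength 0.3 costs roughly as much denoising as a 15-step text-to-image run.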

Deliverables

  1. generate.py — text-to-image generation with model, steps, guidance scale, seed
  2. controlled.py — controlled generation pipeline
  3. prompt_analysis.py — prompt sweep and comparison utility
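  4. eval.py — at least one automated metric (CLIP score; FID only if you have enough images for it to be meaningful) plus structured human preference judging, optionally LLM-assisted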
  5. results/prompt_engineering.md — observations and failure cases
  6. README.md — architecture, diffusion explanation, results, lessons
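For prompt_analysis.py, "systematic" can be as simple as leave-one-out ablation: render the full prompt, the bare subject, and the prompt minus each modifier, all with the same seeds. A minimal sketch; the comma-joined prompt format and the `prompt_ablations` name are assumptions, and duplicate modifiers would overwrite each other's keys.

```python
from typing import Dict, Sequence


def prompt_ablations(subject: str, modifiers: Sequence[str]) -> Dict[str, str]:
    """Full prompt, bare subject, and one leave-one-out variant per modifier.

    Rendering every variant with the same seeds isolates what each modifier
    contributes, instead of changing several words at once.
    """
    mods = list(modifiers)
    variants = {"full": ", ".join([subject, *mods]), "subject_only": subject}
    for i, m in enumerate(mods):
        rest = mods[:i] + mods[i + 1:]
        variants[f"minus:{m}"] = ", ".join([subject, *rest])
    return variants


if __name__ == "__main__":
    for name, prompt in prompt_ablations(
        "a lighthouse at dusk", ["oil painting", "dramatic lighting", "wide angle"]
    ).items():
        print(f"{name:>24}: {prompt}")
```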

Experiment matrix

| Experiment | What to learn |
|---|---|
| Guidance scale (1 vs 3 vs 7 vs 15 vs 20) | Creativity vs fidelity trade-off (02_explainer.md Chapter 4) |
| Steps (4 vs 20 vs 50 vs 100) | Speed vs quality trade-off (02_explainer.md Chapter 5) |
| Negative prompts (with vs without) | Whether quality and safety improve |
| Prompt structure (short vs detailed) | What detail actually helps |
| Seed variation (same prompt, different seeds) | Diversity under fixed intent |
| ControlNet on/off | How much spatial control you gain |
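The matrix is easiest to run as one-factor-at-a-time sweeps: hold a base configuration fixed, vary a single row's parameter, and repeat each value across the same seeds so seed noise is not mistaken for an effect. A sketch with hypothetical names (`BASE`, `AXES`, `one_factor_sweeps`) and an example negative prompt; the ControlNet and prompt-structure rows need their own pipeline or prompt list and are left out.

```python
from itertools import product
from typing import Any, Dict, Iterable, List, Sequence

# Hypothetical base config; axis values mirror the experiment matrix above.
BASE: Dict[str, Any] = {
    "prompt": "a lighthouse at dusk, oil painting",
    "negative_prompt": None,
    "steps": 30,
    "guidance_scale": 7.5,
}
AXES: Dict[str, Sequence[Any]] = {
    "guidance_scale": [1, 3, 7, 15, 20],
    "steps": [4, 20, 50, 100],
    "negative_prompt": [None, "blurry, low quality, extra fingers"],  # example only
}
SEEDS = (0, 1, 2)


def one_factor_sweeps(base: Dict[str, Any], axes: Dict[str, Sequence[Any]],
                      seeds: Iterable[int]) -> List[Dict[str, Any]]:
    """Vary one axis at a time around `base`, repeating every value across the same seeds.

    Holding everything else fixed lets a difference between images be attributed
    to the varied axis rather than to seed noise or another parameter.
    """
    seeds = list(seeds)  # materialize so it can be reused for every axis
    runs = []
    for name, values in axes.items():
        for value, seed in product(values, seeds):
            runs.append({**base, name: value, "seed": seed, "varied": name})
    return runs


if __name__ == "__main__":
    runs = one_factor_sweeps(BASE, AXES, SEEDS)
    print(len(runs), "runs; first:", runs[0])
```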

Suggested execution plan

Day 1

  • Get a single prompt working end-to-end.
  • Save prompt, seed, steps, guidance scale, and runtime.
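A minimal generate.py for Track A, assuming the Hugging Face diffusers `StableDiffusionPipeline`. The default checkpoint id and the 30-step / 7.5 defaults are starting-point assumptions, not recommendations. The seed goes into a `torch.Generator`, so the same arguments should reproduce the same image on the same hardware and library versions.

```python
import argparse
import time
from typing import List, Optional

# Assumption: replace with the checkpoint you actually use (SD 1.5, SDXL, or a turbo variant).
DEFAULT_MODEL = "stable-diffusion-v1-5/stable-diffusion-v1-5"


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Text-to-image with logged parameters")
    p.add_argument("--prompt", required=True)
    p.add_argument("--negative-prompt", default=None)
    p.add_argument("--model", default=DEFAULT_MODEL)
    p.add_argument("--steps", type=int, default=30)
    p.add_argument("--guidance-scale", type=float, default=7.5)
    p.add_argument("--seed", type=int, default=0)
    p.add_argument("--out", default="out.png")
    return p


def generate(args: argparse.Namespace) -> float:
    """Generate one image, save it, and return seconds spent in the pipeline call."""
    import torch  # lazy imports: the parser can be used and tested without a GPU stack
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(args.model, torch_dtype=dtype).to(device)
    generator = torch.Generator(device=device).manual_seed(args.seed)

    start = time.perf_counter()
    image = pipe(
        args.prompt,
        negative_prompt=args.negative_prompt,
        num_inference_steps=args.steps,
        guidance_scale=args.guidance_scale,
        generator=generator,
    ).images[0]
    latency = time.perf_counter() - start  # text encoding + denoising + VAE decode, not model loading
    image.save(args.out)
    return latency


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    latency = generate(args)
    print(f"seed={args.seed} steps={args.steps} cfg={args.guidance_scale} "
          f"latency={latency:.2f}s -> {args.out}")
```

To use it as a script, end the file with `if __name__ == "__main__": main()` and run, for example, `python generate.py --prompt "a lighthouse at dusk" --seed 7`.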

Day 2

  • Add the controlled generation path.
  • Verify that the control signal changes composition, not just style.
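One way to check "composition, not just style" is to compare edge maps: if the control signal works, the output's edges should land near the control image's edges, and noticeably more often than in an uncontrolled output with the same prompt and seed. A numpy-only sketch; the gradient-threshold edge detector, the 0.1 threshold, and the 2-pixel tolerance are assumptions (for real photos, a detector such as OpenCV's `cv2.Canny` is more robust), and both images must be grayscale, scaled to [0, 1], and the same resolution.

```python
import numpy as np


def edge_map(gray: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Binary edges from gradient magnitude; `gray` is a 2-D array scaled to [0, 1]."""
    gy, gx = np.gradient(gray.astype(float))
    return np.hypot(gx, gy) > threshold


def dilate(mask: np.ndarray, radius: int = 1) -> np.ndarray:
    """Binary dilation with a square window, so edges a pixel or two off still count."""
    h, w = mask.shape
    padded = np.pad(mask, radius, constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def edge_recall(control_edges: np.ndarray, output_edges: np.ndarray, radius: int = 2) -> float:
    """Fraction of control-edge pixels that have an output edge within `radius` pixels."""
    total = int(control_edges.sum())
    if total == 0:
        return 0.0
    hits = control_edges & dilate(output_edges, radius)
    return float(hits.sum() / total)
```

Report `edge_recall` for the controlled output next to the uncontrolled baseline: a large gap is evidence of spatial control, while similar values suggest the control mostly changed style.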

Day 3

  • Run the experiment matrix.
  • Save outputs in a reproducible structure.
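A reproducible structure mostly means every image can be traced back to the exact parameters that produced it. A minimal sketch, assuming a hypothetical `results/<experiment>/<run_id>/` layout where `run_id` is a hash of the run's config: rerunning the same config lands in the same folder, and any parameter change gets a new one. Include the model id and library versions in the config if you want the hash to change when they do.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Union


def run_id(config: Dict[str, Any]) -> str:
    """Short, stable id: SHA-256 of the config serialized with sorted keys."""
    blob = json.dumps(config, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


def save_run(root: Union[str, Path], experiment: str, config: Dict[str, Any]) -> Path:
    """Create <root>/<experiment>/<run_id>/ and write config.json; save the image next to it."""
    run_dir = Path(root) / experiment / run_id(config)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(
        json.dumps(config, sort_keys=True, indent=2, default=str), encoding="utf-8"
    )
    return run_dir


if __name__ == "__main__":
    cfg = {"prompt": "a lighthouse at dusk", "seed": 7, "steps": 30, "guidance_scale": 7.5}
    print(save_run(tempfile.mkdtemp(), "guidance_scale", cfg))
```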

Day 4

  • Evaluate outputs.
  • Write the README so a reviewer can follow your reasoning.
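Both halves of the evaluation requirement reduce to small, auditable functions once the heavy lifting (embedding images and prompts) happens elsewhere. A sketch assuming you already have CLIP image and text embeddings, for example from a CLIP model's image and text encoders: `clip_score` follows the common 100 × clipped-cosine convention (as in torchmetrics' `CLIPScore`), `seed_diversity` is a hypothetical mean pairwise cosine distance for the seed-variation row, and `win_rates` tallies blind pairwise human judgments, counting ties as half a win.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

import numpy as np


def _unit(x) -> np.ndarray:
    """L2-normalize along the last axis."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """100 * max(cos(image, text), 0): higher means the image matches the prompt better."""
    cos = float(np.dot(_unit(image_emb), _unit(text_emb)))
    return max(100.0 * cos, 0.0)


def seed_diversity(image_embs: Sequence[Sequence[float]]) -> float:
    """Mean pairwise cosine distance between images of one prompt across seeds."""
    e = _unit(image_embs)
    n = e.shape[0]
    if n < 2:
        return 0.0
    sims = e @ e.T
    upper = np.triu_indices(n, k=1)  # each unordered pair once, no self-pairs
    return float(np.mean(1.0 - sims[upper]))


def win_rates(judgments: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Tally blind pairwise reviews given as (variant_a, variant_b, winner); winner may be 'tie'."""
    wins: Dict[str, float] = defaultdict(float)
    games: Dict[str, int] = defaultdict(int)
    for a, b, winner in judgments:
        if winner not in (a, b, "tie"):
            raise ValueError(f"winner {winner!r} is not one of {a!r}, {b!r}, 'tie'")
        games[a] += 1
        games[b] += 1
        if winner == "tie":
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            wins[winner] += 1.0
    return {name: wins[name] / games[name] for name in games}
```

CLIP score only measures prompt-image agreement, so report it alongside the preference tally rather than treating either as the final word.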

Evaluation rubric

Score yourself honestly.

| Dimension | Questions |
|---|---|
| Correctness | Does the pipeline actually generate coherent outputs? |
| Control | Does ControlNet/img2img meaningfully affect layout or structure? |
| Analysis | Did you learn something specific from prompt/step/guidance sweeps? |
| Explanation | Can a reader understand the system from your README alone? |
| Production thinking | Did you record latency, memory limits, and trade-offs? |

Success criteria

  • Pipeline generates coherent images from text prompts
  • Controlled generation clearly changes spatial outcomes
  • Prompt analysis produces specific, actionable conclusions
  • README explains forward process, reverse process, latent space, and CFG from memory
  • You can answer the retrieval prompts in 02_explainer.md after shipping
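For the latent-space part of the README, the speed argument is largely arithmetic: the UNet denoises a compressed latent instead of RGB pixels. A sketch assuming the 8× spatial downsampling and 4 latent channels of the Stable Diffusion 1.x and SDXL VAEs (other models may use different values). At each resolution below the denoiser handles 48× fewer values per step, and because self-attention cost grows quadratically with the number of spatial positions, the 64× reduction in positions matters even more there.

```python
from typing import Tuple


def latent_shape(height: int, width: int, channels: int = 4, factor: int = 8) -> Tuple[int, int, int]:
    """(C, H, W) of the VAE latent; factor=8 and channels=4 match the SD 1.x / SDXL VAEs."""
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return channels, height // factor, width // factor


def values_per_step(height: int, width: int) -> Tuple[int, int]:
    """Values a denoiser would process per step in pixel space (RGB) vs latent space."""
    c, h, w = latent_shape(height, width)
    return height * width * 3, c * h * w


if __name__ == "__main__":
    for size in (512, 768, 1024):
        pixels, latents = values_per_step(size, size)
        print(f"{size}x{size}: {pixels:,} pixel values vs {latents:,} latent values "
              f"({pixels / latents:.0f}x fewer)")
```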

Stretch goals

  • Add a small web UI
  • Compare two schedulers
  • Compare one fast model and one slower high-quality model
  • Add a safety or moderation layer
  • Benchmark latency across resolutions
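For the resolution benchmark, a small harness is enough. It is a sketch: `fn` is whatever callable renders one image at the given size (with diffusers, something like `lambda h, w: pipe(prompt, height=h, width=w, num_inference_steps=20)`), and the warm-up and repeat counts are arbitrary assumptions. If `fn` can return before GPU work finishes (for example when requesting latents instead of decoded images), call `torch.cuda.synchronize()` inside it so the timer measures the real work; `torch.cuda.max_memory_allocated()` is a natural companion for the memory side.

```python
import statistics
import time
from typing import Any, Callable, Dict, Iterable, Tuple

Resolution = Tuple[int, int]


def benchmark(fn: Callable[[int, int], Any], resolutions: Iterable[Resolution],
              warmup: int = 1, repeats: int = 5) -> Dict[Resolution, Dict[str, float]]:
    """Time fn(height, width) per resolution after warm-up runs; report median and worst case.

    Warm-up runs absorb one-off costs (device initialization, allocator and
    autotuning effects) that would otherwise inflate the first measurement.
    """
    results: Dict[Resolution, Dict[str, float]] = {}
    for h, w in resolutions:
        for _ in range(warmup):
            fn(h, w)
        times = []
        for _ in range(repeats):
            start = time.perf_counter()
            fn(h, w)
            times.append(time.perf_counter() - start)
        results[(h, w)] = {"median_s": statistics.median(times), "max_s": max(times)}
    return results


if __name__ == "__main__":
    def fake_generate(height: int, width: int) -> None:
        # Stand-in workload that grows with resolution; replace with a real pipeline call.
        sum(i * i for i in range(height * width // 64))

    for res, stats in benchmark(fake_generate, [(256, 256), (512, 512)]).items():
        print(res, {k: round(v, 4) for k, v in stats.items()})
```

Reporting the median alongside the worst case keeps one slow outlier from hiding, or dominating, the typical latency.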

Why this matters

Image generation is no longer a novelty feature.

It appears in creative tooling, marketing automation, design workflows, editing, synthetic data, and multimodal products.

A strong AI engineer does not just produce pretty outputs. A strong AI engineer can explain why those outputs happened, how fast they can be served, and what trade-offs were chosen.