
04. Week 14 — Daily Recall

Companion files: Weekly Plan · Explainer · Study Material · Assignment · Revision

How to use: Answer each question out loud, from memory, before checking 02_explainer.md or 03_study_material.md. Aim for 1-2 sentences per answer. Terse is fine; wrong is not.

Monday

  1. [W13] CLIP — what does it learn and how?
  2. [W13] ViT — how does it apply transformers to images?
  3. Diffusion forward process — what happens in T steps? What is x_T?

Tuesday

  1. Write the closed-form expression for x_t given x_0. Name every variable.
  2. Stable Diffusion architecture — text encoder, U-Net, VAE. Role of each component.
  3. Latent diffusion — why work in latent space instead of pixel space? What is the speedup?
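Self-check for Tuesday question 1, to open only after answering from memory. This is a minimal sketch of sampling x_t directly from x_0 in one step, assuming a DDPM-style linear beta schedule. The schedule endpoints (1e-4 and 0.02), the helper names (`linear_betas`, `alpha_bars`, `q_sample`), and the toy three-element "image" are illustrative assumptions, not taken from the course material.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule beta_1..beta_T (endpoint values are assumed)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def alpha_bars(betas):
    """Cumulative products: abar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t from q(x_t | x_0) in one shot; t is 1-indexed.

    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I).
    """
    a = abar[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]

abar = alpha_bars(linear_betas(1000))
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]              # toy stand-in for an image
x_early = q_sample(x0, 1, abar, rng)     # still very close to x0
x_late = q_sample(x0, 1000, abar, rng)   # almost pure Gaussian noise
```

Under this schedule abar_t shrinks toward zero as t approaches T, which is why x_T is treated as approximately standard Gaussian noise.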

Wednesday

  1. Classifier-free guidance — what is trained differently? What is computed at inference?
  2. Guidance scale w: what happens at w = 1, w = 7, and w = 20?
  3. [W13] GAN generator vs discriminator — what does each do?
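Self-check for Wednesday questions 1 and 2, to open only after answering from memory. This is a minimal sketch of the inference-time combination used in classifier-free guidance, assuming the convention where w = 1 recovers the plain conditional prediction (some papers instead parameterize the scale so that w = 0 means no guidance). The function name `cfg_combine` and the toy noise vectors are hypothetical.

```python
def cfg_combine(eps_uncond, eps_cond, w):
    """Guided noise estimate: eps_uncond + w * (eps_cond - eps_uncond).

    eps_uncond: model output with the prompt dropped (empty/null prompt).
    eps_cond:   model output with the text prompt.
    w:          guidance scale (assumed convention: w = 1 means no extra guidance).
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 1.0]   # toy values, not real model outputs
eps_cond = [1.0, 0.5]

no_guidance = cfg_combine(eps_uncond, eps_cond, 1.0)   # equals eps_cond
typical = cfg_combine(eps_uncond, eps_cond, 7.0)       # pushed past eps_cond
```

Larger w extrapolates further along the direction from the unconditional to the conditional prediction, which tends to trade sample diversity for prompt adherence.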

Thursday

  1. Text-to-image eval — how do you measure quality? (FID, CLIP score, human eval — limitations of each)
  2. Diffusion vs GAN — trade-offs (quality, diversity, training stability, inference speed).
  3. [W11] RAG vs fine-tuning vs prompting — decision framework.

Friday

  1. DiT vs U-Net — architectural difference. Which scales better and why?
  2. Explain three inference speed-up techniques: DDIM, LCM, consistency models. One sentence each.
  3. Image generation safety — what production guardrails are needed? Name three.

After completing the week: run the four Retrieval Prompts in 02_explainer.md as a final self-test before starting 06_revision.md.