Skip to content

Training Log

Failure → fix note

Failure pattern: early generations sounded repetitive and overly rehearsed.

Likely cause: the tiny corpus had repeated phrasing and the initial tuning plan was too aggressive for such a narrow dataset. In the module’s language, this looked like the beginning of narrow specialization pressure with catastrophic forgetting risk.

Fix: keep exact-duplicate removal in the data pipeline, use a conservative learning rate (5e-5), preserve a held-out split, and compare base vs tuned generations at the same temperatures.

Evidence to watch:

  • held-out perplexity improves instead of only train loss
  • tuned generations pick up domain vocabulary without losing basic coherence
  • base vs tuned comparison stays fair because decoding settings match

Notes to capture after your run

  • Base perplexity:
  • Tuned perplexity:
  • Perplexity delta:
  • Best sample improvement:
  • Worst regression:
  • Next change to try: