Training Log
Failure → fix note
Failure pattern: early generations sounded repetitive and overly rehearsed.
Likely cause: the tiny corpus contained repeated phrasing, and the initial tuning plan was too aggressive for such a narrow dataset. In the module’s terms, this is the onset of narrow specialization pressure, which carries a risk of catastrophic forgetting.
Fix: keep exact-duplicate removal in the data pipeline, use a conservative learning rate (5e-5), preserve a held-out split, and compare base vs tuned generations at the same temperatures.
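The data-side half of this fix can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the module's actual pipeline: the helper names `dedupe_exact` and `split_held_out`, the 25% held-out fraction, and the sample corpus are all hypothetical. Only the 5e-5 learning rate comes from the note above.

```python
import random

LEARNING_RATE = 5e-5  # the conservative rate from the fix note


def dedupe_exact(lines):
    """Drop empty lines and exact duplicates, keeping first occurrences in order.

    Lines are compared after stripping surrounding whitespace.
    """
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def split_held_out(examples, held_out_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and reserve a fraction for evaluation only."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_held_out = max(1, int(len(shuffled) * held_out_fraction))
    return shuffled[n_held_out:], shuffled[:n_held_out]


# Hypothetical toy corpus, used only to show the call order.
corpus = [
    "reset the router",
    "reset the router",
    "check the cable",
    "restart the modem",
    "update the firmware",
]
unique = dedupe_exact(corpus)
train, held_out = split_held_out(unique, held_out_fraction=0.25)
print(len(unique), len(train), len(held_out))
```

Deduplicating before splitting matters here: if the split came first, a repeated phrase could land in both the training and held-out sets and make held-out perplexity look better than it is.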
Evidence to watch:
- held-out perplexity improves (drops), not only train loss
- tuned generations pick up domain vocabulary without losing basic coherence
- base vs tuned comparison stays fair because decoding settings match
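For the first check, it helps to be explicit about how perplexity and its delta are computed. The sketch below assumes you already have per-token negative log-likelihoods (in nats) for the same held-out text from both models. The function names are hypothetical and the sample values are made up for illustration, not results from any run.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def perplexity_delta(base_nlls, tuned_nlls):
    """Tuned minus base perplexity; a negative value means the tuned model improved."""
    return perplexity(tuned_nlls) - perplexity(base_nlls)


# Illustrative values only: both lists must score the SAME held-out tokens.
base_nlls = [3.2, 2.9, 3.5, 3.1]
tuned_nlls = [2.7, 2.6, 3.0, 2.8]

print(f"base perplexity:  {perplexity(base_nlls):.2f}")
print(f"tuned perplexity: {perplexity(tuned_nlls):.2f}")
print(f"delta:            {perplexity_delta(base_nlls, tuned_nlls):.2f}")
```

Because perplexity is the exponential of the mean loss, a falling train loss with a flat or rising held-out perplexity is the signature of memorizing the narrow corpus rather than learning from it.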
Notes to capture after your run
- Base perplexity:
- Tuned perplexity:
- Perplexity delta:
- Best sample improvement:
- Worst regression:
- Next change to try: