Training Log¶

Failure → fix note¶

Failure pattern: early generations sounded repetitive and overly rehearsed.

Likely cause: the tiny corpus had repeated phrasing and the initial tuning plan was too aggressive for such a narrow dataset. In the module’s language, this looked like the beginning of narrow specialization pressure with catastrophic forgetting risk.

Fix: keep exact-duplicate removal in the data pipeline, use a conservative learning rate (5e-5), preserve a held-out split, and compare base vs tuned generations at the same temperatures.

Evidence to watch:

held-out perplexity improves instead of only train loss
tuned generations pick up domain vocabulary without losing basic coherence
base vs tuned comparison stays fair because decoding settings match

Notes to capture after your run¶

Base perplexity:
Tuned perplexity:
Perplexity delta:
Best sample improvement:
Worst regression:
Next change to try: