Assignment 5: GPT-2 Domain Fine-Tune
This folder implements the Week 5 hands-on lab described in ../05_hands_on_lab.md.
What is included
- train.py: fine-tuning script using Hugging Face Trainer
- eval.py: held-out perplexity evaluation for base vs tuned checkpoints
- generate.py: before/after sample generation at two temperatures
- config.yaml: model, data, and training hyperparameters
- TRAINING_LOG.md: one failure → fix note in the module's language
- data/domain_corpus.txt: a narrow corpus of AI-platform runbook notes
- data/prompts.json: three prompts for qualitative comparison
Dataset
The local corpus is intentionally narrow:
- AI platform runbooks
- support-assistant operating guidelines
- prompt, eval, and incident-response notes
That gives the model a visible vocabulary target: latency, citations, prompt templates, catastrophic forgetting, escalation, held-out evals.
Training setup
The default config targets gpt2 for the real hands-on lab run.
For local smoke tests on CPU, override the model with sshleifer/tiny-gpt2.
Key defaults:
- learning rate: 5e-5
- batch size: 2
- gradient accumulation: 4
- effective batch size: 8
- sequence length: 128
- epochs: 2
- warmup ratio: 0.1
This is a conservative setup on purpose. The module explainer warns that tiny corpora plus high LR can trigger catastrophic forgetting quickly.
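The arithmetic behind these defaults can be sketched in plain Python. This is a minimal illustration, not the code in train.py: the helper names, the example count of 400 training sequences, and the linear warmup-then-decay shape are assumptions (linear is the Hugging Face Trainer default scheduler, but config.yaml may set something else), and the real Trainer may round step counts slightly differently.

```python
import math


def training_schedule(num_examples, batch_size=2, grad_accum=4,
                      epochs=2, warmup_ratio=0.1):
    """Return (effective_batch, total_optimizer_steps, warmup_steps).

    num_examples is a hypothetical count of 128-token training sequences.
    """
    effective_batch = batch_size * grad_accum  # 2 * 4 = 8
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return effective_batch, total_steps, warmup_steps


def linear_lr(step, total_steps, warmup_steps, peak_lr=5e-5):
    """Learning rate at an optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    eff, total, warm = training_schedule(num_examples=400)
    print(f"effective batch {eff}, {total} optimizer steps, {warm} warmup steps")
    for step in (0, warm // 2, warm, total // 2, total):
        print(step, linear_lr(step, total, warm))
```

With a corpus this small the total step count is low, so the warmup covers only a handful of updates. That is one reason the peak learning rate is kept modest.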
Commands
Real hands-on lab path
python3 train.py --config config.yaml
python3 eval.py --config config.yaml
python3 generate.py --config config.yaml
Local smoke-test path
python3 train.py --config config.yaml --model-name sshleifer/tiny-gpt2 --output-dir outputs/tiny_gpt2_smoke --max-train-samples 10 --max-eval-samples 2
python3 eval.py --config config.yaml --base-model-name sshleifer/tiny-gpt2 --tuned-model-path outputs/tiny_gpt2_smoke --max-eval-samples 2
python3 generate.py --config config.yaml --base-model-name sshleifer/tiny-gpt2 --tuned-model-path outputs/tiny_gpt2_smoke
Smoke validation snapshot
The local tiny-model smoke run completed successfully in this workspace.
- base eval loss: 10.8128586
- tuned eval loss: 10.8124733
- base perplexity: 49655.21
- tuned perplexity: 49636.08
- perplexity delta: 19.13
That is only a sanity check, not a meaningful domain-tuning claim.
sshleifer/tiny-gpt2 is too small for high-quality generations here, but it proves the train/eval/generate path works end to end.
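The perplexity figures in the snapshot follow directly from the eval losses, assuming the loss is the mean per-token cross-entropy in nats (which is what the Hugging Face Trainer reports for causal language models). The sketch below only illustrates that relationship; the function name is hypothetical and this is not necessarily how eval.py computes it.

```python
import math


def perplexity(eval_loss):
    """Perplexity is exp of the mean per-token cross-entropy (in nats)."""
    return math.exp(eval_loss)


# Eval losses copied from the smoke validation snapshot.
base_ppl = perplexity(10.8128586)
tuned_ppl = perplexity(10.8124733)
delta = base_ppl - tuned_ppl  # positive means the tuned model scored lower

print(f"base={base_ppl:.2f} tuned={tuned_ppl:.2f} delta={delta:.2f}")
```

A loss difference of about 0.0004 nats moves perplexity by well under 0.1 percent, so the delta here shows only that the pipeline ran.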
What to write up after a full run
- Base-model perplexity on the held-out split
- Tuned-model perplexity on the held-out split
- Perplexity delta
- Three before/after generations at matched decoding settings
- One failure → fix note
- One sentence on catastrophic-forgetting risk
Why this hands-on lab is mechanically useful
This is not frontier pretraining. It is a small post-training exercise that makes the knobs concrete:
- data cleaning
- formatting consistency
- LR choice
- warmup
- effective batch size
- held-out evaluation
If those words still feel abstract, run this folder end to end and inspect the saved metrics JSON.
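A quick way to inspect a saved metrics file is to load it and list its top-level entries. This is a minimal sketch: the helper names are hypothetical, the README does not state the file's path or keys, and the code assumes the file holds a single JSON object.

```python
import json
from pathlib import Path


def load_metrics(path):
    """Read a metrics JSON file and return its contents as a dict."""
    with Path(path).open(encoding="utf-8") as fh:
        return json.load(fh)


def format_metrics(metrics):
    """Return one 'key: value' line per top-level entry, sorted by key."""
    return [f"{key}: {value}" for key, value in sorted(metrics.items())]


# Usage (substitute the real metrics path from your own run's output directory):
#   for line in format_metrics(load_metrics("path/to/metrics.json")):
#       print(line)
```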