05. Assignment 00: Classical ML Interview Pack
Module 00. This is a refresher hands-on lab, not a semester-long project.
Required reading first: read 02_explainer.md chapters 1-5. Use 03_study_material.md as your quick-reference sheet while building the deliverables.
Goal
Build a compact set of artifacts you could use to survive a first-round AI/ML interview with confidence. The deliverables should prove that you can explain classical ML, not just name-drop it.
Deliverables
- cheat_sheet.md: one-page summary of the whole module
- failure_fix_table.md: recreate the chain from 02_explainer.md §6.1 and add one row of your own
- model_selection_memo.md: one-page comparison of logistic regression, random forest, and XGBoost for a tabular business problem
Required contents
1. cheat_sheet.md
Cover these sections in 5-12 lines each:
- bias vs variance
- linear regression and logistic regression
- gradient descent
- L1 vs L2 geometry
- trees, random forests, and boosting
- metrics: precision, recall, F1, ROC-AUC, PR-AUC
- calibration and class imbalance
- leakage and split design
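For the metrics section, it may help to compute the numbers by hand once before summarizing them. The sketch below derives accuracy, precision, recall, and F1 from confusion-matrix counts. The function name and all counts are made up for illustration; they are not taken from 02_explainer.md.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical rare-event data: 1,000 cases, 10 of them positive.
# A model that always predicts "negative" never finds a positive.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)

# A hypothetical model that catches 6 of the 10 positives with 14 false alarms.
imperfect_model = classification_metrics(tp=6, fp=14, fn=4, tn=976)

print(always_negative)
print(imperfect_model)
```

With these invented counts, the always-negative model has the higher accuracy but zero recall, which is one way to motivate precision, recall, and F1 on the cheat sheet.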
2. failure_fix_table.md
- Recreate at least 8 rows from 02_explainer.md §6.1
- Add one extra row from your own work, coursework, or mock-interview experience
- Columns must be: symptom, likely cause, first fix, validation check
3. model_selection_memo.md
Use a fictional but realistic tabular problem, such as churn, fraud, claims risk, or support escalation. Include:
- the feature types
- the class balance
- the evaluation plan
- which baseline you would train first and why
- when you would prefer logistic regression, random forest, or XGBoost
- which metric would drive the launch decision
Constraints
- Keep each artifact compact and defensible
- Write as if a skeptical staff engineer will question every line
- No copied textbook prose
- Every claim should connect to a failure mode explained in 02_explainer.md
Suggested workflow
- Read 02_explainer.md chapter 1 and write the production-failure story in your own words.
- Read chapter 2 and sketch the bias-variance curve plus L1/L2 shapes.
- Read chapter 3 and complete the linear/logistic/gradient-descent section of your cheat sheet.
- Read chapter 4 and draft your model-selection memo.
- Read chapter 5 and choose the evaluation metrics and split logic.
- Read chapter 6 and recreate the failure-fix chain from memory.
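Before writing the gradient-descent section of the cheat sheet, running one tiny example end to end can make the update rule easier to explain out loud. The following is a minimal sketch of batch gradient descent on mean squared error for a one-feature linear model. The function name, learning rate, step count, and toy data are all assumptions chosen for illustration.

```python
def fit_line_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals (prediction minus target) under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line_gd(xs, ys)
print(round(w, 3), round(b, 3))
```

On this noiseless toy data the fit should land close to w = 2 and b = 1. Trying a much larger learning rate is a quick way to see divergence for yourself.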
What success looks like
- You can answer “Why did 99% training accuracy fail in production?” without rambling
- You can compare logistic regression, random forest, and XGBoost crisply
- You can defend your metric choice for an imbalanced task
- Your memo includes split logic that mirrors deployment reality
- Your failure-fix table sounds diagnostic, not academic
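One possible reading of "split logic that mirrors deployment reality" is a time-ordered split: train only on records that predate the period you evaluate on, as a deployed model would. The sketch below assumes a hypothetical churn dataset with an event_date field. The field names, dates, and cutoff are invented for illustration.

```python
from datetime import date


def time_ordered_split(rows, cutoff, date_key="event_date"):
    """Split rows so that all training data strictly precedes the test period."""
    train = [r for r in rows if r[date_key] < cutoff]
    test = [r for r in rows if r[date_key] >= cutoff]
    return train, test


# Hypothetical churn records; field names and values are illustrative only.
rows = [
    {"customer_id": 1, "event_date": date(2024, 1, 15), "churned": 0},
    {"customer_id": 2, "event_date": date(2024, 2, 3), "churned": 1},
    {"customer_id": 3, "event_date": date(2024, 3, 20), "churned": 0},
    {"customer_id": 4, "event_date": date(2024, 4, 8), "churned": 0},
    {"customer_id": 5, "event_date": date(2024, 5, 27), "churned": 1},
]

train, test = time_ordered_split(rows, cutoff=date(2024, 4, 1))
print(len(train), "training rows;", len(test), "test rows")
```

A random shuffle would let the model see future records during training. Whether that matters for your fictional problem is something the memo should argue explicitly.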
Common pitfalls
- Writing formulas you cannot explain out loud
- Using “accuracy” as the main metric for a rare-event problem
- Forgetting calibration when the model outputs probabilities
- Choosing XGBoost automatically without describing the data geometry
- Recreating the failure-fix table without adding a validation check
- Using target encoding or preprocessing in a leakage-prone way
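To make the target-encoding pitfall concrete, here is a minimal sketch of one common mitigation, out-of-fold target encoding: each row's category is encoded with target means computed only from rows in other folds, so a row's own label never feeds its own feature. The function name, the interleaved fold assignment, and the toy data are assumptions. Real pipelines usually shuffle or stratify folds and add smoothing.

```python
def out_of_fold_target_encode(categories, targets, n_folds=3):
    """Encode each row's category using target means from the other folds only."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = {}, {}
        out_sum, out_n = 0.0, 0
        # Collect statistics from rows that are NOT in the current fold.
        for i in range(n):
            if i % n_folds != fold:
                c = categories[i]
                sums[c] = sums.get(c, 0.0) + targets[i]
                counts[c] = counts.get(c, 0) + 1
                out_sum += targets[i]
                out_n += 1
        # Fallback for categories unseen outside this fold: out-of-fold mean.
        prior = out_sum / out_n if out_n else 0.0
        # Encode only the rows that ARE in the current fold.
        for i in range(n):
            if i % n_folds == fold:
                c = categories[i]
                encoded[i] = sums[c] / counts[c] if c in counts else prior
    return encoded


# Toy data, invented for illustration.
categories = ["a", "a", "a", "b", "b", "b"]
targets = [1, 0, 1, 0, 0, 1]

encoded = out_of_fold_target_encode(categories, targets)
print(encoded)
```

In this toy run, the second row has target 0 but receives an encoding built only from the other two "a" rows, so its own label cannot leak into its feature. A naive encoding over all rows would give every "a" row the same mean, including that row's own label.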
Stretch version
After finishing the three artifacts, record a 5-minute spoken walkthrough of the memo and cheat sheet. If you hesitate badly on one topic, go back to the relevant section in 02_explainer.md:
- overfitting → §1.1
- regularization geometry → §2.3
- logistic regression → §3.3
- ensembles → §4.2-§4.4
- calibration and imbalance → §5.3-§5.4
Why this hands-on lab matters
The next module moves into neural networks. If this module leaves you unable to reason about gradients, loss, features, splits, or overfitting, deep learning will feel mystical. This hands-on lab makes those foundations explicit and interview-ready.