05. Assignment 00: Classical ML Interview Pack

Module 00. This is a refresher hands-on lab, not a semester-long project.

Required reading first: read 02_explainer.md chapters 1-6 (the failure-fix chain you must recreate is in §6.1). Use 03_study_material.md as your quick-reference sheet while building the deliverables.

Goal

Build a compact set of artifacts you could use to survive a first-round AI/ML interview with confidence. The deliverables should prove that you can explain classical ML, not just name-drop it.

Deliverables

cheat_sheet.md — one-page summary of the whole module
failure_fix_table.md — recreate the chain from 02_explainer.md §6.1 and add one row of your own
model_selection_memo.md — one-page comparison of logistic regression, random forest, and XGBoost for a tabular business problem

Required contents

1. `cheat_sheet.md`

Cover these sections in 5-12 lines each:

- bias vs variance
- linear regression and logistic regression
- gradient descent
- L1 vs L2 geometry
- trees, random forests, and boosting
- metrics: precision, recall, F1, ROC-AUC, PR-AUC
- calibration and class imbalance
- leakage and split design
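
For the metrics section, it may help to work one small example by hand before writing the summary. The sketch below uses made-up labels (1,000 cases, 1% positives) and a hypothetical helper named `binary_metrics`; it is an illustration of the definitions, not a substitute for a library implementation.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up rare-event data: 10 positives followed by 990 negatives.
y_true = [1] * 10 + [0] * 990

# Baseline that never predicts the positive class.
always_negative = [0] * 1000

# Hypothetical model: catches 6 of 10 positives, raises 14 false alarms.
some_model = [1] * 6 + [0] * 4 + [1] * 14 + [0] * 976

baseline_report = binary_metrics(y_true, always_negative)
model_report = binary_metrics(y_true, some_model)

print("always negative:", baseline_report)
print("some model:     ", model_report)
```

On this toy data the do-nothing baseline has the higher accuracy (0.99 vs 0.982) while its recall is zero, which is the kind of one-line argument the cheat sheet should be able to make about rare-event problems.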

2. `failure_fix_table.md`

Recreate at least 8 rows from 02_explainer.md §6.1
Add one extra row from your own work, coursework, or mock-interview experience
Columns must be: symptom, likely cause, first fix, validation check

3. `model_selection_memo.md`

Use a fictional but realistic tabular problem, such as churn, fraud, claims risk, or support escalation. Include:

- the feature types
- the class balance
- the evaluation plan
- which baseline you would train first and why
- when you would prefer logistic regression, random forest, or XGBoost
- which metric would drive the launch decision

Constraints

Keep each artifact compact and defensible
Write as if a skeptical staff engineer will question every line
No copied textbook prose
Every claim should connect to a failure mode explained in 02_explainer.md

Suggested workflow

Read 02_explainer.md chapter 1 and write the production-failure story in your own words.
Read chapter 2 and sketch the bias-variance curve plus L1/L2 shapes.
Read chapter 3 and complete the linear/logistic/gradient-descent section of your cheat sheet.
Read chapter 4 and draft your model-selection memo.
Read chapter 5 and choose the evaluation metrics and split logic.
Read chapter 6 and recreate the failure-fix chain from memory.
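
For the chapter 3 step, writing gradient descent once from scratch is a quick way to check that you can explain it out loud. The sketch below fits a one-feature linear regression by minimizing mean squared error; the function name, learning rate, and toy data are assumptions chosen for illustration.

```python
def gradient_descent_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated exactly from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent_linear(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

With this data the parameters converge to the generating values (w near 2, b near 1). Raising `lr` well above 0.1 on the same data makes the updates diverge, which is a useful failure to be able to explain.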

What success looks like

You can answer “Why did 99% training accuracy fail in production?” without rambling
You can compare logistic regression, random forest, and XGBoost crisply
You can defend your metric choice for an imbalanced task
Your memo includes split logic that mirrors deployment reality
Your failure-fix table sounds diagnostic, not academic
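
As one illustration of split logic that mirrors deployment, the sketch below trains on records before a cutoff date and validates on records at or after it, so the model is never evaluated on data older than what it was trained on. The records, field names, and cutoff are made up for the example.

```python
from datetime import date


def time_based_split(rows, cutoff, time_key="event_date"):
    """Train on rows strictly before `cutoff`, validate on rows at or after it."""
    train = [row for row in rows if row[time_key] < cutoff]
    valid = [row for row in rows if row[time_key] >= cutoff]
    return train, valid


# Made-up churn-style records; the field names are hypothetical.
rows = [
    {"customer_id": 1, "event_date": date(2024, 1, 5), "churned": 0},
    {"customer_id": 2, "event_date": date(2024, 1, 20), "churned": 1},
    {"customer_id": 3, "event_date": date(2024, 2, 11), "churned": 0},
    {"customer_id": 4, "event_date": date(2024, 3, 2), "churned": 0},
    {"customer_id": 5, "event_date": date(2024, 3, 18), "churned": 1},
]

train, valid = time_based_split(rows, cutoff=date(2024, 3, 1))
print(len(train), "training rows;", len(valid), "validation rows")
```

A random shuffle of the same rows could place March records in training and January records in validation, which is the mismatch with deployment that the memo's split logic should rule out.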

Common pitfalls

Writing formulas you cannot explain out loud
Using “accuracy” as the main metric for a rare-event problem
Forgetting calibration when the model outputs probabilities
Choosing XGBoost automatically without describing the data geometry
Recreating the failure-fix table without adding a validation check
Using target encoding or preprocessing in a leakage-prone way
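
To make the last pitfall concrete, here is a minimal sketch of target encoding that is fit on training rows only and then applied to validation rows. The helper names, the smoothing value, and the toy data are assumptions for illustration.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets, smoothing=2.0):
    """Learn a smoothed mean target per category from training rows only."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, target in zip(categories, targets):
        sums[cat] += target
        counts[cat] += 1
    # Shrink each category mean toward the global mean; rare categories
    # are pulled harder, which limits memorization of single rows.
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean


def apply_target_encoding(categories, encoding, fallback):
    """Encode categories, using `fallback` for values unseen during fitting."""
    return [encoding.get(cat, fallback) for cat in categories]


# Toy data: validation labels are never passed to the fitting step.
train_cat = ["a", "a", "b", "b", "b", "c"]
train_y = [1, 0, 0, 0, 1, 1]
valid_cat = ["a", "c", "d"]  # "d" does not appear in training

encoding, global_mean = fit_target_encoding(train_cat, train_y)
valid_encoded = apply_target_encoding(valid_cat, encoding, fallback=global_mean)
print(valid_encoded)
```

This only addresses leakage from validation into training. Encoding the training rows with statistics computed from those same rows can still leak the target into the features, so the training side is usually encoded out-of-fold.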

Stretch version

After finishing the three artifacts, record a 5-minute spoken walkthrough of the memo and cheat sheet. If you hesitate badly on one topic, go back to the relevant section in 02_explainer.md:

- overfitting → §1.1
- regularization geometry → §2.3
- logistic regression → §3.3
- ensembles → §4.2-§4.4
- calibration and imbalance → §5.3-§5.4

Why this hands-on lab matters

The next module moves into neural networks. If this module leaves you unable to reason about gradients, loss, features, splits, or overfitting, deep learning will feel mystical. This hands-on lab makes those foundations explicit and interview-ready.