
05. Assignment 00 — Classical ML Interview Pack

Module 00. This is a refresher hands-on lab, not a semester-long project.

Required reading first: read 02_explainer.md chapters 1-5. Use 03_study_material.md as your quick-reference sheet while building the deliverables.

Goal

Build a compact set of artifacts you could use to survive a first-round AI/ML interview with confidence. The deliverables should prove that you can explain classical ML, not just name-drop it.

Deliverables

  1. cheat_sheet.md — one-page summary of the whole module
  2. failure_fix_table.md — recreate the chain from 02_explainer.md §6.1 and add one row of your own
  3. model_selection_memo.md — one-page comparison of logistic regression, random forest, and XGBoost for a tabular business problem

Required contents

1. cheat_sheet.md

Cover these sections in 5-12 lines each:

  • bias vs variance
  • linear regression and logistic regression
  • gradient descent
  • L1 vs L2 geometry
  • trees, random forests, and boosting
  • metrics: precision, recall, F1, ROC-AUC, PR-AUC
  • calibration and class imbalance
  • leakage and split design
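
For the metrics section, computing the numbers by hand once makes the rare-event argument easy to say out loud. Below is a minimal sketch in plain Python. It assumes binary labels with 1 as the rare positive class. `binary_metrics` and the toy labels are hypothetical and do not come from 02_explainer.md.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when the model predicts no positives.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical rare-event data: 10 positives out of 1,000 rows (1% prevalence).
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts the majority class.
always_negative = [0] * 1000
print(binary_metrics(y_true, always_negative))  # accuracy 0.99, recall 0.0, F1 0.0

# A model that flags 20 rows and catches 8 of the 10 positives.
flagged = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978
print(binary_metrics(y_true, flagged))  # accuracy 0.986, precision 0.4, recall 0.8
```

Accuracy ranks the useless model higher (0.99 vs 0.986). Recall and F1 show which model actually finds the rare events. That contrast is the core of the case for a rare-event launch metric.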

2. failure_fix_table.md

  • Recreate at least 8 rows from 02_explainer.md §6.1
  • Add one extra row from your own work, coursework, or mock-interview experience
  • Columns must be: symptom, likely cause, first fix, validation check

3. model_selection_memo.md

Use a fictional but realistic tabular problem, such as churn, fraud, claims risk, or support escalation. Include:

  • the feature types
  • the class balance
  • the evaluation plan
  • which baseline you would train first and why
  • when you would prefer logistic regression, random forest, or XGBoost
  • which metric would drive the launch decision
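
For the evaluation plan, a split that mirrors deployment usually means training on the past and testing on the future. A random shuffle would let the model learn from snapshots that postdate the ones it is scored on. Below is a minimal sketch. It assumes each row carries an `event_date` and a `customer_id`. Those field names and the data are hypothetical.

```python
from datetime import date


def time_based_split(rows, cutoff, time_key="event_date"):
    """Train on rows strictly before `cutoff`, test on rows at or after it."""
    train = [r for r in rows if r[time_key] < cutoff]
    test = [r for r in rows if r[time_key] >= cutoff]
    return train, test


# Hypothetical monthly churn snapshots (one row per customer per month).
rows = [
    {"customer_id": "c1", "event_date": date(2024, 1, 31), "churned": 0},
    {"customer_id": "c2", "event_date": date(2024, 2, 29), "churned": 0},
    {"customer_id": "c1", "event_date": date(2024, 3, 31), "churned": 0},
    {"customer_id": "c3", "event_date": date(2024, 4, 30), "churned": 1},
    {"customer_id": "c2", "event_date": date(2024, 5, 31), "churned": 1},
]

train, test = time_based_split(rows, cutoff=date(2024, 4, 1))

# Validation check 1: no training row is newer than any test row.
assert max(r["event_date"] for r in train) < min(r["event_date"] for r in test)

# Validation check 2: entities that appear on both sides of the split.
overlap = {r["customer_id"] for r in train} & {r["customer_id"] for r in test}
print(sorted(overlap))  # ['c2']
```

The overlap check forces an explicit decision. In deployment the model may score mostly existing customers, and then overlap is realistic. If it must generalize to new customers, a grouped split by `customer_id` on top of the time cutoff is the stricter choice.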

Constraints

  • Keep each artifact compact and defensible
  • Write as if a skeptical staff engineer will question every line
  • No copied textbook prose
  • Every claim should connect to a failure mode explained in 02_explainer.md

Suggested workflow

  1. Read 02_explainer.md chapter 1 and write the production-failure story in your own words.
  2. Read chapter 2 and sketch the bias-variance curve plus L1/L2 shapes.
  3. Read chapter 3 and complete the linear/logistic/gradient-descent section of your cheat sheet.
  4. Read chapter 4 and draft your model-selection memo.
  5. Read chapter 5 and choose the evaluation metrics and split logic.
  6. Read chapter 6 and recreate the failure-fix chain from memory.
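
For step 3, deriving the logistic-regression gradient by hand once makes the cheat-sheet lines easier to defend. Below is a minimal sketch in plain Python with a single feature. `train_logistic_gd`, the learning rate, the epoch count, and the toy data are hypothetical choices, not taken from 02_explainer.md. It shows batch gradient descent on mean log loss and how an L2 penalty shrinks the weight.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(xs, ys, w, b):
    """Mean binary cross-entropy of p = sigmoid(w * x + b)."""
    eps = 1e-12  # clip probabilities so log() never sees 0
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)


def train_logistic_gd(xs, ys, lr=0.5, epochs=500, l2=0.0):
    """Batch gradient descent on mean log loss plus (l2 / 2) * w**2; the bias is not penalized."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # d(loss)/dw = mean((p - y) * x) + l2 * w ;  d(loss)/db = mean(p - y)
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n + l2 * w
        grad_b = sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical 1-D data that is not linearly separable.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w_plain, b_plain = train_logistic_gd(xs, ys)
w_l2, b_l2 = train_logistic_gd(xs, ys, l2=1.0)

print("start loss:", log_loss(xs, ys, 0.0, 0.0))  # log(2): predicting 0.5 everywhere
print("no penalty:", w_plain, log_loss(xs, ys, w_plain, b_plain))
print("L2 = 1.0:  ", w_l2, log_loss(xs, ys, w_l2, b_l2))
```

Comparing `w_plain` with `w_l2` gives a concrete way to say what L2 does. It accepts slightly higher training loss in exchange for a smaller weight.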

What success looks like

  • You can answer “Why did 99% training accuracy fail in production?” without rambling
  • You can compare logistic regression, random forest, and XGBoost crisply
  • You can defend your metric choice for an imbalanced task
  • Your memo includes split logic that mirrors deployment reality
  • Your failure-fix table sounds diagnostic, not academic

Common pitfalls

  • Writing formulas you cannot explain out loud
  • Using “accuracy” as the main metric for a rare-event problem
  • Forgetting calibration when the model outputs probabilities
  • Choosing XGBoost automatically without describing the data geometry
  • Recreating the failure-fix table without adding a validation check
  • Using target encoding or preprocessing in a leakage-prone way
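
To make the last pitfall concrete: a leakage-safe target encoder learns category statistics from training rows only. For the training rows themselves, it uses out-of-fold statistics so that no row sees its own label. Below is a minimal sketch in plain Python. The function names, the smoothing rule, and the round-robin fold assignment are illustrative assumptions, not the method in 02_explainer.md. The round-robin folds assume the rows are already shuffled. For time-ordered data, you would instead encode each row from strictly earlier rows.

```python
from collections import defaultdict


def fit_target_encoder(categories, targets, smoothing=2.0):
    """Learn smoothed per-category target means from the rows passed in (training rows only)."""
    prior = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, y in zip(categories, targets):
        sums[c] += y
        counts[c] += 1
    # Blend each category mean with the global prior; rare categories shrink toward it.
    encoding = {c: (sums[c] + smoothing * prior) / (counts[c] + smoothing) for c in counts}
    return encoding, prior


def transform(categories, encoding, prior):
    """Apply a fitted encoding; categories unseen during fitting fall back to the prior."""
    return [encoding.get(c, prior) for c in categories]


def out_of_fold_encode(categories, targets, n_folds=2, smoothing=2.0):
    """Encode each training row with statistics fitted on the OTHER folds only."""
    # Assumption: rows are already shuffled, so i % n_folds gives roughly random folds.
    encoded = [0.0] * len(categories)
    for fold in range(n_folds):
        fit_idx = [i for i in range(len(categories)) if i % n_folds != fold]
        enc, prior = fit_target_encoder(
            [categories[i] for i in fit_idx], [targets[i] for i in fit_idx], smoothing
        )
        for i in range(len(categories)):
            if i % n_folds == fold:
                encoded[i] = enc.get(categories[i], prior)
    return encoded


# Hypothetical data: a categorical feature (e.g. plan type) and a binary label.
train_cats = ["a", "a", "b", "b", "b", "c"]
train_y = [1, 0, 0, 0, 1, 1]
test_cats = ["a", "c", "d"]  # "d" never appears in training

# Training features: out-of-fold, so no row's encoding uses its own label.
train_encoded = out_of_fold_encode(train_cats, train_y)

# Test features: fit once on all training rows, then apply to test.
encoding, prior = fit_target_encoder(train_cats, train_y)
test_encoded = transform(test_cats, encoding, prior)
print(test_encoded)  # a -> 0.5, c -> about 0.667, unseen d -> prior 0.5
```

A useful validation check for the failure-fix table is to flip one training label and confirm that the row's own encoded value does not change.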

Stretch version

After finishing the three artifacts, record a 5-minute spoken walkthrough of the memo and cheat sheet. If you hesitate badly on one topic, go back to the relevant section in 02_explainer.md:

  • overfitting → §1.1
  • regularization geometry → §2.3
  • logistic regression → §3.3
  • ensembles → §4.2-§4.4
  • calibration and imbalance → §5.3-§5.4

Why this hands-on lab matters

The next module moves into neural networks. If this module leaves you unable to reason about gradients, loss, features, splits, or overfitting, deep learning will feel mystical. This hands-on lab makes those foundations explicit and interview-ready.