05. Assignment 1 — MNIST Neural Network from Scratch

Week 1. No PyTorch / Keras. NumPy only.

Required reading first: 02_explainer.md chapters 1-4. The hard exercise in §6.5 (XOR-MLP from scratch) is a smaller version of this assignment. If you can do that, you can do this.

Goal

Train a multi-layer perceptron on MNIST and reach ≥95% test accuracy. Implement everything from scratch:

- Forward pass
- Backpropagation
- Cross-entropy loss
- ReLU activation
- Softmax output
- Mini-batch gradient descent

Constraints

No high-level libraries (no PyTorch / TF / Keras / sklearn for the model)
NumPy for matrix ops is fine
Use keras.datasets.mnist ONLY to load the data (or download the raw MNIST files)
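One way to stay inside these constraints is to confine Keras to a single loading function and do all preprocessing in NumPy. The following is a minimal sketch, not a required layout: `preprocess` and `load_mnist` are hypothetical helper names, and it assumes Keras is installed so that `mnist.load_data()` can return the usual uint8 image arrays and integer labels.

```python
import numpy as np

def preprocess(images, labels):
    """Flatten 28x28 uint8 images to 784-vectors scaled to [0, 1]."""
    x = images.reshape(len(images), -1).astype(np.float32) / 255.0
    y = labels.astype(np.int64)  # integer class labels, usable as array indices
    return x, y

def load_mnist():
    # Keras is imported here only to fetch the data, never for the model.
    from keras.datasets import mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    return preprocess(x_train, y_train), preprocess(x_test, y_test)
```

Keeping the import inside `load_mnist` makes it easy to swap in a raw-file reader later without touching the rest of the code.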

Required architecture

Input: 784 (28×28 flattened)
  ↓
Hidden: 128 (ReLU)
  ↓
Hidden: 64 (ReLU)
  ↓
Output: 10 (Softmax)
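The architecture above could be set up and run forward roughly as follows. This is a sketch under assumptions: the names `LAYER_SIZES`, `init_params`, and `forward` are illustrative, weights are stored as (fan_in, fan_out) matrices so that a batch of shape (N, 784) multiplies from the left, and He initialization is applied to every layer including the output.

```python
import numpy as np

LAYER_SIZES = [784, 128, 64, 10]  # taken from the required architecture

def init_params(rng, sizes=LAYER_SIZES):
    """He-initialized weights and zero biases, one (W, b) pair per layer."""
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def forward(params, x):
    """Return class probabilities and a cache of (layer input, pre-activation)."""
    a = x
    cache = []
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        cache.append((a, z))
        if i < len(params) - 1:
            a = np.maximum(0, z)  # ReLU on hidden layers
        else:
            # Softmax on the output layer, shifted by the row max for stability
            e = np.exp(z - z.max(axis=1, keepdims=True))
            a = e / e.sum(axis=1, keepdims=True)
    return a, cache
```

The cache holds what backpropagation needs later: each layer's input (for the weight gradient) and its pre-activation (for the ReLU derivative).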

Required deliverables

train.py — training script
model.py — model class with forward/backward
README.md — architecture, hyperparameters, accuracy, training curves
Loss curve PNG (training loss over epochs)
Final accuracy ≥95% on test set
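The accuracy deliverable can be checked with a few lines. A minimal sketch, where `accuracy` and `TARGET` are hypothetical names and `probs` is assumed to be the (N, 10) softmax output of your model on the test images:

```python
import numpy as np

TARGET = 0.95  # threshold from the deliverables list

def accuracy(probs, labels):
    """Fraction of examples whose highest-probability class matches the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))

def meets_target(probs, labels, target=TARGET):
    return accuracy(probs, labels) >= target
```

Report the number computed on the held-out test set, not on the training data.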

Hyperparameters (suggested)

Learning rate: 0.01
Batch size: 64
Epochs: 10-20
Optimizer: vanilla SGD (or simple SGD with momentum)
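The batching and update rule implied by these settings might look like the sketch below. The function names are illustrative, and it assumes the parameters, gradients, and velocities are kept as parallel flat lists of NumPy arrays; with `momentum=0.0` the update reduces to vanilla SGD.

```python
import numpy as np

def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs covering the dataset once."""
    order = rng.permutation(len(x))  # a fresh shuffle on every call, i.e. every epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]

def sgd_step(params, grads, velocity, lr=0.01, momentum=0.0):
    """In-place update of each parameter array; momentum=0.0 is vanilla SGD."""
    for p, g, v in zip(params, grads, velocity):
        v *= momentum
        v -= lr * g
        p += v
```

A training epoch then loops over `iterate_minibatches(x_train, y_train, 64, rng)`, computes gradients for each batch, and calls `sgd_step`. Note that the last batch is smaller when the dataset size is not a multiple of the batch size.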

Hints

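import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # Derivative of ReLU, evaluated at the pre-activation x
    return (x > 0).astype(float)

def softmax(x):
    # Subtract the row-wise max before exponentiating for numerical stability
    x_max = np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x - x_max)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # probs: (batch, classes) softmax outputs; labels: integer class labels
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(np.clip(picked, 1e-12, 1.0)))  # clip avoids log(0)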
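Before wiring these hints into a full network, it can help to confirm numerically that the gradient of the loss with respect to the logits matches the softmax-minus-one-hot form mentioned under Common pitfalls. The sketch below is one way to do that, not a required part of the submission: `numerical_grad` is a hypothetical helper, and compact copies of `softmax` and `cross_entropy` are repeated so the block runs on its own.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + eps
        plus = f(x)
        x[idx] = old - eps
        minus = f(x)
        x[idx] = old  # restore the original entry
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 10))
labels = np.array([3, 1, 7, 0])

# Analytic gradient: (softmax_output - one_hot_label) / batch_size
analytic = softmax(logits)
analytic[np.arange(len(labels)), labels] -= 1.0
analytic /= len(labels)

numeric = numerical_grad(lambda z: cross_entropy(softmax(z), labels), logits)
max_diff = np.max(np.abs(analytic - numeric))
print("max abs difference:", max_diff)
```

A difference many orders of magnitude below the gradient values themselves suggests the analytic form is right. The same check applied to individual weights is a common way to debug a backward pass.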

Common pitfalls

Forgetting to normalize input pixels (divide by 255)
Initializing weights too large: for He init, draw from a standard normal and scale by sqrt(2/fan_in) (see explainer §3.1)
Cross-entropy gradient: with respect to the logits, use the simplification (softmax_output - one_hot_label), divided by the batch size when the loss is a batch mean (derivation in explainer §3.4)
Forgetting to shuffle data each epoch (explainer §3.3)
Dead ReLU neurons — if accuracy plateaus early, log fraction of zero activations per layer (explainer §6.4)
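Two of these pitfalls, the gradient simplification and the dead-ReLU diagnostic, can be sketched in code. This is an illustrative sketch rather than the reference solution: `forward`, `backward`, and `zero_activation_fraction` are hypothetical names, parameters are assumed to be a list of (W, b) pairs with W of shape (fan_in, fan_out), and a compact forward pass is included so the block runs on its own.

```python
import numpy as np

def forward(params, x):
    """Forward pass; caches each layer's (input, pre-activation)."""
    a, cache = x, []
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        cache.append((a, z))
        if i < len(params) - 1:
            a = np.maximum(0, z)
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            a = e / e.sum(axis=1, keepdims=True)
    return a, cache

def backward(params, cache, probs, labels):
    """Gradients of the mean cross-entropy with respect to every (W, b)."""
    n = len(labels)
    dz = probs.copy()
    dz[np.arange(n), labels] -= 1.0  # softmax_output - one_hot_label
    dz /= n                          # the loss is a mean over the batch
    grads = [None] * len(params)
    for i in reversed(range(len(params))):
        a_in, _ = cache[i]
        grads[i] = (a_in.T @ dz, dz.sum(axis=0))
        if i > 0:
            W, _ = params[i]
            z_prev = cache[i - 1][1]
            dz = (dz @ W.T) * (z_prev > 0)  # chain through the previous ReLU
    return grads

def zero_activation_fraction(cache):
    """Per hidden layer, the fraction of ReLU outputs that are exactly zero."""
    return [float(np.mean(z <= 0)) for _, z in cache[:-1]]
```

Roughly half of the activations being zero is unremarkable for ReLU layers. A fraction that climbs toward 1.0 during training is the signal worth investigating.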

What to demonstrate in writeup

Architecture choice rationale
Loss curve showing convergence
Final accuracy
One thing that surprised you
Comparison to PyTorch baseline (~98% achievable easily)

LinkedIn post template

"Built a neural network from scratch this week — no PyTorch, just NumPy.

Got to [your accuracy]% on MNIST. The hardest part wasn't the math; it was getting the matrix shapes right in backprop.

Three things that surprised me:
1. [your insight 1]
2. [your insight 2]
3. [your insight 3]

Repo: [link]"

Why this assignment matters

Most "AI Engineers" have never built a neural network from scratch. They use PyTorch or Keras and treat backprop as magic. This assignment teaches you the fundamentals so you can debug real production failures and discuss internals in interviews.