05. Assignment 1 — MNIST Neural Network from Scratch

Week 1. No PyTorch / Keras. NumPy only.

Required reading first: 02_explainer.md chapters 1-4. The hard exercise in §6.5 (XOR-MLP from scratch) is a smaller version of this assignment. If you can do that, you can do this.

Goal

Train a multi-layer perceptron on MNIST and reach ≥95% test accuracy. Implement everything from scratch:

  • Forward pass
  • Backpropagation
  • Cross-entropy loss
  • ReLU activation
  • Softmax output
  • Mini-batch gradient descent

Constraints

  • No high-level libraries (no PyTorch / TF / Keras / sklearn for the model)
  • NumPy for matrix ops is fine
  • Use keras.datasets.mnist ONLY to load data (or download MNIST raw)

Required architecture

Input: 784 (28×28 flattened)
Hidden: 128 (ReLU)
Hidden: 64 (ReLU)
Output: 10 (Softmax)
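
To make the shapes concrete, here is a minimal sketch of how the four layer sizes above might translate into weight matrices and a forward pass. The helper names (`init_params`, `forward`), the `(batch, features)` row layout, and the fixed seed are assumptions for illustration, not a required interface.

```python
import numpy as np

LAYER_SIZES = [784, 128, 64, 10]  # taken from the required architecture

def init_params(layer_sizes, seed=0):
    """He-initialized weights and zero biases (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Scale by sqrt(2 / fan_in) so ReLU layers keep a reasonable variance
        W = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def forward(params, X):
    """X: (batch, 784). Returns class probabilities of shape (batch, 10)."""
    a = X
    for W, b in params[:-1]:
        a = relu(a @ W + b)          # hidden layers use ReLU
    W, b = params[-1]
    return softmax(a @ W + b)        # output layer uses softmax

params = init_params(LAYER_SIZES)
X = np.random.default_rng(1).random((5, 784))  # fake batch of 5 "images"
probs = forward(params, X)
print(probs.shape)  # (5, 10)
```

Each weight matrix has shape `(fan_in, fan_out)`, so a batch flows through as `(batch, 784) → (batch, 128) → (batch, 64) → (batch, 10)`. Printing these shapes at every layer is a cheap way to catch transposition mistakes early.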

Required deliverables

  1. train.py — training script
  2. model.py — model class with forward/backward
  3. README.md — architecture, hyperparameters, accuracy, training curves
  4. Loss curve PNG (training loss over epochs)
  5. Final accuracy ≥95% on test set

Hyperparameters (suggested)

  • Learning rate: 0.01
  • Batch size: 64
  • Epochs: 10-20
  • Optimizer: vanilla SGD (or simple SGD with momentum)
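
One possible way to wire these hyperparameters together is sketched below: a generator that reshuffles the data and yields mini-batches, plus an in-place vanilla SGD update. The function names and the flat list-of-arrays parameter layout are assumptions made for this example only.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=64, rng=None):
    """Yield (X_batch, y_batch) pairs in a fresh random order on every call."""
    if rng is None:
        rng = np.random.default_rng()
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]  # last batch may be smaller
        yield X[batch], y[batch]

def sgd_step(params, grads, lr=0.01):
    """Vanilla SGD: update each float parameter array in place."""
    for p, g in zip(params, grads):
        p -= lr * g

# Tiny demonstration on fake data: 100 examples, batch size 64
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.arange(100)
batches = list(iterate_minibatches(X, y, batch_size=64,
                                   rng=np.random.default_rng(0)))
print([len(xb) for xb, _ in batches])  # [64, 36]

w = np.ones(3)
sgd_step([w], [np.ones(3)], lr=0.1)  # w becomes 0.9 everywhere
```

Calling `iterate_minibatches` once per epoch gives a new shuffle each time, which is the behavior the training loop needs.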

Hints

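import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # Derivative of ReLU with respect to its input x (the pre-activation)
    return (x > 0).astype(float)

def softmax(x):
    # Subtract the row-wise max before exponentiating for numerical stability
    x_max = np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x - x_max)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def cross_entropy(probs, labels, eps=1e-12):
    # labels: integer class labels, shape (batch,); eps guards against log(0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))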
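
Building on these hints, the following is a rough sketch of a backward pass for a ReLU network with a softmax output. The cache layout, the `(W, b)` tuple format, and the function names are assumptions for illustration; your `model.py` may organize this differently. It relies on the standard result that the gradient of mean cross-entropy with respect to the output logits is `(probs - one_hot) / batch_size`.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def forward(params, X):
    """Return probabilities and a cache of activations / pre-activations."""
    activations, pre_acts = [X], []
    a = X
    for i, (W, b) in enumerate(params):
        z = a @ W + b
        pre_acts.append(z)
        a = softmax(z) if i == len(params) - 1 else relu(z)
        activations.append(a)
    return a, (activations, pre_acts)

def backward(params, cache, labels):
    """Return a list of (dW, db) tuples, one per layer."""
    activations, pre_acts = cache
    n = len(labels)
    # Gradient of mean cross-entropy w.r.t. the output logits
    delta = activations[-1].copy()
    delta[np.arange(n), labels] -= 1.0
    delta /= n
    grads = [None] * len(params)
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads[i] = (activations[i].T @ delta, delta.sum(axis=0))
        if i > 0:
            # Push the error back through the previous layer's ReLU
            delta = (delta @ W.T) * (pre_acts[i - 1] > 0)
    return grads

# Small demonstration network (sizes chosen only to keep the demo fast)
rng = np.random.default_rng(0)
sizes = [4, 5, 3]
params = [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
          for i, o in zip(sizes[:-1], sizes[1:])]
X = rng.standard_normal((6, 4))
y = rng.integers(0, 3, size=6)
probs, cache = forward(params, X)
grads = backward(params, cache, y)
print([dW.shape for dW, _ in grads])  # [(4, 5), (5, 3)]
```

Every `dW` has the same shape as its `W`, which is a quick sanity check. Comparing a few entries against a finite-difference estimate of the loss on a tiny network like this one is a further way to validate the implementation before training on MNIST.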

Common pitfalls

  • Forgetting to normalize input pixels (divide by 255)
  • Initializing weights too large — use sqrt(2/fan_in) for He init (see explainer §3.1)
  • Cross-entropy gradient: use the simplification (softmax_output - one_hot_label) (derivation in explainer §3.4)
  • Forgetting to shuffle data each epoch (explainer §3.3)
  • Dead ReLU neurons — if accuracy plateaus early, log fraction of zero activations per layer (explainer §6.4)
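
Two of the pitfalls above lend themselves to small helpers, sketched here under the assumption that images arrive as `uint8` arrays of shape `(N, 28, 28)` and that hidden activations are stored as `(batch, units)` arrays. The function names are hypothetical, and `dead_unit_fraction` is a stricter variant added for illustration alongside the plain zero-activation fraction.

```python
import numpy as np

def preprocess(images):
    """Flatten (N, 28, 28) uint8 images to (N, 784) floats in [0, 1]."""
    return images.reshape(len(images), -1).astype(np.float64) / 255.0

def zero_activation_fraction(hidden):
    """Fraction of all activations in a layer that are exactly zero."""
    return float(np.mean(hidden == 0))

def dead_unit_fraction(hidden):
    """Fraction of units that output zero for every example in the batch."""
    return float(np.mean(np.all(hidden == 0, axis=0)))

# Fake data standing in for MNIST images and one hidden layer's output
images = np.full((2, 28, 28), 255, dtype=np.uint8)
X = preprocess(images)
print(X.shape, X.max())  # (2, 784) 1.0

hidden = np.array([[0.0, 1.0, 0.0],
                   [0.0, 2.0, 3.0]])
print(zero_activation_fraction(hidden))  # 0.5
print(dead_unit_fraction(hidden))        # one unit of three is always zero
```

A high zero-activation fraction is normal for ReLU layers. A large share of units that are zero on every batch is the more worrying signal.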

What to demonstrate in writeup

  • Architecture choice rationale
  • Loss curve showing convergence
  • Final accuracy
  • One thing that surprised you
  • Comparison to PyTorch baseline (~98% achievable easily)

LinkedIn post template

"Built a neural network from scratch this week — no PyTorch, just NumPy.

Got to 96.4% on MNIST. The hardest part wasn't the math — it was getting the matrix shapes right in backprop.

Three things that surprised me:

1. [your insight 1]
2. [your insight 2]
3. [your insight 3]

Repo: [link]"

Why this assignment matters

Most "AI Engineers" have never built a neural network from scratch. They use PyTorch or Keras and treat backprop as magic. This assignment teaches you the fundamentals so you can debug real production failures and discuss internals in interviews.