05. Assignment 1: MNIST Neural Network from Scratch
Week 1. No PyTorch / Keras. NumPy only.
Required reading first: 02_explainer.md, chapters 1-4. The hard exercise in §6.5 (XOR-MLP from scratch) is a smaller version of this assignment. If you can do that, you can do this.
Goal
Train a multi-layer perceptron on MNIST and reach at least 95% test accuracy. Implement everything from scratch:
- Forward pass
- Backpropagation
- Cross-entropy loss
- ReLU activation
- Softmax output
- Mini-batch gradient descent
Constraints
- No high-level libraries (no PyTorch / TF / Keras / sklearn for the model)
- NumPy for matrix ops is fine
- Use keras.datasets.mnist ONLY to load data (or download the raw MNIST files)
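A minimal sketch of the data-loading step under these constraints. The helper names `load_mnist` and `preprocess` are illustrative, not part of the assignment. The Keras import sits inside the loader so that nothing else in the file depends on Keras.

```python
import numpy as np


def load_mnist():
    """Load MNIST via Keras (used only as a data loader, never for the model)."""
    # Imported lazily so the rest of the module runs without Keras installed.
    from keras.datasets import mnist
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    return x_train, y_train, x_test, y_test


def preprocess(images, labels):
    """Flatten 28x28 uint8 images to 784-vectors in [0, 1]; cast labels to int64."""
    x = images.reshape(len(images), -1).astype(np.float32) / 255.0
    y = labels.astype(np.int64)
    return x, y
```

Calling `preprocess` on both the training and test splits keeps the two on the same scale, which matters because the network never sees raw 0-255 pixel values.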
Required architecture
Required deliverables
- train.py: training script
- model.py: model class with forward/backward
- README.md: architecture, hyperparameters, accuracy, training curves
- Loss curve PNG (training loss over epochs)
- Final accuracy ≥95% on test set
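The accuracy figure can be computed with a one-line helper. This is a sketch; the name `accuracy` is illustrative. It assumes `probs` holds one row of class probabilities per test image and `labels` holds integer class labels.

```python
import numpy as np


def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class matches the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))
```

Report the number measured on the held-out test split, not on the training data.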
Hyperparameters (suggested)
- Learning rate: 0.01
- Batch size: 64
- Epochs: 10-20
- Optimizer: vanilla SGD (or simple SGD with momentum)
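One way the batching and optimizer settings above might fit together, as a sketch rather than a required design. The function names and the dict-of-arrays layout for `params`, `grads`, and `velocity` are assumptions made for illustration.

```python
import numpy as np


def iterate_minibatches(x, y, batch_size, rng):
    """Yield shuffled (x_batch, y_batch) pairs that cover the data exactly once."""
    order = rng.permutation(len(x))  # fresh shuffle on every call, i.e. every epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]


def sgd_step(params, grads, velocity, lr=0.01, momentum=0.0):
    """Update params in place; momentum=0.0 reduces to vanilla SGD."""
    for name in params:
        velocity[name] = momentum * velocity[name] - lr * grads[name]
        params[name] += velocity[name]
```

A training epoch would then loop over `iterate_minibatches(x_train, y_train, 64, rng)` with `rng = np.random.default_rng(seed)`, compute gradients for each batch, and call `sgd_step`. The last batch is smaller whenever the dataset size is not a multiple of the batch size.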
Hints
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # Derivative of ReLU with respect to its pre-activation input
    return (x > 0).astype(float)

def softmax(x):
    # Subtract the row-wise max so np.exp cannot overflow
    x_max = np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x - x_max)
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # labels: integer class labels, shape (batch,)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))
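The hints cover the individual pieces. The sketch below shows one way they could combine into a forward and backward pass, with a numerical gradient check for catching shape and sign mistakes. It assumes a single hidden layer and parameters stored in a dict under the hypothetical keys `W1`, `b1`, `W2`, `b2`. Neither choice is prescribed by the assignment.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def forward(params, x):
    """Return class probabilities plus the intermediates that backprop needs."""
    z1 = x @ params["W1"] + params["b1"]   # (N, hidden)
    a1 = np.maximum(0, z1)                 # ReLU
    z2 = a1 @ params["W2"] + params["b2"]  # (N, classes)
    return softmax(z2), (x, z1, a1)


def loss_fn(params, x, labels):
    probs, _ = forward(params, x)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))


def backward(params, cache, probs, labels):
    """Gradients of the mean cross-entropy loss for every parameter."""
    x, z1, a1 = cache
    n = len(labels)
    dz2 = probs.copy()
    dz2[np.arange(n), labels] -= 1.0        # softmax_output - one_hot_label
    dz2 /= n                                # the loss is a mean over the batch
    dz1 = (dz2 @ params["W2"].T) * (z1 > 0)  # chain rule through ReLU
    return {
        "W2": a1.T @ dz2, "b2": dz2.sum(axis=0),
        "W1": x.T @ dz1, "b1": dz1.sum(axis=0),
    }


def numerical_grad(params, name, x, labels, eps=1e-5):
    """Central-difference gradient, for checking backward() on tiny inputs."""
    grad = np.zeros_like(params[name])
    for i in np.ndindex(grad.shape):
        old = params[name][i]
        params[name][i] = old + eps
        plus = loss_fn(params, x, labels)
        params[name][i] = old - eps
        minus = loss_fn(params, x, labels)
        params[name][i] = old
        grad[i] = (plus - minus) / (2 * eps)
    return grad
```

Run the check in float64 on a handful of examples with a small hidden layer. If `backward` and `numerical_grad` disagree by more than about 1e-6 for any parameter, the bug is usually a transposed matrix or a missing division by the batch size.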
Common pitfalls
- Forgetting to normalize input pixels (divide by 255)
- Initializing weights too large: draw weights from a standard normal and scale by sqrt(2/fan_in) for He init (see explainer §3.1)
- Cross-entropy gradient: use the simplification (softmax_output - one_hot_label) for the gradient with respect to the logits, divided by the batch size when the loss is averaged (derivation in explainer §3.4)
- Forgetting to shuffle data each epoch (explainer §3.3)
- Dead ReLU neurons — if accuracy plateaus early, log fraction of zero activations per layer (explainer §6.4)
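Two of the pitfalls above lend themselves to small helpers. The sketch below shows He initialization for one layer and two diagnostics for spotting dead ReLU units. All three function names are illustrative, and the layer sizes are whatever your architecture uses.

```python
import numpy as np


def he_init(fan_in, fan_out, rng):
    """Weights drawn from N(0, 2/fan_in); biases start at zero."""
    W = rng.standard_normal((fan_in, fan_out)) * np.sqrt(2.0 / fan_in)
    b = np.zeros(fan_out)
    return W, b


def zero_fraction(activations):
    """Share of ReLU outputs in a batch that are exactly zero."""
    return float(np.mean(activations == 0))


def dead_unit_fraction(activations):
    """Share of units that output zero for every example in the batch."""
    return float(np.mean(np.all(activations == 0, axis=0)))
```

A zero fraction around one half is normal for ReLU right after He initialization. The warning sign is a `dead_unit_fraction` that keeps climbing across epochs, which often points to a learning rate that is too high.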
What to demonstrate in writeup
- Architecture choice rationale
- Loss curve showing convergence
- Final accuracy
- One thing that surprised you
- Comparison to PyTorch baseline (~98% achievable easily)
LinkedIn post template
"Built a neural network from scratch this week — no PyTorch, just NumPy.
Got to 96.4% on MNIST. The hardest part wasn't the math — it was getting the matrix shapes right in backprop.
Three things that surprised me:
1. [your insight 1]
2. [your insight 2]
3. [your insight 3]
Repo: [link]"
Why this assignment matters
Many people working as "AI Engineers" have never built a neural network from scratch. They use PyTorch or Keras and treat backprop as magic. This assignment teaches the fundamentals so you can debug real production failures and discuss internals in interviews.