Unlike ANN:
- Input is not one number
- Input is an image (28×28 pixels)
- Output is 10 classes (0–9)
Here, depth + non-linearity matter.
- Grayscale images
- Size: 28 × 28
- Pixel values: 0–255
- Labels: digits 0 → 9
```python
import tensorflow as tf

# Load MNIST
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Normalize (VERY IMPORTANT)
x_train = x_train / 255.0
x_test = x_test / 255.0
```

Pixels become 0–1, so learning is stable.
```python
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
```

Let's break this fully.
Before:
28 × 28 image
After:
784 numbers → [x1, x2, x3, ... x784]
✅ Converts image → vector
✅ Required for Dense layers
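To make this concrete, here is a plain-NumPy sketch (not part of the model) of what Flatten does to one image:

```python
import numpy as np

# A dummy 28×28 "image" (values 0–1, as after normalization)
image = np.random.rand(28, 28)

# Flatten is just a reshape: 28 × 28 → 784
vector = image.reshape(784)
print(vector.shape)  # (784,)
```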
- 128 neurons
- Each neuron looks at ALL 784 pixels
- Learns simple patterns
ReLU:
max(0, x)
✅ Adds non-linearity
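A quick NumPy sketch of ReLU, to show what max(0, x) does element-wise:

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu = np.maximum(0, x)  # negatives become 0, positives pass through unchanged
print(relu)
```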
- Combines features from layer 1
- Learns more complex digit shapes
This is hierarchical learning ✅
- 10 neurons → digits 0–9
- Outputs probabilities
Example output:
[0.01, 0.02, 0.90, 0.01, ...]
✅ Highest probability = predicted digit
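Picking the predicted digit from the softmax output is just an argmax. A minimal sketch with a made-up probability vector:

```python
import numpy as np

# Hypothetical softmax output for one image (sums to 1)
probs = np.array([0.01, 0.02, 0.90, 0.01, 0.01,
                  0.01, 0.01, 0.01, 0.01, 0.01])

predicted_digit = np.argmax(probs)
print(predicted_digit)  # 2
```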
```python
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
```

Why adam:
- Smarter than plain SGD
- Auto-adjusts the learning rate

Why sparse_categorical_crossentropy:
- Works with integer labels (0–9)
- Matches the softmax output
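For one example, sparse categorical crossentropy is just the negative log of the probability the model assigned to the true class. A NumPy sketch with a made-up probability vector:

```python
import numpy as np

# Hypothetical softmax output for one image
probs = np.array([0.01, 0.02, 0.90, 0.01, 0.01,
                  0.01, 0.01, 0.01, 0.01, 0.01])
label = 2  # integer label, no one-hot encoding needed

loss = -np.log(probs[label])  # -log(0.90) ≈ 0.105
print(loss)
```

The more confident the model is in the correct class, the closer the loss is to 0.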
```python
model.fit(
    x_train, y_train,
    epochs=5,
    validation_split=0.1
)
```

Expected accuracy:
- Train: ~98%
- Validation: ~97%
```python
model.evaluate(x_test, y_test)
```

Test accuracy ~97–98% ✅
| Layer | What it learns |
|---|---|
| Flatten | nothing (just reshapes pixels) |
| Dense 128 | edges, curves |
| Dense 64 | digit parts |
| Output | digit class |
This is deep feature learning 🧠🔥
If you increase layers too much:
- Train accuracy ↑
- Test accuracy ↓ ❌
Fix using:
- Dropout
- Regularization
- Less depth
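A minimal NumPy sketch of what (inverted) dropout does during training: randomly zero a fraction of activations and scale the survivors so the expected sum stays the same. The array values here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = np.ones(10)

rate = 0.5                          # fraction of units to drop
mask = rng.random(10) >= rate       # keep each unit with probability 1 - rate
dropped = activations * mask / (1 - rate)  # survivors scaled by 1/(1-rate)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and scaled)
```

In Keras this is a `tf.keras.layers.Dropout(0.5)` layer placed between the Dense layers; it is active only during training.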
✔ Difference between ANN & DNN
✔ Why depth matters
✔ Non-linearity
✔ Multi-class classification
✔ Softmax + crossentropy

