MNIST Neural Network Animation

[Animation panel: an input handwritten digit "3" is shown, with the model's output "Prediction: 3 (Confidence: 92%)"]

How Does the MNIST Neural Network Work?

This animation demonstrates how a neural network recognizes a handwritten digit. The model is a fully connected network with an input layer, three hidden layers, and an output layer:

  • Input Layer: the 28x28 pixel handwritten image is flattened into a vector of 784 values, one per pixel.
  • Hidden Layers: three layers of 512, 256, and 128 neurons respectively; each layer processes the output of the previous layer.
  • Output Layer: 10 neurons, one for each digit 0-9.

A ReLU activation is applied after each hidden layer, and the final layer uses log_softmax. log_softmax is simply the logarithm of the softmax function: softmax converts the raw outputs into probabilities between 0 and 1 that sum to 1, which makes it easy to read off which digit the network considers most likely. The short example below illustrates this.
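For illustration, here is a small self-contained sketch (the logit values are made up, not taken from the animation) showing that log_softmax is just the logarithm of softmax and that the resulting probabilities sum to 1:

import torch
import torch.nn.functional as F

# Hypothetical raw outputs (logits) for one image, one value per digit 0-9
logits = torch.tensor([[1.2, -0.5, 0.3, 4.1, 0.0, -1.3, 0.7, 2.2, -0.2, 0.9]])

probs = F.softmax(logits, dim=1)        # values in (0, 1) that sum to 1
log_probs = F.log_softmax(logits, dim=1)

print(probs.sum())                              # ~1.0
print(probs.argmax())                           # index 3 -> digit 3 is most likely
print(torch.allclose(log_probs, probs.log()))   # True: log_softmax == log(softmax)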

In the animation, active neurons are shown in red, and important connections are highlighted. Additionally, the probability of each digit is represented with a bar, and the digit with the highest probability is shown as the "prediction".

Python Code and Explanations

Below is a simple fully connected neural network (MNISTNet) written in PyTorch, with line-by-line explanations. The code takes 28x28 MNIST images (784 pixels) as input, passes them through three hidden layers, and returns log-probabilities for the 10 digit classes (0-9).

import torch
import torch.nn as nn           # PyTorch's basic neural network modules
import torch.nn.functional as F # Activation functions and other helper functions

class MNISTNet(nn.Module):
    """
    MNISTNet is a simple fully connected neural network architecture
    designed for the MNIST dataset.
    """
    def __init__(self):
        # Calling the initializer of the nn.Module class
        super(MNISTNet, self).__init__()

        # Input layer: 28x28 pixels -> vector of size 784
        # 1st Hidden layer: 784 -> 512
        self.fc1 = nn.Linear(784, 512)

        # 2nd Hidden layer: 512 -> 256
        self.fc2 = nn.Linear(512, 256)

        # 3rd Hidden layer: 256 -> 128
        self.fc3 = nn.Linear(256, 128)

        # Output layer: 128 -> 10 (digit classes from 0 to 9)
        self.fc4 = nn.Linear(128, 10)

        # Dropout layer to reduce overfitting (dropout rate 20%)
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        """
        In the forward pass method, we define the sequence in which data is passed through the network.
        x: MNIST images in the shape [batch_size, 1, 28, 28]
        """
        # 1) Flatten the input data: [batch_size, 784]
        x = x.view(-1, 28 * 28)

        # 2) First layer + ReLU activation
        x = F.relu(self.fc1(x))
        # Apply dropout to randomly disable some neurons during training
        x = self.dropout(x)

        # 3) Second layer + ReLU activation
        x = F.relu(self.fc2(x))
        x = self.dropout(x)

        # 4) Third layer + ReLU activation
        x = F.relu(self.fc3(x))
        x = self.dropout(x)

        # 5) Output layer
        x = self.fc4(x)

        # 6) Convert the outputs to log-probabilities with log_softmax
        # dim=1: apply log_softmax across the 10 classes for each example
        return F.log_softmax(x, dim=1)
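
As a quick sanity check (not part of the original code), the model defined above can be instantiated and run on a random batch shaped like MNIST input:

model = MNISTNet()
model.eval()                              # disable dropout for inference

dummy = torch.randn(4, 1, 28, 28)         # a fake batch of 4 "images"
with torch.no_grad():
    log_probs = model(dummy)              # shape: [4, 10]

predictions = log_probs.argmax(dim=1)     # most likely digit for each image
print(log_probs.shape, predictions)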

Because the model already returns log-probabilities (via log_softmax), the natural loss function for training is nn.NLLLoss; nn.CrossEntropyLoss expects raw logits and applies log_softmax internally, so it would pair with a model that omits the final log_softmax. When training is complete, the model produces a "prediction" (the digit with the highest probability) for each input image. A minimal training sketch follows.
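
The sketch below shows one way such a training loop could look. It assumes torchvision is available for downloading the MNIST dataset, and the hyperparameters (learning rate, batch size, number of epochs) are illustrative choices, not values from the original text:

import torch
import torch.nn as nn
from torch import optim
from torchvision import datasets, transforms

# Convert images to [0, 1] tensors of shape [1, 28, 28]
transform = transforms.ToTensor()
train_data = datasets.MNIST("data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)

model = MNISTNet()
criterion = nn.NLLLoss()                      # pairs with log_softmax outputs
optimizer = optim.Adam(model.parameters(), lr=1e-3)

model.train()
for epoch in range(3):                        # a few epochs are enough for a demo
    for images, labels in train_loader:
        optimizer.zero_grad()
        log_probs = model(images)             # forward pass: [batch, 10] log-probabilities
        loss = criterion(log_probs, labels)   # negative log-likelihood of the true digit
        loss.backward()                       # backpropagation
        optimizer.step()                      # update the weights
    print(f"epoch {epoch + 1}: loss = {loss.item():.4f}")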