Gradient Descent Algorithm Visualization

This tutorial visualizes the gradient descent algorithm for finding the minimum of a quadratic function. The example demonstrates how gradient descent iteratively approaches the optimal solution.

[Interactive demo: a "Current Parameters" panel shows the current Weight (W), Loss L(W), Gradient dL/dW, and Iteration, and an "Iterations Log" table records Iteration, Weight (W), Loss L(W), Gradient dL/dW, and |dL/dW| for each step of the run.]

How Gradient Descent Works

Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient.

The Algorithm:

  1. Start with an initial parameter value (weight)
  2. Calculate the gradient (derivative) of the loss function with respect to the parameter
  3. Update the parameter by moving in the opposite direction of the gradient
  4. Repeat until convergence

Implementation in JavaScript:

// Function we want to minimize: L(W) = 5*(W-14)^2+10
function lossFunction(w) {
  return 5 * Math.pow(w - 14, 2) + 10;
}

// Derivative of the loss function: dL/dW = 10*(W-14)
function gradientFunction(w) {
  return 10 * (w - 14);
}

function gradientDescent(initialW, learningRate, maxIterations) {
  let w = initialW;
  let iterations = [];
  
  for (let i = 0; i < maxIterations; i++) {
    // Calculate current loss and gradient
    let loss = lossFunction(w);
    let gradient = gradientFunction(w);
    
    // Store current state
    iterations.push({
      iteration: i,
      w: w,
      loss: loss,
      gradient: gradient,
      absGradient: Math.abs(gradient)
    });
    
    // Stop if gradient is very small (convergence)
    if (Math.abs(gradient) < 0.001) {
      break;
    }
    
    // Update weight by moving in the opposite direction of the gradient
    w = w - learningRate * gradient;
  }
  
  return iterations;
}
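
The following call is a minimal sketch of how the function might be run; the starting weight of 0, learning rate of 0.05, and cap of 100 iterations are illustrative choices, not values fixed by the tutorial:

// Example run (illustrative values): start at W = 0 with learning rate 0.05
const history = gradientDescent(0, 0.05, 100);

// Print each logged step, mirroring the columns of the iterations log table
history.forEach(step => {
  console.log(
    `iter ${step.iteration}: W = ${step.w.toFixed(4)}, ` +
    `L(W) = ${step.loss.toFixed(4)}, dL/dW = ${step.gradient.toFixed(4)}`
  );
});

// The last logged weight should be very close to the true minimum at W = 14
console.log('Final weight:', history[history.length - 1].w);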
  

The Math Behind Our Example

In our example, we're minimizing the function:

L(W) = 5*(W-14)²+10

The derivative (gradient) is:

dL/dW = 10*(W-14)

For gradient descent, we update W using the rule:

W_new = W_old - learning_rate * gradient
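
As a worked example (the starting weight of 0 and learning rate of 0.05 are illustrative choices, matching the sketch above), the first update is:

W_new = 0 - 0.05 * 10*(0 - 14) = 0 - 0.05 * (-140) = 7

and the following steps move W from 7 to 10.5, then 12.25, and so on toward 14.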

This function has its minimum at W = 14, where the derivative equals zero:

Setting 10*(W-14) = 0 gives W = 14, where the loss takes its minimum value L(14) = 10.
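
One way to see why the iterates approach W = 14 is to substitute the gradient into the update rule:

W_new - 14 = (W_old - 14) - learning_rate * 10*(W_old - 14) = (1 - 10*learning_rate) * (W_old - 14)

So each iteration shrinks the distance to the minimum by the constant factor (1 - 10*learning_rate); with the illustrative learning rate of 0.05 used above, that factor is 0.5, matching the halving seen in the worked steps.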