Mathematical Derivation of Logistic Regression Coefficients

Logistic regression models the probability that a binary dependent variable \(y\) equals 1 as a non-linear (sigmoid) function of the independent variable(s) \(x\), and is typically used for binary classification problems. This relationship is expressed as:

\[ P(y=1|x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}} = \sigma(\beta_0 + \beta_1 x) \]

Here, \(\beta_0\) represents the intercept, \(\beta_1\) represents the slope (on the log-odds scale), and \(\sigma\) represents the sigmoid function. To find these parameters, we use the Maximum Likelihood method.
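As a concrete reference for the formulas that follow, here is a minimal Python sketch of the sigmoid and the resulting model probability (the names `sigmoid` and `predict_proba` are illustrative, not from any particular library):

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, beta0, beta1):
    """P(y = 1 | x) for a single-feature logistic regression model."""
    return sigmoid(beta0 + beta1 * x)

# Example: with beta0 = -3 and beta1 = 1, the model gives P(y=1 | x=3) = sigmoid(0) = 0.5
print(predict_proba(3.0, -3.0, 1.0))  # 0.5
```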

1. The Foundation of Logistic Regression: Likelihood Function

While linear regression uses the least squares method, logistic regression uses the Maximum Likelihood method. This method finds parameters that maximize the probability of observing the given data:

\[ L(\beta_0, \beta_1) = \prod_{i=1}^{n} P(y_i|x_i; \beta_0, \beta_1) \]

For each data point:

\[ P(y_i|x_i; \beta_0, \beta_1) = \begin{cases} P(y_i=1|x_i) & \text{if } y_i = 1 \\ 1 - P(y_i=1|x_i) & \text{if } y_i = 0 \end{cases} \]

This can be expressed with a single formula:

\[ P(y_i|x_i; \beta_0, \beta_1) = P(y_i=1|x_i)^{y_i} \cdot (1-P(y_i=1|x_i))^{1-y_i} \]

2. Log-Likelihood

To simplify mathematical calculations, we take the logarithm of the likelihood function to use sums instead of products:

\begin{align} \ell(\beta_0, \beta_1) &= \log L(\beta_0, \beta_1) \\ &= \sum_{i=1}^{n} \log P(y_i|x_i; \beta_0, \beta_1) \\ &= \sum_{i=1}^{n} \left[ y_i \log P(y_i=1|x_i) + (1-y_i) \log(1-P(y_i=1|x_i)) \right] \end{align}

Using the sigmoid function:

\begin{align} \ell(\beta_0, \beta_1) &= \sum_{i=1}^{n} \left[ y_i \log \sigma(z_i) + (1-y_i) \log(1-\sigma(z_i)) \right] \end{align}

Where \(z_i = \beta_0 + \beta_1 x_i\).
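Translated into code, the log-likelihood is just this sum over the data; a minimal sketch, assuming plain Python lists `x` and `y` of equal length:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(x, y, beta0, beta1):
    """Sum of y_i*log(sigma(z_i)) + (1 - y_i)*log(1 - sigma(z_i)) with z_i = beta0 + beta1*x_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(beta0 + beta1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

print(log_likelihood([1, 2, 3, 4, 5], [0, 0, 0, 1, 1], 0.0, 0.0))  # 5*log(0.5) ≈ -3.466
```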

3. Finding the Maximum Point

To maximize the log-likelihood function, we take partial derivatives with respect to \(\beta_0\) and \(\beta_1\) and set them equal to zero. Applying the chain rule together with the identity \(\sigma'(z) = \sigma(z)(1-\sigma(z))\), the derivatives simplify to:

Partial derivative with respect to \(\beta_0\):

\[ \frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{n} \left[ y_i - \sigma(z_i) \right] \]

Partial derivative with respect to \(\beta_1\):

\[ \frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^{n} \left[ x_i(y_i - \sigma(z_i)) \right] \]
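Both derivatives share the same "residual times feature" structure, which makes them straightforward to compute; a sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x, y, beta0, beta1):
    """Return (dl/dbeta0, dl/dbeta1) of the log-likelihood at the given parameters."""
    g0 = sum(yi - sigmoid(beta0 + beta1 * xi) for xi, yi in zip(x, y))
    g1 = sum(xi * (yi - sigmoid(beta0 + beta1 * xi)) for xi, yi in zip(x, y))
    return g0, g1
```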

4. Setting Derivatives to Zero

At the maximum point, these derivatives must be zero:

\[ \frac{\partial \ell}{\partial \beta_0} = 0 \Rightarrow \sum_{i=1}^{n} \left[ y_i - \sigma(z_i) \right] = 0 \]
\[ \frac{\partial \ell}{\partial \beta_1} = 0 \Rightarrow \sum_{i=1}^{n} \left[ x_i(y_i - \sigma(z_i)) \right] = 0 \]

Since these equations are non-linear, they don't have a closed-form solution. Therefore, numerical optimization methods are typically used.

5. Numerical Optimization: Gradient Ascent

We can use the Gradient Ascent method to solve these equations. This is an iterative algorithm that updates parameter values at each step:

\[ \beta_0^{(t+1)} = \beta_0^{(t)} + \alpha \cdot \frac{\partial \ell}{\partial \beta_0} \] \[ \beta_1^{(t+1)} = \beta_1^{(t)} + \alpha \cdot \frac{\partial \ell}{\partial \beta_1} \]

Here \(\alpha\) represents the learning rate.
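A minimal gradient-ascent loop, assuming a fixed learning rate and a fixed iteration count rather than a formal convergence test (`fit_gradient_ascent` is an illustrative name):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_gradient_ascent(x, y, alpha=0.1, n_iter=10_000):
    """Maximize the log-likelihood by repeatedly stepping along its gradient."""
    beta0, beta1 = 0.0, 0.0
    for _ in range(n_iter):
        residuals = [yi - sigmoid(beta0 + beta1 * xi) for xi, yi in zip(x, y)]
        beta0 += alpha * sum(residuals)
        beta1 += alpha * sum(xi * r for xi, r in zip(x, residuals))
    return beta0, beta1
```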

6. Newton-Raphson Method

For faster convergence, the Newton-Raphson method can be used. This method also uses second-order derivatives (the Hessian matrix):

\[ H = \begin{bmatrix} \frac{\partial^2 \ell}{\partial \beta_0^2} & \frac{\partial^2 \ell}{\partial \beta_0 \partial \beta_1} \\ \frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_0} & \frac{\partial^2 \ell}{\partial \beta_1^2} \end{bmatrix} \]

Second-order derivatives:

\[ \frac{\partial^2 \ell}{\partial \beta_0^2} = -\sum_{i=1}^{n} \sigma(z_i)(1-\sigma(z_i)) \] \[ \frac{\partial^2 \ell}{\partial \beta_1^2} = -\sum_{i=1}^{n} x_i^2 \sigma(z_i)(1-\sigma(z_i)) \] \[ \frac{\partial^2 \ell}{\partial \beta_0 \partial \beta_1} = \frac{\partial^2 \ell}{\partial \beta_1 \partial \beta_0} = -\sum_{i=1}^{n} x_i \sigma(z_i)(1-\sigma(z_i)) \]

Newton-Raphson update:

\[ \begin{bmatrix} \beta_0^{(t+1)} \\ \beta_1^{(t+1)} \end{bmatrix} = \begin{bmatrix} \beta_0^{(t)} \\ \beta_1^{(t)} \end{bmatrix} - H^{-1} \begin{bmatrix} \frac{\partial \ell}{\partial \beta_0} \\ \frac{\partial \ell}{\partial \beta_1} \end{bmatrix} \]
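For the two-parameter case, one Newton-Raphson step can be written out explicitly by inverting the 2x2 Hessian by hand; a sketch:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_step(x, y, beta0, beta1):
    """One Newton-Raphson update of (beta0, beta1) using the gradient and 2x2 Hessian above."""
    g0 = g1 = 0.0
    h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(beta0 + beta1 * xi)
        w = p * (1 - p)                  # sigma(z_i) * (1 - sigma(z_i))
        g0 += yi - p
        g1 += xi * (yi - p)
        h00 -= w
        h01 -= xi * w
        h11 -= xi * xi * w
    det = h00 * h11 - h01 * h01          # determinant of the 2x2 Hessian
    # beta_new = beta_old - H^{-1} * gradient, using the explicit 2x2 inverse
    step0 = (h11 * g0 - h01 * g1) / det
    step1 = (h00 * g1 - h01 * g0) / det
    return beta0 - step0, beta1 - step1
```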

7. Multivariate Logistic Regression

When there are multiple independent variables, the model is extended as follows:

\[ P(y=1|\textbf{x}) = \sigma(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p) = \sigma(\beta_0 + \textbf{x}^T \boldsymbol{\beta}) \]

Here \(\textbf{x} = [x_1, x_2, \ldots, x_p]^T\) and \(\boldsymbol{\beta} = [\beta_1, \beta_2, \ldots, \beta_p]^T\) are vectors.

The log-likelihood function and its derivatives can be extended similarly.
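In matrix form the gradient keeps the same structure; a vectorized NumPy sketch, assuming the intercept is absorbed by prepending a column of ones to the design matrix `X` (so `beta[0]` plays the role of \(\beta_0\)):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood_grad(X, y, beta):
    """Gradient of the log-likelihood for design matrix X of shape (n, p+1) and labels y of shape (n,)."""
    p = sigmoid(X @ beta)    # predicted probabilities for all n observations
    return X.T @ (y - p)     # one component per coefficient, including the intercept
```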

8. Step-by-Step Calculation with an Example

Let's perform a step-by-step gradient-ascent calculation with a small dataset of five observations, \(x = (1, 2, 3, 4, 5)\) with labels \(y = (0, 0, 0, 1, 1)\):

Step 1: Set initial parameters

\(\beta_0^{(0)} = 0\), \(\beta_1^{(0)} = 0\)

Step 2: Calculate probabilities for the first iteration

For each \(x_i\), calculate \(z_i = \beta_0 + \beta_1 x_i\) and \(P(y_i=1|x_i) = \sigma(z_i)\):

\(z_1 = 0 + 0 \cdot 1 = 0\), \(P(y_1=1|x_1) = \sigma(0) = 0.5\)

\(z_2 = 0 + 0 \cdot 2 = 0\), \(P(y_2=1|x_2) = \sigma(0) = 0.5\)

\(z_3 = 0 + 0 \cdot 3 = 0\), \(P(y_3=1|x_3) = \sigma(0) = 0.5\)

\(z_4 = 0 + 0 \cdot 4 = 0\), \(P(y_4=1|x_4) = \sigma(0) = 0.5\)

\(z_5 = 0 + 0 \cdot 5 = 0\), \(P(y_5=1|x_5) = \sigma(0) = 0.5\)

Step 3: Calculate derivatives

\(\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{5} [y_i - \sigma(z_i)] = (0-0.5) + (0-0.5) + (0-0.5) + (1-0.5) + (1-0.5) = -0.5\)

\(\frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^{5} [x_i(y_i - \sigma(z_i))] = 1(0-0.5) + 2(0-0.5) + 3(0-0.5) + 4(1-0.5) + 5(1-0.5) = -0.5 - 1 - 1.5 + 2 + 2.5 = 1.5\)

Step 4: Update parameters (with α = 0.1)

\(\beta_0^{(1)} = \beta_0^{(0)} + 0.1 \cdot \frac{\partial \ell}{\partial \beta_0} = 0 + 0.1 \cdot (-0.5) = -0.05\)

\(\beta_1^{(1)} = \beta_1^{(0)} + 0.1 \cdot \frac{\partial \ell}{\partial \beta_1} = 0 + 0.1 \cdot 1.5 = 0.15\)

This process is repeated until convergence is achieved.
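The first iteration above can be reproduced directly in code with the dataset \(x = (1, \dots, 5)\), \(y = (0, 0, 0, 1, 1)\) used in this example:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x = [1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 1]
beta0, beta1, alpha = 0.0, 0.0, 0.1

# First-iteration gradients: every sigma(z_i) = sigma(0) = 0.5
g0 = sum(yi - sigmoid(beta0 + beta1 * xi) for xi, yi in zip(x, y))         # -0.5
g1 = sum(xi * (yi - sigmoid(beta0 + beta1 * xi)) for xi, yi in zip(x, y))  # 1.5

beta0, beta1 = beta0 + alpha * g0, beta1 + alpha * g1
print(round(g0, 4), round(g1, 4), round(beta0, 4), round(beta1, 4))  # -0.5 1.5 -0.05 0.15
```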

Final approximate values (after many more iterations)

\(\beta_0 \approx -3.0\), \(\beta_1 \approx 1.0\)

Logistic regression model

\(P(y=1|x) = \frac{1}{1 + e^{-(-3.0 + 1.0x)}}\)

This model predicts a low probability (close to 0) for small \(x\) values and a high probability (close to 1) for large \(x\) values. The decision threshold, where the predicted probability equals 0.5, lies at \(x = -\beta_0/\beta_1 \approx 3\), which aligns with our dataset: the labels switch from 0 to 1 between \(x = 3\) and \(x = 4\).
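A quick numerical check of this behavior with the approximate coefficients:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for x in [1, 2, 3, 4, 5]:
    print(x, round(sigmoid(-3.0 + 1.0 * x), 3))
# Prints 0.119, 0.269, 0.5, 0.731, 0.881 for x = 1..5
```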

9. Geometric Interpretation of Logistic Regression

Logistic regression divides the feature space with a decision boundary that separates the two classes. In the single-variable case, this boundary consists of the points that satisfy the equation:

\[ \beta_0 + \beta_1 x = 0 \]

In the multivariate case:

\[ \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p = 0 \]

This is a hyperplane that linearly divides the feature space.
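For the single-variable model from the worked example, the boundary is simply the root of \(\beta_0 + \beta_1 x = 0\):

```python
beta0, beta1 = -3.0, 1.0      # approximate coefficients from the worked example
x_boundary = -beta0 / beta1   # the point where P(y = 1 | x) = 0.5
print(x_boundary)             # 3.0
```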

10. Regularization

To prevent overfitting, a penalty term can be added to the log-likelihood function:

L2 Regularization (Ridge):

\[ \ell_{ridge}(\boldsymbol{\beta}) = \ell(\boldsymbol{\beta}) - \lambda \sum_{j=1}^{p} \beta_j^2 \]

L1 Regularization (Lasso):

\[ \ell_{lasso}(\boldsymbol{\beta}) = \ell(\boldsymbol{\beta}) - \lambda \sum_{j=1}^{p} |\beta_j| \]

Here \(\lambda\) is the regularization parameter, and larger values provide more regularization.
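In gradient ascent, the penalty simply adds a term to each coefficient's update. A sketch of the L2 (ridge) gradient for the single-feature model; note that the penalty sum starts at \(j = 1\), so the intercept \(\beta_0\) is not penalized:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ridge_gradient(x, y, beta0, beta1, lam):
    """Gradient of the penalized log-likelihood l(beta0, beta1) - lam * beta1**2."""
    g0 = sum(yi - sigmoid(beta0 + beta1 * xi) for xi, yi in zip(x, y))
    g1 = sum(xi * (yi - sigmoid(beta0 + beta1 * xi)) for xi, yi in zip(x, y))
    return g0, g1 - 2 * lam * beta1   # d/dbeta1 of -lam*beta1^2 is -2*lam*beta1
```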

Conclusion

Logistic regression is a powerful statistical method for classification problems. Unlike linear regression, it works within a probability framework and estimates parameters using the Maximum Likelihood method.

Despite its simplicity, this method forms a fundamental building block in the field of machine learning and is an important step in understanding more complex algorithms.