Mathematical Derivation of Linear Regression Coefficients
A linear regression model describes the relationship between a dependent variable \(y\) and an independent variable \(x\) with the straight line that best represents the given dataset. This relationship is expressed as:
\[ y = \beta_0 + \beta_1 x \]
Here, \(\beta_0\) represents the intercept (where the line crosses the y-axis) and \(\beta_1\) represents the slope. To find these parameters, we use the Ordinary Least Squares method.
1. The Foundation of Least Squares Method: Error Function
This method aims to minimize the sum of squared differences between the actual \(y\) values and the model's predictions. This sum is called the error function or cost function:
\[ J(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2 \]
Where:
- \(y_i\) : Actual \(y\) values
- \(x_i\) : Independent variable values
- \(n\) : Number of data points
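To make the cost function concrete, here is a minimal Python sketch of \(J\); the function name `sum_squared_error`, the use of NumPy, and the small dataset are illustrative choices rather than part of the derivation:

```python
import numpy as np

def sum_squared_error(beta0, beta1, x, y):
    """Compute J(beta0, beta1): the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (beta0 + beta1 * x)
    return np.sum(residuals ** 2)

# Evaluate J for two candidate lines on a tiny dataset.
x = [1, 2, 3, 4, 5]
y = [2, 3.5, 5, 6.2, 8]
print(sum_squared_error(0.0, 1.0, x, y))    # a rough guess -> larger J
print(sum_squared_error(0.53, 1.47, x, y))  # the OLS solution found later -> smaller J
```

The derivation below finds the pair \(\beta_0, \beta_1\) that makes this value as small as possible.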
2. Finding the Minimum Point
To minimize the error function, we take partial derivatives with respect to \(\beta_0\) and \(\beta_1\) and set them equal to zero:
Partial derivative with respect to β₀:
\[ \frac{\partial J}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) \]
Partial derivative with respect to β₁:
\[ \frac{\partial J}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i(y_i - \beta_0 - \beta_1 x_i) \]
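As a quick sanity check on these expressions, the sketch below compares the analytic gradient with a central finite-difference approximation; the helper names, the test point, and the step size `1e-6` are arbitrary illustrative choices, and NumPy is assumed:

```python
import numpy as np

def cost(beta0, beta1, x, y):
    return np.sum((y - (beta0 + beta1 * x)) ** 2)

def analytic_gradient(beta0, beta1, x, y):
    r = y - (beta0 + beta1 * x)      # residuals
    dJ_db0 = -2.0 * np.sum(r)        # dJ/d(beta0)
    dJ_db1 = -2.0 * np.sum(x * r)    # dJ/d(beta1)
    return dJ_db0, dJ_db1

def numeric_gradient(beta0, beta1, x, y, h=1e-6):
    d0 = (cost(beta0 + h, beta1, x, y) - cost(beta0 - h, beta1, x, y)) / (2 * h)
    d1 = (cost(beta0, beta1 + h, x, y) - cost(beta0, beta1 - h, x, y)) / (2 * h)
    return d0, d1

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3.5, 5, 6.2, 8])
print(analytic_gradient(0.1, 0.5, x, y))
print(numeric_gradient(0.1, 0.5, x, y))   # should agree to several decimal places
```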
3. Setting Derivatives to Zero
At the minimum, both derivatives must equal zero. Because \(J\) is a convex quadratic function of \(\beta_0\) and \(\beta_1\), the point where they vanish is the global minimum rather than a maximum or a saddle point.
Equation for β₀:
\[ \frac{\partial J}{\partial \beta_0} = 0 \Rightarrow \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0 \]
Let's rearrange this equation:
\[ \sum_{i=1}^{n} y_i - n\beta_0 - \beta_1 \sum_{i=1}^{n} x_i = 0 \]
\[ n\beta_0 = \sum_{i=1}^{n} y_i - \beta_1 \sum_{i=1}^{n} x_i \]
\[ \beta_0 = \frac{\sum_{i=1}^{n} y_i}{n} - \beta_1 \frac{\sum_{i=1}^{n} x_i}{n} \]
Using mean values:
\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]
Equation for β₁:
\[ \frac{\partial J}{\partial \beta_1} = 0 \Rightarrow \sum_{i=1}^{n} x_i(y_i - \beta_0 - \beta_1 x_i) = 0 \]
Expanding this equation:
\[ \sum_{i=1}^{n} x_i y_i - \beta_0 \sum_{i=1}^{n} x_i - \beta_1 \sum_{i=1}^{n} x_i^2 = 0 \]
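Together with the earlier equation for \(\beta_0\), this gives a 2×2 linear system in \(\beta_0\) and \(\beta_1\) (the normal equations). Before carrying out the substitution in the next step, it may help to see that the system can also be solved directly; the sketch below does this with NumPy's linear solver on a small illustrative dataset:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3.5, 5, 6.2, 8])
n = len(x)

# Normal equations in matrix form:
#   [ n        sum(x)   ] [beta0]   [ sum(y)   ]
#   [ sum(x)   sum(x^2) ] [beta1] = [ sum(x*y) ]
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

beta0, beta1 = np.linalg.solve(A, b)
print(beta0, beta1)  # should match the closed-form results derived below
```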
4. Solving the System of Equations
Let's substitute the formula we found for \(\beta_0\) into the equation for \(\beta_1\):
\[ \sum_{i=1}^{n} x_i y_i - (\bar{y} - \beta_1 \bar{x}) \sum_{i=1}^{n} x_i - \beta_1 \sum_{i=1}^{n} x_i^2 = 0 \]
Let's continue rearranging:
\[ \sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i + \beta_1 \bar{x} \sum_{i=1}^{n} x_i - \beta_1 \sum_{i=1}^{n} x_i^2 = 0 \]
\[ \sum_{i=1}^{n} x_i y_i - \bar{y} \sum_{i=1}^{n} x_i = \beta_1 \sum_{i=1}^{n} x_i^2 - \beta_1 \bar{x} \sum_{i=1}^{n} x_i \]
Using the fact that \(\sum_{i=1}^{n} x_i = n\bar{x}\):
\[ \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} = \beta_1 \sum_{i=1}^{n} x_i^2 - \beta_1 n \bar{x}^2 \]
Now we can solve for \(\beta_1\):
\[ \beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2} \]
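A direct translation of this expression into code might look like the following sketch (NumPy is assumed and the function name is a hypothetical choice):

```python
import numpy as np

def slope_raw_sums(x, y):
    """beta1 = (sum(x*y) - n*xbar*ybar) / (sum(x**2) - n*xbar**2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    numerator = np.sum(x * y) - n * xbar * ybar
    denominator = np.sum(x ** 2) - n * xbar ** 2
    return numerator / denominator

print(slope_raw_sums([1, 2, 3, 4, 5], [2, 3.5, 5, 6.2, 8]))  # 1.47 for the example worked out below
```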
5. Deriving a Simpler Formulation
Let's transform this formula into a more understandable form. First, let's rearrange the numerator:
\begin{align}
\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} &= \sum_{i=1}^{n} x_i y_i - \bar{y}\sum_{i=1}^{n} x_i \\
&= \sum_{i=1}^{n} (x_i y_i - \bar{y}x_i) \\
&= \sum_{i=1}^{n} x_i(y_i - \bar{y})
\end{align}
Similarly for the denominator:
\begin{align}
\sum_{i=1}^{n} x_i^2 - n \bar{x}^2 &= \sum_{i=1}^{n} x_i^2 - \bar{x}\sum_{i=1}^{n} x_i \\
&= \sum_{i=1}^{n} (x_i^2 - \bar{x}x_i) \\
&= \sum_{i=1}^{n} x_i(x_i - \bar{x})
\end{align}
Let's take one more step and note that:
\begin{align}
\sum_{i=1}^{n} x_i(y_i - \bar{y}) &= \sum_{i=1}^{n} (x_i - \bar{x} + \bar{x})(y_i - \bar{y}) \\
&= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) + \bar{x}\sum_{i=1}^{n}(y_i - \bar{y})
\end{align}
The last term is zero because \(\sum_{i=1}^{n}(y_i - \bar{y}) = 0\). The same argument applied to the denominator gives \(\sum_{i=1}^{n} x_i(x_i - \bar{x}) = \sum_{i=1}^{n} (x_i - \bar{x})^2\), since \(\sum_{i=1}^{n}(x_i - \bar{x}) = 0\). As a result:
\[ \beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{Cov(x,y)}{Var(x)} \]
Here, \(Cov(x,y)\) is the covariance between \(x\) and \(y\) and \(Var(x)\) is the variance of \(x\); as long as both are computed with the same divisor (\(n\) or \(n-1\)), that divisor cancels in the ratio.
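The sketch below checks numerically that the raw-sum and centered forms agree, and that the covariance-over-variance ratio gives the same slope; NumPy is assumed, and `np.cov` and `np.var` are given the same `ddof` so that the divisor cancels:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3.5, 5, 6.2, 8])
n = len(x)
xbar, ybar = x.mean(), y.mean()

# Raw-sum and centered forms of the numerator and denominator.
num_raw = np.sum(x * y) - n * xbar * ybar
num_centered = np.sum((x - xbar) * (y - ybar))
den_raw = np.sum(x ** 2) - n * xbar ** 2
den_centered = np.sum((x - xbar) ** 2)
print(np.isclose(num_raw, num_centered), np.isclose(den_raw, den_centered))  # True True

# Covariance / variance form (same ddof in both, so the divisors cancel).
beta1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
print(beta1)  # same slope as the centered-sum ratio
```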
6. Final Formulas
Thus, with the ordinary least squares method, we obtain the following formulas for linear regression coefficients:
Slope (β₁):
\[ \beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{Cov(x,y)}{Var(x)} \]
Intercept (β₀):
\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]
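Putting the two formulas together, a minimal fitting routine could look like the sketch below; the function name `fit_simple_ols` is a hypothetical choice and NumPy is assumed:

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (beta0, beta1) for y ~ beta0 + beta1 * x via ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    beta1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    beta0 = ybar - beta1 * xbar
    return beta0, beta1

beta0, beta1 = fit_simple_ols([1, 2, 3, 4, 5], [2, 3.5, 5, 6.2, 8])
print(beta0, beta1)  # approximately 0.53 and 1.47 for the example below
```

Computing \(\beta_1\) first and then \(\beta_0\) from it mirrors the order of the derivation above.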
7. Step-by-Step Calculation with an Example
Let's perform a step-by-step calculation with a small dataset:
- x = [1, 2, 3, 4, 5]
- y = [2, 3.5, 5, 6.2, 8]
Step 1: Calculate the means
\(\bar{x} = \frac{1+2+3+4+5}{5} = 3\)
\(\bar{y} = \frac{2+3.5+5+6.2+8}{5} = 4.94\)
Step 2: Calculate β₁
First, the numerator:
\((1-3)(2-4.94) + (2-3)(3.5-4.94) + (3-3)(5-4.94) + (4-3)(6.2-4.94) + (5-3)(8-4.94)\)
\(= (-2)(-2.94) + (-1)(-1.44) + (0)(0.06) + (1)(1.26) + (2)(3.06)\)
\(= 5.88 + 1.44 + 0 + 1.26 + 6.12 = 14.7\)
Then, the denominator:
\((1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2\)
\(= 4 + 1 + 0 + 1 + 4 = 10\)
Therefore:
\(\beta_1 = \frac{14.7}{10} = 1.47\)
Step 3: Calculate β₀
\(\beta_0 = 4.94 - 1.47 \times 3 = 4.94 - 4.41 = 0.53\)
Step 4: Write the regression equation
\(y = 0.53 + 1.47x\)
This is the best-fitting line for our data in the least-squares sense. For each unit increase in \(x\), the model predicts an increase of about 1.47 units in \(y\), and at \(x = 0\) the predicted value of \(y\) is 0.53.
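As a cross-check on the hand calculation, NumPy's `np.polyfit` performs the same least-squares line fit when asked for a degree-1 polynomial; up to floating-point rounding it should reproduce the coefficients found above:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 3.5, 5, 6.2, 8])

beta1, beta0 = np.polyfit(x, y, deg=1)  # coefficients returned from highest degree down
print(f"y = {beta0:.2f} + {beta1:.2f}x")  # expected: y = 0.53 + 1.47x
```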
Conclusion
The Ordinary Least Squares method is the standard mathematical technique for finding the parameters of a linear regression. It minimizes the sum of squared errors, producing the line whose predictions deviate least, in the squared-error sense, from the observed values. The derivation above shows exactly where these formulas come from.
This method is one of the fundamental building blocks of statistical modeling and data analysis, and forms the foundation of many modern machine learning algorithms.