Least Squares Linear Regression

Linear models are one of the simplest kinds of Supervised Learning models. A linear model expresses the target output value in terms of a sum of weighted input variables.

For example, the expected market value of a house might be expressed as:

$$\hat{y_{price}}=212,000 + 109 X_{tax} - 2000 X_{age}$$

Then, a house with feature values $(x_{tax}, x_{age})$ = $(10000, 75)$ would have a predicted selling price of:

$$\hat{y_{price}}=212,000 + 109 \times 10,000 - 2000 \times 75 = 1,152,000$$
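This prediction is just a weighted sum, so it is easy to check in a few lines of Python (the coefficients and feature values below are simply the illustrative numbers from the example above):

# Illustrative coefficients from the example above
b = 212_000                  # intercept
w_tax, w_age = 109, -2000    # feature weights

# Feature values for the example house
x_tax, x_age = 10_000, 75

# Linear prediction: weighted sum of the inputs plus the intercept
y_price = b + w_tax * x_tax + w_age * x_age
print(y_price)  # 1152000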

Linear Regression Model

More precisely, a linear model consists of the following components.

Feature Vector

$$x = (x_0, x_1, …, x_n)$$

Predicted Output

$$\hat{y} = \hat{w_0} x_0 + \hat{w_1} x_1 + … + \hat{w_n} x_n + \hat{b}$$

Parameters to estimate

Feature weights/model coefficients: $$\hat{w} = (\hat{w_0}, …, \hat{w_n})$$

Constant bias term/intercept: $$\hat{b}$$
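In code, this prediction is just a dot product between the estimated weight vector and the feature vector, plus the bias term. A minimal NumPy sketch, reusing the illustrative house-price weights from above:

import numpy as np

w_hat = np.array([109.0, -2000.0])   # estimated feature weights (illustrative)
b_hat = 212_000.0                    # estimated bias/intercept (illustrative)
x = np.array([10_000.0, 75.0])       # feature vector for one instance

# y_hat = w_0*x_0 + w_1*x_1 + ... + w_n*x_n + b
y_hat = np.dot(w_hat, x) + b_hat
print(y_hat)  # 1152000.0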

A Linear Regression Model with one Variable (Feature)

Input Instance

$$\textbf{x} = (x_0)$$

Predicted Output

$$\hat{y}=\hat{w_0}x_0 + \hat{b}$$

Parameters to Estimate

Slope:

$$\hat{w_0}$$

Y-intercept (or “bias”):

$$\hat{b}$$

How are the linear regression parameters $w$ and $b$ estimated?

  • Parameters are estimated from training data.
  • There are many different ways to estimate $w$ and $b$:
    • Different methods correspond to different “fit” criteria, goals, and means of controlling model complexity.
  • The learning algorithm finds the parameters that optimize an objective function, typically by minimizing some kind of loss function of the predicted target values versus the actual target values.

Least-Squares Linear Regression (“Ordinary least-squares”)

  • Finds the $w$ and $b$ that minimize the sum of squared differences (RSS) between the predicted and actual target values over the training data; minimizing RSS is equivalent to minimizing the mean squared error of the linear model.
  • There are no parameters to control model complexity.
    • No matter the values of $w$ and $b$, the result will always be a straight line.

$$RSS(w,b)=\sum_{i=1}^N (y_i-(w \cdot x_i + b))^2$$
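The RSS objective is simple to evaluate directly for any candidate $w$ and $b$. A minimal one-feature sketch, using a tiny made-up dataset:

import numpy as np

def rss(w, b, X, y):
    """Residual sum of squares of a one-feature linear model on data (X, y)."""
    predictions = w * X + b
    return np.sum((y - predictions) ** 2)

# Tiny made-up dataset for illustration
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

print(rss(2.0, 1.0, X, y))  # RSS for the line y = 2x + 1

Least-squares regression chooses the $w$ and $b$ with the smallest value of this quantity over the training data.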

%matplotlib notebook

import numpy as np
import matplotlib
import pandas as pd
import sklearn

print('       Numpy version ' + str(np.__version__))
print('  Matplotlib version ' + str(matplotlib.__version__))
print('      Pandas version ' + str(pd.__version__))
print('     Sklearn version ' + str(sklearn.__version__))
       Numpy version 1.18.1
  Matplotlib version 3.1.2
      Pandas version 1.0.0
     Sklearn version 0.22.1
import matplotlib.pyplot as plt

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic one-feature regression dataset
X_R1, y_R1 = make_regression(n_samples=100, n_features=1,
                             n_informative=1, bias=150.0,
                             noise=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X_R1,
                                                    y_R1,
                                                    random_state=0)
linreg = LinearRegression().fit(X_train, y_train)

print('linear model coeff (w): {}'
      .format(linreg.coef_))
print('linear model intercept (b): {:.3f}'
      .format(linreg.intercept_))
print('R-squared score (training): {:.3f}'
      .format(linreg.score(X_train, y_train)))
print('R-squared score (test): {:.3f}'
      .format(linreg.score(X_test, y_test)))
linear model coeff (w): [45.70870465]
linear model intercept (b): 148.446
R-squared score (training): 0.679
R-squared score (test): 0.492
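Because ordinary least squares has a closed-form solution, the same parameters can also be recovered without the LinearRegression estimator, for example with np.linalg.lstsq. A minimal sketch, assuming X_train and y_train from the cell above; the results should agree with linreg.coef_ and linreg.intercept_:

import numpy as np

# Append a column of ones so the last coefficient acts as the intercept b
A = np.hstack([X_train, np.ones((X_train.shape[0], 1))])

# Solve the least-squares problem: minimize ||A @ params - y_train||^2
params, _, _, _ = np.linalg.lstsq(A, y_train, rcond=None)
w_ls, b_ls = params[:-1], params[-1]

print('closed-form w:', w_ls)  # should match linreg.coef_
print('closed-form b:', b_ls)  # should match linreg.intercept_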

The R-squared values above score the quality of the regression model, in the same way as in the K-nearest neighbors post.
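For regressors, scikit-learn's score method returns the coefficient of determination, $R^2 = 1 - RSS/TSS$, where TSS is the total sum of squares around the mean of the target. A short sketch computing the test-set value by hand, assuming the variables from the cells above:

import numpy as np

y_pred = linreg.predict(X_test)

rss = np.sum((y_test - y_pred) ** 2)          # residual sum of squares
tss = np.sum((y_test - y_test.mean()) ** 2)   # total sum of squares around the mean
r_squared = 1 - rss / tss

print('R-squared (test, by hand): {:.3f}'.format(r_squared))
# should match linreg.score(X_test, y_test) above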

plt.figure(figsize=(5,4))
plt.scatter(X_R1, y_R1, marker= 'o', s=50, alpha=0.8)
plt.plot(X_R1, linreg.coef_ * X_R1 + linreg.intercept_, 'r-')
plt.title('Least-squares linear regression')
plt.xlabel('Feature value (x)')
plt.ylabel('Target value (y)')
plt.show()

Linear models make strong assumptions about the structure of the data. Specifically, linear models assume that the target value can be predicted using a weighted sum of the input variables.