Regression Model Evaluation

There are several metrics we can use to evaluate regression models:

  • Mean Absolute Error
  • Mean Squared Error
  • R2 Score

Mean Absolute Error

The Mean Absolute Error, is the sum of the absolute values of the distances from the points to the line.

import numpy as np

from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 3, 4]).reshape(-1,1)
y = np.array([3, 4, 3, 4, 3]).reshape(-1,1)

classifier = LinearRegression(),y)

guesses = classifier.predict(X)

mean_absolute_error(y, guesses)

Mean Absolute Error can also be implemented manually, as follows:

def mae(actual, preds):
    return np.sum(abs(actual - preds))/len(actual)

mae(y, guesses)

Mean Squared Error

One problem with mean absolute error is that the absolute value function is not differentiable. This a problem if we intend to us methods such as gradient descent. To solve this problem, we use the more common Mean Squared Error.

Note that optimizing on the mean squared error will always lead to the same ‘best’ model as if you were to optimize on the r2 value. The mean squared error is by far the most used metric for optimization in regression problems.

MSE can be thought of as the average amount that you miss by across all the points.

from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression

classifier = LinearRegression(),y)

guesses = classifier.predict(X)

mean_squared_error(y, guesses)

Mean Squared Error can also be implemented manually, as follows:

def mse(actual, preds):
    return np.sum((actual - preds)**2)/len(actual)

mse(y, guesses)

R2 Score

The R2 Score is based on comparing our model to the simplest possible model: a horizontal line. Drawing this horizontal line is straightforward, as it only requires taking the average of the y-values of the points. Reasonable regression models will have mean-squared-error smaller than that of the simple model. We calculate the ratio of the former to the latter, and sutract the result from 1.

$$R2 = 1-\frac{regression_{mean\ squared\ error}}{horizontal\ line_{mean\ squared\ error}}$$

  • Bad model: R2 will be close to 0.
  • Good model: R2 will be close to 1.

Note that optimizing on the mean squared error will always lead to the same ‘best’ model as if you were to optimize on the r2 value.

The R2 value is frequently interpreted as the ‘amount of variability’ captured by a model.

from sklearn.metrics import r2_score

y_true = np.array([1, 2, 4]).reshape(-1,1)
y_pred = np.array([1.3, 2.5, 3.7]).reshape(-1,1)

r2_score(y_true, y_pred)

The R2 Score can also be implemented manually, as follows:

def r2(actual, preds):
    sse = np.sum((actual-preds)**2)
    sst = np.sum((actual-np.mean(actual))**2)
    return 1 - sse/sst

r2(y_true, y_pred)

Boston Housing Example

  • Linear Regression is one of very few machine learning models that should not be used for classification (predicting categories for data).
  • Logistic Regression, on the other hand, cannot be used to predict numeric categories.
  • Most sklearn models can be used for both regression and classification, including Decision Trees, Random Forest, and Adaptive Boosting.

Each of these model except logistic regression will be used to perform regression on the Boston Housing Dataset.

from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

boston = load_boston()
y =
X =

X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.33, random_state=42)

Import modules.

from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.linear_model import LinearRegression

Instantiate models.

decision_tree = DecisionTreeRegressor()
random_forest = RandomForestRegressor()
adaboost = AdaBoostRegressor()
linear = LinearRegression()

Fit data.,y_train),y_train),y_train),y_train);

Make predictions on the test data.

decision_tree_preds = decision_tree.predict(X_test)
random_forest_preds = random_forest.predict(X_test)
adaboost_preds = adaboost.predict(X_test)
linear_preds = linear.predict(X_test)

Print the metrics for each of these models.

for label, preds in [('Decision Tree', decision_tree_preds),
                     ('Random Forest', random_forest_preds),
                     ('Adaboost',      adaboost_preds),
                     ('Linear',        linear_preds)]:
    r2 = r2_score(y_test, preds)
    mse = mean_squared_error(y_test, preds)
    mae = mean_absolute_error(y_test, preds)

    print('\n  --- {} ---'.format(label))
    print('           R2 Score: {:.3}'.format(r2))
    print(' Mean Squared Error: {:.3}'.format(mse))
    print('Mean Absolute Error: {:.3}'.format(mae))
  --- Decision Tree ---
           R2 Score: 0.762
 Mean Squared Error: 18.0
Mean Absolute Error: 2.89

  --- Random Forest ---
           R2 Score: 0.864
 Mean Squared Error: 10.3
Mean Absolute Error: 2.12

  --- Adaboost ---
           R2 Score: 0.789
 Mean Squared Error: 16.0
Mean Absolute Error: 2.75

  --- Linear ---
           R2 Score: 0.726
 Mean Squared Error: 20.7
Mean Absolute Error: 3.15

The Random Forest regressor was the top performer across all metrics.