Quantifying the Value of a Binary Classifier

19 Jul 2018

In this example, false positives (predicting a customer will default, when in fact they would not) are assigned a cost of $2500. False negatives (predicting a customer will not default, when in fact they do) are assigned a cost of $5000. This enables selection of a threshold that minimizes the average cost per transaction.
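The threshold selection described above can be sketched as a sweep over candidate thresholds, keeping the one with the lowest average cost. This is only an illustration: the scores, labels, and candidate thresholds are made-up data, and the function names `average_cost` and `best_threshold` are hypothetical, not taken from the course spreadsheets. Only the two cost figures come from the text.

```python
# Costs from the text.
COST_FP = 2500  # predicted default, but the customer would not have defaulted
COST_FN = 5000  # predicted no default, but the customer did default

def average_cost(scores, labels, threshold):
    """Average cost per event when scores >= threshold are flagged as defaults."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return (fp * COST_FP + fn * COST_FN) / len(labels)

def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the lowest average cost."""
    return min(candidates, key=lambda t: average_cost(scores, labels, t))

# Hypothetical model scores (predicted probability of default) and outcomes
# (1 = defaulted, 0 = did not default). Illustration data only.
scores = [0.05, 0.10, 0.15, 0.25, 0.35, 0.40, 0.60, 0.80]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5]

threshold = best_threshold(scores, labels, candidates)
print(threshold, average_cost(scores, labels, threshold))
```

Because a false negative costs twice as much as a false positive, the sweep tends to favor lower thresholds than a plain accuracy criterion would.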

Training Data

Using the training data, the cost-minimizing threshold yields 35 true positives and, therefore, $50-35=15$ false negatives. It also yields 27 false positives. The total cost at this threshold becomes

$$(15)\left(\frac{\$5000}{\text{false negative}}\right)+(27)\left(\frac{\$2500}{\text{false positive}}\right)=\$142{,}500$$

$$\frac{\$142{,}500}{200\ \text{events}}\approx\$713\ \text{per event}$$

or $713 per “event.” The optimal threshold for the composite metric is 0.30.
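As a quick check of the arithmetic, the training-set cost can be reproduced in a few lines. This is a minimal sketch using only the counts and costs stated in the text; the helper name `total_cost` is a hypothetical choice.

```python
COST_FN = 5000  # cost of a false negative, from the text
COST_FP = 2500  # cost of a false positive, from the text

def total_cost(false_negatives, false_positives):
    """Total cost of the classifier's errors."""
    return false_negatives * COST_FN + false_positives * COST_FP

positives = 50        # actual defaulters in the training data
true_positives = 35   # defaulters caught at the chosen threshold
false_negatives = positives - true_positives
false_positives = 27
events = 200

train_total = total_cost(false_negatives, false_positives)
train_per_event = train_total / events  # the text rounds this up to $713
print(train_total, train_per_event)
```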

Testing Data

Performance is calculated using the testing data, since training data is never used to judge the effectiveness of a model due to the risk of “overfitting.” On the testing data, the model has an AUC of 0.77. At the previously determined threshold of 0.30, the model generates 30 true positives, 20 false negatives, and 33 false positives on the test data.

$$(20)\left(\frac{\$5000}{\text{false negative}}\right)+(33)\left(\frac{\$2500}{\text{false positive}}\right)=\$182{,}500$$

$$\frac{\$182{,}500}{200\ \text{events}}\approx\$913\ \text{per event}$$

The total cost works out to $182,500 or $913 per event.
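To see how much performance degrades from training to testing, the two sets of counts can be compared side by side. The sketch below uses only figures from the text; the dictionary layout and the helper names `cost_per_event` and `recall` are illustrative assumptions.

```python
COST_FN = 5000
COST_FP = 2500
EVENTS = 200  # events in each of the training and test sets, per the text

def cost_per_event(false_negatives, false_positives, events=EVENTS):
    """Average cost of classification errors per event."""
    total = false_negatives * COST_FN + false_positives * COST_FP
    return total / events

def recall(counts):
    """Fraction of actual defaulters that the model flags."""
    return counts["tp"] / (counts["tp"] + counts["fn"])

# Confusion-matrix counts at the 0.30 threshold, as reported in the text.
train = {"tp": 35, "fn": 15, "fp": 27}
test = {"tp": 30, "fn": 20, "fp": 33}

train_cost = cost_per_event(train["fn"], train["fp"])
test_cost = cost_per_event(test["fn"], test["fp"])
gap = test_cost - train_cost  # extra cost per event on unseen data

print(train_cost, test_cost, gap)
print(recall(train), recall(test))
```

The test-set cost is about $200 per event higher than the training-set cost, which is why the test figure is the one to use when judging the model.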

Baseline

The cost baseline is calculated using no applicant filtering whatsoever. Practically, this means all 200 people in the population are issued credit cards, and 50 of them then default. Effectively, those 50 are false negatives. There are no false positives, since the bank is not rejecting anyone in this scenario.

$$(50)\left(\frac{\$5000}{\text{false negative}}\right)+(0)\left(\frac{\$2500}{\text{false positive}}\right)=\$250{,}000$$

$$\frac{\$250{,}000}{200\ \text{events}}=\$1250\ \text{per event}$$

The bank's cost for issuing these cards therefore works out to $250,000, or $1250 per event.

Implementing this fairly simple binary classification scheme would therefore save the bank

$$\$1250-\$913$$

or $337 on every card-issuing transaction going forward.
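The comparison against the baseline can be reproduced as follows. This is a minimal sketch using the test-set counts and the no-filtering baseline from the text; the variable names are illustrative, and the relative saving is a derived figure rather than one stated in the notes.

```python
COST_FN = 5000
COST_FP = 2500
EVENTS = 200

# Baseline: nobody is rejected, so all 50 defaulters are false negatives.
baseline_total = 50 * COST_FN + 0 * COST_FP
# Model at the 0.30 threshold, evaluated on the test data.
model_total = 20 * COST_FN + 33 * COST_FP

baseline_per_event = baseline_total / EVENTS
model_per_event = model_total / EVENTS

# Unrounded, the saving is $337.50; the text subtracts the rounded $913
# and therefore reports $337.
savings_per_event = baseline_per_event - model_per_event
savings_fraction = savings_per_event / baseline_per_event

print(savings_per_event, savings_fraction)
```

On these figures the classifier removes roughly 27% of the baseline cost per card issued.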

Some content in these notes is taken from the spreadsheets listed below. They are accessible as part of the Mastering Data Analysis in Excel course on coursera.org, and licensed by Daniel Egger under CC BY-NC 4.0.

Binary-Performance-Metrics.xlsx
Review-of-AUC-for-ROC-Curve.xlsx
Forecasting-Soldier-Performance.xlsx
AUC_Calculator-and-Review-of-AUC-Curve.xlsx
Data_Final-Project.xlsx
Information-Gain-Calculator.xlsx

Some other content is taken from my notes on other aspects of the Coursera course.