Quantifying the Financial Value of a Binary Classification Scheme

In this ongoing example, false positives (predicting a customer will default, when in fact they would not) are assigned a cost of $2500. False negatives (predicting a customer will not default, when in fact they do) are assigned a cost of $5000. This enables selection of a threshold that minimizes the average cost per transaction.
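The threshold selection described above can be sketched in Python. The scores and labels below are made-up toy data, not the course data. Trying every observed score as a cutoff is one straightforward search strategy, not necessarily the one the course spreadsheet uses. A score at or above the threshold is treated as a prediction of default.

```python
# Costs from the example; the data in the usage lines are hypothetical.
FP_COST = 2500  # predicted default, but the customer would have repaid
FN_COST = 5000  # predicted no default, but the customer defaults


def average_cost(scores, labels, threshold):
    """Average cost per applicant when a score >= threshold is predicted to default."""
    fp = fn = 0
    for score, defaulted in zip(scores, labels):
        predicted_default = score >= threshold
        if predicted_default and not defaulted:
            fp += 1
        elif defaulted and not predicted_default:
            fn += 1
    return (fp * FP_COST + fn * FN_COST) / len(scores)


def best_threshold(scores, labels):
    """Try each observed score, plus one cutoff above all scores, and keep the cheapest."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: average_cost(scores, labels, t))


# Hypothetical toy data: model scores and whether each applicant defaulted.
scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
labels = [False, False, True, False, True, True]
t = best_threshold(scores, labels)
print(t, average_cost(scores, labels, t))
```

On this toy data the sweep picks 0.35, which yields one false positive and no false negatives. The `float("inf")` candidate predicts no defaults at all. That is the "approve everyone" situation used as the baseline later.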

Training Data

Using the training data, the cost-minimizing threshold results in 35 true positives and, therefore, $50-35=15$ false negatives. It also results in 27 false positives. Applying the cost metric, the total cost becomes

$$15 \times \$5000 + 27 \times \$2500 = \$142{,}500$$

or about $713 per “event” ($142,500 spread across 200 applicants). The optimal threshold for the composite metric is 0.30.
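The training-set arithmetic can be checked with a small helper. It assumes 50 actual defaulters and, as the per-event figure implies, 200 applicants in the training data. True positives and true negatives carry no cost in this example. The exact per-event cost is $712.50, which the text rounds to $713.

```python
def total_cost(fp, fn, fp_cost=2500, fn_cost=5000):
    """Total misclassification cost; correct predictions cost nothing in this example."""
    return fp * fp_cost + fn * fn_cost


defaulters = 50          # actual defaulters in the training data
n_events = 200           # applicants in the training data (assumed from the per-event figure)
tp, fp = 35, 27          # counts at the 0.30 threshold
fn = defaulters - tp     # 15 missed defaulters

cost = total_cost(fp, fn)
print(cost, cost / n_events)  # 142500 712.5
```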

Testing Data


Performance is calculated on the testing data. Training data is never used to judge the effectiveness of a model because of the risk of “overfitting.” On the testing data, the model has an AUC of 0.77. At the previously determined threshold of 0.30, it generates 30 true positives, 20 false negatives, and 33 false positives.

The total cost works out to

$$20 \times \$5000 + 33 \times \$2500 = \$182{,}500$$

or about $913 per event.
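The AUC of 0.77 quoted for the testing data can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter. A minimal pairwise sketch of that definition is shown below, using hypothetical toy data. The course spreadsheets may instead integrate the ROC curve with trapezoids, which gives the same value.

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (defaulter, non-defaulter) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not the course data set.
scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
labels = [False, False, True, False, True, True]
print(auc(scores, labels))  # 8 of 9 pairs ordered correctly
```

The double loop is quadratic, which is fine for a few hundred applicants. A rank-sum formulation scales better for large data sets.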

Baseline

The cost baseline is calculated with no applicant filtering whatsoever. In practice, this means all 200 applicants are issued credit cards, and the 50 who go on to default are never flagged. Effectively, those 50 are false negatives. There are no false positives, since the bank is not rejecting anyone in this scenario.

The bank’s cost for issuing these cards therefore works out to

$$50 \times \$5000 = \$250{,}000$$

or $1250 per event.

Implementing this fairly simple binary classification scheme would therefore save the bank

$$\$250{,}000 - \$182{,}500 = \$67{,}500$$

or roughly $337 ($1250 − $913) on every card-issuing transaction going forward.
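The baseline and the savings can be reproduced from the error counts. Treating "no filtering" as a classifier that never predicts default, all 50 defaulters become false negatives. The exact per-event saving is $337.50. The $337 above comes from subtracting the rounded per-event figures.

```python
FP_COST, FN_COST = 2500, 5000
N_EVENTS = 200  # applicants in the population


def cost(fp, fn):
    """Total misclassification cost for the given error counts."""
    return fp * FP_COST + fn * FN_COST


baseline = cost(fp=0, fn=50)   # approve everyone: every defaulter is missed
model = cost(fp=33, fn=20)     # testing-data counts at the 0.30 threshold
savings = baseline - model

print(baseline, model, savings)  # 250000 182500 67500
print(savings / N_EVENTS)        # 337.5
```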


Some content from this note was taken from the spreadsheets listed below. They are distributed as part of the Mastering Data Analysis in Excel course on coursera.org, and licensed by Daniel Egger under CC BY-NC 4.0.

  • Binary-Performance-Metrics.xlsx
  • Review-of-AUC-for-ROC-Curve.xlsx
  • Forecasting-Soldier-Performance.xlsx
  • AUC_Calculator-and-Review-of-AUC-Curve.xlsx
  • Data_Final-Project.xlsx
  • Information-Gain-Calculator.xlsx

Some other content is taken from my notes on the Coursera course “Mastering Data Analysis in Excel.” It is sponsored by Duke University and the course content is presented by Professor Daniel Egger.