Quantifying the Info Gain from a Classifier

The goal for this section is to quantify the information gain of this model and compare it on an apples-to-apples basis to a hypothetical competing model.

Model Characteristics

I previously established the following performance characteristics for this model:

true positive count=.150

true positive rate=.600

positive predictive value=.480

Information gain measures are discussed here. The Venn diagram defining the types of information is linked below.

Mutual Information

Mutual information between default and the test is calculated as follows. This is the entropy of the original base rate minus the conditional entropy of default given the test classification.

\[ I(X;Y) = H(X) - H(X \mid Y) \]

H(X), the entropy of the original base rate, is calculated as follows. The letters A through H, used in the formulae below, are as defined in the image here.

\[ H(X) = A \log_2 \frac{1}{A} + B \log_2 \frac{1}{B} \]

H(X)=0.8113
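The base-rate entropy can be checked with a few lines of Python. This is a minimal sketch: it assumes A = 0.25 and B = 0.75, the base rate implied by dividing the true positive count by the true positive rate listed earlier (0.150 / 0.600), since the image defining the letters is not reproduced in this text. The function name `entropy` is illustrative only.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution.

    Zero-probability outcomes contribute nothing and are skipped.
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


# Assumed base rate (not read from the image): 0.150 / 0.600 = 0.25.
A, B = 0.25, 0.75
h_x = entropy([A, B])
print(round(h_x, 4))  # 0.8113
```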

H(X|Y), the conditional entropy of default given the test classification, is calculated as

\[ H(X \mid Y) = H\!\left(\frac{E}{C}, \frac{G}{C}\right) C + H\!\left(\frac{F}{D}, \frac{H}{D}\right) D \]

\[ H(X \mid Y) = 0.7253 \]

Then, I(X;Y) is calculated as

\[ I(X;Y) = 0.8113 - 0.7253 = 0.0860 \]

Recall that this value, the mutual information or “information gain,” is measured in average bits per event.

Percent Information Gain (P.I.G.)

The Percent Information Gain (P.I.G.) is the ratio \(I(X;Y)/H(X)\), calculated as follows.

\[ \text{P.I.G.} = \frac{I(X;Y)}{H(X)} = \frac{0.0860}{0.8113} = 10.6\% \]
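The same ratio can be written as a small Python helper. This is only a sketch of the arithmetic above; the function name `percent_information_gain` is a hypothetical label, and the inputs are the rounded values quoted in this section.

```python
def percent_information_gain(mutual_info, base_entropy):
    """Mutual information expressed as a percentage of the base-rate entropy."""
    if base_entropy <= 0:
        raise ValueError("base_entropy must be positive")
    return 100.0 * mutual_info / base_entropy


pig = percent_information_gain(0.0860, 0.8113)
print(f"{pig:.1f}%")  # 10.6%
```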

Savings Per Bit

Dividing the savings-per-event value by the bits-per-event value just calculated gives a savings-per-bit value. This concept is powerful because it places a financial value on the information content of a model or data source.

\[ \text{savings-per-bit} = \frac{\text{savings-per-event}}{\text{bits-per-event}} \]

The savings-per-event value of $337 was calculated previously.

The bits-per-event value, or mutual information, of 0.0860 was calculated above.

Thus the savings-per-bit is given by

\[ \text{savings-per-bit} = \frac{\$337}{0.0860} = \$3919 \text{ per bit} \]
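As a hedged illustration, the division can be wrapped in a helper that guards against a model with no information content. The name `savings_per_bit` is hypothetical, and the inputs are the $337 and 0.0860 figures quoted above.

```python
def savings_per_bit(savings_per_event, bits_per_event):
    """Dollar savings attributable to each bit of mutual information."""
    if bits_per_event <= 0:
        # A model with zero mutual information has no meaningful per-bit value.
        raise ValueError("bits_per_event must be positive")
    return savings_per_event / bits_per_event


my_model_value = savings_per_bit(337, 0.0860)
print(f"${my_model_value:.0f} per bit")  # $3919 per bit
```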

Alternative Model Characteristics

A hypothetical competing model has the following performance characteristics. It outperforms my model.

true positive count=.180

true positive rate=.720

positive predictive value=.480

Mutual Information

\[ I(X;Y) = H(X) - H(X \mid Y) \]

\[ I(X;Y) = 0.8113 - 0.6908 = 0.1205 \]
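The alternative model's figures can be reproduced from its three stated characteristics. The sketch below is not the original calculation; it assumes the "true positive count" of 0.180 is a proportion of all events, and it derives the remaining cells of the 2x2 table from the true positive rate and positive predictive value. The function and variable names are illustrative.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)


def mutual_information(tp, fp, fn, tn):
    """Return (H(X), H(X|Y), I(X;Y)) in bits from the four joint proportions."""
    total = tp + fp + fn + tn
    tp, fp, fn, tn = (v / total for v in (tp, fp, fn, tn))
    h_x = entropy([tp + fn, fp + tn])   # entropy of the base rate
    pos, neg = tp + fp, fn + tn         # P(test positive), P(test negative)
    h_x_given_y = 0.0
    if pos > 0:
        h_x_given_y += pos * entropy([tp / pos, fp / pos])
    if neg > 0:
        h_x_given_y += neg * entropy([fn / neg, tn / neg])
    return h_x, h_x_given_y, h_x - h_x_given_y


# Assumed reconstruction of the alternative model's joint proportions.
tp = 0.180
base_rate = tp / 0.720    # true positive rate = tp / base_rate -> 0.25
flagged = tp / 0.480      # positive predictive value = tp / flagged -> 0.375
fn = base_rate - tp
fp = flagged - tp
tn = 1.0 - tp - fn - fp

h_x, h_x_given_y, mi = mutual_information(tp, fp, fn, tn)
print(f"H(X)={h_x:.4f}  H(X|Y)={h_x_given_y:.4f}  I(X;Y)={mi:.4f}")
```

Under these assumptions the three printed values agree with the 0.8113, 0.6908, and 0.1205 quoted above to four decimal places.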

Percent Information Gain (P.I.G.)

\[ \text{P.I.G.} = \frac{I(X;Y)}{H(X)} = \frac{0.1205}{0.8113} = 14.85\% \]

Savings Per Bit

The cost per event for this alternative schema is $838. The savings per event is calculated as $1250 − $838 = $412.

\[ \text{savings-per-bit} = \frac{\$412}{0.1205} = \$3419 \text{ per bit} \]

Points of Comparison

The following table establishes the important points of comparison between the competing models.

| Parameter | My Model | Alternative Model |
| --- | --- | --- |
| Mutual Information (bits per event) | 0.0860 | 0.1205 |
| Percent Information Gain | 10.6% | 14.9% |
| Cost per Event | $913 | $838 |
| Savings per Bit | $3919 | $3419 |

A few important differential comparisons are possible.

The incremental information gain of the alternative model over my model is

\[ 0.1205 - 0.0860 = 0.0345 \text{ bits per event} \]

If my model were available to an organization, the maximum price that the organization should be willing to pay for the alternative model is

\[ \$913 - \$838 = \$75 \text{ per event} \]

At this maximum break-even price per score, the incremental value per bit from the alternative model is

\[ \frac{\$75 \text{ per event}}{0.1205 \text{ bits per event}} = \$622 \text{ per bit} \]
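The three differential comparisons can be collected in one short Python sketch. It uses only the rounded figures from the table above, and the variable names are illustrative. Note that, as in the text, the final figure divides by the alternative model's full 0.1205 bits per event rather than by the incremental bits.

```python
# Figures taken from the comparison table.
mi_mine, mi_alt = 0.0860, 0.1205      # bits per event
cost_mine, cost_alt = 913, 838        # dollars per event

# Extra information the alternative model supplies per event.
incremental_bits = mi_alt - mi_mine

# Break-even price: the cost reduction the alternative model delivers.
max_price_per_event = cost_mine - cost_alt

# Value per bit at that break-even price, using the alternative's total bits.
value_per_bit = max_price_per_event / mi_alt

print(f"{incremental_bits:.4f} bits per event")   # 0.0345 bits per event
print(f"${max_price_per_event} per event")        # $75 per event
print(f"${value_per_bit:.0f} per bit")            # $622 per bit
```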