Uncertainty and Bayesian Logical Data Analysis

Professor Egger opens his Mastering Data Analysis in Excel course with a brief discussion of practical means for reducing the uncertainty inherent in making decisions based upon data analysis. This is my brief synopsis of his thoughts.

The purpose of data analysis is to reduce the uncertainty inherent in human decision making. As summarized in other notes, the goal of any business analysis project essentially boils down to one of three things: increasing revenue, maximizing profitability, or reducing risk. Since actions that do not help the business will not be undertaken, reducing uncertainty about the probable outcome of an action is always an improvement. In business, everyone is uncertainty averse; nonetheless, very few (possibly zero) business actions have risk-free, totally certain outcomes.

The best tack is to approach a given business decision with enough realism about uncertainty to avoid excessive risk, without overestimating uncertainty to the point of paralysis by analysis and excessive delay.

One of the problems Egger identifies is a general lack of accountability in data analytics work. By this he means that people outside the analytics group have no way to determine how much uncertainty reduction the data analytics team's answer actually offers. Generally speaking, the team itself doesn't know either.

As a practical means of reducing uncertainty, Egger recommends that data analytics groups measure uncertainty twice: first, when the team deems it has enough information to draw a good conclusion, and second, after the team has done additional work, using the best methods available or gathering additional data, to validate the initial conclusion. The result can then be presented to the relevant decision makers with the most realistic possible assessment of the remaining risk.

Quantifying uncertainty is the province of information theory. The specific methods, known as Bayesian logical data analysis, were developed largely by physicists building on Claude Shannon's definition of information, published in 1948. Bayesian logical data analysis underlies much of today's work in machine learning and artificial intelligence.
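To make the idea of quantifying uncertainty concrete, here is a minimal sketch (not from the course; the numbers are hypothetical) that measures uncertainty in bits using Shannon entropy, then shows how a Bayesian update on new evidence reduces it:

```python
# Sketch: uncertainty in bits via Shannon entropy, before and after
# a Bayesian update. All values below are illustrative, not from the course.
from math import log2

def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def bayes_update(prior, likelihood):
    """Posterior over hypotheses, given the likelihood of the observed
    evidence under each hypothesis."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Two equally likely hypotheses: maximum uncertainty for two outcomes.
prior = [0.5, 0.5]
print(entropy(prior))  # 1.0 bit

# Evidence four times more likely under the first hypothesis.
posterior = bayes_update(prior, [0.8, 0.2])
print(entropy(posterior))  # about 0.72 bits: uncertainty reduced
```

The drop in entropy (here roughly 0.28 bits) is one rigorous way to report how much uncertainty reduction an analysis actually delivered, which is the kind of accountability Egger calls for.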

Egger firmly believes that these relatively complex ideas should be taught earlier in the education of data analytics professionals, and he has designed his Excel course accordingly. It is organized around methods that create accountability in data science by providing rigorous, consistent means of quantifying uncertainty.

This content is taken from my notes on the Coursera course “Mastering Data Analysis in Excel.” It is sponsored by Duke University and the course content is presented by Professor Daniel Egger.