Probability Density Functions
Just as there are many ways of describing data sets, there are analogous ways of describing histogram representations of data.
These representations are termed “discrete” probability distributions, as distinct from “continuous” probability distributions, which are also known as “probability density functions” and are discussed later.
Histograms
A histogram is a way of visually representing a set of data. Specifically, it is a bar chart of the frequency with which data fall within certain ranges (or “bins”).
In general, there are two types of histograms. The first is called a “Frequency histogram” and has counts of occurrences on the Y-axis. The second is called a “Probability histogram” and has probabilities on the Y-axis. Examples of both types, generated from the same set of 125 data points, are shown below.
The data are a series of 125 values, the first five of which are: $-0.023, -0.095, -0.204, \ 0.004, -0.063$.
Bin | Low | High | Count | “Probability,” $p_i$ | Midpoint, $x_i$ |
---|---|---|---|---|---|
1 | -0.3 | -0.2 | 1 | 0.008 | -0.25 |
2 | -0.2 | -0.1 | 4 | 0.032 | -0.15 |
3 | -0.1 | 0 | 58 | 0.464 | -0.05 |
4 | 0 | 0.1 | 59 | 0.472 | 0.05 |
5 | 0.1 | 0.2 | 3 | 0.024 | 0.15 |
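As a concrete illustration, here is a minimal sketch in Python (assuming NumPy is available) that derives the probability and midpoint columns from the bin edges and counts in the table; the full 125-point data set is not reproduced here, so the counts are entered directly.

```python
import numpy as np

# Bin edges and counts taken from the table above.
edges = np.array([-0.3, -0.2, -0.1, 0.0, 0.1, 0.2])
counts = np.array([1, 4, 58, 59, 3])

n = counts.sum()                      # 125 data points in total
p = counts / n                        # "probability" of each bin
x = (edges[:-1] + edges[1:]) / 2      # bin midpoints x_i

print(p)  # [0.008 0.032 0.464 0.472 0.024]
print(x)  # [-0.25 -0.15 -0.05  0.05  0.15]
```

With the raw data in hand, `np.histogram(data, bins=edges)` would produce the same counts.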
Histogram Descriptions
The example used for the following calculations is the “Probability Histogram” with red bars, above.
$p_i$ is the probability associated with each bin, and $x_i$ is the midpoint of each bin. $\mu$ is the weighted mean of the bins, calculated as follows.
Mean
$$\text{mean} = \mu = \sum_{i=1}^{bin\ count} (p_i)(x_i)$$
In this case, it is calculated as follows.
$$\mu=(0.008)(-0.25)+\dots+(0.024)(0.15)=-0.0028$$
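A quick numerical check of this weighted mean, sketched in Python with NumPy:

```python
import numpy as np

p = np.array([0.008, 0.032, 0.464, 0.472, 0.024])  # bin probabilities p_i
x = np.array([-0.25, -0.15, -0.05, 0.05, 0.15])    # bin midpoints x_i

mu = np.sum(p * x)   # weighted mean of the histogram
print(round(mu, 4))  # -0.0028
```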
Variance
The variance of the histogram is calculated as follows. This parameter is also known as the 2nd moment about the mean. The standard deviation of the histogram can be calculated as the square root of the variance.
$$\text{variance}=\sum_{i=1}^{bin\ count}{p_i (x_i-\mu)^2}$$
For this histogram, the variance is 0.0041.
Skewness
Skewness is the 3rd moment about the mean.
$$skewness=\sum_{i=1}^{bin\ count} {p_i (x_i-\mu)^3}$$
For this histogram, the skewness is -0.00012.
Kurtosis
Kurtosis is the 4th moment about the mean.
$$\text{kurtosis}=\sum_{i=1}^{bin\ count}{p_i (x_i-\mu)^4}$$
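For this histogram, applying the same formula gives a kurtosis of approximately 0.000064. The sketch below (Python with NumPy) computes the variance, skewness, and kurtosis directly from the bin probabilities and midpoints.

```python
import numpy as np

p = np.array([0.008, 0.032, 0.464, 0.472, 0.024])  # bin probabilities p_i
x = np.array([-0.25, -0.15, -0.05, 0.05, 0.15])    # bin midpoints x_i
mu = np.sum(p * x)                                 # weighted mean, -0.0028

variance = np.sum(p * (x - mu) ** 2)  # 2nd moment about the mean
skewness = np.sum(p * (x - mu) ** 3)  # 3rd moment about the mean
kurtosis = np.sum(p * (x - mu) ** 4)  # 4th moment about the mean

print(variance)  # approximately 0.0041
print(skewness)  # approximately -0.00012
print(kurtosis)  # approximately 0.000064
```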
Probability Density Functions
Continuous probability distributions always have a total area of 1 under the curve. Expressed mathematically, this is written:
$$\int{f(x)}dx=1$$
The properties of continuous probability distributions are analogous to those of discrete distributions, with sums over bins replaced by integrals over the support:
$$\text{mean}=\int{f(x)xdx}$$
$$\text{variance}=\int{f(x)(x-\mu)^2dx}$$
$$\text{skewness}=\int{f(x)(x-\mu)^3dx}$$
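These integrals can be checked numerically. Here is a minimal sketch using SciPy's `quad`, with the standard normal density as a stand-in for $f(x)$:

```python
import numpy as np
from scipy.integrate import quad

# Example PDF: the standard normal density on (-inf, inf).
def f(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

area, _ = quad(f, -np.inf, np.inf)                   # total area, should be 1
mean, _ = quad(lambda x: f(x) * x, -np.inf, np.inf)  # should be 0
variance, _ = quad(lambda x: f(x) * (x - mean) ** 2, -np.inf, np.inf)  # should be 1

print(area, mean, variance)  # approximately 1.0, 0.0, 1.0
```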
Important and Common PDFs
Many phenomena described by data closely approximate certain distributions. When we are ignorant of the true distribution, there is often a particular distribution that should be chosen: the one with maximum entropy (or uncertainty) consistent with what is known.
Support Set: the set over which the distribution is defined.
Uniform Continuous Distribution
Support set: the interval from $a$ to $b$ on the real number line
$$f(x)=\frac{1}{b-a}$$
$$\text{mean}=\frac{a+b}{2}$$
$$\text{variance}=\frac{(b-a)^2}{12}$$
$$\text{skewness}=0$$
$$\text{entropy}=\log_2(b-a)$$
The uniform continuous distribution is the maximum entropy distribution for $a$ and $b$ finite and known.
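A small sanity check of these formulas, sketched in Python with NumPy and hypothetical bounds $a = 2$, $b = 5$:

```python
import numpy as np

a, b = 2.0, 5.0                             # hypothetical finite bounds
samples = np.random.uniform(a, b, 100_000)  # draws from the uniform distribution

print(samples.mean())   # approximately (a + b) / 2 = 3.5
print(samples.var())    # approximately (b - a)**2 / 12 = 0.75
print(np.log2(b - a))   # entropy in bits: log2(3) ≈ 1.585
```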
Uniform Discrete Distribution
Support set: the integers from $a$ to $b$
$$f(x)=\frac{1}{n},\quad x\in\{a,\ a+1,\ \dots,\ b\},\quad n=b-a+1$$
$$\text{mean}=\frac{a+b}{2}$$
$$\text{variance}=\frac{(b-a+1)^2-1}{12}$$
$$\text{skewness}=0$$
$$\text{entropy}=\log_2 n$$
This is the appropriate choice where something can take any value between a minimum and a maximum, but the values are discrete (i.e., must be represented by whole numbers).
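As a concrete example, a minimal sketch (Python with NumPy) for a fair six-sided die, i.e. $a = 1$, $b = 6$:

```python
import numpy as np

a, b = 1, 6                   # e.g. a fair six-sided die
values = np.arange(a, b + 1)  # the discrete outcomes a, a+1, ..., b
n = len(values)               # number of outcomes, b - a + 1

print(values.mean())  # (a + b) / 2 = 3.5
print(values.var())   # ((b - a + 1)**2 - 1) / 12 ≈ 2.9167
print(np.log2(n))     # entropy in bits: log2(6) ≈ 2.585
```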
Gaussian Continuous Probability Density Function
Support set: $(-\infty,\infty)$
$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\,e^{\frac{-(x-\mu)^2}{2\sigma^2}}$$
$$\text{mean}=\mu$$
$$\text{variance}=\sigma^2$$
$$\text{skewness}=0$$
$$\text{entropy}=2.05+\log_2\sigma$$
When a distribution can take any value on $(-\infty,\infty)$ and all that is known is the standard deviation $\sigma$ (or variance $\sigma^2$), this is the maximum entropy distribution. This means that if we have complete ignorance of a phenomenon, except for its variance, then we should use a Gaussian.
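The entropy expression above can be checked against the exact differential entropy of a Gaussian in bits, $\tfrac{1}{2}\log_2(2\pi e\sigma^2)$; here is a minimal sketch with a hypothetical $\sigma = 2$:

```python
import numpy as np

mu, sigma = 0.0, 2.0  # hypothetical parameters

# Exact differential entropy of a Gaussian in bits:
# 0.5 * log2(2 * pi * e * sigma^2) = 0.5 * log2(2 * pi * e) + log2(sigma),
# and 0.5 * log2(2 * pi * e) is approximately 2.047, hence the 2.05 above.
entropy_exact = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
entropy_approx = 2.05 + np.log2(sigma)

print(entropy_exact, entropy_approx)  # approximately 3.047 and 3.05
```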
Some other content is taken from my notes on other aspects of the Coursera course.