A Bird's-Eye View

Two Histories of Machine Learning

Modern machine learning combines two perspectives: that of statisticians and that of computer scientists.

Statisticians’ Approach

The statisticians’ perspective grew out of efforts to understand data. It relies on probability-based models of the world. For example, there are models that predict the answers given on a test, the number of accidents on a given stretch of road, and the frequency of words in a body of text. Such models have been used to describe many aspects of our daily lives, but they are simplifications of the real world, and their assumptions do not always hold.

Computer Scientists’ Approach

In the 1950s, computer scientists began to realize that hard-coding every decision a computer makes was becoming untenable. Instead, a new paradigm was developed in which computers looked at data and recognized patterns. This idea of recognizing patterns is what is now known as machine learning. Rather than relying on hard-coded rules, these new techniques used rules that were based on distributions. Over time, machine learning advanced, and these distributions evolved into extremely flexible and complex models.

These more recent, complex models outperform both the earlier hard-coded computational paradigm and the simpler statistical models. They have, however, also become opaque and difficult to interpret, even for the experts who design them.

Types of Machine Learning

Supervised Learning

In supervised learning, algorithms learn from labeled data. After studying the labeled data, the algorithms determine which label should be assigned to new data. Supervised learning can be further subdivided into classification, where the output is discrete, and regression, where the output is continuous. Supervised learning is the type of machine learning that has gained the most traction in business use cases.
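The split between classification and regression can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data (hours studied, pass/fail labels, exam scores) and the helper names `nearest_neighbor_label` and `fit_line` are hypothetical, and nearest-neighbor and least-squares fitting were picked simply because they are short.

```python
from statistics import mean


def nearest_neighbor_label(train_points, train_labels, query):
    """Classification: return the label of the closest training point."""
    distances = [abs(x - query) for x in train_points]
    return train_labels[distances.index(min(distances))]


def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar


# Hypothetical labeled data: hours studied, with a discrete and a continuous target.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
passed = ["fail", "fail", "pass", "pass", "pass"]   # discrete labels
scores = [52.0, 58.0, 65.0, 71.0, 79.0]             # continuous labels

# Classification: predict a discrete label for a new student.
predicted_label = nearest_neighbor_label(hours, passed, 4.4)

# Regression: predict a continuous score for the same student.
slope, intercept = fit_line(hours, scores)
predicted_score = slope * 4.4 + intercept

print(predicted_label, round(predicted_score, 2))
```

Both models learn from the same inputs; only the kind of label they are trained on, and therefore the kind of output they produce, differs.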

Unsupervised Learning

Unsupervised learning finds structure in data when there are no labels to train on. There are many applications for these techniques, including reducing a dataset down to only the most useful features, grouping similar items, and building music recommendation systems.
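As a rough sketch of grouping similar items without labels, the snippet below runs a one-dimensional k-means loop in plain Python. The choice of k-means, the function name `k_means_1d`, the starting centers, and the song-length data are all assumptions made for illustration.

```python
def k_means_1d(values, centers, iterations=10):
    """Group 1-D values around the given starting centers (k-means)."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data: song lengths in minutes.
song_lengths = [2.1, 2.4, 2.2, 6.8, 7.1, 7.4]
centers, clusters = k_means_1d(song_lengths, centers=[2.0, 3.0])

print(clusters)
```

No labels were supplied; the two groups (short and long songs) emerge from the distances between the values alone.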

Reinforcement Learning

The final type of machine learning is reinforcement learning. It is used to train algorithms that learn by taking actions and receiving feedback on those actions. The applications of reinforcement learning include self-driving cars and gaming agents like the ones recently developed to play Go and StarCraft II.
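The action-and-feedback loop can be illustrated with a very small example. The sketch below uses an epsilon-greedy agent on a multi-armed bandit, which is a simplified stand-in chosen here for brevity and is far simpler than the systems used for Go or StarCraft II. The function name `run_bandit` and the reward probabilities are hypothetical.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn action values from feedback with an epsilon-greedy policy."""
    rng = random.Random(seed)
    estimates = [0.0] * len(reward_probs)  # current estimate of each action's value
    counts = [0] * len(reward_probs)       # how often each action was taken
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: take the action that currently looks best.
            action = max(range(len(estimates)), key=lambda a: estimates[a])
        # Feedback from the environment: reward 1 with the action's probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Update the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


# Hypothetical environment: three actions with different chances of reward.
estimates, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])

print(best_action, counts)
```

The agent is never told which action is best. It discovers this from the rewards it receives, which is the defining feature of reinforcement learning.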

Deep Learning

Deep learning has outperformed other machine learning approaches on many prediction tasks. It can be used for all three types of learning outlined above: supervised, unsupervised, and reinforcement. The recent rapid growth of this technique makes sense, because over time we have cared less about how we make predictions and more about the accuracy of the predictions we make. In deep learning, we rarely have a complete understanding of how or why certain predictions are made.

Deep learning, unfortunately, cannot be the answer to every problem because of three barriers:

  1. Deep learning requires a lot of data.
  2. Deep learning requires a lot of computing power. To reach superhuman performance at Go, Google DeepMind’s AlphaGo required 1,202 CPUs and 176 GPUs.
  3. Deep learning rarely makes it fully clear how or why a decision was made.


Scikit-learn is one of the most popular open-source libraries for supervised and unsupervised learning. With this library, a wide range of well-established techniques is available to all data scientists at no cost. Scikit-learn is likely to remain widely used for the foreseeable future.
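As a brief sketch of what this looks like in practice, the snippet below fits one supervised and one unsupervised model with scikit-learn. It assumes scikit-learn is installed; the toy data and the choice of `KNeighborsClassifier` and `KMeans` are illustrative assumptions, not a recommendation.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups of 2-D points.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y = ["small", "small", "small", "large", "large", "large"]

# Supervised: learn from the labels, then label new points.
classifier = KNeighborsClassifier(n_neighbors=3)
classifier.fit(X, y)
predicted = classifier.predict([[1.1, 0.9], [5.1, 5.0]])

# Unsupervised: ignore the labels and group the points by similarity.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)

print(list(predicted), list(cluster_ids))
```

Both estimators follow the same fit-then-predict pattern, which is a large part of why the library is easy to pick up.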

Topics of Study in the Intro to ML Nanodegree:

Supervised Learning

Deep Learning

  • Intro to Neural Networks
  • Implementing Gradient Descent
  • Training Neural Networks
  • Deep Learning with PyTorch

Unsupervised Learning

  • Clustering
  • Hierarchical and Density Based Clustering
  • Gaussian Mixture Models and Cluster Validation
  • Dimensionality Reduction and PCA
  • Random Projection and ICA