ML Definition and Survey

24 Aug 2019

Machine Learning is defined as Computational Statistics, or more broadly as the notion of building computational artifacts that learn over time based on experience, plus all the math, science, engineering, and computing that goes into building those artifacts.

Supervised Learning

The problem of taking labeled datasets and gleaning information from them in order to label new datasets, or
“function approximation,” whereby a set of inputs and outputs is used to generalize to a broader set of inputs than the original set provides.

The fundamental problem of machine learning is generalization, or induction, whereby a set of specifics is abstracted to form a general rule. This is the opposite of deduction, whereby a general rule is used to infer a group of specifics.
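
To make "function approximation" concrete, here is a minimal sketch. It is not from the lectures: the training data, the choice of a straight-line model, and the names fit_line and predict are all assumptions made for illustration. A handful of input/output pairs is used to induce a general rule, which is then applied to an input that was not in the original set.

```python
# Hypothetical training set: inputs paired with known outputs (labels).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (an assumed model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the induced rule to any input, seen or unseen."""
    slope, intercept = model
    return slope * x + intercept


# Induction: specifics (the pairs) are abstracted into a general rule.
model = fit_line(train_x, train_y)

# Generalization: the rule is applied to an input outside the training set.
print(predict(model, 10.0))
```

Whether the prediction for 10.0 is any good depends entirely on whether a straight line was the right kind of rule to induce, which is exactly the generalization problem.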

Unsupervised Learning

Unlike in Supervised Learning, labels are not available. Only inputs are provided, and the task is to derive structure just by examining the relationships among the inputs themselves.
- Specific example: a child calls all 4-legged animals “dogs,” even if they are horses or cows, but knows that trees are not “dogs.”
- Specific example: dividing a crowd of people into logical groupings (ethnicity, sex, facial hair).
The result is a concise, compact summarization of the input: number of males/females, number with facial hair, etc.
Supervised learning = “Approximation”
Unsupervised learning = “Description” or “Summarization”
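
As a rough illustration of "description" or "summarization," the sketch below groups unlabeled numbers using a tiny two-group version of k-means. The lectures do not name k-means at this point; the algorithm, the height data, and the function name two_means are assumptions chosen only to show structure being derived from inputs alone.

```python
# Hypothetical unlabeled inputs: heights in cm, with no labels attached.
heights = [150.0, 152.0, 155.0, 180.0, 183.0, 185.0]


def two_means(values, iterations=10):
    """Split 1-D values into two groups with a minimal k-means (k = 2)."""
    centers = [min(values), max(values)]  # simple deterministic start
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the nearer of the two centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of the values assigned to it.
        centers = [sum(g) / len(g) for g in groups]
    return centers, groups


centers, groups = two_means(heights)

# The compact summary of the input: a size and a typical value per group.
summary = [(len(g), round(c, 1)) for g, c in zip(groups, centers)]
print(summary)
```

Six raw numbers are reduced to two group sizes and two typical values, with no labels involved at any step.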

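Unsupervised Learning can be used to help do Supervised Learning better. For example, the compact description it produces can be given to a supervised learner in place of the raw inputs.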

Reinforcement Learning

Learning from delayed reward. In Supervised Learning, the feedback is available immediately, but in Reinforcement Learning, the feedback may come several steps after the decisions that the agent has made.
- In Tic Tac Toe, several moves must be made before there is feedback in the form of a win or a loss.
Reinforcement learning is, in some sense, harder, because nobody is telling the agent what to do. The agent has to “work it out on its own.”
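
The delayed-reward idea can be sketched in a few lines. This is a deliberately crude credit-assignment scheme, not an algorithm from the course: the game records, the move names, and the function credit_moves are hypothetical. Each move receives no feedback of its own; it is only credited afterwards with the final outcome of the game it appeared in.

```python
from collections import defaultdict

# Hypothetical game records: the moves one player made, then the final result.
# No individual move is scored; feedback arrives only when the game ends.
episodes = [
    (["center", "corner", "edge"], +1),   # win
    (["edge", "corner", "edge"], -1),     # loss
    (["center", "edge", "corner"], +1),   # win
]


def credit_moves(episodes):
    """Average the final outcome over every game in which a move was used."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for moves, outcome in episodes:
        for move in set(moves):   # count each move once per game
            totals[move] += outcome
            counts[move] += 1
    return {move: totals[move] / counts[move] for move in totals}


values = credit_moves(episodes)
best_move = max(values, key=values.get)
print(values, best_move)
```

Spreading the final result evenly over all moves is the simplest possible answer to the question of which decision deserves the credit, and it is part of why the agent has to work things out on its own.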

Overlap between the Types

Supervised, Unsupervised, and Reinforcement Learning can be joined together in interesting ways. Additionally, the distinctions between the types are not always as clear-cut as the definitions set forth above might lead you to believe. Ultimately, the problems addressed by the different branches of ML are all of the same type.

Specifically, all these types of problems can be formulated as some type of optimization.

Type	Optimization
Supervised Learning	Labels data well
Reinforcement Learning	Behavior scores well
Unsupervised Learning	Cluster scores well
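
One way to read the table is that each type of learning supplies a different scoring function, while the search for the best-scoring candidate stays the same. The sketch below assumes particular scores (accuracy, total reward, within-cluster spread) and toy candidates; none of these specifics come from the lectures, and all function names are hypothetical.

```python
def labeling_score(predict, examples):
    """Supervised: fraction of labeled examples the function labels correctly."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


def behavior_score(rewards):
    """Reinforcement: total reward collected by a behavior."""
    return sum(rewards)


def cluster_score(clusters):
    """Unsupervised: negative spread within clusters (tighter is better)."""
    spread = 0.0
    for cluster in clusters:
        center = sum(cluster) / len(cluster)
        spread += sum((v - center) ** 2 for v in cluster)
    return -spread


def optimize(candidates, score):
    """Every case reduces to picking the candidate with the highest score."""
    return max(candidates, key=score)


# Supervised: choose the threshold rule that labels the data best.
examples = [(1, False), (2, False), (6, True), (8, True)]
rules = {t: (lambda x, t=t: x > t) for t in (0, 4, 7)}
best_threshold = optimize(rules, lambda t: labeling_score(rules[t], examples))

# Reinforcement: choose the behavior whose rewards sum highest.
behaviors = {"cautious": [0, 0, 1], "greedy": [1, -3, 1]}
best_behavior = optimize(behaviors, lambda name: behavior_score(behaviors[name]))

# Unsupervised: choose the grouping with the tightest clusters.
groupings = [[[1, 2], [9, 10]], [[1, 9], [2, 10]]]
best_grouping = optimize(groupings, cluster_score)

print(best_threshold, best_behavior, best_grouping)
```

Only the score changes between the three cases; the call to optimize is identical, which is the sense in which the problems are the same type.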

Data

Data is king in machine learning. The course's professors identify as “computationalists” as opposed to “computer scientists,” because they:
- Work in a college of computing, not a department of computer science, and
- Believe in computing and computation as the “ultimate thing.”

Another fundamental shift associated with an ML (as opposed to computer science) perspective is that the focus is less on algorithms and more on the data. This is the difference between “AI people,” who focus on the former, and “ML people,” who focus on the latter. Another perspective is that algorithms and data are coequal parts of the ML process. Regardless, data has more primacy in ML than it does in traditional computer science.

Content is taken from my notes on the GA Tech course CS 7641, Machine Learning. Specifically, these notes are taken from the group of lectures "ML is the ROX."

For Fall 2019, CS 7641 is taught by Dr. Charles Isbell. The course content was originally created by Dr. Charles Isbell and Dr. Michael Littman.