Machine Learning is defined as
- Computational Statistics, or, more broadly,
- the building of computational artifacts that learn over time based on experience, plus all the math, science, engineering, and computing that go into building them.
Supervised Learning is the problem of taking labeled datasets and gleaning information from them in order to label new datasets.
It can also be described as “function approximation,” whereby a set of inputs and outputs is used to generalize to a broader set of inputs than was provided in the original set.
The fundamental problem of machine learning is generalization, or induction, whereby a set of specifics is abstracted to form a general rule. This contrasts with deduction, whereby a general rule is used to infer specifics.
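To make "function approximation" concrete, here is a minimal sketch that induces a general rule (a straight line) from a handful of specific input/output pairs and then applies it to an input that was not in the original set. The data, the choice of a linear model, and the names `fit_line` and `predict` are illustrative assumptions, not something the notes prescribe.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ slope * x + intercept (specifics -> general rule)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned rule to an input, possibly one never seen in training."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled data: inputs paired with known outputs.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]

model = fit_line(train_x, train_y)
print(predict(model, 10.0))  # generalizes to an input outside the training set
```

The fit step is the inductive part; calling `predict` with the learned rule is closer to deduction, since a general rule is being applied to a specific case.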
In Unsupervised Learning, labels are not available as they are in Supervised Learning. Only inputs are provided, and the task is to derive structure just by examining the relationships among the inputs themselves.
- Specific example: a child calls all 4-legged animals “dogs,” even if they are horses or cows, but knows that trees are not “dogs.”
- Specific example: dividing a crowd of people into logical groupings (ethnicity, sex, facial hair).
The result is a concise, compact summary of the input: the number of males and females, the number with facial hair, etc.
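As a rough illustration of deriving structure from inputs alone, the sketch below groups unlabeled numbers with a tiny one-dimensional k-means and then reports a compact summary (a center and a count per group). The heights, the starting centers, and the function name `kmeans_1d` are hypothetical choices made for this example; k-means is just one of many possible grouping techniques.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center,
    then move each center to the mean of its group, and repeat."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical unlabeled inputs (heights in cm); no labels are given.
heights_cm = [150, 152, 155, 180, 183, 185]

centers, groups = kmeans_1d(heights_cm, centers=[150.0, 185.0])

# The "description" of the data: one center and one count per group.
for center, group in zip(centers, groups):
    print(round(center, 1), len(group))
```

Nothing tells the algorithm what the groups mean; it only reports that the inputs fall into two clumps, which is the kind of summary described above.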
Supervised learning = “Approximation”
Unsupervised learning = “Description” or “Summarization”
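Unsupervised Learning can be used to help do Supervised Learning better, for example by first summarizing the inputs into a more compact description that a supervised learner then works from.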
- Reinforcement Learning is learning from delayed reward. In Supervised Learning, feedback is available immediately, but in Reinforcement Learning, feedback may arrive several steps after the agent has made its decisions.
- In Tic Tac Toe, several moves must be made before there is feedback in the form of a win or a loss.
- Reinforcement learning is, in some sense, harder, because nobody is telling the agent what to do. The agent has to “work it out on its own.”
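One common way to reason about delayed reward is to propagate the final outcome backward so that earlier decisions receive some of the credit. The sketch below does this with discounted returns for a single episode in which the only reward arrives at the end, as with a win in Tic Tac Toe. The discount factor `gamma`, the reward values, and the function name `discounted_returns` are assumptions introduced for illustration; the notes do not specify any particular method.

```python
def discounted_returns(rewards, gamma):
    """For each step, compute the reward from that step onward,
    discounting rewards that arrive later by a factor of gamma per step."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical episode: three moves with no feedback, then a win worth 1.0.
episode_rewards = [0.0, 0.0, 0.0, 1.0]

print(discounted_returns(episode_rewards, gamma=0.5))
# -> [0.125, 0.25, 0.5, 1.0]
```

Earlier moves receive less credit than later ones, but none of them receives zero, even though no feedback was given at the time those moves were made.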
Overlap between the Types
Supervised, Unsupervised, and Reinforcement Learning can be joined together in interesting ways. Additionally, the distinctions between the types are not always as clear-cut as the definitions set forth above might lead you to believe. All the types of problems the branches of ML might address are ultimately the same type.
Specifically, all of these types of problems can be formulated as some type of optimization.
|Supervised Learning|Labels data well|
|Reinforcement Learning|Behavior scores well|
|Unsupervised Learning|Cluster scores well|
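The table can be read as three different scoring functions plugged into the same "pick whatever scores best" procedure. The sketch below is one hedged way to express that; the specific scores (accuracy, total reward, negative within-group squared distance), the threshold rules, and all names are hypothetical choices for illustration, and `cluster_tightness` assumes every group is non-empty.

```python
def accuracy(labeler, examples):
    """Supervised score: fraction of examples the labeler labels correctly."""
    return sum(labeler(x) == y for x, y in examples) / len(examples)


def total_reward(rewards):
    """Reinforcement score: total reward collected by a behavior."""
    return sum(rewards)


def cluster_tightness(groups):
    """Unsupervised score: negative sum of squared distances to each group's mean
    (higher is better; assumes every group is non-empty)."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((v - mean) ** 2 for v in g)
    return -total


# Supervised example: choose the threshold rule that labels the data best.
examples = [(1, False), (2, False), (6, True), (7, True)]
candidates = {t: (lambda x, t=t: x > t) for t in (0, 4, 8)}
best_threshold = max(candidates, key=lambda t: accuracy(candidates[t], examples))
print(best_threshold)  # -> 4
```

Swapping `accuracy` for `total_reward` or `cluster_tightness`, with candidates of the matching kind, leaves the selection step unchanged, which is the sense in which the three problem types share a common form.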
Data is king in machine learning. The professors identify as “computationalists” as opposed to “computer scientists,” because they:
- Work in a college of computing, not a department of computer science, and
- Believe in computing and computation as the “ultimate thing.”
Another fundamental shift associated with an ML perspective (as opposed to a computer science perspective) is that the focus is less on algorithms and more on data. This is the difference between “AI people,” who focus on the former, and “ML people,” who focus on the latter. Another perspective is that algorithms and data are coequal parts of the ML process. Regardless, data has more primacy in ML than it does in traditional computer science.