Supervised Learning Terminology

Two types of Supervised Learning:

  • Classification: Process of mapping an input to some discrete label.

    • For example, mapping a picture of a person to Male or Female.
    • Mapping a credit history to a binary choice: lend money or do not lend money, etc.
  • Regression: Process of mapping an input to some continuous value.

    • For example, mapping a picture of a person to a number indicating the length of their hair.
    • Mapping a picture of a person to a real value indicating their age, etc.
  • Depending on how the output is defined, a given problem may be either Classification or Regression. Age is a good example: as an output it is typically given as a discrete integer (30 years old), but it could also be given as a continuous value (30.65 years old).
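
The age example can be sketched in a few lines of Python. This is only an illustration: the function names `regression_target` and `classification_target` and the sample ages are assumptions made up for this sketch, not part of the original notes. The same underlying quantity becomes a Regression output when kept as a real value and a Classification output when reduced to whole years.

```python
import math

def regression_target(age_years: float) -> float:
    """Continuous output: any real value is a valid answer."""
    return age_years

def classification_target(age_years: float) -> int:
    """Discrete output: age in whole years, one label per integer."""
    return math.floor(age_years)

# Hypothetical ages, chosen only for illustration.
ages = [30.65, 30.10, 4.99]

print([regression_target(a) for a in ages])      # real-valued outputs
print([classification_target(a) for a in ages])  # integer labels
```

Note that 30.65 and 30.10 are different Regression outputs but share the same Classification label.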

Classification Learning - Term Definitions:

  • Instances: the inputs, each a vector of values (or features). The set of all possible instances defines the input space.
    • For example, pixels in a photo, credit score inputs, etc.
  • Concept: function that maps input space to outputs.
    • Concepts could be things like maleness, creditworthiness, or age-ness, each of which maps objects in the real world to membership in a set.
  • Target Concept: the “answer,” or specific concept, we are looking for and trying to represent.
  • Hypothesis Class: the set of all concepts (or functions, or classifications) we are willing to entertain.
  • Sample: training set. A set of inputs paired with correct outputs.
    • This is the essence of inductive learning: learning from lots of labeled examples (e.g. “credit applications”), as opposed to a single concrete definition of what creditworthiness is.
  • Candidate: the concept that we think might be the target concept.
  • Testing Set: set of labeled inputs to which the candidate concept is applied to determine the quality of the model. The testing set should not overlap the training set; otherwise the measured quality overstates how well the candidate generalizes to unseen inputs.
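
The terms above can be tied together in a minimal sketch. Everything specific here is an assumption made for illustration: the credit scores and labels are invented, the hypothesis class is restricted to simple threshold rules, and the candidate is chosen by training accuracy, which is only one of many possible selection strategies.

```python
# Instances: hypothetical credit scores (one feature per instance).
# Labels: True = lend money, False = do not lend money.
training_set = [(520, False), (580, False), (640, True), (700, True), (760, True)]
testing_set = [(550, False), (610, False), (680, True), (720, True)]

def make_threshold_concept(threshold):
    """Build a concept: a function from the input space to a label."""
    def concept(score):
        return score >= threshold
    return concept

# Hypothesis class: the only concepts we are willing to entertain,
# here threshold rules at 500, 550, ..., 800.
hypothesis_class = {t: make_threshold_concept(t) for t in range(500, 801, 50)}

def accuracy(concept, labeled_examples):
    """Fraction of labeled examples the concept classifies correctly."""
    correct = sum(1 for x, y in labeled_examples if concept(x) == y)
    return correct / len(labeled_examples)

# Candidate: the hypothesis that best fits the training sample.
best_threshold = max(
    hypothesis_class,
    key=lambda t: accuracy(hypothesis_class[t], training_set),
)
candidate = hypothesis_class[best_threshold]

# Quality is measured on the testing set, which shares no examples
# with the training set.
train_accuracy = accuracy(candidate, training_set)
test_accuracy = accuracy(candidate, testing_set)
print(best_threshold, train_accuracy, test_accuracy)
```

With this made-up data the candidate fits the training set perfectly but misclassifies one testing example, which is why the candidate is evaluated on inputs it was not chosen from.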