
Supervised Learning

Sheldon Nunes edited this page Oct 31, 2017 · 23 revisions

"Remember that supervised learning is used whenever we want to predict a certain outcome from a given input, and we have examples of input/output pairs. We build a machine learning model from these input/output pairs, which comprise a training set. Our goal is to make accurate predictions for new, never-before-seen data. Supervised learning often requires human effort to build the training set, but afterwards automates and often speeds up an otherwise laborious or infeasible task." Quoted from Introduction to Machine Learning with Python

Classification and Regression

There are two major types of supervised machine learning problems:

1. Classification

Classification is the task of predicting which group an input belongs to; formally, this group is referred to as a class label. Classification can be binary, which behaves similarly to a yes/no question (exactly two classes), or multiclass, which chooses among more than two classes (like the iris species from the introduction).

2. Regression

The goal of regression is to predict a continuous (non-discrete) number. An example of this is shown in the introduction, where features of a piece of music are used to predict how many likes/listens it will have.
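The distinction between the two can be sketched in a few lines of plain Python. The functions and data below are purely hypothetical, chosen only to contrast a discrete class label with a continuous prediction:

```python
# Classification: the prediction is one label from a fixed, discrete set.
def classify_temperature(celsius):
    return "hot" if celsius > 25 else "cold"  # binary class label: {"hot", "cold"}

# Regression: the prediction is a continuous number.
def predict_listens(num_followers):
    # Illustrative linear rule (made up): more followers -> more listens.
    return 50.0 + 2.5 * num_followers

print(classify_temperature(30))   # "hot" -- always one of two possible labels
print(predict_listens(1000))      # 2550.0 -- any real value is possible
```

A useful rule of thumb: if there is continuity between possible outputs (2,549 listens is almost the same as 2,550), it is a regression problem; if only a fixed set of answers makes sense, it is classification.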

Generalization, Overfitting and Underfitting

In supervised learning we want to build a model that can make accurate predictions on new, unseen data. If the model is able to do this, we say it is able to generalise from the training set to the test set. Sometimes our predictions are compromised by the quality of our data or models: a model that is too complex may overfit (memorise noise and quirks of the training set), while a model that is too simple may underfit (fail to capture the structure of the data at all).
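The extreme case of overfitting can be demonstrated with a toy "model" that simply memorises its training data. This is a made-up sketch, not anything from the book; the data is hypothetical:

```python
# A model that memorises the training set: extreme overfitting.
train = {(1, 2): "a", (3, 4): "b"}  # hypothetical input -> label pairs

def memorizer(x):
    # Perfect on points it has seen, useless on anything unseen.
    return train.get(x, "unknown")

train_acc = sum(memorizer(x) == y for x, y in train.items()) / len(train)
print(train_acc)           # 1.0 -- flawless on the training set...
print(memorizer((5, 6)))   # "unknown" -- ...but it cannot generalise to new data
```

This is why performance is always measured on held-out test data: accuracy on the training set alone says nothing about generalisation.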

K-Nearest Neighbors

The K-Nearest Neighbors (k-NN) algorithm is considered one of the simplest machine learning algorithms. Building the model consists only of storing the training dataset. To make a prediction for a new data point, the algorithm finds the closest data points in the training dataset - its "nearest neighbors" - and predicts from their labels.
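The idea is simple enough to sketch in pure Python (the book itself uses scikit-learn's `KNeighborsClassifier`; this toy version with made-up 2D points just illustrates the mechanics):

```python
from collections import Counter
import math

def euclidean(a, b):
    # Straight-line distance between two points of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, x_new, k=3):
    # Sort training points by distance to x_new and keep the k closest...
    neighbors = sorted(zip(X_train, y_train),
                       key=lambda pair: euclidean(pair[0], x_new))[:k]
    # ...then predict the majority label among those neighbors.
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical training data: two clusters, labelled "blue" and "red".
X_train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
y_train = ["blue", "blue", "red", "red", "red"]

print(knn_predict(X_train, y_train, (1.1, 0.9), k=3))  # "blue"
```

With k=1 the prediction is just the single nearest point's label; larger k smooths the decision by voting, which is one way to trade off overfitting against underfitting.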
