
Machine Learning Glossary


One-hot vector representation

  • A representation in which the vector is 0 in most dimensions and 1 in exactly one dimension
  • For example, if we have 10 classes (the 10 digits) and we want to tell the classifier that an image contains the digit 2, we can represent it as follows (a short sketch follows the vector):
[0,0,1,0,0,0,0,0,0,0]
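A minimal NumPy sketch of building such a vector; the class count and index are just the values from the example above:

```python
import numpy as np

def one_hot(label, num_classes):
    """Return a vector that is 1 at position `label` and 0 everywhere else."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(one_hot(2, 10))  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```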

Loss or cost function

  • It represents how far off our model's predictions are from the desired outcome. We try to minimize that error; the smaller the error, the better our model is.
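As a purely illustrative example (the glossary itself only names cross-entropy below), the mean squared error is one simple loss that measures this distance:

```python
import numpy as np

def squared_error_loss(predictions, targets):
    """Mean squared error: average squared distance between model output and the desired outcome."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return np.mean((predictions - targets) ** 2)

print(squared_error_loss([0.9, 0.2, 0.1], [1.0, 0.0, 0.0]))  # small value -> model is close
```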

Different loss functions

  • cross-entropy
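A minimal sketch of cross-entropy for a classification target given as a one-hot vector; the clipping constant is only an assumed numerical-stability detail:

```python
import numpy as np

def cross_entropy(predicted_probs, one_hot_target, eps=1e-12):
    """Cross-entropy between a predicted probability distribution and a one-hot target."""
    p = np.clip(np.asarray(predicted_probs, dtype=float), eps, 1.0)
    return -np.sum(np.asarray(one_hot_target) * np.log(p))

# A confident, correct prediction gives a low loss; a wrong one gives a high loss.
print(cross_entropy([0.05, 0.05, 0.9], [0, 0, 1]))  # ~0.105
print(cross_entropy([0.9, 0.05, 0.05], [0, 0, 1]))  # ~3.0
```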

Early stopping

  • It is a form of regularization used to avoid over-fitting when training a model with an iterative method such as gradient descent.
  • Early stopping rules provide guidance on how many iterations can be run before the learner begins to over-fit. Such rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation. A sketch of one such rule follows.
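A minimal sketch of a patience-based early-stopping loop; `train_one_epoch` and `validation_loss` are placeholder callables, and the patience value is an arbitrary choice:

```python
def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    """Stop training once the validation loss has not improved for `patience` epochs."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)              # one pass of gradient descent over the training data
        val_loss = validation_loss(model)   # error on held-out data
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                        # further training would likely over-fit
    return model
```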

Distant supervision (sometimes described as a form of semi-supervised learning)

A distant supervision algorithm usually has the following steps:

  • It may have some labeled training data
  • It has access to a pool of unlabeled data
  • It has an operator that can sample from this unlabeled data and label it; this operator is expected to be noisy in its labels
  • The algorithm then collectively uses the original labeled training data (if any) and the new noisily labeled data to produce the final output (see the sketch below)
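A rough sketch of those steps; the `noisy_labeler` is a hypothetical heuristic operator, and scikit-learn's `LogisticRegression` merely stands in for an arbitrary classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def distant_supervision(X_labeled, y_labeled, X_unlabeled, noisy_labeler, sample_size=100):
    """Combine a small labeled set with noisily labeled samples drawn from an unlabeled pool."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X_unlabeled), size=min(sample_size, len(X_unlabeled)), replace=False)
    X_sampled = X_unlabeled[idx]
    y_noisy = noisy_labeler(X_sampled)          # these labels are expected to be noisy
    X_all = np.vstack([X_labeled, X_sampled])   # pool the original and noisily labeled data
    y_all = np.concatenate([y_labeled, y_noisy])
    return LogisticRegression().fit(X_all, y_all)
```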

Pre-training

Usual way of training the network

  • You want to train a neural network to perform a task (e.g. classification) on a data set (e.g. a set of images). You start training by initializing the weights randomly. As soon as you start training, the weights are changed in order to perform the task with fewer mistakes (i.e. optimization). Once you're satisfied with the training results, you save the weights of your network somewhere.
  • You are now interested in training a network to perform a new task (e.g. object detection) on a different data set (e.g. images too, but not the same ones you used before). Instead of repeating what you did for the first network and starting from randomly initialized weights, you can use the weights you saved from the previous network as the initial weight values for your new experiment. Initializing the weights this way is referred to as using a pre-trained network. The first network is your pre-trained network. The second one is the network you are fine-tuning.
  • The idea behind pre-training is that random initialization is...well...random; the values of the weights have nothing to do with the task you're trying to solve. Why should one set of values be any better than another? But how else would you initialize the weights? If you knew how to initialize them properly for the task, you might as well set them to the optimal values (slightly exaggerated), and there would be no need to train anything at all. Pre-training gives the network a head start, as if it has seen the data before. A sketch of the save-and-reload workflow follows this list.
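A minimal PyTorch-style sketch of that workflow; the architecture, layer sizes, and file name are made up purely for illustration:

```python
import torch
import torch.nn as nn

# The same architecture is reused for both tasks in this toy example.
def make_network():
    return nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Task 1: train from randomly initialized weights, then save them.
first_net = make_network()
# ... training loop for the first task would go here ...
torch.save(first_net.state_dict(), "pretrained_weights.pt")   # hypothetical file name

# Task 2: start from the saved weights instead of random initialization (fine-tuning).
second_net = make_network()
second_net.load_state_dict(torch.load("pretrained_weights.pt"))
# ... fine-tuning loop for the new task would go here ...
```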

What to watch out for when pre-training:

  • Using a pre-trained network makes sense only if the datasets of the two tasks are related; that is when the pre-trained weights are most effective.
  • The bigger the gap between the two datasets, the less effective pre-training will be. It makes little sense to pre-train a network for image classification by training it on financial data first; in that case there is too much disconnect between the pre-training and fine-tuning stages.
