Hi @avish121,

Excellent questions!
This sounds plausible. I'd agree too.

With an infinite number of straight lines at almost any angle/slope you should be able to draw any pattern you please. In fact, I believe this is how the Linear SVM (another machine learning algorithm) works, by separating data with linear boundaries (hyperplanes).

However, in practice, neural networks function far better by combining linear and non-linear functions. As to exactly why, I don't have a perfectly good answer (I've searched for one myself), other than the explanation you linked (combine straight and non-straight lines). It's only through the experience of building 1000s of models that I (and many others) have come to a similar conclusion. There may be a mathematical proof somewhere but I'm unaware of it (if you find one, I'd love to read it).
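One small, concrete piece of the picture (a rough NumPy sketch of my own, not tied to any particular framework): if you stack linear layers with no non-linear activation in between, the whole stack collapses back into a single linear layer, so on its own it can still only draw straight lines. Adding a non-linearity between the layers is what stops that collapse.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two "layers" with no activation function: both are purely linear.
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

x = rng.normal(size=2)

# Passing data through two linear layers...
two_linear_layers = W2 @ (W1 @ x + b1) + b2

# ...is exactly the same as passing it through one combined linear layer.
W_combined = W2 @ W1
b_combined = W2 @ b1 + b2
one_linear_layer = W_combined @ x + b_combined

print(np.allclose(two_linear_layers, one_linear_layer))  # True

# With a non-linearity (here ReLU) between the layers, the collapse no longer
# happens, so the network can bend its decision boundary instead of only
# drawing straight lines.
relu = lambda z: np.maximum(z, 0)
with_nonlinearity = W2 @ relu(W1 @ x + b1) + b2
```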
Yes, that's correct (though sometimes not all neurons are connected to the next layer, such as in the case of dropout). Each neuron can be considered its own function. Combine the learned patterns of each individual neuron and hopefully you get a good representation of the data you're working with.

As for your question about how patterns in the first layer are related to the final output: generally, each layer progressively learns more refined patterns, starting from the rougher straight lines in the first layer (e.g. the 3 straight lines in your example trying to cut up the data) to the more refined circles in the second layer. This kind of example can be seen here because our target data is quite easily visually separable (e.g. you can see with your eyes that the blue dots should be separated from the orange dots).

However, in larger datasets, this kind of visualization is often not possible (because there are far too many dimensions). And generally, with larger datasets, you use larger neural networks with more individual neurons, so inspecting what an individual neuron learns becomes harder and harder with size. However, to see a cool example of an individual neuron learning something, you may be interested in reading OpenAI's paper called "Unsupervised Sentiment Neuron", where, out of thousands of individual neurons (4096 in total), one was found to have learned the sentiment of a piece of text.

Anyway, fantastic questions and let me know if you'd like to discuss more or would like me to expand on anything.

Daniel
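P.S. If it helps to make the "each neuron is its own function" idea concrete, here's a rough NumPy sketch (random, untrained weights, purely for illustration) of your 3-straight-lines example: each hidden neuron is just a straight line plus a non-linearity, and the output layer combines what the individual neurons say.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0)
rng = np.random.default_rng(0)

# A tiny network for 2D points (like the blue/orange dots example):
# layer 1 has 3 neurons, layer 2 has 1 output neuron.
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)   # 3 "straight line" neurons
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)   # combines them into the output

def first_layer_neuron(x, i):
    """Each hidden neuron is its own little function: a straight line + ReLU."""
    return relu(W1[i] @ x + b1[i])

def forward(x):
    hidden = relu(W1 @ x + b1)   # the 3 rough patterns
    return W2 @ hidden + b2      # combined into a more refined pattern

x = np.array([0.5, -1.0])
print([first_layer_neuron(x, i) for i in range(3)])  # what each individual neuron "says"
print(forward(x))                                    # the combined output
```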