This post uses the same code as Programming Assignment 4 (week 5) of the online course:
https://www.coursera.org/learn/machine-learning/
Backpropagation is a mathematically complex algorithm, so working through it on a simpler dataset and reviewing each step along the way should give us a better intuition. That is the goal of this post.
We will take the following steps.
1. Generate some random data points. Say we are preparing for the GRE. As part of the preparation, we take two practice tests in the same format, powerprep1 (p1) and powerprep2 (p2), and try to predict the final score from these two. I have written a function that randomly generates as many test scores as needed:
https://gist.github.com/hasanIqbalAnik/6aa2af7138595d2ba85d
Here, p1 and p2 make up our input matrix X, and the final scores make up Y. These are the first two columns and the last column of the data matrix returned by our data-generating function. For example:
P1 | P2 | Final |
---|---|---|
317 | 318 | 319 |
305 | 306 | 307 |
302 | 303 | 303 |
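To make the shape of this data concrete, here is a minimal sketch of such a generator in Python. It is not the function from the gist above: the name `generate_scores`, the 260 to 340 range, and the rule that each later test lands 0 or 1 point above the previous one are assumptions made only to produce rows that look like the table.

```python
import random

def generate_scores(n, low=260, high=340, max_drift=1, seed=None):
    """Return n rows of [p1, p2, final], each an integer GRE-style total.

    Hypothetical generator: the range and the drift rule are assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Pick the first practice score, leaving headroom for the later drift.
        p1 = rng.randint(low, high - 2 * max_drift)
        # Assume each later test lands 0..max_drift points above the previous one.
        p2 = p1 + rng.randint(0, max_drift)
        final = p2 + rng.randint(0, max_drift)
        rows.append([p1, p2, final])
    return rows

data = generate_scores(5, seed=0)
X = [row[:2] for row in data]   # first two columns: p1, p2
Y = [row[2] for row in data]    # last column: final score
```

Each row of `X` holds the two practice scores and the matching entry of `Y` holds the final score, which is the split described above.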
2. Structure of our neural network: for simplicity, it has 3 layers: 1 input layer (3 nodes, including bias), 1 hidden layer (6 nodes, including bias), and 1 output layer.
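The layer sizes above determine the shapes of the two weight matrices. The following sketch, written in plain Python rather than the language of the linked code, runs one forward pass to show those shapes. It assumes the output layer has one unit per label (340, taken from num_labels in this post) and uses small random weights purely as placeholders; the helper names are illustrative.

```python
import math
import random

INPUT_UNITS = 2     # p1 and p2; the bias node is added inside layer()
HIDDEN_UNITS = 5    # 6 hidden nodes when the bias node is counted
NUM_LABELS = 340    # assumed number of output units (num_labels)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(theta, activations):
    """Prepend the bias unit, then apply one weight matrix and the sigmoid."""
    with_bias = [1.0] + activations
    return [sigmoid(sum(w * a for w, a in zip(row, with_bias))) for row in theta]

rng = random.Random(0)
# Theta1 maps the 3 input nodes to 5 hidden units: shape 5 x 3.
theta1 = [[rng.uniform(-0.1, 0.1) for _ in range(INPUT_UNITS + 1)]
          for _ in range(HIDDEN_UNITS)]
# Theta2 maps the 6 hidden nodes to 340 output units: shape 340 x 6.
theta2 = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_UNITS + 1)]
          for _ in range(NUM_LABELS)]

x = [317.0, 318.0]            # one example: p1, p2
hidden = layer(theta1, x)     # 5 hidden activations
output = layer(theta2, hidden)  # 340 output activations, one per label
```

With untrained weights the outputs carry no meaning yet; the point is only that a 2-feature input flows through a 5 x 3 and a 340 x 6 matrix.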
The full code is available here:
https://gist.github.com/hasanIqbalAnik/bd51dbf3e91550c69620
Now, to fit this dataset to the code, we only need to take care of the following things:
- Our num_labels would be 340, the maximum possible GRE score.
- We do not have pre-initialized Theta1 and Theta2 as we did in the assignment, so we need to initialize them randomly ourselves.
- Handle lambda carefully: a higher value of lambda results in less overfitting but higher bias, and a lower value does the opposite.
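The three points above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code from the gist: the helper names are hypothetical, the initialization range of 0.12 is an assumed value, and the penalty is the usual L2 term (lambda / 2m times the sum of squared non-bias weights).

```python
import random

NUM_LABELS = 340
EPSILON_INIT = 0.12   # assumed range for the random weights

def rand_initialize_weights(l_in, l_out, rng):
    """Weights for a layer with l_in inputs (+1 bias) and l_out outputs,
    drawn uniformly from [-EPSILON_INIT, EPSILON_INIT]."""
    return [[rng.uniform(-EPSILON_INIT, EPSILON_INIT) for _ in range(l_in + 1)]
            for _ in range(l_out)]

def one_hot(score, num_labels=NUM_LABELS):
    """Turn a score such as 319 into a 0/1 vector (label k sits at index k - 1)."""
    vec = [0] * num_labels
    vec[score - 1] = 1
    return vec

def regularization_penalty(thetas, lam, m):
    """(lambda / 2m) * sum of squared weights, skipping each row's bias weight."""
    total = sum(w * w for theta in thetas for row in theta for w in row[1:])
    return lam / (2.0 * m) * total

rng = random.Random(0)
theta1 = rand_initialize_weights(2, 5, rng)            # 5 x 3
theta2 = rand_initialize_weights(5, NUM_LABELS, rng)   # 340 x 6
y = one_hot(319)                                       # target vector for a final score of 319

# The penalty grows linearly with lambda, which is what pushes the weights
# toward zero (less overfitting, more bias) as lambda increases.
small = regularization_penalty([theta1, theta2], lam=0.1, m=3)
large = regularization_penalty([theta1, theta2], lam=10.0, m=3)
```

Comparing `small` and `large` for the same weights shows how strongly a larger lambda penalizes them, which is why its value needs care.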