Wednesday, July 1, 2015

Artificial Neural Network in Octave: Backpropagation to predict test scores

This post uses the same code as Programming Assignment 4 (Week 5) of the online course:
https://www.coursera.org/learn/machine-learning/
As backpropagation is a mathematically complex algorithm, working through a simpler dataset and reviewing each step along the way gives us better intuition. That is the goal of this post.

Here, we shall take the following steps.

1. Generate some random data points. Let's say we are going to appear for the GRE test. As part of the preparation, we would take two preliminary tests of the same format, say powerprep1 (p1) and powerprep2 (p2). We would then try to predict the final score based on these two. I have written a function to generate as many random test scores as needed:
https://gist.github.com/hasanIqbalAnik/6aa2af7138595d2ba85d

Here, p1 and p2 constitute our input matrix X, and the final scores are Y. These are the first two and the last columns of the data matrix returned by our data-generating function. For example:
P1    P2    Final
317   318   319
305   306   307
302   303   303
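
For reference, here is a minimal sketch of what such a generator could look like in Octave. The function name generateScores and the 260-340 score range are my assumptions; the gist linked above is the authoritative version.

    % Generate m rows of correlated random scores: each student gets a
    % base ability, and p1, p2 and the final score vary slightly around it.
    function data = generateScores(m)
      base  = randi([263, 337], m, 1);     % base ability per student
      p1    = base + randi([-3, 3], m, 1); % powerprep1 score
      p2    = base + randi([-3, 3], m, 1); % powerprep2 score
      final = base + randi([-3, 3], m, 1); % actual GRE score
      data  = [p1 p2 final];
    end

Calling generateScores(3) would produce a 3 x 3 matrix like the one above.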


2. Structure of our neural network: for simplicity, it has 3 layers: 1 input layer (3 nodes, including bias), 1 hidden layer (6 nodes, including bias), and 1 output layer (one node per label).
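
Mapping this structure onto the familiar variables from the assignment code would look like this (a sketch; note that the assignment's size variables exclude the bias units, which the code adds itself):

    input_layer_size  = 2;    % p1 and p2 (3 nodes with the bias unit)
    hidden_layer_size = 5;    % 6 nodes with the bias unit
    num_labels        = 340;  % one output node per possible score

    % Resulting weight matrix dimensions:
    %   Theta1: hidden_layer_size x (input_layer_size + 1)  =   5 x 3
    %   Theta2: num_labels x (hidden_layer_size + 1)        = 340 x 6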

The full code is available here:
https://gist.github.com/hasanIqbalAnik/bd51dbf3e91550c69620

Now, to fit this dataset into the code, we only need to take care of the following:
  • Our num_labels would be 340, one output class per possible final score.
  • We do not have pre-initialized Theta1 and Theta2 as we did in the assignment, so we need to initialize them randomly from the beginning (see the sketch after this list).
  • Handle lambda carefully: a higher value of lambda results in less overfitting but higher bias, and vice versa.
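
A minimal sketch of that random initialization, following the symmetry-breaking scheme from the assignment (the epsilon value of 0.12 is the one the course suggests):

    % Randomly initialize the weights of a layer with L_in incoming
    % connections and L_out outgoing connections, drawing uniformly from
    % [-epsilon_init, epsilon_init] to break symmetry between units.
    function W = randInitializeWeights(L_in, L_out)
      epsilon_init = 0.12;
      W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
    end

    initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
    initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
    initial_nn_params = [initial_Theta1(:); initial_Theta2(:)];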
The rest can be left as it is. The prediction performance depends on a number of things: how your data is distributed, the number of hidden layers, how you handle bias and variance, and so on.
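
Putting everything together, training and prediction would look roughly like this, assuming the assignment's nnCostFunction, fmincg, and predict functions are on the path, along with the hypothetical generateScores from step 1 (a sketch, not the exact code from the gist):

    data = generateScores(1000);   % 1000 students' scores
    X = data(:, 1:2);              % p1 and p2 columns
    y = data(:, 3);                % final scores, used as labels 1..340

    lambda = 1;                    % regularization strength
    options = optimset('MaxIter', 100);
    costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                       num_labels, X, y, lambda);
    nn_params = fmincg(costFunction, initial_nn_params, options);

    % Unroll the learned parameters and check the fit on the training set.
    Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                     hidden_layer_size, input_layer_size + 1);
    Theta2 = reshape(nn_params((1 + hidden_layer_size * (input_layer_size + 1)):end), ...
                     num_labels, hidden_layer_size + 1);
    pred = predict(Theta1, Theta2, X);
    fprintf('Training accuracy: %f\n', mean(double(pred == y)) * 100);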