Processing math: 0%

Wednesday, July 27, 2016

Simplified Neural Network Backpropagation Example Python

This is an even simplified version of this fantastic tutorial: https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
Let's consider a very simple neural network shown in the following image:

All we want is that given the input .1, the network would output the value .9.
We will be using \frac{1}{1+e^{-x}} as our activation function. Initially both weight w1 and w2 are initialized to .5, though this can be any arbitrary value. However, different initialization shall converge to different final values.

First let's consider the net input in the hidden layer(neth). It's simply the input multiplied by w1. That is:(Here, outh means the output of the hidden layer, neto means the net input in the output layer, outo is the output of the output layer. E is the error. )

neth = i1 * w1 = .1 * .5 = .05
outh = \frac{1}{1+e^{-neth}} = \frac{1}{1+e^{-.05}} = .51249

neto = outh * w2 = .51249 * .5 = .25624
outo = \frac{1}{1+e^{-neto}} = \frac{1}{1+e^{-.25624}} = .563711

E = \frac{1}{2}*(target - outo)^2 =  \frac{1}{2}*(.9 - .563711)^2 = .056545

This concludes our forward pass of the network. We've selected that particular error function for our convenience, as it's convex and differentiable. Now, we can see that our target value was .9 but we've got .5637 . We are going to update our weights w1, w2 so that the network produces that desired output. How would we do that is the reason for using the famous Backpropagation algorithm. The basic idea is, we will calculate how much responsibility should each of the weights take for producing that error. Then the weight would be modified accordingly and the next iteration would begin.

First let's see what happens in case of w2. The partial differentiation expression \frac{\partial E}{\partial w2} says the rate of change in the error with respect to w2. As it can't be calculated directly, we would use the chain rule to find it's constituting parts.

\frac{\partial E}{\partial w2} = \frac{\partial E}{\partial outo} * \frac{\partial outo}{\partial neto} * \frac{\partial neto}{\partial w2}


Let's find out each of those parts separately:

\frac{\partial E}{\partial outo} = \frac{\partial}{\partial outo}(\frac{1}{2}*(target - outo)^2)  = \frac{1}{2} * 2 * ( target - outo) * (-1)
= (.9 - .563711) * (-1) = -.336289

\frac{\partial outo}{\partial neto} = \frac{\partial }{\partial neto} (\frac{1}{1+e^{-neto}}) = \frac{-1}{(1+e^{-neto})^2} * e^{-neto} * (-1) = \frac{e^{-neto}}{(1+e^{-neto})^2}
 = outo * (1-outo) = .563711 * ( 1- .563711) = .24594

\frac{\partial neto}{\partial w2} = \frac{\partial (w2*outh)}{\partial w2} = outh = .51249

So finally we have found the responsibility of w2 for producing the error E:

\frac{\partial E}{\partial w2} = -.33628 * .24594 * .51249 = -.042386

Now let's turn our attention to w1, that is we want to calculate \frac{\partial E}{\partial w1}. Again, chain rule comes to our rescue.

\frac{\partial E}{\partial w1} = \frac{\partial E}{\partial outo} * \frac{\partial outo}{\partial neto} *\frac{\partial neto}{\partial outh} * \frac{\partial outh}{\partial neth} * \frac{\partial neth}{\partial w1}

We already have calculated \frac{\partial E}{\partial outo} and \frac{\partial outo}{\partial neto}. Let's calculate the rest.

\frac{\partial neto}{\partial outh} = \frac{\partial (w2*outh)}{\partial outh} = w2 = .5
\frac{\partial outh}{\partial neth} = outh * ( 1- outh) = .51249 * ( 1- .51249) = .24984
\frac{\partial neth}{\partial w1} = \frac{\partial (w1 * input)}{\partial w1} = input = .1

So,

\frac{\partial E}{\partial w1} = -.336289 * .24594 * .5 * .24984 * .1 = -.001033

Now it's time to update our initial guesses of w1 and w2. Let's say our learning rate \eta = .9
w1 = w1 - (\eta * \frac{\partial E}{\partial w1}) =  .5 + .9 * 001033 = 0.5009298654
Similarly,
w2 = w2 - ( \eta * \frac{\partial E}{\partial w2}) = .5 + .9 * 042386 = 0.538148

We can continue this process until our network's output is very close to our target. The following is a python implementation of that process:

from __future__ import division
import math

ctr = 0
eta = .9
w1 = .5
w2 = .5

i1 = .1
target = .9

while True:
    ctr += 1
    neth = i1 * w1
    outh = 1 / (1 + math.exp(-neth))

    neto = w2 * outh
    outo = 1 / (1 + math.exp(-neto))

    e = .5 * math.pow((target - outo), 2)
    print 'error now is: ', e
    if e < .00005:
        print '*' * 100
        print 'target is: ', target
        print 'output is now: ', outo
        print 'w1 and w2 is: ', w1, ',', w2
        print 'number of iterations: ', ctr
        break
    # update w2
    de_by_douto = (-1) * (target - outo)
    douto_by_dneto = outo * (1 - outo)
    dneto_by_dw2 = outh

    de_by_dw2 = de_by_douto * douto_by_dneto * dneto_by_dw2

    # update w1
    # de_by_dw1 = de_by_douto * douto_by_dneto * dneto_by_douth * douth_by_dneth * dneth_by_dw1
    dneto_by_douth = w2
    de_by_outh = de_by_douto * douto_by_dneto * dneto_by_douth
    douth_by_dneth = outh * (1 - outh)
    dneth_by_dw1 = i1

    de_by_dw1 = de_by_outh * douth_by_dneth * dneth_by_dw1

    # updated w1 and w2
    w1 = w1 - (eta * de_by_dw1)
    w2 = w2 - (eta * de_by_dw2)

Sample output:

No comments:

Post a Comment