A Training Algorithm for FFNN
The back propagation algorithm is a learning rule for multilayered NWs [17], credited to Rumelhart and McClelland. The algorithm provides a prescription for adjusting the initially randomized set of synaptic weights (existing between all pairs of neurons in each successive layer of the network) so as to minimize the difference between the network's output for each input fact and the output with which that input is known (or desired) to be associated. The back propagation rule takes its name from the way in which the calculated error at the output layer is propagated backwards from the output layer to the nth (last) hidden layer, then to the (n - 1)th hidden layer, and so on. Because this learning process requires us to ''know'' the correct pairing of input-output facts beforehand, this type of weight adjustment is called supervised learning. The FFNN training algorithm is described using the matrix/vector notation for easy implementation in PC MATLAB. Alternatively, the NN toolbox of MATLAB can be used.
The FFNW has the following variables:
1. u0 as input to (input layer of) the network
2. ni as the number of input neurons (of the input layer), equal to the number of inputs in u0
3. nh as the number of neurons of the hidden layer
4. n0 as the number of output neurons (of the output layer), equal to the number of outputs z
5. W1 = nh x ni as the weight matrix between the input and hidden layers
6. W10 = nh x 1 as the bias weight vector of the hidden layer
7. W2 = n0 x nh as the weight matrix between the hidden and output layers
8. W20 = n0 x 1 as the bias weight vector of the output layer
9. μ as the learning rate or step size
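As a rough illustration of these quantities, the sketch below sets up the dimensions, the randomly initialized weight matrices, and the learning rate in NumPy; the specific values (4 inputs, 6 hidden neurons, 2 outputs, μ = 0.1) and the initialization scale are assumptions for the example, not values prescribed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example dimensions (not prescribed in the text)
ni, nh, no = 4, 6, 2        # input, hidden, and output neuron counts

# Initially randomized synaptic weights and bias weights
W1  = 0.1 * rng.standard_normal((nh, ni))   # weights between input and hidden layers (nh x ni)
W10 = np.zeros((nh, 1))                     # hidden-layer bias weight vector (nh x 1)
W2  = 0.1 * rng.standard_normal((no, nh))   # weights between hidden and output layers (no x nh)
W20 = np.zeros((no, 1))                     # output-layer bias weight vector (no x 1)

mu = 0.1                                    # learning rate (step size)
```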
The algorithm is based on the steepest descent optimization method [4]. Signal computation is done using the following equations, since u0 and initial guesstimates of the weights are known.
y1 = W1 u0 + W10
u1 = f(y1)
Here, y1 is the vector of intermediate values at the hidden layer and u1 is the hidden layer's output, which serves as the input to the output layer. The f(y1) is a sigmoid activation function given by

f(y1) = 1/(1 + e^(-λ y1)) (2.25)

Here, λ is a scaling factor to be defined by the user.
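A minimal NumPy sketch of this sigmoid follows, with the scaling factor passed in as `lam` (the λ above); the default value is an assumption for illustration.

```python
import numpy as np

def sigmoid(y, lam=1.0):
    """Sigmoid node activation of Equation 2.25: f(y) = 1 / (1 + exp(-lam * y)).
    lam is the user-defined scaling factor (default value assumed here)."""
    return 1.0 / (1.0 + np.exp(-lam * y))
```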
The signal between the hidden and output layers is computed as
y2 = W2 u1 + W20 (2.26)
u2 = f(y2) (2.27)
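The forward signal computation through both layers might be sketched as below for one input fact u0 (a column vector); the helper name `forward_pass` is ours, and the sigmoid is repeated inline so the sketch runs on its own.

```python
import numpy as np

def forward_pass(u0, W1, W10, W2, W20, lam=1.0):
    """Forward signal computation for one input fact u0 (column vector)."""
    f = lambda y: 1.0 / (1.0 + np.exp(-lam * y))   # sigmoid of Equation 2.25
    y1 = W1 @ u0 + W10    # intermediate values at the hidden layer
    u1 = f(y1)            # hidden-layer output, fed to the output layer
    y2 = W2 @ u1 + W20    # intermediate values at the output layer (2.26)
    u2 = f(y2)            # network output signal (2.27)
    return y1, u1, y2, u2
```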
A quadratic cost function is defined as E = (1/2)(z - u2)^T(z - u2), which signifies the square of the error between the NW output and the desired output; u2 is the signal at the output layer and z is the desired output. The following result from optimization theory is used to derive the training algorithm:

W(i + 1) = W(i) - μ ∂E/∂W (2.28)
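The quadratic cost just defined can be evaluated with a small helper like the one below; treating z and u2 as column vectors is an assumed convention.

```python
import numpy as np

def cost(z, u2):
    """Quadratic cost E = 0.5 * (z - u2)^T (z - u2) for column-vector z, u2."""
    e = z - u2
    return 0.5 * float(e.T @ e)
```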
The expression for the gradient is based on Equations 2.26 and 2.27:
∂E/∂W2 = -f′(y2)(z - u2) u1^T (2.29)

Here, u1^T is the gradient of y2 with respect to W2. The derivative f′ of the node activation function is given from Equation 2.25 as

f′(y2) = λ f(y2)[1 - f(y2)] = λ u2(1 - u2) (2.30)
The modified error of the output layer is expressed as

e2b = f′(y2)(z - u2) (2.31)
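Under the sigmoid of Equation 2.25, this modified error can be computed directly from the network output u2, as in the sketch below; the element-wise form of the products is our reading of Equations 2.30 and 2.31.

```python
import numpy as np

def output_error(z, u2, lam=1.0):
    """Modified output-layer error e2b = f'(y2)(z - u2), Equation 2.31,
    using f'(y2) = lam * u2 * (1 - u2) from Equations 2.25 and 2.30."""
    fprime_y2 = lam * u2 * (1.0 - u2)   # derivative of the node activation
    return fprime_y2 * (z - u2)         # element-wise product
```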
Finally, the recursive weight update rule for the output layer is given as
W2(i + 1) = W2(i) + μ e2b u1^T + V[W2(i) - W2(i - 1)] (2.32)
V is the momentum factor, used for smoothing out the (large) weight changes and for accelerating the convergence of the algorithm. The back propagation of the error and the update rule for W1 are given as
e1b = f′(y1) W2^T e2b (2.33)
W1(i + 1) = W1(i) + μ e1b u0^T + V[W1(i) - W1(i - 1)] (2.34)
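A sketch of one complete weight update (Equations 2.32 through 2.34) is given below. The previous-iteration weights are carried along for the momentum terms; the bias-weight updates shown are an assumption, since the text does not spell them out.

```python
import numpy as np

def update_weights(u0, u1, u2, z, W1, W10, W2, W20,
                   W1_prev, W2_prev, mu=0.1, V=0.5, lam=1.0):
    """One back propagation update; mu is the learning rate, V the momentum factor."""
    e2b = lam * u2 * (1.0 - u2) * (z - u2)        # modified output-layer error (2.31)
    e1b = lam * u1 * (1.0 - u1) * (W2.T @ e2b)    # back-propagated hidden-layer error (2.33)

    W2_new = W2 + mu * (e2b @ u1.T) + V * (W2 - W2_prev)   # output-layer rule (2.32)
    W1_new = W1 + mu * (e1b @ u0.T) + V * (W1 - W1_prev)   # hidden-layer rule (2.34)

    # Bias updates: same error signals with a unit input (assumed form, not from the text)
    W20_new = W20 + mu * e2b
    W10_new = W10 + mu * e1b
    return W1_new, W10_new, W2_new, W20_new
```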
The data are presented to the network sequentially and repeatedly, with the weights obtained at the end of one cycle used as the initial weights for the next, until convergence is reached. The entire process is recursive-iterative.
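Putting the pieces together, a minimal recursive-iterative training loop might look like the sketch below. The data, dimensions, learning rate, momentum factor, and stopping tolerance are all illustrative assumptions, and the forward and backward computations repeat the equations above inline so the example runs on its own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem (all sizes assumed): N input-output facts with ni inputs, no outputs
ni, nh, no, N = 4, 6, 2, 50
lam, mu, V = 1.0, 0.05, 0.5                       # sigmoid scale, learning rate, momentum
f = lambda y: 1.0 / (1.0 + np.exp(-lam * y))      # sigmoid of Equation 2.25

U0 = rng.standard_normal((ni, N))                 # input facts, one per column
Z  = f(rng.standard_normal((no, ni)) @ U0)        # desired outputs (synthetic targets)

# Initially randomized weights; previous-iteration copies feed the momentum terms
W1,  W2  = 0.1 * rng.standard_normal((nh, ni)), 0.1 * rng.standard_normal((no, nh))
W10, W20 = np.zeros((nh, 1)), np.zeros((no, 1))
W1_prev, W2_prev = W1.copy(), W2.copy()

for cycle in range(500):                          # repeated presentation of the data
    E = 0.0
    for k in range(N):                            # sequential presentation of the facts
        u0, z = U0[:, [k]], Z[:, [k]]
        y1 = W1 @ u0 + W10;  u1 = f(y1)           # forward pass, input to hidden layer
        y2 = W2 @ u1 + W20;  u2 = f(y2)           # forward pass, hidden to output (2.26, 2.27)
        E += 0.5 * float((z - u2).T @ (z - u2))   # accumulated quadratic cost

        e2b = lam * u2 * (1 - u2) * (z - u2)      # modified output-layer error (2.31)
        e1b = lam * u1 * (1 - u1) * (W2.T @ e2b)  # back-propagated error (2.33)

        W2_new = W2 + mu * (e2b @ u1.T) + V * (W2 - W2_prev)  # (2.32)
        W1_new = W1 + mu * (e1b @ u0.T) + V * (W1 - W1_prev)  # (2.34)
        W20 += mu * e2b                           # bias updates (assumed form)
        W10 += mu * e1b
        W1_prev, W2_prev, W1, W2 = W1, W2, W1_new, W2_new

    if E / N < 1e-3:                              # assumed convergence tolerance
        break

print(f"stopped after {cycle + 1} cycles, mean cost {E / N:.4f}")
```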