Lecture #5 looked at the training of single neurons (TLUs) using gradient descent methods.
Notes are available in two formats.
The lecture covered material from Chapter 3 of Nilsson's "Artificial Intelligence: A New Synthesis", pages 39-44. If you have access to this book, read through these pages.
Neural networks are covered in the textbook on pages 736-748. What we have covered goes up to about page 744; as you can see, there is much about neural networks that we haven't (and won't) cover (the maths gets rather hairy).
If you want more detail on the Widrow-Hoff procedure, you can read the original article:
Widrow, B. and Hoff, M. (1960). Adaptive switching circuits. In 1960 IRE WESCON Convention Record, volume part 4, pp. 96--104.

One of the original articles to discuss the generalised delta procedure was:
D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning internal representations by error propagation. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing, volume 1, pages 318--362, Cambridge, Massachusetts, 1986. MIT Press.
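To make the two procedures concrete, here is a minimal sketch of both update rules applied to a single unit. The toy OR data set, the learning rates, and the epoch counts are illustrative choices, not taken from the lecture; the Widrow-Hoff update trains a linear unit on the raw error, while the generalised delta update squashes the weighted sum through a sigmoid and so gains an extra factor f(1 - f).

```python
import math

def train_widrow_hoff(samples, eta=0.1, epochs=200):
    """Widrow-Hoff (LMS) training of a single linear unit.

    samples: list of (input-vector, desired-output) pairs.
    The threshold is folded in as a bias weight b.
    """
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, d in samples:
            f = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear output
            err = d - f
            # step down the gradient of the squared error
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

def train_generalised_delta(samples, eta=0.5, epochs=2000):
    """Generalised delta procedure: the same loop, but the weighted sum
    is squashed through a sigmoid, so the update gains a factor f(1 - f)."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, d in samples:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            f = 1.0 / (1.0 + math.exp(-s))                 # sigmoid output
            delta = (d - f) * f * (1.0 - f)
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
            b += eta * delta
    return w, b

# Toy problem: the (linearly separable) two-input OR function.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w, b = train_widrow_hoff(data)
# Thresholding the trained linear output at 0.5 reproduces OR.
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0.5 else 0
```

Note that on the OR data the least-squares solution the Widrow-Hoff rule approaches is roughly w = (0.5, 0.5), b = 0.25, so the linear outputs straddle the 0.5 threshold in the right way even though they never reach the targets exactly.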
Note that we won't be covering how to train neural networks (as opposed to individual TLUs) in this course. The remainder of Chapter 3 of Nilsson and pages 744-748 of the textbook do cover this, so if you are interested in using neural networks, then you should read one of these.
If you want more detail on backpropagation, the technique for training neural networks, read:
Rumelhart D.E., Hinton G.E. and Williams R.J. (1986). Learning Internal Representations by Error Propagation. In Rumelhart D.E. and McClelland J.L. (eds.), Parallel Distributed Processing, vol. 1, ch. 8, MIT Press.