Lecture #5 looked at the training of single neurons (TLUs) using gradient descent methods.
The lecture covered material from Chapter 3 of the textbook, pages 39-44. If you have the textbook, read through this section.
You should go through this material in conjunction with the notes, making sure that you at least understand why backpropagation works, even if you don't follow the calculus involved.
If you want more detail on the Widrow-Hoff procedure, you can read the original article:
Widrow, B. and Hoff, M. (1960). Adaptive switching circuits. In 1960 IRE WESCON Convention Record, volume part 4, pages 96--104.
One of the original articles to discuss the generalised delta procedure was:
Rumelhart, D.E., Hinton, G.E. and McClelland, J.L. (1986). Learning internal representations. In Rumelhart, D.E. and McClelland, J.L. (eds.), Parallel Distributed Processing, pages 318--362. MIT Press, Cambridge, Massachusetts.
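To make the Widrow-Hoff procedure concrete, here is a minimal sketch of it training a single TLU. The data set (logical AND with bipolar targets), learning rate, and epoch count are illustrative assumptions, not values from the lecture notes; the key point is that the error is computed on the linear activation, before thresholding.

```python
# A minimal sketch of the Widrow-Hoff (LMS) procedure for a single TLU.
# Data set, learning rate, and epoch count are illustrative choices.

def train_widrow_hoff(samples, eta=0.1, epochs=200):
    """samples: list of (inputs, target) pairs with targets in {-1, +1}."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear activation
            err = t - y                                   # error BEFORE thresholding
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

def classify(w, b, x):
    """Apply the threshold to the trained linear unit."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Logical AND with bipolar targets: a linearly separable toy problem.
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, b = train_widrow_hoff(data)
```

Because the update minimises squared error on the linear output, the weights converge towards the least-squares fit, which here classifies all four patterns correctly once thresholded.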
Note that we won't be covering how to train neural networks (as opposed to individual TLUs) in this course. The remainder of Chapter 3 does cover this, so if you are interested in using neural networks, then you should read this.
If you want more detail on backpropagation, the technique for training neural networks, read:
Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning Internal Representations by Error Back-propagation. In Rumelhart, D.E. and McClelland, J.L. (eds.), Parallel Distributed Processing, vol. 1, ch. 8. MIT Press.
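As a bridge between the material we did cover and this reading, here is a sketch of the generalised delta rule applied to a single sigmoid unit: the error term is multiplied by the derivative of the activation function, and backpropagation extends exactly this update to multi-layer networks by passing the delta terms backwards through the layers. The data and hyperparameters below are illustrative assumptions.

```python
# A sketch of the generalised delta rule for a SINGLE sigmoid unit.
# Data set, learning rate, and epoch count are illustrative choices.
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_delta(samples, eta=0.5, epochs=5000):
    """samples: list of (inputs, target) pairs with targets in {0, 1}."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            delta = (t - y) * y * (1.0 - y)  # error times sigmoid derivative
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
            b += eta * delta
    return w, b

def predict(w, b, x):
    """Classify by thresholding the sigmoid output at 0.5."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5 else 0

# Logical AND with 0/1 targets.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_delta(data)
```

The only change from the Widrow-Hoff sketch is the `y * (1 - y)` factor, the derivative of the sigmoid; this is what "generalises" the rule to differentiable activation functions.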