Lecture #4 looked at the basics of training neural networks, in particular the training of individual TLUs.
Notes are available in four formats:
The lecture covered material from Chapter 3 of the textbook, pages 37 to 44.
If you have the textbook, you should work through this material in conjunction with the notes, making sure that you at least understand why gradient descent works, even if you don't follow the calculus involved.
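To make the idea concrete, here is a minimal sketch of gradient descent for a single linear unit using the Widrow-Hoff (LMS / delta) rule discussed in the lecture. The learning rate, epoch count, and the AND-function training data are illustrative assumptions, not taken from the lecture or the textbook.

```python
# Illustrative sketch: Widrow-Hoff (LMS / delta rule) training of a single
# linear unit. Each update nudges the weights along the negative gradient of
# the squared error (t - y)^2, which is why gradient descent reduces error.
# Learning rate, epoch count, and training data are assumptions.

def train_widrow_hoff(samples, eta=0.1, epochs=100):
    """samples: list of (inputs, target) pairs; returns the learned weights."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                  # last weight acts as the bias
    for _ in range(epochs):
        for x, t in samples:
            xb = list(x) + [1.0]         # append a constant bias input
            y = sum(wi * xi for wi, xi in zip(w, xb))   # linear output
            # Delta rule: w <- w + eta * (t - y) * x
            w = [wi + eta * (t - y) * xi for wi, xi in zip(w, xb)]
    return w

def predict(w, x):
    """Threshold the linear output to get a TLU-style binary decision."""
    s = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if s > 0 else 0

# Learn the AND function (targets +1 / -1 for the linear unit).
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_widrow_hoff(data)
```

After training, thresholding the linear output reproduces AND. Note that the delta rule keeps adjusting the weights toward the least-squares fit even on correctly classified examples, which is the key difference from simple perceptron training.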
If you want more detail on the Widrow-Hoff procedure, you can read the original article:
Widrow, B. and Hoff, M. (1960). Adaptive switching circuits. In 1960 IRE WESCON Convention Record, volume part 4, pages 96--104.

One of the original articles to discuss the generalised delta procedure was:
Rumelhart, D.E., Hinton, G.E., and Williams, R.J. (1986). Learning internal representations by error propagation. In D.E. Rumelhart and J.L. McClelland, editors, Parallel Distributed Processing, volume 1, pages 318--362. MIT Press, Cambridge, Massachusetts.

Also worth reading is David Medler's Brief history of neural network approaches.