The paper for discussion is:
R. L. Brafman and M. Tennenholtz. R-MAX -- A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231, 2002.
The notes are: