CIS 725

  Lecture Notes, 9/10/01

I'll try to clear up some (but probably not all) of the confusing parts from lecture.  Note that this is probably one of the most mathematically intense lectures of the semester; later on we'll be focusing more on algorithms and techniques.  Don't worry if you feel like you don't quite follow all this; few of us do.  Hopefully you are getting a sense, though, of where a lot of this stuff comes from.
 

What is H'?

This is a measurement of "entropy per second" rather than "entropy per symbol".  For our purposes, it turns out that this isn't a terribly interesting idea.  The motivation is this:  suppose that the source is some kind of machine (a "Markov process," but that doesn't matter much) that goes through a series of states at a fixed rate (e.g. 5 states/second), and emits a symbol at every state.  Then, because we can measure the number of symbols appearing each second, we can talk meaningfully about entropy per second.  And the value of entropy per second is then pretty intuitive:  if the source emits m symbols per second, then H' is just mH.

But we're not that interested in that particular scenario - we're more interested in thinking about the symbols themselves.  So what we can do to simply things is just set m=1 - the system emits one symbol per second, so then we just have H' = H (numerically speaking; of course the units are different, but the meaning is quite similar).

What is GN?  And what does N have to do with anything?

G measures the "entropy per symbol in a group of N symbols."  N is relevant to the Fundamental Theorem because it ends up being related to the error term between the ideal transmission rate and the actual one.  In the proof, we're supposed to consider groups ("messages") of N symbols which we then encode using a string of bits as described by the theorem's algorithm.  The idea is that as N gets bigger and bigger, the encoding gets closer and closer to ideal.

To understand the proof of the theorem, we relied on the simple example in the paper (the symbols A, B, C, D, with some probabilities attached to them).  This example is actually too simple to show the role of N, but it's still possible to explain it, I think, without having to look too closely at a more complicated example.

Here's the idea.  Consider English (which we've mentioned a few times in class, and which Shannon spends a couple pages on - you may want to look at his techniques in the paper for constructing "statistical approximations" to English - it's both interesting and relevant to this discussion).  Each letter in the alphabet has a certain probability of occurring, as we've noted.  E has a relatively high probability; Q has a relatively low probability.  So far this sounds very much like the simple A-B-C-D example we looked at (except with more symbols).  But we can actually say more complicated things about English.  What if we consider pairs of letters?  The probability of seeing an H is (let's suppose) 0.05.  But the probability of seeing an H immediately following a T is probably a bit higher, maybe 0.07; and the probability of seeing an H immediately following a Q is much lower, almost 0.  This level of detail can't be captured by looking only at single-letter probabilities, but if we look at pairs we can get that kind of information.  So we could make (as people already have) a table of all 26x26 pairs of letters and assign them probabilities - pairs like EE, TH, IO would have relatively high probabilities, and pairs like JQ, PZ, XY would have almost zero probability.  We could extend this to triples of letters - even though TH is a relatively common pair,  the triple ETH is far more likely than VTH.  And so on.

The point is that when you have a "real" example, just looking at messages that are one letter long isn't going to let you capture the full statistical profile of the source, but by increasing N, we get closer and closer to the real measure - we're able to be increasingly clever in how we encode the symbols into bits.

So what about the part of the theorem that says the maximum transmission rate is C/H?  It still doesn't quite make sense!

True enough.  I can't give you much comfort here, except to observe that Theorems 7 and 8 of the paper, which come right before the Fundamental Theorem, provide proofs of some of the statements he makes in his argument (such as "the entropy of the encoded symbols is the same as the entropy of the symbols").  Those Theorems are tough going, though, to my way of thinking, so I'd suggest that we take them more-or-less on faith, with the understanding that the statements we were trying to wrap our heads around do require proof, which Shannon does give us.


I think that covers most of what we were struggling with; if you're still feeling confused about any of this stuff, email me, and I'll try to provide a response either here or privately.