channel capacity : theory, code, examples

This is all taken from

* chapters 8 & 9 of MacKay's "Information Theory, Inference, and Learning Algorithms" 
* chapter 5 of Biggs' "Codes : An Introduction to Information Communication and Cryptography".

Jim Mahoney | March 2017 | cs.marlboro.edu | MIT License

theory

X send --->  noise ---> Y receive

The situation is a communications channel, with X = ($x_1$, $x_2$, ...) the sent symbols and Y = ($y_1$, $y_2$, ...) the received symbols. The channel itself is described by a conditional probability matrix P(Y|X) which gives the probability of a sent $x_i$ becoming a received $y_j$ due to noise.

P(Y|X) =  P(Y=y1|X=x1)  P(Y=y1|X=x2) ...          a matrix
          P(Y=y2|X=x1)  P(Y=y2|X=x2) ...
          ...

In addition to these numbers, the probabilities P(X) are also needed to completely specify the situation.

P(X)   = P(X=x1)                                  a column vector
         P(X=x2)
         ...

Note that rows and columns of P(Y|X) as given here (with Y giving the row and X the column) are consistent with MacKay's formulation but are the transpose of Biggs' $\Gamma$ matrix.
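
As a concrete sketch of those two objects in numpy (my own illustration, not code from either book), here's a binary channel where each bit flips with an assumed probability f = 0.1, driven by a uniform input distribution; this is the "symmetric binary channel" defined in the code section below.

    import numpy as np

    f = 0.1                                  # assumed noise probability, just for illustration
    P_Y_given_X = np.array([[1 - f, f],      # row y=0 : P(y=0|x=0), P(y=0|x=1)
                            [f, 1 - f]])     # row y=1 : P(y=1|x=0), P(y=1|x=1)
    P_X = np.array([0.5, 0.5])               # P(X=x0), P(X=x1)

    assert np.allclose(P_Y_given_X.sum(axis=0), 1.0)    # each column of P(Y|X) sums to 1
    assert np.isclose(P_X.sum(), 1.0)                   # and P(X) sums to 1

Storing Y along the rows and X along the columns is what lets P(Y) come out as an ordinary matrix-times-column product below.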

The additional quantities we can calculate are then

P(y,x) = joint probability = P(y|x) * P(x) , with the property that 1 = sum over x,y ( P(x,y) )
P(y)   = sum over x ( P(y, x) ) = matrix(P(Y|X)) _matrix_multiply_ column(P(X)) , also sums to 1.
H(Y,X) = - sum over x,y ( P(x,y) * log P(x,y) ) = the entropy (uncertainty) in the whole system
H(X)   = - sum over x ( P(x) * log P(x) ) = entropy in X
H(Y)   = - sum over y ( P(y) * log P(y) ) = entropy in Y
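
Here's a minimal numpy sketch of those quantities, using the P_Y_given_X matrix and P_X vector from the example above; the helper function and its name are my own, and the logs are base 2 so the entropies come out in bits.

    import numpy as np

    def entropy(p):
        """Shannon entropy in bits of an array of probabilities, ignoring zero entries."""
        p = np.asarray(p, dtype=float).flatten()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    f = 0.1                                  # the same illustrative symmetric channel as above
    P_Y_given_X = np.array([[1 - f, f],
                            [f, 1 - f]])
    P_X = np.array([0.5, 0.5])

    P_joint = P_Y_given_X * P_X              # P(y,x) = P(y|x) * P(x) ; column x scaled by P(x)
    P_Y = P_Y_given_X @ P_X                  # P(y) = sum over x of P(y,x) = matrix times column

    H_XY = entropy(P_joint)                  # H(Y,X), entropy of the whole system
    H_X  = entropy(P_X)                      # H(X)
    H_Y  = entropy(P_Y)                      # H(Y)

A quick check that P_joint.sum() and P_Y.sum() are both 1 matches the normalization properties noted above.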

And with the relations suggested by this diagram :

|---------------- H(X,Y) -------------------|
|---- H(X) -------------------|
               |-------- H(Y) --------------|
|--- H(X|Y) ---|--- I(X;Y) ---|--- H(Y|X)---|

we can calculate from those

I(X;Y) = H(X) + H(Y) - H(X,Y)
H(X|Y) = H(X) - I(X;Y)
H(Y|X) = H(Y) - I(X;Y)
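
Those relations are simple enough to wrap into one function; here's a hedged sketch (the interface and names are my own choices) that takes the channel matrix and input distribution and returns I(X;Y) directly.

    import numpy as np

    def entropy(p):
        """Shannon entropy in bits, ignoring zero entries."""
        p = np.asarray(p, dtype=float).flatten()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def mutual_information(P_Y_given_X, P_X):
        """I(X;Y) = H(X) + H(Y) - H(X,Y) in bits, for channel matrix P(Y|X) and inputs P(X)."""
        P_joint = P_Y_given_X * P_X          # P(y,x)
        P_Y = P_Y_given_X @ P_X              # P(y)
        return entropy(P_X) + entropy(P_Y) - entropy(P_joint)

For the f = 0.1 symmetric channel with uniform inputs this should come out to about 0.53 bits; H(X|Y) and H(Y|X) then follow by subtracting I(X;Y) from H(X) and H(Y).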

Finally, if we allow P(X) to vary while keeping P(Y|X) fixed, the maximum value of I(X;Y) is C, the "channel capacity".

C is important since Shannon's "noisy-channel coding theorem" says that it's the maximum effective rate at which data can be sent reliably over the channel.

For example, if only 2 of the 4 possible two-bit symbols (00 and 11) are sent, which noise can turn into any of the 4 possibilities (00, 01, 10, 11) at the receiver, then each two-bit transmission carries log2(2) = 1 bit of information, for a rate of 1/2 = 0.5.
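
For a channel with only two input symbols, P(X) is determined by a single number p = P(x=1), so one rough way to estimate C (a brute-force scan of my own, not anything taken from the books) is to evaluate I(X;Y) over a grid of p values and keep the largest:

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float).flatten()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def mutual_information(P_Y_given_X, P_X):
        return entropy(P_X) + entropy(P_Y_given_X @ P_X) - entropy(P_Y_given_X * P_X)

    def capacity_two_inputs(P_Y_given_X, n_grid=10001):
        """Estimate C = max over P(X) of I(X;Y) for a two-input channel, by grid search over p = P(x=1)."""
        ps = np.linspace(0.0, 1.0, n_grid)
        Is = [mutual_information(P_Y_given_X, np.array([1 - p, p])) for p in ps]
        best = int(np.argmax(Is))
        return Is[best], ps[best]            # (estimated capacity in bits, best p = P(x=1))

For the symmetric binary channel defined below, the maximum should land at p = 0.5; the Z channel behaves differently, as the comparison at the end shows.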

code definitions

symmetric binary channel

   P(y=0 | x=0) = 1 - f       P(y=0 | x=1) = f
   P(y=1 | x=0) = f           P(y=1 | x=1) = 1 - f
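
One possible way to write that as code, as a small helper whose name is my own choice:

    import numpy as np

    def symmetric_binary_channel(f):
        """P(Y|X) matrix for the symmetric binary channel with flip probability f."""
        return np.array([[1 - f, f],
                         [f, 1 - f]])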

MacKay's Z channel

   P(y=0 | x=0) = 1           P(y=0 | x=1) = f
   P(y=1 | x=0) = 0           P(y=1 | x=1) = 1 - f
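
And the corresponding sketch for the Z channel:

    import numpy as np

    def Z_channel(f):
        """P(Y|X) matrix for MacKay's Z channel: x=0 arrives perfectly, x=1 flips to 0 with probability f."""
        return np.array([[1.0, f],           # row y=0 : P(y=0|x=0)=1 , P(y=0|x=1)=f
                         [0.0, 1 - f]])      # row y=1 : P(y=1|x=0)=0 , P(y=1|x=1)=1-f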

Notice that for the Z channel

* The capacity is a bit larger than that of the symmetric channel with the same f, and

* The maximum of I(X;Y) is not at P(x=1) = 0.5 but at a somewhat smaller value of P(x=1), as the numerical check below illustrates.
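
Both of those claims can be checked numerically with the pieces sketched above; here they are pulled together into one self-contained snippet, again with an assumed f = 0.1 chosen only for illustration.

    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float).flatten()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def mutual_information(P_Y_given_X, P_X):
        return entropy(P_X) + entropy(P_Y_given_X @ P_X) - entropy(P_Y_given_X * P_X)

    def capacity_two_inputs(P_Y_given_X, n_grid=10001):
        ps = np.linspace(0.0, 1.0, n_grid)
        Is = [mutual_information(P_Y_given_X, np.array([1 - p, p])) for p in ps]
        best = int(np.argmax(Is))
        return Is[best], ps[best]

    f = 0.1
    symmetric = np.array([[1 - f, f], [f, 1 - f]])
    Z = np.array([[1.0, f], [0.0, 1 - f]])

    C_sym, p_sym = capacity_two_inputs(symmetric)
    C_Z, p_Z = capacity_two_inputs(Z)
    print("symmetric : C = {:.4f} bits at P(x=1) = {:.3f}".format(C_sym, p_sym))
    print("Z channel : C = {:.4f} bits at P(x=1) = {:.3f}".format(C_Z, p_Z))

If the sketch is right, the Z channel's capacity should come out higher than the symmetric channel's, and its best P(x=1) should land below 0.5, which is what the two bullet points above describe.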