Information Theory: A Concise Introduction
by Stefan Hollos & J. Richard Hollos

Publisher: Abrazol Publishing
Published: 2015-06-04


Principle of Maximum Entropy

We end the chapter by looking at a simple example using the principle of maximum entropy. This is a technique for inferring a probability distribution that was first developed by the American physicist Edwin T. Jaynes in a paper titled “Information Theory and Statistical Mechanics” (see references). The general idea is to choose a probability distribution that is consistent with what is known and that does not make any unwarranted assumptions.

Knowledge is expressed as a set of constraint equations that are generally insufficient to uniquely determine the distribution. The extra condition imposed is that the correct distribution should be the one that maximizes the entropy. This is the distribution that introduces no additional assumptions. The idea is best understood with a very simple example.

Sparky is a tattoo artist who offers three different tattoos priced at $8, $12, and $16. At the end of the week he knows how many tattoos he did in total and how much money he made, but he forgot to keep track of how many of each of the three tattoos were sold. He asks Spike, his mathematician friend, to help him figure it out. Taking the total amount Sparky made, A, and dividing by the number of tattoos, N, gives Spike the average cost of a tattoo, a = A/N. Letting p1, p2, and p3 be the probabilities of the $8, $12, and $16 tattoos respectively, he can then set up the following equation for the average cost of a tattoo.

$$8 p_1 + 12 p_2 + 16 p_3 = a \tag{65}$$

He gets another equation from the fact that the probabilities must sum to 1.

$$p_1 + p_2 + p_3 = 1 \tag{66}$$

Now Spike has two equations with three unknowns. There are many possible solutions. How can he find the correct one? He decides to use the distribution that maximizes the entropy, which is given by

$$H = -p_1 \log_2 p_1 - p_2 \log_2 p_2 - p_3 \log_2 p_3 \tag{67}$$

Using equations (65) and (66) he can write p2 and p3 as follows

$$p_2 = \frac{16 - a}{4} - 2 p_1, \qquad p_3 = \frac{a - 12}{4} + p_1 \tag{68}$$
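Filling in the algebra, which the text leaves to the reader: from equation (66), p2 = 1 - p1 - p3. Substituting this into equation (65),

$$8 p_1 + 12(1 - p_1 - p_3) + 16 p_3 = a \quad\Longrightarrow\quad p_3 = \frac{a - 12}{4} + p_1,$$

and back-substituting into p2 = 1 - p1 - p3 gives the expression for p2.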

Substituting these into equation (67) gives him an expression for the entropy in terms of the probability p1. Next he finds the maximum of H(p1) by taking the derivative with respect to p1, setting the result equal to zero, and solving for p1. After checking that he has a maximum and not a minimum, he gets the following expression for the value of p1 that maximizes the entropy.

$$p_1 = \frac{52 - 3a - \sqrt{72 a - 3 a^2 - 368}}{24} \tag{69}$$
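A sketch of the calculus behind (69), since the text states only the result: using natural logarithms, which change H only by a constant factor and so do not move the maximum, and dp2/dp1 = -2, dp3/dp1 = 1 from equation (68),

$$\frac{dH}{dp_1} \;\propto\; -\ln p_1 + 2 \ln p_2 - \ln p_3 = 0 \quad\Longrightarrow\quad p_2^2 = p_1 p_3.$$

Substituting equation (68) and simplifying yields the quadratic $48 p_1^2 - 4(52 - 3a)\,p_1 + (16 - a)^2 = 0$; the root with the minus sign is the only one that keeps all three probabilities in [0, 1], and it is equation (69).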

The value of a must be in the range [8, 16]. At a = 8 all the tattoos must have been the $8 tattoo, and at a = 16 they must all have been the $16 tattoo. Checking, he gets p1(8) = 1 and p1(16) = 0, which is correct. A plot of the three probabilities as a function of a is shown in figure (18).
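As a quick numerical sanity check, here is a short Python sketch (our own, not from the book; the function names are hypothetical) that implements equations (68) and (69), confirms the endpoint values p1(8) = 1 and p1(16) = 0, and cross-checks the interior solution at a = 12 against a brute-force grid search:

from math import log2, sqrt

def max_entropy_probs(a):
    """Return (p1, p2, p3) maximizing the entropy for an average price a in [8, 16]."""
    p1 = (52 - 3 * a - sqrt(72 * a - 3 * a**2 - 368)) / 24   # equation (69)
    p2 = (16 - a) / 4 - 2 * p1                               # equation (68)
    p3 = (a - 12) / 4 + p1                                   # equation (68)
    return p1, p2, p3

def entropy(ps):
    # Entropy in bits; zero-probability terms contribute nothing.
    return -sum(p * log2(p) for p in ps if p > 0)

# Endpoint checks from the text.
print(max_entropy_probs(8))    # -> (1.0, 0.0, 0.0)
print(max_entropy_probs(16))   # -> (0.0, 0.0, 1.0)

# At a = 12 the answer should be uniform, since the mean sits at the middle price.
p = max_entropy_probs(12)
print(p, entropy(p))           # -> (0.333..., 0.333..., 0.333...), 1.58496...

# Brute-force cross-check at a = 12: from (68), p2 = 1 - 2*p1 and p3 = p1,
# so the feasible range is 0 < p1 < 0.5. Grid-search p1 for maximum entropy.
best = max(
    ((p1, 1 - 2 * p1, p1) for p1 in (i / 10000 for i in range(1, 5000))),
    key=entropy,
)
print(best)                    # p1 close to 1/3, agreeing with the closed form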

Figure 18. Tattoo probabilities as a function of a.


