Updated 2017-10-31
How much information is required to to represent some outcome. If the outcome is either true or false, then only 1 bit is required to store the information with no loss.
Information measure should take into account the probabilities of outcomes to minimize the “footprint” of the information stored.
Suppose there are symbols: . Then the information in each symbol is measured as
If the symbols are bits in a computer, then and is number of bits.
Requirements
Example:
You got admitted to Stanford university (4.8% admission rate), then the chance is about 1 in 21. It follows that the information measured is bits.
However, it was all a dream; you did not get into Stanford. Then you’re part of the 95.2% that was rejected, or 20 in 21, or 0.952. Thus, the information measured is bits.
The total information in a message with symbols and an alphabet size of is given by
The average information is the entropy, which is
Notice that it can be also expressed as .
The entropy tells us the minimum number of bits required to represent each symbol. Any less then we have a lossy encoding / compression.
According to law of large numbers, on average, any symbol will appear times. Where is the length in number of bits, and is the probability of occurrence. This approximation is only accurate as .
In a stream of information that has bits, number of typical sequences are
It follows that we need bits to represent a typical sequence.
Assuming that each typical sequence is equally likely, then the probability of a typical sequence is
It is a method of encoding symbols with codewords. The idea centers around prefix-free code: meaning that the codeword is never a prefix to other symbols’ codewords.
Huffman coding allows the symbol that appears the most use codewords that have the least length. Therefore number of bits that is required to be transmitted is minimized.