Probability and Information
The information in two independent messages should be the sum of the information in each message. The probability of receiving the two messages is the product of the probability of receiving each of the messages. Information, like entropy, should be added when the probabilities are multiplied. Considerations like these led Shannon to adopt Boltzmann's formula for entropy, even though there is no absolute temperature to associate with the messages.
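This "add the information when the probabilities multiply" rule is exactly the defining property of the logarithm, and it can be checked numerically. A minimal Python sketch (the helper name `info_bits` is ours; it uses the negative-logarithm-of-probability formula the text develops below):

```python
import math

def info_bits(p):
    """Bits of information in a message received with probability p."""
    return -math.log2(p)

p1, p2 = 0.5, 0.25        # probabilities of two independent messages
p_both = p1 * p2          # probabilities multiply...

# ...and the information adds:
print(info_bits(p1) + info_bits(p2))   # 1.0 + 2.0 = 3.0 bits
print(info_bits(p_both))               # 3.0 bits as well
```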
Let’s think about living in a place with a very predictable climate. In Costa Rica during the rainy season, for example, mornings are bright and sunny, but there are showers in the afternoon, which clear again by nightfall. The weather prediction day after day is just the same. People don’t bother to listen to the weather report, because it contains almost no information. If the probability is nearly one, the information is nearly zero.
We know that 1 is the result of raising 2 or 10 or e or any other number to the zeroth power, because we multiply or divide by the number zero times. That is, the logarithm of the probability is 0 when the probability is 1.
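A short Python sketch makes the limit concrete: as the probability of a message approaches 1, the negative logarithm of the probability approaches 0, so a nearly certain message carries almost no information.

```python
import math

# The more certain the message, the less information it carries:
# as p approaches 1, -log2(p) approaches 0.
for p in (0.5, 0.9, 0.99, 0.999):
    print(p, -math.log2(p))
```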
Parents in the first few months of expecting a new baby think about the sex of the child rather differently from how Costa Ricans think about their weather. Chances are about even of having a boy or a girl. “It’s a toss-up,” we say, referring to similar chances of a flipped coin landing heads or tails. If there are two equally likely outcomes, the probability of each is ½. How much information is there in a message like “It’s a girl!”? The number of bits in a message is equal to the number of yes-or-no questions the message answers when either answer is equally likely. If we ask, “Is it a girl?” a yes or no answer gets us all the information.
Information is easier to calculate than entropy because the constant that multiplies the logarithm in the information formula is simply minus one (–1). If we take the logarithm to the base 2 we will get the amount of information in bits. The logarithm to the base 2 of ½ is minus one. That is, one must raise 2 to the negative first power to get ½. Multiplying the constant (–1) times the logarithm (–1) gives positive one (+1). The information in the message is one bit.
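The arithmetic in this paragraph can be carried out directly; a one-line sketch in Python:

```python
import math

# Information = (constant -1) times the base-2 log of the probability.
p = 0.5                    # "It's a girl!" with equally likely outcomes
info = -1 * math.log2(p)   # (-1) * (-1) = +1
print(info)                # 1.0 bit
```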
A binary digit is a zero or a one. As there are only two possibilities, we can get the value of a binary digit by asking just one question. Therefore the number of bits is the minimum number of yes-or-no questions we must ask to get that amount of information. This makes the bit intuitively the ideal unit for measuring information.
The computer word “byte” is a variant spelling of the ordinary word “bite.” Computer people spell “bite” with a “y” to make it more sophisticated, in a vain attempt to disguise a silly play on words. A byte is a series of bits. Early computers had their bits organized in series of various lengths, 5, 6, 7, or 8, but now most computer manufacturers have settled on a standard series length of 8 bits. Eight yes-or-no questions determine 256 possibilities, since 2 raised to the eighth power is equal to 256. To calculate information in bytes, use 256 as the logarithm base.
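Taking the logarithm to base 256 instead of base 2 measures the same information in bytes rather than bits, as this sketch illustrates:

```python
import math

# Eight yes-or-no questions determine 2**8 = 256 possibilities.
assert 2 ** 8 == 256

# With 256 as the logarithm base, information comes out in bytes:
p = 1 / 256               # one of 256 equally likely messages
print(-math.log(p, 256))  # about 1 byte
print(-math.log2(p))      # 8.0 bits
```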
Computer programmers say that half a byte, 4 bits, is a nibble. Thus, 20 questions can get 5 nibbles of information. A nibble determines 16 possibilities. “Twenty questions” is a parlor game in which the players ask up to 20 yes-or-no questions trying to identify an item the quizmaster says is either “animal,” “vegetable,” or “mineral.” People are often surprised to find out how many different items a skillful questioner can identify with 20 bits of information. The number 2 to the 20th power indicates 1 048 576 equally likely possibilities.
It’s hard to find a calculator or a logarithm table that gives logarithms for base 2 or base 256. We can, however, use any logarithm calculator or table we find, because there is a simple formula for converting from an arbitrary base to a desired base. We just take the logarithm of the probability in whatever base is available. Also we take the logarithm of the desired base in the available base. Finally, we divide the first logarithm by the second.
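The change-of-base recipe can be written out in a few lines; here is a sketch that uses base-10 logarithms (the kind any calculator or table provides) to get base-2 and base-256 results. The function name `log_in_base` is ours:

```python
import math

def log_in_base(x, base):
    """Change of base: take the log of x and the log of the desired base
    in an available base (here 10), then divide the first by the second."""
    return math.log10(x) / math.log10(base)

# Base-2 and base-256 logarithms from base-10 logarithms:
print(log_in_base(0.5, 2))        # about -1, i.e. 1 bit of information
print(log_in_base(1 / 256, 256))  # about -1, i.e. 1 byte of information
```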