## Multiplying Probability and Adding Entropy

To understand the relationship between probability and entropy, let’s stop talking about steam engines for a while and examine a more familiar example. Suppose a group of people asks you to take a picture of them. How many pictures should you take to be fairly sure of getting one good picture? This depends on your skill as a photographer. It also depends on the subjects. If they are not professional models they probably won’t strike their best poses spontaneously. Your task is to humor them, to organize them nicely so the picture will look good. You may have to ask your subjects to stand in front of a pleasing background, turn to face the light, move closer together, stand still, make sure they can all see the camera, and smile. Even then they may not all be looking their best when you click the shutter. At any given moment some of the subjects may be blinking, squinting, sneezing, slouching, or hiding behind someone else.

If subjects pose independently, what is the probability of all of them looking good at the same time? Independent probabilities should be multiplied together. Suppose each person looks good only half the time. The chances of two looking good at the same time are only ½ times ½, which multiplies out to ¼. If you want a reasonable chance of catching everyone looking good at the same time, you have to make more tries. With one subject, you should take at least two pictures. With two subjects, you should take at least four pictures. Continuing in the same way, you should take 1024 pictures of ten subjects, but they will never wait that long. Does this explain why pictures of large groups of people hardly ever please everybody?
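The arithmetic in this paragraph can be checked with a short Python sketch. It assumes, as the text does, that each subject independently looks good half the time. The function names are invented for illustration, and the "number of pictures" is read here as the average number of shots needed to catch one good picture, which is 1 divided by the probability of success.

```python
from fractions import Fraction

def chance_all_look_good(subjects, p_each=Fraction(1, 2)):
    """Probability that every subject looks good in a single shot.

    Assumes the subjects pose independently, so the individual
    probabilities are multiplied together.
    """
    return p_each ** subjects

def average_shots_needed(subjects, p_each=Fraction(1, 2)):
    """Average number of shots taken before one catches everybody looking good."""
    return 1 / chance_all_look_good(subjects, p_each)

# One, two, and ten subjects, as in the text.
for n in (1, 2, 10):
    print(n, chance_all_look_good(n), average_shots_needed(n))
```

Each added subject halves the chance of a good picture and doubles the number of shots, which is the multiplication the paragraph describes.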

Notice that adding people means multiplying probabilities. Multiplying is harder than adding. School children learn to add and subtract first. Later they learn to multiply and divide. Students these days seldom learn the harder operations well, because they plan to do most of their figuring on a pocket calculator. Much later, in high school perhaps, a few mathematically inclined students may learn about an operation that lets them add instead of multiply. Let’s illustrate an easy case first. Suppose we are asked to multiply 256 by 16. Without a calculator that is hard work. If we have to multiply, almost the easiest multiplier is 2. Multiplying by 2 is the same as adding together two copies of the number we are supposed to multiply. Now let’s notice something about the two numbers we are supposed to multiply together, 256 and 16. We can get 16 by multiplying four 2’s together. That is, 16 is equal to the product 2 × 2 × 2 × 2. A convenient way of writing this is 16 = 2⁴. The superscript "4" means that we multiplied four 2’s together. Some people have multiplied 2’s together often enough to recognize that 256 = 2⁸. Therefore 256 × 16 is the same as 2⁸ × 2⁴ = 2¹². We got the last result by adding 8 and 4 to make 12 instead of multiplying. However, we have to recognize 2¹² or consult a table of powers of 2, if we want to use addition instead of multiplication to find the final result, 4096.
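As a rough illustration of the 256 × 16 example, the following Python sketch multiplies two whole powers of 2 by adding their exponents. The helper names are hypothetical, and the sketch deliberately handles only exact powers of 2, the easy case described here.

```python
def power_of_two_exponent(n):
    """Return k such that 2**k == n, where n is a positive whole power of 2."""
    if n <= 0:
        raise ValueError("expected a positive whole number")
    k = n.bit_length() - 1          # position of the leading binary digit
    if 2 ** k != n:
        raise ValueError(f"{n} is not a whole power of 2")
    return k

def multiply_by_adding(a, b):
    """Multiply two powers of 2 by adding exponents instead of multiplying."""
    exponent = power_of_two_exponent(a) + power_of_two_exponent(b)
    return 2 ** exponent            # the "table lookup" step in the text

print(power_of_two_exponent(256), power_of_two_exponent(16))  # 8 4
print(multiply_by_adding(256, 16))                            # 4096
```

The final `2 ** exponent` stands in for recognizing 2¹² or consulting a table of powers of 2.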

Adding instead of multiplying is the easy part, but there is some hard work to get ready, and some more hard work to interpret the result. We need to introduce the notion of a function that tells how big a number is. Let’s start by considering the numbers ¼, ½, 1, 2, 4, 8, 16, 32, and 64. Recognizing that all those numbers are what mathematicians call “powers of two,” we can write them as 2⁻², 2⁻¹, 2⁰, 2¹, 2², 2³, 2⁴, 2⁵, and 2⁶. The superscript number is the power or exponent. All are powers of two because each is written as the number 2 carrying a superscript. We have already noted that 2 × 2 × 2 × 2 = 16 = 2⁴. In a similar way we can understand the power or exponent as indicating the number of 2’s we must multiply together to get 2², 2³, 2⁴, 2⁵, and 2⁶. How are we to understand 2¹? There is no multiplying in this case, but we need one 2 to get 2¹. The first entry on the list, 2⁻², has a negative exponent. Since positive exponents refer to a process of multiplying factors of 2, negative exponents refer to a process of dividing by factors of 2. This has a certain logic because negative numbers “undo” positive numbers and division “undoes” multiplication. However, division introduces a certain complication. We must specify what it is that the factors of 2 are dividing. The simplest choice is the number 1. The number ½ is 1 divided by 2, and the number ¼ is 1 divided by two factors of 2. Thus we get ¼ = 2⁻² and ½ = 2⁻¹. Finally, how do we interpret 2⁰? It is 1 neither divided by any 2’s nor multiplied by any 2’s. This gives 2⁰ = 1.

In summary, ¼ = 2⁻², ½ = 2⁻¹, 1 = 2⁰, 2 = 2¹, 4 = 2², 8 = 2³, 16 = 2⁴, 32 = 2⁵, and 64 = 2⁶. Notice that as the numbers increase rapidly in size, the power of 2, or exponent, increases in constant steps of 1 each time. A power of 2 gauges how big a number is. A little farther on we will explain the name mathematicians give to this gauge.
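The list of powers of two summarized here can be rebuilt with a small Python sketch. It follows the reading given in the text: start from 1, multiply by 2 for each step of a positive exponent, and divide by 2 for each step of a negative one. The function name is made up for this illustration.

```python
from fractions import Fraction

def power_of_two(exponent):
    """Build 2 to a whole-number exponent starting from 1.

    Positive exponents multiply by 2 that many times, negative exponents
    divide by 2 that many times, and an exponent of 0 leaves 1 untouched.
    """
    value = Fraction(1)
    for _ in range(abs(exponent)):
        value = value * 2 if exponent > 0 else value / 2
    return value

# Exponents -2 through 6 give the nine numbers from 1/4 up to 64.
for k in range(-2, 7):
    print(k, power_of_two(k))
```

The exponent climbs in steps of 1 while the value doubles each time, which is why the exponent works as a gauge of size.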

There is a similar concept called the “order of magnitude” for a series of numbers like 0.01, 0.1, 1.0, 10.0, 100.0, and 1000.0. The order of magnitude tells how big they are. For the six numbers in the series the respective orders of magnitude are –2, –1, 0, 1, 2, and 3. How do we figure? In the above six numbers there is only one significant figure and it is 1 in every case. We wrote the decimal in its proper place for all of the numbers, and we emphasized the decimal by putting a zero to the left or right of it, even though we usually do not write the decimal for numbers like 1, 10, 100, and 1000. The order of magnitude is the number of times we have to shift from a position just to the right of the leading significant figure to reach the decimal. When we shift to the right, as we do for the numbers 10.0, 100.0, and 1000.0, the order of magnitude is positive. But if we have to shift to the left, as we do for 0.01 and 0.1, the order of magnitude is negative. The order of magnitude of 1.0 is zero since no shifts are required.

Orders of magnitude are the same as powers of ten. In the paragraph above we could have talked about multiplying or dividing by ten instead of talking about shifting to the right or left. Shifting is the same as multiplying or dividing by 10 because Arabic numbers (the system we commonly use for numbers) are based on ten, probably because most people have ten fingers.
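The shifting rule for orders of magnitude can be sketched in Python as repeated multiplication or division by ten. This is only an illustration: the function name is invented, and `Decimal` is used so that numbers like 0.1 are handled exactly rather than as binary approximations.

```python
from decimal import Decimal

def order_of_magnitude(number):
    """Count decimal shifts needed to bring a positive number into the range 1 to 10.

    Dividing by ten counts as a shift to the right (positive order);
    multiplying by ten counts as a shift to the left (negative order).
    """
    x = Decimal(str(number))
    if x <= 0:
        raise ValueError("only positive numbers are handled in this sketch")
    shifts = 0
    while x >= 10:
        x /= 10
        shifts += 1
    while x < 1:
        x *= 10
        shifts -= 1
    return shifts

for text in ("0.01", "0.1", "1.0", "10.0", "100.0", "1000.0"):
    print(text, order_of_magnitude(text))
```

For numbers that are not 1 followed or preceded by zeros, such as 250, the same count gives the order of magnitude of the leading digit's position.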

What happens if we want to know how big a number is when the number is 3 or 7 or any other number that is neither a power of 2 nor 1 multiplied or divided by 10 several times? The closest power of 2 or order of magnitude still tells roughly how big the number is, but mathematicians desire precision, not rough estimates. There is a function, called a logarithm, which applies to all positive numbers. A logarithm is a smooth curve that fills in between the steps in the powers of two or the order of magnitude. Logarithm is abbreviated as “log” and often carries a subscript to distinguish how it is used. If we are talking about powers of 2 the function is log₂, read “logarithm to the base 2.” For orders of magnitude the function is log₁₀, read “logarithm to the base 10.”
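To see how the logarithm fills in between the whole-number steps, here is a minimal Python sketch using the standard `math.log2` and `math.log10` functions. The wrapper name `size_gauges` is an assumption made for this example.

```python
import math

def size_gauges(number):
    """Return (logarithm to the base 2, logarithm to the base 10) of a positive number."""
    return math.log2(number), math.log10(number)

# 3 and 7 fall between whole powers of 2; 2, 4, 8 and 10 land exactly on a step.
for n in (2, 3, 4, 7, 8, 10):
    base_2, base_10 = size_gauges(n)
    print(n, round(base_2, 3), round(base_10, 3))
```

The base-2 logarithm of 3 falls between 1 and 2, just as 3 falls between 2¹ and 2², and the base-2 logarithm of 7 falls between 2 and 3.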

We can see that the power of 2, the order of magnitude, and the logarithm function all grow very slowly as the numbers grow rapidly. Nevertheless, they all grow without limit as the number grows without limit. As the number gets closer and closer to zero, the power of 2, the order of magnitude, and the logarithm all head toward negative infinity.

Before using addition instead of multiplication to obtain the product of two numbers, one has to find the logarithm of each of the numbers. To find the product we then raise the base to the power equal to the sum of the two logarithms.
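The two-step recipe in this paragraph (find the logarithms, add them, then raise the base to the sum) might be sketched in Python as follows. The function name and the default base of 10 are choices made for this illustration, and the result carries the small rounding error of floating-point arithmetic.

```python
import math

def multiply_with_logs(a, b, base=10.0):
    """Multiply two positive numbers by adding their logarithms."""
    log_sum = math.log(a, base) + math.log(b, base)   # addition replaces multiplication
    return base ** log_sum                            # convert the sum back to a number

print(multiply_with_logs(256, 16, base=2.0))   # close to 4096
print(multiply_with_logs(3, 7))                # close to 21
```

Any base works as long as the same base is used for both logarithms and for the final step.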

Logarithms appear in formulas that relate information or entropy to probability. Information, according to Shannon’s formula, is minus 1 times the logarithm to the base 2 (log₂) of the probability of the message that carries the information. Entropy, according to Boltzmann’s formula, is Boltzmann’s constant times the natural logarithm of the probability of a given state of motion of particles or the number of possible wrong arrangements of things. Natural logarithms (logₑ, or sometimes ln) are those that use as their base the special number *e* = 2.71828 18284 59045 23536 02874 71352 66… The dots after the number mean that there is an endless string of additional digits afterwards, so *e* can never be expressed exactly in decimal notation. Explaining why mathematicians think this number is “natural” is beyond the scope of this book.

When we use Boltzmann’s formula to find the entropy from probability, the entropy comes out in watt-seconds per kelvin. For example, in the cylinder of a steam engine the water molecules are most likely bouncing around in every direction. The probability of this state is just slightly less than one. The logarithm of one is zero. Therefore the entropy of the usual state is just below zero. If it is nearly zero, why do we say the steam is in a state of high entropy? We do not ordinarily think of a number close to zero as a high number. The answer is that the entropy of any more organized state is much less than zero. For instance, if all the molecules were moving in parallel straight lines perpendicular to the face of the piston, they could impart all the energy of their motion to the piston, and the steam engine would be 100 percent efficient, as we said before. The probability of finding all the molecules moving that way is just above zero. The logarithm of a positive number close to zero is negative and large in magnitude, like minus a thousand or minus a million or minus millions of millions of millions. This makes the entropy for the normal, disorderly state of motion much larger than the entropy for the unusual, highly ordered state of straight-line motion idealized above.
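The two formulas described in words can be tried out with a small Python sketch. It takes the formulas exactly as this section states them, with entropy written as Boltzmann’s constant times the natural logarithm of a probability. The function names and the sample probabilities are assumptions chosen only for illustration, not measured values.

```python
import math

# Boltzmann's constant in joules (watt-seconds) per kelvin.
BOLTZMANN_CONSTANT = 1.380649e-23

def shannon_information_bits(probability):
    """Information of a message: minus 1 times log base 2 of its probability."""
    return -math.log2(probability)

def boltzmann_entropy(probability):
    """Entropy as stated in the text: Boltzmann's constant times ln(probability)."""
    return BOLTZMANN_CONSTANT * math.log(probability)

# Multiplying independent probabilities adds information and adds entropy.
p1, p2 = 0.5, 0.25
print(shannon_information_bits(p1 * p2))                          # 3.0
print(shannon_information_bits(p1) + shannon_information_bits(p2))  # 3.0

# A likely, disorderly state versus an extremely unlikely, orderly one
# (illustrative probabilities only).
print(boltzmann_entropy(0.999))    # just below zero
print(boltzmann_entropy(1e-300))   # far below zero
```

Because the logarithm turns products into sums, the entropy of two independent systems taken together is the sum of their separate entropies, which is the point of this section’s title.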

We now welcome back those readers who left us when we mentioned logarithms. Most people get on fine with an intuitive understanding of order and disorder. Engineers need advanced math to calculate precisely how much disorder there is. The main thing to remember is that entropy increases with disorder, and disorder is much more usual than order. With that we can understand the connection between order and information.