Multiplying Probability and Adding Entropy

Adding instead of multiplying is the easy part, but there is some hard work to get ready, and some more hard work to interpret the result. We need to introduce the notion of a function that tells how big a number is. Let’s start by considering the numbers ¼, ½, 1, 2, 4, 8, 16, 32, and 64. Recognizing that all those numbers are what mathematicians call “powers of two,” we can write them as 2⁻², 2⁻¹, 2⁰, 2¹, 2², 2³, 2⁴, 2⁵, and 2⁶. The superscript number is the power or exponent. All are powers of two because all are superscripts on the number 2.

We have already noted that 2 × 2 × 2 × 2 = 16 = 2⁴. In a similar way we can understand the power or exponent as indicating the number of 2’s we must multiply together to get 2², 2³, 2⁴, 2⁵, and 2⁶. How are we to understand 2¹? There is no multiplying in this case, but we need one 2 to get 2¹.

The first entry on the list, 2⁻², has a negative exponent. Since positive exponents refer to a process of multiplying factors of 2, negative exponents refer to a process of dividing by factors of 2. This has a certain logic because negative numbers “undo” positive numbers and division “undoes” multiplication.

However, division introduces a certain complication. We must specify what it is that the factors of 2 are dividing. The simplest choice is the number 1. The number ½ is 1 divided by 2, and the number ¼ is 1 divided by two factors of 2. Thus, we get ¼ = 2⁻² and ½ = 2⁻¹.

Finally, how do we interpret 2⁰? It is 1 neither divided by any 2’s nor multiplied by any 2’s. This gives 2⁰ = 1.

In summary, ¼ = 2⁻², ½ = 2⁻¹, 1 = 2⁰, 2 = 2¹, 4 = 2², 8 = 2³, 16 = 2⁴, 32 = 2⁵, and 64 = 2⁶. Notice that as the numbers increase rapidly in size, the power of 2 or exponent increases in constant steps of 1 each time. A power of 2 gauges how big a number is. A little farther on we will explain the name mathematicians give to this gauge.
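The summary above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name power_of_two is made up for this example, and exact fractions are used so that ¼ and ½ appear as fractions rather than decimals.

```python
from fractions import Fraction

def power_of_two(exponent):
    """Return 2 raised to a whole-number exponent as an exact fraction."""
    # A negative exponent divides 1 by that many factors of 2.
    return Fraction(2) ** exponent

# Exponents -2 through 6 reproduce the nine numbers in the summary.
powers = {n: power_of_two(n) for n in range(-2, 7)}

for n, value in powers.items():
    print(f"2^{n} = {value}")
```

Each step of 1 in the exponent doubles the number, which is why the exponent climbs steadily while the number itself grows rapidly.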

There is a similar concept called the “order of magnitude” for a series of numbers like 0.01, 0.1, 1.0, 10.0, 100.0, and 1000.0. The order of magnitude tells how big they are. For the six numbers in the series the respective orders of magnitude are –2, –1, 0, 1, 2, and 3. How do we figure this out? In the above six numbers there is only one significant figure and it is 1 in every case. We wrote the decimal point in its proper place for all of the numbers, and we emphasized it by putting a zero to the left or right of it, even though we usually do not write the decimal point for numbers like 1, 10, 100, and 1000. The order of magnitude is the number of times we must shift from a position just to the right of the leading significant figure to reach the decimal point. When we shift to the right, as we do for the numbers 10.0, 100.0, and 1000.0, the order of magnitude is positive. But if we must shift to the left, as we do for 0.01 and 0.1, the order of magnitude is negative. The order of magnitude of 1.0 is zero since no shifts are required.

Orders of magnitude are the same as powers of ten. In the paragraph above we could have talked about multiplying or dividing by ten instead of talking about shifting to the right or left. Shifting is the same as multiplying or dividing by 10 because Arabic numbers (the system we commonly use for numbers) are based on ten, probably because most people have ten fingers.
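The idea that shifting is the same as multiplying or dividing by ten can be sketched in Python. The function name order_of_magnitude is an assumption made for this example. The number is passed as text and handled with the standard decimal module so that values like 0.1 stay exact.

```python
from decimal import Decimal

def order_of_magnitude(text):
    """Count the multiplications or divisions by 10 needed to bring a
    positive number into the range from 1 up to (but not including) 10."""
    x = Decimal(text)
    if x <= 0:
        raise ValueError("only positive numbers have an order of magnitude here")
    shifts = 0
    while x >= 10:   # too big: divide by ten, count one positive shift
        x /= 10
        shifts += 1
    while x < 1:     # too small: multiply by ten, count one negative shift
        x *= 10
        shifts -= 1
    return shifts

for number in ["0.01", "0.1", "1.0", "10.0", "100.0", "1000.0"]:
    print(number, order_of_magnitude(number))
```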

What happens if we want to know how big a number is when the number is 3 or 7 or any other number that is neither a power of 2 nor 1 multiplied or divided by 10 several times? The closest power of 2 or order of magnitude still tells roughly how big the number is, but mathematicians desire precision, not rough estimates. There is a function, called a logarithm, which applies to all positive numbers. A logarithm is a smooth curve that fills in between the steps in the powers of two or the order of magnitude. Logarithm is abbreviated as “log” and often carries a subscript to distinguish how it is used. If we are talking about powers of 2 the function is log₂, read “logarithm to the base 2.” For orders of magnitude the function is log₁₀, read “logarithm to the base 10.”
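Python's standard math module provides both of these functions, so we can see how the logarithm fills in between the steps. The helper name size_gauges is invented for this sketch.

```python
import math

def size_gauges(x):
    """Return (log to the base 2, log to the base 10) of a positive number."""
    return math.log2(x), math.log10(x)

# 3 and 7 are neither powers of 2 nor powers of 10;
# 8 and 1000 are included for comparison with the whole-number steps.
for x in (3, 7, 8, 1000):
    base2, base10 = size_gauges(x)
    print(f"{x}: log2 = {base2:.3f}, log10 = {base10:.3f}")
```

The base-2 logarithm of 3 falls between 1 and 2 because 3 lies between 2¹ and 2², and that of 7 falls between 2 and 3 because 7 lies between 2² and 2³.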

We can see that the power of 2, the order of magnitude, and the logarithm function all grow very slowly as the numbers grow rapidly. Nevertheless, they all keep growing without limit as the number grows without limit. As the number gets closer and closer to zero, the power of 2, the order of magnitude, and the logarithm all approach negative infinity.

Before using addition instead of multiplication to obtain the product of two numbers, one must find the logarithm of each of the numbers and add the two logarithms together. To find the product we then raise the base to the power equal to that sum.
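The recipe can be written out as a short Python sketch. The function name multiply_via_logs and the default base of 2 are choices made for this illustration. Because computers store logarithms only approximately, the result may differ from the true product in the last decimal places.

```python
import math

def multiply_via_logs(a, b, base=2.0):
    """Multiply two positive numbers by adding their logarithms."""
    log_a = math.log(a, base)      # step 1: logarithm of each number
    log_b = math.log(b, base)
    total = log_a + log_b          # step 2: add instead of multiply
    return base ** total           # step 3: raise the base to the sum

print(multiply_via_logs(8, 4))            # logs 3 and 2 add to 5, and 2 to the 5th is 32
print(multiply_via_logs(3, 7, base=10))   # close to 21
```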

Logarithms appear in formulas that relate information or entropy to probability. Information, according to Shannon’s formula, is minus 1 times the logarithm to the base 2 (log₂) of the probability of the message that carries the information. Entropy, according to Boltzmann’s formula, is Boltzmann’s constant times the natural logarithm of the probability of a given state of motion of particles or the number of possible wrong arrangements of things. Natural logarithms (logₑ or sometimes ln) are those that use as their base the special number e = 2.71828 18284 59045 23536 02874 71352 66… The dots after the number mean that there is an endless string of additional digits afterwards, so e can never be expressed exactly in decimal notation. Explaining why mathematicians think this number is “natural” is beyond the scope of this book.
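Shannon's formula as described above is easy to try in Python. The function name information_bits and the two example probabilities are assumptions made for this sketch. It also shows the point of the chapter title: when the probabilities of two independent messages are multiplied, their information adds.

```python
import math

def information_bits(probability):
    """Shannon information: minus 1 times log2 of the message's probability."""
    return -math.log2(probability)

p_first = 0.5     # hypothetical probability of one message
p_second = 0.25   # hypothetical probability of another, independent message

# Multiplying the probabilities ...
combined = information_bits(p_first * p_second)
# ... gives the same information as adding the separate amounts.
added = information_bits(p_first) + information_bits(p_second)

print(information_bits(p_first), information_bits(p_second), combined, added)
```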

When we use Boltzmann’s formula to find the entropy from probability, the entropy comes out in watt-seconds per kelvin (that is, joules per kelvin).

For example, in the cylinder of a steam engine the water molecules are most likely bouncing around in every direction. The probability of this state is just slightly less than one. The logarithm of one is zero. Therefore, the entropy of the usual state is just below zero. If it is nearly zero, why do we say the steam is in a state of high entropy? We do not ordinarily think of a number close to zero as a high number. The answer is that the entropy of any more organized state is much less than zero. For instance, if all the molecules were moving in parallel straight lines perpendicular to the face of the piston, they could impart all the energy of their motion to the piston, and the steam engine would be 100 percent efficient, as we said before. The probability of finding all the molecules moving that way is just above zero. The logarithm of a positive number close to zero is negative and large in magnitude, like minus a thousand or minus a million or minus millions of millions of millions. This makes the entropy for the normal, disorderly state of motion much larger than the entropy for the unusual, highly ordered state of straight-line motion idealized above.
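The comparison can be sketched numerically in Python, following the formula as stated in this chapter. The two probabilities below are made-up stand-ins, not measured values. A real cylinder of steam would give a probability for the ordered state far smaller than any number a computer can store directly. The function name entropy_from_probability is likewise invented for this example.

```python
import math

BOLTZMANN_K = 1.380649e-23  # Boltzmann's constant in joules per kelvin

def entropy_from_probability(probability):
    """Boltzmann's constant times the natural logarithm of the probability."""
    return BOLTZMANN_K * math.log(probability)

p_disordered = 0.999   # made-up value "just slightly less than one"
p_ordered = 1e-300     # made-up value "just above zero"

s_disordered = entropy_from_probability(p_disordered)
s_ordered = entropy_from_probability(p_ordered)

print("ln of probabilities:", math.log(p_disordered), math.log(p_ordered))
print("entropies (J/K):   ", s_disordered, s_ordered)
```

Both entropies come out negative, but the one for the disorderly state is very close to zero while the one for the ordered state is far below it, so the disorderly state has the larger entropy.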

We now welcome back those readers who left us when we mentioned logarithms. Most people get on fine with an intuitive understanding of order and disorder. Engineers need advanced math to calculate precisely how much disorder there is. The main thing to remember is that entropy increases with disorder, and disorder is much more usual than order. With that we can understand the connection between order and information.
