Archive for the ‘Entropy/Information’ Category

Q: What’s so special about the Gaussian distribution (i.e. the normal distribution / bell curve)??

Saturday, February 13th, 2010

Physicist: A big part of what makes physicists slothful and attractive is a theorem called the “central limit theorem”.  In a nutshell it says that, even if you can’t describe how a single random thing happens, a whole mess of them together will act like a gaussian.  If you have a weighted die I won’t be able to tell you the probability of each individual number being rolled.  But (given the mean and variance) if you roll a couple dozen weighted dice and add them up I can tell you (with fairly small error) the probability of any sum, and the more dice the smaller the error.  Systems with lots of pieces added together show up all the time in practice, so knowing your way around a gaussian is well worth the trouble.
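To see the dice claim in action, here is a minimal Python sketch. The weights are made up for illustration (nothing above specifies them). It works out the exact probability of every sum by brute-force convolution and compares it to the gaussian with the same mean and variance.

```python
import math

# Hypothetical weights for faces 1..6 of a loaded die (an assumption for this sketch).
WEIGHTS = [0.05, 0.10, 0.15, 0.20, 0.20, 0.30]

def sum_distribution(weights, n_dice):
    """Exact probability of each sum when n_dice identical weighted dice are rolled.

    Returns a dict mapping sum -> probability, built by repeated convolution.
    """
    dist = {0: 1.0}
    for _ in range(n_dice):
        new = {}
        for total, p in dist.items():
            for face, w in enumerate(weights, start=1):
                new[total + face] = new.get(total + face, 0.0) + p * w
        dist = new
    return dist

def gaussian_error(weights, n_dice):
    """Largest gap between the exact probabilities and the matching gaussian."""
    mean = sum(f * w for f, w in enumerate(weights, start=1))
    var = sum((f - mean) ** 2 * w for f, w in enumerate(weights, start=1))
    mu, sigma2 = n_dice * mean, n_dice * var
    worst = 0.0
    for total, p in sum_distribution(weights, n_dice).items():
        g = math.exp(-(total - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        worst = max(worst, abs(p - g))
    return worst

for n in (1, 4, 24):
    print(n, "dice: worst error", gaussian_error(WEIGHTS, n))
```

The worst-case error should shrink as the number of dice goes up, which is the "more dice, smaller error" statement in numbers.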

Gaussians also maximize entropy for a given energy (or other conserved quadratic quantity, energy is quadratic because E = \frac{1}{2} mv^2).  So if you have a bottle of gas at a given temperature (which fixes the total energy) you’ll find that the probability that a given particle is moving with a given velocity is given by a gaussian distribution.

From quantum mechanics, gaussians are the most “certain” wave functions.  The “Heisenberg uncertainty principle” states that for any wave function \Delta X \Delta P \ge \frac{\hbar}{2}, where \Delta X is the uncertainty in position and \Delta P is the uncertainty in momentum.  For a gaussian: \Delta X \Delta P = \frac{\hbar}{2}, the absolute minimum total uncertainty.

And much more generally, we know a lot about gaussians and there’s a lot of slick, easy math that works best on them.  So whenever you see a “bump” a good gut reaction is to pretend that it’s a “gaussian bump” just to make the math easier.  Sometimes this doesn’t work, but often it does or it points you in the right direction.

 

Mathematician: I’ll add a few more comments about the Gaussian distribution (also known as the normal distribution or bell curve) that the physicist didn’t explicitly touch on. First of all, while it is an extremely important distribution that arises a lot in real world applications, there are plenty of phenomena that it does not model well. In particular, when the central limit theorem does not apply (i.e. our data points were not produced by taking a sum or average over samples drawn from more or less independent distributions) and we have no reason to believe our distribution should have maximum entropy, the normal distribution is the exception rather than the rule.

To give just one of many, many examples where non-normality arises: when we are dealing with a product (or geometric mean) of (essentially independent) random variables rather than a sum of them, we should expect that the resulting distribution will be approximately log-normal rather than normal (see image below). As it turns out, daily returns in the stock market are generally better modeled using a log-normal distribution rather than a normal distribution (perhaps this is the case because the most a stock can lose in one day is 100%, whereas the normal distribution assigns a positive probability to all real numbers). There are, of course, tons of other distributions that arise in real world problems that don’t look normal at all (e.g. the exponential distribution, Laplace distribution, Cauchy distribution, gamma distribution, and so on).
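A quick simulation makes the product-versus-sum point concrete. This is only a sketch: the number of factors, their range, and the seed are arbitrary choices. It multiplies independent random factors together and measures skewness, which should be near zero for anything bell-shaped.

```python
import math
import random

def skewness(values):
    """Sample skewness: roughly 0 for a symmetric (e.g. normal) distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable
# Each sample is a product of 30 independent factors drawn uniformly from [0.5, 1.5]
# (the factor count and range are assumptions made for illustration).
products = [math.prod(rng.uniform(0.5, 1.5) for _ in range(30)) for _ in range(20000)]
logs = [math.log(p) for p in products]

print("skewness of the products:", skewness(products))
print("skewness of their logs:  ", skewness(logs))
```

The products come out heavily lopsided, while their logs (which are sums of the logs of the factors, so the central limit theorem applies) come out close to symmetric. That is what "log-normal" means.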

 

Human height provides an interesting case study, as we get distributions that are almost (but not quite) normally distributed. The heights of males (ignoring outliers) are close to being normal (perhaps height is the result of a sum of a number of nearly independent factors relating to genes, health, diet, etc.). On the other hand, the distribution of heights of people in general (i.e. both males and females together) looks more like a mixture of two normal distributions (one bell curve for each gender, added together), which in this case is like a slightly skewed normal distribution with a flattened top.

 

I’ll end with a couple more interesting facts about the normal distribution. In Fourier analysis we observe that, when it has an appropriate variance, the normal distribution is one of the eigenvectors of the Fourier transform operator. That is a fancy way of saying that the gaussian distribution represents its own frequency components. For instance, we have this nifty equation (relating a normal distribution to its Fourier transform):

e^{- \pi x^2}=\int_{-\infty}^{\infty} e^{- \pi s^2}e^{-2 \pi i s x} ds.

Note that the general equation for a (1 dimensional) Gaussian distribution (which tells us the likelihood of each value x) is

\frac{ e^{- \frac{(x-\mu)^2}{2 \sigma^2}} } {\sqrt{2 \pi \sigma^2}}

where \mu is the mean of the distribution, and \sigma is its standard deviation. Hence, the Fourier transform relation above deals with a normal distribution of mean 0 and standard deviation \frac{1}{\sqrt{2 \pi}}.
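The Fourier relation can be checked numerically. The sketch below approximates the integral with a plain Riemann sum (the cutoff and step size are arbitrary choices that happen to be plenty for a function that decays this fast) and compares it to e^{- \pi x^2}.

```python
import cmath
import math

def fourier_of_gaussian(x, half_width=8.0, step=0.01):
    """Riemann-sum approximation of the integral of exp(-pi s^2) exp(-2 pi i s x) ds."""
    n = int(round(2 * half_width / step))
    total = 0j
    for k in range(n + 1):
        s = -half_width + k * step
        total += math.exp(-math.pi * s * s) * cmath.exp(-2j * math.pi * s * x)
    return total * step

for x in (0.0, 0.5, 1.0):
    print(x, fourier_of_gaussian(x).real, math.exp(-math.pi * x * x))
```

The two printed columns should agree to many decimal places, and the imaginary part of the integral should be essentially zero.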

Another useful property to take note of relates to solving maximum likelihood problems (where we are looking for the parameters that make some data set as likely as possible). We generally end up solving these problems by trying to maximize something related to the log of the probability distribution under consideration. If we use a normal distribution, this takes the unusually simple form

\log{ \frac{ e^{- \frac{(x-\mu)^2}{2 \sigma^2}} } {\sqrt{2 \pi \sigma^2}} } = - \frac{(x-\mu)^2}{2 \sigma^2} - \frac{1}{2} \log(2 \pi \sigma^2)

which is often nice enough to allow for solutions that can be calculated exactly by hand. In particular, the fact that this function is quadratic in x makes it especially convenient, which is one reason that the Gaussian is commonly chosen in statistical modeling. In fact, the incredibly popular ordinary least squares regression technique can be thought of as finding the most likely line (or plane, or hyperplane) to fit a dataset, under the assumption that the data was generated by a linear equation with additive gaussian noise.
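As a small illustration of the least squares connection, the sketch below fits a line with the usual closed-form formulas and then checks that nudging the line in any direction lowers the gaussian log-likelihood. The data points and the choice \sigma = 1 are made up for the example.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y ~ a*x + b (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def log_likelihood(a, b, xs, ys, sigma=1.0):
    """Sum of the gaussian log-densities of the residuals y - (a*x + b)."""
    return sum(
        -((y - (a * x + b)) ** 2) / (2 * sigma ** 2) - 0.5 * math.log(2 * math.pi * sigma ** 2)
        for x, y in zip(xs, ys)
    )

# Made-up data points, roughly y = 2x + 1 with some scatter.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

a, b = fit_line(xs, ys)
best = log_likelihood(a, b, xs, ys)
print("least squares line:", a, b)
# Nudging the slope or intercept should only lower the likelihood.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(da, db, log_likelihood(a + da, b + db, xs, ys) < best)
```

Because the log-likelihood is just a constant minus the sum of squared residuals (divided by 2 \sigma^2), maximizing one is the same as minimizing the other.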

Q: Is the total complexity of the universe growing, shrinking or staying the same?

Friday, January 22nd, 2010

The complete question was:

If you were to look at the universe as an organism, was the early universe a simpler organism than the present-day organism?  Is the total complexity of the universe growing, shrinking or staying the same?  And how do you measure that?

Physicist: Absolutely.  The total complexity of the universe is increasing, due to the inevitable march of entropy (or information), which is exactly the measure of complexity.  A more intuitive way to talk about complexity and entropy is: can you predict what you’ll see next?  If you look at part of a checker board, you can probably guess what the whole thing looks like, so the board is predictable and has low entropy.  In the early universe matter was distributed pretty uniformly, almost all of it was hydrogen, almost everything was the same temperature, and there were no complex chemicals of any kind (going back far enough everything was ionized).  So if you’d seen one part of the universe, you’d pretty much seen all of it.

This is actually a chess board.

No surprises.

Nowadays the universe is full of a wide variety of different elements with very complicated ways to combine together, matter shows up hot, cold, as plasma, as proteins, in stars, and clouds, and not at all.  The amount of data it would take to accurately describe the universe as it is now utterly dwarfs the amount that it would take to describe the early universe.  On an atom-by-atom basis, in the early universe you could grab an atom at random and feel fairly confident that: it’s hydrogen, it’s ionized, it’s about “yay” far away from the other nearby hydrogen, etc.  Today you’d probably be right if you guessed “hydrogen” (about 3/4 of the universe’s mass is still hydrogen), but you’d have a really hard time predicting anything beyond that.

Oddly enough, life is surprisingly uncomplex compared to say, dirt or sea water.  If you look at a single cell in your body, you’ve already got a pretty good idea of what you’ll see everywhere else in your body.  Admittedly, we are more complex than single celled life, but most of that is a symptom of being physically bigger.

Q: What’s the relationship between entropy in the information-theory sense and the thermodynamics sense?

Monday, January 18th, 2010

Physicist: The term “Entropy” shows up both in thermodynamics and information theory, so (since thermodynamics called dibs), I’ll call thermodynamic entropy “entropy”, and information theoretic entropy “information”.

I can’t think of a good way to demonstrate intuitively that entropy and information are essentially the same, so instead check out the similarities!  Essentially, they both answer the question “how hard is it to describe this thing?”.  In fact, unless you have a mess of time on your hands, just go with that.  For those of you with some time, a post that turned out to be longer than it should have been:

Entropy!) Back in the day a dude named Boltzmann found that heat and temperature didn’t effectively describe heat flow, and that a new variable was called for.  For example, all the air in a room could suddenly condense into a ball, which then bounces around with the same energy as the original air, and conservation of energy would still hold up.  The big problem with this scenario is not that it violates any fundamental laws, but that it’s unlikely (don’t bet against a thermodynamicist when they say something’s “unlikely”).  To deal with this Boltzmann defined entropy.  Following basic probability, the more ways that a macrostate (things like temperature, wind blowing, “big” stuff with lots of molecules) can happen the more likely it is.  The individual configurations (atom 1 is exactly here, atom 2 is over here, …) are called “microstates” and as you can imagine a single macrostate, like a bucket of room temperature water, is made up of a hell of a lot of microstates.

Now if a bucket of water has N microstates, then 2 buckets will have N^2 microstates (1 die has 6 states, 2 dice have 36 states).  But that’s pretty tricky to deal with, and it doesn’t seem to be what nature is concerned with.  If one bucket has entropy E, you’d like two buckets to have entropy 2E.  Here’s what nature seems to like, and what Boltzmann settled on: E = k log(N), where E is entropy, N is the number of microstates, and k is a physical constant (k is the Boltzmann constant, but it hardly matters, it changes depending on the units used, and the base of the log).  In fact, Boltzmann was so excited about his equation and how well it works that he had it carved into his headstone (he used different letters, so it reads “S = k \cdot \log{(W)}”, but whatever).  The “log” turns the “squared” into “times 2”, which clears up that problem.  Also, the log can be in any base, since changing the base would just change k, and it doesn’t matter what k is (as long as everyone is consistent).
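The dice example is easy to check directly. In this sketch k is set to 1, which is an arbitrary choice since only the ratio matters here.

```python
import math

def boltzmann_entropy(microstates, k=1.0):
    """E = k log(N); k is set to 1 here since only ratios matter for this check."""
    return k * math.log(microstates)

one_die = boltzmann_entropy(6)
two_dice = boltzmann_entropy(6 ** 2)
print(one_die, two_dice, two_dice / one_die)
```

Squaring the number of states doubles the entropy, which is the whole reason for the log.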

This formulation of entropy makes a lot of sense.  If something can only happen in one way, it will be unlikely and have zero entropy.  If it has many ways to happen, it will be fairly likely and have higher entropy.  Also, you can make very sensible statements with it.  For example: Water expands by a factor of around 1000 when it boils, so each molecule has around 1000 times as many places to be, and its entropy goes up by about k log(1000) per molecule.  That’s why it’s easy to boil water in a pot (it increases entropy), and it’s difficult to condense water in a pot (it decreases entropy).  You can also say that if the water is in the pot then the position of each molecule is fairly certain (it’s in the pot), so the entropy is low, and when the water is steam then the position is less certain (it’s around here somewhere), so the entropy is high.  As a quick aside, Boltzmann’s entropy assumes that all microstates have the same probability.  It turns out that’s not quite true, but you can show that the probability of seeing a microstate with a different probability is effectively zero, so they may as well all have the same probability.

Information!) In 1948 a dude named Shannon (last name) was listening to a telegraph line and someone asked him “how much information is that?”.  Then information theory happened.  He wrote a paper worth reading, that can be understood by anyone who knows what “log” is and has some patience.

Say you want to find the combination of a combination lock.  If the lock has 2 digits, there are 100 (10^2) combinations, if it has 3 digits there are 1000 (10^3) combinations, and so on.  Although a 4 digit code has a hundred times as many combinations as a 2 digit code, it only takes twice as long to describe.  Information is the log of the number of combinations.  So I = \log_b{(N)} where I is the amount of information, N is the number of combinations, and b is the base.  Again, the base of the log can be anything, but in information theory the standard is base 2 (this gives you the amount of information in “bits”, which is what computers use).  Base 2 gives you bits, base e (the natural log) gives you “nats”, and base \pi gives you “slices”.  Not many people use nats, and nobody ever uses slices (except in bad jokes), so from now on I’ll just talk about information in bits.

So, say you wanted to send a message and you wanted to hide it in your padlock combination.  If your padlock has 3 digits you can store I = \log_2{(1000)} = 9.97 bits of information.  10 bits requires 1024 combinations.  Another good way to describe information is “information is the minimal number of yes/no questions you have to ask (on average) to determine the state”.  So for example, if I think of a letter at random, you could ask “Is it A?  Is it B? …” and it would take 13 questions on average, but there’s a better method.  You can divide the alphabet in half, then again, and again until the letter is found.  So a good series of questions would be “Is it A to M?”, and if the answer is “yes” then “Is it A to G?”, and so on.  It should take \log_2{(26)} = 4.70 questions on average, so it should take 4.7 bits to describe each letter.
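The numbers in the padlock and alphabet examples can be reproduced with a few lines of Python. The two helper functions are just names made up for this sketch. One counts questions for the "Is it A?  Is it B?" approach and the other for repeated halving.

```python
import math

def average_questions_halving(n):
    """Average number of yes/no questions to pin down one of n equally likely
    options when each question splits the remaining options as evenly as possible."""
    def total(m):
        # Total questions summed over all m options in this group.
        if m == 1:
            return 0
        half = m // 2
        return m + total(half) + total(m - half)
    return total(n) / n

def average_questions_one_by_one(n):
    """Average number of questions when asking about the options in order.
    The last option never needs its own question."""
    return (sum(range(1, n)) + (n - 1)) / n

print("bits in a 3 digit padlock:", math.log2(1000))
print("bits in one of 26 letters:", math.log2(26))
print("one-by-one questions:     ", average_questions_one_by_one(26))
print("halving questions:        ", average_questions_halving(26))
```

Halving comes out at about 4.77 questions rather than exactly 4.70, because 26 isn’t a power of 2 and some letters need a fifth question.  The log is the limit you approach when you describe lots of letters at once.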

In thermodynamics every state is as likely to come up as any other.  In information theory, the different states (in this case the “states” are letters) can have different likelihoods of showing up.  Right off the bat, you’ll notice that z’s and q’s occur rarely in written English (this post has only 4 “non-Boltzmann” z’s and 16 q’s), so you can estimate that the amount of information in an English letter should be closer to \log_2{(24)} = 4.58 bits.  Shannon figured out that if you have N “letters” and the probability of the first letter is P_1, of the second letter is P_2, and so on, then the information per digit is I = \sum_{i=1}^N P_i \log_2{\left(\frac{1}{P_i}\right)}.  If all the probabilities are the same, then this summation reduces to I = \log_2{(N)}.

As weird as this definition looks, it does make sense.  If you only have one letter to work with, then you’re not sending any information since you always know what the next letter will be (I = 1 log(1) + 0 log(0) + … + 0 log(0) = 0).  By the same token, if you use all of the letters equally often, it will be the most difficult to predict what comes next (information per digit is maximized when the probability is equal, or spread out, between all the letters).  This is why compressed data looks random.  If your data isn’t random, then you could save room by just describing the pattern.  For example: “ABABABABABABABABABAB” could be written “10AB”.  There’s an entire science behind this, so rather than going into it here, you should really read the paper.
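Shannon’s formula is short enough to try out.  This is a minimal sketch (the function names are made up here) that treats each 0 log(0) term as 0 and checks the two extreme cases described above.

```python
import math
from collections import Counter

def entropy_bits(probabilities):
    """I = sum of P_i * log2(1 / P_i), skipping zero-probability letters
    (the 0 * log(0) terms are taken to be 0)."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

def entropy_of_text(text):
    """Per-letter entropy using only the letter frequencies observed in the text."""
    counts = Counter(text)
    return entropy_bits(c / len(text) for c in counts.values())

print("26 equally likely letters:", entropy_bits([1 / 26] * 26))
print("only one letter ever used:", entropy_bits([1.0] + [0.0] * 25))
print("ABAB... by letter counts: ", entropy_of_text("AB" * 10))
```

Note that counting letters alone gives “ABAB…” a full 1 bit per letter, because it only sees two equally common letters.  It misses the pattern, and noticing patterns like that is what the science of compression is about.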

Overlap!) The bridge between information and entropy lies in how hard it is to describe a physical state or process.  The amount of information it takes to describe something is proportional to its entropy.  Once you have the equations (“I = \log_2{(N)}” and “E = k log(N)”) this is pretty obvious.  However, the way the word “entropy” is used in common speech is a little misleading.  For example, if you found a book that was just the letter “A” over and over, then you would say that it had low entropy because it’s so predictable, and that it has no information for the same reason. If you read something like Shakespeare on the other hand, you’ll notice that it’s more difficult to predict what will be written next.  So, somewhat intuitively, you’d say that Shakespeare has higher entropy, and you’d definitely say that Shakespeare has more information.
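One rough, practical way to see “how hard is it to describe?” is to hand the thing to a compression program.  The sketch below uses Python’s zlib on a book of nothing but “A”, and on random bytes as a stand-in for the completely unpredictable extreme (the length and seed are arbitrary choices, and real text like Shakespeare would land somewhere in between).

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10000
boring_book = b"A" * n
random_bytes = bytes(rng.randrange(256) for _ in range(n))

print("repeated 'A':", len(zlib.compress(boring_book)), "bytes after compression")
print("random bytes:", len(zlib.compress(random_bytes)), "bytes after compression")
```

The boring book shrinks to almost nothing, while the random bytes barely shrink at all.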

As a quick aside, you can extend this line of thinking empirically and you’ll find that you can actually determine if a sequence of symbols is random, or a language, etc.  It has been suggested that an entropy measurement could be applied to postmodernist texts to see if they are in fact communicating anything at all (see “Sokal affair”).  This was recently used to demonstrate that the Indus Script is very likely to be a language, without actually determining what the script says.

In day to day life we only describe things with very low entropy.  If something has very high entropy, it would take a long time to describe it so we don’t bother.  That’s not an indictment of laziness, it’s just that most people have better things to do than count atoms.  For example: If your friend gets a new car they may describe it as “a red Ferrari 250 GT Spyder” (and congratulations).  The car has very little entropy, so that short description has all the information you need.  If you saw the car you’d know exactly what to expect.  Later it gets dented, so they would describe it as “a red Ferrari 250 GT Spyder with a dent in the hood”.

Bueller?

Easy to describe, and soon-to-be-difficult to describe.

As time goes on the car’s entropy increases, and it takes more and more information to accurately describe the car.  Eventually the description would be “scrap metal”.  But “scrap metal” tells you almost nothing.  The entropy has gotten so high that it would take forever to effectively describe the ex-car, so nobody bothers to try.

By the by, I think this post has more information than any previous post.  Hence all the entropy.

Q: Will black holes ever release their energy and will we be able to tell what had gone into them?

Thursday, January 14th, 2010

Physicist: In any reasonable sense the answer to both of these questions is a dull “nope”.  In theory however, the answer is an excitable “yup”!

Blackholes lose energy through “Hawking Radiation”, which is a surprising convergence of general relativity, quantum mechanics, and thermodynamics.  Hawking (and later others) predicted that a blackhole will have a blackbody spectrum.  That is, it will radiate like people, the sun, or anything that radiates by virtue of having heat.  Hawking also calculated what temperature a blackhole will appear to be radiating at.  He found that for a blackhole of mass M: T = \frac{\hbar c^3}{8 \pi G k M}, where everything other than M is a physical constant (even the 8, depending on who you talk to).  A more useful way to write this is to plug in all the constants to get:

T = \frac{1.23 \times 10^{23}}{M}, where M is in kilograms and T is in kelvin.  That “10^{23}” makes it seem like blackholes should be really hot, and in fact small ones (like those we hope to see at CERN) are crazy hot.  However, if the Sun (M = 2 \times 10^{30} kg) were a blackhole its temperature would be about 60 nK (nanokelvin).  …!  You wouldn’t want to lick it, or your tongue would stick.

Here’s the point.  Deep space glows.  It has a temperature of about 2.7 K, which means that any blackhole that could reasonably form (M > 10^{31} kg, or several Suns) is going to be way colder than that.  Since the blackhole is colder it will actually absorb more energy than it emits.  In order for a blackhole in the universe today to actually shrink it must have a temperature above 2.7 K, and so it must have a mass less than 4.5 \times 10^{22} kg, or around half the mass of the Moon.  Alternatively, you could wait several trillion years for the universe to cool down, and then the blackholes would start to evaporate.
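For anyone who wants to plug in the constants themselves, here is a short sketch using rounded SI values (the constant values and function names are supplied here, not taken from the post).

```python
import math

# Physical constants in SI units (rounded standard values, supplied for this sketch).
HBAR = 1.054571817e-34   # reduced Planck constant, J s
C = 2.99792458e8         # speed of light, m / s
G = 6.67430e-11          # gravitational constant, m^3 / (kg s^2)
K_B = 1.380649e-23       # Boltzmann constant, J / K

def hawking_temperature(mass_kg):
    """T = hbar c^3 / (8 pi G k M), in kelvin."""
    return HBAR * C ** 3 / (8 * math.pi * G * K_B * mass_kg)

def mass_for_temperature(temp_kelvin):
    """The same formula solved for M, in kilograms."""
    return HBAR * C ** 3 / (8 * math.pi * G * K_B * temp_kelvin)

print("constant in T = (constant) / M:", hawking_temperature(1.0))
print("Sun-mass blackhole (2e30 kg), in K:", hawking_temperature(2e30))
print("mass of a 2.7 K blackhole, in kg:", mass_for_temperature(2.7))
```

The three outputs should land near 1.2 \times 10^{23}, about 60 billionths of a kelvin, and about 4.5 \times 10^{22} kg respectively.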

As for the second half of the question: General relativity would suggest that when things fall into a blackhole they are erased.  Once they fall in, there’s no way to tell the difference between a ton of Soylent Green and a ton of Pogs (metric tonnes of course).  This makes quantum physicists really uncomfortable, because in addition to all the usual conservative laws (energy, momentum, drug policy) quantum physicists have “conservation of information”.  Lucky for them they also get to play with entanglement.  So if you chuck in a copy of War and Peace the blackhole will radiate thermally (which is the most randomized way to radiate) and will seem to scramble everything about Tolstoy’s pivotal work.  If you look at one outgoing photon at a time you’ll gain almost zero information.  If however, you can gather every outgoing photon, interfere them with each other and analyze how they are entangled you could (in theory) reconstruct what fell in.  However, you’d need to catch at least half of the photons before you could demonstrate that they hold any information at all.

This view of blackholes, that they hide information in the “quantum entanglement” between all of their radiated photons, makes them suddenly far more interesting.  Without going into too much detail, if you have N non-entangled 2-state particles you can have N bits of information, but if you have N entangled 2-state particles you can have 2^N bits of information.  Allowing for entanglement frees up a lot of “extra room” to put information.

Suddenly, you’ll find that most (as in “almost all”) of the entropy in the universe is tied up in blackholes.  Also (again in theory), a carefully constructed blackhole can be the fastest and most powerful computer that it will ever be possible to create.

So, yes, blackholes will release all their energy, but you have to wait for the universe to cool down almost completely.  And, yes, we can tell what went into them, but we’ll have to wait for them to evaporate completely (after the universe has cooled down) and catch, without disturbing, almost every single particle that comes out of them.

Q: How/Why are Quantum Mechanics and Relativity incompatible?

Thursday, December 24th, 2009

Physicist: Quantum Mechanics (QM) and relativity are both 100% accurate, so far as we have been able to measure (and our measurements are really, really good).  The incompatibility shows up when both QM effects and relativistic effects are large enough to be detected and then disagree.  This condition is strictly theoretical today, but in the next few years our observations of Sagittarius A*, and at CERN should bring the problems between QM and relativity into sharp focus.

Relativity comes in two flavors: special and general.  Special relativity describes how time and distance are affected by movement (especially fast movement), and it replaces Newtonian mechanics, which is only accurate at low speeds.  Einstein came up with it by looking at the mathematical repercussions of the fact that all of physics works the same way, independent of movement (constant speed is the same as no speed).  Special relativity has been exhaustively tested (relativistic effects have been verified all the way down to walking speed), and works so perfectly that it is now held up as the yardstick against which all new theories are tested.  In fact, QM would make grossly inaccurate predictions if Dirac hadn’t shown up and tied QM together with special relativity to create “relativistic QM”.

General relativity, on the other hand, describes the stretching and bending of space and time by gravity.  Einstein came up with it when he thought about what the universe would be like if inertial and gravitational acceleration were the same (turns out they are).  By the way: gravitational acceleration is what pushes you toward the ground, and inertial acceleration is what pushes you back into the car seat when you step on the gas.  It’s general relativity that causes the problems.  Here’s two (of a possible untold many):

1) Smooth vs. Chunky: General relativity needs space to be “smooth”, or at the very least continuous.  So if you have two points side by side, then no matter how close you bring them together you can still tell which one is on the right or left.  Quantum mechanically you have to deal with position uncertainty.  At very small scales you can’t tell which is right or left.  In addition (as the name implies) QM requires everything to be “quantized”, or show up in discrete pieces.  You see this clearly with atoms, photons, and even phonons (which is quantized sound!  How awesome is that!?).  Less clear is the quantization of space, which would require space to be “chopped up”.  This choppiness will never be directly measured.  The predicted “chunky scale” should be no larger than 10^{-35} m.  For comparison, a hydrogen atom is about a million, million, million, million times larger (a factor of 10^{24}).

2) The Information Paradox: According to general relativity when stuff falls into a blackhole everything about its existence (with the exception of mass, charge, and angular momentum) is completely erased.  That doesn’t sound so bad.  We tend to think of blackholes as being like galactic garbage disposals.  However, if all the information about something is destroyed, then you lose time-reversibility.  Time-reversal is the idea that if you run time backwards, all the basic physical laws of the universe continue to work the same.  More obscurely, you can predict the future based on what you know now, and time reversal means that you can derive what happened in the past as well.  QM requires that time-reversibility (or “unitarity”, to a professional) holds.  So QM requires that blackholes cannot destroy information.  One way around this is amazingly complicated entanglement between all of the in-falling matter, and all of the Hawking Radiation that comes out later.  Again, we’ll never be able to measure this.  To get results we would have to exactly measure at least half of all of the photons generated by Hawking radiation over the essentially infinite lifetime of the blackhole (every blackhole that exists today will be around long, long after the heat death of the universe).

Q: Is teleportation possible?

Monday, November 9th, 2009

Physicist: Nope.

The best you could hope for is a machine that reads the exact location of every atom in your body, as well as its chemical relationship to every nearby atom, then sends that blueprint to another machine that builds a new body one atom at a time. Not only is every step of this a horrifying technical problem, but for “Uncertainty Principle” reasons it is almost certainly impossible. Also, it doesn’t seem to be what they do on Star Trek.

Back in the 90s it was shown that if two people share a pair of entangled particles, then they can use them to send 1 qbit instead of 2 bits, or they can send 2 bits instead of 1 qbit.  The former is called “superdense coding” and the latter is called “quantum teleportation”.  The guy who named it “quantum teleportation” and not the “2 bits = 1 qbit theorem” is a jerk.

The discovery, and more importantly the subsequent naming, of quantum teleportation led to a new (false) hope that entire objects might be teleported.  In fact the only thing being “teleported” is information about the particle involved (not the particle itself).

What follows is answer gravy (more complex):

For example: the polarization of a photon is a combination of both vertical and horizontal polarization.  If you were to measure the polarization you would get a result of “vertical” or “horizontal”, but never the true combination of both.  So in the process of measurement you lose some (quantum) information.  Quantum teleportation (stupid name) allows you to get around this problem.

The exact technique is a little confusing, so if you intend to read the Wikipedia article it might help to understand “classical teleportation” first.  This is an experiment that also doubles as a party trick (nerd parties).  The following technique will teleport the (classical) state of Alice’s coin A to Bob’s coin C.

1) Get 3 coins, A, B, and C.

2) Get B and C to be the same (heads or tails) without knowing which they both are.  Maybe paper-clip them together, then flip them both without looking, or just get someone else to do this set up.  B and C are now entangled.

3) Alice keeps coins A and B, and Bob takes coin C as far away as he likes.

4) Flip A and B together (like in step 2) and look at them.  There is no way of telling what the original states of either A or B were, but you can tell if they were the same or different.

5) If A and B were not the same, then Alice tells Bob to flip C over.  If A and B were the same, then Alice tells Bob to leave his coin alone.  The idea is, since B=C: if A=B then A=C, and if A≠B, then A≠C (so C should be turned over to match A).

Without ever determining the exact state of any coin, but only comparing two of them, Alice and Bob have teleported the state of A to C.  If A has a 77% chance of being heads and 23% chance of being tails (weird coin), then C will now also have a 77% chance of being heads and 23% chance of being tails.  The information about A, including the probabilities on A, has been transferred to C.  You’ll notice that the actual coin was never teleported, distance is irrelevant, the original state of A is destroyed, and the entire process is not even a little mysterious.
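If you don’t have three coins handy, the five steps can be run as a simulation.  This is just a sketch: the function name, the seed, and the number of trials are choices made here, and the 77% weird coin is the one from the example above.

```python
import random

def teleport_once(rng, p_heads=0.77):
    """Run the five steps once. Returns (original state of A, final state of C).
    True means heads, False means tails."""
    # Step 1: coin A starts in some state; here it is heads 77% of the time.
    a = rng.random() < p_heads
    original_a = a
    # Step 2: B and C are made to match, without anyone knowing which way.
    b = c = rng.random() < 0.5
    # Step 4: A and B are flipped over together (or not), which hides their
    # individual states but preserves whether they match.
    if rng.random() < 0.5:
        a, b = not a, not b
    same = (a == b)
    # Step 5: Alice tells Bob to flip C over only if A and B differed.
    if not same:
        c = not c
    return original_a, c

rng = random.Random(0)  # fixed seed so the run is repeatable
results = [teleport_once(rng) for _ in range(10000)]
print("C matched the original A in every trial:", all(a == c for a, c in results))
print("fraction of heads on C:", sum(c for _, c in results) / len(results))
```

C ends up matching the original A every time, and comes up heads about 77% of the time, even though the only thing Alice ever sends Bob is “flip” or “don’t flip”.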

Q: What is monotony?

Wednesday, October 21st, 2009

Physicist: It seems fair to say that monotony goes hand in hand with predictability which goes hand in hand with low entropy.  So (mathematically speaking), you can reasonably define monotony as the reciprocal of entropy, or something like that.