## Q: Does quantum mechanics really say there are other “mes”? Where are they?

Physicist: As much of a trope as “Other Quantum Worlds” has become in sci-fi, there are reasons to think that they may be a real thing, including “other yous”.  Here’s the idea.

Superposition is a real thing

One of the most fundamental aspects of quantum mechanics is “superposition”.  Something is in a superposition when it’s in multiple states/places simultaneously.  You can show, without too much effort, that a wide variety of things can be in a superposition.  The cardinal example is a photon going through two slits before impacting a screen: the double slit experiment.

The infamous Double Slit experiment demonstrates a single photon going through two (or more) slits simultaneously.  The “beats” of light are caused by photons interfering like waves between the two slits.  This still works if you release one photon at a time; even individually, they’ll only hit the bright regions.

Instead of the photons going straight through and creating a single bright spot behind every slit (classical), we instead see a wave interference pattern (quantum).  This only makes sense if: 1) the photons of light act like waves and 2) they’re going through both slits.

It’s completely natural to suspect that the objects involved in experiments like this are really in only one state, but have behaviors too complex for us to understand.  Rather surprisingly, we can pretty effectively rule that out.

There is no scale at which quantum effects stop working

To date, every experiment capable of detecting the difference between quantum and classical results has always demonstrated that the underlying behavior is strictly quantum.  To be fair, quantum phenomena are as delicate as delicate can be.  For example, in the double slit experiment any interaction that could indicate to the outside world (the “environment”), even in principle, which slit the particle went through will destroy the quantumness of the experiment (the interference fringes go away).  When you have to worry about the disruptive influence of individual stray particles, you don’t expect to see quantum effects on the scale of people and planets.

That said, the double slit experiment has been done and works for every kind of particle and even molecules with hundreds of atoms.  Quantum states can be maintained for minutes or hours, so superposition doesn’t “wear out” on its own.  Needles large enough to be seen with the naked eye have been put into superpositions of vibrational modes, and this year China launched the first quantum communication satellite, which is being used as a relay to establish quantum effects over scales of hundreds or thousands of miles.  So far there is no indication of a natural scale where objects are big enough, far enough apart, or old enough that quantum mechanics simply ceases to apply.  The only limits seem to be in our engineering abilities.  It’s completely infeasible to do experimental quantum physics with something as substantive as a person (or really anything close).

Left: Buckminsterfullerene (and even much larger molecules) interfere in double-slit experiments. Middle: A needle that was put into a superposition of literally both vibrating and not vibrating at all. Right: Time lapse of a laser being used to establish quantum entanglement with carefully isolated atoms inside of a satellite in orbit.

If the quantum laws did simply cease to apply at some scale, then those laws would be bizarre and unique; the first of their kind.  Every physical law applies at all scales, it’s just a question of how relevant each is.  For example, on large scales gravity is practically the only force worth worrying about, but on the atomic scale it can be effectively ignored (usually).

Io sticks to (orbits) Jupiter because of gravitational forces and styrofoam sticks to cats because of electrical forces. Both apply on all scales, but on smaller scales (evidently cat scale and below) electrical forces tend to dominate.

So here comes the point: if the quantum laws really do apply at all scales, then we should expect that exactly like everything else, people (no big deal) should ultimately follow quantum laws and exhibit quantum behavior.  Including superposition.  But that raises a very natural question: what does it feel like to be in multiple states?  Why don’t we notice?

The quantum laws don’t contradict the world we see

When you first hear about the heliocentric theory (the Earth is in motion around the Sun instead of the other way around), the first question that naturally comes to mind is “Why don’t I feel the Earth moving?”.  But in trying to answer that you find yourself trying to climb out of the assumption that you should notice anything.  A more enlightening question is “What do the laws of gravitation and motion say that we should experience?”.  In the case of Newton’s laws and the Earth, we find that because the surface of the Earth, the air above it, and the people on it all travel together, we shouldn’t feel anything.  Despite moving at ridiculous speeds there are only subtle tell-tale signs that we’re moving at all, like ocean tides or the precession of garishly large pendulums.

Quantum laws are in a similar situation.  You may have noticed that you’ve never met other versions of yourself wandering around your house.  You’re there, so shouldn’t at least some other versions of you be as well?  Why don’t we see them?  A better question is “What do the quantum laws say that we should experience?”.

Why don’t we run into our other versions all the time, instead of absolutely never?

The Correspondence Principle is arguably one of the most important philosophical underpinnings in science.  It says that whatever your theories are, they need to predict (or at the very least not contradict) ordinary physical laws and ordinary experience when applied to ordinary situations.

When you apply the laws of relativity to speeds much slower than light, all of the length contractions and twin paradoxes become less and less important until the laws we’re accustomed to working with are more than sufficient.  That is to say, while we can detect relativistic effects all the way down to walking speed, the effect is so small that you don’t need to worry about it when you’re walking.

Similarly, the quantum laws reproduce the classical laws when you assume that there are far more particles around than you can keep track of and that they’re not particularly correlated with one another (i.e., if you’re watching one air molecule there’s really no telling what the next will be doing).  There are times when this assumption doesn’t hold, but those are exactly the cases that reveal that the quantum laws are there.

It turns out that simply applying the quantum laws to everything seems to resolve all the big paradoxes.  That’s good on the one hand because physics works.  But on the other hand, we’re forced into the suspicion that our universe might be needlessly weird.

The big rift between “the quantum world” and “the classical world” is that large things, like you and literally everything that you can see, always seem to be in exactly one state.  When we keep quantum systems carefully isolated (usually by making them very cold, very tiny, and very dark) we find that they exhibit superposition, but when we then interact with those quantum systems they “decohere” and are found to be in a smaller set of states.  This is sometimes called “wave function collapse”, to evoke an image of a wide wave suddenly collapsing into a single tiny particle.  The rule seems to be that interacting with things makes their behavior more classical.

But not always.  “Wave function collapse” doesn’t happen when isolated quantum systems interact with each other, only when they interact with the environment (the outside world).  Experimentally, when you allow a couple of systems that are both in a superposition of states to interact, then the result is two systems in a joint superposition of states (this is entanglement).  If the rule were as simple as “when things interact they decohere” you’d expect to find both systems each in only one state after interacting.  What we find instead is that in every testable case the superposition is maintained.  Changed or entangled, sure, but the various states in a superposition never just disappear.  When you interact with a system in a superposition you only see a particular state, not a superposition.  So what’s going on when we, or anything in the environment, interacts with a quantum system?  Where did the other states in the superposition go?

We have physical laws that describe the interactions between pairs of isolated quantum systems (A and B).  When we treat the environment as another (albeit very big) quantum system we can continue to use those same laws.  When we assume that the environment is not a quantum system, we have to make up new laws and special exceptions.

The rules we use to describe how pairs of isolated systems interact also do an excellent job describing the way isolated quantum systems interact with the outside environment.  When isolated systems interact with each other they become entangled.  When isolated systems interact with the environment they decohere.  It turns out that these two effects, entanglement and decoherence, are two sides of the same coin.  When we make the somewhat artificial choice to ask “What states of system B can some particular state in system A interact with?” we find that the result mirrors what we ourselves see when we interact with things and “collapse their wave functions” (see the Answer Gravy below for more on that).  The phrase “wave function collapse” is like the word “sunrise”; it does a good job describing our personal experience, but does a terrible job describing the underlying dynamics.  When you ask the natural question “What does it feel like to be in many different states?” the frustrating answer seems to be “You tell me.”.

A thing can only be inferred to be in multiple states (such as by witnessing an interference pattern).  If there’s any way to tell the difference between the individual states that make up a superposition, then (from your point of view) there is no superposition.  Since you can see yourself, you can tell the difference between your state and another.  You might be in an effectively infinite number of states, but you’d never know it.  The fact that you can’t help but observe yourself means that you will never observe yourself somewhere that you’re not.

“Where are my other versions?” isn’t quite the right question

Where are those other versions of you?  Assuming that they exist, they’re no place more mysterious than where you are now.  In the double slit experiment different versions of the same object go through the different slits and you can literally point to exactly where each version is (in exactly the same way you can point at anything you can’t presently observe), so the physical position of each version isn’t a mystery.

The mathematical operations that describe quantum mechanical interactions and the passage of time are “linear”, which means that they treat the different states in a superposition separately.  A linear operator does the same thing to every state individually, and then sums the results.  There are a lot of examples of linear phenomena in nature, including waves (which are solutions to the wave equation).  The Schrödinger equation, which describes how quantum wave functions behave, is also linear.

The wave equation is linear, so you can describe how each of these waves travels across the surface of the water by considering them each one at a time and adding up the results.  They don’t interact with each other directly, but they do add up.

So, if there are other versions of you, they’re wandering around in very much the same way you are.  But (as you may have noticed) you don’t interact with them, so saying they’re “in the same place you are” isn’t particularly useful.  Which is frustrating in a chained-in-Plato’s-cave-kind-of-way.

The rain puddle picture is from here.

Answer Gravy: In quantum mechanics the effect of the passage of time and of every interaction is described by a linear operator (even better, it’s a unitary operator).  Linear operators treat everything they’re given separately, as though each piece was the only piece.  In mathspeak, if $f(x)$ is a linear operator, then $f(ax+by) = af(x)+bf(y)$ (where $a$ and $b$ are ordinary numbers).  The output is a sum of the results from every input taken individually.

Consider a quantum system that can be in either of two states, $|\blacksquare\rangle$ or $|\square\rangle$.  When observed it is always found to be in only one of the states, but when left in isolation it can be in any superposition of the form $\alpha|\blacksquare\rangle + \beta|\square\rangle$, where $|\alpha|^2+|\beta|^2=1$.  The $\alpha$ and $\beta$ are important for how this state will interact with others as well as describing the probability of seeing either result.  According to the Born Rule, if $\alpha=-\frac{2}{3}$ (for example), then the probability of seeing $|\blacksquare\rangle$ is $|\alpha|^2=\frac{4}{9}$.
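A quick numerical check of the Born Rule arithmetic above, as a minimal Python sketch. The function name `born_probabilities` is made up for illustration, and picking a positive real $\beta$ is just one of many valid choices; the minus sign on $\alpha$ has no effect on either probability.

```python
import math


def born_probabilities(alpha: complex, beta: complex) -> tuple[float, float]:
    """Return (P(■), P(□)) for the state alpha|■> + beta|□>, using the Born Rule."""
    if not math.isclose(abs(alpha) ** 2 + abs(beta) ** 2, 1.0):
        raise ValueError("need |alpha|^2 + |beta|^2 = 1")
    return abs(alpha) ** 2, abs(beta) ** 2


alpha = -2 / 3
beta = math.sqrt(1 - abs(alpha) ** 2)  # one valid choice; any beta with |beta|^2 = 5/9 works
p_black, p_white = born_probabilities(alpha, beta)
print(round(p_black, 4), round(p_white, 4))  # 0.4444 0.5556
```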

Let’s also say that the quantum scientists Alice and Bob can be described by the modest notation $|A(?)\rangle$ and $|B(?)\rangle$, where the “?” indicates that they have not looked at the isolated quantum system yet.  If the isolated system is in the state $|\blacksquare\rangle$ initially, then the initial state of the whole scenario is $|A(?)\rangle|B(?)\rangle|\blacksquare\rangle$.

Define a linear “look” operation for Alice, $L_A$, that works like this

$L_A\left(|A(?)\rangle|B(?)\rangle|\blacksquare\rangle\right) = |A(\blacksquare)\rangle|B(?)\rangle|\blacksquare\rangle$

and similarly for Bob

$L_B\left(|A(?)\rangle|B(?)\rangle|\blacksquare\rangle\right) = |A(?)\rangle|B(\blacksquare)\rangle|\blacksquare\rangle$

Applying these one at a time we see what happens when each looks at the quantum system; they end up seeing the same thing.

$L_BL_A\left(|A(?)\rangle|B(?)\rangle|\blacksquare\rangle\right) = L_B\left(|A(\blacksquare)\rangle|B(?)\rangle|\blacksquare\rangle\right) = |A(\blacksquare)\rangle|B(\blacksquare)\rangle|\blacksquare\rangle$

It’s subtle, but you’ll notice that the coefficient in front of this state is 1, meaning that it has a 100% chance of happening.

But what happens if the system is in a superposition of states, such as $|\psi\rangle = \frac{|\square\rangle+|\blacksquare\rangle}{\sqrt{2}}$?  Since the “look” operation is linear, this is no big deal.

$\begin{array}{ll} &L_BL_A\left(|A(?)\rangle|B(?)\rangle\left(\frac{|\square\rangle+|\blacksquare\rangle}{\sqrt{2}}\right)\right) \\[2mm] =&\frac{1}{\sqrt{2}}L_BL_A\left(|A(?)\rangle|B(?)\rangle|\square\rangle\right)+\frac{1}{\sqrt{2}}L_BL_A\left(|A(?)\rangle|B(?)\rangle|\blacksquare\rangle\right) \\[2mm] =&\frac{1}{\sqrt{2}}L_B\left(|A(\square)\rangle|B(?)\rangle|\square\rangle\right)+\frac{1}{\sqrt{2}}L_B\left(|A(\blacksquare)\rangle|B(?)\rangle|\blacksquare\rangle\right) \\[2mm] =&\frac{1}{\sqrt{2}}|A(\square)\rangle|B(\square)\rangle|\square\rangle+\frac{1}{\sqrt{2}}|A(\blacksquare)\rangle|B(\blacksquare)\rangle|\blacksquare\rangle \\[2mm] \end{array}$

From an extremely outside perspective, Alice and Bob have a probability of $\left|\frac{1}{\sqrt{2}}\right|^2=\frac{1}{2}$ of seeing either state.  Alice, Bob, and their pet quantum system are all in a joint superposition of states: they’re entangled.
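The linearity argument can be checked mechanically. This Python sketch assumes a toy model that is not in the original: each observer’s memory is one of “?”, □, or ■, and “looking” swaps “?” with whatever the system shows (a permutation of basis states, so the operation is unitary as well as linear). All helper names (`apply_linear`, `swap_memory`, `look_alice`, `look_bob`) are hypothetical. Applying the looks term by term to $\frac{|\square\rangle+|\blacksquare\rangle}{\sqrt{2}}$ should leave exactly the two branches $|A(\square)\rangle|B(\square)\rangle|\square\rangle$ and $|A(\blacksquare)\rangle|B(\blacksquare)\rangle|\blacksquare\rangle$, each with probability 1/2.

```python
import math
from collections import defaultdict

UNSEEN, WHITE, BLACK = "?", "□", "■"


def apply_linear(op, state):
    """Lift a map on basis states to a linear operator: act on each term separately, then add."""
    out = defaultdict(complex)
    for basis, amp in state.items():
        for new_basis, new_amp in op(basis).items():
            out[new_basis] += amp * new_amp
    return {basis: amp for basis, amp in out.items() if abs(amp) > 1e-12}


def swap_memory(memory, seen):
    # "?" <-> the value seen; any other memory is left alone. This is a permutation,
    # so the resulting operator is unitary as well as linear.
    if memory == UNSEEN:
        return seen
    if memory == seen:
        return UNSEEN
    return memory


def look_alice(basis):
    a, b, s = basis  # (Alice's memory, Bob's memory, system)
    return {(swap_memory(a, s), b, s): 1}


def look_bob(basis):
    a, b, s = basis
    return {(a, swap_memory(b, s), s): 1}


r = 1 / math.sqrt(2)
psi = {(UNSEEN, UNSEEN, WHITE): r, (UNSEEN, UNSEEN, BLACK): r}
final = apply_linear(look_bob, apply_linear(look_alice, psi))
for basis, amp in final.items():
    print(basis, round(abs(amp) ** 2, 3))  # each branch: probability 0.5
```

Nothing in `look_alice` or `look_bob` knows about superpositions; the entangled result comes entirely from `apply_linear` handling each term on its own.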

Anything else that happens can also be described by a linear operation (call it “F”) and therefore these two states can’t directly affect each other.

$F\left(\frac{1}{\sqrt{2}}|A(\square)\rangle|B(\square)\rangle|\square\rangle+\frac{1}{\sqrt{2}}|A(\blacksquare)\rangle|B(\blacksquare)\rangle|\blacksquare\rangle\right) = \frac{1}{\sqrt{2}}F\left(|A(\square)\rangle|B(\square)\rangle|\square\rangle\right)+\frac{1}{\sqrt{2}}F\left(|A(\blacksquare)\rangle|B(\blacksquare)\rangle|\blacksquare\rangle\right)$

The states can both contribute to the same end result, but only for other systems/observers that haven’t interacted with either of Alice or Bob or their pet quantum system.  We see this in the double slit experiment: the versions of the photons that go through each slit each contribute to the interference pattern on the screen, but neither version ever directly affects the other.  “Contributing but not interacting” sounds more abstract than it is.  If you shine two lights on the same object, the photons flying around all ignore each other, but each individually contributes to illuminating said object just fine.

The Alices and Bobs in the states $|A(\square)\rangle|B(\square)\rangle|\square\rangle$ and $|A(\blacksquare)\rangle|B(\blacksquare)\rangle|\blacksquare\rangle$ consider themselves to be the only ones (they don’t interact with their other versions).  The version of Alice in the state $|A(\square)\rangle$ “feels” that the state of the universe is $|A(\square)\rangle|B(\square)\rangle|\square\rangle$ because, as long as the operators being applied are linear, it doesn’t matter in any way if the other state exists.

Notice that Alice and Bob don’t see their own state being modified by that $\frac{1}{\sqrt{2}}$.  They don’t see their state as being 50% likely, they see it as definitely happening (every version thinks that).  That can be fixed with a “normalizing constant”.  That sounds more exciting than it is.  If you ask “what is the probability of rolling a 4 on a die?” the answer is “1/6”.  If you are then told that the number rolled was even, then suddenly the probability jumps to 1/3.  Once 1, 3, and 5 are ruled out, the probabilities of 2, 4, and 6 change from 1/6 each to 1/3 each.  Same idea here; every version of Alice and Bob is certain of their result, and multiplying their state by the normalizing constant ($\sqrt{2}$ in this example) reflects that sentiment and ensures that probabilities sum to 1.
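Here is the die example, followed by the same rescaling applied to one branch, as a small Python sketch (exact fractions for the die). Treating the branch amplitude this way is only the analogy the text draws, not a full treatment of measurement.

```python
import math
from fractions import Fraction

# The die example: learn "the roll was even", then renormalize what's left.
prior = {face: Fraction(1, 6) for face in range(1, 7)}
even = {face: p for face, p in prior.items() if face % 2 == 0}
evidence = sum(even.values())  # 1/2: the total probability that survived
posterior = {face: p / evidence for face, p in even.items()}
print(posterior)  # {2: Fraction(1, 3), 4: Fraction(1, 3), 6: Fraction(1, 3)}

# The same move for one branch of the entangled state: its amplitude is 1/sqrt(2),
# so multiplying by the normalizing constant sqrt(2) makes that branch "certain".
amplitude = 1 / math.sqrt(2)
normalizing_constant = 1 / abs(amplitude)
rescaled_probability = abs(amplitude * normalizing_constant) ** 2
print(math.isclose(normalizing_constant, math.sqrt(2)))  # True
print(math.isclose(rescaled_probability, 1.0))           # True
```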

If you are determined to follow a particular state through the problem, ignoring the others, then you need to make this adjustment.  The interaction operation starts to look more like a “projection operator” adjusted so that the resulting state is properly normalized.

The projection operator for the state $|\blacksquare\rangle$ is $|\blacksquare\rangle\langle\blacksquare|$.  This Bra-Ket notation allows us to quickly write vectors, $|\blacksquare\rangle$, and their duals, $\langle\blacksquare|$, and their inner products, $\langle\square|\blacksquare\rangle$.  The squared magnitude of this particular inner product, $\left|\langle\square|\blacksquare\rangle\right|^2$, is the probability of measuring $|\square\rangle$ when looking at the state $|\blacksquare\rangle$.  This can be tricky, but in this example we’re assuming “orthogonal states”.  In mathspeak, $\langle\square|\blacksquare\rangle=\langle \blacksquare |\square\rangle=0$ and $\langle\square|\square\rangle=\langle\blacksquare|\blacksquare\rangle=1$.

An interaction with the system in the state $|\psi\rangle=\alpha|\blacksquare\rangle + \beta|\square\rangle$ performs the operation $M_\blacksquare|\psi\rangle = \frac{|\blacksquare\rangle\langle\blacksquare|\psi\rangle}{\left|\langle\blacksquare|\psi\rangle\right|}$ with probability $p=\left|\langle\blacksquare|\psi\rangle\right|^2=|\alpha|^2$ or $M_\square|\psi\rangle = \frac{| \square\rangle\langle \square |\psi\rangle}{\left|\langle \square |\psi\rangle\right|}$ with probability $p=\left|\langle \square |\psi\rangle\right|^2=|\beta|^2$.

Here comes the same example again, but in a more “wave function collapse sort of way”.  We still start with the state $|A(?)\rangle|B(?)\rangle\left(\frac{|\square\rangle+|\blacksquare\rangle}{\sqrt{2}}\right)$, but when Alice or Bob looks at the system (or perhaps a little before or a little after) the wave function of the state collapses.  It needs to be one or the other, so the quantum system suddenly and inexplicably becomes

$\begin{array}{ll} &M_\blacksquare\left(\frac{|\square\rangle+|\blacksquare\rangle}{\sqrt{2}}\right) \\[2mm] =& \frac{|\blacksquare\rangle\langle\blacksquare|\left(\frac{|\square\rangle+|\blacksquare\rangle}{\sqrt{2}}\right)}{\left|\langle\blacksquare|\left(\frac{|\square\rangle+|\blacksquare\rangle}{\sqrt{2}}\right)\right|} \\[2mm] =& \frac{|\blacksquare\rangle\left(\frac{\langle\blacksquare|\square\rangle+\langle\blacksquare|\blacksquare\rangle}{\sqrt{2}}\right)}{\left|\frac{\langle\blacksquare|\square\rangle+\langle\blacksquare|\blacksquare\rangle}{\sqrt{2}}\right|} \\[2mm] =& \frac{|\blacksquare\rangle\left(\frac{0+1}{\sqrt{2}}\right)}{\left|\frac{0+1}{\sqrt{2}}\right|} \\[2mm] =& \frac{|\blacksquare\rangle\frac{1}{\sqrt{2}}}{\frac{1}{\sqrt{2}}} \\[2mm] =& |\blacksquare\rangle \end{array}$

That is to say, the superposition suddenly becomes only the measured state while the other states suddenly vanish (by some totally unknown means).  After this collapse, the “look” operators function normally.
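The collapse rule $M_\blacksquare$ can be written in a few lines of Python. This is a sketch under the same assumption as the text (orthonormal basis states); states are dictionaries from basis labels to amplitudes, and the name `collapse` is hypothetical.

```python
import math


def collapse(state, outcome):
    """M_outcome|psi> = |outcome><outcome|psi> / |<outcome|psi>|, for an orthonormal basis.

    `state` maps basis labels to amplitudes. Returns (new_state, probability)."""
    overlap = state.get(outcome, 0)  # <outcome|psi>, because the basis states are orthonormal
    probability = abs(overlap) ** 2
    if probability == 0:
        raise ValueError("that outcome has zero probability")
    return {outcome: overlap / abs(overlap)}, probability


r = 1 / math.sqrt(2)
psi = {"□": r, "■": r}
post, p = collapse(psi, "■")
print(post, round(p, 3))  # {'■': 1.0} 0.5
```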

This “measurement operator” (which does all the collapsing) is definitively non-linear, which is a big red flag.  We never see non-linear operations when we study isolated sets of quantum systems, no matter how they interact.  The one and only time we see non-linear operations is when we include the environment and even then only when we assume that there’s something unique and special about the environment.  When you assume that literally everything is a quantum system capable of being in superpositions of states the quantum laws become ontologically parsimonious (easy to write down).  We lose our special position as the only version of us that exists, but we gain a system of physical laws that doesn’t involve lots of weird exceptions, extra rules, and paradoxes.
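One way to see the “red flag” concretely: a linear operation must satisfy $f(x+y)=f(x)+f(y)$, and the collapse rule fails that test. In this Python sketch, `flip` (swapping $|\square\rangle$ and $|\blacksquare\rangle$) is a hypothetical stand-in for an ordinary linear, unitary interaction, and `measure_black` is $M_\blacksquare$; both are applied to unnormalized vectors here purely to test the algebra.

```python
def add(u, v):
    """Add two states (dicts from basis label to amplitude) term by term."""
    return {k: u.get(k, 0) + v.get(k, 0) for k in set(u) | set(v)}


def close(u, v, tol=1e-12):
    """True if two states agree on every basis label, up to a small tolerance."""
    return all(abs(u.get(k, 0) - v.get(k, 0)) < tol for k in set(u) | set(v))


def measure_black(state):
    # The collapse rule M_■: project onto |■>, then renormalize.
    overlap = state.get("■", 0)
    return {"■": overlap / abs(overlap)}


def flip(state):
    # A hypothetical linear (and unitary) operation for comparison: swap |□> and |■>.
    swap = {"■": "□", "□": "■"}
    return {swap[k]: amp for k, amp in state.items()}


x = {"■": 0.6, "□": 0.8}
y = {"■": 0.8, "□": -0.6}
print(close(flip(add(x, y)), add(flip(x), flip(y))))  # True
print(close(measure_black(add(x, y)), add(measure_black(x), measure_black(y))))  # False
```

The collapse fails because the renormalizing division depends on the input itself: $M_\blacksquare(x+y)$ has amplitude 1 on $|\blacksquare\rangle$, while $M_\blacksquare(x)+M_\blacksquare(y)$ has amplitude 2.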

## Q: In base ten 1=0.999…, but what about in other bases? What about in base 1?

Physicist: Yup!

The “0.999… thing” has been done before, but here’s the idea.  When we write 0.9, 0.99, 0.999, 0.9999, etc. we’re writing a sequence of numbers that gets closer and closer to 1.  Specifically, if there are N 9’s, then $1-0.\underbrace{9\ldots9}_{\textrm{N nines}}=\frac{1}{10^N}$.  What this means is that no matter how close you want to get to 1, you can get closer than that with enough 9’s.  If the 9’s never end, then the difference between 1 and 0.999… is zero.  The way our number system is constructed, this means that “0.999…” and “1” (or even “1.000…”) are one and the same in every respect.
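The gap formula $1-0.9\ldots9=\frac{1}{10^N}$ is easy to confirm with exact rational arithmetic (Python’s `fractions.Fraction`, so that floating-point rounding can’t muddy the result). The helper `nines` is hypothetical.

```python
from fractions import Fraction


def nines(n: int) -> Fraction:
    """Exact value of 0.99...9 (n nines) in base 10."""
    return sum((Fraction(9, 10**k) for k in range(1, n + 1)), Fraction(0))


for n in range(1, 6):
    print(n, 1 - nines(n))  # the gap is exactly 1/10**n: 1/10, 1/100, ...
```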

As a quick aside, if you think it’s weird that 1 = 0.999…, then you’re in good company.  Literally everyone thinks it’s weird.  But be cool.  There are no grand truths handed down from on high.  The rules of math are like the rules of Monopoly; if you don’t like them you can change them, but you risk the “game” becoming inconsistent or merely no fun.

The same philosophy applies to every base.  A good way to understand bases is to first consider what it means to write down a number in a given base.  For example:

$372.51 = 300 + 70 + 2 + 0.5 + 0.01 = 3\times10^2 + 7\times10^1 + 2\times10^0 + 5\times10^{-1} + 1\times10^{-2}$
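The positional expansion translates directly into code. This Python sketch (the name `positional_value` is hypothetical) multiplies each digit by the appropriate power of the base and adds everything up exactly; it reads digits with `int(ch, 36)` and deliberately doesn’t check that each digit is smaller than the base.

```python
from fractions import Fraction


def positional_value(numeral: str, base: int) -> Fraction:
    """Evaluate a numeral as sum(digit * base**position), exactly."""
    whole, _, frac = numeral.partition(".")
    b = Fraction(base)
    total = Fraction(0)
    for power, ch in enumerate(reversed(whole)):  # the ones place is power 0
        total += int(ch, 36) * b**power
    for power, ch in enumerate(frac, start=1):    # first place after the point is power -1
        total += int(ch, 36) * b**-power
    return total


print(positional_value("372.51", 10))  # 37251/100
print(positional_value("101.1", 2))    # 11/2
```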

As you step to the right along a number, each digit you see is multiplied by a lower power of ten.  This is why our number system is called “base 10”.  But beyond being convenient for counting on our fingers, there’s nothing special about the number ten.  If we could start over (and why not?), base 12 would be a much better choice.  For example, 1/3 in base 10 is “0.333…” and in base 12 it’s “0.4”; much nicer.  More succinctly: $0.333\ldots_{10} = 0.4_{12}$.
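Going the other way, the digits of a fraction in a given base come from long division: multiply by the base, peel off the whole-number part, repeat. A minimal Python sketch (the helper `fraction_digits` is hypothetical) shows 1/3 repeating forever in base 10 but terminating in base 12.

```python
from fractions import Fraction


def fraction_digits(x: Fraction, base: int, n: int) -> list[int]:
    """First n digits after the point of 0 <= x < 1 in the given base, by long division."""
    digits = []
    for _ in range(n):
        x *= base
        digit = int(x)  # x >= 0, so int() rounds down
        digits.append(digit)
        x -= digit
    return digits


print(fraction_digits(Fraction(1, 3), 10, 6))  # [3, 3, 3, 3, 3, 3]
print(fraction_digits(Fraction(1, 3), 12, 6))  # [4, 0, 0, 0, 0, 0]
```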

Because we work in base 10, if you tried to “build a tower to one” from below, you’d want to use the largest possible digit each time.  $0.9_{10}$ is the largest number with one digit after the point, $0.99_{10}$ is the largest with two digits, $0.999_{10}$ is the largest with three digits, etc.  This is because 9 is the largest digit in base 10.

In the exact same way, $0.8_9$ is the largest number with one digit after the point in base 9, $0.88_9$ is the largest with two digits, and so on.  The same way that it works in base 10, in base 9: $1_9 = 0.888\ldots_9$!

The easiest way to picture the number 1 as an infinite sum of parts is to picture $0.111\ldots_2$, “0.111…” in base 2.

If you cut a stick in half, then cut one of those halves in half, then cut one of those quarters in half, and so on, then the collected set of sticks would have the same length as the original stick.  This is the same as saying 1 = 0.111… in base 2.

If you take a stick and cut it in half, then cut one of those halves in half, then cut one of those quarters in half, and so on, the collected set of sticks would have the same length as the original stick.  One half, $0.1_2$, plus one quarter, making $0.11_2$, plus one eighth, making $0.111_2$, and so on ad infinitum, equals one.  That is to say, $1_2 = 0.111\ldots_2$.

But things get tricky when you get to base 1.  The largest digit in a given base is always one less than the base; 9 for base 10, 6 for base 7, 37 for base 38, 1 for base 2.  So you’d expect that the largest digit in base 1 is 0.  The problem is that the whole idea of a base system breaks down in “base 1”.  In base ten, the number “$abc.de_{10}$” means “$a\times10^2 + b\times10^1 + c\times10^0 + d\times10^{-1} + e\times10^{-2}$” (where “a” through “e” are some digits, but who cares what they are).  More generally, in base B we have $abc.de_B = a\times B^2 + b\times B^1 + c\times B^0 + d\times B^{-1} + e\times B^{-2}$.

But in base 1, $abc.de_1 = a\times1^2 + b\times1^1 + c\times1^0 + d\times1^{-1} + e\times1^{-2} = a+b+c+d+e$.  That is to say, every digit has the same value.  Rather than digits to the left being worth more, and digits to the right being worth less, in base 1 every position is the same as every other.  So, base one is a number system where the position of the digits doesn’t matter and technically the only digit you get to work with is zero.  Not useful.

If you’re gauche enough to allow the use of the number 1 in base 1, then you can count.  But not fast.

Top: The oldest recorded numbers, “4” and “17” in base 1.  Bottom: Using a modern abuse of notation, “96” and “15” in base 1.

In base 1, $1 = 10_1 = 0.000001_1 = 10000_1 = 0.01_1$.  Therefore, the infinitely repeating number $0.111\ldots_1 = \infty$.  That is, if you add up an infinite string of 1’s, $1+1+1+1+\ldots$, then naturally you get infinity.
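Base 1’s “every place is worth 1” behavior can be checked with the same digit-times-power rule. This Python sketch repeats a small, hypothetical `positional_value` helper and, in the spirit of the “gauche” convention above, allows the digit 1 in base 1 even though a proper base-1 system would only have 0.

```python
from fractions import Fraction


def positional_value(numeral: str, base: int) -> Fraction:
    """sum(digit * base**position); digits are deliberately not checked against the base."""
    whole, _, frac = numeral.partition(".")
    b = Fraction(base)
    total = Fraction(0)
    for power, ch in enumerate(reversed(whole)):
        total += int(ch) * b**power
    for power, ch in enumerate(frac, start=1):
        total += int(ch) * b**-power
    return total


# With base 1 every place is worth 1**k == 1, so a numeral's value is just its digit sum.
for numeral in ["10", "0.000001", "10000", "0.01", "1111"]:
    print(numeral, positional_value(numeral, 1))  # 1, 1, 1, 1, then 4
```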

In short: The “1 = 0.999… thing” is just a symptom of how our number system is constructed, and has nothing in particular to do with 9’s or 10’s.  The base 1 number system is kind of a mess and, outside of tallying, isn’t worth using.  Base 1 is broken when we consider this particular problem, but that’s to be expected since it’s usually broken.

Answer Gravy: We can use the definition of the base system to show that $1 = 0.999\ldots_{10} = 0.333\ldots_4 = 0.555\ldots_6$, etc.  For example, when we write the number 0.999… in base 10, what we explicitly mean is

$0.999\ldots_{10} = 9\times 10^{-1}+9\times 10^{-2}+9\times 10^{-3}+\ldots = \sum_{n=1}^\infty 9\times 10^{-n}$

The same idea is true in any base B, $1=0.(B-1)(B-1)(B-1)\ldots_B$.  Showing that this is equal to one is a matter of working this around until it looks like a geometric sum, $1+r+r^2+r^3+\ldots$, and using the fact that $\sum_{n=0}^\infty r^n = \frac{1}{1-r}$.

$\begin{array}{ll} &0.(B-1)(B-1)(B-1)\ldots_B \\[2mm] =& (B-1)\times B^{-1}+(B-1)\times B^{-2}+(B-1)\times B^{-3}+\ldots \\[2mm] =& \sum_{n=1}^\infty (B-1)\times B^{-n} \\[2mm] =& \sum_{n=0}^\infty (B-1)\times B^{-n-1} \\[2mm] =& \sum_{n=0}^\infty \frac{B-1}{B}\times B^{-n} \\[2mm] =& \frac{B-1}{B}\sum_{n=0}^\infty B^{-n} \\[2mm] =& \frac{B-1}{B}\sum_{n=0}^\infty \left(\frac{1}{B}\right)^n \\[2mm] =& \frac{B-1}{B} \frac{1}{1-\frac{1}{B}} \\[2mm] =& \frac{B-1}{B} \frac{B}{B-1} \\[2mm] =&1 \end{array}$

Notice that issues with base 1, B=1, crop up twice.  First because you’re adding up nothing, 0=B-1, over and over.  Second because $1+B^{-1}+B^{-2}+B^{-3}+\ldots = \frac{B}{B-1} = \infty$ when B=1.  So don’t use base 1.  There are better things to do.
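Both observations can be checked with exact fractions. Assuming the truncated sum $\sum_{k=1}^{n}(B-1)B^{-k}$ is what “0.(B-1)(B-1)…” means after n digits, this Python sketch (hypothetical helper `repeated_top_digit`) confirms that the shortfall from 1 is exactly $B^{-n}$ for several bases, and that in base 1 every partial sum is 0 because the repeated digit is $B-1=0$.

```python
from fractions import Fraction


def repeated_top_digit(base: int, n: int) -> Fraction:
    """Exact value of 0.(B-1)(B-1)...(B-1) in base B, truncated to n digits."""
    return sum((Fraction(base - 1, base**k) for k in range(1, n + 1)), Fraction(0))


for base in (2, 4, 6, 9, 10, 12):
    for n in (1, 5, 20):
        # The shortfall is exactly B**-n, which shrinks to 0 as n grows (for B > 1).
        assert 1 - repeated_top_digit(base, n) == Fraction(1, base**n)

# Base 1: the "top digit" is B - 1 = 0, so every partial sum is just 0.
print([repeated_top_digit(1, n) for n in (1, 5, 20)])  # [Fraction(0, 1), Fraction(0, 1), Fraction(0, 1)]
```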

The excellent pdf about constructing the real numbers was written by this guy.

Posted in -- By the Physicist, Math | 5 Comments

## Q: How many samples do you need to take to know how big a set is?

The Original Question Was: I have a machine … and when I press a button, it shows me one object that it selects randomly. There are enough objects that simply pressing the button until I no longer see new objects is not feasible.  Pressing the button a specific number of times, I take a note of each object I’m shown and how many times I’ve seen it.  Most of the objects I’ve seen, I’ve seen once, but some I’ve seen several times.  With this data, can I make a good guess about the size of the set of objects?

Physicist: It turns out that even if you really stare at how often each object shows up, your estimate for the size of the set never gets much better than a rough guess.  It’s like describing where a cloud is; any exact number is silly.  “Yonder” is about as accurate as you can expect.  That said, there are some cute back-of-the-envelope rules for estimating the sizes of sets witnessed one piece at a time, that can’t be improved upon too much with extra analysis.  The name of the game is “have I seen this before?”.

The situation in question.

Zero repeats

It wouldn’t seem like seeing no repeats would give you information, but it does (a little).

How many times do you have to randomly look at cards before they start to look familiar?

The probability of seeing no repeats after randomly drawing K objects out of a set of N total objects is $P \approx e^{-\frac{K^2}{2N}}$.  This equation isn’t exact, but (for N bigger than ten or so) it’s way too close to matter.

The probability of seeing no repeats after K draws from a set of N=10,000 objects.

The probability is one for K=0 (if you haven’t looked at any objects, you won’t see any repeats), it drops to about 60% for $K=\sqrt{N}$ and about 14% for $K=2\sqrt{N}$.  This gives us a decent rule of thumb: in practice, if you’re drawing objects at random and you haven’t seen any repeats in the first K draws, then there are likely to be at least on the order of $K^2$ objects in the set.  Or, to be slightly more precise, if there are N objects, then there’s only about a 60% chance of randomly drawing $\sqrt{N}$ times without repeats.
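The approximation is easy to compare against the exact answer. Assuming each draw is uniform and independent, this Python sketch computes the exact no-repeat probability $\prod_{i=0}^{K-1}\frac{N-i}{N}$ alongside the standard birthday-problem approximation $e^{-K^2/(2N)}$ for N=10,000; the function names are hypothetical.

```python
import math


def p_no_repeats_exact(k: int, n: int) -> float:
    """Chance that k uniform draws (with replacement) from n objects are all different."""
    p = 1.0
    for i in range(k):
        p *= (n - i) / n  # draw i+1 must avoid the i objects already seen
    return p


def p_no_repeats_approx(k: int, n: int) -> float:
    """The birthday-problem approximation exp(-k^2 / (2n))."""
    return math.exp(-k * k / (2 * n))


n = 10_000
for k in (0, 100, 200, 300):
    print(k, round(p_no_repeats_exact(k, n), 3), round(p_no_repeats_approx(k, n), 3))
```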

Seeing only a handful of repeats allows you to very, very roughly estimate the size of the set (about the square of the number of times you’d drawn when you saw your first repeats, give or take a lot), but getting anywhere close to a good estimate requires seeing an appreciable fraction of the whole.

Some repeats

So, say you’ve seen an appreciable fraction of the whole.  This is arguably the simplest scenario.  If you’re making your way through a really big set and 60% (for example) of the time you see repeats, then you’ve seen about 60% of the things in the set.  That sounds circular, but it’s not quite.

The orbits of 14,000 worrisome objects.

For example, we’re in a paranoia-fueled rush to catalog all of the dangerous space rocks that might hit the Earth.  We’ve managed to find at least 90% of the Near Earth Objects that are over a km across and we can make that claim because whenever someone discovers a new one, it’s already old news at least 90% of the time.  If you decide to join the effort (which is a thing you can do), then be sure to find at least ten or you probably won’t get to put your name on a new one.
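The “repeat rate ≈ fraction already seen” idea suggests a rough estimator: divide the number of distinct objects seen by the recent repeat rate. This Python sketch is an illustration only; it assumes uniform, independent draws, and the function `estimate_set_size` and its `window` parameter are hypothetical. Because the window averages over a stretch when less of the set had been seen, it tends to overestimate N slightly.

```python
import random


def estimate_set_size(draws, window=500):
    """Rough estimate of N: the recent repeat rate ~ the fraction of the set already seen.

    Returns None when there are no repeats in the window (then only a lower bound is possible)."""
    seen = set()
    was_repeat = []
    for obj in draws:
        was_repeat.append(obj in seen)
        seen.add(obj)
    recent = was_repeat[-window:]
    if not recent or sum(recent) == 0:
        return None
    repeat_rate = sum(recent) / len(recent)
    return len(seen) / repeat_rate


rng = random.Random(0)  # fixed seed so the run is reproducible
true_n = 5_000
draws = [rng.randrange(true_n) for _ in range(4_000)]
print(round(estimate_set_size(draws)))  # lands near 5,000, typically a little high
```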

All repeats

There’s no line in the sand where you can suddenly be sure that you’ve seen everything in the set.  You’ll find new things less and less often, but it’s impossible to definitively say when you’ve seen the last new thing.

When should you stop looking for something new at the bottom?

It turns out that the probability of having seen all N objects in a set after K draws is approximately $P\approx e^{-Ne^{-\frac{K}{N}}}$, which is both admittedly weird looking and remarkably accurate.  This can be solved for K.

$K \approx N\ln\left(N\right) - N\ln\left(\ln\left(\frac{1}{P}\right)\right)$

When P is close to zero K is small and when P is close to one K is large.  The question is: how big is K when the probability changes?  Well, for middling values of P (roughly 0.07<P<0.7) it turns out that $\ln\left(\ln\left(\frac{1}{P}\right)\right)$ is between -1 and 1, so you’re likely to finally see every object at least once somewhere in $N\ln(N)-N<K<N\ln(N)+N$.  You’ll already know approximately how many objects there are (N), because you’ve already seen (almost) all of them.

The probability of seeing every one of N=1000 objects at least once after K draws.  This ramps up around Nln(N)≈6,900.
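Inverting the approximation numerically is a one-liner. This sketch (the function names are hypothetical) computes $K \approx N\ln(N) - N\ln\left(\ln\left(\frac{1}{P}\right)\right)$ for a few values of P with N=1000, and the round trip through $e^{-Ne^{-\frac{K}{N}}}$ should hand back P.

```python
from math import exp, log

def draws_needed(n, p):
    """Approximate number of draws K at which all n objects have been seen
    with probability p, from K ≈ N ln(N) - N ln(ln(1/P))."""
    return n * log(n) - n * log(log(1 / p))

def p_all_approx(n, k):
    """P ≈ exp(-N exp(-K/N)), the formula being inverted."""
    return exp(-n * exp(-k / n))

n = 1000
for p in (0.1, 0.5, 0.9):
    k = draws_needed(n, p)
    print(f"P = {p}: K ≈ {k:.0f} draws, i.e. N ln N {k - n * log(n):+.0f}")
```

For N=1000 this gives roughly 6,070 draws at P=0.1, 7,270 at P=0.5, and 9,160 at P=0.9: a window a few thousand draws wide sitting on top of N ln(N) ≈ 6,900.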

So, if you’ve seen N objects and you’ve drawn appreciably more than $K=N\ln(N)$ times, then you’ve probably seen everything. Or, in slightly more back-of-the-envelope-friendly terms: you’ve probably seen everything once the number of draws, K, is more than 2N times the number of digits in K.
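That back-of-the-envelope rule can be checked directly. This sketch (hypothetical helper names) finds the first K satisfying "K > 2N × (number of digits in K)" and compares it with $N\ln(N)$. The rule works because $\ln(N)\approx 2.3\log_{10}(N)$, and the number of digits in a number is about its $\log_{10}$ plus one.

```python
from math import log

def rule_of_thumb_says_done(n_seen, k):
    """Back-of-the-envelope stopping rule: K > 2N x (number of digits in K)."""
    return k > 2 * n_seen * len(str(k))

def smallest_k_by_rule(n_seen):
    """First draw count at which the rule of thumb says you're done."""
    k = 1
    while not rule_of_thumb_says_done(n_seen, k):
        k += 1
    return k

for n in (10, 100, 1000, 10_000):
    print(n, round(n * log(n)), smallest_k_by_rule(n))
```

For N = 10, 100, 1,000, and 10,000 the rule kicks in at 41, 601, 8,001, and 120,001 draws, versus N ln(N) ≈ 23, 461, 6,908, and 92,103. It always lands somewhat past N ln(N), which is the safe side to err on.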

Answer Gravy: Those approximations are a beautiful triumph of asymptotics. First: the probability of seeing every object.

When you draw from a set over-and-over you generate a sequence.  For example, if your set is the alphabet (where N=26), then a typical sequence might be something like “XKXULFQLVDTZAC…”

If you want only the sequences that include every letter at least once, then you start with every sequence (of which there are $N^K$) and subtract all of the sequences that are missing one of the letters. The number of sequences missing a particular letter is $(N-1)^K$ and there are N letters, so a first guess at the number of sequences missing at least one letter is $N(N-1)^K$. But if you remove all the sequences without an A and all the sequences without a B, then you’ve twice removed all the sequences missing both A’s and B’s. So, those need to be added back. There are $(N-2)^K$ sequences missing any particular 2 letters and there are “N choose 2” ways to be lacking 2 of the N letters, so we need to add ${N \choose 2} (N-2)^K$ back. But the same problem keeps cropping up with sequences lacking three or more letters. Luckily, this is not a new problem, so the solution isn’t new either.
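For tiny alphabets you can check this subtract-and-add-back bookkeeping by brute force. This sketch (function names invented for illustration) lists every length-K sequence over N letters, counts the ones that use every letter, and compares that with the alternating sum carried all the way out to N missing letters, $\sum_{j=0}^N(-1)^j{N\choose j}(N-j)^K$.

```python
from itertools import product
from math import comb

def count_by_brute_force(n, k):
    """Count length-k sequences over n letters that use every letter at least once."""
    return sum(1 for seq in product(range(n), repeat=k) if len(set(seq)) == n)

def count_by_inclusion_exclusion(n, k):
    """Same count via the alternating sum: all, minus missing one, plus missing two, ..."""
    return sum((-1) ** j * comb(n, j) * (n - j) ** k for j in range(n + 1))

for n, k in [(3, 3), (3, 5), (4, 6)]:
    print(n, k, count_by_brute_force(n, k), count_by_inclusion_exclusion(n, k))
```

The two columns agree: 6, 150, and 1,560 sequences respectively.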

By the inclusion-exclusion principle, the solution is to just keep flipping sign and ratcheting up the number of missing letters.  The number of sequences of K draws that include every letter at least once is $\underbrace{N^K}_{\textrm{any}}-\underbrace{{N\choose1}(N-1)^K}_{\textrm{any but one}}+\underbrace{{N\choose2}(N-2)^K}_{\textrm{any but two}}-\underbrace{{N\choose3}(N-3)^K}_{\textrm{any but three}}\ldots$ which is the total number of sequences, minus the number that are missing one letter, plus the number missing two, etc.  A more compact way of writing this is $\sum_{j=0}^N(-1)^j{N\choose j}(N-j)^K$.  The probability of seeing every letter at least once is just this over the total number of possible sequences, $N^K$, which is

$\begin{array}{rcl}P(all) &=& \frac{1}{N^K}\sum_{j=0}^N(-1)^j {N \choose j} (N-j)^K \\[2mm]&=& \sum_{j=0}^N(-1)^j {N \choose j} \left(1-\frac{j}{N}\right)^K \\[2mm]&=& \sum_{j=0}^N(-1)^j {N \choose j} \left[\left(1-\frac{j}{N}\right)^N\right]^\frac{K}{N} \\[2mm]&\approx& \sum_{j=0}^N(-1)^j {N \choose j} e^{-j\frac{K}{N}} \\[2mm]&=& \sum_{j=0}^N {N \choose j} \left(-e^{-\frac{K}{N}}\right)^j \\[2mm]&=& \sum_{j=0}^N {N \choose j} \left(-e^{-\frac{K}{N}}\right)^j 1^{N-j} \\[2mm]&=& \left(1-e^{-\frac{K}{N}}\right)^N \\[2mm]&=& \left(1-\frac{Ne^{-\frac{K}{N}}}{N}\right)^N \\[2mm]&\approx& e^{-Ne^{-\frac{K}{N}}} \end{array}$

The two approximations are asymptotic and both of the form $e^x \approx \left(1+\frac{x}{n}\right)^n$. They’re asymptotic in the sense that they become perfect as n goes to infinity, but they’re also remarkably good for values of n as small as ten-ish (for example, $\left(1-\frac{1}{10}\right)^{10}\approx0.349$ versus $e^{-1}\approx0.368$). In fact, $e=\lim_{n\to\infty}\left(1+\frac{1}{n}\right)^n$ is one of the standard definitions of the number e.
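To see how good the two approximations are together, here is a quick numerical sketch (the function names are invented for illustration) comparing the exact inclusion-exclusion probability with $e^{-Ne^{-\frac{K}{N}}}$ for N=100, across the range where the probability ramps up. The exact version uses Python's arbitrary-precision integers so the alternating sum can't lose accuracy to cancellation.

```python
from math import comb, exp

def p_all_exact(n, k):
    """Exact probability that k uniform draws (with replacement) from n objects
    include every object at least once, via inclusion-exclusion."""
    count = sum((-1) ** j * comb(n, j) * (n - j) ** k for j in range(n + 1))
    return count / n ** k  # exact integer arithmetic until this final division

def p_all_approx(n, k):
    """The asymptotic approximation P ≈ exp(-N exp(-K/N))."""
    return exp(-n * exp(-k / n))

for k in (300, 460, 600, 800):
    print(k, round(p_all_exact(100, k), 4), round(p_all_approx(100, k), 4))
```

Even for N as small as 100, the two agree to within a couple of hundredths (often much better) all the way across the ramp.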

This form is simple enough that we can actually do some algebra and see where the action is.

$\begin{array}{rcl} e^{-Ne^{-\frac{K}{N}}} &\approx& P \\[2mm] -Ne^{-\frac{K}{N}} &\approx& \ln(P) \\[2mm] e^{-\frac{K}{N}} &\approx& -\frac{1}{N}\ln\left(P\right) \\[2mm] e^{-\frac{K}{N}} &\approx& \frac{1}{N}\ln\left(\frac{1}{P}\right) \\[2mm] -\frac{K}{N} &\approx& \ln\left(\frac{1}{N}\ln\left(\frac{1}{P}\right)\right) \\[2mm] -\frac{K}{N} &\approx& -\ln\left(N\right) +\ln\left(\ln\left(\frac{1}{P}\right)\right) \\[2mm] K &\approx& N\ln\left(N\right) - N\ln\left(\ln\left(\frac{1}{P}\right)\right) \\[2mm] \end{array}$

Now: the probability of seeing no repeats.

The probability of seeing no repeats on the first draw is $\frac{N}{N}$, in the first two it’s $\frac{N(N-1)}{N^2}$, in the first three it’s $\frac{N(N-1)(N-2)}{N^3}$, and after K draws the probability is

$\begin{array}{rcl} P(no\,repeats) &=& \frac{N(N-1)\cdots(N-K+1)}{N^K} \\[2mm] &=& 1\left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)\cdots\left(1-\frac{K-1}{N}\right) \\[2mm] &=& \prod_{j=0}^{K-1}\left(1-\frac{j}{N}\right) \\[2mm] \ln(P) &=& \sum_{j=0}^{K-1}\ln\left(1-\frac{j}{N}\right) \\[2mm] &\approx& \sum_{j=0}^{K-1} -\frac{j}{N} \\[2mm] &=& -\frac{1}{N}\sum_{j=0}^{K-1} j \\[2mm] &\approx& -\frac{1}{N}\frac{1}{2}K^2 \\[2mm] &=& -\frac{K^2}{2N} \\[2mm] P &\approx& e^{-\frac{K^2}{2N}} \\[2mm] \end{array}$

The approximations here are $\ln(1+x)\approx x$, which is good for small values of x, and $\sum_{j=0}^{K-1} j \approx \frac{1}{2}K^2$, which is good for large values of K.  If K is bigger than ten or so and N is a hell of a lot bigger than that, then this approximation is remarkably good.
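A quick sketch comparing the exact product $\prod_{j=0}^{K-1}\left(1-\frac{j}{N}\right)$ with $e^{-\frac{K^2}{2N}}$ (function names invented for illustration), for a large N and a few values of K:

```python
from math import exp

def p_no_repeats_exact(n, k):
    """Probability that k uniform draws from n objects are all different."""
    p = 1.0
    for j in range(k):
        p *= 1 - j / n  # the (j+1)-th draw must avoid the j objects already seen
    return p

def p_no_repeats_approx(n, k):
    """The approximation P ≈ exp(-K^2 / 2N)."""
    return exp(-k * k / (2 * n))

for n, k in [(10**6, 1000), (10**6, 2000), (10**6, 3000)]:
    print(n, k, round(p_no_repeats_exact(n, k), 5), round(p_no_repeats_approx(n, k), 5))
```

With N in the millions and K in the thousands, the two agree to about three decimal places.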

Posted in -- By the Physicist, Combinatorics, Math, Probability

## Q: Does anti-matter really move backward through time?

Physicist: The very short answer is: yes, but not in a time-traveler kind of way.

There is a “symmetry” in physics, implied by our most fundamental understanding of physical law and never violated by any known process, called “CPT symmetry”. It says that if you take the universe and everything in it and flip the electrical charge (C), invert everything as though through a mirror (P), and reverse the direction of time (T), then the base laws of physics all continue to work the same.

Together, P and T amount to putting a minus sign on every spacetime coordinate, $(t,x,y,z)\to(-t,-x,-y,-z)$. In addition to time this reflects all three spatial directions, and since each of these reflections reverses parity (flips left and right), these three reflections amount to just one P. You find, when you do this (PT) in quantum field theory, that if you then flip the charge of the particles involved (C), then overall nothing really changes. In literally every known interaction and phenomenon (on the particle level), flipping all of the coordinates (PT) and the charge (C) leaves the base laws of physics unchanged. It’s worth considering these flips one at a time.
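Written out in coordinates, using the usual convention that P flips all three spatial axes at once:

$\begin{array}{rcl} P &:& (t,x,y,z)\to(t,-x,-y,-z) \\[2mm] T &:& (t,x,y,z)\to(-t,x,y,z) \\[2mm] PT &:& (t,x,y,z)\to(-t,-x,-y,-z) \end{array}$

The spatial part of P is the matrix $\mathrm{diag}(-1,-1,-1)$, whose determinant is $(-1)^3=-1$, the same as a single mirror reflection, $\mathrm{diag}(-1,1,1)$. The two differ only by a rotation of 180° about the x-axis, $\mathrm{diag}(1,-1,-1)$, which is why three reflections count as one left-right flip.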

Charge Conjugation: Flip all the charges in the universe. Most important for us, protons become negatively charged and electrons become positively charged. Charge conjugation keeps all of the laws of electromagnetism unchanged. Basically, after reversing all of the charges, likes are still likes (and repel) and opposites are still opposites (and attract).

Time Reversal: If you watch a movie in reverse a lot of nearly impossible things happen. Meals are uneaten, robots are unexploded, words are unsaid, and hearts are unbroken. The big difference between the before and after in each situation is entropy, which almost always increases with time. This is a “statistical law” which means that it only describes what “tends” to happen. On scales-big-enough-to-be-seen entropy “doesn’t tend” to decrease in the sense that fire “doesn’t tend” to change ash into paper; it is a law as absolute as any other. But on a very small scale entropy becomes more suggestion than law. Interactions between individual particles play forward just as well as they play backwards, including particle creation and annihilation.

Left: An electron and a positron annihilate producing two photons. Right: Two photons interact creating an electron and a positron.  We see both of these events in nature routinely and they are literally time-reverses of each other.

Parity: If you watch the world through a mirror, you’ll never notice anything amiss. If you build a car, for example, and then build another that is its exact mirror image, then both cars will function just as well as each other. It wasn’t until 1956 that we finally had an example of something that behaves differently from its mirror twin. By putting ultra-cold radioactive cobalt-60 in a strong magnetic field, the nuclei (and so the decaying neutrons) were more or less aligned, and we found that the electrons shot out (β radiation) preferentially in one direction.

Chien-Shiung Wu in 1956 demonstrating how difficult it is to build something that behaves differently than its mirror image.

The way matter interacts through the weak force has handedness in the sense that you can genuinely tell the difference between left and right. During β⁻ (“beta minus”) decay a neutron turns into a proton while an electron, an anti-electron neutrino, and (usually) a photon or two are ejected from the nucleus. Neutrons have spin, so, defining a “north” and “south” in analogy to the way Earth rotates, it turns out that the electron emitted during β⁻ decay is preferentially shot out of the neutron’s “south pole”. But mirror images spin in the opposite direction (try it!) so their “north-south-ness” is flipped. The mirror image of the way neutrons decay is impossible. Just flat out never seen in nature. Isn’t that weird? There doesn’t have to be a “parity violation” in the universe, but there is.

Matter’s interaction with the weak force is “handed”.  When emitting beta radiation (a weak interaction) matter and anti-matter are mirrors of each other.

Parity and charge are how anti-matter is different from matter. All anti-matter particles have the opposite charge of their matter counterparts and their parity is flipped in the sense that when anti-particles interact using the weak force, they do so like matter’s image in a mirror. When an anti-neutron decays into an anti-proton, a positron, and an electron-neutrino, the positron tends to pop out of its “north pole”.

CPT is why physicists will sometimes say crazy sounding things like “an anti-particle (CP) is like the normal particle traveling back in time (T)”. In physics, whenever you’re trying to figure out how an anti-particle will behave in a situation, you can always reverse time and consider how a normal particle traveling into the past would act.

“Anti-matter acts like matter traveling backward in time”. Technically true, but not in a way that’s useful or particularly enlightening for almost anyone to know.

This isn’t as useful an insight as it might seem.  Honestly, this is useful for understanding beta decay and neutrinos and the fundamental nature of reality or whatever, but as far as your own personal understanding of anti-matter and time, this is a remarkably useless fact.  The “backward in time thing” is a useful way of describing individual particle interactions, but as you look at larger and larger scales entropy starts to play a more important role, and the usual milestones of passing time (e.g., ticking clocks, fading ink, growing trees) show up for both matter and anti-matter in exactly the same way.  It would be a logical and sociological goldmine if anti-matter people living on an anti-matter world were all Benjamin Buttons, but at the end of the day if you had a friend made of anti-matter (never mind how), you’d age and experience time in exactly the same way.  You just wouldn’t want to hang out in the same place.

The most important, defining characteristic of time is entropy and entropy treats matter and anti-matter in exactly the same way; the future is the future is the future for everything.

## Q: How do we know that everyone has a common ancestor? How do we know that someone alive today will someday be a common ancestor to everyone?

The original question was: From biology and genetics we know that any group of living organisms had a mitochondrial most recent common ancestor (mitochondrial Eve): a female organism who lived in the past such that all organisms in this group are her descendants.

How can we [theoretically] prove this? (I.e. without assumption that there’s equal probability to mate for any two individuals).

Also: can we prove that one of 3 billion women currently living on Earth will be the only ancestor of all human population some day in the future, and all other currently living women (except her mother and daughters) will have no descendants at that day?

Physicist:  The fact that everyone on Earth has a common female ancestor if you go back far enough is a direct consequence of the theory of common descent.  It looks like everything that lives is part of the same very extended family tree with a last universal common ancestor at its base.  In order to have two familial lines that never combine in the past you’d need to have more than one starting point for life, and all the evidence to date implies that there’s just the one.  Luckily, you don’t have to go all the way back to slime molds to find common ancestors for all humans; the most recent were standard, off-the-shelf people.

It turns out that animal and plant cells aren’t particularly good at producing usable energy, so before we could get around to the business of existing we needed to get past that problem.  The solution: fill our cells with a couple thousand symbiotic bacteria.  Literally, they’re not human; mitochondria reproduce on their own and have their own genetic code.  There’s a hell of a lot of communication and exchange of material between them and our cells, and without them there wouldn’t be an us, but they are (arguably) separate organisms that we are absolutely dependent on and which are completely dependent on us.

Here comes the important bit: egg cells pass their mitochondria on but sperm cells don’t (the few mitochondria a sperm carries are destroyed after fertilization), so mitochondria are passed strictly from mother to child. There’s no inherent reason for your mitochondria and your father’s to be related at all. The nice thing about that is that it keeps the genetic lineage very simple: all of your mitochondria are essentially clones of those in the egg cell you started as and (for our female readers) any of your children’s mitochondria will essentially be clones of yours. The genes of sexually reproducing beings are a lot trickier to keep track of over time; every generation half of our genes are dumped and the other half are shuffled with someone else’s (which is why your DNA is unique). The one real advantage to talking about mtDNA (mitochondrial DNA) is that you only have a single chain of ancestors to worry about. By the way: you can do exactly the same thing with the Y-chromosome and direct male lines.

Mitochondria are passed only from mother to child, so if you follow your direct female line back, you’re following your mitochondria’s ancestors as well.

If you have a group of creatures with two types of mitochondria, two “haplogroups”, living under a population ceiling, then eventually one or the other will be bred out. The math behind this is essentially the Drunkard’s Walk. The number of folk in a haplogroup drifts up and down at random forever, unless it hits zero, and once it hits zero it stays there; given enough time and nowhere else to go (a population limit), eventually the drunkard’s walk will take him off a cliff (zero population). So, if you start with a small village and several haplogroups, then after a few generations you’ll probably have fewer.
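The Drunkard’s Walk picture can be simulated with the neutral Wright-Fisher model, a standard population-genetics stand-in that we're swapping in here for illustration. The assumptions: a fixed number of women (the population ceiling), every woman in each generation has a mother picked uniformly at random from the previous generation, and no mutations, so haplogroups are just inherited labels. The function name is hypothetical.

```python
import random

def generations_until_one_haplogroup(population, groups, rng):
    """Neutral Wright-Fisher-style sketch: each generation, every one of
    `population` women picks a mother uniformly at random from the previous
    generation and inherits her haplogroup. Returns how many generations pass
    before only one haplogroup is left."""
    # Start with the haplogroups spread evenly through the population.
    current = [i % groups for i in range(population)]
    generations = 0
    while len(set(current)) > 1:
        current = [rng.choice(current) for _ in range(population)]
        generations += 1
    return generations

rng = random.Random(42)
runs = [generations_until_one_haplogroup(100, 5, rng) for _ in range(10)]
print(runs)
```

Every run ends with a single surviving haplogroup; which one survives, and how long it takes, is down to luck.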

Why did that particular woman become the Mitochondrial Eve instead of one of the others?  Luck and fecundity.

That isn’t saying much. It boils down to the rather fatalistic statement that “in order to be the last thing standing, you just have to wait for everything else to die off”. We can’t prove that a woman today will eventually be declared, very post-mortem, the Mitochondrial Eve to everyone (that is, all but one haplogroup will die off). But statistically: that’ll definitely happen. To within less than a 1% error, every inherited line of every kind has died off; practically every species, sub-species, gene, haplogroup, whatever, has gone extinct leaving only the amazing scraps that remain. That’s evolution in a nutshell: you chip away all the life that isn’t an efficient, functioning organism (and then a hell of a lot more besides) and the inconceivably tiny fraction that remains is (some of the) efficient, functioning organisms. So, chances are that every living haplogroup presently around will go extinct eventually. When there’s one haplogroup left, then you can say that they all have a common Mitochondrial Eve and when there are zero haplogroups left, then all human issues become moot.

In order to definitely not get a modern Mitochondrial Eve, you’d need human populations that are absolutely independent (and viable) forever.  Maybe if we colonized Mars and then completely forgot it?

So, if any given haplogroup eventually dies out, then why is there more than one? Well, good news: over long time scales (millennia) mtDNA accrues tiny changes through random mutation, leading to a relatively small number of distinguishable lineages. We live in a kind of meta-family tree, where each branch is an entire group of female lines. Even though some branches stop, others will randomly sprout new branches (“new” meaning “with mtDNA that’s detectably different at all”).

By reading the mtDNA of people from all over the world, you can track how folk have expanded across the planet.

In fact, by carefully looking at the differences in our mtDNA and theirs, we can show that Neanderthals are not a parent species of ours, but cousins, and the common “Eve” that we share with them lived around half a million years ago.  We can do the same thing with regular genetics and damn near any living thing to see how and how closely we’re related.

Humanity’s (present) Mitochondrial Eve is not our unique common ancestor nor is she our most recent common ancestor.  Mitochondrial Eve is merely the most recent ancestor of all living people by means of a direct female line alone.  If you allow for the inclusion of both men and women, then our most recent common ancestor jumps from around 120,000 – 150,000 years ago, for direct female lines only, to as recent as 3,000 years ago, for any ol’ lines.  There’s no way to even reasonably guess who or where any of these common ancestors were.  Probably lived near big population centers?  Maybe?

Looking at mitochondria is a solid, simple way of understanding evolution and inheritance, but it doesn’t paint an accurate picture of how genes move around populations.  An important fact to keep in mind is that a huge fraction of the people alive today will eventually be a common ancestor to all of humanity.  Even if your family doesn’t increase or decrease the population (every couple has two kids), your family is still going to grow exponentially (2 kids, 4 grandchildren, 8 great-grandchildren, …).  It only takes about 30 generations to have a billion descendants (less if you really work at it), so if you have kids, and they have kids, and so on, then in less time than you’d expect (not forever anyway) your genes will be spread thinly throughout all of humanity.  Many of your particular genetics won’t make it, but many of them will.  For example, if your haplogroup dies out then your mtDNA won’t be around, but the genes that dictate, say, the shape of your earlobe might end up all over the place.
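The doubling arithmetic, as a sketch. The helper name is hypothetical, and it counts only the newest generation while ignoring the overlap that happens when branches of the same family tree marry each other.

```python
def generations_to_reach(target, kids_each=2):
    """Generations until the newest generation of one person's descendants
    reaches `target`, if everyone has `kids_each` children.
    Returns (generations, size of that newest generation)."""
    generations, newest = 0, 1
    while newest < target:
        newest *= kids_each
        generations += 1
    return generations, newest

print(generations_to_reach(10**9))               # (30, 1073741824)
print(generations_to_reach(10**9, kids_each=3))  # (19, 1162261467)
```

Two kids apiece gets you past a billion in 30 generations; three kids apiece ("if you really work at it") does it in 19.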

Too true.

Arguably, that’s the reason for sex.  Maybe not the first reason most folk would cite, but they weren’t at the meeting a billion years ago.  By mixing our genes every generation we can prevent genetic lines from disappearing forever, which is good: more diversity means more combinations for evolution to try out in a pinch.  Almost as good, useful mutations and combinations of genes can be distributed and used by (a random subset of) the entire species after a mere few thousand years!  Sexual reproduction literally makes us much better at evolving.  Huzzah for doing it!

Posted in -- By the Physicist, Biology, Evolution, Probability

## Q: Are some colors of light impossible? Can any color of light be made?

Physicist: Just so we can talk about this using physics rather than poetry, for the sake of this article “color” really means “frequency”.  Light frequency is a bit more objective than color and includes things we can’t see (like ultraviolet).

When you put gas in a tube and pass electricity through it you get light.  Electro-dynamically speaking, this is basically just beating the hell out of the atoms and letting the atoms ring like bells (only emitting light instead of sound).  Individual atoms are like simply shaped bells; the “tones” they make (or absorb) are very specific.  The colors emitted by atoms, their “spectra”, are different for different elements.  This is tremendously useful because it allows us to look at the light coming from something and immediately know what that thing is made of.

When given the energy (like in a neon light) atoms will emit light with very specific frequencies. These are the lines for mercury, lithium, cadmium, strontium, calcium, and sodium that happen to fall in the visible spectrum. There are many more lines that we mere humans can’t see.

Some colors fall into the gaps between the spectral lines of all elements (technically, almost all of them do).  So you can be forgiven for thinking that there are some colors that just never show up in nature.  Fortunately, there are a lot of effects that shift all those lines, blur them, or even split them.

Top: When a light source moves toward or away from you its spectrum is shifted up or down. Middle: In a hot gas the atoms are moving randomly, so whether the lines are Doppler shifted up or down is also random. This broadens the spectral lines. Bottom: With a very strong magnet you can shift some of the electrons’ energy levels, and as a result transitions that would normally create the same color become separated.

So you can create any color by starting with a few distinct colors and then moving your light source either toward or away from an observer to Doppler shift one of your colors to the target color.  That’s a little like using a piano to get some notes, and then driving it around to get all the notes in-between.
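How fast does the piano have to go? Here's a sketch using the standard relativistic Doppler formula for motion straight along the line of sight, $f_{obs}=f_{src}\sqrt{\frac{1+\beta}{1-\beta}}$ with $\beta=\frac{v}{c}$ (positive when the source approaches). Solving for β gives $\beta=\frac{r^2-1}{r^2+1}$, where r is the ratio of observed to emitted frequency. Taking sodium’s yellow line near 589 nm to a green 532 nm is just an illustrative choice, and the function names are made up.

```python
from math import sqrt

C_KM_PER_S = 299_792.458  # speed of light in km/s

def beta_for_shift(source_nm, target_nm):
    """Line-of-sight speed, as a fraction of c, that makes light emitted at
    source_nm arrive at target_nm. Positive = toward the observer (blueshift),
    negative = away from the observer (redshift)."""
    r = source_nm / target_nm  # = observed frequency / emitted frequency
    return (r * r - 1) / (r * r + 1)

def observed_nm(source_nm, beta):
    """Wavelength seen by the observer for a source moving at beta toward them."""
    return source_nm * sqrt((1 - beta) / (1 + beta))

beta = beta_for_shift(589.0, 532.0)  # sodium's yellow line -> green
print(f"beta = {beta:.4f}, about {beta * C_KM_PER_S:,.0f} km/s toward the observer")
```

That comes out to about a tenth of the speed of light, roughly 30,000 km/s, which is why shifting a color by a noticeable amount this way takes a seriously fast light source.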

An efficient means to play C above high C.

You can also just use a non-atomic source of light, like something that’s glowing hot, and then select out the color you want with a monochromator (the rest is chucked out).  But, as with any process that involves throwing out almost everything, this is remarkably inefficient.

Monochromators generate light with a single color (one might say “mono-chromatic” light) by just throwing away all the light that isn’t the right color.

So, say you want to create a very specific color of light with as little “waste light” as possible.  Well, a good place to start is lasers.  For some slick quantum reasons, the photons in laser beams are all kinda “clones” of each other; inside of any kind of laser device, the presence of the right kind of photon encourages the creation of other identical photons.  Pretty soon your laser is bubbling over with coherent, identical photons and not a lot else.  These share, among other things, a common color.

Laser beams: not a lot of colors.

It turns out that only a very small fraction of atomic spectral lines are good candidates for lasers.  It is possible to create laser light at any frequency between microwaves and X-rays, but the technique is a long way from efficient.  You can use the Doppler effect to change the color of your laser, but in order to make any significant change you’ll need to get it going a significant fraction of the speed of light.

If you want to efficiently create any very specific color of light, you just need to strap a laser to a starship.  So… no need to be picky.

An efficient means to make green light.