Q: How does “1+2+3+4+5+… = -1/12” make any sense?

Physicist: When wandering across the vast plains of the internet, you may have come across this bizarre fact, that 1+2+3+4+\ldots=-\frac{1}{12}, and immediately wondered: Why isn’t it infinity?  How can it be a fraction?  Wait… it’s negative?

An unfortunate conclusion may be to say “math is a painful, incomprehensible mystery and I’m not smart enough to get it”.  But rest assured, if you think that 1+2+3+4+\ldots=\infty, then you’re right.  Don’t let anyone tell you different.  The -\frac{1}{12} thing falls out of an obscure, if-it-applies-to-you-then-you-already-know-about-it, branch of mathematics called number theory.

Number theorists get very excited about the “Riemann Zeta Function”, ζ(s), which is equal to \zeta(s)=\sum_{n=1}^\infty\left(\frac{1}{n}\right)^s=1+\frac{1}{2^s}+\frac{1}{3^s}+\frac{1}{4^s}+\ldots whenever this summation is equal to a number.  If you plug s=-1 into ζ(s), then it seems like you should get \zeta(-1)=1+2+3+4+\ldots, but in this case the summation isn’t a number (it’s infinity) so it’s not equal to ζ(s).  The entire -1/12 thing comes down to the fact that \zeta(-1)=-\frac{1}{12}, however (and this is the crux of the issue), when you plug s=-1 into ζ(s), you aren’t using that sum.  ζ(s) is a function in its own right, which happens to be equal to 1+\frac{1}{2^s}+\frac{1}{3^s}+\frac{1}{4^s}+\ldots for s>1, but continues to exist and make sense long after the summation stops working.

The bigger s is, the smaller each term, \frac{1}{n^s}, and ζ(s) will be.  As a general rule, if s>1, then ζ(s) is an actual number (not infinity).  When s=1, \zeta(1)=1+\frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\ldots=\infty.  It is absolutely reasonable to expect that for s<1, ζ(s) will continue to be infinite.  After all, \zeta(1)=\infty and each term in the sum only gets bigger for lower values of s.  But that’s not quite how the Riemann Zeta Function is defined.  ζ(s) is defined as \zeta(s)=\sum_{n=1}^\infty\left(\frac{1}{n}\right)^s when s>1 and as the “analytic continuation” of that sum otherwise.
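You can watch both behaviors with a few lines of Python (just an illustrative check, not part of the number theory): the partial sums for s=2 settle down onto a fixed number, while the partial sums for s=1 climb forever.

```python
import math

def zeta_partial(s, terms):
    """Partial sum of 1/1^s + 1/2^s + ... + 1/terms^s."""
    return sum(1 / n**s for n in range(1, terms + 1))

# s = 2: the sum settles down onto pi^2/6 = 1.64493...
print(zeta_partial(2, 100000))
print(math.pi**2 / 6)

# s = 1: the harmonic series never settles down; it grows (slowly) forever.
for terms in (10, 1000, 100000):
    print(terms, zeta_partial(1, terms))
```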

 You ever analytically continue a staircase, but there isn't actually another step, so you're like "whoa, my math!"

You know what this bridge would do if it kept going.  “Analytic continuation” is essentially the same idea; take a function that stops (perhaps unnecessarily) and continue it in exactly the way you’d expect.

The analytic continuation of a function is unique, so nailing down ζ(s) for s>1 is all you need to continue it out into the complex plane.

Complex numbers take the form “A+Bi” (where i=\sqrt{-1}).  The only thing about complex numbers you’ll need to know here is that complex numbers are pairs of real numbers (regular numbers), A and B.  Being a pair of numbers means that complex numbers form the “complex plane“, which is broader than the “real number line“.  A is called the “real part”, often written A=Re[A+Bi], and B is the “imaginary part”, B=Im[A+Bi].

That blow up at s=1 seems insurmountable on the real number line, but in the complex plane you can just walk around it to see what’s on the other side.

Left: ζ(s) for values of s>1 on the real number line. Right: The same red function surrounded by its “analytic continuation” into the rest of the complex plane.  Notice that, except for s=1, ζ(s) is completely smooth and well behaved.

\zeta(s)=1+\frac{1}{2^s}+\frac{1}{3^s}+\frac{1}{4^s}+\ldots defines a nice, smooth function for Re[s]>1.  When you extend ζ(s) into the complex plane this summation definition rapidly stops making sense, because the sum “diverges” when s≤1.  But there are two ways to diverge: a sum can either blow up to infinity or just never get around to being a number.  For example, 1-1+1-1+1-1+… doesn’t blow up to infinity, but it also never settles down to a single number (it bounces between one and zero).  ζ(s) blows up at s=1, but remains finite everywhere else.  If you were to walk out into the complex plane you’d find that right up until the line where Re[s]=1, ζ(s) is perfectly well-behaved.  Looking only at the values of ζ(s) you’d see no reason not to keep going, it’s just that the \zeta(s)=1+\frac{1}{2^s}+\frac{1}{3^s}+\frac{1}{4^s}+\ldots formulation suddenly stops working for Re[s]≤1.

But that’s no problem for a mathematician (see the answer gravy below).  You can follow ζ(s) from the large real numbers (where the summation definition makes sense), around the blow up at s=1, to s=-1 where you find a completely mundane value.  It’s -\frac{1}{12}.  No big deal.

So the 1+2+3+\ldots=-\frac{1}{12} thing is entirely about math enthusiasts being so (justifiably) excited about ζ(s) that they misapply it, and has nothing to do with what 1+2+3+… actually equals.

This shows up outside of number theory.  In order to model particle interactions correctly, it’s important to take into account every possible way for that interaction to unfold.  This means taking an infinite sum, which often goes well (produces a finite result), but sometimes doesn’t.  It turns out that physical laws really like functions that make sense in the complex plane.  So when “1+2+3+…” started showing up in calculations of certain particle interactions, physicists turned to the Riemann Zeta function and found that using -1/12 actually turned out to be the right thing to do (in physics “the right thing to do” means “the thing that generates results that precisely agree with experiment”).

A less technical shortcut (or at least a shortcut with the technicalities swept under the rug) for why the summation is -1/12 instead of something else can be found here.  For exactly why ζ(-1)=-1/12, see below.

Answer Gravy: Figuring out that ζ(-1)=-1/12 takes a bit of work.  You have to find an analytic continuation that covers s=-1, and then actually evaluate it.  Somewhat surprisingly, this is something you can do by hand.

Often, analytically continuing a function comes down to re-writing it in such a way that its “poles” (the locations where it blows up) don’t screw things up more than they absolutely have to.  For example, the function f(z)=\sum_{n=0}^\infty z^n only makes sense for -1<z<1, because it blows up at z=1 and doesn’t converge at z=-1.

f(z) can be explicitly written without a summation, unlike ζ(s), which gives us some insight into why it stops making sense for |z|≥1.  It just so happens that for |z|<1, f(z)=\frac{1}{1-z}.  This clearly blows up at z=1, but is otherwise perfectly well behaved; the issues at z=-1 and beyond just vanish.  f(z) and \frac{1}{1-z} are the same in every way inside of -1<z<1.  The only difference is that \frac{1}{1-z} doesn’t abruptly stop, but instead continues to make sense over a bigger domain.  \frac{1}{1-z} is the analytic continuation of f(z)=\sum_{n=0}^\infty z^n to the region outside of -1<z<1.
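This is easy to see numerically.  A quick sketch comparing the partial sums of f(z) to \frac{1}{1-z}:

```python
def geometric_partial(z, terms):
    """Partial sum of 1 + z + z^2 + ... (the series definition of f)."""
    return sum(z**n for n in range(terms))

def continued(z):
    """The analytic continuation, 1/(1-z)."""
    return 1 / (1 - z)

# Inside -1 < z < 1 the two agree:
print(geometric_partial(0.5, 60), continued(0.5))   # both essentially 2.0

# At z = 2 the series blows up, but the continuation is perfectly happy:
print(geometric_partial(2, 30))   # 2^30 - 1, and climbing
print(continued(2))               # -1.0
```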

Finding an analytic continuation for ζ(s) is a lot trickier, because there’s no cute way to write it without using an infinite summation (or product), but the basic idea is the same.  We’re going to do this in two steps: first turning ζ(s) into an alternating sum that converges for s>0 (except s=1), then turning that into a form that converges everywhere (except s=1).

For seemingly no reason, multiply ζ(s) by \left(1-2^{1-s}\right):

\begin{array}{ll}    &\left(1-2^{1-s}\right)\zeta(s)\\[2mm]  =&\left(1-2^{1-s}\right)\sum_{n=1}^\infty\frac{1}{n^s}\\[2mm]  =&\sum_{n=1}^\infty\frac{1}{n^s}-2^{1-s}\sum_{n=1}^\infty\frac{1}{n^s}\\[2mm]  =&\sum_{n=1}^\infty\frac{1}{n^s}-2\sum_{n=1}^\infty\frac{1}{(2n)^s}\\[2mm]  =&\left(\frac{1}{1^s}+\frac{1}{2^s}+\frac{1}{3^s}+\frac{1}{4^s}+\ldots\right)-2\left(\frac{1}{2^s}+\frac{1}{4^s}+\frac{1}{6^s}+\frac{1}{8^s}+\ldots\right)\\[2mm]  =&\frac{1}{1^s}-\frac{1}{2^s}+\frac{1}{3^s}-\frac{1}{4^s}+\ldots\\[2mm]  =&\sum_{n=1}^\infty\frac{(-1)^{n-1}}{n^s}\\[2mm]  =&\sum_{n=0}^\infty\frac{(-1)^{n}}{(n+1)^s}  \end{array}

So we’ve got a new version of the Zeta function, \zeta(s)=\frac{1}{1-2^{1-s}}\sum_{n=0}^\infty\frac{(-1)^{n}}{(n+1)^s}, that is an analytic continuation because this new sum converges in the same region the original form did (s>1), plus a little more (0<s≤1).  Notice that while the summation no longer blows up at s=1, \frac{1}{1-2^{1-s}} does.  Analytic continuation won’t get rid of poles, but it can express them differently.
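As a sanity check, this alternating form can be evaluated numerically: it reproduces the original series where both converge, and happily produces a value at s=1/2, where the original sum is hopeless.  (A million terms is overkill for s=2 and barely enough for s=1/2; alternating sums converge slowly for small s.)

```python
import math

def zeta_alternating(s, terms=10**6):
    """zeta(s) via the alternating sum, valid for s > 0 (except s = 1)."""
    eta = sum((-1)**n / (n + 1)**s for n in range(terms))
    return eta / (1 - 2**(1 - s))

print(zeta_alternating(2))     # pi^2/6 = 1.64493..., matching the original series
print(zeta_alternating(0.5))   # about -1.46, where the original series diverges
```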

There’s a clever old trick for shoehorning a summation into converging: Euler summation.  Euler (who realized everything) realized that \sum_{n=k}^\infty {n\choose k}\frac{y^{k+1}}{(1+y)^{n+1}}=1 for any y.  This is not obvious.  Being equal to one means that you can pop this into the middle of anything.  If that thing happens to be another sum, it can be used to make that sum “more convergent” for some values of y.  Take any sum, \sum_{k=0}^\infty A_k, insert Euler’s sum, and swap the order of summation:

\begin{array}{rcl}  \sum_{k=0}^\infty A_k&=&\sum_{k=0}^\infty \left(\sum_{n=k}^\infty {n\choose k}\frac{y^{k+1}}{(1+y)^{n+1}}\right)A_k\\[2mm]  &=&\sum_{k=0}^\infty \sum_{n=k}^\infty {n\choose k}\frac{y^{k+1}}{(1+y)^{n+1}}A_k\\[2mm]  &=&\sum_{n=0}^\infty \sum_{k=0}^n {n\choose k}\frac{y^{k+1}}{(1+y)^{n+1}}A_k\\[2mm]  &=&\sum_{n=0}^\infty \frac{1}{(1+y)^{n+1}}\sum_{k=0}^n {n\choose k}y^{k+1}A_k\\[2mm]  \end{array}

If the original sum converges, then this will converge to the same thing, but it may also converge even when the original sum doesn’t.  That’s exactly what you’re looking for when you want to create an analytic continuation; it agrees with the original function, but continues to work over a wider domain.

This looks like a total mess, but it’s stunningly useful.  If we use Euler summation with y=1, we create a summation that analytically continues the Zeta function to the entire complex plane: a “globally convergent form”.  Rather than a definition that only works sometimes (but is easy to understand), we get a definition that works everywhere (but looks like a horror show).

\begin{array}{rcl}  \zeta(s)&=&\frac{1}{1-2^{1-s}}\sum_{k=0}^\infty\frac{(-1)^{k}}{(k+1)^s}\\[2mm]  &=&\frac{1}{1-2^{1-s}}\sum_{n=0}^\infty\frac{1}{(1+y)^{n+1}}\sum_{k=0}^n{n\choose k}y^{k+1}\frac{(-1)^{k}}{(k+1)^s}\\[2mm]  &=&\frac{1}{1-2^{1-s}}\sum_{n=0}^\infty\frac{1}{(1+1)^{n+1}}\sum_{k=0}^n{n\choose k}1^{k+1}\frac{(-1)^{k}}{(k+1)^s}\\[2mm]  &=&\frac{1}{1-2^{1-s}}\sum_{n=0}^\infty\frac{1}{2^{n+1}}\sum_{k=0}^n{n\choose k}\frac{(-1)^{k}}{(k+1)^s}\\[2mm]  \end{array}

This is one of those great examples of the field of mathematics being too big for every discovery to be noticed.  This formulation of ζ(s) was discovered in the 1930s, forgotten for 60 years, and then found in an old book.

For most values of s, this globally convergent form isn’t particularly useful for us “calculate it by hand” folk, because it still has an infinite sum (and adding an infinite number of terms takes a while).  Very fortunately, there’s another cute trick we can use here.  When n>d, \sum _{k=0}^n{n \choose k}(-1)^{k}k^d=0.  This means that for negative integer values of s, that infinite sum suddenly becomes finite because all but a handful of terms are zero.
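This identity is easy to check by brute force for small n and d:

```python
from math import comb

def alternating_moment(n, d):
    """Sum over k of C(n,k) * (-1)^k * k^d."""
    return sum(comb(n, k) * (-1)**k * k**d for k in range(n + 1))

# Zero whenever n > d:
for n in range(1, 8):
    for d in range(n):
        assert alternating_moment(n, d) == 0

# ...but not once d catches up with n:
print(alternating_moment(2, 2))   # 2, not zero
```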

So finally, we plug s=-1 into ζ(s)

\begin{array}{rcl}  \zeta(-1)&=&\frac{1}{1-2^{2}}\sum_{n=0}^\infty\frac{1}{2^{n+1}}\sum_{k=0}^n{n\choose k}(-1)^{k}(k+1)\\[2mm]  &=&-\frac{1}{3}\sum_{n=0}^1\frac{1}{2^{n+1}}\sum_{k=0}^n{n\choose k}(-1)^{k}(k+1)\\[2mm]  &=&-\frac{1}{3}\cdot\frac{1}{2^{0+1}}{0\choose 0}(-1)^{0}(0+1)-\frac{1}{3}\cdot\frac{1}{2^{1+1}}{1\choose 0}(-1)^{0}(0+1)-\frac{1}{3}\cdot\frac{1}{2^{1+1}}{1\choose 1}(-1)^{1}(1+1)\\[2mm]  &=&-\frac{1}{3}\cdot\frac{1}{2}\cdot1\cdot1\cdot1-\frac{1}{3}\cdot\frac{1}{4}\cdot1\cdot1\cdot1-\frac{1}{3}\cdot\frac{1}{4}\cdot1\cdot(-1)\cdot2\\[2mm]  &=&-\frac{1}{6}-\frac{1}{12}+\frac{1}{6}\\[2mm]  &=&-\frac{1}{12}  \end{array}

Keen-eyed readers will note that this looks nothing like 1+2+3+… and indeed, it’s not.
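If working through the binomials by hand isn’t your idea of a good time, the same bookkeeping can be handed to a computer.  Doing the arithmetic with exact fractions (so no rounding sneaks in), the globally convergent form spits out ζ(-1)=-1/12, along with the other negative-integer values, ζ(-2)=0 and ζ(-3)=1/120:

```python
from fractions import Fraction
from math import comb

def zeta_negative_integer(m, terms=12):
    """zeta(-m) via the globally convergent form, in exact arithmetic.

    For s = -m the inner sums vanish once n > m + 1, so a dozen terms
    is already infinitely many of them."""
    total = Fraction(0)
    for n in range(terms):
        inner = sum(comb(n, k) * (-1)**k * (k + 1)**m for k in range(n + 1))
        total += Fraction(inner, 2**(n + 1))
    return total / (1 - Fraction(2)**(1 + m))

print(zeta_negative_integer(1))   # -1/12
print(zeta_negative_integer(2))   # 0
print(zeta_negative_integer(3))   # 1/120
```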

Posted in -- By the Physicist, Equations, Math, Number Theory | 9 Comments

There’s something new under the Sun!

Physicist: Good news!

On October 19th, for the first time in history, we detected an object inside of our solar system that originated somewhere else.  It’s a rock the size of a stadium that passed inside of Mercury’s orbit and is presently on its way back out.

This is basically the plot of Arthur C. Clarke’s seminal novel “Rendezvous With Rama” (sans aliens), which is about a thing from space passing really close to the Sun on a hyperbolic orbit before leaving the solar system.  The International Astronomical Union has incorrectly chosen the name “ʻOumuamua” for our first known interstellar visitor, rather than the obvious, pronounceable, and generally superior name “Rama“.  A rose is a rose is a rose, right up until the IAU gives it a weird name.

That dot in the middle is not from around here.  The background stars are streaks because this is a long exposure by a telescope tracking with Rama.  Everything we know comes from observations like this, so while we can determine speed and position, we have to estimate size and mass.

We can tell that Rama is new to the neighborhood because it is traveling far too fast to be gravitationally bound to the Sun.  When it’s in interstellar space, Rama travels at 26 km/s (with respect to the Sun), which is about how fast other stars in our stellar neighborhood are moving.  As it fell toward the inner solar system it gained speed until, at perihelion, it was a quarter as far from the Sun as the Earth is and was screaming along at nearly 90 km/s.  For comparison, once Voyager 1 has shaken off the Sun’s gravitational pull, it will be traveling at about 13 km/s through interstellar space.
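Those speeds hang together through nothing fancier than conservation of energy: the arrival speed and the local escape speed add in quadrature.  A back-of-the-envelope check, using rounded values for the arrival speed and the perihelion distance:

```python
import math

GM_SUN = 1.327e20   # m^3/s^2, the Sun's gravitational parameter
AU = 1.496e11       # m, the Earth's distance from the Sun

v_inf = 26e3                         # m/s, cruising speed in interstellar space
r = 0.25 * AU                        # perihelion: a quarter of Earth's distance
v_esc = math.sqrt(2 * GM_SUN / r)    # escape speed at that distance
v_perihelion = math.sqrt(v_inf**2 + v_esc**2)

print(v_perihelion / 1000)   # about 88 km/s: "nearly 90"
```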

The path of Rama (or ʻOumuamua if you insist), the first observed interstellar object to pass through the solar system.

We know that Rama is rocky because it’s not producing a tail.  Anything with lots of water ice boils when it gets close to the Sun.  Rama is distinctly tail-free, which means that it’s an old, cold rock that’s having its first and last Summer for millions or billions of years.

Presumably, Rama is just one of many interstellar objects to pass through our solar system.  We’ve been capable of detecting such objects for only a few decades and this one came remarkably close to us (several dozen Moon orbits, which is “close”).  If it hadn’t been so nearby, we probably never would have noticed it.  So if interstellar visitors are rare, then we are seriously lucky.

It has long been assumed that objects such as Rama should exist.  During the early formation of a star system the new planets and moons follow chaotic, often destructive orbits.  Not only do the young planetoids slam into each other, but near misses can fling asteroids, moons, and even planets out of a star system entirely.  As recently as 1982, we’ve witnessed Jupiter rudely chucking a comet into interstellar space.  The main reason things in a mature solar system (such as our own) follow such nice, clean orbits is that almost everything that didn’t stay in its own lane met a bad end a long time ago.  So we can expect the interstellar void to be thinly populated by an unimaginable number of rejected asteroids and comets.

Artist interpretation of a young solar system.

This is likely to be Rama’s backstory.  It formed somewhere that wasn’t here, and was ejected from its home star system through an impact event or gravitational slingshot.  The fact that it isn’t presently icy means that, for one reason or another, it spent enough time close to its parent star for all of its ice to boil off.  It could be a very old comet or a native of an inner solar system.  If we bothered to send a probe to catch up with it, we would be able to gain a lot of insight into the ancient chemical history of another star system.  It’s been drifting around for so long that it may be impossible to say which star it calls home, but just knowing how another solar system might be different is a big deal.  Fortunately, the movement of things in space is extremely predictable, so we’re not going to lose track of Rama anytime soon.  Unfortunately, we’d need a rocket about twice as fast as New Horizons (the fastest rocket to date) to catch up with it.

Totally swinging wide: the existence of Rama gives a little more credence to the idea of lithopanspermia, the idea that life might travel between stars embedded in rocks.  It has already been shown that some microbial life is perfectly happy in the cold, irradiated vacuum of space, and can even survive the massive accelerations involved in both ejection from, and falling onto, planetary surfaces.  What hasn’t been shown (until now) is that those interstellar rocks exist at all.  Spotting Rama in the sky is akin to living your whole life on a remote Pacific Island and seeing a (probably empty) boat for the first time.  Exciting stuff!

The picture of Rama itself is from here.

The picture of Rama’s trajectory (slightly altered) is from here.

The blender picture is from here.

Posted in -- By the Physicist, Astronomy | 5 Comments

Q: Is it more efficient to keep a swimming pool warm or let it get cold and heat it up again?

The original question: I’m having a debate with my wife that I think you can help us resolve.  We have a swimming pool in our back yard.  It has an electric heater, which we set to keep the pool water at 85 degrees Fahrenheit.  We’re going to be away for three days.  My wife says we should turn the heater off while we’re away to save energy.  I say that it takes less energy to maintain the pool at 85 while we’re away than to let it drop about ten degrees (summer evenings can get quite cool where we live in upstate New York) and then use the heater to restore 85.  Who’s right?  And what variables are relevant to the calculation?  The average temperature for the three days?  The volume of the pool?  The efficiency of the heater?

Physicist: The correct answer is always to leave the heater off for as long as possible, as often as possible.  The one and only gain from leaving a pool heater on is that it will be warm when you get in.  The same is true of all heaters (pool, car, space, whatever).

You can gain a lot of intuition for how heat flows from place to place by imagining it as a bunch of “heat beads”, randomly skittering through matter.  Each bead rolls independently from place to place, continuously changing direction, and the more beads there are in a given place, the hotter it is.

If all of these marbles started to randomly roll around, more would roll out of the circle than roll in.  Heat flow works the same way: hotter to cooler.

Although heat definitely does not take the form of discrete chunks of energy meandering about, this metaphor is remarkably good.  You can actually derive useful math from it, which is a damn sight better than most science metaphors (E.g., “space is like a rubber sheet” is not useful for actual astrophysicists).  In very much the same way that a concentrated collection of beads will spread themselves uniformly, hot things will lose heat energy to the surrounding cooler environment. If the temperature of the pool and the air above it are equal, then the amount of heat that flows out of the pool is equal to the amount that flows in.  But if the pool is hotter, then more “beads” will randomly roll out than randomly roll in.

A difference in temperature leads to a net flow of heat energy.  In fact, the relationship is as simple as it can (reasonably) get: the rate of heat transfer is proportional to the difference in temperature.  So, if the surrounding air is 60°, then an 80° pool will shed heat energy twice as fast as a 70° pool.  This is why coffee/tea/soup will be hot for a little while, but tepid for a long time; it cools faster when it’s hotter.

In a holey bucket, the higher the water level, the faster the water flows out.  Differences in temperature work the same way.  The greater the difference in temperature, the faster the heat flows out.

Ultimately, the amount of energy that a heater puts into the pool is equal to the heat lost from the pool.  Since you lose more heat energy from a hot pool than from a cool pool, the most efficient thing you can do is keep the temperature as low as possible for as long as possible.  The most energy efficient thing to do is always to turn off the heater.  The only reason to keep it on is so that you don’t have to wait for the water to warm up before you use it.
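If you want to settle the debate with arithmetic, you can model the pool with Newton’s law of cooling (heat loss proportional to the temperature difference).  The numbers below are invented for illustration; only the comparison between the two strategies matters, and it always comes out the same way.

```python
def heater_energy(keep_warm, hours=72, k=0.05, air=60.0, target=85.0):
    """Total heater output over a trip, in arbitrary heat units.

    k is the fraction of the temperature difference lost per hour;
    all the numbers here are invented for illustration."""
    temp, energy = target, 0.0
    for _ in range(hours):
        loss = k * (temp - air)   # heat lost this hour (hotter = faster)
        if keep_warm:
            energy += loss        # the heater immediately replaces it
        else:
            temp -= loss          # the pool just cools off
    if not keep_warm:
        energy = target - temp    # one big reheat when you get home
    return energy

print(heater_energy(keep_warm=True))    # heat replaced hour after hour
print(heater_energy(keep_warm=False))   # always less
```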

It seems as though a bunch of water is a good place to store heat energy, but the more time something spends being hot, the more energy it drains into everything around it.

Answer Gravy: This gravy is just to delve into why picturing heat flow in terms of the random motion of hypothetical particles is a good idea.  It’s well worth taking a stroll through statistical mechanics every now and again.

The diffusion of heat is governed, not surprisingly, by the “diffusion equation”:

\frac{d\rho}{dt}=k\frac{d^2\rho}{dx^2}
The same equation describes the random motion of particles.  If ρ(x,t) is the amount of heat at any given location, x, and time, t, then the diffusion equation tells you how that heat will change over time.  On the other hand, if ρ is either the density of “beads” or the probability of finding a bead at a particular place (if the movement of the beads is independent, then these two situations are interchangeable), then once again the diffusion equation describes how the bead density changes over time.  This is why the idea of “heat beads” is a useful intuition to use; the same math that describes the random motion of particles also describes how heat spreads through materials.
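In fact, you can simulate the heat beads directly.  Start a pile of random walkers at the same spot and the pile spreads out, with a typical distance from home that grows like the square root of time, exactly the behavior the diffusion equation describes:

```python
import random

random.seed(0)
steps, beads = 400, 5000
# Each bead takes `steps` independent left/right stumbles from the origin.
positions = [sum(random.choice((-1, 1)) for _ in range(steps))
             for _ in range(beads)]

# The beads started at 0; now they're spread out, with a typical distance
# from home of about sqrt(steps).
mean = sum(positions) / beads
spread = (sum((p - mean)**2 for p in positions) / beads) ** 0.5
print(spread)   # close to sqrt(400) = 20
```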

In one of his terribly clever 1905 papers, Einstein described how the random motion of individual atoms gives rise to diffusion.  The idea is to look at ρ(x,t) and then figure out ρ(x,t+τ), which is what it will be one small time step, τ, later. If you put a particle down somewhere, wait τ seconds and check where it is over and over, then you can figure out the probability of the particle drifting some distance, ε.  Just to give it a name, call that probability ϕ(ε).

ϕ(ε) is a recipe for figuring out how ρ(x,t) changes over time.  The probability that the particle will end up at, say, x=5 is equal to the probability that it was at x=3 times ϕ(2) plus the probability that it was at x=1 times ϕ(4) plus the probability that it was at x=8 times ϕ(-3) and so on, for every number.  Adding up the probabilities from every possible starting position is the sort of thing integrals were made for:

\rho(x,t+\tau)=\int\rho(x+\epsilon,t)\phi(-\epsilon)d\epsilon
So far this is standard probability fare.  Einstein’s cute trick was to say “Listen, I don’t know what ϕ(ε) is, but I know it’s symmetrical and it’s some kind of probability thing, which is pretty good, amirite?”.

ρ(x,t) varies smoothly (particles don’t teleport) which means ρ(x,t) can be expanded into a Taylor series in x or t.  That looks like:

\rho(x,t+\tau)=\rho(x,t)+\tau\frac{d}{dt}\rho(x,t)+\ldots

\rho(x+\epsilon,t)=\rho(x,t)+\epsilon\frac{d}{dx}\rho(x,t)+\frac{\epsilon^2}{2}\frac{d^2}{dx^2}\rho(x,t)+\ldots
where “…” are the higher order terms, which are all very small as long as τ and ε are small.  Plugging the expansion of ρ(x+ε,t) into \int\rho(x+\epsilon,t)\phi(-\epsilon)d\epsilon, we find that

\begin{array}{rcl}  \int\rho(x+\epsilon,t)\phi(-\epsilon)d\epsilon&=&\int\left[\rho(x,t)+\epsilon\frac{d}{dx}\rho(x,t)+\frac{\epsilon^2}{2}\frac{d^2}{dx^2}\rho(x,t)+\ldots\right]\phi(-\epsilon)d\epsilon\\[2mm]  &=&\rho(x,t)\int\phi(-\epsilon)d\epsilon+\frac{d}{dx}\rho(x,t)\int\epsilon\phi(-\epsilon)d\epsilon+\frac{d^2}{dx^2}\rho(x,t)\int\frac{\epsilon^2}{2}\phi(-\epsilon)d\epsilon+\ldots\\[2mm]  &=&\rho(x,t)+\frac{d^2}{dx^2}\rho(x,t)\int\frac{\epsilon^2}{2}\phi(-\epsilon)d\epsilon+\ldots  \end{array}
Einstein’s cute tricks both showed up in that last line.  \int\epsilon\phi(-\epsilon)d\epsilon=0 since ϕ(ε) is symmetrical (so the negative values of ε subtract the same amount that the positive values add) and \int\phi(-\epsilon)d\epsilon=1 since ϕ(ε) is a probability distribution (and the sum of probabilities over all possibilities is 1).

So, \rho(x,t+\tau)=\int\rho(x+\epsilon,t)\phi(-\epsilon)d\epsilon can be written:

\begin{array}{rcl}  \rho(x,t+\tau)&=&\rho(x,t)+\frac{d^2}{dx^2}\rho(x,t)\int\frac{\epsilon^2}{2}\phi(-\epsilon)d\epsilon+\ldots\\[2mm]  \frac{\rho(x,t+\tau)-\rho(x,t)}{\tau}&=&\left[\int\frac{\epsilon^2}{2\tau}\phi(-\epsilon)d\epsilon\right]\frac{d^2}{dx^2}\rho(x,t)+\ldots  \end{array}
To make the jump from discrete time steps to continuous time, we just let the time step, τ, shrink to zero (which also forces the distances involved, ε, to shrink since there’s less time to get anywhere).  As τ and ε get very small, the higher order terms dwindle away and we’re left with \frac{d}{dt}\rho(x,t)=\left[\int\frac{\epsilon^2}{2\tau}\phi(-\epsilon)d\epsilon\right]\frac{d^2}{dx^2}\rho(x,t).  We may not know what ϕ(ε) is, but it’s something, so \int\frac{\epsilon^2}{2\tau}\phi(-\epsilon)d\epsilon is something too.  Call that something “k” and you’ve got the diffusion equation, \frac{d\rho}{dt}=k\frac{d^2\rho}{dx^2}.

The second derivative, \frac{d^2\rho}{dx^2}, is a way to describe how a function is curving.  When it’s positive the function is curving up the way your hand curves when your palm is pointing up and when it’s negative the function is curving down.  By saying that the time derivative is proportional to the 2nd position derivative, you’re saying that “hills” will drop and “valleys” will rise.  This is exactly what your intuition should say about heat: if a place is hotter than the area around it, it will cool off.

The diffusion equation dictates that if the graph is concave down, the density drops and if the graph is concave up, the density increases.
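The “hills drop, valleys rise” rule can be turned directly into a simulation: replace the second derivative with the difference between each point and its neighbors, and step forward in time.  (This is a bare-bones finite-difference sketch with the edges held cold at zero, not a serious solver.)

```python
k, dt = 0.2, 1.0
rho = [0, 0, 0, 10, 0, 0, 0]   # one hot spot in an otherwise cold bar

for _ in range(50):
    # Discrete second derivative at each interior point:
    curv = [rho[i-1] - 2*rho[i] + rho[i+1] for i in range(1, len(rho)-1)]
    for i, c in enumerate(curv, start=1):
        rho[i] += k * dt * c   # concave-up points rise, concave-down points drop

print([round(r, 3) for r in rho])   # the spike has slumped into a low mound
```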

This is a very long-winded way of saying “think of heat as randomly moving particles, because the math is the same”.  But again, heat isn’t actually particles, it’s just that picturing it as such leads to useful insights.  While the equation and the intuition are straightforward, actually solving the diffusion equation in almost any real world scenario is a huge pain.

The corners cool off faster because there are more opportunities for “heat beads” to fall out of the material there.  Although this is exactly what the diffusion equation predicts, actually doing the math by hand is difficult.

It’s all well and good to talk about how heat beads randomly walk around inside of a material, but if that material isn’t uniform or has an edge, then suddenly the math gets remarkably nasty.  Fortunately, if all you’re worried about is whether or not you should leave your heater on, then you’re probably not sweating the calculus.

The shuttle tile photo is from here.

Posted in -- By the Physicist, Engineering, Physics | 14 Comments

Q: What determines the size of the bright spot when you focus sunlight with a lens?

Physicist: This question really appeals to my ten-year-old self.  If you’ve never tried to burn something with a lens, collect three pairs of sunglasses, a magnifying lens, and something you dislike.  On a bright day, put on all three pairs of sunglasses and give it a shot.

Burning stuff with a magnifying lens: education at its finest.

Typically, when you try to focus sunlight with a lens you’ll get something that looks like a comet.  You turn the lens and move it up and down and just at the moment when the bright spot gets the smallest, you suddenly see clouds.  This is because the smallest you can concentrate light from the Sun using a lens is an in-focus image of the Sun, and incidentally, an in-focus image of anything else in the sky as well.

Paper held in the “focal plane” of a pair of binoculars during an eclipse.  This is the smallest that the light can be focused.  If the paper were any closer or farther away the images of the Sun would become blurry and more spread out.

There are several types of lens, depending on how the two sides are curved, but in each case the defining characteristic is the “focal length”, f.  The big rule for lenses is: parallel light collects together at the same place, the “focal point”, which is f away from the lens.

Left: Parallel light beams collect together at a focal point, with a different point for each different set of parallel beams.  The collection of these points is the “focal plane”.  Right: Here we’ll consider the simplest case, light perpendicular to the lens, since the results apply in general and we can talk about the focal point (“the focus”) instead of the focal plane.

An “image” is the location where the light from an “object” is brought together by a lens.  The image is so named because if there happens to be a screen at that location, an image of the object will appear in-focus.  The distance to the image depends on the distance to the object.

For lenses, the rules are: #1) Parallel light beams will pass through the focus (dots) on the far side of the lens.  #2) Light passing through the center of the lens doesn’t change direction.  These rules allow us to figure out the size and location of the Image given the size and location of the Object (black bars).

It takes a little algebra (included in the “answer gravy” below), but rule #1 in the caption above leads to the Thin Lens Equation:

\frac{1}{f}=\frac{1}{d_o}+\frac{1}{d_i}
where do is the distance from the lens to the object, di is the distance from the lens to the image, and f is the focal length of the lens.

Rule #2 is easier to work with and, with a little less algebra, leads to the magnification equation:

M=\frac{h_i}{h_o}=\frac{d_i}{d_o}
where hi and ho are the sizes of the image and object.  M is the factor for how much bigger the image is than the object.  For a microscope, you want M to be big.

The distance to the Sun, do, is infinite in every useful sense.  That’s why it took so long to figure out how far away it is.  As far as (reasonably sized) lenses go, a few miles may as well be the other side of the universe.  The distance to the Sun is somewhere in that “may as well be on the other side of the universe” range.  In fact, everything in the sky falls into the same category, which is why the night sky looks like a dome, rather than the unending void that it is.

Plug do=\infty into the thin lens equation and you find that \frac{1}{f}=\frac{1}{d_i}+\frac{1}{\infty}=\frac{1}{d_i}, and so di=f.  In other words, the Sun, and everything else “at infinity”, will be in focus f away from the lens.  This coincides with the definition of the focal length, since light from a source at infinity is always parallel.  That should jibe with your experience: if you look at a light ten feet away and you step back and forth, the angle to the light changes, but if you look at the Sun (don’t) and step back and forth, the angle to the Sun stays the same.
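Numerically, “at infinity” kicks in remarkably fast.  Solving the thin lens equation for di, even a hypothetical lens with a 10 cm focal length treats anything more than a few meters away as essentially at infinity:

```python
def image_distance(f, d_o):
    """Solve the thin lens equation 1/f = 1/d_o + 1/d_i for d_i."""
    return 1 / (1/f - 1/d_o)

f = 0.1   # a hypothetical lens with a 10 cm focal length
for d_o in (1.0, 10.0, 1.5e11):   # 1 m away, 10 m away, roughly the Sun
    print(d_o, image_distance(f, d_o))
# For the Sun, d_i = f for all practical purposes.
```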

Now on the face of it, it would seem as though there’s no way to figure out the size of the Sun’s image, hi, since \frac{h_i}{h_o}=\frac{d_i}{d_o}=\frac{f}{\infty}=0.  As with the resolution to so many other unwanted infinities, all we need is a little algebra.

Without some fairly sophisticated techniques, it’s impossible to gauge how far away the Sun is.  But while do is out of reach (for most of us), \frac{h_o}{d_o} isn’t.  By measuring the Sun’s apparent size in the sky, it’s easy to figure out that it’s 110 times farther away than it is big.  The same thing, very coincidentally, is true of the Moon; it is 110 Moon-diameters away from Earth.  Mathematically speaking, \frac{h_o}{d_o}=\frac{1}{110}.

Retentive readers will recall that we haven’t brought the magnification equation, \frac{h_i}{h_o}=\frac{d_i}{d_o}, into play.  That was intentional; pretending there’s an issue heightens drama.  Solving for the image size, hi, and plugging in what we already know, \frac{h_o}{d_o}=\frac{1}{110} and di=f, we get:

h_i=h_o\cdot\frac{d_i}{d_o}=d_i\cdot\frac{h_o}{d_o}=\frac{f}{110}
So, how big is the bright spot when you focus sunlight?  At best, a little less than 1% of the distance to the lens.  To concentrate light from the Sun as much as possible, you want to position the target (the solar oven, the ant trail, the piece of wood, whatever) at the lens’ focal distance.  When you do, the bright spot will have a diameter of \frac{f}{110}.  This ultimately comes down to the fact that the Sun is really far away, and 110 times smaller than it is distant.

The bigger a lens is, the more Sunlight it can gather.  So the best lenses for burning stuff are as big as possible (more light), with the shortest possible focal length (tiny spot).
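To put numbers on it: the light gathered over the whole face of the lens gets squeezed into a spot of diameter \frac{f}{110}, so the concentration factor is the square of the ratio of those two diameters.  For a made-up but plausible magnifying lens:

```python
def concentration(lens_diameter, focal_length):
    """How many times brighter than raw sunlight the focused spot is."""
    spot_diameter = focal_length / 110
    return (lens_diameter / spot_diameter) ** 2

# A 5 cm wide lens with a 20 cm focal length (invented, but typical):
print(concentration(0.05, 0.20))   # 756.25: sunlight, 750-odd times over
```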

Answer Gravy: Every now and again, it’s worth seeing how statements of fact turn into math.  Geometric optics (which sounds much more impressive than it is) basically boils down to the two rules mentioned above:

#1) Parallel light beams will pass through the focus on the far side of the lens.

#2) Light passing through the center of the lens doesn’t change direction.

The thin lens equation almost immediately falls out of these rules and the geometry of similar triangles.  Rule #2 is the easiest to work with.  Looking at the line that passes through the center of the lens, we find two similar triangles, one on either side.  The triangle on the left has legs ho and do and the one on the right has legs hi and di.  Since these triangles are similar, the ratios of these lengths are the same: \frac{h_o}{d_o}=\frac{h_i}{d_i}.  Rearranging this to put the h’s and d’s on opposite sides produces the magnification equation, \frac{d_i}{d_o}=\frac{h_i}{h_o}.  Easy!

Using the same basic trick on the triangles formed by rule #1, we can find the thin lens equation.  Looking at just the right side (which side doesn’t matter), there are two triangles similar to each other: a smaller one with legs f and h_o and a larger one with legs d_i and h_o+h_i.

\begin{array}{rcl}  \frac{h_o}{f}&=&\frac{h_o+h_i}{d_i} \\[2mm]  \frac{1}{f}&=&\frac{1}{d_i}+\frac{h_i}{h_o}\cdot\frac{1}{d_i} \\[2mm]  \frac{1}{f}&=&\frac{1}{d_i}+\frac{1}{d_o}  \end{array}

And there it is.  We start with a pair of intuitive, but difficult to apply principles, and end up with a pair of unintuitive, but easily applicable equations.
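Those two easily applicable equations can be sketched in a few lines of code.  Here’s a minimal Python sketch (the helper name image_of is made up for illustration, with everything in the same units):

```python
def image_of(f, d_o, h_o):
    """Given focal length f, object distance d_o, and object size h_o,
    return the image distance d_i and image size h_i."""
    d_i = 1 / (1/f - 1/d_o)   # thin lens equation: 1/f = 1/d_o + 1/d_i
    h_i = h_o * d_i / d_o     # magnification equation: h_i/h_o = d_i/d_o
    return d_i, h_i

# For the Sun: f = 10 cm lens, with distance and diameter in meters.
d_i, h_i = image_of(0.1, 1.5e11, 1.39e9)
```

Because d_o is enormous, 1/d_o is negligible next to 1/f, so d_i comes out essentially equal to f and h_i comes out very close to f/110, just as above.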

Posted in -- By the Physicist, Equations, Geometry, Math, Physics | Leave a comment

Q: Why are numerical methods necessary? If we can’t get exact solutions, then how do we know when our approximate solutions are any good?

Physicist: When a problem can be solved exactly and in less time than forever, then it is “analytically solvable”.  For example, “Jack has 2 apples and Jill has 3 apples, how many apples do they have together?” is analytically solvable.  It’s 5.  Exactly 5.

Precisely solving problems is what we often imagine that mathematicians are doing, but unfortunately you can’t swing a cat in the sciences without hitting a problem that can’t be solved analytically.  In reality “doing math” generally involves finding an answer rather than the answer.  While you may not be able to find the exact answer, you can often find answers with “arbitrary precision”.  In other words, you can find an approximate answer and the more computer time / scratch paper you’re willing to spend, the closer that approximation will be to the correct answer.

A lot of math problems can’t be directly solved.  For example: most.

A trick that lets you get closer and closer to an exact answer is a “numerical method”.  Numerical methods do something rather bizarre: they find solutions close to the answer without ever knowing what that answer is.  As such, an important part of every numerical method is a proof that it works.  So there’s the answer: we need numerical methods because a lot of problems are not analytically solvable, and we know they work because each method comes packaged with a proof that it does.

It’s remarkable how fast you can stumble from solvable to unsolvable problems.  For example, there is an analytic solution for the motion of two objects interacting gravitationally, but no general solution for three or more objects.  This is why we can prove that two objects orbit in ellipses, but must use approximations and/or lots of computational power to predict the motion of three or more objects.  This inability is the infamous “three body problem“.  It shows up in atoms as well; we can analytically describe the shape of electron orbitals and energy levels in individual hydrogen atoms (1 proton + 1 electron = 2 bodies), but for every other element we need lots of computer time to get things right.

Even for purely mathematical problems the line between analytically solvable and only numerically solvable is razor thin.  Questions with analytic solutions include finding the roots of 2nd degree polynomials, such as 0=x^2+2x-3, which can be done using the quadratic equation:

x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}=\frac{-2\pm\sqrt{(2)^2-4(1)(-3)}}{2(1)}=\frac{-2\pm4}{2}=1\text{ or }-3

The quadratic equation is a “solution by radicals”, meaning you can find the solution using nothing but the coefficients in front of each term (in this case: 1, 2, -3), arithmetic, and roots.  There’s a solution by radicals for 3rd degree polynomials and another for 4th degree polynomials (they’re both nightmares, so don’t).  However, there can never be a solution by radicals for 5th or higher degree polynomials.  If you wanted to find the solutions of 2x^5-3x^4+\pi x^3+x^2-x+\sqrt{3}=0 (and who doesn’t?), there is literally no way to write down an expression for the exact answers in terms of radicals.
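No exact expression doesn’t mean no answer.  Here’s a sketch of about the simplest numerical method going, bisection, applied to that very quintic (the names f and bisect are mine, not standard):

```python
import math

def f(x):
    # The 5th degree polynomial from above: no solution by radicals exists.
    return 2*x**5 - 3*x**4 + math.pi*x**3 + x**2 - x + math.sqrt(3)

def bisect(f, lo, hi, tol=1e-12):
    """Repeatedly halve an interval containing a sign change until it's tiny."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:   # the sign change (and a root) is in the left half
            hi = mid
        else:                     # otherwise it's in the right half
            lo = mid
    return (lo + hi) / 2

# f(-1) < 0 < f(0), so there's a real root somewhere in between.
root = bisect(f, -1.0, 0.0)
```

This never produces the exact root, but every halving of the interval buys another binary digit of precision, and the proof that it works is just the intermediate value theorem.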

Numerical methods have really come into their own with the advent of computers, but the idea is a lot older.  The decimal expansion of \pi (3.14159…) never ends and never repeats, which is a big clue that you’ll never find its value exactly.  At the same time, it has some nice properties that make it feasible to calculate \pi to arbitrarily great precision.  In other words: numerical methods.  Back in the third century BC, Archimedes realized that you could approximate \pi by taking a circle with circumference \pi, then inscribing a polygon inside it and circumscribing another polygon around it.  Since the circle’s perimeter is always longer than the inscribed polygon’s and always shorter than the circumscribed polygon’s, you can find bounds for the value of \pi.

Hexagons inscribed (blue) and circumscribed (red) on a circle with circumference π.  The perimeters of such polygons, in this case p_6=3 and P_6=2\sqrt{3}\approx3.46, must always fall on either side of π≈3.14.

By increasing the number of sides, the polygons hug the circle tighter and produce a closer approximation, from both above and below, of \pi.  There are fancy mathematical ways to prove that this method approaches \pi, but it’s a lot easier to just look at the picture, consider for a minute, and nod sagely.

Archimedes’ trick wasn’t just noticing that \pi must be between the lengths of the two polygons.  That’s easy.  His true cleverness was in coming up with a mathematical method that takes the perimeters of a given pair of k-sided inscribed and circumscribed polygons with perimeters p_k and P_k and produces the perimeters for polygons with twice the numbers of sides, p_{2k} and P_{2k}.  Here’s the method:

P_{2k}={\frac {2p_{k}P_{k}}{p_{k}+P_{k}}}\quad \quad p_{2k}={\sqrt {p_{k}P_{2k}}}

By starting with hexagons, where p_6=3 and P_6=2\sqrt{3}, and doubling the number of sides 4 times Archie found that for inscribed and circumscribed enneacontahexagons p_{96}=\frac{223}{71}\approx3.14085 and P_{96}=\frac{22}{7}\approx3.14286.  In other words, he managed to nail down \pi to about two decimal places: 3.14085<\pi<3.14286.
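Archimedes’ doubling recurrence is simple enough to run for yourself.  A quick Python sketch of those four doublings:

```python
import math

p, P = 3.0, 2 * math.sqrt(3)   # perimeters of the inscribed/circumscribed hexagons
sides = 6
for _ in range(4):             # double the number of sides: 6 -> 12 -> 24 -> 48 -> 96
    P = 2 * p * P / (p + P)    # new circumscribed perimeter (uses the old p)
    p = math.sqrt(p * P)       # new inscribed perimeter (uses the new P)
    sides *= 2
# Now sides == 96 and p < pi < P, pinning pi down to about two decimal places.
```

Note the order matters: P_{2k} is computed from the old pair, and p_{2k} then uses the freshly computed P_{2k}, exactly as in the recurrence above.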

Some puzzlement has been evinced by Mr. Medes’ decision to stop where he did, with just two decimal places of \pi.  But not among mathematicians.  The mathematician’s ancient refrain has always been: “Now that I have demonstrated my amazing technique to the world, someone else can do it.”

To be fair to Archie, this method “converges slowly”.  It turns out that, in general, p_n=n\sin\left(\frac{\pi}{n}\right)\approx\pi-\frac{\pi^3}{6}\frac{1}{n^2} and P_n=n\tan\left(\frac{\pi}{n}\right)\approx\pi+\frac{\pi^3}{3}\frac{1}{n^2}.  Every time you double n the errors, \frac{\pi^3}{3}\frac{1}{n^2} and \frac{\pi^3}{6}\frac{1}{n^2}, get four times as small (because 2^2=4), which translates to very roughly one new decimal place every two iterations.  \pi never ends, but still: you want to feel like you’re making at least a little progress.

Some numerical methods involve a degree of randomness and yet still manage to produce useful results.  Speaking of \pi, here’s how you can calculate it “accidentally”.  Generate n pairs of random numbers, (x,y), between 0 and 1.  Count up how many times x^2+y^2\le1 and call that number k.  If you do this many times, you’ll find that \frac{4k}{n}\approx\pi.

If you randomly pick a point in the square, the probability that it will be in the grey region is π/4.

As you generate more and more pairs and tally up how many times x^2+y^2\le1, the law of large numbers says that \frac{k}{n}\to\frac{\pi}{4}, since that’s the probability of randomly falling in the grey region in the picture above.  This numerical method is even slower than Archimedes’ not-particularly-speedy trick.  According to the central limit theorem, after n trials \frac{k}{n} is likely to be within about \frac{0.41}{\sqrt{n}} of \frac{\pi}{4}, which puts \frac{4k}{n} within about \frac{1.6}{\sqrt{n}} of \pi.  That makes this a very slowly converging method; it takes on the order of ten million trials before you can reliably nail down “3.141”.  This is not worth trying.
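If you’d rather let a computer throw the darts, here’s a minimal sketch of the “accidental” method (with a fixed seed so the result is repeatable; the name monte_carlo_pi is mine):

```python
import random

def monte_carlo_pi(n, seed=0):
    """Estimate pi by throwing n random points at the unit square and
    counting how many land inside the quarter circle x^2 + y^2 <= 1."""
    rng = random.Random(seed)      # fixed seed: same answer every run
    k = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x*x + y*y <= 1:
            k += 1
    return 4 * k / n               # k/n -> pi/4, so 4k/n -> pi

approx = monte_carlo_pi(100_000)
```

Even with a hundred thousand darts, expect only two or three correct digits; each extra digit costs about a hundred times more darts.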

Long story short, most applicable math problems cannot be solved directly.  Instead we’re forced to use clever approximations and numerical methods to get really close to the right answer (assuming that “really close” is good enough).  There’s no grand philosophy that guarantees all of these methods work but, in general, if we’re not sure that a given method works, we don’t use it.

Answer Gravy: There are a huge number of numerical methods and entire sub-sciences dedicated to deciding which to use and when.  Just for a more detailed taste of a common (fast) numerical method and the proof that it works, here’s an example of Newton’s Method, named for little-known mathematician Wilhelm Von Method.

Newton’s method finds (approximates) the zeros of a function, f(x).  That is, it finds a value, \lambda, such that f(\lambda)=0.  The whole idea is that, assuming the function is smooth, when you follow the slope at a given point down you’ll find a new point closer to a zero/solution.  All polynomials are “smooth”, so this is a good way to get around that whole “you can’t find the roots of 5th or higher degree polynomials” thing.

The “big idea” behind Newton’s Method: pick a point (xn), follow the slope, find yourself closer (xn+1), repeat.

The big advantage of Newton’s method is that, unlike the two \pi examples above, it converges preternaturally fast.

The derivative is the slope, so f^\prime(x_n) is the slope at the point (x_n,f(x_n)).  Considering the picture above, that same slope is given by the rise, f(x_n), over the run, x_n-x_{n+1}.  In other words f^\prime(x_n)=\frac{f(x_n)}{x_n-x_{n+1}} which can be solved for x_{n+1}:

x_{n+1}=x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}

So given a point near a solution, x_n, you can find another point that’s closer to the true solution, x_{n+1}.  Notice that if f(x_n)\approx0, then x_{n+1}\approx x_n.  That’s good: when you’ve got the right answer, you don’t want your approximation to change.

To start, you guess (literally… guess) a solution, call it x_0.  With this tiny equation in hand you can quickly find x_1.  With x_1 you can find x_2 and so on.  Although it can take a few iterations for it to settle down, each new x_n is closer than the last to the actual solution.  To end, you decide you’re done.
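In code, the whole method is that one equation in a loop.  A minimal sketch (the names newton and fprime are mine, not standard):

```python
def newton(f, fprime, x0, steps=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n), starting from the guess x0."""
    x = x0
    for _ in range(steps):
        step = f(x) / fprime(x)
        x = x - step
        if abs(step) < 1e-15:   # "to end, you decide you're done"
            break
    return x

# Example: the positive root of x^2 - 2 = 0, better known as sqrt(2).
root = newton(lambda x: x*x - 2, lambda x: 2*x, 1.0)
```

Notice the stopping rule is exactly the observation above: once f(x_n)\approx0, the step size collapses and x_{n+1}\approx x_n, so you might as well quit.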

Say you need to solve x=\cos(x) for x.  Never mind why.  There is no analytical solution (this comes up a lot when you mix polynomials, like x, or trig functions or logs or just about anything).  The correct answer starts with \lambda=0.739085133215160641655312\ldots

y=x and y=cos(x). They clearly intersect, but there’s no way to analytically solve for exactly where.

First you write it in such a way that you can apply Newton’s method: f(x)=\cos(x)-x=0.  The derivative is f^\prime(x)=-\sin(x)-1 and therefore:

x_{n+1}=x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}=x_n-\frac{\cos\left(x_n\right)-x_n}{-\sin\left(x_n\right)-1}=x_n+\frac{\cos\left(x_n\right)-x_n}{\sin\left(x_n\right)+1}

First make a guess.  I do hereby guess x_0=3.  Plug that in and you find that:

x_1=3+\frac{\cos(3)-3}{\sin(3)+1}=-0.496558\ldots

Plug back in what you get out several times and:

\begin{array}{ll}    x_0&=3\\    x_1&=-0.496558178297331398840279\ldots\\    x_2&=2.131003844480994964494021\ldots\\    x_3&=0.689662720778373223587585\ldots\\    x_4&=0.739652997531333829185767\ldots\\    x_5&=0.739085204375836184250693\ldots\\    x_6&=0.739085133215161759778809\ldots\\    x_7&=0.739085133215160641655312\ldots\\    x_8&=0.739085133215160641655312\ldots\\    \end{array}

In this particular case, x_0 through x_3 jump around a bit.  Sometimes Newton’s method does this forever (try x_0=5), in which case: try something else or make a new guess.  It’s not until x_4 that Newton’s method starts to really zero in on the solution.  Notice that (starting at x_4) every iteration establishes about twice as many decimal digits as the previous step:

\begin{array}{ll}    \vdots\\    x_4&=0.739\ldots\\    x_5&=0.739085\ldots\\    x_6&=0.73908513321516\ldots\\    x_7&=0.739085133215160641655312\ldots\\    \vdots    \end{array}
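That table takes only a few lines to reproduce.  A quick sketch:

```python
import math

x = 3.0                 # the initial guess, x_0 = 3
for _ in range(8):
    # Newton's method for f(x) = cos(x) - x, with f'(x) = -sin(x) - 1
    x = x - (math.cos(x) - x) / (-math.sin(x) - 1)
print(x)                # ~0.7390851332151607, matching x_8 above
```

By x_7 the iterates have already hit the limit of double-precision arithmetic, which is why x_8 is identical to x_7 in the table.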

We know that Newton’s method works because we can prove that it converges to the solution.  In fact, we can show that it converges quadratically (which is stupid fast).  Something “converges quadratically” when the distance to the true solution is squared with every iteration.  For example, if you’re off by 0.01, then in the next step you’ll be off by around (0.01)^2=0.0001.  In other words, the number of digits you can be confident in doubles every time.

Here’s why it works:

A smooth function (and practically any function you might want to write down is smooth practically everywhere) can be described by a Taylor series.  In this case we’ll find the Taylor series about the point x_n and use the facts that 0=f(\lambda) and x_{n+1}=x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}.

\begin{array}{rcl}  0&=&f(\lambda) \\[2mm]  0&=&f(x_n+(\lambda-x_n)) \\[2mm]  0&=&f(x_n)+f^\prime(x_n)\left(\lambda-x_n\right)+\frac{1}{2}f^{\prime\prime}(x_n)\left(\lambda-x_n\right)^2+\ldots \\[2mm]  0&=&\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}+\left(\lambda-x_n\right)+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm]  0&=&\lambda-\left(x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}\right)+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm]  0&=&\lambda-x_{n+1}+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm]  \lambda-x_{n+1}&=&-\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots  \end{array}

The “…” becomes small much faster than \left(\lambda-x_n\right)^2 as x_n and \lambda get closer.  At the same time, \frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)} becomes effectively equal to \frac{f^{\prime\prime}(\lambda)}{2f^\prime\left(\lambda\right)}.  Therefore \lambda-x_{n+1}\propto\left(\lambda-x_n\right)^2 and that’s what quadratic convergence is.  Note that this only works when you’re zeroing in; far away from the correct answer Newton’s method can really bounce around.

Posted in -- By the Physicist, Computer Science, Equations, Math | 10 Comments

Burning Man 2017

Long ago, Ask a Mathematician / Ask a Physicist was two guys sitting around in the desert talking to strangers about whatever came to mind.  It’s been a while, but we’re heading back to Burning Man for more of the same!

If you happen to find yourself in the desert, have a question, and/or want to waste time with a Mathematician and a Physicist, you can find us here

There!  12-4 on Thursday.

from 12:00 to 4:00 on Thursday the 31st.  According to the official schedule we’re a gathering or party in a red tent between center camp and the Man.  That same schedule goes on to say:

“Ask a Mathematician / Ask a Physicist is two people sitting in the desert talking to other people in the desert. Ever wonder about the universe?  Entanglement?  The nature of logic?  Got a nagging question that’s been bothering you for years or just want to hang out and listen to other people’s questions?  We can help!”

Posted in -- By the Physicist | 8 Comments