There’s something new under the Sun!

Physicist: Good news!

On October 19th, for the first time in history, we detected an object inside of our solar system that originated somewhere else.  It’s a rock the size of a stadium that passed inside of Mercury’s orbit and is presently on its way back out.

This is basically the plot of Arthur C. Clarke’s seminal novel “Rendezvous With Rama” (sans aliens), which is about a thing from space passing really close to the Sun on a hyperbolic orbit before leaving the solar system.  The International Astronomical Union has incorrectly chosen the name “ʻOumuamua” for our first known interstellar visitor, rather than the obvious, pronounceable, and generally superior name “Rama”.  A rose is a rose is a rose, right up until the IAU gives it a weird name.

That dot in the middle is not from around here.  The background stars are streaks because this is a long exposure by a telescope tracking with Rama.  Everything we know comes from observations like this, so while we can determine speed and position, we have to estimate size and mass.

We can tell that Rama is new to the neighborhood because it is traveling far too fast to be gravitationally bound to the Sun.  When it’s in interstellar space, Rama travels at 26 km/s (with respect to the Sun), which is about how fast other stars in our stellar neighborhood are moving.  As it fell toward the inner solar system it gained speed until, at perihelion, it was a quarter as far from the Sun as the Earth is and was screaming along at nearly 90 km/s.  For comparison, once Voyager 1 has shaken the Sun’s gravitational pull, it will be traveling at about 13 km/s through interstellar space.
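The "nearly 90 km/s" figure can be sanity-checked with conservation of energy.  What follows is only a rough sketch: it assumes the Sun is the only thing pulling on Rama, takes "a quarter of the Earth's distance" literally as 0.25 AU, and uses standard values for the Sun's gravitational parameter and the astronomical unit (neither number appears in the post).

```python
import math

# Standard physical constants (assumed here, not taken from the post).
GM_SUN = 1.32712440018e20   # Sun's gravitational parameter, m^3/s^2
AU = 1.495978707e11         # astronomical unit, m

def speed_at_distance(v_infinity, r):
    """Speed (m/s) of an unbound object at distance r (m) from the Sun,
    given its speed very far away.  Energy conservation gives
    v^2 = v_infinity^2 + 2*GM/r."""
    return math.sqrt(v_infinity**2 + 2 * GM_SUN / r)

v_inf = 26e3                # interstellar speed quoted in the post, m/s
r_perihelion = 0.25 * AU    # assumed perihelion distance, m

v_peri = speed_at_distance(v_inf, r_perihelion)
print(f"perihelion speed: {v_peri / 1e3:.1f} km/s")
```

Under these assumptions the result lands a bit under 90 km/s, in line with the number above.  Most of that speed is borrowed from the Sun's gravity and gets paid back on the way out.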

The path of Rama (or ʻOumuamua if you insist), the first observed interstellar object to pass through the solar system.

We know that Rama is rocky because it’s not producing a tail.  Anything with lots of water ice boils when it gets close to the Sun.  Rama is distinctly tail-free, which means that it’s an old, cold rock that’s having its first and last Summer for millions or billions of years.

Presumably, Rama is just one of many interstellar objects to pass through our solar system.  We’ve been capable of detecting such objects for only a few decades and this one came remarkably close to us (several dozen Moon orbits, which is “close”).  If it hadn’t been so nearby, we probably never would have noticed it.  So if interstellar visitors are rare, then we are seriously lucky.

It has long been assumed that objects such as Rama should exist.  During the early formation of a star system the new planets and moons follow chaotic, often destructive orbits.  Not only do the young planetoids slam into each other, but near misses can fling asteroids, moons, and even planets out of a star system entirely.  As recently as 1982, we’ve witnessed Jupiter rudely chucking a comet into interstellar space.  The main reason things in a mature solar system (such as our own) follow such nice, clean orbits is that almost everything that didn’t stay in its own lane met a bad end a long time ago.  So we can expect the interstellar void to be thinly populated by an unimaginable number of rejected asteroids and comets.

Artist’s interpretation of a young solar system.

This is likely to be Rama’s backstory.  It formed somewhere that wasn’t here, and was ejected from its home star system through an impact event or gravitational slingshot.  The fact that it isn’t presently icy means that, for one reason or another, it spent enough time close to its parent star for all of its ice to boil off.  It could be a very old comet or a native of an inner solar system.  If we bothered to send a probe to catch up with it, we would be able to gain a lot of insight into the ancient chemical history of another star system.  It’s been drifting around for so long that it may be impossible to say which star it calls home, but just knowing how another solar system might be different is a big deal.  Fortunately, the movement of things in space is extremely predictable, so we’re not going to lose track of Rama anytime soon.  Unfortunately, we’d need a rocket about twice as fast as New Horizons (the fastest rocket to date) to catch up with it.

Totally swinging wide: the existence of Rama gives a little more credence to the idea of lithopanspermia, the idea that life might travel between stars embedded in rocks.  It has already been shown that some microbial life is perfectly happy in the cold, irradiated vacuum of space, and can even survive the massive accelerations involved in both ejection from, and falling onto, planetary surfaces.  What hasn’t been shown (until now) is that those interstellar rocks exist at all.  Spotting Rama in the sky is akin to living your whole life on a remote Pacific Island and seeing a (probably empty) boat for the first time.  Exciting stuff!

The picture of Rama itself is from here.

The picture of Rama’s trajectory (slightly altered) is from here.

The blender picture is from here.

Posted in -- By the Physicist, Astronomy

Q: Is it more efficient to keep a swimming pool warm or let it get cold and heat it up again?

The original question: I’m having a debate with my wife that I think you can help us resolve.  We have a swimming pool in our back yard.  It has an electric heater, which we set to keep the pool water at 85 degrees Fahrenheit.  We’re going to be away for three days.  My wife says we should turn the heater off while we’re away to save energy.  I say that it takes less energy to maintain the pool at 85 while we’re away than to let it drop about ten degrees (summer evenings can get quite cool where we live in upstate New York) and then use the heater to restore 85.  Who’s right?  And what variables are relevant to the calculation?  The average temperature for the three days?  The volume of the pool?  The efficiency of the heater?

Physicist: The correct answer is always to leave the heater off for as long as possible, as often as possible.  The one and only gain from leaving a pool heater on is that it will be warm when you get in.  The same is true of all heaters (pool, car, space, whatever).

You can gain a lot of intuition for how heat flows from place to place by imagining it as a bunch of “heat beads”, randomly skittering through matter.  Each bead rolls independently from place to place, continuously changing direction, and the more beads there are in a given place, the hotter it is.

If all of these marbles started to randomly roll around, more would roll out of the circle than roll in.  Heat flow works the same way: hotter to cooler.

Although heat definitely does not take the form of discrete chunks of energy meandering about, this metaphor is remarkably good.  You can actually derive useful math from it, which is a damn sight better than most science metaphors (E.g., “space is like a rubber sheet” is not useful for actual astrophysicists).  In very much the same way that a concentrated collection of beads will spread themselves uniformly, hot things will lose heat energy to the surrounding cooler environment. If the temperature of the pool and the air above it are equal, then the amount of heat that flows out of the pool is equal to the amount that flows in.  But if the pool is hotter, then more “beads” will randomly roll out than randomly roll in.

A difference in temperature leads to a net flow of heat energy.  In fact, the relationship is as simple as it can (reasonably) get: the rate of heat transfer is proportional to the difference in temperature.  So, if the surrounding air is 60°, then an 80° pool will shed heat energy twice as fast as a 70° pool.  This is why coffee/tea/soup will be hot for a little while, but tepid for a long time; it cools faster when it’s hotter.

In a holey bucket, the higher the water level, the faster the water flows out.  Differences in temperature work the same way.  The greater the difference in temperature, the faster the heat flows out.

Ultimately, the amount of energy that a heater puts into the pool is equal to the heat lost from the pool.  Since you lose more heat energy from a hot pool than from a cool pool, the most efficient thing you can do is keep the temperature as low as possible for as long as possible.  The most energy efficient thing to do is always to turn off the heater.  The only reason to keep it on is so that you don’t have to wait for the water to warm up before you use it.
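To put rough numbers on the pool debate, here is a minimal sketch of the "rate of heat loss is proportional to the temperature difference" rule.  Every number in it is an illustrative assumption: the air is taken to sit at a constant 60°, the loss rate K is picked so that the pool drops about ten degrees in three days, and reheating is treated as instantaneous.  Heat is counted in "degrees of pool temperature" (the energy needed to warm the whole pool by one degree), so the pool's volume never needs to be known.

```python
import math

# All values below are illustrative assumptions, not measurements.
T_SET = 85.0    # desired pool temperature, degrees F
T_AIR = 60.0    # assumed constant air temperature, degrees F
HOURS = 72.0    # three days away
K = 0.0071      # assumed loss rate per hour (gives ~10 degree drop in 3 days)

# Heater on: the pool stays at T_SET, so it leaks K*(T_SET - T_AIR)
# every hour and the heater has to replace all of it.
heat_if_on = K * (T_SET - T_AIR) * HOURS

# Heater off: the temperature difference decays exponentially, and on
# return the heater only has to replace what actually leaked out.
T_final = T_AIR + (T_SET - T_AIR) * math.exp(-K * HOURS)
heat_if_off = T_SET - T_final

print(f"heater left on : {heat_if_on:.1f} degrees' worth of heat")
print(f"heater left off: {heat_if_off:.1f} degrees' worth of heat")
```

With these made-up numbers, leaving the heater off costs about 10 degrees' worth of heat and leaving it on costs closer to 13.  The gap exists for any positive K, because 1 - e^(-Kt) is always less than Kt.  A real heater takes time to warm the pool back up, but during that time the pool is still cooler than 85°, so it is still leaking more slowly than it would have been.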

It seems as though a bunch of water is a good place to store heat energy, but the more time something spends being hot, the more energy it drains into everything around it.

Answer Gravy: This gravy is just to delve into why picturing heat flow in terms of the random motion of hypothetical particles is a good idea.  It’s well worth taking a stroll through statistical mechanics every now and again.

The diffusion of heat is governed, not surprisingly, by the “diffusion equation”.

\frac{d\rho}{dt}=k\frac{d^2\rho}{dx^2}

The same equation describes the random motion of particles.  If ρ(x,t) is the amount of heat at any given location, x, and time, t, then the diffusion equation tells you how that heat will change over time.  On the other hand, if ρ is either the density of “beads” or the probability of finding a bead at a particular place (if the movement of the beads is independent, then these two situations are interchangeable), then once again the diffusion equation describes how the bead density changes over time.  This is why the idea of “heat beads” is a useful intuition to use; the same math that describes the random motion of particles also describes how heat spreads through materials.

In one of his terribly clever 1905 papers, Einstein described how the random motion of individual atoms gives rise to diffusion.  The idea is to look at ρ(x,t) and then figure out ρ(x,t+τ), which is what it will be one small time step, τ, later. If you put a particle down somewhere, wait τ seconds and check where it is over and over, then you can figure out the probability of the particle drifting some distance, ε.  Just to give it a name, call that probability ϕ(ε).

ϕ(ε) is a recipe for figuring out how ρ(x,t) changes over time.  The probability that the particle will end up at, say, x=5 is equal to the probability that it was at x=3 times ϕ(2) plus the probability that it was at x=1 times ϕ(4) plus the probability that it was at x=8 times ϕ(-3) and so on, for every number.  Adding up the probabilities from every possible starting position is the sort of thing integrals were made for:

\rho(x,t+\tau)=\int\rho(x+\epsilon,t)\phi(-\epsilon)d\epsilon

So far this is standard probability fare.  Einstein’s cute trick was to say “Listen, I don’t know what ϕ(ε) is, but I know it’s symmetrical and it’s some kind of probability thing, which is pretty good, amirite?”.

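ρ(x,t) varies smoothly (particles don’t teleport) which means ρ(x,t) can be expanded into a Taylor series in x or t.  That looks like:

\rho(x,t+\tau)=\rho(x,t)+\tau\frac{d}{dt}\rho(x,t)+\ldots

\rho(x+\epsilon,t)=\rho(x,t)+\epsilon\frac{d}{dx}\rho(x,t)+\frac{\epsilon^2}{2}\frac{d^2}{dx^2}\rho(x,t)+\ldots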

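where “…” are the higher order terms, that are all very small as long as τ and ε are small.  Plugging the expansion of ρ(x+ε,t) into \int\rho(x+\epsilon,t)\phi(-\epsilon)d\epsilon we find that

\begin{array}{rcl}  \int\rho(x+\epsilon,t)\phi(-\epsilon)d\epsilon&=&\rho(x,t)\int\phi(-\epsilon)d\epsilon+\frac{d}{dx}\rho(x,t)\int\epsilon\phi(-\epsilon)d\epsilon+\frac{d^2}{dx^2}\rho(x,t)\int\frac{\epsilon^2}{2}\phi(-\epsilon)d\epsilon+\ldots \\[2mm]  &=&\rho(x,t)+\frac{d^2}{dx^2}\rho(x,t)\int\frac{\epsilon^2}{2}\phi(-\epsilon)d\epsilon+\ldots  \end{array}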

Einstein’s cute tricks both showed up in that last line.  \int\epsilon\phi(-\epsilon)d\epsilon=0 since ϕ(ε) is symmetrical (so the negative values of ε subtract the same amount that the positive values add) and \int\phi(-\epsilon)d\epsilon=1 since ϕ(ε) is a probability distribution (and the sum of probabilities over all possibilities is 1).

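So, \rho(x,t+\tau)=\int\rho(x+\epsilon,t)\phi(-\epsilon)d\epsilon can be written:

\begin{array}{rcl}  \rho(x,t)+\tau\frac{d}{dt}\rho(x,t)+\ldots&=&\rho(x,t)+\frac{d^2}{dx^2}\rho(x,t)\int\frac{\epsilon^2}{2}\phi(-\epsilon)d\epsilon+\ldots \\[2mm]  \tau\frac{d}{dt}\rho(x,t)+\ldots&=&\frac{d^2}{dx^2}\rho(x,t)\int\frac{\epsilon^2}{2}\phi(-\epsilon)d\epsilon+\ldots \\[2mm]  \frac{d}{dt}\rho(x,t)+\ldots&=&\left[\int\frac{\epsilon^2}{2\tau}\phi(-\epsilon)d\epsilon\right]\frac{d^2}{dx^2}\rho(x,t)+\ldots  \end{array}

To make the jump from discrete time steps to continuous time, we just let the time step, τ, shrink to zero (which also forces the distances involved, ε, to shrink since there’s less time to get anywhere).  As τ and ε get very small, the higher order terms dwindle away and we’re left with \frac{d}{dt}\rho(x,t)=\left[\int\frac{\epsilon^2}{2\tau}\phi(-\epsilon)d\epsilon\right]\frac{d^2}{dx^2}\rho(x,t).  We may not know what ϕ(ε) is, but it’s something, so \int\frac{\epsilon^2}{2\tau}\phi(-\epsilon)d\epsilon is something too.  Call that something “k” and you’ve got the diffusion equation, \frac{d\rho}{dt}=k\frac{d^2\rho}{dx^2}.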
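One way to see the "heat beads" and the diffusion equation agree is to let a pile of beads wander and measure how far they spread.  This is a minimal sketch with made-up choices: each bead hops exactly one unit left or right per time step (so ϕ is just two equally likely spikes), and the step size, time step, and bead count are arbitrary.  It leans on a standard result that is not derived in the post: a point-like blob obeying the diffusion equation spreads into a bell curve with variance 2kt.

```python
import random

random.seed(0)     # fixed seed so the run is repeatable

STEP = 1.0         # epsilon: distance moved per time step (assumed)
TAU = 1.0          # tau: length of one time step (assumed)
N_STEPS = 100
N_BEADS = 5000

# For hops of +STEP or -STEP with equal probability, the integral of
# epsilon^2 / (2 tau) against phi is simply STEP^2 / (2 TAU).
k = STEP**2 / (2 * TAU)

# Every bead starts at x = 0 and wanders independently.
positions = []
for _ in range(N_BEADS):
    x = 0.0
    for _ in range(N_STEPS):
        x += random.choice((-STEP, STEP))
    positions.append(x)

mean = sum(positions) / N_BEADS
variance = sum((x - mean) ** 2 for x in positions) / N_BEADS

predicted = 2 * k * N_STEPS * TAU   # variance predicted by diffusion
print(f"measured variance:  {variance:.1f}")
print(f"predicted variance: {predicted:.1f}")
```

The measured spread should come out within a few percent of the prediction, with the leftover difference being statistical noise from using a finite number of beads.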

The second derivative, \frac{d^2\rho}{dx^2}, is a way to describe how a function is curving.  When it’s positive the function is curving up the way your hand curves when your palm is pointing up and when it’s negative the function is curving down.  By saying that the time derivative is proportional to the 2nd position derivative, you’re saying that “hills” will drop and “valleys” will rise.  This is exactly what your intuition should say about heat: if a place is hotter than the area around it, it will cool off.

The diffusion equation dictates that if the graph is concave down, the density drops and if the graph is concave up, the density increases.

This is a very long-winded way of saying “think of heat as randomly moving particles, because the math is the same”.  But again, heat isn’t actually particles, it’s just that picturing it as such leads to useful insights.  While the equation and the intuition are straightforward, actually solving the diffusion equation in almost any real world scenario is a huge pain.

The corners cool off faster because there are more opportunities for “heat beads” to fall out of the material there.  Although this is exactly what the diffusion equation predicts, actually doing the math by hand is difficult.

It’s all well and good to talk about how heat beads randomly walk around inside of a material, but if that material isn’t uniform or has an edge, then suddenly the math gets remarkably nasty.  Fortunately, if all you’re worried about is whether or not you should leave your heater on, then you’re probably not sweating the calculus.

The shuttle tile photo is from here.

Posted in -- By the Physicist, Engineering, Physics

Q: What determines the size of the bright spot when you focus sunlight with a lens?

Physicist: This question really appeals to my ten-year-old self.  If you’ve never tried to burn something with a lens, collect three pairs of sunglasses, a magnifying lens, and something you dislike.  On a bright day, put on all three pairs of sunglasses and give it a shot.

Burning stuff with a magnifying lens: education at its finest.

Typically, when you try to focus sunlight with a lens you’ll get something that looks like a comet.  You turn the lens and move it up and down and just at the moment when the bright spot gets the smallest, you suddenly see clouds.  This is because the smallest you can concentrate light from the Sun using a lens is an in-focus image of the Sun, and incidentally, an in-focus image of anything else in the sky as well.

Paper held in the “focal plane” of a pair of binoculars during an eclipse.  This is the smallest that the light can be focused.  If the paper were any closer or farther away the images of the Sun would become blurry and more spread out.

There are several types of lens, depending on how the two sides are curved, but in each case the defining characteristic is the “focal length”, f.  The big rule for lenses is: parallel light collects together at the same place, the “focal point”, which is f away from the lens.

Left: Parallel light beams collect together at a focal point, with a different point for each different set of parallel beams.  The collection of these points is the “focal plane”.  Right: Here we’ll consider the simplest case, light perpendicular to the lens, since the results apply in general and we can talk about the focal point (“the focus”) instead of the focal plane.

An “image” is the location where the light from an “object” is brought together by a lens.  The image is so named because if there happens to be a screen at that location, an image of the object will appear in-focus.  The distance to the image depends on the distance to the object.

For lenses, the rules are: #1) Parallel light beams will pass through the focus (dots) on the far side of the lens.  #2) Light passing through the center of the lens doesn’t change direction.  These rules allow us to figure out the size and location of the Image given the size and location of the Object (black bars).

It takes a little algebra (included in the “answer gravy” below), but rule #1 in the caption above leads to the Thin Lens Equation:

\frac{1}{d_o}+\frac{1}{d_i}=\frac{1}{f}

where d_o is the distance from lens to object, d_i is the distance from lens to image, and f is the focal length of the lens.

Rule #2 is easier to work with and, with a little less algebra, leads to the magnification equation:

M=\frac{h_i}{h_o}=\frac{d_i}{d_o}

where h_i and h_o are the sizes of the image and object.  M is the factor for how much bigger the image is than the object.  For a microscope, you want M to be big.

The distance to the Sun, d_o, is infinite in every useful sense.  That’s why it took so long to figure out how far away it is.  As far as (reasonably sized) lenses go, a few miles may as well be the other side of the universe.  The distance to the Sun is somewhere in that “may as well be on the other side of the universe” range.  In fact, everything in the sky falls into the same category, which is why the night sky looks like a dome, rather than the unending void that it is.

Plug d_o=\infty into the thin lens equation and you find that \frac{1}{f}=\frac{1}{d_i}+\frac{1}{\infty}=\frac{1}{d_i}, and so d_i=f.  In other words, the Sun, and everything else “at infinity”, will be in focus f away from the lens.  This coincides with the definition of the focal length, since light from a source at infinity is always parallel.  That should jibe with your experience: if you look at a light ten feet away and you step back and forth, the angle to the light changes, but if you look at the Sun (don’t) and step back and forth, the angle to the Sun stays the same.
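For anyone who would rather let a computer do the plugging in, here is a small sketch of the thin lens and magnification equations.  The function names and the 10 cm focal length are invented for the example, and the lens is assumed to be an ideal thin lens.

```python
import math

def image_distance(f, d_o):
    """Thin lens equation, 1/d_o + 1/d_i = 1/f, solved for d_i.
    Breaks down (division by zero) when d_o equals f, because the
    image is then infinitely far away."""
    return 1.0 / (1.0 / f - 1.0 / d_o)

def magnification(f, d_o):
    """Magnification equation, M = h_i/h_o = d_i/d_o."""
    return image_distance(f, d_o) / d_o

f = 0.10   # hypothetical lens with a 10 cm focal length, in meters

# An object 30 cm from the lens.
print(image_distance(f, 0.30), magnification(f, 0.30))

# An object "at infinity", like the Sun.
print(image_distance(f, math.inf), magnification(f, math.inf))
```

The nearby object is imaged 15 cm behind the lens at half size, while the object at infinity is imaged at the focal length with a magnification of zero.  That zero is the unwanted infinity in disguise.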

Now on the face of it, it would seem as though there’s no way to figure out the size of the Sun’s image, h_i, since \frac{h_i}{h_o}=\frac{d_i}{d_o}=\frac{f}{\infty}=0.  As with the resolution to so many other unwanted infinities, all we need is a little algebra.

Without some fairly sophisticated techniques, it’s impossible to gauge how far away the Sun is.  But while d_o is out of reach (for most of us), \frac{h_o}{d_o} isn’t.  By measuring the Sun’s apparent size in the sky, it’s easy to figure out that it’s 110 times farther away than it is big.  The same thing, very coincidentally, is true of the Moon; it is 110 Moon-diameters away from Earth.  Mathematically speaking, \frac{h_o}{d_o}=\frac{1}{110}.

Retentive readers will recall that we haven’t brought the magnification equation, \frac{h_i}{h_o}=\frac{d_i}{d_o}, into play.  That was intentional; pretending there’s an issue heightens drama.  Solving for the image size, h_i, and plugging in what we already know, \frac{h_o}{d_o}=\frac{1}{110} and d_i=f, we get:

h_i=h_o\frac{d_i}{d_o}=d_i\frac{h_o}{d_o}=\frac{f}{110}

So, how big is the bright spot when you focus sunlight?  At best, a little less than 1% of the distance to the lens.  To concentrate light from the Sun as much as possible, you want to position the target (the solar oven, the ant trail, the piece of wood, whatever) at the lens’ focal distance.  When you do, the bright spot will have a diameter of \frac{f}{110}.  This ultimately comes down to the fact that the Sun is really far away, and 110 times smaller than it is distant.

The bigger a lens is, the more sunlight it can gather.  So the best lenses for burning stuff are as big as possible (more light), with the shortest possible focal length (tiny spot).
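The "big lens, short focal length" advice can be turned into a rough number.  The sketch below assumes an ideal lens with no absorption, reflection, or aberration, and the lens dimensions are made up, so treat the output as a best case.  The "concentration" here is just the ratio of the lens area to the spot area.

```python
SUN_SIZE_OVER_DISTANCE = 1 / 110   # h_o / d_o for the Sun, from the post

def spot_diameter(focal_length):
    """Diameter of the in-focus image of the Sun, f/110."""
    return focal_length * SUN_SIZE_OVER_DISTANCE

def concentration(lens_diameter, focal_length):
    """How many times brighter the spot is than plain sunlight, in the
    ideal case: (lens area) / (spot area)."""
    return (lens_diameter / spot_diameter(focal_length)) ** 2

# Hypothetical magnifying lens: 10 cm across, 20 cm focal length.
lens_diameter = 0.10   # meters
focal_length = 0.20    # meters

print(f"spot diameter: {spot_diameter(focal_length) * 1000:.2f} mm")
print(f"concentration: {concentration(lens_diameter, focal_length):.0f}x")
```

For this imaginary lens the spot is a little under 2 mm across and the sunlight is concentrated about three thousand times.  Doubling the lens diameter quadruples the concentration, and so does halving the focal length.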

Answer Gravy: Every now and again, it’s worth seeing how statements of fact turn into math.  Geometric optics (which sounds much more impressive than it is) basically boils down to the two rules mentioned above:

#1) Parallel light beams will pass through the focus on the far side of the lens.

#2) Light passing through the center of the lens doesn’t change direction.

The thin lens equation almost immediately falls out of these rules and the geometry of similar triangles.  Rule #2 is the easiest to work with.  Looking at the line that passes through the center of the lens, we find two similar triangles, one on either side.  The triangle on the left has legs h_o and d_o and the one on the right has legs h_i and d_i.  Since these triangles are similar, the ratios of these lengths are the same: \frac{h_o}{d_o}=\frac{h_i}{d_i}.  Rearranging this to put the h’s and d’s on opposite sides produces the magnification equation, \frac{d_i}{d_o}=\frac{h_i}{h_o}.  Easy!

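Using the same basic trick on the triangles formed by rule #1, we can find the thin lens equation.  Looking at just the right side (which side doesn’t matter), there are two triangles similar to each other: a smaller one with legs f and h_o and a larger one with legs d_i and h_o+h_i.

\begin{array}{rcl}  \frac{h_o}{f}&=&\frac{h_o+h_i}{d_i} \\[2mm]  \frac{d_i}{f}&=&1+\frac{h_i}{h_o} \\[2mm]  \frac{d_i}{f}&=&1+\frac{d_i}{d_o} \\[2mm]  \frac{1}{f}&=&\frac{1}{d_i}+\frac{1}{d_o}  \end{array}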

And there it is.  We start with a pair of intuitive, but difficult to apply principles, and end up with a pair of unintuitive, but easily applicable equations.

Posted in -- By the Physicist, Equations, Geometry, Math, Physics

Q: Why are numerical methods necessary? If we can’t get exact solutions, then how do we know when our approximate solutions are any good?

Physicist: When a problem can be solved exactly and in less time than forever, then it is “analytically solvable”.  For example, “Jack has 2 apples and Jill has 3 apples, how many apples do they have together?” is analytically solvable.  It’s 5.  Exactly 5.

Precisely solving problems is what we often imagine that mathematicians are doing, but unfortunately you can’t swing a cat in the sciences without hitting a problem that can’t be solved analytically.  In reality “doing math” generally involves finding an answer rather than the answer.  While you may not be able to find the exact answer, you can often find answers with “arbitrary precision”.  In other words, you can find an approximate answer and the more computer time / scratch paper you’re willing to spend, the closer that approximation will be to the correct answer.

A lot of math problems can’t be directly solved.  For example: most.

A trick that lets you get closer and closer to an exact answer is a “numerical method”.  Numerical methods do something rather bizarre: they find solutions close to the answer without ever knowing what that answer is.  As such, an important part of every numerical method is a proof that it works.  So that there is the answer: we need numerical methods because a lot of problems are not analytically solvable and we know they work because each separate method comes packaged with a proof that it works.

It’s remarkable how fast you can stumble from solvable to unsolvable problems.  For example, there is an analytic solution for the motion of two objects interacting gravitationally but no solution for three or more objects.  This is why we can prove that two objects orbit in ellipses and must use approximations and/or lots of computational power to predict the motion of three or more objects.  This inability is the infamous “three body problem”.  It shows up in atoms as well; we can analytically describe the shape of electron orbitals and energy levels in individual hydrogen atoms (1 proton + 1 electron = 2 bodies), but for every other element we need lots of computer time to get things right.

Even for purely mathematical problems the line between analytically solvable and only numerically solvable is razor thin.  Questions with analytic solutions include finding the roots of 2nd degree polynomials, such as 0=x^2+2x-3, which can be done using the quadratic formula:

x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}

The quadratic formula is a “solution by radicals”, meaning you can find the solution using only the coefficients in front of each term (in this case: a=1, b=2, c=-3), combined using nothing more than arithmetic and roots.  There’s a solution by radicals for 3rd degree polynomials and another for 4th degree polynomials (they’re both nightmares, so don’t).  However, there can never be a general solution by radicals for 5th or higher degree polynomials.  If you wanted to find the solutions of 2x^5-3x^4+\pi x^3+x^2-x+\sqrt{3}=0 (and who doesn’t?) there is literally no formula of this kind to hand you an expression for the exact answers.
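As a tiny illustration of what "analytically solvable" buys you, here is the quadratic formula applied to 0=x^2+2x-3.  The function is a bare-bones sketch that assumes a is not zero and that the roots are real.

```python
import math

def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 via the quadratic formula.
    Assumes a != 0 and b^2 - 4ac >= 0."""
    root = math.sqrt(b * b - 4 * a * c)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

roots = quadratic_roots(1, 2, -3)
print(roots)   # (1.0, -3.0)
```

No guessing, no iterating, no wondering whether the answer is close enough: x=1 and x=-3, exactly.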

Numerical methods have really come into their own with the advent of computers, but the idea is a lot older.  The decimal expansion of \pi (3.14159…) never ends and never repeats, which is a big clue that you’ll never find its value exactly.  At the same time, it has some nice properties that make it feasible to calculate \pi to arbitrarily great precision.  In other words: numerical methods.  Back in the third century BC, Archimedes realized that you could approximate \pi by taking a circle with circumference \pi, then inscribing a polygon inside it and circumscribing another polygon around it.  Since the circle’s perimeter is always longer than the inscribed polygon’s and always shorter than the circumscribed polygon’s, you can find bounds for the value of \pi.

Hexagons inscribed (blue) and circumscribed (red) on a circle with circumference π.  The perimeters of such polygons, in this case p_6=3 and P_6=2\sqrt{3}\approx3.46, must always fall on either side of π≈3.14.

By increasing the number of sides, the polygons hug the circle tighter and produce a closer approximation, from both above and below, of \pi.  There are fancy mathematical ways to prove that this method approaches \pi, but it’s a lot easier to just look at the picture, consider for a minute, and nod sagely.

Archimedes’ trick wasn’t just noticing that \pi must be between the lengths of the two polygons.  That’s easy.  His true cleverness was in coming up with a mathematical method that takes the perimeters of a given pair of k-sided inscribed and circumscribed polygons with perimeters p_k and P_k and produces the perimeters for polygons with twice the numbers of sides, p_{2k} and P_{2k}.  Here’s the method:

P_{2k}={\frac {2p_{k}P_{k}}{p_{k}+P_{k}}}\quad \quad p_{2k}={\sqrt {p_{k}P_{2k}}}

By starting with hexagons, where p_6=3 and P_6=2\sqrt{3}, and doubling the number of sides 4 times Archie found that for inscribed and circumscribed enneacontahexagons p_{96}>\frac{223}{71}\approx3.14085 and P_{96}<\frac{22}{7}\approx3.14286.  In other words, he managed to nail down \pi to about two decimal places: 3.14085<\pi<3.14286.
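Archimedes' doubling recipe takes only a few lines to try.  This sketch cheats in one respect: it uses the computer's square root, whereas Archimedes had to bound every square root by hand with fractions, which is why his published bounds are slightly looser than what the recipe can deliver.

```python
import math

# Perimeters of the inscribed (p) and circumscribed (P) hexagons for a
# circle with circumference pi.
p, P = 3.0, 2 * math.sqrt(3)
sides = 6

for _ in range(4):
    P = 2 * p * P / (p + P)   # P_{2k}: harmonic mean of p_k and P_k
    p = math.sqrt(p * P)      # p_{2k}: geometric mean of p_k and P_{2k}
    sides *= 2
    print(f"{sides:3d} sides: {p:.5f} < pi < {P:.5f}")
```

After four doublings the 96-sided polygons pin \pi between roughly 3.14103 and 3.14271, which sits just inside the range from \frac{223}{71} to \frac{22}{7}.  Note that the order of the two update lines matters, since the new p needs the new P.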

Some puzzlement has been evinced by Mr. Medes’ decision to stop where he did, with just two decimal places in \pi.  But not among mathematicians.  The mathematician’s ancient refrain has always been: “Now that I have demonstrated my amazing technique to the world, someone else can do it.”.

To be fair to Archie, this method “converges slowly”.  It turns out that, in general, p_n=n\sin\left(\frac{\pi}{n}\right)\approx\pi-\frac{\pi^3}{6}\frac{1}{n^2} and P_n=n\tan\left(\frac{\pi}{n}\right)\approx\pi+\frac{\pi^3}{3}\frac{1}{n^2}.  Every time you double n the errors, \frac{\pi^3}{3}\frac{1}{n^2} and \frac{\pi^3}{6}\frac{1}{n^2}, get four times as small (because 2^2=4), which translates to very roughly one new decimal place every two iterations.  \pi never ends, but still: you want to feel like you’re making at least a little progress.

Some numerical methods involve a degree of randomness and yet still manage to produce useful results.  Speaking of \pi, here’s how you can calculate it “accidentally”.  Generate n pairs of random numbers, (x,y), between 0 and 1.  Count up how many times x^2+y^2\le1 and call that number k.  If you do this many times, you’ll find that \frac{4k}{n}\approx\pi.

If you randomly pick a point in the square, the probability that it will be in the grey region is π/4.

As you generate more and more pairs and tally up how many times x^2+y^2\le1 the law of large numbers says that \frac{k}{n}\to\frac{\pi}{4}, since that’s the probability of randomly falling in the grey region in the picture above.  This numerical method is even slower than Archimedes’ not-particularly-speedy trick.  According to the central limit theorem, after n trials \frac{k}{n} is likely to be within about \frac{0.41}{\sqrt{n}} of \frac{\pi}{4}, which puts \frac{4k}{n} within about \frac{1.64}{\sqrt{n}} of \pi.  That makes this a very slowly converging method; it takes on the order of ten million trials before you can reliably nail down “3.141”.  This is not worth trying.
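For the curious (or the stubborn), the random-dart estimate looks like this in code.  It is a minimal sketch: the seed and the trial counts are arbitrary choices, and a different seed will give slightly different digits.

```python
import random

def estimate_pi(n, rng):
    """Throw n random points into the unit square and count how many
    land inside the quarter circle x^2 + y^2 <= 1."""
    k = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1:
            k += 1
    return 4 * k / n

rng = random.Random(0)   # fixed seed so the run is repeatable
for n in (100, 10_000, 100_000):
    print(f"n = {n:7d}: pi is roughly {estimate_pi(n, rng):.4f}")
```

Each hundredfold increase in n buys about one more decimal place, because the typical error shrinks like one over the square root of n.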

Long story short, most applicable math problems cannot be done directly.  Instead we’re forced to use clever approximations and numerical methods to get really close to the right answer (assuming that “really close” is good enough).  There’s no grand proof or philosophy that proves that all these methods work but, in general, if we’re not sure that a given method works, we don’t use it.

Answer Gravy: There are a huge number of numerical methods and entire sub-sciences dedicated to deciding which to use and when.  Just for a more detailed taste of a common (fast) numerical method and the proof that it works, here’s an example of Newton’s Method, named for little-known mathematician Wilhelm Von Method.

Newton’s method finds (approximates) the zeros of a function, f(x).  That is, it finds a value, \lambda, such that f(\lambda)=0.  The whole idea is that, assuming the function is smooth, when you follow the slope at a given point down you’ll find a new point closer to a zero/solution.  All polynomials are “smooth”, so this is a good way to get around that whole “you can’t find the roots of 5th or higher degree polynomials” thing.

The “big idea” behind Newton’s Method: pick a point (x_n), follow the slope, find yourself closer (x_{n+1}), repeat.

The big advantage of Newton’s method is that, unlike the two \pi examples above, it converges preternaturally fast.

The derivative is the slope, so f^\prime(x_n) is the slope at the point (x_n,f(x_n)).  Considering the picture above, that same slope is given by the rise, f(x_n), over the run, x_n-x_{n+1}.  In other words f^\prime(x_n)=\frac{f(x_n)}{x_n-x_{n+1}} which can be solved for x_{n+1}:

x_{n+1}=x_n-\frac{f(x_n)}{f^\prime(x_n)}

So given a point near a solution, x_n, you can find another point that’s closer to the true solution, x_{n+1}.  Notice that if f(x_n)\approx0, then x_{n+1}\approx x_n.  That’s good: when you’ve got the right answer, you don’t want your approximation to change.

To start, you guess (literally… guess) a solution, call it x_0.  With this tiny equation in hand you can quickly find x_1.  With x_1 you can find x_2 and so on.  Although it can take a few iterations for it to settle down, each new x_n is closer than the last to the actual solution.  To end, you decide you’re done.

Say you need to solve x=\cos(x) for x.  Never mind why.  There is no analytical solution (this comes up a lot when you mix polynomials, like x, with trig functions or logs or just about anything).  The correct answer starts with \lambda=0.739085133215160641655312\ldots

y=x and y=cos(x). They clearly intersect, but there’s no way to analytically solve for exactly where.

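First you write it in such a way that you can apply Newton’s method: f(x)=\cos(x)-x=0.  The derivative is f^\prime(x)=-\sin(x)-1 and therefore:

x_{n+1}=x_n-\frac{f(x_n)}{f^\prime(x_n)}=x_n+\frac{\cos(x_n)-x_n}{\sin(x_n)+1}

First make a guess.  I do hereby guess x_0=3.  Plug that in and you find that:

x_1=x_0+\frac{\cos(x_0)-x_0}{\sin(x_0)+1}=3+\frac{\cos(3)-3}{\sin(3)+1}=-0.496558178297331398840279\ldots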

Plug back in what you get out several times and:

\begin{array}{ll}    x_0&=3\\    x_1&=-0.496558178297331398840279\ldots\\    x_2&=2.131003844480994964494021\ldots\\    x_3&=0.689662720778373223587585\ldots\\    x_4&=0.739652997531333829185767\ldots\\    x_5&=0.739085204375836184250693\ldots\\    x_6&=0.739085133215161759778809\ldots\\    x_7&=0.739085133215160641655312\ldots\\    x_8&=0.739085133215160641655312\ldots\\    \end{array}
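The list above is easy to reproduce.  Here is a minimal sketch of Newton's method applied to f(x)=\cos(x)-x, starting from the same guess.  The function names are invented for the example, and since ordinary floating point numbers only carry about 16 digits, the later iterates will agree with the list only that far.

```python
import math

def newton(f, f_prime, x0, iterations):
    """Return the list [x_0, x_1, ..., x_iterations] produced by
    x_{n+1} = x_n - f(x_n) / f'(x_n)."""
    xs = [x0]
    for _ in range(iterations):
        x = xs[-1]
        xs.append(x - f(x) / f_prime(x))
    return xs

def f(x):
    return math.cos(x) - x

def f_prime(x):
    return -math.sin(x) - 1

for n, x in enumerate(newton(f, f_prime, 3.0, 8)):
    print(f"x_{n} = {x:.15f}")
```

Nothing here checks for trouble, such as landing on a point where the slope is zero.  A more careful version would also stop once successive guesses agree to the precision you care about, rather than running a fixed number of steps.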

In this particular case, x_0 through x_3 jump around a bit.  Sometimes Newton’s method does this forever (try x_0=5) in which case: try something else or make a new guess.  It’s not until x_4 that Newton’s method starts to really zero in on the solution.  Notice that (starting at x_4) every iteration establishes about twice as many decimal digits as the previous step:

\begin{array}{ll}    \vdots\\    x_4&=0.739\ldots\\    x_5&=0.739085\ldots\\    x_6&=0.73908513321516\ldots\\    x_7&=0.739085133215160641655312\ldots\\    \vdots    \end{array}

We know that Newton’s method works because we can prove that it converges to the solution.  In fact, we can show that it converges quadratically (which is stupid fast).  Something “converges quadratically” when the distance to the true solution is squared with every iteration.  For example, if you’re off by 0.01, then in the next step you’ll be off by around (0.01)^2=0.0001.  In other words, the number of digits you can be confident in doubles every time.

Here’s why it works:

A smooth function (and practically any function you might want to write down is smooth practically everywhere) can be described by a Taylor series.  In this case we’ll find the Taylor series about the point x_n and use the facts that 0=f(\lambda) and x_{n+1}=x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}.

\begin{array}{rcl}  0&=&f(\lambda) \\[2mm]  0&=&f(x_n+(\lambda-x_n)) \\[2mm]  0&=&f(x_n)+f^\prime(x_n)\left(\lambda-x_n\right)+\frac{1}{2}f^{\prime\prime}(x_n)\left(\lambda-x_n\right)^2+\ldots \\[2mm]  0&=&\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}+\left(\lambda-x_n\right)+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm]  0&=&\lambda-\left(x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}\right)+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm]  0&=&\lambda-x_{n+1}+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm]  \lambda-x_{n+1}&=&-\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots  \end{array}

The “…” becomes small much faster than \left(\lambda-x_n\right)^2 as x_n and \lambda get closer.  At the same time, \frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)} becomes effectively equal to \frac{f^{\prime\prime}(\lambda)}{2f^\prime\left(\lambda\right)}.  Therefore \lambda-x_{n+1}\propto\left(\lambda-x_n\right)^2 and that’s what quadratic convergence is.  Note that this only works when you’re zeroing in; far away from the correct answer Newton’s method can really bounce around.
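As a sanity check on that last line, the ratio \left(\lambda-x_{n+1}\right)/\left(\lambda-x_n\right)^2 should approach -\frac{f^{\prime\prime}(\lambda)}{2f^\prime\left(\lambda\right)} once the iteration is zeroing in.  Here is a rough sketch for f(x)=\cos(x)-x, where f^{\prime\prime}(x)=-\cos(x).  The names are illustrative, and only the n=3 and n=4 steps are used since later errors are too close to floating point round-off to divide reliably.

```python
import math

LAMBDA = 0.7390851332151607  # the root quoted above, rounded to double precision

def f(x):
    return math.cos(x) - x

def f_prime(x):
    return -math.sin(x) - 1.0

def f_second(x):
    return -math.cos(x)

# The constant the derivation predicts: -f''(lambda) / (2 f'(lambda)).
predicted = -f_second(LAMBDA) / (2.0 * f_prime(LAMBDA))

# Build x_0 through x_5 starting from the same guess, x_0 = 3.
xs = [3.0]
for _ in range(5):
    xs.append(xs[-1] - f(xs[-1]) / f_prime(xs[-1]))

# Measured ratio (lambda - x_{n+1}) / (lambda - x_n)^2 for n = 3 and n = 4.
ratios = {n: (LAMBDA - xs[n + 1]) / (LAMBDA - xs[n]) ** 2 for n in (3, 4)}

print(f"predicted constant: {predicted:.5f}")
for n, r in ratios.items():
    print(f"n = {n}: measured ratio {r:.5f}")
```

The predicted constant is about -0.22, and the measured ratio gets closer to it from n=3 to n=4.  The minus sign is why x_4, x_5, and x_6 all land slightly above \lambda.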

Posted in -- By the Physicist, Computer Science, Equations, Math

Burning Man 2017

Long ago, Ask a Mathematician / Ask a Physicist was two guys sitting around in the desert talking to strangers about whatever came to mind.  It’s been a while, but we’re heading back to Burning Man for more of the same!

If you happen to find yourself in the desert, have a question, and/or want to waste time with a Mathematician and a Physicist, you can find us here

There!  12-4 on Thursday.

from 12:00 to 4:00 on Thursday the 31st.  According to the official schedule we’re a gathering or party in a red tent between center camp and the Man.  That same schedule goes on to say:

“Ask a Mathematician / Ask a Physicist is two people sitting in the desert talking to other people in the desert. Ever wonder about the universe?  Entanglement?  The nature of logic?  Got a nagging question that’s been bothering you for years or just want to hang out and listen to other people’s questions?  We can help!”

Posted in -- By the Physicist

Q: How can something be “proven” in science or math?

The original question was: … it confuses me that abstract concepts, such as Banach-Tarski, and other concepts in pure mathematics and theoretical physics, can be considered to have been “proven”.  Is it not the case that one can only prove something by testing hypotheses in the real/physical world?  And even then isn’t it a bit of a stretch to say that anything can really be proven beyond doubt?

Physicist: Hypothesis testing is the workhorse of scientific inquiry, used to determine whether or not a given effect is real.  The result of a hypothesis test isn’t a proof or disproof, it’s an estimate of how likely it is that you would see a given result accidentally.  The more unlikely it is that something would occur accidentally, the more likely it is to be a real effect.  For example, we haven’t proven that the Higgs boson exists, it’s just that there’s only about a one in half a trillion chance that the data from CERN would be produced accidentally.  That’s not a proof.  Even so, if an effect works as predicted very consistently, then you may as well believe that it’s real.

Things are “proven” to be true with certainty in very much the same way that we can know with certainty that someone has won a chess game.  There’s nothing etched into the fabric of the universe that determines how chess pieces move on a board (other than, you know, physically) or who won a given game, and yet everyone who knows the rules will be able to agree on the victor.  Math, despite its vaunted status as the purest science and the means by which the reach of our simple minds can exceed their squishy grasp, is basically like the rules of chess or any other game.

Once the rules are established, you can prove things based on those rules and some logic (technically, logic is just more rules).  For example, based on a reasonably short list of straightforward mathematical rules you can first define what a prime number is and then prove that there are an infinite number of them.
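To make that concrete, here is the standard argument for the primes (Euclid’s), sketched as an illustration of what “proving from the rules” looks like.  It assumes only the usual definition of a prime and the fact that every whole number bigger than 1 is divisible by at least one prime.

```latex
\begin{array}{l}
\text{Suppose } p_1, p_2, \ldots, p_k \text{ were a complete list of the primes.} \\[2mm]
\text{Let } N = p_1 p_2 \cdots p_k + 1. \\[2mm]
\text{Dividing } N \text{ by any } p_i \text{ leaves a remainder of } 1, \text{ so no } p_i \text{ divides } N. \\[2mm]
\text{But } N > 1, \text{ so some prime } q \text{ divides } N, \text{ and } q \text{ is not on the list.} \\[2mm]
\text{So no finite list of primes is complete.}
\end{array}
```

Nothing in there was measured or tested; every line follows from the rules and the line before it.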

The rules in mathematics are called “axioms” and the results based on those rules are “theorems”.  For example, “you can’t split a point in half” is an axiom while “there are an infinite number of primes” is a theorem.  When you first learn about numbers and arithmetic, you’re learning Peano’s axioms and lots of definitions and conclusions based on them.  Like the rules of chess, axioms just establish what things you can and can’t do in math and people are free to argue about which they do or don’t want to include.  Math doesn’t necessarily have anything to do with reality; it just happens to include some of the most effective tools for understanding it ever conceived.

The fact that we can create new mathematics that doesn’t have anything to do with reality may seem like a weakness, but it’s turned out to be fantastically useful.  For example, by generalizing the laws of geometry away from triangles, three dimensions of space, and even the very notion of distance, mathematicians paved the way for Einstein’s general relativity (which describes the nature of gravity in terms of warped spacetime).  He basically just had to plug his new ideas about spacetime into math that had already been created.

Banach-Tarski is a century-old result from set theory which says that you can (among other things) break a sphere into five or more sets, rotate and move those sets, and recombine them into two spheres identical to the first.  These sets are less like block puzzle pieces and more like droplets in a fog, almost all of which are smaller than any given size.  Notice that this is completely impossible physically.  Lucky for Banach and Tarski, math isn’t dictated by the uptight strictures of reality.

XKCD: Funny because it’s true.

Banach-Tarski is based on the usual axioms of set theory, Zermelo–Fraenkel (ZF), but requires the addition of a hotly contested axiom, the “Axiom of Choice” (ZFC).  “Hotly contested” in the math community is a bit of a misnomer; mathematicians mostly just write long papers and stare angrily at each other’s shoes when they’re forced to shake hands.  The axiom of choice is to mathematics as en passant is to chess; it comes up when it comes up, but you don’t need it in general (if you’ve ever made it through a game of chess and have no idea what en passant is: exactly).

In abstract systems, the rules that are included are determined by preference, not physical reality.  In order to be useful to more than one person, most of the rules are generally agreed upon, but some are not.  (Left) En passant in chess and (Right) the axiom of choice in set theory.

The axiom of choice states that it is always possible to select (or even choose) a single item from each of an infinite collection of sets.  This is easy if there are a finite number of sets (“just go ahead and do it”) or if there’s a nice rule you can come up with (“always pick the lowest number”).  But sometimes you find yourself with an infinite set of infinite sets, none of which have a highest, lowest, or middlest point.  If you’re wondering how you go about picking a single unique item out of each of these sets, the Axiom of Choice says “you just can, so be cool”.  It is a completely made up statement that changes the rules of the game.  It’s not a matter of true or false, it’s a matter of consistency and agreeing with other mathematicians.

Physics, despite being the queen of the sciences and the means by which we mortals may strive to understand the underlying nature of reality, isn’t any better than math.  In physics you can “prove” that things are true or false, but only based on established rules: the “physical laws”.  For example, Newton’s universal law of gravitation says that the force of attraction between two objects with masses M and m spaced a distance r apart is F=\frac{GMm}{r^2}.  More than merely a statement of fact, mathematical expressions like this allow us to describe/predict precisely how things physically behave.  We can prove that orbits are elliptical, provided this law (and a couple of others) is accurate.  Notice that’s “accurate”, but not necessarily “true”.

If those rules turn out to be false, then the proofs based upon them aren’t proofs.  This is why physicists are so careful about establishing and verifying every detail of their theories.  They spend (seemingly wasted) decades doing tests of things that they’re already almost 100% sure are right, because a flaw in any of the fundamental laws would ripple out into every “proved” thing that’s based on it.

Every now and again some base rule or assumption in math or physics is overturned.  In math this is entirely due to logic, but physics is a bit more tricky.  We can’t divine the rules of the universe with logic alone.  If you were just a mind in a void, the nature of this universe would be a real shock.  No matter how smart you are, you need experiment and observation to learn new things about the world.

It’s easy (well… fairly easy) to write down some physical laws that seem to describe what we know about the universe but turn out to be wrong.  Without buckets of fantastically precise data and the math to understand it, there’s no way to know whether what you know is really only what you think.  Newton’s laws are tremendously useful, but ultimately misinformed.  They perfectly described the universe according to the data we had at the time; when more accurate (and more difficult to attain) data gave rise to “truer” physical theories we came to realize that Newtonian physics is merely a very good approximation.

Before Einstein we had safely assumed that time and space were completely independent.  It took some seriously recondite phenomena (e.g., the invariance of the speed of light and a tiny error in Mercury’s orbit) to indicate that time and space are not so much related as they are different aspects of the same thing.  Almost more sacred, before Bell we had assumed that everything exists in a single definite state, whether we know what that state is or not.  This totally reasonable assumption is “realism”.  Again, the difference between the universe we had assumed we lived in and the world we evidently do live in (probably) was a set of incredibly esoteric, nigh unnoticeable effects (e.g., the randomness of things like radioactive decay and the “impossible” statistics of entangled particles).  It took a lot of clever experiments (dutifully checked, expounded upon, and multiply verified) and math to come to the conclusion that: nope, an assumption so fundamental that we call it “realism” or “the reality assumption” is actually false.  Quantum physicists who have evolved beyond the need to be understood will call the property of definitely being in a single state “counterfactual definiteness”.  Not that it’s worth mentioning, but if you can read this, you exist.  Good on ya.

In mathematics you can prove things, but you’re ultimately just moving pieces around on a board.  There’s a lot to learn and discover in the realms of logic, but math, like every abstract human endeavor, is all in our heads.

In physics you can prove things using physical laws.  However, those physical laws are only true insofar as they always work perfectly (as far as we can measure and verify) in every scenario which, arguably, is the best you can hope for.

Posted in -- By the Physicist, Conventions, Math, Philosophical