Q: Why are numerical methods necessary? If we can’t get exact solutions, then how do we know when our approximate solutions are any good?

Physicist: When a problem can be solved exactly and in less time than forever, then it is “analytically solvable”.  For example, “Jack has 2 apples and Jill has 3 apples, how many apples do they have together?” is analytically solvable.  It’s 5.  Exactly 5.

Precisely solving problems is what we often imagine that mathematicians are doing, but unfortunately you can’t swing a cat in the sciences without hitting a problem that can’t be solved analytically.  In reality “doing math” generally involves finding an answer rather than the answer.  While you may not be able to find the exact answer, you can often find answers with “arbitrary precision”.  In other words, you can find an approximate answer and the more computer time / scratch paper you’re willing to spend, the closer that approximation will be to the correct answer.

A lot of math problems can’t be directly solved.  For example: most.

A trick that lets you get closer and closer to an exact answer is a “numerical method”.  Numerical methods do something rather bizarre: they find solutions close to the answer without ever knowing what that answer is.  As such, an important part of every numerical method is a proof that it works.  So that there is the answer: we need numerical methods because a lot of problems are not analytically solvable and we know they work because each separate method comes packaged with a proof that it works.

It’s remarkable how fast you can stumble from solvable to unsolvable problems.  For example, there is an analytic solution for the motion of two objects interacting gravitationally but no solution for three or more objects.  This is why we can prove that two objects orbit in ellipses and must use approximations and/or lots of computational power to predict the motion of three or more objects.  This inability is the infamous “three body problem“.  It shows up in atoms as well; we can analytically describe the shape of electron orbitals and energy levels in individual hydrogen atoms (1 proton + 1 electron = 2 bodies), but for every other element we need lots of computer time to get things right.

Even for purely mathematical problems the line between analytically solvable and only numerically solvable is razor thin.  Questions with analytic solutions include finding the roots of 2nd degree polynomials, such as $0=x^2+2x-3$, which can be done using the quadratic equation:

$x=\frac{-(2)\pm\sqrt{(2)^2-4(1)(-3)}}{2(1)}=\frac{-2\pm4}{2}=-3\,or\,1$

The quadratic equation is a “solution by radicals”, meaning you can find the solution using only the coefficients in front of each term (in this case: 1, 2, -3).  There’s a solution by radicals for 3rd degree polynomials and another for 4th degree polynomials (they’re both nightmares, so don’t).  However, there can never be a solution by radicals for 5th or higher degree polynomials.  If you wanted to find the solutions of $2x^5-3x^4+\pi x^3+x^2-x+\sqrt{3}=0$ (and who doesn’t?) there is literally no way to find an expression for the exact answers.

Numerical methods have really come into their own with the advent of computers, but the idea is a lot older.  The decimal expansion of $\pi$ (3.14159…) never ends and never repeats, which is a big clue that you’ll never find its value exactly.  At the same time, it has some nice properties that make it feasible to calculate $\pi$ to arbitrarily great precision.  In other words: numerical methods.  Back in the third century BC, Archimedes realized that you could approximate $\pi$ by taking a circle with circumference $\pi$, then inscribing a polygon inside it and circumscribing another polygon around it.  Since the circle’s perimeter is always longer than the inscribed polygon’s and always shorter than the circumscribed polygon’s, you can find bounds for the value of $\pi$.

Hexagons inscribed (blue) and circumscribed (red) on a circle with circumference π.  The perimeters of such polygons, in this case p6=3 and P6=2√33.46, must always fall on either side of π≈3.14.

By increasing the number of sides, the polygons hug the circle tighter and produce a closer approximation, from both above and below, of $\pi$.  There are fancy mathematical ways to prove that this method approaches $\pi$, but it’s a lot easier to just look at the picture, consider for a minute, and nod sagely.

Archimedes’ trick wasn’t just noticing that $\pi$ must be between the lengths of the two polygons.  That’s easy.  His true cleverness was in coming up with a mathematical method that takes the perimeters of a given pair of k-sided inscribed and circumscribed polygons with perimeters $p_k$ and $P_k$ and produces the perimeters for polygons with twice the numbers of sides, $p_{2k}$ and $P_{2k}$.  Here’s the method:

$P_{2k}={\frac {2p_{k}P_{k}}{p_{k}+P_{k}}}\quad \quad p_{2k}={\sqrt {p_{k}P_{2k}}}$

By starting with hexagons, where $p_6=3$ and $P_6=2\sqrt{3}$, and doubling the number of sides 4 times Archie found that for inscribed and circumscribed enneacontahexagons $p_{96}=\frac{223}{71}\approx3.14085$ and $P_{96}=\frac{22}{7}\approx3.14286$.  In other words, he managed to nail down $\pi$ to about two decimal places: $3.14085<\pi<3.14286$.

Some puzzlement has been evinced by Mr. Medes’ decision to stop where he did, with just two decimal points in $\pi$.  But not among mathematicians.  The mathematician’s ancient refrain has always been: “Now that I have demonstrated my amazing technique to the world, someone else can do it.”.

To be fair to Archie, this method “converges slowly”.  It turns out that, in general, $p_n=n\sin\left(\frac{\pi}{n}\right)\approx\pi-\frac{\pi^3}{6}\frac{1}{n^2}$ and $P_n=n\tan\left(\frac{\pi}{n}\right)\approx\pi+\frac{\pi^3}{3}\frac{1}{n^2}$.  Every time you double n the errors, $\frac{\pi^3}{3}\frac{1}{n^2}$ and $\frac{\pi^3}{6}\frac{1}{n^2}$, get four times as small (because $2^2=4$), which translates to very roughly one new decimal place every two iterations.  $\pi$ never ends, but still: you want to feel like you’re making at least a little progress.

Some numerical methods involve a degree of randomness and yet still manage to produce useful results.  Speaking of $\pi$, here’s how you can calculate it “accidentally”.  Generate n pairs of random numbers, (x,y), between 0 and 1.  Count up how many times $x^2+y^2\le1$ and call that number k.  If you do this many times, you’ll find that $\frac{4k}{n}\approx\pi$.

If you randomly pick a point in the square, the probability that it will be in the grey region is π/4.

As you generate more and more pairs and tally up how many times $x^2+y^2\le1$ the law of large numbers says that $\frac{k}{n}\to\frac{\pi}{4}$, since that’s the probability of randomly falling in the grey region in the picture above.  This numerical method is even slower than Archimedes’ not-particularly-speedy trick.  According to the central limit theorem, after n trials you’re likely to be within about $\frac{0.41}{\sqrt{n}}$ of $\pi$.  That makes this a very slowly converging method; it takes about half a million trials before you can nail down “3.141”.  This is not worth trying.

Long story short, most applicable math problems cannot be done directly.  Instead we’re forced to use clever approximations and numerical methods to get really close to the right answer (assuming that “really close” is good enough).  There’s no grand proof or philosophy that proves that all these methods work but, in general, if we’re not sure that a given method works, we don’t use it.

Answer Gravy: There are a huge number of numerical methods and entire sub-sciences dedicated to deciding which to use and when.  Just for a more detailed taste of a common (fast) numerical method and the proof that it works, here’s an example of Newton’s Method, named for little-known mathematician Wilhelm Von Method.

Newton’s method finds (approximates) the zeros of a function, f(x).  That is, it finds a value, $\lambda$, such that $f(\lambda)=0$.  The whole idea is that, assuming the function is smooth, when you follow the slope at a given point down you’ll find a new point closer to a zero/solution.  All polynomials are “smooth”, so this is a good way to get around that whole “you can’t find the roots of 5th or higher degree polynomials” thing.

The “big idea” behind Newton’s Method: pick a point (xn), follow the slope, find yourself closer (xn+1), repeat.

The big advantage of Newton’s method is that, unlike the two $\pi$ examples above, it converges preternaturally fast.

The derivative is the slope, so $f^\prime(x_n)$ is the slope at the point $(x_n,f(x_n))$.  Considering the picture above, that same slope is given by the rise, $f(x_n)$, over the run, $x_n-x_{n+1}$.  In other words $f^\prime(x_n)=\frac{f(x_n)}{x_n-x_{n+1}}$ which can be solved for $x_{n+1}$:

$x_{n+1}=x_n-\frac{f(x_n)}{f^\prime(x_n)}$

So given a point near a solution, $x_n$, you can find another point that’s closer to the true solution, $x_{n+1}$.  Notice that if $f(x_n)\approx0$, then $x_{n+1}\approx x_n$.  That’s good: when you’ve got the right answer, you don’t want your approximation to change.

To start, you guess (literally… guess) a solution, call it $x_0$.  With this tiny equation in hand you can quickly find $x_1$.  With $x_1$ you can find $x_2$ and so on.  Although it can take a few iterations for it to settle down, each new $x_n$ is closer than the last to the actual solution.  To end, you decide you’re done.

Say you need to solve $x=\cos(x)$ for x.  Never mind why.  There is no analytical solution (this comes up a lot when you mix polynomials, like x, or trig functions or logs or just about anything).  The correct answer starts with $\lambda=0.739085133215160641655312\ldots$

y=x and y=cos(x). They clearly intersect, but there’s no way to analytically solve for exactly where.

First you write it in such a way that you can apply Newton’s method: $f(x)=\cos(x)-x=0$.  The derivative is $f^\prime(x)=-\sin(x)-1$ and therefore:

$x_{n+1}=x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}=x_n+\frac{\cos\left(x_n\right)-x_n}{\sin\left(x_n\right)+1}$

First make a guess.  I do hereby guess $x_0=3$.  Plug that in and you find that:

$x_1=x_0+\frac{\cos\left(x_0\right)-x_0}{\sin\left(x_0\right)+1}=3+\frac{\cos\left(3\right)-3}{\sin\left(3\right)+1}=-0.496558178297331398840279$

Plug back in what you get out several times and:

$\begin{array}{ll} x_0&=3\\ x_1&=-0.496558178297331398840279\ldots\\ x_2&=2.131003844480994964494021\ldots\\ x_3&=0.689662720778373223587585\ldots\\ x_4&=0.739652997531333829185767\ldots\\ x_5&=0.739085204375836184250693\ldots\\ x_6&=0.739085133215161759778809\ldots\\ x_7&=0.739085133215160641655312\ldots\\ x_8&=0.739085133215160641655312\ldots\\ \end{array}$

In this particular case, $x_0$ through $x_3$ jump around a bit.  Sometimes Newton’s method does this forever (try $x_0=5$) in which case: try something else or make a new guess.  It’s not until $x_4$ that Newton’s method starts to really zero in on the solution.  Notice that (starting at $x_4$) every iteration establishes about twice as many decimal digits than the previous step:

$\begin{array}{ll} \vdots\\ x_4&=0.739\ldots\\ x_5&=0.739085\ldots\\ x_6&=0.73908513321516\ldots\\ x_7&=0.739085133215160641655312\ldots\\ \vdots \end{array}$

We know that Newton’s method works because we can prove that it converges to the solution.  In fact, we can show that it converges quadratically (which is stupid fast).  Something “converges quadratically” when the distance to the true solution is squared with every iteration.  For example, if you’re off by 0.01, then in the next step you’ll be off by around $(0.01)^2=0.0001$.  In other words, the number of digits you can be confident in doubles every time.

Here’s why it works:

A smooth function (which is practically everywhere, for practically any function you might want to write down) can be described by a Taylor series.  In this case we’ll find the Taylor series about the point $x_n$ and use the facts that $0=f(\lambda)$ and $x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}$.

$\begin{array}{rcl} 0&=&f(\lambda) \\[2mm] 0&=&f(x_n+(\lambda-x_n)) \\[2mm] 0&=&f(x_n)+f^\prime(x_n)\left(\lambda-x_n\right)+\frac{1}{2}f^{\prime\prime}(x_n)\left(\lambda-x_n\right)^2+\ldots \\[2mm] 0&=&\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}+\left(\lambda-x_n\right)+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm] 0&=&\lambda-\left(x_n-\frac{f\left(x_n\right)}{f^\prime\left(x_n\right)}\right)+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm] 0&=&\lambda-x_{n+1}+\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \\[2mm] \lambda-x_{n+1}&=&-\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}\left(\lambda-x_n\right)^2+\ldots \end{array}$

The “…” becomes small much faster than $\left(\lambda-x_n\right)^2$ as $x_n$ and $\lambda$ get closer.  At the same time, $\frac{f^{\prime\prime}(x_n)}{2f^\prime\left(x_n\right)}$ becomes effectively equal to $\frac{f^{\prime\prime}(\lambda)}{2f^\prime\left(\lambda\right)}$.  Therefore $\lambda-x_{n+1}\propto\left(\lambda-x_n\right)^2$ and that’s what quadratic convergence is.  Note that this only works when you’re zeroing in; far away from the correct answer Newton’s method can really bounce around.

This entry was posted in -- By the Physicist, Computer Science, Equations, Math. Bookmark the permalink.

6 Responses to Q: Why are numerical methods necessary? If we can’t get exact solutions, then how do we know when our approximate solutions are any good?

1. Tommy Scott says:

Maybe the truth is that we just aren’t smart enough to figure out exact solutions. Perhaps another genius like Sir Isaac Newton will come along some day and invent a new math that solves all these problems.

2. The Physicist says:

@Tommy Scott
Maybe? If so, that new math would be profoundly alien. It can be proven that, for example, the three body problem is impossible to solve in general, so we could have planet-sized brains and still be no more capable of solving it.

3. Tommy Scott says:

@The Physicist, There are many things that people have claimed are unsolvable or those that claim that certain things were fact beyond doubt and yet here we are now knowing better. If you were to claim that something is unsolvable then you are also claiming to have all knowledge. Because it’s just possible that in the 99.9% of all knowledge that you don’t know that there is a solution. For example, I can prove that there is no such thing as an irrational number. But at this moment you cannot even conceive of such a notion. However, you are just a simple explanation away from a revelation that would change your whole perspective on math.

4. The Physicist says:

@Tommy Scott
There’s a big difference between merely claiming that something is impossible and proving that it’s impossible. It’s not for lack of trying that we can’t square the circle, solve 5th or higher degree polynomials, or find a rational value of pi. These things can’t be done and we know why. Trying to solve one of these problems would be like trying to find an odd number that’s divisible by two.

5. Tommy Scott says:

@The Physicist, Actually I believe that I can prove there is a rational value for pi. But I will not demonstrate it here. This is paradigm changing stuff and I’m not just going to lay it at your feet. I assure you, though, there almost certainly must be a rational pi. Once I consult with some mathematicians that I trust, the results will ultimately be published. Unless I am completely wrong in which case I will humbly admit it.

6. Greg ROBERT says:

This is a well written and comprehensive article.
I have but one quibble.

You blurred some what the difference between pure math and applied math. The job for physics is to describe and predict and we find, as someone said, mathematics to be unreasonably good at this. Or is it?

What I’m talking about is the philosopohical debate about whether mathematics is invented or discovered. As well as whether it has anything to do with reality other than providing an extremely useful tool for description/predictions. Whether there is any tangible connection between the two is a question well beyond us at this time.

Just because mathematical equations LOOK like the “truth” that doesn’t mean they actually ARE the truth.