Q: Is there an intuitive proof for the chain rule?

Physicist: The chain rule is a tool from calculus that says that if you have one function “nested” inside of another, $f(g(x))$, then the derivative of the whole mess is given by $\frac{df}{dx}(g(x))\cdot \frac{dg}{dx}(x)$. There are a number of ways to prove this, but one of the more enlightening ways to look at the chain rule (without rigorously proving it) is to look at what happens to any function, $f(x)$, when you muck about with the argument (the “x” part).

Doubling the "argument" of a function scrunches it up. As a result the slope at each corresponding part of the new function is doubled.

When you multiply the argument by some amount, the graph of the function gets squished horizontally by the same amount. If you, for example, plug in “x=3” to $f(2x)$, that’s exactly the same as plugging in “x=6” to $f(x)$. For $f(2x)$, everything happens at half the original x value.

However, while $f(2x)$ when x=3 is the same as $f(x)$ when x=6, the same is not true of their slopes. The slope (derivative) is “rise over run” and the run just became half as long, so the slope just got twice as big. Scrunching a graph makes the slope steeper (see picture above).

So, the slope of $f(2x)$ at x=3 is actually double the slope of $f(x)$ at x=6. You can write this in general as $\frac{d}{dx} \left[ f(2x) \right] = \frac{df}{dx}(2x)\cdot 2$.
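If you'd like to see the doubling happen with actual numbers, here is a minimal sketch in Python. It assumes $f$ is sine (any smooth function would do) and estimates slopes with a small "rise over run"; the helper name `slope` and the step size `h` are just choices made for this illustration.

```python
import math

def slope(func, x, h=1e-6):
    """Estimate the slope of func at x using a small symmetric rise over run."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Stand-in for f(x); sine is an assumption made only for this example.
f = math.sin

def f_scrunched(x):
    """The same function with its argument doubled: f(2x)."""
    return f(2 * x)

slope_original = slope(f, 6.0)             # slope of f(x) at x = 6
slope_scrunched = slope(f_scrunched, 3.0)  # slope of f(2x) at x = 3

# The two points have the same height, but the scrunched slope is twice as steep.
ratio = slope_scrunched / slope_original
print(ratio)
```

The printed ratio should come out very close to 2, up to the small error in the numerical slope estimate.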

Here’s the calculus leap: replacing the x in $f(x)$ with 2x means that you’re running through the function twice as fast, so when you take the derivative you just multiply by two to deal with the scrunching. But if you instead replace x with a more complicated function, $g(x)$, then the amount of speed-up or slow-down changes from point to point, depending on $g(x)$. If $g(x)$ has a slope of 2 at some point, then near that point it’s acting like 2x and you get the same “times two” slope. If it’s got a slope of 3 or 1/5, then the slope of $f$ at the corresponding point will be multiplied by 3 or 1/5 respectively.
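One way to make “acting like 2x” a little more concrete (a sketch, not a rigorous proof): near a point x, for a small step $h$, a smooth $g$ is approximately a straight line whose slope is $\frac{dg}{dx}(x)$:

$g(x+h) \approx g(x) + \frac{dg}{dx}(x)\cdot h$

So, as far as $f$ is concerned, a step of $h$ in x looks like a step of $\frac{dg}{dx}(x)\cdot h$ in its argument. That’s the same rescaling of the “run” that you get from replacing x with 2x, just with $\frac{dg}{dx}(x)$ in place of 2.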

sin(x) in blue and sin(x^2/4) in green. x^2 starts slow and gets faster and faster, and as a result the green line gets steeper and steeper.

So, to find the slope of $f(g(x))$, which is just the derivative, $\frac{d}{dx}\left[f(g(x))\right]$, you first find what the slope of $f$ would be at the corresponding point, $\frac{df}{dx}(g(x))$, and then multiply by how much $g$ is speeding things up or slowing things down (scrunching or expanding). The slope of $g$ is just the derivative, so you’re multiplying by $\frac{dg}{dx}(x)$.

Boom! Chain rule: $\frac{d}{dx}\left[ f(g(x))\right] = \frac{df}{dx} (g(x))\cdot \frac{dg}{dx} (x)$
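As a sanity check, here is a short Python sketch that uses the example from the picture, $\sin(x^2/4)$. With $f(u) = \sin(u)$ and $g(x) = x^2/4$, the chain rule predicts a slope of $\cos(x^2/4)\cdot \frac{x}{2}$. The code compares that prediction to a direct numerical estimate of the slope; the function names and the sample points are just choices made for this illustration.

```python
import math

def slope(func, x, h=1e-6):
    """Estimate the slope of func at x using a small symmetric rise over run."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(u):
    return math.sin(u)

def df(u):
    return math.cos(u)   # derivative of sin

def g(x):
    return x**2 / 4

def dg(x):
    return x / 2         # derivative of x^2/4

def composite(x):
    """The nested function f(g(x)) = sin(x^2/4)."""
    return f(g(x))

def chain_rule(x):
    """Slope predicted by the chain rule: df/dx at g(x), times dg/dx at x."""
    return df(g(x)) * dg(x)

# Sample points chosen arbitrarily for the comparison.
points = [0.5, 1.0, 2.0, 3.0]
for x in points:
    numeric = slope(composite, x)
    predicted = chain_rule(x)
    print(f"x={x}: numeric={numeric:.6f}, chain rule={predicted:.6f}")
```

The two columns should agree to the digits shown, which is the “slope of $f$ times slope of $g$” picture at work.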

It’s worth pointing out that, like all calc rules, it doesn’t matter that this rule only talks about two functions. If you have something like $f(g(h(x)))$, then you can treat $g(h(x))$ as one function, and you’ll find that after running through the chain rule once you’ll be faced with another, simpler, chain rule problem: