Q: Why is the integral/antiderivative the area under a function?

Physicist: If you’ve taken calculus, then at some point you learned that to find the area under a function (generally written \int_A^B f(x) \, dx) you need to find the anti-derivative of that function.  The most natural response to these types of theorems is “wait… what?… why?”.

This theorem is so important and widely used that it’s called the “fundamental theorem of calculus”, and it ties together the integral (area under a function) with the antiderivative (opposite of the derivative) so tightly that the two words are essentially interchangeable.  However, there are some mathematicians who may take issue with mixing up the two terms.

It comes back (in a roundabout way) to the fact that the derivative of a function is the slope of that function or the “rate of change”.  In what follows “f” is a function, and “F” is its anti-derivative (that is: F’ = f).


Intuitively: Say you’ve got a function f(x), and the area under f(x) (up to some value x) is given by A(x).

Then the statement “the area, A, is given by the anti-derivative of f” is equivalent to “the derivative of A is given by f”.

In other words, the rate at which the area increases (as you slide x to the right) is given by the height, f(x).

For a constant function the area is given by A=cx, and the rate of increase (the amount that the area increases if x increases by 1) is c. Whether or not the function moves around makes no difference. From moment-to-moment the rate of increase is always equal to the height (the value of f).

For example, if the height of the function were 3, then, for a moment, the area under the function is increasing by 3 for every 1 unit of distance you slide to the right.  Keep in mind that the function can move up and down as much as it wants.  As far as the function “knows”, at any particular moment it may as well be constant (dotted line in picture above).

So if the height of the function (which is just the function) is the rate at which the area changes, then f is the derivative of the area: A’=f.  But that’s exactly the same as saying that the area is the anti-derivative of the function.
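If you'd like to see this with numbers, here's a minimal sketch. The example function (sin(x) + 2), the starting point of 0, and the helper names `area` and `area_rate` are all arbitrary choices for illustration. It approximates the area with thin rectangles, nudges x a little in each direction, and compares the resulting rate of change of the area with the height f(x).

```python
import math

def f(x):
    # Example function; any reasonably smooth function would do (assumption).
    return math.sin(x) + 2.0

def area(x, start=0.0, n=20_000):
    # Approximate the area under f from `start` to x using n thin rectangles,
    # each one as tall as f at the middle of its base.
    width = (x - start) / n
    return sum(f(start + (i + 0.5) * width) for i in range(n)) * width

def area_rate(x, h=1e-3):
    # How fast the area grows as x slides to the right, estimated by
    # nudging x a small distance h in each direction.
    return (area(x + h) - area(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0):
    # The last two columns should agree closely: A'(x) is about f(x).
    print(x, area_rate(x), f(x))
```

The two printed columns should match to several decimal places, which is the statement A’=f in numerical form.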


Mathematically: There’s a theorem called the mean value theorem that states that if you have a “smooth” function with no sudden bends or kinks, then over any interval the derivative will be equal to the average slope at least once.  This needs a picture:

Given a smooth function f, there's a point c where the function has the same slope as the overall average slope.

More precisely, if you have a function f on the interval [A,B], then there’s a point c between A and B such that f^\prime (c) = \frac{f(B)-f(A)}{B-A}.  You can just as easily write this as f^\prime (c) (B-A) = f(B)-f(A).  Apply the same theorem to F instead of f and you get f(c) (B-A) = F(B)-F(A) (since F’ =f).

So if you drive 60 miles in one hour, then at some instant you must have been driving at exactly 60 mph, even though for almost the entire trip you may have been traveling much faster or much slower than 60 mph.
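Here's a small sketch that hunts down such a point c numerically. The choice F(x) = x^3 on the interval [0, 2] and the helper name `mean_value_point` are assumptions made for the example. The search is a plain bisection, which works here only because f = F’ is increasing on that interval.

```python
def F(x):
    # Example function (think of it as the position during the trip).
    return x ** 3

def f(x):
    # Its derivative (the speed at each instant): F' = f.
    return 3 * x ** 2

def mean_value_point(A, B, tol=1e-12):
    # Average slope of F over the whole interval [A, B].
    avg = (F(B) - F(A)) / (B - A)
    lo, hi = A, B
    # Bisection: assumes f(x) - avg changes sign exactly once between A and B,
    # which holds for this example because f is increasing on [0, 2].
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(lo) - avg) * (f(mid) - avg) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

A, B = 0.0, 2.0
c = mean_value_point(A, B)
# f(c) * (B - A) should match F(B) - F(A).
print(c, f(c) * (B - A), F(B) - F(A))
```

For this particular F you can also solve for c by hand: 3c^2 = 4, so c = 2/\sqrt{3}.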

Keep that stuff in the back of your mind for a moment, and ponder instead how to go about approximating the area under a function.

You can approximate the area under a function by dividing it up into a whole lot of tiny rectangles. The area of each is the width times the height, where the height is any value of f in that particular interval. Choosing different values does change the area of that rectangle, but it turns out that that doesn't matter.

You can divide up the area between x=A and x=B under a function by putting a mess of rectangles under it.   Divide up the interval [A,B] by picking a string of points x_0, x_1, x_2, …, x_N, and use these as the left and right sides of your rectangles (and set x_0=A and x_N=B).

The point, c_i, that you pick in between each x_{i-1} and x_i is unimportant.  To get the exact area you let N, the total number of rectangles, go flying off to infinity, and you’ll find that the highest value of f and the lowest value of f in each tiny interval get squeezed together.

So, why not choose a value of c_i so that in each rectangle you can say f(c_i) (x_i-x_{i-1}) = F(x_i)-F(x_{i-1})?  The mean value theorem, applied to F on each little interval, guarantees that such a c_i exists.

\begin{array}{ll}area \\\approx \sum_{i=1}^N f(c_i) (x_i-x_{i-1}) \\= \sum_{i=1}^N \left[ F(x_i)-F(x_{i-1}) \right] \\= \left\{ \begin{array}{ll}F(x_1)-F(x_0)\\+F(x_2)-F(x_1)\\+F(x_3)-F(x_2)\\ \cdots \\ +F(x_{N-1})-F(x_{N-2})\\+F(x_N)-F(x_{N-1})\end{array} \right\}\\= F(x_N) - F(x_0)\\= F(B) - F(A)\end{array}

Holy crap!  The area under the function (the integral) is given by the antiderivative!  Again, this approximation becomes an equality as the number of rectangles becomes infinite.
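The rectangle argument can be tried out directly. This is only a sketch under assumptions: f(x) = 3x^2 with F(x) = x^3 on [0, 2] is an arbitrary example, the rectangles are all the same width, and the helper names (`riemann_sum`, `left_end`, `mvt_point`) are made up here. For this particular F the mean value point in each little interval can be written down exactly, so you can compare a "lazy" choice of c_i (the left edge) with the clever choice.

```python
import math

def f(x):
    return 3 * x ** 2

def F(x):
    # Antiderivative of f: F' = f.
    return x ** 3

def riemann_sum(A, B, N, pick):
    # Split [A, B] into N equal-width rectangles; `pick` chooses the point
    # c_i in each interval [x_{i-1}, x_i] where the height f(c_i) is read off.
    xs = [A + (B - A) * i / N for i in range(N + 1)]
    return sum(f(pick(xs[i - 1], xs[i])) * (xs[i] - xs[i - 1])
               for i in range(1, N + 1))

def left_end(a, b):
    # Lazy choice: use the left edge of each rectangle.
    return a

def mvt_point(a, b):
    # Clever choice for F(x) = x^3 with 0 <= a < b:
    # f(c) = 3c^2 = (b^3 - a^3) / (b - a) = a^2 + a*b + b^2.
    return math.sqrt((a * a + a * b + b * b) / 3)

A, B = 0.0, 2.0
exact = F(B) - F(A)
for N in (4, 40, 400):
    # Columns: N, left-edge sum, mean-value-point sum, F(B) - F(A).
    print(N, riemann_sum(A, B, N, left_end), riemann_sum(A, B, N, mvt_point), exact)
```

The left-edge sums creep toward F(B) - F(A) as N grows, while the mean-value-point sums land on it (up to rounding) even with only a handful of rectangles, because the sum telescopes.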


As an aside (for those of you who really wanted to read an entire post about integrals), integrals are surprisingly robust.  That is to say, if your function has a kink in it (the way |x| has a kink at zero, for example) then you can’t find a derivative at that kink, but integrals don’t have that problem.  If there’s a kink or even a discontinuity; no problem!

You can just put the edge of a rectangle at the problem point, and then ignore it.  In fact, think of (almost) any function in your head…  You can take the integral of that.  It may have an infinite value, or something awful like that, but you can still take the integral.
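As a quick illustration, here's a sketch that integrates |x| (a kink at zero) and a made-up step function (a jump at zero) over [-1, 1]. The step function and the helper name `midpoint_sum` are assumptions for the example. Using an even number of equal-width rectangles puts a rectangle edge right at x = 0, so no rectangle straddles the problem point.

```python
def midpoint_sum(g, A, B, N):
    # N equal-width rectangles on [A, B], each as tall as g at the middle
    # of its base.
    width = (B - A) / N
    return sum(g(A + (i + 0.5) * width) for i in range(N)) * width

def step(x):
    # Hypothetical example with a jump at x = 0: height 1 on the left,
    # height 3 on the right.
    return 1.0 if x < 0 else 3.0

# N is even, so x = 0 is the shared edge of two neighboring rectangles.
print(midpoint_sum(abs, -1.0, 1.0, 1000))   # area under |x| (two triangles)
print(midpoint_sum(step, -1.0, 1.0, 1000))  # area under the step function
```

By geometry the two areas are 1 (two triangles of area 1/2) and 4 (a 1-by-1 square plus a 1-by-3 rectangle), and the sums should agree with those up to rounding.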

To make a function that can’t be integrated you have to make it infinitely messed up.  Mathematicians live for this sort of thing.  There is almost nothing in the world they enjoy more than coming up with ways to break each other’s theories.  One of the classic examples is the function f(x) = \left\{ \begin{array}{ll} 0,&\textrm{when x is a rational number}\\1,&\textrm{when x is an irrational number}\end{array}\right.

Over any interval you pick, f still jumps around infinitely often, so the whole “things will get better as the number of rectangles increases” thing can never get off the ground.  There are fixes to this, but they come boiling and howling up out of the ever-darker, stygian abyss that is measure theory.
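To see concretely why the rectangles fail for this function, here's a short worked version using the same kind of sum as before. Every interval, no matter how tiny, contains both rational and irrational numbers, so for any choice of x_0, x_1, …, x_N you're free to pick every c_i rational or every c_i irrational:

```latex
\begin{array}{ll}
\textrm{all } c_i \textrm{ rational:} & \sum_{i=1}^N f(c_i)(x_i-x_{i-1}) = \sum_{i=1}^N 0 \cdot (x_i-x_{i-1}) = 0 \\
\textrm{all } c_i \textrm{ irrational:} & \sum_{i=1}^N f(c_i)(x_i-x_{i-1}) = \sum_{i=1}^N 1 \cdot (x_i-x_{i-1}) = x_N - x_0 = B-A
\end{array}
```

The two answers stay stuck at 0 and B-A no matter how large N gets, so here the choice of c_i never stops mattering.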


58 Responses to Q: Why is the integral/antiderivative the area under a function?

  1. Amphiprion says:

    I don’t know if you will ever see this, but thank you for your clear explanation and well thought out examples. You’re a lifesaver and have my eternal gratitude. I’ve watched multiple lectures and read several explanations but hadn’t understood the fundamental theorem of calculus until reading this post.

  2. The Physicist says:

    @Amphiprion
    Glad to help!

  3. Shuchi Goyal says:

    Loved it. So easy to explain it with mean value theorem.

  4. Rex Balsarin says:

    Thanks for your explanation. First off, am not a student anymore, past the 40 years of age, have never used calculus in my working experience and have never understood why the area is important to find out. Many experts out there just blurt out the same definition of the integral and derivative, but I haven’t found any information of why the area is needed in the first place.
    You and others might be probably laughing and I don’t bear any grudges, if I’ve understood everything I might be laughing as well.
    Even back in high school (let alone the few years of uni, then dropped off) the teachers just skipped that part that I’ve never understood: what is a practical example of why an area under the coordinates that defines a function is needed?
    I’ve only understood what function f(x) really meant in the first year of university, that is it’s the axis Y. The graph is only a representation, that’s it. Graphically speaking, a function that “gets closer but never touches an axis” doesn’t make sense, but if seen f(x) as a value on the ordinate, then TO ME it finally clicks. When explained using only graphs, TO ME it’s just nonsense (referring to certain specific examples as the one above).

    If you feel like answering my question, I’d be happy. If not, thanks for reading.

  5. Chris says:

    Consider
    y=ax+b
    y’=a
    y is inherently an area. It comes from y’x plus a constant.

    When the slope dy/dx is varying, the same reasoning applies. The area is no longer a rectangular shape, but the varying edge still makes the total the same as y.

  6. David Whitelaw says:

    I like the intuitive explanation. It complements the standard theory in the texts. My question is why is the antiderivative (also called the indefinite integral) represented as a definite integral without upper or lower bounds. In the theoretical development (eg Thomas) the indefinite integral is simply assigned this notation with no explanation (ie the notation is arbitrary). Yet in all subsequent theory (eg line integrals with vector dot products) the notation makes sense as a limiting sum over a function times an infinitesimal increment. How is the indefinite integral notation related to the definite integral?

  7. Umer Shahzad says:

    You are suuuuuperb sir i was struggling with this concept for a year and you cleared it in minutes thank you so much

  8. Randy, retired Geophysicist says:

    It’s even easier than using the mean value theorem. Draw a curve and a vertical line segment from the x axis through the curve. The intersection (x,y), represents a general point on the curve. Then draw a segment on either side of the first. Both segments are parallel and close to the first and the intersections are (x-Δx,y-Δy) and (x+Δx,y+Δy). Now, to determine the area function, A(x), under the curve, we have ΔA=(y)(Δx). Then, ΔA/Δx=y or ΔA/Δx=f(x). Now, take the limit as Δx→0 (from left and right). Assuming the limit exists, it must be dA/dx=f(x) or A(x) is the anti-derivative of f(x).
