# Q: Why is the integral/antiderivative the area under a function?

Physicist: If you’ve taken calculus, then at some point you learned that to find the area under a function (generally written $\int_A^B f(x) \, dx$) you need to find the anti-derivative of that function.  The most natural response to these types of theorems is “wait… what?… why?”.

This theorem is so important and widely used that it’s called the “fundamental theorem of calculus”, and it ties together the integral (area under a function) with the antiderivative (opposite of the derivative) so tightly that the two words are essentially interchangeable.  However, there are some mathematicians who may take issue with mixing up the two terms.

It comes back (in a roundabout way) to the fact that the derivative of a function is the slope of that function or the “rate of change”.  In what follows “f” is a function, and “F” is its anti-derivative (that is: F’ = f).

Intuitively: Say you’ve got a function f(x), and the area under f(x) (up to some value x) is given by A(x).

Then the statement “the area, A, is given by the anti-derivative of f” is equivalent to “the derivative of A is given by f”.

In other words, the rate at which the area increases (as you slide x to the right) is given by the height, f(x).

For a constant function the area is given by A=cx, and the rate of increase (the amount that the area increases if x increases by 1) is c. Whether or not the function moves around makes no difference. From moment-to-moment the rate of increase is always equal to the height (the value of f).

For example, if the height of the function were 3, then, for a moment, the area under the function is increasing by 3 for every 1 unit of distance you slide to the right.  Keep in mind that the function can move up and down as much as it wants.  As far as the function “knows”, at any particular moment it may as well be constant (dotted line in picture above).

So if the height of the function (which is just the function) is the rate at which the area changes, then f is the derivative of the area: A’=f.  But that’s exactly the same as saying that the area is the anti-derivative of the function.
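If you'd like to see A’=f happen numerically, here's a minimal sketch. The choice of f (cosine), the starting point 0, and the helper names `area` and `area_growth_rate` are all assumptions made for illustration; none of them come from the argument above. It estimates the area with lots of thin rectangles, nudges x a little, and checks how fast the area grows.

```python
import math

def area(f, a, x, n=100_000):
    """Approximate the area under f from a to x with n midpoint rectangles."""
    width = (x - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def area_growth_rate(f, a, x, h=1e-3):
    """Estimate A'(x): how fast the area grows as x slides to the right."""
    return (area(f, a, x + h) - area(f, a, x - h)) / (2 * h)

f = math.cos          # illustrative choice of function
x = 1.0
rate = area_growth_rate(f, 0.0, x)

# The two printed numbers should agree to several decimal places.
print(rate, f(x))
```

The rate of growth of the area and the height of the function come out nearly identical, and the agreement does not depend on where the area starts being counted from.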

Mathematically: There’s a theorem called the mean value theorem that states that if you have a “smooth” function with no sudden bends or kinks, then over any interval the derivative will be equal to the average slope at least once.  This needs a picture:

Given a smooth function f, there's a point c where the function has the same slope as the overall average slope.

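More precisely, if you have a function on the interval [A,B], then there’s a point c between A and B such that $f^\prime (c) = \frac{f(B)-f(A)}{B-A}$.  You can just as easily write this as $f^\prime (c) (B-A) = f(B)-f(A)$ or $f(c) (B-A) = F(B)-F(A)$ (since F’ =f).  The second version is the same theorem applied to F (whose derivative is f) instead of to f, so its c is generally a different point from the first one.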

So if you drive 60 miles in one hour, then at some instant you must have been driving at exactly 60 mph, even though for almost the entire trip you may have been traveling much faster or much slower than 60 mph.
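Here's a rough sketch of the driving example in code. The trip itself (`position`, 60t³ miles after t hours) is made up for illustration, and the bisection helper `find_mvt_point` is just one convenient way to hunt for the point c; the theorem only promises that c exists, not how to find it.

```python
def position(t):
    """Hypothetical trip: miles travelled after t hours (60 miles at t = 1)."""
    return 60.0 * t**3

def speed(t):
    """Derivative of position: instantaneous speed in mph."""
    return 180.0 * t**2

def find_mvt_point(df, target, lo, hi, tol=1e-12):
    """Bisection for a point c in [lo, hi] where df(c) equals target.
    Assumes df(lo) - target and df(hi) - target have opposite signs."""
    g_lo = df(lo) - target
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = df(mid) - target
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # sign change is in the right half
        else:
            hi = mid                # sign change is in the left half
    return 0.5 * (lo + hi)

A, B = 0.0, 1.0
average_speed = (position(B) - position(A)) / (B - A)
c = find_mvt_point(speed, average_speed, A, B)
print(average_speed, c, speed(c))
```

On this made-up trip the car starts from rest and finishes at 180 mph, yet there is still an instant c strictly inside the hour where the speedometer reads exactly the 60 mph average.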

Keep that stuff in the back of your mind for a moment, and ponder instead how to go about approximating the area under a function.

You can approximate the area under a function by dividing it up into a whole lot of tiny rectangles. The area of each is the width times the height, where the height is any value of f in that particular interval. Choosing different values does change the area of that rectangle, but it turns out that that doesn't matter.

You can divide up the area between x=A and x=B under a function by putting a mess of rectangles under it.   Divide up the interval [A,B] by picking a string of points $x_0, x_1, x_2, \ldots, x_N$, and use these as the left and right sides of your rectangles (and set $x_0=A$ and $x_N=B$).

The point, $c_i$, that you pick in between each $x_{i-1}$ and $x_i$ is unimportant.  To get the exact area you let N, the total number of rectangles, go flying off to infinity, and you’ll find that the highest value of f and the lowest value of f in each tiny interval get squeezed together.
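A quick way to watch the squeezing happen is to compare rectangles built from the left edge of each interval with rectangles built from the right edge. This is only a sketch: the function (x² on [0,3], where the true area is 9), the equal-width rectangles, and the helper name `riemann_sum` are assumptions chosen to keep things simple.

```python
def riemann_sum(f, a, b, n, pick):
    """Sum of n equal-width rectangles; pick in [0, 1] chooses the sample
    point in each interval (0 = left edge, 0.5 = midpoint, 1 = right edge)."""
    width = (b - a) / n
    return sum(f(a + (i + pick) * width) for i in range(n)) * width

def f(x):
    return x**2   # illustrative choice; increasing on [0, 3]

for n in (10, 100, 1000, 10000):
    left = riemann_sum(f, 0.0, 3.0, n, 0.0)    # uses the lowest value in each interval
    right = riemann_sum(f, 0.0, 3.0, n, 1.0)   # uses the highest value in each interval
    print(n, left, right, right - left)
```

Because this f is increasing, the left edges give the smallest possible rectangles and the right edges the largest, so every other choice of sample point is trapped between them. The gap between the two shrinks like 1/N.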

So, why not choose a value of $c_i$ so that in each rectangle you can say $f(c_i) (x_i-x_{i-1}) = F(x_i)-F(x_{i-1})$?

$\begin{array}{ll}\textrm{area} \\\approx \sum_{i=1}^N f(c_i) (x_i-x_{i-1}) \\= \sum_{i=1}^N \left[ F(x_i)-F(x_{i-1}) \right] \\= \left\{ \begin{array}{ll}F(x_1)-F(x_0)\\+F(x_2)-F(x_1)\\+F(x_3)-F(x_2)\\ \cdots \\ +F(x_{N-1})-F(x_{N-2})\\+F(x_N)-F(x_{N-1})\end{array} \right\}\\= F(x_N) - F(x_0)\\= F(B) - F(A)\end{array}$

Holy crap!  The area under the function (the integral) is given by the antiderivative!  Again, this approximation becomes an equality as the number of rectangles becomes infinite.
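To make the telescoping concrete, here's a small sketch with one particular function where the special points $c_i$ can be solved for by hand. Everything specific here is an assumption for illustration: f(x)=x², its antiderivative F(x)=x³/3, the interval [1,4], and the helper names. For this f, the mean value theorem point on an interval [l,r] satisfies c² = (l² + lr + r²)/3.

```python
import math

def f(x):
    return x**2

def F(x):
    """An antiderivative of f: F'(x) = f(x)."""
    return x**3 / 3

def mvt_point(left, right):
    """The c in [left, right] with f(c) * (right - left) == F(right) - F(left).
    Solved by hand for f(x) = x**2, assuming 0 <= left < right."""
    return math.sqrt((left**2 + left * right + right**2) / 3)

def mvt_riemann_sum(a, b, n):
    """Rectangle sum over n equal intervals, sampling f at each interval's MVT point."""
    edges = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f(mvt_point(l, r)) * (r - l) for l, r in zip(edges, edges[1:]))

for n in (1, 2, 5, 50):
    print(n, mvt_riemann_sum(1.0, 4.0, n), F(4.0) - F(1.0))
```

With these hand-picked points the sum of rectangles equals F(B)-F(A) (up to rounding) for every N, even N=1. The only approximation left is between the rectangles and the true area, and that is the part that goes away as N grows.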

As an aside (for those of you who really wanted to read an entire post about integrals), integrals are surprisingly robust.  That is to say, if your function has a kink in it (the way |x| has a kink at zero, for example) then you can’t find a derivative at that kink, but integrals don’t have that problem.  If there’s a kink or even a discontinuity; no problem!

You can just put the edge of a rectangle at the problem point, and then ignore it.  In fact, think of (almost) any function in your head…  You can take the integral of that.  It may have an infinite value, or something awful like that, but you can still take the integral.
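As a sketch of the "put the edge of a rectangle at the problem point" trick, the snippet below splits the interval at the trouble spot and adds up rectangles on each side. The two test functions (|x| with its kink at zero, and a made-up `step` function that jumps from 1 to 3 at zero) and the helper names are assumptions for illustration.

```python
def midpoint_sum(f, a, b, n=1000):
    """Sum of n equal-width rectangles, each as tall as f at its midpoint."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def integrate_around(f, a, b, problem_point, n=1000):
    """Put a rectangle edge exactly at problem_point by splitting [a, b] there."""
    return midpoint_sum(f, a, problem_point, n) + midpoint_sum(f, problem_point, b, n)

def step(x):
    """Jumps from 1 to 3 at x = 0 (a discontinuity)."""
    return 1.0 if x < 0 else 3.0

kink_area = integrate_around(abs, -1.0, 2.0, 0.0)    # two triangles: 0.5 + 2
jump_area = integrate_around(step, -1.0, 2.0, 0.0)   # two rectangles: 1 + 6
print(kink_area, jump_area)
```

Neither function has a derivative at zero, but no rectangle ever straddles that point, so the sums never notice.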

To make a function that can’t be integrated you have to make it infinitely messed up.  Mathematicians live for this sort of thing.  There is almost nothing in the world they enjoy more than coming up with ways to break each other’s theories.  One of the classic examples is the function $f(x) = \left\{ \begin{array}{ll} 0,&\textrm{when x is a rational number}\\1,&\textrm{when x is an irrational number}\end{array}\right.$

Over any interval you pick, f still jumps around infinitely often, so the whole “things will get better as the number of rectangles increases” thing can never get off the ground.  There are fixes to this, but they come boiling and howling up out of the ever-darker, stygian abyss that is measure theory.
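To see why, here is a short sketch of the rectangle argument applied to this f, comparing the smallest and largest sums you could possibly build (a standard way to frame it, though the post itself doesn't spell it out). Every interval, no matter how tiny, contains both rational and irrational numbers, so the lowest value of f in it is 0 and the highest is 1:

$\begin{array}{l}\textrm{lowest possible sum} = \sum_{i=1}^N 0 \cdot (x_i-x_{i-1}) = 0\\ \textrm{highest possible sum} = \sum_{i=1}^N 1 \cdot (x_i-x_{i-1}) = x_N - x_0 = B-A\end{array}$

The two never get squeezed together, however large N is, so the choice of $c_i$ never stops mattering.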


### 37 Responses to Q: Why is the integral/antiderivative the area under a function?

1. daniel vaca says:

neat!

What did you use to generate the plots?

2. The Physicist says:

Power Point

3. Borodin says:

I am two thirds mathematician and one third physicist, and am very upset by your explanation.

I have never heard of antiderivative, but perhaps that is because I am from England. Even so, it seems as useful as using antiquicker instead of distance.

You seem to be antiabstracting notions to a physical graph, from where you can explain the idea of ‘area under’ a function in geometric terms. You have devised a ladder back out of the hole you imagined yourself into.

When I accelerate to work there is no ‘area under’ my journey: I stop and start as I please, and when I arrive I have covered enough miles for me to walk the remaining distance to my desk.

IMO you do mathematics a disservice by pretending that there is an x and a f(x) that can be drawn on a piece of paper and crayoned in. Even the relationship between trigonometry and imaginary numbers breaks down straight away.

4. The Physicist says:

British people really don’t use the term “anti-derivative” or crayons?

5. Cardshark says:

Borodin: anyone who’s taken an undergraduate degree in mathematics and has done a first year analysis course should have heard of an “antiderivative”! The more-meaty proof (which is what was explained heuristically, and usefully, above) is proving that this antiderivative is in fact the original function.

I don’t understand why it does maths a disservice by pretending there’s an x and an f(x)? No one’s pretending – we’re just saying that you could write it down in those terms if you wished and you could also find areas etc. also if you wished.

In England we don’t use crayons, no. We still use quill and ink of course – the Queen doesn’t allow otherwise.

6. Borodin says:

I have never heard “anti-derivative” before, although of course it is possible I blinked at the wrong times. And, erm, yes we have crayons – don’t you? I can’t think what American for crayon might be. I survived my time in Kansas City by being able to say “vanilla ice cream”.

7. OhMyEinstein says:

Why F’ = f?

8. The Physicist says:

It’s just a definition. I’m defining it that way.

9. Al says:

I have a question regarding the paragraph, “More precisely, if you have…”

We end by saying F(B)-F(A) gives the area under the curve and it all seemed to make logical sense. However, if I then return to the aforementioned paragraph and use what I’ve just learned, I come to the conclusions that

“f(c)(B-A) = F(B) – F(A)”

f(c)(B-A) is the area under the middle chart. However, it looks very much larger. It is also not intuitive to me that the x-coordinate, c, whose derivative matches the average gradient has a height that matches the corresponding average height. What am I doing wrong?

Many thanks.

10. The Physicist says:

Different c’s.
What you’ve done is turned the mean value theorem on its head. Your statement boils down to: “for a smooth function, there exists a “c” between “A” and “B”, such that f(c) is equal to the average height of the function”. It’s this statement that actually gives the “mean value theorem” its name.
Imagine that the area under the curve is made of wax. When you melt it you end up with a rectangle with the same area and the same base length (B-A), but its height will be the average height of the original function.
So, if F(B) – F(A) is the area, then [F(B)-F(A)]/(B-A) is the average (mean) height. And the mean value theorem states that, for a smooth function, the function will assume its “mean value” at some point (c).

11. Al says:

Thank you for responding so quickly but I’m not sure that entirely answered my question. Do you mean to say you are referring to different c’s between the two equations:
f’(c)(B-A)=f(B)-f(A) and f(c)(B-A)=F(B)-F(A)
in which case how can I justify this step? I’m comfortable with the idea that on the interval [A,B] there exists a c whose gradient matches the average gradient and I’m comfortable with the idea that there exists a c whose f(c) corresponds to the average height and that these are not necessarily the same c. But I don’t understand how to get between these two equations.

Many thanks.

12. The Physicist says:

There’s no generalizable relationship between the c’s for F, f, f’, f”, …
The Mean Value Theorem is just a property of functions. It’s nice to have a picture (the post had one, and you found another) to understand what’s going on, but it’s not the whole story.
The MVT just says: “for a smooth function, f, on (A,B) there’s at least one value, c, between A and B such that (B-A) f’(c) = f(B)-f(A)”. It establishes a relation between functions and their derivatives (f and f’) or (exactly equivalently) between anti-derivatives of functions and their original functions (F and f).
Did that just run us in circles?

13. Shikhi says:

@the physicist

thank you very much for the post..exactly what i was looking for..a mathematical proof that the antiderivative is indeed equal to the area under the curve…and i didnt find any flaws in it..pretty straightforward. i didnt think the MVT was of much use, but guess it is.

14. Al says:

I get it!
This whole time I’ve been struggling to see how you got from f’(c)(B-A)=f(B)-f(A) to f(c)(B-A)=F(B)-F(A) as though you were somehow integrating it up but you’re not doing that. You’re simply applying MVT to F and f as opposed to f and f’, and saying there must be some other (unrelated) c on the interval [A,B] that satisfies:

f(c)(B-A)=F(B)-F(A)

Physicist. Sorry for being so slow. Thanks for all your help.

15. The Physicist says:

Thank you for helping the other people with exactly the same question!

16. neeserg says:

this proof was in my back of my head, but, because of the confusion between F(x), f(x) and f ‘(x) graphs, i would always lose track writing it down in paper. thanks to this now i have, hopefully, a permanent concept of integration.

17. Dane says:

So, would it be safe to say that when the number of rectangles approaches infinity the mean value theorem for integrals becomes an identity rather than a relationship? In other words, do the two endpoints of each rectangle become synonymous with the point c as the distance between the endpoints approaches zero? If this is the case then I get it! Rather simple compared to Lebesgue measure I bet!

Mechanical Engineering Major
Thanks Alot

18. The Physicist says:

Basically, yes!
What you can do is show that it doesn’t matter what point you pick. As the rectangles get thinner and thinner they’ll all give you very nearly the same answer. One of those points is “c”, so any point (for a very thin rectangle) will give you about the same answer as c. As the width of every rectangle is taken to zero, the difference between “nearly the same” and “exactly the same” disappears.

19. Guston says:

@physicist: About the method you use in this post to show the relationship between antiderivative and area under a curve. Is it your own method, or is it someone else’s? Does it have a popular name?

20. The Physicist says:

The “intuitive” part of the post is my own approach (not a proof), but the “mathematical” part is just the standard proof.

21. Ko says:

Sooo I tried to explain to myself how u go from f’(c)(B-A)=f(B)-f(A) to f(c)(B-A)=F(B)-F(A) and even after reading al’s comments and your concomitant answers I still end up nowhere. I do fully (at least that’s what my mind is saying) comprehend f’(c)(B-A)=f(B)-f(A), but I cannot make the logical transition to f(c)(B-A)=F(B)-F(A). Please help me put an end to this struggle

22. Ko says:

So is it because the relationship between f’ and f is exactly the same as that of f and F and vice versa ?

23. The Physicist says:

Yup! Exactly.

24. HishoBoB says:

Hey guys, I’m a Lebanese GS (physics-math) student. I’d like to travel abroad but I don’t know if my level is sufficient to do that, so if someone would help me... We’re taking: logic, metric relations, irrational functions, parametric curves, conics (hyperbola, ellipse, parabola, curves of 2nd degree), level curves, mean value theorem, applications on complex numbers, transformation planes, complements for integrals, numerical sequences, sphere, functions (limits, continuity, derivative), inverse functions, trigonometric functions, systems of linear equations, vector product-mixed product, lines and planes in space, complex numbers, integration, logarithms, exponentials, differential equations, binary operations, statistics, counting, probability.

25. HishoBoB says:

We’ve taken this proof, and moreover we had a test (4 hour tests usually); in it she asked us to prove that the area of a circle is $\pi r^2$, and we didn’t do such a thing in class, and we had only solved 1/4 of the integral course at that time.

26. Jam says:

Hi,

Thank you for that explanation. I’m a physicist (not really, just have a bachelor’s degree) and I never really questioned why the slope of the tangent line was the opposite of the area under the curve. This makes perfect sense.

27. Sung says:

“f ‘(c) (B-A) = f(B)-f(A) or f(c) (B-A) = F(B)-F(A) (since F’ =f).”

This doesn’t make any sense to me. And, if this is wrong, your entire derivation is wrong!

28. The Physicist says:

You’re right, it would be.
The mean value theorem is a statement about any function and the derivative of that function.
If the function is $f$, the derivative is $f^\prime$.
If the function is $F$, the derivative is $f$.

29. som says:

If we integrate f(x) with dx, it gives the area between the curve f(x) and the x axis with appropriate limits.
In the same way, if we integrate f(x) with dg(x), can it show the area between both curves f(x) and g(x) or not, and what limits do we use in ……. Int.f(x)dg(x).d(x)/d(x)

30. vishal says:

can u give the proof that integration is inverse of differentiation??

31. John Gabriel says:

It is a fallacy that integration is the reverse of differentiation.

An ante-derivative or primitive function can be used in the “summation” process, but it need not be. Finding an ante-derivative is NOT integration. Integration involves determining the product of two averages (more about this in my New Calculus). Integration has nothing to do with summation which stems from the idiotic ideas of Leibniz and Riemann.

32. John Gabriel says:

@The Physicist:

You wrote:
The mean value theorem is a statement about any function and the derivative of that function.
If the function is $f$, the derivative is $f^\prime$.
If the function is $F$, the derivative is $f$.

Correction: The mean value theorem states that the average value of the ordinates of any function f ‘ (x) on the interval (a,b) is given by the ratio: {f(b)-f(a)}/b-a

33. John Gabriel says:

One of the non-mathematicians who is a moderator on this site, deleted one of my comments. That comment contained a link:

http://www.spacetimeandtheuniverse.com/math/6661-0-999-really-equal-1-a-5.html#post23696

This link is the actual explanation of the connection between derivative and integral. All else is rubbish.

34. John Gabriel says:

@Sung wrote: “f ‘(c) (B-A) = f(B)-f(A) or f(c) (B-A) = F(B)-F(A) (since F’ =f).”

This doesn’t make any sense to me. And, if this is wrong, your entire derivation is wrong!

Gabriel: “f ‘(c) (B-A) = f(B)-f(A) or f(c) (B-A) = F(B)-F(A) (since F’ =f).” is correct.

35. gbc says:

I’m having trouble believing the area under the curve should approach the area of rectangles under the curve as the width of those rectangles decreases. It makes sense that the error for a particular rectangle decreases, but there are also more rectangles, so it’s not clear to me that the sum of those errors decreases or ever reaches zero.

So,

1. How do you know the decrease in error per rectangle is dominant over the quantity of rectangles as the width of the rectangles approaches zero?

2. How do you know the error reaches (not just approaches) zero at the limit?

36. GentleMathematician says:

@Borodin: I am a mathematician who did his mathematical training in England (at what is generally considered an important and famous UK university) and who now works in North America. I can tell you that the expression “antiderivative” is used widely on both sides of the pond. It is standard terminology in calculus, and it is not made up. In any case, it doesn’t matter because the author of the post explains clearly what he / she means by “antiderivative”. A definition is a definition. Mathematicians must introduce new definitions all of the time, in order to communicate their research effectively. As long as a definition is written clearly, it is “fair game”. If we did not introduce new words as we come across new ideas, mathematics would grind to a halt.

I don’t think the author is doing any kind of disservice to mathematics. When you write “When I accelerate to work there is no ‘area under’ my journey…”, I see that you are failing to understand the point of the author’s article, and to understand the necessity for abstraction in the development of mathematical tools.