# Q: Why is the integral/antiderivative the area under a function?

Physicist: If you’ve taken calculus, then at some point you learned that to find the area under a function (generally written $\int_A^B f(x) \, dx$) you need to find the anti-derivative of that function.  The most natural response to these types of theorems is “wait… what?… why?”.

This theorem is so important and widely used that it’s called the “fundamental theorem of calculus”, and it ties together the integral (area under a function) with the antiderivative (opposite of the derivative) so tightly that the two words are essentially interchangeable.  However, there are some mathematicians who may take issue with mixing up the two terms.

It comes back (in a roundabout way) to the fact that the derivative of a function is the slope of that function or the “rate of change”.  In what follows “f” is a function, and “F” is its anti-derivative (that is: F’ = f).

Intuitively: Say you’ve got a function f(x), and the area under f(x) (up to some value x) is given by A(x).

Then the statement “the area, A, is given by the anti-derivative of f” is equivalent to “the derivative of A is given by f”.

In other words, the rate at which the area increases (as you slide x to the right) is given by the height, f(x).

For a constant function the area is given by A=cx, and the rate of increase (the amount that the area increases if x increases by 1) is c. Whether or not the function moves around makes no difference. From moment-to-moment the rate of increase is always equal to the height (the value of f).

For example, if the height of the function were 3, then, for a moment, the area under the function is increasing by 3 for every 1 unit of distance you slide to the right.  Keep in mind that the function can move up and down as much as it wants.  As far as the function “knows”, at any particular moment it may as well be constant (dotted line in picture above).

So if the height of the function (which is just the function) is the rate at which the area changes, then f is the derivative of the area: A’=f.  But that’s exactly the same as saying that the area is the anti-derivative of the function.
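Here's a rough numerical check of that claim, as a sketch rather than a proof. The helper names `area_under` and `area_growth_rate`, the choice of cosine as the example function, and the step size `h` are all assumptions made for illustration. The idea is to measure the thin sliver of area added when x slides right by a small amount h, divide by h, and compare the result with the height f(x).

```python
import math

def area_under(f, a, b, n=10_000):
    """Approximate the area under f between a and b using n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def area_growth_rate(f, x, h=1e-4):
    """Estimate A'(x): the area added between x and x + h, divided by h."""
    return area_under(f, x, x + h) / h

f = math.cos   # example function (an arbitrary choice)
x = 1.0
rate = area_growth_rate(f, x)
print(rate, f(x))   # the two numbers should be close to each other
```

Shrinking `h` should bring the two numbers closer together, which is the "for a moment, the function may as well be constant" idea in numerical form.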

Mathematically: There’s a theorem called the mean value theorem that states that if you have a “smooth” function with no sudden bends or kinks, then over any interval the derivative will be equal to the average slope at least once.  This needs a picture:

Given a smooth function f, there's a point c where the function has the same slope as the overall average slope.

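More precisely, if you have a function on the interval [A,B], then there’s a point c between A and B such that $f^\prime (c) = \frac{f(B)-f(A)}{B-A}$.  You can just as easily write this as $f^\prime (c) (B-A) = f(B)-f(A)$.  The theorem holds for any smooth function, so you can also apply it to F, whose derivative is f, which gives $f(c) (B-A) = F(B)-F(A)$ (since F’ =f).  The c in this last equation generally isn’t the same point as the c you get for f.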

So if you drive 60 miles in one hour, then at some instant you must have been driving at exactly 60 mph, even though for almost the entire trip you may have been traveling much faster or much slower than 60 mph.
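To make the theorem concrete, here's a small sketch that hunts down such a point c numerically. Everything specific in it is an assumption chosen for illustration: the example function x³ on [0, 2], its hand-computed derivative, and the hypothetical helper `bisect`, which is plain bisection and only works here because the derivative minus the average slope changes sign across the interval.

```python
def bisect(g, lo, hi, tol=1e-12):
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if (g_lo < 0) == (g_mid < 0):
            lo, g_lo = mid, g_mid   # root is in the right half
        else:
            hi = mid                # root is in the left half
    return (lo + hi) / 2

A, B = 0.0, 2.0
f = lambda x: x**3             # example function (an arbitrary choice)
f_prime = lambda x: 3 * x**2   # its derivative, worked out by hand

average_slope = (f(B) - f(A)) / (B - A)
c = bisect(lambda x: f_prime(x) - average_slope, A, B)
print(c, f_prime(c), average_slope)   # f_prime(c) should match average_slope
```

For this particular function the point can also be found by hand: 3c² = 4, so c = 2/√3.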

Keep that stuff in the back of your mind for a moment, and ponder instead how to go about approximating the area under a function.

You can approximate the area under a function by dividing it up into a whole lot of tiny rectangles. The area of each is the width times the height, where the height is any value of f in that particular interval. Choosing different values does change the area of that rectangle, but it turns out that that doesn't matter.

You can divide up the area between x=A and x=B under a function by putting a mess of rectangles under it.   Divide up the interval [A,B] by picking a string of points $x_0, x_1, x_2, \ldots, x_N$, and use these as the left and right sides of your rectangles (and set $x_0=A$ and $x_N=B$).

The point, $c_i$, that you pick in between each $x_{i-1}$ and $x_i$ is unimportant.  To get the exact area you let N, the total number of rectangles, go flying off to infinity, and you’ll find that the highest value of f and the lowest value of f in each tiny interval get squeezed together.

So, why not choose a value of $c_i$ (the mean value theorem guarantees there is one) so that in each rectangle you can say $f(c_i) (x_i-x_{i-1}) = F(x_i)-F(x_{i-1})$?

$\begin{array}{ll}area \\\approx \sum_{i=1}^N f(c_i) (x_i-x_{i-1}) \\= \sum_{i=1}^N F(x_i)-F(x_{i-1}) \\= \left\{ \begin{array}{ll}F(x_1)-F(x_0)\\+F(x_2)-F(x_1)\\+F(x_3)-F(x_2)\\ \cdots \\ +F(x_{N-1})-F(x_{N-2})\\+F(x_N)-F(x_{N-1})\end{array} \right\}\\= F(x_N) - F(x_0)\\= F(B) - F(A)\end{array}$

Holy crap!  The area under the function (the integral) is given by the antiderivative!  Again, this approximation becomes an equality as the number of rectangles becomes infinite.
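If you'd like to watch this happen with actual numbers, here's a minimal sketch. The names `riemann_sum` and `choices`, the example function x² with antiderivative x³/3, and the interval [0, 3] are assumptions picked for illustration. The "mvt" choice uses a formula for $c_i$ worked out by hand for this one function (solve $c^2 (r-l) = \frac{r^3-l^3}{3}$ for c), so it isn't a general recipe.

```python
import math

def riemann_sum(f, A, B, N, pick):
    """Add up f(c_i) * (x_i - x_{i-1}) over N equal-width rectangles.

    pick(left, right) chooses the sample point c_i inside each rectangle.
    """
    width = (B - A) / N
    total = 0.0
    for i in range(1, N + 1):
        left, right = A + (i - 1) * width, A + i * width
        total += f(pick(left, right)) * (right - left)
    return total

f = lambda x: x**2       # example function (an arbitrary choice)
F = lambda x: x**3 / 3   # its antiderivative, so F' = f
A, B = 0.0, 3.0
exact = F(B) - F(A)

choices = {
    "left": lambda l, r: l,
    "right": lambda l, r: r,
    "middle": lambda l, r: (l + r) / 2,
    # The mean value theorem point for f(x) = x**2 only (derived by hand).
    "mvt": lambda l, r: math.sqrt((l * l + l * r + r * r) / 3),
}

for N in (10, 100, 1000):
    errors = {name: riemann_sum(f, A, B, N, pick) - exact
              for name, pick in choices.items()}
    print(N, errors)
```

The left, right, and middle errors should shrink as N grows, while the "mvt" error should sit at rounding-error level even for N = 10, since each rectangle then contributes exactly $F(x_i)-F(x_{i-1})$ and the sum telescopes.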

As an aside (for those of you who really wanted to read an entire post about integrals), integrals are surprisingly robust.  That is to say, if your function has a kink in it (the way |x| has a kink at zero, for example) then you can’t find a derivative at that kink, but integrals don’t have that problem.  If there’s a kink or even a discontinuity, no problem!

You can just put the edge of a rectangle at the problem point, and then ignore it.  In fact, think of (almost) any function in your head…  You can take the integral of that.  It may have an infinite value, or something awful like that, but you can still take the integral.

To make a function that can’t be integrated you have to make it infinitely messed up.  Mathematicians live for this sort of thing.  There is almost nothing in the world they enjoy more than coming up with ways to break each other’s theories.  One of the classic examples is the function $f(x) = \left\{ \begin{array}{ll} 0,&\textrm{when x is a rational number}\\1,&\textrm{when x is an irrational number}\end{array}\right.$

Over any interval you pick, f still jumps around infinitely often, so the whole “things will get better as the number of rectangles increases” thing can never get off the ground.  There are fixes to this, but they come boiling and howling up out of the ever-darker, stygian abyss that is measure theory.
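To see why the rectangles never settle down, here's a sketch of the two extreme ways of picking the sample points $c_i$, for any division of [A,B] into N rectangles. It leans on the standard fact that every interval of positive width contains both rational and irrational numbers, so both choices are always available.

$\begin{array}{ll}\textrm{every } c_i \textrm{ rational:} & \sum_{i=1}^N f(c_i) (x_i-x_{i-1}) = \sum_{i=1}^N 0 \cdot (x_i-x_{i-1}) = 0\\ \textrm{every } c_i \textrm{ irrational:} & \sum_{i=1}^N f(c_i) (x_i-x_{i-1}) = \sum_{i=1}^N 1 \cdot (x_i-x_{i-1}) = x_N - x_0 = B-A\end{array}$

No matter how large N gets, the two sums stay stuck at 0 and B-A, so there's no single value for them to agree on.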


### 50 Responses to Q: Why is the integral/antiderivative the area under a function?

1. daniel vaca says:

neat!

What did you use to generate the plots?

2. The Physicist says:

Power Point

3. Borodin says:

I am two thirds mathematician and one third physicist, and am very upset by your explanation.

I have never heard of antiderivative, but perhaps that is because I am from England. Even so, it seems as useful as using antiquicker instead of distance.

You seem to be antiabstracting notions to a physical graph, from where you can explain the idea of ‘area under’ a function in geometric terms. You have devised a ladder back out of the hole you imagined yourself into.

When I accelerate to work there is no ‘area under’ my journey: I stop and start as I please, and when I arrive I have covered enough miles for me to walk the remaining distance to my desk.

IMO you do mathematics a disservice by pretending that there is an x and a f(x) that can be drawn on a piece of paper and crayoned in. Even the relationship between trigonometry and imaginary numbers breaks down straight away.

4. The Physicist says:

British people really don’t use the term “anti-derivative” or crayons?

5. Cardshark says:

Borodin: anyone who’s taken an undergraduate degree in mathematics and has done a first year analysis course should have heard of an “antiderivative”! The more-meaty proof (which is what was explained heuristically, and usefully, above) is proving that this antiderivative is in fact the original function.

I don’t understand why it does maths a disservice by pretending there’s an x and an f(x)? No one’s pretending – we’re just saying that you could write it down in those terms if you wished and you could also find areas etc. also if you wished.

In England we don’t use crayons, no. We still use quill and ink of course – the Queen doesn’t allow otherwise.

6. Borodin says:

I have never heard “anti-derivative” before, although of course it is possible I blinked at the wrong times. And, erm, yes we have crayons – don’t you? I can’t think what American for crayon might be. I survived my time in Kansas City by being able to say “vanilla ice cream”.

7. OhMyEinstein says:

Why F’ = f?

8. The Physicist says:

It’s just a definition. I’m defining it that way.

9. Al says:

I have a question regarding the paragraph, “More precisely, if you have…”

We end by saying F(B)-F(A) gives the area under the curve and it all seemed to make logical sense. However, if I then return to the aforementioned paragraph and use what I’ve just learned, I come to the conclusions that

“f(c)(B-A) = F(B) – F(A)”

f(c)(B-A) is the area under the middle chart. However, it looks very much larger. It is also not intuitive to me that the x-coordinate, c, whose derivative matches the average gradient has a height that matches the corresponding average height. What am I doing wrong?

Many thanks.

10. The Physicist says:

Different c’s.
What you’ve done is turned the mean value theorem on its head. Your statement boils down to: “for a smooth function, there exists a “c” between “A” and “B”, such that f(c) is equal to the average height of the function”. It’s this statement that actually gives the “mean value theorem” its name.
Imagine that the area under the curve is made of wax. When you melt it you end up with a rectangle with the same area and the same base length (B-A), but its height will be the average height of the original function.
So, if F(B) – F(A) is the area, then [F(B)-F(A)]/(B-A) is the average (mean) height. And the mean value theorem states that, for a smooth function, the function will assume its “mean value” at some point (c).

11. Al says:

Thank you for responding so quickly but I’m not sure that entirely answered my question. Do you mean to say you are referring to different c’s between the two equations:
f'(c)(B-A)=f(B)-f(A) and f(c)(B-A)=F(B)-F(A)
in which case how can I justify this step? I’m comfortable with the idea that on the interval [A,B] there exists a c whose gradient matches the average gradient and I’m comfortable with the idea that there exists a c whose f(c) corresponds to the average height and that these are not necessarily the same c. But I don’t understand how to get between these two equations.

Many thanks.

12. The Physicist says:

There’s no generalizable relationship between the c’s for F, f, f’, f”, …
The Mean Value Theorem is just a property of functions. It’s nice to have a picture (the post had one, and you found another) to understand what’s going on, but it’s not the whole story.
The MVT just says: “for a smooth function, f, on (A,B) there’s at least one value, c, between A and B such that (B-A) f'(c) = f(B)-f(A)”. It establishes a relation between functions and their derivatives (f and f’) or (exactly equivalently) between anti-derivatives of functions and their original functions (F and f).
Did that just run us in circles?

13. Shikhi says:

@the physicist

thank you very much for the post..exactly what i was looking for..a mathematical proof that the antiderivative is indeed equal to the area under the curve…and i didnt find any flaws in it..pretty straightforward. i didnt think the MVT was of much use, but guess it is.

14. Al says:

I get it!
This whole time I’ve been struggling to see how you got from f’(c)(B-A)=f(B)-f(A) to f(c)(B-A)=F(B)-F(A) as though you were somehow integrating it up but you’re not doing that. You’re simply applying MVT to F and f as opposed to f and f’, and saying there must be some other (unrelated) c on the interval [A,B] that satisfies:

f(c)(B-A)=F(B)-F(A)

Physicist. Sorry for being so slow. Thanks for all your help.

15. The Physicist says:

Thank you for helping the other people with exactly the same question!

16. neeserg says:

this proof was in my back of my head, but, because of the confusion between F(x), f(x) and f ‘(x) graphs, i would always lose track writing it down in paper. thanks to this now i have, hopefully, a permanent concept of integration.

17. Dane says:

So, would it be safe to say that when the amount of rectangles approaches infinity the mean value theorem for integrals becomes an identity rather than a relationship? In other words, do the two endpoints of each rectangle become synonymous with the point c as the distance between the endpoints approaches zero? If this is the case than I get it! Rather simple compared to lebesgue measure I bet!

Mechanical Engineering Major
Thanks Alot

18. The Physicist says:

Basically, yes!
What you can do is show that it doesn’t matter what point you pick. As the rectangles get thinner and thinner they’ll all give you very nearly the same answer. One of those points is “c”, so any point (for a very thin rectangle) will give you about the same answer as c. As the width of every rectangle is taken to zero, the difference between “nearly the same” and “exactly the same” disappears.

19. Guston says:

@physicist: About the method you use in this post to show the relationship between antiderivative and area under a curve. Is it your own method, or is it someone else’s? Does it have a popular name?

20. The Physicist says:

The “intuitive” part of the post is my own approach (not a proof), but the “mathematical” part is just the standard proof.

21. Ko says:

Sooo I tried to explain to myself how u go from f’(c)(B-A)=f(B)-f(A) to f(c)(B-A)=F(B)-F(A) and even after reading al’s comments and your concomitant answers I still end up nowhere. I do fully (at least that’s what my mind is saying) comprehend f’(c)(B-A)=f(B)-f(A), but I cannot make the logical transition to f(c)(B-A)=F(B)-F(A). Please help me put an end to this struggle 😀

22. Ko says:

So is it because the relationship between f’ and f is exactly the same as that of f and F and vice versa ?

23. The Physicist says:

Yup! Exactly.

24. HishoBoB says:

Hey guys, I’m a Lebanese GS (physics-math) student. I’d like to travel abroad, but I don’t know if my level is sufficient to do that, so if someone would help me... We’re taking: logic, metric relations, irrational functions, parametric curves, conics (hyperbola, ellipse, parabola, curves of 2nd degree), level curves, mean value theorem, applications on complex numbers, transformation planes, complements for integrals, numerical sequences, sphere, functions (limits, continuity, derivative), inverse functions, trigonometric functions, systems of linear equations, vector product and mixed product, lines and planes in space, complex numbers, integration, logarithms, exponentials, differential equations, binary operations, statistics, counting, probability.

25. HishoBoB says:

We’ve taken this proof, and moreover we had a test (4-hour tests usually). In it she asked us to prove that the area of a circle is $\pi r^2$, and we hadn’t done such a thing in class, and we had only covered 1/4 of the integral course at that time.

26. Jam says:

Hi,

Thank you for that explanation. I’m a physicist (not really, just have a bachelor’s degree) and I never really questioned why the slope of the tangent line was the opposite of the area under the curve. This makes perfect sense.

27. Sung says:

“f ‘(c) (B-A) = f(B)-f(A) or f(c) (B-A) = F(B)-F(A) (since F’ =f).”

This doesn’t make any sense to me. And, if this is wrong, your entire derivation is wrong!

28. The Physicist says:

You’re right, it would be.
The mean value theorem is a statement about any function and the derivative of that function.
If the function is $f$, the derivative is $f^\prime$.
If the function is $F$, the derivative is $f$.

29. som says:

if we integrate the f(x) with dx it gives area bw curve fx and x axia with appropriate limit
as same if we integrate f(x) with dg(x) then it can show the area btween both curve fx and gx or not or what limit we use in ……. Int.f(x)dg(x).d(x)/d(x)

30. vishal says:

can u give the proof that integration is inverse of differentiation??

31. John Gabriel says:

It is a fallacy that integration is the reverse of differentiation.

An ante-derivative or primitive function can be used in the “summation” process, but it need not be. Finding an ante-derivative is NOT integration. Integration involves determining the product of two averages (more about this in my New Calculus). Integration has nothing to do with summation which stems from the idiotic ideas of Leibniz and Riemann.

32. John Gabriel says:

@The Physicist:

You wrote:
The mean value theorem is a statement about any function and the derivative of that function.
If the function is f, the derivative is f^\prime.
If the function is F, the derivative is f.

Correction: The mean value theorem states that the average value of the ordinates of any function f ‘ (x) on the interval (a,b) is given by the ratio: {f(b)-f(a)}/b-a

33. John Gabriel says:

One of the non-mathematicians who is a moderator on this site, deleted one of my comments. That comment contained a link:

http://www.spacetimeandtheuniverse.com/math/6661-0-999-really-equal-1-a-5.html#post23696

This link is the actual explanation of the connection between derivative and integral. All else is rubbish.

34. John Gabriel says:

@Sung wrote: “f ‘(c) (B-A) = f(B)-f(A) or f(c) (B-A) = F(B)-F(A) (since F’ =f).”

This doesn’t make any sense to me. And, if this is wrong, your entire derivation is wrong!

Gabriel: “f ‘(c) (B-A) = f(B)-f(A) or f(c) (B-A) = F(B)-F(A) (since F’ =f).” is correct.

35. gbc says:

I’m having trouble believing the area under the curve should approach the area of rectangles under the curve as the width of those rectangles decreases. It makes sense that the error for a particular rectangle decreases, but there are also more rectangles, so it’s not clear to me that the sum of those errors decreases or ever reaches zero.

So,

1. How do you know the decrease in error per rectangle is dominant over the quantity of rectangles as the width of the rectangles approaches zero?

2. How do you know the error reaches (not just approaches) zero at the limit?

36. GentleMathematician says:

@Borodin: I am a mathematician who did his mathematical training in England (at what is generally considered an important and famous UK university) and who now works in North America. I can tell you that the expression “antiderivative” is used widely on both sides of the pond. It is standard terminology in calculus, and it is not made up. In any case, it doesn’t matter because the author of the post explains clearly what he / she means by “antiderivative”. A definition is a definition. Mathematicians must introduce new definitions all of the time, in order to communicate their research effectively. As long as a definition is written clearly, it is “fair game”. If we did not introduce new words as we come across new ideas, mathematics would grind to a halt.

I don’t think the author is doing any kind of disservice to mathematics. When you write “When I accelerate to work there is no ‘area under’ my journey…”, I see that you are failing to understand the point of the author’s article, and to understand the necessity for abstraction in the development of mathematical tools.

37. Steve says:

The crux of this article which I liked the most was this: “the rate at which the area increases is given by f ” – a very interesting view to represent the integral/area relation in a reverse notion! This is what I was exactly looking for in understanding the “why”!

Thanks!
Steve

38. Pablo Jeynes says:

A good name is the “Area So Far” function A(x) of a function f(x). The rate of change of this area function thing is clearly f (x) if one considers a thin slice of x width dx say.

Alternately or equivalently using time t as variable, for an object in motion then the distance moved is clearly the sum of velocity multiplied by time intervals. So the distance is just the area under velocity graph. While velocity is the rate of change of distance. And that’s it explained! ( if you don’t see it yet drink more beer!)

These simple intuitive ideas may need more care with weird functions, but for the present question, which wants a simple explanation, it’s OK.

39. Pablo Jeynes says:

GBC has a good concern about the little triangular errors when adding up the myriad of thin rectangles. As our increment gets smaller the error gets proportionately smaller and even though we have more rectangles the total error is the sum of rectangle base areas which is the finite distance along x , say X, multiplied by the error … which is like a thin strip of wonky triangular error bits. However we see the strip gets narrower as dx gets smaller but the strip length stays the same, roughly X, so the error just gets smaller proportionately as dx which we can take as small as we like and finally let be vanishingly tiny. The bits do effectively vanish in limit.

40. DEVESH KOYEE says:

What’s the formula to calculate the area covered by any sphere on a plane surface?

41. Tom Rose says:

This comment is in response to GBC and Pablo Jeynes, regarding the question of whether the remainder area of triangles does or does not approach zero. I think that the following intuitive view indicates that the remainder area does in fact tend to zero:

Draw a right-angled triangle with a horizontal base and vertical side. For the purpose of this discussion, let the triangle be representative of the top end of any sliver. Next subdivide that triangle into two with a vertical line, and where the vertical line meets the hypotenuse, go across horizontally to the side of the triangle. This process represents what happens when the number of the slivers dividing the function is doubled. Just by looking at the original triangle, there is now a portion of that triangle contained within the top of a rectangular segment. So, doubling the number of slivers has reduced the remainder area by some proportion.

Repeating this process will further reduce the remainder areas, hence the remainder area tends to zero as the divisions increase. Pretend that each increase of divisions is a doubling of the number of divisions if that helps create a picture.

42. DW says:

This is great!

43. Bijay Shrestha says:

Thanks. I was just learning application of anti derivative in school(XI). I understood well enough and the part how u explained about line making area was awesome. I thought that integrating gives you nearly accurate area. but u said that it gives accurate area. Whats with that?.

44. Lilly says:

Hello, Physicist!

I am a pre-university student studying calculus. First of all, thank you for the explanation regarding the area under the curve. However, I’m studying using a book (Calculus: Late Transcendentals by Irl Bivens) which states that:
Let F(x) = A(x)+C
F(b) – F(a) = {A(b)+C} – {A(a)+C} = A(b) – A(a) = A since area under point a=0.

My first question is: How can F(x)=A(x)+C?
Secondly, if A(a) really is 0 all the time, why do I obtain values for F(a) when
F(a) = A(a) + C is supposedly 0 all the time? Or is this logic faulty? Please do reply. Thank you!

45. VarsityPhysics says:

It seems that the Riemann sum formed is always equal to the F(B)-F(A) even without the limit being applied to generate the equality in the last step. As it’s written, the sum of a finite number of rectangular areas reduces to F(B)-F(A). How does the limit change this to being an equality?

46. The Physicist says:

@VarsityPhysics
That’s a little subtle. By taking the limit you make the location of the point inside each interval that determines the height of the rectangles irrelevant. Whether you always choose the highest, or the lowest, or the “correct” point (the $c_i$ that gives an exact, rather than approximate, answer and is guaranteed to exist by the mean value theorem) doesn’t matter. In the limit (assuming the function is continuous) all of these converge to the same value, and that value can be calculated using the “correct” points. Basically, the limit allows you to say “don’t sweat the point placement”.

47. VarsityPhysics says:

Thanks for the reply. That is subtle. Another question: I was watching the proof of this in the old Mechanical Universe series (the one on Integration). It presents the fundamental theorem using the concept of the area being a function of “x”, as you start off with, and passes to the derivative of this area function via the limit. It very nicely makes the case that the area of a given segment under the curve is equivalent to a rectangular area given by the height of the function f(x) for some x times its base (Delta x). Would it turn out that this value of “x” is the same as the one that satisfies the mean value theorem on that interval?
Timestamp = 17:30
https://archive.org/details/The_Mechanical_Universe_and_Beyond_07_Integration

48. VarsityPhysics says:

Answered my own question I think. It corresponds to the average value of the function on that interval.

49. Anon says: