# But WHERE do the Taylor Series and Lagrange Error Bound even come from?!

*An intuitive derivation.*


If you’ve taken calculus, then you’re probably familiar with the idea that lots of functions can be written as infinite polynomials, called **Taylor series**. For example, the Taylor series of the exponential function, $e^x$, is given by

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$$

In general, there’s a formula for the Taylor series of a function $f(x)$ centered around $x=a$:

$$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots$$

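These are *infinite* Taylor series. But often we want to approximate a function using only the terms up to the $n^\text{th}$ degree, which gives the $n^\text{th}$ degree Taylor polynomial:

$$f(x) \approx f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n$$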
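To get a feel for how truncation behaves, here is a minimal Python sketch that evaluates the degree-$n$ Taylor polynomial of $e^x$ centered at $0$ and prints how far it is from `math.exp`. The helper name `exp_taylor` and the sample point $x = 1$ are just illustrative choices, not anything from the derivation itself.

```python
import math

def exp_taylor(x, n):
    """Degree-n Taylor polynomial of e^x centered at 0: sum of x^k / k! for k = 0..n."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0  # illustrative sample point
for n in (1, 2, 4, 8):
    approx = exp_taylor(x, n)
    error = abs(math.exp(x) - approx)
    print(f"n={n}: approx={approx:.8f}, error={error:.2e}")
```

The error shrinks as more terms are kept, which is the behavior the rest of the post tries to explain.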

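This approximation will usually be off by some amount, and there is a bound on that error, called the **Lagrange error bound**:

$$|\text{error}| \le \frac{M}{(n+1)!}|x-a|^{n+1}$$

where $M$ is the maximum magnitude of the $(n+1)^\text{st}$ derivative between $a$ and $x$.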

Now, you may have done a few numerical examples and seen that indeed, Taylor series can be pretty good approximations. You may have even proven the Taylor series formula. I’ve done that too, but I’ve never found it entirely satisfying, because how could somebody even come up with this series in the first place? You don’t just magically think of a series that has all these cool properties without some prior intuition. You need some sort of idea in your mind that guides your thinking on the way to coming up with this Taylor series. And that’s what we’re going to learn about in this post.

But first of all, let’s get ourselves in the right frame of mind. Suppose you’re a mathematician back in the 17th century, and there’s this cool new toy called “calculus” that people are playing with. You can differentiate, and you can integrate, and there’s this cool theorem that links both of those ideas together, called the **Fundamental Theorem of Calculus**:

$$\int_a^b f'(x)\,dx = f(b) - f(a)$$

Being a mathematician, you want to discover something new that nobody has seen before – and one thing that often leads people to new discoveries is thinking about something in a new way. So, let’s think about this Fundamental Theorem of Calculus in a new way.

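If we replace the $b$ with an $x$ (renaming the integration variable to $t$ so the two don’t clash) and solve for $f(x)$, we get

$$f(x) = f(a) + \int_a^x f'(t)\,dt$$

It’s kind of like we’re saying that $f(x)$ can be approximated by $f(a)$, and the integral is the error in that approximation.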

It makes sense that the error should be related to the derivative. If we are approximating $f(x)$ by a constant, $f(a)$, then we’re saying $f(x)$ is not changing at all. But often, that’s not the case, and the amount by which $f(x)$ is changing at any point is given by the derivative $f'(x)$. It’s like we’re saying $f(x)$ can be approximated by a constant, and then the error given by that approximation comes from integrating the derivative, which represents the extent to which $f(x)$ is not constant.
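Here is a small numerical sanity check of that idea: approximate $f(x)$ by the constant $f(a)$, then add the integral of $f'$ from $a$ to $x$ as the error term. This is only a sketch. The choice $f = \sin$ (so $f' = \cos$), the points $a = 0.5$ and $x = 2$, and the midpoint-rule helper `integrate` are all assumptions made for illustration.

```python
import math

def integrate(g, lo, hi, steps=2000):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

a, x = 0.5, 2.0                          # illustrative center and evaluation point
constant_guess = math.sin(a)             # approximate f(x) by the constant f(a)
error_term = integrate(math.cos, a, x)   # integral of f'(t) from a to x

print("f(a) + integral of f':", constant_guess + error_term)
print("f(x):                 ", math.sin(x))
```

The two printed values agree up to the accuracy of the numerical integration, so the integral really is the gap between the constant guess and the true value.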

Now, what if we want to get a *better* approximation for $f(x)$? Maybe instead of just a constant, we want to approximate $f(x)$ as a line. If we do that, then the error should represent the degree to which $f(x)$ is nonlinear, which would be represented by the *second* derivative, $f''(x)$.

We write down something similar to the Fundamental Theorem of Calculus, but this time, instead of integrating the first derivative, we double-integrate the second derivative:

$$\int_a^x \int_a^{t_1} f''(t_2)\,dt_2\,dt_1$$

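This looks a little crazy, but the inner integral, taken from $a$ up to the outer integration variable $t_1$, is very similar to what we had before. The only difference is that we have an extra prime on the $f$, so the result will get an extra prime as well:

$$\int_a^{t_1} f''(t_2)\,dt_2 = f'(t_1) - f'(a)$$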

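We substitute this result for the inner integral, and split up the outer integral over the subtraction:

$$\int_a^x \int_a^{t_1} f''(t_2)\,dt_2\,dt_1 = \int_a^x \left[ f'(t_1) - f'(a) \right] dt_1 = \int_a^x f'(t_1)\,dt_1 - \int_a^x f'(a)\,dt_1$$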

The left integral is exactly like we had before, and for the right integral, remember that $f'(a)$ is just a constant, so we can factor it out:

$$\int_a^x \int_a^{t_1} f''(t_2)\,dt_2\,dt_1 = f(x) - f(a) - f'(a)(x-a)$$

We can solve for $f(x)$ in this equation:

$$f(x) = f(a) + f'(a)(x-a) + \int_a^x \int_a^{t_1} f''(t_2)\,dt_2\,dt_1$$

This is the result that we were hoping for! There is a linear approximation, and the error in that approximation is represented by the double integral term. The double integral term contains $f''$, which represents the degree to which $f(x)$ is nonlinear.
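The linear case can also be checked numerically: the tangent-line approximation $f(a) + f'(a)(x-a)$ plus the double integral of $f''$ should reproduce $f(x)$. The sketch below assumes $f = \ln$ with $a = 1$ and $x = 2$, and uses a simple nested midpoint rule. The helper names `integrate`, `f`, `f1`, and `f2` are illustrative only.

```python
import math

def integrate(g, lo, hi, steps=400):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

def f(t):   # illustrative function
    return math.log(t)

def f1(t):  # first derivative of ln
    return 1.0 / t

def f2(t):  # second derivative of ln
    return -1.0 / t**2

a, x = 1.0, 2.0
linear_part = f(a) + f1(a) * (x - a)
# Double integral: the inner integral runs from a to t1, the outer from a to x.
remainder = integrate(lambda t1: integrate(f2, a, t1), a, x)

print("linear approximation:", linear_part)
print("double-integral error:", remainder)
print("sum:", linear_part + remainder, " f(x):", f(x))
```

Since $\ln$ is concave, the tangent line overshoots and the double-integral term comes out negative, pulling the estimate back down to $\ln 2$.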

If you were to continue this process over and over again, say you differentiate $n+1$ times and integrate $n+1$ times, you’d end up with an $n^\text{th}$ degree polynomial approximation for $f(x)$, with an error term consisting of $n+1$ integrals of the $(n+1)^\text{st}$ derivative. The approximation is exactly the $n^\text{th}$ degree Taylor polynomial:

$$f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \int_a^x \int_a^{t_1} \cdots \int_a^{t_n} f^{(n+1)}(t_{n+1})\,dt_{n+1} \cdots dt_2\,dt_1$$
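To see the pattern continue for one more round, here is a sketch of the quadratic step. It reuses the double-integral identity $\int_a^x \int_a^{t_1} f''(t_2)\,dt_2\,dt_1 = f(x) - f(a) - f'(a)(x-a)$, and the variable names $t_1, t_2, t_3$ are just labels for the nested integration variables.

$$\int_a^x \int_a^{t_1} \int_a^{t_2} f'''(t_3)\,dt_3\,dt_2\,dt_1 = \int_a^x \int_a^{t_1} \left[ f''(t_2) - f''(a) \right] dt_2\,dt_1 = \left[ f(x) - f(a) - f'(a)(x-a) \right] - f''(a)\frac{(x-a)^2}{2}$$

Solving for $f(x)$ gives

$$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \int_a^x \int_a^{t_1} \int_a^{t_2} f'''(t_3)\,dt_3\,dt_2\,dt_1$$

Each extra round of integration adds one more Taylor term and pushes the error into one more nested integral of the next derivative.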

This is the intuition behind where the Taylor polynomial comes from. But what about the Lagrange error bound? The expression

$$\int_a^x \int_a^{t_1} \cdots \int_a^{t_n} f^{(n+1)}(t_{n+1})\,dt_{n+1} \cdots dt_2\,dt_1$$

doesn’t look too familiar at this point, but it’s actually only one step away from the Lagrange error bound.

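Let’s write our error term again, and see if we can place a bound on its magnitude:

$$\text{error} = \int_a^x \int_a^{t_1} \cdots \int_a^{t_n} f^{(n+1)}(t_{n+1})\,dt_{n+1} \cdots dt_2\,dt_1$$

The largest magnitude this integral can have comes from replacing the $(n+1)^\text{st}$ derivative with its maximum magnitude. If we define

$$M = \max_{t \text{ between } a \text{ and } x} \left| f^{(n+1)}(t) \right|$$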

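then we can place a bound on the magnitude of the error term:

$$\left| \int_a^x \int_a^{t_1} \cdots \int_a^{t_n} f^{(n+1)}(t_{n+1})\,dt_{n+1} \cdots dt_2\,dt_1 \right| \le \left| \int_a^x \int_a^{t_1} \cdots \int_a^{t_n} M\,dt_{n+1} \cdots dt_2\,dt_1 \right|$$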

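The innermost integral is an integral of a constant. Integrating $1$ from $a$ up to a variable upper limit (call it $x$ for now) simply gives $x-a$:

$$\int_a^x 1\,dt = x-a$$

If you integrate that again, you get $\frac{(x-a)^2}{2}$,

$$\int_a^x (t-a)\,dt = \frac{(x-a)^2}{2}$$

and if you integrate *that* again, you get $\frac{(x-a)^3}{3 \cdot 2}$, and $3 \cdot 2$ is the same as $3!$.

$$\int_a^x \frac{(t-a)^2}{2}\,dt = \frac{(x-a)^3}{3 \cdot 2} = \frac{(x-a)^3}{3!}$$

If you keep on integrating all of those $n+1$ integrals, you get a result of $\frac{(x-a)^{n+1}}{(n+1)!}$.

$$\int_a^x \int_a^{t_1} \cdots \int_a^{t_n} 1\,dt_{n+1} \cdots dt_2\,dt_1 = \frac{(x-a)^{n+1}}{(n+1)!}$$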
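The factorial pattern can be confirmed exactly, without any numerical integration. The sketch below represents a polynomial in $u = t - a$ by its list of coefficients, applies the "integrate from $a$" step repeatedly to the constant $1$, and compares the result with $\frac{1}{(n+1)!}$. The function names are hypothetical helpers written for this check.

```python
from fractions import Fraction
from math import factorial

def integrate_from_a(coeffs):
    """Integrate sum(coeffs[k] * u**k) from 0 to u, where u = t - a.

    Returns the coefficient list of the antiderivative that vanishes at u = 0.
    """
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]

def nested_integral_of_one(count):
    """Apply `count` nested integrals (each starting at a) to the constant 1."""
    coeffs = [Fraction(1)]
    for _ in range(count):
        coeffs = integrate_from_a(coeffs)
    return coeffs

for n in range(5):
    coeffs = nested_integral_of_one(n + 1)
    # Only the top coefficient is nonzero: it multiplies (x - a)^(n + 1).
    print(f"n={n}: coefficient of (x-a)^{n + 1} is {coeffs[-1]}, "
          f"1/(n+1)! is {Fraction(1, factorial(n + 1))}")
```

Using `Fraction` keeps the arithmetic exact, so the match with $\frac{1}{(n+1)!}$ is an equality rather than an approximation.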

Now we just write in the rest of our expression with the $M$ and the absolute value:

$$|\text{error}| \le \left| M \frac{(x-a)^{n+1}}{(n+1)!} \right|$$

We know that $M$ is nonnegative based on how we defined it, and the $(n+1)!$ is positive, so we can bring those out of the absolute value, and it’s just the $x-a$ that could potentially be negative. And there we have it, the Lagrange error bound!

$$|\text{error}| \le \frac{M}{(n+1)!}|x-a|^{n+1}$$
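As a closing sanity check, here is a sketch that compares the actual Taylor error with the bound $\frac{M}{(n+1)!}|x-a|^{n+1}$ for $f = \sin$. Every derivative of $\sin$ has magnitude at most $1$, so the code assumes $M = 1$, which is a valid (slightly loose) upper bound rather than the exact maximum on the interval. The point $x = 0.2$ is deliberately chosen to the left of $a = 1$, so $x - a$ is negative and the absolute value matters.

```python
import math

def sin_derivative(k, t):
    """k-th derivative of sin at t; the derivatives cycle with period 4."""
    cycle = (math.sin, math.cos, lambda s: -math.sin(s), lambda s: -math.cos(s))
    return cycle[k % 4](t)

def taylor_sin(x, a, n):
    """Degree-n Taylor polynomial of sin centered at a, evaluated at x."""
    return sum(sin_derivative(k, a) / math.factorial(k) * (x - a)**k
               for k in range(n + 1))

def lagrange_bound(x, a, n, M=1.0):
    """M * |x - a|^(n+1) / (n+1)!, with M = 1 assumed as a bound on |sin^(n+1)|."""
    return M * abs(x - a)**(n + 1) / math.factorial(n + 1)

a, x = 1.0, 0.2  # x lies to the left of a, so x - a < 0
for n in range(1, 6):
    actual = abs(math.sin(x) - taylor_sin(x, a, n))
    bound = lagrange_bound(x, a, n)
    print(f"n={n}: actual error={actual:.3e}, bound={bound:.3e}")
```

In every row the actual error sits below the bound, and both shrink quickly as $n$ grows.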