Integral Calculus

MATH 10A METHODS OF MATHEMATICS: CALCULUS, STATISTICS AND COMBINATORICS
L. Pachter, B. Sturmfels and L.C. Evans
Department of Mathematics
University of California, Berkeley
October 25, 2015
Overview of Part 2: Integral calculus
The main references for this part are

- Sebastian J. Schreiber, Karl Smith and Wayne Getz, Calculus for Life Sciences 1E for UC Berkeley, Wiley
- J. Stewart, Calculus, 7th edition, Cengage
- C. Neuhauser, Calculus for Biology and Medicine, 3rd edition, Prentice Hall
1. Histograms
2. Integrals and area
3. Approximation methods
4. Applications of integration
5. Antiderivatives, Fundamental Theorem of Calculus
6. Integration techniques
Section 1
Histograms
A. Displaying data
DEFINITION
A histogram is a graphical representation providing a visual impression
of the distribution of data. It consists of adjacent rectangles, erected over
given intervals, with areas equal to the proportion of the observations in
each interval.
[Figure: A Histogram; vertical axis: Density]
We will sometimes also think of the intervals as bins into which our data
points are distributed.
Example 1.1 (Birth weight and smoking)
[Figure: two histograms of density against birth weight in ounces, one for mothers who did not smoke and one for mothers who smoked]
How to draw histograms

- First, choose the consecutive intervals (or bins) $I_1, I_2, \dots, I_m$ into which the data points are distributed.
- Calculate the number of data points $n_k$ within each interval $I_k$. Then
  \[ N = n_1 + n_2 + \cdots + n_m \]
  is the total number of points.
- We want the area of the rectangle $R_k$ above the interval $I_k$ to be $\frac{n_k}{N}$. Since the area of a rectangle equals its height times its width, we take
  \[ s_k = \text{height of } R_k = \frac{n_k}{N \cdot (\text{length of } I_k)}. \]

Then the total area of the histogram equals
\[ \sum_{k=1}^{m} (\text{area of } R_k) = \sum_{k=1}^{m} \frac{n_k}{N} = 1. \]
Area (percent) = height × width

So, height = percent / width, where the width is the bin width.
Example 1.2 (Calculating percentiles using histograms)

What percentage of women who smoked had children with birth weights
less than 90 ounces?
[Figure: histogram for mothers who smoked; vertical axis: Density; horizontal axis running from 60 to 180]
We see that 8.68% of mothers who smoked had a child weighing less
than 90 ounces (5.63 lbs).
The red lines represent the 25th, 50th (median), and 75th
percentiles.
B. Partitioning an interval
When we decide upon the intervals/bins into which to sort our data
points for a histogram, we are in effect creating a partition of an interval.
DEFINITION
If $a = x_0 < x_1 < \cdots < x_{m-1} < x_m = b$, we call $P = \{x_0, x_1, \dots, x_m\}$ a partition of the interval $[a, b]$.
The partition $P$ divides the interval $[a, b]$ into the $m$ closed subintervals
\[ I_1 = [x_0, x_1], \quad I_2 = [x_1, x_2], \quad \dots, \quad I_m = [x_{m-1}, x_m]. \]
Example 1.3
Let Y = {1.2, 1.5, 1.5, 2.2, 2.2, 2.7, 5.5, 5.7} be the data we want to
graph. The minimum is 1.2 and the maximum is 5.7. We round 1.2 down
to the nearest integer and round 5.7 up to the nearest integer.
We choose our partition of [1, 6] to be P = {1, 2, 3, 5, 6}.
C. Step functions
To calculate and plot the heights of the rectangles, we were actually defining a piecewise constant function
\[ s(x) = \begin{cases} s_1 & \text{if } x_0 \le x \le x_1 \\ s_2 & \text{if } x_1 < x \le x_2 \\ \;\vdots \\ s_m & \text{if } x_{m-1} < x \le x_m, \end{cases} \]
where $s_k$ is the height of the rectangle over the $k$th subinterval.
Example 1.4
For our data, the percentages in the intervals (areas of the rectangles) are 37.5, 37.5, 0, and 25. We divide each of these percentages by 100 × (width of the interval).
The function is then defined as
\[ s(x) = \begin{cases} 0.375 & \text{if } 1 \le x \le 2 \\ 0.375 & \text{if } 2 < x \le 3 \\ 0 & \text{if } 3 < x \le 5 \\ 0.25 & \text{if } 5 < x \le 6 \end{cases} \]
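The recipe for the rectangle heights can be sketched in a few lines of Python. This is only an illustration: the helper name `histogram_heights` is made up here, and the convention that the first bin is closed while later bins are half-open follows the piecewise definition of $s(x)$.

```python
def histogram_heights(data, partition):
    """Return s_k = n_k / (N * width of I_k) for each bin of the partition."""
    total = len(data)
    heights = []
    for k in range(1, len(partition)):
        left, right = partition[k - 1], partition[k]
        if k == 1:
            # First bin is the closed interval [x_0, x_1].
            count = sum(1 for y in data if left <= y <= right)
        else:
            # Later bins are half-open: (x_{k-1}, x_k].
            count = sum(1 for y in data if left < y <= right)
        heights.append(count / (total * (right - left)))
    return heights


# Data and partition from Examples 1.3 and 1.4.
data = [1.2, 1.5, 1.5, 2.2, 2.2, 2.7, 5.5, 5.7]
partition = [1, 2, 3, 5, 6]
heights = histogram_heights(data, partition)
widths = [b - a for a, b in zip(partition, partition[1:])]
total_area = sum(h * w for h, w in zip(heights, widths))
print(heights, total_area)
```

The heights agree with the step function of Example 1.4, and the total area comes out as 1.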
DEFINITION
Let $P = \{x_0, x_1, \dots, x_m\}$ be a partition of $[a, b]$. A step function is a function $s : [a, b] \to \mathbb{R}$ that is constant on the open subintervals of $P$.
Denote by $s_k$ the constant value that $s$ takes in the $k$th open subinterval $I_k$:
\[ s(x) = s_k \quad \text{if } x_{k-1} < x \le x_k \quad (k = 1, 2, \dots, m). \]
[Figure: graph of a step function]
Concerning the breakpoints we assume $s(x_k) = s_k$ for $k = 1, 2, \dots, m$.
Histograms are step functions

Remember: you can always think of histograms as step functions.
[Figure: a histogram and the matching step function, side by side; vertical axis: Percent]
As we collect more data, we might make the partition of [a, b] finer and
finer. What happens then?
Section 2
Integrals and area
A. Integral of a step function

Suppose $s$ and $t$ are step functions on $[a, b]$. Let $P_1$ and $P_2$ be partitions of $[a, b]$ such that $s$ is constant on the open subintervals of $P_1$ and $t$ is constant on the open subintervals of $P_2$. Define the sum $u = s + t$ by the rule
\[ u(x) = s(x) + t(x) \quad \text{if } a \le x \le b. \]
[Figure: graphs of two step functions on $[a, b]$, with breakpoints $x_1$ and $x_2$]
To show that $u$ is actually a step function, we must find a partition $P$ such that $u$ is constant on the open subintervals of $P$.
DEFINITION
The common refinement of $P_1$ and $P_2$ is the union $P = P_1 \cup P_2$.
DEFINITION
The integral of a step function $s$ from $a$ to $b$ is the number
\[ \int_a^b s(x)\,dx := \sum_{k=1}^{m} s_k (x_k - x_{k-1}). \]
[Figure: a step function with values $s_1, \dots, s_6$ and breakpoints $x_1, \dots, x_5$]
If each $s_k \ge 0$, the integral is the area between the graph of the step function and the $x$-axis.
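The definition translates directly into code. The sketch below uses a hypothetical helper `step_integral` and, as an assumed test case, the step function from Example 1.4; it is meant as an illustration of the sum, not as a general-purpose routine.

```python
def step_integral(partition, values):
    """Sum of s_k * (x_k - x_{k-1}) over the subintervals of the partition."""
    return sum(
        s * (right - left)
        for s, left, right in zip(values, partition, partition[1:])
    )


partition = [1, 2, 3, 5, 6]
values = [0.375, 0.375, 0.0, 0.25]  # heights s_1, ..., s_4 from Example 1.4

area = step_integral(partition, values)
doubled = step_integral(partition, [2 * s for s in values])
print(area, doubled)
```

Because a histogram is a step function with total area 1, the first number is 1; doubling every height doubles the integral.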
THEOREM (Additive Property)

\[ \int_a^b \big( s(x) + t(x) \big)\,dx = \int_a^b s(x)\,dx + \int_a^b t(x)\,dx \]
[Figure: graphs of $s$, $t$ and $s + t$ on $[a, b]$, with breakpoints $x_1$ and $x_2$]
THEOREM (Homogeneous Property)

\[ \int_a^b c\,s(x)\,dx = c \int_a^b s(x)\,dx \]
[Figure: graphs of $s$ and $2s$ on $[a, b]$, with breakpoint $x_1$]
We can combine the previous two assertions:
THEOREM (Linearity)
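\[ \int_a^b \big( c_1 s(x) + c_2 t(x) \big)\,dx = c_1 \int_a^b s(x)\,dx + c_2 \int_a^b t(x)\,dx \]
THEOREM (Invariance under translation)
\[ \int_a^b s(x)\,dx = \int_{a+c}^{b+c} s(x - c)\,dx \quad \text{for every real number } c \]
[Figure: graphs of $s(x)$ on $[a, b]$ and of $s(x - c)$ on $[a + c, b + c]$]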
THEOREM (Comparison)
If $s(x) \le t(x)$ for every $x \in [a, b]$, then
\[ \int_a^b s(x)\,dx \le \int_a^b t(x)\,dx. \]
THEOREM (Expansion or contraction of the interval)
\[ \int_{ka}^{kb} s\!\left(\frac{x}{k}\right) dx = k \int_a^b s(x)\,dx \quad \text{for every } k > 0 \]
Next, we turn to the problem of computing integrals of more general functions. To do so, we will need to take limits.
B. Riemann integrals
Our next goal is finding the area under a curve:
[Figure: the region under a curve, with unknown area $A$]
We instead find the area of a collection of rectangles that approximate the desired area. That is, we approximate $f$ by a step function.
[Figure: two rectangles over the breakpoints $x_0, x_1, x_2$]
\[ A \approx (x_1 - x_0) f(x_0) + (x_2 - x_1) f(x_1) \]
Using 10 subintervals makes the approximation even better:
[Figure: ten rectangles over the breakpoints $x_0, x_1, \dots, x_{10}$]
\[ A \approx \sum_{k=1}^{10} f(x_{k-1})(x_k - x_{k-1}) \]
Notation
Suppose $P$ is a partition, dividing our interval $[a, b]$ into $m$ subintervals $I_1, \dots, I_m$.
(i) Let
\[ \Delta x_k = x_k - x_{k-1} \]
denote the length of the $k$-th subinterval $I_k$.
(ii) Let $x_k^*$ be any point in the $k$-th subinterval $I_k$.
We will build a rectangle of height $f(x_k^*)$ above $I_k$. The area of this rectangle is $f(x_k^*)\,\Delta x_k$; and so the total area is
\[ \sum_{k=1}^{m} f(x_k^*)\,\Delta x_k = \sum_{k=1}^{m} f(x_k^*)(x_k - x_{k-1}). \]
This is an approximation to the area under the curve, called a Riemann sum.
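A Riemann sum is easy to compute once the partition and the sample points are fixed. The sketch below is illustrative only: `riemann_sum` and its `choose` argument (a rule that picks the sample point in each subinterval) are names introduced here, and $f(x) = x^2$ on $[0, 2]$ is an assumed test case.

```python
def riemann_sum(f, partition, choose):
    """Sum f(x_k*) * (x_k - x_{k-1}), where x_k* = choose(x_{k-1}, x_k)."""
    total = 0.0
    for left, right in zip(partition, partition[1:]):
        sample = choose(left, right)
        total += f(sample) * (right - left)
    return total


def square(x):
    return x**2


partition = [0.0, 0.5, 1.0, 1.5, 2.0]
left_sum = riemann_sum(square, partition, lambda left, right: left)
right_sum = riemann_sum(square, partition, lambda left, right: right)
mid_sum = riemann_sum(square, partition, lambda left, right: (left + right) / 2)
print(left_sum, right_sum, mid_sum)
```

Different choices of sample points give different sums for the same partition; the definition of the integral asks that they all approach one value as the partition is refined.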
To find the actual area, we want to let $m$ get bigger and bigger and $\Delta x_k$ get smaller and smaller. If we then send $m \to \infty$, we should get the actual area.
DEFINITION
The Riemann integral of $f$ from $a$ to $b$ is
\[ \int_a^b f(x)\,dx = \lim_{m \to \infty} \sum_{k=1}^{m} f(x_k^*)\,\Delta x_k, \]
provided this limit exists, irrespective of the choice of the partition or the choice of the points $x_k^*$.
For nonnegative functions $f$, the integral $\int_a^b f(x)\,dx$ gives the area under $f$ between $a$ and $b$.
A useful fact is that the Riemann integral always exists for continuous
functions:
THEOREM
If $f : [a, b] \to \mathbb{R}$ is continuous, then the limit in the definition above exists; and thus $\int_a^b f(x)\,dx$ is defined.
Remark: It can also be shown that $\int_a^b f(x)\,dx$ is defined if $f$ is piecewise continuous, meaning that we can subdivide $[a, b]$ into finitely many subintervals $I_1, \dots, I_m$, such that $f$ restricted to each interval $I_k = [x_{k-1}, x_k]$ is continuous (after possibly being redefined at the endpoints).
But how can we actually compute integrals?
Useful formulas for Riemann sums

When calculating Riemann sums, the following rules will be helpful:
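- $\displaystyle \sum_{i=1}^{m} i = \frac{m(m+1)}{2}$
- $\displaystyle \sum_{i=1}^{m} i^2 = \frac{m(m+1)(2m+1)}{6}$
- $\displaystyle \sum_{i=1}^{m} i^3 = \frac{m^2(m+1)^2}{4}$
- $\displaystyle \sum_{i=0}^{m} r^i = \frac{r^{m+1} - 1}{r - 1} \quad (r \ne 1)$

We will discuss in Math 10B how to use mathematical induction to establish the first three of these formulas.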
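Before the induction proofs are available, the four formulas can at least be spot-checked by brute force. The following sketch is not a proof; the function name `check_sum_formulas` and the sample ratio $r = 3/2$ are choices made here for illustration. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def check_sum_formulas(m, r):
    """Compare each closed form with the directly computed sum."""
    r = Fraction(r)
    linear_ok = sum(range(1, m + 1)) == m * (m + 1) // 2
    squares_ok = sum(i**2 for i in range(1, m + 1)) == m * (m + 1) * (2 * m + 1) // 6
    cubes_ok = sum(i**3 for i in range(1, m + 1)) == m**2 * (m + 1) ** 2 // 4
    geometric_ok = sum(r**i for i in range(m + 1)) == (r ** (m + 1) - 1) / (r - 1)
    return linear_ok and squares_ok and cubes_ok and geometric_ok


all_ok = all(check_sum_formulas(m, Fraction(3, 2)) for m in range(1, 30))
print(all_ok)
```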
Example 2.1
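Find $\int_0^2 x^2\,dx$.
SOLUTION:
For simplicity, let's choose our $m$ subintervals to all be the same size. Then $\Delta x_k = \frac{2 - 0}{m} = \frac{2}{m}$.
Also for simplicity, let's choose $x_k^*$ to be the left endpoint of each subinterval. Then $x_1^* = 0$, $x_2^* = \frac{2}{m}$, $x_3^* = 2 \cdot \frac{2}{m}$, ..., $x_k^* = (k - 1)\frac{2}{m}$.
We must therefore compute
\[ \lim_{m \to \infty} \sum_{k=1}^{m} \left( \frac{2(k-1)}{m} \right)^{2} \frac{2}{m} = \lim_{m \to \infty} \sum_{i=0}^{m-1} \left( \frac{2i}{m} \right)^{2} \frac{2}{m}, \]
and for this we will use the formulas for sums of powers given above.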
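\[
\begin{aligned}
\int_0^2 x^2\,dx &= \lim_{m \to \infty} \frac{2}{m} \sum_{i=0}^{m-1} \frac{4}{m^2}\, i^2 \\
&= \lim_{m \to \infty} \frac{8}{m^3} \sum_{i=0}^{m-1} i^2 \\
&= \lim_{m \to \infty} \frac{8}{m^3} \cdot \frac{(m-1)m(2m-1)}{6} \\
&= \lim_{m \to \infty} \frac{8}{m^3} \cdot \frac{m^3 \left(1 - \frac{1}{m}\right)\left(2 - \frac{1}{m}\right)}{6} \\
&= \frac{8(1 - 0)(2 - 0)}{6} = \frac{16}{6} = \frac{8}{3}.
\end{aligned}
\]
[Figure: the region under $y = x^2$ between 0 and 2, with area $A = \frac{8}{3}$]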
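The algebra in this example can be checked exactly for any finite $m$. In the sketch below (helper names are invented for illustration), `left_sum_x_squared` adds up the rectangles directly and `closed_form` evaluates $\frac{8}{m^3} \cdot \frac{(m-1)m(2m-1)}{6}$; both use exact fractions.

```python
from fractions import Fraction


def left_sum_x_squared(m):
    """Left-endpoint Riemann sum for x^2 on [0, 2] with m equal subintervals."""
    dx = Fraction(2, m)
    return sum((i * dx) ** 2 * dx for i in range(m))


def closed_form(m):
    """The expression (8 / m^3) * (m - 1) m (2m - 1) / 6 from the derivation."""
    return Fraction(8, m**3) * Fraction((m - 1) * m * (2 * m - 1), 6)


for m in (1, 5, 10, 100, 1000):
    gap = Fraction(8, 3) - left_sum_x_squared(m)
    print(m, float(left_sum_x_squared(m)), float(gap))
```

The gap to $\frac{8}{3}$ shrinks roughly like $\frac{4}{m}$, which is consistent with the limit computed above.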
Integrals of powers of x
Calculations similar to those in the previous example show that

\[ \int_a^b x^j\,dx = \frac{b^{j+1} - a^{j+1}}{j + 1} \]
for all $b > a$ and each positive integer $j$.

We will later learn simpler ways to derive these formulas.
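Example 2.2
Find $\int_2^{10} e^x\,dx$.
SOLUTION: Let us take $\Delta x = \frac{10 - 2}{m} = \frac{8}{m}$ and $x_k^* = 2 + (k - 1)\frac{8}{m}$.

Then
\[
\begin{aligned}
\int_2^{10} e^x\,dx &= \lim_{m \to \infty} \sum_{k=1}^{m} e^{2 + (k-1)8/m}\, \frac{8}{m}
 = \lim_{m \to \infty} \frac{8}{m} \sum_{i=0}^{m-1} e^{2} e^{8i/m} \\
&= \lim_{m \to \infty} \frac{8}{m}\, e^{2} \sum_{i=0}^{m-1} \left( e^{8/m} \right)^{i}
 = e^{2} \lim_{m \to \infty} (e^{8} - 1)\, \frac{8/m}{e^{8/m} - 1} \\
&= (e^{10} - e^{2}) \underbrace{\lim_{m \to \infty} \frac{8/m}{e^{8/m} - 1}}_{=1}
 = e^{10} - e^{2}.
\end{aligned}
\]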
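The foregoing calculation used the fact that
\[ \lim_{m \to \infty} \frac{8/m}{e^{8/m} - 1} = 1. \]
To confirm this, substitute $h = 8/m$ and observe that
\[ \lim_{m \to \infty} \frac{8/m}{e^{8/m} - 1} = \lim_{h \to 0} \frac{h}{e^{h} - 1} = \frac{1}{\lim_{h \to 0} \frac{e^{h} - 1}{h}} = \frac{1}{e^{0}} = 1, \]
since $(e^x)' = e^x$.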
Properties of Riemann integrals

We earlier identified various properties for the integrals of step functions. By approximation, the same properties hold for the integrals of any function that has an integral:
THEOREM (Linearity)
If the functions $f, g : [a, b] \to \mathbb{R}$ have integrals and if $c_1, c_2$ are constants, then
\[ \int_a^b \big( c_1 f(x) + c_2 g(x) \big)\,dx = c_1 \int_a^b f(x)\,dx + c_2 \int_a^b g(x)\,dx. \]
But in general,
\[ \int_a^b f(x) g(x)\,dx \ne \left( \int_a^b f(x)\,dx \right) \left( \int_a^b g(x)\,dx \right). \]
THEOREM (Invariance under translation)

\[ \int_a^b f(x)\,dx = \int_{a+c}^{b+c} f(x - c)\,dx \quad \text{for every real number } c \]
THEOREM (Comparison)
If $f(x) \le g(x)$ for every $x \in [a, b]$, then
\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]
THEOREM (Expansion or contraction of the interval)
\[ \int_{ka}^{kb} f\!\left(\frac{x}{k}\right) dx = k \int_a^b f(x)\,dx \quad \text{for every } k > 0 \]
THEOREM (Additivity of integrals over different intervals)

If $a < b < c$, then
\[ \int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx. \]
Negative area
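When $f < 0$, we regard the area above the graph of $f$ and below the $x$-axis as negative.
Example 2.3
\[ \int_0^2 (-x^2)\,dx = -\frac{8}{3} \]
[Figure: the region between the graph and the $x$-axis, with signed area $A = -\frac{8}{3}$]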
DEFINITION (Exchanging limits of integration)

If $a < b$, then we define
\[ \int_b^a f(x)\,dx = -\int_a^b f(x)\,dx. \]
C. Improper Integrals
If the function $f$ is integrable on $[a, b]$ for each real number $b > a$, then we define
\[ \int_a^{+\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx, \]
provided the limit exists.

Likewise, if for the real number $b$ the function $f$ is integrable on $[a, b]$ for each real number $a < b$, we then define
\[ \int_{-\infty}^{b} f(x)\,dx = \lim_{a \to -\infty} \int_a^b f(x)\,dx, \]
if the limit exists.
Finally, we define
\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_{c}^{\infty} f(x)\,dx, \]
where $c \in \mathbb{R}$ is arbitrary, assuming each of the integrals on the right hand side is defined.
In other words, we are assuming that the right hand side above is not of the form $(+\infty) + (-\infty)$ or $(-\infty) + (+\infty)$.
Integrals of the type defined here are called improper integrals.
Example 2.4
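We will learn later that
\[ \int_a^b \frac{1}{x^2}\,dx = \frac{1}{a} - \frac{1}{b} \]
for $b > a > 0$. Therefore
\[ \int_1^{\infty} \frac{1}{x^2}\,dx = \lim_{b \to \infty} \int_1^b \frac{1}{x^2}\,dx = \lim_{b \to \infty} \left( 1 - \frac{1}{b} \right) = 1. \]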
Example 2.5
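Using the rule $\int_a^b x^j\,dx = \frac{b^{j+1} - a^{j+1}}{j+1}$ for $j = 1, 2, \dots$, we see that
\[ \int_0^{\infty} x^3\,dx = \lim_{b \to \infty} \int_0^b x^3\,dx = \lim_{b \to \infty} \left( \frac{b^4}{4} - 0 \right) = \infty, \]
\[ \int_{-\infty}^{0} x^3\,dx = \lim_{a \to -\infty} \int_a^0 x^3\,dx = \lim_{a \to -\infty} \left( 0 - \frac{a^4}{4} \right) = -\infty, \]
and so
\[ \int_{-\infty}^{\infty} x^3\,dx = \int_{-\infty}^{0} x^3\,dx + \int_{0}^{\infty} x^3\,dx \quad \text{is undefined.} \]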
Tolstoy on integration (from War and Peace)
The movement of humanity, arising as it does from innumerable arbitrary human wills, is continuous. To understand the laws of this continuous movement is the aim of history. But to arrive at these laws, resulting from the sum of all those human wills, man's mind postulates arbitrary and disconnected units.
. . . Only by taking infinitesimally small units for observation (the differential of history, that is, the individual tendencies of men) and attaining to the art of integrating them (that is, finding the sum of these infinitesimals) can we hope to arrive at the laws of history.
Section 3
Approximation methods
A. Approximating integrals numerically

In order to numerically approximate the value of the integral $\int_a^b f(x)\,dx$, we can compute $\sum_{k=1}^{n} f(x_k^*)\,\Delta x_k$ with a large value of $n$. To simplify, we use equal sized subintervals, each of width
\[ \Delta x = \frac{b - a}{n}. \]
Let
\[ x_k^* = a + (k - 1)\frac{b - a}{n} \]
denote the left endpoint of each subinterval.
DEFINITION
The left endpoint rule approximates the integral $\int_a^b f(x)\,dx$ by the sum
\[ L_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left( a + (k - 1)\frac{b - a}{n} \right). \]
Example 3.1
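Fix $n = 5$. The step size is $\Delta x = \frac{3 - 1}{5} = 0.4$. The left endpoints are
$x_0 = 1$, $x_1 = 1.4$, $x_2 = 1.8$, $x_3 = 2.2$, $x_4 = 2.6$.
[Figure: left endpoint rectangles on $[1, 3]$, with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0]
\[ \int_1^3 e^{-x^2}\,dx \approx L_5 = \frac{2}{5} \big( f(1) + f(1.4) + f(1.8) + f(2.2) + f(2.6) \big) \]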
There's nothing particularly special about the left endpoints, so we could just as easily use the right endpoints
\[ x_k^* = a + k\,\frac{b - a}{n}. \]
DEFINITION
The right endpoint rule approximates the integral $\int_a^b f(x)\,dx$ by the sum
\[ R_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left( a + k\,\frac{b - a}{n} \right). \]
Example 3.2
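For our example $\int_1^3 e^{-x^2}\,dx$, this now gives
[Figure: right endpoint rectangles on $[1, 3]$, with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0]
\[ \int_1^3 e^{-x^2}\,dx \approx R_5 = \frac{2}{5} \big( f(1.4) + f(1.8) + f(2.2) + f(2.6) + f(3) \big) \]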
Left versus right
If we compare the formulas for $L_n$ and $R_n$, we see they only differ in two terms out of the entire sum:
\[ L_n = \frac{b - a}{n} \big( f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) \big) \]
\[ R_n = \frac{b - a}{n} \big( f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + f(x_n) \big) \]
So the only real difference is whether we include $f(a) = f(x_0)$ or $f(b) = f(x_n)$ in the sum.
Which is better?
Example 3.3
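For the particular example of $\int_1^3 e^{-x^2}\,dx$, we can see graphically that the left endpoint rule gives an overestimate and the right endpoint rule gives an underestimate. This happens because the integrand is decreasing on $[1, 3]$.
[Figure: left endpoint rectangles and right endpoint rectangles for this integral on $[1, 3]$]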
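The two endpoint rules can be compared numerically. The sketch below assumes the integrand $f(x) = e^{-x^2}$ on $[1, 3]$; the function names are invented for illustration, and the reference value uses the standard identity $\int_a^b e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}(\operatorname{erf}(b) - \operatorname{erf}(a))$, which is not part of these notes.

```python
import math


def left_rule(f, a, b, n):
    """L_n: sample f at the left endpoint of each of n equal subintervals."""
    dx = (b - a) / n
    return dx * sum(f(a + (k - 1) * dx) for k in range(1, n + 1))


def right_rule(f, a, b, n):
    """R_n: sample f at the right endpoint of each of n equal subintervals."""
    dx = (b - a) / n
    return dx * sum(f(a + k * dx) for k in range(1, n + 1))


def f(x):
    return math.exp(-x**2)  # assumed integrand for the running example


L5 = left_rule(f, 1, 3, 5)
R5 = right_rule(f, 1, 3, 5)
# Reference value via the error function (an outside fact, used only as a check).
reference = math.sqrt(math.pi) / 2 * (math.erf(3) - math.erf(1))
print(L5, R5, reference)
```

For this decreasing integrand the reference value lies between $R_5$ and $L_5$, and the two rules differ by exactly $\Delta x\,(f(a) - f(b))$.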
More accurate methods

We will see now that some surprisingly simple modifications of the formulas above give us much better approximations.
One idea is to compromise between the left and right endpoints, by choosing instead the midpoint of each subinterval,
\[ x_k^* = a + \left( k - \tfrac{1}{2} \right) \frac{b - a}{n}. \]
DEFINITION
The midpoint rule approximates the integral $\int_a^b f(x)\,dx$ by
\[ M_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left( a + \left( k - \tfrac{1}{2} \right) \frac{b - a}{n} \right). \]
[Figure: midpoint rule rectangles, with midpoints 1.2, 1.6, 2.0, 2.4, 2.8]
Trapezoid rule
Another way to improve the accuracy is not to approximate by a rectangle in each subinterval, but rather to approximate by a trapezoid, obtained by drawing a straight line from $(x_{k-1}, f(x_{k-1}))$ to $(x_k, f(x_k))$:
[Figure: trapezoids on $[1, 3]$, with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0]
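In this case, we get on each subinterval $I_k = [x_{k-1}, x_k]$ a small trapezoid, the area of which is
\[ \frac{h_1 + h_2}{2}\,\Delta x = \frac{f(x_{k-1}) + f(x_k)}{2}\,\Delta x. \]
[Figure: a trapezoid with parallel sides $h_1$ and $h_2$ and width $\Delta x$]
DEFINITION
The trapezoid rule approximates the integral $\int_a^b f(x)\,dx$ by
\[ T_n = \frac{b - a}{2n} \sum_{k=1}^{n} \left[ f\!\left( a + (k - 1)\frac{b - a}{n} \right) + f\!\left( a + k\,\frac{b - a}{n} \right) \right]. \]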
We can also write

\[
\begin{aligned}
T_n &= \Delta x\,\frac{f(x_0) + f(x_1)}{2} + \Delta x\,\frac{f(x_1) + f(x_2)}{2} + \cdots + \Delta x\,\frac{f(x_{n-1}) + f(x_n)}{2} \\
&= \frac{\Delta x}{2} \big( f(x_0) + f(x_1) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + f(x_n) \big) \\
&= \frac{\Delta x}{2} \big( f(x_0) + 2f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-1}) + f(x_n) \big).
\end{aligned}
\]
Notice also
\[ T_n = \frac{1}{2}(L_n + R_n). \]
The trapezoid rule is thus the average of the left and right endpoint rules. We will see that this averaging process makes the errors for $T_n$ much smaller than for either $L_n$ or $R_n$!
Example 3.4
[Figure: trapezoids on $[1, 3]$, with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0]
\[ T_5 = \frac{2}{2 \cdot 5} \big( f(1) + 2f(1.4) + 2f(1.8) + 2f(2.2) + 2f(2.6) + f(3) \big) \]
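The identity $T_n = \frac{1}{2}(L_n + R_n)$ is easy to confirm numerically. The sketch below is an illustration with invented function names, again assuming the integrand $e^{-x^2}$ on $[1, 3]$.

```python
import math


def left_rule(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + k * dx) for k in range(n))


def right_rule(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + k * dx) for k in range(1, n + 1))


def trapezoid_rule(f, a, b, n):
    """T_n = (dx / 2) * (f(x_0) + 2 f(x_1) + ... + 2 f(x_{n-1}) + f(x_n))."""
    dx = (b - a) / n
    interior = sum(f(a + k * dx) for k in range(1, n))
    return dx / 2 * (f(a) + 2 * interior + f(b))


def f(x):
    return math.exp(-x**2)  # assumed integrand for the running example


T5 = trapezoid_rule(f, 1, 3, 5)
average = (left_rule(f, 1, 3, 5) + right_rule(f, 1, 3, 5)) / 2
print(T5, average)
```

The two printed values agree up to floating-point rounding.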
Error bounds
Relatively straightforward calculus methods, omitted in these notes, let
us estimate the accuracy of our approximations:
THEOREM (Error estimates for midpoint and trapezoid rules)
Assume the function $f$ is twice differentiable on the interval $[a, b]$, with
\[ |f''(x)| \le C \quad (a \le x \le b) \]
for some constant $C$. Then
\[ \left| \int_a^b f\,dx - M_n \right| \le \frac{C (b - a)^3}{24 n^2} \]
and
\[ \left| \int_a^b f\,dx - T_n \right| \le \frac{C (b - a)^3}{12 n^2}. \]
Interpretation
We say that the midpoint and trapezoid rules are of order $\frac{1}{n^2}$. Since the step size is $\Delta x = \frac{b - a}{n}$, we can equivalently say that these methods are of order $(\Delta x)^2$. This means, loosely speaking, that if we double the number of points from $n$ to $2n$, the error should drop to about $\frac{1}{4}$ of its previous size.
It turns out that the left- and right-endpoint rules are only of order $\frac{1}{n}$ (equivalently, of order $\Delta x$). Since $\frac{1}{n^2}$ is much, much smaller than $\frac{1}{n}$ for large $n$, the midpoint and trapezoid rules are much more accurate.
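A quick experiment makes the orders visible. The sketch below is only an illustration: it uses the test integral $\int_0^1 e^x\,dx = e - 1$, chosen here because its exact value is known, and hypothetical helper names. Doubling $n$ should roughly halve the left endpoint error and roughly quarter the midpoint and trapezoid errors.

```python
import math


def left_rule(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + k * dx) for k in range(n))


def midpoint_rule(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1))


def trapezoid_rule(f, a, b, n):
    dx = (b - a) / n
    interior = sum(f(a + k * dx) for k in range(1, n))
    return dx / 2 * (f(a) + 2 * interior + f(b))


def error_ratio(rule, n):
    """Error with n subintervals divided by the error with 2n subintervals."""
    exact = math.e - 1  # exact value of the test integral of e^x over [0, 1]
    coarse = abs(rule(math.exp, 0, 1, n) - exact)
    fine = abs(rule(math.exp, 0, 1, 2 * n) - exact)
    return coarse / fine


for rule in (left_rule, midpoint_rule, trapezoid_rule):
    print(rule.__name__, error_ratio(rule, 50))
```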
More sophisticated approximations are of even higher order:
Simpson's rule
DEFINITION
If $n$ is an even integer, Simpson's rule approximates the integral $\int_a^b f(x)\,dx$ by
\[ S_n = \frac{\Delta x}{3} \big( f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{n-1}) + f(x_n) \big). \]
It turns out that
\[ S_n = \frac{4T_n - T_{n/2}}{3}, \]
and it can be shown that
\[ \left| \int_a^b f\,dx - S_n \right| \le \frac{K (b - a)^5}{180 n^4}, \]
provided $|f^{(4)}(x)| \le K$ for all $a \le x \le b$. So Simpson's rule is of order $\frac{1}{n^4}$.
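Both statements about Simpson's rule can be tested on a case with a known answer. The sketch below uses $\int_0^1 e^x\,dx = e - 1$ as an assumed test integral, for which $|f^{(4)}(x)| \le e$ on $[0, 1]$; the function names are chosen here for illustration.

```python
import math


def trapezoid_rule(f, a, b, n):
    dx = (b - a) / n
    interior = sum(f(a + k * dx) for k in range(1, n))
    return dx / 2 * (f(a) + 2 * interior + f(b))


def simpson_rule(f, a, b, n):
    """S_n with weights 1, 4, 2, 4, ..., 2, 4, 1; n must be even."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    dx = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        weight = 4 if k % 2 == 1 else 2
        total += weight * f(a + k * dx)
    return dx / 3 * total


n = 10
S = simpson_rule(math.exp, 0, 1, n)
combo = (4 * trapezoid_rule(math.exp, 0, 1, n) - trapezoid_rule(math.exp, 0, 1, n // 2)) / 3
bound = math.e * (1 - 0) ** 5 / (180 * n**4)  # K = e bounds the 4th derivative on [0, 1]
print(S, combo, abs(S - (math.e - 1)), bound)
```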
Section 4
Applications of integration
A. Defining new functions

Many important functions used in the theoretical and applied sciences are
defined via integrals.
Example 4.1 (Logarithms as integrals)

We earlier reminded you about the natural logarithm $\ln$, a key formula for which is
\[ \ln(xy) = \ln x + \ln y \quad (x, y > 0). \]
But how do we know that a function with this useful property even exists?
A systematic approach is to define the natural logarithm by the formula
\[ \ln x = \int_1^x \frac{1}{t}\,dt \quad (x > 0); \]
and then to prove that the natural log, so defined, really does satisfy $\ln(xy) = \ln x + \ln y$. When we have later developed the relevant calculus skills, we will do this.
The foregoing also provides an interesting geometric interpretation of the number $e$. It is that value of the upper limit of integration for which
\[ \int_1^e \frac{1}{t}\,dt = 1. \]
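The integral definition of the logarithm can be explored numerically. The sketch below approximates $\int_1^x \frac{1}{t}\,dt$ with the midpoint rule; the function name `ln_by_integral` and the number of subintervals are choices made here, and the checks are numerical evidence rather than a proof.

```python
import math


def ln_by_integral(x, n=100_000):
    """Midpoint-rule approximation of the integral of 1/t from 1 to x (x > 0)."""
    dt = (x - 1) / n
    return dt * sum(1 / (1 + (k - 0.5) * dt) for k in range(1, n + 1))


area_up_to_e = ln_by_integral(math.e)
product_gap = ln_by_integral(6) - (ln_by_integral(2) + ln_by_integral(3))
print(area_up_to_e, product_gap)
```

The area from 1 to $e$ comes out very close to 1, and $\ln 6$ matches $\ln 2 + \ln 3$ to within the approximation error.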
Example 4.2 (The Gamma function)

The Gamma function is
\[ \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt. \]
This improper integral exists for all positive real numbers $x$.

Later, after we have developed more integration techniques, we will derive some interesting formulas for the Gamma function. In particular, $\Gamma(n) = (n - 1)!$ for all positive integers $n$.
B. Length of curves
If $f : [a, b] \to \mathbb{R}$ is a function given by some explicit formula, then all the geometric properties of the curve determined by the graph of $f$ must somehow be contained within the formula. How can we extract this information?
One important use of calculus is providing ways for us to compute various geometric properties, for instance the length of curves:
Example 4.3 (Length of curves)

The length $L$ of the curve determined by the graph of $f$ is given by
\[ L = \int_a^b \sqrt{1 + (f')^2}\,dx. \]
C. Approximating functions by polynomials

We discussed earlier the problem of approximating a given function $f$ by a simpler polynomial of the form
\[ g(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0. \]
One solution is to use the Taylor polynomial
\[ g(x) = T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!} (x - a)^k, \]
introduced earlier. However, we observed earlier that this approximation requires that we have available very detailed information about the function $f$ at the specific point $x = a$. We need to know $f(a), f'(a), f''(a), \dots, f^{(n)}(a)$, and these would be essentially impossible to find if, say, $f$ were determined by experimental data.
We need another, more robust way to approximate by polynomials.
One very useful idea is to use integrals to measure the error of our approximations.
For this, let us assume that $f : [a, b] \to \mathbb{R}$ is given, and then define the integral error function
\[
\begin{aligned}
E(a_0, a_1, \dots, a_{n-1}, a_n) &= \int_a^b \big( f(x) - g(x) \big)^2\,dx \\
&= \int_a^b \big( f(x) - (a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0) \big)^2\,dx.
\end{aligned}
\]
The idea now is to select the coefficients $a_0, a_1, \dots, a_{n-1}, a_n$ to minimize this error.
This however requires that we know how to minimize the function $E(a_0, a_1, \dots, a_{n-1}, a_n)$ depending on $n + 1$ variables, and this is beyond the scope of Math 10. But in practice computers can quickly compute the answers to high precision.
D. Integrating densities
Example 4.4 (Chemical concentration)
Suppose that some chemical (say, an insecticide) is spread unevenly along a thin strip of land. We may for simplicity assume the region to be one-dimensional, lying along the $x$-axis. Let
\[ \rho(x) = \text{concentration of the chemical at } x. \]
What is the total amount of insecticide spread in the region $a \le x \le b$?
The total amount of the chemical between $a$ and $b$ is
\[ \int_a^b \rho(x)\,dx. \]
($\rho$ = rho.)
Example 4.5 (Mass density)

Suppose that a straight piece of wire is made of a mixture of two metals, the proportion of which changes along the wire. Assume for simplicity the wire is one-dimensional and that
\[ \rho(x) = \text{mass density of the wire at } x. \]
What is the total mass of the wire for $a \le x \le b$?
The total mass is
\[ \int_a^b \rho(x)\,dx. \]
These two examples illustrate the point that the total amount of any quantity between the points $a$ and $b$ is the integral of its density over the interval $[a, b]$.
E. Integral test for series convergence

THEOREM
Suppose that $f : (0, \infty) \to [0, \infty)$ is a nonnegative, decreasing function. Set
\[ a_k = f(k) \quad (k = 1, 2, \dots). \]
Then $\sum_{k=1}^{\infty} a_k$ converges if and only if
\[ \int_1^{\infty} f(x)\,dx < \infty. \]
To see why this is true, look at the pictures below, which show geometrically that
\[ \sum_{k=1}^{\infty} a_k \ge \int_1^{\infty} f(x)\,dx \ge \sum_{k=2}^{\infty} a_k. \]
[Figure: the graph of $y = f(x)$ shown twice, with rectangles of areas $a_1, a_2, a_3, a_4$]
Example 4.6
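Show that
\[ \sum_{k=1}^{\infty} \frac{1}{k^p} \]
converges if $p > 1$.
SOLUTION: We will learn later that if $b > a > 0$, then
\[ \int_a^b \frac{1}{x^p}\,dx = \frac{a^{1-p} - b^{1-p}}{p - 1}. \]
Therefore
\[ \int_1^{\infty} \frac{1}{x^p}\,dx = \lim_{b \to \infty} \int_1^b \frac{1}{x^p}\,dx = \lim_{b \to \infty} \frac{1 - b^{1-p}}{p - 1} = \frac{1}{p - 1} \]
is finite. Note that $\lim_{b \to \infty} b^{1-p} = 0$, since $p > 1$.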
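The comparison between sums and integrals can be checked on finite pieces. The sketch below takes $p = 2$ as an assumed example and uses invented helper names; it verifies the finite version of the inequality, $\sum_{k=2}^{N} a_k \le \int_1^N f(x)\,dx \le \sum_{k=1}^{N-1} a_k$, which follows from the same rectangle picture.

```python
def partial_sum(p, terms):
    """Sum of 1 / k^p for k = 1, ..., terms."""
    return sum(1 / k**p for k in range(1, terms + 1))


def integral_1_to_b(p, b):
    """Integral of 1 / x^p from 1 to b, using the formula quoted in the solution."""
    return (1 - b ** (1 - p)) / (p - 1)


p, N = 2, 1000
lower = partial_sum(p, N) - 1        # a_2 + ... + a_N
middle = integral_1_to_b(p, N)
upper = partial_sum(p, N - 1)        # a_1 + ... + a_{N-1}
print(lower, middle, upper)
```

In particular every partial sum stays below $1 + \frac{1}{p - 1}$, which is why the series converges.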
F. Integration and probabilities
In this section we will learn how integration can help us compute the
probabilities of certain random events.
We provide first some introductory motivation for the idea that areas
(and therefore integration) are somehow related to probabilities.
Example 4.7 (Simulating coin tosses)

- Flip a fair coin 200 times.
- Record the number of heads out of the 200 flips.
- Repeat the process $N$ times.
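The three steps can be simulated with Python's standard `random` module. This is a sketch under stated assumptions: the function name `simulate_heads`, the fixed seed, and the choice $N = 1000$ are illustrative and not taken from the notes.

```python
import random


def simulate_heads(num_repeats, flips=200, seed=0):
    """Return the number of heads in `flips` fair coin tosses, repeated num_repeats times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.random() < 0.5 for _ in range(flips))
        for _ in range(num_repeats)
    ]


counts = simulate_heads(1000)
mean_heads = sum(counts) / len(counts)
print(min(counts), max(counts), mean_heads)
```

Sorting these counts into bins and applying the height formula from Section 1 gives histograms of the kind described next.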
Histogram for N = 100 times:
[Figure: histogram titled "200 Coin Tosses"; horizontal axis: Number of Heads (80 to 120); vertical axis: Density]
Histogram for number of heads in 200 coin tosses repeated 1,000 times:
[Figure: histogram titled "200 Coin Tosses"; horizontal axis: Number of Heads (80 to 120); vertical axis: Density]
Histogram for number of heads in 200 coin tosses repeated 10,000 times:
[Figure: histogram titled "200 Coin Tosses"; horizontal axis: Number of Heads (80 to 120); vertical axis: Density]
[Figure: the same histogram with a smooth curve drawn over it in blue]
This function in blue looks like a smooth version of our step function!
What is this function?
DEFINITION
A Gaussian function is a function having the formula
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}. \]
Gaussian functions comprise a family of bell-shaped curves, each determined by the parameters $\mu \in \mathbb{R}$ and $\sigma > 0$.
As we see in the picture below, $\mu$ gives the center of the bell-shaped curve. The parameter $\sigma$ determines the thickness and height of the curve. We call $\mu$ the mean and $\sigma$ the standard deviation, and will later explain the probabilistic meaning of these terms.
($\mu$ = mu, $\sigma$ = sigma)
[Figure: the graph of $f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, plotted for $x$ from 70 to 130]
Examples
[Figure: Gaussian functions with ($\mu = 0$, $\sigma = 0.5$), ($\mu = 0$, $\sigma = 1$), ($\mu = 0$, $\sigma = 7$) and ($\mu = 8$, $\sigma = 3$)]
We will see later that a Gaussian function $f$ corresponds to a normal (or Gaussian) probability distribution. In particular,
- Total area under the curve is always 1
- The graph of $f$ is symmetric around $\mu$: $f(\mu + x) = f(\mu - x)$

[Figure: Normal Distribution curve, with Area = 1 under it]
DEFINITION
The standard normal distribution has mean $\mu = 0$, standard deviation $\sigma = 1$ and is therefore
\[ f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}. \]
The area to the right of 0 equals $\frac{1}{2}$, and the area to the left of 0 equals $\frac{1}{2}$.
[Figure: two Standard Normal Distribution curves, each with one half shaded, Area = 0.5]
For the standard normal distribution,
- the area between -1 and 1 is approximately 0.68,
- the area between -2 and 2 is approximately 0.95,
- the area between -3 and 3 is approximately 0.997.

[Figure: two Standard Normal Distribution curves, with shaded Area = 0.68 and Area = 0.95]
[Figure: two Standard Normal Distribution curves, with Shaded area = 1 - 0.68 and Shaded area = 0.5 × 0.32]
We can use the standard normal to calculate areas under the curve for
any Gaussian distribution.
Example 4.8
Suppose we have a normal distribution with $\mu = 50$ and $\sigma = 5$. What is the area under the curve to the left of 40?
SOLUTION: We first convert 40 to standard units, by subtracting the mean and dividing by the standard deviation:
\[ \frac{40 - \mu}{\sigma} = \frac{40 - 50}{5} = -2. \]
We now need to find the area to the left of $-2$ for the standard normal distribution. For this, we can use an online applet [1] from the UC Berkeley Statistics Department to evaluate numerically areas under the curve of the standard normal (with $\mu = 0$, $\sigma = 1$).
[1] http://statistics.berkeley.edu/~stark/Java/Html/NormHiLite.htm
Using the applet, we learn that the area under the curve of the standard normal between -2 and 0 is approximately .477.
Since the total area under the curve to the left of 0 is .5, it follows that the area to the left of $-2$ is approximately
\[ .5 - .477 = .023 \]
Example 4.9 (Women's heights)

Assume that US women's heights are normally distributed with mean 63 inches and standard deviation 3 inches.
About what percentage of US women are taller than 66 inches?
SOLUTION: Geometrically, we want to calculate the area to the right of 66. For our data, $\mu = 63$ and $\sigma = 3$.
As before, we convert 66 to standard units:
\[ \frac{66 - \mu}{\sigma} = \frac{66 - 63}{3} = 1. \]
Using the online applet we learn that the area under the standard normal curve between 0 and 1 is approximately .341. Hence the area to the right of 1 is about
\[ .5 - .341 = .159 \]
So about 16% of women are taller than 66 inches.
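The applet lookups in the last two examples can be reproduced offline. The sketch below is an alternative to the applet, not the method used in the notes: it relies on the standard identity $\Phi(z) = \frac{1}{2}\big(1 + \operatorname{erf}(z/\sqrt{2})\big)$ for the area to the left of $z$ under the standard normal curve, with `math.erf` from the Python standard library. The helper names are invented here.

```python
import math


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_left(x, mu, sigma):
    """Area to the left of x for a normal distribution with mean mu and sd sigma."""
    return standard_normal_cdf((x - mu) / sigma)  # convert to standard units first


def area_right(x, mu, sigma):
    return 1 - area_left(x, mu, sigma)


example_4_8 = area_left(40, mu=50, sigma=5)
example_4_9 = area_right(66, mu=63, sigma=3)
within_one_sd = standard_normal_cdf(1) - standard_normal_cdf(-1)
print(example_4_8, example_4_9, within_one_sd)
```

The results agree with the rounded values .023, .159 and 0.68 quoted in the text.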
Introduction to computing probabilities

We have introduced the idea that areas under curves can be interpreted as probabilities, and now provide more mathematical details, which will be further elaborated later. In particular we will learn in Part III of this course about the concepts of a probability space
\[ (\Omega, P) \]
and a random variable
\[ X : \Omega \to \mathbb{R}. \]
Interpretation
More precise definitions will appear later, but for now think of the probability space $\Omega$ as some sort of mathematical model for random occurrences, for which $P$ means the probability. And think of $X$ as giving the random outcomes of experiments or measurements.
DEFINITION
The cumulative distribution function (cdf) of a random variable $X$ is the function
\[ F(x) = P(X \le x), \]
defined for $-\infty < x < \infty$. In other words, $F(x)$ is the probability that $X \le x$.
$F$ maps real numbers to a probability value in $[0, 1]$:
\[ F : \mathbb{R} \to [0, 1]. \]
The cumulative distribution function is increasing and satisfies
\[ \lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1. \]
DEFINITION
The probability density function (pdf) of a random variable $X$ is a nonnegative function $f$ that has the following properties:
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- The probability that $X$ falls in the interval $(a, b)$ is the area under the density function between $a$ and $b$:
  \[ P(a \le X \le b) = \int_a^b f(x)\,dx. \]

So when a random variable $X$ has a pdf $f$, we can calculate probabilities by integrating $f$.
In particular, $P(X = c) = \int_c^c f(x)\,dx = 0$. And since $P(X = c) = 0$, we don't need to worry about endpoints:
\[ P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b). \]
Example 4.10
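As noted earlier, the normal distribution has as its probability density function the Gaussian function
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}. \]
[Figure: the normal density, with the horizontal axis marked from $\mu - 4\sigma$ to $\mu + 4\sigma$]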
Example 4.11
The uniform distribution gives probabilities for a continuous random variable that takes values in the interval $(a, b)$, with each value equally likely. The probability density function is
\[ f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a < x < b \\ 0 & \text{otherwise.} \end{cases} \]
[Figure: Uniform Distribution for $(-3, 3)$]
Using the pdf to find the cdf

If we let $a = -\infty$ and $b = x$, we can use the pdf to find the cdf:
\[ F(x) = P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} f(y)\,dy \]
[Figure: Normal Distribution density $f(x)$, with the area $F(1)$ marked]
Using the cdf to find the pdf

Now if we focus on area under the curve, we can use the cdf to find the pdf. Namely, $f$ can be recovered from $F$ in the following sense:
\[
\begin{aligned}
F(b) - F(a) &= P(X \le b) - P(X \le a) = P(a \le X \le b) \\
&= \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx = \int_a^b f(x)\,dx
\end{aligned}
\]
[Figure: Normal Distribution density $f(x)$, with the area $F(1) - F(-1)$ marked]
Mathematical relationship between pdf and cdf

Our discussion thus far shows that for continuous random variables, we have
\[ F(x) = \int_{-\infty}^{x} f(y)\,dy \]
and
\[ \int_a^b f(y)\,dy = F(b) - F(a). \]
There is a very important relationship between the functions $F$ and $f$ that can explain both of these properties: $f$ is the derivative of $F$:
\[ f = F'. \]
The properties above follow from the Fundamental Theorem of Calculus, which we discuss next.
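Both relationships can be observed numerically for the standard normal distribution. The sketch below is illustrative: the function names are invented, the cdf is written with `math.erf` (a standard identity that is not derived in these notes), the integral is approximated with the midpoint rule, and the derivative with a symmetric difference quotient.

```python
import math


def pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))


def cdf(x, mu=0.0, sigma=1.0):
    """Normal cdf written with the error function (used here as a known identity)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def midpoint_integral(f, a, b, n=10_000):
    dx = (b - a) / n
    return dx * sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1))


# The integral of f from a to b should equal F(b) - F(a).
area = midpoint_integral(pdf, -1, 1)
cdf_difference = cdf(1) - cdf(-1)

# The slope of F at a point should equal f at that point.
h = 1e-5
slope = (cdf(0.5 + h) - cdf(0.5 - h)) / (2 * h)
print(area, cdf_difference, slope, pdf(0.5))
```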
Section 5
Antiderivatives, Fundamental Theorem of
Calculus
A. Antiderivatives
- When you learn how to add, you then learn how to undo the addition via subtraction.
- When you learn how to multiply, you then learn how to undo multiplication via division.

So far this semester we have learned how to take the derivative of a function. Now we ask the reverse: can we undo a derivative? Yes, using antidifferentiation.
Example 5.1
Input: $f(x) = x^2$. Output: $F(x) = \frac{1}{3} x^3$.
Input: $f(x) = \frac{1}{x} + \sin x$. Output: $F(x) = \ln x - \cos x + 5$.
DEFINITION
Given the function f , a function F is called an antiderivative of f on the
interval (a, b) if
$F'(x) = f(x)$
for all x in (a, b).
Example 5.2
If f is a function which describes how some quantity is changing over
time t, an antiderivative F determines the amount of the quantity at any
time, up to an additive constant.
The location of a car is an antiderivative of its velocity.
The velocity of a car is an antiderivative of its acceleration.
THEOREM
If $F$ is an antiderivative of $f$ and $G$ is an antiderivative of $g$, then $F + G$ is an antiderivative of $f + g$.
Proof.
This follows directly from the corresponding property for derivatives, since
\[ (F + G)' = F' + G' = f + g. \]
THEOREM
If $F$ is an antiderivative of $f$ and $c$ is a constant, then $cF$ is an antiderivative of $cf$.
Proof.
By the constant multiple rule for differentiation,
\[ (cF)' = cF' = cf. \]
If $F$ and $G$ are antiderivatives of $f$ and $g$, respectively, it is in general NOT true that $FG$ is an antiderivative of $fg$.
Example 5.3
$F(x) = \frac{x^2}{2}$ is an antiderivative of $f(x) = x$, and $G(x) = \frac{x^3}{3}$ is an antiderivative of $g(x) = x^2$, but
\[ F(x)G(x) = \frac{x^2}{2} \cdot \frac{x^3}{3} = \frac{x^5}{6} \]
is NOT an antiderivative of $f(x)g(x) = x \cdot x^2 = x^3$.
Similarly, $\frac{F}{G}$ is generally NOT an antiderivative of $\frac{f}{g}$.
THEOREM (Antiderivatives differ by a constant)

Suppose that f is a function whose domain contains the interval (a, b),
and assume that F is an antiderivative of f on (a, b).
Then another function G is also an antiderivative of f on (a, b) if and
only if G = F + C , for some constant C .
Proof.
If $F' = f$, then
\[ (F + C)' = F' + (C)' = F' = f. \]
Consequently, if $F$ is an antiderivative of $f$, then so is $F + C$.
Conversely, if $F$ is an antiderivative of $f$ on $(a, b)$, then any other antiderivative $G$ must satisfy
\[ (G - F)' = G' - F' = f - f = 0. \]
This means that $G - F = C$ is constant.
Example 5.4 (Difference of antiderivatives)

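Consider the two functions
\[ F(x) = \frac{x - 1}{x + 1} \quad \text{and} \quad G(x) = \frac{-2}{x + 1}. \]
Differentiate:
\[ \left( \frac{x - 1}{x + 1} \right)' = \frac{(x + 1)(x - 1)' - (x - 1)(x + 1)'}{(x + 1)^2} = \frac{(x + 1) - (x - 1)}{(x + 1)^2} = \frac{2}{(x + 1)^2} \]
\[ \left( \frac{-2}{x + 1} \right)' = \big( (-2)(x + 1)^{-1} \big)' = \frac{2}{(x + 1)^2} \]
Thus $F$ and $G$ are antiderivatives of the same function. According to the preceding theorem, they must differ by a constant.
Check:
\[ F(x) - G(x) = \frac{x - 1}{x + 1} - \frac{-2}{x + 1} = \frac{x - 1 + 2}{x + 1} = 1. \]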
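The conclusion of this example can also be spot-checked numerically. The sketch below evaluates both functions at a few sample points with $x > -1$ (chosen here arbitrarily) and estimates the derivatives with a symmetric difference quotient; the helper `derivative` is a rough numerical stand-in, not an exact derivative.

```python
def F(x):
    return (x - 1) / (x + 1)


def G(x):
    return -2 / (x + 1)


def derivative(func, x, h=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (0.0, 0.5, 2.0, 10.0):
    common = 2 / (x + 1) ** 2  # the derivative computed in the example
    print(x, F(x) - G(x), derivative(F, x), derivative(G, x), common)
```

At every sample point the difference $F - G$ equals 1 and both slopes match $\frac{2}{(x+1)^2}$ up to the accuracy of the difference quotient.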
Example 5.5 (From pdf to cdf)

Consider the Gaussian function
f (x) =
(x)2
1
e 2 2 .
2
This, as we have seen, is the probability distribution function of the

normal distribution.
What is the probabilistic meaning of an antiderivative F of f ?
One antiderivative of the probability distribution function (pdf) is the
cumulative distribution function (cdf):
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy. \]
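To see the pdf and cdf relationship of Example 5.5 in numbers, one can compare a numerical integral of the pdf with the closed form of the normal cdf in terms of the error function. In the sketch below the function names, the parameter names `mu` and `sigma`, and the cut-off of the lower limit at `mu - 10 * sigma` are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Closed form of the normal cdf using the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def cdf_by_midpoint_rule(x, mu=0.0, sigma=1.0, n=20000):
    """Integrate the pdf from mu - 10*sigma up to x with the midpoint rule."""
    lower = mu - 10 * sigma  # the tail below this point is negligible
    width = (x - lower) / n
    total = sum(normal_pdf(lower + (k + 0.5) * width, mu, sigma) for k in range(n))
    return total * width


approx = cdf_by_midpoint_rule(1.0)
exact = normal_cdf(1.0)
print(approx, exact)
```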
Given a function f, an antiderivative F, if it exists, must be unique up to an
additive constant.
But do antiderivatives actually exist?
THEOREM (Existence of antiderivatives)

Suppose a < b and f is continuous on (a, b). Then there exists a
function F with domain (a, b) such that $F'(x) = f(x)$ for $x \in (a, b)$.
Antiderivatives always exist for the functions we will encounter in this
course, even though it can be difficult (or impossible!) to find simple
formulas for them. Here is a particularly tantalizing instance of this:
Example 5.6
The antiderivative F of $f(x) = e^{-x^2}$ has no simple formula.
Notation (Indefinite integral notation)

We use the notation
\[ \int f(x)\,dx = F(x) + C \]
to indicate that f is a function whose antiderivatives are all of the form
F(x) + C for some function F(x) and an arbitrary constant C.
The antiderivative symbol
\[ \int f(x)\,dx \]
is also called the indefinite integral of f.
Remark. Right now, there is no reason to assume that this symbol has
any connection to the notation introduced earlier for area under the
curve:
\[ \int_a^b f(x)\,dx. \]
However, we shall see later why this makes sense.
Example 5.7
Find all the antiderivatives of $f(x) = \frac{1}{x}$ on the domain
$\mathbb{R} \setminus \{0\} = (-\infty, 0) \cup (0, \infty)$. In other words, determine the indefinite
integral $\int \frac{1}{x}\,dx$.
SOLUTION: The domain consists of two intervals, which we will
analyze separately.
The interval $(0, \infty)$. We need to think of a function F such that
\[ F'(x) = \frac{1}{x}. \]
Remembering our earlier discussion, we recall for the natural logarithm
that
\[ (\ln x)' = \frac{1}{x}. \]
Thus we know that the antiderivatives of $f(x) = \frac{1}{x}$ on the interval
$(0, \infty)$ are the functions of the form $\ln x + C_1$ on that interval.
The interval $(-\infty, 0)$. We want to use the same idea as before, but we
can't use the function $\ln x$ because we can't take logs of negative
numbers. If x is negative, then $-x$ is positive, so consider:
\[ (\ln(-x))' = \frac{1}{-x} \cdot (-x)' = \frac{1}{-x} \cdot (-1) = \frac{1}{x}. \]
So, the antiderivatives of $\frac{1}{x}$ on $(-\infty, 0)$ are the functions $\ln(-x) + C_2$.
Conclusion. A function F is an antiderivative of 1/x on the domain
$(-\infty, 0) \cup (0, \infty)$ if and only if there are constants $C_1$ and $C_2$ such that
\[ F(x) = \begin{cases} \ln x + C_1 & \text{if } x \text{ is in } (0, \infty), \\ \ln(-x) + C_2 & \text{if } x \text{ is in } (-\infty, 0). \end{cases} \]
In practice, many people (mathematicians included!) only think about
the case where $C_1 = C_2$ and they write
\[ \int \frac{1}{x}\,dx = \ln|x| + C. \]
We can convert our rules for differentiation into rules for

antidifferentiation.
Example 5.8 (Antiderivatives of powers)

Suppose that p is a real number, $p \neq -1$. Then the antiderivatives of
\[ f(x) = x^p \]
on the interval $(0, \infty)$ are exactly the functions of the form
\[ F(x) = \frac{x^{p+1}}{p+1} + C. \]
Check the derivative:
\[ \left(\frac{x^{p+1}}{p+1}\right)' = \frac{1}{p+1}\left(x^{p+1}\right)' = \frac{1}{p+1}\,(p+1)\,x^{(p+1)-1} = x^p. \]
Example 5.9
There exists a unique function F on the interval $(-\infty, \infty)$ such that
$F(1) = 7$ and F is an antiderivative of $x^2$. Find F.
SOLUTION: We know that
\[ F(x) = \int x^2\,dx = \frac{x^3}{3} + C. \]
To find C, we plug in x = 1:
\[ F(1) = \frac{1}{3} + C = 7. \]
This implies
\[ C = 7 - \frac{1}{3} = \frac{20}{3}. \]
Hence the solution is the function
\[ F(x) = \frac{x^3}{3} + \frac{20}{3}. \]
Example 5.10
Find all antiderivatives of f (x) = ln x.
SOLUTION: After playing around with this for a while, we make the
guess
\[ F(x) = x \ln x - x. \]
Thereafter, we simply check its derivative:
\[ (x \ln x - x)' = (x \ln x)' - (x)' = x(\ln x)' + \ln x\,(x)' - 1 \quad \text{(product rule)} \]
\[ = x \cdot \frac{1}{x} + \ln x - 1 = \ln x. \]
So on the domain $(0, \infty)$, we have
\[ \int \ln x\,dx = x \ln x - x + C. \]
Important antiderivatives
You should learn the following antiderivatives:
\[ \int x^p\,dx = \frac{x^{p+1}}{p+1} + C \quad \text{if } p \neq -1 \]
\[ \int \frac{1}{x}\,dx = \ln|x| + C \]
\[ \int e^x\,dx = e^x + C \]
\[ \int \sin x\,dx = -\cos x + C \]
\[ \int \cos x\,dx = \sin x + C \]
Example 5.11
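\[ \int \frac{1}{x^3}\,dx = \int x^{-3}\,dx = \frac{x^{-2}}{-2} + C = -\frac{1}{2x^2} + C \]
\[ \int (5x^3 - 3x)\,dx = \frac{5x^4}{4} - \frac{3x^2}{2} + C \]
\[ \int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac{x^{3/2}}{3/2} + C. \]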
Fun on the internet

Go to Google and search for
Integral Calculator
or
Antiderivative Calculator
This will give you several options such as

integrals.wolfram.com
If you type in
sqrt(e^x) + sin(x)/cos(x),
then you will learn that
\[ \int \left( \sqrt{e^x} + \frac{\sin x}{\cos x} \right) dx = 2\sqrt{e^x} - \ln(\cos x) + C. \]
Next try
sqrt(e^x) * sin(x)/cos(x),
and also
e^(-x^2)
B. Fundamental Theorem of Calculus

We have defined the area under f between a and b to be
\[ \int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{k=1}^{n} f(x_k)\,\Delta x_k. \]
Even for very simple functions, calculating these definite integrals using
the Riemann sum definition can be very difficult.
We now introduce the Fundamental Theorem of Calculus, which ties
together integration and differentiation.
This will allow us to compute the area under the curve by the formula
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
THEOREM (Fundamental Theorem of Calculus)

(i) Suppose that f is a continuous function on [a, b]. If F is any
antiderivative of f on (a, b), then
\[ \int_a^b f(x)\,dx = F(b) - F(a). \]
Since $F' = f$, we can rewrite this to read
\[ \int_a^b F'(x)\,dx = F(b) - F(a). \]
(ii) If f is continuous on [a, b], then for a < x < b,
\[ \frac{d}{dx} \int_a^x f(t)\,dt = f(x). \]
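Part (i) of the theorem can be illustrated by comparing Riemann sums with the value F(b) - F(a). The sketch below assumes the sample integrand $f(x) = x^2$ on [1, 2] with antiderivative $x^3/3$; the function name `right_riemann_sum` and the equal-width partition are choices made for this illustration.

```python
def right_riemann_sum(f, a, b, n):
    """Sum of f(x_k) * dx over n equal subintervals, using right endpoints."""
    dx = (b - a) / n
    return sum(f(a + k * dx) for k in range(1, n + 1)) * dx


def f(x):
    return x**2


def F(x):
    """An antiderivative of f."""
    return x**3 / 3


a, b = 1.0, 2.0
exact_area = F(b) - F(a)
for n in (10, 100, 1000):
    # The Riemann sums approach F(b) - F(a) as n grows.
    print(n, right_riemann_sum(f, a, b, n), exact_area)
```

The Riemann sums need many terms to get close, while the antiderivative gives the value in one step.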
Area as a function
We can view the area under the curve y = f (x) between 0 and b as a
function of the unknown b:
[Figure: the shaded area F(b) under the curve y = f(x) between 0 and b.]
Let F(b) equal the shaded area under y = f(x) between 0 and b
as a function of b, as shown. The formula for that function is
\[ F(b) = \int_0^b f(x)\,dx. \]
Derivative of the area function

We compute the derivative $F'$ of the area function. By definition,
\[ F'(b) = \lim_{h \to 0} \frac{F(b+h) - F(b)}{h}. \]
For h > 0, $F(b+h) - F(b)$ is the area under f(x) between b and b + h:
\[ F(b+h) - F(b) = \int_0^{b+h} f(x)\,dx - \int_0^b f(x)\,dx \]
\[ = \int_0^b f(x)\,dx + \int_b^{b+h} f(x)\,dx - \int_0^b f(x)\,dx \]
\[ = \int_b^{b+h} f(x)\,dx. \]
Now, divide this by h and make h smaller and smaller. What do you get?
Example 5.12
Let's consider a concrete example with $f(x) = x^2$.
We can estimate $\int_b^{b+h} f(x)\,dx$ using a single rectangle. Since $x^2$ is
increasing for $x \ge 0$, the left endpoint rule gives an underestimate and the
right endpoint rule gives an overestimate:
\[ L_1 \le F(b+h) - F(b) \le R_1. \]
For $f(x) = x^2$, we get
\[ h \cdot b^2 \le F(b+h) - F(b) \le h \cdot (b+h)^2. \]
Consequently,
\[ b^2 \le \frac{F(b+h) - F(b)}{h} \le (b+h)^2. \]
Now, evaluate the limit for h > 0:
\[ \lim_{h \to 0} \frac{F(b+h) - F(b)}{h} = b^2. \]
A similar calculation for h < 0 yields the same limit.
Conclusion. For every b, we have
\[ F'(b) = b^2 = f(b). \]
Thus the area function F is an antiderivative of $f(x) = x^2$.
Example 5.13 (Area under a parabola)

We know that
\[ \int x^2\,dx = \frac{x^3}{3} + C. \]
So there is a constant C such that
\[ F(x) = \frac{x^3}{3} + C. \]
We must have C = 0, because $F(0) = \int_0^0 x^2\,dx = 0$.
Remarkable Conclusions: The area under the curve $y = x^2$ between 0
and b is equal to $F(b) = \frac{b^3}{3}$. For $0 \le a < b$, the area under the curve
$y = x^2$ between a and b equals
\[ F(b) - F(a) = \frac{b^3}{3} - \frac{a^3}{3}. \]
We find the area by simply evaluating an antiderivative at the
endpoints.
Example 5.14 (Using cdf to find pdf)

Recall that for probability distributions the integral of the pdf is the cdf.
That is, the cdf F(x) is an antiderivative of the pdf f(x):
\[ F(x) = P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} f(y)\,dy. \]
[Figure: two panels, "pdf of Normal Distribution" (f(x) against x) and "cdf of Normal Distribution", with the values F(-1), F(0) and F(1) marked.]
Example 5.15
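Compute $\int_1^2 (x^5 - x^3)\,dx$.
SOLUTION: Since $\int x^p\,dx = \frac{x^{p+1}}{p+1} + C$,
the function $F(x) = \frac{x^6}{6} - \frac{x^4}{4}$ is an antiderivative of $f(x) = x^5 - x^3$.
Compute
\[ \int_1^2 (x^5 - x^3)\,dx = \left( \frac{x^6}{6} - \frac{x^4}{4} \right) \Big|_1^2 = \left( \frac{2^6}{6} - \frac{2^4}{4} \right) - \left( \frac{1^6}{6} - \frac{1^4}{4} \right) \]
\[ = \left( \frac{64}{6} - \frac{16}{4} \right) - \left( \frac{1}{6} - \frac{1}{4} \right) = \frac{27}{4}. \]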
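The arithmetic in Example 5.15 is easy to get wrong by hand, so it can help to redo it with exact rational numbers. This sketch uses Python's `fractions.Fraction`; the function name `F` mirrors the antiderivative from the example.

```python
from fractions import Fraction


def F(x):
    """Antiderivative x^6/6 - x^4/4 of f(x) = x^5 - x^3, evaluated exactly."""
    x = Fraction(x)
    return x**6 / 6 - x**4 / 4


integral = F(2) - F(1)
print(integral)  # prints 27/4
```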
Notation
We will often write
\[ F(x)\Big|_a^b = F(b) - F(a). \]
Example 5.16
Compute $\int_0^{\pi} \sin x\,dx$.
SOLUTION: We know that $-\cos x$ is an antiderivative of $\sin x$.
So we have
\[ \int_0^{\pi} \sin x\,dx = (-\cos x)\Big|_0^{\pi} = (-\cos(\pi)) - (-\cos(0)) = (1) - (-1) = 2. \]
Example 5.17
Compute $\int_{-1}^{1} x^3\,dx$.
SOLUTION: We know that $\frac{x^4}{4}$ is an antiderivative of $x^3$.
So we have
\[ \int_{-1}^{1} x^3\,dx = \frac{x^4}{4}\Big|_{-1}^{1} = \frac{1^4}{4} - \frac{(-1)^4}{4} = 0. \]
Example 5.18
Find $\frac{d}{dx} \int_a^x \sin(t^2)\,dt$, where the lower limit a is a constant.
SOLUTION: By the Fundamental Theorem of Calculus, this is just
\[ \sin(x^2). \]
Example 5.19
Find $\frac{d}{dx} \int_x^3 \sin(t^2)\,dt$.
SOLUTION: We can't apply the Fundamental Theorem directly, but we
can do the following:
\[ \frac{d}{dx} \int_x^3 \sin(t^2)\,dt = -\frac{d}{dx} \int_3^x \sin(t^2)\,dt = -\sin(x^2). \]
Example 5.20
Find $\frac{d}{dx} \int_x^{x^2} f(t)\,dt$.
SOLUTION: Since x appears in both the upper and lower bounds of
integration, we split up the integral:
\[ \frac{d}{dx} \int_x^{x^2} f(t)\,dt = \frac{d}{dx} \left( \int_x^0 f(t)\,dt + \int_0^{x^2} f(t)\,dt \right) \]
\[ = \frac{d}{dx} \left( -\int_0^x f(t)\,dt + \int_0^{x^2} f(t)\,dt \right) \]
\[ = -f(x) + \underbrace{f(x^2)\,\frac{d}{dx} x^2}_{\text{Chain Rule}} \]
\[ = -f(x) + f(x^2) \cdot (2x) = 2x\,f(x^2) - f(x). \]
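The formula from Example 5.20 can be tested numerically for a specific integrand. The sketch below assumes $f(t) = \sin(t^2)$ and the point $x = 1.5$; the helpers `simpson` (composite Simpson's rule) and `area`, and the step size `h`, are illustrative choices rather than anything prescribed by the slides.

```python
import math


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def f(t):
    return math.sin(t * t)


def area(x):
    """Numerical value of the integral of f from x to x^2."""
    return simpson(f, x, x * x)


x, h = 1.5, 1e-4
numerical = (area(x + h) - area(x - h)) / (2 * h)  # central difference in x
formula = 2 * x * f(x * x) - f(x)                  # 2x f(x^2) - f(x)
print(numerical, formula)
```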
Section 6
Integration techniques
The limits of antidifferentiation

We've now seen that in order to compute $\int_a^b f(x)\,dx$, we need only
find an antiderivative of f.
Recall that every continuous function f has an antiderivative,
\[ F(x) = \int_0^x f(t)\,dt. \]
Finding antiderivatives explicitly can be extremely challenging,
however.
Next we'll see how to invert the chain rule and the product rule we
learned for computing derivatives.
However, many functions just do not have simple antiderivatives. In
particular, there are no elementary formulas for the following:
\[ \int e^{-x^2}\,dx, \quad \int \sin(x^2)\,dx, \quad \int \cos(x^2)\,dx, \]
\[ \int \frac{e^x}{x}\,dx, \quad \int \frac{\sin x}{x}\,dx, \quad \int \frac{\cos x}{x}\,dx. \]
A. Substitution, changing variables

The Chain Rule states
\[ (F(g(x)))' = f(g(x))\,g'(x), \]
whenever $F' = f$, and therefore
\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_a^b (F(g(x)))'\,dx = F(g(b)) - F(g(a)) = \int_{g(a)}^{g(b)} f(u)\,du. \]
This gives the substitution formula
\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du. \]
We can think of the substitution formula as giving us a way to change

variables from x to u = g (x), in which case we have the very useful
mnemonic:
\[ du = g'(x)\,dx, \]
although strictly speaking the symbols du and dx are not defined by
themselves.
We can then write the substitution formula as
\[ \int f(g(x))\,g'(x)\,dx = \int f(u)\,du. \]
Our main purpose in finding antiderivatives is to evaluate definite
integrals. When using u-substitution, we can follow two routes:
1. Find an antiderivative as usual, and evaluate at the endpoints.
2. An alternative (and usually easier) method is to replace the bounds
of integration when we change variables.
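Example 6.1
Find $\int x e^{x^2}\,dx$.
SOLUTION: If we set $u = x^2$, then $du = 2x\,dx$. We obtain
\[ \int x e^{x^2}\,dx = \frac{1}{2} \int e^{x^2}\,2x\,dx = \frac{1}{2} \int e^u\,du = \frac{1}{2} e^u + C = \frac{1}{2} e^{x^2} + C. \]
We must always check our work:
\[ \frac{d}{dx} \left( \frac{1}{2} e^{x^2} + C \right) = \frac{1}{2} e^{x^2}\,\frac{d}{dx} x^2 + 0 = x e^{x^2}. \]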
Example 6.2
Find $\int \frac{\cos(\ln x)}{x}\,dx$.
SOLUTION: This one looks pretty awful, but if we make the substitution
$u = \ln x$, then $du = \frac{1}{x}\,dx$ and we have
\[ \int \frac{\cos(\ln x)}{x}\,dx = \int \cos u\,du = \sin u + C = \sin(\ln x) + C. \]
Again, we should check our work by computing the derivative of
$F(x) = \sin(\ln x)$.
Many integrals can be solved in multiple ways. By a previous theorem,
we know all antiderivatives will differ from each other by a constant.
Example 6.3
For example, we can find $\int \frac{x}{\sqrt{x^2+1}}\,dx$ in two different ways:
Method 1. Set $u = x^2 + 1$, so $du = 2x\,dx$ and $x\,dx = \frac{1}{2}\,du$:
\[ \int \frac{x}{\sqrt{x^2+1}}\,dx = \frac{1}{2} \int \frac{du}{\sqrt{u}} = \frac{1}{2} \int u^{-1/2}\,du = \frac{1}{2} \cdot \frac{u^{1/2}}{1/2} + C = \sqrt{x^2+1} + C. \]
Method 2. Set $u = \sqrt{x^2+1}$. Then $du = \frac{2x}{2\sqrt{x^2+1}}\,dx$, and we get
\[ \int \frac{x}{\sqrt{x^2+1}}\,dx = \int du = u + C = \sqrt{x^2+1} + C. \]
Both methods give the same answer.
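Example 6.4
Find $\int x^5 \sqrt{1+x^2}\,dx$.
SOLUTION: Let's try the substitution $u = 1 + x^2$. Then $du = 2x\,dx$, so
$x\,dx = \frac{du}{2}$:
\[ \int x^5 \sqrt{1+x^2}\,dx = \int (x^2)^2 \sqrt{1+x^2}\;x\,dx \]
\[ = \frac{1}{2} \int (u-1)^2 \sqrt{u}\,du \]
\[ = \frac{1}{2} \int \left( u^{5/2} - 2u^{3/2} + u^{1/2} \right) du \]
\[ = \frac{1}{2} \left( \frac{u^{7/2}}{7/2} - 2\,\frac{u^{5/2}}{5/2} + \frac{u^{3/2}}{3/2} \right) + C \]
\[ = \frac{(1+x^2)^{7/2}}{7} - \frac{2(1+x^2)^{5/2}}{5} + \frac{(1+x^2)^{3/2}}{3} + C. \]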
Example 6.5
Find $\int_1^e \frac{\ln x}{x}\,dx$.
SOLUTION: We'll use $u = \ln x$, so $du = \frac{dx}{x}$.
Note that u(1) = 0 and u(e) = 1. Thus we have
\[ \int_1^e \frac{\ln x}{x}\,dx = \int_0^1 u\,du = \frac{u^2}{2}\Big|_0^1 = \frac{1}{2} - 0 = \frac{1}{2}. \]
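Changing the bounds along with the variable, as in Example 6.5, can be sanity-checked by integrating both sides numerically. The sketch below uses a simple midpoint rule; the helper name `midpoint_rule` and the number of subintervals are assumptions made for illustration.

```python
import math


def midpoint_rule(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


# Original variable x on [1, e].
in_x = midpoint_rule(lambda x: math.log(x) / x, 1.0, math.e)
# After u = ln x, the bounds become u(1) = 0 and u(e) = 1.
in_u = midpoint_rule(lambda u: u, 0.0, 1.0)
print(in_x, in_u)
```

Both values should agree with the exact answer 1/2 up to a small numerical error.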
Example 6.6 (Normalizing constant for a cdf)

We wish to define a continuous probability distribution on the interval
(1, e), by means of a probability distribution function of the form
\[ f(x) = \frac{1}{Z} \cdot \frac{\ln x}{x}. \]
How should the constant Z be chosen?
SOLUTION: We want
\[ \int_1^e f(x)\,dx = 1. \]
Equivalently,
\[ \frac{1}{Z} \int_1^e \frac{\ln x}{x}\,dx = 1. \]
Therefore the previous example implies
\[ Z = \int_1^e \frac{\ln x}{x}\,dx = \frac{1}{2}. \]
Example 6.7
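Find $\int_3^5 \frac{dx}{(2-3x)^2}$.
SOLUTION: We use $u = 2 - 3x$, so $du = -3\,dx$ and thus $dx = -du/3$.
Also $u(3) = -7$, $u(5) = -13$, and so we have
\[ \int_3^5 \frac{1}{(2-3x)^2}\,dx = -\frac{1}{3} \int_{-7}^{-13} \frac{1}{u^2}\,du = \frac{1}{3} \cdot \frac{1}{u}\Big|_{-7}^{-13} = \frac{1}{3} \left( -\frac{1}{13} + \frac{1}{7} \right) = \frac{2}{91}. \]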
Example 6.8 (More on logarithms)

Recall that we have defined
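\[ \ln x = \int_1^x \frac{1}{t}\,dt \qquad (x > 0). \]
Let us now compute for x, y > 0:
\[ \ln(xy) = \int_1^{xy} \frac{1}{t}\,dt = \int_1^x \frac{1}{t}\,dt + \int_x^{xy} \frac{1}{t}\,dt \]
\[ = \ln x + \int_x^{xy} \frac{1}{t}\,dt \]
\[ = \ln x + \int_1^y \frac{1}{u}\,du \]
\[ = \ln x + \ln y, \]
where we substituted $u = \frac{t}{x}$, $du = \frac{dt}{x}$.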
Consequently, if we define the natural logarithm by the integral formula

\[ \ln x = \int_1^x \frac{1}{t}\,dt, \]
we can then deduce the standard formula
\[ \ln(xy) = \ln x + \ln y. \]
It is an interesting exercise to use the definition to show also that
\[ \ln(x^y) = y \ln x \qquad (x > 0,\ y \in \mathbb{R}). \]
B. Symmetry: even and odd functions

We now turn our attention to the very special case of definite integrals of
the form $\int_{-a}^{a} f(x)\,dx$ for functions f that have special symmetries:
DEFINITION
The function f is called even if $f(-x) = f(x)$.
The function f is called odd if $f(-x) = -f(x)$.
The terms even and odd come from the power functions: $x^2, x^3, x^4$, etc.
[Figure: graph of an even function, $f(-x) = f(x)$, and graph of an odd function, $f(-x) = -f(x)$.]
It looks like we should have $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$ for even functions
and $\int_{-a}^{a} f(x)\,dx = 0$ for odd functions. This is true.
THEOREM (Using symmetry)

If f is an odd function, then $\int_{-a}^{a} f(x)\,dx = 0$.
If f is an even function, then $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$.
Proof.
Suppose f is odd. Then
\[ \int_{-a}^{a} f(x)\,dx = \int_{-a}^{0} f(x)\,dx + \int_0^a f(x)\,dx \]
\[ = -\int_0^{-a} f(x)\,dx + \int_0^a f(x)\,dx \]
\[ = -\int_0^a f(-u)(-1)\,du + \int_0^a f(x)\,dx \qquad (u = -x,\ du = -dx) \]
\[ = -\int_0^a f(u)\,du + \int_0^a f(x)\,dx = 0. \]
The proof that $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$ for even functions is similar.
Example 6.9
Calculate $\int_{-2}^{2} \frac{\sin x}{4 + 3x^2 + 2x^4}\,dx$.
SOLUTION: Attempting to find an antiderivative would be a nightmare.
Luckily, the integrand is odd:
\[ f(-x) = -f(x). \]
So, without calculating anything at all, we can conclude
\[ \int_{-2}^{2} \frac{\sin x}{4 + 3x^2 + 2x^4}\,dx = 0. \]
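The symmetry argument of Example 6.9 can be confirmed with a crude numerical integration. In the sketch below, the helper `midpoint_rule` and the number of subintervals are assumptions for illustration; because the sample points are placed symmetrically about 0, the contributions of an odd integrand cancel in pairs.

```python
import math


def midpoint_rule(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h


def integrand(x):
    """Odd function: sin is odd and the denominator is even."""
    return math.sin(x) / (4 + 3 * x**2 + 2 * x**4)


whole = midpoint_rule(integrand, -2.0, 2.0)
print(whole)  # zero up to rounding error
```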
C. Integration by parts
Recall from the Product Rule that
\[ (fg)' = f'g + fg'. \]
Now integrate and use the Fundamental Theorem of Calculus, to learn
that
\[ \int_a^b f'(x)g(x)\,dx + \int_a^b f(x)g'(x)\,dx = \int_a^b (fg)'(x)\,dx = f(b)g(b) - f(a)g(a) = (fg)\Big|_a^b. \]
Rearranging gives the formula for integration by parts:
\[ \int_a^b f'(x)g(x)\,dx = (fg)\Big|_a^b - \int_a^b f(x)g'(x)\,dx. \]
Let us now write u = g (x) and v = f (x). Recalling the useful (but
mathematically imprecise) expressions
\[ du = g'\,dx, \quad dv = f'\,dx, \]
we can rewrite the integration by parts formula as
\[ \int u\,dv = uv - \int v\,du. \]
Whichever form of it we use, the point is that the integration by parts

formula gives us a way to move a derivative from one function onto
another within an integral. This quite often converts a difficult integral
into a simpler one, as we will see in subsequent examples.
Example 6.10
Find $\int x \sin x\,dx$.
SOLUTION: If we use $u = x$ and $dv = \sin x\,dx$, then $du = dx$ and
$v = -\cos x$. So we have:
\[ \int x \sin x\,dx = -x \cos x - \int (-\cos x)\,dx = -x \cos x + \int \cos x\,dx = -x \cos x + \sin x + C. \]
We should check our work:
\[ \frac{d}{dx} (-x \cos x + \sin x + C) = -\cos x - x(-\sin x) + \cos x = x \sin x. \]
Example 6.11
Find $\int \ln x\,dx$.
SOLUTION: This one isn't obviously a candidate problem for
integration by parts. But let us try $u = \ln x$ and $dv = dx$.
Then we get $du = \frac{1}{x}\,dx$ and $v = x$. Consequently,
\[ \int \ln x\,dx = x \ln x - \int x \cdot \frac{1}{x}\,dx = x \ln x - \int 1\,dx = x \ln x - x + C. \]
Check this answer!
Example 6.12
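Find $\int x \ln x\,dx$.
SOLUTION: We choose $u = \ln x$ and $dv = x\,dx$. Then $du = \frac{1}{x}\,dx$,
$v = \frac{x^2}{2}$, and therefore
\[ \int x \ln x\,dx = \ln x \cdot \frac{x^2}{2} - \int \frac{x^2}{2} \cdot \frac{1}{x}\,dx = \frac{x^2 \ln x}{2} - \int \frac{x}{2}\,dx = \frac{x^2 \ln x}{2} - \frac{x^2}{4} + C. \]
Again, confirm this answer.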
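Example 6.13
Find $\int x^2 e^{3x}\,dx$.
SOLUTION: Let's try $u = x^2$, $dv = e^{3x}\,dx$. Then $du = 2x\,dx$ and
$v = \frac{1}{3} e^{3x}$. Thus
\[ \int x^2 e^{3x}\,dx = \frac{1}{3} x^2 e^{3x} - \frac{2}{3} \int x e^{3x}\,dx. \]
We can now integrate by parts again, this time using $u = x$, $dv = e^{3x}\,dx$
and thus $du = dx$, $v = \frac{1}{3} e^{3x}$:
\[ \int x^2 e^{3x}\,dx = \frac{1}{3} x^2 e^{3x} - \frac{2}{3} \int x e^{3x}\,dx \]
\[ = \frac{1}{3} x^2 e^{3x} - \frac{2}{3} \left( \frac{1}{3} x e^{3x} - \frac{1}{3} \int e^{3x}\,dx \right) \]
\[ = \frac{1}{3} x^2 e^{3x} - \frac{2}{3} \left( \frac{1}{3} x e^{3x} - \frac{1}{3} \cdot \frac{1}{3} e^{3x} \right) + C \]
\[ = \frac{x^2 e^{3x}}{3} - \frac{2x e^{3x}}{9} + \frac{2 e^{3x}}{27} + C. \]
Question: why $+\,C$ and not $\frac{2C}{9}$?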
Repeated integration by parts
To find $\int x^2 e^{3x}\,dx$, we had to integrate by parts twice.
With a little work, you could find $\int x^3 e^{3x}\,dx$ by integrating by parts
3 times.
To find $\int x^n e^{3x}\,dx$, we would have to integrate by parts n times.
In statistics, these kinds of integrals are very useful for computing
moments of probability distributions.
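Repeating integration by parts n times can be organized as a recursion. Writing $I_n = \int x^n e^{3x}\,dx$, one integration by parts gives $I_n = \frac{1}{3} x^n e^{3x} - \frac{n}{3} I_{n-1}$, with $I_0 = \frac{1}{3} e^{3x}$. The sketch below is one possible way to carry this out exactly; the function name `antiderivative_coefficients` and the list representation of the polynomial are assumptions made for this illustration.

```python
from fractions import Fraction


def antiderivative_coefficients(n):
    """Return [c_0, ..., c_n] such that the integral of x^n e^(3x) dx
    equals e^(3x) * (c_0 + c_1 x + ... + c_n x^n) + C."""
    coeffs = [Fraction(1, 3)]  # n = 0: the integral of e^(3x) is e^(3x)/3
    for m in range(1, n + 1):
        # One integration by parts: I_m = x^m e^(3x)/3 - (m/3) I_(m-1).
        coeffs = [-Fraction(m, 3) * c for c in coeffs] + [Fraction(1, 3)]
    return coeffs


print(antiderivative_coefficients(2))
print(antiderivative_coefficients(3))
```

For n = 2 the coefficients are 2/27, -2/9 and 1/3, matching the answer in Example 6.13.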
Example 6.14 (Another trick)

Find $\int e^x \cos x\,dx$.
SOLUTION: Try $u = \cos x$, $dv = e^x\,dx$. Then $du = -\sin x\,dx$, $v = e^x$,
and
\[ \int e^x \cos x\,dx = e^x \cos x + \int e^x \sin x\,dx. \]
We integrate by parts again with $u = \sin x$, $dv = e^x\,dx$, and $du = \cos x\,dx$
and $v = e^x$. We now compute
\[ \int e^x \cos x\,dx = e^x \cos x + \int e^x \sin x\,dx = e^x \cos x + \left( e^x \sin x - \int e^x \cos x\,dx \right). \]
This implies $2 \int e^x \cos x\,dx = e^x \cos x + e^x \sin x$, and so
\[ \int e^x \cos x\,dx = \frac{e^x \cos x + e^x \sin x}{2} + C. \]
Example 6.15 (More on the Gamma Function)

Recall that the Gamma function is defined by the integral
\[ \Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt. \]
THEOREM
(i) The Gamma function satisfies
\[ \Gamma(x+1) = x\,\Gamma(x) \]
for all $x \in (0, \infty)$.
(ii) In particular, the Gamma function is an extension of the factorial
function:
\[ \Gamma(n+1) = n! \]
for nonnegative integers n.
Proof.
To prove this, we use integration by parts:
\[ \int_a^b u\,dv = (uv)\Big|_a^b - \int_a^b v\,du. \]
Take $u = t^x$ and $dv = e^{-t}\,dt$, so that $du = x t^{x-1}\,dt$ and $v = -e^{-t}$. Then
\[ \int_a^b t^x e^{-t}\,dt = -e^{-t} t^x \Big|_a^b - \int_a^b (-e^{-t})\,x t^{x-1}\,dt = e^{-a} a^x - e^{-b} b^x + x \int_a^b t^{x-1} e^{-t}\,dt. \]
Now take the limit $a \to 0$ and $b \to \infty$. Then the left hand side converges
to $\Gamma(x+1)$, and the right hand side becomes $x\,\Gamma(x)$. Since
\[ \Gamma(1) = \int_0^\infty e^{-t}\,dt = -e^{-t}\Big|_0^\infty = 1, \]
the formula stated in (ii) follows from $\Gamma(x+1) = x\,\Gamma(x)$.
The Gamma function and statistics
Certain other values of the Gamma function will turn out to be important
in statistics. In particular,

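\[ \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi} = 1.77245\ldots; \]
although this calculation requires tools beyond Math 10. (Take Math 53!)
It follows from the rule $\Gamma(x+1) = x\,\Gamma(x)$ that
\[ \Gamma\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2}, \quad \Gamma\left(\tfrac{5}{2}\right) = \frac{3\sqrt{\pi}}{4}, \quad \Gamma\left(\tfrac{7}{2}\right) = \frac{15\sqrt{\pi}}{8}, \quad \ldots \]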
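The stated properties of the Gamma function can be checked against Python's built-in `math.gamma`. This is only a numerical spot check, not a proof; the test point `x = 2.5` and the variable names are arbitrary choices for the illustration.

```python
import math

# Gamma(n + 1) = n! for nonnegative integers n.
for n in range(6):
    print(n, math.gamma(n + 1), math.factorial(n))

# Gamma(x + 1) = x * Gamma(x) at a non-integer point.
x = 2.5
recursion_gap = math.gamma(x + 1) - x * math.gamma(x)

# Half-integer values are multiples of sqrt(pi).
half = math.gamma(0.5)
seven_halves = math.gamma(3.5)
print(recursion_gap, half, math.sqrt(math.pi), seven_halves)
```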
