
MATH 10A METHODS OF MATHEMATICS:
CALCULUS, STATISTICS AND COMBINATORICS
L. Pachter, B. Sturmfels and L.C. Evans
Department of Mathematics
University of California, Berkeley

October 25, 2015


Overview of Part 2: Integral calculus

The main references for this part are

- Sebastian J. Schreiber, Karl Smith and Wayne Getz, Calculus for Life Sciences 1E for UC Berkeley, Wiley
- J. Stewart, Calculus, 7th edition, Cengage
- C. Neuhauser, Calculus for Biology and Medicine, 3rd edition, Prentice Hall

1. Histograms
2. Integrals and area
3. Approximation methods
4. Applications of integration
5. Antiderivatives, Fundamental Theorem of Calculus
6. Integration techniques


Section 1
Histograms


A. Displaying data
DEFINITION
A histogram is a graphical representation providing a visual impression
of the distribution of data. It consists of adjacent rectangles, erected over
given intervals, with areas equal to the proportion of the observations in
each interval.

[Figure: a histogram, with density on the vertical axis.]

We will sometimes also think of the intervals as bins into which our data
points are distributed.

Example 1.1 (Birth weight and smoking)

[Figure: two density histograms of birth weight in ounces (horizontal axis from 60 to 180), one for mothers who did not smoke and one for mothers who smoked.]

How to draw histograms

- First, choose the consecutive intervals (or bins) $I_1, I_2, \dots, I_m$ into which the data points are distributed.
- Calculate the number of data points $n_k$ within each interval $I_k$. Then
  $$N = n_1 + n_2 + \cdots + n_m$$
  is the total number of points.
- We want the area of the rectangle $R_k$ above the interval $I_k$ to be $\frac{n_k}{N}$. Since the area of a rectangle equals its height times its width, we take
  $$s_k = \text{height of } R_k = \frac{n_k}{N \cdot (\text{length of } I_k)}.$$

Then the total area of the histogram equals
$$\sum_{k=1}^{m} (\text{area of } R_k) = \sum_{k=1}^{m} \frac{n_k}{N} = 1.$$

Area (percent) = height × width.
So, height = percent / (bin width).

Example 1.2 (Calculating percentiles using histograms)


What percentage of women who smoked had children with birth weights
less than 90 ounces?

[Figure: density histogram of birth weight in ounces for mothers who smoked.]

We see that 8.68% of mothers who smoked had a child weighing less
than 90 ounces (5.63 lbs).
The red lines represent the 25th, 50th (median), and 75th
percentiles.


B. Partitioning an interval

When we decide upon the intervals/bins into which to sort our data
points for a histogram, we are in effect creating a partition of an interval.

DEFINITION
If $a = x_0 < x_1 < \cdots < x_{m-1} < x_m = b$, we call $P = \{x_0, x_1, \dots, x_m\}$ a
partition of the interval $[a, b]$.
The partition $P$ divides the interval $[a, b]$ into the $m$ closed subintervals
$I_1 = [x_0, x_1],\ I_2 = [x_1, x_2],\ \dots,\ I_m = [x_{m-1}, x_m]$.

Example 1.3
Let Y = {1.2, 1.5, 1.5, 2.2, 2.2, 2.7, 5.5, 5.7} be the data we want to
graph. The minimum is 1.2 and the maximum is 5.7. We round 1.2 down
to the nearest integer and round 5.7 up to the nearest integer.
We choose our partition of [1, 6] to be P = {1, 2, 3, 5, 6}.


C. Step functions

To calculate and plot the heights of the rectangles, we were actually
defining a piecewise constant function
$$s(x) = \begin{cases} s_1 & \text{if } x_0 \le x \le x_1 \\ s_2 & \text{if } x_1 < x \le x_2 \\ \ \vdots \\ s_m & \text{if } x_{m-1} < x \le x_m, \end{cases}$$
where $s_k$ is the height of the rectangle over the $k$th subinterval.

Example 1.4
For our data, the percentages in the intervals (areas of the rectangles)
are 37.5, 37.5, 0, and 25. We divide each of these percentages by
100 × (width of the interval).
The function is then defined as
$$s(x) = \begin{cases} 0.375 & \text{if } 1 \le x \le 2 \\ 0.375 & \text{if } 2 < x \le 3 \\ 0 & \text{if } 3 < x \le 5 \\ 0.25 & \text{if } 5 < x \le 6. \end{cases}$$
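The recipe for rectangle heights can be checked on the data of Example 1.3. The following is a minimal sketch; the function name `histogram_heights` is illustrative and not part of the course material, and it assumes the convention that the first bin is closed on both ends while later bins are open on the left.

```python
def histogram_heights(data, edges):
    """Return the height s_k of the rectangle over each bin.

    Bin 1 is [x_0, x_1]; bin k > 1 is (x_{k-1}, x_k].
    """
    total = len(data)
    heights = []
    for k in range(1, len(edges)):
        left, right = edges[k - 1], edges[k]
        if k == 1:
            count = sum(1 for y in data if left <= y <= right)
        else:
            count = sum(1 for y in data if left < y <= right)
        # height = (proportion of observations) / (bin width)
        heights.append(count / (total * (right - left)))
    return heights


data = [1.2, 1.5, 1.5, 2.2, 2.2, 2.7, 5.5, 5.7]
edges = [1, 2, 3, 5, 6]
heights = histogram_heights(data, edges)
print(heights)
```

Multiplying each height by its bin width and adding the results gives the total area of the histogram, which should be 1.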

DEFINITION
Let $P = \{x_0, x_1, \dots, x_m\}$ be a partition of $[a, b]$. A step function is a
function $s : [a, b] \to \mathbb{R}$ that is constant on the open subintervals of $P$.
Denote by $s_k$ the constant value that $s$ takes in the $k$th open subinterval
$I_k$:
$$s(x) = s_k \quad \text{if } x_{k-1} < x \le x_k \quad (k = 1, 2, \dots, m).$$

[Figure: graph of a step function.]

Concerning the breakpoints, we assume $s(x_k) = s_k$ for $k = 1, 2, \dots, m$.

Histograms are step functions


Remember: you can always think of histograms as step functions.
[Figure: a histogram (vertical axis: percent) next to the corresponding step function.]

As we collect more data, we might make the partition of [a, b] finer and
finer. What happens then?

Section 2
Integrals and area


A. Integral of a step function


Suppose $s$ and $t$ are step functions on $[a, b]$. Let $P_1$ and $P_2$ be partitions
of $[a, b]$ such that $s$ is constant on the open subintervals of $P_1$ and $t$ is constant
on the open subintervals of $P_2$. Define the sum $u = s + t$ by the rule
$$u(x) = s(x) + t(x) \quad \text{if } a \le x \le b.$$

[Figure: graphs of the step functions s and t on [a, b].]

To show that $u$ is actually a step function, we must find a partition $P$
such that $u$ is constant on the open subintervals of $P$.

DEFINITION
The common refinement of $P_1$ and $P_2$ is the union $P = P_1 \cup P_2$.

DEFINITION
The integral of a step function $s$ from $a$ to $b$ is the number
$$\int_a^b s(x)\,dx := \sum_{k=1}^{m} s_k (x_k - x_{k-1}).$$

[Figure: a step function with values $s_1, \dots, s_6$ and breakpoints $x_1, \dots, x_5$.]

If each $s_k \ge 0$, the integral is the area between the graph of the step
function and the x-axis.
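As a quick illustration of this definition, the sketch below sums $s_k (x_k - x_{k-1})$ over the subintervals of a partition. The name `step_integral` is illustrative; the example values are the step function of Example 1.4, whose integral should be 1 because it comes from a histogram.

```python
def step_integral(partition, values):
    """Integral of a step function: sum of s_k * (x_k - x_{k-1})."""
    if len(values) != len(partition) - 1:
        raise ValueError("need exactly one value per subinterval")
    return sum(
        s * (right - left)
        for s, left, right in zip(values, partition[:-1], partition[1:])
    )


# Step function from Example 1.4 on the partition {1, 2, 3, 5, 6}.
area = step_integral([1, 2, 3, 5, 6], [0.375, 0.375, 0.0, 0.25])
print(area)
```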

THEOREM (Additive Property)
$$\int_a^b \big(s(x) + t(x)\big)\,dx = \int_a^b s(x)\,dx + \int_a^b t(x)\,dx$$

[Figure: graphs of s, t and s + t on [a, b].]

THEOREM (Homogeneous Property)
$$\int_a^b c\,s(x)\,dx = c \int_a^b s(x)\,dx$$

[Figure: graphs of s and 2s.]

We can combine the previous two assertions:

THEOREM (Linearity)
$$\int_a^b \big(c_1 s(x) + c_2 t(x)\big)\,dx = c_1 \int_a^b s(x)\,dx + c_2 \int_a^b t(x)\,dx$$

THEOREM (Invariance under translation)
$$\int_a^b s(x)\,dx = \int_{a+c}^{b+c} s(x - c)\,dx \quad \text{for every real number } c$$

[Figure: graphs of s(x) on [a, b] and of the translate s(x − c) on [a + c, b + c].]

THEOREM (Comparison)
If $s(x) \le t(x)$ for every $x \in [a, b]$, then
$$\int_a^b s(x)\,dx \le \int_a^b t(x)\,dx.$$

THEOREM (Expansion or contraction of the interval)
$$\int_{ka}^{kb} s\!\left(\frac{x}{k}\right) dx = k \int_a^b s(x)\,dx \quad \text{for every } k > 0$$

Next, we turn to the problem of computing integrals of more general
functions. To do so, we will need to take limits.

B. Riemann integrals
Our next goal is finding the area under a curve:

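[Figure: the region under a curve, with unknown area A.]

We instead find the area of a collection of rectangles that approximate
the desired area. That is, we approximate f by a step function.

[Figure: two rectangles over $[x_0, x_1]$ and $[x_1, x_2]$ approximating the region.]

$$A \approx (x_1 - x_0) f(x_0) + (x_2 - x_1) f(x_1)$$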

Using 10 subintervals makes the approximation even better:

[Figure: ten rectangles over the subintervals with endpoints $x_0, x_1, \dots, x_{10}$.]

$$A \approx \sum_{k=1}^{10} f(x_{k-1})(x_k - x_{k-1})$$

Notation
Suppose $P$ is a partition, dividing our interval $[a, b]$ into $m$ subintervals
$I_1, \dots, I_m$.
(i) Let
$$\Delta x_k = x_k - x_{k-1}$$
denote the length of the $k$-th subinterval $I_k$.
(ii) Let $x_k^*$ be any point in the $k$-th subinterval $I_k$.
We will build a rectangle of height $f(x_k^*)$ above $I_k$. The area of this
rectangle is $f(x_k^*)\,\Delta x_k$; and so the total area is
$$\sum_{k=1}^{m} f(x_k^*)\,\Delta x_k = \sum_{k=1}^{m} f(x_k^*)(x_k - x_{k-1}).$$
This is an approximation to the area under the curve, called a Riemann
sum.
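A Riemann sum can be computed directly from a partition and a choice of sample points. The sketch below is illustrative only: the function name `riemann_sum`, the uneven partition, and the midpoint sample points are assumptions made for the example.

```python
def riemann_sum(f, partition, sample_points):
    """Sum of f(x_k*) * (x_k - x_{k-1}) over the subintervals of a partition."""
    total = 0.0
    for k in range(1, len(partition)):
        left, right = partition[k - 1], partition[k]
        x_star = sample_points[k - 1]
        if not left <= x_star <= right:
            raise ValueError("sample point must lie in its subinterval")
        total += f(x_star) * (right - left)
    return total


# f(x) = x^2 on [0, 2], with an uneven partition and midpoints as sample points.
value = riemann_sum(lambda x: x * x, [0.0, 0.5, 1.0, 2.0], [0.25, 0.75, 1.5])
print(value)
```

With only three subintervals the sum is 2.5625, already fairly close to the exact area 8/3 computed in Example 2.1.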

To find the actual area, we want to let $m$ get bigger and bigger and $\Delta x_k$
get smaller and smaller. If we then send $m \to \infty$, we should get the
actual area.

DEFINITION
The Riemann integral of $f$ from $a$ to $b$ is
$$\int_a^b f(x)\,dx = \lim_{m \to \infty} \sum_{k=1}^{m} f(x_k^*)\,\Delta x_k,$$
provided this limit exists, irrespective of the choice of the partition or the
choice of the points $x_k^*$.
For nonnegative functions $f$, the integral $\int_a^b f(x)\,dx$ gives the area under
$f$ between $a$ and $b$.

A useful fact is that the Riemann integral always exists for continuous
functions:

THEOREM
If $f : [a, b] \to \mathbb{R}$ is continuous, then the limit on the previous slide exists;
and thus $\int_a^b f(x)\,dx$ is defined.

Remark: It can also be shown that $\int_a^b f(x)\,dx$ is defined if $f$ is
piecewise continuous, meaning that we can subdivide $[a, b]$ into finitely
many subintervals $I_1, \dots, I_m$, such that $f$ restricted to each interval
$I_k = [x_{k-1}, x_k]$ is continuous (after possibly being redefined at the
endpoints).
But how can we actually compute integrals?

Useful formulas for Riemann sums

When calculating Riemann sums, the following rules will be helpful:
- $\sum_{i=1}^{m} i = \frac{m(m+1)}{2}$
- $\sum_{i=1}^{m} i^2 = \frac{m(m+1)(2m+1)}{6}$
- $\sum_{i=1}^{m} i^3 = \frac{m^2(m+1)^2}{4}$
- $\sum_{i=0}^{m} r^i = \frac{r^{m+1} - 1}{r - 1} \quad (r \ne 1)$

We will discuss in Math 10B how to use mathematical induction to
establish the first three of these formulas.

Example 2.1
Find $\int_0^2 x^2\,dx$.

SOLUTION:
For simplicity, let's choose our $m$ subintervals to all be the same size.
Then $\Delta x_k = \frac{2 - 0}{m} = \frac{2}{m}$.
Also for simplicity, let's choose $x_k^*$ to be the left endpoint of our
subintervals. Then $x_1^* = 0$, $x_2^* = \frac{2}{m}$, $x_3^* = 2 \cdot \frac{2}{m}$, ..., $x_k^* = (k - 1)\frac{2}{m}$.
We must therefore compute
$$\lim_{m \to \infty} \sum_{k=1}^{m} \left(\frac{2(k-1)}{m}\right)^2 \frac{2}{m} = \lim_{m \to \infty} \sum_{i=0}^{m-1} \left(\frac{2i}{m}\right)^2 \frac{2}{m},$$
and for this we will use formulas from the previous slide.

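$$\begin{aligned}
\int_0^2 x^2\,dx &= \lim_{m \to \infty} \frac{2}{m} \sum_{i=0}^{m-1} i^2 \frac{4}{m^2} \\
&= \lim_{m \to \infty} \frac{8}{m^3} \sum_{i=0}^{m-1} i^2 \\
&= \lim_{m \to \infty} \frac{8}{m^3} \cdot \frac{(m-1)m(2m-1)}{6} \\
&= \lim_{m \to \infty} \frac{8}{m^3} \cdot \frac{m^3 \left(1 - \frac{1}{m}\right)\left(2 - \frac{1}{m}\right)}{6} \\
&= \frac{8(1 - 0)(2 - 0)}{6} = \frac{16}{6} = \frac{8}{3}.
\end{aligned}$$

[Figure: the region under $y = x^2$ between 0 and 2, with area $A = 8/3$.]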
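The limit above can be sanity-checked by computing the left-endpoint Riemann sums exactly for a few values of m. This is only a sketch: the helper names are illustrative, and exact rational arithmetic (Python's `fractions` module) is used so that the comparison with the closed form $\frac{8}{m^3}\cdot\frac{(m-1)m(2m-1)}{6}$ is not affected by rounding.

```python
from fractions import Fraction


def left_sum_x_squared(m):
    """Left-endpoint Riemann sum for x^2 on [0, 2] with m equal subintervals."""
    dx = Fraction(2, m)
    return sum((i * dx) ** 2 * dx for i in range(m))


def closed_form(m):
    """The same sum, written using the formula for 0^2 + 1^2 + ... + (m-1)^2."""
    return Fraction(8 * (m - 1) * m * (2 * m - 1), 6 * m ** 3)


for m in (10, 100, 1000):
    # The printed values increase toward 8/3 = 2.666...
    print(m, float(left_sum_x_squared(m)))
```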

Integrals of powers of x

Calculations similar to those in the previous example show that
$$\int_a^b x^j\,dx = \frac{b^{j+1} - a^{j+1}}{j + 1}$$
for all $b > a$ and each positive integer $j$.

We will later learn simpler ways to derive these formulas.

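Example 2.2
Find $\int_2^{10} e^x\,dx$.

SOLUTION: Let us take $\Delta x = \frac{10 - 2}{m} = \frac{8}{m}$ and $x_k^* = 2 + (k - 1)\frac{8}{m}$.
Then, using the formula for a geometric sum with $r = e^{8/m}$,
$$\begin{aligned}
\int_2^{10} e^x\,dx &= \lim_{m \to \infty} \sum_{k=1}^{m} e^{2 + (k-1)8/m}\,\frac{8}{m} = \lim_{m \to \infty} \frac{8}{m} \sum_{i=0}^{m-1} e^2 e^{8i/m} \\
&= \lim_{m \to \infty} \frac{8}{m}\, e^2 \sum_{i=0}^{m-1} \left(e^{8/m}\right)^i \\
&= e^2 \lim_{m \to \infty} (e^8 - 1)\,\frac{8/m}{e^{8/m} - 1} \\
&= (e^{10} - e^2) \underbrace{\lim_{m \to \infty} \frac{8/m}{e^{8/m} - 1}}_{=1} = e^{10} - e^2.
\end{aligned}$$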

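The foregoing calculation used the fact that
$$\lim_{n \to \infty} \frac{8/n}{e^{8/n} - 1} = 1.$$

To confirm this, observe that
$$\lim_{n \to \infty} \frac{8/n}{e^{8/n} - 1} = \lim_{h \to 0} \frac{h}{e^h - 1} = \frac{1}{\lim_{h \to 0} \frac{e^h - 1}{h}} = \frac{1}{e^0} = 1,$$
since $(e^x)' = e^x$.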

Properties of Riemann integrals

We earlier identified various properties for the integrals of step functions.
By approximation, the same properties hold for the integrals of more general
functions:

THEOREM (Linearity)
If the functions $f, g : [a, b] \to \mathbb{R}$ have integrals and if $c_1, c_2$ are constants,
then
$$\int_a^b \big(c_1 f(x) + c_2 g(x)\big)\,dx = c_1 \int_a^b f(x)\,dx + c_2 \int_a^b g(x)\,dx.$$

But in general,
$$\int_a^b f(x) g(x)\,dx \ne \left(\int_a^b f(x)\,dx\right)\left(\int_a^b g(x)\,dx\right).$$

THEOREM (Invariance under translation)
$$\int_a^b f(x)\,dx = \int_{a+c}^{b+c} f(x - c)\,dx \quad \text{for every real number } c$$

THEOREM (Comparison)
If $f(x) \le g(x)$ for every $x \in [a, b]$, then
$$\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx.$$

THEOREM (Expansion or contraction of the interval)
$$\int_{ka}^{kb} f\!\left(\frac{x}{k}\right) dx = k \int_a^b f(x)\,dx \quad \text{for every } k > 0$$

THEOREM (Additivity of integrals over different intervals)
If $a < b < c$, then
$$\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx.$$

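Negative area
When $f < 0$, then we regard the area above the graph of $f$ and below the
x-axis as negative.

Example 2.3
$$\int_0^2 -x^2\,dx = -\frac{8}{3}$$

[Figure: the region between the graph of $y = -x^2$ and the x-axis, with signed area $A = -8/3$.]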

DEFINITION (Exchanging limits of integration)
If $a < b$, then we define
$$\int_b^a f(x)\,dx = -\int_a^b f(x)\,dx.$$

C. Improper Integrals
If the function $f$ is integrable on $[a, b]$ for each real number $b > a$, then
we define
$$\int_a^{+\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx,$$
provided the limit exists.

Likewise, if for the real number $b$ the function $f$ is integrable on $[a, b]$ for
each real number $a < b$, we then define
$$\int_{-\infty}^{b} f(x)\,dx = \lim_{a \to -\infty} \int_a^b f(x)\,dx$$
if the limit exists.

Finally, we define
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{c} f(x)\,dx + \int_c^{\infty} f(x)\,dx,$$
where $c \in \mathbb{R}$ is arbitrary, assuming each of the integrals on the right hand
side is defined.
In other words, we are assuming that the right hand side above is not of
the form $(\infty) + (-\infty)$ or $(-\infty) + (\infty)$.
Integrals of the type defined on this and the previous slide are called
improper integrals.

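Example 2.4
We will learn later that
$$\int_a^b \frac{1}{x^2}\,dx = \frac{1}{a} - \frac{1}{b}$$
for $b > a > 0$. Therefore
$$\int_1^{\infty} \frac{1}{x^2}\,dx = \lim_{b \to \infty} \int_1^b \frac{1}{x^2}\,dx = \lim_{b \to \infty} \left(1 - \frac{1}{b}\right) = 1.$$

Example 2.5
Using the rule $\int_a^b x^j\,dx = \frac{b^{j+1} - a^{j+1}}{j+1}$ for $j = 1, 2, \dots$, we see that
$$\int_0^{\infty} x^3\,dx = \lim_{b \to \infty} \int_0^b x^3\,dx = \lim_{b \to \infty} \left(\frac{b^4}{4} - 0\right) = \infty,$$
$$\int_{-\infty}^{0} x^3\,dx = \lim_{a \to -\infty} \int_a^0 x^3\,dx = \lim_{a \to -\infty} \left(0 - \frac{a^4}{4}\right) = -\infty,$$
so that
$$\int_{-\infty}^{\infty} x^3\,dx = \int_{-\infty}^{0} x^3\,dx + \int_0^{\infty} x^3\,dx \quad \text{is undefined.}$$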

Tolstoy on integration (from War and Peace)

The movement of humanity, arising as it does from innumerable


arbitrary human wills, is continuous. To understand the laws of this
continuous movement is the aim of history. But to arrive at these laws,
resulting from the sum of all those human wills, man's mind postulates
arbitrary and disconnected units.
. . . Only by taking infinitesimally small units for observation (the
differential of history, that is, the individual tendencies of men) and
attaining to the art of integrating them (that is, finding the sum of these
infinitesimals) can we hope to arrive at the laws of history.


Section 3
Approximation methods


A. Approximating integrals numerically

In order to numerically approximate the value of the integral $\int_a^b f(x)\,dx$,
we can compute $\sum_{k=1}^{n} f(x_k^*)\,\Delta x_k$ with a large value of $n$. To simplify, we
use equal sized subintervals, each of width
$$\Delta x = \frac{b - a}{n}.$$
Let
$$x_k^* = a + (k - 1)\frac{b - a}{n}$$
denote the left endpoint of each subinterval.

DEFINITION
The left endpoint rule approximates the integral $\int_a^b f(x)\,dx$ by the sum
$$L_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left(a + (k - 1)\frac{b - a}{n}\right).$$

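Example 3.1
Fix $n = 5$. The step size is $\Delta x = \frac{3 - 1}{5} = 0.4$. The left endpoints are
$x_0 = 1,\ x_1 = 1.4,\ x_2 = 1.8,\ x_3 = 2.2,\ x_4 = 2.6$.

[Figure: left endpoint rectangles over [1, 3], with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0.]

$$\int_1^3 e^{-x^2}\,dx \approx L_5 = \frac{2}{5}\big(f(1) + f(1.4) + f(1.8) + f(2.2) + f(2.6)\big)$$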

There's nothing particularly special about the left endpoints, so we could
just as easily use the right endpoints
$$x_k^* = a + k\,\frac{b - a}{n}.$$

DEFINITION
The right endpoint rule approximates the integral $\int_a^b f(x)\,dx$ by the
sum
$$R_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left(a + k\,\frac{b - a}{n}\right).$$

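Example 3.2
For our example $\int_1^3 e^{-x^2}\,dx$, this now gives

[Figure: right endpoint rectangles over [1, 3], with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0.]

$$\int_1^3 e^{-x^2}\,dx \approx R_5 = \frac{2}{5}\big(f(1.4) + f(1.8) + f(2.2) + f(2.6) + f(3)\big)$$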
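The left and right endpoint rules translate almost line for line into code. The sketch below is illustrative: the function names are not from the course, and the integrand $f(x) = e^{-x^2}$ on $[1, 3]$ with $n = 5$ is an assumed test case chosen because it is decreasing on that interval.

```python
import math


def left_rule(f, a, b, n):
    """Left endpoint rule L_n with n equal subintervals of [a, b]."""
    dx = (b - a) / n
    return dx * sum(f(a + (k - 1) * dx) for k in range(1, n + 1))


def right_rule(f, a, b, n):
    """Right endpoint rule R_n with n equal subintervals of [a, b]."""
    dx = (b - a) / n
    return dx * sum(f(a + k * dx) for k in range(1, n + 1))


def f(x):
    # Assumed integrand for illustration; it is decreasing on [1, 3].
    return math.exp(-x * x)


L5 = left_rule(f, 1.0, 3.0, 5)
R5 = right_rule(f, 1.0, 3.0, 5)
print(L5, R5)
```

For a decreasing integrand the left rule comes out larger than the right rule, and the two differ by exactly $\Delta x\,(f(a) - f(b))$.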

Left versus right

If we compare the formulas for $L_n$ and $R_n$, we see they only differ in two
terms out of the entire sum:
$$L_n = \frac{b - a}{n}\big(f(x_0) + f(x_1) + f(x_2) + \cdots + f(x_{n-1})\big)$$
$$R_n = \frac{b - a}{n}\big(f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + f(x_n)\big)$$

So the only real difference is whether we include $f(a) = f(x_0)$ or
$f(b) = f(x_n)$ in the sum.
Which is better?

Example 3.3
For the particular example of $\int_1^3 e^{-x^2}\,dx$, we can see graphically that the
left endpoint rule gives an overestimate and the right endpoint rule gives
an underestimate:

[Figure: left endpoint rectangles and right endpoint rectangles for the integrand over [1, 3].]

More accurate methods

We will see now that some surprisingly simple modifications of the
formulas above give us much better approximations.
One idea is to compromise between the left and right endpoints, by
choosing instead the midpoint of each subinterval,
$$x_k^* = a + \left(k - \tfrac{1}{2}\right)\frac{b - a}{n}.$$

DEFINITION
The midpoint rule approximates the integral $\int_a^b f(x)\,dx$ by
$$M_n = \frac{b - a}{n} \sum_{k=1}^{n} f\!\left(a + \left(k - \tfrac{1}{2}\right)\frac{b - a}{n}\right).$$

[Figure: midpoint rectangles over [1, 3], with midpoints 1.2, 1.6, 2.0, 2.4, 2.8.]

Trapezoid rule
Another way to improve the accuracy is not to approximate by a
rectangle in each subinterval, but rather to approximate by a trapezoid,
obtained by drawing a diagonal line from $(x_{k-1}, f(x_{k-1}))$ to $(x_k, f(x_k))$:

[Figure: trapezoids over [1, 3], with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0.]

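In this case, we get on each subinterval $I_k = [x_{k-1}, x_k]$ a small trapezoid,
the area of which is
$$\frac{h_1 + h_2}{2}\,\Delta x = \frac{f(x_{k-1}) + f(x_k)}{2}\,\Delta x.$$

[Figure: a trapezoid of width $\Delta x$ with vertical sides of heights $h_1$ and $h_2$.]

DEFINITION
The trapezoid rule approximates the integral $\int_a^b f(x)\,dx$ by
$$T_n = \frac{b - a}{2n} \sum_{k=1}^{n} \left[ f\!\left(a + (k - 1)\frac{b - a}{n}\right) + f\!\left(a + k\,\frac{b - a}{n}\right) \right].$$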

We can also write
$$\begin{aligned}
T_n &= \Delta x\,\frac{f(x_0) + f(x_1)}{2} + \Delta x\,\frac{f(x_1) + f(x_2)}{2} + \cdots + \Delta x\,\frac{f(x_{n-1}) + f(x_n)}{2} \\
&= \frac{\Delta x}{2}\big(f(x_0) + f(x_1) + f(x_1) + f(x_2) + \cdots + f(x_{n-1}) + f(x_n)\big) \\
&= \frac{\Delta x}{2}\big(f(x_0) + 2f(x_1) + 2f(x_2) + \cdots + 2f(x_{n-1}) + f(x_n)\big).
\end{aligned}$$

Notice also
$$T_n = \frac{1}{2}(L_n + R_n).$$

The trapezoid rule is thus the average of the left and right endpoint
rules. We will see that this averaging process makes the errors for $T_n$
much smaller than for either $L_n$ or $R_n$!
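The midpoint and trapezoid rules, and the identity $T_n = \frac{1}{2}(L_n + R_n)$, can be checked numerically. This is a sketch under stated assumptions: the function names are illustrative, and $f(x) = e^{-x^2}$ on $[1, 3]$ with $n = 5$ is an assumed test integrand.

```python
import math


def left_rule(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + (k - 1) * dx) for k in range(1, n + 1))


def right_rule(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + k * dx) for k in range(1, n + 1))


def midpoint_rule(f, a, b, n):
    """Midpoint rule M_n: sample each subinterval at its midpoint."""
    dx = (b - a) / n
    return dx * sum(f(a + (k - 0.5) * dx) for k in range(1, n + 1))


def trapezoid_rule(f, a, b, n):
    """Trapezoid rule T_n = dx/2 * (f(x_0) + 2 f(x_1) + ... + 2 f(x_{n-1}) + f(x_n))."""
    dx = (b - a) / n
    interior = sum(f(a + k * dx) for k in range(1, n))
    return dx / 2 * (f(a) + 2 * interior + f(b))


def f(x):
    # Assumed test integrand for illustration.
    return math.exp(-x * x)


L5, R5 = left_rule(f, 1.0, 3.0, 5), right_rule(f, 1.0, 3.0, 5)
M5, T5 = midpoint_rule(f, 1.0, 3.0, 5), trapezoid_rule(f, 1.0, 3.0, 5)
print(M5, T5, (L5 + R5) / 2)
```

Writing the trapezoid rule with a single pass over the interior points avoids evaluating each interior $f(x_k)$ twice.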

Example 3.4

[Figure: trapezoids over [1, 3], with breakpoints 1.0, 1.4, 1.8, 2.2, 2.6, 3.0.]

$$T_5 = \frac{2}{2 \cdot 5}\big(f(1) + 2f(1.4) + 2f(1.8) + 2f(2.2) + 2f(2.6) + f(3)\big)$$

Error bounds
Relatively straightforward calculus methods, omitted in these notes, let
us estimate the accuracy of our approximations:

THEOREM (Error estimates for midpoint and trapezoid rules)
Assume the function $f$ is twice differentiable on the interval $[a, b]$, with
$$|f''(x)| \le C \quad (a \le x \le b)$$
for some constant $C$. Then
$$\left| \int_a^b f\,dx - M_n \right| \le \frac{C(b - a)^3}{24 n^2}$$
and
$$\left| \int_a^b f\,dx - T_n \right| \le \frac{C(b - a)^3}{12 n^2}.$$

Interpretation
We say that the midpoint and trapezoid rules are of order $\frac{1}{n^2}$. Since the
step size is $\Delta x = \frac{b - a}{n}$, we can equivalently say that these methods are of
order $(\Delta x)^2$. This means, loosely speaking, that if we double the number
of points from $n$ to $2n$, the error should drop to about $\frac{1}{4}$ of its previous size.
It turns out that the left- and right-endpoint rules are only of order $\frac{1}{n}$
(equivalently, of order $\Delta x$). Since $\frac{1}{n^2}$ is much, much smaller than $\frac{1}{n}$ for
large $n$, the midpoint and trapezoid rules are much more accurate.
More sophisticated approximations are of even higher order:
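The difference between order $\frac{1}{n}$ and order $\frac{1}{n^2}$ shows up clearly when n is doubled. The sketch below uses an assumed test case, $\int_0^1 e^x\,dx = e - 1$, for which the exact value is known; the function names are illustrative.

```python
import math


def left_rule(f, a, b, n):
    dx = (b - a) / n
    return dx * sum(f(a + (k - 1) * dx) for k in range(1, n + 1))


def trapezoid_rule(f, a, b, n):
    dx = (b - a) / n
    interior = sum(f(a + k * dx) for k in range(1, n))
    return dx / 2 * (f(a) + 2 * interior + f(b))


exact = math.e - 1.0  # exact value of the integral of e^x over [0, 1]


def error(rule, n):
    """Absolute error of a rule with n subintervals on the test integral."""
    return abs(rule(math.exp, 0.0, 1.0, n) - exact)


# Ratio of errors when n is doubled from 10 to 20.
left_ratio = error(left_rule, 10) / error(left_rule, 20)
trap_ratio = error(trapezoid_rule, 10) / error(trapezoid_rule, 20)
print(left_ratio, trap_ratio)
```

For this test integral the left endpoint error roughly halves, while the trapezoid error drops by a factor close to 4.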

Simpson's rule
DEFINITION
If $n$ is an even integer, Simpson's rule approximates the integral
$\int_a^b f(x)\,dx$ by
$$S_n = \frac{\Delta x}{3}\big(f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 4f(x_{n-1}) + f(x_n)\big).$$

It turns out that
$$S_n = \frac{4T_n - T_{n/2}}{3},$$
and it can be shown that
$$\left| \int_a^b f\,dx - S_n \right| \le \frac{K(b - a)^5}{180 n^4}$$
provided $|f^{(4)}(x)| \le K$ for all $a \le x \le b$. So Simpson's rule is of order $\frac{1}{n^4}$.
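Simpson's rule and its relation to the trapezoid rule can be sketched as follows. The function names and the two test integrals ($x^3$ on $[0, 2]$, whose integral is 4, and $e^x$ on $[0, 1]$) are assumptions chosen for illustration.

```python
import math


def trapezoid_rule(f, a, b, n):
    dx = (b - a) / n
    interior = sum(f(a + k * dx) for k in range(1, n))
    return dx / 2 * (f(a) + 2 * interior + f(b))


def simpson_rule(f, a, b, n):
    """Simpson's rule S_n with weights 1, 4, 2, 4, ..., 4, 1 (n must be even)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    dx = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        weight = 4 if k % 2 == 1 else 2
        total += weight * f(a + k * dx)
    return dx / 3 * total


cubic = simpson_rule(lambda x: x ** 3, 0.0, 2.0, 4)
S8 = simpson_rule(math.exp, 0.0, 1.0, 8)
T8 = trapezoid_rule(math.exp, 0.0, 1.0, 8)
T4 = trapezoid_rule(math.exp, 0.0, 1.0, 4)
print(cubic, S8, (4 * T8 - T4) / 3)
```

Because the error bound involves the fourth derivative, Simpson's rule reproduces the integral of a cubic polynomial up to rounding.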

Section 4
Applications of integration


A. Defining new functions

Many important functions used in the theoretical and applied sciences are
defined via integrals.

Example 4.1 (Logarithms as integrals)
We earlier reminded you about the natural logarithm ln, a key formula for
which is
$$\ln(xy) = \ln x + \ln y \quad (x, y > 0).$$
But how do we know that a function with this useful property even exists?
A systematic approach is to define the natural logarithm by the formula
$$\ln x = \int_1^x \frac{1}{t}\,dt \quad (x > 0);$$
and then to prove that the natural log, so defined, really does satisfy
$\ln(xy) = \ln x + \ln y$. When we have later developed the relevant calculus
skills, we will do this.

The foregoing also provides an interesting geometric interpretation of the
number $e$. It is that value of the upper limit of integration for which
$$\int_1^e \frac{1}{t}\,dt = 1.$$

Example 4.2 (The Gamma function)
The Gamma function is
$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\,dt.$$
This improper integral exists for all positive real numbers $x$.

Later, after we have developed more integration techniques, we will
derive some interesting formulas for the Gamma function. In particular,
$\Gamma(n) = (n - 1)!$ for all positive integers $n$.
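The factorial property of the Gamma function can be checked numerically. The sketch below compares Python's built-in `math.gamma` with `math.factorial`, and also approximates the defining integral by a midpoint sum. The helper name `gamma_midpoint`, the cutoff 60 that replaces the infinite upper limit, and the number of subintervals are assumptions of this sketch, which is only meant for $x \ge 1$.

```python
import math


def gamma_midpoint(x, upper=60.0, n=60000):
    """Midpoint-rule approximation of the integral of t^(x-1) e^(-t) over [0, upper]."""
    dt = upper / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt  # midpoint of the k-th subinterval
        total += t ** (x - 1) * math.exp(-t)
    return dt * total


for n in range(1, 6):
    # Library value of Gamma(n) next to (n - 1)!
    print(n, math.gamma(n), math.factorial(n - 1))

approx_gamma_5 = gamma_midpoint(5)
print(approx_gamma_5)
```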

B. Length of curves
If $f : [a, b] \to \mathbb{R}$ is a function given by some explicit formula, then all the
geometric properties of the curve determined by the graph of $f$ must
somehow be contained within the formula. How can we extract this
information?
One important use of calculus is providing ways for us to compute
various geometric properties, for instance the length of curves:

Example 4.3 (Length of curves)
The length $L$ of the curve determined by the graph of $f$ is given by
$$L = \int_a^b \sqrt{1 + (f')^2}\,dx.$$

C. Approximating functions by polynomials

We discussed earlier the problem of approximating a given function $f$ by
a simpler polynomial of the form
$$g(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0.$$
One solution is to use the Taylor polynomial
$$g(x) = T_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x - a)^k,$$
introduced earlier. However, we observed earlier that this
approximation requires that we have available very detailed
information about the function $f$ at the specific point $x = a$. We
need to know $f(a), f'(a), f''(a), \dots, f^{(n)}(a)$, and these would be
essentially impossible to find if, say, $f$ were determined by experimental
data.
We need another, more robust way to approximate by polynomials.

One very useful idea is to use integrals to measure the error of our
approximations.
For this, let us assume that $f : [a, b] \to \mathbb{R}$ is given, and define then the
integral error function
$$\begin{aligned}
E(a_0, a_1, \dots, a_{n-1}, a_n) &= \int_a^b \big(f(x) - g(x)\big)^2\,dx \\
&= \int_a^b \big(f(x) - (a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0)\big)^2\,dx.
\end{aligned}$$

The idea now is to select the coefficients $a_0, a_1, \dots, a_{n-1}, a_n$ to minimize
this error.
This however requires that we know how to minimize the function
$E(a_0, a_1, \dots, a_{n-1}, a_n)$ depending on $n + 1$ variables, and this is beyond
the scope of Math 10. But in practice computers can quickly compute
the answers to high precision.

D. Integrating densities
Example 4.4 (Chemical concentration)
Suppose that some chemical (say, an insecticide) is spread unevenly along
a thin strip of land. We may for simplicity assume the region to be
one-dimensional, lying along the x-axis. Let
$$\rho(x) = \text{concentration of the chemical at } x.$$
What is the total amount of insecticide spread in the region $a \le x \le b$?
The total amount of the chemical between $a$ and $b$ is
$$\int_a^b \rho(x)\,dx.$$
($\rho$ = rho.)

Example 4.5 (Mass density)
Suppose that a straight piece of wire is made of a mixture of two metals,
the proportion of which changes along the wire. Assume for simplicity
the wire is one-dimensional and that
$$\rho(x) = \text{mass density of the wire at } x.$$
What is the total mass of the wire for $a \le x \le b$?
The total mass is
$$\int_a^b \rho(x)\,dx.$$

These two examples illustrate the point that the total amount of any
quantity between the points a and b is the integral of its density over the
interval [a, b].

E. Integral test for series convergence

THEOREM
Suppose that $f : (0, \infty) \to [0, \infty)$ is a nonnegative, decreasing function.
Set
$$a_k = f(k) \quad (k = 1, 2, \dots).$$
Then $\sum_{k=1}^{\infty} a_k$ converges if and only if
$$\int_1^{\infty} f(x)\,dx < \infty.$$

To see why this is true, look at the pictures on the next slide, which show
geometrically that
$$\sum_{k=1}^{\infty} a_k \ \ge\ \int_1^{\infty} f(x)\,dx \ \ge\ \sum_{k=2}^{\infty} a_k.$$

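[Figure: two pictures of the graph of y = f(x) with rectangles of areas $a_1, a_2, a_3, a_4, \dots$, one set lying above the graph and one set lying below it.]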

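Example 4.6
Show that
$$\sum_{k=1}^{\infty} \frac{1}{k^p}$$
converges if $p > 1$.
SOLUTION: We will learn later that if $b > a > 0$, then
$$\int_a^b \frac{1}{x^p}\,dx = \frac{a^{1-p} - b^{1-p}}{p - 1}.$$
Therefore
$$\int_1^{\infty} \frac{1}{x^p}\,dx = \lim_{b \to \infty} \int_1^b \frac{1}{x^p}\,dx = \lim_{b \to \infty} \frac{1 - b^{1-p}}{p - 1} = \frac{1}{p - 1}$$
is finite. Note that $\lim_{b \to \infty} b^{1-p} = 0$, since $p > 1$.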

F. Integration and probabilities

In this section we will learn how integration can help us compute the
probabilities of certain random events.
We provide first some introductory motivation for the idea that areas
(and therefore integration) are somehow related to probabilities.

Example 4.7 (Simulating coin tosses)

- Flip a fair coin 200 times.
- Record the number of heads out of the 200 flips.
- Repeat the process N times.
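The three steps of this experiment can be simulated with Python's standard `random` module. This is a minimal sketch: the function name `simulate_head_counts` and the fixed seed are illustrative choices, not part of the course material.

```python
import random


def simulate_head_counts(num_repeats, flips=200, seed=0):
    """Repeat 'flip a fair coin `flips` times and count heads' num_repeats times."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.random() < 0.5 for _ in range(flips))
        for _ in range(num_repeats)
    ]


counts = simulate_head_counts(1000)
mean_heads = sum(counts) / len(counts)
print(min(counts), max(counts), mean_heads)
```

The list `counts` is the data one would sort into bins to draw a histogram of the number of heads.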

Histogram for N = 100 times:

[Figure: density histogram of the number of heads in 200 coin tosses, repeated 100 times.]

Histogram for number of heads in 200 coin tosses repeated 1,000 times:

[Figure: density histogram of the number of heads in 200 coin tosses, repeated 1,000 times.]

Histogram for number of heads in 200 coin tosses repeated 10,000 times:

[Figure: density histogram of the number of heads in 200 coin tosses, repeated 10,000 times.]

[Figure: histogram of the number of heads in 200 coin tosses, with a smooth curve drawn in blue on top of it.]

This function in blue looks like a smooth version of our step function!
What is this function?

DEFINITION
A Gaussian function is a function having the formula
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

Gaussian functions comprise a family of bell-shaped curves, each
determined by the parameters $\mu \in \mathbb{R}$ and $\sigma > 0$.
As we see in the picture on the next slide, $\mu$ gives the center of the
bell-shaped curve. The parameter $\sigma$ determines the thickness and height
of the curve. We call $\mu$ the mean and $\sigma$ the standard deviation, and will
later explain the probabilistic meaning of these terms.
($\mu$ = mu, $\sigma$ = sigma)

[Figure: the graph of $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, a bell-shaped curve.]

Examples

[Figure: Gaussian functions, built up one curve per slide: $\mu = 0, \sigma = 0.5$; $\mu = 0, \sigma = 1$; $\mu = 0, \sigma = 7$; $\mu = 8, \sigma = 3$.]

We will see later that a Gaussian function $f$ corresponds to a normal (or
Gaussian) probability distribution. In particular,
- The total area under the curve is always 1.
- The graph of $f$ is symmetric around $\mu$: $f(\mu + x) = f(\mu - x)$.

[Figure: a normal distribution curve, with area = 1 under the curve.]

DEFINITION
The standard normal distribution has mean $\mu = 0$, standard deviation
$\sigma = 1$ and is therefore
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

The area to the right of 0 equals $\frac{1}{2}$, and the area to the left of 0 equals $\frac{1}{2}$.

[Figure: two standard normal curves, one with the area to the left of 0 marked and one with the area to the right of 0 marked; each area = 0.5.]

For the standard normal distribution,
- the area between $-1$ and $1$ is approximately 0.68,
- the area between $-2$ and $2$ is approximately 0.95,
- the area between $-3$ and $3$ is approximately 0.997.

[Figure: two standard normal curves, one with area = 0.68 marked and one with area = 0.95 marked.]

[Figure: two standard normal curves with shaded regions; one shaded area equals 1 − 0.68, the other equals 0.5 × 0.32.]

We can use the standard normal to calculate areas under the curve for
any Gaussian distribution.

Example 4.8
Suppose we have a normal distribution with $\mu = 50$ and $\sigma = 5$. What is
the area under the curve to the left of 40?
SOLUTION: We first convert 40 to standard units, by subtracting the
mean and dividing by the standard deviation:
$$\frac{40 - \mu}{\sigma} = \frac{40 - 50}{5} = -2.$$
We now need to find the area to the left of $-2$ for the standard normal
distribution. For this, we can use an online applet [1] from the UC Berkeley
Statistics Department to evaluate numerically areas under the curve of
the standard normal (with $\mu = 0$, $\sigma = 1$).

[1] http://statistics.berkeley.edu/~stark/Java/Html/NormHiLite.htm

Using the applet, we learn that the area under the curve of the standard
normal between $-2$ and $0$ is approximately .477.
Since the total area under the curve to the left of 0 is .5, it follows that
the area to the left of $-2$ is approximately
$$.5 - .477 = .023$$

Example 4.9 (Women's heights)

Assume that US women's heights are normally distributed with mean 63
inches and standard deviation 3 inches.
About what percentage of US women are taller than 66 inches?
SOLUTION: Geometrically, we want to calculate the area to the right of
66. For our data, $\mu = 63$ and $\sigma = 3$.
As before, we convert 66 to standard units:
$$\frac{66 - \mu}{\sigma} = \frac{66 - 63}{3} = 1.$$
Using the online applet we learn that the area under the standard normal
curve between 0 and 1 is approximately .341. Hence the area to the right
of 1 is about
$$.5 - .341 = .159$$
So about 16% of women are taller than 66 inches.
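The areas in Examples 4.8 and 4.9 can also be computed without the applet. The sketch below relies on a standard identity that is not stated in these notes: the area under the standard normal curve to the left of $z$ equals $\frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, where erf is the error function available as `math.erf`. The function names are illustrative.

```python
import math


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """Area to the left of x for a normal distribution: convert to standard units first."""
    return standard_normal_cdf((x - mu) / sigma)


left_of_40 = normal_cdf(40, 50, 5)            # Example 4.8
taller_than_66 = 1.0 - normal_cdf(66, 63, 3)  # Example 4.9
print(round(left_of_40, 3), round(taller_than_66, 3))
```

Both values agree with the applet-based answers to within rounding.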

Introduction to computing probabilities

We have introduced the idea that areas under curves can be interpreted
as probabilities, and now provide more mathematical details, which will
be further elaborated later. In particular we will learn in Part III of this
course about the concepts of a probability space
$$(\Omega, P)$$
and a random variable
$$X : \Omega \to \mathbb{R}.$$

Interpretation
More precise definitions will appear later, but for now think of the
probability space as some sort of mathematical model for random
occurrences, for which $P$ means the probability. And think of $X$ as
giving the random outcomes of experiments or measurements.

DEFINITION
The cumulative distribution function (cdf) of a random variable $X$ is
the function
$$F(x) = P(X \le x),$$
defined for $-\infty < x < \infty$. In other words,
$F(x)$ is the probability that $X \le x$.
$F$ maps real numbers to a probability value in $[0, 1]$:
$$F : \mathbb{R} \to [0, 1].$$
The cumulative distribution function is increasing and satisfies
$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to \infty} F(x) = 1.$$

DEFINITION
The probability density function (pdf) of a random variable $X$ is a
nonnegative function $f$ that has the following properties:
- $\int_{-\infty}^{\infty} f(x)\,dx = 1$
- The probability that $X$ falls in the interval $(a, b)$ is the area under
  the density function between $a$ and $b$:
  $$P(a \le X \le b) = \int_a^b f(x)\,dx.$$

So when a random variable $X$ has a pdf $f$, we can calculate probabilities
by integrating $f$.
In particular, $P(X = c) = \int_c^c f(x)\,dx = 0$. And since $P(X = c) = 0$, we
don't need to worry about endpoints:
$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).$$

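Example 4.10
As noted earlier, the normal distribution has as its probability density
function the Gaussian function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

[Figure: graph of the normal density, with the horizontal axis marked from $\mu - 4\sigma$ to $\mu + 4\sigma$.]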

Example 4.11
The uniform distribution gives probabilities for a continuous random
variable that takes values in the interval $(a, b)$, where each value is equally
likely. The probability density function is
$$f(x) = \begin{cases} \frac{1}{b - a} & \text{if } a < x < b \\ 0 & \text{otherwise.} \end{cases}$$

[Figure: the uniform distribution for $(-3, 3)$.]

Using the pdf to find the cdf

If we let $a = -\infty$ and $b = x$, we can use the pdf to find the cdf:
$$F(x) = P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} f(y)\,dy$$

[Figure: normal density f(x), with the area F(1) marked.]

Using the cdf to find the pdf

Now if we focus on area under the curve, we can use the cdf to find the
pdf. Namely, $f$ can be recovered from $F$ in the following sense:
$$\begin{aligned}
F(b) - F(a) &= P(X \le b) - P(X \le a) = P(a \le X \le b) \\
&= \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx = \int_a^b f(x)\,dx
\end{aligned}$$

[Figure: normal density f(x), with the area $F(1) - F(-1)$ marked.]

Mathematical relationship between pdf and cdf

Our discussion thus far shows that for continuous random variables, we
have
$$F(x) = \int_{-\infty}^{x} f(y)\,dy$$
and
$$\int_a^b f(y)\,dy = F(b) - F(a).$$

There is a very important relationship between the functions $F$ and $f$
that can explain both of these properties: $f$ is the derivative of $F$:
$$f = F'.$$
The properties above follow from the Fundamental Theorem of
Calculus, which we discuss next.

Section 5
Antiderivatives, Fundamental Theorem of
Calculus


A. Antiderivatives
- When you learn how to add, you then learn how to undo the addition
  via subtraction.
- When you learn how to multiply, you then learn how to undo
  multiplication via division.

So far this semester we have learned how to take the derivative of a function.
Now we ask the reverse: can we undo a derivative? Yes, using
antidifferentiation.

Example 5.1
Input: $f(x) = x^2$. Output: $F(x) = \frac{1}{3}x^3$.
Input: $f(x) = \frac{1}{x} + \sin x$. Output: $F(x) = \ln x - \cos x + 5$.

DEFINITION
Given the function $f$, a function $F$ is called an antiderivative of $f$ on the
interval $(a, b)$ if
$$F'(x) = f(x)$$
for all $x$ in $(a, b)$.

Example 5.2
If $f$ is a function which describes how some quantity is changing over
time $t$, an antiderivative $F$ determines the amount of the quantity at any
time, up to an additive constant.
The location of a car is an antiderivative of its velocity.
The velocity of a car is an antiderivative of its acceleration.

THEOREM
If $F$ is an antiderivative of $f$ and $G$ is an antiderivative of $g$, then $F + G$
is an antiderivative of $f + g$.

Proof.
This follows directly from the corresponding property for derivatives, since
$$(F + G)' = F' + G' = f + g.$$

THEOREM
If $F$ is an antiderivative of $f$ and $c$ is a constant, then $cF$ is an
antiderivative of $cf$.

Proof.
By the constant multiple rule for differentiation,
$$(cF)' = cF' = cf.$$

If $F$ and $G$ are antiderivatives of $f$ and $g$, respectively, it is in general
NOT true that $FG$ is an antiderivative of $fg$.

Example 5.3
$F(x) = \frac{x^2}{2}$ is an antiderivative of $f(x) = x$, and $G(x) = \frac{x^3}{3}$ is an
antiderivative of $g(x) = x^2$, but
$$F(x)G(x) = \frac{x^2}{2} \cdot \frac{x^3}{3} = \frac{x^5}{6}$$
is NOT an antiderivative of $f(x)g(x) = x \cdot x^2 = x^3$.

Similarly, $\frac{F}{G}$ is generally NOT an antiderivative of $\frac{f}{g}$.

THEOREM (Antiderivatives differ by a constant)


Suppose that f is a function whose domain contains the interval (a, b),
and assume that F is an antiderivative of f on (a, b).
Then another function G is also an antiderivative of f on (a, b) if and
only if G = F + C , for some constant C .

Proof.
If $F' = f$, then
$$(F + C)' = F' + (C)' = F' = f.$$
Consequently, if F is an antiderivative of f, then so is F + C.
Conversely, if F is an antiderivative of f on (a, b), then any other
antiderivative G must satisfy
$$(G - F)' = G' - F' = f - f = 0.$$
A function whose derivative is zero on an interval is constant, so $G - F = C$ for some constant C.

103 / 158

Example 5.4 (Difference of antiderivatives)


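Consider the two functions
$$F(x) = \frac{x-1}{x+1} \quad\text{and}\quad G(x) = \frac{-2}{x+1}.$$
Differentiate:
$$\left(\frac{x-1}{x+1}\right)' = \frac{(x+1)(x-1)' - (x-1)(x+1)'}{(x+1)^2} = \frac{(x+1) - (x-1)}{(x+1)^2} = \frac{2}{(x+1)^2},$$
$$\left(\frac{-2}{x+1}\right)' = \left((-2)(x+1)^{-1}\right)' = \frac{2}{(x+1)^2}.$$
Thus F and G are antiderivatives of the same function. According to the
theorem on the previous slide, they must differ by a constant.
Check:
$$F(x) - G(x) = \frac{x-1}{x+1} - \frac{-2}{x+1} = \frac{x-1+2}{x+1} = 1.$$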
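A quick numerical sanity check of Example 5.4 can be sketched in Python. The helper name `derivative` and the sample points are illustrative choices, not part of the slides, and the derivative is only approximated by a central difference, so the comparison is approximate.

```python
def F(x):
    return (x - 1) / (x + 1)

def G(x):
    return -2 / (x + 1)

def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x) (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

points = [0.0, 0.5, 2.0, 10.0]  # arbitrary sample points with x > -1

# F - G should equal the constant 1 at every sample point.
differences = [F(x) - G(x) for x in points]

# F' and G' should agree up to the approximation error.
slope_gaps = [abs(derivative(F, x) - derivative(G, x)) for x in points]

print(differences)
print(max(slope_gaps))
```

The same pattern works for any pair of candidate antiderivatives: equal slopes everywhere on an interval, and a constant difference.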

104 / 158

Example 5.5 (From pdf to cdf)


Consider the Gaussian function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
This, as we have seen, is the probability distribution function of the
normal distribution.
What is the probabilistic meaning of an antiderivative F of f?
One antiderivative of the probability distribution function (pdf) is the
cumulative distribution function (cdf):
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$$
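As a rough illustration of the pdf-to-cdf relationship, the sketch below integrates the normal pdf numerically and compares the result with the closed form in terms of the error function, $\frac{1}{2}\left(1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right)$. The function names, the midpoint rule, and the cutoff at $\mu - 10\sigma$ are assumptions made for this example only.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func over [a, b] with n equal subintervals."""
    width = (b - a) / n
    return sum(func(a + (k + 0.5) * width) for k in range(n)) * width

def normal_cdf_numeric(x, mu=0.0, sigma=1.0, n=20000):
    # Assumption: the tail below mu - 10*sigma is negligible, so we start there.
    lower = mu - 10 * sigma
    return midpoint_sum(lambda y: normal_pdf(y, mu, sigma), lower, x, n)

def normal_cdf_erf(x, mu=0.0, sigma=1.0):
    # Closed form of the normal cdf via the error function.
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

gaps = [abs(normal_cdf_numeric(x) - normal_cdf_erf(x)) for x in (-1.0, 0.0, 1.0)]
print(max(gaps))
```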

105 / 158

Given a function f, an antiderivative F, if it exists, must be unique up to
an additive constant.
But do antiderivatives actually exist?

THEOREM (Existence of antiderivatives)


Suppose a < b and f is continuous on (a, b). Then, there exists a
function F with domain (a, b) such that $F'(x) = f(x)$ for $x \in (a, b)$.
Antiderivatives always exist for the functions we will encounter in this
course, even though it can be difficult (or impossible!) to find simple
formulas for them. Here is a particularly tantalizing instance of this:

Example 5.6
The antiderivative F of $f(x) = e^{-x^2}$ has no simple formula.

106 / 158

Notation (Indefinite integral notation)


We use the notation

$$\int f(x)\,dx = F(x) + C$$
to indicate that f is a function whose antiderivatives are all of the form
F(x) + C for some function F(x) and an arbitrary constant C.
The antiderivative symbol
$$\int f(x)\,dx$$
is also called the indefinite integral of f.
Remark. Right now, there is no reason to assume that this symbol has
any connection to the notation introduced earlier for area under the
curve:
$$\int_a^b f(x)\,dx$$

However, we shall see later why this makes sense.


107 / 158

Example 5.7
Find all the antiderivatives of $f(x) = \frac{1}{x}$ on the domain
$\mathbb{R}\setminus\{0\} = (-\infty, 0) \cup (0, \infty)$. In other words, determine the indefinite
integral $\int \frac{1}{x}\,dx$.
SOLUTION: The domain consists of two intervals, which we will
analyze separately.
The interval $(0, \infty)$. We need to think of a function F such that
$$F'(x) = \frac{1}{x}.$$
Remembering our earlier discussion, we recall for the natural logarithm
that
$$(\ln x)' = \frac{1}{x}.$$

108 / 158

Thus we know that the antiderivatives of $f(x) = \frac{1}{x}$ on the interval
$(0, \infty)$ are the functions of the form $\ln x + C_1$ on that interval.

The interval $(-\infty, 0)$. We want to use the same idea as before, but we
can't use the function $\ln x$ because we can't take logs of negative
numbers. If x is negative, then $-x$ is positive, so consider:
$$\big(\ln(-x)\big)' = \frac{1}{-x}\,(-x)' = \frac{1}{-x}\cdot(-1) = \frac{1}{x}.$$
So, the antiderivatives of $\frac{1}{x}$ on $(-\infty, 0)$ are the functions $\ln(-x) + C_2$.

109 / 158

Conclusion. A function F is an antiderivative of 1/x on the domain
$(-\infty, 0) \cup (0, \infty)$ if and only if there are constants $C_1$ and $C_2$ such that
$$F(x) = \begin{cases} \ln x + C_1 & \text{if } x \text{ is in } (0, \infty) \\ \ln(-x) + C_2 & \text{if } x \text{ is in } (-\infty, 0) \end{cases}$$
In practice, many people (mathematicians included!) only think about
the case where $C_1 = C_2$ and they write
$$\int \frac{1}{x}\,dx = \ln|x| + C.$$

110 / 158

We can convert our rules for differentiation into rules for


antidifferentiation.

Example 5.8 (Antiderivatives of powers)


Suppose that p is a real number, $p \ne -1$. Then the antiderivatives of
$$f(x) = x^p$$
on the interval $(0, \infty)$ are exactly the functions of the form
$$F(x) = \frac{x^{p+1}}{p+1} + C.$$
Check the derivative:
$$\left(\frac{x^{p+1}}{p+1}\right)' = \frac{1}{p+1}\left(x^{p+1}\right)' = \frac{1}{p+1}(p+1)x^{(p+1)-1} = x^p.$$

111 / 158

Example 5.9
There exists a unique function F on the interval $(-\infty, \infty)$ such that
$F(1) = 7$ and F is an antiderivative of $x^2$. Find F.
SOLUTION: We know that $F(x) = \int x^2\,dx = \frac{x^3}{3} + C$.
To find C, we plug in x = 1:
$$F(1) = \frac{1}{3} + C = 7$$
This implies
$$C = 7 - \frac{1}{3} = \frac{20}{3}.$$
Hence the solution is the function
$$F(x) = \frac{x^3}{3} + \frac{20}{3}.$$

112 / 158

Example 5.10
Find all antiderivatives of $f(x) = \ln x$.
SOLUTION: After playing around with this for a while, we make the
guess
$$F(x) = x\ln x - x.$$
Thereafter, we simply check its derivative:
$$\begin{aligned}
(x\ln x - x)' &= (x\ln x)' - (x)' \\
&= x(\ln x)' + \ln x\,(x)' - 1 \quad\text{(product rule)} \\
&= x\left(\tfrac{1}{x}\right) + \ln x - 1 \\
&= \ln x.
\end{aligned}$$
So on the domain $(0, \infty)$, we have
$$\int \ln x\,dx = x\ln x - x + C.$$
113 / 158

Important antiderivatives
You should learn the following antiderivatives:
$$\int x^p\,dx = \frac{x^{p+1}}{p+1} + C \quad\text{if } p \ne -1$$
$$\int \frac{1}{x}\,dx = \ln|x| + C$$
$$\int e^x\,dx = e^x + C$$
$$\int \sin x\,dx = -\cos x + C$$
$$\int \cos x\,dx = \sin x + C$$
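One way to convince yourself of each entry in this table is to differentiate the right-hand side numerically and compare it with the integrand. The sketch below does that with a central difference; the exponent `p = 2.5`, the sample points, and the helper name `derivative` are arbitrary choices for illustration.

```python
import math

def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x) (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

p = 2.5  # any real exponent other than -1

# Each pair is (integrand f, claimed antiderivative F with C = 0).
table = [
    (lambda x: x ** p, lambda x: x ** (p + 1) / (p + 1)),
    (lambda x: 1 / x, lambda x: math.log(abs(x))),
    (math.exp, math.exp),
    (math.sin, lambda x: -math.cos(x)),
    (math.cos, math.sin),
]

sample_points = (0.5, 1.0, 2.0)
max_error = max(
    abs(derivative(F, x) - f(x)) for f, F in table for x in sample_points
)
print(max_error)
```

A small `max_error` is consistent with the table but is not a proof; the proof is the derivative computation for each line.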

114 / 158

Example 5.11
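$$\int \frac{1}{x^3}\,dx = \int x^{-3}\,dx = \frac{x^{-2}}{-2} + C = -\frac{1}{2x^2} + C$$
$$\int (5x^3 - 3x)\,dx = \frac{5x^4}{4} - \frac{3x^2}{2} + C$$
$$\int \sqrt{x}\,dx = \int x^{1/2}\,dx = \frac{x^{3/2}}{3/2} + C.$$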

115 / 158

Fun on the internet


Go to Google and search for
Integral Calculator

or

Antiderivative Calculator

This will give you several options such as


integrals.wolfram.com
If you type in
sqrt(e^x) + sin(x)/cos(x),
then you will learn that
$$\int \left(\sqrt{e^x} + \frac{\sin x}{\cos x}\right)dx = 2\sqrt{e^x} - \ln(\cos x) + C.$$

Next try
sqrt(e^x) * sin(x)/cos(x),
and also
e^(-x^2)
116 / 158

B. Fundamental Theorem of Calculus


We have defined the area under f between a and b to be
$$\int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{k=1}^{n} f(x_k)\,\Delta x_k.$$

Even for very simple functions, calculating these definite integrals using
the Riemann sum definition can be very difficult.
We now introduce the Fundamental Theorem of Calculus, which ties
together integration and differentiation.
This will allow us to compute the area under the curve by the formula
$$\int_a^b f(x)\,dx = F(b) - F(a).$$

117 / 158

THEOREM (Fundamental Theorem of Calculus)


(i) Suppose that f is a continuous function on [a, b]. If F is any
antiderivative of f on (a, b), then
$$\int_a^b f(x)\,dx = F(b) - F(a).$$
Since $F' = f$, we can rewrite this to read
$$\int_a^b F'(x)\,dx = F(b) - F(a).$$
(ii) If f is continuous on [a, b], then for a < x < b,
$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$

118 / 158

Area as a function
We can view the area under the curve y = f(x) between 0 and b as a
function of the unknown b:

[Figure: the shaded area F(b) under the curve y = f(x) between 0 and b]

Let F(b) equal the shaded area under y = f(x) between 0 and b
as a function of b, as shown. The formula for that function is
$$F(b) = \int_0^b f(x)\,dx.$$
119 / 158

Derivative of the area function


We compute the derivative $F'$ of the area function. By definition,
$$F'(b) = \lim_{h\to 0} \frac{F(b+h) - F(b)}{h}.$$
For h > 0, $F(b+h) - F(b)$ is the area under f(x) between b and b + h:
$$\begin{aligned}
F(b+h) - F(b) &= \int_0^{b+h} f(x)\,dx - \int_0^{b} f(x)\,dx \\
&= \left(\int_0^{b} f(x)\,dx + \int_b^{b+h} f(x)\,dx\right) - \int_0^{b} f(x)\,dx \\
&= \int_b^{b+h} f(x)\,dx.
\end{aligned}$$
Now, divide this by h and make h smaller and smaller. What do you get?
120 / 158

Example 5.12
Let's consider a concrete example with $f(x) = x^2$.
We can estimate $\int_b^{b+h} f(x)\,dx$ using a single rectangle. For $b \ge 0$ and $h > 0$, the left endpoint
rule gives an underestimate and the right endpoint rule gives an
overestimate:
$$L_1 \le F(b+h) - F(b) \le R_1$$

121 / 158

For $f(x) = x^2$, we get
$$h \cdot b^2 \le F(b+h) - F(b) \le h \cdot (b+h)^2.$$
Consequently,
$$b^2 \le \frac{F(b+h) - F(b)}{h} \le (b+h)^2.$$
Now, evaluate the limit for h > 0:
$$\lim_{h\to 0} \frac{F(b+h) - F(b)}{h} = b^2.$$
A similar calculation for h < 0 yields the same limit.
Conclusion. For every b, we have
$$F'(b) = b^2 = f(b).$$
Thus the area function F is an antiderivative of $f(x) = x^2$.
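The squeeze in this example can be watched numerically. The sketch below takes the closed form $F(b) = b^3/3$ for the area function for granted (Example 5.13 justifies it) and tabulates the difference quotient between its two bounds; the choice `b = 2.0` and the step sizes are arbitrary.

```python
def F(b):
    # Area under y = x^2 between 0 and b, assumed here to be b^3 / 3.
    return b ** 3 / 3

b = 2.0
rows = []
for h in (0.1, 0.01, 0.001):
    quotient = (F(b + h) - F(b)) / h
    lower, upper = b ** 2, (b + h) ** 2
    rows.append((lower, quotient, upper))
    print(h, lower, quotient, upper)
```

As h shrinks, the upper bound $(b+h)^2$ closes in on $b^2$, and the quotient is trapped between them.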

122 / 158

Example 5.13 (Area under a parabola)


We know that
$$\int x^2\,dx = \frac{x^3}{3} + C.$$
So there is a constant C such that
$$F(x) = \frac{x^3}{3} + C.$$
We must have C = 0, because $F(0) = \int_0^0 x^2\,dx = 0$.
Remarkable Conclusions: The area under the curve $y = x^2$ between 0
and b is equal to $F(b) = \frac{b^3}{3}$. For $0 \le a < b$, the area under the curve
$y = x^2$ between a and b equals
$$F(b) - F(a) = \frac{b^3}{3} - \frac{a^3}{3}.$$
We find the area by simply evaluating an antiderivative at the
endpoints.
123 / 158

Example 5.14 (Using cdf to find pdf)


Recall that for probability distributions the integral of the pdf is the cdf.
That is, the cdf F(x) is an antiderivative of the pdf f(x), and the pdf is the derivative of the cdf:
$$F(x) = P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} f(y)\,dy$$

[Figure: two panels, "pdf of Normal Distribution" (showing f(x)) and "cdf of Normal Distribution" (with values such as F(0) and F(1) marked)]

124 / 158

Example 5.15
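Compute $\int_1^2 (x^5 - x^3)\,dx$.
SOLUTION: Since $\int x^p\,dx = \frac{x^{p+1}}{p+1} + C$,
the function $F(x) = \frac{x^6}{6} - \frac{x^4}{4}$ is an antiderivative of $f(x) = x^5 - x^3$.
$$\begin{aligned}
\int_1^2 (x^5 - x^3)\,dx &= \left.\left(\frac{x^6}{6} - \frac{x^4}{4}\right)\right|_1^2 \\
&= \left(\frac{2^6}{6} - \frac{2^4}{4}\right) - \left(\frac{1^6}{6} - \frac{1^4}{4}\right) \\
&= \left(\frac{64}{6} - \frac{16}{4}\right) - \left(\frac{1}{6} - \frac{1}{4}\right) \\
&= \frac{27}{4}
\end{aligned}$$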
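To see the Fundamental Theorem at work, one can compare a Riemann sum for this integral with $F(2) - F(1)$. The sketch below uses a midpoint Riemann sum with 1000 subintervals; the helper `midpoint_sum` and the number of subintervals are illustrative assumptions.

```python
def f(x):
    return x ** 5 - x ** 3

def F(x):
    # Antiderivative of f from the solution above.
    return x ** 6 / 6 - x ** 4 / 4

def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func over [a, b] with n equal subintervals."""
    width = (b - a) / n
    return sum(func(a + (k + 0.5) * width) for k in range(n)) * width

exact = F(2) - F(1)                 # evaluate the antiderivative at the endpoints
approx = midpoint_sum(f, 1, 2, 1000)  # approximate the area directly

print(exact, approx)
```

The two numbers agree to several decimal places, and the antiderivative route needs only two function evaluations.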

125 / 158

Notation
We will often write
$$F(x)\Big|_a^b = F(b) - F(a).$$

126 / 158

Example 5.16
Compute $\int_0^{\pi} \sin x\,dx$.

SOLUTION: We know that $-\cos x$ is an antiderivative of $\sin x$.
So we have
$$\begin{aligned}
\int_0^{\pi} \sin x\,dx &= (-\cos x)\Big|_0^{\pi} \\
&= (-\cos(\pi)) - (-\cos(0)) \\
&= -(-1) - (-1) = 2.
\end{aligned}$$

127 / 158

Example 5.17
Compute $\int_{-1}^{1} x^3\,dx$.

SOLUTION: We know that $\frac{x^4}{4}$ is an antiderivative of $x^3$.
So we have
$$\int_{-1}^{1} x^3\,dx = \left.\frac{x^4}{4}\right|_{-1}^{1} = \frac{1^4}{4} - \frac{(-1)^4}{4} = 0.$$

128 / 158

Example 5.18
Find $\frac{d}{dx}\int_a^x \sin(t^2)\,dt$, where the lower limit a is a constant.

SOLUTION: By the Fundamental Theorem of Calculus, this is just
$\sin(x^2)$.

Example 5.19
Find $\frac{d}{dx}\int_x^3 \sin(t^2)\,dt$.

SOLUTION: We can't apply the Fundamental Theorem directly, but we
can do the following.
$$\frac{d}{dx}\int_x^3 \sin(t^2)\,dt = \frac{d}{dx}\left(-\int_3^x \sin(t^2)\,dt\right) = -\sin(x^2).$$

129 / 158

Example 5.20
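Find $\frac{d}{dx}\int_x^{x^2} f(t)\,dt$.

SOLUTION: Since x appears in both the upper and lower bounds of
integration, we split up the integral:
$$\begin{aligned}
\frac{d}{dx}\int_x^{x^2} f(t)\,dt &= \frac{d}{dx}\left(\int_x^{0} f(t)\,dt + \int_0^{x^2} f(t)\,dt\right) \\
&= -\frac{d}{dx}\int_0^{x} f(t)\,dt + \frac{d}{dx}\int_0^{x^2} f(t)\,dt \\
&= -f(x) + f(x^2)\,\underbrace{\frac{d}{dx}x^2}_{\text{Chain Rule}} \\
&= -f(x) + f(x^2)\cdot(2x) = 2x\,f(x^2) - f(x).
\end{aligned}$$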
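The formula $2x\,f(x^2) - f(x)$ can be spot-checked numerically for a specific integrand. The sketch below picks $f = \cos$ and $x = 1.5$ (both arbitrary), computes the integral with a midpoint Riemann sum, and differentiates the result with a central difference; all helper names are illustrative.

```python
import math

def midpoint_sum(func, a, b, n=2000):
    """Midpoint Riemann sum; a negative width (b < a) gives the signed integral."""
    width = (b - a) / n
    return sum(func(a + (k + 0.5) * width) for k in range(n)) * width

def G(x):
    # G(x) = integral of cos(t) from t = x to t = x^2, computed numerically.
    return midpoint_sum(math.cos, x, x * x)

def derivative(func, x, h=1e-4):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5
numeric = derivative(G, x)
formula = 2 * x * math.cos(x * x) - math.cos(x)  # 2x f(x^2) - f(x) with f = cos

print(numeric, formula)
```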

130 / 158

Section 6
Integration techniques

131 / 158

The limits of antidifferentiation


We've now seen that in order to compute $\int_a^b f(x)\,dx$, we need only
find an antiderivative of f.
Recall that every continuous function f has an antiderivative,
$$F(x) = \int_0^x f(t)\,dt.$$
Finding antiderivatives explicitly can be extremely challenging,
however.
Next we'll see how to invert the chain rule and the product rule we
learned for computing derivatives.
However, many functions just do not have simple antiderivatives. In
particular, there are no elementary formulas for the following:
$$\int e^{-x^2}\,dx, \quad \int \sin(x^2)\,dx, \quad \int \cos(x^2)\,dx,$$
$$\int \frac{e^x}{x}\,dx, \quad \int \frac{\sin x}{x}\,dx, \quad \int \frac{\cos x}{x}\,dx.$$
132 / 158

A. Substitution, changing variables


The Chain Rule states
$$\big(F(g(x))\big)' = f(g(x))\,g'(x),$$
whenever $F' = f$, and therefore
$$\begin{aligned}
\int_a^b f(g(x))\,g'(x)\,dx &= \int_a^b \big(F(g(x))\big)'\,dx \\
&= F(g(b)) - F(g(a)) \\
&= \int_{g(a)}^{g(b)} f(u)\,du.
\end{aligned}$$
This gives the substitution formula
$$\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du.$$

133 / 158

We can think of the substitution formula as giving us a way to change
variables from x to u = g(x), in which case we have the very useful
mnemonic:
$$du = g'(x)\,dx,$$
although strictly speaking the symbols du and dx are not defined by
themselves.
We can then write the substitution formula as
$$\int f(g(x))\,g'(x)\,dx = \int f(u)\,du.$$

134 / 158

Our main purpose in finding antiderivatives is to evaluate definite
integrals. When using u-substitution, we can follow two routes:

Find an antiderivative as usual, and evaluate it at the endpoints.

An alternative (and usually easier) method is to replace the bounds
of integration when we change variables.

135 / 158

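Example 6.1
Find $\int x\,e^{x^2}\,dx$.

SOLUTION: If we set $u = x^2$, then $du = 2x\,dx$. We obtain
$$\begin{aligned}
\int x\,e^{x^2}\,dx &= \frac{1}{2}\int e^{x^2}\,2x\,dx \\
&= \frac{1}{2}\int e^{u}\,du \\
&= \frac{1}{2}e^{u} + C \\
&= \frac{1}{2}e^{x^2} + C.
\end{aligned}$$
We must always check our work:
$$\frac{d}{dx}\left(\frac{1}{2}e^{x^2} + C\right) = \frac{1}{2}e^{x^2}\,\frac{d}{dx}x^2 + 0 = x\,e^{x^2}$$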

136 / 158

Example 6.2
Find $\int \frac{\cos(\ln x)}{x}\,dx$.

SOLUTION: This one looks pretty awful, but if we make the substitution
$u = \ln x$, then $du = \frac{1}{x}\,dx$ and we have
$$\begin{aligned}
\int \frac{\cos(\ln x)}{x}\,dx &= \int \cos u\,du \\
&= \sin u + C \\
&= \sin(\ln x) + C.
\end{aligned}$$
Again, we should check our work by computing the derivative of
$F(x) = \sin(\ln x)$.

137 / 158

Many integrals can be solved in multiple ways. By a previous theorem,
we know all antiderivatives will differ from each other by a constant.

Example 6.3
For example, we can find $\int \frac{x}{\sqrt{x^2+1}}\,dx$ in two different ways:

Method 1. Set $u = x^2 + 1$, so $du = 2x\,dx$ and $x\,dx = \frac{1}{2}\,du$:
$$\begin{aligned}
\int \frac{x}{\sqrt{x^2+1}}\,dx &= \frac{1}{2}\int \frac{du}{\sqrt{u}} \\
&= \frac{1}{2}\int u^{-1/2}\,du \\
&= \frac{1}{2}\,\frac{u^{1/2}}{1/2} + C = \sqrt{x^2+1} + C.
\end{aligned}$$
Method 2. Set $u = \sqrt{x^2+1}$. Then $du = \frac{2x}{2\sqrt{x^2+1}}\,dx$, and we get
$$\int \frac{x}{\sqrt{x^2+1}}\,dx = \int du = u + C = \sqrt{x^2+1} + C.$$
Both methods give the same answer.


138 / 158

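Example 6.4
Find $\int x^5\sqrt{1+x^2}\,dx$.

SOLUTION: Let's try the substitution $u = 1 + x^2$. Then $du = 2x\,dx$, so
$x\,dx = \frac{du}{2}$:
$$\begin{aligned}
\int x^5\sqrt{1+x^2}\,dx &= \int (x^2)^2\sqrt{1+x^2}\;x\,dx \\
&= \frac{1}{2}\int (u-1)^2\sqrt{u}\,du \\
&= \frac{1}{2}\int \left(u^{5/2} - 2u^{3/2} + u^{1/2}\right)du \\
&= \frac{1}{2}\left(\frac{u^{7/2}}{7/2} - 2\,\frac{u^{5/2}}{5/2} + \frac{u^{3/2}}{3/2}\right) + C \\
&= \frac{(1+x^2)^{7/2}}{7} - \frac{2(1+x^2)^{5/2}}{5} + \frac{(1+x^2)^{3/2}}{3} + C.
\end{aligned}$$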

139 / 158

Example 6.5
Find $\int_1^e \frac{\ln x}{x}\,dx$.

SOLUTION: We'll use $u = \ln x$, so $du = \frac{dx}{x}$.
Note that u(1) = 0 and u(e) = 1. Thus we have
$$\int_1^e \frac{\ln x}{x}\,dx = \int_0^1 u\,du = \left.\frac{u^2}{2}\right|_0^1 = \frac{1}{2} - 0 = \frac{1}{2}.$$
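Changing the bounds along with the variable can be checked numerically: the integral in x over [1, e] and the integral in u over [0, 1] should give the same number. The sketch below approximates both with a midpoint Riemann sum; the helper name and the 1000 subintervals are illustrative assumptions.

```python
import math

def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func over [a, b] with n equal subintervals."""
    width = (b - a) / n
    return sum(func(a + (k + 0.5) * width) for k in range(n)) * width

# Original integral: ln(x)/x over x in [1, e].
in_x = midpoint_sum(lambda x: math.log(x) / x, 1.0, math.e, 1000)

# After substituting u = ln x: u over u in [u(1), u(e)] = [0, 1].
in_u = midpoint_sum(lambda u: u, 0.0, 1.0, 1000)

print(in_x, in_u)
```

Both values are close to 1/2, matching the computation above.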

140 / 158

Example 6.6 (Normalizing constant for a pdf)

We wish to define a continuous probability distribution on the interval
(1, e), by means of a probability distribution function of the form
$$f(x) = \frac{1}{Z}\,\frac{\ln x}{x}.$$
How should the constant Z be chosen?
SOLUTION: We want
$$\int_1^e f(x)\,dx = 1.$$
Equivalently,
$$\frac{1}{Z}\int_1^e \frac{\ln x}{x}\,dx = 1.$$
Therefore the previous example implies
$$Z = \int_1^e \frac{\ln x}{x}\,dx = \frac{1}{2}.$$

141 / 158

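Example 6.7
Find $\int_3^5 \frac{dx}{(2-3x)^2}$.

SOLUTION: We use $u = 2 - 3x$, so $du = -3\,dx$ and thus $dx = -du/3$.
Also u(3) = -7, u(5) = -13, and so we have
$$\begin{aligned}
\int_3^5 \frac{1}{(2-3x)^2}\,dx &= -\frac{1}{3}\int_{-7}^{-13} \frac{1}{u^2}\,du \\
&= \frac{1}{3}\,\left.\frac{1}{u}\right|_{-7}^{-13} \\
&= \frac{1}{3}\left(-\frac{1}{13} + \frac{1}{7}\right) = \frac{2}{91}.
\end{aligned}$$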

142 / 158

Example 6.8 (More on logarithms)


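Recall that we have defined
$$\ln x = \int_1^x \frac{1}{t}\,dt \qquad (x > 0).$$
Let us now compute for x, y > 0:
$$\begin{aligned}
\ln(xy) = \int_1^{xy} \frac{1}{t}\,dt &= \int_1^{x} \frac{1}{t}\,dt + \int_x^{xy} \frac{1}{t}\,dt \\
&= \ln x + \int_x^{xy} \frac{1}{t}\,dt \\
&= \ln x + \int_1^{y} \frac{1}{u}\,du \\
&= \ln x + \ln y,
\end{aligned}$$
where we substituted $u = \frac{t}{x}$, $du = \frac{dt}{x}$.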

143 / 158

Consequently, if we define the natural logarithm by the integral formula
$$\ln x = \int_1^x \frac{1}{t}\,dt,$$
we can then deduce the standard formula
$$\ln(xy) = \ln x + \ln y.$$
It is an interesting exercise to use the definition to show also that
$$\ln(x^y) = y\ln x \qquad (x > 0,\ y \in \mathbb{R}).$$

144 / 158

B. Symmetry: even and odd functions


We now turn our attention to the very special case of definite integrals of
the form $\int_{-a}^{a} f(x)\,dx$ for functions f that have special symmetries:

DEFINITION
The function f is called even if $f(-x) = f(x)$.
The function f is called odd if $f(-x) = -f(x)$.

The terms even and odd come from the power functions: $x^2$, $x^3$, $x^4$, etc.

[Figure: two panels, "Even: f(x) = f(-x)" and "Odd: f(-x) = -f(x)"]

It looks like we should have $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$ for even functions
and $\int_{-a}^{a} f(x)\,dx = 0$ for odd functions. This is true.
145 / 158

THEOREM (Using symmetry)


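If f is an odd function, then $\int_{-a}^{a} f(x)\,dx = 0$.
If f is an even function, then $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$.

Proof.
For an odd function f,
$$\begin{aligned}
\int_{-a}^{a} f(x)\,dx &= \int_{-a}^{0} f(x)\,dx + \int_0^a f(x)\,dx \\
&= -\int_0^{-a} f(x)\,dx + \int_0^a f(x)\,dx \\
&= -\int_0^{a} f(-u)(-1)\,du + \int_0^a f(x)\,dx \qquad (u = -x,\ du = -dx) \\
&= -\int_0^{a} f(u)\,du + \int_0^a f(x)\,dx = 0.
\end{aligned}$$
The proof that $\int_{-a}^{a} f(x)\,dx = 2\int_0^a f(x)\,dx$ for even functions is similar.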


146 / 158

Example 6.9
Calculate $\int_{-2}^{2} \frac{\sin x}{4 + 3x^2 + 2x^4}\,dx$.

SOLUTION: Attempting to find an antiderivative would be a nightmare.
Luckily, the integrand is odd:
$$f(-x) = -f(x).$$
So, without calculating anything at all, we can conclude
$$\int_{-2}^{2} \frac{\sin x}{4 + 3x^2 + 2x^4}\,dx = 0.$$
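The symmetry theorem can be illustrated numerically. The sketch below applies a midpoint Riemann sum (an illustrative helper, not part of the slides) to the odd integrand of Example 6.9 and, for comparison, to the even function $x^2$, which is an extra test case chosen for this sketch.

```python
import math

def midpoint_sum(func, a, b, n):
    """Midpoint Riemann sum of func over [a, b] with n equal subintervals."""
    width = (b - a) / n
    return sum(func(a + (k + 0.5) * width) for k in range(n)) * width

def odd_integrand(x):
    return math.sin(x) / (4 + 3 * x ** 2 + 2 * x ** 4)

def even_integrand(x):
    return x ** 2

# Odd case: contributions from -x and x cancel, so the total is (numerically) zero.
odd_total = midpoint_sum(odd_integrand, -2.0, 2.0, 1000)

# Even case: the full integral is twice the integral over [0, 2].
even_full = midpoint_sum(even_integrand, -2.0, 2.0, 1000)
even_half_doubled = 2 * midpoint_sum(even_integrand, 0.0, 2.0, 500)

print(odd_total, even_full, even_half_doubled)
```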

147 / 158

C. Integration by parts
Recall from the Product Rule that
$$(fg)' = f'g + fg'.$$
Now integrate and use the Fundamental Theorem of Calculus, to learn
that
$$\begin{aligned}
\int_a^b f'(x)g(x)\,dx + \int_a^b f(x)g'(x)\,dx &= \int_a^b (fg)'(x)\,dx \\
&= f(b)g(b) - f(a)g(a) \\
&= (fg)\big|_a^b.
\end{aligned}$$
Rearranging gives the formula for integration by parts:
$$\int_a^b f'(x)g(x)\,dx = (fg)\big|_a^b - \int_a^b f(x)g'(x)\,dx$$

148 / 158

Let us now write u = g(x) and v = f(x). Recalling the useful (but
mathematically imprecise) expressions
$$du = g'\,dx, \qquad dv = f'\,dx,$$
we can rewrite the integration by parts formula as
$$\int u\,dv = uv - \int v\,du.$$
Whichever form of it we use, the point is that the integration by parts
formula gives us a way to move a derivative from one function onto
another within an integral. This quite often converts a difficult integral
into a simpler one, as we will see in subsequent examples.

149 / 158

Example 6.10
Find $\int x\sin x\,dx$.

SOLUTION: If we use u = x and $dv = \sin x\,dx$, then du = dx and
$v = -\cos x$. So we have:
$$\begin{aligned}
\int x\sin x\,dx &= -x\cos x - \int(-\cos x)\,dx \\
&= -x\cos x + \int \cos x\,dx \\
&= -x\cos x + \sin x + C
\end{aligned}$$
We should check our work:
$$\frac{d}{dx}(-x\cos x + \sin x + C) = -\cos x - x(-\sin x) + \cos x = x\sin x.$$

150 / 158

Example 6.11
Find $\int \ln x\,dx$.

SOLUTION: This one isn't obviously a candidate problem for
integration by parts. But let us try $u = \ln x$ and $dv = dx$.
Then we get $du = \frac{1}{x}\,dx$ and v = x. Consequently,
$$\begin{aligned}
\int \ln x\,dx &= x\ln x - \int x\,\frac{1}{x}\,dx \\
&= x\ln x - \int 1\,dx \\
&= x\ln x - x + C
\end{aligned}$$
Check this answer!

151 / 158

Example 6.12
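Find $\int x\ln x\,dx$.

SOLUTION: We choose $u = \ln x$ and $dv = x\,dx$. Then $du = \frac{1}{x}\,dx$,
$v = \frac{x^2}{2}$, and therefore
$$\begin{aligned}
\int x\ln x\,dx &= \ln x\cdot\frac{x^2}{2} - \int \frac{x^2}{2}\,\frac{1}{x}\,dx \\
&= \frac{x^2\ln x}{2} - \int \frac{x}{2}\,dx \\
&= \frac{x^2\ln x}{2} - \frac{x^2}{4} + C.
\end{aligned}$$
Again, confirm this answer.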

152 / 158

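Example 6.13
Find $\int x^2e^{3x}\,dx$.

SOLUTION: Let's try $u = x^2$, $dv = e^{3x}\,dx$. Then $du = 2x\,dx$ and
$v = \frac{1}{3}e^{3x}$. Thus
$$\int x^2e^{3x}\,dx = \frac{1}{3}x^2e^{3x} - \frac{2}{3}\int xe^{3x}\,dx.$$
We can now integrate by parts again, this time using u = x, $dv = e^{3x}\,dx$
and thus du = dx, $v = \frac{1}{3}e^{3x}$:
$$\begin{aligned}
\int x^2e^{3x}\,dx &= \frac{1}{3}x^2e^{3x} - \frac{2}{3}\int xe^{3x}\,dx \\
&= \frac{1}{3}x^2e^{3x} - \frac{2}{3}\left(\frac{1}{3}xe^{3x} - \frac{1}{3}\int e^{3x}\,dx\right) \\
&= \frac{1}{3}x^2e^{3x} - \frac{2}{3}\left(\frac{1}{3}xe^{3x} - \frac{1}{3}\left(\frac{1}{3}e^{3x} + C\right)\right) \\
&= \frac{x^2e^{3x}}{3} - \frac{2xe^{3x}}{9} + \frac{2e^{3x}}{27} + C
\end{aligned}$$
Question: why not $\frac{2C}{9}$?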
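The slides repeatedly advise checking an antiderivative by differentiating it. A numerical version of that check for Example 6.13 might look like the sketch below, where the helper `derivative` (a central difference) and the sample points are illustrative assumptions.

```python
import math

def F(x):
    # Candidate antiderivative from Example 6.13 (with C = 0).
    e = math.exp(3 * x)
    return x * x * e / 3 - 2 * x * e / 9 + 2 * e / 27

def f(x):
    # Integrand x^2 e^(3x).
    return x * x * math.exp(3 * x)

def derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

errors = [abs(derivative(F, x) - f(x)) for x in (-1.0, 0.0, 0.5, 1.0)]
print(max(errors))
```

Adding any constant to `F` leaves `derivative(F, x)` unchanged, which is one way to think about the question at the end of the example.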
153 / 158

Repeated integration by parts

To find $\int x^2e^{3x}\,dx$, we had to integrate by parts twice.
With a little work, you could find $\int x^3e^{3x}\,dx$ by integrating by parts
3 times.
To find $\int x^ne^{3x}\,dx$, we would have to integrate by parts n times.

In statistics, these kinds of integrals are very useful for computing
moments of probability distributions.

154 / 158

Example 6.14 (Another trick)


Find $\int e^x\cos x\,dx$.

SOLUTION: Try $u = \cos x$, $dv = e^x\,dx$. Then $du = -\sin x\,dx$, $v = e^x$,
and
$$\int e^x\cos x\,dx = e^x\cos x + \int e^x\sin x\,dx.$$
We integrate by parts again with $u = \sin x$, $dv = e^x\,dx$, so that $du = \cos x\,dx$
and $v = e^x$. We now compute
$$\begin{aligned}
\int e^x\cos x\,dx &= e^x\cos x + \int e^x\sin x\,dx \\
&= e^x\cos x + \left(e^x\sin x - \int e^x\cos x\,dx\right)
\end{aligned}$$
This implies $2\int e^x\cos x\,dx = e^x\cos x + e^x\sin x$, and so
$$\int e^x\cos x\,dx = \frac{e^x\cos x + e^x\sin x}{2} + C$$
155 / 158

Example 6.15 (More on the Gamma Function)


Recall that the Gamma function is defined by the integral
$$\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt.$$

THEOREM
(i) The Gamma function satisfies
$$\Gamma(x+1) = x\,\Gamma(x) \qquad\text{for all } x \in (0, \infty).$$
(ii) In particular, the Gamma function is an extension of the factorial
function:
$$\Gamma(n+1) = n! \qquad\text{for nonnegative integers } n.$$

156 / 158

Proof.
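To prove this, we use integration by parts:
$$\int_a^b u\,dv = (uv)\big|_a^b - \int_a^b v\,du.$$
Take $u = t^x$ and $dv = e^{-t}\,dt$, so that $du = xt^{x-1}\,dt$ and $v = -e^{-t}$. Then
$$\begin{aligned}
\int_a^b t^xe^{-t}\,dt &= \left(-e^{-t}t^x\right)\big|_a^b - \int_a^b (-e^{-t})\,x\,t^{x-1}\,dt \\
&= e^{-a}a^x - e^{-b}b^x + x\int_a^b t^{x-1}e^{-t}\,dt.
\end{aligned}$$
Now take the limit $a \to 0$ and $b \to \infty$. Then the left hand side converges
to $\Gamma(x+1)$, and the right hand side becomes $x\,\Gamma(x)$. Since
$$\Gamma(1) = \int_0^{\infty} e^{-t}\,dt = -e^{-t}\big|_0^{\infty} = 1,$$
the formula stated in (ii) follows from $\Gamma(x+1) = x\,\Gamma(x)$.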


157 / 158

The Gamma function and statistics

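Certain other values of the Gamma function will turn out to be important
in statistics. In particular,
$$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi} = 1.77245\ldots;$$
although this calculation requires tools beyond Math 10. (Take Math 53!)
It follows from the rule $\Gamma(x+1) = x\,\Gamma(x)$ that
$$\Gamma\left(\tfrac{3}{2}\right) = \frac{\sqrt{\pi}}{2}, \quad \Gamma\left(\tfrac{5}{2}\right) = \frac{3\sqrt{\pi}}{4}, \quad \Gamma\left(\tfrac{7}{2}\right) = \frac{15\sqrt{\pi}}{8}, \quad \ldots$$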
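These Gamma function facts are easy to spot-check with Python's standard library, which provides `math.gamma`. The sketch below checks the recursion, the factorial property, and the half-integer values listed above; the particular test points are arbitrary.

```python
import math

# Recursion Gamma(x + 1) = x * Gamma(x) at a few positive x.
recursion_gap = max(
    abs(math.gamma(x + 1) - x * math.gamma(x)) for x in (0.5, 1.0, 2.5, 4.0)
)

# Gamma(n + 1) = n! for small nonnegative integers n.
factorial_match = all(
    math.isclose(math.gamma(n + 1), math.factorial(n)) for n in range(10)
)

# Half-integer values derived from Gamma(1/2) = sqrt(pi).
root_pi = math.sqrt(math.pi)
half_values = {
    0.5: root_pi,
    1.5: root_pi / 2,
    2.5: 3 * root_pi / 4,
    3.5: 15 * root_pi / 8,
}
half_match = all(math.isclose(math.gamma(x), v) for x, v in half_values.items())

print(recursion_gap, factorial_match, half_match)
```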

158 / 158
