
Calculus for the Life Sciences
Sebastian J. Schreiber, Karl J. Smith, and Wayne M. Getz
April 7, 2008
About the authors:

Sebastian J. Schreiber received his B.A. in mathematics from Boston University in 1989 and his Ph.D. in
mathematics from the University of California, Berkeley in 1995. He is currently Professor of Ecology and Evolution at the
University of California, Davis. Previously, he was an associate professor of mathematics at the College of William
and Mary, where he was the 2005 recipient of the Simon Prize for Excellence in the Teaching of Mathematics, and at
Western Washington University. Professor Schreiber’s research on stochastic processes, nonlinear dynamics, and
applications to ecology, evolution, and epidemiology has been supported by grants from the U.S. National Science
Foundation and the U.S. National Oceanic and Atmospheric Administration. He is the author or co-author of over
40 scientific papers in peer-reviewed mathematics and biology journals. Several of these papers are co-authored with
undergraduate students who were supported by the National Science Foundation. Professor Schreiber is currently on
the editorial boards of the research journals Mathematical Medicine and Biology, Journal of Biological Dynamics,
and Theoretical Ecology.
Karl J. Smith received his B.A. and M.A. (in 1967) degrees in mathematics from UCLA. He moved to northern
California in 1968 to teach at Santa Rosa Junior College, where he taught until his retirement in 1993. Along the way,
he served as department chair, and he received a Ph.D. in 1979 in mathematics education at Southeastern University.
A past president of the American Mathematical Association of Two-Year Colleges, Professor Smith is very active
nationally in mathematics education. He was founding editor of Western AMATYC News, a chairperson of the
committee on Mathematics Excellence, and an NSF grant reviewer. He was a recipient in 1979 of an Outstanding Young
Men of America Award, in 1980 of an Outstanding Educator Award, and in 1989 of an Outstanding Teacher Award.
Professor Smith is the author of over 60 successful textbooks. Over two million students have learned mathematics
from his textbooks.
Wayne M. Getz received his B.Sc., B.Sc. Hons, and Ph.D. in applied mathematics from the University of the
Witwatersrand, South Africa, in 1971, 1972, and 1976, respectively. He was a research scientist at the National
Research Institute for Mathematical Sciences in South Africa until he moved to take up a faculty position in 1979 at
the University of California, Berkeley. He is currently Professor of Environmental Science and Chair of the Division
of Environmental Biology at UC Berkeley. Professor Getz also has a D.Sc. from the University of Cape Town
and is an Extraordinary Professor at the University of Pretoria, both in South Africa. Recognition for his research
in biomathematics and its application to various areas of physiology, behavior, ecology, and evolution includes an
Alexander von Humboldt US Senior Scientist Research Award in 1992 and election to the American Association for the
Advancement of Science (1995), the California Academy of Sciences (2000), and the Royal Society of South Africa
(2003). He was appointed as a Chancellor’s Professor at Berkeley from 1998 to 2001. Professor Getz has served as a
consultant to the US and Canadian Governments and a US District Judge on matters pertaining to the management
of fisheries and as a member of two National Academy of Sciences review panels, and he is a founder and Trustee of the
South African Centre for Epidemiological Modelling and Analysis. His research over the past 25 years has been
supported by the U.S. National Science Foundation, the National Institutes of Health, the California Department of
Food and Agriculture, California Sea Grant, the A. P. Sloan Foundation, the Whitehall Foundation, DARPA, and the
Ellison Medical Foundation. Recently he received a prestigious James S. McDonnell 21st Century Science Initiative
Award. Professor Getz has published a book entitled Population Harvesting in the Princeton Monographs in
Population Biology series, edited other books and volumes, and is an author or coauthor on more than 150 scientific
papers in over 50 different peer-reviewed applied mathematics and biology journals.
©2008 Schreiber, Smith & Getz

Preface
If the 20th century belonged to physics, the 21st century may well belong to biology. Just 50 years after
the discovery of DNA’s chemical structure and the invention of the computer experiment, a revolution is
occurring in biology, driven by mathematical and computational science.
Jim Austin, US Editor of Science, and Carlos Castillo-Chavez, Professor of Biomathematics, Science, February 6, 2004
Calculus was invented in the second half of the seventeenth century by Isaac Newton and Gottfried Leibniz
to solve problems in physics and geometry. Calculus ushered in the age-of-physics, with many of the advances in
mathematics over the past 300 years going hand-in-hand with the development of various fields of physics, such as
mechanics, thermodynamics, fluid dynamics, electromagnetism, and quantum mechanics. Today, physics and some
branches of mathematics are obligate mutualists: unable to exist without one another. The history of the growth of
this obligate association is evident in the types of problems that pervade modern calculus textbooks and contribute
to the canonical lower division mathematics curricula offered at educational institutions around the world.
The age-of-biology is most readily identified with two seminal events: the publication of Charles Darwin’s On
the Origin of Species in 1859 and, almost 100 years later, Francis Crick and James Watson’s discovery in 1953 of
the structure of DNA. About mathematics, Darwin stated:
I have deeply regretted that I did not proceed far enough at least to understand something of the great
leading principles of mathematics; for men thus endowed seem to have an extra sense.
Despite Darwin’s assertion, mathematics was not as important in the initial growth of biology as it was in physics.
However, in the past decades, dramatic advances in biological understanding and experimental techniques have
unveiled complex networks of interacting components and have yielded vast data sets. To extract meaningful patterns
from these complexities, mathematics is going to be crucial to the
maturation of many fields of biology. Its role, however, will be more computational than analytical. Mathematics will
function as a tool to dissect out the complexities inherent in biological systems rather than be used to encapsulate
physical theories through elegant mathematical equations.
Mathematics will ultimately play a different type of role in the age-of-biology than it did in
the age-of-physics, largely because the units of analysis in biology are extraordinarily more complex than those
of physics. The difference between an ideal billiard ball and a real billiard ball or an ideal beam and a real beam
completely pales in comparison with the difference between an ideal and a real salmonella bacterium, let alone an
ideal and a real elephant. Biology, unlike physics, has no axiomatic laws that provide a precise and coherent theory
upon which to build powerful predictive models. The closest biology comes to this ideal is in the theory of enzyme
kinetics associated with the simplest cellular processes and the theory of population genetics that only works for a
small handful of discrete, environmentally insensitive, individual traits determined by the particular alleles occupying
discrete identifiable genetic loci. Eye color in humans provides one such example.
This complexity in biology means that accurate theories are much more detailed than in physics, and precise
predictions, if possible at all, are much more computationally demanding than comparable precision in physics. Only
with the advent of extremely powerful computers can we begin to aspire to solve the problems of how a string of
peptides folds into an enzyme with predicted catalytic properties, to understand how a neuropil structure in the brain
of some animal recognizes a sound, a smell, or the shape of an object, or to predict how the species composition of a
lake will change with an influx of heat, pesticides, or fertilizer. On the other hand, predictions regarding the response
of larger systems consisting of communities of individuals or whole ecosystems to external perturbations often cannot
be tested without irreversibly damaging an irreplaceable or unique system. Hence, mathematical models provide a
powerful tool to explore the potential effects of these perturbations.
It is critical that all biologists involved in modeling are properly trained to understand the meaning of output from
models and to have a proper perspective on the limitations of the models themselves. Just as we would not allow a
butcher with a fine set of scalpels to perform exploratory surgery for cancer in a human being, so we should be wary
of allowing biologists poorly trained in the mathematical sciences to use powerful simulation software to analyze the
behavior of biological systems. If, for example, an environmental impact analysis is dramatically wrong in predicting
how a lake will respond to an influx of heat coming from a power plant to be located on its shores, then the flora
and fauna in the lake and on its surrounding shores could end up being degraded to the point where the recreational
value of the lake is destroyed. Consequently, the time has come for all biologists who are interested in more than
just the natural history of their subject to obtain a sufficiently rigorous grounding in mathematics and modeling
so that they can appropriately interpret models with an awareness of their meaning and limitations. Reflecting this
view, in a news release of the National Institute of General Medical Sciences (NIGMS), Dr. Judith H. Greenberg,
acting director of NIGMS, states: “Advances in biomedical research in the 21st century will be critically dependent
on collaboration between biologists and scientists in other disciplines, such as mathematics.”∗ And NIGMS, along
with the National Science Foundation (NSF), intends to “put their money where their mouth is” because these
organizations anticipate spending more than $24 million to “encourage the use of mathematical tools and approaches
to study biology.”
About this Book

In training biologists to be scientists, it is no longer adequate for them to study either an engineering calculus or
a “watered-down” version of the calculus. The application of mathematics to biology has progressed sufficiently
far in the last two decades and mathematical modeling is sufficiently ubiquitous in biology to justify an overhaul
of how mathematics is taught to students in the life sciences. In a recent article “Math and Biology: Careers at
the Interface,”∗ the authors state, “Today a biology department or research medical school without ‘theoreticians’
is almost unthinkable. Biology departments at research universities and medical schools routinely carry out
interdisciplinary projects that involve computer scientists, mathematicians, physicists, statisticians, and computational
scientists. And mathematics departments frequently engage professors whose main expertise is in the analysis of
biological problems.” In other words, mathematics and biology departments at universities and colleges around the
world can no longer afford to build separate educational empires, but instead need to provide coordinated training
for students wishing to experience and ultimately contribute to the explosion of quantitatively rigorous research in
ecology, epidemiology, genetics, immunology, physiology, and molecular and cellular biology. To meet this need,
interdisciplinary courses are becoming more common at both large and small universities and colleges.
In this text, we present material to cover one year of calculus, which, when combined with a statistics course,
will make students conversant in the use of mathematics in the natural sciences and inspire them to take further
courses in mathematics. In particular, the book can be viewed as a gateway to the exciting interface of mathematics
and biology. As a calculus-based introduction to this interface, the book has two main goals:
• to provide students with a thorough grounding in calculus’ concepts and applications, analytical techniques,
and numerical methods.
• to have students understand how, when, and why calculus can be used to model biological phenomena.
To achieve these goals, the book has several important features.
Features
First and foremost, every topic is motivated by a significant biological application, several of which appear in no other
texts. These topics include CO₂ build-up at the Mauna Loa Observatory in Hawaii, scaling of metabolic rates with
body size, enzyme activity in response to temperature, optimal harvesting in patchy environments, developmental
rates and degree days, sudden population disappearances, stooping peregrine falcons, drug infusion, measuring
cardiac output, in vivo HIV dynamics, and mechanisms of memory formation. Many of these examples involve
real-world data, and whenever possible, we use these examples to motivate and develop formal definitions, procedures,
and theorems. Since students learn by doing, every section ends with a set of applied problems that expose them to
additional applications and further develop applications presented within the text. These applied problems
are always preceded by a set of drill problems designed to provide students with the practice they need to master
the methods and concepts that underlie many of the applied problems.
∗ Press release of the National Institutes of Health, Alisa Zapp Machalek, August 22, 2002.
∗ “Math and Biology: Careers at the Interface,” Jim Austin and Carlos Castillo-Chavez, Science, February 6, 2004.
Second, for more in-depth applications, each chapter will include at least two projects that can be used for
individual or group work. These projects will be diverse in scope, ranging from a study of enzyme kinetics to
heart rates in mammals to disease outbreaks.
Third, sequences, difference equations, and their applications are interwoven at the sectional level in the first four
chapters. We include sequences in the first half of the book for three reasons. The first reason is that difference
equations are a fundamental tool in modeling and give rise to a variety of exciting applications (e.g., population
genetics), mathematical phenomena (e.g., chaos), and numerical methods (e.g., Newton’s method and Euler’s method).
Hence, students get exposed to discrete dynamical models in the first half of the book and continuous dynamical
models in the second half of the book. The second reason is that two of the most important concepts, limits and
derivatives, provide fundamental ways to explore the behavior of difference equations (e.g., using limits to explore
asymptotic behavior and derivatives to linearize equilibria). The third reason is that integrals are defined as limits
of sequences. Consequently, it only makes sense to present sequences before one discusses integrals. The material
on sequences is placed in clearly marked sections so that instructors wishing to teach this topic during the second
semester can do so easily.
Fourth, we introduce two topics, bifurcation diagrams and life history tables, that are not covered by other
calculus books. Bifurcation diagrams for univariate differential equations are a conceptually rich yet accessible topic.
They provide an opportunity to illustrate that small parameter changes can have large dynamical effects. Life history
tables provide students with an introduction to age-structured populations and the net reproductive number R₀ of
a population or a disease.
Fifth, throughout the text are problems described as Historical Quest. These problems are not just historical
notes to help one see mathematics and biology as living and breathing disciplines, but are designed to involve the
student in the quest of pursuing some great ideas in the history of science. Yes, they will give some interesting history,
but then lead one on a quest which should be interesting for those willing to pursue the challenge they offer.
Sixth, throughout the book, concepts are presented visually, numerically, algebraically, and verbally. By
presenting these different perspectives, we hope to enhance as well as reinforce the students’ understanding of and
appreciation for the main ideas.
Seventh, we include well-developed review sections at the end of each chapter that contain lists of definitions,
important ideas, important applications, as well as review questions.
Content
Chapter 1: This chapter begins with a brief overview of the role of modeling in the life sciences. It then focuses
on reviewing fundamental concepts from precalculus and probability. While many of the precalculus concepts are
familiar, the emphasis on modeling and verbal, numerical and visual representations of concepts will be new to many
students. Basic probability concepts are introduced because they play a fundamental role in many biological models.
This chapter also includes an introduction to sequences through an emphasis on elementary difference equations.
Chapter 2: In this chapter, the concepts of limits, continuity, and asymptotic behavior at infinity are first
discussed. The notion of a derivative at a point is defined and its interpretation as the slope of the tangent line to the graph of a function is
discussed. The idea of differentiability of functions and the realization of the derivative as a function itself are then
explored. Examples and problems focus on investigating the meaning of a derivative in a variety of contexts.
Chapter 3: In this chapter, the basic rules of differentiation are first developed for polynomials and exponentials.
The product and quotient rules are then covered, followed by the chain rule and the concept of implicit differentiation.
Derivatives for the trigonometric functions are explored and biological examples are developed throughout. The
chapter concludes with sections on linear approximation (including sensitivity analysis), higher-order derivatives, and
l’Hôpital’s rule.
Chapter 4: In chapter 4, we complete our introduction to differential calculus by demonstrating its application
to curve sketching, optimization, and analysis of the stability of dynamic processes described through the use of
derivatives. Applications include canonical problems in physiology, behavior, ecology, and resource economics.
Chapter 5: This chapter begins by motivating integration as the inverse of differentiation and in the process
introduces the concept of differential equations and their solution through the construction of slope fields. The
concept of the integral as an “area under a curve” and net change is then discussed and motivates the definition of an
integral as the limit of Riemann sums. The concept of the definite integral is developed as a precursor to presenting
the Fundamental Theorem of Calculus. Integration by substitution, by parts, and through the use of partial fractions
are discussed with a particular focus on biological applications. The chapter concludes with a section on numerical
integration and a final section on additional applications including estimation of cardiac output, survival-renewal
processes, and work as measured by energy output.
Chapter 6: In this chapter we provide a comprehensive introduction to univariate differential equations. Qualitative,
numerical, and analytic approaches are covered and a modeling theme unites all sections. Students are exposed
via phase line diagrams, classification of equilibria, and bifurcation diagrams to the modern approach of studying
differential equations. Applications to in vivo HIV dynamics, population collapse, evolutionary games, continuous
drug infusion, and memory formation are presented.
Chapter 7: In this chapter we introduce applications of integration to probability. Probability density functions
are motivated by approximating histograms of real-world data sets. Improper integration is presented and used as
a tool to compute expectations and variances. Distributions covered in the context of describing real-world data
include the uniform, Pareto, exponential, logistic, normal, and log normal distributions. The chapter concludes with
a section on life history tables and the net reproductive number of an age-structured population.
Supplementary Material
To be added later.
Acknowledgements
To be added later.

Contents

1 Modeling with Functions
1.1 Introduction
1.2 Real Numbers and Functions
1.3 Data Fitting with Linear and Periodic Functions
1.4 Power Functions and Scaling Laws
1.5 Exponentials and Logarithms
1.6 Function Building
1.7 Sequences and Difference Equations
1.8 Summary and Review
1.9 Group Projects

2 Limits and Derivatives
2.1 Rates of Change and Tangent Lines
2.2 Limits
2.3 Limit Laws and Continuity
2.4 To Infinity and Beyond
2.5 Sequential Limits
2.6 The Derivative at a Point
2.7 Derivatives as Functions
2.9 Group Projects

3 Derivative Rules and Tools
3.1 Derivatives of Polynomials and Exponentials
3.2 Product and Quotient Rules
3.3 Chain Rule and Implicit Differentiation
3.4 Trigonometric Derivatives
3.5 Linear Approximation
3.6 Higher-Order Derivatives and Approximations
3.7 l’Hôpital’s Rule
3.9 Group Projects

4 Applications of Differentiation
4.1 Graphing with Gusto
4.2 Getting Extreme
4.3 Optimization in Biology
4.4 Applications to Optimal Behavior
4.5 Linearization and Difference Equations
4.7 Group Projects

5 Integration
5.1 Antiderivatives
5.2 Accumulated Change and Area under a Curve
5.3 The Definite Integral
5.4 The Fundamental Theorem of Calculus
5.5 Substitution
5.6 Integration by Parts and Partial Fractions
5.7 Numerical Integration
5.8 Applications of Integration
5.10 Group Projects

6 Differential Equations
6.1 A Modeling Introduction to Differential Equations
6.2 Separable Equations
6.3 Linear Models in Biology
6.4 Slope Fields and Euler’s Method
6.5 Phase Lines and Classifying Equilibria
6.6 Bifurcations
6.8 Group Projects

7 Probabilistic Applications of Integration
7.1 Histograms, PDFs and CDFs
7.2 Improper Integrals
7.3 Mean and Variance
7.4 Bell-shaped distributions
7.5 Life tables
7.7 Group Research Projects

Index

Chapter 1
Modeling with Functions
1.1 Introduction
1.2 Real Numbers and Functions
1.3 Data Fitting with Linear and Periodic Functions
1.4 Power Functions and Scaling Laws
1.5 Exponentials and Logarithms
1.6 Function Building
1.7 Sequences and Difference Equations
1.8 Summary and Review
Figure 1.1: The humpback whale (Megaptera novaeangliae) is found in all the world’s oceans. Humpbacks are known for
their complex “songs,” which last 10-20 minutes. (See Problem 29, Section 1.5)
PREVIEW
The interface between mathematics and biology presents challenges and opportunities for both mathematicians and
biologists. Unique opportunities for research have surfaced within the last ten to twenty years, both because of the explosion
of biological data with the advent of new technologies and because of the availability of advanced and powerful computers
that can organize the plethora of data. For biology, the possibilities range from the level of the cell and molecule to the
biosphere. For mathematics, the potential is great in traditional applied areas such as statistics and differential equations,
as well as in such non-traditional areas as knot theory.
...
These challenges: aggregation of components to elucidate the behavior of ensembles, integration across scales, and inverse
problems, are basic to all sciences, and a variety of techniques exist to deal with them and to begin to solve the biological
problems that generate them. However, the uniqueness of biological systems, shaped by evolutionary forces, will pose new
difficulties, mandate new perspectives, and lead to the development of new mathematics. The excitement of this area of
science is already evident, and is sure to grow in the years to come.
-Executive Summary from an NSF-Sponsored Workshop Led by Simon Levin (1990)
The above quotation is as true today as when it was written. It provides a hint of the exciting opportunities
that exist at the interface of mathematics and biology. The goal of this course is to provide you with a strong
grounding in calculus while at the same time introducing you to various research areas of mathematical biology
and inspiring you to take more courses at this interdisciplinary interface. In this chapter, we will set the tone for
the entire book and will provide you with some of the skills you will need to work at this interface. As the title of
the chapter suggests, we introduce you to modeling with mathematical functions. In the first section, the idea of
mathematical modeling is introduced. In the next five sections, we remind you of the mathematical concepts that
will be important to you as you make your journey through this book. Throughout the book you will find real-life
problems that can be solved using mathematics. For example, the decline of whales is a serious problem that we
inherited from the whaling activities of the past two centuries. The International Whaling Commission in 1966 gave
the humpback whale worldwide protection status, but its population today is only about 30-35% of its estimated
original level. In the last problem in this chapter, we use a model to explore the densities we can expect
a whale population to recover to after harvesting of individuals in the population has ceased.

1.1 Introduction
Models come in many guises: architects make models of buildings prior to construction, either as small-scale
replicas or, more recently, as visual images produced with computer-aided design packages; politicians, through
debate and discussion, create verbal and written models that simulate the potential outcomes of a proposed policy;
artists make sketches and small-scale sculptures prior to starting a large-scale project; flight simulators allow people
to gain skills in piloting without the dangers associated with flying; words such as “tree” are models of the real
objects they represent. For scientists in many disciplines (e.g., physics, biology, economics, chemistry, sociology, and
even psychology), mathematical models are used to investigate many important phenomena.
Figure 1.2: Earth as seen from Apollo 11. Mathematics has been used to model many biological systems on Earth
ranging from global climatic processes to viral dynamics.
Real-world problems have inspired the creation of quantitative tools to grapple with their complexity. The counting
and division of flocks led to the foundations of number theory. The measurement and division of land resulted in
the development of geometry. Understanding the motion of the planets and the forces of electricity, magnetism,
and gravity resulted in the development of calculus. More recently, understanding the dynamics of population
growth and population genetics led to many of the basic topics in stochastic processes. The immense success
of mathematical models in understanding physical processes led to E. P. Wigner writing his famous essay “The
Unreasonable Effectiveness of Mathematics in the Natural Sciences”∗ in which he states:
The miracle of the appropriateness of the language of mathematics for the formulation of the laws of
physics is a wonderful gift which we neither understand nor deserve. We should be grateful for it, and
hope that it will remain valid in future research and that it will extend, for better or for worse, to our
pleasure even though perhaps also to our bafflement, to wide branches of learning.
As highlighted in the NSF-Sponsored Executive Summary quotation at the opening of this chapter, one of the
areas mathematics has extended most rapidly is the biological sciences. The importance of this mathematics-biology
interface is threefold. First, in the past century, field and laboratory experiments have generated vast amounts
of data. To make this data meaningful requires extracting patterns within the data (e.g. correlations between
variables, clustering, etc.). Mathematics, which is the study of patterns (e.g. numerical, geometric, etc.), provides
a powerful methodology to extract these patterns. This power of mathematics is reflected in the following quote of
one of the founders of calculus, Sir Isaac Newton (1642–1727):
∗ Communications in Pure and Applied Mathematics 13:1–14 1960

The latest authors, like the most ancient, strove to subordinate the phenomena of nature to the laws of
mathematics.
Second, mathematics is a language that permits the precise formulations of assumptions and hypotheses. In the
words of another founding father of calculus, Gottfried Wilhelm Leibniz (1646–1716):
In symbols one observes an advantage in discovery which is greatest when they express the exact nature
of a thing briefly and, as it were, picture it; then indeed the labor of thought is wonderfully diminished.
Third, mathematics provides a logically coherent framework to deduce the implications of one’s assumptions.
Reflecting these roles of mathematics in biology, one of the goals of this book is to help you understand how, when
and why calculus can be used to model biological phenomena. To achieve this goal, you will be expected sometimes to
develop simple models, to understand more complicated models sufficiently well to slightly modify them, to determine
the appropriate techniques to analyze the models (e.g. numerical vs. analytic, stability vs. bifurcation analysis,
etc.), and to interpret the results of your analysis. Examples of biological phenomena that we will encounter include
epidemic outbreaks, blood flow, population extinctions, tumor regrowth after chemotherapy, population genetics,
regulatory genetic networks, mechanisms for memory formation, enzyme kinetics and evolutionary games. The
second goal of this book is to provide you with a thorough grounding in calculus’ concepts and applications, analytical
techniques, and numerical methods. In the remainder of this section, we briefly provide a preview of different types
of models that you will encounter in this text, discuss the process of modeling, and then give you a brief glimpse at
calculus by answering the question “What is Calculus?”
What is Mathematical Modeling?

A real-life situation is usually far too complicated to be precisely and mathematically defined. When confronted with
a problem in the real world, therefore, it is usually necessary to develop a mathematical framework based on certain
assumptions about the real world. This framework can then be used to find a solution to a problem that, hopefully,
will tell us something about the real world. The process of developing this body of mathematics is referred to as
mathematical modeling.
What, precisely, is a mathematical model? It is an abstract description of a real-life problem that does not have
an obvious solution. The first step involves abstraction in which certain assumptions about the real world are made,
variables are defined, and appropriate mathematical expressions are developed.
In this text, we will discuss modeling biological systems. Consequently, as we progress through this book, we
will spend some time identifying the features associated with molecular, physiological, behavioral, life history, and
population-level processes of many species and situations. After abstraction, the next step in the modeling process
is to simplify the mathematics or derive related mathematical facts from the mathematical model.

The results derived from the mathematical model should lead us to some predictions about the real world. The next
step is to gather data from the situation being modeled and then to compare those data with the predictions. If the
two do not agree, then the gathered data are used to modify the assumptions underlying the model, and the process
repeats.
In this first chapter, we introduce some modeling concepts while reviewing some basic mathematical concepts such
as real numbers and functions including linear, periodic, power, exponential, and logarithmic functions. Using these
functions, we model the cyclic rise of carbon dioxide concentrations in the atmosphere, the dangers facing large
versus small organisms, population growth, and binding of receptor molecules. We also introduce the basic notions
of sequences and difference equations. Using these constructs, we encounter the dynamics of chaotic populations,
drug delivery, and population genetics. We then develop the ideas of differential and integral calculus, and along the
way, build the necessary skills in biological modeling.
What is Calculus?
Very likely, you have enrolled in a course that requires that you use this book. If you have looked at the preface,
you will see that the intended audience is students who wish to learn about calculus and biology. You might
think of calculus as the culmination of all of your mathematical studies. To a certain extent that is true, but it
is also the beginning of your study of mathematics as it applies to the real world around us. All your prior work
in mathematics is considered elementary. With calculus you cross the dividing line between using elementary and
advanced mathematical tools for studying a variety of applied topics. It is the mathematics of motion and change
over time and space.
What distinguishes calculus from your previous mathematics courses of algebra, geometry, and trigonometry is
the transition from discrete static applications to those that are dynamic and often continuous. For example, in
elementary mathematics you considered the slope of a line, but in calculus we define the (non-constant) slope of
a nonlinear curve. In elementary mathematics you found average changes in quantities such as the position and
velocity of a moving object, but in calculus we can find instantaneous changes in the same quantities. In elementary
mathematics you found the average of a finite collection of numbers, but in calculus we can find the average value of
a function with infinitely many values over an interval.
The development of calculus in the seventeenth century by Newton and Leibniz was the result of their attempt
to answer some fundamental questions about the world and the way things work. These investigations led to two
fundamental concepts of calculus — namely, the idea of a derivative which deals with rates of change and that of
an integral which deals with accumulated change. The breakthrough in the development of these concepts was the
formulation of a mathematical tool called a limit.
1. Limit The limit is a mathematical tool for studying the tendency of a function as its variable approaches some
value.
2. Derivative The derivative is defined as a limit, and it is used initially to compute rates of change and slopes
of tangent lines to curves. The study of derivatives is called differential calculus. Derivatives can be used in
sketching graphs and in finding the extreme (largest and smallest) values of functions. Biologists use derivatives

to calculate, for example, the rates of growth of individuals, of populations, of the spread of disease within
populations, of changes to the physiological state of individuals, or changes to the biochemical states of cells
within individuals.
3. Integral The integral is found by taking a special limit of a sum of terms, and it is used initially to compute
the accumulation of change. The study of this process is called integral calculus. Area, volume, work, and
degree days are a few of the many quantities that can be expressed as integrals. Biologists can use integrals to
calculate, for example, the amount of fat bears store before going into hibernation, the time it takes an insect
to develop from an egg into an adult as a function of temperature, the probability that an individual will die
before a certain age, or the average number of people infected by an infectious person.
Let us begin our study by taking an intuitive look at each of these three essential ideas of calculus.
The Limit
Zeno (ca. 500 B.C.) was a Greek philosopher who is known primarily for his famous paradoxes. One of these
concerns a race between Achilles, a legendary Greek hero, and a tortoise. When the race begins, the (slower) tortoise
is given a head start, as shown in Figure 1.3.
Figure 1.3: Achilles and the tortoise
Is it possible for Achilles to overtake the tortoise? Zeno pointed out that by the time Achilles reaches the tortoise’s
starting point, a1 = t0, the tortoise will have moved ahead to a new point t1. When Achilles gets to this next point, a2,
the tortoise will be at a new point t2. The tortoise, even though much slower than Achilles, keeps moving forward.
Although the distance between Achilles and the tortoise is getting smaller and smaller, the tortoise will apparently
always be ahead.
Of course, common sense tells us that Achilles will overtake the slow tortoise, but where is the error in this
reasoning? The error is in the assumption that an infinite amount of time is required to cover a distance divided
into an infinite number of segments. This discussion is getting at an essential idea in calculus, the notion of a limit.
Consider the successive positions for both Achilles and the tortoise:
Achilles: a0 , a1 , a2 , a3 , a4 , · · ·
Tortoise: t0 , t1 , t2 , t3 , t4 , · · ·
After the start, the positions for Achilles, as well as those for the tortoise, form sets of positions that are ordered
by the counting numbers. Such ordered listings are called sequences, which we introduce in Section 1.7.
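To see why infinitely many segments need not take infinitely long, we can add up the time Achilles spends on each leg of the race. The sketch below is only an illustration: the speeds (10 and 1 units per second) and the head start (100 units) are assumed values chosen for convenience, not numbers from the paradox itself, and the function name is hypothetical.

```python
def catch_up_time(head_start, v_achilles, v_tortoise, stages):
    """Total time Achilles spends on the first `stages` legs of the race."""
    gap = head_start          # distance from Achilles to the tortoise
    total_time = 0.0
    for _ in range(stages):
        leg_time = gap / v_achilles      # time to reach the tortoise's last position
        total_time += leg_time
        gap = v_tortoise * leg_time      # how far the tortoise moved meanwhile
    return total_time


# Assumed illustrative values: head start 100, speeds 10 and 1.
partial = [catch_up_time(100.0, 10.0, 1.0, k) for k in (1, 2, 5, 10, 50)]

# Solving 10 * t = 100 + 1 * t directly gives the moment Achilles draws level.
limit = 100.0 / (10.0 - 1.0)

for k, t in zip((1, 2, 5, 10, 50), partial):
    print(k, t, limit - t)
```

Under these assumptions the partial totals grow with every leg but never pass the finite catch-up time, which is the behavior the notion of a limit is meant to capture.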
Example 1. Sequences: an intuitive preview
The sequence
$$\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \frac{4}{5}, \cdots$$
can be described by writing a general term: $\frac{n}{n+1}$ where n = 1, 2, 3, 4, · · · . Can you guess the value L that $\frac{n}{n+1}$
approaches as n gets large? This value is called the limit of the sequence.
Solution. We say that L is the number that $\frac{n}{n+1}$ tends toward as n becomes large without bound. We will define
a notation to summarize this idea:
$$L = \lim_{n \to \infty} \frac{n}{n+1}$$
As you consider larger and larger values for n you find a sequence of fractions:
$$\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \cdots, \frac{1000}{1001}, \frac{1001}{1002}, \cdots, \frac{9999999}{10000000}, \cdots$$
It is reasonable to guess that the sequence of fractions is approaching the number 1. □
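The guess can be checked numerically. The following sketch evaluates the general term n/(n+1) exactly with Python's standard `fractions` module and reports how far each term is from 1; the helper name `term` is our own. Numerical evidence of this kind supports the guess but is not a proof.

```python
from fractions import Fraction


def term(n):
    """General term n/(n+1) of the sequence, as an exact fraction."""
    return Fraction(n, n + 1)


for n in (1, 2, 3, 1000, 1001, 9_999_999):
    gap = 1 - term(n)   # distance from 1; algebraically this is 1/(n+1)
    print(n, term(n), float(term(n)), gap)
```

Because the gap equals 1/(n+1), it can be made as small as we like by taking n large enough.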
The Derivative: Rates of Change

The derivative provides information about the rate of change over small intervals (in fact, infinitesimally small!)
of time or space. For instance, in trying to understand the role of humans in global climate change, we may be
interested in the rate at which carbon dioxide levels are changing. In Section 1.3 of this chapter, we show that
it is possible to come up with a function that describes how carbon dioxide levels (in parts per million) vary as a
function of time. The relationship between this function and the data is illustrated in Figure 1.4.
Figure 1.4: Carbon dioxide levels (in parts per million) as a function of months after April 1974. (Axes: month versus ppm.)
In a scientific discussion about carbon dioxide levels, we might be interested in the rate of change of carbon
dioxide levels at a particular time, say the second month (June 1974) of this data set. To find the rate of change
from the second to the tenth month, we could find the change in carbon dioxide levels, 331.8 − 331.0 = 0.8 parts per
million, and divide it by the change in time, 10 − 2 = 8 months, to get the rate of change
$$\frac{331.8 - 331.0}{10 - 2} = 0.1 \text{ ppm per month}$$
over this eight-month period. Note that this rate of change corresponds to the slope of the secant line passing
through the points P = (2, 331.0) and Q = (10, 331.8) as illustrated in Figure 1.5a. While this rate of change describes
what happens over the eight-month period, it clearly does not describe what is happening right around the second
month. Indeed, during the second month, the carbon dioxide levels are decreasing, not increasing. Consequently, we
would expect the rate of change to be negative.
To get the instantaneous rate of change at the beginning of the second month, we can consider moving the
point Q along the curve to the point P . As we do so, the points P and Q define secant lines that appear to approach
a limiting line. This limiting line, as illustrated in Figure 1.5, is called the tangent line. The slope of this line
corresponds to the instantaneous rate of change for carbon dioxide levels at the beginning of the second month of the
data set. Later you will be able to find the exact value of this instantaneous rate of change, which is approximately
−1.24 ppm per month. The slope of this limiting line is also known as the derivative at P . The study of the derivative
forms what is called differential calculus.
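The secant-line calculation above, and the idea of letting Q slide toward P, can be sketched in a few lines. The first part uses the two data points quoted in the text. The second part uses a stand-in curve, sin(t), which is purely an assumption for illustration and is not the carbon dioxide model developed later in the chapter; the function names are hypothetical.

```python
import math


def secant_slope(p, q):
    """Slope of the line through points p = (x1, y1) and q = (x2, y2)."""
    (x1, y1), (x2, y2) = p, q
    return (y2 - y1) / (x2 - x1)


# The two points quoted in the text (months after April 1974, ppm).
P = (2, 331.0)
Q = (10, 331.8)
slope_PQ = secant_slope(P, Q)        # roughly 0.1 ppm per month
print(slope_PQ)


# Stand-in smooth curve (an assumption, NOT the book's CO2 function).
def toy_curve(t):
    return math.sin(t)


t0 = 2.0
steps = (1.0, 0.1, 0.01, 0.001)      # Q moves closer and closer to P
slopes = [
    secant_slope((t0, toy_curve(t0)), (t0 + h, toy_curve(t0 + h)))
    for h in steps
]
for h, s in zip(steps, slopes):
    print(h, s)
```

For the stand-in curve the secant slopes settle near cos(2), the known slope of the tangent line to sin(t) at t = 2, and they are negative even though other secant lines through that point can have positive slope.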

Figure 1.5: The Tangent Line. Panel a: secant line whose slope is a rate of change. Panel b: the limit of the secant lines is the tangent line. (Both panels plot parts per million (ppm) against months, 0 to 12, with the points P and Q marked.)
Figure 1.6: Incidence rate of the 1999 outbreak of measles in the Netherlands.
Integration: Accumulated Change

The integral deals with accumulated change over intervals of time or space. For instance, consider the 1999 outbreak
of measles in the Netherlands. During this outbreak, scientists collected information about the incidence rate: the
number of reported new cases of measles per day. How this incidence rate varies over the course of the measles
outbreak is shown in Figure 1.6. To find the total number of cases of measles during the outbreak, we want to
find the area under this incidence “curve.” Indeed, each rectangle in the left-hand side of this figure has a base of
width “one day” and a height with units of cases of measles per day. Hence, the area of each of these rectangles corresponds
to the number of measles cases in one day. Summing up the areas of these rectangles gives us the total number of
measles cases during the outbreak. To get a rough estimate of this accumulated change, we can approximate the
area under the incidence curve using rectangles as illustrated in the right-hand side of Figure 1.6. Computing these
areas yields an estimate of
8 · 0 + 11 · 25 + 6 · 80 + 3 · 200 + 5 · 250 + 3 · 125 + 3 · 50 = 3,130 cases of measles
The actual number of reported cases was 3,292. Hence, our back-of-the-envelope estimate was pretty good.
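The arithmetic of this estimate is easy to reproduce. In the sketch below each pair is read as (width in days, height in cases per day); that reading is our interpretation of the products in the sum above, and the variable names are our own.

```python
# (width in days, height in cases per day), read off the sum in the text.
rectangles = [(8, 0), (11, 25), (6, 80), (3, 200), (5, 250), (3, 125), (3, 50)]

# Area of each rectangle = width * height = cases accumulated over that stretch.
estimate = sum(width * height for width, height in rectangles)

reported = 3292                      # actual number of reported cases
relative_error = (reported - estimate) / reported

print(estimate)                      # 3130
print(round(100 * relative_error, 1))
```

The estimate falls short of the reported total by a little under 5 percent, which is what "pretty good" amounts to here.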
Integrals are a refined version of the calculation that we just made. Given any curve (e.g. incidence function) as
illustrated in Figure 1.7, we can approximate the area by using rectangles. If $A_n$ is the area of the nth rectangle,
then the total area can be approximated by finding the sum
$$A_1 + A_2 + A_3 + \cdots + A_{n-1} + A_n$$
Figure 1.7: Area under a curve
This process is shown in Figure 1.8. To get better estimates of the area, we use more rectangles with smaller bases.
The limit of this process leads to the definite integral, the key concept for integral calculus.
a. 8 approximating rectangles b. 16 approximating rectangles
Figure 1.8: Approximating the area using circumscribed rectangles
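The effect of using more rectangles with smaller bases can be sketched numerically. The curve in Figure 1.8 is not given by a formula, so the example below assumes a simple stand-in, f(x) = x² on the interval from 0 to 1, whose exact area is 1/3 (a standard result obtained later with integrals). The function name `upper_sum` is hypothetical, and the shortcut of using right endpoints is valid only because the stand-in curve is increasing.

```python
def upper_sum(f, a, b, n):
    """Area of n circumscribed rectangles for an INCREASING function f on [a, b]."""
    width = (b - a) / n
    # For an increasing f, the tallest point of each strip is its right endpoint.
    return sum(f(a + (i + 1) * width) * width for i in range(n))


def square(x):
    """Stand-in curve (an assumption, not the curve in the figure)."""
    return x * x


estimates = {n: upper_sum(square, 0.0, 1.0, n) for n in (8, 16, 1000)}
for n, area in estimates.items():
    print(n, area, area - 1 / 3)
```

With 8, 16, and then 1000 rectangles the estimates decrease toward 1/3 while always staying above it, as expected for circumscribed rectangles.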
Problem Set 1.1

LEVEL 1 – DRILL PROBLEMS
1. This book begins with a discussion of the word models. In fact, the first word of the text is this word. Even
though we devote an entire chapter to modeling, we do not define the word. Look up at least three different
sources giving a definition and then write a few paragraphs discussing your understanding of this word.
2. Consider the sequence 0.3, 0.33, 0.333, 0.3333, · · ·. If this pattern continues, what do you think is the appropriate
limit of this sequence?

3. Consider the sequence 5, 5.5, 5.55, 5.555, 5.5555, · · ·. If this pattern continues, what do you think is the appro-
priate limit of this sequence?
6. Consider the sequence 3, 3.1, 3.14, 3.1415, · · ·. What do you think is the appropriate limit of this sequence?
7. Consider the sequence 1, 0, 1, 0, 1, . . .. If the pattern continues, do you think that this sequence has a limit?
8. Consider the sequence 1, −0.5, 0.25, −0.125, . . .. If the pattern continues, what do you think is the appropriate
limit for the sequence?
Copy the figures in Problems 9 to 14 on your paper. Draw what you think is an appropriate tangent line for each
curve at the point P .
9.
10.
11.

12.
13.
14.
In Problems 15 to 20, guess the requested limits.
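15. $L = \lim_{n \to \infty} \frac{2n}{n+4}$
16. $L = \lim_{n \to \infty} \frac{2n}{3n+1}$
17. $L = \lim_{n \to \infty} \frac{n+1}{n+2}$
18. $L = \lim_{n \to \infty} \frac{n+1}{2n}$
19. $L = \lim_{n \to \infty} \frac{3n}{n^2+2}$
20. $L = \lim_{n \to \infty} \frac{3n^2+1}{2n^2-1}$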
Estimate the area in each figure shown in Problems 21 to 26.

21.
22.
23.

24.
25.
26.
LEVEL 2 – APPLIED PROBLEMS AND THEORY
27. What is a mathematical model?
28. Why are mathematical models necessary or useful?
29. An analogy to Zeno’s tortoise paradox can be made as follows.
A woman standing in a room cannot walk to a wall. To do so, she would first have to go half the distance, then
half the remaining distance, and then again half of what still remains. This process can always be
continued and can never be ended.

Draw an appropriate figure for this problem and then present an argument using sequences to show that the
woman will, indeed, reach the wall.
30. Zeno’s paradoxes remind us of an argument that might lead to an absurd conclusion:
Suppose I am playing baseball and decide to steal second base. To run from first to second base, I
must first go half the distance, then half the remaining distance, and then again half of what remains.
This process is continued so that I never reach second base. Therefore, it is pointless to steal the
base.
Draw an appropriate figure for this problem and then present a mathematical argument using sequences to
show that the conclusion is absurd.
31. In this section we mentioned techniques to analyze a model. Do some research (via the Internet or library) to
briefly distinguish the given pairs.
a. numerical vs. analytic
b. stability vs. bifurcation analysis
32. HISTORICAL QUEST∗
Isaac Newton (1642-1727) and Gottfried Leibniz (1646-1716)
The invention of calculus in the late 1600s is credited jointly to Isaac Newton and Gottfried Leibniz. Each
worked independently, and arrived at similar conclusions. At the time of their respective publications (around
1685), there was a bitter controversy throughout Europe as to whose work had been done first, along with
accusations that each stole the idea from the other. Part of the explanation for this is the fact that each had
done his work earlier, and another part can be attributed to the rivalry between mathematicians in England
who championed Newton and those elsewhere in Europe who supported Leibniz. It is thought that Newton’s discoveries
were made earlier, but Leibniz’s were the first to be published. The fact is, however, that the intellectual climate
was ripe for the invention of calculus, which made it all but inevitable.
For this first Historical Quest you are to do some research to answer the following questions.
a. Since Newton and Leibniz each developed calculus independently of the other, they each invented their
own notation for the basic concepts of derivative and integral. Who is considered to have invented the
more efficient notation?
b. Neither Newton nor Leibniz invented the limit notation we introduced in this section. In fact, the
limit definition of the derivative mentioned in this section was not presented until much later. To
whom do we attribute the first use of the limit symbol?
∗ Throughout the text, you will find problems called Historical Quest. These problems are not just historical notes to help you see
mathematics and biology as living/breathing disciplines, but are designed to involve you in the quest of pursuing some great ideas in the
history of science. Yes, they will give you some interesting history, but will then lead you on a quest which you should find interesting.

1.2. REAL NUMBERS AND FUNCTIONS 17
1.2 Real Numbers and Functions

You may have had a medical test in which an electrocardiograph, as shown in Figure 1.9, was used to check
whether or not your heart was beating normally. In order to analyze graphs such as this, we need to seek unifying
ideas relating graphs, data, tables, and equations. The mathematical concept that unifies these elements is the notion
of a real-valued function, which is at the core of the development of both differential and integral calculus.
Figure 1.9: Portion of an electrocardiograph
In this section, we discuss real numbers, functions, and basic properties of functions.
Real Numbers
Number systems arose historically to answer a need to count and keep exact records of land, property, and available
resources. Mesopotamian sheep herders kept records of the number of sheep in their herd by dropping pebbles into
a jar. This counting leads to the natural numbers:
N = {1, 2, 3, ...}
Adding zero to the natural numbers gives the set of whole numbers. It took human civilization much longer than
we might think to “invent” zero because, in a sense, it is unnatural. For example, the great logician, Alfred North
Whitehead (1861-1947) wrote
The point about zero is that we do not need to use it in the operations of daily life. No one goes out to
buy zero fish. It is in a way the most civilized of all the cardinals, and its use is only forced on us by the
needs of cultivated modes of thought.
If you want to know more about the history of the number zero, it is well worth reading Charles Seife’s book Zero:
The Biography of a Dangerous Idea ∗ .
Negative numbers come much later; by some accounts, they first appeared in India and China around the seventh
century, in the writings of the Indian mathematician, Brahmagupta, who also gave rules for dividing numbers by
each other. Prior to this, the ancient mathematicians concluded that negative solutions to equations had no meaning.
Adding the negative numbers to the whole numbers provides us with a set of numbers we call the integers:
Z = {. . . , −2, −1, 0, 1, 2, 3, . . .}
Ancient Egyptian surveyors, however, were well aware of fractional numbers in their calculations and measurements
of the amount of land owned by subjects for the purposes of calculating land taxes. The set of all positive and
negative fractions is called the rational numbers:
$$\mathbb{Q} = \left\{ \frac{p}{q} : p, q \text{ are integers}, q \neq 0 \right\}$$
∗ Published by Viking, 1999
Rational numbers are extremely useful for the measurements of “continuous” traits such as weight, height,
humidity, and temperature, which are often measured by counting. For instance, we measure lengths by counting
the number of marked intervals (e.g. inches, centimeters) on a tape measure. By subdividing these intervals into
smaller and smaller fractions, we obtain more and more accurate measurements. We might expect that if we allow
for all possible fractional divisions, then we can measure the precise length of anything. It came as a shock to the
Greeks that this expectation is wrong! For instance, the Greeks proved that the length of the diagonal of a unit
square (i.e. sides of length one) cannot be expressed as a rational number (see the HISTORICAL QUEST in the problem
set). Because this length corresponds to a number that cannot be found in the set of rational numbers, it is called
irrational (not rational). It is denoted by the symbol $\sqrt{2}$ and its value can be approximated as precisely as we want
by bounding it above and below by sequences of rational numbers that approach it in the limit! Intuitively, if we
have a ruler with all fractional divisions, we can measure arbitrarily close approximations of the length.
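One way to picture this bounding from above and below is repeated halving of an interval with rational endpoints. The sketch below is one possible scheme (bisection), chosen by us for illustration rather than taken from the text; it uses Python's standard `fractions` module so that every bound is an exact rational number.

```python
from fractions import Fraction


def bracket_sqrt2(steps):
    """Return rationals (lo, hi) with lo**2 < 2 < hi**2 after `steps` halvings."""
    lo, hi = Fraction(1), Fraction(2)     # 1^2 < 2 < 2^2
    for _ in range(steps):
        mid = (lo + hi) / 2
        # mid is rational, so mid*mid can never equal 2 exactly.
        if mid * mid < 2:
            lo = mid                      # mid is still below the diagonal length
        else:
            hi = mid                      # mid is above it
    return lo, hi


lo, hi = bracket_sqrt2(20)
print(lo, hi)
print(float(lo), float(hi), float(hi - lo))
```

Each halving cuts the gap between the lower and upper rational bounds in two, so the two sequences of bounds squeeze toward the same length without either ever reaching it.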
To deal with irrational numbers, mathematicians extended the rational numbers to a larger set of numbers that
we call the real numbers R. One can think of the real numbers as living on the edge of an infinitely long ruler
with demarcations at all powers of ten. A real number is a point on this line and can be represented in a decimal
form with its integer part before the decimal and tenths, hundredths, thousandths, ten thousandths, etc. after the
decimal. What makes the real numbers somewhat mysterious and mathematically delicate is that these decimal
representations may never terminate. Rational numbers on this line have decimal representations that terminate or
repeat, while the irrational numbers have decimal representations that neither terminate nor repeat. For example, 1/2 = 0.5 has
a terminating decimal and, hence, is a rational number. Alternatively, 1/3 = 0.333 . . . has a repeating decimal and,
consequently, is a rational number. However, π = 3.141592 . . . has a decimal representation that does not terminate
or repeat and, consequently, is an irrational number. A proof of this fact, unfortunately, is outside the scope of this
book.
Intervals of real numbers arise so frequently in calculus, it is worthwhile giving them special names and notations.
An open interval from a to b is denoted
(a, b) = {x : a < x < b}
Notice that this interval includes all the real numbers between a and b but doesn’t include a or b themselves. A
closed interval from a to b is denoted
[a, b] = {x : a ≤ x ≤ b}
Unlike an open interval, a closed interval includes the end points. In addition to these finite intervals, we are often
interested in infinite intervals. These are intervals where either the right-hand side of the interval extends infinitely
far in the positive direction or the left-hand-side extends infinitely far in the negative direction, or both. In the first
case, to denote this situation, we use the symbol ∞ on the right hand side of the interval and in the second case we
use the symbol −∞ on the left hand side of the interval, as follows:
(a, ∞) = {x : x > a}, [a, ∞) = {x : x ≥ a}, (−∞, b) = {x : x < b}, and (−∞, b] = {x : x ≤ b}
The typical graphical depictions of these intervals on the real line are shown in Figure 1.10.
For infinite intervals, it is important to realize there is no number “∞” or “−∞”. These symbols are only used to
indicate that the interval contains numbers whose magnitudes are arbitrarily large and positive or arbitrarily large and negative,
respectively.
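The interval notation can be mirrored in a small data structure. The `Interval` class below is a hypothetical sketch of our own, not a standard library type. It uses `math.inf` only as a marker for an unbounded side; this is a floating-point convenience and should not be read as saying that ∞ is a real number.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    left: float
    right: float
    left_closed: bool = False    # True means the left end point is included
    right_closed: bool = False   # True means the right end point is included

    def __contains__(self, x):
        above = x >= self.left if self.left_closed else x > self.left
        below = x <= self.right if self.right_closed else x < self.right
        return above and below


open_ab = Interval(0.0, 1.0)                       # (0, 1)
closed_ab = Interval(0.0, 1.0, True, True)         # [0, 1]
ray = Interval(2.0, math.inf, left_closed=True)    # [2, ∞)

print(0.0 in open_ab, 0.0 in closed_ab)   # end point excluded, then included
print(2.0 in ray, 1e300 in ray, 1.9 in ray)
```

The unbounded side is always treated as open, matching the round bracket written next to ∞ in the notation above.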
Functions
Biologists, mathematicians, and other researchers study relationships between quantities. For example, an engineer
may need to know how the illumination from a light source is related to the distance to the light source; an environ-
mental scientist may wish to investigate how carbon dioxide levels on the earth vary in time; a physiologist may be
interested in how the metabolic rate of an organism depends on its body mass; an economist may wish to determine
the relationship between consumer demand for a certain commodity and its market price. The mathematical study
of such relationships involves the concept of a function.

Figure 1.10: Graphical representations of intervals.
Function. A function f : X → Y is a rule that assigns to each element x of a set X (called the domain) a unique
element y of a set Y . The element y is called the image of x under f and is denoted by f (x), read as “f of x”.
The set of all images f (x) for x in X is called the range of f .
A function whose name is f can be thought of as the set of ordered pairs (x, y) for which each member, x, of the
domain is associated with exactly one member y = f (x). The function can also be regarded as: a rule that assigns a
unique “output” in the set Y to each “input” from the set X (Figure 1.11a); a graph (Figure 1.11b); a machine into
which values of x are inserted and, after some internal operations are performed, a unique value f (x) is produced
(Figure 1.11c); or even an algebraic equation (Figure 1.11d).
Example 1. Identifying functions
Determine whether the following rules are functions. If one is a function, identify its domain and (if possible) its
range.
a. To the real number r, assign the area of a circle with radius r.

b. To each person in Atlanta, assign their telephone number.
c. To the irrational reals assign the value 1, and to the rational reals, assign the value 0.
d. To each month from May 1974 to Dec 1985, assign the average CO2 concentration measured at the Mauna
Loa observatory of Hawaii. The data is graphed in Figure 1.12.
e. To each (adjusted) income of a single individual, assign the federal tax rate for 2004.
Solution.
a. This function can be expressed algebraically as
Area = πr²
Since a radius of a circle can only be non-negative, the domain of this function is the non-negative reals,
[0, ∞). The range of this function is also [0, ∞).

Figure 1.11: Different representations of a function
b. Assigning telephone numbers to individuals in Atlanta is not a function for two reasons. First, not
everyone has a phone number. For these individuals, no assignment can be made. Second, many people
may have more than one phone number in which case the rule does not specify which of these phone
numbers to associate with such an individual. By appropriately shrinking the domain, this rule does
become a function. For instance, if the domain is restricted to individuals in Atlanta with a single home
phone number, assigning the home phone numbers to these individuals is a function.
c. Assigning 1 to irrationals and 0 to rationals defines a function whose domain is the reals and whose range
is the set {0, 1}. (Note, this function cannot be drawn as a graph. Try drawing it!)
d. Assigning average monthly CO2 concentrations from May 1974 to Dec 1985 is a function whose domain
is the set
{May 1974, June 1974, July 1974, . . . , Nov 1985, Dec 1985}
Alternatively, if we identify any natural number n with n months after April 1974 until Dec 1985, then
the domain of this function is
{1, 2, 3, . . . , 140}
as there are 8 months from May to Dec 1974 followed by 11 full years of 12 months each (8 + 11 · 12 = 140).
To determine the range, we would have to find the values of
the collected data. These data are illustrated in Figure 1.12 and suggest that the range is contained in the
interval [327, 350]. While these data, in themselves, cannot be precisely described by a simple algebraic
formula, we shall see in later sections that they can be well approximated by a simple algebraic formula.
Figure 1.12: CO2 (ppm) at the Mauna Loa Observatory, plotted against month
e. Assign to each adjusted income of a single individual in 2004 the federal tax rate. Since each adjusted
income for a single individual has one and only one tax rate, this rule, which is described in the tax
tables, is a function. For instance, an adjusted income greater than $319,100 is assigned a tax rate of 35%.
□
As the preceding example and figure illustrate, functions can be represented in a variety of ways: verbally,
algebraically, numerically, or graphically. Being able to move freely between these representations of a function is a
skill that this book tries to cultivate.
Example 2. From words to algebraic representations
A cylindrical tube for carrying artwork has a length of ℓ meters and a radius of 0.1 meters. The material for the
body of the tube costs a manufacturer $2/m^2 and the material for the ends of the tube costs $5/m^2. Write down a
formula in terms of ℓ for the material cost, C, of one tube.
Solution. The area of the top of the tube is given by π(0.1)^2 = 0.01π. Hence, the cost of the top and bottom of the
tube is given by $5 · 2 · 0.01π = $0.1π. The area of the body of the tube is given by the length ℓ times the circumference
2π(0.1) = 0.2π of the tube. Thus it is 0.2πℓ and the cost is $2 · 0.2πℓ = $0.4πℓ. Therefore, the material cost of one tube is
C = $0.1π + $0.4πℓ = $π(0.1 + 0.4ℓ)

□
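The cost formula in Example 2 can be checked numerically. The following is a minimal sketch; the function name `tube_material_cost` and its parameter names are illustrative choices, not notation from the text.

```python
import math

def tube_material_cost(length_m, radius_m=0.1, body_price=2.0, end_price=5.0):
    """Material cost (dollars) of one closed cylindrical tube.

    Prices are in dollars per square meter, as in Example 2.
    """
    end_area = math.pi * radius_m ** 2              # area of one circular end (m^2)
    body_area = 2 * math.pi * radius_m * length_m   # circumference times length (m^2)
    return end_price * 2 * end_area + body_price * body_area

# For a tube 1 m long, this agrees with C = pi * (0.1 + 0.4 * 1)
print(tube_material_cost(1.0), math.pi * (0.1 + 0.4 * 1.0))
```

Building the cost from the two areas, rather than coding the final formula directly, makes it easy to try other radii or prices.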
In this book, unless otherwise specified, the domain of a function is the set of real numbers for which the function
is a well-defined real number determined by the context of the problem. We call this convention the used domain
convention. For example, if \( f(x) = \frac{1}{x-2} \) and \( g(y) = \sqrt{y} \), we need x ≠ 2 and y ≥ 0, respectively. Alternatively, if n
is the number of people on an elevator, the context requires that n is a whole number.
Example 3. From algebraic expressions to graphs
Find the domain of the following functions:

a. \( y = \sqrt{1 - x} \)

b. \( y = \frac{1}{\sqrt{1 - x}} \)
Solution.
a. Because the argument of square roots must be nonnegative whenever we are dealing with real numbers,
the domain consists of x such that 1 − x ≥ 0. Equivalently, x ≤ 1.
b. Because we cannot divide by zero and the argument of the square root must be nonnegative, the domain
consists of x such that 1 − x > 0. Equivalently, x < 1.
Example 4. From verbal descriptions to graphs
Plants use light energy, in the form of photons, to synthesize glucose from carbon dioxide and water, while
excreting oxygen as a by-product of this process called photosynthesis. Plants then use the sugars to fuel other
processes associated with their maintenance and growth while the oxygen is used by animals and other creatures for
respiration. Thus, photosynthesis is a key process not only for plants but also for animal life on earth!
Let P(t) denote the photosynthetic activity of a leaf as a function of t, where t is the number of hours after midnight.
Sketch a rough graph of this function. Assume the sunrise is at 6 AM and the sunset is at 8 PM.
Solution. Noting that there is no photosynthetic activity prior to the sunrise, we have P(t) = 0 for 0 ≤ t ≤ 6. At
sunrise, the photosynthetic activity slowly increases with the availability of light and reaches some maximum during
midday. As the sun begins to set, the photosynthetic activity of the plant declines to zero and remains zero for the
rest of the day, so P(t) = 0 for 20 ≤ t ≤ 24. The graph of this function is shown in Figure 1.13. □
Figure 1.13: Sample graph of photosynthetic activity
Example 5. From numerals to graphs and words
Table 1.1 tabulates the estimated number of HIV/AIDS cases diagnosed each year in the USA, from 1999-2002.∗
a. Use these data to draw a graph of the number of cases being diagnosed each day during the period starting
at the beginning of 1999 and ending at the end of 2002 for the age group 25-34. This should be done by
assuming that the average daily rate each year holds at the beginning of the year and then joining these
points by a “continuous” curve, i.e., a curve with no jumps or breaks. The concept of continuity will be
made more precise in the next chapter.
∗ Survey Report Volume 14 from the Center of Disease Control, Division of HIV/AIDS Prevention.

Table 1.1: Number of diagnosed cases of HIV/AIDS by year
Age at diagnosis (years)     1999     2000     2001     2002
< 13                          187      163      206      162
13-14                          28       31       33       30
15-24                       2,646    2,803    2,926    2,926
25-34                       7,817    7,386    7,221    7,338
35-44                       9,115    9,289    9,119    9,450
45-54                       3,887    4,212    4,408    4,675
55-64                       1,112    1,250    1,303    1,450
> 64                          382      386      427      432
b. Use these data to draw a graph of the number of new cases diagnosed each day for all age groups.
Solution.
a. Ignoring leap years, we divide the entries in the fourth row of Table 1.1 by 365 to obtain the number
of cases diagnosed per day during the four tabulated years. For the age group 25-34 they are 21.4, 20.2,
19.8, and 20.1 for years 1999 to 2002 respectively. These rates are referred to as daily incidence rates by
epidemiologists. Plotting these points and connecting them by a continuous curve yields:
b. To determine the total number of cases in each year, we add up the entries in each column, which yields the
values 25174, 25520, 25643, and 26463. Dividing each of the values by 365 yields daily rates of 68.9699,
69.9178, 70.2548, and 72.5014. Plotting these points and connecting them by a continuous curve yields:

The figure suggests that the number of cases per day is increasing during this time period.
□
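The arithmetic in Example 5 is easy to reproduce. The sketch below copies the counts from Table 1.1; the names `cases` and `daily_rates` are illustrative, and leap years are ignored as in the solution.

```python
# Yearly counts from Table 1.1, listed for 1999, 2000, 2001, 2002
cases = {
    "<13":   [187, 163, 206, 162],
    "13-14": [28, 31, 33, 30],
    "15-24": [2646, 2803, 2926, 2926],
    "25-34": [7817, 7386, 7221, 7338],
    "35-44": [9115, 9289, 9119, 9450],
    "45-54": [3887, 4212, 4408, 4675],
    "55-64": [1112, 1250, 1303, 1450],
    ">64":   [382, 386, 427, 432],
}

def daily_rates(yearly_counts, days_per_year=365):
    """Average number of cases per day for each year (leap years ignored)."""
    return [count / days_per_year for count in yearly_counts]

# Part a: daily incidence rates for the 25-34 age group
rates_25_34 = daily_rates(cases["25-34"])

# Part b: add up each column (year) over all age groups, then divide by 365
totals = [sum(column) for column in zip(*cases.values())]
total_rates = daily_rates(totals)

print([round(r, 1) for r in rates_25_34])
print(totals)
print([round(r, 4) for r in total_rates])
```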
In Example 1 you were asked to identify functions. We extend this question to deciding if a given graph is the
graph of a function. By looking at the definition of a function, we see that its graph has exactly one point for each
element of the domain. Graphically, this idea can be stated in terms of the following vertical line test.
Vertical Line Test: A set of points in the xy-plane is the graph of a real-valued function if and only if
every vertical line intersects the graph in at most one point.
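For a finite set of points, the vertical line test amounts to checking that no x-value is paired with two different y-values. The following is a minimal sketch under that assumption; the function name and the sample points are illustrative only.

```python
def passes_vertical_line_test(points):
    """Return True if no x-value appears with two different y-values."""
    seen = {}  # maps each x-value to the y-value first seen with it
    for x, y in points:
        if x in seen and seen[x] != y:
            return False  # the vertical line at this x meets the set twice
        seen[x] = y
    return True

# Two points on the same vertical line x = 0: not the graph of a function
print(passes_vertical_line_test([(0, 2), (0, -2), (1, 0)]))
# Distinct x-values (repeated y-values are allowed): the graph of a function
print(passes_vertical_line_test([(-1, 1), (0, 0), (1, 1)]))
```

For a curve given by an equation, such a check can only be applied to sampled points, so it can reveal a failure of the test but cannot prove that the whole curve passes it.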
Example 6. Vertical line test in action
Determine which of the given graphs are the graph of a function.

[Figure: four graphs, labeled a, b, c, and d]
Solution.
a. A vertical line (imagine one sweeping from left to right) intersects the curve at two points, where x = −0.5,
for example, as shown below.

Hence, this curve fails the vertical line test and is not the graph of a function. In fact, this curve is an
ellipse given by the set of points that satisfy
\[ x^2 + \frac{y^2}{4} = 1. \]
The upper and lower halves of this ellipse can be described by the pair of functions
\[ y = 2\sqrt{1 - x^2} \quad\text{and}\quad y = -2\sqrt{1 - x^2}. \]
b. This curve does satisfy the vertical line test for all points x, as shown below for x = 1:
In fact, it is the graph of the function y = | sin x|.
c. This set of points is not the graph of a function as the vertical line at x = 1 intersects three points.
d. This set of points is the graph of a function as it passes the vertical line test for all x, as shown below for
x = 1905.
In fact, these points are the graph of the average annual temperature in New York as a function of year.

Piecewise-Defined Functions
Regarding federal income taxes, the mathematician Hermann Weyl (1885-1955) stated
Our federal income tax law defines the tax y to be paid in terms of the income x; it does so in a clumsy
enough way by pasting several linear functions together, each valid in another interval or bracket of
income. An archaeologist who, five thousand years from now, shall unearth some of our income tax
returns together with relics of engineering works and mathematical books, will probably date them a
couple of centuries earlier, certainly before Galileo and Vieta.∗
Hence, in the real world sometimes functions must be defined with more than one formula and therefore are called
piecewise defined functions.
Example 7. Income tax rates
The federal income tax rates for singles in 2006 can be described as 10% for (adjusted) incomes up to $7,550, 15%
for incomes up to $30,650, 25% for incomes up to $74,200, 28% for incomes up to $154,800, 33% for incomes up to
$336,550, and 35% for incomes greater than $336,550. Express the income tax rate f(x) for an individual in 2006
with adjusted income x as a piecewise defined function. Graph the income tax rates over the interval [0, 500000].
Solution. An algebraic representation of this function is given by
\[
f(x) =
\begin{cases}
0.10 & \text{if } x \le 7{,}550 \\
0.15 & \text{if } 7{,}550 < x \le 30{,}650 \\
0.25 & \text{if } 30{,}650 < x \le 74{,}200 \\
0.28 & \text{if } 74{,}200 < x \le 154{,}800 \\
0.33 & \text{if } 154{,}800 < x \le 336{,}550 \\
0.35 & \text{if } x > 336{,}550
\end{cases}
\]
The graph of this piecewise function over the interval [0, 500000] is shown in Figure 1.14. This graph consists of
horizontal line segments with jumps between income brackets.
Figure 1.14: Graph of 2006 income tax rates for singles
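A piecewise-defined function translates naturally into a sequence of comparisons. The sketch below encodes the 2006 rates stated in Example 7; the function name `tax_rate_2006_single` and the bracket list layout are illustrative choices.

```python
def tax_rate_2006_single(income):
    """Tax rate for a single filer in 2006, as described in Example 7."""
    # (upper end of bracket in dollars, rate) in increasing order
    brackets = [
        (7550, 0.10),
        (30650, 0.15),
        (74200, 0.25),
        (154800, 0.28),
        (336550, 0.33),
    ]
    for upper, rate in brackets:
        if income <= upper:
            return rate
    return 0.35  # incomes greater than $336,550

for x in (5000, 7550, 7551, 200000, 500000):
    print(x, tax_rate_2006_single(x))
```

Because the brackets are checked from lowest to highest, each income falls into the first bracket whose upper end it does not exceed, which matches the conditions in the piecewise definition.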
A particularly important piecewise defined function is the absolute value function.
Absolute Value Function: The absolute value function y = |x| is defined by
\[
|x| =
\begin{cases}
x & \text{if } x \ge 0 \\
-x & \text{if } x < 0
\end{cases}
\]
When x is non-negative, the absolute value of x is x itself. When x is negative, the absolute value of x is the
negative of x. The graph of the absolute value function is shown in Figure 1.15.
∗ The Mathematical Way of Thinking, an address given at the Bicentennial Conference at the University of Pennsylvania, 1940.

Figure 1.15: Graph of y = |x|.
Increasing and decreasing functions
There are several properties of functions that are useful in a variety of ways.
Increasing and decreasing functions: Let I be an interval in the domain of a function f. Then:
f is increasing on I if f(x) < f(y) for all x < y in I;
f is decreasing on I if f(x) > f(y) for all x < y in I;
f is constant on I if f(x) = f(y) for every x and y in I.
These classifications are shown graphically in Figure 1.16.
Figure 1.16: Classifications of functions
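The definitions above can be explored numerically by comparing function values at sample points of an interval. This is only a sketch: agreement on finitely many samples is consistent with, but does not prove, the property on the whole interval. The function name `classify_on_samples` is an illustrative choice.

```python
def classify_on_samples(f, xs):
    """Classify f on the sample points xs as increasing, decreasing, constant, or none."""
    points = sorted(set(xs))
    if len(points) < 2:
        raise ValueError("need at least two distinct sample points")
    values = [f(x) for x in points]
    # Comparing neighbors suffices: the inequalities chain across all pairs x < y
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a > b for a, b in pairs):
        return "decreasing"
    if all(a == b for a, b in pairs):
        return "constant"
    return "none"

print(classify_on_samples(abs, [-2, -1.5, -1, -0.5, 0]))  # samples from [-2, 0]
print(classify_on_samples(abs, [0, 0.5, 1, 2]))           # samples from [0, 2]
print(classify_on_samples(abs, [-1, 0, 1]))               # samples from [-1, 1]
```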
Example 8. Classifying a function
Consider the function f defined by the following graph on the interval I = [−2, 3].

Find the intervals on which f is increasing and the intervals on which f is decreasing.
Solution. The function f is decreasing on [−2, −1], increasing on [−1, 0], decreasing on [0, 2], and increasing on
[2, 3]. □
Problem Set 1.2
Determine whether the descriptions in Problems 1 to 6 represent functions. If it is a function, find the domain, and
(if possible) the range.
1. a. {(4, 7), (3, 4), (5, 4), (6, 9)}
b. {6, 9, 12, 15}
2. a. {(5, 2), (7, 3), (1, 6), (7, 4)}
b. {(x, y) : y = 4x + 3}
3. a. {(x, y) : y ≤ 4x + 3}
b. {(x, y) : y = 1 if x is positive and y = −1 if x is negative}
4. a. {(x, y) : y is the closing price of IBM stock on July 1 of year x}
b. {(x, y) : x is the closing price of Apple stock on July 1 of year y}
5. a. {(x, y) : (x, y) is a point on a circle of radius 4 passing through (2, 3)}
b. {(x, y) : (x, y) is a point on an upward-opening parabola with vertex (−3, −4)}
6. a. {(x, y) : (x, y) is a point on a line passing through (2, 3) and (4, 5)}
b. {(x, y) : (x, y) is a point on a line passing through (4, 5) and (−3, 5)}
Use the vertical line test in Problems 7 to 12 to determine whether the curve is the graph of a function. Also state
the probable domain and range.

7. a.
b.
8. a.

b.
9. a.
b.

10. a.
b.
11. a.

b.
12. a.
b.
In Problems 13 to 18 find the domain of f and compute the indicated values or state that the corresponding x-value
is not in the domain.
13. \( f(x) = -x^2 + 2x + 3 \); f(0), f(1), f(−2)
14. \( f(x) = 3x^2 + 5x - 2 \); f(1), f(0), f(−2)
15. \( f(x) = \frac{(x+3)(x-2)}{x+3} \); f(2), f(0), f(−3)
16. \( f(x) = (2x - 1)^{-3/2} \); f(1), f(1/2), f(0)

17.
\[
f(x) =
\begin{cases}
-2x + 4 & \text{if } x \le 0 \\
x + 1 & \text{if } x > 0
\end{cases}
\]
f(3), f(1), f(0)

18.
\[
f(x) =
\begin{cases}
3 & \text{if } x < -1 \\
x + 1 & \text{if } -1 \le x \le 5 \\
\sqrt{x} & \text{if } x > 5
\end{cases}
\]
f(−6), f(−5), f(16)
19. Consider a Squaring Function Machine:
A table of values from this squaring machine is given below:
Input values    Output values
1               1
2               4
3               9
−5              25
Algebraically, define a function, F, for input values x from the domain.
Suppose there is another Secret Machine:
A table for this machine is also given:
0               3
1               5
2               7
3               9
4               11
Algebraically define a function, S, for input values t from the domain.
20. Suppose you are given a machine that multiplies the input value by 3 and then subtracts 7.
Complete the table of values given below:
3               2
5
0
−3
Algebraically, define a function, M, for input values x from the domain.
Suppose there is another Super Secret Machine with the following table given.
0               5
1               6
2               9
3               14
4               21
Algebraically define a function, T, for input values t from the domain.
Find the domain and range for the graphs indicated in Problems 21 to 26. Also tell where the function is increasing,
decreasing, and constant.
21.
22.

23.
24.
25.

26.
For each verbal description in Problems 27 to 30, write a rule in the form of an equation, state the domain, and then
graph the function.
27. For each number x in the domain, the corresponding range value, y, is found by multiplying by 3 and then
subtracting 5.
28. For each number x in the domain, the corresponding range value, y, is found by squaring and then subtracting
5 times the domain value.
29. For each number x in the domain, the corresponding range value, y, is found by taking the square root of the
difference of the domain value subtracted from 5.
30. For each number x in the domain, the corresponding range value, y, is found by adding 1 to the domain value
and then dividing that result into 5 added to 5 times the domain value.
31. From a square whose side has length x (in inches), create a new square whose side is 5 in. longer. Find an
expression for the difference between the area of the two squares (in square inches) as a function of x. Graph
this expression for 0 ≤ x ≤ 10.
32. From a square whose side has length x (in meters), create a new square whose side is 10 m longer. Find
an expression for the sum of the areas of the two squares (in square meters) as a function of x. Graph this
expression for 0 ≤ x ≤ 10.
33. Find the area of a square as a function of its perimeter.
34. Find the area of a circle as a function of its circumference.
35. Biologists have found that the speed of blood in an artery is a function of the distance of the blood from the
artery’s central axis (Figure 1.17). According to Poiseuille’s law, the speed (cm/sec) of blood that is r cm from
the central axis of an artery is given by the function
\[ S(r) = C(R^2 - r^2) \]
where R is the radius of the artery and C is a constant that depends on the viscosity of the blood and the
pressure between the two ends of the blood vessel. ∗ Suppose that for a certain artery,
\[ C = 1.76 \times 10^5 \text{ cm/sec} \]
and
\[ R = 1.2 \times 10^{-2} \text{ cm} \]
∗ The law and the unit poise, a unit of viscosity, are both named for the French physician Jean Louis Poiseuille (1799-1869).

Figure 1.17: Cut-away view of an artery
a. Compute the speed of the blood at the central axis of this artery.
b. Compute the speed of the blood midway between the artery’s wall and central axis.
c. What is the domain for the function defined by the ordered pairs (r, S)?
d. Graph this function for S ≥ 0.
36. The reaction rate of an auto-catalytic reaction is given by the formula
R(x) = kx(a − x)
for 0 ≤ x ≤ a, where a is the initial concentration of substance A and x is the concentration of X.

a. What is the domain?
b. Graph this function for k = 3 and a = 8.
37. Consider the function defined by
\[ f(n) = 3 + \frac{12}{n} \]
a. What is the domain of the function f ?
b. To study the rate at which animals learn, a psychology student performed an experiment in which
a rat was sent repeatedly through a laboratory maze. Suppose that the time (in minutes) required
for the rat to traverse the maze on the nth trial is modeled by the function f . For what values of n
does f (n) have meaning in the context of the psychology experiment?
c. What is the name of the principle you used as a basis for your answers to parts a and b?
d. Graph the function.
e. According to the function f, what will happen to the time required for the rat to traverse the maze
as the number of trials increases? Will the rat ever be able to traverse the maze in less than three
minutes?
38. Consider the function defined by
\[ f(x) = \frac{150x}{200 - x} \]
a. What is the domain of the function f ?

b. Suppose that during a nationwide program to immunize the population against a certain form of
influenza, public health officials found the cost (in millions of dollars) of inoculating x% of the
population is modeled by f . For what values of x does f (x) have a practical interpretation in this
context?
c. What is the name of the principle you used as a basis for your answers to parts a and b?
d. Graph the function.
e. Compare the cost of inoculating the first 50% of the population with the cost for the second 50%.
39. Friend’s rule is a method for calculating pediatric drug dosages in terms of a child’s age. If A is the adult
dosage (in mg) and n is the age of the child (in years), then the child’s dosage is given by
\[ D(n) = \frac{2}{25}\, nA \]
a. What is the domain for the function defined by (n, D)?

b. Graph this function for A = 100.
c. If a three-year-old child receives 100 mg of a certain drug, what is the corresponding dosage for a
five-year-old child?
40. Cowley’s rule is another method for calculating pediatric drug dosages. If A denotes the adult dosage (in mg)
and n is the age of the child (in years), then the corresponding child’s dosage is given by
\[ D(n) = \frac{n+1}{24}\, A \]
a. What is the domain for the function defined by (n, D)?

b. Graph this function for A = 192.
c. If a three-year-old child receives 100 mg of a certain drug, what is the corresponding dosage for an
adult?
∗
Pythagoras (ca 569 B.C.–475 B.C.)
Even though we know very little about the man himself, we do know he was a Greek philosopher and is
sometimes described as the first true mathematician in the history of mathematics. He founded a philosophical
and religious school in Croton and had many followers, known today as the Pythagoreans. The Pythagoreans
were a secret society who had their own philosophy, religion, and way of life. This group investigated music,
astronomy, geometry, and number properties. Because of their strict secrecy, much of what we know about
them is legend, and it is difficult to tell just what work can be attributed to Pythagoras himself. We also know
that it was considered impious for a member of the Pythagorean Society to claim any discovery for himself.
Instead, each new idea was attributed to their founder Pythagoras. You, no doubt, know the Pythagorean
theorem, but did you know that the Pythagoreans believed that all things are numbers, and by a number they
meant the ratio of two whole numbers? For this HISTORICAL QUEST you are to use these two ideas to prove
that √2 is an irrational number. There is a legend (not an historical fact) that one day a group of Pythagoreans
were out in a boat seeking truth, and one person on board came up with the following argument: Construct
a right triangle with legs of length 1 unit. Then, by the Pythagorean theorem the length of the hypotenuse
is (using modern notation) exactly √2 units long. Is the length of this side a rational number or an irrational
number? Let √2 = p/q. (Remember, they believed that all numbers could be expressed as the ratio of two
whole numbers; thus, assume that √2 is a rational number.) Assume that p/q is a reduced fraction (because if
it is not reduced, simply reduce it and work with the reduced form). See if you can reproduce the work done
in the boat. (That is, show the details that we outline here.) Square both sides of the equation and prove
that p is an even number. If p is even, then it can be written as p = 2k. Use this fact to show that q is even.
Thus, the fraction p/q is not reduced. Now, if you understand logic, as did the Pythagoreans, you can see the
contradiction. What is it? How can you use this information to prove that √2 is irrational? Legend has it
that this contradiction bothered those on the boat so much that they tossed the person who came up with this
argument overboard, and pledged themselves to secrecy!
∗ Throughout the text, you find problems called HISTORICAL QUEST. These problems are not just historical notes to help you see
mathematics and biology as living/breathing disciplines, but are designed to involve you in the quest of pursuing some great ideas in the
history of science. Yes, they will give you some interesting history, but will then lead you on a quest which you should find interesting.

40 1.3. DATA FITTING WITH LINEAR AND PERIODIC FUNCTIONS
1.3 Data fitting with Linear and Periodic Functions

In the previous section we presented data, such as the carbon dioxide (CO2 ) data collected at the top of the
Mauna Loa volcano since 1958 by the US government’s Climate Monitoring Diagnostics Laboratory. These data are
plotted in Figure 1.12. Each point in this plot can be written as a pair of values (x, y), where x is the month and
has values from 1 to 140, and y is in parts per million and has values that range from roughly 330 to 350. Scientists routinely
collect data involving two variables x and y and refer to such data as bivariate. In many cases, a list of bivariate
points, such as the Mauna Loa CO2 data, can be replaced by a relatively simple functional relationship of the form
y = f (x) that passes, if not through all points, then close by all points. What we mean by this will become clear in
this section where we explore how to find simple algebraic expressions that can replace data in the form of a bivariate
list of values. The advantage of doing this is that the function describes the data more concisely than a list and
can be used to interpolate missing data points, make predictions for uncollected data values, and test hypotheses.
For instance, if we had a function that did a good job of describing how carbon dioxide concentrations fluctuate in
time, then we could make predictions about future levels of carbon dioxide concentrations. The importance of these
predictions stems from the fact that carbon dioxide is a greenhouse gas. It prevents the escape of heat radiating
from the earth. Consequently, carbon dioxide in the atmosphere influences the earth’s temperature and we would
like to know what the temperature might be 20 or 50 years from now so that we can plan accordingly.
The most commonly fitted function is a linear function, which leads into one of the most important topics in
statistical analysis: linear regression. In this section, we review basic facts about linear functions and examine how
to fit linear functions to data sets. For instance, the data in Figure 1.12 suggest that carbon dioxide concentrations
in the atmosphere have been increasing. Using linear regression, we determine at what rate this increase is occurring.
In addition to exhibiting a linear trend, the carbon dioxide data clearly exhibit seasonal fluctuations. These seasonal
fluctuations can be modeled by periodic functions. Consequently, the section continues by reviewing basic properties
of periodic functions and fitting periodic functions to data sets. Using these functions, we can determine at what
times of year the carbon dioxide levels are highest or lowest.
Linear Functions
Linear functions play a fundamental role in differential calculus in which functions are approximated locally (i.e.
over a relatively small interval of the domain of the variable x) by linear functions. A linear function is a function
of the form
y = f (x) = mx + b
where m is the slope and b is the vertical or y-intercept of the linear function. The vertical intercept b is the
value of y when x equals zero. Equivalently, it is the y-value at which the graph of y = f(x) intercepts the y-axis:
that is, b = f(0). The slope m of the line tells us that if we increase the x-value by an increment, say
0.2, then the corresponding y-value changes by m times that increment, 0.2m. Equivalently, the change in y divided
by the corresponding change in x is always the constant m. This leads us to a slope formula.
Slope of a Line: A non-vertical line that contains the points \( P_1 = (x_1, y_1) \) and \( P_2 = (x_2, y_2) \) has
slope
\[ m = \frac{y_2 - y_1}{x_2 - x_1} \]
When the function y = mx + b is regarded as a relationship between the paired variables (x, y), x is called the
independent variable and y the dependent variable because the relationship is designed to answer the question,
“What value of y corresponds to a given value for x?”
Example 1. From graphs to equations
Let y = f (x) be the linear function whose graph is shown in Figure 1.18. Find the equation for f (x).

Figure 1.18: Graph of y = −2x + 1
Solution. Looking at the graph, we see that the y-intercept is given by b = 1. Since y = 1 when x = 0 and y = 0
when x = 0.5, we see that y decreases by 1 when x increases by 0.5. Thus,
\[ m = \frac{1 - 0}{0 - 0.5} = -2 \]
and the equation of this line is
y = −2x + 1
□
Example 2. From equations to graphs
Let y = f (x) be a linear function such that f (2) = 3 and f (−2) = −1. Find and graph f (x).
Solution. Since f(x) is linear, we can write f(x) = mx + b for constants m and b that we need to determine. The
slope is given by
\[ m = \frac{f(2) - f(-2)}{2 - (-2)} = \frac{3 - (-1)}{4} = 1 \]
Therefore, f(x) = x + b. To find b, we can solve
3 = f(2)
3 = 2 + b
1 = b
Hence, y = f(x) = x + 1. To graph this function, it suffices to draw a line that passes through the points (−2, −1)
and (2, 3) as shown in Fig. 1.19. □
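The two steps used in Examples 1 and 2 (compute the slope from two points, then solve for the intercept) can be sketched as a short function. The name `line_through` is an illustrative choice, not notation from the text.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through the points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    m = (y2 - y1) / (x2 - x1)   # slope formula
    b = y1 - m * x1             # solve y1 = m*x1 + b for b
    return m, b

print(line_through((0, 1), (0.5, 0)))    # points read off the graph in Example 1
print(line_through((-2, -1), (2, 3)))    # f(-2) = -1 and f(2) = 3 from Example 2
```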
Example 3. From Fahrenheit to Celsius
To convert from Fahrenheit to Celsius, it suffices to recall that water freezes at 32 ◦ F or 0 ◦ C and boils at 212 ◦ F
or 100 ◦C.
a. Find the linear function which converts Fahrenheit F to Celsius C.
b. Convert 23◦ C to Fahrenheit and 85 ◦ F to Celsius. Round your answers to the nearest degree.

Figure 1.19: Graph of y = x + 1
Solution.
a. Writing the ordered pair as (F, C), we are given the points (32, 0) and (212, 100). Therefore the slope for
our linear function is
\[ m = \frac{100 - 0}{212 - 32} = \frac{5}{9}. \]
Using either of the given points, say (32, 0), in the equation
\[ C = \frac{5}{9}F + b \]
to find the value of b, we have
\[ 0 = \frac{5}{9}(32) + b \quad\text{or}\quad b = -\frac{160}{9}. \]
The desired equation is
\[ C = \frac{5}{9}F - \frac{160}{9} = \frac{5}{9}(F - 32). \]
b. We find the desired values by substitution. To convert 23 ◦C to Fahrenheit:
\[ 23 = \frac{5}{9}(F - 32), \qquad 207 = 5F - 160, \qquad F = 73.4 \]
To convert 85 ◦F to Celsius:
\[ C = \frac{5}{9}(85 - 32) = \frac{5}{9}(53) \approx 29.4 \]
We see that 23 ◦C ≈ 73 ◦F and 85 ◦F ≈ 29 ◦C.

□
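The conversion in Example 3 and its inverse can be written as a pair of functions. This is a minimal sketch; the function names are illustrative.

```python
def fahrenheit_to_celsius(f):
    """C = (5/9)(F - 32), the linear function found in part a."""
    return 5 * (f - 32) / 9

def celsius_to_fahrenheit(c):
    """Solve C = (5/9)(F - 32) for F."""
    return 9 * c / 5 + 32

# The two reference points: water freezes at 32 F and boils at 212 F
print(fahrenheit_to_celsius(32), fahrenheit_to_celsius(212))
# Part b, rounded to the nearest degree
print(round(celsius_to_fahrenheit(23)), round(fahrenheit_to_celsius(85)))
```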
Fitting linear functions to data

Many data sets exhibit trends that can be reasonably described by linear functions. We can fit linear functions to
data using either formal or informal approaches. Informal approaches include “eye-balling” how well a selected line
passes through a given set of data or fitting a line to two suitably chosen points in the data set. Formal statistical
methods provide ways of finding the “best-fitting line” in some well-defined mathematical sense that we
describe after the next example.
Example 4. CO2 output from electric power plants

In Figure 1.20 the CO2 emissions of most of the electricity generation plants in California are plotted as a function
of the heat input for the year 1997. The heat input is measured in millions of British Thermal Units (1 MMBTU =
10^6 BTU) and CO2 emissions are measured in metric tons.
Figure 1.20: Data from the Emissions and Generation Resource Integrated Database
In Table 1.2, six points that appear in Figure 1.20 are listed.
Table 1.2: California Power Plants in 1997
Heat input (MMBTU)    CO2 output (tons)
45.179 × 10^6         2.685 × 10^6
1.00 × 10^6           0.058 × 10^6
1.902 × 10^6          0.113 × 10^6
3.334 × 10^6          0.197 × 10^6
0.086 × 10^6          0.005 × 10^6
13.897 × 10^6         0.826 × 10^6
⋮                     ⋮
a. Since the data in Figure 1.20 look linear, use the first two data points in Table 1.2 to find a line that
passes through the data. Graph this line.
b. One data point in Figure 1.20 looks like it does not fit the rest of the data. This data point corresponds
to a heat input of 4.488 × 10^6 MMBTU with a corresponding output of around 2.3 × 10^6 metric tons. Use
the linear function in part a to estimate the CO2 output for this plant. Then use the graph to estimate
the actual output.
Solution.
a. To find the line y = mx + b that passes through (45.179 × 10^6, 2.685 × 10^6) and (1 × 10^6, 0.058 × 10^6), we
first solve for the slope:
\[ m = \frac{2.685 - 0.058}{45.179 - 1} \approx 0.059 \]
Using the point-slope formula (see Problem 17) for a line, with x measured in units of 10^6 MMBTU and y in
units of 10^6 tons of CO2, yields
\[ y - 2.685 = 0.059(x - 45.179) \]
\[ y \approx 0.059x + 0.019 \]
We sketch this line over the graph shown in Figure 1.20.
This is a very good fit considering we just used the first two data points. This does not always happen.
b. Substituting x = 4.488 into our linear equation yields
\[ y = 0.059(4.488) + 0.019 \approx 0.28 \]
that is, about 0.28 × 10^6 tons of CO2.
This is significantly smaller than the value of 2.3 × 10^6 tons of CO2 given in the data. Thus the power
plant represented by this point on the graph pollutes about 8 times as much as it should compared
with the other power plants of similar heat input.
□
Sometimes we can get a good fit to a data set by appropriately choosing two data points and finding the line
that passes through these points. However, this method is quite ad hoc and yields many different possible lines.
Statisticians have come up with a method called linear regression that is used to find a line that best fits the
data in the following sense: the slope parameter m and y-intercept parameter b are chosen to minimize the sum of
the squared vertical distances e_i of the data from the line (see Figure 1.21). The values e_i are called the residuals
because they represent “what is left once the linear fit has been taken into account.”
Why squared distances? To find the answer to this question and to learn the statistical underpinnings of linear
regression, you should take an introductory statistics course! However, we note without further details (see any
elementary statistics text for details) that a sums-of-squares measure of the fit leads to relatively simple formulae for
the slope and y-intercept of the best-fitting line (which can be easily computed with calculators, computer software,
and on-line java scripts).
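The "relatively simple formulae" mentioned above are not written out in this section. The sketch below uses the standard closed-form least-squares expressions, m = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)^2 and b = ȳ − m x̄, which are supplied here as an assumption about what the text refers to. The function names are illustrative, and the sample data are the six points of Table 1.2 in units of 10^6.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept b minimizing the sum of squared vertical distances."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal, so no non-vertical line fits")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b

def sum_squared_residuals(xs, ys, m, b):
    """Sum of the squared residuals e_i = y_i - (m*x_i + b)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Six points from Table 1.2 (heat input and CO2 output, both in units of 10^6)
heat = [45.179, 1.00, 1.902, 3.334, 0.086, 13.897]
co2 = [2.685, 0.058, 0.113, 0.197, 0.005, 0.826]
m, b = least_squares_line(heat, co2)
print(m, b, sum_squared_residuals(heat, co2, m, b))
```

Any other line, such as one drawn through two chosen data points, has a sum of squared residuals at least as large as that of the line returned here.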
Example 5. CO2 concentrations in Hawaii
Table 1.3 describes how CO2 concentrations (in ppm) have varied from May 1974 to Dec 1985. A plot of these
data (where time is measured in months) was given in Example 1 of Section 1.2.∗
a. Find the best-fitting line to the CO2 data. Plot this line against the data.
∗ http://www.seattlecentral.org/qelp/sets/016/016.html

Figure 1.21: Vertical distance of data from a line (labels: the line a + bx, a data point (x_i, y_i), and its residual e_i)
b. Determine at what rate (in ppm/yr) the concentration of CO2 has been increasing.
c. Estimate the CO2 concentration for Dec 2004 using your best-fitting line. How does this compare with
the average level of 338 ppm over the period May 1974 to Dec 1985?
d. For the CO2 concentration in each data point, subtract the CO2 concentration predicted by the best-
fitting line. Plot the resulting residuals. What do you notice?
Solution.
a. Downloading the data from the website and entering it into a graphing calculator or a computer
spreadsheet, and then running a linear regression routine, yields the best-fitting line
y = 0.1225x + 329.3
STOP: Do not just read this, do it! Plotting this line against the data results in Figure 1.22.
Figure 1.22: Best fitting line for CO2 in Hawaii (vertical axis: ppm; horizontal axis: month)
b. Since the slope of the line is 0.1225, the rate at which the CO2 concentration has been increasing is
0.1225 ppm/month. Multiplying by 12 yields an annual rate of 1.47 ppm/year.
c. Counting May 1974 as month 1, Dec 2004 is month 12 · 30 + 8 = 368. Substituting x = 368 into
the best-fitting line yields a prediction of
y = 0.1225 · 368 + 329.3 ≈ 374.4
The estimated CO2 concentration for December 2004 is 374.4 ppm. This is 374.4 − 338 = 36.4 ppm higher
than the average level from May 1974 to Dec 1985.

Table 1.3: CO2 concentrations at the Mauna Loa Observatory of Hawaii
d. Subtracting the best-fitting line from the data and plotting the first five years yields Figure 1.23. This
figure illustrates that in the absence of the linearly increasing trend, the CO2 concentrations exhibit
well-defined oscillations.
Figure 1.23: Residuals for the CO2 in Hawaii once the values predicted by the best fitting line have been subtracted
from the data.
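The steps of this example (fit a least-squares line, read the rate off the slope, extrapolate, and subtract the line to obtain residuals) can be sketched in Python. This is only a minimal sketch under stated assumptions: the function names are our own, the six data points are made up for illustration and are not the Mauna Loa measurements, and the fit uses the standard closed-form least-squares formulas rather than any particular calculator routine.

```python
# Least-squares line fit and residuals (illustrative sketch; names are hypothetical).

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Value of the line y = slope*x + intercept at x."""
    return slope * x + intercept


def residuals(xs, ys, slope, intercept):
    """Observed value minus the value predicted by the line, point by point."""
    return [y - predict(slope, intercept, x) for x, y in zip(xs, ys)]


# Made-up monthly readings (NOT the Mauna Loa data).
months = [1, 2, 3, 4, 5, 6]
ppm = [329.5, 330.1, 330.4, 330.0, 329.9, 330.6]

slope, intercept = fit_line(months, ppm)
print("slope (ppm/month):", slope)
print("annual rate (ppm/year):", 12 * slope)
print("residuals:", residuals(months, ppm, slope, intercept))

# Evaluating the line reported in the text, y = 0.1225x + 329.3, at month 368:
print("text line at month 368:", predict(0.1225, 329.3, 368))
```

A useful sanity check on any such fit is that the residuals of a least-squares line with an intercept sum to zero, up to rounding.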
Periodic functions
Many biological and physical time series exhibit oscillatory behavior, as shown by Example 5. Less regular oscillations
can be observed in the Nicholson blowfly (Lucilia cuprina) population data in Figure 1.24. Under controlled
laboratory conditions, the abundance of the blowfly exhibits rapid growth followed by spectacular crashes when the
populations get too large. These types of data sets can be described by periodic functions that repeat their values
at evenly spaced intervals. More formally, we make the following definition.

Figure 1.24: Population abundance of blowflies under controlled laboratory conditions.
Periodic Function
A real valued function f is periodic if there is a real number T > 0 such that
f (x) = f (x + T )
for all x. The smallest possible value of T is called the period of f . The amplitude
(if it exists) of a periodic function is half of the difference between its largest and
smallest values.
Example 6. Estimating periods and amplitudes
Estimate the period and amplitude for the CO2 data in Figure 1.23 and the Nicholson blowfly data in Figure 1.24.
Solution. A quick examination of the CO2 data reveals that the time between peaks is approximately 12 months,
so the period is a year. From the plot of the residuals in Figure 1.23, we see that the largest values of the data seem
to be around 3 ppm, while the smallest values are typically around −3 ppm. Hence, the amplitude is approximately
(3 − (−3))/2 = 3 ppm.
In the blowfly data, the time between population peaks is approximately 30 days, so the period is approximately
one month. The peaks tend to be around 9,000 and the minimum seems to be 0. So the amplitude is approximately
4,500.
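The eyeball estimates in this example can be mimicked numerically. The sketch below is one simple way to do it, not a method prescribed by the text: it takes the amplitude as half the difference between the largest and smallest sampled values, and the period as the average spacing between successive local maxima. The helper names and the synthetic test signal are our own assumptions; real data with noise would usually need smoothing first.

```python
import math


def amplitude(values):
    """Half the difference between the largest and smallest values."""
    return (max(values) - min(values)) / 2


def peak_positions(xs, values):
    """x-values of interior local maxima of the sampled series."""
    return [
        xs[i]
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    ]


def estimate_period(xs, values):
    """Average spacing between successive peaks."""
    peaks = peak_positions(xs, values)
    if len(peaks) < 2:
        raise ValueError("need at least two peaks to estimate a period")
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps)


# Synthetic monthly series with amplitude 3 and period 12 (NOT the measured data).
xs = list(range(0, 49))
ys = [3 * math.cos(math.pi * x / 6) for x in xs]

print("amplitude:", amplitude(ys))
print("period:", estimate_period(xs, ys))
```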
Two important periodic functions that you have encountered previously are the cosine and sine functions. The
graphs of sin x and cos x are illustrated in Figure 1.25. Since the graph of sine is the graph of cosine shifted to
the right by π/2, that is
$\sin x = \cos\left(x - \frac{\pi}{2}\right)$
we can focus our attention on the cosine function. Curves with this shape are called sinusoidal.
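The shift relationship between sine and cosine is easy to spot-check numerically. The following sketch compares sin x with cos(x − π/2) on a grid of sample points; the grid and the helper name are arbitrary choices of ours, and a numerical check like this illustrates the identity rather than proving it.

```python
import math


def shifted_cosine(x):
    """Cosine shifted to the right by pi/2."""
    return math.cos(x - math.pi / 2)


# Sample points from -5 to 5 in steps of 0.25.
sample_points = [k * 0.25 for k in range(-20, 21)]

# Largest disagreement between sin x and the shifted cosine on the grid.
max_gap = max(abs(math.sin(x) - shifted_cosine(x)) for x in sample_points)
print("largest difference on the grid:", max_gap)
```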
Example 7. Fitting the CO2 data
Consider
y = f (x) = a cos(bx)
where a and b are positive constants.
a. Find the period and amplitude of f .
b. Write down an equation f (x) that provides a good fit to the CO2 residual data shown in Figure 1.23 from
the Mauna Loa Observatory and plot this equation against the given data.

a. cosine curve period: 2π; amplitude: 1 b. sine curve period: 2π; amplitude: 1
Figure 1.25: Graphs of cosine and sine.
c. Let g(x) = 0.1225x + 329.3 be the best fitting line shown in Figure 1.22 and f (x) the equation you have
just obtained above. Plot h(x) = f (x) + g(x) against the data shown in Table 1.3. Use h to predict
the carbon dioxide level in March 2006, and compare to what you find online.
Solution.
a. Since cos x achieves its maximum of 1 at x = 0, a cos(0) = a is the maximum of the function y = a cos(bx).
Similarly, the minimum of f (x) is −a. Hence, the amplitude is a.
The period of cos x is 2π, so as bx goes from 0 to 2π one period of y = a cos(bx) is completed. Since
bx = 2π when x = 2π/b, the period of f (x) is 2π/b.
b. We found in the previous example that the amplitude for the data in Figure 1.23 is 3 and the period is
12 months, so we need to choose a = 3 and b such that 2π/b = 12, namely b = π/6. Therefore,
$f(x) = 3\cos\left(\frac{\pi}{6}x\right)$
The graph of this equation against the data is shown:
c. Plotting h(x) = f (x) + g(x) against the data yields the following graph.

[Graph of h(x) and the data: CO2 concentration (ppm) versus month]
A truly remarkable fit! Next, calculate h(12 · 31 + 11) = h(383) = g(383) + f (383) = 376.2 + 2.60 = 378.8.
According to one web site, the March measurement was 381 ppm. Hence, CO2 may well be increasing
slightly faster than predicted by the model, possibly due to an accelerating rate of CO2 emissions.
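The combined model of this example can be evaluated directly. The sketch below builds f from the amplitude 3 and period 12 found above (so that b = 2π/12 = π/6), takes g to be the line from Figure 1.22, and evaluates h = f + g at month 12 · 31 + 11. The constant and function names are our own choices.

```python
import math

AMPLITUDE = 3.0          # ppm, estimated from the residual plot
PERIOD_MONTHS = 12.0     # months between peaks
B = 2 * math.pi / PERIOD_MONTHS   # equals pi/6


def f(x):
    """Seasonal oscillation: a*cos(b*x) with a = 3 and b = pi/6."""
    return AMPLITUDE * math.cos(B * x)


def g(x):
    """Best fitting line from the text."""
    return 0.1225 * x + 329.3


def h(x):
    """Trend plus oscillation."""
    return f(x) + g(x)


march_2006 = 12 * 31 + 11
print("month index:", march_2006)
print("trend g:", g(march_2006))
print("oscillation f:", f(march_2006))
print("model h:", h(march_2006))
```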
Problem Set 1.3

Solve for y as a function of x and graph the resulting function for Problems 1 to 10.
1. 5x − 4y − 8 = 0
2. x − 3y + 2 = 0
3. 100x − 250y + 500 = 0
4. 2x − 5y − 1,200 = 0
5. 3x + y − 2 = 0, −7 ≤ x ≤ 1
6. 2x − 2y + 6 = 0, 1 ≤ x ≤ 5
7. y = cos(4x)
8. y = 4 cos x
9. y = sin x
10. y = sin(2x)
Using the information in Problems 11 to 16 find the formula for the line y = mx + b.
11. Slope 3, passing through (1, 3)
12. Slope 5/2; passing through (5, −2)
13. Passing through (−1, 2) and (0, 1)
14. Passing through (5, 6) and (7, 6)
15. y-intercept 4 passing through (3, 4)
16. horizontal line through (−2, 5)

17. Derive the point-slope form equation of the line passing through the point (h, k) with slope m:
y − k = m(x − h)
18. Derive the equation of the vertical line passing through (h, k). Does this set of points represent a function?
Classify each graph in Problems 19 to 24 as a linear function or a periodic function. If it is linear, estimate the slope
and write an equation of the form y = mx + b. If it is periodic, estimate the period and the amplitude and write an
equation of the form y = a cos(bx).
19.
20.
21.
22.

23.
24.
Match the equations in Problems 25 to 30 with the scatter diagrams and best-fitting lines.
25. y = 0.6x + 2
26. y = 0.5x + 2
27. y = 0.4x + 2
28. y = −0.4x + 2
29. y = −0.5x + 2
30. y = −0.7x + 2

31. A life insurance table indicates that a woman who is now A years old can expect to live E years longer. Suppose
that A and E are linearly related and that E = 50 when A = 24 and E = 20 when A = 60.
a. At what age may a woman expect to live 30 years longer?

b. What is the life expectancy of a newborn female child?
c. At what age is the life expectancy zero?

32. In certain parts of the world, the number of deaths N per week has been observed to be linearly related to
the average concentration x of sulfur dioxide in the air. Suppose there are 97 deaths when x = 100 mg/m³
and 110 deaths when x = 500 mg/m³.
a. What is the functional relationship between N and x?
b. Use the function in part a to find the number of deaths per week when x = 300 mg/m³. What
concentration of sulfur dioxide corresponds to 100 deaths per week?
c. Research data on how air pollution affects the death rate in a population.∗
Summarize your results in a one-paragraph essay.
33. The chart in Figure 1.26 is taken from the November 1987 issue of Scientific American.∗
Figure 1.26: Fat intake compared with death rate
It can be shown that the best-fitting line is one of the following:

A. y = 0.139x
B. y = 0.231x − 3
C. y = 0.981x + 1
Which do you think is the correct one? Use your choice to estimate the number of deaths per 100,000 population
to be expected from an average fat intake of 150 g/day (roughly the fat intake in the United States).
34. The chart in Figure 1.27 is taken from the April 1991 issue of Scientific American.∗
It can be shown that the best-fitting line is one of the following:
A. y = 0.31x
B. y = 0.221x + 2
C. y = 0.29x + 1
Which do you think is the correct one? Use your choice to estimate the relative stride length that corresponds
to a Froude number x = 4.
35. In a classic study by Huxley, the weight X (in mg) of the small fiddler crab (Uca pugnax) is compared with the
weight Y (in mg) of its large claw. The data is shown in Table 1.4.
∗ You may find the following articles helpful: D.W. Dockery, J. Schwartz, and J.D. Spengler, “Air Pollution and Daily Mortality:
Associations with Particulates and Acid Aerosols,” Environ. Res., Vol. 59, 1992, pp. 362-373; Y.S. Kim, “Air Pollution, Climate,
Socioeconomic Status and Total Mortality in the United States,” Sci. Total Environ., Vol. 42, 1985, pp. 245-256.
∗ Graph by Slim Films, from “Diet and Cancer”, by Leonard A. Cohen, Scientific American, November 1987, p. 44 © 1987 by Scientific
American, Inc. All rights reserved.

∗ Graph by Patricia J. Wynne, from “How Dinosaurs Ran”, by R. McNeill Alexander, Scientific American, April 1991, p. 132 © 1991
by Scientific American, Inc. All rights reserved.

Figure 1.27: Comparison of Froude number with stride length
Table 1.4: Comparison of the weight of the fiddler crab with the weight of its large claw
X Y X Y
57.6 5.3 355.2 104.5
80.3 9.0 420.1 135.0
109.2 13.7 470.1 164.9
156.1 25.1 535.7 195.6
199.7 38.3 617.9 243.0
238.3 52.5 680.6 271.6
270.0 59.0 743.3 319.2
300.2 78.1
a. Plot the points in the table. Does this look like a linear model to you?
b. Plot the line y = 0.47x − 49 on the axis for the points you plotted in part a. Does this look like a
best-fitting line? Do you think you can find a better fitting line?
36. The data in Table 1.5 compares the mandibles of the male stag-beetle (Cyclommatus tarandus) where X is the
total length (body and mandibles) in millimeters and Y is the length of the mandibles in millimeters.
Table 1.5: Comparison of total length with the length of the mandibles of the male stag-beetle
X Y X Y
20.38 3.88 36.13 12.08
24.01 5.31 37.32 12.73
26.38 6.33 38.44 14.11
27.76 7.32 39.26 14.70
29.65 8.17 41.34 15.84
32.20 9.73 43.22 17.39
33.11 10.71 45.51 18.83
35.01 11.49 46.32 19.19
a. Plot these points. Does this look like a linear model to you?
b. Plot the line y = 0.62x − 9.7 on the axis for the points you plotted in part a. Does this look like a
best fitting line? Do you think you can find a better fitting line?
37. Table 1.6 shows the census figures (in millions) for the U.S. population since the first census.

Table 1.6: U.S. Population

Year Population Year Population
1780 2.8 1900 76.0
1790 3.9 1910 92.0
1800 5.3 1920 105.7
1810 7.2 1930 122.8
1820 9.6 1940 131.7
1830 12.9 1950 150.7
1840 17.1 1960 179.3
1850 23.2 1970 203.3
1860 31.4 1980 226.5
1870 39.8 1990 248.7
1880 50.2 2000 281.4
1890 62.9
a. Plot these points where 1780 represents t = 0. Does this look like a linear model to you?
b. Plot the line y = 1.15x − 39 on the axis for the points you plotted in part a. Does this look like a
best fitting line? Do you think you can find a better fitting line?
38. Ethyl alcohol is metabolized by the human body at a constant rate (independent of concentration). Suppose
the rate is 10 mL per hour.
a. Express the time t (in hours) required to metabolize the effects of drinking ethyl alcohol in terms of
the amount A of ethyl alcohol consumed (in mL).
b. How much time is required to eliminate the effects of a liter of beer containing 3% ethyl alcohol?
c. Discuss how the function in part a can be used to determine a reasonable “cutoff” value for the
amount of ethyl alcohol A that each individual may be served at a party.
39. In a 1971 published study (Savini and Bodhaine (1971), USGS WSP 1869-F), data for velocity of water versus
depth was collected for the Columbia River below Grand Coulee Dam. The data is reported in Table 1.7 and
was measured 13 feet from the shoreline.
Table 1.7: Depth and flow of Grand Coulee Dam

depth (ft) vel (ft/sec)
0.7 1.55
2.0 1.11
2.6 1.42
3.3 1.39
4.6 1.39
5.9 1.14
7.3 0.91
8.6 0.59
9.9 0.59
10.6 0.41
11.2 0.22
a. Plot these points.

b. Find the line defined by the first two data points. Plot this line against the data. Discuss how well
it fits the data.
c. Draw a line which you think best fits the data.
d. Estimate the velocity of the river at a depth of 12 feet and 20 feet. Discuss the answers you obtain.

40. At Seattle Central, 88 samples of shells of the native butter clam (Saxidomus giganteus) were collected. These
clams grow to lengths of 12-13 cm and live for more than 20 years. A scatter plot of their data is given in
Figure 1.28.
Figure 1.28: Plot of length and width of clam samples at Seattle Central
a. A pair of points on this data set are given by (1.3, 1.7) and (7.3, 8.9). These two points are drawn in
black in the above figure. Sketch the line passing through these points and find the formula for this
line.
b. Use your line to estimate the width of a butter clam whose length is 12 cm.

1.4 Power Functions and Scaling Laws

Why can an ant lift one hundred times its weight while a typical man can only lift about 0.6 of his weight? Why
is getting wet life-threatening for a fly but not a human? Why can a mouse fall from the top of a skyscraper and
still scurry home, while a person will almost certainly be killed? Why are elephants' legs so thick relative to their
length while the legs of gazelles are so much thinner relative to their length? A class of functions called power functions
provides a means of answering these questions.
Power Functions and Their Properties
We begin with a definition.
Power Functions
A function f (x) is a power function if it is of the form
$y = f(x) = ax^b$
where a and b are real numbers. The variable x is called the base, the parameter
b is called the exponent, and the parameter a is called the constant of proportionality.
Note that $\frac{5}{7}x^{-1}$ and $x^3$ are power functions, while $3^x$ is not because, in this latter case, the exponent rather than
the base is the variable.
Example 1. Graphing power functions
Graph each of the following sets of functions and discuss how they differ from one other and what properties they
have in common.
a. $y = x^2$, $y = x^4$, and $y = x^6$.
b. $y = x^3$, $y = x^5$, and $y = x^7$.
c. $y = x^{1/2}$, $y = x$, and $y = x^{3/2}$.
d. $y = \frac{1}{x}$ and $y = \frac{1}{x^2}$.
Solution.
a. Graphing $y = x^2$, $y = x^4$, and $y = x^6$ gives

All of these graphs tend to “bend” upward and are “U-shaped.” All three of these graphs intersect at
the points (0, 0), (−1, 1) and (1, 1). On the interval [−1, 1] the function with the smallest exponent grows
most rapidly as you move away from x = 0, and on the intervals (−∞, −1) and (1, ∞) the function with
the largest exponent grows most rapidly as you move away from x = 0.
b. Graphing $y = x^3$, $y = x^5$, and $y = x^7$ gives
All of these graphs are “seat shaped”, bending downward for negative x and bending upward for positive
x. All three of these graphs intersect at the points (0, 0), (−1, −1) and (1, 1). On the interval [−1, 1]
the function with the smallest exponent grows most rapidly, and on the intervals (−∞, −1) and (1, ∞) the
function with the largest exponent grows most rapidly.
c. Graphing $y = x^{1/2}$, $y = x$, and $y = x^{3/2}$ gives
We graphed over the domain [0, ∞) of $y = x^{1/2}$ and $y = x^{3/2}$ (these two functions are only real for x ≥ 0).
All of these graphs increase as x increases, and pass through the points (0, 0) and (1, 1). The graph of
$x^{1/2}$ becomes steeper and steeper as x approaches 0, while the graph of $x^{3/2}$ becomes flatter and flatter. Moreover, the
graph of $x^{1/2}$ bends downward, while the graph of $x^{3/2}$ bends upward.
d. Graphing $y = \frac{1}{x}$ and $y = \frac{1}{x^2}$ gives
Both of these functions “blow up” (i.e. have vertical asymptotes) at x = 0 and pass through the point
(1, 1). Parts of the graphs lying above the x-axis bend upwards, while parts lying below bend downwards.
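The ordering of the even powers described in part a can be checked with a few numbers. The sketch below tabulates x², x⁴, and x⁶ at one point inside [−1, 1] and one point outside it; the sample points 0.5 and 2 and the helper name are arbitrary choices of ours.

```python
def powers_at(x, exponents=(2, 4, 6)):
    """Return {n: x**n} for each exponent n."""
    return {n: x ** n for n in exponents}


inside = powers_at(0.5)    # a point inside [-1, 1]
outside = powers_at(2.0)   # a point outside [-1, 1]
crossing = powers_at(1.0)  # all three graphs meet at (1, 1)

print("x = 0.5:", inside)
print("x = 2.0:", outside)
print("x = 1.0:", crossing)
```

Inside the interval the smallest exponent gives the largest value, outside the interval the largest exponent does, and the graphs cross where |x| = 1.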
To algebraically manipulate power functions, we need some properties of exponents, which we review
here.
Laws of Exponents
Let x, y, a and b be any real numbers. Then provided that both sides of the equality
are well defined, the following five rules govern the use of exponents:
1. Addition law: $x^a x^b = x^{a+b}$
2. Subtraction law: $\frac{x^a}{x^b} = x^{a-b}$
3. Multiplication law: $(x^a)^b = x^{ab}$
4. Distributive law (exponent over multiplication): $(xy)^a = x^a y^a$
5. Distributive law (exponent over division): $\left(\frac{x}{y}\right)^a = \frac{x^a}{y^a}$
Example 2. Using Laws of Exponents
Simplify the following expressions using the laws of exponents.

a. $\frac{x^2}{x}$
b. $(x^3)^{1/3}\,x$
c. $\left(\frac{1}{\sqrt{x}}\right)^4$

Solution.
a. Since $\frac{1}{x} = x^{-1}$, we obtain
$\frac{x^2}{x} = x^2 x^{-1}$
$= x^{2-1} = x$ by the addition law; note x ≠ 0 implied
b. We have
$(x^3)^{1/3}\,x = x^{3/3}\,x$ by the multiplication law
$= x^1 x^1 = x^2$ by the addition law
c. Since $\sqrt{x} = x^{1/2}$, we have
$\left(\frac{1}{\sqrt{x}}\right)^4 = (x^{-1/2})^4$ by the subtraction law
$= x^{-2}$ by the multiplication law; note x > 0 implied
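A quick numerical spot check can catch slips when simplifying with the laws of exponents. The sketch below evaluates each original expression from this example and its simplified form at a few positive sample values and compares them. The function names and sample values are our own, and agreement at a handful of points is evidence of a correct simplification, not a proof.

```python
import math


def expr_a(x):
    """x^2 / x, which should simplify to x."""
    return x ** 2 / x


def expr_b(x):
    """(x^3)^(1/3) * x, which should simplify to x^2."""
    return (x ** 3) ** (1 / 3) * x


def expr_c(x):
    """(1 / sqrt(x))^4, which should simplify to x^(-2)."""
    return (1 / math.sqrt(x)) ** 4


samples = [0.5, 2.0, 7.3]   # positive values, so every expression is defined
for x in samples:
    print(
        x,
        math.isclose(expr_a(x), x),
        math.isclose(expr_b(x), x ** 2),
        math.isclose(expr_c(x), x ** -2),
    )
```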
Proportionality and Geometric Similarity

In his essay, “On being the right size,” John B. S. Haldane (1892–1964), one of the founders of the field of population
genetics, wrote:
A man coming out of a bath carries with him a film of water of about one-fiftieth of an inch in thickness.
This weighs roughly a pound. A wet mouse has to carry about its own weight in water. A wet fly has to
lift many times its own weight and, as everybody knows, a fly once wetted by water or any other liquid
is in a very serious position indeed.
If you have not thought about these things before, you might wonder how Haldane came up with these conclusions.
Did he go out and weigh men, mice, and flies before and after dipping them in water? Probably not! In fact, these
statements are probably not that precise. For instance, when Professor Schreiber weighed himself before and after
taking a bath, he found that the difference in his weight was less than one-tenth of a pound. The main point of
Haldane’s statement is that the smaller you are, the more dangerous getting wet becomes. To see why, let us perform
a gedankenexperiment (i.e. thought experiment) involving power laws, which are most easily expressed using power
functions and the notion of proportionality.
Proportionality
We say that y is proportional to x if there exists some constant a > 0 such that
y = ax for all x > 0. When y is proportional to x, we write
$y \propto x$
Example 3. Geometric similarity
Imagine a world in which all individuals were cubical critters of different types: one such critter is drawn in
Figure 1.29. The size of each critter can be characterized using one measurement, L meters, which denotes the
length of the critter in any of its three dimensions.
a. Argue that the surface area, S, and volume, V , of the cubical critter are proportional to $L^b$ for appropriate
choices of b.

Figure 1.29: Cute Cubical Critter (a $C^3$)
b. If we assume that these cubical critters are essentially “ugly bags of mostly water,”∗ argue that body
mass, M , is also proportional to $L^b$ for an appropriate choice of b. In your argument, you may use the
fact that 1 m³ of water has a mass of 1,000 kilograms.
Solution.
a. Since the surface area of a cube is $6L^2$ and the volume of a cube is $L^3$,
$S \propto L^2$ and $V \propto L^3$
In other words, surface area is proportional to length squared and volume is proportional to length cubed.
b. Since we are assuming the cubical critters are made of water and the density of water is 1,000 kg/m³,
we get the mass is $M = 1{,}000 \cdot V = 1{,}000 \cdot L^3$. Hence,
$M \propto L^3$
Notice that this proportionality would not change even if we used a different density constant.
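The scaling in this example can be made concrete by doubling the side length of a cubical critter. In the sketch below, the formulas for surface area, volume, and mass come from the solution above, while the function names and the choice of side lengths are ours.

```python
WATER_DENSITY = 1000.0   # kg per cubic meter


def surface_area(side_m):
    """Surface area of a cube with the given side length (square meters)."""
    return 6 * side_m ** 2


def volume(side_m):
    """Volume of a cube with the given side length (cubic meters)."""
    return side_m ** 3


def mass(side_m):
    """Mass of a cube of water with the given side length (kilograms)."""
    return WATER_DENSITY * volume(side_m)


small, large = 0.5, 1.0   # doubling the side length
print("surface area grows by a factor of", surface_area(large) / surface_area(small))
print("volume grows by a factor of", volume(large) / volume(small))
print("mass grows by a factor of", mass(large) / mass(small))
```

Doubling L multiplies the surface area by 2² = 4 and the volume and mass by 2³ = 8, whatever density constant is used.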
To work with proportionality relationships, it is good to remember a few basic rules. Essentially these rules have
the effect that you can treat a proportionality symbol for manipulative purposes like an equality.
Example 4. Rules of proportionality
Demonstrate that proportionality satisfies the following properties:
Transitive Property: If $x \propto y$ and $y \propto z$, then $x \propto z$.
Power-to-Root Property: If $y \propto x^b$ with $b \neq 0$, then $x \propto y^{1/b}$.
General Transitive Property: If $x \propto y^b$ and $y \propto z^c$, then $x \propto z^{bc}$.
Solution.
∗ Star Trek fans may remember this line as an alien’s description of humans that are mostly water encased in a bag of skin. The
“ugly” part is a matter of extraterrestrial taste.

Transitive Property: Since $x \propto y$, there exists a constant a > 0 such that x = ay. Since $y \propto z$, there
exists a constant b > 0 such that y = bz. Therefore,
x = ay = a(bz) = (ab)z
This means that $x \propto z$ with proportionality constant ab.
Power-to-Root Property: If $y \propto x^b$, then there exists a constant a > 0 such that $y = ax^b$. Solving for x in terms of
y yields
$x = \left(\frac{y}{a}\right)^{1/b} = a^{-1/b}\, y^{1/b}$
Hence, $x \propto y^{1/b}$ with proportionality constant $a^{-1/b}$.
General Transitive Property: This property is really just a simple extension of the transitive property, but is easily
demonstrated directly. If $x \propto y^b$ and $y \propto z^c$, then there exist $a_1 > 0$ and $a_2 > 0$ such that $x = a_1 y^b$ and
$y = a_2 z^c$. Therefore, $x = a_1 (a_2 z^c)^b = a_1 a_2^b z^{bc}$. Hence, $x \propto z^{bc}$ with proportionality constant $a_1 a_2^b$.
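The general transitive property can be illustrated numerically: if x is built from y and y from z by power laws, then the ratio of x to z raised to the power bc should be the same for every z. In the sketch below the constants and exponents are arbitrary values chosen for illustration, and the function names are ours.

```python
import math

# Arbitrary positive constants and exponents chosen for illustration.
A1, A2 = 2.0, 3.0
B, C = 1.5, 2.0


def y_from_z(z):
    """y = a2 * z^c"""
    return A2 * z ** C


def x_from_y(y):
    """x = a1 * y^b"""
    return A1 * y ** B


def y_from_x(x):
    """Invert x = a1 * y^b by solving for y (power-to-root step)."""
    return (x / A1) ** (1 / B)


zs = [0.5, 1.0, 2.0, 10.0]
ratios = [x_from_y(y_from_z(z)) / z ** (B * C) for z in zs]
combined_constant = A1 * A2 ** B

print("x / z^(bc) for each z:", ratios)
print("a1 * a2^b:", combined_constant)
```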
Example 5. The dangers of getting wet
To understand the dangers of getting wet, it is reasonable to assume that the mass, W , of the water on your
body of mass M after getting wet is proportional to the surface area, S, of your body.
a. Show that for cubical critters $W \propto M^b$ for an appropriate choice of b.
b. Suppose you had two cubical critters: a man-sized cubical critter with mass 60 kg, and a mouse-sized
cubical critter with mass 0.01 kg. Moreover, assume that when the man gets wet, the mass of water clinging
to his skin is 0.6 kg. Using proportionality, find the mass of water on the mouse. Compare the ratios W/M
for the two critters.
c. Graph the ratio W/M as a function of M and discuss its implications for the danger of getting wet.
Solution.
a. Since we have assumed that $W \propto S$ and $S \propto L^2$, we have $W \propto L^2$ (transitive property). Also, $M \propto L^3$,
so $M^{1/3} \propto L$ (power-to-root property). Thus, from the general transitive property,
$W \propto L^2 \propto (M^{1/3})^2 = M^{2/3}$.
In other words, $W \propto M^b$ for b = 2/3.
b. Since W is proportional to $M^{2/3}$, we know (from the definition of proportionality) that there exists some
number a so that
$W = aM^{2/3}$
The man-sized cubical critter has mass M = 60 with W = 0.6. We now need to find a:
$W = aM^{2/3}$
$0.6 = a \cdot 60^{2/3}$
$a \approx 0.04$
Furthermore,
$\frac{W}{M} = \frac{0.6}{60} = 1\%$
Next, the mouse-sized critter has mass M = 0.01 kg with $W = 0.04(0.01)^{2/3} \approx 0.00186$, so the amount
of water on the mouse is approximately 0.00186 kg. Finally,
$\frac{W}{M} \approx \frac{0.00186}{0.01} \approx 18.6\%$
We see that the wet cubical man has to lift only 1% of his body mass while the wet cubical mouse has to
lift approximately 19% of its body mass.
c. We have
$\frac{W}{M} = \frac{aM^{2/3}}{M} = aM^{-1/3} \approx 0.04M^{-1/3}$
The graph of $y = W/M = 0.04M^{-1/3}$ is shown in Figure 1.30.
Figure 1.30: Graph of $y = 0.04M^{-1/3}$
This graph illustrates that the bigger the creature (i.e. the larger M becomes), the smaller the amount of water it has
to carry relative to its body mass. Hence, getting wet is much worse for a fly than a human.
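The arithmetic of this example can be reproduced in a few lines. The sketch below calibrates the constant a from the man-sized critter, rounds it to two decimals as the text does, and then evaluates the water carried by the mouse-sized critter and the ratio W/M. The function and variable names are our own.

```python
EXPONENT = 2 / 3   # W is proportional to M^(2/3) for cubical critters


def proportionality_constant(mass_kg, water_kg):
    """Solve water = a * mass^(2/3) for a."""
    return water_kg / mass_kg ** EXPONENT


def water_mass(mass_kg, a):
    """Mass of water clinging to a wet critter of the given mass."""
    return a * mass_kg ** EXPONENT


def water_ratio(mass_kg, a):
    """W/M = a * M^(-1/3): water carried relative to body mass."""
    return a * mass_kg ** (-1 / 3)


a_exact = proportionality_constant(60.0, 0.6)   # man-sized critter
a = round(a_exact, 2)                            # rounded as in the text

mouse_water = water_mass(0.01, a)
print("a (unrounded):", a_exact)
print("water on mouse (kg):", mouse_water)
print("W/M for mouse:", water_ratio(0.01, a))
print("W/M for man:", water_ratio(60.0, a_exact))
```

Because W/M falls like M to the power −1/3, making a critter 1,000 times lighter multiplies its relative water load by 10.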
The previous example shows how we can use the notion of geometric similarity to understand how various
physiological attributes scale with body size. We used this notion to consider the implications of getting wet for
critters of vastly different sizes—from humans to flies. In the problem set we pose a counterpoint analysis of how
smaller animals are favored when it comes to the dangers of falling from high places. While it is true that organisms
are often geometrically quite dissimilar, it turns out that analyses using the approximation of geometric similarity are
quite good in many cases. In fact, geometrical similarity is not confined to cubical critters. So long as all dimensions
of an organism scale in the same way, the organisms are geometrically similar. Moreover, for any measurement of
length L (e.g. height, arm length, chest circumference), surface area S (e.g. palm surface area, cross-sectional area
of a muscle), and mass M (e.g. mass of a hair or the entire body), the relationships $S \propto L^2$ and $M \propto L^3$ continue
to hold.
Example 6. Olympic weight lifting
Table 1.8 tabulates the body mass and the winning lift in kilograms for the male gold medalists in the 1988, 1992,
and 1996 Olympic games.∗
In this example, we develop a simple model relating body mass to mass lifted.
∗ The heaviest weight class is excluded, as individuals in this class have no weight restriction and therefore are often not geometrically
similar to their lighter counterparts.

Table 1.8: Body mass vs winning lift (Gold Medal)

1988 1992 1996
Class Mass Lift Class Mass Lift Class Mass Lift
≤ 52 51.85 270.0 ≤ 52 51.8 265 ≤ 54 53.91 287.5
≤56 55.75 292.5 ≤ 56 55.9 287.5 ≤ 59 58.61 307.5
≤60 59.7 342.5 ≤ 60 59.9 320 ≤ 64 63.9 335
≤67.5 67.2 340.0 ≤ 67.5 67.25 337.5 ≤ 70 69.98 375.5
≤75 74.8 375.0 ≤ 75 74.5 357.5 ≤ 76 75.91 367.5
≤82.5 82.15 377.5 ≤ 82.5 81.8 370 ≤ 83 82.06 392.5
≤90 89.45 412.5 ≤ 90 89.25 412.5 ≤ 91 90.89 402.5
≤100 99.7 425.0 ≤ 100 97.25 410 ≤ 99 96.78 420
≤110 109.55 455.0 ≤ 110 109.4 432.5 ≤ 108 107.32 430
Figure 1.31: Geometrically similar weight lifters (heavy weight, middle weight, and light weight)
a. A basic physiological principle is that the strength of a muscle is proportional to the cross-sectional
area of that muscle. Assume that Olympic male weightlifters are geometrically similar as illustrated in
Figure 1.31.
Argue that for a lifter of mass M the amount ℓ he can lift is proportional to $M^b$ for the appropriate value
of b.
b. The relationship $\ell \propto M^b$ implies $\ell = aM^b$ for some a > 0 and the value of b that you obtained in
part a. Find the proportionality constant a by forcing this relationship to pass through the data point
(ℓ, M ) = (287.5, 53.91) (table entry for the category ≤ 54 in year 1996 that leads to a good fit). Plot
$\ell = aM^b$ for the values of a and b that you obtain.
c. Since the power law you find in part a does a relatively good job of predicting lifts as a function of body
weight, you can use it to determine an overall winner amongst the weight classes. Namely, associate a
score
$y = \text{lift}/(\text{body mass})^b$
with each weight lifter and declare the individual with the largest score to be the overall winner. Use this
approach to find the overall winner in each of these Olympics.
Solution.

a. Let L be a measurement of length (e.g. height), M the mass, and S the cross-sectional area of the weightlifter's muscles.
Since we assume that weightlifters are geometrically similar, we have $M \propto L^3$ and $S \propto L^2$. Thus,
$S \propto L^2 \propto (M^{1/3})^2 = M^{2/3}$.
Since we have assumed that $\ell \propto S$, we can conclude that
$\ell \propto M^{2/3}$.
b. We substitute the data point to find
$\ell = aM^{2/3}$
$287.5 = a(53.91)^{2/3}$
$a \approx 20.15$
We plot $\ell = 20.15M^{2/3}$ as shown in Figure 1.32. Notice that it is a remarkable fit.
Figure 1.32: Graph showing data points and graph of $\ell = 20.15M^{2/3}$
c. Calculating the individual scores y for each of the 1988 Olympic lifters in Table 1.8 we get the values
19.42 ($= 270.0/51.85^{2/3}$), 20.04, 22.42, 20.57, 21.12, 19.98, 20.62, 19.77, 19.87. The overall winner here is
the gold medal winner in the third lightest weight class. A quick search shows that this medal winner is
Naim Suleymanoglu. Similarly, in the 1992 Olympics the scores are 19.07, 19.67, 20.90, 20.41, 20.19,
19.64, 20.66, 19.39, 18.91. Again the honor of overall winner goes to Naim Suleymanoglu in the third
lightest weight class! In the 1996 Olympics the scores are 20.15, 20.38, 20.96, 22.11, 20.50, 20.79,
19.91, 19.92, 19.04. While Naim Suleymanoglu in the third lightest weight class gets the second highest
overall score, the honor in the 1996 Olympics goes to another gold medalist, Zhan Xugang in the fourth
lightest weight class.
So who are these two overall winners? Are they worthy of this distinction? A search on the internet
reveals that Naim Suleymanoglu has been nicknamed “Pocket Hercules.” At 16 years old, he set the seniors
world record in the jerk at 160 kg and thus made entry in the Guinness Book of Records as the youngest
world record holder in history. He won three Olympic gold medals for weightlifting,
in 1988, 1992, and 1996 (the first weight lifter ever to win three golds!), and he set his 44th, 45th,
and 46th world records with 185 kg in the jerk, then 332.5 and 335 kg in total. The world’s sports journalists
elected him among the Top 25 Athletes of the Century. In the case of the lightweight, Zhan Xugang,
he blew away his opponents with three world records in the 70-kilogram class to give China its second
weightlifting gold of the Olympics.
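The scoring scheme of part c is straightforward to automate. The sketch below transcribes the body masses and winning lifts from Table 1.8, computes the score lift/(body mass)^(2/3) for each gold medalist, and picks the highest score in each year. The data structure and function names are our own; the numbers are copied from the table as printed.

```python
EXPONENT = 2 / 3

# (body mass in kg, winning lift in kg), lightest to heaviest class, from Table 1.8.
TABLE = {
    1988: [(51.85, 270.0), (55.75, 292.5), (59.7, 342.5), (67.2, 340.0),
           (74.8, 375.0), (82.15, 377.5), (89.45, 412.5), (99.7, 425.0),
           (109.55, 455.0)],
    1992: [(51.8, 265.0), (55.9, 287.5), (59.9, 320.0), (67.25, 337.5),
           (74.5, 357.5), (81.8, 370.0), (89.25, 412.5), (97.25, 410.0),
           (109.4, 432.5)],
    1996: [(53.91, 287.5), (58.61, 307.5), (63.9, 335.0), (69.98, 375.5),
           (75.91, 367.5), (82.06, 392.5), (90.89, 402.5), (96.78, 420.0),
           (107.32, 430.0)],
}


def score(body_mass, lift):
    """Lift divided by body mass to the 2/3 power."""
    return lift / body_mass ** EXPONENT


def overall_winner(entries):
    """Return the (body mass, lift) pair with the highest score."""
    return max(entries, key=lambda entry: score(*entry))


# Constant a from part b: force the curve through the 1996 entry (53.91, 287.5).
a = score(53.91, 287.5)
print("a:", round(a, 2))

for year, entries in TABLE.items():
    body_mass, lift = overall_winner(entries)
    print(year, "winner:", body_mass, "kg lifting", lift, "kg,",
          "score", round(score(body_mass, lift), 2))
```

Note that the constant a from part b is simply the score of the chosen data point, so the score compares each lifter with the fitted curve.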

Allometric scaling
While geometric similarity works wonders, the late Renaissance astronomer Galileo Galilei (1564-1642) observed
that geometric similarity is not universal:
From what has already been demonstrated, you can plainly see the impossibility of increasing the size
of structures to vast dimensions either in art or in nature; likewise the impossibility of building ships,
palaces, or temples of enormous size in such a way that their oars, yards, beams, iron bolts, and, in
short, all their other parts will hold together; nor can nature produce trees of extraordinary size because
the branches would break down under their own weight, so also it would be impossible to build up
the bony structures of men, horses, or other animals so as to hold together and perform their normal
functions if these animals were to be increased enormously in their height; for this increase in height can
be accomplished only by employing a material which is harder and stronger than usual, or by enlarging
the size of the bones, thus changing their shape until the form and appearance of the animals suggest a
monstrosity.
When an animal (or organ or tissue) changes shape in response to size changes differently from what we expect with
geometrical similarity (e.g. the cubical cat), we say that it scales allometrically (allo = different, metric = measure).
Allometric scaling is common in nature, both when comparing two animals of different sizes and when comparing
the same animal at two different sizes (that is, growth). For example, a dog may have huge paws as a puppy, but
the paws grow more slowly than its body. Thomas Huxley (1825-1895) was the first to apply the term broadly in
biology, although it had been previously applied only to animals.
Suppose the size of a particular organ in an individual is measured by the variable x and that y is the size of
another organ. Then the fundamental allometric formula says that y is a power function of x; namely
$y = ax^b$ or $y \propto x^b$
where a and b are constants, called the constants of allometry. The parameter a is the allometry rate and b is
known as the index of origin, a constant representing the initial relation between the two variables.
Example 7. Breaking bones
J. B. S. Haldane wrote in his essay “On being the right size”:∗

... consider a giant man sixty feet high – about the height of Giant Pope and Giant Pagan in the illustrated
Pilgrim’s Progress of my childhood. These monsters were not only ten times as high as Christian, but
ten times as wide and ten times as thick, so that their total weight was a thousand times his, or about
eighty to ninety tons. Unfortunately, the cross sections of their bones were only a hundred times those
of Christian, so that every square inch of giant bone had to support ten times the weight borne by a
square inch of human bone. As the human thigh-bone breaks under about ten times the human weight,
Pope and Pagan would have broken their thighs every time they took a step. This was doubtless why
they were sitting down in the picture I remember. But it lessens one’s respect for Christian and Jack the
Giant Killer.
Haldane’s argument hinges on the key observation that a structure breaks when the load (total weight of the
organism) per unit cross-sectional area exceeds the strength of the material from which the structure is built. To
better understand this argument, consider the following two problems.
a. From physics we know that the force per unit area at the base of a cube, which we denote here using the
symbol K, is given by:
$K = \text{gravitational acceleration} \times \frac{\text{density} \times \text{volume}}{\text{area}}$
Calculate the dimensions of a sugar cube that would crush under its own weight at the surface of the
earth where the gravitational acceleration is 9.81 m/s², given that the sugar cube’s density is 1040 kg/m³
and its crushing strength (the maximum value of K that it can resist) is $5.17 \times 10^6$ Newtons/m².
∗ Oxford University Press, 1985, J. Maynard Smith, Editor.

b. Thomas McMahon collected data on lengths L and diameters D of bones for various cloven-hoofed
animals. If these animals were geometrically similar, we would expect $L \propto D$. However, the data
suggests that $L \propto D^{2/3}$ as illustrated in Figure 1.33 where lengths are measured in millimeters (mm).
Figure 1.33: Lengths L and diameters D in mm of bones for various cloven hoofed animals and fitted curve.
In this data set, the humerus bone of an African impala has a length of 173mm and a diameter of 22.5mm.
Use this information to estimate the length of a wildebeest humerus whose diameter is 42.6mm.
Solution.
a. From the formula, the force per unit area at the base of the sugar cube is K = 9.81 × 1040 × L^3/L^2 = 10202.4 L
Newtons per square meter, where L (in meters) is the length of one side of the base of the cube. Since the crushing strength
of sugar is 5,170,000 Newtons per square meter, the cube gets crushed under its own weight if
10202.4 L ≥ 5,170,000
L ≥ 506.7 meters!
b. Assume that L = aD^(2/3) where L is length and D is diameter measured in mm. For an African impala,
we are given that L = 173 and D = 22.5. Solving for the proportionality constant a yields
173 = a(22.5)^(2/3)
a = 173/(22.5)^(2/3) ≈ 21.7
Using the relationship L = 21.7 D^(2/3) with D = 42.6 yields L ≈ 264.7 mm for the length of the wildebeest
humerus. The actual value from the data set is 256 mm. Hence, our estimate from the scaling law is not too
bad.
□
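The arithmetic in both parts of this solution can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative and not part of the text, and part b keeps the unrounded constant a, so its estimate differs slightly in the last digit from the value obtained with a rounded to 21.7.

```python
# Part a: K = g * density * L**3 / L**2 = g * density * L,
# so the cube crushes once L reaches strength / (g * density).
g = 9.81             # gravitational acceleration in m/s^2
density = 1040.0     # density of sugar in kg/m^3
strength = 5.17e6    # crushing strength in N/m^2

critical_side = strength / (g * density)   # side length in meters
print("critical side length (m):", round(critical_side, 1))

# Part b: fit L = a * D**(2/3) through the impala data point,
# then evaluate the fitted law at the wildebeest diameter.
impala_length, impala_diameter = 173.0, 22.5   # mm
a = impala_length / impala_diameter ** (2 / 3)

wildebeest_diameter = 42.6                     # mm
wildebeest_length = a * wildebeest_diameter ** (2 / 3)
print("a:", round(a, 1))
print("estimated wildebeest humerus length (mm):", round(wildebeest_length, 1))
```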
Problem Set 1.4

Simplify the functions in Problems 1 to 10, and determine whether the functions are power functions. If a function
is a power function, write it in the form y = ax^b.
x
1. a. y = 3
1
b. y = 3x
x
c. y = 3

2. a. y = 10
b. y = x10
c. y = 10x
3. y = 1/3 + 1/x
4. y = (2x + 15)/(5x)
5. y = 1/√(16x^3)
6. y = 5√x/(7x^2)
7. y = 2x 32x 5x
8. y = √(36x)/(6x^5)
9. y = √(144x^3)/(2x^2)
10. y = (2x^3)^2
11. If y ∝ x^2 and y increases from 10^3 to 10^15, what happens to x?
12. If y ∝ 6x and x ∝ t, how does t change when y increases from 2 × 10^2 to 6 × 10^4?
13. If y ∝ 10x^3, how is x proportionally related to y?
14. If x ∝ 100y and y ∝ 45z, then how does z change as x decreases from 95 to 12?
15. If x ∝ y^2 and y ∝ z^3, then how is x proportionally related to z?
16. If x ∝ √y and y ∝ z^2, then how is x proportionally related to z?
Graph the functions in Problems 17 to 22. By inspection, state the intervals where the function is rising, the intervals
where it is falling, and the turning points.
17. y = 2x^2
18. y = (1/8)x^4
19. y = −x^3
20. y = 0.1x^5
21. y = 12x^(1/2)
22. y = 2/x
23. The linear function

y = 3x + b
represents a family of functions whose graphs all look the same except for the relative placement with respect
to the y-axis. On the same coordinate axes, graph the members of this family for the given values of the parameter b.
a. b = 0
b. b = 4
c. b = −3
d. b = √2
24. The quadratic function y = ax^2 represents a family of functions whose graphs all look the same except for
the relative placement with respect to the y-axis. On the same coordinate axes, graph the members of this
family for the given values of the parameter a.

a. a = 0
b. a = 4
c. a = −3
d. a = √2
25. A spherical cell of radius r has volume V = (4/3)πr^3 and surface area S = 4πr^2. Express V as a function of S. If
S is quadrupled, what happens to r?
26. Consider a cylinder of radius r and height 5r. Express the volume V and surface area S of this cylinder as functions of
r. If r is doubled, what happens to the volume? If S is quadrupled, what happens to r?
27. Consider a cone of height h and radius h/2 at the top. Express the volume and surface area of this cone as a
function of h. If r is doubled, what happens to S?
Drug doses for dogs and cats are known to scale with their surface area S. When body mass W is measured in kg,
then surface area S in m^2 is given by
S = (K × W)/100,
where for dogs K = 10.4 and for cats K = 10.1. Further, when converting human drug doses of an average adult to pet
drug doses, the formula
Pet’s drug dose = (Pet’s S / 1.73) × Human adult drug dose
is used.
In Problems 28 to 33, the human adult dose of a drug is given. Calculate the drug dose (rounded to the nearest
milligram) that you would give your dog or cat of the indicated weight.
28. 100 mg of aspirin and your dog weighs 7 kg

29. 200 mg of aspirin and your cat weighs 4.6 kg
30. 250 mg of an antibiotic and your dog weighs 16 kg
31. 500 mg of a renal drug and your cat weighs 5.3 kg
32. 50 mg of an anticoagulant and your dog weighs 31 kg
33. 50 mg of an anticoagulant and your cat weighs 4.8 kg
34. An ant weighs approximately 1/500 ounce and can lift 1/5 ounce which is approximately 100 times its weight.
Assume that strength is proportional to the cross section of a muscle and that all organisms on earth (ants and
men) are geometrically similar. Using these assumptions, determine how much a 150-lb man on earth can lift.
35. A DC comic explained Superman’s strength by stating that on Krypton an organism’s strength is directly
proportional to its body mass. Based on this assumption and assuming that Krypton ants are like earth ants
(see Problem 34), how much can a 150 pound man on Krypton lift?
36. In a sample of sixty-two species, the leaf area, A, was found to be related to the stem diameter, d, according
to the relationship
A ∝ d^1.85
Write this as an equation, select a scaling factor, and then sketch its graph.
37. In a sample of 26 species of trees, wood density, D, is related to breaking strength, S, according to the
relationship
D ∝ S^0.91
Write this as an equation, select a scaling factor, and then sketch its graph.

38. In Julian Huxley’s classic book Problems of Relative Growth (1932) there is data showing the relationship
between the mass of the large claw (chela) and that of the rest of the body in the male fiddler crab (Uca
pugnax ) which exhibits an allometry rate of approximately 1.6. Graph this relationship for a large claw mass
of 0 ≤ x ≤ 800 mg assuming that the initial population is one crab.
39. In 1936, Sinnott showed that there is an allometric relationship between the length and width of gourds, when
observed from ovary to maturity.∗ He obtained the rates of m = 0.95 for pumpkins (Cucurbita pepo) to m = 2.2
for the snake gourd (Trichosanthes). Graph these relationships for an initial population of 1 plant.
40. Professor Schreiber’s house (10m wide, 20m long, 4m high - just a hovel, really) has a 30,000 watt furnace that
just barely keeps him warm on cold winter nights. He’s thinking of building a larger house to accommodate his
growing insect collection, and needs advice on the output of the new furnace. The new house will be 3 times
as high, 3 times as wide, and 3 times as long.
a. If he assumes that the furnace size should be proportional to the volume of the house, then what size
furnace should he install?
b. If heat loss depends on the surface area of exterior walls, roof, and floor exposed to the winter cold
rather than on the volume of the house, then what size furnace would you recommend?
41. Consider the following quote from Gulliver’s Travels by Jonathan Swift:
The reader may please to observe, that, in the last article of the recovery of my liberty, the emperor
stipulates to allow me a quantity of meat and drink sufficient for the support of 1724 Lilliputians.
Some time after, asking a friend at court how they came to fix on that determinate number, he
told me that his majesty’s mathematicians, having taken the height of my body by the help of a
quadrant, and finding it to exceed theirs in the proportion of twelve to one, they concluded from the
similarity of their bodies, that mine must contain at least 1724 of theirs, and consequently would
require as much food as was necessary to support that number of Lilliputians. By which the reader
may conceive an idea of the ingenuity of that people, as well as the prudent and exact economy of
so great a prince.
Let F denote the amount of food an individual eats and L the height of an individual. This quotation implicitly
assumes that F ∝ L^b for an appropriate choice of b. Find this b value and provide a biological explanation for
this choice of b.
42. Suppose the main loss of energy is heat loss through the surface. For the quotation in Problem 41, determine
the appropriate choice of b so that F ∝ L^b. Under this assumption, how much should the Lilliputians feed
Gulliver?
43. The following quote from Haldane illustrates the dangers of being large:∗
To the mouse and any smaller animal, [gravity] presents practically no dangers. You can drop a
mouse in a thousand-yard mine shaft; and, on arriving at the bottom, it gets a slight shock and
walks away. A rat would be probably killed, though it can fall safely from the eleventh story of
a building; a man is killed, a horse splashes. For the resistance presented to movement by air is
proportional to the surface of a moving object. Divide an animal’s length, breadth, and height each
by ten; its weight is reduced to a thousandth, but its surface only to a hundredth. So the resistance
to falling in the case of the small animal is relatively ten times greater than the driving force.
Consider a cubical critter being dropped down a mine shaft. Let A denote the force due to air resistance that
the cubical critter experiences and let M denote the critter’s weight. Assume that A is proportional to surface
area and M is proportional to volume.
a. Determine the value of b for which M/A ∝ M^b.
b. Graph y = M^b and discuss the implications for a falling cubical critter.
∗ Differential Growth, Huxley’s Allometric Formula and Sigmoid Growth, by Roger V. Jean. UMAP Module 635, p. 421.
∗ Oxford University Press, 1985, J. Maynard Smith, Editor.

1.5 Exponentials and Logarithms

Without doubt, the linear function y = ax + b is the most important elementary function in mathematics. In
the context of calculus, its importance is equalled only by the functions we introduce in this section, the exponential
function and, its close relative, the logarithmic function. Just why these functions are so important in calculus will
become apparent once we introduce the concept of a derivative. In this section, we show that the exponential function
is suitable for describing how populations, income, beer froth and the radioactivity of unstable isotopes change over
time. We then introduce the logarithmic function and its applications to solving problems that arise when modeling
natural processes using the exponential function.
Exponential growth
Figure 1.34 shows the growth of the population of the United States from 1815 until 1895.
Year Population (in millions)
1815 8.3
1825 11.0
1835 14.7
1845 19.7
1855 26.7
1865 35.2
1875 44.4
1885 55.9
1895 68.9
Figure 1.34: Population of the United States (horizontal axis: year; vertical axis: population in millions)
You may notice from the graph that the population seems to be growing rapidly. To obtain a better sense of
population growth, we can divide the size of the population in any given year by the size of the population one
decade earlier. For example,
Population in 1825 / Population in 1815 = 11 / 8.3 ≈ 1.3253
and
Population in 1835 / Population in 1825 = 14.7 / 11 ≈ 1.3364
These calculations tell us that the population increased by approximately 33% over both periods. Let us
assume that the population increases by 33% every decade. If t denotes the number of decades that have elapsed
since 1815, then we can estimate the population size N(t) at time t by the exponential function
N(t) = 8.3(1.33)^t
The graph of N (t) is plotted in Figure 1.34 against the data, and reasonably approximates the data until 1880, after
which it begins to overestimate the population size. Notice that this function differs from power functions in that
the independent variable is in the exponent instead of the base.
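To see how the model tracks the census figures, one can tabulate N(t) next to the data of Figure 1.34. The following is a minimal sketch; the dictionary `census` and the function name `N` are illustrative choices, and the numbers are those listed with the figure.

```python
# Census data from Figure 1.34: year -> population in millions.
census = {
    1815: 8.3, 1825: 11.0, 1835: 14.7, 1845: 19.7, 1855: 26.7,
    1865: 35.2, 1875: 44.4, 1885: 55.9, 1895: 68.9,
}

def N(t):
    """Model population (millions) t decades after 1815, growing 33% per decade."""
    return 8.3 * 1.33 ** t

for year, observed in census.items():
    t = (year - 1815) / 10           # convert the year to decades since 1815
    print(year, observed, round(N(t), 1))
```

Comparing the two columns shows the model staying close to the data for the early decades and running above it by the final decade.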

Example 1. Malthus’ estimate for doubling time
In Chapter 2 of An Essay on the Principle of Population (1798), Thomas Malthus wrote

In the United States of America, where the means of subsistence have been more ample, the manners of
the people more pure, and consequently the checks to early marriages fewer, than in any of the modern
states of Europe, the population has been found to double itself in twenty-five years.
Let N(t) = 8.3(1.33)^t be our model of population growth in the United States from 1815 onwards.
a. Determine whether the population size doubles from 1815 until 1840. Recall that the units of t are
decades.
b. Determine whether the population size doubles over any 25-year period.
Solution.
a. Since t is measured in decades after 1815, the year 1840 corresponds to t = 2.5. To determine whether the population doubles between
1815 and 1840, we compute the ratio of the population sizes in those years:
N(2.5)/N(0) = 8.3(1.33)^2.5 / 8.3 = 1.33^2.5 ≈ 2.04
b. Consider any time t. To determine whether the population doubles between t and t + 2.5, we compute
the ratio of the population sizes in those years:
N(t + 2.5)/N(t) = 8.3(1.33)^(t+2.5) / (8.3(1.33)^t) = 1.33^2.5 ≈ 2.04 (using laws of exponents)
We see that Malthus’ prediction conforms reasonably well with our model. Notice that we could not test
the prediction directly with data as it is only reported in ten-year intervals.
□
Example 1 illustrates a key property of exponential functions. Namely, if f(x) = a^x, then for any h > 0
f(x + h)/f(x) = a^(x+h)/a^x = a^h
In other words, over any interval of length h, the exponential function changes by a fixed factor a^h. In the case
of Example 1, this observation implies that the population approximately doubles over any twenty-five year period.
The study of exponential growth in comparison to linear growth also gives us a sense of the urgency of Malthus’
recommendations.
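The fixed-factor property is easy to confirm numerically. In the sketch below, the function name `f`, the base 1.33, and the sample starting points are illustrative assumptions chosen to match Example 1; the ratio over an interval of length h = 2.5 comes out the same wherever the interval starts.

```python
def f(x, a=1.33):
    """Exponential function with base a."""
    return a ** x

h = 2.5                                   # interval length (2.5 decades)
starts = [0.0, 1.0, 3.7, 8.0]             # arbitrary starting points
ratios = [f(x + h) / f(x) for x in starts]

for x, r in zip(starts, ratios):
    print(x, round(r, 4))                 # each ratio equals 1.33**2.5
```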
Example 2. Exponential growth vs linear growth
In Chapter 2 of An Essay on the Principle of Population (1798), Thomas Malthus wrote

Let us then take this for our rule, though certainly far beyond the truth, and allow that, by great exertion,
the whole produce of the Island might be increased every twenty-five years, by a quantity of subsistence
equal to what it at present produces. The most enthusiastic speculator cannot suppose a greater increase
than this. In a few centuries it would make every acre of land in the Island like a garden.
Let N(t) = 8.3(1.33)^t (in millions) be the population size t decades after 1815. Assume that in 1815, the amount
of food produced in this year is equivalent to 10 million full yearly rations (which is more than sufficient to feed the
1815 U.S. population of 8.3 million individuals). Further, assume, as predicted by Malthus, that the production of
food in the U.S. will increase every 25 years by 10 million full yearly rations.

a. Write a formula for the number R(t) of full yearly rations (in millions) produced over time (remember
the units of t are decades after 1815 ).
b. Graph R(t) and N (t) on the same coordinate plane.
c. Determine the first year in which there is just enough food to provide everyone with one full yearly ration.
d. Determine the year when the amount of food is sufficient to supply everyone with no more than half a
yearly ration (or, equivalently, is sufficient to feed full rations to only half the population.)
Solution.
a. As the amount of full yearly rations increases by 10 million every 25 years (2.5 decades), R(t) is a linear function with
slope 10/2.5 = 4. Since R(0) = 10, the intercept of this linear function is 10 and we have
R(t) = 10 + 4t
b. Using technology to plot R(t) and N(t) gives the following graph.
c. By inspection, it looks like the graphs of N(t) and R(t) intersect at t = 4. Hence in 1855, 40 years after 1815, every
individual in the population will get precisely one full yearly ration.
d. We wish to know when the ratio R(t)/N(t) takes on the value 0.5. We use technology to illustrate this
by plotting the curves y = 0.5 and y = R(t)/N(t) = (10 + 4t)/(8.3(1.33)^t), as depicted in the figure below, and looking for
the point of intersection:

From the graph we see that the curves intersect just beyond t = 8. This means that in approximately
80 years, every individual in the population will have to live on a meager half-ration of food per day,
or half of the population will get a full daily ration and the rest will get nothing. Given that both of
these scenarios are rather miserable, this observation of Malthus has been dubbed by some as a “law of
misery.” Other “laws of misery” can be found in Malthus’ Essay.
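The two intersection points read off the graphs in parts c and d can also be located numerically. The sketch below uses a simple bisection search; the helper `bisect` and the bracketing intervals [2, 6] and [6, 10] are assumptions chosen by inspecting the values of R(t) and N(t), not something prescribed by the text.

```python
def N(t):
    """Population in millions, t decades after 1815."""
    return 8.3 * 1.33 ** t

def R(t):
    """Full yearly rations in millions, t decades after 1815."""
    return 10 + 4 * t

def bisect(g, lo, hi, tol=1e-9):
    """Find a root of g on [lo, hi], assuming g changes sign on the interval."""
    if g(lo) * g(hi) > 0:
        raise ValueError("g must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid          # the sign change is in the left half
        else:
            lo = mid          # the sign change is in the right half
    return (lo + hi) / 2

t_full = bisect(lambda t: R(t) - N(t), 2, 6)          # rations equal population
t_half = bisect(lambda t: R(t) / N(t) - 0.5, 6, 10)   # half a ration per person

print("one ration each at t =", round(t_full, 2), "decades")
print("half a ration each at t =", round(t_half, 2), "decades")
```

Both roots agree with the graphical estimates: the first lies very close to t = 4 and the second just beyond t = 8.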
Exponential Decay
The Ig Nobel Prize is annually awarded to scientists who firstly make people laugh, and secondly make them think.
Dr. Arnd Leike, professor of physics at Universität München, won the 2002 Ig Nobel Prize in Physics for his paper,
“Demonstration of the Exponential Decay Law Using Beer Froth.”∗ After pouring a mug full of the German beer
Erdinger Weissbier, Dr. Leike measured the height of the beer froth at regular time intervals. The measured values
are shown in Table 1.9.
Table 1.9: Froth Height Decay

time t (seconds) froth height H (cm)
0 17
15 16.1
30 14.9
45 14
60 13.2
75 12.5
90 11.9
105 11.2
120 10.7
If we consider the ratios of heights at successive time intervals, we find
Height at 45 seconds / Height at 30 seconds = 14 / 14.9 ≈ 0.94
and
Height at 60 seconds / Height at 45 seconds = 13.2 / 14 ≈ 0.94
Note that 0.94 represents a 6% decay. If we assume, as the data suggest, that every 15 seconds the height of the froth decays
by 6%, then we can write an expression (formula) for the froth height and see how well it fits the data.
∗ European Journal of Physics 23 (2002) 21–26.
Example 3. Modelling beer froth
Find values for the parameters a and b of the function
H(t) = ab^t
that ensure the function passes through the first data point in Table 1.9 and that the height of the froth declines 6%
every 15 seconds. Use technology to graph H(t) alongside the data. How well does the function fit the data?
Solution. Since the initial height of the froth is 17 cm and H(0) = ab^0 = a, we set a = 17. On the other hand,
assuming that the froth decays by 6% every 15 seconds means that
0.94 = H(15)/H(0) = ab^15/a = b^15
Hence b = 0.94^(1/15) ≈ 0.99588. Therefore, we have (in cm)
H(t) = 17(0.99588)^t
The graph is shown in Figure 1.35 and appears to fit the data very well.
Figure 1.35: Froth height equation plotted with data points (horizontal axis: time in seconds; vertical axis: froth height in cm)
One way of understanding this exponential decay is to think of the froth as a large collection of bubbles. According
to our calculations, approximately every 15 seconds, 6% of the bubbles will pop, leaving only 94% of the original
head of froth. As the bubbles continue to pop, there are fewer and fewer that can pop and, consequently, as shown
in Figure 1.35, the number of bubbles left to pop declines to zero over time in a way that seems to be modeled rather
well by a function that has a variable appearing as the exponent of some base value. For this reason, the decline is
called exponential decay.
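As a rough check on the quality of the fit, the model can be evaluated at the measurement times of Table 1.9 and compared with the recorded heights. This is a sketch only; the list names and the use of the largest absolute error as a summary are illustrative choices, not part of Dr. Leike's analysis.

```python
# Data from Table 1.9: time in seconds, froth height in cm.
times = [0, 15, 30, 45, 60, 75, 90, 105, 120]
heights = [17, 16.1, 14.9, 14, 13.2, 12.5, 11.9, 11.2, 10.7]

a = 17.0                  # initial froth height, so that H(0) = 17
b = 0.94 ** (1 / 15)      # per-second factor giving a 6% loss every 15 seconds

def H(t):
    """Model froth height (cm) after t seconds."""
    return a * b ** t

errors = [abs(H(t) - h) for t, h in zip(times, heights)]
max_error = max(errors)

for t, h in zip(times, heights):
    print(t, h, round(H(t), 2))
print("largest absolute error (cm):", round(max_error, 2))
```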
Exponential Functions and the Number e

In the previous section, we introduced power functions y = ax^b, characterized by a variable base raised to some fixed
power. In this section, we encountered functions where the exponent is variable and the base is fixed. Such functions
are termed exponential functions.

Exponential Function
An exponential function is a function of the form
y = f(x) = b^x
where the parameter b (the base) is a positive real number and the variable x (the
exponent) is a real number.
The graphs of exponential functions have three different shapes, depending on the value of the base, as shown in
the following example.
Example 4. Graphing exponential functions
Graph the exponential function

y = a^x
where a > 1, a = 1, and 0 < a < 1. Show these graphs on the same coordinate axes, and comment on each.
Solution. The graph is shown in Figure 1.36.
Figure 1.36: Graph for y = a^x
The graph of y = a^x passes through (0, 1) for all values of a. We also notice that:
• if a > 1, the graph is increasing for all x;
• if a = 1, the graph is a horizontal line (a constant function);
• if 0 < a < 1, the graph is decreasing for all x.
□
In Section 1.2 we discussed the set of real numbers, which is made up of both the rational and irrational numbers.
We discussed the fact that √2 is an irrational number, and as such, cannot be represented as a terminating or
repeating decimal. There are two other famous irrational numbers, π and e. These numbers are so important
that they are assigned keys on your calculator. Here is what you should see if you press the appropriate keys:
π = 3.14159265359 . . . and e = 2.71828182846 . . .. Keep in mind that these calculator values are decimal approximations
for the numbers π and e. The following is a list of Top 10 reasons (à la a David Letterman countdown) why e is
better than π.

Top 10 reasons why e is better than π
10. e is easier to write than π.
9. e = 1 + 1/1! + 1/2! + 1/3! + · · · while π = 2(1 + 1/3 + (1·2)/(3·5) + (1·2·3)/(3·5·7) + · · ·).
8. e is on your keyboard while π is not.
7. Everybody fights for their piece of the π.
6. e is easier to spell than pi.
5. e is the most commonly picked vowel in Wheel of Fortune.
4. e stands for Euler’s number (big stuff) but π stands for squat.
3. e is used in calculus while π is used in baby geometry.
2. You don’t need to know Greek to be able to pronounce e.
1. (Drum roll ... ) You can’t confuse e with a food product.
Example 5. Naïve approach to solving an exponential equation
Graph y = e^x and y = 10,000 to solve the equation e^x = 10,000.
Solution. Graphing y = e^x and y = 10,000 on the same axes for 0 ≤ x ≤ 10 and
estimating the x value at which the intersection occurs gives x ≈ 9. □
Logarithms
In Example 5, we needed to solve an exponential equation of the form e^x = 10,000. We solved this equation by
graphing, but logarithms give us a way to solve these equations analytically. If asked, for example, to solve
2^x = 8, we quickly respond x = 3. However, what is the solution to the equation 2^x = 14? We express the idea in
words:
x is the exponent on a base 2 that gives the answer 14
This can be abbreviated as
x = exp_2 14
For historical reasons we use the word “logarithm” for “exponent” and now write this shortened notation as
x = log_2 14
This statement is read, “x is the log (exponent) on the base 2 which gives 14.” For example, 5^2 = 25 can be
rewritten as “2 is the log (exponent) on the base 5 which gives 25” or “2 = log_5 25” and 2^(−3) = 1/8 can be rewritten
as −3 = log_2 (1/8). This leads us to the following definition of logarithm.

Logarithm
Let b and x be positive real numbers, b ≠ 1. Then
y = log_b x means b^y = x
y is called the logarithm on base b and x is called the argument.
The statement “y = log_b x” should be read as “y is the exponent on a base b that gives the value x.” Do not
forget that a logarithm is an exponent.
Example 6. Using the definition of logarithm
Find x such that
a. x = log_2 16
b. x = log_4 16
c. log_10 x = 3
d. log_e x = 2
Solution.
a. “x is the exponent on a base 2 that gives 16”; since 2^4 = 16, x = 4.
b. “x is the exponent on a base 4 that gives 16”; since 4^2 = 16, x = 2.
c. “3 is the exponent on a base 10 that gives x”; x = 10^3 = 1,000.
d. “2 is the exponent on a base e that gives x”; x = e^2.
In elementary work, the most commonly used base is 10, so we call a logarithm to the base 10 a common
logarithm, and agree to write it without using a subscript 10. Thus, part c of the previous example is usually
written log x = 3. In most biological applications dealing with natural growth or decay, the base e is more common.
A logarithm to the base e is called a natural logarithm and is denoted by ln x. The expression ln x is often
pronounced “ell en ex” or “lawn ex”. In some texts, especially those pertaining to information theory in computer
science, the function log_2 x is of theoretical importance and is written simply as lg x.

Logarithmic Notations
a. Common logarithm: log x means log_10 x.
b. Natural logarithm: ln x means log_e x.
To evaluate a logarithm means to find a decimal approximation. You should find the keys labeled LOG and
LN on your calculator. Verify the following calculator evaluations using your own calculator:
log 5.03 ≈ 0.7015679851 ln 3.49 ≈ 1.249901736 log 0.00728 ≈ −2.1378621
The following properties of logarithms follow immediately from the properties of exponents and the definition of
logarithms.

Laws of Logarithms
Additive law: log_b x + log_b y = log_b (xy)
Subtractive law: log_b x − log_b y = log_b (x/y)
Multiplicative law: y log_b x = log_b (x^y)
Change of base: log_a x = (log_b x)/(log_b a)
Grant’s tomb properties:
log_b b^x = x
b^(log_b x) = x, x > 0
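A quick numerical spot check of these laws may help build confidence in them. The sketch below uses Python's standard `math` module; the helper `log_base` and the sample values of a, b, x, and y are arbitrary illustrative choices, and a check at one set of values is of course not a proof.

```python
import math

def log_base(base, value):
    """Logarithm of value on the given base, via the change-of-base law."""
    return math.log(value) / math.log(base)

a, b, x, y = 5.0, 2.0, 7.0, 3.0   # arbitrary positive values with a, b != 1

checks = {
    "additive": math.isclose(log_base(b, x) + log_base(b, y), log_base(b, x * y)),
    "subtractive": math.isclose(log_base(b, x) - log_base(b, y), log_base(b, x / y)),
    "multiplicative": math.isclose(y * log_base(b, x), log_base(b, x ** y)),
    "change of base": math.isclose(log_base(a, x), log_base(b, x) / log_base(b, a)),
    "log of power": math.isclose(log_base(b, b ** x), x),
    "power of log": math.isclose(b ** log_base(b, x), x),
}

for law, holds in checks.items():
    print(law, holds)
```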
Example 7. Graphing logarithmic functions
Use technology to graph the logarithmic functions y = log x, y = ln x, and y = log2 x on the same coordinate
axes. Discuss the common properties of these graphs.
Solution. The graphs (using technology) are shown in Figure 1.37.
Figure 1.37: Graphs of logarithmic functions
In all cases, the function has a domain of (0, ∞) and a range of (−∞, ∞), that is, the real number line R. Each
graph has x-intercept (1, 0) and a vertical asymptote at x = 0, and each function is increasing and concave down. □
Example 8. Solving exponential equations
Approximate the solutions to two decimal places.
a. 10^x = 0.5
b. e^x = 10,000
c. 1.33^t = 2
d. ln(2x) = 1
e. log_2 4^x = 3
Solution. Be sure to duplicate the results below using your own calculator.
a. This means x is the exponent on a base ten which gives 0.5; in symbols, x = log 0.5. Then, evaluate with
your calculator to find x = log 0.5 ≈ −0.30.
b. Using the definition, x = ln 10,000 ≈ 9.21.
c. We note t is the exponent on a base 1.33 which gives 2. That is,
t = log_1.33 2 = log 2 / log 1.33 ≈ 2.43
d. We have
ln(2x) = 1
e^1 = 2x Definition of logarithm
x = e/2 ≈ 1.36
e. We have
log_2 4^x = 3
x log_2 4 = 3 Multiplicative law
x · 2 = 3 Since log_2 4 = 2
x = 3/2 = 1.50
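In place of a calculator, the five answers can be reproduced with Python's `math` module. This is a sketch with illustrative variable names; for part e it uses the fact that log_2 4 = 2, so that 2x = 3 and x = 3/2.

```python
import math

x_a = math.log10(0.5)                  # 10^x = 0.5
x_b = math.log(10_000)                 # e^x = 10,000 (math.log is the natural log)
t_c = math.log(2) / math.log(1.33)     # 1.33^t = 2, by change of base
x_d = math.e / 2                       # ln(2x) = 1, so 2x = e
x_e = 3 / math.log2(4)                 # log_2 4^x = 3, so x * log_2 4 = 3

for label, value in [("a", x_a), ("b", x_b), ("c", t_c), ("d", x_d), ("e", x_e)]:
    print(label, round(value, 2))
```

Substituting each value back into its equation (for example, checking that 4 ** x_e equals 8) is a useful habit when working with logarithms.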
Logarithmic functions are key to finding half lives of exponentially decaying quantities and to finding doubling
times for exponentially growing quantities.
Example 9. Half-life and doubling time
a. An important quantity associated with exponential decay is the half-life, the time it takes half of the
substance to decay. Let H(t) = 17(0.99588)^t denote the height of the beer froth at time t seconds. Find
the time at which half of the froth has been lost.
b. In Example 1 we estimated the doubling time of the population model N(t) = 8.3(1.33)^t as about 25 years. Find a more precise estimate by solving
the equation N(t)/N(0) = 2 for the unknown t.
Solution.

a. To find the half-life, we want to find t such that H(t) = 0.5H(0). Equivalently, H(t)/H(0) = 0.5.
H(t)/H(0) = 0.5 Given equation
(0.99588)^t = 0.5 Evaluate functions
t = log_0.99588 0.5 Definition of logs
= log 0.5 / log 0.99588 Change of base
≈ 167.89 Evaluate
It takes almost three minutes for the froth to decay to half its initial height! No wonder this is Dr. Leike’s favorite beer.
b.
N(t)/N(0) = 2 Given equation
1.33^t = 2
ln 1.33^t = ln 2
t ln 1.33 = ln 2 Multiplicative law
t = ln 2 / ln 1.33 ≈ 2.431 Evaluate
Since t is in decades, the doubling of the population occurs approximately in 24 years and 4 months.
□
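Both parts of this example solve an equation of the form base^t = factor, so a single helper covers them. The function name `time_to_factor` below is a hypothetical choice for this sketch, which simply applies the change-of-base law.

```python
import math

def time_to_factor(base, factor):
    """Solve base**t = factor for t using t = ln(factor) / ln(base)."""
    return math.log(factor) / math.log(base)

half_life_seconds = time_to_factor(0.99588, 0.5)   # beer froth half-life
doubling_decades = time_to_factor(1.33, 2)         # population doubling time
doubling_years = 10 * doubling_decades

print("froth half-life (s):", round(half_life_seconds, 2))
print("doubling time (decades):", round(doubling_decades, 3))
print("doubling time (years):", round(doubling_years, 1))
```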
Logarithms provide the perfect tool for fitting power functions y = ax^b to data.
Example 10. Linear regression on a logarithmic scale
The metabolic rate of an organism is the rate at which it builds up (anabolism) and breaks down (catabolism)
the organic material that constitutes its body. A famous data set exhibiting an allometric scaling law relating
metabolic rate y to body mass x was first published by Max Kleiber and is reproduced here in Table 1.10.∗
Since the data should exhibit allometry, we would expect that there exist reals a > 0 and b such that
y = ax^b
a. Convert the equation y = ax^b to a linear equation in the variables ln y and ln x using logarithms.
b. Apply ln to all of the data in Table 1.10 and use technology to find the best-fitting line for the converted
data.
c. One data point missing from Table 1.10 is for the elephant. Use the best-fitting line to estimate the
metabolic rate of an African elephant with mass 6,800 kilograms.
Solution.
a. Taking the natural logarithm of both sides of y = ax^b and applying logarithmic rules yields
y = ax^b
ln y = ln(ax^b)
= ln a + ln x^b
= ln a + b ln x
Hence, Y = ln y is a linear function of X = ln x.

∗ Source: M. Kleiber, The Fire of Life, 1961, pg. 205

Table 1.10: Metabolic data

Animal Weight (kg) Metabolic rate (kcal/day)
Mouse 0.021 3.6
Rat 0.282 28.1
Guinea-pig 0.410 35.1
Rabbit 2.980 167
Rabbit 2 1.520 83
Rabbit 3 2.460 119
Rabbit 4 3.570 164
Rabbit 5 4.330 191
Rabbit 6 5.330 233
Cat 3.0 152
Monkey 4.200 207
Dog 6.6 288
Dog 2 14.1 534
Dog 3 24.8 875
Dog 4 23.6 872
Goat 36 800
Chimpanzee 38 1,090
Sheep 46.4 1,254
Sheep 2 46.8 1,330
Woman 57.2 1,368
Woman 2 54.8 1,224
Woman 3 57.9 1,320
Cow 300 4,221
Cow 2 435 8,166
Cow 3 600 7,877
Heifer 482 7,754
b. Taking logarithms of the masses and metabolic rates in Table 1.10 and plotting the new data yields the
red dots in Figure 1.38. This figure illustrates that the data on a logarithmic scale appear linear.
Figure 1.38: Metabolic rates on a logarithmic scale with the best-fitting line
As before, we use technology to find the best-fitting line:
ln y = 0.755917 ln x + 4.20577
The ln y-intercept is (0, 4.20577) and the slope is 0.755917 ≈ 3/4. There have been many theoretical
attempts to explain this scaling exponent.

c. To predict the metabolic rate, y, for an elephant of mass x = 6,800, we substitute this x-value into the
equation for the best-fitting line and solve for y.
ln y = 0.755917 ln x + 4.20577
ln y = 0.755917 ln 6,800 + 4.20577
ln y ≈ 10.8765
y ≈ e^10.8765 ≈ 52,918
The elephant will burn off approximately 53,000 kilocalories per day.
□
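The "use technology" step of this example can be sketched in plain Python with the ordinary least-squares formulas applied to the log-transformed data. The list `kleiber` below is transcribed from Table 1.10 as (mass in kg, kcal/day), and the names `slope`, `intercept`, and `predict` are illustrative; the fitted values should land close to the line reported in the solution, though small differences in the last digits are possible.

```python
import math

# (mass in kg, metabolic rate in kcal/day), transcribed from Table 1.10.
kleiber = [
    (0.021, 3.6), (0.282, 28.1), (0.410, 35.1), (2.980, 167), (1.520, 83),
    (2.460, 119), (3.570, 164), (4.330, 191), (5.330, 233), (3.0, 152),
    (4.200, 207), (6.6, 288), (14.1, 534), (24.8, 875), (23.6, 872),
    (36, 800), (38, 1090), (46.4, 1254), (46.8, 1330), (57.2, 1368),
    (54.8, 1224), (57.9, 1320), (300, 4221), (435, 8166), (600, 7877),
    (482, 7754),
]

# Transform to the log scale: X = ln x, Y = ln y.
X = [math.log(mass) for mass, _ in kleiber]
Y = [math.log(rate) for _, rate in kleiber]

# Ordinary least squares for the line Y = slope * X + intercept.
n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
         / sum((x - mean_x) ** 2 for x in X))
intercept = mean_y - slope * mean_x

def predict(mass):
    """Predicted metabolic rate (kcal/day) from the fitted power law y = a * x**b."""
    return math.exp(intercept + slope * math.log(mass))

print("slope b:", round(slope, 4))
print("intercept ln a:", round(intercept, 4))
print("elephant (6,800 kg):", round(predict(6800)))
```

Fitting on the log scale means the scaling factor is recovered as a = e^(intercept) and the exponent b is the slope.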
Problem Set 1.5

Graph the exponential or logarithmic functions in Problems 1 to 8.
1. y = 2^x
2. y = (1/2)^x
3. y = 3^(−x)
4. y = e^(x^2)
5. y = log_3 x
6. y = ln x^2
7. y = e^(πx)
8. y = π^(ex)
Find x in Problems 9 to 13 using the definition of logarithm (no calculator)

9. a. x = log 10
b. x = log 0.001
10. a. x = ln e^2
b. x = ln e^(−4)
11. a. x = log_5 125
b. x = log_8 64
12. a. 5 = log x
b. 18 = ln x
13. a. ln x = 3
b. log x = 4.5
Simplify the expressions given in Problems 14 to 16
14. a. 2^(8 log_2 x)
b. 3^(3 log_3 x)
c. 5^(−2 log_2 x)
d. 2^(3 log_(1/2) x)
e. 3^(−log_(1/2) x)
15. a. log_2 8^x
b. log_3 81^x
c. log_4 64^x
d. log_(1/2) 32^x
e. log_3 9^(−x)
16. a. e^(4 ln x)
b. e^(3 ln(x^2 + 1))
c. e^(−2 ln(x^2 − 1))
d. e^(−3 ln(1/x))
e. e^(−ln(1/(x^2 + 1)))
In Problems 17 to 19 write the expressions in terms of base e and simplify where possible.
17. a. 5^x
b. 1/2^x
c. 5^(1/x)
d. π^x
e. 4^(x^2)
f. 3^(x^e)
18. a. 3^(1−x)
b. 3^(x+2)
c. 2^(1/x+e)
d. 4^(x^2)
e. 3^(−3x−2)
19. a. log(x + 1)
b. log(ex + e)
c. log_2 (x^2 − 2)
d. log_7 (2x − 3)
Simplify the expressions in 20 to 24 using the definition of logarithm (no calculator)
20. log 100 + log √10
21. ln e + ln 1 + ln e^542
22. log_8 4 + log_8 16 + log_8 8^2.3
23. 10^(log 0.5)
24. ln e^(log 1,000)
25. The following functions give the population size P (t) in millions for four fictional countries where t is the
number of decades since 1900.

Country #1 P_1(t) = 3(1.5)^t
Country #2 P_2(t) = 10(1.1)^t
Country #3 P_3(t) = 20(0.95)^t
Country #4 P_4(t) = 2(1.4)^t
a. Which country had the largest population size in 1900?

b. Which country has the fastest population growth rate? By what percentage does this population
grow every decade?
c. Is any of these populations decreasing in size? If so, which one and by what fraction does the
population size decrease every decade?
26. The following functions give the froth height of three fictional beers where t represents time (in sec).
Beer #1 H_1(t) = 20(0.99)^t
Beer #2 H_2(t) = 40(0.9)^t
Beer #3 H_3(t) = 15(0.98)^t
a. Which beer has the highest froth initially? What is the height?
b. Which beer has the slowest decay of froth? For this beer, what percentage of the height is lost in 10
seconds? 20 seconds?
c. Which beer has the highest froth height after 10 seconds?
27. Carbon-14 has a half-life of 5,730 years. How much is left of 500g of C–14 after t years?
28. If a bacterial population initially has 20 individuals and doubles every 9.3 hr, then how many individuals will
it have after three days?
29. “Whale Numbers up 12% a Year” was a headline in a 1993 Australian newspaper. ∗ A 13-year study had
found that the humpback whale (Megaptera novaeangliae) population off the coast of Australia was increasing significantly.
When the study began in 1981, the humpback whale population was 350.
a. Write down an expression for P (t), the population size at t years after 1981.
b. Estimate the doubling time for this population of whales.
c. Estimate the size of the population in 2004.
30. The population size (in millions) of Mexico in the early 1980s is reported in Table 1.11:
Table 1.11: Population in Mexico

Year Population (in millions)
1980 67.38
1981 69.13
1982 70.93
1983 72.77
1984 74.66
1985 76.60
a. Assume the population growth in Mexico is exponential. Use the first two data points to find a
formula for P (t), the population size (in millions) t years after 1980.
b. Plot P (t) against the data. Discuss the quality of the fit.
∗ This problem is based on a problem that you can find at www.learner.org.

c. Estimate the doubling time for the population.

d. Estimate the size of the population in 2004.
e. Look up Mexico’s actual population size in 2004. Does your model over or under predict the popu-
lation size? Discuss your answer.
31. Figure 1.39 shows a plot of the weight W (in grams) vs. length L (in meters) for a sample of 158 male and 167
female western hognose snakes (Heterodon nasicus) from Harvey County, Kansas. The females are represented
by open circles, and the males by closed circles. The scale is log-log, and the plot is from D. R. Platt (1969).
Figure 1.39: Regression line of weight vs. length
It appears that when L = 0.4 m, the corresponding weight on the best-fitting line is W = 28 g; likewise,
L = 0.6 m appears to correspond to W = 100 g. Assuming an allometric relationship $W = cL^m$, we have
$28 = c(0.4)^m$ and $100 = c(0.6)^m$
Find the allometric relationship between weight and length (round c to the nearest integer).
32. It is known that fluorocarbons have the effect of depleting ozone in the upper atmosphere. Suppose the amount
Q of ozone in the atmosphere is depleted by 15% per year, so that after t years, the amount of original ozone
Q0 that remains may be modeled by
$Q = Q_0(0.85)^t$
a. How long (to the nearest year) will it take before half the original ozone is depleted?
b. Suppose that, through careful environmental management, the ozone depletion rate is decreased
so that it takes 100 years for half the original ozone to be depleted. What is the new rate (to
the nearest hundredth of a percent)?

1.6 Function Building

We have reviewed basic properties of linear, periodic, exponential, and power functions. By combining these
functions, we can greatly enlarge our “toolbox” of functions. With this larger toolbox of functions, we can describe
more data sets and model more biological processes. For instance, in this section, we develop models of the waxing
and waning of tides and the rates at which organisms consume their resources.
Shifting, Reflecting, and Stretching
The simplest way to create the graph of a new function from the graph of another function is to shift the graph
vertically or horizontally.
Horizontal and Vertical Shifts
Let y = f (x) be a given function and let a > 0.

Horizontal shifts:
y = f (x − a) shifts the graph of y = f (x) to the right a units;
y = f (x + a) shifts the graph of y = f (x) to the left a units.
Vertical shifts:
y = f (x) + a shifts the graph of y = f (x) upward a units; and
y = f (x) − a shifts the graph of y = f (x) downward a units.
To understand why these shifts occur, consider y = f (x − a). Substituting x + a for x yields y = f (x + a − a) = f (x).
Hence, the function y = f (x − a) takes at x + a the same value that y = f (x) takes at x, so every point of the graph moves a units to the right.
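The substitution argument can also be checked numerically. The following Python sketch is only an illustration: the sample function f and the helper names shift_right and shift_up are our own hypothetical choices and do not appear elsewhere in the text.

```python
import math

def f(x):
    # Hypothetical sample function, chosen only for illustration
    return math.exp(-x * x)

def shift_right(func, a):
    # Returns the function x -> func(x - a); its graph is func's graph moved right a units
    return lambda x: func(x - a)

def shift_up(func, a):
    # Returns the function x -> func(x) + a; its graph is func's graph moved up a units
    return lambda x: func(x) + a

a = 0.5
g = shift_right(f, a)
h = shift_up(f, a)

for x in [-1.0, 0.0, 1.0, 2.0]:
    # The shifted function at x + a agrees with the original function at x
    assert math.isclose(g(x + a), f(x))
    # The vertically shifted function is exactly a units higher at every x
    assert math.isclose(h(x), f(x) + a)
```

Any other function could be substituted for f; the two assertions depend only on how the shifts are defined.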
Example 1. Shifty graphs
Consider the function y = f (x) whose graph is given by Figure 1.40.
Figure 1.40: Graph of y = f (x)
Sketch the graphs of y = f (x − 0.5), y = f (x) − 0.5, and y = f (x + 1) + 1.
Solution. y = f (x − 0.5) shifts the graph right 0.5 units. y = f (x) − 0.5 shifts the graph down 0.5 units.
y = f (x + 1) + 1 shifts the graph left 1 unit and up 1 unit. These graphs are shown in Figure 1.41. □
In addition to shifting graphs, we can reflect graphs across axes.

a. Shift to the right 0.5 units b. Shift down 0.5 units c. Shift left 1 unit and up 1 unit
Figure 1.41: Shifting a graph
Reflections
Let y = f (x) be a given function.

The graph of y = −f (x) is the reflection across the x-axis. It is found by replacing
each point (x, y) on the graph with (x, −y).
The graph of y = f (−x) is the reflection across the y-axis. It is found by replacing
each point (x, y) on the graph with (−x, y).
Example 2. Reflecting a function
Consider the function y = f (x) whose graph is given by
[Figure: graph of y = f (x) for −2 ≤ x ≤ 2]
a. Sketch y = f (−x).
b. Sketch y = −f (x).
Solution.
a. Reflecting the graph about the y-axis yields the desired graph in red:

[Figure: graph of y = f (−x), shown in red]
b. Reflecting the graph about the x-axis yields the desired graph in red:
[Figure: graph of y = −f (x), shown in red]
A curve can be stretched or compressed in the x-direction, the y-direction, or both, as shown in Figure 1.42.
Given curve x-stretch y-stretch x-compression y-compression
Figure 1.42: Stretching and compressing a given graph

Stretching and Compressing
Let y = f (x) be a given function.

To sketch the graph of y = f (bx), replace each point (x, y) with (x/b, y). If 0 < b < 1,
then we call the transformation an x-dilation (or x-stretch). If b > 1, then we
call the transformation an x-compression.
To sketch the graph of y = c f (x), replace each point (x, y) with (x, cy). If c > 1,
then we call the transformation a y-dilation (or y-stretch). If 0 < c < 1, then
we call the transformation a y-compression.
Example 3. Transforming y = ln 2x
Graph y = ln 2x by comparing it to y = ln x.
Solution. The function y = ln 2x can be interpreted two ways. First, it is a horizontal compression of the graph of y = ln x by a factor of
2. Alternatively, since ln 2x = ln x + ln 2, it corresponds to shifting the graph of y = ln x upward by ln 2 ≈ 0.69. Hence, the
graph is given by Figure 1.43. □
Figure 1.43: Graph of y = ln 2x
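The two descriptions of y = ln 2x agree because ln 2x = ln x + ln 2 for every x > 0. A minimal Python sketch of this check follows; the function names compressed and shifted are hypothetical labels introduced only for this illustration.

```python
import math

def compressed(x):
    # Horizontal compression of ln x by a factor of 2
    return math.log(2 * x)

def shifted(x):
    # Vertical shift of ln x upward by ln 2 (about 0.69)
    return math.log(x) + math.log(2)

for x in [0.1, 0.5, 1.0, 2.0, 10.0]:
    # Both descriptions give the same height at every sampled x > 0
    assert math.isclose(compressed(x), shifted(x))
```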
By compressing and stretching sinusoidal functions, we can model periodic phenomena like tidal movements.
Example 4. Modelling tidal movements
The tides for Toms Cove in Assateague Beach, Virginia on August 19, 2004 are listed in the following table:
Time Height (ft) Tide
5:07 AM 0.4 Low
10:57 AM 4.0 High
5:23 PM 0.4 Low
Assume that this can be modeled by
T (t) = A cos[B (t + C)] + D
where T denotes the height (in feet) of the tide t hours after midnight. Find values of A, B, C and D such that the
function fits the Assateague tide data.

Solution. The data suggest that the period of T is approximately 12 hours, in which case $B = \frac{2\pi}{12} = \frac{\pi}{6}$. The
amplitude of the tide is given by $A = \frac{4.0 - 0.4}{2} = 1.8$. Since the graph of cosine is always centered around the horizontal
axis, we need to shift the graph up by the mid-tide height, D = (4.0 + 0.4)/2 = 2.2. Finally, since cosine attains its maximum at 0 and the high
tide occurs at approximately t = 11, we can choose C = −11 to shift the graph right by 11. Putting this all together
yields
$$T(t) = 1.8 \cos\left[\frac{\pi}{6}(t - 11)\right] + 2.2$$
To graph this function, we note (h, k) = (11, 2.2), amplitude A = 1.8, and period 12, as shown in Figure 1.44.
Figure 1.44: Graph fitting the Assateague tide data
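One way to judge the fitted tide model is to evaluate it at the three observation times in the table. The sketch below does this in Python with A = 1.8, B = π/6, C = −11 and D = 2.2 from the solution; the function name tide_height and the conversion of clock times to decimal hours are our own assumptions.

```python
import math

def tide_height(t, A=1.8, B=math.pi / 6, C=-11.0, D=2.2):
    # T(t) = A cos[B (t + C)] + D, with t in hours after midnight
    return A * math.cos(B * (t + C)) + D

# (hours after midnight, observed height in feet) for 5:07 AM, 10:57 AM, 5:23 PM
observations = [(5 + 7 / 60, 0.4), (10 + 57 / 60, 4.0), (17 + 23 / 60, 0.4)]

for t, observed in observations:
    print(f"t = {t:5.2f} h  model = {tide_height(t):.2f} ft  observed = {observed:.1f} ft")
```

Because the period was rounded to 12 hours, the model should be expected to match the data only approximately, with the discrepancy growing for times farther from the high tide.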
Adding, Subtracting, Multiplying, and Dividing

The easiest way to create new functions is to perform arithmetic operations on old functions. The first three of
these operations result in a function whose domain is the intersection of the domains of the original functions. Since
division by zero is not permitted, division can further reduce the domain of the new function.
Functional Arithmetic
Let f and g be functions with domains A and B, respectively. Then

f + g is defined by (f + g)(x) = f (x) + g(x) with domain A ∩ B.
f − g is defined by (f − g)(x) = f (x) − g(x) with domain A ∩ B.
f g is defined by (f g)(x) = f (x)g(x) with domain A ∩ B.
f /g is defined by (f /g)(x) = f (x)/g(x) with domain consisting of points x in
A ∩ B such that g(x) ≠ 0.
Example 5. Combining functions

Consider the functions $f(x) = \sqrt{100 - x^2}$ and g(x) = sin x. Find the domains and sketch the graphs of f + g, f g
and f /g.
Solution. The domain of f (x) is [−10, 10] and the domain of g(x) is (−∞, ∞). It follows that the domains of f + g
and f g are [−10, 10]. The graph of y = f (x) (in red) is a semicircle of radius 10 and the graph of y = g(x) is sine
(in blue). Adding the graphs point-wise yields the graph of y = f (x) + g(x) shown in black.
[Figure: graphs of y = f (x), y = g(x), and y = f (x) + g(x) for −10 ≤ x ≤ 10]
Multiplying the graphs point-wise yields the graph of y = f (x)g(x), as shown in black.
[Figure: graph of y = f (x)g(x) for −10 ≤ x ≤ 10]
For the quotient, we must think about division by zero. Since g(x) = 0 whenever x is an integer multiple of π, the
domain of f /g is the interval [−10, 10] without the values 0, ±π, ±2π, ±3π. Dividing the graphs of y = f (x) (in red)
and y = g(x) (in blue) point-wise yields the graph of y = f (x)/g(x) (in black).
[Figure: graph of y = f (x)/g(x) for −10 ≤ x ≤ 10]
An important class of functions that we get by adding and multiplying are polynomials, $f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$,
where $a_0, a_1, \ldots, a_n$ are constants, and rational functions, each of which is a polynomial divided by a polynomial. The
following example illustrates how rational functions arise in biology.
Example 6. Michaelis-Menten uptake rate
To bring nutrients such as glucose into their cell bodies, bacteria have special molecular receptors embedded in
their cell membrane. These receptors “capture” nutrient molecules outside of the cell and transport them into the
cell body. This process is illustrated in Figure 1.45. The rate at which nutrients can be brought into the cell body is
called the uptake rate. The uptake rate is limited by the number of receptors and the time it takes a receptor to bring a
nutrient particle into the cell body.

Figure 1.45: Cell body and receptors
a. To model this uptake rate, let f (x) be the amount of nutrients brought into the cell per minute as a
function of nutrient concentration x outside of the cell body, and let t be the fraction of time that a receptor is
unoccupied. Since the chance that a receptor becomes occupied
increases with the time it is unoccupied and increases with the nutrient concentration, it is reasonable
to assume that the fraction of time 1 − t that a receptor is occupied is proportional to tx. Using these
assumptions, find an expression for t as a function of x. In finding this expression, you will have to
introduce a proportionality constant.
b. Since nutrients are being brought into the cell when receptors are occupied, the uptake rate f (x) should
be proportional to 1 − t. Write down an expression for f (x).
c. In the 1960s, scientists at Woods Hole Oceanographic Institution measured the uptake rate of glucose by
bacterial populations from the coast of Peru.∗ In one field experiment, they collected the following data:
Glucose concentration     Uptake rate for one liter of bacteria
(micrograms per liter)    (micrograms per hour)
0      0
20     12
40     16
60     18
80     19
100    20
By an appropriate change of variables (see the problem set!), one can use linear regression to estimate
the parameters for the uptake function. Doing so yields $f(x) = \frac{1.2078x}{1 + 0.0506x}$. Use technology to plot this
function against the data. How good is the fit?
Solution.
a. Since 1 − t ∝ tx, there exists a > 0 such that 1 − t = atx. Therefore,
$$t + atx = 1$$
$$t(1 + ax) = 1$$
$$t = \frac{1}{1 + ax}$$
∗ R. F. Vaccaro and H. W. Jannasch. 1967. Variations in uptake kinetics for glucose by natural populations in seawater. Limnology
and Oceanography. 12:540–542.

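b. Since f (x) ∝ 1 − t, there exists a constant b > 0 such that
$$\begin{aligned}
f(x) &= b(1 - t) \\
&= b\left(1 - \frac{1}{1 + ax}\right) \\
&= b\left(\frac{1 + ax}{1 + ax} - \frac{1}{1 + ax}\right) \\
&= \frac{bax}{1 + ax}
\end{aligned}$$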
c. Using technology to plot the function against the data yields
[Figure: uptake rate (micrograms per liter per hour) plotted against glucose concentration (micrograms per liter)]
The fact that the function fits the data so well gives us confidence that the arguments used to construct
the function are sound! One interesting question to ask in this example is what happens to the uptake
rate f (x) as x gets very large (i.e. approaches +∞)? In the next chapter, we will develop ideas to tackle
this question.
The uptake function $f(x) = \frac{abx}{1 + ax}$ is known as the Michaelis-Menten uptake function. It is named after two
biochemists, Leonor Michaelis (1875–1949) and Maud Menten (1879–1960). In addition to describing nutrient uptake,
this function can be used to describe enzyme kinetics, the growth of populations, and the consumption rates of
organisms.
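The parameter estimates quoted in Example 6c can be reproduced with a short computation. The Python sketch below rests on assumptions of our own: it takes the change of variables to be the reciprocals 1/x and 1/y, it uses ordinary least squares, and it omits the data point (0, 0) because its reciprocal is undefined. With f (x) = abx/(1 + ax), the reciprocals satisfy 1/y = (1/(ab))(1/x) + 1/b, so the slope of the fitted line is 1/(ab) and the intercept is 1/b. The names fit_line, inv_x, inv_y and uptake_rate are hypothetical.

```python
# Glucose concentration (micrograms per liter) and uptake rate (micrograms per hour)
glucose = [20, 40, 60, 80, 100]
uptake = [12, 16, 18, 19, 20]

def fit_line(us, vs):
    # Ordinary least-squares line v = slope * u + intercept
    n = len(us)
    mean_u = sum(us) / n
    mean_v = sum(vs) / n
    sxx = sum((u - mean_u) ** 2 for u in us)
    sxy = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_u

# Reciprocal change of variables turns the uptake curve into a straight line
inv_x = [1 / x for x in glucose]
inv_y = [1 / y for y in uptake]
slope, intercept = fit_line(inv_x, inv_y)

# Recover the parameters of f(x) = a*b*x / (1 + a*x)
a = intercept / slope
b = 1 / intercept

def uptake_rate(x):
    return a * b * x / (1 + a * x)

print(f"a*b = {a * b:.4f}, a = {a:.4f}")
for x, y in zip(glucose, uptake):
    print(x, y, round(uptake_rate(x), 2))
```

Under these assumptions the computed coefficients agree with the values 1.2078 and 0.0506 quoted in the example to the precision shown there.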
Composing Functions
Situations often arise in biology where the relationship between two variables x and z is mediated by a third variable
y. For example, the rate z at which a population of mice or shrews grows is related to the number y of insects it
consumes per unit time and this rate y is related to the density x of insects in the area where these animals feed. Let
f be the function that relates consumption rate y to resource density x; that is y = f (x). Let g be the function that
relates the per-capita population growth rate z of the population to the consumption rate y; that is, z = g(y). Then
by substitution, we obtain z = g(f (x)). We have expressed the growth rate as a function of resource density through
the process of taking a function of a function. This process is known as composition and is shown in Figure 1.46.

Figure 1.46: Composition of two functions
Composite Functions
Let f and g be functions with domains A and B, respectively. The composite
function g ◦ f is defined by
(g ◦ f )(x) = g[f (x)]
The domain of g ◦ f is the subset of A for which g ◦ f is defined.
To visualize how functional composition works, think of g ◦ f in terms of an “assembly line” in which f and g are
arranged in series, with the output of f becoming the input of g.
Example 7. Composing functions

Let f (x) = 2x + 1 and $g(x) = \sqrt{x}$. Find the composite functions g ◦ f and f ◦ g and their domains.
Solution. The function g ◦ f is defined by
$$g[f(x)] = g(2x + 1) = \sqrt{2x + 1}$$
Notice that g ◦ f means that f is applied first, then g is applied. Since g ◦ f is defined only for 2x + 1 ≥ 0 or $x \ge -\frac{1}{2}$,
the domain of g ◦ f is $[-\frac{1}{2}, \infty)$.
The function f ◦ g is defined by
$$f(g(x)) = f(\sqrt{x}) = 2\sqrt{x} + 1$$
In this part, first apply g, then apply f . Since f ◦ g is defined only for x ≥ 0, we see the domain of f ◦ g is [0, ∞). □
Example 7 illustrates that functional composition is not, in general, commutative. That is, in general,
f ◦ g ≠ g ◦ f
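The two composites of Example 7 can be compared directly on a computer. The following Python sketch is one way to do so; the helper compose and the names g_of_f and f_of_g are hypothetical, introduced only for this illustration.

```python
import math

def compose(outer, inner):
    # Returns the function x -> outer(inner(x)): inner is applied first
    return lambda x: outer(inner(x))

def f(x):
    return 2 * x + 1

def g(x):
    return math.sqrt(x)

g_of_f = compose(g, f)  # sqrt(2x + 1), defined for x >= -1/2
f_of_g = compose(f, g)  # 2 sqrt(x) + 1, defined for x >= 0

print(g_of_f(4), f_of_g(4))  # prints 3.0 5.0
```

At x = −0.25 the composite g_of_f can still be evaluated, while f_of_g raises a ValueError because the square root of a negative number is not real, which reflects the different domains found in the example.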
Sometimes it can be useful to express a function as the composite of two simpler functions.
Example 8. Decomposing functions
Express each of the following functions as the composite f ◦ g of two functions f and g.
a. $\sin^2 x$
b. ln(2 + cos x)
Solution.
a. A good way of thinking about this is to think about how you would use a calculator to evaluate this
expression. We would first find the sine of x, and then square the result. Hence, let g(x) = sin x and $f(x) = x^2$
so that
$$f[g(x)] = f(\sin x) = (\sin x)^2 = \sin^2 x$$

Figure 1.47: The short-tailed shrew (Blarina brevicauda)
b. To evaluate the function y = ln(2 + cos x), we first take cosine of x, add two, and then find the natural
logarithm of the result. Since the evaluation of this function takes three steps, there is more than one
way that we can represent it as a composition of two functions.
Let g(x) = cos x + 2 and f (x) = ln x. Then,
f [g(x)] = f (cos x + 2) = ln(cos x + 2)
Alternatively, let g(x) = cos x and f (x) = ln(2 + x). Then,
f [g(x)] = f (cos x) = ln(2 + cos x)
The next example involves the composition of two well-known functions in ecology. The first is the consumption
function y = f (x) that relates the rate at which an organism is able to consume a resource of density x in the
environment; and is referred to by ecologists as the functional response. The second is the per-capita growth rate of
an organism g(y) that is a function of the consumption rate y. Thus, it follows that the per-capita growth rate G(x),
as a function of the resource density x, is given by the composition G(x) = g[f (x)]. A particularly suitable form for
g(y) is the hyperbolic function∗
$$g(y) = r\left(1 - \frac{b}{y}\right)$$
where r is the maximal per-capita growth rate of the population and b is the growth break-even point, i.e. g(b) = 0.
The well-known Canadian ecologist C. S. Holling (1959) collected data on the daily rates at which individual
short-tailed shrews (Blarina brevicauda), as shown in Figure 1.47, gather cocoons of the European pine sawfly
(Neodiprion sertifer (Geoff.)) buried in forest-floor litter found in the sand-plain area of southwestern Ontario,
Canada. These data, as a function of cocoon density x per thousandth acre (i.e. acres $\times 10^{-3}$), can be fitted
reasonably well by the function
$$y = f(x) = \frac{320x}{110 + x} \text{ cocoons per day.}$$
We use this function in the next example and note that after dividing both the numerator and denominator by 110, it
can also be written as $f(x) = \frac{(320/110)x}{1 + (1/110)x}$. Written this way, it is clear this function is the same as a Michaelis-Menten function
with a = 1/110 and b = 320, as presented in Example 6.
Example 9. Short-tailed shrews exploiting cocoons

∗ Getz, W. M. 1993. “Metaphysiological and evolutionary dynamics of populations exploiting constant and interactive resources: r-K
selection revisited.” Evolutionary Ecology 7:287-305

For the shrew population studied by C. S. Holling, suppose we are given the information that under ideal conditions
(i.e. when the number of sawfly cocoons per shrew is essentially unlimited) each pair of shrews produces an average
of around 20 female and 20 male progeny per year. Use these data to estimate the maximum per-capita growth rate
r per day in the growth rate function $g(y) = r\left(1 - \frac{b}{y}\right)$, where y is the number of cocoons consumed per day per
shrew per unit area. Since the growth break-even point b is not known for this species, we assume that b = 100
cocoons consumed per day per shrew per unit area. Use functional composition on this growth rate function and
C. S. Holling’s response function $f(x) = \frac{320x}{110 + x}$ for the daily rate at which shrews collect cocoons to find the daily
per-capita growth rate G = g ◦ f as a function of cocoon density x.
Solution. If the maximum daily rate r of population growth corresponds to a 20-fold increase in population levels
in one year, it follows that
$$r^{365} = 20 \;\Rightarrow\; r = \sqrt[365]{20} \approx 1.00824$$
Hence $g(y) = 1.00824\left(1 - \frac{100}{y}\right)$. Taking functional composition now yields
$$\begin{aligned}
G(x) = g[f(x)] &= 1.00824\left(1 - \frac{100}{f(x)}\right) \\
&= 1.00824\left(1 - \frac{100(110 + x)}{320x}\right) \\
&= 1.00824\left(\frac{320x - 100x - 11{,}000}{320x}\right) \\
&= \frac{1.00824(220x - 11{,}000)}{320x}
\end{aligned}$$
□
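The composition in Example 9 can also be carried out numerically rather than algebraically. The Python sketch below composes the growth function g with Holling's response function f, using r = 20^(1/365) and b = 100 from the example; the constant names R_MAX and B_EVEN and the sample densities are our own hypothetical choices.

```python
R_MAX = 20 ** (1 / 365)  # maximal daily per-capita growth rate r, about 1.00824
B_EVEN = 100.0           # assumed growth break-even consumption rate b

def f(x):
    # Holling's response: cocoons gathered per day at cocoon density x
    return 320 * x / (110 + x)

def g(y):
    # Hyperbolic per-capita growth rate at consumption rate y
    return R_MAX * (1 - B_EVEN / y)

def G(x):
    # Composite G = g o f: growth rate as a function of cocoon density
    return g(f(x))

for x in [25, 50, 100, 400]:
    print(x, round(f(x), 1), round(G(x), 4))
```

Since f (50) = 100 = b, the composite G changes sign at a density of 50 cocoons per thousandth acre: in this model the growth rate is negative below that density and positive above it.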
Inverse Functions
Sometimes when we are given the output of a function, we want to know what inputs could generate the observed
output. For instance, consider the function that assigns to each gene the protein that it encodes. If in an experimental
study we observe certain proteins at high abundance, we might want to know what genes might have been expressed.
Consider the linear function f (x) = 2x + 9. Then at x = 3 we have f (3) = 15. The inverse function, call it $f^{-1}$
if it exists, is a function that assigns the output 3 to the input 15 (i.e. it reverses the roles of the input and output),
so that $f^{-1}(15) = 3$. Of course, for $f^{-1}$ to be an inverse function, it must undo the effect of f for each and every
member of the domain. This may be impossible if f is a function such that two x-values give the same y-value. For
example, if $g(x) = x^2$, then g(2) = 4 and g(−2) = 4. So we would need a function $g^{-1}$ such that
$g^{-1}(4) = 2$ and also $g^{-1}(4) = -2$
However, this violates the definition of a function. So it is necessary to limit the given function to be one-to-one.
One-to-one
A function f : X → Y is one-to-one if f (a) = f (b) for some a, b in X implies that
a = b.
Recall the vertical line test we used to determine whether a given relation is a function. We have a similar test,
called the horizontal line test, to determine whether a given function is one-to-one.
Horizontal Line Test
A function f is one-to-one if and only if every horizontal line intersects the graph
of y = f (x) in at most one point.
Example 10. Using the horizontal line test
Determine which of the following functions are one-to-one.

[Figure: four graphs, labeled (a), (b), (c), and (d)]
Solution.
a. Since any horizontal line would intersect multiple points on the graph of y = f (x), this function is not
one-to-one.
b. Since any horizontal line only intersects the graph of y = f (x) in one point, this function is one-to-one.
c. Since the horizontal line y = 1 passes through an infinite number of points of the graph, this function is
not one-to-one.
d. Since any horizontal line intersects the graph in at most one point, this function is one-to-one.
□
We can now define an inverse function.
Inverse Function
Let f be a one-to-one function with domain D and range R. The inverse $f^{-1}$ of f
is the function with domain R and range D such that
$f^{-1}(y) = x$ if and only if y = f (x)
Equivalently,
$(f \circ f^{-1})(x) = x$ for every x in R
Example 11. Showing functions are inverses
Show that $g(x) = (x - 3)^{1/3}$ is the inverse of $f(x) = x^3 + 3$.
Solution. Since the range of f is all of the reals, the domain of g should be all of the reals. For any real number
x, we have that
$$(f \circ g)(x) = \left[(x - 3)^{1/3}\right]^3 + 3 = x - 3 + 3 = x$$
Thus, g is $f^{-1}$. □
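The algebra in Example 11 can be spot-checked numerically. The Python sketch below evaluates both composites at a few sample inputs; it is an illustration under our own assumptions, and in particular the real cube root is built with math.copysign because raising a negative float to the power 1/3 in Python does not return the real root.

```python
import math

def f(x):
    return x ** 3 + 3

def g(x):
    # Real cube root of (x - 3), valid for negative arguments as well
    u = x - 3
    return math.copysign(abs(u) ** (1 / 3), u)

for x in [-5.0, 0.0, 3.0, 4.0, 30.0]:
    # g undoes f, and f undoes g, up to floating-point rounding
    assert math.isclose(f(g(x)), x, abs_tol=1e-9)
    assert math.isclose(g(f(x)), x, abs_tol=1e-9)
```

A check at finitely many points does not replace the algebraic argument, but it catches slips such as a wrong sign or a misplaced constant.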

To find the inverse, it is helpful to visualize a function as a set of ordered pairs. Suppose we pick a number, say
3, and evaluate a function f at 3 to find f (3) = 15. Then (3, 15) is an element of f . Now, the inverse function $f^{-1}$
requires that 15 be changed back into 3; that is, $f^{-1}(15) = 3$, so that (15, 3) is an element of $f^{-1}$. This means that if
y = f (x), then the inverse $y = f^{-1}(x)$ is found by interchanging the x’s and y’s and solving, if possible, the resulting
equation x = f (y) for y.
Example 12. Finding inverses
Find the inverses for the following functions:

a. The function defined by the table
x y = f (x)
1 9
2 0
3 4
4 5
5 −42
b. The function defined by the equation
$$y = f(x) = \frac{1}{1 + x}$$
c. The function defined by the verbal description: to every r ≥ 0 associate the area of a circle of radius r.
Solution.
a. Reverse the ordered pairs in the table:
y x = $f^{-1}(y)$
9 1
0 2
4 3
5 4
−42 5
b. We begin by interchanging the x’s and y’s and solving for y:
$$x = \frac{1}{1 + y} \quad \text{assuming } y \ne -1$$
$$1 + y = \frac{1}{x}$$
$$y = \frac{1}{x} - 1$$
Therefore, $f^{-1}(x) = \frac{1}{x} - 1$ for all x ≠ 0. Notice that the range of f (x) is all the reals but zero and this
range corresponds to the domain of the function $f^{-1}(x)$ that we found.
c. The area A of a circle of radius r ≥ 0 is given by $A = \pi r^2$. The range of A is [0, ∞). To find the inverse,
we solve for r: for every A ≥ 0,
$$A = \pi r^2$$
$$r = \sqrt{\frac{A}{\pi}}$$
In words, the radius of a circle is the square root of its area divided by π.

In Example 12c, we did not interchange the names of A and r as we did for x and y in part b. In general,
it is not necessary to interchange the names of x and y if we are comfortable expressing the inverse function as
$x = f^{-1}(y)$. The only reason we change the names in the latter case is that, by convention, the variable x usually
is the independent variable in the domain of the function and the variable y is the dependent variable in the range
of the function. Since there are no conventions associated with the variable names A and r, we do not bother to
interchange the names, especially because A stands for area and r stands for radius and we do not want to mix these
up.
The last property of inverses that we consider in this section tells us about the graph of the inverse of a given
function.
Graphing Inverses
If f is one-to-one, then the graph of its inverse $y = f^{-1}(x)$ is given by reflecting the
graph of y = f (x) about the line y = x.
Example 13. Graphing inverses
Consider the function $y = f(x) = 1 - e^{-x}$. Find the inverse function $y = f^{-1}(x)$ and sketch the graphs of
y = f (x), y = x and $y = f^{-1}(x)$.
Solution. First, find the inverse:
$$x = 1 - e^{-y}$$
$$e^{-y} = 1 - x$$
$$-y = \ln(1 - x)$$
$$y = -\ln(1 - x)$$
We graph $y = f(x) = 1 - e^{-x}$ in black and the line y = x in blue. Reflecting the graph of $y = 1 - e^{-x}$ about the
line y = x yields the graph of $y = f^{-1}(x)$ in red.
[Figure: graphs of y = f (x), y = x, and $y = f^{-1}(x)$]
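The reflection property says that a point (x, y) lies on the graph of f exactly when (y, x) lies on the graph of its inverse. A minimal Python sketch of this check for Example 13 follows; the name f_inverse and the sample points are our own hypothetical choices.

```python
import math

def f(x):
    return 1 - math.exp(-x)

def f_inverse(x):
    # Inverse found in Example 13; defined only for x < 1, the range of f
    return -math.log(1 - x)

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    y = f(x)
    # (x, y) is on the graph of f, so (y, x) should be on the graph of the inverse
    assert math.isclose(f_inverse(y), x, abs_tol=1e-9)
```

Plotting the pairs (x, f (x)) and the swapped pairs (f (x), x) on the same axes gives the two curves that mirror each other across the line y = x.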
Problem Set 1.6

LEVEL 1 – DRILL
Let y = f (x) be the function whose graph is given by Figure 1.48. Sketch the graphs of the functions in Problems 1 to 6.
Figure 1.48: Graph of f
1. y = f (x) + 2
2. y = f (x + 1)
3. y = f (x − 2) + 1
4. y = 2f (x + 2)
5. y = −f (x)
6. y = f (−x + 2)
Sketch the graph of the functions in Problems 7 to 10 by appropriately shifting, stretching, etc. the graph of y = cos x.
7. $y = \cos(x - \frac{\pi}{2})$
8. y = 3 cos(2x)
9. $y = 3\cos\frac{x}{2}$
10. $y - 2 = 2\cos(x + \frac{2\pi}{3})$
In Problems 11 to 19 sketch the graph of each function without using a calculator.
11. y = ex−1
12. y = e−x+2
13. y = 2ex+1
14. y = e2x
15. y = 2e3 x + 1
16. y = ln(x + 1)
17. y = ln(x − 1) + 1
18. y = ln x2
19. y = − ln(1 − x) + 1
20. Find the indicated values given the functions f = {(0, 1), (1, 4), (2, 7), (3, 10)} and
g = {(0, 3), (1, −1), (2, 1), (3, 3)}
a. (f + g)(1)
b. (f − g)(2)

c. (f g)(2)
d. (f /g)(0)
e. (f ◦ g)(2)
21. Find the indicated values given the functions
$$f(x) = \frac{2x^2 - 5x + 2}{x - 2}$$
and
$$g(x) = x^2 - x - 2$$
a. (f + g)(−1)
b. (f − g)(2)
c. (f g)(9)
d. (f /g)(99)
e. (f ◦ g)(0)
22. Let p(t) be a periodic function with period 2π and amplitude 1. Show that the given functions are periodic,
and find their period and amplitude.
a. p(t − 1) + 2.
b. 5p(t)

c. p πt
d. $2p(t + \frac{\pi}{2}) - 3$
23. Let f (t) be a periodic function with period T and amplitude A. Show that the following functions are periodic
and find their period and amplitude.
a. f (t + 1) − 2.
b. 4f (t)
c. −2f (3t)
d. 2f (t − 4) + 1
Express each of the functions in Problems 24 to 29 as the composition f ◦ g of two functions f and g. (Answers are
not unique.)
24. $y = (2x^2 - 1)^4$
25. $y = \sqrt{1 - \sin x}$
26. $y = e^{-x^2}$
27. $y = (\ln x)^4$
28. $y = |x + 1|^2 + 6$
29. $y = (x^2 - 1)^3 + \sqrt{x^2 - 1} + 5$
For each of the functions in Problems 30 to 33, find f + g, f g, f /g, and f ◦ g. Also give the domain and range of
each of these functions.
30. $f(x) = \frac{x - 2}{x + 1}$ and $g(x) = x^2 - x - 2$
31. $f(x) = \frac{2x^2 - x - 3}{x + 1}$ and $g(x) = x^2 - x - 2$

32. $f(x) = \ln(1 - x)$ and $g(x) = \sqrt{4 - x^2}$
33. $f(x) = \ln(4 - x^2)$ and $g(x) = \sin(\pi x)$
Find the inverse of the functions in Problems 34 to 39. State the domain and range of the inverse.
34. $y = \frac{x}{1 + x}$
35. y = e2x+1
36. $y = (x + 1)^3 - 2$
37. $y = x^2$ on [0, ∞)
38. $y = x^2$ on (−∞, 0]
√
39. y = ln x
Use the horizontal line test to determine which of the functions in Problems 40 to 43 is one-to-one. For the functions
that are one-to-one, sketch the inverse.
40. [Figure: graph for Problem 40]
41. [Figure: graph for Problem 41]
42. [Figure: graph for Problem 42]
43. [Figure: graph for Problem 43]
44. The tides for Hell Gate, Ward Island on September 6th, 2004 are given by the following table:

12:08AM 2.1 Low
5:19AM 5.3 High
12:00 noon 2.1 Low
Let
T (t) = A cos[B (t + C)] + D feet
denote the height of the tide t hours after midnight. Find values of A, B, C and D such that the function fits
the Hell Gate tide data.
45. The tides for Bodega Bay, CA on March 10, 2005 are given by the following table:

4:36 AM 1.1 Low
10:43 AM 5.8 High
5:02 PM −0.4 Low
Let
T (t) = A cos[B (t + C)] + D feet
denote the height of the tide t hours after midnight. Find values of A, B, C and D such that the function fits
the Bodega Bay tide data

46. Enzymes are nature’s catalysts, as they are compounds that enhance the rate (speed) of biochemical reactions.
Enzymes are used according to the body’s need for them. There are enzymes that aid in blood clotting, and
those that aid in digestion, and even those within the cell that are needed for specific reactions. In this problem,
you will derive a model of a biochemical reaction where there is a substance (e.g. glucose) that is converted to
a new substance (e.g. fructose) by an enzyme (e.g. isomerase). Let f (x) be the amount of substance produced
per minute as a function of the substrate concentration x. To model this reaction rate, assume that enzymes
are either “occupied” (i.e. processing a substrate particle) or are “unoccupied” (i.e. waiting to bind to another
substrate particle).
a. Let t be the fraction of time that an enzyme is unoccupied. Assuming that 1 − t is proportional to tx,
find t as a function of x.
b. Assuming that f (x) is proportional to 1 − t, write down an expression for f (x).
c. Below are some data for glucose-6-phosphate converted to fructose-6-phosphate by the enzyme
phosphoglucose isomerase.
Substrate concentration Reaction rate
(micromolar) (micromolar/minute)
0.08 0.15
0.12 0.21
0.54 0.70
1.23 1.1
1.82 1.3
2.72 1.5
4.94 1.7
10.00 1.8
Using linear regression on the transformed data, the reaction rate can be approximated by
$f(x) = \frac{1.95x}{1 + 0.95x}$. Graph this function against the data.
47. We have seen several applications where it is useful to fit a function of the form $y = f(x) = \frac{bx}{1 + ax}$ to a data set.
Consider the change of variables given by t = 1/x and z = 1/y.
a. Write down an expression for z in terms of t.

b. Consider the following data set
Substrate concentration x Reaction rate y
(micromolar) (micromolar/minute)
0.08 0.15
0.12 0.21
0.54 0.70
1.23 1.1
1.82 1.3
2.72 1.5
4.94 1.7
10.00 1.8
Take the reciprocals of the (x, y) data values to get the corresponding (t, z) values. Use technology
to fit a line to the (t, z) data. If this line is given by z = c + d t, use your work in (a) to find the
parameters a and b in $y = \frac{bx}{1 + ax}$.
48. Environmental studies are often concerned with the relationship between the population of an urban area and
the level of pollution. Suppose it is estimated that when p hundred thousand people live in a certain city, the
average daily level of carbon monoxide in the air is
$$L(p) = 0.07\sqrt{p^2 + 3}$$
ppm. Further assume that in t years, there will be
$$p(t) = 1 + 0.02t^3$$
hundred thousand people in the city. Based on these assumptions, what level of air pollution should be expected
in four years?
49. The volume, V , of a certain cone is given by
$$V(h) = \frac{\pi h^3}{12}$$
Suppose the height is expressed as a function of time t by h(t) = 2t.
a. Find the volume when t = 2.
b. Express the volume as a function of elapsed time by finding V ◦ h.
c. If the domain of V is [0, 6], find the domain of h; that is, what are the permissible values for t?
50. The surface area, S, of a spherical balloon with radius r is given by
$$S(r) = 4\pi r^2$$
Suppose the radius is expressed as a function of time t by r(t) = 3t.

a. Find the surface area when t = 2.
b. Express the surface area as a function of elapsed time by finding S ◦ r.
c. If the domain of S is (0, 8), find the domain of r; that is, what are the permissible values for t?
51. The Canadian ecologist, C. S. Holling (1959), mentioned in Example 9 also collected data on the daily rates at
which individual masked shrews (Sorex cinereus), gathered European pine sawfly cocoons in forest-floor litter.
His data for this species are fitted by the functional response
$$f(x) = \frac{110x^4}{300^4 + x^4} \text{ cocoons per day}$$
where x is the density of cocoons on the forest floor. If breeding pairs for this species produce approximately
4 female and 4 male progeny per year under favorable conditions and the growth break-even point is b = 400
cocoons per day, then write down the specific form of the per capita hyperbolic growth rate r per day: g(y) =
r(1 − b/y) for this species and use it to derive the composite per-capita growth rate function G = (g ◦ f )(x).
Plot a graph of this composite function.
52. Suppose the number of hours between sunrise and sunset in Los Angeles, CA, is modeled by

$$H = 12.17 + 1.5 \sin\left(\frac{2\pi n}{365} - 1.5\right)$$
where n is the number of the day in the year (n = 1 on Jan. 1 and n = 365 on Dec. 31, except on leap years
when n = 366). On what days of the year in 2009 will there be approximately 12 hours of daylight in Los
Angeles?
53. According to the model in Problem 52, when will the length of the day in Los Angeles be about 13 hours?

1.7 Sequences and Difference Equations

Often, experimental measurements are collected at discrete intervals of time. For example, the number of elephants
in a wildlife park in Africa may be counted every year to ensure that poachers are not driving the population extinct in
the near future. Blood may be drawn on a weekly basis from a patient infected with HIV and the number of CD4+
cells produced by the patient’s immune system counted to monitor the progression of the patient towards full-blown
AIDS. Data obtained in this regular fashion can be represented by a sequence of numbers over time. In this section,
we describe the basic properties of such sequences and demonstrate that some sequences can be generated recursively
using a relationship called a difference equation. These equations are formulated using a function from the natural
numbers to the real numbers.
Sequences
We begin with the idea of a sequence, which is simply a succession of numbers that are listed according to a given
prescription or rule. Specifically, if n is a natural number, the sequence whose nth term is the number $a_n$ can be
written as
$a_1, a_2, a_3, \ldots, a_n, \ldots$
The number $a_1$ is called the first term, $a_2$ the second term, . . ., and $a_n$ the nth term.
Sequence A sequence is a real-valued function whose domain is the set of natural numbers.
When working with sequences, we alter the usual functional notation. For a function a from the natural to the
real numbers we should write a(1), a(2), a(3), . . ., but for convenience we write $a_1, a_2, a_3, \ldots$. The function a(n)
is written $a_n$ and is called the general term.
Example 1. Finding the sequence, given the general term
Find the first five terms of the sequences whose general term is given.
a. $a_n = n$
b. $a_n = \sin\dfrac{\pi n}{2}$
c. $a_n = \dfrac{n}{1+n}$
d. $a_n$ is the digit in the nth decimal place of the number π.
e. $a_1 = 5$, $a_{n+1} = 2a_n$ for n ≥ 1
Solution.
a. Since n is the general term, we have 1, 2, 3, 4, and 5 for the first five terms.
b. For n = 1, $\sin\frac{\pi}{2} = 1$; for n = 2, $\sin\frac{2\pi}{2} = 0$; for n = 3, $\sin\frac{3\pi}{2} = -1$; for n = 4, $\sin\frac{4\pi}{2} = 0$; and for
n = 5, $\sin\frac{5\pi}{2} = 1$.
c. Take the first five natural numbers (in order) to find: $\frac{1}{1+1} = \frac{1}{2}$, $\frac{2}{1+2} = \frac{2}{3}$, $\frac{3}{1+3} = \frac{3}{4}$, $\frac{4}{1+4} = \frac{4}{5}$, and $\frac{5}{1+5} = \frac{5}{6}$.
d. Since π ≈ 3.141592 · · ·, we see that the first five terms of this sequence are 1, 4, 1, 5, and 9.
e. This is known as a recursive formula because after one (or more) given term(s), the subsequent terms
are found in terms of the given term(s). For this example, the first term is given: a1 = 5; for n = 2, we
use a2 = 2a1 = 2(5) = 10; for n = 3, a3 = 2a2 = 2(10) = 20; for n = 4, a4 = 2a3 = 2(20) = 40, and for
n = 5, a5 = 2(40) = 80. In summary, the first five terms of the sequence are 5, 10, 20, 40, 80.
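The calculations in Example 1 can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the original example; the helper name `first_terms` is our own choice, and rounding is used in part b only to remove floating-point noise from the sine values.

```python
import math

def first_terms(general_term, count=5):
    """Return [a_1, ..., a_count] for a sequence given by its general term."""
    return [general_term(n) for n in range(1, count + 1)]

# Parts a-c: sequences specified by a general term.
terms_a = first_terms(lambda n: n)
terms_b = first_terms(lambda n: round(math.sin(math.pi * n / 2)))  # round removes tiny float errors
terms_c = first_terms(lambda n: n / (1 + n))

# Part e: recursive formula a_1 = 5, a_{n+1} = 2 a_n.
terms_e = [5]
while len(terms_e) < 5:
    terms_e.append(2 * terms_e[-1])

print(terms_a)  # [1, 2, 3, 4, 5]
print(terms_b)  # [1, 0, -1, 0, 1]
print(terms_e)  # [5, 10, 20, 40, 80]
```

Note the difference in structure: parts a to c evaluate a formula at each n independently, while part e must build each term from the one before it.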

Figure 1.49: Graphs of sequences. Panels: a. sequence $\{n\}$; b. sequence $\{\sin\frac{\pi n}{2}\}$; c. sequence $\{\frac{n}{1+n}\}$; d. nth decimal place in π
To visualize a sequence, one can graph the sequence of points
(1, a1 ), (2, a2 ), (3, a3 ), . . .
in the coordinate plane. The first 10 terms of the first four sequences from Example 1 are graphed in Figure 1.49.
Since the domain consists of the natural numbers, the graph consists of discrete points.
Difference Equations
Beyond specifying a sequence by its general term, sequences can also be generated term by term using a rule called a
difference equation that specifies how to calculate each term in the sequence from the values of preceding terms.
For example, the difference equation
$a_{n+1} = r a_n$ and $a_1 = r$
generates the geometric sequence
$a_1 = r$
$a_2 = r a_1 = r^2$
$a_3 = r a_2 = r^3$
$a_4 = r a_3 = r^4$
$\vdots$
$a_n = r a_{n-1} = r^n$
$\vdots$
Similarly, the difference equation

$a_{n+1} = a_n + d$ and $a_1 = d$
generates the arithmetic sequence
$a_1 = d$
$a_2 = a_1 + d = 2d$
$a_3 = a_2 + d = 3d$
$a_4 = a_3 + d = 4d$
$\vdots$
$a_n = nd$
$\vdots$
More generally, for any real-valued function f , difference equations of the form
an+1 = f (an )
allow us to describe how quantities evolve over discrete intervals of time. For example, the geometric sequence
generator an+1 = (1 + k/100)an describes how our money will grow each week in the bank if we initially invest
a1 dollars and the weekly interest rate is k%. This same equation could describe the weekly growth of a bacterial
culture in a laboratory, or even a population of California condors that had been reintroduced to a wild area where
they had previously gone extinct from use of the pesticide DDT prior to a ban in 1972.
From a modeling perspective, discrete intervals of time implied by the iteration of the difference equation (e.g.
daily, weekly, or annual growth rules) correspond either to synchronized events of the system (e.g. daily injections of
a drug, annual reproductive cycles in a population) or intervals separating experimental measurements of the system
(e.g. daily blood cell counts, annual population counts). To fully define the sequence, it is necessary to specify the
initial value a1 after which the recursive formula defines the rest of the sequence inductively. Hence, for difference
equations, the value of the variable a1 determines all future values an , n = 1, 2, 3, . . ..
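Since a difference equation together with an initial value determines the whole sequence, iterating it is easy to automate. The sketch below is one possible way to do this in Python; the function name `iterate` and the values r = 3 and d = 4 are illustrative assumptions, not taken from the text.

```python
def iterate(f, a1, count):
    """Return [a_1, ..., a_count] where a_{n+1} = f(a_n)."""
    terms = [a1]
    for _ in range(count - 1):
        terms.append(f(terms[-1]))
    return terms

r, d = 3, 4  # illustrative values only

geometric = iterate(lambda a: r * a, r, 5)   # a_{n+1} = r a_n, a_1 = r
arithmetic = iterate(lambda a: a + d, d, 5)  # a_{n+1} = a_n + d, a_1 = d

print(geometric)   # [3, 9, 27, 81, 243]
print(arithmetic)  # [4, 8, 12, 16, 20]
```

The output agrees with the general terms $r^n$ and $nd$ found above for the geometric and arithmetic sequences.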
Example 2. The difference equation implicit in taking repeated square roots
Enter any nonzero number into your calculator. Press the “square root” (√) key and record your answer. Press
again and record repeatedly. Let $a_n$ denote the nth number displayed on the screen.
a. Find a recursive formula for an .
b. Graph the first 20 terms of the sequence when a1 = 4. Discuss what happens to an as n gets very large.
c. Graph the first 20 terms of the sequence when a1 = 0.1. Discuss what happens to an as n gets very large.
d. What happens when a1 = 1?
Solution.
a. For any selected value $a_1$, after pressing the square root key, the number $a_2 = \sqrt{a_1}$
is obtained. Similarly, after the second iteration the number $a_3 = \sqrt{a_2} = \sqrt{\sqrt{a_1}}$ is obtained. Proceeding
inductively yields
$a_{n+1} = \sqrt{a_n}$.
Thus the recursive formula in this case is $a_{n+1} = f(a_n)$ with $f(x) = \sqrt{x}$.
b. Plotting the first 20 terms of the sequence with $a_1 = 4$ yields
[Plot: $a_n$ against n for n = 1, . . . , 20]
This plot suggests that as n gets larger, $a_n$ decreases toward the value 1 (but never equals 1).
c. Plotting the first 20 terms of the sequence with $a_1 = 0.1$ yields
[Plot: $a_n$ against n for n = 1, . . . , 20; vertical axis from 0 to 1]
This plot suggests that as n gets larger, $a_n$ increases toward the value 1.
d. If $a_1 = 1$, then $a_2 = \sqrt{a_1} = \sqrt{1} = 1$. Proceeding inductively, we get that $a_n = 1$ for all integers n ≥ 1.
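The “technology” used in Example 2 can be as simple as a short loop. The following Python sketch (the function name `repeated_sqrt` is our own) computes the first 20 terms for both starting values, so that the behavior described in parts b and c can be inspected numerically.

```python
import math

def repeated_sqrt(a1, count=20):
    """Return [a_1, ..., a_count] where a_{n+1} = sqrt(a_n)."""
    terms = [a1]
    for _ in range(count - 1):
        terms.append(math.sqrt(terms[-1]))
    return terms

from_above = repeated_sqrt(4.0)  # part b: starts above 1
from_below = repeated_sqrt(0.1)  # part c: starts below 1

print(from_above[:2])   # [4.0, 2.0]
print(from_above[-1])   # slightly larger than 1
print(from_below[-1])   # slightly smaller than 1
```

In both cases the twentieth term differs from 1 by only a few millionths, which is consistent with the plots.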
Difference equations can be used to model a variety of biological phenomena. The next two examples illustrate
their usage in modeling drug dosages and the purging of a lethal recessive gene from a population.
Example 3. Drug delivery
For regular strength Tylenol, the directions recommend taking 2 tablets every 4 to 6 hours and not taking more
than 5 tablets in 24 hours. Each tablet contains 325 mg of Acetaminophen. Suppose Professor Schreiber takes 2
tablets every 4 hours. According to the Handbook of Basic Pharmacokinetics, 2nd Edition, approximately 67% of the
drug is removed from the body every 4 hours. To model the amount of drug in Professor Schreiber’s body, let $x_n$ be
the amount of drug in his body right before taking the nth dose.
a. Write down a difference equation for $x_n$.
b. Find $x_1$, $x_2$, $x_3$.
c. What is the maximum amount of Acetaminophen in Professor Schreiber’s body during the first 12 hours
of taking Tylenol?
d. Suppose, contrary to the directions, Professor Schreiber kept on taking his dosage for several days. What
value does $x_n$ seem to approach?
Solution.

a. If $x_n$ is the amount of drug in the body just before taking the nth dose, then the amount of drug in the body
after taking the nth dose is $x_n + 650$ mg. Since 67% of the drug leaves the body in 4 hours, the amount
of drug left in the body before taking the next dose is $(1 - 0.67)(x_n + 650) = 0.33x_n + 214.5$. Therefore,
$x_{n+1} = 0.33x_n + 214.5$.
b. Without being told, there is no way for us to know what the value of $x_1$ is. The most reasonable approach is for
us to presume that before taking the first dose Professor Schreiber has no Acetaminophen in his body,
in which case $x_1 = 0$. In this case, for n = 2 and n = 3, we obtain $x_2 = 0.33 \cdot 0 + 214.5 = 214.5$ mg and
$x_3 = 0.33(214.5) + 214.5 = 285.285$ mg.
c. The maximum amount of Acetaminophen in the body occurs right after taking a dosage. The amounts
of Acetaminophen in the body after taking the first, second, and third dosages are 650 mg, 214.5 + 650 =
864.5 mg, and 285.285 + 650 = 935.285 mg. Hence, the maximum is 935.285 mg.
d. Computing $x_n$ for n = 1, 2, . . . , 20 yields the table of values
 n    x_n        n    x_n
 1      0       11   320.14
 2   214.50     12   320.15
 3   285.29     13   320.15
 4   308.64     14   320.15
 5   316.35     15   320.15
 6   318.90     16   320.15
 7   319.74     17   320.15
 8   320.01     18   320.15
 9   320.10     19   320.15
10   320.13     20   320.15
This table suggests that $x_n$ is approaching a value that rounded to two decimal places is 320.15 mg.
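The table in part d can be reproduced by iterating the difference equation from part a. Here is a minimal Python sketch under the same assumptions as the example (650 mg per dose, 67% removed between doses, $x_1 = 0$); the constant and function names are our own.

```python
DOSE_MG = 650.0          # two 325 mg tablets per dose
FRACTION_REMOVED = 0.67  # fraction of the drug cleared between doses

def amounts_before_doses(count, x1=0.0):
    """Return [x_1, ..., x_count], the amount (mg) just before each dose."""
    xs = [x1]
    for _ in range(count - 1):
        # Take a dose, then lose 67% of the total before the next dose.
        xs.append((1 - FRACTION_REMOVED) * (xs[-1] + DOSE_MG))
    return xs

xs = amounts_before_doses(20)
for n, x in enumerate(xs, start=1):
    print(f"{n:2d} {x:8.2f}")

# Part c: the peak occurs right after one of the first three doses.
peak_first_12_hours = max(x + DOSE_MG for x in xs[:3])
print(peak_first_12_hours)
```

Changing `DOSE_MG` or `FRACTION_REMOVED` shows how the long-run value depends on the dose and the clearance rate.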
The difference equation $x_{n+1} = 0.33x_n + 214.5$ in Example 3 is an example of a linear difference equation:
the right hand side of the difference equation depends linearly on xn . In the problem set, you are asked to write
down explicit solutions for linear difference equations. Another difference equation for which explicit solutions can
be written down is presented in the next example.
In the next example, through the formulation of an appropriate difference equation model, we move on to
considering how the frequency of genes that influence the reproductive fitness of individuals changes over time. In
particular, we trace the fate of genes that code for traits relating to diseases, such as Tay-Sachs or cystic fibrosis, that
have a lethal effect when the disease goes untreated. The models we use are based on the concept of a gene residing
at a locus in an individual’s genome and we talk about alleles at this locus, one of which is responsible for the disease
in question while the others are not. Most loci in vertebrates (except for those related to sex determination or sex
related traits) represent a pair of alleles because the organisms are genetically diploid. If there are only two possible
alleles A and a that can pair up, each individual can only be one of three possible genotypes for the locus in question:
namely AA, Aa or aa. Note that we do not distinguish between Aa and aA unless it becomes important to know
which allele an individual inherited from which of its two parents.
If the frequencies of alleles A and a in a population are x and 1 − x, respectively (0 ≤ x ≤ 1; note that the
frequencies add to one, thereby implying no other alleles are around at the locus in question), and individuals are
equally likely to get either of the two possible alleles from each of their parents, then one can use elementary probability
theory to show that the frequencies of genotypes among the progeny are $x^2$ for AA, $2x(1 - x)$ for Aa (the 2 arises
because Aa and aA are the same genotype), and $(1 - x)^2$ for aa. This accounts for all possible genotypes, which we
check by adding these three genotype frequencies to obtain
$x^2 + 2x(1 - x) + (1 - x)^2 = x^2 + 2x - 2x^2 + 1 - 2x + x^2 = 1$.

Figure 1.50: Rate of decline of a recessive lethal gene over n generations when initially at a frequency of 0.5 in the
population (vertical axis: $x_n$, from 0 to 0.5; horizontal axis: n, from 0 to 100)
Example 4. Lethal recessive genes
Suppose a disease in humans is primarily due to the existence of a lethal recessive allele a. By lethal recessive we
mean that individuals of type aa die from the disease, while individuals of type AA and Aa are not affected by the
disease.
a. To understand how the frequency of lethal recessive genes changes over time, let $x_n$ denote the fraction of
a alleles in the population at time n. At each time step, assume that alleles pair up randomly (i.e. mating
occurs), the pairs aa die, and the pairs Aa and AA produce an extra copy of themselves. Assuming a
population size of 500 (i.e. 1,000 alleles), write down a difference equation for $x_n$.
b. Does the difference equation obtained apply irrespective of the population size?
c. Suppose in a population of 500 individuals (i.e. 1,000 alleles), there are initially 500 copies of the lethal
recessive allele a. Use technology to plot $x_n$ for n = 1, . . . , 100.
Solution.
a. If alleles are pairing up randomly at time n, then we expect that the fraction of aa pairs is $x_n^2$, the fraction
of Aa pairs is $2x_n(1 - x_n)$, and the fraction of AA pairs is $(1 - x_n)^2$. Hence, we expect that the numbers of
aa, Aa, and AA pairs are $500x_n^2$, $1000x_n(1 - x_n)$, and $500(1 - x_n)^2$. Since the Aa and AA pairs produce an extra
copy of themselves and the aa’s die, we are left with 0, $2000x_n(1 - x_n)$, and $1000(1 - x_n)^2$ individuals of
types aa, Aa, and AA. Hence, we expect that the number of a alleles is $2000x_n(1 - x_n)$ and the number
of A alleles is $2000x_n(1 - x_n) + 2 \cdot 1000(1 - x_n)^2$. Therefore, the fraction of a alleles in the next time step
is
$x_{n+1} = \dfrac{\text{number of } a \text{ alleles}}{\text{total number of alleles}}$
$= \dfrac{2000x_n(1 - x_n)}{4000x_n(1 - x_n) + 2000(1 - x_n)^2}$
$= \dfrac{x_n}{2x_n + (1 - x_n)}$
$= \dfrac{x_n}{1 + x_n}$

Figure 1.51: Data on the decline of the Glued gene in fruit flies compared with the expected rate predicted by the
theoretical model in Example 4
b. If the population consists of N rather than 500 individuals, then going through the same reasoning as in
a., the equation
$x_{n+1} = \dfrac{4N x_n(1 - x_n)}{8N x_n(1 - x_n) + 4N(1 - x_n)^2}$
$= \dfrac{4N x_n}{4N(2x_n + (1 - x_n))}$
$= \dfrac{x_n}{1 + x_n}$
is obtained, which is independent of N.
c. We have initially $x_1 = \frac{500}{1000} = 0.5$. Using technology to compute $x_2, \ldots, x_{100}$, we get the plot illustrated
in Figure 1.50. This plot illustrates two things. First, when the initial frequency of the lethal allele is high,
the frequency of this lethal allele initially decreases very rapidly. Second, as the frequency of the allele
gets low, it decreases much less rapidly. For instance, in the problem set you will be asked to show that
it takes approximately 1000 time steps for the alleles to reach a frequency of 0.1%.
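One way to carry out the computation in part c is sketched below in Python. Exact rational arithmetic (the standard `fractions` module) is used so that no rounding error accumulates over 100 steps; the function name `allele_frequencies` is our own.

```python
from fractions import Fraction

def allele_frequencies(x1, count):
    """Return [x_1, ..., x_count] where x_{n+1} = x_n / (1 + x_n)."""
    xs = [x1]
    for _ in range(count - 1):
        xs.append(xs[-1] / (1 + xs[-1]))
    return xs

xs = allele_frequencies(Fraction(1, 2), 100)  # x_1 = 500/1000

# Frequencies after 1, 9, and 99 time steps.
for n in (2, 10, 100):
    print(n, float(xs[n - 1]))
```

Plotting the pairs (n, $x_n$) from this list gives a curve like the one in Figure 1.50: a steep initial drop followed by a very slow decline.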
In Figure 1.51, experiments on the fruit fly show that the difference equation $x_{n+1} = \dfrac{x_n}{1+x_n}$ does a reasonable
job of describing observed frequencies of the lethal allele, Glued, in fruit flies. The observed trajectories illustrate
that even if you start with the same initial conditions (i.e. 50% Glued ), random birth and death events can result
in different experimental trajectories. Hence, the model can only be expected to describe what happens for the
“average” experiment.
In Examples 2 and 4 we saw that for certain initial values the difference equations generating the sequences
in question produced a string of constant values. Specifically, in Example 2 the difference equation $a_{n+1} = \sqrt{a_n}$
produced the sequence 1, 1, 1, . . . for $a_1 = 1$ (i.e. the square root of 1 is 1), and in Example 4 the difference equation
$x_{n+1} = \dfrac{x_n}{1+x_n}$ produced the sequence 0, 0, 0, . . . when $x_1 = 0$ (i.e. if the lethal allele is initially not present, it never
appears). Such starting values are called equilibria for the equations in question.

Equilibrium An equilibrium of the difference equation $a_{n+1} = f(a_n)$ is an initial value $a_1$ such that
$f(a_1) = a_1$. From this it easily follows that $a_1 = a_2 = a_3 = \cdots$.
Example 5. Finding equilibria
Find the equilibria for the following three difference equations. Discuss how the answers you found relate to what
was observed in Examples 2, 3, and 4.
a. $a_{n+1} = \sqrt{a_n}$
b. $a_{n+1} = 0.33a_n + 214.5$
c. $x_{n+1} = \dfrac{x_n}{1+x_n}$
Solution.
a. To find the equilibria, we need to solve $a = \sqrt{a}$. Since the only numbers whose square roots are themselves
are 0 and 1, the equilibria for this difference equation are 0 and 1. In Example 2, we saw that for
various positive initial conditions, the sequence $a_n$ approaches the equilibrium 1 as n gets large.
b. To find the equilibria, we need to solve
$a = 0.33a + 214.5$
$0.67a = 214.5$
$a \approx 320.15$
In Example 3, we observed that for the initial condition $a_1 = 0$, the sequence $a_n$ approaches this equilibrium
value.
c. To find the equilibria, we need to solve $x = \dfrac{x}{1+x}$. One solution to this equation is x = 0. Any other
solution must satisfy $1 = \dfrac{1}{1+x}$. Cross multiplying yields 1 + x = 1, which again gives x = 0. Hence, x = 0 is the only equilibrium.
In Example 4 it appeared the sequence corresponding to $x_1 = 0.5$ might be approaching this equilibrium.
However, since the approach seems quite slow, it is not obvious whether $x_n$ becomes arbitrarily close to
zero.
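A candidate equilibrium can always be checked by substituting it back into the difference equation. The Python sketch below does this numerically for the equations of Examples 2, 3, and 4 (the drug equation is the one derived in Example 3, $x_{n+1} = 0.33x_n + 214.5$); the helper name `is_equilibrium` and the tolerance are our own assumptions.

```python
import math

def is_equilibrium(f, a, tol=1e-9):
    """True when f(a) equals a up to the tolerance, i.e. a is a fixed point of f."""
    return abs(f(a) - a) <= tol

def drug(x):
    return 0.33 * x + 214.5  # Example 3

def allele(x):
    return x / (1 + x)       # Example 4

print(is_equilibrium(math.sqrt, 0.0), is_equilibrium(math.sqrt, 1.0))  # True True
print(is_equilibrium(drug, 214.5 / 0.67))                              # True
print(is_equilibrium(allele, 0.0), is_equilibrium(allele, 0.5))        # True False
```

A check of this kind confirms that a value is an equilibrium, but it says nothing about whether nearby sequences approach it.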
In the next chapter, we explore more carefully the question raised in Example 5 of whether the sequences approach
the identified equilibria. The following example illustrates that an equilibrium is not always approached.
Example 6. Generalized Beverton-Holt Dynamics
In 1981, Thomas Bellows investigated how the survivorship of different species of stored grain beetles depended
on the population abundance x. Some of the data from this experiment are illustrated in Figure 1.52.∗ Bellows
showed that the function $s(x) = \dfrac{1}{1+(ax)^b}$, with x corresponding to population density, a > 0, and b > 0, could describe
all of these data sets. Here s(x) describes the fraction of individuals surviving as a function of population abundance. If
r > 0 is the average number of progeny produced by an individual, then the population dynamics of the grain beetles
can be given by
$x_{n+1} = \dfrac{r x_n}{1 + (a x_n)^b}$,
where $x_n$ is the population density in generation n.
∗ After Bellows, T. S. 1981. “The Descriptive Properties of Some Models for Density Dependence”. Journal of Animal Ecology, Vol.
50, No. 1. pp. 139-156.

Figure 1.52: The relationship between number of survivors and initial egg density for four species of stored product
beetles.
a. For r = 2 and a = 0.01, find the equilibria of the model.
b. For b = 3 and b = 6, compute and graph the first 50 terms of the sequence determined by the initial
condition $x_1 = 99$. Compare the sequences obtained for b = 3 and b = 6. (Assume r and a have the same
values as in part a.)
Solution.
a. To find the equilibria, we need to solve
$x = \dfrac{2x}{1 + (x/100)^b}$
for x. One solution is x = 0. For x ≠ 0, we obtain
$1 = \dfrac{2}{1 + (x/100)^b}$
$1 + (x/100)^b = 2$
$(x/100)^b = 1$
$x/100 = 1$
$x = 100$
Thus, x = 100 is an equilibrium value irrespective of the value of b > 0.
b. Using technology for b = 3,
$x_{n+1} = \dfrac{2x_n}{1 + (x_n/100)^3}$
and $x_1 = 99$ yields
[Plot: $x_n$ against n for n = 1, . . . , 50; vertical axis from 99 to 100.5]
It appears that the sequence is approaching the equilibrium value of x = 100.

Using technology for b = 6,
$x_{n+1} = \dfrac{2x_n}{1 + (x_n/100)^6}$
and $x_1 = 99$ yields
[Plot: $x_n$ against n for n = 1, . . . , 50; vertical axis from 40 to 130]
Despite starting near the equilibrium abundance of x = 100, this sequence exhibits oscillatory bursts of
population growth and decline without any other characterizable pattern of behavior. In Chapter 4, we
will discuss methods to distinguish between these different outcomes.
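The two computations in part b can be sketched in Python as follows. The function name `beverton_holt` and its default arguments (r = 2, a = 0.01, 50 terms) are our own packaging of the values given in the example.

```python
def beverton_holt(x1, b, r=2.0, a=0.01, count=50):
    """Return [x_1, ..., x_count] where x_{n+1} = r x_n / (1 + (a x_n)^b)."""
    xs = [x1]
    for _ in range(count - 1):
        x = xs[-1]
        xs.append(r * x / (1 + (a * x) ** b))
    return xs

xs_b3 = beverton_holt(99.0, b=3)
xs_b6 = beverton_holt(99.0, b=6)

print(abs(xs_b3[-1] - 100))     # distance from the equilibrium after 50 terms (b = 3)
print(min(xs_b6), max(xs_b6))   # range visited by the sequence (b = 6)
```

For b = 3 the distance from 100 shrinks at every step, while for b = 6 the sequence moves away from 100 and ranges widely, in line with the two plots.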
Cobwebbing
Another way to visualize sequences determined by a difference equation
an+1 = f (an )
is via a graphic technique known as cobwebbing. To cobweb you begin by graphing the functions y = f (x) and
y = x in the xy-plane, and choose an initial condition a1 for the sequence. To visualize the sequence determined
by this initial condition, start at the point (a1 , a1 ). Draw a vertical line segment from (a1 , a1 ) to (a1 , f (a1 )). Draw
a horizontal line segment from (a1 , f (a1 )) to (f (a1 ), f (a1 )). Since a2 = f (a1 ), these line segments bring us to the
second value of the sequence. Repeating this procedure will generate more terms of the sequence, as illustrated by the following example.

Example 7. Cobwebbing square roots
Consider the difference equation $a_{n+1} = f(a_n)$ where $f(x) = \sqrt{x}$. Use cobwebbing to visualize the first ten terms
of the sequence determined by $a_1 = 4$ and $a_1 = 0.1$.
Solution. We begin with the graphs of $y = \sqrt{x}$ and $y = x$.
[Plot: graphs of $y = \sqrt{x}$ and $y = x$ for 0 ≤ x ≤ 4]
To visualize the first two terms of the sequence, we start at the point (4, 4) and draw a vertical line down to the
graph of y = f (x) followed by a horizontal line to the graph of y = x.
To visualize the next term, draw a vertical line down from (2, 2) to the graph of y = f (x) followed by a horizontal line
to the graph of y = x.
[Plot: cobweb diagram after two steps, starting from (4, 4)]
Proceeding in this manner for seven more iterates gives the following cobweb figure:
[Plot: cobweb diagram for the first ten terms with $a_1 = 4$]
This figure shows that the $a_n$ values, read down the diagonal y = x, are getting closer to the value 1, as we
found in Example 2.
To visualize the first ten terms of the sequence with a1 = 0.1, start at (0.1, 0.1), draw a vertical line to the graph
of y = f (x) and then a horizontal line to the graph of y = x.
[Plot: first cobweb step from (0.1, 0.1); both axes from 0 to 1]
Continuing, the cobwebbing shows that the sequence of $a_n$ values is getting closer to the value 1, as we found before.
[Plot: cobweb diagram for the first ten terms with $a_1 = 0.1$; both axes from 0 to 1]
Cobwebbing an increasing function, such as the square-root function, is relatively simple. The cobweb diagram,
as illustrated in the next example, gets more complicated when the function is not increasing.
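The cobwebbing procedure described above is mechanical enough to automate. The Python sketch below generates the corner points of a cobweb path; the function name `cobweb_points` is our own, and the resulting list could be handed to any plotting tool to draw the line segments.

```python
import math

def cobweb_points(f, a1, steps):
    """Return the corner points of the cobweb path starting at (a1, a1).

    Each step adds a vertical move to the graph of y = f(x) and then a
    horizontal move back to the diagonal y = x.
    """
    points = [(a1, a1)]
    a = a1
    for _ in range(steps):
        nxt = f(a)
        points.append((a, nxt))    # vertical segment ends on y = f(x)
        points.append((nxt, nxt))  # horizontal segment ends on y = x
        a = nxt
    return points

path = cobweb_points(math.sqrt, 4.0, 2)
print(path[:3])  # [(4.0, 4.0), (4.0, 2.0), (2.0, 2.0)]
```

The points on the diagonal, (4, 4), (2, 2), and so on, carry the successive terms of the sequence.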
Example 8. Cobwebbing a hump shaped function
Use cobwebbing to visualize the first 40 terms of the sequence determined by the equation
$a_{n+1} = \dfrac{3a_n}{1 + (a_n/100)^6}$
from starting value a1 = 50. Discuss the primary difference between this example and Example 7.

Solution. We begin by drawing the graphs of $y = f(x) = \dfrac{3x}{1+(x/100)^6}$ and $y = x$.
[Plot: graphs of y = f (x) and y = x for 0 ≤ x ≤ 175]
To visualize the first two terms of the sequence, start at (50, 50) and draw a vertical line up to the graph of y = f (x)
followed by a horizontal line to the graph of y = x.
[Plot: first cobweb step from (50, 50)]
To visualize the next term, we draw a vertical line down from (150, 150) to the graph of y = f (x) followed by a horizontal
line to the graph of y = x.
[Plot: cobweb diagram after two steps]
Unlike our previous cobwebbing, we see that the sequence is already exhibiting some oscillation. In fact, continuing
for the remaining 37 terms yields the following wild web:
[Plot: cobweb diagram for the first 40 terms with $a_1 = 50$]
The previous examples indicate that equilibria occur at the intersection points of the graphs of y = f (x) and y = x, as summarized in the
following box.
Finding Equilibria Graphically To find equilibria of $a_{n+1} = f(a_n)$, it suffices to look for intersection points of the
graphs of y = x and y = f (x).
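An intersection read off a graph can be refined numerically by looking for a sign change of f(x) − x. The sketch below uses bisection, a standard root-finding technique that is not discussed in this section, applied to the function from Example 8; the function name `find_equilibrium` and the bracketing interval [50, 150] are our own choices.

```python
def find_equilibrium(f, lo, hi, tol=1e-10):
    """Locate a solution of f(x) = x in [lo, hi] by bisection.

    Assumes f(x) - x has opposite signs at lo and hi.
    """
    g_lo = f(lo) - lo
    g_hi = f(hi) - hi
    if g_lo * g_hi > 0:
        raise ValueError("f(x) - x must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = f(mid) - mid
        if g_lo * g_mid <= 0:
            hi = mid               # sign change lies in the left half
        else:
            lo, g_lo = mid, g_mid  # sign change lies in the right half
    return (lo + hi) / 2

def hump(x):
    return 3 * x / (1 + (x / 100) ** 6)  # function from Example 8

positive_equilibrium = find_equilibrium(hump, 50.0, 150.0)
print(positive_equilibrium)
```

Solving $3/(1 + (x/100)^6) = 1$ by hand gives $x = 100 \cdot 2^{1/6}$, which provides an independent check on the printed value.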
Example 9. Demise of a whale population
Whales have difficulty finding mates in the vast oceans of the world when their population numbers drop below
a critical value. Thus a model of the growth of whale populations from one whale generation to the next is going to
be relatively more robust at intermediate whale densities, than at low densities when finding mates is a problem, or
high densities when competition for food is a problem. The form of the function f in the difference equation
an+1 = f (an )
that reflects the above properties is illustrated in Figure 1.53, where an is the density of the whales in generation n
(units are whales per 1000 sq km).
Figure 1.53: A function, f , modeling the growth of a hypothetical whale population (both axes run from 0 to 200)
a. Estimate the equilibria of the difference equation.
b. Use cobwebbing to determine the fate of the whales when a1 = 45.
c. Use cobwebbing to determine the fate of the whales when a1 = 55.

Solution.
a. Since y = x intersects y = f (x) roughly at the values 0, 50 and 200, the equilibria of this difference
equation are 0, 50, and 200.
b. Cobwebbing from the point (45, 45) yields the following graph:
[Plot: cobweb diagram starting at (45, 45); both axes from 0 to 70]
Hence, it appears that if the initial density of whales is below 50 per 1,000 km$^2$, then the population dies
out.
c. Cobwebbing from the point (55, 55) yields the following graph:
[Plot: cobweb diagram starting at (55, 55); both axes from 0 to 200]
Hence, it appears that if the initial density of whales is above 50 per 1,000 km$^2$, then the population
approaches a density of 200 per 1,000 km$^2$.
Problem Set 1.7

LEVEL 1 – DRILL
Find and graph the first five terms for the sequences in Problems 1 to 10.
1. $a_n = 1 - \dfrac{1}{n}$
2. $a_n = (-1)^{n+1}$
3. $a_n = \cos\dfrac{\pi n}{2}$
4. $a_n = \dfrac{\cos(2n\pi)}{n}$
5. $a_n$ is the nth digit of the decimal representation of the number $\frac{1}{7}$.
6. $a_n$ is the nth digit of e.
7. $a_1 = 256$, $a_{n+1} = \sqrt{a_n}$
8. $a_1 = 2$, $a_{n+1} = a_n^2$, n ≥ 2
9. $a_1 = -4$, $a_2 = 6$, $a_n = a_{n-1} + a_{n-2}$, n ≥ 3
10. $a_1 = 1$ and $a_2 = 2$, $a_{n+1} = a_n a_{n-1}$, n ≥ 3
Find a4 for each difference equation in Problems 11 to 20.
11. $a_{n+1} = a_n + 8$; $a_1 = 0$
12. $a_{n+1} = 3a_n$; $a_1 = 0$
13. $a_n = \frac{1}{2}a_{n-1} + 2$; $a_1 = 100$
14. $a_n = \frac{1}{10}a_{n-1} + 2$; $a_1 = 1{,}000$
15. $a_{n+1} = 5a_n + 2$; $a_1 = 0$
16. $a_{n+1} = 2a_n + 1$; $a_1 = 8$
17. $a_{n+1} = 1 - 2a_n$; $a_1 = 0$
18. $a_{n+1} = 1 - \frac{1}{2}a_n$; $a_1 = 0$
19. $a_{n+1} = \dfrac{2a_n}{1 + 0.01a_n}$; $a_1 = 1$
20. $a_{n+1} = 2a_n(1 - a_n)$; $a_1 = 1$
Find the equilibria of an+1 = f (an ) and sketch cobwebbing diagrams for the values of a1 given in Problems 21 to 26.
21. $f(x) = 2x(1 - x)$ with $a_1 = 0.1$
22. $f(x) = x(2 - x)$ with $a_1 = 0.4$
23. $f(x) = \dfrac{3x}{1+x}$ with $a_1 = 0.1$
24. $f(x) = \dfrac{3x}{1+x}$ with $a_1 = 3$
25. $f(x) = 1 + x/2$ with $a_1 = 0$
26. $f(x) = \dfrac{1}{1+x}$ with $a_1 = 3$
Find the equilibria of an+1 = f (an ) where the graph of y = f (x) is shown in Problems 27 to 30, and sketch the
cobwebbing diagrams starting with the given a1 value.
27. $a_1 = 1$
[Graph of y = f (x)]
28. $a_1 = -0.5$
29. $a_1 = -0.5$
30. $a_1 = 1$
[Graph of y = f (x)]
31. A drug is administered into the body. At the end of each hour, the amount of drug present is half what it was
at the end of the previous hour. What percent of the drug is present at the end of 4 hr? At the end of n hours?
32. The wildebeest (or gnu) is a dominant species in the Serengeti. The following data on wildebeest abundance
were collected by the Serengeti Research Institute.

year                         1961  1963  1965  1967  1971  1972  1977  1978
population (in thousands)     263   357   439   483   693   773  1444  1249
a. Assuming xn+1 = a xn can be used to model the data, approximate the value of a that gives the best
fit to the data.
b. Suppose poachers kill 10 thousand wildebeest per year. Write down a new difference equation for
the population.
c. Determine the effect of poaching on the wildebeest population in the next twenty years given x1 =
1249 thousand. Does the population survive?
33. Jackie Chan has a really bad headache. He decides to take 500 mg of aspirin every four hours. At the end of
each four hour period, the body clears out 80% of the aspirin in his body. Let $a_n$ denote the amount of aspirin
in Jackie Chan’s body right before he takes the nth aspirin.
a. Write down a difference equation for $a_n$ and identify the value of $a_1$.
b. Write down the first 5 terms of $a_n$.
c. Find the equilibrium of this difference equation.
34. The Ricker model in population dynamics is given by
$a_{n+1} = b\,a_n e^{-c a_n}$
where b is the total number of progeny produced per individual per generation and $e^{-c a_n}$ represents the fraction
of progeny that survive cannibalism. Find all the equilibria for this model and determine under what conditions
they are positive. Sketch cobwebbing diagrams for b = 0.9, b = 2.0, b = 8.0, and b = 20.0. In these diagrams,
let c = 1.0 and a1 = 2.
35. Continued Fractions A simple continued fraction is an expression of the form
$b_0 + \cfrac{1}{b_1 + \cfrac{1}{b_2 + \cfrac{1}{b_3 + \cdots}}}$
where $b_0, b_1, \ldots$ are real numbers. The simplest continued fraction occurs when $1 = b_0 = b_1 = b_2 = \ldots$. This
continued fraction is generated by the sequence
$a_{n+1} = 1 + \dfrac{1}{a_n}$, $a_1 = 1$
a. Find the first five terms of this sequence in “expanded form” (i.e. no algebraic reductions) and in
simplified form.
b. Find the equilibria of the difference equation.
c. Use cobwebbing to determine the asymptotic behavior of an .

36. Historical Quest Leonardo de Pisa (1170-1250), also known as Fibonacci, was one of the best mathematicians of the Middle Ages. He played
an important role in reviving ancient mathematics and introduced the Hindu-Arabic place-value decimal system
to Europe. His book, Liber Abaci, published in 1202, introduced the Arabic numerals, as well as the famous
rabbit problem, for which he is best remembered today. To describe Fibonacci’s rabbit problem, we consider a
sequence whose nth term is defined by a difference equation. Suppose rabbits breed in such a way that each
pair of adult rabbits produces a pair of baby rabbits each month.
The first month after birth, the rabbits are adolescents and produce no offspring. However, beginning the
second month, the rabbits are adults, and each pair produces a pair of offspring every month. The sequence
of numbers describing the number of rabbits is called the Fibonacci sequence, and it has applications in many
areas, including biology and botany.
In this Historical Quest you are to examine some properties of the Fibonacci sequence. Let $a_n$ denote the
number of pairs of rabbits in the colony at the end of n months.
a. Explain why $a_1 = a_2 = 1$, $a_3 = 2$, $a_4 = 3$, and, in general,
$a_{n+1} = a_{n-1} + a_n$
for n = 2, 3, 4, . . .
b. The growth rate of the colony during the (n + 1)st month is
$r_n = \dfrac{a_{n+1}}{a_n}$
Compute $r_n$ for n = 1, 2, 3, . . . , 10.
c. Show that $r_n$ satisfies the difference equation $r_{n+1} = 1 + \dfrac{1}{r_n}$ (Hint: combine the difference equations
in parts (a) and (b)) and solve for the equilibrium of this difference equation.
37. Consider the difference equation xn+1 = rxn . If x1 is given and r is given, find an explicit expression for xn .
38. Consider the difference equation $x_{n+1} = a + bx_n$. If $x_1$ is given and a and b are given, find an explicit expression for
$x_n$.
39. Consider the difference equation $x_{n+1} = \dfrac{x_n}{1+x_n}$. Let $x_1$ be given.
a. Write explicit expressions for x2 , x3 , x4 , and x5 in terms of x1 .
b. Use (a) to find a reasonable guess for an explicit expression of xn in terms of x1 .
c. Verify your guess by making sure it satisfies the difference equation.
40. A biologist discovers that a particular gene has an allele a that differs from the usual recessive lethal: as
expected, genotypes of the form aa all die before reproducing, but only half the genotypes of the form Aa also
die before reproducing.
a. Show in contrast to Example 4 that the difference equation describing the frequency xn of the lethal
gene from one generation to the next is now given by the difference equation
$x_{n+1} = \dfrac{x_n}{2}$
b. Calculate the first 10 terms of the resulting sequence starting from x1 = 0.5.
c. Find all equilibrium solutions.
d. Compare the sequence you obtain in b. with the first 10 terms of the sequence obtained in Example 4
(you are going to have to calculate these yourself). What do you notice about how rapidly the allele
disappears?
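41. Repeat Problem 40 for an allele a such that, as expected, genotypes of the form aa all die before reproducing,
but only two thirds rather than all the genotypes of the form Aa die before reproducing. In this case the
difference equation is
$x_{n+1} = \dfrac{x_n}{3 - x_n}$
Compare the sequence you obtain with the first 10 terms of the sequence obtained in Example 4
(you are going to have to calculate these yourself). What do you notice about how rapidly the allele
disappears?
42. Repeat Problem 40 for an allele a such that, as expected, genotypes of the form aa all die before reproducing,
but only one third rather than all the genotypes of the form Aa die before reproducing. In this case the
difference equation is
$x_{n+1} = \dfrac{2x_n}{3 + x_n}$
Compare the sequence you obtain with the first 10 terms of the sequence obtained in Example 4
(you are going to have to calculate these yourself). What do you notice about how rapidly the allele
disappears?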
43. Compare the first 10 terms of the sequences obtained from the difference equations derived in Example 4 and
in Problems 40, 41, and 42. What do you conclude about the effect of a lethal allele in the population
when it has a partial effect on the genotypes Aa? What happens when the lethal allele kills all Aa genotypes
before they have a chance to reproduce?

1.8 Summary and Review

Definitions
Section 1.1
Model, p. 5
Limit, p. 7
Derivative, p. 233
Integral, p. 8
Tangent line, p. 222
Differential calculus, p. 9
Integral calculus, p. 11
Section 1.2
Natural number, p. 17
Whole number, p. 17
Integer, p. 17
Rational number, p. 18
Real number, p. 18
Open interval, p. 18
Closed interval, p. 18
Function, p. 19
Domain, p. 19
Range, p. 19
Image, p. 19
Vertical line test, p. 24
Piecewise function, p. 26
Absolute value functions, p. 26
Increasing function, p. 27
Decreasing function, p. 27
Constant function, p. 27
Section 1.3
Range, p. 19
Linear function, p. 40
Slope, p. 40
y-intercept, p. 40
Best-fitting line, p. 42
Linear regression, p. 44
Period, p. 47
Amplitude, p. 47
Section 1.4
Power function, p. 57
Base, p. 57
Exponent, p. 57
Proportionality constant, p. 57
Addition law of exponents, p. 59
Subtraction law of exponents, p. 59
Multiplication law of exponents, p. 59
Distributive laws of exponents, p. 59
Proportional, p. 60
Allometry, p. 66
Allometry rate, p. 66
Index of origin, p. 66

Family of functions, p. 68
Section 1.5
Exponential function, p. 76
Logarithm, p. 78
Argument of a logarithm, p. 78
Common logarithm, p. 78
Natural logarithm, p. 78
Additive law of logarithms, p. 79
Subtractive law of logarithms, p. 79
Multiplicative law of logarithms, p. 79
Change of base law, p. 79
Grant’s tomb laws, p. 79
Section 1.6
Vertical shift, p. 87
Horizontal shift, p. 87
Reflection, p. 88
Dilation, p. 90
Compression, p. 90
Composition, p. 94
Composite function, p. 95
Horizontal line test, p. 97
Inverse function, p. 98
Section 1.7
Sequence, p. 107
Recursive formula, p. 107
Difference equation, p. 108
Equilibria, p. 113
Cobweb, p. 116
Important Ideas
Section 1.1
Mathematical modeling, p. 5
Calculus, p. 7
Section 1.2
Function, p. 19
Used domain convention, p. 21
Vertical line test, p. 24
Classifications of functions, p. 27
Section 1.3
Data fitting, p. 40
Linear regression, p. 44
Slope formula, p. 40
Periodic function, p. 46
Section 1.4
Laws of exponents, p. 59
Rules of proportionality, p. 61
Allometric formula, p. 66
Section 1.5
Solving an exponential equation, p. 77
Evaluating a logarithm, p. 78
Laws of logarithms, p. 78
Section 1.6
Shifting, reflecting, and stretching, p. 87
Functional arithmetic, p. 91
Composing functions, p. 94
Inverse function, p. 98
Section 1.7
Difference equations, p. 108
Cobwebbing, p. 116
Finding equilibria, p. 120
Important Applications
Section 1.3
CO2 from electric power plants
CO2 concentrations in Hawaii
Section 1.4
Olympic weight lifting
Breaking bones
Section 1.5
U.S. population growth; Malthus’ estimate
Beer froth height decay
Half-life and doubling time
Kleiber’s data on metabolic rate
Section 1.6
Modeling tidal movements
Generalized Beverton-Holt model
Modeling whale growth from plankton density
Section 1.7
Generalized Beverton-Holt Dynamics
Genetic networks
CHAPTER 1 REVIEW QUESTIONS

Sketch the graph of the functions given in Problems 1 to 5.
1. a. $f(x) = x^2$, for all x
b. $f(x) = x^2$, $x \le 0$
2. a. $f(x) = 10^x$
b. $f(x) = \log x$
3. a. $f(x) = \ln x$
b. $f(x) = e^x$
4. a. $f(x) = \cos x$ on $[0, \pi]$
b. $f(x) = \sin x$ on $\left[\frac{\pi}{2}, \frac{3\pi}{2}\right]$
5. a. $\ln y(x) = 0.5 \ln x - 2.5$
b. $y = \sqrt{1 - x^2}$ on $(-1, 1)$
Evaluate the expressions in Problems 6 to 8 without using a calculator or technology.

1
6. a. f (x) = log2 4 + log3 9
b. $2^{\log_2 3 - \log_2 5}$
7. a. $\ln(\log 10^e)$
b. $e^{5 \ln 2}$
8. a. $\log_3 3^4 - \ln e^{0.5}$
b. $\exp(\ln 3 - \ln 10)$
9. Data points with a curve fit to those points are shown. Decide whether the data are better modeled by an
exponential or a logarithmic function.
a. to d. [Four data plots with fitted curves; figures not reproduced.]
10. Let $y = f(x)$ be the function whose graph is given in Figure 1.54.
[Figure 1.54: graph of y = f(x); plot not reproduced.]
Mix and match the following functions with their corresponding graphs.
a. $y = f\left(\frac{x}{2}\right)$
b. $y = 2f(x)$
c. $y = f(-x)$
d. $y = -f(x)$
[Graphs (A), (B), (C), and (D); plots not reproduced.]
11. Given a function defined by $y = f(x)$ and shown by the graph in Figure 1.55, graph:
a. $y - 2 = f(x - 3)$
b. $y = -\frac{1}{2} f(x - \pi)$
c. $y - \sqrt{2} = f(x - \sqrt{3})$
12. Given a function defined by $y = f(x)$ and shown by the graph in Figure 1.55, graph:
a. $y = -f(x)$
b. $y = f(-x)$
c. $y = f(2x)$
13. Find the equilibria of $a_{n+1} = f(a_n)$, where the graph of $y = f(x)$ is shown. Sketch the cobwebbing diagrams
for $a_1 = -0.6$ and for $a_1 = 2$.

14. Find the first five terms for the given sequences.
n
a. an = 2 − n+1
n−1
b. an = 21
c. an is the nth prime number.
15. Find the first five terms for the given difference equations.
a. $a_n = 0.3a_{n-1}$; $a_1 = 1$
b. $a_n = 2a_{n-1}$; $a_1 = 1$
c. $a_{n+1} = 2a_n + 3$; $a_1 = 1$
16. The pollution level in Lake Bowegon varies during a typical year according to the formula

$$P(t) = 50 - 30\cos\left(\frac{2\pi t}{365}\right)$$
where t is the number of days from the beginning of the year. A treatment program initiated by the Department
of Wildlife is 50% effective against this pollution. When does the model predict that the pollution will be at a
level of 40?
17. The number of apples, n, in a tree is a function of the population density, d, of bees pollinating the apple
orchard. This function can be modeled by the formula
$$n(d) = \frac{500d}{6 + d}$$
The average weight, w, in grams, of an apple at time of harvest is the following decreasing function of the
number of apples:
$$w(n) = 70 - \frac{n}{10}$$
Is either of these functions linear?
a. Graph the weight of the apple as a function of the density of bees. What is the domain of w?
b. As the number of bees increases, what can you say about the average weight of an apple?
18. The amount of solids discharged from the MWRA (Massachusetts Water Resources Authority) sewage treatment
plant on Deer Island (near Boston Harbor) is given by the function

$$f(t) = \begin{cases}
160 & \text{if } 0 \le t \le 1 \\
-30t + 160 & \text{if } 1 < t \le 2 \\
100 & \text{if } 2 < t \le 4 \\
-5t^2 + 25t + 80 & \text{if } 4 < t \le 6 \\
1.25t^2 - 26.25t + 162 & \text{if } 6 < t \le 10
\end{cases}$$
where f (t) is measured in tons/day, and t is measured in years, from 1992.
a. How many more tons per day were discharged in 2002 than in 1996?
b. Sketch the graph of f .
19. A female moth (Tinea pellionella) lays nearly 150 eggs. In one year, up to five generations may be born, and
each female larva eats about 20 mg of wool. Assume that 2/3 of the eggs die, and 50% of the remaining moths
are females. Use an exponential model to estimate the largest amount of wool that may be destroyed by the
female descendants of one female over a period of one year.
20. The level of a certain pollutant in the Los Angeles area has been decreasing linearly since 1990 when a new
pollution control program began. The level of pollution was 0.17 ppm in 1990 and had fallen to 0.11 ppm in
2000.
a. Let P be the level of pollutant (in ppm) at time t (years after 1990). Express P as a function of t.
b. Air with a pollutant level of 0.05 ppm is considered clean. If the present trend continues, when will
this clean level be achieved?
1.9 Group Projects

Working in small groups is typical of most work environments. Thus learning to work with others and to communicate
specific ideas is an important skill. Work with three or four other students to submit a single report addressing
one or more of the following questions.
Project 1A: Heart Rates in Mammals
Smaller mammals and birds have faster heart rates than larger ones. If we assume that evolution has determined
the best rate for each, why isn’t there a single best rate? Is there a model that leads to a correct rule relating heart
rates? A warm-blooded animal uses large quantities of energy to maintain body temperature because of
heat loss through its body surfaces. Since cold-blooded animals require very little energy when they are resting, the
major energy drain on a resting warm-blooded animal seems to be maintenance of body temperature.
The amount of energy available is roughly proportional to blood flow through the lungs, the source of oxygen.
Assuming the least amount of blood needed is circulated, the amount of available energy will equal the amount used.
In this project, you are to develop a model of blood flow and heart pulse rates as a function of body size and validate
the model using the data in Tables 1.12 and 1.13. Be sure to address the following points.
• Set up two models based on geometric and elastic similarity relating body weight to basal (resting) blood flow
through the heart. State your assumptions.
• There are many animals for which pulse rate data is available but not blood flow data. Set up two models
based on geometric and elastic similarity that relate body weight to basal pulse rate.
• Test your ideas using the data in Tables 1.12 and 1.13. In addition to finding the best-fitting lines, determine how
the data support your assumptions about how the shape of the heart scales with size.

Table 1.12: Data on Humans and Some Mammals (W. S. Spector (1956) Handbook of Biological Data)
Animal Weight (kg) Blood Flow (deciliters/min)
Human (age 5) 18 23
Human (age 10) 31 33
Rabbit 4.1 5.3
Goat 24 31
Dog 1 16 22
Dog 2 12 12
Dog 3 6.4 11
Table 1.13: Data from A. J. Clark, Comparative Physiology of the Heart, Macmillan, 1972
Animal  Weight (kg)  Pulse (1/min)  Heart Weight (g)  Ventricle Length (cm)
Shrew 0.004 660
Mouse 0.025 670 0.13 0.55
Rat 0.2 420 0.64 1.0
Guinea Pig 0.3 300 32.00
Rabbit 2 205 5.8 2.2
Dog 5 120
Dog 2 30 85 102 4.0
Sheep 50 70 210 6.5
Human 70 72
Horse 450 38 3900 16
Ox 500 40 2030 12
Elephant 3000 48
Project 1B: The Mouse to Elephant Curve

The most universal feature of living organisms is their turnover of energy. Animals, with few exceptions, obtain
energy by the oxidation of organic compounds, and the rate of energy turnover, or the metabolic rate, is often
measured by the rate of oxygen consumption. The fact that there is a regular relationship between the metabolic
rate, or rate of oxygen consumption, and the body size of animals is thoroughly familiar to biologists. In the early
part of the century French scientists realized that the heat dissipation from warm-blooded animals must be roughly
proportionate to their free surface, and since smaller animals have a larger relative surface, they must also have a
higher relative rate of heat production than larger animals. In this project, you are to develop a model to explore this
relation and use the data on the next page to assess the accuracy of your model and its assumptions. The project
needs to address the following points.
• Develop two models based on geometric and elastic similarity to describe surface area of a mammal as a function
of body size. Be sure to state all of your assumptions.
• Develop two models based on geometric and elastic similarity to describe the metabolic rate of an organism as
a function of body size.
• Test your models using the data in Table 1.13.
• The curve described by the data on the next page is called the “mouse to elephant” curve. However, the
original data set from which it was derived did not include the relevant numbers for the elephant. Find the
weight of an elephant and determine what the models predict for its metabolic rate. If possible, compare this
prediction with the actual value.
• In 1847, Bergmann formulated what later became known as Bergmann’s rule, which states that animals that live
in colder climates are of larger body size than their relatives from warmer climates. Based on your analysis,
does this rule make sense?
Animal Weight kCal/day

Mouse 0.021 3.6
Rat 0.282 28.1
Guinea-pig 0.410 35.1
Rabbit 2.980 167
Rabbit 2 1.520 83
Rabbit 3 2.460 119
Rabbit 4 3.570 164
Rabbit 5 4.330 191
Rabbit 6 5.330 233
Cat 3.0 152
Monkey 4.200 207
Dog 6.6 288
Dog 2 14.1 534
Dog 3 24.8 875
Dog 4 23.6 872
Goat 36 800
Chimpanzee 38 1,090
Sheep 46.4 1,254
Sheep 2 46.8 1,330
Woman 57.2 1,368
Woman 2 54.8 1,224
Woman 3 57.9 1,320
Cow 300 4,221
Cow 2 435 8,166
Cow 3 600 7,877
Heifer 482 7,754
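One way to test a power-law model against this table is to fit a straight line on a log-log scale, since rate = c · weight^b becomes ln(rate) = ln c + b ln(weight). The sketch below is only an illustration and not part of the original project. It assumes the weights are in kilograms (the table does not state units), and the names `data` and `fit_power_law` are our own.

```python
import math

# (animal, weight, kCal/day) copied from the table above.
# Assumption: weights are in kilograms; the table does not state the unit.
data = [
    ("Mouse", 0.021, 3.6), ("Rat", 0.282, 28.1), ("Guinea-pig", 0.410, 35.1),
    ("Rabbit", 2.980, 167), ("Rabbit 2", 1.520, 83), ("Rabbit 3", 2.460, 119),
    ("Rabbit 4", 3.570, 164), ("Rabbit 5", 4.330, 191), ("Rabbit 6", 5.330, 233),
    ("Cat", 3.0, 152), ("Monkey", 4.200, 207), ("Dog", 6.6, 288),
    ("Dog 2", 14.1, 534), ("Dog 3", 24.8, 875), ("Dog 4", 23.6, 872),
    ("Goat", 36, 800), ("Chimpanzee", 38, 1090), ("Sheep", 46.4, 1254),
    ("Sheep 2", 46.8, 1330), ("Woman", 57.2, 1368), ("Woman 2", 54.8, 1224),
    ("Woman 3", 57.9, 1320), ("Cow", 300, 4221), ("Cow 2", 435, 8166),
    ("Cow 3", 600, 7877), ("Heifer", 482, 7754),
]

def fit_power_law(xs, ys):
    """Least-squares fit of ln y = ln c + b ln x; returns (c, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx                  # slope on the log-log plot = exponent
    c = math.exp(my - b * mx)      # intercept on the log-log plot = ln c
    return c, b

weights = [w for _, w, _ in data]
rates = [r for _, _, r in data]
c, b = fit_power_law(weights, rates)
print(f"fitted model: rate = {c:.1f} * weight^{b:.3f}")
```

The fitted exponent b can then be compared with the exponents your geometric and elastic similarity models predict.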
Project 1C: Golden Ratio

Around 300 BC, the greatest of the ancient Greek geometers, Euclid of Alexandria, defined what he called the
“extreme and mean ratio,” now better known as the golden ratio, as follows:∗
A straight line is said to have been cut in extreme and mean ratio when, as the whole line is to the
greater segment, so is the greater segment to the lesser segment.
Specifically, if we look at the line illustrated in Figure 1.56 this statement can be expressed mathematically as
(a+b)/a=a/b.
The golden rectangle is the rectangle whose sides conform to the golden ratio. The beauty, and astonishing
perfection, of the golden rectangle arises from the fact that if you add one additional edge parallel to a short side of
the rectangle to form a square within the rectangle, the smaller rectangle so formed (now oriented at 90 degrees to
the original rectangle; see Figure 1.57) is also a golden rectangle.
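The self-similarity of the golden rectangle can be checked numerically. The sketch below is our own illustration: it assumes a rectangle with long side a = φ and short side b = 1, where φ = (1 + √5)/2 is the positive solution of (a+b)/a = a/b, and the helper name `cut_square` is hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # positive root of x**2 = x + 1

def cut_square(long_side, short_side):
    """Remove a square of side `short_side`; return (long, short) sides of the rest."""
    rest = long_side - short_side
    return max(short_side, rest), min(short_side, rest)

a, b = PHI, 1.0
for step in range(5):
    # The ratio of the sides should stay (up to rounding) equal to PHI.
    print(f"step {step}: long/short = {a / b:.6f}")
    a, b = cut_square(a, b)
```

Only a few cuts are shown because rounding errors grow with each step; starting from any ratio other than φ, the printed ratios change from one step to the next.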
A spiral can be constructed passing through the corners of all embedded squares of the preceding construction
in such a way that the spiral is equiangular. This curve, also known as the logarithmic spiral, has the form shown in
Figure 1.58.
∗ Mario Livio, The Golden Ratio: The Story of Phi, the World's Most Astonishing Number, Broadway Books, New York, 2002, p. 3.

Figure 1.56: Line segments used by Euclid to define this “extreme and mean ratio”
Figure 1.57: The illustrated panel is a golden rectangle. If we cut off the red square, as in the second panel, the
vertical residual rectangle is also golden. Continuing to cut off squares as in the third to fifth panels leaves ever
smaller residual golden rectangles, and the process can be continued ad infinitum. The inset
curve is the logarithmic or equiangular spiral.
Leonardo da Vinci, one of the greatest painters of all time, so valued the aesthetic proportions of the golden
rectangle that aspects of figures and forms in many of his paintings conform to golden rectangle proportions. Further,
the logarithmic spiral is a construction that the famous American architect Frank Lloyd Wright used for the
Guggenheim Museum in New York City.
1. List five paintings that art historians regard as compositions containing golden rectangles.
2. The shell of the nautilus mollusk, Nautilus pompilius has the shape of a logarithmic spiral. Find a list of at
least five other natural objects that contain shapes conforming to logarithmic spirals.
3. Define, using the concept of a tangent to a curve, what is meant by an equiangular spiral.
4. From Euclid’s statement regarding the extreme and mean ratio, commonly denoted by the Greek letter phi
($\varphi$), show that $\varphi = \frac{1 + \sqrt{5}}{2}$.
Figure 1.58: If we inset quarter circles into the squares obtained by repeatedly reducing golden rectangles as illustrated
in this construction, we obtain a very good approximation to a true logarithmic or equiangular spiral, which is much
more difficult to construct accurately.
5. If
$$\varphi_1 = \sqrt{1} = 1,$$
$$\varphi_2 = \sqrt{1 + \sqrt{1}} = \sqrt{2} = 1.414214\ldots,$$
$$\varphi_3 = \sqrt{1 + \sqrt{1 + \sqrt{1}}} = \sqrt{1 + \sqrt{2}} = 1.553773\ldots,$$
$$\varphi_4 = \sqrt{1 + \sqrt{1 + \sqrt{1 + \sqrt{1}}}} = 1.598053\ldots,$$
$$\vdots$$
$$\varphi_n = \sqrt{1 + \sqrt{1 + \cdots + \sqrt{1 + \sqrt{1}}}} \quad (n \text{ square roots deep}),$$
then use your technology to calculate $\varphi_i$, i = 5, ..., 10. Use the definition of $\varphi_n$ to generate a relationship of the
form $\varphi_{n+1} = f(\varphi_n)$ and demonstrate that an equilibrium solution $\varphi = f(\varphi)$ is the golden ratio. To how many
decimal places do the numerical values of $\varphi$ and $\varphi_{10}$ coincide?
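6. If
$$\varphi'_0 = 1,$$
$$\varphi'_1 = 1 + \frac{1}{1} = \frac{2}{1} = 2,$$
$$\varphi'_2 = 1 + \cfrac{1}{1 + \cfrac{1}{1}} = \frac{3}{2} = 1.5,$$
$$\varphi'_3 = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1}}} = \frac{5}{3} = 1.666\ldots,$$
$$\varphi'_4 = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1}}}} = \frac{8}{5} = 1.6,$$
$$\varphi'_5 = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1}}}}} = \frac{13}{8} = 1.625,$$
$$\vdots$$
$$\varphi'_n = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{\ddots + \cfrac{1}{1}}}} \quad (n \text{ denominators deep}),$$
can you find a relationship of the form $\varphi'_{n+1} = f(\varphi'_n)$? Demonstrate that an equilibrium solution $\varphi = f(\varphi)$ is
the golden ratio. Notice that the denominators and numerators of the consecutive fractions $\varphi'_1 = \frac{2}{1}$, $\varphi'_2 = \frac{3}{2}$,
$\varphi'_3 = \frac{5}{3}$, ... are the Fibonacci sequence discussed in the Historical Quest, Problem 36, Section 1.7. Use this fact
to write the expression for $\varphi'_{10}$. Draw a conclusion.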
7. Compare the value of $\varphi'_{10}$ obtained in the preceding question with the golden ratio $\varphi$ and with $\varphi_{10}$ obtained in
the question before that. Which of $\varphi_{10}$ and $\varphi'_{10}$ provides the better approximation to $\varphi$? Can you generalize
this statement to $\varphi_n$ and $\varphi'_n$ as an approximation to $\varphi$ for any n?
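The two sequences in Problems 5 to 7 can be explored with a short program. The sketch below is our own illustration, not part of the project. It assumes each nested radical is obtained from the previous one by taking the square root of 1 plus that value, and each continued fraction by taking 1 plus the reciprocal of the previous value, which is how we read the displayed definitions. The function names are hypothetical.

```python
import math
from fractions import Fraction

PHI = (1 + math.sqrt(5)) / 2

def nested_radical(n):
    """n nested square roots of 1, starting from sqrt(1) = 1 when n = 1."""
    value = 1.0
    for _ in range(n - 1):
        value = math.sqrt(1 + value)
    return value

def continued_fraction(n):
    """Continued fraction n denominators deep, as an exact fraction (n = 0 gives 1)."""
    value = Fraction(1)
    for _ in range(n):
        value = 1 + 1 / value
    return value

for n in range(1, 11):
    r = nested_radical(n)
    f = continued_fraction(n)
    print(f"n={n:2d}  radical={r:.6f} (error {abs(r - PHI):.2e})  "
          f"fraction={f} (error {abs(float(f) - PHI):.2e})")
```

Using `Fraction` keeps the continued fractions exact, so the Fibonacci numerators and denominators are visible in the output.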


Chapter 2
Limits and Derivatives
2.1 Rates of Change and Tangent Lines, p. 143
2.2 Limits, p. 154
2.3 Limit Laws and Continuity, p. 174
2.4 To Infinity and Beyond, p. 190
2.5 Sequential Limits, p. 202
2.6 The Derivative at a Point, p. 220
2.7 Derivatives as Functions, p. 233
Figure 2.1: Sockeye salmon (see Example 8, Section 2.5)
PREVIEW
Calculus is the greatest aid we have to the application of physical truth in the broadest sense of the word.
William F. Osgood, US Mathematician, 1864-1943.
The Calculus, one of the great intellectual achievements of humankind, came to fruition through the work of
Sir Isaac Newton (1642-1727) and Gottfried Leibniz (1646-1716). It consists of two parts, differential calculus and
integral calculus, both of which hinge on the concept of a limit in which the behavior of a function is described as
its argument approaches a selected limiting value. In this chapter, we first discuss some of the basic properties of
a limit and the associated concept of continuity. Using limits we then introduce the main concept in differential
calculus: that of a derivative. This concept is one of the fundamental ideas in mathematics, and is also a cornerstone
of modern scientific thought. It allows us to come to grips with the phenomenon of change in the value of variables
over increasingly small intervals of time or space (or any other independent variable)—variables representing such
things as the velocity of a stooping falcon or the density of a population of fish. Using the derivative, we will find the
slopes of tangent lines, examine rates of changes, and consider applications to enzyme kinetics, biodiversity, foraging
of hummingbirds, wolf killing rates, and the population dynamics of sockeye salmon depicted in Figure 2.1.

2.1 Rates of Change and Tangent Lines

One of the fundamental concepts in calculus is that of the limit. Intuitively “taking a limit” corresponds to
investigating the value of a function as you get closer and closer to a specified point without actually reaching that
point. To motivate limits, we begin by showing how they arise when considering rates of change and tangent lines.
Rates of change
Rates of change describe how quantities change with respect to a variable such as time or body mass. For instance,
in countries where overpopulation is an issue, projections are constantly made about the population growth rate.
Alternatively, for a patient receiving drug treatment, physicians perform experiments to estimate the clearance rate
of the drug (i.e. the rate at which the drug leaves the blood stream). To define a rate of change for a given function,
we can specify an interval over which we find the average rate of change or can specify a point at which we find the
instantaneous rate of change.
Average Rate of Change: The average rate of change of f over the interval [a, b] is
$$\frac{f(b) - f(a)}{b - a}$$
Example 1. Mexico population growth
The population size of Mexico (in millions) in the early 1980s is reported in the following table.
Year Time Lapsed t Population N (t)

1980 0 67.38
1981 1 69.13
1982 2 70.93
1983 3 72.77
1984 4 74.66
1985 5 76.60
From this table we see that N (t) denotes the population size in millions t years after 1980.
a. Compute the population’s average rate of change (i.e. the population growth rate) for the interval [3, 5].
Identify the units for this average rate of change and interpret this rate of change.
b. Compute the average population growth rate for the interval [1, 3]. How does this compare to your answer
in part a? What does this imply about the population growth?
c. Compute the average population growth rates for the intervals [0, 5], [0, 4], [0, 3], [0, 2], and [0, 1]. Discuss
the trend of these growth rates.
Solution.
a. The average growth rate for N(t) over [3, 5] is given by
$$\frac{N(5) - N(3)}{5 - 3} = \frac{76.60 - 72.77}{2} = 1.915$$
The units of this growth rate are millions per year. Hence, between the years 1983 and 1985, the
population increased on average by 1.915 million individuals per year.

b. The average growth rate for N(t) over [1, 3] is given by
$$\frac{N(3) - N(1)}{3 - 1} = \frac{72.77 - 69.13}{2} = 1.82$$
The population is growing at an average rate of approximately 1.8 million per year. Thus, the average growth
rate from 1981 to 1983 is less than the average growth rate from 1983 to 1985. The population growth
rate in Mexico appears to be increasing over this time period.
c. Computing the average growth rates over the requested time intervals yields:
Time Interval Average Growth Rate

[0, 5] 1.84
[0, 4] 1.82
[0, 3] 1.80
[0, 2] 1.77
[0, 1] 1.75
The average population growth rate decreases as the time intervals shrink and appears to be converging
to a value a little below 1.75 million per year.
□
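The computations in Example 1 can be reproduced with a few lines of code. This is a minimal sketch of our own; the dictionary `N` simply stores the table from the example, and the helper name `average_rate` is hypothetical.

```python
# Population of Mexico (millions), t years after 1980, from the table in Example 1
N = {0: 67.38, 1: 69.13, 2: 70.93, 3: 72.77, 4: 74.66, 5: 76.60}

def average_rate(f, a, b):
    """Average rate of change of the tabulated function f over [a, b]."""
    return (f[b] - f[a]) / (b - a)

print(f"[3, 5]: {average_rate(N, 3, 5):.3f}")      # part a
print(f"[1, 3]: {average_rate(N, 1, 3):.3f}")      # part b
for b in (5, 4, 3, 2, 1):                          # part c
    print(f"[0, {b}]: {average_rate(N, 0, b):.3f}")
```

Because the data are yearly, the shortest interval available is one year, so the table can only suggest the limiting value.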
The instantaneous rate of change of f(x) at x = a (if it exists) is defined to be the limiting value of the
average rate of change of f on smaller and smaller intervals starting at x = a. For example, in Example 1, we
would estimate the instantaneous rate of change of the Mexico population in 1980 to be about 1.75 million per year. In
other words, the population is growing at a rate of about 1.75 million per year in 1980. More precisely, we get the following
formal definition.
Instantaneous Rate of Change: The instantaneous rate of change of f at x = a is given by
$$\lim_{b \to a} \frac{f(b) - f(a)}{b - a}$$
where the symbol “$\lim_{b \to a}$” is interpreted as taking b arbitrarily close but not equal
to a.
We provide more details about computing limits in Sections 2.2 and 2.3.
Example 2. Rate of change of CO2
In Chapter 1, we initially approximated the concentration of CO2 in ppm at the Mauna Loa observatory of Hawaii
with the linear function
L(t) = 329.3 + 0.1225 t
where t is measured in months after April 1974. We then refined our approximation with the function

$$F(t) = 329.3 + 0.1225\,t + 3\cos\left(\frac{\pi t}{6}\right).$$
Using the definition introduced in this section, estimate the instantaneous rate of change of the functions L(t) and
F (t) at t = 3.
Solution. To estimate the instantaneous rate of change of L at t = 3, we can look at the average rate of change
over the interval [3, b] for b values closer and closer to 3:
$$\frac{L(b) - L(3)}{b - 3} = \frac{329.3 + 0.1225b - (329.3 + 0.1225 \cdot 3)}{b - 3} = \frac{0.1225(b - 3)}{b - 3} = 0.1225$$
Since L is a linear function, the average rate of change of L is independent of b and equals the slope of L(t) for all
$b \neq 3$. Hence, we find that the instantaneous rate of change of L(t) at t = 3 is about 0.12 ppm/month.
To estimate the instantaneous rate of change of F(t) at t = 3, we look at the average rate of change
$$\frac{F(b) - F(3)}{b - 3}$$
over intervals [3, b] for b values closer and closer to 3 (b > 3). Doing so yields the following table.
Interval b − 3 Average Rate of Change of F(t)

[3, 4] 1 −1.37750
[3, 3.5] 0.5 −1.43041
[3, 3.1] 0.1 −1.44758
[3, 3.01] 0.01 −1.44829
[3, 3.001] 0.001 −1.44830
This table suggests the instantaneous rate of change of F(t) is approximately −1.45 ppm/month.
You might ask why there is a difference in sign between the answers for L(t) and F(t). Recall that L(t) only
described the linear trend of the CO2 data, which was increasing. On the other hand, F(t) captured the seasonal
fluctuations of the CO2 levels. Turning back to Figure 1.12, we see that indeed in the third month of the data, the
level of CO2 was decreasing. □
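The average rates of change of F over shrinking intervals [3, 3 + h] can be generated directly from the formula. The sketch below is our own illustration; the function F is the model from Example 2, while the helper name `average_rate` and the particular choice of h values are ours.

```python
import math

def F(t):
    """Refined CO2 model from Example 2 (ppm), t in months after April 1974."""
    return 329.3 + 0.1225 * t + 3 * math.cos(math.pi * t / 6)

def average_rate(f, a, b):
    """Average rate of change of f over [a, b]."""
    return (f(b) - f(a)) / (b - a)

for h in (1, 0.5, 0.1, 0.01, 0.001):
    rate = average_rate(F, 3, 3 + h)
    print(f"[3, {3 + h:g}]  h = {h:<6}  average rate = {rate:.5f}")
```

Making h much smaller than shown here eventually hurts accuracy, because the subtraction F(3 + h) − F(3) loses significant digits in floating-point arithmetic.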
This last example indicates that a linear fit to an oscillating function may provide a reasonable estimate of rates
of change averaged over several oscillations, but is a poor estimate of the instantaneous rate of change because the
latter depends on the particular stage of each oscillation. More insight into this issue is provided by the concept of
a tangent line that you probably first encountered in your high school geometry class.
Tangent lines
A linear function is a function with constant slope. This raises the question, “What is the slope of a nonlinear
function at a point?” To answer this question, we need to solve the “Tangent Problem” whose solution resides in
the following words of Gottfried Leibniz (1646-1716):
We have only to keep in mind that to find a tangent means to draw the line that connects two points of
the curve at an infinitely small distance... ∗
As illustrated in Figure 2.2 and stated by Leibniz, the tangent line of y = f(x) at a point (a, f(a)) (in blue) can
be approximated by secant lines (in red) passing through the points (a, f(a)) and (a + h, f(a + h)) as $h \neq 0$ gets
closer and closer to 0. The slope of this secant line is given by
$$\text{Slope of Secant Line} = \frac{f(a + h) - f(a)}{h}$$
By letting h get arbitrarily close to 0, we get the
$$\text{Slope of Tangent Line} = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$
where the symbol “$\lim_{h \to 0}$” can be interpreted as taking h arbitrarily close to, but not equal to, 0. This limit may
or may not exist, so not every curve will have a tangent line at every point.
Example 3. Approximating a tangent line

∗ Selected Works Vol. 2 [Russian translation], Mysl’, Moscow, 1984.

Figure 2.2: Approximating the tangent line with a secant line
Approximate the tangent line of y = ln x at the point x = 1.
Solution. As a first approximation to the tangent line, we can consider the secant line passing through the points
(1, ln 1) = (1, 0) and (2, ln 2). The slope of this secant line is given by
$$\text{Slope of Secant Line} = \frac{\ln 2 - 0}{2 - 1} = \ln 2 \approx 0.693$$
Notice that for this approximation, h = 1. Using the point-slope formula for a line, the equation of the secant line is
$$y = (\ln 2)(x - 1)$$
Graphing $y = \ln x$ and $y = (\ln 2)(x - 1)$ yields:
To obtain a better approximation, we now move closer to (1, 0) by letting h = 0.5. This secant line passes through
(1, 0) and (1.5, ln 1.5). The slope of this secant line is
$$\frac{\ln 1.5 - 0}{1.5 - 1} = 2 \ln 1.5 \approx 0.811$$
and the secant line is
$$y = (2 \ln 1.5)(x - 1)$$
This secant looks like a better approximation to a tangent line:
To obtain better approximations, we can ask what happens to the slope of the secant line passing through (1, 0)
and (1 + h, ln(1 + h)) as $h \neq 0$ gets closer and closer to 0. The answer to this question lies in the following table:
h Approximate Slope of Secant Line

1 0.693
0.5 0.811
0.1 0.953
0.01 0.995
0.001 1.000
This table suggests that as h gets closer to 0, the slope of the corresponding secant line approaches 1. Hence, it
seems reasonable to approximate the tangent line by y = x − 1, as shown here:

We will see later that this is actually the tangent line. □
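The table of secant slopes in Example 3 can be generated with a short program. This is a sketch of our own; the helper name `secant_slope` is hypothetical and works for any function f, not just the natural logarithm.

```python
import math

def secant_slope(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

for h in (1, 0.5, 0.1, 0.01, 0.001):
    print(f"h = {h:<6} slope = {secant_slope(math.log, 1, h):.3f}")
```

Negative values of h can be passed as well, which gives secant lines through points to the left of x = 1.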
Sometimes it is possible to algebraically determine the slope of the tangent line.
Example 4. Tangent line for a parabola
Find the tangent line passing through the point (1, 1) for $y = x^2$.
Solution. To find the tangent line, we first need its slope. The slope of the secant line passing through the points
(1, 1) and $(1 + h, (1 + h)^2)$ is given by
$$\frac{(1 + h)^2 - 1}{1 + h - 1} = \frac{1 + 2h + h^2 - 1}{h} = \frac{2h + h^2}{h} = 2 + h \quad \text{for } h \neq 0$$
The slope of the tangent line is
$$\lim_{h \to 0} (2 + h)$$
Since 2 + h gets arbitrarily close to 2 as h gets close to 0, the slope of the tangent line is 2. Using the point-slope
formula, we find the equation of the tangent line:
$$y = 2(x - 1) + 1 = 2x - 1$$
Plotting this line against $y = x^2$ in Figure 2.3 shows that the tangent line just “kisses” the parabola at (1, 1), touching
it in a single point in this case.
Figure 2.3: Graph of the tangent to $y = x^2$ at (1, 1).
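One way to confirm that this tangent line meets the parabola in a single point is to look at the vertical gap between the two curves. This short check is our own addition, not part of the original example:

```latex
\[
x^2 - (2x - 1) = x^2 - 2x + 1 = (x - 1)^2 \ge 0 .
\]
```

The gap is zero only at x = 1, so the parabola lies on or above the line y = 2x − 1 and the two meet only at (1, 1).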
Problem Set 2.1

Find the average rate of change for the functions in Problems 1 to 6 on the specified intervals.
1. $f(x) = 4 - 3x$ on [−3, 2].
2. $f(x) = 5$ on [−3, 3].
3. $f(x) = 3x^2$ on [1, 3].
4. $f(x) = -2x^2 + x + 4$ on [1, 4].
5. $f(x) = \sqrt{x}$ on [4, 9].
6. $f(x) = \frac{-2}{x + 1}$ on [1, 5].
Approximate the instantaneous rate of change for the functions in Problems 7 to 12 at the indicated point. These
are the same functions as those given in Problems 1 to 6.
7. $f(x) = 4 - 3x$ at x = −3.
8. $f(x) = 5$ at x = 3.
9. $f(x) = 3x^2$ at x = 1.
10. $f(x) = -2x^2 + x + 4$ at x = 4.
11. $f(x) = \sqrt{x}$ at x = 4.
12. $f(x) = \frac{-2}{x + 1}$ at x = 1.
Trace the curves in Problems 13 to 18 onto your own paper and draw the secant line passing through P and Q. Next,
imagine h → 0 and draw the tangent line at P assuming that Q moves along the curve to the point P . Finally,
estimate the slope of the curve at P using the slope of the tangent line you have drawn.
13 to 18. [Curves with marked points P and Q; figures not reproduced.]
Estimate the tangent line for y = f(x) in Problems 19 to 24 by approximating secant slopes to estimate the limiting
value. Graph both the function and the tangent line on the same plot.
19. $f(x) = x^2 - x + 1$ at x = −1.
20. $f(x) = 4 - x^2$ at x = 0.
21. $f(x) = \sin\left(\frac{\pi x}{2}\right)$ at x = 0.5.
22. $f(x) = \frac{1}{x + 3}$ at x = 3.
23. $f(x) = e^x$ at x = 0.
24. $f(x) = e^{-x}$ at x = 0.
Algebraically determine the tangent line for y = f(x) at the point specified in Problems 25 to 30. Graph both
y = f(x) and the tangent line on the same plot.
25. $f(x) = 3x - 7$ at x = 3.
26. $f(x) = x^2$ at x = −1.
27. $f(x) = 3x^2$ at x = −2.
28. $f(x) = x^3$ at x = 1.
29. $\sqrt{x}$ at x = 9. Hint: Multiply by $\frac{\sqrt{x + h} + \sqrt{x}}{\sqrt{x + h} + \sqrt{x}}$.
30. $\sqrt{5x}$ at x = 5.
Find the average rates of change of the given functions over the specified intervals in Problems 31 to 34. Be sure to
specify units and briefly state the meaning of the average rate of change.
31. $P(t) = 8.3 \times 1.33^t$ is the number (in millions) of people living in the United States t decades after 1815.
Intervals: [0, 2] and [2, 4].
32. $L(x) = 20.15\,x^{2/3}$ is the number of kilograms lifted by an Olympic Gold Weightlifter weighing x kilograms.
Intervals: [56, 75] and [100, 110].
33. The height H(t) of beer froth after t seconds:∗

Intervals: [0, 30] and [60, 90].

Time t (seconds) Froth Height H (cm)

0 17.0
15 16.1
30 14.9
45 14.0
60 13.2
75 12.5
90 11.9
105 11.2
120 10.7
Table 2.1: Height of beer froth as a function of time after pouring
Figure 2.4: Tidal height
34. The height f (x), in feet, of the tide at time x hours where the graph of y = f (x) is given in Figure 2.4.
Intervals:[0.25, 0.50] and [1, 2]
Approximate the instantaneous rates of change of the given functions at the points specified in Problems 35 to 38. These
are the same functions as those in Problems 31 to 34.
35. $P(t) = 8.3(1.33)^t$ is the number (in millions) of people living in the United States t decades after 1815. Points:
t = 0 and t = 2.
36. $L(x) = 20.15\,x^{2/3}$ is the number of kilograms lifted by an Olympic Gold Weightlifter weighing x kilograms.
Points: x = 56 and x = 100.
37. Use the data in Table 2.1 to estimate the instantaneous rate of change of height H(t) of beer froth after 0 and
after 60 seconds.
38. The height f(x), in feet, of the tide at time x hours where the graph of y = f(x) is given in Figure 2.4.
Points: x = 0.25, x = 1.
39. An environmental study of a certain suburban community suggests that t years from now, the average level of
CO2 in the air can be modeled by the formula
$$q(t) = 0.05t^2 + 0.1t + 3.4$$
parts per million.

a. At what rate will the CO2 level be changing with respect to time exactly one year from now?
b. By how much will the CO2 level change in the first year?
c. By how much will the CO2 level change over the next (second) year?
∗ Data from “Demonstration of the exponential decay law using beer froth,” European Journal of Physics 23 (2002) by Dr. Leike, pp.
21-26.

40. The population of a bacterial colony is approximately
$$P(t) = P_0 + 61t + 3t^2 \text{ thousand individuals}$$
t hours after observation begins, where $P_0$ is the initial population, i.e. $P(0) = P_0$. Find the rate at which the
colony is growing after exactly five hours.

2.2 Limits
In defining the instantaneous rate of change and the slope of a tangent line, we used the notation for a limit, first
introduced in Section 2.1. The concept of a limit is one of the foundations of both differential and integral calculus.
Thus we devote this section to exploring the concept of limits in the context of functions before proceeding to
the calculus itself.
Limits
We begin our study of limits with the following mathematically “informal” definition.
Limit (Informal): Let f be a function. The notation
$$\lim_{x \to a} f(x) = L$$
is read as “the limit of f(x) as x approaches a is L” and means that the functional
values f(x) can be made arbitrarily close to L by requiring that x be sufficiently
close to, but not equal to, a.
With the use of additional notation, at the end of this section we make this definition mathematically precise. At
several other places in this book we favor informal over formal definitions using the following quote by the historian
E. T. Bell∗ as our justification: “To the early developers of calculus the notions of variables and limits were intuitive;
to us they are extremely subtle concepts hedged about with thickets of semimetaphysical mysteries concerning the
nature of numbers . . ..” Our goal is to provide definitions that make the concepts usable without getting caught up
in the technicalities of mathematically formal definitions.
Example 1. Finding limits
Find the following limits using the informal definition provided above.
a. Graph the function $y = x^2$ to find $\lim_{x \to 2} x^2$.
b. Numerically evaluate the function $y = \frac{x-2}{x^2-4}$ to find $\lim_{x \to 2} \frac{x-2}{x^2-4}$.
c. Use the informal definition of a limit to find $\lim_{x \to 0} e^{-1/x^2}$.
Solution.
a. Graph $y = x^2$. Choose several values of $x$ (getting closer to 2, but not equal to 2) and find the corresponding
$y$-values.
∗ Men of Mathematics, New York: Simon and Schuster, 1937

From this graph, we can see that as x gets closer to 2, the value of the function gets closer to 4. In fact,
if we zoom in around the point x = 2, we obtain
These graphs (correctly) suggest that $\lim_{x \to 2} x^2 = 4$. This limit corresponds to simply evaluating $x^2$ at
$x = 2$. This is the idea of “continuity,” which is introduced in the next section.
b. The function $f(x) = \frac{x-2}{x^2-4}$ is not defined at $x = 2$ (division by 0), so we cannot simply evaluate this
function at $x = 2$ as suggested at the end of part a. Instead, we can only consider values near (but not
equal to) 2. Since $x^2 - 4 = (x - 2)(x + 2)$, we have
\[ f(x) = \frac{x-2}{(x-2)(x+2)} = \frac{1}{x+2} \quad \text{for } x \neq 2 \]
Evaluating f at x-values near 2 yields the following table:
x f (x) x f (x)
1.0 0.333333 3.0 0.200000
1.5 0.285714 2.5 0.222222
1.9 0.256410 2.1 0.243902
1.99 0.250627 2.01 0.249377
1.999 0.250063 2.001 0.249938
1.9999 0.250006 2.0001 0.249994
This table suggests that $\lim_{x \to 2} \frac{x-2}{x^2-4} = 0.25$, which corresponds to evaluating $\frac{1}{x+2}$ at $x = 2$.
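The table in part b can be reproduced with a few lines of Python. This is only a sketch of the numerical approach described above; the names `f`, `offsets`, and `table` are illustrative choices, not part of the text.

```python
def f(x):
    """The function from Example 1b; it is undefined at x = 2."""
    return (x - 2) / (x**2 - 4)

# Distances from 2 that shrink toward 0, so x approaches 2 from both sides.
offsets = [1.0, 0.5, 0.1, 0.01, 0.001, 0.0001]

# Each row holds an x-value to the left of 2, f there, an x-value to the right, f there.
table = [(2 - h, f(2 - h), 2 + h, f(2 + h)) for h in offsets]

for left, f_left, right, f_right in table:
    print(f"{left:<8.4f} {f_left:.6f}    {right:<8.4f} {f_right:.6f}")
```

Both columns of function values settle near 0.25 as the offset shrinks, which agrees with evaluating 1/(x + 2) at x = 2.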

c. The function $f(x) = e^{-1/x^2}$ is not defined at $x = 0$. However, if $x$ is close to zero, then $-\frac{1}{x^2}$ is very large
and negative. Consequently, if $x$ is close to zero, $e^{-1/x^2}$ is close to 0. This suggests that $\lim_{x \to 0} e^{-1/x^2} = 0$.
We can reinforce this conclusion by looking at the graph of f , as shown here:
[Figure: graph of $y = e^{-1/x^2}$ for $-3 \le x \le 3$]
The existence of a limit limx→a f (x) = L can be interpreted in terms of choosing the appropriate window for
viewing a function.
Example 2. Choosing the correct viewing window
For the following limits limx→a f (x) = L, determine how close x needs to be to a to ensure that f (x) is within
0.01 and 0.00001 of L. In each case, plot the function in the appropriate window to illustrate your findings.
a. $\lim_{x \to 4} (2x - 1) = 7$
b. $\lim_{x \to 0} e^{-1/x^2} = 0$
Solution.
a. To have $2x - 1$ within 0.01 of 7, we need
6.99 ≤ 2x − 1 ≤ 7.01
7.99 ≤ 2x ≤ 8.01 (adding 1 to all sides of the inequality)
3.995 ≤ x ≤ 4.005 (dividing all sides of the inequality by 2)
Plotting y = 2x − 1 in the window [3.995, 4.005] × [6.99, 7.01] yields
[Figure: graph of $y = 2x - 1$ in the window [3.995, 4.005] × [6.99, 7.01]]
Notice that the graph of the function just fits in this window!
To have $2x - 1$ within 0.00001 of 7, we need
6.99999 ≤ 2x − 1 ≤ 7.00001
7.99999 ≤ 2x ≤ 8.00001
3.999995 ≤ x ≤ 4.000005
Plotting y = 2x − 1 in the window [3.999995, 4.000005] × [6.99999, 7.00001] yields

[Figure: graph of $y = 2x - 1$ in the window [3.999995, 4.000005] × [6.99999, 7.00001]]
Again, notice that the graph of the function just fits in this window.
b. To ensure that $e^{-1/x^2}$ is within 0.01 of 0, we need
\[ -0.01 \le e^{-1/x^2} \le 0.01 \]
Since the left-hand inequality is always true, we can ignore it. Since the natural logarithm is an increasing
function, we have
\begin{align*}
\ln e^{-1/x^2} &\le \ln 0.01 \\
-\frac{1}{x^2} &\le -\ln 100 \\
\frac{1}{x^2} &\ge \ln 100 \\
\frac{1}{\ln 100} &\ge x^2 && \text{since } x^2 > 0 \text{ for } x \neq 0 \\
0.46599 \approx \sqrt{\frac{1}{\ln 100}} &\ge |x| && \text{since } \sqrt{x} \text{ is increasing}
\end{align*}
Thus, if we plot $y = e^{-1/x^2}$ in the window [−0.46599, 0.46599] × [−0.01, 0.01] we obtain
[Figure: graph of $y = e^{-1/x^2}$ in the window [−0.46599, 0.46599] × [−0.01, 0.01]]
Again, the graph just fits our window.

To ensure that $e^{-1/x^2}$ is within 0.00001 of 0, we need
\[ -0.00001 \le e^{-1/x^2} \le 0.00001 \]
Since the left-hand inequality is always true, we can ignore it. Also, since the natural logarithm is an
increasing function, we have
\begin{align*}
\ln e^{-1/x^2} &\le \ln 0.00001 \\
-\frac{1}{x^2} &\le -\ln 100{,}000 \\
\frac{1}{\ln 100{,}000} &\ge x^2 && \text{since } x^2 > 0 \text{ for } x \neq 0 \\
0.294718 \approx \sqrt{\frac{1}{\ln 100{,}000}} &\ge |x| && \text{since } \sqrt{x} \text{ is increasing}
\end{align*}
Thus, if we plot $e^{-1/x^2}$ in the window [−0.294718, 0.294718] × [−0.00001, 0.00001] we obtain
[Figure: graph of $y = e^{-1/x^2}$ in the window [−0.294718, 0.294718] × [−0.00001, 0.00001]]
Once again the graph just fits our window.
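The window half-widths found in part b both come from the same formula, $|x| \le \sqrt{1/\ln(1/\epsilon)}$, where $\epsilon$ is the allowed distance from 0. The following Python sketch evaluates it; the function name `delta_for` is a hypothetical helper introduced here for illustration.

```python
import math

def delta_for(eps):
    """Half-width d such that exp(-1/x**2) <= eps whenever 0 < |x| <= d.

    Assumes 0 < eps < 1, so that log(1/eps) is positive.
    """
    return math.sqrt(1.0 / math.log(1.0 / eps))

for eps in (0.01, 0.00001):
    d = delta_for(eps)
    # At the edge of the window the function value should equal eps.
    edge_value = math.exp(-1.0 / d**2)
    print(f"eps = {eps:g}: half-width = {d:.6f}, value at edge = {edge_value:.3g}")
```

The two half-widths agree with the values 0.46599 and 0.294718 derived above.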
The statement $\lim_{x \to a} f(x) = L$ can fail in two ways. First, $\lim_{x \to a} f(x)$ may exist but not equal $L$. Second,
$\lim_{x \to a} f(x)$ may not exist; in other words, no matter what $L$ we choose, the statement $\lim_{x \to a} f(x) = L$ is false.
You typically encounter the first failure when someone is testing you on this material or has, for some reason, defined
a function such that $f(a) \neq \lim_{x \to a} f(x)$. The second failure is more interesting.
Example 3. Limit failures
Determine whether the following limits exist.
a. $\lim_{x \to 0} \cos\frac{2\pi}{x}$
b. $\lim_{x \to 0} x \cos\frac{2\pi}{x}$
Solution.
a. To understand $\lim_{x \to 0} \cos\frac{2\pi}{x}$, we might begin by evaluating $\cos\frac{2\pi}{x}$ at smaller and smaller $x$ values. But
we need to be careful!
• If we evaluate at x = 1, x = 0.1, x = 0.01, x = 0.001, . . ., we obtain cos 2π = 1, cos 20π = 1,
cos 200π = 1, . . .. This suggests $\lim_{x \to 0} \cos\frac{2\pi}{x} = 1$.
• If we evaluate at x = 2, x = 2/3, x = 2/5, . . ., we obtain cos π = −1, cos 3π = −1, cos 5π = −1, . . ..
This suggests that $\lim_{x \to 0} \cos\frac{2\pi}{x} = -1$.
Both of these statements cannot be true simultaneously. We can see this by considering $\lim_{x \to 0} \cos\frac{2\pi}{x} = 1$;
this requires that $\cos\frac{2\pi}{x}$ can be made arbitrarily close to 1 for all $x$ sufficiently close (but not equal) to
0. However, there are values of $x$ arbitrarily close but not equal to 0 (namely, x = 2/3, 2/5, 2/7, · · ·) such that
$\cos\frac{2\pi}{x} = -1$, which is 2 units away from 1. Hence, $\lim_{x \to 0} \cos\frac{2\pi}{x} \neq 1$. This argument can be refined to
show that $\lim_{x \to 0} \cos\frac{2\pi}{x} \neq L$ for any choice of $L$. Therefore, the limit does not exist.
Graphing this function illustrates the dramatic nature of this non-existing limit:

[Figure: graph of $y = \cos\frac{2\pi}{x}$ for $-1 \le x \le 1$]
b. To understand $\lim_{x \to 0} x \cos\frac{2\pi}{x}$, we begin by noticing that cosine takes on values between −1 and 1.
Hence, for $x \neq 0$, $-1 \le \cos\left(\frac{2\pi}{x}\right) \le 1$ and thus (since $|x| > 0$)
\[ -|x| \le x \cos\frac{2\pi}{x} \le |x| \]
for all $x \neq 0$. Therefore, by choosing $x$ sufficiently close to 0 but not equal to 0, we can make $|x|$ as close
to 0 as we want, so that $x \cos\frac{2\pi}{x}$ becomes arbitrarily close to 0. Therefore,
\[ \lim_{x \to 0} x \cos\frac{2\pi}{x} = 0 \]
as the graph of $y = x \cos\frac{2\pi}{x}$ illustrates:
[Figure: graph of $y = x \cos\frac{2\pi}{x}$ for $-1 \le x \le 1$]
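The two behaviors in Example 3 can be checked numerically. The Python sketch below evaluates both functions along the two sequences used in part a; the names `g`, `h`, `seq_plus`, and `seq_minus` are illustrative, and the printed values are subject to small floating-point rounding.

```python
import math

def g(x):
    """cos(2*pi/x), the function from part a (undefined at x = 0)."""
    return math.cos(2 * math.pi / x)

def h(x):
    """x*cos(2*pi/x), the function from part b (undefined at x = 0)."""
    return x * g(x)

# Two sequences that both approach 0.
seq_plus = [1 / 10**k for k in range(5)]        # 1, 0.1, 0.01, 0.001, 0.0001
seq_minus = [2 / (2 * k + 1) for k in range(5)]  # 2, 2/3, 2/5, 2/7, 2/9

for x in seq_plus:
    print(f"x = {x:<8g} cos(2pi/x) = {g(x): .6f}   x*cos(2pi/x) = {h(x): .6f}")
for x in seq_minus:
    print(f"x = {x:<8.5f} cos(2pi/x) = {g(x): .6f}   x*cos(2pi/x) = {h(x): .6f}")
```

Along the first sequence the cosine values stay near 1 and along the second near −1, so no single limit can exist, while the values of x cos(2π/x) are trapped between −|x| and |x| and shrink toward 0.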
We have relied occasionally on technology to compute limits. While technology often steers us in the right
direction, occasionally it drives us to incorrect conclusions.
Example 4. A computational dilemma
Consider the function
\[ f(x) = \frac{\sqrt{1 + x^2} - 1}{x^2} \]
a. Use technology to evaluate $f(x)$ at $x = \pm 0.1, \pm 0.01, \pm 0.001, \pm 0.0001$. Based on these evaluations,
formulate a conclusion about $\lim_{x \to 0} f(x)$.
b. Use technology to evaluate $f(x)$ at $x = \pm 10^{-5}, \pm 10^{-6}, \pm 10^{-7}, \pm 10^{-8}, \pm 10^{-9}$. Based on these evaluations,
formulate a conclusion about $\lim_{x \to 0} f(x)$. Compare your results to those of part a.

Solution.
a. We begin with a table of values.
x f (x)
±0.1 0.498756
±0.01 0.499988
±0.001 0.500000
±0.0001 0.500000
This table suggests that the limit is 0.5. Moreover, plotting the function over the interval −1 ≤ x ≤ 1
reaffirms this conclusion:
[Figure: graph of $y = f(x)$ for $-1 \le x \le 1$]
b. Next, we evaluate f for even smaller values of x:
x f(x)
$\pm 10^{-5}$ 0.500000
$\pm 10^{-6}$ 0.500044
$\pm 10^{-7}$ 0.488498
$\pm 10^{-8}$ 0.000000
$\pm 10^{-9}$ 0.000000
It appears that f is approaching the value 0, not 0.5. Plotting the graph of y = f (x) over this smaller
range of x values yields
[Figure: graph of $y = f(x)$ for $-10^{-7} \le x \le 10^{-7}$]
This graph suggests that the limiting value is 0 ... very strange!
Do you see the dilemma? It is not clear whether the answer should be 0 or 0.5. Later, we will develop
more reliable methods which will show this limit is 0.5. Hence, when you use technology, always be aware
that technology may mislead you.
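The dilemma is easy to reproduce in any language that uses double-precision floating-point arithmetic. The Python sketch below is an illustration under that assumption; the exact digits for intermediate values of x can differ from the table above depending on the technology used. The second function, `f_rationalized`, is an algebraically equivalent form that is not derived at this point in the text and is included only for comparison.

```python
import math

def f_naive(x):
    """Direct evaluation of (sqrt(1 + x^2) - 1) / x^2."""
    return (math.sqrt(1 + x**2) - 1) / x**2

def f_rationalized(x):
    """Equivalent form 1 / (sqrt(1 + x^2) + 1), which avoids subtracting nearly equal numbers."""
    return 1 / (math.sqrt(1 + x**2) + 1)

for k in range(1, 10):
    x = 10.0 ** (-k)
    print(f"x = 1e-{k}: naive = {f_naive(x):.6f}   rationalized = {f_rationalized(x):.6f}")
```

For x = 10⁻⁸ the quantity 1 + x² rounds to exactly 1 in double precision, so the naive numerator is 0 and the computed value is 0, while the rationalized form still returns 0.5.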

One-sided limits
The definition of the limit of f (x) as x approaches a requires that f (x) approach the same value independent of
whether x approaches a from the right or the left. In this sense, limx→a f (x) is a “two-sided” limit. One-sided limits
are defined in the following box.
One-sided limits (informal)
Right-hand limit. We write
\[ \lim_{x \to a^+} f(x) = L \]
if we can make $f(x)$ as close to $L$ as we please by choosing $x$ sufficiently close to $a$
and to the right of $a$.
Left-hand limit. We write
\[ \lim_{x \to a^-} f(x) = L \]
if we can make $f(x)$ as close to $L$ as we please by choosing $x$ sufficiently close to $a$
and to the left of $a$.
It should be clear that, in general, a two-sided limit cannot exist if the corresponding pair of one-sided limits are
different. Conversely, it can be shown that if the two one-sided limits of a given function f as x → a− and x → a+
both exist and are equal, then the two-sided limit, limx→a f (x) must also exist. These observations are so important
that we restate them as follows.
Matching limits. Let $f$ be a function. Then
\[ \lim_{x \to a} f(x) = L \quad \text{if and only if} \quad \lim_{x \to a^+} f(x) = \lim_{x \to a^-} f(x) = L \]
Example 5. Finding one-sided limits
Consider the function $f(x) = \frac{x-3}{|x-3|}$. Find the right-hand limit as $x \to 3^+$, the left-hand limit as $x \to 3^-$, and
discuss whether $\lim_{x \to 3} f(x)$ exists.
Solution. Since $f(x) = \frac{x-3}{|x-3|} = 1$ whenever $x > 3$, we have
\[ \lim_{x \to 3^+} f(x) = 1 \]
Since $f(x) = \frac{x-3}{|x-3|} = -1$ whenever $x < 3$, we have
\[ \lim_{x \to 3^-} f(x) = -1 \]
A graph of this function is given in Figure 2.5.

Since the right-hand and left-hand limits are not the same, we say that $\lim_{x \to 3} f(x)$ does not exist.
Example 6. The floor function
The floor function, sometimes called the step function, is the function that returns the largest integer less
than or equal to x. The function is typically denoted by ⌊x⌋. For instance, ⌊3⌋ = 3, ⌊π⌋ = 3, ⌊1/3⌋ = 0, and
⌊−1.1⌋ = −2.
a. Graph y = ⌊x⌋ over the interval [−π, π].
b. Determine at what values of a the limit $\lim_{x \to a} \lfloor x \rfloor$ does not exist.

Figure 2.5: Graph of $f(x) = \frac{x-3}{|x-3|}$
Solution.
a. The graph of y = ⌊x⌋ is shown in Figure 2.6. Notice that the closed circles include the endpoint and the
open circles exclude the endpoint.
Figure 2.6: Graph of ⌊x⌋
b. From Figure 2.6, we see the limit will not exist at integral values. That is, at the points a = −3, −2, −1, 0, 1, 2, 3,
we have that
\[ \lim_{x \to a^+} \lfloor x \rfloor = \lim_{x \to a^-} \lfloor x \rfloor + 1 \]
Therefore, by the matching limits property, the limit does not exist at a = −3, −2, −1, 0, 1, 2, 3.
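The jump at each integer can be observed numerically by evaluating the floor function slightly to the left and slightly to the right of a point. In this Python sketch, `one_sided_floor_values` and the step size `h` are hypothetical choices made for illustration; sampling at one small offset only suggests the one-sided limits, it does not prove them.

```python
import math

def one_sided_floor_values(a, h=1e-9):
    """Return floor just to the left of a and floor just to the right of a."""
    return math.floor(a - h), math.floor(a + h)

for a in (-3, -2, -1, 0, 1, 2, 3):
    left, right = one_sided_floor_values(a)
    print(f"a = {a:2d}: value just left = {left:2d}, value just right = {right:2d}")

# At a non-integer point the two sides agree.
print(one_sided_floor_values(0.5))
```

At every integer the right-hand value exceeds the left-hand value by 1, matching the relation displayed above.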
One-sided limits are particularly useful when considering functions defined piecewise: that is, using different
expressions to describe the function over its domain broken up into several sub-domains. The following example
illustrates the idea of a piecewise defined function.
Example 7. Type I functional response
Planktonic copepods, such as the species shown in Figure 2.7, are small crustaceans found in the sea. These
organisms play an important role in global ecology as they are a major food source for small fish, whales, and

Figure 2.7: The planktonic copepod, Calanus pacificus, as seen under an electron microscope. Copepods such as this
are believed by some scientists to form the largest animal biomass on the earth.
seabirds. It is believed that they form the largest animal biomass on the earth. Given their importance, scientists
are interested in understanding how their feeding rate depends on availability of resources. In a classic ecology
paper, C. S. Holling classified feeding rates into three types∗ . The first type, so-called type I, assumes that organisms
consume at a rate proportional to the amount of food available until they achieve a maximal feeding rate.
In the 1970s, a scientist, B. W. Frost, from the Department of Oceanography at University of Washington,
measured feeding rates of the planktonic copepod, Calanus pacificus, in the lab. In one of his experiments, C.
pacificus were offered different concentrations of the diatom species, Coscinodiscus anstii. He found that C. pacificus
reached its maximal feeding rate of 1,250 cells/hour when the concentration of C. anstii was approximately 200
cells/ml (see Figure 2.8). If you assume that the feeding rate is proportional to the concentration $x$ of C. anstii until
they achieve their maximal feeding rate, then the feeding rate as a function of $x$ is of the form
\[ f(x) = \begin{cases} a x \text{ cells/hour} & \text{if } x \le 200 \\ 1{,}250 \text{ cells/hour} & \text{if } x > 200 \end{cases} \]
where a > 0 is a proportionality constant.
a. Find $\lim_{x \to 200^+} f(x)$ and $\lim_{x \to 200^-} f(x)$.
b. Determine for what choice of $a$ the limit $\lim_{x \to 200} f(x)$ exists.
Solution.
a. Since $f(x) = 1{,}250$ for all $x > 200$, we find
\[ \lim_{x \to 200^+} f(x) = 1{,}250. \]
On the other hand, $f(x) = ax$ for all $x \le 200$. Hence, as $x$ increases to 200, $f(x)$ approaches $200a$ and
\[ \lim_{x \to 200^-} f(x) = 200a. \]
b. By the matching limit property, $\lim_{x \to 200} f(x)$ exists if and only if the left- and right-hand limits are
equal. Therefore, we need that 1,250 = 200a, or a = 6.25, in which case $\lim_{x \to 200} f(x) = 1{,}250$. The
graph of this function, along with the data as plotted in Figure 2.8, illustrates that by choosing a = 6.25,
the linear function and constant function are pasted together in such a way that their values agree at
x = 200.
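The matching condition in part b can be sketched in Python. The constants come from Example 7; the names `MAX_RATE`, `THRESHOLD`, `feeding_rate`, and `a_match` are illustrative and not taken from the text.

```python
MAX_RATE = 1250.0   # maximal feeding rate, cells/hour (from Example 7)
THRESHOLD = 200.0   # diatom concentration at which the rate saturates, cells/ml

def feeding_rate(x, a):
    """Type I functional response: proportional up to THRESHOLD, then constant."""
    return a * x if x <= THRESHOLD else MAX_RATE

# The one-sided limits at x = 200 are 200*a (left) and 1250 (right),
# so they match exactly when a = 1250/200.
a_match = MAX_RATE / THRESHOLD

for a in (5.0, a_match):
    left = feeding_rate(THRESHOLD - 1e-9, a)
    right = feeding_rate(THRESHOLD + 1e-9, a)
    print(f"a = {a}: just left of 200 -> {left:.4f}, just right of 200 -> {right:.4f}")
```

With a = 5 the two sides differ by about 250 cells/hour, whereas with a = 6.25 they agree.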
∗ “The functional response of invertebrate predators to prey density”, Memoirs of the Entomological Society of Canada, 48 (1966),
1–86.

Figure 2.8: Feeding rate I (cells/hour) of a copepod as a function of the density of the diatoms (cells/ml) upon which
it feeds.
Limits: A formal perspective

This section can be omitted by those not going on to major in mathematics at the undergraduate level. Our
informal definition of the limit provides valuable intuition that allows you to develop a working knowledge of this
fundamental concept. For theoretical work, however, the intuitive definition will not suffice, because it gives no
precise, quantifiable meaning to the terms “arbitrarily close to L” and “sufficiently close to a.” In the nineteenth
century, leading mathematicians, including Augustin-Louis Cauchy (1789-1857) and Karl Weierstrass (1815-1897),
sought to put calculus on a sound logical foundation by giving precise definitions for the foundational ideas of calculus.
The following definition, derived from the work of Cauchy and Weierstrass, gives precision to the limit notation.
Limit (Formal definition). Let $f$ be a real-valued function. Then
\[ \lim_{x \to a} f(x) = L \]
if for every $\epsilon > 0$ there is some $\delta > 0$ such that $|f(x) - L| \le \epsilon$ whenever $0 < |x - a| < \delta$.
Behind the formal language is a fairly straightforward idea. In particular, to establish a specific limit, say
$\lim_{x \to a} f(x) = L$, we proceed as follows: given any ε > 0 specifying a desired degree of proximity to L, a number δ > 0 is found that
determines how close x must be to a to ensure that f(x) is within ε units of L. This is shown in Figure 2.9.
Because the Greek letters ε (epsilon) and δ (delta) are traditionally used in this context, the formal definition of
limit is sometimes called the epsilon-delta definition of the limit. The goal of this subsection is to show how it can
be used rigorously to establish a variety of results. According to Michael Spivak, a professor emeritus at Brandeis
University, “This definition is so important (everything we do from now on depends on it) that proceeding any further
without knowing it is hopeless. If necessary, memorize it like a poem!”
One can view this definition as setting up an adversarial relationship between two individuals. One person shouts
out a value of ε > 0. The opponent has to come up with a δ > 0 such that f(x) is within ε of L whenever x is within
δ of a. This relationship is illustrated in Figure 2.10.
Notice that whenever x is within δ units of a (but not equal to a), the point (x, f(x)) on the graph of f must
lie in the rectangle (shaded region) formed by the intersection of the horizontal band of width 2ε centered at L and
the vertical band of width 2δ centered at a. The smaller the ε-interval around the proposed limit L, generally the
smaller the δ-interval will need to be for f(x) to lie in the ε-interval. If such a δ can be found no matter how small
ε is, then f(x) and L are arbitrarily close, so L must be the limit. The following examples illustrate epsilon-delta
proofs, one in which the limit exists and one in which it does not.

Figure 2.9: The epsilon-delta definition of limit
Figure 2.10: Formal definition of limit: limx→c f (x) = L
Example 8. An epsilon-delta proof of a limit statement
Show that $\lim_{x \to 2} (4x - 3) = 5$.
Solution. We guess that the limit as x → 2 is 5. The object is to prove that the limit is 5. We have
\begin{align*}
|f(x) - L| &= |4x - 3 - 5| \\
&= |4x - 8| \\
&= 4|x - 2|
\end{align*}
The last expression must be less than $\epsilon$ whenever $0 < |x - 2| < \delta$. For a given $\epsilon > 0$, choose $\delta = \frac{\epsilon}{4}$. Then
\[ |f(x) - L| = 4|x - 2| < 4\delta = 4 \cdot \frac{\epsilon}{4} = \epsilon. \]
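The choice of δ in Example 8 can be spot-checked numerically. The Python sketch below samples points with 0 < |x − 2| < δ and confirms that |f(x) − 5| < ε at each of them; it is only a sanity check on finitely many points, not a substitute for the proof, and the names `delta_for` and `check` are hypothetical helpers.

```python
def f(x):
    """The function from Example 8."""
    return 4 * x - 3

def delta_for(eps):
    """The delta chosen in the proof: eps / 4."""
    return eps / 4

def check(eps, samples=1000):
    """Test |f(x) - 5| < eps at sample points with 0 < |x - 2| < delta."""
    d = delta_for(eps)
    for i in range(1, samples):
        offset = d * i / samples          # strictly between 0 and d
        for x in (2 - offset, 2 + offset):
            if not abs(f(x) - 5) < eps:
                return False
    return True

for eps in (0.1, 0.001, 0.000001):
    print(f"eps = {eps:g}: delta = {delta_for(eps):g}, all samples within eps: {check(eps)}")
```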

This process is illustrated in the following figure:
Example 9. Limit of a constant
Use an epsilon-delta proof to show that the limit of a constant is a constant. That is, show $\lim_{x \to a} c = c$.
Solution. Let $f(x) = c$. Then
\[ |f(x) - c| = |c - c| = 0 \]
Hence, for every $\epsilon > 0$, any $\delta > 0$ works:
\[ |f(x) - c| = 0 < \epsilon \quad \text{whenever } 0 < |x - a| < \delta. \]
Example 10. An epsilon-delta proof that a limit does not exist
Show that $\lim_{x \to 0} \frac{1}{x}$ does not exist.
Solution. Let $f(x) = \frac{1}{x}$ and $L$ be any number. Suppose that $\lim_{x \to 0} f(x) = L$. Consider the graph of $f$, as shown
below:

It would seem that no matter what value $\epsilon > 0$ is chosen, it would be impossible to find a corresponding $\delta > 0$.
Indeed, suppose that
\[ \left| \frac{1}{x} - L \right| < \epsilon \]
Then
\[ -\epsilon < \frac{1}{x} - L < \epsilon \]
and
\[ L - \epsilon < \frac{1}{x} < L + \epsilon. \]
This inequality can hold only if
\[ \frac{1}{|x|} < |L| + \epsilon, \]
and hence will be violated whenever
\[ 0 < |x| < \frac{1}{|L| + \epsilon}. \]
Thus, no matter what value of $\epsilon > 0$ we choose, we cannot find a $\delta > 0$ such that $|1/x - L| < \epsilon$ for all $0 < |x - 0| < \delta$.
Since $L$ was chosen arbitrarily, it follows that the limit does not exist.
Problem Set 2.2

Given the functions defined by the graphs in Figure 2.11, find the limits in Problems 1-4.
(Graph of f ) (Graph of g)
Figure 2.11: Graphs of the functions f and g
1. a. limx→−4 f (x)
b. limx→0 f (x)
2. a. limx→7 g(x)
b. limx→0 g(x)
3. a. limx→2 f (x)
b. limx→−4 g(x)
4. a. limx→2− f (x)
b. limx→−4+ g(x)
Given the functions defined by the graphs in Figure 2.12, find the limits, if they exist, in Problems 5-8. If the limits
do not exist, discuss why.

(Graph of F) (Graph of G)
Figure 2.12: Graphs of the functions F and G
5. a. limx→1− F (x)
b. limx→1+ F (x)
c. limx→1 F (x)
6. a. limx→−1− F (x)
b. limx→−1+ F (x)
c. limx→−1 F (x)
7. a. limx→0− G(x)
b. limx→0+ G(x)
c. limx→0 G(x)
8. a. limx→0.5− G(x)
b. limx→0.5+ G(x)
c. limx→0.5 G(x)
Describe each figure in Problems 9-12 with a one-sided limit statement. For example, for 9 the answer is $\lim_{x \to 1^+} f(x) = 2$.
9.

10.
11.
12.
Approximate the limits by filling in the appropriate values in the tables in Problems 13-15 using a one-sided statement.
13. $\lim_{x \to 5^-} f(x)$ where $f(x) = 3x - 2$
x      2     3     4     4.5    4.9    4.99
f(x)   4
14. $\lim_{x \to 2^-} g(x)$ where $g(x) = \frac{x^3 - 8}{x^2 + 2x + 4}$
x      1     1.5   1.9   1.99   1.999  1.9999
g(x)   -1
15. $\lim_{x \to 2} H(x)$ where $H(x) = \frac{3x^2 - 2x - 8}{x - 2}$
x      1     1.5   1.9   1.99   1.999  1.9999
H(x)   7
x      3     2.5   2.1   2.01   2.001  2.0001
H(x)   13
Determine the limits in Problems 16 to 24. If the limit exists, explain how you found the limit. If the limit does not
exist, explain why.
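16. $\lim_{x \to 5} \frac{1}{x}$
17. $\lim_{x \to -3^+} \frac{|x+3|}{x+3}$
18. $\lim_{x \to -1} \cos x$
19. $\lim_{x \to -1} \cos(\pi x)$
20. $\lim_{x \to 1} \frac{\ln x}{x-1}$
21. $\lim_{x \to 2} \frac{\sqrt{x+2} - 2}{x-2}$
22. $\lim_{x \to 0} \frac{x}{|x|}$
23. $\lim_{x \to 0} \frac{x^2}{|x|}$
24. $\lim_{x \to 4} \frac{(x-4)^2}{|x-4|}$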
25. Consider the function
\[ f(x) = \sqrt{x} \cos\frac{1}{x} \]
whose graph is given by
[Figure: graph of $y = f(x)$ for $0 < x \le 1$]
Does $\lim_{x \to 0^+} f(x)$ exist? If so, what is it? If not, why not?
26. Consider the function
\[ f(x) = |x| \sin\frac{1}{x} \]
whose graph is given by
Does $\lim_{x \to 0^+} f(x)$ exist? If so, what is it? If not, why not?
27. Consider the statement $\lim_{x \to 1} (4 + x) = 5$. How close does $x$ need to be to 1 to ensure that $4 + x$ is within the
given distance of 5?
a. 0.1
b. 0.01
c. 0.001
28. Consider the statement $\lim_{x \to 2} x^2 = 4$. How close does $x$ need to be to 2 to ensure that $x^2$ is within the given
distance of 4?
a. 0.1
b. 0.01
c. 0.001
29. Consider the statement $\lim_{x \to 0^+} \sqrt{x} = 0$. How close does $x$ need to be to 0 to ensure that $\sqrt{x}$ is within the given
distance of 0?
a. 0.1
b. 0.01
c. 0.001
30. Consider the statement $\lim_{x \to e} \ln x = 1$. How close does $x$ need to be to $e$ to ensure that $\ln x$ is within the
given distance of 1?
a. 0.1
b. 0.01
c. 0.001
31. Consider the function
\[ f(x) = \frac{\sqrt{4 - x^2} - 2}{x^2} \]
a. Use technology to graph $y = f(x)$ over the interval [−2, 2].
b. Use technology to graph $y = f(x)$ over the interval [−0.1, 0.1]. Based on your graph, guess the value
of $\lim_{x \to 0} f(x)$.
c. Use technology to graph $y = f(x)$ over the interval $[-10^{-7}, 10^{-7}]$. Based on your graph, guess the
value of $\lim_{x \to 0} f(x)$.

d. Most technologies can keep track of 16 or fewer digits. In light of this observation, discuss what might
be happening in parts b and c.
32. Consider the function $f(x) = \frac{\ln(1 + x^2)}{x^2}$.
a. Use technology to graph y = f (x) over the interval [−2, 2].

b. Use technology to graph y = f (x) over the interval [−0.1, 0.1]. Based on your graph, guess the value
of limx→0 f (x).
c. Use technology to graph $y = f(x)$ over the interval $[-10^{-7}, 10^{-7}]$. Based on your graph, guess the
value of $\lim_{x \to 0} f(x)$.
d. Most technologies can keep track of 16 or fewer digits. In light of this observation, discuss what might
be happening in parts b and c.
In Problems 33 to 38, prove the limit exists using the formal definition of the limit.
33. limx→5 (x + 1) = 6
34. limx→5 (1 − 3x) = −14

35. $\lim_{x \to 2} \frac{1}{x} = \frac{1}{2}$
36. $\lim_{x \to 0} e^{-1/x^2} = 0$
37. $\lim_{x \to 2} (x^2 + 2) = 6$
38. $\lim_{x \to 1} (x^2 + 1) = 2$
LEVEL 2 – APPLICATIONS
39. The federal income tax rates for singles in 2006 are shown in Table 2.2.
Table 2.2: Schedule X - Single

If taxable income is over    But not over    The tax is
$0                           $7,550          10% of the amount over $0
$7,550                       $30,650         $755 plus 15% of the amount over $7,550
$30,650                      $74,200         $4,220.00 plus 25% of the amount over $30,650
$336,550                     no limit        $97,653.00 plus 35% of the amount over $336,550
Express the income tax f (x) for an individual in 2006 with adjusted income x dollars as a piecewise defined
function.
a. Graph $y = f(x)$ over the interval [0, 500,000].

b. Determine at what values of a, limx→a f (x) does not exist.
40. In 2007, the U. S. postal rates were 41¢ for the first ounce or fraction of an ounce, and 17¢ for each additional
ounce or fraction of an ounce up to 2 pounds. Let p represent the total amount of postage (in cents) for a
letter weighing x ounces.
a. Graph y = p(x) over the interval [0, 8] ounces.

b. Determine at what values of $a$ the limit $\lim_{x \to a} p(x)$ does not exist.

41. A wildlife ecologist who studied the rate at which wolves kill moose in Yellowstone National Park found when
moose were plentiful, wolves killed moose at the rate of one moose per wolf every 25 days. (Note this doesn’t
mean that wolves only eat every 25 days because they hunt in packs and share kills.) However when the density
of moose drops below x = 3 per km$^2$, then the rate at which wolves kill moose is proportional to the density.
Construct a Type I functional response f (x) (see Example 7) such that f (x) has a limit at x = 3.
42. A student looking at the data in Figure 2.8 decided that the following function might provide a better fit to
the data:

\[ f(x) = \begin{cases} 6.25x \text{ cells/hour} & \text{if } x \le 150 \\ ax + b \text{ cells/hour} & \text{if } 150 < x < 300 \\ 1{,}300 \text{ cells/hour} & \text{if } x \ge 300 \end{cases} \]
Find values for the parameters a and b that ensure f (x) has limits at x = 150 and x = 300.

174 2.3. LIMIT LAWS AND CONTINUITY
2.3 Limit Laws and Continuity

Having defined limits, we are ready to develop some tools to verify their existence and to compute them more
readily. In some cases, taking the limit of a function reduces to evaluating the function at the limit point, and in
some cases we cannot find the limit by evaluation. In this section, we find when evaluation is acceptable, and when
it is not.
Properties of Limits
With a definition of a limit in hand, it is important to understand how the definition acts under functional arithmetic.
For instance, if limx→a f (x) = L and limx→a g(x) = M , then f (x) and g(x) can be made arbitrarily close to L and
M , respectively, for all x sufficiently close but not equal to a. Hence, f (x)g(x) must be arbitrarily close to LM for
all x sufficiently close but not equal to a. Therefore, it is reasonable to conjecture that the limit of the product f · g
is the product L M of the limits. Indeed this is true and can be proven using the formal definition of the limit. In
fact limits satisfy all the arithmetic properties that you would think they should, as summarized in the following
box.
Limit Laws. Let $f$ and $g$ be functions such that $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$. Then
Sums: $\lim_{x \to a} (f(x) + g(x)) = L + M$
Differences: $\lim_{x \to a} (f(x) - g(x)) = L - M$
Products: $\lim_{x \to a} f(x) g(x) = L M$
Quotients: $\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{L}{M}$ provided that $M \neq 0$.
Example 1. Using limit laws
Using the limit laws, find the following limits. You may assume that $\lim_{x \to 4} x = 4$ and $\lim_{x \to 4} 1 = 1$.
a. $\lim_{x \to 4} x^2$.
b. $\lim_{x \to 4} (x^2 + x)$.
c. $\lim_{x \to 4} \left( \frac{1}{x} - \frac{1}{x^2} \right)$.
Solution.
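a.
\begin{align*}
\lim_{x \to 4} x^2 &= \left( \lim_{x \to 4} x \right) \left( \lim_{x \to 4} x \right) && \text{Product law} \\
&= 4 \cdot 4 = 16 && \text{Given value}
\end{align*}
b.
\begin{align*}
\lim_{x \to 4} (x^2 + x) &= \lim_{x \to 4} x^2 + \lim_{x \to 4} x && \text{Sum law} \\
&= \left[ \lim_{x \to 4} x \right]^2 + \lim_{x \to 4} x && \text{Product law} \\
&= [4]^2 + 4 && \text{Given value} \\
&= 20
\end{align*}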

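c.
\begin{align*}
\lim_{x \to 4} \left( \frac{1}{x} - \frac{1}{x^2} \right) &= \lim_{x \to 4} \frac{1}{x} - \lim_{x \to 4} \frac{1}{x^2} && \text{Difference law} \\
&= \lim_{x \to 4} \frac{1}{x} - \left( \lim_{x \to 4} \frac{1}{x} \right)^2 && \text{Product law} \\
&= \frac{1}{4} - \left( \frac{1}{4} \right)^2 && \text{Quotient law} \\
&= \frac{3}{16}
\end{align*}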
The preceding example illustrates that applying the product and sum limit laws repeatedly allows us to quickly
compute limits of polynomials and rational functions as x approaches a by evaluating them at the value a, provided
a is in the domain.
Limits of polynomials and rational functions. Let $f$ be either a polynomial or a rational function. If $a$ is in the domain of $f$, then
\[ \lim_{x \to a} f(x) = f(a) \]
Proof. We have previously shown that $\lim_{x \to a} c = c$ (the limit of a constant is a constant) and $\lim_{x \to a} x = a$.
By applying the limit law for products repeatedly, we have $\lim_{x \to a} x^n = a^n$ for $n = 1, 2, 3, \ldots$. Let $p(x) =
b_0 + b_1 x + b_2 x^2 + \ldots + b_n x^n$ be a polynomial. Then
\begin{align*}
\lim_{x \to a} p(x) &= \lim_{x \to a} b_0 + \lim_{x \to a} b_1 x + \ldots + \lim_{x \to a} b_n x^n && \text{Limit law for sums} \\
&= b_0 \lim_{x \to a} 1 + b_1 \lim_{x \to a} x + \ldots + b_n \lim_{x \to a} x^n && \text{Limit law for products} \\
&= b_0 + b_1 a + \ldots + b_n a^n \\
&= p(a)
\end{align*}
Thus we have shown $\lim_{x \to a} p(x) = p(a)$ for any polynomial. You are asked to prove the result for a rational function
in the problem set.
Example 2. Finding limits algebraically
Find the limits and show each step of your derivation.

a. $\lim_{x \to 2} (2x^4 - 5x^3 + 2x^2 - 5)$
b. $\lim_{x \to 2} \frac{x^2 - 4}{x + 2}$
c. $\lim_{x \to -2} \frac{x^2 - 4}{x + 2}$
Solution.
a. Since $2x^4 - 5x^3 + 2x^2 - 5$ is a polynomial, it is sufficient to evaluate the polynomial at $x = 2$:
\begin{align*}
\lim_{x \to 2} (2x^4 - 5x^3 + 2x^2 - 5) &= 2(2)^4 - 5(2)^3 + 2(2)^2 - 5 \\
&= 32 - 40 + 8 - 5 \\
&= -5
\end{align*}

b. Since $\frac{x^2 - 4}{x + 2}$ is a rational function and $x = 2$ is in the domain, it is sufficient to evaluate the rational
function at $x = 2$:
\begin{align*}
\lim_{x \to 2} \frac{x^2 - 4}{x + 2} &= \frac{(2)^2 - 4}{2 + 2} \\
&= \frac{0}{4} \\
&= 0
\end{align*}
c. Since $x = -2$ is not in the domain, we cannot simply evaluate the function at $x = -2$ to determine the
limit. However, we can factor and then evaluate at $x = -2$:
\begin{align*}
\lim_{x \to -2} \frac{x^2 - 4}{x + 2} &= \lim_{x \to -2} \frac{(x - 2)(x + 2)}{x + 2} \\
&= \lim_{x \to -2} (x - 2) && \text{Now it is a polynomial.} \\
&= -4
\end{align*}
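The three answers in Example 2 can be checked numerically. In the Python sketch below, `p` and `r` are illustrative names for the polynomial of part a and the rational function of parts b and c; for part c the function is evaluated near, not at, x = −2, since −2 is outside its domain.

```python
def p(x):
    """Polynomial from part a."""
    return 2 * x**4 - 5 * x**3 + 2 * x**2 - 5

def r(x):
    """Rational function from parts b and c; undefined at x = -2."""
    return (x**2 - 4) / (x + 2)

print("part a: p(2) =", p(2))
print("part b: r(2) =", r(2))

# Part c: approach x = -2 from both sides.
for h in (0.1, 0.001, 0.00001):
    print(f"part c: r(-2 - {h:g}) = {r(-2 - h):.6f}, r(-2 + {h:g}) = {r(-2 + h):.6f}")
```

The values in part c approach −4 from both sides, in agreement with the factored form x − 2 evaluated at x = −2.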
Consider the following limit statement involving composition of functions.

Let $f$ and $g$ be functions such that $\lim_{x \to a} f(x) = L$ and $\lim_{x \to L} g(x) = M$. Then
\[ \lim_{x \to a} g[f(x)] = M \]
This “limit law,” for composition, as stated, is not true, in spite of the fact that it may seem intuitively clear. In
order to see the difficulty of this statement, we consider an example.
Example 3. Limits of compositions
Consider f and g whose graphs y = f (x) and y = g(x) in black and red, respectively, are shown below:
[Figure: graphs of $y = f(x)$ (black) and $y = g(x)$ (red) for $-2 \le x \le 2$]
Find the following limits provided they exist.

a. limx→0 g[f (x)]
b. limx→0 f [g(x)]

c. limx→1 g[f (x)]
Solution.
a. From the graphs, $\lim_{x \to 0} f(x) = 1$ and $\lim_{x \to 1} g(x) = -1$, so (by the composition limit law)
\[ \lim_{x \to 0} g[f(x)] = -1 \]
b. From the graphs, $\lim_{x \to 0^-} g(x) = 0$ and $\lim_{x \to 0^+} g(x) = -1$. So the limit of $g(x)$ as $x \to 0$ does not exist,
and the composition limit law does not apply. To find the limit, we consider the left- and right-hand
limits:
\[ \lim_{x \to 0^-} f[g(x)] = \lim_{x \to 0^+} f(x) = 1 \]
and
\[ \lim_{x \to 0^+} f[g(x)] = \lim_{x \to -1^+} f(x) = 2. \]
Since the one-sided limits do not agree, $\lim_{x \to 0} f[g(x)]$ does not exist.
c. From the graphs, $\lim_{x \to 1^-} f(x) = 0$ and $\lim_{x \to 1^+} f(x) = 1$. So the limit of $f(x)$ as $x \to 1$ does not
exist, and the composition limit law does not apply. To find the limit directly, we consider the left- and
right-hand limits:
\[ \lim_{x \to 1^-} g[f(x)] = \lim_{x \to 0^+} g(x) = -1 \]
and
\[ \lim_{x \to 1^+} g[f(x)] = \lim_{x \to 1^-} g(x) = -1. \]
Since the one-sided limits agree, $\lim_{x \to 1} g[f(x)] = -1$.
Notice from Example 3 (parts b and c) that just because the composition limit law does not apply, you cannot
draw any immediate conclusions about the compositional limit. On the other hand, if the composition limit law does
apply, then you can use it to find the limit.
Continuity and Its Properties

The reason Example 3 was interesting was the fact that the red and black functions were piecewise functions. The
word that describes graphs whose parts are “connected” is the idea of continuity. The idea evolved from the intuitive
notion of a curve “without breaks or jumps” to a formal mathematical definition. We begin with a definition of
continuity at a point .
Continuity at a point. A function $f$ is continuous at the point $a$ if
\[ \lim_{x \to a} f(x) = f(a) \]
Continuity can fail in three ways. First, $\lim_{x \to a} f(x)$ may exist, but $f$ might not be defined at $a$. Second, if
$\lim_{x \to a} f(x) = L$, then it may be that $f(a) \neq L$. And third, $\lim_{x \to a} f(x)$ may not exist; this failure is irreparable
without altering the behavior of the function. In the first two cases, continuity can be restored by redefining $f(a)$ to
be $L$, which does not change the behavior near $a$.
Example 4. Checking continuity
Test the continuity of each of the following functions at x = 0 and x = 1. If the function is not continuous at the
point, explain. Discuss whether or not the function can be redefined at a single point to make it continuous at any
points of discontinuity.

a. The function f is defined by the graph y = f (x):

[Figure: graph of $y = f(x)$ for $-2 \le x \le 2$]
b. $g(x) = \frac{x^2 + 2x - 3}{x - 1}$ if $x \neq 1$, $g(x) = 6$ if $x = 1$.
Solution.
a. Since $f(x)$ approaches 1 from both sides of $x = 0$, we see $\lim_{x \to 0} f(x) = 1$. However, as $f(0) = 2$, we see
that
\[ \lim_{x \to 0} f(x) \neq f(0) \]
and $f$ is not continuous at $x = 0$. However, this is reparable by redefining $f(0)$ to be 1.

At $x = 1$, we see
\[ \lim_{x \to 1^-} f(x) = 0 \quad \text{and} \quad \lim_{x \to 1^+} f(x) = 1 \]
Thus, the limit does not exist, so $f$ is not continuous at $x = 1$, and this discontinuity is not reparable.
b. At $x = 0$, we use the limit of a quotient law
\[ \lim_{x \to 0} \frac{x^2 + 2x - 3}{x - 1} = \frac{0^2 + 2(0) - 3}{0 - 1} = 3 \]
and $g(0) = 3$, so $g$ is continuous at $x = 0$.
At $x = 1$, we see $g(1) = 6$. We cannot use the limit of a quotient law because of division by zero at $x = 1$.
However, if we factor the numerator and then take the limit, we get
\begin{align*}
\lim_{x \to 1} \frac{x^2 + 2x - 3}{x - 1} &= \lim_{x \to 1} \frac{(x - 1)(x + 3)}{x - 1} \\
&= \lim_{x \to 1} (x + 3) \\
&= 4
\end{align*}
Since $\lim_{x \to 1} g(x) \neq g(1)$, $g$ is not continuous at $x = 1$. However, this discontinuity is reparable by
redefining $g(1) = 4$.
We can now return to the useful limit law which motivated our discussion of continuity, namely the limit of a
composition of functions.
Composition Limit Law. Let $f$ and $g$ be continuous functions such that $\lim_{x \to a} f(x) = L$ and $\lim_{x \to L} g(x) = M$. Then
\[ \lim_{x \to a} g[f(x)] = M \]

We now add to this limit law some other laws of continuity which are derived directly from their corresponding
limit laws.
Continuity Laws. Let $f$ and $g$ be functions that are continuous at $x = a$. Then
Sums: $f + g$ is continuous at $x = a$.
Differences: $f - g$ is continuous at $x = a$.
Products: $f \cdot g$ is continuous at $x = a$.
Quotients: $f/g$ is continuous at $x = a$ provided that $g(a) \neq 0$.
Composition: $g \circ f$ is continuous at $x = a$, provided $g$ is continuous at $x = f(a)$.
Proof. We will illustrate the proof of this property for products. All other parts follow in a similar manner. Assume that $f$ and $g$ are continuous at $a$. Then $\lim_{x \to a} f(x) = f(a)$ and $\lim_{x \to a} g(x) = g(a)$. Hence
\begin{align*}
\lim_{x \to a} (fg)(x) &= \lim_{x \to a} \left[ f(x)\, g(x) \right] \\
&= \left[ \lim_{x \to a} f(x) \right] \left[ \lim_{x \to a} g(x) \right] && \text{Limit law for products} \\
&= f(a)\, g(a) && \text{Continuity of $f$ and $g$ at $x = a$} \\
&= (fg)(a)
\end{align*}
Therefore $fg$ is continuous at $x = a$. □
Since we have shown that limx→a f (x) = f (a) for polynomial and rational functions at points in their domain,
these functions are continuous at all points on their domain. As it turns out, this statement holds for all elementary
functions.
Theorem 2.1. Continuity of elementary functions
Let f be either a polynomial, rational function, trigonometric function, power function, exponential function, or
logarithmic function. Then f is continuous at all points in its domain.
Armed with the tools of continuity, we can readily calculate many limits.
Example 5. Quick limits
Use the results of this section to find the given limits, and justify each step of your derivation.

a. $\lim_{x \to 1} \left( \ln x - \sin(\pi x) + x^3 \right)$
b. $\lim_{x \to 4} \dfrac{\ln \sqrt{x}}{1 + x}$
Solution.
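a.
\begin{align*}
\lim_{x \to 1} \left( \ln x - \sin(\pi x) + x^3 \right) &= \lim_{x \to 1} \ln x - \lim_{x \to 1} \sin(\pi x) + \lim_{x \to 1} x^3 && \text{Sum and difference limit laws} \\
&= \ln 1 - \sin \pi + 1^3 && \text{Continuity of elementary functions} \\
&= 0 - 0 + 1 \\
&= 1
\end{align*}

b.
\begin{align*}
\lim_{x \to 4} \frac{\ln \sqrt{x}}{1 + x} &= \frac{\lim_{x \to 4} \ln \sqrt{x}}{\lim_{x \to 4} (1 + x)} && \text{Quotient limit law} \\
&= \frac{\ln \sqrt{4}}{1 + 4} && \text{Composition limit law and continuity of elementary functions} \\
&= \frac{1}{5} \ln 2
\end{align*}
□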
Combining our continuity theorems with the limit laws, we can compute limits that we could not otherwise find.
Example 6. Technology vanquished
Recall that in Example 4, Section 2.2, p. 159, we used technology to study the limit
\[ \lim_{x \to 0} \frac{\sqrt{1 + x^2} - 1}{x^2} \]
and this study was inconclusive. Find this limit using algebra and the results of this section.
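Solution. To work with the expression $f(x) = \dfrac{\sqrt{1 + x^2} - 1}{x^2}$, we need to simplify it. One way to simplify is to multiply the numerator and denominator by $\sqrt{1 + x^2} + 1$.
\begin{align*}
\frac{\sqrt{1 + x^2} - 1}{x^2} &= \frac{\sqrt{1 + x^2} - 1}{x^2} \cdot \frac{\sqrt{1 + x^2} + 1}{\sqrt{1 + x^2} + 1} \\
&= \frac{1 + x^2 - 1}{x^2 \left( \sqrt{1 + x^2} + 1 \right)} \\
&= \frac{1}{\sqrt{1 + x^2} + 1}
\end{align*}
We now turn to evaluating the limit.
\begin{align*}
\lim_{x \to 0} \frac{\sqrt{1 + x^2} - 1}{x^2} &= \lim_{x \to 0} \frac{1}{\sqrt{1 + x^2} + 1} && \text{From the above simplification} \\
&= \frac{\lim_{x \to 0} 1}{\lim_{x \to 0} \left( \sqrt{1 + x^2} + 1 \right)} && \text{Limit law for quotients} \\
&= \frac{1}{\sqrt{1 + 0^2} + 1} && \text{$\sqrt{x}$ is continuous} \\
&= \frac{1}{2}
\end{align*}
Notice that this value of $\frac{1}{2}$ corresponds to our initial guess when using technology. □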
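One plausible reason a calculator study of this limit can mislead is floating-point cancellation: for very small x, the computed value of 1 + x² rounds to exactly 1, so the numerator of the original expression evaluates to 0. The Python sketch below compares the original expression with the rationalized form derived above. The function names `naive` and `rationalized` are ours, and the behavior shown assumes standard double-precision arithmetic.

```python
import math

def naive(x):
    # Original expression: the subtraction loses accuracy when x is tiny.
    return (math.sqrt(1 + x**2) - 1) / x**2

def rationalized(x):
    # Equivalent form obtained by multiplying by the conjugate.
    return 1 / (math.sqrt(1 + x**2) + 1)

for x in [1e-1, 1e-3, 1e-5, 1e-7, 1e-9]:
    print(f"x = {x:.0e}   naive = {naive(x):.10f}   rationalized = {rationalized(x):.10f}")
```

The rationalized form contains no subtraction of nearly equal numbers, so it stays close to 1/2 for every x in the list, while the original form eventually collapses to 0.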
Intermediate Value Theorem

The function f is said to be continuous on the open interval (a, b) if it is continuous at each number in this
interval. Note that the endpoints are not part of open intervals. If f is also continuous from the right at a, we say it is
continuous on the half-open interval [a, b). Similarly, f is continuous on the half-open interval (a, b] if it is continuous
at each number between a and b and is continuous from the left at the endpoint b. Finally, f is continuous on the
closed interval [a, b] if it is continuous at each number between a and b and is both continuous from the right at a
and continuous from the left at b.
Example 7. Intervals of continuity
For the following functions, determine on which intervals the function is continuous.

a. $\dfrac{1}{1 - x^2}$
b. $\dfrac{x + 3}{|x + 3|}$
c. $\tan x$
Solution.
a. Since $\frac{1}{1 - x^2}$ is a rational function, it is continuous on its domain, that is, wherever its denominator is nonzero. Since $1 - x^2 = 0$ if and only if $x = \pm 1$, $\frac{1}{1 - x^2}$ is continuous on the open intervals $(-\infty, -1)$, $(-1, 1)$ and $(1, \infty)$.

b. Since $\frac{x + 3}{|x + 3|}$ equals 1 for all $x > -3$ and equals $-1$ for all $x < -3$, $\frac{x + 3}{|x + 3|}$ is continuous on the open intervals $(-\infty, -3)$ and $(-3, \infty)$.
c. Since $\tan x$ is a quotient of the elementary functions $\sin x$ and $\cos x$, it is continuous at all points where $\cos x \neq 0$. Therefore $\tan x$ is continuous on all intervals of the form $(\pi/2 + k\pi, 3\pi/2 + k\pi)$ where $k$ is an integer.
□
The graphs of functions that are continuous on an interval cannot have any breaks or gaps. Because of this, we
can guarantee the conclusion of the intermediate value theorem.
Theorem 2.2. Intermediate Value Theorem
Let f be continuous on the closed interval [a, b]. If L lies strictly between f (a) and f (b), then there exists at least
one number c on the open interval (a, b) such that f (c) = L.
This theorem says that if f is a continuous function (with emphasis on the word continuous) on some closed
interval [a, b], then f (x) must take on all values between f (a) and f (b). The intermediate value theorem is extremely
useful in ensuring that we can solve certain nonlinear equations. Consider the following example.
Example 8. Proving the existence of roots
Use the intermediate value theorem to prove that there exists a solution to
\[ x^5 - x^2 + 1 = 0 \]
Use technology to estimate one of the solutions.
Solution. Let $f(x) = x^5 - x^2 + 1$. Since $f$ is a polynomial, it is continuous at all points on the real number line. To use the intermediate value theorem, we need to find an interval $[a, b]$ such that $f(a)$ and $f(b)$ have opposite signs (since 0 is a value between a positive number and a negative one). A little experimentation reveals that $f(-1) = -1 < 0$ and $f(1) = 1 > 0$. Hence, there must be a $c$ in $(-1, 1)$ such that $f(c) = 0$. Using technology, we see that there is a solution around $x = -0.8$.
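The "little experimentation" can be automated by scanning a grid for sign changes. The sketch below is one way to do this in Python; the helper `sign_changes` and the grid of 20 subintervals are our own illustrative choices. By the intermediate value theorem, each reported subinterval contains at least one root of the continuous function f.

```python
def f(x):
    return x**5 - x**2 + 1

def sign_changes(func, lo, hi, steps):
    """Return the grid subintervals of [lo, hi] on which func changes sign."""
    width = (hi - lo) / steps
    brackets = []
    for i in range(steps):
        left, right = lo + i * width, lo + (i + 1) * width
        if func(left) * func(right) < 0:   # opposite signs at the endpoints
            brackets.append((left, right))
    return brackets

brackets = sign_changes(f, -1, 1, 20)
print(brackets)
```

A grid scan can miss roots that lie between two grid points with the same sign, so it locates brackets but does not certify that the list is complete.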

Example 9. The bisection method for solving equations
Given a crude plot of the polynomial

\[ x^5 - 10x^3 + 21x + 4 = 0 \]
and a calculator able to do only simple arithmetic operations, apply the intermediate value theorem to the problem
of finding the largest root of this equation correct to two decimal places.
Solution. We see from a plot of this polynomial in Fig. 2.13 that it has five roots, one each respectively on the
integer intervals [−3, −2], [−2, −1], [−1, 0], [1, 2], and [2, 3]. The intermediate value theorem suggests the following
algorithm, known as the bisection method, to solve for any of these roots to as many decimal places as we like.
Figure 2.13: Graph of $f(x) = x^5 - 10x^3 + 21x + 4$
1. Choose two points $a$ and $b$ known to be on either side of the root of interest such that only one root exists on $[a, b]$.
2. Calculate $f\left(\frac{a+b}{2}\right)$.
3. If $f\left(\frac{a+b}{2}\right)$ has the same sign as $f(a)$, then by the intermediate value theorem the root lies between $\frac{a+b}{2}$ and $b$. In this case rename the interval $\left[\frac{a+b}{2}, b\right]$ as $[a, b]$ and repeat the process.
4. If $f\left(\frac{a+b}{2}\right)$ has the same sign as $f(b)$, then by the intermediate value theorem the root lies between $a$ and $\frac{a+b}{2}$. In this case rename the interval $\left[a, \frac{a+b}{2}\right]$ as $[a, b]$ and repeat the process.
5. Keep repeating until $f\left(\frac{a+b}{2}\right)$ is as close to 0 as desired, at which point the root is approximated by $x = \frac{a+b}{2}$.
Note that if f (a) and f (b) have opposite signs, but the interval [a, b] contains more than one root, then this method
will converge to one of these roots; but you will not know which root (e.g. the biggest or smallest on the interval).
In Table 2.3, the calculations for the method are illustrated using the starting interval [2, 3] which we know from
Fig. 2.13 contains the largest root. Note that for purposes of illustration, we have used the exact values of a and b
throughout this calculation, although this is not necessary (as we will illustrate in the next example) when calculating
the solution correct to 2 decimal places. After 10 iterations, we see from Table 2.3 that we have established the root
lies on [2.5615, 2.5621]. Thus, to 2 decimal places the root is x = 2.56. If we wanted to find the root to more than 2
decimal places, we could keep going as illustrated in Table 2.3 until the desired accuracy is obtained.

Table 2.3: Bisection method for finding roots of a nonlinear function

Iteration   a              (a+b)/2         b          f(a)        f((a+b)/2)   f(b)
0           2              2.5             3          -2          -2.09375     40
1           2.5            2.75            3          -2.09375    11.05761     40
2           2.5            2.625           2.75       -2.09375    2.882965     11.05761
3           2.5            2.5625          2.625      -2.09375    0.037423     2.882965
4           2.5            2.53125         2.5625     -2.09375    -1.112400    0.037423
5           2.53125        2.546875        2.5625     -1.112400   -0.559168    0.037423
6           2.546875       2.5546875       2.5625     -0.559168   -0.266371    0.037423
7           2.5546875      2.55859375      2.5625     -0.266371   -0.115859    0.037423
8           2.55859375     2.560546875     2.5625     -0.115859   -0.039565    0.037423
9           2.560546875    2.5615234375    2.5625     -0.039565   -0.001158    0.037423
10          2.5615234375   2.56201171875   2.5625     -0.001158   0.018111     0.037423
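The bisection procedure of Example 9 translates almost line for line into code. The sketch below is one possible Python rendering, not a definitive implementation; the names `bisect` and `iterations` are our own, and it stops after a fixed number of halvings rather than testing how close the midpoint value is to zero.

```python
def f(x):
    return x**5 - 10 * x**3 + 21 * x + 4

def bisect(func, a, b, iterations):
    """Halve [a, b] repeatedly, assuming func(a) and func(b) have opposite signs."""
    fa, fb = func(a), func(b)
    if fa * fb >= 0:
        raise ValueError("func(a) and func(b) must have opposite signs")
    for k in range(iterations):
        m = (a + b) / 2
        fm = func(m)
        print(f"{k:2d}  a = {a:.10f}  midpoint = {m:.10f}  b = {b:.10f}  f(midpoint) = {fm: .6f}")
        if fm == 0:
            return m, m          # landed exactly on a root
        if fa * fm > 0:
            a, fa = m, fm        # same sign as f(a): the root lies in [m, b]
        else:
            b, fb = m, fm        # same sign as f(b): the root lies in [a, m]
    return a, b

a, b = bisect(f, 2, 3, 11)
print("root lies in", (a, b))
```

Running eleven halvings from [2, 3] corresponds to iterations 0 through 10 of the table. Each halving cuts the width of the bracket in two, so the number of iterations needed for a given accuracy can be estimated in advance from the starting width.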
Equations that take the form of finding the roots of polynomials, such as the problem in the previous example of finding the largest root of a 5th order polynomial, are known as algebraic equations. Equations that involve exponential or trigonometric functions, such as $x \sin x - 1/2 = 0$ or $e^{\sqrt{x}} - \pi = 0$, are known as transcendental equations. Transcendental equations generally do not have analytical solutions and need to be solved numerically.
Example 10. Time to global warming
According to an online article in the New Scientist ∗ recent research suggests that stabilizing carbon dioxide
concentrations in the atmosphere at 450 parts per million (ppm) could limit global warming to 2◦ C. In Section 1.3,
we modeled carbon dioxide concentrations in the atmosphere with the function (which we now present to higher
precision to make more transparent the numerical details of the convergence process)
\[ f(x) = 0.122463x + 329.253 + 3 \cos\left( \frac{\pi x}{6} \right) \text{ ppm} \]
where x is months after April 1974. Use the bisection method to find the first time that the model predicts carbon
dioxide levels of 450 ppm. Get a prediction that is accurate to two decimals.
Solution. Solving f (x) = 450 is equivalent to solving g(x) = 0 where g(x) = f (x) − 450. Using technology to plot
g(x), we get
[Figure: plot of g(x) for x from 900 to 1000]
∗ Carbon emissions rising faster than ever 17:29 10 November 2006. New Scientist.com news service Catherine Brahic

Hence, there appears to be a zero in this interval. Zooming around the interval [960, 980], we get
[Figure: plot of g(x) for x from 960 to 980]
Since the first zero appears to be in the interval [965, 972], we can set a = 965 and b = 972 and apply the bisection
method, which yields the following table of values of g (where all the values have been rounded to 4 decimal places throughout the calculations):
Iteration   a          (a+b)/2    b          g(a)      g((a+b)/2)   g(b)
0           965        968.5      972        -5.1683   -2.9180      1.2870
1           968.5      970.25     972        -2.9180   -0.1010      1.2870
2           970.25     971.125    972        -0.1010   0.8705       1.2870
3           970.25     970.6875   971.125    -0.1010   0.4453       0.8705
4           970.25     970.4688   970.6875   -0.1010   0.1859       0.4453
5           970.25     970.3594   970.4688   -0.1010   0.0457       0.1859
6           970.25     970.3047   970.3594   -0.1010   -0.0269      0.0457
7           970.3047   970.3320   970.3594   -0.0269   0.0095       0.0457
8           970.3047   970.3184   970.3320   -0.0269   -0.0086      0.0095
9           970.3184   970.3252   970.3320   -0.0086   0.0005       0.0095
10          970.3184   970.3218   970.3252   -0.0086   -0.0041      0.0005
11          970.3218   970.3235   970.3252   -0.0041   -0.0018      0.0005
Hence, the model predicts that carbon dioxide concentrations will reach 450 ppm in 970.32 months, which is 80 years
and 10 months, after April 1974. In other words, in February of 2055.
□
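The calculation in Example 10 can be repeated in a few lines of Python. This is a sketch under our own naming (`co2`, `g`, `bisect_to_width`); it stops when the bracket is narrower than a chosen width instead of after a fixed number of rows, and it keeps full floating-point precision rather than rounding to 4 decimal places at each step.

```python
import math

def co2(x):
    """Modelled CO2 concentration in ppm, x months after the model's start date."""
    return 0.122463 * x + 329.253 + 3 * math.cos(math.pi * x / 6)

def g(x):
    # Solving co2(x) = 450 is the same as finding a zero of g.
    return co2(x) - 450

def bisect_to_width(func, a, b, width):
    """Bisect [a, b] until it is no wider than `width`; assumes a sign change."""
    fa = func(a)
    while b - a > width:
        m = (a + b) / 2
        fm = func(m)
        if fa * fm > 0:
            a, fa = m, fm    # root is in the right half
        else:
            b = m            # root is in the left half
    return a, b

a, b = bisect_to_width(g, 965, 972, 0.005)
print(round(a, 2), round(b, 2))
```

When both endpoints round to the same two-decimal value, that value is the prediction accurate to two decimals.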
Problem Set 2.3

Determine the limits limx→a− f (x), limx→a+ f (x) and limx→a f (x) in Problems 1 to 6. If they do not exist, discuss
why.
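1. $f(x) = \begin{cases} x^2 - 2 & \text{if } x > 1 \\ 2x - 3 & \text{if } x \leq 1 \end{cases}$ with $a = 1$.

2. $f(x) = \begin{cases} 3x + 2 & \text{if } x \leq 1 \\ 5 & \text{if } 1 < x \leq 3 \\ 3x^2 - 1 & \text{if } x > 3 \end{cases}$ with $a = 1$.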

3. f (x) = x/|x| with a = 0.

4. f (x) = x2 /|x| with a = 0.

5. f (x) defined by the graph with a = 1. (See Problem 5, Section 2.2.)

[Figure: graph of y = f(x) for x from −2 to 2]
6. f (x) defined by the graph with a = 0. (See Problem 7, Section 2.2.)
[Figure: graph of y = f(x) for x from −1 to 1]
Find the limits in Problems 7 to 14. Justify each step with limit laws and the appropriate results from this section.
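7. $\lim_{x \to 3} \dfrac{x^2 + 3x - 10}{3x^2 + 5x - 7}$
8. $\lim_{x \to 0} \dfrac{(x + 1)^2 - 1}{x}$
9. $\lim_{t \to -3} \dfrac{t^2 + 5t + 6}{t + 3}$
10. $\lim_{t \to 0} \dfrac{\sqrt{4 - t^2} - 2}{t^2}$
11. $\lim_{s \to 3} \left( \dfrac{s - 2}{s + 2} + \sin s \right)$
12. $\lim_{s \to 1} \left( s + \sin(\ln s) \right)$
13. $\lim_{x \to \pi} \dfrac{1 + \tan x}{2 - \cos x}$
14. $\lim_{x \to 1/3} \dfrac{x \sin(\pi x)}{1 + \cos(\pi x)}$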
The graph of a function f is shown in Problems 15 to 18. Determine at what points f is not continuous and whether
f can be redefined at these points to make it continuous. Explain briefly.

15. [Figure: graph of y = f(x) for x from −2 to 2]
16. [Figure: graph of y = f(x) for x from −2 to 2]
17. [Figure: graph of y = f(x) for x from −2 to 2]

18. [Figure: graph of y = f(x) for x from −1 to 1]
Each function in Problems 19 to 22 is defined for all x > 0, except at x = 2. In each case, find the value that should
be assigned to f (2), if any, to guarantee that f will be continuous at 2. Explain briefly.
19. $f(x) = \dfrac{x^2 - x - 2}{x - 2}$
20. $f(x) = \sqrt{\dfrac{x^2 - 4}{x - 2}}$

21. $f(x) = \begin{cases} 2x + 5 & \text{if } x > 2 \\ 15 - x^2 & \text{if } x < 2 \end{cases}$
22. $f(x) = \dfrac{\frac{1}{x} - 1}{x - 2}$
In Problems 23 to 28, use the intermediate value theorem to prove the following equations have at least one solution.
23. $-x^7 + x^2 + 4 = 0$
24. $\dfrac{1}{x + 1} = x^2 - x - 1$
25. $\sqrt[3]{x} = x^2 + 2x - 1$
√
26. 3 x − 8 + 9x2/3 = 29
27. x2x = π
28. $1 + \sin x + x^3 = 0$
29. Use the bisection method explicated in Example 9 to find the following roots of the polynomial
\[ x^5 - 10x^3 + 21x + 4 = 0 \]
to an accuracy of 3 decimal places.
a. The smallest root.

b. The second smallest root.
c. The largest negative root.
d. The small positive root.
30. Prove that if $p$ and $q$ are polynomial functions with $q(a) \neq 0$, then
\[ \lim_{x \to a} \frac{p(x)}{q(x)} = \frac{p(a)}{q(a)} \]

31. Prove if f and g are continuous at x = a, then f + g is continuous at x = a.

32. Prove if f and g are continuous at x = a, then f − g is continuous at x = a.
33. Prove if $f$ and $g$ are continuous at $x = a$ and $g(a) \neq 0$, then $\frac{f}{g}$ is continuous at $x = a$.
34. Prove if f is continuous at x = a and g is continuous at x = f (a), then g ◦ f is continuous at x = a.

35. Why does the cubic equation $x^3 + ax^2 + bx + c = 0$ have at least one root for any values of $a$, $b$, and $c$?
36. For any constant $c$, why does the polynomial equation $x^n = c$ always have at least one real root when $n$ is an odd integer but not necessarily when $n$ is an even integer?
37. Consider an organism that can move freely between two spatial locations. In one location, call it patch 1, the
number of progeny produced per individual is
\[ f(N) = \frac{100}{1 + N} \]
where N is the population size. In the other location, call it patch 2, the number of progeny produced per
individual is always 5. Assume that all individuals in the population move to the patch that allows them to
produce the greatest number of progeny. Let g(N ) represent the number of progeny produced per individual
for such a population.
a. Find an explicit expression for g(N ) by setting g(N ) = f (N ) whenever f (N ) > 5 and setting
g(N ) = 5 whenever f (N ) < 5.
b. Determine how the expression for g(N ) should be defined when f (N ) = 5 to ensure that g(N ) is
continuous.
38. As discussed in Example 10, recent research suggests that stabilizing carbon dioxide concentrations in the
atmosphere at 450 parts per million (ppm) could limit global warming to 2◦ C. Use the bisection method and
the carbon dioxide concentration model
\[ f(x) = 0.122463x + 329.253 + 3 \cos\left( \frac{\pi x}{6} \right) \text{ ppm,} \]
where x is months after December 1973, to find the second time that this model predicts carbon dioxide levels
of 450 ppm. Get a prediction that is accurate to two decimals.
39. Use the bisection method to find the last time that the model in Problem 38 predicts carbon dioxide levels of
450 ppm. Get a prediction that is accurate to two decimals.
40. Scientists believe that it will be extremely difficult to rein in carbon emissions enough to stabilize the atmo-
spheric CO2 concentration at 450 parts per million, as discussed in Example 10, and think that even 550 ppm
will be a challenge. Use the bisection method to find the first time that the model in Problem 38 predicts
carbon dioxide levels of 550 ppm. Get a prediction that is accurate to two decimals.
41. Fisheries scientists often use data to establish a stock-recruitment relationship of the general form $y = f(x)$, where $x$ is the number of adult fish participating in the spawning process (i.e. the laying and fertilizing of eggs) that occurs on a seasonal basis each year and $y$ is the number of young fish recruited to the fishery as a result of hatching from the eggs and surviving through to the life stage at which they become part of the fishery (i.e. available for harvesting). Two fisheries scientists∗ found that the following stock-recruitment function provides a good fit to data pertaining to the Southeast Alaska pink salmon fishery:
\[ y = 0.12\, x^{1.5} e^{-0.00014x}. \]
Use the bisection method to find the spawning stock level $x$ that is expected to recruit 10,000 individuals to the fishery. (Hint: for the value of $y$ in question, find the root of the equation $y - f(x) = 0$.)
∗ T. J. Quinn and R. B. Deriso, 1997. Quantitative Fish Dynamics. Oxford UP.

42. Use the bisection method to find the spawning stock level x that is expected to recruit 20,000 individuals to
the fishery modeled by the stock-recruitment function given in Problem 41.
43. Use the bisection method to find the spawning stock level x that is expected to recruit 5,000 individuals to the
fishery modeled by the stock-recruitment function given in Problem 41.

2.4 To Infinity and Beyond

In Chapter 1, we introduced the notion of infinity and represented it with the symbols ∞ and −∞. The symbol ∞ was used by the Romans to represent the number 1,000 (a BIG number to them). It was not until 1650, however, that it was first used by John Wallis (1616-1703) to represent an uncountably large number. Many of us still cling to the idea from childhood that infinity represents endlessness, but to the mathematician, infinity is not a number but a tool to represent a sophisticated and complex idea. As the famous mathematician David Hilbert (1862-1943)
said, “The infinite! No other question has ever moved so profoundly the spirit of man.”∗ In this section, we tackle
limits involving the infinite in two ways. First, we determine under what conditions functions approach a limiting
value as their argument becomes arbitrarily large (positive or negative). Second, we study functions that take on
arbitrarily large values as you approach a certain value where the function is not defined.
Horizontal asymptotes
To understand the behavior of functions as their argument becomes more positive or negative (i.e. further from the
origin in either direction), we introduce horizontal asymptotes.
Horizontal Asymptotes (Informal Definition): Let $f$ be a function. We write
\[ \lim_{x \to \infty} f(x) = L \]
if $f(x)$ can be made arbitrarily close to $L$ for all $x$ sufficiently large. We write
\[ \lim_{x \to -\infty} f(x) = L \]
if $f(x)$ can be made arbitrarily close to $L$ for all $x$ sufficiently negative. Whenever one of these limits occurs, we say that $f(x)$ has a horizontal asymptote at $y = L$.
Example 1. Finding horizontal asymptotes
Find the following limits involving a given function, f . In each, indicate how positive or negative x needs to be
to ensure that f (x) is within one ten-millionth of the limiting value, L.

a. $\lim_{x \to \infty} \left( 2 + \dfrac{1}{x} \right)$
b. $\lim_{x \to -\infty} e^x$
c. $\lim_{x \to \infty} \dfrac{10x}{1 + 5x}$
Solution.
a. For $x$ sufficiently large, $\frac{1}{x}$ is arbitrarily close to 0. Hence, we would expect
\[ \lim_{x \to \infty} \left( 2 + \frac{1}{x} \right) = 2 \]
To say that $f(x)$ is within one ten-millionth of the limiting value, $L$, is to say that
\[ |f(x) - L| < \frac{1}{10{,}000{,}000}. \]
∗ J. R. Newman (ed.), The World of Mathematics, New York: Simon and Schuster, 1956, p. 1593.

Note that another way of writing one ten-millionth is to use the powers of ten notation $10^{-7}$. For positive $x$, we need
\begin{align*}
\left| 2 + \frac{1}{x} - 2 \right| &< \frac{1}{10{,}000{,}000} \\
\frac{1}{x} &< \frac{1}{10{,}000{,}000} \\
x &> 10{,}000{,}000
\end{align*}
Thus, if $x$ is greater than 10,000,000, then $f(x)$ will be within one ten-millionth of two.
b. For $x$ sufficiently negative, $e^x$ is arbitrarily small. Hence, we would expect that
\[ \lim_{x \to -\infty} e^x = 0. \]
To see how negative $x$ needs to be to ensure that $e^x$ is less than one ten-millionth, we have
\begin{align*}
|e^x - 0| &< \frac{1}{10{,}000{,}000} \\
e^x &< \frac{1}{10{,}000{,}000} \\
x &< \ln \frac{1}{10{,}000{,}000} \approx -16.12
\end{align*}
Thus, if $x$ is less than $-16.2$, then $f(x)$ will be within one ten-millionth of zero.
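c. To find the limiting value of this function, we can divide the numerator and denominator of $f(x)$ by $x$:
\begin{align*}
\lim_{x \to \infty} \frac{10x}{1 + 5x} &= \lim_{x \to \infty} \frac{10}{\frac{1}{x} + 5} \\
&= \frac{10}{5} && \text{Since $1/x$ approaches 0 as $x$ gets large} \\
&= 2
\end{align*}
In order to be within one ten-millionth of the limiting value of 2, we need
\begin{align*}
\left| \frac{10x}{1 + 5x} - 2 \right| &< \frac{1}{10{,}000{,}000} \\
\left| \frac{10x}{1 + 5x} - \frac{2 + 10x}{1 + 5x} \right| &< \frac{1}{10{,}000{,}000} \\
\left| \frac{-2}{1 + 5x} \right| &< \frac{1}{10{,}000{,}000} \\
\frac{2}{1 + 5x} &< \frac{1}{10{,}000{,}000} && \text{Since $x > 0$} \\
20{,}000{,}000 &< 1 + 5x \\
19{,}999{,}999 &< 5x \\
x &> \frac{19{,}999{,}999}{5}
\end{align*}
Thus, if $x$ is greater than about 4,000,000, then $f(x)$ will be within one ten-millionth of two.
□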
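The thresholds found in Example 1 can be checked directly. The sketch below uses Python's `fractions.Fraction` for parts (a) and (c) so that the comparisons with one ten-millionth are exact rather than subject to rounding; the helper names `err_a` and `err_c` are ours. Part (b) involves the exponential function, so it is checked in ordinary floating point.

```python
import math
from fractions import Fraction

TOL = Fraction(1, 10_000_000)   # one ten-millionth

def err_a(x):
    """Distance between 2 + 1/x and its limit 2."""
    return abs(2 + 1 / x - 2)

def err_c(x):
    """Distance between 10x/(1 + 5x) and its limit 2."""
    return abs(10 * x / (1 + 5 * x) - 2)

x_a = Fraction(10_000_000)
x_c = Fraction(19_999_999, 5)

# Exactly at the threshold the error equals TOL; just beyond it, the error is smaller.
print(err_a(x_a) == TOL, err_a(x_a + 1) < TOL)
print(err_c(x_c) == TOL, err_c(x_c + 1) < TOL)

# Part (b): compare e^x with one ten-millionth at x = -16.2.
print(math.log(1e-7), math.exp(-16.2) < 1e-7)
```

The equality at the threshold shows why the inequalities in the solution are strict: x must be strictly greater than the bound, not equal to it.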
Understanding the asymptotic behavior of a function can help us graph and interpret it, as seen in the next
example.

Figure 2.14: Percentage of patients responding to a dosage of Histamine. The x-axis corresponds to the natural logarithm of the dosages in mM∗ .
Example 2. Dose response curves
“Dose-response curves can be used to plot the results of many kinds of experiments. The x-axis represents
concentration of a drug or hormone. The y-axis represents the response, which could be almost anything. For
example, the response might be enzyme activity, accumulation of an intracellular second messenger, membrane
potential, secretion of a hormone, heart rate or contraction of a muscle.”∗ A dose response curve for patients
responding to a dose of Histamine is given by the function∗
\[ R(x) = \frac{100 e^x}{e^x + e^{-5}}, \]
where x is the natural logarithm of the dosage in mmol (millimoles).
a. Find the horizontal asymptotes of R(x).
b. Show that R(x) is increasing and sketch y = R(x).
c. Calculate how large x needs to be to ensure that it is within 0.01 of its asymptotic value.
Solution.
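a. To find the horizontal asymptotes we find
\begin{align*}
\lim_{x \to \infty} \frac{100 e^x}{e^x + e^{-5}} &= \lim_{x \to \infty} \frac{100 e^x}{e^x + e^{-5}} \cdot \frac{e^{-x}}{e^{-x}} \\
&= \lim_{x \to \infty} \frac{100}{1 + e^{-x-5}} \\
&= \frac{100}{1 + 0} && \text{Since $e^{-x}$ approaches zero as $x$ becomes large (in the positive direction)} \\
&= 100
\end{align*}
and
\begin{align*}
\lim_{x \to -\infty} \frac{100 e^x}{e^x + e^{-5}} &= \frac{0}{0 + e^{-5}} && \text{Since $e^x$ approaches zero as $x$ becomes large in the negative direction} \\
&= 0
\end{align*}
Thus, the horizontal asymptotes are $y = 100$ and $y = 0$.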

∗ http://www.curvefit.com/introduction89.htm
∗ K. A. Skau, “Teaching Pharmocodynamics: An introductory module on learning dose-response relationships,” American Journal of
Pharmaceutical Education (2004), 68: Article 73

b. Note that the y-intercept is $R(0) = \dfrac{100 e^0}{e^0 + e^{-5}} \approx 99.3$. Also, since $R(x) = \dfrac{100}{1 + e^{-x-5}}$ and $e^{-x-5}$ is decreasing, the denominator $1 + e^{-x-5}$ is positive and decreasing, so we know that $R(x)$ is increasing. Thus, the graph of the function looks something like what is shown in Figure 2.14. Notice that this curve fits the data fairly well.
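c. To find when $R(x)$ is within 0.01 of 100, notice that $R(x) < 100$ for all $x$. Hence, we only need to solve
\begin{align*}
99.99 &< \frac{100 e^x}{e^x + e^{-5}} \\
99.99 (e^x + e^{-5}) &< 100 e^x \\
99.99 e^{-5} &< 0.01 e^x \\
9999 e^{-5} &< e^x \\
\ln 9999 - 5 &< x \\
4.21 &< x
\end{align*}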
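The inequality solved in part (c) generalizes to any tolerance: requiring 100 − R(x) < tol leads, by the same algebra, to x > ln((100 − tol)/tol) − 5. The Python sketch below evaluates this bound for tolerances of 1 and 0.01 and confirms it against the function itself. The names `R` and `threshold` and the choice of tolerances are ours.

```python
import math

def R(x):
    """Dose-response curve from Example 2 (percentage responding)."""
    return 100 * math.exp(x) / (math.exp(x) + math.exp(-5))

def threshold(tol):
    """Value of x beyond which 100 - R(x) < tol, for 0 < tol < 100."""
    return math.log((100 - tol) / tol) - 5

for tol in [1, 0.01]:
    x0 = threshold(tol)
    # Just past the threshold the gap to the asymptote is below tol.
    print(f"tol = {tol}: x > {x0:.4f}, gap just beyond = {100 - R(x0 + 0.01):.6f}")
```

Because x is the natural logarithm of the dosage, a modest change in the threshold for x corresponds to a large multiplicative change in the dosage itself.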
Vertical asymptotes
Many functions, such as rational functions, logarithms, and certain power functions, are not defined at isolated
values. As the argument of the function gets close to these isolated values, the function may become arbitrarily
positive or negative and exhibit a vertical asymptote.
Vertical Asymptotes (Informal Definition): Let $f$ be a function. We write
\[ \lim_{x \to a^-} f(x) = \infty \]
if $f(x)$ can be made arbitrarily large for all $x$ sufficiently close to $a$ and to the left of $a$. We write
\[ \lim_{x \to a^+} f(x) = \infty \]
if $f(x)$ can be made arbitrarily large for all $x$ sufficiently close to $a$ and to the right of $a$. We write
\[ \lim_{x \to a^-} f(x) = -\infty \]
if $f(x)$ can be made arbitrarily negative for all $x$ sufficiently close to $a$ and to the left of $a$. We write
\[ \lim_{x \to a^+} f(x) = -\infty \]
if $f(x)$ can be made arbitrarily negative for all $x$ sufficiently close to $a$ and to the right of $a$.
Whenever any one of these limits occurs, we say that $f(x)$ has a vertical asymptote at $x = a$.
Example 3. Infinite blow up
Find limx→a− f (x) and limx→a+ f (x) for the given functions, and then sketch the graph of y = f (x) near x = a.
a. $f(x) = \dfrac{1}{x}$ with $a = 0$.
b. $f(x) = \dfrac{1}{(x - 2)^2}$ with $a = 2$.
c. $f(x) = \tan x$ with $a = \dfrac{\pi}{2}$.

Solution.
a. $\lim_{x \to 0^-} f(x) = -\infty$ since for $x < 0$ sufficiently close to 0, $\frac{1}{x}$ is arbitrarily large and negative.
$\lim_{x \to 0^+} f(x) = \infty$ since for $x > 0$ sufficiently near 0, $\frac{1}{x}$ is arbitrarily large and positive.
Since $y = \frac{1}{x}$ is decreasing for all $x \neq 0$, the graph of $y = \frac{1}{x}$ near $x = 0$ is as follows:
b. $\lim_{x \to 2^-} \frac{1}{(x - 2)^2} = \infty$ since for $x < 2$ and sufficiently close to 2, $1/(x - 2)^2$ is large and positive.
$\lim_{x \to 2^+} \frac{1}{(x - 2)^2} = \infty$ since for $x > 2$ and sufficiently close to 2, $1/(x - 2)^2$ is large and positive.∗ The graph of $y = \frac{1}{(x - 2)^2}$ close to the vertical asymptote $x = 2$ is as follows:
c. $\lim_{x \to \frac{\pi}{2}^-} \tan x = \lim_{x \to \frac{\pi}{2}^-} \frac{\sin x}{\cos x} = \infty$ since for $x < \frac{\pi}{2}$ and sufficiently close to $\frac{\pi}{2}$, $\sin x$ is close to 1 and $\cos x$ is positive and close to 0, so the quotient of sine and cosine is large and positive.
$\lim_{x \to \frac{\pi}{2}^+} \frac{\sin x}{\cos x} = -\infty$ since for $x > \frac{\pi}{2}$ and sufficiently close to $\frac{\pi}{2}$, $\sin x$ is close to 1 and $\cos x$ is negative and close to 0, so the quotient of sine and cosine is large and negative. The graph of $y = \tan x$ close to the vertical asymptote $x = \pi/2$ is as follows:
∗ If you want to support this statement, suppose you want $\frac{1}{(x - 2)^2} \geq 1{,}000{,}000$. Taking the square root of both sides and cross multiplying yields $\frac{1}{1000} \geq |x - 2|$. Hence, $f(x) \geq 1{,}000{,}000$ provided $0 < |x - 2| < \frac{1}{1000}$.

Combining the information about horizontal and vertical asymptotes can provide a relatively complete sense of
the graph of a function.
Example 4. Running with wolves
In a paper appearing in Ecology ∗ , Francois Messier examined wolf-moose interactions over a broad spectrum of
moose densities throughout North America. One of his primary objectives was to determine how the killing rate
of moose by wolves depends on the moose density. He found that the Michaelis-Menten function used to describe nutrient uptake in Example 6 of Section 1.6 fit the data rather well. The parametrized function used by Messier is
\[ f(x) = \frac{3.36x}{0.46 + x} \text{ moose killed per wolf per 100 days} \]
where $x$ is measured in number of moose per km$^2$. The following figure illustrates the data plotted against $f(x)$.
In this example, we examine the shape of the function for biologically relevant (i.e. x ≥ 0) as well as biologically
irrelevant (i.e. x < 0) values of x.
a. Find all horizontal and vertical asymptotes for y = f (x). Discuss the biological meaning of the horizontal
asymptote.
b. Sketch the graph of y = f (x) for all x. Discuss the biological meaning of the graph for non-negative x.
c. Relate the graph to the following quotation of Sir Winston Churchill (1874-1965).∗
∗ F. Messier. 1994. Ungulate population models with predation: A case study with the North American moose. Ecology. 75: 478–488
∗ H. Eves, Return to Mathematical Circles, Boston: Prindle, Weber and Schmidt, 1988.

I had a feeling once about Mathematics—that I saw it all. Depth beyond depth was revealed
to me—the Byss and Abyss. I saw—as one might see the transit of Venus or even the Lord
Mayor’s Show— a quantity passing through infinity and changing its sign from plus to minus. I
saw exactly why it happened and why the tergiversation was inevitable but it was after dinner
and I let it go.
Solution.
a. First, let us find the horizontal asymptotes.
\begin{align*}
\lim_{x \to \infty} \frac{3.36x}{0.46 + x} &= \lim_{x \to \infty} \frac{3.36x}{0.46 + x} \cdot \frac{1/x}{1/x} \\
&= \lim_{x \to \infty} \frac{3.36}{0.46/x + 1} \\
&= \frac{3.36}{0 + 1} \\
&= 3.36
\end{align*}
Thus, $f(x)$ has a horizontal asymptote $y = 3.36$. That $f(x)$ approaches this value as $x$ approaches ∞ means that when the moose density is very large, the wolf killing rate stabilizes around 3.36 moose per wolf per 100 days. Similarly (without the corresponding biological meaning for $x < 0$), we obtain
\[ \lim_{x \to -\infty} \frac{3.36x}{0.46 + x} = 3.36 \]
Next, let us find the vertical asymptotes. Since $f(x)$ is not defined at $x = -0.46$, there is a possible vertical asymptote at $x = -0.46$. We have $\lim_{x \to -0.46^-} f(x) = \infty$ because when $x < -0.46$, but close to $-0.46$, $x$ is negative and $0.46 + x$ is arbitrarily small and also negative. On the other hand, we have $\lim_{x \to -0.46^+} f(x) = -\infty$ because when $x > -0.46$, but close to $-0.46$, $x$ is negative and $0.46 + x$ is arbitrarily small and positive. Note that, from a biological point of view, this function is only meaningful for $x \geq 0$.
b. We begin by drawing the asymptotes: $y = 3.36$ and $x = -0.46$. The y-intercept is found at $x = 0$ as $f(0) = 0$. Our observation that $\frac{3.36x}{0.46 + x} = \frac{3.36}{0.46/x + 1}$ for $x \neq 0$ implies that this function is increasing on each of the intervals $(-\infty, -0.46)$ and $(-0.46, \infty)$. Using this information, we draw the graph shown below:
[Figure: graph of y = f(x) for x from −10 to 10]
Looking at the non-negative portion of this graph, we see that the killing rate increases and saturates at approximately 3.36 moose per wolf per 100 days as the density of moose increases.
c. As viewed from left-to-right, the function passes from positive infinity to negative infinity as it passes
through the value x = −0.46, which is Churchill’s “quantity passing through infinity changing its sign
from plus to minus.” Perhaps, Churchill saw a wolf after his dinner and that is why he let it go.
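The asymptotic behavior found in Example 4 can be seen in a few function evaluations. The following Python sketch is only a numerical illustration; the function name `kill_rate` and the sample points are our own choices.

```python
def kill_rate(x):
    """Messier's fitted function: moose killed per wolf per 100 days."""
    return 3.36 * x / (0.46 + x)

# Horizontal asymptote: values level off near 3.36 as x grows.
for x in [1, 10, 100, 1000]:
    print(f"x = {x:5d}   f(x) = {kill_rate(x):.4f}")

# Vertical asymptote at x = -0.46: opposite signs on the two sides.
for h in [1e-1, 1e-3, 1e-5]:
    left, right = kill_rate(-0.46 - h), kill_rate(-0.46 + h)
    print(f"h = {h:.0e}   just left: {left:12.1f}   just right: {right:12.1f}")
```

The second loop shows the sign change from plus to minus as x crosses −0.46, with magnitudes growing as h shrinks.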

Infinite limits at infinity

As x gets larger and larger without bound, the value of f might also get larger and larger without bound. In such a
case, it is natural to say that f (x) approaches infinity as x approaches infinity.
Infinity at Infinity (Informal Definition): Let $f$ be a function. We write
\[ \lim_{x \to \infty} f(x) = \infty \]
if $f(x)$ can be made arbitrarily large for all $x$ sufficiently large. We write
\[ \lim_{x \to -\infty} f(x) = \infty \]
if $f(x)$ can be made arbitrarily large for all $x$ sufficiently negative. We write
\[ \lim_{x \to \infty} f(x) = -\infty \]
if $f(x)$ can be made arbitrarily negative for all $x$ sufficiently large. We write
\[ \lim_{x \to -\infty} f(x) = -\infty \]
if $f(x)$ can be made arbitrarily negative for all $x$ sufficiently negative.
Example 5. Limits to infinity
Find the following limits
a. $\lim_{x \to \infty} x^2$
b. $\lim_{x \to \infty} (x - x^2)$
c. $\lim_{x \to \infty} \dfrac{x^2}{1{,}000{,}000 + 10x}$
Solution.
a. The number $x^2$ can be made arbitrarily large for all sufficiently large $x$, so we say $\lim_{x \to \infty} x^2 = \infty$.
b. It is tempting to use a limit law here and write
\begin{align*}
\lim_{x \to \infty} (x - x^2) &= \lim_{x \to \infty} x - \lim_{x \to \infty} x^2 \\
&= \infty - \infty \\
&= 0
\end{align*}
However, this is incorrect! Limit laws do not apply to infinite limits. Indeed, $\infty - \infty$ is not a meaningful statement as ∞ is not a real number. Luckily, we can deal with this by noticing that $x - x^2 = x(1 - x)$ is the product of two numbers such that for large $x$ one of these numbers is large and positive and the other has large absolute value but is negative. Thus, for sufficiently large $x$, $x(1 - x)$ can be made arbitrarily negative. Hence, $\lim_{x \to \infty} (x - x^2) = -\infty$.

c. Again it is tempting to use a limit law to conclude the limit is $\frac{\infty}{\infty}$. This is meaningless. However, if we divide the numerator and denominator by $x$, we find (for $x \neq 0$)
\[ \frac{x^2}{1{,}000{,}000 + 10x} = \frac{x}{\frac{1{,}000{,}000}{x} + 10} \]
Since $1{,}000{,}000/x + 10$ approaches $0 + 10 = 10$ as $x$ approaches ∞, we find $\frac{x^2}{1{,}000{,}000 + 10x} \approx \frac{x}{10}$ for $x$ sufficiently large. Therefore,
\[ \lim_{x \to \infty} \frac{x^2}{1{,}000{,}000 + 10x} = \infty \]
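Part (c) is a case where small inputs are misleading: the large constant in the denominator keeps the quotient small until x is well past 1,000,000. The Python sketch below tabulates the function alongside its ratio to x/10; the sample points are our own choice.

```python
def f(x):
    return x**2 / (1_000_000 + 10 * x)

for x in [10**3, 10**6, 10**9, 10**12]:
    # The ratio f(x) / (x/10) should approach 1 as x grows.
    print(f"x = {x:.0e}   f(x) = {f(x):.6g}   f(x)/(x/10) = {f(x) / (x / 10):.8f}")
```

The ratio column approaching 1 is the numerical counterpart of the statement that the quotient behaves like x/10 for large x, and hence grows without bound.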
Example 6. Unabated population growth
In Section 1.5, we modeled population growth in the United States with the function
\[ f(x) = 8.3(1.33)^x \text{ millions} \]
where x represents the number of decades after 1815.
a. Find limx→∞ f (x).
b. Determine how large x has to be to ensure that f (x) is greater than 300, 000, 000. Discuss how your
answer relates to the current U.S. population size.
Solution.
a. Since $8.3(1.33)^x$ gets arbitrarily large for large $x$, we have that $\lim_{x \to \infty} 8.3(1.33)^x = \infty$.
b. Since $f(x)$ is measured in millions, a population of 300,000,000 corresponds to $f(x) \geq 300$. Solving for $x$ in this inequality yields
\begin{align*}
8.3(1.33)^x &\geq 300 \\
1.33^x &\geq \frac{300}{8.3} \approx 36.145 \\
x \ln 1.33 &\geq \ln\left( \frac{300}{8.3} \right) \\
x &\geq \frac{3.59}{\ln 1.33} \approx 12.6
\end{align*}
Therefore the model predicts that about 12.6 decades after 1815, in other words around the year 1941, there will be approximately 300 million people in the U.S. Given that the population size only passed 300 million shortly before January 2007, we can see that the model from the 1800s considerably overestimated the future growth of the U.S. population.
Problem Set 2.4

In Problems 1 to 17, find the specified limits.
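1. $\lim_{x \to -\infty} e^x$

2. $\lim_{x \to 0^+} \ln x$
3. $\lim_{x \to 2^+} \dfrac{1}{x - 2}$
4. $\lim_{x \to 2^-} \dfrac{1}{x - 2}$
5. $\lim_{x \to 3^-} \left( 3 + \dfrac{2x}{x - 3} \right)$
6. $\lim_{x \to 3^+} \left( 3 + \dfrac{2x}{x - 3} \right)$
7. $\lim_{x \to 1^-} \dfrac{x - 1}{|x^2 - 1|}$
8. $\lim_{x \to 3^+} \dfrac{x^2 - 4x + 3}{x^2 - 6x + 9}$
9. $\lim_{x \to \infty} \dfrac{x^3}{1 + x^3}$
10. $\lim_{x \to -\infty} \dfrac{x^3}{1 + x^3}$
11. $\lim_{x \to \infty} \dfrac{(2x + 5)(x - 2)}{(7x - 2)(3x + 1)}$
12. $\lim_{x \to \infty} \dfrac{2x^2 - 5x + 7}{x^2 - 9}$
13. $\lim_{Q \to \infty} \dfrac{aQ^2 + Q}{1 - Q^2}$ where $a$ is a constant.
14. $\lim_{x \to \infty} \dfrac{Ae^x + 3}{Be^{2x} + 4}$ where $A > 0$ and $B > 0$ are constants.
15. $\lim_{x \to -\infty} \dfrac{1 + ax + 3x^3}{1 + 5x - 5x^3}$ where $a$ is a constant.
16. $\lim_{x \to \infty} \dfrac{1 + 5e^{ax}}{7 + 2e^{ax}}$ where $a > 0$ is a constant.
17. $\lim_{x \to \infty} \dfrac{1 + 5e^{ax}}{7 + 2e^{ax}}$ where $a < 0$ is a constant.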
For limx→a+ f (x) in Problems 18 to 23, determine how close x > a needs to be to a to ensure that f (x) ≥ 1, 000, 000
1
18. limx→2+ x−2
19. limx→0+ ln x1
1
20. limx→1− 1−x
1
21. limx→3+ (x−3)2
−1
22. limx→1− sin x
1
23. limx→1− ln x−1
For limx→−∞ f (x) = L in Problems 24 to 27, determine how negative x needs to be to ensure that |f (x) − L| ≤ 0.05.
1
24. limx→−∞ x2 =0
25. limx→−∞ (ex + 5 = 5)
x
26. limx→−∞ 1+x =1
1
27. limx→−∞ ln x2 =0
For the limit limx→∞ f (x) = ∞ in Problems 28 to 31, determine how large x needs to be to ensure that f (x) >
1, 000, 000.

28. limx→∞ x2
29. limx→∞ (ex + 5)
x2
30. limx→∞ 1+x
31. limx→∞ ln x
32. In Example 6, we showed that $\lim_{x\to\infty} f(x) = \infty$ where $f(x) = 8.3(1.33)^x$ represents US population size in
millions $x$ decades after 1815. To see that $f(x)$ can get arbitrarily large for $x$ sufficiently large, do the following:
a. Determine how large $x$ needs to be to ensure that $f(x) \geq 500,000,000$.
b. Determine how large $x$ needs to be to ensure that $f(x) \geq 1,000,000,000$.
33. In Example 3 from Section 1.5, we modeled the height of Erdinger Weissbier froth with the function $H(t) =
17(0.99588)^t$ cm where $t$ is measured in seconds.
a. Determine $L$ such that $\lim_{t\to\infty} H(t) = L$.
b. Determine how large $t$ needs to be to ensure that $H(t)$ is within 0.1 of $L$.
c. Determine how large $t$ needs to be to ensure that $H(t)$ is within 0.01 of $L$.
34. In Example 6 from Section 1.6, we modeled the uptake rate of glucose by bacterial populations with the function
\[ f(x) = \frac{1.2708x}{1 + 0.0506x} \text{ mg per hour} \]
where $x$ is measured in mg per liter.
a. Find the horizontal and vertical asymptotes of $f(x)$. Interpret the horizontal asymptote(s).
b. Graph $f(x)$ for all values of $x$.
35. In Example 4, we examined how the killing rate of wolves depended on the moose density. Dr. Messier also
studied how wolf densities in North America depend on moose densities. He found that the following function
provides a good fit to the data:
\[ f(x) = \frac{58.7(x - 0.03)}{0.76 + x} \text{ wolves per 1000 km}^2 \]
where $x$ is the number of moose per km$^2$. This function and the data are shown below:
a. Find the horizontal and vertical asymptotes of $f(x)$. Interpret the horizontal asymptotes.
b. Graph $f(x)$ for all values of $x$.
36. In Problem 35, you were asked to find $L$ such that $\lim_{x\to\infty} f(x) = L$.
a. Determine how large $x$ needs to be to ensure that $f(x)$ is within 0.1 of $L$.
b. Determine how large $x$ needs to be to ensure that $f(x)$ is within 0.01 of $L$.
37. The von Bertalanffy growth curve is used to describe how the size $L$ (usually in terms of length) of an animal
changes with time. The curve is given by
\[ L(t) = a\left(1 - e^{-b(t - t_0)}\right) \]
where $t$ measures time after birth and $a$, $b$, and $t_0$ are positive parameters. We will derive this curve in Chapter
6. To better understand the meaning of the parameters $t_0$ and $b$, carry out these steps.
a. Evaluate $L(t_0)$. What does this imply about the meaning of $t_0$?
b. Find $\lim_{t\to\infty} L(t)$. What does this limit say about the biological meaning of $a$?
c. Graph $L(t)$ and discuss how an organism grows according to this curve.
38. At the beginning of the 20th century, several notable biologists including G. F. Gause and T. Carlson studied
the population dynamics of yeast. For example, T. Carlson grew yeast under constant environmental conditions
in a flask. He regularly monitored their population densities.∗ In Chapter 6, we will show that the following
function describes the growth of the population:
\[ N(t) = \frac{9.7417e^{0.53t}}{1 + 0.01476e^{0.53t}} \]
where $N$ is the population density and $t$ is time in hours. Find $\lim_{t\to\infty} N(t)$ and discuss the meaning of this
limit. Show that this function is logistic; that is, find constants $a$, $b$, and $c$ such that the function has the
logistic form defined in this section.
39. The following equation is used to calculate the average firing rate $f$ of a neuron (in spikes per second) as a
function of the concentration $x$ of neurotransmitters perfusing its synapses.
\[ f(x) = \frac{20e^{3x}}{2.1 + e^{3x}} \]
Find the horizontal asymptote and then find the values of $x$ such that $f(x)$ is within 0.5% of its asymptotic
values.
40. The following equation is used to calculate the average firing rate $f$ of a neuron (in spikes per second) as a
function of the concentration $x$ of neurotransmitters perfusing its synapses.
\[ f(x) = \frac{16e^{5x}}{3.2 + e^{5x}} \]
Find the values of $x$ such that $f(x)$ is within 0.5% of its asymptotic values.
41. Compare the solutions obtained to Problems 39 and 40 above and decide which of these represents a tighter
on-off switch of the neuron from being inactive to firing at its maximum rate. What do you conclude in
terms of which of the parameters $a$, $b$, and $c$ in the function
\[ f(x) = \frac{ae^{cx}}{b + e^{cx}} \]
controls the narrowness of the range of $x$ over which on-off switching occurs? Note that this function is called
the logistic function and will be encountered in many different examples in the upcoming chapters.
∗ Über Geschwindigkeit und Größe der Hefevermehrung in Würze. Biochem. Z.57: 313-334, 1913

2.5 Sequential Limits

In Section 1.7, we considered sequences $a_1, a_2, \ldots$ of real numbers, which can be used to model drug concentrations,
population dynamics, and population genetics. In some cases, these sequences converged to a limiting value as $n$ got
very large. In this section, we study the limits of sequences, their relationship to continuity, a convergence theorem,
and how these concepts can be used to understand the asymptotic behavior of difference equations. While limits of
functions form the basis of differentiation, as we shall soon see, limits of sequences form the basis of integration, as
we discuss in Chapter 5.
Sequential Limits and Continuity
For sequences, there is only one type of limit to consider: the sequential limit, defined as the limiting value of $a_n$
as $n \to \infty$.
Sequential limits (Informal definition)
Let $a_1, a_2, a_3, \ldots$ be a sequence. We write
\[ \lim_{n\to\infty} a_n = L \]
provided that we can make $a_n$ arbitrarily close to $L$ for all $n$ sufficiently large. In
this case, we say the sequence converges to $L$.
We write
\[ \lim_{n\to\infty} a_n = \infty \]
provided that we can make $a_n$ arbitrarily positive for all $n$ sufficiently large.
We write
\[ \lim_{n\to\infty} a_n = -\infty \]
provided that we can make $a_n$ arbitrarily negative for all $n$ sufficiently large.
Example 1. Finding sequential limits
In each of the following, if it exists, calculate $\lim_{n\to\infty} a_n$ where
a. $a_n = 2^n$.
b. $a_n = 1 + \frac{1}{n}$.
c. $a_n = \frac{2n^2+3n-1}{5n^2-n+8}$.
d. $a_n = \cos\frac{n\pi}{2}$.
e. $a_n = \frac{1}{n}\cos\frac{n\pi}{2}$.
Solution.
a. Since $2^n$ gets arbitrarily positive as $n$ gets very positive, $\lim_{n\to\infty} 2^n = \infty$.
b. Since $\frac{1}{n}$ approaches zero as $n$ gets very positive, $\lim_{n\to\infty} \left(1 + \frac{1}{n}\right) = 1$.
c. Since the numerator and denominator are polynomials in $n$, we divide the numerator and denominator
by the term with the largest exponent, namely $n^2$.
\[
\begin{aligned}
\lim_{n\to\infty} \frac{2n^2 + 3n - 1}{5n^2 - n + 8}
&= \lim_{n\to\infty} \frac{2n^2 + 3n - 1}{5n^2 - n + 8} \cdot \frac{1/n^2}{1/n^2} \\
&= \lim_{n\to\infty} \frac{2 + \frac{3}{n} - \frac{1}{n^2}}{5 - \frac{1}{n} + \frac{8}{n^2}} \\
&= \frac{2}{5}
\end{aligned}
\]
d. Since $\cos\frac{n\pi}{2}$ alternates between the values 0, 1 and −1, this sequence does not have a limit. There is no
unique value that the sequence approaches.
e. Since $\left|\frac{1}{n}\cos\frac{n\pi}{2}\right| \leq \frac{1}{n}$ and we can make $\frac{1}{n}$ arbitrarily close to 0 for $n$ sufficiently large,
\[ \lim_{n\to\infty} \frac{1}{n}\cos\frac{n\pi}{2} = 0 \]
Graphing this sequence confirms this convergence to zero.
[Figure: plot of $a_n = \frac{1}{n}\cos\frac{n\pi}{2}$ against $n$ for $n$ up to 40.]
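The limits in parts b to e can be explored numerically. The following Python sketch tabulates the sequences at a few increasingly large indices; the function names `b`, `c`, `d`, `e` and the sample indices are illustrative choices only.

```python
import math


def b(n):
    return 1 + 1 / n


def c(n):
    return (2 * n**2 + 3 * n - 1) / (5 * n**2 - n + 8)


def d(n):
    return math.cos(n * math.pi / 2)


def e(n):
    return d(n) / n


# Parts b, c, and e settle down as n grows; part d keeps cycling.
for n in (10, 100, 1000, 10**6):
    print(n, b(n), c(n), e(n))
print([round(d(n)) for n in range(1, 9)])
```

A table like this only suggests a limit; the arguments in the solution are what establish it.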
As in our previous limit definitions, the existence of a sequential limit implies that we can make $a_n$ as close to $L$
as we like, provided that $n$ is sufficiently large. But how do we verify this statement? What is meant by sufficiently
large? The following example illustrates the answer to this question.
Example 2. Finding sufficiently large n
Consider $a_n = \frac{n}{2+n}$.
a. Find $L = \lim_{n\to\infty} a_n$.
b. Determine how large $n$ needs to be to ensure that $|a_n - L| < 0.002$.
Solution.
a. Dividing the numerator and denominator by $n$ yields
\[ \lim_{n\to\infty} \frac{n}{2+n} = \lim_{n\to\infty} \frac{1}{2/n + 1} = 1 \]
Hence, $L = 1$.
b. We have that
\[
\begin{aligned}
\left|\frac{n}{2+n} - 1\right| &< 0.002 && \text{This is } |a_n - L| < 0.002. \\
\left|\frac{n}{2+n} - \frac{2+n}{2+n}\right| &< 0.002 \\
\left|\frac{-2}{2+n}\right| &< 0.002 \\
\frac{2}{2+n} &< 0.002 && \text{Absolute value of a negative divided by a positive number.} \\
\frac{2}{0.002} &< 2+n && \text{Multiply both sides by } 2+n \text{, and divide both sides by } 0.002. \\
998 &< n && \text{Simplify and subtract 2 from both sides.}
\end{aligned}
\]
The number $n$ must be greater than 998.
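The threshold found in part b can be confirmed by a direct search. The sketch below uses exact rational arithmetic so that the boundary case $n = 998$ is not blurred by floating-point rounding; the helper name `first_index_within` is an illustrative choice.

```python
from fractions import Fraction


def a(n):
    """The sequence a_n = n / (2 + n), kept as an exact fraction."""
    return Fraction(n, 2 + n)


def first_index_within(tolerance, limit=1):
    """Smallest n with |a_n - limit| < tolerance, found by stepping n upward."""
    n = 1
    while abs(a(n) - limit) >= tolerance:
        n += 1
    return n


print(first_index_within(Fraction(2, 1000)))  # prints 999
```

Because $|a_n - 1| = \frac{2}{2+n}$ decreases as $n$ grows, every index past the first one that works also works, which is why a single search suffices here.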
There is a wonderful relationship between limits of sequences and limits of functions. This relationship is most
useful for proving discontinuity of a function.
Theorem 2.3. Sequential continuity
Let $f$ be a function. Then $\lim_{x\to a} f(x) = L$ if and only if $\lim_{n\to\infty} f(a_n) = L$ for any sequence satisfying
$\lim_{n\to\infty} a_n = a$.
One direction of this theorem is clear. If $\lim_{n\to\infty} a_n = a$, then $a_n$ can be made arbitrarily close to $a$ for $n$
sufficiently large. Therefore, if $\lim_{x\to a} f(x) = L$, then $f(a_n)$ is arbitrarily close to $L$ for $n$ sufficiently large. Hence,
if $\lim_{x\to a} f(x) = L$, then $\lim_{n\to\infty} f(a_n) = L$. If you are feeling sufficiently adventuresome, try proving this direction
using formal definitions of limits. To do so, you will have to come up with a formal definition of sequential limits.
The other direction of the sequential continuity theorem is significantly more subtle and the ideas of the proof are
beyond the scope of this text.
Example 3. Proving nonexistence of limits
Show that $\lim_{x\to 0} \sin\frac{1}{x}$ does not exist.
Solution. Let $f(x) = \sin\frac{1}{x}$. Our goal here is to find two sequences $a_n$ and $b_n$ that both converge to 0 but for which
the limits of $f(a_n)$ and $f(b_n)$ are not the same, thus contradicting the assumption that the limit of $f(x)$ exists. For this example, we
let $a_n = \frac{1}{\pi n}$ and $b_n = \frac{2}{\pi(4n+1)}$. Then,
\[ \lim_{n\to\infty} \frac{1}{\pi n} = 0 \quad\text{and}\quad \lim_{n\to\infty} \frac{2}{\pi(4n+1)} = 0 \]
We now find the limits of $f(a_n)$ and $f(b_n)$.
\[ \lim_{n\to\infty} f(a_n) = \lim_{n\to\infty} \sin\frac{1}{a_n} = \lim_{n\to\infty} \sin(\pi n) = 0 \]
and
\[ \lim_{n\to\infty} f(b_n) = \lim_{n\to\infty} \sin\frac{1}{b_n} = \lim_{n\to\infty} \sin\frac{\pi(4n+1)}{2} = 1 \]
Since $\lim_{n\to\infty} f(a_n) \neq \lim_{n\to\infty} f(b_n)$, it follows (from the sequential continuity theorem) that $\lim_{x\to 0} \sin\frac{1}{x}$ does
not exist because it cannot be equal to both 0 and 1 at the same time.
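The two sequences used in this argument can be evaluated numerically. In the Python sketch below, the names `f`, `a`, and `b` mirror the symbols in the example, and the sample indices are arbitrary; floating-point rounding means the first column is only approximately zero.

```python
import math


def f(x):
    return math.sin(1 / x)


def a(n):
    return 1 / (math.pi * n)


def b(n):
    return 2 / (math.pi * (4 * n + 1))


# Both a(n) and b(n) shrink toward 0, yet f stays near 0 along one
# sequence and near 1 along the other.
for n in (1, 10, 100, 1000):
    print(n, a(n), f(a(n)), b(n), f(b(n)))
```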

Asymptotic Behavior of Difference Equations

In Section 1.7, when we introduced sequences, we considered a special class of sequences that arise through a difference
equation
\[ a_{n+1} = f(a_n) \]
where $a_1$ is specified and $f$ is a function. In some instances, we can actually find explicit expressions for the sequence
defined by the difference equation and take the limit.
Example 4. Finding the limit of a sequence
Find an explicit expression for each sequence defined by the following difference equations and find the limit as $n$
becomes large.
a. $a_{n+1} = 0.1a_n$ with $a_1 = 0.1$
b. $a_{n+1} = \sqrt{a_n}$ with $a_1 = \sqrt{2}$
Solution.
a. We have $a_1 = 0.1$, $a_2 = 0.1a_1 = 0.1^2$, and $a_3 = 0.1a_2 = 0.1^3$. Hence, we can see inductively that
$a_n = 0.1^n$. Since $a_n$ gets arbitrarily small as $n$ gets sufficiently large, we obtain $\lim_{n\to\infty} a_n = 0$.
b. We have $a_1 = \sqrt{2} = 2^{1/2}$, $a_2 = a_1^{1/2} = 2^{1/4}$, $a_3 = a_2^{1/2} = 2^{1/8}$, $\ldots$, $a_n = a_{n-1}^{1/2} = 2^{1/2^n}$. To find this limit,
consider the logarithm of this sequence, that is,
\[ \ln a_n = \frac{\ln 2}{2^n} \]
Clearly, $\lim_{n\to\infty} \ln a_n = 0$. Thus, by the continuity of $e^x$ and the sequential continuity theorem, we get
that
\[ \lim_{n\to\infty} a_n = \lim_{n\to\infty} e^{\ln a_n} = e^{\lim_{n\to\infty} \ln a_n} = e^0 = 1 \]
In part b of Example 4, we saw that sometimes it is useful to find $\lim_{n\to\infty} a_n$ by finding $\lim_{n\to\infty} f(a_n)$ for an
appropriate choice of a continuous one-to-one function $f$. In the problem set, you will find more problems of this
type!
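Iterating a difference equation by machine is a quick way to compare the generated terms with a conjectured explicit formula. The sketch below does this for both parts of Example 4; the helper `iterate` and the choice of ten terms are illustrative assumptions.

```python
import math


def iterate(f, a1, count):
    """Return [a_1, ..., a_count] for the difference equation a_{n+1} = f(a_n)."""
    terms = [a1]
    for _ in range(count - 1):
        terms.append(f(terms[-1]))
    return terms


tenths = iterate(lambda a: 0.1 * a, 0.1, 10)   # part a
roots = iterate(math.sqrt, math.sqrt(2), 10)   # part b

# Compare each iterate with the explicit formulas 0.1^n and 2^(1/2^n).
for n, (s, t) in enumerate(zip(tenths, roots), start=1):
    print(n, s, 0.1**n, t, 2 ** (1 / 2**n))
```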
Example 5. Lethal recessives revisited
In Example 4 in Section 1.7, we modeled the frequency $a_n$ of a lethal recessive allele in a population at time $n$
with the difference equation:
\[ a_{n+1} = \frac{a_n}{1 + a_n} \]
Assume that the initial frequency of the allele is 0.5.
a. Verify that $a_n = \frac{1}{1+n}$ satisfies the difference equation. Notice that $a_1 = 0.5$.
b. Determine $\lim_{n\to\infty} a_n$. Discuss the implication for the frequency of the lethal recessive allele in the
long term.
c. Determine how large $n$ needs to be to ensure that $a_n \leq 0.1$.
d. Determine how large $n$ needs to be to ensure that $a_n \leq 0.01$. Discuss the implications.
Solution.
a. To verify that $a_n = \frac{1}{1+n}$ satisfies the difference equation, we only need to substitute our expression for
$a_n$ into both sides of the difference equation:
\[
\begin{aligned}
a_{n+1} &= \frac{1}{1 + (n+1)} \\
&= \frac{\frac{1}{n+1}}{\frac{1}{n+1} + \frac{n+1}{n+1}} && \text{Multiply by 1.} \\
&= \frac{a_n}{a_n + 1}
\end{aligned}
\]
b. Since $\frac{1}{1+n}$ gets arbitrarily small as $n$ gets arbitrarily large, $\lim_{n\to\infty} a_n = 0$. Hence, in the long term, we
expect the lethal recessive genes to vanish from the population.
c. We want
\[
\begin{aligned}
\frac{1}{1+n} &\leq \frac{1}{10} \\
10 &\leq 1 + n \\
9 &\leq n
\end{aligned}
\]
Hence from $n = 9$ onward the frequency of lethal recessives is no more than 0.1.
d. We want
\[
\begin{aligned}
\frac{1}{1+n} &\leq \frac{1}{100} \\
100 &\leq 1 + n \\
99 &\leq n
\end{aligned}
\]
Hence from $n = 99$ onward the frequency of lethal recessives is no more than 0.01. These calculations suggest
that initially the frequency of lethal recessives decreases rapidly, but further decrease in frequency occurs
more and more slowly.
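The claims in parts a, c, and d can be checked by iterating the difference equation directly. The sketch below uses exact fractions so that comparisons with 0.1 and 0.01 are not affected by rounding; the function names are illustrative.

```python
from fractions import Fraction


def allele_frequencies(a1, count):
    """Return [a_1, ..., a_count] for a_{n+1} = a_n / (1 + a_n)."""
    terms = [Fraction(a1)]
    for _ in range(count - 1):
        a = terms[-1]
        terms.append(a / (1 + a))
    return terms


def first_generation_at_most(terms, threshold):
    """Smallest n (counting from 1) with a_n <= threshold, or None if absent."""
    for n, a in enumerate(terms, start=1):
        if a <= threshold:
            return n
    return None


terms = allele_frequencies(Fraction(1, 2), 120)
# Part a: every iterate agrees with the explicit formula 1 / (1 + n).
print(all(a == Fraction(1, 1 + n) for n, a in enumerate(terms, start=1)))
# Parts c and d: first generations at or below 1/10 and 1/100.
print(first_generation_at_most(terms, Fraction(1, 10)),
      first_generation_at_most(terms, Fraction(1, 100)))
```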
Recall that a point $a$ is an equilibrium of a difference equation $a_{n+1} = f(a_n)$ if $f(a) = a$. In Example 4a and
Example 5, the only equilibrium is given by $a = 0$ and the sequences generated by these difference equations converge
to this equilibrium. In Example 4b, the equilibria are given by $a = 0$ and $a = 1$ and the sequence determined by the
difference equation converged to the latter equilibrium. This is not a coincidence. To see why, consider a difference
equation
\[ a_{n+1} = f(a_n) \]
where $f$ is a continuous function. Now, let us assume that $\lim_{n\to\infty} a_n = a$. Since $a_{n+1}$ has the same limit as $a_n$, the sequential continuity theorem gives
\[
\begin{aligned}
a &= \lim_{n\to\infty} a_{n+1} \\
&= \lim_{n\to\infty} f(a_n) \\
&= f(a)
\end{aligned}
\]
Hence, the limiting value $a$ is an equilibrium for this difference equation.
Limits of Difference Equations
Let $f$ be a continuous function and $a_n$ be a sequence that satisfies
\[ a_{n+1} = f(a_n) \]
If $\lim_{n\to\infty} a_n = a$, then $f(a) = a$. In other words, $a$ is an equilibrium.
Example 6. To converge or not to converge
Find the equilibria of the following difference equations and use technology to determine whether the specified
sequence converges to one of the equilibria.
a. $a_{n+1} = \frac{1}{1+a_n}$ with $a_1 = 1$
b. $a_{n+1} = 2a_n(1 - a_n)$ with $a_1 = 0.1$
c. $a_{n+1} = 3.5a_n(1 - a_n)$ with $a_1 = 0.1$
Solution.
a. To find the equilibria, we solve
\[
\begin{aligned}
a &= \frac{1}{1+a} \\
a(1 + a) &= 1 \\
a^2 + a - 1 &= 0 \\
a &= -\frac{1}{2} \pm \frac{\sqrt{5}}{2} && \text{By the quadratic formula}
\end{aligned}
\]
Hence, if the sequences determined by this difference equation have well-defined limits, then these limits
are either $-\frac{1}{2} + \frac{\sqrt{5}}{2} \approx 0.6180$ or $-\frac{1}{2} - \frac{\sqrt{5}}{2} \approx -1.6180$. Computing the first 20 terms of the difference
equation with $a_1 = 1$ and plotting yields
[Figure: plot of $a_n$ against $n$ for $n = 1$ to 20.]
It appears that the sequence is converging to the positive equilibrium.
b. To find the equilibria, we solve
\[
\begin{aligned}
a &= 2a(1 - a) \\
2a^2 - a &= 0 \\
a(2a - 1) &= 0 \\
a = 0 &\text{ and } a = \frac{1}{2}
\end{aligned}
\]
Computing and plotting the first 20 terms of the difference equation with $a_1 = 0.1$ yields
the following graph.
[Figure: plot of $a_n$ against $n$ for $n = 1$ to 20.]
It appears that the sequence is converging to $\frac{1}{2}$.
c. To find the equilibria, we solve
\[
\begin{aligned}
a &= 3.5a(1 - a) \\
3.5a^2 - 2.5a &= 0 \\
a(3.5a - 2.5) &= 0 \\
a = 0 &\text{ or } a = \frac{5}{7}
\end{aligned}
\]
Computing and plotting the first 100 terms of the difference equation with $a_1 = 0.1$ yields
[Figure: plot of $a_n$ against $n$ for $n = 1$ to 100.]
It appears that the sequence does not converge. Rather it seems to eventually oscillate between four
different values.
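The "use technology" step of Example 6 might look like the following Python sketch, which generates the three sequences and prints their last few terms instead of plotting them. The helper `orbit` and the number of trailing terms shown are illustrative choices.

```python
import math


def orbit(f, a1, count):
    """Return [a_1, ..., a_count] for the difference equation a_{n+1} = f(a_n)."""
    terms = [a1]
    for _ in range(count - 1):
        terms.append(f(terms[-1]))
    return terms


part_a = orbit(lambda a: 1 / (1 + a), 1.0, 20)
part_b = orbit(lambda a: 2 * a * (1 - a), 0.1, 20)
part_c = orbit(lambda a: 3.5 * a * (1 - a), 0.1, 100)

print(part_a[-3:], (math.sqrt(5) - 1) / 2)   # compare with the positive equilibrium
print(part_b[-3:])                           # compare with 1/2
print([round(a, 4) for a in part_c[-8:]])    # look for a repeating pattern
```

Printing the tail of each list plays the role of reading the right-hand side of the plots: a settled value suggests convergence, while a repeating block of values suggests a cycle.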
One of the most important models in population biology is the discrete logistic model $a_{n+1} = a_n + ra_n(1 - a_n/K)$,
where the parameter $r > 0$ is called the intrinsic rate of growth and $K > 0$ is called the environmental carrying
capacity. Of course, it can be written in its equivalent quadratic form $a_{n+1} = (1 + r)a_n - ra_n^2/K$, but this looks
a bit odd because the parameter $r$ appears in two places for seemingly no good reason, while in the first form the
appearance of $a_n$ on its own will be seen to be very natural once we introduce the notion of a derivative in calculus.
Why this equation is called the discrete logistic equation will become more apparent once we have introduced in
Section 6.1 the logistic differential equation as a model of the way populations grow in time.
Example 7. Dynamics of the Discrete Logistic

a. Find the equilibrium solutions associated with the discrete logistic equation. What do you observe about
the roles of the parameters $r$ and $K$ in determining these equilibria?
b. Calculate the first 20 points of the sequence $a_{n+1} = a_n + 0.3a_n(1 - a_n)$ with $a_1 = 0.1$
c. Repeat part b with $a_1 = 1.5$
d. Calculate the first 20 points of the sequence $a_{n+1} = a_n + 1.9a_n(1 - a_n)$ with $a_1 = 0.6$
e. Calculate the first 20 points of the sequence $a_{n+1} = a_n + 2.2a_n(1 - a_n)$ with $a_1 = 0.5$
f. Compare the behavior of the four sequences derived in parts b to e. Can you infer anything about how
the value of the parameter $r$ in the logistic model may influence the behavior of the sequences generated
using the model?
Solution.
a. The equilibria are solutions to the equation
\[
\begin{aligned}
a &= a + ra\left(1 - \frac{a}{K}\right) \\
ra\left(1 - \frac{a}{K}\right) &= 0 \\
a = 0 &\text{ and } a = K.
\end{aligned}
\]
From this it is clear that the value of $r$ does not influence the value of the equilibria, one of which is equal
to $K$. Also, without loss of generality, we can set $K = 1$. The parameter $K$ is only a scaling variable: its value
is determined by the units used to measure $a_n$, so we can always measure $a_n$ in multiples or
fractions of $K$.
b. The values in this sequence are given in column 2 of Table 2.4
c. The values in this sequence are given in column 3 of Table 2.4
d. The values in this sequence are given in column 4 of Table 2.4
e. The values in this sequence are given in column 5 of Table 2.4
f. The four sequences generated in b to e respectively are: nondecreasing (i.e. $a_1 \leq a_2 \leq \cdots \leq a_{20}$)
and approaching 1, nonincreasing (i.e. $a_1 \geq a_2 \geq \cdots \geq a_{20}$) and approaching 1, alternating above and
below 1 while approaching 1, and alternating above and below 1 but not approaching 1. In the latter
case the odd terms from $a_3$ onwards are decreasing and appear to be approaching some number just less
than 0.746, while the even terms are increasing and appear to be approaching 1.163 (correct to 3 decimal
places). We can guess that for relatively small values of $r$ the sequence is monotone and approaches
1 from below or above, depending on the starting condition. As $r$ increases from $r = 0.3$ the sequence
begins to oscillate, doing so by the time $r = 1.9$, but still approaches 1 until $r$ reaches large values such
as $r = 2.2$, for which the sequence oscillates but no longer converges to 1. We have no way yet to verify
this statement mathematically, but we will obtain more insight into this process in Section 4.5.
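The columns of Table 2.4 can be regenerated with a few lines of Python. This is a sketch under stated assumptions: $K = 1$ as in the solution to part a, and the starting values are taken from the first row of Table 2.4 (0.100, 1.500, 0.600, and 0.500). The function name `discrete_logistic` is illustrative.

```python
def discrete_logistic(r, a1, count=20, K=1.0):
    """Return [a_1, ..., a_count] for a_{n+1} = a_n + r * a_n * (1 - a_n / K)."""
    terms = [a1]
    for _ in range(count - 1):
        a = terms[-1]
        terms.append(a + r * a * (1 - a / K))
    return terms


# (r, a_1) pairs read off the header and first row of Table 2.4.
columns = [(0.3, 0.1), (0.3, 1.5), (1.9, 0.6), (2.2, 0.5)]
table = [discrete_logistic(r, a1) for r, a1 in columns]

# Print one row per term, rounded to 3 decimal places like the table.
for n in range(20):
    print(f"a{n + 1:<3}" + "".join(f"{col[n]:8.3f}" for col in table))
```

Changing `r` in small steps between 1.9 and 2.2 is one way to look for the value at which the oscillations stop dying out.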
Examples 6 and 7 illustrate that the existence of equilibria for a difference equation does not ensure the convergence
of the sequences generated by it. This raises the question, when do the sequences generated by a difference
equation converge to an equilibrium? In general this is a hard question. The following theorem, however, though
too complicated to fully prove here, provides a criterion that ensures convergence of solutions to difference equations.
Later, when we have covered the basics of derivatives of functions, we will present another criterion that ensures
convergence of a sequence to an equilibrium when it starts out sufficiently close to that equilibrium. Recall from
Example 7 that a sequence is called increasing (respectively decreasing) if $a_1 \leq a_2 \leq a_3 \leq \ldots$ (respectively
$a_1 \geq a_2 \geq a_3 \geq \ldots$).

Table 2.4: Monotonic and Oscillatory Sequences generated by the Discrete Logistic Model (correct to 3 decimal
places)
Term    r = 0.3    r = 0.3    r = 1.9    r = 2.2
a1      0.100      1.500      0.600      0.500
a2      0.127      1.275      1.056      1.050
a3      0.160      1.170      0.944      0.935
a4      0.201      1.110      1.045      1.069
a5      0.249      1.074      0.956      0.906
a6      0.305      1.050      1.036      1.093
a7      0.368      1.034      0.965      0.869
a8      0.438      1.024      1.029      1.119
a9      0.512      1.016      0.972      0.826
a10     0.587      1.011      1.023      1.142
a11     0.660      1.008      0.982      0.759
a14     0.837      1.003      1.015      1.161
a15     0.878      1.002      0.986      0.749
a16     0.910      1.001      1.013      1.163
a17     0.935      1.001      0.988      0.747
a18     0.953      1.001      1.010      1.163
a19     0.966      1.000      0.991      0.746
a20     0.976      1.000      1.008      1.163
Theorem 2.4. A monotone convergence theorem
Let $f$ be a continuous, increasing function on an interval $I$ such that the image of $f$ lies in $I$. If $a_1, a_2, a_3, \ldots$ is a
sequence that satisfies $a_{n+1} = f(a_n)$, then the sequence is either increasing or decreasing. Moreover, $\lim_{n\to\infty} a_n = a$
where $a$ satisfies one of $f(a) = a$ or $a = \infty$ or $a = -\infty$.
To use this theorem for the difference equation $a_{n+1} = f(a_n)$, it often suffices to graph the function $f$.
[Figure 2.15: Sockeye salmon (Oncorhynchus nerka). Panels: male and female sockeye salmon; the relationship between recruits (vertical axis) and spawners (horizontal axis) for Sockeye Salmon in Karluk Lake, Alaska.]
Example 8. Beverton and Holt sockeye salmon dynamics

The Beverton-Holt model has been used extensively by fisheries to describe stock-recruitment curves. These curves
describe how the current stock of individuals (i.e. the current population) contributes recruits (i.e. new individuals)
to the next year. An example of stock-recruitment data and a fitted Beverton-Holt function for Sockeye Salmon
in Karluk Lake∗ is shown in Figure 2.15. This fitted function is given by
\[ R(N) = \frac{N}{0.006N + 0.2} \]
where $N$ is the current stock size (spawners) and $R(N)$ is the number of recruits for the next year. Since the number
of recruits determines the size of the stock in the next year, we get that the salmon dynamics can be approximated
by the difference equation
\[ N_{n+1} = R(N_n) \]
where $N_n$ is the stock size in the $n$th year.
a. Find the equilibria of this difference equation.
b. Graph $R(N)$ and $y = N$.
c. Apply the convergence theorem to determine what happens to $N_n$ when $N_1 = 10$ and when $N_1 = 200$.
We should note that it is possible to find an explicit solution of this difference equation. This is explored in the
problem set.
Solution.
a. To find the equilibria, we solve
\[
\begin{aligned}
N &= \frac{N}{0.006N + 0.2} \\
N(0.006N + 0.2) &= N \\
N(0.006N - 0.8) &= 0 \\
N &= 0, \ \frac{0.8}{0.006}
\end{aligned}
\]
Hence, the equilibria are given by 0 and $\frac{0.8}{0.006} = 133\frac{1}{3}$.
b. Plotting the two functions yields
∗ John A. Gulland, 1983. Fish Stock Assessment: A manual of basic methods, Wiley.

The equilibria correspond to the points where the functions intersect.
c. Since the graph of $R(N)$ is increasing on $I = [0, \infty)$ and the image of $I$ under $R$ lies in $I$, we can apply the
monotone convergence theorem.
Assume $N_1 = 10$. Since $N_2 = \frac{10}{0.06 + 0.2} \approx 38.46 \geq N_1$, the monotone convergence theorem implies
that $N_n$ is increasing. On the other hand, since the graph of $R(N)$ is saturating at $166\frac{2}{3}$, we have
$10 \leq N_{n+1} = R(N_n) \leq 166\frac{2}{3}$ for all $n \geq 1$. Therefore, by the monotone convergence theorem $\lim_{n\to\infty} N_n$
must equal the equilibrium $133\frac{1}{3}$. Cobwebbing with $N_1 = 10$ illustrates this convergence.
[Figure: cobweb diagram for $N_1 = 10$, both axes running from 0 to 200.]
Assume $N_1 = 200$. Since $N_2 = \frac{200}{0.006 \cdot 200 + 0.2} \approx 142.86 \leq N_1$, the monotone convergence theorem implies
that $N_n$ is a decreasing sequence. On the other hand, $200 \geq N_n \geq 133\frac{1}{3}$ for all $n \geq 1$. Therefore, by the
monotone convergence theorem $\lim_{n\to\infty} N_n$ must equal the equilibrium $133\frac{1}{3}$ (from part a). Cobwebbing
with $N_1 = 200$ illustrates this convergence.
[Figure: cobweb diagram for $N_1 = 200$, both axes running from 0 to 250.]
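The conclusions of part c can be illustrated by iterating the fitted Beverton-Holt function from both starting stocks. In this Python sketch the names `recruits` and `stock_sizes` and the number of years simulated are illustrative assumptions; the coefficients 0.006 and 0.2 come from the fitted function above.

```python
def recruits(N):
    """Fitted Beverton-Holt stock-recruitment function R(N) = N / (0.006 N + 0.2)."""
    return N / (0.006 * N + 0.2)


def stock_sizes(N1, years):
    """Return [N_1, ..., N_years] for the difference equation N_{n+1} = R(N_n)."""
    sizes = [N1]
    for _ in range(years - 1):
        sizes.append(recruits(sizes[-1]))
    return sizes


equilibrium = 0.8 / 0.006
for N1 in (10, 200):
    sizes = stock_sizes(N1, 15)
    print(N1, [round(s, 2) for s in sizes[:5]], round(sizes[-1], 4), round(equilibrium, 4))
```

The first few terms show the monotone behavior (rising from 10, falling from 200), and the last term shows how close the stock is to the positive equilibrium after 15 years.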
Example 9. Disruptive selection
In Example 4 in Section 1.7, we developed a population genetics model under the assumption that there was
a recessive lethal allele. For this model, we assumed that there were two alleles, A and a, that determined three
possible genotypes, AA, Aa, and aa. Here, we assume that genotype Aa (the so-called heterozygote) is the least viable.
Extreme instances of inviability are given by different genotypes that can mate but produce infertile offspring such
as the liger produced by a lion and a tiger or the mule produced by a horse and a donkey. If the genotypes AA and
aa produce equal numbers of progeny, then the frequency $a_n$ of allele a at time $n$ can be modeled by $a_{n+1} = f(a_n)$
where the graph of $f$ is shown in Figure 2.16.
[Figure 2.16: Disruptive selection. Graph of $f$, both axes running from 0 to 1.]
a. Determine what happens to $a_n$ in the long term if $a_1 = 0.6$.
b. Determine what happens to $a_n$ in the long term if $a_1 = 0.4$.
c. As reported in a 1972 Science article, Foster et al. experimentally examined changes in two chromosomal
frequencies in D. melanogaster. Data from a set of experiments is graphed in Figure 2.17.
[Figure 2.17: Data set for disruptive selection]
These experimentally determined graphs show how the frequency of an allele changes over generations
for initial conditions that through a small amount of random variation lead to different population levels
at time 1 on the x-axis. Discuss whether these experiments are consistent with the model predictions.
Solution.
a. Since the graph of f is increasing, we can apply the monotone convergence theorem. Since f (0.6) > 0.6,
the sequence an is increasing if a1 = 0.6. Since an ≤ 1 for all n and f (1) = 1 is the only equilibrium
greater than 0.6, an converges to 1 as n increases. In other words, the frequency of a alleles approaches
one. Cobwebbing reaffirms this prediction:
[Figure: cobweb diagram for $a_1 = 0.6$, both axes running from 0 to 1.]
b. Since the graph of f is increasing, we can apply the monotone convergence theorem. Since f (0.4) < 0.4,
the sequence an is decreasing if a1 = 0.4. Since an ≥ 0 for all n and f (0) = 0 is the only equilibrium less
than 0.4, an converges to 0 as n increases. In other words, the frequency of a alleles approaches zero.
Cobwebbing reaffirms this prediction:
[Figure: cobweb diagram for $a_1 = 0.4$, both axes running from 0 to 1.]
c. The experiments of Foster et al. are consistent with the model predictions. In particular, the experiments
show that, within the limits of a small amount of random variation, if the frequency of a alleles at time
n = 1 is greater than one-half, then the frequency of a alleles approaches one. Alternatively, if the initial
frequency is less than one-half, then the a alleles are driven to extinction.
Example 10. Fibonacci and the Growth Rate of Rabbits
Fibonacci (see Historical Quest, Problem 33 of Section 1.7) famously posed the following reworded problem a
little over 800 years ago (1202 to be precise). Suppose a newly-born pair of rabbits, one male, one female, are put
in an enclosed, but very large, field. Further, suppose all rabbits are able to mate at the age of one month and that
the impregnated female gives birth to a male-female pair one month later. Ignoring questions relating to inbreeding,
and assuming that rabbits never die and there is always enough food for them to eat, what is the asymptotic annual rate
of increase of the rabbits in the enclosed field?
Solution.
Define $R_n$ to be the number of rabbit pairs in the field in month $n$. The number of rabbit pairs in any month
($R_n$) is equal to the number of rabbit pairs in the field the previous month ($R_{n-1}$) plus the new pairs born that month: each of the rabbit pairs in the
field two months ago ($R_{n-2}$) will produce a new pair in the month in question. Translating these words into
mathematics yields the following equation:
\[ R_n = R_{n-1} + R_{n-2}. \]
[Figure 2.18: A rabbit population from a single male-female pair of newborns]
If we divide each side by $R_{n-1}$ we get
\[ R_n/R_{n-1} = 1 + R_{n-2}/R_{n-1}, \]
and if we now define $a_n = R_n/R_{n-1}$ (that is, $a_n$ is the ratio of the number of rabbits in month $n$ to those in month
$n - 1$) we obtain the equation
\[ a_n = 1 + 1/a_{n-1}. \]
The equilibrium solution to this equation is
\[
\begin{aligned}
a &= 1 + 1/a \\
a^2 - a - 1 &= 0 \\
a &= \frac{1}{2} \pm \frac{\sqrt{5}}{2}.
\end{aligned}
\]
Only the positive solution $a \approx 1.6180$ applies here and it can be shown that the sequence converges as the number
of months increases (see Problem 44). To get the annual rate of increase we need to calculate $(1.618)^{12} \approx 322$, a
really stunning rate of growth. In Table 2.5 we list the first 13 terms and note that the rate of increase over the first
12 iterations is not exactly 322, because the equilibrium value represents an asymptotic rate rather than an actual
rate for any 12 iterations, particularly the initial 12.

Table 2.5: Fibonacci Rabbit Growth

Month Number of Pairs
0 1
1 1
2 2
3 3
4 5
5 8
6 13
7 21
8 34
9 55
10 89
11 144
12 233
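Table 2.5 and the asymptotic rate can be reproduced with the short Python sketch below. It assumes, as in the table, that months 0 and 1 each have one pair; the function name `rabbit_pairs` is an illustrative choice.

```python
import math


def rabbit_pairs(months):
    """Return [R_0, ..., R_months] with R_0 = R_1 = 1 and R_n = R_{n-1} + R_{n-2}."""
    pairs = [1, 1]
    while len(pairs) <= months:
        pairs.append(pairs[-1] + pairs[-2])
    return pairs[: months + 1]


pairs = rabbit_pairs(12)
ratios = [pairs[n] / pairs[n - 1] for n in range(1, 13)]   # a_n = R_n / R_{n-1}
golden = (1 + math.sqrt(5)) / 2                            # positive equilibrium

print(pairs)
print(ratios[-1], golden)
# Asymptotic annual factor versus the actual growth over the first 12 months.
print(golden**12, pairs[12] / pairs[0])
```

Comparing the last line's two numbers makes the closing remark of the example concrete: the asymptotic factor and the actual factor over the first twelve months are different quantities.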
Problem Set 2.5

Determine whether the sequential limits in Problems 1 to 8 exist. If they exist, find the limit. If they don't exist,
explain briefly why.
1. $\lim_{n\to\infty} a_n$ where $a_n = \frac{n^2 - n}{1 + 3n^2}$
2. $\lim_{n\to\infty} a_n$ where $a_n = \frac{5 - 2n}{6 + 3n}$
3. $\lim_{n\to\infty} a_n$ where $a_n = \frac{e^n}{1 + e^n}$
4. $\lim_{n\to\infty} a_n$ where $a_n = 2^{3/n}$
5. $\lim_{n\to\infty} a_n$ where $a_{n+1} = -a_n$ and $a_1 = 2$
6. $\lim_{n\to\infty} a_n$ where $a_{n+1} = -a_n^{-1}$ and $a_1 = 3$
7. $\lim_{n\to\infty} a_n$ where $a_n = \cos n$
8. $\lim_{n\to\infty} a_n$ where $a_n = [1 + (-1)^n]$
Consider the sequences defined in Problems 9 to 14.
a. Find $L = \lim_{n\to\infty} a_n$.
b. Determine how large $n$ needs to be to ensure that $|a_n - L| < 0.001$.
9. $\lim_{n\to\infty} a_n$ where $a_n = \frac{n}{3+n}$.
10. $\lim_{n\to\infty} a_n$ where $a_n = \frac{2n}{n-1}$.
11. $\lim_{n\to\infty} a_n$ where $a_n = \frac{1,000}{n}$.
12. $\lim_{n\to\infty} a_n$ where $a_n = \frac{n+1}{1,00n}$.
13. $\lim_{n\to\infty} a_n$ where $a_n = \frac{n^2+1}{n^3}$.
14. $\lim_{n\to\infty} a_n$ where $a_n = e^{-n}$.
All the sequences in Problems 15 to 18 satisfy $\lim_{n\to\infty} a_n = \infty$. Determine how large $n$ has to be to ensure that $a_n \geq 1,000,000$.
15. $a_n = 2n$.
16. $a_n = n^2$.
17. $a_n = 2n - 10,000$.
18. $a_n = \frac{n^2}{1+n}$.
Find the sequences determined by the difference equation
\[ a_{n+1} = f(a_n) \]
with the initial condition $a_1$ specified in Problems 19 to 24. Determine $\lim_{n\to\infty} a_n$. Justify your answer.
19. $f(x) = x + 2$ with $a_1 = 0$.
20. $f(x) = \frac{x}{3}$ with $a_1 = 27$.
21. $f(x) = \sqrt{x}$ with $a_1 = 100$.
22. $f(x) = x^2$ with $a_1 = 1.00001$.
23. $f(x) = x^2$ with $a_1 = 0.99999$.
24. $f(x) = 4x^2$ with $a_1 = 1$.
Find the equilibria of the difference equations in Problems 25 to 28 and use technology to determine which of the
specified sequences converge to one of the equilibria.
25. $a_{n+1} = \frac{3}{2+a_n}$ with $a_1 = 1$.
26. $a_{n+1} = \frac{1}{5-a_n}$ with $a_1 = 1$.
27. $a_{n+1} = 3a_n(1 - a_n)$ with $a_1 = 0.1$.
28. $a_{n+1} = 5.5(1 - a_n)$ with $a_1 = 0.2$.
Use the monotone convergence theorem in Problems 29 to 34 to determine the limits of the following specified sequences.
29. $a_{n+1} = \frac{2a_n}{1+a_n}$ with $a_1 = 0.5$.
30. $a_{n+1} = \frac{2a_n}{1+a_n}$ with $a_1 = 2$.
31. $a_{n+1} = 2\ln a_n$ with $a_1 = 1$.
32. $a_{n+1} = 2\ln a_n$ with $a_1 = 100$.
33. $a_{n+1} = \sqrt{5 + a_n}$ with $a_1 = 0$.
34. $a_{n+1} = \sqrt{5 + a_n}$ with $a_1 = 20$.
35. In Example 4 in Section 4, we introduced a model for the frequency of lethal recessive alleles in a population.
The model is given by
an
an+1 =
1 + an
where an is the frequency of the recessive allele in the population.

1
a. If a1 = 0.25, then verify that an = n+3 satisfies the difference equation.
b. Find limn→∞ an .
c. Determine how large n needs to be to ensure that an ≤ 0.1.
d. Determine how large n needs to be to ensure that an ≤ 0.001.
36. Lets consider the lethal recessive allele model in greater generality.
a1 an
a. Verify that an = 1+(n−1)a1 satisfies the difference equation an+1 = 1+an for any choice of a1 .
b. For any a1 , find limn→∞ an .
c. Assuming that a1 lies in (0, 1), determine how large n needs to be to ensure that an ≤ 0.01.
37. In Example 5, we discussed a population genetics model under the assumption that there were two alleles, A
and a, that determined three possible genetic types, AA, Aa, and aa. We assume that genetic type Aa (so
called heterozygote) is the least viable. If the genotype aa produce nine times more progeny than genotype
AA progeny, then the frequency an of allele a at time n can be modeled by an+1 = f (an ) where the graph of
f is given by
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
a. Determine what happens to an in the long-term if a1 = 0.91.

b. Determine what happens to an in the long-term if a1 = 0.89.
c. As reported in a 1972 Science article, Foster et al. experimentally examined changes in two chromo-
somal frequencies in D. melanogaster. Data from a set of experiments is graphed below:
These experimentally determined graphs show how the frequency of an allele changes over generations
for different initial conditions. Discuss whether these experiments are consistent with the model
predictions.

38. The Beverton-Holt model has been used extensively by fisheries. This model assumes that populations are
competing for a single limiting resource and reproduce at discrete moments in time. If we let Nn denote
the population abundance in the nth year (or generation), r the maximal per-capita growth rate, and a as a
competition coefficient, then the model is given by
\[ N_{n+1} = \frac{rN_n}{1 + aN_n} \]
with r > 0 and a > 0.
a. Assume Nn = 1 and find the first 4 terms of the sequence.
b. Guess the explicit expression for the sequence and verify that your guess is correct.
c. Find the limn→∞ Nn and the equilibria of this difference equation.
d. Determine under what conditions the population is able to persist, i.e., converge to a positive equilibrium.
In the Fibonacci rabbit problem laid out in Example 10, suppose only a proportion p of the females that could fall
pregnant actually do fall pregnant each month. (We assume the population starts with a large number of pairs, so
that when we refer to proportions we are actually thinking of whole numbers of pairs rather than, say, 1/3 of 2 pairs,
which makes no biological sense.) What is the annual rate of increase in Problems 39 to 43?
39. p = 3/4
40. p = 2/3
41. p = 1/2
42. p = 1/3
43. p = 1/4
44. Consider the Fibonacci sequence in Example 10, and show that
\[ a_{n+2} = 1 + \frac{1}{a_{n+1}} = 1 + \frac{a_n}{1 + a_n} \]
In other words, the even elements of the sequence satisfy a difference equation with an increasing function
$f(x) = 1 + \frac{x}{1+x}$ and the convergence theorem can be applied. Similarly, show the same for odd terms.
45. Use technology to calculate the first 20 points of the sequence an+1 = an + ran (1 − an ) with a1 = 0.5 for
the cases r = 0.9, r = 1.5 and r = 2.1. How does this fit in with the discussion in the solution to part f. of
Example 7.
46. Revisiting the sockeye salmon stock-recruitment relationship considered in Example 8 we see from Fig. 2.15
that we could just as credibly fit the Ricker function
\[ y = f(x) = 3.7x e^{-0.01x} \]
If x is the stock in one generation and y is the stock recruited in the next generation then this relationship is
actually a population model of the form
\[ x_{n+1} = 3.7x_n e^{-0.01x_n}. \]
If the population is now at the level $x_1 = 100$ individuals, use your technology to generate the number of
individuals that you expect in the next 10 generations. Hence deduce the equilibrium value and check this
value by using your technology or the bisection method to solve the equation $x = 3.7x e^{-0.01x}$.

220 2.6. THE DERIVATIVE AT A POINT
Figure 2.19: Sockeye salmon stock-recruitment. The solid line is fit to the data using a Ricker functional form that
is very similar to the Beverton and Holt form considered in Example 8.
2.6 The Derivative at a Point

In this section, we introduce one of the major concepts in calculus, the idea of a derivative. While functions
are fundamental and limits are essential, they simply laid the foundation for the excitement to come. To help us
motivate the idea of the derivative at a point, it is worthwhile recasting Example 1 in Section 2.1 using different
notation. If we let $f(x)$ represent the population of Mexico City in year $x$, then the average rate of change of the
population between the years 1980 and 1985 can be found by
\[ \frac{f(a+h) - f(a)}{h} = \frac{f(1980+5) - f(1980)}{5} \]
where $a = 1980$ (the base year) and $h = 5$ (the duration of the time interval). We also found the average rate of
change over smaller and smaller intervals to guess that the instantaneous rate of change of the Mexico City population
in 1980 is 1.75 million per year. This idea can be written as
\[ \lim_{h \to 0} \frac{f(1980+h) - f(1980)}{h} \approx 1.75 \]
In Chapter 1 and Section 2.1, we previewed the notion of the derivative at a point by defining the slope of the tangent
line for $f(x)$ at a point $x = a$ to be the limit of the slopes of the secant lines
\[ \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \]
These two ideas, as well as many other important concepts, have the same formula, and it is this limiting process
which is the basis for the concept of the derivative.
Derivative at a Point. The derivative of a function $f$ at a point $x = a$, denoted by $f'(a)$, is
\[ f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \]
provided this limit exists. If the limit exists, we say that $f$ is differentiable at $x = a$.
Example 1. Finding derivatives using the definition

Use the definition of a derivative to find the following derivatives.

a. f ′ (3) where f (x) = 1
b. f ′ (2) where f (x) = 3x
c. $f'(1)$ where $f(x) = 1 + 3x^2$
Solution.
a. Let $f(x) = 1$ and $a = 3$.
\[
\begin{aligned}
f'(3) &= \lim_{h \to 0} \frac{f(3+h) - f(3)}{h} \\
&= \lim_{h \to 0} \frac{1 - 1}{h} \\
&= \lim_{h \to 0} \frac{0}{h} \\
&= 0
\end{aligned}
\]
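b. Let $f(x) = 3x$ and $a = 2$.
\[
\begin{aligned}
f'(2) &= \lim_{h \to 0} \frac{f(2+h) - f(2)}{h} \\
&= \lim_{h \to 0} \frac{3(2+h) - 6}{h} \\
&= \lim_{h \to 0} \frac{6 + 3h - 6}{h} \\
&= \lim_{h \to 0} \frac{3h}{h} \\
&= 3
\end{aligned}
\]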
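c. Let $f(x) = 1 + 3x^2$ and $a = 1$.
\[
\begin{aligned}
f'(1) &= \lim_{h \to 0} \frac{f(1+h) - f(1)}{h} \\
&= \lim_{h \to 0} \frac{[1 + 3(1+h)^2] - [1 + 3(1)^2]}{h} \\
&= \lim_{h \to 0} \frac{1 + 3 + 6h + 3h^2 - 4}{h} \\
&= \lim_{h \to 0} \frac{6h + 3h^2}{h} \\
&= \lim_{h \to 0} (6 + 3h) \\
&= 6
\end{aligned}
\]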
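The limit in part c can also be explored numerically. The following Python sketch is an illustration only and is not part of the worked solution; the helper name `difference_quotient` is our own choice. It evaluates the difference quotient of $f(x) = 1 + 3x^2$ at $a = 1$ for shrinking values of $h$.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def f(x):
    # The quadratic from part c of Example 1.
    return 1 + 3 * x**2


# Shrink h toward 0 and watch the quotient approach the derivative f'(1).
quotients = [difference_quotient(f, 1.0, 10.0 ** (-k)) for k in range(1, 7)]
for k, q in enumerate(quotients, start=1):
    print(f"h = 1e-{k}: quotient = {q:.6f}")
```

Because the quotient simplifies algebraically to $6 + 3h$, each printed value should sit about $3h$ above the limit 6.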
Example 1 illustrates two facts. First, the derivative of a constant function is 0. Intuitively this makes sense
as a constant function by definition does not change and, consequently, its rate of change should be 0. Second, the
derivative of a linear function is the slope of the linear function. Intuitively this makes sense as the rate at which
the function is increasing is given by the slope of the function. More interestingly, Example 1 illustrates that we can
explicitly compute the slope of the tangent line (equivalently, the instantaneous rate of change) of a quadratic function.
While the derivatives in Example 1 were pretty straightforward to compute, other derivatives require certain
algebraic procedures to compute, as illustrated by the next example.

Example 2. Algebraic steps to find a derivative
Find the following derivatives algebraically:

a. $f'(4)$ where $f(x) = \sqrt{x}$.
b. $f'(5)$ where $f(x) = \frac{1}{1+x}$.
Solution.
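a. To find this derivative, we multiply the numerator and denominator of the quotient by the “conjugate”
of the original numerator.
\[
\begin{aligned}
\frac{f(4+h) - f(4)}{h} &= \frac{\sqrt{4+h} - \sqrt{4}}{h} \\
&= \frac{\sqrt{4+h} - \sqrt{4}}{h} \cdot \frac{\sqrt{4+h} + \sqrt{4}}{\sqrt{4+h} + \sqrt{4}} && \text{Multiply by 1.} \\
&= \frac{4 + h - 4}{h(\sqrt{4+h} + \sqrt{4})} && \text{Multiplying out the numerator.} \\
&= \frac{h}{h(\sqrt{4+h} + 2)} && \text{Simplifying.} \\
&= \frac{1}{\sqrt{4+h} + 2} && \text{Since } h \neq 0 \text{ in the limit.}
\end{aligned}
\]
Hence, taking the limit as $h$ goes to 0 yields $f'(4) = \frac{1}{\sqrt{4} + 2} = \frac{1}{4}$.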
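b. To find this derivative, we multiply the numerator and denominator of the quotient by the common
denominator $(6+h)6$ of the fractions in the numerator.
\[
\begin{aligned}
\frac{f(5+h) - f(5)}{h} &= \frac{1/(1 + (5+h)) - 1/6}{h} \\
&= \frac{1/(6+h) - 1/6}{h} && \text{Simplifying.} \\
&= \frac{1/(6+h) - 1/6}{h} \cdot \frac{(6+h)6}{(6+h)6} && \text{Multiply by 1.} \\
&= \frac{6 - (6+h)}{h(6+h)6} && \text{Simplifying.} \\
&= \frac{-h}{h(6+h)6} && \text{Simplifying more.} \\
&= \frac{-1}{(6+h)6} && \text{Since } h \neq 0 \text{ in the limit.}
\end{aligned}
\]
Taking the limit as $h$ goes to 0 yields $f'(5) = -\frac{1}{36}$.
□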
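As a check on the algebra in Example 2, the raw difference quotients can be compared with their simplified forms. The Python sketch below is our own illustration; the function names are hypothetical and chosen only for this check.

```python
import math


def raw_quotient(f, a, h):
    """Difference quotient (f(a + h) - f(a)) / h, before any algebra."""
    return (f(a + h) - f(a)) / h


def sqrt_simplified(h):
    """Part a after rationalizing: 1 / (sqrt(4 + h) + 2)."""
    return 1 / (math.sqrt(4 + h) + 2)


def reciprocal_simplified(h):
    """Part b after clearing fractions: -1 / ((6 + h) * 6)."""
    return -1 / ((6 + h) * 6)


steps = [0.1, 0.01, 0.001]
for h in steps:
    # Each pair of printed numbers should agree up to rounding error.
    print(h, raw_quotient(math.sqrt, 4, h), sqrt_simplified(h))
    print(h, raw_quotient(lambda x: 1 / (1 + x), 5, h), reciprocal_simplified(h))

# Setting h = 0 is legal only in the simplified forms, and gives the limits.
limit_a = sqrt_simplified(0)
limit_b = reciprocal_simplified(0)
print(limit_a, limit_b)
```

The raw quotient cannot be evaluated at $h = 0$, which is exactly why the algebraic simplification is needed before taking the limit.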
Slopes of Tangent Lines

The definition of the derivative was inspired directly by the slope of the tangent line. Using derivatives, we can
redefine the tangent line.
Tangent Line. Let $f$ be a function that is differentiable at the point $x = a$. The tangent line of
$f$ at $x = a$ is the line with slope $f'(a)$ that passes through the point $(a, f(a))$.
Example 3. Tangent line to a parabola

Find the tangent line to $f(x) = 1 + 3x^2$ at $x = 1$. Sketch the parabola and the tangent line.
Solution. In Example 1 we found that the slope of the tangent line is f ′ (1) = 6. Since the tangent line passes
through (1, f (1)) = (1, 4), we can use the point-slope formula to find the equation of the tangent line:
y−4 = 6(x − 1)
y = 6x − 2.
The graph of the parabola, along with the tangent line at (1, 4) is shown in Figure 2.20.
Figure 2.20: Graph of the parabola $y = 1 + 3x^2$ with tangent line at $(1, 4)$
In Example 3, the tangent line intersects the graph of the function in exactly one point. This unique intersection
is not typical, as illustrated in the next example.
Example 4. Multiple intersections
Find the tangent line to $f(x) = x^3$ at $x = 1$. Sketch the curve and the tangent line.
Solution. We first find the slope of the tangent line:
\[
\begin{aligned}
f'(1) &= \lim_{h \to 0} \frac{f(1+h) - f(1)}{h} \\
&= \lim_{h \to 0} \frac{(1+h)^3 - 1}{h} \\
&= \lim_{h \to 0} \frac{1 + 3h + 3h^2 + h^3 - 1}{h} \\
&= \lim_{h \to 0} (3 + 3h + h^2) \\
&= 3.
\end{aligned}
\]
The tangent line passes through (1, f (1)) = (1, 1), so the equation of the tangent line is
y − 1 = 3(x − 1).
Equivalently, $y = 3x - 2$. The graph of $y = x^3$ and the tangent line at $(1, 1)$ is shown in Figure 2.21. Since
$x^3 - (3x - 2) = (x - 1)^2(x + 2)$, the tangent line meets the curve a second time at $x = -2$.
□
Figure 2.21: Graphs of $y = x^3$ and tangent line at $(1, 1)$
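The title of Example 4 refers to multiple intersections. A short computation can locate them. The Python sketch below is an illustration we have added; the helper `gap` and the search window of integers from −5 to 5 are our own assumptions.

```python
def gap(x):
    """Vertical gap between the curve y = x**3 and the tangent line y = 3x - 2."""
    return x**3 - (3 * x - 2)


# Scan integer x-values in a small window for exact intersections.
intersections = [x for x in range(-5, 6) if gap(x) == 0]
print(intersections)  # [-2, 1]

# The gap factors as (x - 1)**2 * (x + 2); spot-check the identity.
factor_ok = all(gap(x) == (x - 1) ** 2 * (x + 2) for x in range(-5, 6))
print(factor_ok)
```

The repeated factor $(x - 1)^2$ reflects tangency at $x = 1$, while the simple factor $(x + 2)$ gives the crossing at $x = -2$.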
Instantaneous rates
In Section 2.1 we defined
\[ \frac{f(b) - f(a)}{b - a} \]
to be the average rate of change of $f$ over the interval $[a, b]$. Taking the limit as $b$ approaches $a$ yields the
instantaneous rate of change
\[ \lim_{b \to a} \frac{f(b) - f(a)}{b - a} \]
In the next example, we relate this definition of the instantaneous rate of change to the derivative.
Example 5. Instantaneous rates
Show that
\[ \lim_{b \to a} \frac{f(b) - f(a)}{b - a} = f'(a) \]
provided that the limits exist.
Solution. Let $h = b - a$. Then $b = a + h$ so that
\[ \frac{f(b) - f(a)}{b - a} = \frac{f(a+h) - f(a)}{h} \]
Since $b$ approaching $a$ is equivalent to $h$ approaching 0, we see
\[ \lim_{b \to a} \frac{f(b) - f(a)}{b - a} = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} \]
provided the limits exist. By the definition of a derivative, we get
\[ f'(a) = \lim_{b \to a} \frac{f(b) - f(a)}{b - a} \]
□
The solution to Example 5 allows us to equate the derivative with an instantaneous rate.
Instantaneous Rate as a Derivative. Let $f$ be a function that is differentiable at $x = a$. The instantaneous rate of
change of $f$ at $x = a$ is $f'(a)$.

Example 6. Instantaneous velocity
On a calm sunny day a penny is dropped from the torch of the Statue of Liberty. The distance (in feet) that the
penny has fallen after $t$ seconds is
\[ s(t) = 16t^2. \]
a. Find s′ (1) and interpret this quantity.

b. Find the velocity (instantaneous rate of change) of the penny at the moment it hits the ground. Use the
fact that the height from torch to the ground is 305 feet.
Solution.
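a.
\[
\begin{aligned}
s'(1) &= \lim_{h \to 0} \frac{s(1+h) - s(1)}{h} \\
&= \lim_{h \to 0} \frac{16(1+h)^2 - 16(1)^2}{h} \\
&= \lim_{h \to 0} \frac{16 + 32h + 16h^2 - 16}{h} \\
&= \lim_{h \to 0} \frac{32h + 16h^2}{h} \\
&= \lim_{h \to 0} (32 + 16h) \\
&= 32
\end{aligned}
\]
After one second, the penny is falling at a velocity of 32 feet per second (or 32 ft/s).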
b. First, we need to find how long it takes the penny to fall to the ground. When the penny hits the ground,
it has fallen 305 feet. Hence, we need to solve $305 = s(t) = 16t^2$, which yields $t = \frac{\sqrt{305}}{4}$. To find
$s'\left(\frac{\sqrt{305}}{4}\right)$:
\[
\begin{aligned}
s'\left(\frac{\sqrt{305}}{4}\right) &= \lim_{h \to 0} \frac{s\left(\frac{\sqrt{305}}{4} + h\right) - s\left(\frac{\sqrt{305}}{4}\right)}{h} \\
&= \lim_{h \to 0} \frac{16\left(\frac{\sqrt{305}}{4} + h\right)^2 - 16\left(\frac{\sqrt{305}}{4}\right)^2}{h} \\
&= \lim_{h \to 0} \frac{8\sqrt{305}\,h + 16h^2}{h} \\
&= \lim_{h \to 0} (8\sqrt{305} + 16h) \\
&= 8\sqrt{305}
\end{aligned}
\]
Hence, at the moment the penny hits the ground it is falling at a velocity of about 139.7 ft/s. This is equivalent to
about 95 mi/h. We should note, however, that this calculation does not account for the effects of air resistance,
which would be considerable if the penny were attached to a parachute.
□
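The arithmetic in part b of Example 6 is easy to reproduce. The Python sketch below is our own illustration: the names `DROP_FT`, `s`, and `velocity` are hypothetical, and the velocity is approximated by a central difference rather than by the limit definition used in the solution.

```python
import math

DROP_FT = 305.0  # torch-to-ground height given in Example 6


def s(t):
    """Distance fallen (feet) after t seconds."""
    return 16 * t**2


def velocity(t, h=1e-6):
    # Central-difference approximation of s'(t); an assumption of this sketch.
    return (s(t + h) - s(t - h)) / (2 * h)


t_impact = math.sqrt(DROP_FT / 16)   # solves 305 = 16 t^2
v_impact = velocity(t_impact)        # feet per second
v_mph = v_impact * 3600 / 5280       # 5280 feet per mile, 3600 seconds per hour

print(f"impact time: {t_impact:.3f} s")
print(f"impact velocity: {v_impact:.1f} ft/s = {v_mph:.1f} mi/h")
```

For a quadratic such as $s(t) = 16t^2$, the central difference equals $32t$ up to rounding, so the printed velocity should match $8\sqrt{305}$.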
Example 7. Enzyme activity
Figure 2.22 illustrates some data relating enzyme activity to temperature in Celsius.∗ The best-fitting quadratic
equation to this data is given by
\[ A(x) = 11.8 + 19.1x - 0.2x^2 \]
Figure 2.22: Enzyme activity as a function of temperature
∗ http://www.ecologynet.stir.ac.uk/home/universities/stirling/courses/aqib/timlect10.html
a. Find A′ (10).
b. Determine the units of this derivative and discuss their meaning.
Solution.
a.
\[
\begin{aligned}
A'(10) &= \lim_{h \to 0} \frac{A(10+h) - A(10)}{h} \\
&= \lim_{h \to 0} \frac{[11.8 + 19.1(10+h) - 0.2(10+h)^2] - [11.8 + 19.1(10) - 0.2(10)^2]}{h} \\
&= \lim_{h \to 0} \frac{15.1h - 0.2h^2}{h} \\
&= \lim_{h \to 0} (15.1 - 0.2h) \\
&= 15.1
\end{aligned}
\]
b. The units of $A'(10)$ are units of enzyme activity per degree Celsius. $A'(10) = 15.1$ means that at 10°C, the
enzyme activity is increasing at a rate of 15.1 units per degree Celsius.
□
Differentiability and Continuity

To fully appreciate differentiability, it is useful to understand examples of non-differentiability.
Example 8. Continuous but not differentiable
Examine the continuity and differentiability of f (x) at x = a for the following two functions.
a. In Example 7 from Section 2.2, we modeled the feeding rate of planktonic copepods with the function
\[
f(x) =
\begin{cases}
6.25x \text{ cells/hour} & \text{if } x \le 200 \\
1{,}250 \text{ cells/hour} & \text{if } x > 200
\end{cases}
\]
where $x$ is the number of cells per liter. Let $a = 200$.

b. Let $f(x) = x^{1/3}$ and $a = 0$.
Solution.
a. Since
\[ \lim_{x \to 200^+} f(x) = \lim_{x \to 200^+} 1{,}250 = 1{,}250 \]
and
\[ \lim_{x \to 200^-} f(x) = \lim_{x \to 200^-} 6.25x = 1{,}250 \]
and $f(200) = 1{,}250$, we see $f$ is continuous at $x = 200$.
On the other hand, for $h < 0$,
\[
\begin{aligned}
\lim_{h \to 0^-} \frac{f(200+h) - f(200)}{h} &= \lim_{h \to 0^-} \frac{6.25(200+h) - 6.25(200)}{h} \\
&= \lim_{h \to 0^-} \frac{6.25h}{h} \\
&= 6.25
\end{aligned}
\]
and for $h > 0$,
\[
\begin{aligned}
\lim_{h \to 0^+} \frac{f(200+h) - f(200)}{h} &= \lim_{h \to 0^+} \frac{1{,}250 - 1{,}250}{h} \\
&= \lim_{h \to 0^+} \frac{0}{h} \\
&= 0
\end{aligned}
\]
Since the left- and right-hand limits are not equal, the limit does not exist, so $f$ is not differentiable
at $x = 200$. As you can see in Figure 2.23, the function is continuous but is still not differentiable at
$x = 200$.
Figure 2.23: Feeding rate of planktonic copepods
b. Since $f(x) = x^{1/3}$ is arbitrarily close to 0 for all $x$ sufficiently close to 0, $\lim_{x \to 0} f(x) = 0$. Since $f(0) = 0$,
$f$ is continuous at $x = 0$. To determine the derivative of $x^{1/3}$ at $x = 0$, we need to consider
\[
\begin{aligned}
\lim_{h \to 0} \frac{f(0+h) - f(0)}{h} &= \lim_{h \to 0} \frac{(0+h)^{1/3} - 0^{1/3}}{h} \\
&= \lim_{h \to 0} h^{-2/3} \\
&= \infty
\end{aligned}
\]
Hence, the derivative is not defined as the limit is not finite. Graphing $y = x^{1/3}$ reveals that the tangent line
at $x = 0$ is vertical; that is, its slope is infinite.
[Graph of $y = x^{1/3}$ for $-0.04 \le x \le 0.04$]
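Both failures of differentiability in Example 8 show up clearly in numerical difference quotients. The Python sketch below is our own illustration; the names `feeding_rate` and `one_sided_quotients` and the particular step sizes are assumptions made for this check.

```python
def feeding_rate(x):
    """Copepod feeding rate (cells/hour) at food density x, from Example 8a."""
    return 6.25 * x if x <= 200 else 1250.0


def one_sided_quotients(f, a, steps):
    """Difference quotients of f at a for the given (signed) step sizes."""
    return [(f(a + h) - f(a)) / h for h in steps]


# Part a: approach x = 200 from the left (h < 0) and from the right (h > 0).
left = one_sided_quotients(feeding_rate, 200, [-1.0, -0.1, -0.01])
right = one_sided_quotients(feeding_rate, 200, [1.0, 0.1, 0.01])
print("left-hand quotients: ", left)
print("right-hand quotients:", right)

# Part b: for f(x) = x**(1/3) at a = 0, the quotient equals h**(-2/3) for h > 0.
cube_root = one_sided_quotients(lambda x: x ** (1 / 3), 0.0, [1e-2, 1e-4, 1e-6])
print("cube-root quotients: ", cube_root)
```

The left-hand and right-hand quotients settle on different values in part a, while the quotients in part b grow without bound as $h$ shrinks.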
Example 8 illustrates that continuity does not ensure differentiability and that differentiability can fail in at least
two ways. The limit of the slopes of the secant lines might not exist because the one-sided limits disagree, or the
slopes may become infinitely large. While continuity does not imply differentiability, the converse is true:
differentiability ensures continuity. Hence, differentiability can be viewed as an improvement over continuity in what
mathematicians refer to as the smoothness of a function.
Differentiability implies Continuity. If $f$ is differentiable at the point $x = a$, then $f$ is continuous at $x = a$.
To prove this property, assume that $f$ is differentiable at $x = a$. Then
\[
\begin{aligned}
\lim_{x \to a} [f(x) - f(a)] &= \lim_{h \to 0} [f(a+h) - f(a)] \\
&= \lim_{h \to 0} \frac{h}{h} [f(a+h) - f(a)] && \text{Multiplying by one} \\
&= \lim_{h \to 0} h \cdot \frac{f(a+h) - f(a)}{h} \\
&= \lim_{h \to 0} h \cdot \lim_{h \to 0} \frac{f(a+h) - f(a)}{h} && \text{Limit law for products} \\
&= 0 \cdot f'(a) && \text{Definition of derivative} \\
&= 0
\end{aligned}
\]
Therefore, by the limit law for sums,
\[ \lim_{x \to a} f(x) = \lim_{x \to a} [f(x) - f(a)] + \lim_{x \to a} f(a) = 0 + f(a) = f(a) \]
since $f(a)$ is a constant. Thus $f$ is continuous at $x = a$.
Problem Set 2.6

Using the definition of a derivative, find the derivatives specified in Problems 1 to 10.
1. f ′ (−2) where f (x) = 3x − 2.

2. f ′ (3) where f (x) = 5 − 2x.

3. $f'(1)$ where $f(x) = -x^2$.

4. $f'(0)$ where $f(x) = x + x^2$.
5. $f'(-4)$ where $f(x) = \frac{1}{2x}$.
6. $f'(2)$ where $f(x) = \frac{1}{x+1}$.
7. $f'(-1)$ where $f(x) = x^3$.

8. $f'(2)$ where $f(x) = x^3 + 1$.
9. $f'(9)$ where $f(x) = \sqrt{x}$. Hint: Multiply by 1 (think conjugate).
10. $f'(5)$ where $f(x) = \sqrt{5x}$. Hint: Multiply by 1 (think conjugate).
Find the tangent line at the specified point and graph the tangent line and the corresponding function in Problems 11
to 20. Notice these functions are the same as those given in Problems 1 to 10.
11. f (x) = 3x − 2 at x = −2.
12. f (x) = 5 − 2x at x = 3.
13. $f(x) = -x^2$ at $x = 1$.
14. $f(x) = x + x^2$ at $x = 0$.
15. $f(x) = \frac{1}{2x}$ at $x = -4$.
16. $f(x) = \frac{1}{x+1}$ at $x = 2$.
17. $f(x) = x^3$ at $x = -1$.

18. $f(x) = x^3 + 1$ at $x = 2$.
19. $f(x) = \sqrt{x}$ at $x = 9$.
20. $f(x) = \sqrt{5x}$ at $x = 5$.
Determine at which values of x in Problems 21 to 26 that f is not differentiable. Explain briefly.
21. [Graph of $y = f(x)$ for $-1 \le x \le 1$]
22. [Graph of $y = f(x)$ for $-1 \le x \le 1$]

23. [Graph of $y = f(x)$ for $-1 \le x \le 1.5$]
24. [Graph of $y = f(x)$ for $-2 \le x \le 2$]
25. f (x) = |x − 2|
26. f (x) = 2|x + 1|

27. Let
\[
f(x) =
\begin{cases}
-2x & \text{if } x < 1 \\
\sqrt{x} - 3 & \text{if } x \ge 1
\end{cases}
\]
a. Sketch the graph of f .

b. Show that f is continuous, but not differentiable at x = 1.
28. Give an example of a function that is continuous on (−∞, ∞) but is not differentiable at x = 5.
29. A baseball is thrown upwards and its height at time t in seconds is given by
$H(t) = 10t - 16t^2$ meters
a. Find the velocity of the baseball after 2 seconds.

b. Find the time at which the baseball hits the ground
c. Find the velocity of the baseball when it hits the ground.

30. A ball is thrown directly upward from the edge of a cliff and travels in such a way that t seconds later, its
height above the ground at the base of the cliff is
\[ H(t) = -16t^2 + 40t + 24 \]
feet.
a. Find the velocity of the ball after three seconds.

b. When does the ball hit the ground, and what is its impact velocity?
c. When does the ball have a velocity of zero? What physical interpretation should be given to this
time?
31. Figure 2.22 illustrates some data relating enzyme activity to temperature in Celsius. The best fitting quadratic
equation to this data is given by
\[ A(x) = 11.8 + 19.1x - 0.2x^2 \]
Find A′ (50) and discuss its meaning.
32. An environmental study of a certain suburban community suggests that $t$ years from now, the average level of
carbon monoxide in the air can be modeled by the formula
\[ f(t) = 0.05t^2 + 0.1t + 3.4 \]
parts per million.
a. At what rate will the carbon monoxide level be changing with respect to time one year from now?
b. By how much will the carbon monoxide level change during the first year?
33. Perelson and colleagues∗ studied the viral load of HIV patients during antiviral drug treatment. They estimated
the viral load of the typical patient to be
\[ V(t) = 216e^{-0.2t} \]
particles per mL on day t after the drug treatment.
a. Estimate V ′ (2).
b. Describe the units of V ′ (2) and interpret this quantity.
34. Stock-recruitment data and a fitted Beverton-Holt function for Sockeye Salmon in Karluk Lake, Alaska was
shown in Figure 2.15. The fitted function was
\[ y = f(x) = \frac{x}{0.006x + 0.2} \]
where x is the current stock size and y is the number of recruits for the next year. To determine the number
of recruits produced per individual, consider the function
\[ y = g(x) = \frac{f(x)}{x} = \frac{1}{0.006x + 0.2} \]
a. Algebraically find g ′ (10).

b. Describe the units of g ′ (10), and discuss the meaning of this quantity.
∗ A.S. Perelson, A.U. Neumann, M. Markowitz, J.M. Leonard, D.D. Ho: HIV-1 Dynamics in vivo: virion clearance rate, infected cell
lifespan, and viral generation time (1996): Science, 271, 1582-1586 and A. S. Perelson, P. W. Nelson: Mathematical Analysis of HIV-1
Dynamics in vivo (1999): SIAM Review, 41, 3–44.

35. In Example 6 in Section 1.6, we developed the Michaelis-Menten model for the rate at which an organism
consumes its resource. For bacterial populations in the ocean, this model was given by
\[ f(x) = \frac{1.2078x}{1 + 0.0506x} \text{ micrograms of glucose per hour} \]
where $x$ is the concentration of glucose (micrograms per liter) in the environment. To determine the rate of
glucose consumption per microgram of glucose in the environment, consider the function
\[ y = g(x) = \frac{f(x)}{x} = \frac{1.2078}{1 + 0.0506x} \]
a. Algebraically compute g ′ (0) and g ′ (20).

b. Describe the meaning of the derivatives that you computed.
36. In Example 4 in Section 2.4, we found that the rate at which wolves kill moose can be modeled by
\[ f(x) = \frac{3.36x}{0.42 + x} \text{ moose killed per wolf per hundred days} \]
where $x$ is measured in number of moose per km$^2$. To determine the per-capita killing rate of moose, consider
the function
\[ y = g(x) = \frac{f(x)}{x} = \frac{3.36}{0.42 + x} \]
a. Algebraically compute g ′ (1) and g ′ (2).

b. Describe the meaning of the derivatives that you computed.

2.7. DERIVATIVES AS FUNCTIONS 233
2.7 Derivatives as Functions

Our notation $f'(a)$ for the derivative at the point $x = a$ suggests that $f'$ is a function. Indeed this is true.
Derivative as a Function. Let $f$ be a function. The derivative of $f$ is defined by
\[ f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \]
for all $x$ for which this limit exists.
Example 1. Finding Derivatives
Find the derivatives f ′ of the following functions f .

a. f (x) = 1
b. f (x) = x
c. $f(x) = x^2$
d. $f(x) = x^3$
e. Guess the derivative of $f(x) = x^n$ for $n$ a whole number.
Solution.
a. If $f(x) = 1$, then $f'(x) = 0$ for every $x$ (see Example 1 of Section 2.6). The derivative of a constant is 0.
b. Use the definition of the derivative of a function.
\[
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{x + h - x}{h} \\
&= \lim_{h \to 0} \frac{h}{h} \\
&= 1.
\end{aligned}
\]
The derivative of a linear function is the slope of that linear function. This generalizes our work in
Example 1 of Section 2.6.
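c. We use the definition. For a fixed number $x$:
\[
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} \\
&= \lim_{h \to 0} \frac{x^2 + 2hx + h^2 - x^2}{h} \\
&= \lim_{h \to 0} \frac{2hx + h^2}{h} \\
&= \lim_{h \to 0} (2x + h) \\
&= 2x.
\end{aligned}
\]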

d. Again, we use the definition. For a fixed number $x$:
\[
\begin{aligned}
f'(x) &= \lim_{h \to 0} \frac{f(x+h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{(x+h)^3 - x^3}{h} \\
&= \lim_{h \to 0} \frac{x^3 + 3hx^2 + 3h^2x + h^3 - x^3}{h} \\
&= \lim_{h \to 0} \frac{3hx^2 + 3h^2x + h^3}{h} \\
&= \lim_{h \to 0} (3x^2 + 3hx + h^2) \\
&= 3x^2.
\end{aligned}
\]
e. The above parts suggest that $f'(x) = nx^{n-1}$ for $n$ a whole number. Indeed, this turns out to be true, as
we will show in Chapter 3.
□
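The guess in part e can be tested numerically before it is proved. The Python sketch below is our own illustration: it compares a central-difference approximation (an assumption of the sketch, not the limit definition used above) with $nx^{n-1}$ at a few sample points. The helper names and the sample points are hypothetical.

```python
def central_difference(f, x, h=1e-5):
    """Approximate f'(x) by the symmetric quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def max_power_rule_error(n, points):
    """Largest gap between the numerical derivative of x**n and n * x**(n - 1)."""
    return max(
        abs(central_difference(lambda x: x**n, x) - n * x ** (n - 1))
        for x in points
    )


points = [-2.0, -0.5, 0.5, 1.0, 3.0]
errors = {n: max_power_rule_error(n, points) for n in range(1, 7)}
for n, err in errors.items():
    print(f"n = {n}: largest discrepancy = {err:.2e}")
```

Small discrepancies for every $n$ tried are consistent with the guess, although a numerical check of this kind is evidence and not a proof.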
It is worthwhile to designate a point on a continuous curve where the graph changes from falling to rising or
from rising to falling. If a point $x = a$ separates an interval over which a continuous function $f$ is increasing from an
interval over which $f$ is decreasing, then $(a, f(a))$ is a turning point. The same terminology is used if $f$ changes
from decreasing to increasing. This property is related to the derivative of $f$. Since $f'(a)$ corresponds to the slope
of the tangent line of $y = f(x)$ at the point $(a, f(a))$, the graphs of $y = f(x)$ and $y = f'(x)$ are intimately related, as
the following example illustrates.
Example 2. Mix and Match
Match the graphs of $y = f(x)$

[Three graphs of $y = f(x)$ for $-1 \le x \le 1$, labeled (a), (b), and (c)]

with the graphs of their derivatives $y = f'(x)$.

[Three graphs of $y = f'(x)$ for $-1 \le x \le 1$, labeled (i), (ii), and (iii)]
Solution.
a. Looking at the graph we see three turning points at approximately −0.6, 0, and 0.6. A turning point on
the graph corresponds to a place where the derivative is 0, so the derivative graph must be (iii). Also, note
that the slopes of the tangent lines on graph (a) go from positive to negative to positive
to negative as $x$ goes from −1 to 1. The only derivative graph consistent with this pattern is (iii).
b. The turning points for this graph are at approximately −0.4 and 0.4, and graph (i) shows the derivative
to be 0 at those points. Also, note that the slopes of the tangent lines go from negative to positive to
negative as $x$ goes from −1 to 1. The only derivative graph consistent with this pattern is (i).
c. There are no turning points on this graph, so the derivative graph should not cross the $x$-axis; the
derivative graph is (ii). The same answer follows by elimination, since (i) and (iii) have already been matched.
When given numerical data, we can estimate the derivative of the data using the definition of the derivative with
the smallest possible h value.
Example 3. Estimating the derivative using a table
In Example 1 of Section 2.1, we considered the population size of Mexico (in millions) in the early 1980s as
reported in the following table:
Year Population
1980 67.38
1981 69.13
1982 70.93
1983 72.77
1984 74.66
1985 76.60
Let $P(t)$ denote the population size $t$ years after 1980. That is, $t = 0$ corresponds to 1980. For example, we found
the population growth rate in 1980 to be about 1.75 million/yr, so we would write this as $P'(0) \approx 1.75$.
a. Estimate $P'(t)$ at $t = 0, 1, 2, 3, 4$ using $h = 1$ in the definition of a derivative.
b. In Problem 28, Section 1.5 you may have found the population size could be represented by the exponential
function $P(t) = 67.37(1.026)^t$. Approximate the derivative with an exponential function, and compare it
with $P(t)$. What do you notice?
Solution.
a. The estimates of $P'(t)$ are calculated from the data as indicated in the third column in Table 2.6.
Table 2.6: Estimates of the rate of population growth in Mexico

Year   t   Estimates of $P'(t)$ from data                         $P'(t) = 0.026P(t)$
1980   0   $\frac{P(1)-P(0)}{1} = 69.13 - 67.38 = 1.75$           1.75
1981   1   $\frac{P(2)-P(1)}{1} = 70.93 - 69.13 = 1.80$           1.80
1982   2   $\frac{P(3)-P(2)}{1} = 72.77 - 70.93 = 1.84$           1.84
1983   3   $\frac{P(4)-P(3)}{1} = 74.66 - 72.77 = 1.89$           1.89
1984   4   $\frac{P(5)-P(4)}{1} = 76.60 - 74.66 = 1.94$           1.94
1985   5   calculation not possible                               1.99

b. To approximate $P'(t)$ by an exponential function, we can look at the ratios
\[
\begin{aligned}
\frac{P'(1)}{P'(0)} &= \frac{1.80}{1.75} \approx 1.029 \\
\frac{P'(2)}{P'(1)} &= \frac{1.84}{1.80} \approx 1.022 \\
\frac{P'(3)}{P'(2)} &= \frac{1.89}{1.84} \approx 1.027 \\
\frac{P'(4)}{P'(3)} &= \frac{1.94}{1.89} \approx 1.026
\end{aligned}
\]
We notice these ratios are all about the same; in fact, the average is 1.026, which is the ratio for the
population function itself! We approximate $P'(t)$ by
\[ P'(t) \approx 1.75(1.026)^t \]
Comparing this function with the function for the population size, we see that
\[ \frac{P'(t)}{P(t)} \approx \frac{1.75(1.026)^t}{67.37(1.026)^t} \approx 0.026 \]
Thus
\[ P'(t) \approx 0.026P(t). \]
If we use this formula to calculate the derivative at times $t = 0, 1, \ldots, 5$, we see in Table 2.6 that the values
obtained agree, to two decimal places, with the estimates obtained directly from the data. The advantage of
having the formula is that we can calculate the derivative at $t = 5$ as well.
□
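The two numerical columns of Table 2.6 can be regenerated from the population data. The Python sketch below is our own illustration; the dictionary names are hypothetical, and the growth constant 0.026 is the value of $c$ obtained in part b.

```python
# Population of Mexico in millions, from the table in Example 3.
population = {1980: 67.38, 1981: 69.13, 1982: 70.93,
              1983: 72.77, 1984: 74.66, 1985: 76.60}
years = sorted(population)

# Estimate P'(t) with h = 1: the forward difference P(t + 1) - P(t).
forward = {y: round(population[y + 1] - population[y], 2) for y in years[:-1]}

# Model values from the relation P'(t) = c * P(t), with c = 0.026 from part b.
GROWTH_CONSTANT = 0.026
model = {y: round(GROWTH_CONSTANT * population[y], 2) for y in years}

for y in years:
    data_estimate = forward.get(y, "not possible")
    print(y, data_estimate, model[y])
```

The forward difference needs the following year's population, which is why no data estimate is available for 1985, while the model column can still be filled in.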
The solution to part b of the above example suggests that whenever $P(t)$ has the general exponential form
$P(t) = ab^t$, the derivative is a constant multiple of $P(t)$: that is, $P'(t) = cP(t)$, where for
the example considered above we obtained $c = 0.026$. This equation is an example of what is known as a differential
equation, as it relates a function to its derivative. In Chapter 3, we will indeed verify that the derivative of an
exponential function is a constant multiple of the exponential function. In the context of biological populations, the
equation $P'(t) = cP(t)$ implies that, for exponential growth, the population growth rate as a function of time is a
constant multiple of the population abundance as a function of time.
Notational Alternatives
Leibniz developed an alternative notation for the derivative $f'$ of a function $f$. This notation is inspired by the
following rewriting of the derivative. Let $\Delta x$ represent a small change in $x$. The change of $y = f(x)$ over the interval
$[x, x + \Delta x]$ is given by
\[ \Delta y = f(x + \Delta x) - f(x) \]
The average rate of change of $y = f(x)$ over the interval $[x, x + \Delta x]$ is given by
\[ \frac{\Delta y}{\Delta x} \]
Hence, the derivative of $f$ at $x$ is
\[ \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} \]
Leibniz represented this limit as
\[ \frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x} \]
where in some sense $dy$ corresponds to an “infinitesimal” change in $y$ and $dx$ represents an “infinitesimal” change in
$x$. Variations of this notation include
\[ f'(x) = \frac{dy}{dx} = \frac{df}{dx} = \frac{d}{dx} f(x) \]
To indicate the derivative at the point $x = a$ using Leibniz notation, we have to write down the rather cumbersome
expression
\[ \left.\frac{dy}{dx}\right|_{x=a} \]
Example 4. Using alternative derivative notations
Find the following derivatives:

a. $\left.\frac{dy}{dx}\right|_{x=-1}$ where $y = x^3$.
b. $\frac{df}{dx}$ where $f(x) = x^5$.
Solution.
a. In Example 1 we found that the derivative of $x^3$ is $3x^2$. Since $\left.\frac{dy}{dx}\right|_{x=-1}$ where $y = x^3$ is the derivative of
$x^3$ evaluated at $x = -1$, we have
\[ \left.\frac{dy}{dx}\right|_{x=-1} = 3x^2 \Big|_{x=-1} = 3. \]
b. In Example 1 we guessed that the derivative of $x^n$ is $nx^{n-1}$, in which case for $n = 5$ we have
\[ \frac{df}{dx} = \frac{d}{dx}(x^5) = 5x^4. \]
Example 5. Ant Biodiversity
Ecologist Nathan Sanders and colleagues examined the patterns of local ant species richness along an elevational
gradient in the Spring Mountains in Nevada.∗ The data illustrated in Figure 2.24 shows the number of species of
ants as a function of elevation (in km) in Kyle Canyon, Spring Mountains, Nevada.
A parabola which best fits this data is
\[ S = -10.3 + 24.9x - 7.7x^2 \text{ species} \]
where $x$ is elevation measured in kilometers.
a. Find $\frac{dS}{dx}$.
b. Identify the units of $\frac{dS}{dx}$ and interpret $\frac{dS}{dx}$.
Solution.
Figure 2.24: Number of species of ants (species richness as a function of elevation)
∗ N. Sanders, J. Moss, and D. Wagner, “Patterns of ant species richness along elevational gradients in an arid ecosystem,” Global
Ecology and Biogeography, 2003, 12:93–102
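a.
\[
\begin{aligned}
\frac{dS}{dx} &= \lim_{h \to 0} \frac{S(x+h) - S(x)}{h} \\
&= \lim_{h \to 0} \frac{[-10.3 + 24.9(x+h) - 7.7(x+h)^2] - [-10.3 + 24.9x - 7.7x^2]}{h} \\
&= \lim_{h \to 0} \frac{24.9h - 15.4xh - 7.7h^2}{h} \\
&= \lim_{h \to 0} (24.9 - 15.4x - 7.7h) \\
&= 24.9 - 15.4x.
\end{aligned}
\]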
b. The units of $\frac{dS}{dx}$ are species per kilometer. $\frac{dS}{dx}$ represents the rate of change of species richness with respect
to elevation. For elevations less than $24.9/15.4 \approx 1.6$ kilometers, $\frac{dS}{dx} > 0$, so species richness increases with
elevation. Consequently, for elevations of less than 1.6 kilometers, an ant-loving entomologist should move on up.
However, for elevations greater than 1.6 kilometers, $\frac{dS}{dx} < 0$ and an ant-loving entomologist should move on down.
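The sign change of $\frac{dS}{dx}$ in Example 5 can be confirmed with a few lines of code. The Python sketch below is our own illustration; the function names `richness` and `d_richness` are hypothetical, and the formulas are the fitted parabola and its derivative from the example.

```python
def richness(x):
    """Fitted number of ant species at elevation x (km), from Example 5."""
    return -10.3 + 24.9 * x - 7.7 * x**2


def d_richness(x):
    """Derivative dS/dx = 24.9 - 15.4 x, in species per kilometer."""
    return 24.9 - 15.4 * x


# Elevation at which the derivative changes sign.
peak = 24.9 / 15.4
print(f"dS/dx = 0 at x = {peak:.3f} km")

for x in [1.0, 1.5, 2.0, 2.5]:
    direction = "up" if d_richness(x) > 0 else "down"
    print(f"x = {x} km: dS/dx = {d_richness(x):+.2f}, richness rises going {direction}")
```

Below the sign change the derivative is positive and above it the derivative is negative, which matches the advice given to the entomologist.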
Mean value theorem

To understand what the derivative tells us about the shape of a function, we need the mean value theorem. The
proof of this theorem is left as a series of challenging exercises in the problem set.
Theorem 2.5. The Mean Value Theorem
Let $f$ be a function that is differentiable on the interval $[a, b]$. Then there exists $c$ in $[a, b]$ such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a} \]
Notice that the right-hand side of this equation is the average rate of change of f over the interval [a, b]. Hence,
the mean value theorem states that for a differentiable function on an interval [a, b], there is a point in the interval
where the instantaneous rate of change equals the average rate of change. Alternatively, we can think of the mean
value theorem in geometric terms. Recall that the right-hand side of this equation is the slope of the secant line passing through
the points (a, f (a)) and (b, f (b)). Hence, the mean value theorem asserts that there is a point in the interval such
that the slope of the tangent line at this point equals the slope of the secant line. A graphical representation of this
interpretation is given in Figure 2.25.

Figure 2.25: The Mean value theorem in action.
Example 6. Mean value theorem in action
Determine whether the mean value theorem applies for the following functions f on the specified intervals [a, b].
If the mean value theorem applies, then find c in [a, b] such that the statement of the mean value theorem holds.
a. $f(x) = x^2$ on the interval $[0, 2]$.
b. f (x) = |x| on the interval [−1, 1].
Solution.
a. Recall that $f'(x) = 2x$ for all $x$. Hence, $f$ is differentiable on the interval $[0, 2]$. Consequently, the mean
value theorem applies and we should be able to find the desired “c”. The average rate of change of $f$ on
$[0, 2]$ is given by
\[ \frac{f(2) - f(0)}{2 - 0} = \frac{2^2 - 0}{2} = 2 \]
Solving $f'(x) = 2x = 2$ yields $x = 1$. Hence, the instantaneous rate of change at $x = 1$ equals the average
rate of change over the interval $[0, 2]$. The following plot with $y = x^2$ in red, the tangent line in blue, and
the dashed line connecting $(0, f(0))$ to $(2, f(2))$ illustrates our calculations:
[Plot of $y = x^2$ on $[0, 2]$ with the tangent line at $x = 1$ and the dashed secant line through $(0, 0)$ and $(2, 4)$]

b. We need to find the derivative of $f(x) = |x|$. Since $f(x) = x$ for $x > 0$, we get
\[ \frac{f(x+h) - f(x)}{h} = \frac{x + h - x}{h} = \frac{h}{h} = 1 \]
whenever $h$ is sufficiently small but not equal to zero. Hence, $f'(x) = 1$ for $x > 0$. On the other
hand, since $f(x) = -x$ for $x < 0$, we get
\[ \frac{f(x+h) - f(x)}{h} = \frac{-x - h - (-x)}{h} = \frac{-h}{h} = -1 \]
whenever $h$ is sufficiently small but not equal to zero. Hence, $f'(x) = -1$ for $x < 0$.
So what happens at the point $x = 0$? Our calculations imply that $\lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = 1$ but
$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = -1$. Since these one-sided limits do not agree, $f$ is not differentiable at $x = 0$.
The hypotheses of the mean value theorem are therefore not satisfied, so its conclusion need not hold.
In fact, the average rate of change over the interval $[-1, 1]$ equals $\frac{|1| - |-1|}{1 - (-1)} = 0$, while $f'(x)$ equals
either 1 or −1 wherever it exists. Hence there is no instantaneous rate of change of $f$ that equals the average rate of change.
□
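When the equation $f'(c) = \frac{f(b) - f(a)}{b - a}$ cannot be solved by hand, the point $c$ can be located numerically. The Python sketch below is our own illustration using the bisection method; the function name `mean_value_point` is hypothetical, and the sketch assumes that $f'$ is continuous and that $f'(x)$ minus the average rate changes sign between $a$ and $b$.

```python
def mean_value_point(f, df, a, b, tol=1e-10):
    """Bisection search for c in [a, b] with df(c) equal to the average rate of change.

    Assumes df is continuous and df(x) - average changes sign on [a, b].
    """
    average = (f(b) - f(a)) / (b - a)

    def g(x):
        return df(x) - average

    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("df(x) - average does not change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid  # the sign change lies in the left half
        else:
            lo = mid  # the sign change lies in the right half
    return (lo + hi) / 2


# Part a of Example 6: f(x) = x^2 on [0, 2], with f'(x) = 2x.
c = mean_value_point(lambda x: x**2, lambda x: 2 * x, 0.0, 2.0)
print(f"c is approximately {c:.6f}")
```

The sketch should not be applied to part b, since $f(x) = |x|$ has no derivative at $x = 0$ and the continuity assumption on $f'$ fails there.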
Example 7. Foraging for food
Figure 2.26: A ruby-throated hummingbird (Archilochus colubris)
Hummingbirds are small birds weighing as little as three grams and having an energetically demanding lifestyle.
With its wings beating at rates of 80-100 beats per second, a hummingbird can lose 10-20% of its body weight
in one to two hours. To survive, hummingbirds require relatively large amounts of nectar from flowers. Therefore,
they spend much of the day flying between patches of flowers extracting nectar. As a hummingbird extracts nectar
in a patch, its energetic gain E(t) (in calories) increases with time t (in seconds). Figure 2.27 shows a hypothetical
graph of the energetic gain E(t) in one patch.
a. Approximate the average rate of energy intake over the interval [0, 60].
b. Use the geometric interpretation of the mean value theorem to estimate the time when the instantaneous
rate of energy intake equals the average rate of energy intake.

[Graph: E (vertical axis, 0 to 1000) against t (horizontal axis, 0 to 60).]
Figure 2.27: Energy gains over time.
Solution.
a. Since E(0) = 0 and E(60) ≈ 1000, we obtain
\[ \text{Average rate of energy intake} \approx \frac{1000}{60} \approx 16.7 \]
calories per second.
b. Graphing the line connecting the points (0, 0) and (60, 1000) yields
[Graph: the curve y = E(t) together with the red line segment connecting (0, 0) and (60, 1000).]
The slope of this line is approximately 16.7. To estimate the time at which $E'(t) = \frac{1000}{60}$, we can place
a straightedge on top of the red line segment and slowly slide it upwards, keeping it parallel to the red
segment. If we slide it upwards until the straightedge is tangent to the curve y = E(t), then we obtain
[Graph: the curve y = E(t), the red segment, and a parallel blue tangent segment.]

where the blue segment shows the location of the tangent line at t ≈ 20 seconds. Hence, the instantaneous
rate of energy intake equals the average rate of energy intake at t ≈ 20.
It is worth noting from the last figure that $E'(t)$ is above the average rate of energy intake for t < 20 and below
the average for t > 20. Hence, as we explore in more detail in Section [x-ref], the hummingbird may consider leaving
the patch once its instantaneous rate of energy intake has dropped to a relatively low level.
Derivative and Graphs

Using the mean value theorem, we can prove the following two facts about the relationship of the sign of the derivative
f ′ to the graph of y = f (x).
Increasing-decreasing. Let f be a function that is differentiable on the interval (a, b). If $f'(x) > 0$ for all
x in (a, b), then f is increasing on (a, b). If $f'(x) < 0$ for all x in (a, b), then f is
decreasing on (a, b).
To prove these properties, assume that $f' > 0$ on (a, b). Take any two points $x_2 > x_1$ in the interval (a, b). By
the mean value theorem, there exists c in $[x_1, x_2]$ such that
\[ f'(c) = \frac{f(x_2) - f(x_1)}{x_2 - x_1} \]
Since $f'(c) > 0$, we have
\[ \frac{f(x_2) - f(x_1)}{x_2 - x_1} > 0. \]
Since $x_2 - x_1 > 0$, we have $f(x_2) - f(x_1) > 0$. Equivalently, $f(x_2) > f(x_1)$. Therefore, f is increasing on the interval
(a, b).
The case $f' < 0$ on (a, b) can be proven similarly and is left as an exercise in the problem set.
Example 8. Identifying signs of f ′
Let the graph of y = f (x) be given by Figure 2.28.
[Figure 2.28: graph of y = f(x); horizontal axis marked from x = −3 to x = 3.]
For the interval [−3, 3], where is the derivative positive and where is it negative?
Solution. Since the graph is increasing on the intervals (−3, −2) and (0, 1), $f' > 0$ on these intervals. Since the
graph is decreasing on the intervals (−2, 0) and (1, 3), $f' < 0$ on these intervals. □

Example 9. From f ′ to f
Let the graph of y = f ′ (x) be given by the graph in Figure 2.29.
Figure 2.29: Graph of a derivative
Sketch a possible graph for y = f (x).
Solution. We find the graph of f by looking at the intervals on which $f'(x)$ is positive or negative,
as shown in Figure 2.30.
a. On (−∞, 0) the graph of the derivative is positive, so the graph of f is rising (slope positive).
b. On (0, 2) the graph of the derivative is negative, so the graph of f is falling.
c. On (2, ∞) the graph of the derivative is positive, so the graph of f is rising.
Figure 2.30: Construction of the graph of a function f given its derivative $f'$
Furthermore, note that the derivative crosses the x-axis at x = 0 and x = 2, so the graph of y = f(x) must have
turning points there, as shown here:

A possible graph of y = f (x) is shown:
Problem Set 2.7

Use the definition of a derivative to find f ′ (x) for the functions in Problems 1 to 8.
1. $f(x) = 8$
2. $f(x) = 3x - 2$
3. $f(x) = -x^2$
4. $f(x) = x + x^2$
5. $f(x) = x^4$
6. $f(x) = x^3 - x$
7. $f(x) = \frac{1}{x}$
8. $f(x) = \frac{1}{2x}$
Use the derivatives found in Problems 1 to 8 to find the values requested in Problems 9 to 16.


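9. $\left.\frac{dy}{dx}\right|_{x=-2}$ where $y = 8$
10. $\left.\frac{dy}{dx}\right|_{x=-2}$ where $y = 3x - 2$
11. $\left.\frac{dy}{dx}\right|_{x=4}$ where $y = -x^2$
12. $\left.\frac{dy}{dx}\right|_{x=4}$ where $y = x + x^2$
13. $\left.\frac{dy}{dx}\right|_{x=2}$ where $y = x^4$
14. $\left.\frac{dy}{dx}\right|_{x=2}$ where $y = x^3 - x$
15. $\left.\frac{dy}{dx}\right|_{x=10}$ where $y = \frac{1}{x}$
16. $\left.\frac{dy}{dx}\right|_{x=10}$ where $y = \frac{1}{2x}$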
Find the point at which the instantaneous rate of change equals the average rate of change over the
intervals specified in Problems 17 to 22. Also, provide a sketch that illustrates this relationship.
17. $f(x) = 8$ over the interval [−5, 5]
18. $f(x) = 3x - 2$ over the interval [3, 4]
19. $f(x) = -x^2$ over the interval [−1, 1]
20. $f(x) = x + x^2$ over the interval [0, 1]
21. $f(x) = \frac{1}{x}$ over the interval [1, 2]
22. $f(x) = \frac{1}{2x}$ over the interval [1, 4]
Mix and match the graphs in Problems 23 to 28 with the graphs labeled (i) to (vi) which are the derivative graphs.
23.

24.
25.
26.

27.
28.
(i)

(ii)
(iii)
(iv)

(v)
(vi)
For each of the functions given in Problems 29 to 34, find intervals for which f is increasing and the intervals for
which f is decreasing.
29. $f(x) = x^2 - x + 1$
30. $f(x) = 5 - x^2$
31. $f(x) = x^3 + x$
32. $f(x) = 8 - x^3$
33. Let f be the function for which the graph of the derivative y = f ′ (x) is given by
[Graph of y = f′(x); horizontal axis marked at x = −3, −2, −1.]

34. Let f be the function for which the graph of the derivative y = f ′ (x) is given by
[Graph of y = f′(x); horizontal axis marked from x = −2 to x = 2.]
35. For the graph y = f(x) given in Problem 33, estimate all values of c in [−3, 0] such that
\[ \frac{f(-3) - f(0)}{3} = f'(c) \]
36. For the graph y = g(x) given in Problem 34, estimate all values of c in [−3, 2] such that
\[ \frac{f(-2) - f(2)}{4} = f'(c) \]
In each of Problems 37 to 40, the graph of a function $f'$ is given. Draw a possible graph of f.
37.
38.

39.
40.
41. Let f be differentiable on the interval [a, b]. Use the mean value theorem to prove if f ′ < 0 on [a, b], then f is
decreasing on [a, b].
42. Rolle’s theorem Let f be differentiable on [a, b]. Assume f (a) = f (b) = 0. Prove that there exists a c in [a, b]
such that f ′ (c) = 0.
43. Use Rolle’s Theorem to prove the mean value theorem.
44. A baseball is thrown upwards and its height at time t in seconds is given by
$H(t) = 10t - 16t^2$ meters
a. Find the velocity of the baseball after t seconds.

b. Find the time at which the velocity of the ball is 0.
c. Find the height of the ball at which the velocity is 0.
45. To study the response of nerve fibers to a stimulus, a biologist models the sensitivity, S, of a particular group
of fibers by the function
\[ S(t) = \begin{cases} t & \text{for } 0 \le t \le 3 \\ \dfrac{9}{t} & \text{for } t > 3 \end{cases} \]
where t is the number of days since the excitation began.

a. Over what time period is sensitivity increasing? When is it decreasing?

b. Graph S ′ (t).
46. During the time period 1905-1940, hunters virtually wiped out all large predators on the Kaibab Plateau near
the Grand Canyon in northern Arizona. The data for the deer population, P over this period of time is given.
Year Deer Population
1905 4,000
1910 9,000
1915 25,000
1920 65,000
1924 100,000
1925 60,000
1926 40,000
1927 37,000
1928 35,000
1929 30,000
1930 25,000
1931 20,000
1935 18,000
1939 10,000
a. Estimate P ′ (t) for 1905 ≤ t ≤ 1939

b. Graph and interpret P ′ (t).
47. In 1913, Carlson studied a growing culture of yeast (see Section 6.1). Below we recopy the table of population
densities N (t) at one hour intervals.
Time Population Time Population Time Population

0 9.6 6 174.6 12 594.8
1 18.3 7 257.3 13 629.4
2 29.0 8 350.7 14 640.8
3 47.2 9 441.0 15 651.1
4 71.1 10 513.3 16 655.9
5 119.1 11 559.7 17 659.6
a. Estimate N ′ (t) for 0 ≤ t ≤ 17.

b. Graph N ′ (t) and briefly interpret this graph.
48. Our ruby-throated hummingbird has entered another patch of flowers, and the energy she is getting as a function
of time in the patch is plotted below:
[Graph: energy gained in calories (vertical axis, 0 to 2200) against time in seconds (horizontal axis, 0 to 60).]

a. Find the average energy intake rate.

b. Estimate at what time the instantaneous energy intake rate equals the average intake rate.


DEFINITIONS
Section 2.1
Average rate of change, p. 143
Instantaneous rate of change, p. 144
Section 2.2
Limit (informal), p. 154
Right-hand limit, p. 161
Left-hand limit, p. 161
Floor function, p. 161
Limit (formal), p. 164
Section 2.3
Continuity at a point, p. 177
Continuity on an interval, p. 180
Section 2.4
Infinity, p. 190
Horizontal asymptote (informal), p. 190
Vertical asymptote (informal), p. 193
Infinite limits (informal), p. 197
Section 2.5
Sequential limit, p. 202
Increasing sequence, p. 210
Decreasing sequence, p. 210
Section 2.6
Derivative at a point, p. 220
Differentiable function, p. 220
Tangent line, p. 222
Instantaneous rate, p. 224
Section 2.7
Derivative of a function, p. 233
IMPORTANT IDEAS AND THEOREMS
Section 2.1
Tangent and secant lines, p. 145
Section 2.2
Evaluating limits, p. 154
Matching limits, p. 161
Section 2.3
Properties of limits (limit laws), p. 174
Limits of polynomials and rational functions, p. 175
Composition limit law, p. 179
Continuity laws, p. 179
Continuity of elementary functions, p. 179
Intermediate value theorem, p. 181
Section 2.4
Understanding functional behavior using asymptotes, p. 190
Section 2.5
Sequential continuity, p. 204

Limits of a difference equation, p. 207

Monotone convergence theorem, p. 210
Section 2.6
Instantaneous rate as a derivative, p. 224
Continuity does not imply differentiability, p. 227
Differentiability implies continuity, p. 228
Section 2.7
Notational alternatives for derivatives, p. 236
Mean value theorem, p. 238
Relationship between graphs and derivatives, p. 242
Increasing and decreasing, p. 242
IMPORTANT APPLICATIONS
Section 2.1
Rate of change of CO2
Section 2.2
Type I functional response (wolf with prey)
Section 2.3
Equilibrium of a salmon population.
Section 2.4
Cell’s response to an external stimulus
Section 2.5
Sockeye salmon dynamics
Section 2.6
Enzyme activity
Section 2.7
Ant Biodiversity
Hummingbirds foraging for food

Problem Set 2.8

Find the point at which the instantaneous rate of change equals the average rate of change over the
intervals specified in Problems 1 to 2. Also, provide a sketch that illustrates this relationship.
1. $f(x) = x^4$ over the interval [−2, −1]

2. $f(x) = x^3 - x$ over the interval [−1, 2]
Use the definition of derivative to find $f'(x)$ for the functions in Problems 3 to 8.
3. $f(x) = 10$
4. $f(x) = 5 - 7x$
5. $f(x) = x - 3x^2$
6. $f(x) = \frac{1}{x^2}$
7. $f(x) = x^2 - 2x + 1$
8. f (x) = 14 x1/4
9. Find the average rate of change of $f(x) = x^2 - 2x + 1$ on [1, 3] and the instantaneous rate of change at x = 1.
10. Consider $f(x) = 9 - x^2$ and $g(x) = \ln x$.
a. Graph y = f(x) and y = g(x) on the same coordinate axes.
b. Plot the point P (2, ln 2) on the graph of g. Graphically, estimate the position of the line tangent to
g at the point P .
c. Plot the point Q(2, 5) on the graph of f . Algebraically find the line tangent to f at the point Q. Use
the equation of this tangent line to show that it “kisses” the graph of f at the point Q.
11. Find $\lim_{x \to 4} \frac{16 - x^2}{x - 4}$ with the suggested methods.
a. Graphically
b. Using a table of values
c. By algebraic simplification
d. By using the informal definition of limit
e. Using technology

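12. Let
\[ f(x) = \begin{cases} 2 - 2x & \text{if } x < 2,\ x \ne 1 \\ \dfrac{-1}{x - 4} & \text{if } x > 2 \text{ and } x \ne 5 \end{cases} \]
Find the requested limits.
a. $\lim_{x \to 1} f(x)$
b. $\lim_{x \to 2^-} f(x)$
c. $\lim_{x \to 4^+} f(x)$
d. $\lim_{x \to 4} f(x)$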
e. Is f continuous at x = 7? If not, is this reparable?
13. Evaluate the sequential limits, if they exist.
a. $\lim_{n \to \infty} \frac{2n^3 + 4n}{1 - 2n^2 - 5n^3}$
b. $\lim_{n \to \infty} a_n$ where $a_1 = 1$ and $a_{n+1} = \frac{1}{a_n}$
c. $\lim_{n \to \infty} a_n$ where $a_1 = 14$ and $a_{n+1} = f(a_n)$ for $f(x) = \frac{x}{2}$
In each of Problems 14 to 16, the graph of a function $f'$ is given. Draw a possible graph of f.
14.
15.
16.
17. An environmental study of a certain suburban community suggests that t years from now, the average level of
CO2 in the air can be modeled by the formula
$q(t) = 0.05t^2 + 0.1t + 3.4$
parts per million.

a. At what rate will the CO2 level be changing with respect to time one year from now?
b. By how much will the CO2 level change in the first year?
c. By how much will the CO2 level change over the next (second) year?
18. The canopy height (in meters) of a tropical elephantgrass (Pennisetum purpureum) is modeled by
$h(t) = -3.14 + 0.142t - 0.0016t^2 + 0.0000079t^3 - 0.0000000133t^4$
where t is the number of days after mowing.

a. Sketch the graph of h(t).
b. Sketch the graph of h′ (t)
c. Approximately when was the canopy height growing most rapidly? Least rapidly?
19. The concentration C(t) of a drug in a patient’s bloodstream is given by
t in minutes 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
C in mg/ml 0 0.2 0.4 0.6 0.8 0.9 1.0 0.9 1.0 0.9 0.7
a. Estimate C ′ (t) for t = 0, 0.1, . . . 0.9

b. Sketch C ′ (t) and interpret it.
20. Suppose that systolic blood pressure of a patient t years old is modeled by
P (t) = 39.73 + 23.5 ln(0.97t + 1)
for 0 ≤ t ≤ 65, where P (t) is measured in millimeters of mercury.

a. Sketch the graph of y = P (t)
b. Using the graph in part a, sketch the graph of y = P ′ (t)
2.9 Group Projects

Working in small groups is typical of most work environments, and learning to work with others to communicate
specific ideas is an important skill. Work with three or four other students to submit a single report based on each
of the following questions.
Project 2A: A simple model of gene selection

One of the simplest problems in population genetics is to consider what happens to a particular version of a gene,
where each version is referred to as an allele, that is being selected for or against because it confers some advantage
or disadvantage to carriers of that allele. Examples of disadvantageous or deleterious alleles are those associated with
various genetic diseases such as sickle cell anemia, hemophilia, Tay-Sachs disease and so on. Most of our genes come
in pairs of alleles, and if one of them is deleterious then the effect of the allele may often be partially or fully masked
by the other allele in the pair. If we have a double dose of the deleterious allele then the disease is expressed in its
severest form. If we have a single dose of the deleterious allele (i.e. we have one normal and one deleterious allele)
then, depending on the disease, a milder version of the disease may be expressed (partial masking) or the individual
is completely healthy (full masking). In this latter case, the individual is said to be a carrier for the disease (e.g.
hemophilia).
On the other hand alleles may confer a strong advantage to an organism that carries them. For example, if an
insect carries an allele of a particular gene that allows it to detoxify a pesticide or a virus carries an allele that allows
it to neutralize an otherwise effective drug, then we say that these pests and pathogens carry alleles of genes that
confer resistance to the chemicals that would otherwise control or kill them.

Finally, sometimes individuals who carry two different alleles of a particular gene are better off than individuals
that have two copies of the same allele, irrespective of which allele it is. This condition is referred to by biologists as
heterozygous superiority and is associated with the phenomenon called hybrid vigor. For example, individual humans
or other vertebrates are going to be better at fighting diseases if they have different alleles of genes associated with
antibody production by their immune systems.
Population geneticists have devised a simple model that allows them to assess what happens to such alleles.
The form of this model is $p_{n+1} = f(p_n)$ where $p_n$ represents the frequency of the allele in question in the population
in the nth generation: if $p_n = 1$ then every individual in the population has a double dose of the allele in question, if
$p_n = 0$ then no one has even a single dose of this allele, and if $p_n = 0.5$ some individuals don't have the allele, some
have a single and some a double dose of the allele, but the total frequency in the population of this allele is 1/2.
In this simple allele frequency model, the specific form of f(p) is:
\[ f(p) = \frac{p\,(ap + (1 - p))}{ap^2 + 2p(1 - p) + b(1 - p)^2} \]
where a ≥ 0 and b ≥ 0 are constants that determine whether the allele in question confers an overall advantage,
disadvantage, or is associated with heterozygote superiority.
In this project, investigate the value of the equilibria that arise for various combinations of a and b, paying
particular attention to whether a and/or b are greater than or less than 1. Interpret the various cases in terms of the
limiting values of the sequence of frequencies pn , as well as how these cases correspond to classification of the alleles
as advantageous, deleterious, or associated with heterozygote superiority. Find specific cases in the literature, or by
searching the web, to illustrate these three phenomena.
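As a starting point for the numerical side of this project, the model can be iterated directly. The following is a minimal sketch, assuming plain Python; the function names, the starting frequency 0.1, the 200 generations, and the three (a, b) pairs are illustrative choices of ours, not values given in the project.

```python
# Iterating the allele frequency model p_{n+1} = f(p_n).
# Function names and parameter values are illustrative assumptions.

def next_frequency(p, a, b):
    """One generation of the model: returns f(p) for constants a and b."""
    numerator = p * (a * p + (1 - p))
    denominator = a * p**2 + 2 * p * (1 - p) + b * (1 - p)**2
    return numerator / denominator

def iterate(p0, a, b, generations=200):
    """Apply f repeatedly, starting from the frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, a, b)
    return p

# One sample choice for each combination of a, b above or below 1.
for a, b in [(0.5, 0.5), (2.0, 0.5), (0.5, 2.0)]:
    print(a, b, round(iterate(0.1, a, b), 6))
```

Running the loop for other starting frequencies and other values of a and b gives numerical evidence about the limiting value of $p_n$, which can then be compared with the equilibria found by solving p = f(p).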
Project 2B: Fibonacci rabbit growth when death is included

In the rabbit population growth process proposed by Fibonacci (Example 10, Section 2.5), the assumption is that all
the rabbits live forever. As an alternative, let’s assume that only a proportion s of rabbits alive each month survive
to the next month (that is independent of how old they are or what gender they are) and, of those that survive, only
a proportion p of the females from the month before produce a litter that always consists of r male-female pairs.
Investigate the growth of this population by carrying out the following tasks:
a. Derive an equation for the rate at which the proportion of pairs increases from month to month as a
function of the three population parameters 0 < s ≤ 1, 0 < p ≤ 1, and r a positive integer. Note that
getting the correct equation can be a little tricky, so use a diagram as illustrated in Fig. 2.18 to help
you. In particular, starting out with a suitable number of new-born pairs, draw diagrams for the cases
(s, p, r) = (1/2, 1, 1), (1, 1/2, 1), and (1, 1, 2) and use these to help you construct a general expression for
$a_n = f(a_{n-1})$ that contains the three parameters in question.
b. What value must the equilibrium solution be for the population to be neither growing nor declining in the
limit? For the case r = 1, express s as a function of p such that the population is neither growing nor declining.
Hence, in the square of the positive quadrant of the p-s plane defined by 0 ≤ p ≤ 1 and 0 ≤ s ≤ 1, shade
all points where the rabbit population is growing and all points where it is declining.
c. Repeat the above exercise for the case r = 2 and hence make a general statement about how the two
shaded areas change as r increases.


Chapter 3
Derivative Rules and Tools
3.1 Derivatives of Polynomials and Exponentials, p. 263
3.2 Product and Quotient Rules, p. 274
3.3 Chain Rule and Implicit Differentiation, p. 288
3.4 Trigonometric Derivatives, p. 304
3.5 Linear Approximation, p. 312
3.6 Higher-Order Derivatives and Approximations, p. 323
3.7 l’Hôpital’s Rule, p. 337
Figure 3.1: The North American Bison (Bison bison bison) is one of two species of bison that roamed the great
plains. Their numbers have dropped from tens of millions to thousands over the last 200 years.
PREVIEW
If you are feeling a little apprehensive about taking calculus, it is quite natural—but persevere and you will come
to see its beauty and be awed by its power.
I loved history, but still loved science, and thought maybe you don’t need quite as much calculus to be a
biology major. Elizabeth Moon, Science Fiction Writer, b. 1945.

Until this chapter we have only been able to compute derivatives by directly appealing to the definition of the
derivative. As you may have already noticed, computing derivatives in this manner quickly becomes tedious. In this
chapter, we provide the rules and tools that allow us to quickly compute the derivative of any imaginable function.
Learn these rules. In the words of Colin Adams, Joel Hass, and Abigail Thompson,∗
Know these backwards and forwards. They are to calculus what “Don’t go through a red light” and
“Don’t run over a pedestrian” are to driving.
The first three sections of this chapter provide these basic rules for calculating derivatives, with the next section
focusing on the important trigonometric functions. The last three sections extend these basic tools and
apply them to a variety of problems in biology and the life sciences. One application, for example, pertains
to predicting the growth of the Yellowstone bison population. In particular, 200 years ago it is estimated that 20 to
70 million bison roamed the great plains, but by the early 1900s the numbers had dwindled to thousands. In 1894
it became illegal to kill bison in Yellowstone Park, and the American Bison Society was formed in 1905 to save the
bison. In Section 3.5 we travel back in time to 1908 and use linear approximation to predict the growth of bison in
Yellowstone Park. Other applications that we encounter include growth of fetal hearts, the clearance of HIV viral
particles from the body, Northwestern crows breaking whelk shells, working with dose-response curves in the context
of administering drugs, calculating the likelihoods that insect pests escape parasitism, and calculating mortality rates
due to airborne diseases.
∗ How to Ace Calculus: The Streetwise Guide, New York: W. H. Freeman and Company, 1998.

3.1 Derivatives of Polynomials and Exponentials

Derivatives of $y = x^n$
In Example 1 in Section 2.7, we proved that
\[ \frac{d}{dx} x = 1, \qquad \frac{d}{dx} x^2 = 2x, \qquad \frac{d}{dx} x^3 = 3x^2 \]
and we guessed that $\frac{d}{dx} x^n = nx^{n-1}$. We will now prove this powerful result, known as the power rule:
Power rule. For any real number $n \ne 0$,
\[ \frac{d}{dx} x^n = nx^{n-1} \]
At this point, we are only equipped to prove the power rule for natural numbers n. Later on, we shall prove
the general power rule. The proof when n is a natural number involves the binomial expansion of $(a + b)^n$, whose
first three terms and last term are
\[ (a + b)^n = a^n + na^{n-1}b + \frac{n(n-1)}{2} a^{n-2}b^2 + \cdots + b^n. \]
(If you don't remember this expansion, look it up on the web.) Now to the proof for n a natural number:
Proof. If $f(x) = x^n$, then from the binomial theorem
\[ f(x + h) = (x + h)^n = x^n + nx^{n-1}h + \frac{n(n-1)}{2} x^{n-2}h^2 + \cdots + h^n \]
From the definition of derivative we have
\begin{align*}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} \\
&= \lim_{h \to 0} \frac{\left[ x^n + nx^{n-1}h + \frac{n(n-1)}{2} x^{n-2}h^2 + \cdots + h^n \right] - [x^n]}{h} \\
&= \lim_{h \to 0} \frac{nx^{n-1}h + \frac{n(n-1)}{2} x^{n-2}h^2 + \cdots + h^n}{h} \\
&= \lim_{h \to 0} \frac{h \left[ nx^{n-1} + \frac{n(n-1)}{2} x^{n-2}h + \cdots + h^{n-1} \right]}{h} \\
&= \lim_{h \to 0} \left[ nx^{n-1} + \frac{n(n-1)}{2} x^{n-2}h + \cdots + h^{n-1} \right] \\
&= nx^{n-1}
\end{align*}
Note that if n = 0, then $f(x) = x^n = x^0 = 1$, so $f'(x) = 0$.
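The power rule can also be checked numerically by comparing a difference quotient with $nx^{n-1}$. The following is a minimal sketch, assuming plain Python; the helper names, the point x = 2, and the step size h = 10^{-6} are our own illustrative choices.

```python
# Compare the difference quotient of x^n with the power rule n * x^(n - 1).
# Helper names, the point x = 2 and the step h are illustrative choices.

def difference_quotient(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def power_rule(n, x):
    """Derivative of x^n predicted by the power rule."""
    return n * x ** (n - 1)

x = 2.0
h = 1e-6
for n in range(1, 6):
    approx = difference_quotient(lambda t: t ** n, x, h)
    exact = power_rule(n, x)
    print(n, approx, exact)
```

The two columns agree to several decimal places; the small gap comes from the terms of the binomial expansion that still contain a factor of h.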

Example 1. Using the power rule
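Find
a. $\left.\frac{d}{dx} x^5\right|_{x=2}$
b. $\frac{d}{dQ} Q^{29}$
Solution.
a.
\[ \left.\frac{d}{dx} x^5\right|_{x=2} = \left. 5x^4 \right|_{x=2} = 5 \cdot 2^4 = 80 \]
b.
\[ \frac{d}{dQ} Q^{29} = 29Q^{28} \]
□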
Derivatives of Sums, Differences, and Scalar Multiples

The limit laws from Chapter 2 allow us to quickly compute the derivatives of a sum, difference, or scalar multiple
whenever we know the derivatives for f and g.
Elementary Differentiation Rules. Let f and g be differentiable at x. Let c be a constant. Then
Sum: $(f + g)'(x) = f'(x) + g'(x)$
Difference: $(f - g)'(x) = f'(x) - g'(x)$
Scalar multiple: $(cf)'(x) = c\,f'(x)$
In other words, the derivative of a sum is the sum of the derivatives, the derivative of a difference is the difference
of the derivatives, and the derivative of a scalar multiple is the scalar multiple of the derivative.
Combining these elementary differentiation rules with the power rule allows us to differentiate any polynomial.
Example 2. Using Differentiation Rules
Let $f(x) = x^3 + 3x^2 + 10$.

a. Find f ′ . Justify each step of your differentiation.
b. Determine on what intervals f is increasing and on what intervals f is decreasing.
Solution.
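a.
\begin{align*}
\frac{d}{dx}(x^3 + 3x^2 + 10) &= \frac{d}{dx} x^3 + \frac{d}{dx} 3x^2 + \frac{d}{dx} 10 && \text{Sum rule} \\
&= \frac{d}{dx} x^3 + 3 \frac{d}{dx} x^2 + \frac{d}{dx} 10 && \text{Scalar multiple rule} \\
&= 3x^2 + 6x + 0 && \text{Power rule}
\end{align*}
Hence
\[ f'(x) = 3x^2 + 6x \]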

b. To determine where f is increasing and where f is decreasing, we need to find where $f' > 0$ and $f' < 0$,
respectively. Since
\[ f'(x) = 3x^2 + 6x = 3x(x + 2) \]
we look at the signs of the factors and of the product on a number line.
On the interval (−∞, −2), f is increasing since $f'(x) > 0$; on (−2, 0), f is decreasing since $f'(x) < 0$; on
(0, ∞), f is increasing again since $f'(x) > 0$. Graphing y = f(x) confirms these calculations.
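The sign analysis in part b can be reproduced by evaluating $f'$ at one test point in each interval between its zeros. The following is a minimal sketch, assuming plain Python; the test points −3, −1, and 1 are our own choices, and any point inside each interval would serve.

```python
# Sign chart for f'(x) = 3x^2 + 6x = 3x(x + 2).
# f' vanishes at x = -2 and x = 0, so its sign is constant on each interval
# between those points; one test point per interval is enough.

def f_prime(x):
    return 3 * x**2 + 6 * x

test_points = {"(-inf, -2)": -3.0, "(-2, 0)": -1.0, "(0, inf)": 1.0}
for interval, x in test_points.items():
    slope = f_prime(x)
    behavior = "increasing" if slope > 0 else "decreasing"
    print(interval, slope, behavior)
```

The same approach works for any polynomial once the zeros of its derivative are known.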
Example 3. Growth of a fetal heart
In 1992, a team of cardiologists determined how the left ventricular length, L (in cm), of the heart in a fetus
increases from 18 weeks until birth.∗ They found
\[ L(t) = -2.318 + 0.2356t - 0.002674t^2 \]
where t is the age of the fetus (in weeks). Here t = 18 means “at the end of week 18.”
a. Find L′ (t) for 18 ≤ t ≤ 38
b. Discuss and interpret the units of L′ (t)
c. During which week is the ventricular length growing most rapidly and what is the associated rate?
Solution.
a.
\begin{align*}
\frac{dL}{dt} &= \frac{d}{dt}(-2.318) + 0.2356 \frac{d}{dt}(t) - 0.002674 \frac{d}{dt}(t^2) && \text{Elementary differentiation rules} \\
&= 0 + 0.2356 \cdot 1 - 0.002674 \cdot 2t && \text{Power rule} \\
&= 0.2356 - 0.005348\,t
\end{align*}
∗ Tan, J., Silverman, N., Hoffman, J., Villegas, M. and Schmidt, K., “Cardiac dimensions determined by cross-sectional
echocardiography in the normal human fetus from 18 weeks to term,” 1992. American Journal of Cardiology, 70: 1459–1497.

Figure 3.2: Fetal echocardiogram reveals a four-chamber heart correctly oriented in the left chest. Source: Fetal
Heart Screening Examination by Dr. Vladimir Nikolaev, Batumi, Georgia.
b. The units of L′ (t) are cm per week. L′ (t) describes the rate at which the ventricular length is growing.
c. Since L′ (t) is a linear function with negative slope, its largest value on the interval [18, 38] is at t = 18
and its smallest value on this interval is at t = 38. In particular,
L′ (18) = 0.2356 − 0.005348 × 18 = 0.139336 cm/week
and
L′ (38) = 0.2356 − 0.005348 × 38 = 0.032376 cm/week.
Hence, the ventricular length in the last 20 weeks of pregnancy is increasing most rapidly at the beginning
of this 20 week period and growing least rapidly at the time of birth.
□
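The values in part c can be confirmed by tabulating $L'(t)$ at the end of each week from 18 to 38. The following is a minimal sketch, assuming plain Python; the function name `L_prime` and the weekly grid are our own choices.

```python
# Tabulate L'(t) = 0.2356 - 0.005348 t (cm per week) for weeks 18 through 38
# and locate the largest and smallest growth rates on that grid.

def L_prime(t):
    return 0.2356 - 0.005348 * t

rates = {t: L_prime(t) for t in range(18, 39)}
fastest = max(rates, key=rates.get)
slowest = min(rates, key=rates.get)
print(fastest, round(rates[fastest], 6))
print(slowest, round(rates[slowest], 6))
```

Because $L'(t)$ is linear with negative slope, the extreme values necessarily fall at the endpoints of the interval, which is what the table shows.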
In addition to finding derivatives of all polynomials, we can use the power rule and the scalar multiplication rule
to find derivatives of all scaling laws.
Example 4. Back to lifting weights
In Example 6 in Section 2.4 [x-ref], we modeled the amount an Olympic weightlifter could lift as
\[ L = 20.15 M^{2/3} \text{ kilograms} \]
where M is the body mass in kilograms of the weightlifter. Find and interpret $\frac{dL}{dM}$ for M = 90.
Solution. To compute the derivative,
\begin{align*}
\left.\frac{dL}{dM}\right|_{M=90} &= 20.15 \left.\frac{d}{dM} M^{2/3}\right|_{M=90} \\
&= 20.15 \cdot \frac{2}{3} \cdot \left. M^{-1/3} \right|_{M=90} \\
&\approx 3.00 \quad \text{correct to 2 decimal places}
\end{align*}
Hence, for weightlifters weighing close to 90 kilograms, the amount lifted increases with the mass of the
weightlifter at a rate of about 3.00 kilograms per kilogram of body mass. □
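The derivative of this scaling law can be checked against a difference quotient. The following is a minimal sketch, assuming plain Python; the function names `lift` and `lift_rate` and the step size h are our own choices.

```python
# Scaling law L = 20.15 * M^(2/3) and its derivative from the power rule.
# Function names and the step size h are illustrative choices.

def lift(M):
    """Amount lifted (kg) by a weightlifter of body mass M (kg)."""
    return 20.15 * M ** (2 / 3)

def lift_rate(M):
    """dL/dM from the power rule and the scalar multiple rule."""
    return 20.15 * (2 / 3) * M ** (-1 / 3)

M = 90.0
h = 1e-6
numeric = (lift(M + h) - lift(M)) / h
print(round(lift_rate(M), 2), numeric)
```

Both numbers are close to 3, so one extra kilogram of body mass near 90 kilograms corresponds to roughly three extra kilograms lifted under this model.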
Derivatives of Exponentials
Consider the function $f(x) = a^x$ for some constant a > 0. To find the derivative, we use the definition of
derivative. Let x be a fixed number.
\begin{align*}
f'(x) &= \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} && \text{Definition of derivative, provided the limit exists} \\
&= \lim_{h \to 0} \frac{a^{x+h} - a^x}{h} && \text{Since } f(x) = a^x \\
&= \lim_{h \to 0} \frac{(a^h - 1)a^x}{h} && \text{Common factor} \\
&= \left( \lim_{h \to 0} \frac{a^h - 1}{h} \right) a^x && \text{Property of limits} \\
&= ka^x && \text{provided } k = \lim_{h \to 0} \frac{a^h - 1}{h} \text{ exists} \\
&= kf(x)
\end{align*}
Although it is beyond the scope of this book, it can be shown that $k = \lim_{h \to 0} \frac{a^h - 1}{h}$ exists whenever a > 0. In
the following example, we will estimate the value of k for the case a = 2.
Example 5. Derivative of $2^x$
Find $\frac{d}{dx} 2^x$ by estimating the limit $\lim_{h \to 0} \frac{2^h - 1}{h}$.
Solution. $\frac{d}{dx} 2^x = k\,2^x$ where $k = \lim_{h \to 0} \frac{2^h - 1}{h}$. To estimate this limit, we can create the following table with a
calculator:
h         (2^h − 1)/h      h          (2^h − 1)/h
0.1       0.717735         −0.1       0.66967
0.01      0.695555         −0.01      0.69075
0.001     0.693387         −0.001     0.692907
0.0001    0.693171         −0.0001    0.693123
Since k ≈ 0.693, $\frac{d}{dx} 2^x \approx (0.693)2^x$. We show in Example 6 that in fact k = ln 2 ≈ 0.69315.
□
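The table in this example can be regenerated with a few lines of code instead of a calculator. The following is a minimal sketch, assuming plain Python with the standard `math` module; the function name `k_estimate` is our own.

```python
import math

# Estimate k = lim_{h -> 0} (a^h - 1) / h by evaluating the quotient
# for small positive and negative h. The function name is illustrative.

def k_estimate(a, h):
    return (a ** h - 1) / h

for h in [0.1, 0.01, 0.001, 0.0001]:
    print(h, round(k_estimate(2, h), 6), round(k_estimate(2, -h), 6))

# Natural logarithm of 2, for comparison with the estimates above.
print(math.log(2))
```

Replacing 2 by another positive base in `k_estimate` gives the corresponding constant k for that base.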
Since $f'(x) = kf(x)$ for an appropriate choice of k whenever $f(x) = a^x$, we can ask “when is k = 1?” We answer
with the following definition involving the number e = 2.71828 . . ., which was first proved to be irrational by
Leonhard Euler; a well-known short proof is due to the French mathematician Joseph Fourier (1768-1830).
A Definition of e. The irrational number e is the only number that has the property:
\[ \lim_{h \to 0} \frac{e^h - 1}{h} = 1 \]
We first encountered this number in Section 1.5 where we compared the irrational numbers π and e, but we did
not define them at that time. This is an important number in mathematics—so important, in fact, that there is even
an “e key” on your calculator. Here are some facts related to e.

• It is known as Euler’s number, after the famous mathematician Leonhard Euler (1707-1783).
• An equivalent definition of e is $e = \lim_{n \to \infty} \left( 1 + \frac{1}{n} \right)^n$.
• The function $f(x) = e^x$ is the simplest of all functions when it comes to differentiation. Up to a constant multiple,
it is the only function that remains unchanged under the operation of differentiation: that is, $f'(x) = e^x$
since $\lim_{h \to 0} \frac{e^h - 1}{h} = 1$.
• “Who has not been amazed to learn that the function $y = e^x$, like a phoenix rising again from its own ashes, is
its own derivative?”∗
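Two of the facts above, the limit that defines e and the equivalent definition as a limit of $(1 + 1/n)^n$, can be observed numerically. The following is a minimal sketch, assuming plain Python with the standard `math` module; the function names `compound` and `exp_quotient` are our own.

```python
import math

# Two numerical views of e. Function names are illustrative.

def compound(n):
    """(1 + 1/n)^n, which approaches e as n grows."""
    return (1 + 1 / n) ** n

def exp_quotient(h):
    """(e^h - 1) / h, which approaches 1 as h shrinks."""
    return (math.exp(h) - 1) / h

for n in [10, 1000, 100000]:
    print(n, compound(n))

for h in [0.1, 0.001, 0.00001]:
    print(h, exp_quotient(h))
```

The first loop approaches e slowly from below, while the second approaches 1 from above as h decreases toward 0.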
Armed with the derivative of ex , we can differentiate any exponential function.
Derivative of the natural exponential. For any real number a,
\[ \frac{d}{dx} e^{ax} = ae^{ax} \]
Furthermore, if $b = e^a$, then
\[ \frac{d}{dx} b^x = ab^x = (\ln b)b^x \]
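Proof. If a = 0, then
\[ \frac{d}{dx} e^0 = 0 \]
so the statement is true. If a is any nonzero real number, then since $e^{ax} = (e^a)^x$,
\[ \frac{d}{dx} e^{ax} = \left( \lim_{h \to 0} \frac{e^{ah} - 1}{h} \right) e^{ax} \]
To find this limit, define ∆x = ah. Since h = ∆x/a and h → 0 whenever ∆x → 0,
\begin{align*}
\lim_{h \to 0} \frac{e^{ah} - 1}{h} &= \lim_{\Delta x \to 0} \frac{e^{\Delta x} - 1}{\Delta x / a} && \text{Substitute } \Delta x = ah \\
&= a \lim_{\Delta x \to 0} \frac{e^{\Delta x} - 1}{\Delta x} && \text{Limit law for products} \\
&= a \cdot 1 && \text{Definition of } e
\end{align*}
Thus, we have shown that
\[ \frac{d}{dx} e^{ax} = ae^{ax} \]
For the last part, write b as $e^a$ for some a, and finish the details on your own in Problem 25. □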
Example 6. Déjà Vu
Find the exact value of $\frac{d}{dx} 2^x$.
Solution. $\frac{d}{dx} 2^x = (\ln 2)2^x$, which agrees with the estimate in Example 5. □
Example 7. Clearance of HIV
Human immunodeficiency virus (HIV) is a bloodborne pathogen that is typically transmitted through sexual
contact or sharing of needles amongst drug users. HIV attacks the immune system and understanding how the
∗ François Le Lionnais, Great Currents of Mathematical Thought, vol. 1, New York: Dover Publications, 1962.

viral load in the blood of an HIV-infected individual changes with time is critical to treating HIV patients with
a “cocktail” of several antiretroviral drugs. The theoretical immunologist Alan Perelson and colleagues used data
from various experiments to model observed changes in the viral load V(t), in particles per mL, of an HIV patient
undergoing antiretroviral drug therapy for t days.∗ Using regression methods of analysis, they found that if no new
viral particles are generated by the host, then the viral load over time can be modeled by the equation
\[ V(t) = 216{,}000\,e^{-0.2t} \]
Find $V'(t)$ and interpret it.
Solution.
\begin{align*}
V'(t) &= 216{,}000 \frac{d}{dt} e^{-0.2t} && \text{Differentiation rule for scalar multiples} \\
&= 216{,}000(-0.2)e^{-0.2t} && \text{Derivative of an exponential} \\
&= -43{,}200\,e^{-0.2t}
\end{align*}
The units of $V'(t)$ are particles per mL per day. $V'(t)$ describes the rate at which the viral load is changing whenever
it is not replenished by new viral particles. Since $V'(t) < 0$ for all t, the viral load is decreasing. (Note: In HIV
infected patients, new viral particles are produced in the various cells found in the blood and in lymph tissue. This is
a second component of the infection process that needs to be understood and added to the above equation to obtain
a more complete model of viral load dynamics within human hosts.) □
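The formula for $V'(t)$ can be compared with a difference quotient of V at a few times. The following is a minimal sketch, assuming plain Python with the standard `math` module; the sample times 0, 5, and 10 days and the step size h are our own choices.

```python
import math

# Viral load model V(t) = 216,000 e^{-0.2 t} and its derivative.
# Sample times and the step size h are illustrative choices.

def V(t):
    """Viral load (particles per mL) after t days of therapy."""
    return 216_000 * math.exp(-0.2 * t)

def V_prime(t):
    """Rate of change of the viral load (particles per mL per day)."""
    return -43_200 * math.exp(-0.2 * t)

h = 1e-6
for t in [0, 5, 10]:
    numeric = (V(t + h) - V(t)) / h
    print(t, round(V_prime(t), 1), round(numeric, 1))
```

The rate is negative at every sample time and shrinks in magnitude as t grows, so the viral load falls quickly at first and more slowly later.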
We can easily generalize our result regarding the derivative of $e^{ax}$ to a result pertaining to the derivative of the
general exponential.
Example 8. Exponential depletion of resources
In Example 2, Section 1.5, we projected that the U.S. population would contain $N(t) = 8.3(1.33)^t$ million
individuals t decades after 1815, when the population stood at 8.3 million individuals. Suppose the amount of food
produced each year, measured in terms of “individual rations” (i.e. the amount of food needed to sustain one
individual for one year), grew linearly during this same period with the amount given by the equation
R(t) = 10 + 4t.
The number of surplus rations S(t) over this period can be found by taking the difference of the above two functions:
\begin{align*}
S(t) &= R(t) - N(t) \\
&= 10 + 4t - 8.3(1.33)^t
\end{align*}
Determine at what point in time S(t) starts decreasing.
Solution. To find where S changes from increasing to decreasing (or vice-versa), we need to determine where S ′ (t)
changes sign from S ′ (t) > 0 to S ′ (t) < 0 (or vice-versa): that is, where S ′ (t) = 0 provided the derivative exists at
the point in question. Now
\begin{align*}
S'(t) &= \frac{d}{dt}\left[10 + 4t - 8.3(1.33)^t\right] && \text{Derivative of both sides of given equation} \\
&= \frac{d}{dt}10 + 4\frac{d}{dt}t - 8.3\frac{d}{dt}(1.33)^t && \text{Elementary rules of differentiation} \\
&= 0 + 4 - 8.3(\ln 1.33)(1.33)^t && \text{Derivative rules} \\
&= 4 - 8.3(\ln 1.33)(1.33)^t
\end{align*}
∗ A.S. Perelson, A.U. Neumann, M. Markowitz, J.M. Leonard, D.D. Ho, “HIV-1 Dynamics in vivo: virion clearance rate, infected cell
lifespan, and viral generation time” (1996): Science, 271, 1582-1586. Also, A.S. Perelson, P.W. Nelson, “Mathematical Analysis of HIV-1
Dynamics in vivo,” (1999): SIAM Review, 41, 3–44.

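If we now solve for the values of t satisfying S′(t) = 0, we obtain:
\begin{align*}
S'(t) &= 0 \\
4 - 8.3(\ln 1.33)(1.33)^t &= 0 \\
\frac{4}{8.3 \ln 1.33} &= (1.33)^t \\
\ln\left(\frac{4}{8.3 \ln 1.33}\right) &= t \ln 1.33 \\
t &= \frac{\ln\left(\frac{4}{8.3 \ln 1.33}\right)}{\ln 1.33} \\
&\approx 1.84
\end{align*}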
Evaluating S ′ (t) at values of t greater than and less than 1.84 in the neighborhood of 1.84, we find that S ′ > 0 for
t < 1.84 and S ′ < 0 for t > 1.84. Since the units of time are in decades, we see that in the year 1815 + 18.4 ≈ 1833
the surplus of resources will begin to decline. Plotting y = S(t) reveals that at t ≈ 1.84, S(t) appears to take on its
largest value and then begins to decrease, as shown in Figure 3.3.
[Plot omitted: y = S(t) versus t]
Figure 3.3: Graph of the number of surplus rations
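The critical time found in Example 8 can be recomputed in a few lines. This is only a sketch: the names S_prime and t_star are introduced here for illustration, and the offsets of 0.1 decade used in the sign test are arbitrary.

```python
import math

def S_prime(t):
    """Derivative of the surplus S(t) = 10 + 4t - 8.3(1.33)^t from Example 8."""
    return 4 - 8.3 * math.log(1.33) * 1.33 ** t

# Closed-form solution of S'(t) = 0 derived in the text.
t_star = math.log(4 / (8.3 * math.log(1.33))) / math.log(1.33)

print(round(t_star, 2))            # critical time, in decades after 1815
print(S_prime(t_star - 0.1) > 0)   # True if S is increasing just before t_star
print(S_prime(t_star + 0.1) < 0)   # True if S is decreasing just after t_star
```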
Problem Set 3.1

Differentiate the functions given in Problems 1 to 14. Assume that C is a constant.
1. a. $f(x) = x^7$
b. $g(x) = 7^x$
2. a. $f(x) = x^4$
b. $g(x) = 4^x$
3. a. $f(x) = 3x^5$
b. g(x) = 3(7)5
4. a. $f(x) = x^3 + C$
b. $g(x) = C^2 + x$
5. a. $f(x) = x^2 + 3\pi + C$
b. $g(x) = \pi^2 - 2x - C$
6. $f(x) = 5x^3 - 5x^2 + 3x - 5$
7. $f(x) = x^5 - 3x^2 - 1$
8. $f(x) = 2x^2 - 5x^8 + 1$
9. $s(t) = 4e^t - 5t + 1$
10. $f(x) = 5 - e^{2t}$
11. $f(x) = 5.9(2.25)^t$
12. $f(x) = 82.1(1.85)^t$
13. $g(P) = Cx^2 + 5x + e^{-2x}$
14. $F(x) = 5e^{Cx} - 4x^2$
Determine on what intervals each function given in Problems 15 to 19 is increasing and on what intervals it is
decreasing.
15. $f(x) = x^3 - x^2 + 1$
16. $g(x) = \frac{1}{3}x^3 - 9x + 2$
17. $f(x) = x^5 + 5x^4 - 550x^3 - 2{,}000x^2 + 60{,}000x$ (Round to the nearest tenth.)
18. $g(x) = x^3 + 35x^2 - 125x - 9{,}375$
19. $H(w) = 2w - e^w$
20. Let $f(x) = x^{1/2}$.
a. Find the derivative using the definition of the derivative.
b. Apply the power rule with n = 1/2.
21. Let $f(x) = x^{3/2}$.
a. Find the derivative using the definition of the derivative. Hint: Write $x^{3/2}$ as $x\sqrt{x}$ and rationalize the
numerator.
b. Apply the power rule with n = 3/2.
22. Differentiate $f(x) = \frac{x^{1/3}}{x^2}$ by first simplifying and then by using the power rule.
23. Differentiate $g(x) = x^2(x^3 - 3x)$
24. Differentiate $q(x) = \frac{x^2 - 4}{x + 2}$
25. Prove that for any real number b > 0
\[ \frac{d}{dx} b^x = (\ln b)\, b^x \]
26. Prove the sum rule:
(f + g)′ = f ′ + g ′
27. Prove the scalar multiple rule:

(cf )′ = cf ′
for a constant c.

28. After pouring a mug full of the German beer Erdinger Weissbier, Dr. Leike measured the height of the beer
froth at regular time intervals.∗ We estimated the height (in cm) of the beer froth as
\[ H(t) = 17(0.94)^t \]
where t is measured in 15-second units. Find
\[ \left.\frac{dH}{dt}\right|_{t=25} \]
and interpret this quantity.
29. A drug that influences weight gain was tested on eight animals of the same size, age, and sex.† Each animal
was randomly assigned to a dose level. After two weeks, the difference between the end and start weights (measured
in dekagrams) was recorded. The best-fitting quadratic to the data is
\[ W = 1.13 - 0.41D + 0.17D^2 \]
where D is the dose level, which ranges from 1 to 8.
a. Find $\frac{dW}{dD}$ and identify its units.
b. When does weight gain increase with dosage level D?
30. Using data from 158 marine species, Professor John Hoenig of the Virginia Institute of Marine Sciences studied
how the natural mortality rate M of a species depends on the maximum observed age T.‡ Using linear
regression, he found
\[ M = e^{1.44 - 9.82T} \]
where T is measured in years. Find and interpret
\[ \left.\frac{dM}{dT}\right|_{T=10} \]
31. During a certain epidemic, the number of people who have become ill after t days is given by
\[ N(t) = \frac{2{,}000}{1 + Ce^{-t}} \]
where C is a constant.
a. If 10 people were ill at the beginning of the epidemic (when t = 0), what is C?
b. At what rate is N (t) increasing when t = 5?
32. A glucose solution is administered intravenously into the bloodstream of a patient at a constant rate of r mg/h.
As the glucose is being administered, it is converted into other substances and removed from the bloodstream.
Suppose the concentration of the glucose solution after t hours is given by
\[ C(t) = r - (r - k)e^{-t} \]
where k is a constant.
a. If $C_0$ is the initial concentration of glucose (when t = 0), what is $C_0$ in terms of r and k?
b. What is the rate at which the concentration of glucose is changing at time t?
33. In Section 1.4, we found that the amount lifted (in kg) by an Olympic weightlifter can be predicted by the
scaling law
\[ L = 20.15\, M^{2/3} \]
where M is the mass of the lifter in kg. Find and interpret $\left.\frac{dL}{dM}\right|_{M=100}$.
∗ European Journal Physics 23 (2002) 21–26.
† Kleinbaum, Kupper and Muller, page 233
‡ “Empirical use of longevity data to estimate mortality rates.” Fisheries Bulletin. 82 (1983): 898–902

34. In Example 10, Section 1.5 (changing the names of the variables from x and y to M and R), we found that
the metabolic rate (in kCal/day) for animals ranging in size from mice to elephants is given by the function
ln R = 0.75 ln M + 4.2, which yields the equation
\[ R = e^{4.2} M^{3/4}, \]
where M is the body mass of the animal in kg.
a. The average California Condor weighs about 10 kg. Find and interpret $\left.\frac{dR}{dM}\right|_{M=10}$.
b. The average football player weighs about 100 kg. Find and interpret $\left.\frac{dR}{dM}\right|_{M=100}$.
c. Compare and discuss the quantities that you found in parts a and b.

3.2 Product and Quotient Rules

Previously, we saw that the derivative of a sum equals the sum of the derivatives and the derivative of a difference
equals the difference of the derivatives. Armed with these elementary differentiation rules, we might guess that the
derivative of a product is the product of the derivatives. The following simple example, however, shows this not to
be the case. Let f(x) = x and $g(x) = x^2$, and consider their product
\[ p(x) = f(x)g(x) = x^3 \]
Because f′(x) = 1 and g′(x) = 2x, the product of the derivatives is
\[ f'(x)g'(x) = (1)(2x) = 2x \]
whereas the actual derivative of $p(x) = x^3$ is $p'(x) = 3x^2$. Hence, our naive guess is wrong! It is also easy to show
that the derivative of a quotient is not the quotient of the derivatives. The goal of this section is to find the correct
rules for differentiating products and quotients.
Product Rule
To derive a rule for products, we appeal to our geometric intuition by considering areas where ∆x > 0 and f (x)
and g(x) are assumed both to be increasing differentiable functions of x. It should be noted that the algebraic steps
stand alone without considering area or making the above assumptions.
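Let
\[ \underbrace{p(x)}_{\text{Area of rectangle}} = \underbrace{f(x)}_{\text{Length}} \; \underbrace{g(x)}_{\text{Width}} \]
This product p can be represented as the area of a rectangle with length f(x) and width g(x).
Next, we find
\[ \underbrace{p(x + \Delta x)}_{\text{Area of larger rectangle}} = \underbrace{f(x + \Delta x)}_{\text{Length}} \; \underbrace{g(x + \Delta x)}_{\text{Width}} \]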

The next step gives us the area of the “L-shaped” region:
\[ p(x + \Delta x) - p(x) = f(x + \Delta x)g(x + \Delta x) - f(x)g(x) \]
The key to the proof of the product rule is to rewrite this difference. We can see how to do this by looking at
the area of the “L-shaped” region in another way:
\[ \underbrace{p(x + \Delta x) - p(x)}_{\text{Area of L-shaped region}} = \underbrace{[g(x + \Delta x) - g(x)]\, f(x + \Delta x)}_{\text{Area of region I}} + \underbrace{[f(x + \Delta x) - f(x)]\, g(x)}_{\text{Area of region II}} \]
Divide both sides by ∆x (where ∆x ≠ 0):
\[ \frac{p(x + \Delta x) - p(x)}{\Delta x} = \frac{g(x + \Delta x) - g(x)}{\Delta x}\, f(x + \Delta x) + \frac{f(x + \Delta x) - f(x)}{\Delta x}\, g(x) \]
The last step in deriving the product rule is to take the limit as ∆x → 0.
\begin{align*}
p'(x) &= \lim_{\Delta x \to 0} \frac{p(x + \Delta x) - p(x)}{\Delta x} \\
&= \lim_{\Delta x \to 0} \left[ \frac{g(x + \Delta x) - g(x)}{\Delta x}\, f(x + \Delta x) + \frac{f(x + \Delta x) - f(x)}{\Delta x}\, g(x) \right] \\
&= \lim_{\Delta x \to 0} f(x + \Delta x) \underbrace{\lim_{\Delta x \to 0} \frac{g(x + \Delta x) - g(x)}{\Delta x}}_{\text{This is the derivative of } g} + g(x) \underbrace{\lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}}_{\text{This is the derivative of } f} \\
&= f(x)g'(x) + g(x)f'(x) && \lim_{\Delta x \to 0} f(x + \Delta x) = f(x) \text{ because } f \text{ is continuous}
\end{align*}
We have just proven the product rule.
Let f and g be differentiable at x. Then

Product Rule (f g)′ (x) = f ′ (x)g(x) + f (x)g ′ (x)
A simple way to remember the product rule is with the mnemonic “the derivative of the product is the derivative
of the first times the second plus the derivative of the second times the first.” Or, if you wish to fit the following old
poem to a melody,
“Sing the product rule in time,

One prime two plus one two prime.
Isn’t mathematics fun,
One prime two plus two prime one.”
Example 1. Computing with the product rule
Find p′ (x) and determine on what intervals p is increasing.
a. $p(x) = x e^x$
b. $p(x) = x^2\, 2^x$
Solution.
a. Let f(x) = x and $g(x) = e^x$. Then p(x) = f(x)g(x). By the product rule,
\begin{align*}
p'(x) &= f'(x)g(x) + f(x)g'(x) \\
&= 1 \cdot e^x + x \cdot e^x \\
&= (1 + x)e^x
\end{align*}
We have p′(x) > 0 if and only if 1 + x > 0. Hence, p is increasing on the interval [−1, ∞). Indeed, plotting
y = p(x) supports this conclusion.
[Plot omitted: y = p(x) versus x]
b. Let $f(x) = x^2$ and $g(x) = 2^x$. Then p(x) = f(x)g(x). Recall that f′(x) = 2x and $g'(x) = (\ln 2)2^x$. Hence,
by the product rule,
\begin{align*}
p'(x) &= f'(x)g(x) + f(x)g'(x) \\
&= 2x\, 2^x + x^2 (\ln 2)\, 2^x \\
&= x\, 2^x (2 + x \ln 2)
\end{align*}
Since p′ > 0 whenever x > 0 or $x < \frac{-2}{\ln 2}$, p is increasing on these intervals. Indeed, plotting y = p(x)
supports this conclusion.
[Plot omitted: y = p(x) versus x]
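The two derivatives in Example 1 can be spot-checked numerically. The sketch below compares each product-rule formula with a central difference quotient; the helper name central_difference, the step size, and the sample points are assumptions made only for this illustration.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Part a: p(x) = x e^x with p'(x) = (1 + x) e^x
p_a = lambda x: x * math.exp(x)
dp_a = lambda x: (1 + x) * math.exp(x)

# Part b: p(x) = x^2 2^x with p'(x) = x 2^x (2 + x ln 2)
p_b = lambda x: x**2 * 2**x
dp_b = lambda x: x * 2**x * (2 + x * math.log(2))

for x in (-4.0, -2.0, -0.5, 1.0):
    print(x, dp_a(x), central_difference(p_a, x), dp_b(x), central_difference(p_b, x))

# Sign change of p' in part b occurs at x = -2/ln 2 and at x = 0.
critical_b = -2 / math.log(2)
print(critical_b)
```

Evaluating dp_b on either side of critical_b and of 0 gives the sign pattern positive, negative, positive, matching the intervals stated in part b.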
Example 2. Survival rates
For a particular species of insect, the probability of an individual surviving heat shock for the day under given
environmental conditions has been estimated to be 0.8 per day at the time it is observed but decreasing at a rate of
0.3 per day beyond that point in time. It has also been estimated that the probability of surviving starvation at the
same point in time of observation is 0.1 per day and increasing at a rate of 0.2 per day. If surviving both starvation
and heat shock are assumed to be independent of one another, then the probability of surviving both sources of
mortality is given by the product of the two probabilities.
a. Find the probability of surviving today.
b. Is the probability increasing or decreasing?
Solution. Let f (x) and g(x) denote the probability of surviving starvation and heat shock, respectively, where x
denotes time in days. The probability of surviving x days from now is given by p(x) = f (x)g(x). We are given that
f (0) = 0.1, g(0) = 0.8, f ′ (0) = 0.2, and g ′ (0) = −0.3.

a. The probability of surviving today is p(0) = f (0)g(0) = 0.1 × 0.8 = 0.08.
b. We use the product rule to find the derivative.
\begin{align*}
p'(0) &= f'(0)g(0) + f(0)g'(0) \\
&= 0.2 \cdot 0.8 + 0.1(-0.3) \\
&= 0.16 - 0.03 \\
&= 0.13
\end{align*}
Since p′(0) > 0, the probability of surviving is increasing, at a rate of 0.13 per day.
Example 3. Per-capita or intrinsic rate of growth
As we have seen in Section 1.7, single species population models can be of the form
\[ P_{n+1} = P_n f(P_n) = g(P_n) \]
where $P_n$ is the population abundance in the nth generation, f(P) is the per-capita growth rate of the population
as a function of population density P, and g(P) is the growth rate of the whole population as a function of P.
a. Find an expression in terms of f and P for g′(0). Briefly explain what this expression represents.
b. A famous model in fisheries, the Ricker model, has $f(P) = \lambda e^{-bP}$ where λ > 0 is the maximum per-capita
reproductive rate and b > 0 reflects the degree to which organisms interfere and compete with one
another. Find g′(0).
Solution.
a. Applying the product rule to the relationship g(P) = P f(P), we have
\begin{align*}
g'(P) &= \left(\frac{d}{dP} P\right) f(P) + P f'(P) \\
&= f(P) + P f'(P)
\end{align*}
Evaluating at P = 0,
\[ g'(0) = f(0) + 0 \cdot f'(0) = f(0) \]
Hence, the rate g′(0) at which growth changes at low densities equals the per-capita growth rate of the
population at low densities.
b. When $f(P) = \lambda e^{-bP}$, we obtain $g'(0) = f(0) = \lambda e^0 = \lambda$.
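A short numerical illustration of part b follows. The parameter values lam = 2.5 and b = 0.01 are hypothetical, chosen only so that the code runs; any positive values should give the same conclusion that g′(0) equals λ.

```python
import math

lam, b = 2.5, 0.01   # hypothetical Ricker parameters, for illustration only

def f(P):
    """Per-capita growth rate in the Ricker model."""
    return lam * math.exp(-b * P)

def g(P):
    """Growth of the whole population, g(P) = P f(P)."""
    return P * f(P)

def g_prime(P):
    """Product rule: g'(P) = f(P) + P f'(P), where f'(P) = -b f(P)."""
    return f(P) + P * (-b * f(P))

h = 1e-6
numeric = (g(h) - g(-h)) / (2 * h)   # central difference at P = 0
print(g_prime(0.0), numeric, lam)
```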
Quotient Rule
Before we derive a quotient rule, we begin with an example for finding the derivative of a reciprocal, which is a
special case of a quotient that has 1 in the numerator.
Example 4. Reciprocal rule
Find the derivative of the reciprocal $\frac{1}{f(x)}$ of a differentiable function f by using the definition of derivative.
Solution. Let $r(x) = \frac{1}{f(x)}$. Then $r(x + h) = \frac{1}{f(x+h)}$, so using the definition of derivative we find:
\begin{align*}
r'(x) &= \lim_{h \to 0} \frac{r(x + h) - r(x)}{h} && \text{Definition of derivative} \\
&= \lim_{h \to 0} \frac{\frac{1}{f(x+h)} - \frac{1}{f(x)}}{h} \\
&= \lim_{h \to 0} \frac{\;\frac{f(x) - f(x+h)}{f(x)f(x+h)}\;}{h} && \text{Common denominator} \\
&= \lim_{h \to 0} \frac{f(x) - f(x + h)}{h\, f(x) f(x + h)} && \text{Simplifying fraction} \\
&= \lim_{h \to 0} \frac{1}{f(x)f(x + h)} \cdot \lim_{h \to 0} \frac{f(x) - f(x + h)}{h} && \text{Limit of a product} \\
&= \frac{1}{[f(x)]^2} \lim_{h \to 0} \left( -\frac{f(x + h) - f(x)}{h} \right) && \text{Since } f \text{ is continuous} \\
&= \frac{1}{[f(x)]^2}\, [-f'(x)] && \text{Definition of derivative}
\end{align*}
□
We restate the result of this example for easy reference.
Let f be differentiable at x. Then

Reciprocal Rule \[ \frac{d}{dx}\left[\frac{1}{f(x)}\right] = -\frac{f'(x)}{[f(x)]^2} \]
provided that f(x) ≠ 0.
Example 5. Using the reciprocal rule
Find the derivative of $g(x) = \frac{1}{x^2 + x + 1}$.
Solution. Let $f(x) = x^2 + x + 1$. Then $g(x) = \frac{1}{f(x)}$ and f′(x) = 2x + 1. By the reciprocal rule,
\begin{align*}
g'(x) &= -\frac{f'(x)}{f(x)^2} \\
&= -\frac{2x + 1}{(x^2 + x + 1)^2}
\end{align*}
□
Example 6. Breaking Whelks
Crows feed on whelks by flying up and dropping the whelks on a hard surface to break them. Biologists have
noticed that northwestern crows consistently drop whelks from about 5 meters. As a first step to understanding why
this might be the case, we consider some data collected by the Canadian scientist Reto Zach in which he repeatedly
dropped whelks from various heights to determine how many drops were required to break the whelk.∗ The data is
shown in Figure 3.5.
∗ Zach, Reto, “Selection and dropping of whelks by northwestern crows.” Behavior 67 (1978): 134 - 147.

Figure 3.4: Two types of whelks are pictured; Lightning Whelks (Busycon sinistrum) (left) and the Turnip Whelks
(Busycon contrarium) (right)
[Plot omitted: number of drops versus height h]
Figure 3.5: Data collected by Reto Zach showing how the number of drops to break a whelk depends on the height
of the drops.
A best-fitting curve relating the number of drops, D, to the height, h (in meters), for this data is given by
\[ D(h) = 1 + \frac{20.4}{h - 0.84} \]
a. Find $\frac{dD}{dh}$.
b. Find $\left.\frac{dD}{dh}\right|_{h=4}$ and interpret this quantity.
In Chapter 4, we shall use this function to determine the optimal height from which to drop whelks.
Solution.
a.
\begin{align*}
\frac{dD}{dh} &= \frac{d}{dh}(1) + 20.4\, \frac{d}{dh}\left[\frac{1}{h - 0.84}\right] && \text{Elementary rules of differentiation} \\
&= 0 + 20.4 \cdot \frac{-1}{(h - 0.84)^2} && \text{Reciprocal rule} \\
&= \frac{-20.4}{(h - 0.84)^2}
\end{align*}
b.
\[ \left.\frac{dD}{dh}\right|_{h=4} = \frac{-20.4}{(4 - 0.84)^2} \approx -2.04 \]
At h = 4 meters, the required number of drops is decreasing at a rate of about 2.04 drops per meter. For instance, if
we increased the height by approximately 1 meter, we should expect the number of drops to decrease by
approximately 2. This can also be seen on the graph in Fig. 3.5 from the fact that at h = 4, D ≈ 8, while
at h = 5, D ≈ 8 − 2 = 6.
□
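To see how the derivative at h = 4 relates to the fitted curve itself, the following sketch evaluates both. The function names are chosen here for illustration. Because the curve flattens as h grows, the actual change in D between h = 4 and h = 5 is somewhat smaller in magnitude than the slope at h = 4 suggests, so the "decrease by about 2" reading is a linear approximation.

```python
def D(h):
    """Fitted number of drops needed to break a whelk from height h (meters)."""
    return 1 + 20.4 / (h - 0.84)

def D_prime(h):
    """Derivative from part a of Example 6."""
    return -20.4 / (h - 0.84) ** 2

slope_at_4 = D_prime(4)
actual_change = D(5) - D(4)   # change in D over one additional meter

print(round(slope_at_4, 2))
print(round(D(4), 2), round(D(5), 2), round(actual_change, 2))
```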
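Combining the reciprocal and product rules, we can find the derivative of a quotient of functions. Let f and g be
differentiable functions, and assume that g(x) ≠ 0.
\begin{align*}
\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] &= \frac{d}{dx}\left[f(x) \cdot \frac{1}{g(x)}\right] \\
&= f'(x) \cdot \frac{1}{g(x)} + f(x)\, \frac{d}{dx}\left[\frac{1}{g(x)}\right] && \text{Product rule} \\
&= \frac{f'(x)}{g(x)} + f(x)\, \frac{-g'(x)}{g(x)^2} && \text{Reciprocal rule} \\
&= \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2} && \text{Common denominator}
\end{align*}
□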
Reiterating the conclusion of this derivation, we have what is known as the quotient rule.
Let f and g be differentiable at x. Then

Quotient Rule \[ (f/g)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2} \]
provided g(x) ≠ 0.
In the book How to Ace Calculus: The Streetwise Guide, a playful way to remember the quotient rule is
provided.∗ Replacing f by hi and g by ho (hi for high up there in the numerator and ho for low down there in the
denominator), and letting D stand in for ‘the derivative of’, the formula becomes:
\[ D\left(\frac{\text{hi}}{\text{ho}}\right) = \frac{\text{ho}\, D\text{hi} - \text{hi}\, D\text{ho}}{\text{ho}^2} \]
In words, that is “ho dee hi minus hi dee ho over ho ho.” Another memory song can be sung to the tune of Old
MacDonald’s Farm:
Ho D high less high D ho EIEIO
Then draw the line and down below EIEIO
With a dx here and a dy there,
Here a slope, yes there’s hope, you can cope,
Denominator squared we will go EIEIO.
Example 7. Computing with the quotient rule
Find the following derivatives.
a. $\frac{d}{dt}\left[\frac{1 + 2t}{3 + 4t}\right]$
b. $\frac{d}{dx}\left[\frac{e^x}{1 + x^2}\right]$
∗ Colin Adams,Joel Hass,Abigail Thompson, W.H.Freeman & Co (1998): ISBN: 0716731606

Solution.
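a. Let f(t) = 1 + 2t and g(t) = 3 + 4t. By the quotient rule
\begin{align*}
\frac{d}{dt}\left[\frac{1 + 2t}{3 + 4t}\right] &= \frac{d}{dt}\left[\frac{f(t)}{g(t)}\right] \\
&= \frac{f'(t)g(t) - f(t)g'(t)}{g(t)^2} \\
&= \frac{2(3 + 4t) - (1 + 2t)4}{(3 + 4t)^2} \\
&= \frac{2}{(3 + 4t)^2}
\end{align*}
b. Let $f(x) = e^x$ and $g(x) = 1 + x^2$. By the quotient rule
\begin{align*}
\frac{d}{dx}\left[\frac{e^x}{1 + x^2}\right] &= \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] \\
&= \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2} \\
&= \frac{e^x(1 + x^2) - e^x\, 2x}{(1 + x^2)^2} \\
&= \frac{e^x(x^2 - 2x + 1)}{(1 + x^2)^2} \\
&= \frac{e^x(x - 1)^2}{(1 + x^2)^2}
\end{align*}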
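Both answers in Example 7 can be checked against difference quotients. The sketch below is one way to do so; the helper central_difference, its step size, and the test points are assumptions introduced for this illustration only.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Part a: (1 + 2t)/(3 + 4t) with derivative 2/(3 + 4t)^2
q_a = lambda t: (1 + 2 * t) / (3 + 4 * t)
dq_a = lambda t: 2 / (3 + 4 * t) ** 2

# Part b: e^x/(1 + x^2) with derivative e^x (x - 1)^2 / (1 + x^2)^2
q_b = lambda x: math.exp(x) / (1 + x**2)
dq_b = lambda x: math.exp(x) * (x - 1) ** 2 / (1 + x**2) ** 2

for x in (0.0, 0.5, 2.0):
    print(x, dq_a(x), central_difference(q_a, x), dq_b(x), central_difference(q_b, x))
```

Note that the derivative in part b is never negative and vanishes only at x = 1, which the factored form makes visible at a glance.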
Example 8. Dose-response curves revisited
In Example 2 in Section 2.4, a dose-response curve for patients responding to a dose of histamine was given by∗
\[ R = \frac{100e^x}{e^x + e^{-5}}, \]
a. Find $\frac{dR}{dx}$.
b. Graph $\frac{dR}{dx}$ to determine at what logarithmic dosage the response is increasing most rapidly.
Solution.
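a.
\begin{align*}
\frac{d}{dx}\left[\frac{100e^x}{e^x + e^{-5}}\right] &= \frac{100e^x(e^x + e^{-5}) - e^x\, 100e^x}{(e^x + e^{-5})^2} \\
&= \frac{100e^{x-5}}{(e^x + e^{-5})^2}
\end{align*}
b. Graphing $\frac{dR}{dx}$ yields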
∗ K. A. Skau, “Teaching Pharmocodynamics: An introductory module on learning dose-response relationships,” American Journal of
Pharmaceutical Education (2004), 68: Article 73

[Plot omitted: dR/dx versus x]
Hence $\frac{dR}{dx}$ takes on its largest value at approximately x = −5, and the response increases most rapidly at
this logarithmic dosage. That is, the dosage is $e^{-5} \approx 0.0067$ mmol.
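The graphical reading in part b can be reproduced with a simple grid search. This is a rough sketch, not an optimization method from the text: the grid on [−10, −2] with spacing 0.01 is an arbitrary choice that mirrors the plotted range.

```python
import math

def dR_dx(x):
    """Derivative of the dose-response curve from part a."""
    return 100 * math.exp(x - 5) / (math.exp(x) + math.exp(-5)) ** 2

# Evaluate dR/dx on a grid and keep the point with the largest value.
xs = [-10 + 0.01 * k for k in range(801)]
x_best = max(xs, key=dR_dx)

print(round(x_best, 2), round(dR_dx(x_best), 2))
print(round(math.exp(-5), 4))   # corresponding dosage
```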
Problem Set 3.2

Find the derivatives in Problems 1 to 18.
1. $p(x) = (3x^2 - 1)(7 + 2x^3)$
2. $p(x) = (x^2 + 4)(1 - 3x^3)$
3. $q(x) = \frac{4x - 7}{3 - x^2}$
4. $q(x) = \frac{x + 1}{1 + x^2}$
5. f (x) = x2x
6. $f(x) = x^3\, 3^x$
7. $f(x) = (1 + x + x^2)e^x$
8. $f(x) = (e^3 + e^2 + e)x^2$
9. $F(L) = (1 + L + L^3 + L^4)(L - L^2)$
10. $G(M) = (M - M^3)(1 - 4M)$
11. $f(x) = (4x + 3)^2$ Hint: Think (4x + 3)(4x + 3)
12. $g(x) = (5 - 2x)^2$
13. $f(x) = \frac{e^x}{1 + e^x}$
14. $g(t) = \frac{1 + te^t}{1 + t}$
15. $f(p) = \frac{ap}{1 + 2p}$ where a is a constant
16. $g(m) = \frac{bm}{1 - 3m}$ where b is a constant
2 x 4 x+1
17. F (x) = 3x2 − 3 + 5 + x
1 5
18. G(x) = x2 − x2 + x4

Find the equation for the tangent line at the prescribed point for each function in Problems 19 to 24.
19. $f(x) = (x^3 - 2x^2)(x + 2)$ where x = 1
20. $G(x) = (x - 5)(x^3 - x)$ where x = −1
21. $F(x) = \frac{x + 1}{x - 1}$ where x = 0
22. $F(x) = \frac{3x^2 + 5}{2x^2 + x - 3}$ where x = −1
23. $f(t) = e^t + e^{-t}$ where t = 0
24. g(t) = t ln t where t = 1.
25. a. Differentiate the function
\[ f(x) = 2x^2 - 5x - 3 \]
b. Factor the function in part a and differentiate by using the product rule. Show that the two answers
are the same.
26. a. Use the quotient rule to differentiate
\[ f(x) = \frac{2x - 3}{x^3} \]
b. Rewrite the function in part a as $f(x) = x^{-3}(2x - 3)$ and differentiate by using the product rule.
c. Rewrite the function in part a as
\[ f(x) = 2x^{-2} - 3x^{-3} \]
and differentiate using the power rule.
d. Show the answers to parts a, b, and c are all the same.
27. The body mass index (BMI) for an individual weighing w pounds and h inches tall is given by
\[ B = \frac{703w}{h^2} \]
A person with a body mass index greater than 30 is considered obese.
a. For an adult who weighs 130 lbs and is 63 inches tall, find and interpret $\frac{dB}{dw}$.
b. For a child who weighs 60 lbs and is 54 inches tall, find and interpret $\frac{dB}{dh}$.
28. Consider the generalized Beverton-Holt model of population growth given by
\[ P_{n+1} = g(P_n) \]
where
\[ g(P) = \frac{P}{1 + (aP)^b} \]
and a > 0, b > 0.
a. Find g′(P).
b. Determine what values of b > 0 cause g to be increasing for all P > 0.
c. When b is outside the range of values found in part b, determine on what interval g is increasing and on
what interval g is decreasing.
29. A ligand is a molecule that binds to another molecule or other chemically active structure (e.g. a receptor
on a membrane) to form a larger complex. In a study of two ligands I and II competing for the same sites
on a substrate, ligand II is added to a substrate solution that already contains ligand I. As the concentration
of ligand II is increased, the concentration of ligand I bound to the substrate decreases. This one-site ligand
competition process is characterized by the equation:
\[ T = a + \frac{b - a}{1 + 10^{x - c}}, \]
where T is the concentration of bound ligand I per mg of tissue and x is the logarithm of the concentration of
ligand II in the solution. The constants a and b arise from the relative binding rates of the two ligands and
satisfy a > 0 and b > a.
a. Compute $\frac{dT}{dx}$ and determine whether T is increasing or decreasing.
b. Graph T by hand and interpret the quantities a and b.
30. In the 1960s, scientists at the Woods Hole Oceanographic Institution measured the uptake rate of glucose by bacterial
populations from the coast of Peru.∗ In one field experiment, they found that the uptake rate can be modeled
by $f(x) = \frac{1.2078x}{1 + 0.0506x}$ micrograms per hour, where x is micrograms of glucose per liter. Compute and interpret
f′(20) and f′(100).
31. In Example 4 in Section 2.4, we found that the killing rate of wolves in North America could be modeled by
\[ f(x) = \frac{3.36x}{0.46 + x} \]
where x is measured in number of moose per km$^2$. Compute and interpret f′(0.5) and f′(2.0).
32. Cells often use receptors to transport nutrients from outside of the cell membrane to the inner cell. In Example 6
in Section 1.6, we determined that the rate, R, at which nutrients enter the cell depends on the concentration,
N , of nutrients outside the cell. The function
\[ R = \frac{aN}{b + N} \]
models the amount of nutrients absorbed in one hour, where a and b are positive constants.
a. Find R when N = b. What does this tell you about b?
b. Compute and interpret $\frac{dR}{dN}$. When is R increasing? When is R decreasing?
33. In Problem 39 in Section 2.4, we modeled how wolf densities in North America depend on moose densities with
the following function
\[ f(x) = \frac{58.7(x - 0.03)}{0.76 + x} \]
where x is the number of moose per km$^2$. Determine for what x values f(x) is increasing.
34. The number of children newly infected with a particular pathogen that is transmitted through contact with their
mothers has been modeled by the function
\[ N(t) = -0.21t^3 + 3.04t^2 + 44.05t + 200.29 \]
where N (t) is measured in thousands of individuals per year, and t is the number of years since 2000. In
epidemiology, N (t) is known as an incidence function.
a. At what rate is the incidence function N changing with respect to time in the year 2005?
b. When will the incidence start to decline?
c. What will the rate of change of incidence be in the year 2010?
35. Two mathematicians, W. O. Kermack and A. G. McKendrick, showed that the weekly mortality rate during
the outbreak of the Black Plague in Bombay in 1905–1906 can be reasonably well described by the function
\[ f(t) = 890\, \mathrm{sech}^2(0.2t - 3.4) \text{ deaths/week} \]
where t is measured in weeks. Determine when the mortality rate is increasing and when the mortality rate is
decreasing. The function sech x is an important function in mathematics that is called the hyperbolic secant
and is defined by the formula
\[ \mathrm{sech}\, x = \frac{2}{e^x + e^{-x}} \]

36. In Problem 41 in Section 2.3, two fisheries scientists∗ found that the following stock-recruitment function
provides a good fit to data pertaining to the Southeast Alaska pink salmon fishery:
\[ y = 0.12\, x^{1.5} e^{-0.00014x}, \]
where y is the number of young fish recruited, and x is the number of adult fish involved in recruitment.
a. Compute $\frac{dy}{dx}$.
b. Determine for what x values is y increasing and decreasing. Interpret your results.
∗ T. J. Quinn and R. B. Deriso, 1997. Quantitative Fish Dynamics. Oxford UP.

3.3 Chain Rule and Implicit Differentiation

In this section, we move to the next level in terms of developing tools to differentiate functions that can be
regarded as the composite of more elementary functions, as discussed in Section 1.6. This gives us the power to
differentiate functions such as the bell-shaped curve $y = e^{-x^2}$, the complex polynomial $y = (1 + 2x + x^3)^{101}$, and the
logarithm function y = ln x.
Chain Rule
Suppose we were asked to find the derivative of the function $y = (1 + 2x + x^3)^{101}$. It is not practical to expand this
power in order to take the derivative of a polynomial. Instead, we use a result known as the chain rule. In order
to motivate this important rule, we consider an application.
Suppose it is known that the carbon monoxide pollution in the air is changing at the rate of 0.02 ppm (parts per million)
for each person in a town whose population is growing at the rate of 1,000 people per year. To find the rate at which
the level of pollution is increasing with respect to time, we must compute the product
\[ (0.02 \text{ ppm/person})(1{,}000 \text{ people/year}) = 20 \text{ ppm/year} \]
We can generalize this common sense calculation by noting that the pollution level, L, is a function of the population
size, P , which itself is a function of time, t. Thus, L as a function of time is (L ◦ P )(t) or, equivalently, L[P (t)].
With this notation, the common sense calculation becomes:

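\[ \begin{bmatrix} \text{RATE OF CHANGE OF } L \\ \text{WITH RESPECT TO } t \end{bmatrix} = \begin{bmatrix} \text{RATE OF CHANGE OF } L \\ \text{WITH RESPECT TO } P \end{bmatrix} \begin{bmatrix} \text{RATE OF CHANGE OF } P \\ \text{WITH RESPECT TO } t \end{bmatrix} \]
Expressing each of these rates in terms of an appropriate derivative of L[P(t)] in Leibniz form, we obtain the following
equation:
\[ \frac{dL}{dt} = \frac{dL}{dP}\, \frac{dP}{dt} \]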
These observations anticipate the following important theorem known as the chain rule.
If y = f(u) is a differentiable function of u and u, in turn, is a differentiable function
of x, then y = f[u(x)] is a differentiable function of x, and its derivative is given by
the product
Chain Rule \[ \frac{dy}{dx} = \frac{dy}{du}\, \frac{du}{dx} \]
Equivalently,
\[ (f \circ u)'(x) = f'[u(x)]\, u'(x) \]
Proof. To prove the chain rule, define
\[ G(h) = \begin{cases} \dfrac{f[u(x + h)] - f[u(x)]}{u(x + h) - u(x)} & \text{if } u(x + h) \neq u(x) \\[2ex] f'[u(x)] & \text{otherwise} \end{cases} \]
It should be intuitive that G(h) is continuous at h = 0. (You will be asked to verify this statement in the problem
set.) With this observation in hand, the proof of the chain rule becomes straightforward. By the definition of the
derivative,
\begin{align*}
(f \circ u)'(x) &= \lim_{h \to 0} \frac{f[u(x + h)] - f[u(x)]}{h} && \text{Definition of derivative; note } h \neq 0. \\
&= \lim_{h \to 0} G(h) \cdot \frac{u(x + h) - u(x)}{h} && \text{Definition of } G \\
&= \lim_{h \to 0} G(h) \cdot \lim_{h \to 0} \frac{u(x + h) - u(x)}{h} && \text{Limit law for products} \\
&= G(0)\, u'(x) && \text{Continuity of } G \text{ at } 0 \text{ and differentiability of } u \text{ at } x \\
&= f'[u(x)]\, u'(x) && \text{Definition of } G
\end{align*}
□
Example 1. Life made easier
Let $y = (1 + 2x + x^3)^{101}$. Find $\frac{dy}{dx}$.
Solution. We view this as the composition of two functions: the “inner” function $u(x) = 1 + 2x + x^3$ and the
“outer” function $f(u) = u^{101}$. We can now use the chain rule:
\begin{align*}
\frac{dy}{dx} &= \frac{dy}{du} \cdot \frac{du}{dx} \\
&= 101\, u(x)^{100} (0 + 2 + 3x^2) \\
&= 101(1 + 2x + x^3)^{100} (2 + 3x^2)
\end{align*}
In practice, we usually do not write down a function u, but carry out the above process mentally and write:
\begin{align*}
y &= (1 + 2x + x^3)^{101} \\
\frac{dy}{dx} &= \underbrace{101(1 + 2x + x^3)^{100}}_{\text{derivative of outer function}} \; \underbrace{(2 + 3x^2)}_{\text{derivative of inner function}}
\end{align*}
□
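A numerical check of this chain rule computation is sketched below. Because the function values are very large, the comparison uses a relative error; the test point x0 = 0.1 and the step size are arbitrary choices made for this illustration.

```python
def y(x):
    """The composite function from Example 1."""
    return (1 + 2 * x + x**3) ** 101

def dy_dx(x):
    """Chain rule result: outer derivative times inner derivative."""
    return 101 * (1 + 2 * x + x**3) ** 100 * (2 + 3 * x**2)

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.1
exact = dy_dx(x0)
approx = central_difference(y, x0)
relative_error = abs(approx - exact) / abs(exact)
print(exact, approx, relative_error)
```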
Example 2. Escaping parasitism
Parasitoids, usually wasps or flies, are insects whose young develop on and eventually kill their host, typically
another insect. Parasitoids have been extremely successful in controlling insect pests especially in agriculture. To
better understand this success, theoreticians have extensively modeled host-parasitoid interactions. A key term in
these models is the so-called escape function f (x) that describes the fraction of hosts that escape parasitism when
the parasitoid density is x individuals per acre. If parasitoid attacks are randomly distributed amongst the hosts,
then the escape function is of the form $f(x) = e^{-ax}$ where a is the searching efficiency of the parasitoid.
Suppose that a population of parasitoids attacks alfalfa aphids with a searching efficiency of a = 0.01. If the
density of parasitoids is currently 100 wasps per acre and increasing at a rate of 20 wasps per acre per day, find the
rate at which the fraction of aphids escaping parasitism is changing.
Solution. Let time in days be denoted by the independent variable t. Since the density of wasps x(t) is changing
with time, the fraction of hosts that escape is a composition of two functions f[x(t)]. Hence, by the chain rule
\begin{align*}
\frac{df}{dt} &= \frac{df}{dx} \cdot \frac{dx}{dt} && \text{Chain rule} \\
&= (-0.01)e^{-0.01x} \cdot 20 && \frac{dx}{dt} = 20 \text{ is given} \\
&= -0.2e^{-0.01x}
\end{align*}
In this example, we seek $\frac{df}{dt}$ at time t = 0; noting that x(0) = 100, we evaluate
\[ \left.\frac{d}{dt} f[x(t)]\right|_{t=0} = -0.2e^{-0.01(100)} \approx -0.074. \]
Thus the fraction of hosts escaping is decreasing at a rate of 0.074 per day. □
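The arithmetic in this example is short enough to reproduce directly. In the sketch below the variable names are chosen for readability and are not notation from the text.

```python
import math

a = 0.01      # searching efficiency of the parasitoid
x0 = 100      # current parasitoid density (wasps per acre)
dx_dt = 20    # rate of change of density (wasps per acre per day)

# Chain rule: df/dt = f'(x) * dx/dt, with f(x) = exp(-a x)
df_dx = -a * math.exp(-a * x0)
df_dt = df_dx * dx_dt

print(round(df_dt, 3))
```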
In Example 2, we found the derivative of $f(x) = e^{-0.01x}$ by using Theorem 3.2, but we could also use the chain
rule. It is worthwhile to restate an extended derivative rule for a natural exponential function:
\[ \frac{d}{dx} e^u = e^u\, \frac{du}{dx} \]
We illustrate this idea with the following example.
Example 3. The Bell Shaped Curve
Consider the bell-shaped function
\[ f(x) = e^{-x^2} \]
a. Find f′ and determine where f is increasing and where it is decreasing.
b. Plot f and f′ on the same coordinate axes, and graphically verify the results from part a.
Solution.
a. Using the extended derivative rule for a natural exponential function with $u = -x^2$, we find
\[ f'(x) = e^{-x^2}(-2x) = -2xe^{-x^2} \]
Since $e^{-x^2} > 0$, the derivative is positive when x < 0, so the function f is increasing on (−∞, 0);
and it is negative when x > 0, so the function is decreasing on (0, ∞).
b. The graph of y = f (x) is shown in red and y = f ′ (x) in blue in Figure 3.6.
[Plot omitted: y = f(x) and y = f′(x) versus x]
Figure 3.6: Graph of f and f ′
We see that the derivative function (blue) is positive where the bell-shaped curve (red) is rising, and the
derivative function is negative where the bell-shaped curve is falling.
Example 4. Chain rule with graphs
Consider the functions y = f(x) and y = g(x) whose graphs, in red and blue respectively, are shown in Figure 3.7.
[Plot omitted: y = f(x) and y = g(x) versus x]
Figure 3.7: Graph of y = f (x) in red and y = g(x) in blue
Find
\[ (f \circ g)'(-0.5) \]
Solution. By the chain rule,
\[ (f \circ g)'(-0.5) = f'[g(-0.5)]\, g'(-0.5) \]
By inspection, g(−0.5) = 2. To find the derivative of g at −0.5, we note that g is linear on the interval [−1, 0], so the
derivative is the slope of the line segment, which we find by rise/run = 2/1 = 2. Thus, g′(−0.5) = 2.
To find the derivative of f at g(−0.5) = 2, we note that f (red curve) is linear on [0, 2] and has slope m = rise/run
= 8/2 = 4. Thus f′[g(−0.5)] = f′(2) = 4.
We conclude that
\[ \left.\frac{d}{dx} f(g(x))\right|_{x=-0.5} = f'(g(-0.5))\, g'(-0.5) = 4 \cdot 2 = 8 \]
□
Implicit Differentiation
The equation $y = \sqrt{25 - x^2}$ explicitly defines $f(x) = \sqrt{25 - x^2}$ as a function of x for −5 ≤ x ≤ 5, but the same
function can also be defined implicitly by the equation $x^2 + y^2 = 25$, as long as we restrict y by 0 ≤ y ≤ 5 so the
vertical line test is satisfied. To find the derivative of the explicit form, we use the chain rule:
\begin{align*}
\frac{d}{dx} \sqrt{25 - x^2} &= \frac{d}{dx} (25 - x^2)^{1/2} \\
&= \frac{1}{2}(25 - x^2)^{-1/2}(-2x) && \text{Chain rule with } f(u) = u^{1/2} \text{ and } u(x) = 25 - x^2 \\
&= \frac{-x}{\sqrt{25 - x^2}}
\end{align*}
To obtain the derivative of the same function in its implicit form, we simply differentiate across the equation
$x^2 + y^2 = 25$, remembering that y is a function of x:
\begin{align*}
\frac{d}{dx}(x^2 + y^2) &= \frac{d}{dx}(25) && \text{Differentiate both sides} \\
2x + 2y\, \frac{dy}{dx} &= 0 && \text{Chain rule for the derivative of } y \\
\frac{dy}{dx} &= -\frac{x}{y} && \text{Solve for } \frac{dy}{dx}. \\
&= -\frac{x}{\sqrt{25 - x^2}} && \text{Write as a function of } x \text{, if desired.}
\end{align*}

The procedure we have just illustrated is called implicit differentiation.
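As a quick numerical check, the following Python sketch compares the explicit formula, the implicit formula, and a symmetric difference quotient at the point (3, 4). The function names and the step size h are our own choices for illustration and do not come from the text.

```python
import math

def upper_semicircle(x):
    """Explicit form y = sqrt(25 - x^2), valid for -5 <= x <= 5."""
    return math.sqrt(25 - x**2)

def explicit_slope(x):
    """dy/dx from the explicit chain-rule computation: -x / sqrt(25 - x^2)."""
    return -x / math.sqrt(25 - x**2)

def implicit_slope(x, y):
    """dy/dx from implicit differentiation of x^2 + y^2 = 25: -x / y."""
    return -x / y

def central_difference(f, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 3.0
y0 = upper_semicircle(x0)  # the point (3, 4) on the upper semicircle
slope_explicit = explicit_slope(x0)
slope_implicit = implicit_slope(x0, y0)
slope_numeric = central_difference(upper_semicircle, x0)
print(slope_explicit, slope_implicit, slope_numeric)
```

The three numbers should agree to several decimal places, which is what we expect when both forms describe the same curve near the point.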
Example 5. Circular tangents
Consider a circle of radius 5 centered at the origin. Find the equation of the tangent line of this circle at (3, 4).
Figure 3.8: Tangent line (red) to a given circle
Solution. The equation of this circle is

x2 + y 2 = 25
We recognize that this circle is not the graph of a function. However, if we look at a small neighborhood around the
point (3, 4), as shown in Figure 3.8, we see that this part of the graph does pass the vertical line test for functions.
Thus, the required slope of the tangent line can be found by evaluating the derivative dy/dx at (3, 4). We have found that
\[ \frac{dy}{dx} = -\frac{x}{y} \]
so that the slope of the tangent at P (3, 4) is
\[ \frac{dy}{dx} = -\frac{3}{4}. \]
Thus the equation of the tangent line is
\[
\begin{aligned}
y - 4 &= -\frac{3}{4}(x - 3) \\
y &= -\frac{3}{4}x + \frac{25}{4}
\end{aligned}
\]
□
More generally, given any equation involving x and y, we can differentiate both sides of the equation, use the chain rule, and solve for dy/dx. This becomes particularly important when one cannot (or cannot easily) express y in terms of x explicitly, as illustrated by the following example.
Example 6. Limaçon of Pascal

The limaçon of Pascal is a famous curve that is defined by the set of points that satisfy
\[ (x^2 + y^2 - 2x)^2 = x^2 + y^2 \]
Figure 3.9: The Limaçon of Pascal
The graph is shown in Figure 3.9. This curve was discovered by Étienne Pascal who was the father of the more
famous Blaise Pascal. The name limaçon comes from the Latin limax which means “a snail.” Find the equation for
the tangent line at the point (0, 1).
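Solution. To find the slope of the tangent line, we differentiate both sides implicitly and then evaluate at (0, 1).
\[
\begin{aligned}
(x^2 + y^2 - 2x)^2 &= x^2 + y^2 && \text{Given equation} \\
\frac{d}{dx}(x^2 + y^2 - 2x)^2 &= \frac{d}{dx}(x^2 + y^2) && \text{Differentiate both sides with respect to } x. \\
2(x^2 + y^2 - 2x)\left(2x + 2y\frac{dy}{dx} - 2\right) &= 2x + 2y\frac{dy}{dx} && \text{Chain rule} \\
2(0^2 + 1^2 - 2 \cdot 0)\left(2 \cdot 0 + 2 \cdot 1\,\frac{dy}{dx} - 2\right) &= 2(0) + 2(1)\frac{dy}{dx} && \text{Evaluate at } (0, 1). \\
2(1)\left(2\frac{dy}{dx} - 2\right) &= 2\frac{dy}{dx} && \text{Simplify.} \\
4\frac{dy}{dx} - 4 &= 2\frac{dy}{dx} \\
2\frac{dy}{dx} &= 4 \\
\frac{dy}{dx} &= 2 && \text{Solve for } dy/dx.
\end{aligned}
\]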
Hence the slope of the tangent line is 2 and the tangent line is
y−1 = 2(x − 0)
y = 2x + 1
The tangent line is shown in Figure 3.10.
□
Figure 3.10: Limaçon with tangent at (0, 1)
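The slope found in Example 6 can also be checked numerically. The sketch below solves the chain-rule line for dy/dx at a general point of the curve; this rearrangement and the function names are ours, introduced only for illustration. It then evaluates the result at (0, 1).

```python
def limacon_residual(x, y):
    """Left side minus right side of (x^2 + y^2 - 2x)^2 = x^2 + y^2."""
    return (x**2 + y**2 - 2*x)**2 - (x**2 + y**2)

def limacon_slope(x, y):
    """dy/dx obtained by solving the chain-rule line for dy/dx.

    With A = x^2 + y^2 - 2x, the line 2A(2x + 2y y' - 2) = 2x + 2y y'
    rearranges to y' = (2A(x - 1) - x) / (y(1 - 2A)).
    The formula is undefined wherever the denominator vanishes.
    """
    a = x**2 + y**2 - 2*x
    return (2*a*(x - 1) - x) / (y*(1 - 2*a))

x0, y0 = 0.0, 1.0
slope = limacon_slope(x0, y0)
print(limacon_residual(x0, y0), slope)

# A point a small step along the tangent line should nearly satisfy the equation.
step = 1e-3
print(limacon_residual(x0 + step, y0 + slope * step))
```

A residual of zero confirms that (0, 1) lies on the curve, and the small residual after a short step along the tangent line is consistent with the line touching the curve there.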
Derivatives of Logarithms
Implicit differentiation allows us to easily find the derivative of logarithms and power functions.
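Derivative of Natural Logarithm
If y = ln x, then
\[ \frac{dy}{dx} = \frac{1}{x} \]
Proof.
\[
\begin{aligned}
y &= \ln x && \text{Given function} \\
e^y &= x && \text{Definition of logarithm} \\
\frac{d}{dx}e^y &= \frac{d}{dx}(x) && \text{Derivative of both sides} \\
e^y\frac{dy}{dx} &= 1 && \text{Chain rule} \\
\frac{dy}{dx} &= \frac{1}{e^y} && \text{Solve for } dy/dx \\
&= \frac{1}{x} && \text{Substitute}
\end{aligned}
\]
□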
In the problem set you are asked to prove the more general statement of this theorem, namely that if y = ln |x| then
\[ \frac{dy}{dx} = \frac{1}{x}. \]
Example 7. Clearance of Acetaminophen
A group of scientists estimated that the clearance rate of the drug Acetaminophen in the blood stream of an average adult is 0.28 per hour.∗ This means that after an initial dose of Acetaminophen at time t = 0, the fraction of Acetaminophen in the blood t hours later is $e^{-0.28t}$.
a. Find the time, T , it takes for a fraction x of the drug to clear the body.
b. Find and interpret $\frac{dT}{dx}\Big|_{x=1/2}$.
∗ Ritschel, W.A., Handbook of Basic Pharmacokinetics, 2nd Ed., Drug Intelligence Publications, 1980, pp. 413-426. Also see
Solution.
a. Since $e^{-0.28t}$ is the fraction of drug remaining in the body and x is the fraction that has left the body, we need to solve
\[
\begin{aligned}
1 - x &= e^{-0.28T} \\
\ln(1 - x) &= -0.28T \\
T &= -3.57\ln(1 - x)
\end{aligned}
\]
b. We use the results of part a to find
\[
\frac{dT}{dx}\Big|_{x=1/2} = \frac{(-3.57)(-1)}{1 - x}\Big|_{x=1/2} = \frac{3.57}{1 - 0.5} = 7.14
\]
Thus the time it takes to clear an extra percentage of the drug, given that 50% (x = 1/2 is 50%) has cleared, is approximately 7.14 · 0.01 = 0.0714 hr, or 4 min and 17 sec.
□
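The interpretation in part b can be tested directly. In the sketch below we compare the derivative-based estimate with the actual extra time needed to go from 50% to 51% cleared. The function names are hypothetical, and the code uses the unrounded factor 1/0.28 instead of 3.57, so its output differs slightly from the rounded values above.

```python
import math

CLEARANCE_RATE = 0.28  # per hour, as quoted in Example 7

def clearance_time(x):
    """Time T (hours) for a fraction x of the drug to clear: T = -ln(1 - x) / 0.28."""
    return -math.log(1 - x) / CLEARANCE_RATE

def clearance_time_rate(x):
    """dT/dx = 1 / (0.28 (1 - x)), from the chain rule and the derivative of ln."""
    return 1 / (CLEARANCE_RATE * (1 - x))

rate_at_half = clearance_time_rate(0.5)
estimated_extra_hours = rate_at_half * 0.01  # linear estimate for one more percent
actual_extra_hours = clearance_time(0.51) - clearance_time(0.50)
print(rate_at_half, estimated_extra_hours, actual_extra_hours)
```

The estimate and the actual value should be close but not identical, because the derivative only describes the change over a very small increase in x.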
If the logarithm has a base other than the natural base e, then we use the following result, which you are asked to verify in the problem set:
Derivative of General Logarithm
If b is a positive number (other than 1) then
\[ \frac{d}{dx}\log_b x = \frac{1}{x\ln b} \]
Example 8. Derivative of a log with base 2
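Differentiate $f(x) = x\log_2 x$.
Solution.
\[
\begin{aligned}
f'(x) &= \left(\frac{d}{dx}x\right)\log_2 x + x\,\frac{d}{dx}\log_2 x && \text{by the product rule} \\
&= (1)\log_2 x + x \cdot \frac{1}{x\ln 2} && \text{by derivative of general logarithm} \\
&= \log_2 x + \frac{1}{\ln 2}
\end{aligned}
\]
□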
In Section 3.1, we stated the power rule and promised to prove it for all real numbers later in this chapter. We
do this in the following example.
http://www.boomer.org/c/p1/Ch04/Ch0405.html

Example 9. Power law for positive real numbers
Consider $y = x^n$ where x > 0 and n is any real number other than 0. Prove that
\[ \frac{dy}{dx} = nx^{n-1} \]
Solution. We will prove this by taking the natural logarithm of both sides and then differentiating.
\[
\begin{aligned}
y &= x^n && \text{Given equation} \\
\ln y &= \ln x^n && \text{Take the natural logarithm of both sides.} \\
\ln y &= n\ln x && \text{Property of logarithms} \\
\frac{1}{y}\frac{dy}{dx} &= n\,\frac{1}{x} && \text{By chain rule and derivative of natural logarithm} \\
\frac{dy}{dx} &= n\,\frac{y}{x} && \text{Solve for } \tfrac{dy}{dx} \\
&= n\,\frac{x^n}{x} && \text{Since } y = x^n \\
&= nx^{n-1} && \text{Property of exponents}
\end{aligned}
\]
Example 10. A modeling problem using the chain rule
An environmental study of a certain suburban community suggests that when the population is p thousand people, the amount of carbon monoxide in the air can be modeled by the function
\[ C(p) = \sqrt{0.5p^2 + 17} \]
where C is measured in parts per million. The population (in thousands) at various times (in years) for the last three years is given in Table 3.1.
Table 3.1: Population as a function of time

Time t Population p(t)
0 4.6696
1/4 4.6717
1/2 4.6779
3/4 4.6884
1 4.7032
1.25 4.7225
1.5 4.7463
1.75 4.7751
2 4.8088
2.25 4.8479
2.5 4.8926
2.75 4.9432
3 5.0000

a. Which of the following functions models the population data most accurately? (Note how we use subscripts to distinguish between the values predicted by the different models; once the best model is selected we will rename the left-hand side p.)
Linear: $y_l = 0.109t + 4.618$
Quadratic: $y_q = 0.039t^2 - 0.009t + 4.672$
Exponential: $y_e = 4.620e^{0.023t}$
b. Suppose the model you have selected in part a as best fitting the population data continues to apply for a decade (0 ≤ t ≤ 10). At what rate will the level of pollution be changing at the end of 3 years?
Solution.
a. To determine which of the three proposed models best fits the population data, we plot the graphs shown
in Figure 3.11.
Figure 3.11: Data fitted by the three functions given in part a of Example 10, respectively from left to right: Linear,
Quadratic, Exponential.
The rates of change of the population for these models are calculated by finding the derivatives:
Linear: $y_l' = 0.109$; at t = 3, $y_l' = 0.109$
Quadratic: $y_q' = 0.078t - 0.009$; at t = 3, $y_q' = 0.225$
Exponential: $y_e' = 0.10626e^{0.023t}$; at t = 3, $y_e' = 0.114$
All three graphs fit the data fairly well, but the predicted rates of change differ by a factor of about 2 at t = 3. It appears from a visual inspection that the best fit is the graph of the quadratic function∗, which after renaming the left-hand side is:
\[ p(t) = 0.039t^2 - 0.009t + 4.672 \]
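The visual comparison in part a can be supplemented with the sum-of-squares measure mentioned in the footnote. The following Python sketch is one way to compute it for the data in Table 3.1; the variable and function names are ours, and the text itself relies only on visual inspection.

```python
import math

# Table 3.1: quarterly times t (years) and populations p(t)
times = [0.25 * k for k in range(13)]
population = [4.6696, 4.6717, 4.6779, 4.6884, 4.7032, 4.7225, 4.7463,
              4.7751, 4.8088, 4.8479, 4.8926, 4.9432, 5.0000]

# The three candidate models from part a
models = {
    "linear": lambda t: 0.109 * t + 4.618,
    "quadratic": lambda t: 0.039 * t**2 - 0.009 * t + 4.672,
    "exponential": lambda t: 4.620 * math.exp(0.023 * t),
}

def sum_of_squares(model):
    """Sum of squared differences between model predictions and the data."""
    return sum((model(t) - p) ** 2 for t, p in zip(times, population))

sse = {name: sum_of_squares(f) for name, f in models.items()}
for name, value in sse.items():
    print(f"{name:12s} sum of squares = {value:.6f}")
```

The model with the smallest sum of squares tracks the tabulated values most closely, which gives a numerical counterpart to the visual judgment.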
b. By substituting the quadratic population function p(t) selected in part a into the researcher’s pollution function C(p), we can represent the level of pollution as C[p(t)], a composite function of time. Applying the chain rule, we find that
\[
\begin{aligned}
\frac{dC}{dt} &= \frac{dC}{dp}\,\frac{dp}{dt} \\
&= \left[\frac{d}{dp}(0.5p^2 + 17)^{1/2}\right]\left[\frac{d}{dt}\left(0.039t^2 - 0.009t + 4.672\right)\right] \\
&= \left[\frac{1}{2}(0.5p^2 + 17)^{-1/2}(0.5)(2p)\right]\left[0.039(2t) - 0.009\right] \\
&= 0.5p\,(0.5p^2 + 17)^{-1/2}(0.078t - 0.009)
\end{aligned}
\]
∗ How well functions fit data can be assessed using a sum-of-squares measure. Further, the general quadratic function $y_q = ax^2 + bx + c$ has three parameters while the general linear and exponential functions $y_l = ax + b$ and $y_e = ae^{bx}$ only contain two parameters each. Hence these two functions have less freedom to fit a curve to data than the quadratic function. These issues traditionally are not explored in calculus texts, but rather in statistical texts. Thus our treatment here is informal and purely visual.

When t = 3, $p(3) = 0.039(3)^2 - 0.009(3) + 4.672 = 4.996$ and
\[
\frac{dC}{dt}\Big|_{t=3} = 0.5(4.996)\left[0.5(4.996)^2 + 17\right]^{-1/2}\left[0.078(3) - 0.009\right] \approx 0.104
\]
Thus, our analysis suggests that after 3 years, the level of pollution is increasing at the rate of 0.104 parts per million per year.
□
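The chain-rule computation in part b can be reproduced in a few lines of Python. In this sketch the function names are ours, and the symmetric difference quotient at the end is an added sanity check, not part of the example.

```python
import math

def population(t):
    """Quadratic population model selected in part a."""
    return 0.039 * t**2 - 0.009 * t + 4.672

def population_rate(t):
    """dp/dt for the quadratic model."""
    return 0.078 * t - 0.009

def pollution(pop):
    """Carbon monoxide level C(p) = sqrt(0.5 p^2 + 17), in ppm."""
    return math.sqrt(0.5 * pop**2 + 17)

def pollution_rate_wrt_population(pop):
    """dC/dp = 0.5 p (0.5 p^2 + 17)^(-1/2)."""
    return 0.5 * pop / math.sqrt(0.5 * pop**2 + 17)

def pollution_rate(t):
    """dC/dt = (dC/dp)(dp/dt), evaluated at p = p(t)."""
    return pollution_rate_wrt_population(population(t)) * population_rate(t)

chain_rule_value = pollution_rate(3.0)

# Independent check: difference quotient of the composite function C(p(t)).
h = 1e-6
numeric_value = (pollution(population(3.0 + h)) - pollution(population(3.0 - h))) / (2 * h)
print(chain_rule_value, numeric_value)
```

Agreement between the two printed values supports the chain-rule formula, since the second number never uses the derivative formulas at all.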
Problem Set 3.3

Use the chain rule to compute the derivative dy/dx for the functions given in Problems 1 to 4.
1. y = u2 + 1; u = 3x − 2
2. y = 2u2 − u + 5; u = 1 − x2
2
3. y = u2 ; u = x2 − 9
4. y = u2 ; u = ln x
Differentiate each function in Problems 5 to 8 with respect to the given variable of the function.
5. a. g(u) = u5
b. u(x) = 3x − 1
c. f (x) = (3x − 1)5
6. a. g(u) = u3
b. u(x) = x2 + 1
c. f (x) = (x2 + 1)3
7. a. g(u) = u15
b. u(x) = 3x2 + 5x − 7
c. f (x) = (3x2 + 5x − 7)15
8. a. g(u) = u7
b. u(x) = 5 − 8x − 12x2
c. f (x) = (5 − 8x − 12x2 )7
Differentiate each function in Problems 9 to 18.
9. y = (5 − x + x4 )9
10. $y = e^{2+x^2}$
11. $y = \frac{1}{(1+x-x^5)^{11}}$
12. $y = e^{(x+1)^7}$
13. $y = \ln x^2$
14. $y = (2x + 12)^{\pi}$
15. $y = \ln(2x + 5)$
16. $y = xe^{-x^2}$
17. y = (x4 − 1)10 (2x4 + 3)7
18. $y = \sqrt{\dfrac{x^3 - x}{4 - x^2}}$
Find dy/dx by implicit differentiation in 19 to 25.
19. x2 + y = x3 + y 3
20. xy = 25
21. xy(2x + 3y) = 2
22. $\frac{1}{y} + \frac{1}{x} = 1$
23. (2x + 3y)2 = 10

24. ln(xy) = e2x
25. exy + ln y 2 = x
26. Consider the functions y = f (x) and y = g(x) whose graphs, in red and blue respectively, are shown in Figure 3.12.
Figure 3.12: Functions y = f (x) (red) and y = g(x) (blue)
a. Find $\frac{d}{dx} f[g(x)]\Big|_{x=2}$
b. Find $\frac{d}{dx} g[f(x)]\Big|_{x=2}$

27. The graphs of u = g(x) and y = f (u) are shown in Figure 3.13.
a. Find the approximate value of u at x = 2. What is the slope of the tangent line at that point?
b. Find the approximate value of y at x = 2. What is the slope of the tangent line at that point?
c. Find the slope of y = f [g(x)] at x = 1.
28. Let g(x) = f [u(x)], where u(−3) = 5, u′ (−3) = 2, f (5) = 3, and f ′ (5) = −3. Find an equation for the tangent
to the graph of g at the point where x = −3.
Figure 3.13: Chain rule with graphs (panels: u = g(x) and y = f (u))
29. Let f be a function for which
\[ f'(x) = \frac{1}{x^2 + 1} \]
a. If g(x) = f (3x − 1), what is g ′ (x)?
b. If $h(x) = f\left(\frac{1}{x}\right)$, what is h′ (x)?
30. The cissoid of Diocles is a curve of the general form represented by the following particular equation
y 2 (6 − x) = x3
as illustrated in Figure 3.14.
Figure 3.14: Cissoid of Diocles
Find the equation of the tangent line to this graph at (3, 3).
31. The folium of Descartes is a curve of the general form represented by the following particular equation
\[ x^3 + y^3 - \frac{9}{2}xy = 0 \]
as illustrated in Figure 3.15.
Find the equation of the tangent line to this graph at (2, 1).
32. Another version of the folium of Descartes is given by the equation
x3 + y 3 = 3xy
as illustrated in Figure 3.16

Find at what points the tangent line is horizontal.

Figure 3.15: Folium of Descartes
Figure 3.16: Folium of Descartes
33. The bicorn (also called the cocked-hat) is a quartic curve studied by mathematician James Sylvester (1814-1897)
in 1864. It is given by the set of points that satisfy the equation
y 2 (1 − x2 ) = (x2 + 2y − 1)2
as illustrated in Figure 3.17
Figure 3.17: Bicorn curve
Find the formulas for the two tangent lines at x = 0.

34. Prove that if f is differentiable at u(a) and u is continuous at a, then
\[
G(h) =
\begin{cases}
\dfrac{f[u(a+h)] - f[u(a)]}{u(a+h) - u(a)} & \text{if } u(a+h) - u(a) \neq 0 \\[2ex]
f'[u(a)] & \text{otherwise}
\end{cases}
\]
is continuous at h = 0.
35. Prove that $\frac{d}{dx}\log_b x = \frac{1}{x\ln b}$ for b > 0, b ≠ 1, x > 0.
36. Arteriosclerosis develops when plaque forms in the arterial walls of a patient, blocking the flow of blood, which,
in turn, often leads to heart attack or stroke. Model the cross-section of an artery as a circle with radius R
cm, and assume that plaque is deposited in such a way that when the patient is t years old, it is p(t) cm thick,
where
p(t) = R[1 − 0.009(12, 350 − t2 )1/2 ]
Find the rate at which the cross sectional area covered by plaque is changing with respect to time in a sixty-
year-old patient.
37. In a classic paper, V. A. Tucker and K. Schmidt-Koenig modeled the energy expended by a species of Australian parakeet during flight by the function
\[ E(v) = \frac{0.074(v - 35)^2 + 32}{v} \]
where v is the bird’s velocity (km/h).∗
Australian blue Red-rumped parakeet (Psephotus haematonotus)
a. Find a formula for the rate of change of energy with respect to v.

b. At what velocity, v, is the energy expenditure neither increasing nor decreasing? Discuss the impor-
tance of this velocity.
38. In Example 4 in Section 2.4, we found that the killing rate of wolves in North America could be modeled by
\[ \frac{3.36x}{0.46 + x} \]
where x is measured in number of moose per km². If the current moose density is x = 0.5 and increasing at a rate of 0.1 per year, determine at what rate the killing rate is increasing.
∗ V.A. Tucker and K. Schmidt-Koenig, “Flight of Birds in Relation to Energetics and Wind Directions,” The Auk, V. 88 (1971), pp.
97-107.

39. In Problem 39 in Section 2.4, we modeled how wolf densities in North America depend on moose densities with the following function
\[ \frac{58.7(x - 0.03)}{0.76 + x} \]
where x is the number of moose per km². If the current moose density is x = 2.0 and decreasing at a rate of 0.2 per year, determine the rate of change of the wolf density.
40. The proportion of a species of aphid that escapes parasitism is
\[ f(x) = e^{-0.02x} \]
where x is the density of parasitoids. If the density of parasitoids is currently 10 wasps per acre and decreasing at a rate of 20 wasps per acre per day, find at what rate the likelihood of escaping parasitism is changing.
41. Suppose that the average daily level of carbon monoxide, measured over time in a study, is shown in the following table:
Time (in yr) Parts per million

start 8.185
0.25 8.721
0.50 9.083
0.75 9.343
1.00 9.539
1.25 9.692
1.50 9.815
1.75 9.916
2.00 10.000
a. Formulate a model, as in Example 10, to find the rate at which the level of carbon monoxide will be
changing with respect to time two years from now.
b. Suppose a researcher models this data by
\[ L(p) = 0.5\sqrt{p^2 + p + 58} \]
where L is the carbon monoxide level in parts per million (ppm) when the population is p thousand. Furthermore, suppose that t years from now, the population of a certain suburban community is modeled by the formula
\[ p(t) = 20 - \frac{6}{t + 1} \]
where p(t) is in thousands of people. Using this model, find the rate at which the level of carbon monoxide will be changing with respect to time two years from now and compare the results with your model.

3.4 Trigonometric Derivatives

Many physical and biological processes change periodically over time and consequently are represented by a
periodic function. A powerful result in mathematical analysis proves that periodic functions can be represented as a
sum of sines and cosines. In this section, we find the derivative of these fundamental functions and their functional
relatives, tangent, secant, cotangent and cosecant. In calculus, we assume that the trigonometric functions are
functions of real numbers or of angles measured in radians. We make this assumption because the trigonometric
differentiation formulas rely on limit formulas that become more complicated if degree measurement is used instead
of radian measure.
Derivative of Sine and Cosine
Before stating the theorem that gives the derivative of sine and cosine, suppose we look at the graph of the difference
quotient. Consider f (x) = sin x. Then
\[ \frac{\sin(x + h) - \sin x}{h} = \frac{\sin(x + 0.01) - \sin x}{0.01} \]
for a “small” value h = 0.01 has the following graph:
From the graph of this difference quotient, it appears that the derivative of f (x) = sin x is f ′ (x) = cos x. This
relationship makes sense as illustrated in Figure 3.18. Sine is increasing on intervals where cosine is positive and has
turning points where cosine is zero.
Figure 3.18: Graphs of cosine (blue) and sine (red)
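The claim that the difference quotient of sine looks like cosine can be explored numerically. The sketch below evaluates the quotient with h = 0.01 at a handful of sample points on [−3, 3] and records how far it strays from cos x. The sample grid and the names are our own choices for illustration.

```python
import math

def sine_difference_quotient(x, h=0.01):
    """(sin(x + h) - sin x) / h, the difference quotient discussed above."""
    return (math.sin(x + h) - math.sin(x)) / h

sample_points = [-3 + 0.5 * k for k in range(13)]  # -3, -2.5, ..., 3
gaps = [abs(sine_difference_quotient(x) - math.cos(x)) for x in sample_points]
max_gap = max(gaps)

for x, gap in zip(sample_points, gaps):
    print(f"x = {x:5.2f}  quotient = {sine_difference_quotient(x):9.6f}  "
          f"cos x = {math.cos(x):9.6f}  gap = {gap:.6f}")
```

Rerunning the loop with a smaller h should shrink every gap, which is the numerical face of taking the limit as h tends to 0.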
Before verifying this assertion, we need to find two important limits:
\[ \lim_{x \to 0}\frac{\sin x}{x} = 1 \quad\text{and}\quad \lim_{x \to 0}\frac{\cos x - 1}{x} = 0 \]
Using technology, we will illustrate the first limit in the following example, and will leave the second for you in the problem set. Problem 26 in the problem set finds the first limit using a rigorous geometric argument.
Example 1. An important trigonometric limit
Find $\lim_{x \to 0}\frac{\sin x}{x}$ numerically and graphically using technology.
Solution. We note that
\[ \frac{\sin(-x)}{-x} = \frac{-\sin x}{-x} = \frac{\sin x}{x} \]
so the left- and right-hand limits should be the same. Thus, for the numerical approach, we consider a table for x-values approaching 0 from the right.
x        (sin x)/x
1/10     0.998334
1/20     0.999583
1/30     0.999815
1/40     0.999896
1/50     0.999933
1/60     0.999954
1/70     0.999966
1/80     0.999974
1/90     0.999979
1/100    0.999983
1/120    0.999988
Note that the numbers in this table appear to be approaching 1 as x tends toward 0 from the right (x > 0). Thus, we might infer from the table that
\[ \lim_{x \to 0}\frac{\sin x}{x} = 1. \]
The conclusion from the numerical approach is confirmed by the graph in Figure 3.19.
Figure 3.19: A graphing calculator does not show that there is a “hole” at x = 0
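The table in Example 1 is easy to regenerate. The following sketch uses Python's math module as the "technology"; the list of denominators mirrors the table, and the variable names are ours.

```python
import math

denominators = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120]

# Each row pairs x = 1/n with the ratio (sin x)/x.
rows = []
for n in denominators:
    x = 1 / n
    rows.append((n, math.sin(x) / x))

for n, ratio in rows:
    print(f"x = 1/{n:<4d} (sin x)/x = {ratio:.6f}")
```

Note that the code never evaluates the ratio at x = 0 itself, where the expression is undefined; it only approaches 0 from the right, as the table does.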
We can now state the derivative rule for the sine and cosine functions.
Derivative Rules for Sine and Cosine
The functions sin x and cos x are differentiable for all x and
\[ \frac{d}{dx}\sin x = \cos x \quad\text{and}\quad \frac{d}{dx}\cos x = -\sin x \]

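Proof. We will prove the first derivative formula using the trigonometric identity
\[ \sin(x + h) = \sin x\cos h + \cos x\sin h \]
and the definition of derivative. For a fixed x:
\[
\begin{aligned}
\frac{d}{dx}\sin x &= \lim_{h \to 0}\frac{\sin(x + h) - \sin x}{h} \\
&= \lim_{h \to 0}\frac{\sin x\cos h + \cos x\sin h - \sin x}{h} \\
&= \lim_{h \to 0}\frac{\sin x(\cos h - 1) + \cos x\sin h}{h} \\
&= \lim_{h \to 0}\left[\sin x\,\frac{\cos h - 1}{h} + \cos x\,\frac{\sin h}{h}\right] \\
&= \lim_{h \to 0}\sin x\,\frac{\cos h - 1}{h} + \lim_{h \to 0}\cos x\,\frac{\sin h}{h} \\
&= \sin x\lim_{h \to 0}\frac{\cos h - 1}{h} + \cos x\lim_{h \to 0}\frac{\sin h}{h} \\
&= (\sin x)(0) + (\cos x)(1) \\
&= \cos x
\end{aligned}
\]
To find the derivative of cosine, we use the trigonometric identities
\[ \cos x = \sin\left(x + \frac{\pi}{2}\right) \quad\text{and}\quad \cos\left(x + \frac{\pi}{2}\right) = -\sin x \]
and the chain rule.
\[
\begin{aligned}
\frac{d}{dx}\cos x &= \frac{d}{dx}\sin\left(x + \frac{\pi}{2}\right) \\
&= \cos\left(x + \frac{\pi}{2}\right) \\
&= -\sin x
\end{aligned}
\]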
Example 2. Derivatives involving sine and cosine functions
Differentiate the given functions.
a. f (x) = sin 2x
b. $f(x) = x^2\sin x$
c. $f(x) = \dfrac{\sqrt{x}}{\cos x}$
Solution.
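a. Setting u = 2x and y = f (u) = sin u, the chain rule implies that
\[
\begin{aligned}
\frac{dy}{dx} &= f'(u)\frac{du}{dx} \\
&= (\cos u)\,2 \\
&= 2\cos(2x)
\end{aligned}
\]
b. By the product rule,
\[
\begin{aligned}
f'(x) &= \sin x\,\frac{d}{dx}x^2 + x^2\,\frac{d}{dx}\sin x \\
&= 2x\sin x + x^2\cos x
\end{aligned}
\]
c.
\[
\begin{aligned}
f'(x) &= \frac{d}{dx}\,\frac{x^{1/2}}{\cos x} \\
&= \frac{\cos x\,\frac{d}{dx}(x^{1/2}) - x^{1/2}\,\frac{d}{dx}\cos x}{\cos^2 x} && \text{Quotient rule} \\
&= \frac{\frac{1}{2}x^{-1/2}\cos x - x^{1/2}(-\sin x)}{\cos^2 x} && \text{Power rule} \\
&= \frac{\frac{1}{2}x^{-1/2}(\cos x + 2x\sin x)}{\cos^2 x} && \text{Common factor} \\
&= \frac{\cos x + 2x\sin x}{2\sqrt{x}\cos^2 x}
\end{aligned}
\]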
Example 3. Periodic populations
Many populations live in environments that change in a periodic fashion with time (e.g. diurnal and seasonal cycles). To understand the dynamics of an algal population growing in a climate chamber set to have a particular light/dark cycle, Professor Gut at Bezerkeley University conducted a series of experiments (the results of some of which can be found in the problem set). In one set of experiments, he found that the algae abundance (in cells per liter) over time t in hours was given by
\[ N(t) = 10{,}000\,e^{\sin t}. \]
a. Verify that N (t) satisfies the relationship
\[ N'(t) = \cos t \; N(t) \]
Explain what this relationship means. At what times is the light intensity greatest?
b. Determine at what times the population is increasing and at what times the population is decreasing.
Solution.
a. Taking the derivative of N (t) with the chain rule, we get
\[ N'(t) = 10{,}000\,e^{\sin t}\,\frac{d}{dt}\sin t = 10{,}000\,e^{\sin t}\cos t \]
Hence, by the definition of N (t), we have
\[ N'(t) = \cos t \; N(t) \]
We can interpret cos t as the per-capita growth rate of the population. This per-capita growth is greatest (i.e. equals one) at t = 0, ±2π, ±4π, . . . Hence at these moments of time the light intensity must be the greatest.
b. The population increases when N ′ (t) > 0. This occurs when cos t > 0, in other words when t is in the intervals (0, π/2), (3π/2, 5π/2), . . . On the complementary intervals, the population is decreasing: that is, more algal cells are dying than are dividing on these complementary intervals.
Example 4. Rate of change of CO2
In Section 1.3, we initially approximated the concentration of CO2 (in ppm) at the Mauna Loa observatory of Hawaii with the function
\[ h(t) = 329.3 + 0.1225\,t + 3\cos\left(\frac{\pi t}{6}\right) \]
Find h′ (3) and compare to the approximation found in Example 2 in Section 2.1.
Solution.

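\[
\begin{aligned}
h(t) &= 329.3 + 0.1225\,t + 3\cos\left(\frac{\pi t}{6}\right) && \text{Given function} \\
h'(t) &= 0 + 0.1225 + 3\left[-\sin\left(\frac{\pi t}{6}\right)\right]\frac{\pi}{6} && \text{Elementary derivatives and chain rule} \\
&= 0.1225 - \frac{\pi}{2}\sin\left(\frac{\pi t}{6}\right)
\end{aligned}
\]
Evaluating at t = 3 yields
\[ h'(3) = 0.1225 - \frac{\pi}{2}\sin\frac{\pi}{2} = 0.1225 - \frac{\pi}{2} \approx -1.4483 \]
This agrees with our numerical solution for Example 2 in Section 2.1.
□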
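For readers who want to repeat the comparison, here is a short Python sketch that evaluates the derivative formula at t = 3 and sets it beside a symmetric difference quotient of h. The function names and step size are assumptions made for this illustration; they are not the method used in Section 2.1.

```python
import math

def co2(t):
    """Model concentration h(t) = 329.3 + 0.1225 t + 3 cos(pi t / 6), in ppm."""
    return 329.3 + 0.1225 * t + 3 * math.cos(math.pi * t / 6)

def co2_rate(t):
    """h'(t) = 0.1225 - (pi / 2) sin(pi t / 6), from the chain rule."""
    return 0.1225 - (math.pi / 2) * math.sin(math.pi * t / 6)

def central_difference(f, t, h=1e-5):
    """Symmetric difference quotient approximating f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

exact_rate = co2_rate(3.0)
numeric_rate = central_difference(co2, 3.0)
print(exact_rate, numeric_rate)
```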
Derivatives of Other Trigonometric Functions

If you know the derivatives of sine and cosine and the basic rules of differentiation, then all the other trigonometric
derivatives follow.
Derivative Rules for Trigonometric Functions
The six basic trigonometric functions sin x, cos x, tan x, csc x, sec x, and cot x are all differentiable wherever they are defined and
\[
\begin{aligned}
\frac{d}{dx}\sin x &= \cos x & \frac{d}{dx}\cos x &= -\sin x \\
\frac{d}{dx}\tan x &= \sec^2 x & \frac{d}{dx}\cot x &= -\csc^2 x \\
\frac{d}{dx}\sec x &= \sec x\tan x & \frac{d}{dx}\csc x &= -\csc x\cot x
\end{aligned}
\]
All the additional derivative rules are proved by using the quotient rule along with the formulas for the derivatives of sine and cosine. We will obtain the derivative of the tangent function and leave the rest to the problem set.
\[
\begin{aligned}
\frac{d}{dx}\tan x &= \frac{d}{dx}\,\frac{\sin x}{\cos x} && \text{Trigonometric identity} \\
&= \frac{\cos x\,\frac{d}{dx}(\sin x) - \sin x\,\frac{d}{dx}(\cos x)}{\cos^2 x} && \text{Quotient rule} \\
&= \frac{\cos x(\cos x) - \sin x(-\sin x)}{\cos^2 x} && \text{Derivatives of sine and cosine} \\
&= \frac{\cos^2 x + \sin^2 x}{\cos^2 x} \\
&= \frac{1}{\cos^2 x} && \text{Trigonometric identity} \\
&= \sec^2 x && \text{Trigonometric identity}
\end{aligned}
\]
Notice that the derivatives of all “co” trig functions have the “co-trig” derivative form of their corresponding trigono-
metric partners, but with a sign change. Thus, for example, because the derivative of tangent is secant squared, this
rule implies that the derivative of cotangent is the opposite of cosecant squared.
Example 5. Derivative of a product of trigonometric functions
Differentiate f (x) = sec x tan x.
Solution.
\[
\begin{aligned}
f'(x) &= \frac{d}{dx}(\sec x\tan x) \\
&= \sec x\,\frac{d}{dx}(\tan x) + \tan x\,\frac{d}{dx}(\sec x) && \text{Product rule} \\
&= \sec x(\sec^2 x) + \tan x(\sec x\tan x) \\
&= \sec^3 x + \sec x\tan^2 x
\end{aligned}
\]
Problem Set 3.4

Differentiate the functions given in Problems 1 to 20.
1. f (x) = sin x + cos x
2. g(x) = 2 sin x + tan x

3. y = sin 2x
4. y = cos 2x
5. $f(t) = t^2 + \cos t + \cos\frac{\pi}{4}$
6. $g(t) = 2\sec t + 3\tan t - \tan\frac{\pi}{3}$
7. y = e−x sin x
8. y = tan x2
9. f (θ) = sin2 θ
10. g(θ) = cos2 θ
11. y = cos x101

12. y = (cos x)101
13. p(t) = (t2 + 2) sin t
14. y = x sec x

15. $q(t) = \dfrac{\sin t}{t}$
16. $f(x) = \dfrac{\sin x}{1 - \cos x}$
17. $g(x) = \dfrac{x}{1 - \sin x}$
18. $y = \sin(2t^3 + 1)$
19. y = ln(sin x + cos x)
20. y = ln(sec x + tan x)
Use the given trigonometric identity in parentheses and the basic rules of differentiation to find the derivatives of the functions given in Problems 21-24.
21. f (x) = sec x ($\sec x = \frac{1}{\cos x}$)
22. f (x) = csc x ($\csc x = \frac{1}{\sin x}$)
23. f (x) = cot x ($\cot x = \frac{1}{\tan x}$)
24. f (x) = cot x ($\cot x = \frac{\cos x}{\sin x}$)
25. a. If F (x) = ln | cos x| show that F ′ (x) = − tan x.

b. If f (x) = ln | sec x + tan x| show that f ′ (x) = sec x.
26. Consider three areas as shown in Figure 3.20
Figure 3.20: Triangles and a unit circle
a. What is the area of the blue-shaded triangle?

b. What is the area of the pink-shaded sector? Hint: The area of a sector of a circle of radius r and central angle θ measured in radians is $A = \frac{1}{2}r^2\theta$.
c. What is the area of the yellow-shaded triangle?
d. If g(x) ≤ f (x) ≤ h(x) on an open interval containing c, and if $\lim_{x \to c} g(x) = L$ and $\lim_{x \to c} h(x) = L$, then
\[ \lim_{x \to c} f(x) = L \]
This is sometimes called the squeeze rule. Use the squeeze rule to show that
\[ \lim_{x \to 0}\frac{\sin x}{x} = 1 \]
by beginning with the inequality
BLUE AREA ≤ PINK AREA ≤ YELLOW AREA

27. Prove
\[ \lim_{x \to 0}\frac{\cos x - 1}{x} = 0 \]
Hint: Multiply by 1 written as $\frac{\cos x + 1}{\cos x + 1}$ and use a fundamental trigonometric identity.
28. A researcher studying a certain species of fish in a northern lake models the population after t months of the
study by the function
P (t) = 100e−t sin t + 800
At what rate is the population changing after 2 months? Is the population growing or declining at this time?
29. In a wacky algae experiment, Professor Gut at Bezerkeley manipulated the light in the growth chambers so
that
P (t) = 7, 000 ecos(πt/12)
described the population density (in cells per liter) as a function of t (in hours).
a. Find a function r(t) such that
P ′ (t) = r(t) P (t)
b. What is the period of the light fluctuations in the tank?
c. Determine at what times (if any) the population is decreasing in abundance.
30. In another wacky algae experiment, Professor Gut at Bezerkeley manipulates the light in the algae tanks so
that
P (t) = 5, 000 ecos t+t
describes the population density (in cells per liter) as a function of t (in hours).
a. Find a function r(t) such that
P ′ (t) = r(t) P (t)
b. What is the period of the light fluctuations in the tank?
c. Determine at what times (if any) the population is decreasing in abundance.
31. In Section 1.6, we modeled the tides for Toms Cove in Assateague Beach, Virginia on August 19, 2004 with the function
\[ H(t) = 1.8\cos\left[\frac{\pi}{6}(t - 11)\right] + 2.2 \]
where H is the height of the tide (in feet) and t is the time (in hours after midnight). Find and interpret
\[ \frac{dH}{dt}\Big|_{t=6}. \]

3.5 Linear Approximation

We have seen that the tangent line is the line that just “touches” a curve at a point. In this section, we will discover that the tangent line can be used to give a reasonable approximation to a curve. Using these linear approximations we will be able to make projections about the size of a bison population, estimate quantities like $\sqrt{10}$, and estimate the effects of measurement error.
Approximating with the Tangent Line
Let us begin with an example that illustrates how well a tangent line can approximate a curve.
Example 1. Zooming in at a point
Consider the function y = ln x.
a. Find the tangent line at x = 1.
b. Graph y = ln x and the tangent line over the intervals [0.1, 2], [0.5, 1.5], [0.9, 1.1]. Discuss what you find.
Solution.
a. Since
\[ \frac{d}{dx}\ln x\Big|_{x=1} = \frac{1}{x}\Big|_{x=1} = 1 \]
the tangent line is the line of slope 1 through the point (1, 0), that is, the line with equation
\[ (y - 0) = 1 \cdot (x - 1) \;\Rightarrow\; y = x - 1 \]
as we claimed in Section 2.1.
b. The graphs y = ln x (in blue) and y = x − 1 (in red) on the intervals [0.1, 2], [0.5, 1.5] and [0.9, 1.1]
are shown in Figure 3.21. This figure illustrates that as we zoom into the point (1, 0), the tangent line
provides a better and better approximation of our original function.
Figure 3.21: Zooming in on the graphs of y = ln x and y = x − 1 about the point (1, 0). Panels: a. Domain [0.1, 2]; b. Domain [0.5, 1.5]; c. Domain [0.9, 1.1]
The difference between a tangent line and the associated curve becomes more and more negligible as you zoom
in to the point of contact. Thus it seems quite reasonable to approximate the function with the tangent line in a
neighborhood of the point at which this tangent is constructed. This is called the linearization of a function.

Figure 3.22: Bison abundance. Panels: a. Abundance from 1902 to 1909; b. Abundance from 1902 to 1930 and projected abundance via linear approximation (black line)
Linear Approximation
If f is differentiable at x = a, then the linear approximation of f at a is given by
\[ f(x) \approx f(a) + f'(a)(x - a) \]
for x near a.
With linear approximations, we can make predictions about the future.
Example 2. Predicting bison abundance
Data exist on the abundance of the North American bison in Yellowstone National Park going back as early as 1902.∗ Annual abundances for bison in Yellowstone for the seven-year period from 1902 to 1909 and for the 28-year period from 1902 to 1930 are shown in the left and right panels, respectively, of Figure 3.22. These data suggest that the bison population was recovering in the first part of the 20th century from years of intense hunting in the 19th century. Suppose that in 1908 you are the Yellowstone park manager concerned about the recovery of the bison population. You might, for example, be interested in predicting the abundance of the bison from 1909 onwards for the next decade or two.
a. Given the fact that in 1908 and 1909 the bison abundance was 95 and 118 respectively, use a linear
function to extrapolate what the abundance of bison might be in 1910 through 1915.
b. Compare your estimates to the actual population size in 1910 to 1915.
Solution.
a. Let N (t) denote the number of bison t years after 1900. If we approximate N (t) at t = 8 by a linear function we get
\[ N(t) \approx N(8) + N'(8)(t - 8) \]
To approximate N ′ (8), we can use
\[ N'(8) \approx \frac{N(9) - N(8)}{9 - 8} = 23 \]
Hence,
\[ N(t) \approx 95 + 23(t - 8) \]
for t “near” 8. Hence, our approximation yields the following predictions
∗ Estimates of bison population levels in Yellowstone from 1902-1931 can be found at http://www.seattlecentral.org/qelp/Data.html

year t estimated abundance

1910 10 141
1911 11 164
1912 12 187
1913 13 210
1914 14 233
1915 15 256
b. The actual numbers are given by
year t actual abundance

1910 10 149
1911 11 168
1912 12 192
1913 13 215
1914 14 Unknown
1915 15 270
As we can see our estimates are pretty good but underestimate the actual population size more and more
as time wears on. This is consistent with our expectation that population growth might be exponential.
Our linear approximation is plotted against the entire data set (see the web site for the numbers) in
Figure 3.22b.
□
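The two tables in Example 2 can be rebuilt with a few lines of Python. This is a sketch under the same assumptions as the example, namely that the slope N ′(8) is estimated from the 1908 and 1909 counts; the variable names are ours.

```python
n_1908, n_1909 = 95, 118              # abundances quoted in part a
slope = (n_1909 - n_1908) / (9 - 8)   # difference quotient standing in for N'(8)

def projected_abundance(t):
    """Linear approximation N(t) ≈ N(8) + N'(8)(t - 8), with t in years after 1900."""
    return n_1908 + slope * (t - 8)

# Actual abundances from part b; the 1914 value is unknown and therefore omitted.
actual = {10: 149, 11: 168, 12: 192, 13: 215, 15: 270}

projections = {t: projected_abundance(t) for t in range(10, 16)}
shortfall = {t: actual[t] - projections[t] for t in actual}

for t in range(10, 16):
    print(1900 + t, projections[t], actual.get(t, "unknown"))
```

The dictionary shortfall records how far each projection falls below the observed count, which makes the underestimation discussed in part b easy to inspect year by year.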
Using linear approximation, we can estimate the value of a function at points near known values of the function.
As we will see, however, linear approximations of nonlinear functions generally get increasingly worse as we move
away from the point where the approximation is rooted.
Example 3. Approximating $\sqrt{10}$
Consider the function $f(x) = \sqrt{x}$.
a. Find the linear approximation of f at x = 9.
b. Use the linear approximation found in part a to approximate $\sqrt{10}$. Compare this approximation to a calculator approximation.
c. How well does this same approximation work for $\sqrt{16}$?
Solution.
a. $f'(x) = \frac{1}{2}x^{-1/2}$, so $f'(9) = \frac{1}{2}(9)^{-1/2} = \frac{1}{6}$ and the linear approximation for $\sqrt{x}$ at x = 9 is
\[
\begin{aligned}
\sqrt{x} &\approx f(9) + f'(9)(x - 9) \\
&= 3 + \frac{1}{6}(x - 9)
\end{aligned}
\]
for x near 9.
b. If we now apply the above approximation to find $\sqrt{10}$, we obtain
\[ \sqrt{10} \approx 3 + \frac{1}{6}(10 - 9) = 3\tfrac{1}{6} \approx 3.16667 \]
This is fairly close to the calculator approximation
\[ \sqrt{10} \approx 3.16228 \]
So the error is 0.004 (to 3 decimal places).

c. Similarly,
\[ \sqrt{16} \approx 3 + \frac{1}{6}(16 - 9) = 3 + \frac{7}{6} \approx 4.16667 \]
Since we know the answer is 4, the approximation now has an error of about 0.167, roughly 40 times the error at x = 10.
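To see how the error grows as x moves away from 9, the linear approximation of Example 3 can be coded directly. The following is a minimal sketch; the function name and the default base point a = 9 are our own choices.

```python
import math

def sqrt_linear(x, a=9.0):
    """Linear approximation of sqrt(x) about x = a: f(a) + f'(a)(x - a)."""
    return math.sqrt(a) + (x - a) / (2 * math.sqrt(a))

for x in (10.0, 16.0):
    approx = sqrt_linear(x)
    exact = math.sqrt(x)
    print(f"x = {x}: linear = {approx:.5f}, sqrt = {exact:.5f}, "
          f"error = {abs(approx - exact):.5f}")
```

Changing the base point, for example calling sqrt_linear(16.0, a=16.0), shows that the approximation is exact at the point where the tangent line is constructed.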
The next example shows that the linear approximation of sin x in the neighborhood of 0 is very simple, but fails
badly as the approximation is pushed too far beyond 0.
Example 4. Approximating sin x
Consider y = sin x
a. Find the linear approximation of sin x at x = 0.
b. Plot the difference between y = sin x and its linear approximation on the intervals [−1, 1], [−0.5, 0.5] and
[−0.1, 0.1]. Discuss the meaning of these plots.
c. Approximate sin 2, sin 1 and sin 0.25 with the linear approximation from a. Compare your approximations
to calculator approximations.
Solution.

a. Since $\frac{d}{dx}\sin x\Big|_{x=0} = \cos 0 = 1$, the linear approximation at 0 is
\[
\begin{aligned}
\sin x &\approx f(0) + f'(0)(x - 0) \\
&= 0 + 1(x - 0) \\
&= x
\end{aligned}
\]
b. The graphs of sin x − x on the intervals [−1, 1], [−0.5, 0.5] and [−0.1, 0.1] are illustrated in Figure 3.23.
These figures illustrate that the difference between sin x and x gets smaller and smaller as you zoom
around the point x = 0. Hence, y = x is a better and better approximation for sin x as x approaches 0.
Figure 3.23: The graphs of sin x − x zooming onto the point (0, 0)
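c. We compare the linear and calculator approximations, measuring the error relative to the calculator value.

linear approximation    calculator approximation    relative error
sin 0.25 ≈ 0.25         sin 0.25 ≈ 0.247404         small: about 1%
sin 1 ≈ 1               sin 1 ≈ 0.841471            moderate: about 19%
sin 2 ≈ 2               sin 2 ≈ 0.909297            large: about 120%
□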
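The comparison in part c can be reproduced with a few lines of Python. This is only a sketch: the function name `relative_error` is an illustrative choice, and the error is measured relative to the calculator value of sin x (an assumption about which reference value to use).

```python
import math

def relative_error(x):
    """Relative error of the approximation sin x ≈ x, measured against sin x."""
    return abs(x - math.sin(x)) / abs(math.sin(x))

for x in (0.25, 1.0, 2.0):
    print(f"x = {x}: sin x = {math.sin(x):.6f}, "
          f"relative error = {100 * relative_error(x):.1f}%")
```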

Error Analysis
When a scientist makes a measurement it is always subject to some measurement error. Hence, we have
measurement = actual value + measurement error
For instance, when the clearance rate of Acetaminophen is given as 0.28 per hour, this estimate is the average of a
series of measurements that may vary by 0.05 or so. Consequently, when we estimate the half-life of Acetaminophen
or the amount in the blood stream several hours after taking the drug, it is important to understand how small
variations in the estimate 0.28 influence our half-life estimates.
Example 5. Jackie Chan’s headache
Jackie Chan takes 1000 mg of Acetaminophen to combat a headache.

a. Solve for the half-life T of the drug as a function of the clearance rate x per hour.
b. Determine the half-life of the Acetaminophen assuming that x = 0.28 is a good estimate of the clearance
rate.
c. What is the derivative of the half-life T with respect to x, and what is its value at x = 0.28?
d. Use linear approximation to estimate the change ∆T in the estimated half-life if the estimate x = 0.28 is
off by ∆x. Interpret this result.
Solution.
a. Let A(t) denote the amount of Acetaminophen in the body at time t hours. Since the clearance rate is
x, we have
$$A(t) = A(0)e^{-xt}$$
The half-life is the time t = T such that
$$A(T) = \frac{A(0)}{2}$$
$$e^{-xT} = \frac{1}{2}$$
$$-xT = \ln\frac{1}{2}$$
$$T = \frac{\ln 2}{x}$$
Hence, the half-life as a function of x is
$$T(x) = \frac{\ln 2}{x}$$
b. Evaluating T at x = 0.28 yields T(0.28) ≈ 2.47553 hours.
c. Differentiating the half-life function derived in part a yields
$$T'(x) = -\frac{\ln 2}{x^2}$$
from which we can calculate $T'(0.28) \approx -8.84116$.
d. If we have x = 0.28 + ∆x where ∆x can be viewed as a small measurement error, then by linear
approximation we have
T (0.28 + ∆x) ≈ T (0.28) + T ′ (0.28)∆x

= 2.47553 − 8.84116 ∆x

Thus,
∆T = T (0.28 + ∆x) − T (0.28)
≈ 2.47553 − 8.84116 ∆x − 2.47553
= −8.84116 ∆x
Hence, for a measurement error of ∆x per hour, the half-life changes by approximately −8.84116 ∆x
hours. For instance, if the measurement error in the clearance rate is ∆x = 0.05, then our estimate of the
half-life decreases by approximately 8.84116 · 0.05 ≈ 0.4421 hours. Hence, the estimate of the half-life, T,
is quite sensitive to the estimate of the clearance rate.
□
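As a rough numerical check of this example, the sketch below compares the linear estimate T′(0.28)∆x with the exact change T(0.28 + ∆x) − T(0.28). The function names and the extra test values of ∆x are illustrative assumptions, not part of the example.

```python
import math

def half_life(x):
    """Half-life T(x) = ln 2 / x (hours) for clearance rate x (per hour)."""
    return math.log(2) / x

def half_life_sensitivity(x):
    """Derivative T'(x) = -ln 2 / x**2."""
    return -math.log(2) / x**2

x0 = 0.28
for dx in (0.05, 0.01, -0.05):
    linear_change = half_life_sensitivity(x0) * dx
    exact_change = half_life(x0 + dx) - half_life(x0)
    print(f"dx = {dx:+.2f}: linear estimate {linear_change:+.4f} h, "
          f"exact change {exact_change:+.4f} h")
```

For ∆x = 0.05 the exact change is about −0.375 hours, compared with the linear estimate of about −0.442 hours, so the linear estimate is best read as a first approximation that improves as ∆x shrinks.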
Example 5 illustrates how an error in the measurement of the independent variable x propagates to an error in
the dependent variable y.
Error Estimates and Sensitivity. Suppose y = f(x) is a quantity of interest and x = a is the true value of x.
If there is an error of ∆x in measuring x = a, then by linear approximation the resulting error in y is given by
$$\Delta y = f(a + \Delta x) - f(a) \approx f(a) + f'(a)\Delta x - f(a) = f'(a)\Delta x$$
f′(a) is often called the sensitivity of y to x at x = a. The greater the sensitivity, the greater the
propagation of error.
Consider another example.
Example 6. Estimating metabolic rates
In Example 10, Section 1.5 [xref] (below we have changed the names of the variables from x and y to M and R),
we discovered that the mouse-to-elephant curve, which describes how the metabolic rate R (in kCal/day) depends
on body mass M (in kilograms), is approximately given by
$$\ln R = 0.75 \ln M + 4.2$$
Show that taking exponentials of both sides yields the equation
$$R = e^{4.2} M^{0.75}$$
where M is mass in kilograms.
a. Estimate the metabolic rate of a California condor weighing 10 kg.
b. Determine the sensitivity of your estimate to the measurement of 10 kg. Discuss how a small error ∆M
propagates to an error ∆R in your estimate for R.
Solution.
a. For the 10 kg condor, we get $R = e^{4.2} \times 10^{0.75} \approx 377$ kCal/day (rounding $e^{4.2} \approx 66.7$ up to 67).
b. The sensitivity of this estimate to our estimate for the condor weight is
$$R'(10) = 50.25\,M^{-0.25}\,\Big|_{M=10} \approx 28.26$$
where 50.25 = 0.75 × 67. Hence, ∆R ≈ 28.26 ∆M. For example, an error of ∆M = 0.1 kg yields an error of
∆R ≈ 2.826 kCal/day in estimating R.

Often scientists are more interested in the percent error than in the absolute error. For example, a scientist may
want to know what percentage error in the estimate of the half-life results from a 10% error in the measurement
of the clearance rate. If x = a is the true value of the independent variable and there is a measurement error of
∆x, then the percent error in x is
$$\frac{\Delta x}{a} \times 100\%$$
With an error of ∆x in the independent variable, we get an error of ∆y = f(a + ∆x) − f(a) in y. Hence, the percent
error in y is
$$\frac{\Delta y}{f(a)} \times 100\%$$
The ratio of the percentage error in y over the percentage error in x is given by
$$\frac{\frac{\Delta y}{f(a)} \times 100\%}{\frac{\Delta x}{a} \times 100\%} = \frac{\Delta y}{\Delta x}\,\frac{a}{f(a)}$$
and can be approximated by
$$f'(a)\,\frac{a}{f(a)}$$
This quantity is used quite commonly in the analysis of biological models and, consequently, has a special name.
Elasticity. Let y = f(x) be a function that is differentiable at x = a. The elasticity of f with
respect to x at a is
$$E = f'(a)\,\frac{a}{f(a)}$$
We can interpret E as stating that for a 1% error in the measurement of x = a,
there is approximately an E% error in the estimate of y.
Example 7. Elasticity of metabolic rates
Let us revisit Example 6 where we estimated the metabolic rate of a California condor weighing 10 kg.
a. Find the elasticity of your estimate of the metabolic rate with respect to the estimate of the condor's weight.
b. Interpret your elasticity in terms of a 10% error in the condor's weight.
Solution.
a. To compute the elasticity, recall we found that R(10) ≈ 377 kCal/day and R′(10) ≈ 28.26. Hence, the
elasticity is
$$R'(10)\,\frac{10}{R(10)} \approx 28.26 \times \frac{10}{377} \approx 0.75$$
b. Since the elasticity is 0.75, a 10% measurement error in the weight of the condor would result in approx-
imately a 7.5% error in the estimate of the metabolic rate.
□
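Elasticity can also be estimated numerically when a derivative is awkward to compute by hand. The sketch below is one possible approach, assuming a central-difference estimate of f′(a); the function names and the step size `h` are illustrative choices.

```python
import math

def elasticity(f, a, h=1e-6):
    """Estimate E = f'(a) * a / f(a) using a central-difference derivative."""
    derivative = (f(a + h) - f(a - h)) / (2 * h)
    return derivative * a / f(a)

def metabolic_rate(mass_kg):
    """Mouse-to-elephant curve R = e^4.2 * M^0.75 (kCal/day)."""
    return math.exp(4.2) * mass_kg ** 0.75

E = elasticity(metabolic_rate, 10.0)
print(f"elasticity at M = 10 kg: {E:.4f}")
print(f"a 10% error in M gives roughly a {10 * E:.1f}% error in R")
```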
Using elasticity, we can estimate with what accuracy we need to measure an independent variable to ensure a
certain accuracy in the estimate of a dependent variable.
Example 8. Determining measurement accuracy

The Body Mass Index (BMI) for an individual weighing w pounds and h inches tall is given by
$$B = \frac{703w}{h^2}$$
a. Determine the elasticity of B with respect to the variable h.

b. Estimate how accurate your height measurement needs to be to guarantee less than a ±5% error in your
BMI measurement.
Solution.
a. To compute the elasticity, we first need the derivative
$$\frac{dB}{dh} = -\frac{1406w}{h^3}$$
Hence, the elasticity is
$$\frac{dB}{dh}\,\frac{h}{B} = -\frac{1406w}{h^3} \cdot \frac{h}{703w/h^2} = -2$$
Note that this answer does not depend on w or h but is a pure number! (Think about why this is the
case.)
b. Since the elasticity is −2, an x% error in h results in a −2x% error in our estimate for BMI. Hence to
ensure that our error is no greater than ±5%, we need to ensure that the error in the measurement of
the height is no greater than ±2.5%.
□
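Because elasticity comes from a linear approximation, it is worth comparing the predicted percent error with the exact one. The following sketch does this for the BMI formula; the weight of 150 lb and height of 68 in. are hypothetical values chosen only for illustration (by part a, the result does not depend on them).

```python
def bmi(weight_lb, height_in):
    """Body Mass Index B = 703 w / h^2."""
    return 703.0 * weight_lb / height_in**2

def percent_change_in_bmi(height_error_percent, weight_lb=150.0, height_in=68.0):
    """Exact percent change in BMI when height is mismeasured by the given percent."""
    true_bmi = bmi(weight_lb, height_in)
    measured_height = height_in * (1 + height_error_percent / 100.0)
    return 100.0 * (bmi(weight_lb, measured_height) - true_bmi) / true_bmi

for err in (2.5, -2.5, 1.0, -1.0):
    predicted = -2 * err  # elasticity estimate
    exact = percent_change_in_bmi(err)
    print(f"height error {err:+.1f}%: predicted {predicted:+.2f}%, exact {exact:+.2f}%")
```

Underestimating the height by 2.5% gives an exact BMI error slightly above 5%, so the ±2.5% bound from the elasticity argument should be treated as approximate rather than as a strict guarantee.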
Problem Set 3.5

In Problems 1 to 6 find the linear approximation of y = f (x) at the specified point and use technology (i.e. graph
the linear approximation and y = f (x)) to determine whether the linear approximation tends to overestimate or
underestimate y = f (x) near the specified point.
1. $y = \cos x$ at $x = \frac{\pi}{2}$
2. $y = e^x$ at x = 0
3. $y = \sin x$ at $x = \frac{\pi}{2}$
4. $y = x^2$ at x = −2
5. $y = \frac{1}{1 + x^2}$ at x = 2
6. $y = xe^{-x}$ at x = ln 2
In Problems 7 to 12, estimate the indicated quantity using a linear approximation.

7. $\sqrt{26}$
8. $\sqrt{0.99}$
9. ln 0.9
10. $\cos\left(\frac{\pi}{2} + 0.01\right)$
11. tan 0.2
12. $e^{-0.2}$
Find the sensitivity of y = f (x) at the point specified in Problems 13 to 18, and use it to estimate ∆y for the given
measurement error ∆x.
13. $y = \sqrt{x}$ at x = 9, with ∆x = 0.01
14. $y = \sqrt{2x^2 + 1}$ at x = −2, with ∆x = 0.01
15. y = ln x at x = 2, with ∆x = −0.2
16. $y = \cot x$ at $x = \frac{\pi}{2}$, with ∆x = 0.1
17. $y = \cos x$ at $x = \frac{\pi}{2}$, with ∆x = −0.01
18. $y = \frac{1}{x + 1}$ at x = 0, with ∆x = −0.05
Find the elasticity of y = f (x) at the point specified in Problems 19 to 24 and use it to estimate the percent error in
y for the given percent error in x.
19. $y = \sqrt{x}$ at x = 9, with 1% error in x
20. $y = \sqrt{2x^2 + 1}$ at x = −2, with 8% error in x
21. y = ln x at x = 2, with 5% error in x
22. $y = \cot x$ at $x = \frac{\pi}{2}$, with 10% error in x
23. $y = \sin x$ at $x = \frac{\pi}{2}$, with 10% error in x
24. $y = \frac{1}{x + 1}$ at x = 0, with 12% error in x
25. If your measurement of the radius of a circle is accurate to within 3%, approximately how accurate (to the
nearest percent) is your calculation of the area A when the radius is r = 12 cm? (Recall the formula $A = \pi r^2$.)
26. Suppose a 12-oz can of Coke has a height of 4.5 in. If your measurement of the radius is accurate to
within 1%, how accurate is your calculation of the volume? Check your answer by examining a Coke can.
27. An environmental study suggests that t years from now, the average level of carbon monoxide in the air will be
$$Q(t) = 0.05t^2 + 0.1t + 3.4$$
parts per million (ppm). By approximately how much will the carbon monoxide level change during the next
6 months?
28. A certain cell is modeled as a sphere. If the formulas $S = 4\pi r^2$ and $V = \frac{4}{3}\pi r^3$ are used to compute the surface
area and volume of the sphere, respectively, estimate the effect on S and V produced by a 1% increase in the
radius r.

29. In a model developed by John Helms, the water evaporation E(T) for a ponderosa pine is modeled by
$$E(T) = 4.6\,e^{17.3T/(T + 237)}$$
where T (degrees Celsius) is the surrounding air temperature.∗

a. Compute the elasticity of E(T) at T = 30.
b. If the temperature is increased by 5% from 30°C, estimate the corresponding percentage change in
E(T).
30. In Example 5, we showed that the half-life, T, of a drug with clearance rate x is given by
$$T(x) = \frac{\ln 2}{x}$$
Suppose that the true value of the clearance rate of some drug is given by x = a.
a. Find the elasticity of T with respect to x.
b. If you want to estimate the half-life of this drug within an error of 2%, how accurately do you have
to measure the clearance rate of the drug?
31. In a healthy person of height x in., the average pulse rate in beats per minute is modeled by the formula
$$P(x) = \frac{596}{\sqrt{x}}, \qquad 30 \le x \le 100$$
a. Compute the sensitivity of P at x = 60.

b. Estimate the change in pulse rate that corresponds to a height change from 59 to 60 in.
c. Compute the elasticity of P . Does it depend on x?
d. Determine how accurate the measurement of x needs to be to ensure the estimate for P has an error
of less than 10%.
32. A drug is injected into a patient’s bloodstream. The concentration of the drug in the bloodstream t hours after
the drug is injected is modeled by the formula
$$C(t) = \frac{0.12t}{t^2 + t + 1}$$
where C is measured in milligrams per cubic centimeter.
a. Compute the sensitivity of C at t = 30.
b. Estimate the change in concentration over the time period from 30 to 35 minutes after injection.
33. According to Poiseuille’s law, the speed of blood flowing along the central axis of an artery of radius R is
modeled by the formula
$$S(R) = cR^2$$
where c is a constant.∗ What percentage error (rounded to the nearest percent) will you make in the calculation
of S(R) from this formula if you make a 1% error in the measurement of R?
34. Consider a power function $f(x) = ax^b$ with a > 0 and b ≠ 0. Show that the elasticity of f(x) is independent
of the x value.
35. Use your answer from problem 34 to quickly answer the following questions:
∗ John A. Helms, “Environmental Control of Net Photosynthesis in Naturally Grown Pinus Pondeosa Nets,” Ecology (Winter 1972),
p. 92.
∗ Introduction to Mathematics for Life Sciences, 2nd edition. New York: Springer-Verlag (1976), pp. 102-103.

a. If there is a 5% error in estimating the mass M of a weightlifter, approximately what is the percent
error in estimating the lift $L = 20.15M^{2/3}$ kg of the weightlifter?
b. If there is a 10% error in estimating the mass M of an organism, approximately what is the
percent error in estimating the metabolic rate $R \propto M^{2/3}$ cal/hour of the organism?
c. If there is a 2% error in measuring the weight W of a person, approximately what is the percent
error in estimating the body mass index $B \propto W$ of the person?
36. The gross U.S. federal debt (in trillions of dollars) from 1999 to 2004 is given in the following table:†
Year Gross Federal Debt

1999 5.606
2000 5.629
2001 5.770
2002 6.198
2003 6.760
2004 7.355
a. Plot the data and the linear approximation of the data at t = 1999. Discuss the quality of this
approximation.
b. Use a linear approximation to estimate the federal debt in 2005. Look up the actual gross federal
debt to see how well the approximation worked.
† The data is from the historical tables in the 2006 OMB Budget as downloaded from WhiteHouse.gov, Historical Tables.

3.6 Higher-Order Derivatives and Approximations

The derivative of a function can be interpreted as the instantaneous rate of change and yields linear approximations
to the function. Since the derivative of a function is also a function, it also has a derivative. What does this
derivative of a derivative represent? How useful is it? The goal of this section is to answer these questions and
more—considerably more.
Second-order Derivatives
The second derivative of a function f is the derivative of f ′ and is denoted f ′′ . In other words,

$$f''(x) = \frac{d}{dx}\left(\frac{d}{dx} f(x)\right)$$
Equivalently, we write
$$f''(x) = \frac{d^2}{dx^2} f(x) = f^{(2)}(x)$$
or if y = f(x),
$$f''(x) = \frac{d}{dx}\left(\frac{dy}{dx}\right) = \frac{d^2 y}{dx^2}$$
Example 1. Finding second-order derivatives
Find f ′′ (x) for the given functions:

a. $f(x) = \sin x$
b. $f(x) = x^2$
c. $f(x) = x2^x$
Solution.
a. Since $f'(x) = \cos x$, we get that $f''(x) = \frac{d}{dx}\cos x = -\sin x$.
b. Since $f'(x) = 2x$, we get that $f''(x) = \frac{d}{dx}2x = 2$.
c. Since $f'(x) = 2^x + x(\ln 2)2^x = 2^x(1 + x\ln 2)$, we get that
$$f''(x) = 2^x \ln 2 + 2^x \ln 2\,(1 + x \ln 2)$$
$$= 2^x \ln 2\,(1 + 1 + x \ln 2)$$
$$= 2^x \ln 2\,(2 + x \ln 2)$$
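Hand-computed second derivatives such as these can be sanity-checked numerically. The sketch below uses a central second difference, which is a standard numerical technique rather than a method from the text; the test point x = 0.7 and the step size `h` are arbitrary illustrative choices.

```python
import math

def second_derivative(f, x, h=1e-4):
    """Central second-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# (label, function, second derivative found by hand)
checks = [
    ("sin x", math.sin, lambda x: -math.sin(x)),
    ("x^2", lambda x: x**2, lambda x: 2.0),
    ("x 2^x", lambda x: x * 2**x,
     lambda x: 2**x * math.log(2) * (2 + x * math.log(2))),
]

x0 = 0.7
for label, f, f2 in checks:
    print(f"{label:6s}: numerical {second_derivative(f, x0):.6f}, formula {f2(x0):.6f}")
```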
What do these second derivatives represent? Consider the following definition.
Concave up/concave down. If the graph of a function f lies above all its tangents on an interval I, then it is
said to be concave up on I. If the graph of f lies below all of its tangents on I, it
is said to be concave down.
Since f ′′ is the derivative of f ′ , the mean value theorem implies that if f ′′ > 0 on an interval, then f ′ is
increasing on this interval. What does this mean? In terms of tangent lines, this means that the slope of the tangent
line is increasing in the interval. Hence, f is “bending upwards” or, equivalently, is concave up on this interval.

Alternatively, if f ′′ < 0, the slope of the tangent line is decreasing and f is “bending downwards” or, equivalently, is
concave down.
Concavity. Let f be a function whose first and second derivatives are defined at x = a. If
f″(a) < 0, then y = f(x) is concave down near x = a. If f″(a) > 0, then y = f(x)
is concave up near x = a.
Example 2. Identifying concavities
Identify the concavities of the function defined by the given graphs. In other words, determine where the graphs
are concave up and where they are concave down.
a. [Graph a: a curve y = f(x) plotted for −1 ≤ x ≤ 1]
b. [Graph b: a curve y = f(x) plotted for −1 ≤ x ≤ 4]
Solution.
a. The easiest way to proceed is to place a straight edge (e.g. ruler, pencil, etc.) on the graph and, keeping it
tangent to the curve, move it from left to right. Whenever the straight edge is rotating in a counterclockwise
fashion, the slope of the tangent line is increasing and f″ > 0.
Alternatively, whenever the straight edge is rotating in a clockwise fashion, the slope of the tangent line
is decreasing and f″ < 0.
For this graph, we obtain a clockwise rotation from x = −1 to x = −0.5 and from x = 0 to x = 0.5.
Hence, f″ < 0 on (−1, −0.5) and (0, 0.5).
Also, we obtain a counterclockwise rotation from x = −0.5 to x = 0 and from x = 0.5 to x = 1. Hence,
f″ > 0 on (−0.5, 0) and (0.5, 1).
b. This graph is concave down over (−1, 1) and concave up on (1, 4).

There is another way to interpret the concavity of a graph. If the curve lies above its tangent lines and the slopes
of the tangent lines exhibit a sign change, the graph is concave upwards and will hold water. On the other hand, if
the curve lies below its tangent lines, then it is concave downwards and will not hold water. A point on a continuous
graph that separates a concave downward portion of a curve from a concave upward portion is called an inflection
point. Figure 3.24 illustrates these ideas. An inflection point must be on the graph, meaning that f(c) must be
defined if there is an inflection point at x = c.
Figure 3.24: Concavity with an inflection point
In Figure 1.34 of Section 1.5 on the growth of the US population, we saw that the points initially follow an exponential
rise but begin to fall behind. If the population now levels off asymptotically rather than continuing to grow larger
and larger (i.e. without bound), then we call the growth curve sigmoidal, because it looks like an “S” that has been
stretched from its lower left to its upper right so that it passes the vertical line test discussed in Section 1.2 for a
function.
A sigmoidal function (black) and its mirror image (red)
Example 3. Sigmoidal decay in deaths due to airborne diseases
In a study of deaths in the United States, Jesse Ausubel and colleagues found that deaths from aerially transmitted
diseases as a fraction of all deaths could be very well described by the mirror image of a sigmoidal function shown
in Figure 3.25.∗ Determine where this function is concave up and concave down. Find the point of inflection. Discuss what
these changes in concavity mean.
Solution. To estimate the intervals of concavity, we can place a ruler as a tangent to the curve and slowly move
it from the left-hand side to the right-hand side. Doing so, we notice that the ruler is rotating clockwise from x = 0 to
x ≈ 40 and rotating counterclockwise from x ≈ 40 to x = 100. Hence, the point of inflection appears to be located
at x = 40, and the fraction of deaths due to aerial diseases is decreasing at a faster and faster rate from 1880 (x = 0)
to 1920 (x = 40); that is, the curve is concave down. The fraction of deaths is decreasing at a slower and slower rate
from 1920 (x = 40) to 1980 (x = 100); that is, the curve is concave up. □
∗ J.H. Ausubel, P.S. Meyer, and I.K. Wernick, “Death and the Human Environment: The United States in the 20th Century,”
Technology in Society 23(2):131–146 (2001).

Figure 3.25: Fraction of deaths as a function of time (in years) after 1880
Example 4. A Simpsons Episode
Discuss the role of second derivatives in the following exchange from an episode of The Simpsons.∗ Homer and Lisa are reading an
issue of USA TODAY over breakfast.
Homer: Here’s good news! According to this eye-catching article, SAT scores are declining at a slower rate!
Lisa: Dad, I think this paper is a flimsy hodgepodge of pie graphs, factoids and Larry King.
Homer: Hey, this is the only paper in America that’s not afraid to tell the truth, that everything is just fine.
Solution. The key statement is that “SAT scores are declining at a slower rate.” If S(t) denotes the average SAT
score as a function of time, then the phrase “SAT scores are declining” means that S′(t) < 0 and the phrase “at a
slower rate” means that S″(t) > 0, as the rate S′(t) is increasing. In other words, SAT scores are decreasing, yet
concave up and so “leveling off”! □
Using our interpretations of the first and second derivative, we should be able to identify the graph of one from
the other.
Example 5. Finding f , f ′ , and f ′′
The graphs of y = f (x), y = f ′ (x), and y = f ′′ (x) are shown in Figure 3.26. Identify f , f ′ , and f ′′ .
Solution. To identify f , f ′ , f ′′ , we can start with one graph, say the black one, and determine when it is rising and
falling. We can see that it is falling roughly on the intervals [−2, −1.25] and [0, 1] and rising on the complementary
intervals. Since the blue curve is negative on [−2, −1.25] and [0, 1] and positive on the complementary intervals,
the blue curve may be the graph of the derivative of the function defined by the black curve. On the other hand, the
black curve appears to be negative where the red curve is falling and positive where the red curve is rising. Hence,
the black curve appears to be the graph of the derivative of the function defined by the red curve. Therefore, it is
reasonable (and correct) to conclude that y = f(x) is defined by the red curve, y = f′(x) is defined by the black
curve, and y = f″(x) is defined by the blue curve. □
Second-order Approximations
In Section 3.5, we approximated functions with their tangent lines. While a good start, these approximations can be
improved upon by taking more derivatives.
∗ Episode 8F04, from http://www.snpp.com/episodes/8F04.html.

Figure 3.26: Graph of a function and its first and second derivatives
Example 6. Stripping away the tangent line
Consider $y = e^{2x}$.
a. Find the tangent line at x = 0.
b. Compute $\frac{d^2y}{dx^2}\Big|_{x=0}$ and determine whether the linear approximation overestimates or underestimates $y = e^{2x}$
near x = 0.
c. Plot the difference between $y = e^{2x}$ and its tangent line. Discuss what you notice.
Solution.

a. Since $\frac{dy}{dx}\Big|_{x=0} = 2e^{2x}\Big|_{x=0} = 2e^0 = 2$, the tangent line has slope 2 and passes through the point (0, 1); that
is, the line
$$(y - 1) = 2(x - 0) \Rightarrow y = 2x + 1$$
b. $\frac{d^2y}{dx^2} = \frac{d}{dx}\,2e^{2x} = 4e^{2x}$, which is 4 at x = 0. Since the second derivative is positive, $\frac{dy}{dx}$ is increasing near
x = 0, and we would expect the tangent line to underestimate (i.e. lie under) $y = e^{2x}$ near x = 0. Indeed,
graphing the function $y = e^{2x}$ (black curve) and y = 2x + 1 (red curve) confirms this prediction:
[Graph: $y = e^{2x}$ (black) and y = 2x + 1 (red) for −0.4 ≤ x ≤ 0.4]
c. Plotting $y = e^{2x} - 2x - 1$ yields what looks like a parabola:
[Graph: $y = e^{2x} - 2x - 1$ for −0.4 ≤ x ≤ 0.4]
The preceding example suggests that the difference between the original function and its tangent line is approximately
parabolic. But what should this parabola be? To answer this question, consider that we want to approximate a
function f around the point x = 0 with a quadratic function
$$f(x) \approx a + bx + cx^2$$
Since we want these functions to agree at x = 0, we need
$$a = f(0)$$
To have their first derivatives agree at x = 0, we can take derivatives of both sides:
$$f'(x) \approx b + 2cx$$
At x = 0, we have f′(0) ≈ b, so we define b = f′(0). Finally, to have their second derivatives agree at x = 0, we
differentiate one more time:
$$f''(x) \approx 2c$$
This leads us to define $c = \frac{f''(0)}{2}$. This gives a quadratic (second-order or parabolic) approximation at x = 0:
$$f(x) \approx f(0) + f'(0)x + \frac{1}{2}f''(0)x^2$$
Let us see how well this approximation works.
Example 7. Quadratic approximation
Find the quadratic approximation to $y = e^{2x}$ at x = 0. Plot $y = e^{2x}$, its linear approximation, and its quadratic
approximation.
Solution. Let $f(x) = e^{2x}$, so from the previous example $f'(x) = 2e^{2x}$ and $f''(x) = 4e^{2x}$. The linear approximation
is
$$f(x) \approx f(0) + f'(0)(x - 0) = 1 + 2x$$
The quadratic approximation is
$$f(x) \approx f(0) + f'(0)x + \frac{1}{2}f''(0)x^2 = 1 + 2x + 2x^2$$
The graphs of $y = e^{2x}$, $y = 2x + 1$, and $y = 1 + 2x + 2x^2$ are shown in Figure 3.27. The quadratic approximation
does a significantly better job of approximating the function $y = e^{2x}$.
□
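The improvement from the linear to the quadratic approximation can also be seen in numbers rather than in a plot. The sketch below tabulates both errors at a few sample points; the sample points and function names are illustrative choices.

```python
import math

def linear(x):
    """Linear approximation of e^(2x) at x = 0."""
    return 1 + 2 * x

def quadratic(x):
    """Quadratic approximation of e^(2x) at x = 0."""
    return 1 + 2 * x + 2 * x**2

for x in (-0.4, -0.2, 0.2, 0.4):
    exact = math.exp(2 * x)
    print(f"x = {x:+.1f}: exact {exact:.4f}, "
          f"linear error {exact - linear(x):+.4f}, "
          f"quadratic error {exact - quadratic(x):+.4f}")
```

At each of these sample points the quadratic error is several times smaller in magnitude than the linear error.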

Figure 3.27: Graph of the function $f(x) = e^{2x}$ (in black) along with its linear (in red) and quadratic approximations
(in blue)
In cases where the linear approximation is a horizontal line, the quadratic approximation is the first approximation
to give you some real information about the concavity of the curve in question.
Example 8. Approximating the cosine
a. Find the linear and quadratic approximations of y = cos x at x = 0.

b. Use the quadratic approximation to estimate cos 1, cos 0.5, and cos 0.1. Compare your approximations to
the answers given by a calculator.
Solution.
a. Let $f(x) = \cos x$, so $f(0) = 1$, $f'(0) = -\sin 0 = 0$, and $f''(0) = -\cos 0 = -1$. The linear approximation is
$$y = 1$$
The quadratic approximation is
$$f(x) \approx f(0) + f'(0)x + \frac{1}{2}f''(0)x^2 = 1 - \frac{1}{2}x^2$$
The graph is a downward-facing parabola. The graphs of $y = \cos x$, $y = 1$, and $y = 1 - \frac{1}{2}x^2$ are shown in
Figure 3.28.
b. We compare the quadratic and calculator approximations (d.p. ≡ decimal places).

quadratic approximation               calculator approximation    comment
cos 1 ≈ 1 − 1/2 = 0.5                 cos 1 ≈ 0.540302            fair: correct to 1 d.p.
cos 0.5 ≈ 1 − (1/2)(0.5)^2 = 0.875    cos 0.5 ≈ 0.877583          better: correct to 2 d.p.
cos 0.1 ≈ 1 − (1/2)(0.1)^2 = 0.995    cos 0.1 ≈ 0.995004          better yet: correct to 5 d.p.!
The approximations get better and better as you get closer and closer to x = 0.
More generally, we may wish to approximate a function near a point x = a with a parabola. As an exercise, you
can verify that by forcing the quadratic approximation and the function to agree up to the second-order derivative
at x = a, you obtain the following second-order approximation of f at x = a.

Figure 3.28: Graph of the function $f(x) = \cos x$ (in black) along with its linear (in red) and quadratic approximations
(in blue)
Second-order approximation. Let f have a first and second derivative defined at x = a. The second-order
approximation of f at x = a is given by
$$f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2}f''(a)(x - a)^2$$
Example 9. Working around another point
Find the first- and second-order approximations of ln x about x = 1. Use technology to compare the first-order and
second-order approximations of $\ln\frac{3}{2}$.
Solution. Let $f(x) = \ln x$, so $f'(x) = \frac{1}{x}$ and $f''(x) = -\frac{1}{x^2}$. Then f(1) = 0, f′(1) = 1, and f″(1) = −1. Hence, the
first-order and second-order approximations are, respectively,
$$\ln x \approx f(1) + f'(1)(x - 1) = x - 1$$
and
$$\ln x \approx f(1) + f'(1)(x - 1) + \frac{1}{2}f''(1)(x - 1)^2 = (x - 1) - \frac{1}{2}(x - 1)^2$$
With the first-order approximation, we find $\ln\frac{3}{2} \approx \frac{3}{2} - 1 = \frac{1}{2}$, and with the second-order approximation, we find
$\ln\frac{3}{2} \approx \frac{1}{2} - \frac{1}{8} = 0.375$. By calculator, we find $\ln\frac{3}{2} \approx 0.4055$. Neither approximation is very close, but the second-order
approximation is more accurate. □
Even Higher-order Derivatives

Once we have taken two derivatives, there is no reason to stop. We can attempt to take the derivative of the second
derivative, and the derivative of the resulting derivatives, and so on and so on. These higher-order derivatives
are defined as follows:
First order: $f^{(1)}(x) = f'(x) = \frac{d}{dx}f(x)$
Second order: $f^{(2)}(x) = f''(x) = \frac{d}{dx}\left(\frac{d}{dx}f(x)\right)$
Third order: $f^{(3)}(x) = f'''(x) = \frac{d}{dx}\left(\frac{d}{dx}\left(\frac{d}{dx}f(x)\right)\right)$
n-th order: $f^{(n)}(x) = \frac{d}{dx}f^{(n-1)}(x)$
Example 10. Higher-order derivatives
Find the following higher-order derivatives

a. $\frac{d^3}{dx^3}(1 + x + x^3)$
b. $\frac{d^5}{dx^5}e^{2x}$
c. $f^{(101)}(x)$ where $f(x) = \sin x$
Solution.
a. If $y = 1 + x + x^3$, then
$$\frac{dy}{dx} = \frac{d}{dx}(1 + x + x^3) = 1 + 3x^2$$
$$\frac{d^2y}{dx^2} = \frac{d}{dx}(1 + 3x^2) = 6x$$
$$\frac{d^3y}{dx^3} = \frac{d}{dx}(6x) = 6$$
b. If $f(x) = e^{2x}$, then
$$f'(x) = 2e^{2x}$$
$$f''(x) = 4e^{2x}$$
$$f'''(x) = 8e^{2x}$$
$$f^{(4)}(x) = 16e^{2x}$$
$$f^{(5)}(x) = 32e^{2x}$$
c. At first this problem might seem insane. Take 101 derivatives of a function? However, if we proceed
calmly, a pattern will quickly emerge that will dispel this insanity. If f (x) = sin x, then
f ′ (x) = cos x
f ′′ (x) = − sin x
f ′′′ (x) = − cos x
f (4) (x) = sin x
We are back to where we started, and the derivatives cycle in a fixed pattern of period four (i.e. repetition
occurs every four derivatives so that $f^{(1)}(x) = f^{(5)}(x) = f^{(9)}(x) = \cdots = \cos x$, $f^{(2)}(x) = f^{(6)}(x) =
f^{(10)}(x) = \cdots = -\sin x$, and so on). Hence, $f^{(100)}(x) = \sin x$ and $f^{(101)}(x) = \frac{d}{dx}\sin x = \cos x$.
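The period-four pattern in part c amounts to reducing the order of the derivative modulo 4. A small Python sketch of this idea follows; the function name `nth_derivative_of_sin` is a hypothetical helper introduced only for illustration.

```python
import math

def nth_derivative_of_sin(n):
    """Return (label, function) for the n-th derivative of sin x via the period-4 cycle."""
    cycle = [
        ("sin x", math.sin),
        ("cos x", math.cos),
        ("-sin x", lambda x: -math.sin(x)),
        ("-cos x", lambda x: -math.cos(x)),
    ]
    return cycle[n % 4]

for n in (1, 2, 3, 4, 100, 101):
    label, _ = nth_derivative_of_sin(n)
    print(f"derivative of order {n}: {label}")
```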
We conclude with an application of higher-order derivatives from politics.
Example 11. Presidential proclamation
In the fall of 1972 President Nixon announced that the rate of increase of inflation was decreasing. This
was the first time a sitting president used the third derivative to advance his case for reelection. ∗
∗ Hugo Rossi, “Mathematics Is an Edifice, Not a Toolbox”, Notices of the AMS, v. 43, no. 10, October 1996.

Discuss how a third-order derivative is being used in President Nixon’s statement.
Solution. Let V denote the “value of a dollar” at time t (in years). Inflation means that the value of a dollar is
decreasing, so $\frac{dV}{dt} < 0$. If inflation is increasing, then the value of a dollar is decreasing at a faster rate; that is,
$\frac{d^2V}{dt^2} < 0$. Finally, if the rate of increase of inflation is decreasing, we get $\frac{d^3V}{dt^3} > 0$. Hence, at third order, things are
looking good for the value of the dollar! □
Problem Set 3.6

Find the higher-order derivatives indicated in Problems 1 to 12.
1. $\frac{d^2}{dx^2}$ of $xe^{-x}$
2. $\frac{d^3}{dx^3}$ of $2^x$
3. $f^{(4)}(x)$, where $f(x) = 1 + x + x^2 + x^3 + x^4$
4. $f^{(103)}(x)$, where $f(x) = \cos x$
5. $\frac{d^{99}}{dx^{99}}$ of $\sin 3x$
6. $\frac{d^3}{dw^3}$ of $(1 + w + w^2 + w^3 + w^4)$
7. $\frac{d^4}{dt^4}$ of $\left(\frac{1}{4}t^8 - \frac{1}{2}t^6 - t^2 + 2\right)$
8. $\frac{d^{n+1}}{dx^{n+1}}$ of $x^n$
9. $f^{(10)}(x)$, where $f(x) = (1 + x)^{10}$
10. $f^{(4)}(x)$, where $f(x) = \frac{4}{\sqrt{x}}$
11. $\frac{d^2}{dw^2}$ of $\frac{1}{1 + w}$
12. $\frac{d^2y}{dx^2}$, where $y = (x^2 + 4)(1 - 3x^3)$
In Problems 13 to 18 find the linear approximation of the indicated function f(x) at x = a. Using second-order
derivatives, determine whether the linear approximation tends to overestimate or underestimate f(x) near x = a.
13. $f(x) = e^x$ at x = 0
14. $f(x) = \cos x$ at x = 0
15. $f(x) = 1 - x^2$ at x = 2
16. $f(x) = \tan x$ at x = π
17. $f(x) = \frac{1}{1 + x}$ at x = 2
18. $f(x) = xe^{-x}$ at x = 1
Determine on what intervals f is increasing, decreasing, concave up, concave down, and find the points of inflection
in Problems 19 to 28.
19. $y = 1 - x + x^3$
20. $y = 1 + 2x + 18/x$
21. $y = xe^{-x}$
22. $y = e^{-x^2}$
23. $y = \frac{x}{1 + x}$
24. $y = \frac{x}{x^2 + 1}$
25. $y = 3x^4 - 2x^3 - 12x^2 + 18x - 5$
26. $y = x^4 + 6x^3 - 12x^2 + 18x - 5$
27. $y = \sec x$
28. $y = x^3 + \sin x$ on the interval from $-\frac{\pi}{2}$ to $\frac{\pi}{2}$
Find the first- and second-order approximations of y = f(x) at x = a in Problems 29 to 34. Use technology to
plot the function and its approximations near x = a.
29. $y = \sin x$ at x = 0
30. $y = 1 + x^2$ at x = 2
31. $y = e^x$ at x = 0
32. $y = \sec x$ at x = 0
33. $y = \sqrt{x}$ at x = 4
34. $y = \sqrt[3]{x}$ at x = 27
In Problems 35 to 38, identify y = f (x), y = f ′ (x), and y = f ′′ (x).
35. [Graph: three curves plotted for −2 ≤ x ≤ 2]
36. [Graph: three curves plotted for −2 ≤ x ≤ 2]
37. [Graph: three curves plotted for −2 ≤ x ≤ 2]
38. [Graph: three curves plotted for −2 ≤ x ≤ 2]
39. Sketch the graph of a function with all of the following properties:
f ′ (x) > 0 when x < −1
f ′ (x) > 0 when x > 3
f ′ (x) < 0 when −1 < x < 3
f ′′ (x) < 0 when x < 2
f ′′ (x) > 0 when x > 2
40. Sketch the graph of a function with all of the following properties:
f ′ (x) > 0 when x < 2 and when 2 < x < 5
f ′ (x) < 0 when x > 5
f ′ (2) = 0
f ′′ (x) < 0 when x < 2 and when 4 < x < 7
f ′′ (x) > 0 when 2 < x < 4 and when x > 7
41. The slogan of the Lowe’s Home Improvement company is “Improving Home Improvement.” Explain the role of
derivatives in this slogan.

42. Explain the role of higher order derivatives in the following MAL cartoon.∗
43. At the website http://www.nlreg.com/aids.htm, you can find the following figure that graphs the number
of new cases of AIDS since 1980:
a. Estimate where the function is concave up and concave down.

b. Describe in words what these changes in the concavity mean for the AIDS epidemic.
44. In Example 2 from Section 2.4, a dose-response curve for patients responding to a dose of Histamine is given
by
$$R = \frac{100e^x}{e^x + e^{-5}}$$
a. Compute $\frac{d^2R}{dx^2}$.
b. Determine for what dosage ranges R is concave up and concave down. Interpret your results.
45. Historical Quest One of the most famous women in the history of mathematics is Maria Gaëtana Agnesi.
∗ MAL cartoon,©1974, by permission from the estate of Malcolm Hancock.

Maria Agnesi 1718-1799
She was born in Milan, the first of 21 children. Her first publication was at age 9, when she wrote a Latin
discourse defending higher education for women. Her most important work was a now-classic calculus textbook
published in 1748. Maria Agnesi is primarily remembered for a curve defined by the equation
$$y = \frac{a^3}{x^2 + a^2}$$
for a positive constant a. The curve was named versiera (from the Italian verb to turn) by Agnesi, but
John Colson, an Englishman who translated her work, confused the word versiera with the word avversiera,
which means “wife of the devil” in Italian; the curve has ever since been called the “witch of Agnesi.” This
was particularly unfortunate because Colson wanted Agnesi’s work to serve as a model for budding young
mathematicians, especially young women. Graph this curve, find the points of inflection (if any), and discuss
its concavity.
46. The spruce budworm is a moth whose larvae eat the leaves of coniferous trees. They suffer predation by birds.
Ludwig and others suggested a model for the per capita predation rate, p(x): ∗
\[ p(x) = \frac{bx^2}{a^2 + x^2} \]
where b is the maximum predation rate and a is the number of budworms at which the predation rate is half
its maximum rate. What is the concavity of this curve, and is there a point of inflection?
47. Let f be a function that is twice differentiable on an interval I containing the point x = a. If there exists a
K > 0 such that $|f''(x)| \le K$ for all x in I, then
\[ |f(x) - f(a) - f'(a)(x - a)| \le \frac{K}{2}\,|x - a|^2 \]
for all x in I. This result gives the error of the first-order approximation. Hint: Pick any point $b \ne a$ on I.
Define
\[ G(x) = f(x) - f(a) - f'(a)(x - a) - C(x - a)^2 \]
where C is chosen such that G(b) = 0. Differentiate G and apply the mean value theorem to f′.
48. Verify the second-order approximation formula.
49. Let f be a function with first and second-order derivatives at x = a. Consider a quadratic of the form
$q(x) = b + c(x - a) + d(x - a)^2$. Show that f(a) = q(a), f′(a) = q′(a), and f′′(a) = q′′(a) if and only if b = f(a),
c = f ′ (a) and d = f ′′ (a)/2.
∗ D. Ludwig, D.D. Jones, & C.S. Holling, “Qualitative analysis of insect outbreak systems: the spruce budworm and forest,” Journal
of Animal Ecology (1978), v. 47, pp. 315-332.

3.7. L’HÔPITAL’S RULE 337
3.7 l’Hôpital’s Rule

In curve sketching, optimization, and other applications, it is often necessary to evaluate a limit of the form
\[ \lim_{x \to c} \frac{f(x)}{g(x)} \]
where $\lim_{x \to c} f(x)$ and $\lim_{x \to c} g(x)$ are either both 0 or both ∞. Such limits are called 0/0 indeterminate forms
and ∞/∞ indeterminate forms, respectively, because their value cannot be determined without further analysis.
There is a rule for evaluating such indeterminate forms, known as l’Hôpital’s rule, which relates the evaluation to a
computation of
\[ \lim_{x \to c} \frac{f'(x)}{g'(x)} \]
the limit of the ratio of the derivatives of f and g. Here is a precise statement of this rule.
Theorem 3.1. l’Hôpital’s Rule
Let f and g be differentiable functions on an open interval containing c (except possibly at c itself ). Suppose
$\lim_{x \to c} \frac{f(x)}{g(x)}$ produces an indeterminate form $\frac{0}{0}$ or $\frac{\infty}{\infty}$ and that
\[ \lim_{x \to c} \frac{f'(x)}{g'(x)} = L \]
where L is either a finite number, −∞, or ∞. Then
\[ \lim_{x \to c} \frac{f(x)}{g(x)} = L \]
The theorem also applies to one-sided limits and to limits at infinity where x → ∞ and x → −∞.
When we use l’Hôpital’s rule we use the symbol $\overset{H}{=}$, as shown in the following example.
Example 1. l’Hôpital’s rule with 0/0 form
Evaluate the following limits.

a. $\lim_{x \to 0} \frac{\sin x}{x}$
b. $\lim_{x \to 2} \frac{x^7 - 128}{x^3 - 8}$
Solution.
a. Note that this is of indeterminate form because sin x and x both approach 0 as x → 0. This means that
l’Hôpital’s rule applies:
\[ \lim_{x \to 0} \frac{\sin x}{x} \overset{H}{=} \lim_{x \to 0} \frac{\frac{d}{dx}\sin x}{\frac{d}{dx}x} = \lim_{x \to 0} \frac{\cos x}{1} = 1 \]
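b. For this example, $f(x) = x^7 - 128$ and $g(x) = x^3 - 8$, and the form is 0/0.
\[
\begin{aligned}
\lim_{x \to 2} \frac{x^7 - 128}{x^3 - 8} &\overset{H}{=} \lim_{x \to 2} \frac{7x^6}{3x^2} && \text{l’Hôpital’s rule} \\
&= \lim_{x \to 2} \frac{7x^4}{3} && \text{Simplify} \\
&= \frac{7(2)^4}{3} && \text{Limit of a quotient} \\
&= \frac{112}{3}
\end{aligned}
\]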
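The two limits in Example 1 can be spot-checked numerically. The following is a minimal sketch, not part of the original text: it evaluates each quotient at points approaching the limit point. The function names `ratio_a` and `ratio_b` are illustrative choices.

```python
import math

def ratio_a(x):
    # Quotient from part a: sin(x) / x, undefined at x = 0 itself
    return math.sin(x) / x

def ratio_b(x):
    # Quotient from part b: (x^7 - 128) / (x^3 - 8), undefined at x = 2 itself
    return (x**7 - 128) / (x**3 - 8)

# Approach the limit points x = 0 and x = 2 from the right
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"h = {h:g}: sin(h)/h = {ratio_a(h):.8f}, "
          f"part b at 2 + h = {ratio_b(2 + h):.6f}")

print("112/3 =", 112 / 3)
```

As h shrinks, the first column should settle near 1 and the second near 112/3, in agreement with the values obtained from l’Hôpital’s rule. A numerical table like this only suggests a limit; it does not prove it.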

Example 2. Exponential vs arithmetic growth
Thomas Malthus in his Essay on Population wrote
Population, when unchecked, increases in a geometrical ratio. Subsistence increases only in an arithmetical
ratio. A slight acquaintance with numbers will shew [sic] the immensity of the first power in comparison
of the second.
While Example 2 in Section 1.5 explored a special case of this observation, l’Hôpital’s rule allows us to fully appreciate
Malthus’ observation. Let $P(t) = a^t$ for some a > 1 represent the size of a population at time t and let $F(t) = bt$ for
some b > 0 represent the total amount of food available at time t. Find
\[ \lim_{t \to \infty} \frac{F(t)}{P(t)} \]
and discuss its implications.
Solution. Since both $a^t$ and $bt$ approach infinity as t approaches infinity, we obtain
\[ \lim_{t \to \infty} \frac{bt}{a^t} \overset{H}{=} \lim_{t \to \infty} \frac{b}{a^t \ln a} = 0 \]
Hence, as time marches on, the amount of food per individual approaches nothing. Therefore, either everyone receives
almost nothing or almost no one receives something. □
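To get a feel for how quickly food per individual collapses, here is a small numerical sketch. The parameter values a = 1.1 and b = 100 are assumptions chosen only for illustration; they do not come from the text, and the function name `food_per_capita` is hypothetical.

```python
def food_per_capita(t, a=1.1, b=100.0):
    # F(t) / P(t) with F(t) = b*t (arithmetic) and P(t) = a**t (geometric).
    # a and b are illustrative values, not data from the text.
    return b * t / a**t

for t in (1, 10, 50, 100, 200, 500):
    print(f"t = {t:4d}: food per individual = {food_per_capita(t):.3e}")
```

With these assumed values the ratio first rises, because the linear food supply starts with a large coefficient, and then falls toward zero once the exponential term takes over.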
Example 3. Applying l’Hôpital’s rule twice.
Evaluate $\lim_{x \to \infty} \frac{2x^2 - 3x + 1}{3x^2 + 5x - 2}$.
Solution. We can evaluate this limit by multiplying by 1 written as $(1/x^2)/(1/x^2)$. Instead, we note that this is of
the form ∞/∞ and apply l’Hôpital’s rule twice:
\[ \lim_{x \to \infty} \frac{2x^2 - 3x + 1}{3x^2 + 5x - 2} \overset{H}{=} \lim_{x \to \infty} \frac{4x - 3}{6x + 5} \overset{H}{=} \lim_{x \to \infty} \frac{4}{6} = \frac{2}{3} \]
□
Note that l’Hôpital’s rule is not the only way to solve the above example. We could have divided both the
numerator and denominator by $x^2$ to obtain
\[ \lim_{x \to \infty} \frac{2x^2 - 3x + 1}{3x^2 + 5x - 2} = \lim_{x \to \infty} \frac{2 - 3/x + 1/x^2}{3 + 5/x - 2/x^2} = \frac{2 - 0 + 0}{3 + 0 - 0} = \frac{2}{3} \]
However, most examples in the section do not yield to this simple procedure so that either L’Hôpital’s rule or other
more sophisticated procedures beyond the scope of this text must be employed. Before applying L’Hôpital’s rule,
however, we must check that the conditions of Theorem 3.1 apply. If they do not, then the analysis is not valid as
illustrated by the next two examples.
Example 4. Limit is not an indeterminate form
Evaluate $\lim_{x \to 0} \frac{1 - \cos x}{\sec x}$.

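Solution. You must always remember to check that you have an indeterminate form before applying l’Hôpital’s
rule. Here the numerator approaches 0 and the denominator approaches 1, so the limit of a quotient applies directly. The limit is
\[ \lim_{x \to 0} \frac{1 - \cos x}{\sec x} = \frac{\lim_{x \to 0} (1 - \cos x)}{\lim_{x \to 0} \sec x} = \frac{0}{1} = 0 \]
□
If you blindly apply l’Hôpital’s rule in Example 4, you obtain the WRONG answer:
\[
\begin{aligned}
\lim_{x \to 0} \frac{1 - \cos x}{\sec x} &\overset{H}{=} \lim_{x \to 0} \frac{\sin x}{\sec x \tan x} && \text{This is NOT correct.} \\
&= \lim_{x \to 0} \frac{\cos x}{\sec x} \\
&= \frac{1}{1} \\
&= 1 && \text{Hence the answer is WRONG.}
\end{aligned}
\]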
Example 5. Conditions of l’Hôpital’s rule are not satisfied
Evaluate $\lim_{x \to \infty} \frac{x + \sin x}{x - \cos x}$.
Solution. This limit has the indeterminate form ∞/∞. If you try to apply l’Hôpital’s rule, you find
\[ \lim_{x \to \infty} \frac{x + \sin x}{x - \cos x} \overset{H}{=} \lim_{x \to \infty} \frac{1 + \cos x}{1 + \sin x} \]
The limit on the right does not exist, because both sin x and cos x oscillate between −1 and 1 as x → ∞. Recall
that l’Hôpital’s rule applies only if $\lim_{x \to c} \frac{f'(x)}{g'(x)} = L$ or is ±∞. This does not mean that the limit of the original
expression does not exist or that we cannot find it; it simply means that we cannot apply l’Hôpital’s rule. To find
this limit, factor out an x from the numerator and denominator and proceed as follows:
\[
\begin{aligned}
\lim_{x \to \infty} \frac{x + \sin x}{x - \cos x} &= \lim_{x \to \infty} \frac{x\left(1 + \frac{\sin x}{x}\right)}{x\left(1 - \frac{\cos x}{x}\right)} \\
&= \lim_{x \to \infty} \frac{1 + \frac{\sin x}{x}}{1 - \frac{\cos x}{x}} \\
&= \frac{1 + 0}{1 - 0} \\
&= 1
\end{aligned}
\]
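The contrast in Example 5 between the original quotient and the quotient of derivatives can be seen numerically. This is a rough sketch with illustrative function names (`original`, `derivative_ratio`) that are not from the text.

```python
import math

def original(x):
    # The quotient whose limit we want: (x + sin x) / (x - cos x)
    return (x + math.sin(x)) / (x - math.cos(x))

def derivative_ratio(x):
    # The quotient of derivatives: (1 + cos x) / (1 + sin x)
    return (1 + math.cos(x)) / (1 + math.sin(x))

# The original quotient settles down near 1 for large x ...
for x in (1e2, 1e4, 1e6):
    print(f"x = {x:g}: original = {original(x):.8f}")

# ... while the derivative quotient keeps oscillating: compare even
# multiples of pi with odd multiples of pi, however large k is.
for k in (10, 1000):
    even = derivative_ratio(2 * math.pi * k)
    odd = derivative_ratio(2 * math.pi * k + math.pi)
    print(f"k = {k}: at 2*pi*k -> {even:.6f}, at 2*pi*k + pi -> {odd:.6f}")
```

Sampling along even multiples of π gives values near 2, while odd multiples give values near 0, so the derivative quotient has no limit even though the original quotient does.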
Other Indeterminate Forms

Remember that l’Hôpital’s rule itself applies only to the indeterminate forms 0/0 and ∞/∞. Other indeterminate
forms such as $1^\infty$, $0^0$, $\infty^0$, ∞ − ∞, and 0 · ∞, can often be manipulated algebraically into one of the standard forms
0/0 or ∞/∞, and then evaluated using l’Hôpital’s rule. The following examples illustrate such procedures.
Example 6. Compounded growth and e
Consider a bank account (or a flask) with initially one dollar (or one yeast cell per ml). If the money in this
account gets 100% interest annually (the yeast cells double once per day) and this interest is only applied once a year
(all yeast cells only replicate at the same time once a day), then at the end of the year (day) you have two dollars
(two yeast cells/ml). Alternatively, if twice a year (100/2)% interest is applied to the account (if half of the cells
reproduce every 12 hours), then at the end of the year (day) you have $(1 + 1/2)^2 = 9/4$ dollars (cells/ml). Similarly,
if every month (100/12)% interest is applied to the account (1/12th of the cells reproduce every two hours), then you
have $(1 + 1/12)^{12}$ dollars (cells/ml) at the end of the year (day). In general, if n times a year (100/n)% interest is
applied to the account (1/nth of the cells reproduce n times a day), then there are $(1 + 1/n)^n$ dollars (cells/ml) in
the account (flask) at the end of the year (day). If the interest (growth of yeast cells) is accumulating continuously,
then we expect to have
\[ \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \text{ dollars (cells/ml)} \]
in the account (flask) at the end of the year (day). Find this limit.
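Solution. Note that this limit is indeed of the indeterminate form $1^\infty$. Let
\[ L = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \]
We take the logarithm of both sides:
\[
\begin{aligned}
\ln L &= \ln \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n \\
&= \lim_{n \to \infty} \ln \left(1 + \frac{1}{n}\right)^n && \text{The natural logarithm is continuous.} \\
&= \lim_{n \to \infty} n \ln \left(1 + \frac{1}{n}\right) && \text{Property of logarithms} \\
&= \lim_{n \to \infty} \frac{\ln\left(1 + \frac{1}{n}\right)}{\frac{1}{n}} && \text{Form } \tfrac{0}{0} \\
&\overset{H}{=} \lim_{n \to \infty} \frac{\frac{1}{1 + \frac{1}{n}}\left(-\frac{1}{n^2}\right)}{-\frac{1}{n^2}} && \text{l’Hôpital’s rule} \\
&= \lim_{n \to \infty} \frac{1}{1 + \frac{1}{n}} && \text{Simplify} \\
&= 1
\end{aligned}
\]
Thus, ln L = 1 and L = e. □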
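The convergence of the compounding formula toward e can be tabulated directly. The sketch below is illustrative only; the function name `compound` is a hypothetical choice.

```python
import math

def compound(n):
    # Balance after one year (or cells/ml after one day) when growth of
    # 1/n is applied n times: (1 + 1/n)**n
    return (1 + 1 / n) ** n

for n in (1, 2, 12, 365, 10**4, 10**6):
    value = compound(n)
    print(f"n = {n:>7}: (1 + 1/n)^n = {value:.8f}, e - value = {math.e - value:.2e}")
```

The first two rows reproduce the values 2 and 9/4 from the example, and the gap to e shrinks roughly in proportion to 1/n.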
Example 7. Limit of the form $0^0$
Find $\lim_{x \to 0^+} x^{\sin x}$.
Solution. This is a $0^0$ indeterminate form. From the graph shown in Figure 3.29, it looks as though the desired
limit is 1.
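Figure 3.29: Graph of $x^{\sin x}$
We can verify this conjecture analytically. We proceed as with the previous example, by using properties of
logarithms.
\[
\begin{aligned}
L &= \lim_{x \to 0^+} x^{\sin x} && \text{Given limit} \\
\ln L &= \ln \lim_{x \to 0^+} x^{\sin x} && \text{Logarithm of both sides.} \\
&= \lim_{x \to 0^+} \ln x^{\sin x} && \text{The natural logarithm is continuous.} \\
&= \lim_{x \to 0^+} \left[(\sin x) \ln x\right] && \text{Property of logarithms.} \\
&= \lim_{x \to 0^+} \frac{\ln x}{\csc x} && \text{This is } \tfrac{\infty}{\infty} \text{ form.} \\
&\overset{H}{=} \lim_{x \to 0^+} \frac{1/x}{-\csc x \cot x} && \text{l’Hôpital’s rule} \\
&= \lim_{x \to 0^+} \frac{-\sin^2 x}{x \cos x} && \text{Algebraically simplify} \\
&= \lim_{x \to 0^+} \left(\frac{\sin x}{x}\right)\left(\frac{-\sin x}{\cos x}\right) \\
&= (1)(0) \\
&= 0
\end{aligned}
\]
Thus, $L = e^0 = 1$. □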
Example 8. Finding a horizontal asymptote with l’Hôpital’s rule.
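Find all horizontal asymptotes of the graph of $f(x) = x^{1/x}$ for x > 0.
Solution. To determine whether the graph of f has a horizontal asymptote, we must evaluate
\[ \lim_{x \to \infty} x^{1/x} \]
which is indeterminate of the form $\infty^0$. To find this limit, we take the natural logarithm and proceed as follows:
\[
\begin{aligned}
L &= \lim_{x \to \infty} x^{1/x} \\
\ln L &= \ln \left[\lim_{x \to \infty} x^{1/x}\right] \\
&= \lim_{x \to \infty} \ln \left[x^{1/x}\right] \\
&= \lim_{x \to \infty} \frac{1}{x} \ln x \\
&= \lim_{x \to \infty} \frac{\ln x}{x} && \text{Form } \tfrac{\infty}{\infty} \\
&\overset{H}{=} \lim_{x \to \infty} \frac{1/x}{1} \\
&= 0
\end{aligned}
\]
Thus, we have ln L = 0; therefore, $L = e^0 = 1$, so y = 1 is a horizontal asymptote for the graph of $y = x^{1/x}$, as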
shown in Figure 3.30.

Figure 3.30: Graph of $y = x^{1/x}$ with horizontal asymptote
We have just seen (Figure 3.30) that the graph of $f(x) = x^{1/x}$ approaches the line y = 1 asymptotically as
x → ∞, but how does f(x) behave as $x \to 0^+$? That is, what is
\[ \lim_{x \to 0^+} x^{1/x}\,? \]
It may seem that to answer this question, we will need to apply l’Hôpital’s rule again, but this limit has the form
$0^\infty$, which is simply 0 and is not indeterminate at all. Note how the graph of f shown in Figure 3.30 approaches the
origin as x approaches zero from the right. Other forms that may appear to be indeterminate, but really are not,
are 0/∞, ∞/0, ∞ · ∞, ∞ + ∞, −∞ − ∞.
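The limits in Examples 7 and 8 can be checked by direct evaluation. This is a hedged sketch with hypothetical function names `f7` and `f8`; floating-point evaluation near 0 only illustrates the trend and is not a proof.

```python
import math

def f7(x):
    # Example 7: x**sin(x), a 0^0 form as x -> 0+
    return x ** math.sin(x)

def f8(x):
    # Example 8: x**(1/x), an infinity^0 form as x -> infinity
    # and a (non-indeterminate) 0^infinity form as x -> 0+
    return x ** (1 / x)

for x in (1e-1, 1e-3, 1e-6):
    print(f"x = {x:g}: x^sin(x) = {f7(x):.8f}")

for x in (1e1, 1e3, 1e6):
    print(f"x = {x:g}: x^(1/x) = {f8(x):.8f}")

for x in (0.5, 0.1, 0.01):
    print(f"x = {x:g}: x^(1/x) = {f8(x):.3e}")
```

The first two tables drift toward 1, while the last one collapses toward 0 very quickly, which matches the observation that $0^\infty$ is not an indeterminate form.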
Problem Set 3.7

1. An incorrect use of l’Hôpital’s rule is illustrated in the following limit computations. In each case, explain what
is wrong and find the correct value of the limit.
a. $\lim_{x \to \pi} \frac{1 - \cos x}{x} = \lim_{x \to \pi} \frac{\sin x}{1} = 0$
b. $\lim_{x \to \pi/2} \frac{\sin x}{x} = \lim_{x \to \pi/2} \frac{\cos x}{1} = 0$
2. Sometimes l’Hôpital’s rule leads nowhere. For example, observe what happens when the rule is applied to
\[ \lim_{x \to \infty} \frac{x}{\sqrt{x^2 - 1}} \]
Use any method you wish to evaluate this limit.
Find the limits, if possible, in Problems 3 to 18.

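3. $\lim_{x \to 1} \frac{x^3 - 1}{x^2 - 1}$
4. $\lim_{x \to 1} \frac{x^{10} - 1}{x - 1}$
5. $\lim_{x \to 0} \frac{1 - \cos^2 x}{\sin^2 x}$
6. $\lim_{x \to 0} \frac{1 - \cos x}{x^2}$
7. $\lim_{x \to \infty} x^{-5} \ln x$
8. $\lim_{x \to 0^+} x^{-5} \ln x$
9. $\lim_{x \to 0^+} \frac{\sin x}{\ln x}$
10. $\lim_{x \to \infty} \frac{\ln(\ln x)}{x}$
11. $\lim_{x \to \infty} \left(1 - \frac{3}{x}\right)^{2x}$
12. $\lim_{x \to \infty} \left(1 + \frac{1}{2x}\right)^{3x}$
13. $\lim_{x \to \infty} (\ln x)^{1/x}$
14. $\lim_{x \to 0^+} (e^x + x)^{1/x}$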
ex −1−x−x3/2
15. limx → 0 x3
16. $\lim_{x \to 0} \frac{e^{-1/x}}{x}$
17. $\lim_{x \to \infty} \left(\sqrt{x^2 - x} - x\right)$
18. $\lim_{x \to 0^+} \left(\frac{1}{x^2} - \ln \sqrt{x}\right)$
In Problems 19 to 22, use l’Hôpital’s rule to determine all horizontal asymptotes to the graph of the given function.
You are NOT required to sketch the graph.
19. $f(x) = x^3 e^{-0.01x}$
20. $f(x) = \frac{\ln x^5}{x^{0.02}}$
21. $f(x) = \left(\ln \sqrt{x}\right)^{2/x}$
22. $f(x) = \left(\frac{x + 3}{x + 2}\right)^{2x}$
Prove the parts of Theorem 3.10 [x-ref ] where k and n are positive integers in Problems 23-25.
23. $\lim_{x \to 0^+} \frac{\ln x}{x^n} = -\infty$
24. $\lim_{x \to \infty} \frac{e^{kx}}{x^n} = \infty$
25. $\lim_{x \to \infty} x^n e^{-kx} = 0$
26. Fisheries scientists have found that a Ricker stock-recruitment relationship, which has the form
\[ y = axe^{-bx}, \]
where y is an index of the relative number of individuals recruited to the fishery each year (typically one-year
olds) and x is an index of the spawning stock biomass (sometimes measured in terms of eggs produced), provides
a reasonable fit to Norwegian cod data for parameter values a = 5.9 and b = 0.0018.∗
a. What is the value of the recruitment index as x → ∞?
∗ For additional details see http://www.fw.umn.edu/FW5601/ALAB/LAB10/STOCKREC.HTM

b. What is the maximum value of the recruitment index and at what spawning stock index value does
it occur?
c. Over what range of spawning stock index values is the recruitment function concave up and over
what values is it concave down?
d. Use the information obtained in parts a.-c. to sketch this function.
27. An agronomist experimenting with a new breed of giant potato has found that individual tubers x months
after planting have a biomass in kilograms given by the equation y(x) = 2e1/(5x) for x > 0.
a. Calculate the rate of growth of the tuber over time and determine what happens to this rate in the
limit as x → 0 and x → ∞.
b. Find the time after planting when the growth rate of the tuber is maximized.
c. Show that the growth rate is positive for all x > 0 and determine the regions over which the growth
is accelerating and decelerating.
d. Sketch the biomass of the potato, as well as its growth rate, indicating the important points and
regions calculated in parts a.-c.
28. Determine which function, $f(x) = x^n$ with n > 0 or $g(x) = e^{ax}$ with a > 0, grows faster at ∞ by computing
$\lim_{x \to \infty} \frac{f(x)}{g(x)}$.
29. Determine which function, $f(x) = x^n$ with n > 0 or g(x) = ln x, grows faster at ∞ by computing $\lim_{x \to \infty} \frac{f(x)}{g(x)}$.
30. Consider a drug in the body whose current concentration is 1 mg/liter. In this problem, you investigate the
meaning of exponential decay of the drug.
a. If 1/2 of the drug particles cleared the body after 1 hour, determine the concentration of the drug
that remains after one hour.
b. If 1/4 of the drug particles cleared the body every half an hour, determine the concentration of the
drug that remains after one hour.
c. If 1/20 of the drug particles cleared the body every six minutes, determine the concentration of the
d. If 1/(2n) of the drug particles cleared the body every 1/nth of an hour, determine the concentration
$c_n$ of the drug that remains after one hour.
e. Find $\lim_{n \to \infty} c_n$.
31. Consider a drug in the body whose current concentration is 1 mg/liter. In this problem, you investigate the
meaning of exponential decay of the drug.
a. If all of the drug particles cleared the body after 1 hour, determine the concentration of the drug
that remains after one hour.
b. If 1/4 of the drug particles cleared the body every half an hour, determine the concentration of the
c. If 1/10 of the drug particles cleared the body every six minutes, determine the concentration of the
d. If 1/n of the drug particles cleared the body every 1/nth of an hour, determine the concentration $c_n$
of the drug that remains after one hour.
e. Find $\lim_{n \to \infty} c_n$.
32. Historical Quest The French mathematician Guillaume de l’Hôpital (1661-1704) is best known today for
the rule that bears his name, but that rule was discovered by l’Hôpital’s teacher, Johann Bernoulli. Not only
did l’Hôpital neglect to cite his sources in his book, but there is also evidence that he paid Bernoulli for his
results and for keeping their arrangements for payment confidential. In a letter dated March 17, 1694, he asked
Bernoulli “to communicate to me your discoveries . . .” — with the request not to mention them to others

— “. . . it would not please me if they were made public.” L’Hôpital’s argument, which was originally given
without using functional notation, can easily be reproduced:∗
\[
\begin{aligned}
\frac{f(a + dx)}{g(a + dx)} &= \frac{f(a) + f'(a)\,dx}{g(a) + g'(a)\,dx} \\
&= \frac{f'(a)\,dx}{g'(a)\,dx} \\
&= \frac{f'(a)}{g'(a)}
\end{aligned}
\]
First, place some conditions on the functions f and g which will make this argument true. Finally, supply
reasons for this argument, and give necessary conditions for the functions f and g.
∗ D.J. Struik, A Source Book in Mathematics, 1200-1800. Cambridge, MA: Harvard University Press, 1969, pp. 313-316.


DEFINITIONS
Section 3.1
Natural base, e, p.267
Section 3.2
Ricker model, p. 278
Section 3.3
Chain rule, p. 288
Explicit form, p. 291
Implicit form, p. 291
Implicit differentiation, p. 292
Section 3.5
Linearization, p. 410
Linear approximation, p. 313
Sensitivity, p. 317
Elasticity, p. 318
Section 3.6
Concave up, p. 323
Concave down, p. 323
Inflection point, p. 325
Quadratic approximation, p. 328
Section 3.7
Indeterminate forms, p. 337
Section 3.1
Power rule, p. 263
Sum rule, p. 264
Difference rule, p. 264
Scalar multiple rule, p. 264
Derivative of a general exponential ax , p. 268
Section 3.2
Product rule, p. 276
Reciprocal rule, p. 278
Quotient rule, p. 281
Section 3.3
Chain Rule, p. 288
Finding tangent lines to circles and other implicitly defined curves, p. 292
Derivatives of logarithms p. 294
Section 3.4
Limit $\lim_{x \to 0} \frac{\sin x}{x} = 1$, p. 305
Limit $\lim_{x \to 0} \frac{\cos x - 1}{x} = 0$, p. 305
Derivative rules for sine and cosine functions, p. 305
Derivative rules for the trigonometric functions, p. 308
Section 3.6
Concavity, p. 355
Second order approximation, p. 329
Higher-order derivatives, p. 330

Section 3.7
l’Hôpital’s rule p. 337
Section 3.1
Growth of a fetus heart
Clearance of HIV
Depletion of resources in the U.S.
Section 3.2
Survival rates
Breaking whelks
Dose-response curves
Section 3.3
Escaping parasitism
Clearance of Acetaminophen
Periodic populations
Rate of change of CO2
Section 3.5
Predicting bison abundance at Yellowstone National Park
Estimating metabolic rates
Elasticity of metabolic rates
Section 3.6
Sigmoidal decay in deaths due to aerial borne diseases
Section 3.7
Exponential population growth versus arithmetic resource growth
Problem Set 3.8

Find $\frac{dy}{dx}$ in Problems 1 to 6.
1. $y = x^3 + x\sqrt{x} + \sin 3x$
2. $xy + y^3 = 25$
ln(x2 −1)
3. y = √
3
x(2−x)3
4. $y = x^2 e^{-\sqrt{x}}$
5. $y = x^3 + x\sqrt{x} + \cos 2x$
6. $y = \sin^2\left(\frac{\pi x}{4}\right)$
7. Find $\frac{d^2 y}{dx^2}$ where $y = x^2 (2x - 3)^3$
8. Use the definition of derivative to find $\frac{d}{dx}(x - 3x^2)$.
9. Find the first- and second-order approximations to $y = e^{x^2}$ at x = 0. Graph the function and its approximations.
10. The graphs in Figure 3.31 are taken from the December 1990 issue of Mathematics Teacher (p. 718). Which
is the derivative and which is the function?
Figure 3.31: A function and its derivative
11. Sketch the graph of a function with the following properties:
f′(x) > 0 when x < 1
f′(x) < 0 when x > 1
f′′(x) > 0 when x < 1
f′′(x) > 0 when x > 1
What can you say about the derivative of f when x = 1?
12. Find numbers A, B, C, and D that guarantee that the function
f (x) = Ax3 + Bx2 + Cx + D
has a local minimum at (−1, 1) and a local maximum at (1, −1).
13. Let f be a function defined by

\[ y = x^3 + 35x^2 - 125x - 9{,}375 \]
Determine where the function is increasing, where it is decreasing, and where the graph is concave up and
where it is concave down.
14. Let g be a function defined by

\[ g(t) = (t^3 + t)^2 \]
Determine the concavity, where the graph is rising, where it is falling, and where the points of inflection are
located.
15. Suppose the concentration in the blood at time t of a drug injected into the body is modeled by
\[ C(t) = te^{-2t} \]
At what time does the largest concentration occur? Use l’Hôpital’s rule to find the horizontal asymptote.
Graph this curve.
16. An individual wants to estimate the height of a tall tree. To do so, one cannot simply drop a tape measure
from the top of the tree. However, the height can be determined by using a sextant to determine the angle θ
between the ground and the tip of the tree at a distance of 100 ft from the base of the tree.
a. Find the height of the tree, H, as a function of θ.

b. If you measure an angle θ = 1.1 radians, determine the height of the tree.

c. Determine the sensitivity of the height in part b to θ. Discuss how a 10% error in measuring θ
influences the estimate for the height of the tree.
17. A bacterial colony is estimated to have a population of P thousand individuals, where
\[ P(t) = \frac{24t + 10}{t^2 + 1} \]
and t is the number of hours after a toxin is introduced.
a. At what rate is the population changing when t = 1?
b. Is the rate increasing or decreasing at this time?
c. At what time does the population begin to decrease?
18. In the 1960s, scientists at Woods Hole Oceanographic Institution measured the uptake rate of glucose by bacterial
populations from the coast of Peru.∗ In one field experiment, they found that the uptake rate can be modeled
by $f(x) = \frac{1.2078x}{1 + 0.0506x}$ micrograms per hour where x is micrograms of glucose per liter. If the current level of
glucose is x = 20 and is increasing at a rate of 10 micrograms per hour, determine the rate at which the uptake
rate is changing.
19. Insect parasitoids have an immature life stage that develops on or within a single insect host. Parasitoids
are often called parasites, but the term parasitoid is more technically correct. Although the life cycle and
reproductive habits of parasitoids can be complex, one of their characteristics is that they lay their
eggs on or close to a host insect, and then devour the host insect.∗ The proportion of the host escaping
parasitism may depend on parasitoid density, d. Suppose the proportion of escaping parasitism is modeled by
\[ P(d) = e^{-0.05d} \]
Does the proportion of escaping parasitism increase or decrease with parasitoid density? What is the concavity
of this curve, and is there a point of inflection?
20. The gross U.S. federal debt (in trillions of dollars) is plotted below.
[Figure: gross federal debt in trillions of dollars (vertical axis, 0 to 7) plotted against year (horizontal axis, 1940 to 2000)]
Regarding this debt, President Ronald Reagan stated in 1979 that the U.S. is
“...going deeper into debt at a faster rate than we ever have before”
Discuss the role of higher order derivatives in the graph of federal debt and President Reagan’s quote.
∗ From M.P. Hoffmann and A.C. Frodsham, Natural Enemies of Vegetable Insect Pests, New York (1993): Cooperative Extension,
Cornell University, Ithaca, NY, p. 63.

3.9 Group Projects

Working in small groups is typical of most work environments, and learning to work with others to communicate
specific ideas is an important skill. Work with three or four other students to submit a single report based on each of
the following questions.
Project 3A: Modeling North American Bison Population

If we look closely at the data plotted in Example 2, Section 3.5, on the abundance of North American Bison in
Yellowstone Park from 1902-1931, we get the distinct impression that these data can be represented much better by
two linear functions than by one: the first representing the data from 1902-1915 and the second representing the
data from 1915-1931. We can either fit these two functions by eye or we can work more precisely using the concept
of a sum-of-squares measure to gauge how well the line fits the data. This concept requires that we know the actual
values of the data points. Specifically, if we have n data points, indexed by i = 1, ..., n, then we need to know the
values (xi , yi ) for each data point. For the bison these data are specified in the table below. (Note that the data for
some years are missing. This is not a problem if we just ignore these missing points when indexing the data that
does exist.)
For the buffalo data, consider piecing together two linear functions so that they both meet at the point
$(x_{12}, y_{12}) = (1915, 270)$. Since both lines pass through this point, they must both satisfy the equation
\[ \frac{y - 270}{x - 1915} = c \]
for some constant c. If the line fitted to the 1902-1915 data is specified by a constant $c_1$ and the line fitted to the
1915-1931 data is specified by a constant $c_2$, then the actual function fitted to the data is y = f(x) where
\[
f(x) =
\begin{cases}
c_1 x + (270 - 1915c_1) & 1902 \le x \le 1915 \\
c_2 x + (270 - 1915c_2) & 1915 \le x \le 1931.
\end{cases}
\]
The only question that now remains is to find the values of $c_1$ and $c_2$ that provide the best fit of the function f(x)
to the data in the sense of minimizing the sum-of-squares measure of the fit. If we define the value of this measure
to be S, where
\[ S = \sum_{i=1}^{26} \left(y_i - f(x_i)\right)^2, \]
then we can plot the value of S for different choices of $c_1$ and $c_2$. This is best done by considering the sums
\[ S_1(c_1) = \sum_{i=1}^{11} \left(y_i - f(x_i)\right)^2 \]
and
\[ S_2(c_2) = \sum_{i=12}^{26} \left(y_i - f(x_i)\right)^2 \]
separately.
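As a starting point for the calculations, the two sums can be written as short functions. This is only a sketch under stated assumptions: the (year, abundance) pairs are copied from Table 3.2, the names `data`, `fitted`, `S1`, and `S2` are illustrative, and the slopes scanned in the loop are arbitrary trial values rather than suggested answers.

```python
# (year, abundance) pairs copied from Table 3.2, in index order i = 1..26
data = [
    (1902, 44), (1903, 47), (1904, 51), (1905, 74), (1907, 84),
    (1908, 95), (1909, 118), (1910, 149), (1911, 168), (1912, 192),
    (1913, 215), (1915, 270), (1916, 348), (1917, 397), (1919, 504),
    (1920, 501), (1921, 602), (1922, 647), (1923, 748), (1925, 830),
    (1926, 931), (1927, 1008), (1928, 1057), (1929, 1109), (1930, 1124),
    (1931, 1192),
]

def fitted(x, c):
    # Line with slope c forced through the common point (1915, 270)
    return c * x + (270 - 1915 * c)

def S1(c1):
    # Sum of squared residuals over i = 1..11 (years 1902-1913)
    return sum((y - fitted(x, c1)) ** 2 for x, y in data[:11])

def S2(c2):
    # Sum of squared residuals over i = 12..26 (years 1915-1931)
    return sum((y - fitted(x, c2)) ** 2 for x, y in data[11:])

# Arbitrary trial slopes, only to show how the sums respond to c
for c in (0.0, 5.0, 10.0, 15.0, 20.0):
    print(f"c = {c:5.1f}: S1 = {S1(c):12.1f}, S2 = {S2(c):12.1f}")
```

Because the point i = 12 lies on both lines by construction, it contributes nothing to $S_2$, so the two sums can be minimized independently of one another.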
1. By calculating S1 for a range of values of c1 and S2 for a range of values of c2 , and then plotting the results,
find to two significant figures the values of c1 and c2 that minimize the sum S = S1 + S2 . (We will not tell you
what range of values to use. You need to find this out by “playing around” with the functions until you find
the appropriate intervals over which to plot the two sums). This is a graphical approach to finding the best
fitting function f (x) defined above.
2. Can you think of a way that you might use your differential calculus to solve this problem analytically? Once
you find a way to do this, then solve the problem analytically and compare this analytical solution with your
graphical solution.
3. What advantages does the analytical solution have over the graphical solution and vice-versa?

Table 3.2: Population for the North American Bison
Index (i) Year (xi ) Abundance (yi )

1 1902 44
2 1903 47
3 1904 51
4 1905 74
5 1907 84
6 1908 95
7 1909 118
8 1910 149
9 1911 168
10 1912 192
11 1913 215
12 1915 270
13 1916 348
14 1917 397
15 1919 504
16 1920 501
17 1921 602
18 1922 647
19 1923 748
20 1925 830
21 1926 931
22 1927 1008
23 1928 1057
24 1929 1109
25 1930 1124
26 1931 1192
Project 3B: Modeling Sockeye Salmon

In Problem 34 of Problem Set 2.5, you came across the Ricker stock-recruitment relationship
\[ y = axe^{-bx}, \]
with a = 1.35 and b = 1/100 providing a good fit to the sockeye salmon data illustrated in Figure 2.18. If we
denote the population abundance at time t by $x_t$ then, as we did for the Beverton and Holt model in Example 8 of
Section 2.5, we obtain the following Ricker fisheries model for this species:
\[ x_{t+1} = a x_t e^{-b x_t}. \]
As with the Beverton and Holt model in Example 8 of Section 2.5, we are interested in finding out whether or not
there is a positive fixed point associated with the function $f(x) = axe^{-bx}$, which we find by solving the equation
f(x) = x or, equivalently, solving for the roots of the equation F(x) = f(x) − x = 0. Thus, for our Ricker model,
\[ F(x) = axe^{-bx} - x. \]
Our task now is to solve for the roots of the equation $F(x) = axe^{-bx} - x$. Try doing this analytically for the
particular values a = 1.35 and b = 1/100: that is, analytically find the solution to the equation $1.35xe^{-0.01x} - x = 0$.
Why can you not do this?
This equation can be solved numerically to a desired level of accuracy using calculus and approximation theory.
Here is one approach that sometimes works, although it will not always find all the roots of the equation F(x) = 0.
We know from calculus that if $x_0$ is our guess at a root of this equation, then the linear approximation
\[ F(x_0) \approx F(x) - F'(x_0)(x - x_0) \]
is good for all x close to $x_0$. In particular, if $x_1$ is an actual root of F(x), then $F(x_1) = 0$ and
\[ F(x_0) \approx -F'(x_0)(x_1 - x_0). \]
After rearranging terms the equation becomes
\[ x_1 \approx x_0 - \frac{F(x_0)}{F'(x_0)}. \]
Of course, if we consider this expression to be exact, rather than approximate (that is, we guess the root to be $x_0$
and we use the equation $x_1 = x_0 - \frac{F(x_0)}{F'(x_0)}$ to calculate the root), then $x_1$ is a new approximation rather than the
actual value itself. Because we are using a linear approximation, only if F(x) is linear will $x_1$ be the actual root.
Under a specific set of conditions, however, we can expect $x_1$ to be a better approximation to the root than our
original guess $x_0$. Further, we can iteratively generate a sequence of values
\[ x_{n+1} = x_n - \frac{F(x_n)}{F'(x_n)}, \qquad n = 0, 1, 2, \ldots \tag{3.1} \]
in the hope that it converges to an actual root $\hat{x}$ of F(x). Specifically, if $x_n \to \hat{x}$ as n → ∞, then the above equation
implies
\[ \hat{x} = \hat{x} - \frac{F(\hat{x})}{F'(\hat{x})} \;\Rightarrow\; F(\hat{x}) = 0 \quad \text{provided } F'(\hat{x}) \text{ exists and is not equal to 0,} \]
in which case $\hat{x}$ is a root of F(x) and $\hat{x}$ is a fixed point of f(x).
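Iteration (3.1) is short to program. The following is a minimal sketch, assuming the Ricker parameters a = 1.35 and b = 1/100 from the text; the function names, the stopping tolerance, and the iteration cap are illustrative choices and not part of the project statement.

```python
import math

A, B = 1.35, 0.01  # Ricker parameters a and b from the text

def F(x):
    # F(x) = a x e^{-bx} - x, whose roots are the fixed points of f
    return A * x * math.exp(-B * x) - x

def dF(x):
    # F'(x) = a e^{-bx} (1 - b x) - 1, by the product and chain rules
    return A * math.exp(-B * x) * (1 - B * x) - 1

def newton(x0, tol=1e-12, max_iter=100):
    # Repeat x_{n+1} = x_n - F(x_n)/F'(x_n) until successive iterates
    # differ by less than tol (an assumed stopping rule).
    x = x0
    for _ in range(max_iter):
        slope = dF(x)
        if slope == 0:
            raise ZeroDivisionError("F'(x) vanished; the step is undefined")
        x_new = x - F(x) / slope
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter steps")

for guess in (1.0, 20.0, 50.0):
    root = newton(guess)
    print(f"x0 = {guess:5.1f} -> root = {root:.10f}, F(root) = {F(root):.2e}")
```

Different starting guesses need not lead to the same root, so it is worth trying several values of $x_0$ and checking that F is close to zero at whatever value the iteration returns.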
1. Use this numerical scheme to generate a sequence of points $x_i$ for the Ricker model $f(x) = 1.35xe^{-0.01x}$ starting
from a number of different initial guesses $x_0$ for the fixed point solution to the equation f(x) = x. Does the
solution always converge to the same value?
2. This numerical procedure for finding roots is called Newton’s Method and also the Newton-Raphson Method.
See what you can find out about this method for finding the roots of functions such as $x^4 - x^2 = 0$ that have
multiple solutions (in this latter case x = −1, 0, 1 are all roots). When might you expect the method to work
and when might it fail? (First play around with finding the roots of $F(x) = x^4 - x^2$ and then read the material
in Section 4.5 and search the literature to find out more about the procedure.)

Chapter 4
Applications of Differentiation
4.1 Properties of Graphs, p. 355

4.2 Getting Extreme, p. 366
4.3 Optimization in Biology, p. 379
4.4 Applications to Optimal Behavior, p. 393
4.5 Linearization and Difference Equations, p. 408
Figure 4.1: A great tit is a species of bird whose foraging behavior was studied by biologist Richard Cowie and whose
behavior can be predicted by optimal foraging models.
PREVIEW
One of the central ideas in physics, chemistry, and biology is that processes act to optimize some physically or
biologically meaningful quantity. For example, from physics we know that light travels along a path that is the
shortest distance between two points (taking into account that gravity “bends” space) and, from biochemistry, we
know that proteins fold in a way that minimizes the energy of their constituent amino acid configuration.

Differential calculus is an important tool for analyzing such optimization (maximization or minimization) processes.
In this chapter we show how it applies to various biological problems and processes. Before we do this, however,
we first study how calculus can be used to graph a variety of different functions. We then develop procedures for
modeling and solving optimization problems. After going through a number of different biological applications, we
study how calculus provides insight into dynamical processes such as the growth of populations or the spread of deleterious
or mutant genes (e.g. the gene that causes sickle cell anemia) within populations. We end the chapter with
an application of difference equations that is at the heart of many numerical methods used by current technologies
for finding solutions to nonlinear equations.

4.1. GRAPHING WITH GUSTO 355
4.1 Graphing with Gusto

In this section, we get to put together many of the tools that we have learned so far (i.e. limits involving infinity,
first- and second-order derivatives) to graph a function. In graphing a function, you should envision walking along
the graph and indicating all the highlights of your walk. For instance, vertical asymptotes are places where you have such
a rapid ascent or descent that it makes Mount Everest look like a stroll in the park. Horizontal asymptotes are places
where the landscape levels out into a never-ending plain. Where the derivative is positive the graph is ascending and
where the derivative is negative the graph is descending. Switches in the sign of the slope correspond to either
hill tops or valley bottoms along your walk. On ascents where the second derivative is positive, the walk is getting
harder. On descents where the second derivative is negative, your descent becomes faster.
Properties of Graphs
When graphing the function y = f(x) by hand, do the following to find the highlights of the function’s shape:
Vertical asymptotes: Determine at what points the function is not well defined (e.g. division by zero). At each of
these points, say x = a, evaluate the one-sided limits, limx→a+ f (x) and limx→a− f (x), to determine what the
graph looks like near x = a. If either of these one-sided limits is +∞ or −∞, then there is a vertical asymptote
at this point.
Intervals of increase and decrease: Compute the first derivative f ′ (x) of f (x) and determine on which intervals
f ′ (x) > 0 and on which intervals f ′ (x) < 0. On these intervals, f is increasing and decreasing, respectively.
Intervals of concavity: Compute the second derivative f ′′ (x) and determine on which intervals f ′′ (x) > 0 and
f ′′ (x) < 0. On these intervals, f is concave up and concave down, respectively.
The x and y intercepts: Find the x intercepts (i.e. where f (x) = 0) and the y intercept (i.e. y = f (0)). These
points help pin down the placement of the graph.
After identifying these functional highlights, we can try to sketch the function as best as we can.
Example 1. Doodling with Whelk Droppings
In Example 6, Section 3.2, we considered how often D(h) a whelk had to be dropped from a height of h meters
before breaking. The function based on data collected by Reto Zach∗ is given by
\[ D(h) = 1 + \frac{20.4}{h - 0.84} \text{ drops} \]
a. Find the horizontal and vertical asymptotes.
b. Find on what intervals D is increasing and on what intervals D is decreasing.
c. Find on what intervals D is concave up and on what intervals D is concave down.
d. Take all this information and sketch D(h). Discuss for which h values, this function is biologically
meaningful.
Solution.
a. There is a vertical asymptote at h = 0.84. Moreover, $\lim_{h \to 0.84^+} D(h) = +\infty$ and $\lim_{h \to 0.84^-} D(h) = -\infty$.
Since $\lim_{h \to \pm\infty} D(h) = 1$, D has a horizontal asymptote of 1.
∗ Zach, Reto, “Selection and dropping of whelks by northwestern crows.” Behavior 67 (1978): 134 - 147.

b. Taking the first derivative yields

\[ D'(h) = -\frac{20.4}{(h - 0.84)^2} \]
Since $(h - 0.84)^2$ is always positive for $h \neq 0.84$, we get $D'(h) < 0$ for all $h \neq 0.84$. Therefore, D is
decreasing on each of the intervals (−∞, 0.84) and (0.84, ∞).
c. Taking the second derivative yields
\[ D''(h) = \frac{40.8}{(h - 0.84)^3} \]
which is positive for h > 0.84 and negative for h < 0.84. Hence, D is concave up for h > 0.84 and concave
down for h < 0.84.
d. Putting all this together yields the following graph.
From the crow’s point of view, this graph is only meaningful for h > 0.84. For h > 0.84, the graph we
drew is very similar to the graph found in Figure 3.5 from Chapter 3.
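The conclusions of parts a. to c. can be spot-checked numerically. The following is a minimal sketch, not part of the original solution: the function names `D`, `dD`, and `d2D` and the sample heights are our own choices for illustration.

```python
def D(h):
    """Number of drops needed to break a whelk dropped from height h (meters)."""
    return 1 + 20.4 / (h - 0.84)

def dD(h):
    """First derivative D'(h) found in part b."""
    return -20.4 / (h - 0.84) ** 2

def d2D(h):
    """Second derivative D''(h) found in part c."""
    return 40.8 / (h - 0.84) ** 3

# Sample heights on both sides of the vertical asymptote h = 0.84.
sample_heights = [0.2, 0.5, 0.8, 0.9, 2.0, 5.0, 50.0]

for h in sample_heights:
    print(f"h = {h:5.2f}   D = {D(h):9.2f}   D' = {dD(h):10.2f}   D'' = {d2D(h):11.2f}")
```

In the printed table, D′ is negative at every sample height, D′′ changes sign across h = 0.84, and D(h) approaches 1 for large h, in agreement with the sketch described above.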
Many functions have no horizontal asymptotes. Nonetheless, understanding the limits as x approaches ±∞ may
help us graph the function.
Example 2. The double-welled potential
Consider the function $y = x^4 - 2x^2$.

a. Find the asymptotes, the intervals where the function is increasing/decreasing, the intervals where the
function is concave up/down, the roots (x intercepts) and the y intercept.
b. Use all the information found in a. to graph the function.
Solution.
a. The function is continuous for all real numbers. Hence, there are no vertical asymptotes. We have
$\lim_{x \to \pm\infty} (x^4 - 2x^2) = \lim_{x \to \pm\infty} x^2(x^2 - 2) = \infty$. While there are no horizontal asymptotes, we know that
the function gets arbitrarily positive as x gets either sufficiently positive or sufficiently negative.
To determine where the function is increasing, we determine where the derivative equals zero:
\[ 0 = \frac{dy}{dx} = 4x^3 - 4x \]
\[ 0 = 4x(x^2 - 1) \]
Hence, the derivative vanishes at $x = 0, \pm 1$. Since $\frac{dy}{dx} = 24$ at $x = 2$, $\frac{dy}{dx} > 0$ on $(1, \infty)$. Since $\frac{dy}{dx} = -3/2$
at $x = 1/2$, $\frac{dy}{dx} < 0$ on $(0, 1)$. Since $\frac{dy}{dx} = 3/2$ at $x = -1/2$, $\frac{dy}{dx} > 0$ on $(-1, 0)$. Since $\frac{dy}{dx} = -24$ at
$x = -2$, $\frac{dy}{dx} < 0$ on $(-\infty, -1)$. Therefore, the function is increasing on the intervals (−1, 0) and (1, ∞).
The function is decreasing on the intervals (−∞, −1) and (0, 1). We will formalize this idea in the next
section.
To determine the intervals where the function is concave up and concave down, we determine where the second derivative equals
zero.
\[ 0 = \frac{d^2y}{dx^2} = 12x^2 - 4 \]
\[ x = \pm\frac{1}{\sqrt{3}} \]
Since $\frac{d^2y}{dx^2} = -4$ at $x = 0$, y is concave down on $(-1/\sqrt{3}, 1/\sqrt{3})$. Since $\frac{d^2y}{dx^2} = 8$ at $x = \pm 1$, y is concave
up on $(-\infty, -1/\sqrt{3})$ and $(1/\sqrt{3}, \infty)$.
The y intercept (when x = 0) is y = 0. The roots (when y = 0) are given by
\[ 0 = y = x^4 - 2x^2 \]
\[ 0 = x^2(x^2 - 2) \]
\[ x = 0, \pm\sqrt{2} \]
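The test-point reasoning used in part a. can be written as a small sign chart. The sketch below is only an illustration under our own naming: `sign_chart` is a hypothetical helper, and the test points it picks (midpoints between break points, plus one point beyond each end) are an assumption, not the particular points used above.

```python
import math

def dy(x):
    """First derivative of y = x^4 - 2x^2."""
    return 4 * x**3 - 4 * x

def d2y(x):
    """Second derivative of y = x^4 - 2x^2."""
    return 12 * x**2 - 4

def sign_chart(g, breakpoints, margin=2.0):
    """Sign (+1 or -1) of g at one test point in each interval cut out by the break points.

    Assumes g is continuous and vanishes only at the break points, so one
    test point per interval determines the sign on the whole interval.
    """
    pts = sorted(breakpoints)
    tests = [pts[0] - margin]
    tests += [(p + q) / 2 for p, q in zip(pts, pts[1:])]
    tests += [pts[-1] + margin]
    return [1 if g(t) > 0 else -1 for t in tests]

# Intervals: (-inf, -1), (-1, 0), (0, 1), (1, inf)
first_signs = sign_chart(dy, [-1, 0, 1])

# Intervals: (-inf, -1/sqrt(3)), (-1/sqrt(3), 1/sqrt(3)), (1/sqrt(3), inf)
r = 1 / math.sqrt(3)
second_signs = sign_chart(d2y, [-r, r])

print("signs of dy/dx:  ", first_signs)
print("signs of d2y/dx2:", second_signs)
```

A +1 in the first list marks an interval of increase, and a +1 in the second list marks an interval where the graph is concave up.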
b. To sketch the graph using the information from a., we can envision how the graph of the function changes
as you move from −∞ to ∞. Since $\lim_{x \to -\infty} y = \infty$, $\frac{dy}{dx} < 0$ on $(-\infty, -1)$, and $y = 0$ at $x = -\sqrt{2}$, the function
decreases from +∞, crosses the x axis at $x = -\sqrt{2}$, and continues to decrease to the value y = −1 at
x = −1. Since $\frac{dy}{dx} > 0$ on (−1, 0), the function increases to y = 0 at x = 0. Since $\frac{dy}{dx} < 0$ on (0, 1), the
function decreases to y = −1 at x = 1. Since $\frac{dy}{dx} > 0$ on (1, ∞), $y = 0$ at $x = \sqrt{2}$, and $\lim_{x \to \infty} y = \infty$,
the function increases, crosses the x axis again at $x = \sqrt{2}$, and approaches +∞ as x approaches +∞.
Moreover, the function changes concavity at the points $x = \pm 1/\sqrt{3}$. Hence, the graph looks something like:
Example 3. Doodling with derivatives and asymptotes
Consider the function $y = \frac{x^2 + 2}{x}$.
a. Find the asymptotes, the intervals where the function is increasing/decreasing, the intervals where the
function is concave up/down, and the x and y intercepts.
b. Use all the information found in a. to graph the function.
Solution.

a. The function $y = x + \frac{2}{x}$ has a vertical asymptote at x = 0. In fact, $\lim_{x \to 0^+} y = +\infty$ and $\lim_{x \to 0^-} y = -\infty$.
Since $\lim_{x \to \infty} y = \infty$ and $\lim_{x \to -\infty} y = -\infty$, y has no horizontal asymptotes.
To find the intervals of increase and decrease, we determine where the first derivative equals zero:
\[ 0 = \frac{dy}{dx} = 1 - \frac{2}{x^2} \]
\[ \frac{2}{x^2} = 1 \]
\[ 2 = x^2 \]
\[ x = \pm\sqrt{2} \]
Since $\frac{dy}{dx} = 1/2$ at $x = \pm 2$ and $\frac{dy}{dx} = -1$ at $x = \pm 1$, we find that y is increasing on the intervals $(-\infty, -\sqrt{2})$
and $(\sqrt{2}, \infty)$, and decreasing on the intervals $(-\sqrt{2}, 0)$ and $(0, \sqrt{2})$.
To determine concavity, we compute the second derivative $\frac{d^2y}{dx^2} = \frac{4}{x^3}$, which is positive when x > 0 and
negative when x < 0. Hence, y is concave up on (0, ∞) and y is concave down on (−∞, 0).
There is no y intercept as the function has a vertical asymptote at x = 0. The roots (x intercepts) must
satisfy
\[ 0 = y = x + \frac{2}{x} \]
\[ 0 = x^2 + 2 \]
for which there is no real-valued solution. Hence there are no roots.
b. To graph $y = x + \frac{2}{x}$, think about what happens as you move from −∞ to ∞. Since $\lim_{x \to -\infty} y = -\infty$
and $\frac{dy}{dx} > 0$ on $(-\infty, -\sqrt{2})$, we find the graph increases from −∞ to $y = -2\sqrt{2}$ at $x = -\sqrt{2}$. Since $\frac{dy}{dx} < 0$
on $(-\sqrt{2}, 0)$ and $\lim_{x \to 0^-} y = -\infty$, the graph decreases to −∞ as x approaches 0 from below. Since
$\lim_{x \to 0^+} y = \infty$ and $\frac{dy}{dx} < 0$ on $(0, \sqrt{2})$, the graph decreases from ∞ to $y = 2\sqrt{2}$ at $x = \sqrt{2}$. Since $\frac{dy}{dx} > 0$
on $(\sqrt{2}, \infty)$ and $\lim_{x \to \infty} y = +\infty$, the graph increases toward +∞ as x approaches +∞. Moreover, the
concavity only changes at x = 0. Finally, since $\lim_{x \to \pm\infty} \frac{2}{x} = 0$, it follows that $y = x + \frac{2}{x}$ behaves like
y = x for sufficiently positive or negative values of x. Using this information, we obtain a sketch that
looks something like:
Sometimes just using limits and first derivatives is enough to get a good sense of the graph.
Example 4. Tylenol in the bloodstream
As a project for a mathematical biology class, three William and Mary students developed a model of how
acetaminophen diffuses from the stomach and intestines to the bloodstream after taking a dosage of 1000 mg.

Using FDA data, they found that
\[ C(t) = 28.6\,(e^{-0.3t} - e^{-t}) \text{ micrograms/ml} \]
where t is hours after taking the dosage. Use information about asymptotes, and first derivatives to sketch this
function. Discuss the meaning of the graph.
Solution. Since C(t) is continuous everywhere, there are no vertical asymptotes. Since $e^{-0.3t}$ and $e^{-t}$ approach
zero as t gets large, $\lim_{t \to +\infty} C(t) = 0$. Therefore there is a horizontal asymptote at C = 0. In the other direction, noting
that $e^{-0.3t} - e^{-t} = e^{-0.3t}(1 - e^{-0.7t})$, and that $e^{-0.3t} \to \infty$ while $1 - e^{-0.7t} \to -\infty$ as $t \to -\infty$, it follows that $\lim_{t \to -\infty} C(t) = -\infty$.
Taking the first derivative yields
\[ C'(t) = 28.6\,(e^{-t} - 0.3e^{-0.3t}) \]
We have $C'(t) = 0$ if and only if
\[ e^{-t} = 0.3e^{-0.3t} \]
\[ e^{-0.7t} = 0.3 \]
\[ -0.7t = \ln 0.3 \]
\[ t = \frac{\ln 0.3}{-0.7} \approx 1.72 \text{ hours} \]
Since C ′ (0) ≈ 20, we have C ′ (t) > 0 on (−∞, 1.72). Since C ′ (t) < 0 for t very large, we have C ′ (t) < 0 on (1.72, ∞).
Hence, as t goes from −∞ to 0, the function increases up from −∞ and passes through 0 at t = 0. C(t) increases
from t = 0 to t ≈ 1.72 at which point it takes on the value of approximately 12 micrograms/ml. For t greater than
1.72, C(t) decreases toward zero as t approaches +∞. Therefore, we can graph the function as follows:
This graph is only meaningful for t ≥ 0. It shows that initially there is no drug in the bloodstream and then the
concentration of drug increases to a maximum concentration of approximately 12 micrograms per ml after approximately 1.7 hours. Hence, the maximum
effect of Tylenol is only felt after approximately 2 hours. After reaching the maximum value, the concentration
decays to zero.
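The time and height of the peak can be reproduced with a few lines of code. This is a minimal sketch using only the formula for C(t) given in the example; the variable names `t_peak` and `c_peak` are ours.

```python
import math

def C(t):
    """Acetaminophen concentration (micrograms/ml) t hours after a 1000 mg dose."""
    return 28.6 * (math.exp(-0.3 * t) - math.exp(-t))

# Critical point from solving C'(t) = 0, i.e. e^{-0.7 t} = 0.3.
t_peak = math.log(0.3) / -0.7
c_peak = C(t_peak)

print(f"peak time          = {t_peak:.2f} hours")
print(f"peak concentration = {c_peak:.2f} micrograms/ml")

# Compare with nearby times to confirm this is a hilltop, not a valley.
for t in (t_peak - 0.5, t_peak, t_peak + 0.5):
    print(f"C({t:.2f}) = {C(t):.3f}")
```

Both neighboring values are smaller than the value at the critical point, consistent with the sign analysis of C′(t) above.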
Graphing Families of Functions

The shape of some functions, such as f (x) = 3x + a, does not depend in any critical sense on the value of the
parameter a. All a does is move the line of slope 3 up and down the x-y plane. As we will see in this subsection,
however, in more complicated functions the value of a parameter can have a surprising effect, and we can use calculus
to discover such effects.
Example 5. To infinity and back
Consider the function $f(x) = \frac{1}{x^2 - a}$ with the parameter a. Using first derivatives and asymptotes, determine how
the shape of this function depends on the parameter a.
Solution. Since $\lim_{x \to \pm\infty} \frac{1}{x^2 - a} = 0$, f (x) has a horizontal asymptote at 0 as x → ±∞. If a < 0, then $x^2 - a$ is
positive for all x and there are no vertical asymptotes. If a = 0, then there is a vertical asymptote at x = 0. If a > 0,
then there are vertical asymptotes at $x = \pm\sqrt{a}$. Computing the first derivative yields
\[ f'(x) = -\frac{2x}{(x^2 - a)^2} \]
Since the denominator of this expression is positive whenever $x^2 \neq a$, $f'(x) < 0$ for all positive x with $x^2 \neq a$ and
$f'(x) > 0$ for all negative x with $x^2 \neq a$.
These computations suggest there are qualitatively three distinctive graphs. First, consider the case where a < 0.
In this case, there are no vertical asymptotes. There are horizontal asymptotes of 0 as x → ±∞. Moreover, f (x) is
increasing for negative x and f (x) is decreasing for positive x. Hence, the graph looks something like:
Next, consider the case where a = 0. In this case, $f(x) = \frac{1}{x^2}$ and there is a vertical asymptote at x = 0. In fact,
$\lim_{x \to 0} \frac{1}{x^2} = \infty$. There are horizontal asymptotes of 0 as x → ±∞. Moreover, f (x) is increasing for negative x and
f (x) is decreasing for positive x. Hence, the graph looks something like (as you well know!)
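Finally, consider the case where a > 0. In this case, there are vertical asymptotes at $x = \pm\sqrt{a}$. In fact, evaluating
all the limits as x approaches $\pm\sqrt{a}$ yields
\[ \lim_{x \to \sqrt{a}^+} \frac{1}{x^2 - a} = \infty \]
\[ \lim_{x \to \sqrt{a}^-} \frac{1}{x^2 - a} = -\infty \]
\[ \lim_{x \to -\sqrt{a}^+} \frac{1}{x^2 - a} = -\infty \]
\[ \lim_{x \to -\sqrt{a}^-} \frac{1}{x^2 - a} = \infty \]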

There are horizontal asymptotes of 0 as x → ±∞. Moreover, f (x) is increasing for negative x and f (x) is decreasing
for positive x. Hence, if we walk along the graph of f from −∞ to +∞, then we initially ascend from 0. The ascent
gets exceptionally steep as x approaches $-\sqrt{a}$. As we cross $x = -\sqrt{a}$, we suddenly fall down to y = −∞. After
crossing both infinities, we continue to ascend until reaching a local maximum of $y = -\frac{1}{a}$ at x = 0. From there, we descend
through −∞ and skyrocket past +∞ at $x = \sqrt{a}$. After this harrowing jump through infinities, we continue with a
descent to zero. In other words, our graph looks something like:
Example 6. Dose response curves
Dose response curves can be used to plot the response of an individual to a dosage of a drug or hormone. This
response can be almost anything. For instance, the response may be heart rate, dilation of an artery, membrane
potential, enzyme activity or the secretion of a hormone. We have previously encountered dose response curves in
Example 2 from Section 2.4 and Problem 29 from Problem Set 3.2. A general form of a dose response curve is
\[ y = a + \frac{b - a}{1 + e^{x - c}}, \]
where y is the response of the individual and x is the concentration of the dosage of drug or hormone. The parameters
a > 0, b > 0, and c > 0 affect the shape of the dose response curve and often can be used to fit the function to
particular data sets. Assuming c = 0, use limits and first derivatives to determine how the shape of this curve
depends on the parameters a and b.
Solution. Let $f(x) = a + \frac{b - a}{1 + e^x}$. Since f (x) is continuous for all reals, there are no vertical asymptotes. Since
$\lim_{x \to \infty} \left(a + \frac{b - a}{1 + e^x}\right) = a$ and $\lim_{x \to -\infty} \left(a + \frac{b - a}{1 + e^x}\right) = a + b - a = b$, there is a horizontal asymptote of a as x approaches
+∞ and a horizontal asymptote of b as x approaches −∞.

Taking the first derivative of f , we obtain
\[ f'(x) = -\frac{(b - a)\,e^x}{(1 + e^x)^2} \]
This derivative is negative for all x if b > a, positive for all x if b < a, and zero for all x if b = a.
Hence, the graph of f (x) comes in three flavors. If b > a, then the function decreases from an asymptote of b to
an asymptote of a. If b < a, then the function increases from an asymptote of b to an asymptote of a. Finally, if
b = a, then the function is the constant function y = a. These three different graphs are sketched below:

Example 7. Stock recruitment curves
In conservation biology and fisheries management, stock-recruitment curves are used to describe the relationship
between the current abundance of a population (i.e. the stock) and the number of juveniles entering the system in
the next year (i.e. the recruits). A general class of stock-recruitment curves is given by the functions
\[ F(N) = \frac{aN^b}{1 + N^b} \]
where N is the current population size, F (N ) is the number of recruits in the next generation, and a and b are
positive parameters. A useful way to distinguish these functions is to consider the relationship between the current
population abundance and the average number of recruits per individual, i.e.
\[ f(N) = \frac{F(N)}{N} = \frac{aN^{b-1}}{1 + N^b} \]
Use limits and first derivatives to determine how the parameter b influences the shape of f (N ). Discuss the possible
meaning.
Solution. Notice that if 0 < b < 1, then there is a vertical asymptote at N = 0 and $\lim_{N \to 0^+} f(N) = \infty$. If b ≥ 1,
there is no vertical asymptote. To determine the horizontal asymptote as N → ∞, it suffices to notice that the power
of the numerator is less than the power of the higher order term in the denominator. Hence, $\lim_{N \to \infty} f(N) = 0$.
The first derivative of f (N ) is given by
\[ f'(N) = a\,\frac{(b - 1)N^{b-2}(1 + N^b) - bN^{b-1}N^{b-1}}{(1 + N^b)^2} \]
\[ = a\,\frac{N^{b-2}(b - 1) - N^{2b-2}}{(1 + N^b)^2} \]
\[ = a\,\frac{N^{b-2}(b - 1 - N^b)}{(1 + N^b)^2} \]
Hence if b ≤ 1, $f'(N) < 0$ for all N > 0. However if b > 1, then $f'(N) > 0$ for $0 < N < (b - 1)^{1/b}$ and $f'(N) < 0$
for $N > (b - 1)^{1/b}$. Therefore, we get three types of graphs depending on whether b < 1, b = 1 or b > 1:
For b ≤ 1, the number of recruits per individual decreases steadily with stock levels. One interpretation of this fact is
that, when b ≤ 1, higher population densities leave fewer resources
per individual and, consequently, fewer recruits are produced per individual. For b > 1, the number of recruits
per individual initially increases and then decreases. One possible explanation is that, when b > 1,
individuals have difficulty finding mates at low population densities. Therefore, as densities
increase the chance of finding mates increases and the number of recruits produced per individual increases. However,
as the population density increases too much (i.e. beyond $(b - 1)^{1/b}$), the advantage of finding mates is outweighed
by the limited resources available per individual. Consequently, at higher densities, the number of recruits per
individual decreases. When b > 1, the population exhibits what ecologists call depensation or a strong Allee effect.
Using data on 128 species, Ran Myers and colleagues∗ used F (N ) to evaluate to what extent fish populations exhibit
depensation and discussed the implications for populations to recover from environmental disturbances.
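The dependence of the shape of f (N ) on b can be explored numerically. The sketch below is illustrative only: the parameter value a = 1, the particular values of b, and the helper name `peak_density` are assumptions made for the example, not values taken from the data.

```python
def f(N, a, b):
    """Recruits per individual, f(N) = a N^(b-1) / (1 + N^b)."""
    return a * N ** (b - 1) / (1 + N ** b)

def peak_density(b):
    """Stock level (b - 1)^(1/b) at which f(N) peaks; only meaningful for b > 1."""
    if b <= 1:
        raise ValueError("f(N) is decreasing for all N > 0 when b <= 1")
    return (b - 1) ** (1 / b)

a = 1.0  # hypothetical value; a only rescales the curve vertically

# b <= 1: recruits per individual fall as the stock grows.
for b in (0.5, 1.0):
    values = [f(N, a, b) for N in (0.5, 1.0, 2.0)]
    print(f"b = {b}: f(0.5), f(1), f(2) =", [round(v, 3) for v in values])

# b > 1: recruits per individual rise, peak at (b - 1)^(1/b), then fall.
for b in (2.0, 3.0):
    N_star = peak_density(b)
    print(f"b = {b}: peak at N = {N_star:.3f}, f = {f(N_star, a, b):.3f}")
```

For b = 2 the peak sits at N = 1, where f (N ) = a/2; nearby stock levels on either side give smaller values.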
Problem Set 4.1

In problems 1 to 14, graph the following functions by finding asymptotes and using first and second order derivatives.
Compare your graphs to what you get with technology.
1. $y = x^2 - x$
2. $y = x^2 + 5x - 3$
3. $y = \frac{1}{1 + x^2}$
4. $y = \frac{x}{1 + x^2}$
5. $y = x + \frac{1}{2 + x}$
6. $y = \frac{1}{x - 1} + x$
7. $y = -12x - \frac{9x^2}{2} + x^3$
8. $y = \frac{1}{3}x^3 - 9x + 2$
9. $y = e^x + 2e^{-x}$
10. $y = 2e^x + e^{-x}$
11. $y = x - x^3$
12. $y = \frac{2 + x}{1 + x}$
13. $y = \frac{x - 3}{x + 1}$
14. $y = \frac{x^2}{1 + x^4}$
In problems 15 to 20, graph the following families of functions by finding asymptotes and using first and second order
derivatives. In particular, determine how the graph of the functions depends on the parameter a > 0.
15. $y = x^4 - ax^2$
16. $y = \frac{ax}{x^2 + 1}$
17. $y = ae^x + e^{-x}$
18. $y = e^x + ae^{-x}$
∗ Myers, R. A.; Barrowman, N. J.; Hutchings, J. A.; Rosenberg, A. A. J. 1995. Population Dynamics of Exploited Fish Stocks at Low
Population Levels. Science, Volume 269, Issue 5227, pp. 1106-1108

19. $y = \frac{a + x}{1 + x}$
20. $y = ax + \frac{1}{x}$
In problems 21 to 22, sketch the graph of a function with the given properties.
21. y = 2, y = −2 are asymptotes
f is increasing for 0 < x < 2 and x > 2
f is decreasing for x < −2 and −2 < x < 0
The graph is concave down on (−∞, −2) and (2, ∞)
The intercepts are (−1, 0), (0, −4) and (1, 0).
22. y = 1, y = −1 are asymptotes
f is increasing for $x < -\frac{3}{2}$ and for $x > \frac{3}{2}$
f is decreasing for −1 < x < 1
The graph is concave down for x < −1 and for 0 < x < 1.
The graph is concave up for x > 1 and for −1 < x < 0.
23. Consider the graph of y = ax2 + bx + c for constants a, b, and c. Use second order derivatives to determine
what happens to the graph as a changes.
24. Consider the graph of $y = \frac{e^{ax}}{1 + e^{ax}}$. Use limits and first derivatives to determine how the shape of this curve
depends on the parameter a.

25. In Example 5 from Section 2.7, we considered patterns of local ant species richness along an elevational gradient
in the Spring Mountains in Nevada.∗ A parabola which best fits this data is
\[ S = -10.3 + 24.9x - 7.7x^2, \]
where x is elevation measured in kilometers and S is the number of species. Plot this function using information
about first derivatives.
26. In Example 6 in Section 1.6, we developed the Michaelis-Menten model for the rate at which an organism
consumes its resource. For bacterial populations in the ocean, this model was given by
\[ f(x) = \frac{1.2078x}{1 + 0.0506x} \text{ micrograms of glucose per hour} \]
where x is the concentration of glucose (micrograms per liter) in the environment. Use asymptotes and first
derivatives to sketch this function by hand.
27. In Example 4 in Section 2.4, we found that the rate at which wolves kill moose can be modeled by
\[ \frac{3.36x}{0.42 + x} \]
where x is measured in number of moose per km$^2$. Use asymptotes and first derivatives to sketch this function.
28. In Problem 39 in Section 2.4, we examined how wolf densities in North America depend on moose densities.
We found that the following function provides a good fit to the data:
\[ f(x) = \frac{58.7(x - 0.03)}{0.76 + x} \]
where x is number of moose per km$^2$.
∗ N. Sanders, J. Moss, and D. Wagner, “Patterns of ant species richness along elevational gradients in an arid ecosystem,” Global
Ecology and Biogeography, 2003, 12:93–102

a. Find the horizontal and vertical asymptotes.

b. Determine on which intervals f is increasing and decreasing.
c. Determine on which intervals f is concave up and concave down.
d. Use the information from (a)–(c) to sketch the graph of f (x).
\[ f(t) = 890\,\mathrm{sech}^2(0.2t - 3.4) \text{ deaths/week} \]
where t is measured in weeks. Sketch this function using information about asymptotes and first derivatives.
Recall that
\[ \mathrm{sech}\,x = \frac{2}{e^x + e^{-x}} \]
30. Let f be a function that represents the weight of a fish at age t. Write a function that satisfies the following
properties.
i. The weight of the fish at birth must be positive.
ii. As the fish ages, the weight increases at a decreasing rate.
iii. No fish can grow bigger than 2 kg.
31. As a project for their mathematical biology class, three William and Mary students developed a model of how
acetaminophen levels varied in the blood stream for a child after taking a dosage of 325mg. Using FDA data,
they found that
\[ C(t) = 23.725\,(-e^{-0.7t} + e^{-0.5t}) \text{ micrograms/ml} \]
where t is hours after taking the dosage.
a. Use information about asymptotes, and first derivatives to sketch this function.
b. Discuss the meaning of your graph. In particular, address when the maximum concentration is
achieved and what the maximum concentration is.
32. In an experiment, a microbiologist introduces a toxin into a bacterial colony growing in an agar dish. The data
on the area of the dish covered by living colony members at time t minutes after the introduction of the toxin
is given by the equation
\[ A(t) = 5 + e^{-0.04t + 1} \]
Sketch the graph of A(t) showing its salient features.
33. The aerobic rate is the rate of a person’s oxygen consumption and is sometimes modeled by the function A
defined by
\[ A(x) = 110\,\frac{\ln x - 2}{x} \]
for x ≥ 10. Graph this function.
34. A naturalist at an animal sanctuary has determined that the function
\[ f(x) = \frac{4e^{-(\ln x)^2}}{\sqrt{\pi}\,x} \]
provides a good measure of the number of animals in the sanctuary that are x years old. Sketch the graph of
f for x > 0.

4.2 Getting Extreme

When viewing the graph of a function as a landscape, hilltops and valley bottoms correspond to places where a
function has an extremum. Methods to identify these extrema play an important role in applications. For instance,
if the function of interest represents how profits due to harvesting a crop depend on the amount of seeds planted,
then the farmer would like to know how many seeds per acre yield the greatest profits. In other words, he would
like to identify the largest hilltop of the function. Alternatively, if a northwestern crow minimizes the amount of
energy required to break whelk shells, then the crow’s behavior corresponds to the deepest valley of a function. In
this section, we develop methods to find these hilltops and valleys.
Local extrema
Local maxima and minima
Let f be a function. We say that f has a local maximum at x = a if
f (a) ≥ f (x)
for all x near a. We say that f has a local minimum at x = a if
f (a) ≤ f (x)
for all x near a. We say f has a local extremum at x = a if there is a local
maximum or local minimum at x = a.
Example 1. Finding Extrema
Estimate the x values at which the function y = f (x) graphed below has local maxima and local minima.
[Figure: graph of y = f (x) for −3 ≤ x ≤ 3]
Solution. There are local minima at x ≈ −1.75, x = 0, and x = 3. There are local maxima at x = −3, x ≈ −1, and
x = 2. The point x = 1 does not correspond to an extremum as there are nearby values of x for which f (x) > f (1)
and for which f (x) < f (1).
The previous example suggests that extrema occur at end points of the domain, at points where f is not
differentiable, or at points where the derivative of f equals zero. The following theorem verifies these observations.
Theorem 4.1. Fermat’s Theorem
If f is defined on (a, b) and has a local extremum at c ∈ (a, b), then either f ′ (c) = 0 or f ′ (c) is not defined.

Proof. Suppose that f is defined on (a, b) and has a local extremum at c ∈ (a, b). This extremum is either a local
maximum or a local minimum. Suppose that this extremum is a local minimum. Then we have f (c) ≤ f (x) for all
x near c. Equivalently, f (c) ≤ f (c + h) for all h sufficiently small. Taking a difference quotient yields that
\[ \frac{f(c + h) - f(c)}{h} \geq 0 \]
for all sufficiently small positive h and
\[ \frac{f(c + h) - f(c)}{h} \leq 0 \]
for all sufficiently small negative h. Assume f ′ (c) exists. Then taking one-sided limits yields
\[ f'(c) = \lim_{h \to 0^+} \frac{f(c + h) - f(c)}{h} \geq 0 \]
and
\[ f'(c) = \lim_{h \to 0^-} \frac{f(c + h) - f(c)}{h} \leq 0 \]
Therefore f ′ (c) = 0. The case of a local maximum is proved similarly and left as a problem in the problem set.
Fermat’s Theorem tells us that we can find possible local maxima and local minima by finding points where
f ′ (x) = 0 or f ′ is not defined. Such points have a special name.
Critical points and values
If f ′ (c) = 0 or f ′ (c) is not defined, then c is a critical point for f . The value of f
at a critical point is called a critical value.
While all local extrema are critical values, not all critical values are local extrema. Consider, for example, $y = x^3$.
The derivative is $\frac{dy}{dx} = 3x^2$. Hence x = 0 is the only critical point. However, as $y = x^3$ increases over all the reals,
x = 0 is neither a local maximum nor a local minimum.
Example 2. Finding and Classifying Critical Points
Find the critical points of $y = x^3 - 3x^2 - 4$ and determine whether these critical points are local maxima or local
minima or neither.
Solution. We have $\frac{dy}{dx} = 3x^2 - 6x = 3x(x - 2)$. Hence $\frac{dy}{dx} = 0$ at x = 0 or x = 2. These are the critical points of
y. To determine whether these critical points correspond to local maxima, local minima, or neither, we can consider
how the sign of the derivative varies over the real line. Since $\frac{dy}{dx} < 0$ for 0 < x < 2 and $\frac{dy}{dx} > 0$ for x > 2,
the function decreases over the interval (0, 2) and increases on (2, ∞). Hence, there is a local minimum of y = −8
at x = 2. Similarly, since $\frac{dy}{dx} > 0$ for x < 0 and $\frac{dy}{dx} < 0$ for 0 < x < 2, we find that the function increases until
x = 0 and then decreases. Hence, there is a local maximum of y = −4 at x = 0. Graphing $y = x^3 - 3x^2 - 4$ with
technology corroborates these statements.

Example 2 illustrates one method for identifying local maxima and local minima.
First derivative test
Assume f ′ (c) = 0 and f is differentiable near x = c.
• If the sign of f ′ changes from positive to negative at x = c (i.e. f changes from increasing to decreasing), then f has a local maximum at x = c.
• If the sign of f ′ changes from negative to positive at x = c (i.e. f changes from decreasing to increasing), then f has a local minimum at x = c.
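The first derivative test translates directly into a few lines of code. The following is a rough sketch, not a robust implementation: the function name `classify_by_first_derivative` and the step size `eps` are our own assumptions, and checking the sign at only one point on each side presumes that f ′ does not change sign again within `eps` of c.

```python
def classify_by_first_derivative(df, c, eps=1e-3):
    """Classify a critical point c of f from the sign of f' just left and right of c."""
    left, right = df(c - eps), df(c + eps)
    if left > 0 and right < 0:
        return "local maximum"   # f changes from increasing to decreasing
    if left < 0 and right > 0:
        return "local minimum"   # f changes from decreasing to increasing
    return "neither"             # no sign change detected

# Derivative of y = x^3 - 3x^2 - 4 from Example 2.
def df_cubic(x):
    return 3 * x**2 - 6 * x

# Derivative of y = x^3, whose critical point at 0 is not an extremum.
def df_cube(x):
    return 3 * x**2

print(classify_by_first_derivative(df_cubic, 0))
print(classify_by_first_derivative(df_cubic, 2))
print(classify_by_first_derivative(df_cube, 0))
```

The three calls reproduce the classifications discussed above: a local maximum at x = 0 and a local minimum at x = 2 for the cubic of Example 2, and no extremum at x = 0 for y = x³.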
Figure 4.2: A diagram of the heart. Thermodilution involves injecting a cold dextrose solution into the vena cava and
measuring the temperature in the aorta or an artery.
Example 3. Thermodilution
Cardiac output can be determined by thermodilution. The doctors inject 10 milliliters of a cold dextrose solution
into a vein entering the heart. As the cold solution mixes with the blood in the heart, the temperature variations in the
blood leaving the heart are measured. A typical temperature variation curve (i.e. degrees below normal temperature)
plotted below may be described by the function $T(t) = 0.2t^2e^{-t}$ degrees Celsius where t is measured in seconds. Find
the critical points and classify them. Discuss the meaning of your results.
Solution. Taking the derivative yields
\[ T'(t) = 0.2(2t)e^{-t} - 0.2t^2e^{-t} \]
\[ = 0.2te^{-t}(2 - t) \]
We have T ′ (t) = 0 at t = 0 and t = 2. Hence, t = 0, 2 are the critical points of T . To apply the first derivative test,
we need to determine the sign of T ′ on the intervals (−∞, 0), (0, 2) and (2, ∞). Since T ′ is continuous everywhere,
it can only have sign changes at the points t = 0, 2. Therefore, it suffices to check the sign of T ′ at one point in each
of the intervals. Since $T'(-1) = -0.6e^{1} < 0$, T ′ is negative on (−∞, 0). Since $T'(1) = 0.2e^{-1} > 0$, T ′ is positive on
(0, 2). Since $T'(3) = -0.6e^{-3} < 0$, T ′ is negative on (2, ∞). Since at t = 0 the sign of T ′ changes from negative to
positive, we have a local minimum at t = 0. Since at t = 2 the sign of T ′ changes from positive to negative, at t = 2
we have a local maximum. Hence, the temperature of blood leaving the heart drops T (2) ≈ 0.11 degrees Celsius
after two seconds before returning to its normal temperature.
Another possibility for identifying local maxima and local minima is using the second derivative. Suppose f has
a critical point at x = a and has second order derivatives at x = a. From Section 3.6, we have seen that a second
order approximation of f (x) is given by:
\[ f(x) \approx f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x - a)^2 \]
Since a is a critical point, f ′ (a) = 0 and the second order approximation reduces to
\[ f(x) \approx f(a) + \frac{f''(a)}{2}(x - a)^2 \]
Provided that $f''(a) \neq 0$, the graph of this equation is given by a parabola whose vertex is at x = a. Furthermore,
if f ′′ (a) > 0, then this parabola is facing up and there is a local minimum at x = a as shown in Figure 4.3a.
Alternatively, if f ′′ (a) < 0, then this parabola is facing down and there is a local maximum at x = a as shown in
Figure 4.3b.
a. Local minimum b. Local maximum
Figure 4.3: Second-derivative test
Second Derivative Test
Let f have first and second derivatives at x = a. Assume that f ′ (a) = 0.
Local maximum: If f ′′ (a) < 0, then there is a local maximum at x = a.
Local minimum: If f ′′ (a) > 0, then there is a local minimum at x = a.
Inconclusive: If f ′′ (a) = 0, then we can draw no conclusions from the second derivative.
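The three cases of the second derivative test can be sketched as a small function. This is an illustration under our own naming (`second_derivative_test` is hypothetical), applied to the cubic from Example 2 and to y = x⁴, a function we chose to show the inconclusive case.

```python
def second_derivative_test(d2f, a):
    """Apply the second derivative test at a critical point a (assumes f'(a) = 0)."""
    value = d2f(a)
    if value < 0:
        return "local maximum"
    if value > 0:
        return "local minimum"
    return "inconclusive"

# y = x^3 - 3x^2 - 4 has critical points x = 0 and x = 2; y'' = 6x - 6.
def d2_cubic(x):
    return 6 * x - 6

# y = x^4 has a critical point at x = 0; y'' = 12x^2 vanishes there.
def d2_quartic(x):
    return 12 * x**2

print(second_derivative_test(d2_cubic, 0))
print(second_derivative_test(d2_cubic, 2))
print(second_derivative_test(d2_quartic, 0))
```

The first two calls agree with the first derivative analysis of Example 2. For y = x⁴ the test returns "inconclusive" even though x = 0 is in fact a local minimum, which shows why the third case makes no claim either way.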
Example 4. Using the second derivative test
Find and classify the critical points of $y = -x^3 + 6x^2 + 2$ using the second derivative test.
Solution. Computing the first and second order derivatives of $y = -x^3 + 6x^2 + 2$ yields
\[ \frac{dy}{dx} = -3x^2 + 12x = -3x(x - 4) \]
\[ \frac{d^2y}{dx^2} = -6x + 12 \]
The first derivative always exists, so the critical points correspond to the solutions of $-3x(x - 4) = 0$. Hence, they are
given by x = 0 and x = 4. Evaluating the second derivative at x = 0 and x = 4 yields
\[ \left.\frac{d^2y}{dx^2}\right|_{x=0} = 12 \]
\[ \left.\frac{d^2y}{dx^2}\right|_{x=4} = -12 \]
Hence, there is a local minimum at x = 0 and a local maximum at x = 4. Graphing the function $y = -x^3 + 6x^2 + 2$
demonstrates these conclusions:
[Figure: graph of $y = -x^3 + 6x^2 + 2$ for −1 ≤ x ≤ 5, with x on the horizontal axis and y on the vertical axis]
Global extrema
Let f be a function with domain A. f has a global minimum at x = a if
f (a) ≤ f (x) for all x in A.
f has a global maximum at x = a if
f (a) ≥ f (x) for all x in A.
A global maximum or a global minimum is called a global extremum.
Example 5. Finding global extrema
Consider the function whose graph is given by
[Figure: graph of the function for −2 ≤ x ≤ 2]
Find the global extrema of this function.

Solution. The global maximum of y = 9 occurs at x = −2 and the global minimum of approximately 0.1 occurs
approximately at x = −0.6.
Example 5 illustrates that global extrema may occur at critical points or endpoints for a continuous function on
a closed interval. Thus, we have the following procedure for finding global extrema.
The Closed Interval Method
Let f be a continuous function defined on the closed interval [a, b]. To find the
global extrema of f , do the following:
Find critical points: Find all the critical points on the interval (a, b).
Evaluate f at the critical points and the end points: Evaluate f at all critical points and at end points a and b.
Identify the extrema: The largest value of f at a critical point or end point is
the global maximum of f . The smallest value of f at a critical point or end
point is the global minimum of f .
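The three steps of the closed interval method can be sketched in code. In the sketch below, the function name `closed_interval_extrema` is our own, the critical points are supplied by the user rather than computed, and the interval [0, 5] for the thermodilution curve of Example 3 is an assumption made purely for illustration.

```python
import math

def closed_interval_extrema(f, a, b, critical_points):
    """Global extrema of a continuous f on [a, b], given its critical points.

    Returns ((x_min, f(x_min)), (x_max, f(x_max))).
    """
    # Steps 1 and 2: collect the end points and the critical points inside (a, b).
    candidates = [a, b] + [c for c in critical_points if a < c < b]
    # Step 3: compare the values of f at the candidates.
    x_min = min(candidates, key=f)
    x_max = max(candidates, key=f)
    return (x_min, f(x_min)), (x_max, f(x_max))

def T(t):
    """Thermodilution curve from Example 3; its critical points are t = 0 and t = 2."""
    return 0.2 * t**2 * math.exp(-t)

(t_min, T_min), (t_max, T_max) = closed_interval_extrema(T, 0, 5, [0, 2])
print(f"global minimum {T_min:.3f} at t = {t_min}")
print(f"global maximum {T_max:.3f} at t = {t_max}")
```

On this interval the global minimum occurs at the end point t = 0 and the global maximum at the interior critical point t = 2, so both kinds of candidate points matter.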
Example 6. Using the closed interval method
Find the global extrema of $f(x) = \frac{1}{3}x^3 - \frac{1}{2}x^2 - 6x + 4$ on the interval [−3, 6].
Solution. Taking the derivative of f yields

f ′ (x) = x2 − x − 6 = (x − 3)(x + 2)
The critical points are x = 3 and x = −2. Evaluating f at the critical points and end points yields f (−3) = 8.5,
$f(-2) = 11\tfrac{1}{3}$, f (3) = −9.5, and f (6) = 22. Therefore, the global maximum of 22 occurs at the end point x = 6. The
global minimum of −9.5 occurs at the critical point x = 3. Plotting this function demonstrates our findings:
Example 7. Getting extreme with CO2
Example 5 of Section 1.3 examined how CO2 concentrations in parts per million (ppm) have varied from 1974 to
1985. Using linear and periodic functions, we found that the following function gives an excellent fit to the data:
\[ f(x) = 0.1225x + 329.3 + 3\cos\left(\frac{\pi}{6}x\right) \text{ ppm} \]
where x is months after April 1974. Using the closed interval method, find the global maximum and minimum CO2
levels in the one year interval [0, 12].

Solution. To find the critical points of f (x), we differentiate:
\[ f'(x) = 0.1225 - \frac{\pi}{2}\sin\left(\frac{\pi}{6}x\right) \]
While you can solve for the critical points by hand (see the problem set!) by recalling properties of inverse sine, we
circumvent this analysis by using a root finder on a graphing calculator. Finding all the roots of f ′ (x) on the interval
[0, 12] yields x ≈ 0.149 and x ≈ 5.851. Evaluating f at these critical points and the endpoints yields:
f (0) = 332.3
f (0.149) ≈ 332.3
f (5.851) ≈ 327.0
f (12) ≈ 333.8
Hence, the global minimum C02 level occurs at x = 5.851 (sometime in late October). The global maximum occurred
at x = 12 (in April 1975). Plotting the function over the interval [0, 12] demonstrates these extremes:
[Figure: f(x) in ppm (vertical axis, 326 to 334) plotted against x (horizontal axis, 0 to 12).]
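For readers without a graphing calculator, the critical points in Example 7 can be reproduced with the inverse sine. This is a sketch under the assumption that setting f′(x) = 0 gives sin(πx/6) = 2(0.1225)/π, which has two solutions for x in [0, 12]; the variable names are illustrative only.

```python
import math


def f(x):
    # Fitted CO2 concentration (ppm), x in months after April 1974.
    return 0.1225 * x + 329.3 + 3 * math.cos(math.pi * x / 6)


def f_prime(x):
    return 0.1225 - (math.pi / 2) * math.sin(math.pi * x / 6)


# f'(x) = 0 is equivalent to sin(pi x / 6) = 2 * 0.1225 / pi.
# On [0, 12] the angle pi x / 6 runs over [0, 2 pi], giving two solutions.
theta = math.asin(2 * 0.1225 / math.pi)
critical_points = [6 * theta / math.pi, 6 * (math.pi - theta) / math.pi]

candidates = [0.0] + critical_points + [12.0]
for x in candidates:
    print(f"x = {x:6.3f}   f(x) = {f(x):7.3f}")

x_min = min(candidates, key=f)
x_max = max(candidates, key=f)
print("minimum at x =", round(x_min, 3), " maximum at x =", round(x_max, 3))
```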
Example 8. Search period of the codling moth
After a codling moth (Cydia pomonella) larva hatches from its egg case, it goes looking for an apple in which
to burrow. The period between hatching and finding the apple is called the search period. Obviously for individual
larvae this period will vary, but the average time s that it takes is known to be a function of temperature T in
Celsius. A good fit to the available data∗ illustrated in Fig. 4.4 is provided by the equation
\[ s(T) = \frac{1}{-0.03T^2 + 1.67T - 13.65}, \quad \text{for } 20 \le T \le 30. \]
Use the tools of calculus to find the largest and smallest values of s(T) over the range 20 ≤ T ≤ 30.
Solution. The polynomial \(p(T) = -0.03T^2 + 1.67T - 13.65\) has roots at T ≈ 9.95 and T ≈ 45.7.
Hence p(T) ≠ 0 for 20 ≤ T ≤ 30, so that s(T) = 1/p(T) is defined and continuous on this interval. Using the quotient
rule, its derivative is
\[ s'(T) = \frac{2(0.03T) - 1.67}{(-0.03T^2 + 1.67T - 13.65)^2} \]
which satisfies
\[ s'(T) = 0 \;\Rightarrow\; 0.06T = 1.67 \;\Rightarrow\; T \approx 27.83\,^{\circ}\mathrm{C}. \]
∗ P.L. Shaffer and H.J. Gold, 1985. “A simulation model of population dynamics of the codling moth, Cydia pomonella” Ecological
Modeling 30:247-274.

Figure 4.4: Codling Moth Adults
Figure 4.5: Codling Moth Search Period (panel title: Codling moth larvae searching period)
Note that s′(T) is defined for all T in the interval [20, 30]. Evaluating s(T) at this critical point and at the endpoints, we obtain
s(20) ≈ 0.129, s(27.83) ≈ 0.104, s(30) ≈ 0.106.
Hence on the interval 20 ≤ T ≤ 30, s(T) has a minimum at the interior point T ≈ 27.83 and a maximum at the
boundary point T = 20. □
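The three values compared in Example 8 can be recomputed with a few lines of code. This sketch assumes only the fitted formula for s(T) given in the example; the names `p`, `s`, and `T_crit` are illustrative.

```python
def p(T):
    # Denominator of the search-period model.
    return -0.03 * T**2 + 1.67 * T - 13.65


def s(T):
    # Average search period as a function of temperature T (Celsius).
    return 1.0 / p(T)


# s'(T) = 0 exactly when p'(T) = 0, i.e. 0.06 T = 1.67.
T_crit = 1.67 / 0.06

candidates = [20.0, T_crit, 30.0]
for T in candidates:
    print(f"T = {T:5.2f}   s(T) = {s(T):.3f}")

T_min = min(candidates, key=s)
T_max = max(candidates, key=s)
print("smallest s at T =", round(T_min, 2), " largest s at T =", round(T_max, 2))
```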
In many problems, one needs to find the global extrema on open intervals, half-closed intervals, or intervals
involving infinity. For each of these cases, we have to deal with the limits as we approach the endpoints of the
intervals as illustrated with the open interval case. The other cases are left as exercises in the problem set.

The Open Interval Method

Let f be a continuous function defined on the open interval (a, b). Assume the
limits \(L = \lim_{x \to a^+} f(x)\) and \(M = \lim_{x \to b^-} f(x)\) are well-defined. Here we allow L
and M to be ±∞, a to be −∞, and b to be +∞. To find the global extrema of f
on (a, b), do the following:
Find critical points. Find all the critical points on the interval (a, b).
Evaluate at critical points. Evaluate f at all critical points.
Identify the extrema. If L or M is greater than f evaluated at every critical point,
then f has no global maximum on (a, b). Alternatively, if f evaluated at a
critical point x = c is greater than or equal to L, M, and f evaluated at every
other critical point, then f(c) is the global maximum. If L or M is less than
f evaluated at every critical point, then f has no global minimum on (a, b).
Alternatively, if f evaluated at a critical point x = c is less than or equal
to L, M, and f evaluated at every other critical point, then f(c) is the global
minimum.
Example 9. Using the open interval method
Use the open interval method to find the global extrema of the following functions on the indicated intervals.
a. \(f(x) = \frac{1}{3x - x^2 - 2}\) on (1, 2).
b. \(f(x) = \frac{x}{1 + x^2}\) on (−∞, ∞).
Solution.
a. We have \(f(x) = \frac{1}{3x - x^2 - 2} = \frac{1}{(2 - x)(x - 1)}\), which is continuous on (1, 2). Note f′(x) exists for all x on (1, 2).
Moreover,
\[ \lim_{x \to 2^-} \frac{1}{(2 - x)(x - 1)} = +\infty \]
\[ \lim_{x \to 1^+} \frac{1}{(2 - x)(x - 1)} = +\infty \]
Hence, f(x) has no global maximum on (1, 2). Solving for the critical points on (1, 2), we get
\[
\begin{aligned}
f'(x) &= 0 \\
-\frac{3 - 2x}{(3x - x^2 - 2)^2} &= 0 \\
2x &= 3 \\
x &= 1.5
\end{aligned}
\]
Since f has only one critical point and f(1.5) = 4 is less than \(\lim_{x \to 1^+} f(x)\) and \(\lim_{x \to 2^-} f(x)\), the global
minimum is 4 and occurs at x = 1.5.
b. Since 1 + x² is positive for all x, \(f(x) = \frac{x}{1 + x^2}\) is continuous on (−∞, ∞). Taking limits at infinity, we get
\[ \lim_{x \to \infty} \frac{x}{1 + x^2} \cdot \frac{1/x}{1/x} = \lim_{x \to \infty} \frac{1}{1/x + x} = 0 \]
\[ \lim_{x \to -\infty} \frac{x}{1 + x^2} \cdot \frac{1/x}{1/x} = \lim_{x \to -\infty} \frac{1}{1/x + x} = 0 \]

Solving for the critical points, we get

\[
\begin{aligned}
f'(x) &= 0 \\
\frac{1(1 + x^2) - x(2x)}{(1 + x^2)^2} &= 0 \\
\frac{1 - x^2}{(1 + x^2)^2} &= 0 \\
x &= \pm 1
\end{aligned}
\]
Since f(1) = 1/2 is greater than 0 and f(−1) = −1/2 is less than 0, where 0 is the limit of f at ±∞, these values are the
global maximum and the global minimum, respectively.
□
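Both parts of Example 9 can be sanity-checked numerically. The sketch below evaluates each function at its critical points and at sample points close to the ends of the interval; the sample offsets (1e-6 from an end point, ±1e6 for infinity) are arbitrary choices made for illustration, not part of the method.

```python
def f_a(x):
    # Part a: defined on the open interval (1, 2).
    return 1.0 / (3 * x - x**2 - 2)


def f_b(x):
    # Part b: defined on the whole real line.
    return x / (1 + x**2)


# Part a: the function grows without bound near both ends of (1, 2),
# so only a global minimum can exist; it sits at the critical point x = 1.5.
near_ends_a = [f_a(1 + 1e-6), f_a(2 - 1e-6)]
value_at_critical_a = f_a(1.5)
print("part a:", near_ends_a, value_at_critical_a)

# Part b: the function tends to 0 at both infinities, so the critical
# values at x = -1 and x = 1 decide the global minimum and maximum.
far_out_b = [f_b(-1e6), f_b(1e6)]
values_at_critical_b = {c: f_b(c) for c in (-1.0, 1.0)}
print("part b:", far_out_b, values_at_critical_b)
```

Sampling near the ends only suggests the limiting behavior; the limits themselves still have to be justified analytically as in the solution.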
Problem Set 4.2

In problems 1 to 4, identify the local and global extrema.
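1. [Graph not reproduced: horizontal axis from −1 to 1, vertical axis from −0.3 to 0.5.]
2. [Graph not reproduced: horizontal axis from −1 to 1, vertical axis from 1 to 1.8.]
3. [Graph not reproduced: horizontal axis from −2 to 1, vertical axis from −1 to 5.]
4. [Graph not reproduced: horizontal axis from −1 to 2, vertical axis from −2 to 1.]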

In problems 5 to 12, find the critical points and use the first derivative test to classify them.
5. \(y = 1 + 3x + 4x^2\)
6. \(f(x) = 10 + 6x - x^2\)
7. \(f(t) = t^2 e^{-t}\)
8. \(y = x^3 - \frac{3x^4}{4} + 5\)
9. \(f(x) = \frac{x}{1 + x}\)
10. \(y = -3x - x^2 + \frac{x^3}{3}\)
11. \(y = -x + \frac{3x^2}{4} + \frac{x^3}{3}\)
12. \(y = e^{t^2 - 2t + 1}\)
In problems 13 to 16, find the critical points and use the second derivative test to classify them.
13. \(y = -12x - \frac{9x^2}{2} + x^3\)
14. \(y = 1 - \exp(-x^2)\)
15. \(y = x + \frac{1}{2 + x}\)
16. \(y = \frac{2x^2 - x^4}{4}\)
In problems 17 to 20, use the closed interval method to find the global extrema on the indicated intervals.
17. \(f(x) = x^2 - 4x + 2\) on [0, 3].
18. \(f(x) = x^3 - 12x + 2\) on [−3, 3].
19. \(f(x) = x + \frac{1}{x}\) on [0.1, 10].
20. \(f(x) = xe^{-x}\) on [0, 100].

In problems 21 to 24, use the open interval method to find the global extrema on the indicated intervals.
21. \(f(x) = x^2 - 4x + 2\) on (−∞, ∞).
22. \(f(x) = x^3 - 12x + 2\) on (0, ∞).
23. \(f(x) = x + \frac{1}{x}\) on (0, ∞).
24. \(f(x) = xe^{-x}\) on (−∞, ∞).
25. Let f be continuous on the half-open interval [a, b) with b possibly equal to +∞. Devise a method to find the
global extrema of f on this interval.
26. Let f be continuous on the half-open interval (a, b] with a possibly equal to −∞. Devise a method to find the
global extrema of f on this interval.
In problems 27 to 30, use the half-open interval methods you developed in problems 25–26 to find the global extrema
on the indicated intervals.
27. \(f(x) = x^2 - 4x + 2\) on [0, ∞).
28. \(f(x) = x^3 - 12x + 2\) on [1, 10).
29. \(f(x) = x + \frac{1}{x + 2}\) on [−1, ∞).
30. \(f(x) = xe^{-x}\) on (−∞, −1].
31. Let f be defined on (a, b) and c ∈ (a, b). Prove that if x = c is a local maximum and f is differentiable at
x = c, then f ′ (c) = 0.
32. In Example 5, Section 1.3, we examined how CO2 concentrations (in ppm) have varied from 1974 to 1985.
Using linear and periodic functions, we found the following function gives an excellent fit to the data:
\[ f(x) = 0.122463x + 329.253 + 3\cos\left(\frac{\pi}{6}x\right) \text{ ppm} \]
where x is months after April 1974. Using the closed interval method, find the global maximum and minimum
CO2 levels on the interval [12, 24].
33. For the function in the previous problem, use the closed interval method to find the global maximum and minimum CO2 levels
between April 2000 and April 2001.
34. A close relative of the codling moth is the pea moth, Cydia nigricana, which is a pest of cultivated and garden
peas in several European countries. If its search period in one of the regions where it is a pest is given by the
function
\[ s(T) = \frac{1}{-0.04T^2 + 2T - 15}, \quad \text{for } 20 \le T \le 30, \]
then graph s(T ) using information about the first derivative over the range 20 ≤ T ≤ 30. Be sure that your
graph indicates the largest and smallest value of s over this interval.
\[ f(t) = 890\,\operatorname{sech}^2(0.2t - 3.4) \text{ deaths/week} \]
where t is measured in weeks. Find the global maximum of this function. Recall that
\[ \operatorname{sech} x = \frac{2}{e^x + e^{-x}} \]
36. A particular species of plant (for example, bamboo) flowers once and then dies. A well-known formula for the
average growth rate r of a semelparous species (a species that breeds only once) that breeds at age x is
\[ r(x) = \frac{\ln[s(x)\,n(x)\,p]}{x} \]
where s(x) represents the proportion of plants that survive from germination to age x, n(x) is the number of
seeds produced at age x, and p is the proportion of seeds that germinate.
a. Find the age of reproduction that maximizes r in terms of the parameters a, b, c and p where
\[ s(x) = e^{-ax}, \quad a > 0 \]
and
\[ n(x) = bx^c, \quad b > 0, \quad 0 < c < 1. \]
b. Sketch the graph of y = r(x) for the case where a = 0.2, b = 3, c = 0.8, and p = 0.5.

37. The production of blood cells plays an important role in medical research involving leukemia and other so-
called dynamical diseases. In 1977, a mathematical model was developed by A. Lasota that involved the cell
production function
\[ P(x) = Ax^s e^{-sx/r} \]
where A, s, and r are positive constants and x is the number of granulocytes (a type of white blood cell)
present.∗
a. Find the granulocyte level x that maximizes the production function P . How do you know it is a
maximum?
b. Graph this function.
38. When you cough, the radius of your trachea (windpipe) decreases, affecting the speed of the air in the trachea.
If \(r_0\) is the normal radius of the trachea, the relationship between the speed S of the air and the radius r of the
trachea during a cough is given by a function of the form
\[ S(r) = ar^2(r_0 - r) \]
where a is a positive constant.∗ Find the radius r for which the speed of the air is the greatest.
39. Research indicates that the power P required by a bird to maintain flight is given by the formula
\[ P = \frac{w^2}{2\rho S v} + \frac{1}{2}\rho A v^3, \]
where v is the relative speed of the bird, w is its weight, ρ is the density of air, and S and A are constants
associated with the bird’s size and shape.∗ What speed will minimize the power? You may assume that w, ρ, S,
and A are all positive.
40. An epidemic spreads through a community in such a way that t weeks after its outbreak, the number of
residents who have been infected is given by a function of the form
\[ f(t) = \frac{A}{1 + Ce^{kt}} \]
where A is the total number of susceptible residents. Show that the epidemic is spreading most rapidly when
half the susceptible residents have been infected.
∗ See “A Blood Cell Population Model, Dynamical Diseases, and Chaos,” by W. B. Gearhart and M. Martelli, UMAP Modules 1990:
Tools for Teaching. Arlington, MA: Consortium for Mathematics and Its Applications (CUPM) Inc., 1991.
∗ Philip M. Tuchinsky, “The Human Cough,” UMAP Modules 1876: Tools for Teaching, Consortium for Mathematics and Its Applications, Inc., Lexington, MA, 1977.

∗ C. J. Pennycuick, “The Mechanics of Bird Migration,” IBIS III (1969), pp. 525-556.

4.3 Optimization in Biology

One of the most important applications of calculus to biology, in particular, and science and technology, in general,
is finding the extrema of functions. Consider, for example, the problem of determining the most effective treatment
regimen for a malignant tumor using chemo- or radiotherapy. Since treatments are toxic to the body, one wants to
minimize the dosage that will do the job. Typically, a single therapy treatment will not destroy the tumor. Instead,
the tumor will initially shrink in size following treatment and some time after will begin to regrow. Ideally, therapy
should be reapplied immediately before this regrowth phase, when the tumor is at its smallest. Using calculus in
conjunction with tumor growth data, we can estimate the time when therapy should be reapplied. Alternatively, a
farmer planting a corn crop might be interested in the planting density of seeds that maximizes his or her profit.
Hence, the farmer may formulate a function that describes how his or her profit depends on planting density and
then maximize this function. In this section, we consider these problems, as well as the behavior of dogs fetching balls,
sustainable harvesting of arctic fin whales, and vascular branching. More examples are presented in the exercises.
A Basic Guideline
Optimization problems often require developing and analyzing an appropriate model of the situation at hand. Con-
sequently, when encountering an optimization problem for the first time, it is useful to keep a few things in mind.
Read, understand, and visualize. When you first encounter an optimization problem, take the time to carefully
read the problem so that you fully understand what is being asked. In particular, ask yourself, what am I trying
to maximize or minimize? What information am I given? Is it sufficient to solve the problem of concern? When
appropriate, draw a picture or figure that summarizes the problem.
Identify key variables and quantities. Ask yourself, what are the important quantities in the problem? In particular,
what quantity is being optimized? This will be the dependent variable. Which of the variables is the
one whose value I can control to obtain my optimal solution? This will be the independent variable. What
additional quantities presented in the problem do I need to obtain the sought-after relationship between the
dependent and independent variable? Associate units with each of these variables.
Write down the function. In this step you need to determine how the dependent variable is determined by the
quantities that you identified in the previous step. Think carefully about this crucial step. Make sure that
units on both sides of your equation agree.
Optimize. Determine whether you need to minimize or maximize the function and over what interval (i.e. values of
the independent variable) you need to perform the optimization. To find the optimal value it suffices to find
the critical points, evaluate the function at these critical points and at the endpoints of the interval. Whichever
of these values is the largest (respectively smallest) yields the maximum (respectively minimum).
Interpret your answer. Interpret the results of your optimization. Ask yourself whether your answer makes sense.
If not, check over your work.
In the next example, we demonstrate how these principles are applied to the problem of determining how much
seed to plant to optimize profits. In the remaining examples in this section, we will not stress the steps involved as
explicitly as we do in this first example. But these steps are implicitly there, and you should consciously use them when you
get stuck in solving an optimization problem. These examples are grouped into problems that relate to behavior,
physiology, and resource management. You won’t necessarily have time to study them all, so you can pick and choose
those of the greatest interest to you.
Example 1. Maximum Economic Yield
Gaspar et al.’s article, A ‘Cookbook’ approach for determining the ‘Point of Maximum Economic Return’, states
“Many agronomists and producers have been conducting on-farm experiments that are designed to de-
termine the impact of different fertilizer rates or plant populations on crop yields. These data are usually
analyzed by plotting the input (fertilizer or population rate) vs. output (yield). The point of maximum

yield may be picked directly off the plot. To make the results of these experiments more useful, the point
of maximum economic return should be calculated. The point of optimum economic return is determined
by:
1. Conducting a yield response experiment;

2. Converting the yield response data to a functional relationship, output corn yield = f(input levels);
3. Knowing or estimating the costs of your inputs and outputs;
4. Using calculus to determine where the change in the value of the input equals the change in the
value of the output.”∗
To illustrate this approach, the authors consider a yield response curve that relates seed density to corn yield
\[ \text{Yield} = -0.1181x^2 + 8.525x + 12.95 \text{ bushels per acre} \]
where x is thousands of seeds planted per acre. For this crop, assume that the selling price is $1.50 per bushel and the
cost of seeds is $3 per thousand seeds. Determine the density of seeds (i.e. seeds per acre) that maximizes profit (per
acre).
Figure 4.6: Corn yield as a function of seed density. Source: Gaspar, P.E., S. Paszkiewicz, P. Carter, M. McLeod,
T. Doerge, and S. Butzen. 1999. Corn hybrid response to plant populations. Pioneer Hi-Bred International Inc.
Northern Agronomic Research Summary. p. 29-40.
Solution. Having read the problem, we realize that we want to maximize the profit by planting the optimal number
of seeds per acre. Since profit is given by the revenue generated selling the crops minus the cost of the seeds, we have
Profit = Revenue − Costs
where each of these quantities is given in dollars per acre. Hence, the key variables for this problem are profit,
revenue, costs, and the number x of thousand seeds planted per acre. Since the revenue per acre is the price per
bushel times the bushels per acre, we get
Revenue = Price per bushel × bushels per acre

= 1.5 × (−0.1181x2 + 8.525x + 12.95) dollars per acre
On the other hand, the cost per acre is $3 for each thousand seeds, so that for x thousand seeds the cost is:
Cost = 3 × x = 3x dollars per acre

∗ Gaspar, P.E., S. Paszkiewicz, P. Carter, M. McLeod, T. Doerge, and S. Butzen. 1999. Corn hybrid response to plant populations.
Pioneer Hi-Bred International Inc. Northern Agronomic Research Summary. p. 29-40.

Figure 4.7: An Arctic Fin Whale
Hence, we can write the function

\[
\begin{aligned}
P(x) &= \overbrace{-0.1772x^2 + 12.7875x + 19.425}^{\text{revenue}} - \overbrace{3x}^{\text{cost}} \\
     &= -0.1772x^2 + 9.7875x + 19.425
\end{aligned}
\]
that we want to maximize with respect to x on the interval [0, ∞).
Plot of profit P in dollars per acre as a function of x thousand seeds planted per acre. Red line indicates optimum
planting density.
Since the graph of this function is a parabola that opens downward, we can find the maximum by finding where the derivative
vanishes.
\[ P'(x) = -0.3544x + 9.7875 \]
Hence, x ≈ 27.617 is where the maximum occurs. Our interpretation of 27.617 is that the farmer should plant
approximately 28 thousand seeds per acre and doing so should yield a profit of P(27.617) ≈ $155 per acre. □
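The steps of Example 1 (write down the function, optimize, interpret) translate directly into a short computation. This is a sketch under the example's stated prices; the constant and function names are illustrative. Because it uses the unrounded coefficient 1.5 × 0.1181 = 0.17715 rather than 0.1772, the optimum comes out near 27.62 instead of 27.617, which does not change the interpretation.

```python
PRICE_PER_BUSHEL = 1.5          # dollars per bushel
COST_PER_THOUSAND_SEEDS = 3.0   # dollars per thousand seeds


def corn_yield(x):
    # Bushels per acre when x thousand seeds are planted per acre.
    return -0.1181 * x**2 + 8.525 * x + 12.95


def profit(x):
    # Dollars per acre: revenue minus seed cost.
    return PRICE_PER_BUSHEL * corn_yield(x) - COST_PER_THOUSAND_SEEDS * x


# P'(x) = price * (-2 * 0.1181 * x + 8.525) - cost = 0, solved for x.
x_opt = (PRICE_PER_BUSHEL * 8.525 - COST_PER_THOUSAND_SEEDS) / (
    2 * PRICE_PER_BUSHEL * 0.1181
)

print(f"optimal density: {x_opt:.2f} thousand seeds per acre")
print(f"profit at optimum: {profit(x_opt):.2f} dollars per acre")
```

Writing the prices as named constants makes it easy to rerun the calculation for other price scenarios.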
In many problems in population biology, a variable x is used to represent the density (or number) of individuals
in a population and the function G(x) is used to represent the population growth rate. For instance, for the discrete
logistic equation presented in Example 7, Section 2.5, we modeled the growth using the function
\[ G(x) = rx\left(1 - \frac{x}{K}\right) \]
where r > 0 has the interpretation of the maximum per-capita growth rate and K > 0 has the interpretation of the
environmental carrying capacity. Here we use the function G(x) to determine the optimal rate at which to harvest a
population of whales (we present this for historical reasons, noting that harvesting of a number of different species
of whales is now illegal).
Example 2. Sustainable exploitation of the arctic fin whale
The arctic fin whale Balaenoptera physalus, at 50-70 tons for adults of both sexes (second only in size to the blue
whale), was a highly desirable catch during the whaling heydays of the 19th and 20th centuries. As many as 30,000
individuals were slaughtered each year from 1935 to 1965. This level of exploitation could not be sustained for very
long so that today, population levels are estimated to be an order of magnitude below historical highs of around a
half million individuals. Some individuals are still taken each year for purposes of subsistence by aboriginal people

in Greenland. A moratorium on whale hunting is needed, however, to allow this species to recover to levels where
the populations can be safely exploited on a sustainable basis.
Assume the arctic fin whale growth rate is modeled by the logistic function G(x) = rx(1 − x/K) with r = 0.08
(i.e. an 8% annual growth rate when the whale densities are low) and K = 500,000 (i.e. prior to exploitation, the
arctic fin whale population was estimated to be around half a million individuals). If the population is harvested at
a constant rate of H individuals per year for an extended period of time, then this harvesting rate is sustainable if
there exists a positive number x of whales at which the growth rate G(x) equals the harvesting rate H. That is, it
is possible for the growth to keep pace with the loss from harvesting. Determine the maximal sustainable harvesting
rate.
Solution. According to the statement of the problem, a harvesting rate H is sustainable if there is a positive x
such that
\[ H = G(x) = 0.08x\left(1 - \frac{x}{500{,}000}\right) \]
Hence, maximizing a sustainable harvesting rate is equivalent to finding x > 0 which maximizes G(x). Taking the
derivative of G, setting it equal to zero and solving for x yields
\[
\begin{aligned}
0 = G'(x) &= 0.08 - 0.16\,\frac{x}{500{,}000} \\
0.16\,\frac{x}{500{,}000} &= 0.08 \\
x &= 250{,}000
\end{aligned}
\]
Since the graph of G is a parabola that opens downward, this critical point is the global maximum.
Hence, the maximum sustainable yield occurs at a harvesting rate of H = G(250,000) = 10,000 whales per year, at
which the whale population consists of 250,000 individuals. This maximum sustainable harvesting rate of 10,000
whales per year is one third of the harvesting rate of as many as 30,000 whales per year from 1935 to 1965. Hence, the model reaffirms
the statement that harvesting at 30,000 whales per year was not sustainable and may
explain why the current population sizes are an order of magnitude lower than half a million. □
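The maximum sustainable harvesting rate in Example 2 can be recomputed for any values of r and K. The sketch below assumes the logistic growth function from the example and uses the fact that G′(x) = r(1 − 2x/K) vanishes at x = K/2; the function name `growth_rate` is illustrative.

```python
def growth_rate(x, r, K):
    # Logistic population growth rate G(x) = r x (1 - x / K).
    return r * x * (1 - x / K)


r, K = 0.08, 500_000

# G'(x) = r (1 - 2x / K) = 0 at x = K / 2.
x_star = K / 2
max_harvest = growth_rate(x_star, r, K)

print(f"population at maximum sustainable yield: {x_star:,.0f} whales")
print(f"maximum sustainable harvest: {max_harvest:,.0f} whales per year")
print("30,000 per year sustainable?", 30_000 <= max_harvest)
```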
Sometimes when solving a problem it is useful to sketch a figure as illustrated in the next example.
Example 3. Do dogs know calculus?
Professor Tim Pennings∗ from Hope College wanted to determine whether his dog, Elvis, fetched balls thrown
into Lake Michigan in an optimal way. Standing along the shoreline, with Elvis at his side, Professor Pennings would
throw the ball into the water. Elvis could choose to swim out directly from where Tim was standing to get the
ball, hence taking a minimal distance trajectory. Alternatively, he could run along the shore before he jumped into
the water and swam to the ball. Because Elvis can only swim (based on actual data!) at an average speed of 0.91
meters per second while he can run at an average speed of 6.4 meters per second, it is likely that he ran for some
distance along the shore. But how far along the shore should Elvis run?
Tim performed an experiment to assess what strategy Elvis was playing by throwing the ball repeatedly into the
water and keeping track of where Elvis entered the water. For one throw, the ball landed 6 meters from the shore as
illustrated in Figure 4.8.
What path would Elvis take if he were to minimize the amount of time it took him to retrieve the ball?
Solution.
Let us begin by sketching a figure that indicates a hypothetical path Elvis could take.
∗ T. Pennings. Do Dogs Know Calculus? The College Mathematics Journal. 34(2003)178–182

Figure 4.8: Calculus-dog Elvis fetching a ball
In this drawing, 15 − x is the distance Elvis runs along the shore. Assuming that Elvis wants to minimize the
time to getting to the ball, we need to write down a function that describes how the amount of time to get to the
ball depends on x. Since he is running at a speed of 6.4 meters per second and runs a distance of 15 − x meters along
the shore, we get that the time he spends running on the shore is
\[ \frac{(15 - x) \text{ meters}}{6.4 \text{ meters per second}} = \frac{15 - x}{6.4} \text{ seconds} \]
By the Pythagorean theorem, the distance Elvis swims to the ball is \(\sqrt{36 + x^2}\) meters. Hence, the time he spends
swimming is
\[ \frac{\sqrt{36 + x^2} \text{ meters}}{0.91 \text{ meters per second}} = \frac{\sqrt{36 + x^2}}{0.91} \text{ seconds} \]
Hence, the total time T it takes him to get to the ball as a function of x is given by
\[ T(x) = \frac{15 - x}{6.4} + \frac{\sqrt{36 + x^2}}{0.91} \text{ seconds} \]
We want to understand the graph of this function for 0 ≤ x ≤ 15. Hence, let us take the derivative of T.
\[ T'(x) = -\frac{1}{6.4} + \frac{x}{0.91\sqrt{36 + x^2}} \]
To find the critical points, we need to solve T′(x) = 0. Doing so yields
\[
\begin{aligned}
\frac{1}{6.4} &= \frac{x}{0.91\sqrt{36 + x^2}} \\
\frac{1}{6.4^2} &= \frac{x^2}{0.91^2(36 + x^2)} && \text{square both sides} \\
0.02(36 + x^2) &\approx x^2 && \text{multiply both sides by } 0.91^2(36 + x^2) \\
0.72 &\approx 0.98x^2 \\
0.735 &\approx x^2 \\
\pm 0.86 &\approx x
\end{aligned}
\]
Hence, on the interval [0, 15], T′ vanishes only at x ≈ 0.86. Since T′(2) ≈ 0.19 > 0, T is increasing on
the interval (0.86, 15]. On the other hand, since T′(0) ≈ −0.16 < 0, T is decreasing on [0, 0.86). Thus, the minimum time
is achieved at x ≈ 0.86. Therefore, Elvis should run 14.1 meters along the shore before jumping into the water. □
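The critical point in Example 3 can also be located numerically rather than by squaring both sides. The sketch below applies a simple bisection search to T′(x) on [0, 15], which works because T′ is negative at x = 0 and positive at x = 15; the helper `bisect_root` and the constant names are hypothetical.

```python
import math

RUN_SPEED = 6.4         # meters per second
SWIM_SPEED = 0.91       # meters per second
SHORE_DISTANCE = 15.0   # meters along the shore to the point opposite the ball
BALL_OFFSHORE = 6.0     # meters from the shore to the ball


def travel_time(x):
    # Running time plus swimming time, in seconds.
    return (SHORE_DISTANCE - x) / RUN_SPEED + math.hypot(BALL_OFFSHORE, x) / SWIM_SPEED


def travel_time_derivative(x):
    return -1 / RUN_SPEED + x / (SWIM_SPEED * math.hypot(BALL_OFFSHORE, x))


def bisect_root(g, lo, hi, tol=1e-10):
    # Assumes g(lo) < 0 < g(hi); halves the bracket until it is shorter than tol.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x_opt = bisect_root(travel_time_derivative, 0.0, SHORE_DISTANCE)
print(f"enter the water at x = {x_opt:.2f} m")
print(f"run {SHORE_DISTANCE - x_opt:.1f} m along the shore")
print(f"total time: {travel_time(x_opt):.2f} s")
```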
So what was the outcome of Tim’s experiment? When Tim Pennings measured the point at which Elvis entered
the water, he found that Elvis ran 15 − x = 14.1 meters along the shore (i.e. x = 0.9). Does this dog know calculus?
Well, he could have been lucky on this one throw. So Tim Pennings performed 35 throws with the ball landing
different distances d from the shore line. Tim measured the point x where Elvis entered the water on each throw.
In the exercises at the end of this section you will be asked to show that the optimal place to enter the water as a
function of the distance d the ball lands from the shore is
x = 0.144 d meters
A scatter plot of the data and the line is shown in Figure 4.9. This figure illustrates that Elvis is on average acting
pretty optimal. Quite remarkable.
Figure 4.9: Scatter plot of distance of ball from shore (in the horizontal direction) and Elvis’ point of entry 15 − d
in the water (in the vertical direction). The optimal line x = 0.144 d passes through the center of the scatter plot.
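The linear relationship x = 0.144 d can be checked numerically before it is derived in the exercises. The sketch below minimizes the travel time for several ball distances d with a ternary search, assuming the same speeds and the same 15-meter shoreline as in Example 3; the function name `optimal_entry` and the sample distances are illustrative.

```python
import math

RUN_SPEED, SWIM_SPEED, SHORE_DISTANCE = 6.4, 0.91, 15.0


def optimal_entry(d, tol=1e-9):
    """Minimize (15 - x)/6.4 + sqrt(d^2 + x^2)/0.91 over [0, 15] by ternary search."""

    def travel_time(x):
        return (SHORE_DISTANCE - x) / RUN_SPEED + math.hypot(d, x) / SWIM_SPEED

    lo, hi = 0.0, SHORE_DISTANCE
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        # The travel time has a single minimum, so discard the worse third.
        if travel_time(m1) < travel_time(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


ratios = {d: optimal_entry(d) / d for d in (2.0, 6.0, 10.0, 20.0)}
for d, ratio in ratios.items():
    print(f"d = {d:4.1f} m   x/d = {ratio:.4f}")
```

The ratio x/d is the same for every sampled d, which is what the line in Figure 4.9 expresses.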
Now we turn to our tumor treatment analysis mentioned in the introduction to this section.
Example 4. Tumor regrowth
In an experimental study performed at Dartmouth College, two groups of mice with tumors were treated with
the chemotherapeutic drug, cisplatin. Prior to the therapy, the tumor consisted of proliferating cells (also known
as clonogenic cells) that grew exponentially with a doubling time of approximately 2.9 days. (Of course the tumor
could not grow indefinitely at this rate because it would soon be larger than the mouse. However exponential growth
is a good approximation when the tumor is much smaller than the mouse.) Each of these mice was given a dosage
of 10 mg/kg of cisplatin. At the time of the therapy, the average tumor size was approximately 0.5 cm³. After
treatment, 99% of the proliferating cells became quiescent cells (also known as non-proliferating or resting cells).
These quiescent cells do not divide, and decay with a half life of approximately 5.7 days.
a. Write down a function V (t) that represents the volume of the tumor t days after therapy. The tumor
volume includes the volume of the proliferating cells and the quiescent cells.
b. Determine at what point in time the tumor starts to regrow and therapy should be reapplied.
Solution.
a. The volume V (t) of the tumor is given by
V (t) = P (t) + Q(t)

where P(t) is the volume of proliferating cells and Q(t) is the volume of quiescent cells. The proliferating
cells are increasing at an exponential rate and have an initial volume of P(0) = 0.01 × 0.5 = 0.005 cm³
(i.e. 1% of the previous untreated average size). Hence, \(P(t) = 0.005e^{at}\), where we need to solve for a.
Since the doubling time is 2.9 days, we can solve for a as follows:
\[
\begin{aligned}
P(2.9) &= 2(0.005) \\
0.005e^{2.9a} &= 0.01 \\
e^{2.9a} &= 2 \\
a &= \ln 2 / 2.9 \approx 0.24
\end{aligned}
\]
Hence, \(P(t) = 0.005e^{0.24t}\). Similarly, we have Q(0) = 0.99(0.5) = 0.495 and \(Q(t) = 0.495e^{bt}\), where we
have to solve for b. Since the half-life of quiescent cells is 5.7 days, we can solve for b as follows:
\[
\begin{aligned}
Q(5.7) &= 0.5(0.495) \\
0.495e^{5.7b} &= 0.5(0.495) \\
e^{5.7b} &= 0.5 \\
b &= \ln 0.5 / 5.7 \approx -0.12
\end{aligned}
\]
Hence, \(Q(t) = 0.495e^{-0.12t}\) and
\[ V(t) = 0.005e^{0.24t} + 0.495e^{-0.12t} \]
b. To determine when V(t) is increasing or decreasing, we need to compute its derivative:
\[ V'(t) = 0.0012e^{0.24t} - 0.0594e^{-0.12t} \]
Since V′(0) ≈ −0.0582, the volume of the tumor is initially decreasing after therapy. To see when V′(t)
changes sign, we solve
\[
\begin{aligned}
0.0012e^{0.24t} - 0.0594e^{-0.12t} &= 0 \\
0.0012e^{0.24t} &= 0.0594e^{-0.12t} \\
0.0012e^{0.36t} &= 0.0594 \\
e^{0.36t} &= 49.5 \\
t &= \frac{\ln 49.5}{0.36} \approx 10.84 \text{ days}
\end{aligned}
\]
Hence, after 10.84 days, the tumor begins to regrow and therapy should be reapplied. Indeed this
prediction is supported by the data shown below on the left hand side:

The data on the right hand side will be examined in the problem set.
□
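The regrowth time in Example 4 can be recomputed directly from the model. This sketch solves V′(t) = 0 in closed form, first with the rounded rates a = 0.24 and b = −0.12 used in the solution and then, as a comparison that is not in the text, with the unrounded rates ln 2/2.9 and ln 0.5/5.7; the function names are illustrative.

```python
import math

P0, Q0 = 0.005, 0.495   # initial proliferating and quiescent volumes (cm^3)


def regrowth_time(a, b):
    # V'(t) = a*P0*exp(a t) + b*Q0*exp(b t) = 0  =>  exp((a - b) t) = -b*Q0 / (a*P0)
    return math.log(-b * Q0 / (a * P0)) / (a - b)


def volume_rate(t, a, b):
    return a * P0 * math.exp(a * t) + b * Q0 * math.exp(b * t)


# Rounded rates, as used in the worked solution.
t_rounded = regrowth_time(0.24, -0.12)

# Unrounded rates from the doubling time and the half-life.
a_exact = math.log(2) / 2.9
b_exact = math.log(0.5) / 5.7
t_exact = regrowth_time(a_exact, b_exact)

print(f"regrowth begins after {t_rounded:.2f} days (rounded rates)")
print(f"regrowth begins after {t_exact:.2f} days (unrounded rates)")
```

The two answers differ by only a few hundredths of a day, so rounding the rates to two decimals has little effect on the recommended treatment time.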
The vascular system consists of arteries and veins that branch in different directions to pump blood through all
parts of the body. Ideally the body is designed to minimize the amount of energy it expends in pumping the blood.
According to one of Poiseuille’s Laws, the resistance blood experiences by traveling down the center of a blood vessel
with radius r and length L is proportional to
\[ \frac{L}{r^4} \]
Without loss of generality, we assume that this proportionality constant equals 1 and use this law to determine
optimal branching angles in the vascular system of animals.
Example 5. Vascular branching
Consider a blood vessel that branches as illustrated below:
where a and b are positive constants. Given a and b determine the angle θ which minimizes the total resistance in
the blood flow from the point A to the point C.

Solution. We want to minimize the total resistance along the blood vessel from A to C. Let B be the point where
vessel branches. We need to determine the resistance from A to B and the resistance from B to C. To determine
the resistance along the blood vessel from A to B, we need to determine how the distance from A to B depends on
θ. Using the right triangle as shown below
we get the length from B to D is given by b cot θ. Hence, the distance from A to B is
\[ a - b\cot\theta \]
and, as the radius of the vessel from A to B is 3, the resistance from A to B is
\[ \frac{a - b\cot\theta}{3^4} \]
Alternatively, using the same right triangle, we get that the distance from B to C is b csc θ. Hence, the resistance
from B to C is
\[ \frac{b\csc\theta}{2^4} \]
as the radius of the vessel from B to C is 2. Adding the resistance from A to B to the resistance from B to C, we
get that the resistance from A to C is given by
\[ R(\theta) = \frac{a - b\cot\theta}{81} + \frac{b}{16}\csc\theta \]
To minimize the resistance on the half-open interval (0, π/2], we need to determine the critical points along this interval
as follows:
\[
\begin{aligned}
0 &= R'(\theta) \\
0 &= \frac{b}{81}\csc^2\theta - \frac{b}{16}\csc^2\theta\cos\theta \\
0 &= b\csc^2\theta\left(\frac{1}{81} - \frac{1}{16}\cos\theta\right) \\
\cos\theta &= \frac{16}{81} \\
\theta &\approx 1.372
\end{aligned}
\]
Hence, the optimal angle is given by θ ≈ 1.372 radians. Equivalently, since one radian is approximately 57.3 degrees,
θ ≈ 78.6 degrees. □
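The critical angle in Example 5 can be compared against a brute-force search over (0, π/2]. In this sketch the values a = 10 and b = 2 are hypothetical, chosen only so that the resistance can be evaluated; the example itself leaves a and b general, and the optimal angle does not depend on them.

```python
import math

R_MAIN, R_BRANCH = 3.0, 2.0   # vessel radii from Example 5


def resistance(theta, a, b):
    # R(theta) = (a - b cot(theta)) / 3^4 + b csc(theta) / 2^4
    return (a - b / math.tan(theta)) / R_MAIN**4 + (b / math.sin(theta)) / R_BRANCH**4


# Critical point from the worked solution: cos(theta) = 16/81.
theta_formula = math.acos(R_BRANCH**4 / R_MAIN**4)


def grid_minimizer(a, b, steps=100_000):
    # Evaluate R on an evenly spaced grid over (0, pi/2] and keep the best angle.
    thetas = [(k + 1) * (math.pi / 2) / steps for k in range(steps)]
    return min(thetas, key=lambda theta: resistance(theta, a, b))


theta_grid = grid_minimizer(a=10.0, b=2.0)   # a and b are illustrative values

print(f"critical angle: {theta_formula:.3f} rad = {math.degrees(theta_formula):.1f} degrees")
print(f"grid search:    {theta_grid:.3f} rad")
```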
Problem Set 4.3

1. In Example 1, the selling price was $1.5 per bushel and the seeds cost $3 per thousand seeds. Determine the
density of seeds that maximize profit if the selling price and cost are both doubled.

2. In Example 1, determine the density of seeds that maximize profit if the selling price is $5 per bushel and the
seeds cost $2 per thousand seeds.
3. In Example 1, determine the density of seeds that maximize profit if the selling price is $2.2 per bushel and
the seeds cost $2.5 per thousand seeds.
4. In Example 2, we estimated that the maximum per-capita growth rate of the fin whales is r = 0.08. Suppose
a better estimate is r = 0.1. Determine the maximum sustainable harvesting rate for this value of r.
5. In Example 2, we estimated that the carrying capacity of the fin whales is K = 500,000. Suppose a better
estimate is K = 400,000. Determine the maximum sustainable harvesting rate for this value of K.
6. In a species of fish, the growth rate function is given by G(x) = 1.4x(1 − x/K) where K = 5 million metric
tons (i.e. the population of fish is measured in metric tons rather than number of individuals). If the harvest
rate is a function of the harvesting effort h and the total amount of fish x, that is H = hx, find the harvesting
effort value h that corresponds to the maximum sustainable yield.
7. In a species of fish, the growth rate function is given by G(x) = 2.1x(1 − x/K) where K = 8 million metric tons.
If the harvest rate is H = hx, find the harvesting effort value h that corresponds to the maximum sustainable
yield.
In Problems 8 to 12, find the optimal angle for the following vascular branching problems, as considered in Example 5.
8. A larger artery has radius 0.05 mm and a smaller artery of radius 0.025 mm branches from the larger artery
with branching angle θ.
9. A larger artery has radius 0.06 mm and a smaller artery of radius 0.04 mm branches from the larger artery
with branching angle θ.
10. The radius of the main blood vessel is r1 = 2 and the radius of the branching vessel is r2 = 1
11. The radius of the main blood vessel is r1 = 4 and the radius of the branching vessel is r2 = 3.
12. The general case where the radius of the main blood vessel is r1 and the radius of the branching vessel is r2 .
Assume that r1 > r2 .
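13. In Example 3, calculate at what point x along the shore Elvis should enter the water if the distance of the ball
from the shore is 20 meters rather than 6.
14. In Example 3, calculate at what point x along the shore Elvis should enter the water if the distance of the ball
from the shore is 10 meters rather than 6.
15. In Example 3, calculate at what point x along the shore Elvis should enter the water if the distance of the ball
from the shore is d meters.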
16. Find a general formula for which Example 3 is a specific case that describes how to calculate at what point x
along the shore Elvis should enter the water if the distance of the ball from the shore is d meters (rather than
6) and the point on the shore to which this distance d holds is k meters (rather than 15) from where Tim is
standing.
17. In a species of fish, the growth rate function is given by G(x) = 1.5x(1 − x/K) where K = 6000 metric tons
(i.e. the population of fish, x, is measured in metric tons rather than number of individuals). The price a
fisherman can get is p = $600 per metric ton. If the amount the fisherman can harvest is determined by the
function H = hx, where each unit of h costs the fisherman c = $100, what is the maximum amount of money
the fisherman can expect to make on a sustainable basis? (Hint: The fisherman’s sustainable income is given
by pH − ch, where H is a sustainable harvesting rate.)

18. In the tumor growth study described in Example 4, where the tumor consisted of proliferating cells (clonogenic
cells) that grew exponentially with a doubling time of approximately 2.9 days, suppose that each mouse was
given a dosage of 25mg/kg of cisplatin per treatment with the following results: At the time of the therapy,
the average tumor size was approximately 0.44 cm3 . After treatment, 99.73% of the proliferating cells became
quiescent cells and decayed with a half life of approximately 6.24 days.
a. Write a function V (t) that represents the size of the tumor (proliferating plus quiescent cells) t days
after therapy.
c. Compare your answer to the data figure in Example 4.
19. In a follow up study to the tumor growth study described in Example 4, mice were infected with a relatively
aggressive line of proliferating clonogenic cells that grew exponentially with a doubling time of approximately
1.8 days. Each mouse was given a dosage of 20mg/kg of cisplatin per treatment with the following results: At
the time of the therapy, the average tumor size was approximately 0.6 cm3 . After treatment, 99.10% of the
proliferating cells became quiescent cells and decayed with a half life of approximately 4.4 days.
a. Write a function V (t) that represents the size of the tumor (proliferating plus quiescent cells) t days
after therapy.
20. In certain tissues, cells exist in the shape of circular cylinders. Suppose such a cylinder has radius r and height
h. If the volume is fixed (say, at v), find the value of r that minimizes the total surface area (S = 2πrh + 2πr²) of
the cell.
21. Farmers regularly use fertilizers to enhance the productivity of their crops. Determining the appropriate
amount of fertilizer to use requires balancing the costs of fertilization with the increases in yield. In a 2004
study published in the Agronomy Journal ∗ , Baker et al. studied the relationship between nitrogen fertilization
and yield of hard red spring wheat. For conventional tillage practices in Eastern Washington in the late 1980s,
Baker et al. found that the grain yield (in Mg per hectare) as a function of nitrogen (in kg per hectare) is well
approximated by
\[ Y(N) = 1.86 + 0.02741N - 0.00009N^2 \]
Baker et al. suggested that a high price for wheat would be $191.1/Mg and low cost for nitrogen would be
$0.49/kg. Determine the amount of nitrogen that maximizes profits per hectare.
22. Baker et al. suggested that a low price for wheat would be $139.65/Mg and a high cost for nitrogen would be
$0.71/kg. Using the same yield function as in the previous problem, determine the amount of nitrogen that
maximizes profits per hectare.
23. If the effects of density-dependence in a whale population set in less rapidly closer to the final carrying capacity
K, then the Logistic equation used in Example 2 should be replaced by a more general non-symmetric growth
model
\[ G(x) = 0.08x\left[1 - \left(\frac{x}{500{,}000}\right)^{\alpha}\right] \text{ whales per year} \]
for some α ∈ (0, 1). For the case α = 0.5, calculate the stock level x that provides the maximum sustainable
yield and compare this to the value predicted by the model in Example 2.
24. If the effects of density-dependence in a whale population set in less rapidly or more rapidly closer to the
final carrying capacity K, then the Logistic equation can be replaced by a more general non-symmetric growth
model
\[ G(x) = rx\left[1 - \left(\frac{x}{K}\right)^{\alpha}\right] \text{ whales per year.} \]
For α > 0, r > 0, and K > 0, calculate the stock level x that provides the maximum sustainable yield. Discuss
whether rapid onset of density dependence (i.e. large α) or gradual onset of density dependence (i.e. small α)
leads to larger sustainable yields.
∗ Dustin A. Baker, Douglas L. Young, David R. Huggins and William L. Pan. 2004. Economically Optimal Nitrogen Fertilization for
Yield and Protein in Hard Red Spring Wheat. Agronomy Journal 96:116–123

25. During the winter, a species of bird migrates from the coast of a mainland to an island 500 miles southeast. If
the energy the bird requires to fly one mile over the water is twice the amount of energy it requires
to fly over the land, determine what path the species should fly to minimize the amount of energy used.
26. Online ∗ you can find the following problem “The Statue of Liberty stands 92 meters high, including the
pedestal which is 46 meters high. How far from the base should you stand so that your viewing angle, θ, is as
large as possible? See the figure below.”
27. In the northeastern part of Sweet Water County, a large dam is being constructed on the Shuga River to
produce hydro-electricity (i.e. the generation of electricity through water pressure). An important part of this
project is running power lines from the power stations at the downstream side of the dam to various parts of
the county including Pickle City, the largest city in the county. On the recommendation of a number of other
counties, county officials have hired you as consultants to resolve cost issues for running these power lines.
County officials have informed you that the Shuga River runs due south and on its western side lies an expanse
of federally protected wetlands. Pickle City lies several miles to the west of these wetlands as shown in the
map below.
[Map showing the Shuga River, the power plant, the Swamp, the Barbaloot Habitat, and Pickle City, with labeled distances of 15, 25, 20, and 10 miles.]
The federally protected wetlands are divided into two regions. In the northern region, county officials expect
that due to federal regulations it will cost 40% more to run conduit here than it does through non-wetland
ground. The southern region of the wetlands is a habitat for the endangered Brown Barbaloots. Consequently,
federal law prevents the county from running conduits through this region.
∗ http://astro.temple.edu/ dhill001/maxmin/statueoflibertydescription.html

As the county officials intend to submit a budget proposal for the project to the county council in the next week,
they would like you to determine the path from the power station to downtown Pickle City that minimizes the
cost of installing the conduit.
28. An oil spill has fouled 200 miles of Pacific shoreline. The oil company responsible has been given 14 days to
clean up the shoreline, after which a fine will be levied in the amount of $10,000/day. The local cleanup
crew can scrub 5 miles of beach per week at a cost of $500/day. Additional crews can be brought in at a cost of
$18,000 plus $800/day for each crew. Determine how many additional crews should be brought in to minimize
the total cost to the company and how much the cleanup will cost.
29. Consider a spherical cell with radius r. Assume that the cell gains energy at a rate proportional to its surface
area (i.e. nutrients diffusing in from outside of the cell) and the cell loses energy at a rate proportional to its
volume (i.e. all parts of the cell are using energy). If the cell is trying to maximize its net gain of energy,
determine the optimal radius of the cell. Note: your final expression will depend on your proportionality
constants.
30. Consider a cylindrical cell with radius r and height r/2. Assume that the cell gains energy at a rate proportional
to its surface area (i.e. nutrients diffusing in from outside of the cell) and the cell loses energy at a rate
proportional to its volume (i.e. all parts of the cell are using energy). If the cell is trying to maximize its net
gain of energy, determine the optimal value of r. Note: your final expression will depend on your proportionality
constants.
31. A dune buggy is on the desert at a point A located 40 km from a point B, which lies on a long, straight road,
as shown in Figure 4.10.
Figure 4.10: Path traveled by a dune buggy
The driver can travel at 45 km/h on the desert and 75 km/h on the road. The driver will win a prize if she
arrives at the finish line at point D, 50 km from B, in 85 min or less. Set up and analyze a model to help her
decide on a route to minimize the time of travel. Does she win the prize?
32. The question of whether an optimal body size exists for different kinds of animals is one that is of great interest
to biologists. The reproductive power P of an individual can be modeled, following the ideas of ecologist
Professor James H. Brown∗, as the harmonic mean† of two limiting rates: a per-unit-mass rate R1 at which
individuals acquire resources, and a per-unit-mass rate R2 at which individuals convert those resources into
new individuals: that is,
\[ P = \frac{R_1 R_2}{R_1 + R_2}. \]
∗ See his book ”Macroecology”, 1995, University of Chicago Press.
† Up to a factor of 2, the harmonic mean of two numbers a and b is the reciprocal of the sum of their reciprocals: 1/(1/a + 1/b) =
ab/(a + b). (The harmonic mean itself, the reciprocal of the average of the reciprocals, is 2ab/(a + b); the constant factor does not affect where P is maximized.)

Assuming both R1 and R2 are the following allometric functions of body mass M measured in kilograms (kg),
\[ R_1 = 2M^{3/4} \quad\text{and}\quad R_2 = 3M^{-1/4}, \]
find the body mass M that maximizes the reproductive power P, and show that this extremum is a
maximum for the case b1 = 0.75 and b2 = −0.25.
33. Suppose that we express the two rate functions in Problem 32 using the general form \(R_1 = c_1 M^{b_1}\) and
\(R_2 = c_2 M^{b_2}\) (so that b1 = 0.75 and b2 = −0.25 in Problem 32). Show in this case that the body mass that maximizes P is given by the expression
\[ M^* = \left(\frac{-c_2 b_1}{c_1 b_2}\right)^{1/(b_1 - b_2)}. \]

4.4 Applications to Optimal Behavior

The behavior of animals has been honed by natural selection to maximize the reproductive potential of individuals.
Thus, from an individual’s point of view, an organism acts in a way that maximizes the number of offspring it can
spawn or rear to sexual maturity. This number is referred to as an individual’s fitness. From a genetic point of
view, a gene encoding for a behavior that maximizes an individual’s fitness will have a greater representation in the
gene pool of future generations than a gene that encodes for a behavior that is detrimental to an individual’s fitness
(e.g. a gene that causes an individual to be excessively reckless, making it likely that the individual will die before
reaching sexual maturity). Theories of optimal behavior are based on the premise that organisms can maximize
their individual fitness by behaving in a certain way. Using models, one can develop hypotheses about these optimal
behaviors. These hypotheses can be tested experimentally or through comparative studies. While these models are
relatively crude, they often provide key insights into the behavior of animals.
In our first example, we obtain insights into the reason why Northwestern crows consistently drop whelks from
a rather specific height in an effort to get the shells to break open on impact. If they fly too low the shells require
too many drops to get them to break open. If they fly too high, they waste energy. Thus, assuming that crows have
evolved to minimize energy expenditures, scientists might be interested in testing this hypothesis by formulating a
suitable function to minimize. This function would characterize the number of drops, and hence work, required to
break open a whelk as a function of the height from which the shell is dropped. In addition to modeling the dropping
behavior of Northwestern crows, this section investigates optimal foraging in a patchy environment, optimal timing
of seed production, and optimal time to harvest crops. Beyond these examples, we present a key theorem, called the
Marginal Value Theorem, which has applications to problems maximizing or minimizing average rates of change.
Figure 4.11: A Northwestern Crow
Example 1. Northwestern crows and whelks
The Northwestern crow illustrated in Figure 4.11 feeds on whelks, a type of mollusk. To get the meat from
inside the whelk’s shell, individual crows lift them into the air and drop them onto a rock to break open the shell. A
biologist, Reto Zach, who studies the crows we first encountered in Example 6 of Section 3.2, observed that individual
crows typically drop the shells from a height of five meters. This led him to ask whether this behavior is optimal in
the sense of minimizing the amount of energy required to open a shell. After collecting some data by dropping whelks
from different heights, Reto Zach found that, on average, the number of drops required to break a whelk dropped
from h meters can be modeled by the function
\[ D(h) = 1 + \frac{20.4}{h - 0.84} \text{ drops}, \quad h > 0.84. \]
Note, this relationship implies that \(\lim_{h \to 0.84^+} D(h) = \infty\) which, in turn, implies that if h ≤ 0.84 the shell will never
open. Assuming that crows try to minimize the amount of work required to break a whelk shell, find the optimal
height from which a whelk should be dropped.
Solution. Since work is force times distance, the amount of work required to drop a shell of fixed weight is
proportionate to the total height the crow flies when breaking a whelk. The total height is given by the number of
drops times the height of the drop. In other words, up to a proportionality constant the work is given by
\[ W(h) = h\,D(h) = h + \frac{20.4h}{h - 0.84}. \]
To determine where this function takes on its smallest value we need to understand the graph of the function. It has
a vertical asymptote at h = 0.84. Taking the derivative yields
\[ W'(h) = 1 - \frac{0.84 \cdot 20.4}{(h - 0.84)^2} = \frac{h^2 - 1.68h - 16.4304}{(h - 0.84)^2}. \]
Since the denominator is positive wherever h ≠ 0.84, we only need to understand when the numerator is positive
or negative. Solving h² − 1.68h − 16.4304 = 0 for h yields h ≈ −3.3 meters and h ≈ 4.98 meters. Since this
quadratic corresponds to an upward facing parabola, we get the numerator of W ′ (h) is positive when h > 4.98 and
negative on the interval (0.84, 4.98). Hence, W (h) decreases on the interval (0.84, 4.98) and increases on the interval
(4.98, ∞) so that h ≈ 4.98 is a global minimum for h > 0.84. Hence, the height that minimizes the amount of work
is approximately 5 meters, as observed by biologists! □
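The calculus in Example 1 can be cross-checked numerically. The following Python sketch is an illustration only and is not part of Zach's analysis: the function names and the golden-section search routine are our own choices, and the search interval [1, 20] meters is an assumption made so that the minimum is bracketed. It compares the closed-form critical point obtained from W′(h) = 0 with a direct numerical minimization of W(h).

```python
import math

def drops(h):
    """Average number of drops D(h) needed from height h (meters), h > 0.84."""
    return 1 + 20.4 / (h - 0.84)

def work(h):
    """Total height flown, W(h) = h * D(h), which is proportional to the work."""
    return h * drops(h)

def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            # The minimum lies in [a, d]; shrink from the right.
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            # The minimum lies in [c, b]; shrink from the left.
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2

# Closed form from W'(h) = 0: (h - 0.84)^2 = 0.84 * 20.4, taking the root with h > 0.84.
h_exact = 0.84 + math.sqrt(0.84 * 20.4)

# Assumed search interval [1, 20] meters; W is decreasing then increasing on it.
h_numeric = golden_section_min(work, 1.0, 20.0)

print(f"closed form: {h_exact:.4f} m, numerical search: {h_numeric:.4f} m")
```

Both values should agree with the height h ≈ 4.98 meters found in the solution.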
The next example explores the optimal time for a plant to produce seeds, which is just one in a class of optimal
time-to-reproduction problems such as optimal time for a honey bee colony to swarm or for a semelparous fish (i.e.
breeds once and then dies), such as salmon, to return from the ocean to lay its eggs up-river.
Example 2. Optimal time for producing seeds
A particular plant is known to have the following growth and seed production characteristics. At time of planting
(t = 0) the seedling has a mass of 5 grams. At time t > 0 days after planting the seedling has grown into a plant
that weighs w(t) = 5 + 400t − t² grams. The plant has a gene that can be manipulated to control the age t at which
the plant matures. At maturity the number of seeds S(t) produced by the plant is given by
\[ S(t) = 0.1\,w(t) = 0.5 + 40t - 0.1t^2. \]
A farmer asks the geneticists to genetically engineer a plant line that accounts for the fact that on his farm, because
of losses from pests, drought and disease, a proportion
\[ P(t) = \frac{100}{100 + t} \]
of germinating seeds can be expected to develop and survive as plants to age t. What age-of-maturity should the
geneticist select for the plants to maximize the seed production of the mature crop for the farmer?
Solution. For every N seeds that the farmer plants on his land at time t = 0, he can count on N P (t) maturing at
time t > 0. The total yield from these plants is then
\[ Y(t) = N\,P(t)\,S(t) = \frac{100N(0.5 + 40t - 0.1t^2)}{100 + t}. \]
Since N is just a scaling factor that depends on the number of acres that the farmer plants, we can set it to any
convenient value such as N = 1. To find the maturation time that maximizes this yield, we need to understand the
first derivative:
\[
\begin{aligned}
Y'(t) &= \frac{d}{dt}\left[\frac{50 + 4000t - 10t^2}{100 + t}\right] \\
&= \frac{(100 + t)(4000 - 20t) - (50 + 4000t - 10t^2)}{(100 + t)^2} \\
&= \frac{-10t^2 - 2000t + 399950}{(100 + t)^2}.
\end{aligned}
\]
Thus Y′(t) exists for t > 0 and the derivative vanishes at solutions to the equation
\[ 10t^2 + 2000t - 399950 = 0. \]
We can use technology or the quadratic formula to obtain the roots
\[ t^* \approx -323.6 \quad\text{and}\quad t^* \approx 123.6. \]
Since Y ′ (0) > 0 and Y ′ (200) < 0, we get that Y increases on the interval [0, 123.6] and decreases on the interval
[123.6, ∞). Hence, Y (t) is maximized at t ≈ 123.6. You can verify this directly by plotting Y as a function of time,
as illustrated in Fig. 4.12. The vertical line indicates the optimal maturation time t∗ ≈ 123.6 days.
[Plot: seeds per plant (vertical axis, 0 to 1600) versus days (horizontal axis, 0 to 200).]
Figure 4.12: The top curve is the number of seeds S(t) produced by a plant that survives to age t. The bottom curve
is the expected number of seeds that plant will produce after taking into account the probability it may die before
starting to seed.
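The optimal maturation time in Example 2 can also be confirmed with a few lines of Python. This is a minimal sketch under the assumptions of the example (N = 1); the function names and the grid step of 0.1 day are our own choices for illustration. It computes the positive root of the quadratic from the solution and compares it with a brute-force search for the largest expected yield.

```python
import math

def survival(t):
    """Proportion P(t) of germinating seeds that survive to age t (days)."""
    return 100 / (100 + t)

def seeds(t):
    """Seeds S(t) produced by a plant that matures at age t."""
    return 0.5 + 40 * t - 0.1 * t**2

def expected_yield(t):
    """Expected seeds per seed planted, Y(t) with N = 1."""
    return survival(t) * seeds(t)

# Positive root of 10 t^2 + 2000 t - 399950 = 0 via the quadratic formula.
a, b, c = 10.0, 2000.0, -399950.0
t_star = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

# Independent check: brute-force search on a grid of step 0.1 day over [0, 200].
grid = [k / 10 for k in range(2001)]
t_grid = max(grid, key=expected_yield)

print(f"critical point: {t_star:.2f} days, grid maximizer: {t_grid:.1f} days")
print(f"expected yield at the optimum: {expected_yield(t_star):.1f} seeds")
```

The two estimates of the maximizer should agree to within the grid spacing.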
Optimal foraging and the Marginal Value Theorem

Very often, food is not distributed homogeneously over the environment, but occurs in discrete patches in the
environment. For fruit bats, a patch may correspond to a fruit tree or a stand of fruit trees. For a hummingbird
which feeds on the nectar of flowers, a patch may correspond to a single flower or a field of flowers. In optimal
foraging theory, we want to know how long an animal should continue to collect resources in a patch when it has the
choice of traveling to another resource rich patch. The question of when to leave a patch as resources in the patch
are being depleted is known as the optimal residence time problem.
Example 3. Optimal foraging in a multi-patch environment

Figure 4.13: A house martin parent feeding its young
House martins make sorties from their nests to collect food to bring back to their young. In an experiment carried
out in the early 1980s, two British scientists, D. M. Bryant and A. K. Turner∗ found that the travel time of martins
from a particular nest to nearby foraging areas ranged from half a minute to several minutes, and the weight of the
load of insects they collected and brought back to their nest to feed their chicks (see Figure 4.13) varied between 20
and 100 mg. On an average foraging bout, Bryant and Turner observed that these martins collected insects at the
rate of (roughly) 10 mg per minute from time of departure from the nest. Assume one of these martins encounters a
patch three minutes after leaving its nest and its cumulative load of insects after foraging for t minutes is given by
the function
\[ B(t) = \frac{200t}{6 + t} \text{ mg.} \]
If the martin is trying to maximize the average load accumulated per minute since leaving its nest, then what is the
optimal time for the martin to quit foraging in this patch?
Solution. Since it takes three minutes for the martin to reach the patch, the average load accumulated per minute
after t minutes in the patch is R(t) = B(t)/(t + 3). To determine the best time to leave the patch, we need to
understand the graph of R(t) for t ≥ 0. Taking the first derivative of R we get

\[
\begin{aligned}
R'(t) &= \frac{d}{dt}\left[\frac{200t}{(6 + t)(t + 3)}\right] = \frac{d}{dt}\left[\frac{200t}{t^2 + 9t + 18}\right] \\
&= \frac{200(t^2 + 9t + 18) - (2t + 9)\,200t}{(t^2 + 9t + 18)^2} \\
&= \frac{200(18 - t^2)}{(t^2 + 9t + 18)^2}.
\end{aligned}
\]
We have dR/dt = 0 when t² = 18. Equivalently t = ±√18 ≈ ±4.24 minutes. Only the positive solution is relevant.
Since it is easily shown that R′(0) > 0 and R′(18) < 0, R is increasing on the interval (0, √18) and decreasing on
the interval (√18, ∞). Hence, the maximum is achieved at t = √18, at which
\[ R(\sqrt{18}) \approx 11.44 \text{ mg per minute}, \]
which exceeds the background average rate of 10 mg per minute. This conclusion is reaffirmed by graphing R(t) as
follows:
∗ D.M. Bryant and A.K. Turner, 1982, Animal Behavior 30:845-856

[Plot: average load rate R(t) in mg per minute versus minutes spent in the patch (0 to 10).]
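The conclusion of Example 3 can be checked numerically as well. The Python sketch below is illustrative only; the function names and the central-difference step size are our own assumptions. It evaluates the average rate R(t) at t = √18 and confirms that the derivative changes sign there.

```python
import math

TRAVEL_TIME = 3.0  # minutes from the nest to the patch (from Example 3)

def load(t):
    """Cumulative insect load B(t) in mg after t minutes in the patch."""
    return 200 * t / (6 + t)

def average_rate(t):
    """Average load per minute since leaving the nest, R(t) = B(t)/(t + 3)."""
    return load(t) / (t + TRAVEL_TIME)

def derivative(f, t, h=1e-6):
    """Central-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

t_star = math.sqrt(18)
r_star = average_rate(t_star)

print(f"t* = {t_star:.2f} min, R(t*) = {r_star:.2f} mg per minute")
print(f"R'(t*) is approximately {derivative(average_rate, t_star):.1e}")
print(f"R'(0) > 0: {derivative(average_rate, 0.0) > 0}, "
      f"R'(18) < 0: {derivative(average_rate, 18.0) < 0}")
```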
In Example 3, the average rate of change was being maximized over a time interval. A fundamental result for
problems of this type is the Marginal Value Theorem.
Theorem 4.2. Marginal Value Theorem
Let V (t) be a function defined on an interval [a, ∞). If V (t) represents the accumulated value of the resource by
time t ≥ a, then the average rate of resource accumulation by time t is given by
\[ A(t) = \frac{V(t) - V(a)}{t - a}. \]
If a maximum or minimum of A(t) occurs at t∗ > a and V is differentiable at t = t∗, then this time t∗ satisfies the
equation
\[ V'(t^*) = \frac{V(t^*) - V(a)}{t^* - a}. \]
In other words, this maximum or minimum occurs at a time where the average rate of change equals the instantaneous
rate of change.
Proof. Since V′(t∗) exists and t∗ > a by hypothesis, A is differentiable at t∗, so A′(t∗) = 0 at any
extremum t∗ of A(t) lying in (a, ∞). Computing the derivative yields
\[
A'(t) = \frac{d}{dt}\left[\frac{V(t) - V(a)}{t - a}\right]
= \frac{V'(t)(t - a) - (V(t) - V(a))}{(t - a)^2} \quad \text{by the quotient rule.}
\]
Thus, provided t > a, A′(t) = 0 implies V(t) − V(a) = (t − a)V′(t). Equivalently, V′(t) = (V(t) − V(a))/(t − a). □
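As a quick check of Theorem 4.2, the result of Example 3 can be recovered without differentiating the average rate. The following worked step is an illustration under the setup of Example 3, with our own choice of bookkeeping: take a = −3 (the moment the martin leaves its nest), V(t) = 0 for −3 ≤ t ≤ 0, and V(t) = B(t) = 200t/(6 + t) for t > 0, so that V(a) = 0 and t − a = t + 3.

```latex
\begin{aligned}
V'(t) &= \frac{200(6 + t) - 200t}{(6 + t)^2} = \frac{1200}{(6 + t)^2}, \\
\frac{V(t) - V(a)}{t - a} &= \frac{200t}{(6 + t)(t + 3)}, \\
\frac{1200}{(6 + t)^2} = \frac{200t}{(6 + t)(t + 3)}
&\iff 6(t + 3) = t(6 + t) \iff t^2 = 18 .
\end{aligned}
```

The positive solution t = √18 ≈ 4.24 minutes agrees with the maximizer found in Example 3.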
Example 4. Optimal foraging of great tits
In a classic paper on animal behavior∗ , biologist Richard Cowie studied the foraging behavior of great tits by
constructing experimental trees in an aviary (see Figure 4.14). On these experimental trees, food was placed in
plastic containers in a manner that would allow Dr. Cowie to manipulate the average travel time T between food
∗ R. Cowie. 1977. Optimal foraging in great tits (Parus major ). Nature 268:137–139.

Figure 4.14: The experimental tree in Cowie’s experiments
containers. Through these experiments, Dr. Cowie estimated that the energy gained by a bird after eating from
a container for t ≥ 0 seconds is
\[ E(t) = 6.3587(1 - e^{-0.0081t}) \text{ calories.} \]
Assuming the great tits are maximizing their average energy gain, do the following:
a. Use the Marginal value theorem to determine the relationship between T and the optimal residence time
t in a patch.
b. Solve for T in terms of the optimal residence time and plot it.
c. Discuss your findings.
Solution.
a. Assume that at t = 0 the bird arrives at a food container. Since it takes T seconds to get to a container,
we are interested in the time interval [−T, ∞) where t = −T corresponds to the moment that the bird
begins traveling to the container. Since there is no energy gain during the flight (in fact there is some
loss that we are ignoring!), we define E(t) = 0 for t ≤ 0. Clearly, the maximum can not occur during the
interval [−T, 0]. By the marginal value theorem with a = −T , the time t at which the maximum occurs
must satisfy:
\[
\begin{aligned}
E'(t) &= \frac{E(t) - E(-T)}{t + T} \\
0.0515\,e^{-0.0081t} &= \frac{6.3587(1 - e^{-0.0081t})}{t + T}.
\end{aligned}
\]
b. Solving for T in terms of the optimal residence time t yields
\[
\begin{aligned}
0.0515\,e^{-0.0081t} &= \frac{6.3587(1 - e^{-0.0081t})}{t + T} \\
t + T &= \frac{6.3587(1 - e^{-0.0081t})}{0.0515\,e^{-0.0081t}} \\
t + T &= 123.5(e^{0.0081t} - 1) \\
T &= 123.5(e^{0.0081t} - 1) - t.
\end{aligned}
\]

Plotting this function yields
[Plot: travel time T (vertical axis, 0 to 60) versus optimal residence time t (horizontal axis, 0 to 100).]
c. The plot implies that as the travel time between patches increases, the residence time should increase.
Intuitively this conclusion is clear. Moreover, the plot makes very specific predictions about optimal
residence times. These predictions were tested by Cowie’s experiments. The following figure shows how
the data relates to our predictions.
Notice that in Cowie’s graph the axes are switched. The average observed values are plotted as solid
circles. The curve we found is plotted as a dashed line. While five of the twelve data points are very
close to the dashed line, the remaining seven data points lie significantly above it. In other words, for
these seven experiments, the birds were spending more time in the patches than predicted by the model.
One possible explanation for this discrepancy is that the model doesn’t account for the energetic costs
of traveling. Cowie adjusted the model to account for these energetic costs and the resulting prediction
is plotted as a solid curve in the figure above. In the problem set, you are asked to account for these
energetic costs.
□
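The relationship found in part b of Example 4 can also be explored numerically. The sketch below is illustrative only: the function names and the bisection routine are our own, and it uses the exact ratio 1/0.0081 ≈ 123.46 in place of the rounded constant 123.5, so its values differ slightly from the displayed formula. It evaluates the travel time T for which a given residence time t is optimal, and then inverts that relationship to predict the residence time for a given travel time.

```python
import math

E_MAX = 6.3587   # calories, asymptotic energy gain (from E(t) in Example 4)
RATE = 0.0081    # per second, rate constant (from E(t) in Example 4)

def energy(t):
    """Energy E(t) gained after t seconds at a food container."""
    return E_MAX * (1 - math.exp(-RATE * t))

def marginal_energy(t):
    """Instantaneous rate of gain E'(t)."""
    return E_MAX * RATE * math.exp(-RATE * t)

def travel_time(t):
    """Travel time T for which residence time t satisfies E'(t)(t + T) = E(t)."""
    return (math.exp(RATE * t) - 1) / RATE - t

def optimal_residence(T, lo=0.0, hi=1000.0, tol=1e-9):
    """Invert travel_time by bisection; travel_time is increasing for t > 0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if travel_time(mid) < T:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical travel times (seconds), chosen only to illustrate the trend.
for T in (10.0, 30.0, 60.0):
    t = optimal_residence(T)
    print(f"T = {T:4.0f} s -> optimal residence time t = {t:6.2f} s")
```

Longer travel times produce longer predicted residence times, in line with the discussion in part c.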

The marginal value theorem has a simple graphical interpretation, but also some limitations; both of which are
explored in the next example.
Example 5. Optimal time to harvest
Over a sixty-year period a forestry company has collected data on the profit P (t) of stands harvested at various
ages of t years. Initially, P (t) is negative because the costs required to bring in the heavy equipment needed to
harvest the trees exceed the value of the harvest itself. Once the trees reach a certain size, a profit is possible and
it steadily increases as the stand of trees ages. The company found that the function that best fit their data has the
following graph:
[Graph of the profit function P(t): vertical axis marked at 20, 40, and 60; horizontal axis t marked from 0.5 to 2.]
where the profit P is measured in thousands of US dollars and t is measured in years.
a. The company wants to maximize the profits it makes per year not taking into account the costs needed
to clear the land during the harvesting cycle (i.e. the so-called clear cutting part of the operation). Write
down the function A(t) that they want to maximize and illustrate the solution graphically.
b. The company wants to maximize the profits it makes per year, but now taking into account the costs
needed to clear the land during the harvesting (i.e. the clear cutting operation). Write down the function B(t)
that they want to maximize and illustrate the solution graphically.
c. Discuss the differences between the two solutions and the role of the Marginal Value Theorem.
Solution.
a. In the first part of this problem, the company is interested in maximizing the gross profit V(t) =
P(t) − P(0), where the initial cost P(0) < 0 has been subtracted from P(t) to remove its effect from consideration.
To find the point in time that maximizes the gross profit on an average accumulated rate basis, we apply
the Marginal Value Theorem to the function A(t) = (V(t) − 0)/(t − 0) (in this case the initial time and value are
both 0). According to the Marginal Value Theorem, if there is a local maximum at t∗, then it will satisfy
the equation V′(t∗) = V(t∗)/t∗. This solution is graphically represented by the line y = (V(t∗)/t∗) t that passes
through the points (0, 0) and (t∗, V(t∗)), has slope V(t∗)/t∗, and is tangent to V(t) at the point
t = t∗ (as depicted in the top graph of the illustration below).

b. If the company wants to maximize the average rate of accumulation of profit P(t), which is B(t) = P(t)/t,
then the Marginal Value Theorem does not apply because, as we can see from the lower graph in the
above illustration, the line y = B(t̂)t does not intersect with the function P (t) at the origin for any choice
of t̂. As we can see from the lower graph in the above illustration, the maximum value to the slope of
the line y = B(t̂)t that still intersects P (t) occurs at t̂ = t# . Thus t = t# maximizes the function B(t).
c. Because the maximizing solutions in the above two parts to this problem occur where the curve P (t) is
concave down, we see from the illustration that t∗ < t# . This implies that if the company does not take
the cost of the harvesting process itself into account (i.e. the costs to clearcut the forest stand which are
represented by P (0)) then the company will always end up harvesting earlier than if it does take this cost
into account. The Marginal Value Theorem only has a role to play on an interval t ∈ [a, ∞) when the
average resource (in this case profit) accumulation rate function and the derivative of the profit function
coincide at t = a. In this problem a = 0 and this latter condition only holds in Part a. but not Part b.
as we see by the fact that the red and black curves in our illustrative graphs coincide at a = 0 in the first
case but not the second.
Our final example introduces the concept of discounting when optimizing a sustainable stream of revenue cal-
culated for all time in the future. We have left this problem to the end of the section because it illustrates how a
general formulation of a problem leads to general insights regarding its solution.
Discounting arises from the fact that if someone promises to pay you D dollars next year, and the current interest
rate (after adjusting for inflation) is r%, then you may be equally happy to receive D/(1 + r/100) dollars this year.
This reasoning follows because after investing D/(1 + r/100) dollars this year, you recapture [D/(1 + r/100)](1 + r/100) = D dollars one
year later. Instead of working with interest rates r and discrete time (e.g. payment at yearly intervals), economists
prefer to work with a continuous time analog which involves a discount rate δ and a discounting function \(e^{-\delta t}\)
of continuous time t. As we saw in Example 6, Section 3.7, compounding at discrete time intervals involves the
exponential function in the limit as the compounding interval approaches 0. Thus economists use the discount factor
\(e^{-\delta t}\) to reduce D dollars needed at time t in the future to their current value \(De^{-\delta t}\) dollars now.
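To see how the continuous discount factor relates to discrete compounding, consider the following Python sketch. The dollar amount, rate, and time horizon are hypothetical values chosen only for illustration, and the function names are our own. The sketch discounts a future payment using interest compounded m times per year and shows the result approaching the continuous value as m grows.

```python
import math

def discount_yearly(amount, r_percent, years=1):
    """Present value with interest of r percent compounded once per year."""
    return amount / (1 + r_percent / 100) ** years

def discount_compounded(amount, delta, t, m):
    """Present value when the rate delta is compounded m times per year."""
    return amount / (1 + delta / m) ** (m * t)

def discount_continuous(amount, delta, t):
    """Present value using the continuous discount factor exp(-delta * t)."""
    return amount * math.exp(-delta * t)

# Hypothetical illustration: 100 dollars due in 10 years, discount rate 0.05 per year.
D, delta, t = 100.0, 0.05, 10
for m in (1, 12, 365, 10**6):
    print(f"m = {m:>7}: {discount_compounded(D, delta, t, m):.4f}")
print(f"continuous: {discount_continuous(D, delta, t):.4f}")

# One-year check from the text: D/(1 + r/100) invested at r% grows back to D.
invested = discount_yearly(D, 5)
print(f"{invested:.4f} invested at 5% returns {invested * 1.05:.4f}")
```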

Example 6. Optimal rotation period for a plantation
In the mid-19th century a German forester by the name of Faustmann developed a theory for the optimal rotation
period of a plantation. First he calculated that if one planted a stand and harvested it every T years, and received
the same value V(T) each time, then the sum of all the discounted amounts (i.e. the sum of \(V(T)e^{-\delta T}\) obtained
after T years, \(V(T)e^{-2\delta T}\) obtained after 2T years, \(V(T)e^{-3\delta T}\) obtained after 3T years and so on for all time into the
future) would constitute the so-called present value P(T) of the stand given by the formula
\[ P(T) = \frac{V(T)}{e^{\delta T} - 1}. \]
Now continue his analysis as follows:
a. Using his formula for P (T ) find a general expression for the optimal stand rotation period T ∗ that is
defined to be the value of T on (0, ∞) that maximizes the present value P (T ) of the stand.
b. What does the expression in a. imply as δ → 0?

c. Use your technology to find the optimal rotation period when \(V(T) = \frac{2T^{5/2}}{1 + T^2} - 1\) and the discount rate
is δ = 0.1.
Solution.
a. The optimal rotation period T ∗ is an extremum of P (T ). Thus, if a maximum exists on an open interval,
T ∗ will satisfy the equation P ′ (T ) = 0 where

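\[
P'(T) = \frac{d}{dT}\left[\frac{V(T)}{e^{\delta T} - 1}\right]
= \frac{V'(T)(e^{\delta T} - 1) - V(T)\,\delta e^{\delta T}}{(e^{\delta T} - 1)^2} \quad \text{(quotient rule).}
\]
Therefore, for δ > 0,
\[ P'(T) = 0 \;\Rightarrow\; V'(T) = \frac{\delta e^{\delta T}}{e^{\delta T} - 1}\,V(T). \]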
b. Using L’Hôpital’s rule to calculate the limit as δ → 0, we obtain that the optimal rotation period T∗ in
this case will satisfy the equation
\[ V'(T) = V(T) \lim_{\delta \to 0} \frac{\delta e^{\delta T}}{e^{\delta T} - 1} = \frac{V(T)}{T}. \]
By Part b. of the previous example, this equation implies that T∗ maximizes the average profit
accumulation rate over each harvesting period when δ = 0.
c. From part a. and the specific form for V(T), the optimal rotation period when δ = 0.1 is the solution to
\[
\begin{aligned}
\frac{2T^{3/2}(T^2 + 5)}{2(1 + T^2)^2} &= \left(\frac{2T^{5/2}}{1 + T^2} - 1\right)\frac{0.1\,e^{0.1T}}{e^{0.1T} - 1} \\
T^{3/2}(T^2 + 5)(e^{0.1T} - 1) &= 0.1(1 + T^2)(2T^{5/2} - T^2 - 1)\,e^{0.1T} \\
T^* &= 2.68361 \quad \text{using technology.}
\end{aligned}
\]
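The phrase "using technology" in part c can be made concrete with a short root-finding sketch. The Python code below is one possible approach rather than the method used by the authors: the bisection routine, the function names, and the bracketing interval [1, 5] are our own assumptions (the first-order condition changes sign on that interval). It solves the first-order condition from part a for δ = 0.1 and also checks the closed form for P(T) against a long partial sum of discounted harvests.

```python
import math

def V(T):
    """Stand value V(T) = 2 T^(5/2) / (1 + T^2) - 1 from part c."""
    return 2 * T**2.5 / (1 + T**2) - 1

def dV(T):
    """Derivative V'(T) = T^(3/2) (T^2 + 5) / (1 + T^2)^2."""
    return T**1.5 * (T**2 + 5) / (1 + T**2) ** 2

def present_value(T, delta):
    """Present value P(T) = V(T) / (exp(delta T) - 1)."""
    return V(T) / (math.exp(delta * T) - 1)

def first_order_condition(T, delta):
    """Numerator of P'(T): V'(T)(e^{delta T} - 1) - delta e^{delta T} V(T)."""
    growth = math.exp(delta * T)
    return dV(T) * (growth - 1) - delta * growth * V(T)

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2

delta = 0.1
# Assumed bracket [1, 5]: the condition is positive at T = 1 and negative at T = 5.
T_star = bisect(lambda T: first_order_condition(T, delta), 1.0, 5.0)

# Partial sum of V(T) e^{-delta k T} over k = 1, ..., 499 harvests.
partial_sum = sum(V(T_star) * math.exp(-delta * k * T_star) for k in range(1, 500))

print(f"T* = {T_star:.5f}")
print(f"P(T*) = {present_value(T_star, delta):.5f}, partial sum = {partial_sum:.5f}")
```

The computed root should agree with the value of T∗ reported in part c, and the partial sum should match the closed-form present value closely.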
Problem Set 4.4


In Problems 1 to 6, the amount of energy a hummingbird gains after remaining in a patch for t seconds is given. For
each problem, find how long a hummingbird should stay in a patch if it wants to maximize its average energy intake
rate.∗
1. The travel time between patches is 15 seconds and
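\[ f(t) = \frac{180t}{1 + 0.15t} \text{ calories} \]
\[ f(t) = \frac{180t}{1 + 0.15t} \text{ calories} \]
\[ f(t) = \frac{360t}{1 + 0.5t} \text{ calories} \]
\[ f(t) = \frac{360t}{1 + 0.5t} \text{ calories} \]
\[ f(t) = \frac{360t}{1 + 0.3t} \text{ calories} \]
\[ f(t) = \frac{360t}{1 + 0.3t} \text{ calories} \]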
In problems 7 to 10, rework Example 5 with the given graphs.
7. [Graph not reproduced: horizontal axis from 0 to 2, vertical axis from −10 to 90]
8. [Graph not reproduced: horizontal axis from 0 to 2, vertical axis from −20 to 80]
∗ At the web site, http://www.cquest.utoronto.ca/cgi-bio150/foraging/tutorial.cgi?page=intro, you can find a game involving optimal
foraging of hummingbirds as they fly from one patch of flowers to another patch of flowers. In this game, the average flight time between
patches of flowers is given.

9. [Graph not reproduced: horizontal axis from 0 to 2, vertical axis from −40 to 60]
10. [Graph not reproduced: horizontal axis from 0 to 2, vertical axis from −40 to 60]
Assume the house martins in Example 3 can choose between two patches. In Problems 11 to 16, the time to fly to a patch and the energy yield as a function of patch residence time ($t$ minutes) are given for two patches. If an individual can only visit one patch and wants to maximize the average amount of calories it receives, which patch of each pair should it choose?
11. $B(t) = \frac{150t}{3+t}$ Calories with a travel time of 2 minutes or $B(t) = \frac{250t}{5+t}$ Calories with a travel time of 3 minutes.
12. $B(t) = \frac{150t}{3+t}$ Calories with a travel time of 1 minute or $B(t) = \frac{250t}{5+t}$ Calories with a travel time of 2 minutes.
13. [Problem statement not recoverable; the two energy yields have numerators 150t and 150t.]
14. [Problem statement not recoverable; the two energy yields have numerators 150t and 150t.]
15. $B(t) = \frac{250t}{5+t}$ Calories with a travel time of 2 minutes or $B(t) = \frac{150t}{4+t}$ Calories with a travel time of 3 minutes.
16. $B(t) = \frac{250t}{4+t}$ Calories with a travel time of 2 minutes or $B(t) = \frac{150t}{4+t}$ Calories with a travel time of 15 seconds.
Assume in the Optimal Time to Harvest Example 5 that the profit function P (t) has the form specified in Problems
17 to 22. For these profit functions find the optimal age at which to harvest the stands to maximize profit. Note, in
all these problems, t represents the number of decades rather than years.
17. $P(t) = \frac{2t^{5/2}}{1+t^2} - 1$.
18. $P(t) = \frac{3t^{5/2}}{1+t^2} - 1$.
19. $P(t) = \frac{2t^{5/2}}{1+2t^2} - 1$.
20. $P(t) = \frac{3t^{5/2}}{1+2t^2} - 1$.
21. $P(t) = \frac{5t^{5/2}}{1+2t^2} - 2$.
22. $P(t) = \frac{4t^{5/2}}{1+2t^2} - 3$.


23. Find the optimal rotation period for a forest stand which has a value $V(T) = \frac{2T^{5/2}}{1+T^2} - 1$ when $\delta = 0.2$.
24. Find the optimal rotation period for a forest stand which has a value $V(T) = \frac{2T^{5/2}}{1+T^2} - \frac{3}{2}$ when $\delta = 0.1$.
25. Find the optimal rotation period for a forest stand which has a value $V(T) = \frac{(7/3)T^{5/2}}{1+T^2} - 1$ when $\delta = 0.15$.
26. Find the optimal rotation period for a forest stand which has a value $V(T) = \frac{(5/3)T^{5/2}}{2/3+T^2} - 1$ when $\delta = 0.1$.
27. At the NCTM Illuminations web site, students are encouraged to collect data on how many drops are required to break a blanched peanut in two pieces. The sample data provided at this web site are shown in the following graph
[Graph not reproduced: number of drops (0 to 20) plotted against height in cm (10 to 60)]
and can be modeled by the function
$$f(h) = 0.8 + \frac{80}{h - 10} \text{ drops}$$
where $h$ is the height in centimeters. Suppose that the "peanut hummingbird" collects peanuts and wants to minimize the amount of work required to break a peanut. Determine the height which minimizes the amount of work to break open the peanuts.
28. In Example 4, we found how the optimal residence time for a great tit depended on the travel time between patches. While our prediction described the data reasonably well, more than half of the data points lay above the optimal curve. Dr. Cowie proposed that part of the explanation was that the birds expend energy traveling between patches and searching for food within a patch. In this problem, you will determine how these expenditures of energy influence the optimal residence time. Let
$$E(t) = 6.3587\,(1 - e^{-0.0081t}) \text{ calories}$$
denote the amount of energy gained by a bird after residing in a patch for $t$ seconds. Assume that the bird requires $T$ seconds to travel to the patch. Dr. Cowie found that great tits expend approximately 0.697 calories per second while traveling between patches and expend approximately 0.155 calories per second while searching for food in a patch.
a. Write a function $V(t)$ that represents the net gain in energy in a patch after residing there for $t \geq 0$ seconds.
b. Use the marginal value theorem to find an expression relating the optimal residence time $t$ to the travel time $T$.
c. Compare your solution to the solution found in Example 4.

29. Suppose the crop developed by plant geneticists, as discussed in Example 2 (that is, the weight of the crop $t$ days after planting satisfies the growth equation $w(t) = 5 + 400t - t^2$), is grown in a location that is relatively pest free, so that
$$P(t) = \frac{900}{900 + t},$$
but the crop must be harvested on or before the first frosty day of fall. Suppose the crop has relative value 1 when harvested at its optimum time of maturity $T$, as represented by the day on which the yield $Y = aw(t)$ is maximized, and that value is reduced by $10t_e$%, where $t_e$ is the number of days prior to $T$ that harvest actually occurs. If the expected number of days $t_s$ in the growing season (that is, the number of frost-free days plus 1) is equally likely to fall on any day from day 165 to day 190, then what is the expected value of the harvest in any year?
30. When the larva of the codling moth (Cydia pomonella) hatches from its egg, it goes looking for an apple. The period between hatching and finding an apple is called the search period. The search period $S$ seems to be a function of the temperature, as shown in Table 4.1.

Table 4.1: Search Period for the Codling Moth

Temperature (°C)    S (days)
20                  0.129
21                  0.122
22                  0.116
23                  0.112
24                  0.109
25                  0.106
26                  0.105
27                  0.104
28                  0.104
29                  0.105
30                  0.106

Following the lead of Shaffer and Gold (see Section 4.2, Example 8), find $1/S$ for each data value and then use technology to fit a quadratic function to this data. Find the largest and smallest value of this fitted function $S$.
31. A forest economist estimated that, in a plantation of a particular species of tree, the number of board feet that can be harvested as a function of the age of the plantation is as given in Table 4.2.

Table 4.2: Harvest Yield for a Lumber Crop

Age (years)    Yield (board feet per acre)
15             6013
20             7021
25             8793
30             9411
35             9786
40             9958
45             9921
50             9766

By using your technology to fit a quadratic function to this data, estimate at what age the plantation should be harvested to maximize the yield of board feet per acre.
32. By using your technology to fit a cubic equation to the data in Problem 31, find the age in [15, 50] at which the plantation represented by this data should be harvested to maximize the yield.

33. By using your technology to fit a quartic equation to the data in Problem 31, find the age in [15, 50] at which the plantation represented by this data should be harvested to maximize the yield.
34. By using your technology to fit a quintic equation to the data in Problem 31, find the age in [15, 50] at which the plantation represented by this data should be harvested to maximize the yield.

4.5 Linearization and Difference Equations

As we have seen, difference equations $x_{n+1} = f(x_n)$ are useful for describing biological dynamics of varying complexity. The simplest dynamics occur at an equilibrium: a solution of the difference equation that remains constant for all time. While equilibria can be easy to identify, their biological relevance depends on their stability. Many biological systems, when perturbed, naturally return to the equilibrium state around which they operate. The temperature of our bodies is a case in point. If our temperature is perturbed because of an infection, it returns to its equilibrium value of 98.6 °F once we are well again. Not all equilibria, however, are stable. If we stand up a 6-month-old child, the child may stay upright for a second or two, but until the child is around a year old, he or she will soon fall over. The reason is that standing vertically, without feedback control from our muscles constantly correcting our tendency to fall over, is an unstable situation.
Thus, when a biological system is perturbed away from equilibrium, it may do one of two things. First, it may return to the equilibrium state, in which case the equilibrium is considered stable. Alternatively, even if the perturbation is small, the system may continue to drift away from the equilibrium. In this case, the equilibrium is unstable. In this section, we make the notion of stability precise and provide a simple algebraic method for checking stability. This method relies on linearizing the difference equation near the equilibrium. These ideas and methods are applied to models of population growth and population genetics.
We conclude the section by considering another application of linearization and difference equations, namely, numerically solving a nonlinear equation. This numerical method is a commonly employed alternative to the bisection method presented in Example 9 of Section 2.3.
Equilibrium stability
We begin with the following example which motivates the notion of a stable equilibrium.
Example 1. Logistic equation
In Example 7 of Section 2.5, we introduced the discrete logistic equation which is a simple model of population
growth. If xn denotes the population density (e.g. average number of individuals per acre) in the n-th generation,
then the model is given by
xn+1 = xn + rxn (1 − xn /K)
where r is the per-capita growth rate at low densities and K is the environmental carrying capacity of the population.
Assume $K = 100$, in which case $x = 0$ and $x = 100$ are the equilibria for the model. For $r = 0.5, 1.5, 3.0$, simulate the model for the initial condition $x_0 = 99$. Discuss what you find.
Solution. Simulating the model with x0 = 99 for 25 iterations yields the following figures:
[Figure not reproduced: three panels plotting density against time (0 to 25) for r = 0.5, r = 1.5, and r = 3.0]
When r = 0.5, the population density gradually increases from the density 99 to the equilibrium density 100. When
r = 1.5, the population density exhibits oscillations that dampen and eventually approach the equilibrium density
100. When r = 3, the population exhibits irregular oscillations and never approaches the equilibrium density 100
despite having started near this equilibrium density.
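The simulations in Example 1 are easy to reproduce. The following is a minimal sketch in Python, assuming the model, carrying capacity, initial condition, and number of iterations stated in the example; the function name `simulate_logistic` is an illustrative choice.

```python
def simulate_logistic(r, x0=99.0, K=100.0, steps=25):
    """Iterate x_{n+1} = x_n + r x_n (1 - x_n / K) and return [x_0, ..., x_steps]."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + r * x * (1 - x / K))
    return xs

trajectories = {r: simulate_logistic(r) for r in (0.5, 1.5, 3.0)}
for r, xs in trajectories.items():
    spread = max(abs(x - 100) for x in xs)
    print(f"r = {r}: x_25 = {xs[-1]:.4f}, largest distance from 100 = {spread:.2f}")
```

Plotting each list against its index gives the three panels described in the solution.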

Example 1 illustrates that some solutions started near an equilibrium approach the equilibrium, while other solutions started near an equilibrium move away. These observations suggest the following definitions.
Equilibrium stability
An equilibrium $x^*$ for $x_{n+1} = f(x_n)$ is stable provided that there exists an open interval $(a, b)$ containing $x^*$ such that
$$\lim_{n \to \infty} x_n = x^* \quad \text{and} \quad x_n \in (a, b) \text{ for all } n$$
whenever $x_0$ lies in $(a, b)$. An equilibrium $x^*$ for $x_{n+1} = f(x_n)$ is unstable provided that there is an interval $(a, b)$ containing $x^*$ such that all solutions $x_n$ eventually leave $(a, b)$ whenever $x_0$ lies in $(a, b)$ but $x_0 \neq x^*$.
Stated more simply, stability of an equilibrium means that if the solution starts near the equilibrium then it remains near the equilibrium and asymptotically approaches the equilibrium. Alternatively, solutions starting near (but not at) an unstable equilibrium eventually move further away from the unstable equilibrium.
We note that x = x∗ is stable if there is an interval (a, b) containing x∗ such that
1. the image of (a, b) under f lies in (a, b), and
2. limn→∞ xn = x∗ for all solutions xn with x0 ∈ (a, b)
The only difference between these two conditions and our initial definition of stability is condition 1. However, in
the problem set you are challenged with proving that condition 1 is equivalent to requiring that xn lies in (a, b) for
all n ≥ 1 whenever x0 lies in (a, b).
Example 2. Stability the hard way
Find the equilibria of the following difference equations and verify stability using the definitions of stable and unstable.
a. $x_{n+1} = x_n/2$
b. $x_{n+1} = x_n^2$
Solution.
a. The equilibria are given by solutions of $x = x/2$. Hence, the only equilibrium is $x = 0$. Given any $x_0$, using the methods developed in Section 1.7, it follows that $x_n = \frac{1}{2^n}x_0$. Therefore, given any $a > 0$, we get that $\lim_{n\to\infty} x_n = 0$ for any $x_0$ in $(-a, a)$. In addition, the image of $(-a, a)$ under $f$ is $(-a/2, a/2)$, which lies in $(-a, a)$. Therefore, $x^* = 0$ is stable.
b. The equilibria of $x_{n+1} = x_n^2$ must satisfy $x = x^2$. Hence, the equilibria are given by $x = 0$ and $x = 1$. For any $x_0$, we have that $x_1 = x_0^2$, $x_2 = x_1^2 = x_0^4$, $x_3 = x_2^2 = x_0^8$. Hence, $x_n = x_0^{2^n}$. If $x_0$ lies in the interval $(-1, 1)$, then $\lim_{n\to\infty} x_0^{2^n} = 0$. Moreover, the image of $(-1, 1)$ under the function $f(x) = x^2$ is $[0, 1)$, which lies in $(-1, 1)$. Hence 0 is a stable equilibrium for this difference equation. For any $x_0 > 1$ or $x_0 < -1$, $x_n = x_0^{2^n}$ approaches $+\infty$ as $n$ approaches $\infty$. Hence, solutions starting slightly above 1 move away from 1, while solutions starting slightly below 1 lie in $(-1, 1)$ and converge to 0. In either case the solution leaves any small interval around 1, so the equilibrium 1 is unstable.
Example 3. Stability of linear difference equations
Consider the linear difference equation

xn+1 = rxn

For this difference equation, the origin, x = 0, is always an equilibrium. Determine for which r values, the origin is
stable or unstable.
Solution. The solution of this difference equation is given by $x_n = r^n x_0$. Suppose $x_0 \neq 0$. If $|r| < 1$, then $|x_n| = |r|^n |x_0|$ is decreasing to zero at a geometric rate. Therefore, if $|r| < 1$, then $x = 0$ is stable. Alternatively, if $|r| > 1$, then $|x_n| = |r|^n |x_0|$ is increasing without bound. Hence, $x = 0$ is unstable when $|r| > 1$. If $r = 1$, then $x_n = x_0$ for all $n$. Hence, $x_n$ neither approaches nor moves away from 0, so that 0 is neither stable nor unstable when $r = 1$. Similarly, if $r = -1$, you should show that $x = 0$ is neither stable nor unstable.
Stability via Linearization

While stability of an equilibrium can be verified directly using the definition, this method is somewhat awkward. To
make things easier, we can take advantage of linearization and our work in Example 3.
Suppose x∗ is an equilibrium of xn+1 = f (xn ) and f is differentiable at x∗ . If we approximate f by its tangent
line at x = x∗ , we get
f (x) ≈ f (x∗ ) + f ′ (x∗ )(x − x∗ )

= x∗ + f ′ (x∗ )(x − x∗ )
where the second line follows from the fact that x∗ is an equilibrium. Using this linear approximation and setting
r = f ′ (x∗ ), we get
xn+1 ≈ x∗ + r(xn − x∗ )
Equivalently
(xn+1 − x∗ ) ≈ r(xn − x∗ )
If we make the change of variables yn = xn − x∗ , then
yn+1 ≈ r yn
Example 3 suggests that if |r| < 1 and x0 is sufficiently close to x∗ , then we expect that yn approaches zero at a
geometric rate. Equivalently, xn approaches x∗ at a geometric rate. Alternatively, if |r| > 1 and x0 is sufficiently
close to x∗ , then we expect that yn increases initially at a geometric rate. Equivalently, xn initially moves away from
x∗ at a geometric rate. As it turns out, all of these statements hold provided that xn is sufficiently close to x∗ . In
particular, the following theorem can be proven:
Theorem 4.3. Stability via Linearization
If xn+1 = f (xn ) has an equilibrium at x = x∗ and r = f ′ (x∗ ) exists, then

Stability If |r| < 1, then x∗ is stable.
Instability If |r| > 1, then x∗ is unstable.
Notice that the linearization is inconclusive about stability if |r| = 1.
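The test in Theorem 4.3 lends itself to a quick numerical check. The following is a minimal sketch in Python, assuming the derivative is estimated by a central difference; the helper names, the step size `h`, and the tolerance `tol` used to flag the inconclusive case are illustrative choices, not part of the theorem. It is applied to the logistic model of Example 1 with K = 100.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x); h is an illustrative step size."""
    return (f(x + h) - f(x - h)) / (2 * h)

def classify_equilibrium(f, x_star, tol=1e-6):
    """Apply the linearization test at an equilibrium x_star of x_{n+1} = f(x_n)."""
    r = derivative(f, x_star)
    if abs(r) < 1 - tol:
        return "stable"
    if abs(r) > 1 + tol:
        return "unstable"
    return "inconclusive"  # |f'(x*)| is numerically indistinguishable from 1

def logistic(r, K=100.0):
    """Return the map x -> x + r x (1 - x / K) from Example 1."""
    return lambda x: x + r * x * (1 - x / K)

for r in (0.5, 1.5, 2.0, 3.0):
    print(r, classify_equilibrium(logistic(r), 100.0))
```

When a formula for the derivative is available, evaluating it directly is more reliable than the finite-difference estimate used here.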
Example 4. Logistic revisited
Consider the logistic difference equation
xn+1 = xn + rxn (1 − xn /100)
with r > 0. Determine for which r values x∗ = 100 is stable.
Solution. Let f (x) = x + rx(1 − x/100). To determine whether an equilibrium is stable or not, we need to compute
f ′ (x) = 1 + r − rx/50 = 1 + r(1 − x/50)

and evaluate at x = 100

f ′ (100) = 1 + r(1 − 2) = 1 − r
For stability, we need that $|1 - r| < 1$. Equivalently,
$$-1 < 1 - r < 1$$
$$-2 < -r < 0$$
$$2 > r > 0$$
Hence, the equilibrium $x^* = 100$ is stable provided that $0 < r < 2$. This conclusion is consistent with our simulations in Example 1. Indeed, for $r = 0.5$ and $r = 1.5$, the simulations approached the equilibrium $x^* = 100$. However, for $r = 3$, the simulation oscillated irregularly and never converged to any density.
Figure 4.15: Photos of the budworm moth, Colorado potato beetle, and the meadow plant bug.
Example 5. Stability of insect population dynamics
Biology professor T. S. Bellows investigated the ability of several difference equations to describe the population dynamics of insects. He found that the so-called generalized Beverton-Holt model provided the best description. If $x_n$ denotes the population density in the $n$th generation, then the model is of the form
$$x_{n+1} = \frac{r x_n}{1 + x_n^b}$$
where r is the intrinsic fitness of the population and b measures the abruptness of density dependence. For three
insect species, Professor Bellows found the following parameter estimates:
Budworm moth r = 3.5 and b = 2.7
Colorado potato beetle r = 75 and b = 4.8
Meadow plant bug r = 2.2 and b = 1.4
These different insects are shown in Fig. 4.15.
a. Use these parameter estimates to determine which population, according to the model, supports a stable
equilibrium.
b. For the species that do not support a stable equilibrium simulate their dynamics.

Solution.
a. To begin with, we need to find the equilibria of the model, which must satisfy $x = \frac{rx}{1 + x^b}$. Equivalently, $x = 0$ or
$$1 = \frac{r}{1 + x^b}$$
$$1 + x^b = r$$
$$x^b = r - 1$$
$$x = (r - 1)^{1/b}$$
Hence, for the budworm moth, the equilibria are given by
$x = 0$ and $x = 2.5^{1/2.7} \approx 1.40$.
For the Colorado potato beetle, the equilibria are given by
$x = 0$ and $x = 74^{1/4.8} \approx 2.45$.
For the meadow plant bug, the equilibria are given by
$x = 0$ and $x = 1.2^{1/1.4} \approx 1.14$.
Let $f(x) = \frac{rx}{1 + x^b}$. To determine the stability of these equilibria, we need to evaluate the derivative
$$f'(x) = r\,\frac{1 + x^b - b x^{b-1} x}{(1 + x^b)^2} = r\,\frac{1 + (1 - b)x^b}{(1 + x^b)^2}$$
at each equilibrium. Notice that $f'(0) = r$. Since $r > 1$ for all the species, 0 is an unstable equilibrium for all species. For the budworm moth, $f'(1.4) \approx -0.93$. Since $|-0.93| = 0.93 < 1$, the equilibrium $x \approx 1.4$ is stable for the budworm moth model. For the Colorado potato beetle, $f'(2.45) \approx -3.75$. Since $|-3.75| > 1$, the equilibrium $x \approx 2.45$ is unstable for the Colorado potato beetle model. Therefore, the Colorado potato beetle model has no stable equilibria. For the meadow plant bug, $f'(1.14) \approx 0.24$. Since $0.24 < 1$, the equilibrium $x \approx 1.14$ is stable for the meadow plant bug model.
b. Since all of the equilibria for the Colorado potato beetle model are unstable, we can ask what the long-term behavior of a non-equilibrium solution is. Simulating the model with $x_0 = 2.4$ (a value "close to" the equilibrium 2.45) yields the following numerical solution:
[Figure not reproduced: density (0 to 45) plotted against time (0 to 50)]
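Both parts of Example 5 can be checked numerically. The following is a minimal sketch in Python, assuming the parameter estimates listed in the example and the derivative formula from part a; the dictionary and function names are illustrative. Because the derivative is evaluated here at the unrounded equilibrium, the last digit may differ slightly from values computed at the rounded equilibria quoted in the solution.

```python
SPECIES = {
    "budworm moth": (3.5, 2.7),
    "Colorado potato beetle": (75.0, 4.8),
    "meadow plant bug": (2.2, 1.4),
}

def f(x, r, b):
    """Generalized Beverton-Holt map f(x) = r x / (1 + x^b)."""
    return r * x / (1 + x**b)

def f_prime(x, r, b):
    """Derivative f'(x) = r (1 + (1 - b) x^b) / (1 + x^b)^2."""
    return r * (1 + (1 - b) * x**b) / (1 + x**b) ** 2

def positive_equilibrium(r, b):
    """Nonzero equilibrium x = (r - 1)^(1/b)."""
    return (r - 1) ** (1 / b)

slopes = {}
for name, (r, b) in SPECIES.items():
    x_star = positive_equilibrium(r, b)
    slopes[name] = f_prime(x_star, r, b)
    verdict = "stable" if abs(slopes[name]) < 1 else "unstable"
    print(f"{name}: x* = {x_star:.2f}, f'(x*) = {slopes[name]:.2f}, {verdict}")

def simulate(r, b, x0, steps):
    """Return [x_0, ..., x_steps] for x_{n+1} = f(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1], r, b))
    return xs

# Part b: Colorado potato beetle started near the unstable equilibrium
beetle = simulate(75.0, 4.8, 2.4, 50)
print(f"beetle densities range from {min(beetle):.4f} to {max(beetle):.2f}")
```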

Stability of monotone difference equations

A special, yet important, class of difference equations was introduced in Section 2.5. For these difference equations, $x_{n+1} = f(x_n)$, $f$ is a continuous and increasing function over some domain of interest. Within this domain, the solutions to this difference equation are monotone (i.e. either increasing for all $n$ or decreasing for all $n$). As a consequence of this monotonicity, it is possible to provide a simple graphical approach to stability for these difference equations.
Theorem 4.4. Stability of Monotone Difference Equations
Let $f$ be a continuous, increasing function on an interval $(a, b)$. Let $x^*$ be an equilibrium for $x_{n+1} = f(x_n)$ that lies in $(a, b)$. Then
Stable If $f(x) > x$ for $x$ in $(a, x^*)$ and $f(x) < x$ for $x$ in $(x^*, b)$, then $x^*$ is stable. In particular, $\lim_{n\to\infty} x_n = x^*$ whenever $x_0$ lies in $(a, b)$.
Unstable If $f(x) < x$ for $x$ in $(a, x^*)$ and $f(x) > x$ for $x$ in $(x^*, b)$, then $x^*$ is unstable. In particular, $x_n$ leaves $(a, b)$ for some $n$ whenever $x_0$ lies in $(a, x^*)$ or $(x^*, b)$.
Combined with the Monotone Convergence Theorem in Section 2.5, this stability theorem allows us to determine
the fate of solutions to difference equations where f is a continuous, increasing function.
Example 6. A graphical approach to stability
Consider the difference equation

xn+1 = f (xn )
where the graph of $f$ is given by
[Graph not reproduced: y = f(x) plotted for x and y between −0.2 and 1.2]
Assuming $f$ is increasing, identify the equilibria and determine their stability.
Solution. The equilibria correspond to points where the graph of $y = x$ intersects the graph of $y = f(x)$. These intersections occur at $x = 0$, $\frac{1}{2}$, and 1. Inspection of the graph of $f$ yields that $f(x) > x$ for $x$ in $(-0.2, 0)$ and $(0.5, 1)$. Alternatively, $f(x) < x$ for $x$ in $(0, 0.5)$ and $(1, 1.2)$. Applying the stability theorem for monotone difference equations implies that 0 and 1 are stable, while 0.5 is unstable. Moreover, $x_n$ converges to 0 whenever $x_0$ lies in $(-0.2, 0.5)$ and $x_n$ converges to 1 whenever $x_0$ lies in $(0.5, 1.2)$.
We can verify this stability with cobwebbing. Cobwebbing with an x0 slightly above 0.5 and an x0 slightly below
0.5 leads to the following figures:
[Figure not reproduced: two cobweb diagrams plotting x_{n+1} against x_n, each on axes from −0.2 to 1.2]
As we saw in Example 4, Section 1.7, we can construct models that allow us to trace the fate of alleles that
code for genes affecting the biological fitness (i.e. the ability to survive and reproduce) of individuals. Recalling our
discussion in Section 1.7 regarding the genetics concepts of loci, alleles, and ploidy (haploid and diploid), we consider
an allele that codes for a particular trait at a diploid locus. If the frequency of this allele in the population is x ∈ [0, 1],
then a well-known model describing how the frequency of this trait changes from generation n to generation n + 1
in a very large (essentially infinite) randomly mating population is
$$x_{n+1} = f(x_n) \quad \text{where} \quad f(x) = \frac{w_1 x^2 + x(1 - x)}{w_1 x^2 + 2x(1 - x) + w_2 (1 - x)^2}$$
where w1 and w2 are the fitness of individuals who respectively have two and no copies, relative to individuals that
have only one copy, of the allele in question. Referring back to the equation given in Example 4 in Section 1.7
regarding the spread of a deleterious mutant allele, we see that the equation is the above equation for the special case
w1 = 0 and w2 = 1. This case is equivalent to the statement that the allele a in question is recessive (heterozygous
Aa individuals are not affected) and lethal (aa individuals die before reproducing). Despite the drastic effect of this
lethal allele a, we found that it can take a very long time for it to be eliminated from the population. In the next
example, we consider a variant of this model where the allele that is lethal in the homozygous state actually confers
a benefit on an individual when combined with the other allele (i.e. when in the heterozygous state).
Example 7. Fate of the sickle cell allele
In areas of the world where malaria occurs, it is known that individuals who have one sickle cell allele are more resistant to malaria than those who don't have the allele. On the other hand, individuals who have two sickle cell alleles suffer from sickle cell anemia, which can cause premature death. Let $x$ denote the frequency of the allele that does not cause sickle cell anemia. Assume that, when malaria is prevalent, individuals not protected by the sickle cell allele (those with two copies of the allele tracked by $x$) will, on average, have 10% fewer progeny than individuals that have one sickle cell allele: i.e. $w_1 = 0.9$. For the sake of simplicity, we assume that individuals with sickle cell anemia (those with no copies of the allele tracked by $x$) die before they reproduce (even though, in reality, this assumption is too extreme): i.e. $w_2 = 0$.
a. Write down and simplify the difference equation, $x_{n+1} = f(x_n)$, under the assumption that $x \neq 0$.
b. Verify that $f$ is increasing on the interval $(0, 1]$.
c. Find the equilibria on the interval $(0, 1]$ and determine their stability.
d. Interpret your results.

Solution.
a. Under the assumption that $w_1 = 0.9$, $w_2 = 0$, and $x \neq 0$, we get
$$f(x) = \frac{0.9x^2 + x(1 - x)}{0.9x^2 + 2x(1 - x) + 0 \cdot (1 - x)^2} = \frac{-0.1x^2 + x}{-1.1x^2 + 2x} = \frac{-0.1x + 1}{-1.1x + 2} \quad \text{assuming } x \neq 0$$
b. To verify that $f(x)$ is increasing on the interval, we can compute the derivative of $f(x)$:
$$f'(x) = \frac{d}{dx}\,\frac{-0.1x + 1}{-1.1x + 2} = \frac{-0.1(-1.1x + 2) + 1.1(-0.1x + 1)}{(2 - 1.1x)^2} = \frac{0.11x - 0.2 - 0.11x + 1.1}{(2 - 1.1x)^2} = \frac{0.9}{(2 - 1.1x)^2}$$
Hence $f'(x) > 0$ on $(0, 1]$ and $f$ is increasing on this interval.
c. To find the equilibria in $(0, 1]$, we solve $x = f(x)$:
$$x = \frac{-0.1x + 1}{-1.1x + 2} \quad \text{(by definition of equilibrium)}$$
$$-1.1x^2 + 2x = -0.1x + 1$$
$$0 = 1.1x^2 - 2.1x + 1 = (1.1x - 1)(x - 1)$$
Hence, the equilibria are given by $x = 1$ and $x = 1/1.1 \approx 0.91$.
To determine their stability, we can use the derivative calculated in part b. We have $f'(1) = \frac{0.9}{0.9^2} = 1/0.9 \approx 1.11$. Hence, $x = 1$ is unstable. Alternatively, $f'(1/1.1) = 0.9$, so that $x \approx 0.91$ is stable. In fact, these calculations imply that $f(x) < x$ on the interval $(0.91, 1)$ and $f(x) > x$ on the interval $(0, 0.91)$. Hence, the stability theorem for monotone difference equations implies that $\lim_{n\to\infty} x_n = 1/1.1 \approx 0.91$ whenever $x_0$ lies in $(0, 1)$.
d. The results imply that as long as both alleles are present in the population, they will persist and the frequency of the sickle cell anemia allele will approach a value of $1 - 1/1.1 \approx 0.09$. Hence, under the assumptions made, we expect the sickle cell allele to make up approximately 9% of the alleles at this locus in the population.
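The convergence claimed in part c. can be observed by iterating the simplified map directly. The following is a minimal sketch in Python, assuming the simplified form of f from part a; the starting frequencies and the number of generations are arbitrary illustrative choices.

```python
def f(x):
    """Simplified map from part a: f(x) = (1 - 0.1 x) / (2 - 1.1 x) for x != 0."""
    return (1 - 0.1 * x) / (2 - 1.1 * x)

def iterate(x0, generations):
    """Return x_n after the given number of generations, starting from x0."""
    x = x0
    for _ in range(generations):
        x = f(x)
    return x

# Illustrative starting frequencies in (0, 1), including one close to the unstable equilibrium 1
limits = {x0: iterate(x0, 500) for x0 in (0.05, 0.5, 0.99)}
for x0, x_final in limits.items():
    print(f"x0 = {x0}: x_500 = {x_final:.4f}, sickle cell allele frequency = {1 - x_final:.4f}")
```

Each starting value ends near 1/1.1, in line with the monotone stability theorem; the start at 0.99 simply takes longer to leave the neighborhood of the unstable equilibrium.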
Newton’s Method
The final application of linearization to difference equations is to illustrate the inner workings of an algorithm called
Newton’s method. This method is used to find the roots of nonlinear algebraic equations of the form f (x) = 0
that are too difficult or impossible to solve algebraically. The algorithm is based on the Newton-Raphson difference
equation which we now describe. Suppose our initial guess for the solution of f (x) = 0 is x = x0 . Assuming that this
guess is not the solution, we need to find an improved guess for the root. Since the nonlinear function in question is
too hard to manipulate by hand, we consider the linear approximation of y = f (x) at x = x0 :
$$y = f(x_0) + f'(x_0)(x - x_0)$$
To get our next guess $x_1$ for the solution to $f(x) = 0$, we set $x = x_1$ and $y = 0$ in the linear approximation and solve for $x_1$:
$$0 = f(x_0) + f'(x_0)(x_1 - x_0)$$
$$-f(x_0)/f'(x_0) = x_1 - x_0 \quad \text{assuming that } f'(x_0) \neq 0$$
$$x_1 = x_0 - f(x_0)/f'(x_0)$$
To get the next guess, $x_2$, we can proceed similarly to get the equation $x_2 = x_1 - f(x_1)/f'(x_1)$. Proceeding inductively, we get the following difference equation:
$$x_{n+1} = F(x_n) \quad \text{where} \quad F(x) = x - \frac{f(x)}{f'(x)} \quad \text{and} \quad f'(x) \neq 0.$$
This difference equation is illustrated in Figure 4.16. In this figure, r is a root of the equation f (x) = 0.
a. Estimating a root, r, of y = f (x) b. First, second, and third estimates
Figure 4.16: Graphical representation of Newton’s method
One of the key requirements of the method is to start with a reasonable guess $x_0$ for the root $x^*$, because the closer $x_0$ is to $x^*$, the more likely the solution will converge to $x^*$. For example, if we want to use Newton's method to obtain a numerical solution to the equation $f(x) = x^2 - 10 = 0$, which of course is the same as finding a numerical value for $\sqrt{10}$, we could begin with $x_0 = 3$ or 4. We then use the Newton-Raphson equation to solve for $x_1$ and carry on iteratively. The following theorem implies that if the sequence converges, then it converges to a root.
Theorem 4.5. Newton’s method
Let $f(x)$ be a continuously differentiable function with $f'(x) \neq 0$. Any solution to
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad f'(x_n) \neq 0,$$
will approach a limit that is a root of the equation, or else will not have a limit.
When applying Newton's method, we choose a number $\epsilon > 0$ that determines the allowable tolerance for estimated solutions. Given an appropriate initial guess $x_0$, we iteratively compute $x_n$ until $|f(x_n)| < \epsilon$. This procedure is shown in the following flowchart.
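The stopping rule just described translates directly into a short loop. The following is a minimal sketch in Python, assuming the derivative is supplied as a separate function; the name `newton`, the default tolerance, and the `max_iter` safeguard are illustrative additions that are not part of the procedure in the text. It is applied to the equation $x^2 - 10 = 0$ with the initial guess $x_0 = 3$ mentioned above.

```python
def newton(f, f_prime, x0, eps=1e-8, max_iter=50):
    """Iterate x_{n+1} = x_n - f(x_n)/f'(x_n) until |f(x_n)| < eps.

    Returns the list of iterates [x_0, x_1, ...]. max_iter is a safeguard
    (an assumption of this sketch) against sequences that never meet the tolerance.
    """
    xs = [x0]
    while abs(f(xs[-1])) >= eps:
        if len(xs) > max_iter:
            raise RuntimeError("tolerance not reached within max_iter iterations")
        x = xs[-1]
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x_n) = 0, so the Newton step is undefined")
        xs.append(x - f(x) / slope)
    return xs

# Numerical value of the square root of 10, starting from x0 = 3
iterates = newton(lambda x: x**2 - 10, lambda x: 2 * x, x0=3.0)
for n, x in enumerate(iterates):
    print(n, x)
```

Returning the whole sequence rather than only the final estimate makes it easy to see how quickly the iterates settle down.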

Example 8. Time to tumor regrowth
In Example 4, Section 4.3, we considered the growth of a mouse tumor after being given a drug treatment. To
model the volume of the tumor, we used the function (renaming the variable $x$ rather than $t$)
$$V(x) = 0.005e^{0.24x} + 0.495e^{-0.12x} \text{ cm}^3$$
where $x$ is measured in days after the drug was applied. Using Newton's method, find the time $x$ at which the tumor volume is within one hundredth of its original size of 0.5 cm$^3$. For an initial guess use $x = 22$ days.
Solution. We want to find a root of
$$f(x) = V(x) - 0.5 = 0.005e^{0.24x} + 0.495e^{-0.12x} - 0.5$$
To use Newton's method, we need to compute the first derivative:
$$f'(x) = 0.0012e^{0.24x} - 0.0594e^{-0.12x}$$
We start with $x_0 = 22$, although other start values close to the anticipated solution can be chosen. To find $x_1$, we compute
$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = 22 - \frac{f(22)}{f'(22)} \approx 22 - \frac{0.5172}{0.2314} \approx 19.77$$
Since $f(x_1) \approx 0.12$, which is greater than one hundredth, we need to find $x_2$:
$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} \approx 18.85$$
Since $f(x_2) \approx 0.013$, which is still greater than one hundredth, we need to find $x_3$:
$$x_3 = x_2 - \frac{f(x_2)}{f'(x_2)} \approx 18.73$$
Since $f(x_3) \approx 0.0002$, which is less than a hundredth, we are done. Hence, the tumor should regrow to its original size in approximately 18.7 days.

Implementation of Newton’s method for finding roots is widespread, as a quick search of the web will reveal.
Several websites should turn up that allow you to input a function, an initial condition and the number of iterations,
and return the corresponding sequence from Newton’s method.∗
Newton’s method may not converge to a solution, as shown by the following example.
Example 9. Non convergence of Newton’s method
Consider the function f (x) = ex − 2x. Use Newton’s method with x0 = 1 to find a solution to f (x) = 0. Discuss
what you find.
Solution. Note that $f'(x) = e^x - 2$ so that
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{e^{x_n} - 2x_n}{e^{x_n} - 2}$$
If we let $x_0 = 1$, then we find
$$x_1 = 1 - \frac{e^1 - 2(1)}{e^1 - 2} = 0$$
$$x_2 = 0 - \frac{e^0 - 2(0)}{e^0 - 2} = 1$$
$$x_3 = 1 - \frac{e^1 - 2(1)}{e^1 - 2} = 0$$
Note that the values simply alternate, and the method does not converge to a solution. A graph (see Figure 4.17) shows why there can be no solution: the graph of $y = e^x - 2x$ never crosses the $x$-axis.
Figure 4.17: Graph of $y = e^x - 2x$
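The alternation in Example 9 is easy to confirm numerically. The following is a minimal sketch in Python; the function name `newton_step` and the number of steps are illustrative. The last two lines check the claim that there is no root by evaluating $f$ at $x = \ln 2$, the point where $f'(x) = e^x - 2$ vanishes and $f$ attains its minimum.

```python
import math

def newton_step(x):
    """One Newton step for f(x) = e^x - 2x, whose derivative is e^x - 2."""
    return x - (math.exp(x) - 2 * x) / (math.exp(x) - 2)

xs = [1.0]
for _ in range(6):
    xs.append(newton_step(xs[-1]))
print(xs)  # the iterates alternate between 1 and 0

# Minimum value of f, attained at x = ln 2 where f'(x) = 0
f_min = math.exp(math.log(2)) - 2 * math.log(2)
print(f_min > 0)  # a positive minimum means f(x) = 0 has no real solution
```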
Problem Set 4.5

In problems 1 to 4, find the equilibria of xn+1 = f (xn ) and determine their stability using cobwebbing.
∗ One which we particularly like because it shows not only the numerical, but also the graphical solution, is
www.cse.uiuc.edu/eot/modules/nonlinear eqns/Newton/. A site showing Newton's method using a TI-83 calculator is
www.acad.sunytccc.edu/instrut/sbrown/ti83/newton.htm.

1. [Graph not reproduced: y = f(x), horizontal axis from −0.2 to 1.2]
2. [Graph not reproduced: y = f(x), both axes from 0 to 2]
3. [Graph not reproduced: x_{n+1} against x_n, both axes from 0 to 1]
4. [Graph not reproduced: y = f(x), horizontal axis from −1 to 2, vertical axis from −1 to 1.5]
In problems 5 to 10, find the equilibria of the difference equation. Moreover, use the definition of an unstable
equilibrium and a stable equilibrium to determine their stability.
5. $x_{n+1} = 2x_n$
6. $x_{n+1} = x_n^{1/3}$
7. $x_{n+1} = x_n^{1/2}$
8. $x_{n+1} = 2x_n^2$
9. $x_{n+1} = \frac{2x_n}{1 + 2x_n}$
10. $x_{n+1} = \frac{x_n}{2 + 2x_n}$
In problems 11 to 16, find the equilibria of the difference equation. Moreover, use linearization to determine their
stability.
11. $x_{n+1} = x_n^2$
12. $x_{n+1} = \frac{x_n}{2 + 2x_n}$
13. $x_{n+1} = \frac{2x_n}{1 + 2x_n}$
14. $x_{n+1} = 2x_n(1 - x_n)$
15. $x_{n+1} = 4x_n(1 - x_n)$
16. $x_{n+1} = \frac{1}{1 + x_n}$
17. Consider the logistic difference equation xn+1 = rxn (1 − xn /100) with r > 0.
a. Find the equilibria.

b. Determine under what conditions the origin is stable.
c. Determine under what conditions the non-zero equilibrium is positive.
d. Determine under what conditions the non-zero equilibrium is stable.
18. Consider the logistic difference equation xn+1 = rxn (1 − xn /50) with r > 0.

19. Consider the Beverton-Holt difference equation $x_{n+1} = \frac{r x_n}{1 + x_n}$ with $r > 0$.

20. Consider the Beverton-Holt difference equation $x_{n+1} = \frac{r x_n}{1 + 2x_n}$ with $r > 0$.

Following the approach laid out in Example 7 (i.e. graphing and using Theorem 4.4), investigate the fate of an allele in a large randomly mating population when the fitnesses of individuals with two and zero copies of the allele, relative to those that have one copy, are given in Problems 21 to 24.
21. w1 = 1/2 and w2 = 1/2.
22. w1 = 2 and w2 = 1.

23. w1 = 1/2 and w2 = 2.

24. w1 = 2 and w2 = 2.
Use Newton’s method to estimate a root of the equations in Problems 25-32. Use x0 as a starting value and iterate
20 times.
25. $x^2 - 2 = 0$, $x_0 = 1$
26. $x^2 + 2 = 0$, $x_0 = 1$
27. $x^3 - x + 1 = 0$, $x_0 = -1$
28. $x^4 + 2x - 1 = 1$, $x_0 = 1$
29. $\cos x = x$, $x_0 = 1$
30. $\sin x + 0.1 = x^2$, $x_0 = 0$
31. $e^x - 5x = 0$, $x_0 = 0$
32. $e^x + x = 0$, $x_0 = -1$
33. Let $f(x) = -2x^4 + 3x^2 + \frac{11}{8}$
a. Show that the equation $f(x) = 0$ has at least two solutions.

b. Use $x_0 = 2$ and Newton's method to estimate a root of the equation $f(x) = 0$.
c. Show that Newton's method fails if you choose $x_0 = \frac{1}{2}$ as the initial estimate.
34. Let f (x) = x6 − x5 + x3 − 3
a. Show that the equation f (x) = 0 has at least two solutions.
b. Use x0 = 2 and Newton’s method to find a root of the equation f (x) = 0.
c. Show that Newton’s method fails if you choose x0 = 0 as the initial estimate.
35. For the flour beetle species Lasioderma serricorne, Professor Bellow found that the fraction $f(x)$ of eggs surviving as a function of their initial density $x$ is well-described by
$$f(x) = \frac{0.806x}{1 + (0.0114x)^{7.53}}$$
A graph of this function and the corresponding data is shown below:
If we assume that each adult produces 2 eggs, then the dynamics of the population are given by
$$x_{n+1} = 2f(x_n)$$
a. Find the equilibria and determine their stability.
b. Simulate the model with $x_0 = 0.1$.
36. For the flour beetle species Tribolium castaneum, Professor Bellow found that the fraction $f(x)$ of eggs surviving as a function of their initial density $x$ is well-described by
$$f(x) = \frac{0.8x}{1 + (0.0149x)^{4.21}}$$
If we assume that each adult produces $r$ eggs, then the dynamics of the population are given by
$$x_{n+1} = rf(x_n)$$
a. Find the equilibria and determine their stability for $r = 2, 4, 6$.
b. Simulate the model with $x_0 = 0.1$ for $r = 2, 4, 6$.
37. Show that the genetic model
$$p_{n+1} = \frac{w_1 p^2 + p(1-p)}{w_1 p^2 + 2p(1-p) + w_2 (1-p)^2},$$
where $p$ stands for $p_n$, has three equilibrium solutions: $p = 0$, $p = 1$, and $p^* = (w_2 - 1)/(w_1 + w_2 - 2)$. Further, demonstrate that
a. $p = 1$ is the only stable equilibrium when $w_1 > 1 > w_2$
b. $p^*$ is the only stable equilibrium when $w_1 < 1$ and $w_2 < 1$ (a condition known as heterozygote superiority)
c. $p^*$ is the only unstable equilibrium when $w_1 > 1$ and $w_2 > 1$ (a condition known as inbreeding depression).
38. It can be shown that the volume of a spherical segment is given by
$$V = \frac{\pi}{3} H^2 (3R - H)$$
where $R$ is the radius of the sphere and $H$ is the height of the segment, as shown in Figure 4.18.
If $V = 8$ and $R = 2$, use Newton’s method to estimate the corresponding $H$.
Figure 4.18: Spherical segment is the portion of a sphere between two parallel planes
39. Historical Quest. The Greek geometer Archimedes (287-212 B.C.) is acknowledged to be one of the greatest mathematicians of all time. Ten treatises of Archimedes have survived the rigors of time (as well as traces of some lost works) and are masterpieces of mathematical exposition. In one of these works, On the Sphere and Cylinder, Archimedes asks where a sphere should be cut in order to divide it into two pieces whose volumes have a given ratio.
Show that if a plane at distance $d$ from the center of a sphere with $R = 1$ divides the sphere into two parts, one with volume twice that of the other, then
$$3d^3 - 9d + 2 = 0$$
Use Newton’s method to estimate $d$.
40. Suppose the plane in Problem 39 is located so that it divides the sphere in the ratio of 1:3. Find an equation for $d$, and estimate the value of $d$ using Newton’s method.
41. In Example 4 in Section 4.3, we considered the growth of a mouse tumor after a drug treatment. To model the volume of the tumor, we used the function
$$V(x) = 0.005e^{0.24x} + 0.495e^{-0.12x} \ \text{cm}^3$$
where $x$ is measured in days after the drug was applied. Using Newton’s method, estimate (to within one hundredth) the time $x$ at which the tumor volume has doubled. For an initial guess use $x = 30$ days.
42. For the tumor volume function $V(x)$ of Problem 41, use Newton’s method to estimate (to within one hundredth) the time $x$ at which the tumor volume has quadrupled. For an initial guess use $x = 25$ days.
43. In Problem 18 in Section 4.3, you found that the volume of a tumor for mice under a different drug regime was
$$V(x) = 0.0044e^{0.239x} + 0.4356e^{-0.111x} \ \text{cm}^3$$
where $x$ is days after treatment. Use Newton’s method to estimate (to within one hundredth) the time $x$ at which the tumor volume has regrown to its original volume. For an initial guess use $x = 20$ days.
44. For the tumor volume function $V(x)$ of Problem 43, use Newton’s method to estimate (to within one hundredth) the time $x$ at which the tumor volume has doubled. For an initial guess use $x = 30$ days.
45. Show that for different initial values Newton’s method converges to a unique solution for the function
$$y = x^3 - 3x^2 + 2x + 0.4$$
but converges to one of three solutions for the function
$$y = x^3 - 3x^2 + 2x + 0.3.$$
Why is this the case?
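Several of the problems above ask for a fixed number of Newton iterations. The following is a minimal Python sketch of that iteration, offered as one possible way to carry it out rather than a prescribed method. The function name `newton` and its arguments are illustrative choices, and the derivative is supplied by hand.

```python
import math

def newton(f, df, x0, steps=20):
    """Apply x_{n+1} = x_n - f(x_n)/f'(x_n) a fixed number of times.

    f and df are Python functions for f and its derivative (both supplied
    by the user); x0 is the starting value.
    """
    x = x0
    for _ in range(steps):
        slope = df(x)
        if slope == 0:
            # The tangent line is horizontal, so it never meets the x-axis.
            raise ZeroDivisionError(f"f'(x) = 0 at x = {x}; the iteration cannot continue")
        x = x - f(x) / slope
    return x

# Problem 25: x^2 - 2 = 0 with x0 = 1
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
print(root, math.sqrt(2))
```

A starting value at which the derivative vanishes, such as $x_0 = 0$ in Problem 34, makes this sketch raise an error instead of returning an estimate.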

DEFINITIONS
Section 4.2
Local maximum, p. 366
Local minimum, p. 366
Local extremum, p. 366
Critical point, p. 367
Critical value, p. 367
Global extremum, p. 370
Global minimum, p. 370
Global maximum, p. 370
Section 4.3
Sustainable, p. 382
Section 4.4
Individual’s fitness, p. 393
Optimal residence times, p. 395
Section 4.5
Stable, p. 409
Unstable, p. 409
Newton’s method, p. 415

Section 4.1
Vertical asymptotes, p. 355
Intervals of increase and decrease, p. 355
Intervals of concavity, p. 355
x and y intercepts, p. 355
Section 4.2

THEOREM 4.1 FERMAT’S THEOREM, p. 366

First derivative test, p. 368
Second derivative test, p. 369
Closed interval method, p. 371
Open interval method, p. 374
Section 4.3
Optimization guidelines, p. 379
Section 4.4
THEOREM 4.2 MARGINAL VALUE THEOREM, p. 397
Section 4.5
Logistic equation, p. 408
Equilibrium stability, p. 409
Cobwebbing, p. 413
Stability of linear difference equations, p. 409
Linearization, p. 410
THEOREM 4.3 STABILITY VIA LINEARIZATION, p. 410
THEOREM 4.4 NEWTON’S METHOD, p. 416
Section 4.1
Dropping whelks
Tylenol in the bloodstream
Michaelis-Menten equation
Stock recruitment curves
Section 4.2
Thermodilution
CO2 concentrations
Search period in the codling moth
Section 4.3
Maximum economic yield for corn
Sustainable exploitation of the arctic fin whale
Best path applications
Tumor regrowth
Vascular branching
Section 4.4
Northwestern crows and whelks
Optimal time for producing seeds
Optimal foraging in a multi-patch environment
Optimal foraging of the great titmouse
Optimal time to harvest
Section 4.5
Stability of the moth, beetle, and plant bug

Use derivatives to determine the maximum and minimum value of each function on the interval given in Problems 1 to 10.
1. $f(x) = x^2 - x + 12$ on $[-1, 2]$
2. $g(x) = x^3 - 3x - 4$ on $[-2, 2]$
3. $2(x + 20)^2 - 8(x + 20) + 7$ on $[-2, 2]$
4. $(x - 12)^2 - 2(x - 12)^3$ on $[-1, 0]$
5. $f(x) = \sqrt{x}\, e^{-x}$ on $[0, 6]$
6. $f(x) = x^4 - 2x^5 + 5$ on $[0, 1]$
7. $f(x) = \dfrac{1}{x^2 + 3}$ on $[1, 2]$
8. $f(x) = \dfrac{x}{x^2 + 1}$ on $[0, 2]$
9. $t - \ln t$ on $[0.5, 2]$
10. $e^{-x/2} \ln x$ on $[1, 4]$
11. Using asymptotes, graph $f(x) = \dfrac{x^3 + 3}{x(x+1)(x+2)}$ by hand and then check it using a calculator.
12. Consider the family of curves
$$y^2 = x^3 + x^2 + bx + 2b$$
Using calculus, graph the curves for the given values of $b$.
a. $b = 0$
b. $b = 0.05$
c. $b = 0.01$
d. $b = -0.05$
e. $b = -0.1$
13. The function
$$P(n) = \frac{\lambda n}{1 + (an)^k}$$
is used in population models to give the size $P(n)$ of the next population in terms of the current population $n$, where $\lambda$, $a$, and $k$ are positive constants.
a. Graph $y = P(n)$ for the case where $\lambda = 4$, $a = 1$, and $k = 2$.
b. If $\lambda$, $a$, and $k$ stay fixed, for what value of $n$ is $P(n)$ maximized?
14. The canopy height (in meters) of a tropical grass may be modeled by (for $0 \le t \le 30$)
$$h(t) = 0.0000071t^3 - 0.0015852t^2 + 0.1419159t + 3.14$$
where $t$ is the number of days after mowing.
a. Sketch the graph of $h(t)$.
b. When was the canopy height growing most rapidly? Least rapidly?
15. Public awareness of a new drug is modeled by
$$P(t) = \frac{5.2t}{0.015t^2 + 0.342} + 0.18$$
where $t$ is the number of months after FDA approval and $P(t)$ is the fraction of people who are aware of the drug and its possible uses.
a. Find the critical points for $P(t)$.
b. Sketch the graph of $P(t)$.
c. At what time $t$ during the time interval $0 \le t \le 36$ is $P(t)$ the largest?
16. Suppose that the systolic blood pressure of a patient $t$ years old is modeled by
$$P(t) = 38.52 + 21.8 \ln(0.98t + 1)$$
for $0 \le t \le 60$, where $P(t)$ is measured in millimeters of mercury.
a. Sketch the graph of $P(t)$.
b. At what rate is $P(t)$ increasing at age $t$?
17. During the time period 1905-1920, hunters virtually wiped out all large predators on the Kaibab Plateau near the Grand Canyon in northern Arizona. This, in turn, resulted in a rapid increase in the deer population $P(t)$ until food supplies were exhausted and famine led to a steep decline in $P(t)$. A study of this ecological disaster determined that during the time period 1905-1920, the rate of change of the population, $P'(t)$, could be modeled by the function
$$P'(t) = \frac{1}{8}(100 - 5t)\,t^3, \qquad 0 \le t \le 20,$$
where $t$ is the number of years after the base year of 1905.
a. In what year during this period was the deer population the largest?
b. In what year does the rate of growth $P'(t)$ begin to decline?
18. Let $C(t)$ denote the concentration in the blood at time $t$ of a drug injected into the body intramuscularly. In a now-classic paper by E. Heinz, the concentration was modeled by the function
$$C(t) = \frac{k}{b-a}\left(e^{-at} - e^{-bt}\right), \qquad t \ge 0$$
where $a$, $b$ (with $b > a$) and $k$ are positive constants that depend on the drug.∗ At what time does the largest concentration occur? What happens to the concentration as $t \to +\infty$?
19. Consider a bird that has arrived at a wooded patch with two trees. If the bird spends $x$ minutes foraging for insects on the first tree, she gains $E_1(x) = 200(1 - e^{-x})$ Calories of insects. If the bird spends $x$ minutes on the second tree, she gains $E_2(x) = 100(1 - e^{-x})$ Calories of insects. Assuming the bird has 5 minutes to spend in the patch, determine the time she should spend on each tree to optimize her energy intake.
20. For the flour beetle species Tribolium confusum, Professor Bellow found that the fraction $f(x)$ of eggs surviving as a function of their initial density $x$ is well-described by
$$f(x) = \frac{0.61x}{1 + (0.0116x)^{3.12}}$$
If we assume that each adult produces $r$ eggs, then the dynamics of the population are given by
$$x_{n+1} = rf(x_n)$$
a. Find the equilibria and determine their stability for $r = 2, 4, 6$.
b. Simulate the model with $x_0 = 0.1$ for $r = 2, 4, 6$.

∗ E. Heinz, "Problems bei der Diffusion kleiner Substanzmengen innerhalb des menschlichen Kor", Biochem., Volume 319 (1949), pp. 482-492.

4.7 Group Projects

Working in small groups is typical of most work environments, and learning to work with others to communicate ideas is an important skill. Work in a small group on one or more of the following projects.
Project 4A: Optimal Swimming Patterns

In getting from one spot to another, fish have to contend with drag forces and gravity. Drag forces are much greater when a fish is swimming than when it is merely gliding. To reduce the amount of time spent swimming, fish that are heavier than water engage in burst swimming, in which they alternate between gliding and swimming upwards. This burst swimming leads to a vertical zig-zag motion of the fish in the water as shown below:
[Figure: zig-zag path of the fish from point A to point B, with the angles a and b marked]
where a is the angle of the upward glide and b is the angle of the downward glide.
In this project, you will investigate the optimal swimming pattern under the following assumptions:
• Throughout its swim, the fish maintains a constant speed s to the right.
• The forces acting on the fish are its weight W relative to the water and drag forces.
• The drag on the gliding fish is D and the drag on the swimming fish is kD where k ≥ 1.
• The fish has sufficient top/bottom surface area (e.g. a skate) that frictional forces perpendicular to the top/bottom of the fish cancel the component of the gravitational force that is perpendicular to the top/bottom of the fish.
• The energy expended by the fish in swimming is proportional to the force it exerts in moving.
Under these assumptions, your project should do the following:
• Find the ratio of energy in the burst mode to the energy for continuous horizontal swimming from A to B.
• It has been found empirically that tan a ≈ 0.2. Given this information, find the optimal value of b for the fish.
• Determine how much energy the fish saves by swimming with this b instead of swimming horizontally.
• Determine how sensitive the amount of energy used is to b, and how sensitive the optimal b is to the estimate of a.

Project 4B: Stability and Bifurcation Diagrams

Consider the normalized version of the logistic model introduced in the first example of Section 4.5; that is, we set $K = 1$ or, equivalently, interpret the units of $x$ as multiples of $K$, to obtain the equation
$$x_{n+1} = f(x_n) \quad \text{where} \quad f(x) = rx(1 - x).$$
Now explore the behavior of this equation as follows:
1. Solve for the equilibrium solutions as a function of $r$ and determine the stability properties of these equilibria for $r \in [0, 5]$. You will notice that, as $r$ increases, an equilibrium solution switches at some point $r_b$ from being stable on one side of $r_b$ to unstable on the other side of $r_b$. The value $r_b$ is called a bifurcation point.
2. Plot the equilibria in the $r$-$x$ plane ($r$ is the horizontal axis spanning $[0, 5]$), using a solid line to denote where the nontrivial equilibrium solution $\hat{x}$ of $x = f(x)$ is stable and a dotted line where it is unstable.
3. Now consider the equilibria of the iterated logistic map $x_{n+2} = f(f(x_n))$ by constructing (see Section 1.6) the composite map $(f \circ f)(x)$. Use the notation $f^2 \equiv f \circ f$. Find all the equilibria of $f^2(x)$ as a function of $r$ and plot them on the same $r$-$x$ plane as above, but this time plot only the nontrivial stable solutions, using a solid line (if you also plot where they are unstable, your diagram will become too busy). Note that the equation $x = f^2(x)$ has many more solutions than the equation $x = f(x)$: it has all the solutions of $x = f(x)$ (demonstrate this) and additional solutions that come in pairs, say $x^*$ and $x^{**}$, such that the sequence $\{x^*, x^{**}, x^*, x^{**}, \ldots\}$ is a 2-cycle solution of the original equation $x_{n+1} = f(x_n)$ (demonstrate this). Further, if for a particular value of $r$, $x^*$ and $x^{**}$ are stable equilibrium solutions of $x = f^2(x)$, then the 2-cycle $\{x^*, x^{**}, x^*, x^{**}, \ldots\}$ is a stable attractor of the equation $x_{n+1} = f(x_n)$. By this we mean that, for any initial condition $x_0$ starting close to $x^*$ or $x^{**}$, the resulting sequence generated by our original equation will oscillate between two values that get closer and closer to $x^*$ and $x^{**}$ as time progresses.
4. You have now reached the limit of what you can probably do analytically. By researching the literature,† discuss what happens as $r$ increases on $[0, 5]$, focusing on the bifurcation values at which stable equilibrium solutions of the logistic equation are replaced by stable 2-cycles, and then by stable $n$-cycles for $n > 2$.
5. If you have command of an appropriate technology, use it to graphically summarize your discussion in what is called a bifurcation diagram (instructions on how to do this are available in textbooks or on the web, so locate a set of instructions and see if you can follow them).
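As a starting point for the numerical part of this project, here is a minimal Python sketch that iterates the normalized logistic map, discards an initial transient, and records the distinct values the orbit keeps visiting. The function name `logistic_attractor`, the starting value, the transient length, and the rounding precision are all illustrative assumptions, not part of the project statement. Collecting these values over a grid of $r$ and plotting them against $r$ is one common way to build a bifurcation diagram. The sketch is intended for $0 \le r \le 4$, since for $r > 4$ orbits typically leave the interval $[0, 1]$.

```python
def logistic_attractor(r, x0=0.2, burn_in=1000, keep=64, digits=6):
    """Iterate x_{n+1} = r x_n (1 - x_n), drop the first burn_in iterates,
    and return the sorted distinct values (rounded) visited afterwards."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1 - x)          # let the transient die out
    visited = set()
    for _ in range(keep):
        x = r * x * (1 - x)
        visited.add(round(x, digits))
    return sorted(visited)

# One long-run value suggests a stable equilibrium, two values a stable 2-cycle, and so on.
for r in (0.5, 2.5, 3.2, 3.5):
    print(r, logistic_attractor(r))
```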
Project 4C: Economic Production versus Ecological Welfare

Economic activities, such as the extraction and processing of raw materials or the manufacture of finished goods, always result in some damage to the ecosystem and, because of pollution or the destruction of natural habitats, may even severely degrade the ecosystem’s delivery of clean water and clean air. They may also compromise the ecosystem’s ability to produce food or provide a place for relaxation and recreation. In this project you are asked to use optimization to explore the trade-off between economic production and ecological welfare.‡
Suppose the level of economic activity is measured by a variable $X$, the value of goods and services produced by this activity (also known as economic output) is measured by a variable $Y$, and the value of ecosystem services is measured by an environmental quality variable $Z$.
A very simple model of human welfare $W$ is based on the assumptions that
• welfare is proportional to both economic output $Y$ and environmental quality $Z$
• economic output $Y$ is itself proportional to economic activity $X$ and environmental quality $Z$ (the first assumption is self-evident; the second arises from the notion that it is much more difficult to produce the same unit of economic output in a poor environment where resources are depleted than in a pristine environment where resources are plentiful)
• the environment declines from a pristine level linearly with activity $X$.
Figure 4.19: Industrial pollution
These three assumptions are equivalent to the following mathematical statements: for positive constants $a$, $b$, $c$, and $Z_0$, our variables satisfy the equations
$$\begin{aligned} W &= aYZ \\ Y &= bXZ \\ Z &= Z_0 - cX. \end{aligned} \tag{4.1}$$

† A very good source is J.D. Murray’s book "Mathematical Biology: I An Introduction" (Third Edition), 2001, Springer-Verlag, New York.
‡ This problem follows Problem II 5. in J. Harte, 2001, Consider a Cylindrical Cow, University Science Books, Sausalito, California.
1. Demonstrate that human welfare $W$ is maximized at $X^* = Z_0/(3c)$ and has the maximum value
$$W^* = \frac{4abZ_0^3}{27c}.$$
2. Show that the value of economic activity $\hat{X}$ that maximizes production $Y$ is 1.5 times larger than $X^*$; that is, $\rho = \hat{X}/X^* = 3/2$. Further show that if $\hat{W}$ is the welfare obtained when production is maximized, then the “cost of greed”, defined to be the ratio $\gamma = \hat{W}/W^*$, is $\gamma = 27/32$. Discuss the implications of the fact that $\rho > 1$ and $\gamma < 1$.
3. Suppose the economic production level $Y$ has the Cobb-Douglas form $Y = bX^{\alpha} Z^{\beta}$, which is more general than the form assumed in equations (4.1), where $\alpha$ and $\beta$ are non-negative, empirically determined constants with values that depend on the economic sector under consideration. If, in addition, welfare has the general form $W = aX^{\mu} Y^{\nu}$, then find the values of $X$ that maximize economic output and welfare, respectively. Calculate the ratios $\rho$ and $\gamma$ for this more general case. What do you conclude?
4. Show in the case of equations (4.1) that the level of economic activity that maximizes welfare-per-unit-output (that is, the ratio $W/Y$) is $X = 0$. Does this hold true for the more general case when $\alpha$, $\beta$, $\mu$, and $\nu$ are not necessarily 1?
5. Look through the literature and see how many Cobb-Douglas functions you can find and what values of $\alpha$ and $\beta$ are associated with various sectors of the world economy. Also see if you can find a real problem where most of the parameters $a$, $b$, $c$, $Z_0$, $\alpha$, $\beta$, $\mu$, and $\nu$ are known. Describe the problem and the values of the parameters (if one or more of $\alpha$, $\beta$, $\mu$, and $\nu$ are not known, then set them equal to 1; it is fine if relative rather than global values of the other constants are known or guessed at). Now calculate the optimum production levels $\hat{X}$ and $X^*$ with respect to economic output and welfare, respectively, and elaborate in any way you think appropriate.

Chapter 5
Integration
5.1 Antiderivatives, p. 435
5.2 Area Under a Curve, p. 449
5.3 The Definite Integral, p. 464
5.4 The Fundamental Theorem of Calculus, p. 477
5.5 Substitution, p. 487
5.6 Integration by Parts and Partial Fractions, p. 496
5.7 Numerical Integration, p. 508
5.8 Applications of Integration, p. 524
Figure 5.1: The peregrine falcon (Falco peregrinus) feeds primarily on pigeons, doves, and shorebirds

Preview
Calculus has two intimately related parts—differential calculus, the topic of the previous two chapters, and integral
calculus. Just as division is the inverse of multiplication on the playing field of arithmetic so, in a narrow sense that
will be clarified later, integration is the inverse of differentiation on the playing field of calculus. When it plays this
role, we can refer to it as antidifferentiation; but, in general, integration is much more than this, so the term must
be used with caution to ensure that it is being used appropriately.
At the core of differential calculus is the concept of the instantaneous rate of change of a function. We have
seen how this concept can be used to locally approximate functions and to identify local maxima and minima.
Integral calculus, on the other hand, deals with accumulated change and, thereby, recovering a function from a
mathematical description of its instantaneous rate of change. This recovery process, interestingly enough, is related
to the concept of finding the area under a curve.
Here is a novel example of what the integral calculus can do for us. Consider a stooping peregrine falcon (i.e.
a falcon diving towards the earth at great speed in an attempt to catch some flying prey item). The motion of the
falcon is subject to forces, such as gravity, and these forces determine its acceleration (i.e. the instantaneous rate
of change of the velocity). Given information about its acceleration, how does the velocity of the peregrine depend
on time? Is the falcon going to catch its prey before it escapes into a densely wooded forest? These are questions
that we can answer with integral calculus once we have mastered the process of finding “antiderivatives,” which is
the topic of our first section in this chapter.
Another example relates to calculating the date on which a tree blossoms as a function of anticipated temperature
patterns so that an orchard can be stocked with bees in time for them to pollinate the trees. Many organisms,
such as plants and insects, require a certain amount of heat to accumulate before a particular phenological event
(i.e. developmental event such as the start of bud break or flowering) will occur. Since this accumulation of heat
corresponds to the area under a temperature curve, the answer to the question of when to stock the orchard with bees
depends on our knowledge of how the development of flower buds depends on variations in the ambient temperature.
Again, we can answer such questions once we have mastered the process of finding the area under a prescribed curve,
which is the topic of Sections 2 and 3 of this chapter.
A systematic method for estimating the area under the curve was devised by Riemann, one of the great mathematicians of the 19th century. This method, commonly known as taking Riemann sums (we will see this method in some detail later in this chapter), yields in the limit (as presented in Section 2) an object called the definite integral that, on calculation, can be interpreted as the area under a given curve. The fathers of calculus, Newton and Leibniz, themselves proved a connection between the problem of finding antiderivatives and finding areas under a curve. This connection, the fundamental theorem of calculus, is presented in Section 4 and helps make calculus one of the most powerful mathematical tools for understanding biological and physical processes.
In Sections 5 through 7, we provide a short apprenticeship in various techniques to compute and approximate
integrals. Armed with these techniques, the chapter concludes with applications to cardiac output, survival-renewal
equations, and work.

Figure 5.2: A green stink bug
5.1 Antiderivatives
Many mathematical operations have an inverse. For example, to undo the addition of $b$ to $a$ we subtract $b$: i.e. $a + b - b = a$. To undo the division of $a$ by $b$ we multiply by $b$: i.e. $\frac{a}{b} \cdot b = a$. Similarly, to undo exponentiation, we take logarithms: i.e. $\ln e^a = a$. The process of differentiation can be undone by a process called antidifferentiation.
Table 5.1: Developmental Rates of Stink Bugs
Temperature (in Fahrenheit) Developmental rate (in 1/days)

64.4 1/89
69.8 1/58
80.6 1/37
89.6 1/25
To motivate antidifferentiation, we consider the question of how long it takes an organism to develop when the rate of development depends on environmental factors such as heat, light, and humidity. For example, plants and insects, lacking internal thermal regulation mechanisms, depend critically on ambient temperature for their development. For ambient temperatures within a range defined by developmental thresholds, the developmental rate of a plant or insect can often be approximated by an increasing linear function of temperature. For example, Eileen Cullen, a doctoral student at the University of California, collected the data shown in Table 5.1 on the developmental rate of a particular species of stink bug reared in the laboratory. We see from this table that a stink bug at 64.4°F completes 1/89th of its development in one day and all of its development in 89 days. Performing a linear regression on the data (i.e. finding a statistical “best fitting” line through the data) yields
developmental rate = −0.06075 + 0.00112 T
where T is temperature in degrees Fahrenheit. This relationship is illustrated in Figure 5.3.
Figure 5.3: Graph of the developmental rate of stink bugs (vertical axis) against temperature (horizontal axis). The red dots represent the actual data, and the line is the best-fitting line.
If $T(x)$ is the temperature at time $x$ and $F(x)$ denotes the fraction of development completed by the stink bug at time $x$, then the preceding equation yields
$$F'(x) = -0.06075 + 0.00112\, T(x)$$
Thus, if we knew $T(x)$, we would like to “solve” for $F(x)$. More generally, if $f(x)$ is the developmental rate at time $x$, then $F(x)$ must satisfy
$$F'(x) = f(x)$$
Understanding solutions of this equation is the main goal of this section.
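The best-fitting line quoted above can be checked directly from the four data points in Table 5.1. The following is a minimal Python sketch using the standard least-squares formulas for a line. The helper name `least_squares_line` is an illustrative choice, and the text does not say which software produced its coefficients, so the fitted values should only be expected to land close to the quoted ones.

```python
# Data from Table 5.1: temperature (degrees Fahrenheit) and developmental rate (1/days)
temps = [64.4, 69.8, 80.6, 89.6]
rates = [1 / 89, 1 / 58, 1 / 37, 1 / 25]

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = least_squares_line(temps, rates)
print(f"developmental rate = {intercept:.5f} + {slope:.5f} T")
```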
Antiderivative: Given a function $f$, an antiderivative $F$ of $f$ is a function that satisfies
$$F'(x) = f(x)$$
For example, $x^3$ is an antiderivative of $3x^2$ since $\frac{d}{dx} x^3 = 3x^2$. Is $x^3$ the only antiderivative of $3x^2$? The answer is no. For example, $x^3$, $x^3 + 1$, and $x^3 + \pi$ all have the same derivative $3x^2$. Consequently, all are antiderivatives of $3x^2$.
Luckily for us, all antiderivatives of a function are related. Suppose F (x) and G(x) are antiderivatives of f (x) on
some interval. Since F ′ (x) = f (x) = G′ (x), the function
H(x) = F (x) − G(x)
has derivative
H ′ (x) = f (x) − f (x) = 0
on this interval. What functions have derivative equal to zero on an interval? The Mean Value Theorem implies only
the constant function! Hence, there must be a constant C such that F (x) = G(x) + C and we have the following
result.
General form of an antiderivative: If $F$ is an antiderivative of $f$ on an interval $I$, then every antiderivative of $f$ on $I$ is of the form
$$F(x) + C$$
where $C$ is a constant. For this reason, we call $F(x) + C$ the general form of the antiderivative.
Because of this general form, finding the general antiderivative amounts to finding one antiderivative of $f$ and adding an arbitrary constant $C$.

Example 1. Finding general antiderivatives
Find the general antiderivatives of
a. $e^x$
b. $\cos x$
c. $x^5$
Solution.
a. Recall that $\frac{d}{dx} e^x = e^x$. Thus, the general form of the antiderivative is $e^x + C$.
b. Recall that $\frac{d}{dx} \sin x = \cos x$. Hence, the general form of the antiderivative is $\sin x + C$.
c. Recall that $\frac{d}{dx} x^6 = 6x^5$. This is not quite what we want, as we are off by a factor of 6. If we divide both sides of the equation by 6, then
$$\frac{d}{dx}\left(\frac{1}{6} x^6\right) = x^5$$
Thus, the general form of the antiderivative of $x^5$ is $\frac{x^6}{6} + C$.
□
Warning! What we did in c., namely dividing by 6 as we were off by a factor of 6, only worked because 6 is a constant. It doesn’t work in general. For example, suppose we wanted to find an antiderivative of $e^{x^2}$. It would be incorrect to argue that since $\frac{d}{dx} e^{x^2} = 2x e^{x^2}$, we are off by a factor of $2x$ and the antiderivative is $\frac{1}{2x} e^{x^2}$. Indeed, $\frac{d}{dx}\left(\frac{1}{2x} e^{x^2}\right)$ does not equal $e^{x^2}$, as you should verify for yourself!
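One quick way to test a proposed antiderivative is to differentiate it numerically and compare the result with the original function. The following is a minimal Python sketch of that check, using a central difference quotient to approximate the derivative. The helper name `derivative`, the step size, and the test points are illustrative assumptions. A numerical match at a few points does not prove a formula, but a clear mismatch does rule one out.

```python
import math

def derivative(g, x, h=1e-6):
    """Central-difference approximation to g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

# Dividing by the constant 6 works: the derivative of x^6/6 matches x^5.
x = 1.3
print(derivative(lambda t: t**6 / 6, x), x**5)

# Dividing by the non-constant factor 2x does not work:
# the derivative of e^{x^2}/(2x) is visibly different from e^{x^2}.
def candidate(t):
    return math.exp(t**2) / (2 * t)

x = 1.0
print(derivative(candidate, x), math.exp(x**2))
```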
Example 2. Antiderivative of cos(ax)
Find the general form of the antiderivative for $\cos(ax)$ where $a \neq 0$.
Solution. We know $\frac{d}{dx} \sin(ax) = a\cos(ax)$, but this is not quite what we want, since we are off by a factor of $a$. If we divide both sides by $a$, then
$$\frac{1}{a}\,\frac{d}{dx} \sin(ax) = \frac{1}{a}\, a\cos(ax)$$
$$\frac{d}{dx}\, \frac{\sin(ax)}{a} = \cos(ax)$$
Thus, the general form of the antiderivative of $\cos(ax)$ is $\frac{1}{a}\sin(ax) + C$. □
Corresponding to the many rules of differentiation are rules of antidifferentiation. For instance, if F (x) and G(x)
are antiderivatives of f (x) and g(x), respectively, then H(x) = F (x)+ G(x) is an antiderivative of h(x) = f (x)+ g(x).
Indeed, since the derivative of a sum is the sum of the derivatives, we obtain
H ′ (x) = F ′ (x) + G′ (x) = f (x) + g(x) = h(x)
Furthermore, as illustrated in the preceding examples, inverting our work with derivatives also yields formulas for
antiderivatives. Table 5.2 highlights some properties and formulas for antiderivatives.
Combining the antidifferentiation properties and formulas allows us to compute even more antiderivatives, as the
following example illustrates.
Example 3. Using antiderivative rules
Find the general antiderivative of 3x2 + 3x + 7.

Table 5.2: Antiderivative formulas, where $F(x)$ and $G(x)$ are particular antiderivatives of $f(x)$ and $g(x)$, respectively.
Function                     A particular antiderivative
$f(x) + g(x)$                $F(x) + G(x)$
$f(x) - g(x)$                $F(x) - G(x)$
$c\,f(x)$                    $c\,F(x)$
$e^x$                        $e^x$
$\sin x$                     $-\cos x$
$\cos x$                     $\sin x$
$\sec^2 x$                   $\tan x$
$1/x$                        $\ln |x|$
$x^n$ with $n \neq -1$       $\dfrac{x^{n+1}}{n+1}$
Solution. Since an antiderivative of a sum is a sum of antiderivatives, an antiderivative of $3x^2 + 3x + 7$ is the sum of antiderivatives of $3x^2$, $3x$, and $7$. Antiderivatives of $3x^2$, $3x$, and $7$ are $x^3$, $\frac{3}{2}x^2$, and $7x$. Hence, an antiderivative of $3x^2 + 3x + 7$ is $x^3 + \frac{3}{2}x^2 + 7x$, and the general form of the antiderivative is $x^3 + \frac{3}{2}x^2 + 7x + C$ where $C$ is an arbitrary constant. □
To find a particular antiderivative $F(x)$ of $f(x)$ on an interval, we need to know the value of $F(x)$ at a particular value of $x$; this determines the value of the arbitrary constant $C$. If we have this information, then finding the antiderivative is known as an initial value problem.
Example 4. An initial value problem
Find $F(x)$ such that $F(2) = 1$ and $F'(x) = 3x^2 + 3x + 7$.
Solution. From Example 3, the general form of the antiderivative $F(x)$ of $3x^2 + 3x + 7$ is $F(x) = x^3 + \frac{3}{2}x^2 + 7x + C$. To solve for $C$, we solve
$$F(2) = 1$$
$$2^3 + \frac{3}{2} \cdot 2^2 + 7 \cdot 2 + C = 1$$
$$28 + C = 1$$
$$C = -27$$
Thus
$$F(x) = x^3 + \frac{3}{2}x^2 + 7x - 27$$
□
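The step that fixes the constant in an initial value problem is mechanical: evaluate any one antiderivative at the given point and subtract it from the required value. Here is a minimal Python sketch of that step for the example above. The names `G` and `constant_for` are illustrative and not taken from the text.

```python
def G(x):
    """One antiderivative of 3x^2 + 3x + 7 (the member of the family with C = 0)."""
    return x**3 + 1.5 * x**2 + 7 * x

def constant_for(antiderivative, x0, y0):
    """Return the constant C for which antiderivative(x) + C equals y0 at x = x0."""
    return y0 - antiderivative(x0)

C = constant_for(G, 2, 1)

def F(x):
    """The particular antiderivative satisfying F(2) = 1."""
    return G(x) + C

print(C, F(2))
```

The same helper applies to any initial condition $F(x_0) = y_0$ once one antiderivative is known.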
Example 5. Stink bug development
Consider the development of the stink bug from egg to adult as summarized in Table 5.1. If F (x) denotes the
fraction of development completed by time x (in days) and T (x) denotes the temperature at time x, we found that
F ′ (x) = −0.06075 + 0.00112 T (x)
Suppose the temperature is oscillating between 60 and 80 degrees Fahrenheit each day. Then $T(x)$ could be given by
$$T(x) = 70 + 10\cos(2\pi x)$$
where $x$ is measured in days. Substituting $T(x)$ into the expression for $F'(x)$ yields
$$F'(x) = -0.06075 + 0.00112\,(70 + 10\cos(2\pi x)) = 0.01765 + 0.0112\cos(2\pi x)$$
Assuming that F (0) = 0:

a. Find F (x).
b. Use technology to find x such that F (x) = 1; that is, find at what time development is completed.
Solution.
a. An antiderivative of $0.01765$ is $0.01765\,x$. Since an antiderivative of $\cos(2\pi x)$ is $\frac{1}{2\pi}\sin(2\pi x)$, an antiderivative of $0.0112\cos(2\pi x)$ is $\frac{0.0112}{2\pi}\sin(2\pi x)$. Hence,
$$F(x) = 0.01765\,x + \frac{0.0112}{2\pi}\sin(2\pi x) + C$$
To find $C$, we solve
$$F(0) = 0$$
$$0.01765(0) + \frac{0.0112}{2\pi}\sin(2\pi \cdot 0) + C = 0$$
$$C = 0$$
Hence,
$$F(x) = 0.01765\,x + \frac{0.0112}{2\pi}\sin(2\pi x)$$
b. Plotting $F(x)$ as shown below suggests that development is completed in about 57 days.
[Figure: plot of development $F(x)$ against days, for $x$ between 55 and 58 and $F$ between 0.96 and 1.02]
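Part b. can also be handled without a plot. The following is a minimal Python sketch that locates the time at which $F(x) = 1$ by bisection. Bisection is a substitute for reading the value off the graph, not the method used in the text, and the bracket $[0, 100]$ and the tolerance are illustrative assumptions. The approach is safe here because $F'(x) = 0.01765 + 0.0112\cos(2\pi x)$ is always positive, so $F$ is increasing and crosses 1 exactly once.

```python
import math

def F(x):
    """Fraction of development completed by day x (from part a.)."""
    return 0.01765 * x + (0.0112 / (2 * math.pi)) * math.sin(2 * math.pi * x)

def bisect(g, lo, hi, tol=1e-10):
    """Find a zero of g in [lo, hi] by repeatedly halving the interval."""
    if g(lo) * g(hi) > 0:
        raise ValueError("g must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid      # the sign change is in the left half
        else:
            lo = mid      # the sign change is in the right half
    return (lo + hi) / 2

completion_day = bisect(lambda x: F(x) - 1, 0.0, 100.0)
print(completion_day)
```

The computed value should be consistent with the estimate of about 57 days read from the graph.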
Differential Equations and Slope Fields

An equation that involves derivatives is called a differential equation. Thus $F'(x) = f(x)$, which can be written, using Leibniz’s notation, as
$$\frac{dF}{dx} = f(x) \quad \text{or as} \quad \frac{dy}{dx} = f(x)$$
is a differential equation. Finding antiderivatives $F(x)$ of $f(x)$ corresponds to solving this equation. In the next chapter, we discuss differential equations in greater detail. Here we provide an exploratory introduction to the topic by considering a physiological phenomenon known as the Weber-Fechner law. This law describes the expected response of an animal or human subject to a stimulus, such as light or sound. More particularly, the Weber-Fechner law in physiological psychology asserts that when a subject is exposed to a stimulus $S$, the rate of change of the response $R$ with respect to $S$ is inversely proportional to $S$. This statement can be written mathematically as
$$\frac{dR}{dS} = \frac{k}{S}$$
where $k$ is a positive constant to be determined through experiment. One can interpret this equation as saying that if the stimulus $S$ is small, then small changes in the stimulus cause large changes in the response. Alternatively, if the stimulus $S$ is large, then small changes in the stimulus do not change the response much.
Example 6. Solving the Weber-Fechner differential equation
Find the solution to the Weber-Fechner equation
\[ \frac{dR}{dS} = \frac{k}{S}, \qquad k > 0 \]
assuming that a threshold stimulus, S0 > 0, is the lowest level for which a response can be detected: i.e. find R(S)
subject to the threshold condition R(S0 ) = 0.
Solution. This problem requires us to find a function R(S) such that $R'(S) = \frac{k}{S}$ and $R(S_0) = 0$. Taking the general antiderivative of k/S with respect to S yields $R(S) = k \ln S + C$ where C is a constant. Since $R(S_0) = 0$, solving
\[ R(S_0) = k \ln S_0 + C = 0 \]
for C yields
\[ C = -k \ln S_0 \]
Hence, we obtain
\[ R(S) = k \ln S - k \ln S_0 \]
Equivalently,
\[ R(S) = k \ln\frac{S}{S_0} \quad\text{for } S_0 > 0. \]
□
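The solution of Example 6 can be checked numerically. The following sketch compares a central-difference approximation of R′(S) with k/S and confirms the threshold condition. The values k = 2 and S0 = 0.5, the step size h, and the helper name `central_difference` are hypothetical choices made only for this illustration.

```python
import math

k = 2.0    # hypothetical proportionality constant
S0 = 0.5   # hypothetical threshold stimulus

def R(S):
    """Response predicted by the Weber-Fechner law with R(S0) = 0."""
    return k * math.log(S / S0)

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) by the symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Compare the numerical derivative of R with the right-hand side k/S.
for S in (0.5, 1.0, 4.0, 20.0):
    print(S, central_difference(R, S), k / S)
```

In each row the last two numbers agree to many decimal places, and R(S0) evaluates to 0, as the threshold condition requires.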
You may be familiar with particular examples of the Weber-Fechner law. The eye, for instance, senses brightness
logarithmically. Hence stellar magnitude is measured in a logarithmic scale invented by the ancient Greek astronomer
Hipparchus (190-120 B.C.) in about 150 B.C. Another logarithmic scale is the decibel scale of sound intensity. Still
another, discovered by Pythagoras (569-475 B.C.), is the relationship between the pitch or tone produced by a
vibrating string in a violin, piano, guitar, or any other stringed instrument, and the frequency of the vibration.
For some functions it is impossible to come up with an explicit expression for the antiderivative in terms of elementary functions. For example, the functions $f(x) = e^{-x^2}$ and $f(x) = \sin(x^2)$ fall into this category. This may seem a little mysterious. Later in this book, we provide an interpretation of the mathematical statement $\frac{dF}{dx} = e^{-x^2}$ that is of great importance to the empirical sciences (i.e. the science of measuring things). Here we explore how to interpret equations such as $\frac{dF}{dx} = e^{-x^2}$ using computational technologies such as graphing calculators and computers.
In particular, when we look for a function F so that F′(x) = f(x), we can use the fact that the slope of a function y = F(x) at any point (x, y) on its graph is given by the derivative F′(x). We exploit this fact to obtain a “picture” of all slopes F′(x) on the (x, y)-plane. More specifically, using technology we draw a small line segment with slope f(x) at regular intervals in the x and y directions. The collection of these line segments forms what is known as a slope field or direction field of the function f. For example, in the slope field shown in Figure 5.4 for $F'(x) = \sin(x^2)$, we draw small horizontal line segments (slope 0) at x = 0 since F′(0) = sin(0) = 0. These horizontal segments correspond to tangent lines of the antiderivatives at x = 0. The graph of any antiderivative F(x) is tangent to the line segments through which it passes. Two such antiderivatives are found in the next example.

Figure 5.4: The slope field for $F'(x) = \sin(x^2)$
Example 7. Antiderivatives with slope fields
Use technology to sketch the slope field for
\[ F'(x) = \sin(x^2) \]
for 0 ≤ x ≤ 3 and −2 ≤ y ≤ 2. Sketch by hand the antiderivatives F(x) satisfying F(0) = 0 and F(0) = −2, respectively.
Solution. Using technology, one can generate the slope field illustrated in Figure 5.4. Sketching an antiderivative F(x) satisfying F(0) = 0 is tantamount to sketching a curve that passes through the point (0, 0) and remains tangent to the line segments with arrows. Doing so yields the higher curve illustrated in Figure 5.5. The other curve in this figure corresponds to an antiderivative F(x) with F(0) = −2 (that is, one that passes through (0, −2)). Notice that the graphs of these antiderivatives are vertical translations of one another.
□
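One possible way to produce the data behind a slope field such as the one in Example 7 is sketched below. The code computes the endpoints of short segments of slope sin(x²) on a grid, and it approximates the two antiderivatives with the trapezoid rule. The trapezoid rule stands in for the hand sketch and is our substitution, not the method of the example; the grid spacing, the segment length, and the function names are likewise assumptions. The segments could be passed to any plotting tool.

```python
import math

def f(x):
    """Right-hand side of F'(x) = sin(x^2)."""
    return math.sin(x ** 2)

def slope_segments(x_values, y_values, length=0.2):
    """Endpoints of segments of the given length and slope f(x), centered at each grid point."""
    segments = []
    for x in x_values:
        m = f(x)
        dx = 0.5 * length / math.sqrt(1 + m * m)   # half-width of the segment
        dy = m * dx                                # half-height of the segment
        for y in y_values:
            segments.append(((x - dx, y - dy), (x + dx, y + dy)))
    return segments

def antiderivative(x, y0, steps=1000):
    """Approximate F(x) with F(0) = y0 by the trapezoid rule on [0, x]."""
    h = x / steps
    total = 0.5 * (f(0.0) + f(x))
    for i in range(1, steps):
        total += f(i * h)
    return y0 + h * total

xs = [0.25 * i for i in range(13)]      # 0, 0.25, ..., 3
ys = [-2 + 0.5 * j for j in range(9)]   # -2, -1.5, ..., 2
segments = slope_segments(xs, ys)
upper = [antiderivative(x, 0.0) for x in xs]    # curve through (0, 0)
lower = [antiderivative(x, -2.0) for x in xs]   # curve through (0, -2)
```

The lists `upper` and `lower` differ by exactly 2 at every grid point, which is the vertical translation noted at the end of Example 7.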
Rectilinear Motion
We can use antiderivatives to understand the motion of an object along a straight line. The acceleration a(t) of an
object at time t equals the rate of change of its velocity v(t). Thus, velocity is an antiderivative of acceleration, i.e.
v ′ (t) = a(t). Similarly, the position s(t) of the object is an antiderivative of its velocity i.e. s′ (t) = v(t).
These definitions bring us to the peregrine falcon shown on the opening page of this chapter (Figure 5.1). The
peregrine falcon is arguably the fastest animal in the world. If you do a search for this falcon on the Internet, you
may encounter the following quotation or something like it:
Some birds of prey soar or hover in the sky and others have evolved short wings for quick, darting flights
in forested country. The peregrine’s speed and size make it an excellent hunter, able to take some of the
larger birds. The long-winged raptor specializes in direct pursuit in the open and thus favors non-forested
areas in which to hunt, particularly shores, marshes, river valleys, open moors, and tundra. Even though
its level speed of flight exceeds that of most birds, the peregrine takes advantage of height from which
to launch its attack. The top speed of its dives (stoops) at prey is estimated at well over 300 km/h. A
peregrine is a hurtling wedge of streamlined feathers, its feet lying back against the tail and wings half-
closed. At such speeds it delivers a fierce blow to the prey with a half-closed foot, the usual method of
disabling or killing medium-sized and large prey. (Source: Hinterland Who’s Who at http://www.hhw.ca)

Figure 5.5: Antiderivatives of $F'(x) = \sin(x^2)$
Thus a peregrine falcon stooping at 300 km/h travels at more than three times the speed limit in many states! Just how long a drop the peregrine needs to achieve this speed is the subject of the following problem.
Example 8. Stooping peregrines
Assuming that the peregrine falcon's downward acceleration is due to gravity alone and equals 9.8 m/s², determine how far a peregrine falcon would have to free fall to achieve a speed of 300 km/h.
Solution. Let v(t) denote the velocity at t seconds after a peregrine falcon has begun its stoop. Assuming its
acceleration is purely due to gravity, we have
\[ \frac{dv}{dt} = 9.8 \ \text{m/s}^2 \]
To solve for v, we antidifferentiate:
v(t) = 9.8 t + C
where C is a constant. Since the peregrine has no downward velocity at the beginning of its stoop, we have v(0) = 0.
Hence, 0 = v(0) = C and
v(t) = 9.8 t
To find the position s(t) of the falcon at time t, we have
\[ \frac{ds}{dt} = v = 9.8\,t \]
Here, s(t) describes the vertical distance (in meters) from the initial position of the falcon to its position at time t. Antidifferentiating yields $s(t) = 4.9\,t^2 + C$ where C is some constant. Since s(0) = 0, we obtain 0 = s(0) = C and
\[ s(t) = 4.9\,t^2 \]
Next, we need to determine how many seconds the peregrine falcon needs to fall to achieve a speed of 300 km/h.
Begin by converting 300 km/h to m/s:
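\[ \frac{300 \text{ km}}{1 \text{ h}} \cdot \frac{1000 \text{ m}}{1 \text{ km}} \cdot \frac{1 \text{ h}}{3600 \text{ s}} = 83\tfrac{1}{3} \ \text{m/s}. \]
Thus, to find the desired time, we solve
\[ v = 83\tfrac{1}{3} = 9.8\,t \]
for t to obtain t ≈ 8.5 seconds. Hence, the distance fallen to achieve a speed of 300 km/h is approximately
\[ s(8.5) = 4.9\,(8.5)^2 \approx 354 \]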
The peregrine falcon would need to free fall about 354 meters.
Reality Check. Recently, scientists have accurately measured speeds achieved by peregrines during stooping. One has been logged by radar at 183 km/h (≈ 114 mi/h) after a dive of 305 m (≈ 1,000 ft). This is considerably slower than the speed of close to 300 km/h that our current model would predict for a dive of this length. One of the most important reasons for this discrepancy is that we ignored air resistance in our calculations. This shortcoming can be addressed with a slightly more complicated differential equation.
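The arithmetic of Example 8 can be reproduced in a few lines. The sketch below assumes, as the example does, constant acceleration of 9.8 m/s² with no air resistance; the variable and function names are ours.

```python
g = 9.8                              # acceleration due to gravity, m/s^2
target_speed = 300 * 1000 / 3600     # 300 km/h converted to m/s

def velocity(t):
    """v(t) = 9.8 t, the antiderivative of the acceleration with v(0) = 0."""
    return g * t

def position(t):
    """s(t) = 4.9 t^2, the antiderivative of the velocity with s(0) = 0."""
    return 0.5 * g * t ** 2

t_needed = target_speed / g          # solve v(t) = target speed for t
distance = position(t_needed)
print(round(t_needed, 1), round(distance))   # prints 8.5 354
```

Because the time is not rounded to 8.5 seconds before squaring, the unrounded distance is slightly above 354 m, but it rounds to the same value found in the example.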
Problem Set 5.1

Find the general antiderivative of the functions f shown in Problems 1 to 22.
1. f (x) = 2
2. f (x) = 4
3. f (x) = 2x + 3
4. f (x) = 4 − 5x
5. $f(x) = 6x^4$
6. $f(x) = 2x^{-4}$ for x > 0
7. $f(x) = 2x^2 - 5$
8. $f(t) = 4t + 4t^2$
9. $f(t) = 8t^3 + 15t$
10. $f(x) = \frac{1}{2x}$ for x > 0
11. $f(u) = \frac{5}{u^2}$ for u > 0
12. $f(x) = \frac{2x}{5x^2}$ for x > 0

13. f (x) = cos x

14. f (x) = 4 sin(5x)
15. f (x) = 3 sin(2πx)
16. $f(x) = 14e^x$
17. $f(x) = 3e^x$
18. $f(\theta) = \sec^2\theta$ for −π/2 < θ < π/2
19. $f(x) = x^{3/2} + x^{1/2} + x^{-1}$ for x > 0
20. $f(u) = u^3 - 2u + \sqrt{u}$
21. f (u) = 6u + 3 cos u
22. f (x) = 5x − 4 sin x
Find the antiderivative F (x) of the functions shown in Problems 23 to 28 satisfying the indicated initial condition.
23. f (x) = 2 with F (0) = 1.

24. f (x) = 4 with F (1) = −1.
25. f (x) = 2x + 3 with F (−3) = 0.
26. f (x) = 4 − 5x with F (0) = 4.
27. $f(x) = 6x^4$ with F(1) = −2.
28. $f(x) = 2x^{-4}$ for x > 0 with F(2) = 0.
29. a. If F ′ (x) = 1 − 4x, find F so that F (1) = 0.
b. Sketch the graphs of y = F (x), y = F (x) − 2, and y = F (x) + 4.
c. Find a constant C so that the largest value of G(x) = F (x) + C is 0.
30. a. If F ′ (x) = 2x − 1, find F so that F (2) = 0.
b. Sketch the graphs of y = F (x), y = F (x) − 2, and y = F (x) + 4.
c. Find a constant C so that the largest value of G(x) = F (x) + C is 0.
The slope F ′ (x) at each point on a graph is given in Problems 31 to 34 along with one point (x0 , y0 ) on the graph.
Use this information to find F graphically.
31. $F'(x) = x^2 + 3x$ with point (0, 0).
32. $F'(x) = (2x - 1)^2$ with point (1, 3)
33. slope $x + e^x$ with point (0, 2)
34. slope $\frac{x^2 - 1}{x^2 + 1}$ with point (0, 0)
35. Sketch a slope field for
\[ \frac{dy}{dx} = x \]
for −2 ≤ x ≤ 2 and 0 ≤ y ≤ 5. Over this slope field, sketch the antiderivative F(x) of x which satisfies F(0) = 1.
36. Sketch a slope field for
\[ \frac{dy}{dx} = 3x^2 \]
for −5 ≤ x ≤ 5 and −5 ≤ y ≤ 5. Over this slope field, sketch the antiderivative F(x) of $3x^2$ which satisfies F(0) = 0.
37. Sketch a slope field for
\[ \frac{dy}{dx} = \cos x \]
for −π ≤ x ≤ π and −2 ≤ y ≤ 2. Over this slope field, sketch the antiderivative F(x) of cos x which satisfies F(0) = 1.
38. Sketch a slope field for
\[ \frac{dy}{dx} = x \sin(\pi x) \]
for −1 ≤ x ≤ 1 and −2 ≤ y ≤ 2. Over this slope field, sketch the antiderivative F(x) of x sin(πx) which satisfies F(−1) = 0.
39. Find the general antiderivative of sin(ax) where a ≠ 0.
40. Find the general antiderivative of $e^{kx}$ where k ≠ 0.
41. As discussed in Example 5, the developmental rate of a stink bug as a function of temperature T is −0.06075 +
0.00112T . Assume that the temperature of a typical spring in Davis, California, x days after the start of the
stink bug development period, is adequately modeled by the function
T (x) = 80 + 10 cos(2πx)
a. Find the function F (x) describing the amount of development completed by day x assuming that
F (0) = 0.
b. Estimate at what time a stink bug has completed development.
42. Recall that the developmental rate of a stink bug as a function of temperature T is −0.06075 + 0.00112T .
Assume that the temperature of an atypical day in Davis, California, x days after the start of the stink bug
development period, is adequately modeled by the function
T (x) = 80 + x + 10 cos(2πx)
a. Find the function F (x) describing the amount of development completed by day x assuming that
F (0) = 0.
b. Estimate at what time a stink bug has completed development.
43. Entomologists Godfrey and Anderson∗ studied the developmental rates of the Hydrilla tuber weevil, a species that consumes a weed found in ponds and waterways. As illustrated in Figure 5.6, the developmental rate as a function of temperature T (in degrees Celsius) is given by
developmental rate = −0.0582211 + 0.00417376 T
a. Suppose that the temperature in °C is given by the function T(t) = 30 + 10 sin(2πt) where t is measured in days.
b. Estimate how many days it takes the weevil to develop to adulthood.

Figure 5.6: Developmental rate as a function of temperature (in degrees Celsius)
44. Assume that the temperature in Problem 43 is given by the function $T(t) = 50 + \frac{1}{1+t}$. Using the developmental rate for the tuber weevil presented in that problem, estimate how many days it takes the weevil to complete half of its development.
45. A peregrine falcon stoops for 305 meters. Assuming a constant acceleration of 9.8 m/s², find its speed at the end of the stoop.
46. Suppose a food package is dropped out of a balloon which is 100 ft above the ground and ascending at a rate
of 10 ft/s. Determine how long it takes the package to hit the ground.
47. Apollo 15 astronaut David Scott dropped a hammer and feather on the moon to demonstrate that in a vacuum all objects fall at the same rate. He dropped both items from a height of approximately 4 ft. How long did it take them to hit the ground? (Acceleration on the moon due to gravity is −5.2 ft/s².) How long would it take for a hammer to hit the ground on earth if dropped from a height of 4 ft? (Gravitational acceleration on the earth is −32 ft/s².)
48. Assume the brakes of a certain automobile produce a constant deceleration of 22 ft/s². If the car is traveling at 60 mi/h (88 ft/s) when the brakes are applied, how far will it travel before coming to a complete stop?
49. It is estimated that t months from now, the population of Ferndale, California will be changing at the rate of $4 + 5t^{2/3}$ people per month. If the current population is 2,000, what will the population be 8 months from now?
50. A hypothetical study of a community suggests that t years from now the level of carbon monoxide in the air
will be changing at the rate of 0.1t + 0.1 ppm/yr. If the current level of carbon monoxide in the air is 3.4 ppm,
what will be the level 3 years from now?
51. One of Poiseuille’s laws for the flow of blood in an artery says that if v(r) is the velocity of flow r cm from the
central axis of the artery, then the velocity decreases at a rate proportional to r. That is, v ′ (r) = a r where a
is a positive constant. Find an expression for v(r) assuming that v(R) = 0 where R is the radius of the artery.
52. Suppose that a silviculturalist finds that a certain type of tree grows in such a way that its height h(t) t years after planting is changing at the rate of
\[ h'(t) = 0.2\,t^{2/3} + \sqrt{t} \ \text{ft/yr} \]
If the tree was 2 ft tall when it was planted, how tall will it be in 27 years?
53. Suppose that a woman, driving a sports car down a straight road at 60 mi/h (88 ft/s), sees a cow start to cross the road 200 feet ahead. She takes 0.7 seconds to react to the situation before hitting the brakes, which decelerate the car at the rate of 28 ft/s². Does she stop in time to avoid hitting the cow?
∗ Godfrey, K. E. and Anderson, L. W. J. Developmental rates of Bagous affinis at constant temperatures. Florida Entomologist. 77
(1994), 516–519

54. A hypothetical population, N , grows in such a way that at time t (years), the growth rate is given by
\[ \frac{dN}{dt} = 0.15\,t + \cos t + 0.7 \sin t \]
where N (t) is measured in thousands of individuals and N (0) = 5.
a. Find N (t).
b. What is the minimum population? When does it occur?

5.2 Accumulated Change and Area under a Curve

The invention of calculus provided tools for addressing the major scientific problems of the 17th century. These fell essentially into one of the following four categories:
1. Finding the distance, velocity, or acceleration of objects using Newton's laws of motion.
2. Finding the tangent lines to curves.
3. Finding the maximum and minimum values of functions.
4. Finding the lengths of curves (e.g. the trajectory of a planet), the area under a curve, or the volume contained inside a geometric object such as a sphere.
In this section, we deal with the problem of finding the area under a curve, which we shall see has many applica-
tions. The early Greeks, particularly Archimedes (287-212 BC) estimated the areas and volumes of geometric objects
using the “method of exhaustion,” a precursor to integral calculus: they found better and better approximations by
filling in areas or volumes with smaller and smaller elements of known area or volume (much as we do later in this
section in taking Riemann sums).
In elementary school you learned formulas for areas of squares, triangles, and other polygons. You also are familiar
with the formula for the area of a circle with radius r: $A = \pi r^2$. The Egyptians were the first to use this formula over
5,000 years ago, but the Greeks derived the area of a circle by drawing inscribed polygons or circumscribed polygons
and then using triangles to find the area of those polygons as an approximation, as shown in Figure 5.7. This method,
called the method of exhaustion, involves finding the area of a circle by inscribing polygons with increasing numbers
of sides (Archimedes stopped at a 96-sided polygon). The area of the circle is the limit of the areas of the inscribed
polygons as the number of polygonal sides increases.
Figure 5.7: Using the limit of a sequence of inscribed polygons to find the area of a circle
In this section, we focus on estimating the area under a curve y = f (x) over an interval a to b. As illustrated in
Figure 5.8, this means estimating the area defined by the region bounded by the curves y = f (x) (with f (x) ≥ 0 on
[a, b]), x = a, x = b, and y = 0. Similar to the method of exhaustion, we find these areas by approximating them
with collections of finer and finer rectangles.
To motivate finding the area under a curve, we show that the area under a curve corresponds to accumulated change. We continue with an explicit calculation of the area under $y = x^2$ over [0, 1], and conclude this section by generalizing the process.
Figure 5.8: Area under a curve y = f(x)
Degree-days
Plants and insects often require a certain amount of heat to develop from one stage in their life cycle to the next. This measure of accumulated heat is known as physiological time and the units used are called degree-days, the accumulated product of time and temperature between the organism's lower and upper developmental thresholds. To simplify the presentation for now, we assume that the temperature remains between the lower and upper developmental thresholds. The more general case is considered later. One degree-day is one day (24 hours) with the temperature one degree above the lower developmental threshold. For example, if the lower developmental threshold of the organism is 47°F and the temperature remains at 48°F for one day or at 47.2°F for five days, then in each case, one degree-day is accumulated (i.e. 1 × (48 − 47) = 5 × (47.2 − 47) = 1). The concept of degree-days is used widely in agriculture and developmental biology. For example, at the integrated pest management web site of the University of California at Davis∗ you can find the following types of statements:
The number of degree-days required for sweet corn to mature for the fresh market is 1,539 degree-days,
for pistachio shells to harden is 1,197 degree-days, and for corn earworm (a pest of corn) to mature from
egg to adult is 760.1 degree-days. Moreover, the lower developmental thresholds for sweet corn, corn
earworm, and pistachio are 50°F, 54.7°F, and 50°F, respectively.
Using these statements and information about the temperature, we can estimate the time it takes sweet corn to
mature or the time at which one can harvest a crop.
Example 1. Degree-days under constant temperature
According to the University of California, Davis' Integrated Pest Management website, the lower developmental threshold of Thompson Seedless grapevines is 50°F, and approximately 3,000 degree-days are required for the fruit to mature. If the temperature were to remain constant at 70°F, which is assumed to be below the upper developmental threshold, how long would it take for the fruit to mature?
Solution. The number of degree-days accumulated in x days is
\[ (70 - 50) \cdot x = 20\,x \]
Solving 20x = 3,000 yields x = 150 days. Therefore, it would take 150 days for the grapes to mature. Notice that this answer can be interpreted as the following shaded rectangular area:
[Figure: shaded rectangle; horizontal axis in days, vertical axis in degrees]
∗ http://www.ipm.ucdavis.edu/MODELS

Unlike in the preceding example, temperatures in the field vary continuously. Consequently, computing the accumulation of degree-days, as we shall see, requires finding the area of an appropriate region defined by the temperature curve and the lower developmental threshold, assuming that the temperature never goes high enough to reach the upper developmental threshold. More specifically, suppose the temperature in Lincoln, Nebraska is given by f(t)°F where t is time in days after June 23, 2006 as illustrated in Figure 5.9a. If the organism of interest (say sweet corn) has a lower developmental threshold of 50°F, then how do we compute the accumulated degree-days over a single day? Consider a small interval of time from t to t + ∆t. Over this interval, the temperature remains relatively constant at f(t)°F. Hence, the accumulated degree-days over this time interval is approximately (f(t) − 50)°F multiplied by ∆t days, i.e. the area of a rectangle of height f(t) − 50 and width ∆t. Since the accumulated degree-days over the whole day is the sum of the degree-days accumulated over all parts of the day, this argument suggests that the total accumulated degree-days is given by the area between the curves y = f(t) and y = 50 from t = 0 to t = 1 as illustrated in Figure 5.9b. We explore this idea further in the next example.
Figure 5.9: In (a), the temperature in Lincoln, Nebraska on June 23, 2006. In (b), the shaded area corresponds to the accumulated degree-days for an organism with a lower developmental threshold of 50°F.
Example 2. Sweet corn in Nebraska
Estimate the accumulation of degree-days for sweet corn in Lincoln, Nebraska on June 23, 2006 using the following table, which reports the temperature at two-hour intervals:
Hour     Temperature (°F)     Excess above 50°F
0        65.8                 15.8
2        62.2                 12.2
4        62.2                 12.2
6        59.5                 9.5
8        67.8                 17.8
10       73.4                 23.4
12       79.7                 29.7
14       82.8                 32.8
16       83.8                 33.8
18       82.8                 32.8
20       78.8                 28.8
22       70.5                 20.5
24       67.3                 17.3
Total                         286.6

Solution. To approximate the number of degree-days that have accumulated, break up the day into two-hour intervals (i.e. intervals of width $\frac{1}{12}$ day). Within each interval, let us assume that the temperature is relatively constant. Then the accumulated degree-days within the first interval [0, 1/12] days is given by
\[ (65.8 - 50) \cdot \frac{1}{12} = \frac{15.8}{12} \]
This quantity corresponds to the area of a rectangle with height 65.8 − 50 = 15.8 and width $\frac{1}{12}$ days as illustrated in Figure 5.10. To estimate the total accumulation of degree-days, we compute the accumulated degree-days for each interval of length $\frac{1}{12}$ day and add them up:
\[ \text{Accumulated degree-days} \approx \left[(65.8 - 50) + (62.2 - 50) + (62.2 - 50) + \cdots + (70.5 - 50) + (67.3 - 50)\right] \cdot \frac{1}{12} = 286.6 \cdot \frac{1}{12} \approx 23.9 \]
This sum corresponds to the sum of the areas of rectangles as shown in Figure 5.10. □
Figure 5.10: Accumulated degree-days approximated by the area of rectangles with width $\frac{1}{12}$ days.
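The computation in Example 2 is easy to repeat with a short script. The sketch below uses the temperatures from the table and computes three totals: the sum over all 13 readings used in the example, and, as variations that are our own and not part of the example, the sums that use exactly one reading per two-hour interval (the left or the right end point of each of the 12 intervals). All variable names are assumptions made for illustration.

```python
# Temperatures (°F) at hours 0, 2, ..., 24 on June 23, 2006, from the table.
temperatures = [65.8, 62.2, 62.2, 59.5, 67.8, 73.4, 79.7,
                82.8, 83.8, 82.8, 78.8, 70.5, 67.3]
threshold = 50.0     # lower developmental threshold for sweet corn (°F)
width = 1 / 12       # two hours, measured in days

excess = [T - threshold for T in temperatures]

all_readings = sum(excess) * width       # the total of 286.6 used in the example
left_sum = sum(excess[:-1]) * width      # one reading per interval, left end points
right_sum = sum(excess[1:]) * width      # one reading per interval, right end points

print(round(all_readings, 1), round(left_sum, 1), round(right_sum, 1))
```

The first number reproduces the estimate of about 23.9 degree-days. The other two are somewhat smaller because each drops one of the 13 readings, so that exactly 12 rectangles of width 1/12 day cover the day.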
The Black Plague

In epidemiology, scientists keep track of various rates associated with disease including the incidence rate that
measures the number of new disease cases per unit time (e.g. day or week) and the mortality rate that reports the
number of deaths due to the disease per unit of time. For instance, during the outbreak of the Black Plague in
Bombay in 1905–1906, the weekly mortality rate due to the plague was recorded and the values obtained are plotted
in Figure 5.11.
In a landmark paper ∗ , two mathematicians, W. O. Kermack and A. G. McKendrick, showed that this data could
be reasonably well fitted by the function
\[ f(t) = 890\,\operatorname{sech}^2(0.2\,t - 3.4) \ \text{deaths/week} \]
where t is measured in weeks and $\operatorname{sech}(x) = \frac{2}{e^x + e^{-x}}$. If we want to estimate the total number of deaths using this
function, what do we need to compute? Consider a small interval of time, from t to t + ∆t. Since the mortality
rate over this interval is given approximately by f (t), the number of deaths over this time interval is approximately
f (t) · ∆t, i.e. the area of a rectangle of width ∆t and height f (t). Notice how the units work out in this product:
f(t) has units of deaths/week and ∆t has units of weeks. The product f(t)∆t has units of deaths. These arguments suggest that the area under the curve f(t) from t = 0 to t = 30 should give the total number of deaths, which we investigate further now.
∗ “A contribution to the mathematical theory of epidemics” by W. O. Kermack and A. G. McKendrick, Proceedings of the Royal Society of London, Series A, 115 (1927), 700–721.
Figure 5.11: Incidence of deaths from the black plague on the island of Bombay from December 17, 1905 to July 21, 1906 (deaths/week against weeks).
Example 3. Mortality due to the black death
Approximate the total number of deaths in Bombay from t = 0 to t = 30 using intervals of 5 weeks.
Figure 5.12: Using the data in Figure 5.11 to approximate the number of deaths.
Solution. Begin by breaking the interval from t = 0 to t = 30 into six subintervals of length 5, as shown in
Figure 5.12. For the mortality rate in each interval, we can evaluate f (t) at the right end point of each interval. This
yields the following table (entries rounded to one decimal place):
Interval     Deaths/week (height of rectangle)     Deaths (area of rectangle)
[0, 5]       f(5) ≈ 28.8                           144
[5, 10]      f(10) ≈ 192.4                         962
[10, 15]     f(15) ≈ 761.5                         3,807.5
[15, 20]     f(20) ≈ 633.3                         3,166.5
[20, 25]     f(25) ≈ 134.0                         670
[25, 30]     f(30) ≈ 19.4                          97
Summing up the deaths yields 8,847 deaths, which is just over 2% short of the recorded number of 9,043 deaths. This approximation is illustrated in Figure 5.12. □
From Figure 5.12 we notice that when the curve is rising, as over the first three rectangles, the area is overestimated (the green area above the curve), and when the curve is falling, the area is underestimated (the white area below the curve). This is a result of the height of each rectangle being defined by the value of the function at the right end of its interval. The reverse would be true if the height of each rectangle were defined by the value of the function at the left end of its interval.
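The estimate in Example 3, and the effect of switching from right to left end points, can be reproduced as follows. This is a minimal sketch: the function names `mortality_rate` and `endpoint_sum` are ours, and sech is computed as 1/cosh, which is equivalent to the formula given above.

```python
import math

def mortality_rate(t):
    """Kermack-McKendrick fit: 890 sech^2(0.2 t - 3.4) deaths per week."""
    return 890 / math.cosh(0.2 * t - 3.4) ** 2

def endpoint_sum(f, a, b, n, endpoint="right"):
    """Sum of the areas of n rectangles whose heights are taken at the left or right end points."""
    dx = (b - a) / n
    offset = 1 if endpoint == "right" else 0
    return sum(f(a + (i + offset) * dx) for i in range(n)) * dx

right_estimate = endpoint_sum(mortality_rate, 0, 30, 6, "right")
left_estimate = endpoint_sum(mortality_rate, 0, 30, 6, "left")
print(round(right_estimate), round(left_estimate))
```

The right end point estimate reproduces the total of about 8,847 deaths. The left end point estimate is lower, because it replaces the height f(30) of the last rectangle with the smaller height f(0) of the first.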
The Area Problem

The previous examples illustrate the importance of finding areas under curves. These examples also showed us that
we can approximate areas by approximating the region with rectangles, computing the area of each rectangle, and
summing up the areas. This observation is the key that unlocks the area problem. We pursue this approach with the following example.
Example 4. Estimating the area under a curve
Consider the function $f(x) = x^2$ over the interval [0, 1]. Use rectangles to find upper and lower bounds for the area under $y = x^2$, above the x-axis, between x = 0 and x = 1.
Solution. Let A denote the area under y = f(x), above y = 0, and between the lines x = 0 and x = 1, as shown in Figure 5.13.
Figure 5.13: The area A under $y = x^2$ on [0, 1]
Now, we estimate the area A by taking successive approximations. Notice that the largest value of $x^2$ on the interval [0, 1] is 1 at x = 1. Hence, the region under $x^2$ is contained in a rectangle of height 1 and width 1. Thus, A < 1 · 1 = 1. On the other hand, A is clearly greater than 0. To obtain a better estimate, subdivide the interval [0, 1] into two subintervals [0, 1/2] and [1/2, 1], each with width ∆x = 1/2, as shown in Figure 5.14.
The greatest values that $x^2$ takes on these subintervals are f(1/2) = 1/4 and f(1) = 1. Hence, the two rectangles over the intervals [0, 1/2] and [1/2, 1] with heights 1/4 and 1, respectively, enclose our region. Therefore, $A < \frac{1}{4}\cdot\frac{1}{2} + 1\cdot\frac{1}{2} = \frac{5}{8}$. Similarly, the minimum values of $x^2$ on [0, 1/2] and [1/2, 1] are 0 and 1/4, respectively. Therefore, $A > 0\cdot\frac{1}{2} + \frac{1}{4}\cdot\frac{1}{2} = \frac{1}{8}$.
Figure 5.14: Left and right sum approximations of the area under $y = x^2$: (a) estimate using greatest values; (b) estimate using least values
Since subdividing the interval once improved our estimates, more subdivisions should improve them further. Suppose we divide the interval into n subintervals $[0, \frac{1}{n}], [\frac{1}{n}, \frac{2}{n}], \ldots, [\frac{n-1}{n}, 1]$ of width $\frac{1}{n}$. Since $x^2$ is an increasing function on the interval [0, 1], the maximum values of $f(x) = x^2$ on these subintervals are $\frac{1^2}{n^2}, \frac{2^2}{n^2}, \ldots, \frac{n^2}{n^2}$. The area of the n rectangles determined by these heights is given by
\begin{align*}
R_n &= f\!\left(\frac{1}{n}\right)\frac{1}{n} + f\!\left(\frac{2}{n}\right)\frac{1}{n} + \cdots + f\!\left(\frac{n}{n}\right)\frac{1}{n} \\
    &= \frac{1^2}{n^2}\cdot\frac{1}{n} + \frac{2^2}{n^2}\cdot\frac{1}{n} + \cdots + \frac{n^2}{n^2}\cdot\frac{1}{n}
\end{align*}
which is greater than A. Since the minimum values of $x^2$ on these subintervals of width $\frac{1}{n}$ are $0, \frac{1^2}{n^2}, \ldots, \frac{(n-1)^2}{n^2}$, A is greater than
\begin{align*}
L_n &= f\!\left(\frac{0}{n}\right)\frac{1}{n} + f\!\left(\frac{1}{n}\right)\frac{1}{n} + \cdots + f\!\left(\frac{n-1}{n}\right)\frac{1}{n} \\
    &= 0\cdot\frac{1}{n} + \frac{1^2}{n^2}\cdot\frac{1}{n} + \cdots + \frac{(n-1)^2}{n^2}\cdot\frac{1}{n}
\end{align*}
Thus,
\[ L_n < A < R_n \]
Before we continue, let us take a particular value for n, say n = 4, as shown in Figure 5.15.
Figure 5.15: Estimating the area A using four subintervals: (a) estimate using greatest values; (b) estimate using least values
We now carry out the estimation:
\[ L_4 < A < R_4 \]
\[ \frac{0^2}{4^2}\cdot\frac{1}{4} + \frac{1^2}{4^2}\cdot\frac{1}{4} + \frac{2^2}{4^2}\cdot\frac{1}{4} + \frac{3^2}{4^2}\cdot\frac{1}{4} < A < \frac{1^2}{4^2}\cdot\frac{1}{4} + \frac{2^2}{4^2}\cdot\frac{1}{4} + \frac{3^2}{4^2}\cdot\frac{1}{4} + \frac{4^2}{4^2}\cdot\frac{1}{4} \]
\[ 0.21875 < A < 0.46875 \]
Since computing these quantities by hand for large n is tedious, one option is to use technology to compute these sums for us (in this table, entries have been calculated to a precision of 10 decimal places, but trailing zeros have been omitted):

n Ln Rn
1 0.0 1.0
2 0.125 0.625
3 0.1851851852 0.5185185185
4 0.21875 0.46875
5 0.24 0.44
10 0.285 0.385
100 0.32835 0.33835
1,000 0.3328335 0.3338335
10,000 0.333283335 0.333383335
100,000 0.3333283334 0.3333383334
The sums in the table suggest that as n becomes large, $L_n$ and $R_n$ both converge to $\frac{1}{3}$.
□
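A table like the one in Example 4 can be generated with a few lines of code. The sketch below is one way to do it; the function names `lower_sum` and `upper_sum` are ours, and ordinary floating-point arithmetic is assumed, so the last printed digits may differ slightly from the table.

```python
def lower_sum(n):
    """L_n: rectangles of width 1/n with heights (i/n)^2 for i = 0, ..., n - 1."""
    return sum((i / n) ** 2 for i in range(n)) / n

def upper_sum(n):
    """R_n: rectangles of width 1/n with heights (i/n)^2 for i = 1, ..., n."""
    return sum((i / n) ** 2 for i in range(1, n + 1)) / n

for n in (1, 2, 3, 4, 5, 10, 100, 1000):
    print(n, lower_sum(n), upper_sum(n))
```

For every n the two sums differ by exactly 1/n, the area of the last rectangle in the upper sum, which is why both columns close in on the same number as n grows.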
Example 4 suggests that the area under $x^2$ over the interval [0, 1] is $\frac{1}{3}$. But how can we REALLY be sure that these numbers converge to $\frac{1}{3}$? We prove this with the next example.
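Example 5. Finding the exact area under $x^2$
Use the formula (which can be proved by induction)
\[ 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \]
to prove that
\[ \lim_{n\to\infty} R_n = \frac{1}{3} \]
Solution. We have that
\begin{align*}
R_n &= \frac{1^2}{n^2}\cdot\frac{1}{n} + \frac{2^2}{n^2}\cdot\frac{1}{n} + \cdots + \frac{n^2}{n^2}\cdot\frac{1}{n} && \text{From Example 4.} \\
    &= 1^2\,\frac{1}{n^2}\,\frac{1}{n} + 2^2\,\frac{1}{n^2}\,\frac{1}{n} + \cdots + n^2\,\frac{1}{n^2}\,\frac{1}{n} \\
    &= \frac{1}{n^3}\left(1^2 + 2^2 + \cdots + n^2\right) \\
    &= \frac{1}{n^3}\cdot\frac{n(n+1)(2n+1)}{6} && \text{Given induction formula.} \\
    &= \frac{(n+1)(2n+1)}{6n^2} \\
    &= \frac{2n^2 + 3n + 1}{6n^2} \\
    &= \frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2}
\end{align*}
Thus,
\begin{align*}
\lim_{n\to\infty} R_n &= \lim_{n\to\infty}\left(\frac{1}{3} + \frac{1}{2n} + \frac{1}{6n^2}\right) \\
                      &= \frac{1}{3}
\end{align*}
Similarly (see Problem 19), it can be shown that $\lim_{n\to\infty} L_n = \frac{1}{3}$. Since $L_n \le A \le R_n$ for all n ≥ 1, it follows that
\[ A = \lim_{n\to\infty} R_n = \lim_{n\to\infty} L_n = \frac{1}{3} \]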
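The algebra in Example 5 can be spot-checked with exact rational arithmetic. The sketch below compares the sum defining $R_n$ with the closed form 1/3 + 1/(2n) + 1/(6n²) for several values of n. It is a check for particular n, not a substitute for the proof, and the function names are ours.

```python
from fractions import Fraction

def R_direct(n):
    """R_n computed term by term as the sum of (i^2 / n^2) * (1 / n) for i = 1, ..., n."""
    return sum(Fraction(i * i, n * n) * Fraction(1, n) for i in range(1, n + 1))

def R_closed(n):
    """R_n from the closed form derived in Example 5."""
    return Fraction(1, 3) + Fraction(1, 2 * n) + Fraction(1, 6 * n * n)

for n in (1, 2, 4, 10, 100):
    print(n, R_direct(n), R_closed(n), R_closed(n) - Fraction(1, 3))
```

The two expressions agree exactly for each n, and the last column, the gap between $R_n$ and 1/3, shrinks toward zero as n increases.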

Example 5 provides the core idea of how to define the area above the x-axis and under a positive function y = f (x)
from x = a to x = b. First, we divide the interval [a, b] into n equally spaced subintervals of width ∆x = b−a
n . Let
a0 = a, a1 = a + ∆x, a2 = a + 2∆x, a3 = a + 2∆x, . . . , an = a + n ∆x = b
To approximate the height of f over a subinterval [ai , ai+1 ], choose a point xi ∈ [ai , ai+1 ]. The points xi are called
sample points. In our examples, we choose left or right end points as our sample points, but we could have picked
any point in each interval. The height of f over [ai , ai+1 ] is approximately f (xi ). The area of f over [ai , ai+1 ] is
approximately f (xi ) ∆x. Adding all these rectangular areas up yields
Area ≈ f (x1 )∆x + f (x2 )∆x + . . . f (xn )∆x
This sum is known as a Riemann sum after the brilliant mathematician Georg Friedrich Bernhard Riemann (1826-
1866; see Historical Quest in the problem set). It can be written more simply as shown in the following definition
box.
Riemann sum
Suppose a continuous function $f$ is defined on the interval $[a, b]$. If the interval is divided into $n$ subintervals of width $\Delta x = \frac{b-a}{n}$ with endpoints
\[ a = a_0 < a_1 < a_2 < \cdots < a_n = b \]
then a Riemann sum associated with $f$ is the sum
\[ \sum_{i=1}^{n} f(x_i)\Delta x = f(x_1)\Delta x + f(x_2)\Delta x + \cdots + f(x_n)\Delta x \]
where $x_i$ is chosen in the interval $[a_{i-1}, a_i]$.
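The definition translates almost directly into code. The sketch below is illustrative only: the function name `riemann_sum` and the rule names "left", "right", and "mid" are assumptions made for this example, not notation from the text.

```python
def riemann_sum(f, a, b, n, rule="right"):
    """Riemann sum of f on [a, b] using n subintervals of equal width.

    rule selects the sample point in each subinterval:
    "left", "right", or "mid" (hypothetical names for this sketch).
    """
    dx = (b - a) / n
    offsets = {"left": 0.0, "right": 1.0, "mid": 0.5}
    t = offsets[rule]
    # The i-th subinterval is [a + (i-1)*dx, a + i*dx] for i = 1, ..., n.
    return sum(f(a + (i - 1 + t) * dx) * dx for i in range(1, n + 1))

for rule in ("left", "mid", "right"):
    print(rule, riemann_sum(lambda x: x * x, 0.0, 1.0, 100, rule))
```

For $f(x) = x^2$ on $[0, 1]$ the left sum falls below $\frac{1}{3}$ and the right sum above it, matching $L_n \le A \le R_n$ from Example 5.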

In this definition, we introduced the summation notation $\sum$. In general, given a collection of real numbers $a_1, \ldots, a_n$, we can represent their sum $a_1 + a_2 + \cdots + a_n$ as $\sum_{i=1}^{n} a_i$, where the latter expression reads “summing over the quantities $a_i$ from $i = 1$ to $i = n$.” For example, $\sum_{i=1}^{4} i = 1 + 2 + 3 + 4 = 10$ and $\sum_{i=1}^{3} i^2 = 1^2 + 2^2 + 3^2 = 14$.
We see that the area can be written as a Riemann sum, but the Riemann sum is only an approximation to the
area. If we let n become large, however, the approximation clearly improves (the areas that should be included but
are not included and the areas that should not be included but are included decrease as n gets larger) and approaches
the true area as n → ∞. Therefore, we write
\[ \text{Area} = \lim_{n\to\infty}\left[ f(x_1)\Delta x + f(x_2)\Delta x + f(x_3)\Delta x + \cdots + f(x_n)\Delta x \right] = \lim_{n\to\infty}\sum_{i=1}^{n} f(x_i)\Delta x \]
We cannot know that the method really works unless we have a theorem that tells us that a limit exists and that
this limit is independent of the way we choose the sample points in the subintervals. For continuous functions such
a theorem does exist, but its proof is a topic for a course in real analysis (the “real” refers to real-valued functions
in contrast to “complex” function analysis)!
Theorem 5.1. Limit of a Riemann Sum
If $f(x)$ is continuous on $[a, b]$, then
\[ \lim_{n\to\infty}\sum_{i=1}^{n} f(x_i)\Delta x \]
exists and is independent of the choice of sample points $x_i$.
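A numerical experiment cannot prove Theorem 5.1, but it can make the claim plausible. The sketch below is an illustration under stated assumptions: the integrand $e^x$ on $[0, 1]$, the helper names `riemann_sum_with` and `spread`, and the fixed random seed are all choices made for this example. It compares left, right, and randomly chosen sample points and reports how far apart the resulting sums are.

```python
import math
import random

def riemann_sum_with(f, a, b, n, pick):
    """Riemann sum where pick(lo, hi) returns the sample point in [lo, hi]."""
    dx = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        lo, hi = a + (i - 1) * dx, a + i * dx
        total += f(pick(lo, hi)) * dx
    return total

rng = random.Random(0)  # fixed seed so that runs are repeatable
pickers = {
    "left": lambda lo, hi: lo,
    "right": lambda lo, hi: hi,
    "random": lambda lo, hi: rng.uniform(lo, hi),
}

def spread(n):
    """Largest gap between the sums produced by the three sampling choices."""
    values = [riemann_sum_with(math.exp, 0.0, 1.0, n, p) for p in pickers.values()]
    return max(values) - min(values)

for n in (10, 100, 1000):
    print(n, spread(n))
```

Because $e^x$ is increasing, every choice of sample points lies between the left and right sums, so the spread equals $(e - 1)/n$ and shrinks to 0 as $n$ grows.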
Problem Set 5.2
First sketch the region under the graph of $y = f(x)$ on the interval $[a, b]$ in Problems 1 to 12. Then approximate the area of each region by using right endpoints and the formula
\[ R_n = f(a + \Delta x)\Delta x + f(a + 2\Delta x)\Delta x + \cdots + f(a + n\Delta x)\Delta x \]
for $\Delta x = \frac{b-a}{n}$ and the indicated values of $n$.
1. $f(x) = 2x + 1$ on $[0, 1]$ for $n = 4$.
2. $f(x) = 4x + 1$ on $[0, 1]$ for $n = 8$.
3. $f(x) = x^2$ on $[0, 2]$ for $n = 4$.
4. $f(x) = x^2$ on $[0, 2]$ for $n = 6$.
5. $f(x) = x^3$ on $[1, 3]$ for $n = 4$.
6. $f(x) = 4x^2 + 2$ on $[0, 1]$ for $n = 4$.
7. $f(x) = x^2 + x^3$ on $[0, 1]$ for $n = 4$.
8. $f(x) = e^x$ on $[0, 1]$ for $n = 4$.
9. $f(x) = x^{-1}$ on $[1, 2]$ for $n = 4$.
10. $f(x) = \sqrt{x}$ on $[1, 4]$ for $n = 4$.
11. $f(x) = \cos x$ on $[-\frac{\pi}{2}, 0]$ for $n = 4$.
12. $f(x) = x + \sin x$ on $[0, \frac{\pi}{4}]$ for $n = 3$.
Use a calculator to estimate the area under the curve $y = f(x)$ on each interval given in Problems 13 to 18 as a sum of 10 terms evaluated at right endpoints.
13. $f(x) = 4x$ on $[0, 1]$
14. $f(x) = x^2$ on $[0, 4]$
15. $f(x) = \cos x$ on $[-\frac{\pi}{2}, 0]$
16. $f(x) = x + \sin x$ on $[0, \frac{\pi}{4}]$
17. $f(x) = \ln(x^2 + 1)$ on $[0, 3]$
18. $f(x) = e^{-3x^2}$ on $[0, 1]$

Summation Formulas
The following formulas can be verified using mathematical induction. You may use these formulas to find certain Riemann sums.
\[ \sum_{k=1}^{n} 1 = \underbrace{1 + 1 + \cdots + 1}_{n \text{ times}} = n \]
\[ \sum_{k=1}^{n} k = 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \]
\[ \sum_{k=1}^{n} k^2 = 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6} \]
\[ \sum_{k=1}^{n} k^3 = 1^3 + 2^3 + 3^3 + \cdots + n^3 = \frac{n^2(n+1)^2}{4} \]
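Before relying on these formulas, it can be reassuring to spot-check them by brute force. The sketch below is a sanity check for small $n$, not a substitute for the induction proofs; the function name `check_summation_formulas` is a hypothetical choice.

```python
def check_summation_formulas(n):
    """Compare each closed form with a term-by-term sum for one value of n."""
    ks = range(1, n + 1)
    return (
        sum(1 for _ in ks) == n
        and sum(ks) == n * (n + 1) // 2
        and sum(k ** 2 for k in ks) == n * (n + 1) * (2 * n + 1) // 6
        and sum(k ** 3 for k in ks) == n ** 2 * (n + 1) ** 2 // 4
    )

# Integer division is exact here because each numerator is divisible
# by its denominator for every positive integer n.
print(all(check_summation_formulas(n) for n in range(1, 201)))
```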
Use a summation formula in Problems 19 to 24.

19. Prove that
\[ \lim_{n\to\infty} L_n = \frac{1}{3} \]
as defined in Example 5.
20. Use Riemann sums and left endpoints to prove that the area under $y = x$ from $x = 0$ to $x = 2$ equals 2.
21. Use Riemann sums and right endpoints to prove that the area under $y = x$ from $x = 0$ to $x = 4$ equals 8.
22. Use Riemann sums and right endpoints to prove that the area under $y = x^3$ from $x = 0$ to $x = 4$ is 64.
23. Use Riemann sums and left endpoints to prove that the area under $y = x^3$ from $x = 0$ to $x = 2$ is 4.
24. Use Riemann sums and right endpoints to prove that the area under $y = x + 3x^2$ from $x = 0$ to $x = 2$ is 10.
25. The lower developmental threshold of sweet corn is 50◦ F and requires 1,587 degree-days for maturing. If the
temperature were to remain constant 75◦ F, how long would it take for the corn to mature?
26. The pistachio has a lower developmental threshold of 50◦ F and requires 1,197 degree-days for shell hardening.
If the temperature were to remain a constant 72◦ F, how long would it take for the pistachio’s shell to harden?
27. The Black Turtle Soup bean has a lower developmental threshold of 41◦ F and requires 1,365.5 degree-days for
50% anthesis (i.e. until 50% of the all flowers have blossomed). If the temperature were to remain a constant
68.5◦F, how long would it take to reach the required 50% anthesis?
28. Estimate the mortality due to the black death by approximating the region under
\[ f(t) = 890\,\operatorname{sech}^2(0.2t - 3.4) \]
deaths per week from $t = 0$ to $t = 30$ with rectangles of width 15 weeks. Would you expect your answer to be more or less accurate than the result of Example 3?
29. Estimate the mortality due to the black death by approximating the region under
\[ f(t) = 890\,\operatorname{sech}^2(0.2t - 3.4) \]
deaths per week from $t = 0$ to $t = 30$ with rectangles of width 3 weeks. Would you expect your answer to be more or less accurate than the result of Example 3?

30. The weekly rate of cases of influenza A (strain Unk ) studied by WHO/NREVSS during the 2003–2004 season
is plotted in Figure 5.16. Estimate the total number of cases (i.e. the area under the curve) over the interval
[40, 56] using the right end points of two week intervals. Sketch the corresponding rectangles in the figure.
Figure 5.16: Weekly rate of cases of influenza A (horizontal axis: weeks; vertical axis: cases per week).
31. Repeat Problem 30 using left end-points.
Estimate degree-day accumulation in problems 32 to 35 from the beginning of the first day to the end of the last day.
For each of these problems, assume that high temperature is maintained throughout the day. Clearly, your answers
will overestimate the actual number of degree-days.
32. The lower developmental threshold for cotton is 60◦ F. Estimate the degree-day accumulation for cotton in
Yreka, CA using the period of time shown in the following table.
Date Highest Temperature

Aug 1, 2003 93◦ F
Aug 2, 2003 76◦ F
Aug 3, 2003 78◦ F
Aug 4, 2003 88◦ F
Aug 5, 2003 82◦ F
Aug 6, 2003 81◦ F
Aug 7, 2003 83◦ F
Aug 8, 2003 86◦ F
Aug 9, 2003 88◦ F
Aug 10, 2003 87◦ F
33. The lower developmental threshold for cotton is 60◦ F. Estimate the degree-day accumulation for cotton in
Fresno, CA using the period of time shown in the following table.

Jun 1, 2003 94◦ F
Jun 2, 2003 98◦ F
Jun 3, 2003 100◦ F
Jun 4, 2003 92◦ F
Jun 5, 2003 93◦ F
Jun 6, 2003 89◦ F
Jun 7, 2003 88◦ F
Jun 8, 2003 94◦ F
Jun 9, 2003 94◦ F
Jun 10, 2003 83◦ F

34. The lower developmental threshold for the Elm Leaf Beetle is 52◦ F. Estimate the degree-day accumulation for
the Elm Leaf Beetle in Stockton, CA using the period of time shown in the following table.

Sept 15, 2003 83◦ F
Sept 16, 2003 80◦ F
Sept 17, 2003 81◦ F
Sept 18, 2003 87◦ F
Sept 19, 2003 92◦ F
Sept 20, 2003 95◦ F
Sept 21, 2003 96◦ F
Sept 22, 2003 97◦ F
Sept 23, 2003 90◦ F
Sept 24, 2003 77◦ F
35. The lower developmental threshold for the consperse stink bug is 53.6◦ F. Estimate the degree-day accumulation
for the consperse stink bug in Visalia, CA using the period of time shown in the following table.

Feb 1, 2004 93◦ F
Feb 2, 2004 76◦ F
Feb 3, 2004 78◦ F
Feb 4, 2004 88◦ F
Feb 5, 2004 82◦ F
Feb 6, 2004 81◦ F
Feb 7, 2004 83◦ F
Feb 8, 2004 86◦ F
Feb 9, 2004 88◦ F
Feb 10, 2004 87◦ F
36. Assume the temperature in degrees Fahrenheit is given by
T (t) = 50 + 20 cos(2πt/365) + 10 sin(2πt)
where t is time in days. A graph of this function is shown in Figure 5.17.
Figure 5.17: Temperature variations
Assuming the lower development threshold is 40◦ F, estimate the number of degree-days that accumulate from
t = 0 to t = 10 days using time intervals of width 2.
37. Use the temperature variation model shown in Figure 5.17 to estimate the number of degree-days accumulated
from t = 0 to t = 20 for the citrus flow which has a lower developmental threshold of 49◦ F. Use time intervals
of width 4 days.
38. Suppose the velocity v (in meters per second) of a runner during the first few seconds of a race is given by

t in seconds 0 0.5 1.0 1.5 2.0 2.5

v in m/sec 0 5 9.5 15.1 21 25
Plot these points in the t–v-plane. Sketch the velocity curve. Estimate the distance traveled by the runner by
estimating the area under the velocity curve.
39. A pneumotachograph is a medical device used to measure the rate at which air is exhaled by a patient’s lungs.
Suppose Figure 5.18 shows the rate of exhalation for a particular patient. Then the area under the graph
provides a measure of the total volume of air in the lungs during exhalation. Use a Riemann sum with n = 8
and right-endpoint subinterval representatives to estimate the volume.
Figure 5.18: Rate of exhalation
40. An industrial plant spills pollutant into a lake. Suppose that the pollutant spread out to form the pattern
shown in Figure 5.19. All distances are in feet.
Figure 5.19: Pollutant spill
Use a Riemann sum with n = 6 and right-endpoint subinterval representatives to estimate the area of the spill.
41. Historical Quest
Georg Riemann (1826–1866)

In this section, we see that history honored Riemann by naming an important process after him. In his
personal life, he was frail, bashful, and timid, but in his professional life, he was one of the all time giants in
mathematics. In his book, Space Through the Ages, Cornelius Lanczos wrote, “Although Riemann’s collected
papers fill only one single volume of 538 pages, this volume weighs tons if measured intellectually. Every one
of his many discoveries was destined to change the course of mathematical science.” One of these discoveries is
the Riemann zeta function which is described in the following quote from the MacTutor Mathematics History
website (http://www-history.mcs.st-andrews.ac.uk):
Riemann’s thesis, one of the most remarkable pieces of original work to appear in a doctoral thesis,
was examined on 16 December 1851. In his report on the thesis Gauss described Riemann as having:
... a gloriously fertile originality.
..
.
A newly elected member of the Berlin Academy of Sciences had to report on their most recent research
and Riemann sent a report on On the number of primes less than a given magnitude another of his
great masterpieces which were to change the direction of mathematical research in a most significant
way. In it Riemann examined the zeta function
\[ \zeta(s) = \sum \frac{1}{n^s} = \prod \frac{1}{1 - p^{-s}} \]
which had already been considered by Euler. Here the sum is over all natural numbers n while the
product is over all prime numbers. Riemann considered a very different question to the one Euler had
considered, for he looked at the zeta function as a complex function rather than a real one. Except
for a few trivial exceptions, the roots of ζ(s) all lie between 0 and 1. In the paper he stated that the
zeta function had infinitely many nontrivial roots and that it seemed probable that they all have real
part 1/2. This is the famous Riemann hypothesis which remains today one of the most important of
the unsolved problems of mathematics.
Amazingly, the Clay Mathematics Institute∗ has offered a million dollar prize for solving this conjecture. So
you can become a millionaire doing mathematics! Write a paper on Georg Riemann, and in particular discuss
this million dollar prize.
∗ For more information about this institute visit www.claymath.org

464 5.3. THE DEFINITE INTEGRAL
5.3 The Definite Integral

Previously, we defined area under a nonnegative function as the limit of a Riemann sum. In this section, we
define this limit for any continuous function (positive or negative), and develop its geometric meaning as well as its
properties.
For a nonnegative continuous function $f(x)$ from $x = a$ to $x = b$, we defined the area under the curve as
\[ \text{Area} = \lim_{n\to\infty}\left[ f(x_1)\Delta x + f(x_2)\Delta x + \cdots + f(x_n)\Delta x \right] = \lim_{n\to\infty}\sum_{i=1}^{n} f(x_i)\Delta x \]
where $\Delta x = \frac{b-a}{n}$ and $x_i$ is a point from the interval $[a + (i-1)\Delta x,\ a + i\,\Delta x]$. It turns out that
\[ \lim_{n\to\infty}\sum_{i=1}^{n} f(x_i)\Delta x \]
exists and is independent of the sample points xi whenever f is continuous. When f takes on negative values, the
integral no longer corresponds to the area under the curve as we soon shall see. The existence of the limit is so
important that Gottfried Leibniz (see Historical Quest, page 16) developed a special notation for it, which we
introduce in the following definition box.
Definite Integral
Let $f$ be continuous on $[a, b]$. Then the definite integral of $f$ from $a$ to $b$ is given by
\[ \int_a^b f(x)\,dx = \lim_{n\to\infty}\sum_{i=1}^{n} f(x_i)\Delta x \]
In the definition of the definite integral, the function f that is being integrated is called the integrand; the
interval [a, b] is the interval of integration; and the endpoints a and b are called, respectively, the lower and the
upper limits of integration. The variable x is called the variable of integration. Notice that in taking the
limit the Greek letters are supplanted by the Roman letters: the ∆ becomes a d and Σ becomes an elongated S.
Example 1. From sums to integrals
Write the limit
\[ \lim_{n\to\infty}\sum_{i=1}^{n} \sin\left(\frac{2\pi i}{n}\right)\frac{2\pi}{n} \]
as a definite integral.
Solution. There are several ways we can answer this problem depending on how we view the Riemann sum. For instance, we can view this Riemann sum as corresponding to an integrand $\sin x$ with sample points $x_i = \frac{2\pi i}{n}$ and $\Delta x = \frac{2\pi}{n}$. Since the first sample point $x_1 = \frac{2\pi}{n}$ approaches 0 as $n$ increases, the lower limit of integration must be 0. Since the last sample point $x_n = 2\pi$ for all $n$, the upper limit of integration must be $2\pi$. Hence, the definite integral is
\[ \int_0^{2\pi} \sin x\,dx \]
Alternatively, we can always represent the limit of the Riemann sums as an integral from $x = 0$ to $x = 1$ (in fact, we can choose the limits of integration arbitrarily and still get things to work out!). With this view, our sample points need to be $x_i = \frac{i}{n}$ and $\Delta x = \frac{1}{n}$. Hence, the summand is equal to $\sin(2\pi x_i)\,2\pi\,\Delta x$ and the Riemann sum converges to
\[ \int_0^1 \sin(2\pi x)\,2\pi\,dx \]
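The two readings of the sum in Example 1 can be compared numerically. In the sketch below, the names `example_sum` and `right_riemann` are hypothetical helpers introduced for this illustration; the point is only that the original sum, the right-endpoint sum for $\sin x$ on $[0, 2\pi]$, and the right-endpoint sum for $2\pi\sin(2\pi x)$ on $[0, 1]$ are the same numbers up to rounding.

```python
import math

def example_sum(n):
    """The sum from Example 1: sin(2*pi*i/n) * (2*pi/n) for i = 1, ..., n."""
    return sum(math.sin(2 * math.pi * i / n) * (2 * math.pi / n) for i in range(1, n + 1))

def right_riemann(f, a, b, n):
    """Right-endpoint Riemann sum of f on [a, b] with n subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(1, n + 1))

n = 1000
view_a = right_riemann(math.sin, 0.0, 2 * math.pi, n)
view_b = right_riemann(lambda x: math.sin(2 * math.pi * x) * 2 * math.pi, 0.0, 1.0, n)
print(example_sum(n), view_a, view_b)
```

All three printed values agree to within floating-point rounding, which is consistent with the two integrals being equal.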

Note that the two integrals obtained in the above example must be equal for the theory of integration to be consistent. And they are, as can be easily shown once we have learned, as we will in Section 5.5, to change the variable of integration using substitution.
Example 2. From integrals to sums
Write the integral
\[ \int_1^4 \frac{dx}{x} \]
as a limit of a Riemann sum.
Solution. Compare the given integral with $\int_a^b f(x)\,dx$, and note that the integrand is $f(x) = \frac{1}{x}$ and the limits of integration are $a = 1$ and $b = 4$. If we break up the interval $[1, 4]$ into $n$ subintervals of equal width, then
\[ \Delta x = \frac{4-1}{n} = \frac{3}{n} \]
Choosing the right endpoints of the intervals as sample points gives
\[ x_1 = 1 + \frac{3}{n},\quad x_2 = 1 + 2\cdot\frac{3}{n},\quad \ldots,\quad x_n = 1 + n\cdot\frac{3}{n} \]
Hence, the definite integral equals
\[ \lim_{n\to\infty}\sum_{i=1}^{n} f(x_i)\Delta x = \lim_{n\to\infty}\sum_{i=1}^{n} \frac{1}{1 + i\cdot\frac{3}{n}}\cdot\frac{3}{n} \]
□
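The limit in Example 2 can be explored by evaluating the sum for increasing $n$. The sketch below is illustrative: the name `example2_sum` is hypothetical, and the comparison value $\ln 4$ anticipates the Evaluation Theorem of Section 5.4 (an antiderivative of $\frac{1}{x}$ on $[1, 4]$ is $\ln x$), so it is used here only as a reference point.

```python
import math

def example2_sum(n):
    """Right-endpoint Riemann sum for 1/x on [1, 4]: terms 1/(1 + 3i/n) * 3/n."""
    return sum(1.0 / (1.0 + i * 3.0 / n) * (3.0 / n) for i in range(1, n + 1))

for n in (10, 100, 1000, 10000):
    # The second column is the gap between ln 4 and the sum.
    print(n, example2_sum(n), math.log(4) - example2_sum(n))
```

Because $\frac{1}{x}$ is decreasing, right endpoints underestimate the area, so the gap in the last column is positive and shrinks roughly in proportion to $\frac{1}{n}$.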
Example 3. Approximating integrals with sums
Approximate the integral
\[ \int_{-1}^{0.5} \tan x\,dx \]
by the sum
\[ \sum_{i=1}^{6} \tan(x_i)\,\Delta x \]
Choose the sample points $x_i$ to be right endpoints.
Solution. Since the integrand, $\tan x$, is continuous on the interval $[-1, 0.5]$, the integral is well defined. The summation expression in the problem statement implies that $n = 6$. Thus we choose $\Delta x = \frac{0.5-(-1)}{6} = \frac{1.5}{6} = 0.25$, in which case $x_0 = -1$, $x_1 = -0.75$, $x_2 = -0.5$, $x_3 = -0.25$, $x_4 = 0$, $x_5 = 0.25$, and $x_6 = 0.5$. The Riemann sum is
\[ \sum_{i=1}^{6} \tan(x_i)\,\Delta x = [\tan(-0.75) + \tan(-0.5) + \tan(-0.25) + \tan(0) + \tan(0.25) + \tan(0.5)]\cdot 0.25 \approx -0.2329 \]
A graphical representation of this sum is shown in green in Figure 5.20. Notice that we got a negative number because the total area of the rectangles below the x-axis is greater than the total area of the rectangles above the x-axis. □
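The arithmetic in Example 3 is easy to reproduce. The following sketch assumes a helper named `right_endpoint_sum` (a hypothetical name) and simply evaluates the six-term sum.

```python
import math

def right_endpoint_sum(f, a, b, n):
    """Right-endpoint Riemann sum of f on [a, b] with n equal subintervals."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(1, n + 1))

approx = right_endpoint_sum(math.tan, -1.0, 0.5, 6)
print(round(approx, 4))  # -0.2329
```

Since $\tan$ is an odd function, the terms at $\pm 0.25$ and $\pm 0.5$ cancel, leaving $\tan(-0.75)\cdot 0.25$ as the entire sum.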
Figure 5.20: Graph of $y = \tan x$ with approximating rectangles
Example 4. Computing an integral using a summation formula
Use a summation formula to compute
\[ \int_0^2 (1 - x^2)\,dx \]
Solution. Break the interval $[0, 2]$ into $n$ subintervals whose endpoints are $0, \frac{2}{n}, \frac{4}{n}, \ldots, \frac{2n}{n}$. Choose $x_i = \frac{2i}{n}$. The corresponding Riemann sum is
\[ \sum_{i=1}^{n}\left[1 - \left(\frac{2i}{n}\right)^2\right]\Delta x \]
with $\Delta x = \frac{2}{n}$. Expanding and rearranging terms yields
\[
\begin{aligned}
\sum_{i=1}^{n}\left[1 - \left(\frac{2i}{n}\right)^2\right]\Delta x
&= \left[\left(1 - \frac{4\cdot 1^2}{n^2}\right) + \left(1 - \frac{4\cdot 2^2}{n^2}\right) + \cdots + \left(1 - \frac{4\cdot n^2}{n^2}\right)\right]\frac{2}{n} \\
&= \underbrace{(1 + 1 + \cdots + 1)}_{n \text{ times}}\,\frac{2}{n} - \left(\frac{4\cdot 1^2}{n^2} + \frac{4\cdot 2^2}{n^2} + \cdots + \frac{4\cdot n^2}{n^2}\right)\frac{2}{n} \\
&= n\cdot\frac{2}{n} - \frac{4\cdot 2}{n^3}\left(1^2 + 2^2 + \cdots + n^2\right) \\
&= 2 - \frac{n(n+1)(2n+1)}{6}\cdot\frac{8}{n^3} \\
&= 2 - \frac{4(n+1)(2n+1)}{3n^2} \\
&= 2 - \frac{8n^2 + 12n + 4}{3n^2} \\
&= 2 - \frac{8}{3} - \frac{4}{n} - \frac{4}{3n^2}
\end{aligned}
\]
Taking the limit of this expression as $n \to \infty$ yields
\[ \int_0^2 (1 - x^2)\,dx = \lim_{n\to\infty}\left(2 - \frac{8}{3} - \frac{4}{n} - \frac{4}{3n^2}\right) = 2 - \frac{8}{3} - 0 - 0 = -\frac{2}{3} \]
Again, we got an integral that is negative. However, this is alright as you are about to find out. □
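The simplification in Example 4 can be verified with exact arithmetic. This is a sketch under stated assumptions: the names `riemann_sum_example4` and `closed_form` are illustrative, and `Fraction` is used so that the sum and the simplified expression can be compared for equality rather than approximate agreement.

```python
from fractions import Fraction

def riemann_sum_example4(n):
    """Right-endpoint Riemann sum of 1 - x^2 on [0, 2], computed exactly."""
    dx = Fraction(2, n)
    return sum((1 - (i * dx) ** 2) * dx for i in range(1, n + 1))

def closed_form(n):
    """The simplified expression 2 - 8/3 - 4/n - 4/(3 n^2) from Example 4."""
    return 2 - Fraction(8, 3) - Fraction(4, n) - Fraction(4, 3 * n * n)

for n in (1, 10, 100, 1000):
    assert riemann_sum_example4(n) == closed_form(n)
    print(n, float(closed_form(n)))
```

The printed values increase toward $-\frac{2}{3}$ as $n$ grows, in line with the limit computed above.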
Geometric Meaning of the Definite Integral

We saw previously that $\int_a^b f(x)\,dx$ corresponds to the area under the curve $y = f(x)$ provided that $f(x) \ge 0$ from $x = a$ to $x = b$. The following example uses this fact to evaluate an integral.
Example 5. Integral of dx rule
Evaluate
\[ \int_a^b 1\,dx \]
Solution. Let $f(x) = 1$ with limits of integration $x = a$ and $x = b$. If we plot $f$ over $[a, b]$, we can see this is the area of a rectangle of height 1 and width $(b - a)$. Thus,
\[ \int_a^b 1\,dx = 1(b - a) = b - a \]
□
What happens if $f(x)$ changes sign on the interval? In this case, $\int_a^b f(x)\,dx$ is the signed area of the region $R$ determined by the curve $y = f(x)$ and the lines $y = 0$, $x = a$, and $x = b$. More specifically, if $f$ changes sign on the interval $[a, b]$, then the region $R$ breaks up into two pieces: one piece, call it $R_-$, that lies below the x-axis as illustrated by the red region in Figure 5.21 and another piece, call it $R_+$, that lies above the x-axis as illustrated by the green region in Figure 5.21.
If $A_+$ and $A_-$ denote the “positive” areas of $R_+$ and $R_-$, respectively, then
\[ \int_a^b f(x)\,dx = A_+ - A_- \]
Figure 5.21: The geometry of the definite integral
Example 6. Evaluating integrals using signed areas
Using the signed area interpretation of integrals, find
a. $\int_{-1}^{2} x\,dx$
b. $\int_{-3}^{3} \sqrt{9 - x^2}\,dx$
c. $\int_{-3}^{3} x^5\,dx$
Solution.
a. Let $f(x) = x$ on $[-1, 2]$, as shown in Figure 5.22. The graph forms two triangles, $R_+$ and $R_-$, that lie above and below the x-axis, respectively. The area of $R_+$ is 2 and the area of $R_-$ is $\frac{1}{2}$. Hence,
\[ \int_{-1}^{2} x\,dx = 2 - \frac{1}{2} = \frac{3}{2} \]
Figure 5.22: Graph of f
b. Let $g(x) = \sqrt{9 - x^2}$ on $[-3, 3]$, as shown in Figure 5.23. The graph forms a semicircle of radius 3. The graph is always above the axis and, consequently, we need its area. Using the formula for the area of a circle,
\[ \int_{-3}^{3} \sqrt{9 - x^2}\,dx = \frac{1}{2}(\pi\cdot 3^2) = \frac{9\pi}{2} \]
c. Let $h(x) = x^5$ on $[-3, 3]$, as shown in Figure 5.24. Notice that this graph is symmetric with respect to the origin and, consequently, it has the same area above and below the x-axis. Therefore,
\[ \int_{-3}^{3} x^5\,dx = 0 \]
Figure 5.23: Graph of g
Figure 5.24: Graph of h
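The three geometric answers in Example 6 can be cross-checked with Riemann sums. The sketch below uses midpoints as sample points, which is a choice made for this illustration (any sample points give the same limit for continuous integrands); `midpoint_sum` is a hypothetical helper name.

```python
import math

def midpoint_sum(f, a, b, n=20000):
    """Riemann sum of f on [a, b] using the midpoint of each subinterval."""
    dx = (b - a) / n
    return sum(f(a + (i - 0.5) * dx) * dx for i in range(1, n + 1))

part_a = midpoint_sum(lambda x: x, -1.0, 2.0)                     # triangles: 2 - 1/2
part_b = midpoint_sum(lambda x: math.sqrt(9 - x * x), -3.0, 3.0)  # semicircle of radius 3
part_c = midpoint_sum(lambda x: x ** 5, -3.0, 3.0)                # odd function: areas cancel

print(part_a, 3 / 2)
print(part_b, 9 * math.pi / 2)
print(part_c, 0.0)
```

Each pair of printed numbers should agree to several decimal places; the semicircle converges a little more slowly because its graph is vertical at $x = \pm 3$.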
Properties of Definite Integrals

Integrals satisfy several useful properties, some of which are summarized in the following box.
Properties of the Definite Integral: Part I
Let $f$ and $g$ be continuous functions on the interval $[a, b]$.
Sum Rule        $\int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx$
Difference Rule $\int_a^b [f(x) - g(x)]\,dx = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx$
Scalar Rule     $\int_a^b c\,f(x)\,dx = c\int_a^b f(x)\,dx$
Opposite Rule   $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$
These properties can be proved using Riemann sums and limit laws (see the problem set).
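Example 7. Using the properties of definite integrals
Evaluate $\int_{-3}^{3} [2\sqrt{9 - x^2} - 5]\,dx$.
Solution.
\[
\begin{aligned}
\int_{-3}^{3} [2\sqrt{9 - x^2} - 5]\,dx &= \int_{-3}^{3} 2\sqrt{9 - x^2}\,dx - \int_{-3}^{3} 5\,dx && \text{Difference rule} \\
&= 2\int_{-3}^{3} \sqrt{9 - x^2}\,dx - 5\int_{-3}^{3} 1\,dx && \text{Scalar rule} \\
&= 2\left(\frac{9\pi}{2}\right) - 5(3 - (-3)) && \text{From Examples 5 and 6} \\
&= 9\pi - 30
\end{aligned}
\]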
Combining these integral properties with the geometric interpretation of the integral allows one to quickly compute
certain integrals.
Example 8. Growing Grapes
Thompson Seedless Grapes have a lower developmental threshold of 50◦ F and require approximately 3,000 degree-
days to ripen. Suppose the temperature in the fields is given by
T (x) = 70 + 10 sin(2π x)
where x is time in days. Write down an expression involving definite integrals that represents the number of degree-
days accumulated from x = 0 to x = 10, and evaluate this expression.
Solution. We are interested in finding the area between the curves $y = 50$ and $y = 70 + 10\sin(2\pi x)$ from $x = 0$ to $x = 10$, as illustrated in Figure 5.25.
Figure 5.25: Degree-days accumulated for 10 days (horizontal axis: days; vertical axis: temperature)
Since this area is computed by finding the area below the curve $y = 70 + 10\sin(2\pi x)$ and then subtracting the area below the curve $y = 50$, the number of accumulated degree-days is:
\[
\begin{aligned}
\text{Accumulated degree-days} &= \int_0^{10} [70 + 10\sin(2\pi x)]\,dx - \int_0^{10} 50\,dx \\
&= \int_0^{10} 70\,dx + \int_0^{10} 10\sin(2\pi x)\,dx - \int_0^{10} 50\,dx && \text{Sum rule} \\
&= \int_0^{10} 20\,dx + \int_0^{10} 10\sin(2\pi x)\,dx && \text{Difference rule} \\
&= \int_0^{10} 20\,dx + 10\int_0^{10} \sin(2\pi x)\,dx && \text{Scalar rule} \\
&= 200 + 10\int_0^{10} \sin(2\pi x)\,dx && \text{Integral of dx rule}
\end{aligned}
\]
Since the graph of $\sin(2\pi x)$ has equal area above and below the x-axis on the interval $[0, 10]$, its integral is zero. Hence, the number of degree-days accumulated is 200. This area could also be found by noticing that the “hills” of the temperature function can be fit into the “valleys,” yielding a 20 by 10 rectangle. □
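The degree-day count in Example 8 can be confirmed numerically. The sketch below is illustrative: `degree_days` is a hypothetical helper that forms a midpoint Riemann sum of the temperature minus the threshold, and it assumes (as is true here, since $T(x) \ge 60$) that the temperature never drops below the threshold.

```python
import math

def degree_days(temp, threshold, start, end, n=10000):
    """Midpoint Riemann sum of temp(x) - threshold over [start, end].

    Assumes temp(x) >= threshold throughout the interval.
    """
    dx = (end - start) / n
    return sum((temp(start + (i - 0.5) * dx) - threshold) * dx for i in range(1, n + 1))

def T(x):
    """Temperature model from Example 8, with x in days."""
    return 70 + 10 * math.sin(2 * math.pi * x)

total = degree_days(T, 50, 0.0, 10.0)
print(total)  # close to 200
```

At this rate of 20 degree-days per day, reaching the 3,000 degree-days needed for ripening would take about 150 days.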
We conclude this section with some additional properties of the definite integral.
Properties of the Definite Integral: Part II
Assuming all integrals exist, the following properties hold.

POSITIVITY If $f(x) \ge 0$ from $x = a$ to $x = b$, then
\[ \int_a^b f(x)\,dx \ge 0 \]
DOMINANCE If $f(x) \ge g(x)$ from $x = a$ to $x = b$, then
\[ \int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx \]
BOUNDING If $m \le f(x) \le M$ from $x = a$ to $x = b$, then
\[ m(b - a) \le \int_a^b f(x)\,dx \le M(b - a) \]
SPLITTING
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \]
DEFINITE INTEGRAL AT A POINT
\[ \int_a^a f(x)\,dx = 0 \]
Positivity can be proved using the definition of a definite integral, and positivity, in turn, can be used to prove dominance and bounding. For example, to prove dominance, suppose that $f(x) \ge g(x)$ from $x = a$ to $x = b$. Then $f(x) - g(x) \ge 0$ from $x = a$ to $x = b$. Applying the difference rule and positivity yields
\[ \int_a^b f(x)\,dx - \int_a^b g(x)\,dx = \int_a^b [f(x) - g(x)]\,dx \ge 0 \]
Hence
\[ \int_a^b f(x)\,dx \ge \int_a^b g(x)\,dx \]
If we set $M$ and $m$ to be the maximum value and minimum value, respectively, of $f$ on the interval $[a, b]$, then the bounding property provides crude estimates for the value of a definite integral. When working through detailed computations by hand, these crude estimates allow us to see whether our work has resulted in a reasonable answer. Finally, a proof of the splitting property is somewhat subtle, but the property is geometrically intuitive, as illustrated in Figure 5.26.
Figure 5.26: Geometric depiction of the splitting property: $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$
Example 9. Using Bounds
Show that
\[ 24 \le \int_3^6 [10 + 2\sin(x^2)]\,dx \le 36 \]
Solution. Since the sine function is bounded between $-1$ and 1, it follows that $8 \le 10 + 2\sin(x^2) \le 12$ for all $x$. The bounding property implies that
\[ 24 = 8\cdot(6 - 3) \le \int_3^6 [10 + 2\sin(x^2)]\,dx \le 12\cdot(6 - 3) = 36 \]
which yields the desired result, as illustrated below:
[Figure: graph of $y = 10 + 2\sin(x^2)$ for $3 \le x \le 6$]
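The bounding property gives a quick sanity check on any numerical estimate of the integral in Example 9. The sketch below is an illustration only: `midpoint_sum` is a hypothetical helper, and the estimate it produces is an approximation rather than an exact value.

```python
import math

def midpoint_sum(f, a, b, n=10000):
    """Riemann sum of f on [a, b] using the midpoint of each subinterval."""
    dx = (b - a) / n
    return sum(f(a + (i - 0.5) * dx) * dx for i in range(1, n + 1))

def f(x):
    """Integrand from Example 9."""
    return 10 + 2 * math.sin(x * x)

a, b = 3.0, 6.0
m, M = 8.0, 12.0  # bounds on f from -1 <= sin <= 1
estimate = midpoint_sum(f, a, b)
print(m * (b - a), estimate, M * (b - a))
```

Every sampled height lies between 8 and 12, so the estimate necessarily falls between 24 and 36; an estimate outside that range would signal an error in the computation.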
Example 10. Using the splitting property
Suppose that $\int_4^9 f(x)\,dx = 100$ and $\int_{-3}^{9} f(x)\,dx = 125$. Find $\int_{-3}^{4} f(x)\,dx$.
Solution. By the splitting property,
\[
\begin{aligned}
125 &= \int_{-3}^{9} f(x)\,dx \\
&= \int_{-3}^{4} f(x)\,dx + \int_4^9 f(x)\,dx \\
&= \int_{-3}^{4} f(x)\,dx + 100
\end{aligned}
\]
Thus,
\[ \int_{-3}^{4} f(x)\,dx = 125 - 100 = 25 \]
□
Problem Set 5.3

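Express the limits in Problems 1 to 6 as definite integrals of the form $\int_0^1 f(x)\,dx$.
1. $\lim_{n\to\infty}\sum_{i=1}^{n} \frac{i}{n^2}$
2. $\lim_{n\to\infty}\sum_{i=1}^{n} \frac{i^2}{n^3}$
3. $\lim_{n\to\infty}\sum_{i=1}^{n} \left(-2 + \frac{3i}{n}\right)\frac{3}{n}$
4. $\lim_{n\to\infty}\sum_{i=1}^{n} \left(1 - \frac{2i}{n}\right)\frac{2}{n}$
5. $\lim_{n\to\infty}\sum_{i=1}^{n} \left(1 - \frac{i^2}{n^2}\right)\frac{1}{n}$
6. $\lim_{n\to\infty}\sum_{i=1}^{n} \sin\left(\frac{\pi i}{n} - \pi\right)\frac{\pi}{n}$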
Express the definite integrals in Problems 7 to 12 as limits of Riemann sums.

7. $\int_1^2 x^4\,dx$
8. $\int_{-1}^{1} (x^2 - x)\,dx$
9. $\int_0^1 e^x\,dx$
10. $\int_{-1}^{4} e^x\,dx$
11. $\int_{-1}^{1} |x|\,dx$
12. $\int_{-1}^{1} |\cos x|\,dx$

First sketch the region under the graph of $y = f(x)$ on the interval $[a, b]$. Then use the interpretation of the definite integral $\int_a^b f(x)\,dx$ as a signed area to evaluate the integrals in Problems 13 to 16.
13. $\int_{-4}^{3} (1 - 2x)\,dx$
14. $\int_0^{2\pi} \cos x\,dx$
15. $\int_0^4 \sqrt{16 - x^2}\,dx$
16. $\int_{-1}^{3} |x|\,dx$
Evaluate each of the integrals in Problems 17 to 22 by using the following information together with the sum rule and the splitting property:
\[ \int_{-1}^{2} f(x)\,dx = 3; \quad \int_{-1}^{0} f(x)\,dx = \frac{1}{3}; \quad \int_{-1}^{2} g(x)\,dx = \frac{3}{2}; \quad \int_0^2 g(x)\,dx = 2 \]
17. $\int_0^{-1} f(x)\,dx$
18. $\int_{-1}^{2} [f(x) + g(x)]\,dx$
19. $\int_{-1}^{2} [2f(x) - 3g(x)]\,dx$
20. $\int_0^2 f(x)\,dx$
21. $\int_{-1}^{0} g(x)\,dx$
22. $\int_{-1}^{0} [3f(x) - 5]\,dx$
Use integral properties to establish the statements in Problems 23 to 26.

23. $\int_0^{\pi} \sin x\,dx \le \pi$ Hint: $\sin x \le 1$ for all $x$.
24. $\frac{9}{10} \le \int_1^{10} \frac{dx}{x} \le 9$
25. $2 \le \int_{-1}^{1} \sqrt{1 + x^2}\,dx \le 2\sqrt{2}$
26. $\int_0^1 x^3\,dx \le \frac{1}{2}$ Hint: Note that $x^3 \le x$ on $[0, 1]$.
27. Use the fact that $\int_0^1 x^2\,dx = \frac{1}{3}$ and the geometric interpretation of the integral to find
\[ \int_{-1}^{1} x^2\,dx \]
28. Use the graph of $y = \cos x$ to evaluate
\[ \int_a^b \cos x\,dx \]
on the indicated interval.
(a) $[0, 2\pi]$
(b) $[\frac{\pi}{2}, \frac{5\pi}{2}]$
(c) If $a = 0$, for what values of $b > 0$ does the integral take on its largest value?
29. Given $\int_{-2}^{4} [5f(x) + 2g(x)]\,dx = 7$ and $\int_{-2}^{4} [3f(x) + g(x)]\,dx = 10$, find
(a) $\int_{-2}^{4} f(x)\,dx$
(b) $\int_{-2}^{4} g(x)\,dx$
30. Suppose $\int_0^2 f(x)\,dx = 3$, $\int_0^2 g(x)\,dx = -1$, and $\int_0^2 h(x)\,dx = 3$.
(a) Evaluate $\int_0^2 [2f(x) + 5g(x) - 7h(x)]\,dx$
(b) Find the value of $s$ so that
\[ \int_0^2 [5f(x) + s\,g(x) - 6h(x)]\,dx = 0 \]
31. Evaluate $\int_{-1}^{2} f(x)\,dx$ given that
\[ \int_{-1}^{1} f(x)\,dx = 3, \quad \int_2^3 f(x)\,dx = -2, \quad \int_1^3 f(x)\,dx = 5 \]
32. If $\int_0^1 f(x)\,dx = 1$, $\int_0^2 f(x)\,dx = 3$, and $\int_1^2 g(x)\,dx = 4$, then find $\int_1^2 [f(x) - g(x)]\,dx$.
Using right endpoints with $n = 5$, approximate the definite integrals in Problems 33 to 36. Indicate whether each approximation is greater than or less than the actual definite integral.
33. $\int_{-2}^{0} x^2\,dx$
34. $\int_1^2 x^3\,dx$
35. $\int_1^4 \frac{dx}{x}$
36. $\int_{-1}^{0} \sqrt{1 + x^2}\,dx$ Hint: is $\sqrt{1 + x^2}$ increasing or decreasing on the interval $[-1, 0]$?
Use Riemann sums with right endpoints, along with a summation formula (p. 459), to evaluate the integrals in Problems 37 to 38.
37. $\int_0^3 (x^3 - 3)\,dx$
38. $\int_0^1 (2x^2 - 4)\,dx$
Show that each statement about area in Problems 39 to 45 is generally true, or provide a counterexample. It will
probably help to sketch the indicated region for each problem.
39. If C > 0 is a constant, the region under the line y = C on the interval [a, b] has area A = C(b − a).
40. If $C > 0$ is a constant and $b > a \ge 0$, the region under the line $y = Cx$ on the interval $[a, b]$ has area $A = \frac{1}{2}C(b - a)$.
41. Let $f$ be a function that satisfies $f(x) \ge 0$ for $x$ in the interval $[a, b]$. Then the area under the curve $y = [f(x)]^2$ on the interval $[a, b]$ must always be greater than the area under $y = f(x)$ on the same interval.
42. A function f is said to be even if f (−x) = f (x). If f is even and f (x) ≥ 0 throughout the interval [−a, a], then
the area under the curve y = f (x) on this interval is twice the area under y = f (x) on [0, a].
43. We saw in Example 8 that Thompson Seedless grapes have a lower developmental threshold of 50◦ F and require approximately 3,000 degree-days to ripen. Suppose the temperature in the fields is given by
\[ T(x) = 70 + 10\sin(2\pi x) \]
where $x$ is time in days. Write an expression involving definite integrals that represents the number of degree-days accumulated from $x = 10$ to $x = 20$, and evaluate this expression.

44. Assume the temperature in degrees Fahrenheit is given by
\[ T(x) = 50 + 20\cos(2\pi x) \]
where $x$ is the time in days. Also, assume the lower developmental threshold is 30◦ F. Write an expression involving definite integrals that represents the number of degree-days accumulated from $x = 0$ to $x = 15$. Evaluate this expression.
45. A function f is said to be odd if f (−x) = −f (x). Show that if f is odd on the interval [−a, a] then the signed
area under the curve y = f (x) is 0.
46. Prove the sum rule for integrals using the definition of a definite integral and properties of summation.
47. Generalize the splitting property by showing that for $a \le c \le d \le b$
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^d f(x)\,dx + \int_d^b f(x)\,dx \]
whenever all these integrals exist.

48. Prove the bounding rule for definite integrals: If $f$ is integrable on the closed interval $[a, b]$ and $m \le f(x) \le M$ for constants $m$, $M$, and all $x$ in the closed interval, then
\[ m(b - a) \le \int_a^b f(x)\,dx \le M(b - a) \]
49. Historical Quest Gilles Roberval (1602–1675) started his study of mathematics at the age of 14 years. He
had a distinguished career, and was a founding member of the Académie Royale des Sciences, and in 1669 he
invented the Roberval balance, (Figure 5.27), which is still widely used today.
Figure 5.27: Roberval’s balance variations
Roberval had a chair position as Professor of Mathematics at the College Royale. Every three years, there
was a contest by competitive examination (written by the incumbent!) to determine who would occupy this
position. It is said for this reason, Roberval kept many of his techniques of integration secret until his death.
We do know, however, that he developed powerful methods in the early study of integration. These methods
are described in his treatise Traité des indivisibles. For instance, in this treatise, he computed the definite
integral of sin(x) using obscure trigonometric identities. It is these identities that the computer algebra system
uses to simplify the sum and take the limit. For this quest, you may stand on the shoulders of Roberval and
use technology to answer this question. Write down a Riemann sum for
\[ \int_0^{\pi} \sin x\,dx \]
using right endpoints. Then use technology to simplify this sum and evaluate this definite integral.

5.4. THE FUNDAMENTAL THEOREM OF CALCULUS 477
5.4 The Fundamental Theorem of Calculus

In this section, we discuss the evaluation theorem and the fundamental theorem of calculus. These theorems link antiderivatives, which we can compute relatively easily, with definite integrals and Riemann sums. We show that, for a continuous function, the change in an antiderivative over an interval equals the limit of the Riemann sums over that interval. When a simple analytical expression $F(x)$ cannot be found as a solution to the equation $F'(x) = f(x)$, Riemann sums provide ways to find approximate solutions to this differential equation.
The Evaluation Theorem and Net Change
Theorem 5.2. Evaluation Theorem
Let $f$ be a continuous function on $[a, b]$, and $F$ be any antiderivative of $f$. Then
\[ \int_a^b f(x)\,dx = F(b) - F(a) \]
The proof of this theorem is a corollary of the fundamental theorem of calculus, which is discussed later in this
section. Why is this theorem useful? Well, as we saw in the beginning of this chapter, finding antiderivatives is
much easier than taking limits of Riemann sums. This theorem allows us to evaluate definite integrals by finding and
evaluating an antiderivative! This fact is so important that this theorem is sometimes itself called the fundamental
theorem of calculus, even though it is only a corollary to the more powerful theorem of the same name. Despite this
theorem and the fact that Riemann sums connect antiderivatives with the area under a curve, Riemann sums are
important in providing a basis for the development of powerful computational tools for finding the numerical values
of definite integrals when we can’t find an explicit formula for an antiderivative—a topic we explore later in this
chapter.
Example 1. Using the evaluation theorem
Evaluate the following definite integrals.

R1
a. 0 x7 dx
R2
b. 1 x−1 dx
Rπ
c. 0
sin x dx
Solution.
a. Since an antiderivative of f (x) = x7 is F (x) = 18 x8 , the Evaluation Theorem tells us

Z 1
1 1
x7 dx = F (1) − F (0) = −0=
0 8 8
Notice that if we took another antiderivative, say F (x) = 18 x8 + 14, we still get
1 1
F (1) − F (0) = + 14 − (0 + 14) =
8 8
as constant term 14 cancels out.

478 5.4. THE FUNDAMENTAL THEOREM OF CALCULUS
b. Since an antiderivative of $f(x) = x^{-1}$ is $F(x) = \ln x$, the Evaluation Theorem tells us
\[ \int_1^2 x^{-1}\,dx = F(2) - F(1) = \ln 2 - 0 = \ln 2 \]
Notice that we used the fact that $\frac{1}{x}$ is continuous on the interval [1, 2]. The Fundamental Theorem would
not apply to $\int_{-1}^{1} \frac{1}{x}\,dx$ as $\frac{1}{x}$ is not continuous on the interval [−1, 1].
c. Since an antiderivative of $f(x) = \sin x$ is $F(x) = -\cos x$, the Evaluation Theorem tells us
\[ \int_0^\pi \sin x\,dx = F(\pi) - F(0) = -\cos\pi - (-\cos 0) = 2 \]
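As a numerical sanity check on Example 1, the following Python sketch compares a midpoint Riemann sum with the value F(b) − F(a) for each of the three integrals. The helper name `midpoint_sum` and the choice of 10,000 subintervals are illustrative assumptions, not part of the text.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# (integrand, lower limit, upper limit, F(b) - F(a) from the Evaluation Theorem)
examples = [
    (lambda x: x**7, 0.0, 1.0, 1 / 8),
    (lambda x: 1 / x, 1.0, 2.0, math.log(2)),
    (math.sin, 0.0, math.pi, 2.0),
]

for f, a, b, exact in examples:
    approx = midpoint_sum(f, a, b)
    print(f"Riemann sum: {approx:.6f}   F(b) - F(a): {exact:.6f}")
```

The Riemann sums need thousands of function evaluations to match what the Evaluation Theorem gives from two evaluations of an antiderivative.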
To appreciate the power of the Evaluation Theorem, compare the work of Example 1 with work required to find
the limit of the Riemann sum for each of these functions. To simplify our work, we introduce the following notation
\[ F(b) - F(a) = F(x)\Big|_{x=a}^{x=b} \]
or, when there is no ambiguity,
\[ F(b) - F(a) = F(x)\Big|_a^b \]
Example 2. Using evaluation notation
Evaluate $\int_{\pi}^{5\pi/4} \sec^2 x\,dx$.
Solution. Since an antiderivative of $\sec^2 x$ is $\tan x$, the Evaluation Theorem tells us
\[ \int_{\pi}^{5\pi/4} \sec^2 x\,dx = \tan x\,\Big|_{\pi}^{5\pi/4} = 1 - 0 = 1 \]
Notice that we used the fact that $\sec^2 x$ is continuous on the interval from x = π to 5π/4. The Fundamental Theorem
does not apply on the interval from x = π to 2π as $\sec^2 x$ is not defined at $x = \frac{3\pi}{2}$.
□
As illustrated in Example 3 in section 2 of this chapter, an important interpretation of the Evaluation Theorem
is that it relates the accumulated change of a function over an interval to the area under its derivative. Given a
function F , we define the accumulated change of F over [a, b] as
Accumulated Change of F over [a, b] = F (b) − F (a)
For instance, if F(x) represents the total number of births in the world by year x, then F(b) − F(a) is the number of
births that occurred between years a and b. Now suppose that f(x) = F′(x). In other words, f is the derivative of
F. Equivalently, F is an antiderivative of f. By the Evaluation Theorem, $F(b) - F(a) = \int_a^b f(x)\,dx$. That is,
Accumulated Change of F over [a, b] = Signed Area of F ′ over [a, b]
This means that F ′ (x) is the instantaneous birth rate in year x. Thus, the Evaluation Theorem asserts that the area
under the instantaneous birth rate equals the accumulated change in births. Does this make sense? At the level of
units, it certainly does. The instantaneous birth rate has units births per year. Hence, the area over an interval of
time has units births per year multiplied by years. This equals births which has the same units as the accumulated
change. Moreover, if we broke up the interval [a, b] into small subintervals, then the number of births in a given
subinterval would be approximately the instantaneous birth rate at some point in the subinterval times the length of
the subinterval, i.e., the area of a rectangle lying above this subinterval. Adding up all these little rectangular areas
would give us simultaneously an approximation for the number of births over [a, b] and the area under F′. More
generally, if you integrate the rate of change over an interval [a, b], then you get the accumulated change over [a, b].
Figure 5.28: A bighorn ram
Example 3. Horn increase for the bighorn ram
Bighorn sheep (Ovis canadensis) inhabit remote mountain and desert regions. They are restricted to semi-open,
precipitous terrain with rocky slopes, ridges, and cliffs or rugged canyons. Forage, water, and escape terrain are the
most important components of bighorn sheep habitat. Jon Jorgenson and colleagues (“Effects of population density
on horn development in bighorn rams,” Journal of Wildlife Management, 62 (1998), 1011–1020) found the rate of
increase of a bighorn ram’s horn is approximated by the function
\[ 0.1762\,x^2 - 3.986\,x + 22.68 \ \text{cm per year} \]
for x between 3 and 9 years.
Find the accumulated change in the length of a bighorn ram’s horn from age x = 3 to x = 9.
Solution. Let F(x) denote the length of a ram’s horn at age x years. Then,
\[ \text{Accumulated Change over } [3, 9] = F(9) - F(3) = \int_3^9 (0.1762\,x^2 - 3.986\,x + 22.68)\,dx \]
The Evaluation Theorem implies
\[ \begin{aligned}
\int_3^9 (0.1762\,x^2 - 3.986\,x + 22.68)\,dx
&= \left[\frac{0.1762}{3}x^3 - \frac{3.986}{2}x^2 + 22.68\,x\right]_3^9 \\
&= \left[\frac{0.1762}{3}(9)^3 - \frac{3.986}{2}(9)^2 + 22.68(9)\right]
 - \left[\frac{0.1762}{3}(3)^3 - \frac{3.986}{2}(3)^2 + 22.68(3)\right] \\
&= 85.5036 - 51.6888 \\
&= 33.8148 \ \text{cm}
\end{aligned} \]
□
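The arithmetic in Example 3 can be reproduced with a short Python sketch. It evaluates the antiderivative at the endpoints and, as an independent check, sums the growth rate over small subintervals. The function names and the number of subintervals are assumptions made for illustration.

```python
def growth_rate(x):
    """Horn growth rate in cm per year, as quoted in Example 3."""
    return 0.1762 * x**2 - 3.986 * x + 22.68

def antiderivative(x):
    """One antiderivative F of the growth rate (constant of integration 0)."""
    return 0.1762 / 3 * x**3 - 3.986 / 2 * x**2 + 22.68 * x

def midpoint_sum(f, a, b, n=1_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

exact = antiderivative(9) - antiderivative(3)
approx = midpoint_sum(growth_rate, 3, 9)
print(f"Evaluation Theorem: {exact:.4f} cm")
print(f"Midpoint sum:       {approx:.4f} cm")
```

Changing the endpoints in the last lines gives the accumulated change over any other age range between 3 and 9 years.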

The Fundamental Theorem of Calculus

The answer to Example 3 gives us the net increase in the length of the ram’s horn over the whole six-year period.
Suppose, though, that we want to find the net increase up to any age x during the six-year period from age 3
to age 9. The net increase is given by the function
\[ G(x) = \int_3^x (0.1762\,u^2 - 3.986\,u + 22.68)\,du \]
for 3 ≤ x ≤ 9. For example, we found G(9) ≈ 33.8148 cm. In writing down the integral defining G(x), we took
advantage of the fact that the variable of integration is a dummy variable. Consequently, to avoid unnecessary confusion,
we choose the variable u of integration to be different from our time variable x. By the Evaluation Theorem,
\[ \begin{aligned}
G(x) &= \left[\frac{0.1762}{3}u^3 - \frac{3.986}{2}u^2 + 22.68\,u\right]_3^x \\
&\approx 0.0587\,x^3 - 1.993\,x^2 + 22.68\,x - 51.6888
\end{aligned} \]
Plotting G(x) from x = 3 to x = 9, as shown in Figure 5.29, illustrates how the net increase of the length of the horn
changes over this time interval. Notice, as we might expect, the length of the horn is increasing at a decreasing rate.
Figure 5.29: The estimated growth of a ram’s horn in centimeters (horizontal axis: years; vertical axis: length)
We can now generalize this idea for any continuous function f defined on the interval [a, b] by considering the
function
\[ G(x) = \int_a^x f(u)\,du \quad \text{for } a \le x \le b. \]
If we interpret f (x) as a rate, then G(x) describes how the accumulated change varies as a function of x. Alternatively,
G(x) describes how the signed area under f confined to the interval [a, x] varies as a function of x.
Theorem 5.3. Fundamental Theorem of Calculus (FTC)
Consider a continuous function f on the interval [a, b]. Then G defined by
\[ G(x) = \int_a^x f(u)\,du \]
is an antiderivative of f(x) on (a, b). In other words,
\[ \frac{d}{dx}\int_a^x f(u)\,du = f(x) \quad \text{on } (a, b) \]

Why should this be true? The idea of the proof is as follows. The splitting property of integrals implies that
\[ \begin{aligned}
G(x + \Delta x) - G(x) &= \int_a^{x+\Delta x} f(u)\,du - \int_a^x f(u)\,du \\
&= \int_a^x f(u)\,du + \int_x^{x+\Delta x} f(u)\,du - \int_a^x f(u)\,du \\
&= \int_x^{x+\Delta x} f(u)\,du
\end{aligned} \]
On the other hand, continuity of f implies that f(u) ≈ f(x) for u between x and x + ∆x. Furthermore this
approximation gets better and better as ∆x approaches zero. Thus,
\[ G(x + \Delta x) - G(x) \approx \int_x^{x+\Delta x} f(x)\,du = f(x)\int_x^{x+\Delta x} du = f(x)[x + \Delta x - x] = f(x)\,\Delta x \]
Dividing both sides by ∆x and letting ∆x go to zero suggests that
\[ G'(x) = f(x). \]
To really show that this final statement is true requires a bit more care using ε’s and δ’s.
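The limiting argument can be illustrated numerically. The sketch below uses the splitting property to write G(x + ∆x) − G(x) as the integral of f over [x, x + ∆x], approximates that integral with a midpoint sum, and divides by ∆x. The integrand e^(−u²) and the point x = 1 are arbitrary choices made for illustration.

```python
import math

def midpoint_sum(f, a, b, n=2_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

def f(u):
    """A continuous integrand with no elementary antiderivative."""
    return math.exp(-u * u)

def difference_quotient(x, delta):
    """(G(x + delta) - G(x)) / delta, using G(x + delta) - G(x) = integral over [x, x + delta]."""
    return midpoint_sum(f, x, x + delta) / delta

x = 1.0
for delta in (0.1, 0.01, 0.001):
    q = difference_quotient(x, delta)
    print(f"delta = {delta:<6} quotient = {q:.6f}   f(x) = {f(x):.6f}")
```

As ∆x shrinks, the printed quotient approaches f(x), which is what the theorem asserts.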
An important consequence of the Fundamental Theorem of Calculus is that it proves that every continuous
function f has an antiderivative G, even though G cannot always be expressed using a combination of elementary
functions (e.g. polynomial, exponential, trigonometric functions, etc.). A corollary of the Fundamental Theorem of
Calculus is the Evaluation Theorem. Proving this corollary is left for the problem set (Problem 33).
Example 4. Derivatives via the Fundamental Theorem
Compute the following derivatives.

a. $\frac{d}{dx}\int_1^3 \sqrt{u + u^3}\,du$
b. $\frac{d}{dx}\int_0^x \sqrt{u - u^3}\,du$ on the interval [0, 1].
Solution.
a. Since $\int_1^3 \sqrt{u + u^3}\,du$ is a number, we are taking the derivative of a constant and
$\frac{d}{dx}\int_1^3 \sqrt{u + u^3}\,du = 0$.
b. By the Fundamental Theorem of Calculus, $\frac{d}{dx}\int_0^x \sqrt{u - u^3}\,du = \sqrt{x - x^3}$.
Example 5. From integrals to integrands
Suppose
\[ \int_3^x f(u)\,du = \sqrt{x} + a \]
Find f and a.
Solution. Let $G(x) = \int_3^x f(u)\,du$. By the Fundamental Theorem of Calculus,
\[ f(x) = G'(x) = \frac{d}{dx}\left[\sqrt{x} + a\right] = \frac{1}{2\sqrt{x}} \]
To find a, notice that
\[ \sqrt{3} + a = G(3) = \int_3^3 f(u)\,du = 0 \]
Therefore, $a = -\sqrt{3}$ and $f(x) = \frac{1}{2\sqrt{x}}$. □
Using the Fundamental Theorem, we can easily compute the accumulation of degree-days.
Example 6. Seedless Grapes
Thompson Seedless Grapes (see Figure 5.30) have a lower developmental threshold of 50°F and require approximately
3,000 degree-days to ripen.
Figure 5.30: Seedless grapes
Suppose the temperature (°F) in the fields is given by
\[ T(t) = 70 + 10\sin(2\pi t) \]
where t is time in days. Write down an expression involving definite integrals that represents the number of
degree-days accumulated from day 0 to day x, evaluate this expression, and find the time x at which 3,000 degree-days
have accumulated.
Solution. Since T(t) ≥ 50 for all t (i.e., the lower developmental threshold does not require consideration), the
number of degree-days accumulated by day x is given by $G(x) = \int_0^x (T(t) - 50)\,dt$. Integrating yields
\[ \begin{aligned}
G(x) &= \int_0^x (T(t) - 50)\,dt \\
&= \int_0^x (20 + 10\sin(2\pi t))\,dt \\
&= \left[20\,t - \frac{10}{2\pi}\cos(2\pi t)\right]_0^x \\
&= 20\,x - \frac{5}{\pi}\cos(2\pi x) + \frac{5}{\pi}
\end{aligned} \]
Since G′(x) = 20 + 10 sin(2πx) > 0, G is an increasing function. Therefore, if we find a positive solution, then it
is the only solution. Notice that if x is an integer, then
\[ G(x) = 20\,x - \frac{5}{\pi} + \frac{5}{\pi} = 20\,x \]
Solving 20x = 3,000 gives x = 150. The grapes will be ready for plucking in 150 days. □
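When the target is not reached at a whole number of days, the equation G(x) = target has to be solved numerically. Because G is increasing, bisection is one simple option. The sketch below applies it to the degree-day formula derived in the solution; the function names, the search interval [0, 365], and the tolerance are illustrative assumptions.

```python
import math

def degree_days(x):
    """G(x) = 20x - (5/pi) cos(2 pi x) + 5/pi, the formula derived in Example 6."""
    return 20 * x - 5 / math.pi * math.cos(2 * math.pi * x) + 5 / math.pi

def bisect(g, target, lo, hi, tol=1e-9):
    """Solve g(x) = target on [lo, hi], assuming g is increasing there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid   # target lies to the right of mid
        else:
            hi = mid   # target lies at or to the left of mid
    return (lo + hi) / 2

ripe_day = bisect(degree_days, 3000, 0, 365)
print(f"3,000 degree-days accumulate at x = {ripe_day:.4f} days")
```

The same routine can be reused with a different temperature model by replacing `degree_days`, provided the accumulated total is still increasing in x.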
Indefinite Integrals
Since the Fundamental Theorem ensures the existence of antiderivatives via integrals, it is appropriate to introduce
a notation for the general antiderivative.
Indefinite Integral
If f(x) is a continuous function, then
\[ \int f(x)\,dx \]
is called the indefinite integral of f and is equal to the general antiderivative of f.
The fact that the indefinite integral has no upper limit of integration or lower limit of integration distinguishes
it from a definite integral. It is important to remember that the indefinite integral represents a family of functions,
whereas the definite integral represents a value.
Example 7. Finding indefinite integrals
Compute the following indefinite integrals.

a. $\int e^x\,dx$
b. $\int (x^3 + 2)\,dx$
c. $\int \sec^2 x\,dx$
Solution.
a. $\int e^x\,dx = e^x + C$ where C is an arbitrary constant.
b. $\int (x^3 + 2)\,dx = \frac{x^4}{4} + 2x + C$ where C is an arbitrary constant.
c. $\int \sec^2 x\,dx = \tan(x) + C$ where C is an arbitrary constant.
□
Problem Set 5.4

In Problems 1 to 8, evaluate the definite integral.
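1. a. $\int_{-10}^{10} 6\,dx$   b. $\int_{-3}^{5} (2x + a)\,dx$
2. a. $\int_{-5}^{7} (-3)\,dx$   b. $\int_{-2}^{2} (b - x)\,dx$
3. a. $\int_0^4 (x^2 - 1)\,dx$   b. $\int_0^{\pi} (\sin x + x)\,dx$
4. a. $\int_{-1}^{1} (x^3 + bx^2)\,dx$   b. $\int_{-2}^{-1} \frac{b}{x^2}\,dx$
5. a. $\int_0^9 \sqrt{x}\,dx$   b. $\int_0^1 (5u^7 + \pi^2)\,du$
6. a. $\int_0^{27} \sqrt[3]{x}\,dx$   b. $\int_0^1 (7u^8 + \sqrt{\pi})\,du$
7. a. $\int_1^2 (2x)^{\pi}\,dx$   b. $\int_{-1}^{1} e^{x+1}\,dx$
8. a. $\int_1^2 x^{2a}\,dx$, $a \ne -\frac{1}{2}$   b. $\int_1^2 x^{2a}\,dx$, $a = -\frac{1}{2}$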
Find the indefinite integrals in Problems 9 to 16.

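9. a. $\int (4t^3 + 3t^2)\,dt$   b. $\int (-8t^3 + 15t^5)\,dt$
10. a. $\int \frac{dx}{2x}$   b. $\int 14e^x\,dx$
11. a. $\int (-3\cos u)\,du$   b. $\int (5t^3 - \sqrt{t})\,dt$
12. a. $\int 2\sin\theta\,d\theta$   b. $\int \cos^3\theta\,d\theta$
13. a. $\int \sqrt{x}\,(x + 1)\,dx$   b. $\int \sqrt{t}\,(t - \sqrt{t})\,dt$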
14. a. $\int \frac{x^2 + 1}{x^2}\,dx$   b. $\int \frac{x^2 + x - 1}{\sqrt{x}}\,dx$
15. a. $\int \frac{x^2 - 4}{x - 2}\,dx$   b. $\int \frac{x^2 - 1}{x + 1}\,dx$
16. a. $\int (\sin^2 x + \cos^2 x)\,dx$   b. $\int (\sec^2 t - \tan^2 t)\,dt$
Compute G′ (x) for the functions given in Problems 17 to 22.

17. $G(x) = \int_{-1}^{x} \sqrt{1 + u^2}\,du$
18. $G(x) = \int_{\pi/3}^{x} (\sec^2 t)\tan t\,dt$
19. $G(x) = \int_4^x \frac{dt}{2 + \sin t^2}$
20. $G(x) = \int_1^x \frac{\sin u}{u}\,du$
21. $G(x) = \int_x^2 \frac{e^u}{u}\,du$
22. $G(x) = \int_x^3 e^{t^2}\,dt$
23. Find a function f and a number a such that
\[ \int_0^x f(t)\,dt = \cos(2x) + a \]
24. Find a function f and a number a such that
\[ \int_a^x f(t)\,dt = \ln(x) + 4 \]
25. a. If $F(x) = \int \left(\frac{1}{\sqrt{x}} - 4\right) dx$, find the particular F so that F(1) = 0.
b. Sketch the graphs of y = F(x), y = F(x) + 3, and y = F(x) − 1.
Answers: a. $2\sqrt{x} - 4x + 2$   b. See answer art.
26. Let $F(x) = \int_{-2}^{x} f(u)\,du$ where the graph of f is shown in Figure 5.31.
(a) For what values of x does F (x) have a local maximum or minimum?
(b) For what values of x is F concave up? concave down?
(c) At what values of x does F (x) achieve a global maximum? global minimum?
(d) Sketch the graph of F (x).
27. Let $G(x) = \int_0^x g(u)\,du$ where the graph of g is shown in Figure 5.32.
Figure 5.32: Graph of g
(a) For what values of x does G(x) have a local maximum or minimum?
(b) For what values of x is G concave up? concave down?
(c) At what values of x does G(x) achieve a global maximum? global minimum?
(d) Sketch the graph of G(x).
28. Use the model for bighorn rams formulated by Jon Jorgenson et al. (see Example 3, p. 479) to find the net
increase in length of a ram’s horn from x = 3 to x = 7.
29. Use the model for bighorn rams formulated by Jon Jorgenson et al. (see Example 3, p. 479) to find the net
increase in length of a ram’s horn from x = 5 to x = 9.
30. A model used to estimate the time that Citrus Flower occurs in Tulare County, CA has a lower developmental
threshold of 49°F and requires approximately 767 degree-days to reach petal-fall (i.e., 50% of citrus flowers
have lost their petals). Suppose the temperature in the fields is given by
\[ T(x) = 74 + 14\sin(2\pi x) \]
where x is the time in days.
a. Write down an expression involving definite integrals that represents the time, x, at which 767
degree-days have accumulated.
b. Use technology to estimate the time x.
31. Sweet corn in western Oregon has a lower developmental threshold of 50°F and requires approximately 1,597
degree-days to reach maturity. Suppose the temperature in the fields is given by
\[ T(x) = 68 + 17\sin(2\pi x) \]
where x is the time in days.
a. Write down an expression involving definite integrals that represents the time, x, at which 1,597
degree-days have accumulated.
b. Use technology to estimate the time x.
32. The rate of change of ant diversity along an elevational gradient in the Spring Mountains is given by
F ′ (x) = 24.9 − 15.4 x species on average per km
where x is elevation above sea level measured in kilometers. If F (1) = 6.9, find an expression for F (x), the
number of ant species at an elevation of x km. Compare your answer to Example 5 in Section 2.7.
33. Prove the Evaluation Theorem (Theorem 5.2) using the Fundamental Theorem of Calculus (Theorem 5.3).

5.5 Substitution
In the next two sections, we discuss three techniques of integration: substitution, integration by parts, and
partial fractions. The first two of these techniques are counterparts to rules of differentiation. Unlike differentiation,
techniques of integration are incomplete; not every function has an elementary representation of its indefinite integral.
Consequently, part of the skill you will need to acquire is learning which integration techniques apply to which
functions. Since it is possible to compute integrals using technology, you may be asking, “Why bother with these
techniques?” There are several responses. First, by learning integration techniques and determining when and in which
order to use them, we gain some insight into how these technologies work. Second, sometimes technology needs
a helping hand; implementing an integration technique (especially substitution) by hand may allow technology to
complete a calculation it could not do otherwise. Third, computing integrals builds important mathematical skills.
Substitution for Indefinite Integrals
Our integration efforts begin with an antidifferentiation form of the chain rule. Consider the integral
\[ \int \frac{2x}{x^2 + 5}\,dx \]
Our basic rules of antidifferentiation provide us no direct way of computing this integral. However, if you look
carefully at this integral, then you might notice that the derivative of the denominator equals the numerator. This
observation suggests introducing a new variable $u = x^2 + 5$. Then we have that $\frac{du}{dx} = 2x$. Formally (without
justification yet) we can write du = 2x dx and
\[ \begin{aligned}
\int \frac{2x}{x^2 + 5}\,dx &= \int \frac{2x\,dx}{x^2 + 5} \\
&= \int \frac{du}{u} \\
&= \ln|u| + C \\
&= \ln(x^2 + 5) + C
\end{aligned} \]
Now while these calculations are all well and good, we have not justified them. However, we can verify our answer
by differentiating. Since $\frac{d}{dx}\ln(x^2 + 5) = \frac{1}{x^2 + 5}\cdot 2x$, $\ln(x^2 + 5) + C$ is the general
antiderivative of $\frac{2x}{x^2 + 5}$.
To see why this approach worked, consider the integral
\[ \int f[g(x)]\,g'(x)\,dx \]
for any functions f and g. For instance, in our example $\int \frac{2x}{x^2 + 5}\,dx$, we have f(x) = 1/x and
$g(x) = x^2 + 5$. If F is an antiderivative of f, whose existence we now know is guaranteed if f is continuous on an
interval, then
\[ \begin{aligned}
\int f[g(x)]\,g'(x)\,dx &= \int F'[g(x)]\,g'(x)\,dx && \text{definition of } F \\
&= \int \frac{d}{dx}F[g(x)]\,dx && \text{chain rule} \\
&= F[g(x)] + C && \text{FTC}
\end{aligned} \]
Equivalently, if we make the change of variables u = g(x), then
\[ \begin{aligned}
\int f[g(x)]\,g'(x)\,dx &= F[g(x)] + C \\
&= F(u) + C && \text{substitution} \\
&= \int F'(u)\,du && \text{FTC} \\
&= \int f(u)\,du && \text{definition of } F
\end{aligned} \]

We summarize this procedure in the following box.
Integration by substitution
Over any interval of x for which u = g(x) is differentiable and f is continuous on the range of this function, the
relationship
\[ \int f(\underbrace{g(x)}_{u})\,\underbrace{g'(x)\,dx}_{du} = \int f(u)\,du \]
holds.
Example 1. Integration by substitution
Find
\[ \int 9(x^2 + 3x + 5)(2x + 3)\,dx \]
Solution. For the procedure of substitution, we need to identify the appropriate change of variables.
Let $u = x^2 + 3x + 5$. Then du = (2x + 3) dx and
\[ \begin{aligned}
\int 9(x^2 + 3x + 5)(2x + 3)\,dx &= \int 9u\,du \\
&= \frac{9}{2}u^2 + C && \text{Power rule} \\
&= \frac{9}{2}(x^2 + 3x + 5)^2 + C && \text{Return to original variable.}
\end{aligned} \]
This procedure may seem a bit difficult to start because you may not be sure what to let the variable u represent.
Just remember, at least initially, you are looking for one part of the integrand that is the derivative of another part
of the integrand. If you practice enough, things will get easier!
Example 2. Substitution with a radical function

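Find $\int \sqrt{3x + 7}\,dx$.
Solution. Let u = 3x + 7, so du = 3 dx or $dx = \frac{du}{3}$. Substituting and integrating yields
\[ \begin{aligned}
\int \sqrt{3x + 7}\,dx &= \int \sqrt{u}\,\frac{du}{3} && \text{Substitute} \\
&= \frac{1}{3}\int u^{1/2}\,du && \text{Simplify} \\
&= \frac{1}{3}\cdot\frac{2}{3}u^{3/2} + C && \text{Power rule} \\
&= \frac{2}{9}(3x + 7)^{3/2} + C && \text{Return to original variable.}
\end{aligned} \]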

As the previous two examples illustrate, after making a substitution and simplifying, there should be no x values
in the integrand. Sometimes, eliminating all the x’s requires some additional work.
Example 3. Substitution with leftover x-values

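Find $\int x\sqrt{4x + 5}\,dx$.
Solution. Let u = 4x + 5. Then du = 4 dx and $dx = \frac{du}{4}$.
\[ \begin{aligned}
\int x\sqrt{4x + 5}\,dx &= \int x\sqrt{u}\,\frac{du}{4} && \text{Substitute} \\
&= \frac{1}{4}\int x\sqrt{u}\,du && \text{Left-over } x\text{-value: since } u = 4x + 5, \text{ it follows that } x = \tfrac{u - 5}{4} \\
&= \frac{1}{4}\int \frac{u - 5}{4}\sqrt{u}\,du \\
&= \frac{1}{16}\int (u\sqrt{u} - 5\sqrt{u})\,du && \text{Simplify} \\
&= \frac{1}{16}\int u^{3/2}\,du - \frac{5}{16}\int u^{1/2}\,du && \text{Difference rule} \\
&= \frac{1}{16}\cdot\frac{u^{5/2}}{5/2} - \frac{5}{16}\cdot\frac{u^{3/2}}{3/2} + C && \text{Power rule} \\
&= \frac{1}{40}(4x + 5)^{5/2} - \frac{5}{24}(4x + 5)^{3/2} + C && \text{Simplify and return to the original variable.}
\end{aligned} \]
□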
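As with the first substitution example, an answer found this way can be verified by differentiating. The Python sketch below does this numerically for Example 3, comparing a central-difference estimate of the derivative of the antiderivative with the original integrand. The sample points and the step size h are arbitrary illustrative choices.

```python
import math

def integrand(x):
    """The integrand of Example 3: x * sqrt(4x + 5)."""
    return x * math.sqrt(4 * x + 5)

def antiderivative(x):
    """The result of Example 3 with C = 0: (4x+5)^(5/2)/40 - 5(4x+5)^(3/2)/24."""
    u = 4 * x + 5
    return u**2.5 / 40 - 5 * u**1.5 / 24

def central_difference(F, x, h=1e-5):
    """Numerical estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (0.0, 1.0, 2.5):
    slope = central_difference(antiderivative, x)
    print(f"x = {x}: F'(x) = {slope:.6f}, integrand = {integrand(x):.6f}")
```

Agreement at a few sample points is not a proof, but a mismatch would immediately flag an algebra slip such as a wrong coefficient.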
Example 4. Substitution with a trigonometric function
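Find
\[ \int \tan x\,dx \]
Solution. Recall that $\tan x = \frac{\sin x}{\cos x}$ and $\frac{d}{dx}\cos x = -\sin x$. Hence using u = cos x should do the trick:
\[ \begin{aligned}
\int \tan x\,dx &= \int \frac{\sin x}{\cos x}\,dx \\
&= \int \frac{1}{\cos x}(\sin x\,dx) && \text{Let } u = \cos x, \text{ so } du = -\sin x\,dx \\
&= \int \frac{1}{u}(-1)\,du && \text{Substitution} \\
&= -\ln|u| + C && \text{Antiderivative of } 1/u \\
&= -\ln|\cos x| + C && \text{Return to the original variable.}
\end{aligned} \]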
We are in the midst of learning techniques of integration, and some of you will, no doubt, have access to a
calculator or a computer program which can assist in the process of integration. While technology is very useful,
there are times when doing an integral by hand will result in a simpler form of the answer, or the technology will
give an incomplete form which needs to be adjusted.
Example 5. Using technology to integrate

Use technology on the previous example. That is, use technology to find
Z
tan x dx
Solution. You will need to check the formatting requirements for the calculator or software you are using. However,
most will require a statement such as
integrate(tan(x), x)
Some calculators (TI-89 or TI-92) will output − ln(| cos(x)|) while other programs (such as Maple 9) will output
− ln cos x. Notice that neither expression includes the constant, C. Also notice that the output of technology is
sometimes correct only under certain conditions. For example, the term “cos x” may not have absolute value signs.
This creates no problem provided cos x > 0. However, if we want to plot an antiderivative over a large range of
x-values, we need to insert absolute values. □
Lest you think that if you purchase a calculator you will not need to study and master techniques of integration,
consider the following example.
Example 6. Use substitution with the help of technology
Find
\[ \int (1 + \ln x)\sqrt{1 + (x\ln x)^2}\,dx \]
Solution. If you attempt to use technology on this example (e.g., TI-89, TI-92, or Maple 9), you will find no
satisfactory answer is provided. However, if we let u = x ln x, then du = (1 + ln x) dx and we have
\[ \int (1 + \ln x)\sqrt{1 + (x\ln x)^2}\,dx = \int \sqrt{1 + u^2}\,du \]
This integral can be evaluated using technology to give one of two equivalent forms. The first is
\[ \frac{1}{2}\left[u\sqrt{1 + u^2} + \sinh^{-1}(u)\right] \]
where sinh(θ) is the hyperbolic sine function, which has the form $\sinh(\theta) = (e^{\theta} - e^{-\theta})/2$, and
$\sinh^{-1}$ is its inverse. The second is
\[ \frac{\ln\left|\sqrt{u^2 + 1} + u\right|}{2} + \frac{u\sqrt{u^2 + 1}}{2} \]
Notice that neither of the forms has the “+C”, and despite looking quite different these two expressions are
algebraically equivalent (graph them!). □
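Instead of graphing, the equivalence of the two forms can be spot-checked numerically. The sketch below evaluates both expressions at a few values of u using Python's standard `math.asinh` for the inverse hyperbolic sine; the function names and sample points are illustrative assumptions.

```python
import math

def form_sinh(u):
    """First form: (1/2) [u sqrt(1 + u^2) + asinh(u)]."""
    return 0.5 * (u * math.sqrt(1 + u * u) + math.asinh(u))

def form_log(u):
    """Second form: ln|sqrt(u^2 + 1) + u| / 2 + u sqrt(u^2 + 1) / 2."""
    root = math.sqrt(u * u + 1)
    return math.log(abs(root + u)) / 2 + u * root / 2

for u in (-2.0, 0.0, 0.5, 3.0):
    print(f"u = {u:5.1f}: {form_sinh(u):.12f}  {form_log(u):.12f}")
```

The two columns agree because the inverse hyperbolic sine of u can be written as ln(u + √(u² + 1)).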
Substitution for Definite Integrals

There are two ways to apply substitution to definite integrals. One is to return to the original
variable (as we did with indefinite integrals), and the other is to keep track of the change of variables in the limits
of integration. We illustrate both of these methods by considering Example 4.
Method I: return to the original variable.
\[ \begin{aligned}
\int_0^{\pi/4} \tan x\,dx &= -\int_{x=0}^{x=\pi/4} \frac{du}{u} && \text{where } u = \cos x \\
&= -\ln|u|\,\Big|_{x=0}^{x=\pi/4} \\
&= -\ln|\cos x|\,\Big|_0^{\pi/4} \\
&= -\ln|\cos(\pi/4)| - (-\ln|\cos(0)|) \\
&= -\ln\frac{1}{\sqrt{2}} + \ln 1 \\
&= \ln\sqrt{2} + 0 = \ln\sqrt{2}
\end{aligned} \]
Method II: keep track of the change of variables in the limits of integration.
If x = 0 (lower limit), then u = cos 0 = 1, and if x = π/4 (upper limit), then $u = \cos(\pi/4) = 1/\sqrt{2}$. Hence
\[ \begin{aligned}
\int_0^{\pi/4} \tan x\,dx &= \int_1^{1/\sqrt{2}} \frac{-du}{u} \\
&= -\ln|u|\,\Big|_1^{1/\sqrt{2}} && \text{Since limits of integration were changed, it is not necessary to return to the original variable.} \\
&= -\ln\frac{1}{\sqrt{2}} + \ln 1 && \text{Evaluate} \\
&= \ln\sqrt{2}
\end{aligned} \]
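Both methods can be checked numerically. The sketch below approximates the original integral in x and the transformed integral in u (with the new limits 1 and 1/√2) by midpoint sums and compares them with ln √2. The helper `midpoint_sum` and the number of subintervals are assumptions made for illustration; note that the helper also works when the lower limit exceeds the upper limit, as happens in the u-integral.

```python
import math

def midpoint_sum(f, a, b, n=10_000):
    """Midpoint Riemann sum from a to b; if a > b the step is negative."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Original integral in the variable x
in_x = midpoint_sum(math.tan, 0.0, math.pi / 4)

# After u = cos x: integrand -1/u, limits u = 1 (x = 0) to u = 1/sqrt(2) (x = pi/4)
in_u = midpoint_sum(lambda u: -1.0 / u, 1.0, 1.0 / math.sqrt(2))

print(f"integral in x: {in_x:.8f}")
print(f"integral in u: {in_u:.8f}")
print(f"ln sqrt(2):    {math.log(math.sqrt(2)):.8f}")
```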
Consider the general case of this second approach. Let u = g(x) and F(x) be an antiderivative of f(x). Then
\[ \begin{aligned}
\int_a^b f(g(x))\,g'(x)\,dx &= \int_a^b F'(g(x))\,g'(x)\,dx \\
&= \int_a^b \frac{d}{dx}F(g(x))\,dx \\
&= F[g(b)] - F[g(a)] \\
&= \int_{g(a)}^{g(b)} f(u)\,du
\end{aligned} \]
We summarize this observation in the following box.
Substitution with definite integrals
If g′(x) is a continuous function on [a, b] and f is continuous on the range of u = g(x), then
\[ \int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du \]
Figure 5.33: U.S. population growth (in millions) (horizontal axis: year; vertical axis: population size)
Example 7. U.S. Population Growth

The logistic formula
\[ P(t) = \frac{389.2\,e^{0.23t}}{e^{0.23t} + e^4} \]
provides a reasonably good fit to the population of the United States (in millions) during the period 1790-1990, as
illustrated in Figure 5.33. The variable t is the time (in decades) after 1790. Thus, t = 0 for 1790, t = 20 for 1990.
Suppose that each person eats food at a rate of one ration per year. Find the total number of rations of food eaten
in the U.S. between 1790 and 1990.
Solution. Since each person eats at a rate of one ration per year, the rate at which food is being eaten in decade t
is 10 P(t) rations per decade. To find the amount of rations eaten, we integrate 10 P(t) from t = 0 to t = 20.
Let $u = e^{0.23t} + e^4$; then $du = 0.23\,e^{0.23t}\,dt$. If t = 0, then $u = 1 + e^4$; if t = 20, then $u = e^{4.6} + e^4$. Hence
\[ \begin{aligned}
\int_0^{20} 10\,P(t)\,dt &= 3892\int_0^{20} \frac{e^{0.23t}}{e^{0.23t} + e^4}\,dt \\
&= 3892\int_{1+e^4}^{e^{4.6}+e^4} \frac{du/0.23}{u} \\
&\approx 16{,}922\int_{1+e^4}^{e^{4.6}+e^4} \frac{du}{u} \\
&= 16{,}922\,\ln|u|\,\Big|_{1+e^4}^{e^{4.6}+e^4} \\
&\approx 17{,}249
\end{aligned} \]
Since P(t) is measured in millions, between the years 1790 and 1990 the people living in the United States ate about
17,249,000,000 yearly rations of food! □
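The value of this integral can be cross-checked in Python. The sketch below computes the closed form obtained from the substitution and compares it with a midpoint sum of 10 P(t) over [0, 20]. The helper name and the number of subintervals are illustrative assumptions; results are in millions of rations, since P(t) is in millions.

```python
import math

def P(t):
    """Logistic fit to the U.S. population (millions), t in decades after 1790."""
    return 389.2 * math.exp(0.23 * t) / (math.exp(0.23 * t) + math.exp(4))

def midpoint_sum(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

# Closed form from the substitution u = e^(0.23 t) + e^4
upper = math.exp(4.6) + math.exp(4)
lower = 1 + math.exp(4)
closed_form = (3892 / 0.23) * math.log(upper / lower)

numeric = midpoint_sum(lambda t: 10 * P(t), 0, 20)
print(f"closed form: {closed_form:.1f} million rations")
print(f"midpoint sum: {numeric:.1f} million rations")
```

Changing the limits 0 and 20 in the last computation gives the rations eaten over any other span of decades covered by the model.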
Figure 5.34: The breathing cycle
Example 8. Breathing
Breathing is a cyclic process, as illustrated in Figure 5.34. One cycle of breathing from the beginning of inhalation
to the end of exhalation takes about five seconds. Since the maximum rate of airflow into the lungs is about $\frac{1}{2}$
liters/second, we could model the rate of air flow into the lungs by the function
\[ f(t) = \frac{1}{2}\sin\left(\frac{2\pi}{5}t\right) \ \text{liters/second} \]
where t is the time in seconds. Find the total amount of air inhaled in one cycle.
Solution. The time for inhalation is $\frac{5}{2}$ seconds and for exhalation is also $\frac{5}{2}$ seconds, so the total
amount of air inhaled in one cycle is $\int_0^{5/2} f(t)\,dt$. Let $u = \frac{2\pi}{5}t$, so $du = \frac{2\pi}{5}\,dt$ or
$dt = \frac{5}{2\pi}\,du$. When t = 0, u = 0, and when t = 5/2, u = π.
Hence
\[ \begin{aligned}
\int_0^{5/2} \frac{1}{2}\sin\left(\frac{2\pi}{5}t\right)dt &= \frac{1}{2}\int_0^{\pi} (\sin u)\,\frac{5}{2\pi}\,du \\
&= \frac{5}{4\pi}\int_0^{\pi} \sin u\,du \\
&= \frac{5}{4\pi}(-\cos u)\,\Big|_0^{\pi} \\
&= \frac{5}{4\pi}[-(-1) - (-1)] \\
&= \frac{5}{2\pi} \\
&\approx 0.80 \ \text{liters}
\end{aligned} \]
Problem Set 5.5

Problems 1 to 8 present pairs of integration problems, one of which will use substitution and one of which will not.
As you are working these problems think about when substitution may be appropriate.
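1. a. $\int_0^4 (2t + 4)\,dt$   b. $\int_0^4 (2t + 4)^{-1/2}\,dt$
2. a. $\int_0^{\pi/2} \sin\theta\,d\theta$   b. $\int_0^1 e^{\theta}\sin(e^{\theta})\,d\theta$
3. a. $\int_0^{\pi/2} \cos t\,dt$   b. $\int_0^{\pi} t\cos t^2\,dt$
4. a. $\int_0^4 \sqrt{x}\,dx$   b. $\int_{-4}^{0} \sqrt{-x}\,dx$
5. a. $\int_0^{16} \sqrt[4]{x}\,dx$   b. $\int_0^1 \sqrt[4]{x + 2}\,dx$
6. a. $\int x(3x^2 - 5)\,dx$   b. $\int x(3x^2 - 5)^5\,dx$
7. a. $\int x^2\sqrt{2x^3}\,dx$   b. $\int 6x^2\sqrt{2x^3 - 5}\,dx$
8. a. $\int (2x + 1)\,dx$   b. $\int (2x + 1)^{1,000}\,dx$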
Use substitution to find the indefinite integrals in Problems 9 to 16.

9. $\int (2x + 3)^4\,dx$
10. $\int (5x - 2)^{20}\,dx$
11. $\int x\sqrt{x^2 + 4}\,dx$
12. $\int \frac{x\,dx}{\sqrt{x^2 + 1}}$
13. $\int \cot x\,dx$
14. $\int \sin^3 t\cos t\,dt$
15. $\int \frac{\ln x}{x}\,dx$
16. $\int \frac{z^3\,dz}{\sqrt{z^4 + 12}}$
Use substitution to evaluate the definite integrals in Problems 17 to 24.

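17. $\int_{-1}^{2} (5x^2 - x)^2(10x - 1)\,dx$
18. $\int_0^1 \frac{5x^2\,dx}{2x^3 + 1}$
19. $\int_1^2 \frac{e^{1/x}}{x^2}\,dx$
20. $\int_0^1 \frac{\ln(x + 1)}{x + 1}\,dx$
21. $\int_0^1 \frac{0.58\,e^{0.2x}}{1 + e^{0.2x}}\,dx$
22. $\int_0^{12} \frac{5{,}000\,e^{0.2t}\,dt}{e^{0.2t} + 10}$
23. $\int_1^2 x\sqrt{x - 1}\,dx$
24. $\int_0^2 (e^x - e^{-x})^2\,dx$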
25. Find
\[ \int e^{\sin x}\cos x\,dx \]
26. Assume that f is continuous and $\int_1^8 f(x)\,dx = 12$. Find
\[ \int_1^2 f(x^3)\,x^2\,dx \]
27. Assume that f is continuous and $\int_1^6 f(2x)\,dx = -3$. Find
\[ \int_2^{12} f(x)\,dx \]
28. In Example 7, US population growth was modeled by
\[ P(t) = \frac{389.2\,e^{0.23t}}{e^{0.23t} + e^4} \ \text{millions of individuals} \]
where t is decades after 1790. If each person eats food at a rate of one ration per year, find the total number
of rations of food eaten in the U.S. between 1800 and 1900.
29. Assume that a dust mite population starts with 10 dust mites and grows at a rate of $10e^{0.3t}$ dust mites per
hour. How many dust mites will there be one day from now?

30. Suppose an environmental study indicates that the ozone level, L, in the air above a major metropolitan center
is changing at a rate modeled by the function
\[ L'(t) = \frac{0.24 - 0.03t}{\sqrt{36 + 16t - t^2}} \]
parts per million per hour (ppm/h) t hours after 7:00 A.M.
a. Express the ozone level L(t) as a function of t if L is 4 ppm at 7:00 A.M.
b. Use the graphing utility of your calculator to find the time between 7:00 A.M. and 7:00 P.M. when the
highest level of ozone occurs. What is the highest level?
31. Gompertz’s law of tumor growth is given by the equation
\[ \int \frac{dN}{N\ln(N/b)} = -\int a\,t\,dt \]
where N is the size of the tumor, t is time (measured in days), b is the asymptotic size of the tumor, and a is
a measurement of the tumor growth rate. Assume a = 1 and b = 10. Integrate both sides of the Gompertz
equation and solve for N in terms of t. To get rid of the integration constant, assume that N equals 5 at time
t = 0.
32. In Example 6 from Section 1.6, we modeled the uptake of glucose by bacterial populations off the coast of
Peru by the function
\[ f(x) = \frac{1.2078\,x}{1 + 0.0506\,x} \ \text{micrograms per hour} \]
where x is micrograms of glucose per liter. Suppose the concentration of glucose x is decaying exponentially
in time: $x(t) = 100\,e^{-0.01t}$ micrograms per liter where t is measured in hours.
a. Write down a function U(t) that describes how the uptake rate is changing in time.
b. Determine the net uptake of a cell from t = 0 to t = 6 hours.
33. In Example 4 in Section 2.4, we found that the rate at which wolves kill moose can be modelled by
\[ \frac{3.36\,x}{0.42 + x} \]
where x is measured in number of moose per km². Suppose that the density of moose is increasing exponentially
according to the function $x(t) = 0.1\,e^{0.2t}$ moose per km² where t is measured in hundreds of days. Determine
the number of moose killed by a wolf from t = 0 to t = 3.
34. In Problem 39 in Section 2.4, we examined how wolf densities in North America depend on moose densities.
We found that the following function provides a good fit to the data:
\[ \frac{58.7\,(x - 0.03)}{0.76 + x} \]
where x is number of moose per km². Assume the moose density is increasing exponentially according to the
function $x(t) = 0.1\,e^{0.2t}$ moose per km² where t is measured in hundreds of days. Determine the change in the
wolf density from t = 0 to t = 3.

5.6 Integration by Parts and Partial Fractions

In this section, we present two important techniques of integration that will be useful in later chapters.
Integration by Parts
Integration by parts is a procedure based on inverting the product rule for differentiation. To derive a formula for
this procedure, we begin with the product rule for differentiating functions f(x) and g(x), assuming these derivatives
exist.
\[ \begin{aligned}
\frac{d}{dx}[f(x)g(x)] &= f'(x)g(x) + f(x)g'(x) && \text{Product rule} \\
\int \frac{d}{dx}[f(x)g(x)]\,dx &= \int [f'(x)g(x) + f(x)g'(x)]\,dx && \text{Antidifferentiate both sides} \\
\int \frac{d}{dx}[f(x)g(x)]\,dx &= \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx && \text{Properties of integrals} \\
f(x)g(x) &= \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx \\
f(x)g(x) - \int f'(x)g(x)\,dx &= \int f(x)g'(x)\,dx && \text{Subtract } \textstyle\int f'(x)g(x)\,dx \text{ from both sides.}
\end{aligned} \]
If we let u = f(x) and v = g(x), then du = f′(x) dx, dv = g′(x) dx, and we obtain the following simplified
formula.
Integration by parts
\[ \int u\,dv = uv - \int v\,du \]
To evaluate integrals using integration by parts, we would like to choose u and dv so that the new integral is
easier to integrate than the original.
Example 1. Integration by parts
Find
\[ \int x\,e^x\,dx \]
Solution. For this example, there are two ways we can choose u and dv. Suppose we choose u = x and $dv = e^x\,dx$.
We differentiate u and integrate dv. Thus, du = dx, and $v = e^x$. Now, substitute these values into the integration
by parts formula:
\[ \begin{aligned}
\int u\,dv &= uv - \int v\,du \\
\int x\,e^x\,dx &= x e^x - \int e^x\,dx \\
&= x e^x - e^x + C
\end{aligned} \]
We noted that there were two possible choices for u and dv in Example 1. The other choice, is to let u = ex and
dv = x dx. If you make this choice, and substitute into the formula for integration by parts, you will obtain the same

result. Try this yourself to practice the technique. However, it is not usually the case that both choices of u and v
will work equally easily.
Example 2. When the differentiable part is the entire integrand
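Find
\[ \int \ln x\,dx \]
assuming x > 0.
Solution. Let u = ln x and dv = dx. Then $du = \frac{dx}{x}$, v = x, so
\[ \begin{aligned}
\int u\,dv &= uv - \int v\,du \\
\int \ln x\,dx &= (\ln x)\,x - \int x\,\frac{dx}{x} \\
&= x\ln x - \int dx \\
&= x\ln x - x + C = x(\ln x - 1) + C
\end{aligned} \]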
Example 3. Repeated use of integration by parts
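Find
\[ \int x^2 e^{2x}\,dx \]
Solution. Let $u = x^2$ and $dv = e^{2x}\,dx$. Then du = 2x dx, $v = \frac{1}{2}e^{2x}$, and
\[ \int x^2 e^{2x}\,dx = x^2\left(\frac{1}{2}e^{2x}\right) - \int \frac{1}{2}e^{2x}(2x\,dx) \]
To compute the right-most integral, we need another application of integration by parts. Let u = x and $dv = e^{2x}\,dx$.
Then du = dx, $v = \frac{1}{2}e^{2x}$ and
\[ \begin{aligned}
\int x^2 e^{2x}\,dx &= \frac{1}{2}x^2 e^{2x} - \int x e^{2x}\,dx \\
&= \frac{1}{2}x^2 e^{2x} - \left[\frac{1}{2}x e^{2x} - \frac{1}{2}\int e^{2x}\,dx\right] \\
&= \frac{1}{2}x^2 e^{2x} - \frac{1}{2}x e^{2x} + \frac{1}{4}e^{2x} + C \\
&= \frac{1}{4}e^{2x}(2x^2 - 2x + 1) + C
\end{aligned} \]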
In the next example, it is necessary to apply integration by parts more than once, but as you will see, when we
do so a second time we return to the original integral.
Example 4. There and back again

Find $\int e^x\cos x\,dx$.
Solution. For this problem you will see that it will be useful to call the initial antiderivative I and assume that
the constant of integration is 0. That is, let
\[ \begin{aligned}
I &= \int e^x\cos x\,dx \\
&= e^x\sin x - \int \sin x\,(e^x\,dx) && \text{Let } u = e^x \text{ and } dv = \cos x\,dx; \text{ so } du = e^x\,dx,\ v = \sin x, \text{ and use integration by parts.} \\
&= e^x\sin x - \left[-e^x\cos x - \int (-\cos x)\,e^x\,dx\right] && \text{Let } u = e^x \text{ and } dv = \sin x\,dx; \text{ so } du = e^x\,dx,\ v = -\cos x, \text{ and use integration by parts again.} \\
&= e^x\sin x + e^x\cos x - \int e^x\cos x\,dx && \text{Simplify} \\
&= e^x\sin x + e^x\cos x - I && \text{Notice the integral is } I. \\
2I &= e^x\sin x + e^x\cos x && \text{Add } I \text{ to both sides (valid since } I \text{ exists).} \\
I &= \frac{e^x}{2}(\cos x + \sin x) && \text{Divide both sides by 2 and factor the common factor on the right.}
\end{aligned} \]
Hence, the general form of the antiderivative is
\[ \int e^x\cos x\,dx = \frac{e^x}{2}(\cos x + \sin x) + C \]
□
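Because the argument in Example 4 solves for I algebraically rather than integrating directly, it is worth confirming the result by differentiation. The Python sketch below compares a central-difference estimate of the derivative of (eˣ/2)(cos x + sin x) with eˣ cos x at a few points; the sample points and step size are arbitrary illustrative choices.

```python
import math

def integrand(x):
    """The integrand of Example 4: e^x cos x."""
    return math.exp(x) * math.cos(x)

def antiderivative(x):
    """The result of Example 4 with C = 0: (e^x / 2)(cos x + sin x)."""
    return math.exp(x) / 2 * (math.cos(x) + math.sin(x))

def central_difference(F, x, h=1e-5):
    """Numerical estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (-1.0, 0.0, 0.7, 2.0):
    slope = central_difference(antiderivative, x)
    print(f"x = {x:4.1f}: F'(x) = {slope:.6f}, e^x cos x = {integrand(x):.6f}")
```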
Sometimes you need to combine techniques to conquer an integral.
Example 5. Combining substitution and integration by parts

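Find $\int x^3 e^{-x^2}\,dx$.
The function $x^3 e^{-x^2}$ is related to the Gaussian or normal distribution, which as its second name suggests is the
most important distribution in statistics.
Solution.
\[ \begin{aligned}
\int x^3 e^{-x^2}\,dx &= \int t\,e^{-t}\,\frac{dt}{2} && \text{Substitution: let } t = x^2, \text{ so } dt = 2x\,dx. \\
&= \frac{1}{2}\int t\,e^{-t}\,dt && \text{Integration by parts: } u = t,\ dv = e^{-t}\,dt; \text{ so } du = dt,\ v = -e^{-t} \\
&= \frac{1}{2}\left[t(-e^{-t}) - \int (-e^{-t}\,dt)\right] \\
&= \frac{1}{2}\left[-t e^{-t} + \int e^{-t}\,dt\right] \\
&= \frac{1}{2}\left[-t e^{-t} - e^{-t} + C\right] \\
&= -\frac{1}{2}e^{-t}(t + 1) + C && \text{Renaming } C/2 \text{ just } C \text{ (an arbitrary constant)} \\
&= -\frac{1}{2}e^{-x^2}(x^2 + 1) + C
\end{aligned} \]
Do not forget to return to the original variable.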
Integration by parts extends to definite integrals in a natural way.
Integration by parts with definite integrals
If $f(x)$ and $g(x)$ are differentiable functions of $x$ on the interval $[a, b]$, then
$$\int_a^b f(x)\,g'(x)\,dx = f(x)\,g(x)\Big|_a^b - \int_a^b f'(x)\,g(x)\,dx$$
Example 6. Integration by parts with a definite integral
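Evaluate
$$\frac{1}{4}\int_0^t s e^{-s/2}\,ds$$
Solution. Let $u = s$ and $dv = e^{-s/2}\,ds$, so that $du = ds$ and $v = -2e^{-s/2}$.
$$\begin{aligned}
\frac{1}{4}\int_0^t s e^{-s/2}\,ds &= \frac{1}{4}\left[s\left(-2e^{-s/2}\right)\Big|_0^t - \int_0^t \left(-2e^{-s/2}\right) ds\right] \\
&= \frac{1}{4}\left[-2te^{-t/2} - 4e^{-s/2}\Big|_0^t\right] \\
&= \frac{1}{4}\left[-2te^{-t/2} - \left(4e^{-t/2} - 4\right)\right] \\
&= -\frac{1}{2} e^{-t/2}\left(t + 2 - 2e^{t/2}\right)
\end{aligned}$$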
Example 7. Survival to age t
Suppose a biologist has found that for a particular population of monkeys, the proportion of individuals born
each year who die before they are t years old is
$$p(t) = \frac{1}{4}\int_0^t s e^{-s/2}\,ds$$
a. What proportion of individuals die before the age of 3?
b. What proportion of individuals die between ages 3 and 4?
c. What proportion of individuals live to be at least age 6?
d. At what rate is the proportion changing at age 1? At age 4?
Solution. From Example 6, $p(t) = -\frac{1}{2} e^{-t/2}\left(t + 2 - 2e^{t/2}\right)$.
a. $p(3) = -\frac{1}{2} e^{-3/2}\left(3 + 2 - 2e^{3/2}\right) \approx 0.442$; that is, about 44% of the population will die before the age of 3.
b. The proportion that will die before the age of 4 is
$$p(4) = -\frac{1}{2} e^{-4/2}\left(4 + 2 - 2e^{4/2}\right) \approx 0.594$$
Thus, the proportion that die between ages 3 and 4 is
$$p(4) - p(3) \approx 0.152$$
That is, about 15% of the population will die between the ages of 3 and 4.
c. The proportion that live to be at least age 6 is one minus the proportion that die before the age of 6. We find
$$p(6) = -\frac{1}{2} e^{-6/2}\left(6 + 2 - 2e^{6/2}\right) \approx 0.800$$
Thus, the desired proportion is
$$1 - 0.80 = 0.20$$
Therefore, we would expect 20% of the individuals to live to at least the age of 6.
d. Using properties of integrals and the Fundamental Theorem of Calculus, we have that
$$\frac{d}{dt}\,\frac{1}{4}\int_0^t s e^{-s/2}\,ds = \frac{1}{4}\,\frac{d}{dt}\int_0^t s e^{-s/2}\,ds = \frac{1}{4}\, t e^{-t/2}$$
Hence, the proportion is changing at a rate of $\frac{1}{4} e^{-1/2} \approx 0.1516$ per year at age 1 and $\frac{4}{4} e^{-2} \approx 0.1353$ per year at age 4.
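The numbers in Example 7 can be checked in two independent ways: by evaluating the closed form from Example 6, and by approximating the defining integral with a Riemann sum. The sketch below does both using only Python's standard library. The function names and the choice of 10,000 midpoint sample points are assumptions made for illustration.

```python
import math

def p_closed(t):
    # Closed form from Example 6
    return -0.5 * math.exp(-t / 2) * (t + 2 - 2 * math.exp(t / 2))

def p_midpoint(t, n=10_000):
    # Riemann sum for (1/4) * integral of s e^{-s/2} over [0, t],
    # sampling each subinterval at its midpoint
    ds = t / n
    total = sum((k + 0.5) * ds * math.exp(-(k + 0.5) * ds / 2) for k in range(n))
    return 0.25 * total * ds

def rate(t):
    # p'(t) from the Fundamental Theorem of Calculus
    return 0.25 * t * math.exp(-t / 2)

print("a.", p_closed(3), p_midpoint(3))
print("b.", p_closed(4) - p_closed(3))
print("c.", 1 - p_closed(6))
print("d.", rate(1), rate(4))
```

If the closed form and the Riemann sum disagreed noticeably, that would point to an algebra slip in the integration by parts.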
Partial Fractions
Partial fractions is a method by which you can integrate any rational function
$$f(x) = \frac{P(x)}{Q(x)} = \frac{a_0 + a_1 x + a_2 x^2 + \cdots + a_m x^m}{b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n}$$
Integration problems involving rational functions arise commonly in problems of enzyme kinetics, evolutionary games,
and population dynamics. For example, in describing the growth of a population of size N (t) with a growth that is
negatively impacted by its own size, we may encounter an integral of the form
$$\int \frac{dN}{N(N-1)}$$
The appropriate integration procedure is to write (expand) the rational function $\frac{1}{N(N-1)}$ as a sum of two simpler
functions that we can directly integrate. More specifically, we try to find constants A and B such that
$$\frac{1}{N(N-1)} = \frac{A}{N} + \frac{B}{N-1}$$
Placing the two fractions on the right hand side over a common denominator yields
$$\begin{aligned}
\frac{A}{N} + \frac{B}{N-1} &= \frac{A(N-1) + BN}{N(N-1)} \\
&= \frac{(A+B)N - A}{N(N-1)}
\end{aligned}$$
The left and right sides of these rational expressions are identical for all N if and only if the numerators agree:
$$1 = (A + B)N - A$$
Hence, we need that A + B = 0 and −A = 1, or in other words, A = −1 and B = 1. Thus, we can write
$$\begin{aligned}
\int \frac{1}{N(N-1)}\,dN &= \int \left[\frac{-1}{N} + \frac{1}{N-1}\right] dN \\
&= -\int \frac{dN}{N} + \int \frac{dN}{N-1} \\
&= -\ln|N| + \ln|N-1| + C \\
&= \ln\left|\frac{N-1}{N}\right| + C
\end{aligned}$$
While it is possible to deal with all rational functions, we confine our discussion to rational functions f(x) =
P(x)/Q(x) such that Q(x) can be expressed as a product of n distinct linear factors:
$$Q(x) = (a_1 + b_1 x)(a_2 + b_2 x) \cdots (a_n + b_n x)$$
If the degree of P(x) is less than the degree of Q(x) (i.e. n > m), then one can always find constants $A_1, A_2, \ldots, A_n$
such that
$$\frac{P(x)}{Q(x)} = \frac{A_1}{a_1 + b_1 x} + \frac{A_2}{a_2 + b_2 x} + \cdots + \frac{A_n}{a_n + b_n x}$$
Integration techniques also exist for more general rational functions, and we encourage you to read about these
techniques online or in another calculus text.
If instead the degree of P(x) is greater than or equal to the degree of Q(x), then you can first perform long
division and then decompose the remainder term. When Q(x) can be decomposed into distinct linear factors, there is a simple method
to determine the coefficients. This method is the Heaviside “cover-up” method. It is named after Oliver Heaviside
(see the Historical Quest, Problem 35), and it is discussed in the problem set.
Example 8. Integrating a rational function

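Find $\int \frac{x+2}{x^3 - x}\,dx$.
Solution. Since we can express $x^3 - x = x(x^2 - 1) = x(x-1)(x+1)$ as a product of distinct linear factors, there
exist constants $A_1$, $A_2$, and $A_3$ such that
$$\begin{aligned}
\frac{x+2}{x^3 - x} &= \frac{A_1}{x} + \frac{A_2}{x-1} + \frac{A_3}{x+1} \\
&= \frac{A_1 (x-1)(x+1) + A_2\, x(x+1) + A_3\, x(x-1)}{x(x-1)(x+1)} \\
&= \frac{(A_1 + A_2 + A_3)x^2 + (A_2 - A_3)x - A_1}{x^3 - x}
\end{aligned}$$
Equating the coefficients of the numerators, we solve $A_1 + A_2 + A_3 = 0$, $A_2 - A_3 = 1$, and $-A_1 = 2$ to find
$$A_1 = -2, \qquad A_2 = \frac{3}{2}, \qquad A_3 = \frac{1}{2}$$
You can also use the Heaviside method (see Problem 35) or technology to obtain the same result.
Thus, we can write
$$\begin{aligned}
\int \frac{x+2}{x^3 - x}\,dx &= -2\int \frac{1}{x}\,dx + \frac{3}{2}\int \frac{1}{x-1}\,dx + \frac{1}{2}\int \frac{1}{x+1}\,dx \\
&= -2\ln|x| + \frac{3}{2}\ln|x-1| + \frac{1}{2}\ln|x+1| + C
\end{aligned}$$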
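The coefficients in Example 8 can also be computed mechanically. The sketch below implements the cover-up evaluation for a denominator written as a product of distinct factors (x − r_i) with leading coefficient 1. The function name cover_up and its interface are hypothetical, chosen only for this illustration, and the method assumes the numerator has lower degree than the denominator.

```python
def cover_up(numerator, roots):
    """Return A_i such that numerator(x) / prod(x - r_i) = sum A_i / (x - r_i).

    Assumes the roots are distinct and deg(numerator) < len(roots).
    """
    coefficients = []
    for i, r in enumerate(roots):
        # "Cover" the factor (x - r) and evaluate what remains at x = r
        remaining = 1.0
        for j, s in enumerate(roots):
            if j != i:
                remaining *= (r - s)
        coefficients.append(numerator(r) / remaining)
    return coefficients

# Example 8: (x + 2) / (x (x - 1)(x + 1)), roots listed in the order 0, 1, -1
A1, A2, A3 = cover_up(lambda x: x + 2, [0, 1, -1])
print(A1, A2, A3)

# Spot check the decomposition at a point away from the roots
x = 3.0
print((x + 2) / (x**3 - x), A1 / x + A2 / (x - 1) + A3 / (x + 1))
```

Listing the roots in the order 0, 1, −1 matches the order of the terms $A_1/x$, $A_2/(x-1)$, $A_3/(x+1)$ used in the solution.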
Example 9. Second-order chemical kinetics
Consider two compounds A and B that bind to form a third compound C. Assume a and b are the initial
concentrations of A and B. If the rate at which C is produced is proportional to the product of concentrations of
A and B, then it has been shown that the following integral equation holds when y is the concentration of C, k is a
constant of proportionality, and t is time:
$$\int \frac{dy}{(a-y)(b-y)} = \int k\,dt$$
Integrate both sides of this equation and solve for y as a function of t assuming that a = 2, b = 1, k = 3 and that
y = 0 when t = 0. Sketch this function.
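Solution. Solve
$$\begin{aligned}
\int 3\,dt &= \int \frac{dy}{(2-y)(1-y)} && \text{Given equation} \\
\int 3\,dt &= \int \left[\frac{-1}{2-y} + \frac{1}{1-y}\right] dy && \text{Method of partial fractions (by calculator or “cover-up” method)} \\
\int 3\,dt &= \int \left[\frac{1}{y-2} + \frac{1}{1-y}\right] dy \\
3t + C &= \ln|y-2| - \ln|1-y| \\
3t + C &= \ln\left|\frac{y-2}{y-1}\right| \\
e^{3t+C} &= \left|\frac{y-2}{y-1}\right| && \text{Definition of logarithm} \\
\frac{y-2}{y-1} &= \pm e^{3t+C} && \text{Definition of absolute value}
\end{aligned}$$
If t = 0 then y = 0, so we have
$$\frac{0-2}{0-1} = \pm e^{3\cdot 0 + C}, \qquad\text{that is,}\qquad 2 = \pm e^{C}$$
Hence, we need the positive solution $+e^{C}$ to equal 2. Thus
$$\begin{aligned}
\frac{y-2}{y-1} &= 2e^{3t} \\
y - 2 &= 2ye^{3t} - 2e^{3t} \\
y\left(1 - 2e^{3t}\right) &= 2 - 2e^{3t} \\
y &= \frac{2 - 2e^{3t}}{1 - 2e^{3t}}
\end{aligned}$$
The graph is shown in Figure 5.35. To obtain this graph by hand, it is not hard to check that $y'(t) > 0$ and
$\lim_{t\to\infty} y(t) = 1$.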
Figure 5.35: Graph of a second-order chemical process.
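The integral equation in Example 9 comes from the rate law dy/dt = k(a − y)(b − y), so a quick sanity check on the solution is to confirm numerically that it satisfies this rate law with a = 2, b = 1, k = 3, starts at y = 0, and approaches 1. The sketch below is one way to do that; the helper names and the difference step h are illustrative assumptions.

```python
import math

def y(t):
    # Solution found in Example 9 for a = 2, b = 1, k = 3, y(0) = 0
    return (2 - 2 * math.exp(3 * t)) / (1 - 2 * math.exp(3 * t))

def rate_law(y_value, a=2.0, b=1.0, k=3.0):
    # Right-hand side of dy/dt = k (a - y)(b - y)
    return k * (a - y_value) * (b - y_value)

def derivative(f, t, h=1e-5):
    # Symmetric difference quotient approximating f'(t)
    return (f(t + h) - f(t - h)) / (2 * h)

print("y(0) =", y(0.0))
for t in [0.1, 0.5, 1.0]:
    # The last two columns should agree closely
    print(t, y(t), derivative(y, t), rate_law(y(t)))
print("y(10) =", y(10.0))
```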
Problem Set 5.6

Find each integral in Problems 1 to 10.

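1. $\int x e^{-x}\,dx$
2. $\int e^t \sin t\,dt$
3. $\int x \ln x\,dx$
4. $\int x \sin(2x)\,dx$
5. $\int \frac{\ln \sqrt{x}}{\sqrt{x}}\,dx$
6. $\int x^2 \ln x\,dx$
7. $\int e^{2x} \sin 3x\,dx$
8. $\int x^2 \sin x\,dx$
9. $\int x \sin x \cos x\,dx$
10. $\int \sin^{-1} x\,dx$. Hint: $\frac{d}{dx}\sin^{-1} x = \frac{1}{\sqrt{1-x^2}}$.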
Find the exact value of the definite integrals in Problems 11 to 16 using integration by parts, and then check by using
a calculator to find an approximate answer correct to four decimal places.
11. $\int_0^4 x e^{-x}\,dx$
12. $\int_1^e (\ln x)^2\,dx$
13. $\int_{1/3}^{e} 3(\ln 3x)^2\,dx$
14. $\int_0^{\pi} x \sin x\,dx$
15. $\int_0^{\pi} x(\sin x + \cos x)\,dx$
16. $\int_1^e x^3 \ln x\,dx$

Find the indicated integrals in Problems 17 to 22. Hint: Use partial fractions, technology, or first learn about the
Heaviside coverup method (Problem 35).
17. $\int \frac{dN}{N(1{,}000 - N)}$
18. $\int \frac{x+1}{x(1-x)}\,dx$
19. $\int \frac{x}{x(x-1000)}\,dx$
20. $\int \frac{(x+1)\,dx}{(x+2)(x+3)}$
21. $\int \frac{dx}{x(x+1)(x-2)}$
22. $\int \frac{4}{(x+1)(x+2)(x+3)}\,dx$
In Problems 23 to 28, first use an appropriate substitution and then use integration by parts or partial fractions to
evaluate the integral. Remember to give your answers in terms of x.
23. $\int \cos(\ln x)\,dx$
24. $\int x^3 e^{x^2}\,dx$
25. $\int \frac{\ln x\,\sin(\ln x)}{x}\,dx$
26. $\int [\sin x \ln(2 + \cos x)]\,dx$
27. $\int \frac{e^{2x}\,dx}{e^{2x} + 3e^{x} + 2}$
28. $\int \frac{\cos x\,dx}{(1 - \sin x)(2 - \sin x)}$
29. (a) Evaluate $\int \frac{x^3}{x^2 - 1}\,dx$ using integration by parts.
(b) Evaluate the integral using partial fractions.
30. (a) Evaluate $\int \cos^2 x\,dx$. Hint: Use the trigonometric identity $\cos^2 x = \frac{1}{2}(1 + \cos(2x))$.
(b) Use part (a) to evaluate $\int x \cos^2 x\,dx$ using integration by parts.

31. The 1988 film Stand and Deliver provides an alternative perspective, tabular integration, on integration by
parts. This technique involves writing down a table with two columns, one labeled D for differentiation and
another labeled I for integration. The first row of the D column contains u, the part to be differentiated in the
original integral. The second row in the D column contains $\frac{du}{dx}$. The third row in the D column contains
$\frac{d^2u}{dx^2}$. Proceed in this manner until the product of the functions in the last row either equals 0 or is a constant
multiple of what you started with. The first row of the I column contains $\frac{dv}{dx}$, the part to be integrated. For the
second, third, etc. rows in the I column, place the successive integrals. For example, if we rework Example 1,
namely $\int x e^x\,dx$, with $u = x$ and $dv = e^x\,dx$:
D     I
x     e^x
1     e^x
0     e^x
Now, draw diagonal lines from the first element of the D column to the second element of the I column, from
the second element of D to the third element of the I column, etc. Multiply the elements at the ends of each
of the diagonal lines, take an alternating sum of these products, and add the integral of the product of terms
in the last row. For Example 1, and the table above, we have:
$$x e^x - 1 \cdot e^x + \int 0\,dx = e^x(x - 1) + \int 0\,dx = e^x(x - 1) + C$$
which is the same result as shown in Example 1. Use this method to find the integral in Example 2.
32. Use the table method from Stand and Deliver (Problem 31) on the integral in Example 3.
33. Use the table method from Stand and Deliver (Problem 31) on the integral in Example 4.
34. Contrast the methods of integration by parts as illustrated by the examples in the text and the table method
from Stand and Deliver (Problem 31).
35. Historical Quest∗ Oliver Heaviside was born in the same London slums as Charles Dickens. Scarlet fever
left him partly deaf. He compensated with shyness and sarcasm. Heaviside finished his only schooling in 1865.
He was 16 and a top student, but he’d failed geometry. He loathed all that business of deducing one fact from
another. He meant to invent knowledge – not to compute it.
Heaviside went to work as a telegrapher. That drew him into the study of electricity. Then he read Maxwell’s
new Treatise on Electricity and Magnetism, and it seemed to have mystical beauty. It changed his life. He quit
work and sealed himself in a room in his family’s house. There he reduced Maxwell’s whole field theory into
two equations. He gave electric theory its modern shape and form. Hertz got the credit for that. But in the
fine print Hertz admits his ideas came from Heaviside.
Next Heaviside picked up the radical new idea of vector analysis. His most important ally was the reclusive
American genius J. Willard Gibbs. Vector analysis won out, but only after Heaviside—this shy man with
his acid pen—had started a war. He brought that war to full pitch a few years later with something called
operational calculus.
He invented this strange new math by leaping over logic. It was a powerful tool, but it wasn’t rigorous. Only
people like Kelvin, Rayleigh, and Hertz saw the brilliance that was driving Heaviside faster than method could
follow. He knew what he was doing. He growled at his detractors, “Shall I refuse my dinner because I do not
fully understand ... digestion?”
Like vector analysis, Heaviside’s calculus stood the test of time. So did the rest of his work. He gave us the
theory for long distance telephones. His math has served and shaped engineering. Yet his biographer, Paul
Nahin, writes a sad ending.
Heaviside grew sick of fighting and faded off to Torquay in Southwest England. There he lived out his last 25
years in a bitter retreat. He signed the initials W.O.R.M. after his name. That didn’t stand for anything more
than worm. For that was all he could see when he looked into other people’s eyes. A monument to Oliver
Heaviside is shown in Figure 5.36.
Figure 5.36: Dedication of Heaviside monument
You do not see much of Heaviside’s name today. But his magnificent works have been woven into the fabric
of our textbooks. He deserved a better end. Yet his huge accomplishments force a happy ending on a sad life.
They also warn us to be alert – to be ready to see raw genius like that when it walks among us.
For this Quest, let us consider a “cover-up” method for determining the coefficients with partial fractions.
Consider the antiderivative from Example 8. Find $A_1$, $A_2$, $A_3$ such that
$$\frac{x+2}{x(x+1)(x-1)} = \frac{A_1}{x} + \frac{A_2}{x+1} + \frac{A_3}{x-1}$$
The coefficients are found, one at a time, by “covering” that factor, and evaluating the remaining expression
at the value that causes the “covered” factor to be zero. That is, first “cover” x (the covered factor is shown boxed):
$$\frac{x+2}{\boxed{x}\,(x+1)(x-1)}$$
The “covered” factor is 0 when x = 0, so evaluate the non-covered portion at x = 0:
$$\frac{0+2}{(0+1)(0-1)} = -2$$
Thus, $A_1 = -2$. Next, “cover” the factor under the $A_2$ term:
$$\frac{x+2}{x\,\boxed{x+1}\,(x-1)}$$
Evaluate for x = −1:
$$\frac{-1+2}{-1(-1-1)} = \frac{1}{2}$$
Finally, “cover” the factor under the $A_3$ term:
$$\frac{x+2}{x(x+1)\,\boxed{x-1}}$$
Evaluate for x = 1:
$$\frac{1+2}{1(1+1)} = \frac{3}{2}$$
Thus,
$$\frac{x+2}{x(x+1)(x-1)} = \frac{-2}{x} + \frac{1/2}{x+1} + \frac{3/2}{x-1}$$
Explain why this “cover-up” method of Heaviside works.
∗ From http://www.uh.edu/engines/epi426.htm

36. Assume that after t hours on the job, a factory worker can produce $100te^{-0.5t}$ units per hour. How many units
does the worker produce during the first 3 hours?
37. After t weeks, suppose that contributions in response to a local fund-raising campaign were coming in at the
rate of $2{,}000te^{-0.2t}$ dollars per week. How much money was raised during the first 5 weeks?
38. An actuary measures the probability that a person in a certain population will die at age x by the formula
$$P(x) = \lambda^2 x e^{-\lambda x}$$
where λ is a parameter such that 0 < λ < e.

(a) For a given λ, find the maximum value of P (x).
(b) Sketch the graph of P (x).
(c) Find the area under the probability curve y = P (x) for 0 ≤ x ≤ 100, and interpret your result.
39. A population P grows at the rate
$$P'(t) = 5(t+1)\ln\sqrt{t+1}$$
thousand individuals per year at time t (in years). By how much does the population change during the 8th
year?
40. Suppose that a drug is assimilated into a patient’s bloodstream at a rate modeled by
$$A(t) = 2te^{-0.31t}$$
where t is the number of minutes since the drug was taken. Find the total amount of drug assimilated into the
patient’s bloodstream during the second minute.
41. Recovering from an environmental perturbation, a (hypothetical) population exhibits dampened oscillations of
the form
$$N(t) = 100 + 50\sin(2\pi t)\,e^{-0.01t} \quad\text{individuals per acre}$$
where t is measured in days. As a part of a sampling effort, a scientist captures and releases individuals from
this population at a rate of 0.1N (t) individuals per acre per day. If the scientist is sampling one acre, determine
the number of individuals she captures and releases in 7 days.
42. Recovering from an environmental perturbation, a (hypothetical) population exhibits dampened oscillations of
the form
$$N(t) = 50 + 50\cos(\pi t)\,e^{-0.2t} \quad\text{individuals per acre}$$
where t is measured in days. As a part of a sampling effort, a scientist captures and releases individuals
from this population at a rate of 0.01N (t) individuals per acre per day. If the scientist is sampling one acre,
determine the number of individuals she captures and releases in 10 days.

5.7 Numerical Integration

We have seen that integration is, in general, a more difficult task than differentiation. In differentiation, knowing
the derivatives of several elementary functions (e.g. $\sin x$, $e^x$, $x^n$) and a set of basic rules (e.g. product rule,
chain rule, quotient rule) allows us to differentiate rather complex looking functions. In contrast, integration is
more complicated. The number of rules, the special cases, and the uncertainty about which rule to apply make integration
more of an art than a science. Nonetheless, an optimist would hope that armed with enough rules, and a great
deal of practice, we could express the integral of any reasonable continuous function in terms of familiar functions.
Unfortunately, this is not true.
You may have encountered functions whose integrals cannot be expressed in this way, especially if you rely heavily on technology to do your work
in calculus. For instance, if we ask a TI-89 calculator to compute $\int e^{-x^2}\,dx$ it returns the same integral you input.
More competent software programs, such as Mathematica or Maple, return an answer $\frac{1}{2}\sqrt{\pi}\,\mathrm{erf}(x)$. What is erf(x)?
Looking it up on Maple’s HELP yields the answer:
The error function is defined for all x by $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt$.
In other words, Maple’s HELP tells us that the erf function is essentially the integral we started with. Why is
technology of no help for this function?
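Although erf has no elementary formula, it is straightforward to evaluate numerically. As a small illustration, the sketch below compares the erf function from Python's standard math module with a Riemann sum for the defining integral, using the midpoints of the subintervals as sample points. The function name erf_by_riemann_sum and the choice n = 2000 are assumptions made for this example.

```python
import math

def erf_by_riemann_sum(x, n=2000):
    # Approximate (2 / sqrt(pi)) * integral of e^{-t^2} over [0, x]
    # with a Riemann sum that samples each subinterval at its midpoint
    dt = x / n
    total = sum(math.exp(-((k + 0.5) * dt) ** 2) for k in range(n))
    return 2 / math.sqrt(math.pi) * total * dt

for x in [0.5, 1.0, 2.0]:
    # Library value next to the Riemann-sum value
    print(x, math.erf(x), erf_by_riemann_sum(x))

# The antiderivative reported by computer algebra systems, evaluated at x = 1;
# it equals the integral of e^{-t^2} from 0 to 1
print(0.5 * math.sqrt(math.pi) * math.erf(1.0))
```

Library routines use more refined approximations than a Riemann sum, but the comparison shows that "erf" is simply a name for a function defined by an integral.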
To answer this question, recall an “elementary function” is a function that can be expressed in terms of sines,
exponentials, power functions, and logarithms, via the usual algebraic processes, including the solving (with or
without radicals) of polynomials. Thus, elementary functions are all the “precalculus functions,” including polynomials,
trigonometric, and logarithmic functions. There is a theorem in mathematics that says certain elementary functions
do not have an elementary antiderivative. Examples include the integrals
$$\int e^{x^2}\,dx \qquad \int \frac{\sin x}{x}\,dx \qquad \int \sin x^2\,dx \qquad \int \sqrt{1 + x^3}\,dx \qquad \int \frac{dx}{\ln x} \qquad \int x^x\,dx$$
The more common ones get their own names. For instance, we saw that up to some scaling factors, “erf” is the
antiderivative of $e^{-x^2}$. We can also find out using technology that “Si” is the antiderivative of $\sin x / x$. Unfortunately,
these functions are not exceptional.
What can we do when we need to integrate functions that are integrable, but do not have elementary
antiderivatives? If they are definite integrals, we could approximate them using Riemann sums, or we could use one of
several other approximation schemes. In this section, we discuss four numerical schemes for approximating definite
integrals. Three of these schemes, left endpoint rule, right endpoint rule, and midpoint rule, differ only in the manner
that the sample points xi are chosen. The fourth scheme, Simpson’s rule, involves approximating the function with
piecewise quadratic functions. These schemes differ in how rapidly they converge (as n → ∞) to the true value of
the definite integral. These rates of convergence are described via error estimates.
Left Endpoint and Right Endpoint Approximations

We begin with the simplest of the approximation schemes, left endpoint approximation and right endpoint
approximation. For presentation purposes, our discussion will focus on left endpoint approximation. Analogous
statements apply to the right endpoint approximation.
Suppose f is a continuous function from x = a to x = b and we want to estimate $\int_a^b f(x)\,dx$. By definition,
$\int_a^b f(x)\,dx$ is a limit of Riemann sums. Consequently, given n, partition the interval [a, b] into n subintervals with
endpoints:
$$a = a_0 < a_1 < a_2 < \cdots < a_n = b$$
where $a_1 = a + \Delta x$, $a_2 = a + 2\Delta x$, ..., $a_n = a + n\Delta x$, and $\Delta x = \frac{b-a}{n}$. Taking the left endpoints, $x_1 = a_0$, $x_2 = a_1$, ..., $x_n = a_{n-1}$, as our sample points, we have
$$\int_a^b f(x)\,dx \approx \sum_{k=1}^{n} f(x_k)\,\Delta x = L_n$$

As a first example, we begin with a simple function, so that we can examine the error generated by taking an
approximation.
Example 1. Using technology for a left endpoint approximation
We know
$$\int_1^4 \frac{3\sqrt{x}}{2}\,dx = 7$$
Use the left endpoint rule with n = 5, 10, 25, 50 and 100 to approximate this integral.
Solution. We have $f(x) = 3\sqrt{x}/2$, a = 1, and b = 4. We will show the detail for n = 5, and then use technology
to generate other values. For n = 5, we have $\Delta x = \frac{4-1}{5} = 0.6$ and
k     x_k     f(x_k)     f(x_k) ∆x
1     1       1.5        0.9
2     1.6     1.897      1.1382
3     2.2     2.225      1.3350
4     2.8     2.510      1.5060
5     3.4     2.766      1.6596
                  Sum:   6.5388
Thus,
$$\int_1^4 \frac{3\sqrt{x}}{2}\,dx \approx \sum_{k=1}^{5} f(x_k)\,\Delta x \approx 6.539$$
Using technology to generate the approximation $L_n$ for the other values of n yields:
n       L_n
5       6.539
10      6.772
25      6.910
50      6.955
100     6.977
Hence, the approximation given by technology appears to be converging to the known value of 7.
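The "technology" step in Example 1 can be as simple as a few lines of code. The following Python sketch implements the left endpoint rule and recomputes the approximations for the same values of n, together with their errors relative to the known value 7. The function name left_endpoint is an illustrative choice.

```python
import math

def left_endpoint(f, a, b, n):
    # L_n: sample f at the left endpoint of each of n equal subintervals
    dx = (b - a) / n
    return sum(f(a + k * dx) for k in range(n)) * dx

def f(x):
    # Integrand from Example 1
    return 3 * math.sqrt(x) / 2

for n in [5, 10, 25, 50, 100]:
    approximation = left_endpoint(f, 1, 4, n)
    # Columns: n, L_n, and the error 7 - L_n
    print(n, round(approximation, 3), round(7 - approximation, 3))
```

Because f is increasing on [1, 4], every left endpoint sample underestimates f on its subinterval, which is why each $L_n$ falls below 7.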
We might have added another column to our answer for Example 1. What is the error of the approximation?
That is,
$$\text{error} = \left|\int_a^b f(x)\,dx - \text{numerical approximation}\right|$$
a
We calculate the error for each of the entries of Example 1, and find the approximations are underestimates by 0.461,
0.228, 0.090, 0.045, and 0.023 for the respective entries in the table. Notice that the errors tend to decrease as n
increases, as we would expect. However, there are two questions we can ask: First, how quickly do the errors decrease
with n? Second, if we don’t know the true value of the definite integral, how can we estimate the error?
Answering both questions requires introducing error bounds, that is, upper bounds for the magnitude of the error.
These upper bounds often involve understanding the derivatives of the integrand. For example, suppose we know for
some constant $K_1 > 0$ that $|f'(x)| \le K_1$ for all x between a and b. The evaluation theorem implies that
$$f(x) - f(a) = \int_a^x f'(u)\,du$$
for any point x in $[a, a + \Delta x]$. Since $f'(u) \le K_1$, the dominance property of integrals implies that
$$f(x) - f(a) \le K_1 (x - a) \quad\text{for } x \text{ between } a \text{ and } a + \Delta x$$
Equivalently,
Inequality I: $f(x) \le f(a) + K_1 (x - a)$ for x in $[a, a + \Delta x]$
Similarly, since $f'(u) \ge -K_1$, the dominance property of integrals implies that
Inequality II: $f(x) \ge f(a) - K_1 (x - a)$ for x in $[a, a + \Delta x]$
A graphical interpretation of these two inequalities is shown in Figure 5.37. The graph of f(x) above the interval
$[a, a + \Delta x]$ lies in a triangular wedge with area $K_1 (\Delta x)^2$. Hence, the error in approximating $\int_a^{a+\Delta x} f(x)\,dx$ with
$f(x_1)\Delta x$ is less than or equal to $K_1 (\Delta x)^2$.
Figure 5.37: Estimating errors for the left endpoint rule
Similarly, over any of the subintervals, the error in approximating the actual value with $f(x_k)\Delta x$ is at most
$K_1 (\Delta x)^2$. Let $E_L$ be the error from using a left endpoint approximation. Then, summing the error estimates over the
n subintervals yields
$$E_L \le n \cdot K_1 (\Delta x)^2 = \frac{K_1 (b-a)^2}{n}$$
where
$$E_L = \left|\int_a^b f(x)\,dx - L_n\right|$$
is the error of the left endpoint approximation.
Example 2. Using the Left Endpoint Rule
Consider
$$\int_0^{\pi} \sin(x^2)\,dx$$
a. Use technology to evaluate this integral.
b. Use the left endpoint rule with n = 10 to approximate this integral.
c. Give an error estimate for the approximation found for n = 10.
d. Find n sufficiently large to ensure that approximation with the left endpoint rule will have an error no
larger than 0.001.
Solution.
a. Using technology, we might get an estimate of 0.77265.
b. We see $f(x) = \sin(x^2)$, a = 0, b = π, n = 10, and $\Delta x = \frac{\pi - 0}{10} = \frac{\pi}{10}$. Setting up a table of values (we leave
the details for you), we obtain an estimate of 0.78997.
c. To find an upper bound to the error, we need an upper bound to the derivative of $f(x) = \sin(x^2)$ on the
interval [0, π]. Since
$$f'(x) = 2x\cos(x^2)$$
and $|\cos(x^2)| \le 1$,
$$|f'(x)| \le 2|x| \le 2\pi$$
for x on [0, π]. Substituting $K_1 = 2\pi$, n = 10, a = 0, and b = π into the error bound yields
$$E_L \le \frac{2\pi(\pi - 0)^2}{10} \approx 6.20$$
Hence, our estimate of 0.78997 does not have very good assured accuracy.
d. We want n such that $E_L \le 0.001$. The left endpoint error formula is
$$E_L \le \frac{K_1 (b-a)^2}{n}$$
Thus, we find n such that
$$\begin{aligned}
\frac{K_1 (b-a)^2}{n} &\le 0.001 \\
K_1 (b-a)^2 &\le 0.001n \\
2\pi(\pi - 0)^2 &\le 0.001n && \text{Substituting known values.} \\
\frac{2\pi^3}{0.001} &\le n \\
62{,}012.6 &\le n
\end{aligned}$$
Since n is an integer, this says we need to choose n ≥ 62,013. Implementing n = 62,013 with technology,
we obtain
$$I = \int_0^{\pi} \sin x^2\,dx \approx 0.77266$$
Thus,
$$0.77266 - 0.001 \le I \le 0.77266 + 0.001$$
$$0.77166 \le I \le 0.77366$$
Our original calculator answer is within this range of accuracy.
The preceding example illustrates several important points. First, even though we initially chose a reasonably
large n (say 10), the error bound was so large that we could not be certain about any of the digits. Second, an
extremely large n is needed to ensure an estimate with accuracy to 0.001. Third, the approximation for the large n
value was not very different than the estimate for the smaller n value of 10.
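The calculations in Example 2 are easy to reproduce. The sketch below computes the left endpoint estimate for n = 10, the guaranteed error bound $K_1(b-a)^2/n$, and the smallest n for which that bound drops to a given tolerance. The function names are hypothetical, and the bound $K_1 = 2\pi$ is the one derived in part c.

```python
import math

def left_endpoint(f, a, b, n):
    # L_n: sample f at the left endpoint of each of n equal subintervals
    dx = (b - a) / n
    return sum(f(a + k * dx) for k in range(n)) * dx

def left_error_bound(K1, a, b, n):
    # Error bound for the left endpoint rule, given |f'(x)| <= K1 on [a, b]
    return K1 * (b - a) ** 2 / n

def n_for_left_endpoint(K1, a, b, tolerance):
    # Smallest integer n whose error bound is at most the tolerance
    return math.ceil(K1 * (b - a) ** 2 / tolerance)

def f(x):
    return math.sin(x ** 2)

K1 = 2 * math.pi
print("L_10        =", left_endpoint(f, 0, math.pi, 10))
print("bound, n=10 =", left_error_bound(K1, 0, math.pi, 10))
print("n needed    =", n_for_left_endpoint(K1, 0, math.pi, 0.001))
```

The bound is a worst-case guarantee, so the actual error of $L_{10}$ (about 0.017) is far smaller than the bound suggests.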

Midpoint Rule
An alternative numerical approximation scheme is the Midpoint Rule, which chooses the sample points to be the
midpoints of each of the subintervals. We begin as we did with the left endpoints and partition the interval [a, b]
into n subintervals with endpoints:
$$a = a_0 < a_1 < a_2 < \cdots < a_n = b$$
where $a_1 = a + \Delta x$, $a_2 = a + 2\Delta x$, ..., $a_n = a + n\Delta x$, and $\Delta x = \frac{b-a}{n}$. This time, we take the midpoints,
$$x_1 = \frac{a_0 + a_1}{2}, \quad x_2 = \frac{a_1 + a_2}{2}, \quad \ldots, \quad x_n = \frac{a_{n-1} + a_n}{2}$$
as our sample points to get
$$\int_a^b f(x)\,dx \approx \sum_{k=1}^{n} f(x_k)\,\Delta x = M_n$$
Associated with the midpoint rule is an error estimate. If $|f''(x)| \le K_2$ for x on [a, b], then the midpoint error
bound is
$$E_M \le \frac{K_2 (b-a)^3}{24n^2}$$
where
$$E_M = \left|\int_a^b f(x)\,dx - M_n\right|$$
Example 3. Using the Midpoint Rule
Consider
$$\int_0^{\pi} \sin(x^2)\,dx$$
a. Use technology to evaluate this integral.

b. Use the midpoint rule with n = 10 to approximate this integral.
c. Give an error estimate for the approximation found for n = 10.
d. Find n sufficiently large to ensure that approximation with the midpoint rule will have an error no larger
than 0.001.
Solution.
a. Using technology (e.g. an online integrator or a calculator that numerically approximates integrals)
we might get $\int_0^{\pi} \sin(x^2)\,dx \approx 0.77265$.
b. We see $f(x) = \sin(x^2)$, a = 0, b = π, n = 10, and $\Delta x = \frac{\pi - 0}{10} = \frac{\pi}{10}$. Setting up a table of values (we leave
the details for you), we obtain an estimate of 0.79918141.
c. To find an upper bound to the error, we need an upper bound to the second derivative of $f(x) = \sin x^2$ on the
interval [0, π]. Since
$$f'(x) = 2x\cos x^2, \qquad f''(x) = 2\cos x^2 - 4x^2 \sin x^2$$
and $|\sin x^2| \le 1$ as well as $|\cos x^2| \le 1$,
$$|f''(x)| \le 2 + 4|x|^2 \le 2 + 4\pi^2$$
for x on [0, π]. Substituting $K_2 = 2 + 4\pi^2$, n = 10, a = 0, and b = π into the error bound yields
$$E_M \le \frac{(2 + 4\pi^2)(\pi - 0)^3}{24 \cdot 10^2} \approx 0.536$$
Hence, our estimate is accurate to within 0.54. Compare this to our error estimate of 6.2 with the left
endpoint rule.
d. We want n such that $E_M \le 0.001$. The midpoint error formula is
$$E_M \le \frac{K_2 (b-a)^3}{24n^2}$$
Thus, we find n such that
$$\begin{aligned}
\frac{K_2 (b-a)^3}{24n^2} &\le 0.001 \\
K_2 (b-a)^3 &\le 0.001(24n^2) \\
(2 + 4\pi^2)(\pi - 0)^3 &\le 0.001(24n^2) && \text{Substituting known values.} \\
\frac{2\pi^3 + 4\pi^5}{(0.001)(24)} &\le n^2 \\
231.5 &\le n && \text{Since } n > 0
\end{aligned}$$
Since n is an integer, this says we need to choose n ≥ 232. Compare this to the n = 62,013 that we
needed for the left endpoint rule! Using the midpoint rule with n = 232, we obtain
$$I = \int_0^{\pi} \sin(x^2)\,dx \approx 0.7727$$
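As with the left endpoint rule, the midpoint calculations of Example 3 can be reproduced in a few lines. The sketch below computes $M_{10}$, the error bound $K_2(b-a)^3/(24n^2)$ for n = 10, and the smallest n for which the bound is at most 0.001. The function names are illustrative, and $K_2 = 2 + 4\pi^2$ is the bound derived in part c.

```python
import math

def midpoint(f, a, b, n):
    # M_n: sample f at the midpoint of each of n equal subintervals
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

def midpoint_error_bound(K2, a, b, n):
    # Error bound for the midpoint rule, given |f''(x)| <= K2 on [a, b]
    return K2 * (b - a) ** 3 / (24 * n ** 2)

def n_for_midpoint(K2, a, b, tolerance):
    # Smallest integer n whose error bound is at most the tolerance
    return math.ceil(math.sqrt(K2 * (b - a) ** 3 / (24 * tolerance)))

def f(x):
    return math.sin(x ** 2)

K2 = 2 + 4 * math.pi ** 2
print("M_10        =", midpoint(f, 0, math.pi, 10))
print("bound, n=10 =", midpoint_error_bound(K2, 0, math.pi, 10))
print("n needed    =", n_for_midpoint(K2, 0, math.pi, 0.001))
print("M_232       =", midpoint(f, 0, math.pi, 232))
```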
If the midpoint rule only differs from the left and right endpoint rules by shifting the sample points by ∆x/2,
why does it do so much better? We can get a sense of the answer by observing that both the midpoint rule and left
endpoint rule integrate constant functions perfectly. Indeed, $L_n$ and $M_n$ for $\int_a^b c\,dx$ equal $c(b - a)$. Now consider
a linear function $f(x) = cx + d$ on an interval [a, b]. In this case, you can verify (try this for yourselves!) that
$E_M = 0$ while $E_L = |c|(b - a)^2/(2n)$. Hence, the midpoint rule integrates linear functions perfectly and, consequently, it
only introduces error for functions with nonzero second order derivatives. This explains why the error bound for the
midpoint rule involves second order terms (i.e. $|f''(x)|$ and $n^2$). In contrast, since the left endpoint rule introduces
errors for functions whose derivative is nonzero, the error bound for the left endpoint rule involves first order terms
(i.e. $|f'(x)|$ and n).
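The claim about linear functions can be tested directly. For $f(x) = cx + d$, summing the error over the n subintervals gives $E_L = |c|(b-a)^2/(2n)$ for the left endpoint rule, while the midpoint rule is exact. The sketch below checks both statements for one arbitrarily chosen line and interval; the particular values of c, d, a, b, and n are assumptions made for illustration.

```python
def left_endpoint(f, a, b, n):
    # L_n: sample f at the left endpoint of each subinterval
    dx = (b - a) / n
    return sum(f(a + k * dx) for k in range(n)) * dx

def midpoint(f, a, b, n):
    # M_n: sample f at the midpoint of each subinterval
    dx = (b - a) / n
    return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

c, d = 3.0, 1.0          # slope and intercept of the test line
a, b, n = 0.0, 2.0, 8    # interval and number of subintervals

def line(x):
    return c * x + d

# Exact integral of c x + d over [a, b]
exact = c * (b ** 2 - a ** 2) / 2 + d * (b - a)

print("E_M       =", abs(exact - midpoint(line, a, b, n)))
print("E_L       =", abs(exact - left_endpoint(line, a, b, n)))
print("predicted =", abs(c) * (b - a) ** 2 / (2 * n))
```

Doubling n halves $E_L$ for a line, which is the 1/n behavior that appears in the left endpoint error bound.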
Simpson’s Rule
As the previous example illustrated, the midpoint rule was a significant improvement over the left endpoint rule
since the error bound decreased like $1/n^2$ instead of $1/n$. The small price we had to pay for this improvement was
that bounds are calculated in terms of the second rather than first derivative. With this sweet taste of success in our
mouth, can we do better? It turns out, yes! There is another rule, Simpson’s rule, for approximating integrals using
parabolas to approximate the curve. Unlike the previous methods, this method requires breaking the interval into an
even number n of subintervals. Let $a = x_0 < x_1 = a + \Delta x < \ldots < x_n = b$ be the endpoints of these subintervals of
width $\Delta x = (b - a)/n$. The approximation is given by
$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \cdots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)\right] = S_n$$
Given $|f^{(4)}(x)| \le K_4$ for x on [a, b], the Simpson error bound is
$$E_S \le \frac{K_4 (b-a)^5}{180n^4}$$
where
$$E_S = \left|S_n - \int_a^b f(x)\,dx\right|$$
Finally, note that throughout this section we used the notation $K_n$ to represent the bound of the nth derivative
of f(x) for the cases n = 1, 2 and 4: that is
$$\left|f^{(n)}(x)\right| \le K_n$$
This convention stresses the fact that each derivative has its own bound.
Example 4. How good is Simpson’s Rule
Compare Simpson’s Rule with the Left-endpoint and Midpoint Rules in calculating the value of
$$I = \int_0^{\pi} \sin x^2\,dx$$
in terms of
a. the case where the interval is partitioned into n = 10 subintervals
b. the smallest value of n that ensures the error of the approximation is no larger than 0.001.
From these results what do you conclude?
Solution.
a. The approximation given by Simpson’s rule for the case n = 10 can be calculated as outlined in the
following table:
i      x_i       Simpson’s weighting × sin x_i^2
0      0         1 × 0
1      π/10      4 × 0.09854
2      π/5       2 × 0.38461
3      3π/10     4 × 0.77598
4      2π/5      2 × 0.99997
5      π/2       4 × 0.62427
6      3π/5      2 × −0.39995
7      7π/10     4 × −0.99236
8      4π/5      2 × 0.03336
9      9π/10     4 × 0.99016
10     π         1 × −0.43030
Weighted total × ∆x/3 = Weighted total × π/30 = 0.79503
From the two previous examples and the above table we see that the correct answer to 5 decimal places
is I = 0.77265 and for n = 10:
• The left-endpoint rule approximation is I = 0.78997
• The midpoint rule approximation is I = 0.79918
• The Simpson’s rule approximation is I = 0.79503
b. To find n such that $E_S \le 0.001$, we need to solve for n in the inequality
$$\frac{K_4 (b-a)^5}{180n^4} \le 0.001$$
where $K_4$ is a bound on the fourth derivative of $\sin x^2$ on the interval [0, π]. Repeated differentiation
yields
$$\frac{d^4}{dx^4}\sin x^2 = \left(16x^4 - 12\right)\sin x^2 - 48x^2 \cos x^2,$$
from which we conclude that we can take $K_4 = 16\pi^4 - 12 + 48\pi^2$ for x ranging over the interval [0, π]. Since in this
case (b − a) = π, it follows from the above inequality that the lower bound for n to ensure an error of no more
than 0.001 is:
$$n^4 \ge \frac{\left[(16\pi^4 - 12) + 48\pi^2\right]\pi^5}{180 \cdot 0.001}$$
Solving this we obtain n ≥ 43.05. Thus selecting a value n ≥ 44 (remember for Simpson’s rule n must be
even) ensures that the accuracy of the estimate of I is better than 0.001.
Again, from the two previous examples and the above calculation, we see that to ensure an accuracy of
at least 0.001 requires:
• n ≥ 62,013 for the left-endpoint rule
• n ≥ 232 for the midpoint rule
• n ≥ 44 for Simpson’s rule.
These results suggest that for small values of n, when all methods are relatively inaccurate, the over- and
underestimates on the various subintervals can fortuitously cancel out, so that the most accurate method is not always the one
that gives the best result for a particular small value of n. As n increases, however, the more accurate the method,
the more rapidly it converges on the true solution, as exemplified by the considerable decreases in the values of n
needed to ensure a 0.001 level of accuracy for the three rules, going from least to most accurate.
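The Simpson's rule table and the bound on n in Example 4 can be reproduced with the short sketch below. It implements the composite Simpson formula with the 1, 4, 2, ..., 4, 1 weighting and then solves the error bound for the smallest even n. The function names are illustrative, and $K_4 = 16\pi^4 - 12 + 48\pi^2$ is the bound used in part b.

```python
import math

def simpson(f, a, b, n):
    # S_n with weights 1, 4, 2, 4, ..., 2, 4, 1; n must be even
    if n % 2 != 0:
        raise ValueError("Simpson's rule requires an even number of subintervals")
    dx = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        weight = 4 if k % 2 == 1 else 2
        total += weight * f(a + k * dx)
    return total * dx / 3

def n_for_simpson(K4, a, b, tolerance):
    # Smallest even n whose error bound K4 (b-a)^5 / (180 n^4) is at most the tolerance
    n = math.ceil((K4 * (b - a) ** 5 / (180 * tolerance)) ** 0.25)
    return n if n % 2 == 0 else n + 1

def f(x):
    return math.sin(x ** 2)

K4 = 16 * math.pi ** 4 - 12 + 48 * math.pi ** 2
print("S_10     =", simpson(f, 0, math.pi, 10))
print("n needed =", n_for_simpson(K4, 0, math.pi, 0.001))
print("S_44     =", simpson(f, 0, math.pi, 44))
```

Rounding the bound up to the next even integer reflects the requirement that Simpson's rule pair up the subintervals.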
Example 5. Estimating crab harvests
Dungeness crab (Cancer magister) is an important commercial fishery along the northeastern Pacific coast
(California to Alaska). Figure 5.38 shows the commercial harvest of Dungeness crabs, excluding sport
fishery and non-treaty landings, from 1950 to 1999 off the coast of Washington State. ∗ A subset of this data is
reported in the table below, where the catch is reported in millions of pounds:
Year    Catch        Year    Catch
1950     3.3         1975     8.5
1955     8.5         1980     2.7
1960     5.9         1985     3.9
1965    10.2         1990     6.8
1970    12.6
Use Simpson’s rule to estimate the total amount of Dungeness crabs caught between 1950 and 1990.
∗ Data from http://wfcb.ucdavis.edu/www/PopData/Crab/crab.htm. See also Johnson, D. F., L. W. Botsford, R. D. Methot, Jr., and
T. C. Wainwright. 1986. Wind stress and cycles in dungeness crab (Cancer magister) catch off California, Oregon, and Washington.
Canadian Journal of Fisheries and Aquatic Sciences 43(4):838-845

Figure 5.38: Dungeness crab harvest in millions of pounds (catch versus year).
Solution. Applying Simpson’s rule with n = 8 and ∆x = 5 years yields
\[ \frac{5}{3}\,(3.3 + 4 \cdot 8.5 + 2 \cdot 5.9 + 4 \cdot 10.2 + 2 \cdot 12.6 + 4 \cdot 8.5 + 2 \cdot 2.7 + 4 \cdot 3.9 + 6.8) \approx 294.8 \]
million pounds.
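The Simpson's rule sum for the crab data can be checked with a few lines of Python. This is only a sketch: the helper name `simpson` and the variable names are chosen here for illustration, and the catch values are the nine table entries from 1950 to 1990.

```python
def simpson(values, dx):
    """Composite Simpson's rule for equally spaced samples.

    values holds f(a_0), ..., f(a_n); n (the number of intervals) must be even.
    """
    n = len(values) - 1
    if n < 2 or n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of intervals")
    total = values[0] + values[-1]
    total += 4 * sum(values[1:-1:2])  # odd-indexed interior samples
    total += 2 * sum(values[2:-1:2])  # even-indexed interior samples
    return total * dx / 3

# Catch in millions of pounds for 1950, 1955, ..., 1990 (from the table)
catch = [3.3, 8.5, 5.9, 10.2, 12.6, 8.5, 2.7, 3.9, 6.8]
total_catch = simpson(catch, dx=5)
print(round(total_catch, 1))
```

The same helper works for any equally spaced data set with an odd number of samples.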
As the error estimate for Simpson’s rule suggests and as we saw in Example 4, Simpson’s rule has much better
convergence properties (i.e. the rate at which the error decreases with increasing n) than the midpoint rule. Why?
Simpson’s rule integrates cubic functions (i.e. third-order polynomials) exactly. Hence, only nonzero
fourth-order derivatives result in errors, and the error bound involves fourth-order terms (i.e. \(|f^{(4)}(x)|\) and \(n^4\)).
We summarize our four methods of numerical integration with the following box.

Numerical Integration

Let f be continuous on [a, b]. Divide this interval into n equal parts:
\[ a = a_0 < a_1 < a_2 < \cdots < a_n = b \]
Define \(\Delta x = \frac{b-a}{n}\). The following provide estimates of \(\int_a^b f(x)\,dx\) with error bounds
given in terms of the bounding parameters \(|f^{(i)}(x)| \le K_i\).
Left Endpoint Rule
\[ L_n = [f(a_0) + f(a_1) + \cdots + f(a_{n-1})]\,\Delta x \]
and
\[ E_L \le \frac{K_1 (b-a)^2}{n} \]
Right Endpoint Rule
\[ R_n = [f(a_1) + f(a_2) + \cdots + f(a_n)]\,\Delta x \]
and
\[ E_R \le \frac{K_1 (b-a)^2}{n} \]
Midpoint Rule
\[ M_n = \left[ f\!\left(\frac{a_0 + a_1}{2}\right) + f\!\left(\frac{a_1 + a_2}{2}\right) + \cdots + f\!\left(\frac{a_{n-1} + a_n}{2}\right) \right] \Delta x \]
and
\[ E_M \le \frac{K_2 (b-a)^3}{24 n^2} \]
Simpson’s Rule (n is even)
\[ S_n = \frac{\Delta x}{3}\,[f(a_0) + 4f(a_1) + 2f(a_2) + 4f(a_3) + \cdots + 2f(a_{n-2}) + 4f(a_{n-1}) + f(a_n)] \]
and
\[ E_S \le \frac{K_4 (b-a)^5}{180 n^4} \]
A final simple example reinforces how efficient Simpson’s method is compared with the Left Endpoint and
Midpoint Rules in converging to a solution.
Example 6. Comparing the efficiency of convergence to a solution
Consider
\[ \int_1^3 \frac{dx}{x} \]
How large must n be to ensure that
a. \(E_L \le 0.0001\)?
b. \(E_M \le 0.0001\)?
c. \(E_S \le 0.0001\)?
Solution. We have \(f(x) = x^{-1}\), \(f'(x) = -x^{-2}\), \(f''(x) = 2x^{-3}\), \(f'''(x) = -6x^{-4}\), and \(f^{(4)}(x) = 24x^{-5}\). We also note
that a = 1 and b = 3.
a. The maximum value of \(|f'(x)|\) on [1, 3] is \(K_1 = 1\). Since \(|E_L| \le \frac{K_1 (b-a)^2}{n}\), we need
\[ \frac{1 \cdot (3 - 1)^2}{n} \le 0.0001 \]
\[ \frac{4}{0.0001} \le n \]
\[ 40{,}000 \le n \]
Hence n = 40,000 will suffice.

b. The maximum value of \(|f''(x)|\) on [1, 3] is \(K_2 = 2\). Since \(|E_M| \le \frac{K_2 (b-a)^3}{24 n^2}\), we need
\[ \frac{2 \cdot (3 - 1)^3}{24 n^2} \le 0.0001 \]
\[ \frac{16}{24(0.0001)} \le n^2 \]
\[ 6{,}666\tfrac{2}{3} \le n^2 \]
\[ 81.6 \le n \]
Hence n = 82 will suffice.

c. The maximum value of \(|f^{(4)}(x)|\) on [1, 3] is \(K_4 = 24\). Since \(|E_S| \le \frac{K_4 (b-a)^5}{180 n^4}\), we need
\[ \frac{24 \cdot (3 - 1)^5}{180 n^4} \le 0.0001 \]
\[ \frac{768}{180(0.0001)} \le n^4 \]
\[ 14.4 \le n \]
Since Simpson’s rule requires an even number of intervals, n = 16 will suffice.

2
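The three calculations in Example 6 follow one pattern: solve the error bound for n and round up. A minimal Python sketch, assuming the bounds K1, K2, K4 have already been found by hand (the function names are illustrative, not from the text):

```python
import math

def min_n_left(K1, a, b, tol):
    """Smallest n with K1 (b-a)^2 / n <= tol (left or right endpoint rule)."""
    return math.ceil(K1 * (b - a) ** 2 / tol)

def min_n_midpoint(K2, a, b, tol):
    """Smallest n with K2 (b-a)^3 / (24 n^2) <= tol."""
    return math.ceil(math.sqrt(K2 * (b - a) ** 3 / (24 * tol)))

def min_n_simpson(K4, a, b, tol):
    """Smallest even n with K4 (b-a)^5 / (180 n^4) <= tol."""
    n = math.ceil((K4 * (b - a) ** 5 / (180 * tol)) ** 0.25)
    return n if n % 2 == 0 else n + 1  # Simpson's rule needs an even n

# Example 6: f(x) = 1/x on [1, 3] with K1 = 1, K2 = 2, K4 = 24
n_left = min_n_left(1, 1, 3, 0.0001)
n_mid = min_n_midpoint(2, 1, 3, 0.0001)
n_simp = min_n_simpson(24, 1, 3, 0.0001)
print(n_left, n_mid, n_simp)
```

Because these are upper bounds on the error, a smaller n may happen to give the required accuracy; the functions only report the n that guarantees it.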
Problem Set 5.7

Approximate the integrals in Problems 1 to 12 using
a. left endpoint rule
b. right endpoint rule
c. midpoint rule
d. Simpson’s rule

1. \(\int_1^2 x^2\,dx\) with n = 4
2. \(\int_0^4 \sqrt{x}\,dx\) with n = 6
3. \(\int_0^1 \cos 2x\,dx\) with n = 4
4. \(\int_1^2 x^{-1}\,dx\) with n = 6
5. \(\int_0^1 \frac{1}{1+x^2}\,dx\) with n = 4
6. \(\int_{-1}^0 \sqrt{1 + x^2}\,dx\) with n = 4
7. \(\int_0^2 x \cos x\,dx\) with n = 6
8. \(\int_0^2 x e^{-x}\,dx\) with n = 6
9. \(\int_0^1 \frac{1}{1+x^3}\,dx\) with n = 4
10. \(\int_0^\pi \sin x\,dx\) with n = 4
11. \(\int_{-2}^2 \cos x^2\,dx\) with n = 6
12. \(\int_0^2 e^{-x}\,dx\) with n = 6
Estimate the value of the integrals in Problems 13 to 22 to within the prescribed accuracy.
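13. \(\int_0^2 x\sqrt{4 - x}\,dx\), \(|E_L| < 0.01\)
14. \(\int_1^4 \sqrt{x}\,dx\), \(|E_M| < 0.01\)
15. \(\int_{-2}^0 e^x\,dx\), \(|E_S| < 0.01\)
16. \(\int_0^{\pi/2} \cos^2\theta\,d\theta\), \(|E_R| < 0.1\)
17. \(\int_0^{\pi/2} \cos^2\theta\,d\theta\), \(|E_M| < 0.01\)
18. \(\int_0^\pi \sin(2\theta)\,d\theta\), \(|E_S| < 0.01\)
19. \(\int_0^1 e^{\sin x}\,dx\), \(|E_R| < 0.001\)
20. \(\int_0^1 e^{\sin x}\,dx\), \(|E_S| < 0.001\)
21. \(\int_0^2 \sin(x^3)\,dx\), \(|E_L| < 1\)
22. \(\int_1^5 \frac{1}{x}\,dx\), \(|E_S| < 0.001\)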
In Problems 23 to 28, determine how many subintervals are required to guarantee accuracy to within 0.00005 using:
a. the midpoint rule
b. Simpson’s rule
23. \(\int_1^4 x^{-1}\,dx\)
24. \(\int_1^4 \frac{dx}{\sqrt{x}}\)
25. \(\int_0^2 \cos x\,dx\)
26. \(\int_0^1 e^{-2x}\,dx\)
27. \(\int_{-1}^4 (x^3 + 2x^2 + 1)\,dx\)
28. \(\int_1^3 \sqrt{\ln x}\,dx\)
29. Estimate the area in the graph in Figure 5.39 using left endpoint rule, right endpoint rule, and Simpson’s rule.
Figure 5.39: Estimate shaded area
30. Estimate the area in the graph in Figure 5.40 using left endpoint rule, right endpoint rule, and Simpson’s rule.
Figure 5.40: Estimate shaded area
31. Area of a circle: Since elementary school, you have been told that the area, A, of a circular disk with radius
r is \(\pi r^2\). In this problem, you are asked to prove this formula using integration and substitution. Since one
quarter of a circle of radius r is given by \(y = \sqrt{r^2 - x^2}\) with x between 0 and r, the area of a disk of radius r
is given by
\[ A = 4 \int_0^r \sqrt{r^2 - x^2}\,dx \]
To compute this integral requires a very clever substitution: let x = r sin θ. Make this substitution and complete
the integration. Hint: To integrate \(\cos^2\theta\), use the trigonometric identity
\[ \cos^2\theta = \frac{1}{2}\,[1 + \cos(2\theta)] \]
32. Area of a circle (continued): Consider a circle with radius 1. From Problem 31,
\[ \int_0^1 \sqrt{1 - x^2}\,dx = \frac{\pi}{4} \]
Use this result to estimate π correct to one decimal place by applying Simpson’s rule to this integral and using
appropriate error estimates.

33. Black Plague revisited: Recall that for the outbreak of the Black Plague in Bombay in 1905-1906, the
mortality rate due to the plague was approximated by W.O. Kermack and A.G. McKendrick with the function
f (t) = 890 sech2 (0.2 · t − 3.4)
deaths per week.
a. Write a definite integral that represents the number of deaths that accumulated from t = 0 to t = 30.
b. Estimate the definite integral using Simpson’s rule with n = 10.
34. The data set for 80 hours of the discharge (in m³/s) for the Raging River is shown below:
The data set for the first 24 hours is summarized in the following table.
hour  m³/s    hour  m³/s    hour  m³/s    hour  m³/s    hour  m³/s    hour  m³/s
 2    5.41     4    5.25     6    5.10     8    5.00    10    4.81    12    4.67
14    4.49    16    4.29    18    4.19    20    4.06    22    3.97    24    3.83
Use Simpson’s rule to estimate the total amount of discharge in the first 24 hours.
35. Sweet corn has a lower development threshold of 50◦ F and requires 1587 degree days to complete development.
On July 3rd, 2006, the temperatures in Northern Illinois were as follows (measurements performed by the
Northern Illinois Agronomy Research Center)
hour temperature hour temperature

0 66.7
1 66.7 13 72.1
2 65.3 14 75.4
3 66.0 15 77.9
4 69.6 16 79.7
5 70.2 17 80.8
6 69.1 18 81.0
7 68.7 19 80.6
8 68.7 20 79.2
9 68.5 21 76.5
10 68.7 22 75.6
11 69.3 23 73.9
12 70.2 24 73.0
a. Using the right end point rule, estimate the number of degree days that elapsed on this summer day.
b. If the temperatures on July 3rd typified the temperatures throughout the summer, estimate how
many days it would take sweet corn to mature.
36. Repeat Problem 35 using Simpson’s rule. Do you expect your answer to be more or less accurate than the
answer to Problem 35?

37. The weekly rate of cases of influenza A (strain Unk ) studied by WHO/NREVSS during the 2003–2004 season
is plotted below.
[Figure: cases per week (0 to 2500) versus weeks (42 to 54)]
Estimate the total number of cases (i.e. the area under the curve) from week 40 to week 56 using Simpson’s
rule on two-week intervals.
Takakazu Seki Kōwa (1642-1708)
Takakazu Seki Kōwa was born in Fujioka, Japan, the son of a samurai, but was adopted by a patriarch of the
Seki family. Seki invented and used an early form of determinants for solving systems of equations, and he also
invented a method for approximating areas that is very similar to the rectangular method introduced in this
section. This method, known as the yenri (circle principle), found the area of a circle by dividing the circle
into small rectangles, as shown in Figure 5.41.
Figure 5.41: Early Asian calculus
The sample shown in Figure 5.41 was drawn by a student of Seki Kōwa. For this Quest, draw a circle with radius
10 cm. Draw vertical chords through each centimeter on a diameter (you should have 18 rectangles). Measure
the heights of the rectangles and approximate the area of the circle by adding the areas of the rectangles.
Compare this with the formula for the area of this circle.

Roger Cotes (1682-1716)
Isaac Newton invented a preliminary version of Simpson’s rule. In 1779, Newton wrote an article in an
addendum to Methodus Differentialis (1711) in which he gave the following example: If there are four ordinates
at equal intervals, let A be the sum of the first and fourth, B the sum of the second and third, and R the
interval between the first and fourth; then · · · the area between the first and fourth ordinates is approximated
by \(\frac{1}{8}(A + 3B)R\). This is known today as the “Newton-Cotes three-eighths rule,” which can be expressed in the
form
\[ \int_{x_0}^{x_3} f(x)\,dx \approx \frac{3}{8}\,(y_0 + 3y_1 + 3y_2 + y_3)\,\Delta x \]
Roger Cotes and James Stirling (1692-1770) both knew this formula, as well as what we call in this section
Simpson’s rule. In 1743, this rule was rediscovered by Thomas Simpson (1710-1761). Estimate the integral
\[ \int_0^3 \tan^{-1} x\,dx \]
using the Newton-Cotes three-eighths rule and then compare with an approximation using left endpoints
(rectangles) with n = 8. Which of the rules gives the most accurate estimate?

5.8 Applications of Integration

In the preceding sections of this chapter, we motivated definite integration with area under a curve and accu-
mulated change. In this section, we present some additional applications that utilize Riemann sums to formulate
integrals for survival and renewal processes, estimating cardiac output, and computing work.
Survival and Renewal
Survival and renewal is the study of a population, or group of individuals, with the goal of predicting the size of
the group at some future time. In the following example, a survival function gives the fraction of individuals in a
group, or population, that can be expected to remain in the group for any specified period of time. In addition, a
renewal function gives the rate at which new members arrive. Survival and renewal problems arise in many areas
of study, including sociology, ecology, demography, and even finance, where the “population” is the number of dollars
in an investment account, and “survival and renewal” refer to results of an investment strategy.
Example 1. Survival and renewal in a clinic
A new county mental health clinic has just opened. Statistics from similar facilities suggest that the fraction of
patients who will still be receiving treatment at the clinic t months after their initial visit is given by the survival
function s(t) = e−t/20 . The clinic initially accepts 300 people for treatment and plans to accept new patients at the
rate of 10 per month. Approximately how many people will be receiving treatment at the clinic 15 months from
now?
Solution. Since e−15/20 is the fraction of patients whose treatment we expect to continue at least 15 months, it
follows that of the current 300 patients, only 300e−15/20 ≈ 141.7 will still be receiving treatment 15 months from
now.
Each month, however, 10 new patients enter, and some of these will also still be around at month t = 15. To
account for this, we divide the 15-month time interval [0, 15] into n equal subintervals, each of length ∆t = 15/n
months. Let \(t_k = (k - 1)\Delta t\) denote the beginning of the kth subinterval for k = 1, ..., n. Since new patients are
accepted at the rate of 10 per month, the number of new patients accepted during the kth subinterval is 10 ∆t.
When ∆t is small, we can estimate \(15 - t_k\) to be the time that elapses for all of these patients by the 15th month.
Consequently, approximately
\[ e^{-(15 - t_k)/20}\,10\,\Delta t \]
of these patients will still be receiving treatment 15 months from now. Thus the total number of patients arriving
at times \(t_k\), k = 1, ..., n that are still receiving treatment at time t = 15 is approximated by the sum
\[ \sum_{k=1}^{n} e^{-(15 - t_k)/20}\,10\,\Delta t \]
As n → ∞, we obtain the integral
\[ \lim_{n \to \infty} \sum_{k=1}^{n} e^{-(15 - t_k)/20}\,10\,\Delta t = \int_0^{15} 10 e^{(t - 15)/20}\,dt \]
which is also referred to as the renewal function.

Adding this integral to 141.7, the number of original patients who will still be receiving treatment after 15 months,
we have that the total number of patients who will be receiving treatment at time t = 15 is
\[ 141.7 + \int_0^{15} 10 e^{(t - 15)/20}\,dt = 141.7 + \left[ 200 e^{(t - 15)/20} \right]_0^{15} \approx 141.7 + 105.5 = 247.2 \]
That is, 15 months from now, the clinic will be treating approximately 247 patients. 2
This example provides a guide to developing a more general formulation for survival renewal processes. More
generally, suppose a population initially has \(N_0\) individuals, receives new individuals (renews) at a rate r(t), and the
fraction of individuals remaining (surviving) in the population t units of time after entering the population is
s(t). If we want to determine the number of individuals in the population at time T, we can divide the interval [0, T]
into subintervals of width ∆t = T/n. The number of individuals arriving into the population during the kth time
interval is approximately \(r(t_k)\Delta t\). The fraction of these \(r(t_k)\Delta t\) individuals surviving to time T is approximately
\(s(T - t_k)\). Hence, the number of individuals entering during the kth time interval and surviving to time T is
approximately \(s(T - t_k)\,r(t_k)\,\Delta t\). Summing up over all these time intervals yields
\[ \text{new individuals surviving to time } T \approx \sum_{k=1}^{n} s(T - t_k)\,r(t_k)\,\Delta t \]
Taking the limit as n → ∞ yields
\[ \text{new individuals surviving to time } T = \int_0^T s(T - t)\,r(t)\,dt \]
Of the \(N_0\) individuals that were initially present, \(s(T) N_0\) of them survive to time T. Hence, the number of individuals
in the population at time T is given by the following survival and renewal equation.

Survival and Renewal Equation

Suppose there are \(N_0\) individuals initially present, a fraction s(t) of individuals
survives a period of length t, and individuals arrive at a rate of r(t) individuals per
unit time at time t. Then the total number of individuals present at time t = T is
given by the survival renewal equation
\[ s(T) N_0 + \int_0^T s(T - t)\,r(t)\,dt \]
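The survival and renewal equation translates directly into a short numerical routine. The sketch below is one possible implementation, not the authors' method: it evaluates the integral with composite Simpson's rule, and the names `simpson` and `population_at` are chosen here for illustration. The usage line reproduces the clinic of Example 1.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for a function f on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def population_at(T, N0, s, r, n=1000):
    """Survival and renewal equation: s(T) N0 + integral from 0 to T of s(T - t) r(t) dt."""
    return s(T) * N0 + simpson(lambda t: s(T - t) * r(t), 0.0, T, n)

# Example 1: 300 initial patients, s(t) = e^(-t/20), 10 new patients per month
clinic = population_at(15, 300, lambda t: math.exp(-t / 20), lambda t: 10.0)
print(round(clinic, 1))
```

Passing s and r as functions keeps the routine usable when the renewal rate varies with time.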
Example 2. Fire Ants
The imported fire ant (Solenopsis invicta) (Figure 5.42) is a pest in both urban and rural areas. Damage estimates
for the U.S. range in the millions of dollars.
The fire ant has colonies in which workers live approximately 10 to 70 weeks and queens survive for about
seven years. A single colony can have from 10 to 100 or more queens, each producing 1000 to 1500 eggs per year
for 7 years. Consider a newly formed colony with 100 queens in which each queen produces workers at a rate of
1,250 + 250 sin(2πt) workers per queen per year, and in which the fraction of workers living t years after their birth
is given by \(s(t) = e^{-1.25t}\). Find the number of workers in the colony seven years from now, assuming all 100 queens
survive the seven years under consideration.
Solution. Initially there are no workers and \(N_0 = 0\). The rate at which workers are renewed is
\[ r(t) = 100 \cdot (1250 + 250 \sin(2\pi t)) = 125{,}000 + 25{,}000 \sin(2\pi t) \text{ workers/year} \]
The survival function is given by \(s(t) = e^{-1.25t}\). Setting T = 7 in the renewal equation gives
\[ \int_0^T s(T - t)\,r(t)\,dt = \int_0^7 [125{,}000 + 25{,}000 \sin(2\pi t)]\,e^{-1.25(7 - t)}\,dt \approx 96{,}157 \quad \text{(by calculator)} \]

Figure 5.42: Fire ants were imported from South America
If you do not wish to use a calculator, you can integrate by using the addition rule first, then substitution where
u = −1.25(7 − t) for the first integral, and integration by parts (twice) for the second integral. The renewal equation
predicts that we should expect the colony to have around one hundred thousand workers seven years from now. 2
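For readers who prefer a computer to a calculator, the fire ant integral can be evaluated numerically. This sketch uses composite Simpson's rule with 1,000 subintervals; that choice and the function names are assumptions made for illustration.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for a function f on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def worker_renewal(t):
    """r(t): workers produced per year by 100 queens."""
    return 100 * (1250 + 250 * math.sin(2 * math.pi * t))

def worker_survival(t):
    """s(t): fraction of workers alive t years after birth."""
    return math.exp(-1.25 * t)

T = 7
workers = simpson(lambda t: worker_survival(T - t) * worker_renewal(t), 0.0, T)
print(round(workers))
```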
Another application of the renewal equation is in the area of finances, where we may well be concerned with
the precision of the second and subsequent significant digits in predicting the growth of economies or our personal
fortunes. The key difference in this application is that instead of calculating how capital (population of dollars)
decays, we are interested in how capital grows. (Note: decay is the negative of growth).
Before considering a financial example, we need to review some terminology. If you deposit money into an interest-
bearing account and are paid interest only on the amount of the original deposit, we call it simple interest. On the
other hand, if after some suitable period of time, you receive interest not only on the original deposit, but also on all
the interest paid up to that point in time, we call it compound interest. Common (discrete) periods of compounding
are annually, semi-annually, quarterly, or daily. In many applications, however, it is common to assume a continuous
compounding, which assumes that the money flows at each instant of time. Although it is not common practice in
banking systems to compound continuously, it is a reasonable approximation for daily compounding. Recall from
Example 6 in Section 3.7 that compounding continuously at a rate of c% per year implies that if you put N dollars
into the account, then t years later there will be \(e^{ct/100} N\) dollars in the account.
Example 3. Saving for retirement
Starting at age 20, Peggy Sue puts money into a retirement account at a rate of $2,000 per year. The money in
this account is compounded continuously at a rate of 10% per year. How much money will be in her account when
she turns 60? How much would she have if she started at the age of 30?
Solution. To determine the total amount in Peggy Sue’s account, let us break up the time interval [0, 40] into
n subintervals of width ∆t = 40/n. The amount of money she puts into the account during the kth time interval
[(k − 1)∆t, k∆t] is approximately 2,000∆t. Over the remaining 40 − k∆t years this money grows to approximately
\(e^{0.1(40 - t_k)}\,2{,}000\,\Delta t\) where \(t_k = k\Delta t\). Hence, the total amount of money she has at age 60 is approximately
\[ \sum_{k=1}^{n} e^{0.1(40 - t_k)}\,2{,}000\,\Delta t \]
Taking the limit as n → ∞ yields
\[ \int_0^{40} e^{0.1(40 - t)}\,2{,}000\,dt \]
Integrating yields
\[ \int_0^{40} e^{0.1(40 - t)}\,2{,}000\,dt \approx \$1{,}071{,}960 \]
She would be a millionaire! Alternatively, if she started saving at age 30, then at age 60 she would have
\[ \int_0^{30} e^{0.1(30 - t)}\,2{,}000\,dt \approx \$381{,}711 \]
Not even close to being a millionaire! 2
The solution to Example 3 shows that if money is being added to an account at a rate of r(t), the account is
continuously compounded at an interest rate of c%, and there are initially \(N_0\) dollars in the account, then the total
amount in the account T years from now is
\[ N_0 e^{cT/100} + \int_0^T e^{c(T - t)/100}\,r(t)\,dt \]
This is just another survival renewal equation with \(s(t) = e^{ct/100}\).
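The savings formula can be explored numerically for other deposit rates and starting ages. The following is a sketch under the stated assumptions (continuous deposits, continuous compounding); `account_balance` and its parameter names are hypothetical, and the integral is approximated by Simpson's rule.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for a function f on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def account_balance(T, deposit_rate, c_percent, N0=0.0, n=1000):
    """N0 e^(cT/100) + integral from 0 to T of e^(c(T - t)/100) r(t) dt."""
    growth = lambda t: math.exp(c_percent * t / 100)
    return N0 * growth(T) + simpson(lambda t: growth(T - t) * deposit_rate(t), 0.0, T, n)

# Example 3: $2,000 per year at 10%, saving for 40 years versus 30 years
start_at_20 = account_balance(40, lambda t: 2000.0, 10)
start_at_30 = account_balance(30, lambda t: 2000.0, 10)
print(round(start_at_20), round(start_at_30))
```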
Cardiac Output
Cardiac output is the volume of blood pumped by the heart in a specified interval of time. Estimating cardiac output
is important as it is an indicator of certain heart diseases. A schematic for a typical heart is shown in Figure 5.43.
Figure 5.43: Schematic of the heart
Cardiac output can be measured using dye dilution. A known quantity of dye, say D mg, is injected into a main
vein near the heart. This dye circulates with the blood through the body (from the right ventricle of the heart to
the lungs to the left ventricle and into the arteries) and returns to the left ventricle of the heart. The concentration of the dye,
c(t) mg/L, passing through an artery is monitored. To compute cardiac output from these recorded concentrations,
assume that the cardiac output (i.e. blood flow) remains at a constant rate, F L/s, during the experiment. The rate
at which dye is passing through the artery at time t seconds is given by F · c(t) mg/s. Notice how the units work
out here: c(t) has units mg/L and F has units L/s. Hence c(t) · F has units
\[ \frac{\text{mg}}{\text{L}} \cdot \frac{\text{L}}{\text{s}} = \frac{\text{mg}}{\text{s}} \]
Assume that the entire amount of dye passes through the artery between time t = 0 and t = T. The net amount of
dye passing through the artery over the time interval from 0 to T is
\[ \int_0^T F \cdot c(t)\,dt = F \int_0^T c(t)\,dt \]
By conservation of mass, the net amount of dye observed must equal the initial amount of dye, D:
\[ F \int_0^T c(t)\,dt = D \]
Solving for the cardiac output, F, yields
\[ \text{cardiac output } F = \frac{D}{\int_0^T c(t)\,dt} \]
Example 4. Dye dilution
A (hypothetical) patient is given an injection of 5 mg of dye. The measured concentrations of dye are recorded
in the following table
t c(t) t c(t) t c(t) t c(t)
0 0.00 6 4.84 12 4.74 18 0.30
1 0.20 7 5.67 13 3.76 19 0.15
2 0.77 8 6.19 14 2.75 20 0.05
3 1.63 9 6.35 15 1.82 21 0.00
4 2.69 10 6.13 16 1.10
5 3.81 11 5.57 17 0.60
A plot of this data is given in Figure 5.44. Use Simpson’s rule to estimate the cardiac output of the patient.
Figure 5.44: Dye concentrations in the heart after an injection (dye concentration versus seconds)
Solution. We want to use Simpson’s rule with ∆t = 1. However, we have 22 data points and consequently 21
intervals (an odd number). Since Simpson’s rule requires an even number of intervals and c(21) = 0, we omit the last
data point and make the following approximation:
\[ \int_0^{20} c(t)\,dt \approx \frac{1}{3} \cdot (0 + 4 \cdot 0.2 + 2 \cdot 0.77 + \cdots + 4 \cdot 0.15 + 0.05) \approx 59.1 \]
Since the initial amount of dye is 5 mg, we get
\[ \text{cardiac output } F = \frac{D}{\int_0^T c(t)\,dt} \approx \frac{5}{59.1} \approx 0.085 \text{ L/s} \approx 5.1 \text{ L/min} \]
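The dye dilution estimate is easy to reproduce from the table. In the sketch below the helper `simpson` and the variable names are illustrative; the concentrations are the table entries for t = 0 to t = 20 seconds, with the final point at t = 21 dropped as in the solution.

```python
def simpson(values, dx):
    """Composite Simpson's rule for equally spaced samples (even number of intervals)."""
    n = len(values) - 1
    if n < 2 or n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of intervals")
    total = values[0] + values[-1]
    total += 4 * sum(values[1:-1:2])  # odd-indexed interior samples
    total += 2 * sum(values[2:-1:2])  # even-indexed interior samples
    return total * dx / 3

# c(t) in mg/L for t = 0, 1, ..., 20 seconds
conc = [0.00, 0.20, 0.77, 1.63, 2.69, 3.81, 4.84, 5.67, 6.19, 6.35, 6.13,
        5.57, 4.74, 3.76, 2.75, 1.82, 1.10, 0.60, 0.30, 0.15, 0.05]
dose_mg = 5.0

area = simpson(conc, dx=1.0)        # mg*s/L
flow_l_per_s = dose_mg / area       # cardiac output F in L/s
flow_l_per_min = 60 * flow_l_per_s  # converted to L/min
print(round(area, 1), round(flow_l_per_min, 1))
```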
Example 5. Thermodilution
An alternative approach to measuring cardiac output is a pulmonary artery catheter, which allows rapid, easy
measurements of cardiac output using thermodilution. The principle of thermodilution is the same as dye dilution.
Instead of injecting dye, the doctors inject 10 milliliters of a cold dextrose solution. As the cold solution mixes with
the blood in the heart, the temperature variations in the blood leaving the heart are measured. A hypothetical
temperature variation curve may be described by the function
\[ f(t) = 0.1 t^2 e^{-0.3t} \text{ degrees Celsius} \]
This curve is plotted in Figure 5.45.
Figure 5.45: Temperature variation in the heart due to an injection of cold dextrose (temperature versus seconds)
Assuming the temperature of a body is 37◦ C and the temperature of the dextrose solution is 0◦ C, estimate the
cardiac output of a patient over a one minute time interval.
Solution. This example is just like the previous example, replacing dye concentration with temperature variation.
The initial “amount of cold” (the equivalent of the initial amount of dye) is given by
\[ 10 \text{ ml} \cdot (37 - 0)^\circ\text{C} = 370 \text{ ml-}^\circ\text{C} \]
If the cardiac output is F, then the rate of “cold” passing by at time t is
\[ F \cdot f(t) \text{ ml-}^\circ\text{C/s} \]
The accumulated change in cold is \(F \cdot \int_0^{60} f(t)\,dt\) ml-°C, which must equal 370 ml-°C. Hence
\[ F = \frac{370}{\int_0^{60} 0.1 t^2 e^{-0.3t}\,dt} \]
Applying integration by parts (twice) and evaluating yields
\[ F = \frac{370}{\int_0^{60} 0.1 t^2 e^{-0.3t}\,dt} \approx 49.95 \]
We can convert units to get 49.95 ml/s ≈ 3.00 liters/min.

2
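As a check on the integration by parts, the thermodilution integral can be approximated numerically. This is a sketch using Simpson's rule with 1,000 subintervals; the names below are illustrative and the temperature curve is the hypothetical f(t) from the example.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for a function f on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def temp_variation(t):
    """f(t) = 0.1 t^2 e^(-0.3 t), in degrees Celsius."""
    return 0.1 * t ** 2 * math.exp(-0.3 * t)

cold_injected = 10 * (37 - 0)  # ml-degC of "cold" in the dextrose injection
flow_ml_per_s = cold_injected / simpson(temp_variation, 0.0, 60.0)
flow_l_per_min = flow_ml_per_s * 60 / 1000
print(round(flow_ml_per_s, 2), round(flow_l_per_min, 2))
```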
Work
How much pasta should I eat to dig a post hole? 10 post holes? How much energy should a grizzly bear expend
to dig out a pocket gopher? Answering these and other questions requires understanding the relationship between
work and energy. To complete any work, we need energy. A standard unit of energy is a calorie. A calorie is the
amount of energy required to heat 1 gram of water 1 degree Celsius. What does this mean? At a website on counting
calories, Paul Doherty writes:∗
In my undergraduate biophysics physics course at MIT, Professor George Benedek burned a peanut. That
may not sound impressive, but it was. Professor Benedek stood in the front of a small 50 seat lecture
hall. He was a middle age man who had the build of a swimmer under a tweed suit, and he always wore
white socks. He held the peanut in a loop of wire made from a bent paper clip and held the bent paper
clip in a pair of pliers. He positioned the peanut under a test tube which contained ten grams of water.
Beneath the peanut was a large pan filled with water. A very large fire extinguisher stood on the floor
nearby. I thought the fire extinguisher was excessive for a single peanut. For that matter, so was the pan
of water.
Then professor Benedek set the peanut on fire. The peanut burned, and burned, and burned, and then
burned some more. Drops of flaming oil oozed from the nut and dripped into the pan of water. The
water in the test tube started to boil. When the peanut finally burned out, there were only eight grams
of water left. Not only had the peanut heated the water from room temperature to 100 degrees Celsius,
it had also boiled away two grams of water.
Heat flowed from that burning peanut as combustion converted the hidden chemical energy stored in the
nut into the easily measured energy of heat flow. When you eat a peanut, your body does the same sort
of thing: it converts the energy stored in the peanut into the energy it needs to keep you running. As
professor Benedek’s demonstration showed, a little bit of food stores a great deal of energy in its chemical
bonds.
Physicists measure the energy content of food by burning the food. To a physicist, a calorie is the heat
flow needed to raise the temperature of one gram of water by one degree Celsius. After burning that
peanut, professor Benedek turned to the blackboard and calculated the calories that the peanut had
produced. The burning peanut warmed ten grams of water from tap water temperature, 20 degrees
Celsius, to boiling, 100 degrees Celsius; a temperature increase of 80 degrees Celsius. This temperature
increase required 800 calories of heat flow. The heat flowing from the peanut then boiled away two grams
of water, which took 1080 calories more, since 540 calories are needed to boil a gram. All in all, one
burning peanut delivered 1880 calories to the test tube of water.
Did you think a peanut could have this many calories? The reason for the surprise is that a single dietary calorie
(i.e. the type of calorie reported with food), which is abbreviated “Cal” or simply C, is actually
one kilocalorie, abbreviated “kcal.” Hence, a peanut has only 1.88 Cal.
What can we do with all this energy? Not just heat up water. We can do work. A standard definition of work
caused by a constant force is given in the following box.
∗ From http://isaac.exploratorium.edu/ pauld/activities/food/countingcalories.html

Work done by a constant force

If a body moves a distance d in the direction of an applied constant force F, the
work, W, done is
\[ W = Fd \]
The standard work unit is given by
1 Joule (J) = 1 kilogram-meter²/second² (kg-m²/s²)
On earth, the acceleration due to gravity is 9.81 m/s², so on earth lifting 1 kg through 1 m requires 9.81 J:
1 J = 1 kg-m²/s²
9.81 J = 1 kg-m
A joule can be related to dietary calories by the formula
1 Cal = 4,184 J
Example 6. Calories consumed by working out
How much work is done lifting 30 kg 20 meters? (This is equivalent to 40 arm curls lifting about 67 lb). Give
your answer in Joules and in Calories.
Solution.
\[ W = Fd = (30 \cdot 9.81) \cdot 20 = 5{,}886 \text{ J} = \frac{5{,}886}{4{,}184} \text{ Cal} \approx 1.41 \text{ Cal} \]
All that work, and so little to show for it? Well actually, we are not 100% efficient in translating Calories from
food to work. Roughly humans have 10% efficiency (all that overhead from maintaining body temperature etc.).
Thus, we might estimate the number of calories being burnt off as 14 Cal. 2
Example 7. Climbing Mountains with a candy bar
The website Calorimetry∗ reveals that a Milky Way candy bar contains more energy than a stick of dynamite.
The candy bar contains 270 Cal. If the energy from the Milky Way bar is used with 100% efficiency, determine how
high (in meters) a 70 kg human could be lifted with the energy from the Milky Way bar.
Solution. First find the number of Joules:
270 Cal · 4,184 J/Cal = 1,129,680 J
The amount of work required to lift a 70 kg man x meters is
70 · 9.81 · x = 686.7x J
Thus,
686.7x = 1,129,680
x ≈ 1,645 m
∗ http://isaac.exploratorium.edu/ pauld/activities/food/countingcalories.html
This is almost twice the height of the cliff face of Yosemite’s El Capitan. (See Figure 5.46.) No stick of dynamite
can do that! In fact, an ounce of dynamite produces only one quarter as many calories when it explodes as an ounce
of sugar does when it is burnt!
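The arithmetic in Examples 6 and 7 uses the same two conversions, which the following sketch collects. The function names and constants' names are chosen here for illustration; the values 9.81 m/s² and 4,184 J per Cal are the ones given in the text.

```python
G = 9.81            # acceleration due to gravity, m/s^2
J_PER_CAL = 4184.0  # joules per dietary Calorie

def lifting_work_joules(mass_kg, height_m):
    """Work W = F d to lift a mass through a height against gravity."""
    return mass_kg * G * height_m

def max_lift_height_m(energy_cal, mass_kg):
    """Height a mass could be lifted with the given energy at 100% efficiency."""
    return energy_cal * J_PER_CAL / (mass_kg * G)

work_j = lifting_work_joules(30, 20)       # Example 6
work_cal = work_j / J_PER_CAL
candy_height = max_lift_height_m(270, 70)  # Example 7
print(round(work_j), round(work_cal, 2), round(candy_height))
```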
Figure 5.46: El Capitan, Yosemite
Often we try to achieve great things by doing work. For example, in the children’s story, Mike Mulligan and his
Steam Shovel, Mike and Mary Anne (the steam shovel) dug canals for boats to travel through, cut through large
mountains for railways, and hollowed out deep cellars for skyscrapers. Using calculus, one can actually compute the
amount of work required to accomplish such feats.
Example 8. Mike Mulligan and Mary Anne
Consider the cellar that Mike and Mary Anne dug for the folks of Popperville. Since the dimensions of this cellar
are not reported in the book, let’s assume that it was 7 meters deep, 100 meters long, and 50 meters wide. How much
work would it take to dig this cellar?
Solution. To answer this question, we need to know approximately the density of soil. Checking on the World
Wide Web, we find that it is approximately the same density as water. Namely, one cubic centimeter (i.e. milliliter)
has a mass of one gram. Since our problem is phrased in meters and kilograms, we need to translate this statement
into these units. Since one cubic meter equals 100³ cm³, one m³ of water has a mass of 1,000,000 g = 1,000 kg. If we
assume that the density of soil and water are the same (soil is a mixture of air and particles, the former much less
dense and the latter much more dense than water), then each cubic meter of soil also has a mass of 1,000 kg (also called a
metric ton).
The amount of work required to lift one scoop of dirt to the ground level depends on the depth of that scoop of
dirt. Dirt at the bottom of the cellar has to be lifted higher than dirt at the top of the cellar. To find the amount
of work, we envision cutting the cellar up into n thin horizontal slices of thickness ∆x = 7/n (see Figure 5.47). Let
x denote the depth of a slice in meters.

Figure 5.47: Slicing a cellar into horizontal slices
The volume of a slice with thickness ∆x at depth x is
Volume of slice = (100 m) · (50 m) · (∆x m) = 5,000 ∆x m³
The mass of this slice is given by
Mass of slice = (1,000 kg/m³)(5,000 ∆x m³) = 5,000,000 ∆x kg
The weight of this slice is given by
Weight of slice = (9.81 m/s²)(5,000,000 ∆x kg) = 49,050,000 ∆x kg-m/s²
If this slice is at depth x meters, then the work required to lift the slice is
Work to lift slice ≈ 49,050,000 x ∆x J
If the depths of the slices are \(x_1, x_2, \ldots, x_n\), then the work to dig the cellar, which is the sum of the work to lift all of the
slices, is approximately given by
\[ \sum_{k=1}^{n} 49{,}050{,}000\,x_k\,\Delta x \text{ J} \]
Letting ∆x get smaller and smaller should yield better and better approximations. Consequently, taking the limit as
n → ∞ yields
\[ \text{WORK} = \int_0^7 49{,}050{,}000 \cdot x\,dx \approx 1{,}201{,}730{,}000 \text{ J} \]
Equivalently, 287,219 Cal: a considerable amount of work! 2
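The convergence of the slice sums to the integral can be seen numerically. The sketch below takes the depth of the kth slice to be its bottom edge, \(x_k = k\Delta x\), which is one of several reasonable choices; the function name and defaults are assumptions made for illustration.

```python
def cellar_work_riemann(n, depth=7.0, length=100.0, width=50.0,
                        density=1000.0, g=9.81):
    """Riemann sum for the work (in J) to lift n horizontal slices to ground level."""
    dx = depth / n
    total = 0.0
    for k in range(1, n + 1):
        x_k = k * dx                                      # depth of the kth slice (bottom edge)
        slice_weight = g * density * length * width * dx  # weight in kg-m/s^2
        total += slice_weight * x_k                       # work = weight times lift distance
    return total

# Value of the integral of 49,050,000 x dx from 0 to 7
exact_work = 9.81 * 1000.0 * 100.0 * 50.0 * 7.0 ** 2 / 2

for n in (10, 100, 1000):
    print(n, round(cellar_work_riemann(n)))
print(round(exact_work), round(exact_work / 4184))
```

With this choice of \(x_k\) each sum overestimates the integral by the factor 1 + 1/n, so the estimates decrease toward the exact value as n grows.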
Example 9. A hungry grizzly
The following quote was uncovered with Google:

SMITHSONIAN (unknown date) - Since 1983, Steve and Marilynn French watch grizzlies in Yellowstone
National Park - at a distance so as not to habituate the bear to their presence... Steve tells about watching
a bear digging a trench 20 feet long to get a “little gooey gopher.” He says he can’t perceive that its
worth the energy, but they get all excited when they hear that little guy squeak. “It’s kind of like a
Twinkie.”

Figure 5.48: A Twinkie for Spirit? Spirit is the first Montana grizzly to reside at the Grizzly & Wolf Discovery
Center.
Assuming the trench has a semicircular cross-section with radius 1 meter and the density of soil is 1,000 kg/m3 ,
find the amount of work performed by the Grizzly bear.
Solution. First, we need to express 20 feet as ≈ 6.1 meters. To determine the approximate amount of work, we
slice the trench into n slices of thickness ∆x meters (see Figure 5.49)
Figure 5.49: Grizzly’s trench
To determine the width w of a slice at depth x meters, we use the fact that the cross-sectional profile of the
trench is a semi-circle of radius 1. Thus (w/2)2 + x2 = 1 so that
p
w = 2 1 − x2
The volume of a slice at depth x meters is approximately

p p
2 1 − x2 |{z} ∆x = 12.2 1 − x2 ∆x m3
6.1 |{z}
| {z }
width length height
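The weight of the slice is approximately
\[ (9.81\ \text{m/s}^2)(1{,}000\ \text{kg/m}^3)\left(12.2\sqrt{1-x^2}\,\Delta x\ \text{m}^3\right) = 119{,}682\sqrt{1-x^2}\,\Delta x \ \text{kg-m/s}^2 \]
The amount of work to lift a slice at depth $x$ is
\[ 119{,}682\, x\sqrt{1-x^2}\,\Delta x \ \text{J} \]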
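If $x_1, x_2, \ldots, x_n$ are the depths of the $n$ slices, then the total work is approximately
\[ \sum_{k=1}^{n} 119{,}682\, x_k \sqrt{1 - x_k^2}\,\Delta x \ \text{J} \]
Taking the limit as $n \to \infty$ yields
\[ \begin{aligned} \text{work} &= \int_0^1 119{,}682\, x\sqrt{1-x^2}\,dx \\ &\approx 39{,}894 \ \text{J} && \text{by calculator or the substitution } u = 1 - x^2 \\ &\approx 39{,}894/4{,}184 \approx 9.5 \ \text{Cal} \end{aligned} \]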
An answer of 9 to 10 Calories seems surprisingly small! Of course, all this presumes that the grizzly was able
to perform the work with 100% efficiency, which is certainly not the case. For instance, if the grizzly worked with 5%
efficiency, then he used about 200 Calories, roughly the number you find in one Twinkie. □
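For readers who would rather carry out the substitution by hand than rely on a calculator, the steps can be written out as follows. This is a worked sketch using the substitution u = 1 − x² named in the solution; the change of limits is spelled out explicitly.

```latex
\begin{align*}
u &= 1 - x^2, \qquad du = -2x\,dx, \qquad x = 0 \Rightarrow u = 1, \quad x = 1 \Rightarrow u = 0, \\
\int_0^1 x\sqrt{1 - x^2}\,dx
  &= \int_1^0 \sqrt{u}\,\left(-\tfrac{1}{2}\right) du
   = \tfrac{1}{2}\int_0^1 u^{1/2}\,du
   = \tfrac{1}{2}\cdot\tfrac{2}{3}\,u^{3/2}\Big|_0^1
   = \tfrac{1}{3}, \\
\text{work} &= 119{,}682 \cdot \tfrac{1}{3} = 39{,}894 \ \text{J}.
\end{align*}
```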
Problem Set 5.8

Reconsider the mental health clinic Example 1. For problems 1 to 6, calculate the number of patients in the clinic
after 15 months if the patient survival rate s(t) and the renewal rate r(t) are as given.
1. $s(t) = e^{-t/20}$ and $r(t) = 20$ per month. How did doubling the renewal rate change the answer from what we
found in Example 1?
2. $s(t) = e^{-t/40}$ and $r(t) = 10$ per month. How did halving the survival rate change the answer from what we
found in Example 1?
3. $s(t) = e^{-t/10}$ and $r(t) = 20$ per month.
4. $s(t) = e^{-t/40}$ and $r(t) = 20$ per month.
5. $s(t) = e^{-t/20}$ and $r(t) = 10 + t$ per month.
6. $s(t) = \frac{1}{1+t}$ and $r(t) = 10$ per month.
Reconsider the fire ants in Example 2. For problems 7 to 12, calculate the number of workers after 5 years if the
worker survival rate s(t) is as given. Use technology to numerically evaluate the integrals.
7. $s(t) = e^{-0.625t}$, i.e., the survival rate is doubled.
8. $s(t) = e^{-2.5t}$, i.e., the survival rate is halved.
9. $s(t) = \frac{1}{0.25 + t^2}$
10. $s(t) = e^{-1.25t}$ and, in addition, the proportion of queens alive at time $t$ is $q(t) = e^{-0.1t}$.
For problems 13 to 16, reconsider Example 3. Calculate the amount of money Suzy will have in her account by age
60 if she adds A dollars per year to her account and she opens her account at age B years.

13. A = 4, 000 and B = 20.
14. A = 4, 000 and B = 30.
15. A = 1, 000 and B = 10 (she starts really young!)
16. A = 10, 000 and B = 40 (she starts very late!)
17. Analysts speculate that patients will enter a new clinic at a rate of $300 + 100\sin\left(\frac{\pi t}{6}\right)$ individuals per month.
Moreover, the likelihood an individual is in the clinic $t$ months later is $e^{-t}$. Find the number of patients in the
clinic one year from now.
18. A patient receives a continuous drug infusion at a rate of 10 mg/h. Studies have shown that t hours after
injection, the fraction of drug remaining in a patient’s body is $e^{-2t}$. If the patient initially has 5 mg of drug in
her bloodstream, write an expression (involving a definite integral) that represents the amount of drug in the
patient’s blood stream 24 hours later.
19. Consider a mental health clinic that initially has 300 patients, accepts 100 new patients per month, and for
which the fraction of patients receiving treatment for t or more months is given by f(t).

t (in months)    f(t)
0                1
3                0.5
6                0.3
9                0.2
12               0.1

Using Riemann sums with left endpoints, estimate the number of patients in the clinic after 12 months.
20. Consider the following two scenarios involving an IRA account that yields 9% continuous interest.
a. You graduate from college at age 22, get a job, and open an IRA account. You deposit $1,000 per year until
age 65. How much money is in the account at age 65? How much money did you pay into this account?
b. You graduate from college at age 22, and do not bother to start an IRA account until you reach 32. Then
you deposit $2,000 per year into the IRA account until you reach age 65. How much money is in your
IRA account at age 65? How much money did you pay into this account?
21. The administrators of a town estimate that the fraction of people who will still be residing in the town t years
from now is given by the function $S(t) = e^{-0.04t}$. The current population is 20,000 people and new people are
arriving at a rate of 500 per year.
a. What will be the population size 10 years from now?

b. What will be the population size 100 years from now?
22. After 5 mg of dye is injected into a vein, we obtain the concentration levels in the following table. The variable
t is in seconds and c(t) is in mg/liter. Using Simpson’s rule, compute the cardiac output.
t    c(t)      t    c(t)      t    c(t)      t    c(t)

0    0.00      6    4.8       12   4.5       18   0.50
1    0.20      7    5.5       13   3.5       19   0.2
2    0.7       8    6.0       14   2.5       20   0.1
3    1.6       9    6.3       15   1.8       21   0.00
4    2.5       10   6.3       16   1.10      22   0.00
5    3.5       11   5.5       17   0.60

23. Sediment flow.∗ Ecologists and scientists are interested in how much sediment is moved by a river. Data on the
water flow and suspended sediment in the Des Moines River near Saylorville Lake is given in the table. Using
Simpson’s rule, compute the total amount (in kilograms) of suspended sediment that passed the measurement
point for the period ended December 15, 1993.
Des Moines River Basin Water Discharge Records, December 1993

Day Discharge (ft³/sec) Suspended Sediment (mg/l)
1 1300 8
2 1590 35
3 2000 58
4 2200 64
5 2350 66
Note: One cubic foot equals 28.3 liters. One kilogram equals 1,000,000 milligrams.
24. Kety-Schmidt technique. Seymour Kety and Carl Schmidt describe a widely acknowledged and accurate method
for determination of cerebral blood flow and cerebral physiological activity such as metabolic rate of oxygen.
For example, a patient breathes 15% nitrous oxide (N2O). After the start of administration, the arterial
concentration, A, is measured in the radial artery. This is the concentration before the blood enters the brain.
The venous concentration, V , is measured at the base of the skull in the superior bulb of the internal jugular
(at the point of exit of the jugular vein from the brain). This process for measuring the blood flow of cerebral
physiological activity is commonly referred to as the Kety-Schmidt technique. A sample table is shown.
Time (min) A (cc N2O per cc blood) V (cc N2O per cc blood)

0.0 0.000 0.000
2.5 0.031 0.012
5.0 0.039 0.027
7.5 0.041 0.034
10.0 0.044 0.042
a. Initially the rate at which N2O flows into the brain is greater than the rate at which it flows out of the brain. Moreover,
after approximately ten minutes the concentrations flowing into and out of the brain are
approximately equal: the brain has become saturated with N2O. Assuming a constant cerebral
blood flow rate, F, use Simpson’s rule to estimate the total amount of N2O accumulated in the brain
during 10 minutes. Your answer will depend on F.
b. Through other means the maximum amount of N2O in the brain can be measured. Suppose that the
maximum amount is determined to be 58.8 cc. Determine F.∗
Answer Problems 25 to 29 by finding the work done, leaving your answer using the unit of foot-pounds (ft-lb).
25. Lifting a 90-lb bag of concrete 3 ft.
26. Lifting a 50-lb bag of salt 5 ft.
27. Lifting an 850-lb billiard table 15 ft.
28. A bucket weighing 75 lb when filled and 10 lb when empty is pulled up the side of a 100-ft building. How much
more work is done in pulling up the full bucket than the empty bucket?
29. A 20-ft rope weighing 0.4 lb/ft hangs over the edge of a building 100 ft high. How much work is done in pulling
the rope to the top of the building? Assume that the top of the rope is flush with the top of the building, and
the lower end of the rope is swinging freely.
∗ These problems come from: http://illuminations.nctm.org/imath/912/cardiac/cardiac4.html. The methods explored in the
measuring of cardiac output can be applied to other situations.
∗ For your information, for a normal adult, the blood flow rate is between 600cc/min and 900cc/min, or approximately equal to 1 mL.
The resting cardiac output is around 5 or 6 liters per minute.

30. How much ice water do you need to ingest to burn off 300 Calories? Assume your body temperature is 37°C
and the energy required to digest ice water is the energy needed to raise the ice to body temperature.
31. In the book, Mike Mulligan and his Steam Shovel (Example 8), Mike claimed that Mary Anne (the steam
shovel) could do as much work in one day as 100 men could do in seven days. If we assume that each of the
men ate 2 lbs of pasta a day and worked with 10% efficiency, how many calories could these men produce in
10 days? How does this compare with the work done by Mary Anne in Example 8? Assume a serving of pasta
is 2 ounces and contains 200 Cal.
32. Determine the length of a trench you can dig with the energy gained from eating one Milky Way bar (270 Cal).
Assume that you convert the energy gained from the food with 10% efficiency and the trench is 1 meter wide
and 1 meter deep. Assume the density of soil is 1,000 kg/m³.
In the next two problems, use the fact that a serving of pasta contains 200 Cal and the density of soil is 1,000
kg/m³.
33. How much work does it take to dig up a conical hole of depth 5 meters and diameter 6 meters? How many
servings of pasta are required to complete this work assuming the energy from the pasta is converted with 5%
efficiency to work?
34. How much work does it take to dig a hemispherical pit with radius 10 meters? How many servings of pasta are
required to complete this work assuming the energy from the pasta is converted with 5% efficiency to work?


DEFINITIONS
Section 5.1
Antiderivative, p. 436
Differential equation, p. 439
Slope field, p. 440
Section 5.2
Degree-days, p. 449
Developmental threshold, p. 449
Area problem, p. 454
Riemann sum, p. 457
Section 5.3
Integrable function, p. 464
Definite integral, p. 464
Integral of dx rule, p. 467
Signed area, p. 467
Sum rule, p. 469
Difference rule, p. 469
Scalar rule, p. 469
Opposite rule, p. 469
Positivity rule, p. 471
Dominance rule, p. 471
Bounding rule, p. 471
Splitting rule, p. 471
Definite integral at a point rule, p. 471
Section 5.4
Accumulated change, p. 478
Dummy variable, p. 480
Indefinite integral, p. 483
Section 5.5
Integration by substitution, p. 488
Section 5.6
Integration by parts, p. 496
Partial fractions, p. 500
Section 5.7
Error of approximation, p. 509
Left endpoint rule, p. 508
Right endpoint rule, p. 508
Midpoint rule, p. 512
Simpson’s rule, p. 513
Section 5.8
Survival function, p. 524
Renewal function, p. 524
Cardiac output, p. 527
Calorie, p. 530
Work, p. 530
Section 5.1

Antiderivative, p. 436
General form of an antiderivative, p. 436
Initial value problem, p. 438
Section 5.2
Area under a curve, p. 438
Summation formulas, p. 459
THEOREM 5.1 LIMIT OF A RIEMANN SUM, p. 457
Section 5.3
Properties of definite integrals, p. 469 and p. 471
Geometric meaning of the definite integral, p. 472
Section 5.4
THEOREM 5.2 THE EVALUATION THEOREM, p. 477
THEOREM 5.3 THE FUNDAMENTAL THEOREM OF CALCULUS, p. 480
Section 5.5
Integration by substitution, p. 488
Substitution with definite integrals, p. 491
Section 5.6
Integration by parts, p. 496
Integration by parts with definite integrals, p. 499
Section 5.7
Numerical integration, p. 508
Error bounds, pp. 510, 512, 513
Section 5.1: Stink bug development; Weber-Fechner law; Rectilinear motion - Peregrine falcon
Section 5.2: Crop maturity (degree-days); Black Plague
Section 5.3: Growing grapes
Section 5.4: Horn increase for Bighorn Ram
Section 5.5: U.S. population growth; Breathing
Section 5.6: Survival to age t; Second-order chemical kinetics
Section 5.7: Crab harvest
Section 5.8: Survival and renewal; Fire ants; Savings for retirement; Cardiac output; Dye dilution; Thermodilution;
Work; Calorie consumption by working out; Hungry Grizzly Bear
Problem Set 5.9

1. Find the general antiderivative of $f(x) = \frac{1}{\sqrt{x}}$.
Evaluate the definite integrals in Problems 2 to 10.

2. $\int_0^4 (x^2 - 1)\,dx$
3. $\int_0^{\pi} (\sin x + x)\,dx$
4. $\int_{-1}^{1} e^{x+1}\,dx$
5. $\int_0^{1/2} \frac{dx}{1 - x^2}$
6. $\int_0^1 e^x \sin(\pi x)\,dx$
7. $\int_0^{\pi/2} t^2 \sin(2t)\,dt$
8. $\int_{-2}^{2} x e^{-x^2}\,dx$
9. $\int_1^4 \frac{dx}{(x+1)(x+2)}$
10. $\int_{-1}^{1} \frac{x+1}{(x+3)(x+2)}\,dx$
11. Find the area under the curve $y = \frac{x+1}{x}$ over $[1, 2]$.
12. The slope $F'(x) = \frac{x+1}{x^2}$ at each point is shown in Figure 5.50.
Figure 5.50: Slope field
Find F passing through (1, −2) both graphically and analytically.
13. The “Royalty” rose has a lower developmental threshold of 41.4°F and requires 473 degree-days to reach harvesting time.
If the temperature were to remain a constant 72°F, how long would it take for this rose to mature?
14. Find $\frac{dy}{dx}$ where
\[ y = \int_1^{2x} \sin(x^2)\,dx \]
15. Evaluate the following integral:
\[ \int \frac{dN}{N(100 - N)} \]
16. Consider a mental health clinic that initially has 300 patients, accepts 100 new patients per month, and for
which the fraction of patients receiving treatment for t or more months is given by f (t):
t (in months)    f(t)
0                1
3                0.5
6                0.3
9                0.2
12               0.1

Using Riemann sums with left endpoints, estimate the number of patients in the clinic after 12 months.
17. Find an upper bound for
\[ \int_{-2}^{2} 5\sin(x^3)\,dx. \]
18. The rate of infection of a disease in a population of 10,000 is given by the function
\[ R(t) = 10{,}000\, t e^{-t} \ \text{people per month} \]
where t is the time in months since the disease broke out.

(a) Use your graphing calculator to plot R(t). Why is this a reasonable description of a disease spreading in
a population?
Solution: The graph shows the rate of increase starting low (as only a few are infected), increasing to a
maximum, and decreasing to zero (as everyone gets infected).
(b) Compute the number of people infected by the disease by time T.
(c) (Requires a graphing calculator) Approximate the time when 50% of the population has the disease.
19. In a wild week of temperature fluctuations, the temperature in Corvallis is given by
\[ T(t) = 75 + t\cos(2\pi t)\ {}^{\circ}\text{F} \]
where t is measured in days. Find the number of degree-days that have elapsed for a beet armyworm over
the first week. Note: The lower developmental threshold of a beet armyworm is 54°F.
20. Express $\int_3^7 \tan x\,dx$ as the limit of a Riemann sum using right endpoints.
21. Express $\lim_{n\to\infty} \sum_{i=1}^{n} \frac{1}{5 + 2i/n}\cdot\frac{2}{n}$ as a definite integral.
22. Find $\int_1^2 \frac{x}{\sqrt{x+1}}\,dx$.
23. Suppose $\int_1^3 f(x)\,dx = 4$, $\int_2^3 f(x)\,dx = 5$, and $\int_1^2 g(x)\,dx = 6$. Find $\int_1^2 \left[f(x) - 2g(x)\right] dx$.
24. Use the geometric interpretation of the definite integral to find $\int_{-1}^{2} (1 - |x|)\,dx$. Be sure to provide a sketch.
25. A stone was dropped off a tower and hit the ground at a speed of 200 ft/second. What was the height of the
tower?
5.10 Group Projects

specific ideas is an important skill. Work with three or four other students to submit a single report based on each of
the following projects.
Project 5A: Physiological Time

In Section 5.1 you were introduced to the concept of developmental thresholds that defined the range of temperatures
over which plants and poikilothermic animals (those without an internal mechanism for maintaining their body
temperature within a narrow range of values as in homeothermic birds and mammals) grow and develop. In Section
5.2 this idea was articulated further through the concept of physiological time, as measured through the accumulation
of heat units called degree-days. The number of degree days that accumulate over time for a given temperature profile
is the area under the curve of this profile between the lower and upper thresholds, as illustrated in Figure 5.51. Note
that a lower threshold is always needed, to bound the area from below, but the calculated area is bounded
above either by the temperature curve itself or by an upper threshold, depending on which is the minimum for the time in
question. Thus, a lower threshold is always needed, but an upper threshold is only included as a refinement of
physiological time as a model for estimating the growth and developmental rates of plants and poikilotherms.

Figure 5.51: The solid line represents the continuous temperature that a plant or poikilotherm experiences and
the shaded area represents the accumulated degree-days that the organism in question will experience subject to
development being arrested above and below the upper and lower thresholds respectively.
If we could continuously measure the temperature in an orchard, for example, from the time of bud break (when the
first buds appear on otherwise bare trees), and we knew how many degree-days between the minimum and maximum
thresholds were needed until the trees come into blossom, we could use anticipated weather patterns from
historical data sets to predict the expected date of blossoming and make sure that we have honey bee hives in
the orchard in sufficient time to anticipate this event. Thus the calculation of degree-days helps growers optimize
their use of honey bee pollinators, the scheduling of harvest activities, and so on.
It is generally not possible or even desirable to continuously monitor the temperature of an orchard. Further
the temperature in an orchard varies somewhat from the ground to the tops of the trees, with temperatures on the
north, south, east, and west sides of trees varying among locations as well. Most growers have temperature gauges
that only record the maximum (max) and minimum (min) temperatures each day. This data can be used to generate
a degree-day calculation under the assumption that the maximum and minimum temperatures occur 24 hours apart
(as idealized in Figure 5.51) using an appropriate function (i.e. model) for interpolating the temperature between
each consecutive pair of max-min and min-max temperatures. If a linear function is used, and only a lower threshold
is assumed, the method is equivalent to constructing a sequence of right-angled triangles with either a rectangular
piece added below when the minimum temperature is above the threshold, or the base of the triangle is raised for
the case when the minimum temperature is below the lower threshold (as illustrated in Figure 5.52).
1. Use the double triangle method illustrated in Fig. 5.52 to calculate the number of degree-days accumulated
over a three day period in which the minimum and maximum temperatures in degrees centigrade are T =
{(5, 23), (7, 22), (4, 26), (5, not measured)} and the lower threshold is 0°C with no upper threshold assumed to
exist.
2. Instead of using a line to interpolate between min and max temperatures, use the rising first quarter phase of
a sine function to interpolate between the given min and max and the falling second quarter of a sine function
to interpolate between the max and min temperatures. This method is referred to as the double sine method
(Figure 5.53).
3. Using the double triangle method, recalculate the number of degree-days accumulated when the lower threshold
is 5°C, first for the case when there is no upper threshold, then when the upper threshold is 30°C, and finally
when the upper threshold is 25°C.
4. Repeat the previous exercise using the double sine method and compare your results with the double triangle
method.

Figure 5.52: The thick irregular line represents a hypothetical temperature profile that oscillates like a distorted
sine wave so that in every 24 hour period it has a maximum and a minimum value. The thin line is a linear
interpolation between these maximum and minimum values. The shaded quadrilaterals plus intervening non-shaded
quadrilaterals, all with their bases defined by the lower threshold temperature (dotted line: note the upper
threshold is above the max temperature in all cases and so does not apply), are the accumulation of degree-days
between consecutive min-max temperatures and max-min temperatures respectively. This method of accumulating
degree-days is called the double triangle method because two different “triangular looking” quadrilaterals are used
in every 24 hour cycle. Areas labeled b are not included when they should be, but this is balanced to some extent
by areas labeled a which are included when they should not be.
5. Use your precalculus knowledge of algebra and geometry to write down a general expression for the number
of accumulated degree-days under the double triangle method when the temperature profile is
$T = \{(m_1, M_1), (m_2, M_2), \ldots, (m_n, M_n)\}$ and the minimum and maximum developmental thresholds are $k$ and
$K$ respectively.
6. Use your knowledge of integral calculus to repeat the above exercise and write down a general expression for
the double sine method.
7. Find a real data set on the web of daily max and min temperatures that spans a several month period (if you
find a longer data set, select a several month subset) and use your double triangle and double sine formulae,
implemented in your favorite technology (e.g. a spreadsheet application, Mathematica, Maple or some other
programming language), to calculate the number of degree-days progressively accumulating each day from the
start to end date of your data if the lower and upper thresholds respectively are equal to the average min and
average max over the data. Plot these results on a graph of “accumulated degree-days to date” to provide a
visual sense of how much the two methods differ over time.
Project 5B: Life Histories and Population Growth (challenging!)

Every biological species has a life history characterized by two functions: the mortality function ℓ(x) and natality
function b(x). The interpretation of the first function is that ℓ(x) represents the proportion of individuals in a large
population that survive to age x or, in a small population, as we will see in Chapter 7 when we look at the relationship
between integration and probability theory, it represents the probability that any given individual will survive until
age x (which can be a fractional number). Thus ℓ(30) = 0.2 implies that only 20% of individuals in a population will
survive until age 30. Note that we don’t have to use years as our unit of time. In the case of fruit flies, for example,
a more appropriate measure of age is weeks or days.
The second function represents a force of natality, which only has a clear meaning in terms of being integrated
over some non-zero interval of ages x. For example, $\int_2^3 b(x)\,dx = 3.5$ implies that each individual in its third period of
life (i.e. from age 2 to age 3) is expected (on average) to produce 3.5 offspring. If these are sexually reproducing
organisms, then this implies that any male-female pair in their third year of life is expected to produce 7 offspring.

Figure 5.53: The thick irregular line represents a hypothetical temperature profile while the thin line, instead of being
a linear interpolation as depicted in the double triangle method, is a different quarter sine wave interpolation (of 12
hour duration) between each min-max and max-min pair of temperatures. This method of accumulating degree-days,
using quadrilaterals modified so that the top side is a quarter sine wave rather than a line, is called the double sine
method.
The theory we are about to explore assumes that either the species is clonal, or males and females have the same
life histories, or males and females have different gender specific life histories but only females are considered. In the
latter case, b(x) is interpreted as the force of natality of female progeny per reproducing female of age x; that is,
the statement $\int_2^3 b(x)\,dx = 3.5$ implies that each female is expected to have 3.5 daughters from age 2 to 3. Of course
some might have 0 and others might have 10, but the average for the age range in question is 3.5.
Demographers have shown, under assumptions of stationarity (a technical term that requires more advanced
concepts than we have to define it, but can be loosely thought of as a population that has an unchanging
age-structure over time), that the quantity
\[ R_0 = \int_0^{x_{\max}} \ell(x)\, b(x)\, dx \]
represents the number of individuals being born for every individual that dies, given that no individual lives beyond
age $x_{\max}$. This implies that the population is growing if $R_0 > 1$ and declining if $R_0 < 1$. Further, demographers have
shown that this rate of growth or decline is equivalent to the mathematical statement that $N_{t+G} = R_0 N_t$, where $G$
is the length of a generation, which is given by the integral
\[ G = \frac{\int_0^{x_{\max}} x\,\ell(x)\, b(x)\, dx}{R_0}. \]
Thus if we rescale time so $G = 1$, then this model implies that the population will have grown from an initial size
$N_0$ to a size $N_m = R_0^m N_0$ after $m$ generations.
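The two integrals for R0 and G can be evaluated numerically once ℓ(x) and b(x) are specified. The sketch below is illustrative only: the function names, the midpoint rule, and the particular life history (survivorship falling linearly to zero at age 10 with a constant natality of 0.3) are assumptions chosen so the answers can be checked by hand, and are not taken from the text. For this hypothetical life history the integrals give R0 = 1.5 and G = 10/3 exactly.

```python
def midpoint_integral(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def r_zero(ell, b, x_max):
    """R0 = integral from 0 to x_max of ell(x) * b(x) dx."""
    return midpoint_integral(lambda x: ell(x) * b(x), 0.0, x_max)


def generation_time(ell, b, x_max):
    """G = (integral from 0 to x_max of x * ell(x) * b(x) dx) / R0."""
    numerator = midpoint_integral(lambda x: x * ell(x) * b(x), 0.0, x_max)
    return numerator / r_zero(ell, b, x_max)


def population_after(n0, r0, m):
    """Population after m generations when time is rescaled so that G = 1."""
    return n0 * r0 ** m


# Hypothetical life history, for illustration only (not from the text).
X_MAX = 10.0


def survivorship(x):
    """ell(x): proportion surviving to age x, falling linearly to 0 at X_MAX."""
    return 1.0 - x / X_MAX


def natality(x):
    """b(x): constant force of natality."""
    return 0.3


if __name__ == "__main__":
    r0 = r_zero(survivorship, natality, X_MAX)
    g = generation_time(survivorship, natality, X_MAX)
    print("R0 ≈", round(r0, 4))
    print("G  ≈", round(g, 4))
    print("N after 2 generations from N0 = 100:", round(population_after(100.0, r0, 2), 2))
```

Since R0 > 1 here, the hypothetical population grows; replacing `survivorship` and `natality` with other functions lets one explore how the two integrals respond.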
1. If the proportion of individuals that die each time period in an age-specific cohort (i.e. a group of individuals
of the same age) is independent of their age, then the mortality schedule (curve, function) for the species in
question is said to be Type II. Demonstrate that the form
\[ \ell(x) = \begin{cases} e^{-rx} & 0 \le x \le x_{\max} \\ 0 & x > x_{\max} \end{cases} \]
is a Type II mortality curve on [0, xmax ] for some constant r > 0.

2. Species are said to have Type I mortality schedules if mortality rates are much higher in immature than mature
individuals (except, of course, for the very old) and Type III in the reverse case. By scouring the Internet or
other reference sources, identify 3-5 species conforming to each of the three mortality schedule types.
3. Over long periods of time, ecological processes ensure that most populations either stay the same size or go
extinct, since the finiteness of our world does not permit them to grow without bound. In the former case, we

expect in the long run (i.e. on average over time) that $R_0 = 1$, which implies:
\[ \int_0^{x_{\max}} \ell(x)\, b(x)\, dx = 1 \tag{5.1} \]
If a species has the mortality schedule given in Part 1 and a natality schedule $b(x)$ of the form
\[ b(x) = \begin{cases} 0 & 0 \le x < m \\ b & x \ge m \end{cases}, \]
where $m < x_{\max}$ (i.e. individuals start reproducing at age $m$, beyond which reproduction is the age-independent
rate of b progeny per time period), then for xmax = 100 explore the trade-offs in the values of r, b, and m, that
correspond to a stable population (i.e. satisfy equation 5.1) and provide an expression for the corresponding
generation time. (Hint: Integrate equation 5.1 to get a relationship between r, b, and m and then express one
of the parameters in terms of the other two. For selected values of one of the parameters you can then graph
relationships between the other two. What general statements can you make about these relationships?)
4. The mortality schedule
\[ \ell(x) = \begin{cases} \dfrac{1}{1 + (x/d)^2} & 0 \le x \le x_{\max} \\ 0 & x > x_{\max} \end{cases} \]
is of Type III on [0, xmax ], provided d > xmax , because mortality rates are relatively low until individuals
approach age d around which mortality rates increase strongly. Repeat the previous exercise with this mortality
schedule instead of the Type II schedule, looking at trade-offs in the values of d, b, and m.

Chapter 6
Differential Equations
6.1 A Modeling Introduction to Differential Equations, p. 549
6.2 Separable Equations, p. 562
6.3 Linear Models in Biology, p. 572
6.4 Slope Fields and Euler’s Method, p. 584
6.5 Phase Lines and Classifying Equilibria, p. 601
6.6 Bifurcations, p. 616
Preview
Equations containing at most two variables, and derivatives of the first or higher order of one of the variables with
respect to the other are known as ordinary differential equations (or ODEs, for short). For example, the
equations
\[ \frac{dy}{dt} = 3y(1 - y) \]
\[ \sin\left(\frac{dy}{dt}\right) = \frac{d^2 y}{dt^2} + \cos t \]
are both ordinary differential equations. These equations, however, are anything but “ordinary,” and have been
used successfully to describe extraordinary things such as planetary motion, sudden population disappearances, the
collapse of the Tacoma Narrows Bridge, nerve impulses, electrical circuits, and the love between Laura and Petrarch.
For example, in Example 4 of Section 6.6 we use differential equations to explore how populations of neurons can
store memories.
A solution to an ODE is a function y(t) that satisfies the equation in question over a specified interval of time:
that is, the derivative of y(t) over the interval in question gives the identical function of time as the right-hand-side
of the equation. However, this may only be seen once the right-hand-side has been reduced to its simplest form. In
this chapter, we will tackle differential equations in three ways. First, after introducing some basic terminology and
models, we will derive analytical solutions for special types of ODEs using our integration techniques. Second, as
ODEs often cannot be solved explicitly, we will introduce techniques that shed light into the qualitative behavior
of ODEs. In the words of the brilliant mathematician J. Henri Poincaré (1854-1912), winner of the coveted King
Oscar’s Prize in 1889:
In the past an equation was only considered solved when one had expressed the solution with the aid
of a finite number of known functions; but this is hardly possible one time in a hundred. What we
should always try to do, is to solve the qualitative problem, that is to find the general form of the curve
representing the unknown function.∗

Figure 6.1: We are able to think, move, and eat because of collections of neuron cells in our bodies. This photograph
shows the neurons of a ground squirrel. The specimen was prepared by Professor Brian Boycott using the Golgi
technique and was photographed by Dr Jonathan Clarke of UCL Anatomy and Developmental Biology.

Third, we will use technology to generate and visualize numerical solutions to these ODEs. To this end, we discuss
a numerical method, Euler’s method, and the use of technology. By no means will this discussion be exhaustive.
This chapter only provides you with a tantalizing taste of this powerful mathematical construct which has been used
extensively in biological modeling.
∗ Address to the International Congress of Mathematicians in 1908. Translation by Morris W. Hirsch.

6.1 A Modeling Introduction to Differential Equations

Differential equations can be used to describe how quantities change continuously over time. Since understanding
nature inspired much of mathematics, it is only natural to begin with some models of population growth that motivate
the techniques.
Exponential population growth and decay

Consider a population of yeast in the flask illustrated in Figure 6.2.
Figure 6.2: Density of yeast in a flask. Source: The Struggle for Existence by G. F. Gause
At the beginning of the 20th century several notable biologists including G. F. Gause and T. Carlson studied
the population dynamics of yeast. For example, T. Carlson grew yeast under constant environmental conditions in
a flask. He regularly monitored their densities. The resulting data is shown in Figure 6.3.†
Figure 6.3: The Carlson yeast data (yeast density plotted against time in hours)

† Über Geschwindigkeit und Grösse der Hefevermehrung in Würze. Biochem. Z.57: 313-334, 1913

One of the goals of this section will be to come up with a model that describes Carlson’s yeast data. In developing
this model, or for that matter any other model in this text, we first select a modeling paradigm, a general conception
or world view on how a particular class of processes or systems should be modeled and how the parameters in the
model relate to what can be measured in practice. Within this paradigm, we adhere to the Principle of Parsimony,
an operational principle used in science that requires we begin with the simplest model. It is also known as Occam’s
Razor, or more simply (using the vernacular) as the KISS principle—Keep it simple, stupid! Because when it comes
to the complexities of nature, we are all inherently stupid. Consequently, we need to begin by formulating and
analyzing the simplest possible model and introduce elaborations only as necessary.
In keeping with the KISS principle, we begin by modeling the initial growth phase of the number of cells in a
yeast culture.
Example 1. Constant per-capita growth rate paradigm
The growth of a population is determined by four processes: birth, death, immigration, and emigration. The
simplest model follows from the following three assumptions.
System is closed. There is no immigration or emigration.
Constant per-capita birth rates. The per-capita birth rate b > 0 is constant, so the total birth rate bN is proportional
to the population density N: the more individuals in the population, the greater the birth rate.
Constant per-capita death rates. The per-capita death rate d > 0 is constant, so the total death rate dN is proportional
to the population density N: the more individuals in the population, the greater the death rate.
Write down a differential equation model that embodies these assumptions.
Solution. Let N denote the population density and t time. Under the stated assumptions, the model is:
$$
\begin{aligned}
\frac{dN}{dt} &= \text{Birth Rates} - \text{Death Rates} \\
&= bN - dN \\
&= (b - d)N \\
&= RN \qquad \text{setting } R = b - d
\end{aligned}
$$
Here the birth minus death rate R = b − d is referred to as the intrinsic growth rate, but is sometimes called the
instantaneous per-capita growth rate because the above equation can be rearranged to reveal $R = \frac{1}{N}\frac{dN}{dt}$. □
A solution of this equation, $\frac{dN}{dt} = RN$, is a function N such that
$$N'(t) = R\,N(t)$$
Let us try to understand the solutions of this differential equation qualitatively and analytically.
A qualitative analysis involves discovering the qualitative behavior of solutions. In other words, it involves
determining whether the solutions are increasing, decreasing, remaining constant, or even oscillating without worrying
about the exact form of the solution.
Example 2. Qualitative behavior of the constant per-capita growth rate model
Consider the growth of a population modeled by

$$\frac{dN}{dt} = RN, \qquad N(0) > 0$$
Note that we have assumed that the initial value of the population N(0) at time t = 0 is positive. Discuss how the
behavior of the population depends on the sign of R.
Solution.
Case R = 0. In this case $\frac{dN}{dt} = RN = 0$, which implies that the rate of change $\frac{dN}{dt}$ is zero for all t. Hence, the
population density N(t) remains constant over time.
Case R > 0. In this case $\frac{dN}{dt} = RN > 0$, which implies that the population growth rate $\frac{dN}{dt}$ is positive for all t.
Hence, the population density increases indefinitely over time.
Case R < 0. In this case $\frac{dN}{dt} = RN < 0$, which implies that the population growth rate $\frac{dN}{dt}$ is negative for all t.
Hence, the population decreases indefinitely over time.
□
The three qualitative cases in this example correspond to three regimes of population behavior.
Constancy The case R = 0 is implied by b = d: that is, the per-capita birth and death rates balance each other
implying that the population will neither grow nor decline.
Growth The case R > 0 is implied by b > d > 0: that is, the per-capita birth rate exceeds the per-capita death
rate implying that the population will increase over time.
Decay The case R < 0 is implied by d > b > 0: that is, the per-capita death rate exceeds the per-capita birth rate
implying that the population will decrease over time.
All of these qualitative predictions were readily made by looking at the sign of the right hand side of N ′ = R N .
General methods for making these predictions are discussed further in Sections 6.4 and 6.5.
In contrast to a qualitative analysis, an analytical approach involves finding explicit solutions to differential
equation models. For this constant per-capita growth rate model, finding an analytical solution means finding a
function N (t) such that its derivative is R times itself: that is, N ′ (t) = R N (t). If we consider the derivatives of all
the elementary functions we know, then a little thought reveals that
$$N(t) = e^{Rt}$$
has the desired property. To demonstrate this, it suffices to check that
$$N'(t) = R e^{Rt} = R N(t)$$
More generally, $N(t) = C e^{Rt}$ is a solution for any choice of the constant C. Indeed,
$$N'(t) = C R e^{Rt} = R\,C e^{Rt} = R N(t).$$
You might ask, what does C represent? The answer is that since $N(0) = C e^{R \cdot 0} = C$, C represents the initial
population density. Furthermore, because N(t) is an exponential function when C > 0, the second and third
qualitative cases considered above are more accurately referred to as
Exponential growth: If R > 0, then the population density exhibits exponential growth, which is unbounded over
time. (No matter how large a number we choose, a time will come when the population exceeds that number, which is a
biological impossibility.)
Exponential decay: If R < 0, the population declines at an exponential rate and will asymptotically approach zero
with increasing time.
Using the solution $N(t) = C e^{Rt}$, we can determine how well our simple model dN/dt = RN fits the initial phase of
Carlson’s yeast data.
Example 3. Carlson’s data: exponential growth
Table 6.1 shows the data that T. Carlson gathered in the study of a growing yeast culture.∗
∗ Ibid.

Table 6.1: Population densities (number/unit volume) for a growing yeast culture at one hour intervals
Time Population Time Population Time Population
0 9.6 6 174.6 12 594.8
1 18.3 7 257.3 13 629.4
2 29.0 8 350.7 14 640.8
3 47.2 9 441.0 15 651.1
4 71.1 10 513.3 16 655.9
5 119.1 11 559.7 17 659.6
As illustrated in Figure 6.3, the initial phase of population growth appears to be exponential. Use the data to
estimate the parameters C and R for $N(t) = C e^{Rt}$, where t is measured in hours.
Solution. Recall that C represents the initial population density, so C = N(0). Hence, C = 9.6. To estimate R, we
can choose another data point, say N(3) = 47.2 (bearing in mind that a different data point would yield a similar,
but different graph), and solve
$$
\begin{aligned}
N(t) &= C e^{Rt} \\
N(3) &= 9.6\,e^{3R} && \text{Substitute known value for } C. \\
47.2 &= 9.6\,e^{3R} && \text{Substitute known value for } N(3). \\
3R &= \ln(47.2/9.6) && \text{Definition of logarithm.} \\
R &\approx 0.53
\end{aligned}
$$
Since the time is in hours, R ≈ 0.53 per hour.
Figure 6.4: The Carlson yeast data and derived equation (density versus hours)
A plot of $N(t) = 9.6\,e^{0.53t}$ against the data is shown in Figure 6.4. Note the equation we derived seems to fit the
data well, at least until t = 6, and it passes through the points we used to derive the equation. If it did not, there
would have been some error in our calculations. □
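The two-point estimate in Example 3 can be reproduced with a few lines of code. The following is a minimal sketch, not part of Carlson's analysis: the names `estimate_rate` and `exponential_model` are our own, and only the first nine entries of Table 6.1 are used. It prints the observed densities next to the model values so that the hour at which the exponential curve starts to overshoot can be read off directly.

```python
import math

# Carlson yeast densities from Table 6.1 for hours 0 through 8.
hours = list(range(9))
density = [9.6, 18.3, 29.0, 47.2, 71.1, 119.1, 174.6, 257.3, 350.7]

def estimate_rate(n0, n_t, t):
    """Solve n_t = n0 * exp(R * t) for R (two-point estimate)."""
    return math.log(n_t / n0) / t

C = density[0]                          # initial density N(0)
R = estimate_rate(C, density[3], 3)     # uses N(0) and N(3), as in Example 3

def exponential_model(t, c=C, r=R):
    """Exponential model N(t) = c * exp(r * t)."""
    return c * math.exp(r * t)

print(f"estimated R = {R:.2f} per hour")
for t, observed in zip(hours, density):
    print(f"t={t}  observed={observed:6.1f}  model={exponential_model(t):6.1f}")
```

Replacing `density[3], 3` with another data point gives a slightly different R, which is the sensitivity mentioned in the solution.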
Using the exponential model of growth, we can estimate the doubling time for the yeast population.
Example 4. The yeast doubling time
For a population satisfying the equation
$$\frac{dN}{dt} = 0.53\,N$$
find the time in hours for the population to double.
Solution. Previously, we found a solution to this differential equation in the form
$$N(t) = C e^{0.53t}$$
where C is the initial population size. To find the doubling time in hours, we need to find t such that N(t) = 2N(0) =
2C. Hence, we need to solve
$$
\begin{aligned}
2C &= C e^{0.53t} && \text{Solve this equation for } t. \\
2 &= e^{0.53t} && \text{Divide both sides by } C. \\
0.53t &= \ln 2 && \text{Definition of logarithm.} \\
t &= \frac{\ln 2}{0.53} \approx 1.31 && \text{Divide both sides by } 0.53.
\end{aligned}
$$
The doubling time is about 1 hr 18 min. This conclusion is consistent with the data for the first few hours. For
instance, after 3 × 1.31 ≈ 4 hours, the yeast density has increased approximately by a factor of $2^3 = 8$. □
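The same calculation works for any positive growth rate, since the initial density C cancels. As a small illustration (the function name `doubling_time` is our own, hypothetical choice), the sketch below evaluates ln 2 / R and converts the result to hours and minutes.

```python
import math

def doubling_time(rate):
    """Time for N(t) = C*exp(rate*t) to double; only defined for rate > 0."""
    if rate <= 0:
        raise ValueError("doubling time requires a positive growth rate")
    return math.log(2) / rate

t_double = doubling_time(0.53)                      # rate from Example 3, per hour
hours_part = int(t_double)                          # whole hours
minutes_part = round((t_double - hours_part) * 60)  # remaining minutes
print(f"{t_double:.2f} hours = {hours_part} hr {minutes_part} min")
```

Note that the answer does not depend on C, so the doubling time is the same at every stage of exponential growth.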
As you will see in the problem set and in Section 6.3, this simple exponential model dN/dt = RN can also be
used to model radioactive decay, decay of a drug in the blood stream, and decay of the number of viral particles in
the blood of an individual treated with drugs.
Logistic Growth
While the exponential model provides a reasonable fit for the initial growth of the yeast population, it begins
significantly overestimating the population density during the 7th and 8th hours. Moreover, the actual yeast data
asymptotically approaches a density of around 660, while the exponential growth model exhibits unbounded growth.
This phenomenon of decreasing per-capita growth rates with increasing population density was first elaborated by
Thomas Malthus (1766-1834) in his treatise “An Essay on the Principle of Population” published in 1798.
Malthus recognized that as populations get larger, their per-capita growth rate declines due to limited resources and
interference among individuals. To deal with these limitations, we modify our model, again using the Principle of
Parsimony. The data in Table 6.1 can be used to estimate the per-capita growth rate, R, of yeast as a function of
population density, N . Looking at the data plotted in Figure 6.5, the per-capita growth rate is clearly a decreasing
function.
Figure 6.5: Per-capita growth rate as a function of density for the Carlson yeast data
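One way to estimate the per-capita growth rate from Table 6.1 is sketched below. We assume, for illustration only, that over each one-hour interval the rate is approximated by ln(N(t + 1)/N(t)); this estimator is our choice and is not necessarily the one used to produce Figure 6.5. The helper name `per_capita_rates` is likewise hypothetical.

```python
import math

# Table 6.1: yeast density at hours 0 through 17.
density = [9.6, 18.3, 29.0, 47.2, 71.1, 119.1, 174.6, 257.3, 350.7,
           441.0, 513.3, 559.7, 594.8, 629.4, 640.8, 651.1, 655.9, 659.6]

def per_capita_rates(values, dt=1.0):
    """Return (density, rate) pairs with rate = ln(N[t+dt] / N[t]) / dt."""
    pairs = []
    for current, following in zip(values, values[1:]):
        pairs.append((current, math.log(following / current) / dt))
    return pairs

rates = per_capita_rates(density)
for n, rate in rates:
    print(f"N={n:6.1f}  estimated per-capita rate={rate:.3f}")
```

The estimates are noisy at low densities, but they fall toward zero as the density approaches the upper end of the table.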
The exact form of R as a function of N is not uniquely determined. In the words of Raymond Pearl, a professor
of Biometry at Johns Hopkins University in the 1920s and 1930s:∗
∗ 1930, cf. Guass gfg04.htm at www.ggause.com, pp. 407-408.

It should be made clear at the start that there is, unfortunately, no methods known to mathematics
which will tell anyone in advance of the trial what is either the correct or even the best mathematical
function with which to graduate a particular set of data. The choice of the proper mathematical function
is essentially, at its very best, only a combination of good judgment and good luck.
According to the Parsimony Principle, we begin with the simplest decreasing function of N with positive intercept
on the R axis, which is the linear function. Let K denote the horizontal intercept and r the vertical intercept of
this linear function. In other words, we choose the per-capita growth rate R(N ) to be the linear function R(N ) =
r(1 − N/K). For reasons that become obvious in the next example, the value N = K is called the environmental
carrying capacity for the population. The parameter r is called the intrinsic growth rate. Under these assumptions,
we obtain the so-called logistic equation, which is arguably the single most important equation in population ecology.
Logistic Equation. The equation
$$\frac{dN}{dt} = r\left(1 - \frac{N}{K}\right)N$$
is known as the logistic equation with intrinsic growth rate r and the environmental or population carrying
capacity K. Note that the parameters r and K are the intercepts on the R and N axes of the instantaneous
per-capita growth rate function R(N) = r(1 − N/K).
What can we say about the behavior of the solutions to the logistic equation? We can readily answer this question
with a qualitative analysis. Finding explicit solutions will have to wait until the next section.
Example 5. Qualitative analysis of the logistic equation
Assuming that r > 0 and K > 0, describe qualitatively how solutions to the logistic equation depend on the
initial value of N .
Solution. Qualitatively there are three types of solutions when initially N ≥ 0.

Equilibrium solution: If N(0) = 0 or N(0) = K, then one can prove that $\frac{dN}{dt} = rN(1 - N/K) = 0$ for all time
t ≥ 0 (see Problem 17). Since the growth rate of the population is zero for all t ≥ 0, the population density cannot
change over time, so that N(t) = 0 for all t ≥ 0 or N(t) = K for all t ≥ 0, depending on which of the two initial
conditions applies. Such unchanging (i.e. constant) solutions are called equilibrium solutions and are illustrated
in Figure 6.6a.
Figure 6.6: Different solutions for the logistic equation (N plotted against t): a. Equilibrium solution; b. Increasing and saturation; c. Decreasing and saturation
Increasing and saturating: If 0 < N < K, then rN (1 − N/K) > 0 and the population growth rate is positive.
For a population starting between 0 and K, we expect the population density to increase, as will be shown to be
true once we have solved these equations in the next section. However, since dN/dt gets close to zero as N gets close
to K, we would expect the population to increase less rapidly as it approached K and to asymptotically saturate at
K as illustrated in Figure 6.6b.
Decreasing and saturating: If N > K, then rN(1 − N/K) < 0 and the population growth rate is negative. In
this case the population density declines over time. As dN/dt is barely negative for N slightly larger than K, we
expect the population density to decline less rapidly as it approaches K and to asymptotically
level off at K. This is illustrated in Figure 6.6c.
Hence, as long as N(0) > 0, we expect the population density to approach the carrying capacity K of the
environment, again as will be seen to be true once we have solved the logistic equation in the next section. □
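The qualitative analysis in Example 5 only uses the sign of the right-hand side rN(1 − N/K). The sketch below makes that explicit; the function names and the values r = 0.5 and K = 100 are hypothetical choices made purely for illustration.

```python
def logistic_rate(n, r, k):
    """Right-hand side of the logistic equation dN/dt = r*N*(1 - N/K)."""
    return r * n * (1 - n / k)

def classify(n, r, k):
    """Label the behavior of a solution currently at density n."""
    rate = logistic_rate(n, r, k)
    if rate == 0:
        return "equilibrium"
    return "increasing" if rate > 0 else "decreasing"

r, k = 0.5, 100.0   # illustrative values only, not taken from the text
for n0 in [0.0, 10.0, 50.0, 100.0, 150.0]:
    print(f"N = {n0:5.1f}: {classify(n0, r, k)}")
```

The three labels correspond to the three types of solutions in Figure 6.6: equilibrium at N = 0 and N = K, increasing for 0 < N < K, and decreasing for N > K.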
Example 6. Logistic model for the yeast data
Parametrize the logistic model for the Carlson yeast data in Example 3.
Solution. The Carlson data suggests the population density is approaching an asymptotic value of 660. Hence, we
choose K = 660. To estimate r, notice that when N is small
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{660}\right) \approx rN$$
In other words, at low densities we expect to see approximately exponential growth. Using our work from Example 3,
we set r = 0.53. Thus the specific logistic equation in this case is
$$\frac{dN}{dt} = 0.53\,N\left(1 - \frac{N}{660}\right)$$
□
How well does the model presented in this example fit Carlson’s yeast population density data over time? In the
next section we analytically derive the solution to the logistic equation so the data and model output (population
density trajectory) can be compared. A quick peek forward to Figure 6.11 shows that the model fits the data
surprisingly well. In the problem set, we will see examples where the logistic model provides a reasonable fit to the
spread of AIDS and the ascent of the video recorder (VCR) in the United States.
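Before the analytical solution is available, the fit can be explored numerically. The sketch below is an assumption-laden illustration rather than the method of this text: it approximates the solution of dN/dt = 0.53N(1 − N/660) with small forward Euler steps (a standard numerical scheme that we substitute for the exact solution), starting from N(0) = 9.6. The function name `simulate_logistic` and the step size are our own choices.

```python
def simulate_logistic(n0, r, k, t_end, dt=0.001):
    """Approximate N(t_end) for dN/dt = r*N*(1 - N/K) using forward Euler steps."""
    steps = round(t_end / dt)
    n = n0
    for _ in range(steps):
        n += dt * r * n * (1 - n / k)   # one Euler step of length dt
    return n

# A few observed densities from Table 6.1 (hour -> density).
observed = {0: 9.6, 4: 71.1, 8: 350.7, 12: 594.8, 17: 659.6}

for hour, value in observed.items():
    model = simulate_logistic(9.6, 0.53, 660.0, hour)
    print(f"t={hour:2d}  observed={value:6.1f}  model={model:6.1f}")
```

Unlike the exponential model, the simulated density levels off below K = 660 instead of growing without bound.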
External Influences on Populations

In addition to understanding the implications of models for the behavior of the populations they describe, it is
important to be able to extend models to account for external influences on these populations. To do this we need
to extend the models to incorporate elements not included in the initial model. In the 1970’s a bio-mathematician
by the name of Colin W. Clark at the University of British Columbia, building on the work of others, invented a
new field which he called Mathematical Bioeconomics∗ , essentially based on extending the logistic equation (and
generalized versions of this equation) to account for the economic aspects of harvesting biological populations. The
most important applications of this theory were in the whaling and fisheries industries, although current theory has
been further developed to account for the fact that size of individuals within any population varies with the age of
those individuals. His analysis is based on logistically growing populations that are harvested at a rate h(t) over
time:
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) - h(t).$$
Two special cases of particular interest are:
Constant Harvesting: In this case, the harvest rate is h(t) = h, where h is a constant value for all t for which
N(t) > 0. Obviously if harvesting drives N(t) to 0, as it has in some real populations, then h(t) is necessarily
0 once N(t) = 0.
Proportional Harvesting: In this case, the harvest rate is h(t) = vN (t), where the constant of proportionality
v > 0 is also called the harvesting effort variable.
∗ Colin W. Clark, Mathematical Bioeconomics: the Optimal Management of Renewable Resources, John Wiley & Sons, New York,
1976.

Figure 6.7: Conch are harvested for home reef aquariums, as well as for their beautiful shells. Source:
www.etropicals.com/product
Example 7. Harvesting queen conchs
Consider a population of queen conchs in the Bahamas that, in the absence of harvesting, exhibits logistic growth.
Let N represent the number of conch in a well-defined area and t be measured in years. For ease of computation,
let us assume that the intrinsic growth rate of this population is r = 10 and the carrying capacity of the area in
which the conch are located is K = 10,000 individuals.
a. Write down a logistic harvesting model for the case where 21,000 individuals are removed from the
population every year.
b. Determine qualitatively the fate of the population and how it depends on the initial number of conch in
the population.
c. Discuss what happens if the harvesting rate is h = 30,000 conch per year.
Solution.
a. Since we are harvesting at a constant rate of 21,000 individuals per year, the model is
$$\frac{dN}{dt} = 10N\left(1 - \frac{N}{10{,}000}\right) - 21{,}000$$
b. The qualitative analysis boils down to understanding for what values of N is
$$\frac{dN}{dt} = 0, \qquad \frac{dN}{dt} < 0, \qquad \text{or} \qquad \frac{dN}{dt} > 0$$
Case I: $\frac{dN}{dt} = 0$:
$$
\begin{aligned}
\frac{dN}{dt} &= 0 \\
10N\left(1 - \frac{N}{10{,}000}\right) - 21{,}000 &= 0 \\
N^2 - 10{,}000N + 21{,}000{,}000 &= 0 && \text{Expanding and multiplying by } -1{,}000 \\
(N - 3{,}000)(N - 7{,}000) &= 0 \\
N &= 3{,}000 \text{ or } 7{,}000
\end{aligned}
$$
These are the equilibrium values, i.e. values where $\frac{dN}{dt} = 0$.
Case II: $\frac{dN}{dt} < 0$:
From our work in Case I, we see this is true if N < 3,000 or N > 7,000. Hence if N > 7,000, then the
population would decrease, but decrease more slowly as N approaches 7,000 (i.e. dN/dt is close to zero for
N near 7,000). Consequently, if N > 7,000, we would expect the population to decrease and to saturate
at 7,000 (we say expect, because the notions expressed here can only be made more precise once we
have additional theory under our belts). Alternatively if N < 3,000, the population would continually
decrease to 0 (i.e. extinction) as dN/dt becomes more and more negative as N continues to decrease.
Case III: $\frac{dN}{dt} > 0$:
From our work in Case I, we see this is true if 3,000 < N < 7,000. We would expect the population to
increase, increase more slowly as it approaches 7,000, and to saturate at 7,000.
c. First we note that the conch growth rate
$$F(N) = 10N\left(1 - \frac{N}{10{,}000}\right)$$
has a maximum at N = 5,000 with
$$F(5{,}000) = 50{,}000\left(1 - \frac{1}{2}\right) = 25{,}000$$
Thus the right-hand side of the conch growth equation subject to a harvesting rate of 30,000 conch per
year satisfies
$$\frac{dN}{dt} = 10N\left(1 - \frac{N}{10{,}000}\right) - 30{,}000 \le -5{,}000 \quad \text{for all } N$$
Hence harvesting the population at this rate will drive it to extinction.
□
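The equilibrium calculation in Example 7 amounts to solving a quadratic equation, so it can be checked with the quadratic formula. In the sketch below, the function name `harvest_equilibria` is a hypothetical helper of our own; it rewrites rN(1 − N/K) − h = 0 as N² − KN + hK/r = 0 and returns the real roots, if any.

```python
import math

def harvest_equilibria(r, k, h):
    """Real roots of r*N*(1 - N/k) - h = 0, rewritten as N**2 - k*N + h*k/r = 0."""
    disc = k * k - 4 * h * k / r
    if disc < 0:
        return []                 # harvest exceeds the maximum growth rate r*k/4
    root = math.sqrt(disc)
    return sorted({(k - root) / 2, (k + root) / 2})

print(harvest_equilibria(10, 10_000, 21_000))   # part (b): two equilibria
print(harvest_equilibria(10, 10_000, 30_000))   # part (c): no equilibria
print(10 * 10_000 / 4)                          # maximum of F(N), attained at N = K/2
```

An empty list means dN/dt is negative for every N, which is the situation in part (c) where the population is driven to extinction.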
Problem Set 6.1

Write a differential equation to model the situation in Problems 1 to 8. Do not try to solve.
1. The number of bacteria in a culture grows at a rate that is proportional to the number of bacteria present.
2. A sample of radium decays at a rate that is proportional to the amount of radium present in the sample.
3. In Section 6.3, we will introduce Newton’s law of cooling. Newton’s law states that the rate at which the
temperature of a body changes is proportional to the difference between the body’s temperature T and the
ambient temperature A.
4. In Section 6.3, we will study the von Bertalanffy growth equation. As part of that study, we will formulate
a differential equation which states that the rate at which the mass, M, of a healthy critter grows through
absorption of food is directly proportional to its surface area $L^2$ and declines through respiration at a rate
proportional to its mass $L^3$.
5. According to Benjamin Gompertz (1779-1865) the growth rate of a population is proportional to the number
of individuals present, where the factor of proportionality is an exponentially decreasing function of time.
6. When a person is asked to recall a set of N facts, the rate at which the facts are recalled is proportional to the
number of relevant facts in the person’s memory that have not yet been recalled.
7. The rate at which an epidemic spreads through a community of P susceptible people is proportional to the
product of the number of people y who have caught the disease and the number P − y who have not.

8. The rate at which people are implicated in a government scandal is proportional to the product of the number
N of people already implicated and the number of people involved who have not yet been implicated.
A population model for Problems 9 to 12 is given by

$$\frac{dP}{dt} = P(100 - P)$$
where P(t) denotes population density at time t.
9. For what values is the population at equilibrium?
10. For what values is $\frac{dP}{dt} > 0$?
11. For what values is $\frac{dP}{dt} < 0$?
12. Describe how the fate of the population depends on the initial density.
A population model for Problems 13 to 16 is given by

$$\frac{dP}{dt} = P(P - 1)(100 - P)$$
where P(t) denotes population density at time t.
13. For what values is the population at equilibrium?
14. For what values is $\frac{dP}{dt} > 0$?
15. For what values is $\frac{dP}{dt} < 0$?
16. Describe how the fate of the population depends on the initial density.
17. Prove that if N(0) = 0 or N(0) = K, then the solution to the equation $\frac{dN}{dt} = rN(1 - N/K)$ is respectively
N(t) = 0 or N(t) = K for all t.
Radioactive decay: Certain types of atoms (e.g. carbon-14, xenon-133, lead-210, etc.) are inherently unstable.
They exhibit random transitions to a different atom while emitting radiation in the process. Based on experimental
evidence, Rutherford found in the early 20th century that the number, N , of atoms in a radioactive substance can be
described by the equation
$$\frac{dN}{dt} = -\lambda N$$
where t is measured in years and λ > 0 is known as the decay constant. The decay constant is found experimentally
by measuring the half-life, τ, of the radioactive substance (i.e. the time it takes for half of the substance to decay).
Use this information in Problems 18 to 22.
18. Find a solution to the decay equation assuming that $N(0) = N_0$.
19. For xenon-133, the half-life is 5 days. Find λ. Assume t is measured in days.
20. For carbon-14, the half-life is 5,568 years. Find the decay constant λ, assuming t is measured in years.
21. How old is a piece of human bone which contains just 60% of the amount of carbon-14 expected in a sample
of bone from a living person, assuming the half-life of carbon-14 is 5,568 years?
22. The Dead Sea Scrolls were written on parchment at about 100 B.C. What percentage of carbon-14 originally
contained in the parchment remained when the scrolls were discovered in 1947?

23. King Arthur’s Round Table: In Winchester castle there hangs a wooden round table, 18 feet in diameter
and divided into 25 sections, one for the King and 24 for the knights. There has been speculation that the
Winchester round table was King Arthur’s round table from the 5th century.∗ We know that the round table
has been at Winchester since the 15th century. John Harding says in his chronicle (1484) that the round table
“ended at Winchester, and there it hangs still.” To put an end to the speculation regarding the Winchester
round table, in 1976 it was taken down from the wall and a series of tests were employed in order to determine
the date of origin. The rate of decay of carbon-14 in the table (i.e. in dead wood) was found to be 6.08 atoms
per minute per gram of sample. Estimate the age of the table to determine whether the Winchester table was
King Arthur’s round table. Hint : Use the fact that the half-life of carbon-14 in dead wood is 5,568 years and
in living wood the rate of decay of carbon-14 is 6.68 atoms per minute per gram of wood.
24. The Shroud of Turin is a rectangular linen cloth kept in the Chapel of the Holy Shroud in the cathedral of
St. John the Baptist in Turin, Italy. It shows the image of a man whose wounds correspond with the biblical
accounts of the crucifixion.
In 1389, Pierre d’Arcis, the Bishop of Troyes, wrote a memo to the Pope, accusing a colleague of passing off a
certain cloth, cunningly painted, as the burial shroud of Jesus Christ. Despite this early testimony of forgery,
this so-called Shroud of Turin has survived as a famous relic. In 1988, a small sample of the Shroud of Turin
was taken and scientists from Oxford University, the University of Arizona, and the Swiss Federal Institute of
Technology were permitted to test it. Suppose the cloth contained 92.3% of the original amount of carbon-14.
For this Quest, use this information to determine the age of the Shroud.
25. Consider the queen conch logistic growth model presented in Example 7 with a general harvesting function
h(t):
$$\frac{dN}{dt} = 10N\left(1 - \frac{N}{10{,}000}\right) - h(t).$$
a. Describe the qualitative behavior of solutions to this equation (i.e. the long term abundance of the
population) when h(t) = 25, 000 for all t.
b. Describe the qualitative behavior of solutions to this equation (i.e. the long term abundance of the
population) when h(t) = 5N for all t.
c. Describe the qualitative behavior of solutions to this equation (i.e. the long term abundance of the
population) when h(t) = 12N for all t.
26. The cane toad (Bufo marinus), was introduced to Australia by the sugar cane industry to control two pests of
sugar cane, the grey backed cane beetle, and the frenchie beetle.∗ One-hundred-one toads arrived at Edmonton
in North Queensland in June 1935. Unfortunately, due to an asynchrony between the life cycles of the cane
toad and the sugar cane pests, the cane toad did not help suppress the cane beetle and the frenchie beetle.
However, the cane toads ate almost everything else and grew at a tremendous pace. Now the cane toad is a
major pest in Australia. The data in Table 6.2 describe the extent of the area occupied by the cane
toads as a function of time.
∗ From Applying Mathematics: A Course in Mathematical Modeling by D.N. Burghes, I. Huntley, J. Mc-Donald, Halsted Press, 1982.
∗ Adapted from Differential Equations by Blanchard, Devaney, and Hall, 1998, Brooks/Cole Publishing Company.

Table 6.2: Area occupied (in km²) by the cane toad in Australia
Year Area
1939 32,800
1944 55,800
1949 73,600
1954 138,000
1964 257,000
1969 301,000
1974 584,000
A simple model to describe this data is given by
$$\frac{dA}{dt} = RA$$
where A(t) is the area occupied at time t (years).
a. Use the data to find a solution to this model such that A(0) = 32,800 and A(10) = 73,600, and estimate
the parameter R.
b. Estimate the area occupied by cane toads in 2004.
c. Modify this model to account for removing cane toads at a rate H km²/yr beginning in year 2004.
Determine how large H needs to be to ensure that A starts decreasing.
27. Consider the following problem of historical curiosity. The percentage of U.S. households that owned a VCR rose
steadily from the time of their introduction in the late 1970s to the point at which other technologies displaced
them. Let y(t) denote the percentage of U.S. households with a VCR where t is measured in years from 1980
to 1991.
Year 1978 1979 1980 1981 1982 1983 1984

% 0.3 0.5 1.1 1.8 3.1 5.5 10.6
Year 1985 1986 1987 1988 1989 1990 1991
% 20.8 36 48.7 58 64.6 71.9 71.9
Assume that
$$\frac{dy}{dt} = ry\left(1 - \frac{y}{K}\right)$$
can be used to describe the data.
a. Use the first and third data points and the approximation $\frac{dy}{dt} \approx ry$ when y is small compared with
K to estimate r.
b. Use the fact that the data are saturating to estimate the value of K.
28. In the previous problem, compare the estimates obtained for r using growth from 1981 to 1982 versus 1981 to
1984.
29. The Ohio Department of Health released the following data tallying the number of newly diagnosed cases of
AIDS in the state from the initial stage of the epidemic to the early 1990s when the first antiretroviral drugs
began to become widely available:∗
Year 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992
Cases 2 8 27 58 121 209 394 533 628 674 746 725
∗ Cincinnati Enquirer, December 11, 1994.

Let y(t) denote the number of AIDS cases in Ohio in year t. Assume that
$$\frac{dy}{dt} = ry\left(1 - \frac{y}{K}\right)$$
can be used to describe the data.
a. Using the first few data points and the fact that $\frac{dy}{dt} \approx ry$ when y is small, estimate r.
b. Estimate the value of K.
30. Hyperthyroidism is caused by a new growth of tumor-like cells that secrete thyroid hormones in excess of
the normal hormones. If left untreated, a hyperthyroid individual can exhibit extreme weight loss, anorexia,
muscle weakness, heart disease, intolerance to stress, and eventually death. The most successful and least
invasive treatment option is radioactive iodine-131 therapy.∗
This involves the injection of a small amount of radioactivity into the body. For the type of hyperthyroidism
called Graves’ disease, it is usual for about 40-80% of the administered activity to concentrate in the thyroid
gland. For functioning adenomas (“hot nodules”), the uptake is closer to 20-30%. Excess iodine-131 is excreted
rapidly by the kidneys. The quantity of radioiodine used to treat hyperthyroidism is not enough to injure any
tissue except the thyroid tissue, which slowly shrinks over a matter of weeks to months. Radioactive iodine
is either swallowed in a capsule or sipped in solution through a straw. A typical dose is 5-15 millicuries. The
half-life of iodine-131 is 8 days.
a. Suppose that it takes 48 hours for a shipment of iodine-131 to reach a hospital. How much of the
initial amount shipped is left once it arrives at the hospital?
b. Suppose a patient is given a dosage of 10 millicuries of which 30% concentrates in the thyroid gland.
How much is left one week later?
c. Suppose a patient is given a dosage of 10 millicuries of which 30% concentrates in the thyroid gland.
How much is left 30 days later?
∗ We find the following statement at http://www.nrc.gov/reading-rm/doc-collections/cfr/part035/part035-0932.html of
the U. S. Nuclear Regulatory Commission. “Except as provided in 35.57, the licensee shall require the authorized user of only iodine-131
for the treatment of hyperthyroidism to be a physician with special experience in thyroid disease who has had classroom and laboratory
training in basic radioisotope handling techniques applicable to the use of iodine-131 for treating hyperthyroidism, and supervised clinical
experience as follows: (a) 80 hours of classroom and laboratory training that includes (1) Radiation physics and instrumentation; (2)
Radiation protection,(3) Mathematics pertaining to the use and measurement of radioactivity; and (4) Radiation biology; and
(b) Supervised clinical experience under the supervision of an authorized user that includes the use of iodine-131 for diagnosis of thyroid
function, and the treatment of hyperthyroidism in 10 individuals.”

6.2 Separable Equations

For the remainder of this chapter, we consider differential equations of the form
$$\frac{dy}{dt} = f(t, y)$$
where f is an expression involving t and y; e.g. $f(t, y) = \frac{y - 6}{t + 1}$. After discussing what we mean by a solution to a
differential equation, we introduce an important method called separation of variables that can be used to solve
certain types of differential equations.
Solutions to differential equations

A function y(t) is a solution to a differential equation if when you substitute the function y(t) into both sides of the
differential equation, the equation is satisfied.
Example 1. Verifying a function is a solution
Consider the differential equation

$$\frac{dy}{dt} = \frac{y - 6}{t + 1}$$
which is defined for all t ≠ −1. Which of the following functions are solutions for all t > −1 or all t < −1?
a. y(t) = t + 7
b. y(t) = 3t + 21
c. y(t) = 3t + 9
Solution. First, notice that the domain for the differential equation is all real values for t with t ≠ −1.
a. To verify whether or not y(t) = t + 7 is a solution, we substitute this expression for y(t) into the differential
equation, and simplify both sides.
$$
\begin{aligned}
\frac{dy}{dt} &= \frac{y(t) - 6}{t + 1} \\
\frac{d}{dt}(t + 7) &= \frac{t + 7 - 6}{t + 1} \\
1 &= \frac{t + 7 - 6}{t + 1} \\
1 &= \frac{t + 1}{t + 1} \\
1 &= 1
\end{aligned}
$$
Since the equation is satisfied for all t in the domain, we see that y(t) = t + 7 is a solution.
b. To verify whether or not y(t) = 3t + 21 is a solution, we substitute this expression for y(t) into the
differential equation and simplify both sides.
$$
\begin{aligned}
\frac{dy}{dt} &= \frac{y(t) - 6}{t + 1} \\
\frac{d}{dt}(3t + 21) &= \frac{3t + 21 - 6}{t + 1} \\
3 &= \frac{3t + 15}{t + 1}
\end{aligned}
$$
Since 3(t + 1) = 3t + 3 never equals 3t + 15, this equation is not satisfied for any t, and we see that y(t) = 3t + 21 is not a solution.

c. To verify whether or not y(t) = 3t + 9 is a solution, we substitute this expression for y(t) into the
differential equation, and simplify both sides.
$$
\begin{aligned}
\frac{dy}{dt} &= \frac{y(t) - 6}{t + 1} \\
\frac{d}{dt}(3t + 9) &= \frac{3t + 9 - 6}{t + 1} \\
3 &= \frac{3t + 3}{t + 1} \\
3 &= 3\,\frac{t + 1}{t + 1} \\
3 &= 3 \qquad \text{provided } t \neq -1.
\end{aligned}
$$
Since this equation is satisfied for all t in the domain, we see that y(t) = 3t + 9 is a solution.
□
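The substitutions in Example 1 can also be spot-checked numerically. The sketch below is only a sanity check under our own assumptions, not a proof: it estimates dy/dt with a central difference, subtracts (y − 6)/(t + 1), and reports the largest mismatch over a handful of arbitrarily chosen times. The helper names `residual` and `worst_residual` are hypothetical.

```python
def residual(y, t, h=1e-6):
    """Central-difference estimate of dy/dt minus (y(t) - 6)/(t + 1)."""
    derivative = (y(t + h) - y(t - h)) / (2 * h)
    return derivative - (y(t) - 6) / (t + 1)

def worst_residual(y, times):
    """Largest absolute residual over the sample times."""
    return max(abs(residual(y, t)) for t in times)

# Arbitrary sample points on both sides of the excluded value t = -1.
sample_times = [-3.0, -2.0, 0.0, 1.5, 4.0]

candidates = {
    "t + 7": lambda t: t + 7,
    "3t + 21": lambda t: 3 * t + 21,
    "3t + 9": lambda t: 3 * t + 9,
}

for name, y in candidates.items():
    print(f"y(t) = {name}: largest residual {worst_residual(y, sample_times):.2e}")
```

A residual near zero at every sample point is consistent with the function being a solution, while a large residual at even one point rules it out.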
Example 2. Verifying an implicit solution to a differential equation
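Verify that if y satisfies the relationship
$$x^2 + y^2 = 4$$
then it is a solution to the differential equation
$$\frac{dy}{dx} = -\frac{x}{y} \qquad \text{provided } y \neq 0$$
Solution. From the given equation, we find dy/dx by differentiating both sides with respect to x:
$$
\begin{aligned}
x^2 + y^2 &= 4 \\
2x + 2y\frac{dy}{dx} &= 0 \\
\frac{dy}{dx} &= -\frac{x}{y} \qquad \text{provided } y \neq 0
\end{aligned}
$$
□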
Example 3. From solutions to differential equations
Find a function g(t) such that y(t) = cos t is a solution to
$$\frac{dy}{dt} = \frac{y}{g(t)}$$
on some interval of time.
Solution. For y(t) = cos t to be a solution to the given differential equation, we need
$$
\begin{aligned}
\frac{dy}{dt} &= \frac{y}{g(t)} && \text{provided } g(t) \neq 0 \\
\frac{d}{dt}(\cos t) &= \frac{\cos t}{g(t)} \\
-\sin t &= \frac{\cos t}{g(t)} \\
g(t) &= -\frac{\cos t}{\sin t} && \text{provided } \sin t \neq 0 \\
g(t) &= -\cot t && \text{on an interval of time for which } \sin t \neq 0.
\end{aligned}
$$
□
Separation of Variables
As with the case of integration, solving differential equations requires specialized techniques and there is no guarantee
that in general you can find an elementary solution. A special class of differential equations for which we often can
find solutions are separable equations: differential equations which can be written in the form
$$\frac{dy}{dt} = \frac{f(t)}{g(y)}$$
To solve such an equation on an interval of time for which g(y) ≠ 0, first separate the variables to obtain
$$g(y)\frac{dy}{dt} = f(t)$$
and then integrate both sides separately to obtain
$$\int g(y)\frac{dy}{dt}\,dt = \int f(t)\,dt$$
The expression we derived for integration by substitution in Section 5.5 implies that the left-hand side can be
expressed purely in terms of y (i.e. without reference to t) to obtain the equation
$$\int g(y)\,dy = \int f(t)\,dt$$
which we can then integrate to solve for y in terms of t, as illustrated in the next example.
Example 4. Solving a separable differential equation
Solve
\[ \frac{dy}{dt} = -\frac{t}{y} \]
Solution. In this case g(y) = y and f(t) = −t. Hence separating variables and integrating yields
\[ \int y\,dy = -\int t\,dt \]
\[ \frac{1}{2}y^2 + C_1 = -\frac{1}{2}t^2 + C_2 \]
\[ y^2 = -t^2 + C \quad \text{where } C = 2(C_2 - C_1) \text{ is an arbitrary constant.} \]
Solving for y yields $y = \pm\sqrt{C - t^2}$ for any positive C and for all times t with $-\sqrt{C} < t < \sqrt{C}$, which ensures $y \neq 0$.
□
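As a quick check in the spirit of Examples 1 to 3, the positive branch of this solution can be differentiated directly; the negative branch works the same way. This is a sketch of the verification, valid only for $-\sqrt{C} < t < \sqrt{C}$ with $C > 0$:

```latex
\begin{align*}
y &= \sqrt{C - t^2} \\
\frac{dy}{dt} &= \frac{1}{2}\,(C - t^2)^{-1/2}\cdot(-2t)
   = -\frac{t}{\sqrt{C - t^2}}
   = -\frac{t}{y}
\end{align*}
```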
Notice the treatment of constants in Example 4. Because all constants can be combined into a single constant, it is customary not to write $C = 2(C_2 - C_1)$, but rather to simply replace all the arbitrary constants in the problem by a single arbitrary constant after the last integral is found.
Example 5. Finding and plotting solutions

a. Solve the differential equation $\frac{dy}{dt} = ty^2$.
b. Find and plot a solution of this equation that satisfies y(0) = 1.
Solution.
a. First observe that y = 0 is a solution, but it is not the solution that satisfies y(0) = 1. To find this latter solution, we define $g(y) = y^{-2}$ and f(t) = t and use our separation of variables technique to obtain on integration:
\[ \int y^{-2}\,dy = \int t\,dt \]
\[ -y^{-1} = \frac{t^2}{2} + C \]
\[ y = \frac{-1}{t^2/2 + C} \]
To check our work, we can substitute this solution into the differential equation $\frac{dy}{dt} = ty^2$ to ensure that both sides are the same:
\[ \text{Left-hand side: } \frac{d}{dt}\left(\frac{-1}{t^2/2 + C}\right) = \frac{t}{(t^2/2 + C)^2} \]
\[ \text{Right-hand side: } t\,\frac{(-1)^2}{(t^2/2 + C)^2} = \frac{t}{(t^2/2 + C)^2} = \text{Left-hand side.} \]
b. To satisfy y(0) = 1, we need
\[ 1 = \frac{-1}{0^2/2 + C} \]
\[ 1 = \frac{-1}{C} \]
\[ C = -1 \]
Thus,
\[ y(t) = \frac{-1}{t^2/2 - 1} = \frac{-2}{t^2 - 2} \quad \text{for all } t \neq \pm\sqrt{2}. \]
The solution is plotted in Figure 6.8 on the interval $t \in (-\sqrt{2}, +\sqrt{2})$. Solutions can also be plotted on $(-\infty, -\sqrt{2})$ and on $(\sqrt{2}, \infty)$.
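The closed-form solution of part b can be compared with a direct numerical integration of the initial value problem. The sketch below uses a classical fourth-order Runge-Kutta step, which is a swapped-in numerical technique and not something the text develops; the function names `rk4` and `exact` and the step count are assumptions made for illustration.

```python
# Compare y(t) = -2/(t^2 - 2) with a numerical solution of dy/dt = t*y^2, y(0) = 1.
# rk4 is a generic classical Runge-Kutta integrator written for this illustration.

def exact(t):
    """Closed-form solution from part b, valid for -sqrt(2) < t < sqrt(2)."""
    return -2.0 / (t * t - 2.0)

def rk4(f, t0, y0, t_end, steps):
    """Integrate dy/dt = f(t, y) from t0 to t_end with a fixed step size."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

def rhs(t, y):
    return t * y * y

approx = rk4(rhs, 0.0, 1.0, 1.0, 1000)   # numerical estimate of y(1)
print(approx, exact(1.0))
```

Both numbers should agree closely at t = 1. Pushing `t_end` toward √2 makes the numerical solution grow without bound, consistent with the vertical asymptotes of the formula.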
Sometimes separation of variables leads to integrals we cannot compute or leads to expressions for which y is only
implicitly defined.
Example 6. Implicitly defined solutions
Consider
\[ \frac{dy}{dt} = \frac{2t}{y + \sin y} \]
a. Use separation of variables to solve for y implicitly in terms of t. Use technology to graph this solution.
b. Find a solution of this equation that satisfies y(−1) = 0. Use technology to graph this particular solution.

Figure 6.8: Solution to $\frac{dy}{dt} = ty^2$ on $(-\sqrt{2}, +\sqrt{2})$ with y(0) = 1.
Solution.
a. In this case g(y) = y + sin y and f(t) = 2t. Hence separating variables and integrating, bearing in mind that the equation is not defined for values of y satisfying y = −sin y (that is, for y = 0), yields
\[ \int (y + \sin y)\,dy = \int 2t\,dt \]
\[ \frac{y^2}{2} - \cos y = t^2 + C \]
We use technology to plot the solutions as implicitly defined functions, yielding the family of solutions in Figure 6.9.
Figure 6.9: Plots of solutions y(t) to the equation $\frac{dy}{dt} = \frac{2t}{y + \sin y}$, bearing in mind that the horizontal axis y = 0 must be excluded.
b. Although all the points with y = 0 must be excluded, the curves plotted in Figure 6.9 suggest that the solutions are well-behaved, since they appear in this figure to cross the line y = 0. This apparent continuity across the excluded points is in contrast to the solution in the previous example, where we see from Figure 6.8 that the solution appears to approach infinity at the excluded points $t = \pm\sqrt{2}$. Thus, we might try solving the equation from the point y(−1) = 0 in the hope the solution remains defined and continuous as it crosses the line y = 0. In this spirit, we follow the separation of variables approach to see if it will yield a solution:
\[ \frac{y^2}{2} - \cos y = t^2 + C \]
\[ 0 - \cos 0 = (-1)^2 + C \]
\[ -2 = C \]
Thus, the particular solution is
\[ \frac{y^2}{2} - \cos y = t^2 - 2 \]
and this solution is shown graphically (using technology) in Figure 6.10.
Figure 6.10: A plot of the solution y versus t to the equation $\frac{dy}{dt} = \frac{2t}{y + \sin y}$ with y(−1) = 0
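One way "technology" can evaluate an implicitly defined solution is root finding. The sketch below is an assumption about how this might be done, not the method used to produce the figures: for a given t it solves $y^2/2 - \cos y = t^2 + C$ for the non-negative branch of y by bisection. It relies on the fact that $F(y) = y^2/2 - \cos y$ has derivative $y + \sin y > 0$ for y > 0 and minimum value F(0) = −1, so a real y exists only when $t^2 + C \geq -1$. The names `F` and `implicit_y` are hypothetical.

```python
import math

def F(y):
    """Left-hand side of the implicit relation y^2/2 - cos(y) = t^2 + C."""
    return 0.5 * y * y - math.cos(y)

def implicit_y(t, C=-2.0, tol=1e-12):
    """Non-negative y satisfying F(y) = t^2 + C, found by bisection."""
    target = t * t + C
    if target < F(0.0):
        raise ValueError("no real y satisfies the relation at this t")
    lo, hi = 0.0, 1.0
    while F(hi) < target:          # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:           # F is increasing on y >= 0, so bisection applies
        mid = 0.5 * (lo + hi)
        if F(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for t in (-1.0, -1.5, -2.0):
    print(t, implicit_y(t))
```

With C = −2 the relation has real solutions only for |t| ≥ 1, and by symmetry the negative branch is −`implicit_y(t)`.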
One of the principal examples of the previous section is the logistic equation, which has the general form
\[ \frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right) \]
where N is the population abundance, r is the intrinsic rate of growth and K is the carrying capacity. For different data sets we were able to estimate the parameters r and K. In Example 3 of Section 6.1, we estimated r ≈ 0.53 and K ≈ 660 for the yeast data set of Carlson. We can now find an analytic solution to this equation and see how well the model describes the data set.
Example 7. Logistic growth of Carlson’s yeast data
Find a solution to
\[ \frac{dN}{dt} = 0.53\,N\,(1 - N/660), \qquad N(0) = 9.6 \]
and compare the solution to the Carlson yeast data set.
Solution. In this case $g(N) = \frac{1}{N(1 - N/660)}$ and f(t) = 0.53. Hence separating variables and integrating yields
\[ \int \frac{1}{N(1 - N/660)}\,dN = \int 0.53\,dt \quad \text{Integrate both sides.} \]
\[ \int \left(\frac{1}{N} + \frac{1}{660 - N}\right)dN = \int 0.53\,dt \quad \text{Partial fractions} \]
\[ \ln|N| - \ln|660 - N| = 0.53t + C \quad \text{Integrate} \]
To find C corresponding to the solution that passes through the point t = 0 and N = 9.6 we solve:
\[ \ln 9.6 - \ln(660 - 9.6) = 0 + C \]
\[ C \approx -4.2158 \]
The particular solution is found, and then we solve for N:
\[ \ln|N| - \ln|660 - N| = 0.53t - 4.2158 \]
\[ \ln\left(\frac{N}{660 - N}\right) = 0.53t - 4.2158 \]
\[ e^{0.53t - 4.2158} = \frac{N}{660 - N} \]
\[ e^{-4.2158}\,e^{0.53t} = \frac{N}{660 - N} \]
\[ 0.01476\,e^{0.53t} = \frac{N}{660 - N} \]
\[ N = \frac{9.7416\,e^{0.53t}}{1 + 0.01476\,e^{0.53t}} \]
A plot of the solution against the Carlson yeast data is shown in Figure 6.11 and illustrates a very good fit.
Figure 6.11: Solution of logistic equation plotted against the Carlson data.
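The algebra of Example 7 can be repeated with general parameters. The sketch below is an illustration under the assumption 0 < N0 < K: it writes the solution as $N(t) = K A e^{rt}/(1 + A e^{rt})$ with $A = N_0/(K - N_0)$, which reproduces the constants 0.01476 and 9.7416 when r = 0.53, K = 660 and N0 = 9.6. The function names and the finite-difference check are hypothetical helpers, and no yeast data are included here.

```python
import math

def logistic(t, r=0.53, K=660.0, N0=9.6):
    """Closed-form logistic solution with N(0) = N0, assuming 0 < N0 < K."""
    A = N0 / (K - N0)              # equals e^C in the worked example
    growth = A * math.exp(r * t)
    return K * growth / (1.0 + growth)

def book_formula(t):
    """Rounded formula from the worked example, for comparison."""
    e = math.exp(0.53 * t)
    return 9.7416 * e / (1.0 + 0.01476 * e)

def ode_residual(t, r=0.53, K=660.0, h=1e-5):
    """dN/dt (central difference) minus r*N*(1 - N/K); should be near zero."""
    dNdt = (logistic(t + h) - logistic(t - h)) / (2 * h)
    N = logistic(t)
    return dNdt - r * N * (1.0 - N / K)

for t in (0, 5, 10, 15):
    print(t, round(logistic(t), 2), round(book_formula(t), 2))
```

The two columns differ only through the rounding of the constants, and both approach the carrying capacity K = 660 for large t.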
Problem Set 6.2

Verify in Problems 1 to 8 that if y satisfies the prescribed relationship with t, then it will be a solution of the given differential equation.
1. If $t^2 + y^2 = 7$, then $\frac{dy}{dt} = -\frac{t}{y}$.
2. If $5t^2 - 2y^2 = 3$, then $\frac{dy}{dt} = \frac{5t}{2y}$.
3. If $y = C/t$ for $t \neq 0$, then $\frac{dy}{dt} = -\frac{y}{t}$.
4. If $x^2 - 3xy + y^2 = 5$, then $\frac{dy}{dx} = \frac{2x - 3y}{3x - 2y}$.
5. If $y = e^{\sin t}$, then $\frac{dy}{dt} = y\cos t$.
6. If $y = \frac{1}{1 + t}$, then $\frac{dy}{dt} = -y^2$.
7. If $y = 100 - 2e^{-t}$, then $\frac{dy}{dt} = 100 - y$.
8. If $y = 100 - 2e^{-3t}$, then $\frac{dy}{dt} = 300 - 3y$.
Determine whether the function given in Problems 9 to 12 is a solution of
\[ \frac{dy}{dt} = \sin t - y \]
9. $y(t) = \frac{1}{2}(\sin t - \cos t)$
10. $y(t) = \frac{1}{2}(10 + \sin t - \cos t)$
11. $y(t) = \sin t - \cos t$
12. $y(t) = e^{-t} + \frac{1}{2}(\sin t - \cos t)$
Determine whether the function given in Problems 13 to 16 is a solution of
\[ \frac{dy}{dt} = \frac{1}{2}(y^2 - 1) \]
13. $y(t) = \frac{1 + e^t}{1 - e^t}$
14. $y(t) = \frac{1 - e^t}{1 + e^t}$
15. $y(t) = 2 - e^t$
16. $y(t) = \frac{2 + e^t}{2 - e^t}$
Solve the differential equations in Problems 17 to 28.

17. $\frac{dy}{dt} = y^3$
18. $\frac{dy}{dt} = y\sin t$
19. $\frac{dy}{dt} = \cos t$
20. $\frac{dy}{dt} = \frac{t}{y}$
21. $\frac{dy}{dt} = e^{-y}$
22. $\frac{dy}{dt} = y - 1$
23. $\frac{dy}{dx} = 3xy$
24. $\frac{dy}{dx} = xy\sqrt{1 + x^2}$
25. $\frac{dy}{dx} = \frac{2xy}{\sqrt{1 + x^2}}$
26. $\frac{dy}{dx} = \frac{\sin x}{\cos y}$
27. $\frac{dy}{dx} = \sqrt{xy}$
28. $\frac{dy}{dx} = -\frac{\sec y}{x^2}$
Find the solutions to Problems 29 to 36.
29. $\frac{dy}{dt} = (1 + y)^2$ with y(0) = 2
30. $\frac{dy}{dt} = yt$ with y(1) = −1
31. $\frac{dy}{dt} = te^{-t}/y$ with y(0) = 3
32. $\frac{dy}{dt} = e^{-y}\,t$ with y(−2) = 0
33. $\frac{dy}{dt} = \frac{t + 1}{y + e^y}$ with y(3) = 4
34. $\frac{dy}{dt} = ty^2 + 3t^2y^2$ with y(−1) = 2
35. $\frac{dy}{dt} = y(y - 1)$ with y(0) = 1/2
36. $\frac{dy}{dt} = y(y - 1)$ with y(0) = 2
37. Create a differential equation of the form
\[ \frac{dy}{dt} = 5 - t + g(y) \]
such that $y(t) = e^t$ is a solution.
38. Create a differential equation of the form
\[ \frac{dy}{dt} = y\,h(t) \]
such that y(t) = cos t is a solution.
39. Doomsday prediction: In 1960, three electrical engineers at the University of Illinois published a paper in
Science entitled “Doomsday.” Based on world population growth data from 1000 AD to 1960 AD, the engineers
found that population growth was faster than proportional to the population size. Using the data, they modeled
the growth of the population as
\[ \frac{dP}{dt} = 0.4873\,P^2, \qquad P(0) = 0.2 \]
where P is the population size in billions and t is centuries after 1000 AD. Solve this differential equation and
sketch the solution. What year is doomsday?
40. The logistic equation did a remarkable job in describing the number of new cases of AIDS in the USA from 1980 until the early 1990s, as seen in Figure 6.12.∗ Let y(t) denote the number of new cases t years after 1980. Then nonlinear regression techniques used to fit the data resulted in the equation
\[ \frac{dy}{dt} = 0.8\,y\,(1 - y/50000), \qquad y(0) = 334 \]
a. Find the solution to this differential equation.
b. Plot this solution. What happens as t → ∞?
c. Check the web to see how this compares with the prevalence of HIV in the USA today. What do you conclude and how do you explain the discrepancy? (There is no right or wrong answer to this last part.)
Figure 6.12: New Cases of AIDS in the United States
∗ See the website http://www.nlreg.com/aids.htm
41. A model for tumor growth is the Gompertz function, which is a solution to the differential equation
\[ \frac{dy}{dt} = ay\ln\left(\frac{K}{y}\right) \]
where y is the weight of the tumor in mg, t is measured in days, a is a constant and K is the limiting size of the tumor. Assume that a = 0.5 and K = 100.
a. Find a solution to this differential equation that satisfies y(0) = 1 mg.
b. Plot this solution.
42. The 1984 census recorded a population of 15,757,000 Hispanics, while in 1990 the figure was 16,098,000. Assuming that the rate of population growth is proportional to the population, predict the U.S. Hispanic population in the year 2000. Use the World Wide Web to find the actual Hispanic population in 2000. How does your prediction compare with the actual number? What do you think can account for any differences?
43. Consider a chemical reaction involving two reactants, A and B, that form a product C. Let [A], [B], and [C] denote the concentrations of A, B, and C. If a molecule of A encounters molecules of B at a rate proportional to their concentration, then the law of mass action states that
\[ \frac{d[C]}{dt} = k[A] \cdot [B] \]
where k is a positive constant. If the initial concentration of A is a, the initial concentration of B is b and we set y = [C], then [A] = a − [C], [B] = b − [C], and
\[ \frac{dy}{dt} = k(a - y)(b - y) \]
Assume that a = b. Find and plot the solution to this differential equation satisfying y(0) = 0.
44. Populations may exhibit seasonal growth in response to seasonal fluctuations in resource availability. A simple model accounting for seasonal fluctuations in the abundance N of a population is
\[ \frac{dN}{dt} = (R + \cos t)N \]
where R is the average per-capita growth rate and t is measured in years.
a. Assume R = 0 and find a solution to this differential equation that satisfies $N(0) = N_0$. What can you say about N(t) as t → ∞?
b. Assume R = 1 (more generally R > 0) and find a solution to this differential equation that satisfies $N(0) = N_0$. What can you say about N(t) as t → ∞?
c. Assume R = −1 (more generally R < 0) and find a solution to this differential equation that satisfies $N(0) = N_0$. What can you say about N(t) as t → ∞?

6.3 Linear Models in Biology

An important class of models is described by the linear differential equation
\[ \frac{dy}{dt} = c_0 + c_1 y \]
where the constants $c_0$ and $c_1$ are model parameters that have specific physical or biological interpretations. For example, in Section 6.1 we saw how models with $c_0 = 0$ were used to describe exponential population growth ($c_1 = r$) and radioactive decay ($c_1 = -\lambda$). In this section, we discuss further applications where the constant coefficient $c_0$ is non-zero.
Mixing models
Mixing models are formulated on the premise that the density of individuals or concentration of molecules, which are generically characterized in terms of a number of objects per unit area or volume, form a homogeneous pool such that the flow of objects into the pool is controlled by an external constant rate while the flow of objects out of the pool is in proportion to the density of objects in the pool. This latter assumption implies that the greater the density of objects in the pool, the faster the total flow of objects out of the pool.
Mixing Model. Let y(t) represent the density of objects in a pool at time t. If objects flow into this pool at a constant rate a > 0 and out of this pool at a rate by(t) > 0, i.e. at a constant per-capita rate b > 0, then the density of objects in the pool over time is governed by the equation
\[ \frac{dy}{dt} = \text{Rate In} - \text{Rate Out} = a - b\,y. \]
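Because the mixing model is separable, the method of Section 6.2 gives its solution in closed form. The sketch below assumes an initial density $y(0) = y_0$ (the symbol $y_0$ is introduced here only for illustration) and $y \neq a/b$:

```latex
\begin{align*}
\int \frac{dy}{a - by} &= \int dt \\
-\frac{1}{b}\ln|a - by| &= t + C \\
y(t) &= \frac{a}{b} + \left(y_0 - \frac{a}{b}\right)e^{-bt}
\end{align*}
```

Under these assumptions every solution approaches the equilibrium y = a/b as t → ∞, and with $y_0 = 0$ the formula reduces to $y = \frac{a}{b}(1 - e^{-bt})$.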
Example 1. Modeling HIV
Human immunodeficiency virus-type 1 (HIV-1) has many puzzling quantitative features. For instance, most HIV
patients undergo a 10 year period during which the concentration of virus in the plasma is very low. It is only after
this quiescent period that a patient experiences the onset of AIDS. The reason for this quiescent period is unknown,
and it was presumed that during this period the virus was relatively inactive. Using models, Perelson and colleagues
quantified viral levels in the blood of infected individuals during this quiescent period.∗
Specifically, Perelson and colleagues let the concentration of viral particles in the blood plasma be represented by
the variable V (t). They assumed that HIV viral particles infused into the blood, from production sites in lymphatic
tissue, at a constant rate P > 0 and were eliminated from the blood at a rate cV (t), where c > 0 is referred to as
the elimination rate constant. From these assumptions they obtained the mixing model
\[ \frac{dV}{dt} = P - cV, \]
where t is measured in days. Both P and c are unknown constants.
a. Data showed that after being put on a potent antiviral drug, the viral concentration fell exponentially
in the blood. Assuming that the drug killed the production of new virus completely in lymphatic tissue,
Perelson et al. estimated the half-life of the viral particles to be 0.2 days. Use this information to estimate
the elimination rate constant, c.
∗ A.S. Perelson, A.U. Neumann, M. Markowitz, J.M. Leonard, D.D. Ho: “HIV-1 Dynamics in vivo: virion clearance rate, infected cell
lifespan, and viral generation time” (1996): Science, 271, 1582-1586 and A. S. Perelson, P. W. Nelson: “Mathematical Analysis of HIV-1
Dynamics in vivo” (1999): SIAM Review, 41, 3–44.

b. Perelson et al. estimated that prior to drug administration, the mean plasma viral level was $2.16 \times 10^5$ viral particles per milliliter (ppmL). Assume that before the drugs were administered the system was at equilibrium (i.e. dV/dt = 0). Using the estimate of c from the previous part, estimate the rate P of viral particles coming from lymphatic tissue.
Solution.
a. To estimate the clearance rate of all the viral particles currently in the blood of an individual, we solve the equation with the production parameter P = 0. Specifically:
\[ \frac{dV}{dt} = -cV \]
\[ \int \frac{1}{V}\,dV = -\int c\,dt \]
\[ \ln|V| = -ct + C_1 \]
\[ |V| = e^{-ct + C_1} \]
\[ V = Ce^{-ct} \quad \text{for biological reasons we assume positivity,} \]
where $C = e^{C_1}$ is the initial viral load V(0). Since the half-life is 0.2 days, we know that $\frac{V(0.2)}{V(0)} = \frac{1}{2}$. Thus,
\[ \frac{Ce^{-0.2c}}{C} = \frac{1}{2} \]
which can be solved to yield c = 5 ln 2 ≈ 3.47 per day.
b. If the viral density in a patient's blood is at equilibrium prior to the application of the drug, then $\frac{dV}{dt} = 0$ and we can solve for P:
\[ \frac{dV}{dt} = P - cV \quad \text{Given equation} \]
\[ 0 = P - 3.47\,V \quad \text{Substitute known values.} \]
\[ 0 = P - 3.47\,(2.16 \times 10^5) \quad \text{Substitute given plasma level} \]
\[ P = 749{,}520 \quad \text{Solve for } P. \]
Hence, we estimate about 749,520 viral particles per mL per day. According to Perelson et al. (1996), the typical individual has approximately 5.6 liters of blood, which means during the quiescent phase
\[ 749{,}520 \times 5.6 \times 10^3 \approx 4.2 \times 10^9 \]
viral particles are being created per day. Thus, this dormant phase still exhibits “the raging fire of active HIV replication”. Consequently, the authors suggested that “early and aggressive therapeutic intervention is necessary if a marked clinical impact is to be achieved.”
□
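The arithmetic of this example is easy to reproduce. The sketch below is an illustration only; the variable names are hypothetical, and it keeps c unrounded, so its production rate differs slightly from the value 749,520 obtained with c rounded to 3.47.

```python
import math

half_life_days = 0.2          # half-life of viral particles (from the example)
V_eq = 2.16e5                 # equilibrium viral level, particles per mL
blood_mL = 5.6e3              # 5.6 liters of blood expressed in mL

c = math.log(2) / half_life_days      # elimination rate constant, per day
P = c * V_eq                          # equilibrium: 0 = P - c*V
total_per_day = P * blood_mL          # particles produced per day in the whole body

P_rounded = 3.47 * V_eq               # value used in the hand calculation

print(round(c, 4), round(P), round(P_rounded), f"{total_per_day:.2e}")
```

Either way the total production is on the order of 4 × 10⁹ particles per day, so the conclusion of the example does not depend on the rounding.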
Hospital patients often receive a drug by intravenous infusion. For drugs to be administered effectively and
safely, the correct infusion rate must be determined. Differential equation models are a basic tool used by doctors
to determine these infusion rates. These models are known as pharmacokinetics or biopharmaceutics models.∗
Example 2. Determining an infusion rate
An asthmatic patient is given a continuous infusion of theophylline to relax and open the air passages in his lungs. The desired steady-state level of theophylline in the patient's blood stream is 15 mg/L. The average half-life of theophylline is about 4 hours, and the patient has 5.6 liters of blood.
∗ Check out http://www.boomer.org/c/p1/index.html for a whole course on this topic.

a. Find the necessary infusion rate.

b. Determine how long it takes for the concentration of theophylline to be 10mg/liter.
Solution.
a. First, we write down a differential equation model. Let y(t) be the amount (mg) of theophylline in the blood plasma at time t, in hours. Let a denote the rate (mg/h) at which theophylline enters the blood stream via infusion. Let c denote the elimination rate constant of the theophylline. Then,
\[ \frac{dy}{dt} = a - cy \]
To determine c we use the fact that the half-life of theophylline is about four hours. What this means is that in the absence of the infusion (i.e. when a = 0), half of the theophylline leaves the blood plasma in four hours. Solving $\frac{dy}{dt} = -cy$ yields $y(t) = y(0)e^{-ct}$. Since $\frac{y(4)}{y(0)} = \frac{1}{2}$, we can solve for c as follows:
\[ \frac{1}{2} = e^{-4c} \quad \text{Half-life is 4 hours.} \]
\[ c = \frac{1}{4}\ln 2 \approx 0.17 \quad \text{Solve for } c \text{, the elimination rate constant.} \]
To find a we want the equilibrium (i.e. the y-value for which dy/dt = 0) to hold at y = 15 mg/L × 5.6 L = 84 mg:
\[ \frac{dy}{dt} = a - cy \]
\[ 0 = a - 0.17 \cdot 84 \]
\[ a \approx 14.28 \]
The desired infusion rate is approximately 14.3 mg/h.

b. To determine how long it takes to reach a concentration of 10 mg/L, we need to solve the differential equation subject to the initial condition y(0) = 0 (i.e. initially, there is no drug in the patient). First, using our separation of variables method, the solution for any value of a and c is
\[ \frac{dy}{dt} = a - cy \]
\[ \int \frac{dy}{a - cy} = \int dt \quad \text{provided } y \neq a/c \text{ (the constant } y = a/c \text{ is itself a solution)} \]
\[ \frac{\ln|a - cy|}{-c} = t + C \]
\[ \ln|a - cy| = -ct - cC \]
\[ a - cy = Ke^{-ct} \quad \text{where } K = \pm e^{-cC} \text{ is still an arbitrary constant} \]
\[ y = \frac{a}{c}\left(1 - ke^{-ct}\right) \quad \text{where } k = \frac{K}{a} \text{ is still an arbitrary constant} \]
Now we solve for k corresponding to the solution that passes through y = 0 at t = 0. This implies $ke^0 = 1$, or simply k = 1. Since a = 14.28 and c = 0.17, which implies $\frac{a}{c} = \frac{14.28}{0.17} = 84$, it follows from our derivation of the solution above for arbitrary a and c that the particular solution we want is
\[ y = 84\left(1 - e^{-0.17t}\right). \]
Finally, since a concentration of 10 mg/L corresponds to having 10 × 5.6 mg = 56 mg in the blood, we need to solve
\[ 84\left(1 - e^{-0.17t}\right) = 56 \]
\[ e^{-0.17t} = \frac{1}{3} \]
\[ -0.17t = \ln\frac{1}{3} \]
\[ t \approx 6.46 \]
Thus it will take about 6.5 hours for the concentration of theophylline to reach 10 mg/L.
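The steps of this example (half-life to elimination constant, equilibrium to infusion rate, and time to reach a target amount) can be collected into a few small functions. This is a sketch with hypothetical function names, assuming the model dy/dt = a − cy with y(0) = 0. It is evaluated twice: once with the rounded value c = 0.17 used in the hand calculation and once with c unrounded, which shifts the answers slightly.

```python
import math

def elimination_constant(half_life):
    """c such that an amount decays to half in `half_life` time units."""
    return math.log(2) / half_life

def infusion_rate(target_amount, c):
    """Rate a for which the equilibrium a/c equals the target amount."""
    return c * target_amount

def amount(t, a, c):
    """Solution of dy/dt = a - c*y with y(0) = 0."""
    return (a / c) * (1.0 - math.exp(-c * t))

def time_to_reach(target_amount, a, c):
    """Time at which amount(t) hits the target (target must be below a/c)."""
    return -math.log(1.0 - c * target_amount / a) / c

blood_L = 5.6
steady = 15.0 * blood_L        # 84 mg at the desired steady state
target = 10.0 * blood_L        # 56 mg corresponds to 10 mg/L

for c in (0.17, elimination_constant(4.0)):
    a = infusion_rate(steady, c)
    print(round(c, 4), round(a, 2), round(time_to_reach(target, a, c), 2))
```

Since the target is two thirds of the steady state, the time is ln 3 / c in both cases; only the value of c changes.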
In the first example, we determined the elimination rate constant when the half-life of the viral particles in the patient's blood was known. In the second example, we determined the infusion rate necessary to maintain a desired concentration of drug in the patient's blood. In the next example, we determine the elimination rate constant of a drug when its half-life is not known.
Example 3. Determining an elimination rate constant
Consider a patient receiving a drug intravenously at a rate of 10 mg/h. An hour later, the concentration of drug in the patient's body is 1 mg/L. Assuming the patient has 5 liters of blood and the drug is lost at a rate proportional to the amount of drug in the body, find the elimination rate constant of the drug. Finally, determine the limiting concentration of drug in the patient's body.
Solution. Let y denote the amount of drug (in mg) in the patient's body. Then y/5 is the concentration of drug in the patient's body. The rate of change of y is given by
\[ \frac{dy}{dt} = 10 - by \]
where 10 is the infusion rate and b > 0 is the elimination rate constant. We now solve this differential equation, as we did in the previous example, using separation of variables to obtain for the case y(0) = 0:
\[ y = \frac{10}{b}\left(1 - e^{-bt}\right). \]
To find the elimination rate constant, b, we can use the fact that
\[ y(1) = 1\,\frac{\text{mg}}{\text{L}} \cdot 5\,\text{L} = 5\,\text{mg} \]
Now, we need to solve
\[ 5 = \frac{10}{b}\left(1 - e^{-b}\right) \]
This is not a particularly easy equation to solve, so we use technology to find b ≈ 1.6.
To find the limiting concentration of the drug in the patient's body, we must find the limit of y(t) as t approaches ∞.
\[ \lim_{t\to\infty} \frac{10}{1.6}\left(1 - e^{-1.6t}\right) = \frac{10}{1.6} = 6.25 \]
The limiting quantity is 6.25 milligrams in 5 liters of blood, or 1.25 mg/L. □
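The step "use technology to find b" can be made concrete with a root finder. The sketch below uses bisection, which is one possible choice and not necessarily what the authors used; the function names and the bracket [0.1, 5] are assumptions. It relies on the fact that (10/b)(1 − e^{−b}) decreases as b increases, so the equation has a single positive root.

```python
import math

def amount_after_one_hour(b, rate=10.0):
    """y(1) for dy/dt = rate - b*y with y(0) = 0."""
    return (rate / b) * (1.0 - math.exp(-b))

def solve_b(target=5.0, lo=0.1, hi=5.0, tol=1e-12):
    """Bisection for b with amount_after_one_hour(b) = target."""
    # The amount decreases with b, so it is above target at lo and below at hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if amount_after_one_hour(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b = solve_b()
limiting_amount = 10.0 / b              # mg, the equilibrium of the model
limiting_concentration = limiting_amount / 5.0   # mg/L in 5 liters of blood
print(round(b, 3), round(limiting_amount, 2), round(limiting_concentration, 2))
```

The root rounds to the b ≈ 1.6 quoted in the example, and the limiting concentration agrees with 1.25 mg/L to about two significant figures.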
An interesting application using a mixing model is to model the concentration of pollution in a lake.
Example 4. Lake Pollution
A well-mixed lake with constant volume 100 km³ is fed by rivers and tributaries at a rate of 48 km³/yr, and factories are dumping polluted water into the lake at a rate of 2 km³/yr. Environmental studies have shown that if, after mixing, the percentage of water in the lake from polluted sources exceeds 2%, then the water becomes a hazardous environment for the fish.

a. Does the lake ever reach this percentage of polluted water mixed in with fresh water? If so, when?
b. How much would one have to reduce the polluted water input into the lake to reduce the long-term
mixture of water to 2% or less from polluted sources?
Solution.
a. Let y (in km³) denote the total amount of polluted water mixed into the lake. Assume that y(0) = 0. The rate at which polluted water enters the lake is 2 km³/yr. Finding the rate at which it leaves is more difficult. Since the lake volume is assumed to be constant, the rate at which water and pollutants are leaving the lake is 48 + 2 = 50 km³/yr. The proportion of water in the lake that comes from the polluted flow is y/100. Hence the rate at which pollutants are leaving after being well mixed with fresh water is
\[ \frac{y}{100} \times 50 = \frac{y}{2}\ \frac{\text{km}^3}{\text{yr}} \]
Thus, our initial value problem is
\[ \frac{dy}{dt} = \text{Entering Rate} - \text{Leaving Rate} = 2 - \frac{y}{2} \]
with initial value y(0) = 0. As in the previous two examples, we can use separation of variables to solve this differential equation to obtain for the case y(0) = 0, with a = 2 and c = 1/2, the particular solution
\[ y(t) = \frac{a}{c}\left(1 - e^{-ct}\right) = 4\left(1 - e^{-t/2}\right)\ \text{km}^3 \]
Since $\lim_{t\to\infty} y(t) = 4$, the eventual amount of polluted water in the well-mixed lake is 4 km³, which is 4% of the lake's 100 km³ volume. Thus, the lake will reach the hazardous level. To find the time at which it reaches the 2% hazardous level (y = 2 km³), we can solve
\[ 2 = 4 - 4e^{-t/2} \]
\[ e^{-t/2} = 0.5 \]
\[ t = 2\ln 2 \approx 1.39 \]
This is about 1 year, 5 months.
b. To ensure the polluted water never exceeds 2% of the total, we reformulate the model as
\[ \frac{dy}{dt} = p - \frac{y}{2} \]
where p is the rate at which polluted water is dumped into the lake, and the initial value is y(0) = 0. Polluted water levels in the lake will approach the equilibrium level given by
\[ 0 = p - \frac{y}{2} \]
\[ y = 2p\ \text{km}^3 \]
Hence, if the flow of polluted water is reduced to 1 km³ per year, then the long-term (i.e. equilibrium) proportion of polluted water in the lake is 2%.
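Both parts of this example follow from the same formula, which the sketch below evaluates. It is an illustration with hypothetical function names, and it keeps the example's simplifying assumption that the total outflow stays at 50 km³/yr even when the polluted inflow p is changed.

```python
import math

def polluted_volume(t, p=2.0, outflow=50.0, volume=100.0):
    """Polluted water (km^3) at time t (years), starting from a clean lake."""
    c = outflow / volume                 # per-year flushing rate, 1/2 in the example
    return (p / c) * (1.0 - math.exp(-c * t))

def time_to_fraction(fraction, p=2.0, outflow=50.0, volume=100.0):
    """Years until the polluted fraction reaches `fraction`, or None if it never does."""
    c = outflow / volume
    target = fraction * volume
    equilibrium = p / c
    if target >= equilibrium:            # the level is only approached, never reached
        return None
    return -math.log(1.0 - target / equilibrium) / c

print(time_to_fraction(0.02))            # part a: p = 2 km^3/yr
print(time_to_fraction(0.02, p=1.0))     # part b: p = 1 km^3/yr
```

With p = 1 the function returns `None`: the polluted fraction approaches 2% from below without reaching it, which matches the equilibrium argument in part b.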

Newton’s Cooling Law and Forensic Medicine

The following two quotations are germane in the field of forensic medicine:∗
The time of death is sometimes extremely important. It is a question almost invariably asked by police
officers, sometimes with a touching faith in the accuracy of the estimate. Determining the time of death
is extremely difficult, and accuracy is impossible.∗
No problem in forensic medicine has been investigated as thoroughly as that of determining the time of
death on the basis of post mortem findings. Apart from its obvious legal importance, its solution has
been so elusive as to provide a constant intellectual challenge to workers in many sciences. In spite of the
great effort and ingenuity expended, the results have been meagre.∗
Here we describe one “meagre” attempt that involves Newton's Law of cooling. Newton's law states that the rate at which the temperature of a body changes is proportional to the difference between the body's temperature T and the ambient temperature A. Mathematically, this statement translates to the following differential equation
\[ \frac{dT}{dt} = k(A - T) \]
where k is a positive constant proportional to the thermal conductivity of the body. A large k means the body readily conducts heat and quickly adjusts to the ambient temperature. A small k means the body is well-insulated and slowly adjusts to the ambient temperature.
Example 5. Put on your detective caps
On a dark and stormy night, Sherlock Holmes and Dr. Watson were called to investigate the shocking murder of Jacob Marley. The main suspects of this crime were three people who would benefit from his death. First, there is Marley's business partner, Ebenezer Scrooge, who was having strong disagreements with Marley about how to run the business. Scrooge spent the evening alone working late at his office. His staff confirmed that he arrived home at 9 PM and remained home for the rest of the night. Second, there is Marley's wife, Claudia, who was having an affair with another man. Claudia stood to inherit Marley's fortune. Claudia was at the theater from 8 PM to 9:30 PM, as verified by several people at the theater. Finally, there was Marley's client, Sam Wise Gange, whom Marley had swindled out of a large sum of money. Sam was at a local pub until 11:00 PM, also verified by several people. Marley's body was found in an alley at 1:30 AM. The alley temperature was a nippy 55°F and the body temperature was 87°F. One hour later the body temperature had cooled to 85°F. Given this information, determine who has a good alibi.
Solution. Let T denote the temperature of the body. By Newton's law of cooling
\[ \frac{dT}{dt} = k(55 - T) \]
\[ \int \frac{dT}{55 - T} = \int k\,dt \quad \text{provided } T \neq 55 \]
\[ -\ln|55 - T| = kt + C_1 \]
\[ T(t) = 55 + Ce^{-kt} \quad \text{where } C = \pm e^{-C_1} \]
To determine C and k, let us identify t = 0 with 1:30 AM. Since at this time the body temperature was 87°F, it follows that
\[ 87 = T(0) = 55 + C \]
\[ C = 32 \]
∗ They can be found at the web site http://www.dundee.ac.uk/forensicmedicine/llb/timedeath.htm
∗ Knight, page 115 of Legal Aspects of Medical Practice, 4th edition, (1987), Churchill Livingstone, Edinburgh.
∗ Jaffe, page 33 in A Guide to Pathological Evidence : For Lawyers and Police Officers, 2nd edition, (1983), Carswell Criminal Law
Series, Carswell Ltd., Toronto.

Thus,
\[ T(t) = 55 + 32e^{-kt} \]
To find k we use the information that the body temperature was 85°F an hour after 1:30 AM, which implies
\[ 85 = T(1) = 55 + 32e^{-k} \]
\[ 30 = 32e^{-k} \]
\[ -k = \ln\frac{15}{16} \]
\[ k = \ln\frac{16}{15} \approx 0.065 \]
Finally, to determine the time of death, we need to solve backwards in time to the point where the body temperature was a normal 98.6°F, that is:
\[ 98.6 = T(t) = 55 + 32e^{-0.065t} \]
\[ 43.6 = 32e^{-0.065t} \]
\[ -0.065t = \ln\frac{43.6}{32} \]
\[ t \approx -4.76 \]
Thus, Marley died approximately 4 h 45 min before the body was found; that is, at approximately 8:45 PM. Thus, Claudia and Sam have alibis for the murder, while Scrooge does not. □
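The time-of-death estimate can be reproduced in a few lines. The sketch below is an illustration with hypothetical variable names; it keeps k unrounded, so the estimate differs by a couple of minutes from the hand calculation with k ≈ 0.065, which is well within the uncertainty the quotations above warn about.

```python
import math

AMBIENT = 55.0     # alley temperature, deg F
T_FOUND = 87.0     # body temperature at 1:30 AM (t = 0)
T_LATER = 85.0     # body temperature one hour later (t = 1)
T_NORMAL = 98.6    # assumed body temperature at the time of death

# From 85 = 55 + 32*exp(-k): k = ln(32/30)
k = math.log((T_FOUND - AMBIENT) / (T_LATER - AMBIENT))

# From 98.6 = 55 + 32*exp(-k*t) with t < 0: hours elapsed before 1:30 AM
hours_before = math.log((T_NORMAL - AMBIENT) / (T_FOUND - AMBIENT)) / k

hours = int(hours_before)
minutes = round((hours_before - hours) * 60)
print(f"k = {k:.4f} per hour; death about {hours} h {minutes} min before 1:30 AM")
```

Changing `T_NORMAL` or the ambient temperature by a degree or two moves the estimate noticeably, which illustrates why such estimates should be treated as rough.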
Organismal growth
The study of growth involves determining the body size as a function of age. Various measurements of body size exist including weight, length, and girth. A famous equation, the von Bertalanffy growth equation, that describes growth of an organism can be derived from first principles using scaling laws. To derive this equation, consider a cubical critter with length L as illustrated in Figure 6.13.
Figure 6.13: Cubical critter
The surface area of this critter is $6L^2$ and the volume is $L^3$. If we assume length is measured in centimeters and the critter is mostly made of water, which we note has a density of 1 g/cm³, then its mass M is $L^3$ grams. If a critter ingests food at a rate proportional to its surface area and respires at a rate proportional to its mass, then
\[ \frac{dM}{dt} = aL^2 - bL^3 \]
where a and b are positive proportionality constants. Since $M = L^3$, we obtain
\[ \frac{dM}{dt} = 3L^2\,\frac{dL}{dt} \]
Combining the previous two equations yields
\[ \frac{dL}{dt} = \frac{1}{3}L^{-2}\,\frac{dM}{dt} = \frac{1}{3}L^{-2}\left(aL^2 - bL^3\right) = \frac{a}{3} - \frac{b}{3}L \]
Defining k = b/3 and $L_\infty = a/b$ yields the two-parameter von Bertalanffy growth equation
\[ \frac{dL}{dt} = k\,(L_\infty - L) \]
Note that biologists like to use the notation $L_\infty$ because it is the equilibrium solution, which can also be shown to be approached asymptotically over time by all other biologically relevant solutions (e.g. see Figure 6.14).
Example 6. von Bertalanffy growth equation
Find a general solution to the von Bertalanffy growth equation
\[ \frac{dL}{dt} = k\,(L_\infty - L) \]
with initial value $L(0) = L_0 > 0$. Show that the solution of this equation is as given in Figure 6.14.
Figure 6.14: von Bertalanffy growth equation
Solution. Since L(0) > 0, the solution to this equation is more general than the solution found in Example 2, where the initial condition required the solution to pass through 0. Thus we cannot directly apply the solution we previously found, but we use the same separation of variables method to obtain
\[ \frac{dL}{dt} = k(L_\infty - L) \]
\[ \int \frac{dL}{L_\infty - L} = \int k\,dt \]
\[ -\ln|L_\infty - L| = kt + C_1 \]
\[ L_\infty - L = \pm Ce^{-kt} \quad \text{where } C = e^{-C_1} > 0 \]
\[ L(t) = L_\infty \mp Ce^{-kt} \]
Next, we use the initial condition to find C and choose the sign on the assumption that $L_0 < L_\infty$.
\[ L_0 = L_\infty - C \]
\[ C = L_\infty - L_0 \]
Thus,
\[ L(t) = L_\infty\left(1 - \frac{L_\infty - L_0}{L_\infty}\,e^{-kt}\right). \]
This equation is often written as
\[ L(t) = L_\infty\left(1 - e^{-k(t - t_0)}\right) \]
where it follows that $e^{kt_0} = \frac{L_\infty - L_0}{L_\infty}$. If we now solve this identity for $t_0$ we obtain a negative time
\[ t_0 = \frac{1}{k}\ln\left(1 - \frac{L_0}{L_\infty}\right) \]
that corresponds to the time when the function L(t) is zero: i.e. $L(t_0) = 0$. This time is sometimes thought of as the theoretical time of conception but is only a meaningful concept if the same growth equation applies at all stages of development (which is not generally a reasonable assumption). □
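The two forms of the solution can be checked against each other numerically. In the sketch below the parameter values (k = 0.3 per year, L∞ = 50 cm, L0 = 5 cm) are invented purely for illustration and are not taken from any data set; the function names are likewise hypothetical.

```python
import math

def length(t, k, L_inf, L0):
    """von Bertalanffy solution with L(0) = L0, assuming 0 < L0 < L_inf."""
    return L_inf - (L_inf - L0) * math.exp(-k * t)

def t_zero(k, L_inf, L0):
    """Time t0 < 0 at which the extrapolated length is zero."""
    return math.log(1.0 - L0 / L_inf) / k

def length_shifted(t, k, L_inf, t0):
    """Equivalent form L_inf * (1 - exp(-k*(t - t0)))."""
    return L_inf * (1.0 - math.exp(-k * (t - t0)))

# Illustrative (made-up) parameters
k, L_inf, L0 = 0.3, 50.0, 5.0
t0 = t_zero(k, L_inf, L0)

for t in (0.0, 1.0, 5.0, 20.0):
    print(t, round(length(t, k, L_inf, L0), 3), round(length_shifted(t, k, L_inf, t0), 3))
```

The two columns agree, the length at `t0` is zero, and the length approaches `L_inf` for large t, as the derivation predicts.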
Problem Set 6.3

In Example 1 we modeled HIV using the data of Perelson et al. In that example it was assumed that the half-life of the viral particles was 0.2 days (or 4.8 hours) and that the mean plasma viral level was $2.16 \times 10^5$ viral particles per milliliter (ppmL). Estimate the elimination rate constant for the half-life given in Problems 1 to 6 and then estimate the daily rate of production of HIV viral particles for the specified mean plasma viral level.
1. 2.4 hours; $1.89 \times 10^5$ viral ppmL
2. 3 hours; $2.15 \times 10^5$ viral ppmL
6. 7.2 hours; $2.75 \times 10^5$ viral ppmL
Using the information from Example 2, determine the length of time it takes for the concentration of theophylline to
be the quantity given in Problems 7 to 12.
7. 5 mg/L
8. 7 mg/L
9. 12 mg/L
10. 14 mg/L
11. 14.5 mg/L
12. 14.99 mg/L
Find the amount of drug in the patient’s body, given the infusion rate and the concentration of the drug one hour
later as given in Problems 13 to 18. You should assume the patient has 5 liters of blood.
13. 10 mg/h; 1.6 mg/L

14. 12 mg/h; 1 mg/L
15. 12 mg/h; 2 mg/L

16. 20 mg/h; 1 mg/L

17. 20 mg/h; 2 mg/L
18. 20 mg/h; 3 mg/L
Rework Example 4, using all of the given information and changing only the lake size and outflow as shown in
Problems 19 to 24.
19. Lake size of 50 km3 and outflow of 23 km3 /year.


In Problems 25 to 28, set up an appropriate model to answer the given question. These problems use a special case
of the linear model in which the relative rate of change remains constant.
25. In 1990, the gross domestic product (GDP) of the United States was 5,464 million. Suppose the growth rate
from 1989 to 1990 was 5.08%. Predict the GDP in 2003. Check your answer by finding the actual 2003 GDP.
26. In 1980, the gross domestic product (GDP) of the United States in constant 1972 dollars was 1,481 million.
Suppose the growth rate from 1980 to 1984 was 2.5% per year. Predict the GDP in 2003. Check your answer
by finding the actual 2003 GDP.
27. According to the Department of Health and Human Services, the divorce rate in 1990 in the United States was
4.7% and there were 1,175,000 divorces that year. How many divorces will there be in 2004 if the divorce rate
is constant?
28. According to the Department of Health and Human Services, the marriage rate in 1990 in the United States
was 9.8% and there were 2,448,000 marriages that year. How many marriages will there be in 2004 if the
marriage rate is constant?
29. The rate at which a drug is absorbed into the blood system is given by
\[ \frac{db}{dt} = \alpha - \beta b \]
where b(t) is the concentration of the drug in the bloodstream at time t. What does b(t) approach in the long run (that is, as t → ∞)? At what time is b(t) equal to half this limiting value? Assume b(0) = 0.
30. Calculate the infusion rate in mg/h required to maintain a long-term drug concentration of 50 mg/L (i.e. the
rate of change of drug in the body equals zero when the concentration is 50mg/L). Assume the half life of the
drug is 3.2 hours and the patient has 5 liters of blood.
31. Calculate the infusion rate in mg/h required to maintain a desired drug concentration of 2 mg/L. Assume the
patient has 5.6 liters of blood and the half life of the drug is 2.7 hours.
32. Calculate the infusion rate required to achieve a desired drug concentration of 2 mg/L in 1 hour. Assume the
elimination rate constant of the drug is 5 per hour and the patient has 6 liters of blood.
33. Calculate the infusion rate required to achieve a desired drug concentration of 12 mg/L in 20 minutes. Assume
the elimination rate constant is 2 per hour and the patient has 5 liters of blood.

34. A drug is given at an infusion rate of 50 mg/h. The drug concentration determined 3 h after the start
of the infusion is 8 mg/L. Assuming the patient has 5 liters of blood, estimate the half-life of this drug.
35. A drug is given at an infusion rate of 250 mg/h. The drug concentration determined 4 h after the start of
the infusion is 50 mg/L. Assuming the patient has 5.5 liters of blood, estimate the elimination rate constant of
this drug.
36. A lake with a constant volume of 10,000 m3 is initially clean and pristine. Water flows into the lake from two
streams, Babbling Brook and Raging Rapids, at rates of 250 m3 per day and 750 m3 per day, respectively. At
time t = 0, road salt from a nearby road contaminates Babbling Brook with concentration of 2 kilograms per
m3 . Find an equation that describes the amount of salt in the lake for all t ≥ 0 and find the limiting amount
of salt in the lake.
37. After one hydrodynamic experiment a tank contains 300 liters of a dye solution with a dye concentration of 2
g/L. To prepare for the next experiment the tank is to be rinsed with water flowing in at a rate of 2 L/min,
the well-stirred solution flowing out at the same rate. Write down an equation that describes the amount of
dye in the container. Be sure to identify variables and their units.
38. At midnight, the coroner was called to the scene of the brutal murder of Casper Cooly. The coroner arrived
and noted that the air temperature was 70°F and Casper’s body temperature was 85°F. At 2 AM, he noted
that the body had cooled to 76°F. The police arrested Cooly’s business partner Tatum Twit and charged her
with the murder. She has an eyewitness who said she left the theater at 11:00 PM. Does her alibi help?
39. A cup of coffee at a coffee shop is served at 95°C and left on the counter. The coffee shop is air conditioned
with an ambient temperature of 20°C. After 5 minutes, the coffee’s temperature is 45°C. Determine how long
it takes before the coffee loses its taste quality, that is, cools down to a temperature of 22°C.
Figure 6.15: The von Bertalanffy curve fitted to age and body length
Data for female (◦, dashed line) and male (▽, solid line) polar bears Ursus maritimus captured in the Svalbard area.
40. Uranium-234 (half-life 2.48 × 10^5 yr) decays to thorium-230 (half-life 80,000 yr).
a. If U(t) and T(t) are the amounts of uranium and thorium at time t, then
$$\frac{dU}{dt} = -k_1 U \qquad \frac{dT}{dt} = -k_2 T + k_1 U$$
Solve this system of differential equations to obtain U(t) and T(t).
b. If we start with 100 g of pure U-234, how much Th-230 will there be after t = 5,000 yr?
41. The von Bertalanffy curve was used to examine growth patterns in both body length and mass of female and
male polar bears (Ursus maritimus) live-captured near Svalbard, Norway (see Figure 6.15).
“A longer growth period in males resulted in pronounced sexual dimorphism in both body length and mass.
Males were 1.16 times longer and 2.10 times heavier than females.”∗ For females, L∞ = 194 cm, k = 0.75/yr,
and t0 = −0.27 is the theoretical age at which the polar bear would have no length (L0 = 0). For males,
L∞ = 225 cm, k = 0.537/yr, and t0 = −0.395 is the theoretical age at which the polar bear would have no
length. Use the von Bertalanffy curve to determine at what age males and females achieve half of their
limiting size.
∗ A. E. Derocher and Ø. Wiig, Postnatal growth in body length and mass of polar bears (Ursus maritimus) at Svalbard, J. Zool., Lond.
(2002) 256, 343–349

6.4 Slope Fields and Euler’s Method

Not all equations are separable, many separable equations do not lead to explicit solutions, and even when you
find a solution it may be so complex that it is nearly impossible to interpret what it means. To address these issues,
we discuss a qualitative method, slope fields, and a numerical method, Euler’s method, for studying solutions of
differential equations.
Slope fields
Consider a differential equation of the form
$$\frac{dy}{dt} = f(t, y)$$
where f(t, y) denotes an expression involving t and y. Since a solution y(t) to this differential equation satisfies
y′(t) = f(t, y(t)), the slope of any solution at time t is given by the right-hand side f(t, y(t)) of the
differential equation. Equivalently, a solution through a point (t, y) is tangent to the line that passes through the
point (t, y) and has slope f(t, y). A qualitative way to investigate the behavior of solutions to dy/dt = f(t, y) is to
sketch its slope field. We introduced slope fields in Section 5.1. Recall that a slope field is a figure in the ty-plane with
short line segments of slope f(t, y) at (t, y).
There are two ways to generate slope fields: by using technology and by hand. We will
now construct a slope field for dy/dt = 1/t by hand. Since the slope at t = 1 is 1/1 = 1, we draw short line segments
at t = 1, each with slope 1, for different y-values, as shown in Figure 6.16a. If t = −3, then the slope is −1/3 and we
draw short line segments at t = −3, each with slope −1/3, also shown in Figure 6.16a. If we continue to plot these
segments for different values of t, we obtain many little slope lines. The resulting graph, shown in Figure 6.16b,
is the slope field for the equation dy/dt = 1/t. Finally, notice the relationship between the slope field for dy/dt = 1/t and
its solutions y = ln |t| + C. If we choose particular values for C, say C = 0, C = − ln 2, or C = 2, and draw these
particular antiderivatives as shown in Figure 6.16c, we notice that these particular solutions are anticipated by the
slope field drawn in part b.
a. Beginning of a slope field b. Slope field c. Particular solutions
Figure 6.16: Solution of the differential equation y′ = 1/t using a slope field
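The hand construction for dy/dt = 1/t can also be mimicked in a few lines of code. The following is a minimal sketch, not part of the original text: the helper name slope_segment and the segment length 0.4 are assumptions chosen only for illustration. It computes the endpoints of the short segment centered at each grid point, which a plotting tool could then draw.

```python
import math

def slope_segment(f, t, y, length=0.4):
    """Return the endpoints of a short segment centered at (t, y) with slope f(t, y)."""
    m = f(t, y)
    # Half-length offsets along the direction (1, m), scaled so the segment has the given length.
    dt = 0.5 * length / math.sqrt(1.0 + m * m)
    dy = m * dt
    return (t - dt, y - dy), (t + dt, y + dy)

def f(t, y):
    # Right-hand side of dy/dt = 1/t (undefined at t = 0, so t = 0 is skipped below).
    return 1.0 / t

# The slope depends only on t, so all segments in one column are parallel.
for t in (-3, -1, 1, 3):
    for y in (-1, 0, 1):
        (ta, ya), (tb, yb) = slope_segment(f, t, y)
        print(f"t={t:+d}, y={y:+d}: slope {f(t, y):+.3f}, "
              f"segment ({ta:.2f}, {ya:.2f}) to ({tb:.2f}, {yb:.2f})")
```

Because the slope here depends on t alone, each column of segments is a vertical translate of the others, which is why the solution curves y = ln |t| + C are vertical shifts of one another.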
While the slope field for dy/dt = 1/t was relatively straightforward to sketch, these sketches can be more challenging
for differential equations with a more complicated right-hand side. In the next example, we illustrate how to handle
these cases.
Example 1. Solving a differential equation using a slope field

Consider a drug that continuously infuses at a periodic rate into a patient. One possible differential equation
modeling such a scenario is
$$\frac{dy}{dt} = 10 + 10\sin t - y$$
where y is the amount of drug (in mg) and t is the time (in h). Notice that this is just a mixing problem (see
Example 2 in the previous section), where the input is now the time-dependent infusion rate 10 + 10 sin t mg/h and
the elimination rate constant is 1/h. Sketch the slope field for this differential equation, and sketch the particular
solution that satisfies y(0) = 0.
Solution. To sketch the slope field by hand, it often suffices to determine precisely where
$$\frac{dy}{dt} = 0, \qquad \frac{dy}{dt} < 0, \qquad \frac{dy}{dt} > 0$$
The set of points (t, y) for which dy/dt = 0 is called the nullcline of the differential equation. For this example, it is the
set of points in the ty-plane that satisfy 0 = dy/dt = 10 + 10 sin t − y. In other words, it is the graph of y = 10 + 10 sin t.
Everywhere along this nullcline y = 10 + 10 sin t, the slopes are 0. So along this curve, draw small line segments with
slope 0.
Next, for y > 10 + 10 sin t, we have dy/dt < 0, so above the nullcline draw line segments with negative slopes (shaded red).
Moreover, the slopes of these line segments get closer to 0 as the line segments get closer to the curve y = 10 + 10 sin t.
Finally, for y < 10 + 10 sin t, we have dy/dt > 0, so below the nullcline draw line segments with positive slopes
(shaded green). Moreover, the slopes of these line segments get closer to 0 as the line segments get closer to the
curve y = 10 + 10 sin t. This work yields a sketch similar to what is shown in Figure 6.17a.
a. Slope field for dy/dt = 10 + 10 sin t − y b. Particular solution passing through (0, 0)
Figure 6.17: Slope field and a solution for dy/dt = 10 + 10 sin t − y.
To sketch the solution satisfying y(0) = 0, we sketch a curve starting at t = 0, y = 0 that remains tangent to the
slope field. This should lead to a sketch similar to Figure 6.17b. This qualitative analysis correctly suggests that this
solution eventually exhibits well-defined oscillations. In fact, this can be verified by solving this differential equation
using integrating factors, a technique that is taught in most differential equations courses.
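For readers curious about that verification, here is a sketch of the integrating-factor calculation. It is not part of the original example; the constant C is introduced only for this derivation, and the multiplier e^t is the standard integrating factor for an equation of the form y′ + y = g(t).

```latex
\begin{aligned}
\frac{dy}{dt} + y &= 10 + 10\sin t \\
\frac{d}{dt}\left(e^{t} y\right) &= e^{t}\,(10 + 10\sin t) \\
e^{t} y &= 10e^{t} + 5e^{t}(\sin t - \cos t) + C \\
y(t) &= 10 + 5\sin t - 5\cos t + Ce^{-t}
\end{aligned}
```

The condition y(0) = 0 gives 10 − 5 + C = 0, so C = −5. As t grows, the term −5e^{−t} dies out and the solution settles into the oscillation 10 + 5 sin t − 5 cos t, in agreement with the sketch.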
Slope fields are particularly important for equations where finding explicit solutions is impossible or hideously
complicated. In the next example, the equation is separable. However, solving for the variable N requires finding
the roots of a cubic, which results in complicated expressions that shed little light on the behavior
of the model.
The next example goes beyond the logistic growth model introduced at the beginning of this chapter. The
example is named after Warder Clyde Allee, an American ecologist and one of the first to write extensively on
ecological aspects of animal aggregations. Allee argued in the 1920s that, for many populations, the per-capita
growth rate should increase rather than decrease (as in the logistic model) when the population density is low. In honor
of Allee’s work in this area, this phenomenon is called the Allee effect. Reasons for the Allee effect relate to
synergistic effects of cooperative behavior in bringing down prey (e.g. lions), improved chances of finding mates (e.g.
whales), and warding off predators (e.g. antelope).
In following the principle of parsimony, the simplest model of growth that we formulated was the linear equation
N′ = rN. Our extension to account for a finite carrying capacity led us to formulate the quadratic equation
N′ = rN(1 − N/K). Thus it should come as no surprise that including the Allee effect, while maintaining
a finite carrying capacity, leads to a cubic growth model N′ = rN(1 − N/K)(N/A − 1).
In the next example, we explore the behavior of solutions to this cubic growth model.
Example 2. The Allee effect
Consider the model
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)\left(\frac{N}{A} - 1\right)$$
where r > 0 and 0 < A < K.
a. Sketch the slope field for this equation assuming r = 1, K = 200, and A = 50.
b. Sketch solutions satisfying N (0) = 49 and N (0) = 55. What can you conclude?
Solution.
a. To sketch the slope field, we first solve for the nullclines. These correspond to the set of points in the tN-plane
for which
$$0 = \frac{dN}{dt} = N\left(1 - \frac{N}{200}\right)\left(\frac{N}{50} - 1\right)$$
Hence, the nullclines are given by the lines
N = 0, N = 50, and N = 200
in the tN-plane. Along these lines we sketch horizontal line segments.
For 0 < N < 50 and N > 200, we have dN/dt < 0 (shaded red). Hence, between the lines N = 0 and
N = 50 and above the line N = 200, we sketch line segments with negative slope. Moreover, the slope of
these line segments gets closer to zero as the line segments get closer to N = 0, N = 50, or N = 200.
For 50 < N < 200, we have dN/dt > 0 (shaded green). Hence, between the lines N = 50 and N = 200, we
draw line segments with positive slope. Moreover, the slope of these line segments gets closer to zero as
the line segments get closer to N = 50 or N = 200. This work yields a sketch similar to Figure 6.18a.
b. To sketch a solution satisfying N(0) = 49, we sketch a curve passing through the point t = 0, N = 49
that remains tangent to the slope field (that is, “go with the flow”). This curve, shown in Figure 6.18b,
starts at (0, 49) and becomes asymptotic to the t-axis.
Similarly, we sketch a solution satisfying N(0) = 55, shown in Figure 6.18b, which starts at (0, 55)
and becomes asymptotic to the line N = 200.
These solutions suggest that whenever 0 < N (0) < A, the population declines to extinction. Whenever
N (0) > A, the population converges to N = 200.
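These conclusions can be checked numerically. The sketch below is an illustration rather than part of the original solution: it steps the model forward with Euler's method (described later in this section), and the function names, the step size h = 0.01, and the end time t = 30 are all assumptions chosen for convenience.

```python
def allee_rate(N, r=1.0, K=200.0, A=50.0):
    """Right-hand side of dN/dt = r N (1 - N/K)(N/A - 1) with the values from part a."""
    return r * N * (1.0 - N / K) * (N / A - 1.0)

def simulate(N0, h=0.01, t_end=30.0):
    """Step forward from N(0) = N0 using Euler steps of size h until time t_end."""
    N = N0
    for _ in range(round(t_end / h)):
        N = N + h * allee_rate(N)
    return N

# One initial value just below the threshold A = 50 and one just above it.
for N0 in (49.0, 55.0):
    print(f"N(0) = {N0:.0f}  ->  N(30) is approximately {simulate(N0):.4f}")
```

The run starting at 49 should end very close to 0 and the run starting at 55 very close to 200, matching the two curves sketched in part b.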
Notice that in the Allee model above, the term (N/A − 1) accounts for the increase in the per-capita growth rate
with increasing density N, while the term (1 − N/K) accounts for the decrease in the per-capita growth rate with
increasing density N. The requirement 0 < A < K ensures that the term (N/A − 1) induces a
negative growth rate for 0 < N < A, while the positive but decreasing effect of (1 − N/K) comes into
play as N approaches the saturating level K from below. Also note that both N = A and N = K imply dN/dt = 0 in
this model. Thus any solution that starts at either of these two values is an equilibrium solution; that is, the solution
remains at N = A or N = K for all time.
a. Slope field for dN/dt = N(1 − N/200)(N/50 − 1) b. Particular solutions passing through (0, 49) and (0, 55)
Figure 6.18: Slope field for dN/dt = N(1 − N/200)(N/50 − 1).
As you may have noticed, while it is easy to sketch a slope field qualitatively, it would be quite tedious to construct
an accurate slope field by hand. Fortunately, today it is easy to use technology to create slope fields. Check with
your calculator to see if it can sketch slope fields. In addition, there exist many programs that create slope fields.∗
For example, the slope field and solutions for Example 2 generated using technology are shown in Figure 6.19.
Figure 6.19: Using technology to graph a slope field and particular solutions
Any model of the form dy/dt = f(y), irrespective of whether f(y) is linear (exponential model), quadratic (logistic
model), or cubic (Allee model), is called autonomous because the associated slope field is independent of time (i.e.
the function f(y) does not explicitly depend on time). We discuss these equations in much greater detail in the next
two sections. The counterparts to these autonomous models are nonautonomous models, whose growth functions
f depend explicitly on time. For example, consider the case
$$\frac{dy}{dt} = f(t)$$
in which the slope field is purely time dependent; that is, it does not depend at all on the variable y. In this case,
a solution y(t) is an antiderivative of f(t).
∗ A particularly nice one is a java script, DField, written by John C. Polking at Rice University. To try this java script, go to the
website: http://math.rice.edu/~dfield/dfpp.html

Example 3. Purely time-dependent slope field
Consider
$$\frac{dy}{dt} = -t$$
Find all solutions to this differential equation, and then sketch several members of the family of curves representing
these solutions. Finally, use technology to compare the slope field with this family of solutions.
Solution. We begin by separating the variables and integrating.
$$\begin{aligned}
\frac{dy}{dt} &= -t \\
\int dy &= -\int t\,dt \\
y(t) &= -\frac{t^2}{2} + C
\end{aligned}$$
If we sketch the solutions for a variety of C values, we obtain a family of downward-facing parabolas as illustrated
in Figure 6.20a.
a. Several members of the family of solutions b. Slope field compared to family of solutions
Figure 6.20: Comparison of solutions by hand and by technology
Now, use technology to graph the slope field for dy/dt = −t. Notice that the slope lines are tangent to the downward-facing
parabolas, as illustrated in Figure 6.20b.
Using slope fields, we can sometimes quickly answer questions about the long-term behavior of solutions to a
differential equation.
Example 4. Lake pollution revisited
A lake with constant volume 100 km^3 is fed by rivers and tributaries at a rate of 50 km^3/yr, and factories are
dumping polluted water into the lake at a rate of 2 km^3/yr. Environmental studies have shown that if the proportion
of polluted water in the well-mixed lake exceeds 2%, then it becomes a hazardous environment for the fish. Previously,
we showed (Example 4 of Section 6.3) that this scenario leads to the following equation
$$\frac{dy}{dt} = 2 - \frac{y}{2}$$
where y is the amount of polluted water (in km^3) in the well-mixed lake and t is time (in years). Sketch a slope
field for this equation and use it to find the limiting value of y as t → ∞.

Solution. We use technology to draw the slope field in Figure 6.21a. Note that the nullcline is
$$0 = 2 - \frac{y}{2} \quad\Longrightarrow\quad y = 4$$
and that the slopes above this nullcline are negative and those below are positive.
a. Slope field b. Particular solutions
Figure 6.21: Slope field and solutions
Sketches of several solutions on this slope field are shown in Figure 6.21b. Hence, as we had previously shown
analytically, the amount of polluted water in the well-mixed lake approaches a limiting value of y = 4 km^3 (or
equivalently an asymptotic concentration of 4% pollution).
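The same limit can be read off from a closed-form solution. The following is a brief sketch of that calculation, in which the initial amount y(0) is left as an unspecified constant (an assumption made here only for illustration):

```latex
\frac{dy}{dt} = -\frac{1}{2}\,(y - 4)
\quad\Longrightarrow\quad
y(t) = 4 + \bigl(y(0) - 4\bigr)e^{-t/2}
\quad\Longrightarrow\quad
\lim_{t \to \infty} y(t) = 4
```

Whatever the starting amount, the gap between y(t) and 4 shrinks by the factor e^{−t/2}, which is why every solution curve in the slope field bends toward the nullcline y = 4.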
Euler’s method
Sometimes it is not possible to solve a differential equation analytically, but we want more than
a qualitative sense of the solution. In such situations, numerical methods are important. The simplest numerical
method is Euler’s method, which roughly corresponds to sliding in short linear segments along the slope field.
Suppose we have
$$\frac{dy}{dt} = f(t, y), \qquad y(t_0) = y_0$$
where f(t, y) is some expression using the variables t and y. The key idea in Euler’s method is to increment t0 by a
small quantity h and then to use the approximation
$$\frac{y(t_0 + h) - y(t_0)}{h} \approx y'(t_0) = f(t_0, y_0)$$
which yields the well-known linear approximation
$$y(t_0 + h) \approx y(t_0) + h f(t_0, y_0) = y_0 + h f(t_0, y_0)$$
If we define
$$t_1 = t_0 + h \quad\text{and}\quad y_1 = y_0 + h f(t_0, y_0)$$
then we get the approximation
$$y(t_1) \approx y_1$$
In other words, we approximate the solution curve y = y(t) near (t0, y0) by the tangent line to the curve at
this point, as shown in Figure 6.22a, and use it to calculate the new approximation y1 for y at t1.
We then repeat this process with (t1, y1) assuming the role of (t0, y0) to obtain an approximation of
the solution y = y(t) over the interval [t1, t2], where t2 = t1 + h and
$$y_2 = y_1 + h f(t_1, y_1)$$
Continuing in this fashion, we obtain a sequence of line segments that approximates the shape of the solution curve,
as shown in Figure 6.22b, determined by the sequence of points
$$t_{i+1} = t_i + h, \qquad y_{i+1} = y_i + h f(t_i, y_i), \qquad i = 0, 1, 2, \ldots$$
a. The first Euler approximation b. Graphical representation of Euler’s method
Figure 6.22: Euler’s method
Euler’s method is illustrated in the following example.
Example 5. Euler’s method
Use Euler’s method with h = 0.1 to estimate the solution of the initial value problem
$$\frac{dy}{dt} = t + y^2, \qquad y(0) = 1$$
over the interval [0, 0.5].
Solution. Before using Euler’s method, we might first look at a graphical solution. The slope field is shown, along
with a particular solution through the point (0, 1), in Figure 6.23a.
To use Euler’s method for this example, we note
f(t, y) = t + y^2, t0 = 0, y0 = 1.

a. Graphical solution using a direction field b. Solution using Euler’s method
Figure 6.23: Comparison of graphical solution and Euler’s method solutions
Then for h = 0.1 the Euler approximation is (correct to four decimal places):
t0 = 0.0; y0 = y(0) = 1
t1 = 0.1; y1 = y0 + h f(t0, y0) = 1 + 0.1(0 + 1^2) = 1.1
t2 = 0.2; y2 = y1 + h f(t1, y1) = 1.1 + 0.1(0.1 + 1.1^2) = 1.2310
t3 = 0.3; y3 = y2 + h f(t2, y2) = 1.2310 + 0.1(0.2 + 1.2310^2) ≈ 1.4025
t4 = 0.4; y4 = y3 + h f(t3, y3) = 1.4025 + 0.1(0.3 + 1.4025^2) ≈ 1.6292
t5 = 0.5; y5 = y4 + h f(t4, y4) = 1.6292 + 0.1(0.4 + 1.6292^2) ≈ 1.9346
These points can be plotted to approximate the solution as shown in Figure 6.23b. Notice that we plotted these
points by superimposing them on the direction field.
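The recursion t_{i+1} = t_i + h, y_{i+1} = y_i + h f(t_i, y_i) translates almost line for line into code. The following is a minimal sketch (the function name euler and its argument names are assumptions for illustration, not notation from the text) that regenerates the table for this example.

```python
def euler(f, t0, y0, h, n_steps):
    """Return the Euler points (t_i, y_i), i = 0..n_steps, for dy/dt = f(t, y)."""
    points = [(t0, y0)]
    t, y = t0, y0
    for i in range(1, n_steps + 1):
        y = y + h * f(t, y)  # follow the tangent line at (t, y) for one step
        t = t0 + i * h       # recompute t from t0 to limit rounding drift
        points.append((t, y))
    return points

# Example 5: dy/dt = t + y^2 with y(0) = 1 and h = 0.1 on [0, 0.5].
points = euler(lambda t, y: t + y**2, t0=0.0, y0=1.0, h=0.1, n_steps=5)
for t, y in points:
    print(f"t = {t:.1f}   y = {y:.4f}")
```

Because the code carries full precision from one step to the next, its last digit may differ slightly from a table built from intermediate values rounded to four decimal places.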
Example 6. Comparing Euler’s method with and without technology
Use Euler’s method to solve
$$\frac{dy}{dt} = \sin \pi t - y, \qquad y(0) = 0$$
on the interval [0, 2]
a. By hand, with h = 0.5.
b. Using technology, with h = 0.1.
Solution.
a. We have f(t, y) = sin πt − y with h = 0.5, t0 = 0, and y0 = 0. Thus by Euler’s method we obtain
t1 = 0.5; y1 = y0 + h f(t0, y0) = 0 + 0.5[sin(π · 0) − 0.0] = 0
t2 = 1.0; y2 = y1 + h f(t1, y1) = 0 + 0.5[sin(π · 0.5) − 0.0] = 0.5
t3 = 1.5; y3 = y2 + h f(t2, y2) = 0.5 + 0.5[sin(π · 1.0) − 0.5] = 0.25
t4 = 2.0; y4 = y3 + h f(t3, y3) = 0.25 + 0.5[sin(π · 1.5) − 0.25] = −0.375
We plot these points in the ty-plane, and connect them with line segments, as shown in Figure 6.24a.

a. Euler approximation for h = 0.5 b. Euler approximation using technology
Figure 6.24: Euler approximations for a solution of dy/dt = sin πt − y
b. We use technology to graph a slope field, along with the solution using Euler’s method for h = 0.1, as
shown in Figure 6.24b.
As with any numerical scheme, it is important to have error bounds to determine how small h needs to be.
Although we do not discuss error bounds in this course, you can learn about them in any introductory numerical
analysis course. If the value selected for h is too big, the approximate solutions can be wildly off, as the
following example illustrates.
Example 7. Effect of the choice of h
Consider the logistic equation
$$\frac{dy}{dt} = 30y\left(1 - \frac{4y}{3}\right), \qquad y(0) = 0.1$$
which has solution (using separation of variables)
$$y(t) = \frac{3e^{30t}}{26 + 4e^{30t}}$$
Compare the plots on [0, 5] of the numerical and actual solutions for the given values of h.
a. h = 0.1
b. h = 0.08
c. h = 0.05
Solution.
a. Using Euler’s method for h = 0.1 on [0, 5] yields 50 values for t and y. Plotting these values in the
ty-plane yields the black curve shown in Figure 6.25a. The actual solution is shown in red. As we can
see, the numerical solution behaves quite wildly.
b. Using Euler’s method for h = 0.08 on [0, 5] yields about 60 values for t and y. These values (black) are
compared with the actual solution (red) in Figure 6.25b. As you can see, the numerical solution is not a
good approximation, even though it is not quite as wild as that shown in part a.
a. h = 0.1 b. h = 0.08 c. h = 0.05
Figure 6.25: Euler approximations for the solution of dy/dt = 30y(1 − 4y/3)
c. Using Euler’s method for h = 0.05 on [0, 5] yields 100 values for t and y. These values (black) are
compared with the actual solution (red) in Figure 6.25c. As we can see here, the numerical and actual
solutions are virtually indistinguishable.
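The three cases can be compared without plotting. The sketch below is an illustration under stated assumptions: the helper names exact and tail_error are hypothetical, and the choice to measure the error only for t ≥ 1 (after the exact solution has essentially reached its limiting value 3/4) is made here for convenience, not taken from the text.

```python
import math

def exact(t):
    """Exact solution y(t) = 3e^{30t} / (26 + 4e^{30t}) given in Example 7."""
    e = math.exp(30.0 * t)
    return 3.0 * e / (26.0 + 4.0 * e)

def tail_error(h, t_start=1.0, t_end=5.0, y0=0.1):
    """Largest gap between Euler iterates and the exact solution for t_start <= t <= t_end."""
    y, worst = y0, 0.0
    for i in range(1, round(t_end / h) + 1):
        y = y + h * 30.0 * y * (1.0 - 4.0 * y / 3.0)  # one Euler step
        if i * h >= t_start:
            worst = max(worst, abs(y - exact(i * h)))
    return worst

for h in (0.1, 0.08, 0.05):
    print(f"h = {h}: largest error for t >= 1 is about {tail_error(h):.2e}")
```

For h = 0.1 the Euler update simplifies algebraically to y_{i+1} = 4y_i(1 − y_i), a map known for its erratic iterates, which is consistent with the wild behavior seen in part a.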
Problem Set 6.4

Sketch at least three particular solutions for each of the slope fields shown in Problems 1 to 6.
1.
2.

3.
4.
5.
6.

Sketch a solution satisfying the specified initial conditions over the slope field in Problems 7 to 10.
7. y(0) = 0.3
8. y(0) = 2
9. y(6) = 0
10. y(0.75) = 0

11. Match the following four equations with the four slope fields.
a. dy/dt = sin t
b. dy/dt = t sin y
c. dy/dt = sin y
d. dy/dt = y sin t
GRAPH A   GRAPH B   GRAPH C   GRAPH D
12. Match the following four equations with the four slope fields.
a. dy/dt = y(1 − y)(1 + y)
b. dy/dt = sin t
c. dy/dt = sin(t + y)
d. dy/dt = t/10 + y
GRAPH E   GRAPH F   GRAPH G   GRAPH H
Sketch the slope fields and sketch a few solutions for the differential equations given in Problems 13 to 18.

13. dy/dt = y(4 − y)(y − 2)
14. dy/dt = t^2 − y
15. dy/dt = sin t
16. dy/dt = y^2 + t^2 − 1
17. dy/dx = −x/y
18. dy/dx = e^(x+y)
Sketch the slope fields and the solution passing through the specified point for the differential equations given in
Problems 19 to 24.
19. dy/dt = t^2 − y^2, (t, y) = (0, 0)
20. dy/dt = 1.5y(1 − y), (t, y) = (0, 0.1)
21. dy/dt = √(y/t), (t, y) = (4, 1)
22. dy/dt = y^2 √t, (t, y) = (9, −1)
23. dN/dt = 0.1N/(1 + 0.01N) − 0.01N − 4, (t, N) = (0, 90) and (t, N) = (0, 110)
24. dz/dt = 4(z − z^3), (t, z) = (0, 0) and (t, z) = (0, 0.1)
Estimate a solution for Problems 25 to 28 using Euler’s method. For each of these problems, a slope field is given
with actual solution. Superimpose the segments from Euler’s method on the given slope field and assess how well your
solution approximates the actual solution as drawn.
25. dy/dt = t/y − t passing through (0, 4) for 0 ≤ t ≤ 7, h = 1
26. dy/dt = t/y + t/4 − 2 passing through (0, 5) for 0 ≤ t ≤ 4, h = 1

27. dy/dt = 2t(y − t^2) passing through (0, 1) for 0 ≤ t ≤ 3, h = 0.5
28. dy/dt = (4t − 2ty)/(1 + t^2) passing through (0, 1) for 0 ≤ t ≤ 5, h = 0.5
Use Euler’s method to approximate the solution to y ′ (t) = f (t, y) and sketch the approximate solution in Problems
29 to 32 over the specified interval.
29. Over the interval 0 ≤ t ≤ 2 with f (t, y) = (4 − y)(y + 2), y(0) = 0.1, h = 0.5.
30. Over the interval 0 ≤ t ≤ 1 with f (t, y) = y − t, y(0) = 2, h = 0.2.
31. Over the interval 1 ≤ t ≤ 3 with f (t, y) = sin πt − 2y, y(1) = 0, h = 0.5.
32. Over the interval −1 ≤ t ≤ 0 with f (t, y) = (4 − y)(y + 2), y(−1) = 0, h = 0.25.
33. Consider the differential equation
$$\frac{dy}{dt} = \frac{1}{t}$$
a. Verify that y(t) = ln t is a solution to this differential equation satisfying y(1) = 0.
b. Use Euler’s method to approximate y(2) = ln 2 with h = 0.5.
34. Consider the differential equation
$$\frac{dy}{dt} = e^t$$
a. Verify that y(t) = e^t is a solution to this differential equation satisfying y(0) = 1.
b. Use Euler’s method to approximate y(1) = e with h = 0.2.
35. A patient receives a continuous drug infusion of 100 mg/h. The half-life of the drug is 2 hours.
a. Write a differential equation for the amount of drug in the body. (Hint: Review Example 2 in Section
6.3.)

b. Sketch the slope field for this differential equation.

c. Determine the limiting amount of the drug in the patient’s body.
36. A patient receives a continuous drug infusion of 50 mg/h. The half-life of the drug is 1 hour.
a. Write a differential equation for the amount of drug in the body. (Hint: Review Example 2 in Section
6.3.)
b. Sketch the slope field for this differential equation.
c. Determine the limiting amount of the drug in the patient’s body.
37. A population subject to seasonal fluctuations can be described by the logistic equation with an oscillating
carrying capacity. Consider, for example,
$$\frac{dP}{dt} = P\left(1 - \frac{P}{100 + 50\sin 2\pi t}\right)$$
While it is difficult to solve this differential equation, it is easy to obtain a qualitative understanding.
a. Sketch a slope field over the region 0 ≤ t ≤ 5 and 0 ≤ P ≤ 200.
b. Sketch solutions that satisfy P(0) = 0, P(0) = 10, and P(0) = 200.
c. Use technology to obtain a better rendition of the slope field and solutions.
d. Comment on your solutions and compare your work using the different methods.
38. The velocity v(t) of a skydiver is governed by the equation
$$m\frac{dv}{dt} = mg - kv^2$$
where m is the mass of the skydiver, g is gravitational acceleration, and k is a damping constant (i.e., it
accounts for air friction).
a. Sketch the slope field for this equation assuming that m = 70 kg, g = 9.8 m/s^2, and k = 110 kg/s.
b. Using the slope field, determine the value of lim_{t→∞} v(t) for the solution v(t) satisfying v(0) = 0. Note
that this limiting value is known as the terminal velocity.
39. In this problem, we consider an autocatalytic chemical reaction involving two molecules, A and B. Let a denote
the concentration of A and assume that the concentration b of B remains constant throughout the experiment
(e.g. B is added to the mixture in such a way as to keep b constant). If A combines with a molecule of B to form
two molecules of A and, in a backward reaction, two molecules of A form a molecule of A and a molecule of B, then
$$\frac{da}{dt} = k_1 ab - k_2 a^2$$
where k1 and k2 are positive rate constants.
a. Sketch the slope field for this equation for the case k1 = 1, b = 1, k2 = 0.5.
b. For the cases a(0) = 0.2 and a(0) = 3, sketch the solutions and determine the value of lim_{t→∞} a(t).
40. A population in the absence of harvesting exhibits the following growth
$$\frac{dN}{dt} = N\left(\frac{N}{100} - 1\right)\left(1 - \frac{N}{1{,}000}\right)$$
where N is abundance and t is time in years.
a. Write an equation that corresponds to harvesting the population at a rate of 0.5% per day.
b. Sketch the slope field for the differential equation you found in part a and, by sketching solutions, describe how
the fate of the population depends on its initial abundance.

6.5 Phase Lines and Classifying Equilibria

In this section and the next, we focus on autonomous (independent of time) differential equations
$$\frac{dy}{dt} = f(y)$$
In the previous section, we noted that the slope field for an autonomous differential equation is time-independent.
Since each vertical line in the slope field contains all the information about the slopes, the slope field contains an
infinite amount of redundancy. In this section, we trim off this redundancy using phase lines and discuss classifying
equilibria, the y-values for which f(y) = 0.
Phase lines
In the last section, we sketched slope fields by determining where the slope is zero (nullcline), and where it is positive
and where it is negative. In this section, we consider a phase-line diagram that collapses the two-dimensional slope
field to the y-axis without losing any information regarding the qualitative behavior of solutions to the differential
equation dy/dt = f(y) (e.g. see Figure 6.26). The following procedure creates a phase line.
Phase Lines
To draw a phase line for dy/dt = f(y),
Step 1. Draw a vertical line corresponding to the y-axis.
Step 2. Draw solid circles on this line corresponding to the equilibria of dy/dt = f(y),
that is, the y-values where f(y) = 0.
Step 3. Draw an upward arrow on intervals where f(y) > 0. On these intervals,
solutions of the differential equation are increasing.
Step 4. Draw a downward arrow on intervals where f(y) < 0. On these intervals,
the solutions of the differential equation are decreasing.
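The four steps above can be sketched in code. This is only an illustration: the function name phase_line, the parameter pad, and the text labels "up" and "down" are assumptions, and the caller is assumed to supply every equilibrium of f. The sketch tests the sign of f at one point inside each interval cut out by the equilibria.

```python
def phase_line(f, equilibria, pad=1.0):
    """Return (interval, arrow) pairs for dy/dt = f(y), listed from bottom to top.

    `equilibria` is assumed to contain every root of f; `pad` sets how far beyond
    the outermost equilibria the two unbounded intervals are sampled.
    """
    eq = sorted(equilibria)
    # One test point below the lowest equilibrium, one per gap, one above the highest.
    tests = [eq[0] - pad] + [(a + b) / 2 for a, b in zip(eq, eq[1:])] + [eq[-1] + pad]
    bounds = [float("-inf")] + eq + [float("inf")]
    result = []
    for lo, hi, y in zip(bounds, bounds[1:], tests):
        arrow = "up" if f(y) > 0 else "down"  # Steps 3 and 4: sign of f decides the arrow
        result.append(((lo, hi), arrow))
    return result

# Logistic equation dy/dt = y(1 - y), whose equilibria are y = 0 and y = 1.
for interval, arrow in phase_line(lambda y: y * (1 - y), [0.0, 1.0]):
    print(interval, arrow)
```

For the logistic equation this reports a downward arrow below y = 0, an upward arrow between 0 and 1, and a downward arrow above y = 1.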
Example 1. Phase lines for clonal genotypes
Consider two clonally reproducing lines of the same species (i.e. individuals replicate themselves rather than
reproducing sexually) exhibiting two genotypes a and A whose per-capita growth rates are ra and rA, respectively.
Suppose these two clonal lines are growing together in the same population and let y denote the proportion of genotype
a in this population. It is left as an exercise (see Problem 39) to show that the variable y satisfies the equation
$$\frac{dy}{dt} = (r_a - r_A)\,y(1 - y).$$
a. Draw the phase line for this equation when ra > rA .

b. Draw the phase line for this equation when ra < rA .
c. Discuss why this makes sense.
Solution.
a. Begin by drawing the y-axis. The equilibria are determined by the solutions of
0 = (ra − rA )y(1 − y)

Figure 6.26: An illustration of how the three qualitatively different solution zones y < 0, 0 < y < 1, and y > 1,
separated by the two equilibrium solutions y = 0 and y = 1 associated with the logistic equation dy/dt = y(1 − y), can
be collapsed onto the y-axis by removing (or projecting down) the time axis t.
Since the equilibria are y = 0 and y = 1, we draw solid circles on the y-axis at these y-values. Since
ra > rA, we have dy/dt > 0 for 0 < y < 1 and we draw an upward arrow on this interval. Since dy/dt < 0 for
y > 1 and y < 0, we draw downward arrows on these intervals. This results in the phase line illustrated
in Figure 6.27a.
Figure 6.27: Phase lines for dy/dt = (ra − rA) y(1 − y), with equilibria marked at y = 0 and y = 1. Panel a: ra > rA. Panel b: ra < rA.
b. Again begin by drawing the y-axis. The equilibria are determined, as before, by the solutions of
\[ 0 = (r_a - r_A)\,y(1 - y) \]
Since the equilibria are y = 0 and y = 1, we draw solid circles on the y-axis at these y-values. Since
ra < rA, we have dy/dt < 0 for 0 < y < 1 and we draw a downward arrow on this interval. Since dy/dt > 0 for
y > 1 and y < 0, we draw upward arrows on these intervals. This results in the phase line illustrated in
Figure 6.27b.
c. If the per-capita growth rate of genotype a is greater than the per-capita growth rate of genotype A, then
we would expect genotype a to become more and more prevalent in the population. Hence, provided that
y > 0 initially, y approaches 1 as seen in the phase line for part a. Conversely, if the per-capita growth
rate of genotype a is less than the per-capita growth rate of genotype A, then we would expect a to become
less and less prevalent in the population. Hence, provided that y < 1 initially, y should approach 0 as seen in the phase line for part b.

In the last example, we found the phase lines from an equation, but sometimes we have a graph (or data leading
to a graph) and not an equation. The next example shows us how to find the phase lines in such a case.
Example 2. From graphs to phase lines to solutions
Let the graph of f (y) be as shown in Figure 6.28.
Figure 6.28: Graph of dy/dt = f(y)
a. Draw a phase line for dy/dt = f(y).
b. Sketch solutions for this differential equation that satisfy y(0) = −1.1, y(0) = 1.1, and y(0) = 0.9.
Solution.
a. Since the graph of f (y) intersects the y-axis at the points −2, −1, 1, and 2, these y-values are the
equilibria of y ′ = f (y). We draw solid circles at these points of the phase line. Since f (y) > 0 on the
intervals (−∞, −2) and (1, 2), we draw upward arrows on these intervals, as shown in Figure 6.29a. For
all the other intervals, (−2, −1), (−1, 1), and (2, ∞), we draw downward arrows.
Figure 6.29: Phase line and solutions for given graph. Panel a: phase line with equilibria at y = −2, −1, 1, and 2. Panel b: solutions to the differential equation, plotted against t.
b. According to the phase line, a solution initiated at y = −1.1 initially decreases slowly (as it is near
the equilibrium y = −1), then decreases more rapidly, and numerical or analytical methods can be used to show
that this solution asymptotes to the equilibrium y = −2. A solution initiated at y = 1.1 initially
increases slowly, then increases more rapidly, and numerical or analytical methods can be used to show that
this solution asymptotes to the equilibrium y = 2. A solution initiated at y = 0.9 initially decreases slowly, then decreases more rapidly,
and numerical or analytical methods can be used to show that this solution asymptotes to the equilibrium y = −1. These solutions
are shown in Figure 6.29b.
The equation dy/dt = (ra − rA) y(1 − y) in Example 1 has a special name in the context of evolutionary game
theory. It is called the replicator equation. Evolutionary game theory was developed in the 1970s by the
eminent theoretical evolutionary biologist John Maynard Smith (1920-2004). (For more information about one of
the world's greatest evolutionary biologists, see the HISTORICAL QUEST in the problem set.) Perhaps the best known
of his games is the Hawk-Dove game, which describes the conditions under which non-aggressive behaviors can persist in
a population.
In general, for any two inherited contrasting strategies, the growth rates ra and rA for genotype a (e.g. hawks) and
genotype A (e.g. doves) respectively in the replicator equation are constructed from a two-by-two table. This table
is known as the pay-off matrix and it tells us how much payoff (benefit if positive, cost if negative) an individual gets
following a pairwise interaction with another individual. The payoff to an a individual when it meets another a individual is denoted by Paa. Similarly,
we use PaA, PAa, and PAA to denote the payoff to a when a meets A, to A when A meets a, and to A when A meets A, respectively. We
summarize this information in Table 6.3.
Table 6.3: Payoff Matrix

                               Type a      Type A
Type a (proportion y)          Paa         PaA
Type A (proportion 1 − y)      PAa         PAA
To determine the per-capita (i.e. proportional) growth rate of a genotype, we find its expected payoff: for each
strategy an opponent might play, we multiply the chance of meeting an individual playing that strategy by the corresponding
payoff, and then add the results.
For genotype a, this expected payoff is
\[ r_a = y P_{aa} + (1 - y) P_{aA} \]
and for genotype A the expected payoff is
\[ r_A = y P_{Aa} + (1 - y) P_{AA} \]
Substituting these expressions for ra and rA into dy/dt = y(1 − y)(ra − rA), we get the two-strategy replicator equations.
Two-Strategy Replicator Equations
The replicator equation describing the proportion y(t) of genotype a in a population of genotypes a and A,
each playing a different strategy with payoffs Pij (i, j = a, A), is
\[ \frac{dy}{dt} = y(1 - y)(r_a - r_A) = y(1 - y)\left[P_{aA} - P_{AA} + y\,(P_{aa} + P_{AA} - P_{aA} - P_{Aa})\right] \]
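As a check on the algebra in the box, the two forms of the right-hand side can be compared numerically. This is a minimal sketch: the function names and the payoff values are hypothetical and chosen only for illustration.

```python
def replicator_rhs(y, P_aa, P_aA, P_Aa, P_AA):
    """dy/dt for the two-strategy replicator equation, y = proportion of type a."""
    r_a = y * P_aa + (1 - y) * P_aA   # expected payoff to type a
    r_A = y * P_Aa + (1 - y) * P_AA   # expected payoff to type A
    return y * (1 - y) * (r_a - r_A)


def replicator_rhs_expanded(y, P_aa, P_aA, P_Aa, P_AA):
    """The same quantity written in the expanded form."""
    return y * (1 - y) * (P_aA - P_AA + y * (P_aa + P_AA - P_aA - P_Aa))


payoffs = (2.0, 1.0, 3.0, 0.5)  # hypothetical payoffs (Paa, PaA, PAa, PAA)
for y in (0.0, 0.25, 0.5, 0.9, 1.0):
    print(y, replicator_rhs(y, *payoffs), replicator_rhs_expanded(y, *payoffs))
```

The two columns should agree up to floating-point rounding for any payoffs, since the second form is just the first with ra − rA expanded.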
The following assumptions are made regarding the Hawk-Dove game: a population of individuals competes for
a limiting "resource" such as mates, food, or shelter. To win this resource, individuals engage in pairwise contests
and play one of two strategies, hawk or dove. Individuals playing the hawk strategy constantly escalate the intensity
of the contest until they either get the resource or get injured. Individuals playing the dove strategy leave
the contest whenever their opponent escalates the conflict. We consider this game in the next two examples.
Example 3. The Hawk-Dove replicator equation

Suppose a hawk gets a payoff of V > 0 every time it meets a dove and the dove gets 0. Further, every time two
doves meet they share the payoff V, while if two hawks meet they escalate the contest until one gets the net payoff
V and the other pays a cost C > 0. What are the payoff matrix entries for this contest, and what is the replicator equation
that describes the frequency of doves in the population?
Solution. Let a denote doves and A denote hawks. In this game, the payoffs are:
\[ P_{aa} = \frac{V}{2}, \quad P_{aA} = 0, \quad P_{Aa} = V, \quad \text{and} \quad P_{AA} = \frac{V - C}{2} \]
This last value represents what the average hawk obtains in a hawk-hawk encounter. If we now substitute these
values in the two-strategy replicator equation we obtain
\[ \frac{dy}{dt} = y(1 - y)(r_a - r_A) \]
\[ = y(1 - y)\left[0 - \frac{V - C}{2} + y\left(\frac{V}{2} + \frac{V - C}{2} - 0 - V\right)\right] \]
\[ = y(1 - y)\left[\frac{C - V}{2} - \frac{Cy}{2}\right] \]
Example 4. Dynamics of a Hawk-Dove game
Consider a Hawk-Dove game with a “payoff” of 2 and a cost of 3. Sketch the phase line and then discuss the
evolutionary implications.
Solution. When V = 2 and C = 3, we obtain from the previous example the specific replicator equation
\[ \frac{dy}{dt} = y(1 - y)\left(\frac{1}{2} - \frac{3y}{2}\right) \]
The equilibrium solutions are the values for which dy/dt = 0:
y = 0, y = 1, y = 1/3
For 0 < y < 1/3, dy/dt > 0. To see this, choose a value in the interval, say y = 1/6, and calculate
\[ \frac{dy}{dt} = \frac{1}{6}\left(1 - \frac{1}{6}\right)\left(\frac{1}{2} - \frac{1/2}{2}\right) > 0 \]
For 1/3 < y < 1, dy/dt < 0. To see this, choose a representative value, say y = 1/2, and calculate
\[ \frac{dy}{dt} = \frac{1}{2}\left(1 - \frac{1}{2}\right)\left(\frac{1}{2} - \frac{3/2}{2}\right) < 0 \]
Finally, for y > 1, dy/dt > 0. To see this, choose some representative value, say y = 2. Then,
\[ \frac{dy}{dt} = 2(1 - 2)\left(\frac{1}{2} - \frac{6}{2}\right) > 0 \]
The phase line is shown in Figure 6.30a.

The phase line implies that if both hawks and doves are present initially, then the population approaches an equilibrium
consisting of 1/3 doves and 2/3 hawks. The approach to this equilibrium is illustrated in Figure 6.30b.
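The approach to the mixed equilibrium can also be explored numerically. The sketch below uses the forward Euler method, which is a choice made here for simplicity and is not necessarily how the figure was produced; the step size dt and the initial values are illustrative assumptions.

```python
def hawk_dove_rhs(y, V=2.0, C=3.0):
    """dy/dt for the dove frequency y: y(1 - y)[(C - V)/2 - C*y/2]."""
    return y * (1 - y) * ((C - V) / 2 - C * y / 2)


def euler(f, y0, t_end=20.0, dt=0.01):
    """Forward Euler approximation of y(t_end) for dy/dt = f(y), y(0) = y0."""
    y = y0
    for _ in range(int(round(t_end / dt))):
        y += dt * f(y)
    return y


# Hypothetical initial dove frequencies on either side of y = 1/3
for y0 in (0.05, 0.5, 0.95):
    print(y0, round(euler(hawk_dove_rhs, y0), 4))
```

Each run should end close to 1/3, whether the initial dove frequency is above or below that value.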

Figure 6.30: Phase line and solutions to a Hawk-Dove game. Panel a: phase line with equilibria at y = 0, y = 1/3, and y = 1. Panel b: solution to the differential equation, with y plotted against t for 0 ≤ t ≤ 20.
The equilibrium y = 1/3 in Example 4 supports multiple strategies in the population. Such an equilibrium is called a
polymorphic equilibrium. We can understand the growth rates of hawks and doves at low frequencies as follows.
Imagine the population consists mainly of doves and only a few hawks. Individuals are most likely to have a contest
with a dove. For a dove, this means an average payoff of V/2 = 2 × 1/2 = 1. For a hawk, this means
a payoff of V = 2. Therefore, the hawk numbers would grow at twice the rate of doves. Alternatively,
consider a population consisting mostly of hawks and only a few doves. An individual engaging in a contest is most
likely to encounter a hawk. A hawk, on average, gets a payoff of (V − C)/2 = (2 − 3)/2 = −1/2. A dove gets a
payoff of 0. So hawk frequency will decline.
Classifying Equilibria
When a system starts at an equilibrium, it remains there for all time. However, in the real world, biological systems
are constantly subject to environmental perturbations (small changes). Thus, if a system starting at equilibrium
is slightly perturbed from equilibrium, we need to ask whether it tends to return to the equilibrium or not. When
the system tends to return to the equilibrium, we call the equilibrium stable. Otherwise, we call it unstable. More
precisely, we make the following definitions.
Classification of Equilibria
An equilibrium y* for dy/dt = f(y) is classified as follows:
Stable: f(y) > 0 for all y < y* near y* and f(y) < 0 for all y > y* near y*. Solutions initiated near the equilibrium tend toward the equilibrium in forward time (i.e. as t → ∞).
Unstable: f(y) < 0 for all y < y* near y* and f(y) > 0 for all y > y* near y*. Solutions initiated near the equilibrium tend toward the equilibrium in backward time (i.e. as t → −∞).
Semi-stable: Either f(y) < 0 for all y ≠ y* near y* or f(y) > 0 for all y ≠ y* near y*. Solutions initiated near one side (resp. the other side) of the equilibrium tend toward the equilibrium in backward (resp. forward) time.
Graphical depictions of these definitions are provided in Figure 6.31.
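The sign conditions in these definitions translate directly into a small test. This is a sketch under the assumption that the offset delta (a hypothetical parameter) is small enough that no other equilibrium lies within delta of y*.

```python
def classify_by_sign(f, y_star, delta=1e-3):
    """Classify an equilibrium of dy/dt = f(y) from the sign of f on either side."""
    left, right = f(y_star - delta), f(y_star + delta)
    if left > 0 and right < 0:
        return "stable"
    if left < 0 and right > 0:
        return "unstable"
    if left * right > 0:
        return "semi-stable"
    return "undetermined"  # f vanished at a test point; try another delta


logistic = lambda y: y * (1 - y)
print(classify_by_sign(logistic, 0))
print(classify_by_sign(logistic, 1))
print(classify_by_sign(lambda y: y ** 2, 0))
```

For the logistic right-hand side y(1 − y) this reports y = 0 as unstable and y = 1 as stable, while y = 0 for dy/dt = y^2 is reported as semi-stable.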
Example 5. Classifying equilibria
Classify the equilibria for dy/dt = f(y), where the graph of f(y) is the graph given in Figure 6.28 in Example 2, which
we repeat here for convenience.
Figure 6.31: Graphical characterization of classifying equilibria (panels, left to right: stable, unstable, semi-stable)
[Graph of f(y) repeated from Figure 6.28: horizontal axis y, crossing zero at y = −2, −1, 1, and 2]
Solution. Previously, we sketched the phase line for dy/dt = f(y) and found four equilibria: y = −2, y = −1, y = 1, and
y = 2. From the phase line sketch in Figure 6.29 of Example 2, we classify the equilibria as follows: y = −2 and
y = 2 are stable, y = 1 is unstable, and y = −1 is semi-stable.
Example 6. Membrane potential
The voltage V across the membrane of a neuron is maintained by voltage-gated (i.e. controlled) protein channels
embedded in the cell membrane. These channels regulate the flow of positively charged potassium ions and negatively
charged organic molecules out of the cell, and negatively charged chloride ions and positively charged sodium ions
into the cell. If the membrane is perturbed from its resting potential V0 by a small input current (e.g. coming from
another neuron), it will return to its resting potential. If this perturbing current, however, is sufficiently large to
cause V(t) to rise above a critical threshold level Vc, then the sodium ions flow across the membrane until the
voltage stabilizes at a new depolarized equilibrium level Vd. Show that the model
\[ \frac{dV}{dt} = -k(V - V_0)(V - V_c)(V - V_d) \]
exhibits these characteristics by finding and classifying its equilibria for the values V0 = −70 mV, Vc = −30 mV,
Vd = 55 mV and k = 1.
Solution. For the constants in question, the right-hand side of the equation is
\[ f(V) = -(V + 70)(V + 30)(V - 55) \]
This function is a cubic in the variable V with roots at V = −70, −30, and 55. The graph of this cubic is given by
[Graph of f(V): horizontal axis V from −80 to 60, vertical axis f(V) in units of 10^5]
From the graph, we see that f(V) > 0 for V < −70 and for −30 < V < 55, while f(V) < 0 for −70 < V < −30 and for V > 55.
Hence V0 = −70 mV and Vd = 55 mV are stable as required, and Vc = −30 mV is
unstable, also as required.
Linearization
An analytical approach to classifying equilibria involves linearizing about the equilibria. Suppose y* is an equilibrium
for dy/dt = f(y). Consider a = f′(y*). Since f(y*) = 0, a linear approximation to f(y) for y near y* is given by
f(y*) + f′(y*)(y − y*) = a(y − y*). Hence,
\[ \frac{dy}{dt} \approx a(y - y^*) \]
for y values near y*. As you are asked to show in Problem 37, the solution to
\[ \frac{dy}{dt} = a(y - y^*) \]
satisfying y(0) = y0 is
\[ y(t) = (y_0 - y^*)e^{at} + y^*. \]
We can use this solution as a first-order approximation for the solution to dy/dt = f(y) satisfying y(0) = y0. This
approximation remains reasonable provided y(t) remains near y*. Using this
approximation, one can prove the following theorem.
Theorem 6.1. Linearization
Let dy/dt = f(y) have an equilibrium at y = y*.
• If f′(y*) < 0, then y* is stable.
• If f′(y*) > 0, then y* is unstable.
• If f′(y*) = 0, then no conclusion is possible without looking at higher-order derivatives.
Informally, the result follows from the fact that
\[ y(t) \approx (y_0 - y^*)e^{at} + y^* \quad \text{where } a = f'(y^*) \]
implies that the distance of a solution from y* changes approximately like $e^{at}$. Thus when
a = f′(y*) < 0 solutions starting near y* move toward y*, and when a = f′(y*) > 0 solutions starting near y* move
away from y*.
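The theorem can be applied numerically when differentiating by hand is inconvenient. The following sketch estimates f′(y*) with a central difference; the step h and the tolerance tol are hypothetical parameters, and a derivative smaller than tol in magnitude is treated as zero. It is illustrated on the membrane model of Example 6.

```python
def derivative(f, y, h=1e-5):
    """Central-difference estimate of f'(y)."""
    return (f(y + h) - f(y - h)) / (2 * h)


def classify_by_linearization(f, y_star, tol=1e-8):
    """Apply the linearization theorem using a numerical derivative."""
    a = derivative(f, y_star)
    if abs(a) <= tol:
        return "inconclusive"   # f'(y*) is (numerically) zero
    return "stable" if a < 0 else "unstable"


# Membrane model of Example 6 with V0 = -70, Vc = -30, Vd = 55, k = 1
f = lambda V: -(V + 70) * (V + 30) * (V - 55)
for V_star in (-70, -30, 55):
    print(V_star, classify_by_linearization(f, V_star))
```

The results should agree with the graphical classification in Example 6. For an equation such as dy/dt = y^3, the estimated derivative at 0 falls below the tolerance and the test is reported as inconclusive.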
Example 7. Population resilience

Consider two populations whose dynamics are described by

\[ \frac{dN}{dt} = N\left(1 - \frac{N}{10{,}000}\right) \quad \text{and} \quad \frac{dP}{dt} = 0.5\,P\left(1 - \frac{P}{10{,}000}\right) \]
a. Find the equilibria and use linearization to classify.
b. Describe in what ways the populations are similar and dissimilar.
Solution.
a. For both populations, the equilibria are given by 0 and 10,000. For the first model, set f(N) = N(1 − N/10,000). We find
\[ f'(N) = 1 - \frac{N}{10{,}000} - \frac{N}{10{,}000} = 1 - \frac{N}{5{,}000} \]
Checking the equilibria:
Equilibrium      Evaluate               Classification
N = 0            f′(0) = 1              unstable
N = 10,000       f′(10,000) = −1        stable
For the second model, set g(P) = 0.5 P(1 − P/10,000), so that g′(P) = 0.5 − P/10,000. We find
Equilibrium      Evaluate               Classification
P = 0            g′(0) = 1/2            unstable
P = 10,000       g′(10,000) = −1/2      stable
b. The populations are similar in that both have equilibria at 0 and 10,000, which are unstable
and stable, respectively. Hence, both populations tend to approach the equilibrium value of
10,000.
The populations differ in that P (the second model) tends to grow less rapidly at low densities; i.e.
g′(0) = 1/2 < f′(0) = 1. Moreover, if the populations are at the equilibrium of 10,000, the P population
recovers less rapidly from a perturbation; i.e. g′(10,000) = −1/2 > f′(10,000) = −1.
When one population recovers more rapidly from environmental perturbations than another population (as with
the N population versus the P population in Example 7), it is said to be more resilient.
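One way to make resilience concrete is to ask how long a small perturbation takes to shrink by half. Under the linear approximation the perturbation decays like $e^{at}$ with a = f′(y*) < 0, so this time is ln 2 / |a|. The sketch below computes it and cross-checks with a forward Euler simulation; the function names, the initial value 10,100, and the step size are illustrative assumptions.

```python
import math


def half_recovery_time(a):
    """Time for a small perturbation to halve when f'(y*) = a < 0, from e^(a t) = 1/2."""
    return math.log(2) / abs(a)


def deviation_after(rate, y0=10100.0, K=10000.0, t_end=2.0, dt=0.001):
    """Forward Euler for dy/dt = rate * y * (1 - y/K); returns |y(t_end) - K|."""
    y = y0
    for _ in range(int(round(t_end / dt))):
        y += dt * rate * y * (1 - y / K)
    return abs(y - K)


print("half-recovery time, N:", half_recovery_time(-1.0))
print("half-recovery time, P:", half_recovery_time(-0.5))
print("deviation at t = 2, N:", deviation_after(1.0))
print("deviation at t = 2, P:", deviation_after(0.5))
```

The N population halves its deviation in half the time needed by the P population, and the simulated deviation at t = 2 is correspondingly smaller for N.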
Example 8. Hawk-Dove game revisited
Consider the Hawk-Dove game
\[ \frac{dy}{dt} = y(1 - y)\left(\frac{1}{2} - \frac{3y}{2}\right) \]
where y is the frequency of doves in the population.
a. Use linearization to classify each of the equilibria.
b. Use your work from part a to determine whether the hawks increase more rapidly at low frequencies or
the doves increase more rapidly at low frequencies.
Solution.

a. Let
\[ f(y) = y(1 - y)\left(\frac{1}{2} - \frac{3y}{2}\right) \]
As we have seen, the equilibria are y = 0, y = 1, and y = 1/3. To linearize, we need the derivative:
\[ f'(y) = (1 - y)\left(\frac{1}{2} - \frac{3y}{2}\right) - y\left(\frac{1}{2} - \frac{3y}{2}\right) - \frac{3}{2}\,y(1 - y) \]
Evaluated at y = 0, we obtain
\[ f'(0) = \frac{1}{2} > 0 \]
Hence, the equilibrium y = 0 is unstable. Since
\[ f'(1) = 1 > 0, \]
the equilibrium y = 1 is unstable. Since
\[ f'\left(\tfrac{1}{3}\right) = -\frac{1}{3} < 0, \]
the equilibrium y = 1/3 is stable.
b. Since f′(1) = 1 > f′(0) = 1/2, we see that hawks at low frequency increase more rapidly than doves at low
frequency.
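The three derivative values in Example 8 can be confirmed with exact rational arithmetic. This is a small check, assuming the product-rule expression for f′(y) given in the solution; the function name f_prime is an illustrative choice.

```python
from fractions import Fraction


def f_prime(y):
    """Derivative from Example 8, obtained with the product rule."""
    g = Fraction(1, 2) - Fraction(3, 2) * y      # the factor 1/2 - 3y/2
    return (1 - y) * g - y * g - Fraction(3, 2) * y * (1 - y)


for y_star in (Fraction(0), Fraction(1, 3), Fraction(1)):
    print(y_star, f_prime(y_star))
```

Using Fraction avoids rounding error at y = 1/3, where the first two terms cancel exactly.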
Example 9. Linearization of membrane voltage model
By considering the linearization of the model
\[ \frac{dV}{dt} = -k(V - V_0)(V - V_c)(V - V_d) \]
classify the equilibria for the case V0 < Vc < Vd.
Solution. Define
\[ f(V) = -k(V - V_0)(V - V_c)(V - V_d) \]
The roots of the function f(V) are the equilibria V = V0, Vc, and Vd. Two applications of the product rule
give
\[ f'(V) = -k\left[(V - V_c)(V - V_d) + (V - V_0)(V - V_d) + (V - V_0)(V - V_c)\right] \]
At V = V0, we have
\[ f'(V_0) = -k\left[(V_0 - V_c)(V_0 - V_d) + 0 + 0\right] = -k(V_0 - V_c)(V_0 - V_d). \]
Since V0 < Vc < Vd (and assuming k > 0, as in Example 6), f′(V0) < 0 and V = V0 is stable. At V = Vc, we have
\[ f'(V_c) = -k(V_c - V_0)(V_c - V_d) \]
Since V0 < Vc < Vd, f′(Vc) > 0 and V = Vc is unstable. At V = Vd, we have
\[ f'(V_d) = -k(V_d - V_0)(V_d - V_c) \]
Hence, f′(Vd) < 0 and V = Vd is stable.
Problem Set 6.5

Draw phase lines, classify the equilibria, and sketch a solution satisfying the specified initial value for the equations
in Problems 1 to 10.

1. dy/dt = 1 − y^2, y(0) = 0
2. dy/dt = 2 − 3y, y(0) = 2
3. dy/dt = −7, y(0) = −2
4. dy/dt = 10, y(0) = 5
5. dy/dt = y(y − 10)(20 − y), y(0) = 9
6. dy/dt = y(y − 5)(25 − y), y(0) = 7
7. dy/dt = sin y, y(0) = 0.1
8. dy/dt = 1 − sin y, y(0) = −0.6
9. dy/dt = y^2 − 2y + 1, y(0) = 0
10. dy/dt = y^3 − 4y, y(0) = 0.1
Draw a phase line for dy/dt = f(y) for the graphs shown in Problems 11 to 14. Sketch the requested solutions.
11. y(0) = −1.1, y(0) = 1.1, y(0) = 0.9
12. y(0) = −0.1, y(0) = 0.9, y(0) = 1.1
13. y(0) = −2, y(0) = 1, y(0) = 2

14. y(0) = −0.1, y(0) = 1.9, y(0) = 3.
Linearize about the equilibrium in Problems 15 to 20 and classify it.

15. dy/dt = 4 − y^2, y* = 2
16. dy/dt = cos y, y* = π/2
17. dy/dt = 1/√2 − cos y, y* = π/4
18. dy/dt = 2y − y^2 − y^10, y* = 0
19. dy/dt = 3 − y, y* = 3
20. dy/dt = y(10 − y)(100 − y), y* = 100
Sketch the phase line and classify the equilibria for the Hawk-Dove game with the values V and C given in Problems
21 to 24.
21. V = 2, C = 2
22. V = 4, C = 2
23. V = 3, C = 2
24. V = 2, C = 4
Sketch the phase line and classify the equilibria for the replicator equations with the indicated payoffs in Problems
25 to 28.
25. Paa = 2, PaA = 1, PAa = 1, and PAA = 2
26. Paa = 1, PaA = 2, PAa = 3, and PAA = 4
27. Paa = −1, PaA = 2, PAa = 1, and PAA = −1
28. Paa = 2, PaA = −1, PAa = −1, and PAA = 3
29. “The Stag Hunt” is a story told by Rousseau in A Discourse on Inequality that became a game. In the story
we read
If it was a matter of hunting a deer, everyone well realized that he must remain faithful to his post;
but if a hare happened to pass within reach of one of them we cannot doubt that he would have gone
off in pursuit of it without scruple...
To turn this into an evolutionary game, consider a population of individuals that can engage in group hunting
for larger game (e.g. packs of wolves). Each individual in this population can play one of two strategies,
hunt stag (i.e. remain loyal to the pack even if an alternative prey comes along) or hunt hare (i.e. run after
hares whenever he sees them). In his writing, Thomas Hobbes presents informal arguments about this game that
suggest the following payoff matrix

Hunt Stag Hunt Hare

Hunt Stag 7.5 4
Hunt Hare 7 5
a. Find the replicator equation.

b. Sketch the phase line, and classify the equilibria.
c. Discuss how the outcome of the evolutionary game depends on the initial composition of the popu-
lation.
30. Consider two scenarios based on Problem 29:

i. In a population of stag hunters, a few individuals decide to hunt hares.
ii. In a population of hare hunters, a few individuals decide to hunt stag.
Use linearization to determine in which of these scenarios, the “defecting” individuals are more rapidly excluded.
31. (Evolution of Cooperation, Part I) Consider a population with two strategies, cooperate and defect. Individuals
that cooperate provide a benefit B to their opponent and pay a cost C for providing this benefit. Defectors
provide no benefits to their opponents and pay no cost. Under these assumptions, we get the following payoff
matrix.
               Cooperate    Defect
Cooperate      B − C        −C
Defect         B            0
a. Write down a replicator equation for this payoff matrix.

b. Assuming B > 0 and C > 0, sketch the phase line for the replicator equation.
c. Discuss the implications of your phase line.
32. (Evolution of Cooperation, part II) In Problem 31, cooperation could not evolve. However, cooperation is seen
in natural populations. In this problem, we investigate how individuals that interact frequently and respond to
the strategy of their opponents can promote the evolution of cooperation. Let us imagine that each time two
opponents meet they interact on average n times. Individuals can play one of two strategies: defect always or
tit-for-tat in which case an individual initially cooperates but switches to defecting if their opponent defected.
a. If each time individuals interact the individuals' payoffs are as in Problem 31, discuss why the
payoff matrix should be
Tit for Tat Defect
Tit for Tat n(B − C) −C
Defect B 0
b. Write down a replicator equation for this game.
c. Assume B = 3 and C = 2. Sketch phase lines for n = 2, 3, 4.
d. Discuss the implications for the evolution of cooperation.
33. To account for the effect of a generalist predator (with a type II functional response) on a population, ecologists
often write differential equations of the form
\[ \frac{dN}{dt} = 0.1N\left(1 - \frac{N}{1{,}000}\right) - \frac{10N}{1 + N} \]
where N is the population abundance and t is time (in years). The first term of the equation corresponds to
logistic growth and the second term corresponds to saturating predation.
a. Sketch the phase line for this system.

b. Discuss how the fate of the population depends on its initial abundance.
34. Construct the phase line for the model
\[ \frac{dV}{dt} = -2V^3 - 20V^2 + 3000V \]
and hence demonstrate that this equation belongs to the class of membrane voltage models presented in
Example 6.
35. Use a phase line diagram to discuss the behavior of the membrane voltage models presented in Example 6 with
constants k = 3, V0 = −65 mV, Vc = 40 mV and Vd = 40 mV. Does this membrane have the property that it
is able to switch between two states when perturbed by a current?
36. Historical Quest John Maynard Smith, or JMS as he was almost always known, was professor emeritus at
the University of Sussex, and one of the world’s great evolutionary biologists.
John Maynard Smith (1920-2004)
JMS introduced mathematical modeling from game theory into the study of mathematical biology, and com-
pletely revolutionized the way that biologists think about behavioral evolution. Jonathan Weiner wrote “A
Conversation With John Maynard Smith” which was published in the September 2000 issue of Natural History,
before JMS died. Here is what he said:
A classical geneticist and leading theorist in evolutionary biology, John Maynard Smith started out
as an engineer and worked as a “stress man” during World War II, calculating the stresses in airplane
wings. Since then, he has applied his knowledge of mathematics to some of the greatest problems in
evolution–exploring the stress points, the places where the theory threatens to pop its rivets.
Maynard Smith is best known for using game theory to explain the jousting matches that one sees
among the males of many species, from sticklebacks to sea lions, from stag beetles to stags. “You’d
simply expect them to sort of hit the other chap in the groin as quickly as possible,” he says, “and
yet there’s rather little escalated fighting and a great deal of display in settling contests.” It’s almost
as if the combatants are cooperating–a paradox the biologist explains by invoking the mathematics
of nonzero-sum contests and win-win situations.
At the University of Sussex in England, where he works, Maynard Smith is closely involved with a
group of colleagues he calls “The Institute for the Study of Tiny Minds”: neurobiologists working
on the behavior of ants, bees, worms, and snails. He also talks daily with colleagues across disci-
plines who, like him, are trying to apply the theory of natural selection to the design of robots and
computers.

Because of his stature, he received numerous prestigious awards, and for this Historical Quest, you should
research and say a few words about each of these awards achieved by JMS, or in the case of the last one,
established in his honor.
a. Balzan Prize
b. Crafoord Prize
c. Kyoto Prize
d. John Maynard Smith Prize
37. Verify that the solution to
\[ \frac{dy}{dt} = a(y - y^*) \]
satisfying y(0) = y0 is given by
\[ y(t) = (y_0 - y^*)e^{at} + y^* \]
38. Show that the linearization theorem is inconclusive when the derivative equals zero at the equilibrium. Hint:
Consider the equations
\[ \frac{dy}{dt} = y^3, \qquad \frac{dy}{dt} = -y^3, \qquad \frac{dy}{dt} = y^2 \]
39. Consider a population of clonally reproducing individuals consisting of two genotypes a and A with per-capita
growth rates, ra and rA , respectively. If Na and NA denote the densities of genotypes a and A, then
\[ \frac{dN_a}{dt} = r_a N_a, \qquad \frac{dN_A}{dt} = r_A N_A \]
Also, let y = Na/(Na + NA) be the fraction of individuals in the population that are genotype a. Show that y satisfies
\[ \frac{dy}{dt} = (r_a - r_A)\,y(1 - y) \]

6.6 Bifurcations
Biological systems can exhibit a multitude of dynamical behaviors which can change abruptly or gradually in
response to external perturbations. The term bifurcation is used in the context of differential equation models to
denote a change in the stability of equilibria or the types of solutions that occur as a parameter in the model is
varied. In this section, we provide an introduction to bifurcation theory. This theory provides a systematic approach
to studying qualitative changes in the dynamical behavior of a differential equation. We will use the notation
\[ \frac{dy}{dt} = f(y, a) \]
to represent an expression in y and a where y is the variable and a is a parameter. Our goal is to understand how
the qualitative behavior of this equation depends on a. More precisely, we will study how the phase line varies with
the parameter a.
In this section, we illustrate bifurcation theory with populations subjected to harvesting and the firing rates of
neural populations.
Sudden population disappearances
Example 1. Harvesting queen conch
Consider a population of queen conch in the Bahamas whose dynamics are given by

\[ \frac{dy}{dt} = 10\,y\left(1 - \frac{y}{10{,}000}\right) - a \]
where t is time in years, y is number of conch, and a is the constant annual harvesting rate.
a. Draw phase lines for a = 0, a = 21,000, and a = 30,000.
b. Discuss the biological implications of these phase lines.
c. Determine how the number of equilibria depends on a.
Solution.
a. Consider a = 0. The equilibria are given by the solutions of
\[ 0 = 10\,y\left(1 - \frac{y}{10{,}000}\right) - 0 \]
Solving this equation yields the equilibria y = 0 and y = 10,000. Since dy/dt > 0 for 0 < y < 10,000 and
dy/dt < 0 for the other intervals, we obtain the phase line as shown in Figure 6.32a.
Consider a = 21,000. The equilibria are given by solutions to
\[ 0 = 10\,y\left(1 - \frac{y}{10{,}000}\right) - 21{,}000 \]
which yields y = 3,000 and y = 7,000. Since dy/dt > 0 for 3,000 < y < 7,000 and dy/dt < 0 elsewhere, we get
a phase line as shown in Figure 6.32b.
Finally, consider a = 30,000. In this case there are no equilibria because
\[ 0 = 10\,y\left(1 - \frac{y}{10{,}000}\right) - 30{,}000 \]
has no real roots. Since dy/dt < 0 for all y, we get a phase line as shown in Figure 6.32c.

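Figure 6.32: Phase lines for the density (y) of conch for the three harvesting levels a, as labeled, inserted into the
conch harvesting equation. Panel a: a = 0 (upper equilibrium at y = 10,000). Panel b: a = 21,000 (equilibria at y = 3,000 and y = 7,000). Panel c: a = 30,000 (no equilibria).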
b. The phase lines in Figure 6.32 show that as a increases, the number of equilibria goes from two to zero.
In particular, at sufficiently high harvesting rates, the population is unable to persist at an equilibrium.
c. To determine how the equilibria depend on the harvesting rate a, we need to solve
\[ 0 = 10\,y\left(1 - \frac{y}{10{,}000}\right) - a \]
for y. Using the quadratic formula,
\[ y = 5{,}000 \pm 100\sqrt{2{,}500 - \frac{a}{10}} \]
Hence, we obtain two equilibria provided that
\[ 2{,}500 - \frac{a}{10} > 0 \]
which occurs if and only if a < 25,000. If a = 25,000, then we get only one equilibrium, given by
y = 5,000. Finally, if a > 25,000, then 2,500 − a/10 is negative and there are no equilibria. Therefore,
a change in the number of equilibria occurs at a = 25,000.
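The case analysis in part c can be written as a short function. This sketch simply evaluates the quadratic-formula expression above; the function name conch_equilibria is an illustrative choice.

```python
import math


def conch_equilibria(a):
    """Equilibria of dy/dt = 10 y (1 - y/10000) - a, from the quadratic formula."""
    disc = 2500 - a / 10
    if disc < 0:
        return []                    # no real roots: no equilibria
    if disc == 0:
        return [5000.0]              # the two roots coincide
    root = 100 * math.sqrt(disc)
    return [5000 - root, 5000 + root]


for a in (0, 21000, 25000, 30000):
    print(a, conch_equilibria(a))
```

The number of equilibria returned drops from two to one to zero as a passes through 25,000, matching the discussion above.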
Example 1 illustrates that the phase line of dy/dt = f(y, a) can vary substantially as you vary the parameter a.
Moreover, it shows that at certain parameter values (i.e. a = 25,000 in Example 1) there is a qualitative change in the
phase line. These values are important enough to have their own name: bifurcation values. We define a bifurcation
value as a value of a parameter in an equation at which either the number of equilibrium solutions changes or the
stability properties of these solutions change (for example, from stable to unstable). A simple way to graphically
summarize how the behavior of the system depends on a is to graph something known as a bifurcation diagram.
The procedure for constructing such a diagram is summarized as follows.
Bifurcation diagram
A bifurcation diagram summarizes the behavior of a system in the ay-plane and can be created as follows:
Step 1. Draw the a-axis (horizontal) and the y-axis (vertical).
Step 2. Sketch the set of equilibria in the ay-plane. That is, the set of points (a, y) that satisfy 0 = dy/dt = f(y, a).
Step 3. Determine in which regions of the ay-plane dy/dt is positive or negative.
Step 4. For a collection of a values, draw a phase line. In particular, draw phase lines at bifurcation values of a and at values of a that lie between bifurcation values.
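Steps 3 and 4 can be imitated with a coarse text table: for each chosen a, record the sign of dy/dt at a handful of y-values. This is only a sketch for the conch equation of Example 1; the grid of y-values and the symbols ^, v, and o are arbitrary illustrative choices.

```python
def conch_rhs(y, a):
    """Right-hand side of the conch equation: 10 y (1 - y/10000) - a."""
    return 10 * y * (1 - y / 10000) - a


def phase_column(f, a, y_values):
    """One vertical phase line: '^' where dy/dt > 0, 'v' where < 0, 'o' where = 0."""
    column = []
    for y in y_values:
        rate = f(y, a)
        column.append("^" if rate > 0 else "v" if rate < 0 else "o")
    return column


a_values = [0, 20000, 25000, 30000]
y_values = [9000, 7000, 5000, 3000, 1000]   # listed top to bottom
columns = [phase_column(conch_rhs, a, y_values) for a in a_values]

print("y \\ a " + "".join(f"{a:>8}" for a in a_values))
for row, y in enumerate(y_values):
    print(f"{y:>6}" + "".join(f"{col[row]:>8}" for col in columns))
```

Each printed column is a crude phase line for one value of a. A finer grid of y-values would locate the equilibria more precisely.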

Example 2. Sudden queen conch disappearances
Sketch a bifurcation diagram for the equation of Example 1,
\[ \frac{dy}{dt} = 10\,y\left(1 - \frac{y}{10{,}000}\right) - a \]
with a ≥ 0 and y ≥ 0. Discuss the implications for population harvesting.
Solution. We begin by solving
\[ 0 = 10\,y\left(1 - \frac{y}{10{,}000}\right) - a \]
for a and graphing a = 10y(1 − y/10,000) in the ay-plane. The graph is a parabola as shown in Figure 6.33a.
Figure 6.33: The curve of equilibria and bifurcation diagram for dy/dt = 10y(1 − y/10,000) − a. Panel a: graph of dy/dt = 0 in the ay-plane (horizontal axis a from 0 to 30,000, vertical axis y from 0 to 10,000). Panel b: bifurcation diagram.
Choosing a point inside the parabola, say (0, 5,000), we obtain dy/dt = 10 · 5,000(1 − 1/2) = 25,000 > 0. Hence,
dy/dt > 0 inside of the parabola. Choosing a point outside of the parabola, say (10, 0), we obtain dy/dt = −10 < 0.
Hence dy/dt < 0 outside of the parabola.
Next, we can sketch phase lines for several a values, say a = 0, a = 20,000, a = 25,000, and a = 30,000. For
each of these values of a, we draw a vertical line. Where the line intersects the parabola we draw a solid circle (in
red in Fig. 6.33b) as this corresponds to points where dy/dt = 0. Where the line lies inside the parabola, we draw
an upward arrow. Where the line lies outside the parabola, we draw downward arrows. The resulting bifurcation
diagram is illustrated in Figure 6.33b. Notice that for a = 0 and a = 30,000, we get the same phase
lines as in Example 1, and that the phase line for a = 20,000 has the same qualitative form as the one for a = 21,000 in Example 1.
This bifurcation diagram indicates that for
0 < a < 25,000
there are two equilibria. The lower equilibrium is unstable and the upper equilibrium is stable. When
the two equilibria coalesce at a = 25,000, the resulting equilibrium is semi-stable; that is, solutions starting above
the density y = 5,000 decrease to asymptotically approach the equilibrium y = 5,000, while solutions that start
below 5,000 also decrease, moving away from the equilibrium until the population reaches 0.
Noting that this critical semi-stable equilibrium value y = 5,000 is half the carrying capacity K = 10,000, it
follows that for harvesting rates over the range 0 < a < 25,000, the population can persist provided that its initial
population abundance is sufficiently large. Moreover, the stable population equilibrium is always greater than 5,000.
On the other hand, if the population is harvested at a rate a > 25,000, it will eventually be driven to 0, at which
point the harvesting must necessarily be set to 0 since a population that has 0 individuals can no longer be harvested.
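The contrast between harvesting rates just below and just above the bifurcation value can be seen in a simple simulation. The sketch below uses forward Euler with an illustrative step size and starts the population at the unharvested carrying capacity; the rates 24,000 and 26,000 are hypothetical examples on either side of a = 25,000.

```python
def simulate_conch(a, y0=10000.0, years=20.0, dt=0.001):
    """Forward Euler for dy/dt = 10 y (1 - y/10000) - a, stopped if y reaches 0."""
    y = y0
    for _ in range(int(round(years / dt))):
        y += dt * (10 * y * (1 - y / 10000) - a)
        if y <= 0:
            return 0.0      # population extinct; harvesting can no longer continue
    return y


for a in (24000, 26000):
    print(a, round(simulate_conch(a)))
```

For a = 24,000 the population settles at the stable equilibrium y = 6,000 given by the quadratic formula of Example 1, whereas for a = 26,000 it is driven to 0 within the simulated period.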

An important implication of the bifurcation diagram in Example 2 is that gradual changes in harvesting can
bring about discontinuous changes in the population abundance. More specifically, when the harvesting rate is increased
ever so slightly beyond the bifurcation value (a = 25,000 in Example 1), the population suddenly exhibits a
dramatic decline from abundance to extinction. Such population disappearances have been observed in natural
populations. Dramatic examples include the precipitous drop of blue pike (Stizostedion vitreum glaucum) from annual
catches of 10 million pounds to less than one thousand pounds in the mid 1950s, the unexpected collapse of the
Peruvian anchovy population in 1973, as illustrated in Figure 6.34, and the sudden reduction of Great Britain's grey
partridge (Perdix perdix) population in 1952.
Figure 6.34: Catch data for Peruvian anchovies in the 20th century (horizontal axis: year; vertical axis: metric tons caught)
The bifurcation occurring at a = 25,000 in the queen conch example is a saddle-node bifurcation because the
transition from two equilibria to no equilibria is preceded by the appearance of a semi-stable (or saddle) equilibrium.
A more colorful name for this bifurcation is a blue sky catastrophe, as two equilibria vanish into the blue sky as a
increases past the value 25,000. Other types of bifurcations are possible, such as the pitchfork bifurcation illustrated
by the next example. A look ahead at Figure 6.35 indicates the source of this name: one equilibrium bifurcates into
three as the value of the bifurcation parameter increases, creating a pitchfork-shaped object.
Example 3. Pitchfork bifurcation
Sketch a bifurcation diagram for

\[ \frac{dy}{dt} = ay - y^3 \]
Solution. The equilibria are given by

\[ 0 = y(a - y^2) \]
Hence, either y = 0 or y^2 = a, so that for a ≥ 0 the right-hand side of the differential equation is
f(y) = y(√a − y)(√a + y). Sketching the curves y = 0 for all a and y = ±√a for a ≥ 0 in the ay-plane yields Figure 6.35a.
These curves determine four regions in the ay-plane: the regions above and below the pitchfork and the upper and
lower parabolic wedges of the pitchfork. Using the point (a, y) = (0, 1), we obtain dy/dt = −1 < 0. Hence, dy/dt < 0
in the region above the pitchfork. Using the point (a, y) = (0, −1), we obtain dy/dt = 1 > 0. Hence, dy/dt > 0 in the
region below the pitchfork. Using the point (a, y) = (2, 1), we obtain dy/dt = 1 > 0. Hence, dy/dt > 0 in the upper
parabolic wedge of the pitchfork. Using the point (a, y) = (2, −1), we obtain dy/dt = −1 < 0. Hence, dy/dt < 0 in the
lower parabolic wedge of the pitchfork.
To complete the bifurcation diagram, it suffices to sketch phase lines for a negative a-value (i.e. only one
equilibrium), a positive a-value (i.e. three equilibria), and the bifurcation value a = 0. Drawing vertical lines at
these a values, solid circles at the equilibria, upward arrows where dy/dt > 0, and downward arrows where dy/dt < 0,
results in the bifurcation diagram illustrated in Figure 6.35b.
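The sign checks in Example 3 can be cross-checked with a short script. This is a minimal sketch, not part of the original solution: it lists the equilibria of dy/dt = ay − y^3 for a given a and classifies each one from the sign of the derivative f'(y) = a − 3y^2, in the spirit of the linearization result of Section 6.5. The function name `pitchfork_equilibria` and the label "degenerate" are illustrative choices.

```python
import math


def pitchfork_equilibria(a):
    """Equilibria of dy/dt = a*y - y**3, each labelled by its stability.

    Stability is read from the sign of f'(y) = a - 3*y**2 at the
    equilibrium: negative means stable, positive means unstable.
    """
    points = [0.0]
    if a > 0:
        s = math.sqrt(a)
        points = [-s, 0.0, s]
    labelled = []
    for y in points:
        slope = a - 3.0 * y * y
        if slope < 0:
            kind = "stable"
        elif slope > 0:
            kind = "unstable"
        else:
            kind = "degenerate"   # f'(y) = 0: linearization is inconclusive
        labelled.append((y, kind))
    return labelled


for a in (-1.0, 0.0, 4.0):
    print(a, pitchfork_equilibria(a))
```

At the bifurcation value a = 0 the derivative test says nothing, but the phase line settles the question: there dy/dt = −y^3, so solutions on both sides move toward y = 0.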

Figure 6.35: Pitchfork bifurcation (a. graph of dy/dt = 0; b. bifurcation diagram)
Figure 6.36: A neuron
Modelling memory formation

Behind the motions and thoughts of every animal lies a vast network of cells, the nervous system. The network
comprises billions of cells called neurons. A typical “network” neuron is illustrated in Figure 6.36, although neurons
with various types of morphologies make up the total neural system of any animal.
Neurons specialize in carrying “messages” from one part of the body to another through an electrochemical
process that typically causes a voltage spike to travel along the membrane of the neural cell. The message is received
by dendrites, which look like tentacles attached to the cell body. The chemical messages pass down these tentacles
into the cell body and then out through one main long axon. The end of this axon then communicates with dendrites
of neurons further down the neural chain, thereby passing the message along from one neuron to the next. Messages
between two neurons are usually passed in the form of a chemical flux of so-called neurotransmitters. Excitatory
neurotransmitters trigger “go” signals that allow the message to be passed to the next neuron in the communication
line and inhibitory neurotransmitters produce “stop” signals that prevent the message from being forwarded. A
single neuron “integrates” the incoming signals to determine whether or not to pass the information along to other
cells. The activity within a single neuron is typically measured by the rate at which it “fires” voltage spikes.

The simplest model of a population of neurons is the Wilson-Cowan model. It assumes that the entire
population of neurons fires at the same rate y (units are number of spikes/msec) and that all neurons are of the
same type (i.e. release the same type of neurotransmitters).

Wilson-Cowan Neural Population Model
Let a be the rate at which an external source produces neurotransmitters that stimulate the dendrites of a
population of neurons. If a is positive or negative, then the external neurotransmitters are respectively
excitatory or inhibitory.
Let b be the rate at which each individual neuron releases neurotransmitters when it fires. If b is positive or
negative, then the internal neurotransmitters are respectively excitatory or inhibitory.
Let c be the rate at which the firing of an active neuron decays exponentially in the absence of external
stimulation.
Then the firing rate y (measured in spikes per unit time) of each neuron in the network is modeled by the
equation
\[ \frac{dy}{dt} = -cy + \frac{1}{1 + e^{-a-by}} \]
Example 4. Modeling memory formation
Consider an application of the Wilson-Cowan model in which b = 6 (that is, the neurons are excitatory) and
c = 1 (that is, in one unit of time the firing rate drops by a factor of $e^{-1} = 1/e$).
a. Sketch the bifurcation diagram with respect to parameter a.
b. Create a plot of y(t) that corresponds to a population of neurons that is initially quiescent—that is
y(0) = 0—and is then subject to an external stimulus that has the following “switching” characteristics:
a = −3 for 0 ≤ t < 20 (units of t are ms), a = −1 on 20 ≤ t ≤ 40 and a = −3 on 40 < t ≤ 100.
c. Discuss the implications of what you have found.
Solution.
a. This equation is too complicated to plot by hand, so we will graph it using technology. Some computer
programs will graph this equation as shown, but most will require that we solve for one of the variables.
Solving for a in terms of y under equilibrium conditions yields
\[ 1 + e^{-a} e^{-6y} = \frac{1}{y} \]
\[ e^{-a} = e^{6y}\left(\frac{1}{y} - 1\right) \]
\[ e^{a} = e^{-6y}\,\frac{y}{1-y} \]
\[ a = -6y - \ln\frac{1-y}{y} \]
Using technology, we find that the graph of this curve is as shown in Figure 6.37a. Using the point (a, y) =
(−5, 1), we obtain dy/dt < 0. Hence, dy/dt < 0 in the left region. Using the point (a, y) = (0, 0), we obtain
dy/dt > 0. Hence, dy/dt > 0 in the right region. To complete the bifurcation diagram, we draw five phase lines:
one at each bifurcation value (i.e. a ≈ −2.5 and a ≈ −3.5) and one to either side of the bifurcation
values. Doing so, we obtain the bifurcation diagram illustrated in Figure 6.37b.
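The two bifurcation values read off the graph can also be located directly from the equilibrium curve a = −6y − ln((1 − y)/y). The following is a minimal sketch under the assumption c = 1: the curve turns around where da/dy = −b + 1/(y(1 − y)) = 0, that is, where y(1 − y) = 1/b. The function names are illustrative and not part of the text.

```python
import math


def stimulus_at_equilibrium(y, b=6.0, c=1.0):
    """External stimulus a for which firing rate y (0 < c*y < 1) is an equilibrium.

    Solves 0 = -c*y + 1/(1 + exp(-a - b*y)) for a.
    """
    return -b * y - math.log((1.0 - c * y) / (c * y))


def fold_points(b=6.0):
    """Turning points (a, y) of the equilibrium curve for c = 1.

    For c = 1, da/dy = -b + 1/(y*(1 - y)), which vanishes when
    y*(1 - y) = 1/b. Two distinct real solutions require b > 4.
    """
    if b <= 4.0:
        return []
    root = math.sqrt(1.0 - 4.0 / b)
    ys = [(1.0 - root) / 2.0, (1.0 + root) / 2.0]
    return [(stimulus_at_equilibrium(y, b), y) for y in ys]


for a, y in fold_points():
    print(f"a = {a:.3f}, y = {y:.3f}")
```

For b = 6 the turning points land near a = −2.6 and a = −3.4, close to the approximate values used for the phase lines above. The curve is S-shaped only when b > 4 in this sketch, which is why `fold_points` returns an empty list otherwise.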

Figure 6.37: The curve of equilibria and bifurcation diagram for a Wilson-Cowan model (a. graph of dy/dt = 0; b. bifurcation diagram)
b. We will use technology to solve the differential equation (a = −3):

\[ \frac{dy}{dt} = -y + \frac{1}{1 + e^{3-6y}} \]
for 0 ≤ t < 20 with y(0) = 0. This solution is shown below:
[Plot of y against t for 0 ≤ t ≤ 20; the vertical axis runs from 0 to 0.1]
Then, as the domain shifts to 20 ≤ t ≤ 40, we are given that a = −1, so we again use technology to graph
a solution of
\[ \frac{dy}{dt} = -y + \frac{1}{1 + e^{1-6y}}, \qquad y(20) \approx 0.07 \]
[Plot of y against t for 20 ≤ t ≤ 40; the vertical axis runs from 0 to 1]
Finally, a returns to −3 for the domain 40 < t ≤ 100, and we read from the above graph an initial value
of y(40) ≈ 1, which gives the following graph:
[Plot of y against t for 40 ≤ t ≤ 100; the vertical axis runs from 0 to 1]

We use technology to put together these parts into a single graph as shown in Figure 6.38.
Figure 6.38: Graph of how a population of neurons records that it has been subject to a change in background firing
rate a on the interval t ∈ [20, 40] (ms). (Horizontal axis: t from 0 to 100; vertical axis: y from 0 to 1.)
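The "technology" step in part b can be reproduced with any numerical solver. The sketch below is one possible version, not the method used to produce the figures: it applies Euler's method (Section 6.4) with an assumed step size of 0.01 ms to the Wilson-Cowan equation with b = 6, c = 1 and the switching stimulus from the problem statement. The names `stimulus` and `simulate` are illustrative.

```python
import math


def stimulus(t):
    """External stimulus a(t) from Example 4b (t in ms)."""
    return -1.0 if 20.0 <= t <= 40.0 else -3.0


def simulate(y0=0.0, t_end=100.0, dt=0.01, b=6.0, c=1.0):
    """Euler's method for dy/dt = -c*y + 1/(1 + exp(-a(t) - b*y)).

    Returns two lists: the times k*dt and the approximate values y(k*dt).
    """
    steps = int(round(t_end / dt))
    times, values = [0.0], [y0]
    y = y0
    for i in range(steps):
        t = i * dt
        dydt = -c * y + 1.0 / (1.0 + math.exp(-stimulus(t) - b * y))
        y += dt * dydt
        times.append((i + 1) * dt)
        values.append(y)
    return times, values


times, values = simulate()
# Firing rate at the two switching times and at the end of the run
for t_mark in (20, 40, 100):
    print(t_mark, round(values[int(round(t_mark / 0.01))], 3))
```

With these settings the firing rate settles at a low value before t = 20, jumps to a value near 1 while a = −1, and stays high after the stimulus returns to −3, which is the behavior summarized in Figure 6.38.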
c. We see from Figure 6.38 that the population of neurons initially rises from 0 to asymptote at a low firing
rate around y = 0.07. In terms of the bifurcation diagram (Figure 6.37b), the activity of
the population has risen from 0 to the lower arm of the S-shaped nullcline in the ay-plane. When a rises from
−3 to −1, the lower arm ceases to exist and the population rises, as we see in Figure 6.38, to the upper
arm, which has a value close to 1. After “switching back” at t = 40 ms to the value a = −3, the neural
firing rate starts to decline, but now it is only able to drop to the upper arm of the S-shaped nullcline in
the bifurcation diagram (Figure 6.37b). By remaining at the high firing rate, the population of neurons is
effectively “remembering” that the background stimulus was in one state (a = −3), switched to a second
state (a = −1) for some period of time, and then switched back to the first state again (a = −3). In
this way the neuron has recorded an “off-on-off” event and is said to now remember that it was once
“switched on.” To clear the memory, the background stimulus a would need to drop below approximately
−3.5 (see Figure 6.37b).

Problem Set 6.6

Draw the phase lines requested in Problems 1 to 6.

1. $\frac{dy}{dt} = y\left(1 - \frac{y}{100}\right) - a$; a = 0, a = 9, a = 25
2. $\frac{dy}{dt} = 2y\left(1 - \frac{y}{1{,}000}\right) - a$; a = 0, a = 180, a = 600
3. $\frac{dy}{dt} = 450 - ay$; a = −10, a = 0, a = 10
4. $\frac{dy}{dt} = (100 - y)(y - 250) - a$; a = 0, a = 2000, a = 5000
5. $\frac{dy}{dt} = y^2 - ay + 1$; a = 0, a = 2, a = 4
6. $\frac{dy}{dt} = 2y^2 - ay + 240$; a = 0, a = 50, a = 200
Sketch bifurcation diagrams for the equations in Problems 7 to 12.

7. $\frac{dy}{dt} = ay - y^2$
8. $\frac{dy}{dt} = y^2 - a$
9. $\frac{dy}{dt} = 1 + ay$
10. $\frac{dy}{dt} = 1 - ay^2$
11. $\frac{dy}{dt} = \sin y - a$
12. $\frac{dy}{dt} = y^2 - ay + 2$
Consider an application of the Wilson-Cowan model for the values of b and c in Problems 13 to 18. Sketch the
bifurcation diagram with respect to parameter a.
13. b = 5, c = 1
14. b = 4, c = 1
15. b = 8, c = 1
16. b = 4, c = 2
17. b = 8, c = 2
18. b = 12, c = 2
SIS model in Epidemiology

Mathematical epidemiologists often use the symbol S to denote the number of individuals in a population that
are susceptible to a disease, I the number of people infected with the disease, and R the number of individuals that
have recovered and are now immune. If no individuals die from the disease then the total number of individuals in
the population is N = S + I + R. This kind of model is called an SIR model. In the case that all individuals that
recover are immediately susceptible, then R = 0 for all time and the model is called an SIS model. Many sexually
transmitted infections, for example gonorrhea, do not confer immunity, and are best described by SIS models. This
is one reason why a single round of antibiotics, even if applied widely on a population basis will not have a long-term
effect in lowering incidence of STIs.

Let us assume that a susceptible individual encounters and gets infected by infected individuals at a rate pro-
portional to the density of infected in the population. Call this proportionality constant b ≥ 0. The constant b is
known as the transmission rate in the epidemiological literature. Let us also assume that individuals infected with
the disease recover from the disease at a constant rate r ≥ 0. Under these assumptions we obtain the SIS model:
\[ \frac{dI}{dt} = bIS - rI \]
Since I + S = N, we know S = N − I so
\[ \frac{dI}{dt} = bI(N - I) - rI \]
Rearranging terms yields
\[ \frac{dI}{dt} = I(bN - r - bI) \]
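The factored form makes the equilibria easy to read off: I = 0, and I = N − r/b whenever that quantity is positive. The following sketch turns that observation into a small helper for experimenting with parameter values; the function name `sis_endemic_level` is an illustrative choice and not part of the text.

```python
def sis_endemic_level(b, r, n):
    """Positive equilibrium of dI/dt = I*(b*n - r - b*I), or 0.0 if there is none.

    The nonzero root of the right-hand side is I* = n - r/b. It is a
    positive number of infected individuals only when b*n > r.
    """
    if b <= 0 or b * n <= r:
        return 0.0
    return n - r / b


# Example: a group of 1,000 with recovery rate r = 1 and several transmission rates
for b in (0.0005, 0.002, 1.0):
    print(b, sis_endemic_level(b, 1.0, 1000))
```

In this sketch a return value of 0.0 means the only non-negative equilibrium is the disease-free one.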
In Problems 19 to 22 sketch a bifurcation diagram with respect to b if r = 1 and with respect to r if b = 1. Discuss
under what conditions the disease persists in a population confined to living in a group of indicated size.
19. N = 1,000 (boarding school)
20. N = 10,000 (army camp)
21. N = 100,000 (isolated town in Alaska)
22. N = 1,000,000 (isolated city in remote region of Asia)
Habitat destruction
Consider a population living in a patchy environment. Let y be the fraction of patches occupied by the species of
interest. Let c ≥ 0 denote the colonization rate (i.e. the rate at which individuals from one patch colonize an empty
patch), d ≥ 0 the rate at which individuals clear out of a patch, and 0 ≤ D ≤ 1 the fraction of patches destroyed
by mankind. Then we get the following model of Harvard biologist Richard Levins:
\[ \frac{dy}{dt} = c\,y(1 - D - y) - d\,y \]
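As a starting point for the bifurcation diagrams, the equilibria of this model can be found by factoring, in the same way as for the SIS model. The short derivation below is a sketch that assumes c > 0:

```latex
\[
0 = c\,y\,(1 - D - y) - d\,y = y\,\bigl(c(1 - D) - d - c\,y\bigr)
\quad\Longrightarrow\quad
y = 0 \quad\text{or}\quad y^{*} = 1 - D - \frac{d}{c}.
\]
```

The nonzero equilibrium is a positive fraction of patches only when c(1 − D) > d.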
Sketch the bifurcation diagram for this differential equation for the information given in Problems 23 to 28.
23. Assume that D = 0. Sketch bifurcation diagrams for d when c = 1 and for c when d = 0. Under what
conditions does the population persist?
24. Assume that D = 0. Sketch bifurcation diagrams for d when c = 2 and for c when d = 1. Under what
conditions does the population persist?
25. Assume that D = 0.5. Sketch bifurcation diagrams for d when c = 1 and for c when d = 0.5. Under what
conditions does the population persist?
26. Assume that D = 0.5. Sketch bifurcation diagrams for d when c = 2 and for c when d = 2. Under what
conditions does the population persist?
27. Assume that c = 3/2. Sketch bifurcation diagrams for D when d = 1/2 and for d when D = 0. Under what
conditions does the population persist?
28. Assume that d = 2. Sketch bifurcation diagrams for D when c = 4 and for c when D = 1/2. Under what
conditions does the population persist?
Lotka-Volterra predation
In the 1920’s two mathematicians, Vito Volterra (1860-1940) and Alfred Lotka (1880-1949), considered models
of the density of a prey species, denoted here by the variable x, predated by a species at a density denoted here by
the variable y. They then wrote down two differential equations, one for the prey and one for the predator, in which
the prey equation included the predator density and the predator equation included the prey density. We have not

developed the theory on how to analyze a system of two interdependent differential equations, which rightly belongs
to a course on multivariate calculus, but at least we can analyze the behavior of the prey equation or the predator
equation where the density of the other species appears as a parameter. The general form of the prey equation is:
Prey Equation:
\[ \frac{dx}{dt} = x\,g(x) - y\,h(x), \]
where g(x) is the per capita growth rate of the prey species and h(x) is the rate at which each unit of predator is able
to extract prey. Note it is assumed that both g(0) = 0 and h(0) = 0. The general form of the predator equation is:
Predator Equation:
\[ \frac{dy}{dt} = y\,b\,h(x) - y\,f(y), \]
where h(x) is the prey extraction rate per predator appearing in the above prey equation, 0 < b < 1 is the efficiency
with which predators can convert a unit of consumed prey into their own biomass (ingestion, digestion, metabolism,
etc.), and f (y) is the rate at which predators die when they have no prey species to feed upon.
In Problems 29 to 31 sketch the bifurcation diagram for the specified growth and extraction functions in the prey
equation in which the density y of the predators is regarded as a parameter in the prey equation and in Problems
32 to 33 sketch the bifurcation diagram for the specified extraction and mortality functions in the predator equation
in which the density x of the prey species is regarded as a parameter.
29. In the classic Lotka-Volterra model g(x) is constant or is a decreasing linear function and h(x) is homogeneous
(i.e. h(0) = 0) linear, so assume g(x) = 0.5(1 − x/3) and h(x) = x. Under what conditions does the population
persist?
30. In the modified Lotka-Volterra model g(x) is constant or is a decreasing linear function and h(x) is a saturating
function of prey density x, so assume g(x) = 0.5 and $h(x) = \frac{x}{x+2}$. Under what conditions is the population
reined in by predation?
31. Assume g(x) = 0.5(1 − x/3) and $h(x) = y\,\frac{x}{x+2}$. Under what conditions is the prey population driven to
extinction by predation?
32. In the classic Lotka-Volterra model f(y) is constant. Thus, assuming b = 0.2 and f(y) = 1, under what
conditions does the predator population persist when h(x) = x?
33. Assume b = 0.2 and f(y) = y. Under what conditions does the predator population persist when h(x) = x?
34. A self-regulatory genetic network. Smolen et al. (1998, 1999) investigated a model of a single transcription
factor, TF-A, that activates its own transcription. TF-A forms a homodimer that activates transcription by
binding to enhancers (TF-REs). A rapid equilibrium is assumed between monomeric and dimeric TF-A. The
transcription rate saturates with TF-A dimer concentration to a maximal rate a, which is proportional to TF-A
phosphorylation. Responses to stimuli are modeled by varying the degree of TF-A phosphorylation. A basal
synthesis rate d is present, as well as a first-order process for degradation, −cy. If y denotes the concentration
of TF-A then the model is given by
\[ \frac{dy}{dt} = \frac{a\,y^2}{b + y^2} - cy + d \]
Assume that b = 1, c = 1, and d = 0.1. Sketch a bifurcation diagram over the region 1 ≤ a ≤ 3 and 0 ≤ y ≤ 3.
Discuss when you expect to see two stable equilibria.
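One way to explore this model before sketching the diagram is to count equilibria numerically for a few values of a. The sketch below is an assumption-laden helper rather than a prescribed method: it scans 0 ≤ y ≤ 5 on a fine grid for sign changes of the right-hand side and refines each one by bisection. The function names, the scan range, and the grid size are all illustrative choices.

```python
def tf_rate(y, a, b=1.0, c=1.0, d=0.1):
    """Right-hand side dy/dt = a*y**2/(b + y**2) - c*y + d."""
    return a * y * y / (b + y * y) - c * y + d


def tf_equilibria(a, y_max=5.0, n=5000):
    """Locate equilibria in [0, y_max) by scanning for sign changes, then bisecting."""
    roots = []
    h = y_max / n
    for i in range(n):
        lo, hi = i * h, (i + 1) * h
        f_lo, f_hi = tf_rate(lo, a), tf_rate(hi, a)
        if f_lo == 0.0:
            roots.append(lo)           # grid point is itself an equilibrium
            continue
        if f_lo * f_hi < 0:
            for _ in range(60):        # bisection on the bracketing interval
                mid = 0.5 * (lo + hi)
                if tf_rate(lo, a) * tf_rate(mid, a) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


for a in (1.0, 2.0, 3.0):
    print(a, [round(y, 3) for y in tf_equilibria(a)])
```

Values of a for which three equilibria are reported are the candidates for bistability; the stability of each equilibrium still has to be read from the sign of dy/dt on either side.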
35. (Evolution of Cooperation, part III) Problem 31 in Section 6.5 investigated how individuals that interact fre-
quently and respond to the strategy of their opponents can promote the evolution of cooperation. If opponents
interact on average n times and cooperation gives a benefit B to the opponent and a cost C to the cooperator,
then the pay off matrix for the strategies tit-for-tat and defect are given by
               Tit for Tat    Defect
Tit for Tat    n(B − C)       −C
Defect         B              0

a. Write down a replicator equation for this game.

b. Assume B = 4 and C = 3, and sketch a bifurcation diagram with respect to the parameter n.
c. Discuss the implications for the evolution of cooperation.


DEFINITIONS
Section 6.1
Differential equation (ODE), p. 547
Paradigm, p. 550
Logistic equation, p. 554
Carrying capacity, p. 554
Equilibrium solution, p. 554
Section 6.2
Solution (of an ODE), p. 562
Separable ODE, p. 564
Section 6.3
Linear differential equation, p. 572
Pharmacokinetics, p. 573
Biopharmaceutics, p. 573
von Bertalanffy growth equation, p. 578
Section 6.4
Autonomous (ODE), p. 587
Nullcline, p. 585
Section 6.5
Equilibria, p. 601
Phase line, p. 601
Polymorphic equilibria, p. 606
Stable equilibrium, p. 606
Unstable equilibrium, p. 606
Semi-stable equilibrium, p. 606
Resilient population, p. 609
Section 6.6
Bifurcation, p. 616
Bifurcation values, p. 617
Section 6.1
Section 6.2
Separation of variables, p. 562
Section 6.3
Newton’s law of cooling, p. 577
Section 6.4
Slope fields, p. 584
Euler’s method, p. 589
Section 6.5
Classification of equilibria, p. 606
Theorem 6.1 Linearization, p. 608
Section 6.6
Bifurcation diagram, p. 617

Section 6.1
Harvesting queen conch
Radioactive decay (Problem Set)
Cane toad control in Australia (Problem Set)
VCR use in the USA (Problem Set)
Growth of HIV epidemic in Ohio (Problem Set)
Section 6.2
Carlson’s yeast data
Doomsday prediction (Problem Set)
Tumor growth (Problem Set)
Growth of Hispanic population in the USA (Problem Set)
Mass action and chemical interaction rates (Problem Set)
Section 6.3
Modeling HIV
Infusion rate
Drug elimination rate in the body
Lake pollution
Newton’s Law of Cooling
Forensic medicine
Organismal growth
von Bertalanffy growth equation
Growth of the US gross domestic product (Problem Set)
Growth of divorce rates in the US (Problem Set)
Section 6.4
Allee effect
Lake pollution (revisited)
Section 6.5
Haploid or clonal genetics
Hawk-Dove game
Membrane potential of a neuron
Population resilience
Stag hunt (Problem Set)
Section 6.6
Harvesting queen conch (revisited)
Pitchfork bifurcation
Wilson-Cowan model
Memory formation
Growth of the US gross domestic product (Problem Set)
Evolution of cooperation (Problem Set)
Effect of a generalist predator (Problem Set)
SIS epidemic models (Problem Set)
Habitat destruction (Problem Set)
Lotka-Volterra predation (Problem Set)
Self-regulatory genetic network (Problem Set)
Problem Set 6.7

Find a family of solutions (i.e. a solution involving general constants) to the differential equations in Problems 1 to
8 by separating variables.
1. $(x - 5)\,\frac{dy}{dx} = xy$
2. $\frac{dy}{dt} = y \tan t$
3. $(e^{2t} + 9)\,\frac{dy}{dt} = t$
4. $y\,\frac{dy}{dt} = e^{t-3y} \cos t$
5. $t\sqrt{t^2 - 9}\,\frac{dy}{dt} = 9$
6. $xy\,\frac{dy}{dx} = x^2 + y^2 + x^2 y^2 + 1$
7. Solve $y\,\frac{dy}{dt} = e^{t+2y} \sin t$ with initial condition t = 0, y = 0
8. Solve $(y + 1)e^t\,\frac{dy}{dt} = y^2 + 2y + 2$ with initial condition t = 0, y = −1
Estimate a solution for Problems 9 to 12 using Euler’s method. For each of these problems, a slope field is given.
Superimpose the segments from Euler’s method on the given direction field. Does the solution appear to fit?
9. $\frac{dy}{dt} = \frac{t+y}{t-y}$ passing through (0, 1) for 0 ≤ t ≤ 0.5, h = 0.1
Figure 6.39: Vector Field for Problem 9
10. $\frac{dy}{dt} = 2t(t^2 - y)$ passing through (0, 3) for 0 ≤ t ≤ 2, h = 0.4
11. $\frac{dy}{dt} = \frac{5t - 3ty}{1 + t^2}$ passing through (0, 0) for 0 ≤ t ≤ 0.5, h = 0.1
12. $\frac{dy}{dt} = \frac{y^2 + 2t}{3y^2 - 2ty}$ passing through (0, 1) for 0 ≤ t ≤ 0.5, h = 0.1
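For readers who want to check their hand computations in Problems 9 to 12, Euler's method is short enough to code directly. The following is a generic sketch; the function name `euler` and the test equation dy/dt = y are illustrative and are not taken from the problem set.

```python
def euler(f, t0, y0, h, n):
    """Approximate y' = f(t, y), y(t0) = y0, with n Euler steps of size h.

    Returns the list of (t, y) points, starting with (t0, y0).
    """
    points = [(t0, y0)]
    t, y = t0, y0
    for _ in range(n):
        y = y + h * f(t, y)   # follow the slope at the current point
        t = t + h
        points.append((t, y))
    return points


# Illustrative equation (not one of the problems): dy/dt = y, y(0) = 1
for t, y in euler(lambda t, y: y, 0.0, 1.0, 0.1, 5):
    print(f"{t:.1f}  {y:.5f}")
```

To apply it to one of the problems, replace the lambda with the right-hand side of that problem and choose n so that n·h covers the requested interval.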
Sketch the solution to the initial value problems using the slope fields given in Problems 13 to 14.
13. $\frac{dy}{dt} = \frac{2y + 2t^3}{t}$ with y(2) = −9
14. $\frac{dy}{dt} = \sin t \sin y$ with y(0) = 0.25
15. The radioactive substance gallium-67 (symbol $^{67}$Ga) used in the diagnosis of malignant tumors has a half-life
of 46.6 hours. If we start with 100 mg of $^{67}$Ga, what percent is lost between the 30th and 35th hours? Is this
the same as the percent lost over any other 5-hour period?

16. A certain artifact is tested by carbon dating and found to contain 75% of its original carbon-14 (half-life 5,730
yr). As a cross-check, it is also dated using radium, and was found to contain 32% of the original amount.
Assuming the dating procedures were accurate, what is the half-life of radium?
17. Consider a Hawk-Dove game (see Section 6.5) with a “payoff” of V = 100 and a “cost” of C = 30. Sketch the
phase line and then discuss the evolutionary implications. Contrast this with another scenario with a payoff of
V = 100 and a cost of C = 180.
18. In 1986, the Chernobyl nuclear disaster in the Soviet Union contaminated the atmosphere. The buildup of
radioactive material in the atmosphere satisfies the differential equation

\[ \frac{dM}{dt} = r\left(\frac{k}{r} - M\right), \qquad M(0) = 0 \]
where M is the mass of the radioactive material in the atmosphere after time t (in years); k is the rate at
which the radioactive material is introduced into the atmosphere; r is the annual decay rate of the radioactive
material. Find the solution, M (t), of this differential equation in terms of k and r.
19. A population of animals on Catalina Island is limited by the amount of food available. Studies show there
were 1,800 animals present in 1980 and 2,000 in 1986, and suggest that 5,000 animals can be supported by the
conditions present on the island. Use a logistic model to predict the animal population in the year 2000.
20. A lake has a volume of 6 billion ft$^3$, and its initial pollutant content is 0.22%. A river whose waters contain
only 0.06% pollutants flows into the lake at the rate of 350 million ft$^3$/day, and another river flows out of the
lake also carrying 350 million ft$^3$/day. Assume that the water in the two rivers and the lake is always well
mixed. How long does it take for the pollutant content to be reduced to 0.15%?

6.8 Group Projects

of the following projects.
Group project 6A: Modeling Diseases

Consider the situation where a small group of people having an infectious disease is inserted into a large population
which is capable of catching the disease. What happens as time evolves? Will the disease die out rapidly, or will
an epidemic occur? How many people will ultimately catch the disease? The goal of this project is to address these
questions by deriving a system of differential equations which govern the spread of the infectious disease, and to
analyze the behavior of its solutions.
You may begin with the assumption that the disease under consideration confers permanent immunity upon
any individual who has completely recovered from it, and that it has a negligibly short incubation period. This
latter assumption implies that an individual who contracts the disease becomes infective immediately afterwards.
Therefore, you may divide the population into three classes of individuals: the infective class (I), the susceptible class
(S), and the removed class (R). The infective class consists of those individuals who are capable of transmitting the
disease to others. The susceptible class consists of those individuals who are not infective, but are capable of catching
the disease and becoming infective. The removed class consists of those individuals who have had the disease and
are dead, or have recovered and are permanently immune, or are isolated until recovery and permanent immunity
occur.
To complete this project you should
• Write down a system of three first order differential equations based on the following additional assumptions:
Assumption 1: The total population remains fixed at a level N in the time interval of consideration.
Assumption 2: The rate of change of the susceptible population is proportional to the product of the
number of susceptible and the number of infected.
Assumption 3: Individuals from the infected class are removed and enter the removed class at a rate
proportional to the number of infected.
• Assume that R(0) = 0. Use the fact that S(t) + I(t) + R(t) = N, and special features of the dS/dt and dR/dt
equations, to show that there exists a function F(R) such that dR/dt = F(R) (i.e., the system reduces to ONE
first order equation!)
• Show that dR/dt = F(R) can be rewritten as
\[ \frac{dx}{d\tau} = a - bx - e^{-x} \tag{6.1} \]
with a ≥ 1 and b > 0 by appropriately rescaling R and t to x and τ.
• Determine the number of fixed points of (6.1) and classify their stability.
• Show that if b < 1 and x(0) = 0, then x′ (τ ) is increasing from τ = 0 until it reaches a maximum at some time
τmax > 0. Show that if b > 1 and x(0) = 0, then x′ (τ ) is decreasing. What do these facts imply about I(t)?
Discuss the biological implications.

Group project 6B:∗ Save the Perch Project

Happy Valley Pond is currently populated by yellow perch. A map of the pond is shown in Figure 6.45. Use this
map, which shows at each grid point the depth of the pond in feet when the dam is at spillover level, as well as the
fact that each grid cell is 5 × 5 ft2 to estimate the number of gallons of water in the pond when the water level is
exactly even with the top of the spillover dam. This information will be needed to construct your model to account
for the following additional facts. Water flows into the pond from two springs and evaporates from the pond at rates
given in the following table.

Source         Dry Season (6 months)    Rainy Season (6 months)
Spring A       50 gal/h                 60 gal/h
Spring B       60 gal/h                 75 gal/h
Evaporation    110 gal/h                75 gal/h

Figure 6.45: Happy Valley Pond is fed by two springs A and B
The pond at all times is well mixed by the inflows, outflows and wind. Recently, Spring B became contaminated by
an underground salt deposit so that its water is a 10% salt solution, which means that 10% of a gallon of water from
Spring B is salt. Assume that the salt does not evaporate but is instead well-mixed with the water in the pond so
that the rate of salt lost is determined by the outflow rate of water and the well mixed concentration of salt in the
pond at the time of outflow.
The yellow perch in the pond are salt intolerant and start to die when the concentration of salt exceeds 1%.
There was no salt in the pond before the contamination of Spring B. You and the members of your group (if you
have one) have been called upon by the Happy Valley Bureau of Fisheries to try to save the perch. Unfortunately
Spring B is underground and cannot be capped off, but you are able to pipe fresh water from other sources to help
dilute the salt concentration in the pond.
Assume the salt contamination in Spring B started at the beginning of the dry season in 2004 (t = 0), when the
pond was exactly even with the top of the spillover dam. Selecting the units of t to be hours, formulate a differential
equation for the amount of salt in the pond at any time t after the start of the dry season in 2004. Remembering to
take into account the seasonal nature of the flows using your differential equation solving technology, draw a graph
of the amount of salt in the pond over the dry season in 2004 and over the following wet season.
Now use the model to address the following questions, assuming no management interventions unless specifically
asked to do so:
1. What is the equilibrium solution under persistent dry season conditions?
∗ Courtesy of Diane Schwartz from Ithaca College, New York.

2. What is the equilibrium solution under persistent wet season conditions?

3. What will the salt profile look like in the long run over a combined dry and wet season? (Use your model to
produce the profile for the first dry-wet season, the second dry-wet season, and so on until two successive
profiles are identical to the desired number of decimal places.)
4. From this long term profile identify the periods when the perch are threatened (i.e. salt concentrations exceed
1%) and calculate how much water will need to be piped in to ensure that the perch remain safe, given that
whenever fresh water is piped in it is always at the rate of 100 gallons of pure water per hour (i.e. all the
Happy Valley Bureau of Fisheries Management Council can decide is when to switch on and off the spigot).
Your report should include the design and analysis of a plan that can be used to ensure that the water in the
pond never gets too salty for the perch. Can you come up with any interesting innovations that might help manage
the salinity of the pond?


Chapter 7
Probabilistic Applications of Integration
7.1 Histograms, PDFs and CDFs, p. 639
7.2 Improper Integrals, p. 659
7.3 Mean and Variance, p. 674
7.4 Bell-Shaped Distributions, p. 691
7.5 Life Tables, p. 712
PREVIEW
At the beginning of the 19th century, the prevailing scientific view was of a clockwork universe in which everything
was determinable, if not actually determined. Thus an asteroid hitting the earth was not a chance event, but could
be anticipated if the position and velocity of all asteroids in the solar system were known. This was the view of
Pierre-Simon Laplace (1749-1827), whose methods for computing the future positions of planets and comets from
observations of their past positions were published in a five-volume treatise entitled “Mécanique céleste.” Napoleon
Bonaparte, in commenting to Laplace that he found no mention of God in his treatise, is reputed to have received the
reply from Laplace that he had no need for that hypothesis. A century later, the clockwork view of 19th century classical
mechanics was shattered by 20th century quantum mechanics, built on Werner Heisenberg’s (1901-1976) pivotal
uncertainty principle. This principle implies that the precision with which both the position and velocity of any object
can be known at a given point in time is limited. Einstein’s difficulty in accepting this principle is encapsulated
in one of his most quoted attributions (translated from the German): “God does not play dice with the universe.”
Whether or not we believe that the time of death of an individual is preordained by God or is subject to the throw of
a cosmic die, many kinds of computations in biology are intrinsically predictions of things happening with particular
frequencies rather than happening with absolute certainties.

“God does not play dice with the universe.” – Albert Einstein (1879-1955)
In this chapter, we investigate the applications of calculus to probability and statistics with applications in biology,
particularly population biology. For example, the process whereby individuals inherit genes from their parents is
essentially a random process so that questions such as “Will a child inherit a defective gene from a parent?” are only
answerable in terms of expectations. The answer to such a question might be that we expect a half or a quarter of
all children of a particular couple to carry a specific gene, depending on the gene in question. Applied population
biology questions relating to demography, such as “What are the chances that individuals of a particular species will
live beyond age t?” (see Section 7.5) can also only be answered in probabilistic terms.
The concept of chance is not one that has come easily to us. For all of human history most individuals have
believed that their lives and the things they do are controlled by God or gods. It is only for the past 350 years that
we have taken seriously the prospect of being able to calculate outcomes of events based on a theory of chance. The
mathematical theory of chance is reputed to have gotten its start when a French nobleman, Chevalier de Méré, with
a penchant for gambling and an interest in mathematics, challenged the French mathematician, Blaise Pascal (1623-
1662), to solve a betting problem. Pascal teamed with another French mathematician, Pierre de Fermat (1601-1665)
to solve the problem and, in the process, laid the foundations of probability theory.
The mathematical study of the likelihood of a particular occurrence, or event, is known as probability theory.
It is no exaggeration to say that without probability theory, the biological sciences would not exist as we know them
today. All concepts, ideas, and calculations relating to the theory and practice of mathematical statistics and its
indispensable application to experimental biology would not exist. Thus we take the view in this text that students
in the biological sciences should become immersed in ideas that relate to chance and probabilities as early as possible
and have these ideas reinforced as often as possible. This cannot be done at the expense of learning the foundations
of calculus, which is the gateway to the world of biological modeling. But wherever we can provide more relevant
training through exposure to probability and its many essential applications in biology, we do so. Thus, in
this chapter, we develop some of the basic ideas of probability in biology through the application of calculus,
particularly integration. Among the examples we consider are bird diversity in woodlands, wheat yields, and the life histories
of the dinosaurs Albertosaurus and Tyrannosaurus rex (as gleaned from the fossil record) and of the painted turtle
(Figure 7.1).

(i) Back of shell (ii) Bottom of shell.
Figure 7.1: Painted turtle
7.1 Histograms, PDFs and CDFs

Histograms and Probabilities
In previous chapters we have seen how gathering data leads to an uncovering and ultimately an understanding of
many different phenomena. On the other hand, sometimes the data we have gathered is so extensive that it is
difficult to understand what we have gathered. One way of visualizing large data sets in a way that enhances our
ability to comprehend them is through a graph known as a histogram. A histogram is a bar graph of a frequency
distribution in which the widths of the bars are proportional to the intervals into which the variable has been divided
and the heights of the bars are proportional to the interval frequencies. The histogram can give the viewer a sense
of whether there is a center to the data (i.e. where most of the data points lie), how much spread there is about a
center (i.e. how far data points spread from the center), and how skewed the data set is (i.e. whether there are more
data points to the left or right of the center). The histogram can also be used to visually identify multiple peaks in
the data set as well as outliers in the data set (i.e. points manifested as isolated bars in the tails of the histogram).
Given a data set, there are many types of histograms that one can create. The most common form is obtained
by splitting the range of the data into equal-sized intervals. For each interval, the proportion of data points that fall
into the interval is determined. To draw the histogram, break up the horizontal axis into the equal-sized intervals.
Above each of these intervals, draw a rectangle whose area equals the proportion of data points lying in the interval.
Example 1. Bird diversity in oak woodlands
In the spring of 1994, the numbers of bird species at 40 different California oak woodland sites were recorded. Each
site was around 5 hectares in size (the equivalent of about 12 1/3 acres, or 0.0193 square miles) and was situated in
relatively homogeneous habitat. The number of bird species found at each of these sites is listed below:
37, 21, 26, 27, 21, 21, 28, 22, 22, 26, 47, 26, 29, 34, 28, 25,
19, 32, 32, 29, 29, 16, 21, 24, 37, 38, 30, 20, 23, 30, 27, 32,
17, 24, 32, 29, 40, 31, 38, 35
a. Construct a histogram with two intervals corresponding to 0 to 25 species and 25 to 50 species.
b. Construct a histogram with intervals of width 10 for the interval [0, 50].
c. Use technology to construct a histogram with intervals of width 5 for the interval [0, 50].
d. Determine the units on the vertical axis of your histograms.

For these problems, assume the intervals include the left end point but not the right end point.
Solution.
a. Since 13 of the 40 data points are between 0 and 24, the fraction of data points in the first interval is
13/40 = 0.325. Since the remaining 27 data points are in the second, the fraction of data points in the
second interval is 27/40 = 0.675. To draw the histogram, we sketch a rectangle over the right interval
that is approximately twice as high as the rectangle over the left interval. More precisely, we want
the area of the left rectangle to equal 0.325. Since the base of the rectangle is of length 25, its height
must be 0.325/25 = 0.013. We want the area of the right rectangle to equal 0.675. Therefore, the height
of this rectangle is 0.675/25 = 0.027. The resulting histogram is illustrated in Figure 7.2a. Note, by
construction, that the total shaded area is 1.
b. Intervals of width 10 correspond to [0, 10), [10, 20), [20, 30), [30, 40), and [40, 50). The number of data
points in the [0, 10) interval is 0. Hence we draw no rectangle over this interval. The number of data points
in the [10, 20) interval is 3. Hence, the fraction of data in this interval is 3/40 = 0.075. Since the width
of the interval is 10, the height of the rectangle over the [10, 20) interval should be 0.075/10 = 0.0075.
Similarly, the number of data points in the [20, 30) interval is 22 implying that the fraction of data in this
interval is 22/40 = 0.55 and that the height of the rectangle over this interval should be 0.55/10 = 0.055.
For the interval [30, 40), the number of data points is 13, the fraction of data is thus 13/40 = 0.325 so
that the height of the rectangle over this interval should be 0.325/10 = 0.0325. Finally for the interval
[40, 50), the number of data points in the interval is 2, and the fraction of data is thus 2/40 = 0.05, so that the
height of the rectangle over this interval should be 0.05/10 = 0.005. The resulting histogram is illustrated
in Figure 7.2b. Again, by construction, the total shaded area is 1.
c. Many programs exist (e.g. most spreadsheet and statistical software) that create histograms for which
one can specify the size of the intervals to be plotted. Specifying intervals of width 5 in one of these
programs yields Figure 7.2c.
d. Since the areas of the rectangles are unitless, the units on the vertical axes of the histogram have to be
the reciprocal of the units on the horizontal axes. In other words, the product of the units on the axes has
to be unitless. For the histograms in Figure 7.2, the units on the horizontal axis are number of species.
Hence, the units on the vertical axes are 1/(number of species).
□
Panels: (a) intervals of width 25, (b) intervals of width 10, (c) intervals of width 5.
Figure 7.2: Species richness in oaklands
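The construction used in Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `density_histogram` and its arguments are hypothetical, and the intervals are taken to include the left end point but not the right, as in the example. Each height is the proportion of data in the interval divided by the interval width, so the rectangle areas sum to 1.

```python
# Bird species counts at the 40 oak woodland sites from Example 1.
species_counts = [
    37, 21, 26, 27, 21, 21, 28, 22, 22, 26, 47, 26, 29, 34, 28, 25,
    19, 32, 32, 29, 29, 16, 21, 24, 37, 38, 30, 20, 23, 30, 27, 32,
    17, 24, 32, 29, 40, 31, 38, 35,
]


def density_histogram(data, lower, upper, width):
    """Return (left_edge, proportion, height) for each interval [left, left + width)."""
    n_bins = int(round((upper - lower) / width))
    counts = [0] * n_bins
    for value in data:
        if lower <= value < upper:
            counts[int((value - lower) // width)] += 1
    rows = []
    for i, count in enumerate(counts):
        proportion = count / len(data)
        # Height is chosen so that area = height * width = proportion.
        rows.append((lower + i * width, proportion, proportion / width))
    return rows


for width in (25, 10, 5):
    print(f"width {width}")
    for left, proportion, height in density_histogram(species_counts, 0, 50, width):
        print(f"  [{left}, {left + width}): proportion {proportion:.3f}, height {height:.4f}")
```

For width 10 the printed heights match the values 0.0075, 0.055, 0.0325 and 0.005 computed by hand in part b.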
As we have seen, one interpretation of the area of a rectangle in a histogram is the proportion of the data in the
interval. An alternative interpretation is in terms of probabilities and random variables. To describe this alternative
interpretation, imagine you have a data set and each data value is recorded on a slip of paper. You place all of these
slips of paper into a hat and shake it. You close your eyes and grab a slip of paper from the hat. Let X denote the
value of the slip that is going to be picked. X is a random variable; that is, we do not know X’s value beforehand
since this value is randomly determined through a blind drawing in which each of the slips is equally likely to be
drawn. A basic question one can ask is what is the probability that the value written on the slip of paper is in the
interval [a, b). We denote this probability using the notation
P (a ≤ X < b)
In particular, if an experiment is repeated n times and an event E occurs m ≤ n times, then the probability P (E)
that the event occurs on any particular trial is approximated by the relative frequency m/n, with
this approximation becoming exact in the limit as the number of trials increases without bound: that is,
\[ P(E) = \lim_{n \to \infty} \frac{m}{n} \]
Intuitively, we expect the probability P (a ≤ X < b) to correspond to the proportion of data values that are at least a and less than b.
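The relative-frequency idea can be explored with a small simulation of the hat-drawing experiment. The sketch below is illustrative only: the function name `relative_frequency`, the seed, and the choice of the interval [20, 30) are assumptions made for the demonstration, and the data are the bird counts of Example 1. As the number of draws n grows, the ratio m/n should settle near the proportion of slips in the interval, which is 22/40 = 0.55.

```python
import random

# Bird species counts at the 40 oak woodland sites from Example 1.
species_counts = [
    37, 21, 26, 27, 21, 21, 28, 22, 22, 26, 47, 26, 29, 34, 28, 25,
    19, 32, 32, 29, 29, 16, 21, 24, 37, 38, 30, 20, 23, 30, 27, 32,
    17, 24, 32, 29, 40, 31, 38, 35,
]


def relative_frequency(data, a, b, n_trials, rng):
    """Draw n_trials slips with replacement; return m/n for the event a <= X < b."""
    hits = 0
    for _ in range(n_trials):
        x = rng.choice(data)  # each slip is equally likely to be drawn
        if a <= x < b:
            hits += 1
    return hits / n_trials


rng = random.Random(2008)  # arbitrary fixed seed so that runs are repeatable
for n in (10, 100, 1000, 10000, 100000):
    print(n, relative_frequency(species_counts, 20, 30, n, rng))
```

The early estimates can be far from 0.55; only the long-run behaviour is expected to be stable.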
Example 2. Computing probabilities
An oak woodland site is randomly selected from the 40 sites presented in Example 1. Let X denote the number of
bird species in that site.
a. Find and interpret the probability P (0 ≤ X < 25).
b. Find P (20 ≤ X < 30).
c. Find P (20 ≤ X < 40).
Solution.
a. Since the proportion of sites with less than 25 bird species is 13/40 = 0.325, we approximate P (0 ≤ X <
25) = 0.325. In other words, our best estimate is a 32.5% chance that a randomly chosen site has less
than 25 bird species. This value corresponds to the area over the interval [0, 25) in Figure 7.2a.
b. Since the proportion of sites with at least 20 species and less than 30 species is 22/40 = 0.55, we
approximate P (20 ≤ X < 30) = 0.55. In other words, our best estimate is a 55% chance that a randomly
chosen site has between 20 and 30 bird species. This corresponds to the area over the interval [20, 30) in
Figure 7.2b.
c. Since the proportion of sites with at least 20 species and less than 40 species is (22 + 13)/40 = 0.875,
we approximate P (20 ≤ X < 40) = 0.875. In other words, our best estimate is an 87.5% chance that a
randomly chosen site has between 20 and 40 bird species. This corresponds to the total area over the
intervals [20, 30) and [30, 40) in Figure 7.2b.
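The three probabilities in Example 2 are just proportions of the data, so they can be checked by direct counting. The following is a minimal sketch; the helper name `empirical_probability` is hypothetical, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction

# Bird species counts at the 40 oak woodland sites from Example 1.
species_counts = [
    37, 21, 26, 27, 21, 21, 28, 22, 22, 26, 47, 26, 29, 34, 28, 25,
    19, 32, 32, 29, 29, 16, 21, 24, 37, 38, 30, 20, 23, 30, 27, 32,
    17, 24, 32, 29, 40, 31, 38, 35,
]


def empirical_probability(data, a, b):
    """Proportion of data values x with a <= x < b, as an exact fraction."""
    return Fraction(sum(1 for x in data if a <= x < b), len(data))


p_a = empirical_probability(species_counts, 0, 25)
p_b = empirical_probability(species_counts, 20, 30)
p_c = empirical_probability(species_counts, 20, 40)
print(float(p_a), float(p_b), float(p_c))

# Disjoint intervals add: [20, 40) is the union of [20, 30) and [30, 40).
assert p_c == p_b + empirical_probability(species_counts, 30, 40)
```

The final check mirrors part c, where the areas over [20, 30) and [30, 40) are added.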
Example 3. From Histograms to Probabilities
In a study involving 252 men, Dr. A. Garth Fisher estimated the percentage of body fat by underwater weighing
and various body circumference measurements. A histogram for this data is shown in Figure 7.3. Assume a man is
randomly selected from this study. Let X denote the percentage of body fat of this randomly selected man.
a. Estimate P (X < 10).
b. Estimate P (10 ≤ X < 30).
c. Estimate P (X ≥ 30).
Solution.

Figure 7.3: Percentage body fat in a study of 252 men.
a. The area of the rectangle above the interval [0, 10] is approximately 0.015 × 10 = 0.15. Hence we
approximate P (X < 10) = 0.15. Equivalently, we estimate that 15% of the men in the study have less than 10% body
fat.
b. The area of the rectangle above the interval [10, 20] is approximately 0.037 × 10 = 0.37. The area of
the rectangle above the interval [20, 30] is approximately 0.038 × 10 = 0.38. Hence we approximate
P (10 ≤ X < 30) = 0.37 + 0.38 = 0.75. Equivalently, we estimate that 75% of the men in the study have between 10% and
30% body fat.
c. Since the sum of the areas of the rectangles must be one (i.e. the total fraction of data is one), the
area of the rectangle over the intervals [30, 40] and [40, 50] must equal 1 − 0.75 − 0.15 = 0.10. Hence we
approximate P (X ≥ 30) = 0.10 or, in words, we estimate that 10% of the men in the study have greater
than 30% body fat.
□
Probability density functions

Biologists are often faced with the question of whether or not two populations differ with respect to a particular
attribute or trait. For example, one may ask whether 10-year-old boys are, on average, taller or shorter than 10-year-old
girls. One could not effectively answer this question by choosing five 10-year-old boys and five 10-year-old girls
in a local neighborhood and then comparing the averages of the five individuals in each of the two groups because,
just by chance, a couple of unusually tall girls or short boys may be included. Such chance events would give a false
view of the real situation. To answer this question, one needs a statistically adequate sample of representatives from
the two populations being compared. Further, we have to be clear about how we define these populations because it is well
known that individuals from different nationalities and groups differ on average with respect to height. For example,
Tutsi men of Burundi and Rwanda are regarded as the tallest humans, averaging over 6 ft, while Pygmy men and
women of central Africa are the shortest, averaging 4 ft 5 in and 4 ft 6 in respectively.
The best way to compare two populations is to obtain sufficiently many randomly chosen individuals from both
populations so that the histograms constructed for both populations are smooth enough to be approximated by
continuous curves. The two continuous curves can then be compared visually through graphical superposition. A
whole field of analysis called statistical inference has been developed to analytically compare such graphs, but this is
material presented in statistics rather than calculus courses. The continuous function used to represent a particular
histogram is called a probability density function (PDF), a theoretical construct that we discuss more fully after the
next example.
Example 4. Comparing two populations of mice
An ecologist decides that she is going to undertake a study of how the distribution of mouse weights differs between two
locations.

a. Discuss how she could go about creating a PDF for this study.
b. Compare the PDFs for the two studies and draw some conclusions about the weight of the mice in the
two sites.
Solution.
a. The variable of interest in this study is the weight of an individual mouse. To make sure the results are
not affected by chance events associated with small sample size, our ecologist decides to hire a whole class
of undergraduates and put them to work trapping and weighing mice. (The students diligently mark the
captured mice with a dab of non-toxic paint so that if they are trapped again they will not be measured
twice.) The students are kept working until many thousands of mice are trapped and weighed in each
area. In fact, so many mice are weighed that the frequencies of mice found in each weight range provide
a very good estimate of the probability that any mouse selected at random will be in that weight range.
The weight of the mice is found to range between 20 and 50 grams. A histogram of the data organized
into 10g categories for the weight of individual mice at locations 1 and 2 is illustrated in Figure 7.4a.
Plotting the two histograms gives us some sense of the distribution of mouse weights in the two locations
and the difference between them. The weights of the mice, however, were recorded to the nearest 1/10 of a
gram, making it possible to plot the histogram with smaller class intervals. Histograms corresponding
to intervals of width 2 g and 0.5 g are illustrated in Figures 7.4b and c, respectively. These finer levels
of resolution provide smoother and smoother representations of the distribution. In fact, it is possible to
approximate the histogram with a smooth curve as shown in Figure 7.4d.
b. Figure 7.4d shows that field mice in the first location have a range of weights from 25 grams to 38 grams
while in the second location the range is 32.5 grams to 45 grams. While the ranges overlap, the center of
the histogram for the first location is approximately 32.5 grams, while the center for the second location
is approximately 37.5 grams. Hence, the field mice in the second location tend to be about 5 grams heavier.
Panels (one row per site): (a) class width 10 g, (b) class width 2 g, (c) class width 0.5 g, (d) smooth-curve approximation.
Figure 7.4: Histograms of the field mice data. Each row of figures corresponds to a different site.
Some data sets are naturally discrete, such as the distribution of the litter size among female cats of a particular
age while others involving physical measurements such as height, weight, or time can take on a continuum of values
(ignoring the issue of resolution of the measuring device). In the latter case, when sufficiently many measurements
are taken, the histogram is well approximated, as seen in the previous example, by a continuous probability density
function (PDF)—that is, a function with the following properties:

Probability density function (PDF)
A probability density function (PDF) is a piecewise continuous function f (x) such that
• f (x) ≥ 0 for all x, i.e., probabilities are non-negative.
• the total area under f (x) equals one, i.e., all the data lies on the real line.
Example 5. Constructing a PDF from a non-negative function
Let a be a constant. Consider the function defined by f (x) = ax for 0 ≤ x ≤ 5 and f (x) = 0 otherwise. Determine
for what value of a the function f is a PDF.
Solution. In order for f to be a PDF, f needs to be non-negative. Hence a must be non-negative. The area under
f must equal 1. Since f (x) = 0 outside the interval [0, 5], the area under f is given by
\[ \int_0^5 a\,x\,dx = \left. \frac{a\,x^2}{2} \right|_0^5 = \frac{25a}{2} = 1 \]
Solving for a yields a = 2/25 = 0.08. □
For a data set described by a PDF, f (x), the fraction of data in the interval [c, d] is given by the area enclosed
by f (x) over the interval [c, d]. Formally:
Area under a PDF
\[ \text{Fraction of data in } [c, d] = \int_c^d f(x)\,dx \]
The importance of being able to calculate areas corresponding to a particular range of values of x will be made apparent
after the next example.
Example 6. Finding fractions of data
Consider a data set whose histogram can be approximated by f (x) = 0.08 x for 0 ≤ x ≤ 5 and f (x) = 0 otherwise.
Determine the fraction of data lying in the interval [2, 4].
Figure 7.5: Fraction of data lying between 2 and 4 for the PDF f (x) = 0.08x

Solution. The fraction of data lying in [2, 4] is given by

\[ \int_2^4 0.08x\,dx = 0.08 \left. \frac{x^2}{2} \right|_2^4 = 0.08(8 - 2) = 0.48 \]
Hence, 48% of the data lies in the interval [2, 4]. □
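The two calculations in Examples 5 and 6 (total area equal to 1, and the fraction of data in [2, 4]) can be checked numerically. The sketch below uses a simple midpoint rule rather than an antiderivative; the function names `pdf` and `midpoint_integral` and the number of subintervals are assumptions chosen for illustration.

```python
def pdf(x, a=0.08):
    """The PDF of Examples 5 and 6: f(x) = a*x on [0, 5] and 0 elsewhere."""
    return a * x if 0 <= x <= 5 else 0.0


def midpoint_integral(f, lower, upper, n=10_000):
    """Approximate the integral of f over [lower, upper] with n midpoint rectangles."""
    h = (upper - lower) / n
    return h * sum(f(lower + (i + 0.5) * h) for i in range(n))


total_area = midpoint_integral(pdf, 0, 5)       # should be very close to 1
fraction_2_to_4 = midpoint_integral(pdf, 2, 4)  # should be very close to 0.48
print(round(total_area, 6), round(fraction_2_to_4, 6))
```

Because f is linear on [0, 5], the midpoint rule reproduces the exact areas up to floating-point rounding; for a curved PDF the same code would give only an approximation that improves as n grows.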
An alternative interpretation of the area under a PDF is seen as follows. Imagine that we have a hat with an
infinite number of (infinitely thin) slips of paper each with different numbers such that the proportion of slips with
numbers in the interval [a, b] is given by
\[ \int_a^b f(x)\,dx \]
where f is a PDF i.e. f is obtained by making a large number of drawings, constructing the resulting histogram and
fitting a continuous curve f (x) to this histogram, as outlined in Example 4.
Now shake this hat and, with your eyes closed, grab a slip of paper. Let X denote the value on this slip. Since
X can assume any real value, it is called a continuous random variable with PDF f (x), where the probability
that X takes on a value in the interval [a, b] equals the integral of f (x) from a to b. Equivalently, we write
\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]
Example 7. Birth times
As illustrated in Figure 7.6, birth times of babies are approximately uniformly distributed over the year. What
this implies is that the birth time X (in days) of a randomly chosen individual in the population has the following
PDF:
\[ f(x) = \begin{cases} \frac{1}{365} & 0 \le x \le 365 \\ 0 & \text{otherwise} \end{cases} \]
a. Show that f is a PDF.

b. Compute the probability of a randomly chosen individual having a birthday in January.
Figure 7.6: Distribution of birthdays
Solution.
a. Since f (x) = 0 outside of the interval [0, 365], the area under f (x) is
\[ \int_0^{365} \frac{dx}{365} = \frac{365}{365} = 1 \]
Since f (x) ≥ 0 for all x, f is a PDF.

b. Since January comprises the first 31 days of the year, we obtain
\[ P(0 \le X \le 31) = \int_0^{31} \frac{dx}{365} = \frac{31}{365} \approx 0.0849315 \]
In other words, there is approximately an 8.5% chance that a randomly chosen student from your calculus
class was born in January.
The PDF in Example 7 is an example of the following general class of density functions:
Uniform PDF
The uniform PDF on the interval [a, b] is given by
\[ f(x) = \begin{cases} \frac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{elsewhere} \end{cases} \]
In the problem set, you will be asked to verify that the uniform PDF is indeed a PDF.
Cumulative distribution functions

An alternative way of describing a random variable is with a cumulative distribution function.
Cumulative distribution function (CDF)
The cumulative distribution function (CDF) of a random variable X is the function F defined by
F (x) = P (X ≤ x)
If X describes a data set, then F (x) equals the fraction of data in the interval (−∞, x].
If X is a continuous random variable with PDF f (x), then F (x) corresponds to the area under f over the interval
(−∞, x]. Formally, we write this as
\[ F(x) = \int_{-\infty}^{x} f(t)\,dt \]
If there exists an a such that f (t) = 0 for t ≤ a, then
\[ F(x) = \int_a^x f(t)\,dt \]
The case for which there is no such a results in an improper integral (i.e. an integral over an infinite range) which
is discussed in Section 7.2.
There are several nice things about CDFs (as opposed to PDFs). For example, if X is a random variable with a
CDF F, then
P (a < X ≤ b) = F (b) − F (a)
Thus, when given a CDF, computing probabilities is much easier as no integration needs to be performed. Of course,
if you are only given the PDF, you are stuck with doing the integration one way or the other!
Example 8. From PDF to CDF
Consider the birth time PDF
\[ f(x) = \begin{cases} \frac{1}{365} & 0 \le x \le 365 \\ 0 & \text{otherwise} \end{cases} \]

a. Interpreting x as a continuous variable (fractions of a day are still part of a particular day), find and plot
the CDF for this PDF.
b. Use the CDF to find the fraction of data lying in January. Compare your answer to what was found in
Example 7.
Solution.
a. Since f (x) = 0 for x ≤ 0, we obtain F (x) = P (X ≤ x) = 0 whenever x ≤ 0. For
0 ≤ x ≤ 365, we obtain
\[ F(x) = P(X \le x) = P(0 \le X \le x) = \int_0^x \frac{dt}{365} = \frac{x}{365} \]
Finally, we have
F (x) = P (X ≤ x) = 1
for x ≥ 365. Thus,
\[ F(x) = \begin{cases} 0 & \text{if } x \le 0 \\ \frac{x}{365} & \text{if } 0 \le x \le 365 \\ 1 & \text{if } x \ge 365 \end{cases} \]
Plotting the CDF yields Figure 7.7.
Figure 7.7: The CDF for the birth time distribution
b. January corresponds to the interval [0, 31]. The fraction of data in this interval is given by F (31) − F (0) =
31/365 − 0 ≈ 0.0849315. This answer agrees with what was found in Example 7.
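The point of Example 8, that a CDF turns a probability into a simple subtraction, can be illustrated with a short sketch. The names `uniform_cdf` and `interval_probability` are hypothetical helpers, and the default end points 0 and 365 are those of the birth time distribution.

```python
def uniform_cdf(x, a=0.0, b=365.0):
    """CDF of the uniform distribution on [a, b]: 0 to the left, 1 to the right."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def interval_probability(cdf, lo, hi):
    """P(lo < X <= hi) = F(hi) - F(lo); no integration is needed."""
    return cdf(hi) - cdf(lo)


january = interval_probability(uniform_cdf, 0, 31)
print(f"{january:.7f}")
```

The same `interval_probability` helper works unchanged for any other CDF passed in as a function.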
As Example 8 illustrates, a CDF F (x) for a random variable (or for a PDF) has the following properties.
CDF properties
A CDF F (x) is characterized by the following three properties:
1. 0 ≤ F (x) ≤ 1 for all x, as a probability is always between 0 and 1.
2. F (y) ≥ F (x) whenever y ≥ x, i.e., F is a non-decreasing function.
3. Since X always takes on some finite value, lim_{x→∞} F (x) = 1 and lim_{x→−∞} F (x) = 0.
Amazingly, CDFs arise quite naturally from differential equation models as the following example illustrates.

Example 9. Drug decay and the exponential CDF
Lidocaine is a common local anesthetic and antiarrhythmic drug. The elimination rate constant for lidocaine is
c = 0.43 per hour for most patients. If y is the amount of drug in the body and there is no further input of drug into the
body, we can model the drug dynamics by
\[ \frac{dy}{dt} = -0.43\,y, \qquad y(0) = y_0 \]
where t denotes time in hours and y0 is the initial amount of lidocaine in the body.
a. Solve for y(t).
b. Write down an expression, call it F (t), that represents the fraction of drug that has left the body by time
t ≥ 0.
c. If we define F (t) = 0 for t ≤ 0, verify that F (t) is a CDF.
d. What is the probability that a randomly chosen drug particle leaves the body in the first 2 hours? What
is the probability that a randomly chosen drug particle leaves the body between the second and fourth
hour?
Solution.
a. Separating and integrating yields
\[ \int \frac{dy}{y} = -\int 0.43\,dt \]
\[ \ln|y| = -0.43\,t + C_1 \]
\[ y = C_2\,e^{-0.43t} \]
Since y0 = y(0) = C2, we obtain y(t) = y0 e^{−0.43t}.

b. The fraction of drug in the body at time t is y(t)/y0 = e^{−0.43t}. Hence, the fraction that has left by time t is
F (t) = 1 − e^{−0.43t} for t ≥ 0.
c. Let F (t) = 0 for t ≤ 0. Since 0 < e^{−0.43t} ≤ 1 for t ≥ 0, F (t) lies between 0 and 1. Since F′(t) = 0.43 e^{−0.43t} > 0 for
t > 0 and F (t) = 0 for t ≤ 0, F is non-decreasing for all t. Since F (t) = 0 for t ≤ 0, lim_{t→−∞} F (t) = 0. Since lim_{t→∞} e^{−0.43t} = 0,
lim_{t→∞} F (t) = 1. Hence, F is a CDF.
d. The likelihood that a particular drug particle is eliminated in the first two hours is given by F (2) ≈
0.58. The likelihood that a particular drug particle is eliminated between the second and fourth hour is
F (4) − F (2) ≈ 0.24. Hence, a randomly chosen particle is much more likely to be eliminated in the first
two hours than the second two hours.
□
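The numbers in part d of Example 9 can be reproduced with a few lines of Python, and the closed-form solution can be compared against a direct numerical solution of the differential equation. This is a sketch under stated assumptions: the function names are hypothetical, the initial amount is normalized to y(0) = 1, and Euler's method with a fixed step count is used only as a rough cross-check.

```python
import math

RATE = 0.43  # elimination rate constant (per hour) from Example 9


def fraction_eliminated(t, rate=RATE):
    """The CDF F(t) = 1 - exp(-rate * t) for t > 0, and 0 for t <= 0."""
    return 0.0 if t <= 0 else 1.0 - math.exp(-rate * t)


def euler_fraction_remaining(t_end, rate=RATE, steps=100_000):
    """Integrate dy/dt = -rate * y with y(0) = 1 by Euler's method; return y(t_end)."""
    dt = t_end / steps
    y = 1.0
    for _ in range(steps):
        y += dt * (-rate * y)
    return y


first_two_hours = fraction_eliminated(2)
second_two_hours = fraction_eliminated(4) - fraction_eliminated(2)
print(round(first_two_hours, 2), round(second_two_hours, 2))

# Cross-check: the numerical solution should agree closely with the formula.
print(abs((1 - euler_fraction_remaining(2)) - first_two_hours))
```

The first line of output gives the two probabilities, approximately 0.58 and 0.24; the second gives the small discrepancy between the Euler approximation and the exact solution.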
Example 9 is a particular instance of the exponential distribution that arises in many applications. The general
exponential distribution and additional applications are discussed in the problem sets.
Since CDFs are non-decreasing functions, it is easier to fit functions to empirically derived CDFs than empirically
derived PDFs. When fitting this function, however, we need to be careful as the following example illustrates.
Example 10. Survivorship histograms and CDFs for the Mediterranean fruit fly
The Mediterranean fruit fly is one of the world’s most destructive pests of deciduous fruits, such as apples, pears,
and peaches, and of citrus fruits as well. Adults of both sexes may live six months or more under favorable conditions.
University of California scientist, Professor James Carey, and his colleagues, reared Mediterranean fruit flies under
laboratory conditions and recorded, daily, the number of adults surviving a given number of days after emerging
from the pupal stage. This resulted in the following data:

Interval #                      1      2      3      4      5      6      7      8      9
Interval in days                0-10   10-20  20-30  30-40  40-50  50-60  60-70  70-80  >80
Proportion that die             0.03   0.19   0.08   0.11   0.08   0.10   0.11   0.25   0.05
Cumulative proportion of dead   0.03   0.22   0.30   0.41   0.49   0.59   0.70   0.95   1.00
The histogram associated with this data is illustrated in Figure 7.8a.
Panels: (a) mortality histogram, (b) cumulative mortality distribution; the horizontal axis is t in days.
Figure 7.8: Mortality histogram and cumulative mortality distribution for the Mediterranean fruit fly
The cumulative proportion of dead individuals at times 0, 10, 20, ..., 80 can also be plotted. The final point
at which all individuals are dead, however, cannot be included because we do not know when this occurs. The
experiment was stopped after 85 days when 3% of individuals were still alive. We can use technology to fit a quartic
polynomial F (x) that goes through the origin to the 8 data points representing the cumulative data at times 10, 20, ...,
80 days to obtain (each coefficient is rounded to 5 significant figures; see Figure 7.8b)
\[ F(x) = 0.00059818\,x + 0.00069088\,x^2 - 0.000015411\,x^3 + 1.0672 \times 10^{-7}\,x^4 \]
a. Use the fitted cumulative distribution function F (x) to estimate the probability that an individual dies
before age 18 days.
b. What is the probability that an individual survives at least until age 46 days?
c. Calculate the probability that an individual of age 15 days dies by age 35 days.
d. Finally, what is the probability that an individual lives beyond 100 days?
Solution.
a. The probability that an individual dies before reaching 18 days old is, by definition of the cumulative
distribution function,
F (18) = 0.156
Hence, there is a 15.6% chance that a randomly chosen fruit fly dies before its 18th day of life.
b. The probability that an individual survives at least until age 46 days is equal to 1 minus the probability
that the individual dies before reaching age 46 days:
1 − F (46) = 1 − 0.467 = 0.533
In other words, 53.3% of the fruit flies survived at least 46 days.
c. The probability that an individual survives until age 15 is 1 − F (15). Also, the probability that an
individual dies between the start of age 15 and the start of age 35 is F (35) − F (15). But if we want
to know what proportion of individuals who are alive at start of age 15 that die by age 35, we have to
normalize the probability of dying between age 15 and 35 by the probability of making it to age 15: that
is, we need to calculate
\[ \frac{F(35) - F(15)}{1 - F(15)} = \frac{0.367 - 0.118}{1 - 0.118} = 0.282 \]

d. Finally, the probability that an individual lives beyond 100 days is 1−F (101) (i.e. 1 minus the probability
of dying by age 101). However, F (101) = 2.335 which clearly violates the requirement that F (x) ≤ 1.
The reason is that we only fitted the data up to day 80. We do not have sufficient data to know how
to construct F (x) beyond 85 days because in the original data set not all individuals had died at the
termination of the experiment. Thus, because the bins in our histogram are 10 days wide and we set
the cumulative proportion for the ninth interval to 1 (in effect, F (90) = 1), we effectively assumed that no individuals live beyond 90 days.
□
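The calculations in Example 10 amount to evaluating the fitted quartic at a few ages. A minimal sketch follows, using the rounded coefficients quoted in the example; the names `fitted_cdf` and `conditional_death_probability` are hypothetical. It also shows the failure in part d: outside the fitted range the polynomial exceeds 1 and so is no longer a valid CDF.

```python
# Coefficients of x, x^2, x^3, x^4 in the fitted quartic of Example 10.
COEFFS = (0.00059818, 0.00069088, -0.000015411, 1.0672e-7)


def fitted_cdf(x):
    """Quartic fit through the origin; only meaningful for ages 0 <= x <= 80 days."""
    return sum(c * x ** (k + 1) for k, c in enumerate(COEFFS))


def conditional_death_probability(start, end):
    """P(dies by age `end` | alive at age `start`) = (F(end) - F(start)) / (1 - F(start))."""
    return (fitted_cdf(end) - fitted_cdf(start)) / (1 - fitted_cdf(start))


print(round(fitted_cdf(18), 3))                         # part a
print(round(1 - fitted_cdf(46), 3))                     # part b
print(round(conditional_death_probability(15, 35), 3))  # part c
print(round(fitted_cdf(101), 3))                        # part d: larger than 1
```

Any use of such a fit should therefore be restricted to the range of ages covered by the data.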
Example 10 suggests that for some populations, we could have a problem constructing F (x) if we do not have
an estimate of the maximum life span of individuals in the population. At the beginning of this millennium, for
example, the Guinness book of records reported that the oldest fully authenticated age to which any human has ever
lived is that of a French woman, Jeanne-Louise Calment, who was born on February 21, 1875, and died at age 122 years
and 164 days. Individuals who appear to be older than this are alive today, but authentication of their birth date
is required for them to be listed in the Guinness book of records. Because we can never be sure what the upper
longevity bound is, this motivates us to characterize F (x) as approaching the value of 1 asymptotically as x → ∞,
rather than have F (x) reach the value of 1 at any finite point in time.
Percentiles
Using the CDF, one can define quantities called percentiles that play a special role in statistics and probability.
Percentiles
Let F (x) be a CDF for a continuous random variable. The value of x such that
F (x) = p is called the (p × 100)th percentile of the random variable. The 25th,
50th and 75th percentiles are known as the first quartile, the median, and the third
quartile, respectively.
Example 11. Drug decay percentiles
In Example 9, we found the CDF

F (t) = 1 − e^{−0.43t},  t ≥ 0
that describes the fraction of lidocaine that has left the body after t hours. Find the median and 90th percentile
for this CDF. Discuss what these numbers mean.
Solution. To find the median, we need to solve F (t) = 0.50 as follows:

\[ 0.5 = 1 - e^{-0.43t} \]
\[ e^{-0.43t} = 0.5 \]
\[ -0.43\,t = \ln 0.5 \]
\[ t = \frac{\ln 2}{0.43} \approx 1.61 \]
The median of 1.61 hours corresponds to the time when 50% of the drug has left the body.
To find the 90th percentile, we need to solve F (t) = 0.9 as follows:
\[ 0.9 = 1 - e^{-0.43t} \]
\[ e^{-0.43t} = 0.1 \]
\[ -0.43\,t = \ln 0.1 \]
\[ t = \frac{\ln 10}{0.43} \approx 5.35 \]
The 90th percentile of 5.35 hours corresponds to the time when 90% of the drug has left the body. □
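The percentile calculation in Example 11 can be sketched in Python in two ways: with the closed-form inverse t = −ln(1 − p)/0.43, and with a generic bisection search that only assumes the CDF is continuous and non-decreasing. The function names, the search interval [0, 50] and the tolerance are assumptions made for this illustration.

```python
import math

RATE = 0.43  # elimination rate constant (per hour) from Example 9


def exponential_cdf(t, rate=RATE):
    """F(t) = 1 - exp(-rate * t) for t > 0, and 0 for t <= 0."""
    return 0.0 if t <= 0 else 1.0 - math.exp(-rate * t)


def exponential_percentile(p, rate=RATE):
    """Solve 1 - exp(-rate * t) = p for t, where 0 <= p < 1."""
    return -math.log(1.0 - p) / rate


def percentile_by_bisection(cdf, p, lo, hi, tol=1e-10):
    """Find x in [lo, hi] with cdf(x) = p, for a continuous non-decreasing cdf."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid   # the percentile lies to the right of mid
        else:
            hi = mid   # the percentile lies at or to the left of mid
    return (lo + hi) / 2


median = exponential_percentile(0.5)
p90 = exponential_percentile(0.9)
print(round(median, 2), round(p90, 2))
print(round(percentile_by_bisection(exponential_cdf, 0.5, 0, 50), 2))
```

The bisection version is useful when F cannot be inverted by hand; for the exponential CDF both approaches give the same median of about 1.61 hours.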
Example 12. Birth times quartiles

In Example 7, we presented
\[ f(x) = \begin{cases} \frac{1}{365} & 0 \le x \le 365 \\ 0 & \text{otherwise} \end{cases} \]
as the PDF of the birth time X (in days) of a randomly chosen individual in a population where births are equally
likely on any day of the year. In such a population, compute the birthdays of individuals falling on the median and the
first and third quartiles of f.
Solution.
The median and first and third quartiles of f are, respectively, the solutions to
\[ P(0 \le X \le c) = \int_0^c \frac{dx}{365} = \frac{c}{365} = 0.5,\ 0.25, \text{ and } 0.75 \]
which are c = 182.5, 91.25, and 273.75. For a non-leap year, these correspond to the 1st of July, the 2nd of April,
and the 1st of October. □
Example 13. An overweight baby
A medical practitioner examines a young boy of 30 months and finds that the child is 87 cm tall and weighs 15.6
kg. Use the CDC percentile charts to decide whether the boy is much heavier than normal for his height and how his height
and weight relate to those of boys of other ages.
Solution. Reading off the CDC percentile charts for length and weight of boys aged 0 to 36 months, we see that
87 cm corresponds to the 10th percentile for height of a 30-month-old boy, while 16.1 kg corresponds to the 90th
percentile for weight. Thus the boy seems to be well above normal weight for his height. His height is the same as
the median for boys aged 24 months, while the age at which his weight would be the median is off the 0-36 month chart; his weight is still
above the 75th percentile for 3-year-olds. □

Figure 7.9: The CDC Length and Weight Percentile Charts for Boys Aged 0 to 36 Months
Problem Set 7.1
In Problems 1 to 4 construct a histogram for the given data sets.

1.
Score      Frequency
50-59      3
60-69      0
70-79      8
80-89      4
90-99      1

2.
Score      Frequency
1-10       5
11-20      8
21-30      6
31-40      10
41-50      17
51-60      15

3.
Score      Frequency
1-35       10
36-70      20
71-105     35
106-140    20
141-175    10

4.
Score      Frequency
0-99       50
100-199    45
200-299    65
300-399    75
400-499    60
500-599    50
600-699    80
700-799    75
800-899    30
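5. If X denotes a score in Problem 1, find
a. P (50 ≤ X ≤ 59)
b. P (50 ≤ X < 69)
c. P (70 ≤ X ≤ 89)
d. P (90 ≤ X < 100)
6. If X denotes a score in Problem 2, find
a. P (1 ≤ X ≤ 10)
b. P (1 ≤ X < 21)
c. P (31 ≤ X < 41)
d. P (51 ≤ X ≤ 60)
7. If X denotes a score in Problem 3, find
a. P (X < 71)
b. P (1 ≤ X < 141)
8. If X denotes a score in Problem 4, find
a. P (X < 500)
b. P (X ≥ 500)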
In Problems 9 to 12 find a constant a so that the given function is a PDF and find the values of x that correspond
to the median, the first quartile, and the third quartile.
9. f (x) = 2ax, 0 ≤ x ≤ 2
10. f (x) = 5ax, 1 ≤ x ≤ 5
11. $f(x) = ax^2$, 0 ≤ x ≤ 1
12. $f(x) = 3ax^2$, 1 ≤ x ≤ 4
In Problems 13 to 16 use the CDC charts in Figure 7.9 to estimate the length-for-age and weight-for-age percentiles
for the following boys of age a months, weight w kg, and length l cm.
13. a = 21, w = 12.2, l = 85 cm.
14. a = 27, w = 12.2, l = 87 cm.
15. a = 15, w = 13.4, l = 87 cm.
16. a = 18, w = 10.9, l = 83 cm.
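17. If
\[ f(x) = \begin{cases} \dfrac{1}{20} & \text{if } 0 \le x \le 20 \\[4pt] 0 & \text{otherwise,} \end{cases} \]
find and plot the CDF for this PDF.
18. If
\[ f(x) = \begin{cases} \dfrac{1}{100} & \text{if } 0 \le x \le 100 \\[4pt] 0 & \text{otherwise,} \end{cases} \]
find and plot the CDF for this PDF.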
19. If $\dfrac{dy}{dt} = -0.25y$, $y(0) = y_0$
a. Find $y(t)$.
b. If we define $F(t) = 0$ for $t \le 0$ and $F(t) = 1 - y(t)/y_0$, verify that $F(t)$ is a CDF.
c. If X is a random variable whose CDF is given by $F(t)$, find $P(0 < X \le 1)$.
20. If $\dfrac{dy}{dt} = -0.15y$, $y(0) = y_0$
a. Find $y(t)$.
b. If we define $F(t) = 0$ for $t \le 0$ and $F(t) = 1 - y(t)/y_0$, verify that $F(t)$ is a CDF.
c. If X is a random variable whose CDF is given by $F(t)$, find $P(0 < X \le 1)$.
21. Consider the function g(x) whose graph is shown below:
[Figure: graph of y = g(x) for 0 ≤ x ≤ 3, with y ranging from 0 to 2.5]

a. For what value of c is f (x) = cg(x) a PDF?

b. For a continuous random variable with PDF f (x), find P (2 ≤ X ≤ 3).
22. Consider the function g(x) whose graph is shown below:
a. For what value of c is f (x) = cg(x) a PDF?

b. For a continuous random variable with PDF f (x), find P (3 ≤ X ≤ 12).
23. For Problem 21 find an expression for the CDF and plot it.
24. For Problem 22 find an expression for the CDF and plot it.
25. Consider
\[ F(x) = \begin{cases} \dfrac{x}{1+x} & \text{if } x \ge 0 \\[4pt] 0 & \text{elsewhere} \end{cases} \]
a. Verify that F (x) is a CDF.

b. Assume X is a continuous random variable with CDF F (x). Find P (0 ≤ X ≤ 1), P (2 ≤ X ≤ 10).
26. Consider
\[ F(x) = \begin{cases} \dfrac{x^2}{1+x^2} & \text{if } x \ge 0 \\[4pt] 0 & \text{elsewhere} \end{cases} \]
a. Verify that F (x) is a CDF.

b. Assume X is a continuous random variable with CDF F (x). Find P (0 ≤ X ≤ 1),P (2 ≤ X ≤ 10).
27. A distribution table is shown below. The table gives the distribution of cholesterol level for 6,000 children, 4
to 19 years old. Cholesterol level is measured in milligrams per 100 milliliters of blood. The class intervals
include the left end point, but exclude the right end point.
Cholesterol (in mg) Percent

100–140 18
140–180 52
180–220 20
220–260 10
a. Sketch the histogram for the given intervals.

b. Find the probability that a randomly selected child in this group has a cholesterol level of ≥ 140.
c. Find the probability that a randomly selected child in this group has a cholesterol level between 100
and 220.

28. One study of grand juries compared the demographic characteristics of jurors with the general population, to
see if the jury panels were representative. Here are the results for age. Only persons 21 and over are considered;
the county age distribution is known from Public Health Department data.
Age County-wide percentage Number of jurors

20 to 39 42 5
40 to 49 23 9
50 to 59 16 19
60 and up 19 33
Total 100 66
Sketch the histogram for the county-wide percentage and the number of jurors. What do you notice? For
simplicity, assume that the last bin is [60, 70).
29. According to the U.S. Census Bureau’s International Data Base, the life expectancies in 2000 for the following
countries are given by
Country Life Expectancy Country Life Expectancy

Argentina 75.1 Brazil 62.9
Canada 79.4 China 71.4
Colombia 70.3 Egypt 63.3
Ethiopia 45.2 France 78.8
Germany 77.4 India 62.5
Indonesia 68.0 Iran 69.7
Italy 79.0 Japan 80.7
Kenya 48.0 Korea, South 74.4
Mexico 71.5 Morocco 69.1
Pakistan 61.1 Peru 70.0
Philippines 67.5 Poland 73.2
Romania 69.9 Russia 67.2
South Africa 51.1 Spain 78.8
Turkey 71.0 Ukraine 66.0
United Kingdom 77.7 United States 77.1
Venezuela 73.1 Vietnam 69.3
Zambia 37.2
a. Sketch a histogram with the bins < 50 (plot as if on the interval [45, 50)), [50, 55), [55, 60), [60, 65),
[65, 70), [70, 75), [75, 80), and > 80 (plot as if on the interval [80, 85)).
b. Selecting one of the countries at random (i.e. each country is equally likely to be selected), what is
the probability of getting a country with a life expectancy of i.) < 60 and ii.) ≥ 70?
30. Let f (x) represent the PDF for the weight of a field mouse in Williamsburg where x is measured in grams.
Express the following probabilities as integrals:
a. A randomly chosen field mouse weighs between 20 and 30 grams.

b. A randomly chosen field mouse weighs less than 40 grams.
31. Let f (x) represent the PDF for the weight of a pigeon in New York City where x is measured in ounces. Express
the following probabilities as integrals:
a. A randomly chosen pigeon does not weigh between 13 and 14 oz.

b. A randomly chosen pigeon is in the weight class 12-15 oz, but does not weigh between 13 and 14 oz.

32. If you are really bad at darts, then the PDF for the distance x (in inches) that your dart is from the center of
a 12 inch dart board may be given by

\[ f(x) = \begin{cases} x/72 & \text{if } 0 \le x \le 12 \\ 0 & \text{elsewhere} \end{cases} \]
a. Verify that f (x) is a PDF.

b. Compute the probability that your dart is more than 9 inches from the center.
c. Compute the probability that your dart is less than 3 inches from the center.
Note: This PDF assumes that you are equally likely to hit any point on the dart board (a fact that you are
asked to verify in Exercise 25 of Problem Set 7.3) .
33. Suppose you are a champion dart player with a PDF for the distance x (in inches) that your dart is from the
center of a 12 inch dart board given by

\[ f(x) = \begin{cases} 1 - x/2 & \text{if } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases} \]

b. Compute the probability that your dart is more than 1 inch from the center.
c. Compute the probability that your dart is between 1/4 and 1/2 inch from the center.
34. According to Thomson et al. 1973∗, the elimination constant for Lidocaine for patients with hepatic impairment
is 0.12 per hour. Hence, for a patient that has received an initial dosage of y0 mg, the Lidocaine level y(t) in
the body can be modeled by the differential equation
\[ \frac{dy}{dt} = -0.12\,y, \qquad y(0) = y_0 \]
a. Solve for y(t).

b. Write down an expression, call it F (t), that represents the fraction of drug that has left the body by
time t ≥ 0.
c. If F (t) = 0 for t ≤ 0, verify that F (t) is a CDF.
d. What is the probability that a randomly chosen drug particle leaves the body in the first 2 hours?
e. What is the probability that a randomly chosen drug particle leaves the body between the second
and fourth hour?
35. Consider a drug that has an elimination rate constant of c. If y is the amount of drug in the body and there
is no further input of drug into the body, we can model the drug dynamics by
\[ \frac{dy}{dt} = -cy, \qquad y(0) = y_0 \]
where $t$ denotes time in hours and $y_0$ is the initial amount of drug in the body.
a. Solve for y(t).

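b. Write down an expression, call it F (t), that represents the fraction of drug that has left the body by
time t ≥ 0.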
∗ Thomson PD, Melmon KL, Richardson JA, et al. Lidocaine pharmacokinetics in advanced heart failure, liver disease, and renal
failure in humans. Ann Intern Med 1973;78(4):499-508


d. Find an expression that allows one to calculate for any value c > 0 and times 0 < r < s what
proportion of the drug is removed on the interval [r, s].
36. (Extinction rates) In the early 1960s, Robert MacArthur of Princeton University and Edward O. Wilson of Harvard
University developed a theory to explain why big islands generally have more species than smaller islands, and
why the number of species on islands of similar sizes is inversely related to their distance from continental
landmasses. They argued that the number of species on an island represents a dynamic balance between the
rate at which new species arrive at that island and the rate at which species on the island go extinct. The
simplest model of island biodiversity assumes that the rate of change of the number N of species is given by a
constant rate I of immigration of new species from the main land and that species on the island go extinct at
a rate proportional to N . If the proportionality constant is c, then we obtain
\[ \frac{dN}{dt} = I - cN \]
where t denotes time in years. To know what the species immigration rate I might be for a particular island,
we need to know the number of species on the mainland that serve as a source for the colonization process.
On the other hand, the extinction rate c on each island is a characteristic of the island alone rather than of
the surrounding mainlands and the distance of the island to these mainlands. To understand the likelihood a
species already on the island has gone extinct by time t, we can ignore the immigration process (i.e. keep only
track of the species currently on the island) and consider the model
\[ \frac{dN}{dt} = -cN \]
a. Solve for N (t).

b. Write down an expression, F (t), for the fraction of species that have gone extinct by year t.
c. Donald Levin, a botany professor at the University of Texas, Austin, was quoted by the Science
Daily∗ as stating “Roughly 20 of the 297 known mussel and clam species and 40 of about 950 fishes
have perished in North America in the last century.” Use this data to approximate the extinction
constants c for mussel and clam species and for fish species.
d. Using your estimates from (c), estimate the probability that a specific clam or mussel species goes
extinct in the next decade.
e. Using your estimates from (c), estimate the probability that a specific fish species goes extinct in the
next decade.
∗ Posted January 10th, 2002 at http://www.sciencedaily.com/releases/2002/01/020109074801.htm

7.2 Improper Integrals

In the study of probability, one often encounters integrals called improper integrals in which the limits of
integration are not finite. These improper integrals come in three varieties:
\[ \int_a^{\infty} f(x)\,dx, \qquad \int_{-\infty}^{a} f(x)\,dx, \qquad \int_{-\infty}^{\infty} f(x)\,dx \]
In this section, we discuss when these integrals are well defined.
One sided improper integrals

Consider the function $e^{-x}$ for $x \ge 0$ as illustrated in Figure 7.10.
Figure 7.10: Area under $e^{-x}$ from $x = 0$ to $x = t$
What is the area under this curve? At first glance, one might reason: since the region under the curve goes on
forever, the area is infinite. To evaluate this statement, define $A(t)$ to be the area under $e^{-x}$ from $x = 0$ to $x = t$. In
other words,
\[ A(t) = \int_0^t e^{-x}\,dx \]
Computing $A(t)$ yields
\[ A(t) = \int_0^t e^{-x}\,dx = -e^{-x}\Big|_0^t = 1 - e^{-t} \]
$A(t)$ is always less than 1 for any $t > 0$. Therefore, the area under $e^{-x}$ for $x \ge 0$ cannot be infinite! In fact, it would
be natural to define the area under $e^{-x}$ for $x \ge 0$ to be
\[ \lim_{t \to \infty} A(t) = \lim_{t \to \infty} \left(1 - e^{-t}\right) = 1 \]
Thus, even though the curve is of infinite length, the area under this curve is finite. Our first guess was wrong!
Inspired by this example, we make the following definition.

Convergent and divergent improper integrals

Define
\[ \int_a^{\infty} f(x)\,dx = \lim_{t \to \infty} \int_a^t f(x)\,dx \]
When the limit exists, $\int_a^{\infty} f(x)\,dx$ is convergent; otherwise it is divergent. Similarly, define
\[ \int_{-\infty}^{a} f(x)\,dx = \lim_{t \to -\infty} \int_t^a f(x)\,dx \]
When the limit exists, $\int_{-\infty}^{a} f(x)\,dx$ is convergent; otherwise it is divergent.
Example 1. Convergent versus divergent
Determine whether the following integrals are convergent or divergent. If convergent, determine their value.
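a. $\int_2^{\infty} \frac{dx}{x^2}$
b. $\int_2^{\infty} \frac{dx}{x}$
c. $\int_0^{\infty} \sin x\,dx$
Solution.
a. For any $t$,
\[ \int_2^t \frac{dx}{x^2} = -\frac{1}{x}\Big|_2^t = \frac{1}{2} - \frac{1}{t} \]
Taking the limit yields
\[ \lim_{t \to \infty} \int_2^t \frac{dx}{x^2} = \lim_{t \to \infty} \left(\frac{1}{2} - \frac{1}{t}\right) = \frac{1}{2} \]
Hence, $\int_2^{\infty} \frac{dx}{x^2}$ is convergent and equals $\frac{1}{2}$.
b. For any $t$,
\[ \int_2^t \frac{dx}{x} = \ln x\Big|_2^t = \ln t - \ln 2 \]
Since
\[ \lim_{t \to \infty} \int_2^t \frac{dx}{x} = \lim_{t \to \infty} \left[\ln t - \ln 2\right] = \infty, \]
$\int_2^{\infty} \frac{dx}{x}$ is divergent.
c. For any $t$,
\[ \int_0^t \sin x\,dx = -\cos x\Big|_0^t = 1 - \cos t \]
Since
\[ \lim_{t \to \infty} \int_0^t \sin x\,dx = \lim_{t \to \infty} \left(1 - \cos t\right) \]
doesn’t exist (i.e. the values oscillate between 0 and 2), $\int_0^{\infty} \sin x\,dx$ is divergent.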
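To see the three behaviors side by side, one can tabulate the partial integrals for growing upper limits. The sketch below is illustrative only: the helper name `partial_integrals` is our own, and it simply evaluates the closed forms derived in the solution. The first value settles near 1/2, the second keeps growing, and the third wanders between 0 and 2.

```python
import math

def partial_integrals(t):
    """Closed-form partial integrals from Example 1, evaluated at upper limit t."""
    return (0.5 - 1.0 / t,              # integral of 1/x^2 from 2 to t
            math.log(t) - math.log(2),  # integral of 1/x from 2 to t
            1.0 - math.cos(t))          # integral of sin x from 0 to t

for t in (10, 1_000, 100_000):
    a, b, c = partial_integrals(t)
    print(f"t={t:>7}: a={a:.5f}  b={b:.3f}  c={c:.3f}")
```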
Example 1 shows that while the curves $1/x$ and $1/x^2$ are very similar (i.e. both decrease to zero as $x$ goes to $\infty$),
the areas under these curves are infinitely different: $1/x$ encloses an infinite area for $x \ge 2$, while $1/x^2$ encloses a finite
area for $x \ge 2$. Figure 7.11 shows that $1/x$ decreases to zero much more slowly than $1/x^2$. This observation suggests the
following question: How fast does the function have to approach zero to ensure convergence? The following example
formulates a precise answer to this question for p-integrals.

Figure 7.11: Area under curves (left panel: area under $1/x$; right panel: area under $1/x^2$)
Example 2. p-integrals
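Determine for which $p > 0$ the integral
\[ \int_1^{\infty} \frac{dx}{x^p} \]
is convergent.
Solution. Example 1 dealt with the case $p = 1$: since $\int_2^{\infty} \frac{dx}{x}$ is divergent, so is $\int_1^{\infty} \frac{dx}{x}$. Assume that $p \ne 1$. In which
case
\[ \int_1^t \frac{dx}{x^p} = \frac{1}{1-p}\,x^{1-p}\Big|_1^t = \frac{1}{1-p}\left(t^{1-p} - 1\right) \]
When $p > 1$, $t^{1-p}$ has a negative exponent and
\[ \lim_{t \to \infty} \frac{1}{1-p}\left(t^{1-p} - 1\right) = \frac{1}{p-1} \]
Hence, $\int_1^{\infty} \frac{dx}{x^p}$ is convergent if $p > 1$.
When $p < 1$, $t^{1-p}$ has a positive exponent and
\[ \lim_{t \to \infty} \frac{1}{1-p}\left(t^{1-p} - 1\right) = \infty \]
Hence, $\int_1^{\infty} \frac{dx}{x^p}$ is divergent if $p \le 1$. □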
Example 2 illustrates that convergence depends subtly on the speed at which $f(x)$ approaches zero as $x$ approaches
$\infty$. For instance, while $\frac{1}{x^{1.0001}}$ seems to go to zero only slightly faster than $\frac{1}{x}$, the integral $\int_1^{\infty} \frac{dx}{x^{1.0001}}$ is convergent
(i.e. $p = 1.0001 > 1$) while the integral $\int_1^{\infty} \frac{dx}{x}$ is divergent. While this might appear shocking at first, notice that
the former integral converges to a very large value: $\frac{1}{1.0001 - 1} = 10{,}000$. More generally, as $p > 1$ approaches 1 from
above, the area under $\frac{1}{x^p}$ approaches $\infty$, as $\lim_{p \to 1^+} \frac{1}{p-1} = +\infty$.
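A short numerical sketch makes this slow convergence concrete. The function name `truncated_p_integral` is our own; it evaluates the antiderivative formula from Example 2 at a finite upper limit and compares it with the limiting value $1/(p-1)$. For $p$ close to 1, even a very large upper limit captures only a tiny part of the total area.

```python
def truncated_p_integral(p, t):
    """Integral of x**(-p) from 1 to t for p != 1, via the antiderivative in Example 2."""
    return (t ** (1.0 - p) - 1.0) / (1.0 - p)

for p in (2.0, 1.5, 1.0001):
    limit = 1.0 / (p - 1.0)  # value of the improper integral when p > 1
    print(p, truncated_p_integral(p, 1e6), limit)
```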
The p-integrals are related to the Pareto distribution, named after the Italian economist Vilfredo Pareto. Pareto
originally used this power distribution to describe the allocation of wealth among individuals. This power distribution
also has been used to describe social, scientific, geophysical, and many other types of observable phenomena. In the
next example, we examine this distribution and its use to describe frequency of individuals visiting websites.
Example 3. The Pareto Distribution
The PDF for the Pareto distribution is of the form
\[ f(x) = \begin{cases} 0 & \text{if } x < 1 \\ Cx^{-p} & \text{if } x \ge 1 \end{cases} \]
where $p > 1$ and $C$ is a constant that you will determine.

a. Determine for what value of C, f (x) is a PDF. Your answer will depend on p > 1.
b. Find the CDF for the Pareto distribution.
c. A scientist at Hewlett Packard’s Information Dynamics Lab used the Pareto distribution to describe how
many AOL users visited various web sites on one day in 1997. The data are shown in Figure 7.12 and
conform to a Pareto distribution with p = 2.07. Estimate the fraction of web sites that received visits
from 10 or fewer AOL users on the day in question.
(a) Original data (b) Binned on a logarithmic scale
Figure 7.12: Number of web sites visited by different numbers of AOL users.
Solution.
a. We need to compute the area under $f(x)$. Since $f(x) = 0$ for $x < 1$, the area under $f$ is given by
\[ \int_1^{\infty} \frac{C}{x^p}\,dx \]
Since $p > 1$, Example 2 implies that
\[ C \int_1^{\infty} \frac{1}{x^p}\,dx = \frac{C}{p-1} \]
Hence, in order for $f$ to be a PDF, we need that $C = p - 1$ and we get
\[ f(x) = \frac{p-1}{x^p} \]
for $x \ge 1$.
b. The CDF is given by
\[ F(x) = \int_{-\infty}^{x} f(t)\,dt \]
Since $f(x) = 0$ for $x < 1$, we get $F(x) = 0$ for $x < 1$ and, for $x \ge 1$,
\[ F(x) = \int_1^x \frac{p-1}{t^p}\,dt = -t^{1-p}\Big|_1^x = 1 - x^{1-p} \]
Thus the CDF is given by
\[ F(x) = \begin{cases} 1 - x^{1-p} & \text{for } x \ge 1 \\ 0 & \text{otherwise} \end{cases} \]
c. If $p = 2.07$, then the fraction of web sites visited by 10 or fewer AOL users can be approximated by
\[ F(10) = 1 - 10^{-1.07} \approx 0.9149 \]
Hence approximately 91.5% of the web sites were visited by 10 users or fewer.
□
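The Pareto CDF found in part (b) is easy to evaluate directly. The following is a minimal sketch (the name `pareto_cdf` is ours) that reproduces the estimate in part (c), under the example's assumption that the normalizing constant is $C = p - 1$:

```python
def pareto_cdf(x, p):
    """CDF of the density (p - 1) / x**p on x >= 1, assuming p > 1."""
    return 1.0 - x ** (1.0 - p) if x >= 1.0 else 0.0

# Fraction of web sites visited by 10 or fewer users when p = 2.07.
print(round(pareto_cdf(10, 2.07), 4))
```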
The previous example illustrates how you can go from PDFs to CDFs by integrating over the interval (−∞, x).
Conversely, suppose that you are given a CDF for a continuous random variable. How do you find the associated
PDF? The following theorem says all you have to do is differentiate. Hence, integrate to go from PDF to CDF,
differentiate to go from CDF to PDF!
Theorem 7.1. Fundamental Theorem of PDFs
Suppose that $f$ is a probability density function. Then the CDF
\[ F(x) = \int_{-\infty}^{x} f(s)\,ds \]
satisfies
\[ F'(x) = f(x) \]
Outline of Proof
To prove this theorem, we need to find
\[ F'(x) = \lim_{h \to 0} \frac{F(x+h) - F(x)}{h} \]
The limit laws and rules of integration imply
\[
\begin{aligned}
F(x+h) - F(x) &= \lim_{t \to -\infty} \int_t^{x+h} f(s)\,ds - \lim_{t \to -\infty} \int_t^{x} f(s)\,ds \\
&= \lim_{t \to -\infty} \left[ \int_t^{x+h} f(s)\,ds - \int_t^{x} f(s)\,ds \right] \\
&= \lim_{t \to -\infty} \int_x^{x+h} f(s)\,ds \\
&= \int_x^{x+h} f(s)\,ds
\end{aligned}
\]
If $f$ is continuous at $x$, then we obtain $\int_x^{x+h} f(s)\,ds \approx f(x)h$. In which case
\[ \frac{F(x+h) - F(x)}{h} \approx f(x) \]
where this approximation gets better and better as $h \to 0$ and in the limit $F'(x) = f(x)$. Of course, this argument
is only an outline of the proof and the real subtlety lies in making the statements “$\int_x^{x+h} f(s)\,ds \approx f(x)h$” and “this
approximation gets better and better as $h \to 0$” mathematically precise. You need to take a real analysis course to
learn how to address these subtleties.

Example 4. Exponential distribution revisited
Recall that in Example 9 in Section 7.1 we considered a model of decay of Lidocaine in the body. For this model,
we found that the fraction of molecules of this drug that have been eliminated by $t \ge 0$ hours is given by
\[ F(t) = 1 - e^{-0.43t} \quad \text{for } t \ge 0 \]
and $F(t) = 0$ for $t \le 0$.
a. Find the PDF for the random variable with CDF $F(t)$.
b. Use the PDF to find the probability that a randomly chosen molecule of this drug is eliminated in the
first two hours. Compare your answer to what was found in Example 9 from Section 7.1.
Solution.
a. The derivative of $F(t)$ for $t > 0$ is $F'(t) = 0.43e^{-0.43t}$. The derivative of $F(t)$ for $t < 0$ is $F'(t) = 0$.
Hence, the PDF is given by
\[ f(t) = \begin{cases} 0.43e^{-0.43t} & \text{if } t \ge 0 \\ 0 & \text{elsewhere} \end{cases} \]
b. Let X be the random variable whose CDF is given by $F(t)$. X corresponds to the time a randomly chosen
drug particle gets eliminated. Using the PDF, we obtain
\[
\begin{aligned}
P(0 \le X < 2) &= \int_0^2 f(t)\,dt \\
&= \int_0^2 0.43e^{-0.43t}\,dt \\
&= -e^{-0.43t}\Big|_0^2 \\
&= 1 - e^{-0.86} \approx 0.58
\end{aligned}
\]
Hence, there is a 58% chance that a randomly chosen drug particle gets eliminated in the first two hours.
This is the same answer we found in Example 9 in Section 7.1.
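The relationship "differentiate the CDF to get the PDF" can also be checked numerically. In the sketch below, the names `cdf`, `pdf`, and `central_difference` are our own; a central difference quotient of the CDF is compared with the PDF at a few times, and the probability from part (b) is recovered from the CDF.

```python
import math

RATE = 0.43  # elimination constant per hour, as in the example

def cdf(t):
    return 1.0 - math.exp(-RATE * t) if t >= 0 else 0.0

def pdf(t):
    return RATE * math.exp(-RATE * t) if t >= 0 else 0.0

def central_difference(g, t, h=1e-6):
    """Approximate g'(t) by (g(t + h) - g(t - h)) / (2h)."""
    return (g(t + h) - g(t - h)) / (2.0 * h)

for t in (0.5, 1.0, 2.0):
    print(t, central_difference(cdf, t), pdf(t))

print(round(cdf(2.0) - cdf(0.0), 2))  # P(0 <= X < 2)
```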
Convergence tests
As we have seen before, the integral of a function cannot always be expressed in terms of elementary functions
(e.g. $f(x) = e^{-x^2}$). One way to get around this issue is to numerically estimate $\int_a^b f(x)\,dx$. However, if $a = -\infty$
or $b = \infty$, then numerical estimates will only make sense if the integral converges. Consequently, it is important
to have methods that determine whether an improper integral is convergent or not. A powerful yet simple test for
convergence is the comparison test. The basic idea in this test is to compare the integral in question (the one for
which convergence is not understood) to an integral for which convergence is understood.
Theorem 7.2. Comparison Test
Suppose that $f(x) \ge g(x) \ge 0$ for $x \ge a$. Then

Convergence: If $\int_a^{\infty} f(x)\,dx$ is convergent, then $\int_a^{\infty} g(x)\,dx$ is convergent.
Divergence: If $\int_a^{\infty} g(x)\,dx$ is divergent, then $\int_a^{\infty} f(x)\,dx$ is divergent.

Figure 7.13: Comparing areas of $f(x) \ge g(x) \ge 0$
The idea behind this theorem is very simple (see Figure 7.13).
If the area under $f$ is finite and $f \ge g \ge 0$, then the area under $g$ is finite. Conversely, if the area under $g$ is
infinite, then the area under $f$ is infinite.
Example 5. Using comparison test
Use the comparison test to determine whether the following integrals are convergent or divergent:
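a. $\int_1^{\infty} \frac{2 + \sin x}{x^2}\,dx$
b. $\int_2^{\infty} \frac{dx}{x + \sqrt{x}}$
c. $\int_0^{\infty} e^{-x^2}\,dx$
Solution.
a. Since $1 \le 2 + \sin x \le 3$ for all $x$,
\[ 0 \le \frac{2 + \sin x}{x^2} \le \frac{3}{x^2} \]
for all $x > 0$. Moreover, since $\int_1^{\infty} \frac{3}{x^2}\,dx = 3\int_1^{\infty} \frac{dx}{x^2}$ is convergent (i.e. a p-integral with $p > 1$), the
comparison test implies that $\int_1^{\infty} \frac{2 + \sin x}{x^2}\,dx$ is convergent.
b. Since $x \ge \sqrt{x}$ for all $x \ge 1$, we have $x + \sqrt{x} \le 2x$ for $x \ge 1$. Hence,
\[ \frac{1}{x + \sqrt{x}} \ge \frac{1}{2x} \]
for $x \ge 1$. Since $\frac{1}{2}\int_2^{\infty} \frac{dx}{x}$ is divergent (i.e. a p-integral with $p = 1$), the comparison test implies that
$\int_2^{\infty} \frac{dx}{x + \sqrt{x}}$ is divergent.
c. Since $x^2 \ge x$ for all $x \ge 1$, $e^{-x^2} \le e^{-x}$ for all $x \ge 1$. Since $\int_1^{\infty} e^{-x}\,dx = \frac{1}{e}$ is convergent, we can
conclude that $\int_1^{\infty} e^{-x^2}\,dx$ is convergent. Moreover, as $e^{-x^2} \le 1$, we have that $\int_0^1 e^{-x^2}\,dx$ is finite. Hence,
$\int_0^{\infty} e^{-x^2}\,dx = \int_0^1 e^{-x^2}\,dx + \int_1^{\infty} e^{-x^2}\,dx$ is convergent.
□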
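The comparison in part (a) can be illustrated numerically. The sketch below is a rough check rather than a proof: `midpoint_integral` is our own helper implementing the composite midpoint rule, and the step count is an arbitrary choice. The partial integrals of $(2 + \sin x)/x^2$ stay below the partial integrals of $3/x^2$, which equal $3(1 - 1/t)$ and never exceed 3.

```python
import math

def midpoint_integral(g, a, b, n=100_000):
    """Composite midpoint rule for the integral of g over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    return (2.0 + math.sin(x)) / x**2

for t in (10.0, 100.0, 1000.0):
    partial = midpoint_integral(f, 1.0, t)
    bound = 3.0 * (1.0 - 1.0 / t)  # integral of 3/x^2 from 1 to t
    print(t, round(partial, 4), round(bound, 4))
```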
Improper integrals can lead to maddening paradoxes as the following example illustrates.
Example 6. Torricelli’s trumpet (or Gabriel’s horn)

Figure 7.14: $y = \frac{1}{x}$ for $x \ge 1$ rotated about the x-axis
Consider the surface created by rotating the curve $y = \frac{1}{x}$ about the x-axis as illustrated in Figure 7.14.
This surface is sometimes called Torricelli’s trumpet and it is named after the Italian mathematician Torricelli.
It can be shown that the volume of this infinite trumpet is given by the expression
\[ \int_1^{\infty} \frac{\pi}{x^2}\,dx \]
and the surface area is given by the expression
\[ \int_1^{\infty} \frac{2\pi}{x}\sqrt{1 + \frac{1}{x^4}}\,dx \]
a. Determine whether the surface area and volume are convergent or divergent.
b. Discuss how much paint it would take to paint the surface versus how much paint it would take to fill
the trumpet.
Solution.
a. Since the volume is determined by a p-integral with $p = 2 > 1$, we can say the volume is finite. In fact,
\[ \int_1^{\infty} \frac{\pi}{x^2}\,dx = \lim_{t \to \infty} \pi\left(1 - \frac{1}{t}\right) = \pi \]
Computing the surface area directly is hopeless! Hence, the comparison test comes to the rescue. Since
$\sqrt{1 + \frac{1}{x^4}} \ge 1$ for $x > 0$, we obtain $\frac{2\pi}{x} \le \frac{2\pi}{x}\sqrt{1 + \frac{1}{x^4}}$ for $x > 0$. Since $\int_1^{\infty} \frac{2\pi}{x}\,dx$ is a divergent p-integral with $p = 1$,
the comparison test implies that $\int_1^{\infty} \frac{2\pi}{x}\sqrt{1 + \frac{1}{x^4}}\,dx$ is divergent. In particular, the surface area is infinite!
b. Since the surface area is infinite, it would take an infinite amount of paint to paint the surface. On the
other hand, if we plugged up the hole at the end of the trumpet, then we could fill the trumpet with a finite
amount of paint. This paint after being poured out would cover the interior surface of the trumpet. How
can this be? Regarding this paradox, Thomas Hobbes was quoted in Rose’s Mathematical Maxims and
Minims (Raleigh N C 1988) to have said

“To understand this for sense it is not required that a man should be a geometrician or a
logician, but that he should be mad.”
You are challenged with resolving this paradox in the problem set.
□
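The contrast between the two integrals can be made tangible by truncating the trumpet at $x = t$. In this sketch (function names are ours), the volume uses the closed form $\pi(1 - 1/t)$, and the surface area is represented only by the lower bound $\int_1^t \frac{2\pi}{x}\,dx = 2\pi \ln t$ used in the comparison argument, not by the exact area.

```python
import math

def trumpet_volume(t):
    """Volume of the trumpet between x = 1 and x = t: pi * (1 - 1/t)."""
    return math.pi * (1.0 - 1.0 / t)

def surface_lower_bound(t):
    """Integral of 2*pi/x from 1 to t, a lower bound for the surface area up to x = t."""
    return 2.0 * math.pi * math.log(t)

for t in (10.0, 1e3, 1e6, 1e9):
    print(t, round(trumpet_volume(t), 6), round(surface_lower_bound(t), 2))
```

The volume column approaches $\pi$ while the lower bound on the surface area keeps growing, although only logarithmically.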
Two sided improper integrals

We conclude this section by defining
\[ \int_{-\infty}^{\infty} f(x)\,dx \]
A first attempt at this definition might be
\[ \lim_{t \to \infty} \int_{-t}^{t} f(x)\,dx. \]
Unfortunately this definition is flawed as the following example illustrates.
Example 7. When definitions go wrong

Compute the integral $\int_{-\infty}^{\infty} 2x\,dx$ using the definition
\[ \int_{-\infty}^{\infty} 2x\,dx = \lim_{t \to \infty} \int_{k-t}^{k+t} 2x\,dx \]
for any value of $k$ and discuss any anomalies that arise.
Solution.
\[ \int_{k-t}^{k+t} 2x\,dx = x^2\Big|_{k-t}^{k+t} = (k+t)^2 - (k-t)^2 = 4kt \]
Hence if we believe that $\int_{-\infty}^{\infty} 2x\,dx$ is well defined, we must conclude that $\lim_{t \to \infty} \int_{k-t}^{k+t} 2x\,dx$ equals 0 (when $k = 0$),
$\infty$ (when $k > 0$) and $-\infty$ (when $k < 0$) all at the same time! Since this is clearly impossible, we have to conclude that
the way we have defined our doubly infinite integral must be flawed. □
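The dependence on $k$ is easy to see by evaluating the shifted symmetric integrals for a few values. This is a small illustrative sketch; the function name is our own and it just applies the antiderivative $x^2$ used in the solution.

```python
def shifted_symmetric_integral(k, t):
    """Integral of 2x from k - t to k + t, using the antiderivative x**2."""
    return (k + t) ** 2 - (k - t) ** 2

for k in (-1, 0, 1):
    print(k, [shifted_symmetric_integral(k, t) for t in (10, 100, 1000)])
```

For $k = -1$ the values head to $-\infty$, for $k = 0$ they stay at 0, and for $k = 1$ they head to $+\infty$, matching the formula $4kt$.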
To skirt around the problems from Example 7, we make the following definitions:
Doubly infinite integrals

$\int_{-\infty}^{\infty} f(x)\,dx$ is convergent if the limits
\[ \lim_{t \to -\infty} \int_t^0 f(x)\,dx \qquad \text{and} \qquad \lim_{t \to \infty} \int_0^t f(x)\,dx \]
exist; otherwise $\int_{-\infty}^{\infty} f(x)\,dx$ is divergent. If convergent, we define
\[ \int_{-\infty}^{\infty} f(x)\,dx = \lim_{t \to -\infty} \int_t^0 f(x)\,dx + \lim_{t \to \infty} \int_0^t f(x)\,dx. \]
In the problem set, you will be asked to show that for convergent integrals
\[ \int_{-\infty}^{\infty} f(x)\,dx = \lim_{t \to -\infty} \int_t^a f(x)\,dx + \lim_{t \to \infty} \int_a^t f(x)\,dx \]
for any $a$. Hence, for convergent integrals, we cannot make the infinite from nothing.
Example 8. Convergence of doubly improper integrals

Determine whether the following integrals are convergent or divergent.

a. $\int_{-\infty}^{\infty} 2x\,dx$
b. $\int_{-\infty}^{\infty} x e^{-x^2}\,dx$
The signed area for each of these curves is shown in Figure 7.15.
Figure 7.15: Signed area for Example 8 (left panel: signed area under $2x$; right panel: signed area under $xe^{-x^2}$)
Solution.
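a. Since $\int_0^t 2x\,dx = x^2\Big|_0^t = t^2$, $\int_0^{\infty} 2x\,dx = \lim_{t \to \infty} t^2 = \infty$ is divergent. Hence, $\int_{-\infty}^{\infty} 2x\,dx$ is divergent!
b. To compute $\int x e^{-x^2}\,dx$, we introduce the substitution $u = x^2$, $du = 2x\,dx$. Then
\[ \int x e^{-x^2}\,dx = \int e^{-u}\,\frac{du}{2} = -\frac{e^{-u}}{2} + C \]
Therefore,
\[
\begin{aligned}
\int_0^{\infty} x e^{-x^2}\,dx &= \lim_{t \to \infty} \int_0^t x e^{-x^2}\,dx \\
&= \lim_{t \to \infty} \left. -\frac{e^{-x^2}}{2} \right|_0^t \\
&= \lim_{t \to \infty} \left( \frac{1}{2} - \frac{e^{-t^2}}{2} \right) = \frac{1}{2}
\end{aligned}
\]
and
\[
\begin{aligned}
\int_{-\infty}^{0} x e^{-x^2}\,dx &= \lim_{t \to -\infty} \int_t^0 x e^{-x^2}\,dx \\
&= \lim_{t \to -\infty} \left. -\frac{e^{-x^2}}{2} \right|_t^0 \\
&= \lim_{t \to -\infty} \left( \frac{e^{-t^2}}{2} - \frac{1}{2} \right) = -\frac{1}{2}.
\end{aligned}
\]
Hence,
\[ \int_{-\infty}^{\infty} x e^{-x^2}\,dx = \int_{-\infty}^{0} x e^{-x^2}\,dx + \int_0^{\infty} x e^{-x^2}\,dx = -\frac{1}{2} + \frac{1}{2} = 0 \]
Amazingly, repeating these calculations for the more general case leads to the conclusion that
\[ \int_{-\infty}^{a} x e^{-x^2}\,dx + \int_a^{\infty} x e^{-x^2}\,dx = 0 \]
for any $a$!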

Many PDFs have bi-infinite tails (i.e. f (x) > 0 for all x ∈ (−∞, ∞)). One such example is the Laplace
distribution.
Example 9. The Laplace Distribution
An important distribution discovered by the French mathematician and astronomer Pierre-Simon Laplace (1749-1827)
is the double exponential or Laplace distribution whose probability density function is given by $f(x) = ae^{-b|x|}$
where $b > 0$ is a parameter and $a > 0$ is a constant that you will determine. As the Laplace distribution describes
the random motion of a particle in a liquid with a constant settling rate, it has been used to describe dispersal of
marine larvae along a coastline. Let X denote the distance (say in kilometers) a larva has traveled northward from
its birth place. If X is negative, then the larva has traveled south.
a. Determine what $a$ needs to be to ensure that $f$ is a probability density function.
b. Suppose for one marine species $b = 1$. Determine the probability that a randomly chosen larva from this
population travels more than 1 km north from its birth place.
c. Suppose for another marine species $b = 2$. Determine the probability that a randomly chosen larva from
this population travels more than 1 km north from its birth place.
d. In light of your answers to (b) and (c) provide an interpretation of the parameter $b$.
Solution.
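a. We need that $\int_{-\infty}^{\infty} ae^{-b|x|}\,dx = 1$. Computing the first half of this improper integral leads to
\[
\begin{aligned}
\int_{-\infty}^{0} ae^{bx}\,dx &= \lim_{t \to -\infty} \int_t^0 ae^{bx}\,dx && \text{using the fact that } |x| = -x \text{ for } x \le 0 \\
&= \lim_{t \to -\infty} \left. \frac{a}{b}e^{bx} \right|_t^0 && \text{using the substitution } u = bx \\
&= \lim_{t \to -\infty} \frac{a}{b}\left(1 - e^{bt}\right) = \frac{a}{b}
\end{aligned}
\]
Computing the second half of this improper integral leads to
\[
\begin{aligned}
\int_0^{\infty} ae^{-bx}\,dx &= \lim_{t \to \infty} \int_0^t ae^{-bx}\,dx && \text{using the fact that } |x| = x \text{ for } x \ge 0 \\
&= \lim_{t \to \infty} \left. -\frac{a}{b}e^{-bx} \right|_0^t && \text{using the substitution } u = bx \\
&= \lim_{t \to \infty} \frac{a}{b}\left(1 - e^{-bt}\right) = \frac{a}{b}
\end{aligned}
\]
Hence, $\int_{-\infty}^{\infty} ae^{-b|x|}\,dx = \frac{2a}{b}$. Since we need that $\frac{2a}{b} = 1$, we obtain $a = \frac{b}{2}$.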
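b. If $b = 1$ (the units of $b$ are km$^{-1}$), then the fraction of larvae that travel at least 1 km north is given by
\[
\begin{aligned}
\int_1^{\infty} \frac{1}{2}e^{-x}\,dx &= \lim_{t \to \infty} \int_1^t \frac{1}{2}e^{-x}\,dx \\
&= \lim_{t \to \infty} \left. -\frac{1}{2}e^{-x} \right|_1^t \\
&= \lim_{t \to \infty} \left( -\frac{1}{2}e^{-t} + \frac{1}{2}e^{-1} \right) \\
&= \frac{1}{2}e^{-1} \approx 0.1839
\end{aligned}
\]
Hence, there is approximately an 18% chance that a randomly chosen larva travels at least 1 km north.
The area corresponding to this integral is illustrated in the figure below:

[Figure: graph of the density $\frac{1}{2}e^{-|x|}$ for $-5 \le x \le 5$, illustrating the area to the right of $x = 1$]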
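c. If $b = 2$, then the fraction of larvae that travel at least 1 km north is given by
\[
\begin{aligned}
\int_1^{\infty} e^{-2x}\,dx &= \lim_{t \to \infty} \int_1^t e^{-2x}\,dx \\
&= \lim_{t \to \infty} \left. -\frac{1}{2}e^{-2x} \right|_1^t \\
&= \lim_{t \to \infty} \left( -\frac{1}{2}e^{-2t} + \frac{1}{2}e^{-2} \right) \\
&= \frac{1}{2}e^{-2} \approx 0.0677
\end{aligned}
\]
Hence, there is approximately a 7% chance that a randomly chosen larva travels at least 1 km north.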
d. The larger $b$ is, the more likely it is that a randomly chosen larva travels a shorter distance before settling. In fact,
redoing (b) and (c) with an arbitrary $b$, we find that the chance of a randomly chosen larva moving at
least 1 km north is $\frac{1}{2}e^{-b}$.
□
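The general formula in part (d) can be tabulated for several values of $b$. The sketch below is a minimal illustration (the name `laplace_tail_north` and its optional distance argument `d` are ours); it assumes the normalized density $\frac{b}{2}e^{-b|x|}$ found in part (a), for which $P(X > d) = \frac{1}{2}e^{-bd}$ when $d \ge 0$.

```python
import math

def laplace_tail_north(b, d=1.0):
    """P(X > d) for the density (b/2) * exp(-b * |x|), assuming d >= 0."""
    return 0.5 * math.exp(-b * d)

for b in (0.5, 1.0, 2.0, 4.0):
    print(b, round(laplace_tail_north(b), 4))
```

Larger values of $b$ give smaller tail probabilities, consistent with the interpretation that larvae with larger $b$ tend to settle closer to their birth place.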

Problem Set 7.2

Determine whether the integrals in Problems 1 to 10 are convergent or divergent. If convergent, determine their
value.
1. $\int_4^{\infty} \frac{dx}{x^2}$
2. $\int_{-\infty}^{1} \frac{dx}{x^4}$
3. $\int_{-\infty}^{0} \frac{dx}{1-x}$
4. $\int_0^{\infty} e^{-2x}\,dx$
5. $\int_0^{\infty} e^{x}\,dx$
6. $\int_{-\infty}^{0} e^{x}\,dx$
7. $\int_0^{\infty} x^2 e^{-x}\,dx$
8. $\int_{-\infty}^{0} x^2 e^{-x}\,dx$
9. $\int_{-\infty}^{\infty} x^2 e^{-x}\,dx$
10. $\int_{-\infty}^{\infty} \frac{e^x}{(1+e^x)^2}\,dx$
For Problems 11 to 14, use the comparison test to determine whether the integrals are convergent or divergent.
11. $\int_1^{\infty} \frac{dx}{1+e^x}$
12. $\int_2^{\infty} \frac{dx}{\sqrt{x^2-2}}$
13. $\int_1^{\infty} \frac{\cos^2 x}{1+x^2}\,dx$
14. $\int_1^{\infty} \frac{dx}{x^{1.01}+2}$
For Problems 15 to 18, find the CDF of the given PDF.

15. $f(x) = \frac{e^x}{(1+e^x)^2}$
16. $f(x) = \frac{1}{2}e^{-|x|}$
17. $f(x) = \frac{1}{x^2}$ for $x \ge 1$ and $f(x) = 0$ otherwise.
18. $f(x) = \frac{1}{(1+x)^2}$ for $x \ge 0$ and $f(x) = 0$ otherwise.
For Problems 19 to 22, find the PDF of the given CDF.

19. $F(x) = \frac{1}{1+e^{-x}}$
20. $F(x) = 0$ for $x \le 1$ and $F(x) = 1 - \frac{1}{x}$ for $x \ge 1$.
21. $F(x) = e^x$ for $x \le 0$ and $F(x) = 1$ for $x \ge 0$.
22. $F(x) = e^{-e^{-x}}$
23. Estimate the numerical value of $\int_0^{\infty} e^{-x^2}\,dx$ by writing it as the sum of $\int_0^4 e^{-x^2}\,dx$ and $\int_4^{\infty} e^{-x^2}\,dx$. Approximate
the first integral using Simpson’s rule with $n = 8$. Show that the second integral is smaller than 0.0000001.
Hint: Compare to $\int_4^{\infty} e^{-4x}\,dx$.

24. Determine how large $a$ needs to be to ensure that
\[ \int_a^{\infty} \frac{dx}{1 + x^3} < 0.01 \]
Hint: Compare to $\int_a^{\infty} \frac{dx}{x^3}$.
25. If $\int_{-\infty}^{\infty} f(x)\,dx$ is convergent, show that
\[ \int_{-\infty}^{0} f(x)\,dx + \int_0^{\infty} f(x)\,dx = \int_{-\infty}^{a} f(x)\,dx + \int_a^{\infty} f(x)\,dx \]
for all $a$.
26. Consider a marine species whose larvae disperse northward according to the Laplace distribution $f(x) = e^{-2|x|}$.
a. Determine the fraction of individuals that travel north at least 2 kilometers.
b. Determine the fraction of individuals that travel south at least 2 kilometers.
c. Determine the fraction of individuals that travel at most 2 kilometers north. Note: this includes all
individuals that travel south.
27. Consider a marine species whose larvae disperse northward according to the Laplace distribution $f(x) = \frac{1}{4}e^{-|x|/2}$.
a. Determine the fraction of individuals that travel north at least 2 kilometers.
b. Determine the fraction of individuals that travel south at least 2 kilometers.
c. Determine the fraction of individuals that travel at most 2 kilometers north. Note: this includes all
individuals that travel south.
28. Journal Problem College Mathematics Journal ∗ Peter Lindstrom of North Lake College in Irving, Texas,
had a student who handled an ∞/∞ form as follows:
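\[
\begin{aligned}
\int_1^{+\infty} (x-1)e^{-x}\,dx &= \int_1^{+\infty} \frac{x-1}{e^x}\,dx \\
&= \int_1^{+\infty} \frac{1}{e^x}\,dx && \text{l'Hôpital's rule} \\
&= \frac{1}{e}
\end{aligned}
\]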
What is wrong, if anything, with this student’s solution?
29. Historical Quest Evangelista Torricelli was a student of Galileo. As a young man he studied in Galileo’s
home at Arcetri near Florence. Upon Galileo’s death, Torricelli succeeded his teacher as mathematician and
philosopher for their good friend and patron, the Grand Duke of Tuscany.
Evangelista Torricelli (1608-1647)

∗ Vol. 24, No. 4, September 1993, p. 343.

Torricelli’s own words fully describe his amazement at discovering an infinitely long solid with a surface that
calculates to have an infinite area, but a finite volume. “It may seem incredible that although this solid has
an infinite length, nevertheless none of the cylindrical surfaces we considered has an infinite length but all of
them are finite.” ∗
In Example 6, we introduced Torricelli’s trumpet, where we quoted Thomas Hobbes: “To understand this for
sense it is not required that a man should be a geometrician or a logician, but that he should be mad.” Without
resorting to the possibility of admitting insanity, write an argument that resolves this paradox.
30. Historical Quest Newton and Leibniz have been credited with the discovery of calculus, but much of its
development was due to the mathematicians Pierre-Simon Laplace, Lagrange, and Gauss.
Pierre-Simon Laplace (1749-1827)
These three great mathematicians of calculus were contrasted by W.W. Rouse Ball:∗
The great masters of modern analysis are Lagrange, Laplace, and Gauss, who were contemporaries.
It is interesting to note the marked contrast in their styles. Lagrange is perfect both in form and
matter, he is careful to explain his procedure, and though his arguments are general they are easy
to follow. Laplace, on the other hand, explains nothing, is indifferent to style, and, if satisfied that
his results are correct, is content to leave them either with no proof or with a faulty one. Gauss
is exact and as elegant as Lagrange, but even more difficult to follow than Laplace, for he removes
every trace of the analysis by which he reached his results, and strives to give a proof which while
rigorous will be as concise and synthetical as possible.
Pierre-Simon Laplace taught Napoleon Bonaparte, was appointed for a time as Minister of Interior, and was
at times granted favors from his powerful friend. Today, Laplace is best known as a major contributor to
probability, taking it from gambling to a true branch of mathematics. He was one of the earliest to evaluate
the improper integral
$$I = \int_{-\infty}^{+\infty} e^{-x^2}\,dx$$
which plays an important role in the theory of probability.

Evaluate this improper integral.
∗ http://curvebank.calstatela.edu/torricelli/torricelli.htm
∗A Short Account of the History of Mathematics, as quoted in Mathematical Circles Adieu, by Howard Eves (Boston: Prindle, Weber
& Schmidt, Inc., 1977).

7.3 Mean and Variance

As we have seen in Section 7.1, the histogram for a large data set sometimes can be well approximated by the
graph of a continuous function, the probability density function (PDF). When this occurs, a scientist can describe
concisely his or her data set to another scientist by describing the PDF. Many important PDFs lie in families of
functions whose parameters provide some basic information about the shape of the PDF. These parameters are often
related to the mean and variance of the PDF. The mean is a measurement of the centrality of a data set. In fact,
the mean is the value at which the PDF or histogram balances. In contrast, the variance describes the spread of
the data set around the mean. The greater the variance, the greater the spread in the data.
Means
The inspiration for the mean or average of a data set is wonderfully captured in the following quote of the French
mathematician Blaise Pascal:
“The excitement that a gambler feels when making a bet is equal to the amount he might win times the
probability of winning it.”
Mean for Data
Consider a data set that takes on the values $x_1, x_2, \ldots, x_k$. Let $p_i$ be the fraction of
data taking on the value $x_i$ for $i = 1, 2, \ldots, k$. The mean or average of the data set equals
$$\mu = p_1 x_1 + p_2 x_2 + \cdots + p_k x_k = \sum_{i=1}^{k} p_i x_i$$
From a gambling perspective, if x1 , . . . , xk are the amounts you can win and p1 , . . . , pk are the likelihoods of
winning these amounts, then the mean is what you expect to win. Each term in the sum corresponds to the “amount
you might win times the probability of winning it.” The operation of taking a probability-weighted sum of values is
referred to as calculating the mathematical expectation.
Example 1. Computing the mean
The Condor (May 1995) published a study of competition for nest holes among collared flycatchers, a species
of bird. The authors collected the data by periodically inspecting nest boxes located on the island of Gotland in
Sweden. The accompanying data gives the number of flycatchers breeding at 14 distinct plots.
5 4 3 2 2 1 1 1 1 0 0 0 0 0
Find the mean of this data set.
Solution. The data values are 0, 1, 2, 3, 4, and 5. The fraction of zeros is 5/14. The fractions of ones and twos
are 4/14 and 2/14, respectively. The fractions of 3s, 4s, and 5s are each 1/14. Hence, by the definition of the mean,
$$\mu = \frac{1}{14}\cdot 5 + \frac{1}{14}\cdot 4 + \frac{1}{14}\cdot 3 + \frac{2}{14}\cdot 2 + \frac{4}{14}\cdot 1 + \frac{5}{14}\cdot 0 = \frac{20}{14} \approx 1.42857$$
Hence, on average, there are approximately 1.4 flycatchers breeding in a randomly chosen plot. □
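The probability-weighted sum in the definition of the mean can be checked with a short computation. The following Python sketch tallies the fraction $p_i$ of the data taking each value $x_i$ and sums $p_i x_i$ using exact fractions. The function name data_mean is our own choice for illustration and does not come from the text.

```python
from collections import Counter
from fractions import Fraction

def data_mean(values):
    """Return the mean as the probability-weighted sum of the distinct values."""
    n = len(values)
    counts = Counter(values)  # value -> number of occurrences
    # p_i = count / n is the fraction of the data taking the value x_i
    return sum(Fraction(count, n) * x for x, count in counts.items())

# Flycatcher counts at the 14 plots from Example 1
flycatchers = [5, 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0]
mu = data_mean(flycatchers)
print(mu, float(mu))
```

Working with exact fractions makes it easy to see that the weighted sum 20/14 reduces to 10/7 before it is rounded to a decimal.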
Now suppose that our data set has to all appearance a continuous histogram described by the PDF f (x). To find
the mean of this data set, divide the real line into intervals of length ∆x with end points
. . . , x−2 = −2∆x, x−1 = −∆x, x0 = 0, x1 = ∆x, x2 = 2∆x, . . .

Since the fraction of data values between x and x + ∆x is approximately f(x)∆x, the sum of the values weighted by
their fractions is
$$\cdots + x_{-2} f(x_{-2})\Delta x + x_{-1} f(x_{-1})\Delta x + x_0 f(x_0)\Delta x + x_1 f(x_1)\Delta x + x_2 f(x_2)\Delta x + \cdots = \sum_{k=-\infty}^{\infty} x_k f(x_k)\,\Delta x$$
Taking the limit as ∆x goes to zero yields, by definition, the integral $\int_{-\infty}^{\infty} x f(x)\,dx$.
Mean for a PDF
For a continuous random variable X with PDF f(x), the mean of X is given by
$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx$$
provided that the improper integral is convergent.
Example 2. Throwing darts
Sebastian is a terrible dart player. In his honor, the local pub has created a large dart board with a radius of
2 feet. With this dartboard, Sebastian always hits the board but his dart is equally likely to hit any point on the
board. Let X be the distance from the center that the dart lands. In exercise 25 of this Section, you are asked to
show that the PDF for X is given by
$$f(x) = \begin{cases} x/2 & \text{if } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
a. Find the mean distance from the center at which Sebastian’s darts land.
b. Find the probability that a dart lands less than the mean distance from the center.
Solution.
a. To find the mean, we compute
$$\begin{aligned}
\int_{-\infty}^{\infty} x f(x)\,dx &= \int_0^2 \frac{x^2}{2}\,dx \\
&= \left.\frac{x^3}{6}\right|_0^2 \\
&= \frac{4}{3}
\end{aligned}$$
So on average, a dart thrown by Sebastian lands 4/3 feet from the center.
b. To find $P(X \le \frac{4}{3})$, we compute
$$\begin{aligned}
\int_{-\infty}^{4/3} f(x)\,dx &= \int_0^{4/3} \frac{x}{2}\,dx \\
&= \left.\frac{x^2}{4}\right|_0^{4/3} \\
&= \frac{4}{9}
\end{aligned}$$
Hence, there is less than a 50% chance that Sebastian’s dart will land within 4/3 feet of the center, even
though 4/3 is the average distance of all shots from the center of the dart board.
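Both integrals in this example can be checked numerically. The sketch below approximates them with the midpoint rule, a technique we bring in for illustration (the text itself integrates exactly). The helper names midpoint_integral and dart_pdf are hypothetical.

```python
def midpoint_integral(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def dart_pdf(x):
    """PDF of the dart's distance from the center (board radius 2 feet)."""
    return x / 2 if 0 <= x <= 2 else 0.0

# Mean distance: integral of x * f(x) over the support [0, 2]
mean_distance = midpoint_integral(lambda x: x * dart_pdf(x), 0, 2)
# Probability of landing closer to the center than the mean distance 4/3
prob_within_mean = midpoint_integral(dart_pdf, 0, 4 / 3)
print(mean_distance, prob_within_mean)
```

The two printed values should be close to 4/3 and 4/9, matching the exact calculation.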

Example 3. Exponential means
Consider a drug with elimination constant c. Then the time t (in hours) at which a drug molecule is eliminated
from the body has an exponential distribution with parameter c. As illustrated in Example 9 of Section 7.1, the
PDF for this distribution is given by
$$f(t) = \begin{cases} 0 & \text{if } t < 0 \\ c e^{-ct} & \text{if } t \ge 0 \end{cases}$$
a. Find the mean of the exponential distribution. What is its interpretation in the context of drug decay?
b. For the typical patient, Lidocaine has an elimination constant of 0.43 per hour. What is the mean time
for a molecule to leave? What is the half life in the body of a Lidocaine molecule?
Solution.
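a. The mean of the exponential distribution is given by
$$\begin{aligned}
\int_{-\infty}^{\infty} t f(t)\,dt &= \int_0^{\infty} t c e^{-ct}\,dt \\
&= \lim_{s\to\infty} \int_0^{s} t c e^{-ct}\,dt \\
&= \lim_{s\to\infty} \left( \left. -t e^{-ct} \right|_0^{s} + \int_0^{s} e^{-ct}\,dt \right) \qquad \text{using integration by parts with } u = t \text{ and } dv = c e^{-ct}\,dt \\
&= \lim_{s\to\infty} \left. \left( -t e^{-ct} - \frac{1}{c} e^{-ct} \right) \right|_0^{s} \\
&= \lim_{s\to\infty} \left( -s e^{-cs} - \frac{1}{c} e^{-cs} + \frac{1}{c} \right) \\
&= \frac{1}{c} \text{ hours}
\end{aligned}$$
Since c has units “per hour”, 1/c has units “hours” and corresponds to the mean number of hours it takes for
a drug particle to be cleared from the body.
b. The mean elimination time for Lidocaine is 1/0.43 ≈ 2.33 hours. On the other hand, the half life is given by
the solution to
$$\begin{aligned}
\frac{1}{2} &= e^{-0.43t} \\
\ln\frac{1}{2} &= -0.43t \\
\frac{1}{0.43}\ln 2 &= t \\
1.61 &\approx t
\end{aligned}$$
Hence, the half life is approximately 1.61 hours, and half of the particles are eliminated before the mean time to elimination.
□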
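The integration by parts above can be sanity-checked numerically. The sketch below compares a midpoint-rule approximation of the integral of $t c e^{-ct}$ over [0, s] with the closed form $-s e^{-cs} - \frac{1}{c} e^{-cs} + \frac{1}{c}$ and shows both approaching 1/c as s grows. The function names are our own, and the value c = 0.43 per hour is the Lidocaine constant quoted in the example.

```python
import math

def partial_mean_closed_form(c, s):
    """Integral of t*c*exp(-c*t) over [0, s], as obtained by integration by parts."""
    return -s * math.exp(-c * s) - math.exp(-c * s) / c + 1.0 / c

def partial_mean_numeric(c, s, n=100_000):
    """Midpoint-rule approximation of the same integral."""
    h = s / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t * c * math.exp(-c * t)
    return h * total

c = 0.43  # Lidocaine elimination constant per hour (Example 3)
for s in (5, 20, 80):
    print(s, partial_mean_numeric(c, s), partial_mean_closed_form(c, s))
print("limit 1/c:", 1 / c, "half life ln(2)/c:", math.log(2) / c)
```

As s increases, the partial integrals settle near 1/c, while the half life ln(2)/c stays below it, which is the comparison made in part (b).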
As discussed in the problem set for Section 7.1, the exponential distribution can be used to model extinction
times for species.
Example 4. Extinction rates

In their article, Extinction rates of North American Freshwater Fauna∗ , Ricciardi and Rasmussen have shown
that time to extinction of a species is exponentially distributed with 0.1% of terrestrial and marine animals going
extinct per decade.
a. What is the “elimination constant” c for this data set? What is the mean extinction time?
b. What fraction of species will have gone extinct after 100 years?
c. How long do we expect it to take for half the species to go extinct?
d. Ricciardi and Rasmussen estimated future extinction rates by assuming all currently imperiled species
(i.e. endangered or threatened) will not survive this century. Under this assumption, 0.8% of species
would be going extinct per decade. Determine how this alters the answers to (b) and (c).
Solution.
a. If species extinctions are exponentially distributed and time t is measured in years, then Ricciardi and
Rasmussen’s data tell us that
$$0.1\% = 0.001 = 1 - e^{-10c}$$
Solving for c yields
$$c = 0.00010005$$
The mean time to extinction for a species is 1/c ≈ 9,995 years.
b. The fraction of species that would have gone extinct after a century (i.e., t = 100) is given by
$$1 - e^{-0.00010005 \cdot 100} = 0.00995512$$
In other words, approximately 1%.
c. To determine the half life of the extinction process, we need to solve
$$0.5 = 1 - e^{-0.00010005\,t}$$
for t, which yields 6928.01 years.
d. Solving
$$0.008 = 1 - e^{-10c}$$
for c yields c = 0.000803217. The mean time to extinction shrinks by a factor of approximately 8 to 1,245
years. The fraction of species going extinct in the next century would be
$$1 - e^{-0.000803217 \cdot 100} = 0.0771806 \approx 8\%$$
Solving
$$0.5 = 1 - e^{-0.000803217\,t}$$
for t yields a half life of 863 years, which is the expected time it will take for half the currently extant
species to go extinct.
□
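The arithmetic in this example follows one pattern: solve $1 - e^{-10c}$ = (fraction lost per decade) for c, then evaluate the exponential CDF. A minimal Python sketch of that pattern follows; the function names are hypothetical and the two loss fractions are the ones quoted in the example.

```python
import math

def rate_from_decadal_loss(fraction_per_decade):
    """Solve fraction = 1 - exp(-10*c) for the per-year constant c."""
    return -math.log(1.0 - fraction_per_decade) / 10.0

def fraction_extinct(c, years):
    """Exponential CDF: fraction of species extinct after the given number of years."""
    return 1.0 - math.exp(-c * years)

def half_life(c):
    """Time at which half of the species are expected to be extinct."""
    return math.log(2) / c

for loss in (0.001, 0.008):  # 0.1% and 0.8% per decade
    c = rate_from_decadal_loss(loss)
    print(loss, c, 1 / c, fraction_extinct(c, 100), half_life(c))
```

Each printed row lists the constant c, the mean time to extinction 1/c, the fraction lost in a century, and the half life, for comparison with parts (a) through (d).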
Examples 2 and 3 illustrate that the fraction of data to the left of the mean can differ significantly from
50%. This raises two questions. First, what is the geometric interpretation of the mean? To answer this question,
imagine that we take an (infinitely) long board and cut out the area lying under the PDF. If we placed this
wooden PDF as shown in Figure 7.16 on a fulcrum at the mean, then the PDF would balance perfectly.
∗ Conservation Biology, 13 (1999), 1220-2

Figure 7.16: PDF with fulcrum at the mean
Second, for what type of PDFs is 50% of the area to the left (and to the right) of the mean? A partial answer to
this question is provided in the following example, using the concept of a symmetric function and an odd function,
where we recall from Chapter 1 that an odd function g(x) has the property that g(x) = −g(−x).
Example 5. Symmetric PDFs
Let f (x) be a PDF that is symmetric about x = a. In other words, f (x) is a PDF and f (a + x) = f (a − x) for
all x as illustrated in Figure 7.17.
Figure 7.17: Symmetric PDF

If the mean is well-defined, then we expect the mean to be x = a, as the PDF should balance at this point.
To verify this assertion analytically, assume the mean is well-defined (i.e., $\int_{-\infty}^{\infty} x f(x)\,dx$ is convergent) and do the
following:
a. Verify that g(x) = xf(a + x) is an odd function.
b. Compute $\int_{-\infty}^{\infty} x f(a + x)\,dx$.
c. Use the change of variables t = a + x on the integral $\int_{-\infty}^{\infty} x f(a + x)\,dx$ to find $\int_{-\infty}^{\infty} t f(t)\,dt$.
Solution.
a. Let g(x) = xf(a + x). Since f(a + x) = f(a − x), we obtain
$$g(-x) = -x f(a - x) = -x f(a + x) = -g(x)$$
Hence, g is an odd function.
b. Since g(x) is an odd function, for any b > 0,
$$\int_0^{b} g(x)\,dx = -\int_{-b}^{0} g(x)\,dx$$
Hence, taking the limit as b → ∞,
$$\int_0^{\infty} g(x)\,dx = -\int_{-\infty}^{0} g(x)\,dx$$
and we get
$$\int_{-\infty}^{\infty} g(x)\,dx = \int_{-\infty}^{0} g(x)\,dx + \int_0^{\infty} g(x)\,dx = 0.$$
c. If we let t = x + a, then $0 = \int_{-\infty}^{\infty} x f(x + a)\,dx = \int_{-\infty}^{\infty} (t - a) f(t)\,dt$. Hence,
$$\int_{-\infty}^{\infty} t f(t)\,dt = \int_{-\infty}^{\infty} a f(t)\,dt = a$$
as f(t) is a PDF.
□
In summary, the preceding example proves that for symmetric PDFs with a convergent mean, the mean corre-
sponds to the point of symmetry of the PDF.
Example 6. Means of symmetric PDFs
Assuming the means are well-defined, find the means of the following probability density functions:
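a. (The birthday PDF)
$$f(x) = \begin{cases} \frac{1}{365} & \text{if } 0 \le x \le 365 \\ 0 & \text{elsewhere} \end{cases}$$
b. (The triangular PDF)
$$f(x) = \begin{cases} x & \text{if } 0 \le x \le 1 \\ 2 - x & \text{if } 1 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
c. (The Laplacian PDF) $f(x) = \frac{b}{2} e^{-b|x|}$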

Solution.
a. Since f(x) is symmetric about x = 365/2, the mean of the birthday distribution is 365/2 = 182.5. Because
birthdays are discrete, day 183, which is July 2 in a non-leap year, is preceded and followed by 182
days in a non-leap year.
b. Since the triangular distribution is symmetric about x = 1, x = 1 is the mean.
c. Since the Laplacian distribution is symmetric about x = 0, 0 is the mean.
Sometimes, even for a symmetric PDF, the mean is not well defined as the following example illustrates.
Example 7. Divergent expectations
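Demonstrate that
$$f(x) = \frac{1/\pi}{1 + x^2}$$
is a PDF and find its mean.
Solution. Clearly f(x) is greater than or equal to 0 for all x ∈ (−∞, ∞). We leave it as an exercise for the student
to verify that $\int_{-\infty}^{\infty} \frac{1}{1+x^2}\,dx = \pi$, so that f(x) integrates to 1. The mean is given by
$$\int_{-\infty}^{\infty} x f(x)\,dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{x}{1 + x^2}\,dx$$
To find an antiderivative, we can use the substitution u = 1 + x², du = 2x dx, which yields
$$\begin{aligned}
\int \frac{x\,dx}{1 + x^2} &= \int \frac{du/2}{u} \\
&= \frac{1}{2} \ln|u| + C \\
&= \frac{1}{2} \ln(1 + x^2) + C
\end{aligned}$$
Since
$$\lim_{t\to\infty} \int_0^{t} \frac{x\,dx}{1 + x^2} = \lim_{t\to\infty} \frac{1}{2} \ln(1 + t^2) = \infty$$
the integral $\int_{-\infty}^{\infty} \frac{x\,dx}{1+x^2}$ is divergent and the mean of f(x) is not well-defined! □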
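The divergence can also be seen numerically: the partial integrals of x f(x) over [0, t] keep growing like $\frac{1}{2\pi}\ln(1 + t^2)$ instead of settling down. The sketch below is an illustration only, with hypothetical function names, and uses the midpoint rule rather than the exact antiderivative.

```python
import math

def cauchy_pdf(x):
    """The PDF f(x) = (1/pi) / (1 + x^2) from this example."""
    return 1.0 / (math.pi * (1.0 + x * x))

def partial_first_moment(t, n=200_000):
    """Midpoint-rule approximation of the integral of x*f(x) over [0, t]."""
    h = t / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += x * cauchy_pdf(x)
    return h * total

for t in (10, 100, 1000):
    exact = math.log(1 + t * t) / (2 * math.pi)  # from the antiderivative above
    print(t, partial_first_moment(t), exact)
```

Each tenfold increase in t adds roughly ln(10)/π to the partial integral, so no finite limit exists.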
Variance and Standard Deviation

The variance provides a method for measuring the spread of the data around the mean. The importance of going
beyond the mean is captured by the following quote of the English mathematician Sir Francis Galton (1822-1911):
It is difficult to understand why statisticians commonly limit their enquiries to averages, and do not revel
in more comprehensive views. Their souls seem as dull to the charm of variety as that of the native of one
of our flat English counties, whose retrospect of Switzerland was that, if its mountains could be thrown into
its lakes, two nuisances would be got rid of at once.
For a data set taking on values x1 , x2 , . . . , xk , the variance and standard deviation are defined as follows:

Variance for Data
Let $p_i$ be the fraction of data taking on the value $x_i$ for $i = 1, 2, \ldots, k$. Let µ be the
mean of the data set. The variance, which we denote $\sigma^2$, is defined by
$$\sigma^2 = p_1 (x_1 - \mu)^2 + p_2 (x_2 - \mu)^2 + \cdots + p_k (x_k - \mu)^2 = \sum_{i=1}^{k} p_i (x_i - \mu)^2$$
The standard deviation is σ.
Example 8. Computing variances and standard deviations
The Condor (May 1995) published a study of competition for nest holes among collared flycatchers, a bird species.
The authors collected the data by periodically inspecting nest boxes located on the island of Gotland in Sweden.
The accompanying data gives the number of flycatchers breeding at 14 distinct plots.
5 4 3 2 2 1 1 1 1 0 0 0 0 0
Find the variance and standard deviation.
Solution. Previously, we found that µ ≈ 1.43. Hence, the variance is given by
$$\sigma^2 = \frac{1}{14}(5 - 1.43)^2 + \frac{1}{14}(4 - 1.43)^2 + \frac{1}{14}(3 - 1.43)^2 + \frac{2}{14}(2 - 1.43)^2 + \frac{4}{14}(1 - 1.43)^2 + \frac{5}{14}(0 - 1.43)^2 \approx 2.388$$
and the standard deviation is $\sigma \approx \sqrt{2.388} \approx 1.55$. □
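The definition of the variance for data translates directly into a few lines of code. The following Python sketch (the function name data_variance is hypothetical) computes the fractions $p_i$, the mean, and the weighted sum of squared deviations, and then cross-checks the result against the population variance from Python's standard statistics module.

```python
import math
import statistics
from collections import Counter

def data_variance(values):
    """Variance as the probability-weighted sum of squared deviations from the mean."""
    n = len(values)
    fractions = {x: count / n for x, count in Counter(values).items()}
    mu = sum(p * x for x, p in fractions.items())
    return sum(p * (x - mu) ** 2 for x, p in fractions.items())

# Flycatcher counts at the 14 plots from Example 8
flycatchers = [5, 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0]
variance = data_variance(flycatchers)
print(variance, math.sqrt(variance))
# Cross-check against the standard library's population variance
print(statistics.pvariance(flycatchers))
```

Note that the definition used here divides by the number of data points, so the matching library routine is the population variance rather than the sample variance.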
The following example illustrates that the standard deviation measures the spread of the data set around the mean.
Example 9. Seeing the spread
A person places multiple bets on three “fair” games. Her winnings for the games are as follows:
Game A $ −1, 0, 0, 0, 1
Game B $ −1, −1, 0, 1, 1
Game C $ −2, −1, 0, 1, 2
a. Plot the histograms for each of these data sets.
b. Compute the variances for each of these data sets.
c. Discuss what you find.
Solution.
a. Plotting the histograms yields
[Figure: histograms of the winnings for Game A, Game B, and Game C]

b. Since all the histograms balance at 0, the mean for all data sets is 0. Hence, the variances are given by
$$\begin{aligned}
\text{Game A}: \quad \sigma^2 &= (0 + 1)^2 \cdot 0.2 + 0^2 \cdot 0.6 + (0 - 1)^2 \cdot 0.2 = 0.4 \\
\text{Game B}: \quad \sigma^2 &= (0 + 1)^2 \cdot 0.4 + 0^2 \cdot 0.2 + (0 - 1)^2 \cdot 0.4 = 0.8 \\
\text{Game C}: \quad \sigma^2 &= (0 + 2)^2 \cdot 0.2 + (0 + 1)^2 \cdot 0.2 + 0^2 \cdot 0.2 + (0 - 1)^2 \cdot 0.2 + (0 - 2)^2 \cdot 0.2 = 2.0
\end{aligned}$$
c. For game A, the variance is 0.4 as there is some variation about the mean 0. Since game B has more
data points away from the mean than game A, the variance for game B is greater than that for game A. Finally,
since game C has the greatest variation in winnings, it has the largest variance.
Now suppose that our data set has to all appearances a continuous histogram and is well described by a PDF f(x)
with mean $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$. To define the variance associated with the PDF, divide the real line into intervals of
length ∆x with end points
$$\ldots, \; x_{-2} = -2\Delta x, \; x_{-1} = -\Delta x, \; x_0 = 0, \; x_1 = \Delta x, \; x_2 = 2\Delta x, \; \ldots$$
Since the fraction of data values between x and x + ∆x is approximately f(x)∆x, the variance is approximately
$$\cdots + (x_{-2} - \mu)^2 f(x_{-2})\Delta x + (x_{-1} - \mu)^2 f(x_{-1})\Delta x + (x_0 - \mu)^2 f(x_0)\Delta x + (x_1 - \mu)^2 f(x_1)\Delta x + (x_2 - \mu)^2 f(x_2)\Delta x + \cdots$$
which equals
$$\sum_{k=-\infty}^{\infty} (x_k - \mu)^2 f(x_k)\,\Delta x$$
Taking the limit as ∆x goes to zero yields $\int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$.
Variance and standard deviation for a PDF
For a continuous random variable X with PDF f(x) and mean µ, the variance of X is given by
$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
provided the improper integral converges. The standard deviation of X is given
by σ, the square root of the variance.
Example 10. Integrated variances
Find the variances of the following PDFs:
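a. (Birthday distribution)
$$f(x) = \begin{cases} \frac{1}{365} & \text{if } 0 \le x \le 365 \\ 0 & \text{elsewhere} \end{cases}$$
b. (Triangular distribution)
$$f(x) = \begin{cases} x & \text{if } 0 \le x \le 1 \\ 2 - x & \text{if } 1 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$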
Solution.

a. Earlier, we found the mean of the birthday PDF is µ = 365/2. Hence, the variance is given by
$$\begin{aligned}
\int_0^{365} (x - 365/2)^2\,\frac{dx}{365} &= \left. \frac{1}{3} \cdot \frac{1}{365} (x - 365/2)^3 \right|_{x=0}^{365} \\
&= \frac{365^2}{12} \approx 11{,}102
\end{aligned}$$
b. Earlier we found that the mean of the triangular distribution is µ = 1. Hence, the variance is given by
$$\begin{aligned}
\int_0^{2} (x - 1)^2 f(x)\,dx &= \int_0^{1} (x - 1)^2 x\,dx + \int_1^{2} (x - 1)^2 (2 - x)\,dx \\
&= \frac{1}{6}
\end{aligned}$$
□
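The two variances in this example can be approximated directly from the integral definition. The sketch below uses the midpoint rule as an illustrative numerical method (it is not the method of the text) and hypothetical helper names. It first approximates the mean and then the integral of $(x - \mu)^2 f(x)$ over the interval where the PDF is nonzero.

```python
def midpoint_integral(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def pdf_variance(pdf, a, b):
    """Variance of a PDF that vanishes outside [a, b], from the integral definition."""
    mu = midpoint_integral(lambda x: x * pdf(x), a, b)
    return midpoint_integral(lambda x: (x - mu) ** 2 * pdf(x), a, b)

def birthday_pdf(x):
    return 1 / 365 if 0 <= x <= 365 else 0.0

def triangular_pdf(x):
    if 0 <= x <= 1:
        return x
    if 1 < x <= 2:
        return 2 - x
    return 0.0

print(pdf_variance(birthday_pdf, 0, 365), 365**2 / 12)
print(pdf_variance(triangular_pdf, 0, 2), 1 / 6)
```

Each line prints the numerical estimate next to the exact value obtained above, so the two columns should agree to several digits.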
The following example studies the effect of the standard deviation on the shape of the distribution.
Example 11. Laplacian variance
Recall from Example 9 in Section 7.2 the Laplacian PDF $f(x) = \frac{b}{2} e^{-b|x|}$. Since this distribution is symmetric
about 0, its mean is 0.
a. Find the standard deviation of this PDF.
b. Use technology to plot the PDF for different b values and discuss how the standard deviation affects
the shape of the PDF.
Solution.
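a. We need to compute
$$\begin{aligned}
\frac{b}{2} \int_{-\infty}^{\infty} x^2 e^{-b|x|}\,dx &= \frac{b}{2} \int_{-\infty}^{0} x^2 e^{bx}\,dx + \frac{b}{2} \int_0^{\infty} x^2 e^{-bx}\,dx \\
&= b \int_0^{\infty} x^2 e^{-bx}\,dx \qquad \text{by symmetry}
\end{aligned}$$
Applying integration by parts twice yields
$$\begin{aligned}
b \int x^2 e^{-bx}\,dx &= -x^2 e^{-bx} + \int 2x e^{-bx}\,dx \qquad \text{using } u = x^2 \text{ and } dv = b e^{-bx}\,dx \\
&= -x^2 e^{-bx} - \frac{2x}{b} e^{-bx} + \frac{2}{b} \int e^{-bx}\,dx \qquad \text{using } u = 2x \text{ and } dv = e^{-bx}\,dx \\
&= -x^2 e^{-bx} - \frac{2x}{b} e^{-bx} - \frac{2}{b^2} e^{-bx} + C \\
&= -e^{-bx} \left( x^2 + \frac{2x}{b} + \frac{2}{b^2} \right) + C
\end{aligned}$$
Hence,
$$\begin{aligned}
b \int_0^{\infty} x^2 e^{-bx}\,dx &= \lim_{t\to\infty} b \int_0^{t} x^2 e^{-bx}\,dx \\
&= \lim_{t\to\infty} \left. -e^{-bx} \left( x^2 + \frac{2x}{b} + \frac{2}{b^2} \right) \right|_0^{t} \\
&= \lim_{t\to\infty} \left( -e^{-bt} \left( t^2 + \frac{2t}{b} + \frac{2}{b^2} \right) + \frac{2}{b^2} \right) \\
&= \frac{2}{b^2}
\end{aligned}$$
Therefore, $\sigma = \frac{\sqrt{2}}{b}$.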
b. Plotting the PDF for b = 1, 5, 10 yields
[Figure: Laplacian PDFs for b = 1, b = 5, and b = 10, plotted for −4 ≤ x ≤ 4]
For larger b values, the standard deviation is smaller. The PDF tends to concentrate more around the
mean of 0 when the standard deviation is smaller.
2
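The relationship between b, the standard deviation, and the height of the peak can be tabulated numerically. The sketch below estimates σ by applying the midpoint rule to $x^2 f(x)$ on the truncated interval [−40/b, 40/b]; the truncation point and the function names are our own assumptions, chosen so that the neglected tails are negligible.

```python
import math

def laplace_pdf(x, b):
    """Laplacian PDF (b/2) * exp(-b*|x|)."""
    return 0.5 * b * math.exp(-b * abs(x))

def laplace_std_numeric(b, n=100_000):
    """Midpoint-rule estimate of sigma on [-40/b, 40/b]; the mean is 0 by symmetry."""
    half_width = 40.0 / b  # assumption: the tails beyond this point are negligible
    h = 2 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        total += x * x * laplace_pdf(x, b)
    return math.sqrt(h * total)

for b in (1, 5, 10):
    # columns: b, numerical sigma, exact sigma sqrt(2)/b, peak height f(0) = b/2
    print(b, laplace_std_numeric(b), math.sqrt(2) / b, laplace_pdf(0, b))
```

The table makes the trade-off visible: as b grows, σ shrinks in proportion to 1/b while the peak height b/2 grows, so the PDF becomes taller and narrower around 0.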
The following example provides us with an easier way of computing variances.
Example 12. Variance: mean-squared property
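Let f be a PDF with mean µ. Assuming $\sigma^2$ is well-defined, show that
$$\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2.$$
Solution. The definition of variance and rules of integration imply
$$\begin{aligned}
\int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx &= \int_{-\infty}^{\infty} (x^2 - 2x\mu + \mu^2) f(x)\,dx \\
&= \int_{-\infty}^{\infty} x^2 f(x)\,dx - 2\mu \int_{-\infty}^{\infty} x f(x)\,dx + \mu^2 \int_{-\infty}^{\infty} f(x)\,dx \\
&= \int_{-\infty}^{\infty} x^2 f(x)\,dx - 2\mu^2 + \mu^2 \qquad \text{since } \int_{-\infty}^{\infty} f(x)\,dx = 1 \text{ and by the definition of } \mu \\
&= \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2
\end{aligned}$$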
Example 13. Back to the birthday distribution

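Compute the variance of the birthday distribution using the equation $\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$.
Solution. Recall that the Birthday PDF is given by f(x) = 1/365 for 0 ≤ x ≤ 365 and f(x) = 0 everywhere else.
Since µ = 365/2, we obtain
$$\begin{aligned}
\sigma^2 &= \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 \\
&= \int_0^{365} \frac{x^2}{365}\,dx - \left( \frac{365}{2} \right)^2 \\
&= \frac{365^3}{3} \cdot \frac{1}{365} - \left( \frac{365}{2} \right)^2 \\
&\approx 11{,}102
\end{aligned}$$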
Chebyshev’s inequality
The variance (and hence the standard deviation) provides us with some measurement of spread around the mean.
Larger standard deviations suggest greater spread around the mean. A basic inequality from probability theory
provides a general method of estimating what fraction of the data is within a certain number of standard deviations
of the mean. This inequality is Chebyshev’s inequality and is named after the mathematician, Pafnuty Chebyshev,
who first proved it.
Theorem 7.3. Chebyshev’s Inequality
Let X be a random variable (think arbitrary point in a data set!) with mean µ and standard deviation σ. Then, for any k > 0,
$$P(\mu - k\sigma \le X \le \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$
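Proof. We provide a proof in the case of a continuous random variable with PDF f(x). In this case, we obtain
$$\begin{aligned}
\sigma^2 &= \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \\
&= \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu - k\sigma}^{\mu + k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f(x)\,dx \\
&\ge \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f(x)\,dx \\
&\ge \int_{-\infty}^{\mu - k\sigma} (k\sigma)^2 f(x)\,dx + \int_{\mu + k\sigma}^{\infty} (k\sigma)^2 f(x)\,dx \\
&= (k\sigma)^2 P(X \le \mu - k\sigma) + (k\sigma)^2 P(X \ge \mu + k\sigma)
\end{aligned}$$
Thus, we have shown that
$$\begin{aligned}
\sigma^2 &\ge (k\sigma)^2 P(X \le \mu - k\sigma) + (k\sigma)^2 P(X \ge \mu + k\sigma) \\
\frac{1}{k^2} &\ge P(X \le \mu - k\sigma) + P(X \ge \mu + k\sigma) \\
\frac{1}{k^2} &\ge 1 - P(\mu - k\sigma \le X \le \mu + k\sigma) \\
P(\mu - k\sigma \le X \le \mu + k\sigma) &\ge 1 - \frac{1}{k^2}
\end{aligned}$$
□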

Example 14. Using Chebyshev
In 1998 in Hong Kong, the number of newborns was 52,955 with a mean birth weight 3.2kg and standard deviation
of 0.5kg. Using only this data, estimate the following quantities:
a. The fraction of newborns weighing between 2.2kg and 4.2kg.
b. The fraction of newborns weighing between 1.7kg and 4.7kg.
Solution.
a. Since µ = 3.2 and σ = 0.5, we find that 2.2 = µ − 2σ and 4.2 = µ + 2σ. By Chebyshev’s inequality with
k = 2, we find that at least $1 - \frac{1}{2^2} = \frac{3}{4}$ of the newborns weighed between 2.2 kg and 4.2 kg.
b. Since µ = 3.2 and σ = 0.5, we find that 1.7 = µ − 3σ and 4.7 = µ + 3σ. By Chebyshev’s inequality with
k = 3, we find that at least $1 - \frac{1}{3^2} = \frac{8}{9}$ of the newborns weighed between 1.7 kg and 4.7 kg.
□
This example illustrates Chebyshev’s inequality, which states that at least 3/4 of the data values are at most k = 2
standard deviations away from the mean, at least 8/9 of the data values are at most k = 3 standard deviations away,
at least 24/25 are at most k = 5 standard deviations away, and so on.
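Chebyshev's bound is easy to tabulate and to compare with what a particular data set actually does. In the Python sketch below, the function names are hypothetical and the data are the Game C winnings from Example 9. The observed fraction within k standard deviations should never fall below the bound $1 - 1/k^2$.

```python
import statistics

def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1.0 - 1.0 / k**2)

def fraction_within(values, k):
    """Observed fraction of data in [mu - k*sigma, mu + k*sigma]."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation, as defined in the text
    inside = [x for x in values if mu - k * sigma <= x <= mu + k * sigma]
    return len(inside) / len(values)

game_c = [-2, -1, 0, 1, 2]  # winnings for Game C in Example 9
for k in (1, 2, 3, 5):
    print(k, chebyshev_lower_bound(k), fraction_within(game_c, k))
```

For k = 1 the bound is 0 and therefore carries no information, which is why the inequality is normally quoted for k greater than 1. The bound is also typically far from sharp: here every data point already lies within two standard deviations of the mean.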

Problem Set 7.3

Compute the mean, variance, and standard deviation of the data sets given in Problems 1 to 8.
1. 1, 1, 0, 1, 1
2. 2, 0, 2
3. 1, 1, 1, 1, 1
4. 1, 2, 3, 4, 5, 6 (a die)
5. 1, 5, 7
6. −1, −2, 1, 4
7. The set of numbers that contains 2 zeros, 6 ones, 17 twos and 8 threes.
8. The set of numbers that contains 7 negative twos, 5 negative ones, 3 zeros, 8 ones, and 12 twos.
Compute the mean of the random variable with the PDF indicated in Problems 9 to 16.
9. $f(x) = \frac{1}{2}$ for 0 ≤ x ≤ 2 and f(x) = 0 elsewhere.
10. $f(x) = \frac{3}{x^4}$ for x ≥ 1 and f(x) = 0 elsewhere.
11. $f(x) = \frac{1.5}{x^4}$ for |x| ≥ 1 and f(x) = 0 elsewhere.
12. $f(x) = e^{-x}$ for x ≥ 0 and f(x) = 0 elsewhere.
13. $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
14. $f(x) = \frac{1}{\pi} \cdot \frac{1}{1 + x^2}$
15. $f(x) = x e^{-x}$ for x ≥ 0 and f(x) = 0 elsewhere.
16. $f(x) = \frac{4x^2 e^{-x^2}}{\sqrt{\pi}}$ for x ≥ 0 and f(x) = 0 elsewhere. (Hint: See Example 5, Section 5.6.)
Compute the variance of the PDFs in Problems 17 to 20.

17. $f(x) = \frac{1}{2}$ for 0 ≤ x ≤ 2 and f(x) = 0 elsewhere.
18. $f(x) = \frac{3}{x^4}$ for x ≥ 1 and f(x) = 0 elsewhere.
19. $f(x) = \frac{1.5}{x^4}$ for |x| ≥ 1 and f(x) = 0 elsewhere.
20. f (x) = xe−x for x ≥ 0 and f (x) = 0 elsewhere.
21. Consider the following data set:
-1 0 0 0 0 0 0 0 1
a. Find the mean and standard deviation.

b. According to Chebyshev’s inequality, what fraction of data (at the bare minimum) has to lie in the
interval [µ − 2σ, µ + 2σ]? What fraction of the data does lie in this interval?
22. (Fun but challenging) Construct a data set so that only 75.1% of the data lies in the interval [µ − 2σ, µ + 2σ]

23. Suppose that a random variable x has a PDF of
f (x) = 0.5 + x for 0 ≤ x ≤ 1
a. Find P (0.2 ≤ x ≤ 0.6). Sketch this probability on a graph of the PDF.

b. Find and graph the CDF.
c. Find the mean value.
24. Suppose that a random variable x has a PDF of

$$f(x) = \frac{1}{x} \quad \text{for } 1 \le x \le e$$
a. Find P(1.2 ≤ x ≤ 2.4). Sketch this probability on a graph of the PDF.
b. Find and graph the CDF.
c. Find the mean value.
25. (Sebastian’s dart-throwing problem.) Let x be the distance from the center that the dart lands on a dart board
with a radius of 2 feet.
a. Show the PDF for x is given by

$$f(x) = \begin{cases} x/2 & \text{if } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
b. Find the mean and variance of this PDF.

26. (Example 1 from Section 7.1) In the spring of 1994, the number of bird species at 40 different California oak
woodland sites was recorded. Each site was around 5 hectares in size (the equivalent of about 12 acres, or
0.019 square miles) and was situated in relatively homogeneous habitat. The number of bird species found in
these sites is listed below:
37, 21, 26, 27, 21, 21, 28, 22, 22, 26, 47, 26, 29, 34, 28, 25,
19, 32, 32, 29, 29, 16, 21, 24, 37, 38, 30, 20, 23, 30, 27, 32,
17, 24, 32, 29, 40, 31, 38, 35
Using technology, compute the mean and standard deviation of this data set.
27. The following table contains the length in seconds of scenes showing tobacco use recorded for six animation
movies from Universal studios:
0 223 0 176 0 548
a. Compute the mean for this sample.

b. Compute the standard deviation of this data set.
28. The following table contains the number of belly bristles per fruit fly in a sample of size 6.
30 32 27 30 32 32
a. Find the relative frequency of 30.

b. Compute the mean number of belly bristles in this sample.
c. Compute the standard deviation of this data set.

d. According to Chebyshev’s inequality what is the minimum fraction of the data taking on values
between 26.5 and 34.5? What is the actual fraction of data taking on values between 26.5 and 34.5?
29. Consider the exponential PDF given by

$$f(t) = \begin{cases} c e^{-ct} & \text{if } t \ge 0 \\ 0 & \text{elsewhere} \end{cases}$$
Find the variance and standard deviation of the exponential distribution. Compare these numbers to the mean
of the exponential distribution. What do you notice?
30. According to Thomson et al. 1973∗, the elimination constant for Lidocaine for patients with hepatic impairment
is 0.12 per hour.
a. Determine the mean time µ for a Lidocaine particle to be eliminated.

b. Determine the fraction of Lidocaine eliminated by time t = µ.
31. Donald Levin, a botany professor at the University of Texas, Austin, was quoted by the Science Daily∗ as
stating “Roughly 20 of the 297 known mussel and clam species and 40 of about 950 fishes have perished in
North America in the last century.”
a. Use this data to approximate the mean time to extinction constants for mussel and clam species and
for fish species.
b. Determine the fraction of mussel and clam species and fish species that will be lost in the next
century.
32. In Example 3 from Section 7.2, the following Pareto PDF was used to describe how many AOL users visited
certain web sites on one day in 1997:
$$f(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1.07\,x^{-2.07} & \text{if } x \ge 1 \end{cases}$$
a. Find the mean of this PDF.
b. Compute the variance for this PDF. What does the variance suggest about the variability in the number
of hits that a web site can experience?
33. Let X denote the number of years a patient lives after receiving treatment for an acute disease like cancer.
Under appropriate conditions, X is exponentially distributed. Suppose that the probability that a patient will
live at least 5 years after treatment is 0.85.
a. Find the mean value of X.

b. Find the probability a patient will live at least 10 years.
34. Based on data from 1974 to 2000 in Humboldt and Del Norte counties in California, the mean time to the next
earthquake of magnitude ≥ 4 is approximately 2.5 weeks. Assuming the time to the next earthquake is
exponentially distributed, find the probability there will be an earthquake of magnitude ≥ 4 in the next week.
35. (After Peter K. Dunn, University of Southern Queensland, “A Simple Dataset for Demonstrating Common
Distributions,” Journal of Statistics Education v.7, n.3 (1999)) According to an article entitled “Babies by the
Dozen for Christmas: 24-Hour Baby Boom,” a record of 44 babies were born in one 24-hour period at the
Mater Mothers’ Hospital, Brisbane, Australia, on December 18, 1997. The article listed the times of birth for
all of the babies. The histogram of the times between births is as follows:

∗ Posted January 10th, 2002 at http://www.sciencedaily.com/releases/2002/01/020109074801.htm

[Figure: histogram of the times between births; horizontal axis: minutes (0 to 160); vertical axis: proportion (0 to 0.35)]
a. If this histogram has proportions {0.35, 0.25, 0.16, 0.08, 0.02, 0.0, 0.0, 0.0, 0.02} centered on the values
7.5, 22.5, and so on every 15 minutes up to 142.5, then what is the mean time between births?
b. If this histogram is approximately exponentially distributed with a mean of 33.26 minutes between
births, then what fraction of the times between births were less than 30 minutes? According to the
histogram, what fraction of times between births were less than 30 minutes?
c. According to the exponential distribution, what fraction of the times between births were more than
75 minutes? Compare this with the actual fraction of times between births that are more than 80
minutes as depicted in the histogram.

7.4 Bell-shaped distributions

An important collection of PDFs consists of those whose graphs are “bell-shaped”. In this section, we investigate three
PDFs with this property: the logistic PDF, the normal PDF, and the log-normal PDF. While having similar shapes,
each of the distributions is used to represent quite different biological data sets.
The logistic distribution
The logistic growth equation studied in Chapter 6 describes how populations change over time. In the next example,
we show that solutions to the logistic equation can lead to a CDF for the logistic distribution.
Example 1. Logistic spread of diseases
Consider a population of individuals in which a disease is spreading. Let y denote the fraction of infected
individuals (also known as the prevalence of the disease) and let t denote time in months. If the rate of increase of
infected individuals is proportional to the product of the fraction of infected individuals and the fraction of uninfected
individuals, then
$$\frac{dy}{dt} = r y (1 - y)$$
where r is a constant that describes how rapidly the disease spreads in the population.
a. Assuming that r = 1 and y(0) = 0.5, solve for y(t).
b. Verify that y(t) is a CDF.
c. Determine the probability that a randomly chosen individual from this population is infected within the
next two months.
d. Find the PDF associated with the CDF y(t) and prove that it is symmetric.
Solution.
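a. Separating and integrating yields
$$\begin{aligned}
\int \frac{dy}{y(1 - y)} &= \int dt \\
\int \left( \frac{1}{y} + \frac{1}{1 - y} \right) dy &= t + C \\
\ln|y| - \ln|1 - y| &= t + C \\
\ln\left| \frac{y}{1 - y} \right| &= t + C \\
\frac{y}{1 - y} &= C_2 e^{t} \\
y &= C_2 e^{t} (1 - y) \\
y (1 + C_2 e^{t}) &= C_2 e^{t} \\
y &= \frac{C_2 e^{t}}{1 + C_2 e^{t}}
\end{aligned}$$
Using the initial condition y(0) = 0.5, we can solve for $C_2$:
$$\begin{aligned}
0.5 &= \frac{C_2}{1 + C_2} \\
0.5 (1 + C_2) &= C_2 \\
0.5 &= 0.5\,C_2 \\
C_2 &= 1
\end{aligned}$$
Hence,
$$y(t) = \frac{e^{t}}{1 + e^{t}}$$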
b. To verify that y(t) is a CDF, we need to check four things. First, y(t) is clearly non-negative for all t.
Second, to see that y(t) is increasing, we can take the derivative using the quotient rule
$$y'(t) = \frac{e^{t}(1 + e^{t}) - e^{t} e^{t}}{(1 + e^{t})^2} = \frac{e^{t}}{(1 + e^{t})^2}$$
Since y′(t) > 0 for all t, y(t) is increasing. Finally, we need to verify that $\lim_{t\to\infty} y(t) = 1$ and
$\lim_{t\to-\infty} y(t) = 0$. Indeed, dividing numerator and denominator of y(t) by $e^{t}$ yields
$$\lim_{t\to\infty} y(t) = \lim_{t\to\infty} \frac{1}{e^{-t} + 1} = 1$$
Similarly, $\lim_{t\to-\infty} y(t) = 0$.
c. The probability that a randomly chosen individual gets infected before the end of the second month is
$P(X \le 2) = y(2) = \frac{e^2}{1 + e^2}$. The probability that a randomly chosen individual gets infected before t = 0 is
$P(X \le 0) = y(0) = \frac{1}{2}$. Hence, the probability that a randomly chosen individual gets infected between t = 0 and
t = 2 is $y(2) - y(0) = \frac{e^2}{1 + e^2} - \frac{1}{2} \approx 0.381$. Hence, there is an approximately 38% chance that a randomly
chosen individual gets infected within two months.

d. To find the PDF, we can use the fundamental theorem of PDFs. Namely, the PDF f(t) is given by the
derivative of the CDF:
$$f(t) = y'(t) = \frac{e^{t}}{(1 + e^{t})^2}$$
It follows that
$$\begin{aligned}
f(-t) &= \frac{e^{-t}}{(1 + e^{-t})^2} \\
&= \frac{e^{t}}{(1 + e^{t})^2} \qquad \text{multiplying numerator and denominator by } e^{2t} \\
&= f(t).
\end{aligned}$$
□
In the previous example, we selected y(0) = 0.5, resulting in a PDF that is symmetric around 0. Thus the mean is zero
provided that the associated improper integral is convergent. More generally, we can derive a logistic PDF for any
initial condition y(0), as well as for any r > 0, in which case the PDF is symmetric around a value of t
that need not be 0. In particular, in Problem Set 7.3, you will be asked to show the following:
Logistic PDF and CDF
A solution y(t) to $\frac{dy}{dt} = ry(1-y)$ with $y(0) \in (0, 1)$ gives a CDF of the following form
\[
y(t) = \frac{1}{1 + e^{a - rt}}
\]
where $a = \ln(1/y(0) - 1)$. This CDF corresponds to the logistic distribution. The associated PDF is
\[
f(t) = \frac{r e^{a - rt}}{(1 + e^{a - rt})^2}
\]

Note that the sign convention we use requires r > 0 to ensure f (t) > 0. Further, a > 0 implies y(0) ∈ (0, 0.5) and
a < 0 implies y(0) ∈ (0.5, 1).
Example 2. Playing with the logistic PDF
Assume that the logistic PDF describes the distribution of infection times. Let r be the intrinsic rate of growth
of the disease and y(0) be the fraction of individuals infected during week 0.
a. Consider a disease for which r = 1. Determine the fraction of people that are infected by the disease in
the next two weeks if y(0) = 0.25 or y(0) = 0.75.
b. Use technology to plot the PDF for r = 1 and y(0) = 0.25, 0.5, and 0.75. Discuss what you find.
c. Consider a disease for which y(0) = 0.1. Determine the fraction of people that are infected by the disease
in the next two weeks if the intrinsic rate of growth is r = 0.5 or r = 5.
d. Use technology to plot the PDF for y(0) = 0.1 and r = 0.5, 1, and 5. Discuss what you find.
Solution.
a. If r = 1 and y(0) = 0.25, then $a = \ln(1/y(0) - 1) = \ln 3 \approx 1.1$ and the CDF is given by
\[
y(t) = \frac{1}{1 + e^{1.1 - t}}
\]
The fraction infected in the next two weeks is given by
\[
y(2) - y(0) \approx 0.71 - 0.25 = 0.46
\]
Hence 46% are infected in the next two weeks.
If r = 1 and y(0) = 0.75, then $a = \ln(1/y(0) - 1) \approx -1.1$ and the CDF is given by
\[
y(t) = \frac{1}{1 + e^{-1.1 - t}}
\]
The fraction infected in the next two weeks is now
\[
y(2) - y(0) \approx 0.96 - 0.75 = 0.21
\]
Hence only 21% are infected in the next two weeks.
b. Using technology, we obtain the following plots of the PDF
[Figure: the logistic PDF f(t) plotted against t for r = 1 and y(0) = 0.25, 0.5, and 0.75]

These plots illustrate that as we increase y(0) the “center” of the PDF tends to move to the left. In other
words, as the fraction of individuals infected at week 0 increases, the “time-to-infection” for all individuals
decreases.
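c. If y(0) = 0.1, then $a = \ln(1/y(0) - 1) = \ln 9 \approx 2.2$. Hence, if r = 0.5, then the CDF is given by
\[
y(t) = \frac{1}{1 + e^{2.2 - 0.5t}}
\]
and the fraction infected in the next two weeks is
\[
y(2) - y(0) \approx 0.23 - 0.1 = 0.13
\]
Alternatively, if r = 5, then the CDF is given by
\[
y(t) = \frac{1}{1 + e^{2.2 - 5t}}
\]
and the fraction infected in the next two weeks is
\[
y(2) - y(0) \approx 1.0 - 0.1 = 0.9
\]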
d. Using technology, we obtain the following plots of the PDF
[Figure: the logistic PDF f(t) plotted against t for y(0) = 0.1 and r = 0.5, 1.0, and 5.0]
These plots illustrate that as we increase r the “center” of the PDF tends to move to the left and the
spread around the center decreases. In other words, for diseases that spread quickly (i.e. r is large),
most people catch the disease quickly and around the same time. For diseases that spread slowly (i.e.
r is small), there is greater variability in the time it takes for a person to get infected and most people
catch the disease later rather than sooner.
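The arithmetic in parts (a) and (c) of Example 2 can be reproduced with a few lines of code. This is a sketch only: the helper names logistic_cdf and newly_infected are hypothetical, and the code uses the unrounded value of a = ln(1/y(0) − 1), so the results may differ from the text in the last digit.

```python
import math

def logistic_cdf(t, r, y0):
    """Logistic CDF y(t) = 1 / (1 + e^(a - r t)) with a = ln(1/y0 - 1)."""
    a = math.log(1.0 / y0 - 1.0)
    return 1.0 / (1.0 + math.exp(a - r * t))

def newly_infected(r, y0, t_end=2.0):
    """Fraction infected between t = 0 and t = t_end, i.e. y(t_end) - y(0)."""
    return logistic_cdf(t_end, r, y0) - y0

# The four (r, y(0)) combinations considered in parts (a) and (c).
cases = [(1, 0.25), (1, 0.75), (0.5, 0.1), (5, 0.1)]
results = {case: newly_infected(*case) for case in cases}

for (r, y0), fraction in results.items():
    print(f"r = {r}, y(0) = {y0}: {fraction:.2f}")
```

Changing t_end or the list of cases gives a quick way to explore how r and y(0) affect the fraction infected over other time windows.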
Example 2 illustrates how the mean and variance of the logistic distribution are affected by the parameters r and
y(0). The following example determines the mean of the logistic distribution.

Example 3. The mean of the logistic PDF
Let
\[
y(t) = \frac{1}{1 + e^{a - rt}}
\qquad \text{and} \qquad
f(t) = \frac{r e^{a - rt}}{(1 + e^{a - rt})^2}
\]
be the CDF and PDF of the logistic distribution. Assuming the mean is well-defined, do the following:
a. Find t = T such that y(T ) = 0.5. In other words, find T such that half of the data lies to the left of T
and half of the data lies to the right of T .
b. Verify that f (t) is symmetric about t = T .
c. Find the mean.

In the problem set, you will be asked to verify that the mean is well-defined, i.e. that $\int_{-\infty}^{\infty} t f(t)\,dt$ is convergent.
Solution.
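a. Solving $y(T) = \frac{1}{2}$ yields
\[
\begin{aligned}
\frac{1}{1 + e^{a - rT}} &= \frac{1}{2} \\
2 &= 1 + e^{a - rT} \\
1 &= e^{a - rT} \\
0 &= a - rT \\
T &= \frac{a}{r}
\end{aligned}
\]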
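b. To check symmetry of f(t) about t = T, we need to verify that f(a/r + t) = f(a/r − t) for all t. Indeed,
we have
\[
\begin{aligned}
f(a/r + t) &= \frac{r e^{a - r(a/r + t)}}{(1 + e^{a - r(a/r + t)})^2} \\
&= \frac{r e^{-rt}}{(1 + e^{-rt})^2} \\
&= \frac{r e^{-rt}\, e^{2rt}}{(1 + e^{-rt})^2\, e^{2rt}} \\
&= \frac{r e^{rt}}{(e^{rt} + 1)^2} \\
&= \frac{r e^{a - r(a/r - t)}}{(1 + e^{a - r(a/r - t)})^2} \\
&= f(a/r - t)
\end{aligned}
\]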
c. Since f is symmetric around a/r, the mean is given by µ = a/r provided that the integral
$\int_{-\infty}^{\infty} t f(t)\,dt$ is convergent. You are asked to verify this fact in the problem set.
So, we have shown:

Mean of Logistic
The mean of the logistic density function $f(t) = \frac{r e^{a - rt}}{(1 + e^{a - rt})^2}$ is
\[
\mu = \frac{a}{r}
\]
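The formula µ = a/r can be sanity-checked by approximating the integral of t f(t) numerically. The sketch below uses a midpoint rule on a finite window; the helper names, the number of steps, and the truncation points (chosen so that |a − rt| is large at both ends and the neglected tails are negligible) are assumptions made for illustration, not part of the text.

```python
import math

def logistic_pdf(t, a, r):
    """Logistic PDF f(t) = r e^(a - r t) / (1 + e^(a - r t))^2."""
    u = math.exp(a - r * t)
    return r * u / (1.0 + u) ** 2

def midpoint_integral(g, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of g over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(g(lower + (i + 0.5) * h) for i in range(steps))

def logistic_mean(a, r):
    """Approximate the mean by integrating t f(t) over a wide finite window.

    The window is deliberately not symmetric about a/r, so the result does
    not follow from symmetry of the sample points alone.
    """
    lower, upper = (a - 45.0) / r, (a + 40.0) / r
    return midpoint_integral(lambda t: t * logistic_pdf(t, a, r), lower, upper)

a, r = 4.0, 1.7  # sample parameter values
approx_mean = logistic_mean(a, r)
print(approx_mean, a / r)
```

The two printed numbers should agree to many decimal places. This does not replace the convergence argument requested in the problem set, because a finite window says nothing rigorous about the tails.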
In addition to describing the spread of diseases, the logistic distribution can describe the spread of an organism
across a landscape.
Example 4. Organismal spread
Pyura praeputialis is a large tunicate (i.e. a species of sea squirt reaching lengths of up to 35 cm) which, in Chile,
is distributed exclusively along 60 to 70 km of coastline in and around the bay of Antofagasta. This tunicate is a
sessile, dominant species, capable of forming extensive beds of barrel-like individuals tightly cemented together in
rocky intertidal and shallow subtidal zones. Using experimental quadrats, biologist Jorge Alvarado and colleagues∗
investigated recolonization dynamics of P. praeputialis in Chile after removal of adult individuals. Professor Alvarado
and colleagues found that the fraction of occupied habitat is given approximately by
\[
y(t) = \frac{1}{1 + e^{4 - 1.7t}}
\]
where t is measured in hundreds of days.
a. What fraction of habitat was occupied on day t = 0?
b. What fraction of habitat was occupied in the first 100 days?
c. At what point in time will 95% of the habitat be covered?
d. For a randomly chosen point in the habitat, what is the mean time for it to be occupied?
Solution.
a. Since y(0) = 0.018, less than 2% of the habitat is initially occupied.
b. Since y(1) − y(0) ≈ 0.0731, approximately 7% of the habitat was occupied in the first 100 days.
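c. Solving y(t) = 0.95 yields
\[
\begin{aligned}
0.95 &= \frac{1}{1 + e^{4 - 1.7t}} \\
1 + e^{4 - 1.7t} &= \frac{1}{0.95} \approx 1.0526 \\
e^{4 - 1.7t} &\approx 0.0526 \\
4 - 1.7t &\approx \ln 0.0526 \approx -2.945 \\
6.945 &\approx 1.7t \\
t &\approx 4.0853
\end{aligned}
\]
Hence 95% of the habitat is occupied in approximately 408 days.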
d. Since r = 1.7 and a = 4, the mean time for a location being occupied is $\frac{a}{r} = \frac{4}{1.7} \approx 2.35$.
Therefore, on average it takes 235 days for a randomly chosen location to get occupied.
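The four answers in Example 4 follow directly from the fitted curve and can be recomputed as a check. In this sketch the names occupied and time_to_fraction are hypothetical helpers; time_to_fraction inverts y(t) = p algebraically, giving t = (a − ln(1/p − 1))/r, which is the same rearrangement carried out in part (c).

```python
import math

A, R = 4.0, 1.7  # a and r in y(t) = 1 / (1 + e^(4 - 1.7 t)); t in hundreds of days

def occupied(t):
    """Fraction of habitat occupied at time t (hundreds of days)."""
    return 1.0 / (1.0 + math.exp(A - R * t))

def time_to_fraction(p):
    """Solve y(t) = p for t: t = (a - ln(1/p - 1)) / r."""
    return (A - math.log(1.0 / p - 1.0)) / R

initial = occupied(0)               # part (a)
gained = occupied(1) - occupied(0)  # part (b)
t95 = time_to_fraction(0.95)        # part (c), in hundreds of days
mean_time = A / R                   # part (d), in hundreds of days

print(f"{initial:.3f} {gained:.4f} {t95:.3f} {mean_time:.2f}")
```

Because the code carries full precision through part (c), its value of t can differ from the hand calculation in the fourth decimal place.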
∗ J. L. Alvarado et al. 2001. Patch recolonization by the tunicate Pyura praeputialis in the rocky intertidal of the Bay of Antofagasta,
Chile: evidence for self-facilitation mechanisms. Marine Ecology Progress Series. 224: 93–101.

A very important type of regression analysis arises in the context of the logistic model. In Chapter 1, we
demonstrated fitting the linear model y = ax + b to data and then used the model to infer a value for y when a value
of x is known. Now suppose that we want to infer the probability p of a certain event occurring associated with the
measurement of some variable t. For example, t could be the age of a healthy cow and p could be the probability
that this cow will die over the next year. Then the outcome is either 0 (the cow survived the year) or 1 (the cow
died within a year). If we actually have some data on the proportion p(t) of cows dying at various ages to which we
can fit a function, then we might want to try fitting the logistic function
\[
p(t) = \frac{1}{1 + e^{a - rt}},
\]
because it has the appropriate properties: it increases with t from a positive value less than 1 and is asymptotic
to 1 for large t. Here p(t) is the probability that an individual of age t dies within the next year and, specifically,
$0 < p(0) = \frac{1}{1 + e^{a}} < 1$ is the probability that a newborn calf will die in its first year.
A method for fitting the logistic function to data is illustrated by the following example.
Example 5. Transforming the logistic into a linear equation
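Demonstrate that the function $y = \ln \frac{p(t)}{1 - p(t)}$ is linear in t when p(t) is a logistic function.
Solution. We are told that
\[
p(t) = \frac{1}{1 + e^{a - rt}}.
\]
It now follows that
\[
\begin{aligned}
y &= \ln p(t) - \ln(1 - p(t)) \\
&= \ln\left( \frac{1}{1 + e^{a - rt}} \right) - \ln\left( \frac{(1 + e^{a - rt}) - 1}{1 + e^{a - rt}} \right) \\
&= \ln 1 - \ln\left(1 + e^{a - rt}\right) - \ln e^{a - rt} + \ln\left(1 + e^{a - rt}\right) \\
&= rt - a
\end{aligned}
\]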
From this example we see that if we have a set of n data points $(t_1, p_1), \dots, (t_n, p_n)$ that we want to describe with
the logistic function, we are actually finding the best fitting line through the transformed data
$\left(t_1, \ln \frac{p_1}{1 - p_1}\right), \dots, \left(t_n, \ln \frac{p_n}{1 - p_n}\right)$.
The transformed quantities
\[
\omega = \frac{p}{1 - p} \qquad \text{and} \qquad y = \ln \omega
\]
are interesting in their own right. The quantity ω is called the odds (often loosely called the odds ratio), because it is
the quantity that bookies use to decide what odds to give gamblers for correctly predicting the outcome of a horse race or
some other event for which there is an associated probability function p(t). The logarithm of the odds has the linear form
we saw in Example 5 when p(t) is logistic. Not surprisingly, finding the best fitting parameters r and a as outlined in the
previous example for the logistic relationship $p(t) = \frac{1}{1 + e^{a - rt}}$ is called logistic regression.
Example 6. Logistic regression
A medical researcher used chemical methods to induce the growth of prostate tumors in several hundred male
rats. He then surgically removed the resulting tumors after 150 days and measured the proportion of individuals that
had the tumors return within 90 days as a function of the size of the original tumor that he removed. The results
are given in the first two columns in Table 7.1. Find the best fitting logistic equation to this data set.
Solution. We use the transformation $y = \ln \frac{p}{1 - p}$ of the p values in the second column to obtain the third column.

Table 7.1: Proportion p of rats growing new prostate tumors as a function of the weight t (grams) of the original
tumor removed

Weight t (g)   Proportion p   Transformed variable y = ln(p/(1 − p))
0-1            0.01           -4.60
1-2            0.02           -3.89
2-3            0.05           -2.94
3-4            0.11           -2.09
4-5            0.18           -1.52
5-6            0.32           -0.75
6-7            0.56            0.24
7-8            0.76            1.15
8-9            0.83            1.52
9-10           0.95            2.94
10+            0.92            2.44
For the t values, we select the midpoint values $t_1 = 0.5$, $t_2 = 1.5$, ..., $t_{10} = 9.5$, and for the last bin we use
$t_{11} = 10.5$ even though it represents all weights ≥ 10. Using technology to find the best fitting line, we get
y = 0.76t − 4.9; that is, r = 0.76 and a = 4.9. The transformed data and regression line are illustrated in Figure 7.18.
Using the expression given above this example for the logistic equation we finally obtain
\[
p(t) = \frac{1}{1 + e^{4.9 - 0.76t}}.
\]
□
Figure 7.18: Linear regression on transformed logistic data
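The "using technology" step in Example 6 amounts to an ordinary least-squares line through the transformed data. The following sketch applies the standard least-squares formulas for slope and intercept to the midpoints and proportions of Table 7.1. It recomputes y = ln(p/(1 − p)) from the p column rather than copying the rounded third column, so the fitted values can differ slightly from those quoted in the example.

```python
import math

# Bin midpoints and observed proportions from Table 7.1.
t = [i + 0.5 for i in range(11)]  # 0.5, 1.5, ..., 10.5
p = [0.01, 0.02, 0.05, 0.11, 0.18, 0.32, 0.56, 0.76, 0.83, 0.95, 0.92]

# Log-odds transformation y = ln(p / (1 - p)).
y = [math.log(pi / (1.0 - pi)) for pi in p]

# Ordinary least-squares line y = slope * t + intercept.
n = len(t)
t_bar = sum(t) / n
y_bar = sum(y) / n
slope = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / sum(
    (ti - t_bar) ** 2 for ti in t
)
intercept = y_bar - slope * t_bar

# Since y = r t - a, the logistic parameters are r = slope and a = -intercept.
r, a = slope, -intercept

def fitted_p(weight):
    """Fitted logistic curve p(t) = 1 / (1 + e^(a - r t))."""
    return 1.0 / (1.0 + math.exp(a - r * weight))

print(f"r = {r:.2f}, a = {a:.2f}")
```

Fitting a line to the log-odds is the approach described in the text. It is undefined when an observed proportion equals 0 or 1, since the log-odds is then infinite.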
Normal distribution

An important model of quantitative phenomena in the natural and behavioral sciences is the normal distribution.
The graph of this distribution is a bell curve because the graph of its probability density function is shaped like a
bell. From its name one would speculate that the normal distribution is the most ubiquitous probability distribution
in nature. This is arguably true: it can be theoretically demonstrated that if each data point in a set arises under
the influence of many small independent additive effects then the distribution of the data will be well-approximated
by the normal distribution. The normal distribution is also known as the Gaussian distribution, after the great
German mathematician, Karl Friedrich Gauss (1777-1855).
Figure 7.19: 10 Deutsche Mark bill showing Karl Gauss
The importance of this distribution is highlighted by the fact that it appeared on the German ten Deutsche
Mark bill, as illustrated in Figure 7.19.
PDF of the normal distribution
The PDF of the normal distribution is given by
\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\]
where µ is the mean of the distribution and σ is the standard deviation.
The effect of increasing µ on this distribution is to move the graph to the right. The standard deviation σ, on
the other hand, controls the spread of the bell-shaped curve about its center. For small σ, the distribution is more
peaked or concentrated around the mean as illustrated in Figure 7.20a. For large σ, the distribution is fatter and
more spread as illustrated in Figure 7.20b. As we discussed in Chapter 5, there is no elementary representation of
the antiderivative of f (x). Hence, we need to resort to numerical estimates.
Figure 7.20: Bell-shaped curves for (a) small σ and (b) large σ
Example 7. Wheat yields

In 1910, Mercer and Hall conducted a wheat yield experiment at Rothamsted Experimental Station in Great
Britain. In 500 identical plots, wheat was grown and the yield (in bushels) was recorded. The resulting histogram
of this data is approximately normal as illustrated in Figure 7.21. The mean of this data is 3.95 bushels and the
standard deviation 0.45 bushels. Use numerical integration to approximate the following quantities:
a. the likelihood that the yield in a randomly chosen plot is between 3.5 and 4.5 bushels.
b. the likelihood that the yield in a randomly chosen plot is at least 5 bushels.
Figure 7.21: Histogram for the Rothamsted experiment
Solution. For this problem, we have
\[
f(x) = \frac{1}{\sqrt{2\pi}\,(0.45)}\, e^{-\frac{(x - 3.95)^2}{2(0.45)^2}}
\]
a. Integrating $\int_{3.5}^{4.5} f(x)\,dx$ numerically yields 0.730533. Hence, there is approximately a 73% chance the
yield will be between 3.5 and 4.5 bushels.
b. Integrating $\int_{5}^{\infty} f(x)\,dx$ numerically yields 0.00981533. Hence, there is approximately a 1% chance the
yield is at least 5 bushels.
□
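The two probabilities in Example 7 can be reproduced with the NormalDist class from Python's standard statistics module. Note the assumption involved: NormalDist.cdf evaluates the normal CDF through the error function rather than by the numerical integration of f(x) that the example asks for, so this is a cross-check on the answers, not an implementation of the stated method.

```python
from statistics import NormalDist

# Normal model for wheat yield (bushels): mean 3.95, standard deviation 0.45.
yield_dist = NormalDist(mu=3.95, sigma=0.45)

# Part (a): P(3.5 <= X <= 4.5) as a difference of CDF values.
p_between = yield_dist.cdf(4.5) - yield_dist.cdf(3.5)

# Part (b): P(X >= 5) as the complement of the CDF at 5.
p_at_least_5 = 1.0 - yield_dist.cdf(5.0)

print(f"{p_between:.4f} {p_at_least_5:.4f}")
```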
Aside from using numerical integrators, we can use tables to estimate areas under normal densities. At first, we
might think that we need an infinite number of tables to deal with all possible values of µ and σ. However, this is
not the case. Using a clever substitution, we can reduce everything to a question about one normal distribution, the
standard normal distribution.
Standard normal distribution
A random variable Z has a standard normal distribution if it has a normal distribution with mean 0 and
standard deviation 1: that is, it has the PDF
\[
f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad \text{for } z \in (-\infty, \infty).
\]
The following example proves that all questions about normal distributions can be reformulated as a question
about the standard normal distribution.
Example 8. From normal to standard normal distributions

Let X be normally distributed with mean µ and standard deviation σ. Let Z be normally distributed with mean
0 and standard deviation 1. Show that for any a,
P (X ≤ a) = P (Z ≤ (a − µ)/σ)
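Solution. Since X has a normal distribution with mean µ and standard deviation σ, we have that
\[
P(X \le a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx
\]
Consider the change of variables z = (x − µ)/σ. Then $dz = \frac{dx}{\sigma}$, z = (a − µ)/σ when x = a,
$\lim_{x\to-\infty} z = -\infty$, and
\[
\begin{aligned}
P(X \le a) &= \int_{-\infty}^{(a - \mu)/\sigma} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{z^2}{2}}\, \sigma\, dz \\
&= \int_{-\infty}^{(a - \mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\, dz \\
&= P(Z \le (a - \mu)/\sigma)
\end{aligned}
\]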
z-scores
Sometimes we want to know the percent of occurrence for scores that do not happen to be 1, 2, or 3 standard
deviations from the mean. To this end, we use z-scores (sometimes called standard scores) to determine how far, in
terms of standard deviations, a given score is from the mean of the distribution. We can use Table 7.2 to
find the percent of values between the mean and the value that is 1 standard deviation from the mean. In Table 7.2,
look in the row labeled (at the left) 1.0 and in the column headed 0.00. The entry is 0.3413, which is 34.13%. For
z = 1.2, look at the entry in the row marked 1.2 and the 0.00 column: it is 0.3849. Finally, suppose we want the entry for
z = 1.68; look at the row labeled 1.6 and the column headed 0.08 to find the entry 0.4535. This means that 45.35%
of the values in a normal distribution are between the mean and 1.68 standard deviations above the mean.
Using the Standard Normal Distribution Table 7.2, we can tackle the type of problem illustrated in the following
example.
Example 9. IQ Score in Socially Disparate Communities
Psychologists and sociologists use scores on standardized intelligence quotient (IQ) tests to predict performance
outcomes of individuals in different parts of society. In a study conducted by Dr. Naomi Breslau and colleagues,
subjects from communities in southeastern Michigan and the City of Detroit had their IQs tested at age 6 and then
again five years later at age 11† . The summary statistics of their results are given in Table 7.3.
Assume the distribution of IQs in each of the categories can be reasonably well approximated by a normal
distribution. Determine the proportion of 6-year-olds that have IQs less than 110 in each of the two normal birth
weight groups.
Solution. Let Z denote a standard normally distributed random variable. To answer this question for the urban
community, we have to standardize the value of 110 to a particular value z of the random variable Z as follows,
noting that from Table 7.3 the population mean and standard deviation for normal birth weight six-year-olds in this
community are µ = 99.1 and σ = 14.0:
\[
z = \frac{110 - 99.1}{14.0} = 0.779
\]
Hence, using the Z-table, we find that the desired probability is
\[
P(Z \le 0.779) \approx P(Z \le 0.78) = 0.50 + 0.2823 \approx 0.78
\]

† Breslau et al., 2001. Am. J. Epi. 154: 711–717

Table 7.2: Standard Normal Distribution
Table 7.3: Mean score, with standard deviation in parentheses, of IQ measurements of individuals at age 6 and five
years later at age 11 for children in Michigan, stratified by birth weight and home location into 8 cases.

                                  Urban community                Suburban community
                                  6-year-olds    11-year-olds    6-year-olds    11-year-olds
Low birth weight (≤ 2500 g)       90.1 (15.6)    88.1 (14.7)     107.0 (15.0)   107.8 (14.8)
Normal birth weight (> 2500 g)    99.1 (14.0)    94.1 (13.6)     113.3 (15.4)   112.8 (14.3)
To answer this question for the suburban community, we have to standardize the value of 110 to a particular
value z of the random variable Z as follows, noting that from Table 7.3 the population mean and standard deviation
for normal birth weight six-year-olds in this community are µ = 113.3 and σ = 15.4:
\[
z = \frac{110 - 113.3}{15.4} = -0.214.
\]
Hence, using the Z-table, we get that the desired probability is
\[
P(Z \le -0.214) \approx P(Z \le -0.21) = 0.50 - 0.0832 \approx 0.42
\]
Thus 78% of normal birth weight urban 6-year-olds, but only 42% of normal birth weight suburban 6-year-olds, have IQs less than 110. □
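The table lookups in Example 9 can be cross-checked in code. The sketch below standardizes the score 110 for each group and evaluates the standard normal CDF with Python's statistics.NormalDist. The dictionary groups is just a convenient container for the two (mean, standard deviation) pairs read from Table 7.3, and the CDF is evaluated at the unrounded z-score, so small differences from the table-based answers are expected.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# (mean, standard deviation) for normal birth weight 6-year-olds, from Table 7.3.
groups = {"urban": (99.1, 14.0), "suburban": (113.3, 15.4)}

threshold = 110
proportions = {}
for name, (mu, sigma) in groups.items():
    z = (threshold - mu) / sigma              # z-score of an IQ of 110
    proportions[name] = standard_normal.cdf(z)  # P(Z <= z)
    print(f"{name}: z = {z:.3f}, P(IQ < 110) = {proportions[name]:.3f}")
```

Equivalently, NormalDist(mu, sigma).cdf(110) gives the same probabilities without the explicit standardization, which is exactly the identity established in Example 8.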
Lognormal distribution
One of the problems with the normal distribution is that it is technically associated with random variables X that
are defined on the interval (−∞, ∞), while very often biological data can assume only positive values (e.g. height or
weight), or may be constrained to lie on a closed interval such as [0, 1] (e.g. proportions). This may not be a problem if
only the extreme tail of the distribution is associated with negative values of X, so that ignoring all these negative values
corresponds to losing a very small part of the distribution, as in the wheat yield example illustrated in Figure 7.21.
Sometimes, however, the “normality” of a data set is not apparent until it has been appropriately transformed to
take values on (−∞, ∞) rather than on some smaller interval of the real number line. For example, a very common
transformation for data sets of positive values generated by the random variable X is the log transformation ln X,
which arises by taking the natural logarithm of all the data values. Data that exhibit a normal distribution after
such a transformation are said to be lognormally distributed.
Lognormal Distribution I
A random variable X is lognormally distributed if ln X is normally distributed. In other words, there exist
parameters µ and σ > 0 such that
\[
P(\ln X \le a) = P(X \le e^{a}) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx
\]
for any real number a.
The above definition provides the CDF for the lognormal distribution. In Example 11 you are asked to derive
the PDF from this definition of the CDF and in the Problem Set at the end of the Section you are asked to show
that the mean and variance of the lognormal satisfy the relationships given below.
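Lognormal Distribution II
The PDF of the lognormal distribution is defined in terms of two parameters µ and σ > 0 by the function
\[
f(x) =
\begin{cases}
\dfrac{1}{x \sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} & \text{for } x > 0 \\
0 & \text{otherwise.}
\end{cases}
\]
The mean m and variance v of this distribution are given by
\[
m = e^{\mu + \sigma^2/2}
\qquad \text{and} \qquad
v = \left( e^{\sigma^2} - 1 \right) e^{2\mu + \sigma^2}.
\]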
Example 10. Chicken pox latency periods
In a paper entitled, “The distribution of incubation periods of infectious disease,” Sartwell found that the latency
period of chicken pox was approximately lognormally distributed. The latency period is the period of time from
a person initially getting infected to the moment they exhibit their first symptoms. These latency periods were
measured in days. Taking the natural logs of the latency periods, Sartwell estimated the mean of the log-transformed
data as µ = ln(14) ≈ 2.639 and the standard deviation as σ ≈ 0.13. Using these estimates, find the following
quantities:
a. The fraction of individuals that started exhibiting symptoms within the first two weeks.
b. The fraction of individuals that started exhibiting symptoms after 18 days.
c. The fraction of individuals that started exhibiting symptoms between the 12th day and the 15th day.

Solution.
a. Since the log of the data is normal, we need to take the natural logarithm of 14 days to determine the
fraction of individuals that exhibited symptoms within 14 days. Since ln 14 ≈ 2.639 is the mean of the
transformed data and this data is normally distributed with mean ln 14, 50% of the data lies to the left
of ln 14. Hence, 50% of the people exhibited symptoms in the first two weeks.
b. Since the logarithm of the data is normal, we need to determine what fraction of the log-transformed data
is to the right of ln 18 ≈ 2.890. Converting to standard normal units, we obtain z = (2.890−2.639)/0.13 ≈
1.93. Using the Standard Normal Distribution Table 7.2, we see that for z = 1.93 approximately 47.3%
of the log-transformed data lies between the mean 2.639 and 2.890. Hence (50 − 47.3)% = 2.7% of the
log-transformed data lies to the right of 2.890. Equivalently, approximately 2.7% of the people started
exhibiting symptoms after 18 days.
c. Since the logarithm of the data is normal, we need to determine what fraction of the log-transformed data
lies between ln 12 ≈ 2.485 and ln 15 ≈ 2.708. Converting to standard normal units, we obtain, to the left and
right of the mean respectively, z = (2.485 − 2.639)/0.13 ≈ −1.18 and z = (2.708 − 2.639)/0.13 ≈ 0.53.
From the Standard Normal Distribution Table 7.2 we find that z = −1.18 implies that approximately
38.1% of the log-transformed data lies between 2.485 and the mean 2.639 (i.e. within 1.18 standard
deviations below the mean) and z = 0.53 implies that approximately 20.2% of the log-transformed data lies
between 2.639 and 2.708. Thus 38.1% + 20.2% = 58.3% of the log-transformed data lies between 2.485
and 2.708, which implies that approximately 58.3% of the people started to exhibit symptoms between
the 12th and 15th day.
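The three quantities in Example 10 can also be computed without a table by working directly with the normal distribution of the log-transformed latency periods. The sketch below uses statistics.NormalDist with µ = ln 14 and σ = 0.13 as given in the example; prob_within is a hypothetical helper name. Because the code does not round z to two decimals, its answers can differ from table-based values by a few tenths of a percentage point.

```python
import math
from statistics import NormalDist

# ln(latency) is modeled as normal with mean ln(14) and standard deviation 0.13.
log_latency = NormalDist(mu=math.log(14), sigma=0.13)

def prob_within(days):
    """P(latency <= days) = P(ln(latency) <= ln(days))."""
    return log_latency.cdf(math.log(days))

p_two_weeks = prob_within(14)                  # part (a)
p_after_18 = 1.0 - prob_within(18)             # part (b)
p_12_to_15 = prob_within(15) - prob_within(12)  # part (c)

print(f"{p_two_weeks:.3f} {p_after_18:.3f} {p_12_to_15:.3f}")
```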
The following example determines the PDF for the lognormal distribution and explores the effects of the param-
eters µ and σ on the shape of the distribution.
Example 11. Lognormal PDF
Let X be a random variable such that ln X has a normal distribution with mean µ and standard deviation σ.
a. Use a change of variables of integration to find the PDF for X.
b. For σ = 1, plot the PDF of X with µ = −1, 0 and 1. Discuss how changing µ influences the shape of the
PDF of X.
c. For µ = 0, plot the PDF of X with σ = 0.5, 1 and 1.5. Discuss how changing σ influences the shape of
the PDF of X.
Solution.
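a. Since ln X is normally distributed, the PDF of ln X is given by
\[
\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\]
To determine the PDF of X, let's begin by finding an expression for P(X ≤ a) for any positive real
number a. Since X ≤ a if and only if ln X ≤ ln a, we obtain
\[
\begin{aligned}
P(X \le a) &= P(\ln X \le \ln a) \\
&= \int_{-\infty}^{\ln a} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}\, dx \\
&= \int_{0}^{a} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln y - \mu)^2}{2\sigma^2}}\, \frac{dy}{y}
\qquad \text{with the change of variables } y = e^{x}
\end{aligned}
\]
By the fundamental theorem of PDFs, the PDF of X is given by
\[
f(x) = \frac{1}{x \sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \qquad \text{for } x > 0.
\]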
b. Using technology to plot the PDF of X with µ = −1, 0, 1 and σ = 1 yields
[Figure: the lognormal PDF f(x) plotted against x for σ = 1 and µ = −1, 0, and 1]
Increasing µ moves the center of the distribution to the right and increases the spread of the distribution
about the center.
c. Using technology to plot the PDF of X with σ = 0.5, 1, 1.5 and µ = 0 yields
[Figure: the lognormal PDF f(x) plotted against x for µ = 0 and σ = 0.5, 1, and 1.5]
Increasing σ moves the center of the distribution to the left, but still increases the spread of the distribution
as represented by the size of the tails (i.e. the area under the curve beyond, in this case, x = 2 to 3).
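The PDF derived in part (a) can be checked numerically against the mean formula m = e^(µ + σ²/2) quoted earlier. The sketch below is an illustration under stated assumptions: lognormal_pdf and moment are hypothetical helper names, and the integral over x > 0 is evaluated after the substitution x = e^u (so dx = e^u du) with a midpoint rule on a window of ±10σ around µ, which is assumed wide enough that the neglected tails are negligible for the parameter values used here.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Lognormal PDF from part (a); zero for x <= 0."""
    if x <= 0:
        return 0.0
    exponent = -((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(exponent) / (x * sigma * math.sqrt(2.0 * math.pi))

def moment(k, mu, sigma, width=10.0, steps=100_000):
    """Approximate the integral of x^k f(x) over x > 0 via x = e^u."""
    lower, upper = mu - width * sigma, mu + width * sigma
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = math.exp(lower + (i + 0.5) * h)
        total += x ** k * lognormal_pdf(x, mu, sigma) * x  # extra x from dx = e^u du
    return total * h

mu, sigma = 0.0, 0.5  # one of the parameter pairs plotted in part (c)
total_mass = moment(0, mu, sigma)      # should be close to 1
numeric_mean = moment(1, mu, sigma)    # should be close to e^(mu + sigma^2/2)
formula_mean = math.exp(mu + sigma ** 2 / 2.0)

print(total_mass, numeric_mean, formula_mean)
```

Evaluating lognormal_pdf on a grid of x values for the parameter choices in parts (b) and (c) gives the data needed to redraw the plots with any plotting tool.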

Example 12. Survival of moths
An entomologist needs adult moths for her wind tunnel studies on how moths navigate their way in flight using
pheromones in an odor plume. In a pilot study, she reared the moths from eggs until they eclosed from their pupal
stage and then she selected 194 of the healthiest looking individuals for her flight studies. In the first and second
columns of Table 7.4, the number of moths dying each week is given until the last moth dies in the 28th week.
Now calculate the following:
a. the proportion of moths dying each week and variance of the resulting distribution
b. the mean age of death
c. the variance of age at death
d. the proportions expected to die each week if these proportions follow a lognormal distribution that has
the observed mean and variance
e. the proportions expected to die each week if these proportions follow a normal distribution that has the
observed mean and variance
f. the number of individuals that need to be reared out for the main study given that ten weeks after the
start of the study, the entomologist needs 600 for the core component of the study.
Table 7.4: Number of moths dying each week (values rounded to 3 decimal places for presentation purposes, but
actual calculations of means and variances involve the precision of the technology used).
Week (i) Number that die Proportion that die (pi ) Lognormal Fit Normal Fit
1 0 0.000 0.000 0.026
2 3 0.015 0.009 0.035
3 12 0.062 0.044 0.044
4 15 0.077 0.085 0.054
5 26 0.134 0.109 0.063
6 21 0.108 0.116 0.071
7 13 0.067 0.109 0.076
8 20 0.103 0.097 0.079
9 15 0.077 0.082 0.079
10 10 0.052 0.068 0.076
11 7 0.036 0.055 0.070
12 16 0.082 0.045 0.062
13 3 0.015 0.036 0.053
14 3 0.015 0.029 0.043
15 10 0.052 0.023 0.034
16 5 0.026 0.018 0.026
17 2 0.010 0.015 0.019
18 5 0.026 0.012 0.013
19 0 0.000 0.009 0.009
20 0 0.000 0.007 0.006
21 3 0.015 0.006 0.003
22 1 0.005 0.005 0.002
23 1 0.005 0.004 0.001
24 0 0.000 0.003 0.001
25 0 0.000 0.003 0.000
26 0 0.000 0.002 0.000
27 2 0.010 0.002 0.000
28 1 0.005 0.001 0.000
Sum 194 1 1 0.94
Mean 8.46 8.46 8.46
Variance 25.15 25.15 25.15
Solution.

a. Since the total number of moths at the beginning of the first week is 194, the proportion dying in week
i (i = 1, ..., 28) is the number dying in that week divided by 194. See Column 3 of Table 7.4.
b. The mean age of death is obtained from the calculation $m = \sum_{i=1}^{28} (i - 0.5)\,p_i$. Note that we have selected
the midpoint of each week to represent the point at which all individuals die during the week. This of
course is an approximation, but some approximation must be used because of the discrete nature of the
problem. The answer using an appropriate technology (e.g. spreadsheet software) is 8.46.
c. The variance associated with age of death is obtained from the calculation
$v = \sum_{i=1}^{28} (i - 0.5)^2\,p_i - m^2$. The answer using an appropriate technology is 25.15.
d. If the observed mean and variance are m = 8.46 and v = 25.15, then we need to use the relationships
$m = e^{\mu + \sigma^2/2}$ and $v = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2}$ to find parameters µ and σ that will give a
lognormal distribution with the observed mean m and variance v. In the problem set you are asked to show that the
resulting equations are (in terms of the notation used in this example)
\[
\mu = 2 \ln m - \frac{1}{2} \ln\left(m^2 + v\right)
\qquad \text{and} \qquad
\sigma^2 = -2 \ln m + \ln\left(m^2 + v\right).
\]
Solving these yields µ = 1.98 and σ² = 0.30. The lognormal distribution generated by these parameters is
given in Table 7.4 and visualized in Figure 7.22.
e. The proportions of individuals expected to die each week, if these proportions follow a normal distribution
that has the observed mean and variance, are given in Table 7.4 and visualized in Figure 7.22, from which it
is clear that the normal distribution is a poor fit to the real data from weeks 1 to 9. The fits of this normal
and the above lognormal can be compared using a least-squares measure, but this is not asked for in this
example.
f. The fitted lognormal predicts that 0.72 of the distribution is to the left of the ten week point (add
the first 10 entries of the lognormal fit in Table 7.4). Thus our best estimate of the expected proportion of
individuals left after 10 weeks is p = 0.28. Thus the number at the start of the experiment should be
600/0.28 ≈ 2143 individuals to obtain the expected amount. To be on the safe side, the entomologist may want
to start the experiment with 2200 adults.
Figure 7.22: The frequency of moths dying each week is plotted over the 31 week period for the actual data (closed
circles), as well as the lognormal (open circles) and normal (crosses) distributions that have the same mean and
variance as the data.
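The calculations in parts (b), (c), (d), and (f) of Example 12 can be reproduced from the weekly death counts in the second column of Table 7.4. The sketch below uses week midpoints as in the text and the standard library's statistics.NormalDist for the lognormal CDF; the variable names are illustrative. One difference from the text is that part (f) is evaluated here with the fitted lognormal CDF at exactly 10 weeks instead of by summing rounded table entries, so the required number of moths comes out somewhat different from the hand calculation.

```python
import math
from statistics import NormalDist

# Number of moths dying in weeks 1 through 28 (second column of Table 7.4).
deaths = [0, 3, 12, 15, 26, 21, 13, 20, 15, 10, 7, 16, 3, 3,
          10, 5, 2, 5, 0, 0, 3, 1, 1, 0, 0, 0, 2, 1]

total = sum(deaths)
proportions = [d / total for d in deaths]
ages = [week - 0.5 for week in range(1, len(deaths) + 1)]  # week midpoints

# Parts (b) and (c): mean and variance of age at death.
m = sum(age * p for age, p in zip(ages, proportions))
v = sum(age ** 2 * p for age, p in zip(ages, proportions)) - m ** 2

# Part (d): lognormal parameters matching the observed mean and variance.
log_m2_plus_v = math.log(m ** 2 + v)
mu = 2.0 * math.log(m) - 0.5 * log_m2_plus_v
sigma2 = -2.0 * math.log(m) + log_m2_plus_v

# Part (f): fraction dead by week 10 under the fitted lognormal,
# using P(X <= 10) = P(ln X <= ln 10).
fraction_dead = NormalDist(mu, math.sqrt(sigma2)).cdf(math.log(10))
moths_needed = 600 / (1.0 - fraction_dead)

print(f"m = {m:.2f}, v = {v:.2f}, mu = {mu:.2f}, sigma^2 = {sigma2:.2f}")
print(f"fraction dead by week 10 = {fraction_dead:.2f}, moths needed = {moths_needed:.0f}")
```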

Problem Set 7.4

Assume that a data set is normally distributed with a mean of 0 and a standard deviation of 1. A value x is randomly
selected. Find the probability requested in Problems 1 to 8.
1. P (0 ≤ x < 0.85)
2. P (0 ≤ x ≤ 1.45)
3. P (x ≤ 0)
4. P (x > 0)
5. P (x ≥ 0.55)
6. P (x < −1.00)
7. P (−1.00 < x < 0.75)
8. P (0.65 < x < 0.95)
9. In a normally distributed collection of scores with mean 0 and a standard deviation 1, find the area under the
curve bounded by the lines z = 1.20 and z = 1.90 and compare this with the value of z = 1.90 − 1.20 = 0.70
in Table 7.2.
10. For the normal distribution with mean µ = 1 and standard deviation σ = 1 calculate P (x ≤ 1).
11. For the normal distribution with mean µ = −1 and standard deviation σ = 1 calculate P (x ≥ 0).
12. For the normal distribution with mean µ = 1 and standard deviation σ = 2 calculate P (x > 1).
13. For the normal distribution with mean µ = 1 and standard deviation σ = 2 calculate P (x > 0).
14. For the normal distribution with mean µ = 0 and standard deviation σ = 2 calculate P (−1.00 < x < 0.75).
15. For the normal distribution with mean µ = −2 and standard deviation σ = 2 calculate P (−3.00 < x < −1.00).
16. For the normal distribution with mean µ = 10 and standard deviation σ = 0.5 calculate P (0 < x < 10).
17. Consider Example 1 with r = 0.1 (units 1/months) and y(0) = 0.5 (a relatively slow spreading disease).
a. Solve the differential equation for y(t).
c. Find the probability that a randomly chosen individual is infected with the disease in the next two
months.
18. Consider Example 1 with r = 3 (units 1/months) and y(0) = 0.5 (a relatively fast spreading disease).
c. Find the probability that a randomly chosen individual is infected with the disease in the next 15 days
(i.e. 0.5 months).
19. Consider Example 1 with r = 1 (units 1/months) and y(0) = 0.1 (i.e. 10% of the population have the disease)
c. Find the probability that a randomly chosen individual is infected with the disease in 1 month.

20. Consider Example 1 with r = 0.5 (units 1/months) and y(0) = 0.3
c. Find the probability that a randomly chosen individual is infected with the disease in the next 1.5 months.
Use logistic regression to find the best fitting functions p(t) to the data in the sets D = {(t1 , p1 ), ..., (tn , pn )} given in
Problems 21 to 24.
21. D = {(1, 0.10), (2, 0.15), (3, 0.30), (4, 0.49), (5, 0.58), (6, 0.76), (7, 0.87), (8, 0.95), (9, 0.93), (10, 0.98)}
22. D = {(1, 0.03), (2, 0.02), (3, 0.09), (4, 0.08), (5, 0.21), (6, 0.30), (7, 0.52), (8, 0.61), (9, 0.88), (10, 0.84)}
23. D = {(1, 0.01), (3, 0.01), (5, 0.03), (7, 0.03), (9, 0.10), (11, 0.18), (13, 0.29), (15, 0.48), (17, 0.73), (19, 0.85), (21, 0.87)}
24. D = {(1, 0.17), (3, 0.16), (5, 0.27), (7, 0.34), (9, 0.44), (11, 0.58), (13, 0.63), (15, 0.78), (17, 0.77), (19, 0.85), (21, 0.92)}
25. In a large study, human birth weights were found to be approximately normally distributed with mean 120
ounces and standard deviation 18 ounces.
a. Find the probability that a randomly chosen baby has a birth weight of 8 lbs or less.
b. Find the probability that a randomly chosen baby weighs between 6 and 8 lbs at the time of birth.
c. Find the probability that a randomly chosen baby weighs more than 9 lbs at birth.
26. A patient is said to be hyperkalemic (high levels of potassium in the blood) if the measured level of potassium is
5.0 milliequivalents per liter (meq/L) or more. In a population of students at Ozark University, the distribution
of potassium levels is normally distributed with mean 4.5 meq/L and standard deviation 0.4 meq/L. Estimate
the proportion of students that are hyperkalemic.
27. The gestation period of a pregnant woman is normally distributed with mean 279 days and standard deviation
of 16 days.
a. Find the probability that the gestation period is between 263 days and 295 days.
b. Find the probability the gestation period is greater than 303 days.
28. Answer the following questions for the data in Example 9.
a. What are the IQ values that correspond to the 95th percentile for each of the two 6-year-old low birth weight
groups in Table 1?
b. In the normal birth weight urban and suburban communities what is the change from age 6 to age
11 in the estimated proportion of individuals that have an IQ of 140 and above?
c. In the two 11-year old low birth weight communities the 50th percentile of the suburban community
corresponds to which percentile in the urban community?
In Problems 29 to 32 below we emphasize that we are dealing with the lognormal distribution and recall that the value
$e^{\mu}$ is not the mean but the median (see Problem 34 below) and that the dispersion parameter σ is not the square root
of the variance v of the distribution.
29. The latent period of disease is the time from a person initially getting infected to the moment they exhibit their
first symptoms. In a 1950 paper entitled, “The distribution of incubation periods of infectious disease,” Sartwell
found that the latency period (measured in days) of Salmonellosis was approximately lognormally distributed.
Taking the natural logs of the latency periods that are measured in days, he estimated that µ = ln(2.4) and
σ = ln(1.47). Using these estimates, find the following quantities:
a. The fraction of individuals that start exhibiting symptoms within the first three days.

b. The fraction of individuals that start exhibiting symptoms after four days.
c. The fraction of individuals that start exhibiting symptoms between the start of the 2nd day and end
of the 3rd day.
30. The latent period of disease is the time from a person initially getting infected to the moment they exhibit
their first symptoms. In a 1950 paper entitled, “The distribution of incubation periods of infectious disease,”
Sartwell found that the latency period (measured in days) of Poliomyelitis was approximately lognormally
distributed. Taking the natural logs of the latency periods that are measured in days, he estimated that
µ = ln(12.6) and σ = ln(1.5). Using these estimates, find the following quantities:
a. The fraction of individuals that start exhibiting symptoms within the first two weeks.
b. The fraction of individuals that start exhibiting symptoms after 10 days.
c. The fraction of individuals that start exhibiting symptoms between the start of the 12th day and the
end of the 15th day.
31. The survival time after cancer diagnosis is the number of days a patient lives after being diagnosed with cancer.
In a paper entitled, “Variation in the duration of survival of patients with the chronic leukemias,” Feinleib and
McMahon found that the survival time for female patients diagnosed with lymphatic leukemia (measured
in months) was approximately lognormally distributed. Taking the natural logs of the survival times, they
estimated that µ = ln(17.2) and σ = ln(3.21). Using these estimates, find the following quantities:
a. The fraction of individuals that survived less than one year.

b. The fraction of individuals that survived at least two years.
c. The fraction of individuals that survived between 1 and 1.5 years.
32. The survival time after cancer diagnosis is the number of days a patient lives after being diagnosed with cancer.
In a paper entitled, “Variation in the duration of survival of patients with the chronic leukemias,” Feinleib and
McMahon found that the survival time for female patients diagnosed with myelocytic leukemia (measured
in months) was approximately lognormally distributed. Taking the natural logs of the survival times, they
estimated that µ = ln(15.9) and that σ = ln(2.80). Using these estimates, find the following quantities:
a. The fraction of individuals that survived less than one year.

b. The fraction of individuals that survived at least two years.
c. The fraction of individuals that survived between the start of month 13 and the end of month 18 (i.e.
between 1 and 1.5 years).
33. In looking over her data, the entomologist mentioned in Example 12 found that she had transposed the number
of individuals dying in weeks 7 and 8. After fixing this mistake, redo all the calculations covered in this example
and see how much difference it makes to the estimate of the mean and variance associated with the actual
data, and the number of individuals that should be reared for the main experiment.
34. Show that the PDF of a normal curve has its maximum, i.e. median, at x = µ, and points of inflection at
x = µ + σ and x = µ − σ.
35. Consider $f(x) = 1.5x^{-2.5}$ on [1, ∞).
a. Show it is a PDF.
b. Show it has finite mean and an infinite variance.
36. Consider Example 1 with r > 0 and y(0) = y0 ∈ (0, 1) given.

37. Consider Example 1 with r > 0 and y(0) = y0 ∈ (0, 1) given.

a. Verify that y(t) can be written as $y(t) = \frac{1}{1+e^{a-rt}}$ where $a = \ln(1/y_0 - 1)$.
b. Find the PDF for this CDF.
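38. For the lognormal distribution defined by
\[ f(x) = \begin{cases} \dfrac{1}{x\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}} & \text{for } x > 0 \\ 0 & \text{otherwise} \end{cases} \]
show that the mean m and variance v are given by
\[ m = e^{\mu+\sigma^2/2} \]
and
\[ v = \left(e^{\sigma^2} - 1\right) e^{2\mu+\sigma^2}. \]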
39. If X is a normally distributed random variable with mean $m_x$ and variance $\sigma_x^2$ and Y is a lognormally
distributed random variable with mean $m_y$ and variance $\sigma_y^2$ then show that
\[ m_x = 2\ln m_y - \frac{1}{2}\ln\left(m_y^2 + \sigma_y^2\right) \]
and
\[ \sigma_x^2 = -2\ln m_y + \ln\left(m_y^2 + \sigma_y^2\right) \]
40. The Gompertz equation is given by
\[ \frac{dy}{dt} = -ry\ln(y) \]
This equation can be used to model a variety of population processes including tumor growth, population
growth, and acquisition of new technologies. For instance, the Gompertz equation has been used to model
mobile phone uptake, where y(t) is the fraction of individuals that have a mobile phone by time t (say in years)
and r is a parameter that can be fitted to the actual data. Using this model, we can derive a probability density
function that represents the time at which an individual acquired their first mobile phone. To illustrate this
idea, let's assume that y(0) = 1/e (i.e. currently 36.79% of the people that will have mobile phones have mobile
phones) and r = 1.
a. Solve the differential equation for r = 1 and y(0) = 1/e.
b. Verify that F (t) = 1 − y(t), where y(t) is the solution found in a. is a CDF.
c. Find the PDF for your CDF.
d. Compute the probability that a randomly chosen individual acquires a mobile phone 2 years from
now.

7.5 Life tables

In Section 6.1 we introduced the simplest differential equation model of population growth, i.e.
\[ \frac{dN}{dt} = rN \]
This model, as well as the later models we considered, implicitly assumes that all individuals, whether young or old,
have the same mortality and fecundity rates. While this assumption is a useful first approximation, mortality and
fecundity are often age-dependent. For instance, most animals have a life history that culminates in reproductively
capable (or sexually mature) individuals only after they have reached a particular age. Additionally, for many
organisms, the risk of mortality is higher at younger and older ages than at intermediate ages. In this section,
we consider models that account for age-specific mortality and reproduction.
Figure 7.23: Albertosaurus in action!
Age-specific mortality
In a recent Science article, biology professor Gregory Erickson∗ and colleagues studied fossils of four North American
tyrannosaurs: Albertosaurus, Tyrannosaurus, Gorgosaurus, and Daspletosaurus. Using the femur bones of these
fossils, the scientists estimated the life spans of the dinosaurs. The estimated life spans ranged from 2 years to 28
years. Using these estimates, the scientists created a life table for each of the dinosaurs. These life tables keep track
of what fraction l(t) of individuals survived to age t. For example, the life table for Albertosaurus sarcophagus (see
Figure 7.23) is reported in Table 7.5.
This table asserts that 18% of these dinosaurs survived at least 20 years. The function l(t) in this table is an
example of a survivorship function.
∗ G. M. Erickson et al. 2006. Tyrannosaur Life Tables: An example of nonavian dinosaur population biology. Science 313:213–216

Table 7.5: Life table for Albertosaurus sarcophagus.

Age t in years    l(t)
 2                1.0
 4                0.96
 6                0.91
 8                0.86
10                0.77∗
12                0.73
14                0.64
16                0.45
18                0.32
20                0.18
22                0.11∗
24                0.08∗
26                0.06∗
28                0.04

∗ Interpolated values.
Survivorship function. A function l : [0, ∞) → [0, 1] is a survivorship function if

• l(0) = 1, i.e. all individuals survive to age 0.
• l(t) is non-increasing, i.e. if an individual survived to age t, then it survived to all earlier ages.
• $\lim_{t\to\infty} l(t) = 0$, i.e. all individuals eventually die.
Example 1. Aging dinosaurs
Use Table 7.5 to do the following:
a. Determine what fraction of dinosaurs die between ages 4 and 6.
b. Determine what fraction of dinosaurs die between ages 10 and 14.
c. Plot l(t) and discuss its shape.
Solution.
a. Since l(4) = 96% of the dinosaurs survive to age 4 and l(6) = 91% of the dinosaurs survive to age 6,
l(4) − l(6) = 5% die between ages 4 and 6.
b. Since l(10) = 77% of the dinosaurs survive to age 10 and l(14) = 64% of the dinosaurs survive to age 14,
l(10) − l(14) = 13% die between ages 10 and 14.
c. Plotting l(t) with technology yields

[Plot: survivorship l(t) (vertical axis, 0 to 1) against age t in years (horizontal axis, 0 to 30).]
As we expect, l(t) is a decreasing function of t. In other words, the fraction of individuals surviving
decreases with age. l(t) is concave down for ages less than approximately 15. Hence, survivorship
decreases at an accelerating rate over these younger ages. At the older ages, l(t) is concave up and
survivorship decreases at a slower rate. The reason for this in organisms such as fruit flies has been
shown to be related to genetic factors that influence longevity: by a certain age, the only individuals
left are those that have genes promoting longevity. This subgroup of individuals is responsible for a
longer tail than expected in the survivorship function for the population as a whole.
□
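The subtractions in parts a and b of Example 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary `life_table` and the helper `fraction_dying` are names introduced here for convenience, and the values are copied from Table 7.5.

```python
# Life table for Albertosaurus (Table 7.5): age in years -> fraction surviving l(t)
life_table = {2: 1.0, 4: 0.96, 6: 0.91, 8: 0.86, 10: 0.77, 12: 0.73, 14: 0.64,
              16: 0.45, 18: 0.32, 20: 0.18, 22: 0.11, 24: 0.08, 26: 0.06, 28: 0.04}


def fraction_dying(table, age_from, age_to):
    """Fraction of the original cohort that dies between two tabulated ages."""
    return table[age_from] - table[age_to]


print(round(fraction_dying(life_table, 4, 6), 2))    # part a
print(round(fraction_dying(life_table, 10, 14), 2))  # part b
```

The helper only works for ages that appear in the table; ages between table entries would require interpolation.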
Survivorship functions have a natural relationship to CDFs of an appropriate random variable as the following
example shows.
Example 2. From survivorship to CDF
Let l(t) be the survivorship function for Albertosaurus and let X be the age at which a randomly chosen
Albertosaurus dies. If F is the CDF for X, then determine the relationship between F and l.
Solution. Since l(t) is the fraction of individuals that die after age t, l(t) = P (X > t). Since P (X > t) =
1 − P (X ≤ t) = 1 − F (t), we have that l(t) = 1 − F (t) or, equivalently, that F (t) = 1 − l(t). □
Using Table 7.5, we can determine how the mortality rates of the Albertosaurus vary with age. In particular,
imagine (as did a famous movie!) that on a remote island scientists were able to create 100 Albertosaurus babies. Of
these 100, the life table implies that all of them would survive to age 2 or, more plausibly, the life table begins with
individuals of age 2 and only considers mortality from age 2 onwards. Thus for every 100 individuals that survive
to age 2, only 96 survive to age 4. Therefore 4% die over two years and the mortality rate is approximately 2% per
year. Equivalently, we could have computed this mortality rate as follows:
\[ \frac{1}{2}\,\frac{l(2) - l(4)}{l(2)} = \frac{1}{2}\,\frac{1.0 - 0.96}{1.0} = 0.02 \text{ per year} \]
By age 6, there are only 91 individuals left. In other words, 5 of the 96 individuals alive at age 4 die between
the ages of 4 and 6. Hence, 5/96 ≈ 5.2% of individuals die per two years or 2.6% per year. Equivalently, we could
have computed this mortality rate as follows:
\[ \frac{1}{2}\,\frac{l(4) - l(6)}{l(4)} \approx 0.026 \text{ per year} \]
In the following example, you compute and interpret the mortality rates for the remaining age classes.
Example 3. Dinosaur mortality rates
Using the life table for Albertosaurus sarcophagus:

a. Determine the age-specific mortality rates.

b. Discuss which ages were most susceptible and least susceptible to mortality.
Solution.
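a. We already found that for ages 2 and 4 the mortality rates are approximately 0.020 and 0.026 per year.
To determine the mortality rate at age 8, for example, we can compute
\[ \frac{1}{2}\,\frac{l(8) - l(10)}{l(8)} \approx 0.052 \text{ per year} \]
Computing the remaining mortality rates m(t) yields:

Age t in years    m(t) per year
 2                0.020
 4                0.026
 6                0.0275
 8                0.0523
10                0.0260
12                0.0616
14                0.1484
16                0.1444
18                0.2188
20                0.1944
22                0.1364
24                0.1250
26                0.1667

b. This table suggests that, overall, mortality risk tends to increase as individuals get older. Ages 2
through 10 are the least susceptible (rates of about 2% to 5% per year), while ages 14 and above are the
most susceptible (rates of about 12% to 22% per year), with the highest rate at age 18.
□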
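The age-specific mortality rates of Example 3 can be reproduced with a short Python sketch. The function name `mortality_rates` and the list names are introduced here for illustration only; the data are the values in Table 7.5, and the formula is the two-year difference quotient used in the text.

```python
ages = list(range(2, 30, 2))  # tabulated ages 2, 4, ..., 28
survivorship = [1.0, 0.96, 0.91, 0.86, 0.77, 0.73, 0.64,
                0.45, 0.32, 0.18, 0.11, 0.08, 0.06, 0.04]


def mortality_rates(ages, survivorship):
    """Per-year mortality rate at each age, from consecutive life table entries."""
    rates = {}
    for i in range(len(ages) - 1):
        dt = ages[i + 1] - ages[i]  # step between measurements (2 years here)
        rates[ages[i]] = (survivorship[i] - survivorship[i + 1]) / (dt * survivorship[i])
    return rates


rates = mortality_rates(ages, survivorship)
for age, rate in rates.items():
    print(f"{age:2d}  {rate:.4f}")
```

No rate is produced for age 28 because the table has no later entry to difference against.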
In Example 3, we computed the mortality rate using the relationship
\[ m(t) = \frac{1}{2}\,\frac{l(t) - l(t+2)}{l(t)} \text{ per year} \]
If we view 2 as the step size ∆t between measurements, then this equation becomes
\[ m(t) = \frac{1}{\Delta t}\,\frac{l(t) - l(t+\Delta t)}{l(t)} \text{ per year} \]
Multiplying both sides of this equation by −l(t) yields
\[ -l(t)\,m(t) = \frac{l(t+\Delta t) - l(t)}{\Delta t} \]
Taking the limit as ∆t approaches 0 provides us with the following result.
Survivorship-mortality equation. If l(t) is the fraction of individuals that survive to age t and m(t) is the
mortality rate at age t, then l(t) and m(t) satisfy the equation
\[ l'(t) = -m(t)\,l(t) \]
Equivalently,
\[ m(t) = -\frac{l'(t)}{l(t)} \]
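As a worked step that the text does not spell out, the survivorship-mortality equation can be solved for l(t) in terms of m(t). The sketch below assumes l(t) > 0, l(0) = 1, and that m is integrable on [0, t]; the dummy variable s is introduced here for the integration.

```latex
\begin{align*}
\frac{l'(t)}{l(t)} &= -m(t) \\
\ln l(t) - \ln l(0) &= -\int_0^t m(s)\,ds \\
l(t) &= \exp\left(-\int_0^t m(s)\,ds\right) \qquad \text{since } l(0) = 1
\end{align*}
```

In words, survivorship to age t is determined by the accumulated mortality up to age t.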

Example 4. Constant mortality rates
For many short-lived mammals and birds, the mortality rate m(t) is approximately constant†. Assuming that
m(t) = m is a constant, determine l(t) and the CDF associated with this survival function. Does it look familiar?
Solution. If m(t) = m is constant, then l′(t) = −m l(t). The general solution to this equation is $l(t) = l(0)e^{-mt}$.
Since all individuals survive to age 0, l(0) = 1 and $l(t) = e^{-mt}$. In Example 2, we noted that $1 - l(t) = 1 - e^{-mt}$ for
t ≥ 0 is the CDF for the distribution of ages at death. This CDF corresponds to the exponential distribution with
mean 1/m. Hence, for individuals with a constant mortality rate of m per year, the life expectancy is 1/m years. □
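For readers who want to confirm the stated mean, the following computation is a sketch that recovers 1/m directly from the PDF $m e^{-mt}$, using integration by parts with u = t and $dv = m e^{-mt}\,dt$ (these choices of u and dv are ours, made for illustration).

```latex
\begin{align*}
\int_0^\infty t\, m e^{-mt}\,dt
  &= \lim_{b\to\infty} \Big[-t e^{-mt}\Big]_0^b + \int_0^\infty e^{-mt}\,dt \\
  &= 0 + \lim_{b\to\infty} \left[-\frac{1}{m} e^{-mt}\right]_0^b \\
  &= \frac{1}{m}
\end{align*}
```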
Example 5. T. Rex
Biology professor Gregory Erickson∗ used data from the fossil record and non-linear regression to estimate the
following survivorship curve for Tyrannosaurus rex:
\[ l(t) = e^{0.009 - 0.009e^{0.2214t}} \]
where t is measured in years. The plot of this function is shown below:
[Plot: survivorship l(t) (vertical axis, 0 to 1) against age in years (horizontal axis, 0 to 30).]
Compute and interpret the mortality rate m(t) for T. rex.

Solution. The mortality rate is given by $m(t) = -\frac{l'(t)}{l(t)}$. By the chain rule,
\begin{align*}
l'(t) &= e^{0.009 - 0.009e^{0.2214t}} \left(-0.009\,e^{0.2214t} \cdot 0.2214\right) \\
      &= -l(t)\, 0.00199\, e^{0.2214t}
\end{align*}
Therefore,
\[ m(t) = -\frac{l'(t)}{l(t)} = 0.00199\,e^{0.2214t} \text{ per year} \]
Hence the instantaneous mortality rate is initially low (approximately 0.2% per year in the first year) and
increases exponentially to close to 50% per year in the 25th year. □
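The mortality rate found in Example 5 can be checked numerically. The sketch below compares the closed form with a central-difference estimate of −l′(t)/l(t); the function names and the step size `h` are choices made here for illustration, not part of the original analysis.

```python
import math


def l(t):
    """T. rex survivorship curve from Example 5."""
    return math.exp(0.009 - 0.009 * math.exp(0.2214 * t))


def m(t):
    """Closed-form mortality rate obtained with the chain rule."""
    return 0.009 * 0.2214 * math.exp(0.2214 * t)


def m_numeric(t, h=1e-5):
    """Central-difference estimate of -l'(t) / l(t)."""
    return -(l(t + h) - l(t - h)) / (2 * h * l(t))


for age in (0, 10, 25):
    print(age, round(m(age), 4), round(m_numeric(age), 4))
```

The two columns should agree to the displayed precision, which supports the chain rule computation.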
† T.A. Ebert. Plant and Animal Populations: Methods in Demography. Academic Press, San Diego, CA, 1999
∗ G. M. Erickson et al. 2006. Tyrannosaur Life Tables: An example of nonavian dinosaur population biology. Science 313:213–216

Life expectancy
Given a survival function l(t) for a population, we can ask “what is the life expectancy of an individual?” To answer
this question, let X be the age at which a randomly chosen individual dies. The mean of X is the mean lifespan of
an individual in the population. To compute this mean, recall that the CDF for X is given by F (t) = 1 − l(t) for
t ≥ 0 and 0 otherwise. Hence, the PDF for X (assuming l is differentiable!) is −l′(t) for t ≥ 0 and 0 otherwise. The
mean of X is given by
\[ \int_0^\infty -t\,l'(t)\,dt \]
provided the improper integral is convergent. Let us assume that it is. To simplify this integral, we can take
advantage of integration by parts. Namely, let u = t and dv = −l′(t) dt. Then du = dt and v = −l(t). Hence, we get
\[ \int -t\,l'(t)\,dt = -t\,l(t) + \int l(t)\,dt \]
Evaluating this integral from 0 to b and taking the limit as b → ∞ yields
\begin{align*}
\int_0^\infty -t\,l'(t)\,dt &= \lim_{b\to\infty} \int_0^b -t\,l'(t)\,dt \\
  &= \lim_{b\to\infty} \left( \Big[-t\,l(t)\Big]_0^b + \int_0^b l(t)\,dt \right) \\
  &= \lim_{b\to\infty} \left(-b\,l(b)\right) + \int_0^\infty l(t)\,dt \\
  &= \int_0^\infty l(t)\,dt
\end{align*}
where the last line follows from the fact that $\lim_{b\to\infty} -b\,l(b) = 0$. This limit holds because −t l′(t) ≥ 0 and
$\int_0^\infty -t\,l'(t)\,dt$ is convergent, so that $b\,l(b) = b\int_b^\infty -l'(t)\,dt \le \int_b^\infty -t\,l'(t)\,dt$, which tends to 0 as b → ∞.
Hence, we proved the following result.
Theorem 7.4. Life expectancy
Let l(t) be a continuously differentiable survivorship function. Let X be the random variable whose CDF is given
by 1 − l(t) for t ≥ 0 and 0 otherwise. If X has a convergent mean, then the mean of X equals
\[ \int_0^\infty l(t)\,dt \]
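Theorem 7.4 can be illustrated numerically. The following sketch uses the T. rex survivorship curve from Example 5 and compares the mean computed from the PDF −l′(t) with the integral of l(t). The midpoint-rule helper, the number of subintervals, and the cutoff age of 60 years are assumptions made here for illustration; the cutoff is chosen so that l(t) is numerically zero beyond it.

```python
import math


def l(t):
    """T. rex survivorship curve from Example 5."""
    return math.exp(0.009 - 0.009 * math.exp(0.2214 * t))


def pdf(t):
    """PDF of the age at death, -l'(t), obtained with the chain rule."""
    return l(t) * 0.009 * 0.2214 * math.exp(0.2214 * t)


def midpoint_integral(f, a, b, n=60_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


UPPER = 60.0  # assumed cutoff age in years
mean_from_pdf = midpoint_integral(lambda t: t * pdf(t), 0.0, UPPER)
mean_from_survivorship = midpoint_integral(l, 0.0, UPPER)
print(mean_from_pdf, mean_from_survivorship)
```

The two printed values should agree closely, as the theorem predicts.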
Example 6. Life expectancy of Albertosaurus
Estimate the mean lifespan of Albertosaurus using Table 7.5.
Solution. Using the right end point rule with ∆t = 2 and assuming the maximum lifespan is 30 years, we get
\begin{align*}
\int_0^\infty l(t)\,dt &= \int_0^{30} l(t)\,dt \\
  &\approx [l(2) + l(4) + l(6) + \dots + l(30)] \cdot 2 \\
  &= (1.0 + 0.96 + 0.91 + \dots + 0) \cdot 2 \\
  &= 14.22 \text{ years}
\end{align*}
Hence, the life expectancy of an Albertosaurus is approximately 14.22 years (given that only individuals making it
to year 2 are considered!). Interestingly, this life expectancy is believed to be the age at which Albertosaurus
achieves sexual maturity. □
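The right end point sum in Example 6 is easy to reproduce. In this sketch the list `survivorship` holds l(2) through l(28) from Table 7.5 followed by the assumed value l(30) = 0; the variable names are ours.

```python
# l(2), l(4), ..., l(28) from Table 7.5, followed by the assumed l(30) = 0
survivorship = [1.0, 0.96, 0.91, 0.86, 0.77, 0.73, 0.64,
                0.45, 0.32, 0.18, 0.11, 0.08, 0.06, 0.04, 0.0]
dt = 2  # years between table entries

# Right end point rule: each interval contributes (value at its right end) * dt
life_expectancy = dt * sum(survivorship)
print(round(life_expectancy, 2))  # 14.22
```

Because l(t) is non-increasing, the right end point rule underestimates the integral, so this figure is best read as a conservative estimate.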
Example 7. Older is better

Consider a hypothetical population whose mortality rate is
\[ m(t) = \frac{3}{1+t} \text{ per year} \]
Determine the life expectancy of this population.
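Solution. To determine the life expectancy, we need to find l(t). Since l(t) must satisfy l′(t) = −m(t)l(t) and
l(0) = 1, we can use separation of variables to solve for l(t):
\begin{align*}
\int \frac{dl}{l} &= -\int \frac{3\,dt}{1+t} \\
\ln l &= -3\ln(1+t) + C = \ln(1+t)^{-3} + C \\
l &= (1+t)^{-3} e^{C} = \frac{e^{C}}{(1+t)^3}
\end{align*}
Since $l(0) = 1 = e^{C}$, we get $l(t) = \frac{1}{(1+t)^3}$.
To find the life expectancy, we need to compute $\int_0^\infty \frac{dt}{(1+t)^3}$. Using the substitution u = 1 + t, we get
\begin{align*}
\int \frac{dt}{(1+t)^3} &= \int \frac{du}{u^3} \\
  &= -\frac{1}{2u^2} + C \\
  &= -\frac{1}{2(1+t)^2} + C
\end{align*}
Therefore,
\begin{align*}
\int_0^\infty \frac{dt}{(1+t)^3} &= \lim_{b\to\infty} \left( -\frac{1}{2(1+b)^2} + \frac{1}{2} \right) \\
  &= \frac{1}{2} \text{ year}
\end{align*}
□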
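As a numerical cross-check of the improper integral in Example 7, the sketch below evaluates the partial integral of $(1+t)^{-3}$ from 0 to b in closed form, using the antiderivative $-\frac{1}{2(1+t)^2}$, and compares it with a midpoint-rule sum. The helper names and the choice of upper limits are ours.

```python
def l(t):
    """Survivorship function for the mortality rate m(t) = 3 / (1 + t)."""
    return 1.0 / (1.0 + t) ** 3


def partial_life_expectancy(b):
    """Integral of l(t) from 0 to b, from the antiderivative -1 / (2 (1 + t)^2)."""
    return 0.5 - 0.5 / (1.0 + b) ** 2


def midpoint_integral(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


numeric = midpoint_integral(l, 0.0, 50.0)
print(numeric, partial_life_expectancy(50.0))
for b in (10, 100, 1000):
    print(b, partial_life_expectancy(b))  # approaches 0.5 as b grows
```

The partial integrals increase toward one half as b grows, so the life expectancy for this hypothetical population is half a year.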
Including reproduction
So far we have only considered the likelihood of an individual surviving until a certain age. To fully understand
the dynamics of the population, we also need to know how the reproductive success of an individual depends on
their age. In other words, how many progeny does an individual of a particular age produce on average? We let b(t)
denote the average number of progeny produced by an individual of age t. The likelihood l(t) of surviving to age t
in conjunction with b(t) provides a lot of information about the demography of a population, as the following example
illustrates.
Example 8. Vole life history
In their classic text, The Distribution and Abundance of Animals, ecologists Andrewartha and Birch created the
life table, Table 7.6, for females of the vole species Microtus agrestis. Use this table to answer the following question:
If you were given 100 female voles of age 0, and you placed them in your backyard, how many female progeny would
they produce during their lifetime?
Figure 7.24: The vole Microtus agrestis.

Table 7.6: Life table for Microtus agrestis where t is measured in weeks, l(t) is the fraction of females surviving to
age t, and b(t) is the average number of female offspring produced per week by an individual of age t.

t (weeks)    l(t)    b(t)
 8           0.83    0.08
16           0.73    0.30
24           0.59    0.37
32           0.43    0.31
40           0.29    0.21
48           0.18    0.14
56           0.10    0.08
64           0.05    0.05
72           0.03    0.04

Solution. Of the 100 females, we expect 83% will survive to week 8. Each of these 83 will produce 0.08 daughters
per week. Hence, in the interval [0, 8], we expect 83 × 0.08 × 8 = 53.12 daughters to be produced. 73% of the females
survive to week 16. Each of these surviving females will produce 0.3 daughters per week from week 8 to week 16.
Hence, in the interval [8, 16], we expect 73 × 0.3 × 8 = 175.2 daughters to be produced. Continuing in this manner,
we get the following table:

Time interval (weeks)    Daughters produced (rounded to integers)
[0, 8]                    53
[8, 16]                  175
[16, 24]                 175
[24, 32]                 107
[32, 40]                  49
[40, 48]                  20
[48, 56]                   6
[56, 64]                   2
[64, 72]                   1

Adding all these daughters up yields 588 daughters expected to be produced by 100 females. Equivalently, each
female will produce on average 5.88 daughters. □
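The bookkeeping in Example 8 can be reproduced with a short Python sketch. The list names and the variables `N` and `dt` are introduced here for illustration; the data are the rows of Table 7.6.

```python
# Table 7.6 for Microtus agrestis: survivorship and weekly births at t = 8, 16, ..., 72
survivorship = [0.83, 0.73, 0.59, 0.43, 0.29, 0.18, 0.10, 0.05, 0.03]
births_per_week = [0.08, 0.30, 0.37, 0.31, 0.21, 0.14, 0.08, 0.05, 0.04]

N = 100  # initial number of females
dt = 8   # width of each age interval in weeks

# Daughters produced in each interval: survivors * weekly births * interval width
daughters = [N * l_t * b_t * dt for l_t, b_t in zip(survivorship, births_per_week)]

rounded_total = sum(round(d) for d in daughters)  # rounding each interval, as in the text
exact_total = sum(daughters)
print(rounded_total, round(exact_total, 2), round(exact_total / N, 2))
```

Rounding each interval first gives the 588 daughters reported in the text; summing before rounding gives a slightly smaller total, which is an ordinary rounding effect.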
Example 8 illustrates how to use the life table to determine the average number of daughters produced by a
female during her lifetime. To generalize the computations in Example 8 to an arbitrary survival function l(t) and
an arbitrary reproduction function b(t), assume that initially there are N females (e.g. N = 100 in Example 8) and
that ∆t is the width of the time intervals for the life table (e.g. ∆t = 8 in Example 8). The number of females that
survive to age $t_1 = \Delta t$ is $N l(t_1)$. Each of these females produces approximately $b(t_1)\Delta t$ daughters. Hence, by
time $t_1$, there are approximately $N l(t_1) b(t_1) \Delta t$ daughters. The number of females that survive to age $t_2 = 2\Delta t$
is $N l(t_2)$. Each of these females produces approximately $b(t_2)\Delta t$ daughters in the time interval $[t_1, t_2]$. Hence, by
time $t_2$, there are approximately
\[ N l(t_1) b(t_1) \Delta t + N l(t_2) b(t_2) \Delta t \]
daughters produced. Continuing inductively, there are approximately
\[ N l(t_1) b(t_1) \Delta t + N l(t_2) b(t_2) \Delta t + N l(t_3) b(t_3) \Delta t + N l(t_4) b(t_4) \Delta t + \dots \]
daughters produced. Taking the limit as ∆t → 0 yields the expected number of daughters D to be
\[ D = N \int_0^\infty l(t)\,b(t)\,dt. \]
If we now define the reproductive number $R_0$ to be the number of daughters that we expect each individual female
to produce in her lifetime (that is, $R_0 = D/N$), then we obtain the following relationship:

Reproductive number. Let l(t) be a survival function and b(t) be a reproduction function. The average
number of daughters produced by a female is
\[ R_0 = \int_0^\infty l(t)\,b(t)\,dt \]
whenever the improper integral is well defined. $R_0$ is called the reproductive number of the population.
Ignoring the role of males (in the simplest case, one could just assume a 50:50 sex ratio), if R0 > 1, then each
female more than replaces herself in each generation and the population grows. On the other hand, if R0 < 1, then
each female fails to fully replace herself in each generation and the population declines.
Example 9. Reproductive number for painted turtles
Painted turtles are found in Iowa and their favorite pastime is basking in the sun on warm March days. They recede
to the bottom of the wetland for the night. The females lay their eggs in late May or June. Using a mark-recapture
study, biology professor Henry Wilbur estimated the survival and reproductive functions for painted turtles. He found
that $l(t) \approx 0.243e^{-0.273t}$ for t ≥ 1 and $l(t) \approx e^{-1.69t}$ for t < 1. Moreover, he assumed that females are reproductively
mature at age 7 and that mature females produce on average 6.6 daughters per year. Using this information, do the
following:
a. Estimate the life expectancy of a female painted turtle.
b. Estimate the reproductive number of the painted turtles. Based on this estimate discuss whether you
think the painted turtle population would be increasing or decreasing.
Solution.
a. To estimate the life expectancy, we need to compute $\int_0^\infty l(t)\,dt$. By the splitting property for integrals,
$\int_0^\infty l(t)\,dt = \int_0^1 l(t)\,dt + \int_1^\infty l(t)\,dt$. The first integral equals
\[ \int_0^1 e^{-1.69t}\,dt = \frac{1}{-1.69}\left(e^{-1.69} - 1\right) \approx 0.4825 \]
Since, ignoring the constant of integration, $\int 0.243e^{-0.273t}\,dt = -\frac{0.243}{0.273}e^{-0.273t} \approx -0.89e^{-0.273t}$, we get
\begin{align*}
\int_1^\infty 0.243e^{-0.273t}\,dt &\approx \lim_{b\to\infty} \left(-0.89e^{-0.273b} + 0.89e^{-0.273}\right) \\
  &= 0.89e^{-0.273} \approx 0.6774
\end{align*}
Therefore the life expectancy is approximately 0.4825 + 0.6774 ≈ 1.16 years. Hence, a female turtle is
not expected to live to a reproductively mature age!

b. The reproductive number is given by $R_0 = \int_0^\infty l(t)\,b(t)\,dt$. Since b(t) = 0 for t ≤ 7, we get
\[ R_0 = \int_7^\infty 0.243e^{-0.273t} \cdot 6.6\,dt \]
Since, ignoring the constant of integration, $\int 0.243e^{-0.273t} \cdot 6.6\,dt \approx -5.875e^{-0.273t}$, we get
\begin{align*}
R_0 &= \lim_{b\to\infty} \left(-5.875e^{-0.273b} + 5.875e^{-0.273 \cdot 7}\right) \\
    &\approx 0.8691
\end{align*}
Hence, a female painted turtle is expected to produce less than one daughter during her lifetime. This
suggests that the population of painted turtles would be in decline as individuals are not replacing
themselves over their lifetime.
□
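The two closed-form integrals in Example 9 can be evaluated directly. The sketch below uses only the constants quoted in the example; the variable names are ours.

```python
import math

# Life expectancy: integral of e^{-1.69 t} on [0, 1] plus integral of 0.243 e^{-0.273 t} on [1, inf)
juvenile_part = (1.0 - math.exp(-1.69)) / 1.69
adult_part = (0.243 / 0.273) * math.exp(-0.273 * 1)
life_expectancy = juvenile_part + adult_part

# Reproductive number: integral of 0.243 e^{-0.273 t} * 6.6 on [7, inf)
R0 = (0.243 * 6.6 / 0.273) * math.exp(-0.273 * 7)

print(round(juvenile_part, 4), round(adult_part, 4), round(life_expectancy, 2))
print(round(R0, 4))
```

Since the computed reproductive number is below 1, the code agrees with the conclusion that the population would decline under these estimates.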
In addition to applications in demography, life tables can be used to understand the spread of disease in a
population. In a striking parallel to the demographic process of survivorship and reproduction, as illustrated in
the next example, an individual who contracts a disease will be subject to a maturation process known as a latent
period and then will become infective, which is akin to reaching sexual maturity. Then in each period, the infected
individual may or may not infect another individual, which is akin to reproduction. And, of course, along the way,
the infected individual may either recover from the disease or die, which is akin to mortality. We should note
that the precise characteristics of a disease depend both on the genetics of the particular strain of the pathogen
causing the disease and the genetics of the host species being infected. Thus no two epidemics of a disease are the
same, which is why influenza can sometimes be of minor concern, and sometimes a worldwide threat such as Spanish
Influenza which killed 20-40 million people (exact total is unknown) around the world in the years of 1918 and 1919.
Example 10. Measles Epidemics
Measles is a highly infectious viral disease (genus Morbillivirus of the family Paramyxoviridae) that infects
particularly human infants and adults. An individual infected with measles will become infectious anywhere from 7 to
18 days and remain infectious for about 8 days. For a particular population where access to medical care is relatively
low, the proportion of infected individuals expected to survive the measles is given in the survival column in Table 7.7.
The number of new infections that arise from an infected individual (these new infections are equivalent to “births” in
the context of the growth of the infected population) depends on many factors, including the rate at which individuals
contact other individuals on public transport, at the work place, etc. However, in the population of concern, public
health officials have determined that the number of new cases infected individuals can be expected to give rise to
before they are cured is given by the “infections” column in Table 7.7.
a. If several infectious individuals are introduced into the population to which these data apply is an epidemic
expected to occur (i.e. is the population of infectious individuals expected to grow)?
b. If the proportion of individuals vaccinated in a population reduces the expected number of individuals
infected per infectious individual by this same proportion, then what proportion of the population should
be vaccinated to ensure that the disease will not spread?
Solution.
a. Since we have cast this problem in terms of life table analysis, whether or not a measles epidemic will
occur depends on whether the value of $R_0$ is greater or less than 1. From Table 7.7 and the fact that we
have discretized the survival and infection (i.e. birth) functions to be constant values $l_t$ and $b_t$ for each
discrete time interval of 1 day, it follows that
\[ R_0 = \int_0^\infty l(t)\,b(t)\,dt = \sum_{t=7}^{23} l_t b_t = 2.28. \]
Hence, the population of infected individuals will grow and so an epidemic will occur.

Table 7.7: Survival and infection columns for a measles epidemic

Day (t) Survival (lt ≡ l(t)) Infections (bt ≡ b(t))
1-6 1 0
7 1 0.03
8 1 0.06
9 1 0.09
10 1 0.12
11 0.99 0.15
12 0.98 0.18
13 0.97 0.21
14 0.96 0.24
15 0.95 0.24
16 0.94 0.24
17 0.93 0.21
18 0.92 0.18
19 0.91 0.15
20 0.90 0.12
21 0.90 0.09
22 0.90 0.06
23 0.90 0.03
24 0.90 0
b. If a proportion y of individuals is vaccinated, the proportion available to spread the disease is 1 − y.
To control the epidemic we need to select y to ensure that $R_0 < 1$: that is, we need to solve
2.28(1 − y) < 1 for y. This implies that 2.28y > 2.28 − 1 or y > 1.28/2.28 ≈ 0.5614. Hence at least 57%
of the population should be vaccinated to ensure that the measles does not spread in the population.
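The sum and the vaccination threshold in Example 10 can be reproduced as follows. This is a sketch in which the list names are ours and the values are the day 7 to day 23 rows of Table 7.7 (days 1 to 6 and day 24 contribute nothing because no infections occur on those days).

```python
# Table 7.7, days 7 through 23
survival = [1, 1, 1, 1, 0.99, 0.98, 0.97, 0.96, 0.95, 0.94,
            0.93, 0.92, 0.91, 0.90, 0.90, 0.90, 0.90]
infections = [0.03, 0.06, 0.09, 0.12, 0.15, 0.18, 0.21, 0.24, 0.24, 0.24,
              0.21, 0.18, 0.15, 0.12, 0.09, 0.06, 0.03]

# Discretized reproductive number: sum of l_t * b_t over one-day intervals
R0 = sum(l_t * b_t for l_t, b_t in zip(survival, infections))

# Vaccinating a proportion y scales R0 by (1 - y); solve R0 * (1 - y) = 1 for y
threshold = 1 - 1 / R0

print(round(R0, 2), round(threshold, 4))
```

Any vaccinated proportion above the printed threshold brings the effective reproductive number below 1.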
Problem Set 7.5

Use the life Table 7.5 for Albertosaurus to compute the quantities in Problems 1 to 4.
1. The fraction of Albertosaurs that died between 14 and 20 years.
2. The fraction of Albertosaurs that died between 20 and 28 years.
3. The fraction of Albertosaurs that lived at least 6 years.
4. The fraction of Albertosaurs that lived at least 8 years.
Use the life Table 7.6 for Microtus agrestis to compute the quantities in Problems 5 to 8.
5. The fraction of female voles that lived less than 24 weeks.
6. The fraction of female voles that lived less than 40 weeks.
7. The fraction of female voles that lived between 24 and 48 weeks.
8. The fraction of female voles that lived between 40 and 64 weeks.
9. Find the survivorship function l(t) when m(t) = a + bt with a > 0 and b > 0.
10. Find the survivorship function l(t) when $m(t) = \frac{a}{b+t}$ with a > 0 and b > 0. Discuss how a and b influence the
shape of the survivorship function.
11. Show that $m(t) = -\frac{d}{dt}\ln[l(t)]$ provided that l(t) is differentiable.

12. If $\int_0^\infty l(t)\,dt$ is convergent and b(t) ≤ B for all t, show that $R_0 = \int_0^\infty l(t)\,b(t)\,dt$ is convergent.
13. Use the life Table 7.6 for Microtus agrestis to approximate the mortality rates for all age classes of the female
vole. Discuss any pattern in the mortality rates that you observe.
14. Use the life Table 7.6 for Microtus agrestis to compute the life expectancy of the female vole.
Compute the life expectancy for populations with the (hypothetical) survivorship functions in Problems 15 to 20.
Assume t is measured in years. Hint: one of the expectancies is infinite!
15. $l(t) = e^{-t}$
16. $l(t) = e^{-t/100}$
17. $l(t) = \frac{1}{(1+t)^2}$
18. $l(t) = \frac{1}{(1+t/100)^2}$
19. $l(t) = \frac{1}{1+t}$
20. $l(t) = \frac{1}{\left(1+\frac{t}{20}\right)^3}$
Compute R0 for populations with the (hypothetical) survivorship and reproduction functions in Problems 21 to 26.
Assume t is measured in years.
21. $l(t) = e^{-t}$ and b(t) = 2 for t ≥ 1 and 0 for 0 ≤ t < 1.
22. $l(t) = e^{-t/100}$ and b(t) = t.
23. $l(t) = \frac{1}{(1+t)^2}$ and b(t) = 5 for t ≥ 5 and b(t) = 0 for 0 ≤ t < 5.
24. $l(t) = \frac{1}{(1+t/100)^2}$ and b(t) = 0.1.
25. $l(t) = \frac{1}{1+t}$ and $b(t) = \frac{5}{1+t}$.
26. $l(t) = \frac{1}{1+2t}$ and $b(t) = \frac{3}{1+2t}$.
27. According to the work of Erickson and colleagues, the mortality rate for the dinosaur species Gorgosaurus is
given by
\[ m(t) = 0.0059\,e^{0.2072t} \text{ per year} \]
Find and plot the survivorship function l(t).
28. According to the work of Erickson and colleagues, the mortality rate for the dinosaur species Daspletosaurus is
given by
\[ m(t) = 0.0018\,e^{0.2006t} \text{ per year} \]
Find and plot the survivorship function l(t).
29. According to the work of Erickson and colleagues, the survivorship function for the dinosaur species Albertosaurus
is given by
\[ l(t) = e^{0.039\left(1 - e^{0.187t}\right)} \]
Find and plot the mortality rate m(t).
30. According to a National Statistics Report (volume 54, number 14), the life table for people in the United States
in 2003 was

t (years) l(t)
0 1.00
10 0.991
20 0.987
30 0.978
40 0.966
50 0.940
60 0.878
70 0.755
80 0.527
90 0.213
100 0.02
Using right end points, estimate the life expectancy of a human.

31. Data collected from a new outbreak of a SARS-like coronavirus resulted in the construction of the
following table:
Day lt bt
4 1 0.2
5 0.98 0.3
6 0.95 0.3
7 0.92 0.3
8 0.89 0.3
9 0.86 0.3
10 0.83 0.3
11 0.8 0.3
12 0.77 0.2
13 0.74 0.2
14 0.72 0.1
Use this table to answer the following questions:

a. If several infectious individuals are introduced into another population, is the epidemic expected to
spread?
b. If the proportion of individuals vaccinated in a population reduces the expected number of individuals
infected per infectious individual by this same proportion, then what proportion of the population
should be vaccinated to ensure that the disease will not spread?
32. Communicable diseases often have at least two stages: a latent stage in which the individual is infected but
not infectious and an infectious stage in which the individual can infect others. For a deadly disease where
the time to death is exponentially distributed with mean 1/q days, the fraction of individuals surviving t days
with the disease is $l(t) = e^{-qt}$. Using differential equations to model the infection with two stages, latent and
infectious, the infectiousness of an average infected individual (i.e., the number of people infected per day) is
given by
$b(t) = k\,\frac{a}{a-c}\left(e^{-ct} - e^{-at}\right)$ infected per day,
where $1/a$ is the mean duration of the latent period, $1/c$ is the mean duration of the infectious period, and $k$
is the rate at which an infectious individual infects others. For this model find $R_0$.
33. The parameters of the HIV epidemic vary considerably from country to country. The table below gives the
survivorship $l_t$ (which accounts for both death and dropout) and the infectivity $b_t$ for the treated and
untreated segments of a population. The numbers reflect the expectation that all infected individuals die within
10 years unless they are treated. Treated individuals are assumed to drop out of the sexually active population
under consideration after 20 years. Their infectivity is also lower for part of the infectious period because
treatment reduces the level of virus in their bodily fluids. Infectivity rises again later as the efficacy of
treatment declines over time.

a. Compare the $R_0$ for the treated and untreated segments of the population. What do you conclude?
b. What levels of condom use in the two subpopulations are needed to control the epidemic, assuming
that condom use reduces the probability of transmission by 95%?
Year Untreated lt Untreated bt Treated lt Treated bt

1 1 0.5 1 0.5
2 1 0.2 1 0.2
3 1 0.2 1 0.2
4 1 0.2 1 0.2
5 0.95 0.2 0.98 0.1
6 0.9 0.2 0.96 0.05
7 0.8 0.2 0.94 0.05
8 0.65 0.1 0.92 0.05
9 0.45 0.1 0.9 0.05
10 0.2 0.1 0.87 0.05
11 0 0 0.84 0.05
12 0 0 0.81 0.075
13 0 0 0.78 0.1
14 0 0 0.75 0.1
15 0 0 0.72 0.1
16 0 0 0.69 0.1
17 0 0 0.64 0.1
18 0 0 0.54 0.1
19 0 0 0.44 0.1
20 0 0 0.34 0.1
21 Assume all individuals have now left the population of interest
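
For tabulated data like those in Problems 31 and 33, the reproductive number can be approximated by replacing the integral $R_0 = \int_0^\infty l(t)\,b(t)\,dt$ with a sum of the products $l_t b_t$ over the tabulated time steps. The Python sketch below illustrates this with the Problem 31 table. The function names, the unit time step, and the threshold $1 - 1/R_0$ (which follows from the proportional-reduction assumption stated in Problem 31b) are assumptions made for this illustration, not part of the text.

```python
def reproductive_number(survivorship, infectivity, dt=1.0):
    """Approximate R0 = integral of l(t) b(t) dt by a sum over equal time steps.

    dt is the spacing between table rows (an assumed unit step by default).
    """
    if len(survivorship) != len(infectivity):
        raise ValueError("the two columns must have the same length")
    return dt * sum(l * b for l, b in zip(survivorship, infectivity))


def critical_fraction(r0):
    """Smallest fraction v with (1 - v) * R0 <= 1; zero if R0 is already <= 1."""
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0


# Columns l_t and b_t from the Problem 31 table (days 4 through 14).
l_31 = [1, 0.98, 0.95, 0.92, 0.89, 0.86, 0.83, 0.80, 0.77, 0.74, 0.72]
b_31 = [0.2, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.2, 0.2, 0.1]

r0_31 = reproductive_number(l_31, b_31)
print(f"R0 estimate: {r0_31:.3f}")
print(f"critical vaccinated fraction: {critical_fraction(r0_31):.3f}")
```

The same function can be applied to each pair of columns in the Problem 33 table. An estimate above 1 suggests that the infection is expected to spread.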


DEFINITIONS
Section 7.1
Histogram, p. 639
Random variable, p. 640
Probability, p. 641
Probability density function (PDF), p. 644
Continuous random variable, p. 645
Uniform PDF, p. 646
Section 7.2
Pareto distribution, p. 661
Improper integral, p. 659
Section 7.3
Mean, p. 674
Average, p. 674
Mathematical expectation, p. 674
Variance, p. 681
Standard deviation, p. 681
Section 7.4
Logistic distribution, p. 691
Normal (Gaussian) distribution, p. 699
Bell curve, p. 699
Standard normal distribution, p. 700
Lognormal distribution, p. 703
Section 7.5
Survivorship function, p. 713
Reproductive number, p. 720
Section 7.1
Area under a PDF, p. 644
CDF properties, p. 647
Section 7.2
Convergent and divergent improper integrals, p. 660
p-integrals, p. 661
Relationship between PDF and CDF, p. 663
THEOREM 7.1 FUNDAMENTAL THEOREM OF PDFS, p. 663
Convergence tests, p. 664
THEOREM 7.2 COMPARISON TEST, p. 664
Two-sided improper integral, p. 667
Laplace distribution, p. 669
Section 7.3
Mean for a PDF, p. 675
THEOREM 7.3 CHEBYSHEV’S INEQUALITY, p. 685
Section 7.4
PDF and CDF of the logistic distribution, p. 692
Mean of the logistic distribution, p. 696
PDF of the normal distribution, p. 699
z-scores (standard scores), p. 701
Lognormal distribution, p. 703
Section 7.5
Survivorship-Mortality equation, p. 715
Life expectancy, p. 716
THEOREM 7.4 LIFE EXPECTANCY, p. 717
Section 7.1
Bird diversity in oak woodlands
Mediterranean fruit fly
Section 7.2
Torricelli’s trumpet
Laplace distribution
Section 7.3
Extinction ratios of North American freshwater fauna
Section 7.4
Spread of Pyura praeputialis
Wheat yields at Rothamsted experimental station
Chicken pox latency periods
Section 7.5
Survivorship of Albertosaurus and Tyrannosaurus rex
Reproductive number for painted turtles
1. Consider the following data set where X denotes a score.

Score Frequency
50-59 6
60-69 14
70-79 26
80-89 10
90-99 4
a. Construct a histogram.
b. Find P (0 ≤ X ≤ 89)
c. Find P (X > 79)
2. Find a constant $a$ so that $f(x) = ax^3$, $0 \le x \le 4$, is a PDF.
3. Consider the hyperbolic function
$F(x) = \begin{cases} \dfrac{x}{k+x} & \text{if } x \ge 0 \\ 0 & \text{elsewhere} \end{cases}$
for any $k > 0$.
a. Show that $F(x)$ is a CDF.
b. Let $X$ be a random variable with CDF $F(x)$. Find $P(1 \le X \le 2)$.
4. Use the comparison test to prove that $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx$ is convergent.
5. Use the comparison test to prove that $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,x^2 e^{-x^2/2}\,dx$ is convergent.

6. Use the comparison test to prove that the integral
$\int_0^{\infty} t\,\frac{r e^{a-rt}}{\left(1 + e^{a-rt}\right)^2}\,dt$
is convergent for any $r > 0$.
7. What is wrong, if anything, with the following evaluation:
$\int_0^3 (x-2)^{-1}\,dx = \ln|x-2|\,\Big|_0^3 = \ln 1 - \ln 2 = -\ln 2$
8. Assume $b > a$ and define the uniform PDF:
$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$

b. Find the CDF $F(x)$ associated with $f(x)$.
9. Determine whether the given integrals converge or diverge.
a. $\int_1^{\infty} \frac{dx}{x^{0.99}}$
b. $\int_1^{\infty} \frac{dx}{x^{1.1}}$
10. Determine the values of $p > 0$ for which the integral $\int_2^{\infty} \frac{dx}{x(\ln x)^p}$ is convergent.
11. Use a convergence test to determine whether the given integrals converge or diverge.
a. $\int_3^{\infty} \frac{dx}{\sqrt[3]{2x-1}}$
b. $\int_{-\infty}^{0} \frac{\sin^2 x}{1+x^2}\,dx$
12. Show that $f(x) = \frac{2}{x^2}$ is a PDF on [1, 2] and find its CDF.
13. Find the PDF for the given CDF:
$F(x) = 1 - \frac{1}{x}$ if $x \ge 1$ and $F(x) = 0$ if $x \le 1$.
14. Compute the mean, variance, and standard deviation for a pair of dice; i.e. data set:
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
15. Compute the mean of the random variable with PDF $f(x) = \frac{4}{x^5}$ for $x \ge 1$ and $f(x) = 0$ elsewhere.
16. Show that the usual definition of the mean for a data set (i.e. add up the data and divide by the number of
data points) is the same as our definition of the mean (i.e. weighted sum of the data points).
17. According to Thomson et al. (1973), the elimination constant for Lidocaine in patients with congestive heart
failure is 0.31 per hour. Hence, for a patient who has received an initial dosage of $y_0$ mg, the Lidocaine level
$y(t)$ in the body can be modeled by the differential equation
$\frac{dy}{dt} = -0.31y, \qquad y(0) = y_0$

a. Solve for y(t).

time t ≥ 0.
d. What is the probability that a randomly chosen drug particle leaves the body in the first 2 hours?
e. What is the probability that a randomly chosen drug particle leaves the body between the start of
the second and start of the fourth hour?
18. The 1999 AAPA Physician Assistant Census Survey found that the mean income for clinically practicing PAs
working full-time was $68,164 with a standard deviation of $17,408. Using Chebyshev’s inequality, determine a
lower bound for the fraction of PAs with an income between $42,052 and $94,276.
19. The time for a mosquito to mature from larvae to pupae is approximately exponentially distributed with mean
14 days. Find the probability a mosquito has matured from larvae to pupae in 10 days or less. Find the
probability a mosquito has taken at least 14 days to mature from larvae to pupae.
20. According to Alexei A. Sharov, Department of Entomology at Virginia Tech, mortality depends on numerous
factors: “temperature, population, density, etc. When building a life-table, the effect of these factors is
averaged” and only age is considered as a factor that determines mortality. In an article, “Age-Dependent
Life Tables,” Sharov presents a life table for a sheep population in which females are counted once a year,
immediately after breeding season.
t (years) l(t)
0 1.00
1 0.845
2 0.824
3 0.795
4 0.755
5 0.699
6 0.626
7 0.532
8 0.418
9 0.289
10 0.162
11 0.060
Use this table to compute the life expectancy of a female sheep.
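
Life expectancy is the area under the survivorship curve (Theorem 7.4), so for a life table it can be estimated with a Riemann sum. The following Python sketch uses the sheep table from Problem 20. The function name and the choice between left and right endpoints are assumptions made for illustration, and survivorship beyond the last tabulated age is taken to be zero.

```python
def life_expectancy(ages, survivorship, rule="right"):
    """Riemann-sum estimate of the integral of l(t) over the tabulated ages.

    rule="right" uses l at the right end of each age interval,
    rule="left" uses l at the left end.
    """
    if rule not in ("left", "right"):
        raise ValueError("rule must be 'left' or 'right'")
    if len(ages) != len(survivorship):
        raise ValueError("ages and survivorship must have the same length")
    total = 0.0
    for i in range(len(ages) - 1):
        width = ages[i + 1] - ages[i]
        height = survivorship[i + 1] if rule == "right" else survivorship[i]
        total += height * width
    return total


# Sheep life table from Problem 20 (t in years).
sheep_ages = list(range(12))
sheep_l = [1.00, 0.845, 0.824, 0.795, 0.755, 0.699,
           0.626, 0.532, 0.418, 0.289, 0.162, 0.060]

right_estimate = life_expectancy(sheep_ages, sheep_l, rule="right")
left_estimate = life_expectancy(sheep_ages, sheep_l, rule="left")
print(f"right endpoints: {right_estimate:.3f} years")
print(f"left endpoints:  {left_estimate:.3f} years")
```

Because survivorship is decreasing, the right-endpoint sum underestimates the area and the left-endpoint sum overestimates it, so the two values bracket the integral over the tabulated range.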
7.7 Group Research Projects

Working in small groups is typical of most work environments. Thus learning to work with others and to
communicate specific ideas is an important skill. Work with three or four other students to submit a single report
based on each of the following questions.
Project 7A: Fitting Distributions

Search the web for a data set consisting of at least several hundred data points. Explore your data as outlined below,
providing figures to enhance the presentation of your analysis.
1. Draw histograms for several different bin sizes and select the histogram that results in the smoothest looking
probability distribution in terms of being approximated by some curve. Note that if the bin size is too large,
the histogram will look like a few big blocks. If the bin size is too small, the histogram will look like a picket
fence with lots of missing staves.

2. Calculate the mean and variance from the histogram. Compare these values to the values you get when calculating
the mean and variance directly from the data.
3. Calculate the expected values for each bin of a theoretical histogram obtained from a uniform, logistic, normal,
and lognormal distribution that has the same mean and variance as the histogram you constructed from the
data.
4. Use a sum-of-squares measure to compare how well these four distributions fit the data and discuss your results.
5. Bonus: Search the web or books for other distributions not dealt with in this chapter and repeat Steps 2 and
3 for these distributions.
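
Step 2 of Project 7A can be sketched in Python as follows. Since the project asks you to find your own data, the example uses a synthetic stand-in sample. The sample, the bin count, and all function names are hypothetical choices for illustration. The histogram mean and variance are computed by treating every data point as if it sat at the midpoint of its bin.

```python
import random
import statistics


def histogram(data, num_bins):
    """Return (midpoints, counts, width) for equal-width bins spanning the data."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two distinct values")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The maximum value is placed in the last bin.
        k = min(int((x - lo) / width), num_bins - 1)
        counts[k] += 1
    midpoints = [lo + (k + 0.5) * width for k in range(num_bins)]
    return midpoints, counts, width


def grouped_mean_variance(midpoints, counts):
    """Mean and variance computed from the histogram alone."""
    n = sum(counts)
    mean = sum(m * c for m, c in zip(midpoints, counts)) / n
    variance = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts)) / n
    return mean, variance


# Synthetic stand-in for a real data set (hypothetical values).
rng = random.Random(0)
data = [rng.gauss(50.0, 10.0) for _ in range(500)]

midpoints, counts, width = histogram(data, num_bins=20)
hist_mean, hist_var = grouped_mean_variance(midpoints, counts)
data_mean = statistics.fmean(data)
data_var = statistics.pvariance(data)

print(f"mean:     histogram {hist_mean:.3f}, data {data_mean:.3f}")
print(f"variance: histogram {hist_var:.3f}, data {data_var:.3f}")
```

Rerunning with different values of `num_bins` shows how the agreement between the two calculations depends on the bin size.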
Project 7B: Play with Logistic Regression

Use an appropriate computer technology to generate a set of data that conforms to the logistic distribution
$p(x) = \frac{1}{1 + e^{a-rx}}$
for the case $a = 5$ and $r = 1$ as follows.
1. First verify that p(0.5) = 0.011 and p(10) = 0.993. Thus x ∈ [0.5, 10] covers more than 98% of the range of
values that p(x) can assume.
2. Use your technology to generate one hundred values xi , i = 1, ..., 100 of a random variable X that is uniformly
distributed on [0.5, 10]. Check to make sure that the mean and variance of these 100 values conform to the
theoretically expected values.
3. For each $x_i$ calculate the corresponding $p_i = \frac{1}{1+e^{5-x_i}}$. Now for each $i$ generate a value $z_i$ from the uniform
distribution on [0,1]. (Most technologies refer to this as generating a value at random between 0 and 1.) If
$z_i > p_i$ set $y_i = 0$; otherwise set $y_i = 1$. Once you have done this for all $i = 1, \ldots, 100$ you will have a data set
$D = \{(x_i, y_i) \mid i = 1, \ldots, 100\}$ with values of $x_i$ ranging between 0.5 and 10 and values of $y_i$ equal to either 0 or 1.
4. Construct a histogram for this data using 6 equal bin sizes and the proportion of data points in the bin that
have a yi value equal to 1.
5. Use logistic regression to estimate the parameters â and r̂ from the best fitting linear model of the transformed
data from the histogram. How close are â and r̂ to the values 5 and 1 respectively?
6. Now repeat the exercise with 300 points and again with 1000 points. In each case, how close are â and r̂ to
the values 5 and 1 respectively? What do you notice?
7. Write a report that contains your results and in the concluding section, explain what you think you have been
doing.
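
The steps of Project 7B can be sketched in Python as shown below. The sketch assumes that the logistic regression in Step 5 means fitting a least-squares line to the transformed proportions $\ln\left((1-p)/p\right) = a - rx$ and that bins whose proportion is exactly 0 or 1 are skipped, since the transform is undefined there. The function names and the random seed are hypothetical.

```python
import math
import random

A_TRUE, R_TRUE = 5.0, 1.0  # parameter values given in the project


def p(x, a=A_TRUE, r=R_TRUE):
    """Logistic curve p(x) = 1 / (1 + e^(a - r x))."""
    return 1.0 / (1.0 + math.exp(a - r * x))


def simulate(n, rng, lo=0.5, hi=10.0):
    """Steps 2 and 3: draw x uniformly on [lo, hi], then y = 1 with probability p(x)."""
    data = []
    for _ in range(n):
        x = rng.uniform(lo, hi)
        z = rng.random()
        y = 0 if z > p(x) else 1
        data.append((x, y))
    return data


def binned_proportions(data, num_bins=6, lo=0.5, hi=10.0):
    """Step 4: (bin midpoint, proportion of points with y = 1) for each nonempty bin."""
    width = (hi - lo) / num_bins
    totals = [0] * num_bins
    ones = [0] * num_bins
    for x, y in data:
        k = min(int((x - lo) / width), num_bins - 1)
        totals[k] += 1
        ones[k] += y
    return [(lo + (k + 0.5) * width, ones[k] / totals[k])
            for k in range(num_bins) if totals[k] > 0]


def fit_logistic(points):
    """Step 5: least-squares line through (x, ln((1 - q)/q)); returns (a_hat, r_hat)."""
    usable = [(x, math.log((1.0 - q) / q)) for x, q in points if 0.0 < q < 1.0]
    if len(usable) < 2:
        raise ValueError("need at least two bins with proportions strictly between 0 and 1")
    n = len(usable)
    mean_x = sum(x for x, _ in usable) / n
    mean_w = sum(w for _, w in usable) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in usable)
    sxw = sum((x - mean_x) * (w - mean_w) for x, w in usable)
    slope = sxw / sxx
    intercept = mean_w - slope * mean_x
    return intercept, -slope  # the line is w = a - r x


rng = random.Random(0)
for n in (100, 300, 1000):
    a_hat, r_hat = fit_logistic(binned_proportions(simulate(n, rng)))
    print(f"n = {n:4d}: a_hat = {a_hat:.2f}, r_hat = {r_hat:.2f}")
```

Changing the seed gives a sense of how much the estimates vary from one simulated data set to the next at each sample size.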

Index
e, 267
rate of change
  average, 143
  instantaneous, 144
allometry
  definition, 66
  formula, 66
amplitude, 47
applications
  exponential decay of beer froth, 74, 75
  Laws of Misery, 72
arithmetic sequence, 108
asymptotes
  horizontal, 190
  vertical, 193
base, 57
cobwebbing, 116
continuity
  definition, 177
  laws of, 179
  of elementary functions, 179
  on an open interval, 180
daily incidence rates, 23
data
  bivariate, 40
derivative
  as a function, 233
  as an instantaneous rate of change, 224
  at a point, 220
  Leibniz notation, 236
difference equation
  definition, 108
  equilibrium, 114
difference equations
  cobwebbing, 116
differentiability
  versus continuity, 228
doubling time, 72, 80
elasticity, 318
environmental carrying capacity, 208
exponent, 57
exponential
  decay, 74
  doubling time, 80
  growth, 71
  half-life, 80
exponential growth, 72
floor function, 161
function
  absolute value, 26
  decreasing, 27
  definition, 19
  domain, 19
  exponential, 76
  horizontal shift, 87
  image, 19
  increasing, 27
  linear, 40
  periodic, 46, 47
  piecewise defined, 26
  power, 57
  range, 19
  vertical line test, 24
  vertical shift, 87
functional response, 96
functions
  real valued, 17
gedanken experiment, 60
geometric sequence, 108
half-life, 80
infinity, 18
integers, 17
Intermediate value theorem, 181
interval
  closed, 18
  infinite, 18
  open, 18
laws of exponents, 59
limit
  definition, 154
  of polynomials and rational functions, 175
  one-sided, 161
limits
  sequential, 202
line, 40
  best fitting, 42
  point slope formula, 50
  slope, 40
  slope formula, 40
  vertical intercept, 40
linear regression, 44
  residuals, 44
logarithm
  common, 78
  definition, 78
  laws of, 78
  natural, 78
logistic
  discrete model, 208
mathematical model, 6
mean value theorem, 238
model, 5
numbers
  irrational, 18
  natural, 17
  rational, 18
  real, 18
  whole, 17
power rule, 263
proportionality
  definition of, 60
  rules of, 61
quadratic approximation, 328
recursive formula, 107
sequences
  and continuity, 204
  limit of, 202
step function, 161
stock-recruitment, 211
tangent line, 222
used domain convention, 21
