
Introductory Mathematics

1 Linear Algebra

Reference: Gilbert Strang, Introduction to Linear Algebra, Wellesley-Cambridge Press, Third Edition. Chapters 1-7.
1.1 Vectors (read Sections 1.1, 3.1, 3.5 of Strang's book)

1.1.1 R^n as a vector space

Definition: R^n is the most important example of a vector space; its elements are called vectors and are represented as

$$ v = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad x_1, x_2, \ldots, x_n \in \mathbb{R}. $$

Remark: observe that we are using the notation of a column vector; its transpose is a row vector: v^T = (x1, x2, ..., xn).
Example: R (real line), R2 (plane, planar vectors), R3 (whole space), etc.
Operations: R^n has two operations, the addition of two vectors:

$$ v + u = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}, $$
and the multiplication of a vector by a scalar λ ∈ R:

$$ \lambda v = \lambda \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \lambda x_1 \\ \lambda x_2 \\ \vdots \\ \lambda x_n \end{pmatrix}. $$

These operations satisfy the usual distributive and associative laws.
When we combine both operations we say that we form linear combinations of u and v: λv + μu.
Moreover there is a unique zero vector 0 such that 0 + v = v for all v ∈ R^n:

$$ 0 = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. $$

Finally, for all v, there exists a unique −v such that v + (−v) = 0.
Definition: In general, a real vector space is a collection of vectors together
with the rules of vector addition and multiplication by a real number.
Examples: The vector space of all real functions f(x); in this case the vectors are functions. The vector space that consists only of the zero vector; we call it the vector space of dimension 0, since it only has the zero element.
1.1.2 Subspaces of R^n

Example: Consider the vector space R3 and a plane through 0. This plane
is also a vector space. We say that it is a subspace of R3 .
Definition: A subspace V of a vector space is a set of vectors that satisfies
the following two conditions:
1. If u, v ∈ V then u + v ∈ V.
2. If v ∈ V and λ ∈ R then λv ∈ V.
Remark: Observe that these two conditions are equivalent to saying that all linear combinations stay in the subspace.
Remark: Observe that the zero vector is always in the subspace. Therefore a plane in R^3 that does not contain 0 is never a subspace.
1.1.3 Linearly dependent and independent vectors

Remark: A subspace that contains u and v must contain all linear combinations.

Definition: If we choose k vectors v1,...,vk, the set of all linear combinations

$$ \{\lambda_1 v_1 + \cdots + \lambda_k v_k : \lambda_1, \ldots, \lambda_k \in \mathbb{R}\} $$

forms a subspace called the subspace spanned by the vectors v1,...,vk.


Definition: The dimension of a subspace is the smallest k such that k
vectors span the subspace.
Definition: k vectors v1,...,vk in R^n are linearly dependent if there exist real numbers λ1, ..., λk, not all zero, such that

$$ \lambda_1 v_1 + \cdots + \lambda_k v_k = 0. $$

On the other hand, if the equation λ1 v1 + ··· + λk vk = 0 has the only solution λ1 = ··· = λk = 0, then v1,...,vk are linearly independent.
Remark: Observe that λ1 v1 + ··· + λk vk = 0 is a system of n equations in k unknowns.
Example: Two vectors are linearly dependent if one is a multiple of the other, that is, they lie on the same line.
Example: In R3 , 3 vectors are linearly independent if they do not lie in
the same plane, but if they do then they are linearly dependent.
Elementary properties:
- If one of the k vectors v1,...,vk is zero, then the vectors are linearly dependent.
- If two of the vectors v1,...,vk are equal, then they are linearly dependent.
- If v1,...,vk are linearly independent, then any subcollection is also linearly independent.
- If v1,...,vk are linearly dependent, then the family v1,...,vk, u is also linearly dependent, for any u.
- v1,...,vk are linearly dependent if and only if one of them equals a linear combination of all the others.
- If v1,...,vk are linearly independent (or dependent), then if we add a multiple of one of the vectors to another one, the resulting family is also linearly independent (or dependent).
A numerical check of independence is sketched below.
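As a quick illustration (a sketch, not part of Strang's text): stack the vectors as the columns of a matrix and compare its rank with the number of vectors; the vectors below are invented for the example.

```python
import numpy as np

# The family is independent iff rank equals the number of vectors.
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 3.0])   # v3 = v1 + v2, so the family is dependent

A = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(A))  # 2 < 3: linearly dependent
```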

1.1.4 Basis of a subspace

Definition: k vectors v1,...,vk form a basis of a subspace V of R^n if they are linearly independent and they span V.
Example: The n vectors

$$ e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} $$

form a basis of R^n. This is called the standard or canonical basis of R^n.
Remark: R^n has infinitely many bases.
Theorem: If v1,...,vk forms a basis of a subspace V, then every v ∈ V can be uniquely written as a linear combination of the basis. In other words, there exist unique λ1, ..., λk ∈ R such that

$$ v = \lambda_1 v_1 + \cdots + \lambda_k v_k. $$

The λ1, ..., λk are called the coordinates of v in the basis.
Theorem: If v1 ,...,vk spans a subspace V , then some subcollection of it is
a basis of V . Therefore, a basis is formed by the smallest number of vectors
that span V and are linearly independent.
Theorem: If V has a basis of k vectors, then any other basis also has k vectors.
Definition: The number of vectors in a basis is called the dimension of the
subspace.
Example: Rn has dimension n, since the standard basis has n elements.
Change of basis rule: Assume that we have two bases B = {u1, ..., uk} and B' = {v1, ..., vk} of the same subspace V. Let w ∈ V with coordinates λ1, ..., λk in the basis B and coordinates μ1, ..., μk in the basis B'. Then we have the following rule:

$$ P \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_k \end{pmatrix}, $$

where P is the matrix whose columns are the coordinates of each element of B in the basis B'.
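The rule can be checked numerically. Here is a minimal sketch in R^2 (both bases and the coordinates are invented for the example); column j of P is obtained by solving B' p_j = u_j:

```python
import numpy as np

u1, u2 = np.array([1.0, 1.0]), np.array([1.0, -1.0])   # basis B
v1, v2 = np.array([2.0, 0.0]), np.array([0.0, 3.0])    # basis B'

Bmat  = np.column_stack([u1, u2])
Bpmat = np.column_stack([v1, v2])
P = np.linalg.solve(Bpmat, Bmat)      # column j solves Bp @ p_j = u_j

lam = np.array([2.0, 5.0])            # coordinates of w in basis B
mu = P @ lam                          # coordinates of w in basis B'
assert np.allclose(Bmat @ lam, Bpmat @ mu)   # same vector w either way
```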

1.2 Matrices (read Sections 7.1, 7.2, 7.3, 2.4, 2.5, 2.7, 3.2, 3.3, 3.4 of Strang's book)

1.2.1 Linear transformations in R^n

Definition: Let U ⊆ R^n and V ⊆ R^m be two subspaces. A transformation T : U → V assigns a vector v ∈ V to each vector u ∈ U, that is, T(u) = v. The transformation T is called linear if for all u1, u2, u ∈ U and λ ∈ R,

T(u1 + u2) = T(u1) + T(u2) and T(λu) = λT(u).

U is called the domain and V is called the range or image of T.


Example: rotations of vectors in the plane.
Remark: Observe that the range of T, defined as V = {T(u) : u ∈ U}, is a subspace of R^m.
Definition: The kernel of T is the set of vectors K = {u ∈ U : T(u) = 0}.
Remark: The kernel of T is a subspace of U .
Matrix of a linear transformation: We can assign a matrix to every linear transformation. Consider the standard bases of U (of dimension k) and V (of dimension ℓ). Then T is completely determined by the values of T on the standard basis of U. That is, T is completely determined by the matrix with ℓ rows and k columns whose columns are the images of the elements of the standard basis of U.
Remark: The same T can be represented by other matrices using different bases of U and V. If B is a basis of U and B' is a basis of V, then the columns of the matrix of T in these bases are the coordinates of the images of the elements of B in the basis B'.
Change of basis rule: One matrix can be computed from the other using the product rule: T^{B,B'} = (P^{B',S})^{-1} T^{S,S} P^{B,S}, where P^{B,S} is the matrix with columns the elements of the basis B, and S denotes the standard basis.
1.2.2 Matrices

Definition: An m × n array of numbers is called a matrix; m is the number of rows, n is the number of columns, and we denote by a_{i,j} the entry in row i and column j.

Example:

$$ A = \begin{pmatrix} 3 & 2 & 3 \\ 0 & 1 & 0 \end{pmatrix} $$

Here A is a 2 × 3 matrix and a_{2,1} = a_{2,3} = 0.


Definition: An n × n matrix is called a square matrix.
The zero m × n matrix is the matrix that has all entries equal to zero.
Matrix operations: Addition of matrices: If A and B are m × n matrices, then A + B is the m × n matrix with entries a_{i,j} + b_{i,j}.
Multiplication by a scalar: If A is an m × n matrix and λ ∈ R, then the matrix λA is m × n with entries λa_{i,j}.
Notation: −A = (−1)A and A − B = A + (−B).
Remark: The set of m × n matrices is a vector space of dimension mn.
Matrix multiplication: We can only multiply two matrices A and B if the number of columns of A equals the number of rows of B. So let A be an m × n matrix and B an n × p matrix. Then C = AB is the m × p matrix whose entries are computed as

$$ c_{i,j} = a_{i,1} b_{1,j} + a_{i,2} b_{2,j} + \cdots + a_{i,n} b_{n,j}. $$
Properties: (AB)C = A(BC), A(B + C) = AB + AC and (A + B)C =
AC + BC.
Warning: in general AB ≠ BA, even if both products are meaningful!
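A small numerical check of the shape rule and of non-commutativity (matrices chosen arbitrarily for the sketch):

```python
import numpy as np

A = np.array([[3.0, 2.0, 3.0],
              [0.0, 1.0, 0.0]])      # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 1.0]])           # 3 x 2

C = A @ B                            # (2x3)(3x2) -> 2 x 2
D = B @ A                            # (3x2)(2x3) -> 3 x 3
print(C.shape, D.shape)              # (2, 2) (3, 3): AB and BA differ

# Entry check against the formula c_ij = sum_k a_ik b_kj:
assert C[0, 1] == sum(A[0, k] * B[k, 1] for k in range(3))
```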
Definition: The n × n identity matrix is defined as

$$ I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}. $$
Properties: If A is an m × n matrix then AI_n = A, and if B is an n × m matrix then I_n B = B. Therefore, if A is an n × n matrix, AI_n = I_n A = A.
Definition: If A is an m × n matrix with entries a_{i,j}, then its transpose A^T is the n × m matrix with entries a_{j,i}.
Definition: A square matrix is symmetric if A^T = A.

Properties: (A^T)^T = A, (A + B)^T = A^T + B^T, (λA)^T = λA^T and (AB)^T = B^T A^T.
Definition: We say that a square matrix A is invertible if there exists a matrix A^{-1}, called the inverse of A, such that

AA^{-1} = A^{-1}A = I_n.
Properties:
1. If A^{-1} exists, it is unique.
2. If A^{-1} exists, then A^{-1} is invertible and (A^{-1})^{-1} = A.
3. If A and B are invertible then AB is invertible and (AB)^{-1} = B^{-1}A^{-1}.
4. If A is invertible then A^T is invertible and (A^T)^{-1} = (A^{-1})^T.
5. (λA)^{-1} = (1/λ)A^{-1} if λ ≠ 0.
Definition: A square matrix A is said to be orthogonal if A^{-1} = A^T.
Definition: The column rank of an m × n matrix A is the number of linearly independent vectors among the columns of A. We define the row rank of A in the same way.
Remark: The column rank does not change if we add to a column a multiple of another column, or if we multiply a column by a non-zero scalar.
Theorem: column rank = row rank.
Definition: The rank of a matrix is its column rank.
1.2.3 Systems of linear equations

Definition: Let A be an m × n matrix, x a vector (variable) in R^n and b a vector (given) in R^m. Then the equation Ax = b is a system of m linear equations in n variables.
If b = 0 then the system is called homogeneous; otherwise it is called inhomogeneous.
Definition: We define the augmented matrix of A as the m × (n + 1) matrix whose columns are the columns of A plus the vector b.

Theorem: A linear system Ax = b has a solution if and only if the rank of A equals the rank of the augmented matrix.
Remark: Observe that a homogeneous system always has the solution x = 0.
Remark: Observe that if the rank of the augmented matrix is n + 1 (that
is, the columns of A and b form a linearly independent family), then there
is no solution to the system Ax = b.
Theorem: Every solution to the system Ax = b can be written as x = x_h + x_p, where x_h is a solution to the homogeneous system Ax = 0, and x_p is a particular solution of Ax = b.
Corollary: If Ax = b has a solution, then the number of solutions is the
same as the number of solutions to Ax = 0.
Theorem: Ax = 0 has a unique solution (x = 0) if and only if the columns
of A are linearly independent. In other words, if and only if rank(A)= n.
Remark: If m < n (fewer equations than variables), then there cannot be more than m linearly independent columns, and the system cannot have a unique solution.
Theorem: The set of solutions of Ax = 0 is a subspace of R^n of dimension n − rank(A).
Theorem: An n × n square matrix A is invertible if and only if rank(A) = n.
1.3 Determinants (read Chapter 5 of Strang's book)

1.3.1 Properties of the determinant

Definition: The determinant is a function det(A) that associates a number to every n × n square matrix A, and that has the following properties:
1. det(I_n) = 1,
2. det(a1, ..., λa_i, ..., an) = λ det(A), λ ∈ R,
3. det(a1, ..., a_i + b, ..., an) = det(A) + det(a1, ..., b, ..., an), b ∈ R^n,
4. det(A) = 0 if two of the columns of A are equal,
where a1, ..., an are the column vectors of A.
Theorem: These properties uniquely define the determinant.

Geometric interpretation: |det(A)| is the volume of the parallelepiped determined by the column vectors of A.
Properties:
- If one of the columns of A is 0, then the determinant of A is 0.
- If we swap two columns of A then we change the sign of the determinant.
- If we add to a column a multiple of another column, this does not change the value of the determinant.
- If A is triangular then the determinant equals the product of the diagonal terms. Therefore,

$$ \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \det\begin{pmatrix} a & b \\ 0 & d - (bc)/a \end{pmatrix} = ad - bc. $$

- det(A) = det(A^T). Therefore, every property involving columns remains valid for rows.
- det(AB) = det(A) det(B).
- det(A^{-1}) = 1/det(A). Therefore, if A is invertible, then det(A) ≠ 0.

Theorem: If A is an n × n matrix then

det(A) ≠ 0 ⇔ rank(A) = n.
1.3.2 Cofactors, inverse and Cramer's rule

Definition: The minor matrix of an n × n matrix A is the (n − 1) × (n − 1) matrix A_{i,j} obtained by deleting row i and column j of A. det(A_{i,j}) is called a minor of A, and the det(A_{i,i}) are the principal minors. The numbers (−1)^{i+j} det(A_{i,j}) are called the cofactors of A. The matrix C that has as entries c_{i,j} the cofactors of A is called the cofactor matrix.
Theorem: For all j = 1, ..., n, we have the formula

$$ \det(A) = \sum_{i=1}^{n} (-1)^{i+j} a_{i,j} \det(A_{i,j}). $$

Remark: Observe that in order to apply this formula we need to choose some j from 1, ..., n.
Theorem: The rank of a n n matrix A equals the size of the largest
minor matrix with a non-zero determinant.
Formula for the inverse of a matrix: If det(A) ≠ 0, then

$$ A^{-1} = \frac{1}{\det(A)} C^T. $$

Remark: This is a simple formula but computationally very inefficient. In practice it is better to use elementary row operations, that is: exchanging two rows, multiplying a row by a non-zero scalar, and adding a multiple of a row to another. Starting from the n × 2n matrix (A : I_n), we proceed by elementary row operations until we arrive at (I_n : A^{-1}). A sketch of this procedure is given below.
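A minimal sketch of the (A : I) → (I : A^{-1}) procedure, with partial pivoting added for numerical stability (the text does not discuss pivoting):

```python
import numpy as np

def inverse_gauss_jordan(A):
    """Assumes A is square and invertible."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])     # the n x 2n matrix (A : I)
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))
        M[[col, pivot]] = M[[pivot, col]]           # exchange two rows
        M[col] /= M[col, col]                       # scale row so pivot is 1
        for row in range(n):                        # subtract multiples of the
            if row != col:                          # pivot row from the others
                M[row] -= M[row, col] * M[col]
    return M[:, n:]                                 # right block is A^{-1}

A = np.array([[2.0, 1.0], [1.0, 1.0]])
assert np.allclose(inverse_gauss_jordan(A) @ A, np.eye(2))
```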
Theorem: If A is an n × n matrix with det(A) ≠ 0, then the linear system Ax = b has a unique solution given by x = A^{-1}b.
Cramer's rule: The solution to this system can also be written as

$$ x_1 = \frac{d_1}{\det(A)}, \quad \ldots, \quad x_n = \frac{d_n}{\det(A)}, $$

where d_i is the determinant of the matrix obtained by replacing column i of A by the vector b.
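A direct transcription of the rule, reasonable only for tiny systems (the example system is invented):

```python
import numpy as np

def cramer(A, b):
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                  # replace column i of A by b
        x[i] = np.linalg.det(Ai) / detA
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(cramer(A, b))                   # [0.8, 1.4]
assert np.allclose(A @ cramer(A, b), b)
```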
1.4 R^n as a Euclidean space (read Section 1.2, Chapter 4 of Strang's book)

1.4.1 The inner product and norm

Definition: The inner product of two vectors u, v ∈ R^n is defined as

u · v = u1 v1 + ··· + un vn.

Remark: Using matrix notation, u · v = u^T v.
Properties:
1. u · v = v · u.
2. (λu) · v = λ(u · v).
3. (u + v) · w = (u · w) + (v · w).
4. u · u ≥ 0, and u · u = 0 if and only if u = 0.


Any operation on a vector space satisfying these properties is called an inner product, and a vector space equipped with an inner product is called a Euclidean space.
Cauchy-Schwarz inequality: (u · v)² ≤ (u · u)(v · v).
Definition: The norm of a vector v ∈ R^n is defined as

‖v‖ = √(v · v).

Geometric interpretation: The norm of a vector corresponds to its distance to 0.
Properties:
1. ‖λv‖ = |λ| ‖v‖ for all λ ∈ R.
2. ‖u + v‖ ≤ ‖u‖ + ‖v‖.
3. ‖v‖ ≥ 0, and ‖v‖ = 0 if and only if v = 0.
1.4.2 Orthogonality

Definition: Two vectors u and v in R^n are orthogonal if u · v = 0.
Geometric interpretation: They form an angle θ = 90°. In fact,

u · v = ‖u‖ ‖v‖ cos(θ).
Definition: If ‖v‖ = 1, then v is called a unit vector.
Theorem: If v1, ..., vk are pairwise orthogonal and different from 0, then they are linearly independent.
Definition: Any collection of n pairwise orthogonal vectors in a Euclidean space of dimension n forms a basis, called an orthogonal basis. If moreover the vectors are unit vectors, the basis is called orthonormal.
Remark: We can always obtain an orthonormal basis from an orthogonal one by dividing each vector by its norm.
Theorem: Every Euclidean space has an orthogonal basis (and therefore an orthonormal one). Proof: the Gram-Schmidt process, sketched below.
Example: The standard basis is an orthonormal basis of Rn .
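A minimal sketch of the Gram-Schmidt process, assuming the input family is linearly independent (input vectors invented for the example):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn an independent family into an orthonormal one with the same span."""
    basis = []
    for v in vectors:
        w = v - sum((v @ q) * q for q in basis)   # subtract projections
        basis.append(w / np.linalg.norm(w))       # normalize the remainder
    return basis

q1, q2 = gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])])
assert abs(q1 @ q2) < 1e-12 and abs(q1 @ q1 - 1) < 1e-12
```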

Theorem: If v1, ..., vn is an orthonormal basis of an n-dimensional vector space, then the coordinates of every vector v in the space with respect to this basis are given by

v = (v · v1)v1 + ··· + (v · vn)vn.
Projection of a vector onto a subspace: Consider a basis v1, ..., vk of a subspace of dimension k in R^n. The vector v in the subspace that is closest to a given vector b in R^n is called the orthogonal projection of b onto the subspace. Observe that since v is in the subspace, it can be written in a unique way as a linear combination of the basis v1, ..., vk. Therefore, we are looking for the coordinates of v in the basis.
Solution to this problem: if v1, ..., vk forms an orthonormal basis of the subspace, then the orthogonal projection of b onto the subspace is given by

v = (b · v1)v1 + ··· + (b · vk)vk.
Least squares approximations: Consider a linear system Ax = b that has no solution. Then we want to find a vector e with minimal ‖e‖ and such that the system

Ax + e = b

has a solution. In this case, a solution x* is called a least squares solution.
Remark: this problem can be seen as a particular case of the projection of a vector onto a subspace, where the subspace consists of all the vectors Ax, and we want to find the x* that minimizes the norm ‖Ax* − b‖. Then the solution is the orthogonal projection of b onto the subspace. That is, x* such that

(Ax* − b) · Ax = 0 for all x,

which is equivalent to solving the linear system of equations

A^T A x* = A^T b,

where observe that A^T A is a symmetric matrix.
Example: Fit a line to n data points.
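A sketch of this example with invented data, fitting y = c0 + c1·t by solving the normal equations from above:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])          # made-up measurements

A = np.column_stack([np.ones_like(t), t])   # columns: 1 and t
x = np.linalg.solve(A.T @ A, A.T @ y)       # normal equations A^T A x = A^T y
print(x)                                    # approx [1.09, 0.94]

# The residual e = b - Ax* is orthogonal to every column of A:
assert np.allclose(A.T @ (y - A @ x), 0.0)
```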


1.5 Eigenvalues and eigenvectors (read Sections 6.1, 6.2, 6.4, 6.5 of Strang's book)

1.5.1 Introduction to eigenvalues and eigenvectors

Definition: Let A be an n × n matrix. If x ∈ R^n, x ≠ 0, and λ ∈ R are such that

Ax = λx,

then we say that x is an eigenvector of A with eigenvalue λ.
Remark: The eigenvectors with eigenvalue 0 are the non-zero elements in
the kernel of A.
Property: If x is an eigenvector of A with eigenvalue λ, then cx is also an eigenvector of A with eigenvalue λ for all c ≠ 0.
The equation for the eigenvalues: since Ax = λx ⇔ (A − λI_n)x = 0, λ is an eigenvalue of A ⇔ the columns of the matrix (A − λI_n) are linearly dependent ⇔ rank(A − λI_n) < n ⇔ det(A − λI_n) = 0.
Definition: The equation det(A − λI_n) = 0 is called the characteristic equation of A; its left-hand side is a polynomial of degree n in λ. Therefore, there are at most n eigenvalues of A, but some of them may be complex.
Remark: It can be proved that a symmetric matrix has only real eigenvalues.
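A quick numerical illustration (matrix invented for the sketch): the computed eigenpairs satisfy Ax = λx, and each eigenvalue is a root of the characteristic equation.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                   # symmetric: real eigenvalues
lam, V = np.linalg.eig(A)
print(lam)                                   # 3.0 and 1.0 (order may vary)
for i in range(2):
    assert np.allclose(A @ V[:, i], lam[i] * V[:, i])

# Each eigenvalue makes det(A - lambda*I) vanish:
assert abs(np.linalg.det(A - lam[0] * np.eye(2))) < 1e-9
```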
1.5.2 Diagonalization of a matrix

Definition: We say that an n × n matrix A is diagonalizable if there exists an invertible matrix S and a diagonal matrix D such that

A = SDS^{-1},

which is equivalent to S^{-1}AS = D.
Application: If A is diagonalizable then we can easily compute its square:

A² = AA = SDS^{-1}SDS^{-1} = SD²S^{-1},

and D² is the diagonal matrix whose diagonal entries are the squares of the diagonal of D. By induction, we get that A^m = SD^mS^{-1}.
Application:

$$ e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!} = I_n + A + \frac{A^2}{2!} + \cdots = S\,e^D S^{-1}, $$

and e^D is the diagonal matrix where the diagonal entries are the exponentials of the diagonal of D.
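A sketch of both applications for a made-up symmetric (hence diagonalizable) matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, S = np.linalg.eig(A)               # columns of S = eigenvectors
Sinv = np.linalg.inv(S)

A5   = S @ np.diag(lam**5) @ Sinv       # A^5 = S D^5 S^{-1}
expA = S @ np.diag(np.exp(lam)) @ Sinv  # e^A = S e^D S^{-1}

assert np.allclose(A5, np.linalg.matrix_power(A, 5))
```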

Property: If A is diagonalizable, then det(A) = det(D), which equals the product of the diagonal elements of D.
Spectral Theorem: A is diagonalizable ⇔ A has n linearly independent eigenvectors v1, ..., vn with eigenvalues λ1, ..., λn.
In this case v1, ..., vn are the columns of S and λ1, ..., λn form the diagonal of D.
Consequence: If A is diagonalizable, then det(A) equals the product of
the eigenvalues.
Remark: If A has n different eigenvalues λ1, ..., λn, then the corresponding eigenvectors v1,...,vn are linearly independent. Therefore, in this case A is diagonalizable.
Remark: S is not unique.
Theorem: A diagonalizable matrix is invertible if and only if all its eigenvalues are non-zero. In this case, A^{-1} = SD^{-1}S^{-1}, where D^{-1} is the diagonal matrix with the inverses of the elements in the diagonal of D.
Theorem: If A is a symmetric matrix and λ1 and λ2 are two different eigenvalues, then the corresponding eigenvectors x and y are orthogonal.
Spectral Theorem for symmetric matrices: Let A be an n × n symmetric matrix. Then there exists an orthogonal matrix Q and a diagonal matrix D such that A = QDQ^{-1}. The columns of the matrix Q can be taken as the orthonormal eigenvectors, and the diagonal of D holds the corresponding eigenvalues.
Definition: The trace of a matrix is the sum of the elements in the diagonal.
Properties: tr(λA) = λ tr(A), tr(A^T) = tr(A), tr(A + B) = tr(A) + tr(B), tr(I_n) = n, tr(AB) = tr(BA).
Property: If A is diagonalizable then the trace of A is the sum of the
eigenvalues.
Definition: A square matrix A is idempotent if A2 = A.
Property: If A is idempotent and λ is an eigenvalue of A, then λ^m is also an eigenvalue of A for all m ∈ N.
Theorem: If A is an idempotent matrix, the only eigenvalues of A are zero
or one.


1.5.3 Quadratic forms

Definition: An n × n symmetric matrix A is said to be positive definite if

x^T A x > 0 for all x ≠ 0.

Negative definite if x^T Ax < 0 for all x ≠ 0.
Positive semidefinite if x^T Ax ≥ 0 for all x ≠ 0.
Negative semidefinite if x^T Ax ≤ 0 for all x ≠ 0.
Indefinite if both x^T Ax > 0 and x^T Ax < 0 are possible.
Theorem: A is positive definite if and only if all the eigenvalues of A are > 0. A is negative definite if and only if all the eigenvalues of A are < 0. A is positive semidefinite if and only if all the eigenvalues of A are ≥ 0. A is negative semidefinite if and only if all the eigenvalues of A are ≤ 0. A is indefinite if and only if A has two eigenvalues of different sign.
Theorem: A is positive definite if and only if all the leading principal minors d_k are positive for k = 1, ..., n, where d_k is the determinant of its upper-left k × k submatrix.
A is negative definite if and only if (−1)^k d_k > 0 for all k = 1, ..., n. A numerical check of both criteria is sketched below.
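Checking definiteness of a made-up symmetric matrix both ways, via eigenvalue signs and via leading principal minors:

```python
import numpy as np

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])

eigs = np.linalg.eigvalsh(A)                       # real, since A is symmetric
minors = [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]
print(eigs.min() > 0)                              # True: positive definite
print(all(d > 0 for d in minors))                  # True, consistent criterion
```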
Definition: A quadratic form Q(x1, ..., xn) is a function of the form

$$ Q(x_1, \ldots, x_n) = x^T A x = \sum_{i,j=1}^{n} a_{i,j} x_i x_j, $$

where A is an n × n matrix and x = (x1, ..., xn)^T.


Remark: Since the coefficient of x_i x_j is a_{i,j} + a_{j,i}, the quadratic form doesn't change if we replace both coefficients by (a_{i,j} + a_{j,i})/2, which corresponds to assuming that A is symmetric.


2 Calculus in one variable

References: Domingo Pestana, José Manuel Rodríguez, Elena Romera and Venancio Álvarez, Curso práctico de cálculo y precálculo, Ariel Ciencia.
Ron Larson and Bruce H. Edwards, Calculus, Brooks Cole.
2.1 Sequences in R (Chapter 5 of Pestana's book)

Definition: A sequence of real numbers assigns a number xn to each natural number n = 1, 2, 3, ...
Notation: {xn}_{n∈N} = {x1, x2, x3, ..., xn, ...}
Remark: A sequence may be defined by a formula or by a recursion.
Example: Fibonacci sequence: x1 = x2 = 1, xn = x_{n−1} + x_{n−2}, n = 3, 4, ...
Definition: A sequence is monotone increasing if for every n ∈ N, x_{n+1} ≥ x_n, and strictly monotone increasing if x_{n+1} > x_n.
A sequence is monotone decreasing if for every n ∈ N, x_{n+1} ≤ x_n, and strictly monotone decreasing if x_{n+1} < x_n.
A sequence is monotone if it is monotone increasing or monotone decreasing.
Definition: A sequence is bounded if there exists a real number M > 0 such that for all n ∈ N, xn ∈ [−M, M].
Definition: A sequence converges to a real number x ∈ R if for all ε > 0 there exists N(ε) > 0 such that for all n > N, |xn − x| < ε.
Notation: lim_{n→∞} xn = x.
Definition: A sequence is said to be convergent if there exists x ∈ R such that lim_{n→∞} xn = x. Otherwise, it is called divergent.
Remark: A divergent sequence may have limit +∞ or −∞.
Definition: A sequence converges to +∞ (or −∞) if for all A > 0 there exists N(A) > 0 such that for all n > N, xn > A (or xn < −A).
Notation: lim_{n→∞} xn = +∞ or −∞.
Property: If lim_{n→∞} xn = x and lim_{n→∞} yn = y then

$$ \lim_{n\to\infty}(x_n + y_n) = x + y, \qquad \lim_{n\to\infty} x_n y_n = xy, \qquad \lim_{n\to\infty} \frac{x_n}{y_n} = \frac{x}{y}, $$

the last equality being true only if yn, y ≠ 0.

Criteria of convergence:
Theorem: If lim_{n→∞} xn = lim_{n→∞} yn = x, and for all n ∈ N, xn ≤ an ≤ yn, then lim_{n→∞} an = x.
Remark: This result is also true if x = +∞ or −∞.
Theorem: Every bounded and monotone sequence is convergent.
Remark: Observe that every convergent sequence is bounded.
Definition: A sequence is Cauchy if for all ε > 0 there exists N(ε) > 0 such that for all n, m > N, |xn − xm| < ε.
Theorem: A sequence is convergent if and only if it is a Cauchy sequence.
Subsequences:
Definition: Let n1 < n2 < n3 < ··· be a strictly monotone increasing sequence of positive integers. Then the sequence {x_{n_k}}_{k∈N} = {x_{n_1}, x_{n_2}, ...} is called a subsequence of {xn}.
Property: If lim_{n→∞} xn = x, then lim_{k→∞} x_{n_k} = x for any subsequence.
Theorem (Bolzano-Weierstrass): Every bounded sequence has a convergent subsequence.
Infimum and supremum:
Definition: Let A ⊆ R be a subset of real numbers. Then the infimum of A is the number a in R satisfying:
- a ≤ x for all x ∈ A.
- for any ε > 0 there exists x ∈ A such that a + ε > x.
Notation: inf A = a, and if no such a exists then inf A = −∞.
The supremum of A is the number a in R satisfying:
- a ≥ x for all x ∈ A.
- for any ε > 0 there exists x ∈ A such that a − ε < x.
Notation: sup A = a, and if no such a exists then sup A = +∞.
Definition: inf xn = inf{xn } and sup xn = sup{xn }.
Remark: If a sequence xn is monotone decreasing and bounded below, then lim_{n→∞} xn = inf xn.
Subsequential limit:
Definition: A number a is called a subsequential limit of a sequence xn if
there exists a subsequence convergent to a.

Definition: Let A be the set of subsequential limits of a sequence xn. Then the limit inferior of the sequence is the infimum of A and the limit superior of the sequence is the supremum of A.
Notation: lim inf_{n→∞} xn = inf A and lim sup_{n→∞} xn = sup A.
Remark: Observe that if the sequence xn converges to x, then A = {x}, so

lim inf_{n→∞} xn = lim sup_{n→∞} xn = x.

Observe also that the limit superior and limit inferior are both subsequential
limits.
2.2 Functions of one variable

2.2.1 Limits of functions (Chapter 8 of Pestana's book)

We consider functions f : A ⊆ R → R or f : R → R. A is called the domain of f and {y ∈ R : f(x) = y, x ∈ A} is the image or range of f.
Definition: Let f : A ⊆ R → R and let x0 ∈ R. We say that y is the limit of f at x0 if for every sequence of numbers xn ∈ A convergent to x0, the sequence f(xn) converges to y.
Remark: We do not need that x0 ∈ A.
Notation: lim_{x→x0} f(x) = y.
Properties: If lim_{x→x0} f(x) = y1 and lim_{x→x0} g(x) = y2 then

$$ \lim_{x\to x_0}(f(x) + g(x)) = y_1 + y_2, \qquad \lim_{x\to x_0} f(x)g(x) = y_1 y_2, \qquad \lim_{x\to x_0} \frac{f(x)}{g(x)} = \frac{y_1}{y_2}, $$

the last equality being true only if y2 ≠ 0 and g(x) ≠ 0 in some neighborhood of x0.
Theorem: lim_{x→x0} f(x) = y if and only if for all ε > 0, there exists δ > 0 such that if |x − x0| < δ then |f(x) − y| < ε.
Definition: A function is bounded if there exists a real number M > 0 such that f(x) ∈ [−M, M] for all x ∈ A.
Definition: A function is monotone increasing if for every x < y, x, y ∈ A, f(x) ≤ f(y), and strictly monotone increasing if f(x) < f(y).
A function is monotone decreasing if for every x < y, x, y ∈ A, f(x) ≥ f(y), and strictly monotone decreasing if f(x) > f(y).

2.2.2 Continuity (Chapter 9 of Pestana's book)

Definition: We say that a function f : A ⊆ R → R is continuous at x0 ∈ A if lim_{x→x0} f(x) = f(x0).
If f is continuous at every point of a set B ⊆ A, then we say that f is continuous on B.
Properties: If f, g are continuous at x0 then so are f + g and fg. If moreover g(x) ≠ 0 in a neighborhood of x0, then f/g is continuous at x0.
Theorem: If f : [a, b] → R is continuous, then f is bounded.
Theorem (Weierstrass): Let f : [a, b] → R be continuous and let m = min_{[a,b]} f(x) and M = max_{[a,b]} f(x). Then there exist two points x1, x2 ∈ [a, b] such that f(x1) = m and f(x2) = M. In other words, f achieves its maximum and minimum.
Corollary: If f : [a, b] → R is continuous then it takes any value between m and M.
Definition: Let f : A ⊆ R → R where A is an interval. Then f is uniformly continuous if for all ε > 0 there exists δ > 0 such that if |x − y| < δ then |f(x) − f(y)| < ε.
Remark: Clearly, a uniformly continuous function is continuous. Is the converse true? Yes, if A = [a, b].
Example: f(x) = 1/x is continuous on (0, 1] but not uniformly continuous.
Composition of functions:
Definition: Let g : A → B and f : C → D, with B ⊆ C. Then the composition of f and g is the function h : A → D defined as h(x) = f(g(x)), for all x ∈ A.
Notation: h = f ∘ g.
Theorem: If f and g are continuous, so is h = f ∘ g.
Inverse of a function:
Definition: Let f : [a, b] → R be a strictly monotone continuous function with f(a) = c and f(b) = d. Observe that f takes every value between c and d. We define the inverse function of f as the function f^{-1} : [c, d] → [a, b] such that

f^{-1}(y) = x ⇔ f(x) = y.

Theorem: f^{-1} is also continuous.

2.2.3 Differentiation (Chapters 10, 11, 14 of Pestana's book)

Definition: Let f : [a, b] → R. The derivative of f at x is defined as

$$ f'(x) = \lim_{t \to x} \frac{f(t) - f(x)}{t - x}, $$

provided that the limit exists. If the limit exists and is finite, we say that f is differentiable at x.
Theorem: If f is differentiable at x then f is continuous at x.
Example: The function f (x) = |x| is continuous at 0 but not differentiable
at 0.
Remark: There exist functions which are continuous everywhere but are
not differentiable at any point.
Geometric interpretation: If f is differentiable at x0, then the equation of the tangent line to the function f at the point x0 is

y = f(x0) + f'(x0)(x − x0).

That is, the derivative of f at x0 is the slope of this tangent line.
Properties: If f and g are differentiable at x ∈ [a, b], then so are f + g, fg and f/g (the last one provided that g(x) ≠ 0), and the derivatives are given by

$$ (f+g)'(x) = f'(x) + g'(x), \qquad (fg)'(x) = f'(x)g(x) + g'(x)f(x), \qquad \left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - g'(x)f(x)}{g^2(x)}. $$
Examples:
1. f(x) = c, then f'(x) = 0, where c is a real constant.
2. f(x) = x, then f'(x) = 1.
3. f(x) = x^n, then f'(x) = n x^{n−1}, where n is a positive integer.
4. f(x) = log(x), then f'(x) = 1/x.
5. f(x) = sin(x), then f'(x) = cos(x).

6. f(x) = cos(x), then f'(x) = −sin(x).


Chain rule: Let f : [a, b] → R be a continuous function and assume that f is differentiable at x ∈ [a, b]. Let g be defined on the image of f and assume that g is differentiable at f(x). Then h = g ∘ f is differentiable at x and

h'(x) = g'(f(x)) f'(x).
Application: Let f : [a, b] → R be a strictly monotone continuous function with f(a) = c and f(b) = d. Let g : [c, d] → [a, b] be its inverse. Assume that f is differentiable at x. Then g is differentiable at y = f(x) if and only if f'(x) ≠ 0, and in this case

$$ g'(y) = \frac{1}{f'(g(y))}. $$
Definition: Let f : [a, b] → R. A point x ∈ [a, b] is called a local maximum if there exists δ > 0 such that f(x) ≥ f(y) for all y ∈ [a, b] such that |x − y| < δ. A local minimum satisfies f(x) ≤ f(y).
Theorem: If f has a local minimum or maximum at x ∈ (a, b) and f is differentiable at x, then f'(x) = 0.
Example: f(x) = x³, f'(0) = 0 but 0 is not a local maximum or minimum.
Mean value theorem: If f is continuous on [a, b] and differentiable on (a, b), then there exists x ∈ (a, b) such that

$$ f'(x) = \frac{f(b) - f(a)}{b - a}. $$

In particular, if f(b) = f(a), then there exists x ∈ (a, b) such that f'(x) = 0 (Rolle's theorem).
Geometric interpretation: (f(b) − f(a))/(b − a) is the slope of the secant line connecting the points (a, f(a)) and (b, f(b)). Thus, the mean value theorem says that there exists x ∈ (a, b) such that the tangent line to f at x is parallel to the secant line.
Theorem: Consider f differentiable in (a, b). Then f is monotone increasing in (a, b) if and only if f' ≥ 0 in (a, b), and monotone decreasing in (a, b) if and only if f' ≤ 0 in (a, b). Moreover, f is constant in (a, b) if and only if f' = 0 in (a, b).

L'Hôpital's rule: Let f, g be two differentiable functions in (a, b) with g' ≠ 0 in (a, b). Assume that one of the two hypotheses is satisfied:
(a) lim_{x→a} f(x) = lim_{x→a} g(x) = 0.
(b) lim_{x→a} f(x) = lim_{x→a} g(x) = +∞ or −∞.
Then,

$$ \lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}. $$

Remark: a can be +∞ or −∞, and lim_{x→a} f'(x)/g'(x) can be +∞ or −∞.
Derivatives of higher order:
Definition: If f has a derivative f' and f' is differentiable, we denote its derivative by f'' and we call it the second derivative of f. In the same way, when they exist, we can define higher order derivatives, and f^{(n)} is called the nth order derivative of f.
Taylor's theorem: Let f be defined on [a, b] such that f^{(n−1)} is continuous in [a, b] and differentiable in (a, b) (that is, f^{(n)} exists in (a, b)). Then, there exists x ∈ (a, b) such that

$$ f(b) = f(a) + f'(a)(b-a) + \frac{f''(a)}{2!}(b-a)^2 + \frac{f^{(3)}(a)}{3!}(b-a)^3 + \cdots + \frac{f^{(n)}(x)}{n!}(b-a)^n. $$

This is called the nth order Taylor expansion of f around a.
Remark: This theorem says that differentiable functions may be locally approximated by a polynomial, and the term f^{(n)}(x)/n! (b − a)^n is called the error of order n of this approximation. For example, if n = 2 and f'' is bounded, then the error is of order (b − a)².
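A numerical illustration of the error order (the function is chosen for the sketch): for f(x) = e^x around a = 0 with n = 2, the error of the linear approximation 1 + b behaves like (b − a)², so halving b roughly divides the error by 4.

```python
import math

for b in (0.4, 0.2, 0.1):
    err = abs(math.exp(b) - (1 + b))
    print(b, err, err / b**2)        # err/b**2 stays near 1/2 = f''(0)/2!
```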
Theorem: Let x0 be such that f'(x0) = 0 and assume that f'' is continuous at x0. Then if f''(x0) < 0, f has a local maximum at x0, and if f''(x0) > 0, it is a local minimum.
Definition: A function is called convex if for all a, b and λ ∈ (0, 1)

$$ f((1-\lambda)a + \lambda b) \le (1-\lambda)f(a) + \lambda f(b), $$

and concave if the inequality is ≥.
Theorem: If f'' ≥ 0, then f is convex, and if f'' ≤ 0, then it is concave.
Property: If f and g are convex (or concave), then f + g is convex (or concave).

2.2.4 Integration (Chapters 12, 13 of Pestana's book)

The definite integral:
Definition: Let f : [a, b] → R be a bounded function. Consider a partition of [a, b] into n intervals determined by the points

a = x0 < x1 < ··· < xn = b.

We define the following quantities for i = 1, ..., n:

M_i = sup{f(x) : x ∈ [x_{i−1}, x_i]},   m_i = inf{f(x) : x ∈ [x_{i−1}, x_i]}.

We also define the upper sum and lower sum by

$$ U(f) = \sum_{i=1}^{n} M_i (x_i - x_{i-1}), \qquad L(f) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}). $$

If there exists a real number I for which

sup L(f) = inf U(f) = I,

where the supremum and infimum are taken with respect to all possible partitions of [a, b], we say that f is integrable in [a, b] and we denote the definite integral by

$$ \int_a^b f(x)\,dx = I. $$

Geometric interpretation: The definite integral of an integrable continuous positive function in [a, b] equals the area enclosed between the x-axis and the graph of f.
Remark: This definition can be extended to infinite intervals or intervals where f is unbounded. In this case, they are called improper integrals.
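A sketch of upper and lower sums for a made-up integrand, f(x) = x² on [0, 1]; since f is increasing there, the suprema and infima sit at the interval endpoints.

```python
import numpy as np

def upper_lower(n):
    x = np.linspace(0.0, 1.0, n + 1)  # uniform partition of [0, 1]
    f = x**2
    widths = np.diff(x)
    U = np.sum(f[1:]  * widths)       # M_i = f(x_i) for increasing f
    L = np.sum(f[:-1] * widths)       # m_i = f(x_{i-1})
    return U, L

for n in (10, 100, 1000):
    print(n, upper_lower(n))          # both sums approach the integral 1/3
```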
Properties: If f and g are integrable functions in [a, b] and c ∈ R then
1. f + g is integrable in [a, b] and ∫_a^b (f(x) + g(x)) dx = ∫_a^b f(x) dx + ∫_a^b g(x) dx.
2. cf is integrable in [a, b] and ∫_a^b cf(x) dx = c ∫_a^b f(x) dx.
3. If m ≤ f(x) ≤ M for all x ∈ [a, b], then m(b − a) ≤ ∫_a^b f(x) dx ≤ M(b − a).
4. If c ∈ [a, b] then ∫_a^b f(x) dx = ∫_a^c f(x) dx + ∫_c^b f(x) dx.
Notation: ∫_b^a f(x) dx = −∫_a^b f(x) dx.
Theorem: If f is continuous in [a, b] then f is integrable in [a, b]. In fact, if f is continuous at all but finitely many points of [a, b], then f is integrable in [a, b].
Fundamental theorem of calculus: If f is continuous in [a, b], then the function F(x) = ∫_a^x f(t) dt is differentiable in [a, b] and F'(x) = f(x). F is called a primitive of f and we have that

$$ \int_a^b f(x)\,dx = F(b) - F(a). $$

Mean value theorem for integrals: Let f be continuous in [a, b]. Then there exists z ∈ (a, b) such that

$$ f(z) = \frac{1}{b-a} \int_a^b f(x)\,dx. $$

Proof: Apply the mean value theorem to the primitive function F.
Remark: If F is a primitive of f, then G(x) = F(x) + c is also a primitive of f.
Terminology: A primitive is also called the indefinite integral and is denoted by ∫ f(x) dx = F(x) + c.
Basic primitives:
1. ∫ 1 dx = x + c.
2. ∫ x^n dx = x^{n+1}/(n+1) + c, n ∈ N.
3. ∫ (1/x) dx = ln|x| + c.
4. ∫ e^x dx = e^x + c.
5. ∫ sin(x) dx = −cos(x) + c.
6. ∫ (f(x) + g(x)) dx = ∫ f(x) dx + ∫ g(x) dx.
7. ∫ cf(x) dx = c ∫ f(x) dx.


Methods of integration:
Integration by substitution: Assume that f is continuous and let F be its primitive. Recall that if φ(t) is a differentiable function, then F(φ(t)) is differentiable and

$$ \frac{dF(\varphi(t))}{dt} = F'(\varphi(t))\,\varphi'(t). $$

But since F' = f, we get

$$ \frac{dF(\varphi(t))}{dt} = f(\varphi(t))\,\varphi'(t). $$

Therefore, if x = φ(t),

$$ F(x) = F(\varphi(t)) = \int f(\varphi(t))\,\varphi'(t)\,dt = \int f(x)\,dx. $$

In many cases, the first integral can be computed much more easily than ∫ f(x) dx. Formally, we substitute x by φ(t) and dx by φ'(t) dt.
Integration by parts: Since (fg)' = f'g + g'f, we have that

$$ \int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx. $$
3 Calculus in several variables and optimization

Reference: Besada, García, Mirás and Vázquez, Cálculo de varias variables, Prentice Hall, 2001.
3.1 Functions of several variables

Sequences in R^n: {xn}_{n∈N} where each xn ∈ R^n.
Definition: A sequence is bounded if there exists a real number M > 0 such that for all n ∈ N, ‖xn‖ ≤ M.
Definition: A sequence converges to x ∈ R^n if for all ε > 0 there exists N(ε) > 0 such that for all n > N, ‖xn − x‖ < ε.
Functions in R^n: f : R^n → R. We say that a function f is continuous at x0 ∈ R^n if lim_{x→x0} f(x) = f(x0), that is, for any sequence xn in R^n convergent to x0, the sequence f(xn) converges to f(x0).

Differentiation in R^n: Definition: A function f : R^n → R is differentiable at x0 ∈ R^n if there exists a linear transformation T : R^n → R such that

$$ \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0) - T(h)}{\|h\|} = 0. $$

In this case Df(x0) = T.


Example: If f is a linear transformation, then Df (x0 ) is a constant transformation.
Matrix of T: Consider the standard basis e1,...,en of R^n. Then the columns of the matrix of T in this basis are T(e1),...,T(en); thus we obtain a 1 × n matrix (a row vector). This vector is called the gradient of f at x0 and is denoted ∇f(x0).
Directional derivative: Let u be a unit vector of R^n. The directional derivative of f : R^n → R at x0 ∈ R^n in the direction of u, if it exists, is defined as

$$ D_u f(x_0) = \lim_{t \to 0} \frac{f(x_0 + tu) - f(x_0)}{t}. $$
Partial derivatives: The kth partial derivative of f : R^n → R at x0 ∈ R^n is the directional derivative of f at x0 in the direction of ek, the kth element of the standard basis of R^n. We denote it by ∂f/∂x_k(x0).
Theorem: If f is differentiable at x0, then all partial derivatives of f at x0 exist and

$$ \nabla f(x_0) = \left( \frac{\partial f}{\partial x_1}(x_0), \ldots, \frac{\partial f}{\partial x_n}(x_0) \right). $$

Theorem: D_u f(x0) = ∇f(x0) · u.
Geometric interpretation: If ∇f(x0) ≠ 0, then D_u f(x0) is the orthogonal projection of ∇f(x0) in the direction of u. The maximum value of the directional derivative is attained when ∇f(x0) and u point in the same direction.
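A central-difference sketch of the gradient and of D_u f = ∇f · u (the function f, the point x0 and the step size h are invented for the example):

```python
import numpy as np

def f(x):
    return x[0]**2 + 3 * x[0] * x[1]       # f(x, y) = x^2 + 3xy

def num_grad(f, x0, h=1e-6):
    g = np.zeros_like(x0)
    for k in range(len(x0)):
        e = np.zeros_like(x0)
        e[k] = h
        g[k] = (f(x0 + e) - f(x0 - e)) / (2 * h)   # kth partial derivative
    return g

x0 = np.array([1.0, 2.0])
g = num_grad(f, x0)            # analytic gradient: (2x + 3y, 3x) = (8, 3)
u = np.array([1.0, 0.0])       # unit direction e_1
print(g, g @ u)                # directional derivative D_u f(x0) = 8
```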
Definition: The Hessian matrix of a twice differentiable function f : R^n → R at a point x0 is the matrix with entries ∂²f/(∂x_i ∂x_j)(x0).
Theorem: If f has a local maximum or minimum at x0, then ∇f(x0) = 0.
Theorem: Let x0 be such that ∇f(x0) = 0. If the Hessian matrix at x0 is positive definite, then f has a local minimum at x0. If it is negative definite, then it is a local maximum.
