
Linear Algebra

Allan Lin
May 8, 2017


Linear Algebra

important types of matrices

• An m × n matrix with m = n is called a square matrix.

• If A is square and A = A′, then A is called a symmetric matrix. [The square condition is automatic: A = A′ forces the dimensions m × n and n × m to match, so m = n.]
• A diagonal matrix is a square matrix with all non-diagonal elements being zero.
• A square matrix A is diagonalizable if there is an invertible matrix P such that P⁻¹ × A × P = D, where D is a diagonal matrix. [Only square matrices can be diagonalizable, since P⁻¹ × A × P requires A to be square.]

– if A is diagonalizable, then the number of non-zero eigenvalues is called the rank of A and the number of zero eigenvalues of A is called the nullity of A.
– in general (not restricted to diagonalizable matrices), the rank of A is the number of linearly independent rows (or columns) of A, and the nullity is the number of columns minus the rank, i.e. the dimension of the null space.
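
A minimal R sketch of rank and nullity in practice (the 3 × 3 matrix below is an illustrative example, not taken from the notes):

```r
# A 3x3 symmetric matrix whose third row is the sum of the first two,
# so its rows are linearly dependent and its rank is 2.
A <- matrix(c(2, 1, 3,
              1, 2, 3,
              3, 3, 6), nrow = 3, byrow = TRUE)

qr(A)$rank            # rank = 2: number of linearly independent rows/columns
ncol(A) - qr(A)$rank  # nullity = 1: dimension of the null space

# Since A is symmetric (hence diagonalizable), the rank also equals the
# number of non-zero eigenvalues; one eigenvalue is (numerically) zero.
eigen(A)$values
```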

• A matrix that has an inverse is called a non-singular matrix; otherwise it is called a singular matrix.
• we use the quadratic form x′Ax to tell whether A is a positive definite matrix: the result has to be positive for every choice of vector x, except when x is the zero vector.

– a positive definite matrix must always have positive diagonal elements. to extract a diagonal element of a matrix, you can use the standard unit vectors: for instance, with x = (1 0 0)′ the scalar result of x′Ax is the first diagonal element of A. the converse is not true: all diagonal elements can be positive and the matrix may still not be positive definite (see the sketch below).
– positive definite matrices are important because x′Ax has to be positive in a lot of finance applications, such as portfolio variance w′Aw, where w refers to the weights of each investment and A is the covariance matrix.
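
A small R sketch of the unit-vector trick and of why a positive diagonal is not enough (both matrices are arbitrary illustrations):

```r
# e_i' A e_i picks out the i-th diagonal element of A.
A <- matrix(c(4, 1, 0,
              1, 3, 2,
              0, 2, 5), nrow = 3, byrow = TRUE)
e1 <- c(1, 0, 0)
t(e1) %*% A %*% e1   # 4, the (1,1) diagonal element
diag(A)              # all diagonal elements at once

# A positive diagonal is necessary but not sufficient for positive definiteness:
B <- matrix(c(1, 2,
              2, 1), nrow = 2, byrow = TRUE)
diag(B)              # both diagonal entries are positive ...
eigen(B)$values      # ... yet one eigenvalue is negative (3 and -1)
```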

• ways to verify whether a matrix is positive definite (an R sketch follows this list):

– a symmetric matrix A is positive definite if all its eigenvalues are positive. [not every symmetric matrix is positive definite; the eigenvalue condition is what decides it.]
– a symmetric matrix A is positive definite if all its leading principal minors have positive determinant (Sylvester's criterion).
– is there any relationship between the determinant and the eigenvalues? yes: the determinant of a square matrix is the product of all its eigenvalues, so we can calculate the determinant through the eigenvalues.
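
A hedged R sketch of these checks, using an illustrative symmetric matrix:

```r
A <- matrix(c(4, 2, 0,
              2, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

# 1) all eigenvalues positive?
all(eigen(A, symmetric = TRUE)$values > 0)

# 2) Sylvester's criterion: all leading principal minors have positive determinant.
all(sapply(1:nrow(A), function(k) det(A[1:k, 1:k, drop = FALSE]) > 0))

# 3) chol() only succeeds for positive definite matrices (it errors otherwise).
!inherits(try(chol(A), silent = TRUE), "try-error")

# determinant = product of the eigenvalues
det(A)
prod(eigen(A)$values)
```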

• orthogonal matrix W: a matrix whose columns are mutually perpendicular unit vectors. in this context the columns are eigenvectors of the matrix A, so not random vectors, and A has to be a symmetric matrix for its eigenvectors to be orthogonal. [if A is not symmetric, its eigenvectors are generally not orthogonal, so the W built from them is generally not an orthogonal matrix.]

– property of an orthogonal matrix: W⁻¹ = W′, i.e. inverse = transpose. so it is easy to find the inverse now.
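
A minimal R check of the inverse-equals-transpose property, using a 2 × 2 rotation matrix as the illustrative orthogonal matrix:

```r
# Columns of a rotation matrix are perpendicular unit vectors, so W is orthogonal.
theta <- pi / 6
W <- matrix(c(cos(theta), -sin(theta),
              sin(theta),  cos(theta)), nrow = 2, byrow = TRUE)

all.equal(t(W) %*% W, diag(2))  # W'W = I
all.equal(solve(W), t(W))       # inverse equals transpose
```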

• A is a square, symmetric matrix, W is the matrix composed of the eigenvectors of A, and Λ is the diagonal matrix of eigenvalues of A. So everything is now expressed in matrix form.

– if all the eigenvalues of A are non-zero and distinct, we can derive the following:
– AW = WΛ → A = WΛW⁻¹ → A = WΛW′, because W is orthogonal and W⁻¹ = W′.
– so we can decompose A, and this process is called spectral decomposition.
– the relationship A = WΛW⁻¹ says A and Λ are similar matrices, so their traces are the same. tr(A) is the sum of the diagonal elements; therefore the sum of the eigenvalues of A equals the sum of the diagonal elements of A.
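
An R sketch of the spectral decomposition and the trace identity (the symmetric matrix is an arbitrary example):

```r
A <- matrix(c(2, 1, 0,
              1, 2, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

e      <- eigen(A, symmetric = TRUE)
W      <- e$vectors                     # eigenvectors as columns (orthogonal)
Lambda <- diag(e$values)                # diagonal matrix of eigenvalues

all.equal(A, W %*% Lambda %*% t(W))     # A = W Lambda W'
all.equal(sum(diag(A)), sum(e$values))  # tr(A) = sum of eigenvalues
```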

• if A is a positive definite matrix, we can write A = QQ′, where Q is a lower triangular matrix.

– Q is the Cholesky factor (the Cholesky matrix) of A.
– an intuitive way to remember it: Q looks like a square root of A, and it can only exist if A is positive definite, just as only a positive number has a real square root.
– the Cholesky decomposition is a special case of the LU decomposition. LU applies to square matrices in general, while Cholesky applies only to positive definite matrices, so their application scopes are different.
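
A short R sketch; note that base R's chol() returns the upper triangular factor, so the lower triangular Q used here is its transpose:

```r
A <- matrix(c(4, 2,
              2, 3), nrow = 2, byrow = TRUE)   # positive definite

R <- chol(A)   # upper triangular factor, A = R'R
Q <- t(R)      # lower triangular "square root" of A
Q
all.equal(A, Q %*% t(Q))

# chol() fails for a matrix that is not positive definite:
try(chol(matrix(c(1, 2, 2, 1), nrow = 2)))
```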

• A = LU, the LU decomposition

– this decomposition may not always exist.
– LU can make solving equations easier (see the sketch below).
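
A minimal illustration of why LU helps: once A = LU, solving Ax = b reduces to one forward and one backward substitution. The 2 × 2 matrix and its hand-computed factors below are illustrative only.

```r
A <- matrix(c(4, 3,
              6, 3), nrow = 2, byrow = TRUE)
b <- c(10, 12)

L <- matrix(c(1.0, 0,
              1.5, 1), nrow = 2, byrow = TRUE)   # unit lower triangular
U <- matrix(c(4,  3,
              0, -1.5), nrow = 2, byrow = TRUE)  # upper triangular
all.equal(A, L %*% U)                            # check the factorisation

y <- forwardsolve(L, b)   # solve L y = b (cheap triangular system)
x <- backsolve(U, y)      # solve U x = y (cheap triangular system)
x
all.equal(x, solve(A, b))                        # same answer as solving directly
```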

important concepts, linear algebra language

• a singular matrix has at least one row (or column) that is a weighted sum of some other rows (or columns). that is, there is a linear relationship between the rows (or columns), and for this reason the rows (or columns) are called linearly dependent.
• the expression x′Ax is called a quadratic form. when you expand the operation, every term has order 2 in the x variables. there is a pre-requisite: A is a square matrix. [yes, this is necessary: for x′Ax to be defined with the same x on both sides, A has to be n × n.]

– x′Ax = x′Bx, where B = ½ × (A + A′), A is a square matrix and B is a related square AND symmetric matrix. the left and right results are the same. A + A′ generates a symmetric matrix. you can use a 2 × 2 matrix as an example for A and check (see the sketch after this list).
– the quadratic form is important because portfolio variance is expressed as the quadratic form w′V w, where w refers to the investment weights. further, the variance matrix itself can be expressed in a similar form: V = DCD, where V is the variance (covariance) matrix, D is the diagonal matrix of standard deviations, and C is the correlation matrix.
– the quadratic form is also linked with the positive definite matrix definition, because usually we need x′Ax to be positive. in order for the result to be positive for every non-zero x, A has to be a positive definite matrix.
– a correlation matrix is always positive semi-definite; it is positive definite when no variable is an exact linear combination of the others (in the 2 × 2 case, when the correlation is not equal to +1 or −1).
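
An R sketch of the two points above: the symmetrisation identity x′Ax = x′Bx with B = ½(A + A′), and portfolio variance as the quadratic form w′V w with V = DCD. All numbers are illustrative.

```r
# The quadratic form only "sees" the symmetric part of A.
A <- matrix(c(1, 4,
              0, 3), nrow = 2, byrow = TRUE)  # not symmetric
B <- 0.5 * (A + t(A))                         # symmetric part of A
x <- c(2, -1)
t(x) %*% A %*% x                              # same scalar ...
t(x) %*% B %*% x                              # ... as this

# Portfolio variance as a quadratic form w'Vw, with V = D C D.
sigma <- c(0.20, 0.10)                        # volatilities of two assets
C <- matrix(c(1.0, 0.3,
              0.3, 1.0), nrow = 2)            # correlation matrix
D <- diag(sigma)                              # diagonal matrix of volatilities
V <- D %*% C %*% D                            # covariance matrix
w <- c(0.6, 0.4)                              # portfolio weights
t(w) %*% V %*% w                              # portfolio variance (a scalar)
```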

• linear transformation: A × b = c transforms the vector b into the vector c. a multi-dimensional vector still corresponds to a single line through the origin, and the transformation maps lines to lines, so we call it a linear transformation.
• some commonly used terminology for eigenvectors and eigenvalues:

– we call the vector an “eigenvector” of A; it is always associated with a matrix.
– there are multiple eigenvectors belonging to the same eigenvalue; they can differ just by a scalar. therefore, we introduce the concept of a representative normalized eigenvector, which is unique (up to sign).
– the eigenvalue λ belongs to the eigenvector w.

• characteristic equation: it is used to find the eigenvalues. A × w = λ × w → (A − λ × I) × w = 0. a non-zero w only exists if (A − λ × I) is singular, i.e. det(A − λ × I) = 0. here we used the link between determinant and singularity; the whole process of calculating a determinant is then put to use to solve for λ.
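
A small R check of the characteristic equation for a 2 × 2 example, where det(A − λI) = λ² − tr(A)·λ + det(A) and the roots are the eigenvalues (the matrix is an arbitrary illustration):

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

# polyroot() takes coefficients in increasing order of powers:
# det(A) - tr(A)*lambda + lambda^2 = 0
polyroot(c(det(A), -sum(diag(A)), 1))
eigen(A)$values

# At an eigenvalue, A - lambda*I is singular, so its determinant is ~0.
lambda1 <- eigen(A)$values[1]
det(A - lambda1 * diag(2))
```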

• PCA

– Motivation:
∗ Finance part: I observe the varying returns of p yields and I want to understand the fewer q drivers behind these changes. Eventually, I can use a linear combination of the q drivers to recompose the p yields. The linear combination is like a linear regression explaining the change in the p yields by the change in the q drivers.
∗ Math part: We start with p-dimensional vectors and want to summarize them by projecting down into a q-dimensional subspace. Our summary will be the projection of the original vectors onto q directions, the principal components, which span the subspace.
– What is the assumption? The data distribution is unimodal Gaussian in nature, fully explained by the first two moments of the distribution, i.e. the mean and the covariance. The information is in the variance. <That is why the variance is important.>
– Key notation:
∗ X is a T × n matrix, where T is the number of observations (here returns; one day gives one observation) and n is the number of variables.
∗ V = T⁻¹X′X is the variance (covariance) matrix of the samples. the assumption is that the mean return is 0, so the returns in X can represent the variance directly (no demeaning is needed).
∗ Λ is the eigenvalue matrix of V, and W is the eigenvector matrix.
– Derivation rationale
∗ The projection of a data vector x onto the line represented by w is x · w, which is a scalar. The actual coordinate in the original space is (x · w)w. The projection and the original vector do not coincide, and the difference is the error. We are choosing w to minimize the error, or equivalently choosing w to maximize how well x is represented in terms of w. <This maximization is the same as finding a good representation of the original vectors in a reduced space, the motivation for using PCA.>
∗ Given the motivation, our goal is to maximize the variance, i.e. the representation. The first principal component is the direction in space along which projections have the largest variance. The second principal component is the direction which maximizes variance among all directions orthogonal to the first. The k-th component is the variance-maximizing direction orthogonal to the previous k − 1 components. Using the Lagrange multiplier method, we can derive the equation V W = W Λ. Thus the desired vector w is an eigenvector of the covariance matrix V, and the maximizing vector is the one associated with the largest eigenvalue in Λ.
∗ After finding W, the principal components P can be defined as linear combinations of the columns of X. P is the principal component matrix, P = XW. Each individual principal component is pm = w1m·x1 + w2m·x2 + ... + wnm·xn, where wm is the m-th column of W, w1m is its first element, and pm is the m-th column of P, the projection. (This is consistent with the (x · w)w expression: P = XW collects the scalar coordinates x · w, while (x · w)w maps those coordinates back into the original space.)
∗ T⁻¹P′P is the variance of the new principal component matrix, and it can be shown that it equals Λ. Derivation: T⁻¹P′P = T⁻¹W′X′XW = W′(T⁻¹X′X)W = W′V W = W′WΛ = Λ, using V W = WΛ and W′W = I. So the total variance can be expressed through the eigenvalue matrix. It also means the first (largest) λ corresponds to the biggest variation, and it corresponds to the first eigenvector and the first principal component.
∗ if P = XW, we can also write X = PW′ (because W is orthogonal, W⁻¹ = W′), using the principal components to represent the original observations. therefore, the principal components become the underlying drivers.
∗ wi is the coefficient and can be interpreted as the sensitivity of X to P. If the variables in X are highly correlated (they move together with the same sign and similar magnitude), the sensitivities wi to the first few components will be similar, i.e. the columns of X will move by similar amounts given a move in the first Pi.
∗ Note that T⁻¹X′X = V and T⁻¹P′P = Λ are not equal as matrices, but they are similar and have the same trace, so the total variance in X is fully captured by the variance in P. That is why, if P can represent X, the variance in X can be represented by the variance in P.
– Visualization
∗ plot the principal components; you can notice that their volatility is close to the volatility of X.
∗ plot the first eigenvector in the curve example and see the parallel-shift impact on the first principal component. The shift is related to the eigenvector, not the principal component itself. The first principal component can demonstrate a certain volatility, and there is no economic explanation for the first principal component; it is a purely statistical object.
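
A compact R sketch of the PCA mechanics described above, on simulated zero-mean returns (the data, dimensions, and names are illustrative; base R's prcomp() performs essentially these steps):

```r
set.seed(1)
T_obs <- 500; n <- 4
Sigma <- 0.7 ^ abs(outer(1:n, 1:n, "-"))       # illustrative covariance matrix
X <- matrix(rnorm(T_obs * n), T_obs, n) %*% chol(Sigma)
X <- scale(X, center = TRUE, scale = FALSE)    # enforce mean 0, as in the notes

V <- t(X) %*% X / T_obs                        # V = T^{-1} X'X
e <- eigen(V, symmetric = TRUE)
W <- e$vectors                                 # eigenvector (loadings) matrix
Lambda <- e$values                             # eigenvalues, largest first

P <- X %*% W                                   # principal components, P = XW
round(t(P) %*% P / T_obs, 6)                   # ~ diag(Lambda): PC variances
Lambda

X_back <- P %*% t(W)                           # reconstruction, X = PW'
all.equal(X_back, X, check.attributes = FALSE)

Lambda[1] / sum(Lambda)                        # share of variance from the 1st PC
```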

How things are all related together

questions to follow up

• why are determinants important? it is also not easy to calculate determinants; is there an easy way to calculate the determinant of a big matrix?

– the determinant can decide the singularity of a matrix. the way a determinant is calculated can also be used to set up the characteristic equation and solve for something else.
– people use determinants to check whether a matrix is positive definite. but since the determinant calculation process is so long, so unintuitive and so complicated, is this testing method widely used?

• how can we use eigenvalues and eigenvectors to study the properties of a matrix?

– a square matrix is singular if and only if it has at least one zero eigenvalue. we can use eigenvalues to understand a square matrix’s invertibility.
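
A small R illustration of this point: a matrix with a linearly dependent row has a zero eigenvalue, a zero determinant, and no inverse (the matrix is an arbitrary example):

```r
# Third row is the sum of the first two, so the matrix is singular.
A <- matrix(c(1, 0, 1,
              0, 2, 2,
              1, 2, 3), nrow = 3, byrow = TRUE)

eigen(A)$values   # one eigenvalue is (numerically) zero
det(A)            # ~0: the determinant is the product of the eigenvalues
try(solve(A))     # inverting a singular matrix fails
```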

• what is the purpose of knowing the rank and nullity of a matrix?

• a lot of matrix properties come from special matrices: diagonal matrices, square matrices, symmetric matrices, etc. now, if in real life the matrix is none of the above kinds, then we can't leverage any of those properties. how do we study such matrices, and what can we do with them?

some notes I don’t know where to put yet

learning objectives
• summarize the different matrices and their properties, and the pre-requisite conditions for some of those properties.
• why study decompositions?
• how to check whether a matrix is a positive definite matrix.
• importance and applications of positive definite matrices.
• meaning and purpose of some definitions, like determinants.
• need to document 1) simple terms 2) matrix terms in the context of a portfolio. start from the simple terms to understand the relationships, then use matrix forms to represent them because we are in a portfolio context.
