
1 INTRODUCTION

A Statistical Approach In Cosmology

According to the theory of Inflation, primordial quantum fluctuations left imprints in the structure of the Cosmic Microwave Background and in the distribution of galaxies. Due to the stochastic nature of quantum processes, our Universe is just one of all the possible Universes that could have arisen from Inflation.
Cosmological theories allow us to make statistical predictions about the cosmological (random) fields of the Universe, but not about the details of our particular realization [Guido Walter Pettinari]. For example, we cannot predict that a given galaxy will form at a specific 'point' in space and time, or that a specific patch of the Cosmic Microwave Background will have some fixed temperature [Verde]. But we can predict average statistical properties of the cosmological fields while observing only a particular realization of them.
In the next sections we acquire the tools and language needed to treat cosmological problems properly.

1.1 pills of probability

There are two schools of thought that define probability:

• Frequentists: probability is the limiting relative frequency of successes in a long sequence of trials

• Bayesians: probability is a measure of the degree of belief in a proposition based on our available knowledge

We will take the second as our definition of probability, trying to quantify our uncertainty about the world.
Let A, B, C, ... denote propositions (e.g. that it will be sunny tomorrow). Let Ω be a sample space of the possible outcomes of an experiment (e.g. it will be sunny, it will rain, etc.).
Some notation:

• Ā denotes the proposition that A is false

• I is the relevant background information at hand, i.e. any relevant information that is assumed to be true

• prob(A|B, I) is the probability of A being true given that B (and I) is true

• prob(A, B|I) = prob(B, A|I) is the (joint) probability that both A and B are true (given I)

The two basic rules of probability theory are the sum rule,

prob(A|I) + prob(Ā|I) = 1

and the product rule,

prob(A, B|I) = prob(A|B, I) × prob(B|I)    (1.1)

From the product rule it is possible to derive Bayes' theorem,

prob(A|B, I) = prob(B|A, I) × prob(A|I) / prob(B|I)    (1.2)

which in the third chapter will become the foundation of our signal estimation.
Finally we introduce the concept of marginalization. The probability that A is true, irrespective of B, is given by¹

prob(A|I) = ∫ prob(A, B|I) dB    (1.3)
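
To make the rules concrete, here is a minimal numerical sketch in Python (an illustration only; the joint probabilities are arbitrary toy numbers, and we use the discrete version of (1.3), i.e. a sum instead of an integral):

import numpy as np

# Joint probabilities prob(A, B | I) for binary propositions:
# rows index A in {true, false}, columns index B in {true, false}.
joint = np.array([[0.20, 0.15],
                  [0.30, 0.35]])

# Marginalization, eq. (1.3): prob(A | I) = sum over B of prob(A, B | I)
prob_A = joint.sum(axis=1)
prob_B = joint.sum(axis=0)

# Product rule, eq. (1.1): prob(A | B, I) = prob(A, B | I) / prob(B | I)
prob_A_given_B = joint[0, 0] / prob_B[0]

# Bayes' theorem, eq. (1.2): prob(A | B, I) = prob(B | A, I) prob(A | I) / prob(B | I)
prob_B_given_A = joint[0, 0] / prob_A[0]
bayes = prob_B_given_A * prob_A[0] / prob_B[0]

print(np.isclose(bayes, prob_A_given_B))   # True: the two routes agree
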

1.2 random fields

1.2.1 Random variables

Let Ω be a sample space of the possible outcomes of an experiment. A (continuous) random variable X : Ω → R is a function that assigns a number to each element of Ω. We will assign a probability density function (pdf) to X, defined over the space of numbers:

p_X(X = x) dx ≡ p(x) dx    (1.4)

with dx the Lebesgue measure on R, and

• p(x) ≥ 0

• ∫ p(x) dx = 1

The pdf gives the probability p(x)δx of finding x < X ≤ x + δx as δx → 0.

1 There is also a discrete version, with the integral replaced by a sum; because we will deal only with continuous variables, the integral form is given.

Some definitions
Any expectation value of a function f(X) defined over R can be obtained directly from the pdf:

⟨f(X)⟩ = E[f(X)] = ∫ f(x) p(x) dx .    (1.5)

In general, R could be replaced by any other set of values A² (e.g. Rⁿ).


Let's take two random variables X₁ and X₂, with sets of values A₁ and A₂ respectively. We can think of (X₁, X₂) as a random variable taking values in (a subset of) A₁ × A₂. The distribution of (X₁, X₂) is called the joint probability distribution p_X₁X₂(x₁, x₂), while the distributions of X₁ and of X₂ are the marginal distributions. For example, the marginal distribution of X₂ is

p_X₂(x₂) = ∫ p_X₁X₂(x₁, x₂) dx₁ .    (1.6)

These definitions can be generalized to several random variables.
We also introduce the conditional probability density, which is computed by considering only the cases where X₂ takes the fixed value x₂,

p_X₁|X₂(x₁|x₂) = p_X₁X₂(x₁, x₂) / ∫ p_X₁X₂(x₁, x₂) dx₁ .    (1.7)

We say that X₁ and X₂ are statistically independent if

p_X₁|X₂(x₁|x₂) = p_X₁(x₁) .    (1.8)

In this case their joint pdf is simply p_X₁X₂(x₁, x₂) = p_X₁(x₁) p_X₂(x₂).

Moments and Cumulants

The characteristic function of the random variable X is

χ_X(u) = ⟨e^{iuX}⟩ = ∫ p_X(x) e^{iux} dx    (1.9)

which is simply the Fourier transform of the pdf. From this we can define the n-th moment µ_n of X, when it exists, by

µ_n = ⟨Xⁿ⟩ = (1/iⁿ) dⁿχ_X(u)/duⁿ |_{u=0}    (1.10)

The n-th cumulant of X is instead

κ_n = (1/iⁿ) dⁿ ln χ_X(u)/duⁿ |_{u=0}    (1.11)
2 Which has to be measurable.

If the n-th moments and n-th cumulants are defined for each n, then it is possible to perform a Maclaurin expansion of χ_X and ln χ_X:

χ_X(u) = 1 + Σ_{n=1}^∞ ((iu)ⁿ/n!) µ_n    (1.12)

ln χ_X(u) = Σ_{n=1}^∞ ((iu)ⁿ/n!) κ_n    (1.13)

The first n cumulants can be related to the first n moments (and vice versa; here µ₁ is the mean and, for n ≥ 2, µ_n denotes the central moments). Here are the relations up to order 6 [Lacasa - 2014 - Non-Gaussianity and extragalactic foregrounds to the Cosmic Microwave Background.pdf]:

κ₁ = µ₁ ,   µ₁ = κ₁    (1.14)

κ₂ = µ₂ ,   µ₂ = κ₂    (1.15)

κ₃ = µ₃ ,   µ₃ = κ₃    (1.16)

κ₄ = µ₄ − 3µ₂² ,   µ₄ = κ₄ + 3κ₂²    (1.17)

κ₅ = µ₅ − 10µ₃µ₂ ,   µ₅ = κ₅ + 10κ₃κ₂    (1.18)

κ₆ = µ₆ − 15µ₄µ₂ − 10µ₃² + 30µ₂³ ,   µ₆ = κ₆ + 15κ₄κ₂ + 10κ₃² + 15κ₂³    (1.19)

µ₁ is the mean of X. Usually κ₂ is denoted σ², the variance. The skewness is defined as κ₃/κ₂^{3/2} = κ₃/σ³ and the (excess) kurtosis as κ₄/σ⁴.
The probability density function can be obtained from the characteristic function by an inverse Fourier transform. Since the characteristic function is completely determined by its cumulants or moments, the same is true for the pdf. In practice we will measure only a finite number of moments and truncate the expansion. This is an important fact to keep in mind for section 1.5.
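
As a numerical illustration of these relations, here is a hedged Python sketch (the sample is a toy zero-mean exponential variable, so the relations above hold with the central moments; scipy's kstat gives unbiased cumulant estimates only up to order 4):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=1_000_000) - 1.0   # zero-mean, skewed sample

mu = {n: stats.moment(x, moment=n) for n in (2, 3, 4)}   # central moments
kappa = {n: stats.kstat(x, n) for n in (2, 3, 4)}        # cumulant estimates

print(kappa[2], mu[2])                    # kappa_2 = mu_2            (1.15)
print(kappa[3], mu[3])                    # kappa_3 = mu_3            (1.16)
print(kappa[4], mu[4] - 3 * mu[2]**2)     # kappa_4 = mu_4 - 3 mu_2^2 (1.17)

For the exponential variable used here the exact values are κ₂ = 1, κ₃ = 2, κ₄ = 6, so the agreement can also be checked against theory.
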

example: the 1-d gaussian distribution    This is

N(x|µ, σ²) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))    (1.20)

where µ and σ² are two parameters. Their meaning can be deduced by calculating the cumulants:

κ₁ = µ    (1.21)

κ₂ = σ²    (1.22)

κ_n = 0  ∀ n ≥ 3    (1.23)

Equation (1.23) is really important. It tells us that if a random variable's distribution is Gaussian then all cumulants of order higher than two are zero. If this is not the case we have a non-Gaussian pdf.
On the other hand, the higher-order moments are non-zero for both Gaussian and non-Gaussian distributions, but in the Gaussian case they are completely specified by the first two cumulants (see (1.14) and following).
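
A quick sampling check of (1.21)-(1.23) (a toy sketch; the estimates fluctuate at the 1/√N level):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=2.0, size=1_000_000)     # Gaussian, sigma = 2

print(stats.kstat(x, 3), stats.kstat(x, 4))            # both consistent with 0
print(stats.moment(x, 4), 3 * stats.kstat(x, 2)**2)    # mu_4 = 3 kappa_2^2 = 48
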

1.2.2 Random Fields

An n-dimensional random field q(~x) is a collection of random variables, indexed by a continuous (real) space M of dimension n and characterised by a probability functional P[q̂(~x)], linked to the probability of occurrence of a particular configuration of the field (a realization).
A realization of the field, q̂(~x), is a deterministic function of the position ~x³ that represents one of the possible outcomes of the random field [Guido Walter Pettinari].
We will take the q's from the space of square integrable functions defined at each point of M, with a scalar product and a norm.
The probability density functional has the following three properties [Matarrese]:

1. it is positive semi-definite

2. it can be normalized to unity⁴

3. it goes to zero as q → ±∞

3 And possibly of a time t, which we omit for clarity.
4 ∫ D[q(~x)] P[q̂(~x)] = 1, where the integral is a path integral over field configurations.

The statistical (ensemble) average⁵ of some functional F[q] can be obtained as⁶ [Matarrese]:

⟨F[q]⟩ ≡ ∫ D[q] F[q] P[q]    (1.24)

The meaning of the ensemble average is that we consider many realizations (similarly prepared systems) belonging to a large set, the ensemble. We measure the quantity inside the brackets for each realization (e.g. Universe) and then average it over the many realizations (Universes).

The n-point correlation functions

Given the three conditions above, we can introduce a generating functional⁷ (also known as a partition function)

Z[J(~x)] = ∫ D[q(~x)] P[q(~x)] exp[i ∫ d~x J(~x) q(~x)]    (1.25)

and define the N-point disconnected correlation functions D^(N) as the coefficients of its Maclaurin expansion [bertinoro]:

Z[J(~x)] = 1 + Σ_{N=1}^∞ (i^N/N!) ∫ d~x₁ ... ∫ d~x_N D^(N)(~x₁, .., ~x_N) J(~x₁)...J(~x_N) .    (1.26)

Or, in other words, by functional differentiation,

D^(N)(~x₁, .., ~x_N) = (−i)^N δ^N Z[J] / (δJ(~x₁)...δJ(~x_N)) |_{J=0}

But this is

(−i)^N δ^N Z[J] / (δJ(~x₁)...δJ(~x_N)) |_{J=0} = ∫ D[q] P[q] q(~x₁)...q(~x_N)

In the end

D^(N)(~x₁, .., ~x_N) = ∫ D[q] P[q] q(~x₁)...q(~x_N) ≡ ⟨q(~x₁)...q(~x_N)⟩    (1.27)

Similarly, we can define the connected correlation functions C^(n) as the coefficients of the generating functional of connected correlation functions,

ln Z[J(~x)] = Σ_{n=1}^∞ (iⁿ/n!) ∫ d~x₁ ... ∫ d~x_n C^(n)(~x₁, .., ~x_n) J(~x₁)...J(~x_n)    (1.28)
5 Denoted by the brackets.
6 Formally, inside each argument of the integral there should appear q̂(~x); we omit it for brevity.
7 Note the analogy with the characteristic function of random variables.

denoted also as⁸

⟨q(~x₁), .., q(~x_n)⟩_c = C^(n)(~x₁, .., ~x_n)    (1.29)

As the pdf of a random variable is fully specified by its moments or cumulants, a random field with a given probability distribution is specified by its correlation functions (connected or disconnected).

1.2.3 Statistical Homogeneity and Isotropy in Cosmology

We have developed theoretical tools to compute quantities averaged over the ensemble of all the possible Universes, of which ours is one realization. In practice these averages are impossible to perform, since our observations cannot probe the entire single realization in which we live, but only a part of it. Yet we would like to have information about the whole Universe. How can we link data to the underlying theory?
A fundamental assumption in Cosmology is the statistical cosmological principle: on large scales⁹ the Universe is (statistically speaking) homogeneous and isotropic. What does this mean?
A random field q(~x) is statistically homogeneous if ∀n the joint multipoint probability distribution functions P(q₁, q₂, ..., q_n)¹⁰ are invariant under translation (by the same vector) of ~x₁, ~x₂, ..., ~x_n [LSS-0112551.pdf]. For example, this implies that ξ = ⟨q(~x₁)q(~x₂)⟩ depends only on the relative position.
From this assumption follows an important fact. The connected correlation functions ⟨q(~x₁), .., q(~x_n)⟩_c vanish if at least two points belong to causally disconnected regions of the Universe¹¹ [Guido Walter Pettinari]. Therefore, if we take widely separated parts of the Universe, each one has its own set of connected correlation functions. Assuming statistical homogeneity, these will coincide with those of any other region. This means that different regions of our Universe have the same statistical properties (e.g. the same averages). This is closely related to the fair sample hypothesis: well separated regions of the Universe are independent realizations of the same physical process (they are realizations inside the realization), and the observable part of the Universe contains enough independent samples to be representative of the statistical ensemble (of all possible Universes) [Verde]. Then, taking spatial averages over many regions is the same as taking expectations over the ensemble.
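
A toy numerical illustration of this point (everything here is an assumption made for the sake of the sketch: a 1-D periodic Gaussian field with an arbitrary power spectrum, generated by drawing random Fourier amplitudes):

import numpy as np

rng = np.random.default_rng(3)
N = 2**14
k = np.fft.rfftfreq(N)
Pk = 1.0 / (1.0 + (k / 0.01)**2)          # toy power spectrum

def realization():
    # random phases -> a statistically homogeneous Gaussian field
    amp = rng.normal(size=k.size) + 1j * rng.normal(size=k.size)
    return np.fft.irfft(amp * np.sqrt(Pk), n=N)

def xi_spatial(q, r):
    # <q(x) q(x+r)> estimated as a spatial average within one realization
    return np.mean(q * np.roll(q, r))

r = 7
print(xi_spatial(realization(), r))                                 # one 'Universe'
print(np.mean([xi_spatial(realization(), r) for _ in range(300)]))  # ensemble

The two numbers agree up to sampling noise: for a homogeneous field containing many correlation lengths, the spatial average within a single realization is a fair estimate of the ensemble average.
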
The random field q(~x) is statistically isotropic if P(q₁, q₂, ..., q_n) is invariant under global spatial rotations (~x → R~x). If we add

8 Note the commas, indicating that we are not taking averages.

9 Scales much greater than the size of gravitationally collapsed structures [http://www.astro.ufl.edu/~vicki/galaxies/cosmologynotes.pdf].
10 Shorthand notation: q_i = q(~x_i).
11 These exist due to the finite speed of light.

this to the previous example, then ξ will depend only on the modulus of the relative position¹². This assumption is useful when we take averages of statistical quantities. [Komatsu phd], [Hu]

A simplification

When we make practical observations we produce pixelised maps with a finite number of pixels, so that the index space gives us a finite number of indices. From now on, for cosmological applications, we will consider a random field as a (large) random vector ~X = (X₁, ..., X_D)^T of dimension D.

1.3 random fields on a sphere

Our focus will be on the random field f = T of the Cosmic Microwave Background. When we take observations of it from the sky, we only sample a projection on the celestial sphere at the time of last scattering of the photons; in other words, we deal with 2-dimensional maps. Here we describe a convenient spherical decomposition to study a 2-D random field.
The idea is to build our theory on the spherical analog of the Fourier transform, the spherical harmonics¹³.
As said, we will consider the space of square integrable functions. The spherical harmonics form a basis for this space and allow us to express f through a modal expansion (the index is n̂ = (θ, φ))

f(θ, φ) = Σ_{l=0}^∞ Σ_{m=−l}^{+l} a_lm Y_lm(θ, φ) .    (1.30)

This resembles the 2D Fourier series

f(x, y) = Σ_{n_x=0}^∞ Σ_{n_y=0}^∞ A_{n_x n_y} e^{i n_x k_x x} e^{i n_y k_y y}

12 A random field which is homogeneous and isotropic is generally called stationary [Cosmological Perturbations].
13 Spherical harmonics also arise when we study the solutions of the Schrödinger equation for the hydrogen atom. Formally, they are the position-space representations of the eigenstates of L̂² = −∇² and L̂_z = −i∂_φ [Anthony_Challinor_CMB_supplementary_notes_jun13.pdf]:

∇²Y_lm = −l(l + 1)Y_lm

∂_φ Y_lm = im Y_lm

with l ∈ N and m an integer, |m| ≤ l.



The Y_lm's are complete and, like the Fourier modes, respect an orthonormality relation on the sphere

∫_{−π}^{π} ∫_0^π Y_lm(θ, φ) Y*_{l'm'}(θ, φ) sinθ dθ dφ = ∫ dΩ Y_lm(n̂) Y*_{l'm'}(n̂) = δ_{ll'} δ_{mm'}    (1.31)

From here, and from completeness, the (complex) coefficients of the expansion (1.30) are obtained by

a_lm = ∫ dΩ f(n̂) Y*_lm(n̂) = ∫_{−π}^{π} ∫_0^π dθ dφ sinθ f(θ, φ) Y*_lm(θ, φ)    (1.32)

The index l is an integer associated with the number of spatial oscillations (nodes) in the θ direction in spherical coordinates, and the integer m with those in the φ direction (|m| ≤ l) [CMB.pdf].

Example
• l = 0 has zero oscillations in the θ direction: it is a monopole, constant over the whole sphere.

• l = 1 has one full oscillation over the sphere in the θ direction: it is a dipole.

• Higher l values give more spatial oscillations over the sphere's surface and therefore smaller wavelengths (higher frequencies).

For more insights see [CMB.pdf].
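
As a concrete sketch of (1.30) and (1.32), the following Python snippet synthesizes a band-limited map from a few toy a_lm and recovers one coefficient by brute-force quadrature (note the convention swap in scipy: sph_harm takes the azimuthal angle as its third argument):

import numpy as np
from scipy.special import sph_harm

theta = np.linspace(0, np.pi, 400)        # polar angle
phi = np.linspace(0, 2 * np.pi, 800)      # azimuthal angle
PHI, THETA = np.meshgrid(phi, theta)

alm_true = {(0, 0): 1.0, (1, 0): 0.5, (2, 1): 0.3 + 0.2j}   # toy coefficients
f = sum(a * sph_harm(m, l, PHI, THETA) for (l, m), a in alm_true.items())

dtheta, dphi = theta[1] - theta[0], phi[1] - phi[0]

def alm(l, m):
    # eq. (1.32): a_lm = integral of f Y*_lm over the sphere
    integrand = f * np.conj(sph_harm(m, l, PHI, THETA)) * np.sin(THETA)
    return np.sum(integrand) * dtheta * dphi

print(alm(2, 1))   # ~ 0.3 + 0.2j, as put in
print(alm(3, 2))   # ~ 0: this mode is absent from f
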

1.4 gaussian distribution

The Gaussian distribution, also known as the Normal distribution, for a real random vector ~X is

N(~X|~µ, Σ) = (2π)^{−D/2} |Σ|^{−1/2} exp[−½ (~X − ~µ)^T Σ⁻¹ (~X − ~µ)]    (1.33)

where Σ is a D × D covariance matrix, |Σ| its determinant, ~µ is a D-dimensional vector (the mean) and T denotes transposition.
Without loss of generality we can take the covariance matrix to be symmetric, because any antisymmetric component would disappear from the exponential¹⁴.
Defined in this way, the Normal distribution respects the two requirements for a good probability distribution:

• N(~X|~µ, Σ) ≥ 0

• ∫ N(~X|~µ, Σ) d~X = 1

14 Furthermore, for the Gaussian distribution to be well defined, it is necessary that Σ be positive definite [Bishop].

Other properties of the Gaussian distribution:

• E[~X] = ~µ, so ~µ is the mean of the distribution

• E[~X ~X^T] = ~µ ~µ^T + Σ, which groups all the second order moments

• cov[~X] = E[(~X − ~µ)(~X − ~µ)^T] = Σ

• If the covariance matrix is diagonal, the components of ~X are independent. In other words, in the Gaussian case uncorrelatedness implies independence¹⁵. A particular case of this is Σ = σ² I_{D×D}, known as an isotropic covariance; if the surfaces of constant density were plotted, we would obtain spherical surfaces. (These properties are checked numerically in the sketch below.)
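
A minimal sampling sketch, with toy values of ~µ and Σ:

import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=500_000)

print(X.mean(axis=0))              # ~ mu
print(np.cov(X, rowvar=False))     # ~ Sigma
print(X.T @ X / len(X))            # ~ mu mu^T + Sigma
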
There is a very remarkable result that makes Gaussian distributions very special [https://www.physik.uni-muenchen.de/lehre/vorlesungen/sose_14/asp/ex].
Wick's (Isserlis') theorem: for a Gaussian random field q the following relation holds (q_i = q(~x_i))

⟨q_{i₁} q_{i₂} .... q_{i_n}⟩ = Σ_I ⟨q_{j₁} q_{k₁}⟩ .... ⟨q_{j_{n/2}} q_{k_{n/2}}⟩    (1.34)

where the sum runs over all possible pairings of i₁, ...., i_n into pairs (j₁, k₁), ...., (j_{n/2}, k_{n/2}) (n is even).
Proof sketch. Let's start from the following partition function with a source term J and a symmetric, positive definite matrix H (packing the q_i's in ~q)

Z[J] = ∫ exp{−½ Σ_{i,j} H_ij q_i q_j + Σ_i J_i q_i} d~q

where we drop all normalization constants, so that Z[0] = 1. Note that Σ_{i,j} H_ij q_i q_j = ~q · H~q. If ⟨q_q q_r⟩ = ∫ q_q q_r exp{−½ Σ_{i,j} H_ij q_i q_j} d~q (and similarly for higher products), let's show that

⟨∏_i q_i^{a_i}⟩ = ∏_i (∂/∂J_i)^{a_i} Z[J] |_{J=0}    (1.35)

Indeed,

(∂/∂J_i)^{a_i} Z[J] = ∫ exp{−½ ~q · H~q} (∂/∂J_i)^{a_i} exp{Σ_i J_i q_i} d~q =

= ∫ exp{−½ ~q · H~q} q_i^{a_i} exp{Σ_i J_i q_i} d~q

15 Remember that independence means uncorrelation, but the converse is not true for general non-normal distributions.

Then setting J = 0,

∫ exp{−½ ~q · H~q} q_i^{a_i} d~q = ⟨q_i^{a_i}⟩

by definition. Because of the independence of the derivatives with respect to the J_i's, the result follows directly.
Now we complete the square using the identity (H~q − ~J) · H⁻¹(H~q − ~J) = ~q · H~q − 2~J · ~q + ~J · H⁻¹~J. We can then rewrite Z[J] as

Z[J] = ∫ exp{−½ (H~q − ~J) · H⁻¹(H~q − ~J)} exp{½ ~J · H⁻¹~J} d~q =

= exp{½ ~J · H⁻¹~J} ∫ exp{−½ ~y · H⁻¹~y} |H⁻¹| d~y =

= exp{½ ~J · H⁻¹~J} = exp{½ Σ_{i,j} H⁻¹_ij J_i J_j}    (1.36)

after the change of variable ~y = H~q − ~J; the remaining J-independent Gaussian integral equals one by the normalization Z[0] = 1.

Then we see that, for example,

⟨q_i q_j⟩ = (∂/∂J_i)(∂/∂J_j) exp{½ ~J · H⁻¹~J} |_{J=0} = H⁻¹_ij = H⁻¹_ji

⟨q_i⟩ = (∂/∂J_i) exp{½ ~J · H⁻¹~J} |_{J=0} = 0

With the Taylor expansion of Z[J] around zero, and using (1.35),

Z[J] = Σ_{n=0}^{+∞} (1/n!) Σ_{i₁,..,i_n} ∂ⁿZ[J]/(∂J_{i₁}...∂J_{i_n}) |_{J=0} J_{i₁}...J_{i_n} = Σ_{n=0}^{+∞} (1/n!) Σ_{i₁,..,i_n} ⟨q_{i₁}...q_{i_n}⟩ J_{i₁}...J_{i_n} =

= 1 + Σ_i ⟨q_i⟩ J_i + ½ Σ_{i,j} ⟨q_i q_j⟩ J_i J_j + (1/6) Σ_{i,j,k} ⟨q_i q_j q_k⟩ J_i J_j J_k + ....

where each i_j takes values from 1 to D.


On the other hand, the exponential form (1.36) can be expanded as

Z[J] = Σ_{n=0}^{+∞} (1/n!) (½ Σ_{i,j} H⁻¹_ij J_i J_j)ⁿ

Comparing the coefficients of J_a J_b, J_a J_b J_c J_d, etc. in the two expansions, the result follows. End of the proof.
So, the odd N-point (disconnected) correlation functions of a Gaussian are zero, while the even ones are all derivable from the two-point one, in the sense that they decompose into products of the 2-point correlation function, which thus determines the statistical properties of the random field¹⁶.
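
A direct Monte Carlo check of (1.34) for n = 4 (an arbitrary toy covariance; agreement is up to sampling noise):

import numpy as np

rng = np.random.default_rng(5)
Sigma = np.array([[1.0, 0.5, 0.2, 0.1],
                  [0.5, 1.0, 0.3, 0.2],
                  [0.2, 0.3, 1.0, 0.4],
                  [0.1, 0.2, 0.4, 1.0]])
q = rng.multivariate_normal(np.zeros(4), Sigma, size=2_000_000)

# <q1 q2 q3 q4> versus the three pairings of Wick's theorem
lhs = np.mean(q[:, 0] * q[:, 1] * q[:, 2] * q[:, 3])
rhs = (Sigma[0, 1] * Sigma[2, 3] +
       Sigma[0, 2] * Sigma[1, 3] +
       Sigma[0, 3] * Sigma[1, 2])
print(lhs, rhs)
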
The Gaussian distribution arises in many physical applications. One of the most important is when we consider the sum of multiple independent, identically distributed random fields. In this case the central limit theorem applies: as the number of fields increases, their sum, which is itself a random field, has a distribution that becomes increasingly Gaussian [Bishop]. This is connected, for example, with cosmology: in linear perturbation theory, where we have a large number of cosmological fluctuations evolving independently, we can expect, based on the central limit theorem, that the Universe will obey a Gaussian distribution [378203.pdf]. In the next chapter we will also see the CMB example (with inflation).

1.5 non-gaussianity

If the fluctuation field is Gaussian, then the two-point correlation function is all we need to specify the Gaussian distribution. On the other hand, if we need higher-order correlation functions to determine the statistical properties of our distribution, we are dealing with a non-Gaussian pdf. Usually in Cosmology the Gaussian distribution is enough. But when this is not the case, fortunately we are still, in some sense, close to a Normal distribution.

1.5.1 Nearly Gaussian Distributions

As said, in cosmology it is common to deal with fields that depart only weakly from a Normal distribution [LSS-0112551.pdf]. In many applications, to extract useful information on the underlying physical processes, it is more interesting to measure the deviations of a probability density function (pdf) from the Normal distribution than to prove that it is close to the Gaussian one. For example, the three-point function, also known as the bispectrum, is a sensitive test for a non-Gaussian contribution to the fluctuation spectrum, since it is precisely zero in the Gaussian limit. It is possible to find a number of approximations expressing a nearly Gaussian pdf in other ways [9711239v1.pdf]. Here we will deal with one approximation of quasi-Gaussian distributions: the Edgeworth expansion.
The Edgeworth expansion expresses one probability density in terms of another through higher-order statistics.

the 1-pdf edgeworth expansion    Let's consider a random variable X with a probability density function f(X = x) = f(x) and Z(x)
16 One of the most important examples is the Cosmic Microwave Background two-point function.

a Gaussian density function, both with the same mean and variance. If the cumulants κ_i^f and κ_i^Z of f(x) and Z(x), respectively, become 'close' sufficiently quickly as i increases [EdgePeter], then we say that f(x) is close to Gaussian. It seems natural then to use the expansion

f(x) = Σ_{n=0}^∞ c_n dⁿZ(x)/dxⁿ    (1.37)

We now investigate a family of orthogonal polynomials which forms a natural basis for the expansion of f(x). We say that two polynomials P_n(x) and Q_m(x) of degrees n and m are orthogonal on the real axis with respect to a weight function w(x) if

∫_{−∞}^{+∞} P_n(x) Q_m(x) w(x) dx = 0,   n ≠ m

For example, a weight function could be w(x) = w_CH(x) = exp(−x²/2), proportional to the Gaussian function Z(x) = exp(−x²/2)/√(2π). In this case the associated polynomials are given by the Rodrigues formula

He_n(x) = (−1)ⁿ e^{x²/2} dⁿ/dxⁿ e^{−x²/2}    (1.38)
These are called Chebyshev-Hermite polynomials. Examples are

He₀(x) = 1

He₁(x) = x

He₂(x) = x² − 1

He₃(x) = x³ − 3x
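
These polynomials are available in numpy as the 'HermiteE' family, which makes the orthogonality easy to verify with Gauss-HermiteE quadrature (whose weight is exactly w_CH(x) = exp(−x²/2); the normalization ∫ He_n He_m e^{−x²/2} dx = √(2π) n! δ_nm is standard):

import numpy as np
from numpy.polynomial import hermite_e as He

x, w = He.hermegauss(50)   # nodes/weights for the weight exp(-x^2/2)

def inner(n, m):
    cn = np.zeros(n + 1); cn[n] = 1.0   # coefficient vector selecting He_n
    cm = np.zeros(m + 1); cm[m] = 1.0
    return np.sum(w * He.hermeval(x, cn) * He.hermeval(x, cm))

print(inner(3, 3) / (np.sqrt(2 * np.pi) * 6))   # ~ 1  (here n! = 6)
print(inner(3, 2))                              # ~ 0
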

Instead of w_CH we could use w_H(x) = exp{−x²}, proportional to Z²(x). In this case we obtain the Hermite polynomials¹⁷

H_n(x) = (−1)ⁿ e^{x²} dⁿ/dxⁿ e^{−x²}    (1.39)

From the definitions of Z(x) and He_n(x) we have that

dⁿZ(x)/dxⁿ = (−1)ⁿ He_n(x) Z(x)    (1.40)
17 Chebyshev-Hermite polynomials are also called the probabilists' Hermite polynomials, while the Hermite polynomials are the physicists' ones. The two are linked by a rescaling. See Wikipedia.

Plugging this last equation into the expansion of f(x), we obtain

f(x) = Σ_{n=0}^∞ (−1)ⁿ c_n He_n(x) Z(x)    (1.41)

To obtain the coefficients we simply use the definition of orthogonal polynomials¹⁸, obtaining

c_n = ((−1)ⁿ/n!) ∫_{−∞}^{+∞} f(t) He_n(t) dt    (1.42)

With these coefficients we are left with what is called the Gram-Charlier expansion of type A [NearlyGauss]. Expressing the polynomial He_n(t) as Σ_{k=0}^n a_k t^k makes apparent that the coefficients c_n are linear combinations of the moments α_k of the random variable X. To find the full formula it is possible to show [NearlyGauss] that

He_n(x) = n! Σ_{k=0}^{[n/2]} (−1)ᵏ x^{n−2k} / (k! (n − 2k)! 2ᵏ)

The problem with the Gram-Charlier series is that it has poor convergence properties; for realistic cases it very often diverges. This is linked to the sensitivity of the Gram-Charlier series to the behaviour of f(x) at infinity: the latter must fall to zero faster than exp(−x²/4) for the series to converge. See again [NearlyGauss], also for a real example.
We can normalize our random variable X to unit variance by dividing by its standard deviation σ. Let's consider as expansion parameters

S_n = κ_n / σ^{2n−2}    (1.43)

We denote by {k_m} the sets of non-negative integers satisfying k₁ + 2k₂ + ... + s·k_s = s, and set r = k₁ + k₂ + ... + k_s. The Edgeworth expansion at arbitrary order of a probability density g(x) with respect to the normalized, unit-variance Gaussian function Z(x) is then obtained as follows.
Result: given a nearly Gaussian pdf f(x), it can be written as a perturbation of a Gaussian by the asymptotic Edgeworth expansion, at arbitrary order s,

g(x) = σ f(σx) = Z(x) {1 + Σ_{s=1}^∞ σˢ Σ_{{k_m}} He_{s+2r}(x) ∏_{m=1}^{s} (1/k_m!) (S_{m+2}/(m + 2)!)^{k_m}}    (1.44)

where by asymptotic we mean that if the first N terms are retained in the sum over s, then g(x) minus the partial sum is of lower order than the N-th retained term. See [Nearly] for a derivation and for how to carry out the sum. In the standard literature only the first few terms are usually written explicitly; in this thesis, for example, we will only need the expansion up to S₃. For future reference we write here the first terms of the expansion¹⁹:

f(y) = (1/σ) Z(y/σ) [1 + σ (S₃/6) He₃(y/σ) + σ² (S₄/24) He₄(y/σ) + ...] =

= (1/√(2πσ²)) e^{−y²/(2σ²)} [1 + σ (S₃/6) He₃(y/σ) + σ² (S₄/24) He₄(y/σ) + ...]    (1.45)

18 Remember that for a polynomial of order n, the n-th derivative is proportional to n!.

For a simple application of this, see lecture_NG_iucaa_2011.pdf by Komatsu.


attention: A well-known problem with truncating the moment (or cumulant) expansion is that the resulting distribution is not a well-defined probability distribution function. Hence the Edgeworth series, although properly normalised, can produce negative values if we use large values of the expansion parameters (e.g. the skewness). This is a breakdown of the series approximation, and one should take care not to force the pdf into this regime [peter-functional]. As a rule of thumb we will take S_n ≪ 1; the sketch below makes this concrete.
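
The following sketch evaluates (1.45) truncated at the S₃ term (with σ = 1, so y is already in units of the standard deviation) and shows that the series stays normalized but dips negative once S₃ is pushed to order unity:

import numpy as np

def edgeworth_s3(y, S3):
    # eq. (1.45) truncated at the S3 term, with sigma = 1
    He3 = y**3 - 3 * y
    Z = np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)
    return Z * (1 + (S3 / 6) * He3)

y = np.linspace(-4, 4, 4001)
dy = y[1] - y[0]
for S3 in (0.1, 1.5):
    p = edgeworth_s3(y, S3)
    print(S3, p.sum() * dy, p.min())
# S3 = 0.1: integrates to ~1 and stays positive on this range;
# S3 = 1.5: still integrates to ~1 (the He_3 term integrates to zero),
#           but the 'pdf' goes negative -- the breakdown regime.
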

the multivariate edgeworth expansion    We rewrite here the form of the Gaussian distribution for a discretized random field Φ(~x)²⁰ with zero mean and covariance matrix Σ:

p(Φ|Σ) = exp[−Φ^T Σ⁻¹ Φ/2] / √((2π)^D |Σ|)    (1.46)

where Φ^T Σ⁻¹ Φ = Σ_{ij} Φ(x_i) (Σ⁻¹)_{ij} Φ(x_j).


An example could be the temperature anisotropy for a given direc-
tion in the sky, Φ(~x ) = δT (n̂)21 . In this case it is more convenient to
use the spherical harmonic transform

Φ(~x ) = δT (n̂) = ∑ alm Ylm (n̂)


lm

19 Note that we make the change of variable y = σx.
20 Whose components, Φ_i = Φ(~x_i), may represent, for example, an array of pixel values or the amplitudes of a set of harmonic modes.
21 In general we should also consider the observer position, δT(~x, n̂).

or²²

a_lm = ∫ dΩ δT(n̂) Y*_lm(n̂)

Then we can write the Gaussian pdf as (with C = Σ and C⁻¹ its inverse)

p(~Φ) = p({a_lm}) = (1/((2π)^D |C|^{1/2})) exp[−½ Σ_{lm,l'm'} a*_lm (C⁻¹)_{lm,l'm'} a_{l'm'}]    (1.47)

with D = N_harm the number of harmonics and C_{lm,l'm'} = ⟨a*_lm a_{l'm'}⟩. In general C has a non-diagonal form²³.
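
As a sketch of the diagonal (rotationally invariant) case discussed below, we can draw Gaussian a_lm with ⟨|a_lm|²⟩ = C_l for an assumed toy spectrum and recover C_l by averaging (for simplicity all 2l + 1 modes are drawn as independent real Gaussians, glossing over the reality conditions on the a_lm):

import numpy as np

rng = np.random.default_rng(7)
lmax = 64
Cl = 1.0 / (np.arange(lmax + 1) + 10.0)**2     # toy angular power spectrum

def sample_alm(l):
    # 2l+1 modes of variance C_l (reality conditions glossed over)
    return rng.normal(scale=np.sqrt(Cl[l]), size=2 * l + 1)

l = 32
Cl_hat = np.mean([np.mean(sample_alm(l)**2) for _ in range(2000)])
print(Cl_hat, Cl[l])   # agree up to sampling noise
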
What happens if we want to consider non-Gaussianity in the multivariate case (in the space of harmonics)? We could again use, as in the 1-D case, the fact that measuring correlation functions of order higher than two allows us to reconstruct the pdf. The problem is that in practice we are generally only able to measure three- and four-point correlation functions (the angular bispectrum ⟨a_{l₁m₁} a_{l₂m₂} a_{l₃m₃}⟩ and trispectrum ⟨a_{l₁m₁} a_{l₂m₂} a_{l₃m₃} a_{l₄m₄}⟩). But what we can do is assume a weak non-Gaussianity, so that we are justified in using a multidimensional Edgeworth expansion.
We can write an expression for the Edgeworth expansion of a multivariate pdf in the harmonic coefficients [Verde, 2010]:

p({a_lm}) = p(~a) = (1/((2π)^D |C|^{1/2})) exp[−½ Σ_{lm,l'm'} a*_lm (C⁻¹)_{lm,l'm'} a_{l'm'}] ×

× {1 + (1/6) Σ_{all lm} ⟨a_{l₁m₁} a_{l₂m₂} a_{l₃m₃}⟩ [(C⁻¹a)_{l₁m₁} (C⁻¹a)_{l₂m₂} (C⁻¹a)_{l₃m₃}] −

− (1/6) Σ_{all lm} ⟨a_{l₁m₁} a_{l₂m₂} a_{l₃m₃}⟩ [3 (C⁻¹)_{l₁m₁,l₂m₂} (C⁻¹a)_{l₃m₃}] + ...}    (1.48)

But if we consider only the Cosmic Microwave Background temperature anisotropy, then we can apply rotational invariance to obtain

C_{lm,l'm'} = C_l δ_{ll'} δ_{mm'} ≡ C_l    (1.49)

and the Edgeworth expansion becomes [Babich]

p(~a) = [1 − (1/6) Σ_{all lm} ⟨a_{l₁m₁} a_{l₂m₂} a_{l₃m₃}⟩ ∂³/(∂a_{l₁m₁} ∂a_{l₂m₂} ∂a_{l₃m₃})] ∏_{lm} e^{−a*_lm a_lm/(2C_l)} / √(2πC_l)    (1.50)

22 dΩ is the volume element dn̂.
23 For example, if we measure a mixture of emissions from astrophysical components.
