
START HERE

OBSERVATIONS / PREVIOUS STUDIES
Patterns in space & time

MODELS
Explanations or theories

HYPOTHESIS
Predictions based on model

NULL HYPOTHESIS (H0)
Logical opposite of hypothesis

EXPERIMENT
Critical test of null hypothesis

Retain H0 → refute hypothesis & model
Reject H0 → support hypothesis & model
Data Interpretation
[Figure: Bivalvia species of India. Scatter plot of Delta+ (65-80) against Lambda+ (50-300) for coastal states/territories, with species counts: GJ(129), MH(101), GA(75), KA(37), KL(60), LK(81), OR(164), WB(92), AP(100), TN(269), AN(252)]

[Figure: Bivalvia species of India. Delta+ (50-80) plotted against number of species (0-600) for the same states: GJ, MH, GA, KA, KL, LK, OR, WB, AP, TN, AN]

[Figure: Bivalvia species of India. Cluster dendrogram (Resemblance: Gamma+ dissimilarity) and nMDS ordination (2D Stress: 0.1) of the same states]
Statistics
Reliability: Significance
Strength of relationship: Meaningfulness

t-test, ANOVA
[Figure: Environmental data of the Mandovi. Cluster plot (% scale 0-100) separating stations 1-8 into groups A, B and C]
[Figure: Macrofauna of the Mandovi. Group-average cluster dendrogram of samples (MOR'07, MOR'08, MOR'09, LSR'08, LSR'09, PMonR'07, PMR'07, EMR'07); y-axis: similarity (20-100); Transform: Log(X+1); Resemblance: S17 Bray-Curtis similarity]
Type of Statistics
1. Descriptive: e.g. mean, median, standard deviation, standard error, variance

2. Correlation: relation between parameters

3. Inferential: differences between/within groups


Descriptive Statistics
Mean: the arithmetic average of the scores; considers both the number of scores and their values

Median: the middle point in an ordered distribution, with an equal number of scores on each side

Mode: the most frequently occurring score


Median
Example: 71, 73, 74, 75, 72

Step One: Place the scores in order from lowest to highest: 71, 72, 73, 74, 75

Step Two: Calculate the position of the median as (n + 1)/2:

Mdn position = (5 + 1)/2 = 3rd score, so the median is 73
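The same result can be checked in code; a minimal sketch using Python's standard library with the example scores above:

```python
from statistics import median

scores = [71, 73, 74, 75, 72]

# median() sorts the values internally, so the raw order does not matter
print(median(scores))  # 73
```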




Mode
Mode: most frequently occurring score

Which score(s) occur most often in each of the following sets?
Unimodal: 3, 7, 3, 9, 9, 3, 5, 1, 8, 5

Bimodal: 2, 4, 9, 6, 4, 6, 6, 2, 8, 2

Multimodal: 7, 7, 6, 6, 5, 5, 4, 4
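A quick way to check these answers is Python's statistics.multimode (Python 3.8+), which returns every value tied for the highest frequency; the lists below are the example sets above:

```python
from statistics import multimode

print(multimode([3, 7, 3, 9, 9, 3, 5, 1, 8, 5]))  # [3]          -> unimodal
print(multimode([2, 4, 9, 6, 4, 6, 6, 2, 8, 2]))  # [2, 6]       -> bimodal
print(multimode([7, 7, 6, 6, 5, 5, 4, 4]))        # [7, 6, 5, 4] -> multimodal
```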

Mean versus Median

The median is not influenced by extreme values and is a better measure of central tendency if the distribution is skewed.

If mean = median = mode, the distribution is symmetrical, as in a normal distribution.

Descriptive Statistics: Variability
Measures of variability: extent of similarity or difference
in a set of data

e.g. range, standard deviation, variance
Standard Deviation (SD)
Standard deviation (s): a measure of the variability, or spread, of a set of scores around the mean

Based on the differences between each score and the mean (known as deviation scores)

A good measure of variability around the mean
Standard Deviation
The sample standard deviation, s, is the square root of the variance:

s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
Standard Deviation

Let's use the scores 1, 2, 6, 6 and 15, where the mean is x̄ = (1 + 2 + 6 + 6 + 15)/5 = 6.

Substituting our X scores, the sum of squared deviations is
= (1 - 6)² + (2 - 6)² + (6 - 6)² + (6 - 6)² + (15 - 6)²
= (-5)² + (-4)² + (0)² + (0)² + (9)²
= 25 + 16 + 0 + 0 + 81
= 122
We then divide this value by n - 1 to arrive at the mean squared deviation (the variance):
122/4 = 30.5
We then take the square root of this value to bring the units back to the raw score units:
s = √30.5 ≈ 5.52
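The worked example can be reproduced with Python's standard library; a minimal sketch using the same five scores (both functions use the n - 1 denominator):

```python
from statistics import mean, stdev, variance

scores = [1, 2, 6, 6, 15]

print(mean(scores))      # 6
print(variance(scores))  # 30.5 (sum of squared deviations 122 divided by n - 1 = 4)
print(stdev(scores))     # 5.52... (square root of 30.5)
```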

Variance
The square of the standard deviation (s²)

Used in: regression analysis, analysis of variance (ANOVA) and the determination of the reliability of a test

Also known as the mean square (MS)

Sample Variance

The sample variance, s², is the sum of the squared deviations from the sample mean divided by n - 1:

s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
Normal Distribution of Data
Graphical assessment of normality (probability plots)
Shapiro-Wilk test (W-statistic)
D'Agostino test (D-statistic)
Goodness-of-fit tests (e.g., Kolmogorov-Smirnov test)

Data normally distributed: parametric test
Data not normally distributed: transformation or non-parametric test
e.g. log (growth rate)
square root (density data)
arcsine (%, ratio data)
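As a hedged illustration of this step, the sketch below uses scipy.stats.shapiro on simulated, right-skewed "growth rate" data (the data and variable names are made up, not from the slides) and re-tests after a log transform:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
growth_rate = rng.lognormal(mean=0.0, sigma=0.5, size=30)  # illustrative right-skewed data

# Shapiro-Wilk test: p > 0.05 means the data are consistent with a normal distribution
w_raw, p_raw = stats.shapiro(growth_rate)
w_log, p_log = stats.shapiro(np.log(growth_rate))  # log transform, as suggested for growth rates

print(f"raw data:        W = {w_raw:.3f}, p = {p_raw:.3f}")
print(f"log-transformed: W = {w_log:.3f}, p = {p_log:.3f}")

# Other transformations from the slide:
#   np.sqrt(counts)        for density/count data
#   np.arcsin(np.sqrt(x))  for proportions on a 0-1 scale
```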

Univariate Analysis
t-test: difference between two mean values

Analysis of variance (ANOVA)
t-test
Comparison of two mean values
e.g. density data between two sites
Difference between control & experiment

One-tailed: testing in one direction only

Two-tailed: testing the relationship in both directions, i.e. above and below the mean
t-test contd.
Independent t-test: comparing unrelated data, e.g. male and female, or two different sites

Dependent (paired) t-test: data that are related, e.g. before and after
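Both flavours of t-test are available in scipy.stats; a minimal sketch with made-up density values (the site names and numbers are illustrative only):

```python
from scipy import stats

# Independent t-test: e.g. density at two different sites (illustrative numbers)
site_a = [12.1, 14.3, 13.5, 15.0, 12.8]
site_b = [10.2, 11.1, 9.8, 12.0, 10.5]
t_ind, p_ind = stats.ttest_ind(site_a, site_b)  # two-tailed by default
print(f"independent: t = {t_ind:.2f}, p = {p_ind:.4f}")

# Dependent (paired) t-test: e.g. the same stations before and after a treatment
before = [5.1, 4.8, 6.0, 5.5, 4.9]
after  = [5.9, 5.2, 6.4, 6.1, 5.3]
t_dep, p_dep = stats.ttest_rel(before, after)
print(f"paired:      t = {t_dep:.2f}, p = {p_dep:.4f}")
```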

Analysis of Variance (ANOVA)

One-way: one independent variable, e.g. site, month or season

Two-way: 2 independent variables, e.g. site and month

Factorial: > 2 independent variables, e.g. transect, site (area) and month/season
ANOVA

OC
            SS        df   MS         F         p
Intercept   2.33635   1    2.336346   112.024   0.0000
station     0.86437   9    0.096041   4.6050    0.0021
Error       0.41712   20   0.020856

SS: sum of squares
df: degrees of freedom = n - 1
MS: mean square (SS/df)
F: ratio of the factor mean square to the residual (error) mean square
The F value should be greater than the critical (cut-off) value
p: significance is judged at the 95% confidence level (p < 0.05)
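A one-way ANOVA like the table above (which comes from a statistics package) can also be run with scipy; a sketch with illustrative OC values for three stations (all numbers are made up):

```python
from scipy import stats

# Organic carbon (OC) values at three stations -- illustrative numbers only
stn1 = [1.2, 1.4, 1.3]
stn2 = [0.9, 1.0, 0.8]
stn3 = [1.6, 1.7, 1.5]

# One-way ANOVA: does mean OC differ among stations?
f, p = stats.f_oneway(stn1, stn2, stn3)
print(f"F = {f:.2f}, p = {p:.4f}")  # p < 0.05 -> at least one station differs
```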
One-way ANOVA

OC
            SS        df   MS         F         p
Intercept   2.33635   1    2.336346   112.024   0.0000
station     0.86437   9    0.096041   4.6050    0.0021
Error       0.41712   20   0.020856

Phaeopigment
            SS        df   MS         F         p
Intercept   0.02431   1    0.02431    7.08660   0.01496
station     0.02900   9    0.00322    0.93941   0.51402
Error       0.06861   20   0.00343
Two-way ANOVA

              SS         df   MS         F          p
Intercept     272.8705   1    272.8705   575.1452   0.000000
season        9.4421     2    4.7210     9.9508     0.000185
Stn           15.3212    9    1.7024     3.5882     0.001262
season*Stn    13.9800    18   0.7767     1.6370     0.079326
Error         28.4663    60   0.4744

              SS         df   MS         F          p
Intercept     15.11210   1    15.11210   290.6176   0.000
season        0.02009    2    0.01005    0.1932     0.824
Tide          0.54502    2    0.27251    5.2406     0.0072
season*Tide   0.57176    4    0.14294    2.7489     0.0336
Error         4.21200    81   0.05200
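A two-way ANOVA with an interaction term can be fitted with statsmodels; a sketch assuming a long-format table with columns abundance, season and tide (the column names and the file samples.csv are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Assumed layout: one row per sample with columns abundance, season, tide
df = pd.read_csv("samples.csv")  # hypothetical file name

# season*tide expands to both main effects plus the season:tide interaction
model = smf.ols("abundance ~ C(season) * C(tide)", data=df).fit()

# Type II sums of squares are a common default for reasonably balanced designs
print(sm.stats.anova_lm(model, typ=2))
```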
Post hoc test

     season   Tide   {1}     {2}     {3}    {4}    {5}     {6}    {7}    {8}    {9}
                     0.161   .533    .53    .46    .458    .40    .33    .49    .34
1    1        1
2    1        2      0.013
3    1        3      0.011   1.00
4    2        1      0.211   0.97    0.97
5    2        2      0.11    0.99    0.99   0.99
6    2        3      0.30    0.939   0.92   1.00   0.99
7    3        1      0.75    0.577   0.54   0.99   0.96    0.99
8    3        2      0.037   0.999   0.99   0.99   0.99    0.99   0.79
9    3        3      0.707   0.624   0.59   0.99   0.973   0.99   1.00   0.83
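Pairwise comparisons of this kind can be produced with a Tukey HSD test in statsmodels; a sketch assuming the same hypothetical long-format table as above, with one combined label per season-tide cell:

```python
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

df = pd.read_csv("samples.csv")  # hypothetical file name, as above

# Build one label per season-tide cell, e.g. "2-3" for season 2 / tide 3
group = df["season"].astype(str) + "-" + df["tide"].astype(str)

# Tukey HSD compares every pair of cells and reports adjusted p-values
result = pairwise_tukeyhsd(endog=df["abundance"], groups=group, alpha=0.05)
print(result.summary())
```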
Factorial ANOVA (Abundance)

Source     df   MS         F       p
Season     2    43365239   10.30   0.00005125
Stn        9    14757634   3.50    0.00042912
Tide       2    50601728   11.54   0.00001643
S x Stn    18   19414787   4.61    0.00000001
S x T      4    7078555    1.35    0.25004741
Stn x T    18   16288388   3.71    0.00000147
Correlation
Measures the linear relationship between two variables
Pearson's (r and p): parametric
Spearman's (rho and p): non-parametric
The relation can be positive or negative (r ranges from -1 through 0 to +1)
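Both coefficients are available in scipy.stats; a minimal sketch with illustrative temperature and chlorophyll values (not data from the slides):

```python
import numpy as np
from scipy import stats

# Illustrative biotic/abiotic pairing, e.g. chlorophyll vs temperature
temperature = np.array([24.1, 25.0, 26.2, 27.5, 28.1, 29.0])
chlorophyll = np.array([1.1, 1.4, 1.9, 2.4, 2.6, 3.1])

r, p = stats.pearsonr(temperature, chlorophyll)        # parametric
rho, p_s = stats.spearmanr(temperature, chlorophyll)   # non-parametric (rank-based)

print(f"Pearson:  r = {r:.2f}, p = {p:.4f}")
print(f"Spearman: rho = {rho:.2f}, p = {p_s:.4f}")
```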
Multiple Regression
Relates one dependent variable (e.g. a biological measure) to 2 or more independent variables (e.g. environmental measures)

Multiple Regression Results

Dependent: BR   Multiple R = .77193627   F = 6.881218
R² = .59588560   df = 3,14
No. of cases: 18   adjusted R² = .50928966   p = .004444

Standard error of estimate: 23.371691805
Intercept: -1.680825308   Std. Error: 7.052847   t(14) = -.2383   p = .8151

FF beta = .271   GR beta = .499   GR/DR beta = .291
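A regression of this form can be fitted with statsmodels; a sketch assuming a table with the response BR and the predictors named on the slide (FF, GR, GR/DR), where the file name and the identifier GR_DR are stand-ins:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed table layout: response column BR plus predictor columns FF, GR, GR_DR
# (GR_DR stands in for GR/DR, since "/" is not a valid column name in a formula)
df = pd.read_csv("regression_data.csv")  # hypothetical file name

model = smf.ols("BR ~ FF + GR + GR_DR", data=df).fit()
print(model.summary())  # Multiple R-squared, adjusted R-squared, F, p, coefficients

# Standardised beta coefficients (as reported on the slide) can be obtained by
# z-scoring all columns before fitting: (df - df.mean()) / df.std()
```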
Multivariate Analyses
Cluster and nMDS
SIMPER
ANOSIM
BIOENV
Principal Component Analysis (PCA)
Canonical Correspondence Analysis (CCA)
PRIMER E and MVSTEP
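The earlier cluster figures follow the usual PRIMER workflow (log(x+1) transform, Bray-Curtis resemblance, group-average linkage); a rough Python equivalent is sketched below with a made-up abundance matrix (only the sample labels are taken from the figure):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram

# Rows = samples, columns = taxa; the counts are illustrative only
abundance = np.array([
    [12,  0, 5, 30],
    [10,  1, 4, 25],
    [ 0, 40, 2,  1],
    [ 1, 35, 3,  0],
])
labels = ["MOR'07", "MOR'08", "LSR'08", "LSR'09"]  # sample names as in the dendrogram figure

transformed = np.log1p(abundance)               # Transform: Log(X+1)
dist = pdist(transformed, metric="braycurtis")  # Resemblance: Bray-Curtis dissimilarity

z = linkage(dist, method="average")             # group-average (UPGMA) clustering
dendrogram(z, labels=labels)
plt.ylabel("Bray-Curtis dissimilarity")
plt.show()
```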

Test for Normality of Data
Histogram plot: check for skewness and kurtosis
Kolmogorov-Smirnov test: used if sample sizes are unequal,
e.g. Station 1 (10 replicates/station) vs Station 2 (7 replicates/station)
Shapiro-Wilk test (W-statistic)
D'Agostino test (D-statistic)
Lilliefors test

Normal distribution (p > 0.05): parametric analysis (t-test, ANOVA, Pearson correlation)
Not normal: apply a transformation and re-check for normality
  Normal after transformation: parametric analysis
  Still not normal: non-parametric analysis
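A minimal sketch of this decision flow, using the Shapiro-Wilk test and a log(x+1) transformation (the helper name and the example values are illustrative):

```python
import numpy as np
from scipy import stats

def choose_branch(values, alpha=0.05):
    """Follow the flow above: test the raw data for normality, then a
    log(x+1)-transformed version; otherwise fall back to non-parametric tests.
    Illustrative helper only, not a substitute for inspecting the data."""
    values = np.asarray(values, dtype=float)
    if stats.shapiro(values).pvalue > alpha:
        return "parametric analysis (raw data)"
    if stats.shapiro(np.log1p(values)).pvalue > alpha:
        return "parametric analysis (log-transformed data)"
    return "non-parametric analysis"

print(choose_branch([3, 4, 5, 5, 6, 7, 30, 55, 80]))
```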
Parametric tests & analogous nonparametric tests

Compare means between 2 independent groups
  e.g. abundance variation between Mandovi and Zuari
  Parametric: independent t-test | Non-parametric: Wilcoxon rank-sum test

Compare two quantitative measurements from the same individual
  e.g. difference before and after
  Parametric: dependent t-test | Non-parametric: Wilcoxon signed-rank test

Compare means between more than 2 groups
  e.g. abundance between Mandovi, Zuari, Chapora, Sal
  Parametric: one-way ANOVA | Non-parametric: Kruskal-Wallis test

Estimate the relation between 1 dependent and 1 independent variable
  e.g. relation of biotic and abiotic data
  Parametric: Pearson correlation (r from -1 to +1, p < 0.05) | Non-parametric: Spearman correlation (rho from -1 to +1, p < 0.05)

Estimate the relation between 1 dependent and more than 2 independent variables
  e.g. relation of phytoplankton density with temperature, salinity, DO etc.
  Parametric: multiple regression (check beta values and p < 0.05)
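The non-parametric column maps directly onto scipy.stats functions; a short sketch with illustrative abundance values (not data from the slides):

```python
from scipy import stats

mandovi = [120, 95, 150, 110, 130]
zuari   = [80, 70, 95, 85, 60]
chapora = [200, 180, 210, 190, 175]

# 2 independent groups -> Mann-Whitney U (Wilcoxon rank-sum) test
print(stats.mannwhitneyu(mandovi, zuari))

# Paired measurements (before vs after) -> Wilcoxon signed-rank test
before = [5.1, 4.8, 6.0, 5.5, 4.9]
after  = [5.9, 5.2, 6.4, 6.1, 5.3]
print(stats.wilcoxon(before, after))

# More than 2 groups -> Kruskal-Wallis test
print(stats.kruskal(mandovi, zuari, chapora))

# Rank-based association between two variables -> Spearman correlation
print(stats.spearmanr(mandovi, zuari))
```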
Take-home points
Parametric and nonparametric are two broad classifications of statistical procedures.
Parametric tests are based on assumptions about the distribution of the underlying
population from which the sample was taken.
The most common parametric assumption is that data are approximately normally
distributed.
Nonparametric tests do not rely on assumptions about the shape or parameters of the
underlying population distribution.
If the data deviate strongly from the assumptions of a parametric procedure, using the
parametric procedure could lead to incorrect conclusions.
You should be aware of the assumptions associated with a parametric procedure
(test for normality, e.g. with a Shapiro-Wilk test or a histogram).
If you determine that the assumptions of the parametric procedure are not valid, use
an analogous nonparametric procedure instead (see the previous slide).
Nonparametric tests are often a good option for small data sets (n < 30).
Nonparametric procedures generally have less statistical power.
Interpretation of nonparametric procedures can also be more difficult than for
parametric procedures.
Thank you!

Next Saturday ?????
