
T-test

A t-test is a statistical analysis of the means of two populations. A two-sample t-test is commonly
used with small sample sizes (usually 30 or fewer) to test the difference between the samples when
the variances of the two normal distributions are not known. The t-test compares two averages
(means) and tells you whether they differ from each other and how significant the difference is; in
other words, it indicates how likely it is that the difference could have happened by chance.

T-tests can be used in real life to compare means. For example, a drug company may want to test a
new cancer drug to find out if it improves life expectancy. In an experiment, there is always a control
group (a group given a placebo, or “sugar pill”). The control group may show an average
life expectancy of +5 years, while the group taking the new drug might have a life expectancy of +6
years. It would seem that the drug might work. But it could be due to a fluke. To test this, researchers
would use a Student’s t-test to find out if the results are repeatable for an entire population.
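As a sketch of the two-sample case, the t statistic can be computed directly from the two samples. The numbers below are invented for illustration (they are not real trial data), and Welch's unequal-variance form of the statistic is used:

```python
import math

def welch_t(sample_a, sample_b):
    """Two-sample t statistic (Welch's form, variances not assumed equal)."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    # Unbiased sample variances
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Hypothetical life-expectancy gains (years): control vs. drug group
control = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0]
drug    = [5.8, 6.3, 5.9, 6.5, 6.1, 5.6]

t = welch_t(control, drug)   # strongly negative: control mean is lower
```

A large |t| (compared against the t distribution's critical value for the relevant degrees of freedom) suggests the difference is unlikely to be a fluke.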

http://www.statisticshowto.com/probability-and-statistics/t-test/

https://www.investopedia.com/terms/t/t-test.asp

Chi square test

There are two types of chi-square tests. Both use the chi-square statistic and distribution, for different
purposes:
• A chi-square goodness of fit test determines whether sample data match a hypothesized
population distribution.
• A chi-square test for independence compares two variables in a contingency table to see
if they are related. In a more general sense, it tests whether the distributions
of categorical variables differ from one another.
A very small chi-square test statistic means that the observed data fit the expected data
extremely well; in other words, there is no evidence of a difference (or, in the independence test, of a
relationship). A very large chi-square test statistic means that the observed data do not fit the
expected data well, which suggests a real difference or relationship.

Is gender independent of education level? A random sample of 395 people were surveyed, and each
person was asked to report the highest education level they had obtained. The data that resulted from
the survey is summarized in the following table.
Question: Are gender and education level dependent at the 5% level of significance? In other words,
given the data collected above, is there a relationship between the gender of an individual and the
level of education that they have obtained?
Working this out, χ² = (60 − 50.886)²/50.886 + ⋯ + (57 − 48.132)²/48.132 = 8.006. The critical value
of χ² with 3 degrees of freedom at the 5% level is 7.815. Since 8.006 > 7.815, we reject the null
hypothesis and conclude that education level depends on gender at the 5% level of significance.
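The test statistic can be computed by hand from any contingency table. The counts below are made up for illustration (they are not the survey data from the example above), but the table is the same 2×4 shape, so the degrees of freedom match:

```python
# Chi-square test of independence on a hypothetical 2x4 contingency
# table (gender x education level); these counts are invented, NOT the
# survey data from the worked example.
observed = [
    [60, 54, 46, 41],   # e.g. female
    [40, 44, 53, 57],   # e.g. male
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count under independence: row total * column total / grand total
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - exp) ** 2 / exp

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2-1)*(4-1) = 3
```

The resulting chi2 is then compared against the critical value for df degrees of freedom (7.815 at the 5% level for df = 3).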

http://www.statisticshowto.com/probability-and-statistics/chi-square/

Z-test

A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis
can be approximated by a normal distribution. Because of the central limit theorem, many test statistics
are approximately normally distributed for large samples. For each significance level, the Z-test has a
single critical value (for example, 1.96 for a 5% two-tailed test), which makes it more convenient than
Student's t-test, which has separate critical values for each sample size. Therefore, many statistical
tests can be conveniently performed as approximate Z-tests if the sample size is large (greater than
30) or the population variance is known.

Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A
researcher thinks that a diet high in raw corn-starch will have a positive or negative effect on blood
glucose levels. A sample of 30 patients who have tried the raw corn-starch diet have a mean glucose
level of 140. Test the hypothesis that the raw corn-starch had an effect.

State the null hypothesis: H0: μ = 100. State the alternate hypothesis: H1: μ ≠ 100. State your alpha
level; we'll use 0.05 for this example. As this is a two-tailed test, split the alpha into two:
0.05/2 = 0.025. Find the z-score associated with your alpha level; you're looking for the area in one
tail only. The z-score for 0.975 (1 − 0.025) is 1.96. As this is a two-tailed test, you would also be
considering the left tail (z = −1.96). Find the test statistic using this formula:
z = (x̄ − μ)/(σ/√n) = (140 − 100)/(15/√30) ≈ 14.6. If z is less than −1.96 or greater than 1.96, reject
the null hypothesis. In this case, it is greater, so reject the null.
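The steps above reduce to a few lines of arithmetic, using the numbers from the blood-glucose example:

```python
import math

# One-sample z-test for the blood-glucose example: population mean 100,
# population standard deviation 15, sample of n = 30 with mean 140.
mu, sigma, n, sample_mean = 100, 15, 30, 140

z = (sample_mean - mu) / (sigma / math.sqrt(n))   # about 14.6

# Two-tailed test at alpha = 0.05: reject H0 if |z| > 1.96
reject_null = abs(z) > 1.96
```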

https://en.wikipedia.org/wiki/Z-test

http://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/

Anova

Analysis of variance (ANOVA) is a collection of statistical models and their associated procedures (such
as "variation" among and between groups) used to analyse the differences among group means.
ANOVA was developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting,
the observed variance in a particular variable is partitioned into components attributable to different
sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not
the means of several groups are equal, and therefore generalizes the t-test to more than two groups.
ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical
significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results
in less type I error) and is therefore suited to a wide range of practical problems.
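The one-way F statistic behind ANOVA can be computed by partitioning the variance exactly as described: between-group variation divided by within-group variation. The three groups below are invented for illustration:

```python
# One-way ANOVA F statistic computed by hand for three hypothetical
# groups (e.g. weight loss in kg for green tea, black tea, no tea;
# the numbers are made up for illustration).
groups = [
    [3.1, 2.8, 3.5, 3.0],   # green tea
    [2.2, 2.5, 2.0, 2.4],   # black tea
    [1.0, 1.4, 1.2, 0.9],   # no tea
]

k = len(groups)                       # number of groups
n = sum(len(g) for g in groups)       # total number of observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group sum of squares: variation of group means around the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: variation inside each group
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F (relative to the F distribution with k−1 and n−k degrees of freedom) is evidence that not all group means are equal.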

One-way ANOVA can be used to study the effect of tea on weight loss by forming three groups:
green tea, black tea, and no tea. A two-way ANOVA allows a company to compare worker
productivity based on two independent variables, say salary and skill set. It is used to observe
the interaction between the two factors: it tests the effect of two factors at the same time.

https://en.wikipedia.org/wiki/Analysis_of_variance

Correlation analysis

Correlation analysis is a method of statistical evaluation used to study the strength of a relationship
between two numerically measured, continuous variables (e.g. height and weight). This particular
type of analysis is useful when a researcher wants to establish whether there are possible connections
between variables. If correlation is found between two variables, it means that when there is a
systematic change in one variable, there is also a systematic change in the other; the variables change
together. If correlation is found, depending upon the numerical values measured, it can be either
positive or negative.

• Positive correlation exists if one variable increases simultaneously with the other, i.e. the high
numerical values of one variable relate to the high numerical values of the other.
• Negative correlation exists if one variable decreases when the other increases, i.e. the high
numerical values of one variable relate to the low numerical values of the other.

Pearson’s product-moment coefficient is the standard measurement of correlation and ranges
between +1 and −1: +1 indicates the strongest possible positive correlation, and −1 indicates the
strongest possible negative correlation. Therefore, the closer the coefficient is to either of these
numbers, the stronger the correlation of the data it represents. On this scale, 0 indicates no
correlation, so values closer to zero indicate weaker correlation than those closer to +1/−1.

For an example, try to think of two fields, subjects, or ideas that are naturally assumed to be
connected. Because both are considered maths, one might think "Algebra scores" and "Geometry
scores" could be correlated. This would mean that most people who had a high score on one test
also had a high score on the other, and the same goes for low and medium scores. In a class with
5 students, an example of a data set that would fit the hypothesis that the test scores are correlated
is: (6, 7), (15, 17), (12, 11), (20, 20), (8, 10). X is relatively large when Y is large, and vice versa.
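Pearson's coefficient for those five score pairs can be computed directly from its definition (covariance divided by the product of the standard deviations):

```python
import math

# Pearson's product-moment correlation for the five (algebra, geometry)
# score pairs from the example above.
scores = [(6, 7), (15, 17), (12, 11), (20, 20), (8, 10)]

xs = [x for x, _ in scores]
ys = [y for _, y in scores]
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)

cov = sum((x - mx) * (y - my) for x, y in scores)
sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
sy = math.sqrt(sum((y - my) ** 2 for y in ys))

r = cov / (sx * sy)   # close to +1: a strong positive correlation
```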

http://www.djsresearch.co.uk/glossary/item/correlation-analysis-market-research

https://www.quora.com/What-is-an-everyday-example-of-a-correlation-in-statistics

Regression (simple)

Linear regression is a basic and commonly used type of predictive analysis. The overall idea of
regression is to examine two things: (1) does a set of predictor variables do a good job of predicting
an outcome (dependent) variable? (2) Which variables in particular are significant predictors of the
outcome variable, and in what way (indicated by the magnitude and sign of the beta estimates) do
they impact the outcome variable? These regression estimates are used to explain the relationship
between one dependent variable and one or more independent variables. The simplest form of the
regression equation, with one dependent and one independent variable, is defined by the formula
y = c + b*x, where y is the estimated dependent variable score, c is a constant, b is the regression
coefficient, and x is the score on the independent variable.
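The constants c and b have closed-form least-squares solutions, which the following sketch computes for a small synthetic data set (the points are invented, lying near the line y = 2x + 1):

```python
# Least-squares fit of y = c + b*x using the closed-form formulas;
# the (x, y) points below are synthetic illustration data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Slope: b = sum((x - mx)(y - my)) / sum((x - mx)^2)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
# Intercept: c = my - b*mx
c = my - b * mx

def predict(x):
    return c + b * x
```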

Consider rainfall as the independent variable and rice yield as the dependent variable. We may have
20 years of rainfall and yield data. Now we need to decide how much rice should be imported so that
rice will be sufficiently available in the market. This is an example of regression, but it is not certain
that the relation between rice yield and rainfall is linear.

http://www.statisticssolutions.com/what-is-linear-regression/

https://www.quora.com/What-are-some-real-world-applications-of-simple-linear-regression

Regression (multiple)

Reality in the public sector is complex. Often there may be several possible causes associated with a
problem, and likewise there may be several factors necessary for a solution. Statistical methods are
needed which can deal with interval- and ratio-level variables, assess causal linkages, and forecast
future outcomes. Ordinary least squares linear regression is the most widely used type of regression
for predicting the value of one dependent variable from the value of one independent variable. It is
also widely used for predicting the value of one dependent variable from the values of two or more
independent variables; when there are two or more independent variables, it is called multiple
regression.

A research chemist wants to understand how several predictors are associated with the wrinkle resistance of
cotton cloth. The chemist examines 32 pieces of cotton cellulose produced at different settings of curing time,
curing temperature, formaldehyde concentration, and catalyst ratio. The durable press rating, a measure of
wrinkle resistance, is recorded for each piece of cotton. The chemist performs a multiple regression analysis
to fit a model with the predictors and eliminate the predictors that do not have a statistically significant
relationship with the response.
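A minimal sketch of fitting such a model by ordinary least squares is shown below. The data are synthetic (two made-up predictors with a response generated as y = 1 + 2·x1 + 3·x2), not the cotton-cellulose data from the Minitab example:

```python
import numpy as np

# Multiple regression by ordinary least squares: y = b0 + b1*x1 + b2*x2.
# The rows are synthetic illustration data, NOT the cotton example.
X_raw = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 5.0],
])
y = 1.0 + 2.0 * X_raw[:, 0] + 3.0 * X_raw[:, 1]

# Prepend a column of ones so the first coefficient is the intercept b0
X = np.column_stack([np.ones(len(X_raw)), X_raw])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # [b0, b1, b2]
```

Because the synthetic response lies exactly in the model's column space, the fit recovers the generating coefficients; with real data one would also examine the residuals and the significance of each predictor.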

http://web.csulb.edu/~msaintg/ppa696/696regmx.htm

http://support.minitab.com/en-us/minitab-express/1/help-and-how-to/modeling-
statistics/regression/how-to/multiple-regression/before-you-start/example/

Factor Analysis

Factor analysis is a technique that is used to reduce a large number of variables to a smaller number
of factors. The technique extracts the maximum common variance from all variables and puts it into
a common score, which, as an index of all the variables, can be used for further analysis. Factor
analysis is part of the general linear model (GLM) and makes several assumptions: there is a linear
relationship, there is no multicollinearity, the relevant variables are included in the analysis, and
there is true correlation between variables and factors. Several extraction methods are available, but
principal component analysis is used most commonly.

Suppose we want to develop a test that will allow a company to select for applicants that are good
team members. How would we go about it? Let's say a psychologist conducts an exploratory factor
analysis on the company's requirements and discovers 20 different aspects or characteristics that
make a good team member (for example "empathy" and "politeness").

Further factor analysis and testing on small samples reveals, however, that all 20 aspects are merely
the manifestations of just three main factors: communication skills, conscientiousness
and extroversion. The psychologist can conduct further rounds of factor analysis, testing and
refinement to find answers to two main questions:
• What is the minimum number of factors needed to explain all the variation we see in the
company's data?
• How well do these factors describe all the data?

Eventually the psychologist can arrive at the main hidden factors in the data and design the
inventory accordingly [1].
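A minimal sketch of the extraction step (principal components of the correlation matrix, the most common method) is shown below. The data are randomly generated to mimic six questionnaire items driven by two hidden factors; nothing here comes from a real inventory:

```python
import numpy as np

# Sketch of factor extraction via principal components: eigendecompose
# the correlation matrix of the observed variables and check how much
# variance the leading components explain. The data are synthetic: six
# observed "questionnaire" items driven by two hidden factors.
rng = np.random.default_rng(0)
n = 200
factor1 = rng.normal(size=n)          # e.g. "communication skills"
factor2 = rng.normal(size=n)          # e.g. "conscientiousness"
noise = rng.normal(size=(n, 6)) * 0.3

data = np.column_stack([
    factor1, factor1, factor1,        # three items loading on factor 1
    factor2, factor2, factor2,        # three items loading on factor 2
]) + noise

corr = np.corrcoef(data, rowvar=False)          # 6x6 correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]    # largest first

# Proportion of total variance explained by the first two components:
# high here, because only two factors truly generated the data
explained = eigenvalues[:2].sum() / eigenvalues.sum()
```

Inspecting how quickly the eigenvalues drop off is one common way of answering the "minimum number of factors" question above.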

http://www.statisticssolutions.com/factor-analysis-sem-factor-analysis/

https://explorable.com/factor-analysis

https://www.theanalysisfactor.com/factor-analysis-1-introduction

[1] Explorable. Factor Analysis. [online] Available at: https://explorable.com/factor-analysis [Accessed 27 Feb. 2018]
