
ASSIGNMENT

OF
COMPUTER

SUBMITTED BY :- Ashana, 11187807, MSc Chem 4th Sem
SUBMITTED TO :- Mrs Charu Sharma
Q1. What is a t-test? When is it used and for what purpose? Explain by
means of an example.
Ans:- A t-test is an inferential statistical test used to determine whether there is a
significant difference between the means of two groups, which may be related in certain features.
It is mostly used when the data sets are expected to follow a normal distribution (for
example, the number of heads recorded in repeated runs of 100 coin flips is approximately
normal) and the population variances are unknown. A t-test is used as a hypothesis testing
tool, which allows testing of an assumption applicable to a population.
A t-test looks at the t-statistic, the t-distribution values, and the degrees of freedom to
determine the statistical significance. To conduct a test with three or more means, one must
use an analysis of variance.

For example:- Assume that we are taking diagonal measurements of paintings received in
an art gallery. One group of samples includes 10 paintings, while the other includes 20
paintings. The data sets, with the corresponding mean and variance values, are as follows:

Set 1 (n1 = 10): 19.7, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 21.95
Set 2 (n2 = 20): 28.3, 26.7, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7,
                 23.2, 17.5, 20.6, 18, 23.9, 21.6, 24.3, 20.4, 23.9, 13.3

            Set 1   Set 2
Mean        19.4    21.6
Variance    1.4     17.1

Since the number of data records is different (n1 = 10 and n2 = 20) and the variance is also
different, the t-value and degrees of freedom are computed for the above data set using the
unequal-variance formula:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

where x̄1 and x̄2 are the sample means, s1² and s2² are the sample variances, and n1 and n2 are
the sample sizes.

The t-value is -2.24787. Since the minus sign can be ignored when comparing the computed
t-value with the critical t-value, the computed value is 2.24787.
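
As a rough check of the figures in this example, the sketch below recomputes the means, variances and t-value from the raw measurements. It assumes the unequal-variance (Welch) form of the t-statistic, since that is the form that reproduces the quoted value of about -2.248. The variable names are illustrative, and the Welch-Satterthwaite degrees-of-freedom formula used at the end is a standard result that the example itself does not write out.

```python
from math import sqrt
from statistics import mean, variance

# Diagonal measurements from the example (Set 1: 10 paintings, Set 2: 20 paintings)
set1 = [19.7, 20.4, 19.6, 17.8, 18.5, 18.9, 18.3, 18.9, 19.5, 21.95]
set2 = [28.3, 26.7, 20.1, 23.3, 25.2, 22.1, 17.7, 27.6, 20.6, 13.7,
        23.2, 17.5, 20.6, 18, 23.9, 21.6, 24.3, 20.4, 23.9, 13.3]

n1, n2 = len(set1), len(set2)
m1, m2 = mean(set1), mean(set2)
v1, v2 = variance(set1), variance(set2)  # sample variances (n - 1 denominator)

# Unequal-variance (Welch) t-statistic: assumed form, see the note above
se_squared = v1 / n1 + v2 / n2
t_value = (m1 - m2) / sqrt(se_squared)

# Welch-Satterthwaite approximation for the degrees of freedom
df = se_squared ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

print(f"mean1 = {m1:.1f}, mean2 = {m2:.1f}")
print(f"variance1 = {v1:.1f}, variance2 = {v2:.1f}")
print(f"t = {t_value:.5f}, degrees of freedom = {df:.2f}")
```

The absolute value of `t_value` would then be compared with the critical value from a t-table at the chosen significance level and the computed degrees of freedom.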

Q2. What do you mean by a hypothesis? Differentiate between the null and alternative
hypotheses.
Ans:- A hypothesis may be defined as a proposition, or a set of propositions, set forth as an
explanation for the occurrence of some specified group of phenomena, either asserted
merely as a provisional conjecture to guide some investigation or accepted as highly
probable in the light of established facts.
Characteristics of Hypothesis:-
• Hypothesis should be clear and precise.
• Hypothesis should be capable of being tested.
• Hypothesis should state the relationship between variables, if it happens to be a relational
hypothesis.
• Hypothesis should be limited in scope and must be specific.
• Hypothesis should be consistent with most known facts, i.e. it must be consistent with a
substantial body of established facts.

Difference between Null & Alternative hypothesis:-

1. A null hypothesis is a statement that there is no relationship between two variables.
An alternative hypothesis is simply the inverse of the null hypothesis, i.e. it states that
there is a statistically significant relationship between two measured phenomena.
2. A null hypothesis is what the researcher tries to disprove, whereas an alternative
hypothesis is what the researcher wants to prove.
3. A null hypothesis represents no observed effect, whereas an alternative hypothesis
reflects some observed effect.
4. If the null hypothesis is accepted, no changes will be made in opinions or actions.
Conversely, if the alternative hypothesis is accepted, it will result in changes in
opinions or actions.
5. Both hypotheses are statements about a population parameter, not about the sample. The
test evaluates the null hypothesis directly, using a sample statistic; support for the
alternative hypothesis is indirect, since it follows only from rejecting the null.
6. A null hypothesis is labelled as H0 (H-zero) while an alternative hypothesis is
represented by H1 (H-one).
7. The mathematical formulation of a null hypothesis uses an equals sign (=), whereas that
of an alternative hypothesis uses a not-equal-to sign (≠) or an inequality (< or >).
8. Under the null hypothesis, the observations are the outcome of chance, whereas under
the alternative hypothesis, the observations are the outcome of a real effect.
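
As an illustration of how the two hypotheses are written, a two-sided comparison of two group means, such as the paintings example in Q1, could be stated as below. The symbols μ1 and μ2 for the two population means are notation assumed here for the sketch.

```latex
H_0 : \mu_1 = \mu_2 \qquad \text{versus} \qquad H_1 : \mu_1 \neq \mu_2
```

A one-sided alternative would replace the not-equal-to sign with < or >.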

Q3. What is the Chi-square test? Explain its significance in statistical analysis.

Ans:- Chi-square, symbolically written as χ², is a statistical measure used in sampling
analysis for comparing a variance to a theoretical variance. As a non-parametric test, it can
be used to determine whether categorical data show dependency or whether the two
classifications are independent. It can also be used to make comparisons between theoretical
populations and actual data when categories are used. Thus, the chi-square test is applicable
to a large number of problems. The test is, in fact, a technique through which it is possible
for researchers to (i) test the goodness of fit; (ii) test the significance of association
between two attributes; and (iii) test the homogeneity or the significance of population
variance.
The formula for calculating Chi-square is:-

$$\chi_c^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where, c = degrees of freedom
O = observed value
E = expected value

Significance of Chi-square test in statistical analysis:-


The chi-square distribution has many uses in statistics, including:

• Confidence interval estimation for a population standard deviation of a normal
distribution from a sample standard deviation.
• Independence of two criteria of classification of qualitative variables.
• Relationships between categorical variables (contingency tables).
• Sample variance study when the underlying distribution is normal.
• Tests of deviations between expected and observed frequencies.

Example of Chi-square test:-

Suppose there is a city of 1,000,000 residents with four neighbourhoods: A, B, C, and D. A
random sample of 650 residents of the city is taken and their occupation is recorded as "white
collar", "blue collar", or "no collar". The null hypothesis is that each person's neighbourhood of
residence is independent of the person's occupational classification. The data is tabulated as:

                 A     B     C     D   Total
White collar    90    60   104    95     349
Blue collar     30    50    51    20     151
No collar       30    40    45    35     150
Total          150   150   200   150     650

Let us take the proportion of the sample living in neighbourhood A, 150/650, to estimate what
proportion of the whole 1,000,000 live in neighbourhood A. Similarly we take 349/650 to
estimate what proportion of the 1,000,000 are white-collar workers. By the assumption of
independence under the hypothesis we should "expect" the number of white-collar workers in
neighbourhood A to be
150 × (349/650) ≈ 80.54

Then in that "cell" of the table, we have

$$\frac{(O_i - E_i)^2}{E_i} = \frac{(90 - 80.54)^2}{80.54} \approx 1.11$$

The sum of these quantities over all of the cells is the test statistic; in this case, ≈ 24.6.
Under the null hypothesis, this sum has approximately a chi-squared distribution whose
number of degrees of freedom is
(number of rows − 1)(number of columns − 1) = (3 − 1)(4 − 1) = 6
If the test statistic is improbably large according to that chi-squared distribution, then one
rejects the null hypothesis of independence.
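
The calculation above can be reproduced with a short sketch that builds the expected count for every cell from the row and column totals and then sums the per-cell contributions. The variable names are illustrative. The last step, which estimates how improbable the statistic is, uses a closed-form expression for the upper-tail probability that is valid only when the degrees of freedom are even (as they are here); it is an addition for illustration and is not part of the worked example.

```python
from math import exp, factorial

# Observed counts per neighbourhood (columns A, B, C, D)
observed = {
    "White collar": [90, 60, 104, 95],
    "Blue collar": [30, 50, 51, 20],
    "No collar": [30, 40, 45, 35],
}

rows = list(observed.values())
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over all cells, with E = row total * column total / grand total
chi2 = 0.0
for r, row in enumerate(rows):
    for c, obs in enumerate(row):
        expected = row_totals[r] * col_totals[c] / grand_total
        chi2 += (obs - expected) ** 2 / expected

df = (len(rows) - 1) * (len(col_totals) - 1)

# Upper-tail probability P(X > chi2); this closed form holds only for even df
half = chi2 / 2
p_value = exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

print(f"chi-square = {chi2:.2f}, degrees of freedom = {df}, p = {p_value:.5f}")
```

A p-value far below a conventional significance level such as 0.05 would lead to rejecting the null hypothesis of independence.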
Q4. Explain the meaning of ANOVA. Describe ANOVA for one-way & two-way
classification.
Ans:- Analysis of variance (ANOVA) is a collection of statistical models and their
associated estimation procedures (such as the "variation" among and between groups) used
to analyze the differences among group means in a sample using the F-distribution.
ANOVA helps us decide whether to reject the null hypothesis in favour of the alternative
hypothesis.

One-way ANOVA
The one-way ANOVA is a hypothesis test in which only one categorical variable, or single
factor, is taken into consideration. With the help of the F-distribution, it enables us to
compare the means of three or more samples. It compares the means of the groups of
interest and determines whether any of those means are statistically significantly
different from each other. Specifically, it tests the null hypothesis:

$$H_0 : \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$$

where μ = group mean and k = number of groups. If, however, the one-way ANOVA returns
a statistically significant result, we accept the alternative hypothesis (HA), which is that there
are at least two group means that are statistically significantly different from each other.

Assumptions for One-way ANOVA


• Normal distribution of the population from which the samples are drawn.
• Measurement of the dependent variable is at an interval or ratio level.
• Two or more categorical independent groups in an independent variable.
• Independence of samples.
• Homogeneity of the variance of the population.

Limitations of One-way ANOVA


A one-way ANOVA will tell you that at least two groups differ from each other, but it
won't tell you which groups differ. If your test returns a significant F-statistic, you may
need to run a post hoc test to tell you exactly which groups had a difference in means.
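
To make the F-statistic concrete, the following minimal sketch computes it for a one-way layout by splitting the variation into a between-groups part and a within-groups part. The three groups and their values are hypothetical, chosen only so that the arithmetic is easy to follow by hand; none of the names come from the text.

```python
from statistics import mean

# Hypothetical measurements for three groups (illustrative values only)
groups = {
    "group 1": [1.0, 2.0, 3.0],
    "group 2": [2.0, 3.0, 4.0],
    "group 3": [5.0, 6.0, 7.0],
}

samples = list(groups.values())
k = len(samples)                           # number of groups
n_total = sum(len(g) for g in samples)     # total number of observations
grand_mean = mean([x for g in samples for x in g])

# Variation of the group means around the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
# Variation of the observations around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in samples for x in g)

df_between = k - 1
df_within = n_total - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The resulting F value would be compared with the critical value of the F-distribution for the two degrees of freedom shown.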

Two-way ANOVA
Two-way ANOVA examines the effect of two independent factors on a dependent variable. It
also studies the interaction, if any, between the independent variables in their influence on
the dependent variable.
For example, consider analyzing the test scores of a class based on gender and age. Here the
test score is the dependent variable, and gender and age are the independent variables.
Two-way ANOVA can be used to find the relationship between these dependent and
independent variables.

Assumptions for Two-way ANOVA


• The population must be close to a normal distribution.
• Samples must be independent.
• Population variances must be equal.
• Groups must have equal sample sizes.
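
A balanced two-way layout can be sketched in the same spirit. The example below assumes two factors with two levels each and two replicate scores per cell (equal sample sizes, as required above); the scores are hypothetical and the factor labels in the comments are only placeholders. It computes one F-ratio per factor and one for their interaction.

```python
from statistics import mean

# Hypothetical scores: scores[i][j] holds the replicates for
# level i of factor A (e.g. gender) and level j of factor B (e.g. age group)
scores = [
    [[4.0, 6.0], [8.0, 10.0]],
    [[6.0, 8.0], [10.0, 12.0]],
]

a = len(scores)          # levels of factor A
b = len(scores[0])       # levels of factor B
n = len(scores[0][0])    # replicates per cell (balanced design)

grand = mean([x for row in scores for cell in row for x in cell])
row_means = [mean([x for cell in row for x in cell]) for row in scores]
col_means = [mean([x for row in scores for x in row[j]]) for j in range(b)]
cell_means = [[mean(cell) for cell in row] for row in scores]

# Sums of squares for factor A, factor B, their interaction, and the error term
ss_a = b * n * sum((m - grand) ** 2 for m in row_means)
ss_b = a * n * sum((m - grand) ** 2 for m in col_means)
ss_ab = n * sum(
    (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(a) for j in range(b)
)
ss_error = sum(
    (x - cell_means[i][j]) ** 2
    for i in range(a) for j in range(b) for x in scores[i][j]
)

ms_error = ss_error / (a * b * (n - 1))
f_a = (ss_a / (a - 1)) / ms_error
f_b = (ss_b / (b - 1)) / ms_error
f_ab = (ss_ab / ((a - 1) * (b - 1))) / ms_error

print(f"F_A = {f_a:.2f}, F_B = {f_b:.2f}, F_AB = {f_ab:.2f}")
```

In this made-up data set the two factors act additively, so the interaction F-ratio comes out as zero; real data would rarely be this clean.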

Advantages of ANOVA
• It is an improved technique over t-test and z-test.
• Suitable for multidimensional variables.
• Analysis of various factors at a time.
• Economical method of parametric testing.
• Can be used with three or more groups.

Disadvantages of ANOVA

• ANOVA rests on strict assumptions regarding the nature of the data, which makes it
difficult to apply when they are not met.
• Unlike the t-test, it gives no specific interpretation of the significance of the
difference between two particular means.
• Further testing, such as post-ANOVA t-tests, is required to identify which means differ.

Applications of ANOVA
• Recommendation of a fertilizer against others for the improvement of crop yield.
• ANOVA has immensely useful practical applications in business, particularly Lean-Six
Sigma/operational efficiency.
• Comparing the gas mileage of different vehicles, or the same vehicle under different fuel
types, or road types.
• Understanding the impact of temperature, pressure or chemical concentration on some
chemical reaction (power reactors, chemical plants, etc).
• Understanding the impact of different catalysts on chemical reaction rates.
• Studying whether advertisements of different kinds elicit different numbers of customer
responses.
• Understanding the performance, quality or speed of manufacturing processes based on
the number of cells or steps they’re divided into.

Q5. What are degrees of freedom? Explain.

Ans:- Degrees of Freedom refers to the maximum number of logically independent values,
which are values that have the freedom to vary, in the data sample.
Degrees of Freedom are commonly discussed in relation to various forms of hypothesis
testing in statistics, such as the Chi-square test.
The easiest way to understand Degrees of Freedom conceptually is through an example:

• Consider a data sample consisting of five positive integers. The values could be any
numbers with no known relationship between them. This data sample would, theoretically,
have five degrees of freedom.
• Four of the numbers in the sample are {3, 8, 5, and 4} and the average of the entire data
sample is revealed to be 6.
• This must mean that the fifth number has to be 10. It can be nothing else. It does not
have the freedom to vary.
• So the Degrees of Freedom for this data sample are 4.

The formula for Degrees of Freedom equals the size of the data sample minus one:
Df = N − 1
where, Df = degrees of freedom
N = sample size
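
The five-integer example can be checked in a few lines. This is only an illustrative sketch of the arithmetic; the variable names are made up for the purpose.

```python
# Four known values from a sample of five whose mean is known to be 6
known_values = [3, 8, 5, 4]
sample_size = 5
sample_mean = 6

# The total is fixed by the mean, so the last value is fully determined
fifth_value = sample_mean * sample_size - sum(known_values)

# Only N - 1 of the values were free to vary
degrees_of_freedom = sample_size - 1

print(f"fifth value = {fifth_value}, degrees of freedom = {degrees_of_freedom}")
```

Once the mean is fixed, choosing any four values leaves no choice for the fifth, which is why one degree of freedom is lost.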
For Chi-square tests, degrees of freedom are used to determine whether a certain null
hypothesis can be rejected, based on the total number of variables and samples within the
experiment. For example, when considering students and course choice, a sample size of 30
or 40 students is likely not large enough to yield statistically significant results. The same
or similar results obtained from a study using a sample size of 400 or 500 students are more
reliable.
