# Franciscan College of the Immaculate Conception, Baybay City, Leyte

STAT. JOURNAL # 6

CORRELATION

1. Correlation is a measure of the strength and direction of the relationship between two or more variables.

2. Types of Correlation

a. Positive Correlation: Positive correlation occurs when an increase in one variable is accompanied by an increase in the other. The line that best fits the scatter plot is an increasing line.

b. Negative Correlation: Negative correlation occurs when an increase in one variable is accompanied by a decrease in the other. The line that best fits the scatter plot is a decreasing line.

c. No Correlation: No correlation occurs when there is no linear dependency between the variables.

d. Perfect Correlation: Perfect correlation occurs when there is an exact linear relationship between the variables. In this case all the points lie on a straight line.

e. Strong Correlation: A correlation is stronger the closer the points lie to the line.

f. Weak Correlation: A correlation is weaker the farther the points are scattered from the line.
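The descriptions above refer to "the line corresponding to the scatter plot." One common choice for that line is the least-squares line, and the Python sketch below assumes it. It reads the direction of a correlation from the sign of the slope and gauges strength by how far, on average, the points sit from the line. The helper names (`fit_line`, `rms_distance`, `describe`) are hypothetical and the data are made up for illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rms_distance(xs, ys):
    """Root-mean-square vertical distance of the points from the fitted line."""
    a, b = fit_line(xs, ys)
    return sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


def describe(xs, ys):
    """Direction from the slope's sign; spread around the line as a rough strength gauge."""
    _, slope = fit_line(xs, ys)
    if slope > 0:
        direction = "positive"
    elif slope < 0:
        direction = "negative"
    else:
        direction = "none"
    return direction, rms_distance(xs, ys)


# Made-up toy data, for illustration only.
xs = [1, 2, 3, 4, 5]
tight = [2.1, 3.9, 6.2, 7.8, 10.1]  # close to an increasing line
loose = [3, 2, 8, 5, 10]            # increasing, but widely scattered
falling = [10, 8, 6, 4, 2]          # exactly on a decreasing line

for name, ys in [("tight", tight), ("loose", loose), ("falling", falling)]:
    direction, spread = describe(xs, ys)
    print(name, direction, round(spread, 3))
```

The RMS distance is measured in the units of y, so it only compares data sets on the same scale. The correlation coefficient values in item 3 are the unit-free way to express strength.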

3. Perfect positive correlation: a relationship between two variables in which both variables move in tandem, so that all points lie exactly on an increasing straight line. In a positive correlation, as one variable decreases, the other also decreases, and vice versa. In statistics, a perfect positive correlation is represented by the value +1.00, while 0.00 indicates no correlation and -1.00 indicates a perfect negative correlation.

Perfect negative correlation: a relationship between two variables in which one variable increases as the other decreases, and vice versa, with all points lying exactly on a decreasing straight line. In statistics, a perfect negative correlation is represented by the value -1.00, while 0.00 indicates no correlation and +1.00 indicates a perfect positive correlation.
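To see where the values +1.00 and -1.00 come from, here is a minimal Python sketch of Pearson's correlation coefficient. The journal does not name the coefficient, so treating Pearson's r as the measure behind these values is an assumption. The helper name `pearson_r` is hypothetical, and the points are placed exactly on an increasing line and on a decreasing line.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariance scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
up = [2 * x + 1 for x in xs]     # every point on an increasing line
down = [10 - 3 * x for x in xs]  # every point on a decreasing line

print(round(pearson_r(xs, up), 2))    # 1.0
print(round(pearson_r(xs, down), 2))  # -1.0
```

Any data that do not fall exactly on one line give a value strictly between -1 and +1.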

4. Outlier: for a set of numerical data, any value that is markedly smaller or larger than the other values. A common rule flags as an outlier any value that lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. For example, in the data set {3, 5, 4, 4, 6, 2, 25, 5, 6, 2} the value 25 is an outlier. Informally, the outlier is the value that does not belong: the number that stands out.
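The 1.5 times IQR rule can be sketched in Python as follows, using the journal's data set. Quartile conventions differ between textbooks and software. This sketch assumes the "median of each half" method, which excludes the overall median when n is odd, so other tools may report slightly different fences. The helper names are hypothetical, and at least four values are assumed.

```python
def median(values):
    """Median of a list of numbers."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """(Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(upper)


def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [3, 5, 4, 4, 6, 2, 25, 5, 6, 2]
print(quartiles(data))     # (3, 6)
print(iqr_outliers(data))  # [25]
```

Here IQR = 6 - 3 = 3, so the fences are 3 - 4.5 = -1.5 and 6 + 4.5 = 10.5, and only 25 lies outside them.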

5. Five Steps of Hypothesis Testing

1. STATING THE RESEARCH QUESTION. The first step is to state the research problem as a question that identifies the population(s) of interest to the researcher, the parameter(s) of the variable under investigation, and the hypothesized value of the parameter(s). This step requires the researcher to define not only what is to be tested but also which variable will be measured during sample data collection. The type of variable (or combination of variables, in relationship-type research questions), whether categorical, discrete, or continuous, further determines which statistical test can be performed on the collected data set. For example: Is the mean first salary of a newly graduated student equal to \$30,000? The population of interest is all students who have just graduated. The parameter of interest is the mean, and the variable, salary, is continuous. The hypothesized value of the parameter, the mean, is \$30,000. Since the parameter is the population mean of a continuous variable, this suggests a one-sample test of a mean.

2. SPECIFY THE NULL AND ALTERNATIVE HYPOTHESES. The second step is to state the research question as a null hypothesis ($H_0$) and an alternative hypothesis ($H_A$). The null hypothesis is that the population mean equals \$30,000 ($H_0: \mu = \$30{,}000$). The alternative hypothesis is that the population mean does not equal \$30,000 ($H_A: \mu \neq \$30{,}000$). This $H_A$ calls for a two-tailed test, because a mean that is not equal to \$30,000 can be either less than or greater than \$30,000. Sometimes the alternative hypothesis is stated with a direction, such as less than or greater than a value such as \$30,000. A directional $H_A$ calls for a one-tailed test in the direction stated in $H_A$.

The next part of step 2 is to select a significance level, alpha, which is the probability of a Type I error. Alpha is typically set at .05 or .01. A good researcher will also not neglect the probability of a Type II error. In this step we not only set up the research question as statistical hypotheses but also evaluate whether all the assumptions of the statistical test have been met. Example: $H_0: \mu = \$30{,}000$, $H_A: \mu \neq \$30{,}000$, alpha = .05. The test assumptions are 1) the population is normally distributed or the sample size is approximately 30 or more, and 2) the sample used to collect the data was drawn randomly from the population. If these assumptions have not been met, then data collection should be reevaluated or the analysis continued with caution.
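Choosing alpha and choosing between a one-tailed and a two-tailed alternative together fix the cut-off z-scores for the test. Below is a minimal sketch using Python's standard-library `statistics.NormalDist` (available since Python 3.8). The helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(alpha, tails=2):
    """Cut-off |z| for a z test at significance level alpha.

    tails=2 splits alpha between both tails (reject if |z| >= cut-off);
    tails=1 puts all of alpha in one tail (use the negative value for a
    "less than" alternative).
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: two-tailed ±{critical_z(alpha, 2):.3f}, "
          f"one-tailed {critical_z(alpha, 1):.3f}")
```

A directional $H_A$ concentrates all of alpha in one tail, so its cut-off (about 1.645 at alpha = .05) is closer to zero than the two-tailed one (about 1.96).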

3. CALCULATE TEST STATISTIC. The third step is to calculate a statistic analogous to the parameter specified by the null hypothesis. If the null hypothesis is defined by the parameter $\mu$, then the statistics computed on our data set would be the mean ($\bar{x}$) and the standard deviation ($s$). A histogram of our sample data set gives us our best approximation of what we expect the population distribution to look like. Since the best estimate of $\mu$ is $\bar{x}$, our sample mean, the test statistic is based on the distribution of sample means (the sampling distribution of the mean), with $n$, the sample size, equal to the number of data values used to compute $\bar{x}$. We have hypothesized the mean of this distribution from the research question and want to see whether our sample mean is close to this value. To determine where our sample mean fits on this sampling distribution, we convert $\bar{x}$ to a z-score. Thus the test statistic is:

$$z = \frac{\bar{x} - \mu_0}{\text{standard error of } \bar{x}}$$

where $\mu_0$ is the hypothesized value of the population mean.

Because the population standard deviation is unknown, the standard error of $\bar{x}$ is estimated by $s/\sqrt{n}$: the sample standard deviation divided by the square root of the sample size. Example: Suppose we randomly sampled 100 high school seniors and determined the salary of their first job. The sample mean salary, $\bar{x}$, was \$29,000 with a standard deviation of \$6,000. Since the sample size is greater than 30, we don't have to worry about whether the population is normally distributed (Central Limit Theorem). The test statistic would be:

$$z = \frac{\$29{,}000 - \$30{,}000}{\$6{,}000/\sqrt{100}} = \frac{-\$1{,}000}{\$600} \approx -1.667$$
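The arithmetic of the salary example can be checked with a short Python function. `z_statistic` is a hypothetical helper. Like the text, it uses the sample standard deviation in place of the unknown population value.

```python
from math import sqrt


def z_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """Large-sample z statistic for one mean, with standard error s / sqrt(n)."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Numbers from the salary example above.
z = z_statistic(29_000, 30_000, 6_000, 100)
print(round(z, 3))  # -1.667
```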

4. COMPUTE PROBABILITY OF TEST STATISTIC OR REJECTION REGION. The fourth step is to calculate the probability value (often called the p-value). This is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. Because this is a two-tailed test, the probabilities in both tails are added. The p-value computed in this step is compared with the significance level selected in step 2. If the p-value is less than or equal to the significance level, the null hypothesis is rejected. If the p-value is greater than the significance level, the null hypothesis is not rejected. When the null hypothesis is rejected, the outcome is said to be "statistically significant"; when it is not rejected, the outcome is said to be "not statistically significant." If the outcome is statistically significant, the null hypothesis is rejected in favor of the alternative hypothesis. Example:

$P(z > 1.667) = .048$ and $P(z < -1.667) = .048$, so the p-value is $.048 + .048 = .096$. Since this value is greater than the alpha = .05 selected when we set up our hypotheses, we do not reject the null hypothesis, $H_0: \mu = \$30{,}000$.
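The two-tailed p-value can also be computed from the standard normal distribution instead of a printed table. This sketch uses `statistics.NormalDist` (Python 3.8+). The helper name `two_tailed_p_value` is hypothetical. Computing with the unrounded z gives about 0.0956, which rounds to the .096 above.

```python
from statistics import NormalDist


def two_tailed_p_value(z):
    """P(Z >= |z|) + P(Z <= -|z|) for a standard normal Z."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail


z = (29_000 - 30_000) / (6_000 / 100 ** 0.5)  # about -1.667, from step 3
p = two_tailed_p_value(z)
alpha = 0.05
print(round(p, 3))  # 0.096
print("reject H0" if p <= alpha else "do not reject H0")  # do not reject H0
```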

If we instead use a rejection region with alpha = .05 (.025 in each tail) to decide whether to reject the null hypothesis, the cut-off z-scores are -1.96 and 1.96. If our test statistic is >= 1.96 or <= -1.96, we reject the null hypothesis at alpha = .05, and we say that the test statistic (transformed into a z-score) falls in the rejection region. In this example, the test statistic z = -1.667 does not fall in the rejection region. It falls in the non-rejection region (sometimes called the acceptance region), so we do not reject the null hypothesis.
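The rejection-region check is a comparison against the two cut-offs. Here is a minimal sketch, again assuming `statistics.NormalDist` (Python 3.8+) and a hypothetical helper name:

```python
from statistics import NormalDist


def in_rejection_region(z, alpha=0.05):
    """True if z falls in the two-tailed rejection region for the given alpha."""
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    return z <= -cutoff or z >= cutoff


print(in_rejection_region(-1_000 / 600))  # False: -1.667 lies between -1.96 and 1.96
print(in_rejection_region(2.5))           # True
```

For the same alpha, this approach and the p-value approach always reach the same decision.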

5. STATE CONCLUSIONS. The fifth and final step is to describe the results and state correct statistical conclusions in an understandable way. The conclusions consist of two statements: one describing the decision about the null hypothesis and the other answering the research question. The first statement should state whether we rejected or did not reject the null hypothesis, and at what value of alpha or p-value for our test statistic. The second statement should answer the research question proposed in step 1, citing the sample statistic that estimated the hypothesized parameter. Example: Do not reject the null hypothesis at alpha = .05 (p-value = .096). Based on a sample mean of \$29,000, there is not enough evidence to conclude that the mean first salary of a newly graduated student differs from \$30,000.
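Putting the five steps together, the sketch below runs the salary example end to end and builds the two conclusion statements. It assumes a two-tailed alternative, a random sample, and a sample large enough for the normal approximation. `one_sample_z_test` is a hypothetical helper, and the sentence wording is only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, sample_sd, n, mu0, alpha=0.05):
    """Two-tailed large-sample z test of H0: mu = mu0.

    Assumes a random sample and n large enough (roughly 30 or more) for the
    normal approximation; returns z, p and two conclusion sentences.
    """
    z = (sample_mean - mu0) / (sample_sd / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p <= alpha:
        decision = f"Reject the null hypothesis at alpha = {alpha} (p-value = {p:.3f})."
        answer = (f"Based on a sample mean of {sample_mean:,}, the population mean "
                  f"differs from {mu0:,}.")
    else:
        decision = f"Do not reject the null hypothesis at alpha = {alpha} (p-value = {p:.3f})."
        answer = (f"Based on a sample mean of {sample_mean:,}, there is not enough "
                  f"evidence that the population mean differs from {mu0:,}.")
    return z, p, decision, answer


z, p, decision, answer = one_sample_z_test(29_000, 6_000, 100, 30_000)
print(decision)
print(answer)
```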