
Page 1

Statistics in Education
Course Packet
Dr. Rachel Vannatta

EDFI 641

Table of Contents
Video #1: Introduction to Statistics ........ 2
Video #2: Frequency Distributions ........ 6
Video #3: Central Tendencies & Variability ........ 10
Video #4: Probability & z Score ........ 18
    M-n-M Activity ........ 19
Video #5: Distribution of Sample Means ........ 24
Video #6: Hypothesis Testing ........ 28
Video #7: t Test ........ 38
Video #8: t Test of Independent Samples ........ 44
    Interpreting Research ........ 49
Video #9: t Test of Related Samples ........ 50
    Interpreting Research ........ 54
    Coke vs. Pepsi Experiment ........ 55
Video #10: ANOVA ........ 57
    Interpreting Research ........ 63
Video #11: Correlation & Regression ........ 64
    Interpreting Research ........ 73
Video #12: Chi Square ........ 75
    Interpreting Research ........ 81
Statistical Test Grid ........ 82
Unit Normal (z-score) Table ........ 84
t Distribution Table ........ 88
F Distribution (ANOVA) Table ........ 89
Pearson Correlation Table ........ 90
Chi Square Distribution Table ........ 91

Video #1: Introduction to Statistics


Population: the entire group of individuals that the researcher WISHES to study.
Sample: a set of individuals selected from the population, intended to represent the population.
Parameter: a value that describes the population.
Statistic: a value that describes the sample.

Page 2

Two major types of statistical methods

Descriptive stats: summarize, organize, and simplify data (e.g., mean, standard deviation, tables, graphs, distributions).
    Data: raw scores.

Inferential stats: techniques that allow us to study samples and make generalizations about the population from which they were selected (e.g., t test, ANOVA, correlation).
    Sampling error: the amount of error between the sample statistic and the population parameter (the degree to which the sample differs from the population).
    Random sampling: used to minimize error between sample and population.

Inferential statistics also allow us to study relationships between/among variables that the sample holds.
    Variable: a characteristic/condition that differs among individuals (gender, height, test scores, IQ).
    Construct: a hypothetical concept/theory used to organize observations.
    Operational definition: defines a construct in terms of how it is measured.

Types of Variables
    Categorical variable (discrete): consists of separate categories (e.g., gender, religion, classification of personality).
    Quantitative variable (continuous): can be divided into an infinite number of fractional parts (e.g., height, time, age).
    Independent variable: usually a treatment that has been manipulated (control group versus experimental group); usually categorical.
    Dependent variable: usually the effect; usually quantitative.
    Confounding variable: an uncontrolled variable that creates a difference between the control and experimental groups.

Variables determine the type of relationship being studied: mutual or causal.
Groups must be compared to examine cause and effect; groups are created by a categorical variable.

Relationship   Independent Variable   Dependent Variable   Key Words
Causal         Categorical            Quantitative         Cause, Effect, Increase/Decrease, Difference
Mutual         Quantitative           Quantitative         Relate, Relationship, Predict, Associate

Page 3

Class #1: In-Class Practice Problems


In the following research questions, identify the independent and dependent variables and indicate whether each is categorical or quantitative.

1. Is there a significant relationship between college GPA and SAT scores among college freshmen?
   independent variable: ____________ dependent variable: ____________ research design: ____________

2. Does receiving a special diet of oat bran significantly decrease cholesterol levels among middle-aged adults? Note: Researcher compared a treatment group to a control group. Groups were created using random selection and assignment.
   independent variable: ____________ dependent variable: ____________ research design: ____________

3. Does socio-economic status (low, middle, high) affect reading achievement among preschoolers?
   independent variable: ____________ dependent variable: ____________ research design: ____________

4. Does receiving whole-language reading instruction increase reading achievement among elementary students? Note: Researcher compared a treatment group (whole-language) to a control group (traditional). Existing groups were used.
   independent variable: ____________ dependent variable: ____________ research design: ____________

Page 4

Research Designs
Correlational: studies relationships among 2 or more variables to explain or predict behaviors.
    Usually both IV and DV are quantitative.
    Example: Teacher studies the relationship between English grades and overall GPA.

Experimental: examines cause and effect; manipulates a treatment and tests the outcome; compares the experimental and control groups (groups are randomly created).
    IV = nominal; DV = interval/ratio
    Example: Researcher compares grades of a group of students that receive computer-assisted instruction to a group that receives none. Groups were created through random assignment.

Quasi-Experimental: examines cause and effect; indirectly manipulates a treatment and tests the outcome; compares the experimental and control groups (uses existing groups).
    IV = nominal; DV = interval/ratio
    Example: Researcher compares grades of a group of students that receive computer-assisted instruction to a group that receives none. Existing groups were used.

Causal Comparative: examines cause and effect (cautiously); compares groups created by some categorical characteristic (gender, religion, ethnicity).
    IV = nominal; DV = interval/ratio
    Example: Researcher compares final grades of male and female students.

Most research is guided by a hypothesis, a prediction about the effect of the treatment.
Measurement Scales
Nominal: numbers have NO numerical value but represent categories (religion, ethnicity, occupation, gender).
Ordinal: numbers represent a rank (1 being the best); intervals can vary (e.g., class rank, Olympic ordinals).
Interval: numbers have typical numerical value; intervals are equal; no real zero (e.g., temperature, test score).
Ratio: same as interval but has a real zero (e.g., money, time).

Page 5

Identify the measurement scale (nominal, ordinal, interval, ratio) for each.
_________________ 5. Size of school district (small, medium, large)
_________________ 6. Rank of faculty on their teaching ratings
_________________ 7. Social security number
_________________ 8. Color of person's eyes
_________________ 9. IQ scores
_________________ 10. Degree in Fahrenheit
_________________ 11. Religious affiliation
_________________ 12. Medalists in an Olympic event
_________________ 13. Income in actual dollars

Video #2: Frequency Distributions


Frequency distribution: a table/graph of the number of individuals located in each category.
    places scores in order from highest to lowest
    groups together all individuals who have the same score

Page 6

X     f
10    1
9     4
8     5
6     6
5     2
4     2

Proportion and Percents of Frequency Distributions

Proportion: relative frequency; measures the fraction of the total group that is associated with each score; most often appears as a decimal.
    proportion = $p = \frac{f}{N}$

Percentage: the percent of the total group that is associated with each score.
    percentage = $p(100) = \frac{f}{N}(100)$

X     f    p = f/N        % = p(100)   cum f   cum %
10    1    1/20 = .05      5            1        5
9     4    4/20 = .20     20            5       25
8     5    5/20 = .25     25           10       50
6     6    6/20 = .30     30           16       80
5     2    2/20 = .10     10           18       90
4     2    2/20 = .10     10           20      100
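The proportion and percent columns can be checked with a few lines of Python. This is only a sketch: the list `scores` is rebuilt from the f column of the table (N = 20), and the function name `frequency_table` is made up for this illustration.

```python
from collections import Counter

# Scores rebuilt from the frequency table: one 10, four 9s, five 8s, and so on.
scores = [10] + [9] * 4 + [8] * 5 + [6] * 6 + [5] * 2 + [4] * 2

def frequency_table(data):
    """Return rows of (score, f, p, percent), listed from highest score to lowest."""
    counts = Counter(data)
    n = len(data)
    rows = []
    for score in sorted(counts, reverse=True):
        f = counts[score]
        p = f / n                      # proportion = f / N
        rows.append((score, f, p, p * 100))  # percent = p(100)
    return rows

table = frequency_table(scores)
for score, f, p, pct in table:
    print(f"{score:>3} {f:>3} {p:>5.2f} {pct:>5.0f}%")
```

The proportions always add up to 1 and the percents to 100, which is a quick way to catch a counting mistake.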

Page 7

Grouped Frequency Distribution Table


Used when data covers a wide range of values; groups are based on class intervals.
To construct a grouped frequency distribution table, follow these rules:
    Rule 1, number of intervals: shoot for 8-12 intervals, 10 intervals being the ideal.
    Rule 2, interval width: use an appropriate width to reach an appropriate number of intervals.
    Rule 3, interval starting point: should be a multiple of the width.
    Rule 4: all intervals should be the same width.

Helpful Hints
Use the following equation to determine the number of intervals and the interval width that are appropriate for the data:

    number of intervals = $\frac{\text{highest score} - \text{lowest score} + 1}{\text{interval width}}$

ALWAYS round up the number of intervals! It is impossible to have a fourth of an interval at the end of the distribution, so even if the formula gives 8.25, round up to 9.
Try different widths until an appropriate number of intervals is calculated.

Example: N=25
51, 55, 57, 60, 63, 66, 68, 69, 70, 72, 74, 74, 74, 75, 77, 79, 83, 84, 85, 85, 88, 90, 92, 95, 98

number of intervals = $\frac{98 - 51 + 1}{5} = \frac{48}{5} = 9.6$ (round up to 10)

Interval   f
95-99      2
90-94      2
85-89      3
80-84      2
75-79      3
70-74      5
65-69      3
60-64      2
55-59      2
50-54      1
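The interval formula and the grouped counts for the N = 25 example can be reproduced as follows. This is a minimal sketch; the function names `number_of_intervals` and `grouped_frequencies` are illustrative and not part of the course materials.

```python
import math
from collections import Counter

scores = [51, 55, 57, 60, 63, 66, 68, 69, 70, 72, 74, 74, 74,
          75, 77, 79, 83, 84, 85, 85, 88, 90, 92, 95, 98]

def number_of_intervals(data, width):
    """(highest - lowest + 1) / width, always rounded up."""
    return math.ceil((max(data) - min(data) + 1) / width)

def grouped_frequencies(data, width):
    """Return (low, high, f) per class interval, highest interval first.

    Each interval starts at a multiple of the width (rule 3)."""
    counts = Counter((x // width) * width for x in data)
    lowest_start = (min(data) // width) * width
    highest_start = (max(data) // width) * width
    return [(low, low + width - 1, counts[low])
            for low in range(highest_start, lowest_start - 1, -width)]

print(number_of_intervals(scores, 5))
for low, high, f in grouped_frequencies(scores, 5):
    print(f"{low}-{high}: {f}")
```

Trying `number_of_intervals(scores, 10)` or `number_of_intervals(scores, 2)` shows why a width of 5 is the one that lands in the 8-12 range for this data.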

Keep in mind that since a continuous variable contains an infinite number of points, a score is not assigned a single point but rather an interval with boundaries, also called real limits, that separate a score from the adjacent scores.
Example: X = 88
    upper real limit = 88.5
    lower real limit = 87.5
    Therefore, a score of 87.75 would fall in the interval of X = 88.

Frequency Distribution Graphs
    Use an x-axis to represent scores and a y-axis to represent frequencies.
    List scores increasing in value from left to right.
    List frequencies increasing in value from bottom to top.
    The height of the y-axis should be approximately 2/3 to 3/4 of the length of the x-axis.

Page 8

Creating a Grouped Frequency Histogram: follow the rules for the Grouped Frequency Table.
Histogram: used for interval/ratio data; a bar represents an interval (real limits of the score or class interval); bars touch each other to represent the continuous nature of the data; height corresponds to frequency.
Example: Using data from the Grouped Frequency Table on the previous page

[Figure: grouped frequency histogram. x-axis: class intervals 50-54 through 95-99; y-axis: frequency, 1 to 5.]

Notes on the histogram:
    10 intervals meet the 8-12 interval requirement.
    The starting point is a multiple of the width (5).
    The interval width is 5 in order to generate 10 intervals.

Other Types of Frequency Distribution Graphs and Polygons

Bar Graph: used for nominal/ordinal data; a bar represents a category; bars do not touch.
Frequency Distribution Polygon: used for interval/ratio data; a single dot represents an individual score or a class interval; dots are connected.
Distribution Curve: shows relative frequencies for the population; smooth.
    Normal: symmetrical; greatest frequency in the middle, smallest frequency in the extremes (tails).
    Positively Skewed: smallest frequency in the positive (right) end of the distribution.
    Negatively Skewed: smallest frequency in the negative (left) end of the distribution.

Video # 2 In-Class Practice Problems

Page 9

10, 15, 18, 22, 25, 26, 29, 31, 33, 33, 34, 37, 38, 39, 39, 40, 40, 40, 41, 42, 42, 43, 44, 45, 46, 46, 47, 48, 49, 50

1. Using the data above, do the following:
   a. Construct a histogram based upon the grouped frequency distribution.
   b. Determine the distribution type (normal, positive, negative) from the histogram.

Video #3: Central Tendency


Measure of Central Tendency

Page 10

describes a group of individuals with a single measurement that is most representative of all individuals Types: mean, median, and mode

Mean: the arithmetic average.
    used for interval/ratio (quantitative) data
    computed by adding all the scores and dividing by the number of scores
    Population mean: $\mu = \frac{\sum X}{N}$
    Sample mean: $\bar{X} = \frac{\sum X}{n}$

Median: the midpoint; the score that divides the distribution exactly in half; 50% of scores are above the median and 50% are below.
    used for ordinal data, or when there is a skewed distribution, some scores are undetermined, or there is an open-ended distribution
    Calculating the median when N is an odd number: make sure scores are in order; find the middle score.
    Calculating the median when N is an even number: make sure scores are in order; find the two middle scores; add the two scores and divide by 2.

Mode: the most frequent score.
    used especially for nominal data
    represented by the highest point in the frequency distribution
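The three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical, chosen only to illustrate the definitions; the second list adds one extreme score to show how the mean responds while the median and mode stay put.

```python
import statistics

# Hypothetical, roughly symmetrical set of scores (n = 16).
scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # average of the two middle scores when n is even
mode = statistics.mode(scores)      # most frequent score
print(mean, median, mode)

# Same scores plus one extreme high score (hypothetical outlier).
skewed = scores + [40]
print(statistics.mean(skewed), statistics.median(skewed), statistics.mode(skewed))
```

In the second list the mean is pulled toward the extreme score, which is why the median is preferred for skewed distributions.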

Central Tendency and the Shape of Distributions


Normal distribution: mean, median, and mode are equal and smack-dab in the middle of the distribution.
Skewed distributions:
    not symmetrical
    mean, median, and mode are different
    extreme scores on one end of the distribution
    the mean is most affected by extreme scores, so it will be furthest out in the tail
Negatively skewed: extreme scores are on the low end of the distribution.
    [Figure: negatively skewed curve; from left to right: Mean, Median, Mode]

Page 11

Positively skewed: extreme scores are on the high end of the distribution.
    [Figure: positively skewed curve; from left to right: Mode, Median, Mean]

Variability
Variability: a measure that describes how spread out or close together the scores are within the distribution.
Range: the distance between the highest score and the lowest score in the distribution; the easiest measure of variability.
    range = high score - low score

[Figure: histogram of Distribution 1. x-axis: scores 1 to 11; y-axis: frequency, 0 to 7.]
Distribution 1: Range = 10, Mean = 6, Median = 6, Mode = 6, SD = 2.45

Page 12

Standard Deviation from the Mean: the most common measure of variability; the average distance of scores from the mean.

[Figure: frequency histograms for comparing variability, including Distribution #2. x-axis: scores 1 to 11; y-axis: frequency.]

Page 13

Page 14

Standard Deviation Activity


o Need 16 pieces of candy (M-n-Ms, Skittles, etc.)
o You must use all 16 pieces for each distribution.
o Use Distribution Graph from Blackboard Course Site (located in Course Documents)

Steps
1. For distribution A, create a normal distribution like Dr. Vannatta's with your candy. Trace the outline of the distribution.
Now on your own, complete the following:
2. For distribution B, move candy around to create a distribution that has greater variability than A. Trace the outline of the distribution.
3. For distribution C, move candy around to create a distribution that has less variability than A. Trace the outline of the distribution.
4. For distribution D, move candy around to create a distribution that has the least possible amount of variability.

Variability Key Concepts


Variability shows how spread out scores are in the distribution.
Range only takes into account the two extreme scores (highest and lowest).
Standard deviation compares all scores to the mean.
When scores are close to the mean, variability is less.
When scores are far from the mean (outliers, extreme ends of the distribution), variability is more.

Calculating Standard Deviation


standard deviation for population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$

standard deviation for sample: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

Degrees of freedom (df = n - 1): an adjustment for sample bias. To calculate the standard deviation, we must know the sample mean. This places a restriction on sample variability, since only (n - 1) scores are free to vary once we know the sample mean.

Page 15

Example for calculating the standard deviation for a sample


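X    $\bar{X}$   $X - \bar{X}$   $(X - \bar{X})^2$
2    5           -3              9
3    5           -2              4
3    5           -2              4
4    5           -1              1
4    5           -1              1
4    5           -1              1
5    5            0              0
5    5            0              0
5    5            0              0
5    5            0              0
6    5            1              1
6    5            1              1
6    5            1              1
7    5            2              4
7    5            2              4
8    5            3              9
                                 SS = 40

Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{SS}{n - 1} = \frac{40}{15} = 2.67$

Standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{SS}{n - 1}} = \sqrt{2.67} = 1.63$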
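The hand calculation can be checked step by step in Python. This sketch uses the 16 scores from the example table; the function name `sample_sd_steps` is illustrative only.

```python
import math

scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8]

def sample_sd_steps(data):
    """Return (mean, SS, variance, SD) for a sample, dividing by df = n - 1."""
    n = len(data)
    mean = sum(data) / n
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
    variance = ss / (n - 1)                  # SS / df
    sd = math.sqrt(variance)
    return mean, ss, variance, sd

mean, ss, variance, sd = sample_sd_steps(scores)
print(f"mean={mean}, SS={ss}, variance={variance:.2f}, SD={sd:.2f}")
```

Dividing by n instead of n - 1 would give the population formula, which produces a slightly smaller value for the same scores.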

Sum of squares: the sum of squared deviation scores, or sum of squared differences.
    $SS = \sum (X - \bar{X})^2$; also $SS = s^2(n - 1)$
Variance: the mean of squared deviation scores; the sum of squares divided by the number of scores minus 1.
    variance = $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Steps to Calculate Standard Deviation


1. Calculate the mean ($\bar{X}$).
2. Calculate the difference between each score and the mean ($X - \bar{X}$).
3. Square each difference: $(X - \bar{X})^2$.
4. Add the squared differences. This is the Sum of Squares: $SS = \sum (X - \bar{X})^2$.
5. Divide SS by the degrees of freedom (df = n - 1). This is the Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$.
6. Take the square root of the variance. This is the Standard Deviation: $SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$.

Page 16

Standard Deviation Calculation Practice


a. Calculate the standard deviation for the following data ($\bar{X} = 6$).

X     $X - \bar{X}$     $(X - \bar{X})^2$
2
2
8
8
10
                        SS =

b. Calculate the standard deviation for the following data ($\bar{X} = 6$). Notice the mean is the same, but three scores have been changed to 6.

X     $X - \bar{X}$     $(X - \bar{X})^2$
2
6
6
6
10
                        SS =

How does the change in data affect the SD? Why?

Characteristics of standard deviation
    a small standard deviation indicates that scores are close together
    a large standard deviation indicates that scores are spread out
    adding a constant to each score will not change the standard deviation
    multiplying each score by a constant causes the standard deviation to be multiplied by that same constant
    research articles usually use SD to refer to the standard deviation

Standard deviation and the normal distribution
    three standard deviations on each side of the mean
    [Figure: normal curve with the x-axis marked from -3 to +3 standard deviations around the mean]
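Two of the characteristics listed above (adding a constant, multiplying by a constant) are easy to confirm numerically. The scores and the constants 10 and 3 below are hypothetical, picked only for illustration.

```python
import math
import statistics

scores = [1, 3, 5, 7, 9]            # hypothetical sample
sd = statistics.stdev(scores)       # sample SD (divides by n - 1)

shifted = [x + 10 for x in scores]  # add a constant to every score
scaled = [x * 3 for x in scores]    # multiply every score by a constant

sd_shifted = statistics.stdev(shifted)
sd_scaled = statistics.stdev(scaled)
print(sd, sd_shifted, sd_scaled)
```

Shifting moves every score and the mean by the same amount, so the deviations from the mean do not change; scaling stretches every deviation by the constant.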

Video #3: In-Class Practice Problems


For the following sample of scores: 1, 2, 3, 3, 4, 4, 5, 5, 5, 5, 6, 6, 7, 7, 8, 9 a. Sketch a frequency distribution histogram.

Page 17

This data is slightly different from what is presented in the video so that a cleaner mean would be calculated.

b. Calculate the following:
   mean = ____________________
   median = ____________________
   mode = ____________________
   range = ____________________
   degrees of freedom = ____________________
   standard deviation = ____________________

c. From your calculations, identify the distribution type.

Video #4: Probability


Probability is used to:
    determine the types of samples we are likely to obtain from a population
    make conclusions about the population from the sample

Page 18

Probability: the fraction, proportion, or percent of selecting a specific outcome out of the total number of possible selections.

    probability of A = $\frac{\text{number of A's}}{\text{total number of possible outcomes}}$

Example: probability of selecting a heart out of a deck of cards
    $p(\text{heart}) = \frac{13}{52} = \frac{1}{4} = .25$, or 25%

Probability and the Normal Distribution


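A normal distribution holds 100% of the individuals in it.
The mean, median, and mode are all equal and divide the distribution in half: 50% of the distribution is above the mean and 50% is below.
When the percent is divided by the standard deviations, it looks like this:

Within 1 standard deviation of the mean: 68%
Within 2 standard deviations of the mean: 95%
Within 3 standard deviations of the mean: 99.7%

Region (in standard deviations)   Percent of scores
below -3                          .13%
-3 to -2                          2.14%
-2 to -1                          13.59%
-1 to mean                        34.13%
mean to +1                        34.13%
+1 to +2                          13.59%
+2 to +3                          2.14%
above +3                          .13%

Position   Percent of scores below
-3         0.13%
-2         2.28%
-1         15.87%
mean       50%
+1         84.13%
+2         97.72%
+3         99.87%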
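These percents come straight from the normal curve and can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8+). A small sketch follows; the helper names `percent_below` and `percent_within` are made up for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def percent_below(k):
    """Percent of scores falling below k standard deviations from the mean."""
    return standard_normal.cdf(k) * 100

def percent_within(k):
    """Percent of scores within k standard deviations on either side of the mean."""
    return (standard_normal.cdf(k) - standard_normal.cdf(-k)) * 100

for k in (1, 2, 3):
    print(f"within {k} SD: {percent_within(k):.2f}%   below +{k} SD: {percent_below(k):.2f}%")
```

The exact figures (about 68.27%, 95.45%, and 99.73%) are usually rounded to 68%, 95%, and 99.7%.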

Page 19


z Scores

Page 20

z score: a measure of relative position; identifies the position of a raw score in terms of the number of standard deviations it falls above or below the mean.
Use z scores to convert a raw score into a percentile rank.

    $z = \frac{X - \mu}{\sigma}$

Example: Jill gets a raw score of 55 on a standardized math test ($\mu = 50$, $\sigma = 10$). What is Jill's z score?

    $z = \frac{X - \mu}{\sigma} = \frac{55 - 50}{10} = \frac{5}{10} = .5$

So Jill is .5 standard deviation above the mean.

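[Figure: normal curve labeled in z scores from -3z to +3z, with 68%, 95%, and 99.7% of scores falling within 1z, 2z, and 3z of the mean.]

Percent of scores below each z score:
-3z     0.13%
-2z     2.28%
-1z     15.87%
mean    50%
+1z     84.13%
+2z     97.72%
+3z     99.87%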

View area under the normal curve in terms of probability and percent:
What is the probability of selecting a score that falls beyond 1z? p = .1587
What is the probability of selecting a score that falls below -2z? p = .0228
What is the percentile rank of someone who has a z score of 2? 98th %tile
What is the percentile rank of someone who has a z score of 1? 84th %tile
What if we have a z score of 1.2? How can we find the probability or percentile rank?
    We use the table of z scores provided in your course packet (see statistical tables on page 84).

Page 21

Putting it all together


Suppose Jack receives a raw score of 540 on the SAT-math ($\mu = 500$, $\sigma = 100$). What is Jack's z score and percentile rank?

    $z = \frac{540 - 500}{100} = .4$

    Proportion (p) = .6554
    Rank = 65.54 %tile

SAT-math distribution:
z score     -3z   -2z   -1z   0z    +1z   +2z   +3z
raw score   200   300   400   500   600   700   800
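The z table lookup can be mirrored in Python, where `NormalDist().cdf(z)` returns the proportion of the normal curve below z. This is a sketch using Jack's and Jill's numbers from the examples; the function names are illustrative.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x falls above (+) or below (-) the mean."""
    return (x - mu) / sigma

def percentile_rank(z):
    """Percent of the normal distribution falling below z."""
    return NormalDist().cdf(z) * 100

jack_z = z_score(540, mu=500, sigma=100)
jill_z = z_score(55, mu=50, sigma=10)
print(f"Jack: z = {jack_z:.2f}, percentile rank = {percentile_rank(jack_z):.2f}")
print(f"Jill: z = {jill_z:.2f}, percentile rank = {percentile_rank(jill_z):.2f}")
```

Because the printed table lists z scores to two decimal places, software results may differ from table lookups in the last digit.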

Use a z score to determine an unknown raw score

Suppose an individual scored at the 70th percentile on a standardized test ($\mu = 100$, $\sigma = 10$), but for some reason we don't know his raw score and need to calculate it.

1. Use the equation: raw score = $\mu + z\sigma$
2. Convert the percentile rank to a probability (example: 70% = .7000).
3. Use the z table to identify the z score associated with the probability. A probability of .7000 corresponds to a z score of z = .52. (Notice that we could not find a probability of exactly .7000 but had to find the probability closest to .7000, which was .6985.)

Now just plug z, $\mu$, and $\sigma$ into our equation:
    raw score = $\mu + z\sigma$
    raw score = 100 + .52(10)
    raw score = 105.2
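Going from a percentile back to a raw score is the reverse lookup, which `NormalDist().inv_cdf` performs without rounding to the nearest table entry. A minimal sketch follows; the function name `raw_score_from_percentile` is hypothetical.

```python
from statistics import NormalDist

def raw_score_from_percentile(percentile, mu, sigma):
    """raw score = mu + z * sigma, where z is the z score at the given percentile."""
    z = NormalDist().inv_cdf(percentile / 100)
    return mu + z * sigma

z_70 = NormalDist().inv_cdf(0.70)
raw = raw_score_from_percentile(70, mu=100, sigma=10)
print(f"z = {z_70:.4f}, raw score = {raw:.2f}")
```

The exact z score is about .5244 rather than the table's .52, so the computed raw score differs slightly from the hand answer of 105.2.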

Video #4: In-Class Practice Problems


For problems 1-4, apply the parameters ($\mu = 50$, $\sigma = 5$).
1. Draw the distribution. Include z scores, mean, and standard deviation.

Page 22

2. Bebe scored 48. Place Bebe's score on the distribution. What is her z score and percentile rank?

3. Kenny scored 63. Place Kenny's score on the distribution. What is his z score and percentile rank?

4. Sally is at the 71st percentile. Place Sally on the distribution. What is her z score and raw score?

Page 23

For problems 5-7, use the following parameters from the GRE ($\mu = 500$, $\sigma = 100$).
5. Mary scored 570. What is her z score and percentile rank?

6. Dick scored 340. What is his z-score and percentile rank?

7. Jill is at the 38th percentile. What is her raw score?

For problems 8-10, use the parameters from an IQ test ($\mu = 100$, $\sigma = 15$).
8. Wendy scored at the 90th percentile. What is her raw score?

9. What percent falls between the scores of 100 and 115?

10. Jack scored 80. What is his z score and percentile rank?

Answers for Class #4 In-Class Problems:


5) z=.7, percentile rank=75.8; 6) z=-1.6, percentile rank=5.48; 7) z=-.31, raw score = 469; 8) z=1.28, raw score = 119.2; 9) 34.13% fall between the mean and 1z; 10) z=-1.33, percentile rank = 9.2

Video #5: Distribution of Sample Means

Page 24

With statistics, we are usually trying to make conclusions/inferences about the population from the studied sample. Consequently, we want to compare the sample to the population of similar samples. But in doing so, two issues arise:
    How do we know if a sample is representative of the population when every sample is different?
    How can we transform a population distribution of individuals into a population distribution of sample means?

Every sample is different from the population. This is known as sampling error: the discrepancy/error between the sample and the population. Random sampling is used to minimize sampling error, which can occur randomly.

If we were to take a population distribution of individuals,
    randomly group the individuals into similar-sized samples,
    then calculate the means of these samples and place them into a frequency distribution,
    a normal curve would form. This distribution is known as the distribution of sample means.
Any distribution of sample statistics, and NOT individual scores, is referred to as a sampling distribution.

Characteristics of the distribution of sample means


The distribution will approach a normal distribution as sample size increases (with a sample size greater than 30, it is considered normal).
The mean of the distribution of sample means is equal to the population mean of individuals and is also known as the expected value of $\bar{X}$.
The standard deviation of this new distribution is called the standard error of $\bar{X}$.
Standard error ($\sigma_{\bar{X}}$): measures the standard distance between the sample mean ($\bar{X}$) and the population mean ($\mu$); indicates how good an estimate $\bar{X}$ will be for $\mu$.

    standard error: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

As sample size increases, the standard error will decrease -----> which means that the samples are more representative of the population.

Page 25

Probability and the Distribution of Sample Means


We can now use the distribution of sample means to find the probability of obtaining a specific sample mean from the population of samples.

Example: What is the probability of getting a sample mean of 515 or higher on the SAT-math ($\mu = 500$, $\sigma = 100$) with a random sample of n = 400?

Calculate the standard error for samples of n = 400.

    $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{400}} = \frac{100}{20} = 5$

Draw distribution of sample means

z score                    -3z   -2z   -1z   0z    +1z   +2z   +3z
pop of individuals         200   300   400   500   600   700   800
pop of samples (n=400)     485   490   495   500   505   510   515

A sample mean of 515 corresponds to +3z.
Using the z table, +3z corresponds to a probability of .0013 (.13%).

Page 26

What if the sample mean does not correspond to a whole z score?


Use $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$

Example: What is the probability of getting a sample mean of 104 or higher on an IQ test ($\mu = 100$, $\sigma = 15$) with a random sample of n = 36?

Calculate the standard error for samples of n = 36.

    $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{36}} = \frac{15}{6} = 2.5$

Draw distribution of sample means

z score                   -3z    -2z   -1z    0z    +1z     +2z   +3z
pop of individuals        55     70    85     100   115     130   145
pop of samples (n=36)     92.5   95    97.5   100   102.5   105   107.5

A sample mean of 104 corresponds to +1.6z:

    $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{104 - 100}{2.5} = \frac{4}{2.5} = 1.6$

Using the z table, +1.6z corresponds to a probability of .0548 (5.48%).
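Both worked examples (SAT-math with n = 400 and IQ with n = 36) follow the same two steps: compute the standard error, then convert the sample mean to a z score. A short sketch with illustrative function names:

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def prob_sample_mean_at_least(sample_mean, mu, sigma, n):
    """Probability of drawing a sample mean this high or higher."""
    z = (sample_mean - mu) / standard_error(sigma, n)
    return 1 - NormalDist().cdf(z)

p_sat = prob_sample_mean_at_least(515, mu=500, sigma=100, n=400)
p_iq = prob_sample_mean_at_least(104, mu=100, sigma=15, n=36)
print(f"SAT: SE = {standard_error(100, 400)}, p = {p_sat:.4f}")
print(f"IQ:  SE = {standard_error(15, 36)}, p = {p_iq:.4f}")
```

Calling `standard_error` with larger and larger n shows the standard error shrinking, which is the sense in which larger samples are more representative.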

Video #5: In-Class Practice Problems

Page 27

1. A normal population has $\mu = 70$ and $\sigma = 12$.
   a. Sketch the population distribution. What proportion of the scores have values greater than a score of X = 73?
   b. Sketch the distribution of sample means for samples of size n = 16. What proportion of the means have values greater than a mean of $\bar{X} = 73$?

[Blank distribution for sketching: z scores from -3z to +3z, with one row for the population of individuals and one row for the population of samples (n = 16).]

2. For a normal population with $\mu = 70$ and $\sigma = 20$, what is the probability of obtaining a sample mean greater than $\bar{X} = 75$
   a. For a random sample of n = 4?
   b. For a random sample of n = 16?
   c. For a random sample of n = 100?

[Blank distributions for sketching: z scores from -3z to +3z, with one row each for the population of samples with n = 4, n = 16, and n = 100.]

Video #6: Hypothesis Testing


Page 28

Hypothesis Testing: using sample data to evaluate a hypothesis (prediction) about the population so that conclusions/inferences can be made about the population from the sample.
    We are testing a hypothesis to determine if the treatment has caused a significant change in the population.
    The majority of sample means are in the middle of the distribution, so for a sample to be significantly different, it should be with the extreme means in the tails of the distribution, where the probability is very low.

Steps in Hypothesis Testing


1. State the hypotheses
2. Establish significance criteria
3. Collect and analyze data
4. Evaluate null hypothesis
5. Draw conclusion

Step 1: Stating the Hypotheses


Hypotheses should be stated in terms of the population.
Like a research question, your hypothesis should include three parts: variables, relationship, and sample.
Two hypotheses must be developed: an alternative and a null.
    Write the alternative hypothesis in statement form.
    Write notation for both the alternative and the null.
Alternative hypothesis: the actual prediction about the change or relationship that may occur in the population.
Null hypothesis: the statement that the treatment has no effect on the population.

Hypotheses can also be directional or non-directional.
    Non-directional: just a prediction of a change/effect. Key words: effect, impact, difference, cause.
    Directional: a prediction of increase or decrease. Key words: increase, decrease, higher, lower, positive, negative.
Summary of Hypotheses Notation (applying example values of $\mu = 60$)

Test                           Alternative                   Null
One-tailed (Directional)       $H_1: \mu_{sprog} > 60$       $H_0: \mu_{sprog} \le 60$
One-tailed (Directional)       $H_1: \mu_{sprog} < 60$       $H_0: \mu_{sprog} \ge 60$
Two-tailed (Non-directional)   $H_1: \mu_{sprog} \ne 60$     $H_0: \mu_{sprog} = 60$

Page 29

Example: Suppose that a local school district implemented an experimental program for science education. After one year, 100 children in the special program obtained a mean score of $\bar{X} = 63$ on a national science achievement test ($\mu = 60$, $\sigma = 12$). Did the program have an impact on the participants' science achievement?

Alternative: The science program will significantly affect science achievement among program participants. This is an example of a non-directional hypothesis.
    $H_1: \mu_{sprog} \ne 60$

Null: The science program will NOT significantly affect science achievement among program participants.
    $H_0: \mu_{sprog} = 60$

Step 2: Establish significance criteria


How much does the population need to change to show a significant effect from the treatment?
Is the change due to the treatment or to sampling error?
Typically, to be significantly different, we require the sample to be different from 95% or 99% of the population.

By setting a benchmark, or criterion, that requires the change in the population mean to be quite large and the probability of this change being due to chance to be very low, we decrease our chance of a Type I error.
    This criterion is known as the level of significance or alpha level ($\alpha$).
    The most commonly used alpha levels are .05 (5%) and .01 (1%).
    These levels of significance correspond with specific z scores, which depend upon whether the hypothesis is directional or non-directional.

Non-directional hypothesis ---> 2-tailed test
    .05 level -------> zcritical = ±1.96
    .01 level -------> zcritical = ±2.58
[Figure: normal curve from -3z to +3z; the middle 95% lies between -1.96z and +1.96z and the middle 99% lies between -2.58z and +2.58z.]

Directional hypothesis ---> 1-tailed test
    .05 level -------> zcritical = + or - 1.65
    .01 level -------> zcritical = + or - 2.33
[Figure: normal curve from -3z to +3z; 95% of the curve lies below +1.65z and 99% lies below +2.33z.]

When the sample mean exceeds the limit, it differs significantly, so we would reject the null.
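The critical z values listed above are rounded table values; they can be recovered from the alpha level with `NormalDist().inv_cdf`. In this sketch the helper name `critical_z` is hypothetical, and it returns the positive critical value only.

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Positive critical z for a 1-tailed or 2-tailed test at the given alpha level."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A 2-tailed test splits alpha evenly between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)

for tails in (2, 1):
    for alpha in (0.05, 0.01):
        print(f"{tails}-tailed, alpha = {alpha}: zcritical = {critical_z(alpha, tails):.3f}")
```

The unrounded values are roughly 1.960, 2.576, 1.645, and 2.326, which round to the 1.96, 2.58, 1.65, and 2.33 used in hand calculations.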

Page 30

Step 3: Collect & analyze sample data. Random selection is highly recommended so that the sample is representative of the population.

Recall that when a test statistic is calculated by hand, you need to identify the critical value (zcritical), which is then compared to the test statistic (zcalculated) to determine significance.
The computer automatically determines the probability of obtaining a test statistic due to chance. Consequently, when determining significance with computer output, you do NOT compare zcalculated to zcritical; rather, you examine the p-value, or level of significance.

If p (or sig) is less than the alpha level (.05 or .01) ---> the test statistic is significant ---> reject the null.
If p (or sig) is greater than the alpha level (.05 or .01) ---> the test statistic is NOT significant ---> fail to reject the null.

Decision-making Table

Method              Comparison                   Significance?    Decision?             Conclusion
Hand Calculations   zcalculated ≥ zcritical      Significance!    Reject Null           Restate Alternative
Hand Calculations   zcalculated < zcritical      Not!             Fail to Reject Null   Restate Null
Computer            p ≤ alpha                    Significance!    Reject Null           Restate Alternative
Computer            p > alpha                    Not!             Fail to Reject Null   Restate Null

Step 4: Evaluate the null hypothesis

Compare the data with the null.
    If the sample data is significantly different, then reject the null.
    If the sample data is NOT significantly different, then fail to reject the null.

Step 5: Draw conclusion

If the null is rejected, restate the alternative hypothesis as the conclusion.
If you fail to reject the null, state the null hypothesis as the conclusion.

Errors in Hypothesis Testing: Two types of errors are possible when testing a hypothesis.

Type I Error: we could make the mistake of rejecting the null when H0 is really true, that is, when there really isn't a significant change due to the treatment.
• this kind of error may be due to sampling error (the sample was above the population mean even before the treatment)
• minimize a Type I error by setting a low alpha (α) level (a low probability of making this error)
• Type I error is more serious!

Type II Error: we could make the mistake of not rejecting the null when we should have, that is, when there really is a significant change due to the treatment.
• the treatment effect was not big enough
• most likely due to sampling error (the sample was below the population mean even before the treatment)
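A small simulation can make the link between alpha and Type I error concrete. The sketch below assumes a population with μ = 60 and σ = 12 and samples of n = 100 drawn with NO treatment effect, so every rejection is a Type I error. The function name, the number of trials, and the random seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist


def type_one_error_rate(mu=60, sigma=12, n=100, alpha=0.05, trials=2000, seed=1):
    """Estimate how often a two-tailed z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)                          # fixed seed so the run is repeatable
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = .05
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw a sample from the untreated population (the null is true).
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) >= z_critical:
            rejections += 1                            # rejecting here is a Type I error
    return rejections / trials


rate = type_one_error_rate()
print(f"Proportion of Type I errors: {rate:.3f}")
```

The proportion of false rejections should land close to the chosen alpha level, which is why lowering alpha lowers the risk of a Type I error.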

Putting it all together: Example of a two-tailed test

Let's go back to our previous example of the science program: After one year, 100 children in the special program obtained a mean score of 63 on a national science achievement test (μ = 60, σ = 12). Did the program have an impact on the participants' science achievement? Test at the .05 level.

Step 1: Develop hypotheses

State Alternative: The special science program will significantly affect science achievement among program participants.
Determine if it is a one-tailed or two-tailed test. It is a non-directional hypothesis ------> two-tailed
Notation: H1: μsprog ≠ 60     H0: μsprog = 60

Step 2: Establish significance criteria
Computer: α = .05
Hand calculations: identify the z scores used for the alpha level and the appropriate test.
A two-tailed test at .05 corresponds to zcritical = ±1.96

Step 3: Collect and analyze sample data
Computer: enter and analyze data
Hand calculations:
Calculate standard error

σX̄ = σ/√n = 12/√100 = 12/10 = 1.2

Draw distribution of sample means and shade in critical region

[Figure: population of individuals (24, 36, 48, 60, 72, 84, 96 at -3z to +3z) and population of sample means for n = 100 (56.4, 57.6, 58.8, 60, 61.2, 62.4, 63.6 at -3z to +3z). The middle 95% lies between -1.96z and +1.96z; the critical region is the area beyond these two values.]

Step 4: Compare sample data to null

Computer
Identify test statistic and level of significance (p-value) in output: z = 2.49, p = .0128
Compare level of significance with alpha level: the p-value of .0128 is less than .05, so it is significant; reject the null

Hand calculations
Calculate test statistic: convert the sample mean into a z score to determine if it falls in the critical region.
z = (X̄ - μ)/σX̄ = (63 - 60)/1.2 = 3/1.2 = 2.5
It exceeds +1.96z, so it is significant; reject the null

Step 5: Draw conclusion
The null is rejected, so the alternative hypothesis is restated as the conclusion: Participation in the science program did significantly affect science achievement scores among program participants.
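The hand calculations for the two-tailed example can be checked with a short Python sketch. It uses only the standard library; the function name `z_test_two_tailed` is a hypothetical helper written for this example, not output from StatCrunch or any other package.

```python
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu, sigma, n):
    """Return (standard error, z, two-tailed p-value) for a single-sample z test."""
    standard_error = sigma / n ** 0.5               # sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error
    one_tail_area = 1 - NormalDist().cdf(abs(z))    # area beyond |z| in one tail
    p_value = 2 * one_tail_area                     # both tails count in a two-tailed test
    return standard_error, z, p_value


# Science program example: n = 100, sample mean = 63, mu = 60, sigma = 12
se, z, p = z_test_two_tailed(sample_mean=63, mu=60, sigma=12, n=100)
z_critical = NormalDist().inv_cdf(1 - 0.05 / 2)     # two-tailed critical value at .05

print(f"standard error = {se:.2f}, z = {z:.2f}, z critical = {z_critical:.2f}")
print("reject null" if p <= 0.05 else "fail to reject null")
```

Either comparison leads to the same decision: z is beyond the critical value, and p is below alpha.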

Example of a one-tailed test: Suppose we took the same example, but hypothesized that the program would cause a significant increase in achievement scores. This would be a directional hypothesis. In addition, let's change the level of significance to .01. Recall: n = 100, X̄ = 63, μ = 60, σ = 12

Step 1: Develop hypotheses

State alternative: The special science program will significantly increase science achievement scores among program participants.
Determine if it is a one-tailed or two-tailed test. It is a directional hypothesis ------> one-tailed
H1: μsprog > 60     H0: μsprog ≤ 60

Step 2: Establish significance criteria
Computer: α = .01
Hand calculations: identify the z scores used for the alpha level and the appropriate test.
A one-tailed test at .01 corresponds to zcritical = +2.33; since we are looking for an increase, we are focusing on the positive end of the distribution

Step 3: Collect and analyze sample data
Computer: enter and analyze data
Hand calculations:
Calculate standard error
σX̄ = σ/√n = 12/√100 = 12/10 = 1.2

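Draw distribution of sample means and shade in critical region

[Figure: population of individuals (24, 36, 48, 60, 72, 84, 96 at -3z to +3z) and sample means for n = 100 (56.4, 57.6, 58.8, 60, 61.2, 62.4, 63.6 at -3z to +3z). 99% of the distribution lies below +2.33z; the critical region is the area beyond +2.33z.]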

Step 4: Compare sample data to null

Computer
Identify test statistic and level of significance (p-value) in output: z = 2.49, p = .0064
Compare level of significance with alpha level: the p-value of .0064 is less than .01, so it is significant; reject the null

Hand calculations
Calculate test statistic: convert the sample mean to a z score to determine if it falls into the critical region.
z = (X̄ - μ)/σX̄ = (63 - 60)/1.2 = 3/1.2 = 2.5
It exceeds +2.33z, so it is significant; reject the null

Step 5: Draw conclusion
The null is rejected, so the alternative hypothesis is restated as the conclusion: Participation in the science program did significantly increase achievement scores among program participants.
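For the one-tailed version, only the upper tail of the distribution matters. The sketch below repeats the calculation under that assumption; the variable names are illustrative and the values are the ones recalled in the example (n = 100, sample mean 63, μ = 60, σ = 12, alpha = .01).

```python
from statistics import NormalDist

mu, sigma, n = 60, 12, 100
sample_mean = 63
alpha = 0.01

standard_error = sigma / n ** 0.5             # 12 / 10
z = (sample_mean - mu) / standard_error       # distance from mu in standard errors

# One-tailed (increase): all of alpha sits in the upper tail.
z_critical = NormalDist().inv_cdf(1 - alpha)  # about +2.33
p_value = 1 - NormalDist().cdf(z)             # area above z only

reject_null = z > z_critical
print(f"z = {z:.2f}, z critical = +{z_critical:.2f}, reject null: {reject_null}")
```

Because the hypothesis is directional, the p-value is the area in one tail only, which is half of the two-tailed p-value for the same z.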

Assumptions for Hypothesis Testing with z Scores

• random sampling and independent observations
• population standard deviation will remain the same after the treatment; it is like adding a constant: the mean changes but σ will not
• normal sampling distribution

Reporting of Results of the Statistical Test

The p-value is reported as:
• reject the null: p < .05
• fail to reject the null: p > .05
A z test results statement includes the following parts:
• sample mean; (M = 63)
• z calculated with the degrees of freedom in parentheses; (z(99) = 2.5)
  to calculate degrees of freedom (df): df = n - 1; in our example, n = 100, so df = n - 1 = 100 - 1 = 99
• alpha level; (p < .05)
• two-tailed or one-tailed
• population mean and SD (μ = 60, σ = 12)
Example from one-tailed test: Participation (M = 63) in the science program did significantly increase achievement scores; z(99) = 2.5, p < .05, one-tailed; when compared to the population (μ = 60, σ = 12).

Video #6: In-Class Practice Problems

Complete the process of hypothesis testing for each of the scenarios.

1. A high school counselor created a preparation course for the SAT-verbal (μ = 500, σ = 100). A random sample of n = 16 students completes the course and then takes the SAT. The sample had a mean score of X̄ = 554. Does the course have a significant effect on SAT scores? Test at the .01 level.

Z-test results:
μ - mean of Variable (Std. Dev. = 100)
H0 : μ = 500
HA : μ not equal 500

Variable   n    Sample Mean   Std. Err.   Z-Stat   P-value
var1       16   554           25          2.16     0.0308

a. Alternative hypothesis in sentence form.

b. Circle:   one-tailed   or   two-tailed

c. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:

d. zcalculated =

e. Level of significance (p) =

f. Circle:   reject null   or   fail to reject null

g. Write your conclusion in sentence form.

2. A researcher believes that children who grow up as an only child develop vocabulary skills at a faster rate than children in large families. To test this, a sample of n = 25 four-year-old only children is tested on a standardized vocabulary test (μ = 60, σ = 10). The sample obtains a mean of X̄ = 63.8. Test at the .05 level.

Z-test results:
μ - mean of Variable (Std. Dev. = 10)
H0 : μ = 10
HA : μ > 10

Variable   n    Sample Mean   Std. Err.   Z-Stat   P-value
var1       25   63.8          2           26.9     <0.0001

There was an error when conducting this test. The population mean is NOT 10 but rather 60. The result is still significant, but the z statistic would have been (63.8 - 60)/2 = 1.90 with p = .03.

a. Alternative hypothesis in sentence form.

b. Circle:   one-tailed   or   two-tailed

c. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:

d. zcalculated =

e. Level of significance (p) =

f. Circle:   reject null   or   fail to reject null

g. Write your conclusion in sentence form.

3. A psychologist investigates IQ among autistic children to determine if their IQ is significantly different from the norm. Using a standardized IQ test (μ = 100, σ = 10), he tests 10 autistic children, all age 12. The following output was generated using StatCrunch. Test at α = .05.
Sample data are: 105, 110, 130, 150, 185, 100, 125, 95, 85, 120

Z-test results:
μ - mean of Variable (Std. Dev. = 10)
H0 : μ = 100
HA : μ not equal 100

Variable   n    Sample Mean   Std. Err.   Z-Stat      P-value
var1       10   120.5         3.1622777   6.4826694   <0.0001

a. Alternative hypothesis in sentence form.

b. Circle:   one-tailed   or   two-tailed

c. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:

d. zcalculated =

e. Level of significance (p) =

f. Circle:   reject null   or   fail to reject null

g. Write your conclusion in sentence form.

Video #7: The t Statistic

To use the z score as a test statistic, we must know the population standard deviation (σ) in order to calculate the standard error of sample means. Unfortunately, most of the time we do not know σ, so what do we do?

The t statistic, commonly known as a t test, allows us to compare the sample to the null by using the sample standard deviation to estimate the standard error of sample means.
estimated standard error: sX̄ = s/√n

The t statistic uses a formula very similar to z but instead utilizes the estimated standard error.
t = (X̄ - μ)/sX̄          z = (X̄ - μ)/σX̄

Tip on when to use which: if you know σ, then use z; if you don't know σ, use t.
Since we are comparing a single sample mean to a population mean, this t test is called the Single Sample t Test or One Sample t Test.

The t Distribution

Since the t statistic utilizes the estimated standard error (sX̄), the t distribution only approximates the normal distribution and is based on degrees of freedom (df = n - 1), not the total sample size.
• as df and sample size increase, the more closely s represents σ, and the better the t distribution approximates the normal (z) distribution
• since the t distribution has more variability, it is more spread out and flatter
• we use the t statistic in a very similar way as we used z, in that we use a t distribution table to find the probability of a t statistic
• note: since the t statistic is dependent on degrees of freedom, the critical t statistics corresponding to levels of significance (α) vary with the degrees of freedom, unlike the critical z scores (where a two-tailed test at .05 will always correspond to zcritical = ±1.96)

Summary Table of Hypotheses Notation (applies values from following example)

              Alternative    Null
One-tailed    H1: μ > 27     H0: μ ≤ 27
Two-tailed    H1: μ ≠ 27     H0: μ = 27

Reporting of Results of the t Test
A t test results statement includes the following parts:
• results with sample mean and standard deviation; (M = 24.58, SD = 3.48)
• t calculated with the degrees of freedom in parentheses; (t(11) = -2.40)
• alpha level or p-value; (p < .05)
• two-tailed or one-tailed

Example: Subjects (M = 24.58, SD = 3.48) spent significantly less time talking to parents than the therapist claims; t(11) = -2.40, p < .05, two-tailed.

Assumptions of the t test: independent observations, normal population

Putting it all together: Example of a two-tailed t test

A family therapist states that parents talk to their teens an average of 27 minutes per week. Surprised by this claim, a counselor collects data on 12 teens and finds the following (X̄ = 24.58, s = 3.48). Does the amount of parent talk for the sample significantly differ from the therapist's claim? Test at the .05 level.

Step 1: Develop hypotheses
State Alternative: Amount of parent talk for sample will significantly differ from the norm.
Determine if it is a one-tailed or two-tailed test. It is a non-directional hypothesis ------> two-tailed
H1: μ ≠ 27 (samples will be different)
H0: μ = 27 (samples will NOT be different)

Step 2: Establish significance criteria
Computer: α = .05
Hand calculations: identify tcritical used for the alpha level, the appropriate test, & df
A two-tailed test at .05 (df = 11) corresponds to tcritical = ±2.201

Step 3: Collect and analyze sample data
Computer: enter and analyze data
Hand calculations:
Calculate estimated standard error
sX̄ = s/√n = 3.48/√12 = 3.48/3.46 = 1.01

Step 4: Compare sample data to null ------> calculate test statistic

Computer
Identify test statistic and p-value in output
o t(11) = -2.396, p = .019
o p-value (.019) is less than alpha (.05), so it is significant; reject null

Hand Calculations
Convert the sample mean into a t statistic to determine if it falls into the critical region.
tcalculated = (X̄ - μ)/sX̄ = (24.58 - 27)/1.01 = -2.42/1.01 = -2.396
It falls beyond -2.201, so it is significant; reject null

Step 5: Draw conclusion
Amount of parent talk for sample (M = 24.58, SD = 3.48) significantly differs from the norm; t(11) = -2.396, p < .05, two-tailed.
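The single sample t calculation follows the same pattern as z, with s in place of σ. The sketch below is one way to check the arithmetic in Python; `one_sample_t` is a hypothetical helper name, and the critical value 2.201 is taken from the t table as quoted in the example rather than computed. Because the code does not round the standard error to 1.01, its t value can differ slightly from the hand calculation in the second decimal place.

```python
def one_sample_t(sample_mean, mu, s, n):
    """Return (estimated standard error, t, df) for a single sample t test."""
    estimated_se = s / n ** 0.5            # s / sqrt(n), using the SAMPLE standard deviation
    t = (sample_mean - mu) / estimated_se
    df = n - 1
    return estimated_se, t, df


# Parent talk example: sample mean = 24.58, s = 3.48, n = 12, claimed mu = 27
se, t, df = one_sample_t(sample_mean=24.58, mu=27, s=3.48, n=12)

t_critical = 2.201                         # two-tailed, alpha = .05, df = 11 (from the t table)
reject_null = abs(t) >= t_critical         # two-tailed: compare the size of t, ignoring its sign

print(f"estimated SE = {se:.2f}, t({df}) = {t:.2f}, reject null: {reject_null}")
```

Unlike the z critical value, `t_critical` has to be looked up again whenever the sample size (and therefore df) changes.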

Video #7: In-Class Practice Problems

1. On a standardized spatial skills task, normative data reveal that people typically get μ = 15 correct solutions. A psychologist tests n = 7 individuals who have brain injuries in the right cerebral hemisphere. For the following data, determine whether or not right-hemisphere damage results in reduced performance on the spatial skills task. Test at the .05 level.
Data: 12, 16, 9, 8, 10, 17, 10

T-test results:
μ - mean of Variable
H0 : μ = 15
HA : μ < 15

Variable   Sample Mean   Std. Err.   DF   T-Stat       P-value
var1       11.714286     1.3222327   6    -2.4849744   0.0237

a. Independent Variable =            Scale (circle):   Categorical   Quantitative
b. Dependent Variable =              Scale (circle):   Categorical   Quantitative
c. Circle:   One-tailed   Two-tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:
f. tcalculated =
g. Level of significance (p) =

h. Circle:   reject null   or   fail to reject null

i. Write your conclusion in sentence form.

2. A researcher would like to examine the effects of humidity on eating behavior. It is known that laboratory rats normally eat an average of μ = 21 grams of food each day. The researcher selects a random sample of n = 25 rats and places them in a controlled-atmosphere room where the relative humidity is maintained at 90%. On the basis of this sample, can the researcher conclude that humidity affects eating behavior? Test at the .05 level.

T-test results:
μ - mean of Variable
H0 : μ = 21
HA : μ not equal 21

Variable   Sample Mean   Std. Err.    DF   T-Stat       P-value
var1       16.12         0.79229623   24   -6.1593122   <0.0001

a. Independent Variable =            Scale (circle):   Categorical   Quantitative
b. Dependent Variable =              Scale (circle):   Categorical   Quantitative
c. Circle:   One-tailed   Two-tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:
f. tcalculated =
g. Level of significance (p) =

h. Circle:   reject null   or   fail to reject null

i. Write your conclusion in sentence form.

3. Does the average age of students enrolled in EDFI 641 differ significantly from the average age of BGSU grad students (24 years)? Test at the .01 level.

T-test results:
μ - mean of Variable
H0 : μ = 24
HA : μ not equal 24

Variable   Sample Mean   Std. Err.   DF   T-Stat      P-value
var1       27.125        1.4314183   15   2.1831493   0.0453

a. Independent Variable =            Scale (circle):   Categorical   Quantitative
b. Dependent Variable =              Scale (circle):   Categorical   Quantitative
c. Circle:   One-tailed   Two-tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:
f. tcalculated =
g. Level of significance (p) =

h. Circle:   reject null   or   fail to reject null

i. Write your conclusion in sentence form.

Video #8: t Test of Independent Samples

So far, we have only used one sample to draw inferences about one population. What if we want to compare two different groups, such as male vs female or Treatment A students vs Treatment B students?

The t Test of Independent Samples draws conclusions about two populations by comparing two samples; since we are looking at differences between the two samples and the two populations, the t statistic reflects these multiple comparisons.

tsingle sample = (X̄ - μ)/sX̄
tind samples = [(X̄1 - X̄2) - (μ1 - μ2)] / sX̄1-X̄2,   where sX̄1-X̄2 = √(sp²/n1 + sp²/n2)

Recall that for the single sample t test, we calculated the estimated standard error. Since we are now comparing two samples to two populations, we calculate the standard error of sample mean differences.

Standard error of sample mean differences: the total amount of error involved in using two sample means to approximate two population means (averages the error of the two sources). However, the preceding formula for sX̄1-X̄2 is only appropriate when the two samples are the same size. To correct for the bias in sample variances, we need to combine the two sample variances into a single value called pooled variance.

Pooled Variance: averages the two sample variances, which allows the bigger sample to carry more weight.
pooled variance = sp² = (SS1 + SS2)/(df1 + df2)

Using the pooled variance, we can now calculate an unbiased measure of the standard error of sample mean differences:
sX̄1-X̄2 = √(sp²/n1 + sp²/n2)

Hypothesis Testing with t Test of Independent Samples
• The t Test of Independent Samples is used to test a hypothesis about the mean difference between two populations
• the null hypothesis reflects no difference
• the alternative hypothesis reflects a difference

              Alternative                              Null
One-tailed    H1: μ1 > μ2  OR  H1: μ1 - μ2 > 0         H0: μ1 ≤ μ2  OR  H0: μ1 - μ2 ≤ 0
Two-tailed    H1: μ1 ≠ μ2  OR  H1: μ1 - μ2 ≠ 0         H0: μ1 = μ2  OR  H0: μ1 - μ2 = 0

• rejection of null ------> data indicate a significant difference between the two populations
• failure to reject null ------> data indicate NO significant difference between the two populations

Assumptions about t test of independent samples: independent observations; each population must be normal and have equal variances (homogeneity of variance).

Putting it all together: Example of a one-tailed t test

A psychologist would like to examine the effects of fatigue on mental alertness. An attention test is prepared that requires subjects to sit in front of a blank TV screen and press a response button each time a dot appears on the screen. A total of 110 dots are presented during a 90-minute period, and the psychologist records the number of errors for each subject. Two groups of subjects are selected. The first group (n = 5) is tested after they have been awake for 24 hours (X̄ = 34, SS = 63). The second group (n = 10) is tested in the morning after a full night's sleep (X̄ = 24, SS = 100). Can the psychologist conclude that fatigue significantly increases errors on an attention task? Test at the .05 level.

Step 1: Develop hypotheses
State alternative: Fatigue will significantly increase the number of errors on an attention task.
It is a directional hypothesis ------> one-tailed
H1: μfatigue > μrested     H0: μfatigue ≤ μrested

Step 2: Establish significance criteria
Computer: α = .05
Hand calculations: identify tcritical used for the alpha level, the appropriate test, and df
A one-tailed test at .05 (df = 13) corresponds to tcritical = +1.771

Step 3: Collect and analyze sample data
Computer: enter and analyze data
Hand calculations:
Calculate pooled variance
sp² = (SS1 + SS2)/(df1 + df2) = (63 + 100)/(4 + 9) = 163/13 = 12.54
Calculate standard error of sample mean differences
sX̄1-X̄2 = √(sp²/n1 + sp²/n2) = √(12.54/5 + 12.54/10) = √(2.51 + 1.25) = 1.94

Step 4: Compare sample data to null ------> calculate test statistic

Computer: review output

Two Sample T-test results (with pooled variances):
μ1 - mean of var2 where var1=1
μ2 - mean of var2 where var1=2
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 > 0

Difference   Sample Mean   Std. Err.   DF   T-Stat      P-value
μ1 - μ2      10            1.9360149   13   5.1652493   <0.0001

Identify test statistic and p-value in output: t(13) = 5.17, p < .0001
Compare p-value to alpha level: p is less than .05; reject null

Hand calculations
Calculate t
tind samples = [(X̄1 - X̄2) - (μ1 - μ2)] / sX̄1-X̄2 = [(34 - 24) - 0]/1.94 = 10/1.94 = 5.15
tcalculated > tcritical, reject null

Step 5: Draw conclusion
The null is rejected, so the alternative hypothesis is restated as the conclusion: Fatigue significantly increased the number of errors on the attention task; t(13) = 5.17, p < .0001, one-tailed.
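The pooled-variance steps of the fatigue example can be traced in Python from the summary values alone (means, SS, and n for each group). This is a sketch for checking the hand calculations; `pooled_t` is a hypothetical helper name, and the critical value 1.771 is the tabled value quoted in Step 2. Since the code works from rounded summary values, its result matches the hand calculation more closely than the computer output, which was generated from the raw scores.

```python
def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """Independent samples t test from summary statistics (pooled variance)."""
    df1, df2 = n1 - 1, n2 - 1
    pooled_variance = (ss1 + ss2) / (df1 + df2)
    # Standard error of the difference between the two sample means
    standard_error = (pooled_variance / n1 + pooled_variance / n2) ** 0.5
    # Under the null, the population mean difference (mu1 - mu2) is 0
    t = ((mean1 - mean2) - 0) / standard_error
    return pooled_variance, standard_error, t, df1 + df2


# Group 1: awake 24 hours (n = 5, mean = 34, SS = 63)
# Group 2: full night's sleep (n = 10, mean = 24, SS = 100)
sp2, se, t, df = pooled_t(34, 63, 5, 24, 100, 10)

t_critical = 1.771                      # one-tailed, alpha = .05, df = 13 (from the t table)
reject_null = t > t_critical            # directional test: only the positive tail counts

print(f"pooled variance = {sp2:.2f}, SE = {se:.2f}, t({df}) = {t:.2f}")
print("reject null" if reject_null else "fail to reject null")
```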

Some additional thoughts when comparing groups:

• Create frequency polygons for each group to decide which measure of central tendency is appropriate and whether the groups follow a normal distribution
• If possible, use information about known groups, such as norms from standardized tests, to compare sample data
• Calculate effect size as a measure of the magnitude of a difference between the two groups. This has become very important in recent years.
  A t test will not calculate effect size. You must calculate it by hand.
  o A common index of effect size is r² (percentage of variance accounted for):
    effect size (r²) = t² / (t² + df)
  Typically an effect size of 0.50 (50%) or larger signifies an important difference
• Use inferential statistics very cautiously, especially when dealing with non-random samples; be very careful in generalizing your results to the population
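Since effect size has to be calculated by hand from the t test output, a one-line function is a convenient check. The sketch below applies the r² formula to the fatigue example (t = 5.17, df = 13); the function name `effect_size_r2` is chosen for illustration only.

```python
def effect_size_r2(t: float, df: int) -> float:
    """Percentage of variance accounted for: r^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)


# Fatigue example: t(13) = 5.17
r2 = effect_size_r2(t=5.17, df=13)
print(f"r squared = {r2:.2f}")   # proportion of variance accounted for
```

Because t is squared, the sign of t does not matter: a negative t statistic gives the same effect size as a positive one of the same size.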

In-Class Practice Problems

1. Extensive data indicate that first-born children develop different characteristics than later-born children. For example, first-borns tend to be more responsible, hard working, higher achieving, and more self-disciplined than their later-born siblings. The following data represent scores on a test measuring self-esteem and pride. Samples of n = 10 first-born college freshmen and n = 20 later-born freshmen were each given the self-esteem test. Do these data indicate a significant difference? Test at the .05 level.

Summary statistics for var2 grouped by var1
var1   n    Mean   Variance    Std. Dev.   Std. Err.   Median   Range   Min   Max   Q1   Q3
1      10   43.1   17.211111   4.1486278   1.3119112   43.5     14      36    50    40   46
2      20   36.8   25.010527   5.0010524   1.1182693   36.5     18      30    48    33   40

Two Sample T-test results (with pooled variances):
μ1 - mean of var2 where var1=1
μ2 - mean of var2 where var1=2
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 not equal 0

Difference   Sample Mean   Std. Err.   DF   T-Stat      P-value
μ1 - μ2      6.3           1.8372631   28   3.4290135   0.0019

a. Independent Variable =            Scale (circle):   Categorical   Quantitative
b. Dependent Variable =              Scale (circle):   Categorical   Quantitative
c. Circle:   One-tailed   Two-tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle:   reject null   or   fail to reject null
i. Write your conclusion in sentence form.
j. effect size r² =
2. Does level of anxiety (measured on a scale from 1 to 10) when enrolling in a statistics class differ by gender? Test at the .05 level.

Summary statistics for var2 grouped by var1
var1   n    Mean   Variance   Std. Dev.   Std. Err.   Median   Range   Min   Max   Q1   Q3
1      10   7.1    8.1        2.8460498   0.9         8        7       3     10    4    10
2      10   5.6    6.711111   2.5905812   0.8192137   5        7       3     10    4    7

Two Sample T-test results (with pooled variances):
μ1 - mean of var2 where var1=1
μ2 - mean of var2 where var1=2
H0 : μ1 - μ2 = 0
HA : μ1 - μ2 not equal 0

Difference   Sample Mean   Std. Err.   DF   T-Stat      P-value
μ1 - μ2      1.5           1.2170091   18   1.2325299   0.2336

a. Independent Variable =            Scale (circle):   Categorical   Quantitative
b. Dependent Variable =              Scale (circle):   Categorical   Quantitative
c. Circle:   One-tailed   Two-tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle:   reject null   or   fail to reject null
i. Write your conclusion in sentence form.
j. effect size r² =

Additional Practice: Interpreting Research Articles: t-test of Independent Samples

Read the following excerpt to complete the questions below:
Researchers studied women enlisted in the Navy and examined the impact of sexual harassment on their satisfaction with the military. Among the participants, 436 were sexually harassed and 582 were not. Participants completed a 7-item questionnaire that utilized a 5-point scale in which higher scores indicate more positive perceptions. Item 3 scores have been reversed to align with the positive nature of the other items.

Table 1. Mean responses and t-test results
Question                                                                      Mean Harassed   Mean Not Harassed   t
1. I would recommend the Navy to others.                                      3.31            3.60                3.76*
2. I am satisfied with my rating.                                             3.24            3.56                4.02*
3. I plan to leave the Navy because I am dissatisfied.                        3.17            3.67                5.89*
4. My experiences have encouraged me to stay in the Navy.                     2.24            2.58                4.56*
5. This command provides the information people need to make
   decisions about staying in the Navy.                                       2.71            3.00                3.80*
6. In general, I am satisfied with the Navy.                                  3.29            3.68                5.41*
7. I intend to stay in the Navy for at least 20 years.                        2.66            3.22                5.63*
* indicates p<.001
Source: Newell, C.E., Rosenfeld, P., & Culbertson, A. L. (1995). Sexual harassment experiences and equal opportunity perceptions of Navy women. Sex Roles, 32, 159-168.

1. Which group of Navy women is more likely to recommend the Navy to others? In other words, which group has the higher mean for item one?

2. Is the mean difference for item 1 statistically significant?

3. Should we reject the null hypothesis for item 1? Explain.

4. How many items generated statistically significant mean differences?

5. In general, what can we conclude about sexual harassment and Navy satisfaction?

Answers: 1) Those who have NOT been sexually harassed have the higher mean and are more likely to recommend the Navy to others. 2) Yes, it is significant at the p<.001 level. 3) Yes, the t result is significant at p<.001. 4) All items were significant. 5) Navy women who have NOT been sexually harassed are more satisfied with the Navy than those who have been sexually harassed.

Video #9: t Test of Related Samples

Many times research evaluates the effect of a treatment by using a pretreatment and post-treatment design with a single sample; this is called a repeated measures study.
• since the test uses the same sample, there is no risk that one group is different from another even before the treatment begins.
• researchers try to build upon this concept when studying two samples by matching subjects from the two groups; this helps to eliminate pretreatment differences
• the t test of related samples compares the differences between the pre and post treatment scores of the sample to pre-post differences in the population.
difference score = D = X2 - X1
mean of differences = D̄ = ΣD/n

Computing the t of related samples
Recall tsingle sample = (X̄ - μ)/sX̄
For t of related samples, the sample data are the difference scores (D) and the population value we are interested in is NOT the population mean but the population mean difference (μD); therefore,
trelated samples = (D̄ - μD)/sD̄,   where sD̄ = s/√n

We are not comparing means of the pre and post; rather, the pre and post scores for each individual are compared!

Developing the hypotheses:

              Alternative    Null
One-tailed    H1: μD > 0     H0: μD ≤ 0
Two-tailed    H1: μD ≠ 0     H0: μD = 0

Assumptions of the related samples t test: independent observations, normal distribution of the population of differences

Putting it all together: Example of a one-tailed t test

A researcher is interested in studying the effects of endorphins (the feeling-good chemical that is released in the brain at the end of aerobic exercise) on pain tolerance. A sample of 16 subjects is obtained; each person's tolerance for pain is tested before and after a 50-minute session of aerobic exercise. On average, the pain tolerance for the sample was D̄ = 10.5 higher after exercise than it was before. The SS for the sample difference scores was SS = 960. Do these data indicate a significant increase in pain tolerance following exercise? Test at the .01 level.

Step 1: Develop hypotheses
State alternative: Exercise will significantly increase pain tolerance.
It is a directional hypothesis ------> one-tailed
H1: μD > 0     H0: μD ≤ 0

Step 2: Establish significance criteria
Computer: α = .01
Hand calculations: identify tcritical used for the alpha level, the appropriate test, and df
A one-tailed test at .01 (df = 15) corresponds to tcritical = +2.602

Step 3: Collect and analyze sample data
Computer: enter and analyze data
Hand calculations:
Calculate sample mean of D: D̄ = 10.5
Calculate standard deviation of D scores
s = √(SS/(n - 1)) = √(960/15) = √64 = 8
Calculate estimated standard error of D̄
sD̄ = s/√n = 8/√16 = 8/4 = 2

Step 4: Compare sample data to null ------> calculate test statistic
Computer
Identify test statistic and p-value: t(15) = 5.25, p < .001
Compare p-value with alpha level: .001 is less than .01; reject null
Hand calculations
Calculate t
trelated samples = (D̄ - μD)/sD̄ = (10.5 - 0)/2 = 5.25
It exceeds tcritical; reject null

Step 5: Draw conclusion
Aerobic exercise significantly increased pain tolerance; t(15) = 5.25, p < .001, one-tailed.
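The related samples calculation can be sketched the same way, starting from the summary values for the difference scores (mean difference, SS, and n). The helper name `related_samples_t` is hypothetical, and the critical value 2.602 is the tabled value quoted in Step 2.

```python
def related_samples_t(mean_diff, ss_diff, n, mu_diff=0.0):
    """Related samples t test from the mean and SS of the difference scores."""
    df = n - 1
    s = (ss_diff / df) ** 0.5               # standard deviation of the D scores
    estimated_se = s / n ** 0.5             # estimated standard error of the mean difference
    t = (mean_diff - mu_diff) / estimated_se
    return s, estimated_se, t, df


# Endorphin example: mean difference = 10.5, SS = 960, n = 16
s, se, t, df = related_samples_t(mean_diff=10.5, ss_diff=960, n=16)

t_critical = 2.602                          # one-tailed, alpha = .01, df = 15 (from the t table)
reject_null = t > t_critical

print(f"s = {s:.0f}, SE = {se:.0f}, t({df}) = {t:.2f}, reject null: {reject_null}")
```

The only inputs are the difference scores, which reflects the point above: each person's pre and post scores are compared, not the two group means.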

In-Class Practice Problems

1. An investigator for NASA examines the effect of cabin temperature on reaction time. A random sample of 10 astronauts and pilots is selected. Each person's reaction time to an emergency light is measured in a simulator where the cabin temperature is maintained at 70 degrees F and again the next day at 95 degrees. Using the results of this experiment, can the investigator conclude that temperature has a significant effect on reaction time? Test at the .01 level.

Summary statistics
Column   n    Mean   Variance    Std. Dev.   Std. Err.   Median   Range   Min   Max   Q1    Q3
var1     10   203    381.55554   19.533447   6.177018    205.5    55      176   231   183   216
var2     10   223    417.1111    20.423298   6.458414    224      65      190   255   206   240

Paired T-test results:
μD - mean of the differences between var1 and var2
H0 : μD = 0
HA : μD not equal 0

Difference    Sample Diff.   Std. Err.   DF   T-Stat       P-value
var1 - var2   -20            1.67332     9    -11.952286   <0.0001

a. Independent Variable =            Scale (circle):   Categorical   Quantitative
b. Dependent Variable =              Scale (circle):   Categorical   Quantitative
c. Circle:   One-tailed   Two-tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle:   reject null   or   fail to reject null

i. Write your conclusion in sentence form.

2. Does eating oatmeal decrease cholesterol levels? A researcher implements a 30-day treatment that consists of eating a bowl of oatmeal every day for breakfast. Cholesterol is measured before (var1) and after (var2) the treatment for the 10 participants. An α = .05 was utilized.

Summary statistics
Column   n    Mean    Variance    Std. Dev.   Std. Err.   Median   Range   Min   Max   Q1    Q3
var1     10   258.2   192.4       13.870832   4.3863425   257.5    40      240   280   245   270
var2     10   222     269.33334   16.411379   5.1897335   221      56      190   246   210   230

Paired T-test results:
μD - mean of differences between var1 and var2
H0 : μD = 0
HA : μD > 0

Difference    Sample Diff.   Std. Err.   DF   T-Stat     P-value
var1 - var2   36.2           4.319979    9    8.379669   <0.0001

a. Independent Variable =            Scale (circle):   Categorical   Quantitative
b. Dependent Variable =              Scale (circle):   Categorical   Quantitative
c. Circle:   One-tailed   Two-tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H1:                    H0:
f. tcalculated =
g. Level of significance (p) =
h. Circle:   reject null   or   fail to reject null

i. Write your conclusion in sentence form.

Additional Practice: Interpreting Research Articles: t-test of Related Samples

Read the following excerpt to complete the questions below:
Seventy-four drug users participated in a Behavioral Counseling Program to reduce drug use. Among the participants, 75% were male, 75% were adults, 12% were minority, and 25% were mandated to obtain counseling by a public agency. With respect to drug use, about 50% used cocaine and 75% used marijuana. The Behavioral Counseling Program consisted of three parts: 1) stimulus control, including competing response training; 2) urge control procedure for interrupting incipient drug use urges, thoughts, and actions; and 3) behavior contracting, especially between youth and parents. Drug use was measured at the beginning of treatment, the end of treatment, and one month after treatment. Drug use decreased substantially from pretreatment to the end of treatment (t=4.28, p<.001) with a slight, nonsignificant decrease from end of treatment to the follow-up month (t=.92, p=.72). The decrease from pretreatment to follow-up remained statistically significant (t=4.42, p<.001).
Source: Azrin, N. H., Acierno, R., Kogan, E. S., Donohue, B., Besalel, V. A., & McMahon, P.T. (1996). Follow-up results of supportive versus behavioral therapy for illicit drug use. Behavior Research and Therapy, 34, 41-46.

1. As is customary in journal articles, the researchers did not state the null hypothesis. Write the appropriate null hypothesis for the first t-test result reported in the excerpt.

2. Should the null hypothesis written for item 1 be rejected? Explain.

3. Should the null hypothesis be rejected for the second t test reported in the excerpt? Explain.

4. The last difference in the excerpt was statistically significant at the .001 level. Was it also significant at the .05 level?

Answers: 1) The Behavioral Counseling Program will NOT significantly reduce drug use among participants. 2) Yes, since the p-value is less than .05. 3) No, the p-value is greater than .05. 4) Yes, if it is significant at p<.001 then it is also significant at p<.05.

Coke vs. Pepsi Experiment: t tests


We are going to conduct an experiment using the Coke vs. Pepsi Taste Test that investigates two research questions:

Page 55

1) Are diet drinkers (when compared to regular drinkers) more accurate in tasting the difference between Coke and Pepsi?

This question will utilize a t-test of independent samples, which you can complete for 5 points of extra credit (Extra Credit #1).

2) When tasting the difference between Coke and Pepsi, is one's prediction of accuracy significantly different from one's actual ability/accuracy?

This question will utilize a t-test of related samples, which you can complete for 5 points of extra credit (Extra Credit #2).

In order to complete this experiment, you need at least one other person (who has the same pop preference as you) to participate. It would be great if you can find 2-4 more individuals.

Directions:

1. Identify your pop preference (Diet or Regular). If you prefer diet pop, purchase one can/bottle of Diet Coke and one of Diet Pepsi. If you prefer regular, purchase one can/bottle of Coke and one of Pepsi.
2. In addition to the pop, you will need the following supplies to complete this experiment:
   - 5 small paper cups for each participant
   - Pen or pencil
   - Napkins in case you spill
   - Pretzels or chips for cleansing one's palate
3. Once you have your supplies and participants together, record each participant's name in the first column of the data grid below and each participant's preference (diet=1, regular=2) in the second column.

Data Grid
Name Preference Prediction % Actual %

4. Have each participant predict how accurate they will be in identifying the pop as Coke or Pepsi. Since each person will be given 5 cups of pop, predict how many times out of 5 chances you will be correct in the identification process (e.g., 3/5). Then, convert that fraction into a percent (e.g., 3/5 = 60%). Record this percent in the third column of the grid.
5. Determine who will complete the taste test first. Have that person turn away while another participant fills 5 cups with pop (make sure that some cups have Pepsi and other cups have Coke and that you know which cups have which pop). Hint: Don't write the name of the pop on the bottom of the cup; it will show through as the person drinks the pop.
6. Have the taste tester proceed in identifying the pop in each cup, while another participant records the accuracy. Don't tell the results to the taster until all 5 cups have been tasted. Calculate the number of correct tastes out of five. Convert that fraction into a percent and record the percent in column 4 of the grid.
7. Once you and your fellow participants have finished the taste test, add your results to the spreadsheet below.
8. Go to StatCrunch and enter ALL the data from the spreadsheet (including the data provided for 15 individuals). You should have a minimum of n=17 for your sample. Proceed with the t-test directions.

Extra Credit Worksheets are in Computer Lab Packet!

Video #10: Analysis of Variance

Page 57

Analysis of Variance (ANOVA) is a hypothesis testing procedure that evaluates mean differences between two or more treatments or groups; a t test can only compare two groups.

Single Factor Design: studies the effect that one factor (independent variable) has on the dependent variable. Note that although there is only one factor, this factor has two or more categories, so we are comparing two or more groups/treatments.

Hypothesis Testing for ANOVA
- Null hypothesis states that there is no difference among the groups or treatments: $H_0: \mu_1 = \mu_2 = \mu_3$
- Alternative hypothesis states that at least one mean is different from the others: $H_1$: At least one mean will differ

ANOVA Test Statistic
ANOVA creates a test statistic called an F-ratio that is similar to the t statistic.

Recall that

$$t = \frac{\text{obtained difference between sample means}}{\text{difference expected by chance (error)}}, \qquad t_{single} = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$

F is similar to t, but since there are more than two means to compare, variance will be used to represent the differences between all the means being compared.

$$F = \frac{\text{variance (differences) between sample means}}{\text{variance (differences) expected by chance (error)}}$$

Like t, a large F value indicates a treatment effect (mean differences) that is unlikely to be due to chance. When the treatment has no effect, so that the means are the same ($H_0$ is true), the F-ratio will be close to 1.00.

Distribution of F-ratios
- Like t, F has its own sampling distribution. But the F distribution is not normal; it is positively skewed, and its exact shape depends upon the degrees of freedom of the two variances.
  - large df -------> nearly all F-ratios are clustered around 1.00
  - small df -------> the F-ratios are more spread out
- Since the F distribution is positively skewed, we are only looking in one tail for the difference. As a result, we don't need to indicate if the test is one or two tailed.
- Recall: we expect F near 1.00 if the null is true and expect a large F if the null is false; therefore, significant F-ratios will be in the tail of the F distribution.

Page 58

$$F = \frac{\text{variance (differences) between group means}}{\text{variance (differences) expected by chance/error (within groups)}}$$

Variance (differences) between groups can be due to:
- treatment effect
- individual differences (subjects within the various groups are different even before the treatment begins)
- experimental error (caused by poor equipment, lack of attention/knowledge on the researcher's part, unpredictable change of events)

Variance within groups can be due to:
- individual differences (subjects within the various groups are different even before the treatment begins)
- experimental error (caused by poor equipment, lack of attention/knowledge on the researcher's part, unpredictable change of events)

Consequently, when we divide the variance between treatments by the variance within treatments, individual differences and error appear in both the numerator and the denominator, so an F-ratio well above 1.00 points to a treatment effect.

$$F = \frac{\text{variance between groups}}{\text{variance within groups}} = \frac{\text{treatment effect} + \text{individual differences} + \text{error}}{\text{individual differences} + \text{error}}$$

The last few steps of ANOVA require the following calculations:

$df_{between} = k - 1$, where $k$ is the number of groups
$df_{within} = N - k$, where $N$ is the total number of individuals in the groups

$$MS_{between} = \text{variance between treatments} = \frac{SS_{between}}{df_{between}}$$

$$MS_{within} = \text{variance within treatments} = \frac{SS_{within}}{df_{within}}$$

$$F\text{-ratio} = \frac{MS_{between}}{MS_{within}}$$
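The calculations above can be traced from raw scores with a short Python sketch. The function name `one_way_anova` and the three groups of scores are hypothetical, chosen only to keep the arithmetic easy to follow; the sums of squares use the standard definitions (group means against the grand mean for SS between, scores against their own group mean for SS within).

```python
def one_way_anova(groups):
    """Return (df_between, df_within, ms_between, ms_within, f_ratio)."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # N, all individuals
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # SS between: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # SS within: how far each score sits from its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within


# Hypothetical scores for three groups (not data from the packet)
scores = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
df_between, df_within, ms_between, ms_within, f_ratio = one_way_anova(scores)
print(df_between, df_within, ms_between, ms_within, f_ratio)
```

With these made-up scores the group means are 2, 5, and 8, so the between-groups variance is large relative to the within-groups variance and the F-ratio is far above 1.00.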

Page 59

Putting it all together
Example: A number of studies on jetlag have found that jetlag seems to be worse when people are traveling east. A researcher examines how many days it takes a person to adjust after taking a long flight. One group flies west across time zones (NY to CA); a second group flies east (CA to NY); and a third group takes a long flight within one time zone (San Francisco to Seattle). Perform an analysis of variance to determine if jetlag varies with the direction of travel. Use the .05 level of significance.

Computer Results
Analysis of Variance results for var2 grouped by var1

Sample means:
Group   n   Mean   Std. Error
1       6   2.5    0.4281744
2       6   6      0.57735026
3       6   0.5    0.2236068

ANOVA table:
Source       df   SS    MS          F-Stat     P-value
Treatments   2    93    46.5        41.02941   <0.0001
Error        15   17    1.1333333
Total        17   110

Step 1: Develop hypotheses
  State alternative: Direction of travel will significantly affect jetlag.
  $H_0: \mu_1 = \mu_2 = \mu_3$
  $H_1$: At least one mean will differ
Step 2: Establish significance criteria
  Computer: $\alpha = .05$
Step 3: Collect and analyze sample data
  Computer: enter data
Step 4: Compare sample data to null ------> calculate test statistic
  Computer: Identify test statistic and p-value; F(2, 15) = 41.03, p < .0001
  Compare p-value with alpha level: .0001 is less than .05, so reject null
Step 5: Draw conclusion
  Direction of travel significantly affected jetlag.
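As a check on the jetlag output, the F-ratio can be rebuilt from the SS and df values in the ANOVA table. This is a minimal Python sketch; the variable names are illustrative, and the p-value bound is copied from the StatCrunch output rather than computed here.

```python
# Values taken from the jetlag ANOVA table
ss_between, ss_within = 93, 17
k, n_total = 3, 18            # 3 groups of 6 travelers each

df_between = k - 1            # 2
df_within = n_total - k       # 15
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

alpha = 0.05
p_upper_bound = 0.0001        # StatCrunch reported P-value < 0.0001
reject_null = p_upper_bound < alpha

print(f"F({df_between}, {df_within}) = {f_ratio:.2f}, reject null: {reject_null}")
```

Rounded to two decimals, the rebuilt F-ratio agrees with the F(2, 15) = 41.03 reported in Step 4.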

Page 60

Post Hoc Tests


So far, we have only been able to determine if there is a significant difference (treatment had an effect), but we are unable to determine which group is different. We could do a t test for each comparison, but running several hypothesis tests inflates the risk of a Type I error. This risk is called the experimentwise alpha level: the overall probability of a Type I error over a series of separate hypothesis tests. Fortunately, there are some tests that are very conservative and allow us to determine which group is different after ANOVA has been conducted and a difference has been found; these are called Post Hoc Tests. The Scheffe Test is the safest post hoc test used to compare two groups/treatments. It is safe because it uses the value of k from the original ANOVA to calculate the df, and the critical F-ratio from the original ANOVA to determine whether a comparison is significant. Unfortunately, StatCrunch is unable to conduct Post Hoc tests!
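To see how quickly the experimentwise alpha level grows, here is a small Python sketch. It assumes the separate tests are independent, which is a simplification (pairwise comparisons on the same groups are not fully independent), so the result is best read as a rough guide. The function name `experimentwise_alpha` is hypothetical.

```python
from math import comb


def experimentwise_alpha(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


k = 3                               # three groups, as in the jetlag example
n_comparisons = comb(k, 2)          # number of pairwise t tests needed
overall = experimentwise_alpha(0.05, n_comparisons)
print(n_comparisons, round(overall, 4))
```

With three groups there are three pairwise comparisons, and under this assumption the overall risk of a Type I error rises from .05 to roughly .14.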

Reporting of ANOVA Results


Much of the time an ANOVA summary table is presented that includes SS, df, and MS for each source as well as the F-ratio; in addition, a table of means and standard errors for each treatment will be presented. Using the previous example, the tables would look like the following:

       Westbound   Eastbound   Same zone
M      2.5         6.0         0.5
SE     0.43        0.58        0.22

ANOVA SUMMARY
Source               SS    df   MS
Between treatments   93    2    46.5    F = 41.03
Within treatments    17    15   1.13
Total                110   17

When space is an issue, the results should include the F-ratio with both degrees of freedom in parentheses and the p-value. Do NOT indicate one-tailed or two-tailed!

Travel direction does affect jetlag; F(2, 15) = 41.03, p < .05.

Assumptions of ANOVA: independent observations, samples are selected from normal populations that also have equal variances.

ANOVA

In-Class Practice Problems

Page 61

1. The extent to which a person's attitude can be changed depends on how big a change you are trying to produce. In a classic study on persuasion, Aronson, et al. (1985) obtained three groups of subjects. One group listened to a persuasive message that differed only slightly from the subjects' original attitudes. For the second group, there was a moderate discrepancy between the message and the original attitudes. For the third group, there was a large discrepancy between the message and the original attitudes. For each subject, the amount of attitude change was measured. Data were entered for the three groups (small, moderate, large discrepancy) and an ANOVA was utilized to determine if the amount of discrepancy between the original attitude and the persuasive argument has a significant effect on the amount of attitude change. Test at the .05 level.

Analysis of Variance results for var2 grouped by var1


Sample means:
Group   n   Mean        Std. Error
1       6   1.5         0.4281744
2       6   6.6666665   0.71492034
3       6   1           0.2581989

ANOVA table:
Source       df   SS           MS          F-Stat     P-value
Treatments   2    118.111115   59.055557   38.79562   <0.0001
Error        15   22.833334    1.5222223
Total        17   140.94444

a. Independent Variable =                 Scale (circle): Categorical   Quantitative
b. Dependent Variable =                   Scale (circle): Categorical   Quantitative
c. Alternative hypothesis in sentence form.

d. Write the alternative and null hypotheses using correct notation.
   H0:                    H1:
e. Fcalculated =
f. Level of significance (p) =
g. Circle: reject null   or   fail to reject null

h. Write your conclusion in sentence form.

Page 62

2. A psychologist would like to examine the relative effectiveness of three therapy techniques for treating mild phobias. A sample of N=15 individuals who display a moderate fear of spiders is obtained. These individuals are randomly assigned to the three therapies. After a certain amount of therapy, the psychologist measures the degree of fear reported by each individual. ANOVA was conducted to determine if there are any significant differences among the three therapies. Test at the .05 level.

Analysis of Variance results for var2 grouped by var1


Sample means:
Group   n   Mean   Std. Error
1       5   4      0.70710677
2       5   1.6    0.50990194
3       5   1.4    0.50990194

ANOVA table:
Source       df   SS          MS          F-Stat      P-value
Treatments   2    20.933332   10.466666   6.1568627   0.0145
Error        12   20.4        1.7
Total        14   41.333332

a. Independent Variable =                 Scale (circle): Categorical   Quantitative
b. Dependent Variable =                   Scale (circle): Categorical   Quantitative
c. Alternative hypothesis in sentence form.

d. Write the alternative and null hypotheses using correct notation.
   H0:                    H1:
e. Fcalculated =
f. Level of significance (p) =
g. Circle: reject null   or   fail to reject null

h. Write your conclusion in sentence form.

Page 63

Additional Practice: Interpreting Research Articles ANOVA


Read the following excerpt to complete the questions on the next page: Researchers examined the impact of teacher self-efficacy on classroom technology use. Participants included 101 teachers from four elementary (K-6) schools in Northwest Ohio. Of the 101 participants, 13 were male. Teachers were administered the Teacher Attribute Survey (TAS), which measured classroom technology use (teacher, student, and overall). Teacher self-efficacy was also measured in the instrument and represented one's belief in affecting student performance. Low, moderate, and high levels of self-efficacy were created. As such, low self-efficacy was defined as a score of 3.29 or below, moderate self-efficacy as a score ranging from 3.3 to 4.6, and high self-efficacy as a score of 4.61 or higher.

Table 1. Means and ANOVA results for Self-Efficacy groups and Technology Use

                   Technology Use Means by Level of Self-Efficacy
                   Low (n=12)   Moderate (n=78)   High (n=11)   ANOVA Results
Teacher Tech Use   1.73         2.15              2.36          F(2,98)=3.77, p<.05
Student Tech Use   1.24         1.49              1.81          F(2,98)=4.52, p<.05
Overall Tech Use   2.08         1.82              2.08          F(2,98)=4.71, p<.05

1. Which type of technology use is the highest among all levels of self-efficacy?

2. Which group of teachers (low, moderate, or high self-efficacy) report the highest technology use among their students?

3. Write the null hypothesis for self-efficacy and overall technology use, where the ANOVA results indicate: F(2,98)=4.71, p<.05.

4. Considering the null hypothesis that you wrote for item 3, should the null hypothesis be rejected? Explain.

Answers: 1) teacher technology use; 2) teachers with high self-efficacy (M=1.81); 3) Self-efficacy will NOT significantly impact overall technology use among teachers; 4) Reject the null, F(2,98)=4.71, p<.05.

Video #11: Correlation and Regression

Page 64

Correlation: statistical technique used to measure and describe a relationship between two quantitative variables; correlation measures 3 characteristics:

- direction of relationship
  - positive: as one variable increases so does the other (food intake & weight)
  - negative (inverse): as one variable increases the other decreases (exercise & weight)

  [Scatterplots: Positive (r = +.90) and Negative (r = -.90)]

- form of relationship
  - linear: the relationship between x and y falls in a straight line
  - curvilinear: the relationship between x and y curves (age across the lifespan is a variable that often creates a curvilinear relationship)
- degree (strength) of relationship
  - degree of relationship is reflected in a correlation coefficient (usually r)
  - r ranges from -1 to +1, with 0 indicating no relationship, +1 indicating a perfect positive relationship, and -1 indicating a perfect negative relationship

Pearson Correlation Coefficient


- measures the degree and direction of linear relationship between two variables

$$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{SP}{\sqrt{SS_X SS_Y}}$$

- since we will be computing variability for each variable as well as their variability together, we will be using SS and a new concept, SP, sum of products. Sum of products is used to compute the amount of covariability of two variables:

$$SP = \sum (X - \bar{X})(Y - \bar{Y})$$

Page 65

Correlation
- does NOT measure cause and effect
- when data have a limited range of scores, the value of the correlation can be exaggerated
- interpreting strength of coefficient (practical significance), using the size of r regardless of sign:
  - r of .8 or above is very strong
  - r = .6 - .79 is strong
  - r = .4 - .59 is fair
  - r of .39 or below is weak
- to describe how accurately one variable predicts the other, square r. For example, if r = .60, then $r^2 = .36$, which can be interpreted as 36% of the variability in Y scores can be predicted from the relationship with X. $r^2$ is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable.

Hypothesis Testing (hypotheses use the Greek letter rho, $\rho$, to signify the population value of r)

              Alternative         Null
One-tailed    $H_1: \rho > 0$     $H_0: \rho \le 0$
Two-tailed    $H_1: \rho \ne 0$   $H_0: \rho = 0$

Putting it all together
Example: To measure the relationship between anxiety level and test performance, a psychologist obtains a sample of n=6 college students from an intro stats course. Students arrive fifteen minutes prior to the exam and complete physiological measures of anxiety (heart rate, skin resistance, blood pressure, etc.). Anxiety ratings and exam scores are listed below. Compute the Pearson correlation to determine if a negative relationship exists between anxiety and test performance. Test at the .05 level.

Step 1: Develop hypotheses.
  State Alternative: Anxiety and test performance will negatively relate. It is a directional hypothesis ----> one-tailed
  $H_1: \rho < 0$ (population shows negative correlation)
  $H_0: \rho \ge 0$ (population does not show negative correlation)
Step 2: Establish significance criteria
  Computer: StatCrunch does not calculate the p-value for the correlation coefficient. As a result, we must identify $r_{critical}$ using $\alpha$, tails, and df:
    df = n - 2 = 6 - 2 = 4, $r_{critical}$ = -.729
    Notice that df is n - 2 for correlation, since we need two points to create a line.
  Hand calculations: Identify $r_{critical}$ using $\alpha$, tails, and df

Page 66

Step 3: Utilize sample data to calculate r
  Computer: enter data
  Hand calculations: Calculate SP, SSX, SSY, r

Anxiety Rating (X)   Exam Score (Y)   (X - X̄)   (Y - Ȳ)   (X - X̄)(Y - Ȳ)   (X - X̄)²   (Y - Ȳ)²
5                    80                0         -3          0               0           9
2                    88               -3          5        -15               9          25
7                    80                2         -3         -6               4           9
7                    79                2         -4         -8               4          16
4                    86               -1          3         -3               1           9
5                    85                0          2          0               0           4
X̄ = 5               Ȳ = 83                                SP = -32         SSX = 18    SSY = 72

Step 4: Compare sample data to null ------> calculate test statistic
  Computer: Identify test statistic and compare $r_{calculated}$ to $r_{critical}$
    Correlation between var2 and var1 is: -0.8888889
    r falls into the critical region, so it is significant; reject null
  Hand calculations: Calculate r

$$r = \frac{SP}{\sqrt{SS_X SS_Y}} = \frac{-32}{\sqrt{18(72)}} = \frac{-32}{36} = -.889$$

    Compare $r_{calculated}$ to $r_{critical}$: r falls into the critical region, so reject null

Step 5: Draw conclusion A negative relationship exists between anxiety and test performance, r(4)=-.889, p<.05, one-tailed.

Computer Output

Correlation between var2 and var1 is: -0.8888889
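The hand calculation for the anxiety example can be reproduced with a short Python sketch. The function name `pearson_r` is illustrative; the data and the critical value (-.729, one-tailed, df = 4) come from the example itself.

```python
from math import sqrt


def pearson_r(x, y):
    """Return (SP, SS_X, SS_Y, r) for paired scores x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return sp, ss_x, ss_y, sp / sqrt(ss_x * ss_y)


anxiety = [5, 2, 7, 7, 4, 5]          # X
exam = [80, 88, 80, 79, 86, 85]       # Y
sp, ss_x, ss_y, r = pearson_r(anxiety, exam)

r_critical = -0.729                   # one-tailed, alpha = .05, df = 4
reject_null = r <= r_critical         # negative tail, so "at or below"

print(sp, ss_x, ss_y, round(r, 3), reject_null)
```

The sketch returns SP = -32, SSX = 18, SSY = 72, and r of about -.889, matching both the hand calculation and the StatCrunch output.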

Page 67

Regression
Regression: statistical technique for finding the best-fitting straight line for a set of data; used when wanting to determine the ability of one variable to predict another variable (e.g., using SAT score to predict freshman college GPA)

Regression line: line that represents the linear relationship; represented by a linear equation $Y = a + bX$, where a = Y-intercept and b = slope

Least-squares method: helps determine the best-fitting line by minimizing the error between the predicted & actual values of Y.

$$Y = a + bX, \quad \text{where } b = \frac{SP}{SS_X} \text{ and } a = \bar{Y} - b\bar{X}$$

Example: Using the correlation problem we just solved, let's calculate the regression line.

Step 1: Use $\bar{X}$, $\bar{Y}$, $SS_X$, and SP to calculate b and a (previously calculated: $\bar{X} = 5$, $\bar{Y} = 83$, SP = -32, SSX = 18, SSY = 72)

$$b = \frac{SP}{SS_X} = \frac{-32}{18} = -1.777$$

$$a = \bar{Y} - b\bar{X} = 83 - (-1.777)(5) = 83 + 8.888 = 91.888$$

Step 2: Calculate regression equation
  $Y = a + bX$
  $Y = 91.89 - 1.78X$

We can now use the regression equation to predict Y for a given value of X. If X = 7, what is the predicted value of Y?
  $Y = 91.89 - 1.78X$
  $Y = 91.89 - 1.78(7) = 79.43$
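The same regression steps can be sketched in Python, starting from the summary values already calculated for the anxiety example. The helper name `predict` is hypothetical.

```python
# Summary values from the anxiety / exam score example
mean_x, mean_y = 5, 83
sp, ss_x = -32, 18

b = sp / ss_x                 # slope
a = mean_y - b * mean_x       # Y-intercept


def predict(x):
    """Predicted Y for a given X using the least-squares line."""
    return a + b * x


predicted = predict(7)
print(round(a, 2), round(b, 2), round(predicted, 2))
```

Because the sketch carries full precision, the prediction for X = 7 comes out near 79.44, which is the value StatCrunch reports; the hand answer of 79.43 differs slightly only because a and b were rounded to two decimals before substituting.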

Page 68

Computer Output

The output in the video will appear different, since a different version of StatCrunch was used.

Simple linear regression results:
Dependent Variable: var2
Independent Variable: var1
var2 = 91.888885 - 1.7777778 var1           <-- Regression equation
Sample size: 6
R (correlation coefficient) = -0.8889       <-- Correlation coefficient
R-sq = 0.79012346
Estimate of error standard deviation: 1.9436506

Parameter estimates:
Parameter   Estimate     Std. Err.    DF   T-Stat      P-Value
Intercept   91.888885    2.4241583    4    37.905483   <0.0001
Slope       -1.7777778   0.45812285   4    -3.88057    0.0178
Ignore these p-values since they are NOT for the correlation coefficient (r).

Analysis of variance table for regression model:
Source   DF   SS          MS          F-stat      P-value
Model    1    56.88889    56.88889    15.058824   0.0178
Error    4    15.111111   3.7777777
Total    5    72

Predicted values (predicted value for Y when X=7):
X value   Pred. Y    s.e.(Pred. y)   95% C.I.                 95% P.I.
7         79.44444   1.2120792       (76.07917, 82.809715)    (73.08468, 85.80421)

Video #11 In-Class Practice Problems

Page 69

1. You probably have read about the relationship between years of education and salary potential. The following hypothetical data represent a sample of n = 10 men who have been employed for five years. Do these data indicate a significant relationship between years of higher education and salary? Test at the .05 level. Also find the regression equation for predicting salary from education.
(X) Years of Higher Education: 4, 4, 2, 8, 0, 5, 10, 4, 12, 0
(Y) Salary (in $1000s): 31, 29, 28, 42, 23, 35, 45, 27, 44, 24
Simple linear regression results:
Dependent Variable: salary
Independent Variable: education
salary = 23.135265 + 1.9723947 education
Sample size: 10
R (correlation coefficient) = 0.9601
R-sq = 0.92169785
Estimate of error standard deviation: 2.4466708

Parameter estimates:
Parameter   Estimate    Std. Err.    DF   T-Stat     P-Value
Intercept   23.135265   1.2611643    8    18.34437   <0.0001
Slope       1.9723947   0.20325504   8    9.704039   <0.0001

Analysis of variance table for regression model:
Source   DF   SS          MS          F-stat      P-value
Model    1    563.71045   563.71045   94.168365   <0.0001
Error    8    47.88958    5.9861975
Total    9    611.6

Predicted values:
X value   Pred. Y    s.e.(Pred. y)   95% C.I.                95% P.I.
5         32.99724   0.77397215      (31.212456, 34.78202)   (27.07964, 38.91484)

Page 70

a. Independent Variable =                 Scale: Categorical   Quantitative
b. Dependent Variable =                   Scale: Categorical   Quantitative
c. Circle: One-Tailed   OR   Two-Tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H0:                    H1:
f. rcritical =                            g. rcalculated =
h. Circle: reject null   or   fail to reject null

i. Write your conclusion in sentence form.

j. Regression equation:
k. If one has 5 years of education, what is the predicted salary?

Page 71

2. Research has shown that similarity in attitudes, beliefs, and interests plays an important role in interpersonal attraction. A therapist examines the correlation in attitudes between husbands (X) and wives (Y). She administers a questionnaire that measures how liberal or conservative one's attitudes are. Low scores indicate that the person has liberal attitudes while high scores indicate conservatism (scale 1-10). Ten couples participate. Test at the .01 level.
Simple linear regression results:
Dependent Variable: wife att
Independent Variable: hus att
wife att = 0.7785714 + 0.8035714 hus att
Sample size: 10
R (correlation coefficient) = 0.7869
R-sq = 0.61919034
Estimate of error standard deviation: 1.6673064

Parameter estimates:
Parameter   Estimate    Std. Err.    DF   T-Stat       P-Value
Intercept   0.7785714   1.4370375    8    0.54178923   0.6027
Slope       0.8035714   0.22280319   8    3.6066422    0.0069

Analysis of variance table for regression model:
Source   DF   SS          MS          F-stat      P-value
Model    1    36.160713   36.160713   13.007869   0.0069
Error    8    22.239286   2.7799108
Total    9    58.4

Predicted values:
X value   Pred. Y     s.e.(Pred. y)   95% C.I.                 95% P.I.
5         4.7964287   0.57239175      (3.4764907, 6.1163664)   (0.73135275, 8.861505)

Page 72

a. Independent Variable =                 Scale: Categorical   Quantitative
b. Dependent Variable =                   Scale: Categorical   Quantitative
c. Circle: One-Tailed   OR   Two-Tailed

d. Alternative hypothesis in sentence form.

e. Write the alternative and null hypotheses using correct notation.
   H0:                    H1:
f. rcritical =                            g. rcalculated =
h. Circle: reject null   or   fail to reject null

i. Write your conclusion in sentence form.

j. Regression equation:
k. If the husband has a moderate attitude score of 5, what is the predicted value of the wife's attitude?

Page 73

Additional Practice: Interpreting Research Articles Correlation


Read the following excerpt to complete the questions on the next page: Boivin and Hymel (1997) examined the relationships among social behavior, peer experiences and self-perception. A total of 793 French Canadian children participated in the study (393 girls, 400 boys). The participants ranged from third to fifth grade, were from ten elementary schools and from a variety of socioeconomic backgrounds. The following variables were measured:
- Aggression and withdrawal were measured by showing a picture of all classmates and asking each student to choose two classmates who best fit each descriptor. For aggression, a score was obtained for each child by summing the number of times he or she was selected for these descriptors: "gets into lots of fights," "loses temper easily," "too bossy," and "picks on other kids." For withdrawal, a score was obtained for each child by summing the number of times he or she was selected for these descriptors: "rather play alone than with others" and "very shy."
- Social preference was assessed by asking each child to name three other children they would like most and like least for playing together, inviting others to a birthday party, and sitting next to each other on a bus. (Higher scores indicate greater social preference.)
- Victimization by peers was measured by asking each child to nominate up to five other students who could be described as being made fun of, being called names, and getting hit and pushed by other kids. (Higher scores indicate greater victimization.)
- Number of affiliative links was measured by asking, "You have probably noticed children in class who often hang around together and others who are more often alone. Could you name children who often hang around together?" (Higher scores indicate a larger number of affiliative links.)
- Loneliness was measured with a 16-item questionnaire, with higher scores indicating greater loneliness.
- Perceived social acceptance and behavior-conduct were two aspects of self-concept measured with Harter's Self-Perception Profile for Children. Higher scores reflect a better self-concept in each of the two domains.

Table 1. Correlations among the social behavior, peer expectation, and self-perception measures
                                  1      2      3      4      5      6      7      8
1. Withdrawal                     --
2. Aggression                    -.10    --
3. Social Preference             -.39   -.44    --
4. Victimization by Peers         .42    .53   -.68    --
5. # of Affiliate Links          -.35    .05    .35   -.21    --
6. Loneliness                     .29    .12   -.34    .34   -.18    --
7. Perceived social acceptance   -.27   -.04    .28   -.26    .18   -.69    --
8. Perceived behavior-conduct     .06   -.32    .17   -.17   -.06   -.35    .39    --

Source: Boivin, M. & Hymel, S. (1997). Peer experiences and social self-perceptions: A sequential model. Developmental Psychology, 33, 135-143.
Notice that the correlation coefficients are presented in a matrix. The column headers represent the same variables presented in the row headers; however, the column headers use only the number to indicate a certain variable. For example, the circled coefficient of .39 (row 8, column 7) represents the correlation between Perceived Social Acceptance and Perceived Behavior-Conduct.

Page 74

1. What is the value of the Pearson r for the relationship between withdrawal and loneliness? Describe this value in terms of strength and direction.

2. What is the value of the Pearson r for the relationship between social preference and victimization by peers? Describe this value in terms of strength and direction.

3. Which variable has the strongest relationship with withdrawal?

4. Which variable has the weakest relationship with withdrawal?

5. The Pearson r for the relationship between withdrawal and loneliness indicates that those who tend to be more lonely tend to be: A. more withdrawn B. less withdrawn

6. Which of the following pairs has the strongest relationship between them? A. Perceived social acceptance and loneliness B. Withdrawal and victimization by peers C. Number of affiliate links and aggression

7. Which of the following pairs has the weakest relationship between them? A. Withdrawal and social preference B. Withdrawal and perceived social acceptance C. Withdrawal and perceived behavior-conduct

Answers: 1) .29, weak and positive; 2) -.68, strong and negative; 3) Victimization by peers, r=.42; 4) Perceived behavior-conduct, r=.06; 5) A, more withdrawn; 6) A; 7) C.

Video #12: Chi Square Test for Independence

Page 75

So far we have used parametric tests to evaluate a hypothesis about the population. Parametric tests require certain assumptions about the population parameters, such as a normal distribution, homogeneity of variance, and a quantitative (interval/ratio) dependent variable. When these assumptions for parametric tests cannot be fulfilled, nonparametric tests can be used.

Nonparametric tests
- usually do not state a hypothesis in terms of the population distribution, so they are often called distribution-free tests
- are suited for data that utilize a nominal or ordinal scale
- are not as sensitive as parametric tests: they are more likely to fail in detecting a real difference between two treatments
- one commonly used nonparametric test is the Chi Square Test for Independence.

Chi Square Test of Independence


- Used to test a relationship (differences) between two categorical variables
- If variables are independent of one another, then there is no relationship. As a result, the distribution of one variable will have the same shape for all the categories of the second variable.
- Alternative hypothesis for Chi Square Test for Independence can be written to focus on the relationship or on the differences.
    H1: Gender is related to learning style.
    H1: Learning style will differ by gender.
- Chi Square Test for Independence compares the observed and expected frequencies. Our expected frequencies come from our null hypothesis and our observed data.

$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$

Building on our example of females and males with respect to learning styles, the table below presents the data observed for a sample of 125 males and 75 females.

          Audio   Visual   Kinesthetic   Total
Males     30      30       65            125
Females   30      25       20            75
Total     60      55       85            200

Page 76

If the distribution for gender is predicted to be the same for each learning style category, then the same proportion/percent of males and females in each category would be expected.

- to calculate the expected frequency for each category this formula is used:

$$f_e = \frac{f_c f_r}{n}$$

  where $f_c$ = column total, $f_r$ = row total, n = sample size
- the table of expected frequencies (rounded to whole numbers) would look something like this:

          Audio             Visual            Kinesthetic       Total
Males     60(125)/200≈38    55(125)/200≈34    85(125)/200≈53    125
Females   60(75)/200≈22     55(75)/200≈21     85(75)/200≈32     75
Total     60                55                85

- Degrees of freedom are calculated a bit differently: df = (R - 1)(C - 1), where R = number of rows, C = number of columns
- in our example, df = (2-1)(3-1) = 1(2) = 2
- using this and $\alpha = .05$, our $\chi^2_{critical}$ = 5.99
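The expected-frequency formula and the df rule can be sketched in Python for any table of observed counts. The function names `expected_frequencies` and `degrees_of_freedom` are hypothetical; the observed counts are the learning-style data from the example (rows are males and females; columns are audio, visual, kinesthetic).

```python
def expected_frequencies(observed):
    """Expected count per cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


def degrees_of_freedom(observed):
    """df = (R - 1)(C - 1) for a table with R rows and C columns."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


observed = [[30, 30, 65],    # males
            [30, 25, 20]]    # females
expected = expected_frequencies(observed)
df = degrees_of_freedom(observed)

for row in expected:
    print(row)
print("df =", df)
```

The unrounded expected counts are 37.5, 34.375, and 53.125 for males and 22.5, 20.625, and 31.875 for females; the whole numbers used in the hand calculation are rounded versions of these.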

Page 77

Putting it all together

Example: Based upon the observed frequencies presented in the table below, can a researcher conclude that learning styles differ by gender? Test at the .05 level.

               Audio   Visual   Kinesthetic   Total
    Males      30      30       65            125
    Females    30      25       20            75
    Total      60      55       85

Step 1: Develop hypotheses.
    State Alternative: Learning style will significantly differ by gender.

Step 2: Establish significance criteria.
    Computer: α = .05
    Hand calculations: Identify χ²critical using α and df.
        df = (2-1)(3-1) = 2
        χ²critical = 5.99

Step 3: Utilize sample data to calculate χ².
    Computer: enter data
    Hand calculations: Calculate expected frequencies (fe), fo-fe, (fo-fe)², and (fo-fe)²/fe.

    Cell                  fo    fe    fo-fe   (fo-fe)²   (fo-fe)²/fe
    male-audio            30    38     -8      64         1.68
    female-audio          30    22      8      64         2.91
    male-visual           30    34     -4      16         0.47
    female-visual         25    21      4      16         0.76
    male-kinesthetic      65    53     12     144         2.72
    female-kinesthetic    20    32    -12     144         4.50
                                               Σ (fo-fe)²/fe = 13.04

Step 4: Compare sample data to null (calculate test statistic).
    Computer: Identify test statistic and compare p-value to α level.

        Statistic     DF    Value     P-value
        Chi-square    2     13.042    0.0019

        p-value is less than .05, so reject null.

    Hand calculations: Calculate χ² = 13.04 and compare χ²calculated to χ²critical.
        Since χ² = 13.04 exceeds χ²critical = 5.99, the null is rejected.

Step 5: Draw conclusion.
    Males and females differ in learning styles; χ²(2, n=200) = 13.04, p<.05.
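The hand calculation in Steps 3 and 4 can be reproduced with a short Python sketch. It copies the observed frequencies and the rounded expected frequencies from the packet's table; the dictionary layout and the names `cells`, `chi_square`, and `reject_null` are assumptions made for illustration.

```python
# (fo, fe) pairs copied from the hand-calculation table; fe values are the
# whole-number expected frequencies used in the packet.
cells = {
    "male-audio": (30, 38),
    "female-audio": (30, 22),
    "male-visual": (30, 34),
    "female-visual": (25, 21),
    "male-kinesthetic": (65, 53),
    "female-kinesthetic": (20, 32),
}

def chi_square(pairs):
    """Sum of (fo - fe)^2 / fe over all cells."""
    return sum((fo - fe) ** 2 / fe for fo, fe in pairs)

chi2_hand = chi_square(cells.values())
chi2_critical = 5.99                    # from the packet: df = 2, alpha = .05
reject_null = chi2_hand > chi2_critical

for name, (fo, fe) in cells.items():
    print(f"{name:20s} {(fo - fe) ** 2 / fe:.2f}")
print(round(chi2_hand, 2), reject_null)
```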


Computer Output

Contingency table results:
Rows: var1 (1=male, 2=female)
Columns: var2 (1=audio, 2=visual, 3=kinesthetic)
Cell format: Count, Row percent, Column percent, Total percent

                       1          2          3          Total
    1      Count       30         30         65         125
           Row %       24%        24%        52%        100.00%
           Column %    50%        54.55%     76.47%     62.5%
           Total %     15%        15%        32.5%      62.5%

    2      Count       30         25         20         75
           Row %       40%        33.33%     26.67%     100.00%
           Column %    50%        45.45%     23.53%     37.5%
           Total %     15%        12.5%      10%        37.5%

    Total  Count       60         55         85         200
           Row %       30%        27.5%      42.5%      100.00%
           Column %    100.00%    100.00%    100.00%    100.00%
           Total %     30%        27.5%      42.5%      100.00%

    Statistic     DF    Value     P-value
    Chi-square    2     13.042    0.0019
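For comparison with the output, the sketch below recomputes the statistic from unrounded expected frequencies and derives a p-value. It relies on a shortcut that is valid only for df = 2, where the upper-tail probability is exp(-x/2); the function names are hypothetical. Under these assumptions the unrounded statistic comes out near 12.56 rather than 13.04, and its p-value is about 0.0019. The gap from 13.04 appears to come from rounding fe to whole numbers in the hand calculation.

```python
import math

observed = [
    [30, 30, 65],  # var1 = 1 (male): audio, visual, kinesthetic
    [30, 25, 20],  # var1 = 2 (female)
]

def chi_square_independence(table):
    """Return (statistic, df) using unrounded expected frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            stat += (fo - fe) ** 2 / fe
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def p_value_df2(stat):
    """Upper-tail probability of chi-square; valid for df = 2 only."""
    return math.exp(-stat / 2)

chi2_unrounded, df = chi_square_independence(observed)
p = p_value_df2(chi2_unrounded)
print(round(chi2_unrounded, 2), df, round(p, 4))
```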

Assumptions of Chi Square Tests
    - Random sampling
    - Independence of observations
    - Expected frequency for any cell MUST be greater than 5

Reporting Chi Square Results
The statement should include the chi-square value with df and n in parentheses, and the p-value:

    Males and females differ in learning styles; χ²(2, n=200) = 13.04, p<.05.
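The expected-frequency assumption can be checked before running the test. The following is a minimal sketch, assuming the observed table is stored as a list of rows; the helper names are invented for this example, and the threshold of 5 is the one stated in the packet.

```python
def expected_frequencies(table):
    """Expected frequency for each cell: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]

def meets_expected_frequency_rule(table, minimum=5):
    """Return (True/False, smallest fe); True when every fe exceeds minimum."""
    smallest = min(fe for row in expected_frequencies(table) for fe in row)
    return smallest > minimum, smallest

# Learning-style example: rows are males and females.
observed = [[30, 30, 65], [30, 25, 20]]
assumption_met, smallest_fe = meets_expected_frequency_rule(observed)
print(assumption_met, smallest_fe)
```

For the learning-style data the smallest expected frequency is the female-visual cell, which is well above 5.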

Video #12 In-Class Practice Problems


1. The US Senate recently considered a controversial amendment for school prayer. The amendment did not get the required two-thirds majority, but the results of the vote are interesting when viewed in terms of the party affiliation of the senators. Does the vote on the prayer amendment (var2: 1=yes, 2=no) differ by political party (var1: 1=demo, 2=rep)? Test at the .05 level.

Contingency table results:
Rows: var1
Columns: var2

                       1          2          Total
    1      Count       19         26         45
           Row %       42.22%     57.78%     100.00%
           Column %    33.93%     59.09%     45%
           Total %     19%        26%        45%

    2      Count       37         18         55
           Row %       67.27%     32.73%     100.00%
           Column %    66.07%     40.91%     55%
           Total %     37%        18%        55%

    Total  Count       56         44         100
           Row %       56%        44%        100.00%
           Column %    100.00%    100.00%    100.00%
           Total %     56%        44%        100.00%

    Statistic     DF    Value        P-value
    Chi-square    1     6.3032928    0.0121

a. Independent Variable =                Scale (circle): Categorical   Quantitative
b. Dependent Variable =                  Scale (circle): Categorical   Quantitative
c. Alternative hypothesis in sentence form.
d. χ²calculated (   ) =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form. (1 pt)
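The statistic in the output for this problem can be checked with a small sketch. It assumes no continuity correction is applied, and it uses the fact that for df = 1 the upper-tail probability of chi-square equals erfc(sqrt(x/2)); the function names are hypothetical.

```python
import math

votes = [
    [19, 26],  # var1 = 1 (demo): yes, no
    [37, 18],  # var1 = 2 (rep): yes, no
]

def chi_square_stat(table):
    """Chi-square statistic from unrounded expected frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (fo - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, fo in enumerate(row)
    )

def p_value_df1(stat):
    """Upper-tail probability of chi-square; valid for df = 1 only."""
    return math.erfc(math.sqrt(stat / 2))

vote_chi2 = chi_square_stat(votes)
vote_p = p_value_df1(vote_chi2)
print(vote_chi2, vote_p)
```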


2. A stats instructor would like to know whether it is worthwhile to require students to do weekly homework assignments. For one section of the course, homework is assigned, collected and graded each week. For the second section, the same problems are recommended but not required. At the end of the semester, all students complete the same final exam. Letter grades (A, B, C, D, F) are tabulated for each student by section. Do these data indicate significant grade differences for students with homework versus no homework? Test at the .05 level.
Contingency table results:
Rows: var1
Columns: var2

                       1          2          3          4          5          Total
    1      Count       6          5          5          2          2          20
           Row %       30%        25%        25%        10%        10%        100.00%
           Column %    66.67%     50%        45.45%     28.57%     40%        47.62%
           Total %     14.29%     11.9%      11.9%      4.762%     4.762%     47.62%

    2      Count       3          5          6          5          3          22
           Row %       13.64%     22.73%     27.27%     22.73%     13.64%     100.00%
           Column %    33.33%     50%        54.55%     71.43%     60%        52.38%
           Total %     7.143%     11.9%      14.29%     11.9%      7.143%     52.38%

    Total  Count       9          10         11         7          5          42
           Row %       21.43%     23.81%     26.19%     16.67%     11.9%      100.00%
           Column %    100.00%    100.00%    100.00%    100.00%    100.00%    100.00%
           Total %     21.43%     23.81%     26.19%     16.67%     11.9%      100.00%

    Statistic     DF    Value        P-value
    Chi-square    4     2.4870248    0.647

a. Independent Variable =                Scale (circle): Categorical   Quantitative
b. Dependent Variable =                  Scale (circle): Categorical   Quantitative
c. Alternative hypothesis in sentence form.
d. χ²calculated =
e. Level of significance (p) =
f. Circle: reject null or fail to reject null
g. Write your conclusion in sentence form. (1 pt)
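The output for this problem can be checked with a sketch along the same lines. For an even df, the upper-tail probability of chi-square has the closed form exp(-x/2) multiplied by the sum of (x/2)^k / k! for k from 0 to df/2 - 1, which is used here for df = 4. The function names are hypothetical. The sketch also counts cells whose expected frequency is 5 or less, since the packet lists an expected frequency greater than 5 as an assumption of the test.

```python
import math

grades = [
    [6, 5, 5, 2, 2],  # var1 = 1
    [3, 5, 6, 5, 3],  # var1 = 2
]

def expected_frequencies(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]

def p_value_even_df(stat, df):
    """Upper-tail probability of chi-square; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires an even df")
    half = stat / 2
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms

expected = expected_frequencies(grades)
grade_chi2 = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(grades, expected)
    for fo, fe in zip(obs_row, exp_row)
)
grade_df = (len(grades) - 1) * (len(grades[0]) - 1)
grade_p = p_value_even_df(grade_chi2, grade_df)
small_cells = sum(1 for row in expected for fe in row if fe <= 5)
print(round(grade_chi2, 3), grade_df, round(grade_p, 3), small_cells)
```

Several expected frequencies in this table fall below 5, which may be worth noting when writing the conclusion.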


Additional Practice: Interpreting Research Articles

Read the following excerpt to complete the questions that follow:

Researchers surveyed 120 college sophomores and juniors enrolled in general education psychology courses. Participants were between the ages of 18 and 23 and completed a survey that measured class absenteeism (cutting class) in the past month (for no valid reason) and seven negative behaviors and two positive behaviors, all measured using yes/no responses. Negative behaviors included: speeding, slapping/hitting someone, getting drunk, breaking the law, telling a significant lie, thinking about dropping out of school, feeling depressed, getting a tattoo, piercing body. Positive behaviors were reading a book that wasn't required for class and visiting family.

Table 1. Number and percentage of students answering yes to behaviors by groups of students who have cut class (n=68) and not cut class (n=52)

                                    Cutting        Not Cutting
    Behavior                        N     %        N     %        χ²
    Getting drunk                   59    87       24    46       22.79**
    Speeding                        63    93       39    75        7.19*
    Breaking law                    35    51       10    19       13.07**
    Telling significant lie         14    21        8    15        0.53
    Thoughts of dropping out         8    12        3     6        0.79
    Feeling depressed                7    10        5    10        0.02
    Hitting/slapping                 8    12       11    21        1.95
    Getting tattoo                  12    19        4     8        3.16
    Piercing body                   18    26        7    13        3.17
    Reading a non-required book     25    37       15    29        0.83
    Visiting family                 62    91       40    77        4.61*

    Note: * p<.05, ** p<.002

Source: Trice, A. D., Holland, S. A., & Gagne, P. E. (2000). Voluntary class absences and other behaviors in college students: An exploratory analysis. Psychological Reports, 87, 179-182.

1. What percentage of students who did not cut class report reading a non-required book?

2. Is the difference in frequencies for speeding significant for the two groups? Explain.

3. Write the null hypothesis for group differences in getting drunk.

4. Should the null hypothesis you wrote for item 3 be rejected? Explain.

5. What can you conclude about students who cut class and get drunk?

Answers: 1) 29%; 2) yes, χ² = 7.19, p<.05; 3) Students who cut class will NOT significantly differ in the behavior of getting drunk from students who do not cut class; 4) The null should be rejected since χ² = 22.79, p<.002; 5) Students who cut class are more likely to get drunk and vice versa.
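To see where a χ² value in Table 1 comes from, the sketch below rebuilds the 2 x 2 table for "getting drunk" from the yes counts and group sizes and computes the statistic. It assumes no continuity correction was applied, and the function name is hypothetical; under that assumption the result agrees with the tabled value for this behavior.

```python
def chi_square_from_yes_counts(yes_a, n_a, yes_b, n_b):
    """Chi-square for a yes/no behavior compared across two groups."""
    table = [[yes_a, n_a - yes_a], [yes_b, n_b - yes_b]]
    row_totals = [n_a, n_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = n_a + n_b
    stat = 0.0
    for i in range(2):
        for j in range(2):
            fe = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - fe) ** 2 / fe
    return stat

# Getting drunk: 59 of 68 students who cut class, 24 of 52 who did not.
drunk_chi2 = chi_square_from_yes_counts(59, 68, 24, 52)
pct_cutting = round(100 * 59 / 68)
pct_not_cutting = round(100 * 24 / 52)
print(round(drunk_chi2, 2), pct_cutting, pct_not_cutting)
```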

Statistical Test Grid

                                          Independent Variable
    Dependent Variable    Categorical                          Quantitative

    Categorical           Chi Square Test of Independence

    Quantitative          t test (2)                           Pearson Correlation (relate)
                              Single Sample                    Regression (predict)
                              Independent Samples
                              Related Samples
                          ANOVA (3+)
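The grid can be read as a simple lookup on the scale of the independent and dependent variables. The sketch below is one possible encoding, not part of the packet: the function name `choose_test` and the `groups` parameter are hypothetical, and it assumes that "(2)" and "(3+)" in the grid refer to the number of groups being compared.

```python
def choose_test(iv_scale, dv_scale, groups=None):
    """Suggest a test family; scales are 'categorical' or 'quantitative'."""
    if iv_scale == "categorical" and dv_scale == "categorical":
        return "Chi Square Test of Independence"
    if iv_scale == "categorical" and dv_scale == "quantitative":
        if groups is None:
            return "t test or ANOVA (depends on number of groups)"
        # Assumption: up to 2 groups points to a t test, 3 or more to ANOVA.
        return "t test" if groups <= 2 else "ANOVA"
    if iv_scale == "quantitative" and dv_scale == "quantitative":
        return "Pearson Correlation (relate) or Regression (predict)"
    return "not covered by the grid"

print(choose_test("categorical", "categorical"))
print(choose_test("categorical", "quantitative", groups=3))
print(choose_test("quantitative", "quantitative"))
```

Choosing among the single sample, independent samples, and related samples t tests still depends on the study design, which this lookup does not capture.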

Overview Items
1. Does disability category (LD, EBD, none, etc.) differ by gender?


2. Does gender affect GRE scores?

3. Are GRE scores related to graduate GPA?

4. Does SES (low, middle, high) affect reading preparedness (as measured by a test) among preschoolers?

5. Does a seminar on self-esteem increase self-esteem scores? (Self-esteem was measured before and after the seminar.)

6. Does learning style type differ by hand preference?

7. Do ACT scores predict college freshman GPA?

8. Do BGSU's GRE scores for entering graduate students significantly differ from the population norm?

9. Does a reading intervention significantly increase 4th grade reading proficiency scores? Note: one group receives intervention, while another group receives traditional instruction.

10. Does foot size (small, medium, large) affect IQ?
