Chi-Square on STATISTICA
There are two ways of performing chi-square tests in STATISTICA. The method
depends on the form of the data. One may have the original spreadsheet file, in
which each individual's score on the category appears, or one may be provided only
with the frequency of people in each category. I will deal with each of these separately.
Chi-square Goodness-of-Fit Test
Spreadsheet
In this situation one has the full set of data, such as that provided for your
assignment, SETA.STA.
Suppose one wished to establish whether the proportions of the different population
groups in the sample matched those in the population of the two provinces. The first
step is to establish the frequency of each population group
using the Frequency Tables option in the Basic Statistics module. These
frequencies are the observed frequencies. The expected frequencies are
derived from the census data and scaled to match the size of our sample, that is,
each group's census proportion is multiplied by the sample size.
Thereafter you would perform the analysis in the way described in the following
section.
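The scaling of census figures to the sample size can be sketched in Python as follows. This is a minimal illustration only: the group names, census proportions and observed frequencies below are invented placeholders, not values from SETA.STA or from any census.

```python
# Hypothetical census proportions for four population groups.
# These are illustrative values only; they must sum to 1.
census_proportions = {
    "Group A": 0.50,
    "Group B": 0.25,
    "Group C": 0.15,
    "Group D": 0.10,
}

# Hypothetical observed frequencies, as one would read them off
# a frequency table for the sample.
observed = {"Group A": 180, "Group B": 110, "Group C": 70, "Group D": 40}

# The sample size is the total of the observed frequencies.
sample_size = sum(observed.values())

# Expected frequency = census proportion x sample size.
expected = {group: p * sample_size for group, p in census_proportions.items()}

for group in observed:
    print(f"{group}: observed={observed[group]}, expected={expected[group]:.1f}")
```

The observed and expected values printed here are the two columns that the goodness-of-fit analysis needs.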
Frequencies provided
For the Goodness-of-Fit test, one has to have both the observed frequencies and the
expected frequencies. These must be captured into a file, such that the observed
and expected frequencies are in two columns. The name of the category should be
placed in the first column. To exemplify this, I will use Example 2.1.1 about the
popularity of take-away food given in the notes on Revision of Basic Statistics. The
data file has one row per type of take-away food, giving its name, its observed
frequency and its expected frequency.
Once this file has been saved, to analyse the data do the following:
Switch to the STATISTICA module called Nonparametric Statistics.
From the menu, select the option Observed versus expected X². Click on OK.
In the next screen, indicate the names of the variables containing the observed
and expected frequencies. These then appear under the variable list, next to
with observed: and with expected:. Click on OK.
The next screen gives the results:
         observed    expected        O-E    (O-E)**2/E
C: 1      43.0000     64.0000   -21.0000       6.89063
C: 2     122.0000     64.0000    58.0000      52.56250
C: 3      84.0000     64.0000    20.0000       6.25000
C: 4      70.0000     64.0000     6.0000       0.56250
C: 5      38.0000     64.0000   -26.0000      10.56250
C: 6      27.0000     64.0000   -37.0000      21.39063
Sum      384.0000    384.0000     0.0000      98.21875
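The arithmetic behind this output can be checked by hand or with a short script. The sketch below is not STATISTICA code; it simply recomputes the last column and its sum from the observed and expected frequencies of the example, and assumes the usual degrees of freedom for a goodness-of-fit test (number of categories minus 1, with no parameters estimated from the data).

```python
# Observed and expected frequencies from the example (six categories).
observed = [43, 122, 84, 70, 38, 27]
expected = [64] * 6

# Contribution of each category to the statistic: (O - E)^2 / E.
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]

# The chi-square statistic is the sum of the contributions.
chi_square = sum(components)

# Degrees of freedom, assuming no parameters were estimated from the data.
df = len(observed) - 1

for category, (o, e, c) in enumerate(zip(observed, expected, components), start=1):
    print(f"C: {category}  O={o}  E={e}  O-E={o - e}  (O-E)^2/E={c:.5f}")
print(f"Chi-square = {chi_square:.5f}, df = {df}")
```

The final line reports a chi-square of 98.21875 with 5 degrees of freedom, matching the Sum row of the output.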
On the first screen, under either the Crosstabulations or the Stub-and-banner
table heading, click on Specify tables.
If you did this under the Crosstabulations heading, then 6 columns will appear,
listing all the variables of the file (use the first 2). If you did this under the
Stub-and-banner heading, then only 2 columns will appear. Click on the first desired
variable in the first column, and the second desired variable in the second
column. Thus, for Assignment 1, question 2c, you would click on
SEX in one column and LANG2 in the other, where LANG2 is a variable in which
you have recoded every language except 1 (English) and 2 (Afrikaans) as
missing. Click on OK.
You are taken back to the original screen. At this point, if you need to specify
that the analysis should be done for only a certain segment of the sample (e.g.
on Coloured adolescents only), click on the Select Cases button under the OK
button. If you do not need to select particular cases, skip the next step.
The Case Selection Conditions screen then opens. Click on the box next to
Enable Selection Condition. Next to the Include section click on Specific,
selected by. The By Expression and By case number boxes then become
active. In the By Expression box, type your condition; in this example,
PGP=3. In SETA, the variable PGP holds the population group of the
adolescents, and the code for Coloured is 3. Click on OK. This will take you
back to the Crosstabulations Tables screen again. Click on OK.
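What the case selection and the crosstabulation do to the data can be sketched in Python. The variable names SEX, LANG2 and PGP follow the text, but the records below are invented for illustration, and dropping cases with a missing LANG2 is an assumption about how missing values are handled.

```python
from collections import Counter

# Hypothetical records standing in for rows of the data file.
# LANG2 is None where the language was recoded as missing.
records = [
    {"SEX": 1, "LANG2": 1, "PGP": 3},
    {"SEX": 2, "LANG2": 2, "PGP": 3},
    {"SEX": 1, "LANG2": 2, "PGP": 3},
    {"SEX": 2, "LANG2": None, "PGP": 3},
    {"SEX": 2, "LANG2": 1, "PGP": 1},
    {"SEX": 1, "LANG2": 2, "PGP": 3},
]

# Mimic the selection condition PGP=3 and drop cases with missing LANG2.
selected = [r for r in records if r["PGP"] == 3 and r["LANG2"] is not None]

# Count the number of selected cases in each SEX x LANG2 cell.
crosstab = Counter((r["SEX"], r["LANG2"]) for r in selected)

for (sex, lang), count in sorted(crosstab.items()):
    print(f"SEX={sex} LANG2={lang}: {count}")
```

The cell counts produced this way are the observed frequencies on which the chi-square test is then computed.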
You are taken back to the Options page of the Crosstabulations Tables Results
screen. Here you can choose (among other options) whether you want the tables
to provide the frequencies as percentages, and can select what kind of statistics
you want. In this case we select the Pearson and M-L chi-square.
Select the Advanced page and click on Detailed two-way tables. Two screens
of results appear, one giving all the cell counts and percentages, and the other
giving the chi-square results.
Frequencies provided
Sometimes we are presented with the results of a crosstabulation, without the raw
data. Thus we would be given the frequency in each cell, as in Example 2.3.1 in the
notes on Revision of Basic Statistics. Recall, the frequencies given were:
             Southern Suburb   Northern Suburb   Atlantic Seaboard
Chinese             43                10                 45
Pizza              122               150                 60
Chicken             84                60                 58
Hamburger           70               230                 30
Pasta               38                30                 74
Fish                27                20                110
Create a file in which you have one variable (SUBURB) holding the suburb,
another (TYPE) holding the type of take-away, and a third (FREQ) holding the
frequency of the relevant cell. Each row of the file then represents one cell of
the table, giving 18 rows in this example.
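The layout of such a file can be sketched in Python by turning the frequency table into one row per cell. This is an illustration of the layout only; whether the suburbs and types are stored as text labels (as here) or as numeric codes is an assumption.

```python
# Column order of the frequency table given above.
suburbs = ["Southern Suburb", "Northern Suburb", "Atlantic Seaboard"]

# Frequencies from the table, one list per type of take-away.
table = {
    "Chinese":   [43, 10, 45],
    "Pizza":     [122, 150, 60],
    "Chicken":   [84, 60, 58],
    "Hamburger": [70, 230, 30],
    "Pasta":     [38, 30, 74],
    "Fish":      [27, 20, 110],
}

# One row per cell: (SUBURB, TYPE, FREQ).
rows = [
    (suburb, food, freq)
    for food, freqs in table.items()
    for suburb, freq in zip(suburbs, freqs)
]

print(f"{'SUBURB':<20}{'TYPE':<12}{'FREQ':>5}")
for suburb, food, freq in rows:
    print(f"{suburb:<20}{food:<12}{freq:>5}")
```

Because each row stands for many people rather than one, FREQ must later be declared as a case weight so that each row counts FREQ times in the analysis.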
On the first screen, under either the Multiway crosstabulation tables or the
Stub-and-banner table heading, click on Specify tables.
For the first variable in the first column click on SUBURB, for the second click on
TYPE. Click on OK.
You are taken back to the Crosstabulation Tables screen. Click the button next
to Use selected grouping codes only, then on Codes. The Select codes for
grouping factors screen will appear. Select All for each variable. Check that
the codes that appear are accurate. Click on OK.
Back in the Crosstabulation Tables screen, click on the button labelled W, next
to the Select Cases button. The Analysis/Graph Case Weights screen
appears. Click the buttons next to:
- Use weights for this analysis only
- Status: On
In the box labelled Weight Variable type the name of the variable holding the
frequencies of the cells. In the above example this was called FREQ. Click on
OK.
You will be taken back to the Crosstabulation tables screen. Once again, click
on OK.
The screen called Crosstabulation Results will appear. Select the Options page.
In the section headed Compute Tables, click on Percentages of total counts,
Percentages of row counts, and/or Percentages of column counts.
Under the section headed Statistics for 2-way Tables, click on Pearson & M-L
Chi-square.
Select the Advanced page and click the button labelled Detailed 2-way tables.
Two screens of results appear, one giving all the cell counts and percentages,
and the other giving the chi-square results:
2-Way Summary Table: Observed Frequencies (chi3)
Marked cells have counts > 10

            TYPE      TYPE      TYPE      TYPE        TYPE      TYPE      Row
            Chinese   Pizza     Chicken   Hamburger   Pasta     Fish      Totals
South          43       122        84        70         38        27       384
Column %    43.88%    36.75%    41.58%    21.21%     26.76%    17.20%
Row %       11.20%    31.77%    21.88%    18.23%      9.90%     7.03%
North          10       150        60       230         30        20       500
Column %    10.20%    45.18%    29.70%    69.70%     21.13%    12.74%
Row %        2.00%    30.00%    12.00%    46.00%      6.00%     4.00%
Atlantic       45        60        58        30         74       110       377
Column %    45.92%    18.07%    28.71%     9.09%     52.11%    70.06%
Row %       11.94%    15.92%    15.38%     7.96%     19.63%    29.18%
Totals         98       332       202       330        142       157      1261

Chi-square    df
363.6237      df=10 p=0.0000
              df=10 p=0.0000
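The quantities in these two result screens can be recomputed from the cell frequencies with a short script. The sketch below is not STATISTICA's own routine: it uses the textbook formulas for the Pearson and the maximum-likelihood (M-L) chi-square, and the helper chi2_sf_even_df is a hypothetical name for a closed-form upper-tail probability that is valid only when the degrees of freedom are even, as they are here.

```python
import math

# Observed frequencies: rows are suburbs, columns are types of take-away
# (Chinese, Pizza, Chicken, Hamburger, Pasta, Fish).
observed = [
    [43, 122, 84, 70, 38, 27],    # South
    [10, 150, 60, 230, 30, 20],   # North
    [45, 60, 58, 30, 74, 110],    # Atlantic
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Row and column percentages for every cell.
row_pct = [[100 * o / row_totals[i] for o in row] for i, row in enumerate(observed)]
col_pct = [[100 * o / col_totals[j] for j, o in enumerate(row)] for row in observed]

pearson = 0.0
ml = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count under independence: row total x column total / n.
        e = row_totals[i] * col_totals[j] / n
        pearson += (o - e) ** 2 / e
        # M-L (likelihood-ratio) contribution; every cell here is > 0.
        ml += 2 * o * math.log(o / e)

# Degrees of freedom: (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


print(f"Pearson chi-square = {pearson:.4f}, df = {df}, p = {chi2_sf_even_df(pearson, df):.4f}")
print(f"M-L chi-square     = {ml:.4f}, df = {df}, p = {chi2_sf_even_df(ml, df):.4f}")
```

With statistics this large on 10 degrees of freedom, the p-values are far below 0.0001, which is why they print as 0.0000 to four decimal places.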
Gillian Finchilescu
Revised January 2003