
Personality and Social Psychology Bulletin

http://psp.sagepub.com/ Inductive Reasoning and Judgment Interference: Experiments on Simpson's Paradox


Klaus Fiedler, Eva Walther, Peter Freytag and Stefanie Nickel Pers Soc Psychol Bull 2003 29: 14 DOI: 10.1177/0146167202238368 The online version of this article can be found at: http://psp.sagepub.com/content/29/1/14

Published by:
http://www.sagepublications.com

On behalf of:

Society for Personality and Social Psychology



Downloaded from psp.sagepub.com at University of Bucharest on April 22, 2013

10.1177/0146167202238368 PERSONALITY AND SOCIAL PSYCHOLOGY BULLETIN Fiedler et al. / SIMPSON'S PARADOX

ARTICLE

Inductive Reasoning and Judgment Interference: Experiments on Simpson's Paradox


Klaus Fiedler, Eva Walther, Peter Freytag, and Stefanie Nickel
University of Heidelberg

In a series of experiments on inductive reasoning, participants assessed the relationship between gender, success, and a covariate in a situation akin to Simpson's paradox: Although women were less successful than men according to overall statistics, they actually fared better than men at either of two universities. Understanding trivariate relationships of this kind requires cognitive routines similar to analysis of covariance. Across the first five experiments, however, participants generalized the disadvantage of women at the aggregate level to judgments referring to the different levels of the covariate, even when motivation was high and appropriate mental models were activated. The remaining three experiments demonstrated that Simpson's paradox could be mastered when the salience of the covariate was increased and when the salience of gender was decreased by the inclusion of temporal cues that disambiguate the causal status of the covariate.

Authors' Note: This research was supported by grants from the German Research Foundation (DFG) to the first author. The authors are grateful to Mark Schaller and Michael Waldmann for helpful comments on drafts of this article. Correspondence concerning this article should be addressed to Klaus Fiedler, Department of Psychology, University of Heidelberg, Hauptstrasse 47-51, 69117 Heidelberg, Germany; e-mail: kf@psychologie.uni-heidelberg.de. PSPB, Vol. 29 No. 1, January 2003 14-27. DOI: 10.1177/0146167202238368. © 2003 by the Society for Personality and Social Psychology, Inc.

Imagine a local newspaper revealed that the proportion of male applicants admitted to graduate programs at your department is higher than that of female applicants, suggesting that women are discriminated against. To your relief, closer inspection shows that women actually fare better than men when taking into account a third variable, for example, the distinction between two graduate programs, A and B. As it turns out, the apparent disadvantage of female applicants simply reflects that program B has a higher rejection rate than program A and that most women applied to program B, whereas most men applied to program A.

Simpson's paradox (Simpson, 1951), which underlies the above graduate admission problem, should not be dismissed as a far-fetched statistical constellation. It is as common as the fact that any correlation between two variables, x and y, can be (partially) explained with reference to a covariate, z. Most correlations can be reduced by the inclusion of an appropriate third variable. For example, we may attribute the success of social classes to living conditions, or we may explain gender differences in personality by the social roles predominantly taken by men and women (Hoffman & Hurst, 1990). In brief, understanding Simpson's paradox calls for the ability to deal with problems that logically correspond to covariance analysis (Schaller, 1992a, 1992b).

The ability to detect and understand the moderating impact of a third variable, z, on the contingency between two variables, x and y, can be considered a marker of higher order cognitive functioning. Trivariate problems involving the joint impact of two input variables (e.g., gender, working conditions) on an output variable (e.g., performance) call for the ability to withstand potentially fallacious bivariate interpretations that ignore the impact of one of the input variables or, to use a common term, to overcome the fundamental attribution error (Ross, 1977; Ross & Nisbett, 1991; Trope, 1986). However, it is important to note that Simpson's paradox is more difficult to handle than other two-factorial attribution tasks because the two factors interact in a particularly tricky fashion, such that noticing the impact of one factor (e.g., gender) can undo the perceived impact of the other (e.g., graduate program standards). In this regard, Simpson's paradox affords a rather demanding graduator of higher order cognitive functioning, which is at the heart of critical judgment, enlightenment, and emancipation from simplistic explanations.

To provide at least one illustration of this point, consider the case of stereotyping. As the subtyping paradigm demonstrates (e.g., Hewstone & Hamberger, 2000; Kunda & Oleson, 1995; Weber & Crocker, 1983), the stereotype that women are lower in leadership ability than men can be maintained in the face of counterstereotypical evidence by classifying obviously strong female leaders as atypical women. Here, a third variable is included to account for unexpected, counterstereotypical evidence. In other cases, such as the graduate admission problem, the stereotype is maintained by ignoring the third variable (i.e., the differential acceptance rates of programs A and B). Thus, stereotype maintenance often involves a negotiation with third variables, in that potentially meaningful third variables are either deliberately ignored or deliberately taken into account.
EMPIRICAL EVIDENCE

Although trivariate relationships are generally difficult to understand (Fiedler & Graf, 1990; Schaller, 1992a, 1992b; Schaller & O'Brien, 1992; Waldmann & Hagmayer, 1995), several findings suggest that Simpson's paradox can be mastered when people are strongly motivated (Schaller, 1992a), when mental models support the understanding of trivariate relationships (Fiedler & Graf, 1990; Waldmann & Hagmayer, 1995), or when statistical training sensitizes people to the importance of third variables (Schaller, Asp, Rosell, & Heim, 1996). In Schaller's (1992a) studies on the impact of motivation, for example, female rather than male participants recognized that gender differences, x, in anagram solving, y, simply reflected differences in the difficulty, z, of the anagrams solved by men (i.e., easy anagrams) versus women (i.e., difficult anagrams).

The facilitative impact of causal models, to illustrate another route to enhanced performance in trivariate reasoning tasks, has been documented by Waldmann and Hagmayer (1995). Participants in these studies were asked to judge the impact of irradiation, x, on the quality, y, of different kinds of fruit, z. Irradiation was associated with high quality when pooling over kinds of fruit, but the advantage of irradiation turned out to be spurious once two different kinds of fruit, A versus B, were distinguished. The point was that A was of higher quality than B and that a greater proportion of A had been irradiated. Participants recognized that the advantage of irradiation was spurious when the distinction between kinds of fruit was causally relevant (i.e., fruit differing in genetic makeup), but they failed to do so when the distinction was causally irrelevant (i.e., samples of fruit examined by different labs).

These findings suggest that Simpson's paradox can be solved under auspicious conditions, such as being personally involved or being equipped with an appropriate causal model. However, we question whether the reported facilitation effects provide unambiguous evidence for sound trivariate reasoning. What the evidence does show is that judgmental outcomes can shift from one factor (e.g., gender) to another (e.g., task difficulty). Although such a shift may be due to sound trivariate reasoning, it may as well reflect nothing but a heuristic switch from one explanatory focus to another. For example, motivated female participants may bias their judgments toward feminist convictions without mastering the epistemic conflict underlying the paradox, thus instantiating a case of goal-driven belief biases (Klauer, Musch, & Naumer, 2000).

OVERVIEW OF THE PRESENT INVESTIGATION

From our perspective, sound trivariate reasoning means understanding that conflicting relations can exist simultaneously. When the level of a relevant moderator variable is held constant, the superiority of female performance at each level may be recognized, but at the same time the inferiority of female performance at the aggregate level should be acknowledged as well. Appropriate dependent measures are needed to assess the awareness of such conflicting relations. The dependent measures employed in previous research (Schaller, 1992a, 1992b; Schaller et al., 1996; Schaller & O'Brien, 1992; Waldmann & Hagmayer, 1995), however, pertain to global judgments at the aggregate level (e.g., "Based on their anagram-solving performances, which group exhibited the most verbal intelligence?"; Schaller & O'Brien, 1992, p. 778). Such summary judgments cannot reveal full mastery of the paradox. We therefore include measures that address different levels of analysis by asking for estimates of the same relation at the aggregate level on one hand and at the different levels of a meaningful covariate on the other.

Our operational definition of sound inductive reasoning is that people recognize that diverging relations can exist simultaneously at different levels. Accordingly, simply discarding a primary factor (e.g., gender) and switching to a secondary factor (e.g., task difficulty) would be as inconclusive an index of sound trivariate reasoning as attending to the primary factor alone. More precisely, we introduce a distinction between three levels of reasoning performance:

1. Undifferentiated guessing means to ignore the third variable. For example, the inferiority of female performance at the aggregate level may be noticed and then generalized to judgments of male and female performance at split-level.
2. Inductive differentiation means to successfully encode the joint operation of both factors while failing to recognize divergent relations at aggregate versus split-levels. For example, the inferiority of male performance at either split-level may be noticed and then generalized to judgments of male and female performance at aggregate level.
3. Sound trivariate reasoning, finally, means to successfully encode the joint operation of both factors while recognizing divergent relations at the aggregate versus split-levels.
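The three levels can be made concrete with a small sketch. It is not part of the original study: the function name `classify_reasoning`, its inputs (a judge's estimated women-minus-men rejection difference at the aggregate level and at split-level), and the tolerance `tol` are illustrative assumptions. The sketch refers to the normal version of the paradox, in which women are rejected more often overall but not within universities.

```python
def classify_reasoning(aggregate_diff, split_diff, tol=5.0):
    """Map two judged rejection-rate differences (women minus men, in
    percentage points) onto the three-level taxonomy.

    tol is a hypothetical threshold below which a difference is treated
    as "no female disadvantage"; the article does not specify one.
    """
    disadvantage_aggregate = aggregate_diff > tol
    disadvantage_split = split_diff > tol

    if disadvantage_aggregate and disadvantage_split:
        # Aggregate-level relation generalized to the split-level.
        return "undifferentiated guessing"
    if not disadvantage_aggregate and not disadvantage_split:
        # Split-level relation generalized to the aggregate level.
        return "inductive differentiation"
    if disadvantage_aggregate and not disadvantage_split:
        # Divergent relations recognized at both levels.
        return "sound trivariate reasoning"
    # Disadvantage judged at split-level only; not covered by the taxonomy.
    return "unclassified"


print(classify_reasoning(aggregate_diff=12.0, split_diff=0.0))
```

A judge who reports a clear female disadvantage overall but none within universities would be classified as reasoning soundly under these assumed thresholds.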

With reference to this three-level conceptualization, the initial purpose of our research was to get a more refined picture of reasoning performance on Simpson's paradox and of the facilitative influence of motivation and mental models noted by Schaller (1992a, 1992b) and Waldmann and Hagmayer (1995). Would motivated judges equipped with a suitable mental model engage in sound trivariate reasoning? Would they recognize the discrepancy between the aggregate and the specific level? Would specific judgments reflect that the third variable had been taken into account, or would most judges resort to undifferentiated guessing? The first five experiments were originally guided by the optimistic expectation that appropriate mental models and motivation might induce trivariate reasoning proper. We faithfully report, in chronological order, the persistent failure to support this expectation, to let you witness the strong treatments and boundary conditions we employed in our attempt to improve performance. The remainder of the article will then be devoted to three additional experiments that may help understand the conditions that facilitate versus inhibit sound trivariate reasoning (as well as some discrepancies between the present and previous research). Because these results were not anticipated but rather reflect our struggle with the initial results, we defer their introduction to a theoretical reflection following a brief report of the first five experiments.
EXPERIMENT 1

We describe the first experiment in more detail than the subsequent ones to introduce the general method used in the present series of experiments. Using the graduate program admission problem as a constant setting, we asked participants to assess the contingency between gender (female vs. male) and graduate program admission (granted vs. denied) at different universities (A vs. B). Information about individual applicants was always presented sequentially, and the university variable always served the role of a covariate. At aggregate level, the stimulus series confirmed the expectation that women are discriminated against in professional settings. That is, the proportion of rejections was higher for women than for men. When the data were split by university, however, the proportion of men rejected was equal to or higher than that of women within either university. The point is that the rejection rate of one of the universities was higher than that of the other and that a greater proportion of women applied to the former, whereas a greater proportion of men applied to the latter. Thus, the unequal university standards could account for differences in the rejection rates.

According to prior research, appropriate mental models and enhanced motivation should facilitate reasoning. To manipulate the availability of mental models, we increased the salience of the unequal rejection rates of the two universities. For a manipulation of the motivation to account for differences in the rejection rates of men and women, we varied the external versus internal attribution of applicant success. In the external attribution condition, each stimulus informed participants whether a male or female applicant was accepted or rejected, thus framing rejection as an externally determined outcome. In contrast, in the internal attribution condition, an applicant was reported to have succeeded on the test or to have failed to meet the requirements, thus framing rejection as due to an internal cause (e.g., lack of ability). Participants believing in discrimination against women may be more willing to report that women tend to be rejected in the external attribution condition than in the internal attribution condition.

Method

Participants and design. Sixty female students of the University of Heidelberg participated either for course credit or for payment (DEM 7 = about U.S.$3). They were randomly assigned, with an equal-n constraint, to one of the four experimental groups resulting from the between-participants variables rejection rate salience (salient vs. not salient) and attribution (external vs. internal). Various contrasts between dependent measures were included as within-participants variables (see below).

Materials and equipment. The stimulus series consisted of 64 items describing the outcome of the applications of 32 women and 32 men (identified by a first name and an initial for the surname) to one of two universities, A or B. Of all women, 19 were rejected and 13 accepted, whereas only 13 men were rejected and 19 accepted. However, acceptance rates were equal or in favor of women (see upper part of Table 1) within both A (7/8 = 87.5% for women vs. 17/24 = 70.8% for men) and B (6/24 = 25% for women vs. 2/8 = 25% for men). New random orders and assignments of name initials to outcomes were generated for each session. Stimulus displays were generated by a computer program and projected onto the wall.

TABLE 1: Stimulus Distributions for the Versions of Simpson's Paradox Used in the Experiments

                             Aggregate Level        University A        University B
Admission                    Granted  Denied      Granted  Denied     Granted  Denied
Normal Simpson's paradox
  Men                           19      13    =      17       7    +      2       6
  Women                         13      19    =       7       1    +      6      18
Reversed Simpson's paradox
  Men                           16      16    =      16       8    +      0       8
  Women                         16      16    =       8       0    +      8      16
Modified Simpson's paradox
  Men                           12       4    =      10       3    +      2       1
  Women                          4      12    =       3       1    +      1      11

NOTE: The corresponding correlation coefficients are .19 (aggregate), .17 (University A), and 0 (University B) for the normal Simpson's paradox; 0, .31, and .33 for the reversed paradox; and .50, .02, and .58 for the modified Simpson's paradox.
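The structure of the normal paradox in Table 1 can be checked with a short sketch. It pools the cell counts over universities, computes acceptance rates, and derives a phi coefficient for each 2 x 2 table. The helper names (`pooled`, `acceptance_rate`, `phi`) and the sign convention (positive phi means men were admitted more often) are assumptions of this sketch, not taken from the article, whose note prints the coefficients without signs.

```python
from math import sqrt

# Cell counts (granted, denied) for the normal version of the paradox in Table 1.
NORMAL = {
    "A": {"men": (17, 7), "women": (7, 1)},
    "B": {"men": (2, 6), "women": (6, 18)},
}


def pooled(counts):
    """Sum the (granted, denied) counts over universities for each gender."""
    totals = {}
    for by_gender in counts.values():
        for gender, (granted, denied) in by_gender.items():
            g, d = totals.get(gender, (0, 0))
            totals[gender] = (g + granted, d + denied)
    return totals


def acceptance_rate(cell):
    """Proportion of applicants admitted in one (granted, denied) cell."""
    granted, denied = cell
    return granted / (granted + denied)


def phi(table):
    """Phi coefficient of a gender x admission table.

    Positive values mean men were admitted more often than women
    (a sign convention assumed for this sketch).
    """
    (a, b), (c, d) = table["men"], table["women"]
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


aggregate = pooled(NORMAL)
for label, table in [("aggregate", aggregate), ("A", NORMAL["A"]), ("B", NORMAL["B"])]:
    men = acceptance_rate(table["men"])
    women = acceptance_rate(table["women"])
    print(f"{label}: men {men:.3f}, women {women:.3f}, phi {phi(table):.2f}")
```

Under this convention the aggregate coefficient is positive, whereas the coefficient for University A is negative and that for University B is zero, which is the reversal the paradox rests on. The magnitudes agree with those given in the table note for the normal paradox.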

Procedure. Participants were received at the social psychology lab in groups of up to six and seated at separate tables. The general instructions started with the following cover story:
The present experiment is concerned with the chances that women have in academic professions. As you may know, there are many female students (over 50%) but the proportion of female professors is very low. This so-called pyramid effect may have several causes. For the present study, we have selected the case of male and female graduate students. Being granted admission to a graduate program enables a young person to gain a doctorate. If somebody seeks an academic career, getting a graduate position can be very important. We have observed men and women who applied for two different graduate programs in Germany. Since the precise data are confidential, we simply refer to University A and B.

At this point, rejection rate salience was manipulated. To render rejection rates salient (i.e., to activate a causally relevant mental model), participants were sensitized to the selection criteria of the universities in the following fashion:

These two universities differ markedly in their application standards. One of them is an elite university warranting high-quality education and sponsoring. This university can accept only a few applicants and must reject the majority. However, the other university accepts most applicants, while teaching at this university is not that intensive.

Moreover, a reminder was given at the end to keep in mind the different university standards. In the not-salient condition, this passage was omitted.

In the remainder of the instructions, an attempt was made to induce an internal versus an external attribution set. Participants were told that the information would refer either to people who were smart enough to succeed or who failed (internal attribution condition) or to people who were accepted or rejected by the university (external attribution condition). Then the stimulus presentation was started at a rate of 4.5 s per item. Each item consisted of (a) the applicant's first name (male vs. female) plus an initial for the surname, (b) a label for the university (A vs. B) to which the person had applied, and (c) the university's feedback on admission (granted vs. denied). These three pieces of information were given in three consecutive lines within a rectangle representing the decision letter. There was a 1-s interstimulus interval between successive displays in which the screen remained blank. In the internal attribution condition, the admission feedback referred to "Exam achievement" and read either SUCCEEDED or FAILED. In the external attribution condition, the admission feedback referred to the "Decision letter" and read either ACCEPTED or REJECTED.

Dependent measures. The dependent measures were administered in a questionnaire asking for several percentage estimates prompted by the following questions (in this order): What percentage of women have been rejected? What percentage of men have been rejected? What percentage of women have been rejected by University A? What percentage of men have been rejected by University A? What percentage of women have been rejected by University B? What percentage of men have been rejected by University B? What percentage of all applications have been rejected by University A? What percentage of all applications have been rejected by University B? What percentage of all women have applied

for University A and for University B? and What percentage of all men have applied for University A and for University B? In addition, we asked participants to indicate on 100-mm graphical rating scales whether they themselves believed that women were disadvantaged in general, that women were particularly disadvantaged in universities, whether they thought it was important to fight for women's rights, and whether they were involved in the women's movement. After completion of the dependent measures, participants were thanked, debriefed, and paid.

Results

Let us first consider the ratings for the bivariate relations between gender, admission, and the covariate, universities. As evident from Table 2, it was generally recognized that more female applicants were rejected than male applicants, that more rejections referred to B than to A, and, although to a lesser degree, that female and male applicants tended to apply to different universities. Several analyses of variance (ANOVAs) on percentage estimates with rejection rate salience and attribution as between-participant variables and applicant gender as well as university as within-participant variables corroborate these observations. The ANOVA of the estimated overall rejection rates yielded a highly significant main effect for applicant gender, F(1, 44) = 23.48, p < .001, but no other effect involving the between-participant variables. Participants in all conditions correctly noticed that more female than male applicants were rejected overall. Similarly, the ANOVA of the percentage estimates of applicants rejected by A versus B resulted in a single significant main effect for university, F(1, 44) = 28.59, p < .001, indicating that the different standards of A and B were generally noticed.

TABLE 2: Mean Percentage Estimates by Experimental Conditions (Experiment 1)

                                  Rejection Rates Salient     Rejection Rates Not Salient
Attribution                       External      Internal      External      Internal
Women rejected                     54.58         55.00         60.42         57.50
Men rejected                       40.83         44.58         46.50         48.75
Women - men rejected               13.75         10.42         13.92          8.75
Women rejected/A                   39.17         29.17         46.25         45.00
Men rejected/A                     27.92         27.25         35.42         32.33
Women rejected/B                   49.58         57.92         60.00         63.33
Men rejected/B                     42.08         47.33         50.42         51.25
Women rejected A + B               44.38         43.54         53.13         54.17
Men rejected A + B                 35.00         37.29         42.92         41.79
Women - men rejected A + B          9.38          6.25         10.21         12.38
A applicants rejected              38.75         36.08         41.67         45.83
B applicants rejected              53.75         60.00         62.50         63.33
A - B/female applicants             0.83          1.67         16.67          4.17
A - B/male applicants              11.67         10.83         12.50          2.50

NOTE: A and B represent University A and University B.

To examine participants' sensitivity to the university preferences of men and women, we calculated, separately for male and female applicants, the difference between the estimated percentage of applications to A and B (i.e., applications to A minus applications to B). If the actual preferences had been recognized, this difference should be positive for men and negative for women. Only the main effect for the within-participant variable was significant, F(1, 44) = 6.31, p < .05. Participants tended to recognize the higher proportion of women applying to B and of men applying to A, although less clearly than the other bivariate relations. No other effect was significant. The lack of any interaction involving the attribution and rejection rate salience variables suggests that all three bivariate contingencies were assessed quite accurately in all conditions. The premises for solving Simpson's paradox (successful encoding of all three variables) should thus be met.

We can now turn to the crucial test of the percentage estimates within universities. The estimated within-university rejection rates for men and women were summed over A and B and divided by 2. If judges correctly understood that women were in fact no less successful than men within universities, the applicant gender main effect should vanish on this score. However, as Table 2 reveals, these pooled estimates of female and male rejections within universities yielded the same strong main effect as the aggregate judgments, F(1, 44) = 18.90, p < .001. Women were judged to be less successful than men. This (erroneous) tendency did not interact with the between-participant variables.

Of interest, the failure to solve Simpson's paradox was correlated with participants' belief that women are disadvantaged. The sum score computed from the three relevant ratings (internal consistency = .76) correlated positively (!) with the judged inferiority of female applicants, both at aggregate level (r = .38) and at split-level (r = .36). Positive relations also were obtained with active involvement in the women's movement (r = .36 and r = .29, respectively). All of the reported correlations were significant at the 5% level. Thus, the more participants believed in discrimination, the more erroneous the judgments they gave.

Discussion

These results suggest that even though all bivariate relations were extracted quite accurately, and despite the measures taken to highlight the differing university standards and to make an internal attribution of applicant success more likely, no evidence pointed to sound trivariate reasoning. A similarly strong female disadvantage was perceived on aggregate and, erroneously, within universities. With regard to our three-level taxonomy of inductive reasoning, participants not only failed to reach the uppermost level of sound trivariate reasoning but also failed to reach the level of inductive differentiation, in that performance hardly ever exceeded the level of stereotypical guessing. Closer inspection of Table 2 reveals that the judged difference of male and female rejection rates was at least somewhat reduced within universities. However, this reduction did not reach significance in a four-factorial ANOVA including the comparison of overall versus within-university judgments as a further within-participants variable, F(1, 44) = 2.74, ns, for the Levels × Applicant Gender interaction.

In sum, our results do not support the expectations derived from the work of Schaller and colleagues (Schaller, 1992a, 1992b; Schaller & O'Brien, 1992) and the work of Waldmann and Hagmayer (1995). The observed discrepancies, however, may be attributable to differences in the experimental procedures. Unlike Waldmann and Hagmayer, for example, we used sequential rather than tabular stimulus presentation. In addition, we used different presentation rates and different motivators than did Schaller and colleagues. However, the poor performance we obtained may as well reflect the use of dependent measures that allow for a more refined assessment of inductive reasoning as distinct from inductive differentiation and guessing.
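The pooled within-university score used in the crucial test of Experiment 1 can be retraced from the cell means in Table 2. The sketch below does this for one condition (external attribution, rejection rates salient). The dictionary layout and the helper name `pooled_within` are illustrative assumptions; the numbers are the reported means.

```python
# Mean estimates (%) from Table 2: external attribution, rejection rates salient.
estimates = {
    "women": {"overall": 54.58, "A": 39.17, "B": 49.58},
    "men": {"overall": 40.83, "A": 27.92, "B": 42.08},
}


def pooled_within(est):
    """Sum the within-university estimates over A and B and divide by 2."""
    return (est["A"] + est["B"]) / 2


# Judged female disadvantage (women minus men) at the aggregate level.
aggregate_gap = estimates["women"]["overall"] - estimates["men"]["overall"]

# Judged female disadvantage at split-level, pooled over universities.
split_gap = pooled_within(estimates["women"]) - pooled_within(estimates["men"])

print(f"aggregate gap: {aggregate_gap:.2f}")
print(f"split-level gap: {split_gap:.2f}")
```

If judges had taken the covariate into account, the split-level gap should be close to zero. In this condition it remains positive and only somewhat smaller than the aggregate gap, in line with the pattern described above.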
EXPERIMENTS 2 TO 5

19

Goal-driven versus open reasoning. The manipulations we have used to manipulate the activation of mental models (i.e., rejection rate salience) and participant motivation (i.e., attribution) may not have been strong enough to induce motivated reasoning (Friedrich, 1993; Gigerenzer & Hug, 1992). An explicit processing goal may be needed to counteract the impression that women are less successful than men. Thus, in Experiment 3 the task was modified such that participants were asked either to determine whether the assumption is true that women are inferior or to find out why the assumption must be false that women are inferior. Note that the latter version is close to blatant demand. Paradox type. Another reason for low performance may be offered by the often-noted difficulty to detect zero correlations (Petersen, 1978) or the difficulty to suppress a stereotype (Macrae, Bodenhausen, Milne, & Wheeler, 1996). In the normal Simpsons paradox (see the upper part of Table 1), the stereotype that more women are rejected than men is clearly supported at aggregate level, whereas solving the puzzle requires recognizing a zero correlation at the specific level. Experiment 4 thus compared the normal paradox with a reversed Simpsons paradox (see the midpart of Table 1), in which the correlation was zero at the aggregate level, whereas it was in favor of women at split-levels. Stereotype consistency. Similarly, the lack of trivariate reasoning and the adherence to crude guessing might reflect the power of an overwhelming gender stereotype. We thus reversed the role of men and women in Experiment 5 such that the phenomenon to be explained was counterstereotypical (i.e., a male disadvantage). Presentation rate. Schaller and OBrien (1992) showed enhanced performance with a presentation duration of 10 s per item as opposed to the 4.5 s per item used in Experiment 1. Accordingly, we varied the presentation rate in Experiment 5. Method Participants and design. 
The overall numbers of participants in Experiments 2 to 5 are 24, 60, 60, and 56, respectively. They participated either for course credit or for payment (DEM 7 = about U.S.$3). Participants were randomly assigned to experimental groups with an equal-n constraint. Table 3 summarizes which variables were manipulated in which experiment. Specifically, attribution was manipulated in Experiment 2, Experiment 3 included manipulations of mental model activation and processing goal, Experiment 4 involved manipulations of paradox type (normal vs. reversed) and mental model activation, and the design of Experiment 5 included manipulations of presentation rate and stereotype consistency. Up to Experiment 3, only female participants

Before drawing premature conclusions, we have to rule out alternative explanations. Experiments 2 to 5 constitute attempts to render our manipulations of the activation of appropriate mental models and of reasoning motivation more effective and to eliminate procedural differences from prior experiments. Because the basic method remained invariant, Experiments 2 to 5 are reported together in a summary format. The following factors were varied to clarify the results of Experiment 1 and the discrepancy from previous findings. Attention to the covariate. The comparably low F value in Experiment 1 for the rated number of women versus men applying to A versus B suggests that some participants might have failed to encode which university people had actually applied to (see Schaller & OBrien, 1992, Experiment 2, for a similar problem). To force participants to notice the university-applicant association, Experiments 2 to 5 were conducted in computer dialog and participants had to direct a request at the university to which an applicant had applied.

Downloaded from psp.sagepub.com at University of Bucharest on April 22, 2013

20

PERSONALITY AND SOCIAL PSYCHOLOGY BULLETIN


TABLE 3: Overview of Boundary Conditions Introduced in Experiments 2 to 5 Experiment Boundary Condition Attention to the covariate Warranted through active request Attribution Internal External Mental model Activated (reminder for differing standards) Not activated (no reminder) Processing goal Test female inferiority hypothesis Falsify female inferiority hypothesis Paradox type Normal (see upper part of Table 1) Reversed (see midpart of Table 1) Stereotype consistency Consistent (at aggregate level) Inconsistent (at aggregate level) Presentation rate 4.5 s per item 12 s per item 2 3 4 5

were used. Afterward, both male and female participants were permitted. Because participant gender never affected the results, all reported statistics were collapsed over the levels of this variable.

Materials and procedure. Participants were received at the social psychology lab in groups of up to six and seated in separate cubicles equipped with a PC. The same questionnaire as in Experiment 1 was used to assess the dependent measures. In addition, the same basic instructions and the same task setting were used as before, except that participants were seated in separate cubicles equipped with a PC. In this fashion, a modified procedure during stimulus presentation could be implemented that served to ensure efficient encoding of the university to which people applied. Each trial started with an indication of the applicant's gender and of the university to which he or she had applied. The outcome of the application, however, was not included. To receive this feedback, participants had to direct a request at the appropriate university by pressing the A or the B key on the keyboard. If the keystroke did not match the university, the computer responded with a beep. After completion of the dependent measures, participants were thanked, debriefed, and paid. Each experiment employed subsets of the treatments described below (see also Table 3).

Attribution. In the external attribution condition, the wording of the feedback was "Decision letter: ACCEPTED/REJECTED," whereas in the internal attribution condition, the wording of the feedback was "Exam achievement: SUCCEEDED/FAILED."

Mental model activation. To activate an appropriate mental model, an extra reminder was given that A and B may have different standards. Beginning with Experiment 3, an additional hint suggested that "women are striving for ambitious goals," thus highlighting their preference for the prestigious University B and offering an alternative interpretation for their seemingly weak performance.
Processing goal. The last instruction paragraph in the neutral hypothesis condition read,
Your task is to find out whether the assumption is true that women are inferior. The question is whether or not this assumption is supported by the data you will be presented with soon. Later on, you will be asked to give an account of your responses.

NOTE: Entries indicate whether a specific boundary condition was realized () or not ().

In contrast, in the falsification goal condition, this passage of the instruction was replaced by the following text: Your task is to find out why the assumption must be false that women are inferior. Later on, you will be asked to give an account of your responses.

Paradox type. The stimulus series conformed either to the normal Simpson's paradox or to the reversed Simpson's paradox (see the upper part and the midpart of Table 1, respectively).

Stereotype consistency. In the stereotype-consistent condition, information at aggregate level (in the normal paradox) was consistent with the stereotypical expectation that women are discriminated against in professional settings. That is, more female than male applicants were rejected at the aggregate level. In the stereotype-inconsistent condition, the role of male and female target persons was reversed. That is, a larger proportion of women than of men were accepted at the aggregate level.

Presentation rate. Stimuli were presented at a rate of either 4.5 s or 12 s per item.

Results

The mean percentage estimates on all relevant dependent measures are summarized in Table 4, as a function of experimental conditions. Across all experiments, a rather stable pattern of results emerged that was largely independent of all between-participants variables. Aggregate level estimates reflect a strong and stable tendency to report higher rejection rates for women than for men. (Note that these judgments are veridical only for the normal Simpson's paradox and in the stereotype-consistent conditions.) The corresponding main effects for the comparison of female versus male applicants (first two rows in Table 4) were significant in Experiment 2, F(1, 22) = 28.37, p < .001; in Experiment 3, F(1, 56) = 65.75, p < .001; in Experiment 4, F(1, 56) = 41.90, p < .001; and in Experiment 5, F(1, 52) = 5.89, p < .05. The somewhat reduced effect in Experiment 5 is due to the stereotype-inconsistent condition. When the roles of female versus male applicants were reversed so that more women were accepted on aggregate, men and women received similar ratings, reflecting a kind of compromise between stereotypical beliefs and veridical differences. There was general agreement on the other two bivariate relations regarding the differential rejection rates of A versus B, F(1, 22) = 40.47, F(1, 56) = 49.55, F(1, 56) = 28.37, and F(1, 52) = 24.62, for Experiments 2 to 5, respectively (all ps < .001). Likewise, it was generally recognized that men and women tended to apply to different universities, F(1, 22) = 25.57, F(1, 56) = 52.18, F(1, 56) = 46.79, and F(1, 52) = 14.24, for Experiments 2 to 5, respectively (all ps < .001).

TABLE 4: Mean Percentage Estimates by Experimental Condition (Experiments 2 to 5)

                        Experiment 2        Experiment 3: Processing Goal
                        Attribution         Test                Falsify
                                            Mental Model        Mental Model
                        External  Internal  Activ.    Not       Activ.    Not
Fs rejected             56.25     60.42     59.33     57.07     66.80     61.20
Ms rejected             37.92     31.67     43.33     41.60     53.80     40.67
Fs-Ms rejected          18.33     28.75     16.00     15.47     13.00     20.53
Fs rejected A+B         47.92     45.63     49.00     48.67     53.33     49.15
Ms rejected A+B         35.50     35.00     43.67     37.00     47.33     44.20
Fs-Ms rejected A+B      12.42     10.63     5.33      11.67     6.00      4.95
A applicants rejected   25.83     32.92     40.00     32.00     43.33     29.67
B applicants rejected   63.75     59.17     54.33     62.33     66.33     61.60
A-B Fs rejected         22.83     32.50     18.67     22.00     3.33      28.93
A-B Ms rejected         24.17     26.42     23.33     22.67     8.00      34.00

                        Experiment 4: Paradox Type              Experiment 5: Presentation Rate
                        Reverse             Normal              12 s                4.5 s
                        Mental Model        Mental Model        Stereotype          Stereotype
                        Activ.    Not       Activ.    Not       Cons.     Inc.      Cons.     Inc.
Fs rejected             55.60     48.67     47.87     63.87     52.71     45.00     45.93     41.07
Ms rejected             39.20     39.33     34.67     48.80     44.64     46.43     38.86     43.43
Fs-Ms rejected          16.40     9.34      13.20     15.07     8.07      -1.43     7.07      -2.36
Fs rejected A+B         49.00     40.83     46.07     56.03     49.39     39.11     42.43     41.79
Ms rejected A+B         42.00     39.33     36.87     44.73     44.29     37.75     35.68     38.54
Fs-Ms rejected A+B      7.00      1.50      9.20      11.30     5.10      1.36      6.75      3.25
A applicants rejected   35.67     37.67     35.53     47.13     36.71     34.14     42.50     36.79
B applicants rejected   59.33     53.67     50.00     61.67     64.00     64.00     50.71     59.86
A-B Fs rejected         11.00     18.00     21.33     10.00     10.14     2.68      1.61      3.93
A-B Ms rejected         14.33     19.33     22.13     10.67     10.46     3.39      1.61      3.93

NOTE: F = female; M = male; A = University A; B = University B; Activ. = activated; Not = not activated; Cons. = consistent; Inc. = inconsistent.

The crucial test of trivariate reasoning is whether participants recognized that the aggregate relationship between rejection and target gender was eliminated or reversed at split-level. If the role of the third variable was fully understood, judgments at split-level should be sensitive to the rejection rates within universities. Again, we took the average of the two estimates within University A and B (see rows 4 and 5 in Table 4). The degree to which women were judged to be inferior was occasionally somewhat lower than at the aggregate level, but a strong and

stable tendency remained to report more rejections for female than for male applicants. The target gender main effect amounts to F(1, 22) = 25.57, p < .001, F(1, 56) = 20.99, p < .001, F(1, 56) = 15.35, p < .001, and F(1, 52) = 7.55, p < .01, for Experiments 2 to 5.

Discussion

Thus, across all experiments reported so far, a robust bias is maintained to report higher rejection rates for women than for men, with little sensitivity to differences between the aggregate versus split-levels. Although this tendency seems to reflect the actual advantage men have over women at the aggregate level, the results obtained in the stereotype-inconsistent condition suggest that stereotypical guessing also was involved. Within the present task setting, then, there is little evidence for a mastery of Simpson's paradox that goes beyond the most primitive level of undifferentiated guessing. Although the task was obviously meaningful and involving and although rather accurate bivariate judgments testify to our participants' attention and cooperation, hardly anybody was able to report the correct contingencies at the specific level. The persistence of this negative finding even under highly facilitative conditions with almost blatant extra reminders is quite surprising. Participants were sensitized to the higher rejection rate of University B. They found out that most women tended to apply for B, where most applicants were rejected. The additional hint that women are striving for ambitious goals provided an obvious link between the rejection of women and the high standard of the university to which they preferred to apply. Some participants were even explicitly asked to explain why it could not be true that women are inferior. All of these treatments failed to improve inductive reasoning.

Theoretical Reflection

Given such persistent failure, the interpretation suggests itself that the experimental task may have been too complex. Rather than arriving at such a pessimistic conclusion, however, we kept on searching for alternative accounts. After all, in other experiments on trivariate reasoning we ran recently, participants reached at least the level of inductive differentiation (Fiedler, Walther, Freytag, & Stryczek, 2002; Fiedler, Walther, & Nickel, 1999). These findings, together with those obtained in the studies reviewed in the introduction, suggest that the encoding of trivariate structures does not exceed human cognitive capacity. If we want to keep this more optimistic perspective, what inhibiting factors can we name to explain the persistent lack of evidence for sound trivariate reasoning? We devote the remaining three experiments to empirical answers to this question. In doing so, we will draw on the notion of judgment interference on the one hand and on nonstatistical cues that can help disambiguate ambiguous statistical patterns on the other.

As to the notion of judgment interference, it seems reasonable to ask whether we can rule out that the very format and ordering of the judgment tasks may have kept our participants from expressing their knowledge, although observations pertaining to the different levels of the gender, admission, and university variables may have been encoded effectively. For example, participants may have focused on the impact of gender while neglecting the impact of the university variable because aggregate judgments pertaining to gender always preceded split-level judgments pertaining to universities.

Prompted by the initial task (i.e., "What percentage of women have been rejected?"), judges might have scanned memory for stimuli involving female applicants and responded, say, 65%. A few moments later, when asked to estimate the percentage of women rejected within A and B, respectively, the initial judgment and the associated sample may still have been active in working memory. It would not really be much of a surprise if these split-level judgments were biased toward the preceding aggregate judgments. Having just thought of all female applicants, and given that most women applied to B, why should participants not rely on the same sample when judging the fate of female applicants within B? Moreover, judgment interference processes of this kind seem particularly likely when judging probabilities rather than absolute frequencies (e.g., Fiedler, 1988; Gavanski & Hui, 1992; Gigerenzer & Hoffrage, 1995). Having just estimated the overall percentages of rejected women and men to be 65% and 35%, respectively, should interfere with percentage estimates for the specific universities located in completely different ranges of the percentage scale. Such constraints might be much weaker for estimates of cardinal frequencies. Overall rejection frequencies of 19 out of 32 women and 13 out of 32 men might appear rather consistent with subsequently judged split-level frequencies of 1 out of 8 women versus 7 out of 24 men rejected at A and of 18 out of 24 women versus 6 out of 8 men at B.

Because judgment interference could not occur in the earlier studies conducted by Schaller (1992a, 1992b) and Waldmann and Hagmayer (1995), which asked for aggregate ratings only, this factor suggests a possible explanation for the diverging results. Likewise, the improved performance in our own experiments (Fiedler et al., 2002) was obtained using a frequency format. To the extent that judgment interference is at work, performance should increase when such interference is eliminated. In Experiment 6, the initial aggregate ratings were therefore omitted and frequency estimates rather than probability estimates of specific event combinations were called for. Under these conditions, performance should reach at least the level of inductive differentiation (i.e., correct reproduction of split-level frequencies). Of course, with aggregate ratings omitted, one cannot demonstrate the highest level of sound trivariate reasoning. We will therefore return to judgments at both levels in Experiments 7 and 8, using a reversed task order to minimize interference. If avoiding aggregate gender ratings in the primacy position is crucial, the highest level of sound trivariate reasoning might be possible if split-level estimates precede aggregate judgments. In Experiments 7 and 8, we also manipulated aspects of the experimental task that may affect the degree to which participants focus exclusively on the gender variable.

In particular, we examined the role of causal cues in disambiguating the pattern of correlations underlying Simpson's paradox. Even when all event combinations are assessed perfectly, the crucial ambiguity remains: Was the female rejection rate caused by the high rejection rate of B? Or did the high rejection rate of B reflect the inferiority of the predominantly female applicants? Statistical assessment alone cannot provide an answer. Additional, nonstatistical cues may be needed. One such source of nonstatistical information may consist in temporal order cues. Whether admission outcomes are perceived to originate in gender or universities may depend on which factor is introduced first. Thus far, the stimulus display always favored the encoding of applicant gender over that of university. Within such a causal-temporal frame, gender is the antecedent, whereas universities may rather serve the role of a mediator of a primary gender effect. Presenting the university before gender might induce a different causal frame. The university variable may then serve the role of an antecedent and function as a moderator that offers an explanation of qualitatively different gender effects at its different levels. Recent evidence (Fiedler et al., 2002) suggests that, compared to a mediator frame, such a moderator frame facilitates reasoning about simultaneously operating factors (cf. Baron & Kenny, 1986).
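The arithmetic behind the cardinal frequencies quoted in the theoretical reflection can be checked with a short sketch. It is only an illustration, not part of the reported method: the function name `rejection_rate` and the dictionary `COUNTS` are hypothetical, and the only data used are the four split-level counts given in the text (1 of 8 women and 7 of 24 men rejected at A; 18 of 24 women and 6 of 8 men rejected at B). Pooling these counts gives 32 applicants per gender.

```python
from fractions import Fraction

# (rejected, applied) counts for the normal paradox, as quoted in the text.
# The dictionary layout is an assumption made for this illustration.
COUNTS = {
    ("women", "A"): (1, 8),
    ("men", "A"): (7, 24),
    ("women", "B"): (18, 24),
    ("men", "B"): (6, 8),
}


def rejection_rate(gender, university=None):
    """Exact rejection rate for one gender, within a university or pooled."""
    cells = [
        cell
        for (g, u), cell in COUNTS.items()
        if g == gender and university in (None, u)
    ]
    rejected = sum(r for r, _ in cells)
    applied = sum(n for _, n in cells)
    return Fraction(rejected, applied)


for level in ("A", "B", None):
    label = level or "A+B"
    women = rejection_rate("women", level)
    men = rejection_rate("men", level)
    print(f"{label:>3}: women {women} ({float(women):.1%}), "
          f"men {men} ({float(men):.1%})")
```

Exact fractions are used so that the comparison between levels is not blurred by rounding: women are rejected less often than men within A, equally often within B, and more often once the two universities are pooled.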
EXPERIMENT 6


of the only between-participants variable, paradox type, with an equal-n constraint. Materials and procedure. Participants were received at the psychology lab in groups of up to eight and seated in separate cubicles equipped with a PC. Instructions were essentially the same as in previous experiments except for the cover story used to embed the student admission problem in a gambling machine context. Participants learned that applicant gender would be represented by two alternative buttons mounted on each of two gambling machines, which in turn would serve to represent the two universities, A and B. Two gambling machines were used to represent the two universities. One machine was represented by a rectangular frame on the left half of the screen (representing university A), whereas the other was represented by a rectangular frame on the right half of the screen (representing university B). Each gambling machine featured a blue and a yellow button used to represent applicant gender, and individual gambles could either result in a win (an array of stars appearing in the outcome display of the machine) or in a loss (an array of random symbols appearing in the outcome display of the machine). On each trial during stimulus presentation, the frame representing one of the gambling machines was highlighted. After 3 s, the yellow or the blue button started to blink (identifying the applicants gender) and a male versus female applicant name appeared. After another 2 s, feedback about acceptance versus rejection appeared in the outcome display of the machine. The feedback remained on the screen for 5 s. The assignment of the blue versus yellow buttons to applicant gender was counterbalanced and did not affect any of the results. Dependent measures. 
Upon completion of the last trial, participants estimated, separately on successive screens, how many women applied to A and how many of these had been accepted and how many men had applied to A and how many of these had been accepted. The same block of four frequency estimates was then repeated for B. Finally, the implicit measure was administered, involving 3 trials each for female and male applicants to A or B, thus yielding a total of 12 prediction task trials presented in random order. On each trial, participants could express their expectations regarding acceptance by staking a variable amount of money (ranging from DEM 0.00 to DEM 0.30, i.e., from U.S.$0.00 to U.S.$0.15) on the applicants success. The amount at stake was doubled and paid as an extra reward for participation if the applicant was indeed accepted and was lost if the applicant failed. Whatever amount participants had subtracted from the maximum stake would be theirs independent of the applicants fate. The amount of money put at stake was considered an implicit measure of expected applicant

The same material was used as in the preceding experiments, except for modifications of the judgment task intended to reduce interference. First, a frequency scale was used instead of a percentage scale. Second, only split-level judgments were called for (i.e., success of women vs. men applying to A vs. B), but no aggregate level judgments. The design included the normal paradox as well as a modified paradox (see bottom part of Table 1) with an even more differentiated stimulus distribution (i.e., a higher rejection rate for female applicants on aggregate, equal rejection rates within A and a higher rejection rate for male applicants within B). We expected that judgments should convey these differentiations quite accurately. In addition to the explicit frequency estimation task, we also administered a prediction task that yields an implicit index of the perceived rejection rates for male and female applicants at the two universities. On each trial of the prediction task, a novel male or female target applied to one of the universities and judges had to predict whether the applicant would be accepted or rejected. The prediction task comprised 12 trials, that is, 3 trials each for every combination of the applicant gender and university variables. On each trial, participants could express their expectations regarding acceptance by staking an amount of up to DEM 0.30 (U.S.$0.15) on the applicants success. The amount at stake was doubled if the applicant was indeed accepted. If a participant believed that an applicant would definitely be accepted, a rational strategy would be to put the maximum amount of DEM 0.30 (U.S.$0.15) at stake. If a participant believed that an applicant would definitely be rejected, a rational strategy would be not to put any money at stake. To render the prediction task more sensible, stimulus presentation was modified accordingly (see the Method section below for more details). Method Participants and design. 
Twenty-six male and female students of the University of Mannheim participated for payment. Participants were randomly assigned to the normal and the modified paradox condition (see the upper part and the bottom part of Table 1, respectively)


success as a function of applicant gender and university. No feedback on actual admission was given until the end of the prediction task. After the prediction task, participants were thanked, debriefed, and paid. Results and Discussion Indices of the explicit and implicit assessment of male and female rejection rates are summarized in Figure 1 as a function of paradox type. For comparison purposes, both frequency estimates as well as betting data were transformed to proportions. That is, the estimated frequency of rejections was divided by the estimated number of applicants, and the amount of money put at stake was divided by the maximum amount possible. The pattern of results is simple and straightforward. Assessments of all combinations of the gender and university variables were sensitive to the objectively presented stimulus frequencies. Aside from the normal regression tendency observed for estimates referring to extremely low and extremely high frequencies, subjective estimates reflected the objective values quite accurately. This is especially true for the implicit measure. Two-factorial ANOVAs on the frequency estimates, conducted separately for either paradox type, with applicant gender and university as repeated-measures variables corroborate this impression. For the normal paradox, the analysis for the frequency estimates yielded a significant main effect for university, F(1, 24) = 15.56, p < .001, but no other effects (all Fs < 1.7), thus correctly reflecting the fact that no gender difference remained within the two universities, which in turn were perceived to differ strongly from each other. On the implicit measure, we found a main effect for university, F(1, 24) = 36.88, p < .001, that was accompanied by an applicant gender main effect, F(1, 24) = 5.31, p < .05. 
The interaction term was also significant, F(1, 24) = 8.86, p < .01, reflecting a slightly larger proportion of estimated rejections for women applying to A and a slightly smaller proportion of estimated rejections for women applying to B. However, of importance, the generalized tendency to overestimate the rejection rate for women within both universities was largely eliminated. For the modified paradox, frequency estimates correctly reflected the outstanding rejection rate for one factor combination, female applicants at B, as manifested in significant main effects for university, F(1, 24) = 11.92, p < .01, and applicant gender, F(1, 24) = 13.05, p < .01, and a significant interaction term, F(1, 24) = 6.37, p < .05. The same pattern was obtained in the ANOVA on the implicit measure, Fs(1, 24) = 13.08, 19.27, and 4.78, p < .01, p < .001, and p < .05, respectively. Exactly such a pattern is typically produced by ANOVA when there is one deviating cell in a 2 × 2 design (Rosenthal & Rosnow, 1985). Thus, even when success and failure varied with

Figure 1 Objective percentages of rejected male versus female applicants, explicitly and implicitly assessed estimates of these percentages (transformed to percentages) by university and paradox type (Experiment 6).
[Two panels, Normal Paradox and Modified Paradox. Each panel shows Objective, Explicit, and Implicit values for Female A, Male A, Female B, and Male B on a percentage axis with ticks from 20 to 100.]

universities in a complex manner, eliminating judgment interference led to improved performance up to the level of inductive differentiation at split-level.
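The remark that one deviating cell in a 2 × 2 design produces two main effects plus an interaction can be illustrated with a small contrast calculation. This is a sketch under stated assumptions: the cell means below are hypothetical round numbers chosen for illustration (they are not the Table 1 values or the reported estimates), and `effect` is a helper introduced here, not a function from the original analysis.

```python
# Hypothetical rejection proportions: three identical cells, one deviating cell.
CELL_MEANS = {
    ("female", "A"): 0.25,
    ("male", "A"): 0.25,
    ("female", "B"): 0.75,  # the single deviating cell
    ("male", "B"): 0.25,
}


def effect(sign):
    """Half the signed sum of the cell means, i.e. a standard 2 x 2 contrast."""
    return sum(sign(g, u) * m for (g, u), m in CELL_MEANS.items()) / 2


# Main effect of gender: female marginal mean minus male marginal mean.
gender_effect = effect(lambda g, u: 1 if g == "female" else -1)
# Main effect of university: B marginal mean minus A marginal mean.
university_effect = effect(lambda g, u: 1 if u == "B" else -1)
# Interaction: gender difference within B minus gender difference within A, halved.
interaction_effect = effect(
    lambda g, u: (1 if g == "female" else -1) * (1 if u == "B" else -1)
)

print(gender_effect, university_effect, interaction_effect)
```

With these assumed means all three contrasts have the same size, which is why a single outstanding cell shows up as two main effects and an interaction rather than as one isolated effect.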
EXPERIMENTS 7 AND 8

Can performance in the graduate program admission context even reach the highest level of sound trivariate reasoning? For an experimental test, of course, it is necessary to return to the assessment of both aggregate level and split-level judgments. Accordingly, we returned to our original task surface while trying to keep judgment


interference low by soliciting split-level judgments before turning to the aggregate level. In addition, the frequency format was replaced by graphical ratings in Experiment 7 and a probability (percentage) format in Experiment 8. We expected that performance might be retained due to reversed task order. If so, the frequency format may not be necessary to obtain sound trivariate reasoning. In Experiment 7, we also manipulated the temporal order in which information about applicant gender and university appeared in the stimulus display. As explained above, the causal model induced by a moderator frame (i.e., university cues prior to gender cues) should facilitate the recognition of an independent causal role of the covariate and should therefore lead to higher performance than a mediator frame (i.e., gender cues prior to university cues). Under the latter frame, the higher rejection rate of B than of A might be conceived as simply mediating the effects of applicant gender.

Method

Participants and design. In Experiment 7, 43 participants were presented with the normal Simpson's paradox; in Experiment 8, 40 participants were presented with the reversed paradox. Participants were male and female undergraduate students at the University of Heidelberg who volunteered without payment. Recall that the normal paradox is characterized by a female disadvantage on aggregate and little difference within universities, whereas the reversed paradox involves an equal distribution on aggregate and a marked female advantage within universities.

Materials and procedure. The same materials and procedures were applied as in the first five experiments, except for the following modifications. First, the experiments were conducted in a lecture hall, using the same stimulus display as in the first experiments (i.e., by projecting the stimulus series onto the wall).
Second, all ratings were made on 100-mm graphical scales, with split-level estimates preceding aggregate level estimates. In Experiment 7, but not in Experiment 8, participants also indicated their own attitudes concerning discrimination against women in academic contexts in particular and in society in general.

Results and Discussion

For the first time, there was a marked divergence of judgments at aggregate level from judgments at the split-level. In Experiment 7, using the normal paradox, the higher aggregate rejection rate for female applicants was correctly reflected in the mediator condition (M_female = 61.2, M_male = 38.5) as well as the moderator condition (M_female = 57.8, M_male = 39.8). In contrast, split-level estimates accurately reflected the rejection rates of female versus male applicants within A (in the mediator condition M_female = 8.8, M_male = 9.9; in the moderator condition M_female = 10.6, M_male = 10.3) and almost accurately reflected the rejection rates of female versus male applicants within B (in the mediator condition M_female = 20.0, M_male = 13.7; in the moderator condition M_female = 21.1, M_male = 15.7). Unlike the first five experiments, there was a clear-cut dissociation of judgments at different levels. ANOVAs with gender and level of analysis (aggregate level vs. split-levels) yielded significant interaction terms in separate analyses for the moderator condition, F(1, 22) = 16.65, p < .001, and for the mediator condition, F(1, 19) = 6.66, p < .05. When the moderator versus mediator manipulation was included in a three-factorial ANOVA, the three-way interaction fell short of significance, F(1, 41) < 1.5, ns.

However, because rejection rates for A and B are not strictly equal in the normal paradox, we also ran a more refined analysis. A summary index was calculated to capture each participant's level of reasoning, based on the differences between the estimated rejection rates of female and male applicants (i.e., rejection of women minus rejection of men) at the aggregate and the two split-levels, d_aggregate, d_A, and d_B. Accurate assessment means recognizing that d_aggregate is positive (i.e., more women are rejected overall), d_A is negative (fewer women are rejected within A), and d_B is zero (no differences exist within B). We transformed differences to deviation scores (by subtracting the mean) and then computed the following reasoning index (RI), which should be sensitive to differential rejection rates at different levels:
$$\mathrm{RI} = (+1 \times d_{\text{aggregate}}) + (-1 \times d_{A}) + (0 \times d_{B}).$$
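The reasoning index can be sketched in a few lines. This is an illustration under assumptions, not the authors' analysis script: the difference scores below are invented, the names `DIFFERENCES`, `WEIGHTS`, `deviation_scores`, and `reasoning_indices` are hypothetical, and "subtracting the mean" is read here as subtracting the sample mean of each difference score across participants, which the text does not spell out.

```python
from statistics import mean

# Hypothetical difference scores (rejection of women minus rejection of men),
# one tuple per participant: (d_aggregate, d_A, d_B).
DIFFERENCES = [
    (20.0, -10.0, 0.0),
    (15.0, 5.0, 10.0),
    (25.0, -10.0, 5.0),
]
# Contrast weights for d_aggregate, d_A, and d_B, following the index formula.
WEIGHTS = (+1, -1, 0)


def deviation_scores(values):
    """Center one difference score on its mean across participants (assumed reading)."""
    m = mean(values)
    return [v - m for v in values]


def reasoning_indices(differences, weights=WEIGHTS):
    """Weighted sum of the centered difference scores for each participant."""
    columns = [deviation_scores(column) for column in zip(*differences)]
    return [
        sum(w * d for w, d in zip(weights, row))
        for row in zip(*columns)
    ]


indices = reasoning_indices(DIFFERENCES)
print(indices)
```

A higher index marks a participant whose aggregate estimate favors men more, and whose estimate within A favors women more, than the sample average. Because the weight for d_B is zero, estimates within B do not enter the index.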

The correlation of this index with the dummy coding of the temporal order manipulation (i.e., coding the mediator condition as 0 and the moderator condition as 1) amounts to r = .38, p < .05, reflecting more sensitivity to the divergent rejection rates in the moderator condition than in the mediator condition. Of interest, the advantage of the moderator condition carried over to the final ratings of beliefs in female discrimination. Being in the moderator condition correlated negatively (r = .30, p < .05) with the summed ratings of the three questions (i.e., Do you believe that women are disadvantaged in general? Do you believe that women are particularly disadvantaged in professional life and in universities? and Do you believe that it is important to fight for womens rights?). Thus, the results of Experiment 7 are at least compatible with the finding (Fiedler et al., 2002) that a moderator frame can facilitate trivariate reasoning, although the present evidence is not very strong. The results of Experiment 8 are based on percentage estimates, a constant moderator frame, and the reversed paradox (see the midpart of Table 1). Participants gener-

Downloaded from psp.sagepub.com at University of Bucharest on April 22, 2013

26

PERSONALITY AND SOCIAL PSYCHOLOGY BULLETIN that judgment interference might be responsible for the inability to use the encoded information adequately. Accordingly, we next tried to create conditions that reduce judgment interference. Indeed, judgments at aggregate and split-level diverged when the task context was modified so as to counteract judgment interference. More specifically, Experiment 6 showed that judgments met the criterion of inductive differentiation (i.e., accurate estimates of combinations of the applicant gender and university variables) when aggregate judgments were fully omitted and when the probability judgment format was replaced by a frequency format. To demonstrate trivariate reasoning proper, however, it was necessary to return to a task setting that includes aggregate as well as split-level judgments. Experiment 7 showed that diverging judgments could be induced for both levels, provided that split-level judgments precede aggregate judgments. Giving temporal primacy to the covariate, universitiesboth in the stimulus display and in the sequence of dependent measuresproved to be an appropriate remedy against the fixation on the overwhelming gender variable. Finally, Experiment 8 replicated and extended this finding to the reverse paradox, in which a split by universities revealed female superiority despite the fact that the rejection rate is lower for men at the aggregate level. Both Experiment 7 and 8 clarified that temporal order can reduce judgment interference. Although temporal order was not directly manipulated within the same experiment, the relevant comparisons were shown to be significant in another recent study (Fiedler et al., 2002). That some of our findings are based on comparisons between different experiments is admittedly a point of weakness of the present investigation. 
Despite these caveats, however, the present research suggests some noteworthy theoretical insights concerning the boundary conditions of sound inductive reasoning in social psychology. With regard to experimental paradigms in social cognition that are based on trivariate reasoning problems (e.g., subtyping, causal attribution, hypothesis testing), two interesting implications must be pointed out. First, it should be noted that stereotypical expectancies per se did not impose strong constraints on reasoning performance. Although the common expectation that women, rather than men, are at a disadvantage in professional settings did affect the more primitive level of undifferentiated guessing, the stereotype did not prevent accurate and differentiated judgments as soon as a more sophisticated level of reasoning was reached. The second implication, which may be more important, concerns the moderator versus mediator manipulation. Although the contiguity and temporal order of antecedent and consequent events has long been recognized as central to the definition of causality, theories and

ally recognized that women (Mfemale = 50.1%) and men (Mmale = 48.0%) perform about equally well on aggregate, although the rejection rate of women (Mfemale = 54.4%) is lower than that of men (Mmale = 64.9%) within universities. The interaction term in an Applicant Gender × Level of Analysis ANOVA was highly significant, F(1, 39) = 10.04, p < .01. These results corroborate that genuine trivariate reasoning can be elicited if judgment interference is reduced through appropriate task ordering and stimulus display.
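The divergence between aggregate and within-university rejection rates that judges had to recognize can be made concrete with a small worked example. The following is only an illustrative sketch: the applicant counts and the helper name `rejection_rate` are hypothetical and are not the stimulus distribution used in the experiments. The counts are chosen so that women are rejected less often than men within each university, yet more often overall, because most women apply to the university with the higher rejection rate.

```python
# Hypothetical counts, chosen only for illustration (not the study's stimuli).
# Key: (university, gender) -> (number of applicants, number rejected)
counts = {
    ("A", "men"): (80, 24),
    ("A", "women"): (20, 4),
    ("B", "men"): (20, 16),
    ("B", "women"): (80, 56),
}


def rejection_rate(gender, university=None):
    """Rejection rate for one gender, within a university or pooled over both."""
    cells = [
        cell
        for (univ, gen), cell in counts.items()
        if gen == gender and (university is None or univ == university)
    ]
    applicants = sum(n for n, _ in cells)
    rejected = sum(r for _, r in cells)
    return rejected / applicants


# Split-level rates (per university) followed by the aggregate rates.
for university in ("A", "B", None):
    label = university if university is not None else "aggregate"
    rates = {g: round(rejection_rate(g, university), 2) for g in ("men", "women")}
    print(label, rates)
```

With these assumed counts, the covariate (university) fully accounts for the aggregate disadvantage of women: the pooled rates reverse the ordering that holds at both split levels.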
GENERAL DISCUSSION

The present article started with the claim that Simpson's paradox can be considered a graduator of cognitive-social emancipation, or the ability to take more than a single salient influence factor into account. There is wide agreement that human intelligence can acquire and apply this ability at least in specific domains (Gilbert & Osborne, 1989). Under auspicious conditions, people can reattribute an observed behavior, y, as due to the situation, z, rather than the person, x. Even children begin to understand that the amount of water, y, in a glass depends not only on the height, x, of the glass but also on its diameter, z (Piaget, 1952). Recent research on Simpson's paradox also conveys the optimistic conclusion that trivariate problems can be mastered, contingent on sufficiently high motivation and appropriate mental models (Schaller, 1992a, 1992b; Schaller et al., 1996; Schaller & O'Brien, 1992; Waldmann & Hagmayer, 1995). However, the tasks and dependent measures used in these studies did not provide unequivocal evidence. We introduced a three-level taxonomy of reasoning performance in trivariate tasks, ranging from (a) undifferentiated guessing to (b) inductive differentiation and (c) sound trivariate reasoning as the most mature level, the latter level being characterized by the simultaneous recognition of diverging relations at different levels of analysis (i.e., at the aggregate level vs. at the split levels of a covariate). Although we established high accuracy motivation and all kinds of helpful instructions and mental models, reasoning performance hardly exceeded the level of undifferentiated guessing across five (!) experiments. Even though most judges correctly recognized that a larger proportion of women applied to B than to A and that the rejection rate was much higher at B than at A, they failed to recognize that the apparent inferiority of female applicants might be due to that context factor.
This failure persisted even when judges were repeatedly reminded of the unequal rejection rates and even when they were explicitly instructed to find out why it cannot be true that female applicants are inferior. However, granting that the encoding of trivariate information must be possible in principle, we speculated


research on causal attribution (cf. Hewstone, 1989), contingency assessment (Allan, 1993; Fiedler, 2000), and stereotyping (Hamilton & Sherman, 1994) have largely neglected this factor. The present research lends further support to the notion that temporal cues may serve an important function in the disambiguation of basically ambiguous statistical data (see also Fiedler et al., 2002). Whether the same statistical input is interpreted as strengthening or as weakening a stereotypical expectation may depend crucially on the temporal order in which the respective causal factors are processed. Leaving behind us the optimism from which the present investigation started and the pessimism conveyed by the initial findings, the latter finding entails a really optimistic message: Human intelligence has apparently evolved inductive tools that integrate statistical as well as temporal (and presumably spatial) information within the same process. This ability points to a notable aspect of adaptive human cognition, well ahead of purely statistical models. Indeed, normative approaches could profit a lot from trying to imitate this human ability, developing models that consider both the extensional distribution of joint events and their timing, spatial characteristics, and intensional structure as meaningful events.
REFERENCES

Allan, L. G. (1993). Human contingency judgments: Rule based or associative? Psychological Bulletin, 114, 435-448.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51, 1173-1182.
Fiedler, K. (1988). The dependence of the conjunction fallacy on subtle linguistic factors. Psychological Research, 50, 123-129.
Fiedler, K. (2000). Illusory correlations: A simple associative algorithm provides a convergent account of seemingly divergent paradigms. Review of General Psychology, 4, 25-58.
Fiedler, K., & Graf, R. (1990). Grouping and categorization in judgments of contingency. In J. P. Caverni, J. M. Fabre, & M. Gonzalez (Eds.), Cognitive biases (pp. 47-57). Amsterdam: North Holland.
Fiedler, K., Walther, E., Freytag, P., & Stryczek, E. (2002). Playing mating games in foreign cultures: A conceptual framework and an experimental paradigm for inductive trivariate inference. Journal of Experimental Social Psychology, 38, 14-30.
Fiedler, K., Walther, E., & Nickel, S. (1999). Covariation-based attribution: On the ability to assess multiple covariates of an effect. Personality and Social Psychology Bulletin, 25, 607-622.
Friedrich, J. (1993). Primary error detection and minimization (PEDMIN) strategies in social cognition: A reinterpretation of confirmation bias phenomena. Psychological Review, 100, 298-319.
Gavanski, I., & Hui, C. (1992). Natural sample spaces and uncertain beliefs. Journal of Personality and Social Psychology, 63, 766-780.
Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684-704.
Gigerenzer, G., & Hug, K. (1992). Domain-specific reasoning: Social contracts, cheating, and perspective change. Cognition, 43, 127-171.
Gilbert, D. T., & Osborne, R. E. (1989). Thinking backward: Some curable and incurable consequences of cognitive busyness. Journal of Personality and Social Psychology, 57, 940-949.
Hamilton, D. L., & Sherman, J. W. (1994). Stereotypes. In R. S. Wyer & T. K. Srull (Eds.), Handbook of social cognition (2nd ed., Vol. 1, pp. 1-68). Hillsdale, NJ: Lawrence Erlbaum.
Hewstone, M.R.C. (1989). Causal attribution. Cambridge, MA: Basil Blackwell.
Hewstone, M., & Hamberger, J. (2000). Perceived variability and stereotype change. Journal of Experimental Social Psychology, 36, 103-124.
Hoffman, C., & Hurst, N. (1990). Gender stereotypes or rationalization? Journal of Personality and Social Psychology, 58, 197-208.
Klauer, K. C., Musch, J., & Naumer, B. (2000). On belief bias in syllogistic reasoning. Psychological Review, 107, 852-884.
Kunda, Z., & Oleson, K. C. (1995). Maintaining stereotypes in the face of disconfirmation: Constructing grounds for subtyping deviants. Journal of Personality and Social Psychology, 68, 565-579.
Macrae, C. N., Bodenhausen, G. V., Milne, A. B., & Wheeler, V. (1996). On resisting the temptation for simplification: Counterintentional effects of stereotype suppression on social memory. Social Cognition, 14, 1-20.
Petersen, C. (1978). Learning impairment following insoluble problems: Learned helplessness or altered hypothesis pool. Journal of Experimental Social Psychology, 14, 53-68.
Piaget, J. (1952). The origins of intelligence in children. New York: International University Press.
Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis: Focused comparisons in the analysis of variance. New York: Cambridge University Press.
Ross, L. (1977). The intuitive psychologist and his shortcomings: Distortions in the attribution process. In L. Berkowitz (Ed.), Advances in experimental social psychology (Vol. 10, pp. 173-220). New York: Academic Press.
Ross, L., & Nisbett, R. E. (1991). The person and the situation: Perspectives of social psychology. New York: McGraw-Hill.
Schaller, M. (1992a). In-group favoritism and statistical reasoning in social inference: Implications for formation and maintenance of group stereotypes. Journal of Personality and Social Psychology, 63, 61-74.
Schaller, M. (1992b). Sample size, aggregation, and statistical reasoning in social inference. Journal of Experimental Social Psychology, 28, 65-85.
Schaller, M., Asp, C. H., Rosell, M.-C., & Heim, S. J. (1996). Training in statistical reasoning inhibits the formation of erroneous group stereotypes. Personality and Social Psychology Bulletin, 22, 829-844.
Schaller, M., & O'Brien, M. (1992). Intuitive analysis of covariance and group stereotype formation. Personality and Social Psychology Bulletin, 18, 776-785.
Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society, 13, 238-241.
Trope, Y. (1986). Identification and inference processes in dispositional attribution. Psychological Review, 93, 239-257.
Waldmann, M., & Hagmayer, Y. (1995). Causal paradox: When a cause simultaneously produces and prevents an effect. Proceedings of the 17th annual Conference of the Cognitive Science Society. Hillsdale, NJ: Lawrence Erlbaum.
Weber, R., & Crocker, J. (1983). Cognitive processes in the revision of stereotypic beliefs. Journal of Personality and Social Psychology, 45, 961-977.

Received August 27, 2001
Revision accepted April 14, 2002
