grades, the Watwin score was able to account for between 36% [26] and 42% [27] of the variance in final course grades. However, a refinement of the Error Quotient published more recently appears to raise the Error Quotient's predictive power to nearly 30% [23]. The NPSM presented here can be seen as an expansion of the Error Quotient and Watwin Score, one that

[Figure 1: Programming State Transition Diagram. Region labels: Unknown; Syntactically correct / Semantically unknown; Syntactically incorrect / Semantically unknown; Semantically incorrect. Editing states: YN (editing syntactically correct, last debug unsuccessful), YU (editing syntactically correct, last debug successful), NU (editing syntactically incorrect, last debug successful), NN (editing syntactically incorrect, last debug unsuccessful). Decision node: "Has Runtime Exception?" (Yes / No). Idle state: entered on editor activity or timeout. All states time out after 3 minutes of inactivity (e.g. no editor activity, compile, etc.). Upon new activity, a transition is made to the last known state.]
state (NU and NN) to a semantically incorrect state (YN and YU) only if the last debug attempt yielded a runtime exception.

Observe that intermediate execution states are also captured in this state transition diagram. If the student's program is syntactically correct, it can be executed either with or without the debugger in Visual Studio. This leads to the four left-most "Execute" states in the diagram (RN, RU, DN, DU). In contrast, if the student's program is syntactically incorrect, it is not possible to execute it in debug mode. However, in Visual Studio, it is possible to execute the last successful build of a program. This leads to the right-most "Execute" state (R/).

Lastly, two additional states are necessary in this model. First, it is impossible to determine the state if no compilation or execution attempts have been made. To account for this situation, which commonly occurs at the beginning of a programming session, we define an additional state called "Unknown (Start) State" (UU). Second, a prolonged period of inactivity (three minutes) in any state leads to a transition to the Idle state, in which the next editing activity causes a transition back to the previous state.

11 data points per student. These 11 data points form the Normalized Programming State Model (NPSM).

4. STUDY I: EVALUATING EXPLANATORY POWER
The goal of our first empirical study was two-fold: (1) to address RQ1 by exploring whether the results of previous studies of the Error Quotient and Watwin Score could be replicated using a different student population, programming language, and programming environment, and (2) to address RQ2 by exploring the explanatory power of the NPSM.

We conducted Study I in a 15-week CS 2 course at Washington State University. Taught by the first author, the course used C++ as its instructional language, and required students to use the Microsoft® Visual Studio® programming environment [25] for course assignments. The course revolved around three weekly 50-minute lectures and one weekly 170-minute lab. We collected programming process and grade data, and used those data as input to the Error Quotient, Watwin Score, and NPSM.
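As a rough illustration of how time in the states described above might be turned into per-student data points, the following Python sketch accumulates time per state and moves any inactivity beyond three minutes into Idle. The log format (timestamp, state) and the choice to normalize by total session time are assumptions made for this example; this excerpt does not spell out the exact normalization.

```python
from collections import defaultdict

# The eleven states named in the text: one start state, four editing
# states, five execution states, and Idle.
STATES = ["UU", "YN", "YU", "NU", "NN", "RN", "RU", "DN", "DU", "R/", "Idle"]
IDLE_TIMEOUT_S = 180  # three minutes of inactivity, per the text


def normalized_state_times(events, session_end):
    """Return the fraction of session time spent in each state.

    events: list of (timestamp_seconds, state) pairs sorted by time, where
    each pair records an activity that leaves the student in `state`.
    This log layout is hypothetical, not the paper's data format.
    """
    seconds = defaultdict(float)
    boundaries = [t for t, _ in events[1:]] + [session_end]
    for (start, state), end in zip(events, boundaries):
        gap = end - start
        active = min(gap, IDLE_TIMEOUT_S)
        seconds[state] += active         # time before the timeout fires
        seconds["Idle"] += gap - active  # the remainder is attributed to Idle
    total = sum(seconds.values())
    # Assumption: "normalized" means dividing by total observed time.
    return {s: (seconds[s] / total if total else 0.0) for s in STATES}


log = [(0, "UU"), (60, "NU"), (120, "YU"), (500, "RU")]
profile = normalized_state_times(log, session_end=560)
```

In this toy log, the 380-second gap after entering YU contributes 180 seconds to YU and 200 seconds to Idle, so the resulting fractions sum to one.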
As indicated in Table 2, all measures were significant but weak predictors of individual assignment scores. If we filter out data corresponding to students who spent less than an hour of programming time on an assignment, the NPSM model accounted for the most variance (adj. R² = 0.11) in assignment grades. Interestingly, setting a minimum time limit of one hour altered three of the five significant contributing factors in the NPSM.

NPSM (no min. time)   11, 653   8.92   < 0.01   0.08
NPSM (>1 hr)          10, 591   8.92   < 0.01   0.11

Table 3. Significant Predictors in NPSM for Individual Assignment Grades (no minimum time)
Variable        β       t       p
YN              0.32    2.03    0.04
RU              0.31    3.77    < 0.01
RN              0.18    2.98    < 0.01
R/              0.12    2.92    < 0.01
Time on task    0.09    2.45    0.02

Table 4. Significant Predictors in NPSM for Individual Assignment Grades (at least 1 hr of programming time)
Variable        β       t       p
NU              -0.17   -3.80   < 0.01
UU              -0.15   -3.45   < 0.01
RU              0.14    2.95    < 0.01
Time on task    0.11    2.57    0.01

4.3.2 Predictions of Overall Assignment Averages
We next aggregated an entire semester's worth of IDE data and correlated these data with students' overall assignment averages. Results for each measure are presented in Table 5. Significant contributors in the NPSM model are shown in Table 6.

Table 5. Explanation of Variance in Average Assignment Grades
Measure               df       F       p        Adj. R²
Error Quotient        1, 94    7.16    < 0.01   0.06
Watwin Score (one)    1, 94    11.50   < 0.01   0.10
Watwin Score (all)    1, 94    10.93   < 0.01   0.10
NPSM                  10, 84   7.00    < 0.01   0.39

Table 6. Significant Predictors in NPSM for Assignment Average Grades
Variable    β       t       p
UU          -0.44   -3.88   < 0.01
NU          -0.22   -2.28   0.03

By considering an entire semester's worth of data, two of the three predictive measures improved. The Error Quotient's explanatory power decreased, whereas the Watwin Score increased its explanatory power by a factor of five, and the NPSM increased its explanatory power by more than a factor of three. In absolute terms, the NPSM was a substantially better predictor than the other two measures, nearly quadrupling the explanatory power of its closest rival (the Watwin Score).

Interestingly, when the input dataset was expanded to include all data collected throughout the semester, the number of NPSM variables that made significant contributions shrank from three (NU, UU, RU) to two (UU, NU) (see Table 6). Moreover, both of these variables (UU and NU) were negatively correlated with performance. Recall that the UU state is used when students first begin programming. It makes sense that the longer students go without compiling or running their programs, the more likely it is that they will do poorly on the assignment. Likewise, it makes sense that students who spend large proportions of time in the NU state would tend to do worse on assignments, since students in that state are grappling with syntax errors, and may not ever be able to execute their programs. Indeed, the significance of NU as an explanatory factor aligns well with the Error Quotient and Watwin Score, both of which can be seen as quantifying the rate at which students leave the NU state.

4.3.3 Predictions of Final Grades
Lastly, we consider each measure's ability to explain the variance in students' final course grades. As was the case with assignment averages, we used students' programming behavior over the entire semester as input to each measure. The results of this analysis are presented in Table 7. Significant factors in the NPSM are shown in Table 8. As can be seen in Table 7, the Error Quotient was the only measure that did not significantly correlate with students' final grades. The Watwin Score appears to be slightly better at explaining the variance in final grades than it was at explaining the variance in assignment scores. In contrast, the NPSM appears slightly worse at accounting for the variance in final grades than at accounting for the variance in average assignment scores. As before, however, the NPSM did a substantially better job in absolute terms, furnishing three times the explanatory power of the Watwin Score, its closest competitor. Finally, we note that the two significant factors in the NPSM (UU and NU) remained the same in its correlations with assignment average and final grade.

5. STUDY II: DERIVING A PREDICTIVE MEASURE
The previous study showed that the NPSM was able to account for substantially more variation in student performance than both the Error Quotient and Watwin Scores. Given this potential, it makes sense to derive a predictive measure that can be used in situ to predict performance, rather than post hoc to explain variance.
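As a small arithmetic aside on the Study I regression summaries, the adjusted R² values in Table 5 can be approximately recovered from the reported F statistics and degrees of freedom. The sketch below assumes ordinary least-squares models with an intercept, so that R² = F·df1 / (F·df1 + df2) and n - 1 = df1 + df2. The function names are illustrative only, and agreement is limited by rounding in the reported F values.

```python
def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R^2 from an overall F test with (df_model, df_resid) df."""
    return (f_stat * df_model) / (f_stat * df_model + df_resid)


def adjusted_r_squared(f_stat, df_model, df_resid):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    r2 = r_squared_from_f(f_stat, df_model, df_resid)
    n_minus_1 = df_model + df_resid  # assumes the model has an intercept
    return 1.0 - (1.0 - r2) * n_minus_1 / df_resid


# (F, df_model, df_resid) for three rows as reported in Table 5.
table5_rows = {
    "Error Quotient": (7.16, 1, 94),
    "Watwin Score (one)": (11.50, 1, 94),
    "NPSM": (7.00, 10, 84),
}

recovered = {
    name: round(adjusted_r_squared(f, df1, df2), 2)
    for name, (f, df1, df2) in table5_rows.items()
}
```

For these three rows the recovered values round to 0.06, 0.10, and 0.39, in line with the Adj. R² column of Table 5.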
Table 7. Explanation of Variance in Final Grades
Measure                        df       F       p        Adj. R²
Error Quotient                 1, 94    3.68    0.06     0.03
Watwin Score (single error)    1, 94    14.26   < 0.01   0.12
Watwin Score (all errors)      1, 94    14.40   < 0.01   0.12
NPSM                           10, 84   6.63    < 0.01   0.36

Table 8. Significant Predictors in NPSM for Final Grades
Variable    β       t       p
UU          -0.30   -2.55   0.01
NU          -0.27   -2.74   < 0.01

[Figure 2. Seven Programming Data sets of Increasing Size as a Percentage of All Programming Data. Data sets: A1, A1-A2, A1-A3, A1-A4, A1-A5, A1-A6, A1-A7. Horizontal axis: 0% to 100%.]
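The cumulative data sets in Figure 2 (A1, A1-A2, ..., A1-A7) can be sketched for a single student as follows. Following Section 5.1, the outcome for each data set is the average of all assignment scores received up to that point. The record layout, the function name, and the choice to pool state times across assignments before normalizing are assumptions made for this illustration.

```python
def cumulative_datasets(scores, state_seconds):
    """Build one record per cumulative data set for a single student.

    scores: assignment scores in chronological order.
    state_seconds: one dict per assignment mapping state code -> seconds
    spent in that state (hypothetical layout).
    """
    records = []
    pooled = {}
    for k, seconds_k in enumerate(state_seconds, start=1):
        # Add this assignment's programming data to everything seen so far.
        for state, secs in seconds_k.items():
            pooled[state] = pooled.get(state, 0.0) + secs
        total = sum(pooled.values())
        records.append({
            "dataset": "A1" if k == 1 else f"A1-A{k}",
            # Outcome: running average of assignment scores to date.
            "outcome": sum(scores[:k]) / k,
            # Assumption: predictors are pooled time fractions per state.
            "predictors": {s: v / total for s, v in pooled.items()} if total else {},
        })
    return records


student_scores = [80, 90, 70]
student_seconds = [{"UU": 100, "NU": 100}, {"RU": 200}, {"NU": 400}]
datasets = cumulative_datasets(student_scores, student_seconds)
```

With seven assignments per student, the same loop would yield the seven overlapping data sets, the last of which covers all programming data from the semester.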
We now present a follow-up study that uses the results of the previous study to derive a predictive formula rooted in the NPSM.

Given that the NPSM includes eleven predictors, the ideal sample size for achieving full statistical power when deriving a predictive measure would be approximately 220 students (see, e.g. [11, 29]). While it is still possible to detect strong effects on a smaller sample size, running eleven predictors against our sample size of 95 students increases the probability of producing a significant model without any significant predictors. For this reason, we restricted ourselves to the development of a four-variable model, the most appropriate size given the size of our dataset [11, 29].

We began by examining the seven significant variables identified in Study I: YN, RU, RN, R/, NU, UU, and time on task. A preliminary data analysis using datasets of varying sizes (see Figure 2) revealed a sporadic level of significance for YN, R/, and time on task. Therefore, we decided to drop these variables from further consideration, and settled on the variables UU, NU, RU, and RN for our predictive model.

5.1 Method
For Study II, we used the same programming log data and grade data as were used in Study I. However, for this study, we evaluated the NPSM using seven input data sets whose sizes were systematically varied, as illustrated in Figure 2. The first data set consisted solely of the data collected during the first programming assignment. The final six data sets each added an additional assignment's grades and programming data. Therefore, starting with the second data set, the outcome variable was the average of all the programming assignment scores received up to that point in time. It follows that the final data set included all programming data from the semester, matching the dataset reported in Table 5.

5.2 Results
For each of the seven data sets, a multivariate regression was performed using UU, NU, RU, and RN as predictor variables and assignment averages as outcome variables. Table 9 provides the individual contribution of each predictive variable. The bottom two rows of the table compute the average value and weighted average value of each coefficient.

In examining the coefficients listed in Table 9, we see that the RU and RN variables are consistently significant regardless of the amount of data considered. In contrast, the UU and NU variables only become significant as the amount of data considered increases. However, in examining the standardized beta coefficients, we see that the relative contributions of RU and RN decrease as the size of the data increases, whereas the relative contributions of UU and NU increase as the size of the data increases. Finally, we see a drop in the amount of variance explained when adding data associated with the last assignment. Whether this represents a true ceiling in the NPSM's predictive power remains an interesting question for future research.

5.3 A Predictive Formula
Running the NPSM model with the variables UU, NU, RU, and RN across the seven overlapping data sets reveals a general trend in which the amount of variance explained increases with the size of the data set. We now use the coefficients from these results in order to formulate two predictive measures. The first is obtained by averaging the unstandardized beta coefficients of each predictor variable across the data sets considered. The second model is obtained by using a weighted average of each predictor variable's unstandardized beta coefficients. Recall that the weighted average is formulated based on the overall model's variance numbers.

Using the averaged coefficient values reported in Table 9, we arrive at the following formula:

Using the weighted coefficient values yields a slightly different formula:

To verify the accuracy of each formula, we calculated the predicted score for each dataset using both formulas. Next, we performed a linear regression using this predicted score as the predictor variable and the student's actual assignment score as the outcome variable. We deemed a formula to be successful if it closely mirrored the total amount of variance explained by the overall NPSM model. The amount of variance accounted for by each formula, as well as the overall NPSM model, is listed in Table 10. Inspection of Table 10 reveals that both formulas closely mirror each other (within +/- 2%), and that both are close to the overall NPSM model in terms of explanatory power. As such, it would appear that either formula does a good job of transforming the NPSM data into a usable predictive measure.

6. DISCUSSION
We now turn to a detailed discussion of our results, organized around the two research questions we posed for this research.
Table 9: NPSM Predictive Power and Coefficients for Seven Datasets of Increasing Size (* = sig. at p < 0.05)
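The two summary rows that Section 5.2 attributes to Table 9 (the average and weighted average of each coefficient) could be computed along the following lines. This is only a sketch. The coefficient values below are made-up placeholders, not the paper's numbers, the intercept is hypothetical, and weighting each data set by its explained variance is an assumption based on the statement that the weighted average uses the overall model's variance numbers.

```python
def summarize_coefficients(rows):
    """Return (simple_average, variance_weighted_average) per variable.

    rows: one dict per data set with keys 'r2' (variance explained) and
    'coef' (variable -> unstandardized coefficient). Hypothetical layout.
    """
    variables = list(rows[0]["coef"])
    weight_sum = sum(r["r2"] for r in rows)
    simple = {v: sum(r["coef"][v] for r in rows) / len(rows) for v in variables}
    weighted = {
        v: sum(r["r2"] * r["coef"][v] for r in rows) / weight_sum
        for v in variables
    }
    return simple, weighted


def predict(coef, intercept, state_fractions):
    """Linear predictor: intercept plus coefficient-weighted state times."""
    return intercept + sum(coef[v] * state_fractions[v] for v in coef)


# Placeholder values for two data sets; NOT taken from Table 9.
rows = [
    {"r2": 0.1, "coef": {"UU": -10.0, "NU": -20.0, "RU": 30.0, "RN": 40.0}},
    {"r2": 0.3, "coef": {"UU": -30.0, "NU": -40.0, "RU": 10.0, "RN": 20.0}},
]
simple_avg, weighted_avg = summarize_coefficients(rows)
example_score = predict(
    simple_avg, intercept=80.0,
    state_fractions={"UU": 0.1, "NU": 0.2, "RU": 0.3, "RN": 0.4},
)
```

Weighting by explained variance gives more influence to the data sets on which the model fit best, which is one plausible reading of the weighting described in Section 5.3.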
Table 10: Variance Explained by Overall NPSM Model, Averaged NPSM Formula, and Weighted NPSM Formula
Dataset    NPSM Model    Averaged Coefficient Formula    Weighted Coefficient Formula
A1         13%           13%                             13%
A1-A2      15%           15%                             13%
A1-A3      20%           20%                             18%
A1-A4      26%           28%                             27%
A1-A5      37%           38%                             39%
A1-A6      45%           43%                             45%
A1-A7      41%           39%                             41%

6.1 RQ1: Do Prior Results Generalize to Different Populations and Programming Languages/Environments?
The results for the Error Quotient and Watwin measures differ drastically from the results presented in prior research. For example, a recent study of both the Error Quotient and Watwin measures accounted for 18% and 36% of the variance in students' final grades [26] as compared to merely 3% and 12% in our study. How can we account for this large discrepancy? We offer two possible explanations.

First, differences in the instructional emphasis of the courses studied might have contributed to the differences in the Error Quotient and Watwin Score observed across the studies. In previous studies in which the Error Quotient and Watwin Score were calculated, student homework was worth just 25% of the overall grade. In contrast, in our study, student homework accounted for 35% of the overall grade.

Second, the discrepancies in Error Quotient and Watwin Score measures might be related to key differences in the programming environments and languages used in the studies. Previous studies focused on BlueJ [17] and the Java programming language. This study collected data on Microsoft® Visual Studio® and the C++ programming language. Both the Error Quotient and Watwin Score rely on the processing of compilation error messages. Given that C++ compilers tend to produce terser and/or more obtuse compilation error messages, it seems plausible that differences could have occurred with respect to students' compilation behaviors in the two environments. For example, forgetting a semi-colon in BlueJ and Java results in the error message, "error: ';' expected," followed by the exact line on which a semi-colon is missing. In contrast, forgetting a semi-colon in Visual Studio® and C++ results in nine error messages. The first message is a red herring referencing an illegal usage of a type as an expression. For the actual cause, the user must look to the second error message, which states, "syntax error: missing ';' before identifier <x>," with <x> being the line below the statement on which a semi-colon is missing.

Of these two explanations, we find the second one to be the more compelling. Recall that both the Error Quotient and Watwin Score assign penalty points when subsequent compilation attempts either result in more errors, or contain the same error messages as previous compilation attempts. Given that Visual Studio® and C++ generate more error messages per compilation, it stands to reason that the Error Quotient and Watwin would artificially inflate the base penalty assigned to students for each failed compilation. Furthermore, in Visual Studio®/C++, the possibility that both the Error Quotient and Watwin Score will generate false positives (matched compilations that have the same error message but for different reasons) increases. In contrast, the coarser approach taken by the NPSM is not affected by these differences: an error state is an error state, regardless of whether a student generated one or one hundred errors in a given compilation.

Even though the amount of variance accounted for by the Error Quotient and Watwin Score in this analysis is much lower than what has been previously reported, it is still possible to make comparisons with prior studies. For example, in their first study, Watson et al. [27] found that the predictive power of both the Error Quotient and Watwin Score increased as a function of the size of the input data. When they considered only a single assignment's worth of data (roughly 2-3 weeks), the variance explained by both Error Quotient and Watwin was fairly low: 10% for Error Quotient and 6% for Watwin. However, by the end of the term, the variance explained by Error Quotient and Watwin had increased to 19% and 42% respectively. Using relative magnitudes, we see that, in their study, Error Quotient increased by a factor of two and Watwin by a factor of seven.

These results are somewhat consistent with the results of this study, which found that the Error Quotient performed best with smaller data sets, and that the Watwin Score performed best with larger data sets. However, unlike in previously published studies, the variance explained by the Error Quotient in our study actually decreased as the size of the input data set increased. That the
relative trend in the amount of variance explained by the Watwin Score is similar across studies, whereas the relative trend in the amount of variance explained by the Error Quotient is not, lends further credence to the idea that these predictive measures do not perform consistently when applied to different programming environments and languages. However, in order to increase our confidence in this claim, we would need to conduct additional studies of the Error Quotient, Watwin Score, and NPSM using a variety of programming environments and languages.

6.2 RQ2: How Well Can a More Holistic Programming Model Predict Performance?
In this paper, we developed the NPSM, a predictive model based on the time spent in a set of programming states derived from a program’s syntactic and semantic correctness. Given the configuration of instructor, assignments, exams and IDE we studied, the NPSM outperformed models that only consider compilation behavior. At the level of individual programming assignments, the NPSM accounted for four times as much variance as the Watwin Score, but only slightly more variance (1%) than the Error Quotient. With respect to students' overall assignment averages, the NPSM accounted for nearly four times as much variance as the Watwin Score, and over six times the variance accounted for by the Error Quotient. With respect to final course grades, the NPSM accounted for three times as much variance as the Watwin Score, and 12 times as much variance as the Error Quotient.

In Study II, we developed a predictive formula based on four NPSM states. RN (execute a semantically incorrect program) and RU (execute a semantically unknown program) were found to be positive contributors to student success. Conversely, UU (default state before first compilation/execution action is taken in a programming session) and NU (syntactically incorrect program) were found to be negative contributors to a student's success. This seems to indicate that toying with a program's runtime behavior, regardless of semantic correctness, is a successful programming approach. In contrast, writing large portions of code without attempting to compile (UU) is not conducive to success. Indeed, it is easy to imagine that when these students finally do compile, they quickly find themselves in NU, the other state negatively correlated with performance. It is also worth noting that editing states that precede runtime exceptions (YN, NN) were not significant predictors. Therefore, it might be worthwhile to drop this distinction in a future version of our model.

As revealed by our study, the calculations performed by both the Error Quotient and Watwin measures are based on the least weighted significant contributor in the NPSM model: NU. Interestingly, performing a linear regression with NU as the sole predictor variable explains more variance than either the Error Quotient or Watwin Score for both assignment average, F(1, 93) = 15.06, p < 0.01, adj. R² = 0.13, and final grade, F(1, 93) = 23.676, p < 0.01, adj. R² = 0.19. This strongly suggests that any measurement based on programming behaviors would do well to look beyond compilation behavior.

Finally, we note that the aggregation method used in the NPSM is only one possible approach to quantifying Figure 1's state diagram. It is possible that other approaches, such as one that quantifies the number and types of transitions, would yield better results. Exploring this possibility would be an interesting direction for future research.

7. CONCLUSION AND FUTURE WORK
This paper introduced the NPSM, a holistic model of student programming behavior; compared its explanatory power against two previously established measures; and derived a formula for predicting student performance given a set of programming process data. Our results indicate that, at least in the population considered in this paper, the NPSM is much better at predicting student performance than the Error Quotient and Watwin Score.

Our preliminary research into the NPSM suggests several directions for future work. First, future studies should examine the robustness of the NPSM by performing a replication study with a larger student population under similar classroom conditions. This would allow researchers to test the predictive NPSM formula against a population different from the population used to derive it. Furthermore, the increased power that accompanies an increase in sample size might allow for the discovery of additional significant factors within the NPSM.

Second, future research should examine the suitability of the NPSM as a predictive measure under conditions not considered in our study. This includes applying the NPSM to different programming languages, environments, and computing courses. It will be especially important to explore the predictive power of the NPSM as applied to different programming environments, given that the NPSM includes states that are unreachable in novice programming environments. For example, novice programming environments often do not allow a program to be executed unless it is syntactically correct (i.e., there is no R/ state), and have only one mode of execution (i.e., there is no distinction between RN and DN, and between RU and DU). Might a modified NPSM, with some states eliminated and other states combined, yield the same predictive power as was observed in this study?

Third, one should consider expanding the scope of the NPSM by incorporating predictors that are not based on programming behavior. For example, in ongoing work, we are exploring how a student's online social behavior (see, e.g. [13]) might impact the predictive capabilities of the NPSM.

Lastly, we plan to explore how the NPSM might serve as a foundation for pedagogical interventions derived from a student's NPSM state. For example, a student who appears to be stuck in an unhelpful state (e.g. NU) might be prompted to ask for help. Alternatively, we might be able to use programming behavior to encourage students to improve their programming techniques. For example, for students who spend a lot of time in the RN (execute without debug) state, an intervention could suggest using the debugger (DN) to troubleshoot semantic issues.

For instructors, we envision an online dashboard that could present continuously-updated information on students' NPSM states and programming progress. Using this information, instructors could check in on struggling students, or perhaps devote additional lecture time to topics or strategies that the dashboard indicates may be problematic for many students.

8. ACKNOWLEDGMENTS
This project is funded by the National Science Foundation under grant no. IIS-1321045.

9. REFERENCES
[1] Ahmadzadeh, M., Elliman, D. and Higgins, C. 2005. An analysis of patterns of debugging among novice computer science students. ITiCSE ’05: Proceedings of the 10th
annual SIGCSE conference on Innovation and technology in computer science education. ACM Press. 84–88.
[2] Altadmri, A. and Brown, N.C.C. 2015. 37 Million Compilations: Investigating Novice Programming Mistakes in Large-Scale Student Data. Proceedings of the 46th ACM Technical Symposium on Computer Science Education (Kansas City, MO, USA, 2015), 522–527.
[3] Baker, R.S.J. and Siemens, G. 2014. Educational data mining and learning analytics. The Cambridge Handbook of the Learning Sciences. Cambridge University Press. 253–274.
[4] Bergin, S., Reilly, R. and Traynor, D. 2005. Examining the role of self-regulated learning on introductory programming performance. Proc. 2005 ACM International Computing Education Research Workshop. ACM Press. 81–86.
[5] Bransford, J., Brown, A.L. and Cocking, R.R. eds. 1999. How people learn: Brain, mind, experience, and school. National Academy Press.
[6] Campbell, P.F. and McCabe, G.P. 1984. Predicting the Success of Freshmen in a Computer Science Major. Commun. ACM. 27, 11 (1984), 1108–1113.
[7] Carter, A.S. 2012. Supporting the virtual design studio through social programming environments. Proceedings of the ninth annual international conference on International computing education research (Auckland, New Zealand, 2012), 157–158.
[8] Goldenson, D.R. and Wang, B.J. 1991. Use of structure editing tools by novice programmers. Empirical Studies of Programmers: Fourth Workshop. Ablex. 99–120.
[9] Graham, M.J., Frederick, J., Byars-Winston, A., Hunter, A.B. and Handelsman, J. 2013. Increasing persistence of college students in STEM. Science. 341, 27 Sept. (2013), 1455–56.
[10] Guzdial, M. 1994. Software-realized scaffolding to facilitate programming for science learning. Interactive Learning Environments. 4, 1 (1994), 1–44.
[11] Harrell, F.E. 2001. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer.
[12] Hundhausen, C.D., Brown, J.L., Farley, S. and Skarpas, D. 2006. A methodology for analyzing the temporal evolution of novice programs based on semantic components. Proceedings of the 2006 ACM International Computing Education Research Workshop. ACM Press. 45–56.
[13] Hundhausen, C.D., Carter, A.S. and Adesope, O. 2015. Supporting Programming Assignments with Activity Streams: An Empirical Study. Proc. 2015 SIGCSE Symposium on Computer Science Education (New York, 2015).
[14] Jadud, M.C. 2006. Methods and tools for exploring novice compilation behaviour. Proc. Second International Workshop on Computing Education Research. ACM. 73–84.
[15] Jeske, D., Stamov-Rossnagel, C. and Backhaus, J. 2014. Learner characteristics predict performance and confidence in e-Learning: An analysis of user behavior and self-evaluation. Journal of Interactive Learning Research. 25, 4 (2014), 509–529.
[16] Kessler, C.M. and Anderson, J.R. 1986. A Model of Novice Debugging in LISP. Empirical Studies of Programmers. 198–212.
[17] Kölling, M., Quig, B., Patterson, A. and Rosenberg, J. 2003. The BlueJ system and its pedagogy. Journal of Computer Science Education. 13, 4 (2003), 249–268.
[18] Ma, W., Adesope, O.O., Nesbit, J.C. and Liu, Q. 2014. Intelligent tutoring systems and learning outcomes: A meta-analytic survey. Journal of Educational Psychology. 106, 2007 (2014), 901–918.
[19] Rosson, M.B., Carroll, J.M. and Sinha, H. 2011. Orientation of Undergraduates Toward Careers in the Computer and Information Sciences: Gender, Self-Efficacy and Social Support. ACM Transactions on Computing Education. 11, 3 (Oct. 2011), 1–23.
[20] Schunk, D.H. 2012. Learning theories: An educational perspective. Merrill Prentice Hall.
[21] Slavin, R.E. 2011. Educational psychology: Theory and practice. Pearson Education.
[22] Spohrer, J.C. 1992. MARCEL: Simulating the novice programmer. Ablex.
[23] Tabanao, E.S., Rodrigo, M.M.T. and Jadud, M.C. 2011. Predicting at-risk novice Java programmers through the analysis of online protocols. Proceedings of the seventh international workshop on Computing education research (Providence, Rhode Island, USA, 2011), 85–92.
[24] U.S. Department of Education, Office of Educational Technology 2012. Enhancing Teaching and Learning through Educational Data Mining and Learning Analytics: An Issue Brief.
[25] Visual Studio® - Microsoft® Developer Tools: 2015. http://www.visualstudio.com. Accessed: 2015-04-20.
[26] Watson, C., Li, F.W.B. and Godwin, J.L. 2014. No tests required: comparing traditional and dynamic predictors of programming success. Proceedings of the 45th ACM Technical Symposium on Computer Science Education (2014), 469–474.
[27] Watson, C., Li, F.W.B. and Godwin, J.L. 2013. Predicting Performance in an Introductory Programming Course by Logging and Analyzing Student Programming Behavior. Proceedings of the 2013 IEEE 13th International Conference on Advanced Learning Technologies (2013), 319–323.
[28] Wilson, B.C. and Shrock, S. 2001. Contributing to Success in an Introductory Computer Science Course: A Study of Twelve Factors. SIGCSE Bull. 33, 1 (2001), 184–188.
[29] Wilson, C.R., Voorhis, V. and Morgan, B.L. 2007. Understanding power and rules of thumb for determining sample sizes. Tutorials in Quantitative Methods for Psychology. 3, 2 (2007), 43–50.