Article information:
To cite this document: Dmitriy V. Chulkov, Jason Van Alstine, (2012), "Challenges in designing student teaching evaluations in a business program", International Journal of Educational Management, Vol. 26 Iss: 2, pp. 162-174.
Permanent link to this document: http://dx.doi.org/10.1108/09513541211201979
Challenges in designing student teaching evaluations in a business program

Dmitriy V. Chulkov and Jason Van Alstine
School of Business, Indiana University Kokomo, Kokomo, Indiana, USA

Received 27 October 2010; revised 13 December 2010 and 7 March 2011; accepted 31 March 2011

Abstract
Purpose – This article aims to present an empirical analysis of the effects of changes in the student teaching evaluation (STE) form in a business school.
Design/methodology/approach – The authors discuss a case of STE re-design in a business school that focused on improving the STE instrument. They utilize empirical data collected from students who completed both the original and the revised STE form in several semesters of undergraduate economics courses to examine the effect of changing the evaluation scale and the fashion in which written comments are solicited.
Findings – There are three results of interest to departments considering a change to student evaluation instruments. First, the authors find that a shift from a four-point scale to a five-point scale leads to a decrease in evaluation scores even after making an adjustment for scaling. Second, they find that students tend to give lower scores on comparison-type questions that ask for a comparison of the instructor or the course to the student's entire college experience. A larger share of such comparison-type questions may depress the mean scores on composite evaluations. Third, soliciting written feedback in a specific section of the form is an effective way to increase both the number of written comments and the size of each comment.
Practical implications – Student teaching evaluations serve as an assessment instrument and are frequently used in faculty promotion decisions. A discussion of best practices in designing the STE is provided in order to caution stakeholders about the problems that may arise and to guide academic institutions in the review of evaluation procedures.
Originality/value – The authors start with an example of STE re-design and then analyze empirical data from several semesters. Analysis of the literature and empirical evidence leads to recommended best practices that make STE data more useful both as a summative measure for administrative decisions and as a formative measure used by faculty looking to improve their teaching skills and course design.
Keywords Teaching evaluation, Evaluation methods, Assessment, Business education, Teachers, Business schools, Change management
Paper type Research paper
A score of 4 on this question was not necessarily worse than 2 or 3. Thus, averaging the
answers to this question did not provide a summary measure with a clear
interpretation. Furthermore, comparing this average with the averages of questions that
had an ordinal scale could produce misleading results. There were a total of six
questions on the original STE with no ordinal answer scale. Most of these questions
appeared in the course evaluation section of the STE, with only one such question in
the instructor evaluation section of the STE. Thus, the problem with non-ordinal scale
questions may have skewed the mean for the course evaluation questions for each
instructor more than the mean of the instructor evaluation questions. The department
used the STE to evaluate the mean performance on course-related questions separately
from the mean performance on the instructor-related questions. A bias caused by
non-ordinal scales makes the comparison of such means questionable.
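This issue can be seen with a small numeric example. The following sketch (in Python, with made-up response data) shows two response patterns that produce the same mean even though the underlying evaluations are not comparable:

from statistics import mean

# Hypothetical question whose answer choices label distinct problems rather
# than ranked levels of quality, so the numbers carry no rank order.
section_a = [1, 1, 4, 4]  # half chose "excellent", half flagged one serious problem
section_b = [2, 3, 2, 3]  # every student flagged a mild problem

# Both averages print 2.5: identical means, very different evaluations.
print(mean(section_a), mean(section_b))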
A second issue concerned the evaluation scales on the original STE. Two questions used a five-point scale, while the 15 remaining questions used a four-point scale. Obviously, this made comparison of the means misleading. Furthermore, as the comparison of the mean answers to evaluation questions is frequently used, clear standards for the evaluation scale must exist to make these comparisons meaningful.
An examination of the original STE also revealed that the four-point scale answer choices did not have the same midpoint for the answer scale. One would expect a four-point scale to have two answer choices above the average and two answer choices below the average. This was not always the case. For instance, the question "Speaking ability" had the following answer choices:
(1) Voice and demeanor excellent.
(2) Average.
(3) Poor speaking, distracting.
(4) Poor speaking, a serious handicap.
The average choice here was answer choice 2, rather than the point between 2 and 3. Meanwhile, for several of the other questions an answer of two implied that the instructor was above the average for that particular question.
The third issue with the scale was discovered in the comparison with the other
schools and departments at the university. The other departments used scales in which
the highest answer choice was the best and the lowest was the worst. Having a scale
with the lowest score being the best put the School of Business at a disadvantage, as
the difference in scales had to be explained to the stakeholders at the campus level
every time the Business faculty STE scores were discussed.
Addressing these issues required making changes to the scale on the STE in order
to make the results ordinally comparable within the department and across the
departments at the university. The literature recommends using the five-point Likert
scale in the design of the STE (e.g. Frick et al., 2010). The three issues identified
previously are addressed by introducing a five-point scale as follows: 1 = strongly
disagree; 2 = disagree; 3 = undecided; 4 = agree; 5 = strongly agree. This scale
provides a clear midpoint. It also is consistent with the scales used by other
departments across the campus in that the highest answer choice is the best.
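The adjustment for scaling mentioned later in the paper can be illustrated with a linear transformation. The sketch below (Python) is our illustration rather than the department's documented procedure: it maps a four-point score onto the five-point range and flips direction, since the best answer was 1 on the original form but is 5 on the revised form.

def rescale(score, old_min=1.0, old_max=4.0, new_min=1.0, new_max=5.0):
    # Position of the score within the old range, as a fraction from 0 to 1.
    frac = (score - old_min) / (old_max - old_min)
    # Flip direction (1 was best on the old form, 5 is best on the new one)
    # and stretch onto the new range.
    return new_min + (1.0 - frac) * (new_max - new_min)

# A strong mean of 1.5 on the original four-point form corresponds to
# roughly 4.33 on the revised five-point scale.
print(rescale(1.5))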
Two other issues arose in the review of the original STE. The first involves the way
in which written comments were solicited from students. The original form provided
space to write additional comments at the side of the form in relation to any of the 17
questions on the form. The number of written comments actually submitted by the
students was low. The department decided to add specific questions to the form that
ask for optional written feedback. The following section presents empirical results that
demonstrate that including an explicit request for feedback and suggestions for
improvement helps increase the number of written comments received. The written
comments are important, as they complement the numerical rankings from the STE.
This additional evidence helps attenuate the issues with the validity of numerical STE
rankings raised by Onwuegbuzie et al. (2009). Complementing numerical rankings with
written comments and other teaching effectiveness data is advocated by studies of
STEs in business schools (e.g. McPherson et al., 2009; Kozub, 2010).
The final issue regards the nature and the phrasing of the questions. The original
form did not have a theoretical basis for the questions asked. The literature provides a
number of dimensions for evaluating teaching effectiveness. The STE should be used to
gather evidence along these dimensions. Lowman (1994) presents a two-factor model of teaching effectiveness in which the factors are intellectual excitement and interpersonal rapport. The basic idea of using two main factors to organize student ratings is supported by a number of other researchers, including Frey (1978), Cranton and Smith (1986), and Erdle et al. (1985). In these studies, one factor represents the pedagogical skills in the classroom, such as presentation and organization, and the other factor represents concern for or rapport with the students. These two dimensions are also consistent with those identified in the factor analysis of STEs by Abrami et al. (1997) as being the two most important. They also reflect two of the three roles of instructors (i.e. presentation and facilitation) identified in the factor analysis by Feldman (1976).
Using the two-factor model, the department focused on the pedagogical skills in the
classroom, such as presentation and organization, and the instructor's rapport with the
students. The revised form was designed to have two sections evaluated with the
Likert scale and an additional section that specifically asks for written feedback. The
17 STE questions on the revised form are divided into two sections dealing with the
instructor attributes and the course attributes, respectively. An optional written
comment section is placed at the end. This section solicits answers to four specific
questions designed to provide feedback to the instructor:
(1) What did you like most about the instructor?
(2) What did you like most about the course?
(3) What could your instructor do to most improve his/her effectiveness as a
teacher?
(4) What specific suggestions do you have for improving this course?
Writing the questions for the revised STE also involved changing the language in
order to clarify each question and to align each question with the Likert scale that was
introduced. For instance, the question phrased as "Ability to explain" on the original form
was changed to: "The instructor clearly explained the course material". In contrast, the
original STE relied on the answer choices on the four-point scale to clarify the nature of
the question. Moving away from the multiple scales toward the standard scale makes
the resulting data more comparable.
The issues with the scale of the STE form, the phrasing of the questions, and the way
in which the form solicits written feedback were addressed in the design of the revised
form. These issues may be present in the STE instruments used by other academic
institutions, especially in forms that were designed in an ad hoc fashion. In the following
we present an empirical analysis of the effects of changes to the STE instrument.
Of the 17 questions on both the original STE and the revised STE only nine questions can
be directly compared based on the content. Seven of these questions ask the student to
evaluate the characteristics of the instructor. Specifically, students are asked to evaluate
the instructor's knowledge of the subject, the instructor's enthusiasm about the subject,
the clarity of the instructor's explanations, the instructor's attitude towards students, the
instructor's openness and responsiveness to questions during class, the instructor's
organization of lectures, and how the instructor compares to other instructors the student
has had. The other two questions ask the student to evaluate characteristics of the course,
specifically the organization of the topics covered and the textbook used for the course.
Means of all instructor questions, all course questions, and all questions (including the
questions that cannot be directly paired) are also generated and studied as these measures
are often used as summary statistics to evaluate the effectiveness of an instructor.
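As an illustration of how such summary statistics can be generated from raw responses, the sketch below (Python with pandas; the column names are hypothetical labels, not the actual STE questions) computes the mean of the instructor questions, the mean of the course questions, and the overall mean:

import pandas as pd

# Hypothetical completed forms: one row per student, one column per question,
# with each column name prefixed by the section it belongs to.
responses = pd.DataFrame({
    "instr_knowledge": [5, 4, 5],
    "instr_clarity": [4, 4, 3],
    "course_topics": [4, 5, 4],
    "course_textbook": [3, 4, 4],
})

# Mean over all instructor questions, all course questions, and all questions.
instructor_mean = responses.filter(like="instr_").to_numpy().mean()
course_mean = responses.filter(like="course_").to_numpy().mean()
overall_mean = responses.to_numpy().mean()
print(instructor_mean, course_mean, overall_mean)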
4. Empirical results
4.1 Changing the scale on the STE form
The first step we take to empirically study the impact of changing the answer scale
between the two STEs is to compare the means of the responses from the original STE
and the revised STE. We choose to start at this point since mean response values have
become a common measure to evaluate an instructor's performance relative to the other
instructors in the department. Table II presents the t-statistics for a
comparison-of-means test. As a number of questions were changed on the revised
form, this comparison is only presented for the questions that can be paired between
the original and the revised forms. A negative t-value implies that the mean on the
revised STE is lower than on the original STE.
[Table II: comparison-of-means t-statistics by section; only header fragments ("Section", "Student participation", columns 1-6) survive in the source text.]

Table III. Comparison of means for evaluation-type and comparison-type questions

               Section 1  Section 2  Section 3  Section 4  Section 5  Section 6  Entire sample
Revised STE    3.03*      2.96*      1.83       1.56       0.03       0.78       3.39*
Original STE   1.77       3.11*      2.41*      0.41       1.30       -0.99      3.63*

Notes: Table reports t-statistics. *Significant at the 5 per cent level

Table IV. Comparison of written comments between the original and the revised STE

                    Section 1  Section 2  Section 3  Section 4  Section 5  Section 6  Entire sample
Number of comments  3.15*      3.82*      6.06*      5.15*      3.57*      3.85*      9.89*
Number of words     2.83*      2.55*      3.59*      5.30*      3.68*      2.93*      8.16*

Notes: Table reports t-statistics. *Significant at the 5 per cent level
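The t-statistics reported in these tables come from standard comparison-of-means tests. A minimal sketch of one such test follows (Python with scipy; the score vectors are made up, and the paper does not state whether equal variances were assumed). With the samples passed in this order, a negative t-statistic corresponds to a lower mean on the revised STE.

from scipy import stats

# Hypothetical per-student scores on one paired question, both already
# expressed on the five-point scale (original-form scores rescaled).
revised = [4, 3, 4, 4, 3, 5, 4, 2, 4, 3]
original = [5, 4, 4, 5, 3, 4, 4, 5, 3, 4]

# Welch's t-test; passing (revised, original) makes a negative t-statistic
# indicate a lower mean on the revised form.
t_stat, p_value = stats.ttest_ind(revised, original, equal_var=False)
print(t_stat, p_value)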
First, we find that a shift from a four-point scale to a five-point Likert scale leads to a decrease in evaluation scores even after making an adjustment for scaling. This means that departments and instructors should be careful before making conclusions when the scale on the STE is revised. Second, we find that students tend to give lower scores on comparison-type questions that ask for a comparison of the instructor or the course to the student's entire college experience. When making a change in STEs, one should consider the ratio of evaluation-type questions and comparison-type questions. A larger share of such comparison-type questions may depress the mean scores on composite evaluations. Third, soliciting written feedback in a specific section of the STE is an effective way to increase both the number of written comments and the size of each comment. Providing space for students to leave comments without directly asking them specific written-comment questions is not sufficient to elicit responses from students.
Other possible applications of theory to the design of an evaluation form include the
TALQ process described by Frick et al. (2010) or the holistic approach of Patel (2003).
Fourth, the questions should be written clearly. Clear phrasing of the question is
especially important if the STE instrument uses a standard scale for the answers. The
questions should not rely on providing additional explanations or qualifications in the
answer choices, as these make the data less comparable across questions.
Fifth, numerical STE scores should be supplemented by written comments. It is
recommended to establish a separate section for written comments, which includes
specific open-ended questions. Our empirical analysis demonstrates that including
such a section significantly increased the number of comments and the size of
comments in our sample. Increasing the number of written comments enhances the
assessment information available to the instructors and helps improve course design
and teaching effectiveness.
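The two measures compared in Table IV, the number of comments and the number of words, are straightforward to compute; the sketch below (Python; the comment strings are invented examples) counts non-empty comments and their total word count for each version of the form:

# Hypothetical written comments gathered from each version of the form;
# empty strings represent students who left no comment.
original_comments = ["", "good class", "", "", "more examples please", ""]
revised_comments = ["liked the group work", "", "the textbook chapters felt rushed",
                    "slow down in chapter 5", "more practice problems", ""]

def summarize(comments):
    # Return (number of non-empty comments, total words across them).
    nonempty = [c for c in comments if c.strip()]
    return len(nonempty), sum(len(c.split()) for c in nonempty)

print(summarize(original_comments))  # (2, 5)
print(summarize(revised_comments))   # (4, 17)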
Following these best practices makes STE data more useful both as a summative
measure used in administrative decisions about faculty tenure, promotion, and merit
pay and as a formative measure used by faculty looking to improve their teaching
skills and course design. Research on the validity and application of STEs is
ever-growing and continuous review and improvement of the process makes sure that
the STEs are helpful in improving student learning in Business programs and other
areas of higher education.
The limitations of this study include the fact that all empirical data were collected in
a public US university. International experiences with STE scales and question types
may be different, and additional research is needed to examine the applicability of the
findings to international data. Another limitation of this study is its scope, which is
empirical and is not designed to provide new theoretical developments. A direction for
future research is the development of a theoretical model that addresses the empirical
issues discovered in this study, including the fact that a change in scale may lead to
significantly lower evaluation scores even after adjustment for scaling, and the fact
that lower scores are observed on comparison-type questions.
Note
1. The original STE and the revised STE form are available from the authors on request.
References
AACSB (2010), "Eligibility procedures and accreditation standards for business accreditation", available at: www.aacsb.edu/accreditation/business_standards.pdf (accessed 12 December 2010).
Abrami, P., D'Apollonia, S. and Rosenfield, S. (1997), "The dimensionality of student ratings of instruction: what we know and what we do not", in Perry, R. and Smart, J. (Eds), Effective Teaching in Higher Education: Research and Practice, Agathon Press, New York, NY.
Badri, M., Abdulla, M., Kamali, M. and Dodeen, H. (2006), "Identifying potential biasing variables in student evaluation of teaching in a newly accredited business program in the UAE", International Journal of Educational Management, Vol. 20, pp. 43-59.
Becker, W. and Watts, M. (1999), "How departments of economics evaluate teaching", American Economic Association Papers and Proceedings, Vol. 89, pp. 344-9.
Bedggood, R. and Pollard, R. (1999), "Uses and misuses of student opinion surveys in eight Australian universities", Australian Journal of Education, Vol. 43, pp. 129-41.
Cranton, P. and Smith, R. (1986), "A new look at the effect of course characteristics on student ratings of instruction", American Educational Research Journal, Vol. 23, pp. 117-28.
D'Apollonia, S. and Abrami, P. (1997), "Navigating student ratings of instruction", American Psychologist, Vol. 52 No. 11, pp. 1198-208.
Dommeyer, C., Baum, P., Chapman, K. and Hanna, R. (2002), "Attitudes of business faculty towards two methods of collecting teaching evaluations: paper vs online", Assessment and Evaluation in Higher Education, Vol. 27, pp. 455-62.
Erdle, S., Murray, H. and Rushton, J. (1985), "Personality, classroom behavior, and student ratings of college teaching effectiveness", Journal of Educational Psychology, Vol. 11, pp. 394-407.
Feldman, K. (1976), "The superior college teacher from the students' view", Research in Higher Education, Vol. 5, pp. 243-88.
Frey, P. (1978), "A two-dimensional analysis of student ratings of instruction", Research in Higher Education, Vol. 9, pp. 69-91.
Frick, T., Chadha, R., Watson, C. and Zlatkovska, E. (2010), "Improving course evaluations to improve instruction and complex learning in higher education", Educational Technology Research and Development, Vol. 58, pp. 115-36.
Husbands, C. and Fosh, P. (1993), "Students' evaluation of teaching in higher education: experiences from four European countries and some implications of the practice", Assessment and Evaluation in Higher Education, Vol. 18, pp. 95-114.
King, M., Morison, I., Reed, G. and Stachow, G. (1999), "Student feedback systems in the business school: a departmental model", Quality Assurance in Education, Vol. 7, pp. 90-8.
Koh, H. and Tan, T. (1997), "Empirical investigation of the factors affecting SET results", International Journal of Educational Management, Vol. 11, pp. 170-8.
Kozub, R. (2010), "Relationship of course, instructor, and student characteristics to dimensions of student ratings of teaching effectiveness in business schools", American Journal of Business Education, Vol. 3, pp. 33-41.
Kulik, J. (2001), "Student ratings: validity, utility, and controversy", New Directions for Institutional Research, Vol. 109, pp. 9-25.
Liaw, S. and Goh, K. (2003), "Evidence and control of biases in student evaluations of teaching", International Journal of Educational Management, Vol. 17, pp. 37-43.
Lowman, L. (1994), "Professors as performers and motivators", College Teaching, Vol. 42, pp. 137-41.
McPherson, M., Jewell, R. and Kim, M. (2009), "What determines student evaluation scores?", Eastern Economic Journal, Vol. 35, pp. 37-51.
Marsh, H. (1987), "Students' evaluations of university teaching: research findings, methodological issues, and directions for research", International Journal of Educational Research, Vol. 11, pp. 253-88.
Marsh, H. and Dunkin, M. (1997), "Student evaluations of university teaching: a multidimensional perspective", in Perry, R. and Smart, J. (Eds), Effective Teaching in Higher Education: Research and Practice, Agathon Press, New York, NY.
Onwuegbuzie, A., Daniel, L. and Collins, K. (2009), "A meta-validation model for assessing the score-validity of student teaching evaluations", Quality and Quantity, Vol. 43, pp. 197-209.
Patel, N. (2003), "A holistic approach to learning and teaching interaction: factors in the development of critical learners", International Journal of Educational Management, Vol. 17, pp. 272-84.
Seldin, P. (1993), "The use and abuse of student ratings of professors", The Chronicle of Higher Education, Vol. 39 No. 46, p. A40.
Wachtel, H. (1998), "Student evaluation of college teaching effectiveness: a brief review", Assessment and Evaluation in Higher Education, Vol. 23, pp. 191-211.