Paper of the Edgeworth Series in Quantitative Behavioral Science

(Paper No. ESQBS-95-10)

A Brief Report on

Student Evaluation of Teaching

Anita M. Hubley

Bruno D. Zumbo

Psychology Programme

Faculty of Health & Human Sciences

University of Northern British Columbia

When citing this paper, please use the following format (APA):


Hubley, A. M., & Zumbo, B. D. (1995). A Brief Report on Student Evaluation of Teaching (Paper No. ESQBS-95-10). Prince George, B.C.: University of Northern British Columbia. Edgeworth Laboratory for Quantitative Behavioral Science.

Running Head: Teaching Evaluation

0.0 Introduction

This brief working paper is in response to a request from Ken Prkachin (Chair, Psychology Program) that as experts in assessment and psychometrics, we reflect upon the current evaluation strategy that is being used. The purpose of this paper is to (a) give our colleagues a sense of the extensive literature that is available on student evaluation of teaching (SET), and (b) examine the currently used UBC graduate teaching evaluation form within this context. It should be stressed that we believe that it is crucial that the scholarly literature guide our evaluation process. We do not need another "in-house" evaluation tool that ignores the extensive literature available on this topic.

Before launching into this topic, let us remind the reader that at this point, this paper is only a brief report and is not meant to function as an extensive review of the literature with recommendations. For those who are interested in reading further on this topic (and beyond this paper), the most thorough review to date is the special issue of the International Journal of Educational Research (Marsh, 1987).

1.0 Background Information

It is apparent from the literature (e.g., Marsh, 1987) that the purpose of teaching evaluations is to assess teaching effectiveness; that is, to help instructors and the institution achieve effective teaching. Teaching evaluation is a high-stakes activity, and any decisions we make need to keep this in mind. High-stakes conditions can cause a great deal of resentment among faculty and/or students and can have very negative consequences for the institution because neither students' nor professors' interests are being served. We are in the unique situation of being able to establish an effective evaluation system without having to "destroy" a previously entrenched value system. Thus, it is important that we have an evaluation system that is effective rather than one that is simply running and giving us data (even if we do not know what those data mean).

2.0 Definitions

It would be prudent at this point to introduce a few terms from the teaching evaluation literature. The first set of important terms in the realm of teaching evaluations is formative and summative evaluation. As Murray (1987) states,

"In most institutions the results of student evaluations of teaching are used for one or both of two major purposes: diagnostic feedback intended to bring about improvement in faculty teaching performance -- usually termed formative evaluation; and as input to administrative decisions on faculty retention, tenure, promotion, and merit pay -- generally known as summative evaluations." (pp. 85-86)

The distinction between summative and formative evaluation is important because as Murray goes on to say, these have very different purposes; thus different types of rating forms are appropriate. In addition, it should be noted that whereas summative feedback is shared with university administrators, formative feedback is made available only to the instructor. This rightfully prevents formative feedback from serving a summative role.

The most useful rating form for administrative purposes focuses on global characteristics of teaching effectiveness that are under the professor's direct control. Furthermore, this general form should be standardized in that all professors use the same form in the same way. However, this general rating form is usually of limited use for formative evaluation because, by definition, it does not provide enough information about idiosyncratic teaching behaviours and reasons for low or high ratings. Murray (1987) further states that "faculty dissatisfaction and frustration can arise when, as is commonly the case, a global evaluation form designed for summative use is forced to do double duty in a formative role" (p. 86).

The second set of important terms is norm-referenced "testing" (NRT) and criterion-referenced "testing" (CRT). Many of our testing (or evaluation) procedures are norm-referenced; that is, an individual's performance is compared to everyone else's performance. There has been some discussion in the teaching evaluation literature to suggest that a more valuable approach would involve criterion-referenced evaluations. Such an approach would involve setting certain (teaching) objectives and criteria based on a job analysis. Then an individual's performance is compared, not to other individuals, but to each of the objectives or criteria set out. A criterion-referenced approach emphasizes the competency of teaching, whereas a norm-referenced approach focuses on competition in teaching.

3.0 Issues

One issue has to do with the commonly used "horse-race" model of teaching evaluation, in which a "within-group" ranking is conducted each term (often disguised as percentiles). This horse-race model is ineffectual on two counts: (a) it forces instructors to be compared to each other irrespective of their scores, and (b) it results in "negative washback". This latter term refers to the situation in which the evaluation process ultimately harms the educational system because of competition among colleagues and a greater interest in "scoring well" than in effective teaching. For example, such a system may ultimately result in less experimentation in one's teaching style (due to its high risk) and a greater reliance on activities that produce high student satisfaction (regardless of their educative value).
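The distortion introduced by within-group ranking can be seen in a small numerical sketch (the instructor labels and scores below are entirely hypothetical):

```python
# Hypothetical mean ratings for five instructors in one term.
scores = {"A": 4.62, "B": 4.58, "C": 4.55, "D": 4.51, "E": 2.10}

# A within-group ranking ignores the actual score values and spreads
# the group evenly across the percentile scale.
ranked = sorted(scores, key=scores.get, reverse=True)
n = len(ranked)
for rank, name in enumerate(ranked, start=1):
    percentile = 100 * (n - rank) / (n - 1)
    print(f"{name}: score {scores[name]:.2f} -> percentile {percentile:.0f}")
```

Instructor D trails the top score by barely a tenth of a rating point yet is reported at the 25th percentile; the ranking makes D's gap from A look like the same kind of gap that separates D from the genuinely low-scoring E.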

Another issue concerns the reporting of results from teaching evaluations. Marsh and Roche (1993) comment on the not-so-surprising fact that, since effective teaching is a multidimensional construct, measures of effective teaching also tend to be multidimensional. The debate in the literature, however, has to do with whether student evaluation of teaching scores should be reported as a single score or as a profile (for example, see Abrami & d'Appollonia, 1991).

Finally, as in all assessment contexts, an important aspect of the validity of the construct-level inference concerns the content and wording of items. It must be clear for both summative and formative purposes what is being measured. In terms of the content of the evaluation, items should be selected that reflect content that has been shown in the literature to be important. In terms of the wording of items, one must be careful to ensure that items actually measure what we want them to measure. For example, it is not clear what is being assessed in questions such as: (a) The environment in this course encouraged the equal participation/involvement of all students, or (b) The professor belittles students. In the first case, the cause of a "poor" environment is automatically attributed to the instructor despite the possibility that many other factors play a role. In the second case, students are asked to perceive what their classmates are thinking.

4.0 Recommendations

We suggest the following tentative recommendations:

1. We should focus on evaluating teaching effectiveness only -- as opposed to other dimensions that have been suggested in the literature.

2. The teaching evaluation form should be kept short and simple. This increases student participation and attentiveness to the questions being asked. This would also suggest that complicating items such as an "importance" rating should be avoided.

3. In calculating summative information, the median should be computed rather than the mean to avoid information being unduly influenced by outliers and extreme scores.
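As an illustration (with entirely hypothetical ratings, coded 1 = Strongly Disagree through 5 = Strongly Agree), a few extreme ratings shift the mean noticeably while leaving the median untouched:

```python
from statistics import mean, median

# Hypothetical class of eleven students: mostly favourable ratings,
# plus two extreme low ratings.
ratings = [5, 5, 4, 5, 4, 4, 5, 4, 5, 1, 1]

print(f"mean:   {mean(ratings):.2f}")  # pulled toward the two outliers
print(f"median: {median(ratings)}")    # unaffected by the outliers
```

Here the mean falls below 4 solely because of the two outlying ratings, whereas the median still reflects the typical response of the class.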

4. We need to consider which items should be used for summative purposes and which items should be used for formative purposes. As described above, summative and formative evaluations serve different purposes and items should not be used for both purposes (see Murray, 1987).

5. The most thoroughly investigated measure is Marsh's SEEQ (particularly since Marsh tends to be rather prolific). Thus, we will make our recommendations for a "tentative" evaluation form based on Marsh's items (see attached). Our guidelines in proposing a measure are that we want a short form that is essentially unidimensional (so we can add the items up to obtain a score), but taps a broad enough domain to be of some use to the instructor for formative purposes. We have used Abrami and d'Appollonia's (1991) analysis of the SEEQ to help create some salient items to measure teaching effectiveness. We also considered some of the items on the UBC form and the CNC form that we felt were particularly relevant to UNBC.
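To make the "add the items up" step concrete, here is a minimal scoring sketch. The 5-to-1 coding of SA through SD and the exclusion of "na" responses are our assumptions for illustration, not something fixed by the form itself:

```python
# Assumed coding of the Likert categories (SA highest).
CODES = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

def total_score(responses):
    """Sum the coded items, skipping 'na'; return (total, items counted)."""
    scored = [CODES[r] for r in responses if r != "na"]
    return sum(scored), len(scored)

# One hypothetical student's responses to the nine Likert items.
one_student = ["SA", "A", "SA", "A", "U", "na", "A", "SA", "A"]
total, n_items = total_score(one_student)
print(f"summed score: {total} over {n_items} items")
```

Reporting the number of items actually scored alongside the total matters, since "na" responses leave different students with different maximum possible scores.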

6. We also recommend that UNBC consider a CRT approach to the summative aspect of student evaluation of teaching. Such an approach would entail (a) a job analysis to set teaching objectives, (b) the development of a form based on such a job analysis and incorporating items from the literature (which is primarily norm-referenced), and (c) several "standard setting" exercises to set the cut-off scores for the various categories. We suggest that the university consider a minimum competency approach that would potentially result in categories such as:

(a) not acceptable

(b) acceptable (i.e., has attained minimum competency)

(c) exceptional.

We recommend that the principles described by Richard Jaeger in Laveault, Zumbo, Gessaroli, and Boss (1994) be followed in the development of a CRT measure.
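A minimal sketch of the classification step of such a CRT approach follows. The cut-off values below are placeholders of our own invention; the actual cut scores would have to come from the standard-setting exercises described above:

```python
# Hypothetical cut scores on a summed rating scale; in practice these
# would be set by standard-setting exercises, not chosen arbitrarily.
CUT_ACCEPTABLE = 27    # assumed minimum-competency cut-off
CUT_EXCEPTIONAL = 40   # assumed cut-off for the top category

def classify(score):
    """Compare a score to fixed criteria rather than to other instructors."""
    if score < CUT_ACCEPTABLE:
        return "not acceptable"
    if score < CUT_EXCEPTIONAL:
        return "acceptable"
    return "exceptional"

for s in (22, 33, 43):
    print(s, "->", classify(s))
```

Note the contrast with the norm-referenced horse-race model: every instructor in the group can be classified "acceptable" or "exceptional" here, since no one's category depends on anyone else's score.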

7. Since student evaluations of teaching are just one piece of evidence regarding an individual's teaching effectiveness, we strongly recommend that a "portfolio assessment scheme" be considered. Portfolios essentially allow an instructor to gather "evidence" of their teaching effectiveness. Materials that would comprise such a portfolio might include teaching evaluations, course syllabi, student testimonials, articles/presentations on teaching, membership in teaching sections (e.g., Section 2 of APA; Section 15 of CPA), and descriptions of self-developed class demonstration techniques or distance education courses.


5.0 References

Abrami, P. C., & d'Appollonia, S. (1991). Multidimensional students' evaluations of teaching effectiveness -- Generalizability of "N=1" research: Comment on Marsh (1991). Journal of Educational Psychology, 83, 411-415.

Laveault, D., Zumbo, B.D., Gessaroli, M. E., & Boss, M. W. (Eds.). (1994). Modern Theories of Measurement: Problems and Issues. Ottawa, Ont.: University of Ottawa.

Marsh, H. W. (1987). Students' evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253-388. (Entire Issue No. 3).

Marsh, H. W., & Roche, L. (1993). The use of students' evaluations and an individually structured intervention to enhance university teaching effectiveness. American Educational Research Journal, 30, 217-251.

Murray, H.G. (1987). Acquiring student feedback that improves instruction. In M.G. Weimer (Ed.), New Directions for Teaching and Learning (No. 32). San Francisco, Calif.: Jossey-Bass.


TEACHING EFFECTIVENESS QUESTIONNAIRE

The purpose of this questionnaire is to obtain information about this course. This information will be used as feedback (a) to the course instructor, and (b) to the department, to aid in course planning and in individual contract renewal, tenure, promotion, and salary decisions. You are urged, therefore, to respond to the following statements carefully and thoughtfully.

Note: The results of this teaching effectiveness questionnaire will be shared with the instructor only after the final grades have been assigned.

For your response to the statements on the next page, please circle the category that you feel best represents your thoughts.

SA - Strongly Agree

A - Agree

U - Undecided

D - Disagree

SD - Strongly Disagree

na - Not Applicable

Course No.: _________________ Instructor: _________________

1. At the beginning of the course, the instructor provided a clear, written outline describing the course requirements.

SA A U D SD na

2. The instructor was well-prepared for class.

SA A U D SD na

3. The instructor communicated the course material effectively.

SA A U D SD na

4. The instructor showed enthusiasm for the subject matter.

SA A U D SD na

5. The instructor encouraged class participation and/or discussion.

SA A U D SD na

6. The instructor was sensitive to gender and/or cultural issues.

SA A U D SD na

7. The instructor was helpful when students had difficulty with the course material.

SA A U D SD na

8. The instructor was available and willing to consult with students outside of class times (e.g., during office hours).

SA A U D SD na

9. The instructor gave feedback and/or returned assignments or exams within a reasonable time period.

SA A U D SD na

10. Overall, I would rate the instructor as:

Excellent Good Satisfactory Poor Very Poor

General Comments (from students):

Please record any comments you have about this course and the instructor in the space below, omitting any information that would be likely to identify you personally. This information will be made available only to the instructor and only after all course grades are assigned. Please do not identify yourself on this form.

(Note: instructors may provide their own questions for formative feedback - e.g., Please comment on whether the textbook was easy to read and understand. OR Please comment on whether the laboratory assignments were helpful in understanding the course content.)