Validity of Interviews for Admissions Evaluation
E. G. Kelman and S. Canger
From the College of Veterinary Medicine
Cornell University, Ithaca, NY 14853-6401.
It is well documented that previous grades combined with standardized test scores are the best predictors of preclinical achievement in veterinary medical schools (1-5). Admissions committees often rely on subjective information, including references, personal statements, and interviews, to assess characteristics thought to be related to clinical performance. Although a few studies indicate that interview ratings are predictive of clinical competence (6) or of higher recommendations for internships (7) in allopathic medicine, it is not clear that interviews or other subjective information gathered at the time of admission predict personal qualities desirable in clinical practice (8-11). One correlational study found a significantly negative relationship between interview scores and grades in the 4th-year clinical courses at a college of veterinary medicine (2). Researchers cite methodological problems with correlational studies due to restricted range in either the predictor (interview scores) or the criterion (clinic grades). In veterinary colleges where interviews are conducted, the weight given to the interview ranges from 10 to 55 percent of the overall evaluation, with 25 percent of the total admissions evaluation as the modal weight given to the interview (12).
Predictive validity, or the ability to accurately estimate an outcome, depends on accuracy of measurement. Variability of interviewing procedures and interviewer biases of various kinds cause unreliability of interviewer scores, and without reliability there can be no validity. The most frequently cited interviewer errors that prevent reliability and validity of the interview as a measurement technique are: overweighting the first impression (making "snap" judgments); overgeneralizing from one trait to others (the "halo" effect); order effects (being first or last in a series, or biases due to following an interviewee who made either a strongly positive or very negative impression); losing control of the interview (inability to standardize the interview so that interviewees are compared similarly); naive implicit personality theories (undocumented assumptions about the relationships between traits, for example, that persons with red hair are emotionally volatile); and biases due to stereotypes about gender, race, national origin, sexual preference, or handicap (13-15).
Probably the greatest source of error variance is lack of agreement among interviewers with regard to desirable traits in applicants, or lack of construct validity. What one interviewer judges with favor may be viewed with disfavor by another interviewer (16, 17). Permitting interviewers great latitude to define for themselves the characteristics of interest in a candidate presents a similar problem of low reliability. Most medical schools' interview procedures are loosely to moderately structured, and interviewers receive minimal training (18), despite consensus among experts in the art and science of interviewing that a highly structured, patterned interview, conducted by a well-trained interviewer, is essential.
The admissions committee at the College of Veterinary Medicine at Cornell University was concerned that its interview scores could be unfair to applicants, given the difficulty of training a large number of interviewers and of monitoring admissions interviews to ensure a reasonably similar experience for each interviewee. In 1988, the committee decided to join 4 other U.S. veterinary medical schools in not utilizing interviews in the selection of new veterinary medical students. Faculty at Cornell agreed to this decision, based on a review of empirical studies about interviewing.
The class graduating in 1992 was the last group of Cornell veterinary medical students to receive an interview at admission. When this class began its fourth-year clinical rotations, a rare opportunity existed to study the validity of their interview scores. It was possible not only to relate the original interview ratings of these seniors to similar ratings made by their current clinical supervisors in the veterinary teaching hospital, but also to compare the clinicians' perceptions of the Class of 1992 with those of the succeeding class, which was not interviewed as part of its admissions process. Over a 2-year period, two separate studies were conducted. The purpose of the first study was to determine whether there were any relationships between interviewers' original assessments of applicants and the perceptions of supervising clinicians of these same individuals as seniors working in a clinical setting 4 years later. The rationale for this first study is that the interview purports to assess those personal characteristics associated with later good professional performance, and seniors' work in the Veterinary Teaching Hospital is a reasonable approximation of professional performance. The second study compared clinicians' ratings of the Class of 1992, which had received an interview, with the Class of 1993, which was not interviewed as part of its admissions evaluation. If having an interview is essential to identifying those personal characteristics thought to be appropriate to professional behavior, it would be reasonable to assume that a class selected without an interview would be perceptibly different from one selected with an interview.
Methodology of the First Study
Copies of the same interview rating forms originally used in the students' admissions interviews were distributed to clinicians in the Veterinary Teaching Hospital. Clinical faculty were asked to complete the forms for supervised students at the end of each rotation. The forms consisted of three rating scales, with a maximum of 22 points available. The three personal characteristics to be assessed were: communication skills (0-6 points); problem-solving ability (0-8 points); and social responsibility/dedication to professional service (0-8 points). These were the characteristics rated previously by two interviewers at the time of application. At least three clinicians' ratings were obtained for each student as a senior. A Pearson product-moment correlation was calculated between 78 students' average interview total scores and the average clinicians' total scores for those same individuals as seniors.
Results and Discussion of the First Study
The resultant correlation between interview ratings and clinicians' ratings was -0.068, indicating essentially no relationship between the earlier interview scores and the later clinicians' assessments on the same characteristics.
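The correlation statistic used here can be computed directly from paired score lists. The sketch below illustrates the calculation; the score values are hypothetical, since the individual students' data are not published in this article.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for paired scores."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical average scores (max 22 points) for a handful of students:
# each pair is (admissions-interview average, clinicians' rating average).
interview = [16.0, 15.5, 17.0, 16.5, 14.0, 18.0]
clinician = [14.0, 17.5, 12.0, 16.0, 15.5, 13.5]
print(round(pearson_r(interview, clinician), 3))
```

A value near zero, as the study obtained (-0.068), means the interview scores carried almost no linear information about the later clinical ratings.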
Analysis of variance (see Table 1), with the groups divided by gender, revealed:
- The 56 women in the study received lower scores as seniors than as applicants, with both sets of raters using a similar range in making their evaluations. The average for the group was much affected by a few outliers, that is, women who received very high admissions interview scores and very low ratings by clinicians.
- The 22 men in the study received, on the average, similar scores from clinical supervisors as from their interviewers four years earlier, but the range of scores was greater as seniors than it had been for them as applicants. A small number of men with mediocre interview scores had very high ratings by clinicians, based on their senior-year work in the hospital.
- Admissions interviewers for both men and women tended not to deviate much from the average score, whereas clinicians had a larger standard deviation for their ratings. This difference between the two sets of raters was especially pronounced in assessments of men in the Class of 1992.
Analysis of variance suggests a gender effect in accuracy of rating. The clinicians' ratings were presumably the more accurate, because they were based on at least 2 weeks of observation of actual behavior in a clinical setting, whereas interview evaluations rested on a half-hour conversation and theoretical situations posed by interviewers. Under that assumption, a major contributor to the inaccuracy of the interview in predicting later clinical behavior was that women were perceived to perform better, on average, in the largely social setting of the interview, whereas men were seen to perform better, on average, in the clinical work setting. Clinicians also used a wider range of scores in making their evaluations, possibly because they felt more confident in their ratings, having observed a much larger sample of behavior than the interviewers.
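The per-group comparisons above rest on the mean, standard deviation, and range of each group's scores, as reported in Table 1. The sketch below shows that computation; the score lists are hypothetical stand-ins, since the individual ratings are not published here.

```python
import math

def describe(scores):
    """Mean, sample standard deviation (n - 1 denominator), and range."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))
    return mean, sd, max(scores) - min(scores)

# Hypothetical clinicians' ratings for two groups of seniors:
women = [14.0, 16.5, 12.5, 15.0, 17.0, 13.0]
men = [16.0, 18.5, 13.0, 17.5, 15.5]
for label, group in (("women", women), ("men", men)):
    mean, sd, rng = describe(group)
    print(f"{label}: mean = {mean:.2f}, std. dev. = {sd:.2f}, range = {rng:.1f}")
```

A larger standard deviation and range for one set of raters, as Table 1 shows for the clinicians, indicates that those raters spread their judgments more widely across the scale.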
Table 1. Gender differences in admissions interview and clinical rotation ratings for the Class of 1992

                                            Females (n = 56)    Males (n = 22)
Interview ratings (by admissions committee)
  mean                                      16.036              16.144
  std. dev.                                 1.8534              1.3446
  range*                                    8 points            5.5 points
Clinical rotation ratings (by clinical staff in the Veterinary Teaching Hospital)
  mean                                      14.67               16.16
  std. dev.                                 2.797               2.387
  range*                                    7.9 points          10 points
Methodology of the Second Study
During the 1992-93 academic year, clinicians were asked to rate seniors on the same interview forms formerly used by the admissions committee in evaluating applicants, though these students had not been interviewed as applicants. Averages were calculated for the 77 seniors with 3 or more ratings in the Class of 1993. Overall averages for the Class of 1992 were compared with those of the Class of 1993.
Results and Discussion of the Second Study
Though the Class of 1992, which had been interviewed, had a slightly higher average clinicians' rating (15.09) than the Class of 1993 (14.62), the difference was not significant at the .05 level using the t-statistic. If faculty cannot perceive a difference, on the very characteristics measured in the interview, between a class admitted with an interview and a class not interviewed at admission, two conclusions are possible: either the interview made no difference in selecting for those characteristics, or the training in communication skills, problem-solving skills, and professionally appropriate attitudes and behavior provided during the 4-year curriculum eliminated any differences that may have existed prior to admission.
Two separate, but related, studies conducted over a 2-year period at the College of Veterinary Medicine at Cornell University confirmed findings of several earlier studies that interviewing may not be a valid assessment tool in making admissions decisions. These studies differed from earlier research on the validity of interviews in that interview ratings were compared not to later grades or residency evaluations, but to ratings of the identical personal characteristics assessed in the original interview, made on the same rating forms with the same rating scales by clinicians supervising senior veterinary medical students in clinical practice in the teaching hospital.
Predictive validity could not be shown in the first study where interviewers' ratings of communication skills, problem-solving skills, and professional ethics/responsibility had no correlation with later clinicians' assessments of the same individuals on the same characteristics. In the second study, faculty were unable to distinguish these characteristics between a class admitted with an interview and a subsequent class which had not been interviewed at admission.
References and Endnotes
1. Halm GC: Selecting the professional student. In March HL, Third Symposium on Veterinary Medical Education. East Lansing, MI: Michigan State University, 1966.
2. Kelman EG: Predicting success in veterinary medical college. JVME 5:92-94, 1982.
3. Layton WL: Predicting success of students in veterinary medicine. J Appl Psych 36:312-315, 1952.
4. Niedwiedz ER and Friedman BF: A comparative analysis of the validity of preadmissions information at four colleges of veterinary medicine. JVME 3:32-38, 1976.
5. Noeth RJ, Smith DS, Stockton JJ and Henry CA: Predicting success in the study of veterinary science and medicine. J Edu Res 67:213-215, 1974.
6. Korman M, Stubblefield RF and Martin LW: Patterns of success in medical school and their correlates. JME 43:405-411, 1968.
7. Murden R, Galloway GM, Reid JC and Colwill JM: Academic and personal predictors of clinical success in medical school. JME 53:711-719, 1978.
8. Gough HS, Hall WB and Harris RE: Evaluation of performance in medical training. JME 39:679-692, 1964.
9. Johnson DG: A multifactor method of evaluating medical school applicants. JME 37:656-665, 1962.
10. Kegel-Flom P: Predicting supervisor, peer, and self-ratings of intern performance. JME 50:812-815, 1975.
11. Mensh IN: Orientations of social values in medical school assessment. Soc Sci Med 3:339-348, 1970.
12. Association of American Veterinary Medical Colleges, Kelman EG (Ed.): Veterinary Medical School Admission Requirements in the United States and Canada. Rockville, MD: Betz Publishing Co., 1993.
13. Smart BD: Selection Interviewing: A Management Psychologist's Recommended Approach. New York: Wiley & Sons, 1983.
14. Uris A: 88 Mistakes Interviewers Make. New York: American Management Association, 1988.
15. Webster, EC: The Employment Interview: A Social Judgement Process. Schomberg, Ontario: S.I.P. Publications, 1982.
16. Mayfield EC and Carlson RE: Selection interview decisions: first results from a long-term research project. Personnel Psych 19:41-53, 1966.
17. Bingham W and Moore B: How to Interview. New York: Harper, 1959.
18. Johnson EK and Edwards JC: Current practices in admission interviews at US medical schools. Acad Med 66:408-412, 1991.