JVME v20n3: Extended Matching Questions: An Alternative to Multiple-choice or Free-response Questions

Volume 20, Number 3	1993

Extended Matching Questions: An Alternative to Multiple-choice or Free-response Questions

R. B. Wilson and S. M. Case

Dr. Wilson is professor of Veterinary Pathology and Basic Medical Sciences at Washington State University, Pullman, WA 99164. Dr. Case is Senior Evaluation Officer, National Board of Medical Examiners, 3930 Chestnut Street, Philadelphia, PA 19104.

Introduction

A variety of methods are available for student assessment including global faculty ratings, structured oral examinations, standardized patient simulations, patient management problems, computer-based simulations, free-response questions (essay and short answer), and various forms of multiple-choice questions. Each method has inherent strengths and weaknesses associated with its reproducibility, validity and utility. The purpose of this paper is to discuss the use of extended matching questions as an alternative to multiple-choice questions or free-response questions in student assessment.

Free-response questions are commonly believed to test important higher-order skills, whereas multiple-choice questions are thought to assess only knowledge of isolated facts ( 1 ) or, as Newble et al. stated, "a combination of what the student knows, partially knows, can guess, or is cunning enough to surmise from cues in the questions." ( 2 ) Some of the flaws in multiple-choice questions can be overcome by following important construction principles ( 3 ). But, as commonly used, multiple-choice examinations often place undue emphasis on recall and stimulate students to learn in a like mode. On the positive side, scoring reproducibility for multiple-choice questions is excellent and many topic areas can be sampled in a short time.

Free-response questions are not without disadvantages. They require students to guess, to some degree, what the author intended and what the grader (sometimes not the author) will reward. This ambiguity can reduce the reliability and validity of scores. Some students might simply be "lucky" or "unlucky" with respect to guessing exactly what the author had in mind or what the grader will expect. A second disadvantage is that free-response items usually sample a relatively small portion of the topic area. A third disadvantage is that free-response questions must be hand-scored, which is cumbersome, time-consuming and resource intensive. Most importantly, subjectivity in grading can reduce score reliability and validity, particularly for longer essay questions. In one study, the reliability correlation for scoring essay questions at six-month intervals, by the same grader, was only 0.35 ( 4 ). Scoring reliability is less of a problem with short-answer questions. Methods are available to minimize subjectivity and eliminate bias from scoring free-response items, but such methods also increase the cost of scoring ( 5 ).

Extended matching questions are reasonable alternatives to either multiple-choice questions or free-response questions and have advantages of each in that application of knowledge can be tested and the reliability of scoring is high ( 6 ). Extended matching questions allow one to ask questions where any number of answers from a large provided list may be correct or incorrect. Examinees may also rank, from a list of possibilities, those that are more correct. The same list of possible answers can pertain to any number of independent test items. There is no restriction on the number of times a given answer may be correct. Relative to other multiple-choice formats, there is less cuing and less chance of examinees guessing the correct response, both because there are more options and because the list contains all relevant responses. When large numbers of options are used, extended matching questions become more like free-response questions, forcing the examinee to evaluate each option individually ( 7 ).
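The effect of option-list length on guessing can be made concrete with a small calculation. The sketch below is illustrative only and rests on simplifying assumptions that are not part of the original discussion: each item has a single keyed answer, the examinee guesses blindly and uniformly among all options, and items are independent. The function names are hypothetical.

```python
from fractions import Fraction


def blind_guess_chance(num_options: int) -> Fraction:
    """Chance of hitting the single keyed option by uniform random guessing."""
    if num_options < 1:
        raise ValueError("an item needs at least one option")
    return Fraction(1, num_options)


def expected_guess_score(num_items: int, num_options: int) -> Fraction:
    """Expected number correct when every item in a set is guessed blindly.

    Assumes all items share one option list and each option may be keyed
    once, more than once, or not at all, so every item has the same chance.
    """
    return num_items * blind_guess_chance(num_options)


if __name__ == "__main__":
    # Compare a short traditional list with longer extended matching lists.
    for n in (4, 7, 12, 20):
        chance = blind_guess_chance(n)
        print(f"{n:2d} options: chance per item = {float(chance):.3f}, "
              f"expected correct on 10 items = {float(expected_guess_score(10, n)):.2f}")
```

Under these assumptions the chance of a lucky guess falls in direct proportion to the length of the option list. Real examinees rarely guess blindly, so the figures are best read as an upper bound on the benefit of pure guessing rather than as a model of examinee behavior.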

Extended Matching Format

The extended matching format has been used for a number of years in several medical specialty board exams and in the Part III examination of the National Board of Medical Examiners (NBME); it is also used in the U.S. Medical Licensing Examination. Well-constructed extended matching sets include four elements: 1) a theme, 2) an option list, 3) a lead-in statement, and 4) two or more item stems, as illustrated below. (The number of stems is limited in these examples in the interest of conserving space.)

Theme: Edema, pathogenesis
Options:

A. Endothelial cell damage
B. Excessive salt retention
C. Hypersecretion of aldosterone
D. Hypersecretion of antidiuretic hormone
E. Increased capillary pressure
F. Reduced plasma protein concentrations
G. Lymphatic blockage

Lead-in:
For each patient, select the pathophysiologic mechanism to best explain the edema. Each option can be used once, more than once, or not at all.

Items:
1. Ascites, hydrothorax and hydropericardium in a 10-year-old dog with glomerulonephritis.
2. A 6-year-old Shorthorn cow developed marked pulmonary edema and dyspnea 8 days after being moved from a dry summer range to a lush pasture of young grasses and clover.

Additional items might cover some of the other pathophysiologic mechanisms. The items above are constructed with relatively short, focused stems. Alternatively, examinees could be challenged to identify key diagnostic information intermingled with incidental findings using longer vignettes as follows:

Theme: Chronic Renal Disease, Diagnosis
Options:
A. Bilateral renal hydronephrosis
B. Chronic interstitial nephritis
C. Familial renal disease
D. Glomerulonephritis
E. Hypercalcemic nephropathy
F. Obstructive uropathy
G. Polycystic kidneys
H. Pyelonephritis
I. Renal amyloidosis
J. Renal neoplasia
K. Renal vascular disease
L. Tubulointerstitial fibrosis

Lead-in:
For each patient with urological abnormalities, based upon the information provided, select the most likely diagnosis. Each option can be used once, more than once, or not at all.

1. A male, 9-year-old miniature poodle had a one-month history of inappetence, weight loss, intermittent vomiting, gingivitis, depression, polydipsia, and polyuria.

Laboratory Test Results
PCV	23%
Total WBC	6,400/uL
Total plasma protein	6.0 g/dL
BUN	295 mg/dL
Creatinine	9.2 mg/dL
Cholesterol	l140 mg/dL
Phosphorus	16 mg/dL
Calcium	9.8 mg/dL
Antinuclear antibody	Negative
LE cell	Negative
Urinalysis:
sp. gr	1.010
pH	5.6
Protein	3+
Sediment	Not significant
Renal biopsy:
Direct immunofluorescence	IgG and C3 in glomerular mesangium and subendothelium
Congo-red	Negative

2. A 16-year-old mare had a 3-month history of generalized weakness, weight loss, polyuria and polydipsia. The animal voided small amounts of urine frequently.

Laboratory Test Results
PCV	29%
Total WBC	27,200/uL
Total plasma protein	6.5 g/dL
BUN	180 mg/dL
Creatinine	8.9 mg/dL
Cholesterol	162 mg/dL
Phosphorus	15.8 mg/dL
Calcium	9.7 mg/dL
Urinalysis:
sp. gr	1.009
pH	6.9
Protein	1+
Sediment
RBC	4-5/high-power field
WBC	6-8/high-power field
Culture	100,000 colonies S. aureus/dL

At necropsy: Radial bands of neutrophils, fibrin, edema, and erythrocytes in the interstitium and within tubular lumens extended from the renal pelvis to the cortex. Glomeruli were atrophic.

Identifying the Theme for a Set
The theme is the topic addressed by a set of items. Themes may be anatomic sites, cell types, clinical signs, laboratory data, functions, a class of drugs, pathogens, pathophysiologic mechanisms, etc.
Option List
The option list provides the response choices that apply to the items in the set. The number of options may be variable. Relatively long option lists allow the inclusion of all relevant options, rather than requiring item writers to "guess" which 3 or 4 distractors would work best in a traditional multiple-choice question. Low-ability examinees benefit from a restricted number of options ( 8 ). Sets can be made more or less difficult by altering the option lists in terms of number of options (to 20 or more) and the degree of discrimination among options. Options may be single words, short phrases or more creative forms such as pictorial material. For example, a labeled drawing or electron photomicrograph of a cell might serve as an "option" list with stems challenging examinees to identify structure/function relationships. Options should be listed in alphabetical order to minimize cuing, unless another logical ordering is possible.
Lead-in Statement
A single lead-in statement is used for all items in a set. It provides directions for the set and indicates the relationship between the stems and the options. Lead-in statements in the above examples require the examinee to select a single best response. Lead-ins can be written to require the student to select more than one response or to order the responses in some way. Care must be taken to give explicit directions. Sets without lead-ins or with nonspecific lead-ins should be avoided because they often pose ambiguous tasks.
Item Stems
A useful form of item stem is the clinical vignette, which describes a patient in a clinical situation. Carefully selected and crafted stems of this type can be used beginning in the first-year courses to draw relationships between basic and clinical sciences. Structure-function relationships are also excellent forms of item stems in the basic sciences. Vignettes can range from complete patient descriptions to brief presentations. Extended matching items can be written in a nonvignette format. However, care must be taken not to focus on recall of isolated facts or on simple associations. To minimize cuing, stems within a set should be of similar structure.
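For readers who maintain item banks electronically, the four elements described above map naturally onto a small record type. The following is a minimal sketch, not a description of any existing testing software: the class name, field names, and scoring rule (one point per stem for a single best response) are assumptions made for illustration. The option list and lead-in are taken from the edema example; the answer key shown is hypothetical, since no key is given with the example.

```python
from dataclasses import dataclass, field


@dataclass
class ExtendedMatchingSet:
    """One extended matching set: theme, option list, lead-in, item stems."""
    theme: str
    lead_in: str
    options: dict[str, str]                        # option label -> option text
    stems: list[str] = field(default_factory=list)
    key: list[str] = field(default_factory=list)   # keyed option label per stem

    def validate(self) -> None:
        """Check the structural rules for a well-constructed set."""
        if not self.lead_in.strip():
            raise ValueError("sets without lead-ins should be avoided")
        if len(self.stems) < 2:
            raise ValueError("a set needs two or more item stems")
        if len(self.key) != len(self.stems):
            raise ValueError("one keyed answer is needed per stem")
        for label in self.key:
            if label not in self.options:
                raise ValueError(f"keyed label {label!r} is not in the option list")

    def score(self, responses: list[str]) -> int:
        """Number of stems answered with the keyed option (single best response)."""
        self.validate()
        if len(responses) != len(self.stems):
            raise ValueError("one response is needed per stem")
        return sum(given == keyed for given, keyed in zip(responses, self.key))


edema = ExtendedMatchingSet(
    theme="Edema, pathogenesis",
    lead_in=("For each patient, select the pathophysiologic mechanism to best "
             "explain the edema. Each option can be used once, more than once, "
             "or not at all."),
    options={
        "A": "Endothelial cell damage",
        "B": "Excessive salt retention",
        "C": "Hypersecretion of aldosterone",
        "D": "Hypersecretion of antidiuretic hormone",
        "E": "Increased capillary pressure",
        "F": "Reduced plasma protein concentrations",
        "G": "Lymphatic blockage",
    },
    stems=[
        "Dog with glomerulonephritis (stem abbreviated)",
        "Cow moved to lush pasture (stem abbreviated)",
    ],
    key=["F", "A"],  # hypothetical key for illustration; not given in the example
)

if __name__ == "__main__":
    print(edema.score(["F", "E"]))  # one examinee's responses, one per stem
```

Because the same option may be keyed for any number of stems, the key is stored per stem rather than per option. A lead-in that asks for several responses or for a ranking would need a different key type and scoring rule.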
Amplified Extended Matching Questions
The full range of higher order skills cannot be assessed without requiring some writing from the student. Neither extended matching nor multiple-choice questions are adequate for assessing the generative skills. Essay and short answer questions can be used for this purpose, but alternatively, students might be asked to justify, in a sentence or two, their answers or some of their answers to extended matching questions. For example, examinees might be asked to justify every sixth question or two of ten questions marked with an asterisk. Justifications can be written on a separate page or on the back of the test page. The justification can be used to assign partial credit or only as a check on literacy. This variation provides assessment of the examinee's reasoning and literacy and reduces "sneak-a-peek" cheating. It also increases the time needed to score an examination.

Discussion

Relative characteristics of 4 types of examinations are summarized in Table 1, which assumes that each type is constructed to assess application of important knowledge and is free of technical flaws. Extended matching questions provide a reasonable compromise between free-response questions and traditional multiple-choice questions, retaining most of the advantages of each and avoiding many of the disadvantages. The extended matching variation with one best response has maximum utility where the number of examinees is large. Amplified extended matching questions are useful with smaller groups. The purpose of the test should receive highest consideration before selecting the test type, but practical factors are also important.

Test content should follow from associated educational goals and objectives. To the extent that these objectives are explicit, test content and item writing follow directly. Even if the objectives are implicit, extended matching questions are generally easy to prepare because of the inherent organization provided by themes, lead-ins, and option lists. Authors can be assigned one or more themes; with a lead-in statement, the option list flows naturally from the theme and lead-in, and the item stems flow naturally from the option list. Each stem becomes a model for preparing additional items in the set. Even new item writers should be able to generate 10 useable items per hour. Of critical importance in the formulation of any test is a careful review of questions. We recommend group item writing and review sessions, and have found that preparing and reviewing items individually generally requires more time and produces poorer test material than doing so in groups. Banks of questions that are well constructed and well targeted can be created and used for years to come.

Table 1. Relative characteristics of 4 types of examinations.*

	Essay	Short Answer	Multiple-Choice	Extended-Matching
Application of Knowledge	Excellent	Good	Poor	Good, can be improved with justification
Assessment	Excellent	Good	Poor	Poor to good if justification is required
Coverage of Topic	Poor	Good	Excellent	Excellent
Reliability of Score	Poor to Fair	Good	Excellent	Excellent
Ease of Scoring	Poor	Moderate	Excellent	Excellent
Preparation time	Minimal to Moderate	Moderate	Large, if properly done	Moderate
Total Costs	Large	Moderate	Low**	Low**
Cheating (Sneak-a-Peek)	Most Difficult	Difficult	Easy	Easy unless justification is required

* Characteristics of examinations vary depending on the construction and context of the questions. A well-constructed multiple-choice question might better assess cognitive skills than a poorly constructed essay question. The relative characteristics here assume well-constructed questions of all types.

** Particularly with large numbers of examinees.

References and Endnotes

1. McGuire C: Written methods for assessing clinical competence. In Hart I, Harden R, Eds: Further Developments in Assessing Clinical Competence. Montreal: Can-Heal Publications, 1987, pp 46-58.

2. Newble D, Baxter A, Elmslie R: A comparison of multiple-choice and free-response tests in examinations of clinical competence. Med Educ 13:263-268, 1979.

3. Shively MJ: Improving the quality of multiple-choice examinations. J Vet Med Educ 5:76-76, 1978.

4. Scrivin M: Beyond multiple-choice--but this side of essay questions. Paper presented at International Conference on Critical Thinking. Rohnert Park, CA, August 1991.

5. Milton O: Improving achievement via essay exams. J Vet Med Educ 6:108-112, 1979.

6. Case SM, Swanson DB: Extended matching items: a practical alternative to free-response questions. Teaching and Learning in Medicine, in press.

7. Veloski J, Robinowitz H, Robeson M: Cuing in multiple-choice questions: a reliable, valid and economical solution. In Research in Medical Education. Washington, DC: Association of Medical Colleges, 1988, pp 195-200.

8. Case SM, Swanson DB: Evaluating diagnostic pattern: a psychometric comparison of stems with 15, 5, and 2 options. Paper presented at Annual Meeting of the American Education Research Association, San Francisco, April 1989.
