JTE v4n1 - Post Hoc Analysis of Test Items Written by Technology Education Teachers

Volume 4, Number 1
Fall 1992

Post Hoc Analysis of Test Items Written by Technology Education Teachers
          W. J. Haynie, III
               Technology education teachers frequently author their
          own tests.  The effectiveness of tests depends upon many
          factors; however, it is clear that the quality of each
          individual item is of great importance.  This study sought
          to determine the quality of teacher-authored test items in
          terms of nine rating factors.
               Most testing in schools employs teacher-made tests
          (Haynie, 1983, 1990, 1991; Herman & Dorr-Bremme, 1982;
          Mehrens & Lehmann, 1987; Newman & Stallings, 1982).  Despite
          this dependence upon teacher-made tests, Stiggins, Conklin,
          and Bridgeford (1986) point out that "nearly all major
          studies of testing in the schools have focused on the role
          of standardized tests" (p. 5).
               Research concerning teacher-constructed tests has found
          that teachers lack understanding of measurement (Fleming &
          Chambers, 1983; Gullickson & Ellwein, 1985; Mehrens &
          Lehmann, 1987; Stiggins & Bridgeford, 1985). Studies have
          also shown that teachers lack sufficient training in test
          development, fail to analyze tests, do not establish
          reliability or validity, do not use a test blueprint, weight
          all content equally, rarely test above the basic knowledge
          level, and use tests with grammatical and spelling errors
          (Burdin, 1982; Carter, 1984; Gullickson, 1982; Gullickson
          & Ellwein, 1985; Hills, 1991). Technically, their tests are
          simplistic and depend upon short-answer, true-false, and
          other easily prepared items.  Their multiple-choice items
          often have serious flaws--especially in distractors (Haynie,
          1990; Mehrens & Lehmann, 1984, 1987; Newman & Stallings,
          1982).
               A few investigations have studied the value of tests as
          aids to learning subject content (Haynie, 1987, 1990, 1991;
          Nungester & Duchastel, 1982).  Many studies have shown time
          on task to be an important factor in learning (Jackson,
          1987; Salmon, 1982; Seifert & Beck, 1984).  Taking a test
          is an on-task learning activity.  Studies comparing testing
          with comparable on-task time spent in structured review of
          the material covered in class have had mixed results, but
          testing appears to be at least as effective as review in
          promoting learning (Haynie, 1990; Nungester & Duchastel,
          1982).  Research is lacking on the quality of tests and test
          items written by technology education teachers.
               The purpose of this investigation was to study the
          quality of technology education test items written by
          teachers.  Face validity, clarity, accuracy in identifying
          taxonomic level, and rates of spelling and punctuation
          errors were some of the determinants of quality assessed.
          Additionally, data were collected concerning teachers'
          experience levels, highest degree held, and sources of
          training in test construction. The following research
          questions were addressed in this study:
          1.  What types of errors are common in test items?
          2.  Do the error rate or types of errors in teacher-
              constructed test items vary with demographic factors?
          3.  Do teachers understand how to match test items to
              curriculum content and taxonomic level?
          SOURCE OF DATA
               Between April 23, 1988, and January 8, 1990, a team of
          15 technology education teachers worked to develop test
          items for a computerized test item bank for the North
          Carolina State Department of Public Instruction (SDPI).  The
          work was completed under two projects funded by SDPI and
          directed by DeLuca and Haynie (1989, 1990) at North Carolina
          State University.  The data for this study came from the
          items developed in those projects.
               The teachers were selected on the recommendation of
          supervisors, SDPI consultants, or teacher educators.  All
          were recognized as leaders among their peers, and most had
          been nominated for Teacher of the Year or Program of the
          Year commendation.  They were all active in the North
          Carolina Technology Education Association and supported the
          transition to the new curriculum.  Table 1 displays
          demographic data concerning the test item authors.
          TABLE 1
          Demographic Data for the Test Item Authors
                     Years of                     Undergraduate     Test &
                     Teaching    Highest          Test & Measure    Measure
          Author    Experience   Degree           Courses           Courses
             1           9       B.S.                0                 0
             2           5       B.S.                1                 0
             3          23       B.S.                0                 0
             4           4       B.S.                0                 1
             5           5       B.S.                0                 1
             6          23       M.Ed.               0                 1
             7          19       M.Ed.               0                 1
             8          17       M.Ed. + 2 yrs.      0                 2
             9          25       M.Ed.               0                 0
            10           5       M.Ed.               0                 0
            11           7       M.Ed.               0                 0
            12           7       B.S.                0                 0
            13           7       M.Ed.               0                 0
            14          15       B.S.                1                 0
            15           5       B.S.                1                 1
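          The author group in Table 1 can be summarized with a few
          lines of Python.  This is a minimal sketch: the tuples simply
          transcribe Table 1, and the summary measures chosen here
          (mean years of experience, degree counts, and the number of
          authors with no test and measurement course) are picked for
          illustration; they are not statistics the article reports.
          The field name tm_courses is a hypothetical label for the
          table's second "Test & Measure Courses" column.

```python
from statistics import mean

# Rows transcribed from Table 1:
# (author, years of teaching experience, highest degree,
#  undergraduate test & measure courses, tm_courses)
# tm_courses is a hypothetical name for the second
# "Test & Measure Courses" column.
AUTHORS = [
    (1, 9, "B.S.", 0, 0),
    (2, 5, "B.S.", 1, 0),
    (3, 23, "B.S.", 0, 0),
    (4, 4, "B.S.", 0, 1),
    (5, 5, "B.S.", 0, 1),
    (6, 23, "M.Ed.", 0, 1),
    (7, 19, "M.Ed.", 0, 1),
    (8, 17, "M.Ed. + 2 yrs.", 0, 2),
    (9, 25, "M.Ed.", 0, 0),
    (10, 5, "M.Ed.", 0, 0),
    (11, 7, "M.Ed.", 0, 0),
    (12, 7, "B.S.", 0, 0),
    (13, 7, "M.Ed.", 0, 0),
    (14, 15, "B.S.", 1, 0),
    (15, 5, "B.S.", 1, 1),
]


def summarize(authors):
    """Return simple descriptive statistics for the author table."""
    years = [a[1] for a in authors]
    # "M.Ed. + 2 yrs." is counted as a master's-level degree.
    masters = sum(1 for a in authors if a[2].startswith("M.Ed."))
    # An author has "any" course if either course column is nonzero.
    any_tm = sum(1 for a in authors if a[3] + a[4] > 0)
    return {
        "n": len(authors),
        "mean_years": round(mean(years), 2),
        "min_years": min(years),
        "max_years": max(years),
        "bachelors_only": len(authors) - masters,
        "masters_or_higher": masters,
        "no_tm_course": len(authors) - any_tm,
    }


if __name__ == "__main__":
    print(summarize(AUTHORS))
```

          For Table 1 this gives a mean of about 11.7 years of
          experience (range 4 to 25), eight authors whose highest
          degree was the B.S., seven with an M.Ed. or more, and seven
          authors who reported no test and measurement course at all.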
               Teachers came to the university campus for a workshop
          on April 23, 1988.  Project directors oriented teachers to
          the computerized test bank, reviewed the revised technology
          education curriculum, and explained how to develop good test
          items.  A 13-page instructional packet was also given to
          each author.  Because the authors received this training
          session and instructional packet, the findings may not
          generalize to teachers who lack such preparation.
               The authors were required to develop and properly code
          six items which were submitted for approval and corrective
          feedback before they were allowed to proceed.  The teachers
          who authored the items were paid an honorarium for their
               Each item was prepared on a separate sheet of paper
          with a coding sheet attached and completed by the teacher.
          The coding sheet identified the author, the specific
          objective tested, the taxonomic level, and information
          for the computerized system.  The project directors edited
          the items with felt-tip pens in contrasting colors on the
          teachers' original forms.
               The data for this investigation were the editing
          markings on the original test items submitted by the
          teachers.  Scores on nine scales were recorded for
          analysis.  Each scale was constructed so that a low score
          was optimal.  The scales were Spelling Errors
          (SE), Punctuation Errors (PE), Distractors (D), Key (K),
          Usability (U), Validity (V), Stem Clarity (SC), Taxonomy
          (TX), and an overall Quality (Q) rating.  After all of the
          ratings were completed, the General Linear Models (GLM)
          procedure was used for F tests, and Fisher's least
          significant difference (LSD) procedure was used for
          follow-up t tests where appropriate.
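          When the only factor is the author, a GLM F test reduces to a
          one-way analysis of variance, and Fisher's LSD compares two
          authors with a t statistic built on the pooled error mean
          square.  The sketch below assumes such a one-way layout (the
          reported F(14, 978) is consistent with it: 15 authors give
          15 - 1 = 14 and 993 items give 993 - 15 = 978 degrees of
          freedom).  The three score lists are invented for
          illustration, not the study's data, and p-values are left
          out.

```python
from math import sqrt


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for lists of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within, ms_within


def lsd_t(group_a, group_b, ms_within):
    """Fisher's LSD t statistic for one pair of groups.

    Uses the pooled ANOVA error term; refer it to a t distribution
    with the ANOVA's df_within degrees of freedom.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    se = sqrt(ms_within * (1 / len(group_a) + 1 / len(group_b)))
    return (mean_a - mean_b) / se


if __name__ == "__main__":
    # Hypothetical spelling-error counts for three authors.
    a = [0, 0, 1, 0, 2]
    b = [0, 0, 0, 0, 0]
    c = [1, 2, 1, 3, 2]
    f, df1, df2, mse = one_way_anova([a, b, c])
    print(f"F({df1}, {df2}) = {f:.2f}")
    print(f"LSD t (a vs c) = {lsd_t(a, c, mse):.2f}")
```

          With these made-up scores the script prints F(2, 12) = 8.40
          and an LSD t of -2.68 for authors a and c.  In the usual
          "protected" form of the LSD procedure, pairwise t tests are
          run only after the overall F is significant.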
               The frequency and percentage of scores for the 993
          items on the nine ratings, and mean scores of each factor,
          are shown in Table 2.  An item's SE rating indicates how
          many words were misspelled in the item.  There were 98 items
          (10%) that had one or more spelling errors.  Spelling
          errors are detrimental to good teaching and testing.  However,
          the literature shows that this problem is common to other
          TABLE 2
          Frequency and Percentage of Item Scores by Rating Category
                                           Frequency % of Items     Mean
                                            of Items  With Each     Item
          Rating Category          Score  With Score      Score    Score     SD
          Spelling Errors (SE)         0         895       90.1
                                       1          76        7.7
                                       2          11        1.1
                                       3           6        0.6
                                       4           3        0.3
                                       5           1        0.1
                                       6           1        0.1
            SE Totals                ---         993       100%     0.14   0.52
          Punctuation Errors (PE)      0         735       74.0
                                       1         220       22.2
                                       2          25        2.5
                                       3           4        0.4
                                       4           1        0.1
                                       5           8        0.8
            PE Totals                ---         993       100%     0.38   0.68
          Distractors (D)              0         447       45.0
                                       1         398       40.1
                                       2          95        9.6
                                       3          30        3.0
                                       4           9        0.9
                                       5          14        1.4
            D Totals                 ---         993       100%     0.79   0.96
          Key (K)                      0         889       89.5
                                       2         104       10.5
            K Totals                 ---         993       100%     0.21   0.61
          Usability (U)                0         249       25.1
                                       1         265       26.7
                                       2         159       16.0
                                       3         131       13.2
                                       4          74        7.5
                                       5          50        5.0
                                       6          21        2.1
                                       7          11        1.1
                                       8          16        1.6
                                       9          17        1.7
            U Totals                 ---         993       100%     2.02   2.04
          Stem Clarity (SC)            0         602       60.6
                                       1         352       35.4
                                       2          39        3.9
            SC Totals                ---         993       100%     0.43   0.57
          Taxonomy (TX)                0         835       84.1
                                       1         124       12.5
                                       2          34        3.4
            TX Totals                ---         993       100%     0.19   0.47
          Quality (Q)                  0         208       20.9
                                       1         235       23.7
                                       2         200       20.1
                                       3         129       13.0
                                       4          74        7.5
                                       5          58        5.8
                                       6          42        4.2
                                       7          17        1.7
                                       8          10        1.0
                                       9          12        1.2
                                      10           2        0.2
                                      11           3        0.3
                                      12           1        0.1
                                      13           1        0.1
                                      14           1        0.1
                                      15           0        ---
                                      16           0        ---
                                      17           1        0.1
            Q Totals                 ---         993       100%     2.28   2.20
          NOTE. There were 993 items.
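          The Mean Item Score, SD, and percentage columns of Table 2
          follow from the frequency column alone.  A minimal sketch,
          using the Spelling Errors row as the worked case; it assumes
          the SD is the sample standard deviation (n - 1 denominator),
          which the article does not state, although with 993 items
          the population formula gives the same value to two decimals.
          The function name describe_frequencies is hypothetical.

```python
from math import sqrt


def describe_frequencies(freq):
    """Return (n, mean, sample SD, percent by score) from {score: count}."""
    n = sum(freq.values())
    total = sum(score * count for score, count in freq.items())
    mean = total / n
    # Squared deviations, weighted by how many items got each score.
    ss = sum(count * (score - mean) ** 2 for score, count in freq.items())
    sd = sqrt(ss / (n - 1))
    percents = {score: 100 * count / n for score, count in freq.items()}
    return n, mean, sd, percents


# Spelling Errors (SE) frequencies from Table 2: score -> number of items.
SPELLING_ERRORS = {0: 895, 1: 76, 2: 11, 3: 6, 4: 3, 5: 1, 6: 1}

if __name__ == "__main__":
    n, m, sd, pct = describe_frequencies(SPELLING_ERRORS)
    flawed = n - SPELLING_ERRORS[0]  # items with one or more errors
    print(f"n = {n}, mean = {m:.2f}, SD = {sd:.2f}")
    print(f"items with spelling errors: {flawed} ({100 * flawed / n:.0f}%)")
    print({score: round(p, 1) for score, p in pct.items()})
```

          For the SE row this reproduces n = 993, a mean of 0.14, an
          SD of 0.52, and 98 flawed items (10%), matching the table
          and the text.  The same function can be run on any other
          row as a consistency check on the transcribed frequencies.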
               The authors were compared on each of the scales to
          determine whether they differed significantly and to see if
          similar or dissimilar errors were made by different authors.
          On the spelling errors factor, authors were found to differ
          significantly: F(14, 978) = 11.99, p < .0001.
          REFERENCES
          Burdin, J.L. (1982). Teacher certification. In H.E. Mitzel
              (Ed.), Encyclopedia of education research (5th ed.). New
              York:  Free Press.
          Carter, K. (1984). Do teachers understand the principles for
              writing tests? Journal of Teacher Education, 35(6),
          DeLuca, V.W. & Haynie, W.J. (1989). Updating,
              computerization, and field validation of
              competency-based test-item banks for selected
              manufacturing technology education courses (Contract No.
              RFP 88-R-03). Raleigh, NC: North Carolina State
              Department of Public Instruction.
          DeLuca, V.W. & Haynie, W.J. (1990). Updating,
              computerization, and field validation of
              competency-based test-item banks for selected
              construction and communications technology
              courses (Contract No. RFP 90-A-07). Raleigh, NC: North
              Carolina State Department of Public Instruction.
          Fleming, M. & Chambers, B. (1983). Teacher-made tests:
              Windows on the classroom. In W. E. Hathaway (Ed.),
              Testing in the schools: New directions for testing and
              measurement, No. 19 (pp. 29-38). San Francisco:
          Gullickson, A.R. (1982). Survey data collected in survey of
              South Dakota teachers' attitudes and opinions toward
              testing. Vermillion: University of South Dakota.
          Gullickson, A.R. & Ellwein, M.C. (1985). Post hoc analysis
              of teacher-made tests: The goodness-of-fit between
              prescription and practice. Educational Measurement:
              Issues and Practice, 4(1), 15-18.
          Haynie, W.J. (1983). Student evaluation: The teachers' most
              difficult job. Monograph Series of the Virginia
              Industrial Arts Teacher Education Council, Monograph
              Number 11.
          Haynie, W.J. (1987). Anticipation of tests as a learning
              variable. Unpublished manuscript, North Carolina State
              University, Raleigh, NC.
          Haynie, W.J. (1990). Effects of tests and anticipation of
              tests on learning via videotaped materials. Journal of
              Industrial Teacher Education, 27(4), 18-30.
          Haynie, W.J. (1991). Effects of take-home and in-class tests
              on delayed retention learning acquired via
              individualized, self-paced instructional texts.
              Manuscript submitted for publication.
          Herman, J. & Dorr-Bremme, D.W. (1982). Assessing
              students: Teachers' routine practices and reasoning.
              Paper presented at the annual meeting of the American
              Educational Research Association, New York.
          Hills, J.R. (1991). Apathy concerning grading and testing.
              Phi Delta Kappan, 72(7), 540-545.
          Jackson, S.D. (1987). The relationship between time and
              achievement in selected automobile mechanics classes
              (Doctoral dissertation, Texas A&M University).
          Mehrens, W.A. & Lehmann, I.J. (1984). Measurement and
              evaluation in education and psychology (3rd ed.). New
              York: Holt, Rinehart, and Winston.
          Mehrens, W.A. & Lehmann, I.J. (1987). Using teacher-made
              measurement devices. NASSP Bulletin, 71(496), 36-44.
          Newman, D.C. & Stallings, W.M. (1982, March). Teacher
              competency in classroom testing, measurement
              preparation, and classroom testing. Paper
              presented at the annual meeting of the National Council
              on Measurement in Education. (In Mehrens & Lehmann,
          Nungester, R.J. & Duchastel, P.C. (1982). Testing versus
              review: Effects on retention. Journal of Educational
              Psychology, 74(1), 18-22.
          Salmon, P.B. (Ed.). (1982). Time on task: Using
              instructional time more effectively. Arlington, VA:
              American Association of School Administrators.
          Seifert, E.H. & Beck, J.J. (1984). Relationships between
              task time and learning gains in secondary schools.
              Journal of Educational Research, 78(1), 5-10.
          Stiggins, R.J. & Bridgeford, N.J. (1985). The ecology of
              classroom assessment. Journal of Educational
              Measurement, 22(4), 271-286.
          Stiggins, R.J., Conklin, N.F. & Bridgeford, N.J. (1986).
              Classroom assessment: A key to effective education.
              Educational Measurement: Issues and Practice, 5(2),
          W.J. Haynie, III is Associate Professor, Department of
          Occupational Education, North Carolina State University,
          Raleigh, NC.
        Permission is given to copy any
          article or graphic provided credit is given and
          the copies are not intended for sale.
Journal of Technology Education   Volume 4, Number 1       Fall 1992