Journal of Technology Education


Volume 4, Number 1
Fall 1992

            Post Hoc Analysis of Test Items Written by Technology Education Teachers
             
                      W. J. Haynie, III
             
                           Technology education teachers frequently author their
                      own tests.  The effectiveness of tests depends upon many
                      factors; however, it is clear that the quality of each
                      individual item is of great importance.  This study sought
                      to determine the quality of teacher-authored test items in
                      terms of nine rating factors.
             
                      BACKGROUND
                           Most testing in schools employs teacher-made tests
                      (Haynie, 1983, 1990, 1991; Herman & Dorr-Bremme, 1982;
                      Mehrens & Lehmann, 1987; Newman & Stallings, 1982).  Despite
                      this dependence upon teacher-made tests, Stiggins, Conklin,
                      and Bridgeford (1986) point out that "nearly all major
                      studies of testing in the schools have focused on the role
                      of standardized tests" (p. 5).
                           Research concerning teacher-constructed tests has found
                      that teachers lack understanding of measurement (Fleming &
                      Chambers, 1983; Gullickson & Ellwein, 1985; Mehrens &
                      Lehmann, 1987; Stiggins & Bridgeford, 1985). Research has
                      shown that teachers lack sufficient training in test
                      development, fail to analyze tests, do not establish
                      reliability or validity, do not use a test blueprint, weight
                      all content equally, rarely test above the basic knowledge
                      level, and use tests with grammatical and spelling errors
                      (Burdin, 1982; Carter, 1984; Gullickson, 1982; Gullickson
                      & Ellwein, 1985; Hills, 1991).  Technically, their tests are
                      simplistic and depend upon short-answer, true-false, and
                      other easily prepared items.  Their multiple-choice items
                      often have serious flaws--especially in distractors (Haynie,
                      1990; Mehrens & Lehmann, 1984, 1987; Newman & Stallings,
                      1982).
                           A few investigations have studied the value of tests as
                      aids to learning subject content (Haynie, 1987, 1990, 1991;
                      Nungester & Duchastel, 1982).  Time on task has been shown
                      to be very important in many studies (Jackson, 1987; Salmon,
                      1982; Seifert & Beck, 1984).  Taking a test is a time-on-task
                      learning activity.  Studies that compared testing with
                      similar on-task time spent in structured review of the
                      material covered in class have had mixed results, but
                      testing appears to be at least as effective as reviews in
                      promotion of learning (Haynie, 1990; Nungester & Duchastel,
                      1982).  Research is lacking on the quality of tests and test
                      items written by technology education teachers.
             
                      PURPOSE
                           The purpose of this investigation was to study the
                      quality of technology education test items written by
                      teachers.  Face validity, clarity, accuracy in identifying
                      taxonometric level, and rates of spelling and punctuation
                      errors were some of the determinants of quality assessed.
                      Additionally, data were collected concerning teachers'
                      experience levels, highest degree held, and sources of
                      training in test construction. The following research
                      questions were addressed in this study:
                      1.  What types of errors are common in test items?
                      2.  Do the error rates or types of errors in teacher-
                          constructed test items vary with demographic factors?
                      3.  Do teachers understand how to match test items to
                          curriculum content and taxonometric level?
             
                      METHODOLOGY
                      SOURCE OF DATA
                           Between April 23, 1988 and January 8, 1990, a team of
                      15 technology education teachers worked to develop test
                      items for a computerized test item bank for the North
                      Carolina State Department of Public Instruction (SDPI).  The
                      work was completed under two projects funded by SDPI and
                      directed by DeLuca and Haynie (1989, 1990) at North Carolina
                      State University.  The data for this study came from the
                      items developed in those projects.
             
                      TEST ITEM AUTHORS
                           The teachers were selected on recommendation of
                      supervisors, SDPI consultants, or teacher educators.  All
                      were recognized as leaders among their peers and most had
                      been nominated for teacher of the year or program of the
                      year commendation.  They were all active in the North
                      Carolina Technology Education Association and supported the
                      transition to the new curriculum.  Table 1 displays
                      demographic data concerning the test item authors.
             
             
                      TABLE 1
                      PROFILE OF AUTHORS' DEMOGRAPHIC FACTORS
                      ------------------------------------------------------------
                                Years of              Undergraduate    Graduate
                                Teaching    Highest   Test & Measure   Test & Measure
                      Author   Experience   Degree    Courses          Courses
                      ------------------------------------------------------------
                        1           9       B.S.             0                0
                        2           5       B.S.             1                0
                        3          23       B.S.             0                0
                        4           4       B.S.             0                1
                        5           5       B.S.             0                1
                        6          23       M.Ed.            0                1
                        7          19       M.Ed.            0                1
                        8          17       M.Ed. + 2 yrs.   0                2
                        9          25       M.Ed.            0                0
                       10           5       M.Ed.            0                0
                       11           7       M.Ed.            0                0
                       12           7       B.S.             0                0
                       13           7       M.Ed.            0                0
                       14          15       B.S.             1                0
                       15           5       B.S.             1                1
                      ------------------------------------------------------------
             
             
                      TRAINING OF AUTHORS
                           Teachers came to the university campus for a workshop
                      on April 23, 1988.  Project directors oriented teachers to
                      the computerized test bank, reviewed the revised technology
                      education curriculum, and explained how to develop good test
                      items.  A 13-page instructional packet was also given to
                      each author.  It should be noted that the training session
                      and instructional packet may confound attempts to generalize
                      these findings.
                           The authors were required to develop and properly code
                      six items which were submitted for approval and corrective
                      feedback before they were allowed to proceed.  The teachers
                      who authored the items were paid an honorarium for their
                      services.
             
                      EDITING AND CODING OF ITEMS
                           Each item was prepared on a separate sheet of paper
                      with a coding sheet attached and completed by the teacher.
                      The coding sheet identified the author, the specific
                      objective tested, the taxonometric level, and information
                      for the computerized system.  The project directors edited
                      the items with contrasting-colored felt-tip pens on the
                      teachers' original forms.
             
                      DESIGN OF THIS STUDY
                           The data for this investigation were the editing
                      markings on the original test items submitted by the
                      teachers.  Scores for nine scales of information were recorded
                      for analysis.  Each of the scales was established so that a
                      low score would be optimal.  The scales were Spelling Errors
                      (SE), Punctuation Errors (PE), Distractors (D), Key (K),
                      Usability (U), Validity (V), Stem Clarity (SC), Taxonomy
                      (TX), and an overall Quality (Q) rating.  After all of the
                      ratings were completed, the General Linear Models (GLM)
                      procedure was used for F testing and the LSD procedure was
                      used when t-tests were appropriate.
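
                           The sketch below illustrates the two-step logic of that
                      analysis: an overall F test of author differences on a
                      scale, followed by unprotected pairwise t-tests in the
                      manner of Fisher's LSD.  It is only an approximation of the
                      GLM and LSD procedures for readers who wish to replicate the
                      approach; the data frame, column names, and example values
                      are hypothetical and are not the study's data.

                      # Minimal sketch, assuming item-level ratings in a
                      # long-format table with one row per test item.
                      # Column names and values are hypothetical.
                      from itertools import combinations

                      import pandas as pd
                      from scipy import stats
                      import statsmodels.api as sm
                      import statsmodels.formula.api as smf

                      items = pd.DataFrame({
                          "author": ["A1", "A1", "A1", "A2", "A2",
                                     "A2", "A3", "A3", "A3"],
                          "se":     [0, 1, 0, 0, 0, 2, 1, 3, 2],
                      })

                      # Step 1: overall F test (one-way ANOVA fitted as a
                      # general linear model) of author differences.
                      model = smf.ols("se ~ C(author)", data=items).fit()
                      anova = sm.stats.anova_lm(model, typ=2)
                      print(anova)

                      # Step 2: Fisher's LSD -- if the overall F is
                      # significant, compare authors pairwise with
                      # ordinary t-tests (no further adjustment).
                      if anova.loc["C(author)", "PR(>F)"] < 0.05:
                          for a, b in combinations(sorted(items["author"].unique()), 2):
                              t, p = stats.ttest_ind(
                                  items.loc[items["author"] == a, "se"],
                                  items.loc[items["author"] == b, "se"])
                              print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}")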
             
                      FINDINGS
                      SPELLING ERRORS (SE)
                           The frequency and percentage of scores for the 993
                      items on the nine ratings, and mean scores of each factor,
                      are shown in Table 2.  An item's SE rating indicates how
                      many words were misspelled in the item.  There were 98 items
                      (10%) that had one or more spelling errors.  Spelling
                      errors are detrimental to good teaching and testing; however,
                      the literature shows that this problem is common in other
                      disciplines as well.
             
                      TABLE 2
                      RATINGS OF TEST ITEM QUALITY
                      -----------------------------------------------------------
                                    Frequency of       % of      Mean
                                      Items With     Items/      Item
                      Rating Category      Score  Each Score    Score   Score   SD
                      -----------------------------------------------------------
                      Spelling Errors (SE)    0         895      90.1
                                              1          76       7.7
                                              2          11       1.1
                                              3           6       0.6
                                              4           3       0.3
                                              5           1       0.1
                                              6           1       0.1
                             SE Totals       ---        993      100%   0.14  0.52
                      -----------------------------------------------------------
                      Punctuation Errors(PE)  0         735      74.0
                                              1         220      22.2
                                              2          25       2.5
                                              3           4       0.4
                                              4           1       0.1
                                              5           8       0.8
                            PE Totals        ---        993      100%   0.38  0.68
                      -----------------------------------------------------------
                      Distractors (D)         0        447       45.0
                                              1        398       40.1
                                              2         95        9.6
                                              3         30        3.0
                                              4          9        0.9
                                              5         14        1.4
                      D Totals               ---       993       100%   0.79  0.96
                      -----------------------------------------------------------
                      Key (K)                 0        889       89.5
                                              2        104       10.5
                      K Totals               ---       993       100%   0.21  0.61
             
                      -----------------------------------------------------------
                      Usability (U)           0        249       25.1
                                              1        265       26.7
                                              2        159       16.0
                                              3        131       13.2
                                              4         74        7.5
                                              5         50        5.0
                                              6         21        2.1
                                              7         11        1.1
                                              8         16        1.6
                                              9         17        1.7
                      U Totals               ---       993       100%   2.02  2.04
                      -----------------------------------------------------------
                      Stem Clarity (SC)       0        602       60.6
                                              1        352       35.4
                                              2         39        3.9
                      SC Totals              ---       993       100%   0.43  0.57
                      -----------------------------------------------------------
                      Taxonomy (TX)           0        835       84.1
                                              1        124       12.5
                                              2         34        3.4
                      TX Totals              ---       993       100%   0.19  0.47
                      -----------------------------------------------------------
                      Quality (Q)             0        208       20.9
                                              1        235       23.7
                                              2        200       20.1
                                              3        129       13.0
                                              4         74        7.5
                                              5         58        5.8
                                              6         42        4.2
                                              7         17        1.7
                                              8         10        1.0
                                              9         12        1.2
                                             10          2        0.2
                                             11          3        0.3
                                             12          1        0.1
                                             13          1        0.1
                                             14          1        0.1
                                             15          0        ---
                                             16          0        ---
                                             17          1        0.1
                          Q Totals          ----       993       100%   2.28  2.20
                      ----------------------------------------------------------
                      NOTE. There were 993 items.
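
                           The mean and standard deviation reported for each scale
                      in Table 2 follow directly from the frequency columns.  As
                      an illustration (not part of the original analysis), the
                      short computation below reproduces the SE row of Table 2
                      from its frequency distribution; the rounded results match
                      the published values of 0.14 and 0.52.

                      # Recomputing the SE row of Table 2 from its
                      # frequency distribution (score -> item count).
                      freq = {0: 895, 1: 76, 2: 11, 3: 6, 4: 3, 5: 1, 6: 1}

                      n = sum(freq.values())                      # 993 items
                      mean = sum(s * c for s, c in freq.items()) / n
                      var = sum(c * (s - mean) ** 2 for s, c in freq.items()) / n
                      sd = var ** 0.5

                      # Prints: n = 993, mean = 0.14, sd = 0.52
                      print(f"n = {n}, mean = {mean:.2f}, sd = {sd:.2f}")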
             
                           The authors were compared on each of the scales to
                      determine whether they differed significantly and to see if
                      similar or dissimilar errors were made by different authors.
                      On the spelling errors factor, authors were found to differ
                      significantly, F(14, 978) = 11.99, p < .0001.

                      REFERENCES
                      Burdin, J.L. (1982). Teacher certification. In H.E. Mitzel
                          (Ed.), Encyclopedia of education research (5th ed.). New
                          York:  Free Press.
                      Carter, K. (1984). Do teachers understand the principles for
                          writing  tests? Journal of Teacher Education, 35(6),
                          57-60.
                      DeLuca, V.W. & Haynie, W.J. (1989). Updating,
                          computerization, and field validation of
                          competency-based test-item banks for selected
                          manufacturing technology education courses (Contract No.
                          RFP 88-R-03). Raleigh, NC: North Carolina State
                          Department of Public Instruction.
                      DeLuca, V.W. & Haynie, W.J. (1990). Updating,
                          computerization, and field validation of
                          competency-based test-item banks for selected
                          construction and communications technology
                          courses (Contract No. RFP 90-A-07). Raleigh, NC: North
                          Carolina State Department of Public Instruction.
                      Fleming, M. & Chambers, B. (1983). Teacher-made tests:
                          Windows on the classroom. In W.E. Hathaway (Ed.),
                          Testing in the schools: New directions for testing and
                          measurement, No. 19 (pp. 29-38). San Francisco:
                          Jossey-Bass.
                      Gullickson, A.R. (1982). Survey data collected in survey of
                          South Dakota teachers' attitudes and opinions toward
                          testing. Vermillion: University of South Dakota.
                      Gullickson, A.R. & Ellwein, M.C. (1985). Post hoc analysis
                          of teacher-made tests: The goodness-of-fit between
                          prescription and practice. Educational Measurement:
                          Issues and Practice, 4(1), 15-18.
                      Haynie, W.J. (1983). Student evaluation: The teachers' most
                          difficult job. Monograph Series of the Virginia
                          Industrial Arts Teacher Education Council, Monograph
                          Number 11.
                      Haynie, W.J. (1987). Anticipation of tests as a learning
                          variable. Unpublished manuscript, North Carolina State
                          University, Raleigh, NC.
                      Haynie, W.J. (1990). Effects of tests and anticipation of
                          tests on learning via videotaped materials. Journal of
                          Industrial Teacher Education, 27(4), 18-30.
                      Haynie, W.J. (1991). Effects of take-home and in-class tests
                          on delayed retention learning acquired via
                          individualized, self-paced instructional texts.
                          Manuscript submitted for publication.
                      Herman, J. & Dorr-Bremme, D.W. (1982). Assessing
                          students: Teachers' routine practices and reasoning.
                          Paper presented at the annual meeting of the American
                          Educational Research Association, New York.
                      Hills, J.R. (1991). Apathy concerning grading and testing.
                          Phi Delta Kappan, 72(7), 540-545.
                      Jackson, S.D. (1987). The relationship between time and
                          achievement in selected automobile mechanics classes.
                          (Doctoral dissertation, Texas A&M University).
                      Mehrens, W.A. & Lehmann, I.J. (1984). Measurement and
                          evaluation in education and psychology (3rd ed.). New
                          York: Holt, Rinehart, and Winston.
                      Mehrens, W.A. & Lehmann, I.J. (1987). Using teacher-made
                          measurement devices. NASSP Bulletin, 71(496), 36-44.
                      Newman, D.C. & Stallings, W.M. (1982, March). Teacher
                          competency in classroom testing, measurement
                          preparation, and classroom testing. Paper
                          presented at the annual meeting of the National Council
                          on Measurement in Education. (In Mehrens & Lehmann,
                          1987)
                      Nungester, R.J. & Duchastel, P.C. (1982). Testing versus
                          review: Effects on retention. Journal of Educational
                          Psychology, 74(1), 18-22.
                      Salmon, P.B. (Ed.). (1982). Time on task: Using
                          instructional time more effectively. Arlington, VA:
                          American Association of School Administrators.
                      Seifert, E.H. & Beck, J.J. (1984). Relationships between
                          task time and learning gains in secondary schools.
                          Journal of Educational Research, 78(1), 5-10.
                      Stiggins, R.J. & Bridgeford, N.J. (1985). The ecology of
                          classroom assessment. Journal of Educational
                          Measurement, 22(4), 271-286.
                      Stiggins, R.J., Conklin, N.F. & Bridgeford, N.J. (1986).
                          Classroom assessment: A key to effective education.
                          Educational Measurement: Issues and Practice, 5(2),
                          5-17.
             
             
                      ----------------
                      W.J. Haynie, III is Associate Professor, Department of
                      Occupational Education, North Carolina State University,
                      Raleigh, NC.
             
             
                    Permission is given to copy any
                      article or graphic provided credit is given and
                      the copies are not intended for sale.
             