| Type of Document |
Dissertation |
| Author |
Yue, Xiaohui
|
| Author's Email Address |
yuexi@vt.edu |
| URN |
etd-07272011-104720 |
| Title |
Detecting Rater Centrality Effect Using Simulation Methods and Rasch Measurement Analysis |
| Degree |
PhD |
| Department |
Educational Leadership and Policy Studies |
| Advisory Committee |
| Advisor Name | Title |
| Skaggs, Gary E. | Committee Co-Chair |
| Wolfe, Edward W. | Committee Co-Chair |
| Creamer, Elizabeth G. | Committee Member |
| Miyazaki, Yasuo | Committee Member |
|
| Keywords |
- ANOVA
- Rasch measurement
- centrality
- rater effects
- Type I and Type II errors
- performance assessment
- statistical power
- logistic regression
|
| Date of Defense |
2011-07-14 |
| Availability |
unrestricted |
Abstract
This dissertation illustrates how to detect the rater centrality effect in a simulation study that approximates data collected in large-scale performance assessment settings. It addresses three research questions: (1) which of several centrality-detection indices are most sensitive to the difference between effect raters and non-effect raters; (2) how accurate each centrality-detection index is, in terms of Type I error rate and statistical power, in flagging effect raters; and (3) how the features of the data collection design (i.e., the independent variables: the level of centrality strength, the double-scoring rate, the number of raters, and the number of ratees) influence the accuracy of rater classifications by these centrality-detection indices. The results reveal that the measure-residual correlation, the expected-residual correlation, and the standardized deviation of assigned scores perform better than the point-measure correlation. The mean-square fit statistics, traditionally viewed as potential indicators of rater centrality, perform poorly in differentiating central raters from normal raters. Like the rater slope index, they did not appear to be sensitive to the rater centrality effect. All of these indices provided reasonable protection against Type I errors when all responses were double scored, and higher statistical power was achieved when 100% of responses were double scored than when only 10% were. To balance Type I error and statistical power, I recommend the measure-residual correlation and the expected-residual correlation for detecting the centrality effect. I suggest using the point-measure correlation only when responses are 100% double scored. The four parameters evaluated in the experimental simulations had different impacts on the accuracy of rater classification.
The results show that improving the classification accuracy for non-effect raters may come at the cost of reducing the classification accuracy for effect raters. Simple guidelines summarized from the analyses describe the expected impact on classification accuracy when a higher-order interaction exists; they offer a glimpse of the “pros” and “cons” of adjusting the magnitude of the four experimental parameters when evaluating their impact on the outcomes of rater classification.
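The two accuracy criteria named above, Type I error rate and statistical power, can be illustrated with a minimal sketch. It assumes each simulated rater has a known true status (effect or non-effect) and a flag produced by some detection index; the function name `classification_rates` and the example data are hypothetical and are not taken from the dissertation.

```python
import math


def classification_rates(is_effect_rater, is_flagged):
    """Return (type_i_error_rate, power) for one detection index.

    is_effect_rater: list of bools, True if the rater was simulated
        with a centrality effect (hypothetical input format).
    is_flagged: list of bools, True if the index flagged the rater.
    """
    if len(is_effect_rater) != len(is_flagged):
        raise ValueError("both lists must describe the same raters")

    pairs = list(zip(is_effect_rater, is_flagged))
    # Non-effect raters that were flagged anyway (false positives).
    false_positives = sum(1 for truth, flag in pairs if not truth and flag)
    # Effect raters that were correctly flagged (true positives).
    true_positives = sum(1 for truth, flag in pairs if truth and flag)

    n_effect = sum(1 for truth in is_effect_rater if truth)
    n_non_effect = len(is_effect_rater) - n_effect

    # Type I error rate: share of non-effect raters that were flagged.
    type_i = false_positives / n_non_effect if n_non_effect else math.nan
    # Power: share of effect raters that were flagged.
    power = true_positives / n_effect if n_effect else math.nan
    return type_i, power


# Made-up example: 2 effect raters, 4 non-effect raters.
truth = [True, True, False, False, False, False]
flags = [True, False, True, False, False, False]
type_i, power = classification_rates(truth, flags)
print(type_i, power)  # 0.25 0.5
```

In this sketch, flagging more aggressively would raise power but also the Type I error rate, which mirrors the trade-off between classification accuracy for effect and non-effect raters described in the abstract.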
|
| Files |
Approximate download times are given as Hours:Minutes:Seconds.
| Filename | Size | 28.8 Modem | 56K Modem | ISDN (64 Kb) | ISDN (128 Kb) | Higher-speed Access |
| Yue_X_D_2011.pdf | 1.05 Mb | 00:04:52 | 00:02:30 | 00:02:11 | 00:01:05 | 00:00:05 |
|