ESTIMATING THE MAGNITUDE OF MEASUREMENT ERROR THROUGH THE APPLICATION OF GENERALIZABILITY THEORY: A CASE OF REMARKED MSCE MATHEMATICS PAPER I
Abstract
Reliability of test scores is a desirable element in any measurement, and so too is the method used to generate such scores. The Malawi National Examinations Board (MANEB) has made numerous efforts to reduce errors in the scoring of candidates' scripts to the barest levels possible, or to eliminate them altogether. In 2010 a new marking system, known as the Conveyor Belt, was adopted. Ex post facto analyses carried out so far by MANEB in different subjects have clearly shown that errors have been reduced to some extent. Because errors emanate from various sources of measurement, Generalizability Theory (GT) was applied to quantify and estimate the measurement error attributable to the facets of item and occasion. This is a departure from the confines of Classical Test Theory (CTT), in which measurement error is left undifferentiated. The study sought to unpack measurement error by estimating the variance components for items, occasions, and their interactions. A sample of 200 candidates was randomly selected from the remarked Mathematics Paper I at MSCE level. Scores were collected at item level for the 20 items in the remarked scripts, with both the initial and the remarked scores recorded. Data analysis was done through the G.2.sps program, which uses SPSS VARCOMP procedures to compute the variance components and is well suited to GT analyses of balanced data. The main focus was a two-facet fully crossed design, centred on the item and occasion facets and their interactions. Results yielded a generalizability coefficient of 0.929, indicating strong agreement between the two marking occasions; hence there is not much difference between the initial marking and the remarking of the examination. However, the item facet accounted for 14% of the total variance; this could be due to the number of items in the test as well as varying levels of item difficulty. The occasion facet accounted for a negligible 0% of the total variability, revealing that remarking does not necessarily change a candidate's score much. The random error (residual) term accounted for 6% of the total variation; this unmeasured error could be due to confounding factors other than those identified in the study.
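The abstract's analysis can be illustrated with a small sketch. This is not the study's actual G.2.sps/SPSS VARCOMP run and uses synthetic data; it is a minimal NumPy implementation of a G study for a fully crossed persons × items × occasions (p × i × o) random-effects design, estimating the variance components from ANOVA expected mean squares and then computing the relative generalizability coefficient. All sample sizes and score distributions below are assumptions chosen only to mirror the design described (200 candidates, 20 items, 2 marking occasions).

```python
# Sketch of a two-facet crossed G study (p x i x o), assuming synthetic data.
# Variance components are estimated by equating observed mean squares to
# their expected values under the random-effects model.
import numpy as np

def g_study(X):
    """X: scores with shape (n_persons, n_items, n_occasions)."""
    n_p, n_i, n_o = X.shape
    grand = X.mean()
    mp = X.mean(axis=(1, 2))   # person means
    mi = X.mean(axis=(0, 2))   # item means
    mo = X.mean(axis=(0, 1))   # occasion means
    mpi = X.mean(axis=2)       # person x item cell means
    mpo = X.mean(axis=1)       # person x occasion cell means
    mio = X.mean(axis=0)       # item x occasion cell means

    # Mean squares from sums of squared deviations
    MS_p = n_i * n_o * np.sum((mp - grand) ** 2) / (n_p - 1)
    MS_i = n_p * n_o * np.sum((mi - grand) ** 2) / (n_i - 1)
    MS_o = n_p * n_i * np.sum((mo - grand) ** 2) / (n_o - 1)
    MS_pi = n_o * np.sum((mpi - mp[:, None] - mi[None, :] + grand) ** 2) / ((n_p - 1) * (n_i - 1))
    MS_po = n_i * np.sum((mpo - mp[:, None] - mo[None, :] + grand) ** 2) / ((n_p - 1) * (n_o - 1))
    MS_io = n_p * np.sum((mio - mi[:, None] - mo[None, :] + grand) ** 2) / ((n_i - 1) * (n_o - 1))
    resid = (X - mpi[:, :, None] - mpo[:, None, :] - mio[None, :, :]
             + mp[:, None, None] + mi[None, :, None] + mo[None, None, :] - grand)
    MS_pio = np.sum(resid ** 2) / ((n_p - 1) * (n_i - 1) * (n_o - 1))

    # Solve the expected-mean-square equations (negatives truncated to 0)
    v = {'pio,e': MS_pio}
    v['pi'] = max((MS_pi - MS_pio) / n_o, 0.0)
    v['po'] = max((MS_po - MS_pio) / n_i, 0.0)
    v['io'] = max((MS_io - MS_pio) / n_p, 0.0)
    v['p'] = max((MS_p - MS_pi - MS_po + MS_pio) / (n_i * n_o), 0.0)
    v['i'] = max((MS_i - MS_pi - MS_io + MS_pio) / (n_p * n_o), 0.0)
    v['o'] = max((MS_o - MS_po - MS_io + MS_pio) / (n_p * n_i), 0.0)

    # Relative error variance and G coefficient for n_i items, n_o occasions
    rel_err = v['pi'] / n_i + v['po'] / n_o + v['pio,e'] / (n_i * n_o)
    g = v['p'] / (v['p'] + rel_err)
    return v, g

# Synthetic data mirroring the design: 200 persons, 20 items, 2 occasions,
# with person and item effects but almost no occasion effect.
rng = np.random.default_rng(0)
n_p, n_i, n_o = 200, 20, 2
X = (rng.normal(0, 1.0, n_p)[:, None, None]       # person (true-score) effects
     + rng.normal(0, 0.5, n_i)[None, :, None]     # item difficulty effects
     + rng.normal(0, 0.3, (n_p, n_i, n_o)))       # residual noise
v, g = g_study(X)  # g will be close to 1 for this low-noise simulation
```

In this simulation the occasion component comes out near zero and the person component dominates, so the G coefficient is high, the same qualitative pattern the abstract reports (negligible occasion variance, G = 0.929).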
