The standard error of measurement is a more appropriate measure of quality for postgraduate medical assessments than is reliability: an analysis of MRCP(UK) examinations

Tighe, Jane; McManus, I. C.; Dewhurst, Neil G.; Chis, Liliana; Mucklow, John
January 2010
BMC Medical Education;2010, Vol. 10, p40
Academic Journal
Background: Cronbach's alpha is widely used as the preferred index of reliability for medical postgraduate examinations. A value of 0.8-0.9 is seen by providers and regulators alike as an adequate demonstration of acceptable reliability for any assessment. Of the other statistical parameters, the Standard Error of Measurement (SEM) is mainly seen as useful only in determining the accuracy of a pass mark. However, the alpha coefficient depends both on the SEM and on the ability range (standard deviation, SD) of candidates taking an exam. This study investigated the extent to which the necessarily narrower ability range of candidates taking the second of the three-part MRCP(UK) diploma examinations biases the assessment of reliability and SEM. Methods: a) The interrelationships of standard deviation (SD), SEM and reliability were investigated in a Monte Carlo simulation of 10,000 candidates taking a postgraduate examination. b) Reliability and SEM were studied in the MRCP(UK) Part 1 and Part 2 Written Examinations from 2002 to 2008. c) Reliability and SEM were studied in eight Specialty Certificate Examinations introduced in 2008-9. Results: The Monte Carlo simulation showed, as expected, that restricting the range of an assessment only to those who had already passed it dramatically reduced the reliability but did not affect the SEM of a simulated assessment. The analysis of the MRCP(UK) Part 1 and Part 2 written examinations showed that the Part 2 written examination had a lower reliability than the Part 1 examination, but, despite that lower reliability, the Part 2 examination also had a smaller SEM (indicating a more accurate assessment). The Specialty Certificate Examinations had small Ns and, as a result, wide variability in their reliabilities, but their SEMs were comparable with those of MRCP(UK) Part 2.
Conclusions: An emphasis upon assessing the quality of assessments in terms of reliability alone can produce a paradoxical and distorted picture, particularly where a narrower range of candidate ability is an inevitable consequence of candidates being able to take a second part examination only after passing the first part examination. Reliability also shows problems when the number of candidates in an examination is low and sampling error affects the range of candidate ability. The SEM is not subject to such problems; it is therefore a better measure of the quality of an assessment and is recommended for routine use.
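The Monte Carlo result described in the abstract can be reproduced in miniature. The sketch below is not the authors' code: the cohort size matches the abstract (10,000 candidates), but the true-score SD (10), SEM (5), and a median pass mark are illustrative assumptions. Candidates are selected on a "Part 1" score whose error is independent of two parallel "Part 2" forms; reliability is estimated as the parallel-forms correlation, and the SEM as SD·sqrt(1 − reliability) from classical test theory.

```python
import math
import random
import statistics

random.seed(42)

N = 10_000
SD_TRUE = 10.0   # assumed spread of true candidate ability
SEM_TRUE = 5.0   # assumed standard error of measurement

# True ability plus independent measurement error on each sitting
true = [random.gauss(0.0, SD_TRUE) for _ in range(N)]
part1 = [t + random.gauss(0.0, SEM_TRUE) for t in true]   # selection exam
form_a = [t + random.gauss(0.0, SEM_TRUE) for t in true]  # Part 2, parallel form A
form_b = [t + random.gauss(0.0, SEM_TRUE) for t in true]  # Part 2, parallel form B

def pearson(x, y):
    # Parallel-forms reliability = Pearson correlation of the two forms
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

def observed_sem(x, y):
    # Classical test theory: SEM = SD * sqrt(1 - reliability)
    return statistics.stdev(x) * math.sqrt(1.0 - pearson(x, y))

# Full cohort sits Part 2
r_full = pearson(form_a, form_b)
sem_full = observed_sem(form_a, form_b)

# Only candidates who passed Part 1 (pass mark = cohort median) sit Part 2
pass_mark = statistics.median(part1)
passers = [(a, b) for p, a, b in zip(part1, form_a, form_b) if p >= pass_mark]
pa = [a for a, _ in passers]
pb = [b for _, b in passers]
r_pass = pearson(pa, pb)
sem_pass = observed_sem(pa, pb)

print(f"full cohort : reliability={r_full:.2f}  SEM={sem_full:.2f}")
print(f"passers only: reliability={r_pass:.2f}  SEM={sem_pass:.2f}")
```

Under these assumptions the restricted group shows a markedly lower reliability (the ability range has narrowed) while the estimated SEM stays close to the value built into the simulation, which is the paper's central point.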


Related Articles

  • Do Medical Students Prefer a Career in Community Medicine? Kar, Sitanshu Sekhar; Ramalingam, Archana; Premarajan, K. C.; Roy, Gautam // International Journal of Preventive Medicine;Nov2014, Vol. 5 Issue 11, p1468 

    Background: Inadequate attention to management and institutional reforms is an important barrier to achieving universal health coverage. Skilled and motivated public health managers in adequate numbers are an important requirement to overcome this hurdle. However, what are the career choices of...

  • Assessing the Communication Skills of Doctors in Training: Reliability and Sources of Error. Keen, A.J.A.; Klein, S.; Alexander, D.A. // Advances in Health Sciences Education;2003, Vol. 8 Issue 1, p5 

    Medical examining bodies now commonly assess candidates' communication skills. However, there are a number of within-case sources of error that can mean examinations have poor reliability and validity. The aims of this study were to determine the main within-case sources of error and to identify...

  • The long case versus objective structured clinical examinations. Norman, Geoff // BMJ: British Medical Journal (International Edition);3/30/2002, Vol. 324 Issue 7340, p748 

    Editorial. Comments on a study which compared long case examinations with objective structured clinical examinations to determine the competence of graduates of medicine. Concern about the objectivity of long case exams; How long case exams performed as well as objective structured clinical...

  • Medical education.  // Clinical & Investigative Medicine;Aug97 Supplement, Vol. 20, pS50 

    Presents an abstract of the research manuscript `Reliability of student ratings of examiner feedback during an undergraduate OSCE,' by S. Humphrey-Murto. Objective Structured Clinical Examination (OSCE) in Canada.

  • The Association between Interview and Written Exam in Graduate Student Admission of Medical Education and Rehabilitation Management. Dehnavi, A. Mehri // Iranian Journal of Medical Education;Autumn2008, Vol. 8 Issue 2, p1 

    Introduction: In 2007 and the years before, the Ministry of Health and Medical Education invited MS volunteers in different disciplines such as rehabilitation management and medical education for interview in addition to written exam. This study tried to determine the role of interview in...

  • Dahlberg formula - a novel approach for its evaluation. de Souza Galvão, Maria Christina; Sato, João Ricardo; Coelho, Edvaldo Capobiango // Dental Press Journal of Orthodontics;Jan/Feb2012, Vol. 17 Issue 1, p115 

    Introduction: The accurate evaluation of error of measurement (EM) is extremely important both in growth studies and in clinical research, since the changes involved are usually quantitatively small. In any study it is important to evaluate the EM to validate the results and, consequently, the...

  • Suppression of Systematic Errors of Electronic Distance Meters for Measurement of Short Distances. Braun, Jaroslav; Štroner, Martin; Urban, Rudolf; Dvořáček, Filip // Sensors (14248220);Aug2015, Vol. 15 Issue 8, p19264 

    In modern industrial geodesy, high demands are placed on the final accuracy, with expectations currently falling below 1 mm. The measurement methodology and surveying instruments used have to be adjusted to meet these stringent requirements, especially the total stations as the most often used...

  • Uncertainty analysis for evaluating the accuracy of snow depth measurements. Lee, J.-E.; Lee, G. W.; Earle, M.; Nitu, R. // Hydrology & Earth System Sciences Discussions;2015, Vol. 12 Issue 4, p4157 

    A methodology for quantifying the accuracy of snow depth measurements is demonstrated in this study by applying the equation of error propagation to sensors of the same type and by comparing automatic measurements with manual observations. Snow depth was measured at the Centre for Atmospheric Research...

  • Pick-N multiple choice-exams: a comparison of scoring algorithms. Bauer, Daniel; Holzer, Matthias; Kopp, Veronika; Fischer, Martin R. // Advances in Health Sciences Education;May2011, Vol. 16 Issue 2, p211 

    To compare different scoring algorithms for Pick-N multiple correct answer multiple-choice (MC) exams regarding test reliability, student performance, total item discrimination and item difficulty. Data from six end-of-term exams in internal medicine taken by 3rd-year medical students from 2005 to 2008...
