Ethan Weiss1, Silvana Perretta, MD2, Ilay Habaz, BSc1, Dennis Tchoudnovski1, Pietro Mascagni, MD2, Ludovica Guerriero, MD2, Else Van Der Velden2, Fabio Longo, MD2, Lee L Swanstrom, MD2, Allan Okrainec, MD1, Eran Shlomovitz, MD1. 1Division of General Surgery, University Health Network, Toronto, Canada, 2Institute for Image Guided Surgery IHU-Strasbourg, Strasbourg, France
OBJECTIVE: The increasing use of flexible endoscopy as a minimally invasive tool has generated demand for a model that can easily train users in basic flexible endoscopic skills. The Basic Endoscopic Skills Training (BEST) box has been validated as a low-cost flexible endoscopy trainer. It consists of a conversion kit for the Fundamentals of Laparoscopic Surgery (FLS) box that simulates 6 different flexible endoscopy skills. This pilot study aims to investigate the inter-rater reliability (IRR) of BEST box scoring, as well as the consistency between live and delayed video scoring.
METHODS: Forty participants were rated by several live proctors (L) and, on video, by two trained remote proctors (V1 and V2). All proctors used the same previously validated BEST box scoring formula. Inter-rater reliability was assessed with intraclass correlation coefficients (ICC), using two-way mixed models with absolute agreement. ICC values >0.9 and >0.8 were taken to indicate excellent and great reliability, respectively.
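For readers unfamiliar with the statistic, the following is a minimal sketch of how an absolute-agreement ICC can be computed from a two-way layout (participants in rows, raters in columns). It assumes the single-measure form, often written ICC(A,1); the abstract does not state whether single or average measures were reported, and the study's own software is not named. The function names `icc_a1` and `reliability_label` are hypothetical, and no study data are used here.

```python
from typing import Sequence


def icc_a1(scores: Sequence[Sequence[float]]) -> float:
    """Single-measure, absolute-agreement ICC for a two-way layout.

    scores[i][j] is the score given to participant i by rater j.
    Assumption: single-measure form; the abstract does not specify this.
    """
    n = len(scores)       # number of participants (rows)
    k = len(scores[0])    # number of raters (columns)
    if n < 2 or k < 2:
        raise ValueError("need at least 2 participants and 2 raters")
    if any(len(row) != k for row in scores):
        raise ValueError("every participant needs a score from every rater")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-rater mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    # Absolute agreement: rater offsets (msc) count against reliability.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reliability_label(icc: float) -> str:
    """Apply the thresholds stated in the METHODS paragraph."""
    if icc > 0.9:
        return "excellent"
    if icc > 0.8:
        return "great"
    return "below the thresholds used in this abstract"
```

Because the absolute-agreement form penalises systematic differences between raters, two raters whose scores differ by a constant offset get an ICC below 1 even though their rankings agree perfectly; with a consistency-type ICC that offset would be ignored.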
RESULTS: Inter-rater reliability between the two delayed video raters, and between each delayed video rater and the live raters, was excellent (Table 1). ICCs remained consistently high when the individual tasks were analysed separately, including for absolute time and errors.
CONCLUSION: The scoring system for the BEST box demonstrates high inter-rater reliability and remains consistent whether scored live or over video. The results of this study add further evidence for the validity of the BEST box as an objective tool for endoscopic evaluation. The ability to score accurately from video opens future possibilities for remote scoring and telementoring.
Two or three raters | ICC (95% CI) |
V1 vs V2 (n = 39) | 0.96*** (0.93 – 0.98) |
V1 vs L (n = 36) | 0.96*** (0.91 – 0.98) |
V2 vs L (n = 37) | 0.93*** (0.86 – 0.96) |
V1 vs V2 vs L (n = 36) | 0.97*** (0.94 – 0.98) |
Note: *** p < .001
Table 1. Intraclass correlation coefficients for inter-rater reliability of BEST scores.
Presented at the SAGES 2017 Annual Meeting in Houston, TX.
Abstract ID: 95404
Program Number: P405
Presentation Session: Poster Session (Non CME)
Presentation Type: Poster