Reliability of GOALS Scores using Generalizability Theory

E Bilgic, Y Watanabe, MD, L Lee, MD, M C Vassiliou, MD. Steinberg-Bernstein Centre for Minimally Invasive Surgery and Innovation, McGill University Health Centre, Montreal, Quebec, Canada.

The Global Operative Assessment of Laparoscopic Skills (GOALS) is an objective assessment tool that has been used to assess performance in the operating room. Its reliability has been demonstrated using traditional methods such as inter-rater reliability. However, these methods cannot assess whether the reliability of an instrument is affected by other factors, such as the type of procedure, the attending surgeon or the difficulty of the procedure. Generalizability Theory (GT) estimates the independent variability attributable to each of these external factors, and can therefore assess how they affect reliability while also incorporating test-retest and inter-rater reliability. The purpose of this study was to use GT to determine whether external factors affect the reliability of GOALS scores.

A sample of general surgery residents and attending staff at a single teaching institution between 2003 and 2009 underwent GOALS assessment (5 items, maximum score 25) after laparoscopic cholecystectomy, inguinal hernia repair, or incisional hernia repair. To be included, participants must have undergone two GOALS assessments, each completed by 2 independent trained observers. Inter-rater reliability for each case (traditional method) was determined by the intraclass correlation coefficient (ICC). Reliability was also calculated using GT, with participants, raters, and occasions included as factors, along with their interaction terms. Case differences, the attending surgeon and case difficulty were all included in the definition of occasion. The independent effect of each factor is expressed as a percentage of total adjusted variance. GT analysis was performed using G-STRING 4, and other analyses using SPSS v20.
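To illustrate how a GT analysis partitions score variance, the following is a minimal sketch in plain Python. It is not the G-STRING procedure used in this study. It assumes a fully crossed participant x occasion x rater design with one score per cell and all factors treated as random, and it estimates variance components from ANOVA mean squares. The function name `variance_components` and the example scores are hypothetical and do not come from the study data.

```python
from itertools import product
from statistics import mean


def variance_components(x):
    """Estimate variance components for a crossed p x o x r random design.

    x[p][o][r] is the score of participant p on occasion o from rater r
    (one score per cell). Returns (components, percent_of_total).
    """
    n_p, n_o, n_r = len(x), len(x[0]), len(x[0][0])
    P, O, R = range(n_p), range(n_o), range(n_r)
    grand = mean(x[p][o][r] for p, o, r in product(P, O, R))

    # Marginal means for main effects and two-way cells.
    m_p = [mean(x[p][o][r] for o, r in product(O, R)) for p in P]
    m_o = [mean(x[p][o][r] for p, r in product(P, R)) for o in O]
    m_r = [mean(x[p][o][r] for p, o in product(P, O)) for r in R]
    m_po = [[mean(x[p][o][r] for r in R) for o in O] for p in P]
    m_pr = [[mean(x[p][o][r] for o in O) for r in R] for p in P]
    m_or = [[mean(x[p][o][r] for p in P) for r in R] for o in O]

    # Sums of squares for each effect.
    ss = {
        "p": n_o * n_r * sum((m - grand) ** 2 for m in m_p),
        "o": n_p * n_r * sum((m - grand) ** 2 for m in m_o),
        "r": n_p * n_o * sum((m - grand) ** 2 for m in m_r),
        "po": n_r * sum((m_po[p][o] - m_p[p] - m_o[o] + grand) ** 2
                        for p, o in product(P, O)),
        "pr": n_o * sum((m_pr[p][r] - m_p[p] - m_r[r] + grand) ** 2
                        for p, r in product(P, R)),
        "or": n_p * sum((m_or[o][r] - m_o[o] - m_r[r] + grand) ** 2
                        for o, r in product(O, R)),
    }
    ss_total = sum((x[p][o][r] - grand) ** 2 for p, o, r in product(P, O, R))
    # Three-way interaction is confounded with residual error.
    ss["pro,e"] = ss_total - sum(ss.values())

    df = {
        "p": n_p - 1, "o": n_o - 1, "r": n_r - 1,
        "po": (n_p - 1) * (n_o - 1),
        "pr": (n_p - 1) * (n_r - 1),
        "or": (n_o - 1) * (n_r - 1),
        "pro,e": (n_p - 1) * (n_o - 1) * (n_r - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations for each component.
    e = ms["pro,e"]
    comp = {
        "p": (ms["p"] - ms["po"] - ms["pr"] + e) / (n_o * n_r),
        "o": (ms["o"] - ms["po"] - ms["or"] + e) / (n_p * n_r),
        "r": (ms["r"] - ms["pr"] - ms["or"] + e) / (n_p * n_o),
        "po": (ms["po"] - e) / n_r,
        "pr": (ms["pr"] - e) / n_o,
        "or": (ms["or"] - e) / n_p,
        "pro,e": e,
    }
    # Negative estimates are set to zero, a common convention.
    comp = {k: max(v, 0.0) for k, v in comp.items()}
    total = sum(comp.values())
    pct = {k: (100.0 * v / total if total else 0.0) for k, v in comp.items()}
    return comp, pct


# Hypothetical scores: 2 participants x 2 occasions x 2 raters.
scores = [
    [[11, 11], [9, 9]],
    [[13, 13], [15, 15]],
]
components, percent = variance_components(scores)
for effect, share in percent.items():
    print(f"{effect}: {share:.1f}% of total variance")
```

In this toy dataset the two raters always agree, so all of the variance falls on the participant effect and the participant-by-occasion interaction. This is the same kind of pattern that rater agreement alone cannot reveal.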

A total of 17 subjects (11 cholecystectomy, 3 inguinal, 3 incisional) were included in this study, for a total of 68 GOALS assessments. The mean GOALS score was 17.3±3.6 at baseline and 18.8±3.7 at the second assessment (p=0.08). The mean interval between the first and second assessments was 32.2 days. Inter-rater reliability using traditional methods was good (ICC = 0.85). However, GT analysis showed that participants accounted for only 63.4% of the total variability in GOALS scores; occasion accounted for 4%, and the interaction between participants and occasion accounted for 25.0%. This indicates that the factors grouped under occasion (the different cases, the attending surgeon and the case difficulty) affect the reliability of the GOALS score. The large participant-by-occasion interaction means that surgeon performance tends to be affected by the particular case. Raters did not account for any variability in the GT analysis.

The reliability of GOALS assessments depends on factors other than participants and raters. When assessing operative performance using GOALS, these factors could be taken into account using GT.

« Return to SAGES 2014 abstract archive