Babak Namazi, BS1, Ganesh Sankaranarayanan, PhD2, Venkat Devarajan, PhD1, James W Fleshman, MD2. 1University of Texas at Arlington, 2Baylor University Medical Center
Objective of the Technology: Assessment of surgical performance through video is a time-consuming process that requires preprocessing videos into short snippets and manually observing them for scoring. The Critical View of Safety (CVS) is a method of visualization that clearly identifies the cystic structures. The extraction, examination, and understanding of CVS are advocated to reduce the incidence of bile duct injuries. CVS is considered achieved when the view clearly shows only two structures entering the gallbladder, the cystic duct and the cystic artery, with the cystic plate exposed and the hepatocystic triangle cleared of fat and fibrous tissue. Our objective is to develop a deep learning system that automatically detects the presence of CVS in laparoscopic cholecystectomy videos for surgical assessment, quality improvement, and video-based coaching of residents.
Description of the Technology and its Application: A convolutional neural network (CNN) is a deep feed-forward artificial neural network that can be trained to extract high-level features from an image; CNNs have been used successfully in face and object recognition and, more recently, in self-driving cars. Using a large set of labeled images (the training set), the CNN is trained to detect specific features and can then be evaluated on unlabeled test or real-world images. For surgery, a CNN-based deep learning system can be applied to automatically and objectively assess surgical videos for quality improvement, video-based coaching, and the technical and cognitive milestone assessment of residents.
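As a toy illustration only (not the authors' implementation), the feature-extracting convolution at the core of a CNN can be sketched in plain Python; the image and edge-detecting kernel below are made-up examples:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (no padding, stride 1) of a grayscale image
    with a small kernel -- the core operation a CNN layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Correlate the kernel with the image patch anchored at (i, j).
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(kh) for b in range(kw))
            row.append(s)
        out.append(row)
    return out

# A tiny synthetic image whose left half is dark and right half is bright,
# and a simple vertical-edge-detecting kernel:
image = [[0, 0, 10, 10],
         [0, 0, 10, 10],
         [0, 0, 10, 10]]
kernel = [[-1, 1]]
edges = conv2d(image, kernel)  # [[0, 10, 0], [0, 10, 0], [0, 10, 0]]
```

The kernel responds strongly (value 10) exactly at the boundary between the dark and bright halves; a trained CNN learns many such kernels, stacked in layers, rather than hand-coding them.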
Preliminary Results: We used the publicly available Cholec80 dataset, which contains 80 videos of laparoscopic cholecystectomy performed by 13 surgeons. Frames were extracted from the videos (20 frames per second) and manually labeled as either 1 (CVS) or 0 (no CVS). The training set contained 50 videos, which were used to train a CNN based on the Inception-V3 architecture. Ten videos served as the validation set for tuning the hyper-parameters of the CNN, and the remaining 20 videos were used for testing. All three sets included instances where CVS was not achieved. Both over-sampling and under-sampling were used to address the high class imbalance in the training set caused by the lower incidence of not obtaining CVS. With true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) defined, the current accuracy of the model, (TP+TN)/(TP+TN+FP+FN), is 94.6%, with a precision, TP/(TP+FP), of 70% and a recall, TP/(TP+FN), of 65.5%. For the 20 test videos, the results were obtained automatically in real time, a major improvement over manual inspection by trained examiners or experts.
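The reported figures follow directly from the confusion-matrix counts. As a sketch, the helper functions below compute them; the frame counts are hypothetical, chosen only to roughly reproduce the abstract's percentages, and are not the study's actual numbers:

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all frames classified correctly: (TP+TN)/(TP+TN+FP+FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    """Of the frames predicted as CVS, the fraction that truly are: TP/(TP+FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Of the true CVS frames, the fraction detected: TP/(TP+FN)."""
    return tp / (tp + fn)

# Hypothetical counts for illustration only (not the study's actual numbers):
tp, tn, fp, fn = 140, 2207, 60, 74
print(f"accuracy:  {accuracy(tp, tn, fp, fn):.3f}")  # ~0.946
print(f"precision: {precision(tp, fp):.3f}")         # 0.700
print(f"recall:    {recall(tp, fn):.3f}")            # ~0.654
```

Note how a high accuracy can coexist with a much lower precision and recall when one class dominates, which is why the abstract reports all three and why resampling was needed during training.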
Future Directions: Our training data set does not have enough instances of “not achieving CVS”; therefore, we plan to collect more such videos to improve training. We will also pair the CNN with a recurrent neural network (RNN) to account for temporal features in the video, such as (a) the end of dissection and (b) the beginning of clipping, when searching for CVS. We also plan to have clinical experts validate our results on videos outside the Cholec80 dataset.
Presented at the SAGES 2017 Annual Meeting in Houston, TX.
Abstract ID: 91134
Program Number: ETP877
Presentation Session: Emerging Technology iPoster Session (Non CME)
Presentation Type: Poster