Babak Namazi, MS1, Katerina O Wells, MD2, Steven G Leeds, MD2, Ganesh Sankaranarayanan, PhD2, Venkat Deverajan, PhD1, James W Fleshman, MD2. 1University of Texas at Arlington, 2Baylor University Medical Center
Objective of the Technology: Monitoring the usage of different surgical instruments in a laparoscopic procedure is a critical part of automated systems for tracking surgical actions. The objective of our technology is to develop the LapTool-Net™, a deep-learning-based system for automated detection of surgical tools in a laparoscopic procedure.
Description of the Technology and its Application: LapTool-Net is a multi-label classifier designed to detect the presence of surgical tools in a laparoscopic video. It is a deep CNN-RNN (Convolutional Neural Network – Recurrent Neural Network) model: the CNN is a deep feed-forward artificial neural network trained to learn high-level features from still frames of the video, while the RNN is trained on sequences of frames from the videos to learn temporal features. The unique aspect of the LapTool-Net is its awareness of the context in which the surgical tools are used, for example, the low probability that laparoscopic shears and a clip applicator appear together, which reduces the number of tool combinations that must be considered for detection. After the CNN-RNN model was trained, a separate RNN was trained to model the specific pattern of tool usage that follows from the sequential order of the surgical tasks.
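To make the idea of context-aware detection concrete, the following minimal sketch restricts predictions to a list of plausible tool combinations. It is an illustration under stated assumptions, not the LapTool-Net implementation: the ALLOWED list is hypothetical and far shorter than any real list, the per-tool probabilities are invented, and scoring combinations by an independent per-tool log-likelihood is an assumption made only for this example.

```python
import math
from itertools import product

# The seven tools annotated in the dataset used in this abstract.
TOOLS = ("bipolar", "clipper", "grasper", "hook",
         "irrigator", "shears", "specimen bag")

# Without any context, every subset of the tools is a possible label.
ALL_COMBINATIONS = list(product((0, 1), repeat=len(TOOLS)))

# HYPOTHETICAL list of plausible combinations, for illustration only.
# It is not the authors' list and deliberately omits {clipper, shears}.
ALLOWED = [
    frozenset(),
    frozenset({"grasper"}),
    frozenset({"hook"}),
    frozenset({"grasper", "hook"}),
    frozenset({"grasper", "clipper"}),
    frozenset({"grasper", "shears"}),
]


def combination_log_likelihood(probs, combo):
    """Score a tool set, assuming independent per-tool probabilities."""
    eps = 1e-9
    total = 0.0
    for tool in TOOLS:
        p = min(max(probs[tool], eps), 1.0 - eps)  # avoid log(0)
        total += math.log(p if tool in combo else 1.0 - p)
    return total


def predict_allowed(probs, allowed=ALLOWED):
    """Return the allowed combination with the highest score."""
    return max(allowed, key=lambda c: combination_log_likelihood(probs, c))


# Invented per-tool probabilities for a single frame.
frame_probs = {"bipolar": 0.05, "clipper": 0.70, "grasper": 0.90,
               "hook": 0.05, "irrigator": 0.05, "shears": 0.60,
               "specimen bag": 0.05}
prediction = predict_allowed(frame_probs)
```

Thresholding each tool independently at 0.5 would report grasper, clipper, and shears together for this frame. Restricting the output to the allowed list instead selects the most likely plausible combination, which is one simple way to encode co-occurrence knowledge.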
The LapTool-Net can be used offline for the assessment of recorded videos, information retrieval for education, and operative summary report generation. It can also be used in real time to monitor surgical actions, prevent errors, and provide instantaneous feedback for quality improvement.
Preliminary Results: We used the publicly available M2CAI tool detection challenge dataset, which contains 15 videos of laparoscopic cholecystectomy procedures. Images were extracted from the videos at one frame per second and manually labeled for seven tools: bipolar, clipper, grasper, hook, irrigator, shears, and specimen bag. The training set contained 10 videos, which were used to train the CNN-RNN model based on the Inception-V1 and Gated Recurrent Unit (GRU) architectures. The other 5 videos were used to validate the performance of the model. To address the class imbalance caused by the large variation in how long each tool is used, we employed under-sampling. Based on our knowledge of the co-occurrence of tools, the number of combinations for detection was reduced from 128 (every subset of the seven tools, 2^7) to 15. The evaluation metric was frame-level accuracy, which counts a frame as correct only if all of the tools in the frame are correctly predicted. Our current accuracy is 80.96% and the average per-class F1-score (the harmonic mean of precision and recall) is 89.01%, which is by far the best performance reported for any automated tool detection system on the M2CAI dataset. The processing time for each frame is about 0.01 seconds, which makes the system suitable for real-time applications.
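The two metrics reported above can be sketched as follows. This is a minimal illustration on invented toy labels for three tools, not the authors' evaluation code; the function names are hypothetical. It shows why frame-level accuracy is the stricter measure: a single missed tool makes the whole frame wrong, while only one class's F1-score is affected.

```python
def frame_level_accuracy(y_true, y_pred):
    """Fraction of frames whose entire tool vector is predicted exactly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if tuple(t) == tuple(p))
    return correct / len(y_true)


def per_class_f1(y_true, y_pred, class_index):
    """F1-score (harmonic mean of precision and recall) for one tool."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p[class_index] and t[class_index]:
            tp += 1
        elif p[class_index] and not t[class_index]:
            fp += 1
        elif t[class_index] and not p[class_index]:
            fn += 1
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1(y_true, y_pred):
    """Average of the per-class F1-scores over all tools."""
    n_classes = len(y_true[0])
    return sum(per_class_f1(y_true, y_pred, k) for k in range(n_classes)) / n_classes


# Invented toy labels: 4 frames, 3 tools (1 = tool present in the frame).
y_true = [(1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
y_pred = [(1, 0, 0), (1, 0, 0), (0, 0, 1), (0, 1, 0)]

accuracy = frame_level_accuracy(y_true, y_pred)
average_f1 = mean_f1(y_true, y_pred)
```

In this toy example only the second frame is wrong, because one tool is missed, so frame-level accuracy is 3 of 4 frames while the average per-class F1-score stays higher.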
Future Directions: The LapTool-Net is based on a supervised learning method. In the future, we plan to develop a semi-supervised learning method using a larger training dataset. In addition, we will test the performance of the proposed system on different laparoscopic procedures.

Presented at the SAGES 2017 Annual Meeting in Houston, TX.
Abstract ID: 98856
Program Number: ETP725
Presentation Session: Emerging Technology Poster Session (Non CME)
Presentation Type: Poster
