Young-Eun Lee¹, Haliza Mat Husin², Maria Paola Forte², Seong-Whan Lee, PhD³, Katherine J Kuchenbecker, PhD². ¹Department of Brain and Cognitive Engineering, Korea University; ²Max Planck Institute for Intelligent Systems; ³Department of Artificial Intelligence, Korea University
Objective: Robot-assisted minimally invasive surgery enhances the surgeon's precision, dexterity, and vision compared to traditional minimally invasive surgery. However, current teleoperated surgical robots do not allow the surgeon at the console to feel the forces exerted by the instruments. Haptic feedback of these forces could improve surgical performance and reduce intra-operative tissue damage. Providing haptic feedback requires estimation of instrument contact forces, which are technically difficult to sense. The objective of this study is to determine whether vision-based features can be used to estimate instrument contact forces.
Methods:
Data acquisition. With an external RGB camera, we recorded video data of a person manually exerting forces on a digital scale with a da Vinci surgical instrument held at different angles. The digital scale provided the ground truth forces. Four videos (one at 30°, two at 60° and one at 80°, with approximately 700 frames per video) were used to train our machine learning algorithm, and one video at 60° was used to test the developed algorithm.
Feature extraction. We used computer vision methods to extract the instrument deflection and instrument tip deformation from each frame. The instrument deflection is the difference between the distal instrument angle and its initial angle. The instrument tip deformation is the small region-of-interest image where the deformation of the robotic instrument occurs when exerting forces on the surface.
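As an illustration of the deflection feature defined above, the following is a minimal sketch, not the authors' implementation. It assumes that the instrument shaft has already been located in each frame as two image points (the hypothetical arguments `p_start` and `p_end`), and it wraps the angle difference to a signed range; the function names are made up for this example.

```python
import math

def segment_angle_deg(p_start, p_end):
    """Angle of the segment p_start -> p_end in degrees, measured from the
    x-axis of the coordinate frame in which the points are given."""
    dx = p_end[0] - p_start[0]
    dy = p_end[1] - p_start[1]
    return math.degrees(math.atan2(dy, dx))

def instrument_deflection_deg(distal_angle_deg, initial_angle_deg):
    """Deflection = distal instrument angle minus its initial angle,
    wrapped to the interval [-180, 180) degrees."""
    diff = distal_angle_deg - initial_angle_deg
    return (diff + 180.0) % 360.0 - 180.0

# Hypothetical example: a shaft initially held at 60 degrees whose distal
# segment is measured at 62 degrees in the current frame.
deflection = instrument_deflection_deg(62.0, 60.0)
```

In image coordinates the y-axis usually points downward, so the sign convention of the angle would need to match whatever convention is used for the initial angle.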
Prediction. We applied several deep learning algorithms: long short-term memory (LSTM) with three hidden layers, convolutional neural networks (CNN) with two hidden layers, and neural networks (NN) with two hidden layers. Using the extracted features and initial angle, we estimated instrument forces using two combined architectures: LSTM-NN and LSTM-NN-CNN. We evaluated the prediction using root mean square error (RMSE) and R².
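The two evaluation metrics can be sketched as follows. This is a minimal example under the assumption that R² denotes the coefficient of determination (the abstract does not say whether it is this or a squared correlation); the function names and sample values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted forces."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical forces in newtons: scale readings vs. model predictions.
measured = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 3.0]
error_n = rmse(measured, predicted)
fit = r_squared(measured, predicted)
```

RMSE is expressed in the same unit as the force (N), whereas R² is unitless and describes the fraction of variance in the measured force that the prediction accounts for.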
Preliminary results: Using the LSTM-NN architecture with the initial angle and instrument deflection as features, we obtained an RMSE of 0.728 N and an R² value of 0.858 on the test video. The LSTM-NN-CNN architecture with initial angle, instrument deflection, and tip deformation achieved an RMSE of 0.703 N and an R² of 0.892. The peak forces were more accurately predicted when tip deformation was included, although the overall RMSE and R² were not significantly different between the two architectures.

Conclusions and future directions: This proof-of-concept study showed that the force exerted by a robotic instrument can be estimated from vision-based features with an RMSE of approximately 0.7 N in a simplified environment. Initial instrument angle, instrument deflection, and instrument tip deformation provide important information related to the applied force. In the future, we plan to use a similar approach with materials closer to human tissue and more varied instrument movements to visually estimate contact forces for haptic feedback.
This abstract was accepted for poster presentation at the 2020 SAGES Virtual Meeting. Its program number was ETP610 and its abstract ID was 106529.
