REMOTE: Real-time Ego-motion Tracking for Various Endoscope via Multimodal Visual Feature Learning

Liangjing Shao1,2, Benshuang Chen1,2, Shuting Zhao1,2, Xinrong Chen1,2,*,
1Academy for Engineering & Technology, Fudan University 2Shanghai Key Laboratroy for Medical Image Computing and Computer Assisted Intervention, Fudan University
*Corresponding Author: chenxinrong[at]fudan.edu.cn

Accepted by ICRA 2025

How the pipeline of the proposed framework works to perform real-time ego-motion tracking.

Abstract

Real-time ego-motion tracking for endoscope is a significant task for efficient navigation and robotic automation of endoscopy. In this paper, a novel framework is proposed to perform real-time ego-motion tracking for endoscope. Firstly, a multi-modal visual feature learning network is proposed to perform relative pose prediction. In particular, the scene feature is extracted from the current observation and the previous frame. Simultaneously, the motion feature is extracted from the optical flow, which is obtained by a pretrained high-speed network. Moreover, a feature extractor is designed based on attention mechanism to extract multi-dimensional information from the concatenation of two continuous frames. At the end of the network, a pose decoder is proposed to predict the pose transformation from the concatenated feature map. At last, the absolute pose of endoscope is calculated based on relative poses. In the experiments, the proposed method outperforms state-of-the-art methods on three datasets of various endoscope. Besides, based on experiments on three datasets, the inference speed of the proposed method is over 30 frames per second, which achieves the real-time requirement.

Examples of Real-time Ego-motion Tracking on NEPose Dataset.

MY ALT TEXT
MY ALT TEXT
MY ALT TEXT
MY ALT TEXT

Examples of Real-time Ego-motion Tracking on SimCol Dataset.

MY ALT TEXT
MY ALT TEXT
MY ALT TEXT