ROBUST VISUAL-INERTIAL ODOMETRY IN DYNAMIC ENVIRONMENTS USING SEMANTIC SEGMENTATION FOR FEATURE SELECTION
Camera based navigation in dynamic environments with high content of moving objects is challenging. Keypoint-based localization methods need to reliably reject features that do not belong to the static background. Here, traditional statistical methods for outlier rejection quickly reach their limits. A common approach is the combination with an inertial measurement unit for visual-inertial odometry. Also, deep learning based semantic segmentation was recently successfully applied in camera based localization to identify features on common objects. In this work, we study the application of mask-based feature selection based on semantic segmentation for robust localization in high dynamic environments. We focus on visual-inertial odometry, but similarly investigate a state-of-the-art pure vision-based method as baseline. For a versatile evaluation, we use challenging self-recorded datasets based on different sensor systems. This includes a combined dataset of a real world system and its synthetic clone with a large number of humans for in-depth analysis. We further deploy large-scale datasets from pedestrian navigation in a mall with escalator scenes and vehicle navigation during the day and at night. Our results show that visual-inertial odometry performs generally well in dynamic environments itself, but also shows significant failures in challenging scenes, which are prevented by using the segmentation aid.