DETERMINING PLANE-SWEEP SAMPLING POINTS IN IMAGE SPACE USING THE CROSS-RATIO FOR IMAGE-BASED DEPTH ESTIMATION
With the emergence of small consumer Unmanned Aerial Vehicles (UAVs), the importance and interest of image-based depth estimation and model generation from aerial images has greatly increased in the photogrammetric society. In our work, we focus on algorithms that allow an online image-based dense depth estimation from video sequences, which enables the direct and live structural analysis of the depicted scene. Therefore, we use a multi-view plane-sweep algorithm with a semi-global matching (SGM) optimization which is parallelized for general purpose computation on a GPU (GPGPU), reaching sufficient performance to keep up with the key-frames of input sequences. One important aspect to reach good performance is the way to sample the scene space, creating plane hypotheses. A small step size between consecutive planes, which is needed to reconstruct details in the near vicinity of the camera may lead to ambiguities in distant regions, due to the perspective projection of the camera. Furthermore, an equidistant sampling with a small step size produces a large number of plane hypotheses, leading to high computational effort. To overcome these problems, we present a novel methodology to directly determine the sampling points of plane-sweep algorithms in image space. The use of the perspective invariant cross-ratio allows us to derive the location of the sampling planes directly from the image data. With this, we efficiently sample the scene space, achieving higher sampling density in areas which are close to the camera and a lower density in distant regions. We evaluate our approach on a synthetic benchmark dataset for quantitative evaluation and on a real-image dataset consisting of aerial imagery. The experiments reveal that an inverse sampling achieves equal and better results than a linear sampling, with less sampling points and thus less runtime. Our algorithm allows an online computation of depth maps for subsequences of five frames, provided that the relative poses between all frames are given.