Cross-view image geo-localization aims to determine the location of a street-view query image by searching a GPS-tagged database of aerial-view reference images. One fundamental challenge is the dramatic viewpoint and domain difference between street-view query images and aerial-view reference images. Recent works have made great progress in bridging this domain gap with advanced deep learning techniques and geometric prior knowledge, namely that the query is aligned at the center of one aerial-view reference image (spatial alignment) and that the orientation relationship between the two views is known (orientation alignment). However, such prior knowledge of the geometric correspondence between the two views is usually not available in real-world scenarios. In this dissertation, we first explore how current models perform in real-world scenarios, where spatial or orientation alignment is not available and techniques built on geometric prior knowledge (e.g. polar transform) do not work well. For spatial alignment, we collect a new dataset with a real-world protocol for this scenario and propose a better solution, which is the first to explore multiple reference correspondence and GPS offset prediction beyond image-level retrieval. For orientation alignment, we demonstrate better metric learning techniques for this scenario and propose to estimate the orientation without explicit supervision. We then propose a novel visual explanation method, as well as the first quantitative analysis of visual explanation for deep metric learning, to gain a deeper understanding of the model with improved orientation estimation. Finally, we propose the first pure transformer-based method, which does not rely on geometric prior knowledge (polar transform) and generalizes well to real-world scenarios without orientation or spatial alignment. We also provide quantitative measurements of computational cost to show that our model is more efficient than previous methods.
In summary, we push cross-view image geo-localization toward real-world application with more realistic settings, higher accuracy, lower computational cost and better understanding/interpretation.
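As a rough illustration of the image-level retrieval setting that the abstract takes as its starting point, the sketch below ranks GPS-tagged reference embeddings by cosine similarity to a query embedding and returns the location of the best match. It is not the dissertation's method: the function names, the three-dimensional toy embeddings, and the coordinates are all hypothetical, and a real system would obtain the embeddings from trained street-view and aerial-view encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_location(query_embedding, reference_db, top_k=1):
    """Return the top_k (similarity, gps) pairs, most similar first.

    reference_db is a list of (embedding, (lat, lon)) pairs. Both the
    embeddings and the GPS tags here are illustrative placeholders.
    """
    scored = [
        (cosine_similarity(query_embedding, emb), gps)
        for emb, gps in reference_db
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy database: hypothetical aerial-view embeddings with made-up GPS tags.
reference_db = [
    ([1.0, 0.0, 0.0], (28.6024, -81.2001)),
    ([0.0, 1.0, 0.0], (28.5383, -81.3792)),
    ([0.0, 0.0, 1.0], (28.4158, -81.2989)),
]

# Hypothetical street-view query embedding, closest to the second reference.
query = [0.1, 0.9, 0.1]
best_score, best_gps = retrieve_location(query, reference_db)[0]
print(best_gps)
```

In this plain retrieval setup the predicted location is simply the GPS tag of the top-ranked reference image, which is why it implicitly assumes the query sits at the center of that image. The multiple-reference correspondence and GPS offset prediction mentioned in the abstract go beyond this assumption.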
Doctor of Philosophy (Ph.D.)
College of Engineering and Computer Science
Doctoral Dissertation (Campus-only Access)
Zhu, Sijie, "Toward Real-world Cross-view Image Geo-localization" (2022). Electronic Theses and Dissertations, 2020-. 1709.
Restricted to the UCF community until June 2024; it will then be open access.