City-scale image retrieval and tagging is an important problem with many applications in localization and augmented reality. The basic idea is to match a user-generated query image against a database of tagged images. Once a correct match is retrieved, pose information associated with the retrieved image can be used to augment the query image. In this report we describe an approach to large-scale image retrieval in urban environments that takes advantage of the coarse position estimates available on many mobile devices today, e.g. via GPS or cell-tower triangulation. By partitioning the image database for a given geographic region into a number of overlapping cells, each with its own prebuilt search and retrieval structure, we avoid the performance degradation faced by many city-scale retrieval systems, in which both retrieval speed and retrieval accuracy typically decrease as the size of the database grows. Once a correct image match is found, a set of point-to-point correspondences between the query and retrieved image is used to compute a homography, which is then used to transfer tag information associated with points in the database image onto the query image with near pixel-level accuracy. An example of a tagged query output by our system and its corresponding database match is shown in Figure 1. We demonstrate retrieval results over a ~12,000-image database covering a 1 km² area of downtown Berkeley and illustrate tag transfer results over the same dataset.
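The tag transfer step described above can be sketched as follows. This is a minimal illustration, not the report's implementation: the function names are hypothetical, the homography is estimated with a plain Direct Linear Transform (DLT), and a real pipeline would first obtain robust point-to-point correspondences (e.g. RANSAC-filtered feature matches) between the query and retrieved image.

```python
# Illustrative sketch of homography-based tag transfer (hypothetical helpers;
# assumes clean point correspondences are already available).
import numpy as np

def estimate_homography(src_pts, dst_pts):
    """Estimate the 3x3 homography H mapping src_pts -> dst_pts via the
    Direct Linear Transform; requires at least 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of A with the smallest
    # singular value, reshaped into a 3x3 matrix.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so H[2,2] == 1

def transfer_tags(H, tag_points):
    """Map tagged pixel locations from the database image into the query
    image by applying H in homogeneous coordinates."""
    pts = np.hstack([np.asarray(tag_points, dtype=float),
                     np.ones((len(tag_points), 1))])
    mapped = (H @ pts.T).T
    return mapped[:, :2] / mapped[:, 2:3]  # de-homogenize
```

In practice a robust estimator (e.g. RANSAC over the correspondences) would replace the plain DLT, since feature matches between real query and database images contain outliers.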