Top-down Bird’s Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms, Mapillary for FPV images and OpenStreetMap for BEV semantic maps.
We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we demonstrate the ease of automatically collecting a dataset of 1.2 million FPV and BEV pairs encompassing diverse geographies, landscapes, environmental factors, camera models, and capture scenarios. We further train a simple camera-model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing and testing generalizable BEV perception, paving the way for more robust autonomous navigation.
Read the paper: https://arxiv.org/abs/2407.08726
About the Speakers
Cherie Ho is a final-year robotics PhD student at Carnegie Mellon University working with Prof. Sebastian Scherer. Her research interests lie at the intersection of field robotics, computer vision, and machine learning, with the goal of developing robots that can continuously learn in new scenarios. She has developed generalizable, adaptive, and uncertainty-aware robot algorithms for dynamic real-world applications, including high-speed offroad driving, outdoor multi-drone systems, and outdoor wheelchairs. She is a recipient of the Croucher Scholarship for Doctoral Study.
Jiaye (Tony) Zou is a senior CS undergraduate at Carnegie Mellon University. He is interested in multi-modal perception in dynamic real-world environments. He has developed MapItAnywhere, a large-scale data engine and baseline model for generalizable Bird’s Eye View mapping.
Omar Alama is starting his PhD at Carnegie Mellon University ECE in Fall 2024 advised by Prof. Sebastian Scherer and working in the Airlab at the CMU Robotics Institute. His research interests revolve around classical and modern deep-learning-based computer vision, which is used to build generalizable and efficient perception systems.
#computervision #ai #artificialintelligence #machinevision #machinelearning #datascience #NeurIPS #NeurIPS2024