Motivated by how humans perceive scenes, we propose the Multiview Scene Graph (MSG) as a general topological scene representation. MSG constructs a place+object graph from unposed RGB images, and we provide novel metrics to evaluate the quality of the resulting graph. We combine visual place recognition and object association to build MSG within a single Transformer decoder model. We believe MSG can connect the dots across classic vision tasks to promote spatial intelligence and open new doors for topological 3D scene understanding.
Read the paper: https://arxiv.org/abs/2410.11187
About the Speaker:
Juexiao Zhang is a second-year PhD student in computer science at NYU Courant, advised by Professor Chen Feng. He is interested in learning scene representations that are useful for robots to understand the world and interact with it.