Truly general-purpose vision systems require pre-training on diverse and representative visual datasets. The “dataset classification” experiment reveals that modern large-scale visual datasets are still very biased: neural networks can achieve excellent accuracy in classifying which dataset an image is from. However, the concrete forms of bias among these datasets remain unclear. In this talk, I will present a framework to identify the unique visual attributes distinguishing these large-scale datasets.

Read the paper: https://arxiv.org/abs/2412.01876

About the Speaker

Boya Zeng is an undergraduate student at the University of Pennsylvania. He is currently working with Prof. Zhuang Liu at Princeton University on visual datasets and generative models.