MENU

Fun & Interesting

Bill Engels & Chris Fonnesbeck - Making Gaussian Processes Useful | PyData Global 2024

PyData 809 3 weeks ago
Video Not Working? Fix It Now

www.pydata.org The goal of this tutorial is to make Gaussian processes (GPs) useful. In most practicing data scientists' mental map of modeling and machine learning techniques, Gaussian processes are an advanced approach that sit alone on an island, perhaps with narrow use cases like Bayesian optimization. Most books and other material on GPs tend to focus on theoretical aspects, and it can be hard to close the gap between the theory and putting those ideas into practice to solve real problems in a reasonable amount of time. This tutorial is split into two parts. The first part introduces Bayesian modeling, focusing on hierarchical modeling and the concept of partial pooling. We’ll use the classic example of estimating the batting average of a group of baseball players as motivation. Then we’ll introduce GPs as a useful generalization of hierarchical modeling for the common situation where our groups aren’t distinct categories. Instead of thinking of each baseball player as completely distinct and exchangeable entities, we can use a GP to partially pool information locally by also considering each player's age. Finally we’ll close the first part by connecting back to the more common introduction to GPs as infinite dimensional multivariate normals. The second part of the tutorial will give an overview of practical tips and tricks for modeling with GPs using the open source Python package PyMC. Specifically, how to address the two big issues to using GPs in practice: scaling and identifiability. We’ll discuss useful approximations like the HSGP and when to apply them, advice on when to use splines, and finally when you need to step out of a PPL like PyMC or Stan to a GP specific library like GPFlow or GPyTorch. We’ll do so with a couple motivating examples. The audience should have some familiarity with basic ML and statistics concepts, such as probability distributions, normal and multivariate normal distributions, correlation and covariance, and linear regression - but the talk will aim to be non-technical and the goal will be introduce GPs and give people the tools they need to use them effectively in practice. PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases. 00:00 Welcome! 00:10 Help us add time stamps or captions to this video! See the description for details. Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps

Comment