Talk by Dr. Leilani Battle
Title:
Behavior-Driven Optimizations for Big Data Exploration
Abstract:
Before an analyst or organization can leverage their data for effective decision making, they first need to understand it: for example, how the data is structured, which data attributes are most relevant, and what relationships may exist between these attributes. Systems designed for exploratory data analysis provide the flexible, open-ended environment that analysts need to answer these questions, and exploratory data analysis is often considered the first step in a human-centered data science process. With the abundance of massive datasets in industry and science, analysts need visual exploration systems that can process data fast enough to keep pace with a person’s analytic flow. However, analysts also need systems that make data transformations quick to interpret and analysis results intuitive to explore. Otherwise, analysts risk drawing the wrong conclusions from the data, which can have serious business and societal consequences. To avoid these pitfalls, data science researchers and developers need to understand how an analyst’s actions are driven by her understanding of the data and her exploration goals. This requires a holistic approach to designing data exploration systems, one that considers the strengths and weaknesses of both humans and systems.
I adopt an integrative approach to systems design that balances system performance with human performance, considering not only data management but also user interface design, perception, and cognition. In this talk, I will discuss ForeCache, a visual exploration system that automatically learns user exploration patterns and exploits these patterns to pre-fetch data ahead of users as they explore. I will show that ForeCache's pre-fetching techniques provide significant performance benefits compared to existing systems. I will then discuss my work on a new performance benchmark for measuring how database management systems (DBMSs) perform when supporting exploration interfaces that provide real-time feedback as users interact with their data. Using this benchmark, I will show that industry-standard DBMSs are currently 4x too slow to support real-time querying. This result highlights the need not only for more interaction-focused systems optimizations for data exploration, but also for closer collaboration and more explicit sharing of use cases and experiment data between the visualization and database communities. Finally, I will discuss my ongoing research to further characterize, optimize, and evaluate data exploration systems in order to promote more reliable, rigorous, and engaging analyses.
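The abstract does not say how ForeCache learns exploration patterns, so the following is only a generic illustration of behavior-driven pre-fetching, not ForeCache's actual design. It assumes a hypothetical vocabulary of pan and zoom moves over a hypothetical (zoom, x, y) tile pyramid, counts move-to-move transitions with a first-order Markov model, and pre-fetches the tiles reached by the k most likely next moves. All names (`neighbor`, `MarkovPrefetcher`, the move strings) are illustrative.

```python
from collections import Counter, defaultdict


def neighbor(tile, move):
    """Return the tile a move leads to in a hypothetical quadtree tile pyramid,
    or None if that tile would fall outside the grid."""
    z, x, y = tile
    if move == "pan_left":
        nxt = (z, x - 1, y)
    elif move == "pan_right":
        nxt = (z, x + 1, y)
    elif move == "pan_up":
        nxt = (z, x, y - 1)
    elif move == "pan_down":
        nxt = (z, x, y + 1)
    elif move == "zoom_in":
        nxt = (z + 1, 2 * x, 2 * y)  # top-left child, for simplicity
    elif move == "zoom_out":
        nxt = (z - 1, x // 2, y // 2)
    else:
        raise ValueError(f"unknown move: {move}")
    nz, nx, ny = nxt
    # Zoom level z has a 2^z by 2^z grid of tiles; reject anything off the grid.
    if nz < 0 or not (0 <= nx < 2 ** nz and 0 <= ny < 2 ** nz):
        return None
    return nxt


class MarkovPrefetcher:
    """First-order Markov model over user moves (illustrative sketch only)."""

    def __init__(self, k=2):
        self.k = k  # how many predicted moves to pre-fetch for
        self.transitions = defaultdict(Counter)  # prev move -> counts of next move
        self.last_move = None

    def observe(self, move):
        # Record the transition from the previous move to this one.
        if self.last_move is not None:
            self.transitions[self.last_move][move] += 1
        self.last_move = move

    def predict_moves(self):
        # Most frequent successors of the last move; equal counts keep
        # first-seen order, per Counter.most_common.
        if self.last_move is None:
            return []
        return [m for m, _ in self.transitions[self.last_move].most_common(self.k)]

    def tiles_to_prefetch(self, current_tile):
        # Map each predicted move to the tile it would load, skipping off-grid tiles.
        tiles = []
        for move in self.predict_moves():
            t = neighbor(current_tile, move)
            if t is not None:
                tiles.append(t)
        return tiles


if __name__ == "__main__":
    p = MarkovPrefetcher(k=2)
    session = ["pan_right", "pan_right", "zoom_in", "pan_right",
               "pan_right", "pan_down", "pan_right"]
    for move in session:
        p.observe(move)
    print(p.predict_moves())               # ['pan_right', 'zoom_in']
    print(p.tiles_to_prefetch((2, 1, 1)))  # [(2, 2, 1), (3, 2, 2)]
```

A real system would issue the pre-fetch requests asynchronously while the user inspects the current view; a first-order model is simply the smallest example of learning from past moves.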
Research: https://www.cs.umd.edu/~leilani/projects.html
Twitter: @leibatt