Description
Exploratory Data Analysis (EDA) largely takes place in computational notebooks and relies on common packages such as matplotlib, pandas, and scikit-learn. Though every user's EDA process is unique, analysis sessions share common patterns that are difficult to describe and quantify. To study these patterns, we propose a categorization of functions from common data science packages and parse a sample of notebooks to examine the frequency and sequencing of function calls. After observing the challenges users have with visualization, we turned to LUX, a framework that accelerates the visualization of pandas dataframes, which are widely used across a spectrum of industries. We built a system to send and receive logs of user interactions, then conducted user studies with LUX to examine how new users incorporate the visualization assistant into their workflow. Through our study of notebook and LUX user logs, we help uncover typical patterns of data exploration, both with and without visualization tools like LUX.
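As a rough illustration of the notebook-parsing step described above, the sketch below extracts function calls from the code cells of a notebook, maps each call to a category, and counts both category frequencies and category-to-category transitions. It is a minimal sketch under stated assumptions, not the project's actual pipeline: the `CATEGORIES` mapping, the helper names, and the example notebook are all hypothetical, and matching on the bare function name (rather than resolving which package it came from) is a deliberate simplification.

```python
import ast
from collections import Counter

# Hypothetical category map (an assumption for illustration only;
# the project's real categorization is not reproduced here).
CATEGORIES = {
    "read_csv": "load",
    "head": "inspect",
    "describe": "inspect",
    "dropna": "clean",
    "groupby": "transform",
    "plot": "visualize",
    "scatter": "visualize",
    "fit": "model",
}


def call_name(node):
    """Return the bare name of a call: 'head' for df.head(), 'print' for print()."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return None


def extract_calls(source):
    """Return call names from Python source, ordered by where each call ends.

    Sorting by end position keeps chained calls such as df.dropna().describe()
    in evaluation order (dropna before describe). Requires Python 3.8+.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name is not None:
                found.append((node.end_lineno, node.end_col_offset, name))
    found.sort()
    return [name for _, _, name in found]


def notebook_calls(notebook):
    """Collect call names from every code cell of a parsed .ipynb dictionary."""
    calls = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        # Simplification: drop IPython magics and shell escapes, which are not valid Python.
        lines = [ln for ln in source.splitlines()
                 if not ln.lstrip().startswith(("%", "!"))]
        try:
            calls.extend(extract_calls("\n".join(lines)))
        except SyntaxError:
            continue  # skip cells that still fail to parse
    return calls


def summarize(calls):
    """Return (category frequencies, counts of consecutive category pairs)."""
    cats = [CATEGORIES.get(name, "other") for name in calls]
    return Counter(cats), Counter(zip(cats, cats[1:]))


# Hypothetical in-memory notebook, shaped like the JSON stored in an .ipynb file.
example = {"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["%matplotlib inline\n",
                                     "import pandas as pd\n",
                                     "df = pd.read_csv('data.csv')\n",
                                     "df.head()\n"]},
    {"cell_type": "code", "source": ["df = df.dropna()\n",
                                     "df.describe()\n",
                                     "df.plot()\n"]},
]}

frequencies, transitions = summarize(notebook_calls(example))
```

For a notebook on disk, the dictionary passed to `notebook_calls` could be obtained with `json.load` on the `.ipynb` file. Counting consecutive pairs is only one simple way to look at sequencing; longer n-grams or a transition matrix would follow the same pattern.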