Description

Exploratory data analysis (EDA) is a vital part of data science and usually takes place in computational notebooks with tools such as pandas. One of the most popular tools for EDA is Lux, which visualizes data stored in pandas DataFrames in a dashboard displayed in a Jupyter Notebook. However, as datasets grow, so does the computation needed to generate these visualizations, which slows Lux down. We consider the use of sampling to accelerate this computation. We analyze how Lux performs on large datasets and determine which parts of Lux can be accelerated with data sampling. We then integrate our sampling method into Lux and demonstrate a significant speedup without compromising the quality of the visualizations it produces.
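
The general idea of sampling before aggregating can be illustrated with a minimal pandas sketch. This is not the method integrated into Lux; the helper name `sampled_group_means`, the `max_rows` threshold, and the synthetic data are all assumptions made for illustration. It estimates the per-group means that a bar chart would display from a uniform row sample and compares them with the exact values.

```python
import numpy as np
import pandas as pd


def sampled_group_means(df, group_col, value_col, max_rows=10_000, seed=0):
    """Hypothetical helper: estimate per-group means from a uniform row sample.

    Frames with at most `max_rows` rows are aggregated exactly.
    """
    if len(df) > max_rows:
        # Uniform sampling without replacement; the seed keeps runs repeatable.
        df = df.sample(n=max_rows, random_state=seed)
    return df.groupby(group_col)[value_col].mean()


# Synthetic data standing in for a large DataFrame (an assumption).
rng = np.random.default_rng(0)
n = 200_000
data = pd.DataFrame({
    "category": rng.choice(["a", "b", "c"], size=n),
    "value": rng.normal(loc=5.0, scale=1.0, size=n),
})

exact = data.groupby("category")["value"].mean()
approx = sampled_group_means(data, "category", "value")

# Largest absolute difference between exact and sampled bar heights.
max_abs_error = float((exact - approx).abs().max())
print(approx)
print("max abs error:", max_abs_error)
```

The sampled aggregate touches a fixed number of rows regardless of the size of the input, which is where the speedup comes from. The cost is a small estimation error in each bar, which shrinks as `max_rows` grows.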
