Description

Pandas is a popular dataframe manipulation tool used by data scientists. A key limitation of Pandas is that it does not scale across cores, which severely limits its ability to handle big-data workloads. To keep up with ever-larger datasets, data scientists need a dataframe tool that scales effectively while retaining Pandas’s ease of use. Modin, a drop-in substitute for Pandas, can effectively parallelize dataframe workloads and supports several compute backends, such as Ray, Dask, or Python. In this project we implement another compute backend for Modin: OpenMPI, an implementation of the Message Passing Interface (MPI). This will allow users to tap into OpenMPI infrastructure to scale their dataframe processing.
