The prevalence of large genomics datasets has made the the need to explore this data more important. Large sequencing projects like the 1000 Genomes Project, which reconstructed the genomes of 2,504 individuals sampled from 26 populations, have produced over 200TB of publically available data. Meanwhile, existing genomic visualization tools have been unable to scale with the growing amount of larger, more complex data. This difficulty is acute when viewing large regions (over 1 megabase, or 1,000,000 bases of DNA), or when concurrently viewing multiple samples of data. While genomic processing pipelines have shifted towards using distributed computing techniques, such as with ADAM, genomic visualization tools have not.
In this work we present Mango, a scalable genome browser built on top of ADAM that can run both locally and on a cluster. Mango presents a combination of different optimizations that can be combined in a single application to drive novel genomic visualization techniques over terabytes of genomic data. By building visualization on top of a distributed processing pipeline, we can perform visualization queries over large regions that are not possible with current tools, and decrease the time for viewing large data sets. Mango is part of the Big Data Genomics project at University of California-Berkeley and is published under the Apache 2 license. Mango is available at https://github.com/bigdatagenomics/mango
Interactive Exploration on Large Genomic Datasets
Researchers may make free and open use of the UC Berkeley Library’s digitized public domain materials. However, some materials in our online collections may be protected by U.S. copyright law (Title 17, U.S.C.). Use or reproduction of materials protected by copyright beyond that allowed by fair use (Title 17, U.S.C. § 107) requires permission from the copyright owners. The use or reproduction of some materials may also be restricted by terms of University of California gift or purchase agreements, privacy and publicity rights, or trademark law. Responsibility for determining rights status and permissibility of any use or reproduction rests exclusively with the researcher. To learn more or make inquiries, please see our permissions policies (https://www.lib.berkeley.edu/about/permissions-policies).