Reply to "Two Spark single-cell projects from the Chan Zuckerberg foundation" on Mon, 22 Nov 2021 06:20:42 GMT

anneng — Mon, 22 Nov 2021 06:20:42 GMT

Accelerating Cross-Sample Analysis of Single-Cell Genomic Data with ADAM and Apache Spark
Project Goal
To build computational tools that let researchers harness distributed computing for machine learning and interactive data exploration across raw single-cell data.

Results & Resources
The Joseph lab’s primary goal was to support the Apache Spark ecosystem in order to extend their work on hyper-scalable workflows and visualization. They pursued a wide range of projects:

ADAM, a library and command line tool to parallelize genomic data analysis across cluster and cloud computing environments.
Mango, a distributed tool for visualizing and manipulating large genomic sequencing datasets in a Jupyter notebook.
Modin, a drop-in replacement for pandas that allows users to interpret large datasets in table format with high throughput and low latency.