Two Spark single-cell projects from the Chan Zuckerberg Initiative

anneng

Scalable Interactive Analysis of Single-Cell Data with Apache Spark
https://chanzuckerberg.com/human-cell-atlas/scalable-interactive-analysis-of-single-cell-data-with-apache-spark/
A project incubated under the Chan Zuckerberg Initiative.
Project Goal
To develop a computational infrastructure backend system that enables interactive exploratory analysis on enormous single-cell datasets.

Results & Resources
The Laserson group made contributions to existing open source projects, such as Zarr, Scanpy and PyNNDescent. They also developed a number of new projects:

Zappy, an API exposing a numpy interface that can be pushed down into multiple execution engines and also read and write Zarr data.
ndarray.scala, a Scala implementation of the “ndarray” that is compatible with reading and writing Zarr data.
scsearch, an experimental implementation for indexing single-cell data with Elasticsearch.
Instructions, demos and Jupyter notebooks for running select Scanpy operations on distributed computing engines for scalable single-cell analytics.
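
To make the idea of one array interface "pushed down into multiple execution engines" concrete, here is a minimal sketch using only the Python standard library. It is not the Zappy API: the names `ChunkedMatrix`, `serial_engine`, and `threaded_engine` are hypothetical and exist only for illustration. A cells-by-genes matrix is split into row chunks, and the same `column_sums` call runs on whichever engine is plugged in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Row = List[float]
Chunk = List[Row]  # a block of rows (cells) from a cells-by-genes matrix
Engine = Callable[[Callable[[Chunk], Row], Sequence[Chunk]], List[Row]]


def serial_engine(fn: Callable[[Chunk], Row], chunks: Sequence[Chunk]) -> List[Row]:
    """Run fn on every chunk, one after another, in the calling thread."""
    return [fn(chunk) for chunk in chunks]


def threaded_engine(fn: Callable[[Chunk], Row], chunks: Sequence[Chunk]) -> List[Row]:
    """Run fn on every chunk using a small thread pool; result order is preserved."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fn, chunks))


class ChunkedMatrix:
    """Hypothetical row-chunked matrix (illustration only, not Zappy)."""

    def __init__(self, rows: List[Row], chunk_rows: int, engine: Engine = serial_engine):
        # Split the rows into consecutive blocks of at most chunk_rows rows.
        self.chunks = [rows[i:i + chunk_rows] for i in range(0, len(rows), chunk_rows)]
        self.engine = engine

    def column_sums(self) -> Row:
        """Per-gene totals: sum each chunk independently, then combine the partial sums."""
        def partial(chunk: Chunk) -> Row:
            return [sum(col) for col in zip(*chunk)]

        partials = self.engine(partial, self.chunks)
        return [sum(col) for col in zip(*partials)]
```

The calling code stays the same when the engine changes, for example `ChunkedMatrix(rows, 2, engine=threaded_engine).column_sums()`. A distributed engine such as Spark would fill the same role as `threaded_engine` here, under the assumption that each chunk can be processed independently.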

Accelerating Cross-Sample Analysis of Single-Cell Genomic Data with Adam and Apache Spark
Project Goal
To build computational tools that let researchers harness distributed computing for machine learning and interactive data exploration across raw single-cell data.

Results & Resources
The Joseph lab’s primary goal was to support the Apache Spark ecosystem and extend their work on hyper-scalable workflows and visualization. They pursued a wide range of projects:

ADAM, a library and command line tool to parallelize genomic data analysis across cluster and cloud computing environments.
Mango, a distributed visualization tool for visualizing and manipulating large genomic sequencing datasets in a Jupyter notebook.
Modin, a drop-in replacement for pandas that allows users to interpret large datasets in table format with high throughput and low latency.