<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Two Spark single-cell projects from the Chan Zuckerberg Initiative]]></title><description><![CDATA[<p dir="auto">Scalable Interactive Analysis of Single-Cell Data with Apache Spark<br />
<a href="https://chanzuckerberg.com/human-cell-atlas/scalable-interactive-analysis-of-single-cell-data-with-apache-spark/" rel="nofollow ugc">https://chanzuckerberg.com/human-cell-atlas/scalable-interactive-analysis-of-single-cell-data-with-apache-spark/</a><br />
One of the projects incubated under the Chan Zuckerberg Initiative<br />
Project Goal<br />
To develop a computational infrastructure backend system that enables interactive exploratory analysis on enormous single-cell datasets.</p>
<p dir="auto">Results &amp; Resources<br />
The Laserson group made contributions to existing open source projects, such as Zarr, Scanpy and PyNNDescent. They also developed a number of new projects:</p>
<p dir="auto">Zappy, an API exposing a numpy interface that can be pushed down into multiple execution engines and also read and write Zarr data.<br />
ndarray.scala, a Scala implementation of the “ndarray” that is compatible with reading and writing Zarr data.<br />
scsearch, an experimental implementation for indexing single-cell data with Elasticsearch.<br />
Instructions, demos, and Jupyter notebooks for running selected Scanpy operations on distributed computing engines for scalable single-cell analytics.</p>
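<p dir="auto">The Zappy and Scanpy items above describe a NumPy-style interface whose work is pushed down to different execution engines. The following is a minimal Python sketch of that general idea under stated assumptions. It uses a cell-by-gene count matrix held as row chunks, and each operation runs chunk by chunk through a pluggable "engine". Every name in it (<code>ChunkedMatrix</code>, <code>SerialEngine</code>, <code>ThreadEngine</code>, <code>normalize_log1p</code>) is hypothetical and is not Zappy's or Scanpy's actual API. A thread pool stands in for a distributed engine such as Spark, and plain Python lists stand in for Zarr chunks. The per-cell step imitates a Scanpy-style total-count normalization followed by log1p.</p>

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of a NumPy-like interface whose work is pushed down to a
# pluggable execution engine. This is NOT Zappy's real API; the names below are
# invented for illustration. A cell-by-gene count matrix is held as row chunks
# (lists of rows), standing in for chunked Zarr storage.


class SerialEngine:
    """Runs one task per chunk, sequentially, in the current process."""

    def map(self, func, chunks):
        return [func(chunk) for chunk in chunks]


class ThreadEngine:
    """Runs one task per chunk on a thread pool (stand-in for e.g. Spark)."""

    def __init__(self, workers=4):
        self.workers = workers

    def map(self, func, chunks):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(func, chunks))  # preserves chunk order


class ChunkedMatrix:
    """Minimal matrix wrapper: every operation runs chunk by chunk via the engine."""

    def __init__(self, chunks, engine):
        self.chunks = chunks
        self.engine = engine

    @classmethod
    def from_rows(cls, rows, chunk_size, engine):
        chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
        return cls(chunks, engine)

    def map_rows(self, func):
        # Row-wise ops need no data from other chunks, so each chunk is independent.
        new_chunks = self.engine.map(lambda chunk: [func(row) for row in chunk], self.chunks)
        return ChunkedMatrix(new_chunks, self.engine)

    def sum_axis0(self):
        # Column sums: per-chunk partial sums on the engine, then a final combine.
        partials = self.engine.map(lambda chunk: [sum(col) for col in zip(*chunk)], self.chunks)
        return [sum(col) for col in zip(*partials)]

    def to_rows(self):
        return [row for chunk in self.chunks for row in chunk]


def normalize_log1p(row, target_sum=10_000.0):
    """Scale one cell's counts to target_sum, then apply log1p (Scanpy-style)."""
    total = sum(row)
    if total == 0:
        return [0.0] * len(row)
    return [math.log1p(x * target_sum / total) for x in row]


counts = [[1, 0, 3], [0, 0, 0], [2, 2, 0], [5, 0, 5]]  # 4 cells x 3 genes
for engine in (SerialEngine(), ThreadEngine(workers=2)):
    m = ChunkedMatrix.from_rows(counts, chunk_size=2, engine=engine)
    print(m.sum_axis0())  # [8, 2, 8]
    normalized = m.map_rows(normalize_log1p)
    print(normalized.to_rows()[1])  # [0.0, 0.0, 0.0]
```

<p dir="auto">Both engines give the same results, and only the execution strategy changes. Row-wise steps such as normalization need no data exchange between chunks, so they are easy to push down. Reductions such as column sums need a second step that combines the per-chunk partial results.</p>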
]]></description><link>http://an.forum.genostack.com/topic/451/chanzuckerberg基金会两个spark单细胞项目</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 12:31:13 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/451.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 22 Nov 2021 06:02:11 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Two Spark single-cell projects from the Chan Zuckerberg Initiative on Mon, 22 Nov 2021 06:20:42 GMT]]></title><description><![CDATA[<p dir="auto">Accelerating Cross-Sample Analysis of Single-Cell Genomic Data with Adam and Apache Spark<br />
Project Goal<br />
To build computational tools that let researchers harness distributed computing for machine learning and interactive data exploration across raw single-cell data.</p>
<p dir="auto">Results &amp; Resources<br />
The Joseph lab’s primary goal was to support the Apache Spark ecosystem and extend their work on hyper-scalable workflows and visualization. They pursued a wide range of projects:</p>
<p dir="auto">ADAM, a library and command line tool to parallelize genomic data analysis across cluster and cloud computing environments.<br />
Mango, a distributed visualization tool for visualizing and manipulating large genomic sequencing datasets in a Jupyter notebook.<br />
Modin, a drop-in replacement for pandas that lets users work with large tabular datasets at high throughput and low latency.</p>
]]></description><link>http://an.forum.genostack.com/post/895</link><guid isPermaLink="true">http://an.forum.genostack.com/post/895</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 22 Nov 2021 06:20:42 GMT</pubDate></item></channel></rss>