暗能星系

    Two Spark single-cell projects from the Chan Zuckerberg Initiative

    • anneng (last edited by anneng)

      Scalable Interactive Analysis of Single-Cell Data with Apache Spark
      https://chanzuckerberg.com/human-cell-atlas/scalable-interactive-analysis-of-single-cell-data-with-apache-spark/
      A project incubated under the Chan Zuckerberg Initiative.
      Project Goal
      To develop a computational infrastructure backend system that enables interactive exploratory analysis on enormous single-cell datasets.

      Results & Resources
      The Laserson group made contributions to existing open source projects, such as Zarr, Scanpy and PyNNDescent. They also developed a number of new projects:

      Zappy, an API exposing a NumPy interface that can be pushed down into multiple execution engines and can also read and write Zarr data.
      ndarray.scala, a Scala implementation of the “ndarray” that is compatible with reading and writing Zarr data.
      scsearch, an experimental implementation for indexing single-cell data with Elasticsearch.
      Instructions, demos and Jupyter notebooks for running select Scanpy operations using distributed computing engines for scalable single-cell analytics.
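
To make the idea of pushing an array operation down to an execution engine a little more concrete, here is a minimal sketch using only the Python standard library. It is not Zappy's API and not how Zappy, Zarr or Scanpy are implemented. The function names (`split_rows`, `chunk_column_sums`, `mean_per_gene`), the chunk size and the toy count matrix are all hypothetical, and a thread pool stands in for a real distributed engine. The sketch splits a cell-by-gene matrix into row chunks, computes partial sums per chunk in parallel, and combines them into a per-gene mean.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(matrix, chunk_rows):
    """Split a cell-by-gene matrix (a list of rows) into consecutive row chunks."""
    return [matrix[i:i + chunk_rows] for i in range(0, len(matrix), chunk_rows)]


def chunk_column_sums(chunk):
    """Return the per-gene partial sums and the number of cells in one chunk."""
    n_genes = len(chunk[0])
    sums = [0.0] * n_genes
    for row in chunk:
        for j, value in enumerate(row):
            sums[j] += value
    return sums, len(chunk)


def mean_per_gene(matrix, chunk_rows=2, max_workers=2):
    """Compute the mean of every column by combining per-chunk partial results.

    The thread pool is only a stand-in (an assumption of this sketch) for
    whatever engine would process the chunks in a real deployment.
    """
    chunks = split_rows(matrix, chunk_rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(chunk_column_sums, chunks))

    n_genes = len(matrix[0])
    totals = [0.0] * n_genes
    n_cells = 0
    for sums, count in partials:
        n_cells += count
        for j in range(n_genes):
            totals[j] += sums[j]
    return [total / n_cells for total in totals]


# Toy count matrix: 4 cells (rows) by 3 genes (columns), made up for illustration.
counts = [
    [1, 0, 3],
    [2, 0, 1],
    [0, 4, 2],
    [1, 0, 2],
]
means = mean_per_gene(counts)
print(means)
```

Because each chunk only returns sums and a cell count, the combine step never needs the whole matrix in one place, which is the property that lets this kind of reduction scale across chunked storage.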

      • anneng

        Accelerating Cross-Sample Analysis of Single-Cell Genomic Data with Adam and Apache Spark
        Project Goal
        To build computational tools that let researchers harness distributed computing for machine learning and interactive data exploration across raw single-cell data.

        Results & Resources
        The Joseph lab’s primary goal was to support the Apache Spark ecosystem and extend their work on hyper-scalable workflows and visualization. They pursued a wide range of projects:

        ADAM, a library and command line tool to parallelize genomic data analysis across cluster and cloud computing environments.
        Mango, a distributed visualization tool for visualizing and manipulating large genomic sequencing datasets in a Jupyter notebook.
        Modin, a drop-in replacement for pandas that allows users to interpret large datasets in table format with high throughput and low latency.
