暗能星系


    Snakemake

    • A
      anneng (last edited by anneng)

      1. Installation
      Preparation (virtual environment):
      sudo apt update
      sudo apt install python3-venv
      python3 -m venv .
      source bin/activate
      Preparation (build dependencies):
      sudo apt-get install build-essential
      python setup.py bdist_wheel
      sudo apt-get install python3-dev
      Install:
      pip3 install --upgrade snakemake
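One way to confirm the installation worked is to check, from inside the activated virtual environment, that the package can be found and that the command is on PATH. This is a minimal sketch using only the Python standard library; the helper name `snakemake_status` is made up for this post and is not part of Snakemake.

```python
import importlib.util
import shutil


def snakemake_status():
    """Report whether the snakemake package and CLI are visible from this environment."""
    return {
        # True if `import snakemake` would find the package in the current interpreter.
        "package_importable": importlib.util.find_spec("snakemake") is not None,
        # True if a `snakemake` executable is on PATH (e.g. bin/ of the active venv).
        "cli_on_path": shutil.which("snakemake") is not None,
    }


if __name__ == "__main__":
    for check, ok in snakemake_status().items():
        print(f"{check}: {'yes' if ok else 'no'}")
```

If either check reports "no", the virtual environment is probably not activated in the current shell.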
      2. Introduction
      Reference slides: snakemake.pptx
      The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human readable, Python based language. They can be seamlessly scaled to server, cluster, grid and cloud environments, without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.

      rule targets:
          input:
              "plots/myplot.pdf"
      
      rule transform:
          input:
              "raw/{dataset}.csv"
          output:
              "transformed/{dataset}.csv"
          singularity:
              "docker://somecontainer:v1.0"
          shell:
              "somecommand {input} {output}"
      
      rule aggregate_and_plot:
          input:
              expand("transformed/{dataset}.csv", dataset=[1, 2])
          output:
              "plots/myplot.pdf"
          conda:
              "envs/matplotlib.yaml"
          script:
              "scripts/plot.py"
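To make the example above easier to follow, here is a rough sketch of the two ideas it relies on: `expand(...)` turns a pattern into a list of concrete file names, and a requested file name is matched against an output pattern to recover the `{dataset}` wildcard. This is a plain-Python re-implementation for illustration only, not Snakemake's own code; `expand_pattern` and `match_wildcards` are hypothetical names, and the matcher assumes each wildcard appears only once in a pattern.

```python
import itertools
import re


def expand_pattern(pattern, **wildcards):
    """Fill a pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = itertools.product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


def match_wildcards(pattern, path):
    """Return the wildcard values that make `pattern` equal `path`, or None."""
    # Split "transformed/{dataset}.csv" into literal text and wildcard names;
    # odd positions in the result are the wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None


if __name__ == "__main__":
    # Inputs requested by the aggregate_and_plot rule.
    targets = expand_pattern("transformed/{dataset}.csv", dataset=[1, 2])
    for target in targets:
        # Which raw file does the transform rule need to produce this target?
        wildcards = match_wildcards("transformed/{dataset}.csv", target)
        print(target, "<-", "raw/{dataset}.csv".format(**wildcards))
```

Read this way, asking for plots/myplot.pdf pulls in transformed/1.csv and transformed/2.csv, which in turn pull in raw/1.csv and raw/2.csv through the transform rule.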
      

      Running on Kubernetes (k8s) is also supported:

      snakemake --kubernetes --use-conda --default-remote-provider $REMOTE --default-remote-prefix $PREFIX
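If the same invocation needs to be launched from a script, the argument list can be assembled in Python. This sketch only builds and prints the command; the flags are copied from the line above, and newer Snakemake releases have reorganized these options, so check `snakemake --help` for the installed version. The function name `kubernetes_command` and the fallback values "GS" and "my-bucket" are placeholders for illustration.

```python
import os
import shlex


def kubernetes_command(remote, prefix):
    """Build the argument list for the Kubernetes invocation quoted above."""
    return [
        "snakemake",
        "--kubernetes",
        "--use-conda",
        "--default-remote-provider", remote,
        "--default-remote-prefix", prefix,
    ]


if __name__ == "__main__":
    # REMOTE and PREFIX are read from the environment, as in the shell one-liner;
    # the fallbacks are placeholder values, not recommendations.
    argv = kubernetes_command(
        os.environ.get("REMOTE", "GS"),
        os.environ.get("PREFIX", "my-bucket"),
    )
    print(shlex.join(argv))
```

To actually run it, the list could be passed to `subprocess.run(argv, check=True)`.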
      

      https://snakemake.readthedocs.io/en/stable/

      • A
        anneng

        https://docs.nersc.gov/jobs/workflow/snakemake/
        Strengths of Snakemake
        Suitable for complex jobs and dependency structures, including wildcards
        Built-in methods to visualize task graph
        Can run tasks in parallel
        Built-in methods to keep track of finished and pending tasks
        Easy on SLURM (when Cluster execution is not used)
        Disadvantages of Snakemake
        Snakemake is complex and requires time to learn
        Works in two modes: single node or cluster mode, although cluster mode is not a good fit for NERSC. We strongly suggest that Snakemake be used for single node jobs. Users who need multinode jobs should use another tool such as Parsl, Taskfarmer, or Fireworks.
        Cluster execution not suitable for NERSC

        Astute readers of the Snakemake docs will find that Snakemake has a cluster execution capability. However, this means that Snakemake will treat each rule as a separate job and submit many requests to Slurm. We don't recommend this for Snakemake users at NERSC. 1) Submitting each rule as a separate job means that a Snakemake workflow may take a long time to complete and 2) submitting many jobs to Slurm will degrade scheduler performance for everyone. One of the main advantages of workflow tools is that they can often work independently of a job scheduler, so we strongly encourage single node Snakemake jobs that will run without burdening Slurm.
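The idea of a workflow tool working independently of the job scheduler can be illustrated with a toy example: inside a single node allocation, tasks are started as soon as their dependencies have finished, and nothing is submitted to Slurm per rule. This is only a sketch of the principle using the Python standard library (`graphlib` needs Python 3.9 or later), not how Snakemake is implemented; `run_dag` and the task names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(dependencies, run_task, max_workers=4):
    """Run tasks in dependency order, in parallel where the graph allows it.

    `dependencies` maps each task name to the set of tasks it depends on.
    Returns the task names in the order they finished.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    finished = []
    running = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Start every task whose dependencies are already done.
            for task in sorter.get_ready():
                running[pool.submit(run_task, task)] = task
            # Wait for at least one running task, then unlock its dependents.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any exception from the task
                task = running.pop(future)
                sorter.done(task)
                finished.append(task)
    return finished


if __name__ == "__main__":
    # Mirrors the example workflow: two transforms, then one aggregate step.
    deps = {
        "transform_1": set(),
        "transform_2": set(),
        "aggregate_and_plot": {"transform_1", "transform_2"},
    }
    run_dag(deps, lambda task: print("running", task))
```

In this toy graph the two transform tasks can run at the same time, and the aggregate step starts only after both have finished.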

        • A
          anneng

          https://elixir-workflow-workshop.github.io/2021/slide_decks/ELIXIR_WorkWork2021_WESkit.pdf
          https://gitlab.com/one-touch-pipeline/weskit/api

          weskit
          A GA4GH compliant Workflow-Execution-Service (WES) for Nextflow and Snakemake.

          • A
            anneng

            https://snakemake.readthedocs.io/en/stable/executing/cloud.html
            Snakemake supports TES (the GA4GH Task Execution Service).
