Snakemake
-
1. Installation
Preparation:
sudo apt update
sudo apt install python3-venv
python3 -m venv .
source bin/activate
Preparation (build dependencies):
sudo apt-get install build-essential
python setup.py bdist_wheel
sudo apt-get install python3-dev
Install:
pip3 install --upgrade snakemake
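After the install step it can help to confirm that the virtual environment is active and that the package is visible to the interpreter. The following is a minimal sketch using only the Python standard library; the helper names `in_virtualenv` and `installed_version` are made up for illustration and are not part of Snakemake.

```python
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv created by `python3 -m venv`."""
    return sys.prefix != sys.base_prefix


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("snakemake version:", installed_version("snakemake") or "not installed")
```

If the second line reports "not installed", the pip3 command most likely ran outside the activated environment.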
2. Introduction
Reference slides: snakemake.pptx
The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.

    rule targets:
        input:
            "plots/myplot.pdf"

    rule transform:
        input:
            "raw/{dataset}.csv"
        output:
            "transformed/{dataset}.csv"
        singularity:
            "docker://somecontainer:v1.0"
        shell:
            "somecommand {input} {output}"

    rule aggregate_and_plot:
        input:
            expand("transformed/{dataset}.csv", dataset=[1, 2])
        output:
            "plots/myplot.pdf"
        conda:
            "envs/matplotlib.yaml"
        script:
            "scripts/plot.py"

Running on Kubernetes (k8s) is also supported:
snakemake --kubernetes --use-conda --default-remote-provider $REMOTE --default-remote-prefix $PREFIX
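The example workflow relies on two ideas: `expand()` turns a pattern plus a list of values into concrete file names, and a `{dataset}` wildcard in an output pattern is inferred from the file that is requested. The plain-Python sketch below imitates both ideas so they can be tried without Snakemake installed. It is an illustration only, not Snakemake's implementation, and `match_wildcards` is a hypothetical helper name.

```python
import re
from itertools import product


def expand(pattern: str, **wildcards) -> list:
    """Fill every combination of the given wildcard values into the pattern."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


def match_wildcards(pattern: str, path: str):
    """Return the wildcard values that make `pattern` equal `path`, or None.

    Assumes each wildcard name occurs only once in the pattern.
    """
    # "transformed/{dataset}.csv" splits into ["transformed/", "dataset", ".csv"];
    # odd positions are wildcard names, even positions are literal text.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    found = re.fullmatch(regex, path)
    return found.groupdict() if found else None


# Inputs of aggregate_and_plot in the example workflow.
inputs = expand("transformed/{dataset}.csv", dataset=[1, 2])

# For each requested file, infer the wildcard and derive the input of rule transform.
raw_files = [
    "raw/{dataset}.csv".format(**match_wildcards("transformed/{dataset}.csv", path))
    for path in inputs
]
```

Here `inputs` holds the two transformed files and `raw_files` the raw files they are built from, which is how one `transform` rule serves both datasets.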
https://docs.nersc.gov/jobs/workflow/snakemake/
Strengths of Snakemake
Suitable for complex jobs and dependency structures, including wildcards
Built-in methods to visualize task graph
Can run tasks in parallel
Built-in methods to keep track of finished and pending tasks
Easy on SLURM (when Cluster execution is not used)
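The task graph, parallel execution and finished/pending tracking in the list above all follow from treating the workflow as a directed acyclic graph of jobs. As a rough sketch of that idea (not Snakemake's own scheduler), the standard-library `graphlib` module (Python 3.9 or newer) can group the jobs of the example workflow into batches that could run in parallel. The job labels and the `parallel_batches` helper are made up for illustration.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs it depends on (derived from the example workflow).
dependencies = {
    "transform(dataset=1)": set(),
    "transform(dataset=2)": set(),
    "aggregate_and_plot": {"transform(dataset=1)", "transform(dataset=2)"},
    "targets": {"aggregate_and_plot"},
}


def parallel_batches(graph):
    """Group jobs into batches; jobs within one batch do not depend on each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # pending jobs whose dependencies are finished
        batches.append(ready)
        sorter.done(*ready)  # mark as finished so downstream jobs become ready
    return batches


batches = parallel_batches(dependencies)
```

The first batch contains both `transform` jobs, which is the part of this workflow that can run in parallel on a single node; the plot and the final target follow one after the other.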
Disadvantages of Snakemake
Snakemake is complex and requires time to learn
Works in two modes: single node or cluster mode, although cluster mode is not a good fit for NERSC. We strongly suggest that Snakemake be used for single node jobs. Users who need multinode jobs should use another tool such as Parsl, Taskfarmer, or Fireworks.
Cluster execution not suitable for NERSC
Astute readers of the Snakemake docs will find that Snakemake has a cluster execution capability. However, this means that Snakemake will treat each rule as a separate job and submit many requests to Slurm. We don't recommend this for Snakemake users at NERSC: 1) submitting each rule as a separate job means that a Snakemake workflow may take a long time to complete, and 2) submitting many jobs to Slurm will degrade scheduler performance for everyone. One of the main advantages of workflow tools is that they can often work independently of a job scheduler, so we strongly encourage single-node Snakemake jobs that will run without burdening Slurm.
-
https://elixir-workflow-workshop.github.io/2021/slide_decks/ELIXIR_WorkWork2021_WESkit.pdf
https://gitlab.com/one-touch-pipeline/weskit/api
weskit
A GA4GH-compliant Workflow Execution Service (WES) for Nextflow and Snakemake.