暗能星系


    Toil Verification Notes

    Bioinformatics Analysis
    • anneng (last edited by anneng)

      Goal: verify that a Toil + CWL + k8s environment runs end to end, with Toil in server mode.
      1. Environment setup
      1.1 Install k8s (minikube)

      storage.googleapis.com is not reachable directly, so download the binary through a browser proxy:
      curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
      sudo install minikube-linux-amd64 /usr/local/bin/minikube
      Note: use the cn image mirror. The newest Kubernetes version had download problems, so pick a slightly older one:
      minikube start --image-mirror-country='cn' --kubernetes-version=v1.23.8 --extra-config=kubelet.housekeeping-interval=10s
      --extra-config=kubelet.housekeeping-interval=10s is needed to make metrics-server usable.
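If the cluster setup is scripted, the start command above can be assembled in Python. A minimal sketch: `minikube_start_argv` is a hypothetical helper that only builds the argument list from the flags used above; it starts the cluster only when the script is run with `--run`.

```python
import shlex
import subprocess
import sys


def minikube_start_argv(k8s_version="v1.23.8", mirror_country="cn",
                        housekeeping_interval="10s"):
    """Build the `minikube start` argument list used above (hypothetical helper)."""
    return [
        "minikube", "start",
        f"--image-mirror-country={mirror_country}",
        f"--kubernetes-version={k8s_version}",
        # Shorter kubelet housekeeping interval, needed here for metrics-server.
        f"--extra-config=kubelet.housekeeping-interval={housekeeping_interval}",
    ]


if __name__ == "__main__":
    argv = minikube_start_argv()
    print(shlex.join(argv))
    if "--run" in sys.argv:
        subprocess.run(argv, check=True)  # actually start the cluster
```

Keeping the version and mirror as parameters makes it easy to retry with another Kubernetes version if the download fails again.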
      

      1.2 Basic minikube usage

      Check status:
      minikube status
      List all nodes (machines) in the cluster:
      
      minikube kubectl -- get nodes
      List all namespaces in the cluster:
      
      minikube kubectl -- get namespaces
      List all pods in the cluster:
      
      minikube kubectl -- get pods -A
      SSH into the node:
      
      minikube ssh
      Run a command on the node, e.g. docker info:
      
      minikube ssh -- docker info
      Delete the cluster and its files under ~/.minikube:
      
      minikube delete
      Stop the cluster:
      
      minikube stop
      Tear down the cluster (stop, then delete):
      
      minikube stop && minikube delete
      
      Open the dashboard:
      minikube dashboard
      This prints an error:
      libva error: /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so init failed
      Reinstalling the driver did not fix it:
      sudo apt-get install --reinstall intel-media-va-driver:amd64
      Ignored for now.
      

      Run a service:

        minikube kubectl -- create deployment hello-minikube --image=kicbase/echo-server:1.0
        minikube kubectl -- expose deployment hello-minikube --type=NodePort --port=8080
        minikube service hello-minikube
      

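      Toil needs metrics-server (reference: https://www.mls-tech.info/microservice/k8s/minikube-use-metrics-server/).
      minikube addons enable metrics-server  # this always tries to pull the image from the upstream k8s registry
      Workaround: pull the image from the Aliyun mirror and retag it: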
      sudo docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.2
      sudo docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.2 registry.k8s.io/metrics-server/metrics-server:v0.6.2

      Enable metrics-server, pointing it at the domestic (Aliyun) mirror:
      minikube addons enable metrics-server --images="MetricsServer=metrics-server:v0.6.2" --registries="MetricsServer=registry.cn-hangzhou.aliyuncs.com/google_containers"
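The pull-and-retag step above follows a fixed pattern: take the image's last path component, pull it from the mirror namespace, and tag it back under its upstream name. A sketch that generates both commands; `mirror_commands` is a hypothetical helper, and it assumes the mirror flattens the upstream path to its last component, as the Aliyun `google_containers` namespace does for this image.

```python
from typing import List


def mirror_commands(upstream: str, mirror_prefix: str) -> List[str]:
    """Return docker pull/tag commands that fetch `upstream` from a mirror namespace."""
    # "registry.k8s.io/metrics-server/metrics-server:v0.6.2" -> "metrics-server:v0.6.2"
    name_and_tag = upstream.rsplit("/", 1)[-1]
    mirrored = f"{mirror_prefix}/{name_and_tag}"
    return [
        f"sudo docker pull {mirrored}",
        # Retag so the image is found under its upstream name.
        f"sudo docker tag {mirrored} {upstream}",
    ]


if __name__ == "__main__":
    for cmd in mirror_commands(
        "registry.k8s.io/metrics-server/metrics-server:v0.6.2",
        "registry.cn-hangzhou.aliyuncs.com/google_containers",
    ):
        print(cmd)
```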

      2. Install Toil

      sudo pip install virtualenv
      virtualenv ~/venv
      source ~/venv/bin/activate
      pip install toil
      pip install 'toil[cwl,wdl,kubernetes,server]'   # install the optional extras (quoted so shells like zsh do not expand the brackets)
      

      Toil architecture:
      [Figure: Toil architecture diagram]
      the leader:
      The leader is responsible for deciding which jobs should be run. To do this it traverses the job graph. Currently this is a single-threaded process, but we take aggressive steps to prevent it from becoming a bottleneck.

      There are two main ways to run Toil workflows on Kubernetes. You can either run the Toil leader on a machine outside the cluster, with jobs submitted to and run on the cluster, or you can submit the Toil leader itself as a job and have it run inside the cluster.

      the job-store:
      Handles all files shared between the components. Files in the job-store are the means by which the state of the workflow is maintained. Each job is backed by a file in the job store, and atomic updates to this state are used to ensure the workflow can always be resumed upon failure. The job-store can also store all user files, allowing them to be shared between jobs. The job-store is defined by the AbstractJobStore class. Multiple implementations of this class allow Toil to support different back-end file stores, e.g.: S3, network file systems, Google file store, etc.

      workers:
      The workers are temporary processes responsible for running jobs, one at a time per worker. Each worker process is invoked with a job argument that it is responsible for running. The worker monitors this job and reports back success or failure to the leader by editing the job's state in the job-store. If the job defines successor jobs, the worker may choose to run them immediately.

      the batch-system:
      Responsible for scheduling the jobs given to it by the leader, creating a worker command for each job. The batch-system is defined by the AbstractBatchSystem class. Toil uses multiple existing batch systems to schedule jobs, including Apache Mesos, GridEngine and a multi-process single node implementation that allows workflows to be run without any of these frameworks. Toil can therefore fairly easily be made to run a workflow using an existing cluster.
      the node provisioner:
      Creates worker nodes in which the batch system schedules workers. It is defined by the AbstractProvisioner class.
      the statistics and logging monitor:
      Monitors logging and statistics produced by the workers and reports them. Uses the job-store to gather this information.
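To make the division of labor concrete, here is a toy, single-process model of the leader / job-store / worker loop described above. It is not Toil's implementation or API: the `FileJobStore` class, the JSON state format, and the job names are invented for illustration. It shows two ideas from the text: each job is backed by a file in the job store that is updated atomically (write a temp file, then `os.replace`), and the leader traverses the job graph, only running a job once all its predecessors are done.

```python
import json
import os
import tempfile


class FileJobStore:
    """Toy job store: one JSON state file per job (not Toil's on-disk format)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, job_id):
        return os.path.join(self.root, f"{job_id}.json")

    def write(self, job_id, state):
        # Atomic update: write a temp file in the same directory, then rename it
        # over the old state, so readers never see a half-written file.
        fd, tmp = tempfile.mkstemp(dir=self.root, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self._path(job_id))

    def read(self, job_id):
        with open(self._path(job_id)) as f:
            return json.load(f)


def run_worker(store, job_id, funcs):
    """Toy worker: run one job and record its result in the job store."""
    state = store.read(job_id)
    state["result"] = funcs[job_id]()
    state["status"] = "done"
    store.write(job_id, state)


def run_leader(store, graph, funcs):
    """Toy leader: walk the job graph (job -> successors) in dependency order."""
    preds = {job: set() for job in graph}
    for job, successors in graph.items():
        for s in successors:
            preds[s].add(job)
    for job in graph:
        store.write(job, {"status": "pending"})
    order, ready = [], [job for job, p in preds.items() if not p]
    while ready:
        job = ready.pop(0)
        run_worker(store, job, funcs)  # a real batch system would dispatch this
        order.append(job)
        for s in graph[job]:
            preds[s].discard(job)
            if not preds[s]:
                ready.append(s)
    return order


if __name__ == "__main__":
    graph = {"root": ["a", "b"], "a": ["final"], "b": ["final"], "final": []}
    funcs = {job: (lambda job=job: f"ran {job}") for job in graph}
    with tempfile.TemporaryDirectory() as d:
        store = FileJobStore(d)
        print(run_leader(store, graph, funcs))
        print(store.read("final"))
```

In real Toil the batch system runs workers in separate processes or pods, and the leader only learns about progress through the job store; the toy collapses that into a direct function call.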

      3. Test cases
      3.1 Toil-native hello world workflow

      # helloWorld.py: native Toil workflows are written as plain Python scripts
      from toil.common import Toil
      from toil.job import Job
      def helloWorld(message, memory="1G", cores=1, disk="1G"):
          return f"Hello, world!, here's a message: {message}"
      if __name__ == "__main__":
          parser = Job.Runner.getDefaultArgumentParser()
          options = parser.parse_args()
          options.clean = "always"
          with Toil(options) as toil:
              output = toil.start(Job.wrapFn(helloWorld, "You did it!"))
          print(output)
      
      (venv) $ python helloWorld.py file:my-job-store
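The positional argument (`file:my-job-store` here, `aws:us-west-2:demouser-toil-test-jobstore` in the k8s examples) is the job store locator: the prefix selects the job store implementation and the rest says where it lives. A small sketch of how the locators used in this thread split up; `parse_locator` is a hypothetical helper for illustration, not Toil's own parser, and treating a bare path as a file job store simply matches the later `--jobStore=/cephfs_data/...` command.

```python
from typing import NamedTuple


class Locator(NamedTuple):
    kind: str    # "file" or "aws"
    region: str  # AWS region; empty for file job stores
    name: str    # directory path (file) or job store name (aws)


def parse_locator(locator: str) -> Locator:
    """Split a job store locator (illustrative helper, not Toil's own parser)."""
    if locator.startswith("aws:"):
        # aws:<region>:<name>, e.g. aws:us-west-2:demouser-toil-test-jobstore
        _, region, name = locator.split(":", 2)
        return Locator("aws", region, name)
    if locator.startswith("file:"):
        return Locator("file", "", locator[len("file:"):])
    # Treat a bare path like /cephfs_data/toil/... as a file job store.
    return Locator("file", "", locator)


if __name__ == "__main__":
    for loc in ["file:my-job-store", "aws:us-west-2:demouser-toil-test-jobstore"]:
        print(parse_locator(loc))
```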
      

      3.2 Test a CWL workflow (example.cwl)

      cwlVersion: v1.0
      class: CommandLineTool
      baseCommand: echo
      stdout: output.txt
      inputs:
        message:
          type: string
          inputBinding:
            position: 1
      outputs:
        output:
          type: stdout
      
      Run the CWL file with its inputs in example-job.yaml, then check the output:
      toil-cwl-runner example.cwl example-job.yaml
      cat output.txt
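The contents of example-job.yaml are not shown above. Assuming it only needs to supply the `message` string declared under `inputs`, it can be generated with the standard library, since a flat JSON object is also valid YAML. A sketch; the helper names and the message text are placeholders.

```python
import json
from pathlib import Path


def job_file_text(message="Hello world!"):
    """CWL input object for example.cwl: maps the input id `message` to a value."""
    # A flat JSON object is also valid YAML, so no YAML library is needed.
    return json.dumps({"message": message}) + "\n"


def write_job_file(path="example-job.yaml", message="Hello world!"):
    """Write the input object next to example.cwl and return its path."""
    p = Path(path)
    p.write_text(job_file_text(message))
    return p
```

Calling `write_job_file()` in the directory holding example.cwl produces a job file for the `toil-cwl-runner` command above.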
      

      3.3 Test WDL (save the following as wdl-helloworld.wdl)

          workflow write_simple_file {
            call write_file
          }
          task write_file {
            String message
            command { echo ${message} > wdl-helloworld-output.txt }
            output { File test = "wdl-helloworld-output.txt" }
          }
      
      and save this as wdl-helloworld.json:
      
          {
            "write_simple_file.write_file.message": "Hello world!"
          }
      
      toil-wdl-runner wdl-helloworld.wdl wdl-helloworld.json
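The key in wdl-helloworld.json is fully qualified: workflow name, call name, and input name joined by dots (the call name is the task name here because the call has no alias). A sketch that builds the inputs object from those parts; `wdl_inputs` is a hypothetical helper.

```python
import json


def wdl_inputs(workflow, call, values):
    """Qualify each input name as <workflow>.<call>.<input> for the inputs JSON."""
    return {f"{workflow}.{call}.{name}": value for name, value in values.items()}


if __name__ == "__main__":
    inputs = wdl_inputs("write_simple_file", "write_file", {"message": "Hello world!"})
    print(json.dumps(inputs, indent=2))  # redirect into wdl-helloworld.json
```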
      

      3.4 Test CWL on k8s
      😠 When connecting to k8s, Toil only accepts AWS (S3) storage as the job store.

      • anneng (edited)

        Further minikube reference:
        https://www.zhaowenyu.com/minikube-doc/ops/minikube.html

        • anneng (last edited by anneng)

          Toil on k8s, option 1: run the Toil leader inside k8s (leader.yaml)

          apiVersion: batch/v1
          kind: Job
          metadata:
            # It is good practice to include your username in your job name.
            # Also specify it in TOIL_KUBERNETES_OWNER
            name: demo-user-toil-test
          spec:
           # Do not try and rerun the leader job if it fails
           backoffLimit: 0
           template:
             spec:
               # Do not restart the pod when the job fails, but keep it around so the
               # log can be retrieved
               restartPolicy: Never
               volumes:
               - name: aws-credentials-vol
                 secret:
                   # Make sure the AWS credentials are available as a volume.
                   # This should match TOIL_AWS_SECRET_NAME
                   secretName: aws-credentials
               # You may need to replace this with a different service account name as
               # appropriate for your cluster.
               serviceAccountName: default
               containers:
               - name: main
                 image: quay.io/ucsc_cgl/toil:5.5.0
                 env:
                 # Specify your username for inclusion in job names
                 - name: TOIL_KUBERNETES_OWNER
                   value: demo-user
                 # Specify where to find the AWS credentials to access the job store with
                 - name: TOIL_AWS_SECRET_NAME
                   value: aws-credentials
                 # Specify where per-host caches should be stored, on the Kubernetes hosts.
                 # Needs to be set for Toil's caching to be efficient.
                 - name: TOIL_KUBERNETES_HOST_PATH
                   value: /data/scratch
                 volumeMounts:
                 # Mount the AWS credentials volume
                 - mountPath: /root/.aws
                   name: aws-credentials-vol
                 resources:
                   # Make sure to set these resource limits to values large enough
                   # to accommodate the work your workflow does in the leader
                   # process, but small enough to fit on your cluster.
                   #
                   # Since no request values are specified, the limits are also used
                   # for the requests.
                   limits:
                     cpu: 2
                     memory: "4Gi"
                     ephemeral-storage: "10Gi"
                 command:
                 - /bin/bash
                 - -c
                 - |
                   # This Bash script will set up Toil and the workflow to run, and run them.
                   set -e
                   # We make sure to create a work directory; Toil can't hot-deploy a
                   # script from the root of the filesystem, which is where we start.
                   mkdir /tmp/work
                   cd /tmp/work
                   # We make a virtual environment to allow workflow dependencies to be
                   # hot-deployed.
                   #
                   # We don't really make use of it in this example, but for workflows
                   # that depend on PyPI packages we will need this.
                   #
                   # We use --system-site-packages so that the Toil installed in the
                   # appliance image is still available.
                   virtualenv --python python3 --system-site-packages venv
                   . venv/bin/activate
                   # Now we install the workflow. Here we're using a demo workflow
                   # script from Toil itself.
                   wget https://raw.githubusercontent.com/DataBiosphere/toil/releases/4.1.0/src/toil/test/docs/scripts/tutorial_helloworld.py
                   # Now we run the workflow. We make sure to use the Kubernetes batch
                   # system and an AWS job store, and we set some generally useful
                   # logging options. We also make sure to enable caching.
                   python3 tutorial_helloworld.py \
                       aws:us-west-2:demouser-toil-test-jobstore \
                       --batchSystem kubernetes \
                       --realTimeLogging \
                       --logInfo
          

          kubectl apply -f leader.yaml

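          Note:
          The leader pod will need your workflow script, its other dependencies, and Toil all installed. An easy way to get Toil installed is to start with the Toil appliance image for the version of Toil you want to use. In this example, we use quay.io/ucsc_cgl/toil:5.5.0.
          In other words, in this mode the workflow script and Toil must both be available inside the leader container: baked into the image, or, as in leader.yaml above, Toil from the appliance image and the script fetched with wget at startup.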

          • anneng (last edited by anneng)

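            Toil on k8s, option 2: run the Toil leader outside k8s
            This is mainly for development and testing, and it currently requires the local machine to be able to reach AWS.
            Real-time logging will not work unless the local machine is reachable from the public Internet: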
            Real time logging will not work unless your local machine is able to listen for incoming UDP packets on arbitrary ports on the address it uses to contact the IPv4 Internet; Toil does no NAT traversal or detection.

            $ export TOIL_KUBERNETES_OWNER=demo-user  # This defaults to your local username if not set
            $ export TOIL_AWS_SECRET_NAME=aws-credentials
            $ export TOIL_KUBERNETES_HOST_PATH=/data/scratch
            $ virtualenv --python python3 --system-site-packages venv
            $ . venv/bin/activate
            $ wget https://raw.githubusercontent.com/DataBiosphere/toil/releases/4.1.0/src/toil/test/docs/scripts/tutorial_helloworld.py
            $ python3 tutorial_helloworld.py \
                  aws:us-west-2:demouser-toil-test-jobstore \
                  --batchSystem kubernetes \
                  --realTimeLogging \
                  --logInfo
            

            😒 ModuleNotFoundError: No module named 'boto'
            Fix:
            pip install boto botocore boto3 mypy_boto3_s3
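Before a long run it can help to confirm the virtualenv actually has these modules. A stdlib-only sketch; `missing_modules` is a hypothetical helper and the module list simply mirrors the pip command above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["boto", "botocore", "boto3", "mypy_boto3_s3"])
    if missing:
        print("pip install " + " ".join(missing))
    else:
        print("all AWS modules present")
```

`find_spec` only locates the module without importing it, so the check is cheap and has no side effects.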

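            Tried running the jobs on k8s with a file job store: the jobs are submitted to minikube but fail to start, because by default Toil mounts the AWS credentials secret as a volume: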
            python3 tutorial_helloworld.py file:job-store --batchSystem kubernetes --realTimeLogging --logInfo
            MountVolume.SetUp failed for volume "s3-credentials" : secret "aws-credentials" not found
            Fix: do not set TOIL_AWS_SECRET_NAME (drop the export TOIL_AWS_SECRET_NAME=aws-credentials line).

            Working setup:

            export TOIL_KUBERNETES_HOST_PATH=/home/jynlix/Downloads/src/toil/data
            export TOIL_WORKDIR=/home/jynlix/Downloads/src/toil/data
            minikube mount /home/jynlix/Downloads/src/toil/data:/home/jynlix/Downloads/src/toil/data
            python3 -m pdb tutorial_helloworld.py file:job-store  --batchSystem kubernetes --realTimeLogging --logInfo
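What made the file job store work was using a single path for everything: the host directory is mounted into the minikube node at the same path, TOIL_KUBERNETES_HOST_PATH and TOIL_WORKDIR both point to it, and TOIL_AWS_SECRET_NAME stays unset. A sketch that derives the environment and the `minikube mount` argument from one directory; `hostpath_setup` is a hypothetical helper and the path is only the example from above.

```python
import os
import shlex


def hostpath_setup(shared_dir, base_env=None):
    """Return (env, mount_argv) for a file job store shared via minikube mount."""
    shared_dir = os.path.abspath(shared_dir)
    env = dict(os.environ if base_env is None else base_env)
    # With a file job store there is no AWS secret; if this variable is set but the
    # secret does not exist, job pods fail with: secret "aws-credentials" not found.
    env.pop("TOIL_AWS_SECRET_NAME", None)
    # Host-side caches and Toil's work directory live in the same shared directory.
    env["TOIL_KUBERNETES_HOST_PATH"] = shared_dir
    env["TOIL_WORKDIR"] = shared_dir
    # Mount at the identical path so host and pods see the same file names.
    mount_argv = ["minikube", "mount", f"{shared_dir}:{shared_dir}"]
    return env, mount_argv


if __name__ == "__main__":
    env, mount_argv = hostpath_setup("/home/jynlix/Downloads/src/toil/data")
    # `minikube mount` stays in the foreground, so run it in a separate terminal.
    print(shlex.join(mount_argv))
```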
            
            • anneng (edited)

              A case study:
              Rapid and efficient analysis of 20,000 RNA-seq samples with Toil
              https://www.researchgate.net/publication/345904527_Rapid_and_efficient_analysis_of_20000_RNA-seq_samples_with_Toil

              • anneng (edited)

                AWS support for Toil:
                https://aws.github.io/amazon-genomics-cli/docs/workflow-engines/toil/

                • anneng (last edited by anneng)

                  Toil server mode

                  docker run -d --name wes-rabbitmq -p 5672:5672 rabbitmq:3.9.5
                  celery -A toil.server.celery_app worker --loglevel=INFO
                  toil server

                  curl --location --request POST 'http://localhost:8000/ga4gh/wes/v1/runs' --user test:test --form 'workflow_url="example.cwl"' --form 'workflow_type="cwl"' --form 'workflow_type_version="v1.0"' --form 'workflow_params={"message": "Hello world!"}' --form 'workflow_attachment=@"./example.cwl"'

                  =========== metrics-server is required ===========
                  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
                  To uninstall metrics-server:
                  kubectl delete -f components.yaml

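                  Per https://stackoverflow.com/questions/71843068/metrics-server-is-currently-unable-to-handle-the-request, edit the metrics-server Deployment in components.yaml and add --kubelet-insecure-tls to the container args: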

                  labels:
                    k8s-app: metrics-server
                  spec:
                    containers:
                    - args:
                      - --cert-dir=/tmp
                      - --secure-port=443
                      - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
                      - --kubelet-use-node-status-port
                      - --metric-resolution=15s
                      - --kubelet-insecure-tls  # add this line
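Editing the downloaded manifest by hand works; to script it, a plain-text patch is enough. A sketch that assumes components.yaml lists `- --metric-resolution=15s` among the container args, as in the snippet above, and inserts the flag right after it with the same indentation. It does not parse YAML, and `add_insecure_tls` / `patch_file` are hypothetical helpers.

```python
from pathlib import Path

FLAG = "- --kubelet-insecure-tls"
ANCHOR = "- --metric-resolution=15s"


def add_insecure_tls(manifest: str) -> str:
    """Insert the --kubelet-insecure-tls arg after --metric-resolution (idempotent)."""
    if FLAG in manifest:
        return manifest  # already patched
    out = []
    for line in manifest.splitlines(keepends=True):
        if line.strip() == ANCHOR:
            if not line.endswith("\n"):
                line += "\n"  # keep the new arg on its own line
            indent = line[: len(line) - len(line.lstrip())]
            out.append(line)
            out.append(f"{indent}{FLAG}\n")
        else:
            out.append(line)
    return "".join(out)


def patch_file(path="components.yaml"):
    """Patch the downloaded manifest in place before `kubectl apply -f`."""
    p = Path(path)
    p.write_text(add_insecure_tls(p.read_text()))
```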
                  

                  kubectl apply -f components.yaml
                  Otherwise the metrics API never becomes available:
                  kubectl get apiservice v1beta1.metrics.k8s.io
                  v1beta1.metrics.k8s.io kube-system/metrics-server False (MissingEndpoints) 44m
                  Test:
                  kubectl top nodes
                  NAME    CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
                  node1   1076m        13%    17670Mi         75%
                  node2   1295m        16%    10048Mi         65%
                  node3   1168m        14%    14871Mi         63%
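`kubectl top nodes` prints a whitespace-separated table, which is easy to consume from a script, e.g. to check memory headroom before submitting a workflow. A sketch that assumes the column header shown above and values without spaces; the helper names are hypothetical. On minikube, `["minikube", "kubectl", "--", "top", "nodes"]` can replace the kubectl argv.

```python
import subprocess


def parse_top_nodes(text):
    """Parse `kubectl top nodes` output into one dict per node, keyed by column."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def busiest_by_memory(nodes):
    """Return the name of the node with the highest MEMORY% value."""
    return max(nodes, key=lambda n: int(n["MEMORY%"].rstrip("%")))["NAME"]


def top_nodes():
    """Run kubectl and parse its output (needs a working metrics-server)."""
    out = subprocess.run(["kubectl", "top", "nodes"], capture_output=True,
                         text=True, check=True).stdout
    return parse_top_nodes(out)
```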

                  ============ AppArmor may interfere, so stop and disable the service (for production, configure it as in https://github.com/adamnovak/gi-kubernetes-autoscaling-config/blob/e1350ac9ad17d94b5073b20db3c75620957926e3/kubenode.ubuntu.cloud-config.yaml#L27-L67) ============
                  sudo systemctl stop apparmor.service
                  sudo systemctl disable apparmor.service

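                  When Toil server launches toil-cwl-runner, it apparently does not pass the environment variables through, and the run fails. Running the same command directly, with the variables exported, succeeds: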
                  export TOIL_WORKDIR=/cephfs_data/toil
                  export TOIL_KUBERNETES_HOST_PATH=/cephfs_data/toil
                  toil-cwl-runner --writeMessages=/cephfs_data/toil/run-6aef556521e1460e94b0557ce848f49e/bus_messages --batchSystem=kubernetes --workDir=/cephfs_data/toil --clean=always --outdir=/cephfs_data/toil/run-6aef556521e1460e94b0557ce848f49e/outputs --jobStore=/cephfs_data/toil/run-6aef556521e1460e94b0557ce848f49e/toil_job_store /cephfs_data/toil/run-6aef556521e1460e94b0557ce848f49e/execution/example.cwl /cephfs_data/toil/run-6aef556521e1460e94b0557ce848f49e/execution/wes_inputs.json

                  cat /cephfs_data/toil/run-6aef556521e1460e94b0557ce848f49e/outputs/output.txt
                  Hello world!
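The working command follows a fixed layout under each run directory: execution/ holds the workflow and wes_inputs.json, outputs/ receives the results, toil_job_store/ is the job store, and bus_messages is the message file. A sketch that rebuilds the command for any run directory and passes the two variables explicitly, which is what the server-launched run appeared to be missing. The helper names are hypothetical, and the layout is read off the command above rather than from Toil's source.

```python
import os
import subprocess


def toil_cwl_command(run_dir, work_dir, cwl_name="example.cwl"):
    """Rebuild the toil-cwl-runner call used above for one WES run directory."""
    return [
        "toil-cwl-runner",
        f"--writeMessages={os.path.join(run_dir, 'bus_messages')}",
        "--batchSystem=kubernetes",
        f"--workDir={work_dir}",
        "--clean=always",
        f"--outdir={os.path.join(run_dir, 'outputs')}",
        f"--jobStore={os.path.join(run_dir, 'toil_job_store')}",
        os.path.join(run_dir, "execution", cwl_name),
        os.path.join(run_dir, "execution", "wes_inputs.json"),
    ]


def run_with_env(run_dir, work_dir):
    """Run the command with TOIL_WORKDIR / TOIL_KUBERNETES_HOST_PATH set explicitly."""
    env = dict(os.environ,
               TOIL_WORKDIR=work_dir,
               TOIL_KUBERNETES_HOST_PATH=work_dir)
    return subprocess.run(toil_cwl_command(run_dir, work_dir), env=env, check=True)
```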

                  For production, still to decide whether to use hostPath or a PV.
