暗能星系


    GATK

    Bioinformatics analysis
    • anneng

      https://gatk.broadinstitute.org/hc/en-us/community/posts/360066050311-BQSR-bootstrapping-for-multiple-sample-dataset-with-no-known-variants-non-human-

      Thanks for the follow up question on this post so that we can address it!

      We don't have any current BQSR bootstrapping methods or recommendations for when there is no known sites file.

      If you don't have a known sites file, you can still use GATK. Just skip the BQSR step and use hard filtering instead of VQSR. It's more ideal to be able to use the BQSR and VQSR machine learning steps, but it's not possible if you don't have a known sites file.

      Hope this helps!

      Genevieve

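      GATK's BQSR step depends on known sites such as dbSNP, so it is mainly useful for well-studied model organisms such as human. For other species the step can be left out.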
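A minimal sketch of how a pipeline could make BQSR optional, as the reply above suggests. The function name `bam_for_calling` and the `.recal.table` suffix are made up for illustration; it assumes `gatk` (GATK4) is on the PATH. It prints the BAM that the variant caller should read next.

```shell
# Usage: bam_for_calling REF.fasta IN.bam OUT.bam [KNOWN_SITES.vcf.gz]
# Prints the path of the BAM to hand to the variant caller.
bam_for_calling() (
    ref=$1; in_bam=$2; out_bam=$3; known_sites=${4:-}
    if [ -z "$known_sites" ]; then
        # No known-sites resource (typical for non-model species):
        # skip BQSR and keep using the input BAM.
        echo "no known-sites file: skipping BQSR" >&2
        echo "$in_bam"
    else
        # Tool output goes to stderr so stdout carries only the BAM path.
        gatk BaseRecalibrator -R "$ref" -I "$in_bam" \
            --known-sites "$known_sites" -O "$out_bam.recal.table" >&2 &&
        gatk ApplyBQSR -R "$ref" -I "$in_bam" \
            --bqsr-recal-file "$out_bam.recal.table" -O "$out_bam" >&2 &&
        echo "$out_bam"
    fi
)
```

Called as `bam=$(bam_for_calling ref.fasta sample.bam sample.recal.bam)`, the later steps stay the same whether or not a known-sites file exists.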

      • anneng

        https://eriqande.github.io/eca-bioinf-handbook/

        • zhanglu

          GATK workflow notes

          Reference sequence preparation:

          1. Build the BWA index with bwa index

          Image: 11918067/genomes-in-the-cloud:2.4.2-1552931386
          Command: /usr/gitc/bwa index *.fa

          2. Create the .dict file with Picard

          Image: broadinstitute/picard:2.27.3
          Command: java -jar /usr/picard/picard.jar CreateSequenceDictionary R=Nitab-v4.5_genome_Chr_Edwards2017.fasta O=Nitab-v4.5_genome_Chr_Edwards2017.fasta.dict

          3. Create the .fai index with samtools

          Image: quay.io/biocontainers/samtools:1.15.1--h1170115_0
          Command: samtools faidx Nitab-v4.5_genome_Chr_Edwards2017.fasta

          4. Build the SnpEff reference database:

          Image: anneng01:8090/library/angs_snpeff:1.0.0
          Command: snp_build -name ${Name} -ann ${Gff} -fa ${Fa}
          Output path: result
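The first three steps above could be wrapped into one helper along these lines. This is only a sketch: `prepare_reference` and the `PICARD_JAR` variable are hypothetical names, it assumes `bwa`, `java` and `samtools` are all available in one environment (the post runs them from separate images), and it writes the dictionary as `<basename>.dict`, which as far as I know is the name GATK looks for by default. The custom `snp_build` step is left out.

```shell
# Usage: prepare_reference REF.fasta
# Builds the BWA index, the sequence dictionary and the .fai index,
# skipping any step whose output already exists.
prepare_reference() (
    fasta=$1
    dict="${fasta%.*}.dict"                      # ref.fasta -> ref.dict
    picard_jar=${PICARD_JAR:-/usr/picard/picard.jar}

    [ -e "$fasta.bwt" ] || bwa index "$fasta" || exit 1
    [ -e "$dict" ] ||
        java -jar "$picard_jar" CreateSequenceDictionary \
            R="$fasta" O="$dict" || exit 1
    [ -e "$fasta.fai" ] || samtools faidx "$fasta"
)
```

For the tobacco reference in this thread the call would be `prepare_reference Nitab-v4.5_genome_Chr_Edwards2017.fasta`.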
          

          Notes

          The VCF passed to SnpEff, and its index file, must be in compressed .gz form, for example: reference.vcf.gz
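One common way to get a compressed, indexed VCF is `bgzip` plus `tabix` from htslib; whether this particular SnpEff image expects a `.tbi` index is my assumption, not something stated in the post. A small sketch (the function name is made up):

```shell
# Usage: compress_and_index_vcf FILE.vcf
# Writes FILE.vcf.gz and FILE.vcf.gz.tbi, then prints the .gz path.
compress_and_index_vcf() (
    vcf=$1
    # tabix needs BGZF compression (bgzip); plain gzip output cannot be indexed.
    bgzip -c "$vcf" > "$vcf.gz" &&
    tabix -p vcf "$vcf.gz" &&
    echo "$vcf.gz"
)
```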

          WDL workflow

          gatk_merge.wdl

          Test data

          {
              "GATK.fastq_1":"/ceph_disk3/file_server/tmp/lanzhou/data/H06HDADXX130110.1.ATCACGAT.20k_reads_1.fastq",
              "GATK.fastq_2":"/ceph_disk3/file_server/tmp/lanzhou/data/H06HDADXX130110.1.ATCACGAT.20k_reads_2.fastq",
              "GATK.ref_fasta":"/ceph_disk2/data/hongyuan/mnt/data/public_data/tobacco_1656467890/Nitab-v4.5_genome_Chr_Edwards2017.fasta",
              "GATK.snp_ref":"/ceph_disk2/data/hongyuan/mnt/data/public_data/tobacco_1656467890/result"
          }
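The post does not say which engine runs gatk_merge.wdl. Assuming Cromwell, and assuming the JSON above is saved as a file (the name `inputs.json` below is a placeholder), a run could be started roughly like this:

```shell
# Usage: run_gatk_workflow CROMWELL.jar WORKFLOW.wdl INPUTS.json
run_gatk_workflow() (
    cromwell_jar=$1; wdl=$2; inputs=$3
    # Fail early with a clear message if either file is missing.
    [ -r "$wdl" ]    || { echo "cannot read WDL: $wdl" >&2; exit 1; }
    [ -r "$inputs" ] || { echo "cannot read inputs: $inputs" >&2; exit 1; }
    java -jar "$cromwell_jar" run "$wdl" --inputs "$inputs"
)
```

For example: `run_gatk_workflow cromwell.jar gatk_merge.wdl inputs.json`.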
          
          • anneng

            https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATK_Discovery_Tutorial-Worksheet-AUS2016.pdf

            A GATK guide that goes into a fair amount of detail.

            • anneng

              https://yulijia.net/slides/bioinfomatcis_for_medical_students/2019-07-31-A_beginners_guide_to_Call_SNPs_and_indels_Part_II.html#1

              • anneng

                https://wikis.univ-lille.fr/bilille/_media/ngs2019_dna_duplicates.pdf

                • anneng

                  https://paleomix.readthedocs.io/en/stable/other_tools.html#paleomix-rmdup-collapsed
                  Removing duplicates among collapsed (merged) reads

                  https://www.biostars.org/p/347514/
                  Remove duplicates first, then merge

                  • anneng

                    GATK_Discovery_Tutorial-Worksheet-AUS2016.pdf

                    • anneng

                      https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr12-5-Variant_calling_joint_genotyping.pdf
                      GATKwr12-5-Variant_calling_joint_genotyping.pdf
                      An explanation of GVCF

                      • anneng

                        https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants
                        VCF filtering

                        • anneng

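                          Added a new SNP filtering and annotation workflow, consisting of:
                          Filtering:

                             # GATK4 argument names; every continued line needs a trailing backslash.
                             # GATK4 expects a name for each filter expression; "lowGQ" is a placeholder.
                             gatk VariantFiltration \
                             -R reference.fasta \
                             -V input.vcf.gz \
                             -O output.vcf.gz \
                             -filter "QD < 4.0 || FS > 60.0 || MQ < 40.0" \
                             --filter-name "tobacco" \
                             -window 4 \
                             -G-filter "GQ < 20.0" \
                             -G-filter-name "lowGQ"

                          These filter parameters should be exposed so that they can be configured.
                          SnpEff: annotation
                          The VCF that comes out of GATK is the complete, unfiltered call set.
                          After filtering, also run SnpEff to check the results.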
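One way the thresholds could be exposed, sketched as a shell function whose defaults match the values in this post and can be overridden from the environment. The function name, the variable names (`QD_MIN`, `FS_MAX`, `MQ_MIN`, `GQ_MIN`, `FILTER_NAME`, `GT_FILTER_NAME`, `CLUSTER_WINDOW`) and the genotype filter name `lowGQ` are all invented for illustration; in the WDL these would more naturally become workflow inputs.

```shell
# Usage: [QD_MIN=..] [FS_MAX=..] [MQ_MIN=..] [GQ_MIN=..] \
#        filter_variants REF.fasta IN.vcf.gz OUT.vcf.gz
filter_variants() (
    ref=$1; in_vcf=$2; out_vcf=$3
    qd_min=${QD_MIN:-4.0}     # site filter: QD below this fails
    fs_max=${FS_MAX:-60.0}    # site filter: FS above this fails
    mq_min=${MQ_MIN:-40.0}    # site filter: MQ below this fails
    gq_min=${GQ_MIN:-20.0}    # genotype filter: GQ below this is flagged
    gatk VariantFiltration \
        -R "$ref" -V "$in_vcf" -O "$out_vcf" \
        --filter-expression "QD < $qd_min || FS > $fs_max || MQ < $mq_min" \
        --filter-name "${FILTER_NAME:-tobacco}" \
        --cluster-window-size "${CLUSTER_WINDOW:-4}" \
        --genotype-filter-expression "GQ < $gq_min" \
        --genotype-filter-name "${GT_FILTER_NAME:-lowGQ}"
)
```

For example, `QD_MIN=2.0 filter_variants reference.fasta input.vcf.gz output.vcf.gz` loosens only the QD cutoff and leaves the other thresholds at their defaults.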

                          • anneng

                            https://nbisweden.github.io/workshop-ngsintro/2005/lab_vc.html#5_Variant_calling_in_cohort

                            • anneng

                              https://www.clinbioinfosspa.es/files/pipelines/germline_workflow_diagram.pdf

                              • anneng

                                https://haplotypecaller1.rssing.com/chan-10646605/all_p46.html

                                1. Merging VCF files
                                  There are three main reasons why you might want to combine variants from different files into one, and the tool to use depends on what you are trying to achieve.

                                The most common case is when you have been parallelizing your variant calling analyses, e.g. running HaplotypeCaller per-chromosome, producing separate VCF files (or GVCF files) per-chromosome. For that case, you can use the Picard tool MergeVcfs to merge the files. See the relevant Tool Doc page for usage details.
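For this first case, a sketch of merging per-chromosome VCFs with MergeVcfs through the `gatk` wrapper. The function name and file names are placeholders.

```shell
# Usage: merge_per_chromosome_vcfs OUT.vcf.gz CHR1.vcf.gz [CHR2.vcf.gz ...]
merge_per_chromosome_vcfs() (
    out_vcf=$1; shift
    [ "$#" -gt 0 ] || { echo "no input VCFs given" >&2; exit 1; }
    # Turn "a b c" into "-I a -I b -I c": append the pairs, then drop
    # the original arguments from the front.
    n=$#
    for vcf in "$@"; do
        set -- "$@" -I "$vcf"
    done
    shift "$n"
    gatk MergeVcfs "$@" -O "$out_vcf"
)
```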

                                The second case is when you have been using HaplotypeCaller in -ERC GVCF or -ERC BP_RESOLUTION to call variants on a large cohort, producing many GVCF files. You then need to consolidate them before joint-calling variants with GenotypeGVCFs (for performance reasons). This can be done with either the CombineGVCFs or the GenomicsDBImport tool, both of which are specifically designed to handle GVCFs in this way. See the relevant Tool Doc pages for usage details and the Best Practices workflow documentation to learn more about the logic of this workflow.
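A rough sketch of this second case using CombineGVCFs (GenomicsDBImport would be the alternative for larger cohorts). The function name and the output naming scheme are my own choices; it assumes one BAM per sample and no interval scattering.

```shell
# Usage: joint_call REF.fasta OUT_PREFIX SAMPLE1.bam [SAMPLE2.bam ...]
# Produces OUT_PREFIX.g.vcf.gz (combined GVCF) and OUT_PREFIX.vcf.gz.
joint_call() (
    ref=$1; out_prefix=$2; shift 2
    [ "$#" -gt 0 ] || { echo "no BAM files given" >&2; exit 1; }
    n=$#
    for bam in "$@"; do
        gvcf="${bam%.bam}.g.vcf.gz"
        # Per-sample calling in GVCF mode.
        gatk HaplotypeCaller -R "$ref" -I "$bam" -O "$gvcf" -ERC GVCF || exit 1
        set -- "$@" -V "$gvcf"      # collect "-V sample.g.vcf.gz" pairs
    done
    shift "$n"                      # drop the BAM paths, keep the -V pairs
    gatk CombineGVCFs -R "$ref" "$@" -O "$out_prefix.g.vcf.gz" &&
    gatk GenotypeGVCFs -R "$ref" -V "$out_prefix.g.vcf.gz" -O "$out_prefix.vcf.gz"
)
```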

                                The third case is when you want to compare variant calls that were produced from the same samples but using different methods. For example, if you're evaluating variant calls produced by different variant callers, different workflows, or the same workflow with different parameters. For this case, we recommend taking a different approach; rather than merging the VCF files (which can have all sorts of complicated consequences), you can use the VariantAnnotator tool to annotate one of the VCFs with the other treated as a resource. See the relevant Tool Doc page for usage details.
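For the third case, something along these lines might work. The resource tag `other` and the copied INFO key `AF` are arbitrary examples (pick whichever annotation you want to compare), and the function name is invented.

```shell
# Usage: annotate_with_other_callset REF.fasta PRIMARY.vcf.gz OTHER.vcf.gz OUT.vcf.gz
# Annotates PRIMARY with a value taken from matching records in OTHER.
annotate_with_other_callset() (
    ref=$1; primary=$2; other=$3; out_vcf=$4
    gatk VariantAnnotator -R "$ref" -V "$primary" \
        --resource:other "$other" \
        --expression other.AF \
        -O "$out_vcf"
)
```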

                                • anneng

                                  https://gatk.broadinstitute.org/hc/en-us/community/posts/360071192131-Merge-different-individual-VCF

                                    MergeVcfs requires all input files to have the same sample list.
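A quick pre-check before calling MergeVcfs, sketched in plain shell and awk so it needs no extra tools. It compares the sample columns of the `#CHROM` header line, including their order; treating a different order as a mismatch is my assumption. Both function names are made up.

```shell
# Print the sample names of a VCF (plain or .gz), one per line.
vcf_samples() {
    case $1 in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac | awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Usage: same_sample_list FIRST.vcf[.gz] OTHER.vcf[.gz] ...
# Returns non-zero if any file's sample list differs from the first one.
same_sample_list() (
    first=$(vcf_samples "$1")
    shift
    for vcf in "$@"; do
        [ "$(vcf_samples "$vcf")" = "$first" ] ||
            { echo "sample list differs: $vcf" >&2; exit 1; }
    done
)
```

For example: `same_sample_list chr1.vcf.gz chr2.vcf.gz chr3.vcf.gz && echo "OK to merge"`.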

                                  • anneng

                                    https://www.melbournebioinformatics.org.au/tutorials/tutorials/variant_calling_gatk1/files/variant_calling_gatk1.pdf

                                    gatk VariantsToTable \
                                    -R reference/hg38/Homo_sapiens_assembly38.fasta \
                                    -V output/output.vqsr.varfilter.pass.vcf.gz \
                                    -F CHROM -F POS -F FILTER -F TYPE -GF AD -GF DP \
                                    --show-filtered \
                                    -O output/output.vqsr.varfilter.pass.tsv
                                    

                                    GATK can convert a VCF into a table.
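Once the table exists it is easy to summarize with standard tools. As a small illustration (the function name is made up), this counts records per FILTER value in a tab-separated file that has a FILTER column, such as the one requested with `-F FILTER` above:

```shell
# Usage: count_by_filter TABLE.tsv
# Prints "FILTER_VALUE<TAB>COUNT" lines, sorted by filter value.
count_by_filter() {
    awk -F '\t' '
        # Header row: find which column is FILTER.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "FILTER") col = i; next }
        # Data rows: tally the value in that column.
        col     { n[$col]++ }
        END     { for (f in n) print f "\t" n[f] }
    ' "$1" | sort
}
```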

                                    • anneng

                                      https://hpc.nih.gov/training/gatk_tutorial/
                                      A practical introduction to GATK 4 on Biowulf (NIH HPC)

                                      • anneng

                                        https://support.terra.bio/hc/en-us/articles/360037493811--4-howto-Use-scatter-gather-to-joint-call-genotypes

                                        • anneng

                                          https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-07013-y
                                          Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework
