暗能星系


    GATK

    Bioinformatics Analysis
    • anneng

      https://eriqande.github.io/eca-bioinf-handbook/

      • zhanglu (last edited by zhanglu)

        GATK pipeline notes

        Reference sequence preparation:

        1. Build the BWA index with bwa index

        Image: 11918067/genomes-in-the-cloud:2.4.2-1552931386
        Command: /usr/gitc/bwa index *.fa
        

        2. Create the .dict file with Picard

        Image: broadinstitute/picard:2.27.3
        Command: java -jar /usr/picard/picard.jar CreateSequenceDictionary R=Nitab-v4.5_genome_Chr_Edwards2017.fasta O=Nitab-v4.5_genome_Chr_Edwards2017.fasta.dict
        

        3. Create the .fai index with samtools

        Command: samtools faidx Nitab-v4.5_genome_Chr_Edwards2017.fasta
        Image: quay.io/biocontainers/samtools:1.15.1--h1170115_0
        

        4. SnpEff reference database:

        Command: snp_build -name ${Name} -ann ${Gff} -fa ${Fa}
        Image: docker: "anneng01:8090/library/angs_snpeff:1.0.0"
        Output path: result
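
Before running the workflow it can help to confirm that steps 1 to 3 actually left every companion file next to the FASTA. The following is a minimal sketch, not part of the original pipeline: the function name `missing_ref_files` is made up here, and it assumes classic `bwa index` output suffixes (.amb, .ann, .bwt, .pac, .sa) and accepts the dictionary under either `<fasta>.dict` (as written in step 2) or `<basename>.dict`.

```shell
# Print every companion file of a reference FASTA that is missing.
# Assumptions: classic bwa index suffixes; samtools faidx writes <fasta>.fai;
# the sequence dictionary may be named <fasta>.dict or <basename>.dict.
missing_ref_files() {
    local ref="$1" ext
    for ext in amb ann bwt pac sa fai; do
        [ -e "${ref}.${ext}" ] || echo "${ref}.${ext}"
    done
    # Accept either naming convention for the sequence dictionary.
    if [ ! -e "${ref}.dict" ] && [ ! -e "${ref%.*}.dict" ]; then
        echo "${ref%.*}.dict"
    fi
    return 0
}
```

Calling `missing_ref_files Nitab-v4.5_genome_Chr_Edwards2017.fasta` prints nothing when all files are in place.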
        

        Notes

        The VCF passed to SnpEff must be gz-compressed and have its index file alongside it, for example reference.vcf.gz
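
A quick pre-flight check for the note above might look like the sketch below. It is only an illustration: the function name `is_gz_with_index` is hypothetical, and it assumes the index sits next to the VCF as `.tbi` or `.csi`. It only tests for the gzip magic bytes, so it cannot tell bgzip output from plain gzip.

```shell
# Succeed if the file starts with the gzip magic bytes (1f 8b)
# and an index file (.tbi or .csi, an assumption) sits next to it.
is_gz_with_index() {
    local vcf="$1" magic
    magic=$(head -c 2 "$vcf" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ] || { echo "not gzip-compressed: $vcf" >&2; return 1; }
    [ -e "${vcf}.tbi" ] || [ -e "${vcf}.csi" ] || { echo "no index for: $vcf" >&2; return 1; }
}
```

For example, `is_gz_with_index reference.vcf.gz || exit 1` at the top of a wrapper script stops early when the input is not ready.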

        WDL workflow

        gatk_merge.wdl

        Test data

        {
            "GATK.fastq_1":"/ceph_disk3/file_server/tmp/lanzhou/data/H06HDADXX130110.1.ATCACGAT.20k_reads_1.fastq",
            "GATK.fastq_2":"/ceph_disk3/file_server/tmp/lanzhou/data/H06HDADXX130110.1.ATCACGAT.20k_reads_2.fastq",
            "GATK.ref_fasta":"/ceph_disk2/data/hongyuan/mnt/data/public_data/tobacco_1656467890/Nitab-v4.5_genome_Chr_Edwards2017.fasta",
            "GATK.snp_ref":"/ceph_disk2/data/hongyuan/mnt/data/public_data/tobacco_1656467890/result"
        }
        
        • anneng

          https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATK_Discovery_Tutorial-Worksheet-AUS2016.pdf

          A GATK guide that goes into quite a bit of detail

          • anneng

            https://yulijia.net/slides/bioinfomatcis_for_medical_students/2019-07-31-A_beginners_guide_to_Call_SNPs_and_indels_Part_II.html#1

            • anneng

              https://wikis.univ-lille.fr/bilille/_media/ngs2019_dna_duplicates.pdf

              • anneng

                https://paleomix.readthedocs.io/en/stable/other_tools.html#paleomix-rmdup-collapsed
                Deduplicating collapsed (merged) reads

                https://www.biostars.org/p/347514/
                Deduplicate first, then merge

                • anneng

                  GATK_Discovery_Tutorial-Worksheet-AUS2016.pdf

                  • anneng

                    https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr12-5-Variant_calling_joint_genotyping.pdf
                    GATKwr12-5-Variant_calling_joint_genotyping.pdf
                    An explanation of GVCF

                    • anneng

                      https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants
                      VCF filtering

                      • anneng (last edited by anneng)

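                        Add a new SNP filtering and annotation workflow, consisting of:
                        Filtering:

                           gatk VariantFiltration \
                           -R reference.fasta \
                           -V input.vcf.gz \
                           -O output.vcf.gz \
                           -filter "QD < 4.0 || FS > 60.0 || MQ < 40.0" \
                           --filter-name "tobacco" \
                           -window 4 \
                           -G-filter "GQ < 20.0" \
                           -G-filter-name "lowGQ"
                        

                        (GATK4 spells the option --filter-name rather than --filterName. Each -G-filter needs a matching -G-filter-name; "lowGQ" is only a placeholder.)
                        These filter parameters should be exposed as configurable workflow inputs.
                        SnpEff: annotation
                        The VCF that comes out of GATK is the complete, unfiltered call set.
                        After filtering, run SnpEff on the result as well.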
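
One way to expose the filter parameters is to read them from environment variables with the thresholds from this post as defaults. This is a rough sketch rather than the actual workflow task: the function name, the variable names (`QD_MIN`, `FS_MAX`, `MQ_MIN`, `GQ_MIN`, `CLUSTER_WINDOW`, `FILTER_NAME`) and the genotype filter name `lowGQ` are all made up for illustration, and the GATK4 long option names are assumed. It only prints the command so it can be reviewed before running.

```shell
# Print a VariantFiltration command whose thresholds come from environment
# variables, falling back to the values quoted in the post above.
# All variable names and the "lowGQ" label are hypothetical.
build_filter_cmd() {
    local ref="$1" in_vcf="$2" out_vcf="$3"
    local qd="${QD_MIN:-4.0}" fs="${FS_MAX:-60.0}" mq="${MQ_MIN:-40.0}"
    local gq="${GQ_MIN:-20.0}" window="${CLUSTER_WINDOW:-4}"
    local name="${FILTER_NAME:-tobacco}"
    printf 'gatk VariantFiltration -R %s -V %s -O %s' "$ref" "$in_vcf" "$out_vcf"
    printf ' --filter-expression "QD < %s || FS > %s || MQ < %s"' "$qd" "$fs" "$mq"
    printf ' --filter-name "%s"' "$name"
    printf ' --cluster-window-size %s' "$window"
    printf ' --genotype-filter-expression "GQ < %s"' "$gq"
    printf ' --genotype-filter-name "lowGQ"\n'
}
```

For example, `QD_MIN=2.0 build_filter_cmd reference.fasta input.vcf.gz output.vcf.gz` prints the command with a looser QD threshold. In the WDL version the same values would presumably become task inputs with defaults.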

                        • anneng

                          https://nbisweden.github.io/workshop-ngsintro/2005/lab_vc.html#5_Variant_calling_in_cohort

                          • anneng

                            https://www.clinbioinfosspa.es/files/pipelines/germline_workflow_diagram.pdf

                            • anneng

                              https://haplotypecaller1.rssing.com/chan-10646605/all_p46.html

                              1. Merging VCF files

                              There are three main reasons why you might want to combine variants from different files into one, and the tool to use depends on what you are trying to achieve.

                              The most common case is when you have been parallelizing your variant calling analyses, e.g. running HaplotypeCaller per-chromosome, producing separate VCF files (or GVCF files) per-chromosome. For that case, you can use the Picard tool MergeVcfs to merge the files. See the relevant Tool Doc page for usage details.

                              The second case is when you have been using HaplotypeCaller in -ERC GVCF or -ERC BP_RESOLUTION to call variants on a large cohort, producing many GVCF files. You then need to consolidate them before joint-calling variants with GenotypeGVCFs (for performance reasons). This can be done with either the CombineGVCFs or the GenomicsDBImport tool, both of which are specifically designed to handle GVCFs in this way. See the relevant Tool Doc pages for usage details and the Best Practices workflow documentation to learn more about the logic of this workflow.

                              The third case is when you want to compare variant calls that were produced from the same samples but using different methods. For example, if you're evaluating variant calls produced by different variant callers, different workflows, or the same workflow with different parameters. For this case, we recommend taking a different approach; rather than merging the VCF files (which can have all sorts of complicated consequences), you can use the VariantAnnotator tool to annotate one of the VCFs with the other treated as a resource. See the relevant Tool Doc page for usage details.
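
The first two cases above can be sketched as small command builders. This is only an illustration under a few assumptions: the function names are invented here, the file and workspace names in the usage line are placeholders, and paths are assumed to contain no spaces. The functions print the commands rather than running them.

```shell
# Case 1: print a MergeVcfs command for per-chromosome VCFs of the same samples.
build_merge_cmd() {
    local out="$1" cmd="gatk MergeVcfs" f
    shift
    for f in "$@"; do
        cmd="$cmd -I $f"
    done
    echo "$cmd -O $out"
}

# Case 2: print a GenomicsDBImport command that consolidates per-sample GVCFs
# for one interval, ready for GenotypeGVCFs to read via gendb://<workspace>.
build_import_cmd() {
    local workspace="$1" interval="$2" cmd f
    shift 2
    cmd="gatk GenomicsDBImport --genomicsdb-workspace-path $workspace -L $interval"
    for f in "$@"; do
        cmd="$cmd -V $f"
    done
    echo "$cmd"
}
```

For instance, `build_merge_cmd all.vcf.gz chr1.vcf.gz chr2.vcf.gz` prints `gatk MergeVcfs -I chr1.vcf.gz -I chr2.vcf.gz -O all.vcf.gz`.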

                              • anneng

                                https://gatk.broadinstitute.org/hc/en-us/community/posts/360071192131-Merge-different-individual-VCF

                                MergeVcfs requires all input files to have the same sample list
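
A small check along these lines can catch a mismatch before MergeVcfs does. It is a sketch with made-up function names: it compares the sample columns of the `#CHROM` header line and treats a different order as a mismatch, which I believe matches how MergeVcfs behaves but have not verified.

```shell
# Print the sample names (columns 10 and up) from a VCF's #CHROM header line.
vcf_samples() {
    local vcf="$1" reader=cat
    case "$vcf" in *.gz) reader="gzip -cd" ;; esac
    $reader "$vcf" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Succeed only if every VCF lists the same samples, in the same order, as the first.
same_samples() {
    local first="$1" expected f
    shift
    expected=$(vcf_samples "$first")
    for f in "$@"; do
        [ "$(vcf_samples "$f")" = "$expected" ] || { echo "sample mismatch: $f" >&2; return 1; }
    done
}
```

Usage would be something like `same_samples chr1.vcf.gz chr2.vcf.gz chr3.vcf.gz && echo "ok to merge"`.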

                                • anneng

                                  https://www.melbournebioinformatics.org.au/tutorials/tutorials/variant_calling_gatk1/files/variant_calling_gatk1.pdf

                                  gatk VariantsToTable \
                                  -R reference/hg38/Homo_sapiens_assembly38.fasta \
                                  -V output/output.vqsr.varfilter.pass.vcf.gz \
                                  -F CHROM -F POS -F FILTER -F TYPE -GF AD -GF DP \
                                  --show-filtered \
                                  -O output/output.vqsr.varfilter.pass.tsv
                                  

                                  GATK can convert a VCF into a table
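
Once the VCF is a tab-separated table, ordinary text tools can summarise it. The sketch below counts rows per value of one named column (for example TYPE or FILTER from the command above). The function name `count_by_column` is invented for this example, and it assumes the first line of the table is the header row.

```shell
# Count rows per value of a named column in a tab-separated table
# whose first line is the header.
count_by_column() {
    local table="$1" column="$2"
    awk -F'\t' -v col="$column" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "no such column: " col > "/dev/stderr"; exit 1 }
            next
        }
        { n[$idx]++ }
        END { for (k in n) print k "\t" n[k] }
    ' "$table" | sort
}
```

For example, `count_by_column output/output.vqsr.varfilter.pass.tsv TYPE` lists how many rows there are for each variant type.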

                                  • anneng

                                    https://hpc.nih.gov/training/gatk_tutorial/
                                    A practical introduction to GATK 4 on Biowulf (NIH HPC)

                                    • anneng

                                      https://support.terra.bio/hc/en-us/articles/360037493811--4-howto-Use-scatter-gather-to-joint-call-genotypes

                                      • anneng

                                        https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-07013-y
                                        Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework
