暗能星系

    GATK

    Bioinformatics Analysis
    This topic has been deleted. Only users with topic-management privileges can view it.
    • anneng

      https://yulijia.net/slides/bioinfomatcis_for_medical_students/2019-07-31-A_beginners_guide_to_Call_SNPs_and_indels_Part_II.html#1

      • anneng

        https://wikis.univ-lille.fr/bilille/_media/ngs2019_dna_duplicates.pdf

        • anneng

          https://paleomix.readthedocs.io/en/stable/other_tools.html#paleomix-rmdup-collapsed
          Deduplicating collapsed (merged) reads

          https://www.biostars.org/p/347514/
          Deduplicate first, then merge
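Why collapsed reads get their own duplicate filter: a collapsed read spans the whole insert, so both its start and its end are known, and two reads that merely share a start are not necessarily duplicates. Below is a toy Python sketch, assuming duplicates are collapsed reads with the same contig, start, end, and strand, and keeping the copy with the highest summed base quality. The `CollapsedRead` type and that keep-best rule are assumptions for illustration; paleomix's actual selection rule may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CollapsedRead:
    """Hypothetical minimal record for one aligned collapsed read."""
    name: str
    contig: str
    start: int  # leftmost aligned position (0-based)
    end: int    # rightmost aligned position (exclusive)
    reverse: bool
    qualities: Tuple[int, ...]


def dedup_collapsed(reads: List[CollapsedRead]) -> List[CollapsedRead]:
    """Keep one read per (contig, start, end, strand) key."""
    best: Dict[tuple, CollapsedRead] = {}
    for read in reads:
        key = (read.contig, read.start, read.end, read.reverse)
        kept = best.get(key)
        # Assumed rule: keep the copy with the highest total base quality.
        if kept is None or sum(read.qualities) > sum(kept.qualities):
            best[key] = read
    # Return survivors in coordinate order.
    return sorted(best.values(), key=lambda r: (r.contig, r.start, r.end, r.reverse))


reads = [
    CollapsedRead("r1", "chr1", 100, 180, False, (30, 30, 30)),
    CollapsedRead("r2", "chr1", 100, 180, False, (35, 35, 35)),  # same span as r1, higher quality
    CollapsedRead("r3", "chr1", 100, 150, False, (30, 30, 30)),  # same start, different end
]
print([r.name for r in dedup_collapsed(reads)])  # ['r3', 'r2']
```

A start-only rule, as used for ordinary single-end reads, would have discarded `r3` as well.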

          • anneng

            GATK_Discovery_Tutorial-Worksheet-AUS2016.pdf

            • anneng

              https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr12-5-Variant_calling_joint_genotyping.pdf
              Explanation of the gVCF format
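To make the gVCF idea concrete: besides variant sites, a GVCF from HaplotypeCaller `-ERC GVCF` stores runs of non-variant positions as reference blocks, whose only ALT is the symbolic `<NON_REF>` allele and whose INFO `END` key gives the last position the block covers. The minimal Python sketch below uses hand-rolled parsing (no VCF library) and illustrative records, and reports the span each record covers.

```python
from typing import Dict, Optional, Tuple


def parse_info(info: str) -> Dict[str, Optional[str]]:
    """Split a VCF INFO column into a dict; flag keys map to None."""
    if info == ".":
        return {}
    out: Dict[str, Optional[str]] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value if value else None
    return out


def block_span(line: str) -> Tuple[str, int, int, bool]:
    """Return (chrom, first_pos, last_pos, is_reference_block) for one gVCF record."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt, info = fields[0], int(fields[1]), fields[3], fields[4], fields[7]
    end_value = parse_info(info).get("END")
    # A reference block has <NON_REF> as its only ALT allele and an END key.
    is_block = alt.split(",") == ["<NON_REF>"] and end_value is not None
    last = int(end_value) if is_block else pos + len(ref) - 1
    return chrom, pos, last, is_block


block = "20\t10000001\t.\tT\t<NON_REF>\t.\t.\tEND=10000116\tGT:DP:GQ:MIN_DP:PL\t0/0:44:99:38:0,89,1385"
site = "20\t10001020\t.\tC\tT,<NON_REF>\t100.0\t.\tDP=20\tGT:AD\t0/1:10,10,0"

chrom, first, last, is_block = block_span(block)
print(chrom, first, last, is_block, last - first + 1)  # 20 10000001 10000116 True 116
print(block_span(site))  # ('20', 10001020, 10001020, False)
```

One record therefore stands for 116 confidently homozygous-reference positions, which is what lets GenotypeGVCFs later genotype every sample at every site without re-reading the BAMs.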

              • anneng

                https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants
                VCF filtering (hard filtering)
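The hard filters in that article flag a SNP when any single annotation crosses its threshold. Below is a minimal Python sketch of that logic using the SNP thresholds commonly quoted from the page (QD < 2.0, QUAL < 30.0, SOR > 3.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0); check them against the current article before relying on them. Missing annotations count as passing, mirroring VariantFiltration's default (`--missing-values-evaluate-as-failing` is off). The table and function names are hypothetical, and QUAL is read from the same dict for simplicity even though it is a VCF column rather than an INFO key.

```python
import operator
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical table: filter name -> (annotation, comparison, threshold).
# A site FAILS a filter when comparison(value, threshold) is True.
SNP_HARD_FILTERS: Dict[str, Tuple[str, Callable[[float, float], bool], float]] = {
    "QD2": ("QD", operator.lt, 2.0),
    "QUAL30": ("QUAL", operator.lt, 30.0),
    "SOR3": ("SOR", operator.gt, 3.0),
    "FS60": ("FS", operator.gt, 60.0),
    "MQ40": ("MQ", operator.lt, 40.0),
    "MQRankSum-12.5": ("MQRankSum", operator.lt, -12.5),
    "ReadPosRankSum-8": ("ReadPosRankSum", operator.lt, -8.0),
}


def failed_filters(annotations: Mapping[str, float], missing_fails: bool = False) -> List[str]:
    """Return the names of the filters a site fails; an empty list means PASS."""
    failed = []
    for name, (key, compare, threshold) in SNP_HARD_FILTERS.items():
        if key not in annotations:
            # Missing annotation: passes by default, fails only if requested.
            if missing_fails:
                failed.append(name)
            continue
        if compare(annotations[key], threshold):
            failed.append(name)
    return failed


print(failed_filters({"QD": 1.5, "QUAL": 50.0, "FS": 70.0, "MQ": 60.0}))  # ['QD2', 'FS60']
```

Giving each threshold its own name, rather than one combined `||` expression, makes it visible in the FILTER column which annotation caused a site to fail.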

                • anneng

                  Adding a SNP filtering and annotation workflow, consisting of:
                  Filtering:

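                     # Hard-filter sites: failing sites are labelled in FILTER, not removed.
                     # -filter / --filter-name: site-level expression and its FILTER label.
                     # -window 4: with the default --cluster-size of 3, flags clusters of
                     #   3 or more SNPs within a 4 bp window.
                     # -G-filter / -G-filter-name: genotype-level expression; failing genotypes
                     #   get the name in FT. GATK expects one name per expression ("lowGQ" is a placeholder).
                     gatk VariantFiltration \
                     -R reference.fasta \
                     -V input.vcf.gz \
                     -O output.vcf.gz \
                     -filter "QD < 4.0 || FS > 60.0 || MQ < 40.0" \
                     --filter-name "tobacco" \
                     -window 4 \
                     -G-filter "GQ < 20.0" \
                     -G-filter-name "lowGQ"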

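                  These filter parameters should be exposed so that users can set them.
                  snpEff: annotation.
                  GATK produces the complete, unfiltered VCF.
                  After filtering, also run snpEff on the filtered VCF and check the results.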
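One way to expose the thresholds, as the note asks, is sketched below in Python: the VariantFiltration argument list is assembled from a config object, SelectVariants `--exclude-filtered` keeps only passing sites, and snpEff is run on both the unfiltered and the filtered VCF so the two annotations can be compared. `FilterConfig`, the `lowGQ` genotype-filter name, the intermediate file names, the `snpEff.jar` path and the database name are placeholders. snpEff writes the annotated VCF to stdout, and `-stats` gives each run its own HTML summary. The commands are only built; `run()` executes them if the tools are installed.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FilterConfig:
    """Hypothetical config holding the thresholds that should be user-settable."""
    qd_min: float = 4.0
    fs_max: float = 60.0
    mq_min: float = 40.0
    gq_min: float = 20.0
    cluster_window: int = 4
    filter_name: str = "tobacco"
    genotype_filter_name: str = "lowGQ"  # placeholder name


def variant_filtration_cmd(cfg: FilterConfig, ref: str, vcf_in: str, vcf_out: str) -> List[str]:
    """Build the gatk VariantFiltration argv from the config."""
    site_expr = f"QD < {cfg.qd_min} || FS > {cfg.fs_max} || MQ < {cfg.mq_min}"
    return [
        "gatk", "VariantFiltration", "-R", ref, "-V", vcf_in, "-O", vcf_out,
        "--filter-expression", site_expr,
        "--filter-name", cfg.filter_name,
        "--cluster-window-size", str(cfg.cluster_window),
        "--genotype-filter-expression", f"GQ < {cfg.gq_min}",
        "--genotype-filter-name", cfg.genotype_filter_name,
    ]


def select_pass_cmd(ref: str, vcf_in: str, vcf_out: str) -> List[str]:
    """Drop sites whose FILTER column is set to anything other than PASS."""
    return ["gatk", "SelectVariants", "-R", ref, "-V", vcf_in, "-O", vcf_out, "--exclude-filtered"]


def snpeff_cmd(db: str, vcf_in: str, stats_html: str) -> List[str]:
    """snpEff annotation; the annotated VCF goes to stdout."""
    return ["java", "-jar", "snpEff.jar", "-stats", stats_html, db, vcf_in]


def plan(cfg: FilterConfig, ref: str, raw_vcf: str, db: str) -> List[Tuple[List[str], str]]:
    """(argv, stdout file or "") pairs in execution order."""
    return [
        (variant_filtration_cmd(cfg, ref, raw_vcf, "flagged.vcf.gz"), ""),
        (select_pass_cmd(ref, "flagged.vcf.gz", "pass.vcf.gz"), ""),
        (snpeff_cmd(db, raw_vcf, "raw.snpeff.html"), "raw.ann.vcf"),
        (snpeff_cmd(db, "pass.vcf.gz", "pass.snpeff.html"), "pass.ann.vcf"),
    ]


def run(steps: List[Tuple[List[str], str]]) -> None:
    """Execute each step, redirecting stdout to a file where one is given."""
    for argv, stdout_path in steps:
        if stdout_path:
            with open(stdout_path, "w") as fh:
                subprocess.run(argv, stdout=fh, check=True)
        else:
            subprocess.run(argv, check=True)


for argv, out in plan(FilterConfig(qd_min=2.0), "reference.fasta", "input.vcf.gz", "my_snpeff_db"):
    print(" ".join(argv), ">", out if out else "(file written by the tool)")
```

Keeping each command as a plain list makes it easy to log or review before calling `run()`.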

                  • anneng

                    https://nbisweden.github.io/workshop-ngsintro/2005/lab_vc.html#5_Variant_calling_in_cohort

                    • anneng

                      https://www.clinbioinfosspa.es/files/pipelines/germline_workflow_diagram.pdf

                      • anneng

                        https://haplotypecaller1.rssing.com/chan-10646605/all_p46.html

                        1. Merging VCF files
                          There are three main reasons why you might want to combine variants from different files into one, and the tool to use depends on what you are trying to achieve.

                        The most common case is when you have been parallelizing your variant calling analyses, e.g. running HaplotypeCaller per-chromosome, producing separate VCF files (or GVCF files) per-chromosome. For that case, you can use the Picard tool MergeVcfs to merge the files. See the relevant Tool Doc page for usage details.

                        The second case is when you have been using HaplotypeCaller in -ERC GVCF or -ERC BP_RESOLUTION mode to call variants on a large cohort, producing many GVCF files. You then need to consolidate them before joint-calling variants with GenotypeGVCFs (for performance reasons). This can be done with either CombineGVCFs or GenomicsDBImport, both of which are specifically designed to handle GVCFs in this way. See the relevant Tool Doc pages for usage details and the Best Practices workflow documentation to learn more about the logic of this workflow.
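The GVCF route in the second case can be sketched as a short Python command builder: one HaplotypeCaller `-ERC GVCF` run per sample, consolidation with CombineGVCFs, then joint genotyping with GenotypeGVCFs. The sample names, BAM paths and output prefix are placeholders, and the commands are only built, not run.

```python
from typing import Dict, List


def joint_calling_cmds(ref: str, bams: Dict[str, str], cohort_prefix: str) -> List[List[str]]:
    """Per-sample GVCF calling, CombineGVCFs consolidation, then joint genotyping."""
    cmds: List[List[str]] = []
    gvcfs: List[str] = []
    # 1. One GVCF per sample (sorted for a stable order).
    for sample, bam in sorted(bams.items()):
        gvcf = f"{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        cmds.append(["gatk", "HaplotypeCaller", "-R", ref, "-I", bam, "-O", gvcf, "-ERC", "GVCF"])
    # 2. Consolidate all GVCFs into one cohort GVCF.
    combined = f"{cohort_prefix}.g.vcf.gz"
    combine = ["gatk", "CombineGVCFs", "-R", ref]
    for gvcf in gvcfs:
        combine += ["--variant", gvcf]
    cmds.append(combine + ["-O", combined])
    # 3. Joint-genotype the consolidated GVCF.
    cmds.append(["gatk", "GenotypeGVCFs", "-R", ref, "-V", combined, "-O", f"{cohort_prefix}.vcf.gz"])
    return cmds


for cmd in joint_calling_cmds("reference.fasta", {"s1": "s1.bam", "s2": "s2.bam"}, "cohort"):
    print(" ".join(cmd))
```

For large cohorts, GenomicsDBImport (with `--genomicsdb-workspace-path` and `-L`) can replace step 2, and GenotypeGVCFs then reads the workspace as `-V gendb://<workspace>`.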

                        The third case is when you want to compare variant calls that were produced from the same samples but with different methods. For example, you may be evaluating calls produced by different variant callers, by different workflows, or by the same workflow run with different parameters. For this case, we recommend a different approach: rather than merging the VCF files (which can have all sorts of complicated consequences), use the VariantAnnotator tool to annotate one of the VCFs, treating the other as a resource. See the relevant Tool Doc page for usage details.
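The idea behind the third case, annotating one call set against another instead of merging them, can be illustrated with a small Python sketch that tags each call in set A with whether set B contains the same (CHROM, POS, REF, ALT). This illustrates the comparison only and is not how VariantAnnotator is implemented. The tuples are hypothetical, and the sketch assumes multi-allelic sites are split into one ALT per record and indels are normalized the same way in both sets. Without that normalization, equivalent indels will not match.

```python
from typing import Iterable, List, Set, Tuple

Site = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT), one ALT per record


def annotate_with_resource(calls: Iterable[Site], resource: Iterable[Site]) -> List[Tuple[Site, bool]]:
    """Pair each call with True if the resource has the same site and allele."""
    seen: Set[Site] = set(resource)
    return [(site, site in seen) for site in calls]


def summarize(annotated: List[Tuple[Site, bool]]) -> Tuple[int, int]:
    """(calls found in the resource, calls unique to the first set)."""
    shared = sum(1 for _, hit in annotated if hit)
    return shared, len(annotated) - shared


caller_a = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")]
caller_b = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "A")]

annotated = annotate_with_resource(caller_a, caller_b)
print([hit for _, hit in annotated], summarize(annotated))  # [True, False, False] (1, 2)
```

Note that `chr1:200` is at a shared position but with a different allele, so it is not counted as concordant. Merging the two files would silently combine these records.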

                        • anneng

                          https://gatk.broadinstitute.org/hc/en-us/community/posts/360071192131-Merge-different-individual-VCF

                          MergeVcfs requires all input VCFs to have the same sample list.
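A toy Python sketch of the first merging case and of the constraint above: per-region VCF shards are concatenated under the first shard's header, and the merge is refused if any shard's sample columns (names and order) differ from the first. Unlike MergeVcfs, it does not sort records, so the shards are assumed to be passed in genome order. The shard contents are made up.

```python
from typing import List


def sample_names(vcf_text: str) -> List[str]:
    """Sample columns from the #CHROM header line (everything after FORMAT)."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            return line.split("\t")[9:]
    raise ValueError("no #CHROM header line")


def merge_shards(shards: List[str]) -> str:
    """Concatenate per-region VCFs that share one sample list, keeping the first header."""
    expected = sample_names(shards[0])
    header = [line for line in shards[0].splitlines() if line.startswith("#")]
    body: List[str] = []
    for i, text in enumerate(shards):
        samples = sample_names(text)
        if samples != expected:
            raise ValueError(f"shard {i} has samples {samples}, expected {expected}")
        body.extend(line for line in text.splitlines() if line and not line.startswith("#"))
    return "\n".join(header + body) + "\n"


HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n"
chr1_shard = HEADER + "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\n"
chr2_shard = HEADER + "chr2\t200\t.\tC\tT\t60\tPASS\t.\tGT\t1/1\t0/1\n"

merged = merge_shards([chr1_shard, chr2_shard])
print(merged.count("#CHROM"), len([l for l in merged.splitlines() if not l.startswith("#")]))  # 1 2
```

To combine different samples, rather than different regions of the same samples, you need a genotype-level merge (for example the GVCF workflow) instead of simple concatenation.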

                          • anneng

                            https://www.melbournebioinformatics.org.au/tutorials/tutorials/variant_calling_gatk1/files/variant_calling_gatk1.pdf

                            gatk VariantsToTable \
                            -R reference/hg38/Homo_sapiens_assembly38.fasta \
                            -V output/output.vqsr.varfilter.pass.vcf.gz \
                            -F CHROM -F POS -F FILTER -F TYPE -GF AD -GF DP \
                            --show-filtered \
                            -O output/output.vqsr.varfilter.pass.tsv
                            

                            GATK's VariantsToTable can convert a VCF into a tab-separated table.
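With `-GF AD -GF DP`, VariantsToTable writes one column per sample and genotype field, named `<sample>.<field>` (for example `NA12878.AD`). Below is a standard-library Python sketch that reads such a table and computes the alternate-allele fraction from the comma-separated AD counts. The sample name and rows are made up, and `NA`/`.` are assumed to mark missing values.

```python
import csv
import io
from typing import Dict, List, Optional


def alt_fraction(ad: str) -> Optional[float]:
    """AD is 'ref,alt1[,alt2...]'; return alt reads / total, or None if unavailable."""
    if ad in ("", "NA", "."):
        return None
    counts = [int(x) for x in ad.split(",")]
    total = sum(counts)
    return None if total == 0 else (total - counts[0]) / total


def per_sample_af(tsv_text: str, sample: str) -> List[Dict[str, object]]:
    """One dict per site: position, FILTER value and the sample's alt fraction."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {
            "site": f"{row['CHROM']}:{row['POS']}",
            "filter": row["FILTER"],
            "alt_fraction": alt_fraction(row[f"{sample}.AD"]),
        }
        for row in rows
    ]


EXAMPLE_TSV = (
    "CHROM\tPOS\tFILTER\tTYPE\tNA12878.AD\tNA12878.DP\n"
    "chr1\t100\tPASS\tSNP\t10,30\t40\n"
    "chr1\t200\tPASS\tINDEL\t20,0\t20\n"
)
for row in per_sample_af(EXAMPLE_TSV, "NA12878"):
    print(row)
```

An alt fraction near 0.5 or 1.0 is what a diploid heterozygous or homozygous-alt call should show, so this is a quick sanity check on the filtered calls.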

                            • anneng

                              https://hpc.nih.gov/training/gatk_tutorial/
                              A practical introduction to GATK 4 on Biowulf (NIH HPC)

                              • anneng

                                https://support.terra.bio/hc/en-us/articles/360037493811--4-howto-Use-scatter-gather-to-joint-call-genotypes
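The scatter-gather pattern behind that article, sketched in Python: split the genome into intervals, joint-genotype each interval independently (possibly in parallel), then gather the shards in genome order with GatherVcfs. `--only-output-calls-starting-in-intervals` keeps a record that spans a boundary from being emitted by two neighbouring shards, since GatherVcfs expects non-overlapping inputs in order. The contig lengths, shard size and file names are placeholders, and `cohort.g.vcf.gz` is assumed to be an already consolidated GVCF. The commands are only built here.

```python
from typing import Dict, List, Tuple


def make_intervals(contig_lengths: Dict[str, int], shard_size: int) -> List[Tuple[str, int, int]]:
    """Split each contig into 1-based, inclusive (contig, start, end) chunks."""
    intervals = []
    for contig, length in contig_lengths.items():  # dict order is taken as genome order
        start = 1
        while start <= length:
            end = min(start + shard_size - 1, length)
            intervals.append((contig, start, end))
            start = end + 1
    return intervals


def scatter_gather_cmds(ref: str, gvcf: str, contig_lengths: Dict[str, int],
                        shard_size: int, out_vcf: str) -> Tuple[List[List[str]], List[str]]:
    """Per-interval GenotypeGVCFs commands (scatter) and one GatherVcfs command (gather)."""
    scatter: List[List[str]] = []
    shard_files: List[str] = []
    for i, (contig, start, end) in enumerate(make_intervals(contig_lengths, shard_size)):
        shard = f"shard_{i:04d}.vcf.gz"
        shard_files.append(shard)
        scatter.append(["gatk", "GenotypeGVCFs", "-R", ref, "-V", gvcf,
                        "-L", f"{contig}:{start}-{end}",
                        "--only-output-calls-starting-in-intervals", "-O", shard])
    gather = ["gatk", "GatherVcfs"]
    for shard in shard_files:  # inputs must stay in genome order
        gather += ["-I", shard]
    return scatter, gather + ["-O", out_vcf]


scatter, gather = scatter_gather_cmds("reference.fasta", "cohort.g.vcf.gz",
                                      {"chr1": 250, "chr2": 100}, 100, "cohort.vcf.gz")
for cmd in scatter + [gather]:
    print(" ".join(cmd))
```

Real pipelines usually scatter over interval lists that avoid splitting inside genes or at gaps; fixed-size chunks are just the simplest choice for illustration.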

                                • anneng

                                  https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-07013-y
                                  Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework
