暗能星系

    • 登录
    • 搜索

    烟草数据分析

    生物信息分析
    1
    12
    70
    正在加载更多帖子
    • 从旧到新
    • 从新到旧
    • 最多赞同
    回复
    • 在新帖中回复
    登录后回复
    此主题已被删除。只有拥有主题管理权限的用户可以查看。
    • A
      anneng 最后由 anneng 编辑

      3.java -jar picard.jar MarkDuplicates
      I=input.bam
      O=marked_duplicates.bam
      M=marked_dup_metrics.txt
      去重的工具对比 samtools vs picard
      https://www.biostars.org/p/390305/

      A 1 条回复 最后回复 回复 引用 0
      • A
        anneng 最后由 anneng 编辑

        4.RBQS
        GATK4以前的版本 还需要执行 IndelRealigner 现在不需要了
        https://www.biostars.org/p/305123/
        https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2018-08-10-2018-04-11/11826-BQSR-without-IndelRealigner
        According to the bestPractice of GATK4,

        Mark Duplicates -> RBQS -> IndelRealigning by Mutect2

        which was below in previous version of GATK

        Mark Duplicates -> IndelRealigner -> RBQS -> Mutect2

        RBQS without INDEL realigning wouldn’t be affected by false alignments ?

        /home/admin/software/gatk-4.1.0.0/gatk BaseRecalibrator
        -I my_reads.bam
        -R reference.fasta
        --known-sites dbsnp_138.hg19.vcf
        --known-sites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
        --known-sites 1000G_phase1.indels.hg19.sites.vcf
        --known-sites 1000G_phase1.snps.high_confidence.hg19.sites.vcf
        -L ../in/Illumina.bed
        -O recal_data.table

        /home/admin/software/gatk-4.1.0.0/gatk ApplyBQSR
        -R reference.fasta
        -I input.bam
        --bqsr-recal-file recal_data.table
        -L ../in/Illumina.bed
        -O output.bam

        对于非人类的物种怎么处理?
        https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/
        博得推荐的方法是bootstrapping  即先call SNP 然后用这些数据再做BQSR
        Base Quality Score Recalibration (BQSR) is an important step for accurate variant detection that aims to minimize the effect of technical variation on base quality scores (measured as Phred scores). As with the original pipeline (link), this pipeline assumes that a ‘gold standard’ set of SNPS and indels are not available for BQSR. In the absence of a gold standard the pipeline performs an initial step detecting variants without performing BQSR, and then uses the identified SNPs as input for BQSR before calling variants again. This process is referred to as bootstrapping and is the procedure recommended by the Broad Institute’s best practices for variant discovery analysis when a gold standard is not available.

        1 条回复 最后回复 回复 引用 0
        • A
          anneng 最后由 编辑

          此回复已被删除!
          1 条回复 最后回复 回复 引用 0
          • A
            anneng 最后由 anneng 编辑

            5.获取突变
            gatk HaplotypeCaller
            -R ref.fa
            -I sorted_dedup_reads.bam
            -o raw_variants.vcf

            gatk --java-options "-Xmx8G" Mutect2
            -R /home/admin/database/reference/hg19/ucsc.hg19.fasta
            -I ../out/normal_recal.bam
            -I ../out/cancer_recal.bam
            -tumor cancer
            -normal normal
            --germline-resource af-only-gnomad.raw.sites.hg19.vcf.gz
            -L ../in/Illumina.bed
            -O ../out/somatic.vcf

            gatk --java-options "-Xmx8G" FilterMutectCalls
            -V ../out/somatic.vcf
            -O ../out/somatic_filtered.vcf.gz

            获取SNP和Indels
            gatk SelectVariants
            -R ref.fa
            -V raw_variants.vcf
            -selectType SNP
            -o raw_snps.vcf
            gatk SelectVariants
            -R ref.fa
            -V raw_variants.vcf
            -selectType INDEL
            -o raw_indels.vcf

            过滤SNP和Indels
            gatk VariantFiltration
            -R ref.fa
            -V raw_snps.vcf
            -O filtered_snps.vcf
            -filter-name "QD_filter" -filter "QD < 2.0"
            -filter-name "FS_filter" -filter "FS > 60.0"
            -filter-name "MQ_filter" -filter "MQ < 40.0"
            -filter-name "SOR_filter" -filter "SOR > 4.0"
            -filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5"
            -filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0"

            gatk VariantFiltration
            -R ref.fa
            -V raw_indels.vcf
            -O filtered_indels.vcf
            -filter-name "QD_filter" -filter "QD < 2.0"
            -filter-name "FS_filter" -filter "FS > 200.0"
            -filter-name "SOR_filter" -filter "SOR > 10.0"

            gatk SelectVariants
            --exclude-filtered
            -V filtered_snps.vcf
            -O bqsr_snps.vcf
            gatk SelectVariants
            --exclude-filtered
            -V filtered_indels.vcf
            -O bqsr_indels.vcf

            1 条回复 最后回复 回复 引用 0
            • A
              anneng 最后由 编辑

              此回复已被删除!
              1 条回复 最后回复 回复 引用 0
              • A
                anneng 最后由 编辑

                第一轮  Base Quality Score Recalibration (BQSR)
                gatk BaseRecalibrator
                -R ref.fa
                -I sorted_dedup_reads.bam
                --known-sites bqsr_snps.vcf
                --known-sites bqsr_indels.vcf
                -O recal_data.table

                gatk ApplyBQSR
                -R ref.fa
                -I sorted_dedup_reads.bam
                -bqsr recal_data.table
                -O recal_reads.bam \

                第二轮(可选)
                gatk BaseRecalibrator
                -R ref.fa
                -I recal_reads.bam
                --known-sites bqsr_snps.vcf
                --known-sites bqsr_indels.vcf
                -O post_recal_data.table
                分析数据:
                gatk AnalyzeCovariates \
                -before recal_data.table
                -after post_recal_data.table
                -plots recalibration_plots.pdf

                1 条回复 最后回复 回复 引用 0
                • A
                  anneng 最后由 编辑

                  基于优化的结果进行第二轮突变分析:
                  gatk HaplotypeCaller
                  -R ref.fa
                  -I recal_reads.bam
                  -o raw_variants_recal.vcf

                  gatk SelectVariants
                  -R ref.fa
                  -V raw_variants_recal.vcf
                  -selectType SNP
                  -o raw_snps_recal.vcf
                  gatk SelectVariants
                  -R ref.fa
                  -V raw_variants.vcf
                  -selectType INDEL
                  -o raw_indels_recal.vcf

                  gatk VariantFiltration \
                  -R ref.fa
                  -V raw_snps_recal.vcf
                  -O filtered_snps_final.vcf
                  -filter-name "QD_filter" -filter "QD < 2.0"
                  -filter-name "FS_filter" -filter "FS > 60.0"
                  -filter-name "MQ_filter" -filter "MQ < 40.0"
                  -filter-name "SOR_filter" -filter "SOR > 4.0"
                  -filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5"
                  -filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0"

                  gatk VariantFiltration \
                  -R ref.fa
                  -V raw_indels_recal.fa
                  -O filtered_indels_final.vcf
                  -filter-name "QD_filter" -filter "QD < 2.0"
                  -filter-name "FS_filter" -filter "FS > 200.0"
                  -filter-name "SOR_filter" -filter "SOR > 10.0"

                  1 条回复 最后回复 回复 引用 0
                  • A
                    anneng 最后由 编辑

                    注释:
                    java -jar snpEff.jar -v \
                    <snpeff_db>
                    filtered_snps_final.vcf > $filtered_snps_final.ann.vcf

                    或者
                    perl /home/admin/software/annovar/table_annovar.pl $INPUT
                    /home/admin/software/annovar/humandb -buildver hg19
                    -out $OUT_DIR/$PREFIX.annovar -remove
                    -protocol refGene,clinvar_20170905
                    -operation g,f
                    -nastring .
                    -vcfinput

                    1 条回复 最后回复 回复 引用 0
                    • A
                      anneng 最后由 anneng 编辑

                      https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/
                      2a9612fe-b9e4-410b-88e6-4d29797bade3-image.png

                      1 条回复 最后回复 回复 引用 0
                      • A
                        anneng @anneng 最后由 编辑

                        @anneng picard 在GATK4.0以后已经集成到了 GATK 本身

                        1 条回复 最后回复 回复 引用 0
                        • A
                          anneng 最后由 编辑

                          https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/4097/100/GCF_000715135.1_Ntab-TN90/

                          注释的话 从这里可以下载 gff文件 用 snpeff或者 ANNOVAR 进行注释

                          1 条回复 最后回复 回复 引用 0
                          • First post
                            Last post
                          Powered by 暗能星系