暗能星系


    Tobacco data analysis

    Bioinformatics analysis
    • anneng (last edited by anneng)

      4. BQSR
      GATK versions before GATK4 also required running IndelRealigner; that step is no longer needed.
      https://www.biostars.org/p/305123/
      https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2018-08-10-2018-04-11/11826-BQSR-without-IndelRealigner
      According to the GATK4 Best Practices, the order is

      Mark Duplicates -> BQSR -> indel realignment by Mutect2

      whereas in previous versions of GATK it was

      Mark Duplicates -> IndelRealigner -> BQSR -> Mutect2

      Wouldn't BQSR without indel realignment be affected by false alignments?

      /home/admin/software/gatk-4.1.0.0/gatk BaseRecalibrator \
        -I my_reads.bam \
        -R reference.fasta \
        --known-sites dbsnp_138.hg19.vcf \
        --known-sites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
        --known-sites 1000G_phase1.indels.hg19.sites.vcf \
        --known-sites 1000G_phase1.snps.high_confidence.hg19.sites.vcf \
        -L ../in/Illumina.bed \
        -O recal_data.table

      /home/admin/software/gatk-4.1.0.0/gatk ApplyBQSR \
        -R reference.fasta \
        -I input.bam \
        --bqsr-recal-file recal_data.table \
        -L ../in/Illumina.bed \
        -O output.bam

      How should non-human species be handled?
      https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/
      The method recommended by the Broad Institute is bootstrapping: call SNPs first, then use those calls as the known sites for BQSR.
      Base Quality Score Recalibration (BQSR) is an important step for accurate variant detection that aims to minimize the effect of technical variation on base quality scores (measured as Phred scores). As with the original pipeline, this pipeline assumes that a 'gold standard' set of SNPs and indels is not available for BQSR. In the absence of a gold standard the pipeline performs an initial step detecting variants without performing BQSR, and then uses the identified SNPs as input for BQSR before calling variants again. This process is referred to as bootstrapping and is the procedure recommended by the Broad Institute's best practices for variant discovery analysis when a gold standard is not available.
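      For reference, a Phred score Q encodes an estimated error probability P as Q = -10 * log10(P), so BQSR is in effect adjusting these per-base error estimates. A minimal sketch of the conversion follows; the function name `phred_to_prob` is made up for illustration, and it assumes an `awk` that supports the `^` operator (POSIX awk does).

```shell
# Convert a Phred quality score to the error probability it encodes.
# P = 10^(-Q/10). phred_to_prob is a hypothetical helper name, not a GATK tool.
phred_to_prob() {
    awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

phred_to_prob 20   # Q20 corresponds to 1 error in 100 bases
phred_to_prob 30   # Q30 corresponds to 1 error in 1000 bases
```

      A base reported as Q30 that is actually wrong as often as a Q20 base is the kind of systematic bias the recalibration table is meant to capture.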

        • anneng (last edited by anneng)

          5. Call variants
          gatk HaplotypeCaller \
            -R ref.fa \
            -I sorted_dedup_reads.bam \
            -O raw_variants.vcf

          gatk --java-options "-Xmx8G" Mutect2 \
            -R /home/admin/database/reference/hg19/ucsc.hg19.fasta \
            -I ../out/normal_recal.bam \
            -I ../out/cancer_recal.bam \
            -tumor cancer \
            -normal normal \
            --germline-resource af-only-gnomad.raw.sites.hg19.vcf.gz \
            -L ../in/Illumina.bed \
            -O ../out/somatic.vcf

          gatk --java-options "-Xmx8G" FilterMutectCalls \
            -V ../out/somatic.vcf \
            -O ../out/somatic_filtered.vcf.gz

          Extract SNPs and indels (GATK4 uses -select-type and -O; -selectType and -o are GATK3 syntax)
          gatk SelectVariants \
            -R ref.fa \
            -V raw_variants.vcf \
            -select-type SNP \
            -O raw_snps.vcf
          gatk SelectVariants \
            -R ref.fa \
            -V raw_variants.vcf \
            -select-type INDEL \
            -O raw_indels.vcf

          Filter SNPs and indels
          gatk VariantFiltration \
            -R ref.fa \
            -V raw_snps.vcf \
            -O filtered_snps.vcf \
            -filter-name "QD_filter" -filter "QD < 2.0" \
            -filter-name "FS_filter" -filter "FS > 60.0" \
            -filter-name "MQ_filter" -filter "MQ < 40.0" \
            -filter-name "SOR_filter" -filter "SOR > 4.0" \
            -filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5" \
            -filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0"

          gatk VariantFiltration \
            -R ref.fa \
            -V raw_indels.vcf \
            -O filtered_indels.vcf \
            -filter-name "QD_filter" -filter "QD < 2.0" \
            -filter-name "FS_filter" -filter "FS > 200.0" \
            -filter-name "SOR_filter" -filter "SOR > 10.0"

          Keep only the records that passed every filter; these become the known sites for BQSR
          gatk SelectVariants \
            --exclude-filtered \
            -V filtered_snps.vcf \
            -O bqsr_snps.vcf
          gatk SelectVariants \
            --exclude-filtered \
            -V filtered_indels.vcf \
            -O bqsr_indels.vcf
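          Before trusting the filtered calls as known sites, it may help to see how many records each hard filter flagged. VariantFiltration writes the names of the failed filters into the FILTER column and marks the remaining records PASS, so a tally of that column is enough. The sketch below assumes an uncompressed VCF on stdin; `count_filter_status` is a hypothetical helper name, not part of GATK.

```shell
# Tally the FILTER column (7th field) of an uncompressed VCF read from stdin.
# count_filter_status is a hypothetical helper name, not a GATK tool.
count_filter_status() {
    awk -F '\t' '
        /^#/ { next }                 # skip header lines
        { n[$7]++ }                   # count each distinct FILTER value
        END { for (f in n) print f "\t" n[f] }
    ' | sort
}
```

          Usage would look like `count_filter_status < filtered_snps.vcf`. If almost nothing is left as PASS, the bootstrapped known-sites set is probably too small to be useful for recalibration.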

            • anneng

              Round 1: Base Quality Score Recalibration (BQSR)
              gatk BaseRecalibrator \
                -R ref.fa \
                -I sorted_dedup_reads.bam \
                --known-sites bqsr_snps.vcf \
                --known-sites bqsr_indels.vcf \
                -O recal_data.table

              gatk ApplyBQSR \
                -R ref.fa \
                -I sorted_dedup_reads.bam \
                -bqsr recal_data.table \
                -O recal_reads.bam

              Round 2 (optional)
              gatk BaseRecalibrator \
                -R ref.fa \
                -I recal_reads.bam \
                --known-sites bqsr_snps.vcf \
                --known-sites bqsr_indels.vcf \
                -O post_recal_data.table
              Compare the two recalibration tables:
              gatk AnalyzeCovariates \
                -before recal_data.table \
                -after post_recal_data.table \
                -plots recalibration_plots.pdf

              • anneng

                Run a second round of variant calling on the recalibrated reads:
                gatk HaplotypeCaller \
                  -R ref.fa \
                  -I recal_reads.bam \
                  -O raw_variants_recal.vcf

                gatk SelectVariants \
                  -R ref.fa \
                  -V raw_variants_recal.vcf \
                  -select-type SNP \
                  -O raw_snps_recal.vcf
                gatk SelectVariants \
                  -R ref.fa \
                  -V raw_variants_recal.vcf \
                  -select-type INDEL \
                  -O raw_indels_recal.vcf

                gatk VariantFiltration \
                  -R ref.fa \
                  -V raw_snps_recal.vcf \
                  -O filtered_snps_final.vcf \
                  -filter-name "QD_filter" -filter "QD < 2.0" \
                  -filter-name "FS_filter" -filter "FS > 60.0" \
                  -filter-name "MQ_filter" -filter "MQ < 40.0" \
                  -filter-name "SOR_filter" -filter "SOR > 4.0" \
                  -filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5" \
                  -filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0"

                gatk VariantFiltration \
                  -R ref.fa \
                  -V raw_indels_recal.vcf \
                  -O filtered_indels_final.vcf \
                  -filter-name "QD_filter" -filter "QD < 2.0" \
                  -filter-name "FS_filter" -filter "FS > 200.0" \
                  -filter-name "SOR_filter" -filter "SOR > 10.0"

                • anneng

                  Annotation (replace <snpeff_db> with the name of the snpEff database):
                  java -jar snpEff.jar -v \
                    <snpeff_db> \
                    filtered_snps_final.vcf > filtered_snps_final.ann.vcf

                  Or, with ANNOVAR:
                  perl /home/admin/software/annovar/table_annovar.pl $INPUT \
                    /home/admin/software/annovar/humandb -buildver hg19 \
                    -out $OUT_DIR/$PREFIX.annovar -remove \
                    -protocol refGene,clinvar_20170905 \
                    -operation g,f \
                    -nastring . \
                    -vcfinput
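                  Once snpEff has run, a quick summary of predicted effects can serve as a sanity check on the annotated VCF. The sketch below assumes snpEff's standard ANN INFO field, whose entries have the layout Allele|Annotation|Impact|Gene|..., and counts only the first entry of each record. `tally_ann_effects` is a hypothetical helper name, and the input is assumed to be an uncompressed VCF on stdin.

```shell
# Tally the effect of the first ANN entry per record in a snpEff-annotated VCF
# read from stdin. tally_ann_effects is a hypothetical helper name; it assumes
# the standard ANN layout "Allele|Annotation|Impact|Gene|...".
tally_ann_effects() {
    awk -F '\t' '
        /^#/ { next }                            # skip header lines
        {
            n = split($8, info, ";")             # INFO is the 8th VCF column
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^ANN=/) {
                    split(substr(info[i], 5), ann, "|")
                    count[ann[2]]++              # 2nd subfield is the effect
                    break
                }
            }
        }
        END { for (e in count) print e "\t" count[e] }
    ' | sort
}
```

                  Usage would look like `tally_ann_effects < filtered_snps_final.ann.vcf`. Records without an ANN field are simply skipped.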

                  • anneng (last edited by anneng)

                    https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/
                    2a9612fe-b9e4-410b-88e6-4d29797bade3-image.png

                    • anneng (in reply to @anneng)

                      @anneng Since GATK 4.0, Picard has been bundled into GATK itself.

                      • anneng

                        https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/4097/100/GCF_000715135.1_Ntab-TN90/

                        For annotation, the GFF file can be downloaded from this directory and then used with snpEff or ANNOVAR.
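                        For snpEff, a non-model genome like this usually has to be built into a custom database first. As far as I understand the snpEff manual, it looks for data/<name>/genes.gff and data/<name>/sequences.fa plus a "<name>.genome" line in snpEff.config. The sketch below only stages those files; `stage_snpeff_db` and the database name Ntab-TN90 are illustrative choices, and the GFF and FASTA from NCBI are assumed to have been decompressed already.

```shell
# Stage files for a custom snpEff database. Assumed layout (from the snpEff
# manual): data/<name>/genes.gff, data/<name>/sequences.fa, and a
# "<name>.genome" line in snpEff.config. stage_snpeff_db is a hypothetical
# helper name.
stage_snpeff_db() {
    snpeff_dir=$1 name=$2 gff=$3 fasta=$4
    mkdir -p "$snpeff_dir/data/$name" || return 1
    cp "$gff" "$snpeff_dir/data/$name/genes.gff" || return 1
    cp "$fasta" "$snpeff_dir/data/$name/sequences.fa" || return 1
    # Register the genome so that "snpEff build" can find it.
    echo "$name.genome : $name" >> "$snpeff_dir/snpEff.config"
}

# Afterwards, from the snpEff directory, the database would be built with:
#   java -jar snpEff.jar build -gff3 -v Ntab-TN90
```

                        The resulting name (here Ntab-TN90) is then what goes in place of the snpEff database placeholder when annotating.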
