暗能星系

    • 登录
    • 搜索

    烟草数据分析

    生物信息分析
    1
    12
    70
    正在加载更多帖子
    • 从旧到新
    • 从新到旧
    • 最多赞同
    回复
    • 在新帖中回复
    登录后回复
    此主题已被删除。只有拥有主题管理权限的用户可以查看。
    • A
      anneng 最后由 anneng 编辑

      1.构建索引
      bwa index GCF_000715135.1_Ntab-TN90_genomic.fna.gz
      samtools faidx GCF_000715135.1_Ntab-TN90_genomic.fna.gz
      注意:不能对gzip压缩包构建索引
      Cannot index files compressed with gzip, please use bgzip

      2.比对
      bwa mem -M -t 10 GCF_000715135.1_Ntab-TN90_genomic.fna.gz raw/T3_R1.fq.gz raw/T3_R2.fq.gz
      Bwa Mem -M Option
      https://www.biostars.org/p/97323/
      生成bam并对bam进行排序
      samtools view T3.sam -b | samtools sort -o T3.sorted.bam

      添加Read Group标识 
      samtools addreplacerg -r 'ID:tabacco' -r 'LB:1334' -r 'SM:T3' -r 'PL:ILLUMINA' -r 'PU:AAA.2.xxxx' -o T3.sorted.RG.bam T3.sorted.bam

      什么是Read Group?
      https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
      构建BAM索引:
      samtools index T3.sorted.RG.bam

      1 条回复 最后回复 回复 引用 0
      • A
        anneng 最后由 anneng 编辑

        3.java -jar picard.jar MarkDuplicates
        I=input.bam
        O=marked_duplicates.bam
        M=marked_dup_metrics.txt
        去重的工具对比 samtools vs picard
        https://www.biostars.org/p/390305/

        A 1 条回复 最后回复 回复 引用 0
        • A
          anneng 最后由 anneng 编辑

          4.RBQS
          GATK4以前的版本 还需要执行 IndelRealigner 现在不需要了
          https://www.biostars.org/p/305123/
          https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2018-08-10-2018-04-11/11826-BQSR-without-IndelRealigner
          According to the bestPractice of GATK4,

          Mark Duplicates -> RBQS -> IndelRealigning by Mutect2

          which was below in previous version of GATK

          Mark Duplicates -> IndelRealigner -> RBQS -> Mutect2

          RBQS without INDEL realigning wouldn’t be affected by false alignments ?

          /home/admin/software/gatk-4.1.0.0/gatk BaseRecalibrator
          -I my_reads.bam
          -R reference.fasta
          --known-sites dbsnp_138.hg19.vcf
          --known-sites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf
          --known-sites 1000G_phase1.indels.hg19.sites.vcf
          --known-sites 1000G_phase1.snps.high_confidence.hg19.sites.vcf
          -L ../in/Illumina.bed
          -O recal_data.table

          /home/admin/software/gatk-4.1.0.0/gatk ApplyBQSR
          -R reference.fasta
          -I input.bam
          --bqsr-recal-file recal_data.table
          -L ../in/Illumina.bed
          -O output.bam

          对于非人类的物种怎么处理?
          https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/
          博得推荐的方法是bootstrapping  即先call SNP 然后用这些数据再做BQSR
          Base Quality Score Recalibration (BQSR) is an important step for accurate variant detection that aims to minimize the effect of technical variation on base quality scores (measured as Phred scores). As with the original pipeline (link), this pipeline assumes that a ‘gold standard’ set of SNPS and indels are not available for BQSR. In the absence of a gold standard the pipeline performs an initial step detecting variants without performing BQSR, and then uses the identified SNPs as input for BQSR before calling variants again. This process is referred to as bootstrapping and is the procedure recommended by the Broad Institute’s best practices for variant discovery analysis when a gold standard is not available.

          1 条回复 最后回复 回复 引用 0
          • A
            anneng 最后由 编辑

            此回复已被删除!
            1 条回复 最后回复 回复 引用 0
            • A
              anneng 最后由 anneng 编辑

              5.获取突变
              gatk HaplotypeCaller
              -R ref.fa
              -I sorted_dedup_reads.bam
              -o raw_variants.vcf

              gatk --java-options "-Xmx8G" Mutect2
              -R /home/admin/database/reference/hg19/ucsc.hg19.fasta
              -I ../out/normal_recal.bam
              -I ../out/cancer_recal.bam
              -tumor cancer
              -normal normal
              --germline-resource af-only-gnomad.raw.sites.hg19.vcf.gz
              -L ../in/Illumina.bed
              -O ../out/somatic.vcf

              gatk --java-options "-Xmx8G" FilterMutectCalls
              -V ../out/somatic.vcf
              -O ../out/somatic_filtered.vcf.gz

              获取SNP和Indels
              gatk SelectVariants
              -R ref.fa
              -V raw_variants.vcf
              -selectType SNP
              -o raw_snps.vcf
              gatk SelectVariants
              -R ref.fa
              -V raw_variants.vcf
              -selectType INDEL
              -o raw_indels.vcf

              过滤SNP和Indels
              gatk VariantFiltration
              -R ref.fa
              -V raw_snps.vcf
              -O filtered_snps.vcf
              -filter-name "QD_filter" -filter "QD < 2.0"
              -filter-name "FS_filter" -filter "FS > 60.0"
              -filter-name "MQ_filter" -filter "MQ < 40.0"
              -filter-name "SOR_filter" -filter "SOR > 4.0"
              -filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5"
              -filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0"

              gatk VariantFiltration
              -R ref.fa
              -V raw_indels.vcf
              -O filtered_indels.vcf
              -filter-name "QD_filter" -filter "QD < 2.0"
              -filter-name "FS_filter" -filter "FS > 200.0"
              -filter-name "SOR_filter" -filter "SOR > 10.0"

              gatk SelectVariants
              --exclude-filtered
              -V filtered_snps.vcf
              -O bqsr_snps.vcf
              gatk SelectVariants
              --exclude-filtered
              -V filtered_indels.vcf
              -O bqsr_indels.vcf

              1 条回复 最后回复 回复 引用 0
              • A
                anneng 最后由 编辑

                此回复已被删除!
                1 条回复 最后回复 回复 引用 0
                • A
                  anneng 最后由 编辑

                  第一轮  Base Quality Score Recalibration (BQSR)
                  gatk BaseRecalibrator
                  -R ref.fa
                  -I sorted_dedup_reads.bam
                  --known-sites bqsr_snps.vcf
                  --known-sites bqsr_indels.vcf
                  -O recal_data.table

                  gatk ApplyBQSR
                  -R ref.fa
                  -I sorted_dedup_reads.bam
                  -bqsr recal_data.table
                  -O recal_reads.bam \

                  第二轮(可选)
                  gatk BaseRecalibrator
                  -R ref.fa
                  -I recal_reads.bam
                  --known-sites bqsr_snps.vcf
                  --known-sites bqsr_indels.vcf
                  -O post_recal_data.table
                  分析数据:
                  gatk AnalyzeCovariates \
                  -before recal_data.table
                  -after post_recal_data.table
                  -plots recalibration_plots.pdf

                  1 条回复 最后回复 回复 引用 0
                  • A
                    anneng 最后由 编辑

                    基于优化的结果进行第二轮突变分析:
                    gatk HaplotypeCaller
                    -R ref.fa
                    -I recal_reads.bam
                    -o raw_variants_recal.vcf

                    gatk SelectVariants
                    -R ref.fa
                    -V raw_variants_recal.vcf
                    -selectType SNP
                    -o raw_snps_recal.vcf
                    gatk SelectVariants
                    -R ref.fa
                    -V raw_variants.vcf
                    -selectType INDEL
                    -o raw_indels_recal.vcf

                    gatk VariantFiltration \
                    -R ref.fa
                    -V raw_snps_recal.vcf
                    -O filtered_snps_final.vcf
                    -filter-name "QD_filter" -filter "QD < 2.0"
                    -filter-name "FS_filter" -filter "FS > 60.0"
                    -filter-name "MQ_filter" -filter "MQ < 40.0"
                    -filter-name "SOR_filter" -filter "SOR > 4.0"
                    -filter-name "MQRankSum_filter" -filter "MQRankSum < -12.5"
                    -filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum < -8.0"

                    gatk VariantFiltration \
                    -R ref.fa
                    -V raw_indels_recal.fa
                    -O filtered_indels_final.vcf
                    -filter-name "QD_filter" -filter "QD < 2.0"
                    -filter-name "FS_filter" -filter "FS > 200.0"
                    -filter-name "SOR_filter" -filter "SOR > 10.0"

                    1 条回复 最后回复 回复 引用 0
                    • A
                      anneng 最后由 编辑

                      注释:
                      java -jar snpEff.jar -v \
                      <snpeff_db>
                      filtered_snps_final.vcf > $filtered_snps_final.ann.vcf

                      或者
                      perl /home/admin/software/annovar/table_annovar.pl $INPUT
                      /home/admin/software/annovar/humandb -buildver hg19
                      -out $OUT_DIR/$PREFIX.annovar -remove
                      -protocol refGene,clinvar_20170905
                      -operation g,f
                      -nastring .
                      -vcfinput

                      1 条回复 最后回复 回复 引用 0
                      • A
                        anneng 最后由 anneng 编辑

                        https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/
                        2a9612fe-b9e4-410b-88e6-4d29797bade3-image.png

                        1 条回复 最后回复 回复 引用 0
                        • A
                          anneng @anneng 最后由 编辑

                          @anneng picard 在GATK4.0以后已经集成到了 GATK 本身

                          1 条回复 最后回复 回复 引用 0
                          • A
                            anneng 最后由 编辑

                            https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/4097/100/GCF_000715135.1_Ntab-TN90/

                            注释的话 从这里可以下载 gff文件 用 snpeff或者 ANNOVAR 进行注释

                            1 条回复 最后回复 回复 引用 0
                            • First post
                              Last post
                            Powered by 暗能星系