<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Tobacco Data Analysis]]></title><description><![CDATA[<p dir="auto">1. Build the indexes<br />
bwa index GCF_000715135.1_Ntab-TN90_genomic.fna.gz<br />
samtools faidx GCF_000715135.1_Ntab-TN90_genomic.fna.gz<br />
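Note: samtools faidx cannot index a file compressed with plain gzip and stops with:<br />
<em>Cannot index files compressed with gzip, please use bgzip</em><br />
Recompress the reference with bgzip, then rerun samtools faidx:<br />
gunzip GCF_000715135.1_Ntab-TN90_genomic.fna.gz<br />
bgzip GCF_000715135.1_Ntab-TN90_genomic.fna<br />
samtools faidx GCF_000715135.1_Ntab-TN90_genomic.fna.gz</p>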
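<p dir="auto">Before running samtools faidx, you can check whether a .gz reference is already BGZF (bgzip output) rather than plain gzip. The sketch below is a heuristic and is not a samtools feature. It assumes the BGZF header layout from the SAM/BAM specification, in which the gzip header sets the FEXTRA flag and the first extra subfield carries the identifier "BC" at bytes 12-13. The function name is_bgzf is hypothetical.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if a file looks like BGZF, fail otherwise.
# Heuristic based on the BGZF header layout: gzip magic 1f 8b, method 08,
# FEXTRA flag (0x04) set, and the first extra subfield identifier "BC" (42 43).
is_bgzf() {
  local hex
  # Dump the first 16 bytes as one run of lowercase hex digits.
  hex=$(head -c 16 "$1" | od -An -tx1 | tr -d ' \n')
  [ "${#hex}" -eq 32 ] || return 1          # too short to hold a BGZF header
  [ "${hex:0:6}" = "1f8b08" ] || return 1   # gzip magic + deflate method
  (( 0x${hex:6:2} & 0x04 )) || return 1     # FLG byte: FEXTRA bit must be set
  [ "${hex:24:4}" = "4243" ]                # extra subfield ID "BC"
}

# Usage (assumes bgzip from htslib is installed):
# ref=GCF_000715135.1_Ntab-TN90_genomic.fna.gz
# if ! is_bgzf "$ref"; then
#   gunzip "$ref" && bgzip "${ref%.gz}"
# fi
# samtools faidx "$ref"
```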
<p dir="auto">2. Alignment<br />
bwa mem -M -t 10 GCF_000715135.1_Ntab-TN90_genomic.fna.gz raw/T3_R1.fq.gz raw/T3_R2.fq.gz &gt; T3.sam<br />
About the bwa mem -M option (marks shorter split hits as secondary, for Picard compatibility):<br />
<a href="https://www.biostars.org/p/97323/" rel="nofollow ugc">https://www.biostars.org/p/97323/</a><br />
Convert the SAM to BAM and sort it:<br />
samtools view -b T3.sam | samtools sort -o T3.sorted.bam</p>
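<p dir="auto">To repeat alignment and sorting for every sample, a small loop can pair each *_R1.fq.gz with its *_R2.fq.gz and pipe bwa mem straight into samtools sort, which avoids writing an intermediate SAM file. This is a sketch. The function name align_all, the DRY_RUN switch and the _R1/_R2 naming pattern are assumptions modelled on raw/T3_R1.fq.gz.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: align and sort every paired-end sample found in a directory.
# Assumes reads are named SAMPLE_R1.fq.gz / SAMPLE_R2.fq.gz (as in raw/T3_R1.fq.gz).
# Set DRY_RUN=1 to print the pipelines instead of running bwa and samtools.
align_all() {
  local ref=$1 dir=$2 threads=${3:-10}
  local r1 r2 sample
  for r1 in "$dir"/*_R1.fq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2=${r1%_R1.fq.gz}_R2.fq.gz
    [ -e "$r2" ] || { echo "missing mate for $r1" >&2; continue; }
    sample=$(basename "$r1" _R1.fq.gz)
    if [ -n "${DRY_RUN:-}" ]; then
      echo "bwa mem -M -t $threads $ref $r1 $r2 | samtools sort -o $sample.sorted.bam -"
    else
      # samtools sort reads SAM from stdin ("-") and writes a coordinate-sorted BAM.
      bwa mem -M -t "$threads" "$ref" "$r1" "$r2" | samtools sort -o "$sample.sorted.bam" -
    fi
  done
}

# Usage:
# align_all GCF_000715135.1_Ntab-TN90_genomic.fna.gz raw 10
```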
<p dir="auto">Add read group (RG) tags:<br />
samtools addreplacerg -r 'ID:tabacco' -r 'LB:1334' -r 'SM:T3' -r 'PL:ILLUMINA' -r 'PU:AAA.2.xxxx' -o T3.sorted.RG.bam T3.sorted.bam</p>
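<p dir="auto">Instead of running addreplacerg afterwards, you can attach the read group at alignment time with bwa mem -R. That option takes a full @RG header line with literal \t separators. The sketch below derives the sample name from the FASTQ file name. The helper rg_line and its choice to reuse the sample name for ID and LB are hypothetical placeholders, not values taken from this run.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an @RG header line for `bwa mem -R`.
# bwa expects the tab separators as the two characters backslash and t,
# so printf is given an escaped backslash rather than real tab characters.
rg_line() {
  local r1=$1 platform=${2:-ILLUMINA}
  local sample
  sample=$(basename "$r1" _R1.fq.gz)      # raw/T3_R1.fq.gz -> T3
  # ID and LB reuse the sample name here; replace them with flowcell/lane
  # and library identifiers when those are known.
  printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:%s' "$sample" "$sample" "$sample" "$platform"
}

# Usage:
# bwa mem -M -t 10 -R "$(rg_line raw/T3_R1.fq.gz)" \
#   GCF_000715135.1_Ntab-TN90_genomic.fna.gz raw/T3_R1.fq.gz raw/T3_R2.fq.gz \
#   | samtools sort -o T3.sorted.RG.bam -
```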
<p dir="auto">What is a read group?<br />
<a href="https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups" rel="nofollow ugc">https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups</a><br />
Index the BAM:<br />
samtools index T3.sorted.RG.bam</p>
]]></description><link>http://an.forum.genostack.com/topic/180/烟草数据分析</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 12:33:49 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/180.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 22 Jan 2021 03:14:59 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Tobacco Data Analysis on Wed, 07 Jul 2021 12:41:13 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/4097/100/GCF_000715135.1_Ntab-TN90/" rel="nofollow ugc">https://ftp.ncbi.nlm.nih.gov/genomes/all/annotation_releases/4097/100/GCF_000715135.1_Ntab-TN90/</a></p>
<p dir="auto">For annotation, download the GFF file from this page and annotate the variants with snpEff or ANNOVAR.</p>
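<p dir="auto">snpEff is unlikely to ship a prebuilt database for this assembly, so you would build one from the NCBI files. The sketch below sets up the directory layout that snpEff build -gff3 reads: data/GENOME/genes.gff, data/GENOME/sequences.fa, and a "GENOME.genome : description" entry in snpEff.config. The genome name Ntab_TN90, the helper prepare_snpeff_db and the decompressed file names are assumptions. Depending on the snpEff version, the build step may also expect cds.fa and protein.fa for its checks.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: lay out a custom snpEff database from a GFF3 and a FASTA.
# snpEff's default layout is data/GENOME/genes.gff + data/GENOME/sequences.fa,
# plus a "GENOME.genome : description" line in snpEff.config.
prepare_snpeff_db() {
  local snpeff_dir=$1 genome=$2 gff=$3 fasta=$4
  mkdir -p "$snpeff_dir/data/$genome"
  cp "$gff" "$snpeff_dir/data/$genome/genes.gff"
  cp "$fasta" "$snpeff_dir/data/$genome/sequences.fa"
  # Register the genome only once, even if the helper is rerun.
  if ! grep -qxF "$genome.genome : $genome" "$snpeff_dir/snpEff.config" 2>/dev/null; then
    echo "$genome.genome : $genome" >> "$snpeff_dir/snpEff.config"
  fi
  # Print (do not execute) the build command to run next.
  echo "cd $snpeff_dir && java -jar snpEff.jar build -gff3 -v $genome"
}

# Usage (assumed file names, decompressed first):
# prepare_snpeff_db ~/snpEff Ntab_TN90 GCF_000715135.1_Ntab-TN90_genomic.gff GCF_000715135.1_Ntab-TN90_genomic.fna
```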
]]></description><link>http://an.forum.genostack.com/post/673</link><guid isPermaLink="true">http://an.forum.genostack.com/post/673</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 07 Jul 2021 12:41:13 GMT</pubDate></item><item><title><![CDATA[Reply to Tobacco Data Analysis on Wed, 03 Feb 2021 03:04:32 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="http://an.forum.genostack.com/uid/1">@anneng</a> Since GATK 4.0, Picard has been bundled into GATK itself.</p>
]]></description><link>http://an.forum.genostack.com/post/377</link><guid isPermaLink="true">http://an.forum.genostack.com/post/377</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 03 Feb 2021 03:04:32 GMT</pubDate></item><item><title><![CDATA[Reply to Tobacco Data Analysis on Sat, 30 Jan 2021 03:02:03 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/" rel="nofollow ugc">https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/</a><br />
<img src="/assets/uploads/files/1611975532612-2a9612fe-b9e4-410b-88e6-4d29797bade3-image.png" alt="2a9612fe-b9e4-410b-88e6-4d29797bade3-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/363</link><guid isPermaLink="true">http://an.forum.genostack.com/post/363</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 30 Jan 2021 03:02:03 GMT</pubDate></item><item><title><![CDATA[Reply to Tobacco Data Analysis on Sat, 30 Jan 2021 02:32:14 GMT]]></title><description><![CDATA[<p dir="auto">Annotation with snpEff:<br />
java -jar snpEff.jar -v \<br />
&lt;snpeff_db&gt; \<br />
filtered_snps_final.vcf &gt; filtered_snps_final.ann.vcf</p>
<p dir="auto">Or, with ANNOVAR:<br />
perl /home/admin/software/annovar/table_annovar.pl $INPUT \<br />
/home/admin/software/annovar/humandb -buildver hg19 \<br />
-out $OUT_DIR/$PREFIX.annovar -remove \<br />
-protocol refGene,clinvar_20170905 \<br />
-operation g,f \<br />
-nastring . \<br />
-vcfinput</p>
]]></description><link>http://an.forum.genostack.com/post/362</link><guid isPermaLink="true">http://an.forum.genostack.com/post/362</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 30 Jan 2021 02:32:14 GMT</pubDate></item><item><title><![CDATA[Reply to Tobacco Data Analysis on Sat, 30 Jan 2021 02:31:25 GMT]]></title><description><![CDATA[<p dir="auto">Second round of variant calling, on the recalibrated reads:<br />
gatk HaplotypeCaller \<br />
-R ref.fa \<br />
-I recal_reads.bam \<br />
-O raw_variants_recal.vcf</p>
<p dir="auto">gatk SelectVariants \<br />
-R ref.fa \<br />
-V raw_variants_recal.vcf \<br />
--select-type-to-include SNP \<br />
-O raw_snps_recal.vcf<br />
gatk SelectVariants \<br />
-R ref.fa \<br />
-V raw_variants_recal.vcf \<br />
--select-type-to-include INDEL \<br />
-O raw_indels_recal.vcf</p>
<p dir="auto">gatk VariantFiltration \<br />
-R ref.fa \<br />
-V raw_snps_recal.vcf \<br />
-O filtered_snps_final.vcf \<br />
-filter-name "QD_filter" -filter "QD &lt; 2.0" \<br />
-filter-name "FS_filter" -filter "FS &gt; 60.0" \<br />
-filter-name "MQ_filter" -filter "MQ &lt; 40.0" \<br />
-filter-name "SOR_filter" -filter "SOR &gt; 4.0" \<br />
-filter-name "MQRankSum_filter" -filter "MQRankSum &lt; -12.5" \<br />
-filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum &lt; -8.0"</p>
<p dir="auto">gatk VariantFiltration \<br />
-R ref.fa \<br />
-V raw_indels_recal.vcf \<br />
-O filtered_indels_final.vcf \<br />
-filter-name "QD_filter" -filter "QD &lt; 2.0" \<br />
-filter-name "FS_filter" -filter "FS &gt; 200.0" \<br />
-filter-name "SOR_filter" -filter "SOR &gt; 10.0"</p>
]]></description><link>http://an.forum.genostack.com/post/361</link><guid isPermaLink="true">http://an.forum.genostack.com/post/361</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 30 Jan 2021 02:31:25 GMT</pubDate></item><item><title><![CDATA[Reply to Tobacco Data Analysis on Sat, 30 Jan 2021 02:30:08 GMT]]></title><description><![CDATA[<p dir="auto">Round 1: Base Quality Score Recalibration (BQSR)<br />
gatk BaseRecalibrator \<br />
-R ref.fa \<br />
-I sorted_dedup_reads.bam \<br />
--known-sites bqsr_snps.vcf \<br />
--known-sites bqsr_indels.vcf \<br />
-O recal_data.table</p>
<p dir="auto">gatk ApplyBQSR \<br />
-R ref.fa \<br />
-I sorted_dedup_reads.bam \<br />
-bqsr recal_data.table \<br />
-O recal_reads.bam</p>
<p dir="auto">Round 2 (optional): recalibrate again to produce a post-recalibration table for comparison<br />
gatk BaseRecalibrator \<br />
-R ref.fa \<br />
-I recal_reads.bam \<br />
--known-sites bqsr_snps.vcf \<br />
--known-sites bqsr_indels.vcf \<br />
-O post_recal_data.table<br />
Compare the covariates before and after recalibration:<br />
gatk AnalyzeCovariates \<br />
-before recal_data.table \<br />
-after post_recal_data.table \<br />
-plots recalibration_plots.pdf</p>
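<p dir="auto">Tobacco has no curated known-sites set, so the rounds above amount to one pass of a bootstrapping loop: call variants, keep the hard-filtered calls as known sites, recalibrate, and repeat. The sketch below only prints the GATK4 commands when DRY_RUN=1 is set. Several parts are assumptions for illustration: the function names known_sites and bqsr_bootstrap, the per-round file names, the fixed round count, recalibrating the original BAM in every round, and keeping only a subset of the step 5 filters (QD and FS).</p>

```shell
#!/usr/bin/env bash
# Hypothetical sketch of bootstrapped BQSR when no curated known sites exist.
# DRY_RUN=1 prints each gatk command instead of running it.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

# Call variants on a BAM and keep hard-filtered PASS sites as known sites.
# Only the QD and FS thresholds from step 5 are used here, for brevity.
known_sites() {
  local ref=$1 bam=$2 tag=$3
  run gatk HaplotypeCaller -R "$ref" -I "$bam" -O "raw_$tag.vcf"
  run gatk SelectVariants -R "$ref" -V "raw_$tag.vcf" --select-type-to-include SNP -O "snps_$tag.vcf"
  run gatk SelectVariants -R "$ref" -V "raw_$tag.vcf" --select-type-to-include INDEL -O "indels_$tag.vcf"
  run gatk VariantFiltration -R "$ref" -V "snps_$tag.vcf" -O "snps_$tag.flt.vcf" \
    --filter-name QD_filter --filter-expression "QD < 2.0" \
    --filter-name FS_filter --filter-expression "FS > 60.0"
  run gatk VariantFiltration -R "$ref" -V "indels_$tag.vcf" -O "indels_$tag.flt.vcf" \
    --filter-name QD_filter --filter-expression "QD < 2.0" \
    --filter-name FS_filter --filter-expression "FS > 200.0"
  run gatk SelectVariants --exclude-filtered -V "snps_$tag.flt.vcf" -O "bqsr_snps_$tag.vcf"
  run gatk SelectVariants --exclude-filtered -V "indels_$tag.flt.vcf" -O "bqsr_indels_$tag.vcf"
}

bqsr_bootstrap() {
  local ref=$1 bam=$2 rounds=${3:-2}
  local i prev=0
  # Round 0: call on the original, un-recalibrated BAM.
  known_sites "$ref" "$bam" 0
  for i in $(seq 1 "$rounds"); do
    # Recalibrate the original BAM with the sites from the previous round.
    run gatk BaseRecalibrator -R "$ref" -I "$bam" \
      --known-sites "bqsr_snps_$prev.vcf" --known-sites "bqsr_indels_$prev.vcf" \
      -O "recal_$i.table"
    run gatk ApplyBQSR -R "$ref" -I "$bam" --bqsr-recal-file "recal_$i.table" -O "recal_$i.bam"
    known_sites "$ref" "recal_$i.bam" "$i"
    prev=$i
  done
}

# Usage:
# DRY_RUN=1 bqsr_bootstrap ref.fa sorted_dedup_reads.bam 2
```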
]]></description><link>http://an.forum.genostack.com/post/360</link><guid isPermaLink="true">http://an.forum.genostack.com/post/360</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 30 Jan 2021 02:30:08 GMT</pubDate></item><item><title><![CDATA[Reply to Tobacco Data Analysis on Sat, 30 Jan 2021 02:26:36 GMT]]></title><description><![CDATA[<p dir="auto">5. Call variants<br />
gatk HaplotypeCaller \<br />
-R ref.fa \<br />
-I sorted_dedup_reads.bam \<br />
-O raw_variants.vcf</p>
<p dir="auto">gatk --java-options "-Xmx8G" Mutect2 \<br />
-R /home/admin/database/reference/hg19/ucsc.hg19.fasta \<br />
-I ../out/normal_recal.bam \<br />
-I ../out/cancer_recal.bam \<br />
-tumor cancer \<br />
-normal normal \<br />
--germline-resource af-only-gnomad.raw.sites.hg19.vcf.gz \<br />
-L ../in/Illumina.bed \<br />
-O ../out/somatic.vcf</p>
<p dir="auto">gatk --java-options "-Xmx8G" FilterMutectCalls \<br />
-R /home/admin/database/reference/hg19/ucsc.hg19.fasta \<br />
-V ../out/somatic.vcf \<br />
-O ../out/somatic_filtered.vcf.gz</p>
<p dir="auto">Split the calls into SNPs and indels:<br />
gatk SelectVariants \<br />
-R ref.fa \<br />
-V raw_variants.vcf \<br />
--select-type-to-include SNP \<br />
-O raw_snps.vcf<br />
gatk SelectVariants \<br />
-R ref.fa \<br />
-V raw_variants.vcf \<br />
--select-type-to-include INDEL \<br />
-O raw_indels.vcf</p>
<p dir="auto">Hard-filter the SNPs and indels:<br />
gatk VariantFiltration \<br />
-R ref.fa \<br />
-V raw_snps.vcf \<br />
-O filtered_snps.vcf \<br />
-filter-name "QD_filter" -filter "QD &lt; 2.0" \<br />
-filter-name "FS_filter" -filter "FS &gt; 60.0" \<br />
-filter-name "MQ_filter" -filter "MQ &lt; 40.0" \<br />
-filter-name "SOR_filter" -filter "SOR &gt; 4.0" \<br />
-filter-name "MQRankSum_filter" -filter "MQRankSum &lt; -12.5" \<br />
-filter-name "ReadPosRankSum_filter" -filter "ReadPosRankSum &lt; -8.0"</p>
<p dir="auto">gatk VariantFiltration \<br />
-R ref.fa \<br />
-V raw_indels.vcf \<br />
-O filtered_indels.vcf \<br />
-filter-name "QD_filter" -filter "QD &lt; 2.0" \<br />
-filter-name "FS_filter" -filter "FS &gt; 200.0" \<br />
-filter-name "SOR_filter" -filter "SOR &gt; 10.0"</p>
<p dir="auto">gatk SelectVariants \<br />
--exclude-filtered \<br />
-V filtered_snps.vcf \<br />
-O bqsr_snps.vcf<br />
gatk SelectVariants \<br />
--exclude-filtered \<br />
-V filtered_indels.vcf \<br />
-O bqsr_indels.vcf</p>
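<p dir="auto">The --exclude-filtered step keeps the header plus only those records whose FILTER column is PASS, or "." (no filter applied), and drops everything else. The minimal awk equivalent below shows only the logic. It assumes an uncompressed VCF, and the function name keep_pass is hypothetical.</p>

```shell
#!/usr/bin/env bash
# Illustrative equivalent of `gatk SelectVariants --exclude-filtered` on plain VCF text:
# header lines are kept; data lines survive only if FILTER is PASS or "." (not applied).
keep_pass() {
  awk -F'\t' '/^#/ || $7 == "PASS" || $7 == "." { print }'
}

# Usage:
# keep_pass < filtered_snps.vcf > bqsr_snps.sketch.vcf
```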
]]></description><link>http://an.forum.genostack.com/post/358</link><guid isPermaLink="true">http://an.forum.genostack.com/post/358</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 30 Jan 2021 02:26:36 GMT</pubDate></item><item><title><![CDATA[Reply to Tobacco Data Analysis on Sat, 30 Jan 2021 02:20:14 GMT]]></title><description><![CDATA[<p dir="auto">4. BQSR<br />
Versions before GATK4 also required running IndelRealigner; GATK4 no longer needs it.<br />
<a href="https://www.biostars.org/p/305123/" rel="nofollow ugc">https://www.biostars.org/p/305123/</a><br />
<a href="https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2018-08-10-2018-04-11/11826-BQSR-without-IndelRealigner" rel="nofollow ugc">https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2018-08-10-2018-04-11/11826-BQSR-without-IndelRealigner</a><br />
According to the GATK4 best practices, the order is</p>
<p dir="auto">Mark Duplicates -&gt; BQSR -&gt; Mutect2 (its local reassembly takes over indel realignment)</p>
<p dir="auto">whereas in previous versions of GATK it was</p>
<p dir="auto">Mark Duplicates -&gt; IndelRealigner -&gt; BQSR -&gt; Mutect2</p>
<p dir="auto">Isn't BQSR without indel realignment affected by false alignments?</p>
<p dir="auto">/home/admin/software/gatk-4.1.0.0/gatk BaseRecalibrator \<br />
-I my_reads.bam \<br />
-R reference.fasta \<br />
--known-sites dbsnp_138.hg19.vcf \<br />
--known-sites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \<br />
--known-sites 1000G_phase1.indels.hg19.sites.vcf \<br />
--known-sites 1000G_phase1.snps.high_confidence.hg19.sites.vcf \<br />
-L ../in/Illumina.bed \<br />
-O recal_data.table</p>
<p dir="auto">/home/admin/software/gatk-4.1.0.0/gatk ApplyBQSR \<br />
-R reference.fasta \<br />
-I input.bam \<br />
--bqsr-recal-file recal_data.table \<br />
-L ../in/Illumina.bed \<br />
-O output.bam</p>
<p dir="auto">How should non-human species, which lack curated known-variant sets, be handled?<br />
<a href="https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/" rel="nofollow ugc">https://gencore.bio.nyu.edu/variant-calling-pipeline-gatk4/</a><br />
The Broad Institute recommends bootstrapping: call SNPs first, then use those calls as the known sites for BQSR.<br />
Base Quality Score Recalibration (BQSR) is an important step for accurate variant detection that aims to minimize the effect of technical variation on base quality scores (measured as Phred scores). As with the original pipeline, this pipeline assumes that a ‘gold standard’ set of SNPs and indels is not available for BQSR. In the absence of a gold standard, the pipeline performs an initial step detecting variants without performing BQSR, and then uses the identified SNPs as input for BQSR before calling variants again. This process is referred to as bootstrapping and is the procedure recommended by the Broad Institute’s best practices for variant discovery analysis when a gold standard is not available.</p>
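<p dir="auto">For reference, a Phred score Q corresponds to an error probability P = 10^(-Q/10). Q20 therefore means a 1% chance that the base call is wrong, and Q30 means 0.1%. These are the values BQSR adjusts. A small awk helper (hypothetical name phred_to_p) makes the conversion concrete:</p>

```shell
#!/usr/bin/env bash
# Convert Phred quality scores to error probabilities: P = 10^(-Q/10).
phred_to_p() {
  local q
  for q in "$@"; do
    awk -v q="$q" 'BEGIN { printf "Q%d\t%.4g\n", q, 10 ^ (-q / 10) }'
  done
}

# Usage:
# phred_to_p 10 20 30 40
```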
]]></description><link>http://an.forum.genostack.com/post/356</link><guid isPermaLink="true">http://an.forum.genostack.com/post/356</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 30 Jan 2021 02:20:14 GMT</pubDate></item><item><title><![CDATA[Reply to Tobacco Data Analysis on Fri, 22 Jan 2021 11:55:34 GMT]]></title><description><![CDATA[<p dir="auto">3. Mark duplicates<br />
java -jar picard.jar MarkDuplicates \<br />
I=input.bam \<br />
O=marked_duplicates.bam \<br />
M=marked_dup_metrics.txt<br />
Comparison of deduplication tools, samtools vs. Picard:<br />
<a href="https://www.biostars.org/p/390305/" rel="nofollow ugc">https://www.biostars.org/p/390305/</a></p>
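<p dir="auto">This thread notes that Picard is bundled with GATK4, so the same step can also run as gatk MarkDuplicates with -I/-O/-M arguments. The sketch below loops over sorted, read-group-tagged BAMs and indexes each result. The function name mark_dups, the DRY_RUN switch and the .dedup.bam / .dup_metrics.txt naming are assumptions.</p>

```shell
#!/usr/bin/env bash
# Hypothetical helper: mark duplicates for each sorted, read-group-tagged BAM
# using the Picard tool bundled in GATK4, then index the result.
# DRY_RUN=1 prints the commands instead of running them.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

mark_dups() {
  local bam out
  for bam in "$@"; do
    out=${bam%.bam}.dedup.bam
    run gatk MarkDuplicates -I "$bam" -O "$out" -M "${bam%.bam}.dup_metrics.txt"
    run samtools index "$out"
  done
}

# Usage:
# mark_dups T3.sorted.RG.bam
```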
]]></description><link>http://an.forum.genostack.com/post/344</link><guid isPermaLink="true">http://an.forum.genostack.com/post/344</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Fri, 22 Jan 2021 11:55:34 GMT</pubDate></item></channel></rss>