<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[GATK]]></title><description><![CDATA[<p dir="auto"><a href="https://hpc.nih.gov/training/gatk_tutorial/bqsr.html" rel="nofollow ugc">https://hpc.nih.gov/training/gatk_tutorial/bqsr.html</a></p>
]]></description><link>http://an.forum.genostack.com/topic/702/gatk</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 10:41:47 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/702.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 27 Jun 2022 06:52:11 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to GATK on Tue, 19 Jul 2022 10:35:41 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-07013-y" rel="nofollow ugc">https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-07013-y</a><br />
Optimizing performance of GATK workflows using Apache Arrow In-Memory data framework</p>
]]></description><link>http://an.forum.genostack.com/post/1702</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1702</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Tue, 19 Jul 2022 10:35:41 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 18 Jul 2022 10:05:35 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://support.terra.bio/hc/en-us/articles/360037493811--4-howto-Use-scatter-gather-to-joint-call-genotypes" rel="nofollow ugc">https://support.terra.bio/hc/en-us/articles/360037493811--4-howto-Use-scatter-gather-to-joint-call-genotypes</a></p>
]]></description><link>http://an.forum.genostack.com/post/1696</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1696</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 18 Jul 2022 10:05:35 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 18 Jul 2022 08:49:55 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://hpc.nih.gov/training/gatk_tutorial/" rel="nofollow ugc">https://hpc.nih.gov/training/gatk_tutorial/</a><br />
A practical introduction to GATK 4 on Biowulf (NIH HPC)</p>
]]></description><link>http://an.forum.genostack.com/post/1695</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1695</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 18 Jul 2022 08:49:55 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 18 Jul 2022 08:46:09 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.melbournebioinformatics.org.au/tutorials/tutorials/variant_calling_gatk1/files/variant_calling_gatk1.pdf" rel="nofollow ugc">https://www.melbournebioinformatics.org.au/tutorials/tutorials/variant_calling_gatk1/files/variant_calling_gatk1.pdf</a></p>
<pre><code>gatk VariantsToTable \
-R reference/hg38/Homo_sapiens_assembly38.fasta \
-V output/output.vqsr.varfilter.pass.vcf.gz \
-F CHROM -F POS -F FILTER -F TYPE -GF AD -GF DP \
--show-filtered \
-O output/output.vqsr.varfilter.pass.tsv
</code></pre>
<p dir="auto">GATK can convert a VCF into a table.</p>
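<p dir="auto">Once the table exists it can be summarised with ordinary tools. The snippet below is a minimal sketch, assuming the usual VariantsToTable layout (tab-separated, one header row, genotype fields named SAMPLE.FIELD). The sample name S1 and the rows are made up for illustration, and count_by_filter is a hypothetical helper.</p>

```python
import csv
import io
from collections import Counter


def count_by_filter(handle):
    """Count table rows per FILTER value in a VariantsToTable TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["FILTER"] for row in reader)


# Made-up rows in the layout the command above is expected to produce;
# "S1" is a hypothetical sample name, not one from the original data.
example = io.StringIO(
    "CHROM\tPOS\tFILTER\tTYPE\tS1.AD\tS1.DP\n"
    "chr1\t100\tPASS\tSNP\t10,5\t15\n"
    "chr1\t200\tLowQual\tINDEL\t3,1\t4\n"
    "chr2\t300\tPASS\tSNP\t8,8\t16\n"
)
counts = count_by_filter(example)
print(dict(counts))
```

<p dir="auto">For a real run, pass an open file handle for output/output.vqsr.varfilter.pass.tsv instead of the in-memory example.</p>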
]]></description><link>http://an.forum.genostack.com/post/1694</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1694</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 18 Jul 2022 08:46:09 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 18 Jul 2022 08:23:13 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://gatk.broadinstitute.org/hc/en-us/community/posts/360071192131-Merge-different-individual-VCF" rel="nofollow ugc">https://gatk.broadinstitute.org/hc/en-us/community/posts/360071192131-Merge-different-individual-VCF</a></p>
<p dir="auto">MergeVcfs requires every input VCF to have the same sample list.</p>
]]></description><link>http://an.forum.genostack.com/post/1693</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1693</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 18 Jul 2022 08:23:13 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 18 Jul 2022 07:36:14 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://haplotypecaller1.rssing.com/chan-10646605/all_p46.html" rel="nofollow ugc">https://haplotypecaller1.rssing.com/chan-10646605/all_p46.html</a></p>
<ol start="7">
<li>Merging VCF files<br />
There are three main reasons why you might want to combine variants from different files into one, and the tool to use depends on what you are trying to achieve.</li>
</ol>
<p dir="auto">The most common case is when you have been parallelizing your variant calling analyses, e.g. running HaplotypeCaller per-chromosome, producing separate VCF files (or GVCF files) per-chromosome. For that case, you can use the Picard tool <strong>MergeVcfs</strong> to merge the files. See the relevant Tool Doc page for usage details.</p>
<p dir="auto">The second case is when you have been using HaplotypeCaller in -ERC GVCF or -ERC BP_RESOLUTION to call variants on a large cohort, producing many GVCF files. You then need to consolidate them before joint-calling variants with <strong>GenotypeGVCFs</strong> (for performance reasons). This can be done with either the CombineGVCFs or the GenomicsDBImport tool, both of which are specifically designed to handle GVCFs in this way. See the relevant Tool Doc pages for usage details and the Best Practices workflow documentation to learn more about the logic of this workflow.</p>
<p dir="auto">The third case is when you want to compare variant calls that were produced from the same samples but using different methods. For example, if you're evaluating variant calls produced by different variant callers, different workflows, or the same workflow with different parameters. For this case, we recommend taking a different approach; rather than merging the VCF files (which can have all sorts of complicated consequences), you can use the VariantAnnotator tool to annotate one of the VCFs with the other treated as a resource. See the relevant Tool Doc page for usage details.</p>
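<p dir="auto">The first two cases can be sketched as command builders. This is a rough illustration, not a complete pipeline: the function names and every file name are hypothetical, and the second builder uses the GenomicsDB import tool under its GATK4 name, GenomicsDBImport. The third case (VariantAnnotator) is left out.</p>

```python
def merge_shards_cmd(shards, output):
    """Case 1: concatenate per-chromosome VCFs that share one sample list."""
    cmd = ["gatk", "MergeVcfs"]
    for path in shards:
        cmd += ["-I", path]
    return cmd + ["-O", output]


def consolidate_gvcfs_cmd(gvcfs, workspace, interval):
    """Case 2: consolidate per-sample GVCFs before GenotypeGVCFs."""
    cmd = ["gatk", "GenomicsDBImport"]
    for path in gvcfs:
        cmd += ["-V", path]
    return cmd + ["--genomicsdb-workspace-path", workspace, "-L", interval]


# Hypothetical file names, for illustration only.
print(" ".join(merge_shards_cmd(["chr1.vcf.gz", "chr2.vcf.gz"], "merged.vcf.gz")))
print(" ".join(consolidate_gvcfs_cmd(["s1.g.vcf.gz", "s2.g.vcf.gz"], "cohort_db", "chr1")))
```

<p dir="auto">Returning argument lists rather than one shell string means the result can be passed straight to subprocess.run without quoting problems.</p>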
]]></description><link>http://an.forum.genostack.com/post/1692</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1692</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 18 Jul 2022 07:36:14 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 18 Jul 2022 04:33:41 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.clinbioinfosspa.es/files/pipelines/germline_workflow_diagram.pdf" rel="nofollow ugc">https://www.clinbioinfosspa.es/files/pipelines/germline_workflow_diagram.pdf</a><br />
<img src="/assets/uploads/files/1658118792679-5ceb753b-158c-4e38-8f74-ef3fbccd79d7-image.png" alt="5ceb753b-158c-4e38-8f74-ef3fbccd79d7-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/1691</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1691</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 18 Jul 2022 04:33:41 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 18 Jul 2022 04:14:30 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://nbisweden.github.io/workshop-ngsintro/2005/lab_vc.html#5_Variant_calling_in_cohort" rel="nofollow ugc">https://nbisweden.github.io/workshop-ngsintro/2005/lab_vc.html#5_Variant_calling_in_cohort</a></p>
]]></description><link>http://an.forum.genostack.com/post/1690</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1690</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 18 Jul 2022 04:14:30 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Thu, 14 Jul 2022 03:19:38 GMT]]></title><description><![CDATA[<p dir="auto">Add a new SNP filtering and annotation workflow, consisting of:<br />
Filtering:</p>
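<pre><code>   # GATK4 expects one -G-filter-name per -G-filter; "lowGQ" is a placeholder name.
   gatk VariantFiltration \
   -R reference.fasta \
   -V input.vcf.gz \
   -O output.vcf.gz \
   -filter "QD &lt; 4.0 || FS &gt; 60.0 || MQ &lt; 40.0" \
   --filter-name "tobacco" \
   -window 4 \
   -G-filter "GQ &lt; 20.0" \
   -G-filter-name "lowGQ"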
</code></pre>
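<p dir="auto">These filter parameters should be exposed so that they can be configured.<br />
SnpEff: annotation.<br />
GATK produces the complete, unfiltered VCF.<br />
After filtering, run SnpEff again to check the results.</p>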
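<p dir="auto">One way to expose the filter parameters is to collect them in a small settings object and build the command from it. The sketch below assumes GATK4 argument names. SnpFilterParams and variant_filtration_cmd are hypothetical, the defaults are the thresholds from the post, and the genotype filter name "lowGQ" is a placeholder that does not come from the original command.</p>

```python
from dataclasses import dataclass


@dataclass
class SnpFilterParams:
    """User-configurable thresholds; defaults mirror the post above."""
    qd_min: float = 4.0
    fs_max: float = 60.0
    mq_min: float = 40.0
    filter_name: str = "tobacco"
    cluster_window: int = 4
    gq_min: float = 20.0
    genotype_filter_name: str = "lowGQ"  # placeholder, not from the post


def variant_filtration_cmd(ref, vcf_in, vcf_out, params=None):
    """Build a gatk VariantFiltration argument list from the settings."""
    p = params or SnpFilterParams()
    site_expr = f"QD < {p.qd_min} || FS > {p.fs_max} || MQ < {p.mq_min}"
    return [
        "gatk", "VariantFiltration",
        "-R", ref, "-V", vcf_in, "-O", vcf_out,
        "-filter", site_expr,
        "--filter-name", p.filter_name,
        "-window", str(p.cluster_window),
        "-G-filter", f"GQ < {p.gq_min}",
        "-G-filter-name", p.genotype_filter_name,
    ]


cmd = variant_filtration_cmd("reference.fasta", "input.vcf.gz", "output.vcf.gz")
print(" ".join(cmd))
```

<p dir="auto">A stricter run would then only need something like variant_filtration_cmd(ref, vcf_in, vcf_out, SnpFilterParams(qd_min=6.0)), leaving the other thresholds at their defaults.</p>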
]]></description><link>http://an.forum.genostack.com/post/1681</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1681</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 14 Jul 2022 03:19:38 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Thu, 14 Jul 2022 02:48:37 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants" rel="nofollow ugc">https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants</a><br />
VCF filtering</p>
]]></description><link>http://an.forum.genostack.com/post/1680</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1680</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 14 Jul 2022 02:48:37 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Thu, 14 Jul 2022 01:44:38 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr12-5-Variant_calling_joint_genotyping.pdf" rel="nofollow ugc">https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATKwr12-5-Variant_calling_joint_genotyping.pdf</a><br />
<a href="/assets/uploads/files/1657763067839-gatkwr12-5-variant_calling_joint_genotyping.pdf">GATKwr12-5-Variant_calling_joint_genotyping.pdf</a><br />
An explanation of GVCF</p>
]]></description><link>http://an.forum.genostack.com/post/1679</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1679</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 14 Jul 2022 01:44:38 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Wed, 13 Jul 2022 11:12:51 GMT]]></title><description><![CDATA[<p dir="auto"><a href="/assets/uploads/files/1657710770798-gatk_discovery_tutorial-worksheet-aus2016.pdf">GATK_Discovery_Tutorial-Worksheet-AUS2016.pdf</a></p>
]]></description><link>http://an.forum.genostack.com/post/1678</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1678</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 13 Jul 2022 11:12:51 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Sat, 02 Jul 2022 16:18:59 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://paleomix.readthedocs.io/en/stable/other_tools.html#paleomix-rmdup-collapsed" rel="nofollow ugc">https://paleomix.readthedocs.io/en/stable/other_tools.html#paleomix-rmdup-collapsed</a><br />
Removing duplicates among collapsed (merged) reads</p>
<p dir="auto"><a href="https://www.biostars.org/p/347514/" rel="nofollow ugc">https://www.biostars.org/p/347514/</a><br />
Deduplicate first, then merge</p>
]]></description><link>http://an.forum.genostack.com/post/1646</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1646</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 02 Jul 2022 16:18:59 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Sat, 02 Jul 2022 16:12:12 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://wikis.univ-lille.fr/bilille/_media/ngs2019_dna_duplicates.pdf" rel="nofollow ugc">https://wikis.univ-lille.fr/bilille/_media/ngs2019_dna_duplicates.pdf</a><br />
<img src="/assets/uploads/files/1656778330753-d1d000ef-2dfb-491c-870e-9436c0646304-image.png" alt="d1d000ef-2dfb-491c-870e-9436c0646304-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/1645</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1645</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 02 Jul 2022 16:12:12 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Sat, 02 Jul 2022 16:08:32 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://yulijia.net/slides/bioinfomatcis_for_medical_students/2019-07-31-A_beginners_guide_to_Call_SNPs_and_indels_Part_II.html#1" rel="nofollow ugc">https://yulijia.net/slides/bioinfomatcis_for_medical_students/2019-07-31-A_beginners_guide_to_Call_SNPs_and_indels_Part_II.html#1</a><br />
<img src="/assets/uploads/files/1656778107567-201d70bf-c895-4ca6-9b74-f244472b1cbe-image.png" alt="201d70bf-c895-4ca6-9b74-f244472b1cbe-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/1644</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1644</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 02 Jul 2022 16:08:32 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Thu, 30 Jun 2022 08:31:33 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATK_Discovery_Tutorial-Worksheet-AUS2016.pdf" rel="nofollow ugc">https://qcb.ucla.edu/wp-content/uploads/sites/14/2016/03/GATK_Discovery_Tutorial-Worksheet-AUS2016.pdf</a></p>
<p dir="auto">A GATK guide that explains things in considerable detail.</p>
]]></description><link>http://an.forum.genostack.com/post/1627</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1627</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 30 Jun 2022 08:31:33 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Wed, 29 Jun 2022 09:39:34 GMT]]></title><description><![CDATA[<p dir="auto">GATK workflow notes</p>
<h3>Reference sequence preparation:</h3>
<h4>1. Build the index with bwa index</h4>
<pre><code>Image: 11918067/genomes-in-the-cloud:2.4.2-1552931386
Command: /usr/gitc/bwa index *.fa
</code></pre>
<h4>2. Create the .dict file with Picard</h4>
<pre><code>Image: broadinstitute/picard:2.27.3
Command: java -jar /usr/picard/picard.jar CreateSequenceDictionary R=Nitab-v4.5_genome_Chr_Edwards2017.fasta O=Nitab-v4.5_genome_Chr_Edwards2017.fasta.dict
</code></pre>
<h4>3. Create the .fai file with samtools</h4>
<pre><code>Command: samtools faidx Nitab-v4.5_genome_Chr_Edwards2017.fasta
Image: quay.io/biocontainers/samtools:1.15.1--h1170115_0
</code></pre>
<h4>4. SnpEff reference database:</h4>
<pre><code>Command: snp_build -name ${Name} -ann ${Gff} -fa ${Fa}
Image: docker: "anneng01:8090/library/angs_snpeff:1.0.0"
Output path: result
</code></pre>
<h3>Notes</h3>
<p dir="auto">The VCF passed to SnpEff, along with its index file, must be gzip-compressed, for example: reference.vcf.gz</p>
<h3>WDL workflow</h3>
<p dir="auto"><a href="/assets/uploads/files/1656494883519-gatk_merge.wdl">gatk_merge.wdl</a></p>
<h3>Test data</h3>
<pre><code>{
    "GATK.fastq_1":"/ceph_disk3/file_server/tmp/lanzhou/data/H06HDADXX130110.1.ATCACGAT.20k_reads_1.fastq",
    "GATK.fastq_2":"/ceph_disk3/file_server/tmp/lanzhou/data/H06HDADXX130110.1.ATCACGAT.20k_reads_2.fastq",
    "GATK.ref_fasta":"/ceph_disk2/data/hongyuan/mnt/data/public_data/tobacco_1656467890/Nitab-v4.5_genome_Chr_Edwards2017.fasta",
    "GATK.snp_ref":"/ceph_disk2/data/hongyuan/mnt/data/public_data/tobacco_1656467890/result"
}
</code></pre>
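<p dir="auto">Before submitting the inputs above, it can help to confirm that steps 1 to 3 actually left their output next to the reference FASTA. The check below is a sketch under stated assumptions: the helper names are hypothetical, bwa index is assumed to write its usual five companion files (.amb, .ann, .bwt, .pac, .sa), and the dictionary is assumed to be named FASTA.dict as in step 2.</p>

```python
from pathlib import Path

# Suffixes that bwa index appends to the FASTA file name.
BWA_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def expected_reference_files(fasta):
    """List the companion files steps 1 to 3 should leave beside the FASTA."""
    names = [fasta + suffix for suffix in BWA_SUFFIXES]  # step 1: bwa index
    names.append(fasta + ".dict")  # step 2: CreateSequenceDictionary
    names.append(fasta + ".fai")   # step 3: samtools faidx
    return names


def missing_reference_files(fasta):
    """Return the expected companion files that do not exist yet."""
    return [name for name in expected_reference_files(fasta)
            if not Path(name).exists()]


# Prints whichever companion files are absent in the current directory.
print(missing_reference_files("Nitab-v4.5_genome_Chr_Edwards2017.fasta"))
```

<p dir="auto">An empty list means the reference looks ready; anything else names the preparation step that still has to run.</p>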
]]></description><link>http://an.forum.genostack.com/post/1616</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1616</guid><dc:creator><![CDATA[zhanglu]]></dc:creator><pubDate>Wed, 29 Jun 2022 09:39:34 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 27 Jun 2022 07:02:11 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://eriqande.github.io/eca-bioinf-handbook/" rel="nofollow ugc">https://eriqande.github.io/eca-bioinf-handbook/</a></p>
]]></description><link>http://an.forum.genostack.com/post/1590</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1590</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 27 Jun 2022 07:02:11 GMT</pubDate></item><item><title><![CDATA[Reply to GATK on Mon, 27 Jun 2022 06:59:51 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://gatk.broadinstitute.org/hc/en-us/community/posts/360066050311-BQSR-bootstrapping-for-multiple-sample-dataset-with-no-known-variants-non-human-" rel="nofollow ugc">https://gatk.broadinstitute.org/hc/en-us/community/posts/360066050311-BQSR-bootstrapping-for-multiple-sample-dataset-with-no-known-variants-non-human-</a></p>
<p dir="auto">Thanks for the follow up question on this post so that we can address it!</p>
<p dir="auto">We don't have any current BQSR bootstrapping methods or recommendations for when there is no known sites file.</p>
<p dir="auto">If you don't have a known sites file, you can still use GATK. Just skip the BQSR step and use hard filtering instead of VQSR. It's more ideal to be able to use the BQSR and VQSR machine learning steps, but it's not possible if you don't have a known sites file.</p>
<p dir="auto">Hope this helps!</p>
<p dir="auto">Genevieve</p>
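<p dir="auto">GATK's BQSR method depends on known sites such as dbSNP, so it is mainly useful for well-studied model organisms such as humans. For other species, the BQSR step can be dropped.</p>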
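<p dir="auto">The advice above amounts to a simple branch in the workflow. The sketch below is a deliberate simplification: plan_steps is a hypothetical helper, the step names are GATK4 tool names, and a single known-sites file stands in for the resources that BQSR and VQSR need (in practice VQSR needs its own training resources).</p>

```python
def plan_steps(known_sites=None):
    """Pick the workflow steps depending on whether known sites exist."""
    steps = []
    if known_sites:
        # BQSR is only possible with a known-sites file.
        steps += ["BaseRecalibrator", "ApplyBQSR"]
    steps.append("HaplotypeCaller")
    if known_sites:
        steps += ["VariantRecalibrator", "ApplyVQSR"]
    else:
        # No known sites: skip BQSR and fall back to hard filtering.
        steps.append("VariantFiltration")
    return steps


print(plan_steps())                # species without known sites
print(plan_steps("dbsnp.vcf.gz"))  # hypothetical known-sites file
```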
]]></description><link>http://an.forum.genostack.com/post/1589</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1589</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 27 Jun 2022 06:59:51 GMT</pubDate></item></channel></rss>