<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[新冠病毒数据分析]]></title><description><![CDATA[<p dir="auto">nCoV-2019 novel coronavirus bioinformatics protocol<br />
<a href="https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html" rel="nofollow ugc">https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html</a><br />
1. Hardware<br />
ARTIC targets the MinION; the recommended configuration is:<br />
Intel i7 or Xeon processor, 16 GB RAM, 1 TB SSD hard drive, USB 3<br />
This is roughly the configuration of a mainstream laptop.</p>
<p dir="auto">2. Software environment<br />
2.1 Operating system<br />
64-bit UNIX, Linux or similar environment<br />
Mac OS X (Yosemite or later), Linux (e.g., Ubuntu 16 or later), or Windows 10 Subsystem for Linux.</p>
<p dir="auto">2.2 Application environment<br />
<a href="https://conda.io/docs/user-guide/install/" rel="nofollow ugc">https://conda.io/docs/user-guide/install/</a><br />
64-bit Python 3.6 version of Miniconda</p>
<p dir="auto">2.3 Installing artic<br />
git clone --recursive <a href="https://github.com/artic-network/artic-ncov2019.git" rel="nofollow ugc">https://github.com/artic-network/artic-ncov2019.git</a><br />
conda env create -f artic-ncov2019/environment.yml<br />
conda activate artic-ncov2019<br />
To deactivate: conda deactivate<br />
To remove: conda remove --name artic-ncov2019 --all</p>
<p dir="auto">Updating artic<br />
cd artic-ncov2019<br />
conda env remove -n artic-ncov2019<br />
conda env create -f environment.yml</p>
<p dir="auto">2.4 Running the analysis<br />
Create the analysis directory:<br />
mkdir analysis<br />
cd analysis<br />
mkdir run_name<br />
cd run_name<br />
2.4.1 Activate the environment<br />
source activate artic-ncov2019<br />
2.4.2 Basecalling<br />
Fast mode:<br />
guppy_basecaller -c dna_r9.4.1_450bps_fast.cfg -i /path/to/reads -s run_name -x auto -r<br />
High-accuracy mode:<br />
guppy_basecaller -c dna_r9.4.1_450bps_hac.cfg -i /path/to/reads -s run_name -x auto -r<br />
2.4.3 Demultiplexing<br />
guppy_barcoder --require_barcodes_both_ends -i run_name -s output_directory --arrangements_files "barcode_arrs_nb12.cfg barcode_arrs_nb24.cfg"<br />
--require_barcodes_both_ends: Reads will only be classified if there is a barcode above the min_score at both ends of the read.<br />
-i: Path to input fastq files.<br />
-s: Path to save fastq files.<br />
--arrangements_files: Files containing arrangements.</p>
<p dir="auto">2.4.4 Filtering<br />
Because chimeric reads are present, the reads need to be filtered by length.<br />
artic guppyplex --min-length 400 --max-length 700 --directory output_directory/barcode03 --prefix run_name</p>
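<p dir="auto">To make the length filter concrete, here is a minimal sketch of the idea in plain shell and awk. It is not how artic guppyplex is implemented; it only shows what keeping reads between a minimum and maximum length means. The function name is made up for illustration, and it assumes a simple 4-line-per-record FASTQ on stdin.</p>

```shell
# Illustrative only: keep FASTQ records whose read length lies within
# [min_len, max_len]. Assumes exactly 4 lines per record on stdin.
# filter_fastq_by_length is a hypothetical helper, not part of artic.
filter_fastq_by_length() {
  local min_len="$1" max_len="$2"
  awk -v min="$min_len" -v max="$max_len" '
    { rec[(NR - 1) % 4] = $0 }          # buffer the 4 lines of the current record
    NR % 4 == 0 {                       # a complete record has been read
      n = length(rec[1])                # rec[1] is the sequence line
      if (n >= min) {
        if (max >= n) print rec[0] "\n" rec[1] "\n" rec[2] "\n" rec[3]
      }
    }'
}
```

<p dir="auto">Usage (file names are placeholders): cat reads.fastq | filter_fastq_by_length 400 700. Reads much longer than a single amplicon are the likely chimeras, which is why an upper bound is applied as well as a lower one.</p>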
<p dir="auto">2.4.5 Analysis<br />
artic minion --normalise 200 --threads 4 --scheme-directory ~/artic-ncov2019/primer_schemes --read-file run_name_barcode03.fastq --fast5-directory path_to_fast5 --sequencing-summary path_to_sequencing_summary.txt nCoV-2019/V3 samplename<br />
This produces the following files:<br />
samplename.rg.primertrimmed.bam - BAM file for visualisation after primer-binding site trimming<br />
samplename.trimmed.bam - BAM file with the primers left on (used in variant calling)<br />
samplename.merged.vcf - all detected variants in VCF format<br />
samplename.pass.vcf - detected variants in VCF format passing quality filter<br />
samplename.fail.vcf - detected variants in VCF format failing quality filter<br />
samplename.primers.vcf - detected variants falling in primer-binding regions<br />
samplename.consensus.fasta - consensus sequence</p>
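<p dir="auto">A quick sanity check on the consensus file is to count how many bases are N, since low-coverage positions are masked in the consensus. The helper below is a sketch under that assumption; consensus_n_stats is a made-up name, and it simply sums over all sequence lines of a FASTA file given as an argument or on stdin.</p>

```shell
# Sketch: report total length and number of N bases in a FASTA file.
# consensus_n_stats is a hypothetical helper, not part of artic.
consensus_n_stats() {
  awk '
    /^>/ { next }                        # skip FASTA header lines
    {
      seq = toupper($0)
      total += length(seq)
      n_count += gsub(/N/, "", seq)      # gsub returns the number of replacements
    }
    END { printf "length=%d N=%d\n", total, n_count }
  ' "$@"
}
```

<p dir="auto">Usage: consensus_n_stats samplename.consensus.fasta. A large N count suggests that many amplicons dropped out or had low depth.</p>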
]]></description><link>http://an.forum.genostack.com/topic/100/新冠病毒数据分析</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 10:59:10 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/100.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 30 Oct 2020 11:19:11 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to 新冠病毒数据分析 on Wed, 20 Apr 2022 11:02:26 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/sars-cov-2-variant-discovery/tutorial.html" rel="nofollow ugc">https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/sars-cov-2-variant-discovery/tutorial.html</a><br />
<img src="/assets/uploads/files/1650452506213-3d45b431-f0f2-4fb8-8097-95ef568404a1-image-resized.png" alt="3d45b431-f0f2-4fb8-8097-95ef568404a1-image.png" class=" img-responsive img-markdown" /></p>
<p dir="auto"><img src="/assets/uploads/files/1650452531584-b3923b0d-86a8-4cf0-a330-7faf0d318bfb-image.png" alt="b3923b0d-86a8-4cf0-a330-7faf0d318bfb-image.png" class=" img-responsive img-markdown" /></p>
<p dir="auto"><img src="/assets/uploads/files/1650452544657-8e618714-840f-4f27-985a-50420fdd0571-image.png" alt="8e618714-840f-4f27-985a-50420fdd0571-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/1402</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1402</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 20 Apr 2022 11:02:26 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Wed, 06 Apr 2022 11:54:35 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://bugseq.com/demo/metagenomic" rel="nofollow ugc">https://bugseq.com/demo/metagenomic</a><br />
<img src="/assets/uploads/files/1649246073368-58413ace-956d-41e9-a1e6-a55667bea09a-image.png" alt="58413ace-956d-41e9-a1e6-a55667bea09a-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/1365</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1365</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 06 Apr 2022 11:54:35 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Fri, 01 Apr 2022 02:59:54 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://artic.readthedocs.io/en/latest/primer-schemes/" rel="nofollow ugc">https://artic.readthedocs.io/en/latest/primer-schemes/</a><br />
ARTIC primer design<br />
<img src="/assets/uploads/files/1648538640061-8a721750-2510-4f0c-8cf0-dd8ff18b6f56-image.png" alt="8a721750-2510-4f0c-8cf0-dd8ff18b6f56-image.png" class=" img-responsive img-markdown" /><br />
This figure comes from the Nature Protocols paper below, which describes the multiplex PCR method in detail.<br />
<a href="https://www.nature.com/articles/nprot.2017.066" rel="nofollow ugc">https://www.nature.com/articles/nprot.2017.066</a><br />
Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples<br />
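The paper notes that this method is mainly intended for clinical samples, where the viral load is very low for metagenomic sequencing:<br />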
genome sequencing directly from clinical samples (i.e., without isolation and culture) remains challenging for viruses such as Zika, for which metagenomic sequencing methods may generate insufficient numbers of viral reads.<br />
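The paper also mentions an online primer design tool (primer design is a key step of the experiment; we could build this kind of tool into our system, or even turn it into an app).<br />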
<img src="/assets/uploads/files/1648781911627-5d442beb-a955-439a-b12d-4def69ab3642-image.png" alt="5d442beb-a955-439a-b12d-4def69ab3642-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/1344</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1344</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Fri, 01 Apr 2022 02:59:54 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Tue, 29 Mar 2022 07:09:40 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://dockstore.org/organizations/BroadInstitute/collections/pgs" rel="nofollow ugc">https://dockstore.org/organizations/BroadInstitute/collections/pgs</a></p>
]]></description><link>http://an.forum.genostack.com/post/1343</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1343</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Tue, 29 Mar 2022 07:09:40 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Fri, 01 Apr 2022 02:21:02 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://terra.bio/workflow-updates-to-the-covid-19-workspace-better-viral-assembly-and-phylogenetics-with-nextstrain/" rel="nofollow ugc">https://terra.bio/workflow-updates-to-the-covid-19-workspace-better-viral-assembly-and-phylogenetics-with-nextstrain/</a><br />
Terra's notes on updates to its SARS-CoV-2 workflows<br />
Public workflow workspace:<br />
<a href="https://app.terra.bio/#workspaces/pathogen-genomic-surveillance/COVID-19" rel="nofollow ugc">https://app.terra.bio/#workspaces/pathogen-genomic-surveillance/COVID-19</a></p>
<p dir="auto"><a href="https://support.terra.bio/hc/en-us/articles/360041068771" rel="nofollow ugc">https://support.terra.bio/hc/en-us/articles/360041068771</a><br />
<img src="/assets/uploads/files/1648777690021-7192e3a3-c3c9-4efb-bbc5-075ba9f62616-image.png" alt="7192e3a3-c3c9-4efb-bbc5-075ba9f62616-image.png" class=" img-responsive img-markdown" /></p>
<p dir="auto"><img src="/assets/uploads/files/1648777705037-d326298f-067b-44f4-9c3d-26d5f6af6d7f-image.png" alt="d326298f-067b-44f4-9c3d-26d5f6af6d7f-image.png" class=" img-responsive img-markdown" /><br />
<a href="/assets/uploads/files/1648778512456-2020.08.23.20178236v1.full.sars.kraken2.terra.pdf">2020.08.23.20178236v1.full.sars.kraken2.terra.pdf</a></p>
<p dir="auto">This paper mentions using Kraken2 to detect other viruses, mainly to rule out co-infection of SARS-CoV-2 with other viruses.</p>
<pre><code>We used Kraken2 (46) to identify other viral taxa present in NP swab samples from COVID
positive patients, excluding those removed by filters i and ii described above. To do so, we ran
the classify_single workflow on all reads from all samples (with
kraken2_db_tgz="gs://pathogen-public-dbs/v1/kraken2-broad-20200505.tar.zst",
krona_taxonomy_db_kraken2_tgz="gs://pathogen-public-dbs/v1/krona.taxonomy-20200505.tab.zst",
ncbi_taxdump_tgz="gs://pathogen-public-dbs/v1/taxdump-20200505.tar.gz",
trim_clip_db="gs://pathogen-public-dbs/v0/contaminants.clip_db.fasta",
spikein_db="gs://pathogen-public-dbs/v0/ERCC_96_nopolyA.fasta"). Our kraken2 database was
</code></pre>
<p dir="auto">This Terra workflow targets Illumina (second-generation) sequencing data:<br />
<a href="https://app.terra.bio/#workspaces/pathogen-genomic-surveillance/COVID-19_Broad_Viral_NGS" rel="nofollow ugc">https://app.terra.bio/#workspaces/pathogen-genomic-surveillance/COVID-19_Broad_Viral_NGS</a></p>
]]></description><link>http://an.forum.genostack.com/post/1342</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1342</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Fri, 01 Apr 2022 02:21:02 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Sat, 26 Mar 2022 08:48:07 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.nature.com/articles/s41467-020-20075-6" rel="nofollow ugc">https://www.nature.com/articles/s41467-020-20075-6</a></p>
]]></description><link>http://an.forum.genostack.com/post/1329</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1329</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 26 Mar 2022 08:48:07 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Tue, 22 Mar 2022 01:32:25 GMT]]></title><description><![CDATA[<p dir="auto">SARS-CoV-2 naming (nomenclature)<br />
<a href="https://covariants.org/" rel="nofollow ugc">https://covariants.org/</a><br />
<a href="https://cov-lineages.org/" rel="nofollow ugc">https://cov-lineages.org/</a> (the official Pango site)</p>
]]></description><link>http://an.forum.genostack.com/post/1294</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1294</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Tue, 22 Mar 2022 01:32:25 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Tue, 15 Mar 2022 05:51:12 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.mdpi.com/2075-1729/12/1/69/htm" rel="nofollow ugc">https://www.mdpi.com/2075-1729/12/1/69/htm</a><br />
Direct RNA Nanopore Sequencing of SARS-CoV-2 Extracted from Critical Material from Swabs<br />
Detecting SARS-CoV-2 directly with Nanopore direct RNA sequencing.</p>
<ul>
<li>
<p dir="auto">basecalling<br />
Nanopore Guppy base caller (v3.4.4) tool<br />
“flow cell = FLO-MIN106” and “kit = SQK-RNA002”.</p>
</li>
<li>
<p dir="auto">Quality control<br />
PycoQC (v2.5.0.21) software</p>
</li>
<li>
<p dir="auto">Filtering<br />
NanoFilt (v2.7.0)<br />
minimum read length ≥500 nt and read quality ≥8.</p>
</li>
<li>
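<p dir="auto">Removing host and other microbial reads (decontamination)<br />
Reads are aligned against the human GRCh38 (hg38), fungal, and bacterial genomes so they can be removed<br />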
minimap2 (v2.17–r941)</p>
</li>
<li>
<p dir="auto">Extracting SARS-CoV-2 reads<br />
samtools (v1.7) view: extract unmapped reads and reads with mapping quality lower than 10</p>
</li>
<li>
<p dir="auto">Alignment and variant calling<br />
minimap2<br />
BCFtools mpileup / call<br />
Variants are inspected with Integrative Genomics Viewer (IGV) (v2.8.2)</p>
</li>
</ul>
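<p dir="auto">The extraction step above (unmapped reads plus reads with mapping quality below 10) can be sketched directly on SAM text. This is my reading of the step, not the paper's actual command line: it assumes SAM records arrive on stdin (for example from samtools view -h), and keep_unmapped_or_low_mapq is a made-up helper name.</p>

```shell
# Sketch: from SAM text on stdin, keep header lines plus records that are
# unmapped (FLAG bit 0x4 set) or whose MAPQ is below the cutoff (default 10).
# keep_unmapped_or_low_mapq is a hypothetical helper for illustration.
keep_unmapped_or_low_mapq() {
  local cutoff="${1:-10}"
  awk -F '\t' -v cutoff="$cutoff" '
    /^@/ { print; next }                        # pass SAM header lines through
    {
      unmapped = int($2 / 4) % 2                # column 2 is FLAG; bit 0x4 = unmapped
      if (unmapped == 1 || cutoff > $5) print   # column 5 is MAPQ
    }'
}
```

<p dir="auto">Usage (file name is a placeholder): samtools view -h host_aligned.bam | keep_unmapped_or_low_mapq 10. The reads that survive are the ones not confidently assigned to the host or contaminant references.</p>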
]]></description><link>http://an.forum.genostack.com/post/1262</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1262</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Tue, 15 Mar 2022 05:51:12 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Sat, 05 Mar 2022 16:02:48 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.cdc.gov/amd/pdf/slidesets/toolkitmodule_3.5-508c.pdf" rel="nofollow ugc">https://www.cdc.gov/amd/pdf/slidesets/toolkitmodule_3.5-508c.pdf</a><br />
SARS-CoV-2 sequence repositories: GISAID and NCBI.</p>
]]></description><link>http://an.forum.genostack.com/post/1240</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1240</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 05 Mar 2022 16:02:48 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Sun, 06 Mar 2022 11:47:14 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.frontiersin.org/articles/10.3389/fmicb.2021.665041/full" rel="nofollow ugc">https://www.frontiersin.org/articles/10.3389/fmicb.2021.665041/full</a><br />
A Novel SARS-CoV-2 Viral Sequence Bioinformatic Pipeline Has Found Genetic Evidence That the Viral 3′ Untranslated Region (UTR) Is Evolving and Generating Increased Viral Diversity</p>
<p dir="auto">This paper analyzes mutations in the UTR regions of SARS-CoV-2.<br />
<img src="/assets/uploads/files/1646450271095-c553f6b4-1c84-47c6-87a3-bcc14e818150-image-resized.png" alt="c553f6b4-1c84-47c6-87a3-bcc14e818150-image.png" class=" img-responsive img-markdown" /></p>
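<p dir="auto">The paper's supplementary material includes a table of SRR accession numbers covering a large number of sequences; we need to support this kind of bulk data download.</p>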
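<p dir="auto">One possible shape for such a bulk download, sketched with the sra-tools commands prefetch and fasterq-dump. This is only an outline under assumptions: the accession list is a plain text file with one SRR number per line, download_srr_list and the default output directory are made-up names, and setting DRY_RUN=1 only prints the commands instead of running them.</p>

```shell
# Sketch: fetch every SRR accession listed in a text file (one per line).
# download_srr_list and the sra_fastq default are hypothetical names.
# Set DRY_RUN=1 to print the commands without running them.
download_srr_list() {
  local list_file="$1" out_dir="${2:-sra_fastq}"
  cat "$list_file" | tr -d '\r' | grep -E '^SRR[0-9]+$' |
  while IFS= read -r acc; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "prefetch $acc"
      echo "fasterq-dump $acc --outdir $out_dir"
    else
      prefetch "$acc" || continue            # skip accessions that fail to download
      fasterq-dump "$acc" --outdir "$out_dir"
    fi
  done
}
```

<p dir="auto">Usage (file name is a placeholder): DRY_RUN=1 download_srr_list srr_list.txt. Lines that do not look like SRR accessions, such as a header row or blank lines, are ignored.</p>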
]]></description><link>http://an.forum.genostack.com/post/1231</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1231</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sun, 06 Mar 2022 11:47:14 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Fri, 04 Mar 2022 03:59:44 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015676/" rel="nofollow ugc">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4015676/</a><br />
A comparison of multiple sequence alignment software.</p>
]]></description><link>http://an.forum.genostack.com/post/1226</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1226</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Fri, 04 Mar 2022 03:59:44 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Thu, 03 Mar 2022 09:49:36 GMT]]></title><description><![CDATA[<p dir="auto"><a href="http://covid19.sfb.uit.no" rel="nofollow ugc">covid19.sfb.uit.no</a><br />
A SARS-CoV-2 database.</p>
]]></description><link>http://an.forum.genostack.com/post/1222</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1222</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 03 Mar 2022 09:49:36 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Thu, 03 Mar 2022 08:13:46 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7929396/" rel="nofollow ugc">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7929396/</a><br />
Bioinformatics resources for SARS-CoV-2 discovery and surveillance<br />
Several experimental approaches:<br />
<img src="/assets/uploads/files/1646291301524-6ac373b5-27c1-4674-b73a-8e083137b005-image.png" alt="6ac373b5-27c1-4674-b73a-8e083137b005-image.png" class=" img-responsive img-markdown" /><br />
The workflow of different NGS sequencing approaches currently available for virus discovery and genomic surveillance. The library construction scheme employed in (A) metatranscriptomic sequencing, (B) a hybrid capture-based approach based on a metatranscriptomic library, (C) multiplex PCR amplification for NGS platforms and (D) the Oxford Nanopore sequencing platform.<br />
The basic process and tools for novel virus discovery:<br />
<img src="/assets/uploads/files/1646291724286-a9c9d66c-b37a-4cd5-b221-d1929cc3c715-image.png" alt="a9c9d66c-b37a-4cd5-b221-d1929cc3c715-image.png" class=" img-responsive img-markdown" /><br />
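For the host-removal step, the paper mentions that rRNA should be removed, and also that host removal can be skipped when the viral load is low.</p>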
]]></description><link>http://an.forum.genostack.com/post/1218</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1218</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 03 Mar 2022 08:13:46 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Thu, 03 Mar 2022 06:05:20 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://pubmed.ncbi.nlm.nih.gov/31978945/" rel="nofollow ugc">https://pubmed.ncbi.nlm.nih.gov/31978945/</a><br />
A Novel Coronavirus from Patients with Pneumonia in China, 2019<br />
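This paper mentions that the first few samples from Wuhan were sequenced with a mix of second-generation and third-generation sequencing.</p>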
]]></description><link>http://an.forum.genostack.com/post/1217</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1217</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 03 Mar 2022 06:05:20 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Thu, 03 Mar 2022 03:29:42 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://sci-hub.st/10.3390/cimb43020061" rel="nofollow ugc">https://sci-hub.st/10.3390/cimb43020061</a><br />
Next-Generation Sequencing (NGS) in COVID-19: A Tool for SARS-CoV-2 Diagnosis, Monitoring New Strains and Phylodynamic Modeling in Molecular Epidemiology</p>
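<p dir="auto">This paper describes the situation in Wuhan at the time: five samples were processed by China CDC and four by BGI.<br />
<strong>BGI aligned the reads against the human hg19 genome with bwa to remove host reads, then aligned the rest to coronavirus sequences from NCBI (I do not yet know exactly which sequences), and then used SPAdes to build a consensus sequence.</strong><br />
China CDC used <strong>CLCBio</strong> (the competing product I have been studying, CLC Workbench): software version 11.0.1 was used for de novo assembly, variant calling, and alignment.<br />
With these assembly results in hand, phylogenetic analysis can be carried out.</p>
<p dir="auto">The paper reviews COVID-19 NGS experiments and bioinformatics; some of the papers it cites are worth reading:<br />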
<strong>1.<a href="https://pubmed.ncbi.nlm.nih.gov/29154853/" rel="nofollow ugc">https://pubmed.ncbi.nlm.nih.gov/29154853/</a></strong><br />
Standards and Guidelines for Validating Next-Generation Sequencing Bioinformatics Pipelines: A Joint Recommendation of the Association for Molecular Pathology and the College of American Pathologists<br />
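This paper summarizes an expert consensus and puts forward 17 best-practice consensus recommendations for validating clinical NGS bioinformatics pipelines.<br />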
<img src="/assets/uploads/files/1646216559483-470f8ff7-4d5e-4bc7-b759-344216c77ed1-image.png" alt="470f8ff7-4d5e-4bc7-b759-344216c77ed1-image.png" class=" img-responsive img-markdown" /><br />
Terminology and descriptions:<br />
<strong>BAM</strong>: Compressed (binary) format of a SAM file intended for faster random access (search) of aligned and unaligned sequences and its related metadata. Compression enables smaller file size and storage efficiency which makes it popular over the SAM format. This is a default output of many alignment and post-alignment softwares used in bioinformatics pipeline and commonly used as an input by many variant callers.<br />
<strong>FASTQ</strong>: It is a de facto, human readable, file format that stores nucleotide sequences and corresponding quality (PHRED) scores for each nucleotide as an ASCII encoded character.<sup>4</sup> This is commonly used for storing unaligned short sequence reads after the steps of base calling and is a typical starting point for NGS bioinformatics pipeline.<br />
<strong>PHRED Score</strong>: It is a per base (nucleotide) quality score that is defined as an estimated probability for a called base to be incorrect (erroneous call). Mathematically, it is expressed as<sup>4</sup><br />
Q = -10 × log<sub>10</sub>(Pe)</p>
<p dir="auto">where Q is Phred quality score, Pe is the probability for an erroneous base call. The Pe is typically generated by the base calling software which is sequence instrument specific. Therefore, Q values in isolation cannot be used to compare sequence quality across different sequencing platforms.<br />
<strong>SAM</strong>: Stands for Sequence Alignment/Map format. It is a human readable (text file) file format specification for storing information on aligned sequence. This is a default output of many alignment softwares used in bioinformatics pipeline. Given the large file size and slower random access, BAM format is preferred for routine bioinformatics data processing. This format is helpful for technical troubleshooting when manual review of the stored information is necessary.<br />
<strong>Variant - horizontally complex</strong>: When two or more sequence alterations are present on the same read in close proximity such that they may represent a single complex variant. These variants are frequently represented as deletion-insertions and may result in ambiguous sequence description or HGVS nomenclature.<br />
<strong>Variant - Left-aligned</strong>: If there are multiple potential VCF entries of the same allele length that represent the same variant, then left-alignment refers to the VCF entry with the smallest base position. The base position is typically represented in genomic coordinate for a given primary assembly (eg, GRCh38) and represents the most 5’ position.<sup>5</sup><br />
<strong>Variant - Normalized</strong>: A normalized variant must be parsimonious as well as left-aligned.<sup>5</sup><br />
<strong>Variant - Parsimony</strong>: If there are more than one way to represent the same variant in a VCF file, parsimony refers to the representation with the shortest possible allele length<sup>5</sup> (positive and non-zero length).<br />
<strong>Variant - vertically complex</strong>: A vertically complex variant occurs when three or more alleles are represented by different sequence reads, typically with or uncommonly without a reference (normal) allele, at the same genomic coordinate or set of coordinates.<br />
<strong>VCF</strong>: Variant Call Format is a versioned, text-file (human readable) specification for storing sequence variant calls. The file contains meta-information containing various details of the variant calling process and definition of headers and format tags, a header line and data lines. Each data line represents a sequence variant defined using a combination of chromosome, position, reference allele, and alternate allele.</p>
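<p dir="auto">For reference, the standard Phred relationship between the quality score Q and the error probability Pe is Q = -10 × log10(Pe), or equivalently Pe = 10^(-Q/10). A small sketch of the conversion in both directions, using awk for the arithmetic; the two function names are made up for illustration.</p>

```shell
# Sketch of the Phred relationship: Q = -10 * log10(Pe), Pe = 10^(-Q/10).
# Both helper names are hypothetical.
phred_to_error_prob() {
  awk -v q="$1" 'BEGIN { printf "%.4f\n", 10 ^ (-q / 10) }'
}

error_prob_to_phred() {
  # awk only has the natural logarithm, so divide by log(10)
  awk -v p="$1" 'BEGIN { printf "%.1f\n", -10 * log(p) / log(10) }'
}
```

<p dir="auto">For example, phred_to_error_prob 20 gives 0.0100 (1 error in 100 bases) and phred_to_error_prob 30 gives 0.0010 (1 in 1000).</p>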
]]></description><link>http://an.forum.genostack.com/post/1215</link><guid isPermaLink="true">http://an.forum.genostack.com/post/1215</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 03 Mar 2022 03:29:42 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Fri, 25 Dec 2020 10:24:27 GMT]]></title><description><![CDATA[<p dir="auto">(artic-ncov2019) bioinfo@genostack:~/artic_ncov_2019/analysis/run_name$ artic minion --scheme-directory ~/artic_ncov_2019/artic-ncov2019/primer_schemes/ nCoV-2019/V3 nCov19 --read-file guppy_fastq/all_sars19.fastq --fast5-directory /mnt/sdf1/sars_2019/ --sequencing-summary guppy_fastq/sequencing_summary.txt<br />
<strong>Running</strong>: nanopolish index -s guppy_fastq/sequencing_summary.txt -d /mnt/sdf1/sars_2019/ guppy_fastq/all_sars19.fastq<br />
[readdb] indexing /mnt/sdf1/sars_2019/<br />
[readdb] num reads: 2072359, num reads with path to fast5: 2072359<br />
<strong>Running</strong>: minimap2 -a -x map-ont -t 8 /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.reference.fasta guppy_fastq/all_sars19.fastq | samtools view -bS -F 4 - | samtools sort -o nCov19.sorted.bam -<br />
[M::mm_idx_gen::0.007*1.36] collected minimizers<br />
[M::mm_idx_gen::0.011*2.78] sorted minimizers<br />
[M::main::0.012*2.75] loaded/built the index for 1 target sequence(s)<br />
[M::mm_mapopt_update::0.015*2.37] mid_occ = 3<br />
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1<br />
[M::mm_idx_stat::0.017*2.15] distinct minimizers: 5587 (99.93% are singletons); average occurrences: 1.004; average spacing: 5.332<br />
[M::worker_pipeline::80.064*6.71] mapped 550538 sequences<br />
[M::worker_pipeline::152.050*5.23] mapped 548324 sequences<br />
[M::worker_pipeline::209.036*4.73] mapped 548462 sequences<br />
[M::worker_pipeline::259.840*3.83] mapped 425035 sequences<br />
[M::main] Version: 2.17-r941<br />
[M::main] CMD: minimap2 -a -x map-ont -t 8 /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.reference.fasta guppy_fastq/all_sars19.fastq<br />
[M::main] Real time: 259.915 sec; CPU: 995.230 sec; Peak RSS: 2.570 GB<br />
[bam_sort_core] merging from 3 files and 1 in-memory blocks...<br />
<strong>Running</strong>: samtools index nCov19.sorted.bam<br />
<strong>Running</strong>: align_trim --start --normalise 100 /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.scheme.bed --report nCov19.alignreport.txt &lt; nCov19.sorted.bam 2&gt; nCov19.alignreport.er | samtools sort -T nCov19 - -o nCov19.trimmed.rg.sorted.bam<br />
<strong>Running</strong>: align_trim --normalise 100 /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.scheme.bed --remove-incorrect-pairs --report nCov19.alignreport.txt &lt; nCov19.sorted.bam 2&gt; nCov19.alignreport.er | samtools sort -T nCov19 - -o nCov19.primertrimmed.rg.sorted.bam<br />
<strong>Running</strong>: samtools index nCov19.trimmed.rg.sorted.bam<br />
<strong>Running</strong>: samtools index nCov19.primertrimmed.rg.sorted.bam<br />
<strong>Running</strong>: nanopolish variants --min-flanking-sequence 10 -x 1000000 --progress -t 8 --reads guppy_fastq/all_sars19.fastq -o nCov19.nCoV-2019_1.vcf -b nCov19.trimmed.rg.sorted.bam -g /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_1<br />
[post-run summary] total reads: 6604, unparseable: 0, qc fail: 26, could not calibrate: 22, no alignment: 102, bad fast5: 0<br />
<strong>Running</strong>: nanopolish variants --min-flanking-sequence 10 -x 1000000 --progress -t 8 --reads guppy_fastq/all_sars19.fastq -o nCov19.nCoV-2019_2.vcf -b nCov19.trimmed.rg.sorted.bam -g /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_2<br />
[post-run summary] total reads: 5890, unparseable: 0, qc fail: 24, could not calibrate: 15, no alignment: 86, bad fast5: 0<br />
<strong>Running</strong>: artic_vcf_merge nCov19 /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.scheme.bed nCoV-2019_1:nCov19.nCoV-2019_1.vcf nCoV-2019_2:nCov19.nCoV-2019_2.vcf<br />
<strong>Running</strong>: artic_vcf_filter --nanopolish nCov19.merged.vcf nCov19.pass.vcf nCov19.fail.vcf<br />
<strong>Running</strong>: artic_make_depth_mask --store-rg-depths /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.reference.fasta nCov19.primertrimmed.rg.sorted.bam nCov19.coverage_mask.txt<br />
<strong>Running</strong>: artic_plot_amplicon_depth --primerScheme /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.scheme.bed --sampleID nCov19 --outFilePrefix nCov19 nCov19*.depths<br />
<strong>Running</strong>: bgzip -f nCov19.pass.vcf<br />
<strong>Running</strong>: tabix -p vcf nCov19.pass.vcf.gz<br />
<strong>Running</strong>: artic_mask /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.reference.fasta nCov19.coverage_mask.txt nCov19.fail.vcf nCov19.preconsensus.fasta<br />
<strong>Running</strong>: bcftools consensus -f nCov19.preconsensus.fasta nCov19.pass.vcf.gz -m nCov19.coverage_mask.txt -o nCov19.consensus.fasta<br />
Note: the --sample option not given, applying all records regardless of the genotype<br />
Applied 2 variants<br />
<strong>Running</strong>: artic_fasta_header nCov19.consensus.fasta "nCov19/ARTIC/nanopolish"<br />
<strong>Running</strong>: cat nCov19.consensus.fasta /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.reference.fasta &gt; nCov19.muscle.in.fasta<br />
<strong>Running</strong>: muscle -in nCov19.muscle.in.fasta -out nCov19.muscle.out.fasta</p>
<p dir="auto">MUSCLE v3.8.1551 by Robert C. Edgar</p>
<p dir="auto"><a href="http://www.drive5.com/muscle" rel="nofollow ugc">http://www.drive5.com/muscle</a><br />
This software is donated to the public domain.<br />
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.</p>
<p dir="auto">nCov19.muscle.in 2 seqs, lengths min 29903, max 29903, avg 29903<br />
00:00:00    20 MB(-2%)  Iter   1  100.00%  K-mer dist pass 1<br />
00:00:00    20 MB(-2%)  Iter   1  100.00%  K-mer dist pass 2<br />
00:01:16  1004 MB(-116%)  Iter   1  100.00%  Align node<br />
00:01:16  1004 MB(-116%)  Iter   1  100.00%  Root alignment</p>
<p dir="auto">Step-by-step analysis of the run</p>
<p dir="auto">1.nanopolish index -s guppy_fastq/sequencing_summary.txt -d /mnt/sdf1/sars_2019/ guppy_fastq/all_sars19.fastq<br />
<img src="/assets/uploads/files/1608881450212-b97c42d2-17e1-49f5-910e-3365706f94f4-image.png" alt="b97c42d2-17e1-49f5-910e-3365706f94f4-image.png" class=" img-responsive img-markdown" /><br />
nanopolish needs the raw signal data; the index command links the raw data to the reads produced by guppy.</p>
<p dir="auto">2.minimap2 -a -x map-ont -t 8 /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.reference.fasta guppy_fastq/all_sars19.fastq | samtools view -bS -F 4 - | samtools sort -o nCov19.sorted.bam -<br />
samtools index nCov19.sorted.bam<br />
Align the reads to the reference sequence and produce a BAM file. bwa mem can also be used for this alignment step.</p>
<p dir="auto">3.align_trim --start --normalise 100 /home/bioinfo/artic_ncov_2019/artic-ncov2019/primer_schemes//nCoV-2019/V3/nCoV-2019.scheme.bed --report nCov19.alignreport.txt &lt; nCov19.sorted.bam 2&gt; nCov19.alignreport.er | samtools sort -T nCov19 - -o nCov19.trimmed.rg.sorted.bam<br />
Purpose: this step depends heavily on knowledge of the primer scheme; the related paper and tool can be found via the link below.<br />
<a href="https://primalscheme.com/" rel="nofollow ugc">https://primalscheme.com/</a><br />
The purpose of alignment post-processing is:<br />
assign each read alignment to a derived amplicon<br />
using the derived amplicon, assign each read a read group based on the primer pool<br />
softmask read alignments within their derived amplicon</p>
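<p dir="auto">To illustrate what "derived amplicon" and "primer pool" mean here, the sketch below rebuilds amplicon spans from a primer scheme BED file. It is not the align_trim code. It assumes tab-separated columns of chromosome, start, end, primer name, and pool, and primer names of the form scheme_number_LEFT or scheme_number_RIGHT (optionally with an alt suffix); both the column layout and amplicons_from_scheme itself are assumptions for illustration.</p>

```shell
# Sketch: derive one span per amplicon from a primer scheme BED on stdin.
# Assumed columns: chrom, start, end, primer name, pool (tab-separated).
# Assumed primer names: SCHEME_NUMBER_LEFT / SCHEME_NUMBER_RIGHT[_altN].
# amplicons_from_scheme is a hypothetical helper, not part of artic.
amplicons_from_scheme() {
  awk -F '\t' '
    {
      split($4, part, "_")                 # part[2] = amplicon number, part[3] = side
      amp = part[2]; side = part[3]
      pool[amp] = $5
      s = $2 + 0; e = $3 + 0
      # With alternate primers, keep the outermost coordinates.
      if (side == "LEFT")  { if (!(amp in left)  || left[amp] > s)  left[amp] = s }
      if (side == "RIGHT") { if (!(amp in right) || e > right[amp]) right[amp] = e }
    }
    END {
      for (amp in left)
        if (amp in right) print amp "\t" left[amp] "\t" right[amp] "\t" pool[amp]
    }' | sort -n
}
```

<p dir="auto">Usage: cat nCoV-2019.scheme.bed | amplicons_from_scheme prints amplicon number, start, end, and pool. A read alignment can then be assigned to the amplicon whose span contains it, and the pool of that amplicon becomes its read group.</p>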
<p dir="auto">4. Variant calling: the medaka-longshot workflow. In this workflow, Longshot is used in the ARTIC pipeline simply to annotate the VCF with various statistics (like read support for ALTs). The newest medaka v1.0.0 has a tool to perform this operation, so Longshot can be replaced by medaka tools annotate (see medaka tools annotate --help). The steps are:<br />
medaka consensus<br />
medaka variant or snps (if pipeline has been told not to detect INDELS via --no-indels)<br />
medaka tools annotate (if --no-longshot has been selected)<br />
longshot (if --no-longshot not selected)</p>
<p dir="auto">The other variant-calling workflow uses nanopolish, but medaka claims to be 50X faster than Nanopolish (and can run on GPUs).</p>
<p dir="auto">5. Generating the consensus sequence<br />
<img src="/assets/uploads/files/1608891838034-92d670f8-db26-4266-82b4-79fcc0646de2-image.png" alt="92d670f8-db26-4266-82b4-79fcc0646de2-image.png" class=" img-responsive img-markdown" /></p>
]]></description><link>http://an.forum.genostack.com/post/260</link><guid isPermaLink="true">http://an.forum.genostack.com/post/260</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Fri, 25 Dec 2020 10:24:27 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Wed, 23 Dec 2020 11:06:43 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://artic.readthedocs.io/en/latest/minion/" rel="nofollow ugc">https://artic.readthedocs.io/en/latest/minion/</a><br />
<a href="https://artic.readthedocs.io/en/latest/" rel="nofollow ugc">https://artic.readthedocs.io/en/latest/</a></p>
]]></description><link>http://an.forum.genostack.com/post/258</link><guid isPermaLink="true">http://an.forum.genostack.com/post/258</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 23 Dec 2020 11:06:43 GMT</pubDate></item><item><title><![CDATA[Reply to 新冠病毒数据分析 on Mon, 28 Mar 2022 12:02:59 GMT]]></title><description><![CDATA[<p dir="auto">minimap2 -R '@RG\tID:sras19\tPU:nanopore\tSM:sars19\tPL:minion\tLB:sqklsk109' -a -x map-ont -t 24 ~/ref/h1n1/ref.fasta barcode$i.fastq | samtools view -bS -F 4 - | samtools sort -o barcode$i.bam -<br />
samtools index barcode$i.bam<br />
medaka consensus barcode$i.bam barcode$i.hdf<br />
medaka variant ~/ref/h1n1/ref.fasta barcode$i.hdf barcode$i.vcf</p>
<p dir="auto">bgzip barcode$i.vcf<br />
tabix -p vcf barcode$i.vcf.gz</p>
<p dir="auto">longshot -P 0 -F -A --no_haps --bam barcode$i.bam --ref ~/ref/h1n1/ref.fasta --out barcode$i.longshot.vcf --potential_variants barcode$i.vcf.gz<br />
artic_vcf_filter --longshot barcode$i.longshot.vcf barcode$i.pass.vcf barcode$i.fail.vcf<br />
artic_make_depth_mask ~/ref/h1n1/ref.fasta barcode$i.bam barcode$i.coverage_mask.txt</p>
<p dir="auto">bgzip -f barcode$i.pass.vcf<br />
tabix -p vcf barcode$i.pass.vcf.gz</p>
<p dir="auto">artic_mask ~/ref/h1n1/ref.fasta barcode$i.coverage_mask.txt barcode$i.fail.vcf barcode$i.preconsensus.fasta<br />
bcftools consensus -f barcode$i.preconsensus.fasta barcode$i.pass.vcf.gz -m barcode$i.coverage_mask.txt -o barcode$i.consensus.fasta</p>
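<p dir="auto">The depth-mask step in the commands above exists so that positions with too little coverage are masked rather than called. A minimal sketch of that idea, assuming three-column input of chromosome, 1-based position, and depth (the layout printed by samtools depth -a). The threshold of 20 is an assumed default, low_depth_regions is a made-up name, and the real artic_make_depth_mask output format may differ.</p>

```shell
# Sketch: collapse per-position depths (chrom, 1-based pos, depth on stdin)
# into BED-like intervals where depth is below a threshold.
# low_depth_regions and the default of 20 are assumptions for illustration.
low_depth_regions() {
  local min_depth="${1:-20}"
  awk -v min="$min_depth" '
    function emit() { if (in_run) print chrom "\t" start "\t" last; in_run = 0 }
    {
      low = (min > $3)                               # is this position under-covered?
      # Close the open interval if depth recovered, the contig changed,
      # or the positions are not contiguous.
      if (in_run) if (!low || $1 != chrom || $2 != last + 1) emit()
      if (low) {
        if (!in_run) { chrom = $1; start = $2 - 1; in_run = 1 }   # BED start is 0-based
        last = $2
      }
    }
    END { emit() }'
}
```

<p dir="auto">Usage (file name is a placeholder): samtools depth -a barcode01.bam | low_depth_regions 20. Each output line is a half-open interval that would be masked with N in the consensus.</p>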
<p dir="auto"><a href="https://github.com/nanoporetech/medaka/issues/149" rel="nofollow ugc">https://github.com/nanoporetech/medaka/issues/149</a><br />
Longshot only computes some statistics here.</p>
<p dir="auto"><a href="https://artic.readthedocs.io/en/latest/release-notes/" rel="nofollow ugc">https://artic.readthedocs.io/en/latest/release-notes/</a><br />
<strong>Longshot added to Medaka to permit filter VCF on depth</strong><br />
Longshot seems to be used only for some filtering.</p>
<p dir="auto"><a href="https://bedtools.readthedocs.io/en/latest/content/tools/maskfasta.html" rel="nofollow ugc">https://bedtools.readthedocs.io/en/latest/content/tools/maskfasta.html</a><br />
Our workflow does not use the medaka tools annotate step; that step and Longshot are alternatives, so only one of the two is needed.<br />
<a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7480024/" rel="nofollow ugc">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7480024/</a></p>
]]></description><link>http://an.forum.genostack.com/post/207</link><guid isPermaLink="true">http://an.forum.genostack.com/post/207</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 28 Mar 2022 12:02:59 GMT</pubDate></item></channel></rss>