<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Reduced-Representation Genome Analysis]]></title><description><![CDATA[<p dir="auto">1. Biological background<br />
<img src="/assets/uploads/files/1608168936154-160ccc2c-d5ab-46b3-94c5-b0b083f97f78-image.png" alt="160ccc2c-d5ab-46b3-94c5-b0b083f97f78-image.png" class=" img-responsive img-markdown" /><br />
<strong>Restriction enzymes</strong> are DNA-cutting enzymes. Each enzyme recognizes one or a few target sequences and cuts DNA at or near those sequences.<br />
<strong>DNA ligase</strong> is a DNA-joining enzyme. If two pieces of DNA have matching ends, ligase can link them to form a single, unbroken molecule of DNA.<br />
<img src="/assets/uploads/files/1608171936356-e5a0ef80-94fc-400b-b9a8-e2f74f540dd9-image.png" alt="e5a0ef80-94fc-400b-b9a8-e2f74f540dd9-image.png" class=" img-responsive img-markdown" /><br />
<img src="/assets/uploads/files/1608172027370-64a727dc-5c4e-43ce-b3b7-a4267b5b3997-image.png" alt="64a727dc-5c4e-43ce-b3b7-a4267b5b3997-image.png" class=" img-responsive img-markdown" /><br />
Enzymes that leave single-stranded overhangs are said to produce sticky ends. <strong>Sticky ends</strong> are helpful in cloning because they hold two pieces of DNA together so they can be linked by DNA ligase.<br />
Other enzymes are “<strong>blunt cutters</strong>,” which cut straight down the middle of a target sequence and leave no overhang. The restriction enzyme SmaI is an example of a blunt cutter:<br />
<img src="/assets/uploads/files/1608172137918-9b706ca4-000b-4cab-a73c-f22172af4b32-image.png" alt="9b706ca4-000b-4cab-a73c-f22172af4b32-image.png" class=" img-responsive img-markdown" /><br />
<img src="/assets/uploads/files/1608172402162-adffef80-198b-4eff-97b0-67b0594befe4-image.png" alt="adffef80-198b-4eff-97b0-67b0594befe4-image.png" class=" img-responsive img-markdown" /><br />
An example:<br />
<img src="/assets/uploads/files/1608172849422-5e9fbbda-8e8c-477e-9304-1f76faafa453-image.png" alt="5e9fbbda-8e8c-477e-9304-1f76faafa453-image.png" class=" img-responsive img-markdown" /><br />
<img src="/assets/uploads/files/1608172879377-7320c572-4466-4497-acd3-4a1ad2c62168-image.png" alt="7320c572-4466-4497-acd3-4a1ad2c62168-image.png" class=" img-responsive img-markdown" /><br />
<img src="/assets/uploads/files/1608172898948-78f5acd4-506a-4d13-96b9-315a9bcc1b72-image.png" alt="78f5acd4-506a-4d13-96b9-315a9bcc1b72-image.png" class=" img-responsive img-markdown" /><br />
2. Experimental methods<br />
CRoPS<br />
RAD-Seq<br />
GBS<br />
double-digest RAD-Seq<br />
2bRAD<br />
restriction-enzyme anchored positions<br />
<a href="https://www.researchgate.net/figure/An-overview-of-the-RAD-seq-library-creation-protocol-and-initial-analysis-steps_fig1_235794002" rel="nofollow ugc">https://www.researchgate.net/figure/An-overview-of-the-RAD-seq-library-creation-protocol-and-initial-analysis-steps_fig1_235794002</a><br />
<img src="/assets/uploads/files/1608276766632-d5d8f02e-6106-42ed-8b34-e850111dab4d-image.png" alt="d5d8f02e-6106-42ed-8b34-e850111dab4d-image.png" class=" img-responsive img-markdown" /><br />
3. Analysis methods<br />
Stacks 2 pipeline<br />
Official site: <a href="https://catchenlab.life.illinois.edu/stacks/" rel="nofollow ugc">https://catchenlab.life.illinois.edu/stacks/</a><br />
<img src="/assets/uploads/files/1608012225603-a44c215f-c432-4aa5-8037-9304f58bb975-image-resized.png" alt="a44c215f-c432-4aa5-8037-9304f58bb975-image.png" class=" img-responsive img-markdown" /><br />
Principle: a maximum likelihood statistical model<br />
Implementation: C++, with wrapper programs written in Perl; multithreaded using OpenMP<br />
<strong>Analysis without a reference genome (de novo):</strong><br />
process_radtags: demultiplex and clean the raw reads<br />
ustacks: build loci in each sample<br />
cstacks: create the catalog of loci<br />
sstacks: match each sample against the catalog<br />
tsv2bam: transpose the data from being stored per sample to being stored per locus<br />
gstacks: assemble and merge paired-end contigs, call variant sites in the population and genotypes in each sample<br />
populations: filter data, calculate population genetics statistics, and export a variety of data formats<br />
<strong>Analysis with a reference genome:</strong><br />
process_radtags: demultiplex and clean the raw reads<br />
Align the cleaned reads to the reference genome (see 5.2)<br />
gstacks: assemble and merge paired-end contigs, call variant sites in the population and genotypes in each sample<br />
populations: filter data, calculate population genetics statistics, and export a variety of data formats<br />
4. Installation</p>
<pre><code>tar xfvz stacks-2.xx.tar.gz
cd stacks-2.xx
./configure
make
# Become root, then install:
make install
# Or, use sudo:
sudo make install
</code></pre>
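<p dir="auto">After installing, it can be worth confirming that the programs are actually on the PATH before starting a run. The following is a minimal sketch: <code>missing_tools</code> is a hypothetical helper (not part of Stacks), and the list of names is simply the set of programs mentioned in this post.</p>

```shell
#!/bin/bash
# Print the name of every program given as an argument that is NOT found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Programs named in this post: the Stacks components, plus bwa and samtools.
missing_tools process_radtags ustacks cstacks sstacks tsv2bam gstacks populations bwa samtools
```

<p dir="auto">An empty result means everything was found; any name printed still needs to be installed or added to the PATH.</p>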
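<p dir="auto">5. Detailed procedure<br />
5.1 Clean the data<br />
process_radtags: demultiplex the samples and filter out low-quality reads<br />
Confirm with the lab staff whether the samples need to be demultiplexed and which barcodes were used.<br />
See section 4.1.1 of the manual for the --inline_index option:<br />
<a href="https://catchenlab.life.illinois.edu/stacks/manual/#install" rel="nofollow ugc">https://catchenlab.life.illinois.edu/stacks/manual/#install</a><br />
Specify the barcode list:</p>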
<pre><code>cat barcodes_lane3
CGATA&lt;tab&gt;sample_01
CGGCG     sample_02
GAAGC     sample_03
GAGAT     sample_04
TAATG     sample_05
TAGCA     sample_06
AAGGG     sample_07
ACACG     sample_08
ACGTA     sample_09
</code></pre>
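<p dir="auto">A barcode file with spaces instead of tabs, or with a duplicated barcode, is an easy mistake to make. The sketch below checks a file laid out like the one above. It assumes one barcode per sample (two tab-separated columns); <code>check_barcodes</code> is a hypothetical helper, not a Stacks program.</p>

```shell
#!/bin/bash
# Validate a barcode list read from stdin:
#   - exactly two tab-separated columns (barcode, sample name)
#   - barcodes consist of A/C/G/T only
#   - no barcode and no sample name appears twice
# Exit status is 0 if the file looks fine, 1 otherwise.
check_barcodes() {
    awk -F'\t' '
        NF != 2           { printf "line %d: expected 2 tab-separated columns\n", NR; bad = 1; next }
        $1 !~ /^[ACGT]+$/ { printf "line %d: invalid barcode %s\n", NR, $1; bad = 1 }
        seen[$1]++        { printf "line %d: duplicate barcode %s\n", NR, $1; bad = 1 }
        names[$2]++       { printf "line %d: duplicate sample name %s\n", NR, $2; bad = 1 }
        END               { if (!bad) printf "OK: %d barcodes\n", NR; exit bad + 0 }
    '
}

# First three entries of the example list, written with real tabs.
printf 'CGATA\tsample_01\nCGGCG\tsample_02\nGAAGC\tsample_03\n' | check_barcodes
```

<p dir="auto">On a real file this would be run as <code>cat barcodes_lane3 | check_barcodes</code>.</p>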
<pre><code>process_radtags -p ./raw/ -o ./samples/ -b ./barcodes/barcodes_lane3 \
                  -e sbfI -r -c -q
process_radtags -P -p ./raw  -b ./barcodes/barcodes -o ./samples/ \
                  -c -q -r --inline_index --renz_1 nlaIII --renz_2 mluCI
</code></pre>
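<p dir="auto">One quick sanity check on the cleaned output is to see how many reads begin with the expected cut-site remnant. The sketch below assumes the remnant for sbfI is TGCAGG (the same string grepped for in the test run later in this thread); <code>cutsite_fraction</code> is a hypothetical helper and the two reads are made up.</p>

```shell
#!/bin/bash
# Count how many reads in a FASTQ stream (stdin) start with the given sequence.
# Only the sequence line of each 4-line FASTQ record is examined.
cutsite_fraction() {
    awk -v site="$1" '
        NR % 4 == 2 { total++; if (index($0, site) == 1) hits++ }
        END         { printf "%d of %d reads start with %s\n", hits, total, site }
    '
}

# Two made-up reads; only the first one starts with TGCAGG.
printf '@r1\nTGCAGGAACT\n+\nIIIIIIIIII\n@r2\nACGTTGCAGG\n+\nIIIIIIIIII\n' | cutsite_fraction TGCAGG
```

<p dir="auto">For a real sample the input would come from something like <code>zcat ./samples/sample_01.1.fq.gz | cutsite_fraction TGCAGG</code>.</p>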
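<p dir="auto">-p: directory containing the FASTQ files to process<br />
-o: output directory<br />
-b: barcode list file<br />
-e: restriction enzyme used<br />
5.2 Align to the reference sequence<br />
Any commonly used aligner will do, such as GSnap, BWA, or Bowtie2.<br />
Check the alignment quality with samtools or Picard.<br />
bwa mem is the recommended aligner.<br />
<strong>Does tobacco have a reference genome?</strong><br />
5.3<br />
De novo workflow: run denovo_map.pl, or run the component programs individually<br />
Reference-based workflow: run ref_map.pl, or run the component programs individually<br />
Most Stacks 2 components need a population map file, which groups the samples by their characteristics and labels each one.<br />
Population map:</p>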
<pre><code>cat popmap_both
sample_01&lt;tab&gt;red&lt;tab&gt;high
sample_02     red      high
sample_03     red      high
sample_04     red      high
sample_05     yellow   high
sample_06     yellow   high
sample_07     yellow   high
sample_08     yellow   high
sample_09     blue     low
sample_10     blue     low
sample_11     blue     low
sample_12     blue     low
sample_13     orange   low
sample_14     orange   low
sample_15     orange   low
sample_16     orange   low
</code></pre>
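<p dir="auto">Before running the pipeline it helps to confirm that the population map groups the samples as intended. A minimal sketch, assuming a tab-separated file with the population label in the second column as in the example above; <code>popmap_summary</code> is a hypothetical helper.</p>

```shell
#!/bin/bash
# Count the samples in each population (column 2) of a tab-separated
# population map read from stdin. Output: population, tab, sample count.
popmap_summary() {
    awk -F'\t' 'NF >= 2 { n[$2]++ } END { for (p in n) printf "%s\t%d\n", p, n[p] }' | sort
}

# A few rows in the same layout as popmap_both.
printf 'sample_01\tred\thigh\nsample_02\tred\thigh\nsample_05\tyellow\thigh\nsample_09\tblue\tlow\n' | popmap_summary
```

<p dir="auto">A population with an unexpected count, or a label that appears with two spellings, usually points to a typo or to spaces used in place of tabs.</p>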
<pre><code>denovo_map.pl -T 8 -M 6 -o ./stacks/ --samples ./samples --popmap ./popmaps/popmap --paired
</code></pre>
<p dir="auto">Or run the commands individually:</p>
<pre><code>#!/bin/bash

src=$HOME/research/project

files="sample_01
sample_02
sample_03"

#
# Build loci de novo in each sample for the single-end reads only. If paired-end reads are available, 
# they will be integrated in a later stage (tsv2bam stage).
# This loop will run ustacks on each sample, e.g.
#   ustacks -f ./samples/sample_01.1.fq.gz -o ./stacks -i 1 --name sample_01 -M 4 -p 8
#
id=1
for sample in $files
do
    ustacks -f $src/samples/${sample}.1.fq.gz -o $src/stacks -i $id --name $sample -M 4 -p 8
    let "id+=1"
done

# 
# Build the catalog of loci available in the metapopulation from the samples contained
# in the population map. To build the catalog from a subset of individuals, supply
# a separate population map only containing those samples.
#
cstacks -n 6 -P $src/stacks/ -M $src/popmaps/popmap -p 8

#
# Run sstacks. Match all samples supplied in the population map against the catalog.
#
sstacks -P $src/stacks/ -M $src/popmaps/popmap -p 8

#
# Run tsv2bam to transpose the data so it is stored by locus, instead of by sample. We will include
# paired-end reads using tsv2bam. tsv2bam expects the paired read files to be in the samples
# directory and they should be named consistently with the single-end reads,
# e.g. sample_01.1.fq.gz and sample_01.2.fq.gz, which is how process_radtags will output them.
#
tsv2bam -P $src/stacks/ -M $src/popmaps/popmap --pe-reads-dir $src/samples -t 8

#
# Run gstacks: build a paired-end contig from the metapopulation data (if paired-reads provided),
# align reads per sample, call variant sites in the population, genotypes in each individual.
#
gstacks -P $src/stacks/ -M $src/popmaps/popmap -t 8

#
# Run populations. Calculate Hardy-Weinberg deviation, population statistics, f-statistics
# export several output files.
#
populations -P $src/stacks/ -M $src/popmaps/popmap -r 0.65 --vcf --genepop --structure --fstats --hwe -t 8
</code></pre>
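<p dir="auto">The script above hard-codes the sample list in <code>files</code>. One alternative is to take the sample names from the first column of the population map so the two cannot drift apart. The sketch below is a dry run that only prints the ustacks commands; <code>ustacks_commands</code> and the project path are hypothetical, and the ustacks options are copied from the script above.</p>

```shell
#!/bin/bash
# Read a population map from stdin and print one ustacks command per sample,
# numbering the samples from 1. Nothing is executed (dry run).
# The body runs in a subshell so its variables do not leak into the caller.
ustacks_commands() (
    src=$1
    id=1
    cut -f 1 | while read -r sample; do
        [ -n "$sample" ] || continue
        echo "ustacks -f $src/samples/${sample}.1.fq.gz -o $src/stacks -i $id --name $sample -M 4 -p 8"
        id=$((id + 1))
    done
)

# Two rows of a population map; /path/to/project stands in for the real project directory.
printf 'sample_01\tred\nsample_02\tred\n' | ustacks_commands /path/to/project
```

<p dir="auto">Once the printed commands look right, the output can be piped to <code>bash</code> to run them.</p>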
<pre><code>ref_map.pl -T 8 --popmap ./popmaps/popmap -o ./stacks/ --samples ./aligned
</code></pre>
<p dir="auto">Or run the commands individually:</p>
<pre><code>#!/bin/bash

src=$HOME/research/project
bwa_db=$src/bwa_db/my_bwa_db_prefix
    
files="sample_01
sample_02
sample_03"

#
# Align paired-end data with BWA, convert to BAM and SORT.
#
for sample in $files
do 
    bwa mem -t 8 $bwa_db $src/samples/${sample}.1.fq.gz $src/samples/${sample}.2.fq.gz |
      samtools view -b |
      samtools sort --threads 4 &gt; $src/aligned/${sample}.bam
done

#
# Run gstacks to build loci from the aligned paired-end data. We have instructed
# gstacks to remove any PCR duplicates that it finds.
#
gstacks -I $src/aligned/ -M $src/popmaps/popmap --rm-pcr-duplicates -O $src/stacks/ -t 8

#
# Run populations. Calculate Hardy-Weinberg deviation, population statistics, f-statistics and 
# smooth the statistics across the genome. Export several output files.
#
populations -P $src/stacks/ -M $src/popmaps/popmap -r 0.65 --vcf --genepop --fstats --smooth --hwe -t 8
</code></pre>
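<p dir="auto">To check the alignments before running gstacks, the mapping rate can be pulled out of the samtools flagstat report. The sketch below assumes the usual flagstat text layout (count, "+", QC-failed count, then a description such as "in total" or "mapped"); <code>mapping_rate</code> is a hypothetical helper and the numbers are made up.</p>

```shell
#!/bin/bash
# Compute the percentage of mapped reads from `samtools flagstat` text on stdin.
# Uses the "in total" and "mapped" lines; the "primary mapped" line is ignored
# because its fourth field is "primary", not "mapped".
mapping_rate() {
    awk '
        $4 == "in" && $5 == "total" { total = $1 }
        $4 == "mapped"              { mapped = $1 }
        END { if (total > 0) printf "%.2f\n", 100 * mapped / total; else print "NA" }
    '
}

# Made-up flagstat excerpt, for illustration only.
printf '1000 + 0 in total (QC-passed reads + QC-failed reads)\n950 + 0 mapped (95.00%% : N/A)\n' | mapping_rate
```

<p dir="auto">With real data this would be <code>samtools flagstat sample_01.bam | mapping_rate</code>, run once per sample in the loop.</p>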
<p dir="auto">5.4 Downstream analysis of the results<br />
genetic map<br />
population analysis<br />
STRUCTURE<br />
Adegenet<br />
coverage</p>
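<p dir="auto">Before passing the exported VCF (--vcf) to tools such as STRUCTURE or Adegenet, it can be useful to see how much genotype data each sample is missing. A minimal sketch, assuming a standard VCF in which a genotype whose GT field begins with "." is missing; <code>vcf_missingness</code> is a hypothetical helper and the records are made up.</p>

```shell
#!/bin/bash
# Per-sample fraction of missing genotypes in a VCF read from stdin.
# Sample columns start at column 10; a genotype field beginning with "."
# (for example ./.) is counted as missing.
vcf_missingness() {
    awk -F'\t' '
        /^##/     { next }
        /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; ncol = NF; next }
                  { sites++; for (i = 10; i <= ncol; i++) if (substr($i, 1, 1) == ".") miss[i]++ }
        END       { for (i = 10; i <= ncol; i++) printf "%s\t%.2f\n", name[i], (sites ? miss[i] / sites : 0) }
    '
}

# Two made-up sites for two samples; sample_02 is missing at the first site.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample_01\tsample_02\n1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\n1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t1/1\n' | vcf_missingness
```

<p dir="auto">Samples with a very high missing fraction are candidates for removal from the population map before rerunning populations.</p>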
]]></description><link>http://an.forum.genostack.com/topic/135/简化基因组分析</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 12:34:07 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/135.rss" rel="self" type="application/rss+xml"/><pubDate>Tue, 15 Dec 2020 08:57:41 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Fri, 02 Jul 2021 09:44:10 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://groups.google.com/g/stacks-users/c/1n6O98oWO2o/m/oRm3FYZTs1IJ?pli=1" rel="nofollow ugc">https://groups.google.com/g/stacks-users/c/1n6O98oWO2o/m/oRm3FYZTs1IJ?pli=1</a></p>
<p dir="auto">Stacks is not well suited to polyploids (ploidy greater than 2).</p>
]]></description><link>http://an.forum.genostack.com/post/669</link><guid isPermaLink="true">http://an.forum.genostack.com/post/669</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Fri, 02 Jul 2021 09:44:10 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Wed, 09 Jun 2021 06:57:13 GMT]]></title><description><![CDATA[<p dir="auto"><img src="/assets/uploads/files/1623221773531-37dc496d-5922-433f-90bf-7d68d03368f7-image-resized.png" alt="37dc496d-5922-433f-90bf-7d68d03368f7-image.png" class=" img-responsive img-markdown" /><br />
<a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0201254" rel="nofollow ugc">https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0201254</a><br />
A RAD-sequencing approach to genome-wide marker discovery, genotyping, and phylogenetic inference in a diverse radiation of primates</p>
]]></description><link>http://an.forum.genostack.com/post/639</link><guid isPermaLink="true">http://an.forum.genostack.com/post/639</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 09 Jun 2021 06:57:13 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Tue, 08 Jun 2021 01:59:29 GMT]]></title><description><![CDATA[<p dir="auto">python tagdigger_script.py -w . -b samples.csv -o mycounts.csv -k markerstokeep.txt --StacksTags ../ANCHA180152/ustack/catalog.tags.tsv.gz --StacksSNPs ../ANCHA180152/ustack/catalog.snps.tsv.gz --StacksAlleles ../ANCHA180152/ustack/catalog.alleles.tsv.gz</p>
]]></description><link>http://an.forum.genostack.com/post/636</link><guid isPermaLink="true">http://an.forum.genostack.com/post/636</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Tue, 08 Jun 2021 01:59:29 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Mon, 28 Dec 2020 07:56:37 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.researchgate.net/post/Which-program-is-best-to-use-for-phylogeny-analysis" rel="nofollow ugc">https://www.researchgate.net/post/Which-program-is-best-to-use-for-phylogeny-analysis</a></p>
]]></description><link>http://an.forum.genostack.com/post/271</link><guid isPermaLink="true">http://an.forum.genostack.com/post/271</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 28 Dec 2020 07:56:37 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Mon, 28 Dec 2020 06:33:09 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://evolution.genetics.washington.edu/phylip/tuimala3.pdf" rel="nofollow ugc">https://evolution.genetics.washington.edu/phylip/tuimala3.pdf</a></p>
]]></description><link>http://an.forum.genostack.com/post/270</link><guid isPermaLink="true">http://an.forum.genostack.com/post/270</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 28 Dec 2020 06:33:09 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Wed, 13 Jan 2021 07:09:16 GMT]]></title><description><![CDATA[<p dir="auto">OneMap user guide<br />
<a href="https://mran.microsoft.com/snapshot/2017-02-04/web/packages/onemap/vignettes/Tutorial_Onemap_reduced_version.pdf" rel="nofollow ugc">https://mran.microsoft.com/snapshot/2017-02-04/web/packages/onemap/vignettes/Tutorial_Onemap_reduced_version.pdf</a></p>
]]></description><link>http://an.forum.genostack.com/post/269</link><guid isPermaLink="true">http://an.forum.genostack.com/post/269</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 13 Jan 2021 07:09:16 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Sat, 26 Dec 2020 07:10:00 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://github.com/evansbenj/Reduced-Representation-Workshop" rel="nofollow ugc">https://github.com/evansbenj/Reduced-Representation-Workshop</a></p>
]]></description><link>http://an.forum.genostack.com/post/264</link><guid isPermaLink="true">http://an.forum.genostack.com/post/264</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 26 Dec 2020 07:10:00 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Sat, 26 Dec 2020 06:44:56 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://evolution.genetics.washington.edu/phylip/" rel="nofollow ugc">https://evolution.genetics.washington.edu/phylip/</a></p>
]]></description><link>http://an.forum.genostack.com/post/263</link><guid isPermaLink="true">http://an.forum.genostack.com/post/263</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 26 Dec 2020 06:44:56 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Sat, 26 Dec 2020 03:55:59 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://simison.com/brian/Structure_notes.html" rel="nofollow ugc">https://simison.com/brian/Structure_notes.html</a></p>
]]></description><link>http://an.forum.genostack.com/post/262</link><guid isPermaLink="true">http://an.forum.genostack.com/post/262</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 26 Dec 2020 03:55:59 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Fri, 18 Dec 2020 11:27:15 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://pearg.github.io/pearg_documentation/tutorials/intro-rad-seq/" rel="nofollow ugc">https://pearg.github.io/pearg_documentation/tutorials/intro-rad-seq/</a></p>
]]></description><link>http://an.forum.genostack.com/post/249</link><guid isPermaLink="true">http://an.forum.genostack.com/post/249</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Fri, 18 Dec 2020 11:27:15 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Fri, 18 Dec 2020 02:49:02 GMT]]></title><description><![CDATA[<p dir="auto">Hands-on test:<br />
Hongyuan server: /home/bioinfo/radseq<br />
process_radtags -p . -o 1-cleandata/ -e sbfI -c -q -r<br />
zgrep TGCAGG --color=always SRR828261.1.fq.gz | head -n 20<br />
// Align<br />
bwa mem ../Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa ../SRR828261.1.fastq | gzip -3 &gt; SRR828261.1.sam.gz<br />
// Check the mapping rate<br />
samtools flagstat SRR828303.1.sam.gz<br />
// Build loci<br />
gstacks -I ../2-align/ -M popmap -O output -t 8<br />
// Run populations<br />
populations -P ../3-build-loci/output/ -M ./popmap -r 0.65 --vcf --genepop --fstats --smooth --hwe -t 8</p>
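]]></description><link>http://an.forum.genostack.com/post/245</link><guid isPermaLink="true">http://an.forum.genostack.com/post/245</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Fri, 18 Dec 2020 02:49:02 GMT</pubDate></item><item><title><![CDATA[Reply to Reduced-Representation Genome Analysis on Tue, 15 Dec 2020 09:29:10 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.yn-tobacco.com/gzfw/rqfw/xfz/jyzwjb/201801/t20180112_215188.html" rel="nofollow ugc">https://www.yn-tobacco.com/gzfw/rqfw/xfz/jyzwjb/201801/t20180112_215188.html</a><br />
Does tobacco have a reference genome?<br />
Common tobacco (Nicotiana tabacum) is an allotetraploid plant with a genome of about 4.5 Gb and a highly complex genome structure. The US tobacco genome project sequenced a tetraploid flue-cured variety using methyl filtration and obtained only partial gene sequence information. China's tobacco genome project completed whole-genome sequence maps of the two ancestral species of common tobacco, N. tomentosiformis and N. sylvestris, which further confirmed how complex the tobacco genome structure is. Of the more than 70,000 genes in N. sylvestris and N. tomentosiformis, over 10,000 are highly homologous, as alike as twins, and conventional expression microarray probe designs struggle to tell them apart.<br />
N. tomentosiformis (绒毛状烟草)<br />
N. sylvestris (林烟草)<br />
N. tabacum (普通烟草, common tobacco)<br />
Searching NCBI for "tobacco" shows that reference genomes are available; when the time comes, we can discuss with the client which one to use.</p>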
]]></description><link>http://an.forum.genostack.com/post/238</link><guid isPermaLink="true">http://an.forum.genostack.com/post/238</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Tue, 15 Dec 2020 09:29:10 GMT</pubDate></item></channel></rss>