<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[微生态的多元分析  Multivariate analyses in microbial ecology]]></title><description><![CDATA[<p dir="auto"><a href="https://www.researchgate.net/post/How-to-choose-ordination-method-such-as-PCA-CA-PCoA-and-NMDS" rel="nofollow ugc">https://www.researchgate.net/post/How-to-choose-ordination-method-such-as-PCA-CA-PCoA-and-NMDS</a></p>
<p dir="auto"><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2121141/" rel="nofollow ugc">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2121141/</a><br />
<a href="https://mb3is.megx.net/gustame/dissimilarity-based-methods/nmds" rel="nofollow ugc">https://mb3is.megx.net/gustame/dissimilarity-based-methods/nmds</a><br />
<a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.13536" rel="nofollow ugc">https://onlinelibrary.wiley.com/doi/abs/10.1111/mec.13536</a></p>
<p dir="auto">Numerical ecology with R. Springer</p>
]]></description><link>http://an.forum.genostack.com/topic/119/微生态的多元分析-multivariate-analyses-in-microbial-ecology</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 09:37:46 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/119.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 20 Nov 2020 13:36:06 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Mon, 30 Nov 2020 06:41:50 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://fukamilab.github.io/BIO202/index.html" rel="nofollow ugc">https://fukamilab.github.io/BIO202/index.html</a><br />
Biology 202: Ecological Statistics<br />
Stanford University</p>
]]></description><link>http://an.forum.genostack.com/post/198</link><guid isPermaLink="true">http://an.forum.genostack.com/post/198</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 30 Nov 2020 06:41:50 GMT</pubDate></item><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Sat, 28 Nov 2020 10:10:37 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://www.metaboanalyst.ca/" rel="nofollow ugc">https://www.metaboanalyst.ca/</a><br />
<a href="https://www.researchgate.net/post/Do-you-know-of-any-free-software-which-can-do-multivariate-analysis-PCA-PLS-etc" rel="nofollow ugc">https://www.researchgate.net/post/Do-you-know-of-any-free-software-which-can-do-multivariate-analysis-PCA-PLS-etc</a></p>
]]></description><link>http://an.forum.genostack.com/post/197</link><guid isPermaLink="true">http://an.forum.genostack.com/post/197</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 28 Nov 2020 10:10:37 GMT</pubDate></item><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Sat, 28 Nov 2020 10:04:52 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://ourcodingclub.github.io/tutorials/ordination/" rel="nofollow ugc">https://ourcodingclub.github.io/tutorials/ordination/</a></p>
]]></description><link>http://an.forum.genostack.com/post/196</link><guid isPermaLink="true">http://an.forum.genostack.com/post/196</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 28 Nov 2020 10:04:52 GMT</pubDate></item><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Sat, 28 Nov 2020 09:55:10 GMT]]></title><description><![CDATA[<p dir="auto">Software list<br />
<a href="http://ordination.okstate.edu/software.htm" rel="nofollow ugc">http://ordination.okstate.edu/software.htm</a></p>
]]></description><link>http://an.forum.genostack.com/post/195</link><guid isPermaLink="true">http://an.forum.genostack.com/post/195</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 28 Nov 2020 09:55:10 GMT</pubDate></item><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Sat, 28 Nov 2020 09:52:26 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://albertsenlab.org/ampvis2-ordination/" rel="nofollow ugc">https://albertsenlab.org/ampvis2-ordination/</a><br />
ampvis2: A guide to ordination and how to use amp_ordinate in R</p>
]]></description><link>http://an.forum.genostack.com/post/194</link><guid isPermaLink="true">http://an.forum.genostack.com/post/194</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 28 Nov 2020 09:52:26 GMT</pubDate></item><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Sat, 28 Nov 2020 09:44:14 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://cloud.tencent.com/developer/article/1667059?from=10680" rel="nofollow ugc">https://cloud.tencent.com/developer/article/1667059?from=10680</a><br />
<a href="https://cloud.tencent.com/developer/article/1667582" rel="nofollow ugc">https://cloud.tencent.com/developer/article/1667582</a></p>
]]></description><link>http://an.forum.genostack.com/post/193</link><guid isPermaLink="true">http://an.forum.genostack.com/post/193</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 28 Nov 2020 09:44:14 GMT</pubDate></item><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Sat, 28 Nov 2020 09:31:39 GMT]]></title><description><![CDATA[<p dir="auto">Canoco is one of the most popular programs for multivariate statistical analysis using ordination methods in the field of ecology and several related fields. User's Guides of the recent Canoco versions (4.0, 4.5 and 5.0) were cited more than 9200 times in the past 18 years (1999-2017, ISI Web of Knowledge).</p>
<p dir="auto">Canoco 5 is the latest, much re-worked version of the Canoco software, released in October 2012. The Canoco site offers additional resources for effective use of the software, as well as a brief overview of the new features in Canoco 5.</p>
]]></description><link>http://an.forum.genostack.com/post/192</link><guid isPermaLink="true">http://an.forum.genostack.com/post/192</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 28 Nov 2020 09:31:39 GMT</pubDate></item><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Sat, 28 Nov 2020 09:27:31 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://r.bio-spring.info/2018/10/22/ordination-analysis-in-r/" rel="nofollow ugc">https://r.bio-spring.info/2018/10/22/ordination-analysis-in-r/</a><br />
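Common ordination methods<br />
Anyone working on microbial diversity will have come across PCA, PCoA, NMDS, CCA and RDA. These methods are central to analysing species (or gene, or functional) composition, so they appear constantly in 16S and metagenomic sequencing studies. All of them are forms of ordination analysis.</p>
<p dir="auto">Ordination analysis originated in ecology as a family of multivariate methods for studying communities. Given the sites surveyed in a region and the species composition recorded at each, it arranges the sites along one or more ordination axes according to their similarity or distance, so that each site becomes a point along those axes and the relationships between sites, species and environmental factors can be examined. The aim is to compress a high-dimensional space into a low-dimensional one (for example two dimensions) while losing as little information as possible; the entities (sites or species) are rearranged according to their similarity, which improves interpretability. Statistical tests can then check whether the ordination axes really represent gradients of the environmental factors [1].</p>
<p dir="auto">The role of ordination can therefore be summarised in two points: (1) dimensionality reduction; (2) exploratory analysis.</p>
<p dir="auto">The commonly used ordination methods are listed below [2]:</p>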
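<table>
<thead>
<tr><th>Ordination type</th><th>Raw data based (linear model)</th><th>Raw data based (unimodal model)</th><th>Distance based</th></tr>
</thead>
<tbody>
<tr><td>Indirect (unconstrained)</td><td>PCA</td><td>CA, DCA</td><td>PCoA, NMDS</td></tr>
<tr><td>Direct (constrained)</td><td>RDA</td><td>CCA</td><td>dbRDA</td></tr>
</tbody>
</table>
<p dir="auto">Indirect ordination methods include:</p>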
<p dir="auto">PCA (principal components analysis)<br />
CA (correspondence analysis)<br />
DCA (detrended correspondence analysis)<br />
PCoA (principal coordinates analysis)<br />
NMDS (non-metric multidimensional scaling)</p>
<p dir="auto">Direct ordination methods include:</p>
<p dir="auto">RDA (redundancy analysis)<br />
CCA (canonical correspondence analysis)<br />
dbRDA (distance-based redundancy analysis)<br />
CAP (canonical analysis of principal coordinates)</p>
<p dir="auto">PCA and RDA are based on a linear model, whereas CA, DCA, CCA and DCCA (detrended canonical correspondence analysis) are based on a unimodal model.</p>
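<p dir="auto">Unimodal or linear model?<br />
First run a DCA on the site-by-species data with vegan::decorana().<br />
Look at the "Axis lengths" row of the output and read off the value for the first axis, DCA1. That value decides between a linear and a unimodal model:<br />
if it is greater than 4.0, choose a unimodal model;<br />
if it lies between 3.0 and 4.0, either a linear or a unimodal model is acceptable;<br />
if it is less than 3.0, a linear model will generally perform better than a unimodal one.</p>
<p dir="auto">How do you choose a suitable method?<br />
The choice of ordination method depends on (1) the type of data you have, (2) the similarity or distance matrix you want to (or can) use, and (3) what you want to say. All of these ordination methods rest on a similarity or distance matrix built from your data, and different measures (Euclidean, Bray-Curtis, Jaccard and so on) can be used to compute the distance between samples. Different measures do not give the same results. Different ordination methods use different matrices, and this can substantially affect the outcome.</p>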
<p dir="auto">For example, PCA uses only Euclidean distance, whereas PCoA and nMDS can work with any similarity or distance measure you choose.</p>
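<p dir="auto">To make the link between PCA and Euclidean distance concrete, here is a minimal base-R sketch. It assumes a small made-up data matrix (the numbers are random, not from any dataset in this thread) and checks that a PCoA run on Euclidean distances recovers the same sample coordinates as a PCA, up to the sign of each axis.</p>
<pre><code class="language-r"># Made-up data: 6 samples x 3 variables (illustrative only)
set.seed(42)
x &lt;- matrix(rnorm(18), nrow = 6)

# PCA scores on the centred, unscaled data
pca_scores &lt;- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1:2]

# PCoA (classical scaling) on the Euclidean distance matrix
pcoa_scores &lt;- cmdscale(dist(x, method = "euclidean"), k = 2)

# Axis signs are arbitrary, so compare absolute values
all.equal(abs(unname(pca_scores)), abs(unname(pcoa_scores)))
</code></pre>
<p dir="auto">The comparison should come back TRUE. With any other distance (Bray-Curtis, Jaccard ...) the PCoA coordinates no longer match a PCA, which is the practical meaning of "PCA uses only Euclidean distance".</p>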
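<p dir="auto">A highly upvoted answer on ResearchGate [3] addresses this question.</p>
<p dir="auto">If your dataset contains null (zero) values, for example bacterial OTUs that are present in some samples and absent from others, I would suggest a Bray-Curtis similarity matrix with nMDS ordination. Bray-Curtis is chosen because, unlike Euclidean distance, it is not affected by the number of null values between samples; nMDS is chosen because, unlike PCA, it accepts any similarity matrix.<br />
If your dataset contains no null values (for example environmental variables), you can use Euclidean distance with either PCA or nMDS; in that case the two give essentially the same picture.<br />
Sometimes one method shows the particular effects of a complex community or factor better than the others. If you are not satisfied with the result, trying a different method is good practice. Remember, though, that these methods are only ordinations: you still need to test for significant differences between groups (for example with ANOSIM, ADONIS, PERMANOVA, MRPP ...).</p>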
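<p dir="auto">The advice above can be sketched in vegan roughly as follows. This is only an illustration under assumptions: it uses vegan's bundled dune and dune.env datasets (not the data discussed in the ResearchGate answer), takes the Management factor as the grouping of interest, and picks adonis2() as one of several possible group tests.</p>
<pre><code class="language-r">library(vegan)
data(dune)      # community table with many zeros
data(dune.env)  # site descriptors, including the factor Management

# Bray-Curtis dissimilarities between sites
bc &lt;- vegdist(dune, method = "bray")

# nMDS in two dimensions on those dissimilarities
set.seed(1)  # metaMDS uses random starts
nmds &lt;- metaMDS(bc, k = 2, trace = FALSE)
nmds$stress  # lower stress means a better low-dimensional fit

# The ordination is only a picture: test the group effect separately
adonis2(bc ~ Management, data = dune.env, permutations = 999)
</code></pre>
<p dir="auto">The same dissimilarity object feeds both the ordination and the permutation test, so the plot and the test describe the same distances.</p>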
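<p dir="auto">Implementation in R<br />
Example datasets<br />
The examples use datasets shipped with vegan, which describe the vegetation of lichen pastures and the physico-chemical properties of their soil. varespec holds the cover values of 44 species at 24 sites, and varechem holds 14 soil characteristics for the same 24 sites.</p>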
<pre><code>library("vegan")
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-2
</code></pre>
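<pre><code>data("varespec")
data("varechem")
# Inspect the variables
knitr::kable(varespec[1:3,1:5])
</code></pre>
<table>
<thead>
<tr><th></th><th>Callvulg</th><th>Empenigr</th><th>Rhodtome</th><th>Vaccmyrt</th><th>Vaccviti</th></tr>
</thead>
<tbody>
<tr><td>18</td><td>0.55</td><td>11.13</td><td>0</td><td>0.00</td><td>17.80</td></tr>
<tr><td>15</td><td>0.67</td><td>0.17</td><td>0</td><td>0.35</td><td>12.13</td></tr>
<tr><td>24</td><td>0.10</td><td>1.55</td><td>0</td><td>0.00</td><td>13.47</td></tr>
</tbody>
</table>
<pre><code>knitr::kable(varechem[1:3,1:5])
</code></pre>
<table>
<thead>
<tr><th></th><th>N</th><th>P</th><th>K</th><th>Ca</th><th>Mg</th></tr>
</thead>
<tbody>
<tr><td>18</td><td>19.8</td><td>42.1</td><td>139.9</td><td>519.4</td><td>90.0</td></tr>
<tr><td>15</td><td>13.4</td><td>39.1</td><td>167.3</td><td>356.7</td><td>70.7</td></tr>
<tr><td>24</td><td>20.2</td><td>67.7</td><td>207.1</td><td>973.3</td><td>209.1</td></tr>
</tbody>
</table>
<p dir="auto">Choosing the model<br />
To decide between a linear and a unimodal model, first run a DCA on the data. In vegan the corresponding function is decorana().</p>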
<pre><code>decorana(varespec)
##
## Call:
## decorana(veg = varespec)
##
## Detrended correspondence analysis with 26 segments.
## Rescaling of axes with 4 iterations.
##
##                   DCA1   DCA2    DCA3    DCA4
## Eigenvalues     0.5235 0.3253 0.20010 0.19176
## Decorana values 0.5249 0.1572 0.09669 0.06075
## Axis lengths    2.8161 2.2054 1.54650 1.64864
</code></pre>
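<p dir="auto">In this example the first-axis length (DCA1) is 2.8161, which is also the largest of the four axes and is below 3, so a linear model is the better choice.</p>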
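<p dir="auto">The decision rule can also be applied in code rather than by reading the printout. The sketch below rests on an assumption: that the printed "Axis lengths" value corresponds to the range of the site scores on each DCA axis, which is how I understand vegan to compute it. The thresholds are the 3.0 and 4.0 cut-offs quoted earlier, and the variable names are my own.</p>
<pre><code class="language-r">library(vegan)
data(varespec)

dca &lt;- decorana(varespec)

# Assumed definition: axis length = range of site scores on DCA1
dca1_len &lt;- diff(range(scores(dca, display = "sites", choices = 1)))

# Apply the rule of thumb from the text
model &lt;- if (dca1_len > 4) {
  "unimodal (CA / CCA)"
} else if (dca1_len >= 3) {
  "either linear or unimodal"
} else {
  "linear (PCA / RDA)"
}

dca1_len
model
</code></pre>
<p dir="auto">If the assumption holds, dca1_len should be close to the DCA1 axis length in the printed output, and model should point to the linear family for varespec.</p>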
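<p dir="auto">PCA<br />
In vegan, PCA and RDA share one function, rda(). Called with the species matrix alone, rda(X), it performs a PCA; called with both the species matrix and the environmental matrix, rda(X, Y), it performs an RDA.</p>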
<pre><code>rda(varespec)
## Call: rda(X = varespec)
##
##               Inertia Rank
## Total            1826
## Unconstrained    1826   23
## Inertia is variance
##
## Eigenvalues for unconstrained axes:
##   PC1   PC2   PC3   PC4   PC5   PC6   PC7   PC8
## 983.0 464.3 132.3  73.9  48.4  37.0  25.7  19.7
## (Showed only 8 of all 23 unconstrained eigenvalues)
</code></pre>
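<p dir="auto">The output reports a total inertia of 1826. This is the sum of the variances of all species in the species matrix (sum(apply(varespec,2,var)) = 1825.6594047) and can be read as the total variation in species distribution.</p>
<p dir="auto">In the PCA result, "Eigenvalues for unconstrained axes" gives the eigenvalue carried by each unconstrained axis, that is, the amount of variance each axis explains. For the first axis, for example, the proportion explained is 983.0/1826, or about 53.8%.</p>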
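<p dir="auto">The same arithmetic can be done for every axis at once. This is a small sketch using vegan's eigenvals() on the PCA object; the variable names are my own, and the check at the end simply confirms that the eigenvalues add up to the summed species variances.</p>
<pre><code class="language-r">library(vegan)
data(varespec)

pca &lt;- rda(varespec)

# Total inertia: sum of the per-species variances
total_var &lt;- sum(apply(varespec, 2, var))

# Eigenvalues of the unconstrained axes and their share of the total
eig &lt;- as.numeric(eigenvals(pca))
prop &lt;- eig / sum(eig)

round(prop[1:3], 3)             # proportion explained by PC1 to PC3
all.equal(sum(eig), total_var)  # eigenvalues sum to the total variance
</code></pre>
<p dir="auto">summary(eigenvals(pca)) prints the same proportions together with their cumulative sums.</p>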
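<p dir="auto">CA<br />
CA and CCA are likewise implemented by a single function, cca(). Given only a species matrix it performs a CA; given both a species matrix and an environmental matrix it performs a CCA.</p>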
<pre><code>cca(varespec)
## Call: cca(X = varespec)
##
##               Inertia Rank
## Total           2.083
## Unconstrained   2.083   23
## Inertia is scaled Chi-square
##
## Eigenvalues for unconstrained axes:
##    CA1    CA2    CA3    CA4    CA5    CA6    CA7    CA8
## 0.5249 0.3568 0.2344 0.1955 0.1776 0.1216 0.1155 0.0889
## (Showed only 8 of all 23 unconstrained eigenvalues)
</code></pre>
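<p dir="auto">RDA and CCA<br />
To analyse the relationship between species distributions and environmental factors you need constrained ordination; the main types are RDA and CCA.</p>
<p dir="auto">Constrained and unconstrained ordination differ in that a constrained ordination displays only the part of the variation in species distribution that the environmental factors can explain. As a result, constrained axes explain noticeably less variation than the corresponding unconstrained axes (compare RDA1 = 820.1 with PC1 = 983.0).</p>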
<pre><code># RDA
rda(varespec,varechem)
## Call: rda(X = varespec, Y = varechem)
##
##                 Inertia Proportion Rank
## Total         1825.6594     1.0000
## Constrained   1459.8891     0.7997   14
## Unconstrained  365.7704     0.2003    9
## Inertia is variance
##
## Eigenvalues for constrained axes:
##  RDA1  RDA2  RDA3  RDA4  RDA5  RDA6  RDA7  RDA8  RDA9 RDA10 RDA11 RDA12
## 820.1 399.3 102.6  47.6  26.8  24.0  19.1  10.2   4.4   2.3   1.5   0.9
## RDA13 RDA14
##   0.7   0.3
##
## Eigenvalues for unconstrained axes:
##    PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8    PC9
## 186.19  88.46  38.19  18.40  12.84  10.55   5.52   4.52   1.09
</code></pre>
<pre><code># CCA
cca(varespec,varechem)
## Call: cca(X = varespec, Y = varechem)
##
##               Inertia Proportion Rank
## Total          2.0832     1.0000
## Constrained    1.4415     0.6920   14
## Unconstrained  0.6417     0.3080    9
## Inertia is scaled Chi-square
##
## Eigenvalues for constrained axes:
##   CCA1   CCA2   CCA3   CCA4   CCA5   CCA6   CCA7   CCA8   CCA9  CCA10
## 0.4389 0.2918 0.1628 0.1421 0.1180 0.0890 0.0703 0.0584 0.0311 0.0133
##  CCA11  CCA12  CCA13  CCA14
## 0.0084 0.0065 0.0062 0.0047
##
## Eigenvalues for unconstrained axes:
##     CA1     CA2     CA3     CA4     CA5     CA6     CA7     CA8     CA9
## 0.19776 0.14193 0.10117 0.07079 0.05330 0.03330 0.01887 0.01510 0.00949
</code></pre>
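<p dir="auto">A constrained proportion obtained with 14 environmental variables and only 24 sites tends to look optimistic, because the raw proportion rises with every predictor added. As a hedged follow-up rather than part of the original walkthrough, the sketch below uses vegan's formula interface, RsquareAdj() and the permutation test anova(). The reduced model with N, P and K is an arbitrary choice for illustration, not a recommended variable set.</p>
<pre><code class="language-r">library(vegan)
data(varespec)
data(varechem)

# Full RDA via the formula interface: all 14 soil variables
rda_full &lt;- rda(varespec ~ ., data = varechem)

# Raw and adjusted proportion of constrained variance
RsquareAdj(rda_full)

# A smaller model (illustrative choice of three predictors)
rda_small &lt;- rda(varespec ~ N + P + K, data = varechem)

# Permutation test of the constrained model as a whole
set.seed(1)
anova(rda_small, permutations = 999)
</code></pre>
<p dir="auto">The adjusted value penalises the number of predictors, so it is the safer figure to report when many environmental variables are included.</p>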
]]></description><link>http://an.forum.genostack.com/post/191</link><guid isPermaLink="true">http://an.forum.genostack.com/post/191</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 28 Nov 2020 09:27:31 GMT</pubDate></item><item><title><![CDATA[Reply to 微生态的多元分析  Multivariate analyses in microbial ecology on Sat, 28 Nov 2020 09:26:09 GMT]]></title><description><![CDATA[<p dir="auto">The choice of ordination method depends on 1) the type of data you have, 2) the similarity or distance matrix you want to (or can) use, and 3) what you want to say. All of these ordination methods are based on a <strong>similarity or distance matrix</strong> built from your data, using different measures (such as Euclidean, Bray-Curtis (=Sorensen), Jaccard etc.) to calculate the distance between samples. The different measures will not give the same results. Different ordination methods use different matrices, which can significantly affect the results. For example, <strong>PCA uses only Euclidean distance</strong>, while nMDS or PCoA can use any similarity or distance measure you want.<br />
So, how do you choose a method?</p>
<ul>
<li>
<p dir="auto">If you have a dataset that includes null values (e.g. most datasets from genotyping with fingerprinting methods include null values, as when a bacterial OTU is present in some samples and not in others), I would advise you to use a Bray-Curtis similarity matrix and nMDS ordination. Bray-Curtis distance is chosen because, unlike Euclidean distance, it is not affected by the number of null values between samples, and nMDS is chosen because, unlike PCA, it lets you use any similarity matrix.<br />
In short: with null values, NMDS on a Bray-Curtis similarity matrix is recommended.</p>
</li>
<li>
<p dir="auto">If you have a dataset that does not include null values (e.g. environmental variables), you can use Euclidean distance with either PCA or nMDS, and you will see that in this case they give you essentially the same results.<br />
In short: without null values, either PCA or NMDS is fine.</p>
</li>
</ul>
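<p dir="auto">A tiny worked example may help show why zeros matter for the choice of distance. The three "sites" below are invented numbers: s1 and s2 share no species, while s1 and s3 contain the same two species at different abundances. The bray() helper is a hand-written version of the Bray-Curtis formula for illustration, not a library function.</p>
<pre><code class="language-r"># Invented abundances: rows are sites, columns are species
sites &lt;- rbind(s1 = c(0, 1, 1),
               s2 = c(1, 0, 0),
               s3 = c(0, 4, 4))

# Euclidean distances between sites
euclid &lt;- as.matrix(dist(sites))

# Bray-Curtis dissimilarity written out by hand (illustrative helper)
bray &lt;- function(a, b) sum(abs(a - b)) / sum(a + b)
bc &lt;- outer(1:3, 1:3,
            Vectorize(function(i, j) bray(sites[i, ], sites[j, ])))
dimnames(bc) &lt;- dimnames(euclid)

round(euclid, 2)  # s1-s2 = 1.73, s1-s3 = 4.24, s2-s3 = 5.74
bc                # s1-s2 = 1.0,  s1-s3 = 0.6,  s2-s3 = 1.0
</code></pre>
<p dir="auto">Under Euclidean distance, s1 looks closer to s2 (no species in common) than to s3 (same species). Bray-Curtis reverses that ranking, which is the behaviour usually wanted for OTU tables full of absences.</p>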
]]></description><link>http://an.forum.genostack.com/post/190</link><guid isPermaLink="true">http://an.forum.genostack.com/post/190</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 28 Nov 2020 09:26:09 GMT</pubDate></item></channel></rss>