Mutation analysis
-
-
-
https://www.sciencedirect.com/science/article/pii/S1879625721000651#bib0070
Quantitative measures of within-host viral genetic diversity

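This article gives a detailed introduction to metrics of viral genetic diversity, with worked examples. Pay particular attention to Shannon entropy, which is used in two settings: per nucleotide position and per haplotype.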


-
An integrated software for virus community sequencing data analysis

An integrated software package for studying virus populations; it also computes many diversity metrics.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4944299/
Sequence analysis. Reads were aligned to either a PR8 or a WSN33 reference sequence using Bowtie2 (44). The alignments were sorted, and PCR duplicates were removed using Picard (http://broadinstitute.github.io/picard/). Variants were called using either DeepSNV (26) or LoFreq (28) and filtered using the Pysam module in Python and custom R scripts available for download at https://github.com/lauringlab/Benchmarking_paper. Bases with a Phred score of <30 were masked in the DeepSNV analysis. We connected all of these steps into an analytical pipeline using bpipe (45), which is available for download at https://github.com/lauringlab/variant_pipeline. To save memory during SNV processing, only variants with P values of <0.9 were included in our receiver operating characteristic (ROC) curve analysis, as the vast majority of true negatives are trivial to identify and have a P value of 1. For ease of viewing, and to account for this analytical artifact, we extended the ROC curves horizontally from the last observed change in sensitivity. All the commands required to generate the figures are available for anonymous download at https://github.com/lauringlab/Benchmarking_paper. An interactive Shiny app of our benchmarking work can be downloaded at https://github.com/lauringlab/Benchmarking_shiny.
Diversity metrics. The Shannon entropy (H) of each genomic position was calculated as follows: $H = -\sum_{i=1}^{n} x_i \ln(x_i)$, where $x_i$ represents the frequency of the $i$th allele and $n$ represents the number of alleles found at the given position. Since our data do not represent haplotypes, we report Shannon's entropy as the mean across all genomic positions.
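As a note to self, the per-position entropy and its genome-wide mean described above can be sketched roughly as follows. This is only an illustration and not the authors' code: the function names `site_entropy` and `mean_entropy` and the input layout (one list of allele frequencies per position) are my own assumptions.

```python
import math


def site_entropy(freqs):
    """Shannon entropy (natural log) of the allele frequencies at one position.

    `freqs` is assumed to be a list of frequencies that sum to 1.
    """
    # Alleles with zero frequency are skipped: x * ln(x) tends to 0 as x -> 0.
    return sum(-x * math.log(x) for x in freqs if x > 0)


def mean_entropy(per_site_freqs):
    """Mean of the per-position entropies across all positions supplied."""
    return sum(site_entropy(f) for f in per_site_freqs) / len(per_site_freqs)


# Hypothetical toy input: one invariant site and one site with a 50/50 split.
toy_sites = [[1.0], [0.5, 0.5]]
print(mean_entropy(toy_sites))
```

An invariant site contributes zero entropy, so the mean is pulled down by the many positions that carry no minor variants; that is worth keeping in mind when comparing samples with different numbers of covered positions.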
The L1 norm (L) between 2 populations was calculated as follows: $L = \sum_{i=1}^{n} |p_i - q_i|$, where $n$ represents the union of variants between the two samples and $p_i$ and $q_i$ represent the frequencies of the $i$th variant in each sample.
Data set accession number. All raw fastq files have been submitted to the Sequence Read Archive (SRA) under BioProject accession number PRJNA317621.
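For the L1 norm between two populations described above, a minimal sketch might look like the following. It assumes each sample is stored as a dict mapping a variant label to its frequency, and that a variant missing from one sample has frequency 0 there; the function name `l1_norm` and the variant labels are hypothetical and not taken from the paper or its repositories.

```python
def l1_norm(p, q):
    """Sum of absolute frequency differences over the union of variants.

    `p` and `q` map variant labels to frequencies; a variant absent from
    one sample is treated as having frequency 0 in that sample (assumption).
    """
    variants = set(p) | set(q)
    return sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in variants)


# Hypothetical variant labels and frequencies, for illustration only.
sample_a = {"A123G": 0.10, "C45T": 0.30}
sample_b = {"A123G": 0.25, "G7A": 0.05}
print(l1_norm(sample_a, sample_b))
```

Taking the union of variants means that a variant found in only one sample still contributes its full frequency to the distance.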
-
-


