序列比对研究
-
一,什么是序列比对
https://en.wikipedia.org/wiki/Sequence_alignment
序列比对即alignment,也叫序列对齐。Sequence alignment is the procedure of comparing two (pair-wise alignment) or more(multiple sequence alignment) sequences by searching for a series of individual characters or character patterns that are in the same order in the sequences.二,序列比对的类型
2.1 Pairwise Sequence Alignment (PSA) is used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences (protein or nucleic acid).
https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/Pairwise+Sequence+Alignment
两两比对又分为下面几种类型

全局比对试图对齐整个序列,比较适合相似度高而且长度比较类似的序列。
局部比对试图找出来局部连续的子序列,序列长度不同,序列有共同特征时,比较适合这种方式。

两两比对有如下几种方法:
- Dot matrix analysis
除非序列非常相似,都应该首先使用该方法。因为该方法可以展示所有的对齐可能,包括插入、删除、重复等。

https://www.sanger.ac.uk/tool/seqtools/
seqtools里面有一个叫dotter的工具 可以查看dot矩阵图
2. The dynamic programming (or DP) algorithm
比较费内存 但是很适合找最优对齐
3. Word or k-tuple methods, such as used by the programs FASTA and BLAST
比较适合搜索大型数据库2.2 Multiple Sequence Alignment (MSA) is generally the alignment of three or more biological sequences (protein or nucleic acid) of similar length. From the output, homology can be inferred and the evolutionary relationships between the sequences studied.
By contrast, Pairwise Sequence Alignment tools are used to identify regions of similarity that may indicate functional, structural and/or evolutionary relationships between two biological sequences.https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/Multiple+Sequence+Alignment
三,序列比对的应用
3.1 物种分类
https://help.ezbiocloud.net/pairwise-nucleotide-sequence-alignment/

距离评价(distance score):
mismatches/(matches+mismatches)
3.2 overlap determination in genome sequence assembly[4]
3.3 gene finding and comparison[4]
3.4 protein sequence comparison[4]参考资料:
1.Bioinformatics: Sequence and Genome Analysis
2.https://www.ebi.ac.uk/seqdb/confluence/display/JDSAT/Pairwise+Sequence+Alignment
3.https://www.codeproject.com/Articles/304772/DNA-Sequence-Alignment-using-Dynamic-Programming-A
4.Reducing storage requirements for biological sequence comparison
https://github.com/Peteraya/fer_bioinformatics - Dot matrix analysis
-
Dynamic Programming
Dynamic Programming is mainly an optimization over plain recursion. Wherever we see a recursive solution that has repeated calls for same inputs, we can optimize it using Dynamic Programming. The idea is to simply store the results of subproblems, so that we do not have to re-compute them when needed later. This simple optimization reduces time complexities from exponential to polynomial. For example, if we write simple recursive solution for Fibonacci Numbers, we get exponential time complexity and if we optimize it by storing solutions of subproblems, time complexity reduces to linear.

-