blast

anneng

https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/algo/blast/
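BLAST database format:
.nhr is the header file,
.nin is the index file,
.nsq is the sequence file
(these are the nucleotide database extensions; protein databases use .phr, .pin and .psq)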
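This is a quick check that a local database has its three core files. It is a minimal bash sketch that assumes a single-volume database whose files share one base name. check_blastdb is a hypothetical helper and is not part of BLAST+. Nucleotide databases use .nhr/.nin/.nsq and protein databases use .phr/.pin/.psq. Large databases can be split into numbered volumes, and a database may also contain other auxiliary files, so this sketch checks only the three core files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that the three core files of a single-volume
# BLAST database exist. Usage: check_blastdb <base_name> <nucl|prot>
# Returns 0 if all three files are present, 1 if any is missing.
check_blastdb() {
  local base=$1 dbtype=$2 prefix ext missing=0
  case $dbtype in
    nucl) prefix=n ;;   # .nhr header, .nin index, .nsq sequence
    prot) prefix=p ;;   # .phr header, .pin index, .psq sequence
    *) echo "unknown db type: $dbtype" >&2; return 2 ;;
  esac
  for ext in hr in sq; do
    if [[ ! -f "$base.$prefix$ext" ]]; then
      echo "missing: $base.$prefix$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example: check_blastdb mydb nucl && echo ok (mydb is a placeholder base name).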
https://www.biostars.org/p/111501/

anneng

BLAST database format
https://www.yumpu.com/en/document/view/31537242/ncbi-blast-database-format-janelia-farm-research-campus
NCBI BLAST Database Format - Janelia Farm Research Campus.pdf

anneng

http://sequenceserver.com
A BLAST front end with a friendlier interface

anneng

https://www.ncbi.nlm.nih.gov/IEB/ToolBox/SDKDOCS/DATAMODL.HTML
NCBI data model

anneng

https://open.oregonstate.education/computationalbiology/chapter/command-line-blast/

anneng

BLASTn (Nucleotide BLAST): compares one or more nucleotide query sequences to a subject nucleotide sequence or a database of nucleotide sequences. This is useful when trying to determine the evolutionary relationships among different organisms.
BLASTx (translated nucleotide sequence searched against protein sequences): compares a nucleotide query sequence that is translated in six reading frames (resulting in six protein sequences) against a database of protein sequences. Because blastx translates the query sequence in all six reading frames and provides combined significance statistics for hits to different frames, it is particularly useful when the reading frame of the query sequence is unknown or it contains errors that may lead to frame shifts or other coding errors. Thus blastx is often the first analysis performed with a newly determined nucleotide sequence.
tBLASTn (protein sequence searched against translated nucleotide sequences): compares a protein query sequence against the six-frame translations of a database of nucleotide sequences. tblastn is useful for finding homologous protein coding regions in unannotated nucleotide sequences such as expressed sequence tags (ESTs) and draft genome records (HTG), located in the BLAST databases est and htgs, respectively. ESTs are short, single-read cDNA sequences. They comprise the largest pool of sequence data for many organisms and contain portions of transcripts from many uncharacterized genes. Since ESTs have no annotated coding sequences, there are no corresponding protein translations in the BLAST protein databases. Hence a tblastn search is the only way to search for these potential coding regions at the protein level. The HTG sequences, draft sequences from various genome projects or large genomic clones, are another large source of unannotated coding regions.
BLASTp (Protein BLAST): compares one or more protein query sequences to a subject protein sequence or a database of protein sequences. This is useful when trying to identify a protein.
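blastx and tblastn both rely on translating nucleotide sequence in six reading frames: three offsets on the given strand and three on its reverse complement. Below is a minimal bash sketch of that step. It assumes uppercase A/C/G/T input and the standard genetic code (NCBI translation table 1). The function names are hypothetical and this is not BLAST's own code. BLAST itself can use other genetic codes (blastx -query_gencode, tblastn -db_gencode).

```bash
#!/usr/bin/env bash
# Sketch of six-frame translation with the standard genetic code (table 1).
# Amino acids for all 64 codons, indexed as 16*b1 + 4*b2 + b3 with
# T=0, C=1, A=2, G=3; '*' marks a stop codon.
CODE='FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

translate() {  # translate from offset 0; a trailing partial codon is ignored
  local seq=$1 protein="" order=TCAG i j b rest idx bad
  for ((i = 0; i + 3 <= ${#seq}; i += 3)); do
    idx=0 bad=0
    for ((j = 0; j < 3; j++)); do
      b=${seq:i+j:1}
      rest=${order%%"$b"*}                    # text before b in "TCAG"
      if (( ${#rest} == 4 )); then bad=1; fi  # b is not T/C/A/G
      idx=$(( idx * 4 + ${#rest} ))
    done
    if (( bad )); then protein+=X; else protein+=${CODE:idx:1}; fi
  done
  printf '%s\n' "$protein"
}

revcomp() {  # reverse complement; characters other than A/C/G/T become N
  local seq=$1 out="" i
  for ((i = ${#seq} - 1; i >= 0; i--)); do
    case ${seq:i:1} in
      A) out+=T ;; C) out+=G ;; G) out+=C ;; T) out+=A ;; *) out+=N ;;
    esac
  done
  printf '%s\n' "$out"
}

six_frames() {  # print frames +1,+2,+3 then -1,-2,-3, one per line
  local seq=$1 rc f
  rc=$(revcomp "$seq")
  for f in 0 1 2; do
    printf '%s\t%s\n' "+$((f + 1))" "$(translate "${seq:f}")"
  done
  for f in 0 1 2; do
    printf '%s\t%s\n' "-$((f + 1))" "$(translate "${rc:f}")"
  done
}
```

Example: six_frames ATGAAATAG prints six lines, the first being "+1" followed by a tab and "MK*".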

anneng

https://www.ncbi.nlm.nih.gov/books/NBK279688/
Building a BLAST database with your (local) sequences

$ makeblastdb -in test.fsa -parse_seqids -blastdb_version 5 -taxid_map test_map.txt -title "Cookbook demo" -dbtype prot
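The -taxid_map file in the command above is plain text with one "<SequenceId> <TaxonomyId>" pair per line. Below is a minimal sketch for the case where every sequence in test.fsa comes from one organism. make_taxid_map is a hypothetical helper. It assumes each defline starts with a simple ID (such as an accession) directly after '>'. More complex FASTA identifiers may be parsed differently by -parse_seqids.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a taxid map that assigns one taxonomy ID to
# every sequence in a FASTA file. Each output line is "<SequenceId> <TaxonomyId>",
# where the ID is the first word of the defline with the leading '>' removed.
make_taxid_map() {
  local fasta=$1 taxid=$2
  awk -v taxid="$taxid" '/^>/ { id = substr($1, 2); print id, taxid }' "$fasta"
}
```

Example: make_taxid_map test.fsa 9606 > test_map.txt (9606, Homo sapiens, is only an example taxid), then run the makeblastdb command.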

anneng

Compare two sequences directly
blastn -query test.fasta -subject ac008901.fasta

Alternatively, build a database from one of the sequences with makeblastdb and then query that database
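Here is the database route for the same pair of files (test.fasta as query, ac008901.fasta as subject). makeblastdb turns the subject into a nucleotide database, and blastn then searches it with -db instead of -subject. This is a sketch. search_against_subject and the output file name are illustrative. -outfmt 6 (tabular output) is just one of several output formats.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: build a nucleotide BLAST database from the subject
# FASTA, then search the query against it with blastn.
# Usage: search_against_subject <query.fasta> <subject.fasta>
search_against_subject() {
  local query=$1 subject=$2
  local db=${subject%.*}   # database base name, e.g. ac008901
  makeblastdb -in "$subject" -dbtype nucl -out "$db" || return
  # -outfmt 6: one tab-separated line per hit (HSP)
  blastn -query "$query" -db "$db" -outfmt 6 -out "${db}_hits.tsv"
}
```

Example: search_against_subject test.fasta ac008901.fasta. Building a database mainly pays off when the same subject is searched many times. For a one-off pairwise comparison, -subject avoids creating the extra files.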

anneng

BLAST algorithm workflow:

https://www.sciencedirect.com/science/article/pii/S1877050918301108
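BLAST follows a seed-and-extend strategy. It first finds short word matches between query and subject: exact words for nucleotides, high-scoring neighborhood words for proteins. It then extends those seeds into alignments and scores them. Below is a toy bash sketch of the nucleotide seeding step only. find_seeds is hypothetical and needs bash 4+ for associative arrays. Real blastn uses much longer words (11 for -task blastn, 28 for the default megablast task) and optimized lookup tables, and the extension steps are not shown.

```bash
#!/usr/bin/env bash
# Toy seeding step: report every pair of positions where query and subject
# share an identical word of length k.
# Output lines: query_pos <TAB> subject_pos <TAB> word (0-based positions),
# ordered by subject position.
find_seeds() {
  local query=$1 subject=$2 k=$3
  local -A words=()   # word -> space-separated list of query positions
  local i w qpos
  # 1. Index every k-letter word of the query (the "lookup table").
  for ((i = 0; i + k <= ${#query}; i++)); do
    w=${query:i:k}
    words[$w]+="$i "
  done
  # 2. Scan the subject and emit a seed for each word found in the table.
  for ((i = 0; i + k <= ${#subject}; i++)); do
    w=${subject:i:k}
    if [[ -n ${words[$w]+set} ]]; then
      for qpos in ${words[$w]}; do
        printf '%d\t%d\t%s\n' "$qpos" "$i" "$w"
      done
    fi
  done
}
```

Example: find_seeds ACGTAC TTACGTT 3 reports three seeds (TAC, ACG and CGT). In BLAST, each such seed would then be extended in both directions and kept only if the extension scores well enough.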

anneng

https://open.oregonstate.education/computationalbiology/chapter/command-line-blast/
A command-line BLAST tutorial