Taxonomic annotation
-
https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.13095
Identification of the taxonomic origin of a DNA sequence is crucial for many sequencing projects, e.g. metagenomics studies, identification of contaminations in whole genome sequencing projects and filtering of organisms of interest in marker‐gene based community analyses.
Last common ancestor algorithms are powerful approaches to estimate the taxonomy of a given sequence and have been widely used for classification of next‐generation sequencing (NGS) reads, also known as 2nd generation sequencing reads.
Here, we present BASTA (https://github.com/timkahlke/BASTA), a basic sequence taxonomy annotator, which extends last common ancestor estimations from sequencing reads to any kind of nucleotide or amino acid sequence utilizing NCBI taxonomies of user‐defined best hits.
BASTA can be configured to use the output of many common sequence comparison tools, e.g. BLAST and Diamond, in conjunction with either provided or user‐defined target sequence databases.
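To make the last common ancestor idea concrete, here is a minimal sketch. It is not BASTA's actual implementation. It assumes the best hits have already been mapped to taxon IDs and that the taxonomy is available as child-to-parent links; the names `PARENT`, `lineage` and `last_common_ancestor` and all IDs in the toy tree are made up for illustration.

```python
from typing import Dict, List, Optional

# Toy taxonomy as child -> parent links. All IDs are made up for illustration;
# only the convention that the root is its own parent follows NCBI nodes.dmp.
PARENT: Dict[int, int] = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 2}


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Return the path from the root down to `taxid`."""
    path = [taxid]
    while parent[path[-1]] != path[-1]:  # stop once the root is reached
        path.append(parent[path[-1]])
    return path[::-1]


def last_common_ancestor(taxids: List[int], parent: Dict[int, int]) -> Optional[int]:
    """Return the deepest taxon shared by the lineages of all `taxids`."""
    if not taxids:
        return None
    lineages = [lineage(t, parent) for t in taxids]
    lca = None
    # Walk all lineages in parallel from the root; stop at the first disagreement.
    for nodes in zip(*lineages):
        if len(set(nodes)) != 1:
            break
        lca = nodes[0]
    return lca


if __name__ == "__main__":
    print(last_common_ancestor([4, 5], PARENT))  # siblings under 3 -> 3
    print(last_common_ancestor([4, 6], PARENT))  # lineages diverge below 2 -> 2
```

The more the best hits disagree, the higher in the tree the assignment lands, which is why LCA assignments tend to be conservative.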

-
https://webcache.googleusercontent.com/search?q=cache:fJfbDasHie8J:https://nickilottmetagenomics.wordpress.com/2015/01/19/comparing-diamond-lca-against-kraken/+&cd=10&hl=en&ct=clnk&gl=us
DIAMOND + LCA outperforms Kraken in the sensitivity stakes by ~10 fold when assigning reads from mouse gut to genera. Accuracy of these assignments is not assessed, however.
DIAMOND+LCA
DIAMOND (version 0.3.9) was run using the following command:
diamond blastx --db /ifs/mirror/diamond/nr --query <input.fastq> -v 2 --threads 16 -o <diamond.output.tsv>
LCA mapper (from mtools, MEGAN5) was run using:
lcamapper.sh -i <diamond.output.tsv> -f Detect -ms 50 -me 0.01 -tp 50 -gt megan/gi_taxid_prot.bin -o <lca.output>
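A rough sketch of the kind of hit filtering that precedes the LCA step. Two assumptions here are not confirmed by the post: that the DIAMOND output is 12-column BLAST tabular format (e-value in column 11, bit score in column 12), and that `-ms`, `-me` and `-tp` correspond to MEGAN's min score, max expected and top percent settings. `filter_hits` and the sample rows are hypothetical; this is not MEGAN's code.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def filter_hits(tsv_text: str, min_score: float = 50.0,
                max_evalue: float = 0.01, top_percent: float = 50.0) -> Dict[str, List[str]]:
    """Keep, per query, the subjects that pass the score/e-value thresholds
    and whose bit score lies within `top_percent` of that query's best score."""
    hits = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip malformed or empty lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if bitscore >= min_score and evalue <= max_evalue:
            hits[query].append((subject, bitscore))
    kept = {}
    for query, query_hits in hits.items():
        best = max(score for _, score in query_hits)
        cutoff = best * (1 - top_percent / 100.0)
        kept[query] = [subject for subject, score in query_hits if score >= cutoff]
    return kept


# Made-up rows: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ("read1", "protA", "95.0", "100", "5", "0", "1", "300", "1", "100", "1e-50", "200"),
    ("read1", "protB", "80.0", "100", "20", "0", "1", "300", "1", "100", "1e-30", "120"),
    ("read1", "protC", "60.0", "100", "40", "0", "1", "300", "1", "100", "1e-10", "80"),
    ("read1", "protD", "40.0", "100", "60", "0", "1", "300", "1", "100", "1e-3", "40"),
    ("read2", "protE", "50.0", "100", "50", "0", "1", "300", "1", "100", "0.5", "60"),
])

if __name__ == "__main__":
    print(filter_hits(SAMPLE))
```

The subjects kept for each read would then be mapped to taxon IDs and passed to an LCA computation.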
-
https://jshleap.github.io/bioinformatics/writting-jMEGAN_notes/
Introduction to MEGAN methods
-
NCBI taxonomy databases
https://www.uppmax.uu.se/resurser/databases/ncbi-taxonomy-databases/
Name                       Source  Notes
taxdump                    NCBI    NCBI taxonomic database, in multiple .dmp files (see taxdump_readme.txt or link)
taxcat                     NCBI    NCBI taxonomic categories, in categories.dmp (see taxcat_readme.txt or link)
taxdump_readme.txt         NCBI    NCBI taxdump file description
taxcat_readme.txt          NCBI    NCBI taxcat file description
gi_taxid_nucl.dmp          NCBI    Mappings of nucleotide GI to taxid (DEPRECATED)
gi_taxid_prot.dmp          NCBI    Mappings of protein GI to taxid (DEPRECATED)
nucl_wgs.accession2taxid   NCBI    TaxID mapping for nucleotide records of type WGS or TSA
nucl_gb.accession2taxid    NCBI    TaxID mapping for nucleotide records not of the above types
prot.accession2taxid       NCBI    TaxID mapping for protein records
pdb.accession2taxid        NCBI    TaxID mapping for PDB protein records
dead_nucl.accession2taxid  NCBI    TaxID mapping for dead nucleotide records
dead_prot.accession2taxid  NCBI    TaxID mapping for dead protein records
dead_wgs.accession2taxid   NCBI    TaxID mapping for dead WGS or TSA records
-
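A small sketch of how one of the accession2taxid files listed in the NCBI taxonomy databases table above could be loaded. It assumes the usual tab-separated layout with a header line `accession`, `accession.version`, `taxid`, `gi`. The function name `load_accession2taxid` and the sample accessions are made up. The real files are very large, so a full in-memory dict is only practical for a subset.

```python
import csv
import io
from typing import Dict, TextIO


def load_accession2taxid(handle: TextIO) -> Dict[str, int]:
    """Map accession.version -> taxid from an accession2taxid-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["accession.version"]: int(row["taxid"]) for row in reader}


# Made-up rows in the assumed column layout (not real accessions).
SAMPLE_TABLE = (
    "accession\taccession.version\ttaxid\tgi\n"
    "ABC00001\tABC00001.1\t562\t0\n"
    "ABC00002\tABC00002.2\t9606\t0\n"
)

if __name__ == "__main__":
    mapping = load_accession2taxid(io.StringIO(SAMPLE_TABLE))
    print(mapping["ABC00001.1"])
```

For a real file, the same function can be called on an open file handle, e.g. `with open("prot.accession2taxid") as fh: mapping = load_accession2taxid(fh)`.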
http://currents.plos.org/treeoflife/index.html%3Fp=395.html
Links NCBI to Wikipedia.
-
https://www.ncbi.nlm.nih.gov/labs/pmc/articles/PMC5978398/
BLAST-based validation of metagenomic sequence assignments
Uses BLAST to validate and correct the taxonomic assignments of metagenomic sequences (e.g. those made by Kraken).
-
https://www.future-science.com/doi/10.2144/000114135
TUIT, a BLAST-based tool for taxonomic classification of nucleotide sequences
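This paper notes that 16S is the gold standard for human microbiome studies and that taxonomic annotation is mostly done by classification against the RDP database. However, RDP lacks many of the reference sequences present in NT, so TUIT can be used for further, complementary classification. TUIT is also applicable to other scenarios.
TUIT is not limited to any particular type of sequence and maintains a high level of specificity for sequences as short as 125 base pairs; it can also classify sequences down to the species level.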


-
https://academic.oup.com/bib/article/19/3/495/2733162
Comparing genome versus proteome-based identification of clinical bacterial isolates
