<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Introduction to the BLAST NT and NR database README]]></title><description><![CDATA[<pre><code>                     The BLAST Databases
                Last updated on September 29, 2020
</code></pre>
<p dir="auto">IMPORTANT: As of February 4, 2020, the BLAST databases on the FTP site are version 5 (v5).<br />
At the same time, the set of databases offered has changed. This document reflects those changes.<br />
Information on the features newly enabled by the v5 databases is available at<br />
<a href="https://ftp.ncbi.nlm.nih.gov/blast/db/blastdbv5.pdf" rel="nofollow ugc">https://ftp.ncbi.nlm.nih.gov/blast/db/blastdbv5.pdf</a></p>
<p dir="auto">This document describes the BLAST databases available on the NCBI FTP site under<br />
the /blast/db directory. The direct URL is <a href="ftp://ftp.ncbi.nlm.nih.gov/blast/db" rel="nofollow ugc">ftp://ftp.ncbi.nlm.nih.gov/blast/db</a></p>
<ol>
<li>
<p dir="auto">Quick Start</p>
<ul>
<li>Get all numbered files for a database with the same base name:<br />
Each of these files represents a subset (volume) of that database,<br />
and all of them are needed to reconstitute the database.</li>
<li>After extraction, there is no need to concatenate the resulting files:<br />
Call the database by its base name; for example, for the nr database files, use "-db nr".</li>
<li>For easy download, use the update_blastdb.pl script from the blast+ package.</li>
<li>Incremental updates are not available.</li>
</ul>
</li>
<li>
<p dir="auto">General Introduction</p>
</li>
</ol>
<p dir="auto">BLAST search pages under the Basic BLAST section of the NCBI BLAST home page<br />
(<a href="http://blast.ncbi.nlm.nih.gov/" rel="nofollow ugc">http://blast.ncbi.nlm.nih.gov/</a>) use a standard set of BLAST databases for<br />
nucleotide, protein, and translated BLAST searches.  These databases are made<br />
available as compressed archives (in pre-formatted form) and can be downloaded from<br />
the /db directory of the BLAST ftp site (<a href="ftp://ftp.ncbi.nlm.nih.gov/blast/db/" rel="nofollow ugc">ftp://ftp.ncbi.nlm.nih.gov/blast/db/</a>).<br />
The FASTA files reside under the /FASTA subdirectory.</p>
<p dir="auto">The pre-formatted databases offer the following advantages:</p>
<ul>
<li>Pre-formatting removes the need to run makeblastdb;</li>
<li>Species-level taxonomy ids are included for each database entry;</li>
<li>Databases are broken into smaller volumes and are therefore easier to download;</li>
<li>Sequences in FASTA format can be generated from the pre-formatted databases by using the blastdbcmd utility;</li>
<li>A convenient script (update_blastdb.pl) is available in the blast+ package to download the pre-formatted databases.</li>
</ul>
<p dir="auto">Pre-formatted databases must be downloaded using the update_blastdb.pl script or<br />
via FTP in binary mode. Documentation for this script can be obtained by running<br />
the script without any arguments; a Perl installation is required.</p>
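<p dir="auto">As a rough illustration of the download step, the sketch below only assembles the command line for update_blastdb.pl; it does not run it. The helper name <code>update_blastdb_command</code> is hypothetical, and the <code>--decompress</code> option is assumed to be supported by the installed version of the script (run the script without arguments to check).</p>

```python
import shlex


def update_blastdb_command(db_name, decompress=True, script="update_blastdb.pl"):
    """Build the argument list for downloading one pre-formatted database.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    # The script is a Perl program, so it is invoked through the perl interpreter.
    args = ["perl", script]
    if decompress:
        # Assumption: --decompress unpacks each archive after it is downloaded.
        args.append("--decompress")
    args.append(db_name)
    return args


cmd = update_blastdb_command("nr")
print(shlex.join(cmd))  # prints: perl update_blastdb.pl --decompress nr
```

<p dir="auto">The resulting list can be handed to <code>subprocess.run(cmd, check=True)</code> on a machine where Perl and the blast+ package are installed.</p>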
<p dir="auto">The downloaded compressed files must be inflated with gzip or another decompression<br />
tool. The BLAST database files can then be extracted from the resulting tar<br />
file using the tar utility on Unix/Linux, or with WinZip on Windows and StuffIt Expander<br />
on Macintosh platforms.</p>
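<p dir="auto">Where none of these tools is at hand, the same two steps (inflate, then extract) can be sketched with Python's standard tarfile module. This is a minimal sketch, not part of the BLAST distribution; the function name <code>extract_volume</code> and the file names in the usage comment are hypothetical, and the archive is assumed to come from a trusted source such as the NCBI FTP site.</p>

```python
import tarfile
from pathlib import Path


def extract_volume(archive_path, dest_dir):
    """Unpack one downloaded .tar.gz volume and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" inflates the gzip layer and reads the tar layer in one pass.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        # Only extract archives obtained from a trusted source.
        archive.extractall(dest)
    return sorted(names)


# Hypothetical usage, assuming nr.00.tar.gz is in the current directory:
# extract_volume("nr.00.tar.gz", "blastdb")
```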
<p dir="auto">Large databases are formatted in multiple one-gigabyte volumes, which are named<br />
using the basename.##.tar.gz convention. All volumes with the same base name are<br />
required. An alias file is provided to tie individual volumes together so that<br />
the database can be called using the base name (without the .nal or .pal<br />
extension). For example, to call the est database, simply use the "-db est" option<br />
on the command line (without the quotes).</p>
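<p dir="auto">Because every volume with the same base name is required, a quick check of a download directory can be useful. The sketch below assumes the basename.##.tar.gz convention described above and assumes volumes are numbered consecutively from 0; the function name <code>missing_volumes</code> and the file list are made up for illustration.</p>

```python
import re
from collections import defaultdict

# Matches archive names such as "nr.00.tar.gz": group 1 is the base name,
# group 2 is the volume number.
VOLUME_RE = re.compile(r"^(.+)\.(\d+)\.tar\.gz$")


def missing_volumes(file_names):
    """Map each base name to the volume numbers absent from file_names."""
    found = defaultdict(set)
    for name in file_names:
        match = VOLUME_RE.match(name)
        if match:
            found[match.group(1)].add(int(match.group(2)))
    # Assumption: volumes run consecutively from 0 up to the highest one seen.
    return {
        base: sorted(set(range(max(nums) + 1)) - nums)
        for base, nums in found.items()
    }


files = ["nr.00.tar.gz", "nr.01.tar.gz", "nr.03.tar.gz", "pdbaa.tar.gz"]
print(missing_volumes(files))  # prints: {'nr': [2]}
```

<p dir="auto">A check like this cannot detect missing volumes at the end of the series, which is one reason to prefer update_blastdb.pl for downloads.</p>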
<p dir="auto">For other genomic BLAST databases, please check the genomes ftp directory at:<br />
<a href="ftp://ftp.ncbi.nlm.nih.gov/genomes/" rel="nofollow ugc">ftp://ftp.ncbi.nlm.nih.gov/genomes/</a></p>
<ol start="3">
<li>Contents of the /blast/db/ directory</li>
</ol>
<p dir="auto">The pre-formatted BLAST databases are archived in this directory. The names of<br />
the available databases are listed at <a href="https://github.com/ncbi/blast_plus_docs#blast-databases" rel="nofollow ugc">https://github.com/ncbi/blast_plus_docs#blast-databases</a><br />
It is recommended to use update_blastdb.pl to download databases from the FTP site in order to make<br />
sure that all volumes are downloaded.</p>
<ol start="4">
<li>Contents of the /blast/db/FASTA directory</li>
</ol>
<p dir="auto">This directory contains FASTA formatted sequence files. The file names<br />
and database contents are listed below. These files must be unpacked before<br />
use.  They are provided as a convenience for users needing these sets in<br />
FASTA format.  For use with BLAST, it is preferable to use the pre-formatted BLAST databases<br />
on the FTP site.</p>
<pre><code>+-----------------------+-----------------------------------------------------+
| File Name             | Content Description                                 |
+-----------------------+-----------------------------------------------------+
| nr.gz*                | non-redundant protein sequence database with        |
|                       | entries from GenPept, Swissprot, PIR, PRF, PDB,     |
|                       | and RefSeq                                          |
| nt.gz*                | nucleotide sequence database, with entries from all |
|                       | traditional divisions of GenBank, EMBL, and DDBJ;   |
|                       | excluding bulk divisions (gss, sts, pat, est, htg)  |
|                       | and wgs entries. Partially non-redundant.           |
| pdbaa.gz*             | protein sequences from pdb protein structures       |
| swissprot.gz*         | swiss-prot database (last major release)            |
+-----------------------+-----------------------------------------------------+
</code></pre>
<p dir="auto">NOTE:<br />
(1) For screening for vector contamination, use the UniVec database:<br />
<a href="ftp://ftp.ncbi.nlm.nih.gov/pub/UniVec/" rel="nofollow ugc">ftp://ftp.ncbi.nlm.nih.gov/pub/UniVec/</a></p>
<p dir="auto">* Files marked with an asterisk have pre-formatted counterparts.</p>
<ol start="5">
<li>Database updates</li>
</ol>
<p dir="auto">The BLAST databases are updated regularly. There is no established incremental<br />
update scheme. We recommend downloading the complete databases regularly to<br />
keep their content current.</p>
<ol start="6">
<li>Non-redundant defline syntax</li>
</ol>
<p dir="auto">The non-redundant databases are nr, nt and pataa. Identical sequences are<br />
merged into one entry in these databases. To be merged, two sequences must<br />
have identical lengths and every residue at every position must be the<br />
same.  The FASTA deflines for the different entries that belong to one<br />
record are separated by control-A characters (shown as ^A below), which are invisible to most<br />
programs. In the example below, the entries Q57293.1, AAB05030.1 and AAB17216.1<br />
have exactly the same sequence:</p>
<pre><code>&gt;Q57293.1 RecName: Full=Fe(3+) ions import ATP-binding protein FbpC ^AAAB05030.1 afuC
[Actinobacillus pleuropneumoniae] ^AAAB17216.1 afuC [Actinobacillus pleuropneumoniae]
MNNDFLVLKNITKSFGKATVIDNLDLVIKRGTMVTLLGPSGCGKTTVLRLVAGLENPTSGQIFIDGEDVTKSSIQNRDIC
IVFQSYALFPHMSIGDNVGYGLRMQGVSNEERKQRVKEALELVDLAGFADRFVDQISGGQQQRVALARALVLKPKVLILD
EPLSNLDANLRRSMREKIRELQQRLGITSLYVTHDQTEAFAVSDEVIVMNKGTIMQKARQKIFIYDRILYSLRNFMGEST
ICDGNLNQGTVSIGDYRFPLHNAADFSVADGACLVGVRPEAIRLTATGETSQRCQIKSAVYMGNHWEIVANWNGKDVLIN
ANPDQFDPDATKAFIHFTEQGIFLLNKE
</code></pre>
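<p dir="auto">To make the control-A convention concrete, here is a minimal sketch of splitting a merged defline into its individual entries. The function name <code>split_defline</code> is hypothetical; the defline is the one from the example above, written on a single line with real control-A (0x01) characters in place of ^A.</p>

```python
def split_defline(defline):
    """Split a merged nr defline into (accession.version, title) pairs."""
    # Drop the leading ">" if present, then split on control-A (0x01).
    entries = defline.lstrip(">").split("\x01")
    pairs = []
    for entry in entries:
        # The identifier is everything up to the first space.
        accession, _, title = entry.strip().partition(" ")
        pairs.append((accession, title.strip()))
    return pairs


line = (
    ">Q57293.1 RecName: Full=Fe(3+) ions import ATP-binding protein FbpC "
    "\x01AAB05030.1 afuC [Actinobacillus pleuropneumoniae] "
    "\x01AAB17216.1 afuC [Actinobacillus pleuropneumoniae]"
)
for accession, title in split_defline(line):
    print(accession, "|", title)
```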
<p dir="auto">Individual sequences are now identified simply by their accession.version.</p>
<p dir="auto">For databases whose entries are not from official NCBI sequence databases,<br />
such as the Trace database, the gnl| convention is used. Custom databases<br />
should follow this convention, and the id of each sequence must be<br />
unique, in order to take advantage of database indexing,<br />
which enables retrieval of specific sequences with the blastdbcmd program included<br />
in the blast executable package.  Refer to the documents<br />
distributed in the standalone BLAST package for more details.</p>
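<p dir="auto">For a custom database, the uniqueness requirement can be enforced while the FASTA deflines are written. The sketch below assumes the general form gnl|database|identifier; the function name <code>gnl_deflines</code>, the tag "mylab" and the ids "seq1" and "seq2" are hypothetical.</p>

```python
def gnl_deflines(db_tag, records):
    """Return FASTA deflines of the form >gnl|db_tag|local_id title."""
    seen = set()
    lines = []
    for local_id, title in records:
        if local_id in seen:
            # Duplicate ids would make indexed retrieval ambiguous.
            raise ValueError(f"duplicate sequence id: {local_id}")
        seen.add(local_id)
        lines.append(f">gnl|{db_tag}|{local_id} {title}")
    return lines


records = [("seq1", "hypothetical protein A"), ("seq2", "hypothetical protein B")]
for line in gnl_deflines("mylab", records):
    print(line)
```

<p dir="auto">Once such a file has been formatted with -parse_seqids, a single sequence could then be requested with something like <code>blastdbcmd -db mydb -entry 'gnl|mylab|seq1'</code>, where mydb is again a placeholder name.</p>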
<ol start="7">
<li>Formatting a FASTA file into a BLASTable database</li>
</ol>
<p dir="auto">FASTA files need to be formatted with makeblastdb before they can be used in a local<br />
BLAST search. For those from NCBI, the following makeblastdb commands are<br />
recommended:</p>
<p dir="auto">For nucleotide fasta file:   makeblastdb -in input_db -dbtype nucl -parse_seqids<br />
For protein fasta file:      makeblastdb -in input_db -dbtype prot -parse_seqids</p>
<p dir="auto">In general, if the database is available as a BLAST database, it is better to use the<br />
pre-formatted database.</p>
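<p dir="auto">When formatting has to be scripted, the two recommended commands differ only in the -dbtype value. The sketch below builds the argument list without running makeblastdb; the helper name <code>makeblastdb_command</code> is hypothetical and input_db is the placeholder file name used above.</p>

```python
import shlex


def makeblastdb_command(fasta_path, dbtype):
    """Build the makeblastdb argument list for an NCBI FASTA file."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    # -parse_seqids keeps the sequence ids so entries can be retrieved later.
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-parse_seqids"]


print(shlex.join(makeblastdb_command("input_db", "prot")))
# prints: makeblastdb -in input_db -dbtype prot -parse_seqids
```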
<ol start="8">
<li>Technical Support</li>
</ol>
<p dir="auto">Questions and comments on this document and NCBI BLAST related questions<br />
should be sent to the blast-help group at:<br />
<a href="mailto:blast-help@ncbi.nlm.nih.gov" rel="nofollow ugc">blast-help@ncbi.nlm.nih.gov</a></p>
<p dir="auto">For information about other NCBI resources/services, please send email to<br />
NCBI User Service at:<br />
<a href="mailto:info@ncbi.nlm.nih.gov" rel="nofollow ugc">info@ncbi.nlm.nih.gov</a></p>
]]></description><link>http://an.forum.genostack.com/topic/263/blast-nt-nr库readme介绍</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 10:41:20 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/263.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 29 Mar 2021 09:27:13 GMT</pubDate><ttl>60</ttl></channel></rss>