@anneng said in "Using KrakenUniq for pathogen analysis":
krakenuniq --report-file res_archaea.tsv --db archaea_db --threads 10 test_archaea.fna
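The file written by `--report-file` is a tab-separated table. Below is a minimal Python sketch for loading it. It assumes the layout seen in KrakenUniq reports: `#`-prefixed metadata lines, then a header line (`%`, `reads`, `taxReads`, `kmers`, `dup`, `cov`, `taxID`, `rank`, `taxName`), then one row per taxon with `taxName` indented by tree depth. Column names are read from the header line rather than hard-coded. `taxa_with_min_kmers` is a hypothetical helper for filtering on the unique k-mer count and is not part of KrakenUniq.

```python
import csv
import io


def parse_krakenuniq_report(text):
    """Parse the text of a KrakenUniq --report-file into a list of dicts.

    Assumes '#'-prefixed metadata lines, then a tab-separated header line,
    then one row per taxon; column names come from the header line.
    """
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        if row.get("taxName") is not None:
            # taxName is indented to show tree depth; drop the padding.
            row["taxName"] = row["taxName"].strip()
        rows.append(row)
    return rows


def taxa_with_min_kmers(rows, min_kmers):
    """Hypothetical filter: (taxID, taxName, kmers) for taxa with >= min_kmers unique k-mers."""
    return [
        (row["taxID"], row["taxName"], int(row["kmers"]))
        for row in rows
        if int(row["kmers"]) >= min_kmers
    ]
```

For example, `taxa_with_min_kmers(parse_krakenuniq_report(open("res_archaea.tsv").read()), 1000)` would list taxa supported by at least 1000 distinct k-mers. The cutoff is illustrative, not a recommended value.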
Parameters for building a database (krakenuniq-build)
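The build runs in 6 steps and can be re-run safely: on each run it checks the files left by earlier runs to decide whether the current step still needs to be executed.
If --jellyfish-hash-size is not specified, the build tries to use all available memory and fails with an error when memory runs out, so it is advisable to set this parameter according to the server's RAM.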
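A custom build can be scripted. Below is a minimal Python sketch that only assembles the `krakenuniq-build` invocations as argument lists: download the taxonomy, add each FASTA to the library, then build with an explicit `--jellyfish-hash-size`. It uses only flags from the help text in this post. It assumes the FASTA sequences can already be mapped to taxonomy IDs by the build (that preparation is not shown). The database name, file name and the `6400M` hash size are placeholders; choose a hash size that fits your server's memory.

```python
import shlex


def krakenuniq_build_commands(db_dir, fasta_files, threads=1, hash_size=None):
    """Return the krakenuniq-build invocations for a custom DB as argv lists.

    hash_size is passed verbatim to --jellyfish-hash-size; pick a value that
    fits the server's RAM (the default None omits the flag).
    """
    commands = [["krakenuniq-build", "--db", db_dir, "--download-taxonomy"]]
    for path in fasta_files:
        commands.append(["krakenuniq-build", "--db", db_dir, "--add-to-library", path])
    build = ["krakenuniq-build", "--db", db_dir, "--build", "--threads", str(threads)]
    if hash_size is not None:
        # Without this flag the build may try to use all available memory.
        build += ["--jellyfish-hash-size", hash_size]
    commands.append(build)
    return commands


# Placeholder names and hash size; print the shell commands instead of running them.
for argv in krakenuniq_build_commands("archaea_db", ["archaea.fna"], threads=10, hash_size="6400M"):
    print(shlex.join(argv))
```

You can pass each list to `subprocess.run(argv, check=True)` to execute the steps in order. Because the build re-checks its intermediate files, re-running the last command after a failure should pick up where it stopped rather than start over.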
Usage: krakenuniq-build [task option] [options]
Task options (exactly one can be selected -- default is build):
--download-taxonomy Download NCBI taxonomic information
--download-library TYPE Download partial library (TYPE = one of "refseq/bacteria", "refseq/archaea", "refseq/viral").
Use krakenuniq-download for more options.
--add-to-library FILE Add FILE to library
--build Create DB from library (requires the taxonomy to be downloaded and
at least one file in the library)
--rebuild Create DB from library like --build, but remove
existing non-library/taxonomy files before build
--clean Remove unneeded files from a built database
--shrink NEW_CT Shrink an existing DB to have only NEW_CT k-mers
--standard Download and create default database, which contains complete genomes
for archaea, bacteria and viruses from RefSeq, as well as viral strains
from NCBI. Specify --taxids-for-genomes and --taxids-for-sequences
separately, if desired.
--help Print this message
--version Print version information
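`--shrink NEW_CT`, together with `--shrink-block-offset` (listed under Options), keeps one k-mer per block. The Python below is a toy model of that description. It assumes the sorted k-mers are cut into `len(kmers) // NEW_CT` consecutive blocks and that any leftover k-mers are dropped. It is not KrakenUniq's implementation, which operates on the on-disk database, and `shrink_select` is a hypothetical name.

```python
def shrink_select(kmers, new_ct, block_offset=1):
    """Toy model of --shrink / --shrink-block-offset (not KrakenUniq's code).

    Splits the sorted k-mer list into new_ct consecutive blocks of
    len(kmers) // new_ct entries and keeps the k-mer that sits
    block_offset positions from the end of each block.
    """
    if not 0 < new_ct <= len(kmers):
        raise ValueError("new_ct must be between 1 and the number of k-mers")
    block_size = len(kmers) // new_ct
    if not 1 <= block_offset <= block_size:
        raise ValueError("block_offset must be between 1 and the block size")
    kept = []
    for start in range(0, block_size * new_ct, block_size):
        kept.append(kmers[start + block_size - block_offset])
    return kept
```

With the default offset of 1, the last k-mer of each block is kept. For example, `shrink_select(list(range(10)), 3)` returns `[2, 5, 8]`.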
Options:
--db DBDIR Kraken DB directory (mandatory except for --help/--version)
--threads # Number of threads (def: 1)
--new-db NAME New Kraken DB name (shrink task only; mandatory
for shrink task)
--kmer-len NUM K-mer length in bp (build/shrink tasks only;
def: 31)
--minimizer-len NUM Minimizer length in bp (build/shrink tasks only;
def: 15)
--jellyfish-hash-size STR Pass a specific hash size argument to jellyfish
when building database (build task only)
--jellyfish-bin STR Use STR as Jellyfish 1 binary.
--max-db-size SIZE Shrink the DB before full build, making sure
database and index together use <= SIZE gigabytes
(build task only)
--shrink-block-offset NUM When shrinking, select the k-mer that is NUM
positions from the end of a block of k-mers
(default: 1)
--work-on-disk Perform most operations on disk rather than in
RAM (will slow down build in most cases)
--taxids-for-genomes Add taxonomy IDs (starting with 1 billion) for genomes.
Only works with 3-column seqid2taxid map with third
column being the name
--taxids-for-sequences Add taxonomy IDs for sequences, starting with 1 billion.
Can be useful to resolve classifications with multiple genomes
for one taxonomy ID.
--min-contig-size NUM Minimum contig size for inclusion in database.
Use with draft genomes to reduce contamination, e.g. with values between 1000 and 10000.
--library-dir DIR Use DIR for reference sequences instead of DBDIR/library.
--taxonomy-dir DIR Use DIR for taxonomy instead of DBDIR/taxonomy.
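`--kmer-len` (default 31) and `--minimizer-len` (default 15) work together. Each k-mer is associated with a shorter minimizer, its smallest l-mer under some ordering, and Kraken-style databases use minimizers to group related k-mers for faster lookup. Below is a simplified Python sketch that assumes plain lexicographic order on a single strand; the real tool's ordering and strand handling are not reproduced. `group_by_minimizer` is a hypothetical helper.

```python
def minimizer(kmer, m=15):
    """Smallest length-m substring of kmer in plain lexicographic order.

    Simplification: single strand, lexicographic ordering; the real tool's
    ordering is different.
    """
    if not 0 < m <= len(kmer):
        raise ValueError("minimizer length must be between 1 and the k-mer length")
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def group_by_minimizer(kmers, m=15):
    """Bucket k-mers by minimizer, illustrating how minimizers cluster related k-mers."""
    buckets = {}
    for kmer in kmers:
        buckets.setdefault(minimizer(kmer, m), []).append(kmer)
    return buckets
```

With the defaults, a 31-mer has 31 - 15 + 1 = 17 candidate 15-mers, and the minimizer is the smallest of them.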
Experimental:
--uid-database Build a UID database (default: no)
--lca-database Build an LCA database (default: yes)
--no-lca-database Do not build an LCA database
--lca-order DIR Impose a hierarchical order for setting LCAs; repeat the option
(--lca-order DIR1 --lca-order DIR2 ...) to give several directories.
The directories must be specified relative to the library directory
(DBDIR/library). When setting the LCAs, k-mers from sequences in
DIR1 are set first, then only still-unset k-mers are set from
DIR2, and so on, and finally from the whole library.
Use this option when including low-confidence draft genomes,
e.g. use --lca-order Complete_Genome --lca-order Chromosome to
prioritize more complete assemblies.
Keep in mind that this option takes considerably longer.
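The `--lca-order` rule can be illustrated on a toy taxonomy. In this Python sketch (hypothetical names, not KrakenUniq's code), `parent` maps each taxid to its parent, with the root mapping to itself. Each library directory maps k-mers to the taxids of the sequences that contain them. K-mers are labelled from the first listed directory that contains them, and any k-mer still unset gets the LCA over the whole library.

```python
from functools import reduce


def lca(a, b, parent):
    """Lowest common ancestor of taxids a and b; parent maps child -> parent, root -> itself."""
    ancestors = {a}
    while parent[a] != a:
        a = parent[a]
        ancestors.add(a)
    while b not in ancestors:
        b = parent[b]
    return b


def lca_of_all(taxids, parent):
    # LCA is associative and commutative, so the iteration order of the set is irrelevant.
    return reduce(lambda x, y: lca(x, y, parent), taxids)


def ordered_lca(kmer_hits_by_dir, dir_order, parent):
    """Toy model of --lca-order.

    kmer_hits_by_dir maps directory -> {kmer: set of taxids}. K-mers are set
    from the directories in dir_order first; only unset k-mers are then set
    from the whole library.
    """
    labels = {}
    for directory in dir_order:
        for kmer, taxids in kmer_hits_by_dir.get(directory, {}).items():
            if kmer not in labels:
                labels[kmer] = lca_of_all(taxids, parent)
    whole_library = {}
    for hits in kmer_hits_by_dir.values():
        for kmer, taxids in hits.items():
            whole_library.setdefault(kmer, set()).update(taxids)
    for kmer, taxids in whole_library.items():
        if kmer not in labels:
            labels[kmer] = lca_of_all(taxids, parent)
    return labels
```

Suppose a k-mer is found only under taxid 10 in `Complete_Genome`. It keeps label 10 even if `Chromosome` also places it under taxid 11, whereas a plain LCA over both directories would move it up to their common parent.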
Parameters for analyzing data with krakenuniq
Usage: krakenuniq --report-file FILENAME [options] <filename(s)>
Options:
--db NAME Name for Kraken DB (default: none)
--threads NUM Number of threads (default: 1)
--fasta-input Input is FASTA format
--fastq-input Input is FASTQ format
--gzip-compressed Input is gzip compressed
--bzip2-compressed Input is bzip2 compressed
--hll-precision INT Precision for HyperLogLog k-mer cardinality estimation, between 10 and 18 (default: 12)
--exact Compute exact cardinality instead of estimate (slower, requires memory proportional to cardinality!)
--quick Quick operation (use first hit or hits)
--min-hits NUM In quick operation, number of hits required for classification
NOTE: this is ignored if --quick is not specified
--unclassified-out FILENAME
Print unclassified sequences to filename
--classified-out FILENAME
Print classified sequences to filename
--output FILENAME Print output to filename (default: stdout); "off" will
suppress normal output
--only-classified-output
Print no Kraken output for unclassified sequences
--preload Loads DB into memory before classification
--paired The two filenames provided are paired-end reads
--check-names Ensure each pair of reads has names that agree
with each other; ignored if --paired is not specified
--help Print this message
--version Print version information
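`--hll-precision` sets the size of the HyperLogLog sketch used to estimate how many distinct k-mers support each taxon. Precision p means 2^p registers and a relative standard error of roughly 1.04/sqrt(2^p), which is about 1.6% at the default p = 12. The plain HyperLogLog below is in Python for illustration only; KrakenUniq's own estimator includes refinements not reproduced here. `--exact` avoids the estimate by keeping every distinct k-mer, which is why its memory grows with the cardinality.

```python
import hashlib
import math


class HyperLogLog:
    """Plain HyperLogLog with 2**p registers; p plays the role of --hll-precision."""

    def __init__(self, p=12):
        if not 10 <= p <= 18:
            raise ValueError("precision must be between 10 and 18")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")             # 64-bit hash
        idx = h >> (64 - self.p)                      # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros in rest, plus 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


def relative_standard_error(p):
    return 1.04 / math.sqrt(1 << p)
```

Adding the same k-mer twice leaves the registers unchanged, so duplicates do not inflate the count. Memory stays at 2^p registers no matter how many k-mers are added.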
Experimental:
--uid-mapping Map using UID database
If none of the *-input or *-compressed flags are specified, and the
file is a regular file, automatic format detection is attempted.
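Below is a rough Python sketch of the kind of automatic detection described above. It assumes detection relies on the gzip (`1f 8b`) and bzip2 (`BZh`) magic bytes and on the first record character (`>` for FASTA, `@` for FASTQ); it is not KrakenUniq's detection code. `explicit_flags` (hypothetical) turns the guess into the explicit flags. That is useful when input comes from something other than a regular file, such as a pipe, where detection is not attempted.

```python
import bz2
import gzip


def detect_format(path):
    """Guess (compression, sequence format) from a file's leading bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(3)
    if magic[:2] == b"\x1f\x8b":
        compression, opener = "gzip", gzip.open
    elif magic == b"BZh":
        compression, opener = "bzip2", bz2.open
    else:
        compression, opener = None, open
    # Read the first decompressed byte: '>' starts FASTA, '@' starts FASTQ.
    with opener(path, "rb") as fh:
        first = fh.read(1)
    fmt = {b">": "fasta", b"@": "fastq"}.get(first)
    return compression, fmt


def explicit_flags(path):
    """Map the guess onto krakenuniq's --*-input / --*-compressed flags."""
    compression, fmt = detect_format(path)
    flags = []
    if fmt:
        flags.append(f"--{fmt}-input")
    if compression:
        flags.append(f"--{compression}-compressed")
    return flags
```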