<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[使用KrakenUniq进行病原分析]]></title><description><![CDATA[<p dir="auto">1.软件安装<br />
conda create -n tax_classifier　//后续的物种分类软件都安装到这个目录下面<br />
conda activate tax_classifier<br />
conda install krakenuniq<br />
2.构建数据库<br />
krakenuniq兼容kraken数据库　不兼容kraken2数据库<br />
krakenuniq-download --db DBDIR --dust microbial-nt</p>
<p dir="auto">//如果只涉及病原的话 建议只做微生物相关的数据库 -R表示用rsync下载 建议使用这种方式<br />
krakenuniq-download --db . --taxa "archaea,bacteria,fungi,parasitic_worms,protozoa,viral" --dust --exclude-environmental-taxa microbial-nt -R</p>
<p dir="auto">//构建数据库<br />
krakenuniq-build --db DBDIR --kmer-len 31 --threads 10 --taxids-for-genomes --taxids-for-sequences</p>
<p dir="auto">//序列分析<br />
krakenuniq --report-file res_archaea.tsv --db archaea_db --threads 10  test_archaea.fna</p>
<p dir="auto">//注意 krakenuniq依赖 jellyfish 1<br />
Found jellyfish v2.3.0<br />
Kraken requires jellyfish version 1</p>
]]></description><link>http://an.forum.genostack.com/topic/237/使用krakenuniq进行病原分析</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 14:27:44 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/237.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 06 Mar 2021 04:55:30 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to 使用KrakenUniq进行病原分析 on Fri, 16 Apr 2021 06:25:39 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="http://an.forum.genostack.com/uid/1">@anneng</a> 在 <a href="/post/461">使用KrakenUniq进行病原分析</a> 中说：</p>
<blockquote>
<p dir="auto">krakenuniq --report-file res_archaea.tsv --db archaea_db --threads 10  test_archaea.fna</p>
</blockquote>
<h2>krakenuniq构建数据库参数说明</h2>
<p dir="auto">构建过程中需要6步，可以重复运行，每次运行会检查之前的文件确定是否需要运行当前步骤<br />
不指定--jellyfish-hash-size会使用全部内存，内存不够会报错，建议根据服务器内存大小指定该参数</p>
<pre><code>Usage: krakenuniq-build [task option] [options]

Task options (exactly one can be selected -- default is build):
  --download-taxonomy        Download NCBI taxonomic information
  --download-library TYPE    Download partial library (TYPE = one of "refseq/bacteria", "refseq/archaea", "refseq/viral"). 
                             Use krakenuniq-download for more options.
  --add-to-library FILE      Add FILE to library
  --build                    Create DB from library (requires taxonomy d/l'ed and at 
                             least one file in library)
  --rebuild                  Create DB from library like --build, but remove
                             existing non-library/taxonomy files before build
  --clean                    Remove unneeded files from a built database
  --shrink NEW_CT            Shrink an existing DB to have only NEW_CT k-mers
  --standard                 Download and create default database, which contains complete genomes 
                             for archaea, bacteria and viruses from RefSeq, as well as viral strains 
                             from NCBI. Specify --taxids-for-genomes and --taxids-for-sequences
                             separately, if desired.

  --help                     Print this message
  --version                  Print version information

Options:
  --db DBDIR                 Kraken DB directory (mandatory except for --help/--version)
  --threads #                Number of threads (def: 1)
  --new-db NAME              New Kraken DB name (shrink task only; mandatory
                             for shrink task)
  --kmer-len NUM             K-mer length in bp (build/shrink tasks only;
                             def: 31)
  --minimizer-len NUM        Minimizer length in bp (build/shrink tasks only;
                             def: 15)
  --jellyfish-hash-size STR  Pass a specific hash size argument to jellyfish
                             when building database (build task only)
  --jellyfish-bin STR        Use STR as Jellyfish 1 binary.
  --max-db-size SIZE         Shrink the DB before full build, making sure
                             database and index together use &lt;= SIZE gigabytes
                             (build task only)
  --shrink-block-offset NUM  When shrinking, select the k-mer that is NUM
                             positions from the end of a block of k-mers
                             (default: 1)
  --work-on-disk             Perform most operations on disk rather than in
                             RAM (will slow down build in most cases)
  --taxids-for-genomes       Add taxonomy IDs (starting with 1 billion) for genomes.
                             Only works with 3-column seqid2taxid map with third 
                             column being the name
  --taxids-for-sequences     Add taxonomy IDs for sequences, starting with 1 billion.
                             Can be useful to resolve classifications with multiple genomes
                             for one taxonomy ID.
  --min-contig-size NUM      Minimum contig size for inclusion in database.
                             Use with draft genomes to reduce contamination, e.g. with values between 1000 and 10000.
  --library-dir DIR          Use DIR for reference sequences instead of DBDIR/library.
  --taxonomy-dir DIR         Use DIR for taxonomy instead of DBDIR/taxonomy.

Experimental:
  --uid-database             Build a UID database (default no)
  --lca-database             Build a LCA database (default yes)
  --no-lca-database          Do not build a LCA database
  --lca-order DIR1           Impose a hierarchical order for setting LCAs.
  --lca-order DIR2           The directories must be specified relative to the libary directory
  ...                        (DBDIR/library). When setting the LCAs, k-mers from sequences in
                             DIR1 will be set first, and only unset k-mers will be set from
                             DIR2, etc, and final from the whole library.
							 Use this option when including low-confidence draft genomes,
                             e.g use --lca-order Complete_Genome --lca-order Chromosome to
                             prioritize more complete assemblies.
                             Keep in mind that this option takes considerably longer.
</code></pre>
<h2>使用krakenuniq分析数据命令参数说明</h2>
<pre><code>Usage: krakenuniq --report-file FILENAME [options] &lt;filename(s)&gt;

Options:
  --db NAME               Name for Kraken DB (default: none)
  --threads NUM           Number of threads (default: 1)
  --fasta-input           Input is FASTA format
  --fastq-input           Input is FASTQ format
  --gzip-compressed       Input is gzip compressed
  --bzip2-compressed      Input is bzip2 compressed
  --hll-precision INT     Precision for HyperLogLog k-mer cardinality estimation, between 10 and 18 (default: 12)
  --exact                 Compute exact cardinality instead of estimate (slower, requires memory proportional to cardinality!)
  --quick                 Quick operation (use first hit or hits)
  --min-hits NUM          In quick op., number of hits req'd for classification
                          NOTE: this is ignored if --quick is not specified
  --unclassified-out FILENAME
                          Print unclassified sequences to filename
  --classified-out FILENAME
                          Print classified sequences to filename
  --output FILENAME       Print output to filename (default: stdout); "off" will
                          suppress normal output
  --only-classified-output
                          Print no Kraken output for unclassified sequences
  --preload               Loads DB into memory before classification
  --paired                The two filenames provided are paired-end reads
  --check-names           Ensure each pair of reads have names that agree
                          with each other; ignored if --paired is not specified
  --help                  Print this message
  --version               Print version information

Experimental:
  --uid-mapping           Map using UID database

If none of the *-input or *-compressed flags are specified, and the 
file is a regular file, automatic format detection is attempted.
</code></pre>
]]></description><link>http://an.forum.genostack.com/post/589</link><guid isPermaLink="true">http://an.forum.genostack.com/post/589</guid><dc:creator><![CDATA[ice-melt]]></dc:creator><pubDate>Fri, 16 Apr 2021 06:25:39 GMT</pubDate></item><item><title><![CDATA[Reply to 使用KrakenUniq进行病原分析 on Thu, 18 Mar 2021 07:39:18 GMT]]></title><description><![CDATA[<p dir="auto">jellyfish的hash太大 还是容易被kill 可以将其设置小一点 这样不会影响后续Kraken的分析精度<br />
If you encounter problems with Jellyfish not being able to allocate enough memory on your system to run the build process, you can supply a smaller hash size to Jellyfish using kraken-build's --jellyfish-hash-size switch. Each space in the hash table uses approximately 6.9 bytes, so using "--jellyfish-hash-size 6400M" will use a hash table size of 6.4 billion spaces and require 44.3 GB of RAM.</p>
<p dir="auto"><a href="https://github.com/DerrickWood/kraken/issues/65" rel="nofollow ugc">https://github.com/DerrickWood/kraken/issues/65</a></p>
<p dir="auto"><a href="http://gensoft.pasteur.fr/docs/kraken/0.10.5-beta/MANUAL.html" rel="nofollow ugc">http://gensoft.pasteur.fr/docs/kraken/0.10.5-beta/MANUAL.html</a><br />
<a href="http://gensoft.pasteur.fr/docs/kraken/0.10.5-beta/MANUAL.html" rel="nofollow ugc">http://gensoft.pasteur.fr/docs/kraken/0.10.5-beta/MANUAL.html</a></p>
]]></description><link>http://an.forum.genostack.com/post/487</link><guid isPermaLink="true">http://an.forum.genostack.com/post/487</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 18 Mar 2021 07:39:18 GMT</pubDate></item><item><title><![CDATA[Reply to 使用KrakenUniq进行病原分析 on Thu, 18 Mar 2021 02:14:53 GMT]]></title><description><![CDATA[<p dir="auto">一个例子<br />
<a href="https://hpc.nih.gov/apps/KrakenUniq.html" rel="nofollow ugc">https://hpc.nih.gov/apps/KrakenUniq.html</a></p>
]]></description><link>http://an.forum.genostack.com/post/484</link><guid isPermaLink="true">http://an.forum.genostack.com/post/484</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 18 Mar 2021 02:14:53 GMT</pubDate></item><item><title><![CDATA[Reply to 使用KrakenUniq进行病原分析 on Thu, 18 Mar 2021 02:11:57 GMT]]></title><description><![CDATA[<p dir="auto">总结：经过技术验证，要顺利使用krakenuniq，建议分别构建物种库，例如细菌，病毒，真菌，古菌等，然后在比对的时候同时使用这些数据库进行比对。</p>
]]></description><link>http://an.forum.genostack.com/post/483</link><guid isPermaLink="true">http://an.forum.genostack.com/post/483</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 18 Mar 2021 02:11:57 GMT</pubDate></item><item><title><![CDATA[Reply to 使用KrakenUniq进行病原分析 on Thu, 18 Mar 2021 02:01:40 GMT]]></title><description><![CDATA[<p dir="auto">add genomes to existing database<br />
kraken类的软件 都不支持给已经编译好的数据库加新序列 需要重建 这个对比较大的序列源的情况几乎很难实现 需要大量的内存<br />
<a href="https://github.com/DerrickWood/kraken2/issues/45" rel="nofollow ugc">https://github.com/DerrickWood/kraken2/issues/45</a><br />
I would delete the *.k2d files and the seqid2taxid.map file and use this:<br />
kraken2-build --add-to-library $NEW_FILE --db .<br />
and then rebuild. You cannot add in new genomes once the database is built but you can just add more to the library before building.</p>
<p dir="auto">There is no current way to combine builds due to the way that the database is stored in memory.</p>
]]></description><link>http://an.forum.genostack.com/post/482</link><guid isPermaLink="true">http://an.forum.genostack.com/post/482</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 18 Mar 2021 02:01:40 GMT</pubDate></item><item><title><![CDATA[Reply to 使用KrakenUniq进行病原分析 on Thu, 18 Mar 2021 01:56:06 GMT]]></title><description><![CDATA[<p dir="auto">krakenuniq --db HOST --db PROK --db EUK_DRAFT<br />
Hierarchical read classification with multiple databases<br />
krakenuniq 支持多个数据库的比对  利用这种方式可以分别单独构建数据库</p>
]]></description><link>http://an.forum.genostack.com/post/481</link><guid isPermaLink="true">http://an.forum.genostack.com/post/481</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Thu, 18 Mar 2021 01:56:06 GMT</pubDate></item><item><title><![CDATA[Reply to 使用KrakenUniq进行病原分析 on Wed, 17 Mar 2021 12:21:05 GMT]]></title><description><![CDATA[<p dir="auto">krakenuniq 在从nt库提取微生物库（或者任何一个自库）的时候 完全基于taxnomy 和nt 文件来解析  而我们把这些数据导入数据库后 一条sql就完成了</p>
]]></description><link>http://an.forum.genostack.com/post/480</link><guid isPermaLink="true">http://an.forum.genostack.com/post/480</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 17 Mar 2021 12:21:05 GMT</pubDate></item><item><title><![CDATA[Reply to 使用KrakenUniq进行病原分析 on Wed, 17 Mar 2021 06:34:29 GMT]]></title><description><![CDATA[<p dir="auto">/home/bioinfo/miniconda2/envs/tax_classifier/libexec/build_db.sh: line 46: 38880 Killed                  jellyfish count -m 31 -s 166082697099 -C -t 10 -o database /dev/fd/63<br />
Command exited with non-zero status 137</p>
<p dir="auto">目录中有一个nt.fna为370GB多  因此把library-files.txt删掉 重新build 试试  build 命令会自动找library目录下面的文件</p>
<p dir="auto">如果library中的数据比较多 krakenuniq非常耗费内存 中间在jellyfish的步骤会被kill掉 因此需要一个一个的处理library中的数据 ，例如先处理细菌的 再处理病毒的  正常的打印如下：</p>
<p dir="auto">krakenuniq-build --db . --kmer-len 31 --threads 15 --taxids-for-genomes --taxids-for-sequences  --jellyfish-hash-size 5000M<br />
Found jellyfish v1.1.11<br />
Kraken build set to minimize disk writes.<br />
Found 2 sequence files (*.{fna,fa,ffn,fasta,fsa}) in the library directory.<br />
Creating k-mer set (step 1 of 6)...<br />
Using jellyfish<br />
K-mer set created. [2m30.142s]<br />
Skipping step 2, no database reduction requested.<br />
Sorting k-mer set (step 3 of 6)...<br />
db_sort: Getting database into memory ...Loaded database with 801475581 keys with k of 31 [val_len 4, key_len 8].<br />
Loaded database with 801475581 keys with k of 31 [val_len 4, key_len 8].<br />
db_sort: Sorting ...db_sort: Sorting complete - writing database to disk ...<br />
K-mer set sorted. [8m26.865s]<br />
Creating seqID to taxID map (step 4 of 6)..<br />
18949 sequences mapped to taxa. [0.023s]<br />
Creating taxDB (step 5 of 6)...<br />
Building taxonomy index from taxonomy//nodes.dmp and taxonomy//names.dmp. Done, got 2312242 taxa<br />
taxDB construction finished. [11.991s]<br />
Building  KrakenUniq LCA database (step 6 of 6)...<br />
Adding taxonomy IDs for sequences<br />
Adding taxonomy IDs for genomes<br />
Reading taxonomy index from taxDB. Done.<br />
Getting database0.kdb into memory (8.957 GB) ... Done<br />
Loaded database with 801475581 keys with k of 31 [val_len 4, key_len 8].<br />
Reading sequence ID to taxonomy ID mapping ... [starting new taxonomy IDs with 1000000001] got 18949 mappings.<br />
Finished processing 37898 sequences (skipping 0 empty sequences, and 0 sequences with no taxonomy mapping)<br />
Writing kmer counts to database.kdb.counts...<br />
Writing database from RAM back to database.kdb ...<br />
Writing new TaxDB ...<br />
LCA database created. [5m10.060s]<br />
Creating database summary report database.report.tsv ...<br />
/home/bioinfo/miniconda2/envs/tax_classifier/libexec/classify -d ././database.kdb -i ././database.idx -t 15 -r database.report.tsv -a ././taxDB -p 12 /dev/fd/63<br />
Database ././database.kdb<br />
Loaded database with 801475581 keys with k of 31 [val_len 4, key_len 8].<br />
Reading taxonomy index from ././taxDB. Done.<br />
37898 sequences (2276.87 Mbp) processed in 42.084s (54.0 Kseq/m, 3246.14 Mbp/m).<br />
37869 sequences classified (99.92%)<br />
29 sequences unclassified (0.08%)<br />
Writing report file to database.report.tsv  ..<br />
Reading genome sizes from ././database.kdb.counts ... done<br />
Setting values in the taxonomy tree ... done<br />
Printing classification report ...  done<br />
Report finished in 0.247 seconds.<br />
Finishing up ...Database construction complete. [Total: 17m6.972s]<br />
You can delete all files but database.{kdb,idx} and taxDB now, if you want</p>
]]></description><link>http://an.forum.genostack.com/post/466</link><guid isPermaLink="true">http://an.forum.genostack.com/post/466</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Wed, 17 Mar 2021 06:34:29 GMT</pubDate></item></channel></rss>