暗能星系


    Building a local nt/nr database

    Bioinformatics analysis
    • anneng

      https://www.ncbi.nlm.nih.gov/books/NBK279670/

      • anneng

        blastdbv5.pdf

        • anneng

          https://docs.oracle.com/cd/B19306_01/datamine.102/b14340/blast.htm
          Oracle's support for BLAST

          • anneng

            Bioinformatics_ introduction to using BLAST with Ubuntu.pdf
            Bioinformatics_ managing BLAST data sources.pdf

            • anneng

              https://dbsloan.github.io/TS2019/exercises/local_blast.html
              Running Local BLAST and Parsing Output

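              makeblastdb -in Ecoli.proteins.fas -dbtype prot
              
              makeblastdb -in Ecoli.genome.fas -dbtype nucl
              
              # tabular output (-outfmt 6) is assumed here so that the R step below can read the hits as a table
              blastn -task blastn -query Salmonella.genome.fas -db Ecoli.genome.fas -evalue 1e-20 -outfmt 6 -num_threads 4 -out blastn.txt
              
              # In R: load the 12 standard tabular columns (column names are illustrative), then draw a dot plot
              blastnData <- read.table("blastn.txt", col.names = c("Query", "Hit", "Perc_Ident", "Aln_Length", "Mismatches", "Gap_Opens", "Query_Start", "Query_End", "Hit_Start", "Hit_End", "E_value", "Bit_Score"))
              pdf("my_dotplot.pdf")
              plot(blastnData$Query_Start, blastnData$Hit_Start, cex = .25)
              dev.off()
              quit()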
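The dot plot only needs the query and subject start coordinates of each hit. As a sketch outside R, assuming the blastn run wrote tabular output (`-outfmt 6` with its default 12 columns), the hits can be parsed in Python like this; `parse_outfmt6` and `dotplot_points` are hypothetical helper names, not part of BLAST+.

```python
from typing import Iterable, List, NamedTuple, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6 with no custom fields)
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse tab-separated -outfmt 6 lines into Hit records, skipping blank and '#' lines."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = dict(zip(OUTFMT6_COLUMNS, line.split("\t")))
        hits.append(Hit(fields["qseqid"], fields["sseqid"], float(fields["pident"]),
                        int(fields["qstart"]), int(fields["qend"]),
                        int(fields["sstart"]), int(fields["send"]),
                        float(fields["evalue"])))
    return hits


def dotplot_points(hits: List[Hit]) -> List[Tuple[int, int]]:
    """Return (query start, subject start) pairs, the same coordinates the R plot uses."""
    return [(h.qstart, h.sstart) for h in hits]
```

Typical use would be `dotplot_points(parse_outfmt6(open("blastn.txt")))`, handing the pairs to any plotting library.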
              


              • anneng

                Extracting data from BLAST databases with blastdbcmd
                https://www.ncbi.nlm.nih.gov/books/NBK279689/
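A minimal sketch of assembling a blastdbcmd call from Python, assuming only the `-db`, `-entry`, `-outfmt` and `-out` options; `-entry` takes a comma-delimited list of identifiers or the word `all`. The helper names are hypothetical, and nothing is executed here; the list could be passed to `subprocess.run(..., check=True)`.

```python
import shlex
from typing import List, Optional, Sequence


def blastdbcmd_args(db: str, entries: Sequence[str], outfmt: str = "%f",
                    out: Optional[str] = None) -> List[str]:
    """Build (but do not run) a blastdbcmd argument list.

    -entry receives a comma-delimited list of identifiers, or 'all' for every sequence.
    """
    if not entries:
        raise ValueError("need at least one identifier, or 'all'")
    args = ["blastdbcmd", "-db", db, "-entry", ",".join(entries), "-outfmt", outfmt]
    if out is not None:
        args += ["-out", out]
    return args


def as_shell(args: List[str]) -> str:
    """Quote the argument list so it can be pasted into a shell."""
    return shlex.join(args)
```

Passing arguments as a list rather than one string avoids having to quote format strings such as `%a %T` that contain spaces.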

                • anneng

                  Preformatted BLAST vs Fasta
                  https://www.ncbi.nlm.nih.gov/books/NBK62345/
                  Getting the preformatted database files
                  Preformatted BLAST database files offer several advantages over FASTA files:

                  - The preformatted databases are split into smaller volumes, so they can be downloaded more easily and with fewer errors.
                  - A convenient Perl script (update_blastdb.pl, found in the bin directory of a locally installed BLAST+ package) simplifies downloading these preformatted databases.
                  - Preformatted database files remove the makeblastdb formatting step, saving valuable processing time and disk space.
                  - Taxonomic information is encoded in the preformatted databases; together with the included taxdb files it can be used to limit the scope of a BLAST search or sequence retrieval and to add scientific names to the results.
                  - Sequences in FASTA format can easily be regenerated from the preformatted databases with the blastdbcmd utility when needed.

                  • anneng

                    Annotating BLAST Reports with Taxonomy Information
                    https://www.mathworks.com/matlabcentral/mlc-downloads/downloads/submissions/15970/versions/2/previews/taxoblastdemo/html/taxoblastdemo.html?access_key=

                    • anneng

                      https://github.com/lskatz/taxdb
                      A tool that imports the NCBI taxdump into SQLite.
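To illustrate the idea (this is not the lskatz/taxdb code or its schema), here is a minimal sketch that loads the taxdump file `names.dmp` into SQLite with Python's standard `sqlite3` module. It relies on the taxdump layout in which fields are separated by `\t|\t` and each line ends with `\t|`; the table name `names` and its columns are hypothetical.

```python
import sqlite3
from typing import Iterable, Optional


def load_names_dmp(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Load names.dmp rows into a hypothetical 'names' table; return the number of rows."""
    conn.execute("CREATE TABLE IF NOT EXISTS names ("
                 "tax_id INTEGER, name_txt TEXT, unique_name TEXT, name_class TEXT)")
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Each line ends with "\t|"; strip it, then split on the "\t|\t" field separator
        if line.endswith("\t|"):
            line = line[:-2]
        tax_id, name_txt, unique_name, name_class = line.split("\t|\t")
        rows.append((int(tax_id), name_txt, unique_name, name_class))
    conn.executemany("INSERT INTO names VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def scientific_name(conn: sqlite3.Connection, tax_id: int) -> Optional[str]:
    """Look up the 'scientific name' entry for a taxid, or None if it is absent."""
    row = conn.execute(
        "SELECT name_txt FROM names WHERE tax_id = ? AND name_class = 'scientific name'",
        (tax_id,)).fetchone()
    return row[0] if row else None
```

For the full dump, an index on `tax_id` (for example `CREATE INDEX names_tax_id ON names(tax_id)`) keeps lookups fast.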

                      • anneng

                        The BLAST taxonomy database is required in order to print the scientific name, common name, BLAST name, or super kingdom as part of the BLAST report or in a report from blastdbcmd. The BLAST database contains only the taxid (an integer) for each entry, and the taxonomy database allows BLAST to retrieve the scientific name and related names from a taxid. The BLAST taxonomy database consists of a pair of files (taxdb.bti and taxdb.btd) that are available as a compressed archive from the NCBI BLAST FTP site (ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz). The update_blastdb.pl script can be used to download and update this archive; it is recommended that the uncompressed contents of the archive be installed in the same directory as the BLAST databases. Assuming proper file permissions, and that the BLASTDB environment variable contains the path to the directory where the BLAST databases are installed, the following commands accomplish that:

                        Download the taxdb archive

                        perl update_blastdb.pl taxdb

                        Install it in the BLASTDB directory

                        gunzip -cd taxdb.tar.gz | (cd $BLASTDB; tar xvf - )
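After installation, a quick sanity check is that both taxdb files sit next to the databases. A minimal sketch, assuming BLASTDB names a single directory; `missing_taxdb_files` is a hypothetical helper, not a BLAST+ tool.

```python
import os
from pathlib import Path
from typing import List, Optional

# The two files that make up the BLAST taxonomy database
TAXDB_FILES = ("taxdb.bti", "taxdb.btd")


def missing_taxdb_files(blastdb_dir: Optional[str] = None) -> List[str]:
    """Return the taxdb files not found in blastdb_dir (defaults to the BLASTDB variable)."""
    directory = blastdb_dir if blastdb_dir is not None else os.environ.get("BLASTDB", "")
    if not directory:
        # BLASTDB unset and no directory given: nothing can be found
        return list(TAXDB_FILES)
    return [name for name in TAXDB_FILES if not (Path(directory) / name).is_file()]
```

An empty list means both files are present; otherwise the result names what still has to be unpacked into the directory.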

                        • zhangfanglin (last edited by anneng)

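                          Export format for nr/nt

                          **%T %a %i %t %s**
                          1. %T: taxid
                          2. %a: accession
                          3. %i: sequence ID
                          4. %t: sequence title (description)
                          5. %s: sequence

                          In the export above, %i is the sequence ID, which we do not need, so that field can be dropped:
                          %a %t %T %s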
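Sequence titles contain spaces, so space-separated output cannot be split reliably. A sketch under the assumption that the four fields are separated by literal tab characters in the `-outfmt` string (for example `$'%a\t%t\t%T\t%s'` in bash); the parser takes the accession from the left and taxid and sequence from the right, so stray tabs in a title stay in the title. `parse_export`, `to_fasta` and the FASTA header layout are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class DbRecord(NamedTuple):
    accession: str
    title: str
    taxid: int
    sequence: str


def parse_export(lines: Iterable[str]) -> Iterator[DbRecord]:
    """Parse lines exported with fields %a, %t, %T, %s separated by tabs."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        accession, rest = line.split("\t", 1)
        title, taxid, sequence = rest.rsplit("\t", 2)
        yield DbRecord(accession, title, int(taxid), sequence)


def to_fasta(record: DbRecord, width: int = 60) -> str:
    """Format a record as FASTA with the taxid in the header (hypothetical header layout)."""
    header = f">{record.accession} taxid={record.taxid} {record.title}"
    body = "\n".join(record.sequence[i:i + width]
                     for i in range(0, len(record.sequence), width))
    return f"{header}\n{body}\n"
```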

                          • anneng

                            -outfmt <String>
                            Output format, where the available format specifiers are:
                            %f means sequence in FASTA format
                            %s means sequence data (without defline)
                            %a means accession
                            %g means gi
                            %o means ordinal id (OID)
                            %i means sequence id
                            %t means sequence title
                            %l means sequence length
                            %h means sequence hash value
                            %T means taxid
                            %X means leaf-node taxids
                            %e means membership integer
                            %L means common taxonomic name
                            %C means common taxonomic names for leaf-node taxids
                            %S means scientific name
                            %N means scientific names for leaf-node taxids
                            %B means BLAST name
                            %K means taxonomic super kingdom
                            %P means PIG
                            %m means sequence masking data.
                            Masking data will be displayed as a series of 'N-M' values
                            separated by ';' or the word 'none' if none are available.
                            If '%f' is specified, all other format specifiers are ignored.
                            For every format except '%f', each line of output will correspond
                            to a sequence.
                            Default = `%f'
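The help text implies two rules worth checking before a long export: only the listed letters are valid, and `%f` overrides every other specifier. A minimal Python sketch of such a check; the function names are hypothetical and the specifier set is copied from the list above.

```python
import re
from typing import List

# Specifier letters listed in the blastdbcmd -outfmt help text
KNOWN_SPECIFIERS = set("fsagoithlTXeLCSNBKPm")


def outfmt_specifiers(fmt: str) -> List[str]:
    """Return the %-specifier letters in fmt, in order; raise ValueError on unknown ones."""
    found = re.findall(r"%(.)", fmt)
    unknown = [c for c in found if c not in KNOWN_SPECIFIERS]
    if unknown:
        raise ValueError("unknown specifier(s): " + ", ".join("%" + c for c in unknown))
    return found


def effective_specifiers(fmt: str) -> List[str]:
    """Apply the help-text rule that '%f' makes all other specifiers ignored."""
    specs = outfmt_specifiers(fmt)
    return ["f"] if "f" in specs else specs
```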

                            • anneng

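                              To treat all sequences in a FASTA file as belonging to a single species, use the -taxid parameter of makeblastdb.
                              Merging two databases:
                              makeblastdb -in mysequences.fna -dbtype nucl -title "some sequences I found" -out mysequences -parse_seqids
                              blastdb_aliastool -dblist nt mysequences -dbtype nucl -title "nt database + my own sequences" -out ntandmore
                              If you have several FASTA files, one per species, build a database from each file separately and then combine them with blastdb_aliastool.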
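The per-species workflow can be scripted. A minimal sketch that only builds the commands, assuming the makeblastdb options `-in`, `-dbtype`, `-parse_seqids`, `-taxid` and `-out`, and blastdb_aliastool's `-dblist` (a single space-separated string), `-dbtype`, `-title` and `-out`. The function name and the choice to name each database after its FASTA file minus the extension are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def per_species_commands(fasta_taxids: Dict[str, int], combined: str,
                         dbtype: str = "nucl") -> List[List[str]]:
    """Build one makeblastdb command per FASTA file (every sequence gets that file's taxid),
    followed by one blastdb_aliastool command that combines the resulting databases."""
    commands = []
    db_names = []
    for fasta, taxid in fasta_taxids.items():
        db_name = str(Path(fasta).with_suffix(""))  # hypothetical naming: drop the extension
        db_names.append(db_name)
        commands.append(["makeblastdb", "-in", fasta, "-dbtype", dbtype,
                         "-parse_seqids", "-taxid", str(taxid), "-out", db_name])
    commands.append(["blastdb_aliastool", "-dblist", " ".join(db_names),
                     "-dbtype", dbtype, "-title", combined, "-out", combined])
    return commands
```

Each inner list can be run in order with `subprocess.run(cmd, check=True)`; the alias database is then searched with `-db` set to the `combined` name.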
