暗能星系


    Alignment Software

    Bioinformatics Analysis
    • anneng

      Common Alignment Tools
      A small summary of common alignment tools used by bioinformaticians and how to get started
      Alignment tools are central to bioinformatics. They underpin assembly, database search, and similarity calculations, so at some point in your work you have probably come across some kind of aligner. In this article, I will talk about the common alignment tools that I have been using.

      BLAST
      BLAST stands for Basic Local Alignment Search Tool. As the name suggests, it is mostly used for search. Local alignment is useful when we want to find where one sequence is contained in another, which is exactly the search use case.
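
To make the idea of local alignment concrete, here is a minimal sketch of the classic Smith-Waterman scoring recurrence. BLAST itself uses a faster heuristic rather than this exhaustive dynamic programme, and the function name and scoring values (match=2, mismatch=-1, gap=-2) are assumptions chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score of a vs b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start anywhere: this is what
            # makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The short query is fully contained in the longer sequence.
    print(local_alignment_score("ACGT", "TTACGTTT"))
```

Because the score never drops below zero, a query that is contained in a longer sequence scores as well as if the flanking bases were not there.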

      Applications
      BLAST is mostly used as a database search tool because it is fast and its sensitivity can be tuned through parameters. The most common service built on BLAST is the NCBI search service (https://blast.ncbi.nlm.nih.gov/Blast.cgi). In research work, BLAST also comes in handy when we want to perform taxonomic annotation or to label sequences with annotations from database sequences, such as plasmid origin or coding and non-coding regions, searched against known strains. In the context of a database search, BLAST is extremely fast. However, when many reads need precise base-wise alignments against a reference, it is better to switch to a dedicated read aligner such as BWA-MEM or Minimap2.

      Algorithm Overview
      1. Remove low-complexity regions of the query sequence (tandem repeats and N bases for DNA)
      2. Obtain the list of k-mers (words) of the query sequence (k=11 for nucleotides, k=3 for proteins). For protein queries, list the possible matching words for each k-mer and score them using a substitution matrix such as BLOSUM62. This is done for all k-mers
      3. Keep the high-scoring k-mers from step 2, as decided by a specified threshold
      4. Scan the database for exact matches to these high-scoring k-mers
      5. Extend each match outwards in both directions until the accumulated score starts to drop, giving the high-scoring segment pairs (HSPs)
      The algorithm is simple to explain and fast over a large search space. The database index usually contains k-mers with k=11 for nucleotide sequences; k=3 is used for protein sequences.
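
The seed-and-extend idea behind these steps can be sketched in a few lines of Python. This is a toy illustration for nucleotide sequences, not BLAST's actual implementation: it skips low-complexity filtering, uses only exact k-mer seeds with ungapped extension, and the function names and scoring values (match=1, mismatch=-3, x_drop=5, min_score=15) are assumptions.

```python
from collections import defaultdict


def build_index(db, k):
    """Map every k-mer in each database sequence to its (sequence id, offset) hits."""
    index = defaultdict(list)
    for seq_id, seq in db.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((seq_id, i))
    return index


def extend(query, subject, q, s, k, match=1, mismatch=-3, x_drop=5):
    """Ungapped extension of an exact k-mer seed in both directions.

    Each direction stops once the running score falls more than x_drop
    below the best score seen. Returns (score, q_start, q_end, s_start).
    """
    def walk(qi, si, step):
        score = best = best_len = n = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            score += match if query[qi] == subject[si] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(q + k, s + k, 1)
    left_score, left_len = walk(q - 1, s - 1, -1)
    score = k * match + left_score + right_score
    return score, q - left_len, q + k + right_len, s - left_len


def search(query, db, k=11, min_score=15):
    """Return de-duplicated hits (seq_id, score, q_start, q_end, s_start), best first."""
    index = build_index(db, k)
    hits = set()
    for q in range(len(query) - k + 1):
        for seq_id, s in index.get(query[q:q + k], []):
            score, q_start, q_end, s_start = extend(query, db[seq_id], q, s, k)
            if score >= min_score:
                hits.add((seq_id, score, q_start, q_end, s_start))
    return sorted(hits, key=lambda h: -h[1])


if __name__ == "__main__":
    db = {"ref": "TTTTT" + "ACGTACGGTCAGCTAGCTAAGG" + "CCCCC"}
    print(search("ACGTACGGTCAGCTAGCTAAGG", db))
```

The index lookup is what makes the search fast: only database positions that share an exact k-mer with the query are ever extended.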

      Installation
      Compiled binaries or source files can be downloaded from here. Compilation can be done as follows:

      cd c++
      ./configure
      cd ReleaseMT/build
      make all_r
      More information can be found here. You can refer here for how to build your own database from sequence files.

      BWA
      BWA stands for Burrows-Wheeler Aligner, named after the Burrows-Wheeler Transform (BWT). The BWT is a reversible rearrangement of a text that makes the data easy to compress and, with a little extra bookkeeping, fast to search. This is the key idea behind the popular alignment tool BWA-MEM, which uses a BWT-based index (the FM-index) of the reference to perform indexing and alignment. You can read more on Heng Li’s GitHub.
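
As a rough illustration of the transform and of how it supports exact-match search, here is a naive Python sketch. It builds the BWT by sorting all rotations and counts pattern occurrences with a backward search; real tools such as BWA use far more compact index structures, and the function names here are assumptions made up for this example.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations (illustration only)."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, sentinel="$"):
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(sentinel))
    return row[:-1]


def count_occurrences(bwt_str, pattern):
    """Count exact occurrences of pattern in the original text (backward search)."""
    first_col = sorted(bwt_str)
    # first[c]: row of the first rotation that starts with character c
    first = {}
    for i, c in enumerate(first_col):
        first.setdefault(c, i)

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; a real index precomputes these.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, inverse_bwt(transformed), count_occurrences(transformed, "ana"))
```

Note how the transform groups identical characters together, which is what helps compression, while the backward search narrows a range of sorted rotations one pattern character at a time.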

      Applications
      BWA-MEM is commonly used for aligning short reads to reference genomes. This is a key step in reference-based assembly, for example of the human genome.

      Installation
      git clone https://github.com/lh3/bwa.git
      cd bwa; make
      Using BWA-MEM takes a couple of steps. First, you index the reference genome; then you align the reads against it. The tool is designed for short reads, and you can use either paired-end or single-end reads. The following commands are from the GitHub page, for your reference.

      ./bwa index ref.fa
      ./bwa mem ref.fa read-se.fq.gz | gzip -3 > aln-se.sam.gz
      ./bwa mem ref.fa read1.fq read2.fq | gzip -3 > aln-pe.sam.gz
      Minimap2
      Minimap2 is my favourite alignment tool; it is very fast and versatile. It is robust to much longer, noisy sequences. A few of the common use cases are as follows.

      Applications
      Aligning long noisy reads to reference genomes
      Aligning reads to contigs to compute base-wise coverage
      Aligning all reads against themselves as a preliminary step for assembly and read correction
      Aligning reads to the assembly graph
      Algorithm Overview
      The algorithm is based on the idea of minimizers. A minimizer is the minimum (lexicographically) k-mer in a window of w consecutive k-mers. Because only minimizers are indexed instead of every k-mer, this is one of the main reasons the algorithm is fast, at the cost of some sensitivity.

      Minimap2 obtains (k, w) minimizers for all the reference and query sequences. Minimizers shared between the query and the references that occur below a certain frequency in the set of references are called seeds and are used for alignment.
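
A minimal sketch of (k, w) minimizers and seed matching, following the lexicographic definition given here. As far as I know, Minimap2 itself orders k-mers by a hash and also handles the reverse strand, so treat this as an illustration of the idea only; the function names and the max_occ frequency cutoff are assumptions.

```python
from collections import defaultdict


def minimizers(seq, k, w):
    """Sorted list of (position, k-mer): the lexicographically smallest
    k-mer in every window of w consecutive k-mers (leftmost on ties)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    for start in range(len(kmers) - w + 1):
        pos = min(range(start, start + w), key=lambda i: (kmers[i], i))
        picked.add((pos, kmers[pos]))  # adjacent windows often pick the same k-mer
    return sorted(picked)


def shared_seeds(ref, query, k, w, max_occ=10):
    """Pairs (ref_pos, query_pos) of matching minimizers, skipping minimizers
    that occur more than max_occ times in the reference."""
    ref_pos = defaultdict(list)
    for pos, kmer in minimizers(ref, k, w):
        ref_pos[kmer].append(pos)
    seeds = []
    for q, kmer in minimizers(query, k, w):
        hits = ref_pos.get(kmer, [])
        if len(hits) <= max_occ:
            seeds.extend((r, q) for r in hits)
    return seeds


if __name__ == "__main__":
    print(minimizers("ACGTACGT", k=3, w=2))
    print(shared_seeds("TTGACCATGCA", "GACCATG", k=3, w=2))
```

Since overlapping windows tend to choose the same k-mer, far fewer k-mers are stored than in a full k-mer index, which is where the speed comes from.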

      In my experience, the alignment may not be sensitive enough in certain scenarios, for example when I tried to align reads against reads. That is reasonable and is mentioned on the GitHub page. In such scenarios it is wise to try another aligner as well.

      Installation
      git clone https://github.com/lh3/minimap2
      cd minimap2 && make
      For extended use cases, you can refer to the original GitHub page.

      I hope this will help someone who is a bit new to the field of bioinformatics. Thanks for reading.

      I will introduce a few multiple sequence alignment and visualization tools in a future article.

      https://webcache.googleusercontent.com/search?q=cache:0WPj51d8EBEJ:https://medium.com/computational-biology/common-alignment-tools-25e283290ae4+&cd=6&hl=en&ct=clnk&gl=us
