A primer on microbial bioinformatics for nonbioinformaticians
https://www.clinicalmicrobiologyandinfection.com/article/S1198-743X(17)30709-7/fulltext
Table 1. List of bioinformatics software used for microbial bioinformatics data analysis
| Usage | Software name | Description | URL |
|---|---|---|---|
| Quality measures and read preprocessing | FASTQC | Toolbox for displaying sequence statistics for next-generation sequencing reads | http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ |
| | TRIMMOMATIC | Command-line tool for trimming short-read paired-end and single-end data | http://www.usadellab.org/cms/?page=trimmomatic |
| | FASTX-Toolkit | A collection of command-line tools for preprocessing short-read FASTA/FASTQ files | http://hannonlab.cshl.edu/fastx_toolkit/ |
| | PRINSEQ | Command-line and web-based tool for filtering, reformatting or trimming genomic and metagenomic sequence data; generates summary statistics in graphical and tabular format | http://prinseq.sourceforge.net/, http://edwards.sdsu.edu/cgi-bin/prinseq/prinseq.cgi |
| Contamination detection | Kraken | Taxonomic assignment of reads, useful for metagenomics analysis or detection of contamination in pure culture samples | https://ccb.jhu.edu/software/kraken/ |
| | MIDAS | Taxonomic assignment of reads, useful for metagenomics analysis or detection of contamination in pure culture samples | https://github.com/snayfach/MIDAS |
| Assembly software and pipelines | Velvet | De novo genomic assembler specially designed for short reads | http://github.com/dzerbino/velvet/tree/master |
| | SPAdes | De novo genomic assembler for short reads; it can also produce hybrid assemblies using long-read data together with short-read data | http://cab.spbu.ru/software/spades/ |
| | Canu | De novo genomic assembler designed for high-noise single-molecule sequencing such as long reads | http://github.com/marbl/canu |
| | INNUca | A standardized, fully automated, flexible, portable and pathogen-independent pipeline for bacterial genome assembly and quality control starting from short reads | http://github.com/INNUENDOCON/INNUca |
| | shovill | A pipeline for bacterial genome assembly which improves SPAdes speed and accuracy | https://github.com/tseemann/shovill |
| In silico typing | ReMatCh | Software for variant calling based on a read-mapping strategy to selected target sequences; it also interacts with the European Nucleotide Archive (ENA) repository, making it easy to mine publicly available data | http://github.com/B-UMMI/ReMatCh |
| | Short Read Sequence Typing for Bacterial Pathogens (SRST2) | Uses short-read data with an MLST database and/or a database of gene sequences (e.g. resistance genes, virulence genes) and reports the presence of STs and/or reference genes | http://github.com/katholt/srst2 |
| | Microbial InSilico Typer (MIST) | Rapid generation of in silico typing data (e.g. MLST, MLVA) from draft bacterial genome assemblies | http://bitbucket.org/peterk87/microbialinsilicotyper |
| | SISTR | A web- and command line–accessible tool for Salmonella typing using draft genome assemblies | http://lfz.corefacility.ca/sistr-app/ |
| | SeqSero | A web-accessible tool for Salmonella typing using raw reads or draft genome assemblies | http://www.denglab.info/SeqSero |
| | RGI-CARD | Curated collection of antimicrobial resistance gene and mutation sequences, with bioinformatics models and tools for their detection in bacterial genomes | http://www.card.mcmaster.ca/analyze/rgi |
| | ResFinder | A web-accessible tool for the detection of acquired antimicrobial resistance genes in bacterial genomes using raw reads or draft genome assemblies | https://cge.cbs.dtu.dk/services/ResFinder/ |
| | VirulenceFinder | A web-accessible tool for the detection of virulence-associated genes in Escherichia coli, Listeria spp., Staphylococcus aureus and Enterococcus spp. using raw reads or draft genome assemblies | https://cge.cbs.dtu.dk/services/VirulenceFinder/ |
| | MLST1.8 | A web-accessible tool for the determination of MLST types from bacterial genomes using publicly available MLST schemas | https://cge.cbs.dtu.dk/services/MLST |
| | Mlst2.9 | Command line–based software which can extract MLST from bacterial genomes using publicly available MLST schemas | https://github.com/tseemann/mlst |
| | CFSAN SNP Pipeline | Pipeline for extracting high-quality SNV matrices for sequences from closely related pathogens | http://snppipeline.readthedocs.io/en/latest/ |
| | Snippy | A pipeline for rapid identification of haploid variants and construction of a phylogeny using core genome SNPs | http://github.com/tseemann/snippy |
| | SNVPhyl (Single Nucleotide Variant PHYLogenomics) | Pipeline for identifying SNVs within a collection of microbial genomes and constructing a phylogenetic tree | http://snvphyl.readthedocs.io/en/latest/ |
| | Lyve-SET | A pipeline for using high-quality SNPs to create a phylogeny, especially for outbreak investigations | https://github.com/lskatz/lyve-SET |
| Gene-by-gene approaches | BIGSdb | Web-accessible database system designed to store and analyse linked phenotypic and genotypic information, including an allele calling engine for the gene-by-gene approach; it is the database system for both PubMLST and PasteurMLST | https://github.com/kjolley/BIGSdb, http://pubmlst.org, http://bigsdb.pasteur.fr/index.html |
| | Enterobase | Curated database and online resource for molecular typing of Salmonella, Escherichia coli, Yersinia spp. and Moraxella spp. using the gene-by-gene approach | http://enterobase.warwick.ac.uk/ |
| | Genome Profiler | Stand-alone gene-by-gene allele calling algorithm which uses conserved gene neighbourhoods to resolve gene paralogy | http://sourceforge.net/projects/genomeprofiler/ |
| | chewBBACA | A comprehensive and highly efficient stand-alone gene-by-gene allele calling algorithm based on coding DNA sequences, including a suite of tools that give an overview of schema performance | https://github.com/B-UMMI/chewBBACA |
| Gene annotation | Prodigal | Protein-coding gene prediction software tool for bacterial and archaeal genomes | http://github.com/hyattpd/prodigal/wiki |
| | Prokka | Quick functional annotation of bacterial genomes producing standards-compliant output files | http://github.com/tseemann/prokka |
| | RAST | Fully automated service for annotating bacterial and archaeal genomes | http://rast.nmpdr.org/ |
| | MicroScope | Comprehensive analytical platform for genome annotation and analysis of bacterial genomes | http://www.genoscope.cns.fr/agc/microscope/home/index.php |
| | NCBI prokaryotic genome annotation pipeline (PGAP) | Automatic prokaryotic genome annotation pipeline that combines ab initio gene prediction algorithms with homology-based methods | https://www.ncbi.nlm.nih.gov/genome/annotation_prok/ |
| | NCBI Pathogen Detection | An online platform for sharing and comparing data on outbreak strains; currently contains databases for 20 bacterial species, focusing on food-borne pathogens and healthcare-associated infections | https://www.ncbi.nlm.nih.gov/pathogens/ |
| Genome alignments | Harvest | A suite of core genome alignment and visualization tools for quick and high-throughput analysis of intraspecific bacterial genomes | http://harvest.readthedocs.io/en/latest/ |
| | Mauve | Aligner for comparative analysis of full bacterial genomes | http://darlinglab.org/mauve/mauve.html |
| Homology clustering and association studies | Roary | High-speed stand-alone pan-genome pipeline for bacterial genomes | http://sanger-pathogens.github.io/Roary/ |
| | Scoary | Pan-genome–wide association studies using Roary output | https://github.com/AdmiralenOla/Scoary |
| | Neptune | Software designed for detecting genomic signatures within bacterial populations | https://github.com/phac-nml/neptune |
| Phylogenetic inference | RAxML | Sequential and parallel maximum-likelihood phylogeny estimation that operates on nucleotide and protein sequence alignments | https://sco.h-its.org/exelixis/software.html |
| | FastTree | Computes approximately maximum-likelihood phylogenetic trees from large nucleotide or protein multiple sequence alignments | http://www.microbesonline.org/fasttree/ |
| | Gubbins | Computes a maximum-likelihood phylogeny from an alignment after removing regions containing elevated densities of base substitutions | https://github.com/sanger-pathogens/gubbins |
| | ClonalFrameML | A maximum-likelihood implementation of ClonalFrame designed for genome sequences | https://github.com/xavierdidelot/ClonalFrameML |
| | PHYLOViZ Online | Web-based tool for phylogenetic inference, visualization, analysis and sharing of sequence-based typing methods that generate allelic profiles and associated epidemiologic data | http://online.phyloviz.net |
| | PHYLOViZ 2.0 | Stand-alone Java software for phylogenetic inference, visualization and analysis of sequence-based typing methods that generate allelic profiles and their associated epidemiologic data | http://www.phyloviz.net/ |
| Visualization tools | Microreact | A web-based tool for genomic epidemiology data visualization and sharing | http://microreact.org |
| | Phandango | Interactive web-based tool for fast exploration of large-scale population genomics data sets, combining output from multiple genomic analysis methods | https://github.com/jameshadfield/phandango |
| | iTOL | Web-based tool for display, annotation and management of phylogenetic trees | http://itol.embl.de/ |
| | GenGIS 2 | Application including 3-D graphical and Python interfaces allowing users to combine digital map data and sequences | http://kiwi.cs.dal.ca/GenGIS/Main_Page |
| Multipurpose analytical platforms and pipelines | Centre for Genomic Epidemiology Toolbox | A suite of web-based tools and services for pathogen molecular typing, genome assembly, phenotypic prediction (e.g. resistance prediction) and phylogeny construction | http://cge.cbs.dtu.dk/services/ |
| | Integrated Rapid Infectious Disease Analysis (IRIDA) Platform | A Galaxy-based platform for real-time infectious disease outbreak investigation using genomic data, including a sequence data management module and workflows, an ontology framework (GenEpiO) and data visualization tools | https://irida.corefacility.ca/documentation/downloads/index.html, http://irida.ca/ |
| | Integration genomics in surveillance of food-borne pathogens (INNUENDO) platform | A platform for real-time disease outbreak investigation and surveillance of food-borne pathogens using genomic data, including a sequence-data management module, assembly modules with QA/QC measures, a gene-by-gene analytical pipeline, an ontology framework (GenEpiO) and visualization tools | https://github.com/INNUENDOCON/INNUENDO_platform |
| | Nullarbor | A pipeline for generating public health microbiology reports from sequenced isolates, including sequencing specifics, species ID, subtypes and core SNPs | http://github.com/tseemann/nullarbor |
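
For readers who have not seen the data that the quality-measure and read-preprocessing tools in Table 1 work on, the following is a minimal Python sketch of reading FASTQ records, computing a mean base quality and trimming low-quality bases from the 3' end. It is an illustration only and does not reproduce how FASTQC, TRIMMOMATIC or any other listed tool is implemented. The function names (`read_fastq`, `mean_quality`, `trim_3prime`), the Phred+33 quality encoding, the four-lines-per-record layout and the quality threshold of 20 are all assumptions made for the example.

```python
from io import StringIO
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from a FASTQ stream.

    Assumes the simple layout of exactly four lines per record.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode a quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def mean_quality(qual: str) -> float:
    """Mean Phred score of one read; 0.0 for an empty read."""
    scores = phred_scores(qual)
    return sum(scores) / len(scores) if scores else 0.0


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Hypothetical single-read input: six high-quality bases ('I' = Q40)
# followed by two low-quality bases ('#' = Q2).
example = StringIO("@read1\nACGTACGT\n+\nIIIIII##\n")
records = list(read_fastq(example))
read_id, seq, qual = records[0]
trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
print(read_id, mean_quality(qual), trimmed_seq)
```

Dedicated tools add much more than this, such as adapter removal, sliding-window trimming, paired-end handling and graphical reports, so the sketch is meant only to make the underlying idea concrete.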