Genetics and population analysis.
Pavian: interactive analysis of metagenomics data for
microbiome studies and pathogen identification
Florian P. Breitwieser1,* and Steven L. Salzberg2
1Center for Computational Biology, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA and 2Departments of
Biomedical Engineering, Computer Science and Biostatistics, Johns Hopkins University, Baltimore, MD, USA
*To whom correspondence should be addressed
Associate Editor: Russell Schwartz
Received on April 29, 2019; revised on September 12, 2019; editorial decision on September 16, 2019; accepted on September 20, 2019
Abstract
Summary: Pavian is a web application for exploring classification results from metagenomics experiments. With
Pavian, researchers can analyze, visualize and transform results from various classifiers, such as Kraken,
Centrifuge and MetaPhlAn, using interactive data tables, heatmaps and Sankey flow diagrams. An interactive
alignment coverage viewer can help in the validation of matches to a particular genome, which can be crucial when
using metagenomics experiments for pathogen detection.
Availability and implementation: Pavian is implemented in the R language as a modular Shiny web app and is freely
available under GPL-3 from http://github.com/fbreitwieser/pavian.
Contact: pavianviz@gmail.com
1 Introduction
Microbiome research has seen enormous growth over the last decade, especially in the areas of health and disease, microbial ecology
and infectious diseases. As metagenomics data become increasingly common, secondary data analysis and visualization methods are pivotal (Breitwieser et al., 2017). For example, when using
metagenomics sequencing for the diagnosis of infections, finding a
single pathogen in a complex background—which might include nonpathogenic microbial communities, contamination and false positives—can be extremely hard without the ability to compare results
across multiple samples. Interactive exploration and visualization can
help users understand the data and find the needle in the haystack.
With Pavian, we provide a novel interface to explore metagenomics results from the Kraken (Wood and Salzberg, 2014),
KrakenUniq (Breitwieser et al., 2018), Centrifuge (Kim et al., 2016)
and MetaPhlAn (Truong et al., 2015) classifiers. Pavian enables
researchers to visualize and understand the species present in a single
sample as well as compare identifications across multiple samples.
Similar tools include Shiny-phyloseq (McMurdie and Holmes,
2015) and Metaviz (Wagner et al., 2018), which are geared towards
community analysis, while Pavian has unique features such as its
Sankey flow diagram and genome alignment viewer.
2 Pavian software
Pavian implements a straightforward interface to analyze and compare complex metagenomics datasets. Figure 1 shows how Pavian
displays data from multiple brain biopsies of patients with suspected
neurological infections (Salzberg et al., 2016).
Sample view. The sample view provides an overview of the classifications (see Fig. 1A). Pavian’s default visualization choice is a
Sankey diagram, a type of graph that is often used to map energy
flows. Pavian uses this diagram to display the flow of reads from the
root of the taxonomy to more specific ranks. The width of the flow
is proportional to the number of reads. The Sankey diagram puts a
visual emphasis on the major flows within the system and thus provides a clear visual summary of the classifications.
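The mapping from a taxonomy report to Sankey flows can be illustrated with a minimal Python sketch. It is not Pavian's implementation (Pavian is written in R and uses D3 and networkD3). The input is a hypothetical, simplified report in which each row gives the depth of a taxon in the taxonomy, its name and the number of reads in its clade. The function name `sankey_links` and the example data are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical, simplified report rows: (depth in taxonomy, taxon name, reads in clade).
Row = Tuple[int, str, int]


def sankey_links(rows: List[Row]) -> List[Tuple[str, str, int]]:
    """Return (parent, child, reads) links; a link's width would scale with reads."""
    links: List[Tuple[str, str, int]] = []
    stack: List[Tuple[int, str]] = []  # path from the root to the current taxon
    for depth, name, reads in rows:
        # Pop until the top of the stack is the parent of this row.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        if stack:
            links.append((stack[-1][1], name, reads))
        stack.append((depth, name))
    return links


# Illustrative data only; rows are in depth-first order, as in an indented report.
rows = [
    (0, "root", 1000),
    (1, "Bacteria", 900),
    (2, "Proteobacteria", 600),
    (2, "Firmicutes", 300),
    (1, "Viruses", 100),
]
links = sankey_links(rows)
```

Each link carries the clade read count of the child taxon, so a plotting library that draws link widths proportional to the link value would reproduce the "width is proportional to the number of reads" behaviour described above.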
Sample comparison. The sample comparison view provides a tabular view of identification results from multiple samples (Fig. 1B) to identify which microbes are commonly observed and which are present only
in one or a few samples. The queryable table displays taxa as rows and
samples as columns. By default, identifications at all taxonomy levels
are shown, but the table can be filtered to display identifications at specific ranks, e.g. species, genus or phylum ranks, with one click.
Additional columns with transformations of the data—such as percentage, z-score and rank—provide further ways to sort and filter the data.
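The three transformations named above can be sketched in Python on a small taxa-by-samples table. The exact definitions used by Pavian are not stated in this note, so the choices below are assumptions: percentages are taken relative to each sample's total reads in the table, z-scores are computed for each taxon across samples using the population standard deviation, and ranks are assigned within each sample. The data and function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Hypothetical read counts: taxon -> reads per sample (columns in sample order).
counts: Dict[str, List[int]] = {
    "Escherichia coli": [10, 20, 30],
    "Staphylococcus aureus": [90, 80, 70],
}


def percentages(table: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Percent of each sample's (column's) total reads assigned to each taxon."""
    totals = [sum(col) for col in zip(*table.values())]
    return {
        taxon: [100.0 * v / tot if tot else 0.0 for v, tot in zip(vals, totals)]
        for taxon, vals in table.items()
    }


def row_zscores(table: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Z-score of each value relative to the same taxon across all samples."""
    out = {}
    for taxon, vals in table.items():
        mu, sd = mean(vals), pstdev(vals)
        out[taxon] = [(v - mu) / sd if sd else 0.0 for v in vals]
    return out


def ranks(table: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Rank taxa within each sample; 1 = most reads (ties keep table order)."""
    taxa = list(table)
    n_samples = len(table[taxa[0]]) if taxa else 0
    out: Dict[str, List[int]] = {t: [] for t in taxa}
    for j in range(n_samples):
        ordered = sorted(taxa, key=lambda t: -table[t][j])
        for r, t in enumerate(ordered, start=1):
            out[t].append(r)
    return out


pct = percentages(counts)
z = row_zscores(counts)
rk = ranks(counts)
```

A taxon with a high z-score in one sample stands out against its own background across the sample set, which is the kind of signal that helps separate a sample-specific finding from a ubiquitous contaminant.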
Alignment viewer. A high read count for a particular species does
not always mean that the microbe is present; e.g. contaminated
genomes and low-complexity sequences can produce spurious matches
between reads and genomes. Pavian implements an alignment viewer
for BAM files that shows the read distribution both over the whole
genome and in selected regions, which can provide additional evidence on
whether an identification is credible (Fig. 1C). The interface further provides links to download RefSeq genome assemblies.
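Why the read distribution matters can be shown with a small Python sketch that computes per-base depth and breadth of coverage from alignment intervals. This is not how Pavian reads BAM files (it uses the Rsamtools library). The sketch assumes alignments have already been reduced to 0-based, half-open (start, end) intervals on a single genome, and the toy genome and function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def per_base_depth(genome_length: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Depth at each position from 0-based, half-open (start, end) alignments."""
    diff = [0] * (genome_length + 1)  # difference array: +1 at start, -1 at end
    for start, end in reads:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # The running sum of the difference array gives the depth per position.
    return list(accumulate(diff[:-1]))


def breadth(depth: List[int]) -> float:
    """Fraction of positions covered by at least one read."""
    return sum(d > 0 for d in depth) / len(depth) if depth else 0.0


# Hypothetical toy genome of 20 bp: five reads piled on one 5 bp region ...
piled = per_base_depth(20, [(0, 5)] * 5)
# ... versus the same number of reads spread along the genome.
spread = per_base_depth(20, [(0, 5), (4, 9), (8, 13), (12, 17), (15, 20)])
```

Both toy samples contain five reads, but the piled-up sample covers only a quarter of the genome while the spread sample covers all of it. Reads concentrated in one short region are the pattern one would expect from a low-complexity or contaminated stretch rather than from a genome that is truly present.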
3 Implementation
Pavian is written in R (https://www.R-project.org) as a modular
Shiny app. It incorporates several interactive JavaScript tables and
D3 plots (Bostock et al., 2011), such as the Sankey diagram that is
based on D3’s sankey code and the networkD3 library. The alignment viewer uses the Rsamtools library to load the BAM file. The
backend can be launched from an R environment or installed on a
server, and the interface is accessed through a web browser.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
Bioinformatics, 2019, 1–2
doi: 10.1093/bioinformatics/btz715
Advance Access Publication Date: 25 September 2019
Applications Note
4 Conclusion
Pavian is a novel tool for visualizing and analyzing metagenomics data.
Its functions help microbiome researchers as well as clinical microbiologists to gain a better understanding of their data through Sankey flow
diagrams, multiple comparison tables and a genome alignment viewer.
Funding
This work was supported in part by grants R35-GM130151 and R01-HG006677
from the National Institutes of Health, and by grant number
W911NF-14-1-0490 from the U.S. Army Research Office.
Conflict of Interest: none declared.
References
Bostock,M. et al. (2011) D3 data-driven documents. IEEE Trans. Vis.
Comput. Graph., 17, 2301–2309.
Breitwieser,F.P. et al. (2018) KrakenUniq: confident and fast metagenomics
classification using unique k-mer counts. Genome Biol., 19, 198.
Breitwieser,F.P. et al. (2017) A review of methods and databases for metagenomic classification and assembly. Brief. Bioinform. doi: 10.1093/bib/bbx120.
Kim,D. et al. (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res., 26, 1721–1729.
McMurdie,P.J. and Holmes,S. (2015) Shiny-phyloseq: web application for
interactive microbiome analysis with provenance tracking. Bioinformatics,
31, 282–283.
Salzberg,S.L. et al. (2016) Next-generation sequencing in neuropathologic
diagnosis of infections of the nervous system. Neurol. Neuroimmunol.
Neuroinflamm., 3, e251.
Truong,D.T. et al. (2015) MetaPhlAn2 for enhanced metagenomic taxonomic
profiling. Nat. Methods, 12, 902–903.
Wagner,J. et al. (2018) Metaviz: interactive statistical and visual analysis of
metagenomic data. Nucleic Acids Res., 46, 2777–2787.
Wood,D.E. and Salzberg,S.L. (2014) Kraken: ultrafast metagenomic sequence
classification using exact alignments. Genome Biol., 15, R46.
Fig. 1. (A) Main interface with sample classification summary shown in a Sankey diagram. The width of the flow corresponds to the number of reads, and hovering over a species node brings up a bar chart with the number of reads for the species across the sample set (inset). (B) The sample comparison module provides a tabular overview of the
reads, percentages or z-scores over all samples at any taxonomic rank with interactive filtering. (C) The alignment viewer displays the coverage of a particular genome based
on BAM alignment files.