Research: Metagenomic Frameworks For Monitoring Antibiotic Resistance in Aquatic Environments
Monitoring for antibiotic resistance in the marine environment has been infrequent and incomplete (Allen et al. 2010) and has predominantly focused on measuring levels of antibiotics in different water matrices (Segura et al. 2009). Furthermore, environmental monitoring of antibiotic resistance has not been formalized into public health surveillance or water quality management decision frameworks, likely because of a continuing lack of data and uncertainty regarding risk and risk metrics. Instead, global surveillance efforts such as the European Antimicrobial Resistance Surveillance Network (European Centre for Disease Prevention and Control 2014) and the National Antimicrobial Resistance Monitoring System for Enteric Bacteria (Centers for Disease Control and Prevention 2014) have predominantly focused on the prevalence of antibiotic usage and antibiotic-resistant isolates in clinical and public health laboratory settings (Grundmann et al. 2011). Given the global magnitude of antibiotic resistance, including the emergence of multidrug-resistant bacterial strains and increasing reports of occurrence in the environment, there is a critical need for the identification, characterization, and control of these generally uncharacterized environmental ARD reservoirs (Bush et al. 2011).

The objectives of the present study were three-fold. First, a metagenomic epidemiology-based approach was used to develop an index that quantifies the resistance potential of an environment. Metagenomic epidemiology is a multi-layered approach that considers the entire microbiotic context for environmental antibiotic resistance by simultaneously characterizing the different levels of microbiome complexity that drive antibiotic resistance, including ARGs, genetic vectors, and the species in which these genes occur (Baquero 2012). Second, the index was analyzed for common modal patterns (i.e., principal components) across a diverse set of marine and freshwater ecosystems. Third, we sought to integrate the index into a public health surveillance framework in order to provide an example by which high-throughput metagenomic data can be applied to regulation or management.

Methods

Data sources. Sequence reads for the 25 metagenomic samples included in this analysis are publicly available and were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA; http://www.ncbi.nlm.nih.gov/sra). These 25 samples were divided into seven ecosystems: estuary, coastal ocean, freshwater lake, marina, river sediment, wastewater treatment plant (WWTP) effluent, and WWTP sludge (Table 1). The estuary data set includes surface water samples taken offshore in the northern (samples P1, P26) and central (P5, P28, P32) basins of Puget Sound, State of Washington (Port et al. 2012). Sampling site P26 was specifically located adjacent to the northern basin in the Strait of Juan de Fuca, State of Washington. The marina sample is from the central basin of Puget Sound and was taken near shore inside an urban marina and close to a source of freshwater input (Port et al. 2012). The coastal ocean samples were collected as part of an annual California Cooperative Oceanic Fisheries Investigations cruise in the Southern California Bight (Allen et al. 2012). Samples at seven stations were taken along hydrographic and nutrient gradients in near (samples GS257, GS263, GS264) and offshore (GS258, GS259, GS260, GS262) upwelling regions within the California Current Ecosystem. The term marine refers here to the estuary, marina, and coastal ocean samples. The freshwater lake sample is from a reservoir encompassing 59 square miles near Atlanta, Georgia, that serves as a drinking water supply for the city and is used for recreational activities (Oh et al. 2011). The river sediment samples were taken at intervals downstream from a WWTP discharge site in Patancheru, Hyderabad, India, that processes water from approximately 90 drug manufacturers (Kristiansson et al. 2011). The wastewater effluent, taken from a WWTP that discharges into Puget Sound, has an average daily inflow of 133 million gallons and is sourced from storm water/groundwater (53%), residential (29%), commercial (17%), and industrial (1%) processes (Port et al. 2012). The WWTP from which the activated sludge sample was obtained discharges into a local waterway in Charlotte, North Carolina, and has a daily inflow of 7.5 million gallons from primarily domestic sources in addition to several industries, a university, and a hospital (Sanapareddy et al. 2009).

All samples analyzed in this study (except the river sediment samples) were filtered and size fractionated (0.1–3.0 μm) to target the microbial community. Genomic DNA was extracted and shotgun sequenced using pyrosequencing (Margulies et al. 2005). Pyrosequencing of total genomic DNA was performed using 454 GS-FLX or GS-FLX Titanium technologies (454 Life Sciences, Branford, CT). For data sets with multiple samples (e.g., estuary, coastal ocean, river sediment), samples were individually barcoded and sequenced in parallel. Summary sequencing statistics, including functional annotation, are provided in Table 1. Open reading frames (ORFs) were predicted with MetaGeneMark software (http://topaz.gatech.edu/metagenome/) (Zhu et al. 2010), and protein domains were assigned using the Pfam 26.0 database (Punta et al. 2012).

ARD index. Metagenomic data relevant to environmental surveillance of ARDs were classified into three categories: gene transfer potential, ARG potential, and pathogenicity potential (Figure 1). A fourth category, source tracking, relates to identifying potential anthropogenic sources of ARDs through community composition profiling but has not yet been incorporated into the index analysis. The index categories were quantified via their respective subcategories as shown in Figure 1.

Bioinformatic analyses. The unassembled DNA sequence reads for each metagenome were run through a bioinformatic framework that quantified the ARD index (Figure 1). Reads were quality processed using the MG-Rast pipeline (Meyer et al. 2008) and then run through three separate analyses (one for each index category, excluding source tracking). Quality control parameters included the removal of reads that had a length > 2 SDs from the mean sample read length, > 5 ambiguous bases, < 5% of any one nucleotide, or 100% identity to another sequence over the first 50 bp.
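The four quality control rules above can be sketched as a single filtering pass. This is a minimal illustration, not the MG-Rast implementation: the function name `quality_filter`, its parameter names, the use of the population standard deviation, counting only `N` as an ambiguous base, and keeping the first read of each duplicate group are all assumptions made for the example.

```python
from statistics import mean, pstdev


def quality_filter(reads, max_sd=2.0, max_ambiguous=5,
                   min_base_frac=0.05, prefix_len=50):
    """Return the reads that pass the four QC rules described in the text.

    Hypothetical helper for illustration; thresholds default to the values
    quoted in the Methods (2 SDs, 5 ambiguous bases, 5%, first 50 bp).
    """
    reads = [r for r in reads if r]          # ignore empty records
    if not reads:
        return []
    lengths = [len(r) for r in reads]
    mu, sd = mean(lengths), pstdev(lengths)  # assumption: population SD
    seen_prefixes = set()
    kept = []
    for read in reads:
        seq = read.upper()
        # Rule 1: length more than max_sd standard deviations from the mean.
        if sd > 0 and abs(len(seq) - mu) > max_sd * sd:
            continue
        # Rule 2: more than max_ambiguous ambiguous base calls (N only here).
        if seq.count("N") > max_ambiguous:
            continue
        # Rule 3: any one nucleotide makes up less than min_base_frac of the read.
        if any(seq.count(base) / len(seq) < min_base_frac for base in "ACGT"):
            continue
        # Rule 4: identical to an earlier kept read over the first prefix_len bases.
        prefix = seq[:prefix_len]
        if prefix in seen_prefixes:
            continue
        seen_prefixes.add(prefix)
        kept.append(read)
    return kept
```

The rules are applied in the order listed in the text, so a read that fails the length or composition checks never enters the duplicate set; a different ordering could retain a different member of a duplicate group.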
Table 1. Metagenomic samples included in this study with associated metadata and summary statistics.
Characteristic | Estuary: Puget Sound, USA | Coastal ocean: California Bight, USA | Freshwater: Atlanta, GA, USA | Marina: Puget Sound, USA | River sediment: Patancheru, India | WWTP effluent: Seattle, WA, USA | WWTP sludge: Charlotte, NC, USA
No. of samples | 5 | 12 | 1 | 1 | 4 | 1 | 1
Size fraction (μm) | 0.2–3 | 0.1–0.8, 0.8–3 | 0.22–1.6 | 0.2–3 | NA | 0.2–3 | NA
Depth (m) | 5 | 2 | 5 | 1 | 0 (a) | NA | NA
Megabase pairs | 413 | 1,940 | 502 | 91 | 91 | 48 | 95
Mean read length (bp) | 368 | 551 | 395 | 379 | 365 | 381 | 250
ORFs [(mean ± SD) %] | 81.8 ± 2.1 | 69.4 ± 9.95 | 86.2 | 87.3 | 74.9 ± 0.565 | 89.7 | 89.8
Pfams [(mean ± SD) %] | 40.8 ± 1.5 | 32.8 ± 8.9 | 39.8 | 45.3 | 30.5 ± 1.1 | 39.8 | 35.6
SRA accession no. | SRP015952 | SRP006681 | SRA023414 | SRP015952 | SRP002078 | SRX328700 | SRA001012
Reference | Port et al. 2012 | Allen et al. 2012 | Oh et al. 2011 | Port et al. 2012 | Kristiansson et al. 2011 | Port et al. 2012 | Sanapareddy et al. 2009
Abbreviations: NA, not applicable; ORF, open reading frame; Pfam, protein family.
(a) River sediment samples were collected at centimeters in depth along the shoreline of the river.
The abundance of each index subcategory was calculated using different sequence similarity thresholds in order to generate a distribution of values for each subcategory and to determine how these thresholds affect data interpretation (Table 2). The high threshold represents the most conservative annotation approach (with the fewest false-positives), followed by a gradual reduction in stringency through the medium-high, medium-low, and low thresholds. Unless stated otherwise, annotated reads (per subcategory) were normalized to the total number of sequence reads per sample.

Gene transfer potential subcategories included plasmids, TEs, and phages. Plasmids were annotated by BLASTN (http://blast.ncbi.nlm.nih.gov/Blast.cgi) searching (Altschul et al. 1990) the reads against plasmid sequences available in the NCBI RefSeq database (http://www.ncbi.nlm.nih.gov/refseq) (1,843 sequences) using the sequence similarity thresholds shown in Table 2. To identify TEs, 431,000 sequences annotated as TEs were downloaded from GenBank (http://www.ncbi.nlm.nih.gov/genbank/) and databased, and metagenomic reads were then searched against this database using the similarity thresholds. To annotate phages, reads were taxonomically assigned through the MG-Rast server using BLASTP (http://blast.ncbi.nlm.nih.gov/Blast.cgi), and reads matching to phage families or genera were retained for each similarity threshold. The total phage count for each metagenome was normalized to the total number of sequences assigned at the domain level.

ARG potential subcategories included ARGs and MRGs. ARGs were identified using the same approach as previously described (Port et al. 2012). Briefly, we compiled an ARG database (11,498 sequences) composed of a nonredundant and updated version of the Antibiotic Resistance Genes Database (http://ardb.cbcb.umd.edu/) (Liu and Pop 2009) in addition to ARGs from metagenomic samples that were functionally verified to confer resistance (Schmieder and Edwards 2012). Proteins were predicted from the ORFs generated from MetaGeneMark (Zhu et al. 2010) and then BLASTP searched against the ARG database (E-value < 10^-5) using the thresholds presented in Table 2 to determine the best match. Sequences with similarity to MRGs were identified by searching the SEED database subsystem “Resistance to antibiotics and toxic compounds” (Overbeek et al. 2005). This subsystem contains genes and gene clusters encoding resistance to arsenic, mercury, and cadmium.

Two approaches were used to identify pathogenic bacteria.
• Sequences were searched against the Ribosomal Database Project (Cole et al. 2009) at the similarity thresholds and species level; matches were then annotated as pathogens if present in the Microbial Rosetta Stone Database (Ecker et al. 2005). This database contains a list of bacterial pathogens known to pose a human health risk.
• Sequences were taxonomically annotated using the lowest common ancestor algorithm (LCA) within the MG-Rast server (Meyer et al. 2008), and reads matching to the species level at each similarity threshold were retained and run against the Microbial Rosetta Stone Database (Ecker et al. 2005).

[Figure 1: flowchart running from environmental sampling and next generation sequencing of unassembled DNA sequence reads, through MG-Rast quality processing, MetaGeneMark gene prediction, and BlastN/BlastX/BlastP searches against the NCBI, SEED, Ribosomal Database Project (RDP), antibiotic resistance gene, and Microbial Rosetta Stone databases, to the index categories and subcategories.]
Figure 1. Bioinformatic framework for quantifying the index of ARDs. The index categories are shown in the cream-colored boxes and the subcategories in red. The gray boxes (e.g., commensal bacteria and virulence factors) represent subcategories that have not yet been incorporated into the index but may still play an important role in determining ARD potential. NCBI, National Center for Biotechnology Information.

Statistical analyses. For principal component analysis (PCA), the abundance counts for each index subcategory were normalized to the total number of sequences in the index for a given sample. PCA was performed on the normalized data using the JMP version 10.0 statistical package (SAS Institute Inc., Cary, NC). Eigenvectors and loading values were extracted for the first two principal components. Finer scale analysis of the genomic elements composing each index subcategory was run using GraphPad Prism, version 6.0 (GraphPad Software, San Diego, CA). Abundance counts per genomic element were normalized to the total number of sequences within the respective subcategory
Table 2. Sequence similarity thresholds used to quantify the index subcategories.
Index category/subcategory | High | Medium-high | Medium-low | Low
Gene transfer potential
Plasmids | 95% ID; 400 bp | 95% ID; 300 bp | 95% ID; 200 bp | 95% ID; 100 bp
TEs | 80% ID; 120 aa | 80% ID; 90 aa | 80% ID; 60 aa | 80% ID; 30 aa
Phages | 50% ID; 150 aa | 50% ID; 100 aa | 50% ID; 75 aa | 50% ID; 50 aa
ARG potential
ARGs | 80% ID; 150 aa | 80% ID; 100 aa | 80% ID; 75 aa | 80% ID; 50 aa
MRGs | 50% ID; 150 aa | 50% ID; 100 aa | 50% ID; 75 aa | 50% ID; 50 aa
Pathogenicity potential
Pathogens | 95% ID; 400 bp or 150 aa | 95% ID; 300 bp or 100 aa | 95% ID; 200 bp or 75 aa | 95% ID; 100 bp or 50 aa
Abbreviations: aa, amino acids; bp, base pairs; ID, identity.
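As an illustration of how the thresholds in Table 2 partition alignment hits, the sketch below returns the most stringent tier a hit satisfies. The names `TIERS`, `THRESHOLDS`, and `strictest_tier` are hypothetical, the comparisons are assumed to be inclusive (the table does not say whether the cutoffs are inclusive), and the pathogen row is omitted because it mixes nucleotide and amino acid lengths.

```python
# Tiers in order of decreasing stringency, as in Table 2.
TIERS = ("high", "medium-high", "medium-low", "low")

# subcategory: (minimum % identity, length unit, minimum alignment length per tier)
THRESHOLDS = {
    "plasmid": (95.0, "bp", (400, 300, 200, 100)),
    "TE": (80.0, "aa", (120, 90, 60, 30)),
    "phage": (50.0, "aa", (150, 100, 75, 50)),
    "ARG": (80.0, "aa", (150, 100, 75, 50)),
    "MRG": (50.0, "aa", (150, 100, 75, 50)),
}


def strictest_tier(subcategory, pct_identity, aln_length):
    """Return the most stringent tier a hit passes, or None if it passes none.

    Assumption: cutoffs are inclusive (>=) for both identity and length.
    """
    min_identity, _unit, min_lengths = THRESHOLDS[subcategory]
    if pct_identity < min_identity:
        return None
    for tier, min_length in zip(TIERS, min_lengths):
        if aln_length >= min_length:
            return tier
    return None
```

Because percent identity is fixed within each subcategory and only the alignment length varies, the tiers are nested: a hit that passes the high threshold also passes every lower one, so counts at lower stringency can only stay the same or grow. For instance, `strictest_tier("ARG", 85.0, 120)` falls in the medium-high tier under these assumptions.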
and 95% confidence intervals were generated for each proportion.

Results

Antibiotic resistance potential. An ARD index was developed that consisted of three categories related to the molecular etiology of antibiotic resistance: a) gene transfer, b) ARG, and c) pathogenicity potential. To first compare the antibiotic resistance potential across the samples, index scores were calculated for each metagenome using four sequence similarity thresholds ranging from high to low stringency (Table 2). When different bioinformatic thresholds are applied, the index scores change and consequently reveal differences that can impact public health monitoring and decision making. Application of the highest threshold generated the lowest percentage of index-positive sequences (mean, 0.025%) for all samples except the river sediment (Figure 2A). As the similarity thresholds are reduced, this percentage increases to 0.033% (medium-high), 0.28% (medium-low), and 0.55% (low). Individual index subcategories were also differentially sensitive to increases in alignment length and, therefore, threshold selection (Figure 2B–2G).

As hypothesized, environments most proximal to human impact had the highest cumulative ARD index scores at all similarity thresholds (Figure 2). Only the activated sludge sample did not follow this trend, likely because the average read length of the sludge data set did not meet the alignment length criteria of the higher thresholds. The river sediment samples taken downstream from a WWTP processing high volumes of antibiotics, as well as the effluent sample, had higher proportions of index-positive sequences due to elevated ARGs, plasmids, and TEs relative to the other samples (Figure 2B–2G; see also Supplemental Material, Table S1). The most impacted environments also had the largest proportion of sequences meeting the high similarity threshold. In particular, sequences from the river sediment data sets had strong matches to known plasmids. The estuary samples, on average, had a slightly increased cumulative score relative to the coastal ocean samples, with higher levels of MRGs, and to a lesser extent TEs, than the other marine samples. Sample P26 (estuary) had an elevated index score relative to all other marine samples due to an increased phage count (Podoviridae). Pathogens were rare at the higher similarity thresholds yet still detected in the effluent, river sediment, coastal ocean, and marina samples (Figure 2G; see also Supplemental Material, Table S1).

Multivariate analysis of all samples revealed ARGs and plasmids to be the most strongly correlated index subcategories at all sequence similarity thresholds (r = 0.85–0.98, p < 0.0001) (see Supplemental Material, Table S2).

ARD index patterns. We used PCA to identify modalities (or principal component “patterns”) for the metagenomic data associated with each sample. PCA reduces our highly multidimensional data set by generating weighted (or loaded) linear combinations [i.e., principal components (PCs)] of the metagenomic categories (e.g., ARGs, MRGs). As a result, a small number of PCs explain as much of the variance in the data set as possible. We ran PCA at the index subcategory level using the medium-high sequence similarity threshold for this case. For the abundance of genomic elements composing the index subcategories, refer to Supplemental Material, Table S2 and Figure S1. In our analysis of the full set of samples, the first two principal components, PC1 and PC2, explained 68% of the total variance in the data set.

PC1 was predominantly characterized by the presence (reflected by positive loadings) of ARGs, plasmids, and TEs and the relative absence (negative loadings) of phages, whereas PC2 reflected the presence of MRGs, TEs, and pathogens and the relative absence of ARGs, plasmids, and phages (Figure 3). There was a clear division between the coastal ocean and river sediment samples along PC1, whereas the estuary, freshwater lake, and WWTP effluent formed a mixed cluster with neutral scores along PC1. Despite the diversity of sample types and relatively small sample size, the marine locations were still largely distinguished from one another along PC2. The estuary samples had positive scores within PC2, whereas the coastal ocean samples were negative. The PC scores for the estuary samples are consistent with the presence of MRGs (arsenic and mercury resistance) and TEs (mainly Rhodobacteraceae sp.) and the relative absence of ARGs and plasmids, whereas the coastal ocean samples were characterized by phages (primarily Myoviridae and Podoviridae) and the relative absence of MRGs, TEs, and pathogens. The freshwater lake sample had a similar profile to the estuary, including the presence of MRGs (mainly arsenic resistance) and TEs (mainly Ralstonia, Rickettsia, and Synechococcus spp.). The PC results characterized the river sediment samples by the presence of ARGs (sulfonamide and aminoglycoside resistance genes) and plasmids (Edwardsiella tarda plasmid pEIB202, Escherichia coli pO26, and Pasteurella multocida plasmid pCCK38) and by the relative absence of MRGs (Figure 3; see also Supplemental Material, Table S2). The WWTP effluent was characterized by the presence of MRGs (arsenic and mercury resistance), pathogens (Acinetobacter calcoaceticus), and to a lesser extent TEs, and by the relative absence of phages. The effluent and activated sludge shared positive scores in PC2. However, the sludge was nearly unweighted in PC1, whereas effluent was highly positive on that axis mainly because of an absence of phages. PCA outliers included estuary sample P26 and three coastal ocean samples (GS259.1, GS260.8, and GS262.1). Site P26 experiences increased mixing of oceanic and Puget Sound waters relative to the other estuary locations, which may explain why it is grouped within the coastal ocean cluster. GS259.1 and GS262.1 are the only coastal ocean samples representing the 0.1-μm microbial community from oligotrophic waters, but there is no apparent relationship between distance offshore and the ARD index profile.

Discussion

We developed and tested an index for characterizing the ARD potential of marine and freshwater environments using shotgun metagenomics. Currently available metagenomic data sets allow for gene transfer potential, ARG potential, and pathogenicity potential to be included in the index, although future introduction of source tracking data will enrich the approach. The index gives ecological context to ARD potential by providing both the prevalence of ARGs and the potential mechanisms by which, and species in which, these genes may be passed. This index differed both across diverse environmental samples and within a group of marine samples. Ecosystems proximal to human impact, including effluent and river sediment collected downstream from a WWTP processing high volumes of pharmaceuticals, had the highest cumulative index scores. These samples were distinguished by higher potentials for gene transfer, pathogenicity, and the presence of ARGs. Less impacted environments, including marine samples and a freshwater lake, had indices reflecting reduced public health concern but exhibited a distinct fingerprint characterized by either phages or MRGs, depending on location. Pathogens were rare across all data sets but were likely underestimated given the shotgun approach and, therefore, limited sequencing depth.

As the samples in this study were diverse, multiple factors may have contributed to the index profiles obtained, including microbial community composition, ecosystem type, sampling methods, seasonality, and underlying data quality. We did not directly address community composition or seasonality, but composition is likely reflected in the ecosystem type. We aimed to minimize the impact of sample collection in part by only including studies that targeted the same size fraction. The PCA results suggest that ecosystem type is a stronger predictor of the index profiles than sequence quality. The coastal ocean samples had a wide range in the
Figure 2. Percentage of total sequenced reads per metagenome assigned to the ARD index. (A) Percentage of index-positive sequences per sample and ecosystem and (B–G) the percentage of sequence reads per sample and ecosystem assigned to each index subcategory [(B) ARG sequences; (C) MRG sequences; (D) plasmid sequences; (E) TE sequences; (F) phage sequences; (G) pathogen sequences]. The percentages are shown for four different sequence similarity thresholds [including high, medium-high, medium-low, and low stringencies (see Table 2)]. The number of pathogen-annotated sequences is shown instead of the percentage. The vertical bar in each plot separates ecosystems more distal versus more proximal to human impact. Filter sizes (i.e., 0.1 and 0.8 μm) are listed after the station names for the coastal ocean samples. The graph inserts for ARGs and plasmids in B and D are zoomed-in views of the abundance of each subcategory excluding the river sediment samples.
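The percentages plotted in Figure 2 follow from normalizing annotated read counts to the total reads in a sample. The sketch below shows that arithmetic only; the function name `index_positive_percent` and the example counts are hypothetical, and summing subcategories into one index value assumes each read is counted in at most one subcategory, which the text does not state.

```python
def index_positive_percent(hits_by_subcategory, total_reads):
    """Percentage of reads per subcategory, plus the cumulative index value.

    Hypothetical helper: assumes every read is counted in at most one
    subcategory, so the cumulative value is a simple sum of counts.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    percents = {
        name: 100.0 * count / total_reads
        for name, count in hits_by_subcategory.items()
    }
    percents["index"] = 100.0 * sum(hits_by_subcategory.values()) / total_reads
    return percents


# Invented counts for one sample at a single similarity threshold.
example = index_positive_percent({"ARG": 12, "plasmid": 30, "TE": 8}, 100_000)
```

Running the same function once per threshold gives the four bars per sample shown in panel A. Note that the Methods normalize phage counts differently (to sequences assigned at the domain level) and report pathogens as raw counts, so this helper would not apply to those two panels as written.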
number of predicted ORFs and proteins, yet they clustered closely in the PCA score plots. Furthermore, the river sediment samples, which had a low number of ORFs and predicted proteins compared to the other samples (except coastal ocean), had the highest number of predicted ARGs and MGEs. Thus, although data quality may impact quantification of the index, the diverse nature of the samples confounded other potential factors. As more metagenomic data with greater spatiotemporal resolution become available, we will be better able to tease apart these factors.

We evaluated the choice of sequence similarity thresholds for annotating metagenomic data. Specific public health decisions may require the selection of different thresholds in order to optimize the balance of false-positives to false-negatives. Our sequence similarity thresholds matched or exceeded the criteria used in other studies investigating ARG and gene transfer in water (Kristiansson et al. 2011; Zhang et al. 2011). There was a significant decrease in the number of index-positive sequences for each sample and index subcategory as the threshold was increased. This trend may be related to sequence read length, in that sequences assigned at the lower thresholds may be too short to reach the alignment length criteria of the higher thresholds (e.g., WWTP sludge sample), or it may indicate that the lower thresholds overassign false-positives. Further optimization of sequence similarity thresholds for public health applications will be necessary to ensure proper interpretation of the index.

Applications to public health surveillance. Current water quality standards are culture based and highly specific for targeted organisms. For example, beach and shellfishery closures in Washington State occur when fecal coliform levels exceed a geometric mean of 14 colony forming units (CFUs) or enterococci levels exceed a geometric mean of 70 CFU/100 mL marine water (State of Washington 2014). Although this has been an effective approach for reducing exposure to well-known pathogens, early risk management may benefit from population-level screening that results in a lower false-negative rate and thus increased sensitivity for a broader range of organisms or genes of interest. Furthermore, a reduction in specificity, and subsequent increase in false-positives, may not be appropriate for regulatory contexts, but it may be accepted when using a metric such as the ARD index to gain a broader understanding of the antibiotic resistance potential of an environmental sample and to detect the emergence of ARDs.

To frame metagenomic screening data within an early risk management approach, we can calculate an environmental detection rate for the ARD index by sample (see Supplemental Material, Figure S2). The environmental detection rate provides a rough estimate of the number of ARD sequences present per volume of water sampled (and takes into account the mass of DNA extracted, the mass sequenced, and sequencing depth). For example, the environmental detection rates for ARGs in the WWTP effluent and estuary samples were approximately 1.58 × 10^8 sequences/L and 0 sequences/L, respectively (based on the medium-high sequence similarity threshold). Quantitative microbial risk assessments have used gene abundance counts (i.e., genome copies/L detected via quantitative polymerase chain reaction) for pathogenic markers in fecally contaminated recreational waters to determine pathogen dose (Staley et al. 2012). The environmental detection rate described above begins to lay out a similar approach for metagenomic assessments that may be informative for distinguishing differently impacted environments and evaluating a variety of public health impacts across marine microbial communities.

Relevance to public health management. Water quality management decisions have ignored ARDs or antibiotics, likely because of a lack of data and uncertainty regarding risk and risk metrics. Given the global magnitude of antibiotic resistance, including the emergence of multidrug-resistant bacterial strains in the environment, information pertaining to the status, patterns, and trends in ARDs is needed. Public health management decisions that may benefit from information regarding ARD potential include actions aimed at reducing the sources and exposure routes of ARDs and the framing of adaptive monitoring protocols. Source control of ARDs entering coastal environments primarily involves waste management and the regulation of antibiotic use in agriculture, aquaculture, hospitals, and households (Davies and Davies 2010). Exposure control of ARDs may involve beach or shellfish bed advisories or aquaculture siting. Due to the uncertainty in links between exposure and actual human health risk, current applications of the index as a screening tool are best suited to ARD source control. For example, using the index to screen WWTP and cruise ship effluent and discharge sites,
freshwater inputs such as river mouths and coastal aquaculture operations could provide baseline environmental levels for anthropo

[Figure 3. PCA score plot of samples by ecosystem (sample P26 labeled), with PC1 and PC2 loading values for the ARG, MRG, TE, plasmid, phage, and pathogen subcategories.]

rently do not account for the potential risk associated with antibiotic resistance release, such as reducing ARD dissemination into the environment by improving WWTP technologies or reducing the use of activated sludge as fertilizer for agricultural crops.

Future data needs. The ARD index is a high-throughput measure of ARD potential and as such cannot be directly related to human health risk. For environmentally sourced ARGs to pose a health risk, they must a) be transferable via MGEs; b) be transferred to either pathogenic or commensal bacteria that then infect or colonize humans; and
reflect the fact that, although emerging technologies will continue to provide unlimited access to genomic information, development of risk assessment frameworks will be of equal importance.

Although the cost and time required for metagenomic analysis are still greater than for existing regulatory monitoring options, advances in sequencing technologies and bioinformatic platforms are increasing the utility of high-throughput approaches. Next generation sequencing platforms now offer increased sequencing depths and read lengths for < $0.10/megabase (Glenn 2011). Furthermore, publicly available bioinformatic analysis tools and pipelines (Scholz et al. 2012) give public health practitioners a platform that they can access and automate to address the research or regulatory question at hand.

Decreased sequencing costs and increased sequencing depths will also allow for longitudinal sampling and greater geospatial coverage, leading to a more comprehensive profiling of the ARD index. Furthermore,

References

Allen HK, Donato J, Wang HH, Cloud-Hansen KA, Davies J, Handelsman J. 2010. Call of the wild: antibiotic resistance genes in natural environments. Nat Rev Microbiol 8:251–259.
Allen LZ, Allen EE, Badger JH, McCrow JP, Paulsen IT, Elbourne LD, et al. 2012. Influence of nutrients and currents on the genomic composition of microbes across an upwelling mosaic. ISME J 6:1403–1414.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. J Mol Biol 215:403–410.
Amann RI, Ludwig W, Schleifer KH. 1995. Phylogenetic identification and in-situ detection of individual microbial-cells without cultivation. Microbiol Rev 59:143–169.
Baquero F. 2012. Metagenomic epidemiology: a public health need for the control of antimicrobial resistance. Clin Microbiol Infect 18(suppl 4):67–73.
Bibby K, Peccia J. 2013. Identification of viral pathogen diversity in sewage sludge by metagenome analysis. Environ Sci Technol 47:1945–1951.
Breitbart M. 2012. Marine viruses: Truth or dare. Ann Rev Mar Sci 4:425–448.
Bush K, Courvalin P, Dantas G, Davies J, Eisenstein B, Huovinen P, et al. 2011. Tackling antibiotic resistance. Nat Rev Microbiol 9:894–896.
Centers for Disease Control and Prevention. 2014. National Antimicrobial Resistance Monitoring System for Enteric Bacteria (NARMS) Homepage. Available: http://www.cdc.gov/narms/ [accessed 8 January 2014].
Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, et al. 2009. The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res
Sogin ML. 2010. Diversity and population structure of sewage-derived microorganisms in wastewater treatment plant influent. Environ Microbiol 12:378–392.
Meyer F, Paarmann D, D’Souza M, Olson R, Glass EM, Kubal M, et al. 2008. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386; doi:10.1186/1471-2105-9-386.
Oh S, Caro-Quintero A, Tsementzi D, DeLeon-Rodriguez N, Luo C, Poretsky R, et al. 2011. Metagenomic insights into the evolution, function, and complexity of the planktonic microbial community of Lake Lanier, a temperate freshwater ecosystem. Appl Environ Microbiol 77:6000–6011.
Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, et al. 2005. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33:5691–5702.
Port JA, Wallace JC, Griffith WC, Faustman EM. 2012. Metagenomic profiling of microbial composition and antibiotic resistance determinants in Puget Sound. PLoS One 7:e48000; doi:10.1371/journal.pone.0048000.
Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, et al. 2012. The Pfam protein families database. Nucleic Acids Res 40:D290–D301.
Sanapareddy N, Hamp TJ, Gonzalez LC, Hilger HA, Fodor AA, Clinton SM. 2009. Molecular diversity of a North Carolina wastewater treatment plant as revealed by pyrosequencing. Appl Environ Microbiol 75:1688–1696.
Schmieder R, Edwards R. 2012. Insights into antibiotic resistance through metagenomic approaches. Future Microbiol 7:73–89.
Scholz MB, Lo CC, Chain PS. 2012. Next generation sequencing and bioinformatic bottlenecks: the current state of metagenomic data analysis. Curr Opin Biotech 23:9–15.
although the sample size in this study was 37:D141–D145. Segura PA, Francçois M, Gagnon C, Sauve S. 2009. Review of the
limited, the PCA framework presented pro- Davies J, Davies D. 2010. Origins and evolution of antibiotic occurrence of anti-infectives in contaminated wastewaters
and natural and drinking waters. Environ Health Perspect
vides a platform from which to tease apart the resistance. Microbiol Mol Biol Rev 74:417–433.
117:675–684; doi:10.1289/ehp.11776.
Ecker DJ, Sampath R, Willett P, Wyatt JR, Samant V,
index and characterize individual ecosystems. Massire C, et al. 2005. The Microbial Rosetta Stone Staley C, Gordon KV, Schoen ME, Harwood VJ. 2012.
Database: a compilation of global and emerging infectious Performance of two quantitative PCR methods for micro-
Conclusions microorganisms and bioterrorist threat agents. BMC bial source tracking of human sewage and implications
Microbiol 5:19; doi:10.1186/1471-2180-5-19. for microbial risk assessment in recreational waters. Appl
We had three objectives, to a) develop a European Centre for Disease Prevention and Control. 2014. Environ Microbiol 78:7317–7326.
metagenomic ARD index that quantifies the European Antimicrobial Resistance Surveillance Network State of Washington. 2014. WAC 173-201A-210. Marine Water
antibiotic resistance signal within marine and (EARS-Net) Homepage. Available: http://www.ecdc. Designated Uses and Criteria. Available: http://apps.leg.
europa.eu/en/activities/surveillance/EARS-Net [accessed wa.gov/wac/default.aspx?cite=173-201A-210 [accessed
freshwater environments, b) analyze this index 8 Janurary 2014].
8 January 2014].
for common patterns characterizing specific Glenn TC. 2011. Field guide to next-generation DNA sequencers. Wellington EM, Boxall AB, Cross P, Feil EJ, Gaze WH,
ecosystems, and c) conceptually frame the Mol Ecol Resour 11:759–769. Hawkey PM, et al. 2013. The role of the natural environ-
Grundmann H, Klugman KP, Walsh T, Ramon-Pardo P, ment in the emergence of antibiotic resistance in gram-
index within an environmental health sur- negative bacteria. Lancet Infect Dis 13:155–165.
Sigauque B, Khan W, et al. 2011. A framework for global
veillance context. Significant differences were surveillance of antibiotic resistance. Drug Resist Update Wright GD. 2007. The antibiotic resistome: the nexus of chemical
seen in the index when comparing marine and 14:79–87. and genetic diversity. Nat Rev Microbiol 5:175–186.
Wright GD. 2010. Antibiotic resistance in the environment: a
freshwater environments that differ in proxim- Handelsman J. 2004. Metagenomics: application of genomics
link to the clinic? Curr Opin Microbiol 13:589–594.
to uncultured microorganisms. Microbiol Mol Biol Rev
ity to human impact, and distinct index pat- 68:669–685. Wu CH, Sercu B, Van de Werfhorst LC, Wong J, DeSantis TZ,
terns were evident across these environments. Hugenholtz P, Tyson GW. 2008. Microbiology: metagenomics. Brodie EL, et al. 2010. Characterization of coastal urban
watershed bacterial communities leads to alternative
We conclude that the index has potential to Nature 455:481–483.
community-based indicators. PLoS One 5:e11285;
Kristiansson E, Fick J, Janzon A, Grabic R, Rutgersson C,
be a valuable screening tool for early risk man- Weijdegard B, et al. 2011. Pyrosequencing of antibiotic- doi:10.1371/journal.pone.0011285.
agement of ARDs, but to define index thresh- contaminated river sediments reveals high levels of resis- Ye L, Zhang T. 2011. Pathogenic bacteria in sewage treatment
old levels of concern and link these levels to tance and gene transfer elements. PLoS One 6:e17038; plants as revealed by 454 pyrosequencing. Environ Sci
doi:10.1371/journal.pone.0017038. Technol 45:7173–7179.
decisions will require a better understanding Liu B, Pop M. 2009. ARDB—Antibiotic Resistance Genes Zhang T, Zhang XX, Ye L. 2011. Plasmid metagenome reveals
of the prevalence, fate, and transport of ARGs Database. Nucleic Acids Res 37:D443–447. high levels of antibiotic resistance genes and mobile
in the marine environment. Nevertheless, Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, genetic elements in activated sludge. PLoS One 6:e26041;
Bemben LA, et al. 2005. Genome sequencing in micro doi:10.1371/journal.pone.0026041.
characterization of the ARD potential of Zhang XX, Zhang T, Fang HH. 2009. Antibiotic resistance
fabricated high-density picolitre reactors. Nature
environmental microbial communities is a 437:376–380. genes in water environment. Appl Microbiol Biotechnol
first step toward incorporating metagenomic Martinez JL. 2009. The role of natural environments in the evo- 82:397–414.
lution of resistance traits in pathogenic bacteria. Proc Biol Zhu W, Lomsadze A, Borodovsky M. 2010. Ab initio gene
information into monitoring frameworks for identification in metagenomic sequences. Nucleic Acids
Sci 276:2521–2530.
antibiotic resistance in aquatic ecosystems. McLellan SL, Huse SM, Mueller-Spitz SR, Andreishcheva EN, Res 38:e132; doi:10.1093/nar/gkq275.