Evaluation of Whole-Genome DNA Methylation Sequencing Library Preparation Protocols
Abstract
Background: With rapidly dropping sequencing costs, whole-genome DNA methylation sequencing has become increasingly popular, and multiple library preparation protocols now exist. We performed 22 whole-genome DNA methylation sequencing experiments on snap-frozen human samples and extensively benchmarked common library preparation protocols for whole-genome DNA methylation sequencing, including three traditional bisulfite-based protocols and a new enzyme-based protocol. We also compared different input DNA quantities for two kits compatible with a reduced starting quantity, and we present bioinformatic analysis pipelines for sequencing data from each of these library types.
Results: A range of metrics was collected for each kit, including raw read statistics, library quality and uniformity metrics, cytosine retention, and CpG beta value consistency between technical replicates. Overall, the NEBNext Enzymatic Methyl-seq and Swift Accel-NGS Methyl-Seq kits performed quantitatively better than the other two protocols. In addition, the NEB and Swift kits performed well at low input amounts, validating their utility in applications where DNA is the limiting factor.
Conclusions: The NEBNext Enzymatic Methyl-seq kit appeared to be the best option for whole-genome DNA methylation sequencing of high-quality DNA, closely followed by the Swift kit, which potentially works better for degraded samples. Further, a general bioinformatic pipeline is applicable across the four protocols, with the exception of the extra trimming needed for the Swift Biosciences Accel-NGS Methyl-Seq protocol to remove the Adaptase sequence.
Keywords: DNA methylation, Epigenetics, Whole-genome bisulfite sequencing, Enzymatic methylation sequencing,
Fallopian tube
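The cytosine retention and CpG beta value metrics named in the abstract come down to counting, at each reference cytosine, how many reads still show C versus T. The following is a minimal sketch and not the pipeline used in this study. It assumes a hypothetical per-cytosine summary record (`CytosineCall`, with context and retained/converted read counts). Beta is computed as C/(C+T) at a site, and retention per context is pooled over all sites. This pooled figure is not necessarily identical to the read-averaged or base-averaged definitions reported in the Results.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CytosineCall:
    """Hypothetical per-cytosine summary: site, dinucleotide context, and read counts."""
    chrom: str
    pos: int
    context: str      # e.g. "CpG", "CpA", "CpC", "CpT"
    n_retained: int   # reads still reading C (protected, or left unconverted)
    n_converted: int  # reads reading T (converted)


def beta_value(call: CytosineCall, min_depth: int = 1) -> Optional[float]:
    """Beta = C / (C + T) at one site; None when coverage is below min_depth."""
    depth = call.n_retained + call.n_converted
    if depth < min_depth:
        return None
    return call.n_retained / depth


def pooled_retention(calls: List[CytosineCall]) -> Dict[str, float]:
    """Retention per context: retained calls divided by all calls, pooled over sites."""
    retained: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for call in calls:
        retained[call.context] += call.n_retained
        total[call.context] += call.n_retained + call.n_converted
    # Contexts with zero total calls are dropped rather than divided by zero.
    return {ctx: retained[ctx] / n for ctx, n in total.items() if n > 0}


if __name__ == "__main__":
    # Toy counts for illustration only; these are not data from this study.
    calls = [
        CytosineCall("chr1", 100, "CpG", 9, 1),
        CytosineCall("chr1", 150, "CpG", 7, 3),
        CytosineCall("chr1", 200, "CpA", 1, 49),
        CytosineCall("chr1", 250, "CpT", 0, 50),
    ]
    print(pooled_retention(calls))  # CpG 0.8, CpA 0.02, CpT 0.0
    print([beta_value(c, min_depth=3) for c in calls if c.context == "CpG"])
```

The `min_depth` parameter mirrors the kind of per-CpG coverage threshold described for Fig. 5. That analysis required at least three reads for mitochondrial CpGs and at least one for the spike-in controls.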
© The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Morrison et al. Epigenetics & Chromatin (2021) 14:28 Page 2 of 15
DNA binding proteins. Array-based and sequencing-based methods built upon these principles have been developed and benchmarked [4–6]. The current gold-standard approach to examine genome-wide DNA methylation composition and differences is chemical modification of unmethylated cytosines using sodium bisulfite [7]. Bisulfite deaminates unmethylated cytosines (Cs) to uracils, which are converted to thymines (Ts) during PCR amplification, whereas methylated cytosines (mCs) remain unaltered. The result is a stable genetic difference between methylated (C) and unmethylated cytosines (T) that reflects the underlying DNA methylation landscape. In effect, the epigenetic difference becomes a genetic difference that can then be studied using conventional genome-scale methods, such as microarrays or sequencing. Various generations of bisulfite-based microarrays have been used to profile hundreds of thousands of human samples because of their low cost and easy, standardized data processing and analysis. With sequencing costs dropping in recent years, sequencing-based methods have become increasingly popular [8].
Whole-genome bisulfite sequencing (WGBS) provides the most comprehensive single-base-resolution DNA methylation maps. It was successfully applied to Arabidopsis thaliana in 2008 [9, 10] and then to humans in 2009 [11]. In these early methods, adapter-ligated library material undergoes bisulfite conversion, leading to sheared and degraded library fragments and overall lower quantities and diversity of sequenceable material. A post-bisulfite adapter tagging (PBAT) method [12, 13] was developed to overcome this hurdle, effectively decreasing the input range to the nanogram level. Notably, this method has been used for single-cell WGBS profiling [14]. More recently, Swift Biosciences has produced a kit that offers an alternative approach to post-bisulfite library preparation. This approach maintains the low DNA input capabilities of PBAT. During library preparation, it also adds a low-complexity sequence to the 3′ end of the single-stranded DNA (ssDNA), which serves as a scaffold for sequencing adapter attachment (Accel-NGS Methyl-Seq protocol, Swift Biosciences).
The conditions needed for bisulfite conversion are known to be harsh on DNA and cause degradation. In recent years, it has become clear that this conversion can also be achieved enzymatically. The ten–eleven translocation (TET) family dioxygenases can convert 5-mCs to 5-hydroxymethylcytosine (5-hmC), then to 5-formylcytosine (5-fC), and eventually to 5-carboxylcytosine (5-caC) [15]. Further, the apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3A (APOBEC3A) deaminates methylated and unmethylated Cs into thymines and uracils, respectively [15]. While both 5-mCs and Cs are affected by this process, the TET-oxidized methylcytosines 5-hmC, 5-fC, and 5-caC are minimally impacted [15]. Based on this principle, New England Biolabs recently developed an enzymatic methyl-seq (EM-seq) method. It uses TET2 to oxidize methylated cytosines, followed by APOBEC3A treatment to convert unmethylated cytosines to uracils [15]. WGBS and EM-seq, collectively referred to as whole-genome methylation sequencing (WGMS), both convert a 5-mC/C difference to a C/T difference; therefore, analysis tools developed for WGBS are also applicable to EM-seq.
In this study, we extensively benchmark the performance of the three most commonly used protocols for bisulfite-based whole-genome DNA methylation profiling on fresh-frozen human tissue samples. These are the KAPA Hyper Prep kit (Kapa), the Miura and Ito post-bisulfite adapter tagging (PBAT) method, and the Swift Biosciences Accel-NGS Methyl-Seq DNA library kit (Swift). We also benchmark the new EM-seq protocol from New England Biolabs, the NEBNext Enzymatic Methyl-seq kit (NEB). For each technique, we evaluate input quantities, read mapping statistics, library complexity, insert size, cytosine retention, and reproducibility between replicates. We also present bioinformatic analysis pipelines for each of these library types.
Results
Benchmarking studies often use cell lines, because their largely isogenic background and reduced biological variance make it easier to probe the reproducibility of a method. However, it is often of interest to apply these approaches to complex tissue sources for primary research. We therefore chose frozen normal primary human solid tissue to benchmark these kits in a more common research scenario. We used human fallopian tube samples, which are believed to harbor the cell of origin of high-grade serous ovarian cancer [16]. The results presented are based on two snap-frozen primary patient fallopian tube samples (denoted Biological Replicates A and B). DNA from each sample was prepared using each of the four library preparation protocols, as summarized in Table 1. With the exception of the PBAT protocol, two aliquots from the same DNA extraction were used to produce technical replicates for each protocol (denoted Technical Replicates 1 and 2). The PBAT protocol has no technical replicate for two reasons: the initial sequencing run was of poor quality, and too little library material remained to generate additional sequencing data for either replicate. In addition to generating samples with the suggested amount of DNA input, a smaller DNA input (10 ng each) was used in the NEB and Swift protocols to test
the effectiveness of low DNA inputs with these two protocols, which performed best in initial testing. (According to their manufacturers, the NEB and Swift kits can go as low as 10 ng [17] and 100 pg [18] of input DNA, respectively.) Table 1 shows the details for each protocol used in this study.
Table 1
Short Name | Kapa | NEB | PBAT | Swift | Low NEB | Low Swift
Company Name | Roche Sequencing | New England Biolabs | N/A | Swift Biosciences | New England Biolabs | Swift Biosciences
Kit / Protocol Name | KAPA Hyper Prep | NEBNext Enzymatic Methyl-seq | Miura F., Ito T. (2018) | Accel-NGS Methyl-Seq | NEBNext Enzymatic Methyl-seq | Accel-NGS Methyl-Seq
Kit or Protocol? | Kit | Kit | Protocol | Kit | Kit | Kit
Strands Aligned To | OT/OB | OT/OB | CTOT/CTOB | OT/OB | OT/OB | OT/OB
WGBS or EM-seq? | WGBS | EM-seq | WGBS | WGBS | EM-seq | WGBS
DNA Input [ng] | 300 | 200 | 100 | 100 | 10 | 10
# Samples | 2 (A/B) | 2 (A/B) | 2 (A/B) | 2 (A/B) | 2 (A/B) | 2 (A/B)
# Tech. Reps. | 2 (1/2) | 2 (1/2) | 1 (1) | 2 (1/2) | 2 (1/2) | 2 (1/2)
Sheared? | Yes | Yes | No | Yes | Yes | Yes
Conversion Kit | EZ DNA Methylation-Gold kit | N/A | EZ DNA Methylation-Gold kit | EZ DNA Methylation-Gold kit | N/A | EZ DNA Methylation-Gold kit
# Amplification Rounds | 10 | 4 | 0 | 4 | 8 | 8
DNA Controls | lambda phage / pUC19 | lambda phage / pUC19 | lambda phage / pUC19 | lambda phage / pUC19 | lambda phage / pUC19 | lambda phage / pUC19
Sequencer Used | Illumina NovaSeq6000 | Illumina NovaSeq6000 | Illumina NovaSeq6000 | Illumina NovaSeq6000 | Illumina NovaSeq6000 | Illumina NovaSeq6000
Approx. Processing Time [hrs] | 7 | 9 | 14–16 | 7 | 9 | 7
Throughout the text, the “Short Name” entry is used to describe which protocol is being discussed, rather than the full name. Sample and technical replicate names are included in parentheses. The lambda phage control is unmethylated, while the pUC19 control is methylated; these were added to the high molecular weight genomic DNA sample at 0.01% and 0.0005%, respectively. The OT and OB strands are the original top and original bottom strands, while CTOT and CTOB are the complements of the original top and original bottom strands, respectively. These strands can also be referred to as the bisulfite Watson (OT), bisulfite Crick (OB), bisulfite Watson reverse (CTOT), and bisulfite Crick reverse (CTOB) strands.
The first set of metrics compared between the four preparations relates to the quality of the raw reads received from the sequencer, including base quality, adapter contamination, and the effects of trimming on those reads. In general, the raw base qualities, percentage of reads with adapter contamination, and percentage of bases trimmed are comparable between the Kapa, NEB, and Swift protocols (Fig. 1 and Additional file 1: Figure S1). However, the PBAT protocol suffers from a higher percentage of low-quality bases along the length of the read, leading to a higher percentage of bases removed during trimming. Because the PBAT protocol has no amplification step during library preparation, this higher trimming loss has a larger effect on the amount of usable data available from this protocol.
Following adapter and quality trimming, the reads were mapped to the human genome. The fractions of optimally aligned (MAPQ ≥ 40), sub-optimally aligned (MAPQ < 40), and unaligned read fragments were then calculated. For Sample A, the NEB and Swift protocols had about the same fraction of optimally aligned read fragments (∼85%), regardless of the amount of input DNA (Fig. 2A). The Kapa protocol was slightly behind (∼80%), while the PBAT protocol was closer to ∼75%. For the PBAT protocol, this lower percentage of optimally aligned reads, coupled with a substantially lower number of read fragments, leaves much less data for downstream analyses.
Another metric that reflects the quality of a library preparation is the insert size, which relates to the size of the sequenced DNA fragments. For these experiments, the DNA used in the Kapa, NEB and Swift libraries was prepared in a single reaction, then split into individual aliquots for library generation. Given the uniformity of input, the resulting libraries would be expected to have identical size profiles. However, the bisulfite preparations (Kapa and Swift) led to shorter fragments than the enzymatic preparation (NEB). The NEB libraries retained a wider range of fragment lengths that are generally longer than the bisulfite fragments (Fig. 2B). The retention of a smaller range of shorter fragments by the
bisulfite preparations suggests degradation of the DNA during library generation, consistent with the known tendency of bisulfite conversion to degrade samples. Whereas the other samples were sheared prior to conversion, the PBAT samples were fragmented only by the bisulfite process itself. This process is quite destructive, leaving shorter fragments than the other bisulfite conversion protocols. The Swift protocol had the longest and most consistent insert sizes of all bisulfite-based methods.
Fig. 1 Raw read statistics for each protocol. Each plot shows the percentage of bases with different levels of base quality, namely low base quality (< 20) for A read 1 and B read 2, medium base quality (20 ≤ quality ≤ 30) for C read 1 and D read 2, and high base quality (> 30) for E read 1 and F read 2.
With regard to the fraction of reads with MAPQ ≥ 40 marked as duplicates (Fig. 2C), the standard-input NEB, PBAT, and Swift protocols each had about 10% duplicate reads. The low-input NEB protocol was slightly higher at ∼15%. The Kapa (standard-input) and low-input Swift protocols had the highest duplicate rates, closer to 25%.
Library complexity is a metric that can be used to determine whether a library has reached saturation, the point at which deeper sequencing yields only a marginal number of additional unique (i.e., non-duplicate) reads. None of the samples were sequenced to saturation. Even so, the Kapa and low-input Swift samples showed lower complexity than the NEB and standard-input Swift samples (Fig. 2D). Because the PBAT samples had far fewer reads, it is difficult to ascertain how the PBAT library complexity would rank against the NEB or Kapa data at higher read depths. At the PBAT sequencing depth in this study, it appears to follow a trend similar to that of the Kapa libraries, implying an overall lower library complexity than the Swift and NEB data.
To determine the uniformity of reads distributed across the genome, the ratio of the observed read coverage to the expected coverage was calculated across
several regions (Fig. 3A–B and Additional file 1: Figure S7). In general, coverage is consistent across genic, intergenic, and repeat-masked regions, with each sample departing less than 5% from expected uniformity (closer to 1.0 is better) in these regions (Additional file 1: Figure S7A–C). In contrast, exonic regions, all CpGs, and CpG islands show greater heterogeneity across kits and larger departures from uniform coverage (Fig. 3A–B and Additional file 1: Figure S7D). PBAT had much higher observed coverage of these regions than expected. The other protocols all have lower coverage than expected: the NEB samples are closest to uniform coverage, and the Kapa samples have the lowest coverage of these regions. The undercoverage of CpGs by the Kapa protocol is consistent with a prior study [19]. The EPCAM promoter region shows one example of these differences in coverage uniformity across the protocols (Fig. 4).
Fig. 2 Library quality metrics for each protocol for Sample A. A The percentage of optimally aligned, sub-optimally aligned, and unaligned read fragments for each protocol. Note, read fragments treat reads 1 and 2 as separate entities, as it is possible that one read in the pair is mapped while the other is not. Additional file 1: Figure S2 shows the number of read fragments represented by each bar. B Insert size distribution. C Duplicate rate for reads with MAPQ ≥ 40. D The library complexity, which is a function of the duplicate rate. Metrics for Sample B are shown in Additional file 1: Figure S3.
When performing WGMS, a library preparation’s coverage of cytosines, particularly in a CpG context, is important for assessing the DNA methylation landscape of a given sample. Among the protocols compared here, the Kapa protocol has the lowest percentage of CpGs covered in a number of different regions in these samples (Fig. 3C–D and Additional file 1: Figure S8), consistent with previous work [19]. Generally, the other preparations cover over 85% of CpGs at 150 million mapped reads. The exception is the Swift samples’ coverage of CpG islands, which is in the 75–80% range. Notably, the low-input NEB runs produced coverage levels consistent with the standard-input runs across samples and technical replicates.
These methods, whether bisulfite- or enzyme-based, all distinguish DNA methylation states by converting unmethylated cytosines into uracils, and subsequently thymines during PCR, while sparing methylated cytosines. Effective conversion of unmethylated cytosines, but not methylated cytosines, is therefore key to the accuracy of these methods. Conversion effectiveness can be tested using mitochondrial DNA, which is consistently unmethylated, or spike-in controls, such as unmethylated lambda phage or methylated pUC19 vectors. Fig. 5 shows the results of cytosine conversion on lambda phage (Fig. 5A), pUC19 (Fig. 5B), and mitochondrial DNA (Fig. 5C). Generally, cytosine conversion behaved as expected, with the exception of mitochondrial DNA in the PBAT protocol. For PBAT, the mitochondrial beta values show a broad distribution centered slightly below 0.2 for Sample A and about 0.25 for Sample B. Interestingly, for PBAT, the mitochondrial DNA-based incomplete-conversion rate differed from that based on lambda phage. This is likely
because mitochondrial DNA is circular and can become supercoiled. Unlike the other protocols, PBAT involves no mechanical shearing of the DNA, which could explain this difference. This also shows the limitation of using mitochondrial DNA as a negative control.
Fig. 3 Library uniformity as measured by coverage of various genomic element categories. Ratio of observed coverage to expected coverage for A all CpGs and B CpG islands. C Percentage of all CpGs covered by at least one unique read with MAPQ ≥ 40. D Percentage of CpGs in CpG islands covered by at least one unique read with MAPQ ≥ 40. Note, for C and D, all libraries were downsampled to be comparable to PBAT (150 million reads, or ∼4.8X coverage, per sample; see "Methods" for details); therefore, any differences are not likely confounded by sequencing depth. As expected, this coverage will be substantially higher at increased depth.
The read-averaged cytosine retention for all protocols reflects what would be expected for a WGMS run, namely CpG retention above 70% and retention in the other cytosine contexts around 1% (Fig. 6). Moreover, the technical replicates showed consistent retention, while the two biological replicates differed more. Across all protocols, Sample A consistently had higher CpA retention, but not higher CpC or CpT retention. Biological CpH retention is known to occur predominantly in the CpA dinucleotide context [20]. The higher CpA retention, in contrast to CpC and CpT retention, suggests that Sample A likely has true CpH methylation, a process previously thought to be largely restricted to embryonic stem cells and neurons [21]. Our results also indicate that, unlike CpA methylation, CpC and CpT retention likely reflects incomplete conversion rather than true biological methylation, at least in mammalian samples. The Swift and NEB protocols generally showed the lowest levels of this technical artifact (both below 0.5%), except for one replicate of the low-input Swift preparation. Similar results can be seen for the base-averaged cytosine retention (Additional file 1: Figure S10).
The M-bias plots [22] (Additional file 1: Figure S11) show consistently lower CpG retention across the entire read length for both reads 1 and 2 of the PBAT protocol. For the Kapa protocol, CpG retention for read 1 is consistently higher than in the other preparations, whereas read 2 is more in line with the others. For CpH retention, each protocol has approximately the same level of retention, with slight deviations in the first 5 bp. Because of the adapter trimming performed (see "Methods" for details), retention rates can behave erratically at the ends of reads, depending on the base content of the trimmed adapter. It should also be noted that, by default, the aligner used in this analysis (see "Methods") excludes cytosines in the first and last 3 bp of a read when determining methylation beta values, so erratic behavior at read ends is not included in methylation-related metrics.
To compare the consistency of CpG beta values, Spearman correlation coefficients were calculated between technical replicates of each protocol (Fig. 7), as well as between preparations of the same sample (Additional file 1: Table S1). Again, all samples were subsampled to 150 million reads (equivalent of ∼4.8X coverage)
to ensure fair comparison. The correlations would be higher with more reads. The NEB technical replicates, at both standard and low input, had the highest correlations, at 0.9 or higher. The Kapa protocol had the lowest correlation (i.e., the least consistency in beta values between replicates), at just under 0.75. The standard-input Swift sample did better than the low-input sample, with correlations of 0.873 and 0.814, respectively. A correlation for the PBAT protocol could not be calculated because PBAT did not have a technical replicate. When preparations were compared with one another, the Kapa protocol had low correlations, with a maximum of 0.62 (to the Swift protocol; Additional file 1: Table S1). Only two of the other ten correlations were 0.8 or below, both of them between the PBAT and Swift protocols. Of the NEB and Swift protocols, the NEB protocols had the highest cross-protocol correlations, generally exceeding those of the Swift protocols.
Fig. 4 The EPCAM promoter region as a representative example of data generated with the protocols. The aligned read tracks are taken from the Integrative Genomics Viewer (IGV) [45] in bisulfite mode, where red represents an unconverted cytosine and blue represents a converted cytosine. Each panel represents one sample, with A and B denoting the biological replicates and 1 and 2 the technical replicates for each library construction protocol. The region shown spans 1500 bp upstream and downstream of exon 1. The location of a CpG island is indicated with a green box at the bottom. Note, the strands for the PBAT samples were flipped in silico before display to account for the strand definition in the Miura and Ito protocol. That definition is the opposite of the one expected by IGV and used by the other protocols. All libraries were downsampled to be comparable to PBAT; therefore, any differences are not likely confounded by sequencing depth. As expected, this coverage will be substantially higher at increased depth.
Fig. 5 Cytosine retention for methylation controls, namely lambda phage (A), pUC19 (B), and mitochondrial DNA (C). Lambda phage and pUC19 are added to the genomic DNA to serve as unmethylated and methylated controls, respectively. Mitochondrial DNA is a good source of unmethylated DNA that can be used in lieu of spike-in controls. Note, PBAT only had one technical replicate, so there is no “Rep. 2” half on these violins. In addition, the mitochondrial DNA analysis required at least three reads covering each CpG. The spike-in controls required only one read, because they have far fewer reads than the genomic DNA. This coverage requirement can result in all CpGs for a sample having a value of 1.0, as for Kapa Sample A in B.
Fig. 6 Read-averaged cytosine retention by dinucleotide context: A CpA, B CpC, C CpG, and D CpT. In each panel, two technical replicates are shown for each biological replicate. The x-axis denotes percent retention, with a scale of 0–5% for the CpH panels and 0–100% for the CpG panel. All libraries were downsampled to be comparable to PBAT; therefore, any differences are not likely confounded by sequencing depth.
Fig. 7 The NEB protocol has the highest correlation of beta values between Sample A technical replicates. The Spearman correlation coefficient, rs, between the two replicates is listed in each panel, along with the number of 100 kb bins used in calculating the coefficient. Note, all libraries were downsampled to be comparable to PBAT; therefore, any differences are not likely confounded by sequencing depth. Overall low correlation values are due to low coverage from downsampling. Additional file 1: Figure S12 shows projections of Replicates 1 and 2 onto a single axis.
To further compare the consistency of CpG beta values, a principal component analysis (PCA) was performed (Fig. 8). The PCA shows that the data generally split along two factors: the sample the data came from and the protocol used to generate the library. With respect to the sample split, Sample A clusters in the upper right of the plot, while Sample B clusters in the lower left. The protocol split occurs consistently in both samples: PBAT and Kapa each separate into their own clusters, while Swift and NEB yield similar results.
The NEB protocol converts cytosines differently from the others (enzymatic versus chemical). To look for bias in the NEB methylation levels, the beta values of the standard-input NEB and Swift protocols were therefore compared. After calculating the beta value differences for Sample A, only four CpGs with |diff| > 0.5 in both Technical Replicates 1 and 2 were found (Additional file 1: Figure S13). In Sample B, only one such CpG was seen, and it was not one of the four found in Sample A. This difference was therefore taken to be sample-dependent rather than due to an inherent bias in the enzymatic conversion process.
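The replicate comparison above correlates CpG beta values aggregated into 100 kb bins (Fig. 7). The NEB-versus-Swift check flags CpGs whose beta values differ by more than 0.5 in both technical replicates. The following is a minimal sketch of both steps in plain Python, not the code used in this study. It rests on four assumptions. First, beta values arrive as a dict keyed by (chromosome, position). Second, a bin's value is the unweighted mean of its CpG betas. Third, Spearman's rho is the Pearson correlation of tie-averaged ranks, the same definition used by common statistics libraries such as SciPy's `spearmanr`. Fourth, replicate 1 of one protocol is paired with replicate 1 of the other, and likewise for replicate 2.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Optional, Set, Tuple

BIN_SIZE = 100_000  # 100 kb bins, per the Fig. 7 caption

Site = Tuple[str, int]  # (chromosome, position) of one CpG
Bin = Tuple[str, int]   # (chromosome, bin index)


def bin_means(betas: Dict[Site, float], bin_size: int = BIN_SIZE) -> Dict[Bin, float]:
    """Unweighted mean of CpG beta values within each fixed-width genomic bin."""
    sums: Dict[Bin, float] = defaultdict(float)
    counts: Dict[Bin, int] = defaultdict(int)
    for (chrom, pos), beta in betas.items():
        key = (chrom, pos // bin_size)
        sums[key] += beta
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks (nan if undefined)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    if n < 2:
        return float("nan")
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def replicate_concordance(rep1: Dict[Site, float], rep2: Dict[Site, float],
                          bin_size: int = BIN_SIZE) -> Tuple[float, int]:
    """Spearman rho over bins present in both replicates, and the number of bins used."""
    b1, b2 = bin_means(rep1, bin_size), bin_means(rep2, bin_size)
    shared = sorted(set(b1) & set(b2))
    return spearman([b1[k] for k in shared], [b2[k] for k in shared]), len(shared)


def discordant_cpgs(reps_a: List[Dict[Site, float]], reps_b: List[Dict[Site, float]],
                    threshold: float = 0.5) -> Set[Site]:
    """CpGs with |beta_a - beta_b| > threshold in every replicate pair (rep i vs rep i)."""
    hits: Optional[Set[Site]] = None
    for a, b in zip(reps_a, reps_b):
        pair_hits = {s for s in set(a) & set(b) if abs(a[s] - b[s]) > threshold}
        hits = pair_hits if hits is None else hits & pair_hits
    return hits if hits is not None else set()
```

Here `replicate_concordance` returns the bin count alongside rho, matching the Fig. 7 panels, which report both. Requiring a CpG to be discordant in every replicate pair, rather than in either pair, is what keeps `discordant_cpgs` conservative.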
Morrison et al. Epigenetics & Chromatin (2021) 14:28 Page 10 of 15
random DNA breaks [25–28], again more prevalently cfDNA samples. cfMeDIP-seq [30] has been developed
in the CGI regions, making these small fragments more to overcome this limitation, pushing the lower input limit
available for continued library generation than larger to 1–10 ng. However, it is associated with similar limita-
fragments. In addition, CGIs are G:C rich; during the tions of common affinity-based methods [2, 30]. As the
random priming step of the PBAT library creation pro- enzymatic conversion by the NEB protocol still demon-
cess, these higher G:C regions generate more stable strates a library complexity at 10 ng comparable to that of
primer annealing conditions and increase the likelihood regular bulk DNA-based methods, the NEB protocol may
of extension and incorporation into the final library due prove to be a good alternative approach for clinical early
to a slightly higher regional melting temperature and detection work. In addition, with its template-preserving
exhibition of G:C clamping features [29]. The more even nature and excellent performance at low-input levels, the
CGI coverages exhibited by the Swift and NEB libraries NEB protocol could hold more promise for single-cell
are also protocol related: in the Swift protocol, bisulfite DNA methylation profiling.
conversion precedes manual shearing and adapter liga-
tion, avoiding the strand-breakage issue and generating Conclusion
relatively diverse libraries, and NEB’s process completely In this study, we compared four commonly used WGMS
avoids the deleterious bisulfite conversion step through library preparation protocols, including three bisulfite-
enzymatic reactions. based protocols and one enzyme-based protocol. Table 1
It should be noted that, while the PBAT protocol favors shows a summary of the protocols used, while Table 2
CpG islands, we do not necessarily suggest using this summarizes some of the results found in the analysis.
protocol if one is specifically targeting CpG island regions We found the NEBNext Enzymatic Methyl-seq and the
using sequencing. As they are not PCR amplified, PBAT Accel-NGS Methyl-Seq kits performed quantitatively
libraries produce relatively little sequenceable material and, as a result, it was difficult to get enough depth to complete our analysis. This is evident in our lack of replication of the PBAT library, as our technical replicate did not produce enough information for a thorough analysis despite exhausting the library during the sequencing run. Further, as shown here, both NEB and, to a lesser degree, Swift are able to push the input threshold well below that of PBAT, while maintaining better quality libraries and more uniform coverage across the genome.
The clinical utility of DNA methylation has long been recognized, explored, and established, particularly for cell-free DNA (cfDNA)-based early detection of neoplasm. The current gold-standard technologies for DNA methylation profiling are bisulfite-based. Harsh bisulfite treatment is known to cause heavy degradation of the template, which is often scarce to begin with for clinical
# Sequenced Reads (Millions) 347.5 607.9 355.6 389.9 127.9 124.1 395.8 368.2 453.7 408.6 392.7 449.8
Low Quality Bases Trimmed (R1) 2.3 1.8 1.0 1.0 0.9 0.9 0.6 0.6 1.2 1.1 1.1 0.8
Low Quality Bases Trimmed (R2) 2.0 1.7 1.1 1.0 5.9 3.4 0.7 0.7 1.2 1.2 1.0 0.8
Insert Size (bp) 208.1 228.0 285.4 297.8 299.3 300.6 222.1 225.0 291.4 290.8 217.9 223.8
Duplicate Rate (%) 27.5 32.4 9.5 9.9 13.5 11.0 12.8 12.1 18.0 16.2 30.7 26.9
% CpGs Covered 74.0 72.3 87.5 87.6 81.8 82.6 85.6 85.2 87.6 87.6 85.5 85.3
% CpG Retention 81.6 79.0 78.5 75.6 74.0 71.1 80.3 77.5 77.8 74.8 79.4 76.9
All values shown are averaged across the two technical replicates, with the exception of PBAT, which only had one technical replicate. The first four rows are taken from the raw data, while the last three rows are taken from the subsampled data, where the BAMs were downsampled to be comparable to PBAT.
better than the other two protocols at the standard-input level of DNA for each kit. We found the NEB kit to perform comparably across biological and technical replicates for two different amounts of DNA input, whereas the Swift kit showed some decline with the lower amount of input. Based on these results, we recommend use of the NEBNext Enzymatic Methyl-seq kit for whole-genome DNA methylation sequencing.
Methods
Fallopian tube sample preparation
Fallopian tubes from two primary patients were delivered in saline from a local hospital. Upon arrival, samples were washed using sterile 1x Phosphate Buffered Saline and snap-frozen in liquid nitrogen. At a later date, the fallopian tubes were thawed and minced. DNA was extracted from the minced fallopian tubes using the tissue protocol
from Qiagen's DNeasy Blood & Tissue Kit (69504). Following extraction, dsDNA was quantified using Invitrogen's Qubit 3.0 Fluorometer.
Morrison et al. Epigenetics & Chromatin (2021) 14:28 Page 12 of 15
Whole‑genome methylation sequencing libraries
Methylated pUC19 and unmethylated lambda phage DNA (0.0005% and 0.01%, respectively) were added to each high molecular weight genomic DNA sample. These are included as methylation controls in the NEB protocol and were added to the DNA used in each protocol for consistency. The DNA was then sheared to approximately 350 bp in average size for all prepared libraries, with the exception of the post-bisulfite adapter tagging (PBAT) libraries, where the DNA was not initially sheared.
Libraries were prepared with the KAPA Hyper Prep kit (v6.17) (Roche Sequencing, Cat. #KK8504) with an input of 300 ng of sheared DNA following the manufacturer's protocol with the following modifications. Illumina TruSeq Nano adapters at a concentration of 10 µM were used. The post-ligation cleanup elution was reduced to 20 µL and the entire DNA elution went into the EZ DNA Methylation-Gold kit (Zymo Research, Cat. #D5005). The bisulfite converted DNA was eluted in 20 µL and 10 cycles of library amplification were performed using the KAPA HiFi HotStart Uracil+ ReadyMix (Roche Sequencing, Cat. #KK2800).
Libraries were prepared with the Accel-NGS Methyl-Seq DNA library kit (v3.0) (Swift Biosciences, Cat. #30024) with an input of either 10 ng or 100 ng of sheared DNA following the manufacturer's protocol with the following modifications. The DNA was bisulfite converted using the EZ DNA Methylation-Gold kit (Zymo Research, Cat. #D5005) with an elution volume of 15 µL. Following adapter ligation, either 8 cycles (10 ng DNA input) or 4 cycles (100 ng DNA input) of library amplification were performed.
Libraries were prepared with the NEBNext Enzymatic Methyl-seq kit (New England Biolabs, Cat. #E7120L) using an input of either 10 ng or 200 ng of sheared DNA, according to the manufacturer's protocol. The denaturation method used was 0.1 N sodium hydroxide, according to the protocol, and either 8 cycles (10 ng DNA input) or 4 cycles (200 ng DNA input) of PCR amplification were performed.
Libraries were prepared with the PBAT method described by Miura and Ito [13], using an input of 100 ng of genomic DNA that went directly into the EZ DNA Methylation-Gold kit (Zymo Research, Cat. #D5005) with an elution volume of 21 µL. Then, 10 µL of the bisulfite converted DNA was used for making the PBAT libraries as previously described in [13], with the modification of using KAPA HiFi HotStart Uracil+ ReadyMix (Roche Sequencing, Cat. #KK2800) for the DNA template extension step.
KAPA pure beads (Roche Sequencing, Cat. #KK8001) were used for cleanup steps for all prepared libraries. Quality and quantity of the finished libraries were assessed using a combination of the Agilent High Sensitivity DNA chip (Agilent Technologies, Inc., Cat. #5067-4626), QuantiFluor® dsDNA System (Promega Corp., Cat. #E2670), and Kapa Illumina Library Quantification qPCR assay (Roche Sequencing, Cat. #KK4824). 100 bp paired-end sequencing was performed on an Illumina NovaSeq6000 sequencer using an S4, 200 bp sequencing kit (Illumina Inc., San Diego, CA, USA), with 10% PhiX. Base calling was done by Illumina RTA3 and output of NCS was demultiplexed and converted to FASTQ format with Illumina Bcl2fastq (v1.9.0).
Alignment and methylation extraction
Upon receipt of the FASTQ files, the files were trimmed using Trim Galore [31] version 0.6.4_dev (using Cutadapt version 2.10). Default parameters were used, other than: --illumina --trim-n --paired --cores 4 --fastqc --fastqc_args "--noextract". In addition, the Swift samples were trimmed with --clip_R2 14, due to the Adaptase™ method used by Swift Biosciences [32].
The trimmed FASTQ files were aligned to GRCh38 [33] using BISCUIT [34] version 0.3.16. An index for the reference genome was created using biscuit index GRCh38.p13.genome.fa, followed by aligning each sample to the indexed reference. The alignment step used the default options for biscuit align, with these exceptions: -M -t 20 -R sample_specific_read_group. Each sample received its own read group (-R tag). The aligned reads were duplicate marked using Samblaster [35] version 0.1.25, with the -M flag and defaults. The reads were then coordinate sorted and indexed using Samtools [36] version 1.10 (with htslib 1.10.2). Default options were used, with the following exceptions: -@ 20 -m 5G -o sample_name.sorted.markdup.bam -O BAM (sort) and -@ 20 (index).
The extraction of cytosine methylation information proceeded as follows. Pileup VCF files were generated using biscuit pileup with default parameters. The bgzip and tabix utilities (included with htslib version 1.10.2) were used to compress and index the VCF files. Default parameters were used for bgzip and tabix, with the exception of -p vcf in the call for tabix. The VCF files were then processed through biscuit vcf2bed, bedtools sort (bedtools [37] version 2.29.2), and biscuit mergecg to create coordinate sorted BED files containing CpG methylation beta value information. For each command, the default parameters were used.
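BISCUIT's mergecg step merges the C and G positions of each CpG dinucleotide, i.e. the calls from the two DNA strands, into one record. How the two strands are weighted is not stated here, so the following is only a minimal Python sketch under explicit assumptions: the StrandCall record, the merge_cpg function, the 0-based coordinates, and pooling reads as a coverage-weighted mean are all hypothetical illustrations, not BISCUIT's documented implementation or its exact BED layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StrandCall:
    """Hypothetical per-strand CpG call: chromosome, 0-based position, beta, read coverage."""
    chrom: str
    pos: int
    beta: float
    coverage: int


def merge_cpg(plus: StrandCall, minus: StrandCall) -> Tuple[str, int, int, float, int]:
    """Combine the C (plus strand) and G (minus strand) calls of one CpG dinucleotide.

    Returns a BED-style (chrom, start, end, beta, coverage) tuple whose beta is the
    fraction of methylated reads pooled over both strands (a coverage-weighted mean).
    This pooling rule is an assumption made for illustration.
    """
    # The G of a CpG sits one base downstream of its C on the same chromosome.
    if plus.chrom != minus.chrom or minus.pos != plus.pos + 1:
        raise ValueError("calls do not form a CpG dinucleotide")
    total = plus.coverage + minus.coverage
    if total == 0:
        raise ValueError("no reads cover this CpG")
    # Recover each strand's methylated read count as beta x coverage, then pool them.
    methylated = plus.beta * plus.coverage + minus.beta * minus.coverage
    return plus.chrom, plus.pos, plus.pos + 2, methylated / total, total


if __name__ == "__main__":
    merged = merge_cpg(StrandCall("chr1", 10468, 0.8, 10), StrandCall("chr1", 10469, 0.5, 6))
    print(merged)  # ('chr1', 10468, 10470, 0.6875, 16)
```

Pooling reads rather than averaging the two strand-level beta values gives the better-covered strand more weight: in the example, 11 of 16 reads are methylated (0.6875), whereas a plain mean of 0.8 and 0.5 would give 0.65.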
After creating the BED files, they were compressed and indexed using bgzip and tabix, with default parameters being used in both cases, with tabix also including -p bed.
Any code not explicitly stated in this and subsequent sections can be found on GitHub [38].
FASTQ and alignment quality control
Quality-control data were collected from a number of sources and viewed using MultiQC [39] version 1.9. Quality control for the FASTQ files was generated by FastQC [40] version 0.11.9 and by Cutadapt during the trimming process. The command arguments used can be found in the previous section. Statistics on the percentage of duplicate-marked reads were produced by Samblaster. Library complexity was calculated by Preseq [41] version 2.0.3 using these options: c_curve -B -P -v -o sample.complex.ccurve.txt, where "sample" is the name of each sample. Quality controls from Samtools were generated with samtools stats and samtools flagstat, with default parameters and -@ 20. BISCUIT includes a custom BASH script to generate quality-control statistics related to data aligned by BISCUIT. This script was run with this command: QC.sh -v samp.pileup.vcf.gz -o samp_QC hg38_assets GRCh38.p13.genome.fa samp samp.sorted.markdup.bam. In each case, "samp" corresponds to the name of the processed sample. The hg38_assets mentioned in the BISCUIT quality-control script command can be found in a zip file on the BISCUIT GitHub release page [42].
Library protocol comparison analysis
To collect statistics related to the raw reads stored in the FASTQ files, a custom Python (version 3.7.6) script was written. It uses the gzip, glob, and time Python base modules and these additional Python packages: argparse (version 1.1), numpy (version 1.18.1), and biopython (version 1.76). Statistics regarding the trimmed reads were collected from log files generated by Cutadapt, as described previously.
For a number of the analyses, the aligned BAM files were subsampled before calculating the corresponding metric. The BAMs were subsampled using samtools view -hbu -F 0x4 -q 40 sample.bam | samtools view -hbu -s FRAC -. "FRAC" was calculated for each sample as
$$\mathrm{FRAC} = \frac{150{,}000{,}000}{\text{Number of Mapped Fragments with MAPQ} \geq 40}. \tag{1}$$
The subsampled BAMs were sorted and indexed using Samtools. Pileup VCF files and merged CpG BED files were generated in a similar manner to the original BAMs, as described previously.
Using the subsampled BAMs, the average coverage and percentage of covered CpGs within different genomic regions, including CpG islands, exons, genes, and repeat-masked regions, were calculated using custom scripts. The CpGs that fell in each region were determined by intersecting a BED file containing CpG coordinates and the coverage at those locations with a BED file containing the region's coordinates. The coverage was determined using bedtools genomecov, while the intersection was done using bedtools intersect. The average coverage was calculated by taking a weighted average of the coverage for each CpG. The percentage of covered CpGs was calculated by
$$\frac{\text{Number of CpGs in Region with Coverage} > 1}{\text{Total Number of CpGs in Region}}. \tag{2}$$
The scripts used to calculate these values made use of GNU parallel [43].
The observed coverage to expected coverage ratio was calculated as ratio = Obs/Exp, where:
$$\mathrm{Obs} = \frac{\text{Number of Bases Mapped to Region}}{\text{Total Number of Mapped Bases}}, \tag{3}$$
$$\mathrm{Exp} = \frac{\text{Sum of Mappability Scores in Region}}{\text{Total Sum of Mappability Scores in Genome}}. \tag{4}$$
This formulation assumes that not all bases are equally accessible during sequencing. The expected coverage takes this difference in accessibility into account by including mappability scores based on the Bismap k100 multi-read mappability scores [44]. The observed coverage does not include these scores, as mappability is assumed to be inherently included when performing DNA sequencing. Because Bismap did not include a mappability score for every base in the genome, the expected and observed coverage calculations were restricted to those bases that included a mappability score. Only mapped reads (i.e., reads whose BAM FLAG field does not include the 0x4 bit) with a MAPQ score ≥ 40 were included in this calculation. The values for each observed to expected ratio were calculated using custom BASH scripts that used bedtools and GNU awk (version 4.0.2).
The beta values for the lambda phage and pUC19 methylation controls were extracted using the same method as the genomic methylation extraction (see Alignment and methylation extraction for details), with the one exception being that only one read was required to cover a CpG. The coverage requirement difference was due to the fractional amount of lambda phage and pUC19 that
were included in the sequencing compared with the amount of genomic DNA. The mitochondrial DNA beta values were taken directly from the genomic methylation BED file.
To calculate correlations between samples, the genome was binned into 100 kb bins and the average beta value was calculated for the CpGs in each window. The bins were determined via bedtools makewindows -w 100000 -g GRCh38.p13.genome.fa.fai | sort -k1,1 -k2,2n. The ".fai" file was generated via samtools faidx GRCh38.p13.genome.fa. The average beta value for each bin was calculated via bedtools map -a bins.bed -b sample.bed -c 4,5 -o mean | gzip, where bins.bed contains the 100 kb bins and sample.bed is the merged CpG BED file generated from the subsampled BAMs. Correlations between the 100 kb averaged beta values were calculated using the Spearman correlation coefficient, as implemented in Python's scipy package (version 1.4.1).
The PCA was performed using the average beta values in 100 kb bins, which were calculated in the same way as for the correlation analysis. After calculating the beta values, they were transformed into smoothed M-values via:
$$\operatorname{logit}\left(\frac{M+k}{(M+k)+(U+k)}\right), \tag{5}$$
where M is the number of methylated cytosines and U is the number of unmethylated cytosines at a given CpG. The smoothing factor, k, eliminates the infinities that occur at 0 and 1 in the logit transformation. The numpy logit function was used to apply the transformation. This conversion turns the beta-distributed beta values into the more Gaussian-distributed M-values. After converting to M-values, the data were standardized using the StandardScaler function, as implemented in scikit-learn (sklearn version 0.24.0). The PCA was performed using scikit-learn's PCA function, keeping only the first two components (n_components=2).
The analysis for methylation bias in the NEBNext Enzymatic Methyl-seq kit was performed by extracting CpGs that had more than 20 reads covering them in both the NEB and Swift protocols for Sample A replicates 1 and 2, or in both the NEB and Swift protocols for Sample B replicates 1 and 2. Methylation bias was determined by requiring both Eqs. 6 and 7 to be true:
$$\lvert \beta_{\mathrm{NEB}} - \beta_{\mathrm{Swift}} \rvert > 0.5 \quad \text{(Replicate 1)} \tag{6}$$
$$\lvert \beta_{\mathrm{NEB}} - \beta_{\mathrm{Swift}} \rvert > 0.5 \quad \text{(Replicate 2)} \tag{7}$$
This was done separately for the standard DNA input NEB and Swift protocols of Samples A and B.
The CpG and other genomic region BED files mentioned in this section can be found on the GitHub release page for the analysis code.
Abbreviations
DNA: Deoxyribonucleic acid; PCR: Polymerase chain reaction; FFPE: Formalin-fixed, paraffin-embedded; WGBS: Whole-genome bisulfite sequencing; EM-seq: Enzymatic methyl-seq; WGMS: Whole-genome (DNA) methylation sequencing; PBAT: Post-bisulfite adapter tagging; C: Cytosine; T: Thymine; mC: Methylated cytosine; 5-mC: 5-Methylcytosine; 5-hmC: 5-Hydroxymethylcytosine; 5-fC: 5-Formylcytosine; 5-caC: 5-Carboxylcytosine; TET: Ten–eleven translocation; APOBEC3A: Apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like 3A.
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s13072-021-00401-y.
Additional file 1. Additional Table S1 and Figures S1–S13.
Acknowledgements
Computation for the work described in this paper was supported by the High Performance Cluster and Cloud Computing (HPC3) Resource at the Van Andel Research Institute.
Authors' contributions
MA and HS conceptualized the study. JM led the study and performed the analyses. JM, MA and HS wrote the manuscript. DWC, LLR, and EJS provided the samples. KKF, JMK, and MA prepared samples and performed sequencing. BKJ, IB, and WZ provided intellectual input during analysis. All authors read and approved the manuscript.
Funding
This study was funded by the National Institutes of Health/National Cancer Institute [R37 CA230748] and the Ovarian Cancer Research Fund (now Ovarian Cancer Research Alliance) [373933].
Availability of data and materials
Sequencing data are from human tissue and will be available through dbGaP. Software used in processing the data is available on GitHub at https://github.com/jamorrison/wgms_kit_comparison/releases/tag/v1.0.4.
Declarations
Ethics approval and consent to participate
This study received ethics approval from the Van Andel Research Institute Institutional Review Board (IRB #19017).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Author details
1 Department of Epigenetics, Van Andel Research Institute, 333 Bostwick Avenue NE, Grand Rapids, MI 49503, USA. 2 Genomics Core, Van Andel Research Institute, 333 Bostwick Avenue NE, Grand Rapids, MI 49503, USA. 3 Center for Computational and Genomic Medicine, The Children's Hospital of Philadelphia, 3501 Civic Center Boulevard, Philadelphia, PA 19104, USA. 4 Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA. 5 Spectrum Health Office of Research and Education, Spectrum Health System, 15 Michigan Street NE, Grand Rapids, MI 49503, USA.
Received: 16 February 2021 Accepted: 27 May 2021
References
1. Laird PW. The power and the promise of DNA methylation markers. Nat Rev Cancer. 2003;3:253–66.
2. Laird PW. Principles and challenges of genome-wide DNA methylation analysis. Nat Rev Genet. 2010;11:191–203.
3. Clark SJ, Harrison J, Paul CL, Frommer M. High sensitivity mapping of methylated cytosines. Nucleic Acids Res. 1994;22:2990–7.
4. Bock C, Tomazou EM, Brinkman AB, Müller F, Simmer F, Gu H, et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nat Biotechnol. 2010;28:1106–14.
5. Harris RA, Wang T, Coarfa C, Nagarajan RP, Hong C, Downey SL, et al. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications. Nat Biotechnol. 2010;28:1097–105.
6. The BLUEPRINT consortium. Quantitative comparison of DNA methylation assays for biomarker development and clinical applications. Nat Biotechnol. 2016;34:726–37.
7. Clark SJ, Statham A, Stirzaker C, Molloy PL, Frommer M. DNA methylation: Bisulphite modification and analysis. Nat Protoc. 2006;1:2353–64.
8. Zhou W, Dinh HQ, Ramjan Z, Weisenberger DJ, Nicolet CM, Shen H, et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nat Genet. 2018;50:591–602.
9. Lister R, O'Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, et al. Highly Integrated Single-Base Resolution Maps of the Epigenome in Arabidopsis. Cell. 2008;133:523–36.
10. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008;452:215–9.
11. Lister R, Pelizzola M, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–22.
12. Miura F, Enomoto Y, Dairiki R, Ito T. Amplification-free whole-genome bisulfite sequencing by post-bisulfite adaptor tagging. Nucleic Acids Res. 2012;40:e136.
13. Miura F, Ito T. Post-Bisulfite Adaptor Tagging for PCR-Free Whole-Genome Bisulfite Sequencing. In: DNA Methylation Protocols. New York, NY: Springer New York; 2018. p. 123–136.
14. Smallwood SA, Lee HJ, Angermueller C, Krueger F, Saadeh H, Peat J, et al. Single-cell genome-wide bisulfite sequencing for assessing epigenetic heterogeneity. Nat Methods. 2014;11:817–20.
15. Vaisvila R, Ponnaluri VKC, Sun Z, Langhorst BW, Saleh L, Guan S, et al. EM-seq: Detection of DNA Methylation at Single Base Resolution from Picograms of DNA. bioRxiv. 2019. https://doi.org/10.1101/2019.12.20.884692.
16. Labidi-Galy SI, Papp E, Hallberg D, Niknafs N, Adleff V, Noe M, et al. High grade serous ovarian carcinomas originate in the fallopian tube. Nat Commun. 2017;8:1093.
17. New England Biolabs Inc. NEBNext Enzymatic Methyl-seq (EM-seq) Technical Report; 2019. Available from: https://www.neb.com/products/e7120-nebnext-enzymatic-methyl-seq-kit#Product%20Information.
18. Swift Biosciences. Swift Protocol: Accel-NGS Methyl-Seq DNA Library Kit; 2020. Available from: https://swiftbiosci.com/wp-content/uploads/2020/02/PRT-019-Methyl-Seq-Protocol-Rev-3.pdf.
19. Nair SS, Luu PL, Qu W, Maddugoda M, Huschtscha L, Reddel R, et al. Guidelines for whole genome bisulphite sequencing of intact and FFPET DNA on the Illumina HiSeq X Ten. Epigenetics & Chromatin. 2018;11:24.
20. Schultz MD, He Y, Whitaker JW, Hariharan M, Mukamel EA, Leung D, et al. Human body epigenome maps reveal noncanonical DNA methylation variation. Nature. 2015;523:212–6.
21. He Y, Ecker JR. Non-CG Methylation in the Human Genome. Annu Rev Genomics Hum Genet. 2015;16:55–77.
22. Hansen KD, Langmead B, Irizarry RA. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 2012;13:R83.
23. Landan G, Cohen NM, Mukamel Z, Bar A, Molchadsky A, Brosh R, et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat Genet. 2012;44:1207–14.
24. Zhou L, Ng HK, Drautz-Moses DI, Schuster SC, Beck S, Kim C, et al. Systematic evaluation of library preparation methods and sequencing platforms for high-throughput whole genome bisulfite sequencing. Sci Rep. 2019;9:10383.
25. Munson K, Clark J, Lamparska-Kupsik K, Smith SS. Recovery of bisulfite-converted genomic sequences in the methylation-sensitive ddPCR. Nucleic Acids Res. 2007;35:2893–903.
26. Tanaka K, Okamoto A. Degradation of DNA by bisulfite treatment. Bioorg Med Chem Lett. 2007;17:1912–5.
27. Grunau C, Clark SJ, Rosenthal A. Bisulfite genomic sequencing: systematic investigation of critical experimental parameters. Nucleic Acids Res. 2001;29:E65.
28. Mill J, Petronis A. Profiling DNA methylation from small amounts of genomic DNA starting material: efficient sodium bisulfite conversion and subsequent whole-genome amplification. Methods Mol Biol. 2009;507:371–81.
29. Dieffenbach CW, Lowe TM, Dveksler GS. General concepts for PCR primer design. PCR Methods Appl. 1993;3(3):S30–7.
30. Shen SY, Singhania R, Fehringer G, Chakravarthy A, Roehrl MHA, Chadwick D, et al. Sensitive tumour detection and classification using plasma cell-free DNA methylomes. Nature. 2018;563:579–83.
31. Krueger F. TrimGalore; 2019. Available from: https://github.com/FelixKrueger/TrimGalore.
32. Swift Biosciences. Tail Trimming for Better Data: Accel-NGS Methyl-Seq, Adaptase Module and 1S Plus DNA Library Kits; 2018. Available from: https://swiftbiosci.com/wp-content/uploads/2019/02/16-0853-Tail-Trim-Final-442019.pdf.
33. GENCODE. Human Release 32; 2019. Available from: https://www.gencodegenes.org/human/release_32.html.
34. Zhou W. BISCUIT: BISulfite-seq CUI Toolkit; 2020. Available from: https://github.com/huishenlab/biscuit.
35. Faust GG, Hall IM. SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics. 2014;30:2503–5.
36. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
37. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.
38. Morrison J. WGMS Kit Comparison Source Code; 2021. Available from: https://github.com/jamorrison/wgms_kit_comparison/releases/tag/v1.0.4.
39. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–8.
40. Andrews S. FastQC: A Quality Control Tool for High Throughput Sequence Data; 2010. Available from: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
41. Daley T, Smith AD. Predicting the molecular complexity of sequencing libraries. Nat Methods. 2013;10:325–7.
42. Zhou W. BISCUIT Version 0.3.16 Release Page; 2020. Available from: https://github.com/huishenlab/biscuit/releases/tag/v0.3.16.20200420.
43. Tange O. GNU Parallel 20200522 ('Kraftwerk'). Zenodo. 2020. https://doi.org/10.5281/zenodo.3841377.
44. Karimzadeh M, Ernst C, Kundaje A, Hoffman MM. Umap and Bismap: quantifying genome and methylome mappability. Nucleic Acids Res. 2018;46:e120.
45. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative Genomics Viewer. Nat Biotechnol. 2011;29:24–6.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.