05.metsker. Emerging Technologies in DNA Sequencing
Demand for DNA sequence information has never been greater, yet current Sanger technology is too costly, time
consuming, and labor intensive to meet this ongoing demand. Applications span numerous research interests,
including sequence variation studies, comparative genomics and evolution, forensics, and diagnostic and applied
therapeutics. Several emerging technologies show promise of delivering next-generation solutions for fast and
affordable genome sequencing. In this review article, the DNA polymerase-dependent strategies of Sanger
sequencing, single nucleotide addition, and cyclic reversible termination are discussed to highlight recent advances
and potential challenges these technologies face in their development for ultrafast DNA sequencing.
E-mail [email protected]; fax (713) 798-5741.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3770505.
Genome Research 15:1767–1776 ©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05; www.genome.org

More than just a mapping and sequencing endeavor, the Human Genome Project (HGP) has altered the mindset and approach to many basic and applied research efforts. Early skepticism and controversy (Koshland 1989; Luria et al. 1989; Roberts 1989b; Fox et al. 1990) were soon laid to rest by well-developed strategies (Roberts 1989a; Collins and Galas 1993; Collins et al. 1998) that led to the successful execution of mankind’s largest biology project. At the core of the HGP was technology development that advanced the pace of sequencing a mammalian-size genome from years to months. Along the way, numerous strategies emerged that hold promise for rapid, efficient, and inexpensive delivery of DNA sequence information. For the HGP, a brute-force approach was adopted for completing the job by coupling the core technologies of Sanger sequencing and fluorescence detection. The completion of the sequencing phase could not have been accomplished without major innovations in recombinant protein engineering, fluorescent dye development, capillary electrophoresis, automation, robotics, informatics, and process management. The result was completion of a high-quality, reference sequence of the human genome in April, 2003 (Collins et al. 2003), marking the 50-year anniversary of the discovery of the double-helix structure. For many outside the genome community, that heroic milestone signaled the end of this international scientific project, but for the rest of us, it only marked the beginning of things to come.

The need for sequencing has never been greater than it is today, with applications spanning diverse research sectors including comparative genomics and evolution, forensics, epidemiology, and applied medicine for diagnostics and therapeutics. Arguably, the strongest rationale for ongoing sequencing is the quest for identification and interpretation of human sequence variation as it relates to health and disease. The most common form of variation is the single nucleotide polymorphism (SNP). Although two unrelated people share, on average, 99.9% sequence identity (i.e., one difference in a thousand base pairs), the average occurrence of an SNP in the general population is once every few hundred base pairs. As such, more than nine million unique SNPs have been cataloged in the public database, dbSNP (Crawford and Nickerson 2005), with many more expected to be found in large-scale resequencing efforts.

A great deal of attention has been focused on common SNPs with a minor allele frequency >5% and their potential role in common disease (Lander 1996; Risch and Merikangas 1996; Collins et al. 1997). Recent, large-scale genotyping efforts of these common SNPs have shown that much of the human genome can be parsed into common haplotype blocks (Daly et al. 2001; Patil et al. 2001; Gabriel et al. 2002). The International HapMap Consortium (2003) was formed to characterize common patterns of sequence variation by determining allele frequencies and the degree of association between SNPs among geographically distinct groups, leading to the identification of “tagSNPs” for genome-wide, disease-based association studies. With this method of characterization, however, rare SNPs/haplotypes may be overlooked, as highlighted by Liu et al. (2005), who described an association of rare variants/haplotypes with osteoporosis.

A shift in large-scale strategies from genotyping to resequencing is currently taking place to explore the significance of less-common SNPs to human biology and disease. The “re” in this approach is the sequencing of additional genomes related to a reference genome for de novo SNP discovery and comparative genomics application. The ENCODE Project Consortium (2004) has described significant efforts toward resequencing megabase-sized blocks of the human genome. Consequently, genome centers are now diverting at least 10%–20% of their resources, which currently translates to ∼5% capacity, to resequencing hundreds to thousands of gene regions. This increase in momentum for high-throughput resequencing will greatly facilitate studies to determine the genetic basis of susceptibility to common disease, cancer biology, and disease association in model and nonmodel organisms.

Current sequencing technologies are too expensive, labor intensive, and time consuming for broad application in human sequence variation studies. Genome center cost is calculated on the basis of dollars per 1000 Q20 bases (defined below) and can be generally divided into the categories of instrumentation, personnel, reagents and materials, and overhead expenses. Currently, these centers are operating at less than one dollar per 1000 Q20 bases, with at least 50% of the cost resulting from DNA sequencing instrumentation alone. Developments in novel detection methods, miniaturization in instrumentation, microfluidic separation technologies, and an increase in the number of assays per run will most likely have the biggest impact on reducing cost. It should be emphasized, however, that new sequencing strategies will be needed to use these high-throughput platforms effectively. In September, 2004, the National Human Genome Research Institute (NHGRI) initiated two new programs aimed at bringing the cost of whole-genome sequencing down to $100,000 (http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-002.html), with the eventual goal being $1000 (http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-003.html).

Numerous strategies and platforms for ultrafast DNA sequencing currently under development include sequencing-by-hybridization (SBH), nanopore sequencing, and sequencing-by-synthesis (SBS), the latter of which encompasses many different DNA polymerase-dependent strategies. Use of the term SBS has become increasingly ambiguous in the literature; therefore, I propose a classification of DNA polymerase-dependent strategies into three major categories: Sanger sequencing, single nucleotide addition (SNA), and cyclic reversible termination (CRT) (Text Box 1). In this review, I will focus only on DNA polymerase-dependent strategies, which represent the broadest area of research and development. For the SNA and CRT strategies, I will emphasize the chemistry in an effort to illustrate the advantages and challenges of these methods. Because of the competitive nature of technology development, the exchange of scientific ideas is often thwarted, as many companies do not readily publish results. Although this review will highlight recent advances reported in the literature, readers are directed to the Web sites of companies who are active in the sequencing field (Table 1). A recent review by Shendure et al. (2004) provides a comprehensive overview of SBH and nanopore sequencing technologies. Important issues surrounding whole-genome sequencing, such as ownership, consent, privacy, and legal, ethical, and social implications, will not be addressed here (Foster and Sharp 2002; Robertson 2003; Bonham et al. 2005).

Table 1. Companies involved in DNA sequencing technology development
Company names                                   Web site addresses
454 Life Sciences Corp.                         www.454.com
Agencourt Biosciences Corp.                     www.agencourt.com
GE Healthcare, formerly Amersham Biosciences    www.amershambiosciences.com
Applied Biosystems, Inc.                        www.appliedbiosystems.com
Genovoxx                                        www.genovoxx.de
Helicos Bioscience Corp.                        www.helicosbio.com
LaserGen, Inc.                                  www.lasergen.com
Li-Cor, Inc.                                    www.licor.com
Microchip Biotechnologies, Inc.                 www.mcbiotech.com
Nanofluidics                                    www.nanofluidics.com
SeqWright                                       www.seqwright.com
Solexa-Lynx                                     www.solexa.com
Visigen Biotechnologies, Inc.                   www.visigenbio.com

Sanger sequencing: State-of-the-art technology

The Sanger method is a mixed-mode process involving synthesis of a complementary DNA template using natural 2′-deoxynucleotides (dNTPs) and termination of synthesis using 2′,3′-dideoxynucleotides (ddNTPs) by DNA polymerase (Sanger et al. 1977). Balanced appropriately, competition between synthesis and termination processes results in the generation of a set of nested fragments, which differ in nucleoside monophosphate units. The ratio of dNTP/ddNTP in the sequencing reaction determines the frequency of chain termination, and hence the distribution of lengths of terminated chains. The nested fragments are then separated by their size using high-resolution gel electrophoresis and analyzed to reveal the DNA sequence. Advancements in fluorescence detection (Smith et al. 1986; Prober et al. 1987), enzymology (Tabor and Richardson 1989, 1995), fluorescent dyes (Ju et al. 1995; Metzker et al. 1996; Lee et al. 1997), dynamic-coating polymers and their derivatives (Ruiz-Martinez et al. 1993; Carrilho et al. 1996; Madabhushi et al. 1996, 1999; Madabhushi 1998; Salas-Solano et al. 1998; Guttman 2002a, 2002b), and capillary array electrophoresis (CAE) (Takahashi et al. 1994; Kheterpal et al. 1996) have helped to define current DNA sequencing platforms.

For automated Sanger sequencing, either the primer or the terminating ddNTP is tagged with a specific fluorescent dye (e.g., ddATP is labeled with the green dye). As these dye-labeled fragments pass through the detection region, fluorophores are excited by the laser in the DNA sequencer, producing fluorescence emissions of four different colors. The determination of the color is the underlying method for assigning a base call, and the order of the fluorescent fragments reveals the DNA sequence. The “raw” fluorescence signals, however, must be transformed. Removal of cross-talk, correction for dye mobility alterations, and normalization of emission intensities must be performed before readable DNA sequence information can be obtained (Smith et al. 1987). Base-calling and error probability assignment (Ewing and Green 1998; Ewing et al. 1998) applications are then used to call the DNA sequence and assess the accuracy of the call. A Phred20 or Q20 score, equivalent to an error probability of 1% for a given base call, is considered a high-quality base and serves as the commodity standard throughout the sequencing community.
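The relationship between a Q20 score and a 1% error probability follows from the standard Phred definition, Q = -10 log10(P). The following is a minimal Python sketch of that conversion, together with the dollars-per-1000-Q20-bases cost metric mentioned earlier. The function names and the example inputs are hypothetical and chosen only for illustration; they are not taken from the Phred software or from any genome center's accounting.

```python
import math


def phred_to_error_probability(q):
    """Return the base-call error probability P for Phred quality Q.

    Uses the standard definition Q = -10 * log10(P).
    """
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p):
    """Return the Phred quality Q for an error probability P (0 < P <= 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def count_q20_bases(qualities, threshold=20):
    """Count base calls whose quality is at or above the threshold."""
    return sum(1 for q in qualities if q >= threshold)


def cost_per_1000_q20(total_cost_dollars, qualities):
    """Hypothetical helper: dollars spent per 1000 Q20 bases produced."""
    n_q20 = count_q20_bases(qualities)
    if n_q20 == 0:
        raise ValueError("no Q20 bases in this data set")
    return 1000.0 * total_cost_dollars / n_q20


if __name__ == "__main__":
    # Error probabilities for a few common quality thresholds.
    for q in (10, 20, 30):
        print(q, phred_to_error_probability(q))
```

Under this definition, every 10-point increase in Q corresponds to a 10-fold drop in error probability, which is why Q20 (1 error in 100 calls) is a convenient unit for comparing output across platforms.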
Single channel
Woolley & Mathies (1995)      1    M13mp18          3.5    97      147    9
Liu et al. (1999)             1    M13mp18          6.5    99.4    500    20
Schmalzing et al. (1999)      1    M13 clones(a)    11.5   99      505    27
Salas-Solano et al. (2000)    1    M13mp18          11.5   98.5    640    30
Boone et al. (2002)           1    M13mp18          18.0   98      640    30
Shi & Anderson (2003)         1    Unknown          4.5    99.1    320    13
Multiple channel arrays
Liu et al. (2000)             16   M13mp18          7.5    99      457    16
Simpson et al. (2000)         48   M13mp18          10.0   97      400    25–45
Backhouse et al. (2000)       48   BigDye Std       46.5   98      640    150
Koutny et al. (2000)          32   M13 clones(a)    40.0   98      800    78
Paegel et al. (2002)          96   M13mp18          15.9   99      430    24
(a) Mixture of M13mp18 vector or twelve M13 clones from human chromosome 17 project.
the length of the separation channel. Early studies, however, reported lower separation efficiency in channel turns due to band broadening (Jacobson et al. 1994) and differential field strength effects (Culbertson et al. 1998). Paegel et al. (2000) introduced a “pinched-turn” design (Fig. 1B) with an effective separation length of 15.9 cm on a 15-cm-diameter silica disc, which has been multiplexed into a 96-channel radial device (Fig. 1C) showing tremendous potential for increasing throughput in DNA sequencing applications (Paegel et al. 2002). Most of the data shown in Table 2, however, were derived using the standard M13mp18 vector as the sequencing template, and similar performance is not typically observed under the same conditions with “real-world” samples such as those from genome center production lines.

Fluorescence detection

The most widely used detection method for four-color DNA sequencing was initially described almost 20 years ago (Smith et al. 1986; Prober et al. 1987). This method is based on resolution of the emission signal from a dye-labeled nucleotide into color, with subsequent assignment in the DNA sequence. While successful for the sequencing of numerous higher and lower eukaryotic and prokaryotic genomes, these four-color systems have several disadvantages, including inefficient excitation of the fluorescent dyes, significant spectral overlap, and inefficient collection of the emission signals. The issue of inefficient excitation has been partially addressed by the use of fluorescence resonance energy-transfer (FRET) dyes (Ju et al. 1995; Metzker et al. 1996; Lee et al. 1997). At present, FRET dye-labeled ddNTP terminators are widely used throughout the sequencing community. The resulting improvements in acceptor dye signal intensities, however, are suboptimal compared with those of single dyes excited at their absorption maxima by the appropriate laser source.

To overcome these deficiencies, some investigators have proposed strategies using additional properties such as fluorescence life-time (Nunnally et al. 1997; Lieberwirth et al. 1998; Lassiter et al. 2000; Zhu et al. 2003, 2004) and radio frequency (RF) modulation (Alaverdian et al. 2002). For DNA sequencing applications, fluorescence life-time measurements have been described using pulsed lasers with high repetition rates (picosecond time-scale) with detection in the photon-counting mode. Soper and colleagues have recently demonstrated a combined approach of emission wavelength and fluorescence life-time measurements, with the potential to increase the number of fluorescent components in DNA sequencing assays (Zhu et al. 2003, 2004). Alaverdian et al. (2002) proposed using four continuous wave (CW) mode lasers, which are modulated at different RFs. To estimate the fluorescence signal for each dye, however, the resulting emission intensity pattern must be demodulated, which introduces a significant computational load for each capillary signal channel. Coupled with repetition rates on the order of ≥100 Hz, the RF method does not appear to be compatible with conventional CCD technology, limiting its scalability for detection of high-density capillary arrays.

Recently, Lewis et al. (2005) described a simple but effective method for multifluorescence discrimination called pulsed multiline excitation (PME). The underlying principle of this four-laser system is the correlation of sequential laser pulses with detector response (Fig. 2A). Advantages of PME are such that (1) absorption maxima for the four fluorescent dyes are matched to the excitation sources yielding maximum signal intensities, (2) temporal separation of the laser pulses and expansion of the dye set across the visible spectrum eliminate cross-talk between the dyes, and (3) collection of emission signals is improved by eliminating the requirement for dispersing elements (prisms or gratings) in color separation. In other words, PME measures multicomponent fluorescence assays in a color-blind manner. To demonstrate these advantages, Lewis et al. (2005) applied the PME technology to capillary electrophoresis for DNA sequencing. Figure 2B shows the unprocessed signals from the four PME laser waveforms for a portion of the PCR amplicon for the TCF1 (formerly known as HNF1A) exon 10. Transformation of the data into unambiguous sequence data (Fig. 2C) is accomplished by applying only dye mobility correction software, eliminating the need for cross-talk and signal normalization software transformation. The PME technology holds promise for real-time field applications for DNA sequencing.

SNA methodologies

Pyrosequencing

Arguably the most successful non-Sanger method developed to date is pyrosequencing, first described in the literature by Hyman (1988). Pyrosequencing is a nonfluorescence technique that measures the release of inorganic pyrophosphate, which is proportionally converted into visible light by a series of enzymatic reactions (Ronaghi et al. 1996, 1998). Unlike other sequencing approaches that use 3′-modified dNTPs to terminate DNA
three steps: incorporation, imaging, and deprotection, as illustrated in Figure 4A. The advantages of CRT over Sanger are (1) elimination of gel electrophoresis and (2) formatting of the CRT assay in a highly parallel fashion. Its advantages over pyrosequencing are that (1) all four bases are present during the incorporation phase, (2) step-wise control allows for single-base additions through homopolymer repeats, and (3) synchronistic extensions are maintained past heterozygous bases. An additional advantage is that unlike the pyrosequencing assay, which must be contained within a defined reaction well, the CRT assay can be performed on a number of highly parallel platforms, such as high-density oligonucleotide arrays (Pease et al. 1994; Albert et al. 2003), PTP arrays (Leamon et al. 2003), polony arrays (Mitra and Church 1999), or random dispersion of single molecules. Albert et al. (2003) have demonstrated the 5′→3′ synthesis of oligonucleotides on a high-density array and the application of incorporation of dye-labeled ddNTPs by DNA polymerase. These advantages of the CRT technology could represent significant improvements in speed, throughput, and accuracy over Sanger and SNA approaches.

At the center of the CRT chemistry is the reversible terminator. Ideally, these terminators should exhibit fast and efficient deprotection kinetics, efficient incorporation kinetics by DNA polymerase, and labels with desired characteristics, such as fluorophores with good fluorescence properties. Of the challenges associated with CRT for high-throughput genome sequencing, creating these reversible terminators with the desired properties and identifying DNA polymerases that recognize these substrates with high affinities are the most demanding aspects of the technology. The latter point is exemplified by the presence of competing natural nucleotides, which can readily cause asynchronistic base extensions (Metzker et al. 1998). The first examples of reversible terminators using commercially available DNA polymerases were reported by Canard and Sarfati (1994) and Metzker et al. (1994).

For CRT terminators to function properly, the protecting group must be efficiently cleaved under mild conditions while coupled to the primer. Removal of the protecting group generally involves either treatment with strong acid or base, catalytic or chemical reduction, or a combination of these methods. Unfortunately, these conditions may chemically perturb the DNA polymerase, nucleotides, oligonucleotide-primed template, or the solid support. Use of photocleavable protecting groups is an attractive alternative to rigorous chemical treatment and can be employed in a noninvasive manner. Of the various photocleavable protecting groups (Pillai 1980), the light-sensitive 2-nitrobenzyl group has been widely used. For example, it has been applied to natural nucleotides (Metzker et al. 1994, 1998), to the linker structure coupling a fluorescent dye to nucleobases (Li et al. 2003; Mitra et al. 2003), and to other nucleic acid structures as well (Ohtsuka et al. 1974; Pease et al. 1994; Chaulk and MacMillan 1998; Singh-Gasson et al. 1999). Under appropriate deprotection conditions (e.g., ultraviolet light >300 nm), the 2-nitrobenzyl group can be efficiently cleaved (Fig. 4B) without affecting either the pyrimidine or purine bases (Bartholomew and Broom 1975; Pease et al. 1994).

Other protecting groups have been described for reversible terminators as well. For example, Metzker et al. (1994) first described the synthesis and incorporation of a 3′-O-allyl-dATP by DNA polymerase, with the O-allyl group being removed using the well-known palladium (Pd) catalyst chemistry (Hayakawa et al. 1986, 1993; Honda et al. 1997). Recently, Ruparel et al. (2005) reported the synthesis of the first fluorescently labeled 3′-O-allyl-dNTPs. These unique reversible terminators require dual deprotection steps using UV light to cleave the fluorophore from the nucleotide (Fig. 3B), and the Pd catalyst reaction to restore the natural 3′-OH substrate. At this year’s Advances in Genome Biology and Technology/Automation in Mapping and Sequencing meeting, Solexa reported on a similar CRT chemistry with a sequence read-length of approximately 20 bases (http://www.agbt.org) and recently reported the complete sequencing of the φX174 genome (http://www.solexa.com).

Earlier concerns regarding short read-lengths and assemblies for SNA strategies will prove relevant to CRT as well. To overcome this issue, research efforts in CRT technology development will continue to focus on the cycle efficiency. The CRT read-length is governed by the overall cycle efficiency, which is highly dependent on the product of deprotection and incorporation efficiencies. For example, if one considers the conservative loss of 50% signal as the assay’s end-point, the read-length is a function of the cycle efficiency (Ceff) (Fig. 4C). Here, a read-length of only seven bases will be achieved with an overall cycle efficiency of 90% and can be increased beyond 100 bases in length by improving cycle efficiency to >99%. Figure 4D illustrates the effect that chemical modifications of the 2-nitrobenzyl ring system have on deprotection efficiency and thymidine production (V.A. Litosh, W. Wu, B. Stupi, and M. Metzker, unpubl.). Thus, recent improvements in chemical engineering of reversible terminators are important developments for CRT as an emerging technology for DNA sequencing applications.
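The dependence of read-length on cycle efficiency can be sketched numerically. The Python example below assumes a simple geometric model in which the remaining in-phase signal after n cycles is Ceff^n, so the read-length at a 50% signal end-point is ln(0.5)/ln(Ceff). This model is an assumption made here for illustration; it reproduces the seven-base figure quoted for 90% efficiency, but it is not necessarily the exact calculation behind Figure 4C. All function names are hypothetical.

```python
import math


def cycle_efficiency(deprotection_eff, incorporation_eff):
    """Overall cycle efficiency as the product of the two step efficiencies."""
    return deprotection_eff * incorporation_eff


def read_length(c_eff, signal_floor=0.5):
    """Number of cycles until the signal falls to signal_floor.

    Assumes signal after n cycles equals c_eff ** n (geometric decay),
    so n = ln(signal_floor) / ln(c_eff).
    """
    if not 0.0 < c_eff < 1.0:
        raise ValueError("cycle efficiency must be in (0, 1)")
    if not 0.0 < signal_floor < 1.0:
        raise ValueError("signal floor must be in (0, 1)")
    return math.log(signal_floor) / math.log(c_eff)


def required_efficiency(target_read_length, signal_floor=0.5):
    """Cycle efficiency needed to reach a target read-length."""
    if target_read_length <= 0:
        raise ValueError("target read-length must be positive")
    return signal_floor ** (1.0 / target_read_length)


if __name__ == "__main__":
    # Read-length for several assumed cycle efficiencies.
    for c in (0.90, 0.95, 0.99, 0.995):
        print(f"Ceff = {c:.3f}: about {read_length(c):.1f} bases")
    print(f"Ceff needed for 100 bases: {required_efficiency(100):.4f}")
```

Under this model, 90% efficiency gives about 6.6 cycles (roughly seven bases), and a 100-base read needs a cycle efficiency of roughly 99.3%, consistent with the >99% requirement stated above. Because the overall efficiency is a product, deprotection and incorporation must each exceed that value.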
Figure 4. CRT technologies. (A) The CRT cycle. (B) The photocleavage reaction of a 3′-O-2-nitrobenzyl-nucleoside. (C) Effect of cycle efficiency on CRT read-length. (D) Kinetic study of photocleavage reaction for single substituted (2-SSNB) and double substituted (2-dsNB) 2-nitrobenzyl thymidine analogs. Percentage thymidine (%Thy) was calculated according to the equation: %Thy = AThy/(AThy + As2NB), where AThy and As2NB are the integrated peak areas from RP-HPLC analysis for thymidine and substituted 2-nitrobenzyl thymidine analogs, respectively.

Conclusions

Recent developments in DNA polymerase-dependent strategies highlight the central role these methods play in determination of the overall success of the sequencing assay. Although the standards for current Sanger technology have set the mark for emerging SNA and CRT technologies, these measures have evolved over several decades and from numerous research laboratories. The integration of additional technologies will be key for development of robust DNA sequencing platforms, including instrumentation, microfluidics, robotics, automation, software control, data acquisition, and informatics.

Beyond the integrated instrumentation built around the chemistry, the method by which genomes are sequenced will be important. Most strategies described in this review will employ the random approach of whole-genome shotgun sequencing and assembly (Weber and Myers 1997), including resequencing efforts for human sequence variation studies. While the random approach has the advantage of simplicity, it will require a tremendous number of sequence reads (i.e., a minimum of 900 million, 100-base reads will be needed to achieve a 30× assembly for a mammalian-size genome) to produce comprehensive sequence data for comparative studies between genomes. A directed approach, which targets specific regions across the genome, can effectively reduce genome size and complexity and, therefore, the number of sequencing reads needed to produce these comprehensive data sets. One example of a directed strategy for human resequencing could be the application of the CRT method to 5′→3′ synthesized high-density oligonucleotide arrays (Albert et al. 2003) by relying on the reference sequence as anchor points along the genome. The careful selection of unique and functional priming sites would represent an oligonucleotide tiling path across the genome. Priming CRT reactions from these anchor points and sequencing to adjacent priming sites would provide contiguous coverage of the targeted regions of interest. CRT reads could then be aligned to the known positions along the reference genome in a straightforward manner. This approach could also be used for mapping sequence reads to related genomes for comparative genomics studies. Alignment of random reads could be performed using conventional assembly algorithms, guided by the reference sequence, to produce contiguous DNA sequence information.

Although in its infancy, the potential for these emerging sequencing strategies to deliver next-generation technologies looks promising. Improvements in speed, efficiency, throughput, and sensitivity will all contribute to a reduction in cost over the next several years. The timing of these strategies coincides with an increasing demand for resequencing capacity, which will provide valuable insight into the role of specific sequence variation with common disease. Integration of multidisciplinary technologies will translate into practical and affordable sequencing devices capable of whole-genome analyses. Application of genome sequence information to health benefits could revolutionize disease prevention measures and early disease interventions, and make personalized therapies routine.

Acknowledgments

I am extremely grateful to Richard A. Gibbs, Donna M. Muzny, and Sherry Metzker for critical review of the manuscript; Steven A. Soper for technical discussion; and NHGRI for their support from grants R01 HG003573, R41 HG003072, R41 HG003265, and R21 HG002443.

References

Alaverdian, L., Alaverdian, S., Bilenko, O., Bogdanov, I., Filippova, E., Gavrilov, D., Gorbovitski, B., Gouzman, M., Gudkov, G., Domratchev, S., et al. 2002. A family of novel DNA sequencing instruments based on single-photon detection. Electrophoresis 23: 2804–2817.
Albert, T.J., Norton, J., Ott, M., Richmond, T., Nuwaysir, K., Nuwaysir, E.F., Stengele, K.-P., and Green, R.D. 2003. Light-directed 5′→3′ synthesis of complex oligonucleotide microarrays. Nucleic Acids Res. 31: e35.
Backhouse, C., Caamano, M., Oaks, F., Nordman, E., Carrillo, A., Johnson, B., and Bay, S. 2000. DNA sequencing in a monolithic microchannel device. Electrophoresis 21: 150–156.
Bartholomew, D.G. and Broom, A.D. 1975. One-step chemical synthesis of ribonucleosides bearing a photolabile ether protecting group. J. Chem. Soc. Chem. Commun. Issue 2: 38.
Becker, H. and Gartner, C. 2000. Polymer microfabrication methods for microfluidic analytical applications. Electrophoresis 21: 12–26.
Bonham, V.L., Warshauer-Baker, E., and Collins, F.S. 2005. Race and ethnicity in the genome era: The complexity of the constructs. Am. Psychol. 60: 9–15.
Boone, T., Fan, Z., Hooper, H., Ricco, A., Tan, H., and Williams, S. 2002. Plastic advances microfluidic devices. Anal. Chem. 74: 78A–86A.
Braslavsky, I., Hebert, B., Kartalov, E., and Quake, S.R. 2003. Sequence information can be obtained from single DNA molecules. Proc. Natl. Acad. Sci. 100: 3960–3964.
Canard, B. and Sarfati, R. 1994. DNA polymerase fluorescent substrates with reversible 3′-tags. Gene 148: 1–6.
Carrilho, E. 2000. DNA sequencing by capillary array electrophoresis and microfabricated array systems. Electrophoresis 21: 55–65.
Carrilho, E., Ruiz-Martinez, M.C., Berka, J., Smirnov, I., Goetzinger, W., Miller, A.W., Brady, D., and Karger, B.L. 1996. Rapid DNA sequencing of more than 1000 bases per run by capillary electrophoresis using replaceable linear polyacrylamide solutions. Anal. Chem. 68: 3305–3313.
Chaisson, M., Pevzner, P., and Tang, H. 2004. Fragment assembly with short reads. Bioinformatics 20: 2067–2074.
Chaulk, S. and MacMillan, A. 1998. Caged RNA: Photo-control of a ribozyme reaction. Nucleic Acids Res. 26: 3173–3178.
Collins, F. and Galas, D. 1993. A new five-year plan for the U.S. Human Genome Project. Science 262: 43–46.
Collins, F.S., Guyer, M.S., and Chakravarti, A. 1997. Variations on a theme: Cataloging human DNA sequence variation. Science 278: 1580–1581.
Collins, F.S., Patrinos, A., Jordon, E., Chakravarti, A., Gesteland, R., Walters, L., and Members of the DOE and NIH Planning Groups. 1998. New goals for the U.S. Human Genome Project: 1998–2003. Science 282: 682–689.
Collins, F.S., Green, E.D., Guttmacher, A.E., and Guyer, M.S. 2003. A vision for the future of genomics research. Nature 422: 835–847.
Crawford, D.C. and Nickerson, D.A. 2005. Definition and clinical importance of haplotypes. Annu. Rev. Med. 56: 303–320.
Culbertson, C.T., Jacobson, S.C., and Ramsey, J.M. 1998. Dispersion sources for compact geometries on microchips. Anal. Chem. 70: 3781–3789.
Daly, M.J., Rioux, J.D., Schaffner, S.F., Hudson, T.J., and Lander, E.S. 2001. High-resolution haplotype structure in the human genome. Nat. Genet. 29: 229–237.
The ENCODE Project Consortium. 2004. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306: 636–640.
Entz, P., Toliat, M.R., Hampe, J., Valentonyte, R., Jenisch, S., Nürnberg, P., and Nagy, M. 2005. New strategies for efficient typing of HLA class-II loci DQB1 and DRB1 by using pyrosequencing. Tissue Antigens 65: 67–80.
Ewing, B. and Green, P. 1998. Base-calling of automated sequencer traces using Phred. II. Error probabilities. Genome Res. 8: 186–194.
Ewing, B., Hillier, L., Wendl, M.C., and Green, P. 1998. Base-calling of automated sequencer traces using Phred. I. Accuracy assessment. Genome Res. 8: 175–185.
Foster, M.W. and Sharp, R.R. 2002. Race, ethnicity, and genomics: Social classifications as proxies of biological heterogeneity. Genome Res. 12: 844–850.
Fox, M.S., Magasanik, B., Signer, E.R., Solomon, F., Gellert, M.F., Haber, J.E., Daniel, J., Koshland, E., and Muschel, L.H. 1990. The Genome Project: Pro and con. Science 247: 270.
Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., et al. 2002. The structure of haplotype blocks in the human genome. Science 296: 2225–2229.
Gharizadeh, B., Nordström, T., Ahmadian, A., Ronaghi, M., and Nyrén, P. 2002. Long-read pyrosequencing using pure 2′-deoxyadenosine-5′-O′-(1-thiotriphosphate) Sp-isomer. Anal. Biochem. 301: 82–90.
Guttman, A. 2002a. Capillary electrophoresis using replaceable gels. U.S.
Cocuzza, A., Jensen, M., and Baumeister, K. 1987. A system for rapid
DNA sequencing with fluorescent chain-terminating
dideoxynucleotides. Science 238: 336–341.
Quake, S. and Scherer, A. 2000. From micro- to nanofabrication with
soft materials. Science 290: 1536–1540.
Risch, N. and Merikangas, K. 1996. The future of genetic studies of
complex human diseases. Science 273: 1516–1517.
Roberts, L. 1989a. New game plan for genome mapping. Science
245: 1438–1440.
———. 1989b. Watson versus Japan. Science 246: 576–578.
Robertson, J.A. 2003. The $1000 genome: Ethical and legal issues in
whole genome sequencing of individuals. Am. J. Bioeth. 3: W-IF1.
Ronaghi, M. 2000. Improved performance of pyrosequencing using
single-stranded DNA-binding protein. Anal. Biochem. 286: 282–288.
———. 2001. Pyrosequencing sheds light on DNA sequencing. Genome
Res. 11: 3–11.
Ronaghi, M., Karamohamed, S., Pettersson, B., Uhlén, M., and Nyrén, P.
1996. Real-time DNA sequencing using detection of pyrophosphate
release. Anal. Biochem. 242: 84–89.
Ronaghi, M., Uhlén, M., and Nyrén, P. 1998. A sequencing method
based on real-time pyrophosphate. Science 281: 363, 365.
Ruiz-Martinez, M.C., Berka, J., Belenkii, A., Foret, F., Miller, A.W., and
Karger, B.L. 1993. DNA sequencing by capillary electrophoresis with
replaceable linear polyacrylamide and laser-induced fluorescence
detection. Anal. Chem. 65: 2851–2858.
Ruparel, H., Bi, L., Li, Z., Bai, X., Kim, D.H., Turro, N.J., and Ju, J. 2005.
Design and synthesis of a 3′-O-allyl photocleavable fluorescent
nucleotide as a reversible terminator for DNA sequencing by
synthesis. Proc. Natl. Acad. Sci. 102: 5932–5937.
Salas-Solano, O., Carrilho, E., Kotler, L., Miller, A.W., Goetzinger, W.,
Sosic, Z., and Karger, B.L. 1998. Routine DNA sequencing of 1000
bases in less than one hour by capillary electrophoresis with
replaceable linear polyacrylamide solutions. Anal. Chem.
70: 3996–4003.
Salas-Solano, O., Schmalzing, D., Koutny, L., Buonocore, S., Adourian,
A., Matsudaira, P., and Ehrlich, D. 2000. Optimization of
high-performance DNA sequencing on short microfabricated
electrophoretic devices. Anal. Chem. 72: 3129–3137.
Sanger, F., Nicklen, S., and Coulson, A.R. 1977. DNA sequencing with
chain-terminating inhibitors. Proc. Natl. Acad. Sci. 74: 5463–5467.
Schmalzing, D., Tsao, N., Koutny, L., Chisholm, D., Srivastava, A.,
Adourian, A., Linton, L., McEwan, P., Matsudaira, P., and Ehrlich, D.
1999. Toward real-world sequencing by microdevice electrophoresis.
Genome Res. 9: 853–858.
Seo, T.S., Bai, X., Ruparel, H., Li, Z., Turro, N.J., and Ju, J. 2004.
Photocleavable fluorescent nucleotides for DNA sequencing on a
chip constructed by site-specific coupling chemistry. Proc. Natl. Acad.
Sci. 101: 5488–5493.
Seo, T.S., Bai, X., Kim, D.H., Meng, Q., Shi, S., Ruparel, H., Li, Z., Turro,
N.J., and Ju, J. 2005. Four-color DNA sequencing by synthesis on a
chip using photocleavable fluorescent nucleotides. Proc. Natl. Acad.
Sci. 102: 5926–5931.
Shendure, J., Mitra, R.D., Varma, C., and Church, G.M. 2004. Advanced
sequencing technologies: Methods and goals. Nat. Rev. Genet.
5: 335–344.
Shi, Y. and Anderson, R.C. 2003. High-resolution single-stranded DNA
analysis on 4.5 cm plastic electrophoretic microchannels.
Electrophoresis 24: 3371–3377.
Simpson, J.W., Ruiz-Martinez, M.C., Mulhern, G.T., Berka, J., Latimer,
D.R., Ball, J.A., Rothberg, J.M., and Went, G.T. 2000. Transmission
imaging spectrograph and microfabricated channel system for DNA
analysis. Electrophoresis 21: 135–149.
Singh-Gasson, S., Green, R.D., Yue, Y., Nelson, C., Blattner, F., Sussman,
M.R., and Cerrina, F. 1999. Maskless fabrication of light-directed
oligonucleotide microarrays using a digital micromirror array. Nat.
Biotechnol. 17: 974–978.
Smith, L., Sanders, J., Kaiser, R., Hughes, P., Dodd, C., Connell, C.,
Heiner, C., Kent, S., and Hood, L. 1986. Fluorescence detection in
automated DNA sequence analysis. Nature 321: 674–679.
Smith, L.M., Kaiser, R.J., Sanders, J.Z., and Hood, L.E. 1987. The
synthesis and use of fluorescent oligonucleotides in DNA sequence
analysis. Methods Enzymol. 155: 260–301.
Tabor, S. and Richardson, C.C. 1989. Effect of manganese ions on the
incorporation of dideoxynucleotides by bacteriophage T7 DNA
polymerase and Escherichia coli DNA polymerase I. Proc. Natl. Acad.
Sci. 86: 4076–4080.
———. 1995. A single residue in DNA polymerases of the Escherichia coli
DNA polymerase I family is critical for distinguishing between
deoxy- and dideoxyribonucleotides. Proc. Natl. Acad. Sci.
92: 6339–6343.
Takahashi, S., Murakami, K., Anazawa, T., and Kambara, H. 1994.
Multiple sheath-flow gel capillary-array electrophoresis for
multicolor fluorescent DNA detection. Anal. Chem. 66: 1021–1026.
Velculescu, V.E., Zhang, L., Vogelstein, B., and Kinzler, K.W. 1995. Serial
analysis of gene expression. Science 270: 484–487.
Weber, J.L. and Myers, E.W. 1997. Human whole-genome shotgun
sequencing. Genome Res. 7: 401–409.
Woolley, A.T. and Mathies, R.A. 1995. Ultra-high-speed DNA sequencing
using capillary electrophoresis chips. Anal. Chem. 67: 3676–3680.
Zhang, C.-X. and Manz, A. 2001. Narrow sample channel injectors for
capillary electrophoresis on microchips. Anal. Chem. 73: 2656–2662.
Zhu, L., Stryjewski, W., Lassiter, S., and Soper, S.A. 2003. Fluorescence
multiplexing with time-resolved and spectral discrimination using a
near-IR detector. Anal. Chem. 75: 2280–2291.
Zhu, L., Stryjewski, W.J., and Soper, S.A. 2004. Multiplexed fluorescence
detection in microfabricated devices with both time-resolved and
spectral-discrimination capabilities using near-infrared fluorescence.
Anal. Biochem. 330: 206–218.
Web site references
http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-002.html;
RFA-HG-04-002. 2004. $100,000 genome RFA.
http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-003.html;
RFA-HG-04-003. 2004. $1000 genome RFA.
http://www.agbt.org; Home page for the Advances in Genome Biology
and Technology meeting.
http://www.solexa.com; Home page for Solexa, Inc.