The New Central Dogma of Molecular Biology: March 2020
3 authors, including:
Change Tan
University of Missouri
All content following this page was uploaded by Change Tan on 20 March 2020.
Abstract
The central dogma of molecular biology formulated by Francis Crick has greatly influenced our
scientific research and perspective of life. However, it fails to adequately account for the
following discoveries: 1) there are different kinds and different levels of biological information, 2)
no information flow is possible without the cooperative function of DNA, RNA, and proteins, 3)
the coding system and the decoding system have to match with each other, and 4) proteins,
with the help of RNAs, control whether and how DNA is replicated and also control the stability,
accessibility, and usability of DNA. Thus, we propose updating the central dogma to the
following: The central principle of molecular biology is biological information coding and
decoding. Specifically, 1) sequence information can be transferred from DNA to DNA, DNA to
RNA, RNA to DNA, RNA to RNA, and RNA to proteins; 2) no information transfer can occur
without the interdependent, integrated, function of matching DNA, RNA, and proteins; 3) there is
no reverse translation, but proteins, with the help of RNAs, determine the maintenance,
propagation, and coding potential of DNA; and 4) information transfer is an active response of a
The central dogma of molecular biology that Francis Crick articulated more than sixty years ago
has had a profound impact not only on the study of molecular biology but also on our daily
thinking about life and our approaches to causes and treatments of diseases. The central
dogma underlies the common belief that identifying and manipulating certain genes would
enable us to solve the twin problems of world hunger (e.g., via generating genetically modified
organisms) and dreadful disease (e.g., via personalized medicine). A clear understanding of the
true characteristics of molecular biology is both critical and urgent, because the consequences
of misunderstanding are severe and costly. In this essay, we will briefly review the history of,
and describe the problems with, the current central dogma and will provide a revision that more
In a March 19, 1953 letter, Francis Crick told his 12-year-old son Michael about the discovery he
Jim Watson and I have probably made a most important discovery. We have built a
model for the structure of de-oxy-ribose-nucleic-acid (read it carefully) called D.N.A. for
short. Now the exciting thing is that while there are 4 different bases, we find we can
only put certain pairs of them together… only A with T and G with C…
Now on one chain, as far as we can see, one can have the bases in any order, but if
their order is fixed, then the order on the other chain is also fixed… It is like a code. If
you are given one set of letters you can write down the others.
Now we believe that the D.N.A. is a code. That is, the order of the bases (the letters)
makes one gene different from another gene (just as one page of print is different from
another). You can now see how Nature makes copies of the genes. Because if the two
chains unwind into two separate chains, and if each chain then makes another chain
come together on it, then because A always goes with T, and G with C, we shall get two
The discovery was published one month later in Nature [2]. Near the end of their famous one-
page-long article, Watson and Crick observed: “It has not escaped our notice that the specific
pairing we have postulated immediately suggests a possible copying mechanism for the genetic
material.”
Four years later, at a symposium held at University College London, Crick described principles
relating to the transfer of genetic information [3], referred to in his notes and in later writings as
This states that once 'information' has passed into protein it cannot get out again. In
more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic
acid to protein may be possible, but transfer from protein to protein, or from protein to
sequence, either of bases in the nucleic acid or of amino acid residues in the protein.
In 1970, in response to a challenge against the central dogma based on the discovery of
reverse transcriptases (or RNA-dependent DNA polymerases), Crick explained, reaffirmed, and
The central dogma of molecular biology deals with the detailed residue-by-residue
James Watson popularized the central dogma via his widely-used Molecular Biology of the
Gene textbook, now in its eighth edition, with a simple figure (Figure 1) in which
“the arrows indicate the directions proposed for the transfer of genetic information. The
arrow encircling DNA signifies that DNA is the template for its self-replication. The arrow
between DNA and RNA indicates that RNA synthesis (called transcription) is directed by
directed by an RNA template. Most importantly, the last two arrows were presented as
unidirectional; that is, RNA sequences are never determined by protein templates nor
was DNA then imagined ever to be made on RNA templates” ([6], p32 and [7], p33,
Consequently, in most people’s minds, the central dogma describes the unidirectional transfer
It is impossible to measure the exact impact of the discovery of the double helical structure of
DNA and Crick’s formulation of the central dogma and his sequence hypothesis. Crick equated
their discovery of the double helix with the discovery of the “secret of life” [8]. With collectors
recognizing the significance of this remarkable discovery, Crick’s March 19, 1953 letter to his
son Michael was sold for six million dollars in 2013 [1]. Matthew Cobb referred to Crick’s
subsequent 1957 symposium lecture as “one of the most significant lectures in the history of
biology” and as “a lecture that changed how we think” [3]. Horace Judson remarked that Crick’s
lecture “permanently altered the logic of biology” [9]. Eugene Koonin called the central dogma
the only exception to the “‘ubiquitous exception’ rule of biology” in which “the only actual rule is
that there are no rules, i.e. exceptions can be found to every ‘fundamental’ principle if one looks
hard enough” ([10], p1). In his molecular biology textbook, Burton Tropp declared that the
central dogma provides the theoretical framework for molecular biology ([11], p22). The double
helix has become the icon of biology, even of science itself. It is currently widely accepted that
given the nucleotide sequence of one strand of DNA, we can write down that of the other, and,
with the genetic codon table at hand, we can spell out the amino acid sequence of the encoded
protein.
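As a minimal illustration of the sequence-level reading described above, the following Python sketch writes down the partner strand of a short DNA sequence and spells out the peptide it encodes. The example sequence and the function names are hypothetical. The sketch assumes that the reading frame is already known and that the standard genetic code applies, which are exactly the conditions examined later in this essay.

```python
# Standard genetic code, written for the DNA coding strand (T instead of U).
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_strand(dna):
    """Return the partner strand, written 5' to 3' (the reverse complement)."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(coding_strand):
    """Translate a coding strand from its first base until a stop codon."""
    peptide = []
    for i in range(0, len(coding_strand) - 2, 3):
        amino_acid = CODON_TABLE[coding_strand[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    strand = "ATGGGATAA"  # hypothetical example: start codon, glycine codon, stop
    print(complementary_strand(strand))
    print(translate(strand))
```

Note that the sketch is handed the reading frame and the codon table as givens; it says nothing about how a cell locates the frame or supplies the decoding machinery.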
Despite the unquestionable influence of the central dogma within both popular understanding
and scientific research, we will argue below that information encoded in DNA is only part of
the information inside a cell; that no information can flow without the integrated function of
matching DNA, RNA, and proteins; and that whether a DNA sequence encodes anything,
and if so, what it encodes, depends on what exists inside and outside of the cell. In short,
information coding and decoding are interdependent and are organism and cell type specific.
Some challenges have been raised against the central dogma, first after the discovery of
reverse transcriptases as mentioned above and then after the discovery of prions, proteins with
conformations that can be transmitted onto other proteins that have the same amino acid
sequence as the prions but are folded differently [10, 12-15]. However, these objections mostly
information and what Crick thought could not take place within biological systems. Crick was
very clear about both issues: by “information” he meant “the precise determination of sequence,
either of bases in the nucleic acid or of amino acid residues in the protein” [4]. We can refer to
Crick’s “information” as “sequence information”, which refers to the nucleotide sequence in DNA
or RNA and the amino acid sequence in proteins. Therefore, what Crick proposed was
prohibited is the transfer of sequence information from proteins to proteins or from proteins to
nucleic acids.
While Crick’s central dogma remained the prevailing wisdom for decades, more recent
discoveries have challenged his understanding of biological information and what was possible
in biological systems. Specifically, a major assumption underlying Crick’s central dogma, and
where the central dogma has led us astray, is the misconception that the sequence information
of DNA contains all the inheritable substance that determines the phenotype of an organism.
This “gene-centric” or “genetic determinism” view of life has come under increasing criticism
with additional discoveries [16-21]. As mentioned above, for example, questions about
inheritance arose when prions were discovered. After all, there did not seem to be a direct
correlation between the sequence information in DNA and the newly propagated prions. Even
more striking, a prion’s ability to impact the shape of another protein began to create doubts
about Crick’s proposal that information could not be transmitted from protein to protein (although
in the case of prions it should be noted that it is not the precise sequence of the amino acids
which is altered, but the three-dimensional structure of the protein). In addition to prions, other
inheritable epigenetic factors have been discovered, which have further complicated the simple
relationship between genotype and phenotype hypothesized by the central dogma [17, 18, 22].
Five decades of biological research have revealed that there are different kinds of information
inside cells. Koonin distinguished two kinds: (i) digital information, the one-dimensional
sequence information contained in nucleic acids, and (ii) analog information, the three-
dimensional structure of proteins [23]. Koonin’s concept can, and should, be extended since
both nucleic acids and proteins have digital (sequence) information and analog (three-
dimensional structure) information. The analog information can further be extended to include
not only the three-dimensional structure of a given protein, but also protein localization, post-
Not only are there different kinds of information, but also there are different levels of information
embedded within DNA and RNA. Protein-coding is only one of these levels. One of the more
remarkable discoveries in recent years has been the discovery that so-called “silent mutations”
can have an impact on cellular function. Previously, under the prevailing view enshrined in the
central dogma that the only role of DNA was to code for a particular protein, it was long
assumed that any codon for a particular amino acid was equivalent to any other codon for the
same amino acid. Thus, for example, because GGA and GGG both code for glycine, it has been
assumed that these codons are equivalent and that a mutation from, say, “A” in the third
position to “G” in the third position would have no possible consequence in the organism—it
would be a “silent” mutation, invisible to the internal workings of the organism, to its outward
appearance, and to the hand of natural selection. This inherent “redundancy” of the genetic
code has been a staple of biology education for decades, with much ink spilt debating the
reasons for such redundancy and much speculation about its potential role in neutral evolution.
While it is far too early to suggest that each codon within the genetic code performs a unique
role (and variations may exist between different organisms), it came as something of a shock to
the received wisdom that some of the “silent mutations” were not, in fact, so silent after all.
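The notion of a “silent” mutation can be made concrete with a short Python sketch. It checks that a hypothetical GGA-to-GGG change leaves the encoded peptide unchanged while the nucleotide sequence itself differs, which is the level at which any non-coding effect would have to arise. The sequences and function names are illustrative assumptions, and the standard genetic code is assumed.

```python
# Standard genetic code for the DNA coding strand; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(coding_strand):
    """Translate a coding strand from its first base until a stop codon."""
    peptide = []
    for i in range(0, len(coding_strand) - 2, 3):
        amino_acid = CODON_TABLE[coding_strand[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def synonymous_codons(amino_acid):
    """List every codon that the standard code assigns to one amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def is_synonymous(original, mutant):
    """True if the DNA differs but the encoded peptide is identical."""
    return original != mutant and translate(original) == translate(mutant)


if __name__ == "__main__":
    original = "ATGGGATAA"  # hypothetical: Met, Gly (GGA), stop
    mutant = "ATGGGGTAA"    # same, with the glycine codon changed to GGG
    print(synonymous_codons("G"))
    print(is_synonymous(original, mutant))
```

A check of this kind sees only the protein-coding level. It cannot detect whether the changed nucleotide also matters at some other level of information carried by the same stretch of DNA or RNA.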
Recent research has revealed multiple instances in which recoding an amino acid with codons
that code for the same amino acid but which have different nucleotide sequences can disrupt
the function of a segment of DNA or RNA at a level other than the mere designation of the
particular translated amino acids. Such changes, previously thought to be completely neutral,
can in fact be lethal, as several of the laboratories involved in yeast chromosome engineering
independent but distance-dependent. For instance, DNA between the upstream and the core
promoter elements of the human ribosomal RNA gene appears to tolerate nucleobase
substitutions but not significant alterations of its length. Researchers have observed that
removing 44 bp (base pairs) between the two promoter elements reduces promoter
strength by 90% compared to the wild type, and adding 49 bp reduces promoter strength
If we limit the discussion of information in living systems to just sequence (digital) information as
Crick did when articulating the central dogma, then neither the reverse transcriptases nor the
prions present real challenges to Crick’s central dogma [5, 23]. We could still argue, for
example, that when DNA is being replicated accurately, “we shall get two copies where we had
one before,” as Crick said. And when a gene is translated, once we know the protein-coding
region (i.e., the open reading frame) of its mature RNA, we can spell out the amino acid
sequence of its encoded protein.
Such a limited view of information might allow us to maintain the perception that the central
dogma is still true, but it would be true only in an increasingly-limited way and with an
onto the proud tradition of the central dogma, but only at the expense of accuracy and
relevancy.
More importantly, however, the devil is in the details, as is often the case. An important
detail is in those conditional words: “when DNA is being replicated accurately” and “when a
gene is translated, once we know the protein-coding region (i.e., the open reading frame) of its
mature RNA.” These are what the central dogma, in Crick’s words, “says nothing about” ([5],
p562):
It says nothing about what the machinery of transfer is made of, and in particular nothing
about errors.
It says nothing about control mechanisms—that is, about the rate at which the processes
work.
What the central dogma “says nothing about” matters. Countless experiments have shown that
no information can be transferred without the coordinated interaction of DNA, RNA and proteins,
as discussed below. More importantly, the meaning and the usefulness of a code depend on the
decoding systems. The situation is very similar to human languages. The four-letter word “gift”
means a present in English, while it means a poison in German. “Your room is on the first floor”
points to a very different location in England (one level above the ground level) than in the United
States (the ground level), even though both countries speak English.
The double helix is a double-edged sword; it makes DNA self-replication impossible. In part, this
is because DNA molecules are very long and the two strands are tightly wrapped around each
other. For example, the E. coli genome is 4.6 million base pairs long, and separating the
two strands, even over a few base pairs, requires a protein enzyme (a helicase) and ATP. Furthermore,
when helicase unwinds the two strands, the DNA ahead of the opening will become over-wound
and needs to be untangled by another protein enzyme, a topoisomerase. Left unabated, the
torsion resulting from the overwinding would quickly stop the ability of DNA or RNA polymerases
to continue down the DNA strand during DNA replication or transcription, respectively. Indeed,
more than 25 different proteins are required to replicate the E. coli genomic DNA [30].
DNA “replicates accurately only in a complete cell containing all the objective functionality that
enable cells to be alive,” relying “on an army of specialized proteins and on the lipid
membranous structures for which there are no DNA sequences. Outside a living cell, DNA is
inert, dead,” observed Denis Noble. Noble went on to note: “DNA is a passive cause. As
Watson said to Crick when they first made their momentous discovery of the double helix:
‘Francis, it’s a template’ … Active causation lies at the level of the cell, or of multicellular
DNA replication is tightly regulated. The cell’s ability to sense its internal and external
conditions, the decision about whether, when, how much (part of the genome or the whole
genome, number of copies to be made) and how accurately to replicate DNA (for example,
whether high fidelity polymerases or error-prone polymerases will be used during replication and
the extent to which replication errors will be corrected by the DNA repair system [31]), as well as
the very execution of DNA replication, all depend on the integrated functions of numerous RNAs
It is worth pointing out that the genome of a cell is much more dynamic than expected—certainly
far more dynamic than the central dogma had led us to believe. We are learning that there are
multiple ways cells can manipulate their DNA contents. For example, Zhang and colleagues
deleted all 100 copies of endogenous yeast ribosomal DNA (rDNA), replaced them with a DNA
fragment containing 1.2 or 2 copies of the rDNA unit carrying a hygromycin B resistance
mutation, and cultured the resulting strains in medium containing increasing amounts of
hygromycin B [27]. After two weeks, a new rDNA cluster had been regenerated and the copy
number was comparable to that of the wild type. This demonstrates that the cells have a
mechanism to detect the copy number of rDNA and maintain the correct copy number. In yeast,
this can be accomplished by the upstream activating factor (UAF) for RNA polymerase I. UAF
was recently found to ensure rDNA production not only by rDNA transcription activation but also
by its copy-number maintenance [32]. Drosophila melanogaster has also been found to be able
to adjust its rDNA copy numbers [33, 34]. Van Hofwegen and colleagues found that aerobic
citrate-utilizing E. coli (Cit+) could be rapidly and repeatedly produced when wild type E. coli
was cultured in a minimal medium supplemented with citrate, resulting from an active internal
cellular process that generated additional citT and dctA loci followed by rearrangement of the
DNA [35]. This enables the E. coli, in the presence of oxygen, to synthesize enzymes that are
needed for citrate metabolism and that are normally synthesized only in the absence of oxygen.
Strikingly, E. coli cells that lack functional citT or dctA were not able to respond to the same
Many studies of the molecular mechanisms of mutation have now revealed that mutation is
cells/organisms are stressed (reviewed in [31] and [36]). Furthermore, proteins, with the help of
RNAs, actively survey DNA and either maintain the DNA intact or orchestrate needed
alterations–even its total degradation (such as in the case of programmed cell death or in the
In a nutshell, it is true that DNA is not synthesized using a protein as a template (i.e., no reverse
translation), so that proteins presumably would have played no role in determining the
nucleotide sequence of the first strand of DNA in the first cell. However, proteins play a central
role in determining the DNA contents of the descendants of that cell, as well as the coding
potential and usefulness of the DNA as discussed below. As we will see, every protein functions
in the context of other proteins, RNAs, and, in fact, the whole cell.
The cell, including all its RNAs and proteins, also determines whether a segment of DNA will be
used as a template to make an RNA molecule and whether an RNA will be used to direct
protein synthesis, based on the internal and external conditions detected by the cell. In other
words, whether a segment of DNA will be recognized as a gene and whether a gene will be
used to generate a protein depends, among other factors, on the RNAs and proteins present in
the cell at that moment. It often also depends on what is present in the surroundings of the cell.
Indeed, it is the overall cell (including the RNAs and proteins inside the cell and in its
membranes) that determines whether a segment of DNA is a gene or not a gene. The common
tendency to refer to a given segment of DNA as a “gene,” because it happens to code for a
protein in a specific instance in one particular organism, betrays a simplistic view of the richness
proteins. Human concepts and opinions, however, do not seem to matter to the organisms
being studied. The study of molecular biology would be much simpler (although no doubt less
interesting) if the identification of a particular segment of DNA could tell us all we needed to
know about what protein would be produced (if any) and when and to what extent it would be
produced. In reality, regulation of gene expression (i.e., transcription and translation) accounts
The studies of the ENCODE Consortium and many others have uncovered that there are more
genes that do not encode any protein than those that do in the human genome [37-39](Figure
2). Not only can one gene code for multiple RNA transcripts, resulting from different
transcription start sites or stop sites or from alternative splicing, that may code for
different proteins, but genes also often overlap with each other. These studies have unveiled
unexpected challenges in delineating genes. In Mudge and colleagues’ words, genes are having
Over one hundred years after the basic rules of heredity were established, the gene is
undergoing an identity crisis. Indeed the question ‘‘what is a gene?’’ has been much
debated in recent years …In a scientific context, this question concerns the way in which
information is stored in the genome. Over the 20th Century, the biological definition of
the gene evolved from ‘‘the site of a hereditable trait’’ to ‘‘the genomic region from where
the mRNA that encodes a protein is transcribed,’’ i.e., the ‘‘central dogma’’ of molecular
biology… Gerstein and colleagues recently proposed that ‘‘a gene is a union of genomic
The key point here is that the word ‘‘gene’’ no longer designates a unit of functionality.
Instead it is used as a collective term for a group of products, i.e., transcripts. From our
perspective, there are vital questions concealed within the ‘‘what is a gene?’’ debate. For
example: what is the true size of the transcriptome and what proportion of this
Figure 2: Gene contents of the human genome according to GENCODE version 33. The numbers represent the
number and percentage of genes in the corresponding category and are from GENCODE at
located within the genes’ untranslated regions and introns and, thus, do not code any amino acids of proteins.
In short, molecular biology has uncovered a richness and a complex array of components and
systems that underlie the production of proteins from DNA. Rather than a simple, inevitable flow
transferred, without the interdependent, integrated, functions of the DNA, RNAs, and proteins of
the cell. DNA is somewhat like a recipe book. Its value and usefulness depend on the user. It is
not a book to be read from cover to cover, conveying the same information to every reader. The
chef can choose which recipe to use and modify the recipe as needed.
The information funnels
organism is that it fails to account for much of the genetic information, especially in complicated
organisms like humans. This is because not all regions of genomic DNA encode genes, not all
genes are protein-coding, and not all regions of a protein-coding gene code for amino acids of
that protein, as discussed above. In addition, it is now known that many functional non-protein-
coding genes exist, the most abundant and most familiar being ribosomal RNA and transfer
RNA genes. Figure 3 lists some of the changes that take place and some of the processes
Figure 3. A schematic view of information transfer from DNA to RNA to proteins. Only the green-boxed regions
Another way to view the standard transcription and translation decoding process is to examine
in more detail the informational transformation that takes place via the information processing
systems of the cell. In Table 1 below we list the basic steps in the process of protein production
and indicate the aspects of information loss and information gain that apply. Note that by
“information loss” we are not suggesting that information is somehow irretrievably lost within the
cell. Rather, in the narrow context of DNA sequence information, as discussed by Crick in
formulating the central dogma, the sequence in the next stage of the process (e.g., an mRNA vs
the underlying DNA sequence; or an amino acid sequence vs the underlying mRNA) is missing
information, in that it does not allow for reverse translation of the earlier sequence from the later
one. While this can be termed “information loss,” and we have followed this convention in Table
1 below, in fact what is occurring during protein production is that additional information is being
brought to bear by the cell from outside the relevant sequence in order to complete the next
stage of the production process. For example, while it is true that we cannot start with a protein
and recreate the full genomic sequence underlying that protein, as reflected above in Figure 3,
the reason is not so much that information has been “lost” as is often described, but rather that
additional information has been brought to bear by the cell in order to read, decode, and act
upon that underlying genomic sequence in order to produce what is needed by the cell, in the
right quantity, at the right time, and in the relevant context, as discussed throughout this paper.
Translation
Information loss:
• 5’ untranslated regions
• 3’ untranslated regions
• Non-coding (or non-protein-coding) RNA

Protein processing or maturation
Information loss:
• Cleavage of signal peptides
• Deleting of other regions that are not in the mature proteins
Information gain:
• Non-sequence information (folding, localization, formation of complexes)
• Posttranslational modification
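One facet of the sequence “information loss” at the translation step, namely that a later sequence does not allow the earlier one to be reconstructed, can be quantified with a small Python sketch. It counts how many distinct coding sequences the standard genetic code would allow for a given peptide. The peptide used here and the function name are hypothetical, and the count covers only the sense codons of the coding region. Untranslated regions, introns, and other non-coding sequence cannot be recovered from a protein at all.

```python
import math
from collections import Counter

# Standard genetic code for RNA; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Number of codons assigned to each amino acid (for example, 6 for leucine).
CODONS_PER_AMINO_ACID = Counter(CODON_TABLE.values())


def count_coding_sequences(peptide):
    """Count the distinct codon sequences that encode the given peptide."""
    return math.prod(CODONS_PER_AMINO_ACID[aa] for aa in peptide)


if __name__ == "__main__":
    # Hypothetical five-residue peptide, written in one-letter code.
    print(count_coding_sequences("MAKEV"))
```

Even for a five-residue peptide the candidates multiply quickly (1 × 4 × 2 × 2 × 4 for Met-Ala-Lys-Glu-Val), so the original nucleotide sequence cannot be singled out from the amino acid sequence alone.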
Numerous genome-scale studies have revealed that significant amounts of DNA exist that do
not code for proteins, although they may be involved in protein production and protein function.
For example, it is estimated that only 1.1-1.5% of the human genome encodes proteins [38, 42,
43]. Even though recent research has shown that protein-coding regions are much more
pervasive than previously thought (reviewed in [44]), there are many more non-protein-coding
The non-protein-coding regions can be essential to the viability or reproduction of the organism.
For instance, although they used to be widely regarded as junk DNA, introns can be vital for an
organism. Deletions of certain introns are lethal for yeast [24, 27] and humans [45], and failures
in sex-specific alternative intron splicing prevent proper male and female differentiation and
elements identified by the ENCODE project and of single nucleotide polymorphisms (SNPs)
associated with disease by genome-wide association studies (GWAS) are localized within the
Thus, even if every single one of the proteins within an organism could be reverse-translated
back into RNA or DNA, residue by residue, it would still represent only a portion of the DNA
necessary for the organism. At the very least, we would not be able to recreate the nucleotide
sequence of those portions of the DNA that do not code for proteins, including the untranslated
regions of mRNA, the introns, the non-protein-coding RNA genes, and the intergenic regions.
Based on current estimates, these non-protein-coding regions account for more than 98% of the
human genome. Therefore, the specific protein-coding DNA sequence information that Crick
focused on in formulating his central dogma likely represents only a minority of the information
Furthermore, due to codon redundancy, a reverse transfer of information from amino acid
sequences to DNA sequences is often not a one-to-one relationship. Consequently, the original
nucleotide sequences of even the protein-coding regions may not be fully recoverable starting
from the amino acid sequences of the coded proteins in most cases, resulting in a potential loss
earlier, choosing the incorrect codons, even without altering the amino acid sequence, can be
lethal [24-27]. This may be due to a disruption of the original non-protein-coding level
central dogma, we have an information funnel with irreversible loss of sequence information
Figure 4. The information funnels. (left) Loss of DNA-sequence-dependent information from DNA to RNA to
proteins during transcription and translation. (right) Gain of DNA-sequence-independent information from DNA to
On the flip side, we also have an inverted information funnel in which new information, even
additional sequence information, is brought to bear during RNA processing and protein
processing (Figure 4 right). For example, alternative splicing of intron-containing RNAs can
produce new combinations of RNA segments, resulting in RNA molecules that encode different
proteins. In addition, RNA editing can dramatically change the sequence of an RNA molecule,
and, hence, the amino acid sequence of the protein encoded by the corresponding DNA [49].
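The effect of alternative splicing described above can be sketched in a few lines of Python. The three exon sequences are hypothetical and chosen only so that both splice variants stay in frame. The function names are likewise illustrative, and the standard genetic code is assumed. The point of the sketch is that the choice of which exons to join is an input supplied from outside the sequence itself.

```python
# Standard genetic code for RNA; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Hypothetical exons of one transcript (introns already removed).
EXONS = ["AUGGCU", "AAGGAG", "GUUUAA"]


def splice(exons, included):
    """Join the exons whose indices are listed in `included`, in order."""
    return "".join(exons[i] for i in included)


def translate(mrna):
    """Translate an mRNA from its first base until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    full_length = splice(EXONS, [0, 1, 2])   # all three exons
    exon_skipped = splice(EXONS, [0, 2])     # middle exon skipped
    print(translate(full_length))
    print(translate(exon_skipped))
```

The same hypothetical gene yields two different peptides depending on the `included` argument, which stands in for the splicing decisions made by the cell.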
IV. Problems of the Necessity of Matching the Coding and Decoding Systems
Another key issue, often ignored by the scientific community because of its focus on the
sequence information transfer articulated by the central dogma, is that the coding and the
decoding systems need to match each other. That is, the DNA to be replicated and the
molecular machines that replicate the DNA have to match each other; and the genes to be
transcribed and translated and the molecular machines that transcribe and translate the genes
Species match
For example, Craig Venter’s team synthesized the entire one-megabase (Mb) genome of
Mycoplasma mycoides in yeast, but the yeast cannot create Mycoplasma mycoides cells using
the cloned bacterial genome (Figure 5). The genes encoded in the cloned genome need to be
transcribed and translated using the molecular machines from Mycoplasma capricolum, a cell
The inability of a yeast cell to decode the bacterial M. mycoides genetic code is a consequence
and translation [50, 51]. Figure 6 provides a comparison of DNA replication initiation in
What is striking is not so much that the number of proteins involved is different (as important
as that is) but that the identity of these proteins is different. The proteins used for bacterial DNA
replication are bacteria-specific; they do not have known homologs in eukaryotes. Likewise, the
proteins used for eukaryotic DNA replication are eukaryote-specific; they do not have known
homologs in bacteria. Due to the difference between bacterial and eukaryotic DNA replication
machinery, a yeast origin of replication had to be artificially incorporated into the bacterial
Figure 6. A comparison of DNA replication initiation in the bacterium E. coli and the eukaryote S. cerevisiae (yeast). A:
Initiation in E. coli. B: Initiation in yeast. Note that the proteins involved are unique to either E. coli or yeast. From
Figure 3 of [50].
The transcription and translation machinery of bacteria and eukaryotes are also very different.
For a piece of DNA to be recognized as a gene and be transcribed by a bacterial cell, that DNA
terminator. Yet for that same stretch of DNA to be recognized as a gene and be transcribed by a
eukaryotic cell, that DNA segment would have to be sandwiched between a eukaryotic promoter
and a eukaryotic transcription terminator. Furthermore, the same RNA transcript, even if it does
encode a protein, may be translated into totally unrelated proteins by a bacterial cell and a eukaryotic cell
(Figure 7).
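The dependence of the decoded protein on the decoding system can be sketched in Python with two simplified start-site rules. The mRNA below is a hypothetical construct written for this sketch, not the sequence drawn in Figure 7, although it is designed to yield the same two peptides. The bacteria-like rule assumes the consensus Shine-Dalgarno motif AGGAGG followed by an AUG within a spacer of 4 to 12 nucleotides. The eukaryote-like rule assumes simple scanning to the first AUG from the 5’ end and ignores sequence context. Both rules, the spacer window, and all function names are assumptions made for illustration.

```python
# Standard genetic code for RNA; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Hypothetical mRNA, assembled from labeled parts.
MRNA = (
    "AC"                  # 5' leader
    "AUGGCUAAGGAGGUUUAA"  # first AUG and its frame; contains AGGAGG
    "CC"                  # spacer
    "AUGUUUAUUGGUGCUUAA"  # second AUG and its frame
    "GC"                  # 3' end
)


def eukaryote_like_start(mrna):
    """Simplified scanning rule: use the first AUG from the 5' end."""
    position = mrna.find("AUG")
    return position if position >= 0 else None


def bacteria_like_start(mrna, motif="AGGAGG", min_spacer=4, max_spacer=12):
    """Simplified rule: use an AUG shortly downstream of the motif."""
    motif_start = mrna.find(motif)
    if motif_start < 0:
        return None
    motif_end = motif_start + len(motif)
    for spacer in range(min_spacer, max_spacer + 1):
        position = motif_end + spacer
        if mrna[position:position + 3] == "AUG":
            return position
    return None


def translate_from(mrna, start):
    """Translate from the given start position until a stop codon."""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


if __name__ == "__main__":
    print(translate_from(MRNA, bacteria_like_start(MRNA)))
    print(translate_from(MRNA, eukaryote_like_start(MRNA)))
```

The input string is identical in both calls; only the rule for choosing the start codon differs, and with it the peptide that is read out.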
Figure 7. The same RNA may be translated into two different proteins in bacteria and eukaryotes. Blue box: Shine-Dalgarno
sequence; green box: translation initiation site; red box: translation stop site. Top: The hypothetical
mRNA would be used to code for a protein with amino acids MFIGA, based on the mechanism of translation of
bacteria like E. coli. The Shine-Dalgarno sequence is important for translation initiation in bacteria. It hybridizes to an
anti-Shine-Dalgarno sequence, which is the reverse complement of the Shine-Dalgarno sequence, in the 16S
rRNA. Bacteria use the AUG that is a few nucleotides downstream of the Shine-Dalgarno sequence as the translation
initiation site. Bottom: The same hypothetical mRNA would be used to code for a protein with amino acids MAKEV,
based on the mechanism of translation of eukaryotes like yeast. Eukaryotes normally use the first AUG from the 5’
Interestingly, not only would a eukaryotic cell have trouble decoding a bacterial genetic code
(i.e., reading, interpreting, and executing the instructions encoded in a bacterial genome), but
even a bacterial cell may be unable to read the genetic instructions of another bacterial cell.
For instance, cloning the whole 3.5-Mb genome (see footnote 1) of the photosynthetic bacterium
Synechocystis PCC6803 into the 4.2-Mb genome of the mesophilic bacterium Bacillus subtilis
did not enable Bacillus subtilis to perform photosynthesis. The resultant cells could not even be
cultured in the medium used to grow Synechocystis, indicating that the host cell could not use
the added Synechocystis genome successfully [52], despite the clear benefit the
added genome might have provided in that medium. Although from our outside perspective we
might be tempted to think that the added Synechocystis genome contained all the information
necessary to enable the host cells to thrive in the culturing medium, the extensive sequence
information contained in the Synechocystis genome seems to have been unrecognizable and of
In addition to the species-specific match required for proper coding and decoding, a special
match is sometimes necessary within a species, such as a match with the stage of a cell (or its
position in the cell cycle), or a match among the type of a cell, its genome, and the
molecules that decode that genome. In reality, we might think of the decoding system as the
whole cell, including its many RNAs and proteins. This is because it is the overall function of the
cell as a whole, in the context of a specific tissue and/or environment, that determines whether
the genomic DNA will be replicated in the first place and, if so, whether only part of the genome or
the whole genome will be replicated, whether error-prone DNA polymerases will be allowed to
participate in the replication (as in a stress-response situation) or only DNA polymerases with
high replication fidelity, and what parts of the genomic DNA will be transcribed or translated.
This cell-type and cell-status match occurs every day in every living organism, although we
(Footnote 1: The two ribosomal RNA genes of Synechocystis were not included because they are toxic to Bacillus.
The toxicity may arise because these genes are similar enough to the ribosomal RNA genes of Bacillus to
be transcribed, yet different enough that their products would disrupt the translational machinery of the host cell.)
normally do not think of it that way. Evidence now suggests that mismatched tissue-specific or
cell-specific transcription and translation is an important contributing factor for many diseases,
including cancer and diabetes. Imagine what would happen if muscle fibers were made in a
nerve cell instead of a muscle cell. Or consider the pain of having bones grow in a place where
In summary, the presence of a particular DNA gene sequence does not guarantee the making
of an RNA transcript. In fact, it is vital that this is so, since unregulated gene expression not only
wastes resources, but in some cases can be deadly to the survival or reproduction of the
organism. In addition, the presence of an RNA transcript does not guarantee the making of a
protein. Thus, knowledge of the sequence of a genome does not enable one to predict the
transcriptome (all the RNAs in a cell) or the proteome (all the proteins in a cell). Both a cell’s
transcriptome and proteome can change based on what type of cell it is, the status of the cell,
and what is present in the environment. The only way to know the transcriptome
and the proteome of a cell precisely is to determine each of them independently. In addition, it is well known that
while it is now relatively easy to determine the raw genome sequence of an organism,
annotating the genome (i.e., determining which parts of the genome actually encode genes)
remains quite challenging.
In addition to the match that must exist between a genome and the proper species of
organism, and the further match that must exist within a species among the genome,
the relevant stage of cell development, and the various molecules within the
cell, an organism’s coding-decoding system must also be coordinated to operate properly within
a given environment. This includes an organism’s effects on and actions within an environment,
which depend on the molecules inside the organism and on those embedded within its
membranes, where they are in contact with the outside environment. One well-known example is the
observation that in a culture medium containing both glucose and lactose, E. coli will not make
β-galactosidase, an enzyme needed for lactose metabolism, until all the glucose is used up. When
there is no longer any lactose in the medium, β-galactosidase will not be generated under
normal conditions either (although it may be generated, but not be usable by the cell, if the lac-
Examples of an organism’s effects on the environment include niche construction and the
generation of waste (note that the waste of one organism may be nutrients for another).
Examples of environmental factors that an organism may have to deal with include nutrients,
temperature, pH, and other organisms, such as pathogens, predators, and siblings.
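The glucose and lactose example above can be sketched as a small Python model. It is a minimal illustration that treats expression as a strict yes or no decision and sugar supplies as whole units; actual regulation in E. coli is graded rather than Boolean. The function names and the unit-based medium are hypothetical and were introduced only for this sketch.

```python
def lac_enzyme_expressed(glucose_present: bool, lactose_present: bool) -> bool:
    """Simplified rule: the enzyme is made only when lactose is available
    and glucose has run out."""
    return lactose_present and not glucose_present


def simulate_medium(glucose_units: int, lactose_units: int):
    """Consume one unit of sugar per step, glucose first.

    Returns a list of (glucose_left, lactose_left, enzyme_expressed) tuples,
    one per step, recorded before that step's unit is consumed.
    """
    timeline = []
    while glucose_units > 0 or lactose_units > 0:
        expressed = lac_enzyme_expressed(glucose_units > 0, lactose_units > 0)
        timeline.append((glucose_units, lactose_units, expressed))
        if glucose_units > 0:
            glucose_units -= 1   # glucose is used preferentially
        else:
            lactose_units -= 1   # lactose is used once glucose is gone
    return timeline


for glucose_left, lactose_left, expressed in simulate_medium(2, 2):
    print(f"glucose={glucose_left} lactose={lactose_left} enzyme_made={expressed}")
```

In this sketch the same genome produces or withholds the enzyme depending only on what the medium contains, which is the sense in which the coding-decoding system is coordinated with the environment.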
More than 60 years ago, at the dawn of the genetic age, Francis Crick proposed a model of
the fundamental source of information in the cell—constituting the source of a one-way flow of
information from DNA to RNA to proteins. For decades, Crick’s central dogma has influenced
biology research and has impacted views of how information processing occurs in the cell and
Yet despite Crick’s remarkable insights and contributions to modern biology, the central dogma
is inadequate to account for the different kinds and levels of information inside a cell, or for the
requirement and complexity of information transfer. Although perhaps not intended, the resulting
reductionist, static, DNA-centric view of life has become a hindrance to our understanding of life.
The exceptions and contradictions between the central dogma and the data have finally reached
the point where they can no longer be dismissed as occasional anomalies or be explained away
with definitional clarifications of the central dogma. The central dogma’s underlying assumption
of a one-way flow—with DNA protein-coding sequence information as the ultimate source and
arbiter of information processing within the cell—can no longer be considered a viable way of
The central principle of molecular biology is biological information coding and decoding,
i.e., the detailed transfer of heritable, cell-type and cell-status specific, environment-
be transferred from DNA to DNA, DNA to RNA, RNA to DNA, RNA to RNA, and RNA to
proteins; 2) no information transfer can occur without the interdependent, integrated,
function of matching DNA, RNA, and proteins; 3) there is no reverse translation, but
proteins, with the help of RNAs, determine the maintenance, propagation, and coding
potential of DNA; and 4) information transfer is an active response of a cell to its internal
“Cell type and cell status” includes a cell’s genome, epigenome, proteome, metabolome,
based on the organism’s internal features and capabilities as discussed above (Figure 8, orange
arrows).
“Sequence” refers to the nucleotide sequences of DNA and RNAs and the amino acid
sequences of proteins. Sequence information does not include the function, conformation (or
structure), localization, post-translational modification, networks (binding partners), or any other
non-sequence information.
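The sequence transfers permitted by the proposed principle, together with the requirement that DNA, RNA, and proteins all take part, can be written down as a small lookup table. The Python sketch below is one possible encoding, not part of the original proposal: the process labels are the conventional names for each transfer, and the identifiers (`ALLOWED_SEQUENCE_TRANSFERS`, `sequence_transfer`, `can_transfer`) are hypothetical.

```python
# Sequence transfers allowed under the proposed principle, keyed by
# (source, target). The labels are conventional process names added here.
ALLOWED_SEQUENCE_TRANSFERS = {
    ("DNA", "DNA"): "DNA replication",
    ("DNA", "RNA"): "transcription",
    ("RNA", "DNA"): "reverse transcription",
    ("RNA", "RNA"): "RNA replication",
    ("RNA", "protein"): "translation",
}

# Point 2 of the principle: every transfer needs all three kinds of molecule.
REQUIRED_MOLECULES = frozenset({"DNA", "RNA", "protein"})


def sequence_transfer(source, target):
    """Return the process name for an allowed transfer, or None if disallowed."""
    return ALLOWED_SEQUENCE_TRANSFERS.get((source, target))


def can_transfer(source, target, available_molecules):
    """A transfer can occur only if it is allowed and all three kinds of
    molecule are available in the cell."""
    allowed = (source, target) in ALLOWED_SEQUENCE_TRANSFERS
    return allowed and REQUIRED_MOLECULES <= set(available_molecules)


print(sequence_transfer("RNA", "protein"))   # translation
print(sequence_transfer("protein", "RNA"))   # None: no reverse translation
print(can_transfer("DNA", "RNA", {"DNA", "RNA"}))             # False
print(can_transfer("DNA", "RNA", {"DNA", "RNA", "protein"}))  # True
```

The table captures only sequence information, in line with the definition of "sequence" given above; the regulatory influence of proteins and RNAs on DNA is outside what such a lookup can express.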
“Aperiodic” refers to the fact that DNA (or RNA or protein) sequence is irregular and cannot be
Figure 8. A schematic view of the new central dogma. Note the cell-type- and cell-status-specific, environment-
responsive, interdependence of DNA, RNA, and proteins. Black arrows: sequence information transfer. Purple arrows:
kinds of molecules needed for the corresponding information transfer. Orange arrows: interactions between a cell and
its environment.
Looking Forward
The current central dogma has caused a significant amount of misunderstanding and has
become a hindrance to our understanding of life. The purpose of this essay has been to
stimulate critical thinking about the current central dogma and to propose a new way of
understanding the information processes at work in living systems. As future research continues
to expand our understanding of these processes, we look forward to learning and appreciating
References
1. Lee, J.J. Read Francis Crick’s $6 Million Letter to Son Describing DNA. 2013 [cited 2019
12/2/2019]; Available from: https://blog.nationalgeographic.org/2013/04/11/read-francis-cricks-6-million-letter-to-son-describing-dna/.
2. Watson, J.D. and F.H. Crick, Molecular structure of nucleic acids; a structure for deoxyribose
nucleic acid. Nature, 1953. 171(4356): p. 737-8.
3. Cobb, M., 60 years ago, Francis Crick changed the logic of biology. PLoS Biol, 2017. 15(9): p.
e2003243.
4. Crick, F.H., On protein synthesis. Symp Soc Exp Biol, 1958. 12: p. 138-63.
5. Crick, F., Central dogma of molecular biology. Nature, 1970. 227(5258): p. 561-3.
6. Watson, J.D., et al., Molecular Biology of the Gene. 6th ed. 2008, Cold Spring Harbor, N.Y.: Cold
Spring Harbor Laboratory Press.
7. Watson, J.D., et al., Molecular Biology of the Gene. 7th ed. 2013: Pearson.
8. BBC, 1953: Scientists describe 'secret of life', in On This Day. BBC.
9. Judson, H.F., The Eighth Day of Creation: Makers of the Revolution in Biology, Commemorative
Edition. 1996: Cold Spring Harbor Laboratory Press.
10. Koonin, E.V., Does the central dogma still stand? Biol Direct, 2012. 7.
11. Tropp, B.E., Molecular Biology: Genes to Proteins. 4th ed. 2012, Sudbury, MA 01776: Jones &
Bartlett Learning, LLC.
12. Peedicayil, J., DNA methylation and the central dogma of molecular biology. Med Hypotheses,
2005. 64(6): p. 1243-4.
13. Bussard, A.E., A scientific revolution? The prion anomaly may challenge the central dogma of
molecular biology. EMBO Rep, 2005. 6(8): p. 691-4.
14. Biro, J.C., Seven fundamental, unsolved questions in molecular biology. Cooperative storage and
bi-directional transfer of biological information by nucleic acids and proteins: an alternative to
"central dogma". Med Hypotheses, 2004. 63(6): p. 951-62.
15. Anonymous, Central dogma reversed. Nature, 1970. 226(5252): p. 1198-9.
16. de Lorenzo, V., From the selfish gene to selfish metabolism: revisiting the central dogma.
Bioessays, 2014. 36(3): p. 226-35.
17. Noble, D., Central Dogma or Central Debate? Physiology (Bethesda), 2018. 33(4): p. 246-249.
18. Shapiro, J.A., Revisiting the central dogma in the 21st century. Ann N Y Acad Sci, 2009. 1178: p.
6-28.
19. Noble, D., Differential and integral views of genetics in computational systems biology. Interface
Focus, 2011. 1(1): p. 7-15.
20. Noble, D., Evolution viewed from physics, physiology and medicine. Interface Focus, 2017. 7(5).
21. Noble, D., Evolution beyond neo-Darwinism: a new conceptual framework. J Exp Biol, 2015.
218(Pt 1): p. 7-13.
22. Liu, J., et al., N (6)-methyladenosine of chromosome-associated regulatory RNA regulates
chromatin state and transcription. Science, 2020.
23. Koonin, E.V., Why the Central Dogma: on the nature of the great biological exclusion principle.
Biol Direct, 2015. 10: p. 52.
24. Mitchell, L.A., et al., Synthesis, debugging, and effects of synthetic chromosome consolidation:
synVI and beyond. Science, 2017. 355(6329).
25. Shen, Y., et al., Deep functional analysis of synII, a 770-kilobase synthetic yeast chromosome.
Science, 2017. 355(6329).
26. Wu, Y., et al., Bug mapping and fitness testing of chemically synthesized chromosome X. Science,
2017. 355(6329).
27. Zhang, W., et al., Engineering the ribosomal DNA in a megabase synthetic chromosome. Science,
2017. 355(6329).
28. Haltiner, M.M., S.T. Smale, and R. Tjian, Two distinct promoter elements in the human rRNA
gene identified by linker scanning mutagenesis. Mol Cell Biol, 1986. 6(1): p. 227-35.
29. Learned, R.M., et al., Human rRNA transcription is modulated by the coordinate binding of two
factors to an upstream control element. Cell, 1986. 45(6): p. 847-57.
30. Su'etsugu, M., et al., Exponential propagation of large circular DNA by reconstitution of a
chromosome-replication cycle. Nucleic Acids Res, 2017. 45(20): p. 11525-11534.
31. Fitzgerald, D.M. and S.M. Rosenberg, What is mutation? A chapter in the series: How microbes
"jeopardize" the modern synthesis. PLoS Genet, 2019. 15(4): p. e1007995.
32. Iida, T. and T. Kobayashi, RNA Polymerase I Activators Count and Adjust Ribosomal RNA Gene
Copy Number. Mol Cell, 2019. 73(4): p. 645-654 e13.
33. Lu, K.L., et al., Transgenerational dynamics of rDNA copy number in Drosophila male germline
stem cells. Elife, 2018. 7.
34. Nelson, J.O., et al., Mechanisms of rDNA Copy Number Maintenance. Trends Genet, 2019.
35(10): p. 734-742.
35. Van Hofwegen, D.J., C.J. Hovde, and S.A. Minnich, Rapid Evolution of Citrate Utilization by
Escherichia coli by Direct Selection Requires citT and dctA. J Bacteriol, 2016. 198(7): p. 1022-34.
36. Gottesman, S., Trouble is coming: Signaling pathways that regulate general stress responses in
bacteria. J Biol Chem, 2019. 294(31): p. 11685-11700.
37. Harrow, J., et al., GENCODE: the reference human genome annotation for The ENCODE Project.
Genome Res, 2012. 22(9): p. 1760-74.
38. Consortium, E.P., An integrated encyclopedia of DNA elements in the human genome. Nature,
2012. 489(7414): p. 57-74.
39. Consortium, E.P., et al., Identification and analysis of functional elements in 1% of the human
genome by the ENCODE pilot project. Nature, 2007. 447(7146): p. 799-816.
40. Gerstein, M.B., et al., What is a gene, post-ENCODE? History and updated definition. Genome
Res, 2007. 17(6): p. 669-81.
41. Mudge, J.M., A. Frankish, and J. Harrow, Functional transcriptomics in the post-ENCODE era.
Genome Res, 2013. 23(12): p. 1961-73.
42. Lander, E.S., et al., Initial sequencing and analysis of the human genome. Nature, 2001.
409(6822): p. 860-921.
43. Venter, J.C., et al., The sequence of the human genome. Science, 2001. 291(5507): p. 1304-51.
44. Ruiz-Orera, J. and M.M. Alba, Translation of Small Open Reading Frames: Roles in Regulation
and Evolutionary Innovation. Trends Genet, 2019. 35(3): p. 186-198.
45. Szafranski, P., et al., Novel FOXF1 Deep Intronic Deletion Causes Lethal Lung Developmental
Disorder, Alveolar Capillary Dysplasia with Misalignment of Pulmonary Veins. Hum Mutat, 2013.
34(11): p. 1467-71.
46. Sosnowski, B.A., J.M. Belote, and M. McKeown, Sex-specific alternative splicing of RNA from the
transformer gene results from sequence-dependent splice site blockage. Cell, 1989. 58(3): p. 449-
59.
47. Venables, J.P., J. Tazi, and F. Juge, Regulated functional alternative splicing in Drosophila. Nucleic
Acids Res, 2012. 40(1): p. 1-10.
48. Salz, H.K. and J.W. Erickson, Sex determination in Drosophila: The view from the top. Fly, 2010.
4(1): p. 60-70.
49. Knoop, V., When you can't trust the DNA: RNA editing changes transcript sequences. Cell Mol
Life Sci, 2011. 68(4): p. 567-86.
50. Tan, C. and J.P. Tomkins, Information Processing Differences Between Bacteria and Eukarya—
Implications for the Myth of Eukaryogenesis. Answers Research Journal, 2015. 8: p. 143–162.
51. Tan, C. and J.P. Tomkins, Information Processing Differences Between Archaea and Eukarya—
Implications for Homologs and the Myth of Eukaryogenesis. Answers Research Journal, 2015. 8:
p. 121–141.
52. Itaya, M., et al., Combining two genomes in one cell: stable cloning of the Synechocystis PCC6803
genome in the Bacillus subtilis 168 genome. Proc Natl Acad Sci U S A, 2005. 102(44): p. 15971-6.