The New Central Dogma of Molecular Biology: March 2020

The document proposes updating the central dogma of molecular biology formulated by Francis Crick. It argues that Crick's original central dogma fails to account for different kinds of biological information, the interdependent functions of DNA, RNA and proteins in information transfer, the need for matching coding and decoding systems, and how proteins control DNA replication and usage. The proposed update states that biological information coding and decoding is the central principle, and that information can be transferred between DNA, RNA and proteins in various directions, but requires the integrated functions of these molecules and proteins determine DNA maintenance and propagation.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/340062231

The New Central Dogma of Molecular Biology

Preprint · March 2020



The New Central Dogma of Molecular Biology

Change Laura Tan* and Eric Anderson

Division of Biological Sciences, University of Missouri, Columbia, MO 65211, USA

*: Corresponding author: Phone: 573-882-1581, Email: [email protected]

Abstract

The central dogma of molecular biology formulated by Francis Crick has greatly influenced our

scientific research and perspective of life. However, it fails to adequately account for the

following discoveries: 1) there are different kinds and different levels of biological information, 2)

no information flow is possible without the cooperative function of DNA, RNA, and proteins, 3)

the coding system and the decoding system have to match with each other, and 4) proteins,

with the help of RNAs, control whether and how DNA is replicated and also control the stability,

accessibility, and usability of DNA. Thus, we propose updating the central dogma to the

following: The central principle of molecular biology is biological information coding and

decoding. Specifically, 1) sequence information can be transferred from DNA to DNA, DNA to

RNA, RNA to DNA, RNA to RNA, and RNA to proteins; 2) no information transfer can occur

without the interdependent, integrated, function of matching DNA, RNA, and proteins; 3) there is

no reverse translation, but proteins, with the help of RNAs, determine the maintenance,

propagation, and coding potential of DNA; and 4) information transfer is an active response of a

cell to its internal and external conditions.

Key Words: DNA replication, transcription, translation, biological information, information

transfer, central dogma, epigenetics

The central dogma of molecular biology that Francis Crick articulated more than sixty years ago

has had a profound impact not only on the study of molecular biology but also on our daily

thinking about life and our approaches to causes and treatments of diseases. The central

dogma underlies the common belief that identifying and manipulating certain genes would

enable us to solve the twin problems of world hunger (e.g., via generating genetically modified

organisms) and dreadful disease (e.g., via personalized medicine). A clear understanding of the

true characteristics of molecular biology is both critical and urgent, because the consequences

of misunderstanding are severe and costly. In this essay, we will briefly review the history of,

and describe the problems with, the current central dogma and will provide a revision that more

accurately reflects our current understanding of molecular biology.

I. History of the Central Dogma of Molecular Biology

In a March 19, 1953 letter, Francis Crick told his 12-year-old son Michael about the discovery he

and James Watson had made [1]:

Jim Watson and I have probably made a most important discovery. We have built a

model for the structure of de-oxy-ribose-nucleic-acid (read it carefully) called D.N.A. for

short. Now the exciting thing is that while there are 4 different bases, we find we can

only put certain pairs of them together… only A with T and G with C…

Now on one chain, as far as we can see, one can have the bases in any order, but if

their order is fixed, then the order on the other chain is also fixed… It is like a code. If

you are given one set of letters you can write down the others.

Now we believe that the D.N.A. is a code. That is, the order of the bases (the letters)

makes one gene different from another gene (just as one page of print is different from

another). You can now see how Nature makes copies of the genes. Because if the two

chains unwind into two separate chains, and if each chain then makes another chain

come together on it, then because A always goes with T, and G with C, we shall get two

copies where we had one before... (emphasis in the original)

The discovery was published one month later in Nature [2]. Near the end of their famous one-

page-long article, Watson and Crick observed: “It has not escaped our notice that the specific

pairing we have postulated immediately suggests a possible copying mechanism for the genetic

material.”

Four years later, at a symposium held at University College London, Crick described principles

relating to the transfer of genetic information [3], referred to in his notes and in later writings as

the “Central Dogma” ([4], p153):

This states that once 'information' has passed into protein it cannot get out again. In

more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic

acid to protein may be possible, but transfer from protein to protein, or from protein to

nucleic acid is impossible. Information means here the precise determination of

sequence, either of bases in the nucleic acid or of amino acid residues in the protein.

(emphasis in the original)

In 1970, in response to a challenge against the central dogma based on the discovery of

reverse transcriptases (or RNA-dependent DNA polymerases), Crick explained, reaffirmed, and

clarified the central dogma:

The central dogma of molecular biology deals with the detailed residue-by-residue

transfer of sequential information. It states that information cannot be transferred back

from protein to either protein or nucleic acid ([5], p561).

James Watson popularized the central dogma via his widely-used Molecular Biology of the

Gene textbook, now in its seventh edition, with a simple figure (Figure 1) in which

“the arrows indicate the directions proposed for the transfer of genetic information. The

arrow encircling DNA signifies that DNA is the template for its self-replication. The arrow

between DNA and RNA indicates that RNA synthesis (called transcription) is directed by

a DNA template. Correspondingly, the synthesis of proteins (called translation) is

directed by an RNA template. Most importantly, the last two arrows were presented as

unidirectional; that is, RNA sequences are never determined by protein templates nor

was DNA then imagined ever to be made on RNA templates” ([6], p32 and [7], p33,

emphasis not in the original).

Figure 1. The central dogma according to Watson.

Consequently, in most people’s minds, the central dogma describes the unidirectional transfer

of genetic information from DNA to RNA to proteins [3].

It is impossible to measure the exact impact of the discovery of the double helical structure of

DNA and Crick’s formulation of the central dogma and his sequence hypothesis. Crick equated

their discovery of the double helix with the discovery of the “secret of life” [8]. With collectors

recognizing the significance of this remarkable discovery, Crick’s March 19, 1953 letter to his

son Michael was sold for six million dollars in 2013 [1]. Matthew Cobb referred to Crick’s

subsequent 1957 symposium lecture as “one of the most significant lectures in the history of

biology” and as “a lecture that changed how we think” [3]. Horace Judson remarked: Crick’s

lecture “permanently altered the logic of biology” [9]. Eugene Koonin called the central dogma

the only exception to the “‘ubiquitous exception’ rule of biology” in which “the only actual rule is

that there are no rules, i.e. exceptions can be found to every ‘fundamental’ principle if one looks

hard enough” ([10], p1). In his molecular biology textbook, Burton Tropp declared that the

central dogma provides the theoretical framework for molecular biology ([11], p22). The double

helix has become the icon of biology, even of science itself. It is currently widely accepted that

given the nucleotide sequence of one strand of DNA, we can write down that of the other, and,

with the genetic codon table at hand, we can spell out the amino acid sequence of the encoded

protein.
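The two "write it down" steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codon table is a five-entry excerpt of the standard genetic code, and the names `PAIR`, `CODON_EXCERPT`, `complement_strand`, `translate`, and the example sequence are hypothetical, introduced here for the sketch.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Tiny excerpt of the standard codon table, written as coding-strand DNA
# codons. A complete table has 64 entries; this excerpt is an assumption
# made to keep the sketch short.
CODON_EXCERPT = {
    "ATG": "Met",
    "GGA": "Gly",
    "GGG": "Gly",
    "TGG": "Trp",
    "TAA": "Stop",
}


def complement_strand(strand):
    """Return the partner strand read 5'->3' (the reverse complement)."""
    return "".join(PAIR[base] for base in reversed(strand))


def translate(coding_strand):
    """Read codons three bases at a time until a stop codon or the end."""
    peptide = []
    for i in range(0, len(coding_strand) - 2, 3):
        residue = CODON_EXCERPT[coding_strand[i:i + 3]]
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


example = "ATGGGATGGTAA"  # hypothetical sequence, for illustration only
print(complement_strand(example))
print(translate(example))
```

Note that the sketch takes the reading frame, the start position, and the codon table as given, which is exactly the kind of context the following sections argue cannot be taken for granted.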

Despite the unquestionable influence of the central dogma within both popular understanding

and scientific research, we will show below that information encoded in DNA is only part of

the information inside a cell; that no information can flow without the integrated function of

matching DNA, RNA, and proteins; and that whether a DNA sequence codes for anything,

and if so, what it encodes, depends on what exists inside and outside of the cell. In short,

information coding and decoding are interdependent and are organism and cell type specific.

II. The Meaning of Information

Different kinds of information

Some challenges have been raised against the central dogma, first after the discovery of

reverse transcriptases as mentioned above and then after the discovery of prions, proteins with

conformations that can be transmitted to other proteins that have the same amino acid

sequence as the prions but are folded differently [10, 12-15]. However, these objections mostly

reflect a misunderstanding of Crick’s central dogma, in particular about Crick’s concept of

information and what Crick thought could not take place within biological systems. Crick was

very clear about both issues: by “information” he meant “the precise determination of sequence,

either of bases in the nucleic acid or of amino acid residues in the protein” [4]. We can refer to

Crick’s “information” as “sequence information”, which refers to the nucleotide sequence in DNA

or RNA and the amino acid sequence in proteins. Therefore, what Crick proposed was

prohibited is the transfer of sequence information from proteins to proteins or from proteins to

nucleic acids.

While Crick’s central dogma remained the prevailing wisdom for decades, more recent

discoveries have challenged his understanding of biological information and what was possible

in biological systems. Specifically, a major assumption underlying Crick’s central dogma, and

where the central dogma has led us astray, is the misconception that the sequence information

of DNA contains all the inheritable substance that determines the phenotype of an organism.

This “gene-centric” or “genetic determinism” view of life has come under increasing criticism

with additional discoveries [16-21]. As mentioned above, for example, questions about

inheritance arose when prions were discovered. After all, there did not seem to be a direct

correlation between the sequence information in DNA and the newly propagated prions. Even

more striking, a prion’s ability to impact the shape of another protein began to create doubts

about Crick’s proposal that information could not be transmitted from protein to protein (although

in the case of prions it should be noted that it is not the precise sequence of the amino acids

which is altered, but the three-dimensional structure of the protein). In addition to prions, other

inheritable epigenetic factors have been discovered, which have further complicated the simple

relationship between genotype and phenotype hypothesized by the central dogma [17, 18, 22].

Five decades of biological research have revealed that there are different kinds of information

inside cells. Koonin distinguished two kinds: (i) digital information, the one-dimensional

sequence information contained in nucleic acids, and (ii) analog information, the three-

dimensional structure of proteins [23]. Koonin’s concept can, and should, be extended since

both nucleic acids and proteins have digital (sequence) information and analog (three-
dimensional structure) information. The analog information can further be extended to include

not only the three-dimensional structure of a given protein, but also protein localization, post-

translational modification, network components (i.e., available binding partners in a specific

cell)—in essence, the broader cellular context.

Different levels of information

Not only are there different kinds of information, but there are also different levels of information

embedded within DNA and RNA. Protein coding is only one of these levels. One of the more

remarkable findings in recent years has been the discovery that so-called “silent mutations”

can have an impact on cellular function. Previously, under the prevailing view enshrined in the

central dogma that the only role of DNA was to code for a particular protein, it was long

assumed that any codon for a particular amino acid was equivalent to any other codon for the

same amino acid. Thus, for example, because GGA and GGG both code for glycine, it has been

assumed that these codons are equivalent and that a mutation from, say, “A” in the third

position to “G” in the third position would have no possible consequence in the organism—it

would be a “silent” mutation, invisible to the internal workings of the organism, to its outward

appearance, and to the hand of natural selection. This inherent “redundancy” of the genetic

code has been a staple of biology education for decades, with much ink spilt debating the

reasons for such redundancy and much speculation about its potential role in neutral evolution.
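The traditional notion of a "silent" mutation can be made concrete with a minimal Python sketch. It assumes only a small excerpt of the standard genetic code (the four glycine codons plus GAA for glutamate); the names `CODON_EXCERPT` and `classify_substitution` are hypothetical and introduced here for illustration.

```python
# Excerpt of the standard genetic code (DNA codons). All four GGN codons
# specify glycine; GAA specifies glutamate.
CODON_EXCERPT = {
    "GGA": "Gly",
    "GGC": "Gly",
    "GGG": "Gly",
    "GGT": "Gly",
    "GAA": "Glu",
}


def classify_substitution(codon, position, new_base):
    """Apply a single-base substitution (0-based position) and report
    whether the encoded amino acid changes."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    same_residue = CODON_EXCERPT[codon] == CODON_EXCERPT[mutated]
    label = "synonymous" if same_residue else "nonsynonymous"
    return mutated, label


print(classify_substitution("GGA", 2, "G"))  # third-position A -> G
print(classify_substitution("GGA", 1, "A"))  # second-position G -> A
```

A classification like this looks only at the amino acid that a codon designates. It says nothing about any other level of information carried by the same nucleotides, which is where the assumption of silence breaks down.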

While it is far too early to suggest that each codon within the genetic code performs a unique

role (and variations may exist between different organisms), it came as something of a shock to

the received wisdom that some of the “silent mutations” were not, in fact, so silent after all.

Recent research has revealed multiple instances in which recoding an amino acid with codons

that code for the same amino acid but which have different nucleotide sequences can disrupt

the function of a segment of DNA or RNA at a level other than the mere designation of the

particular translated amino acids. Such changes, previously thought to be completely neutral,
can in fact be lethal, as several of the laboratories involved in yeast chromosome engineering

have discovered, with surprise [24-27].

Another example of non-sequence-dependent information is information that is sequence-

independent but distance-dependent. For instance, DNA between the upstream and the core

promoter elements of the human ribosomal RNA gene appears to tolerate nucleobase

substitutions but not significant alterations of its length. Researchers have observed that a

removal of 44 bp (base pairs) between the two promoter elements reduces the promoter

strength by 90% compared to the wild-type, and an addition of 49 bp reduces promoter strength

by 70% [28, 29].

If we limit the discussion of information in living systems to just sequence (digital) information as

Crick did when articulating the central dogma, then neither the reverse transcriptases nor the

prions present real challenges to Crick’s central dogma [5, 23]. We could still argue, for

example, that when DNA is being replicated accurately, “we shall get two copies where we had

one before,” as Crick said. And when a gene is translated, once we know the protein-coding

region (i.e., the open reading frame) of its mature RNA, we can spell out the amino acid

sequence of its encoded protein.

Such a limited view of information might allow us to maintain the perception that the central

dogma is still true, but it would be true only in an increasingly limited way and with an

ever-growing number of exceptions. In light of new discoveries, we might be able to hold

onto the proud tradition of the central dogma, but only at the expense of accuracy and

relevancy.

As is often the case, however, the devil is in the details. And an important

detail is in those conditional words: “when DNA is being replicated accurately” and “when a

gene is translated, once we know the protein-coding region (i.e., the open reading frame) of its

mature RNA.” These are what the central dogma, in Crick’s words, “says nothing about” ([5],

p562):

It says nothing about what the machinery of transfer is made of, and in particular nothing

about errors.

It says nothing about control mechanisms—that is, about the rate at which the processes

work.

What the central dogma “says nothing about” matters. Countless experiments have shown that

no information can be transferred without the coordinated interaction of DNA, RNA and proteins,

as discussed below. More importantly, the meaning and the usefulness of a code depend on the

decoding systems. The situation is very similar to human languages. The four-letter word “gift”

means a present in English, while it means a poison in German. “Your room is on the first floor”

points to a very different location in England (one level above the ground level) than in the United

States (the ground level), even though both countries speak English.
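A molecular counterpart of the "gift" example can be sketched in Python by decoding one message with two different codon tables. The excerpts below follow the standard genetic code and the vertebrate mitochondrial code, in which UGA specifies tryptophan rather than stop, AUA specifies methionine rather than isoleucine, and AGA is a stop codon rather than arginine. The message and the names `STANDARD`, `VERTEBRATE_MITO`, and `decode` are hypothetical, chosen only for illustration.

```python
# Five-codon excerpts of two genetic codes (RNA codons).
STANDARD = {
    "AUG": "Met", "AUA": "Ile", "UGA": "Stop", "UGG": "Trp", "AGA": "Arg",
}
VERTEBRATE_MITO = {
    "AUG": "Met", "AUA": "Met", "UGA": "Trp", "UGG": "Trp", "AGA": "Stop",
}


def decode(mrna, table):
    """Translate codon by codon with the given table, stopping at a stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        residue = table[mrna[i:i + 3]]
        if residue == "Stop":
            break
        residues.append(residue)
    return residues


message = "AUGAUAUGAUGGAGA"  # hypothetical message: AUG AUA UGA UGG AGA
print(decode(message, STANDARD))
print(decode(message, VERTEBRATE_MITO))
```

The same string yields a two-residue product under one table and a four-residue product under the other, so the sequence alone does not fix its meaning.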

III. Problems with the Transfer of Information

To be replicated or not to be replicated

The double helix is a double-edged sword; it makes DNA self-replication impossible. In part, this

is because DNA molecules are very long and the two strands are tightly wrapped around each

other. For example, the E. coli genomic DNA is 4.6 million base pairs long, and separation of the

two strands, even over a few base pairs, requires a protein enzyme, a helicase, and ATP. Furthermore,

when helicase unwinds the two strands, the DNA ahead of the opening will become over-wound

and needs to be untangled by another protein enzyme, a topoisomerase. Left unabated, the

torsion resulting from the overwinding would quickly stop the ability of DNA or RNA polymerases

to continue down the DNA strand during DNA replication or transcription, respectively. Indeed,

more than 25 different proteins are required to replicate the E. coli genomic DNA [30].

DNA cannot self-replicate. “DNA + 0 = 0,” noted James Shapiro [18].

DNA “replicates accurately only in a complete cell containing all the objective functionality that

enable cells to be alive,” relying “on an army of specialized proteins and on the lipid

membranous structures for which there are no DNA sequences. Outside a living cell, DNA is

inert, dead,” observed Denis Noble. Noble went on to note: “DNA is a passive cause. As

Watson said to Crick when they first made their momentous discovery of the double helix:

‘Francis, it’s a template’ … Active causation lies at the level of the cell, or of multicellular

structures and organisms” ([17], p248).

DNA replication is tightly regulated. The cell’s ability to sense its internal and external

conditions, the decision about whether, when, how much (part of the genome or the whole

genome, number of copies to be made) and how accurately to replicate DNA (for example,

whether high fidelity polymerases or error-prone polymerases will be used during replication and

the extent to which replication errors will be corrected by the DNA repair system [31]), as well as

the very execution of DNA replication, all depend on the integrated functions of numerous RNAs

and proteins in the cell and the protein-loaded cell membranes.

It is worth pointing out that the genome of a cell is much more dynamic than expected—certainly

far more dynamic than the central dogma had led us to believe. We are learning that there are

multiple ways cells can manipulate their DNA contents. For example, Zhang and colleagues

deleted all 100 copies of endogenous yeast ribosomal DNA (rDNA), replaced them with a DNA

fragment containing 1.2 or 2 copies of the rDNA unit carrying a hygromycin B resistance

mutation, and cultured the resulting strains in medium containing increasing amounts of

hygromycin B [27]. After two weeks, a new rDNA cluster had been regenerated and the copy

number was comparable to that of the wild type. This demonstrates that the cells have a

mechanism to detect the copy number of rDNA and maintain the correct copy number. In yeast,

this can be accomplished by the upstream activating factor (UAF) for RNA polymerase I. UAF

was recently found to ensure rDNA production not only by rDNA transcription activation but also

by its copy-number maintenance [32]. Drosophila melanogaster has also been found to be able

to adjust its rDNA copy numbers [33, 34]. Van Hofwegen and colleagues found that aerobic

citrate-utilizing E. coli (Cit+) could be rapidly and repeatedly produced when wild type E. coli

was cultured in a minimal medium supplemented with citrate, resulting from an active internal

cellular process that generated additional citT and dctA loci followed by rearrangement of the

DNA [35]. This enables the E. coli, in the presence of oxygen, to synthesize enzymes that are

needed for citrate metabolism and that are normally synthesized only in the absence of oxygen.

Strikingly, E. coli cells that lack functional citT or dctA were not able to respond to the same

environmental challenges to become Cit+.

Many studies of the molecular mechanisms of mutation have now revealed that mutation is

often a highly-regulated process, either activated or up-regulated temporarily when

cells/organisms are stressed (reviewed in [31] and [36]). Furthermore, proteins, with the help of

RNAs, actively survey DNA and either maintain the DNA intact or orchestrate needed

alterations–even its total degradation (such as in the case of programmed cell death or in the

development of anucleate human red blood cells).

In a nutshell, it is true that DNA is not synthesized using a protein as a template (i.e., no reverse

translation), so that proteins presumably would have played no role in determining the

nucleotide sequence of the first strand of DNA in the first cell. However, proteins play a central

role in determining the DNA contents of the descendants of that cell, as well as the coding
potential and usefulness of the DNA as discussed below. As we will see, every protein functions

in the context of other proteins, RNAs, and, in fact, the whole cell.

To be a gene or not to be a gene

The cell, including all its RNAs and proteins, also determines whether a segment of DNA will be

used as a template to make an RNA molecule and whether an RNA will be used to direct

protein synthesis, based on the internal and external conditions detected by the cell. In other

words, whether a segment of DNA will be recognized as a gene and whether a gene will be

used to generate a protein depends, among other factors, on the RNAs and proteins present in

the cell at that moment. It often also depends on what is present in the surroundings of the cell.

Indeed, it is the overall cell (including the RNAs and proteins inside the cell and in its

membranes) that determines whether a segment of DNA is a gene or not a gene. The common

tendency to refer to a given segment of DNA as a “gene,” because it happens to code for a

protein in a specific instance in one particular organism, betrays a simplistic view of the richness

of biology—a view exacerbated by the central dogma’s tidy emphasis on DNA-to-RNA-to-

proteins. Human concepts and opinions, however, do not seem to matter to the organisms

being studied. The study of molecular biology would be much simpler (although no doubt less

interesting) if the identification of a particular segment of DNA could tell us all we needed to

know about what protein would be produced (if any) and when and to what extent it would be

produced. In reality, regulation of gene expression (i.e., transcription and translation) accounts

for much of molecular biology research.

The studies of the ENCODE Consortium and many others have uncovered that there are more

genes that do not encode any protein than those that do in the human genome [37-39] (Figure

2). Not only can one gene code for multiple RNA transcripts (resulting from different

transcription start sites or stop sites or from alternative splicing) that may code for

different proteins, but genes also often overlap with each other. These studies have unveiled

unexpected challenges in delineating genes. In Mudge and colleagues’ words, genes are having

an “identity crisis”, as detailed in the following quote:

Over one hundred years after the basic rules of heredity were established, the gene is

undergoing an identity crisis. Indeed the question ‘‘what is a gene?’’ has been much

debated in recent years …In a scientific context, this question concerns the way in which

information is stored in the genome. Over the 20th Century, the biological definition of

the gene evolved from ‘‘the site of a hereditable trait’’ to ‘‘the genomic region from where

the mRNA that encodes a protein is transcribed,’’ i.e., the ‘‘central dogma’’ of molecular

biology… Gerstein and colleagues recently proposed that ‘‘a gene is a union of genomic

sequences encoding a coherent set of potentially overlapping functional products’’[40].

The key point here is that the word ‘‘gene’’ no longer designates a unit of functionality.

Instead it is used as a collective term for a group of products, i.e., transcripts. From our

perspective, there are vital questions concealed within the ‘‘what is a gene?’’ debate. For

example: what is the true size of the transcriptome and what proportion of this

transcription is genuinely functional? Indeed, what does ‘‘functional’’ actually mean in

this context? [41]

Figure 2: Gene contents of the human genome according to GENCODE version 33. The numbers represent the

number and percentage of genes in the corresponding category and are from GENCODE at

https://www.gencodegenes.org/human/stats.html. Note that most DNA nucleotides of protein-coding genes are

located within the genes’ untranslated regions and introns and, thus, do not code for any amino acids of proteins.

In short, molecular biology has uncovered a richness and a complex array of components and

systems that underlie the production of proteins from DNA. Rather than a simple, inevitable flow

of information from DNA-to-RNA-to-proteins, it is now clear that no information can flow, or be

transferred, without the interdependent, integrated, functions of the DNA, RNAs, and proteins of

the cell. DNA is somewhat like a recipe book. Its value and usefulness depend on the user. It is

not a book to be read from cover to cover, conveying the same information to every reader. The

chef can choose which recipe to use and modify the recipe as needed.

The information funnels

Another limitation of the DNA-to-RNA-to-proteins view of genetic information within the

organism is that it fails to account for much of the genetic information, especially in complicated

organisms like humans. This is because not all regions of genomic DNA encode genes, not all

genes are protein-coding, and not all regions of a protein-coding gene code for amino acids of

that protein, as discussed above. In addition, it is now known that many functional non-protein-

coding genes exist, the most abundant and most familiar being ribosomal RNA and transfer

RNA genes. Figure 3 lists some of the changes that take place and some of the processes

involved in turning a protein-coding gene sequence into a finished protein.

Figure 3. A schematic view of information transfer from DNA to RNA to proteins. Only the green-boxed regions

are protein coding.

Another way to view the standard transcription and translation decoding process is to examine

in more detail the informational transformation that takes place via the information processing

systems of the cell. In Table 1 below we list the basic steps in the process of protein production

and indicate the aspects of information loss and information gain that apply. Note that by

“information loss” we are not suggesting that information is somehow irretrievably lost within the

cell. Rather, in the narrow context of DNA sequence information, as discussed by Crick in

formulating the central dogma, the sequence in the next stage of the process (e.g., an mRNA vs

the underlying DNA sequence; or an amino acid sequence vs the underlying mRNA) is missing

information, in that it does not allow for reverse translation of the earlier sequence from the later

one. While this can be termed “information loss,” and we have followed this convention in Table

1 below, in fact what is occurring during protein production is that additional information is being

brought to bear by the cell from outside the relevant sequence in order to complete the next

stage of the production process. For example, while it is true that we cannot start with a protein

and recreate the full genomic sequence underlying that protein, as reflected above in Figure 3,

the reason is not so much that information has been “lost” as is often described, but rather that

additional information has been brought to bear by the cell in order to read, decode, and act

upon that underlying genomic sequence in order to produce what is needed by the cell, in the

right quantity, at the right time, and in the relevant context, as discussed throughout this paper.
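One narrow, sequence-level reason the later sequence cannot be run backwards is that the codon-to-amino-acid mapping is many-to-one. The Python sketch below counts how many distinct mRNA coding sequences are compatible with a given peptide, using the number of codons assigned to each amino acid in the standard genetic code. The names `CODONS_PER_RESIDUE` and `count_coding_sequences` and the example peptides are hypothetical, introduced here for illustration.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (61 sense codons in total; the 3 stop codons are omitted).
CODONS_PER_RESIDUE = {
    "Leu": 6, "Ser": 6, "Arg": 6,
    "Ala": 4, "Gly": 4, "Pro": 4, "Thr": 4, "Val": 4,
    "Ile": 3,
    "Phe": 2, "Tyr": 2, "His": 2, "Gln": 2, "Asn": 2,
    "Lys": 2, "Asp": 2, "Glu": 2, "Cys": 2,
    "Met": 1, "Trp": 1,
}


def count_coding_sequences(peptide):
    """Count the coding sequences that would translate to this peptide."""
    return prod(CODONS_PER_RESIDUE[residue] for residue in peptide)


print(count_coding_sequences(["Met", "Gly", "Trp"]))  # 1 * 4 * 1
print(count_coding_sequences(["Leu", "Ser", "Arg"]))  # 6 * 6 * 6
```

Even the full list of candidate coding sequences would cover only the translated region. It would say nothing about introns, untranslated regions, or intergenic DNA, nor about the context the cell supplies when decoding.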

Table 1. Information content changes during transcription and translation

Process               Information loss                            Information gain

Transcription         Intergenic regions (non-coding DNA)

RNA processing        • Introns                                   • New combinations via intron splicing
                      • External transcribed spacer*                and alternative splicing
                        (e.g., rRNA transcripts)                  • RNA editing
                      • Internal transcribed spacer*

Translation           • 5’ untranslated regions
                      • 3’ untranslated regions
                      • Non-coding (or non-protein-coding) RNA

Protein processing    • Cleavage of signal peptides               • Non-sequence information (folding,
or maturation         • Deleting of other regions that are          localization, formation of complexes)
                        not in the mature proteins                • Posttranslational modification

Numerous genome-scale studies have revealed that significant amounts of DNA exist that do

not code for proteins, although they may be involved in protein production and protein function.

For example, it is estimated that only 1.1-1.5% of the human genome encodes proteins [38, 42,

43]. Even though recent research has shown that protein-coding regions are much more

pervasive than previously thought (reviewed in [44]), there are many more non-protein-coding

genetic sequences than protein-coding genetic sequences in our genome.

The non-protein-coding regions can be essential to the viability or reproduction of the organism.

For instance, although they used to be widely regarded as junk DNA, introns can be vital for an

organism. Deletions of certain introns are lethal for yeast [24, 27] and humans [45], and failures

in sex-specific alternative intron splicing prevent proper male and female differentiation and

cause infertility in Drosophila melanogaster [46-48]. Furthermore, a majority of the functional

elements identified by the ENCODE project and of single nucleotide polymorphisms (SNPs)

associated with disease by genome-wide association studies (GWAS) are localized within the

non-protein-coding regions of the human genome [38].

Thus, even if every single one of the proteins within an organism could be reverse-translated

back into RNA or DNA, residue by residue, it would still represent only a portion of the DNA

necessary for the organism. At the very least, we would not be able to recreate the nucleotide

sequence of those portions of the DNA that do not code for proteins, including the untranslated

regions of mRNA, the introns, the non-protein-coding RNA genes, and the intergenic regions.

Based on current estimates, these non-protein-coding regions account for more than 98% of the

human genome. Therefore, the specific protein-coding DNA sequence information that Crick

focused on in formulating his central dogma likely represents only a minority of the information

needed to actually build and maintain an organism.

Furthermore, due to codon redundancy, a reverse transfer of information from amino acid

sequences to DNA sequences is often not a one-to-one relationship. Consequently, the original

nucleotide sequences of even the protein-coding regions may not be fully recoverable starting

from the amino acid sequences of the coded proteins in most cases, resulting in a potential loss

of non-protein-coding level information or information of overlapping genes. As discussed

earlier, choosing the incorrect codons, even without altering the amino acid sequence, can be

lethal [24-27]. This may be due to a disruption of the original non-protein-coding level

information or of an unrecognized overlapping gene.
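
The effect of codon redundancy on reverse transfer can be made concrete with a small count. The following is a minimal sketch, assuming the standard genetic code and using two short peptides (MAKEV and MFIGA) purely as illustrations; the function name `count_coding_sequences` is our own and hypothetical. It counts how many distinct RNA coding sequences would translate to the same amino acid sequence.

```python
from collections import Counter

# Standard genetic code in compact form (RNA alphabet, bases ordered U, C, A, G).
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Number of synonymous codons for each amino acid (e.g., 4 for alanine, 1 for methionine).
DEGENERACY = Counter(CODON_TABLE.values())


def count_coding_sequences(peptide):
    """Count the distinct RNA sequences that encode the given peptide.

    Hypothetical helper: stop codons and regulatory context are ignored.
    """
    total = 1
    for amino_acid in peptide:
        total *= DEGENERACY[amino_acid]
    return total


print(count_coding_sequences("MAKEV"))  # 1 * 4 * 2 * 2 * 4 = 64
print(count_coding_sequences("MFIGA"))  # 1 * 2 * 3 * 4 * 4 = 96
```

Even for a five-residue peptide, dozens of nucleotide sequences are equally consistent with the protein, so the amino acid sequence alone does not identify which one the genome actually carries.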

Therefore, insofar as it relates to mere sequence information alone as emphasized by the

central dogma, we have an information funnel with irreversible loss of sequence information

from DNA to RNA to proteins (Figure 4 left).

Figure 4. The information funnels. (left) Loss of DNA-sequence-dependent information from DNA to RNA to

proteins during transcription and translation. (right) Gain of DNA-sequence-independent information from DNA to

RNA to proteins during or after transcription and translation.

On the flip side, we also have an inverted information funnel in which new information, even

additional sequence information, is brought to bear during RNA processing and protein

processing (Figure 4 right). For example, alternative splicing of intron-containing RNAs can

produce new combinations of RNA segments, resulting in RNA molecules that encode different

proteins. In addition, RNA editing can dramatically change the sequence of an RNA molecule,

and, hence, the amino acid sequence of the protein encoded by the corresponding DNA [49].
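
How alternative splicing can generate new combinations from a single transcript can be sketched combinatorially. The example below is only an illustration under simplifying assumptions: exons are treated as labels, the first and last exons are always kept, exon order is preserved, and every internal exon may be independently included or skipped. Real splicing is regulated, and only some of these combinations occur in a given cell. The function name `exon_skipping_isoforms` is hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """List the isoforms obtained by skipping any subset of internal exons.

    Hypothetical helper: expects at least two exons; the first and last are
    always retained and the original exon order is preserved.
    """
    first, *internal, last = exons
    isoforms = []
    for kept_count in range(len(internal) + 1):
        for kept in combinations(internal, kept_count):
            isoforms.append([first, *kept, last])
    return isoforms


# Four exons give 2**2 = 4 possible isoforms under these assumptions.
for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

Under these assumptions the number of possible mature RNAs doubles with each additional internal exon, which illustrates why the set of RNAs is not simply readable from the DNA sequence.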

IV. Problems of the Necessity of Matching the Coding and Decoding Systems

Another key issue, often ignored by the scientific community because of its focus on the sequence information transfer articulated by the central dogma, is that the coding and decoding systems need to match each other. That is, the DNA to be replicated and the molecular machines that replicate it have to match each other, and the genes to be transcribed and translated and the molecular machines that transcribe and translate them also have to match each other.

Species match

For example, Craig Venter’s team synthesized the entire one-megabase (Mb) genome of Mycoplasma mycoides in yeast, but the yeast could not create Mycoplasma mycoides cells using the cloned bacterial genome (Figure 5). The genes encoded in the cloned genome needed to be transcribed and translated using the molecular machines of Mycoplasma capricolum, a cell that is highly similar to the genome donor M. mycoides.

Figure 5. Basic steps involved in Venter’s generation of a synthetic bacterial cell.

The inability of a yeast cell to decode the bacterial M. mycoides genetic code is a consequence

of the domain-specific information processing systems, including DNA replication, transcription,

and translation [50, 51]. Figure 6 compares DNA replication initiation in the bacterium Escherichia coli (E. coli) and the yeast Saccharomyces cerevisiae.

What is striking is not so much that the number of proteins involved is different (as important as that is) but that the identities of these proteins are different. The proteins used for bacterial DNA replication are bacteria-specific; they do not have known homologs in eukaryotes. Likewise, the proteins used for eukaryotic DNA replication are eukaryote-specific; they do not have known

homologs in bacteria. Due to the difference between bacterial and eukaryotic DNA replication

machinery, a yeast origin of replication had to be artificially incorporated into the bacterial

genome before the bacterial genome could be cloned in yeast.

Figure 6. A comparison of DNA replication initiation in the bacterium E. coli and the eukaryote S. cerevisiae (yeast). A: Initiation in E. coli. B: Initiation in yeast. Note that the proteins involved are unique to either E. coli or yeast. From Figure 3 of [50].

The transcription and translation machineries of bacteria and eukaryotes are also very different.

For a piece of DNA to be recognized as a gene and be transcribed by a bacterial cell, that DNA

segment has to be sandwiched between a bacterial promoter and a bacterial transcription

terminator. Yet for that same stretch of DNA to be recognized as a gene and be transcribed by a

eukaryotic cell, that DNA segment would have to be sandwiched between a eukaryotic promoter

and a eukaryotic transcription terminator. Furthermore, even when an RNA transcript does encode a protein, the same transcript may be translated into totally unrelated proteins by a bacterial cell and a eukaryotic cell (Figure 7).

Figure 7. The same RNA may yield two different proteins in bacteria and eukaryotes. Blue box: Shine-Dalgarno sequence; green box: translation initiation site; red box: translation stop site. Top: The hypothetical mRNA would code for a protein with the amino acid sequence MFIGA, based on the translation mechanism of bacteria such as E. coli. The Shine-Dalgarno sequence is important for translation initiation in bacteria. It hybridizes to the anti-Shine-Dalgarno sequence in the 16S rRNA, which is the reverse complement of the Shine-Dalgarno sequence. Bacteria use the AUG that is a few nucleotides downstream of the Shine-Dalgarno sequence as the translation initiation site. Bottom: The same hypothetical mRNA would code for a protein with the amino acid sequence MAKEV, based on the translation mechanism of eukaryotes such as yeast. Eukaryotes normally use the first AUG from the 5’ end of an mRNA as the translation start site.
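
The two start-site rules described in Figure 7 can be sketched in a few lines of code. This is a simplified illustration, not a model of real ribosomes: the mRNA below is a made-up sequence (not the one drawn in the figure), the Shine-Dalgarno motif is taken to be AGGAGG, and the maximum spacer length of 12 nucleotides is an assumption. The function names are hypothetical.

```python
# Standard genetic code in compact form (bases ordered U, C, A, G; "*" = stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

SHINE_DALGARNO = "AGGAGG"  # assumed motif for this sketch

# Made-up mRNA: an AUG near the 5' end, then a Shine-Dalgarno motif
# followed by a second AUG a few nucleotides downstream.
HYPOTHETICAL_MRNA = (
    "GCC" + "AUGGCUAAAGAAGUUUAA" + "CC"
    + SHINE_DALGARNO + "UUAACC" + "AUGUUUAUUGGUGCUUAA" + "GC"
)


def translate_from(mrna, start):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def eukaryote_like_start(mrna):
    """Simplified eukaryotic rule: the first AUG from the 5' end (-1 if none)."""
    return mrna.find("AUG")


def bacteria_like_start(mrna, max_spacer=12):
    """Simplified bacterial rule: the first AUG shortly after the Shine-Dalgarno motif."""
    sd_position = mrna.find(SHINE_DALGARNO)
    if sd_position == -1:
        return -1
    sd_end = sd_position + len(SHINE_DALGARNO)
    start = mrna.find("AUG", sd_end)
    if start == -1 or start - sd_end > max_spacer:
        return -1
    return start


print(translate_from(HYPOTHETICAL_MRNA, eukaryote_like_start(HYPOTHETICAL_MRNA)))  # MAKEV
print(translate_from(HYPOTHETICAL_MRNA, bacteria_like_start(HYPOTHETICAL_MRNA)))   # MFIGA
```

The same string of nucleotides yields MAKEV under one rule and MFIGA under the other, which is the point of the figure: the protein produced depends on the decoding system as well as on the sequence.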

Interestingly, not only would a eukaryotic cell have trouble decoding a bacterial genetic code (i.e., reading, interpreting, and executing the instructions encoded in a bacterial genome), but even a bacterial cell may not be able to read the genetic instructions of another bacterial cell. For instance, cloning the whole 3.5-Mb genome (see footnote 1) of the photosynthetic bacterium

Synechocystis PCC6803 into the 4.2-Mb genome of the mesophilic bacterium Bacillus subtilis

did not enable Bacillus subtilis to perform photosynthesis. The resultant cells could not even be

cultured in the medium used to culture Synechocystis, indicating that the added Synechocystis

genome was not able to be used successfully by the host cell [52], despite the clear benefit the

added genome might have provided in that medium. Although from our outside perspective we

might be tempted to think that the added Synechocystis genome contained all the information

necessary to enable the host cells to thrive in the culturing medium, the extensive sequence

information contained in the Synechocystis genome seems to have been unrecognizable and of

no value to the host cell.

Cell-type and cell-status match

In addition to the species-specific match required for proper coding and decoding, a special match is sometimes necessary within a species, such as a match with the stage (or cell cycle phase) of a cell, or a match between the type of a cell, its genome, and the molecules that decode the genome. In reality, we might think of the decoding system as the

whole cell, including its many RNAs and proteins. This is because it is the overall function of the

cell as a whole, in the context of a specific tissue and/or environment, that determines whether

the genomic DNA will be replicated in the first place and, if so, whether only part of the genome or

the whole genome will be replicated, whether error-prone DNA polymerases will be allowed to

participate in the replication (as in a stress-response situation) or only DNA polymerases with

high replication fidelity, and what parts of the genomic DNA will be transcribed or translated.

This cell-type and cell-status match occurs every day in every living organism, although we normally do not think of it that way. Evidence now suggests that mismatched tissue-specific or cell-specific transcription and translation is an important contributing factor for many diseases, including cancer and diabetes. Imagine what would happen if muscle fibers were made in a nerve cell instead of a muscle cell. Or consider the pain of having bones grow in a place where they should not be.

Footnote 1. The two ribosomal RNA genes of Synechocystis were not included because they are toxic to Bacillus. The toxicity may be due to the fact that they are close enough to the ribosomal RNA genes of Bacillus to be transcribed but different enough that they would disrupt the translational machinery of the host cell.

In summary, the presence of a particular DNA gene sequence does not guarantee the making

of an RNA transcript. In fact, it is vital that this is so, since unregulated gene expression not only

wastes resources, but in some cases can be deadly to the survival or reproduction of the

organism. In addition, the presence of an RNA transcript does not guarantee the making of a

protein. Thus, knowledge of the sequence of a genome does not enable one to predict the

transcriptome (all the RNAs in a cell) or the proteome (all the proteins in a cell). Both a cell’s

transcriptome and proteome can change based on what type of cell it is, the status of the cell,

and what is present in the environment. The only way of precisely knowing the transcriptome

and the proteome of a cell is to independently sequence them. In addition, it is well known that

while it is now relatively easy to determine the raw genome sequence of an organism, to

annotate the genome (i.e., to determine which parts of the genome actually encode genes) is

quite challenging.

Organism and environment match

In addition to the important match that must exist between a genome and the proper species of

organism, as well as the additional correlation that must exist within the same species between

its genome and the relevant stages of cell development and the various molecules within the

cell, an organism’s coding-decoding system must also be coordinated to operate properly within

a given environment. This includes an organism’s effect on and actions within an environment,

based on what molecules are inside the organism and those that are embedded within its

membranes, in contact with the outside environment. One well-known example is the observation that in a culture medium containing both glucose and lactose, E. coli will not make β-galactosidase, a protein needed for lactose metabolism, until all the glucose is used up. When there is no longer any lactose in the medium, β-galactosidase will likewise not be produced under normal conditions (although it may be produced, but not be usable by the cell, if the lac repressor gene has been mutated and is no longer functioning properly).
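
The environmental dependence described in this example can be summarized as a small truth table. The following is a deliberately simplified Boolean sketch of the behavior stated above, not a quantitative model of the lac operon: it ignores basal expression and expression levels, and it assumes that glucose blocks production even when the repressor is mutated. The function name and its parameters are hypothetical.

```python
def enzyme_is_made(glucose, lactose, repressor_functional=True):
    """Simplified Boolean rule for production of the lactose-metabolizing enzyme.

    Assumptions of this sketch: glucose in the medium always blocks production;
    without glucose, the enzyme is made if lactose is present or if the lac
    repressor is non-functional.
    """
    if glucose:
        return False
    return lactose or not repressor_functional


# Enumerate the environments for a cell with a functional repressor.
for glucose in (True, False):
    for lactose in (True, False):
        print(f"glucose={glucose}, lactose={lactose}: made={enzyme_is_made(glucose, lactose)}")

# A repressor mutant makes the enzyme even without lactose to metabolize.
print(enzyme_is_made(glucose=False, lactose=False, repressor_functional=False))  # True
```

The same gene sequence gives a different outcome in each environment, so whether the protein appears depends on the state of the cell and its surroundings and not on the presence of the gene alone.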

Examples of an organism’s effects on the environment include niche construction and the

generation of waste (note that the waste of one organism may be nutrients for another).

Examples of environmental factors that an organism may have to deal with include nutrients,

temperature, pH, and other organisms, such as pathogens, predators, and siblings.

V. Revisiting the Central Dogma

More than 60 years ago, at the dawn of the genetic age, Francis Crick proposed a model of

genetic information transfer that emphasized protein-coding nucleotide sequences in DNA as

the fundamental source of information in the cell—constituting the source of a one-way flow of

information from DNA to RNA to proteins. For decades, Crick’s central dogma has influenced

biology research and has impacted views of how information processing occurs in the cell and

even what aspects of DNA are deemed functional or worthy of consideration.

Yet despite Crick’s remarkable insights and contributions to modern biology, the central dogma

is inadequate to account for the different kinds and levels of information inside a cell, or for the

requirement and complexity of information transfer. Although perhaps not intended, the resulting

reductionist, static, DNA-centric view of life has become a hindrance to our understanding of life.

The exceptions and contradictions between the central dogma and the data have finally reached

the point where they can no longer be dismissed as occasional anomalies or be explained away

with definitional clarifications of the central dogma. The central dogma’s underlying assumption

of a one-way flow—with DNA protein-coding sequence information as the ultimate source and

arbiter of information processing within the cell—can no longer be considered a viable way of

understanding the activity and role of information in biology.

We propose a new central dogma as follows (Figure 8):

The central principle of molecular biology is biological information coding and decoding,

i.e., the detailed transfer of heritable, cell-type and cell-status specific, environment-

responsive, aperiodic sequence information. Specifically, 1) sequence information can

be transferred from DNA to DNA, DNA to RNA, RNA to DNA, RNA to RNA, and RNA to

proteins; 2) no information transfer can occur without the interdependent, integrated

function of matching DNA, RNA, and proteins; 3) there is no reverse translation, but

proteins, with the help of RNAs, determine the maintenance, propagation, and coding

potential of DNA; and 4) information transfer is an active response of a cell to its internal

and external conditions.

In this context, the following terms deserve special attention:

“Cell type and cell status” includes a cell’s genome, epigenome, proteome, metabolome,

membranes, as well as their structures and organization.

“Environment-responsive” refers to an organism’s effects on and actions within an environment,

based on the organism’s internal features and capabilities as discussed above (Figure 8, orange

arrows).

“Sequence” refers to the nucleotide sequences of DNA and RNAs and the amino acid

sequences of proteins. Sequence information does not include the function, conformation (or

structure), localization, post-translational modification, networks (binding partners), or any other

non-sequence information.

“Aperiodic” refers to the fact that DNA (or RNA or protein) sequence is irregular and cannot be

predicted based on the chemical or physical affinities of its constituent monomers.

Figure 8. A schematic view of the new central dogma. Note the cell-type- and cell-status-specific, environment-responsive interdependence of DNA, RNA, and proteins. Black arrows: sequence information transfer. Purple arrows: kinds of molecules needed for the corresponding information transfer. Orange arrows: interactions between a cell and its environment.

Looking Forward

The current central dogma has caused a significant amount of misunderstanding and has

become a hindrance to our understanding of life. The purpose of this essay has been to

stimulate critical thinking about the current central dogma and to propose a new way of

understanding the information processes at work in living systems. As future research continues

to expand our understanding of these processes, we look forward to learning and appreciating

more about the crucial role of information in biology.

References
1. Lee, J.J. Read Francis Crick’s $6 Million Letter to Son Describing DNA. 2013 [cited 2019
12/2/2019]; Available from: https://blog.nationalgeographic.org/2013/04/11/read-francis-cricks-6-million-letter-to-son-describing-dna/.
2. Watson, J.D. and F.H. Crick, Molecular structure of nucleic acids; a structure for deoxyribose
nucleic acid. Nature, 1953. 171(4356): p. 737-8.
3. Cobb, M., 60 years ago, Francis Crick changed the logic of biology. PLoS Biol, 2017. 15(9): p.
e2003243.
4. Crick, F.H., On protein synthesis. Symp Soc Exp Biol, 1958. 12: p. 138-63.
5. Crick, F., Central dogma of molecular biology. Nature, 1970. 227(5258): p. 561-3.
6. Watson, J.D., et al., Molecular Biology of the Gene. 6th ed. 2008, Cold Spring Harbor, N.Y.: Cold
Spring Harbor Laboratory Press.
7. Watson, J.D., et al., Molecular Biology of the Gene. 7th ed. 2013: Pearson.
8. BBC, 1953: Scientists describe 'secret of life', in On This Day. BBC.
9. Judson, H.F., The Eighth Day of Creation: Makers of the Revolution in Biology, Commemorative
Edition. 1996: Cold Spring Harbor Laboratory Press.
10. Koonin, E.V., Does the central dogma still stand? Biology Direct, 2012. 7.
11. Tropp, B.E., Molecular Biology: Genes to Proteins. 4th ed. 2012, Sudbury, MA 01776: Jones &
Bartlett Learning, LLC.
12. Peedicayil, J., DNA methylation and the central dogma of molecular biology. Med Hypotheses,
2005. 64(6): p. 1243-4.
13. Bussard, A.E., A scientific revolution? The prion anomaly may challenge the central dogma of
molecular biology. EMBO Rep, 2005. 6(8): p. 691-4.
14. Biro, J.C., Seven fundamental, unsolved questions in molecular biology. Cooperative storage and
bi-directional transfer of biological information by nucleic acids and proteins: an alternative to
"central dogma". Med Hypotheses, 2004. 63(6): p. 951-62.
15. anonymous, Central dogma reversed. Nature, 1970. 226(5252): p. 1198-9.
16. de Lorenzo, V., From the selfish gene to selfish metabolism: revisiting the central dogma.
Bioessays, 2014. 36(3): p. 226-35.
17. Noble, D., Central Dogma or Central Debate? Physiology (Bethesda), 2018. 33(4): p. 246-249.
18. Shapiro, J.A., Revisiting the central dogma in the 21st century. Ann N Y Acad Sci, 2009. 1178: p.
6-28.
19. Noble, D., Differential and integral views of genetics in computational systems biology. Interface
Focus, 2011. 1(1): p. 7-15.
20. Noble, D., Evolution viewed from physics, physiology and medicine. Interface Focus, 2017. 7(5).
21. Noble, D., Evolution beyond neo-Darwinism: a new conceptual framework. J Exp Biol, 2015.
218(Pt 1): p. 7-13.
22. Liu, J., et al., N (6)-methyladenosine of chromosome-associated regulatory RNA regulates
chromatin state and transcription. Science, 2020.

23. Koonin, E.V., Why the Central Dogma: on the nature of the great biological exclusion principle.
Biol Direct, 2015. 10: p. 52.
24. Mitchell, L.A., et al., Synthesis, debugging, and effects of synthetic chromosome consolidation:
synVI and beyond. Science, 2017. 355(6329).
25. Shen, Y., et al., Deep functional analysis of synII, a 770-kilobase synthetic yeast chromosome.
Science, 2017. 355(6329).
26. Wu, Y., et al., Bug mapping and fitness testing of chemically synthesized chromosome X. Science,
2017. 355(6329).
27. Zhang, W., et al., Engineering the ribosomal DNA in a megabase synthetic chromosome. Science,
2017. 355(6329).
28. Haltiner, M.M., S.T. Smale, and R. Tjian, Two distinct promoter elements in the human rRNA
gene identified by linker scanning mutagenesis. Mol Cell Biol, 1986. 6(1): p. 227-35.
29. Learned, R.M., et al., Human rRNA transcription is modulated by the coordinate binding of two
factors to an upstream control element. Cell, 1986. 45(6): p. 847-57.
30. Su'etsugu, M., et al., Exponential propagation of large circular DNA by reconstitution of a
chromosome-replication cycle. Nucleic Acids Res, 2017. 45(20): p. 11525-11534.
31. Fitzgerald, D.M. and S.M. Rosenberg, What is mutation? A chapter in the series: How microbes
"jeopardize" the modern synthesis. PLoS Genet, 2019. 15(4): p. e1007995.
32. Iida, T. and T. Kobayashi, RNA Polymerase I Activators Count and Adjust Ribosomal RNA Gene
Copy Number. Mol Cell, 2019. 73(4): p. 645-654 e13.
33. Lu, K.L., et al., Transgenerational dynamics of rDNA copy number in Drosophila male germline
stem cells. Elife, 2018. 7.
34. Nelson, J.O., et al., Mechanisms of rDNA Copy Number Maintenance. Trends Genet, 2019.
35(10): p. 734-742.
35. Van Hofwegen, D.J., C.J. Hovde, and S.A. Minnich, Rapid Evolution of Citrate Utilization by
Escherichia coli by Direct Selection Requires citT and dctA. J Bacteriol, 2016. 198(7): p. 1022-34.
36. Gottesman, S., Trouble is coming: Signaling pathways that regulate general stress responses in
bacteria. J Biol Chem, 2019. 294(31): p. 11685-11700.
37. Harrow, J., et al., GENCODE: the reference human genome annotation for The ENCODE Project.
Genome Res, 2012. 22(9): p. 1760-74.
38. Consortium, E.P., An integrated encyclopedia of DNA elements in the human genome. Nature,
2012. 489(7414): p. 57-74.
39. Consortium, E.P., et al., Identification and analysis of functional elements in 1% of the human
genome by the ENCODE pilot project. Nature, 2007. 447(7146): p. 799-816.
40. Gerstein, M.B., et al., What is a gene, post-ENCODE? History and updated definition. Genome
Res, 2007. 17(6): p. 669-81.
41. Mudge, J.M., A. Frankish, and J. Harrow, Functional transcriptomics in the post-ENCODE era.
Genome Res, 2013. 23(12): p. 1961-73.
42. Lander, E.S., et al., Initial sequencing and analysis of the human genome. Nature, 2001.
409(6822): p. 860-921.
43. Venter, J.C., et al., The sequence of the human genome. Science, 2001. 291(5507): p. 1304-51.
44. Ruiz-Orera, J. and M.M. Alba, Translation of Small Open Reading Frames: Roles in Regulation
and Evolutionary Innovation. Trends Genet, 2019. 35(3): p. 186-198.
45. Szafranski, P., et al., Novel FOXF1 Deep Intronic Deletion Causes Lethal Lung Developmental
Disorder, Alveolar Capillary Dysplasia with Misalignment of Pulmonary Veins. Hum Mutat, 2013.
34(11): p. 1467-71.

46. Sosnowski, B.A., J.M. Belote, and M. McKeown, Sex-specific alternative splicing of RNA from the
transformer gene results from sequence-dependent splice site blockage. Cell, 1989. 58(3): p. 449-
59.
47. Venables, J.P., J. Tazi, and F. Juge, Regulated functional alternative splicing in Drosophila. Nucleic
Acids Research, 2012. 40(1): p. 1-10.
48. Salz, H.K. and J.W. Erickson, Sex determination in Drosophila The view from the top. Fly, 2010.
4(1): p. 60-70.
49. Knoop, V., When you can't trust the DNA: RNA editing changes transcript sequences. Cell Mol
Life Sci, 2011. 68(4): p. 567-86.
50. Tan, C. and J.P. Tomkins, Information Processing Differences Between Bacteria and Eukarya—
Implications for the Myth of Eukaryogenesis. Answers Research Journal, 2015. 8: p. 143–162.
51. Tan, C. and J.P. Tomkins, Information Processing Differences Between Archaea and Eukaraya—
Implications for Homologs and the Myth of Eukaryogenesis. Answers Research Journal, 2015. 8:
p. 121–141.
52. Itaya, M., et al., Combining two genomes in one cell: stable cloning of the Synechocystis PCC6803
genome in the Bacillus subtilis 168 genome. Proc Natl Acad Sci U S A, 2005. 102(44): p. 15971-6.
