Is DNA a 3.85 billion year old life-encoding data structure from space?

Abiogenesis – which came first the DNA, the RNA or the enzyme?

Abiogenesis is the theory that life originated from natural processes on Earth. It proposes that the organic compounds comprising the building blocks and constituents of life arose and assembled from non-living matter.


Although experiments have shown that it is possible to form amino acids and other compounds in the lab under controlled conditions, this simulated scenario has failed to yield a double-stranded, progenitor molecule with instructions for creating life.

The molecule I’m talking about is, of course, DNA, the original subject of the chicken-and-egg conundrum. In a somewhat longer blog post than I anticipated, I am going to explore our current understanding of molecular evolution, illuminate the tricky problem of pinning down the origin of life, and address the complexities of answering which came first: the DNA, the RNA or the enzyme?

The three greatest mysteries in science are generally considered to be the origin of the universe, the origin of life and the origin of consciousness. “How did we get here?”, “How was life formed?” and “Who created it?” are natural questions for the curious mind to ponder.

When studying the mechanisms of cell replication and genetics at university, I was really fascinated by the topic. I accepted what my professors outlined as “the central dogma”: the established scientific view that RNA is made from DNA, and protein is made from RNA. This one-way flow of cellular construction was something I had to learn to get good grades, not to philosophically analyze.

For a biology module, Richard Dawkins’s The Selfish Gene was recommended reading. Cleverly written and entirely captivating, it sold me on the idea that humans are vessels or meat suits for genetic life to survive and reproduce through, and that the genetic information of life can be reduced to mechanistic nuts and bolts which randomly arose through chance conditions in a “primordial soup”.

DNA as a data structure

It wasn’t until after uni, when I started questioning my friends’ atheistic world views and delved deep inside to find some answers, that I returned to reflect on the complexities of cellular biology. It was as if I was seeing genetics and molecular biology in a new light and could finally appreciate the wonder that is DNA. Even now, the more I sit with the concept and mechanism of DNA, the more questions I have.

DNA is a huge and complex molecule found in the cells of every organism on Earth. It is double-stranded and composed of 4 distinct nucleotides (A, T, C and G); the human genome contains roughly 3 billion base pairs. Each strand is complementary to the other: wherever one strand has an A the other has a T, and wherever one has a C the other has a G, because A bonds with T and C bonds with G.
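For the programmers reading, the pairing rule can be sketched in a few lines of Python. This is only an illustration, not a real genomics library: the names PAIRING, complementary_strand and reverse_complement, and the example sequence, are my own assumptions. The two strands also run in opposite directions, which is why the partner strand is usually written reversed.

```python
# Base-pairing rule: A bonds with T, C bonds with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own reading direction."""
    return complementary_strand(strand)[::-1]


strand = "ATGCGT"  # hypothetical example sequence
print(complementary_strand(strand))  # TACGCA
print(reverse_complement(strand))    # ACGCAT
```

Because each letter fixes its partner, one strand is enough to reconstruct the other.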

DNA has built-in encoded patterns of nucleotides that trigger the start and stop of transcription (the first step in protein synthesis/the central dogma). This linear sequence of letters forms the genetic code of life. Through the lens of a programmer you would be forgiven for comparing a DNA strand to an array (a data structure commonly used in programming to store and retrieve, you guessed it, data), or to a 2D array if you take both strands into consideration.

Like an array, DNA cannot perform actions on itself. A team of helper molecules and enzymes that support and catalyze steps in DNA replication and protein synthesis are recruited to the DNA to perform their highly specialized roles. In a programming analogy, these molecules are the methods or functions of a class called on the data structure to manipulate it in some way. These methods constantly read and access DNA throughout the life of the cell to direct protein synthesis and orchestrate all cellular activity.
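To make the analogy concrete, here is a small Python sketch in which the DNA is passive data and a separate function does the work. The function name transcribe and the example sequence are hypothetical. The sketch also uses a common bioinformatics shortcut: it treats the input as the coding strand, so transcription reduces to swapping T for U.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand: same letters, with U in place of T."""
    return coding_strand.replace("T", "U")


dna = "ATGGCCTGA"      # hypothetical sequence; the string itself does nothing
rna = transcribe(dna)  # the "method" reads the data and produces a new molecule

print(rna)  # AUGGCCUGA
print(dna)  # ATGGCCTGA (the template is left unchanged)
```

The point of the design is that the function only reads the DNA. Nothing in the data structure changes, which mirrors how the cell's machinery repeatedly consults the same template.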

The abstraction of DNA and its processes can be represented so well in code that an entire field, bioinformatics, has formed to tackle genetic questions computationally. Algorithms process files storing DNA or protein letters to compare sequence similarity, predict RNA and protein structure, and build phylogenetic trees that map a species’ evolutionary history.
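As a taste of what comparing sequence similarity means, here is a deliberately simple Python sketch. It counts mismatched positions between two equal-length sequences (the Hamming distance) and turns that into a percent identity. The function names and the two example sequences are my own assumptions. Real tools first align the sequences to cope with insertions and deletions, which this sketch does not attempt.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which the two (non-empty) sequences match."""
    matches = len(seq_a) - hamming_distance(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


a = "GATTACA"  # hypothetical sequences
b = "GACTATA"
print(hamming_distance(a, b))            # 2
print(round(percent_identity(a, b), 1))  # 71.4
```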

To predict the proteins that will be made from a DNA sequence, the RNA codon table used in translation can be encoded as a dictionary/map. RNA is an intermediate strand, complementary to the DNA template, that forms the basis of protein generation. Here each RNA codon (a group of three nucleotides) maps to one amino acid or to a stop signal.

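# Standard RNA codon table: codon -> one-letter amino acid code ("Stop" ends translation)
RNA_codons = {
    "UUU": "F",    "CUU": "L", "AUU": "I", "GUU": "V",
    "UUC": "F",    "CUC": "L", "AUC": "I", "GUC": "V",
    "UUA": "L",    "CUA": "L", "AUA": "I", "GUA": "V",
    "UUG": "L",    "CUG": "L", "AUG": "M", "GUG": "V",
    "UCU": "S",    "CCU": "P", "ACU": "T", "GCU": "A",
    "UCC": "S",    "CCC": "P", "ACC": "T", "GCC": "A",
    "UCA": "S",    "CCA": "P", "ACA": "T", "GCA": "A",
    "UCG": "S",    "CCG": "P", "ACG": "T", "GCG": "A",
    "UAU": "Y",    "CAU": "H", "AAU": "N", "GAU": "D",
    "UAC": "Y",    "CAC": "H", "AAC": "N", "GAC": "D",
    "UAA": "Stop", "CAA": "Q", "AAA": "K", "GAA": "E",
    "UAG": "Stop", "CAG": "Q", "AAG": "K", "GAG": "E",
    "UGU": "C",    "CGU": "R", "AGU": "S", "GGU": "G",
    "UGC": "C",    "CGC": "R", "AGC": "S", "GGC": "G",
    "UGA": "Stop", "CGA": "R", "AGA": "R", "GGA": "G",
    "UGG": "W",    "CGG": "R", "AGG": "R", "GGG": "G",
}


def translate(rna):
    """Read the RNA three letters at a time and return the protein it encodes."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = RNA_codons[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return "".join(protein)


RNA = "AUGGCCUGA"  # example input, added for illustration
protein = translate(RNA)  # "MA"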

Source: RNA codon table

This is just a snippet of how bioinformaticians decompose biological problems into computational ones. Bioinformatics has come a long way in the past 20 years, in step with increasing processing power. Sometimes it all just seems a bit strange how accurately DNA can be thought of in computational terms, and how conveniently we can model replication and translation within our current programming paradigm.

The human source code can be found here

Recommended reading: DNA through the eyes of a coder

If DNA is Software, Who Wrote the Code?

The rise of self-replicating organisms

So if DNA is the universal data structure of life, how was it created or how did it evolve? First we need to understand the intricacies of cell replication and the problem this poses to abiogenesis.

Cell replication occurs throughout our bodies. For replication to occur, an exact copy of the DNA needs to be passed to each daughter cell. Genes written into DNA encode the proteins responsible for making every biomolecule in every living cell. DNA replication has an error rate of roughly 1 in 1 billion letters. This astonishing fidelity is the basis of heredity. A single error can be detrimental, but sometimes a mutation leads to increased fitness, providing the raw material for neo-Darwinian evolution.
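To get a feel for what that error rate means, here is a back-of-the-envelope calculation in Python. It assumes a genome of about 3 billion letters (the approximate size of the human genome) and treats errors as independent, which is a simplification. The variable names are my own.

```python
import math

GENOME_SIZE = 3_000_000_000        # letters copied per replication (approximate, assumed)
LETTERS_PER_ERROR = 1_000_000_000  # one error per billion letters, the rate quoted above

# Expected number of copying errors each time the genome is replicated.
expected_errors = GENOME_SIZE / LETTERS_PER_ERROR
print(expected_errors)  # 3.0

# Chance of a completely error-free copy, using a Poisson approximation.
p_perfect_copy = math.exp(-expected_errors)
print(round(p_perfect_copy, 2))  # 0.05
```

So even at this fidelity, a handful of new mistakes per copy is the norm under these assumptions, and a flawless copy is the exception.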

Scientists hoping to slow or defeat the ageing process study replication, because our cells become less efficient at it as we get older. Cancer researchers are trying to uncover the conditions under which replication goes awry, in order to prevent the uncontrolled cell division that forms cancerous tumours.

Cell replication isn’t as algorithmically simple as splitting an array into 2 equal halves with equal composition. DNA must engage in a molecular dance with DNA polymerase for replication to even begin. A number of other molecules are also recruited to the site of DNA replication for various structural and supporting roles. These molecular minions are equipped to prise open the 2 intertwined strands and replicate both simultaneously.

Cell replication full video

So even if DNA arose independently from randomly colliding elements bubbling in Goldilocks conditions in a soupy crater on Earth’s surface billions of years ago, it cannot replicate without the presence of DNA polymerase, the enzyme that attaches to a single strand and slides along the sequence, adding the complementary letter for each nucleotide to the growing strand.
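In code, the job the polymerase does can be sketched roughly like this. It is a toy model in Python under heavy assumptions: the names polymerase and replicate and the example sequence are hypothetical, and the sketch ignores strand direction, primers, proofreading and every helper molecule mentioned above.

```python
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def polymerase(template: str) -> str:
    """Slide along one template strand, adding the complementary letter at each step."""
    new_strand = []
    for base in template:
        new_strand.append(PAIRING[base])
    return "".join(new_strand)


def replicate(double_helix):
    """Unzip the helix and build a new partner for each of the two old strands."""
    strand_1, strand_2 = double_helix
    return [(strand_1, polymerase(strand_1)), (strand_2, polymerase(strand_2))]


parent = ("ATGCGT", "TACGCA")  # hypothetical helix, strands written position by position
daughters = replicate(parent)
print(daughters)  # [('ATGCGT', 'TACGCA'), ('TACGCA', 'ATGCGT')]
```

Each daughter helix keeps one old strand and gains one newly built strand. Notice that the data cannot copy itself here either: without the polymerase function, parent is just a pair of strings.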

DNA, the data structure, is just the template of our cellular information-processing and replicating systems and requires interaction with a medley of other molecules.

Francis S. Collins, who led the Human Genome Project and is the author of The Language of God, wrote:

DNA, with its phosphate-sugar backbone and intricately arranged organic bases, stacked neatly on top of one another and paired together at each rung of the twisted double helix, seems an utterly improbable molecule to have “just happened”—especially since DNA seems to possess no intrinsic means of copying itself.

So here we are faced with a conundrum. If DNA or RNA came first then what made them? If the enzyme came first then how was it encoded?

Filling in the gaps

Let’s trace back to what we know about Earth’s history:

  • Earth is 4.55 billion years old (known from isotope dating of radioactive elements)
  • Earth was an inhospitable place for its first 550 million years
  • Rocks dated to 4 billion years ago show no sign of life
  • Rocks dated to 3.85 billion years ago show evidence of flourishing microbial life (preserved microorganisms from fossilized rock)
  • What happened in those 150 million years to trigger life in the form of single-celled organisms that could store information (probably using DNA), replicate themselves and evolve into multiple different types?
  • Several hypotheses have been offered for how gene transfer happened between organisms, but not for how the first single-celled organism originated
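The 150 million year window follows directly from the dates in the list. Here is the arithmetic spelled out in Python, working in millions of years; the variable names are my own.

```python
# All figures in millions of years, taken from the list above.
EARTH_AGE = 4550
INHOSPITABLE_PERIOD = 550
EARLIEST_LIFE_EVIDENCE = 3850  # millions of years ago

# Earth becomes hospitable once the inhospitable period ends.
hospitable_since = EARTH_AGE - INHOSPITABLE_PERIOD  # millions of years ago

# Time available between "hospitable" and the first evidence of life.
window_for_life = hospitable_since - EARLIEST_LIFE_EVIDENCE

print(hospitable_since)  # 4000
print(window_for_life)   # 150
```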

Creation simulation

Stanley Miller joined Harold Urey’s lab as a PhD student in 1952 to test the Oparin-Haldane hypothesis. The hypothesis proposed that hydrogen-rich conditions on early Earth, combined with methane and water vapour, could produce organic compounds when exposed to lightning, volcanic heat or radiation.

By passing electrical sparks through a mixture of water (H2O), methane (CH4), ammonia (NH3) and hydrogen (H2), this landmark experiment remarkably produced amino acids within a week. Urey was so impressed that he let Miller take full credit for the discovery. The experiment has been repeated many times since, in varied forms, to generate more amino acids, sugars and even nucleobases, the building blocks of nucleic acids.

The Miller-Urey experiment is often posited as evidence for abiogenesis. Despite all our efforts, however, we have been unable to mix up the right stuff to form life in the lab. Given enough time, a time span larger than a human mind can comprehend, is it possible that a replicating scaffold with programmed purpose could form?


The structure and nature of DNA’s existence have troubled many a scientist. Francis Crick (the co-discoverer of DNA’s structure) and Leslie Orgel suggested that the molecule may have originated off-planet. Naming the theory directed panspermia, they presented their self-proclaimed “highly unorthodox proposal” at a Communication with Extraterrestrial Intelligence conference, organized by Carl Sagan in Soviet Armenia in 1971.

Two years after the conference they published an article on directed panspermia. One interesting point they made in favour of their theory was the universality of the genetic code across species. They argued that if life had evolved multiple times independently, there would be more variation in the genetic codes organisms use, and you would expect the same codons to code for different amino acids in different organisms. The code, however, is highly conserved across species.

Amino acids have been found on meteorites, and this has sparked interesting versions of panspermia in which DNA hitched a ride on a meteor and crashed to Earth as a life-spawning passenger. It also raises the possibility that life may be scattered throughout the universe.

The origins of directed panspermia.

RNA world

Thomas Cech shared the 1989 Nobel Prize in Chemistry (with Sidney Altman) for groundbreaking work showing that ribozymes, a class of RNA, can catalyze chemical reactions.

These experiments pinned RNA as the first molecule to carry genetic information and the likely star for the role of both the chicken and egg. This is due to its ability to:

  • self-replicate
  • act as an enzyme
  • be converted to DNA through reverse transcription.

RNA is now widely accepted as the original progenitor molecule and pre-cellular life form capable of encoding information, occupying the spotlight in textbooks and the abiogenesis model.

The RNA world hypothesis hasn’t been without its critics. It received a hard assessment from Harold S. Bernhardt in his paper titled The RNA world hypothesis: the worst theory of the early evolution of life (except for all the others).

Other critiques question whether RNA that is stable as an enzyme in lab conditions would have survived the searing temperatures of a supposed pre-biotic Earth. Peter Wills and Charles Carter, developers of the RNA-peptide hypothesis, are confident that RNA molecules alone would not have been sufficient for all the processes needed to initiate life on Earth (see The End of the RNA World Is Near, Biochemists Argue).

Although the RNA world hypothesis is likely too simplistic to satisfy the problem of the origin of life, it is an excellent place to begin if you are starting out on your quest for understanding our molecular origins and molecular complexity in general.

Quantum DNA

Erwin Schrödinger, in his 1944 book What Is Life?, made some predictions on the structure of the basic unit of life and posited that the problem of the origin of life would be solved through quantum physics. It wasn’t until 1953 that the structure of DNA was discovered, yet Schrödinger accurately classified genes as aperiodic crystals. Ordinary crystals have a repeating molecular structure, with their order encoded at a quantum level; an aperiodic crystal is ordered without repeating, which lets it carry information. The fidelity of replication we discussed earlier was what led Schrödinger to believe that the gene was governed by quantum laws and not classical ones.

Is it possible that life emerged directly from the atomic world guided by the laws of quantum mechanics? How could we possibly test this?

Both James Watson (a biology student who wanted to be a naturalist) and Francis Crick (a physicist who worked on magnetic mines) switched careers to study DNA after reading Schrödinger’s book. Almost a century later, quantum physics is still as sexy, alluring and mysterious. For any progress on these questions, collaboration between the worlds of quantum physics and biology is critical to help biologists understand quantum concepts and liberate us from our ball-and-stick models.

The biggest questions in life seem to boil down to information processing: how information is stored, encoded and inherited. Can understanding our universe’s quantum dynamics lead us further down the path of deciphering these questions?


If you have made it this far I commend you. Let me know in the comments if I missed a detail or simply if this stuff keeps you awake at night too.

Many aspects of life and the world we live in are perplexing and far-out. What boggles my mind is that at a nanoscale, our cellular machinery is working so hard and at such complexity to sustain life.

Whether you are a molecular materialist confident in the purpose of life’s fundamental components guided by laws of nature or a Promethean dreamer with an imagination stretching the length of the universe, the concept of DNA and how it fits into theories on the origin of life gives plenty to chew on.


Francis Collins The Language of God

Life on the Edge: The Coming of Age of Quantum Biology



