On the Origin of the Genetic Code



By Paul R. Martin

"Natural Selection can originate nothing; it can only
pick out and choose among the things which are
originated by some other law. Strictly speaking
therefore, Mr. Darwin's theory is not a theory
on the Origin of Species at all, but only a
theory on the causes which lead to the relative
success or failure of such new forms
as may be born into the world."

George Campbell, 8th Duke of Argyll[1]

This essay examines the question of the origin of the genetic code. The genetic code has become well known over the last six decades. During that time, major strides have been made in our understanding of the molecular basis of life and the mechanisms that account for heredity. It is now well known that DNA carries information necessary for many fundamental life processes, and that this information is encoded in the DNA according to a genetic code that is shared, with only minor variations, by all known forms of life. But how this code was established in the first place remains a mystery. We will argue that a mind was necessary for the code to come into existence at the very beginning.

Since the "mind" plays such an important role in this essay, let's be clear about the meaning of the term as used in this essay. Webster defines mind as: "that which thinks, perceives, feels, wills, etc.; seat or subject of consciousness", and as: "all of an individual's conscious experiences". So we will consider "mind" to be whatever is responsible for, and capable of doing, the kind of thinking that you, or anyone else, consciously experiences. But we must be careful not to limit this notion merely to brain processes, nor to endow it with unreasonable almighty powers.

The notion of 'mind' as used in this argument is along the lines of the notion articulated by Gregory Bateson[2], and close to the bolder notion of Beon presented by Greylorn Ell[3]. While Bateson says that consciousness is beyond the scope of his book[4], he nevertheless presents evidence that something like a mind, which we usually associate with a human brain, seems to be present and at work on a much grander and longer-term scale in the universe as a whole. Ell, on the other hand, fearlessly offers a cogent explanation not only of how consciousness operates, but of how it originated. In both cases, the notion of mind is freed from the confines of the physical brain. In this essay we will make the case that consciousness is a necessary ingredient of the "mind" that originated the genetic code.

In order for this "mind" to have played a role in the origination of the genetic code, it must exist and operate outside the confines of the brain. After all, the genetic code was established long before any biological brains existed. This is obviously a radical departure from the currently accepted views of Science.

There is no evidence for, nor does the argument in this essay depend in any way on, the "mind" being perfect, infinite, omniscient, omnipotent, immutable, eternal, or complete. In fact all those attributes are explicitly denied. That should preclude any inclination or temptation to conflate the mind we are talking about here with any notion of God as held by any religious believer.

To sum up, the "mind" we are talking about is an entity having the capability of perception, conception, and will, similar to what you and I experience, but which is not confined to the brain. Since this is an unusual use of the term 'mind', you are asked to accept the notion in the same way that mathematicians accept primitive terms or axioms, even though the notion might be foreign to ordinary usage, or even considered false.

This essay goes beyond Paley's argument from design. Paley could have inferred more than he did from the watch he imagined finding while crossing a heath. Just as Paley claimed, the watch does indeed strongly suggest that there was a designer of the watch. But since Paley's example is an actual watch, and not just drawings showing the design of a watch, the watch also strongly suggests an engineer, or a builder of the watch. As we well know from Leonardo da Vinci, it is one thing to design a helicopter; it is quite another to build one that actually flies.

The engineering function would include planning and constructing the device in addition to merely designing it, and one could argue that it also includes maintenance and management.

Moreover, the functions of planning, designing, and constructing strongly suggest a conscious engineer, as opposed merely to an intelligent engineer. After all, we already have computers that we deem to be intelligent, but we have no computer that is conscious. We also know that we use computers to perform planning, design, fabrication, assembly and other "engineering" functions, but none of those is accomplished without a considerable amount of conscious engineering done by humans ahead of time. So, to complete the analogy, consciousness plays a necessary role in the processes leading to the actual construction of an artifact like a watch. A conscious engineer is a couple of cuts above a mere intelligent designer.

The argument can be summarized as follows:

Premise 1: A conscious mind is required in order to produce a symbolic system.

Premise 2: We find symbolic systems in living organisms.

Logical conclusion: Therefore a conscious mind was required to produce the symbolic systems found in living organisms, and hence to produce life as we know it.

We begin by comparing the genetic code with another well-known code, the Morse Code. First we isolate the exact equivalent of Morse's "codebook" in biology. We identify the physical structure that contains the collection of symbols (the codons) used in the code, along with the pair-wise mapping from those symbols onto the set of objects they represent (the individual amino acid groups). It turns out that the "codebook" is found in a set of 61 separate sequences within the DNA.

The challenge presented in this essay is to explain how that set of sequences came to exist originally without the active conscious involvement of a mind. An explanation including the mind of a conscious engineer seems much easier to accept than a mechanism or process that does not include a mind.

Next let me explain what I mean by a "symbolic system". It is a complement of, or a counterpart to, what I will call a "physical system".

Most of the physical world behaves according to strict laws of physics. That is, the actions of the parts are determined strictly by physical laws and initial conditions. Those parts and their actions make up the physical systems. The planets and moons in the solar system proceeding along their orbital paths are examples of physical systems. Our descriptions of such systems, to the extent that we have discovered the laws, consist of mathematical expressions (the laws) together with initial conditions. Examples include Kepler's, Newton's, or Einstein's equations, together with the positions and momenta of the celestial bodies.

The important feature of a physical system is that the actions proceed strictly according to the laws of nature without any involvement of, or interference from, any mind. A science experiment is a physical system only after the apparatus is all set up and the experiment is started and left alone.

We will ignore any possible mental involvement in setting up the initial conditions, or in formulating the laws of nature in the first place. We will ignore them not because they are easy questions, nor because they are unimportant, but simply because they are off-topic for this essay.

Symbolic systems are different from physical systems in one key respect: symbolic systems contain symbolic information which has an effect on the behavior of a physical system while physical systems do not contain any such effective symbolic information.

But what, exactly, is symbolic information, and what exactly is a symbol? This is where the mind enters the picture. A symbol is a specific pattern made of some physical entities that is both recognizable to some conscious observer and has some specific meaning to that observer. And, of course, it is in the mind associated with the conscious observer that the experiences of recognition and meaning take place.

The pattern of ink on paper making up the letter 'a', for example, or the pattern of a short pulse followed by a long pulse in a continuous wave radio signal, which happens to signify the letter 'a' in Morse code, are examples of symbols. Both are recognized and understood by some conscious human minds.

Symbolic information is any mapping from a symbol to a part of a physical system. One such mapping might be the inked impressions of a string of alphabetic symbols onto a sheet of paper. Since the symbols carry meaning in some mind, the mapping consequently means something to someone with a mind. And it is the existence of the mind that is crucial to this argument. No mind implies no symbols, and hence no symbolic systems.

With these definitions of 'symbol' and 'symbolic information', and taking the terms 'mind' and 'meaning' to be primitive concepts (in the mathematical sense), the truth of Premise 1 in my syllogism is established by definition:

Premise 1: A conscious mind is required in order to produce a symbolic system.

Establishing the truth of Premise 2 will take some digging into molecular biology.

Premise 2: We find symbolic systems in living organisms.

To find those symbolic systems, we need to examine the role of information, both physical and symbolic, in the basic chemical reactions of life.

To review, there is a lot of information present in a physical system, such as the positions of particles, the strengths of forces, etc., but none of this information is symbolic; it is purely physical. A ten-kilogram stone "contains" or "carries around with it" the information that it weighs ten kilograms. That is physical information. By contrast, this string of twenty-nine characters, 'The stone weighs 10 kilograms', contains the same information, but in this case the information is symbolic.

To be symbolic, the information must be represented in some physical form, e.g. a string of letters made of ink on paper, which has nothing to do with the physical system the information is about, but which has some meaning to some mind.

In this case, a mind gaining the information in that 29-character string would "know" that the stone weighs 10 kilograms. By contrast, we could say that a nearby gravitating body would also "know" that the stone weighs 10 kilograms by virtue of the physical information contained in the stone itself. As a result, the stone and the body would undergo an interaction consistent with the laws of physics. The crucial difference is that neither the stone nor the gravitating body has a mind. And here, of course, we have to appeal to our common, undefined, notion of 'mind', which we have declared to be a primitive concept.

Symbolic information is commonly called "encoded information". That is because the information is represented by a code which can be thought of as a made-up arbitrary representation of the "real", or physical, information. I think that it is not insignificant that biologists themselves refer to the "genetic code" as a code. It should therefore not be too difficult to accept the argument that the genetic code does indeed convey symbolic information.

A code is an assignment pairing up symbols with real objects. Language is nothing but coded, or symbolic, information. The five-letter string 'horse' is a code word assigned to a big familiar animal, but there is little resemblance between the string of letters and the animal itself.

The important question here is, exactly how were those code words originally assigned to the things they represent? Who or what dreams up and establishes the code by making the assignments? The answer is that some mind or minds establish the code. For languages, this is obvious. It was an extremely long and complex process, but in principle, it is easy to conclude that it was a long series of human mental activities that developed each of our languages.

But let's return to a much simpler example, that of Morse Code. Here is a code that was deliberately chosen and established by one person at one time and which was used for many decades to encode (secondarily) many millions of messages sent around the world. How and why did the letter 'N' get to be encoded as a dash followed by a dot? Did Sam Morse have some reason for choosing that pattern? He may have, but whatever his reasons, they don't affect the use of Morse Code in sending messages.

The most common use of Morse Code is to transfer information from one mind to another. Although with the use of Teletypewriters, or computers, the coded messages might be sent or received by physical mindless systems, this is not of interest to us now. We are not interested in the minds as senders or receivers. What we are interested in is the mind that originated the code in the first place. We are interested only in Samuel Morse's mind. We are interested in the choices Morse himself made in his mind and how he wrote the code down and published it. That was what made the code useful.

The assertion in this argument is that when we find a system that uses encoded or symbolic information, then a mind was necessary at the beginning in order to establish the code. That should be obvious because of two facts. First, there is no physical reason for the particular correspondence between a symbol and what it represents, and secondly, there must be consistent agreement among the various parts of the system on what those correspondences are. In other words, the code was arbitrarily chosen at the beginning, and the code is instrumental in the ongoing functioning of the system.
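
The two facts above can be illustrated with a small Python sketch. It assumes a handful of International Morse Code letters and a hypothetical alternative table in which the patterns for E and T are swapped; the function names are made up for illustration. Either table works as long as sender and receiver share it, which is the sense in which the choice is arbitrary; a receiver using a different table garbles the message, which is the sense in which agreement is required.

```python
# A few International Morse Code letters (dot = ".", dash = "-").
MORSE = {"E": ".", "T": "-", "A": ".-", "N": "-.", "I": "..", "M": "--"}

# A hypothetical alternative code: identical except that E and T trade patterns.
SWAPPED = dict(MORSE, E="-", T=".")


def encode(text, codebook):
    """Sender: replace each letter by its pattern, separated by spaces."""
    return " ".join(codebook[letter] for letter in text)


def decode(signal, codebook):
    """Receiver: invert the codebook and look each pattern up ("?" if unknown)."""
    lookup = {pattern: letter for letter, pattern in codebook.items()}
    return "".join(lookup.get(pattern, "?") for pattern in signal.split())


# Shared table, either one: the message gets through.
print(decode(encode("MEAT", MORSE), MORSE))      # MEAT
print(decode(encode("MEAT", SWAPPED), SWAPPED))  # MEAT

# Sender and receiver disagree: the message is garbled.
print(decode(encode("MEAT", SWAPPED), MORSE))    # MTAE
```

Nothing in the patterns themselves favors one table over the other; only the agreement between the two ends matters.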

To simplify this a little, let me use one other example of a symbolic system, a light bulb controlled by a switch on the wall. The symbolic information in this system is the position of the lever on the switch. If the lever is up, the light will be on. If the lever is down, the light will be off. There is nothing in the laws of nature that demands that correspondence. The correspondence was established by the manufacturer who designed the switch and the electrician who installed it.

The two mental (engineering) acts of designing the switch and circuit, and of installing them, determine how the physical world will behave in the presence of information encoded in the position of the switch lever. The key notion here is that with proper engineering, purely symbolic information can influence the behavior of purely physical systems.

The behavior of the physical world follows strict physical laws, with no involvement of mind. The actual movement of the switch lever may or may not involve a mind. But the design of the system, in particular the assignment of what position of the switch will mean what, necessarily required a mind in the designer and installer of the switch. And, the position of the switch lever directly influences the physical behavior of the light bulb.

Now if symbolic information is instrumental in the ongoing functioning of an otherwise physical system, then symbols must be assigned to physical things. These assignments require the decisions and choices of a mind to determine what is associated with what. So, mind is required in order to produce a symbolic system.

Next we turn to a familiar but absolutely marvelous symbolic system present in all known living organisms. It is the protein production system in living cells. As we well know, proteins are produced in cells by stringing amino acid groups together like strings of pearls. There are 20 different amino acid groups used by the cells, and they are chosen in a sequence that is determined by symbolic information encoded in DNA (strictly speaking, the information is read from an RNA copy of the DNA).

We won't get into the details of the mechanism of protein synthesis. That would be like getting into the details of the traffic in Morse Code messages around the world, or of the design of the mechanical instruments used by Morse Code operators to key in their messages. Instead, we are going to focus on the very interface between the symbolic part of the system and the physical part of the system. The code is the symbolic part, the amino acid group is the physical part, and the "codebook" provides the interface.

Don't be confused: the molecules which embody the code (like ink patterns embodying letters) are indeed physical. But they are not part of the physical system we are interested in. That is, the physical information carried by the code does not affect the functioning of the system, even though the symbolic information encoded in the symbol does. The physical molecules reacting to the coded patterns, i.e. the amino acid groups, constitute the physical part of the system. The encoded symbolic information constitutes the symbolic part of the system.

The symbolic information about what amino acid group to take next actually causes the physical chemical reaction to take place which takes up the correct amino acid group and adds it to the growing protein chain. It is this causal link between symbol and physical molecule that we will examine more closely.

We are going to look in detail at how the genetic code is physically represented, or "written down", if you prefer. That will be like looking at a sheet of paper containing the Morse Code assignments. It is sheets like this that were used by all the many telegraph operators in order to successfully communicate using the code. Bear in mind, Morse chose the code strictly as a mental activity. He used his mind to do it. The code was accepted by everyone else simply because it wouldn't work as a communication code if they didn't.

Once Morse's code was written down on a sheet of paper, that sheet was printed, or otherwise copied, and millions of copies were sent around the world to train new telegraph operators. We will be looking at the biological counterpart for the genetic code. We will see the equivalent of those millions of code sheets and how they are distributed, and we will be posing the hard question of how that first sheet came to be in the first place.

We know that the genetic code is not binary, using the symbols 0 and 1, but quaternary, using the symbols A, C, T, and G. (I am going to simplify and ignore U, which is used in RNA instead of T.) These letters stand for molecules called nucleotides, which are the actual symbols used in DNA.

Similar to a binary byte as used in our computers, protein synthesis uses a "byte" consisting of a group of three nucleotides. This group, or "byte", is called a codon. A gene, which is a consecutive string of codons along the DNA strand, contains the symbolic information necessary to assemble one specific protein molecule. Each codon specifies a specific amino acid group, and the sequence of codons specifies the sequence of amino acid groups.
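
The counting behind the byte analogy can be checked with a short Python sketch (the names are illustrative). Four letters taken three at a time give 4 × 4 × 4 = 64 possible codons, and since each letter carries two bits, a codon carries six bits, compared with eight in a computer byte. The helper simply cuts a gene into consecutive, non-overlapping three-letter codons.

```python
import itertools
import math

NUCLEOTIDES = "ACGT"  # DNA letters, as in the text (RNA uses U instead of T)

# Every possible codon: all ordered triples drawn from the four letters.
CODONS = ["".join(triple) for triple in itertools.product(NUCLEOTIDES, repeat=3)]

bits_per_letter = math.log2(len(NUCLEOTIDES))  # 2.0
bits_per_codon = math.log2(len(CODONS))        # 6.0


def split_into_codons(gene):
    """Read a gene as consecutive, non-overlapping three-letter codons
    (any incomplete codon left over at the end is ignored)."""
    return [gene[i:i + 3] for i in range(0, len(gene) - 2, 3)]


print(len(CODONS), bits_per_letter, bits_per_codon)  # 64 2.0 6.0
print(split_into_codons("ATGCAGTGG"))               # ['ATG', 'CAG', 'TGG']
```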

The codon is a string of nucleotides, like a word is a string of alphabetic characters. ATG and CAG distinguish between different amino acids within a cell just like CAT and DOG distinguish symbolically between different animals.[5]

What we want to zero in on here is the physical assignment of a specific codon pattern to its specific amino acid group. We want to find the "codebook": the physical structure in which the assignment is "written down". We want the equivalent of the assignment sheets of Morse Code.

The codon, being a string, for example 'ATG', is a symbol (or a string of three symbols if you like). The associated amino acid group is a chemical molecule which simply obeys the laws of physics and chemistry. It is purely physical with no symbolic content. So what we are looking for is a physical structure containing both the codon and its assigned amino acid, and we need to find all 61 of them. That will be the genetic "codebook".

The actual code itself is not important here.[6] For our purposes we can simply think of a group of three letters like CTT, or GAT, as picking out, or being assigned to, exactly one of the 20 amino acids used by living organisms.

At this point it might be tempting to look in detail at the mechanism of using the coded information from the DNA to choose the right amino acid to add to the growing protein molecule. That is done by a huge molecular structure, called a ribosome. But as interesting as that mechanism is, it would be a distraction to belabor it now. Instead we will briefly describe the mechanism and then focus on the genetic "codebook", how it is embodied, and how it gets distributed.

To choose the right amino acid, the ribosome reads a strip of mRNA (messenger RNA), which is a copy of the gene from the DNA, and advances along this strip three letters, or one codon, at a time. When a new codon appears on the strip, it is exposed by the ribosome to the soup of molecules floating around in the cell on the outside of the ribosome.
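
As a rough sketch of the reading step only, assuming a plain lookup table stands in for the matching of tRNAs to codons, the following Python walks along a message three letters at a time. The table is a small excerpt of the standard genetic code, written with T in place of U as in the text, with one-letter amino acid abbreviations; the function name is hypothetical, and a real ribosome does not consult a table but relies on tRNA molecules pairing with each exposed codon.

```python
# A small, deliberately partial excerpt of the standard genetic code,
# written with T rather than U to match the text. "*" marks a stop
# ("punctuation") codon.
PARTIAL_CODE = {
    "ATG": "M",  # methionine (also the usual start signal)
    "TGG": "W",  # tryptophan
    "CAG": "Q",  # glutamine
    "GAT": "D",  # aspartic acid
    "CTT": "L",  # leucine
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(mrna, code=PARTIAL_CODE):
    """Step along the message one codon (three letters) at a time,
    looking each codon up in the table, until a stop codon appears."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = code[mrna[i:i + 3]]  # KeyError if the codon is not in this excerpt
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGTGGCAGGATTAA"))  # MWQD
```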

Among those floating molecules are molecules of the 20 amino acids, each one attached to a molecule of tRNA. That stands for Transfer RNA, and it is these tRNA molecules that we want to concentrate on. They make up the codebook.

The amino acids in the cellular soup are not floating alone like shipwreck survivors in the ocean. The acids are so strongly chemically reactive that they would begin to react with one another if allowed to. Therefore each amino acid is attached to a tRNA molecule, which works like a life jacket. As a shipwreck survivor wrapped in a buoyant jacket will not try to grab onto other swimmers, an amino acid attached to a tRNA molecule will not react with other acids.[7]

These tRNA molecules with their attached amino acid groups are the physical connection between the symbolic and physical information in the system. They are copies of the genetic "codebook". Actually they are not entire copies but each one is a single line item from the "codebook". Each tRNA molecule specifies the assignment of one codon to one amino acid group. It's as if Morse's sheets got cut into horizontal strips, each containing only one assignment, before the sheets were distributed to the Morse Code classrooms.

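In the simplified picture used in this essay, there is one specific type of tRNA for each of the 64 code combinations, with the exception of three combinations which are used for punctuation (they signal "stop"). So there are 61 different types. (Real cells usually get by with fewer, because one tRNA can often read several codons that differ only in their third letter, an effect known as wobble.) Each tRNA molecule has a specific amino acid chemically bonded to one end, and on the other end it has three exposed letters, the anticodon, that pair with the matching codon.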

Now let's look at the structure of the tRNA molecule. It looks sort of like a twisted rubber band that forms a scarecrow-looking structure.[8] There is a pumpkin-shaped head, two outstretched arms with catcher's mitts on the hands, and two skinny legs taped together with one leg shorter than the other. That's the image you should have in mind when we go through this. This entire molecule is made of nothing but 'A's, 'T's, 'C's, and 'G's strung together along a backbone made of the sugar and phosphate parts of each nucleotide.

The sequence of these nucleotides is important for several reasons. One reason is that to make and keep the shape of the scarecrow, the strip must stick to itself on sections like the arms. The strip goes over the top of the arm, for example, then loops around to form the baseball mitt, and then back along the underside of the arm. Here it sticks to the upper part of the arm, by 'A's sticking to 'T's and 'G's sticking to 'C's, and that holds the thing together.

There are two even more important reasons why the sequence matters. The first is a particular three-letter sequence right on the top of the pumpkin head, called the anticodon. It is the complement of the codon that this particular tRNA answers to. That is, if this tRNA carries a Tryptophan molecule, for example, whose codon is TGG, then there will be an ACC on the top of the pumpkin head, the letters that pair with the T, G, and G of the codon.
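
The pairing rules can be sketched in a few lines of Python (the function names are illustrative, and T is used instead of U as in the text). Complementing TGG letter by letter gives ACC; because paired strands run in opposite directions, the same anticodon written in the conventional 5' to 3' direction reads CCA. The same rule checks whether the two halves of an arm can zip together; the example stem sequences are made up.

```python
# Watson-Crick pairing rules, using T as the text does (U in real tRNA).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Letter-by-letter partner of each nucleotide."""
    return "".join(PAIR[base] for base in seq)


def reverse_complement(seq):
    """The partner strand written in its own 5'->3' direction.
    Paired strands run antiparallel, so the partner is read backwards."""
    return complement(seq)[::-1]


def stem_pairs(upper, lower):
    """True if two stretches of a folded strand can zip together, i.e. the
    returning (lower) stretch is the reverse complement of the upper one."""
    return lower == reverse_complement(upper)


codon = "TGG"  # tryptophan's codon (UGG in messenger RNA)
print(complement(codon))             # ACC, the anticodon read alongside the codon
print(reverse_complement(codon))     # CCA, the same anticodon written 5'->3'
print(stem_pairs("GCGGA", "TCCGC"))  # True
```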

The second important sequence of nucleotides in the tRNA is on the longer of the two legs. This is where the amino acid is attached: in every tRNA the leg ends in the same three letters, CCA, and the amino acid is chemically bonded to the last of them. The matching itself is done by a family of enzymes called aminoacyl-tRNA synthetases, typically one for each amino acid. Each enzyme recognizes both its own amino acid and features of its own tRNAs (often including the letters on the pumpkin head) and joins the two, while rejecting the other amino acids and the other tRNAs. Truly amazing.

So what we have are 61 different kinds of tRNA, each one grabbing and holding its specific amino acid, and in due time, getting snagged by a ribosome which separates the tRNA and sends it back into the soup to get another amino acid.

For each of the 61 types of tRNA, there is at least one section of the DNA that has the pattern for making it. So there are at least 61 such separate and independent sections within the cell's DNA.

Now let's think about the two important "business ends" of the tRNA. The pumpkin head contains the symbolic information identifying the particular amino acid: ACC, for example, in the case of Tryptophan. The other important end is the long leg, which actually binds chemically with the real physical Tryptophan molecule. The pumpkin head is symbolic and the leg is physical. Once it has been loaded, this molecule of tRNA carries the association with it, and it is the only thing that presents the association to the ribosome. It is where the assignment of the code physically exists in the working system. It is the symbolic-physical link. It is equivalent to one entry in Samuel Morse's chart linking letters with patterns of dots and dashes. Each tRNA molecule is one particular entry in the genetic "codebook".

Now, finally, we are ready to ask the really important question. How did the assignments in the genetic code get made? What is the equivalent of Samuel F.B. Morse assigning dot and dash patterns to letters? The answer is easy if some mind made those assignments. It is simply a matter of filling out that 64-position code sheet. Set aside three of the 64 combinations of A, T, C, and G as punctuation, and for each of the remaining 61, choose one of the 20 amino acids. Just make sure you use each amino acid at least once. Anyone with a mind could easily do that.
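
Filling out such a sheet can be sketched in Python. This hypothetical sketch sets aside the three stop codons (TAA, TAG, and TGA in the standard code), deals each of the 20 amino acids to one randomly chosen codon so that every one is used, and assigns the remaining 41 codons freely. A second function counts how many such sheets are possible, using the standard inclusion-exclusion formula for maps onto a set; it illustrates how arbitrary the choice is and makes no claim about how the actual code arose.

```python
import itertools
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter names of the 20 amino acids
STOPS = {"TAA", "TAG", "TGA"}         # the three "punctuation" combinations

ALL_CODONS = ["".join(t) for t in itertools.product("ACGT", repeat=3)]
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOPS]  # 61 entries


def random_code_sheet(seed=None):
    """Fill out a hypothetical code sheet: every non-stop codon gets one
    amino acid, and every amino acid is used at least once."""
    rng = random.Random(seed)
    codons = SENSE_CODONS[:]
    rng.shuffle(codons)
    # The first 20 shuffled codons guarantee each amino acid appears once...
    sheet = dict(zip(codons, AMINO_ACIDS))
    # ...and the remaining 41 codons get amino acids chosen freely.
    for codon in codons[len(AMINO_ACIDS):]:
        sheet[codon] = rng.choice(AMINO_ACIDS)
    return sheet


def count_code_sheets(n_codons, n_amino_acids):
    """Number of ways to assign n_codons codons so that every amino acid
    is used at least once (onto maps), by inclusion-exclusion."""
    return sum(
        (-1) ** j * math.comb(n_amino_acids, j) * (n_amino_acids - j) ** n_codons
        for j in range(n_amino_acids + 1)
    )


sheet = random_code_sheet(seed=0)
print(len(sheet), len(set(sheet.values())))  # 61 20
print(count_code_sheets(61, 20))             # a very large integer; compare 20 ** 61
```

The count uses `math.comb`, which requires Python 3.8 or later.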

I suppose it could have been done by some mindless mechanism, but I don't know where that "codebook" sheet spelling out the assignments is, or was during the origination of life, nor can I imagine how any physical process could make use of such a "codebook" if it did exist. After all, the filled out "codebook" is symbolic and its representation has nothing to do with any physical processes.

What about the engineering problem of making the tRNA molecules themselves after the code has been decided on? Well, what you need to make are 61 different sections of DNA, each one containing the sequence of nucleotides whose complement will fold up into one of those pumpkin-headed scarecrow configurations. That part wouldn't be too hard, because the same basic pattern would serve to make the overall scarecrow structure for all 61.

But to get it right, halfway down each of those sections needs to be the specific three-letter anticodon that pairs with the codon specified in the assignment table. And that is different in each of the 61. The whole section is made of nucleotides ('A's, 'T's, 'C's, and 'G's), but these three are very special. They contain the symbolic information that identifies this particular tRNA molecule and distinguishes it from the other 60.

In addition to these special letters in the middle, the sequence must carry the features, including some on what will become the longer leg of the scarecrow, by which the matching loading enzyme (the aminoacyl-tRNA synthetase for that amino acid) recognizes this tRNA, so that it receives the specific amino acid that matches its codon in the assignment table. Getting those two sequences right seems to be tricky without a mind.

The difficulty is not just in getting the codon and the tail to match up according to the assignment table, but in getting 61 different ones right at the same time. "Right" means that the codon and the binding site are consistent with an arbitrary assignment of the 61 combinations. Here again, it is easy to see how anyone with a mind could consult the assignment table and deliberately fashion the DNA sequences to make the assignments.

The problem is to explain how this comes about by a mindless process. How does the information in those sixty-one different stretches of DNA get established without a mind?

In trying to imagine the mechanism used by Random Mutation and Natural Selection and how it is supposed to have produced biological systems, we usually think of the evolution as being incremental and gradual. But in this case it seems that the entire genetic code must have appeared complete at once, or at most in three steps.

It could have been developed incrementally, for example starting with a system using only one nucleotide per codon and using only two or three amino acids, or some other molecules. From that starting point maybe it evolved into a system using two nucleotides per codon which would allow for up to sixteen amino acids, or other molecules. And, from there, a third nucleotide might have been added to the codon. The actual assignments in the genetic code sort of suggest that this might have happened.
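
The capacity of each stage is simple arithmetic: with four letters, a codon of n nucleotides can take 4^n forms. A small Python check, assuming the code must distinguish 20 amino acids plus at least one stop signal (the names are illustrative):

```python
AMINO_ACIDS = 20
STOP_SIGNALS = 1          # at least one "punctuation" codon is needed
NEEDED = AMINO_ACIDS + STOP_SIGNALS


def codon_capacity(n, alphabet_size=4):
    """Number of distinct codons of length n over a four-letter alphabet."""
    return alphabet_size ** n


for n in (1, 2, 3):
    capacity = codon_capacity(n)
    verdict = "enough" if capacity >= NEEDED else "too few"
    print(f"{n} nucleotide(s) per codon: {capacity:2d} combinations, {verdict}")
```

Only the three-letter codon yields at least 21 combinations; one- and two-letter codons give 4 and 16.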

But the logical problem in this scenario can be visualized by imagining that Sam Morse started out by introducing a code consisting of only one dot or dash. The only messages that could be sent would be those composed of the letters e and t. That might have been useful, but not very convenient for expressing language statements.

If he extended the code to use up to two dots or dashes, his system could then be used for messages that were limited to the letters a, e, i, m, n, and t. That might have been useful for some messages.

Later Sam would announce an extension to the code using three symbols, adding the letters d, g, k, o, r, s, u, and w. Later yet, he would come out with the fourth symbol, allowing the rest of the letters b, c, f, h, j, l, p, q, v, x, y, and z. And finally, by adding one more symbol, his code could be used for numerical digits as well as letters. It is hard to see how that could have worked well.
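
The staging described above follows directly from the lengths of the International Morse Code letter patterns. A Python sketch (the names are illustrative) groups the letters by pattern length and checks whether a message could be sent at a given stage:

```python
from collections import defaultdict

# International Morse Code patterns for the 26 letters.
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}

# Group the letters by how many dots and dashes their pattern uses.
by_length = defaultdict(list)
for letter, pattern in sorted(MORSE.items()):
    by_length[len(pattern)].append(letter)


def can_send(message, max_symbols):
    """True if every letter of the message has a pattern of at most max_symbols."""
    return all(len(MORSE[ch]) <= max_symbols for ch in message.upper() if ch.isalpha())


# Letters usable after each hypothetical stage of the rollout.
usable = []
for length in sorted(by_length):
    usable += by_length[length]
    print(f"up to {length} symbol(s): {''.join(sorted(usable))}")
```

For example, `can_send("MEAT", 2)` is True, while `can_send("HORSE", 3)` is False because H needs four symbols.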

Similarly for biological systems, if the code had evolved incrementally like that, it is difficult to see how old organisms could adapt to the new and expanded code. Logically, it seems as though the genetic code must have been established all at once, or at most have evolved from a two-nucleotide codon to the three-nucleotide codon we have today. Either way, it seems to be a hard problem for a mindless process to solve.

In conclusion, since we find symbolic systems in living organisms, we can infer that mind was required to produce life.



1. George Campbell, 8th Duke of Argyll, "The Reign of Law", reproduced in The Teaching Company's course "The Darwinian Revolution" by Prof. Frederick Gregory.
2. Gregory Bateson, "Mind and Nature: A Necessary Unity".
3. Greylorn Ell, "Digital Universe, Analog Soul".
4. Gregory Bateson, "Mind and Nature: A Necessary Unity", p. 142.
5. This paragraph is attributed to Greylorn Ell.
6. See the Genetic Code.
7. This paragraph is attributed to Greylorn Ell.
8. See tRNA.

Please send me an email with your comments.


©2013 Paul R. Martin, All rights reserved.