Royal Holloway, University of London

Decipherable past of the triplet code, and early life of proteins

Monday, 24th January, 2005
Edward Trifonov, University of Haifa, Israel.

The origin and evolution of the triplet code are reconstructed, codon by codon, on the basis of a consensus temporal order of appearance of the amino acids, deduced from over 70 different expert opinions. Principles of thermostability, complementarity and processivity are applied as well. Several important predictions follow from the resulting chart and are confirmed by computational sequence analyses. In particular, it is calculated that the earliest mRNAs were short (~20 nt) CG-rich polynucleotides. These short sequences could have formed hairpins, which would have been of high evolutionary advantage. All that could survive of these hairpins in extant sequences is the complementarity of purines and pyrimidines in the second triplet positions, which are highly conserved through evolution compared to the first and third triplet positions.

By examining only the second positions in modern mRNA sequences, one does find remnants of those ancient hairpins. Hairpins with a stem size of three triplets and three-base loops are found in significant excess. The mini-genes, 6-7 codons in size, must have been fused at a later stage into longer chains. Polymer-statistical properties of polypeptide chains suggest that the chain trajectory occasionally returns to itself at a specific characteristic length that depends on the chain flexibility. The contour length of such closed loops in mixed-sequence proteins is 25-30 amino-acid residues. As was recently discovered by I. N. Berezovsky and coauthors, not only are such closed loops indeed observed in crystallized proteins, but every globular protein is actually made of consecutively connected closed-loop modules of 25-30 residues each. As a simple calculation shows, small proteins of this very size may fold in the experimentally observed time, with all possible states visited.
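The second-position hairpin search mentioned at the start of this paragraph can be illustrated with a minimal Python sketch. It is not the speaker's actual procedure. It assumes the reading frame is known, that a three-base loop corresponds to exactly one codon (so one second position sits in the loop), and that a stem of three triplets contributes three second positions per arm, paired antiparallel as purine (R) against pyrimidine (Y). The function names `second_positions_ry` and `find_hairpins` are hypothetical, and the sketch only lists candidate windows; it does not test whether they occur in statistical excess.

```python
PURINES = set("AG")
PYRIMIDINES = set("CUT")  # accept both RNA (U) and DNA (T) spelling


def second_positions_ry(mrna, frame=0):
    """Return the second base of every complete codon as R, Y or N."""
    seq = mrna.upper()
    letters = []
    # Codon starts at frame, frame+3, ...; keep only complete codons.
    for start in range(frame, len(seq) - 2, 3):
        base = seq[start + 1]
        if base in PURINES:
            letters.append("R")
        elif base in PYRIMIDINES:
            letters.append("Y")
        else:
            letters.append("N")  # ambiguous or unknown base
    return "".join(letters)


def find_hairpins(ry, stem=3, loop=1):
    """Return codon indices where a stem-loop-stem window is R/Y complementary.

    Assumption: position i+k on the left arm pairs with position
    i+window-1-k on the right arm (antiparallel pairing).
    """
    window = 2 * stem + loop
    hits = []
    for i in range(len(ry) - window + 1):
        last = i + window - 1
        if all({ry[i + k], ry[last - k]} == {"R", "Y"} for k in range(stem)):
            hits.append(i)
    return hits


# Toy 7-codon sequence constructed for illustration only (not real data).
toy = "GAGGGCGCGAAACGCGCCCUC"
toy_ry = second_positions_ry(toy)
toy_hits = find_hairpins(toy_ry)
```

With these assumptions a full window spans 7 codons (21 nt), which matches the ~20 nt and 6-7 codon sizes quoted in the abstract. A real analysis would also need a shuffled or random-sequence control to decide what counts as "significant excess".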

The respective number of states is far below the astronomical Levinthal range. Thus, at some early stage of protein evolution, closed-loop units of this size could well have become the initial modules of future elaborate protein structures. A search for abundant sequence modules of this size, the presumed ancient prototypes of extant closed loops in proteins, resulted in a spectrum of such prototypes. These sequences as such are not present in modern proteomes. Rather, large numbers of their heavily mutated derivatives are found. Vestiges of the Last Universal Common Ancestor (LUCA) can be found in extant protein sequences in the form of entirely conserved short sequences present in all or almost all sequenced prokaryotic proteomes (currently over 160). Protein structures in the PDB crystal database that contain the most conserved sequences (octamers, for certainty) are identified, and the conserved structural elements containing the octamers are outlined. The resulting spectrum of specific sequence/structure/function modules provides a quantitatively justified description of the most basic characteristics of LUCA and of its earlier forms ("baby LUCA").
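The search for octamers present in all or almost all proteomes can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the talk: each proteome is taken to be a plain list of protein sequences, matches must be exact, and the names `kmers`, `conserved_kmers` and `min_fraction` are hypothetical. The toy sequences are invented for the example.

```python
from collections import Counter


def kmers(seq, k=8):
    """Return the set of all length-k substrings of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(proteomes, k=8, min_fraction=1.0):
    """Return k-mers found in at least min_fraction of the proteomes.

    proteomes maps an organism name to a list of protein sequences.
    Each proteome counts at most once per k-mer, however many of its
    proteins contain that k-mer.
    """
    total = len(proteomes)
    if total == 0:
        return set()
    counts = Counter()
    for proteins in proteomes.values():
        seen = set()
        for protein in proteins:
            seen |= kmers(protein, k)
        counts.update(seen)
    return {km for km, n in counts.items() if n / total >= min_fraction}


# Invented toy proteomes, for illustration only.
toy_proteomes = {
    "organism_a": ["MKGSGKTTLLAA"],
    "organism_b": ["QQGSGKTTLLPP", "MMMMMMMMM"],
    "organism_c": ["GSGKTTLLW"],
}
shared = conserved_kmers(toy_proteomes)
```

Setting `min_fraction` slightly below 1.0 corresponds to the "almost all" case in the abstract. At the scale of 160 or more real proteomes, holding every octamer in memory this way may be impractical, so a production version would likely need an on-disk index or a streaming approach.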


Last updated Tue, 16-Dec-2008 11:34 GMT / PS
Department of Computer Science, University of London, Egham, Surrey TW20 0EX
Tel/Fax : +44 (0)1784 443421 /439786