What would be the drawbacks of a genetic code with 6 nucleotides instead of 4, but each amino acid could be coded with 2 base pairs instead of 3 (so the genome could be 33% shorter)?

AbouBenAdhem@lemmy.world · 4 days ago

thebestaquaman@lemmy.world · 4 days ago

How so? If you have a certain probability of an erroneous match on each base pair, the probability of a successful transcription of a gene is just a binomial probability depending on the length of the gene and the probability of error on each base pair. If you reduce the length of the gene, you reduce the total probability of error.
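A minimal sketch of that arithmetic, assuming independent errors and the same per-base error rate in both schemes. The protein length and error rate are made-up illustrative numbers, not measured values, and the sketch deliberately ignores whether an error actually changes the encoded amino acid:

```python
def error_free_probability(per_base_error: float, n_bases: int) -> float:
    """Probability that all n_bases are copied correctly,
    assuming each base fails independently with per_base_error."""
    return (1.0 - per_base_error) ** n_bases


# Hypothetical illustrative values (assumptions, not data).
amino_acids = 300
per_base_error = 1e-4

# Three bases per amino acid (4-letter code) vs. two (6-letter code).
p_triplet = error_free_probability(per_base_error, 3 * amino_acids)
p_doublet = error_free_probability(per_base_error, 2 * amino_acids)

print(f"error-free copy, 3 bases per amino acid: {p_triplet:.4f}")
print(f"error-free copy, 2 bases per amino acid: {p_doublet:.4f}")
```

Under these assumptions the shorter gene is always more likely to be copied without any error, since it offers fewer chances for one.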

verdi@feddit.org · 4 days ago

Not all base pairs contribute equally to the result, because the code is degenerate. By decreasing the codon size you reduce degeneracy, which increases the error probability regardless of gene size. Proteins would still functionally have the same number of amino acids, but the signal-to-noise ratio of transcription would now have a greater effect per amino acid.

thebestaquaman@lemmy.world · 4 days ago

the code is degenerate

I’m no biologist (I’ve only had some basic biochemistry), so could you elaborate on this? As far as I can recall, each codon encodes a specific amino acid, which means that a single mistranscribed base pair leads to the wrong amino acid, potentially breaking the protein.

the signal to noise ratio of transcription would have a greater per amino acid effect

Not sure I get what you mean here either? The difference is in whether each base(-pair) encodes a binary or a ternary digit. Lossless compression of redundant data (i.e. what you do when moving from a binary to a ternary format) doesn’t change the signal-to-noise ratio.

threelonmusketeers@sh.itjust.works · 3 days ago

each codon encodes a specific amino acid

There are 4 × 4 × 4 = 64 possible codons, but only 20(-ish) amino acids. This leaves quite a few spares for some built-in redundancy.

For example, UCU, UCC, UCA, and UCG all code for serine. The third letter can mutate or be mistranscribed and still produce an identical protein.
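A rough sketch of how much redundancy that is, using the standard genetic code written as a compact 64-character string (first base varies slowest, base order UCAG, `*` for stop). The helper name `synonymous_fraction` is made up for this illustration, and the 6-letter, 2-base code is only counted here, since no actual codon assignment for it exists in this thread:

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_fraction(position: int) -> float:
    """Fraction of single-base substitutions at `position` (0, 1 or 2)
    that leave the encoded amino acid (or stop signal) unchanged."""
    same = total = 0
    for codon, aa in CODON_TABLE.items():
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            same += CODON_TABLE[mutant] == aa
    return same / total


meanings = len(set(CODON_TABLE.values()))  # 20 amino acids + stop
triplet_codons = len(BASES) ** 3           # 4 letters, 3 per codon
doublet_codons = 6 ** 2                    # hypothetical 6 letters, 2 per codon

print("spare codons, 4-letter triplets:", triplet_codons - meanings)
print("spare codons, 6-letter doublets:", doublet_codons - meanings)
for pos in range(3):
    print(f"position {pos + 1}: {synonymous_fraction(pos):.2f} of substitutions are silent")
```

The third position absorbs far more substitutions than the first two, which is the redundancy a 2-base codon with fewer spare codons would have less room for.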