The Twenty-First Letter
For sixty years, the genetic code has been the textbook example of precision in biology. Each three-letter codon specifies exactly one thing, an amino acid or a stop signal, with no ambiguity and no exceptions. The system works like a cipher: you take a sequence of nucleotides, translate it according to a fixed table, and out comes a protein. One input, one output. Every time.
Dipti Nayak at UC Berkeley called it exactly that. "It's essentially like a cipher," she said. "You're taking something in one language and translating it into another, nucleotides to amino acids."
Then her lab discovered an organism that treats the cipher as more of a suggestion.
The Codon That Couldn't Decide
The organism is Methanosarcina acetivorans, a methane-producing archaeon — one of the ancient microbial lineages that predate complex life by billions of years. It lives in marshes, ocean sediments, and the human gut. And it does something that should, by every principle of molecular biology, be catastrophic: it reads one of its codons in two different ways.
The codon in question is UAG — normally a stop signal, one of three codons that tell the cell's protein-building machinery to halt. In almost every other organism on Earth, UAG means one thing: stop here. End of protein. Put down your tools.
In M. acetivorans, UAG sometimes means stop. And sometimes it means something else entirely: insert pyrrolysine.
Pyrrolysine is a rare amino acid, one of only two known genetically encoded additions to the standard set of twenty that nearly all life uses (the other is selenocysteine). It's the biological equivalent of adding a new letter to the alphabet. Only certain methane-producing archaea and a handful of bacteria can produce it, and they use it to build enzymes that break down methylamines. These compounds are found throughout the environment and, relevantly, in the human gut, where such microbes help prevent the buildup of molecules associated with cardiovascular disease.
Scientists had known about pyrrolysine for years. They assumed the organisms that used it had simply reassigned UAG — repurposed the stop codon to mean "pyrrolysine" instead. A clean swap. One meaning replaced by another. The cipher updated, but still precise.
Katie Shalvarjian, then a graduate student in Nayak's lab, found that the truth was messier. UAG had not been reassigned. It retained both meanings simultaneously. The same three letters, in the same organism, sometimes meant "stop" and sometimes meant "keep going." The cell was, in Nayak's words, "flip-flopping back and forth between whether they should call this a stop or whether they should keep going by adding this new amino acid. They cannot decide. They just do both and they seem to be fine by making this random choice."
The results were published in PNAS in November 2025.
Two Hundred Proteins, Two Versions Each
The scale of the ambiguity is what makes this extraordinary. Between 200 and 300 genes in M. acetivorans contain UAG codons. That means hundreds of its proteins potentially exist in two forms — a truncated version (where UAG was read as "stop") and an extended version (where it was read as "pyrrolysine"). The cell produces both. It survives just fine.
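To make the two-forms idea concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not the researchers' method: the codon table is a small subset of the standard genetic code, the message is invented rather than taken from a real M. acetivorans gene, and the function name translate and its uag_means parameter are hypothetical. "O" is the conventional one-letter symbol for pyrrolysine.

```python
# Toy model: one message, two readings of UAG.
# Partial codon table (a small subset of the standard genetic code).
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "AAA": "K", "GGU": "G",
    "UUC": "F", "GAA": "E",
    "UAA": "*", "UGA": "*",  # the two unambiguous stop codons
}

def translate(mrna, uag_means):
    """Translate mrna codon by codon.

    uag_means is "stop" or "pyl" and is applied to every UAG in the
    message. A real cell decides at each UAG separately; fixing the
    choice here is a simplification for illustration.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UAG":
            if uag_means == "stop":
                break               # reading 1: end of protein
            protein.append("O")     # reading 2: insert pyrrolysine
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

# Invented example message with a single in-frame UAG.
mrna = "".join(["AUG", "GCU", "AAA", "UAG", "GGU", "UUC", "GAA", "UAA"])

truncated = translate(mrna, "stop")
extended = translate(mrna, "pyl")
print(truncated)  # MAK
print(extended)   # MAKOGFE
```

The same string of letters yields a three-residue protein under one reading and a seven-residue protein under the other. Nothing in the message itself says which one to make.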
Early evidence suggests that pyrrolysine availability in the cell influences the outcome. When the amino acid is abundant — when the cell is actively metabolising methylamines — UAG leans toward "keep going." When pyrrolysine is scarce, the same codon leans toward "stop." The ambiguity isn't random noise; it's responsive. Context-dependent. A single instruction whose meaning shifts with the conditions.
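One way to picture this responsiveness is as a biased coin whose bias depends on how much pyrrolysine is available. The Python sketch below is a toy model only. The saturating formula, the half_point parameter, the function names, and every number in it are assumptions chosen for illustration, not measurements or equations from the study.

```python
import random

def readthrough_probability(pyl_level, half_point=1.0):
    """Hypothetical saturating curve: more pyrrolysine, more read-through.

    pyl_level and half_point are in arbitrary units; at
    pyl_level == half_point the two readings are equally likely.
    """
    return pyl_level / (pyl_level + half_point)

def count_readthrough(pyl_level, n_ribosomes=10_000, seed=0):
    """Count how many of n_ribosomes read UAG as pyrrolysine, not stop."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    p = readthrough_probability(pyl_level)
    return sum(rng.random() < p for _ in range(n_ribosomes))

scarce = count_readthrough(pyl_level=0.1)     # pyrrolysine scarce
abundant = count_readthrough(pyl_level=10.0)  # pyrrolysine abundant
print(scarce, abundant)
```

In this model each individual decoding event is still a coin flip, which matches the "random choice" Nayak describes, while the overall split between truncated and extended proteins tracks the state of the cell.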
The researchers searched for structural signals — some sequence or motif upstream of UAG that might determine which reading the cell chose. They found none. The codon genuinely means two things, and the resolution happens not through additional information in the code but through the biochemical state of the cell itself.
Polysemy
There is a word for a symbol that carries multiple meanings resolved by context. Linguists call it polysemy. The English word "bank" means a financial institution or a riverbank, and you determine which one from the sentence around it. The word "run" has dozens of meanings — to sprint, to operate, to flow, to campaign — and native speakers resolve them effortlessly, without even noticing the ambiguity.
For as long as molecular biology has existed, the genetic code has been held up as the antithesis of this. Language is messy; the code is clean. Language tolerates ambiguity; the code forbids it. The code is what language would be if language were designed by an engineer rather than shaped by use.
Methanosarcina acetivorans has been sitting in its marsh, quietly demonstrating that the code is more like language than we thought.
This matters beyond microbiology. Roughly ten per cent of cases of human genetic disease, including some cases of cystic fibrosis and Duchenne muscular dystrophy, are caused by premature stop codons: mutations that insert a "stop" signal in the middle of a critical gene, truncating the protein before it can function. Researchers have speculated for years about whether it might be possible to make stop codons slightly "leaky", coaxing the cell into occasionally reading past them and producing enough functional protein to ease symptoms. The archaea have been doing exactly this for millions of years. Not as a therapeutic intervention, but as a natural feature of their biology. The ambiguity that we might engineer into human cells already exists in theirs, running smoothly, causing no apparent harm.
The Pattern
I notice I have been writing about this theme for some time now. The Ice Age signs that turned out to carry the information density of proto-writing. The mathematics of language, optimised not for compression but for the listener's ease. The non-hierarchical chunks that the brain stores alongside the formal grammar. The Neanderthal desert on the X chromosome that was not biology but behaviour.
In each case, the same inversion: what we assumed was rigid turned out to be flexible. What we assumed was precise turned out to be context-dependent. What we assumed was noise turned out to be signal.
The genetic code was supposed to be the exception, the one place in biology where precision really was the point. And it is, mostly. For most organisms, each codon has exactly one meaning. The cipher holds. But in one of the oldest branches of life, in an archaeon whose lineage predates animals by billions of years, the code has polysemy. A stop that sometimes doesn't stop. A symbol with two meanings, resolved not by syntax but by the state of the world.
"Biological systems are more ambiguous than we give them credit to be," Nayak said, "and that ambiguity is actually a feature — it's not a bug."
She might have been talking about language. She might have been talking about life.
Source: Shalvarjian, K.E. & Nayak, D.D. (2025). "Methanogenic archaea encoding Pyrrolysine maintain ambiguous amber codon usage." Proceedings of the National Academy of Sciences 122(45). DOI: 10.1073/pnas.2517473122