
Director’s Blog: Junk No More


Some moments in science are so transformative that they spill out of the usual academic circles to become headlines in the New York Times and, for a day or two, part of the public conversation. This summer we had two such stories from very different areas of science: the Curiosity rover's landing on Mars and the report that older fathers transmit more spontaneous mutations to their children. Earlier this month another report, actually 30 separate papers, marked a transformative moment in our understanding of the genome. The Encyclopedia of DNA Elements (ENCODE) project, involving 442 scientists from 32 labs around the world, described an entirely new picture of the human genome, one that will permanently change how we view our DNA and ourselves.

A decade ago, when the Human Genome Project was completed, one of the biggest surprises was how few genes the human genome contains. While we expected to find at least 100,000 genes among our 3 billion bases of DNA, the actual number was closer to 20,000, about the same as most other mammals and not many more than the lowly nematode. Equally surprising, only about 2% of this vast sequence was devoted to genes, that is, sequences coding for RNA that is, in turn, translated into protein. What about the other 98%? On the assumption that Nature is inherently conservative, this remainder was labeled “junk DNA.” But the sheer quantity of junk was puzzling. Imagine a 100-page book with only 2 pages spelling out meaningful words. Some considered this the dark matter of the genome; others thought it was simply garbage, perhaps a residue of our evolutionary past. Either way, the term “junk” held out the promise that this non-coding part of the genome, like things stored in a biological attic, might someday prove useful.

ENCODE explored this attic using new tools to walk base by base through our genomes. The bottom line: nearly 80% of the genome carries information that is read out or, in the terms of the ENCODE papers, transcribed. Much of this DNA is regulatory, consisting of sequences that switch genes on or off. Some of these regulatory sequences sit just upstream or downstream of their target gene, but others are surprisingly distant, as if a note on page 30 highlighted a word or sentence on page 50. The most unexpected story, however, is the number and variety of non-coding RNAs. ENCODE describes nearly 20,000 DNA sequences that code for RNA that is never translated into protein. These non-coding RNAs influence a range of cellular processes, some in ways we had never suspected. There are also pseudogenes, copies of genes that have lost the ability to code for protein, some of which are active in certain cells but not others. All of this suggests that our gene-centric view obscured the richness of information encoded in the 3 billion bases of DNA in each of our cells. The genome may be less like a book and more like a busy website, with links, images, and icons specifying many levels of information, all precisely regulated and coordinated.

What is the relevance of this landmark for understanding mental disorders? There are three surprises, and probably many more to come. First, parts of the non-coding genome appear unique to humans, especially regions that seem to be active in specifying neurodevelopment. Second, most of the previously described genetic differences associated with autism and other mental disorders (well over 100 such findings by now) lie not in the coding regions of the genome but in the vast areas previously considered junk. With the ENCODE maps, we can begin to identify the functional importance of these non-coding regions, whether they act by regulating distant genes or by altering one of the thousands of non-coding RNAs. Finally, ENCODE has described roughly 200,000 places where proteins bind to the genome. The DNA in each cell, two copies of the 3-billion-base genome stretching roughly 2 meters in total, is packed into a microscopic nucleus and is literally covered with proteins, which coordinate their activity in complex ways to regulate gene transcription. These transcription factors alter the timing and amount of gene expression, mechanisms likely to be at the heart of mental disorders, whether those disorders stem mainly from environmental stressors or from genetic factors.

At the end of this century, when we look back at the major achievements in biology, the discoveries about our genome made in this first decade are likely to be seen as a landmark for understanding individual variation and human evolution. A big gap remains before we can turn these findings into precise diagnostics or therapeutics for serious mental illness, but the tools are now available and, with ENCODE, we have new maps to follow. ENCODE moves us beyond a gene-centric view of the genome. Yesterday’s junk is, in fact, a treasure trove of interesting new leads.
