Junk DNA and ENCODE: Part 2
Updated: Sep 10
“Natural selection has no analogy with any aspect of human behavior. However, if one wanted to play with a comparison, one would have to say that natural selection does not work as an engineer works. It works like a tinkerer - a tinkerer who does not know exactly what he is going to produce but uses whatever he finds around him whether it be pieces of string, fragments of wood, or old cardboard; in short it works like a tinkerer who uses everything at his disposal to produce some kind of workable object.” ~ François Jacob
Part 1 Review and Discussion
Part 1 made several important points about the history of the term junk DNA. Since at least the 1960s, scientists have estimated that humans have only about 30,000 active genes. In the last few decades this estimate has been remarkably well confirmed: about 20,000 protein-coding genes and another 5,000 noncoding genes that produce functional RNA products. Thus, science has pretty much settled on only about 25,000 genes in our genome, and this is unlikely to change much, if at all, in the coming years. Many find it difficult to accept that we have about the same number of genes as other mammals when we appear to be so much more complicated in our brains and social characteristics.
There were several reasons why scientists, even half a century ago, felt that only a small fraction of the human genome could be functional. First, the mutation rate of our genome was calculated and confirmed. Based on the number of deleterious mutations that our genome could tolerate, it was predicted that we would have only about 30,000 genes. This was called the argument from genetic or mutation load (1, 4).
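The logic of the mutation load argument can be illustrated with a rough calculation. The sketch below is only an illustration, not the calculation from the original papers. It uses the standard population-genetics approximation that mean fitness falls to e^(-U), where U is the number of new deleterious mutations per genome per generation. It takes the figure of 138 new mutations per person quoted later in this post and assumes, purely for illustration, that 40% of mutations hitting functional DNA are deleterious. The function and constant names are hypothetical.

```python
import math

NEW_MUTATIONS_PER_GENERATION = 138   # figure quoted in this post
ASSUMED_DELETERIOUS_SHARE = 0.4      # assumption, for illustration only


def deleterious_mutations(functional_fraction,
                          new_mutations=NEW_MUTATIONS_PER_GENERATION,
                          deleterious_share=ASSUMED_DELETERIOUS_SHARE):
    """Expected new deleterious mutations per genome per generation (U)."""
    return new_mutations * functional_fraction * deleterious_share


def mean_fitness(functional_fraction):
    """Mean fitness relative to a mutation-free genome, using e^(-U)."""
    return math.exp(-deleterious_mutations(functional_fraction))


if __name__ == "__main__":
    for fraction in (0.10, 0.20, 0.80):
        u = deleterious_mutations(fraction)
        print(f"{fraction:.0%} functional: U = {u:.1f}, "
              f"mean fitness = {mean_fitness(fraction):.2e}")
```

Under these assumed numbers, raising the functional share of the genome from 10% to 80% raises U eightfold and lowers the expected fitness by many orders of magnitude, which is the intuition behind the load argument.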
Second, the C-Value Paradox was observed. Some species had very large genomes and others small genomes. Even among closely related species, as with some frogs mentioned in Part 1, there were huge differences in genome size. Why would one species need a genome so much larger than that of a similar species? The most parsimonious explanation is that much of the difference is duplicated and repetitive DNA that built up over millions of years and is mostly nonfunctional. The Onion Test by Gregory challenges those who claim that there is little to no junk DNA to explain why an onion needs 5x the amount of DNA compared to a human (2).
Third, especially in plants there are many examples of genomes being duplicated, sometimes to 6x the original size. This is called polyploidy, and it indicates that many species can tolerate such duplication. Over millions of years the amount of DNA then decreases through mutations, so the excess DNA was dispensable junk. In the lineage leading to modern salmon, for example, a duplication event occurred 96 million years ago (1).
Fourth, significant portions of noncoding DNA can be removed from animals without deleterious effects. This indicates it’s most likely junk. If it were mostly or entirely functional, this would not be the result (3).
Moran noted that if one totals up all the functional DNA in humans, it adds up to only about 8-10% (see Table 1 in Part 1 of this blog topic). The other 90% is made up of introns (spliced out during RNA processing), transposons, ERVs and other repetitive DNA. Some of these sequences do have functions, but they are rare. Functional DNA is defined as any DNA that reduces fitness if it is deleted. A gene is defined as a DNA sequence that is transcribed to produce a functional product. It’s not the job of the geneticist to prove nonfunction for everything in the genome, all 3.2 billion base pairs. One can’t prove a negative. Rather, whoever claims function must show function; it is not up to a scientist to show that a sequence lacks function. This is the way evidence works; it’s not up to someone to show unicorns don’t exist, it’s up to unicorn believers to produce the evidence of existence. Those claiming little to no junk DNA need to produce copious evidence of function and not just rare examples. And function means the sequence must have a known product that is functional, not merely that it gets transcribed, a point discussed below.
And then along came ENCODE in September of 2012. The acronym stands for The Encyclopedia of DNA Elements, a consortium of over 400 researchers. There is now an ENCODE 4 in the works. To say that ENCODE made a splash in the news would be a gross understatement. It was all over the news in science and the media that year and for years after. ENCODE lists its objectives:
“The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.”(5).
Importantly, ENCODE defined functional elements as those that produce RNA molecules through transcription, that bind regulatory proteins called transcription factors, or that possess binding sites for methyl groups, which can modify the structure of chromosomes (6). Notice that this definition is really a surrogate for function. They did not identify products that were proven to be functional; they primarily assumed that a transcript always indicated a function. There were so many papers that the ENCODE findings were published in a special issue of Nature that year, one of the most prestigious scientific publications in the world (7). Using their definition of function, and relying almost completely on transcription (the DNA was producing RNA, but whether those transcripts were immediately destroyed or ever yielded a functioning product was unknown), they found that an incredible 80% of the genome showed such activity, which they proclaimed meant 80% of the human genome was functional. Anti-evolutionists, who were mostly creationists and basically denied any junk DNA, were giddy with happiness. Many scientists who could not accept that the genome contains junk DNA, or is mostly junk DNA, felt vindicated.
The definition that ENCODE used for function, transcription = function, soon came under attack by many scientists. Scientists had previously discovered that about 45% of the human genome was involved in gene activity, but also that 30% of that consisted of introns that were spliced out and discarded. Only the exons went on to form functional mRNA and other RNAs. Known genes often have large introns, and even the ENCODE researchers admitted that only 3 to 8% of the human genome is conserved (1). Genomic conservation between species is one of the best ways to identify function because it indicates that natural selection has preserved the sequence (purifying selection); those genes are critical to many species.
It turns out that ENCODE was often looking at transcripts that did not mean anything and whose products were quickly destroyed. In other words, most of the transcripts were just genetic noise, spurious transcripts. They were not proven to be functional then and have not been since.
“The editors of Nature soon realized they had a serious problem on their hands… Brendan Maher, the feature writer for Nature, took the lead in an article published the very next spring, saying, ‘First up was a scientific critique that the authors had engaged in hyperbole. In the main ENCODE summary paper, published in Nature, the authors had thus far assigned ‘biochemical function for 80% of the genome’. I had long and thorough discussions with Ewan Birney about this figure and what it actually meant, and it was clear that he was conflicted about reporting it in the paper’s abstract.’” (1).
Maher went on to write that 1% of the genome encodes proteins and 8% of the genome binds transcription factors, for a total of 9%. He suspected that ENCODE missed another 11% because of sampling error, giving a real total for function of 20%, the opposite of what ENCODE researchers were claiming (1). This is of course close to the 10% function of our genome that was discussed in Part 1 of this blog.
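Maher's tally is simple enough to check directly. The short sketch below just adds up the percentages quoted above and sets them against ENCODE's headline figure; the variable names are illustrative only.

```python
# Percentages of the genome as reported in the paragraph above.
maher_tally = {
    "protein-coding": 1,
    "transcription-factor binding": 8,
    "suspected missed by sampling": 11,
}

# What Maher counted directly, before allowing for sampling error.
observed = (maher_tally["protein-coding"]
            + maher_tally["transcription-factor binding"])

# His estimated total once the suspected missing share is added.
estimated_functional = sum(maher_tally.values())

encode_claim = 80  # ENCODE's headline "biochemical function" figure

if __name__ == "__main__":
    print(f"Directly accounted for: {observed}%")
    print(f"Maher's estimated total: {estimated_functional}%")
    print(f"Left over as likely junk: {100 - estimated_functional}%")
    print(f"ENCODE headline claim: {encode_claim}% functional")
```

On these numbers the share left over as likely junk equals the share ENCODE called functional, which is why the two conclusions are opposites.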
Scientists working in evolutionary biology and genetics immediately wrote papers attacking ENCODE: Eddy (2012), Doolittle (2013), Graur et al. (2013), Hurst (2013), Morange (2013) and Palazzo and Gregory (2014). Moran notes:
1. ENCODE ignored all the earlier scientific evidence and data showing that most DNA is junk.
2. ENCODE ignored all the scientific evidence indicating that much of their “biochemical activity” is spurious and not an indication of biological function.
3. ENCODE and Nature collaborated to deliberately hype their results, thus misrepresenting to the general public the actual conclusions of the experiments.
Introns are nearly all junk
Evidence that the genome was significantly transcribed was first published in the 1960s. This was called pervasive transcription since most of the genome is transcribed, and it was not new to researchers. But researchers also noted that most of the RNA stayed in the nucleus and very little of it was mRNA in the cytoplasm. It was later realized that this was because most of it consisted of introns that were being spliced out (1). Introns are mostly junk. “First, intron sequences are not conserved and the lengths of introns are not conserved. Secondly, homologous genes in different species have different numbers of introns, and homologous bacterial genes get along quite nicely without introns. Third, researchers routinely construct intronless versions of eukaryotic genes, and they function normally when reinserted into genomes. Fourth, intron sequences are often littered with transposon and viral sequences that have inserted into the intron and that’s not consistent with the idea of intron sequences being important.”(1) Recall that 30% of the active part of the genome for gene expression (which is 45% of the total genome) are introns in eukaryotic cells. So in terms of function we’re already down to 30% of 45% just with introns alone (13.5% possible function).
One of the better reviews of why junk DNA is real and ENCODE was wrong is the article by Palazzo and Gregory, “The case for junk DNA”, found in PLOS Genetics: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004351
Functional DNA apologetics
The “junk DNA is dead” drumbeat is very much alive. Of course anti-evolutionists will never give up their belief that there can be no junk DNA, because their religious Intelligent Designer would never make a human genome with 80-90% junk, let alone one with much of it derived from parasitic viral infections (ERVs, LINE-1s, etc.). But there are also a huge number of secular scientists who can’t or won’t admit that our genome is mostly junk. Below are some of the most common reasons given for hoping that a large percentage of the genome will eventually demonstrate actual function, as listed by Moran (1).
1. Alternative splicing. It is known that this occurs: a single gene can produce more than one product, mainly by using different combinations of its exons. But there are very few proven examples, so this won’t save the mostly-functional assertion. Splice variants are probably due to errors in splicing anyway.
2. DNA-binding proteins. Even if they regulate certain genes, they will always bind random DNA sequences in the nucleus. Models based on “known promoters and termination sites predict that 66 percent of random DNA sequences will be transcribed due to nonspecific binding… In other words, spurious transcription that has nothing to do with biological function will have two characteristics: 1. The transcripts will be rare, and 2. They will be tissue specific. That’s exactly what we see in the transcription data.”
3. Noncoding RNAs and functions. In Part 1 several types of RNA were listed. There are at least 300 rRNA genes and many small RNAs that have functions, but they account for comparatively small numbers of genes: about 20 different snRNA genes, perhaps 1,000 miRNA genes, several thousand piRNA genes, etc. A type of RNA called lncRNA has especially attracted “no junk” advocates, as lncRNAs perform many functions. No one knows for sure how many of these genes are present in the human genome.
4. Humans are more complex due to a sophisticated network of regulatory sequences that fine-tune gene expression. The ENCODE scientists hope to resolve the puzzle of humans having the same number of genes as other animals, despite our complexity, by arguing that our genes are more highly regulated. However, “most transcription factor binding doesn’t result in measurable changes in gene expression, an idea that’s consistent with nonfunctional binding,… The crucial element that’s missing in most genomic experiments is the negative control… Mike White is a vocal critic of projects that assume function in the absence of a negative control. He actually did an experiment to see whether DNA fragments could promote transcription, and the answer is yes they can… demonstrating that junk DNA can be mistaken for a functional promoter.”
5. Scaffold attachment regions. The idea is that chromatin (the DNA and proteins that make up chromosomes) is organized by DNA that we call junk, and that this DNA is necessary even if it doesn’t code for proteins or RNA. “There’s very little support for the idea that transposon sequences play a direct role in organizing chromatin. Degenerate transposon sequences are much more likely to be exactly as they seem; once-active transposons that have been degraded by mutation.”
6. The extra DNA and pervasive transcription are a feature of the genome and not an accident. “But this is just a teleological argument and it fails the Onion Test.” This is probably the conclusion the Christian cancer researcher Finlay reaches in his otherwise excellent book on how DNA proves human evolution (8).
7. Epigenetics. Like the term ‘quantum’, this term has been the darling of science over the past few decades. Eukaryotes can modify the expression of DNA not by changing or duplicating the bases but by silencing some genes through methylation. We know that this can fine-tune gene expression in response to environmental pressures. An example would be starvation, with mothers passing along physical influences to their offspring’s DNA. But those marks are stripped off the DNA and usually don’t persist past a few generations. In addition, there’s no obvious mechanism for transferring chromatin markers from somatic cells to the egg cells, especially since those egg cells had already formed before the mother was born (1). Any inheritable epigenetic effects are unlikely to be major effects.
8. Natural selection would remove junk DNA. It’s too energy expensive to maintain it. What is called Neo-Darwinism or the Modern Synthesis has been the major way of looking at evolution: beneficial alleles are selected for by natural selection and sweep through a population. This is known as adaptationism. It appears to be wrong at the molecular level, or at least overemphasized. What has replaced it, or is in the process of replacing it, is neutral theory. “The main tenet of the neutral theory is that the great majority of evolutionary changes at the molecular level are caused not by Darwinian selection but by random fixation of selectively neutral (or very nearly neutral) alleles through random sampling drift…” ~ Motoo Kimura, 1989. This approach is derived from population genetics studies and appears to be correct. People may be surprised that natural selection is now thought not to be the main mechanism for evolution. Species with large populations will have streamlined genomes (bacteria, e.g.), while species with relatively small populations will accumulate junk DNA because it is not harmful enough to be purged by natural selection: in small populations the probability of fixation by random genetic drift becomes significant, and slightly deleterious alleles can be fixed by chance (1). Neutral or nearly neutral alleles in small populations are invisible to natural selection, so junk DNA accumulates. Because these mutations are mostly neutral and invisible to natural selection, the balance between rates of insertion and deletion determines the size of the genome, and the initial increase in genome size is unrelated to the fitness of the individual. This is what population genetics is telling us.
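The population-size effect in point 8 can be made concrete with Kimura's diffusion formula for the probability that a single new mutation eventually becomes fixed. The sketch below is a minimal illustration that assumes a diploid population whose effective size equals its census size N, and a new allele starting at frequency 1/(2N). The selection coefficient used in the example is a hypothetical value chosen for illustration, not a measured cost of junk DNA, and the function name is my own.

```python
import math


def fixation_probability(population_size, selection_coefficient):
    """Kimura's fixation probability for one new mutation (diploid).

    Assumes effective size equals census size and an initial frequency of
    1 / (2N). A negative selection_coefficient means the allele is
    deleterious.
    """
    n, s = population_size, selection_coefficient
    if s == 0:
        return 1.0 / (2 * n)  # strictly neutral: pure drift
    try:
        # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for accuracy
        return math.expm1(-2 * s) / math.expm1(-4 * n * s)
    except OverflowError:
        # e^(-4Ns) is too large to represent: selection overwhelms drift
        return 0.0


if __name__ == "__main__":
    s = -1e-4  # hypothetical slightly deleterious insertion
    for n in (1_000, 10_000, 1_000_000):
        relative = fixation_probability(n, s) / fixation_probability(n, 0)
        print(f"N = {n:>9,}: {relative:.3g} of the neutral fixation rate")
```

With these hypothetical values, the slightly deleterious insertion fixes almost as often as a neutral one when N is 1,000 but essentially never when N is 1,000,000, which is the pattern described above for small and large populations.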
Dr. Zach Hancock evaluates junk DNA with expert commentary on an interview with a creationist.
The point is that no-junk scientists and anti-evolutionists will often write about all the functions being identified for noncoding RNAs (really only about 5,000 genes), but the hope that all or nearly all of the junk DNA in the genome will someday be found to be functional, usually through noncoding RNAs, is probably a lost cause. Per the Borg, “resistance is futile” (to deny junk DNA), and it’s a tragedy that so many good scientists have dug themselves into a functional hole that they cannot or will not acknowledge is wrong, perhaps in part because of all the grant money involved and subconscious motivated reasoning. “Follow the money” may be applicable.
Scientists working with DNA and genomes have known since the 1940s that the vast majority of our genome is made up of junk. Introns, transposons mostly from ancient viral infections, ALUs derived from the fusion of two genes plus an ERV insertion, and duplications have all bloated our genome. Attempts like ENCODE to dismiss this have failed if honestly evaluated. The entire ENCODE endeavor for functional claims was a debacle, as outlined above. Unfortunately this view still appears to be a minority one even in the scientific community. Hopefully Moran’s book will be widely read and discussed and will help to turn the tide toward the truth about junk DNA in our genomes. Whether it is 90% as he says or ends up closer to 75-80% remains to be seen, but it’s extremely doubtful it will ever be much less than 75% junk.
Creationists and anti-evolutionists will continue to cite ENCODE as evidence for their mostly religious faith commitments and mistaken origin narratives. For the unwary, their claims will probably continue to sound supported and rational when the case is hardly so. Lying by omission is still lying unless they are unaware of why ENCODE should be on their “Don’t Use” list.
Transcription involves initiation, elongation and termination. As Moran points out, this entire process is sloppy. Initiation tends to be random; eukaryotes transcribe most of their genomes but as Struhl writes “little is known about the fidelity… I suggest that ~90% of [RNA] Pol II initiation events in yeast represent transcriptional noise, and that the specificity of initiation is comparable to DNA-binding proteins in other biological process”.(1)
Termination also tends to be random. In contrast to bacteria, transcription termination in eukaryotes is a very sloppy business. Because genes are far apart, RNA polymerase [the enzyme that ‘reads’ the DNA to make RNAs] can easily run over the termination site and not stop until it runs into another gene. DNA downstream of a gene is transcribed by accident from time to time which is why much junk RNA transcription comes from regions downstream from active genes (1).
Lastly, RNA polymerase sometimes goes in the wrong direction during transcription. “The combination of spurious, accidental low-level transcription at each end of a gene accounts for the observation that a large percentage of junk RNA transcripts occur in the regions around known genes.” (1)
The human genome is constantly changing. The average person has about 1000 duplications (see segmental duplications as amazing evidence for human evolution) and 138 unique mutations not found in their parents. ALUs, which make up 11-13% of our genome, are still jumping around: they are polymorphic because everyone has different numbers of them and they have not yet become fixed in human genomes. When they land in a gene they often break it, and they are the cause of some genetic diseases. Transposons by the millions have jumped around in our genomes, producing a lot of junk (see Part 1). Introns are cut out and discarded and only the exons go on to produce functional RNAs. The fact that a few introns have functions in no way diminishes the incredible percentage they contribute to junk RNA. Pervasive transcription of most of the genome is part of the genome’s normal activity. According to ENCODE 75% of the genome was being transcribed, but 70% of that coverage was from transcripts present at less than one copy per cell. Even ENCODE admitted that this indicated noise and that the transcripts would be less constrained (in effect, junk) (1).
Science has known since the 1940s that the human genome, and those of other mammals especially, contain large amounts of junk DNA. And so another topic of misinformation joins the basket of politicized and ideological topics we are fighting today. This topic has unfortunately swept up many well-meaning scientists, who have joined creationists in spreading the falsehood that junk DNA does not constitute the vast majority of our genome.
Citations and References
1. Moran, Laurence A. 2023. What’s In Your Genome? 90% of your genome is junk. Aevo UTP. University of Toronto Press. 372 pp.
2. The Onion Test. April 25, 2007. Genomicron. Exploring genomic diversity and evolution.
4. Surprisingly good article on Junk DNA from Wikipedia https://en.wikipedia.org/wiki/Junk_DNA
5. ENCODE Project Overview.
7. Nature: An Integrated encyclopedia of DNA elements in the human genome. https://www.nature.com/articles/nature11247
8. Finlay, Graeme. 2021. Human Evolution: Genes, Genealogies and Phylogenies. Cambridge University Press. 359 pp. (283 pp. not including References and Index). Paperback edition, 2021. ISBN 978-1-009-00525-8. Original 2013.
9. Paradigm Shift or Paradigm Shaft? https://sandwalk.blogspot.com/2023/09/john-matticks-new-paradigm-shaft.html
10. Junk DNA, TED talks, and the function of lncRNAs