Junk DNA and ENCODE: Part 2

Jon Peters
Sep 7, 2023
Updated: Feb 3

“Natural selection has no analogy with any aspect of human behavior. However, if one wanted to play with a comparison, one would have to say that natural selection does not work as an engineer works. It works like a tinkerer - a tinkerer who does not know exactly what he is going to produce but uses whatever he finds around him whether it be pieces of string, fragments of wood, or old cardboard; in short it works like a tinkerer who uses everything at his disposal to produce some kind of workable object.” ~ François Jacob

Part 1 Review and Discussion

In Part 1, several important points were made about the history of the term junk DNA. Scientists had estimated, especially from the 1960s onward, that there were only about 30,000 genes active in humans. In the last few decades this has been remarkably well confirmed, with about 20,000 protein-coding genes and another 5,000 noncoding genes that produce functional RNA products. Thus, science has pretty much settled on only about 25,000 genes in our genome. This is unlikely to change much, if at all, in the coming years. Many find it difficult to accept that we have about the same number of genes as other mammals when we appear to be so much more complicated in our brain and social characteristics.

There were several reasons why scientists even a half century ago felt that only a small amount of the human genome could be functional. First, the mutation rate of our genome was calculated and confirmed. Based on the number of deleterious mutations that our genome could tolerate, it was predicted that we would only have about 30,000 genes. This was called the argument from genetic or mutation load (1, 4).

Secondly, the C-Value Paradox was observed. Some species had very large genomes and others small genomes. Even among closely related species, as with some frogs mentioned in Part 1, there were huge differences in genome size. Why would one species need a genome so much larger than that of a similar species? If much of that difference was duplicated and repetitive DNA that built up over millions of years and was mostly nonfunctional, this would be the most parsimonious explanation. The Onion Test by Gregory challenged those who claim that there is little to no junk DNA to explain why an onion needs 5x the amount of DNA compared to a human (2).

Third, especially in plants there are many examples of genomes being duplicated, sometimes to 6x the original size. This is called polyploidy. Many species can tolerate it, and over millions of years the amount of DNA decreases through mutations - thus the excess DNA was dispensable junk. In the ancestors of modern salmon, for example, a duplication event occurred 96 million years ago (1).

Fourth, significant portions of noncoding DNA can be removed from animals without deleterious effects. This indicates it’s most likely junk. If it was mostly or all functional this would not be the result (3).

Moran noted that if one totals up all the functional DNA in humans it only adds up to about 8 - 10%. See Table 1 in Part 1 of this blog topic. The other 90% is made up of introns (spliced out during RNA processing), transposons, ERVs and other repetitive DNA. Some of these sequences do have functions, but such cases are rare. Functional DNA is defined as any DNA whose deletion reduces fitness. A gene is defined as a DNA sequence that is transcribed to produce a functional product. It’s not the job of the geneticist to prove nonfunction for everything in the genome - 3.2 billion base pairs. One can’t prove a negative. Rather, anyone claiming function must show function; it is not up to a scientist to show that a sequence lacks function. This is the way evidence works; it’s not up to someone to show unicorns don’t exist, it’s up to unicorn believers to produce the evidence of existence. Those claiming little to no junk DNA need to produce copious evidence of function and not just rare examples. And function means the sequence must have a known product that is functional, not merely that it gets transcribed, a point discussed below.

ENCODE

And then along came ENCODE in September of 2012. The acronym stands for The Encyclopedia of DNA Elements, a consortium of over 400 researchers. There is now an ENCODE 4 in the works. To say that ENCODE made a splash in the news would be a gross understatement. It was all the news in science and the media that year and for years after. ENCODE lists its objectives:

“The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.”(5).

Importantly, ENCODE defined functional elements as those that produced RNA molecules through transcription, that bound regulatory proteins called transcription factors, or that possessed binding sites for methyl groups which can modify the structure of chromosomes (6). Notice that this definition is really a surrogate for function. They did not identify products that were proven to be functional; they primarily assumed that a transcript always indicated a function. There were so many papers that the ENCODE findings were published in a special issue of Nature that year, one of the most prestigious scientific publications in the world (7). Using their definition of function, and relying almost completely on transcription (the DNA was producing RNA products, but whether those products were immediately destroyed or ever yielded a functioning product was unknown), they found an incredible 80% transcription, which they proclaimed meant 80% of the human genome was functional. Anti-evolutionists, who were mostly creationists and basically denied any junk DNA, were giddy with happiness. Many scientists who could not accept junk DNA, or mostly junk DNA, felt vindicated.

ENCODE stumbles

The definition that ENCODE used for function, transcription = function, soon came under attack by many scientists. It turns out that scientists had previously discovered that about 45% of the human genome was involved in gene activity, but also that 30% of that consisted of introns that were spliced out and discarded. Only the exons went on to form functional mRNA and other RNAs. Known genes often have large introns and even the ENCODE researchers admitted that only 3 to 8% of the human genome is conserved (1). And genomic conservation between species is one of the best ways to define function because it’s an indication of purifying selection in nature; those genes are critical to many species.

It turns out that ENCODE was often looking at transcripts that did not mean anything and whose products were quickly destroyed. In other words, most of the transcription was just genetic noise: spurious transcripts. They were not proven to be functional and have yet to be.

“The editors of Nature soon realized they had a serious problem on their hands… Brendan Maher, the feature writer for Nature, took the lead in an article published the very next spring, saying, ‘First up was a scientific critique that the authors had engaged in hyperbole. In the main ENCODE summary paper, published in Nature, the authors had thus far assigned “biochemical function for 80% of the genome”. I had long and thorough discussions with Ewan Birney about this figure and what it actually meant, and it was clear that he was conflicted about reporting it in the paper’s abstract.’” (1).

Maher went on to write that 1% of the genome encodes proteins and 8% of the genome binds transcription factors, for a total of 9%. He suspected that ENCODE missed another 11% because of incomplete sampling, giving a real total for function of 20%, the opposite of what ENCODE researchers were claiming (1). This is of course close to the 10% function of our genome that was discussed in Part 1 of this blog.

Immediately scientists working in evolutionary biology and genetics wrote papers attacking ENCODE: Eddy (2012), Doolittle (2013), Graur et al. (2013), Hurst (2013), Morange (2013) and Palazzo and Gregory (2014). Moran notes:

1. ENCODE ignored all the earlier scientific evidence and data showing that most DNA is junk.

2. ENCODE ignored all the scientific evidence indicating that much of their “biochemical activity” is spurious and not an indication of biological function.

3. ENCODE and Nature collaborated to deliberately hype their results, thus misrepresenting to the general public the actual conclusions of the experiments.

Introns are nearly all junk

Evidence that the genome was significantly transcribed was first published in the 1960s. This was called pervasive transcription since most of the genome is transcribed. This was not new to researchers. But researchers also noted that most of the RNA was in the nucleus and very little was mRNA in the cytoplasm. It was later realized that this was because the nuclear RNA was mostly introns that were being spliced out (1). Introns are mostly junk. “First, intron sequences are not conserved and the lengths of introns are not conserved. Secondly, homologous genes in different species have different numbers of introns, and homologous bacterial genes get along quite nicely without introns. Third, researchers routinely construct intronless versions of eukaryotic genes, and they function normally when reinserted into genomes. Fourth, intron sequences are often littered with transposon and viral sequences that have inserted into the intron and that’s not consistent with the idea of intron sequences being important.”(1) Recall that 30% of the active part of the genome for gene expression (which is 45% of the total genome) are introns in eukaryotic cells. So in terms of function we’re already down to 30% of 45% just with introns alone (13.5% possible function). A 2023 study confirms that only about 11% of the human genome is conserved. This is very close to the 10% function and 90% junk DNA claimed by Moran and others. By far most of that is introns.

https://sandwalk.blogspot.com/2023/10/only-107-of-human-genome-is-conserved.html?fbclid=IwAR2A5Ab0peE09C13jyOhyAT7cVl7u5CEwnyckPR634nzJQfOKZj3n3jlqho

One of the better reviews of why junk DNA is real and ENCODE was wrong is the article by Palazzo and Gregory, “The case for junk DNA”, in PLoS Genetics: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004351

Functional DNA apologetics

The “junk DNA is dead” drumbeat is very much alive. Of course anti-evolutionists will never give up their belief that there can be no junk DNA because their religious Intelligent Designer would never make a human genome with 80-90% junk, let alone much of it derived from parasitic viral infections (ERVs, LINE-1s, etc.). But there are also a huge number of secular scientists who can’t or won’t admit that our genome is mostly junk. Below are some of the most common reasons given for expressing hope that a large percentage of the genome will eventually demonstrate actual function, as listed by Moran (1).

1. Alternative splicing. It is known that this occurs - that a single gene can produce more than one product, mainly by using different combinations of the exons. There are very few proven examples. This won’t save the mostly-functional assertion. Splice variants are probably due to errors in splicing anyway.

"True alternative splicing is rare - less than 5% of all genes are alternatively spliced. However, when you analyze all of the transcripts in a tissue you will invariably detect many transcripts from junk DNA and many low abundance splice variants. Those transcripts and splice variants are due to transcription errors and splicing errors. Splicing errors arise from the presence of weak splice sites that are occasionally recognized by the normal spliceosome or by the splice factors responsible for true alternative splicing... Genome size also correlates with population size and the larger the genome the more junk DNA. Introns are mostly junk so large genomes have more introns and larger introns. Thus, in mammals there are lots of introns and they can be huge. This means there's a lot more opportunity for aberrant splicing at spurious splice sites. In species with smaller genomes, such as Diptera, there are fewer introns and they are much smaller than the mammalian introns. The target size for spurious splice sites is much smaller so there are fewer splicing errors per intron and a lot fewer per gene." https://sandwalk.blogspot.com/2023/10/the-number-of-splice-variants-in.html?fbclid=IwAR3GAIzehgLKjBqEK2xTgYX07hHED6zDGyzcJQcfzCevNjjLFcXFc7eTbTQ

2. DNA-binding proteins. Even if they regulate certain genes, they will always bind random DNA sequences in the nucleus. Models based on “known promoters and termination sites predict that 66 percent of random DNA sequences will be transcribed due to nonspecific binding… In other words, spurious transcription that has nothing to do with biological function will have two characteristics: 1. The transcripts will be rare, and 2. They will be tissue specific. That’s exactly what we see in the transcription data.”

3. Noncoding RNAs and functions. In Part 1 several types of RNA were listed. There are at least 300 rRNA genes and many small RNAs that have functions. But comparatively they account for small numbers of genes: snRNAs - 20 different genes, miRNAs perhaps 1000 genes, several thousand piRNA genes, etc. One type of RNA, lncRNAs, has especially attracted “no junk” advocates as they perform many functions. No one knows for sure how many of these genes are present in the human genome.

4. Humans are more complex due to a sophisticated network of regulatory sequences that fine tune gene expression. The ENCODE scientists hope to solve the problem of humans having the same number of genes as other animals despite our complexity because our genes are more highly regulated. However, “most transcription factor binding doesn’t result in measurable changes in gene expression, an idea that’s consistent with nonfunctional binding,… The crucial element that’s missing in most genomic experiments is the negative control… Mike White is a vocal critic of projects that assume function in the absence of a negative control. He actually did an experiment to see whether DNA fragments could promote transcription, and the answer is yes they can… demonstrating that junk DNA can be mistaken for a functional promoter.”

5. Scaffold attachment regions. The idea is that chromatin (makes up chromosomes - DNA and proteins) is organized by DNA that we call junk and is necessary even if it doesn’t code for proteins or RNA. “There’s very little support for the idea that transposon sequences play a direct role in organizing chromatin. Degenerate transposon sequences are much more likely to be exactly as they seem; once-active transposons that have been degraded by mutation.”

6. The extra DNA and passive transcription are a feature of the genome and not an accident. “But this is just a teleological argument and it fails the Onion Test.” This is probably the conclusion the Christian cancer researcher Finlay reaches in his otherwise excellent book on how DNA proves human evolution (8).

7. Epigenetics. Like the term ‘quantum’, this term has been the darling of science for the past few decades. Eukaryotes can modify the expression of DNA not by changing the bases or by duplication but by silencing some genes through methylation. We know that this can fine tune gene expression in response to environmental pressures. An example would be starvation and mothers passing along physical influences to their offspring’s DNA. But those marks are stripped off the DNA and usually don’t persist past a few generations. In addition there’s no obvious mechanism for transferring chromatin markers from somatic cells to the egg cells, especially since those egg cells had already formed before the mother was born (1). Any inheritable epigenetic effects are unlikely to be major effects.

8. Natural selection would remove junk DNA. It’s too energy expensive to maintain it. What is called Neo-Darwinism or the Modern Synthesis has been the major way of looking at evolution. Beneficial alleles would be selected for by natural selection and sweep through a population. This is known as adaptationism. It appears to be wrong at the molecular level or at least overemphasized. What has replaced it, or is in the process of doing so, is neutral theory. “The main tenet of the neutral theory is that the great majority of evolutionary changes at the molecular level are caused not by Darwinian selection but by random fixation of selectively neutral (or very nearly neutral) alleles through random sampling drift…” ~ Motoo Kimura, 1989. This approach is derived from population genetics studies and appears to be correct. People may be surprised that natural selection is now thought not to be the main mechanism of evolution at the molecular level. Species with large populations will have streamlined genomes (e.g., bacteria), and species with relatively smaller populations will accumulate junk DNA because it is not harmful enough to be purged by natural selection: in small populations the probability of fixation by random genetic drift becomes significant, and slightly deleterious alleles can be fixed by chance (1). Neutral or nearly neutral alleles in small populations will be invisible to natural selection and junk DNA will accumulate. Since mutations in small populations are mostly neutral and invisible to natural selection, the balance between rates of insertion and deletion determines the size of the genome, and the increase in the genome initially is unrelated to the fitness of the individual. This is what population genetics is telling us.
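
The population-size argument in point 8 can be made concrete with Kimura's diffusion approximation for the probability that a new mutation eventually becomes fixed. The sketch below is only illustrative. It assumes a diploid population whose effective size equals its census size N, additive selection with coefficient s, and a single new copy starting at frequency 1/(2N). The function names and the example values of N and s are assumptions chosen for illustration, not figures from Moran's book.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Kimura's diffusion approximation for a single new mutation.

    n: diploid population size (assumed equal to the effective size).
    s: selection coefficient (negative = deleterious, 0 = neutral).
    """
    if s == 0:
        # Neutral limit: a new allele fixes with probability 1/(2N).
        return 1.0 / (2 * n)
    try:
        # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for accuracy.
        return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)
    except OverflowError:
        # Deleterious allele in a very large population: effectively never fixes.
        return 0.0


def relative_to_neutral(n: int, s: float) -> float:
    """Fixation probability as a fraction of the neutral expectation."""
    return fixation_probability(n, s) * 2 * n


# Hypothetical slightly deleterious insertion (illustrative value only).
S = -1e-5
small = relative_to_neutral(10_000, S)      # small population
large = relative_to_neutral(1_000_000, S)   # population 100x larger

print(f"N = 10,000:    {small:.3f} of the neutral rate")
print(f"N = 1,000,000: {large:.3e} of the neutral rate")
```

Under these assumed values, the slightly deleterious insertion fixes in the small population at roughly four-fifths of the neutral rate, so selection barely "sees" it, while in the population a hundred times larger it is essentially never fixed. That is the sense in which small populations accumulate junk and large ones stay streamlined.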

Dr. Zach Hancock evaluates junk DNA with expert commentary on an interview with a creationist.

The point is that no-junk scientists and anti-evolutionists will often write about all the functions that are being identified for noncoding RNAs (really only about 5,000 genes), but the hope that all or nearly all of the junk DNA in the genome will be found to be functional in the future, usually via noncoding RNAs, is probably a lost cause. Per the Borg, “resistance is futile” (to deny junk DNA), and it’s a tragedy that so many good scientists have dug a functional hole they cannot or will not acknowledge is wrong, perhaps in part because of all the grant money involved and subconscious motivated reasoning. “Follow the money” may be applicable.

In March 2024, Dr. Moran wrote a 9-part blog analysis of a 2024 paper by Nils Walter, PhD, Professor of Chemistry at the University of Michigan, who supports the view that there is little junk DNA in the human genome. This will help focus the discussion on the various issues that repeatedly arise in the controversy over junk DNA.

https://sandwalk.blogspot.com/2024/03/nils-walter-disputes-junk-dna-9.html?fbclid=IwAR1KtPMKrm67N1dCwZdZBD2yTqA3QK8q7otie9Lb2R0t4aMI4D3VgV7CaUE

Conclusion

Scientists working with DNA and genomes have known that the vast majority of our genome is made up of junk since the 1940s. Introns, transposons mostly from ancient viral infections, ALUs derived from the fusion of two genes plus an ERV insertion, and duplications all have bloated our genome. Attempts like ENCODE to dismiss this have failed if honestly evaluated. The entire ENCODE endeavor for functional claims was a debacle as outlined above. Unfortunately this appears to still be a minority view even in the scientific community. Hopefully Moran’s book will be read widely and discussed and will help to turn the tide to the truth about junk DNA in our genomes. Whether it is 90% as he says or ends up closer to 75-80% remains to be seen, but it’s extremely doubtful it will ever be much less than 75% junk.

Creationists and anti-evolutionists will continue to cite ENCODE as evidence for their mostly religious faith commitments and mistaken origin narratives. For the unwary, their claims will probably continue to sound supported and rational when the case is hardly so. Lying by omission is still lying unless they are unaware of why ENCODE should be on their “Don’t Use” list.

Transcription involves initiation, elongation and termination. As Moran points out, this entire process is sloppy. Initiation tends to be random; eukaryotes transcribe most of their genomes but as Struhl writes, “little is known about the fidelity… I suggest that ~90% of [RNA] Pol II initiation events in yeast represent transcriptional noise, and that the specificity of initiation is comparable to DNA-binding proteins in other biological processes” (1).

Termination also tends to be random. In contrast to bacteria, transcription termination in eukaryotes is a very sloppy business. Because genes are far apart, RNA polymerase [the enzyme that ‘reads’ the DNA to make RNAs] can easily run over the termination site and not stop until it runs into another gene. DNA downstream of a gene is transcribed by accident from time to time which is why much junk RNA transcription comes from regions downstream from active genes (1).

Lastly, RNA polymerase sometimes goes in the wrong direction during transcription. “The combination of spurious, accidental low-level transcription at each end of a gene accounts for the observation that a large percentage of junk RNA transcripts occur in the regions around known genes.” (1)
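
The idea that DNA-binding proteins and RNA polymerase will inevitably find sites in junk DNA follows from simple probability. As a rough sketch, assume a hypothetical 6-base recognition motif and a genome of 3.2 billion base pairs in which the four bases are equally likely and independent. Real genomes are not this uniform, and real binding sites tolerate mismatches, which would only add more chance sites. The motif `GATTAC`, the function names and the simulated sequence length are all illustrative assumptions.

```python
import random


def expected_chance_sites(genome_len: int, motif_len: int) -> float:
    """Expected exact matches to one motif on one strand of random DNA."""
    positions = genome_len - motif_len + 1
    return positions / 4 ** motif_len


def count_sites(seq: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in seq."""
    last_start = len(seq) - len(motif) + 1
    return sum(seq.startswith(motif, i) for i in range(last_start))


# Analytic expectation for a human-sized genome (3.2 billion bp).
genome_wide = expected_chance_sites(3_200_000_000, 6)
print(f"Expected chance matches genome-wide: {genome_wide:,.0f}")

# Small simulation as a sanity check on the formula.
rng = random.Random(0)
sim_len = 200_000
random_dna = "".join(rng.choices("ACGT", k=sim_len))
observed = count_sites(random_dna, "GATTAC")
expected = expected_chance_sites(sim_len, 6)
print(f"Simulated {sim_len:,} bp: observed {observed}, expected about {expected:.1f}")
```

Under these assumptions a single short motif turns up by chance several hundred thousand times in a human-sized genome, which is why binding or low-level transcription at a site is weak evidence of biological function without a negative control.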

The human genome is constantly changing. The average person has about 1000 duplications (see segmental duplications as amazing evidence for human evolution) and 138 unique mutations not found in their parents. Evidence shows that ALUs, which make up 11-13% of our genome, are still jumping around - they are polymorphic because everyone has different numbers of them and they have not yet fixed in human genomes. When they land in a gene they often break it, and they are the cause of some genetic diseases. Transposons by the millions have jumped around in our genomes producing a lot of junk. See Part 1. Introns are cut out and discarded and only the exons go on to produce functional RNAs. The fact that a few introns have functions in no way diminishes the incredible percentage they contribute to junk RNA. Pervasive transcription of most of the genome is part of the genome’s normal activity. According to ENCODE 75% of the genome was being transcribed but 70% of that coverage was from transcripts present at less than one copy per cell. Even ENCODE admitted that this indicated noise and that the transcripts would be less constrained (in effect, junk) (1). Science has known since the 1940s that the human genome, and those of other mammals especially, have large amounts of junk DNA. And so another topic of misinformation joins the basket of politicized and ideological topics we are fighting today. This topic unfortunately has swept up many well-meaning scientists to join creationists in spreading the falsehood that junk DNA does not constitute the vast majority of our genome.

Citations and References

1. Moran, Laurence A. 2023. What’s In Your Genome?; 90% of your genome is junk. Aevo UTP. University of Toronto Press. 372pp.

2. The Onion Test. April 25, 2007. Genomicron. Exploring genomic diversity and evolution.

https://www.genomicron.evolverzone.com/2007/04/onion-test.html

3. https://www.nature.com/articles/news041018-7

4. Surprisingly good article on Junk DNA from Wikipedia https://en.wikipedia.org/wiki/Junk_DNA

5. ENCODE Project Overview.

https://www.encodeproject.org/help/project-overview/

6. https://www.britannica.com/topic/ENCODE

7. Nature: An Integrated encyclopedia of DNA elements in the human genome.

https://www.nature.com/articles/nature11247

8. Finlay, Graeme. 2021. Human Evolution: Genes, Genealogies and Phylogenies. Cambridge University Press. 359 pp. (283 pp. not including References and Index).

Paperback edition. 2021 - ISBN 978-1-009-00525-8. Original 2013.

9. Paradigm Shift or Paradigm Shaft? https://sandwalk.blogspot.com/2023/09/john-matticks-new-paradigm-shaft.html

10. Junk DNA, TED talks, and the function of lncRNAs

https://sandwalk.blogspot.com/2022/12/junk-dna-ted-talks-and-function-of.html

Appendix.

ENCODE - post by Kenneth Gilmore on FB, 2/2/2025, E-C Open Debate.

I have noticed a few creationists appealing to ENCODE in order to bolster their claim that the genome is intelligently designed. ENCODE, which stands for Encyclopaedia of DNA Elements, in 2012 made the frankly outrageous claim that up to 80% of the DNA was functional. While this claim has been somewhat walked back, there are still many scientists without expertise in evolutionary biology or population genetics who advance it. Creationists, needless to say, have made much of these unfortunate claims. Therefore, an article rebutting the ENCODE hype is necessary.

Puncturing the ENCODE hype

In 2012, some scientists made hyperbolic claims that the Encyclopedia of DNA Elements Project (ENCODE) had shown that 80% of our genome was functional. Unsurprisingly, special creationists latched onto this now-refuted claim as if it somehow invalidated common descent. It did not. Apart from the fact that those with the ENCODE project did not declare that their research rebutted evolution, special creationists ignored two points:

1. Functional does not mean essential. Actively transposing retrotransposons writing over essential DNA are functional, but are definitely harmful.

2. Once again, the evidence from consonant phylogenetic trees and shared genomic 'errors' is independent of any claim about 'functionality'.

ENCODE - The truth is that at least 66% of our DNA is worthless junk

Anyone who appeals to the ENCODE data in an attempt to rebut the evidence for common descent is merely broadcasting their ignorance of the fact that the ENCODE team and results have been heavily criticised by many evolutionary biologists. For those unaware of what ENCODE is, some context will be provided.

In 2001, the human genome was sequenced[1]. Over the past nine years, the Encyclopedia of DNA Elements (ENCODE) project has been studying the genome in order to determine what it does. Now, the ENCODE project has released several papers announcing the results of its research. One of the results is that "more than 80% of the human genome's components have now been assigned at least one biochemical function."[2] How does this square with the fact that much of our genome is made up of non-coding DNA such as retrotransposons (nearly half the DNA), intronic DNA or endogenous retroviral elements?

The key word here is 'functional'. I have to stress that functional does not mean essential or beneficial. Retrotransposons for example have been linked with human disease.[3] We would be arguably better off if those SINEs were silent. Special creation already has to accept that if every nucleotide was created by God, then the creator has deliberately inserted genomic material which causes immense misery in the human race. The intelligent designer becomes a malevolent designer if the logic of the special creationist position is carried through to its inevitable conclusion.

Another point to remember is that being transcribed counts as biological function, irrespective of whether that transcribed section actually does something beneficial for the organism. With this context in mind, claims that the 80% figure invalidates what we already know about the genome (that most of it is non-coding junk) can be dismissed.

There is of course no substitute for informed commentary (as opposed to special creationist disinformation), which is why the opinions of senior scientists involved in the ENCODE project are worth reading. Ewan Birney, the lead analysis coordinator for ENCODE over the past five years, is arguably a man whose opinion would count for something. So what does he say about the 80% figure?

It’s clear that 80% of the genome has a specific biochemical activity – whatever that might be. This question hinges on the word “functional” so let’s try to tackle this first. Like many English language words, “functional” is a very useful but context-dependent word. Does a “functional element” in the genome mean something that changes a biochemical property of the cell (i.e., if the sequence was not here, the biochemistry would be different) or is it something that changes a phenotypically observable trait that affects the whole organism? At their limits (considering all the biochemical activities being a phenotype), these two definitions merge. Having spent a long time thinking about and discussing this, not a single definition of “functional” works for all conversations. We have to be precise about the context. Pragmatically, in ENCODE we define our criteria as “specific biochemical activity” – for example, an assay that identifies a series of bases. This is not the entire genome (so, for example, things like “having a phosphodiester bond” would not qualify). We then subset this into different classes of assay; in decreasing order of coverage these are: RNA, “broad” histone modifications, “narrow” histone modifications, DNaseI hypersensitive sites, Transcription Factor ChIP-seq peaks, DNaseI Footprints, Transcription Factor bound motifs, and finally Exons.[4] (Emphasis mine)

Specific biochemical activity does not mean essential to life. This point needs to be hammered home to every special creationist who latches onto the ENCODE paper and claims that 80% of the genome is functional (though one wonders why they are still happy to accept the implication that God created the human genome with 20% junk). Birney continued by commenting on which definition of 'functional' he is happy with:

Back to that word “functional”: There is no easy answer to this. In ENCODE we present this hierarchy of assays with cumulative coverage percentages, ending up with 80%. As I’ve pointed out in presentations, you shouldn’t be surprised by the 80% figure. After all, 60% of the genome with the new detailed manually reviewed (GenCode) annotation is either exonic or intronic, and a number of our assays (such as PolyA- RNA, and H3K36me3/H3K79me2) are expected to mark all active transcription. So seeing an additional 20% over this expected 60% is not so surprising. However, on the other end of the scale – using very strict, classical definitions of “functional” like bound motifs and DNaseI footprints; places where we are very confident that there is a specific DNA:protein contact, such as a transcription factor binding site to the actual bases – we see a cumulative occupation of 8% of the genome. With the exons (which most people would always classify as “functional” by intuition) that number goes up to 9%. Given what most people thought earlier this decade, that the regulatory elements might account for perhaps a similar amount of bases as exons, this is surprisingly high for many people – certainly it was to me! In addition, in this phase of ENCODE we did sample broadly but nowhere near completely in terms of cell types or transcription factors. We estimated how well we have sampled, and our most generous view of our sampling is that we’ve seen around 50% of the elements. There are lots of reasons to think we have sampled less than this (e.g., the inability to sample developmental cell types; classes of transcription factors which we have not seen). A conservative estimate of our expected coverage of exons + specific DNA:protein contacts gives us 18%, easily further justified (given our sampling) to 20%. (Emphasis mine)
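
The arithmetic in Birney's passage can be laid out explicitly. The sketch below assumes that his 18% figure is a simple linear extrapolation of the 9% observed coverage under 50% sampling. The quote does not spell out the method, so this is one plausible reading rather than ENCODE's actual calculation. The variable and function names are illustrative.

```python
# Figures quoted by Birney (percent of the genome).
motifs_and_footprints = 8.0   # specific DNA:protein contacts
with_exons = 9.0              # the same plus exons
sampling_fraction = 0.5       # "most generous" share of elements sampled
headline_claim = 80.0         # cumulative coverage of all assays


def extrapolate(observed_pct: float, fraction_sampled: float) -> float:
    """Scale observed coverage up, assuming unsampled elements behave alike."""
    return observed_pct / fraction_sampled


exons_only = with_exons - motifs_and_footprints
conservative = extrapolate(with_exons, sampling_fraction)
ratio = headline_claim / conservative

print(f"Exons add {exons_only:.0f} percentage point")
print(f"Extrapolated strict coverage: {conservative:.0f}%")
print(f"Headline figure is {ratio:.1f}x the strict estimate")
```

On this reading the strict definition gives 18%, which Birney rounds up to 20% on the grounds that the true sampling was probably below 50%.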

In other words, once we start changing our definition of 'functional' to one more consistent with what the layperson would take it to be (i.e., biologically useful or essential) the 80% figure drops to around 20%. As for why ENCODE emphasised the 80% figure, rather than the 20% one more consistent with what the layperson would perceive 'functional' to mean, Birney states:

Originally I pushed for using an “80% overall” figure and a “20% conservative floor” figure, since the 20% was extrapolated from the sampling. But putting two percentage-based numbers in the same breath/paragraph is asking a lot of your listener/reader – they need to understand why there is such a big difference between the two numbers, and that takes perhaps more explaining than most people have the patience for. We had to decide on a percentage, because that is easier to visualize, and we choose 80% because (a) it is inclusive of all the ENCODE experiments (and we did not want to leave any of the sub-projects out) and (b) 80% best conveys the difference between a genome made mostly of dead wood and one that is alive with activity. (Emphasis mine)

Alive with activity again does not mean essential to life. A retrotransposon that copies and pastes itself indiscriminately in the genome is functional, but when it causes genetic disorders it is clearly not beneficial. Unsurprisingly, special creationists tend to ignore the 45% of the genome that is retrotransposed DNA, essentially parasitic genetic material.

The ENCODE hype has been criticised severely for its misleading 80% figure. Dan Graur et al have published a takedown of the extravagant ENCODE claims:

A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 – 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as “affirming the consequent,” (3) by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.[5] (Emphasis mine)

It is worth noting that when Ewan Birney, the lead scientist for ENCODE, was pressed on the claim that 80% of the genome is essential to life, he conceded that this was not true. In a BBC Radio interview, Birney admitted:

Chris Ponting: So I think we can probably agree between us that between 10% and say 20% is vital for life.

Ewan Birney: I mean, I think we would agree with that. I think, you know, refining that percentage down is quite interesting. I think also the other components that we — biochemical events that we see in the genome, sort of, each one of them are equally likely to be part of that 10% to 20% that we’re looking for. It’s important to realize that it’s not the case that we can spot the 10% to 20% just by looking harder. Each of these different places in the genome that have some biochemical activity associated with it, when there’s some phenotype screen that’s directed there or some evolutionary screen that’s directed to that point, ENCODE can now say “Ah ha! Here is a biochemical thing that this piece of DNA looks like it could be doing”.[6] (Emphasis mine)

Evolutionary biologist TR Gregory, an expert in genome size evolution and therefore well placed to provide informed commentary, has taken a considerable interest in the subject. In the comments section of one of Gregory's blog posts discussing the ENCODE hype, respected evolutionary geneticist Joe Felsenstein makes a penetrating comment that cuts to the heart of the matter:

Ewan Birney is trying to give the impression that the problem is that people have misinterpreted him. But he was the one who put forward the 80% figure. It was not added by the popular science press, he wanted it out there and wanted it noticed. And when there was a huge blaze of publicity centered on the (purported) death of junk DNA, publicity that Ryan has done us the great service of listing, I didn’t notice Birney jumping up saying that he had been misinterpreted. Large numbers of laypeople and other scientists are now persuaded that there never was any junk DNA. It will probably take 10 years to unpersuade them. We have Birney to thank for this situation. I’m saddened to see him dance around and try to give the impression that someone else came up with the Death of Junk DNA.[7] (Emphasis in the original)

Birney later admitted on his blog that the 80% figure represented biological activity, which was not the same thing as essential to life.

The problem with 'science by press release' is that, in order to gain the attention of your audience, there is a very real temptation to succumb to hyperbole. When you are dealing with the general public, terms such as 'functional' need to be defined properly; otherwise there is a real chance that readers will get the wrong idea. Certainly, when most people hear 'functional', they are likely to think that 80% of the genome is essential to life, which is simply false. As project leader Ewan Birney acknowledged later, the 80% figure represents biological activity, which is definitely not the same thing as functional:

Q. Ok, fair enough. But are you most comfortable with the 10% to 20% figure for the hard-core functional bases? Why emphasize the 80% figure in the abstract and press release? A. (Sigh.) Indeed. Originally I pushed for using an “80% overall” figure and a “20% conservative floor” figure, since the 20% was extrapolated from the sampling. But putting two percentage-based numbers in the same breath/paragraph is asking a lot of your listener/reader – they need to understand why there is such a big difference between the two numbers, and that takes perhaps more explaining than most people have the patience for. We had to decide on a percentage, because that is easier to visualize, and we choose 80% because (a) it is inclusive of all the ENCODE experiments (and we did not want to leave any of the sub-projects out) and (b) 80% best coveys the difference between a genome made mostly of dead wood and one that is alive with activity. We refer also to “4 million switches”, and that represents the bound motifs and footprints. We use the bigger number because it brings home the impact of this work to a much wider audience. But we are in fact using an accurate, well-defined figure when we say that 80% of the genome has specific biological activity.[8]

In other words, between 10-20% of the genome consists of 'hard core functional bases' with the rest simply being biologically active, which is not the same thing as essential to life. Retrotransposable elements that copy and paste themselves randomly into the genome are biologically active, but hardly essential - or beneficial, as evidenced by the genetic diseases connected to retrotransposable DNA. Even if one grants that the entire 80% figure refers to essential genomic material, that still leaves 20% of the genome non-coding, non-functional junk, which is inconsistent with the idea that the genome is the product of an intelligent designer.

Since then, the much-touted 80% figure has been shrinking. Science journalist Faye Flam contacted John Stamatoyannopoulos, one of the ENCODE researchers, to clarify the 80% figure. It turns out that it is more like 40%:

He said he thought the skeptics hadn’t fully understood the papers, and that some of the activity measured in their tests does involve human genes and contributes something to our human physiology. He did admit that the press conference mislead people by claiming that 80% of our genome was essential and useful. He puts that number at 40%. Otherwise he stands by all the ENCODE claims.[9] (Emphasis mine)

So, we can safely bin the "80% of the genome is functional" claim, as even researchers from ENCODE are backing away from it.

Max Libbrecht, another ENCODE researcher, also commented on the ENCODE debacle, showing that even members of the project realised just how damaging the "80% is functional" hype was:

After I took part in an AMA ("Ask Me Anything") on reddit, there has been some discussion elsewhere (such as by Ryan Gregory and in the comments of Ewan Birney's blog) of what I and the other ENCODE scientists meant. In response, I'd like to echo what many others have said regarding the significance of ENCODE on the fraction of the genome that is "junk" (or nonfunctional, or unimportant to phenotype, or evolutionarily unconserved).

In its press releases, ENCODE reported finding 80% of the genome with "specific biochemical activity", which turned into (through some combination of poor presentation on the part of ENCODE and poor interpretation on the part of the media) reports that 80% of the genome is functional. This claim is unlikely given what we know about the genome (here is a good explanation of why), so this created some amount of controversy.

I think very few members of ENCODE believe that the consortium proved that 80% of the genome is functional; no one claimed as much on the reddit AMA, and Ewan Birney has made it clear on his blog that he would not make this claim either. In fact, I think importance of ENCODE's results on the question of what fraction of DNA is functional is very small, and that question is much better answered with other analysis, like that of evolutionary conservation. Lacking proof either way from ENCODE, there was some disagreement on the AMA regarding what the most likely true fraction is, but I think this stemmed from disagreements about definitions and willingness to hypothesize about undiscovered function, not misinterpretation of the significance of ENCODE's results.

I think many members of the consortium (including Ewan Birney) regret the choice of terminology that led to the misinterpretations of the 80% number. Unfortunately, such misinterpretations are always a danger in scientific communication (both among the scientific community and to the public). Whether the consortium could have done a better job explaining the results, and whether we should expect the media to more accurately represent scientific results, is hard to say.

I think the contribution of ENCODE lies not in determining what DNA is functional but rather in determining what the functional DNA actually does. This was the focus of the integration paper and the companion papers, and I would have preferred for this to be the focus of the media coverage.[10] (Emphasis mine)

In short:

The claim that 80% of the genome is essential to life is false. The figure is more like 10-20%.
The value of ENCODE, to quote one of its researchers, is in determining what the functional DNA actually does, rather than how much is functional.
The question of functionality does not take away from the considerable evidence for common descent. Burges has completely failed to address this evidence in any substantive way, and the ENCODE diversion merely demonstrated his ignorance of the controversy surrounding ENCODE and of the acknowledgement that the 80% figure was hype.

How much of the genome is actually essential to life? Not much. Around 45% of the genome is made up of mobile genetic elements (retrotransposons) that copy and paste themselves into the genome randomly, often causing disease in the process. This is very much an unguided, random process. A significant fraction of the human genome owes its origin to ancient retroviral infection. In fact, there is more retroviral genetic material – the evidence of past retroviral infection – in our genome than there is direct protein coding material. Only a small percentage of the human genome directly codes for protein or has specific regulatory function.

Breaking down the human genome into the various classes of genetic material we find there, the sheer amount of parasitic DNA, decayed viral remnants and the genetic equivalent of gibberish[11] is astonishing:

Transposable Elements: 44% junk

DNA transposons: functional < 0.1%, defective 3%

Retrotransposons: active < 0.1%, co-opted < 0.1%, junk 41%

Viruses: 9% junk

DNA Viruses: active < 0.1%, defective ~1%

RNA Viruses: active < 0.1%, co-opted < 0.1%, defective 8%

Pseudogenes: 1.2% junk

Derived from protein-coding genes: 1.2% junk

Co-opted pseudogenes: < 0.1% useful, secondarily acquired new function

Ribosomal RNA genes: 0.19% junk

Essential: 0.22%

Junk: 0.19%

Other RNA encoding genes

tRNA genes: < 0.1% essential

known small RNA genes: < 0.1% essential

putative regulatory genes: ~2% essential

Protein-encoding genes: 9.6% junk (intron sequences), 1.8% essential transcribed

Regulatory Sequences: 0.6% essential

Origins of DNA replication: < 0.1% essential

Scaffold attachment regions: < 0.1% essential

Highly repetitive regions: 1% junk, 2% essential

Intergenic DNA: 26.3% unknown function, most likely junk, 2% essential

Essential / Functional DNA: 8.7%

Junk DNA: 65%

Unknown: 26.3%
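As a rough check, the top-level categories above can be tallied in Python. This is only a sketch: the dictionary names are illustrative, entries listed as "< 0.1%" are treated as negligible and omitted, and approximate values such as "~2%" are taken at face value, so the sums land slightly below the rounded totals quoted in the list.

```python
# Percentages of the human genome, copied from the breakdown above.
# Sub-entries (e.g. DNA transposons vs. retrotransposons) are not repeated
# here, to avoid counting the same DNA twice.
JUNK = {
    "transposable elements": 44.0,
    "viruses": 9.0,
    "pseudogenes": 1.2,
    "ribosomal RNA genes (junk copies)": 0.19,
    "introns of protein-encoding genes": 9.6,
    "highly repetitive regions": 1.0,
}

ESSENTIAL = {
    "ribosomal RNA genes": 0.22,
    "putative regulatory genes": 2.0,   # listed as ~2%
    "protein-encoding genes (transcribed)": 1.8,
    "regulatory sequences": 0.6,
    "highly repetitive regions": 2.0,
    "intergenic DNA (essential part)": 2.0,
}

UNKNOWN = {
    "intergenic DNA (unknown function)": 26.3,
}


def total(parts: dict[str, float]) -> float:
    """Sum the percentages in one category."""
    return sum(parts.values())


for label, parts in (("Junk", JUNK), ("Essential", ESSENTIAL), ("Unknown", UNKNOWN)):
    print(f"{label}: {total(parts):.2f}%")

print(f"Accounted for: {total(JUNK) + total(ESSENTIAL) + total(UNKNOWN):.2f}%")
```

The small remainder needed to reach the stated 8.7% essential and 100% overall is consistent with the handful of "< 0.1%" entries left out of the tally.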

Even if most of the intergenic DNA turns out to have a function, about 65% of our genome is rubbish consisting of remnants of ancient retroviral infection, damaged genes that can no longer work, mobile genetic elements that copy and insert themselves randomly around the genome irrespective of what benefit or harm that action does, and introns, the non-coding sections of DNA that interrupt genes.

For those wanting an informed view of the subject, Laurence Moran's recent book is excellent.[12] Population geneticist Zach Hancock has produced a video detailing why junk DNA is very much real.[13]

References

[1] International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome Nature (2001) 409:860-921

[2] Skipper M, Dhand R, Campbell P "Presenting ENCODE" Nature (2012) 489:45 doi:10.1038/489045a

[3] Deininger PL and Batzer MA "Alu Repeats and Human Disease" Molecular Genetics and Metabolism (1999) 67:183-193

[4] Birney E "ENCODE: My own thoughts" Ewan's Blog; bioinformatician at large September 5th 2012 http://genomeinformatician.blogspot.com.au/2012/09/encode-my-own-thoughts.html

[5] Graur D et al "On the Immortality of Television Sets: “Function” in the Human Genome According to the Evolution-Free Gospel of ENCODE" Genome Biol Evol (2013) 5:578-590

[6] Gregory TR "BBC Interview with Ewan Birney" Genomicron April 1 2013. https://www.genomicron.evolverzone.com/2013/04/bbc-interview-with-ewan-birney/

[7] Gregory TR "BBC Interview with Ewan Birney" Genomicron April 1 2013. Comment

[8] Birney E "ENCODE: My Own Thoughts" Ewan's Blog: Bioinformatician At Large 5 Sep 2012 https://ewanbirney.com/2012/09/encode-my-own-thoughts.html

[9] Flam F "Skeptical Takes on Elevation of Junk DNA and Other Claims from ENCODE Project" Tracker: Peer Review Within Science Journalism 12 Sep 2012 https://ksj.mit.edu/tracker-archive/skeptical-takes-elevation-junk-dna-and-o/

[10] Libbrecht M "On ENCODE's results regarding junk DNA" mlibbrecht Oct 8 2012 http://mlibbrecht.blogspot.com/2012/10/on-encodes-results-regarding-junk-dna.html

[11] Moran L “What’s in Your Genome?” Sandwalk May 8th 2011 https://sandwalk.blogspot.com/2011/05/whats-in-your-genome.html

[12] Moran, Laurence A. What's in Your Genome? 90% of Your Genome Is Junk. Canada: University of Toronto Press, 2023.

[13] https://youtu.be/0SEKs4bAlHM?si=ifgpeooW089lkwcB