DNA Duplications and Evolutionary Proof
Updated: Aug 3
“Expanses of duplicated DNA are situated around our genomes…Genome science has revealed that 5% of our genome is composed of large duplicated segments of DNA or segmented duplications (SDs). Each such duplication that is a common genomic character of our species must have occurred in an ancestor that was common to us all… Any duplicon [a duplicated unit] shared with another species establishes common ancestry” ~ Graeme Finlay
New genes arise naturally, and it happens all the time. Where can we see that? In the labs of cell biologists and in medical centers with cancer researchers, for one thing. Many of these scientists are Christians and evangelicals; Graeme Finlay, Joshua Swamidass and Francis Collins are examples. It turns out we can prove that new genes arise, take on new functions and even develop novel gain of function, because a cancer begins as a single rogue mutated cell whose descendants must evade the immune system and acquire new genes and functions in a complex, sequential manner in order to survive and grow into tumors of millions of cells. The ability to spread (metastasize) involves many new functions and genes not present in the original normal cell. This has been demonstrated to happen through random mutations and natural selection, the same mechanisms by which biologists have demonstrated evolution. See the blog on Evolutionary Medicine, in the Cancer section, where Swamidass, a Christian and evangelical, details this.
Biologists have documented for decades that the genomes of animals and plants contain large chunks of duplicated DNA. The human genome contains about 3 billion base pairs but only about 20,000 protein-coding genes, which make up only about 1.5% of our genome. Many other functional sequences, including promoters, enhancers and other regulatory elements, are found outside this 1.5%. But the vast majority of the remaining DNA is derived from viral infections, duplications and jumping genes, and it usually represents junk that travels with us. I will be writing about junk DNA in the near future. See the blogs here about pseudogenes, an example of junk DNA that has accumulated in our genome over millions of years, often through duplications; when shared between species, they rise to the level of proof for human evolution. Many of these sequences have been co-opted for functions and some have even developed new functions, something that most anti-evolutionists claim can’t happen.
The Type IA form of Charcot-Marie-Tooth disease is an inherited genetic disease caused by a gene duplication on chromosome 17 that results in too much production of a protein called PMP22 (1). Icefish have an antifreeze gene that began as a duplicate of a digestive gene and then evolved a new function.
If complexity in organisms were correlated with genome size (a prediction of anti-evolutionists, since they usually insist that there is no such thing as junk DNA), it would be difficult or impossible to explain why a species of amoeba has 200x the amount of DNA as humans, some species of salamanders 100x, or even an onion species 10x (Prothero/Shermer debate vs Meyer/Sternberg, 2015). Why would you need that much more DNA than a human to make an onion? This challenge for anti-evolutionists actually has a name: The Onion Test (4). Indeed, it’s common to find plant species whose chromosomes have undergone complete whole-genome duplication, and not just once. This is called polyploidy and is common in plants. Its importance is that the duplicated sections are no longer under stabilizing selection pressures, since they are just copies. Mutations in them are often not corrected, and the genes are free to evolve into new genes or to degrade and become pseudogenes.
“The prevalence and recurrence of polyploidization in plant species make it one of the most important evolutionary events in plants, and as a result, polyploidization is an extensively investigated research field. Due to the rapid development of sequencing technologies, there is increased evidence to support that polyploidization plays an important role in the diversification of plant species, evolution of genes, and the domestication of crops…The occurrence of many independent polyploidization events in plants was found to be tightly associated with the timing of extreme climate events or natural disasters on earth, leading to mass extinctions while possibly facilitating increased polyploidization… Moreover, polyploidization was found to significantly impact species diversification, with subsequent effects on crop domestication and the development of traits with agronomic importance”(2).
Copy Number Variants
Humans contain many genes that have been copied. Genomic analysis has revealed 11,700 locations in our genomes where the number of copies varies. Any two people differ at about 1,100 of these, and nearly 1,000 variable locations involving more than 50,000 bases have been found. Of our 385 olfactory receptor genes, any two humans can differ in the number of copies they carry. The same is true for taste receptor genes and for the opsin genes used in vision (3). In other words, individuals can vary slightly in how they smell, taste and see because we all vary in the number of genes we have for making sensory receptors.
A clear example is the salivary amylase gene (AMY1), whose enzyme begins breaking down carbohydrates (starch) in our mouth. Amylase is also produced by our pancreas. I have a confession: while in training, on a few occasions I would send a single amylase test for a patient with an inflamed parotid gland to see if it could help distinguish an infection from a blocked salivary duct or from a tumor that was perhaps not producing much inflammation. I knew that a high level would result in an emergency notice to a sleeping resident on call, because the lab would assume it was due to potentially life-threatening pancreatitis. The result was an unhappy resident, pulled out of bed only to find that it had nothing to do with pancreatitis. There are ways to differentiate the two sources, including combining the test with another blood test such as a lipase level (another pancreatic enzyme) or even an isoamylase test, but the latter was not available to me and I purposely skipped the lipase level. Anyway, it turns out that our copy number of this gene can matter. People with more copies digest starch better, and individuals of European and Japanese descent have a higher number of copies of AMY1 genes than those of African descent. Having more AMY1 gene copies lowers insulin levels via accelerated digestion of starches and reduces the chances of developing insulin resistance and diabetes (3).
Segmental Duplications (SDs)
In the previous section, it was shown that genes are often copied and duplicated. This can be advantageous, as new genes and functions can arise through mutation and selection. For example, scientists studying animals that live long yet seldom develop cancer have found that one reason is that they have more copies of tumor suppressor genes (elephants and the TP53 gene). It turns out that our genome also contains a huge number of large copied DNA segments, containing genes, that are shared by all of us, in contrast to the variable copy numbers found among different human groups and individuals. About 5% of our 3 billion base pairs are large duplicated DNA segments called segmental duplications (SDs). Recall that 8% of our genome is made up of ERVs that were originally inserted at random millions of years ago by retroviral infections. See ERVs. SDs have been defined as copies of large sections of DNA (more than 1,000 bases) that show greater than 90% similarity with each other (3). These copies can even be found on different chromosomes or crammed into areas near centromere or telomere repeats. When these large blocks are duplicated, they can destabilize genomes, cause genetic diseases and duplicate the genes within them. Because they are unique events and can be found shared across species, comparing them can support evolution to the level of proof (3).
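The two thresholds in that definition can be written out as a small Python sketch. This is only an illustration under stated assumptions: the function names and the made-up sequences are hypothetical, the two sequences are assumed to be already aligned and of equal length, and real genome studies use dedicated alignment software rather than a position-by-position comparison.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions that match between two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def looks_like_sd(seq_a, seq_b, min_length=1000, min_identity=90.0):
    """Apply the two thresholds quoted in the text: > 1,000 bases and > 90% similarity."""
    return len(seq_a) > min_length and percent_identity(seq_a, seq_b) > min_identity


def mutate_prefix(seq, n_changes):
    """Toy helper (hypothetical): change the first n_changes bases to a different base."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[base] if i < n_changes else base for i, base in enumerate(seq))


original = "ACGT" * 300                      # 1,200 made-up bases
close_copy = mutate_prefix(original, 60)     # 5% of positions changed
distant_copy = mutate_prefix(original, 240)  # 20% of positions changed

print(looks_like_sd(original, close_copy))
print(looks_like_sd(original, distant_copy))
```

In this toy case the first copy is 95% identical and passes both thresholds, while the second is only 80% identical and does not.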
Finlay notes the following examples:
Scientists have found that a large fragment of chromosome 1 (100,000 bases long) has been copied and pasted onto the Y chromosome in humans, chimps and bonobos. It is not present in gorillas, orangutans, gibbons or old world monkeys however.
Duplicate segments on human chromosome 17 are 24,000 bases long and flank a 1.5 million base region that contains the PMP22 gene. The duplicated segments cause genetic instability and result in a neurological disease.
The first seven of the nine exons of the CD8B1 gene [exons are the parts that are spliced together after the introns are cut out during RNA processing] have been duplicated as a truncated pseudogene that is present in humans, chimps and gorillas but not orangutans. One end of the duplicated segment has a breakpoint in precisely the same spot in each species that has the pseudogene.
A segment of DNA 76,000 bases long has been copied from chromosome 9 to chromosome 22, promoting mis-pairing of the chromosomes during cell reproduction and promoting a translocation between the two chromosomes that generates two genes fusing to form a chimeric BCR-ABL fusion gene that is a cause of a type of leukemia (CML). “The paired blocks of DNA are found in the genomes of humans, chimps and gorillas but not of orangutans or monkeys, indicating that the duplication event occurred in an ancestor of the African great apes.” (3)
We have a cancer suppressor gene called CHEK2. It is located on chromosome 22 but part of it has been duplicated and inserted into chromosome 16. This event occurred in an ancestor of the great apes because humans, chimps, gorillas and orangutans all have this large duplicated segment. It underwent further duplication; in humans, five of the duplications are interrupted by a particular LINE-1 element, and in two of these cases a corresponding chimp duplicon (SD section) exists (3). These are but a few of thousands of SDs discovered. As Finlay notes, “Duplications of huge tracts of DNA arise as an inevitable outcome of the biochemical nature of DNA and of the enzymatic systems that maintain it. The presence of a particular, randomly arising duplication in multiple cells or in multiple species indicates that those cells or species are monoclonal, descendants of the cell in which the unique structure arose.”
The fact that these duplications are unique events and are shared by some of the great apes but not others means we can use them to see if there is a repeatable pattern to their distribution across primate species. "Of all the bases present in SDs in the human genome, approximately 2/3rds are shared with chimps, and were already present in the genome belonging to the ancestors of humans and chimps. Some of these SDs are concentrated near telomeres, and these tend to be younger than SDs found elsewhere. Accordingly, a lower proportion of the telomeric SD’s (50% instead of 66%) is shared by humans and chimps." (3).
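The pattern test described here can be sketched in a few lines of Python using four of the examples listed above. This is a toy check, not phylogenetic software, and it rests on labeled assumptions: the marker names and the function name are made up for illustration, bonobos are folded in with chimps because the text does not report them separately for every example, and only the four species mentioned in most of the examples are included. The idea is that if each duplication happened once and was inherited, the groups of species sharing each one should nest inside one another rather than partially overlap.

```python
from itertools import combinations

# Species reported in the text as carrying each duplication (bonobos grouped with chimps).
SHARED_BY = {
    "chr1 segment copied to Y": {"human", "chimp"},
    "CD8B1 truncated pseudogene": {"human", "chimp", "gorilla"},
    "chr9 segment copied to chr22": {"human", "chimp", "gorilla"},
    "CHEK2 segment copied to chr16": {"human", "chimp", "gorilla", "orangutan"},
}


def conflicts(markers):
    """Return pairs of markers whose species sets overlap without one containing the other."""
    bad = []
    for (name_a, set_a), (name_b, set_b) in combinations(markers.items(), 2):
        overlap = set_a & set_b
        nested = set_a <= set_b or set_b <= set_a
        if overlap and not nested:
            bad.append((name_a, name_b))
    return bad


print(conflicts(SHARED_BY))  # an empty list means every pair fits a single nested tree

# A made-up marker shared by humans and gorillas but missing in chimps would not fit.
with_oddity = dict(SHARED_BY)
with_oddity["hypothetical human+gorilla marker"] = {"human", "gorilla"}
print(conflicts(with_oddity))
```

The real examples produce no conflicts, while the invented human-plus-gorilla marker clashes with the duplication shared only by humans and chimps. That is the kind of mismatch that would show up if the duplications were not inherited from common ancestors.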
What all this means
If you have been following along, you’ve seen that gene duplications are common. Sometimes large segments of DNA are copied, called SDs (segmental duplications), and these chunks can end up in many places in the genome: next to their original source, near telomeres and centromeres (which involve lots of repeat DNA), or even on different chromosomes. "High quality sequence data have allowed the comparison of SDs at single base resolution. The rigorous analysis has confirmed that some SDs are shared by humans and chimps, others by humans, chimps and gorillas and still others by humans, the other great apes and also with old world monkeys such as baboons" (3). If we simply note which SDs are shared by which groups of primates, we can test evolution. Hundreds of SDs have been noted, and they generate a primate family tree that is identical to the trees from other DNA markers such as ERVs, DNA repair patches and pseudogenes, discussed in the other blogs and sections on this site. If evolution were not true, a phylogenetic tree could not be produced like this, let alone scores of them using independently derived observations. See Figure 1.
Figure 1. Shared Segmental Duplications (SDs) between primate groups. Numerals indicate the numbers of duplication events and when they occurred. See text.
From: Finlay, Graeme. 2013. Human Evolution: Genes, Genealogies and Phylogenies.
p 204. Figure 4.2. Cambridge University Press. 2021 ed. Social sharing and Fair dealing applied per publisher's web instructions.
For example, Figure 1 above shows that 133 segmental duplications unique to humans have been found. Humans and chimps/bonobos share 121 SDs. Gorillas, chimps and humans have 220 SDs in common. If evolution were not true, we could not put together an evolutionary tree from SDs that matched the fossil record and also the trees produced by shared ERVs, DNA repairs and pseudogenes. The concept where independent lines of evidence come together and "jump" towards a firm conclusion is called consilience. We can also approach the same test of evolution by looking at a single duplication that has kept duplicating, called an expansion. A gene that duplicated more than once can be evaluated in different species. If evolution is true, we should find the most recent duplications only in the more recently diverged species, and those species should have more copies. And that's exactly what we do find. See Figure 2.
Figure 2. Expansion of the SPANX gene family through primate evolution. Gene content is depicted to the right of the phylogenetic tree: X chromosome (thick line), centromere (oval), SPANX genes (open boxes). SPANX-N genes (N), SPANX-N5 (N5), and SPANX-B, C, A1, A2, and D (B, C, A, and D). OWM = Old World Monkeys. See text below.
From: Finlay, Graeme. 2013. Human Evolution: Genes, Genealogies and Phylogenies.
p 212. Figure 4.6. Cambridge University Press. 2021 ed. Social sharing and Fair dealing applied per publisher's web instructions.
The SPANX genes are located on the X chromosome and are largely active in the testes, producing a protein in sperm cells. Missing or damaged SPANX genes can also cause premature ovarian failure in women. This is how Finlay describes the finding of these genes across species. Refer to Figure 2:
"In non-primate mammals such as dogs and rodents, a single SPANX-N gene is found located on the long arm of the X chromosome. Primates, in contrast, possess multiple SPANX genes. The gene family has expanded in a stepwise manner through primate history, and the rate has accelerated especially in the great apes. The original SPANX-N gene underwent two duplications in an ancestor of the apes and OWMs (which share three SPANX-N genes), and another duplication in an ancestor of the great apes (which share four SPANX-N genes). Further duplications occurred in the lineage giving rise to the African great apes. These species share an additional five genes. These are SPANX-A1, A2, B, and D on the long arm of the X chromosome. Finally, humans have acquired a further gene (SPANX-C) and up to 14 copies of SPANX-B. The flanking and intronic sequences of the SPANX-A to D genes are very similar to each other, indicating that they are located on segmental duplications approximately 20,000 bases long." (3)
For those with more knowledge of genomics, and perhaps gluttons for evolutionary DNA punishment (LOL), there is a little more regarding the SPANX family of duplicated genes. This is not needed to understand the basic concept of how shared duplications demonstrate evolution to near proof, but it is featured as an example of how much these scientists know about the genome and about the findings that prove human evolution. Reading my blogs on ERVs and DNA repairs would be very helpful if you are interested.
He goes on to note: "The mechanism by which the SPANX-C gene arose has been elucidated. A DNA break occurred in an L1PA7 element [a retrotransposon] that was located some considerable distance upstream of SPANX-B. A segment of DNA (containing SPANX-B) was imported to join the broken ends, and this was selected on the basis of close similarity between the broken L1PA7 element and an L1PA4 element upstream of SPANX-B. The reorganized locus has features of a repair job performed by a non-homologous end-joining (NHEJ). The product contains a chimaeric LINE-1 element, the brand new SPANX-C gene and a chimaeric LTR1B/L1PA7 element." DNA breaks and repairs that are shared between species are powerful evidence of evolution. See my blog on shared DNA repairs here to begin to understand what he is saying and the power of these observations to prove evolution. LINE-1s are a type of jumping gene that randomly inserts into DNA. LTRs (long terminal repeats) are promoter-containing sequences inserted into DNA by retroviruses and are discussed in my ERV section.
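Stepping back to the SPANX-N copy numbers in Finlay's description of Figure 2, the stepwise expansion can be tallied with a short Python sketch. The variable and function names are hypothetical, and only the SPANX-N genes are counted, because the later SPANX-A to D genes vary in copy number (up to 14 copies of SPANX-B in humans) and do not reduce to a single figure.

```python
# (lineage, number of new SPANX-N duplications on that branch), as quoted from Finlay.
SPANX_N_STEPS = [
    ("non-primate mammals (dogs, rodents)", 0),
    ("ancestor of apes and Old World monkeys", 2),
    ("ancestor of great apes", 1),
]


def running_copy_counts(starting_copies, steps):
    """Add each branch's duplications to a running total of gene copies."""
    counts = []
    copies = starting_copies
    for lineage, new_duplications in steps:
        copies += new_duplications
        counts.append((lineage, copies))
    return counts


# Start from the single SPANX-N gene found in non-primate mammals.
for lineage, copies in running_copy_counts(1, SPANX_N_STEPS):
    print(f"{lineage}: {copies} SPANX-N gene(s)")
```

The running totals come out to one, three and four copies, matching the numbers in the quotation, and the count never decreases along the lineage leading to the great apes.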
When the genomes of humans and the other great apes were studied, it was discovered that many genes, and many large segments of DNA, have been duplicated. We can study the genomes of many species, including humans, and compare these duplications. When we find the same duplications, defined as sharing greater than 90% of the exact same DNA sequence (ATCG nucleotides), we can be certain that they are indeed the same and represent copies. The only way for species to have the same duplications at the same homologous chromosomal locations is to have inherited them from a common ancestor. There is no other rational explanation. Numerous shared duplications can be summed into groups, and a phylogenetic tree is produced showing shared common ancestry. The phylogenetic trees of segmental duplications match the trees produced by paleontology and by the other DNA markers, including ERVs, DNA breaks/repairs, pseudogenes and retroelements. That the same confirming results are produced by independent areas of research represents strong consilience and means the possibility that evolution is wrong, judged by the DNA findings, must be essentially zero. In addition, some duplicons continued to make copies; within a single family of duplications the number of copies increases in more recently diverged species, as predicted by evolution, and the copies can be nested to form another evolutionary tree from the raw DNA data. As new duplications arise, the most recent copies are found only in the most recently diverged species. The comparative segmental duplication findings among primates represent some of the best and most solid evidence we have for human evolution ("macroevolution"), and in my opinion they rise to the level of proof.
References and Citations
1. https://www.genome.gov/genetics-glossary/Duplication
3. Finlay, Graeme. 2021. Human Evolution: Genes, Genealogies, and Phylogenies. Cambridge University Press. 359 pp. Paperback edition.
4. The Onion Test. https://www.genomicron.evolverzone.com/2007/04/onion-test.html