The Gene: A Concept in Constant Evolution
Writer: @Thanh Nguyen
Advisors: @Nam Sy Vo @Nguyễn Thuỳ Dương
Every scientific definition changes over time, and the gene, one of the “backbone” concepts of the biological sciences, is no exception. Over the past 100 years, as our understanding has deepened, the term has undergone fundamental changes.
History of the gene concept worldwide
In the 1860s, when Mendel discovered the laws of heredity, he did not use the word “gene”, but he did refer to “cellular elements” or “genetic factors” (original German: Zellelemente) that carry hereditary information and determine the characteristics of living organisms. It was not until 1909 that Johannsen named these “factors” “genes”, a term derived from the Greek γόνος (gonos), meaning offspring or procreation.
In the early 1900s, chromosomes were already known. Initially they were thought to be the very “factors” Mendel had found, since they obeyed the Law of Segregation and the Law of Independent Assortment (different genes, which determine different traits, are inherited independently of each other). But only a short time later, in 1905-1910, evidence emerged that genes were frequently inherited together. This showed that genes were smaller than chromosomes: genes on different chromosomes segregated independently, while genes located close to each other on the same chromosome were often inherited together. Thanks to these properties, in 1915-1929, through the studies of Morgan, Dobzhansky, Muller, and Painter, a “map” of the relative positions of fruit fly genes was created. In general, by 1930 the definition of a gene was relatively solid: the smallest unit of heredity, located at a specific position on a chromosome. Genes could transmit traits, recombine, mutate, and carry out specific functions. This can be considered the “Classical” period of the gene: a physical entity with a microscopic three-dimensional structure and a distinct genetic structure, capable of changing and of transmitting that change.
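The mapping logic described above can be made concrete with a small sketch. It assumes hypothetical offspring counts from two test crosses (the gene names and numbers are invented for illustration and are not taken from the fruit fly studies) and estimates the recombination frequency, which classical geneticists used as a measure of the relative distance between two genes.

```python
# Minimal sketch: estimating linkage between gene pairs from test-cross counts.
# All gene names and counts below are hypothetical, for illustration only.

def recombination_frequency(parental: int, recombinant: int) -> float:
    """Fraction of offspring carrying a recombinant combination of alleles."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no offspring counted")
    return recombinant / total


def map_distance_cm(parental: int, recombinant: int) -> float:
    """Approximate map distance in centimorgans (1 cM = 1% recombinants).

    This linear approximation is only reasonable for closely linked genes.
    """
    return 100.0 * recombination_frequency(parental, recombinant)


# Hypothetical crosses: gene pair -> (parental offspring, recombinant offspring)
crosses = {
    ("a", "b"): (900, 100),  # few recombinants: the genes lie close together
    ("a", "c"): (500, 500),  # about 50%: consistent with independent assortment
}

distances = {pair: map_distance_cm(*counts) for pair, counts in crosses.items()}

for pair, distance in distances.items():
    status = "linked" if distance < 50.0 else "unlinked or far apart"
    print(pair, f"{distance:.1f} cM", status)
```

Repeating such estimates for many gene pairs is what allows genes to be placed in a relative order along a chromosome.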
In the 1930s-1950s, in an intellectual climate shaped by the golden age of Newtonian mechanics, “when almost everything had been studied”, it was unacceptable for something that existed in physical form, like a gene, to have no coordinates or dimensions. Experiments at the Rockefeller Institute (now Rockefeller University) identified DNA as the genetic material, and experiments on bacteriophages showed that the DNA-containing component was responsible for their replication. However, the real leap came in 1953 with the discovery of the double-helix structure of DNA, formed by two nucleotide chains, through the work of Watson, Crick, Wilkins, and Rosalind Franklin. From then on, genes had a specific structure, with specific coordinates on the chromosome. In the following years, studies showed that genes are transcribed into mRNA and that this genetic information is then translated into protein. The “one gene, one enzyme” theory was proposed, and it was extended to “one gene – one mRNA – one polypeptide”. This period is considered the Neoclassical period of the gene.
However, just as quantum mechanics replaced the “clarity” of classical mechanics, the absolute definition of the gene did not last long. Different transcription initiation regions of the same gene were discovered, along with the mechanism of alternative RNA splicing, showing that one gene could produce many different transcripts. Even so, the RNA still broadly mirrored the DNA: although the length could change, the order of the sequence was basically the same, and one mRNA still produced one polypeptide. Then, in 2007 and 2011, two more heavy blows struck the stronghold of the Neoclassical definition: RNA editing and gene sharing. A complete mRNA sequence can be edited, yielding an amino acid sequence completely different from the one predicted. Gene sharing, also known as “protein moonlighting”, occurs when the same gene, encoding the same polypeptide chain, performs different functions in different cells.
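To illustrate why “one gene – one mRNA – one polypeptide” breaks down, here is a minimal sketch. The three-exon gene is invented for illustration. Exon skipping stands in for alternative splicing, and a single C-to-U change stands in for RNA editing; both are simplifications of the real mechanisms. Only the standard genetic code table is taken as given.

```python
# Minimal sketch: one hypothetical gene yielding several different products.
# The exon sequences are invented; only the standard codon table is real.

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in U, C, A, G order ("*" = stop).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Turn a coding-strand DNA sequence into the matching mRNA sequence."""
    return coding_dna.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def edit_c_to_u(mrna: str, position: int) -> str:
    """Simplified RNA editing: change one C to U at a 0-based position."""
    if mrna[position] != "C":
        raise ValueError("editing site is not a C")
    return mrna[:position] + "U" + mrna[position + 1:]


# Hypothetical gene with three exons (coding strand, introns already removed).
exons = ["ATGGCC", "CAAGGT", "TTCTAA"]

full_mrna = transcribe("".join(exons))           # all three exons
skipped_mrna = transcribe(exons[0] + exons[2])   # exon 2 skipped
edited_mrna = edit_c_to_u(full_mrna, 6)          # CAA codon becomes UAA (stop)

products = {
    "full": translate(full_mrna),
    "exon2_skipped": translate(skipped_mrna),
    "edited": translate(edited_mrna),
}
print(products)
```

The same stretch of DNA gives three different polypeptides here, which is exactly the situation that a strict 1:1:1 definition cannot describe.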
However, the core definition of a gene, “a physical entity with specific coordinates,” remained firmly in place until the emergence of whole-genome sequencing projects, which led to remarkable discoveries:
In eukaryotic cells, transcription has virtually no boundaries: almost the entire genome is transcribed. It is virtually impossible to establish a 1:1:1 relationship between genes, transcripts, and final products. (Gingeras 2007; Pearson 2006; The FANTOM Consortium and RIKEN Genome Exploration Group 2005; The ENCODE Project Consortium 2007, 2012)
The exons of one gene can become part of the transcript of another gene. It is estimated that about 4-5% of tandem gene pairs can be transcribed into a single RNA encoding a putative chimeric protein. (Parra et al. 2006)
In eukaryotic cells, some genes exist as fragments scattered across the genome. (Landweber 2007)
The functional state of a gene can be passed on to the next generation; that is, information not stored in the DNA sequence can also be inherited. (Holliday 1987; Gerhart and Kirschner 2007; Jablonka and Raz 2009)
Genetic restoration: after a few generations, some mutations revert exactly to their original state. (Lolle et al. 2005)
In addition to protein-coding genes, many genes produce only RNA. Besides the early-discovered tRNA and rRNA, which are directly involved in protein production, other noncoding RNAs such as microRNAs and circular RNAs also have specific biological functions. (Eddy 2001; Carninci and Hayashizaki 2007; Carninci et al. 2008)
These discoveries blurred the once-clear definition of the gene, and it can feel as if a new period of “ignorance” has begun. We have come a long way, from Johannsen’s definition of the gene as a “genetic unit” to the definition of the 1960s: a DNA segment that produces a polypeptide chain. Another half century later, that definition still does not fully capture what a gene is. The gene is not the only unit that can be inherited, and genes and gene products interact with one another in a complex network. For now, we use a relatively “safe” definition, found in both the Vietnamese 12th grade biology textbook and Lehninger Principles of Biochemistry (6th edition): a gene is a segment of a DNA molecule that carries information and codes for a polypeptide chain or an RNA molecule.
The Future of Genomics Research
From the above analysis, it can be seen that although the gene is a familiar scientific concept, it is still developing. Each new sequencing project brings a new understanding of this “backbone” of the biological sciences.
A typical example is the study “Assembly of a pan-genome from deep sequencing of 910 humans of African descent” (Nature Genetics). By sequencing and assembling the genomes of 910 individuals of African descent, it found DNA that is absent from the current reference genome (GRCh38) and amounts to about 10% of its length: 296 million bases in total, contained in 125,715 distinct segments. Although the functions of most of these newly discovered segments are still unknown, some of them fall within 315 protein-coding genes, which promises to help answer open questions about the genetic characteristics of people of African descent.
A Swedish study followed: “Discovery of Novel Sequences in 1,000 Swedish Genomes” (Molecular Biology and Evolution). It found 46 million new bases in the Swedish population, located in 61,044 separate segments. Notably, many of these segments were also found among the new African DNA sequences, shedding light on the migration origins of this population.
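The figures quoted for these two projects can be put side by side with a little arithmetic. The sketch below uses only the numbers given in the text, plus one assumption: a rounded size of about 3.1 billion bases for GRCh38, which is not stated in the article and is included only to give a sense of scale.

```python
# Minimal sketch: comparing the novel-sequence figures quoted in the text.
# GRCH38_APPROX_BASES is a rounded assumption, not a figure from the article.
GRCH38_APPROX_BASES = 3_100_000_000

studies = {
    "African pan-genome": {"novel_bases": 296_000_000, "segments": 125_715},
    "Swedish genomes": {"novel_bases": 46_000_000, "segments": 61_044},
}


def summarize(novel_bases: int, segments: int) -> dict:
    """Mean segment length and novel DNA as a share of the assumed reference size."""
    return {
        "mean_segment_length": novel_bases / segments,
        "percent_of_reference": 100.0 * novel_bases / GRCH38_APPROX_BASES,
    }


summary = {name: summarize(**figures) for name, figures in studies.items()}

for name, stats in summary.items():
    print(
        f"{name}: mean segment of about {stats['mean_segment_length']:.0f} bases, "
        f"about {stats['percent_of_reference']:.1f}% of the assumed reference size"
    )
```

Under this assumption, the African figure comes out a little under 10% of the reference, in line with the article, and the average new segment in both studies is short, a few thousand bases or fewer.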
In Vietnam, VinBigData has been running a project to sequence 1,000 Vietnamese genomes since 2018. Alongside this project, VinBigData has developed the largest biomedical data analysis, management, and sharing system in Vietnam, the VinGen Data Portal. The system currently stores more than 2,000 terabytes of data and nearly 5,000 biological samples from the 1,000 Vietnamese genomes project and a number of other applied projects. It is designed to comply with several data storage standards of the US National Institutes of Health (NIH) and with the information security requirements of the European General Data Protection Regulation (GDPR). Building on the success of these projects, VinBigData is working with Hanoi Medical University and the University of Queensland, Australia, on plans to build and annotate a Vietnamese reference genome as a foundation for genomic research in Vietnam and the region. The project has the potential to discover new gene segments specific to Vietnamese people and to help lay the groundwork for precision medicine.
The article is translated and summarized from: The Evolving Definition of the Term “Gene” – Genetics, 2017