The Vietnamese Genome Reference (VGR)
The standard human reference genomes in current use (GRCh37/hg19 and GRCh38/hg38) were built largely from individuals of US/European ancestry. Variant calling today mostly identifies variants as deviations of an individual genome from the hg38 reference. However, many of the alleles that appear in the reference (the reference alleles, often assumed to be the common or major alleles) reflect populations of European descent. An allele that is common in those populations may be rare in others, leading to miscalled or false-positive variants. As a result, variant analysis is less accurate for Asian genomes, whose genetic backgrounds are more distant from hg38. Establishing a reference genome specific to the Vietnamese population is therefore a critical first step.
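To make the reference-bias argument concrete, the following toy Python sketch treats a "variant" as any genotype that differs from the chosen reference allele at a single biallelic site. The cohort genotypes are invented for illustration, and the helper names (`non_reference_calls`, `major_allele`) are hypothetical; real variant callers model read evidence and genotype likelihoods rather than comparing alleles directly.

```python
from collections import Counter


def non_reference_calls(reference_allele, genotypes):
    """Count individuals carrying at least one allele that differs from the reference."""
    return sum(1 for genotype in genotypes
               if any(allele != reference_allele for allele in genotype))


def major_allele(genotypes):
    """Return the most frequent allele across all genotypes in the cohort."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    return counts.most_common(1)[0][0]


# Invented cohort of 10 diploid genotypes at one biallelic site:
# "G" is common in this cohort, while the linear reference carries "A".
cohort = [("G", "G")] * 7 + [("A", "G")] * 2 + [("A", "A")]

print(non_reference_calls("A", cohort))                   # → 9
print(major_allele(cohort))                               # → G
print(non_reference_calls(major_allele(cohort), cohort))  # → 3
```

When the linear reference carries this cohort's minor allele, 9 of 10 individuals are flagged as non-reference at the site; using the cohort's major allele as the reference drops that to 3.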
Several ethnicity-specific reference genomes (e.g. for Denmark, Japan and Korea) have been built, and they have the potential to be more sensitive for variant calling than the standard GRCh38 model. An earlier population-specific Vietnamese reference was constructed from 100 Kinh genomes to improve short-read mapping and variant calling within the population. However, it was built on top of the existing reference, did not include indels or structural variants, and does not appear to be publicly accessible.
In this project, we will combine existing data from the 1000 Vietnamese Genomes Project with new long-read sequencing data, guided by the current human reference genome, to build the first near-complete Vietnamese Genome Reference (VGR). The VGR is scientifically novel in that it is a graph-based reference containing Vietnamese-specific genomic variants. A graph-based reference is a next-generation reference structure that represents not only a single linear genome but also the variation associated with it. The VGR can therefore capture the genetic diversity of the Vietnamese population. It will serve as a more reliable reference than hg38 for genomic studies of the Vietnamese population (and closely related populations in Asia), including variant calling and the comparison of disease markers identified by Sanger or targeted exome sequencing.
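As a rough illustration of what a graph-based reference stores, the following Python sketch models a toy sequence graph: nodes hold DNA segments, edges join segments that can follow one another, and named paths spell out haplotypes, with the linear reference being just one path. It is loosely modeled on the segment/link/path idea used by graph formats such as GFA, but the `VariationGraph` class, its methods and the example locus are hypothetical and do not describe the VGR's actual data model or any graph-genome toolkit's API.

```python
from dataclasses import dataclass, field


@dataclass
class VariationGraph:
    """Toy sequence graph: nodes hold DNA segments, edges link segments that
    may follow one another, and named paths spell out haplotypes."""
    nodes: dict[int, str] = field(default_factory=dict)
    edges: set[tuple[int, int]] = field(default_factory=set)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_node(self, node_id: int, seq: str) -> None:
        self.nodes[node_id] = seq

    def add_edge(self, src: int, dst: int) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.add((src, dst))

    def add_path(self, name: str, node_ids: list[int]) -> None:
        # A path is only valid if every consecutive node pair is joined by an edge.
        for a, b in zip(node_ids, node_ids[1:]):
            if (a, b) not in self.edges:
                raise ValueError(f"no edge {a}->{b} for path {name!r}")
        self.paths[name] = list(node_ids)

    def spell(self, name: str) -> str:
        # Concatenate node sequences along the path to recover the haplotype.
        return "".join(self.nodes[n] for n in self.paths[name])


# Hypothetical locus: a C/T SNP bubble and an optional 2-bp "GG" insertion.
g = VariationGraph()
for node_id, seq in [(1, "ACGT"), (2, "C"), (3, "T"), (4, "TTA"), (5, "GG"), (6, "CAT")]:
    g.add_node(node_id, seq)
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4), (4, 6), (4, 5), (5, 6)]:
    g.add_edge(src, dst)

g.add_path("reference", [1, 2, 4, 6])         # the linear reference is one path
g.add_path("alt_haplotype", [1, 3, 4, 5, 6])  # carries the T allele and the insertion

print(g.spell("reference"))      # → ACGTCTTACAT
print(g.spell("alt_haplotype"))  # → ACGTTTTAGGCAT
```

Here the SNP forms a two-branch bubble after node 1, and the insertion is an optional detour between nodes 4 and 6: the reference path skips it, while the alternative haplotype passes through it. A linear reference can only hold one of these sequences; the graph holds both.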
Strategic Partners
- University of Queensland
- University of California San Diego