Developing systems for management, analysis, and sharing of biomedical data
Current big data projects, reaching petabyte or even exabyte scale, are under way in many disciplines, including biomedicine, and pose great challenges for storing, managing, analyzing, and processing data. Alongside today’s trend toward open data, open science, and open platforms, sharing and standardizing diverse, heterogeneous data while ensuring data security raises further challenges that need addressing. Developing a system for managing, analyzing, and sharing data that ensures portability, scalability, and reproducibility has recently become an urgent priority in a number of major biomedical projects. The 1000 Vietnamese Genome Project and other big data projects, ongoing and planned, at Vingroup Big Data Institute face similar challenges; developing such a system is therefore particularly important to ensure the long-term efficiency of these projects.
The overall objective of this project is to develop a system for management, analysis, and sharing of large datasets (MASH), which initially focuses on health data and will gradually be expanded to other data sources. MASH needs to (1) work with the data model of each project and integrate with that project’s analysis workflows; (2) adapt flexibly to changes in data models and analysis workflows; (3) provide a front end for importing, exporting, and displaying relevant data, and a back end for indexing, storing, managing, and securing large-scale data, where a single dataset may reach terabytes and the total capacity may reach petabytes or even exabytes; (4) be built on the most advanced open-source technologies available to ensure portability, scalability, and reproducibility in managing and analyzing big data; and (5) be deployable on-premises or in the cloud.
MASH makes it easy and fast to manage, share, explore, visualize, and analyze data. Users, including bioinformatics and biomedical researchers, data scientists, doctors, and students, can search, explore, and analyze data on the website, saving significant time and money in their research. Users can also upload their own data to the system, use the system’s resources and services to perform analyses, and share data with the community. Through MASH, users can perform specialist analyses with readily available features, without prior knowledge of programming, visualization, or in-depth data analysis. MASH is developed and deployed with multiple layers of security to ensure data integrity and security as well as users’ privacy.
Center for Applied Data Science, University of Chicago (USA)
Collaboration in developing systems for management and analysis of Vietnamese genetic data
Hanoi Medical University
Cooperation in research to assess disease risk based on the Vietnamese genome
Vietnam Academy of Science and Technology
Collaboration in building the Science Data Portal