Morishita Shinichi, PhD
Competition Sponsor: Japan Agency for Medical Research and Development
The risk of developing a disease and the age of onset vary from person to person. If we can predict the risk and age of onset for each individual, we can take preventive measures in advance to maintain quality of life and help realize a healthy, long-lived society in which older people can enjoy many good years. Statistics that classify disease risk by age are useful, but they describe only population averages. We want to predict the risk and age of onset of disease for each individual with high accuracy. For this purpose, can we utilize information on all mutations in the personal genome?
For this prediction, we have been developing an algorithm that uses single nucleotide variants in personal genomes. Meanwhile, large structural variants could also cause disease; however, they are generally more than 100 bases long and are hard to detect with next-generation short-read sequencing, so they are still largely unexplored. To understand structural variation on the scale of thousands of bases, we have been using long-read sequencing, and we reported tandem repeat expansions associated with benign adult familial myoclonic epilepsy (Nat. Genet. 2018). The expansion lies in an intron, with a basic unit of five bases repeated hundreds of times or more. The longer the repeat, the lower the age of onset, so this is an example where the age of onset can be predicted from the number of repetitions.
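As a rough illustration of how the number of repetitions might be read off a sequence, the following minimal Python sketch counts the longest run of back-to-back copies of a five-base unit in a read. It is not the method used in the cited work: the function name, the motif, and the toy read are all hypothetical, and real long reads contain sequencing errors and interrupted repeats that exact matching would miss.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back exact copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must be non-empty")
    unit = motif.upper()
    # One regex match corresponds to one uninterrupted run of the unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = (len(m.group(0)) // len(unit) for m in pattern.finditer(sequence.upper()))
    return max(runs, default=0)


# Toy read (hypothetical): flanking bases around two runs of a placeholder 5-base unit.
read = "ACGT" + "TTTCA" * 4 + "GGC" + "TTTCA" * 2 + "AT"
print(longest_tandem_run(read, "TTTCA"))
```

A count obtained this way could then serve as one input feature to an age-of-onset model, under the assumption that the relationship between repeat length and onset has been estimated from patient data.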
We found that tandem repeat expansions have a high evolutionary rate and can cause a variety of diseases (Nat. Genet. 2019). In particular, higher-order repeats in human centromeres are divergent, and it has been speculated that this is due to the rarity of recombination. Indeed, we found a new linkage disequilibrium that spans the entire centromere (Sci. Adv. 2020). It is intriguing to see how structural variants of centromeres cause chromosomal instability and contribute to disease.
Based on these achievements, we propose and develop a new informatics approach to predict the risk of disease onset and age of onset from personal genomic variants.