Genomics – the study of all the genetic instructions in an organism – can help researchers develop new diagnostic tools, tailor treatments based on an individual’s genetic make-up, and design novel approaches to study and treat disease. Whole-genome sequencing analyzes all 6 billion DNA bases in a person’s genome. Another approach, called whole-exome sequencing, focuses only on the DNA that codes for proteins. Such regions make up about 1% of the human genome, but contain many disease-causing variants.
In order to study sequence variants and their consequences in human DNA, researchers need access to large data sets. The Exome Aggregation Consortium (ExAC) – an international group led by scientists at the Broad Institute of MIT and Harvard – built on previous efforts and assembled the largest data set of human exomes to date. The effort was supported by several NIH components as well as funding agencies around the world. The team demonstrated the benefits of this massive data set in a report published in Nature.
ExAC scientists assembled a data set of more than 60,000 human exomes from diverse populations. Significantly, all the DNA had been sequenced deeply – that is, each nucleotide was sequenced enough times to ensure the data’s accuracy.
The researchers discovered more than 7.4 million DNA variants across the exome. These corresponded to an average of one variant for every 8 DNA base pairs. The majority of these occurred at very low frequency and weren’t detected in previous data sets. Patterns of variation differed among people of European, African, South Asian, East Asian, and Latino ancestry.
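As a rough check on those two figures, the short calculation below multiplies them to estimate how many exome positions the analysis covered. Both inputs are the rounded numbers quoted above, so the result is only approximate. Note also that the density describes the pooled data set of more than 60,000 people, not the number of variants any one person carries.

```python
# Back-of-the-envelope check using the rounded figures quoted in the article.
variants_found = 7_400_000   # "more than 7.4 million DNA variants"
bases_per_variant = 8        # "one variant for every 8 DNA base pairs"

# Implied number of exome positions surveyed across the whole data set.
implied_bases = variants_found * bases_per_variant

print(f"Implied exome positions surveyed: about {implied_bases / 1e6:.0f} million")
```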
The researchers found that the density of genetic variation isn’t uniform across the exome. This is partly because some DNA sequences are more susceptible to mutation than others. Some mutations also have greater consequences than others and are thus more likely to be preserved or eliminated.
Using their new findings, the researchers categorized genes according to how resistant they are to change, and thus how crucial they are to function. This knowledge could help investigators make decisions about which genes of interest to prioritize in future studies.
The scientists assessed whether the new data set could yield insights into inherited disease. They identified hundreds of suspected disease variants that should be reclassified as benign (harmless) – particularly in South Asian or Latino people, who were underrepresented in previous reference databases. These results show that the data set can serve as an important resource for interpreting genetic variants seen in the clinic.
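The article does not spell out how variants were judged benign. One common line of reasoning is that a variant seen too often in a large reference population is unlikely to cause a rare, severe disease. The sketch below illustrates that idea only. The variant names, the counts, and the 0.1% cutoff are all hypothetical and are not taken from the ExAC study.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str            # hypothetical label, not a real variant identifier
    allele_count: int    # copies of the variant seen in the reference data
    allele_number: int   # total chromosomes examined at that position


def allele_frequency(v: Variant) -> float:
    """Fraction of examined chromosomes that carry the variant."""
    return v.allele_count / v.allele_number if v.allele_number else 0.0


# Assumed cutoff for illustration; a real analysis would choose it per disease.
MAX_PLAUSIBLE_FREQUENCY = 0.001


def too_common_for_rare_disease(v: Variant,
                                cutoff: float = MAX_PLAUSIBLE_FREQUENCY) -> bool:
    """Flag a variant whose population frequency exceeds the cutoff."""
    return allele_frequency(v) > cutoff


# Made-up counts for two hypothetical variants.
variants = [
    Variant("variant_A", allele_count=3, allele_number=120_000),
    Variant("variant_B", allele_count=900, allele_number=120_000),
]

flagged = [v.name for v in variants if too_common_for_rare_disease(v)]
print("Candidates for review as possibly benign:", flagged)
```

A check like this depends on the reference population matching the patient’s ancestry, which is one reason a diverse data set matters for groups that earlier databases underrepresented.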
“The scale and diversity of the ExAC resource is invaluable,” says senior author Dr. Daniel MacArthur of the Broad Institute, Massachusetts General Hospital, and Harvard Medical School. “It gives us the ability to discover extremely rare variants and offers an unparalleled window into the roots of rare genetic diseases.” The data catalog is freely available to the biomedical community.
Article: “Analysis of protein-coding genetic variation in 60,706 humans,” Monkol Lek, Konrad J. Karczewski, Eric V. Minikel, Kaitlin E. Samocha, Eric Banks, Timothy Fennell, Anne H. O’Donnell-Luria, James S. Ware, Andrew J. Hill, Beryl B. Cummings, Taru Tukiainen, Daniel P. Birnbaum, Jack A. Kosmicki, Laramie E. Duncan, Karol Estrada, Fengmei Zhao, James Zou, Emma Pierce-Hoffman, Joanne Berghout, David N. Cooper, Nicole Deflaux, Mark DePristo, Ron Do, Jason Flannick, Menachem Fromer, et al., Nature, doi:10.1038/nature19057, published online 17 August 2016.