Helping Scientists Discover the Hidden Jewels Within UK Biobank Data

Angela Anderson

Did you know that a loss-of-function mutation in the MEPE gene, which codes for proteins that regulate bone mineralization, results in two-fold increased odds of osteoporosis and a 1.5-fold increased risk of fractures? Neither did scientists, until they delved deeply into data released by the UK Biobank.

The UK Biobank, which has developed a biospecimen collection paired with unparalleled health data and phenotype information from more than 500,000 individuals, is making its massive genotype/phenotype dataset available to approved researchers worldwide, as part of a unique open access project aimed at accelerating medical research and drug discovery. Earlier this month, it released the first batch of whole-exome sequencing results, providing data from 50,000 people. These exomes were sequenced by the Regeneron Genetics Center, which will soon complete sequencing of the entire 500,000-person cohort with the financial support of a “pre-competitive” consortium of leading biopharma companies.

Contained inside these data is a treasure trove of information — but one in which the jewels are hidden among billions of grains of sand.

How would Dana, a senior scientific researcher in the early-development cardiovascular program at a pharma company, use this dataset to determine whether mutations in the PKP2 gene correlate with a phenotypic trait, such as red hair?

Luckily, the new DNAnexus Cohort Browser for UK Biobank makes phenome-wide association studies (PheWAS) and other queries easy to carry out. Developed to help researchers like Dana navigate thousands of phenotypic fields and millions of genetic variants, the browser can mine extremely large datasets in a matter of seconds.

How does it work? Dana would simply enter her phenotypic (red hair) and genetic (PKP2, familial links) criteria into the browser, whose powerful point-and-click interface makes it easy to quickly filter, browse, and visualize the integrated phenotypic and genomic information.
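Behind a query like Dana's sits a simple comparison: how common is the trait among carriers of a variant versus non-carriers? The sketch below illustrates that idea in Python with an odds ratio and an approximate 95% confidence interval. It is not the Cohort Browser's implementation. The function name `odds_ratio` and all counts are hypothetical, chosen only for illustration, and are not UK Biobank results.

```python
import math


def odds_ratio(carriers_with, carriers_without,
               noncarriers_with, noncarriers_without):
    """Return (odds ratio, CI low, CI high) for a 2x2 carrier-by-trait table.

    The interval is the standard Wald (Woolf) 95% interval on the log scale.
    """
    a, b, c, d = (carriers_with, carriers_without,
                  noncarriers_with, noncarriers_without)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se)
    high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, low, high


# Hypothetical counts for illustration only (NOT real UK Biobank data):
# 30 carriers with the trait, 970 carriers without,
# 1,500 non-carriers with the trait, 47,500 non-carriers without.
ratio, low, high = odds_ratio(30, 970, 1500, 47500)
print(f"odds ratio = {ratio:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

An odds ratio close to 1 suggests no association between the variant and the trait, while a value of 2 would correspond to the kind of two-fold increased odds described for MEPE above. A real analysis would also need to account for factors such as relatedness and ancestry.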

Running on our DNAnexus Apollo Platform, the Cohort Browser for UK Biobank was designed to let scientists at all levels of bioinformatics expertise rapidly test multiple hypotheses and gain insight into mechanisms of action, biomarkers, and targets. Working with clinical data also requires specialized capabilities to maintain privacy, including compliance with ISO 27001, GDPR, and GxP, among others.

As part of its effort to explore the UK Biobank data, the Regeneron Genetics Center (RGC) used DNAnexus to run bioinformatics pipelines and deliver the results back to pharmaceutical companies in the UK Biobank Exome Sequencing Consortium. For this data delivery, the RGC successfully deployed the cohort browser on a collection of thousands of phenotypic fields extracted from the UK Biobank and millions of genetic variants computed through its scientific pipeline.

Initial analysis of the UK Biobank data has already led to many discoveries.

In addition to the MEPE mutation finding, RGC researchers identified a handful of other significant novel loss-of-function associations, including one that confers a nearly five-fold increased odds of varicose veins in certain carriers.

Among the nearly four million single nucleotide and indel coding variants observed by the researchers were many mutations in the so-called “ACMG59” genes: 59 genes proposed by the American College of Medical Genetics and Genomics to be associated with highly penetrant disease phenotypes.

Overall, 2% of the sequenced individuals carried a flagged variant in one of the ACMG59 genes. Variants in cancer-associated genes were the most prevalent, followed by variants associated with familial hypercholesterolemia and cardiac dysfunction disorders.

Importantly, the Cohort Browser for UK Biobank allows researchers to check their hypotheses against real-world data from the de-identified participants’ records. For instance, an individual with a pathogenic missense variant could be found to have a history of benign colon neoplasms, diverticular disease of the intestine, colonic polyps, and intestinal obstruction.

These discoveries are a great illustration of how the extensive health data available for UK Biobank participants will be a valuable resource to assess disease risk at both the individual and the population level.

The crowdsourcing spirit of the initiative is what makes it stand apart. When the Manchester-based biobank enrolled its first volunteer 13 years ago, principal investigator Rory Collins wanted to democratize the data and maximize its scientific pay-off: “By making data available to 100 people around the world, we can get a lot more research done than if I sit here and do one study a year with the data,” the University of Oxford epidemiologist told Science.

Earlier releases of genotyping data in 2015 and 2017 have already resulted in more than 600 papers across 1,400 projects from 7,000 researchers. Additional tranches of exome and whole genome data will similarly be released over the next two years.

The hope is that the easy-to-use DNAnexus Apollo Platform and its Cohort Browser for UK Biobank will help even more researchers navigate the complexities of generating and delivering the combined phenotype and genetic data.

“This is just the beginning,” said Aris Baras, MD, Senior Vice President and Head of the Regeneron Genetics Center. “There is so much actionable information in this resource that can be utilized by scientific minds around the globe. We are hard at work mining the data for novel findings that will accelerate science, innovative new medicines and improved patient care, and are excited to have others join us in this important quest.”

Researchers interested in applying for access to UK Biobank data should visit http://www.ukbiobank.ac.uk/register-apply/.

For more information about the UK Biobank Cohort Browser, visit https://go.dnanexus.com/apollo_ukb.

PrecisionFDA Receives FDA Commissioner’s Award for Outstanding Achievement

Today, the precisionFDA Next Generation Sequencing (NGS) Team received the FDA Commissioner’s Special Citation Award for Outstanding Achievement and Collaboration in the development of the precisionFDA platform, which promotes innovative regulatory science research to modernize the regulation of NGS-based genomic tests. This award recognizes superior achievement of the Agency’s mission through teamwork, partnership, shared responsibility, and fostering collaboration to achieve the FDA’s goals.

PrecisionFDA is an online, cloud-based, virtual research space where members of the genomics community can experiment, share data and tools, collaborate, and define standards for evaluating and validating analytical pipelines. This open-source community platform, which has become a global reference standard for variant comparison, includes members from academia, industry, healthcare, and government, all working together to further innovation and develop regulatory standards for NGS-based drugs and devices. Launched in December 2015, the precisionFDA community includes nearly 5,000 users across 1,200 organizations, with more than 38 terabytes of genomic data stored.

To date, the precisionFDA NGS Team has engaged the genomics community through a series of community challenges:

  • The Consistency Challenge (Feb-Apr 2016): Invited participants to manipulate datasets with their software pipelines and conduct performance comparisons.
  • The Truth Challenge (Apr-May 2016): Gave participants the unique opportunity to test their NGS pipelines on an uncharacterized sample (HG002) and publish results for subsequent evaluation against a newly-revealed ‘truth’ dataset.
  • App-a-thon in a Box (Aug-Dec 2016): Invited the community to contribute NGS software to the precisionFDA app library, enabling the community to explore new tools.
  • Hidden Treasures Competition (Jul-Sep 2017): Participants beta-tested the in-silico analyses of NGS datasets for the purpose of determining the reliability and accuracy of different NGS tests.
  • CFSAN Pathogen Detection Challenge (Feb-Apr 2018): Participants helped to improve bioinformatics pipelines for detecting pathogens in samples sequenced using metagenomics.
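Evaluating a pipeline against a truth dataset, as in the Truth Challenge, comes down to counting which called variants match the truth set. The following Python sketch shows that counting under simplifying assumptions. It is not precisionFDA's comparison framework. The function `compare_to_truth`, the `(chromosome, position, ref, alt)` tuple representation, and the toy variants are all hypothetical, and the exact-match rule ignores the variant normalization and genotype matching that real comparisons require.

```python
def compare_to_truth(calls, truth):
    """Score a call set against a truth set by exact variant match.

    Each variant is a (chromosome, position, ref, alt) tuple.
    Returns a dict with precision, recall, and F1.
    """
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy, made-up variants for illustration only.
truth_set = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "A"),
    ("chr3", 3000, "T", "C"),
]
pipeline_calls = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 1500, "G", "A"),
    ("chr4", 4000, "A", "T"),  # not in the truth set: a false positive
]

scores = compare_to_truth(pipeline_calls, truth_set)
print(scores)
```

Precision reflects how many of a pipeline's calls are correct, and recall reflects how much of the truth set it recovers, so reporting both gives a fuller picture than either alone.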

We are thrilled that precisionFDA has been recognized for its efforts in fostering shared responsibility for the evaluation and validation of analytical pipelines. PrecisionFDA’s proven success has inspired other scientific communities to establish their own collaborative ecosystems where members can contribute and innovate, such as St. Jude Cloud, which promotes pediatric cancer research, and the Mosaic microbiome platform, which advances microbial strain analysis. DNAnexus is proud to be the platform that powers precisionFDA and other community portals to advance scientific research through a secure and collaborative online environment.

To learn more about DNAnexus community portals please visit: https://go.dnanexus.com/community-portals.

For St. Jude, Advancing Cures for Pediatric Cancer Means Accelerating Genomic Discovery and Collaboration

Angela Blog Author

Historically, cancer research has been slowed by an inability to make genomic data rapidly accessible to research collaborators. Last week, St. Jude Children’s Research Hospital took a big step toward solving this problem with its launch of St. Jude Cloud, an online platform that allows researchers to access the world’s largest public repository of pediatric cancer genomics data. Developed in partnership between St. Jude, Microsoft, and DNAnexus, St. Jude Cloud provides a flexible cloud platform with rapid data mining, analysis, and visualization capabilities.

St. Jude has long been a leader in advancing cures for pediatric cancer and other life-threatening diseases, and continues to develop new approaches to revolutionize the way medicine is practiced. St. Jude Cloud is the latest tool developed in the fight to advance cures for pediatric diseases. DNAnexus is proud to serve as the technology platform that brings together St. Jude researchers and their partners in a secure and collaborative ecosystem.

Collaboration fuels scientific advancements, and St. Jude Cloud is already enabling just that. In a paper recently published in Nature, St. Jude researchers led by Jinghui Zhang, PhD, discovered mutations connected to UV damage in a B-cell leukemia. This was a very surprising finding and led the team to ask whether other leukemia samples not included in the original study might have a similar mutational pattern. Scott Newman, PhD, used St. Jude Cloud to reproduce the original experimental findings in just a few days, whereas the original research took more than two years to complete.

Using St. Jude Cloud, Newman was able to conduct large-scale data analysis that identified the same UV-linked mutational signature in pediatric B-cell leukemia patients over four days. Discovering these additional samples helped researchers better understand the possible link between UV damage and a blood cancer, and could potentially lead to the development of new therapies. Learn more about St. Jude Cloud and its research capabilities in a Q&A with Newman featured in St. Jude Progress.

As with St. Jude Cloud, DNAnexus delivers fit-for-purpose community portals to advance scientific research through a secure and collaborative online environment that has been independently audited and certified. DNAnexus community research portals allow members to focus on discovery and innovation, removing the burden of secure data management, distribution, and analysis. Other community research portals powered by DNAnexus include the FDA’s precisionFDA platform for advancing regulatory standards for NGS-based drugs and devices, and the microbiome research platform, Mosaic, which facilitates the translation of microbiome research into clinical applications.

Learn more about DNAnexus community portals and determine which use case is right for you.