Building a Precision Medicine Hub

Our understanding of human disease is progressing and so is the arc of precision medicine. Originally focused on genomics, thanks to the Human Genome Project, precision medicine is unfolding to include other -omics data and patient-level data such as clinical, environmental and behavioral data. And while no one ever claimed precision medicine would be easy, we’re learning just how hard it is to combine multiple, disparate data types for the purpose of improving human health. 

Indeed, the idea of precision medicine used to be simpler. In the beginning, there was only clinical data and expression data. Today, precision medicine is moving beyond single modalities and looking at how to assign treatment to patients based on multiple -omics data and other data types.

Adding more complexity is the way we now understand disease. Take cancer, for example. We used to think it was just one disease. But based on the seminal work of Charles Perou, who used expression analysis to subphenotype breast cancer, we now know that it’s many different diseases. Stratifying phenotype and genotype to distinguish disease based on various characteristics is what we must continue to do to develop effective therapies for patients. 

How we view our obligations is evolving as well. The U.S., after years of investing in precision medicine initiatives and realizing that its outcomes aren’t significantly better than those of nations with no such investments, is revising the mantra of precision medicine. “We must learn from every patient” has shifted to “We must learn from every patient and translate what we learn to larger populations.” Doing otherwise simply isn’t scalable or cost-effective for our healthcare systems.

But we have made significant progress, and that is perhaps why precision medicine now poses the challenges it does. Advances in the -omics fields (transcriptomics, epigenomics, etc.) are continuing. We can begin to take a holistic approach to precision medicine, rather than the reductionist view we have been taking. Systems biology used to be a dirty word, but luckily it has become popular again.

And a new paradigm is emerging — that of population-based precision medicine initiatives, such as UK Biobank and All of Us. These initiatives examine genomics data alongside phenotypic data, but reveal just how sorely we need platforms to standardize, manage, and analyze multiple data types. The platforms must be able to transact multi-omics data along with electronic health records data. They must promote provenance, auditability, scalability, and security. And most of all, these platforms must be accessible to scientists and clinicians from multiple disciplines to transform data into information that helps us translate what we learn from patients to larger populations. 

To accommodate these complex needs, DNAnexus has partnered with industry leaders to build DNAnexus Apollo.

1. Liu MC, Pitcher BN, Mardis ER, et al. PAM50 gene signatures and breast cancer prognosis with adjuvant anthracycline- and taxane-based chemotherapy: correlative analysis of C9741 (Alliance). Nature News. Published January 6, 2016. Accessed October 4, 2019.

What Does the Sunsetting of Python 2.7 Mean for You?

Sunsetting Python 2.x

As the Python core development team announced, Python 2.x was sunset on January 1, 2020, and moving forward the team will support only Python 3.x. This means that the Python organization will no longer provide security updates, bug fixes, or other improvements for Python 2.x. Read on for information about what this means for you as a user of the DNAnexus Platform.

The Fine Print

As mentioned above, any new security vulnerabilities discovered in Python 2 after January 1, 2020, will remain unpatched. The DNAnexus execution environment isolates the execution of apps in a secure Linux container, which mitigates the impact of potential Python 2 security vulnerabilities. Even so, because Python 2 will receive no support once it reaches end of life (EOL), a significant security vulnerability may lead DNAnexus to disable execution of Python 2 on the Platform or to require you to assume full liability for running your Python 2 code.

As of December 2019, we provide an Ubuntu 16.04 app execution environment, “Python 2 AEE,” which includes the following:

  • The dx-toolkit package (including the “dx” command-line client and the “dxpy” Python module), configured in a way that requires Python 2.
  • The stock Ubuntu python2.7 interpreter, available at /usr/bin/python.
  • The stock Ubuntu python3.5.2 interpreter, available at /usr/bin/python3.

To facilitate the migration to Python 3, we plan to provide a new Ubuntu 16.04 AEE in the first quarter of 2020. This new “Python 3 AEE” will include the dx-toolkit package configured in a way that makes “dxpy” compatible with both Python 2 and Python 3. The “dx” command-line client will use Python 3.

Furthermore, we will introduce a new configuration option to dxapp.json so that you can select between “Python 2 AEE” and the new “Python 3 AEE.” In addition, we will introduce a new “python3” value for the “interpreter” dxapp.json configuration option.
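To prepare for that change, you may want to take stock of which of your apps still declare a Python 2 interpreter. The following is a minimal sketch rather than an official tool. It assumes the interpreter is declared under a “runSpec” section of dxapp.json and that Python 2 apps use a value beginning with “python2” (for example, “python2.7”). The helper names and the sample app are hypothetical.

```python
import json


def interpreter_of(dxapp_text):
    """Return the interpreter declared in a dxapp.json document, or None.

    Assumption: the value lives at runSpec.interpreter.
    """
    spec = json.loads(dxapp_text)
    return spec.get("runSpec", {}).get("interpreter")


def needs_python3_migration(dxapp_text):
    """True when the app declares a Python 2 interpreter.

    Assumption: Python 2 values start with "python2", e.g. "python2.7".
    """
    interpreter = interpreter_of(dxapp_text)
    return interpreter is not None and interpreter.startswith("python2")


# Hypothetical app metadata, inlined so the example is self-contained.
EXAMPLE_DXAPP = """
{
  "name": "my_app",
  "runSpec": {"interpreter": "python2.7", "file": "src/code.py"}
}
"""

if __name__ == "__main__":
    print(interpreter_of(EXAMPLE_DXAPP))
    print(needs_python3_migration(EXAMPLE_DXAPP))
```

In practice you would read each app’s dxapp.json from disk and pass its contents to needs_python3_migration to build a list of apps to review.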

In summary, while it’s possible to use either Python 2.x or Python 3, we strongly encourage you to review your code for Python 2.7 dependencies and consider migrating to Python 3 to avoid security issues.
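When reviewing your code, a few Python 2 habits tend to account for most of the surprises: integer division, byte strings versus text, and dictionary methods that return lists. The sketch below shows one way to write these so that they behave the same under both interpreters. It is illustrative only, and the function names are hypothetical rather than part of dx-toolkit.

```python
import sys


def interpreter_label():
    """Report which interpreter is running, e.g. to log it at app start-up."""
    return "Python {0}.{1}".format(sys.version_info[0], sys.version_info[1])


def mean_depth(depths):
    """Average a list of integers.

    Python 2 truncates int / int, so convert explicitly to get the same
    answer under both interpreters. (Placing
    `from __future__ import division` at the top of a module is a
    common alternative.)
    """
    return sum(depths) / float(len(depths))


def to_text(value):
    """Decode byte strings to text; leave text untouched."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return value


def sample_names(samples):
    """Return dictionary keys as a sorted list.

    dict.keys() is a list in Python 2 but a view in Python 3; sorted()
    returns a list under both.
    """
    return sorted(samples)


if __name__ == "__main__":
    print(interpreter_label())
    print(mean_depth([30, 31]))
    print(to_text(b"chr1"))
    print(sample_names({"s2": 1, "s1": 2}))
```

Running the same file with /usr/bin/python and /usr/bin/python3 and comparing the output is a quick first check before you change the interpreter an app uses.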

For More Information

To help with your planning and to further explain what this means, we’ve put together an FAQ.

Refining GWAS Results Using Machine Learning

Genome-wide association studies (GWAS) offer researchers a viable approach for identifying genetic variations associated with a particular trait. GWAS have already identified several single nucleotide polymorphisms associated with diabetes and Parkinson’s disease, among other conditions. However, these comprehensive studies frequently identify large numbers of genetic variants associated with a phenotype, not all of which are causal.

Fine mapping, a statistical process in which additional data are introduced to the GWAS dataset, enables researchers to prioritize the variants that warrant additional examination. It also helps them identify variants that narrowly missed the genome-wide significance threshold but are in fact causal.
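To make the idea of prioritizing variants concrete, here is a minimal sketch of one widely used Bayesian approach. It is not necessarily the method presented in the webinar. It computes Wakefield-style approximate Bayes factors from GWAS effect sizes and standard errors, converts them to posterior probabilities under the simplifying assumption that a locus contains exactly one causal variant, and then builds a 95% credible set. The variant names, summary statistics, and prior standard deviation are all hypothetical.

```python
import math


def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor in favor of association.

    beta and se are the GWAS effect estimate and its standard error;
    prior_sd is an assumed prior standard deviation of the true effect.
    """
    v = se ** 2
    w = prior_sd ** 2
    z = beta / se
    shrink = w / (v + w)
    # log of sqrt(v / (v + w)) * exp(z^2 / 2 * w / (v + w))
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z * z * shrink


def posterior_probabilities(stats, prior_sd=0.2):
    """Posterior probability that each variant is the single causal one."""
    log_bfs = {snp: log_abf(b, s, prior_sd) for snp, (b, s) in stats.items()}
    top = max(log_bfs.values())  # subtract the max to avoid overflow
    weights = {snp: math.exp(lbf - top) for snp, lbf in log_bfs.items()}
    total = sum(weights.values())
    return {snp: wt / total for snp, wt in weights.items()}


def credible_set(probabilities, coverage=0.95):
    """Smallest set of top-ranked variants whose probabilities reach coverage."""
    chosen, cumulative = [], 0.0
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    for snp, prob in ranked:
        chosen.append(snp)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical summary statistics for one locus: variant -> (beta, se)
LOCUS = {
    "snp_a": (0.12, 0.02),
    "snp_b": (0.10, 0.02),
    "snp_c": (0.03, 0.02),
}

if __name__ == "__main__":
    probs = posterior_probabilities(LOCUS)
    for snp in sorted(probs, key=probs.get, reverse=True):
        print(snp, round(probs[snp], 4))
    print(credible_set(probs))
```

Real fine-mapping tools go further, for example by modeling linkage disequilibrium between variants and allowing more than one causal variant per locus, so treat this as a teaching example only.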

But fine mapping is easier said than done. For starters, you have to set up the proper computing environment, one that promotes traceability and reproducibility. Both become even more important when you are testing a drug that could potentially enter clinical trials. You also need to assemble the data in the form your fine mapping algorithm expects, which can be challenging. Then there are the scientific challenges: it’s hard to compare and evaluate models, and there are no frameworks that enable you to interact with the models and improve upon them.

The DNAnexus Platform provides end-to-end support for machine learning and also enables you to build and deploy the models such that domain scientists can ask questions and interact with the models themselves.

Join us for our upcoming webinar in which we provide an overview of how to refine your GWAS results using fine mapping. Specifically, by borrowing from Bayesian statistical methods, we present an interactive approach for applying machine learning-based models in fine mapping. Real-life examples will be demonstrated using UK Biobank data on the DNAnexus Platform. Register now.