ABRF: A Quick Meeting Recap

Here at DNAnexus, we’re lucky to have a terrific team supporting our goals. In this blog post, we wanted to share highlights from the recent ABRF meeting from the perspective of our marketing manager, Cristin Smith. Here’s her recap.

Just when we thought the Marco Island resort couldn’t be beaten for location, here comes the annual Association for Biomolecular Resource Facilities (ABRF) conference, held at the lakeside Disney Contemporary Resort right in the heart of Disney World, complete with a view of Space Mountain. I’m pretty sure the team back home in Mountain View was a little concerned that we weren’t going to come back.

The meeting’s opening keynote came from Trisha Davis, who runs the Yeast Resource Center at the University of Washington. Her work has focused on using yeast as a proving ground for various technologies, and she noted that as her center has evolved, so too has her team’s ability to drill down into targeted interrogations of the organism. During her talk, entitled “Technology Development in a Multidisciplinary Center,” she stressed how important it is to integrate multiple complex analyses in an attempt to relate genotype to phenotype.

On the final day of the meeting, “Omics Technologies to Transform Research, Health & Daily Life” also resonated with me. This was Harvard professor George Church’s vision of a future where genome sequence information is widely used and readily available. He spoke about current logistical limitations: a $100 blood draw, for instance, is cost-prohibitive, so the field will have to move toward buccal swabs and other collection methods that may cost only $1 to process before ’omic testing can become broadly affordable. Citing some 37 next-gen sequencing technologies as the driver of the rapid drop in sequencing costs, he put his own estimate of the current genome price, from sample to interpretation, at $4,000. For genome sequencing to become medically useful, Church noted a few factors that will have to be addressed: a focus on completeness and standards to give the FDA confidence in these technologies; significantly more genetic counselors than we have right now; and better interpretation software that makes genome analysis truly straightforward.

Overall, we were excited to see how eager the core lab community is to adopt technology improvements that generate more, and higher-quality, sequence data in support of their customers’ research. That enthusiasm made a great backdrop for unveiling our newly redesigned booth in the exhibit hall. It’s hard to find a more tech-loving crowd than the people who run core facilities, and we were glad to meet so many of them last week.

SOT: Still Early Days for Next-Gen Sequencing in Molecular Toxicology

The Society of Toxicology’s 51st annual meeting was held this week right in our backyard. Since I am a longtime member, I headed up to the Moscone Convention Center in San Francisco to check it out. The Annual Meeting and ToxExpo were packed: almost 7,500 attendees and more than 350 exhibitors.

SOT isn’t like the sequencing-focused meetings I’ve been attending since I joined DNAnexus, but it’s actually home turf for my own research background in toxicogenomics. This year’s meeting sponsors included a number of pharmas and biotechs, from Novartis and Bristol-Myers Squibb to Amgen and Syngenta. Scientific themes at the conference ranged from environmental health to clinical toxicology to regulatory science and toxicogenomics. Next-gen sequencing is still in its infancy in the world of molecular toxicology, which remains dominated by microarray expression experiments. There were very few posters showing applications of NGS data in toxicogenomics (the ones that did tended to center on microRNAs), but many of the people I spoke with have recently started running sequencing studies with an eye toward eventually retiring their microarray experiments.

I found Lee Hood’s opening presentation particularly interesting because he focused on the need to combine data from various technology platforms and institutions all over the world. He talked about his P4 vision, of course: the idea that medicine going forward will have to be predictive, personalized, preventive, and participatory. He also offered some great insights on fostering a cross-disciplinary culture, mentioning genome sequencing of families, the human proteome, and mining genomic data together with phenotypic and clinical data.

Lee Hood. Photo Copyright Chuck Fazio

Another exciting, well-received talk came from Joe DeRisi at the University of California, San Francisco. He presented work analyzing hundreds of honey bee samples with microarrays combined with DNA and RNA sequencing. Using an internally developed de novo assembler called PRICE (short for Paired-Read Iterative Contig Extension; freely available on his website), his team identified a number of different organisms in the honey bee samples’ sequence data, including various viruses, phorid flies, and parasites. It is not yet clear what is causing the honey bee population decline; multiple factors appear to be contributing to the phenomenon. It is great to see that DeRisi and team will continue working in this area.
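PRICE itself is freely available from the DeRisi lab’s website; purely as an illustration of the iterative contig-extension idea behind its name, here is a toy Python sketch. It is not the actual PRICE algorithm (which uses paired reads to localize candidate reads and tolerates sequencing error); it simply grows a seed contig by greedy suffix overlap against error-free reads:

```python
def extend_once(contig, reads, min_overlap=20):
    """Extend the contig's 3' end with the read overhang gained from
    the best suffix-prefix overlap of at least min_overlap bases."""
    best_overhang = ""
    for read in reads:
        max_ov = min(len(contig), len(read) - 1)
        for ov in range(max_ov, min_overlap - 1, -1):
            if contig.endswith(read[:ov]):
                overhang = read[ov:]
                if len(overhang) > len(best_overhang):
                    best_overhang = overhang
                break  # longest overlap for this read already found
    return contig + best_overhang


def iterative_extension(contig, reads, min_overlap=20):
    """Repeat single-step extension until the contig stops growing."""
    while True:
        extended = extend_once(contig, reads, min_overlap)
        if len(extended) == len(contig):
            return contig
        contig = extended


if __name__ == "__main__":
    seed = "ACGTACGTGGAACCTTGACGTTACGATCGAT"
    reads = [
        "TTGACGTTACGATCGATAAGGCCTT",  # overlaps the seed's 3' end
        "GATAAGGCCTTCCGGAATT",        # overlaps the first extension
    ]
    print(iterative_extension(seed, reads, min_overlap=10))
```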

Last but not least, Scott Auerbach from the National Toxicology Program announced that DrugMatrix, a previously commercial toxicogenomics database, has been released to the public for free (the release was announced earlier this year, but the database is now officially public). With this release, DrugMatrix becomes the largest freely available toxicogenomic reference database and informatics system. The data is based on rat organ toxicogenomic profiles for 638 compounds, and DrugMatrix allows an investigator to build a comprehensive picture of a compound’s potential for toxicity with greater efficiency than traditional methods. All of the molecular data stems from microarray experiments, but Auerbach and team are now investigating what it will take to move from microarrays to RNA-seq experiments and how to integrate the two types of data. They are currently running a pilot on a subset of compounds, using the same RNA that was used for the microarray experiments. The challenge, as he sees it, lies in interpreting and validating the newly generated RNA-seq data: what qualifies one platform as superior to the other? Since they are interested in the biology and in generating drug classifiers, one way to answer that is to assess which platform yields the better classifiers, judged by sensitivity and specificity thresholds. It will be interesting to see whether classifiers based on RNA-seq data turn out to be comparable or superior to the microarray-based ones.
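To make that comparison concrete, here is a minimal sketch of one way to score classifiers from the two platforms on cross-validated sensitivity and specificity. The feature matrices and labels below are random placeholders standing in for matched microarray and RNA-seq profiles, not DrugMatrix data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(X, y, cv=5):
    """Cross-validated sensitivity and specificity for a simple
    logistic-regression classifier on one platform's features."""
    clf = LogisticRegression(max_iter=1000)
    y_pred = cross_val_predict(clf, X, y, cv=cv)
    tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return sensitivity, specificity

# Hypothetical placeholders: expression matrices for the same
# compounds profiled on both platforms, plus toxicity labels.
rng = np.random.default_rng(0)
n_compounds = 100
y = rng.integers(0, 2, n_compounds)            # 1 = toxic class
X_microarray = rng.normal(size=(n_compounds, 500))
X_rnaseq = rng.normal(size=(n_compounds, 500))

for name, X in [("microarray", X_microarray), ("RNA-seq", X_rnaseq)]:
    sens, spec = sensitivity_specificity(X, y)
    print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On real data, the platform whose features support higher sensitivity and specificity under the same cross-validation scheme would be the stronger basis for drug classifiers.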

On Being Platform Agnostic

One inevitable outcome of the ever-expanding number of DNA sequencing platforms is the lock-step addition of new data types. The technologies developed by Complete Genomics, Illumina, Life Tech/ABI/Ion Torrent and Pacific Biosciences produce the lion’s share of genomic data today. But Genia, GnuBio, NABsys, Oxford Nanopore and others are in the wings, poised to pile significantly more on.

Every sequencing platform relies on a different technology to read the As, Ts, Cs, and Gs of a genome. This presents major challenges in assembly and sequence accuracy across platforms, due to varying read lengths, method-specific data generation, sequencing errors, and so forth. But while each platform has its nuances, all of them hold potential value for the progress of life science and medical research.

A complete solution to this problem would involve models for each platform, accounting for the generation and characteristics of libraries, data collection, transcript distributions, read lengths, error rates, and so on. The fact that a standard solution for integrating all these data types doesn’t currently exist is a testament to the difficulty of this task, which shouldn’t be underestimated.
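As a minimal sketch of where such models might start, assuming purely illustrative numbers rather than vendor specifications, a pipeline could carry a small per-platform profile that downstream steps consult:

```python
from dataclasses import dataclass

@dataclass
class PlatformModel:
    """Per-platform profile an integration pipeline could consult.
    The field values used below are rough illustrative assumptions,
    not vendor specifications."""
    name: str
    typical_read_length: int       # bases
    paired_end: bool
    dominant_error_mode: str       # e.g. "substitutions" or "indels"
    approx_raw_error_rate: float   # per base

PLATFORMS = [
    PlatformModel("short-read platform", 100, True, "substitutions", 0.01),
    PlatformModel("long-read platform", 3000, False, "indels", 0.15),
]

def needs_indel_tolerant_aligner(m: PlatformModel) -> bool:
    """Toy decision a downstream step might make from the profile."""
    return m.dominant_error_mode == "indels" or m.approx_raw_error_rate > 0.05

for m in PLATFORMS:
    print(m.name, "-> indel-tolerant aligner:", needs_indel_tolerant_aligner(m))
```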

The solutions most commonly used today for managing this diversity of data are the products of enterprising bioinformaticians who have developed “home-brewed” applications capable of taking the primary data created by an instrument and, among other tricks, aligning it to a reference genome and/or completing assemblies. While these workarounds provide a band-aid, they are not available for all platforms, are rarely scalable, and require highly experienced technical users to manage.
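In practice, such a home-brewed band-aid often looks something like the following sketch: a thin wrapper that dispatches each platform’s reads to an appropriate aligner. The platform-to-tool mapping and file names are assumptions for illustration:

```python
import subprocess

# Hypothetical platform-to-aligner dispatch table. "bwa mem" suits
# short, accurate reads; minimap2's map-pb preset suits long, noisy
# PacBio reads. The mapping itself is an illustrative assumption.
ALIGNERS = {
    "illumina": ["bwa", "mem"],
    "pacbio": ["minimap2", "-a", "-x", "map-pb"],
}

def align(platform: str, reference: str, reads: str, out_sam: str) -> None:
    """Run the platform-appropriate aligner and write SAM output."""
    try:
        cmd = ALIGNERS[platform] + [reference, reads]
    except KeyError:
        raise ValueError(f"no aligner configured for platform {platform!r}")
    with open(out_sam, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)

# Example (paths are placeholders):
# align("illumina", "ref.fa", "reads.fq", "aligned.sam")
```

Every new platform means another branch in that table, another tool to install and tune, and another set of failure modes, which is exactly why these scripts rarely scale beyond the lab that wrote them.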

As genomic data continues its march beyond core facilities and into a broader range of research labs, healthcare organizations and, eventually, point-of-care providers, the need grows ever more acute for technologies that can, as far as the user is concerned, effortlessly integrate data from multiple sources for annotation and interpretation, and that pair this capability with the analysis and collaboration tools needed to glean insights.

As an industry, we need to start taking a more platform-agnostic approach to the analysis and visualization of sequencing data. This is particularly critical as new platforms enter the market, as collaborations expand across institutions, labs, and borders, and as “legacy” data is incorporated into new repositories.

At DNAnexus, we are committed to removing the complexities inherent in working with diverse datasets so that scientists and clinicians can focus on the more impactful areas of data analysis and knowledge extraction. We are also committed to providing a secure and user-friendly online workspace where collaboration and data sharing can flourish.

Stay tuned for much more on this topic, and let us know about the challenges you face when working with multiple data types and what kinds of datasets you’d like to see more easily integrated into your work.