On Being Platform Agnostic

One inevitable outcome of the ever-expanding number of DNA sequencing platforms is the lock-step addition of new data types. The technologies developed by Complete Genomics, Illumina, Life Tech/ABI/Ion Torrent and Pacific Biosciences produce the lion’s share of genomic data today. But Genia, GnuBio, NABsys, Oxford Nanopore and others are in the wings, poised to pile significantly more on.

Every sequencing platform relies on a different technology to read the As, Ts, Cs, and Gs of a genome. Varying read lengths, method-specific data generation, and platform-specific sequencing errors make it hard to assemble data and compare sequence accuracy across platforms. Still, while each platform has its nuances, all of them have potential value to the progress of life science and medical research.

A complete solution to this problem would involve models for each platform, accounting for the generation and characteristics of libraries, data collection, transcript distributions, read lengths, error rates, and so on. The fact that a standard solution for integrating all these data types doesn’t currently exist is a testament to the difficulty of this task, which shouldn’t be underestimated.
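To make the idea of per-platform models a little more concrete, here is a minimal Python sketch of a platform-neutral read record paired with a simple platform profile. Everything in it is an assumption for illustration: the names `PlatformProfile`, `Read`, `PROFILES` and `normalize` are hypothetical, the platforms are deliberately generic, and the read lengths and error rates are placeholders rather than measurements of any real instrument. A real model would also need to cover library characteristics, data collection and much more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformProfile:
    """Hypothetical per-platform model; values are placeholders, not measurements."""
    name: str
    typical_read_length: int
    expected_error_rate: float  # assumed per-base error probability


@dataclass(frozen=True)
class Read:
    """Platform-neutral read record that downstream tools could share."""
    read_id: str
    sequence: str
    platform: str
    expected_errors: float  # read length times the platform's assumed error rate


# Illustrative profiles for two made-up platforms.
PROFILES = {
    "platform_a": PlatformProfile("platform_a", 100, 0.01),
    "platform_b": PlatformProfile("platform_b", 1000, 0.10),
}


def normalize(read_id: str, sequence: str, platform: str) -> Read:
    """Convert a raw read into the common record using its platform profile."""
    profile = PROFILES[platform]
    seq = sequence.upper()
    return Read(read_id, seq, profile.name, len(seq) * profile.expected_error_rate)


reads = [
    normalize("r1", "acgtacgtac", "platform_a"),
    normalize("r2", "ACGTACGTACGTACGTACGT", "platform_b"),
]
```

The point of the sketch is only that downstream analysis sees one record type, while the platform-specific knowledge lives in the profile.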

The solutions most commonly used today for managing this diversity of data are the products of enterprising bioinformaticians who have developed “home-brewed” applications capable of taking primary data created by the instrument and, among other tricks, performing alignments to a reference genome and/or completing assemblies. While these workarounds provide a band-aid, they are not available for all platforms, are rarely scalable, and take highly experienced technical users to manage.

As genomic data continues its march beyond core facilities and into a broader range of research labs, healthcare organizations and, eventually, point-of-care providers, the need becomes even more acute for technologies that handle the hard parts without burdening the user: integrating data from multiple sources for annotation and interpretation, and combining it with the analysis and collaboration tools needed to glean insights.

As an industry, we need to start taking a more platform-agnostic approach towards the analysis and visualization of sequencing data. This is particularly critical as new platforms enter the market, as collaborations across institutions, labs and borders expand, and as “legacy” data is incorporated into new repositories.

At DNAnexus, we are committed to removing the complexities inherent in working with diverse datasets so that scientists and clinicians can focus on the more impactful areas of data analysis and knowledge extraction. We are also committed to providing a secure and user-friendly online workspace where collaboration and data sharing can flourish.

Stay tuned for much more on this topic and let us know about the challenges you face when working with multiple data types and what kind of datasets you’d like to see more easily integrated into your work.

Load Up on Caffeine … AGBT Is Almost Here

View from the Marco Island Marriott, the AGBT venue

We’re gearing up for the Super Bowl of the next-gen sequencing field – the Advances in Genome Biology and Technology (AGBT) meeting held annually in Marco Island, Fla. In a typical year, there would be major announcements from the established sequencing vendors at this event, but given that Life Technologies and Illumina already went public with their big news at JP Morgan, and the Roche bid for Illumina will likely still be playing out, the big stories from this year’s meeting will probably revolve around major research findings, technology applications, and what’s going on with the sequencing upstarts. (Oxford Nanopore, for example, will be announcing plans to commercialize its instrument later this year and providing attendees a sneak peek. GnuBio will also be presenting on its desktop sequencer, the iGnuIT 1000.)

As usual, this year’s agenda is chock-full of thought-provoking presentations, including a talk by DNAnexus co-founder Arend Sidow, who will be presenting on the use of deep whole-genome sequencing to monitor breast cancer progression (Thursday, Feb. 16, at 4:35pm).

We’ll be there to meet with colleagues, customers, and potential collaborators. We’ll also be presenting two posters on current DNAnexus projects. If you’ll be there, we encourage you to stop by — find out more about us, get a demo, have some wine and cheese, you name it. Here’s a quick preview of what we’ll be showcasing:

  • Candidate Gene Variants in “Micronesian” Autosomal Recessive Aplastic Anemia – Brigitte Ganter, Majed Dasouki, S. Abhyankar, M. Furness, R. Calado
    This work was done with collaborators at the University of Kansas Medical Center and the National Heart, Lung, and Blood Institute (NHLBI). In the project, researchers performed exome sequencing and nucleotide-level variation analyses for two siblings with aplastic anemia, a condition in which bone marrow does not produce sufficient new cells to replenish blood cells. The results led to the identification of 12 candidate homozygous variants in 9 different genes. In this poster, we’ll discuss how DNAnexus was used to identify these variants and characterize their potential role in aplastic anemia.
  • Expanding and Enhancing Access to the Sequence Read Archive (SRA) Through a Complementary New Web-Based Mirror – Brigitte Ganter, Evan Worley, Bing Xia, Andreas Sundquist
    As we announced last October, we teamed up with Google to develop a complementary hosted mirror of NCBI’s Sequence Read Archive (SRA). Through a typical user scenario, we will discuss the underlying data processing pipeline, key features of the new web-based interface, and how researchers can use it to quickly identify and browse datasets of interest, link out to PubMed references, and integrate data into follow-on analysis workflows.