Webinar Series: Enabling PacBio Long-Read Bioinformatics in the Cloud

We are excited to be hosting, in collaboration with our partner PacBio, an inaugural webinar series focused on best practices for analyzing SMRT® Sequencing data.

De novo genome assembly and structural variant (SV) calling are complex tasks that can require massive computational resources, whether to weave long reads into a final, polished assembly or to run a variety of SV detection methods across multiple data types. Reference genome assembly is done far less frequently than whole genome sequencing, just as SV detection is performed less often than SNP detection. DNAnexus and PacBio are collaborating to make tools and resources easily accessible, enabling researchers to take long-read bioinformatics to new heights. Learn about today’s best practices for PacBio sequencing data.

Session 1: Rapid Reference-Quality Genome Assembly
May 4, 2016
8:00am PDT, 11:00am EDT, 5:00pm CEST


Brett Hannigan, PhD, Director, Scientific Partnerships at DNAnexus, will present how running FALCON on the DNAnexus Platform can provide a fast, accurate, and cost-efficient solution for de novo genome assembly. In this webinar, we’ll examine the challenges of assembling the tobacco genome, which comprises 4.5 billion base pairs and is tetraploid and highly repetitive.

Session 2: Simplifying Structural Variant Discovery
June 16, 2016
8:00am PDT, 11:00am EDT, 5:00pm CEST


Andrew Carroll, PhD, Vice President, Science at DNAnexus, will present how current solutions that use PacBio data can greatly improve the accuracy of SV calling with fast, easy-to-run, cloud-optimized apps (PBHoney, Parliament, and Sniffles). We will also explore our current work with Genome in a Bottle (GIAB) to develop high-confidence truth sets for structural variants. Finally, Andrew will discuss how sequencing coverage correlates with the ability to accurately call structural variants, to help inform decisions about the ideal sequencing depth.

Who should attend?
Researchers currently working with, or hoping to work with, PacBio RS II and/or Sequel data.

On Being Platform Agnostic

One inevitable outcome of the ever-expanding number of DNA sequencing platforms is the lock-step addition of new data types. The technologies developed by Complete Genomics, Illumina, Life Tech/ABI/Ion Torrent and Pacific Biosciences produce the lion’s share of genomic data today. But Genia, GnuBio, NABsys, Oxford Nanopore and others are waiting in the wings, poised to add significantly more.

Every sequencing platform relies on a different technology to read the As, Ts, Cs, and Gs of a genome. This creates major challenges for assembly and sequence accuracy across platforms because of differences in read length, method-specific data generation, sequencing errors, and so forth. Yet whatever their nuances, all of these platforms have potential value for the progress of life science and medical research.

A complete solution to this problem would involve models for each platform, accounting for the generation and characteristics of libraries, data collection, transcript distributions, read lengths, error rates, and so on. The fact that a standard solution for integrating all these data types doesn’t currently exist is a testament to the difficulty of this task, which shouldn’t be underestimated.

The solutions most commonly used today for managing this diversity of data are the products of enterprising bioinformaticians who have developed “home-brewed” applications. These tools take the primary data created by the instrument and, among other tricks, align reads to a reference genome and/or complete assemblies. While these workarounds provide a band-aid, they are not available for all platforms, are rarely scalable, and require highly experienced technical users to manage.

As genomic data continues its march beyond core facilities into a broader range of research labs, healthcare organizations and, eventually, point-of-care providers, the need becomes even more acute for technologies that can, as far as the user is concerned, effortlessly integrate data from multiple sources for annotation and interpretation, and combine that data with the analysis and collaboration tools needed to glean insights.

As an industry, we need to take a more platform-agnostic approach to the analysis and visualization of sequencing data. This is particularly critical as new platforms enter the market, as collaborations expand across institutions, labs and borders, and as “legacy” data is incorporated into new repositories.

At DNAnexus, we are committed to removing the complexities inherent in working with diverse datasets so that scientists and clinicians can focus on the more impactful areas of data analysis and knowledge extraction. We are also committed to providing a secure and user-friendly online workspace where collaboration and data sharing can flourish.

Stay tuned for much more on this topic. In the meantime, let us know about the challenges you face when working with multiple data types, and which kinds of datasets you’d like to see more easily integrated into your work.