Ten-Fold PacBio Sequel Call Set Proves Affordable & Effective in Identifying Structural Variants

Pacific Biosciences, a key partner of DNAnexus, has released the first public Sequel™ dataset of NA12878. This is a 10-fold coverage set featuring 32.8Gb of data, with an N50 read length of 11.8kb. Generated by PacBio’s new Sequel System, this dataset was used to demonstrate the robust ability of even low coverage long-read data to discover novel structural variants. The Sequel System is smaller and faster than the PacBio RS II and provides higher throughput, delivering around seven times as much data.
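As a rough check on the coverage figure, fold coverage is simply the total number of sequenced bases divided by the genome size. The sketch below assumes a haploid human genome of about 3.2 Gb, a commonly quoted approximation that is not stated in this post; the constant and function names are illustrative only.

```python
# Rough coverage check: fold coverage = total bases sequenced / genome size.
# ASSUMPTION: ~3.2 Gb haploid human genome (an approximation, not from the post).
HUMAN_GENOME_BP = 3.2e9


def fold_coverage(total_bases: float, genome_size: float = HUMAN_GENOME_BP) -> float:
    """Return the average depth of coverage for a dataset."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


dataset_bases = 32.8e9  # 32.8 Gb, the dataset size reported in the post
coverage = fold_coverage(dataset_bases)
print(f"approximate coverage: {coverage:.1f}-fold")
```

Under that assumption the 32.8Gb dataset works out to a little over 10-fold, in line with the coverage quoted above.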

The Sequel System is half the cost of a PacBio RS II, five times faster, and produces seven times as much data per SMRT Cell. We believe that with these improvements the Sequel System is poised to open up long-read sequencing to a broader audience. It will allow access to more robust applications ranging from genome and transcriptome assembly to variant detection. This is demonstrated by the Parliament Suite on DNAnexus, where combining low coverage PacBio reads with a short read dataset can significantly improve both the accuracy and the number of structural variant calls compared to short reads alone. In addition, 10-fold PacBio sequencing of NA12878, analyzed with the SV caller PBHoney, has been shown to recall 84% of known structural variants (SVs) and to identify thousands more not previously seen in short reads.

Aside from structural variation, PacBio long reads have been used for robust, high-quality de novo genome and transcriptome assemblies. Additionally, with instrument and chemistry improvements made for the Sequel System, the cost of generating a 50-fold coverage human dataset for resequencing and de novo assembly is expected to decrease dramatically.

Despite falling sequencing costs, de novo genome assembly and structural variant calling remain complex tasks that can require massive computational resources, whether to weave long reads into a final, polished assembly or to run structural variation detection methods across multiple data types. For this reason, PacBio has selected DNAnexus to be its cloud bioinformatics partner, providing bioinformatics support to its global customers. The SMRT® Analysis Suite v3.1.1, optimized for the cloud environment, is available on the DNAnexus Platform alongside other long-read analysis tools such as PBHoney, PBJelly, and Parliament.

Curious about PacBio tools and services on DNAnexus? Schedule a 30-minute scientific consultation.


De novo assemblies of individual human genomes via the PacBio RS II at high-fold coverage have revealed tens of thousands of structural variants, many of which are accessible only through SMRT Sequencing. In an effort to optimize SV discovery methods, PacBio set out to understand which SVs could be identified in the well-studied human sample NA12878 from low-fold coverage sequencing on the new Sequel System. To create the NA12878 Sequel dataset, PacBio generated approximately 10-fold coverage of the NA12878 sample on the Sequel System, which comes to about $5,000 in sequencing cost. The newly generated long reads were mapped to the GRCh37 human reference using NGM-LR, and structural variants were called with PBHoney.

The output calls were compared to a “truth set” generated by merging the 1000 Genomes Project and Genome in a Bottle NA12878 sets, both of which were analyzed using short read technology at much higher coverage. The low coverage PacBio 10-fold Sequel System set recalled 86% of truth set deletions and 81% of truth set insertions. Beyond that, the 10-fold Sequel call set identified thousands of insertions and deletions not found in the short read truth set, with over 66% of these novel structural variants verified using a FALCON-Unzip 60-fold PacBio RS II de novo assembly.
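The post does not describe how calls were matched against the truth set. One common way to score recall, sketched here under stated assumptions, is to count a truth variant as recovered when a call of the same type sits on the same chromosome with its breakpoint within a fixed window. The 500 bp window, the `SV` tuple, the helper names, and the toy variants below are all hypothetical; they only illustrate the bookkeeping and are not the PacBio data or method.

```python
from collections import defaultdict
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int      # start breakpoint
    svtype: str   # "DEL" or "INS"


def is_match(truth: SV, call: SV, window: int = 500) -> bool:
    """ASSUMED matching rule: same chromosome, same type, breakpoints within `window` bp."""
    return (
        truth.chrom == call.chrom
        and truth.svtype == call.svtype
        and abs(truth.pos - call.pos) <= window
    )


def recall_by_type(truth_set, calls, window: int = 500) -> dict:
    """Fraction of truth variants of each type recovered by at least one call."""
    found = defaultdict(int)
    total = defaultdict(int)
    for t in truth_set:
        total[t.svtype] += 1
        if any(is_match(t, c, window) for c in calls):
            found[t.svtype] += 1
    return {svtype: found[svtype] / total[svtype] for svtype in total}


def novel_calls(truth_set, calls, window: int = 500) -> list:
    """Calls that match nothing in the truth set (candidates for further validation)."""
    return [c for c in calls if not any(is_match(t, c, window) for t in truth_set)]


# Toy, made-up variants purely for illustration.
truth = [SV("1", 10_000, "DEL"), SV("1", 50_000, "DEL"),
         SV("2", 7_000, "INS"), SV("2", 90_000, "INS")]
calls = [SV("1", 10_120, "DEL"), SV("1", 50_300, "DEL"),
         SV("2", 7_050, "INS"), SV("3", 1_000, "INS")]

recall = recall_by_type(truth, calls)
novel = novel_calls(truth, calls)
print(recall, novel)
```

In practice the matching rule matters a great deal: stricter criteria such as reciprocal overlap or size similarity would lower recall and raise the count of apparently novel calls, which is one reason novel calls are verified against an independent assembly as described above.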

This dataset demonstrates that low coverage Sequel reads can be used for accurate variant calling as well as novel structural variant identification, all of which is now available at a fraction of the cost with the new Sequel System. Through the partnership with DNAnexus, you can recreate the analysis performed on the NA12878 10-fold set. These tools and the generated dataset can be found on DNAnexus under Featured Projects.

Questions? Contact us directly at: pacbio@dnanexus.com.

Leading Genome Research Center Migrates to DNAnexus on Azure

Today we announced that the trusted DNAnexus genome informatics and data management platform is now also available on Microsoft Azure, Microsoft’s open, flexible, enterprise-grade cloud computing platform. Leveraging Azure, DNAnexus provides organizations a single, secure, scalable, and collaborative platform to accelerate the application of genomics within healthcare and research. The Stanford Center for Genomics and Personalized Medicine (SCGPM) is the first organization to access DNAnexus on Azure.

A key advantage of conducting genomic research in the cloud is the enhanced collaboration facilitated by data accessibility, consistency, and scalability. SCGPM researchers already have collaborations on the DNAnexus Platform hosted by Amazon Web Services; extending adoption to DNAnexus on Azure means that researchers can collaborate even more widely. By leveraging DNAnexus on Azure’s powerful data-handling capabilities, a distributed network of scientists and researchers has secure access to terabytes of data through a common user interface.

DNAnexus and Microsoft are both valued partners to Stanford’s core sequencing facility. SCGPM and David Heckerman, distinguished scientist and director of Microsoft Genomics, have been in close collaboration for years. By extending the DNAnexus Platform to Azure, it is now easier for SCGPM researchers to work closely with David’s team. We believe we are just seeing the tip of the iceberg in terms of the potential for medical discovery.

DNAnexus is proud to support SCGPM on its mission to translate genomics into patient-centered medicine, and we look forward to enabling the discoveries that unfold.

Innovation Through Collaboration

Through additional partnerships, Microsoft recently developed computational methods to accelerate the best practices pipeline for genome resequencing sevenfold. By improving the efficiency of the Burrows-Wheeler Aligner (BWA) and Genome Analysis Toolkit (GATK), researchers and medical professionals are able to get actionable results in just four hours, compared to the previous twenty-eight. This is critical for medical professionals to accelerate diagnosis and treatment for patients.

Genomic sequencing and analysis have become a key component of the diagnosis and treatment of cancer and other genetic conditions. This effort has both relied on and stimulated innovative technologies. At DNAnexus, we firmly believe that in order to continue innovating and further break down the technical barriers to disease, community collaboration is essential. The sharing of data and ideas between organizations – and even industries – spurs the innovation critical to medical breakthroughs. Microsoft is a global leader in technological innovation, and by partnering with leading research centers, universities, and the private sector, it is poised to make great contributions to the genomics revolution.

The DNAnexus Platform sits at the forefront of cloud-based data security, compliance, and controlled access. By co-developing with DNAnexus, Microsoft will be able to deploy its tools into an investigative environment while leveraging extensive research experience. We are excited to be collaborating with Microsoft and to make these cutting-edge bioinformatics tools available to the genomics community via the DNAnexus Platform in the future.

Facilitating Collaboration on DNAnexus

The need for enhanced collaboration is a trend in the genomics industry we have been following for a while. DNAnexus equips end-users with out-of-the-box clinical compliance and streamlines communication between healthcare providers, reducing information silos for more efficient collaboration.

However, this notion of partnership goes deeper than groups of scientists working together to parse through datasets. Innovation and exploration are best served through collaboration, so successful innovation in the genomics industry also relies on disparate industries working together towards a common goal. By tapping into the genomics network, members of the community are able to learn from each other to advance research, leading to accelerated medicine and tailored patient care.

DNAnexus is excited about the opportunity to partner with Microsoft, given their commitment to advancing the field of genomics, and their depth and breadth of experience offering solutions to the healthcare industry.

Join Us for a Lunchtime Discussion at ASHG

At DNAnexus we always look forward to attending the American Society of Human Genetics (ASHG) annual meeting. It’s the world’s largest genetics meeting, and this year it’s held in the quintessential coastal seaport of Vancouver, BC. This meeting always delivers by showcasing cutting-edge science in the genetics and genomics industry.

Stop by the DNAnexus booth (#100) to demo the latest platform features, hear about new research applications the DNAnexus Platform is supporting, and join our lunchtime discussion to learn how DNAnexus has created the global network for genomics – and what that means for you.

The lunchtime discussion, The Rise of the Genomics Network, will highlight the need for improved approaches to data integration, scalability, and global collaboration within the genomics industry. In clinical genomics, pipelines need to be reliable, quick to assemble, and integrated with existing processes. During this lunch hour, the DNAnexus Team will explore how customers use the DNAnexus Platform to construct and deploy end-to-end solutions for health data networks.

David Shaywitz, MD, PhD, Chief Medical Officer, and Andrew Carroll, PhD, Vice President of Science, will discuss real-world examples from:

  • Regeneron Genetics Center
  • Geisinger Health System
  • Rady Children’s Hospital
  • ORIEN – Oncology Research Information Exchange Network
  • Natera
  • Ebola viral sequencing onsite, and more.

Lunchtime Talk Details:

  • Friday, October 21, 2016
  • 1:00pm – 2:30pm
  • Convention Centre Room 18, East Building
  • Lunch and refreshments will be provided for attendees
  • RSVP here

Poster Presentations:

The DNAnexus Platform is leveraged in a variety of research applications. 

WED, OCT 19 — 2:00PM-3:00PM

Poster #655W: GWAS to 30X genomes: Evolution of sequencing in the ARIC cohort to reveal the genetic architecture of complex traits.
Lead Author: Ginger A. Metcalf, Baylor College of Medicine, Human Genome Sequencing Center

Poster #1885W: High-throughput clinical reporting of gene panels with the Neptune Pipeline.
Lead Author: Eric Venner, Baylor College of Medicine, Human Genome Sequencing Center

Poster #1309W: EHR data illuminates patient subtypes in obstructive lung diseases yielding new insights for genetic discovery.
Lead Author: Nilanjana Banerjee, Geisinger-Regeneron DiscovEHR Collaboration

Poster #3247W: Enhanced screening performance of a SNP-based NIPT for five clinically significant microdeletions in a large clinical cohort.
Lead Author: Kim Martin, Natera

Poster #1315W: Disease associations of common and rare calcium sensing receptor variants in the 50K DiscovEHR cohort.
Lead Author: Gerda E. Breitwieser, Geisinger Health System

WED, OCT 19 — 3:00PM-4:00PM

Poster #682W: Exome-wide association analysis of cardiac structural traits in large healthcare provider organization identifies genetic heterogeneity underlying left ventricular structure and overlapping genetic architecture with cardiomyopathy genes.
Lead Author: Jonathan Chung, Regeneron Genetics Center

Poster #2638W: Penetrance in the EHR record of 76 DiscovEHR Cohort participants with two recurrent pathogenic variants.
Lead Author: Kandamurugu Manickam, Geisinger Health System

THUR, OCT 20 — 2:00PM-3:00PM

Poster #1775T: Structural variant calling combining Illumina and low-coverage PacBio.
Lead Author: Andrew Carroll, DNAnexus

Poster #1205T: Exome sequencing in DiscovEHR identifies rare variants in anion transporter genes that exert large effects on uric acid levels and gout.
Lead Author: Jan Freudenberg, Regeneron Genetics Center

Poster #2099T: The role of the ENCODE Data Coordination Center.
Lead Author: Jean M. Davidson, Department of Genetics, School of Medicine, Stanford University

THUR, OCT 20 — 3:00PM-4:00PM

Poster #1790T: The eMERGE Network: Continuing the legacy of genomic discovery to enrich precision medicine.
Lead Author: Melissa A. Basford, eMERGE Network

Poster #512T: Trajectory of new variants requiring pathogenicity assessment as potential secondary findings across 50,000 exomes in the DiscovEHR cohort.
Lead Author: Uyenlinh T. Mirshahi, Geisinger-Regeneron DiscovEHR Collaboration

Poster #2102T: Integrated metadata-driven access of ENCODE, modENCODE, REMC, GGR and modERN data through a common portal.
Lead Author: Esther T. Chan, Department of Genetics, School of Medicine, Stanford University

FRI, OCT 21 — 2:00PM-3:00PM

Poster #3207F: Rapid, high-throughput clinical sequencing and reporting for personalized medicine.
Lead Author: Donna M. Muzny, Baylor College of Medicine, Human Genome Sequencing Center

FRI, OCT 21 — 3:00PM-4:00PM

Poster #1626F: Discovery and replication of rare variant associations using a knowledge-driven PheWAS approach in eMERGE and Geisinger Health System.
Lead Author: Anna O. Basile, Pennsylvania State University

Poster #1764F: The ENCODE analysis pipelines: Repeatable and shareable analysis tools for ChIP-seq, RNA-seq, DNase-seq, and whole genome bisulfite experiments.
Lead Author: J. Seth Strattan, Stanford University Medical School

Poster #510F: A phenome-wide gene burden analysis to identify DrugBank genes associated with patient diagnoses.
Lead Author: Sarah Pendergrass, Geisinger Health System