Run the Mercury Variant-Calling Pipeline on Your Own Data

Mercury, designed by the Human Genome Sequencing Center at Baylor College of Medicine (HGSC), is the core variant-calling pipeline for the CHARGE consortium. The Mercury pipeline is a semi-automated, modular set of tools for analyzing next-generation sequencing (NGS) data in clinically focused studies. HGSC designed the pipeline to identify mutations from genomic data, setting the stage for determining the significance of these mutations as a cause of serious disease.

Thanks to HGSC’s work with us, the Mercury pipeline is now freely available to any DNAnexus user. The Mercury pipeline is located in the applets folder of the HGSC_Mercury project. You can find the project, along with everything you need to run the applet, under the ‘Featured Projects’ section on your home page. Log in to DNAnexus or create an account today to get started immediately.

Inside the Mercury Project

  • Both whole genome and exome samples
  • All annotation and reference data required
  • Pre-configured workflow (just drag & drop your inputs)

Results from the Mercury pipeline consist of a set of annotated variants from your data sample. You’ll also see the biologically significant data that applies to those variants, drawn from the Baylor College of Medicine database using their Cassandra annotation tool. You can visualize the mappings and variant calls within our integrated genome browser.

On Being Platform Agnostic

One inevitable outcome of the ever-expanding number of DNA sequencing platforms is the lock-step addition of new data types. The technologies developed by Complete Genomics, Illumina, Life Tech/ABI/Ion Torrent and Pacific Biosciences produce the lion’s share of genomic data today. But Genia, GnuBio, NABsys, Oxford Nanopore and others are in the wings, poised to pile significantly more on.

Every sequencing platform relies on a different technology to read the As, Ts, Cs, and Gs of a genome. This presents major challenges for assembly and sequence accuracy across platforms because of varying read lengths, method-specific data generation, sequencing errors, and so forth. However, while each platform has its nuances, all have potential value to the progress of life science and medical research.

A complete solution to this problem would involve models for each platform, accounting for the generation and characteristics of libraries, data collection, transcript distributions, read lengths, error rates, and so on. The fact that a standard solution for integrating all these data types doesn’t currently exist is a testament to the difficulty of this task, which shouldn’t be underestimated.

The solutions most commonly used today for managing this diversity of data are the products of enterprising bioinformaticians who have developed “home-brewed” applications capable of taking primary data created by the instrument and, among other tricks, performing alignments to a reference genome and/or completing assemblies. While these workarounds provide a band-aid, they are not available for all platforms, are rarely scalable, and take highly experienced technical users to manage.

As genomic data continues its march beyond core facilities and into a broader range of research labs, healthcare organizations and, eventually, point-of-care providers, the need becomes even more acute for technologies that handle the hard parts without effort on the user's side. Those technologies must integrate data from multiple sources for annotation and interpretation, and combine them with the analysis and collaboration tools needed to glean insights.

As an industry, we need to start taking a more platform-agnostic approach towards the analysis and visualization of sequencing data. This is particularly critical as new platforms enter the market, as collaborations across institutions, labs, and borders expand, and as “legacy” data is incorporated into new repositories.

At DNAnexus, we are committed to removing the complexities inherent in working with diverse datasets so that scientists and clinicians can focus on the more impactful areas of data analysis and knowledge extraction. We are also committed to providing a secure and user-friendly online workspace where collaboration and data sharing can flourish.

Stay tuned for much more on this topic. In the meantime, let us know about the challenges you face when working with multiple data types and what kinds of datasets you’d like to see more easily integrated into your work.

Preserving and Enhancing an Important Community Resource

Today, DNAnexus is pleased to announce the launch of our hosted SRA site!

The DNAnexus SRA site is a hosted version of NCBI’s Sequence Read Archive (SRA). As the most comprehensive archive of publicly available next-generation sequencing data, the SRA is an important resource to researchers around the world. The SRA remains the single best resource of useful sequence data from research initiatives such as the 1000 Genomes Project and institutions like the Broad Institute, Washington University, and the Wellcome Trust Sanger Institute.

DNAnexus has teamed up with Google to create a mirror of this resource, providing access to all publicly accessible datasets for specific studies, experiments, samples, and runs that are currently available via the NCBI SRA website. (Note: Currently these data do not include the analysis data or the Trace Archive repository.)

The New Interface

In addition to maintaining free access to the SRA database, we have taken this opportunity to improve the experience of using and accessing these data. The new web-based user interface was built using the latest cloud-based technologies and genomic data standards. Central to this effort were the many conversations we had with researchers about how they search and interact with data of this type. Their feedback was the basis of our development plan, which drew on our own experience in developing web-based sequence data analysis solutions as well as Google’s big data expertise.

Searching and Browsing

Our main goal in developing the new interface was to vastly enhance the way you find data of interest, understand sample-to-project associations, and download files for subsequent analysis in your tool of choice.

The most significant difference you will notice is the new web-based searching and browsing interface. The new search tool allows you to simultaneously look across multiple data annotations and keywords for objects of interest that are embedded in the SRA database. Each search returns a ranked list of results with relevant metadata for easy follow-on browsing.
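To make the idea of searching several annotations at once and getting back a ranked list more concrete, here is a toy sketch in Python. It is not the sra.dnanexus.com implementation, which this post does not describe. The record fields (`study`, `sample`, `experiment`), the `run-A` style identifiers, and the simple match-count scoring are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata records; field names and values are invented.
RECORDS: List[Dict[str, str]] = [
    {"id": "run-A", "study": "human exome variation",
     "sample": "blood", "experiment": "exome capture"},
    {"id": "run-B", "study": "mouse transcriptome",
     "sample": "liver", "experiment": "RNA-seq"},
    {"id": "run-C", "study": "human transcriptome",
     "sample": "liver", "experiment": "RNA-seq"},
]

# Annotation fields that every query is checked against (assumed names).
SEARCH_FIELDS = ("study", "sample", "experiment")


def search(records: List[Dict[str, str]], query: str) -> List[Tuple[int, str]]:
    """Rank records by the number of (keyword, field) matches, best first."""
    keywords = query.lower().split()
    scored: List[Tuple[int, str]] = []
    for record in records:
        # Count every keyword that appears in any searchable field.
        score = sum(
            1
            for field in SEARCH_FIELDS
            for keyword in keywords
            if keyword in record[field].lower()
        )
        if score > 0:
            scored.append((score, record["id"]))
    # Highest score first; ties are broken by id for a predictable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored
```

Calling `search(RECORDS, "human liver")` ranks `run-C` first, because it matches both keywords in different fields, ahead of the two records that match only one. A production search service would use proper indexing and relevance scoring rather than this substring count.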

We have developed a number of features to simplify how you can scan results and quickly narrow in on relevant data. We are particularly excited about the links to published data. PubMed references now permit users to link directly to journals for descriptions of samples, experiments, studies, or runs as they appeared in the referenced publication.

Once you have identified samples of interest, you can easily download them. In addition to the SRA standard format, we have also made it possible to download these data in the more popular FASTQ format.
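If you download data as FASTQ, a quick sanity check before further analysis is to count the reads and look at their lengths and qualities. The following is a minimal sketch in Python, not a DNAnexus tool. It assumes unwrapped four-line FASTQ records with Phred+33 quality encoding, and the function names and example reads are hypothetical.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between or after records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near: " + header.strip())
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in: " + header.strip())
        yield header[1:].strip(), sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of one read, assuming Phred+33 encoding."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def summarize(handle: TextIO) -> Tuple[int, float]:
    """Return (read_count, mean_read_length) for a FASTQ stream."""
    count = 0
    total_length = 0
    for _read_id, sequence, _quality in read_fastq(handle):
        count += 1
        total_length += len(sequence)
    return count, (total_length / count if count else 0.0)


# Made-up example records (hypothetical read names and bases).
EXAMPLE = "@read1\nACGTAC\n+\nIIIIII\n@read2\nACGT\n+\n!!!!\n"
```

For a downloaded file you would pass an open file handle, for example `summarize(open("reads.fastq"))`, where the file name is a placeholder. The in-memory `EXAMPLE` string can be wrapped in `io.StringIO` to try the functions without any download.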

For more details on the sra.dnanexus.com functionality and how the website works, please visit the SRA FAQ.

Transforming Data into Real Insights Using DNAnexus

Since the SRA primarily contains raw sequence data, the ability to import them into a platform such as DNAnexus is essential for further analyses. For example, by uploading your results into DNAnexus you can access tools that will map your data to a reference genome so you can better understand data quality, a critical step in determining whether to move forward with the data. DNAnexus also allows you to analyze and visualize these data as a standalone dataset or in conjunction with other data already in the system, using our interactive web-based Genome Browser.

Analyze and Visualize SRA Data for Free

For the next 30 days, you can import SRA data directly into DNAnexus at no cost. If you already have a DNAnexus account, simply log in and import your SRA data. If you are not yet a user of DNAnexus, you can sign up for a free trial account and import your data. Once logged in, you can perform mapping, RNA-seq, ChIP-seq, variant analysis, and data visualization on your SRA data for a total of two years.

Special note for our users from academic institutions: we have just reduced the standard academic pricing by half!