Meet the new DNAnexus and Its Instant Collaboration Environment

The life sciences field has a long and respected tradition of collaboration among researchers. Genomics as we know it today kicked off with one of the biggest biology collaborations of all time, the Human Genome Project.

This has led to a highly cooperative mindset among many participants in the community, and it’s a mentality that we at DNAnexus share and embrace. A successful collaboration is more than just a mindset, though. It requires infrastructure of all sorts: today’s partnerships benefit from technology advances such as Skype, instant messaging, FaceTime, Google Docs, and more.

These collaborations often involve many participants with a range of backgrounds and expertise, such as bioinformatics, medicine, microbiology, molecular biology, and more. The new DNAnexus includes a set of features to facilitate research projects for teams within and across organizations. As a bioinformatician, for example, you can upload your data, build Apps, create custom workflows, and then share all of it with your research partners, whose access and permissions you control. Because you have the ability to define the project and design the workflow yourself, your collaborators (who might be clinicians or biologists with little to no expertise in bioinformatics) will have easy access to the entire project via an intuitive web interface to analyze and visualize their data.

With the new DNAnexus, you not only enable non-bioinformaticians to run your custom analysis tools and best-practice workflows, but also eliminate the data transfer, format conversion, and incompatibility problems that currently slow down even the most efficient collaborative efforts. This platform offers a secure and reliable environment through which you can instantly collaborate with team members without the hassle of synchronizing data and shipping hard drives.

Here are a few key features of the new platform that are especially useful for collaborations:

Instant access from anywhere: When you’re part of a team that could be spread out across an organization, country, or even the world, you need to have easy access to your data from anywhere, at any time. The beauty of using a cloud-based platform is that it offers just that: peace of mind that your data will be ready and waiting for you whenever and wherever you need it. One of the main values of the new DNAnexus is collaboration support without the need to transfer files between collaborating sites. All the data is in one location and can be accessed by any permitted person from anywhere.

Intuitive permission definitions for users: As the project leader, you’ll be able to use the new DNAnexus to set access permissions for the other users of your project. Some people may be able to just view or download the data, while others can be made contributors, allowing them to manage and run analyses on the data — it’s all up to you, the administrator.

Enterprise-grade data security: Just as much as you want the right people to have access to your project data, you don’t want other people seeing it. With its extensive background in data and cloud security, the DNAnexus team has built this platform with enterprise-grade, user-controlled permissions for sharing data, analysis tools, and workflows. Your data is not only stored in high-end physical data centers, but also fully encrypted at rest and in transit.
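
To make the permission levels described above concrete, here is a minimal Python sketch of an ordered access model. It is an illustration only, not the DNAnexus API: the names `Access`, `Project`, `allows`, and `grant`, and the three level names, are hypothetical stand-ins for the viewer, contributor, and administrator roles mentioned in the text.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access levels, ordered from least to most privileged."""
    VIEW = 1        # may view and download data
    CONTRIBUTE = 2  # may also manage data and run analyses
    ADMINISTER = 3  # may also change other members' access


class Project:
    """Toy project that tracks each member's access level."""

    def __init__(self, owner: str) -> None:
        # The project creator starts out as the administrator.
        self.members = {owner: Access.ADMINISTER}

    def allows(self, user: str, required: Access) -> bool:
        """Return True if the user holds at least the required level."""
        level = self.members.get(user)
        return level is not None and level >= required

    def grant(self, admin: str, user: str, level: Access) -> None:
        """Set a member's level; only administrators may do this."""
        if not self.allows(admin, Access.ADMINISTER):
            raise PermissionError(f"{admin} cannot change permissions")
        self.members[user] = level


project = Project(owner="bioinformatician")
project.grant("bioinformatician", "clinician", Access.VIEW)
project.grant("bioinformatician", "postdoc", Access.CONTRIBUTE)
```

Because the levels are ordered, a single comparison answers every access question: a contributor automatically passes any check that a viewer passes, and anyone who is not a project member is refused.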

If you haven’t yet tried the new DNAnexus for yourself, what are you waiting for? Sign up here for your free beta testing account. And check back on this blog for our next in-depth look, when we will discuss security and compliance.

Recovering from AGBT: Exhausted but Encouraged!

It’s hard to believe the whirlwind of the annual Advances in Genome Biology & Technology meeting is already over! The DNAnexus team had a terrific time at the conference, and we want to thank everyone who stopped by our suite and attended our Friday afternoon talk.

This year’s meeting had more attendees (not to mention the thickest abstract book) than ever before. In many of the talks and posters, the challenge of data interpretation was front and center. Several scientists mentioned that the data sets they’re comparing and the analyses they’re performing pull together far more data than they’ve ever had to deal with. Indeed, the very first speaker of the meeting, Eric Boerwinkle from the University of Texas, told attendees that the community needs to keep pushing for better informatics and data interpretation tools. It was gratifying to see that so many scientists are making use of large, publicly available databases; ENCODE in particular was cited in several presentations.

The talks we found most interesting were about applications of next-gen sequencing technologies, ranging from clinical sequencing to epigenetics to microbiome studies. Christine Eng from Baylor College of Medicine spoke about whole exome sequencing in the clinic, noting that her team’s pilot project saw a 25% success rate (conservatively estimated) in using this information to diagnose a disease. She also said that clinical genomics will have trouble ramping up without more genetic counselors and geneticists; at the moment, there are just 3.5 such experts per million people in the US. In another talk, Leonid Moroz from the University of Florida captured attendees’ imaginations with a discussion on the biological mechanisms underlying memory persistence. He focused on epigenetic changes in the brain, finding that demethylation of just one strand of DNA seems to precede the formation of long-term memories in model organisms. Finally, the most-discussed talk of the conference came from Kjersti Aagaard at Baylor College of Medicine, who spoke about metagenomics in medicine. She presented data indicating that the placenta is not a sterile environment as previously thought, and that the placental microbiome is most closely related to the oral microbiome. For this technology audience, it was a real treat to see just how much compelling science is happening because of the sequencing tools that have been presented in previous years at AGBT.

It’s clear to us that the focus is rapidly moving away from sequencing technologies and toward data interpretation as the real immediate technological challenge in the genomics community. This year, there were a number of companies presenting analysis tools, including Maverix Biomics, Ingenuity Systems, Personalis, and more. In fact, many more of these types of tools seemed to be on display, in place of the usual plethora of next-generation sequencing technologies that people tend to expect from the Marco Island conference. It was a fairly quiet year for instruments, with no major sequencing technology headlines coming out of the meeting, so it was great to have lots of attention on data interpretation and the tools enabling it.

We hope that we were able to offer conference attendees an optimistic view of the data analysis situation. People who came to our suite had the opportunity to get a guided tour of the new DNAnexus, and we were pleased at how much interest there was in a customizable cloud-based solution for managing and analyzing sequence data. We hosted several demos and have been thrilled to see how many people from the meeting have signed up for beta accounts with the new platform to help tame their own data sets.

On Friday afternoon, our CEO and co-founder Andreas Sundquist gave a plenary presentation to introduce attendees to the new DNAnexus. Andreas’s talk provided a detailed look at the core attributes of the new DNAnexus: its configurability, extensible toolbox with more than 40 Apps, instant collaboration environment, and security and compliance support. He also noted that users could choose an intuitive drag-and-drop interface or opt for the more hard-core command line to suit their own needs. When asked about data upload speeds and cloud capacity, Andreas said that current Ethernet speeds are usually sufficient to upload sequence data in real time as it’s generated, and pointed out that Amazon’s cloud, on which the DNAnexus service runs, currently has the infrastructure to run 1 million whole genome sequences per year.

We were also delighted to meet Franck Rapaport, the lucky winner of a 4-Day AGBT conference registration. Franck, who hails from Memorial Sloan-Kettering Cancer Center in New York City, was part of a team with a really interesting poster comparing differential expression tools for RNA-seq type data analysis.

Though scientists are certainly facing new challenges in data analysis, we think this is a great time for informatics innovation. Services such as the new DNAnexus, combined with great new algorithms and Apps, are helping to pave a path forward for a new era of genomics analysis in which infrastructure, workflow, and interpretation options are as seamless and simple as they should be.

Meet the new DNAnexus and Its Extensible Genomics Toolbox

This week we continue our look at unique facets of the new DNAnexus with a focus on the “Extensible Genomics Toolbox”: that is, the platform’s ability to allow bioinformaticians to tailor their analyses through custom Apps and workflows. The Apps provided within the platform serve as a starting toolkit, but users can also build new Apps from scratch or modify and combine existing ones to create truly custom pipelines.

Customization is one of the most important components when it comes to data analysis. Depending on the question that has spurred their experiments, researchers have a broad range of data analysis needs. The tools, algorithms, or annotations they find relevant will vary greatly from one experiment to the next — no single solution works for all of them.

DNAnexus’ goal is to provide a turn-key platform with a comprehensive menu of built-in functionality while also providing the freedom to add new capabilities as you need them. You can use what is already available in the constantly growing Apps library, which includes a rich set of industry-recognized tools for data QC, DNA resequencing, and RNA-seq, such as FastQC, RSeQC, BWA, GATK, SAMtools, Picard, SomaticSniper, TopHat, Cufflinks, and many more. In addition to these tools, you can take advantage of an expanding set of integrated reference genomes and variant databases, including dbSNP and COSMIC. All the Apps provided are open source. In addition, we provide a large set of useful example applets that can be used as a starting point for developers to build their own Apps.

For example, if you need your own reference genome and annotation database for annotating and interpreting your data, DNAnexus provides the functionality to let you upload and integrate your own proprietary reference genome into your custom workflow. Combine and configure multiple tools — whether they’re provided by DNAnexus or built in your own lab — into best-practice workflows that can be used in your lab or shared with collaborators within or across institutions.

The flexibility of the new DNAnexus platform allows you to run Linux programs written in any language. You can develop your own parallelized tools using APIs and the SDK for Bash shell, Python, C++, and Java (with more coming soon). In addition, you can now automate your batch data analysis via scripting using the command line, or with an easy-to-use, drag-and-drop web interface with dynamic validation of App compatibility to let you know whether you have selected the proper input file type for a specified App.
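
The compatibility check mentioned above can be pictured as a walk along the workflow, confirming that each step can read what the previous step produced. The Python sketch below is a hedged illustration of that idea and not the platform's actual validation logic: `Step`, `validate_workflow`, the step names, and their input and output types are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow step: a hypothetical App with typed input and output."""
    name: str
    accepts: frozenset  # file types the step can read
    produces: str       # file type the step writes


def validate_workflow(input_type: str, steps: list) -> list:
    """Return a list of human-readable problems; empty means compatible."""
    problems = []
    current = input_type
    for step in steps:
        if current not in step.accepts:
            problems.append(
                f"{step.name} cannot read {current}; "
                f"expected one of {sorted(step.accepts)}"
            )
        # Whatever this step writes becomes the input of the next one.
        current = step.produces
    return problems


# Illustrative steps only; the type signatures are assumptions.
aligner = Step("aligner", frozenset({"FASTQ"}), "BAM")
variant_caller = Step("variant_caller", frozenset({"BAM", "SAM"}), "VCF")

ok = validate_workflow("FASTQ", [aligner, variant_caller])
bad = validate_workflow("FASTQ", [variant_caller, aligner])
```

In this sketch `ok` is an empty list, while `bad` reports two problems because the steps are in the wrong order. Collecting every problem, rather than stopping at the first, suits an interactive interface that wants to flag all mismatched inputs at once.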

The new DNAnexus platform will allow upload and storage of any file type via the API. To achieve this and to take full advantage of the programmatic capabilities of the new platform, DNAnexus will automatically convert certain file types into objects optimized for fast programmatic access. These new objects are called Genomic Tables, or simply gtables, and they can be generated with file-conversion Apps provided by DNAnexus. As a result, users can store and retrieve any file type from their account workspace in its original form. File types supported for import and export are FASTA, FASTQ, SAM, BAM, VCF, BED, GFF, GTF, and WIG (with more coming soon).
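
As a rough illustration of how an upload script might tell the listed formats apart from files that should simply be stored as-is, here is a small Python sketch that guesses a format from the file name. It is not part of the DNAnexus SDK: `detect_format` and the `EXTENSIONS` table are hypothetical, and the extension spellings are common conventions assumed for the example.

```python
from pathlib import Path
from typing import Optional

# Formats the post lists as supported for import and export.
# The extension spellings are common conventions, assumed here.
EXTENSIONS = {
    ".fa": "FASTA", ".fasta": "FASTA",
    ".fq": "FASTQ", ".fastq": "FASTQ",
    ".sam": "SAM", ".bam": "BAM", ".vcf": "VCF",
    ".bed": "BED", ".gff": "GFF", ".gtf": "GTF", ".wig": "WIG",
}


def detect_format(filename: str) -> Optional[str]:
    """Guess a genomic format from the file name, ignoring a .gz suffix."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    if suffixes and suffixes[-1] == ".gz":
        suffixes.pop()  # look through gzip compression
    if not suffixes:
        return None
    # Unknown extensions return None: such files are stored unchanged.
    return EXTENSIONS.get(suffixes[-1])
```

A name-based guess like this is only a first pass; a real importer would also inspect the file contents before converting anything.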

The rich genomics toolbox also contains an HTML5-based integrated and interactive Genome Browser which lets you create custom tracks to view your data alongside reference data without any additional downloads or plugins. You can immediately leverage the Genome Browser and stream data as needed across the internet. Add from our included reference data sets and variant databases, such as dbSNP and COSMIC, or whatever data you choose to upload.

All together, this results in an environment that makes it possible for you to design, script, and fully automate custom workflows — filtering data, querying massive data sets, and handling batch analysis with ease thanks to the extensible genomics toolbox within the configurable cloud.

This ability to create Apps and workflows tailored to the needs of your project and lab is the foundation of the platform’s ability to facilitate “Instant Collaboration”, another core capability that we’ll discuss in next week’s close-up look at the new DNAnexus. Haven’t taken it for a test drive yet? Take advantage of our free beta trial period and sign up for an account here to explore it for yourself.