New features for managing workflows and releasing them to your global network of collaborators

DNAnexus Blog Authors

Computational genomics workflows are regularly used not only to accelerate R&D in genomics but also, increasingly, to make clinical diagnoses tailored to individual genomes. As DNAnexus has grown to support a large network of industries and collaborators, we have noticed that these workflows are often developed and shared across users and organizations on a global scale.

DNAnexus workflows currently let users create and execute a computational workflow within a DNAnexus project. However, for users and organizations collaborating on multiple private or public projects, these project-local workflows can be harder to maintain over the long term in larger organizations and consortia. In recognition of the need to manage workflows with a truly global network of collaborators, we are excited to introduce an additional suite of features that can be applied to objects we call ‘global workflows’.

As with DNAnexus applications, global workflows are published to a global space accessible by authorized users across projects. Like a GitHub repository or Docker repository, global workflows are versioned and updated under a globally unique name. Global workflows can be tagged, associated with broad categories (e.g. ‘read mapping’, ‘germline variant calling’, ‘somatic variant calling’, ‘tumor-normal variant calling’), and defined to run across cloud regions and providers. They can also be developed by a specified set of users and subsequently published or released to a larger set of authorized users who can run but not modify the workflow. Together, these features empower workflow developers to better share and advertise their workflows to a broad set of users and organizations across multiple regions and clouds.

A user creates a global workflow in essentially the same way as a regular workflow (see this tutorial for more details on how to create a global workflow).  In fact, existing workflows on the DNAnexus platform can be converted to global workflows in a straightforward way.  Since workflows written in CWL or WDL can be directly converted into workflows on our platform, these workflows can also be easily converted to global workflows.  As a result, portable workflows can also be imported to our platform and used in a way that meets organizational needs for access control and collaboration at scale.

To illustrate the use of global workflows, we have published a public workflow available to all users of our platform. For example, from our CLI, you can run:

$ dx find globalworkflows
GATK4 Germline Best Practice FASTQ to VCF (hs38DH) (gatk4_germline_bp_fq_hs38dh),

Here, you can see that there is a GATK4 best practice pipeline available for you to use. You can treat this workflow name like that of any other global application on the platform. Examples of how to use these features can be seen in more detail here.
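As a small illustration of working with that listing, the sketch below pulls the unique workflow name out of a line such as the one shown above. It assumes the name is whatever sits in the last pair of parentheses on the line. The helper `globalworkflow_name` is a hypothetical name introduced here and is not part of the dx CLI, and the `globalworkflow-<name>` form passed to `dx describe` in the closing comment is also an assumption to check against the documentation.

```shell
# Hypothetical helper (not part of the dx CLI): print the text inside the
# last "(...)" of a listing line, which we assume is the unique workflow name.
globalworkflow_name() {
  # The greedy ".*" skips ahead to the final "(" that is followed by a ")"
  printf '%s\n' "$1" | sed -n 's/.*(\([^()]*\)).*/\1/p'
}

# Listing line copied from the example output above
line='GATK4 Germline Best Practice FASTQ to VCF (hs38DH) (gatk4_germline_bp_fq_hs38dh),'
name="$(globalworkflow_name "$line")"
echo "$name"

# With the dx CLI installed and logged in, the name could then be used to
# look up the workflow. The exact identifier form is an assumption here:
#   dx describe "globalworkflow-${name}"
```

Keeping the parsing in a small function makes it easy to loop over every line of the listing if more global workflows are published later.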

Workflow release management features were built by the Developer Experience team at DNAnexus. Thanks to the DNAnexus Science team for contributions to the design of this feature. Please see our documentation for a tutorial on how to use these features, and contact us if you have any feedback or questions.

Meet the new DNAnexus and its Instant Collaboration Environment

The life sciences field has a long and respected tradition of collaboration among researchers. Genomics as we know it today kicked off with one of the biggest biology collaborations of all time, the Human Genome Project.

This has led to a highly cooperative mindset among many participants in the community, and it’s a mentality that we at DNAnexus share and embrace. A successful collaboration is more than just a mindset, though. It requires infrastructure of all sorts: today’s partnerships benefit from technology advances such as Skype, instant messaging, FaceTime, Google Docs, and more.

These collaborations often involve many participants with a range of backgrounds and expertise, such as bioinformatics, medicine, microbiology, molecular biology, and more. The new DNAnexus includes a set of features to facilitate research projects for teams within and across organizations. As a bioinformatician, for example, you can upload your data, build Apps, create custom workflows, and then share all of it with your research partners, whose access and permissions you control. Because you have the ability to define the project and design the workflow yourself, your collaborators, who might be clinicians or biologists with little to no expertise in bioinformatics, will have easy access to the entire project via an intuitive web interface to analyze and visualize their data.

With the new DNAnexus, you not only enable non-bioinformaticians to run your custom analysis tools and best-practice workflows, but also eliminate the data transfer, format conversion, and other incompatibilities that currently slow down even the most efficient collaborative efforts. This platform offers a secure and reliable environment through which you can instantly collaborate with team members without the hassle of data synchronization and shipping hard drives.

Here are a few key features of the new platform that are especially useful for collaborations:

Instant access from anywhere: When you’re part of a team that could be spread out across an organization, country, or even the world, you need to have easy access to your data from anywhere, at any time. The beauty of using a cloud-based platform is that it offers just that: peace of mind that your data will be ready and waiting for you whenever and wherever you need it. One of the main values of the new DNAnexus is collaboration support without the need to transfer files between collaborating sites. All the data is in one location and can be accessed by any permitted person from anywhere.

Intuitive permission definitions for users: As the project leader, you’ll be able to use the new DNAnexus to set access permissions for the other users of your project. Some people may be able to just view or download the data, while others can be made contributors, allowing them to manage and run analyses on the data — it’s all up to you, the administrator.

Enterprise-grade data security: Just as much as you want the right people to have access to your project data, you don’t want other people seeing it. With its extensive background in data and cloud security, the DNAnexus team has built this platform with enterprise and user-controlled permission for data, analysis tools, and workflow sharing. Your data is not only stored in high-end physical data centers, but also fully encrypted at rest and in transfer.

If you haven’t yet tried the new DNAnexus for yourself, what are you waiting for? Sign up here for your free beta testing account. And check back on this blog for our next in-depth look, when we will discuss security and compliance.

Recovering from AGBT: Exhausted but Encouraged!

It’s hard to believe the whirlwind of the annual Advances in Genome Biology & Technology meeting is already over! The DNAnexus team had a terrific time at the conference, and we want to thank everyone who stopped by our suite and attended our Friday afternoon talk.

This year’s meeting had more attendees (not to mention the thickest abstract book) than ever before. In many of the talks and posters, the challenge of data interpretation was front and center. Several scientists mentioned that the data sets they’re comparing and the analyses they’re performing pull together far more data than they’ve ever had to deal with. Indeed, the very first speaker of the meeting, Eric Boerwinkle from the University of Texas, told attendees that the community needs to keep pushing for better informatics and data interpretation tools. It was gratifying to see that so many scientists are making use of large, publicly available databases; ENCODE in particular was cited in several presentations.

The talks we found most interesting were about applications of next-gen sequencing technologies, ranging from clinical sequencing to epigenetics to microbiome studies. Christine Eng from Baylor College of Medicine spoke about whole exome sequencing in the clinic, noting that her team’s pilot project saw a 25% success rate (conservatively estimated) in using this information to diagnose a disease. She also said that clinical genomics will have trouble ramping up without more genetic counselors and geneticists; at the moment, there are just 3.5 such experts per million people in the US. In another talk, Leonid Moroz from the University of Florida captured attendees’ imaginations with a discussion on the biological mechanisms underlying memory persistence. He focused on epigenetic changes in the brain, finding that demethylation of just one strand of DNA seems to precede the formation of long-term memories in model organisms. Finally, the most-discussed talk of the conference came from Kjersti Aagaard at Baylor College of Medicine, who spoke about metagenomics in medicine. She presented data indicating that the placenta is not a sterile environment as previously thought, and that the placental microbiome is most closely related to the oral microbiome. For this technology audience, it was a real treat to see just how much compelling science is happening because of the sequencing tools that have been presented in previous years at AGBT.

It’s clear to us that the focus is rapidly moving away from sequencing technologies and toward data interpretation as the real immediate technological challenge in the genomics community. This year, there were a number of companies presenting analysis tools, including Maverix Biomics, Ingenuity Systems, Personalis, and more. In fact, these types of tools seemed to outnumber the usual plethora of next-generation sequencing technologies that people tend to expect from the Marco Island conference. It was a fairly quiet year for instruments (no major sequencing technology headlines came out of the meeting), so it was great to have lots of attention on data interpretation and the tools enabling it.

We hope that we were able to offer conference attendees an optimistic view of the data analysis situation. People who came to our suite had the opportunity to get a guided tour of the new DNAnexus, and we were pleased at how much interest there was in a customizable cloud-based solution for managing and analyzing sequence data. We hosted several demos and have been thrilled to see how many people from the meeting have signed up for beta accounts with the new platform to help tame their own data sets.

On Friday afternoon, our CEO and co-founder Andreas Sundquist gave a plenary presentation to introduce attendees to the new DNAnexus. Andreas’s talk provided a detailed look at the core attributes of the new DNAnexus: its configurability, extensible toolbox with more than 40 Apps, instant collaboration environment, and security and compliance support. He also noted that users could choose an intuitive drag-and-drop interface or opt for the more hard-core command line to suit their own needs. When asked about data upload speeds and cloud capacity, Andreas said that current Ethernet speeds are usually sufficient to upload sequence data in real time as it’s generated, and pointed out that Amazon’s cloud, on which the DNAnexus service runs, currently has the capacity to run 1 million whole genome sequences per year.

We were also delighted to meet Franck Rapaport, the lucky winner of a 4-Day AGBT conference registration. Franck, who hails from Memorial Sloan-Kettering Cancer Center in New York City, was part of a team with a really interesting poster comparing differential expression tools for RNA-seq type data analysis.

Though scientists are certainly facing new challenges in data analysis, we think this is a great time for informatics innovation. Services such as the new DNAnexus, combined with great new algorithms and Apps, are helping to pave a path forward for a new era of genomics analysis in which infrastructure, workflow, and interpretation options are as seamless and simple as they should be.