
Choosing Between WDL and Nextflow for Genomics Analysis Workflows

Explore the differences between WDL and Nextflow to determine the best workflow language for your analysis needs.

 

Organizations often approach us seeking to standardize their bioinformatics pipelines and centralize their computational resources so they can move away from ad hoc development. When beginning this journey, clients may still need to decide which workflow language to use for their standardized workflows. Since DNAnexus fully supports both WDL and Nextflow, we often get asked, "Which would be best for us?" Unfortunately, there's no easy answer to this question.

Recently, some of our in-house experts hosted a roundtable discussion to address the questions teams most often ask when choosing a workflow language to standardize on.


Why Do We Need Workflow Languages?

Before exploring the specifics of each workflow language, let's first discuss why workflow languages are so crucial in bioinformatics.

Analyzing omics data involves many data transformations, typically spread across a large number of discrete tasks such as alignment, normalization, and filtering. To ensure that each step runs consistently across datasets and platforms, you need a system that executes the same tasks in the same way every time. Workflow languages provide this, making these processes portable, parallelizable, consistent, and interoperable.

Another advantage of utilizing a workflow language is that it allows fellow researchers to read and understand your analysis protocol and reproduce results. This can be especially useful when publishing research or developing methods, such as diagnostic tests, that require others to understand and reproduce your methodology.

Workflow languages also make your processes portable across different types of infrastructure, such as a local HPC environment or a cloud environment. This way, neither you nor others who want to reproduce your results are tied to a particular computing environment.

What is WDL?

The Workflow Description Language (WDL), according to OpenWDL, is an open-source, domain-specific language used to define and describe scientific workflows. WDL was originally developed at the Broad Institute for genome analysis pipelines. It specifies a workflow's inputs, outputs, and steps, supporting reproducibility and scalability. WDL is designed to be human-readable and human-writable, making it easier for researchers and scientists to define and share their workflows. Its structured syntax supports composing complex workflows, including conditionals and parallel execution over arrays (scatter).

What is Nextflow?

Nextflow is a popular workflow language for omics analysis. It is expressive and scalable, and it supports defining and executing complex, data-driven workflows. A key advantage of Nextflow is that workflows can run on different execution platforms, including local machines, clusters, and cloud environments, which makes it a flexible, portable solution across computing infrastructures. Like WDL, Nextflow is free and open source.

What are some differences between WDL and Nextflow?

WDL and Nextflow both provide the functionality we outlined above as necessary for a workflow orchestrator: both let users build human-readable, portable, parallelizable, consistent, and interoperable processes. In many ways, the choice comes down to developer familiarity; using the workflow language your team knows best is often the fastest path to success, whichever language that is. While the two have more similarities than differences, they differ in how they implement a few of these capabilities. We'll outline some of these differences below.

Executor

One key difference between Nextflow and WDL is their approach to execution engines. Nextflow ships with its own execution engine but does not provide a formal specification for it. WDL, on the other hand, requires a third-party execution engine to run (Cromwell and miniwdl are two examples) but provides versioned specifications. A bundled engine makes it easy for new users to get started quickly, whereas versioned specifications give a clear reference for deciding whether a given behavior is a bug or a candidate enhancement, and they allow competing implementations.

In addition to convenience, the Nextflow engine provides functionality that makes it easier to iterate and debug. For example, its resume feature (the -resume option of nextflow run) reuses cached results from tasks that already completed, so if you encounter an error, you can correct it and pick up where you left off rather than rerunning the entire pipeline.

When running WDL on DNAnexus, similar functionality is available through the platform's Smart Reuse feature. If a task has already been run with a particular set of inputs, DNAnexus recognizes that this work has been completed and reuses the saved results instead of running the task again.
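To make the reuse idea concrete, here is a minimal Python sketch of resume-style caching, assuming a hypothetical in-memory cache keyed on a hash of the task name and its inputs. It is not how Nextflow's resume or DNAnexus Smart Reuse is actually implemented; real engines typically take more into account, such as the task script and the contents of input files. The names run_task, cache_key, and align are illustrative only.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory cache: key derived from (task name, inputs) -> task output.
_cache: Dict[str, Any] = {}
# How many times each task body actually executed (for demonstration only).
run_counts: Dict[str, int] = {}


def cache_key(task_name: str, inputs: Dict[str, Any]) -> str:
    """Build a stable key from the task name and its JSON-serializable inputs."""
    payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(task_name: str, fn: Callable[..., Any], **inputs: Any) -> Any:
    """Run fn(**inputs), or reuse the stored result if identical inputs already ran."""
    key = cache_key(task_name, inputs)
    if key in _cache:
        return _cache[key]  # reuse: the task is not executed again
    result = fn(**inputs)  # if fn raises, nothing is cached, so a rerun retries it
    run_counts[task_name] = run_counts.get(task_name, 0) + 1
    _cache[key] = result
    return result


def align(sample: str, reference: str) -> str:
    # Hypothetical stand-in for an expensive alignment step.
    return f"{sample}.{reference}.bam"


first = run_task("align", align, sample="S1", reference="hg38")
second = run_task("align", align, sample="S1", reference="hg38")  # reused
third = run_task("align", align, sample="S2", reference="hg38")   # new inputs, runs
print(first, second, third, run_counts)
```

The second call returns the stored result without executing align again, while the third call runs because its inputs differ. Because failed calls are never stored, a rerun skips the completed work and retries only what failed.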

Specification of Scatter & Gather

WDL supports scatter-gather parallelism. Parallelism makes programs faster and more efficient by running several tasks at the same time rather than one after another. The scatter block in WDL "will produce parallelizable jobs running the same task on each input in an array, and output the results as an array as well."[1] In WDL, scatter is explicit: you must declare that a task should run over each element of an array. Once scatter is declared, however, gather is implicit.
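As a rough analogy (written in Python, not WDL), the sketch below mimics WDL's semantics: the caller explicitly requests a scatter over an array, and the results come back as an array with no separate gather step. The scatter and normalize functions are hypothetical, and a thread pool stands in for the parallel jobs a WDL engine would launch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def scatter(task: Callable[[T], R], items: List[T]) -> List[R]:
    """Explicit scatter: run `task` once per element of `items` in parallel.

    The return value is already an array of results in input order, mirroring
    how WDL implicitly gathers scattered outputs into an array.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, items))


def normalize(sample: str) -> str:
    # Hypothetical per-sample task.
    return sample.lower()


samples = ["S1", "S2", "S3"]
normalized = scatter(normalize, samples)  # scatter must be requested explicitly
print(normalized)  # ['s1', 's2', 's3']
```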

In Nextflow, the opposite is true: scatter is implicit. A process automatically runs once for each item emitted by its input channel, so users need to specify when they do not want a task to run once per item. Gather, however, is explicit; to combine the outputs, you must request it, for example with the collect operator.
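As a rough Python analogy of the Nextflow style (not Nextflow code), the hypothetical process helper below runs a task once for every item an input "channel" emits without being asked to, while combining the results requires an explicit call to a collect helper, loosely modeled on Nextflow's collect operator. For simplicity this sketch runs the tasks sequentially, whereas a Nextflow engine would run them in parallel. The count_reads task is hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process(task: Callable[[T], R], channel: Iterable[T]) -> Iterator[R]:
    """Implicit scatter: the task runs once for every item the channel emits."""
    for item in channel:
        yield task(item)


def collect(channel: Iterable[R]) -> List[R]:
    """Explicit gather: wait for every item and emit them as a single list."""
    return list(channel)


def count_reads(sample: str) -> int:
    # Hypothetical per-sample task returning a fake read count.
    return len(sample) * 100


samples = ["S1", "S22", "S333"]
per_sample = process(count_reads, samples)  # one task invocation per emitted item
all_counts = collect(per_sample)            # gather must be requested explicitly
total = sum(all_counts)                     # a downstream step sees the whole list
print(all_counts, total)  # [200, 300, 400] 900
```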

Functionally, the two languages are equivalent here; they differ only in which half of scatter-gather you must declare explicitly.

Community Support

Arguably, the most significant difference between Nextflow and WDL is the community around each language. On the Nextflow side, nf-core provides a rich resource for getting started, including many well-tested, well-written open-source pipelines. The community also offers training, forums, hackathons, and regular user group meetings in Europe and the U.S. These resources can be especially valuable for people starting out in the field or coming to Nextflow from backgrounds other than software development.

The OpenWDL community has well-documented toolkits and cookbooks to help users get started. Compared to the Nextflow community, however, OpenWDL provides fewer out-of-the-box workflows and does not have active user group meetings.

Pipelines from either OpenWDL or nf-core can be easily imported into DNAnexus and made available to users.

Figure: The import pipeline/workflow dialog box in DNAnexus.

Should I Rewrite My Workflows in WDL or Nextflow?

While we don't discount the value of learning a new workflow language or adding a new skill to your toolbelt, in our experience most bioinformaticians, computational scientists, and software developers choose the workflow language they know best, and working in a familiar language is likely the quickest path to success. Likewise, if you already have a functioning workflow written in another language, there's no compelling reason to rewrite it.

Here we’ve only briefly discussed the importance of workflow languages in bioinformatics and how they help make the analysis process consistent, portable, parallelizable, and interoperable. WDL and Nextflow provide similar functionality for workflow orchestration, but they differ in a few implementation details that may matter for some users. To learn more about how Nextflow and WDL can be used on DNAnexus, see WDL on DNAnexus and Nextflow on DNAnexus.

 

About DNAnexus

DNAnexus, the leader in biomedical informatics and data management, has created the global network for genomics and other biomedical data, operating in 33 countries across North America, Europe, China, Australia, South America, and Africa. The secure, scalable, and collaborative DNAnexus Platform helps thousands of researchers across a spectrum of industries (biopharmaceutical, bioagricultural, sequencing services, clinical diagnostics, government, and research consortia) accelerate their genomics programs.

The DNAnexus team is made up of experts in computational biology and cloud computing who work with organizations to tackle some of the most exciting opportunities in human health, making it easier—and in many cases feasible—to work with genomic data. With DNAnexus, organizations can stay a step ahead in leveraging genomics to achieve their goals. The future of human health is in genomics. DNAnexus brings it all together.