Earlier this month I attended the 4th annual Bio-IT World Europe Conference & Expo in Vienna, where I found that the scientific community’s enthusiasm for high-performance, cloud-based computing is higher than ever. I was thrilled to see that there is more demand for resources to help scientists and bioinformaticians store, manage, and analyze their data, particularly in ways that facilitate collaboration among larger groups. Quite a bit of money seems to be spent in Europe on cloud-based and open source tools with the goal of supporting and advancing genomics research. The money comes from research funds, but also from the commercial sector. That’s especially interesting since Europe’s overall funding situation seems a bit shaky, so it is great to see that there is enough funding on the research side.
In another keynote session, Paul Flicek, principal investigator and head of the vertebrate genomics team at EMBL’s European Bioinformatics Institute, spoke about evaluating cloud-based computing as part of his work with Ensembl, the 1000 Genomes Project, and ENCODE. He termed it “interacting with the cloud through the lens of Ensembl.” Flicek made the important point that the ultimate goal isn’t to amass sequence data, such as aligned reads, variation calls, and genome browser views, but rather to extract knowledge from that data to improve our understanding of biology and disease. He uses cloud services from Amazon to take advantage of its entire infrastructure, to distribute the data, and to provide genome annotation of more than 50 species.
I was also really interested in a talk from Veit Ulishoefer, who presented an update on the Pistoia Alliance. This group was formed a few years ago by informatics experts at some of the leading pharmaceutical companies who wanted to share precompetitive information to streamline the drug discovery process at all of their companies. Today, the group is made up of pharma companies, publishers, and academic institutes, among others. Ulishoefer spoke about a recent competition called Sequence Squeeze, hosted by Pistoia, to find the best compression tool for sequence data. The winning entry, which can be accessed through SourceForge, came from James Bonfield, a researcher at the Wellcome Trust Sanger Institute.
It was great to see that so many of these collaborative projects were driven by pharma, which isn’t necessarily known for having a share-and-share-alike mentality. If even these highly competitive corporations can find ways to work together, that gives me great hope that such alliances will help usher in an improved understanding of diseases and more effective medicines. Here at DNAnexus, we strongly believe in collaboration as a central pillar of our work. It is a major focus of ours, and it is well supported by the core capabilities of the cloud.