At Bio-IT World, Genome Centers Dished on Big Data

At the Bio-IT World Conference & Expo last week in Boston, more than 2,500 attendees descended on the World Trade Center to hear about the latest in hardware, analysis, data storage, and much more. The DNAnexus team was out in force, and we were delighted to share updates about our new platform with the many attendees who stopped by our booth.

The conference had a number of excellent keynote talks this year, including ones by Atul Butte from Stanford and Andrew Hopkins from the University of Dundee. We also really enjoyed seeing Steven Salzberg’s acceptance of the Benjamin Franklin Award for Open Access in the Life Sciences, a much-deserved honor for one of the veterans of the bioinformatics field.

Perhaps most interesting was a panel discussion about big data featuring members of major genome centers. Panelists included Guy Coates from Sanger, Xing Xu from BGI, Eric Jones from the Broad Institute, and Alexander (Sasha) Zaranek from Harvard Medical School and a company called Clinical Future.

For those of us who remember when it was a big deal to have a terabyte of storage available, it was truly amazing to hear that most of the panelists have 15 petabytes or more of data stored and easily accessible. Still, even with resources like that, some of the panelists encourage their institute members to delete data when possible, such as the unaligned reads from a sequencing run.

Access control is a real problem for managing data at these large centers. Sanger’s Coates said that his institute’s move into the clinical field (complete with consent forms and all the other compliance needs) makes controlling access “a real nightmare” for his team. Jones at the Broad said that this issue basically means people in the field are living on borrowed time as it becomes increasingly important to find the right solution to this challenge. Zaranek noted that Clinical Future will use the Arvados tool to include security permissions and provenance along with the files to address this issue.

The panelists also specifically discussed cloud computing, with BGI’s Xu saying that the cloud is his center’s main data repository. One remaining goal is to make the global exchange of genomic data faster and more efficient over higher-bandwidth connections, which his group has tested using Aspera. They successfully transferred 24 GB in just 30 seconds across countries, but this feat is not yet economical enough for routine use. Coates said that his group uses cloud options (including Amazon) for research projects, but they are still evaluating how to integrate the cloud into the production pipeline in a cost-effective way. At the Broad, Jones said, the need to move to the cloud is understood, but so far internal computing is still enough for institute members; he added that the cloud’s elasticity will ultimately drive adoption, allowing people to run very large jobs that would otherwise interfere with the rest of the institute’s compute resources. Zaranek said his group is using cloud computing from both Harvard and Amazon and that having both options is incredibly valuable. That setup will also allow other organizations to access their resources. Coates and Jones said that the real challenge in managing data comes when individual researchers start moving data around, because tracking that data and predicting resource needs can become difficult.

These are all issues that we have given a great deal of thought to as we designed and built the new DNAnexus, now available for beta testing. We agree that security and compliance are important components of any compute solution — whether cloud-based or in-house — and that’s why we baked the highest standards right into our new tool. Having flexibility to configure the environment as needed, such as scaling up or down at a moment’s notice, is another key trait of the new platform and one that we believe will be quite useful for scientists in individual labs or at these major genome centers streaming data around the clock.

Join Us at Bio-IT World for a Personalized Demo and Chance to Win an iPad Mini!

Next week kicks off the annual Bio-IT World Conference & Expo, one of the best conferences dedicated to bioinformaticians and computational biologists. We look forward to it every year for the opportunity to rub elbows with die-hard developers, IT powerhouses, and the truly remarkable scientists who are just as comfortable working with lines of code as others are with lines of cells. Last year the meeting had more than 2,500 attendees from all over the world; it’s a can’t-miss venue for people who work hard to make sure the computational side of biology functions properly.

In that spirit, we hope to meet many of you at our booth (#311) for a personalized demo of the new DNAnexus, our cloud-based genomics platform. For the bioinformatics professional, the new DNAnexus eliminates up-front commitment to expensive hardware and maintenance. And since DNAnexus leverages Amazon Web Services for scalable and cost-effective data storage and computing, a research lab can access these resources as the need arises. The new DNAnexus is now in beta, and you can request access for a free account today.

iPad Mini Giveaway

To sweeten your personalized test drive of the new DNAnexus platform at Bio-IT World, we are giving away an iPad Mini! Here’s how to enter:

1. Sign up for a beta account
2. Upload your data so you can test-drive our system
3. Visit DNAnexus in booth #311 for a chance to win

DNAnexus scientists will be on hand to offer a custom data analysis and answer any questions you may have. This is a perfect opportunity to see how our new platform will perform in your lab! Even if you don’t have time to upload your data ahead of time, please stop by to learn more about the new platform and shoot the bioinformatics breeze.

Plus, join us in our booth for daily 10-minute mini-sessions during meeting breaks. We’ll be spotlighting a few new DNAnexus features, including compliance, collaboration and app building.

Wednesday – April 10th

10:00 am Instant Collaboration:
Compliance Meets Collaboration: Together at Last
Vince Ramey, Ph.D., Scientist, DNAnexus
3:30 pm App Building:
Build an App & Share With Your Team in 10 Minutes
Andrey Kislyuk, Ph.D., Sr. Engineer, DNAnexus

Thursday – April 11th

10:30 am App Building:
Build an App & Share With Your Team in 10 Minutes
Andrey Kislyuk, Ph.D., Sr. Engineer, DNAnexus
1:30 pm Instant Collaboration:
Compliance Meets Collaboration: Together at Last
Vince Ramey, Ph.D., Scientist, DNAnexus

At Bio-IT World, All Eyes Were on the Cloud and Big Data!

As expected, the 10th annual Bio-IT World Conference & Expo was both exhausting and invigorating. With three jam-packed days of great talks, demos, and networking opportunities, we came away from the meeting eager for a long nap. A huge thank you to all the people who stopped by our booth and engaged us in really interesting conversations! Talking with current and potential users has sent us back to Mountain View with the reinforced knowledge that our product really is making a difference in people’s lives — and making next-gen sequence data easier to manage and analyze for labs that don’t have production-scale compute resources, as well as for IT groups that find themselves juggling a lot of different computing needs. This gives us an even greater sense of urgency in launching our new platform this summer.

One of the most interesting facts about this year’s Bio-IT World conference was a shift in focus in the exhibit hall. In years past, there have always been a lot of vendors showing off servers, cluster improvement tools, chip accelerators, and other hardware offerings. This year, hardware was unusually hard to come by. It seems that the field has tacitly acknowledged that cloud-based data storage is indeed the way to go for the vast majority of genomic labs.

Scientists who spoke at the conference or visited our booth offered further validation of that trend. After years spent trying to figure out how to write scripts that work best on a cluster or with an FPGA, the biologists were relieved to be back where they wanted: focused on the data, not on the compute infrastructure. When talking to attendees in our booth, it was clear that the most pressing thing for this community is getting the right scientific answer, a sentiment that resonates with us.

In a keynote talk, Jill Mesirov from the Broad Institute spoke about the critical need to integrate tools and workflows for analysis and management of large data sets. She introduced GenomeSpace, the Broad’s new platform that combines various tools, including the UCSC genome browser, Cytoscape, GenePattern, Galaxy, and more.

On the commercial side, the emphasis on putting compute infrastructure behind the scenes so that scientists can focus on answers was reinforced by a number of news announcements tied to new cloud-based services from organizations including Illumina, BGI, and others. While the commercial services haven’t officially launched yet, we look forward to trying them out later in the year as they’re released.

Finally, we’re delighted to see that people are already flocking to our new landing page, which has information on the new DNAnexus platform to be launched this summer. If you haven’t signed up already, we encourage you to do so. This way you will be the first to learn about the capabilities the new platform will support and the exact timing of the rollout. It’s simple: just enter your e-mail address and we will keep you posted with ongoing information about our best-in-class security, unified environment for instant collaborations, custom workflows, and more. You’ll also be automatically entered in a monthly drawing for a free iPad in May and June. Sign up early for the best chance of winning! Stay tuned: the winner of the April drawing has been pulled and will be announced shortly on our blog.