The Hybrid Hackathons of the Future — now with Librarians!


Hannah Gunderman, Data, Gaming, and Popular Culture Librarian, Carnegie Mellon University Libraries

Ben Busby, Director, Solution Science and Principal Scientist, DNAnexus


With the world still reckoning with the impacts of the COVID-19 pandemic, one thing that has remained constant is the need to change how people collaborate and communicate ideas, often shifting to remote and virtual formats. The COVID-19 pandemic accelerated the rate at which hackathons are hosted in a virtual format. Remote hackathons have the potential to extend the personally and professionally transformative experiences of in-person events to those who cannot travel due to financial, physical, or environmental constraints. They allow the intellectual wealth of these scientists to be applied to the important topics and goals of the hackathon, while supporting their health and safety through virtual participation. We hope that hackathons will retain a hybrid model to maximize the scientific contributions of both in-person and remote participants.

Why are hackathons important?

Hackathons allow for concentrated, focused effort on a task or goal by bringing together scientific experts in a particular discipline, such as structural variants, or united by a common goal, such as ending neurofibromatosis.  Some hackathons solve thorny problems, make life easier for practitioners of specific disciplines, or push the boundaries of what a particular scientific field can do.  That said, hackathons not only produce content (usually software), but ideally also actively facilitate education and networking. Those who participate often have professionally transformative experiences that can lead to a wider scientific network, job opportunities, and increased confidence in their coding and research skills. 

Hackathons largely follow the model of “disruptive innovation” by serving as a prototyping layer across scientific organizations, producing new ideas and technologies that the community can then assess for value in their larger goals and initiatives. The prototypes that emerge either push the envelope of what is possible with biomedical informatics, or make day-to-day bioinformatics easier. While the code isn’t necessarily persistent, these proofs of concept are intended for the community to build upon. Hackathons foster an environment with “buzz,” an economic geography concept referring to the serendipitous sharing of creative ideas that happens when people engage in face-to-face interactions. The last year has taught us that these benefits are also available in hybrid or fully remote formats, providing hope for a positive future of hybrid hackathons in scientific advancement and discovery.

How do hackathons benefit the participants?

Not only do hackathons have an undeniable benefit to the broader scientific community, but they can also provide transformative and impactful experiences for the participants themselves. These experiences largely revolve around confidence-building, educational development, and camaraderie.

As described earlier, the “buzz” created in hackathon environments helps advance the sharing of creative and innovative ideas. Through this exchange of ideas, participants can advance their journeys in computational problem-solving and modern software development techniques. In the bioinformatics space, there are many beginner data scientists who are still learning foundational skills in computation and scientific collaboration. Hackathons, whether remote or in-person, offer a concentrated space for beginner data scientists to advance their skills in both of these areas alongside more established bioinformatics researchers. Not only does this afford educational benefits to these participants, but it can also increase their confidence as scientists who can contribute to important research endeavours. 

Finally, hackathons also create the opportunity for participants to forge close personal friendships and bonds, which can lead to long-term collaborations and network-building. 

Participants often find themselves in intensely challenging and time-limited environments as they race to accomplish the goals of the hackathon, and going through these transformative experiences together can lead to strong friendships and connections that span beyond the bounds of the hackathon itself. This is not limited to in-person hackathons, however: video-conferencing software such as Zoom and collaborative tools such as Slack allow participants to interact with each other and build both interpersonal and professional connections. 

A Retrospective Look Into CMU-DNAnexus Virtual “Genomic Data to the Clinic” Hackathon

The CMU-DNAnexus Virtual “Genomic Data to the Clinic” Hackathon (June 1–4, 2021) was focused on bringing complex genomic data into the clinic. Specifically, we focused on integrating Expressed Variants, Polygenic Risk Scores, Structural Variants and T-Cell Receptors into an Electronic Medical Record-readable format using OMOP, and worked on a clinically presentable interface. Remote support was offered by librarians from Carnegie Mellon University Libraries who have specialties in data management, bioinformatics, and information sciences. This support included collating important resources found by hackathon participants (such as tools, software, and literature) into a single spreadsheet for easy access, reviewing the hackathon manuscript for syntax and readability, and preparing the manuscript for submission to BioHackrXiv. Communication platforms such as Zoom and Slack can offer ways to stay in touch and facilitate collaboration during a remote hackathon, but information can still get lost in translation in environments where we can’t see each other face-to-face. Librarians are trained in the information sciences and well-positioned to assist in keeping information organized and accessible during a remote or hybrid hackathon.

Participants not only effectively used online collaboration tools to create innovative workflows and deliverables supporting the goals of the hackathon, but also used tools such as Slack to develop interpersonal friendships. Much of the same dynamic energy and “buzz” felt during an in-person hackathon was also felt in this virtual space and the experience has already led to some promising future collaborations and scientific endeavors, including an accepted proposal for a presentation at the 2021 annual meeting of the American Society of Human Genetics that will share the scientific findings from this hackathon. 

Upcoming hybrid hackathons

Although the pandemic has a long tail, we can still begin to envision what our post-pandemic future may look like, drawing on the lessons we have learned from navigating remote environments over the past several months. One of these lessons is that hackathons with a virtual option can help us create more equitable and diverse intellectual spaces for tackling the most pressing issues we face in bioinformatics. Moving forward, hackathons should adopt a hybrid model that allows both in-person and remote participation, while giving more team leads the dedicated, uninterrupted time they need to focus fully on these efforts instead of juggling both work and the hackathon.

Further, leveraging the support of librarians in the hackathon space can lead to a more organized, cohesive, and collaborative experience for participants. This is particularly true for fully remote or hybrid hackathons, where clear communication channels are crucial for all participants. Librarians can help facilitate collaboration and coordination between remote and in-person participants, and help collate resources (such as tools, software, and literature) found during the course of the hackathon.  

We are excited to see what the future of hybrid hackathons holds for our field at large, and the scientific discoveries that will result from these events.  Below are some upcoming hackathons you can follow or get involved in!

Everything is bigger in Texas: Pan-Structural Variation hackathon in the Cloud! 

October 10-13, 2021, hosted by the Baylor College of Medicine

BioHackathon Europe

November 8-12, 2021, hosted by ELIXIR Europe

CMU-DNAnexus Hybrid “Genomic Data to the Clinic” Hackathon

March 9-11, 2022, hosted by CMU Libraries (stay tuned for more details!)

We also recommend keeping an eye on future events and initiatives hosted by the DEMON network, an international network for applying data science and AI to dementia!

Keep an eye on this link for more information about these and other events: 

Visit us at Bio-IT World 2021!

We’re excited to see you back in person at Bio-IT World in Boston! We’ll be there, masked up, and ready to join fellow life sciences, clinical, healthcare, pharma, and IT professionals to discuss recent advancements in the field and the future of precision medicine. Come to our talks and visit us in booth 204 to learn about the latest enhancements to our biomedical data platform. Can’t make it to any of our events? Email us to schedule a meeting with one of our scientists.

DNAnexus Talks & Booth 204 Events

Tuesday, September 21 

COVID & IT: SARS-CoV-2 Genome Analyses and Computational Tools for Infectious Disease Surveillance

  • 8:40am  
  • Speaker: Ben Busby, Director Research Platforms

The COVID-19 pandemic has revealed how critical computational approaches are for monitoring environmental microbiomes for emerging pathogens. This talk will cover computational tools that enable rapid and robust global surveillance of infectious disease, with an emphasis on existing approaches for SARS-CoV-2 detection and monitoring.

Booth Spotlight: Snthesis Data Harmonization on DNAnexus

  • 9:30-10am 

As scientists seek to manage and interrogate larger volumes of diverse datasets, data harmonization becomes ever more complex. The Snthesis platform ingests data from a wide range of sources, including LIMS platforms, electronic lab notebooks, structured data extraction from handwritten notes, PDFs, Excel files, EHR data, and data output from lab instrumentation, and harmonizes them in an automated way. Join this booth session to learn more about data harmonization on DNAnexus with Snthesis.

Explainable ML for Adverse Drug Reactions Using DNAnexus

  • 10am – Track 5: AI for Drug Discovery & Development 
  • 12:30-1pm – Booth Spotlight 
  • Speaker: Mike Lelivelt

Pharmacogenomics researchers can leverage UK Biobank to better understand how genes affect a person’s response to drugs. Learn how DNAnexus Apollo efficiently analyzes this massive dataset with explainable machine learning models to gain insight into ADRs.

Booth Spotlight: UK Biobank Research Analysis Platform 

  • 2:30-3pm

UK Biobank contains a trove of genomic and clinical information from 500,000 volunteers, including exome and whole genome sequencing data, blood samples, medical imaging, and extensive environmental and lifestyle data. To address the scale of the dataset, UK Biobank partnered with DNAnexus to build the UK Biobank Research Analysis Platform. Leveraging the power and scalability of the cutting-edge, cloud-native DNAnexus Apollo Platform, the Research Analysis Platform enables researchers to search and analyze the incredibly rich UK Biobank dataset easily and quickly.

Wednesday, September 22 

Data Federation Panel: From Biobank Scale to Individual Patients: Bringing Complex Multi-Omic Data to the Clinic and Clinical Research 

  • 9:55am – Track 2: Data Management 
  • Panel Speakers: Ben Busby, DNAnexus; Ankita Das, MIODx; Rory Kelleher, NVIDIA; Ahmad Khleifat, King’s College London

Multi-omics datasets for different diseases are available to researchers, and for the first time the availability of new analytical tools allows for the incorporation of these datasets in clinical research. However, there are serious challenges involved in realizing the promise of these developments. Developing new methods for multi-omics data will allow for better patient stratification, more targeted treatments, and greater understanding of disease mechanisms.

A Multi-Omics Data Science Platform Powering a Comprehensive Precision Oncology Strategy 

  • 11:25am – Track 3: Software Applications & Services 
  • 1:15-1:45pm – Booth Spotlight 
  • Speaker: David Fenstermacher, VP Precision Medicine & Data Sciences

DNAnexus and City of Hope Comprehensive Cancer Center (COH) embarked on a partnership to develop a scalable cloud-based oncology platform, POSEIDON, to democratize data for COH’s research and clinical programs, accelerating the fulfillment of its Precision Oncology Strategy. POSEIDON leverages DNAnexus bioinformatics technologies to combine multi-omics data in a unified platform that supports advanced analytics and visualizations within a collaboration portal to discover new evidence-based treatments.

Panel: Deconvolution of Massive Scale Datasets from Etiological Lessons: Technical Tips and Tricks, Data Interoperability for Training, and Feature Extraction

  • 4:00pm – Track 2: Data Management 
  • Panel Speakers: Ben Busby, DNAnexus; Emerson Huitt, Snthesis; Vivian Neilley, Google Cloud Healthcare; Sean Davis, University of Colorado Anschutz Medical Campus 

Computational Immunology: Tailoring Tools to Tackle the T-Cell Triad

Whether you’re a seasoned computational scientist or an armchair immunologist, the growing focus on the T-cell triad — MHC, peptide & TCR — in both infectious and chronic diseases is necessitating easy access to complex datasets and analysis tools that can make sense of the human adaptive immune response.

In a recent webinar, DNAnexus research platforms expert Ben Busby joined Ankita Das, Head of Product at immune profiling company MIODx, to discuss ways in which the companies are working together to make it easier to interrogate and interpret TCR (T-cell receptor) data.

As Das explained, T-cell composition and activity are at the center of the immune response and key to tracking immune health. The composition of a person’s T-cell repertoire can vary depending upon factors like age, environment, genetics, infection, and lifestyle. When T-cells sense an infection, they undergo a phenomenon called clonal amplification, wherein a subset of the T-cell repertoire will amplify to orchestrate an immune response, whether that be killing off infected cells or recruiting B cells to generate antibodies. Interpreting clone activity can reveal important clues about immune health and insight that could potentially lead to biomarker and therapeutic discovery.

However, scientists wishing to undertake such research face several challenges. “TCR data is currently very siloed, and the architecture available for hosting and analyzing the immense data sets involved are often not scalable,” Das said. 

The MIODx team set out to overcome these challenges by creating ClonoMap™, a SaaS portal hosted on DNAnexus Titan™, in which TCR repertoire libraries can be stored, managed and analyzed in a secure environment.

ClonoMap™ subscribers upload raw sequence data from TCR repertoire libraries into the portal and run the ClonoMap™ Immune Profiler analysis to discover repertoire features. A second tool, ClonoMap™ Immune Insight, searches public datasets to help scientists see the new immune profiles in context and draw translational insights, such as biomarker identification. The MIODx team is now taking it a step further, applying machine learning to create personalized immune health scores.

The ClonoMap™ suite of tools has already received nods for its innovation during the precisionFDA COVID-19 Precision Immunology App-a-thon. In the webinar, Das shared examples of how it was used to generate data and insights in COVID-19 and rheumatoid arthritis. In the case of COVID-19, for example, the Immune Profiler highlighted specific T Cell Receptor Beta Variable (TRBV) genes and CDR3 clonotypes at different frequencies in healthy individuals compared to COVID-19 recovered patients, providing TCRs for further investigation with respect to disease severity.
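To make the idea of comparing clonotype frequencies concrete, here is a minimal Python sketch. It is not the ClonoMap™ method, which the webinar summary does not describe in detail. It only assumes that each repertoire can be reduced to a list of CDR3 amino acid sequences, and the function names and toy sequences below are invented for illustration.

```python
from collections import Counter


def clonotype_frequencies(cdr3_sequences):
    """Return each CDR3 clonotype's share of the reads in one repertoire."""
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}


def frequency_differences(freqs_a, freqs_b):
    """Return (frequency in b) - (frequency in a) for every clonotype.

    A clonotype missing from one repertoire is treated as frequency 0 there.
    """
    clones = set(freqs_a) | set(freqs_b)
    return {c: freqs_b.get(c, 0.0) - freqs_a.get(c, 0.0) for c in clones}


# Toy, made-up CDR3 sequences standing in for two repertoires (hypothetical data).
healthy = ["CASSLGQYF", "CASSLGQYF", "CASSPDRGF", "CASRTGELF"]
recovered = ["CASSLGQYF", "CASSPDRGF", "CASSPDRGF", "CASSPDRGF"]

diffs = frequency_differences(
    clonotype_frequencies(healthy),
    clonotype_frequencies(recovered),
)

# List clonotypes with the largest absolute shift first (ties broken by name).
for clone, delta in sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0])):
    print(f"{clone}\t{delta:+.2f}")
```

A real analysis would work on many samples per group and would need statistical testing and normalization for sequencing depth; this sketch only shows the basic bookkeeping behind "different frequencies" between two repertoires.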

Holy grail of healthcare

Unravelling the immune response is the ‘holy grail of healthcare,’ Busby said. And in what may prove to be the decade of infectious disease research, Busby said he was proud to be able to provide tools to help scientists do so. 

“We want to make this size data accessible to everyone, and the DNAnexus platform really enables scientists and bioinformaticians to be more powerful,” Busby said. 

In addition to ClonoMap™, the iReceptor data discovery platform, curated by the AIRR (Adaptive Immune Receptor Repertoire) community, facilitates the curation, analysis and sharing of antibody/B-cell and T-cell receptor repertoires (AIRR-seq data) from multiple labs and institutions.

JupyterLab is another powerful tool that DNAnexus leverages for multi-omic cohort analysis and data exploration, and Busby recommended it for experts and armchair enthusiasts alike. He also noted that many DNAnexus-created Jupyter Notebooks are available, even to non-DNAnexus users. 

Other open source tools include Bioconductor, Bioconda and Docker. Each can be easily integrated into DNAnexus platforms, and the visualization capabilities of the DNAnexus system make the data obtained by them even more approachable, Busby said.  

You can watch the full webinar below.