Publication Watch: In Early 2013, Nice Flow of New Papers from DNAnexus Users

It’s been a while since we checked in on publications using DNAnexus, so we headed over to PubMed to provide an update. With so many great new papers coming out (more than 10 in just the past few months), we wanted to take the opportunity to look at a few of them and see how they’re making use of DNAnexus.


In the Journal of Medical Genetics, scientists from Hebrew University Medical Center and colleagues at other organizations published a paper entitled “Agenesis of corpus callosum and optic nerve hypoplasia due to mutations in SLC25A1 encoding the mitochondrial citrate transporter” (published online February 2013). Lead author Simon Edvardson and colleagues report on the first known patient with agenesis of corpus callosum caused by a mitochondrial citrate carrier deficiency. The team performed exome sequencing and used DNAnexus for read alignment and variant calling. Two pathogenic variants were found in the gene encoding the mitochondrial citrate transporter, and functional studies in yeast validated the findings by reproducing the same biomolecular effects of the mutated proteins.


In the January issue of Antimicrobial Agents and Chemotherapy, a journal from the American Society for Microbiology, a research team from Georgetown University Medical Center and the Institute of Microbiology in Beijing released a paper called “Azole Susceptibility and Transcriptome Profiling in Candida albicans Mitochondrial Electron Transport Chain Complex I Mutants.” In the study, the authors looked at how mitochondrial changes in yeast alter susceptibility to certain azole compounds commonly used as antifungal agents. As part of the effort, the team used RNA-seq to generate a transcriptome profile of two mutants known to increase susceptibility to azoles. Data analysis was conducted through DNAnexus. The scientists found that both mutants showed downregulation of transporter genes that encode efflux proteins, a mechanism thought to be linked to the cell energy required for azole susceptibility.


In the journal Human Mutation, a paper entitled “A Deletion Mutation in TMEM38B Associated with Autosomal Recessive Osteogenesis Imperfecta” (published online in January) comes from a research group at Ben Gurion University and the Soroka Medical Center, both in Israel. The scientists studied patients with autosomal recessive osteogenesis imperfecta, or brittle bone disease, which could not be explained by any previously known mutation. The team used genome-wide linkage analysis and whole exome sequencing to identify a single mutation common to all three patients: a homozygous deletion mutation of an exon in TMEM38B. Sequence read alignment, variant calling, and annotation were done with DNAnexus tools.


Finally, a paper published early online in February in the journal Case Reports in Genetics called “Targeted Next-generation Re-sequencing of F5 gene Identifies Novel Multiple Variants Pattern in Severe Hereditary Factor V Deficiency” comes from a group that used DNAnexus for data quality, exome coverage, and exome-wide SNP/indel analysis. The authors, scientists from Pennsylvania State University and MS Hershey Medical Center, present a study of four people with severe factor V deficiency in which they used next-gen sequencing to study the factor V gene locus. They found five coding mutations and 75 noncoding variants, including three missense mutations previously associated with other factor V phenotypes.

DNAnexus in the Literature: A Look at Recent Papers Using Our Platform

It’s great to find that several papers published recently have used DNAnexus in their research — and even more interesting to see the broad range of applications presented in these papers, from cross-species analyses to novel disease polymorphisms, and from single genes to whole genomes. We thought it would be informative to take a look at a few of these publications and discuss how the DNAnexus tools contributed to them.

First we have a paper in Genome Biology from Eric Vallender at Harvard Medical School. He reports using human exome enrichment methods on chimpanzee and rhesus macaque samples. The chimpanzee sample showed similar coverage levels and distributions following exome capture as the human sample, whereas the rhesus macaque sample showed significant coverage in protein-coding sequence but much less in untranslated regions. In both cases, the primates showed significant numbers of frameshift mutations compared to self-genomes. Vallender used DNAnexus for “initial data analysis, including alignment to genome, coverage analysis, and Nucleotide-Level Variation analysis,” according to the paper.

Next up, we have three papers from scientists at the Cleveland Clinic focusing on myeloid disorders. In a paper from the journal Leukemia, researchers screened samples from patients with RARS (Refractory Anemia with Ring Sideroblasts) and RARS-T (RARS with thrombocytosis), two distinct subtypes of MDS (myelodysplastic syndromes) and MDS/myeloproliferative neoplasms (MDSs/MPNs). They used Exome and Nucleotide-Level Variation analyses to identify variants associated with the conditions, finding somatic mutations in SF3B1, a gene located on chromosome 2q, in several of the patients with RARS or RARS-T. Another paper, this one published in Blood, used DNAnexus to perform mapping, variant analysis, and RNA-seq in a project that used exome sequencing of 15 patients with myeloid neoplasms to find somatic mutations. Mutations were found that affect spliceosomal genes, resulting in defective splicing and suggesting a new leukemogenic pathway. The mutations may serve as useful diagnostic markers, or potentially even therapeutic targets. In the third paper, also in Blood, they used SNP chips, gene expression arrays, and next-gen sequencing to look at loss of heterozygosity affecting chromosome 7q, which is common in AML and MDSs. Using direct and parallel sequencing, they found no recurrent mutations in typically large deletion 7q and monosomy 7 patients, but they did find decreased expression of genes included in SNP-A defined minimally deleted regions.

A Nature paper from Stanford scientists Rada-Iglesias et al. employed ChIP-seq and RNA-seq analyses of human embryonic stem cells to find unique chromatin signatures that identified two distinct classes of genomic elements. The study also identified more than 2,000 putative regulatory sequences, providing an invaluable resource for lineage tracking and isolation of transient cell populations representing early steps of human development. In a second paper, the same group identified a new member of the mESC (mouse embryonic stem cell) transcriptional network, Prdm14, which plays a dual role as a context-dependent transcriptional repressor or activator.

Scientists at the Hadassah-Hebrew University Medical Center in Israel have made use of DNAnexus in four of their recently published papers that used our Exome and Nucleotide-level Variation analysis to look at polymorphic changes associated with various clinical conditions. The first of these came out in Molecular Genetics and Metabolism, providing a study of three siblings with ventriculomegaly at early gestation. The group used linkage analysis and exome sequencing to identify a hemizygous mutation in the mitochondrial X-linked AIFM1 gene, which encodes Apoptosis-Inducing Factor (AIF), a 613-amino-acid flavoprotein. In the Annals of Neurology, the team reports homozygosity mapping followed by exome sequencing to find a deleterious mutation in the DST gene in four infants with a new lethal autonomic sensory neuropathy. In PLoS ONE, they studied two patients with juvenile parkinsonism and used homozygosity mapping and whole exome sequencing to identify a deleterious mutation in DNAJC6, which encodes the HSP40 Auxilin, a protein selectively expressed in neurons. The paper underscores a key role of the endocytic/lysosomal pathway in the pathogenesis of Parkinson disease and other forms of parkinsonism. In their most recent paper, the group used homozygosity mapping and exome sequencing to study the molecular basis of childhood familial chronic Coombs’ negative hemolysis and relapsing polyneuropathy in infants of North-African Jewish origin from four unrelated families. A homozygous missense mutation, p.Cys89Tyr in CD59, was identified in all the patients. The mutated protein was expressed at lower levels and failed to localize properly on the cell surface.

It’s really rewarding to see that DNAnexus is making a difference for scientists. We’ll continue to keep an eye on the literature and offer updates as other publications using DNAnexus are released.

New ENCODE Paper Reveals Remarkable Chromatin Diversity at Regulatory Elements


Today marks a major milestone for the ENCODE consortium! More than 30 papers will be published today in Nature, Genome Biology, and Genome Research from teams of scientists working on various facets of the project.


One of those, a publication in Genome Research, reports on a surprising level of heterogeneity among patterns of chromatin modifications as well as nucleosome positioning around regulatory elements such as transcription factor binding sites in the human genome. In the past, these genomic elements have been studied primarily by averaging patterns of chromatin marks across populations of sites, leading to the perception that patterns were much more uniform. Mapping and analysis of the nucleosome positioning sequence data were performed on the DNAnexus platform.


Lead author Anshul Kundaje was a postdoc for Serafim Batzoglou and Arend Sidow at Stanford University during the project reported in the paper. Now a research scientist at MIT, Kundaje says the work was an integral part of the ENCODE consortium’s efforts to elucidate functional elements in the human genome. The scientists looked at 119 human transcription factors and regulatory proteins to better understand how nucleosomes are positioned and how histone modifications are made around binding sites. In the paper, the authors report that asymmetry of nucleosome positioning and histone modifications is the rule, rather than the exception.


Kundaje and his colleagues relied on ChIP-seq data for the 119 transcription factors in a variety of cell types, with corresponding data for histone modifications. They also generated similar data for nucleosome positioning. To improve accuracy, the team sequenced extremely deeply, ultimately generating some 5 billion reads on the SOLiD sequencing platform. “The data sets were incredibly massive,” Kundaje says. “Processing these data sets locally was quite a challenge.” The group turned to DNAnexus, uploading their sequence files to the cloud and preprocessing the data with the company’s probabilistic mapping tool. “DNAnexus made that process incredibly simple,” he adds.


Figure 1: The mapping of the 5 billion reads was performed using the DNAnexus mapper.

Using a new tool they developed for pattern discovery, the Clustered AGgregation Tool (CAGT), the scientists found that nucleosome positioning and histone modification at transcription factor binding sites are far more diverse than was previously thought. Rather than averaging across the regions as most studies have done, the team used the new clustering tool to analyze the differences in magnitude, shape, and orientation of the many patterns identified.
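To make the contrast between averaging and clustering concrete, here is a toy sketch in Python. It is not CAGT itself: the synthetic signal profiles, the function names, and the plain k-means step are all assumptions chosen for illustration, and the real tool's handling of magnitude, shape, and orientation is not reproduced here.

```python
from statistics import mean


def average_profile(profiles):
    """Position-wise mean across profiles (the classic aggregate plot)."""
    return [mean(column) for column in zip(*profiles)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=10):
    """Tiny Lloyd-style k-means, seeded from evenly spaced profiles.

    Illustrative stand-in for a clustering step, not the CAGT algorithm.
    """
    step = max(1, len(profiles) // k)
    centroids = [list(profiles[i * step]) for i in range(k)]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assign each profile to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:
                centroids[c] = average_profile(members)
    return labels, centroids


# Hypothetical signal around six binding sites (center = index 3).
# Three sites carry the mark on the left, three on the right.
left_sided = [
    [5, 9, 5, 1, 0, 0, 0],
    [4, 8, 4, 1, 0, 0, 0],
    [6, 10, 6, 1, 0, 0, 0],
]
right_sided = [row[::-1] for row in left_sided]
profiles = left_sided + right_sided

overall = average_profile(profiles)
labels, centroids = kmeans(profiles, k=2)

print("average over all sites:", overall)
for index, centroid in enumerate(centroids):
    print(f"cluster {index} centroid:", centroid)
```

In this made-up data set, the overall average is symmetric around the binding site, while the two cluster centroids are mirror images of each other. The asymmetry is visible only once the sites are grouped rather than pooled.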


“What we found is that the results you get from the clustering approach are dramatically different from what you get by simply averaging across all types,” Kundaje says. “We found a large diversity of patterns of histone modifications as well as nucleosome positioning around almost every transcription factor binding site.”


Even the well-known and remarkably well-studied transcription factor CTCF, long established as an insulator, was found to have surrounding chromatin patterns pointing to other functions throughout the genome.


Figure 2: Analysis using CAGT reveals the surprising diversity of patterns of an active chromatin mark H3K27ac around the binding sites of the CTCF protein that is well-known for its repressive insulator role.


The authors used their clustering tool to group the patterns into some 25 distinct signatures “that completely capture the diversity of all the modifications across all binding sites in a variety of cell types,” Kundaje says. The method uses ‘metapatterns’ to explain that diversity, and that information can reveal the function of these elements in context. “By accounting for combinatorial relationships between various binding events and how they affect chromatin, this gives you a more complete biological sense of what a transcription factor is doing in a cell type,” he adds.


Kundaje is already following up on this study by looking at other species to see whether the heterogeneity of modification patterns holds true in other organisms. He continues to use DNAnexus for analysis of sequencing data, especially in read mapping, quality control, and genome browsing, he says.


Using DNAnexus for the team’s ENCODE study “made the process significantly easier,” Kundaje adds, noting that the cloud provider’s direct integration of the genome browser was particularly helpful. DNAnexus allowed Kundaje and his colleagues to go from data to visualization with minimal processing steps in between, he says. “It frees up your time to focus on the more interesting work.”


For a glimpse of some of Kundaje’s data, DNAnexus has made the 20 samples available on the platform in the Public Data folder called Encode. Sign up for a free account to explore them.


Check out the Kundaje et al. paper “Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements.”