Scaling Genetic Data Analysis with Apache Spark

In 2001, it cost ~$100M to sequence a single human genome. In 2014, due to dramatic improvements in sequencing technology far outpacing Moore’s law, we entered the era of the $1,000 genome. At the same time, the power of genetics to impact medicine has become evident. For example, drugs with supporting genetic evidence are twice as likely to succeed in clinical trials. These factors have led to an explosion in the volume of genetic data, in the face of which existing analysis tools are breaking down.

As a result, the Broad Institute began the open-source Hail project (, a scalable platform built on Apache Spark, to enable the worldwide genetics community to build, share and apply new tools. Hail is focused on variant-level (post-read) data; querying genetic data, as well as annotations, on variants and samples; and performing rare and common variant association analyses. Hail has already been used to analyze datasets with hundreds of thousands of exomes and tens of thousands of whole genomes, enabling dozens of major research projects.

Session hashtag: #SFr1

Jonathan Bloom, Co-Founder, Hail Team at Broad Institute of MIT and Harvard

About Jonathan

Jonathan Bloom is a mathematician, engineer, and co-founder of the Hail team at the Broad Institute of MIT and Harvard. Prior to joining the Broad, he did research in geometry and algebraic topology as a Moore Instructor and NSF Fellow in Mathematics at the Massachusetts Institute of Technology. While there, he re-architected the department’s introductory course on probability and statistics, now available on MIT OpenCourseWare. He received his B.A. from Harvard University and Ph.D. from Columbia University in Mathematics.

Timothy Poterba, Engineer and Computational Biologist at Broad Institute of MIT and Harvard

About Timothy

Tim Poterba is an engineer and computational biologist on the Hail team at the Broad Institute of MIT and Harvard. Prior to joining the Broad, he studied protein folding dynamics at the Max Planck Institute for Biochemistry on a Fulbright Scholarship. He received his B.A. in Biophysics from Amherst College in 2013.