This talk will showcase work done by the bioinformatics team at CSIRO in Sydney, Australia to make Spark more useful and usable for the bioinformatics community. They have created a custom library, variant-spark, which provides a DSL and also a custom implementation of Spark ML via random forests for genomic pipeline processing. We’ve created a demo, using their ‘Hipster-genome’ and a Databricks notebook to better explain their library to the world-wide bioinformatics community. This notebooks compares results with another popular genomics library (HAIL.io) as well.
Session hashtag: #EUeco6
Piotr Szul is a Senior Engineer in CSIRO Data61. He holds a MSc degree in Computer Science and has over fifteen year of experience in developing various types of commercial and scientific software applications.
Since his joining of CSIRO in 2012 Mr Szul has been involved in a number of project of developing research platforms and applying emerging big data.