In healthcare, DICOM is an international standard format for storing medical images such as MRI and CT scans. Each image carries embedded metadata alongside its pixel data. There is currently a tremendous amount of effort in healthcare to incorporate image analytics into clinical data analysis. Apache Spark is a natural framework for integrating these efforts.
This session presents an analytics workflow that uses Apache Spark to perform ETL on DICOM images and then applies eigendecomposition to derive meaningful insights from the pixel data. The workflow integrates DCM4CHE, a Java-based framework, with Apache Spark to parallelize the big data workload for fast processing. Users can extract features based on the metadata and run efficient clean, filter, and drill-down operations for preprocessing. See a demonstration of predictive analytics with visualization that uses the metadata to derive insights, such as the likelihood of a condition or the efficacy of an administered medication.
The speakers will also present performance benchmarks of this workflow on various datasets and cluster configurations to demonstrate the benefits of running this kind of analysis workflow on Apache Spark.
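As a rough illustration of the eigendecomposition step mentioned in the workflow, the sketch below computes the covariance matrix of a few flattened pixel vectors and approximates its leading eigenpair by power iteration. It is a single-machine, pure-Python stand-in, not the speakers' Spark and DCM4CHE pipeline: the sample values, the function names `covariance_matrix` and `leading_eigenpair`, and the choice of power iteration in place of a full decomposition are all assumptions made for the example.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of `rows`, where each row is one flattened image."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def mat_vec(matrix, v):
    """Multiply a square matrix (list of lists) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]


def leading_eigenpair(matrix, iterations=500, tol=1e-12):
    """Power iteration: approximate the largest eigenvalue and a unit eigenvector.

    Assumes the uniform start vector is not orthogonal to the leading
    eigenvector; a production version would use a random start.
    """
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    eigenvalue = 0.0
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # matrix maps v to zero; nothing more to learn
        v_next = [x / norm for x in w]
        # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
        mv = mat_vec(matrix, v_next)
        new_eigenvalue = sum(v_next[i] * mv[i] for i in range(d))
        converged = abs(new_eigenvalue - eigenvalue) < tol
        v, eigenvalue = v_next, new_eigenvalue
        if converged:
            break
    return eigenvalue, v


# Hypothetical data: three tiny "images", each flattened to two pixel values.
pixels = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = covariance_matrix(pixels)
top_value, top_vector = leading_eigenpair(cov)
print(top_value, top_vector)
```

In a distributed setting the covariance accumulation is the part that would typically be parallelized across the cluster, while the decomposition of the resulting matrix can run on the driver when the number of features is modest.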
Session hashtag: #SFds20
Anahita Bhiwandiwalla is a Senior Software Engineer at Intel for their Analytics & Artificial Intelligence Solutions Group. She is currently working on creating analytics solutions for large scale distributed data and solving challenges that arise as the data scales. She holds an M.S. in Computer Science from Columbia University with an emphasis on Machine Learning. Anahita’s main interests are in Machine Learning, Natural Language Processing, Speech Recognition and Data Mining. She has presented her work at various meet-ups, webinars and conferences.
Karthik Vadla is currently working as a Software Engineer in the Artificial Intelligence Products Group (AIPG) at Intel Corporation. His responsibilities include developing cloud tools to access Neon (Intel Nervana's Deep Learning Framework). He has also worked on Big Data Analytic libraries using Scala and Python on large distributed frameworks like Apache Spark on Cloudera Distribution of Hadoop (CDH). He holds a Master's Degree in Computer Science from Arizona State University. Karthik's interests are in Distributed Systems/Frameworks and Deep Learning.