Evan Sparks

PhD Student, UC Berkeley AMPLab

Evan Sparks is a PhD student in the Computer Science Division at UC Berkeley. His research focuses on the design and implementation of distributed systems for large-scale data analysis and machine learning. Prior to Berkeley, he spent several years in industry tackling large-scale data problems as a Quantitative Financial Analyst at MDT Advisers and as a Product Engineer at Recorded Future. He holds a bachelor’s degree from Dartmouth College.


Building Large Scale Machine Learning Applications with Pipelines

Real-world machine learning applications typically consist of many components in a data processing pipeline. For example, in text classification, preprocessing steps like n-gram extraction and TF-IDF feature weighting are often necessary before training of…
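
The two preprocessing steps named in the abstract can be sketched as a small chain of functions. This is a minimal illustration in plain Python, not the speaker's system: the function names (`ngrams`, `featurize`, `tfidf`, `pipeline`), the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are all assumptions made for the example, and the sketch stops before any model training because the abstract is cut off at that point.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def featurize(doc, max_n=2):
    """Lowercase, split on whitespace, and count every 1..max_n-gram."""
    tokens = doc.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def tfidf(count_vectors):
    """Scale raw counts by an unsmoothed inverse document frequency."""
    num_docs = len(count_vectors)
    doc_freq = Counter()
    for counts in count_vectors:
        # Each document contributes at most 1 to a term's document frequency.
        doc_freq.update(counts.keys())
    return [
        {term: tf * math.log(num_docs / doc_freq[term])
         for term, tf in counts.items()}
        for counts in count_vectors
    ]


def pipeline(docs):
    """Chain the stages: n-gram extraction, then TF-IDF weighting."""
    return tfidf([featurize(doc) for doc in docs])


# Hypothetical two-document corpus, for illustration only.
docs = ["spark makes pipelines fast", "pipelines chain simple stages"]
weighted = pipeline(docs)
```

Each stage takes the previous stage's output as its input, so a stage can be swapped out (for example, a different tokenizer or a smoothed IDF) without touching the others. With this weighting, a term that occurs in every document, such as `pipelines` here, receives weight zero.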