SESSION

Using SparkML to Power a DSaaS (Data Science as a Service)


Almost every organization now needs data science, and once an algorithm has been chosen, the main challenge is scaling it up and making it operational. Comcast uses several tools and technologies, such as Python, R, SAS and H2O. In this session, the speakers will show how many common use cases rely on the same handful of algorithms, such as Logistic Regression, Random Forest, Decision Trees, Clustering and NLP.

Apache Spark has several machine learning algorithms built in and scales well. Comcast therefore built a DSaaS platform on top of Spark, with a REST API for controlling and submitting jobs. This shields most users from writing repetitive code and lets them focus on the actual requirements.
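The session does not publish the platform's API, so the following is only a minimal Python sketch of the idea of submitting a job as data rather than code. It assumes a hypothetical JSON request body with algorithm, input, features, label and params fields; those field names, the JobSpec class and the parse_job_request function are illustrative inventions, not Comcast's actual interface.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical set of algorithm names a DSaaS endpoint might accept.
SUPPORTED = {"logistic_regression", "random_forest", "decision_tree", "kmeans"}
# Clustering is unsupervised, so it does not need a label column.
UNSUPERVISED = {"kmeans"}


@dataclass
class JobSpec:
    """A validated job request; field names are illustrative, not Comcast's API."""
    algorithm: str
    input_path: str
    features: List[str]
    label: Optional[str] = None
    params: dict = field(default_factory=dict)


def parse_job_request(body: str) -> JobSpec:
    """Parse the JSON body of a hypothetical job-submission request and validate it."""
    data = json.loads(body)
    algorithm = data.get("algorithm")
    if algorithm not in SUPPORTED:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    if "input" not in data:
        raise ValueError("missing 'input' dataset path")
    features = data.get("features") or []
    if not features:
        raise ValueError("at least one feature column is required")
    label = data.get("label")
    # Supervised algorithms cannot be trained without a label column.
    if algorithm not in UNSUPERVISED and label is None:
        raise ValueError(f"{algorithm} is supervised and needs a 'label' column")
    return JobSpec(algorithm, data["input"], list(features), label,
                   dict(data.get("params", {})))
```

With this shape, a user could request a model with a body such as {"algorithm": "random_forest", "input": "/data/example", "features": ["tenure", "monthly_spend"], "label": "churned"} (all values hypothetical), and the service, not the user, would be responsible for the Spark code. Rejecting malformed requests at submission time also avoids spending cluster resources on jobs that would fail later.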

Learn how they solved some of the problems of establishing feature vectors, choosing algorithms and deploying models into production. They'll also show how they use Scala, R and Python so that models can be implemented in the language of choice yet still be deployed quickly into production on 500-node Spark clusters.
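To illustrate establishing feature vectors and choosing an algorithm, here is a hedged Python sketch of how a service might translate a request into an ordered list of Spark ML pipeline stages. The stage classes (StringIndexer, VectorAssembler, LogisticRegression, RandomForestClassifier, DecisionTreeClassifier, KMeans) and their parameters are real pyspark.ml names, but the plan_pipeline and build_pipeline helpers, the mapping from algorithm names to classes and the "_idx" column naming are hypothetical, not the speakers' implementation.

```python
import importlib
from typing import Dict, List, Optional, Tuple

# Real pyspark.ml estimator classes; the name-to-class mapping is hypothetical.
ESTIMATORS = {
    "logistic_regression": "pyspark.ml.classification.LogisticRegression",
    "random_forest": "pyspark.ml.classification.RandomForestClassifier",
    "decision_tree": "pyspark.ml.classification.DecisionTreeClassifier",
    "kmeans": "pyspark.ml.clustering.KMeans",
}

# A stage is (fully qualified class name, constructor keyword arguments).
Stage = Tuple[str, Dict[str, object]]


def plan_pipeline(algorithm: str, numeric: List[str], categorical: List[str],
                  label: Optional[str] = None) -> List[Stage]:
    """Describe the pipeline as plain data; raises KeyError for unknown algorithms."""
    stages: List[Stage] = []
    indexed = []
    # Map each categorical string column to a numeric index column.
    for col in categorical:
        out = f"{col}_idx"
        stages.append(("pyspark.ml.feature.StringIndexer",
                       {"inputCol": col, "outputCol": out, "handleInvalid": "keep"}))
        indexed.append(out)
    # Combine numeric and indexed columns into a single feature vector.
    stages.append(("pyspark.ml.feature.VectorAssembler",
                   {"inputCols": numeric + indexed, "outputCol": "features"}))
    # Choose the estimator; only supervised jobs get a label column.
    est_params: Dict[str, object] = {"featuresCol": "features"}
    if label is not None:
        est_params["labelCol"] = label
    stages.append((ESTIMATORS[algorithm], est_params))
    return stages


def build_pipeline(stages: List[Stage]):
    """Instantiate the planned stages; needs pyspark and an active SparkSession."""
    from pyspark.ml import Pipeline
    objs = []
    for path, params in stages:
        module_name, cls_name = path.rsplit(".", 1)
        cls = getattr(importlib.import_module(module_name), cls_name)
        objs.append(cls(**params))
    return Pipeline(stages=objs)
```

Keeping the plan as plain data makes it easy to log, validate or return from a REST call before any Spark job starts; only build_pipeline touches Spark. The indexed categorical columns go straight into the assembler, which suits tree-based models; a linear model such as logistic regression would usually also need a one-hot encoding stage.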

Session hashtag: #SFeco19

Kiran Muglurmath, Executive Director at Comcast

About Kiran

Kiran Muglurmath is the Executive Director of Big Data Analytics at Comcast, where he manages a team of data scientists and big data engineers for machine learning, data mining and predictive analytics. Prior to Comcast, Kiran was a consulting big data platform architect and data scientist at T-Mobile and Boeing. He holds an MBA from the Kellogg School at Northwestern University, and a Computer Science degree from Bangalore University.

Sridhar Alla, Director of Big Data Solutions at Comcast

About Sridhar

Sridhar Alla is the Director of Big Data Solutions and Architecture at Comcast, where he has delivered several key solutions, such as the Xfinity personalization platform, ClickthruAnalytics and the Correlation platform. Sridhar started his career working on network appliances, focusing on NAS and caching technologies. He also served as CTO of the security company eIQNetworks, where he merged the concepts of big data and security products. He holds patents on very large-scale processing algorithms and caching.