Training: Advanced: Exploring Wikipedia with Spark


The real power and value proposition of Apache Spark lies in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. In class we will explore various Wikipedia datasets while applying the ideal programming paradigm for each analysis. The class will consist of about 50% lecture and 50% hands-on labs and demos.

About Sameer

Sameer Farooqui is a Client Services Engineer at Databricks where he focuses on training and curriculum development. Prior to that, he was a freelance big data + NoSQL consultant and trainer. Before freelancing, Sameer was a Systems Architect at Hortonworks, an Emerging Data Platforms Consultant at Accenture R&D and an Enterprise Solutions Specialist at Symantec (VERITAS division).