Training Continues: Advanced: Exploring Wikipedia with Spark


The real power and value proposition of Apache Spark lies in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. In class, we will explore various Wikipedia datasets while applying the ideal programming paradigm for each analysis. The class will consist of about 50% lecture and 50% hands-on labs and demos.