Speaker

Nan Zhu, Software Engineer at Microsoft

Nan Zhu is a Software Engineer at Microsoft, where he works on serving Spark Streaming and Structured Streaming on Azure HDInsight. He is a contributor to Apache Spark (known as CodingCat) and serves as a committee member of the Distributed Machine Learning Community (DMLC) and Apache MXNet (incubating).

Sessions

Building a Unified Data Pipeline with Apache Spark and XGBoost

XGBoost (https://github.com/dmlc/xgboost) is a library designed and optimized for tree boosting. XGBoost attracts users from a broad range of organizations in both industry and academia, and more than half of the winning solutions in machine…

Building Continuous Application with Structured Streaming and Real-Time Data Source

One of the biggest challenges in data science is to build a continuous data application which delivers results rapidly and reliably. Spark Streaming offers a powerful solution for real-time data processing. However, the challenge remains…