Speaker

Fokko Driesprong

Data Engineer, GoDataDriven

Fokko Driesprong is a data engineer at GoDataDriven. He most enjoys writing scalable code in functional languages (preferably Scala) and loves to play with big data processing platforms. Besides being a consultant, he contributes to open source projects, including Apache Spark, Apache Flink, Apache Airflow and Druid.

Sessions

Working with Skewed Data: The Iterative Broadcast

Skewed data is the enemy when joining tables using Spark. A join on a skewed key shuffles a large proportion of the data onto a few overloaded nodes, bottlenecking Spark’s parallelism and resulting in out-of-memory errors. The go-to…