Apache Spark is a dynamic execution engine that can take relatively simple Scala code and create complex, optimized execution plans. In this talk, we will describe how user code translates into Spark drivers, executors, stages, tasks, transformations, and shuffles. We will also discuss various sources of information on how Spark applications use hardware resources, and show how application developers can use this information to write more efficient code. We will show how Pepperdata’s products can clearly identify this resource usage and tie it to specific lines of code, and how Spark application owners can quickly identify the root causes of common problems such as job slowdowns, inadequate memory configuration, and Java garbage collection issues.
Vinod Nair leads product management at Pepperdata. He brings more than 20 years of experience in engineering and product management to the job, with a special interest in distributed systems and Hadoop. He has worked in software for telecommunications, financial management for small business, and big data. Vinod’s approach to product management is deeply influenced by his success in applying Lean Startup principles and rapid iteration to product design and development.