High Resolution Energy Modeling that Scales with Apache Spark 2.0

As advanced sensor technologies become widely deployed in the energy industry, the availability of higher-frequency data brings both analytical benefits and computational costs. To an energy forecaster or data scientist, the benefits might include better predictive performance from forecasting models and improved recognition of energy consumption patterns across building types, economic sectors, and geographies. To a utility or electricity service provider, they might include significantly deeper insights into a diverse customer base. However, these advantages can come with a high computational price tag. With Spark 2.0, User-Defined Functions can be applied across grouped SparkDataFrames in the SparkR API to solve the multivariate optimization and model selection problems typically required for fitting site-level models. This recently added feature of Spark 2.0 on Databricks has allowed DNV GL to efficiently fit predictive models that relate weather to electricity, water, and gas consumption across virtually any number of buildings.
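The grouped User-Defined Function pattern described above can be sketched as a split-apply-combine step: partition the records by site, fit one model per partition, and collect the results. In SparkR 2.0 this pattern is exposed through `gapply`; the sketch below is not SparkR and is not the models used in the talk. It is a minimal, single-machine Python illustration that assumes a hypothetical record layout (`site`, `temp`, `kwh`) and substitutes a one-variable least-squares line for the multivariate site-level models the abstract mentions.

```python
from collections import defaultdict
from statistics import mean


def fit_line(rows):
    """Least-squares fit of kwh = intercept + slope * temp for one site.

    This plays the role of the user-defined function applied to each group.
    The field names `temp` and `kwh` are assumptions made for illustration.
    """
    temps = [r["temp"] for r in rows]
    usage = [r["kwh"] for r in rows]
    t_bar, u_bar = mean(temps), mean(usage)
    sxx = sum((t - t_bar) ** 2 for t in temps)
    if sxx == 0:
        # A slope cannot be estimated when temperature never varies.
        raise ValueError("temperature is constant within this group")
    sxy = sum((t - t_bar) * (u - u_bar) for t, u in zip(temps, usage))
    slope = sxy / sxx
    return {"intercept": u_bar - slope * t_bar, "slope": slope}


def fit_by_site(records):
    """Group records by the hypothetical `site` key and fit each group."""
    groups = defaultdict(list)
    for r in records:
        groups[r["site"]].append(r)
    # Each group is fitted independently, which is what lets a cluster
    # distribute the work across sites.
    return {site: fit_line(rows) for site, rows in groups.items()}


# Made-up readings for two sites, purely for illustration.
records = [
    {"site": "A", "temp": 10.0, "kwh": 30.0},
    {"site": "A", "temp": 20.0, "kwh": 50.0},
    {"site": "A", "temp": 30.0, "kwh": 70.0},
    {"site": "B", "temp": 10.0, "kwh": 15.0},
    {"site": "B", "temp": 20.0, "kwh": 20.0},
    {"site": "B", "temp": 30.0, "kwh": 25.0},
]

models = fit_by_site(records)
for site, params in sorted(models.items()):
    print(site, params)
```

Because no group depends on any other, the per-site fits can run in parallel, which is the property that allows this approach to scale with the number of buildings.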

Jonathan Farland, Sr. Data Scientist at DNV GL

About Jonathan

Jonathan Farland is a technical consultant for DNV GL Energy in the Policy, Advisory and Research group and serves as the lead data scientist on both quantitative and qualitative energy studies. Mr. Farland’s primary focus is on the development of electricity demand forecasting systems that are capable of predicting demand while accounting for emerging or disruptive technologies such as smart grids, storage, photovoltaic cells, and electric vehicles. Developing these predictive models often requires the collection of large amounts of data and information on electricity usage, as well as climatological and economic conditions. Mr. Farland uses R and Python while leveraging the Spark distributed computing framework to effectively deploy model estimation and statistical learning algorithms.