Nikita Shamgunov co-founded MemSQL and has served as CTO since its inception. Prior to co-founding the company, Nikita worked on core infrastructure systems at Facebook, and before that spent more than half a decade as a senior database engineer on Microsoft SQL Server. Nikita holds bachelor’s, master’s, and doctoral degrees in computer science, has been awarded several patents, and was a world medalist in ACM programming contests.
As the dangers of global climate change multiply, utility companies are seeking ways to reduce carbon emissions, such as integrating renewable and sustainable energy sources like wind, solar, and hydroelectric power. Renewable energy not only has the power to improve climate conditions but also encourages economic growth. By combining advances in sensor technology with machine learning algorithms and environmental data, utility companies can monitor energy sources in real time to make faster decisions and speed innovation. In this session, Nikita Shamgunov, CTO and co-founder of MemSQL, will conduct a live demonstration based on real-time data from 2 million sensors on 197,000 wind turbines installed on wind farms around the world. This Internet of Things (IoT) simulation explores how utility companies can integrate new data pipelines into established infrastructure. Attendees will learn how to deploy this breakthrough technology stack, composed of Apache Kafka, a real-time message queue; Streamliner, an integrated Apache Spark solution; MemSQL Ops, a cluster management and monitoring interface; and a set of simulated data producers written in Python. By applying machine learning to analyze millions of data points in real time, the data pipeline predicts and visualizes the health of wind farms at global scale. This architecture propels innovation in the energy industry and is replicable across other IoT applications, including smart cities, connected cars, and digital healthcare.
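To make the producer side of the pipeline concrete, here is a minimal sketch of what a simulated Python data producer might look like. The field names (`wind_speed_mps`, `rotor_rpm`, `gearbox_temp_c`) and value ranges are illustrative assumptions, not the actual schema from the demonstration; the Kafka publish step is shown only in a comment so the sketch runs standalone.

```python
import json
import random
import time

def turbine_reading(turbine_id):
    """Simulate one sensor reading from a wind turbine.

    Field names and ranges are illustrative assumptions,
    not the schema used in the live demonstration.
    """
    return {
        "turbine_id": turbine_id,
        "timestamp": time.time(),
        "wind_speed_mps": round(random.uniform(0.0, 25.0), 2),
        "rotor_rpm": round(random.uniform(0.0, 20.0), 2),
        "gearbox_temp_c": round(random.uniform(20.0, 90.0), 1),
    }

def produce_batch(num_turbines):
    """Serialize one reading per turbine, as it would be sent to Kafka."""
    return [json.dumps(turbine_reading(i)).encode("utf-8")
            for i in range(num_turbines)]

# In the real pipeline these messages would be published to a Kafka
# topic, e.g. with the kafka-python client (hypothetical topic name):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   for msg in produce_batch(197_000):
#       producer.send("turbine-readings", msg)

if __name__ == "__main__":
    for msg in produce_batch(5):
        print(msg.decode())
```

From Kafka, Streamliner would consume these messages, apply Spark transformations, and load the results into MemSQL for querying and visualization.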
Going real-time is the next phase for big data, and streaming remains a primary mechanism to get there. Spark provides groundbreaking capabilities for handling real-time data, including streaming and transformations. And retaining both real-time and historical data provides the most accurate foundation for predictive analytics and machine learning. In this session, I will outline how to architect real-time data pipelines with the power of Apache Spark and a robust, distributed in-memory database. In particular, I will detail how some of the world’s largest companies are running business-critical applications using Spark. Attendees will dive deep into the mechanics of real-time pipelines, the ability to durably store data, and how to instantly derive insights from billions of data points.
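One core mechanic of such pipelines is windowed aggregation over an unbounded stream. The toy sketch below shows the idea with a plain-Python rolling average; in the session this role is played by Spark's streaming engine at far larger scale, and the class name and anomaly threshold here are illustrative assumptions, not part of any Spark API.

```python
from collections import deque

class SlidingWindowAverage:
    """Rolling average over the last `size` readings -- a toy stand-in
    for the windowed aggregations a streaming engine performs at scale."""

    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def update(self, value):
        # Add the new reading and evict the oldest once the window is full.
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

avg = SlidingWindowAverage(size=3)
readings = [10.0, 12.0, 14.0, 40.0]   # the last value simulates an anomaly
smoothed = [avg.update(r) for r in readings]
# A health check might flag a sensor when a new reading deviates
# sharply from the smoothed baseline (threshold chosen arbitrarily).
```

Because each update touches only the entering and leaving values, the cost per event is constant, which is what makes this pattern viable across billions of data points.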