Data integration is a really difficult problem. We know this because 80% of the time in every project is spent getting the data you want the way you want it. We know this because this problem remains challenging despite 40 years of attempts to solve it. All we want is a service that will be reliable, handle all kinds of data and integrate with all kinds of systems, be easy to manage and scale as our systems grow. Oh, and it should be super low latency too. Is it too much to ask?
In this presentation, we’ll discuss the basic challenges of data integration and introduce few design and architecture patterns that are used to tackle these challenges. We will then explore how these patterns can be implemented using Apache Kafka. Difficult problems are difficult and we offer no silver bullets, but we will share pragmatic solutions that helped many organizations build fast, scalable and manageable data pipelines.
Gwen is a product manager at Confluent. She has 15 years of experience working with code and customers to build scalable data architectures, integrating relational and big data technologies. Gwen is the author of “Kafka – The Definitive Guide” and “Hadoop Application Architectures”, and a frequent presenter at industry conferences. Gwen is a PMC member on the Apache Kafka project and committer on Apache Sqoop. When Gwen isn’t building data pipelines or thinking up new features, you can find her pedaling on her bike exploring the roads and trails of California, and beyond.