The Spark compute engine is bringing real-time analytics to Hadoop. Today, interest in Spark is concentrated among analytic developers and the vendor community; enterprises naturally wonder what the fuss is all about, and how (and whether) Spark will affect their analytics. The speed bump, for the moment, is that few in the Spark technology community are speaking to the enterprise and making the case for how Spark will change analytics and impact the business. For enterprises, the hot spot for Spark is likely to be operational analytics, thanks to its high performance. This has clear benefits for processing sensor data from the Internet of Things (IoT), performing real-time fraud or threat detection, modeling consumer behavior (and segmenting the customer base), and data wrangling/data transformation. Spark’s extensibility allows organizations to combine real-time and historical analytics, making Lambda architectures real. Its API-based interfaces are also well suited to extending the power of third-party analytic tools that support pushdown processing to remote data sources, enabling them to crunch much more complex models involving larger sets of data, compared with running MapReduce in a Hadoop cluster, data mining, SQL querying in a data warehouse, or even MPC on specialized compute grids. For enterprises, Spark will be the engine that takes R- and Python-based programmatic analytics mainstream. In the long run, however, Spark’s biggest impact on the enterprise will come via the embedded route, inside packaged analytics tools and enterprise applications that incorporate operational analytics as part of their core functionality. For vendors, the time to hop on the Spark bandwagon is now.
Tony Baer leads Ovum’s Big Data research area. Over his 25 years in the industry, he has studied issues of data integration, software and data architecture, middleware, and application development. Having tracked the emergence of BI and data warehousing in the 1990s, Baer sees parallels emerging in the world of Big Data today. His coverage focuses on how Big Data must become a first-class citizen in the data center, the IT organization, and the business. Baer has a multidisciplinary background touching the different tiers of enterprise software.