Spark Summit 2014 brought the Apache Spark community together from June 30 to July 2, 2014, at The Westin St. Francis in San Francisco. It featured production users of Spark, Shark, Spark Streaming and related projects.
F-Secure is an online security and privacy company from Finland. We would like to share our experiences with Spark: why we chose Spark in the first place, how we have been using it and what kinds of lessons we have learnt. F-Secure has a long history of categorising and classifying files and websites, mainly from a data security perspective. Over the years we have developed our own toolset and style of data mining, while continually refining our vision of what our data mining platform should look like. We have been following the Spark project with interest for some time, as it has been evolving in a direction that is very promising from our point of view. Spark has given us an effective interactive toolset for combining data from different sources: our own databases and the various data streams we receive. This enables fast experimentation with the data so that we can spot and quickly respond to emerging changes, for example by predicting the classifications of certain clusters of the web. With Spark and MLlib we aim to consolidate multiple custom tools onto a common, generic framework. This will give us better resource utilisation and more agile implementation of new functionality to meet a continuously changing business landscape.
Perttu Ranta-aho: Works as a software engineer at F-Secure Labs and has been leading our Spark adoption.
Ville Lindfors: Has been working as an architect at F-Secure Labs, responsible for creating and updating our data mining vision.