Apache Spark’s machine learning (ML) pipelines provide a lot of power, but sometimes the tools you need for your specific problem aren’t available yet. This talk introduces Spark’s ML pipelines and then looks at how to extend them with your own custom algorithms. By integrating your own data preparation and machine learning tools into Spark’s ML pipelines, you can take advantage of useful meta-algorithms, like parameter search and pipeline persistence (with a bit of extra work, of course).
Even if you don’t have machine learning algorithms of your own to implement, this session will give you an inside look at how the ML APIs are built, and it will help you build even more awesome ML pipelines and customize Spark models for your needs. And even if you never extend Spark ML pipelines with custom algorithms, you’ll still come away with a stronger background for future Spark ML projects.
The examples in this talk will be presented in Scala, but any non-standard syntax will be explained.
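For a taste of those Scala examples, here is a minimal sketch of the kind of custom pipeline stage the talk covers. The class and column names (UpperCaseTransformer, text, upperText) are hypothetical, chosen purely for illustration; a production stage would also expose its column names as Params:

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, upper}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical custom stage: upper-cases a hard-coded "text" column.
// This is the minimal skeleton Spark ML requires of a Transformer.
class UpperCaseTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("upperCase"))

  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.withColumn("upperText", upper(col("text")))

  override def transformSchema(schema: StructType): StructType =
    // Declare the output column so downstream stages can validate early.
    StructType(schema.fields :+ StructField("upperText", StringType, nullable = true))

  override def copy(extra: ParamMap): UpperCaseTransformer = defaultCopy(extra)
}
```

Because it implements the standard Transformer contract, such a stage can be dropped into a Pipeline alongside built-in stages, where meta-algorithms like CrossValidator can run a parameter search over the whole pipeline:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// The custom stage participates in the pipeline like any built-in one.
val upperCaser = new UpperCaseTransformer()
val tokenizer  = new Tokenizer().setInputCol("upperText").setOutputCol("words")
val hashingTF  = new HashingTF().setInputCol("words").setOutputCol("features")
val lr         = new LogisticRegression()

val pipeline = new Pipeline()
  .setStages(Array(upperCaser, tokenizer, hashingTF, lr))

// Parameter search sweeps over any stage's Params.
val paramGrid = new ParamGridBuilder()
  .addGrid(hashingTF.numFeatures, Array(1 << 10, 1 << 14))
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// Pipeline persistence would additionally require the custom stage to
// implement MLWritable -- the "bit of extra work" mentioned above.
// val model = cv.fit(training)  // training: DataFrame with "text" and "label" columns
```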