All benchmarks reported here were run on an Intel i7-7820X CPU. GPU benchmarks were run on an NVIDIA A6000.
The benchmark_spark.py script compares the AlternatingLeastSquares model in implicit to the implementation in Spark MLlib.
To run this comparison, you should first compile Spark with native BLAS support.
This benchmark compares the Conjugate Gradient solver in implicit, on both the CPU and the GPU, to the Cholesky solver used in Spark.
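For readers unfamiliar with the Conjugate Gradient method, the following is a generic textbook sketch in plain Python. It is not the code used in implicit or Spark; the function name `conjugate_gradient` and its parameters are chosen here for illustration only. It shows why the method suits this setting: each step needs only matrix-vector products, and the loop can be stopped after a few steps to get an approximate solution, whereas a Cholesky solve factorizes the whole matrix.

```python
def conjugate_gradient(A, b, iterations=3, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A.

    A is a list of rows, b is a list of floats. Illustrative only.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x starting at zero
    p = list(r)          # initial search direction
    rs_old = sum(v * v for v in r)
    for _ in range(iterations):
        if rs_old < tol:  # residual is already negligible
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Small symmetric positive-definite system; the exact solution is [1/11, 7/11].
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For an n-by-n system the method reaches the exact solution in at most n steps in exact arithmetic, so the 2-by-2 example above converges after two steps.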
The reported times per iteration are averages over 5 iterations.
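One simple way to obtain such an average is sketched below. This is an assumption about the general approach, not the code in benchmark_spark.py; `average_iteration_time` and `run_iteration` are hypothetical names standing in for whatever performs a single training iteration.

```python
import time


def average_iteration_time(run_iteration, iterations=5):
    """Return the mean wall-clock seconds per call of run_iteration."""
    start = time.perf_counter()
    for _ in range(iterations):
        run_iteration()  # hypothetical callable: one training iteration
    return (time.perf_counter() - start) / iterations
```

Timing the whole loop and dividing by the iteration count keeps timer overhead out of the individual measurements, at the cost of hiding per-iteration variance.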
For the last.fm dataset at 256 factors, implicit on the CPU is 30x faster than Spark and the GPU version of implicit is 260x faster than Spark:
For the ml20m dataset at 256 factors, implicit on the CPU is 23x faster than Spark and the GPU version is 180x faster than Spark:
Note that for all versions, this dataset was filtered down to positive reviews (4+ stars) to simulate a truly implicit dataset.
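The filtering step can be pictured with the minimal sketch below. It assumes the ratings are available as `(user, item, rating)` triples; the function name `keep_positive` and the sample data are made up for illustration. Replacing the star value with a constant weight of 1.0 is also an assumption here, not something the text above confirms.

```python
def keep_positive(ratings, threshold=4.0):
    """Keep only interactions rated at or above the threshold.

    The star value is dropped (an assumption): every kept pair becomes
    a plain interaction with weight 1.0.
    """
    return [
        (user, item, 1.0)
        for user, item, rating in ratings
        if rating >= threshold
    ]


# Made-up sample data: (user id, item id, star rating).
sample = [(0, 10, 5.0), (0, 11, 2.5), (1, 10, 4.0), (1, 12, 3.5)]
positive = keep_positive(sample)
```

With the sample above, only the 5.0 and 4.0 ratings survive, so the result records that a user liked an item without saying by how much.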