Boosting Machine Learning with Redis Modules and Spark
Dvir Volk, Redis Labs, November 2016
2
Hello World
Redis: open source, the leading in-memory database
Redis Labs: the open source home and commercial provider of Redis, cloud and on-premise
Dvir Volk: Senior System Architect at Redis Labs; Redis user and contributor for ~6 years
@dvirsky
dvirvolk
3
A Brief Overview of Redis
● Started in 2009 by Salvatore Sanfilippo
● Mostly a one-man show
● Most popular KV store
● Notable Users:
○ Twitter, Netflix, Uber, Groupon, Twitch
○ Many, many more...
4
A Brief Overview of Redis
▪ Key => Data Structure server
▪ In-memory, disk-backed
▪ Optional cluster mode
▪ Embedded Lua scripting
▪ Single-threaded!
▪ Key features: Fast, Flexible, Simple
5
A Lego For Your Database
Key => one of the following value types:
▪ Strings/Blobs/Bitmaps: "I'm a Plain Text String!"
▪ Hash Tables (objects!): { A: “foo”, B: “bar”, C: “baz” }
▪ Linked Lists: [ A → B → C → D → E ]
▪ Sets: { A , B , C , D , E }
▪ Sorted Sets: { A: 0.1, B: 0.3, C: 100, D: 1337 }
▪ Geo Sets: { A: (51.5, 0.12), B: (32.1, 34.7) }
▪ HyperLogLog: 00110101 11001110 10101010
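For readers who think in terms of a general-purpose language, the value types on this slide can be loosely approximated with Python built-ins. This is only an analogy, not how Redis stores anything; the helper `range_by_score` is a hypothetical name, and Geo Sets and HyperLogLog are left out because they have no direct built-in counterpart.

```python
from collections import deque

# String / blob: a plain value stored under a key.
string_value = "I'm a Plain Text String!"

# Hash: field -> value mapping, similar to a small object.
hash_value = {"A": "foo", "B": "bar", "C": "baz"}

# List: ordered sequence with cheap push/pop at both ends.
list_value = deque(["A", "B", "C", "D", "E"])

# Set: unordered collection of unique members.
set_value = {"A", "B", "C", "D", "E"}

# Sorted set: member -> score, read back in score order.
sorted_set = {"A": 0.1, "B": 0.3, "C": 100, "D": 1337}


def range_by_score(zset, low, high):
    """Return members whose score lies in [low, high], ordered by score."""
    hits = [(score, member) for member, score in zset.items() if low <= score <= high]
    return [member for _score, member in sorted(hits)]


if __name__ == "__main__":
    print(range_by_score(sorted_set, 0.2, 200))
```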
6
Redis In Practice
▪ “Front End Database”
▪ Real Time Counters
▪ Ad Serving
▪ Message Queues
▪ Geo Database
▪ Time Series
▪ Cache
▪ Session State
▪ Etc
7
But Can Redis Do X?
Secondary Index?
Time Series?
Full Text Search?
Graph?
Machine Learning?
AutoComplete?
SQL?
8
So You Want a New Feature?
▪ Try a Lua script
▪ Convince @antirez
▪ Fork Redis
▪ Build Your Own Database!
9
Enter Redis Modules
▪ In development since March 2016
▪ Redis 4.0 RC out soon
▪ Several modules already exist
▪ Key paradigm shift for Redis
10
What Modules Actually Are
▪ Dynamic libraries loaded into Redis
▪ Written in C/C++
▪ Use a C ABI/API isolating Redis internals
▪ Near-zero latency access to data
New Capabilities:
▪ New Commands
▪ New Data Types
11
Obligatory Module Example
12
LEFTPAD Example
127.0.0.1:6379> MODULE LOAD "./example.so"
OK
127.0.0.1:6379> COMMAND INFO EXAMPLE.LEFTPAD
1) 1) "example.leftpad"
...
127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8
foo
127.0.0.1:6379> EXAMPLE.LEFTPAD "foo" 8 "_"
_____foo
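The padding behaviour in the transcript above can be mirrored in a few lines of Python. This is only a sketch of the semantics shown on the slide (pad on the left up to a target width); the function name `leftpad` is hypothetical, the default pad character is assumed to be a space, and none of this is the module's C implementation.

```python
def leftpad(value: str, width: int, pad: str = " ") -> str:
    """Pad `value` on the left with `pad` until it is `width` characters long."""
    if len(pad) != 1:
        raise ValueError("pad must be a single character")
    # str.rjust right-justifies the text, which is the same as padding on the left.
    # Values already at least `width` long are returned unchanged.
    return value.rjust(width, pad)


if __name__ == "__main__":
    # Mirrors: EXAMPLE.LEFTPAD "foo" 8 "_"
    print(leftpad("foo", 8, "_"))
```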
13
Real Module: RediSearch
▪ From-scratch search index over Redis
▪ Uses Strings for holding compressed index data
▪ Includes stemming, exact phrase match, etc.
▪ Fast Fuzzy Auto-complete
▪ Up to 5x faster than Elasticsearch / Solr
> FT.SEARCH "lcd tv" FILTER price 100 +inf
> FT.SUGGET "lcd" FUZZY
14
More Modules Out There
▪ Native JSON Support
▪ Time Series
▪ Secondary Indexing
▪ Encryption
▪ Bloom Filters
▪ Online Neural Network
▪ Many Many more...
15
Spark ML + Redis modules
16
Redis + Spark So Far
▪ Current connector:
- RDD abstraction
- SparkSQL
- Streaming Source
▪ ML is not addressed specifically
▪ Used for pre-computed results
▪ We felt that we can take it further
17
Addressing The ML Pain
▪ The missing piece of ML: Serving your model
- Not standardized
- Vendor-lock with cloud platforms
- Reliable services are hard to do
- If only we had a “database” for this!
- Well, maybe we do?
18
Why Modules for ML?
With modules we can:
▪ Define data structures for models
▪ Store training output as “hot model”
▪ Perform evaluation directly in Redis
▪ Easily integrate existing C/C++ libs
19
Spark + Modules = AWESOME
▪ Train ML model on Spark
▪ Save model to Redis and get:
- High availability
- Clustering
- Persistence
- Performance
- Client libraries
20
Spark-ML End-to-End Flow
[Diagram components: Data Loaded to Spark; Spark Training; Model saved to Parquet file; Batch Evaluation; Pre-computed results; Custom Server; ?; ClientApp]
21
Adding Redis Into The Mix
[Diagram components: Data Loaded to Spark; Spark Training; Any Training Platform; Redis-ML “Active Model”; ClientApp]
22
The Redis-ML Module
A Redis module providing:
▪ Tree Ensembles
▪ Linear Regression
▪ Logistic Regression
▪ Matrix + Vector Operations
▪ More to come...
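To make "evaluation directly in Redis" concrete for the two regression model types listed here: once training has produced an intercept and a coefficient vector, serving a prediction is a dot product, plus a sigmoid for the logistic case. The sketch below shows that arithmetic in plain Python. The function names are hypothetical and this is not the module's command set or code.

```python
import math
from typing import Sequence


def linear_predict(intercept: float, coefficients: Sequence[float],
                   features: Sequence[float]) -> float:
    """Linear regression: intercept + sum(coefficient_i * feature_i)."""
    if len(coefficients) != len(features):
        raise ValueError("coefficients and features must have the same length")
    return intercept + sum(c * x for c, x in zip(coefficients, features))


def logistic_predict(intercept: float, coefficients: Sequence[float],
                     features: Sequence[float]) -> float:
    """Logistic regression: sigmoid of the linear score, as a probability in [0, 1]."""
    z = linear_predict(intercept, coefficients, features)
    # Use the numerically stable form of the sigmoid for large negative scores.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    print(linear_predict(2.0, [3.0, 3.0, 3.0], [1.0, 1.0, 1.0]))
    print(logistic_predict(0.0, [1.0], [0.0]))
```

Storing only the intercept and coefficients is what makes such a model cheap to keep "hot" in memory.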
23
Example: Random Forest
24
Forest Data Type
▪ A collection of decision trees
▪ Supports classification & regression
▪ A splitter node can be:
- Categorical (e.g. day == “Sunday”)
- Numerical (e.g. age < 43)
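A minimal Python sketch of such a data type, assuming a forest is a list of trees, each tree is built from leaf, categorical-split and numerical-split nodes, and classification is decided by majority vote. All class and function names are hypothetical; the module's real in-memory layout and its voting rule are not described on the slide. A regression forest would average leaf values instead of voting.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: int  # predicted class label


@dataclass
class CategoricalSplit:
    feature: str
    category: str
    left: "Node"   # taken when features[feature] == category
    right: "Node"  # taken otherwise (including a missing feature)


@dataclass
class NumericalSplit:
    feature: str
    threshold: float
    left: "Node"   # taken when features[feature] < threshold
    right: "Node"  # taken otherwise


Node = Union[Leaf, CategoricalSplit, NumericalSplit]


def run_tree(node: Node, features: dict) -> int:
    """Walk from the root to a leaf, following one branch per splitter node."""
    while not isinstance(node, Leaf):
        if isinstance(node, CategoricalSplit):
            go_left = features.get(node.feature) == node.category
        else:
            # A missing numerical feature raises KeyError in this sketch.
            go_left = features[node.feature] < node.threshold
        node = node.left if go_left else node.right
    return node.value


def run_forest(trees: list, features: dict) -> int:
    """Classify by majority vote over all trees."""
    votes = Counter(run_tree(tree, features) for tree in trees)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    forest = [
        CategoricalSplit("day", "Sunday", Leaf(1), Leaf(0)),
        NumericalSplit("age", 43, Leaf(1), Leaf(0)),
        Leaf(0),
    ]
    print(run_forest(forest, {"day": "Sunday", "age": 30}))
```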
25
Decision Tree Example
The famous Titanic survival predictor
sex = male?
  no  → Survived
  yes → age > 9.5?
    yes → Died
    no  → sibsp > 2.5?
      yes → Died
      no  → Survived
*sibsp = siblings + spouses
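Read as nested conditions, the tree on this slide might look like the function below. It assumes the branch layout of the commonly published version of this tree: passengers who are not male survive, males older than 9.5 die, and younger males survive unless sibsp is above 2.5. The function name is hypothetical.

```python
def titanic_survived(sex: str, age: float, sibsp: int) -> bool:
    """Evaluate the three-split Titanic decision tree for one passenger."""
    if sex != "male":
        return True           # sex = male? no -> Survived
    if age > 9.5:
        return False          # age > 9.5? yes -> Died
    return not sibsp > 2.5    # sibsp > 2.5? yes -> Died, no -> Survived


if __name__ == "__main__":
    print(titanic_survived("male", 5, 1))
```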
26
Forest Data Type Example
> MODULE LOAD "./redis-ml.so"
OK
> ML.FOREST.ADD myforest 0 . CATEGORIC sex "male" .L LEAF 1 .R LEAF 0
OK
> ML.FOREST.RUN myforest sex:male
"1"
> ML.FOREST.RUN myforest sex:yes_please
"0"
27
Using Redis-ML With Spark
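scala> import org.apache.spark.ml.classification.RandomForestClassificationModel
scala> import redis.clients.jedis.Jedis
scala> import com.redislabs.client.redisml.MLClient
scala> import com.redislabs.provider.redis.ml.Forest
scala> // pipelineModel is an already fitted pipeline whose last stage is the random forest
scala> val rfModel = pipelineModel.stages.last.asInstanceOf[RandomForestClassificationModel]
scala> val f = new Forest(rfModel.trees)
scala> f.loadToRedis("forest-test", "localhost")
scala> val jedis = new Jedis("localhost")
scala> // makeInputString is a helper, not shown on the slide, that formats one input row
scala> jedis.getClient.sendCommand(MLClient.ModuleCommand.FOREST_RUN, "forest-test", makeInputString(0))
scala> jedis.getClient.getStatusCodeReply
res53: String = 1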
28
Benchmarking Redis-ML
Step                      | Spark + Parquet | Spark + Redis ML
Model Preparation + Save  | 3785ms          | 292ms
Model Load                | 2769ms          | 0ms (model is in memory)
Classification (AVG)      | 13ms            | 1ms
● Forest size: 15000 trees
● Data: $(SPARK_HOME)/data/mllib/sample_libsvm_data.txt
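As a quick check of what these timings imply, the ratios can be computed directly from the numbers on the slide. The Model Load row is left out because a 0ms time makes the ratio undefined. The `speedup` helper is a hypothetical name used only for this arithmetic.

```python
# Timings in milliseconds, copied from the benchmark slide:
# (Spark + Parquet, Spark + Redis ML)
timings = {
    "Model Preparation + Save": (3785, 292),
    "Classification (AVG)": (13, 1),
}


def speedup(parquet_ms: float, redis_ms: float) -> float:
    """How many times faster the Redis-ML path is than the Parquet path."""
    return parquet_ms / redis_ms


if __name__ == "__main__":
    for step, (parquet_ms, redis_ms) in timings.items():
        print(f"{step}: {speedup(parquet_ms, redis_ms):.1f}x")
```

Both measured steps come out at roughly 13x in favour of the Redis-ML path for this 15000-tree forest.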
29
Going Forward - More Features
▪ Implement more Spark-ML model types
- SVM
- Naive Bayes Classifier
- Neural Networks
▪ Integration with Redis’ native types
▪ Data Processing (e.g. Word2Vec, TF-IDF)
▪ PMML Support
30
PS: Neural Redis
▪ Developed by Salvatore
▪ Training is done inside Redis
▪ Online continuous training process
▪ Builds Fully Connected NNs
31
More Resources
Redis-ML:
https://github.com/RedisLabsModules/redis-ml
Spark-Redis-ML:
https://github.com/RedisLabs/spark-redis-ml
Neural-Redis:
https://github.com/antirez/neural-redis
32
