Redis at LINE
About me
● : (Shunsuke Nakamura)
● : LINE LINE 1 7
● : Redis team lead, messaging server tech lead
●
● : LINE HBase scalability
● :
● Storage infrastructure using HBase behind LINE messages 

https://www.slideshare.net/naverjapan/storage-infrastructure-using-hbase-behind-line-messages
● HBase Redis 100 / LINE 

https://www.slideshare.net/linecorp/a-5-47983106
Redis Talk for LINE messaging
●
● RedisConf18 @ San Francisco
● Redis at LINE, 25 Billion Messages Per Day

https://www.slideshare.net/RedisLabs/redisconf18-redis-at-line-25-billion-messages-per-day (open today!)
● Tech Planet 2015 @ Korea
● LINE Redis Cluster 

https://www.slideshare.net/lovejinstar/redis-at-line-tech-planet-2015
●
Agenda
● How LINE messaging uses Redis
● Redis details for LINE messaging
● Challenge to Redis3.2 official cluster
● Redis hotspots daily handling
Redis for LINE messaging (As of 2011.6)
● 2011 LINE messaging Redis
● messaging in-memory
● Redis 3 x2 (master/slave) client-side sharding
Redis for LINE messaging (2011 → 2018)
● Redis (As of 2018.5)
● 60 Redis clusters
● 2 ~ 2000 master+slave nodes/cluster
● 14,000 Redis nodes
● 4 commands / sec
● 2 keys, 60 TB memory used
(Diagram: Redis cluster, x60)
LINE is powered by Redis
● sequences
● user/group/message sequence ID
● event revision
● caches
● message event TTL time series data
● immutable read-heavy data
● storages
● secondary index (ex. follower list)
● CAS (ex. unread badge count)
● local queue in each API server
● API server async task ( IO )
● IO RPC context
How messaging uses Redis
1. App calls the sendMessage thrift API
2. Acquire messageId from the Redis sequence
3. Store Message, Event into Redis caches
   1. Store into HBase storage
   2. Enqueue task to local Redis queue if failed
4. Check receiver info with Redis storages
5. Deliver and notify event to receivers
(Diagram: Gateway, API servers with a local queue, and the Redis sequence, caches, and storages; labels: 1. sendMessage, 2. acquire messageId, 3. store Message, 4. check receiver info, 5. notify)
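The five steps above can be sketched in a few lines. This is only an illustration under stated assumptions: `FakeStore` and `MessagingServer` are hypothetical names, the stores are in-memory stand-ins rather than real Redis or HBase clients, and the sequence is a plain counter because the slides do not say which Redis command backs it.

```python
import itertools
from collections import deque


class FakeStore:
    """In-memory stand-in for a Redis cache or for HBase (assumption, not a real client)."""

    def __init__(self, fail=False):
        self.data = {}
        self.fail = fail

    def put(self, key, value):
        if self.fail:
            raise IOError("store unavailable")
        self.data[key] = value


class MessagingServer:
    """Hypothetical API server that follows the sendMessage steps from the slide."""

    def __init__(self, cache, hbase, receivers):
        self._sequence = itertools.count(1)  # stands in for the Redis sequence
        self.cache = cache                   # Redis caches
        self.hbase = hbase                   # HBase storage
        self.receivers = receivers           # Redis storages holding receiver info
        self.local_queue = deque()           # local Redis queue on this API server
        self.notified = []                   # records step 5 for inspection

    def send_message(self, sender, to, text):
        message_id = next(self._sequence)                     # step 2
        message = {"id": message_id, "from": sender, "to": to, "text": text}
        self.cache.put(("message", message_id), message)      # step 3
        try:
            self.hbase.put(("message", message_id), message)  # step 3.1
        except IOError:
            self.local_queue.append(message)                  # step 3.2: retry later
        if to in self.receivers:                              # step 4
            self.notified.append((to, message_id))            # step 5
        return message_id
```

With a failing HBase stand-in, `MessagingServer(FakeStore(), FakeStore(fail=True), {"bob"}).send_message("alice", "bob", "hi")` still returns a message ID, and the message waits in `local_queue` for a later retry.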
Redis server at LINE
● Redis version 2.8 ~ 3.2
●
● Multiple nodes per host
● 10 Gbps network
● Redis node NIC interrupt CPU affinity 

● Redis node
● standalone : master slave 1 1, host role slave
● disk less : No BGSAVE backup, No AOF, No Virtual Memory
● non HA : No Sentinel, No slave read
In-house cluster at LINE
● 3.0 official cluster (2011~2012)
● Proxy-less client side sharding
● ZooKeeper: shard
● Cluster Manager Server:
● LINE Redis Client
● ZooKeeper
● Hash the Redis key (MurmurHash3) to find its shard
● Send the Redis command to the shard master
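A minimal sketch of proxy-less client-side sharding, assuming the client has already read the shard table from ZooKeeper. `ShardMap` and the modulo placement are assumptions made for illustration, and CRC32 from the standard library is swapped in for the MurmurHash3 named in the slides so that the example has no third-party dependency.

```python
import zlib


class ShardMap:
    """Shard table as a client might cache it after reading it from ZooKeeper."""

    def __init__(self, masters):
        # masters: one "host:port" string per shard (hypothetical layout)
        if not masters:
            raise ValueError("at least one shard is required")
        self.masters = list(masters)

    def shard_for(self, key):
        # CRC32 is a stdlib stand-in for MurmurHash3; modulo placement is assumed.
        digest = zlib.crc32(key.encode("utf-8"))
        return digest % len(self.masters)

    def master_for(self, key):
        # Commands for this key go to the master of the shard that owns it.
        return self.masters[self.shard_for(key)]
```

Because every client computes the same hash over the same shard table, no proxy is needed between the API servers and the Redis nodes.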
Redis client at LINE
● Jedis (Sync) or Lettuce (Async) Java client
● commands ser/des template
● Redis command client side metrics
● Availability
● Back pressure: ZooKeeper
● Circuit breaker: fail fast
● read Replicated cluster
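The fail-fast circuit breaker mentioned above could look roughly like this. The slides do not describe LINE's implementation, so the class name, the consecutive-failure threshold, and the reset timer are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling Redis while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("failing fast; circuit is open")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

Wrapping each Redis command in `breaker.call(...)` means a dead node costs callers an immediate exception rather than a full timeout per request.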
Replicated clusters
● replica Redis cluster cache read
● client picks a cluster at random, falls back to origin storage (read-through cache)
● : LINE official account (followers1 ) account
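A sketch of reading through replicated cache clusters. Each replica cluster is modeled as a plain dict and the origin storage as a callable, both of which are assumptions, and `ReplicatedCache` is a hypothetical name.

```python
import random


class ReplicatedCache:
    def __init__(self, clusters, origin, rng=None):
        # clusters: dict-like caches, each standing in for one replica Redis cluster
        # origin: callable(key) -> value, standing in for the origin storage
        self.clusters = clusters
        self.origin = origin
        self.rng = rng or random.Random()

    def get(self, key):
        cluster = self.rng.choice(self.clusters)  # spread reads across replicas
        if key in cluster:
            return cluster[key]
        value = self.origin(key)                  # miss: fall back to origin storage
        cluster[key] = value                      # fill only the replica that missed
        return value
```

Each replica warms up independently, so a hot key is fetched from the origin at most once per replica cluster in this sketch, and read load on that key is divided across the replicas.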
Recent works in LINE Redis team
● Redis3.2 Official cluster
● Async Redis client
● Redis
Challenge to Redis3.2 official cluster
● : In-house cluster 7 ...
● dynamic resizing
● cluster / 2 cluster migration
● Redis OSS
● REAL cache cluster 3.2 cluster
● in-house cluster client Jedis Cluster client
● REAL service crush test
● 

=> Redis3.2 cluster
how to resize in-house cluster
3 issues of Redis3.2 cluster for LINE
● Gossip traffic 1,000 nodes
● Redis official document: “High performance and linear scalability up to 1,000 nodes.”
https://redis.io/topics/cluster-spec
● => LINE in-house cluster = 1,400 master nodes
● gossip issue: https://github.com/antirez/redis/issues/3929
● PSYNC1 [#2]
● => Redis
● clustering memory overhead [#3]
● => memory bound cluster
Redis3.2 cluster - Issue #2 PSYNC1
● H/W decommission
● master decommission (A → B)
● CLUSTER FAILOVER command swaps master/slave => PSYNC with the new master (B) triggers an RDB full sync !!
● PSYNC1 : master instance PSYNC
● https://gist.github.com/antirez/ae068f95c0d084891305
● https://github.com/antirez/redis/issues/2683
● workaround
● slots CLUSTER FAILOVER
●
● Redis4 PSYNC2
Redis3.2 cluster - Issue #3 huge memory overhead
● slots → keys mapping 1 ZSET in-memory
● https://github.com/antirez/redis/blob/3.2.11/src/cluster.c#L469
● CLUSTER GETKEYSINSLOT command
● key
● https://github.com/antirez/redis/issues/3800
● Redis4 RAX
● https://github.com/antirez/rax (like Radix-tree)
● 40% (11.69GB → 9.42GB)
● 60% overhead
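For context on what a "slot" is: the cluster spec assigns every key to one of 16384 hash slots using CRC16 of the key, or of its hash tag when it has one. The sketch below follows that rule, with `key_slot` as a hypothetical helper name. It shows only how a key maps to a slot, not the slots → keys index discussed above.

```python
import binascii

SLOTS = 16384  # number of hash slots in the official cluster


def key_slot(key: bytes) -> int:
    # Hash tag rule: if the key contains "{...}" with a non-empty body,
    # only the text between the first "{" and the next "}" is hashed.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with initial value 0 is CRC-16/XMODEM, the variant in the cluster spec.
    return binascii.crc_hqx(key, 0) % SLOTS
```

Keys that share a hash tag, such as `{user1000}.following` and `{user1000}.followers`, land in the same slot and therefore on the same node.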
Redis hotspots daily handling
Slow command
OPS bursting
Connection bursting
Slow command
● single thread Redis slow command
● monitoring system 10ms slow command alert
● => Hash / (Z)SET
● O(N) heavy command
● HBase Cassandra
● LINE 1 element bigkeys ;;;
● => SCAN command
● blocking iteration
SMEMBERS → SSCAN
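Replacing one O(N) SMEMBERS call with SSCAN batches might look like the sketch below. It assumes a client object whose `sscan(key, cursor=..., count=...)` method returns a `(next_cursor, members)` pair, which is the shape redis-py uses. `FakeClient` is a hypothetical in-memory stand-in so the example runs without a server.

```python
def iter_set_members(client, key, count=100):
    """Yield set members in batches via SSCAN instead of one SMEMBERS call."""
    cursor = 0
    while True:
        cursor, members = client.sscan(key, cursor=cursor, count=count)
        yield from members
        if int(cursor) == 0:  # the server returns cursor 0 when the scan is complete
            break


class FakeClient:
    """In-memory stand-in that pages through a list the way SSCAN pages a set."""

    def __init__(self, members):
        self.members = list(members)

    def sscan(self, key, cursor=0, count=10):
        nxt = cursor + count
        batch = self.members[cursor:nxt]
        return (0 if nxt >= len(self.members) else nxt), batch
```

Each SSCAN call touches only a bounded batch, so other clients get served between batches. A real SSCAN may return the same member more than once, so callers that need uniqueness should deduplicate.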
ops bursting in the old days
OPS 2.6 Million /min
spike per-min metrics
CPU 98%
OPS and connection bursting
● Per-sec Cluster Monitoring and auto bursting detection
● metrics collector: Akka cluster nodes
● store: ElasticSearch
● view: Kibana + Grafana
● ops/connections MONITOR command
● command client server IP ElasticSearch
● command dedup, Lua script, local cache IO
collect
store/view
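Per-second samples make short spikes visible that per-minute metrics average away. The slides do not say how the automatic burst detection works, so the rule below (flag a sample that exceeds a multiple of the recent average) is purely an assumption, as are the `BurstDetector` name and its parameters.

```python
from collections import deque


class BurstDetector:
    def __init__(self, window=60, factor=3.0):
        self.factor = factor                  # how far above baseline counts as a burst
        self.samples = deque(maxlen=window)   # last `window` per-second samples

    def observe(self, ops_per_sec):
        """Return True when the new sample exceeds factor x the recent average."""
        baseline = sum(self.samples) / len(self.samples) if self.samples else None
        self.samples.append(ops_per_sec)
        return (
            baseline is not None
            and baseline > 0
            and ops_per_sec > self.factor * baseline
        )
```

Feeding it `100, 100, 100, 100, 1000, 100` with `factor=3.0` flags only the fifth sample. The same detector can be run on connection counts.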
Future works
● Redis4 cluster
● Async client connections
● message service latency API server Redis proxy
● API server connection Pool + Sync client Redis node connection
● latency
● scalability bottleneck Redis storage
● redis-server 1 100k+ OPS/s
● 3 ~ 4
● Redis lock-in
THANK YOU
