Redis at LINE

About me
● Shunsuke Nakamura
● : LINE LINE 1 7
● Redis team lead, messaging server tech lead
● : LINE HBase scalability
● :
● Storage infrastructure using HBase behind LINE messages  
https://www.slideshare.net/naverjapan/storage-infrastructure-using-hbase-behind-line-messages
● HBase Redis 100 / LINE  
https://www.slideshare.net/linecorp/a-5-47983106

Redis Talk for LINE messaging
● RedisConf18 @ San Francisco
● Redis at LINE, 25 Billion Messages Per Day 
https://www.slideshare.net/RedisLabs/redisconf18-redis-at-line-25-billion-messages-per-day (open today!)
● Tech Planet 2015 @ Korea
● LINE Redis Cluster  
https://www.slideshare.net/lovejinstar/redis-at-line-tech-planet-2015

Agenda
● How LINE messaging uses Redis
● Redis details for LINE messaging
● Challenge to Redis3.2 official cluster
● Redis hotspots daily handling

Redis for LINE messaging (As of 2011.6)
● 2011 LINE messaging Redis
● messaging in-memory
● Redis 3 x2 (master/slave), client-side sharding

Redis for LINE messaging (2011 → 2018)
● Redis (As of 2018.5)
● 60 Redis clusters
● 2 ~ 2000 master+slave nodes/cluster
● 14,000 Redis nodes
● 4 commands / sec
● 2 keys, 60 TB memory used

LINE is powered by Redis
● sequences
● user/group/message sequence ID
● event revision
● caches
● message event TTL time series data
● immutable read-heavy data
● storages
● secondary index (e.g. follower list)
● CAS (e.g. unread badge count)
● local queue in each API server
● API server async task ( IO )
● IO RPC context

How messaging uses Redis
1. App calls the sendMessage Thrift API
2. Acquire messageId from Redis sequence
3. Store Message, Event into Redis caches
   1. Store into HBase storage
   2. Enqueue task to local Redis queue if failed
4. Check receiver info with Redis storages
5. Deliver and notify event to receivers
[Diagram: Gateway → API servers (each with a local queue) → Redis sequence, caches, storages. Arrows: 1. sendMessage, 2. acquire messageId, 3. store Message, 4. check receiver info, 5. notify]
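The five steps can be sketched as a small in-memory model. This is only an illustration under stated assumptions: plain Python containers stand in for the Redis sequence, caches, storages, local queue and HBase, and every class, method and field name (MessagingServer, send_message, drain_queue, and so on) is hypothetical, not LINE's API. Skipping the sender when notifying is also an assumption.

```python
import itertools
from collections import deque


class MessagingServer:
    """Toy model of the sendMessage steps; all names are hypothetical."""

    def __init__(self, hbase_available=True):
        self._sequence = itertools.count(1)  # stands in for the Redis sequence
        self.cache = {}                      # stands in for the Redis caches
        self.hbase = {}                      # stands in for HBase storage
        self.local_queue = deque()           # stands in for the local Redis queue
        self.receivers = {}                  # chat id -> member ids (Redis storages)
        self.hbase_available = hbase_available
        self.notified = []                   # (receiver, message id) pairs

    def _store_hbase(self, message_id, message):
        if not self.hbase_available:
            raise ConnectionError("HBase write failed")
        self.hbase[message_id] = message

    def send_message(self, sender, chat_id, text):
        message_id = next(self._sequence)                    # step 2
        message = {"id": message_id, "from": sender,
                   "to": chat_id, "text": text}
        self.cache[message_id] = message                     # step 3
        try:
            self._store_hbase(message_id, message)           # step 3.1
        except ConnectionError:
            self.local_queue.append(("store", message_id))   # step 3.2
        for receiver in self.receivers.get(chat_id, []):     # step 4
            if receiver != sender:
                self.notified.append((receiver, message_id)) # step 5
        return message_id

    def drain_queue(self):
        """Retry queued HBase writes; return how many succeeded."""
        done = 0
        while self.local_queue and self.hbase_available:
            _, message_id = self.local_queue.popleft()
            self._store_hbase(message_id, self.cache[message_id])
            done += 1
        return done
```

In this sketch a failed HBase write does not fail the send: the message stays readable from the cache and the write is retried later from the local queue.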

Redis server at LINE
● Redis version 2.8 ~ 3.2
● Multiple nodes per host
● 10 Gbps network
● Redis node NIC interrupt CPU affinity  
● Redis node
● standalone : master slave 1 1, host role slave
● diskless : No BGSAVE backup, No AOF, No Virtual Memory
● non HA : No Sentinel, No slave read

In-house cluster at LINE
● Built in 2011~2012, before the Redis 3.0 official cluster
● Proxy-less client side sharding
● ZooKeeper: shard
● Cluster Manager Server:
● LINE Redis Client
● ZooKeeper
● hashes the Redis key (MurmurHash3) to select a shard
● sends the Redis command to that shard's master
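A minimal sketch of proxy-less client-side sharding as listed above. The slide names MurmurHash3 but not the variant or how a hash maps to a shard, so the 32-bit x86 variant and the simple modulo mapping used here are assumptions, as are the names ShardRouter, shard_for and master_for. The shard list is passed in directly where a real client would presumably load it from ZooKeeper.

```python
MASK32 = 0xFFFFFFFF


def _rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Pure-Python MurmurHash3 (x86, 32-bit)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n = len(data)
    body_end = n - (n % 4)
    # Body: process the input four bytes at a time, little-endian.
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: the remaining 1 to 3 bytes, if any.
    tail = data[body_end:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & MASK32
        k = _rotl32(k, 15)
        k = (k * c2) & MASK32
        h ^= k
    # Finalization mix.
    h ^= n
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


class ShardRouter:
    """Hypothetical router: key -> shard index -> master address."""

    def __init__(self, shard_masters):
        # One master address per shard (assumed to come from ZooKeeper).
        self.shard_masters = list(shard_masters)

    def shard_for(self, key: str) -> int:
        # Assumption: plain modulo over the shard count.
        return murmur3_32(key.encode("utf-8")) % len(self.shard_masters)

    def master_for(self, key: str) -> str:
        return self.shard_masters[self.shard_for(key)]
```

Usage: `ShardRouter(["redis-a:6379", "redis-b:6379"]).master_for("user:42")` returns the same address for the same key on every API server, which is what lets the cluster work without a proxy.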

Redis client at LINE
● Jedis (Sync) or Lettuce (Async) Java client
● commands ser/des template
● Redis command client side metrics
● Availability
● Back pressure: ZooKeeper
● Circuit breaker: fail fast
● read Replicated cluster
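The "Circuit breaker: fail fast" item can be illustrated with a small sketch. The slides do not describe how LINE's client implements it, so the thresholds, the half-open trial call, and the names CircuitBreaker and CircuitOpenError are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling a node that is currently marked unhealthy."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast: do not wait for another timeout on a bad node.
                raise CircuitOpenError("circuit open, failing fast")
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each Redis command, for example `breaker.call(client.get, key)`, and treat CircuitOpenError as a signal to use a fallback path.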

Replicated clusters
● replica Redis clusters serve cache reads
● client picks a cluster at random and falls back to the origin storage on a miss (read-through cache)
● : LINE official account (followers1 )  
account
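A sketch of the read path described on this slide: pick one replica cluster at random, and on a miss read through to the origin storage. Dicts stand in for the replica Redis clusters, and filling only the replica that missed is an assumption; the class and parameter names are hypothetical.

```python
import random


class ReplicatedCache:
    """Random replica selection with read-through fallback (illustrative)."""

    def __init__(self, replicas, load_from_origin, rng=None):
        self.replicas = replicas                  # dict-like caches, one per replica cluster
        self.load_from_origin = load_from_origin  # callable: key -> value or None
        self.rng = rng or random.Random()

    def get(self, key):
        replica = self.rng.choice(self.replicas)  # spread reads across clusters
        value = replica.get(key)
        if value is None:
            # Miss: fall back to the origin storage and fill this replica.
            value = self.load_from_origin(key)
            if value is not None:
                replica[key] = value
        return value
```

With N replicas, a hot key costs the origin at most N loads in this sketch, after which all reads are served from cache.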

Recent works in LINE Redis team
● Redis3.2 Official cluster
● Async Redis client
● Redis

Challenge to Redis3.2 official cluster
● : In-house cluster 7 ...
● dynamic resizing
● cluster / 2 cluster migration
● Redis OSS
● REAL cache cluster 3.2 cluster
● in-house cluster client Jedis Cluster client
● REAL service crush test
● => Redis3.2 cluster
how to resize in-house cluster

3 issues of Redis3.2 cluster for LINE
● Gossip traffic 1,000 nodes
● Redis official document  
“High performance and linear scalability up to 1,000 nodes.” 
https://redis.io/topics/cluster-spec
● => LINE in-house cluster = 1,400 master nodes
● gossip issue: https://github.com/antirez/redis/issues/3929
● PSYNC1 [#2]
● => Redis
● clustering memory overhead [#3]
● => memory bound cluster

Redis3.2 cluster - Issue #2 PSYNC1
● H/W decommission
● master decommission (A → B)
● CLUSTER FAILOVER command master/slave  
=> master (B) PSYNC RDB Full sync !!
● PSYNC1 : master instance PSYNC
● https://gist.github.com/antirez/ae068f95c0d084891305
● https://github.com/antirez/redis/issues/2683
● workaround
● slots CLUSTER FAILOVER
● Redis4 PSYNC2

Redis3.2 cluster - Issue #3 huge memory overhead
● slots → keys mapping 1 ZSET in-memory
● https://github.com/antirez/redis/blob/3.2.11/src/cluster.c#L469
● CLUSTER GETKEYSINSLOT command
● key
● https://github.com/antirez/redis/issues/3800
● Redis4 RAX
● https://github.com/antirez/rax (like Radix-tree)
● 40% (11.69GB → 9.42GB)
● 60% overhead
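To make the slots → keys mapping concrete, here is a sketch of how Redis Cluster assigns a key to one of 16384 slots (CRC16 of the key, or of its hash tag, modulo 16384) together with a toy per-slot index of the kind CLUSTER GETKEYSINSLOT needs. The dict of sets is only a stand-in for the server's internal structure; the point is that an index holding an extra entry per key costs memory in proportion to the key count. SlotIndex and its methods are hypothetical names.

```python
import binascii
from collections import defaultdict

SLOTS = 16384


def key_slot(key: bytes) -> int:
    """Hash slot of a key: CRC16 (XMODEM) modulo 16384, honoring hash tags."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]  # hash only the non-empty {tag}
    return binascii.crc_hqx(key, 0) % SLOTS


class SlotIndex:
    """Toy slot -> keys index; every stored key adds one extra entry."""

    def __init__(self):
        self.keys_by_slot = defaultdict(set)

    def add(self, key: bytes):
        self.keys_by_slot[key_slot(key)].add(key)

    def remove(self, key: bytes):
        self.keys_by_slot[key_slot(key)].discard(key)

    def get_keys_in_slot(self, slot: int, count: int):
        # Rough analogue of CLUSTER GETKEYSINSLOT <slot> <count>.
        return sorted(self.keys_by_slot.get(slot, ()))[:count]
```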

Redis hotspots daily handling
Slow command
OPS bursting
Connection bursting

Slow command
● single thread Redis slow command
● monitoring system 10ms slow command alert
● => Hash / (Z)SET
● O(N) heavy command
● HBase Cassandra
● LINE 1 element bigkeys ;;;
● => SCAN command
● non-blocking iteration: SMEMBERS → SSCAN
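A sketch of replacing one O(N) SMEMBERS call with cursor-based SSCAN iteration. The helper assumes a client whose `sscan(key, cursor=..., count=...)` returns a `(next_cursor, members)` pair, which is the shape redis-py uses; FakeClient is a hypothetical stand-in so the example runs without a server.

```python
def iter_set_members(client, key, count=100):
    """Yield the members of a set in small batches instead of all at once."""
    cursor = 0
    while True:
        cursor, members = client.sscan(key, cursor=cursor, count=count)
        yield from members
        if cursor == 0:  # cursor 0 means the iteration is complete
            break


class FakeClient:
    """In-memory stand-in for a Redis client (illustration only)."""

    def __init__(self, data):
        self.data = {k: sorted(v) for k, v in data.items()}
        self.calls = 0

    def sscan(self, key, cursor=0, count=10):
        self.calls += 1
        items = self.data[key]
        nxt = cursor + count
        return (0 if nxt >= len(items) else nxt), items[cursor:nxt]
```

Each SSCAN call does a bounded amount of work, so other clients' commands can run between batches. On a real server COUNT is only a hint and an element may be returned more than once, so callers that need exact results should deduplicate.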

ops bursting in the old days
[Graphs: OPS 2.6 million/min; spike in per-min metrics; CPU 98%]

OPS and connection bursting
● Per-sec Cluster Monitoring and auto bursting detection
● metrics collector: Akka cluster nodes
● store: ElasticSearch
● view: Kibana + Grafana
● ops/connections MONITOR command
● command client server IP ElasticSearch
● command dedup, Lua script, local cache IO
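The slides do not say how bursts are detected, so the following is only one possible rule, stated as an assumption: flag any second whose OPS exceeds a multiple of the trailing per-second average. The function name, window and factor are hypothetical.

```python
from collections import deque


def detect_bursts(per_second_ops, window=60, factor=3.0):
    """Return (second_index, ops) pairs where ops > factor x trailing mean."""
    recent = deque(maxlen=window)
    bursts = []
    for i, ops in enumerate(per_second_ops):
        if len(recent) == window:  # wait until a full baseline window exists
            baseline = sum(recent) / window
            if ops > factor * baseline:
                bursts.append((i, ops))
        recent.append(ops)
    return bursts
```

For example, a one-second spike to 50,000 OPS on a 1,000 OPS baseline raises that minute's total only from 60,000 to 109,000, which is easy to miss in per-minute graphs, while the per-second rule above flags it immediately.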

Future works
● Redis4 cluster
● Async client connections
● message service latency API server Redis proxy
● API server connection Pool + Sync client Redis node connection
● latency
● scalability bottleneck Redis storage
● redis-server 1 100k+ OPS/s
● 3 ~ 4
● Redis lock-in
