1
Change data capture in production
How OVH became a Data Driven Business with the help of Apache Flink
David Morin – Big Data DevOps – @davAtBzh
Yann Pauly – Software Engineer – @impolitepanda
2
Why did we become Data Driven?
Desire to grow
Need to raise funds from investors
Investors need assurances and numbers about the business
OVH needs to regularly produce reliable financial KPIs in response
Drive the business with those KPIs
3
How did we become Data Driven?
1999
Only one database for
most products
4
How did we become Data Driven?
200+ databases
15K+ tables
10M+ events/day
5
How did we become Data Driven?
6
Our ingestion pipeline
Databases
7
How did we become Data Driven?
8
Where are all our data stored?
9
Our ingestion pipeline
Databases → HDFS
10
How are our data Extracted?
11
Our ingestion pipeline
Databases → Data Collector → HDFS
12
How are our data Transformed and Loaded?
13
How is Flink integrated in our pipeline?
14
How did we customize our storage?
15
How was Flink integrated with Kerberos?
[Diagram: standard Kerberos flow. The client presents its principal identity + credentials and receives a Ticket: TGT, then presents the TGT + service name to the ticket service to obtain a service ticket]
16
How was Flink integrated with Kerberos?
[Same Kerberos flow as before (principal identity + credentials, then TGT + service name to the ticket service), with the problem highlighted: expiration!]
17
How was Flink integrated with Kerberos?
[Diagram: the fix starts with a keytab. Get a keytab for the principal instead of interactive credentials; TGT + service name then go to the ticket service as before]
18
How was Flink integrated with Kerberos?
[Diagram: the keytab is used to obtain the TGT; TGT + service name then go to the ticket service as before]
19
How was Flink integrated with Kerberos?
[Diagram: same flow, keytab to TGT, TGT + service name to the ticket service, with the keytab referenced in the Flink configuration so the job can renew its tickets by itself]
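The keys below are Flink's standard Kerberos settings; the path, principal and login contexts are placeholders, not OVH's actual values. Pointing the configuration at a keytab lets Flink re-authenticate on its own, so long-running jobs no longer suffer from ticket expiration.

```yaml
# flink-conf.yaml - hedged example with placeholder values
security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /path/to/flink.keytab
security.kerberos.login.principal: flink-user@EXAMPLE.REALM
# JAAS login contexts that should use these credentials
# (Client for ZooKeeper, KafkaClient for Kafka)
security.kerberos.login.contexts: Client,KafkaClient
```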
20
Our ingestion pipeline
Databases → Data Collector → Flink → HDFS
21
How do we analyze our data?
22
How are our data ingested by Hive?
{ api } · CLI · JDBC · Files
23
How are our data ingested by Hive?
[Comparison matrix: { api }, JDBC, Parquet, ORC, rated on Streaming, Performance, Transactions, Future proof]
24
How are our data ingested by Hive?
25
How are our data ingested by Hive?
[Same comparison matrix: { api }, JDBC, Parquet, ORC, rated on Streaming, Performance, Transactions, Future proof]
26
How are our data ingested by Hive?
[Diagram: ORC file layout. A sequence of stripes, each made of Index Data, Row Data stored column by column (Column 1, Column 2, Column 3, … Column X) and a Stripe Footer, followed by a File Footer and a Postscript]
27
Our ingestion pipeline
Databases → Data Collector → Flink → HDFS → Hive
28
Our Data Sources
MySQL → ORC
29
Our Data Sources – Not only MySQL
MySQL, PG → ORC
30
Our Data Sources – multiple sources / sinks
MySQL, PG and other sources (?) → ORC
31
Our Data Sources – multiple sources / sinks
MySQL, PG and other sources (?) → ORC and other sinks (?)
32
What’s the problem with that?

| Type     | PostgreSQL              | MySQL      | SQL Server | Oracle        | ORC (Hive) |
|----------|-------------------------|------------|------------|---------------|------------|
| Boolean  | Boolean                 | TinyInt(1) | Bit        | Number(1) 0/1 | Boolean    |
| Float    | Real/Float              | Float      | Float      | Float         | Float      |
| DateTime | Timestamp (no timezone) | DateTime   | DateTime2  | TimeStamp     | TimeStamp  |
| Blob     | ByteA                   | Blob       | Binary(n)  | Blob          | Binary     |
33
Introducing: a Pivot format
[Diagram: without a pivot, each source (MySQL, Oracle, MongoDB, PostgreSQL, SQL Server, DBMS X) needs its own mapping to each sink (HIVE, MySQL, PostgreSQL, SQL Server, DBMS X); with the Pivot format, every system only maps to and from the Pivot]
34
How do we generate Pivot schemas?
01. The Data Collector gets the source RAW schema
02. Our API saves it in our backend
03. We manually launch our converter job
04. The converter runs and generates the pivot schema and the DDL for the target sink
05. The generated pivot schema and target DDL are stored on HDFS
35
Pivot format – Additional conversion data
Update
Insert
Delete
36
Our ingestion pipeline
Databases → Data Collector → Flink (several databases, each with its own Data Collector)
Flink steps: Consume
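As a hedged illustration of the new "Consume" step (the topic, group id and broker address are made up, and the real job deserializes change events rather than plain strings), the Flink Kafka connector of that era would be wired roughly like this:

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class ConsumeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");   // placeholder broker
        props.setProperty("group.id", "cdc-ingestion");         // placeholder consumer group

        // Raw change events, one JSON document per Kafka record (hypothetical topic name).
        DataStream<String> events = env.addSource(
            new FlinkKafkaConsumer<>("cdc-events", new SimpleStringSchema(), props));

        events.print(); // downstream: sort, filter, map/convert, aggregate, store
        env.execute("cdc-consume-sketch");
    }
}
```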
37
How do we push our data into Kafka?
1 Kafka topic = N partitions
What about the table name as the Kafka partition key?
[Diagram: the Area table and the Commands table each land in their own partition]
+ Ordering preserved per partition
– Bad distribution!
38
Is round-robin the solution?
[Diagram: events from any table are spread across partitions 1, 2 and 3]
But what about event order?
39
Our ingestion pipeline
Databases → Data Collector → Flink
Flink steps: Consume → Sort
40
How do we maintain event order? Watermarks!
Watermarks mark the progress of event time
Our watermarks are based on the event timestamp
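A minimal sketch of that watermarking with the pre-1.11 DataStream API that was current at the time; the ChangeEvent class and its timestamp field are hypothetical stand-ins for the real event type:

```java
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WatermarkSketch {

    /** Hypothetical change event carrying the source commit timestamp. */
    public static class ChangeEvent {
        public String table;
        public long timestampMillis;
    }

    public static DataStream<ChangeEvent> withEventTime(
            StreamExecutionEnvironment env, DataStream<ChangeEvent> events) {
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        // Timestamps come from the event itself; watermarks tolerate a bounded
        // amount of out-of-orderness introduced by round-robin Kafka partitions.
        return events.assignTimestampsAndWatermarks(
            new BoundedOutOfOrdernessTimestampExtractor<ChangeEvent>(Time.seconds(30)) {
                @Override
                public long extractTimestamp(ChangeEvent event) {
                    return event.timestampMillis;
                }
            });
    }
}
```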
41
Our ingestion pipeline
Databases → Data Collector → Flink
Flink steps: Consume → Sort → Filter
42
Our ingestion pipeline
Databases → Data Collector → Flink
Flink steps: Consume → Sort → Filter → Map (Convert)
43
How do we convert our events to pivot format?
Job spawns: the Flink job retrieves its DB pivot schema from HDFS
Event read: each event corresponds to exactly one table
Table schema extraction: the current event’s table pivot schema is extracted from the database schema
Event conversion: the event is converted from its source (RAW) format to the pivot format
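A hedged sketch of that conversion as a Flink RichMapFunction; RawEvent, PivotEvent, PivotSchema and the HDFS path are invented for illustration, only the overall shape (load the database pivot schema once in open(), then convert event by event) follows the steps above:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

/** Hypothetical conversion: RAW source event -> pivot-format event. */
public class ToPivotFunction extends RichMapFunction<RawEvent, PivotEvent> {

    private final String schemaPath;          // e.g. "hdfs:///schemas/mydb/pivot.json" (placeholder)
    private transient PivotSchema dbSchema;   // pivot schema of the whole database

    public ToPivotFunction(String schemaPath) {
        this.schemaPath = schemaPath;
    }

    @Override
    public void open(Configuration parameters) throws Exception {
        // Job spawn: load the database pivot schema from HDFS once per task.
        Path path = new Path(schemaPath);
        FileSystem fs = path.getFileSystem();
        try (FSDataInputStream in = fs.open(path)) {
            dbSchema = PivotSchema.parse(in);   // hypothetical parser
        }
    }

    @Override
    public PivotEvent map(RawEvent event) throws Exception {
        // Each event belongs to exactly one table: pick that table's pivot schema
        // and convert the RAW payload to the pivot representation.
        PivotSchema.Table tableSchema = dbSchema.table(event.getTable()); // hypothetical accessors
        return tableSchema.convert(event);
    }
}
```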
44
Our ingestion pipeline
Databases → Data Collector → Flink
Flink steps: Consume → Sort → Filter → Map (Convert) → Aggregate → Store
45
Last steps: Windowing and sink
Custom window function based on size and duration
[Diagram: converted events → window aggregation → conversion → ORC files]
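One way to express a "size and duration" window is a custom trigger; the sketch below is illustrative, not OVH's code, and fires (and purges) a window either when maxCount events have accumulated or when maxDurationMs has elapsed since the first one:

```java
import org.apache.flink.api.common.state.ReducingState;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeutils.base.LongSerializer;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.Window;

/** Fires a window on element count OR elapsed processing time, whichever comes first. */
public class CountOrTimeTrigger<W extends Window> extends Trigger<Object, W> {

    private final long maxCount;
    private final long maxDurationMs;

    private final ReducingStateDescriptor<Long> countDesc =
        new ReducingStateDescriptor<>("count", (a, b) -> a + b, LongSerializer.INSTANCE);

    public CountOrTimeTrigger(long maxCount, long maxDurationMs) {
        this.maxCount = maxCount;
        this.maxDurationMs = maxDurationMs;
    }

    @Override
    public TriggerResult onElement(Object element, long timestamp, W window, TriggerContext ctx) throws Exception {
        ReducingState<Long> count = ctx.getPartitionedState(countDesc);
        if (count.get() == null) {
            // First element of this batch: arm the duration timer.
            ctx.registerProcessingTimeTimer(ctx.getCurrentProcessingTime() + maxDurationMs);
        }
        count.add(1L);
        if (count.get() >= maxCount) {
            count.clear();
            // Note: a production version would also delete the pending timer here.
            return TriggerResult.FIRE_AND_PURGE;
        }
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
        ctx.getPartitionedState(countDesc).clear();
        return TriggerResult.FIRE_AND_PURGE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public void clear(W window, TriggerContext ctx) {
        ctx.getPartitionedState(countDesc).clear();
    }
}
```

It could then be plugged in with something like `.window(GlobalWindows.create()).trigger(new CountOrTimeTrigger<>(50_000, 10 * 60 * 1000L))` in front of the aggregation that writes the ORC files (thresholds are placeholders).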
46
Our ingestion pipeline
Databases → Data Collector → Flink
Flink steps: Consume → Sort → Filter → Map (Convert) → Aggregate → Store
47
Why do we need checkpoints?
Commit
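Checkpoints are what turn the pipeline's progress (Kafka offsets, window contents, metadata state) into a recoverable "commit": after a failure the job restarts from the last snapshot instead of losing or duplicating events. A minimal sketch, with placeholder intervals:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot all operator state (Kafka offsets, window contents, Hive metadata, ...)
        // every minute with exactly-once semantics, so a restarted job resumes from the
        // last successful "commit".
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(30_000);

        env.fromElements("event-1", "event-2").print(); // stand-in for the real pipeline
        env.execute("cdc-checkpoint-sketch");
    }
}
```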
48
How do we manage these anomalies?
Error: cannot be converted to ORC!
Side output → write to HDFS (data + error), push metric, alerting
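A hedged sketch of that side-output pattern; the string-based event type, the toOrcRow helper and the error format are invented for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class OrcConversionWithErrors {

    // Tag identifying the "could not be converted to ORC" side output.
    // The anonymous subclass keeps the generic type information.
    private static final OutputTag<String> CONVERSION_ERRORS =
        new OutputTag<String>("orc-conversion-errors") {};

    public static void wire(DataStream<String> pivotEvents) {
        SingleOutputStreamOperator<String> orcReady = pivotEvents.process(
            new ProcessFunction<String, String>() {
                @Override
                public void processElement(String event, Context ctx, Collector<String> out) {
                    try {
                        out.collect(toOrcRow(event));          // hypothetical conversion
                    } catch (Exception e) {
                        // Keep both the offending data and the error for later inspection.
                        ctx.output(CONVERSION_ERRORS, event + " | " + e.getMessage());
                    }
                }
            });

        // Main stream continues towards the ORC writer; the side output would be
        // written to HDFS and feed metrics/alerting (sinks omitted here).
        DataStream<String> errors = orcReady.getSideOutput(CONVERSION_ERRORS);
        errors.print();
    }

    private static String toOrcRow(String event) {
        return event; // placeholder for the real conversion
    }
}
```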
49
How do we monitor our pipeline execution?
[Diagram: Flink Prometheus Push Gateway reporter → Prometheus Push Gateway → OVH Metrics Data Platform]
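The reporter is configured in flink-conf.yaml; the keys below are Flink's standard PrometheusPushGatewayReporter options, with placeholder host, port and job name. From the push gateway, the metrics then flow to the OVH Metrics Data Platform.

```yaml
# flink-conf.yaml - hedged example with placeholder values
metrics.reporter.promgateway.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter
metrics.reporter.promgateway.host: pushgateway.example.internal
metrics.reporter.promgateway.port: 9091
metrics.reporter.promgateway.jobName: cdc-ingestion
metrics.reporter.promgateway.randomJobNameSuffix: true
metrics.reporter.promgateway.deleteOnShutdown: false
```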
50
How do we monitor our pipeline execution?
51
Why don’t we see any data in Hive?
52
Why don’t we see any data in Hive?
[Diagram: Hive managed tables (ACID) on one side, external tables on the other, each backed by ORC files for Table 1, Table 2, Table 3, Table 4, … Table X]
53
How do we see our data in Hive?
[Diagram: the ORC files behind the external tables (Table 1 … Table X) are merged into the corresponding Hive managed tables with a SQL query]
54
Why isn’t this the best solution?
55
What’s a better solution? ORC + Hive Metadata!
SELECT * FROM mytab;

mytab:
| id | value |
|----|-------|
| 1  | test  |
56
What’s a better solution? ORC + Hive Metadata!
SELECT row__id, * FROM mytab;

| row__id                                     | id | value |
|---------------------------------------------|----|-------|
| {"transactionid":10,"bucketid":1,"rowid":0} | 1  | test  |
57
What’s a better solution? ORC Delta File + Hive Metadata!
INSERT INTO `mytab` VALUES(1,'test');
transaction 1 created
delta_0000001_0000001_0000/bucket_00001
{"operation":0,"originalTransaction":1,"bucket":1,"rowId":0,"currentTransaction":1,"row":{"_col0":1,"_col1":"test"}}
DELETE FROM `mytab` WHERE id=1;
transaction 2 created
delta_0000002_0000002_0000/bucket_00001
{"operation":2,"originalTransaction":1,"bucket":1,"rowId":0,"currentTransaction":2,"row":null}
UPDATE = DELETE + INSERT
58
What’s a better solution? ORC Delta File + Hive Metadata!
{"operation":2,"originalTransaction":1,"bucket":1,"rowId":0,"currentTransaction":2,"row":null}
Keep track of several pieces of metadata per row:
HiveMeta: id, pkValue, operation, originalTxId, bucketId, timestamp
59
What’s a better solution? ORC Delta File + Hive Metadata!
{"operation":2,"originalTransaction":1,"bucket":1,"rowId":0,"currentTransaction":2,"row":null}
Keep track of several pieces of metadata per row in the Flink state:
HiveMeta: id = test.mytab, pkValue = 1, operation = 2, originalTxId = 1, bucketId = 1, timestamp = 1569362895...
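A hedged sketch of keeping that metadata in keyed Flink state; the HiveMeta POJO mirrors the fields above, but the key choice and surrounding logic are illustrative rather than OVH's actual implementation:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/** Per-row Hive metadata, mirroring the fields shown on the slide. */
class HiveMeta {
    public String id;          // e.g. "test.mytab"
    public String pkValue;     // primary-key value of the row
    public int operation;      // 0 = insert, 2 = delete (ORC ACID operation codes)
    public long originalTxId;
    public int bucketId;
    public long timestamp;
}

/** Keyed, for example, by "table + primary key"; remembers the latest metadata per row. */
class TrackHiveMeta extends KeyedProcessFunction<String, HiveMeta, HiveMeta> {

    private transient ValueState<HiveMeta> lastMeta;

    @Override
    public void open(Configuration parameters) {
        lastMeta = getRuntimeContext().getState(
            new ValueStateDescriptor<>("hive-meta", HiveMeta.class));
    }

    @Override
    public void processElement(HiveMeta meta, Context ctx, Collector<HiveMeta> out) throws Exception {
        // Store the latest metadata for this row so later updates/deletes can be
        // written as the matching ORC delta records.
        lastMeta.update(meta);
        out.collect(meta);
    }
}
```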
60
How do we store our Flink state?
Local
Scalable
Incremental
RocksDB
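A minimal sketch of selecting RocksDB as the state backend with incremental checkpoints enabled (the checkpoint URI is a placeholder), matching the three properties above: state is kept locally on disk, scales beyond the JVM heap, and only the changes since the last checkpoint are uploaded:

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbStateSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // State lives in local RocksDB instances on each TaskManager, and only the
        // diff since the last checkpoint is shipped to HDFS (incremental checkpoints).
        RocksDBStateBackend backend =
            new RocksDBStateBackend("hdfs:///flink/checkpoints", true); // placeholder path
        env.setStateBackend(backend);

        env.fromElements("a", "b", "c").print(); // stand-in for the real pipeline
        env.execute("rocksdb-state-sketch");
    }
}
```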
61
Our Flink usage: a summary
Checkpoints · Side output · Windowing · Watermarks · RocksDB state · Metrics
62
Our Flink usage: some numbers…
3+ billion rows
2,500+ synced tables
Up to 300 million query events per dump
200+ Flink containers on YARN
100+ Flink jobs
10+ million streaming events per day
63
What’s next?
Hive 3?
Multiple other sinks
Automate all remaining manual processes
Make it Open Source?
Rule engine to anonymize data and perform more complex transformations
Real-time data merging
64
Questions?
