How we centralized data into a data lake for analytics
© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Guest Speakers
Arvind Rajagopalan
Director – Global Technology Services – Verizon
Jordan Martz
Director of Technology Solutions - Attunity
© Verizon 2017 All Rights Reserved. Information contained herein is provided AS IS and subject to change without notice. All trademarks used herein are property of their respective owners.
We are Verizon.
Verizon delivers the promise of the digital world.
• Fortune 500 rank: #14
• $29.8 billion in first-quarter revenue (2017)
• 161,000 employees
For first-quarter 2017:
LTE covers 98% of U.S. population
113.9 M total retail connections
LTE Advanced covers 466 markets
Largest all-fiber Fios network
5.7 M Fios internet and 4.7 M Fios video connections
500 Mbps upload and download speeds
Global IP network
99% of Fortune 500 customers
Products and solutions
Innovating in entertainment, digital media, the Internet of Things and broadband service
‘Be Prepared’ – build architecture so you can:
Analyze Everything
• 100s to 1000s of Data Sources
• Business & Machine Data
Analyze Anywhere
• On-premises or in the Cloud
• In DB, DW, Hadoop, In-Memory, etc.
Analyze in Real-Time
• Capture new, changing data
• Process/stream in motion
Paradigm Shift: App-Centric → Data-Centric
APP-CENTRIC: App 1, App 2, App 3, App 4, App 5, App 6
Limitations:
• Multiple copies of data
• Difficult cross-system integration
• Limit on Data volumes
DATA-CENTRIC: Central Data Lake
Advantages:
• One version of the data
• No need for cross-app integration
• System scales linearly
Migrating to Hadoop – Types & Use Cases
• Analyze data where it resides
• Exploit Fault-tolerant, High-Performance Platforms for varying workloads
• Push analytics to the front line
ETL-Offload
• Enable ELT Offload while reducing cost
• Enable new forms and sources of data
Self Service
• Schema on Read
• Transform and Model in place
Data Reservoir | Exploratory Lake | Analytical Lake
Active Archive | Integrate & Converge | Analytics & Data
• Carry all History
• Expand Depth and Breadth of DW
• Expand Variety of Data
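The "Schema on Read" bullet above can be illustrated with a minimal sketch. It assumes raw records are stored as JSON lines and that structure is applied only when the data is read. The field names and the `SCHEMA` mapping are hypothetical and do not come from the deck.

```python
import json

# Raw records land in the lake untouched; fields may be missing or extra.
RAW_LINES = [
    '{"invoice_id": "1001", "amount": "250.00", "region": "east"}',
    '{"invoice_id": "1002", "amount": "99.5"}',
    '{"invoice_id": "1003", "amount": "12", "region": "west", "note": "rush"}',
]

# Hypothetical read-time schema: column name -> (converter, default value).
SCHEMA = {
    "invoice_id": (int, None),
    "amount": (float, 0.0),
    "region": (str, "unknown"),
}


def read_with_schema(lines, schema):
    """Parse each raw line and project it onto the schema at read time."""
    for line in lines:
        raw = json.loads(line)
        yield {
            col: (conv(raw[col]) if col in raw else default)
            for col, (conv, default) in schema.items()
        }


rows = list(read_with_schema(RAW_LINES, SCHEMA))
```

Because the schema lives with the reader rather than the stored files, a different consumer could project the same raw lines onto a different set of columns without rewriting the data.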
Data Ingestion for Real-Time Analytics
Architecture in Motion: Adaptable Architectures
Data In Motion: Enabling Real Time
Scale Matters: Reduce Impact, Increase Efficiency
Breadth Matters: Sources, targets, and in between
Depth Matters: When the going gets tough…
Traceability: Data Lineage
Data Ingestion – Enhancement
✓ Data Ingestion with CDC
✓ Ingest Data directly to Hadoop
✓ Simplified Architecture (fewer hops, fewer points of failure)
➢ Data consistency with time-based partitions
➢ Operational visibility with granular change tracking
➢ Automated data integration on Apache Hive
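The time-based partitioning idea in the list above can be sketched as follows. This is an illustration under stated assumptions, not Attunity Replicate's actual output format: the change-record fields (`op`, `key`, `commit_ts`) and the `dt=.../hr=...` partition naming are hypothetical, chosen to resemble a Hive-style partition path.

```python
from collections import defaultdict
from datetime import datetime


def partition_key(commit_ts: str) -> str:
    """Map a commit timestamp to an hourly, Hive-style partition path."""
    ts = datetime.strptime(commit_ts, "%Y-%m-%dT%H:%M:%S")
    return ts.strftime("dt=%Y-%m-%d/hr=%H")


def partition_changes(changes):
    """Group change records by the hour in which they were committed."""
    parts = defaultdict(list)
    for change in changes:
        parts[partition_key(change["commit_ts"])].append(change)
    return dict(parts)


# Hypothetical change records captured from a source ERP table.
changes = [
    {"op": "INSERT", "key": 1, "commit_ts": "2017-06-01T09:15:00"},
    {"op": "UPDATE", "key": 1, "commit_ts": "2017-06-01T09:45:10"},
    {"op": "DELETE", "key": 2, "commit_ts": "2017-06-01T10:02:30"},
]

partitions = partition_changes(changes)
```

Once a time window has closed, its partition holds every change committed in that window, so a downstream job can process closed partitions and read a consistent slice of the source.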
[Diagram labels: ERP (source): ERP1, ERP2, ERP3; Attunity Replicate; Finance Data Lake; Lambda Architecture]
© 2017 Attunity
Attunity Corporate Overview
Data Integration & Big Data Management Software
Accelerate data delivery and availability
Automate data readiness for analytics
Optimize data management with intelligence
▪ Hadoop & Big Data
▪ Databases & Data Warehouses
▪ On premise & in the Cloud
Solutions | Global Offices | Overview
▪ 2000 customers in 65 countries
▪ 250 people and growing
▪ NASDAQ traded (ATTU)
Seamless integration with Hortonworks Connected Data platforms and solutions
Hortonworks Solutions
• Enterprise Data Warehouse Optimization
• Cyber Security and Threat Management
• Internet of Things and Streaming Analytics
Hortonworks Connection
• Subscription Support
• SmartSense
• Premier Support
• Educational Services
• Professional Services
• Community Connection
Cloud
• Hortonworks Data Cloud
• AWS
• HDInsight
Data Center
• Hortonworks Data Suite
• HDF
• HDP
Real-time Data Ingest with Attunity Replicate
SOURCES
• OLTP, ERP, CRM Systems
• Documents, Emails
• Web Logs, Click Streams
• Social Networks
• Machine Generated
• Sensor Data
• Geolocation Data
Attunity Replicate for HDP & HDF
Accelerate time-to-insights by delivering solutions faster, with fresh data, from many sources
- Automated data ingest
- Incremental data ingest (CDC)
- Support for multiple sources
Attunity Replicate architecture
[Diagram labels]
Sources (cloud or on-prem): Hadoop, Files, RDBMS, Data Warehouse, Mainframe
Attunity Replicate: Transfer, Transform, Filter; Batch, CDC Incremental; In-Memory, File Channel; Persistent Store
Targets (cloud or on-prem): Hadoop, Files, RDBMS, Data Warehouse, Kafka
Attunity Replicate sources and targets
Sources
• RDBMS: Oracle, SQL Server, DB2 iSeries, DB2 z/OS, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
• DW: Exadata, Teradata, Netezza, Vertica
• Hadoop: Hortonworks, Cloudera, MapR
• Mainframe: DB2 for z/OS, IMS/DB, VSAM, SQL M/P, Enscribe, RMS, HP NonStop
• Cloud: AWS RDS, Salesforce
• SAP: ECC on Oracle, ECC on SQL, ECC on DB2*
Targets
• RDBMS: Oracle, SQL Server, DB2 LUW, MySQL, PostgreSQL, Sybase ASE, Informix
• DW: Microsoft PDW, Exadata, Teradata, Netezza, Vertica, Sybase IQ, Amazon Redshift, Actian Vector, SAP HANA
• Hadoop: Hortonworks, Cloudera, MapR, Pivotal, Amazon EMR
• NoSQL: MongoDB
• Cloud: Amazon RDS, Amazon Redshift, Amazon EMR, Google Cloud SQL, Google Cloud Dataproc, Azure SQL Data Warehouse, Azure SQL Database
• Messaging: Azure Event Hubs*, Kafka
• SAP: HANA
*Supported under early access program
In Memory and File Optimized Data Transport
CDC for data-at-rest and data-in-motion
Attunity Replicate – Change Processing
[Diagram: change records (R1, R2) flow from the data sources through CDC to many databases and data warehouses, in three modes]
• Batch CDC: data warehouse ingest-merge
• Transactional CDC: SQL statements applied in order (1, 2, ... n)
• Message-encoded CDC
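The difference between transactional CDC and batch ingest-merge can be sketched in a few lines. This is a simplified illustration, not Attunity Replicate's implementation: the target is modelled as an in-memory dictionary, and the change-record fields (`op`, `key`, `row`) are hypothetical.

```python
def apply_transactional(target, changes):
    """Apply every change one at a time, in commit order."""
    for change in changes:
        if change["op"] == "DELETE":
            target.pop(change["key"], None)
        else:  # INSERT or UPDATE both write the latest row image
            target[change["key"]] = change["row"]
    return target


def net_changes(changes):
    """Collapse a batch so that only the last change per key survives."""
    latest = {}
    for change in changes:
        latest[change["key"]] = change
    return list(latest.values())


def apply_batch_merge(target, changes):
    """Merge the net effect of a whole batch into the target in one pass."""
    return apply_transactional(target, net_changes(changes))


# Hypothetical batch touching two records, R1 and R2.
batch = [
    {"op": "INSERT", "key": "R1", "row": {"qty": 1}},
    {"op": "UPDATE", "key": "R1", "row": {"qty": 5}},
    {"op": "INSERT", "key": "R2", "row": {"qty": 7}},
    {"op": "DELETE", "key": "R2", "row": None},
]

transactional_result = apply_transactional({}, batch)
merged_result = apply_batch_merge({}, batch)
```

Both paths reach the same final state. The transactional path replays four operations, while the merge path applies two, which is why a batch merge tends to suit analytical warehouses where intermediate states are not needed.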
Data Streaming into Kafka → HDF → HDP
[Diagram labels: Transaction logs; CDC; Bulk Load; Data Streaming (MSG 1, MSG 2, ... MSG n); Message broker]
In-memory optimized metadata management and data transport
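Message-encoded CDC can be sketched as turning each change record into a keyed message before it is handed to a broker. The sketch below uses no Kafka client. The message layout and the `encode_change` and `choose_partition` helpers are hypothetical, and CRC32 stands in for a partitioner; it is not the hash Kafka itself uses.

```python
import json
import zlib


def encode_change(table, op, key, row, seq):
    """Encode one change record as a (message key, message value) pair of bytes."""
    msg_key = f"{table}:{key}".encode("utf-8")
    value = json.dumps(
        {"table": table, "op": op, "key": key, "row": row, "seq": seq},
        sort_keys=True,
    ).encode("utf-8")
    return msg_key, value


def choose_partition(msg_key, num_partitions):
    """Pick a partition from the message key so one row always maps to one partition."""
    return zlib.crc32(msg_key) % num_partitions


# Two successive changes to the same hypothetical source row.
messages = [
    encode_change("invoices", "INSERT", 1001, {"amount": 250.0}, 1),
    encode_change("invoices", "UPDATE", 1001, {"amount": 275.0}, 2),
]

partitions = [choose_partition(key, 4) for key, _ in messages]
```

Keying messages by table and primary key keeps all changes to one row in the same partition, so a consumer sees them in the order they were produced.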
Attunity Replicate for SAP
Universal, Real-Time and Simplified Data Integration
• Replicate your SAP application data in bulk or real-time for data analytics
▪ Documents, transactions and business data
▪ All core and industry-specific SAP modules
• Integrate real-time with all major targets
▪ DBs, data warehouses, Hadoop – cloud or on premises
▪ Decode SAP data from complex source structures
▪ Enable business usage on common data model
• Move external data into SAP HANA
[Diagram labels: Core and Industry-Specific SAP Modules; Attunity Replicate (Bulk Load, CDC); RDBMS | EDW | Hadoop, On Premises or Cloud; Hadoop Data Lake]
Attunity Replicate Server
[Diagram labels]
Attunity Replicate: Transform, Filter; Batch, CDC Incremental; In-Memory, File Channel; Persistent Store
SAP ECC (Enterprise Central Component) on RDBMS (Oracle, DB2, etc.): Redo/Archive logs or Journal File; Transparent Tables
Attunity Replicate Agent for SAP: Data Model Mapping; Pool/Cluster table RFC; Extract relationships for Pool and Cluster Tables
Targets (On Premises or Cloud): Hadoop, RDBMS, Data Warehouse, Kafka
Use Cases
• Working Capital Analytics
• Spend Analytics
• Labor Reporting
• Audit & Compliance
• Capital Reporting & Analytics
• Active Archival of legacy data
Data Governance Considerations for Migration
MDM Integration: bidirectional tagging and linking tools that highlight the relationships in data
Data Quality: incoming data must be checked for contradictions, inconsistencies, and redundancies
Security Policy: process authentication, authorization, encryption, and monitoring
Data Masking: access to sensitive data is subject to regulatory requirements and additional auditing
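The data-masking consideration above can be sketched as replacing sensitive fields with a salted hash and recording each masked access for audit. This is an illustration only, not a vetted security control: the field names, the `SENSITIVE_FIELDS` set, and the audit-log shape are all hypothetical.

```python
import hashlib

# Hypothetical set of columns treated as sensitive.
SENSITIVE_FIELDS = {"ssn", "account_number"}


def mask_value(value, salt):
    """Replace a value with a short, deterministic salted SHA-256 digest."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:12]


def mask_record(record, salt, audit_log, user):
    """Return a copy of the record with sensitive fields masked, logging each one."""
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            masked[field] = mask_value(value, salt)
            audit_log.append((user, field))  # who touched which sensitive field
        else:
            masked[field] = value
    return masked


audit_log = []
record = {"name": "A. Customer", "ssn": "123-45-6789", "balance": 42.0}
masked = mask_record(record, salt="demo-salt", audit_log=audit_log, user="analyst1")
```

Because the digest is deterministic for a given salt, masked values can still be used as join keys across data sets without exposing the original value.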
What’s Next?
Delivering real-time insights and analytics, opening up new use cases:
• TCO Analysis
• Reducing Close Cycles
• Revenue Analysis
• EDW Offload

Verizon Centralizes Data into a Data Lake in Real Time for Analytics
