Driving Analytics Success With DataOps Enriched Data Engineering Practices
Robert Thanaraj

© 2022 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form
without Gartner's prior written permission. It consists of the opinions of Gartner's research organization, which should not be construed as statements of fact. While the information contained in this
publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research
may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are
governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or
influence from any third party. For further information, see "Guiding Principles on Independence and Objectivity."
Current State of Data and Analytics Delivery: Agile Development + Fragile Operations

Figure: a dev team and an ops team along the path from inception, through product increments, to customer value.
Key Issues

1. What is data engineering? How can DataOps improve data engineering? Why is the data engineer a critical role?
2. What are the must-have best practices?
3. How can we formalize and scale the data engineering practice?

Data engineering is the discipline of translating data into “usable forms”

It involves building and operating:
• Data and analytics applications (products)
• Data pipelines
• Data platforms
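To make “usable forms” concrete, here is a minimal sketch of a data pipeline: it ingests raw string records, rejects rows that fail validation and publishes a per-store aggregate that a consumer can use directly. The record layout, the `ingest`/`publish` functions and the sample data are hypothetical illustrations, not taken from the source.

```python
from dataclasses import dataclass

# Hypothetical raw records as landed from a source system: every field is a string.
RAW_ROWS = [
    {"store": " North ", "amount": "12.50"},
    {"store": "north", "amount": "7.50"},
    {"store": "South", "amount": "n/a"},  # unparseable amount: will be rejected
    {"store": "South", "amount": "30"},
]


@dataclass(frozen=True)
class Sale:
    store: str
    amount: float


def ingest(rows):
    """Parse raw rows into typed records; set aside rows that fail validation."""
    good, rejected = [], []
    for row in rows:
        try:
            good.append(Sale(row["store"].strip().lower(), float(row["amount"])))
        except (KeyError, ValueError):
            rejected.append(row)
    return good, rejected


def publish(sales):
    """Shape the data for consumers: total sales per store, sorted by store."""
    totals = {}
    for sale in sales:
        totals[sale.store] = totals.get(sale.store, 0.0) + sale.amount
    return dict(sorted(totals.items()))


sales, rejected = ingest(RAW_ROWS)
print(publish(sales))  # {'north': 20.0, 'south': 30.0}
print(len(rejected))   # 1
```

Keeping rejected rows, rather than silently dropping them, is what lets the pipeline be operated: someone can inspect why data did not make it into the usable form.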

Data Engineering Operational Complexity

Root causes range from design issues to post-deployment issues:

What happened? → What's the root cause?
1. Doesn't run → wrong code, wrong data or wrong schema; wrong environment; config drift
2. Doesn't run “good enough” → data drift
3. Doesn't run “fast enough” → data pipeline issue; need more compute?
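The first two symptoms can often be triaged automatically. The sketch below, assuming a hypothetical column contract `EXPECTED` and an arbitrary 20% threshold, checks a row for schema problems (a common reason a pipeline doesn't run) and measures the relative shift in a column's mean between a baseline batch and a current batch (a crude data drift signal for a pipeline that doesn't run “good enough”). Production tools use richer statistical tests; this only illustrates the idea.

```python
from statistics import mean


def schema_errors(expected, row):
    """Triage for "doesn't run": report missing, unexpected or mistyped columns."""
    errors = [f"missing column: {col}" for col in expected if col not in row]
    errors += [f"unexpected column: {col}" for col in row if col not in expected]
    errors += [
        f"wrong type for {col}: {type(row[col]).__name__}"
        for col, col_type in expected.items()
        if col in row and not isinstance(row[col], col_type)
    ]
    return errors


def mean_shift(baseline, current):
    """Triage for "doesn't run good enough": relative change of a column's mean.

    Assumes the baseline mean is non-zero.
    """
    base = mean(baseline)
    return abs(mean(current) - base) / abs(base)


# Hypothetical column contract for an orders feed.
EXPECTED = {"order_id": int, "amount": float}

print(schema_errors(EXPECTED, {"order_id": "42", "amount": 9.99, "coupon": "X"}))
# ['unexpected column: coupon', 'wrong type for order_id: str']

DRIFT_THRESHOLD = 0.2  # assumption: flag a shift of more than 20% in the mean
print(mean_shift([10.0, 12.0, 11.0], [15.0, 16.0, 17.0]) > DRIFT_THRESHOLD)  # True
```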

DataOps Adds “Software Mindset” to Data Management

DataOps is an agile and collaborative data management practice focused on improving the communication, integration, automation, observability and operations of data flows between data managers and data consumers.

Source: Data and Analytics Essentials: DataOps (G00767464)

DataOps Helps You Achieve “Good” Data Management Metrics

• Business alignment
• Code quality
• Service tickets
• Productivity
• Data value gap
• Time to use
• Data-as-a-product
• Self-service
• Release velocity
• Process management
• Collaboration
• Reuse
Data Engineering Is a Critical Skill With High Demand

Figure: IT skills plotted by demand pressure (low to high) across Legacy, Niche, Core and Critical quadrants; data engineering appears in the Core and Critical quadrants. Data engineering skills shown: SQL, Java, Python, DevOps, automation, CI/CD, Kubernetes and cloud environments. Other representative skills: artificial intelligence, test automation, data integration, data management, cloud architecture, data science, cloud security and data architecture.

Current job openings: 17.2% demand growth; 10/10 hiring difficulty.

Create data engineers by upskilling your ETL developers, data analysts, DBAs or similar roles. Train them on software engineering, DevOps tooling, product development and soft skills.

Representative skills displayed here; full list available at …
Source: Infographic: 2021 IT Skills Roadmap (G00756340), Gartner TalentNeuron

Data Engineer’s Key Activities

• Build data pipelines (about two-thirds of the time)
• Manage metadata
• Study usage patterns (emerging trend)
• Support data science
• Collaborate across business and IT
• Drive automation (critical activity)

Automate! From provisioning servers, automated data ingestion, storage and scheduling of pipelines, to self-healing pipelines and dynamic workload management … All recurring patterns can be templatized, but not all of them have value. Metadata analysis can guide you in picking the right use cases for automation.
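As a sketch of how metadata analysis can pick automation use cases, the function below ranks recurring pipeline patterns by the total manual effort recorded against them and skips one-off patterns. The build log format, the pattern names and the `min_occurrences` cutoff are hypothetical; a real team would mine this from its own pipeline metadata.

```python
from collections import defaultdict

# Hypothetical metadata log: one entry per pipeline build, tagged with the
# pattern it follows and the hours of manual engineering it took.
BUILD_LOG = [
    ("file_to_table_load", 3.0),
    ("file_to_table_load", 2.5),
    ("file_to_table_load", 3.5),
    ("scd2_dimension", 8.0),
    ("scd2_dimension", 6.0),
    ("one_off_api_pull", 12.0),
]


def automation_candidates(log, min_occurrences=2):
    """Rank recurring patterns by the total manual effort they consume.

    Patterns seen fewer than `min_occurrences` times are skipped: they could be
    templatized, but a template is unlikely to pay off.
    """
    hours = defaultdict(float)
    counts = defaultdict(int)
    for pattern, effort in log:
        hours[pattern] += effort
        counts[pattern] += 1
    recurring = [p for p in hours if counts[p] >= min_occurrences]
    return sorted(recurring, key=lambda p: hours[p], reverse=True)


print(automation_candidates(BUILD_LOG))  # ['scd2_dimension', 'file_to_table_load']
```

Note that the single most expensive build (`one_off_api_pull`) is not a candidate: effort alone does not justify a template unless the pattern recurs.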
Data Engineer’s Top 3 Challenges

1. Data engineers are thought of as unicorns. Align roles adequately; data engineering is a team sport. Help from software engineers and I&O specialists is critical.

2. They feel burnt out most of the time, due to firefighting, manual tasks and poor social structures. Embrace modularity with a product delivery mindset. Augment with junior/citizen roles.

3. They are not seen as value creators. Celebrate automation; automation creates velocity. Speed = value.

A Representative DataOps Team

Data Manager persona:
• Data Engineers
• Data Architects
• DBAs
• Data Stewards

Data Consumer persona:
• Business Analysts
• Data Scientists
• Domain Experts
• BI Developers

Software Engineer persona:
• Python Developers
• DevOps Experts
• Automation Engineers
• Test Engineers

Data Product Manager; Business Stakeholders

A cross-functional focus improves collaboration; a product development focus increases agility.

Key Issue 2: What are the must-have best practices?

No. 1: Product Delivery Mindset: Replace Monolithic Practices With Modularity

Figure (marked as an emerging trend): templatize D&A products (1, 2, … n) into a catalog built on data and analytics platforms; each product comprises a data store, data pipelines, an analytic model and a user interface. Provision them with DevOps capabilities:
• Infrastructure as code
• Access control
• Version control
• Continuous integration/deployment
• Regression test packs

Microapps are self-contained and loosely coupled, and they enable independent build/deployment cycles. Drive targeted consumer experiences by templatizing D&A product designs and provisioning D&A product instances at scale with agility.
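A minimal sketch of templatize-then-provision, assuming a hypothetical `ProductTemplate` whose components mirror the slide (data store, data pipeline, analytic model, user interface) and whose DevOps capabilities are carried into every instance. Real provisioning would invoke infrastructure-as-code tooling; here `provision` only derives namespaced resource names, which is what keeps instances loosely coupled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProductTemplate:
    """A reusable D&A product design (hypothetical structure)."""
    name: str
    components: tuple = ("data_store", "data_pipeline", "analytic_model", "user_interface")
    devops: tuple = (
        "infrastructure_as_code",
        "access_control",
        "version_control",
        "ci_cd",
        "regression_tests",
    )


@dataclass
class ProductInstance:
    template: str
    consumer: str
    resources: dict = field(default_factory=dict)
    devops: tuple = ()


def provision(template, consumer):
    """Stamp out one self-contained product instance from a template.

    Resource names are namespaced per consumer, so instances can be built
    and deployed independently of each other.
    """
    resources = {c: f"{consumer}-{template.name}-{c}" for c in template.components}
    return ProductInstance(template.name, consumer, resources, template.devops)


churn = ProductTemplate("churn_dashboard")
instances = [provision(churn, team) for team in ("marketing", "finance")]
print(instances[0].resources["data_pipeline"])  # marketing-churn_dashboard-data_pipeline
print(instances[1].devops == churn.devops)      # True
```

The design choice is that the template is frozen (designed once, centrally) while instances are cheap to create per consumer.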

No. 2: Automation Mindset: Pick the Right Use Cases for Automation by Metadata Analysis

Figure: raw data → ingest → explore → model → curate → catalog → optimized data, a flow annotated as iterative and resource-intensive. Raw data is at rest (encrypted), moves as data in motion (detokenized/deidentified appropriately) and lands as optimized data at rest/in use (encrypted).

Metadata drives pipeline patterns; this is a current trend, e.g., in data warehouse automation tools. Contextualize data better by studying consumption patterns by users and systems. Data observability is an emerging trend in this context. Active metadata forms the basis for the data fabric design.
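Metadata-driven pipeline generation, in the spirit of data warehouse automation tools, can be sketched as rendering one load template per catalog entry. The metadata fields, `LOAD_TEMPLATE` and the `:last_loaded_at` bind parameter are assumptions for illustration. A real tool would typically generate MERGE/upsert logic; the plain INSERT here would duplicate rows that are updated at the source.

```python
# Hypothetical catalog metadata describing source-to-target loads.
TABLES = [
    {"source": "raw.orders", "target": "curated.orders",
     "columns": ["order_id", "customer_id", "amount"], "watermark": "updated_at"},
    {"source": "raw.customers", "target": "curated.customers",
     "columns": ["customer_id", "segment"], "watermark": "updated_at"},
]

# One pattern, written once. ":last_loaded_at" is a bind parameter the
# scheduler would supply at run time (an assumption of this sketch).
LOAD_TEMPLATE = (
    "INSERT INTO {target} ({cols})\n"
    "SELECT {cols} FROM {source}\n"
    "WHERE {watermark} > :last_loaded_at;"
)


def generate_loads(tables):
    """Render one incremental load statement per metadata entry."""
    return [
        LOAD_TEMPLATE.format(
            target=t["target"],
            source=t["source"],
            cols=", ".join(t["columns"]),
            watermark=t["watermark"],
        )
        for t in tables
    ]


for sql in generate_loads(TABLES):
    print(sql)
```

Adding a table then means adding a metadata entry, not writing a new pipeline.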

Automate Testing and Release Processes: Continuous Integration Pipelines

Developer commits → CI pipeline: run build scripts* → run unit tests → deploy to test env. → run automated tests → advanced tests, release process

If any step fails (build failed, unit test failed, deployment failed, regression test failed): send notification, end process.

CI = Continuous Integration
Scripts* = DDL/DML, pipeline, metadata, operations config, etc.
Source: Data and Analytics Essentials: DataOps (G00767464)
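The control flow of this CI pipeline can be sketched as an ordered list of stages that stops at the first failure and sends a notification. The stage names follow the slide; the stage functions are stubs standing in for real build scripts, test runners and deployments, and `run_ci` is a hypothetical helper, not any CI product's API.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_ci(stages: List[Stage], notify: Callable[[str], None]) -> bool:
    """Run stages in order; on the first failure, notify and end the process.

    Returns True only if every stage passed, i.e., the change may proceed
    to advanced tests and the release process.
    """
    for name, step in stages:
        if not step():
            notify(f"{name} failed")
            return False
    return True


# Stub stages; real ones would run build scripts (DDL/DML, pipeline,
# metadata, operations config), test suites and deployments.
messages = []
stages = [
    ("Build", lambda: True),
    ("Unit test", lambda: True),
    ("Deployment", lambda: True),        # deploy to the test environment
    ("Regression test", lambda: False),  # simulated failing automated test
]
print(run_ci(stages, messages.append), messages)  # False ['Regression test failed']
```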

No. 3: Ops Enablement Mindset: Agile Practices When Empowering Citizen Roles

Figure: data sources feed two modes. Start here: Mode 2, data preparation pipelines in an exploratory workspace (sandboxes), which carries ops overhead. Gatekeep here (emerging trend). End here: Mode 1, operational pipelines with reusable models and transformations (production).

The gatekeeping process monitors and controls the promotion of successful data processes into production. The benefits for business teams:
a. It removes operations overhead.
b. It realizes business value and outcomes.

Ops = operations
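A gatekeeping step can be sketched as a set of promotion checks that a sandbox process must pass before it moves to production. The `SandboxProcess` fields and the three criteria (passing tests, an accountable owner and a minimum number of successful runs) are hypothetical examples; each organization would define its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SandboxProcess:
    """A citizen-built data process in the exploratory workspace (hypothetical fields)."""
    name: str
    owner: Optional[str]
    successful_runs: int
    tests_passed: bool


def gatekeep(proc: SandboxProcess, min_runs: int = 5) -> list:
    """Return the reasons blocking promotion; an empty list means promote."""
    reasons = []
    if not proc.tests_passed:
        reasons.append("tests failing")
    if proc.owner is None:
        reasons.append("no accountable owner")
    if proc.successful_runs < min_runs:
        reasons.append(f"only {proc.successful_runs} successful runs (< {min_runs})")
    return reasons


candidates = [
    SandboxProcess("campaign_prep", "marketing_analyst", 12, True),
    SandboxProcess("adhoc_forecast", None, 2, True),
]
promoted = [p.name for p in candidates if not gatekeep(p)]
print(promoted)                 # ['campaign_prep']
print(gatekeep(candidates[1]))  # ['no accountable owner', 'only 2 successful runs (< 5)']
```

Returning reasons instead of a bare yes/no gives the citizen developer actionable feedback, which keeps the sandbox agile while production stays controlled.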

Test Upfront for Feasibility and Business Value

Feasibility test: Can we develop a testable solution?
Value test: Does the D&A solution have the anticipated effect on business value?

Ideas repository:
1. Targeted Promotions
2. Automated Shopping List
3. Location-Based Marketing
4. Cross-Sell Recommendation
5. …

Flow: (1) reliable datasets (store sales, shopping history) → (2) D&A models for solution development → (3) market evaluation of pilots (A/B testing) → move to production; otherwise discard, reformulate or try another idea.

Source: 3 Case Studies of Data and Analytics Driven Business Innovation (G00751851)
* Pseudonym
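For the value test, a pilot evaluated with A/B testing can be judged with a one-sided two-proportion z-test, sketched below with only the standard library. The conversion counts, the 5% significance level and the decision labels are illustrative assumptions; the source does not prescribe a particular statistical test.

```python
from math import erf, sqrt


def ab_value_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """One-sided two-proportion z-test: does pilot B beat control A?

    Returns (lift, p_value, decision). Assumes the pooled rate is strictly
    between 0 and 1, so the standard error is non-zero.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 0.5 * (1 - erf(z / sqrt(2)))  # P(Z >= z) under the null hypothesis
    decision = "move to production" if p_value < alpha else "discard/reformulate"
    return p_b - p_a, p_value, decision


# Hypothetical pilot: control converts 200/4000, pilot converts 260/4000.
lift, p, decision = ab_value_test(200, 4000, 260, 4000)
print(round(lift, 3), decision)  # 0.015 move to production
```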

Key Issue 3: How can we formalize and scale the data engineering practice?

Data Engineering Stretches Beyond the Core Data Management Practices

Data engineering spans data management, software engineering, and infrastructure and operations.

Source: How to Build a Data Engineering Practice That Delivers Great Consumer Experiences (G00741778)

NIKE Acquires Datalogue to Add “Software Mindset” to Data Management

News, February 2021: NIKE acquired Datalogue, a New York-based startup, to enable its digital transformation. Datalogue had proprietary machine learning technology that automates data preparation and integration.

John Donahoe, President and CEO, NIKE, said:

“Our CDA strategy focuses on accelerating how we connect with consumers to better serve them personally at scale. The acquisition of Datalogue builds on our digital momentum by enhancing our ability to transform raw data into actionable insights in real time and across the enterprise.”

Source: Adapted From NIKE

Hub-Spoke Operating Model: Formalize and Scale

Figure: a central D&A team (hub) connected to the Marketing, Finance, Supply Chain and Data Science Lab teams (spokes).

The central team establishes your franchise (processes, capabilities, best practices) so
that your brand remains consistent. The satellite teams within the departments adapt to
local environments (data, people).

Data Engineering Activities Vary Across Central and Departmental Teams

Figure: shares of data management, software engineering, and infrastructure and operations activities for the central D&A team and the Marketing, Finance, Supply Chain and Data Science Lab teams. In the departmental teams, data management accounts for most of the effort (70% to 90%); the central D&A team spreads its effort across all three areas (30% to 40% each).

Software engineering and I&O tasks will be higher in the central team, while the core data tasks will be higher in the departmental teams, which have minimal platform operations.

Draft and Improvise Responsibilities Distribution

Central D&A Team
• Technical group focused on platform and technology standards and best practices.
• Platform owners, data engineers, DBAs, integrators, architects.
• Data scope includes “all” data, i.e., known/unknown, internal/external data.
• Produce reusable components and templates to support departmental team activities.

Departmental D&A Teams
• Functional groups focused on operationalizing data toward consumer outcomes within business domains.
• Product managers, business users, citizen data engineers, technical analysts.
• Data scope is limited to the product line.
• Provision data products/services using the reusable components (or templates), just like an assembly line.

Recommendations

Create data engineers by upskilling your ETL developers, data analysts, DBAs or similar roles. Train them on software engineering, DevOps tooling, product development and soft skills.

Embrace automation wherever possible to accelerate the design, development, monitoring and management of data products that meet your business demands.

Create mixed-role teams, training members if you must. Position them within LOB teams for maximum impact. Invest in product managers as necessary. Start with a center of enablement, and grow into a center of excellence as you mature.

Recommended Gartner Research

How to Build a Data Engineering Practice That Delivers Great Consumer Experiences
Robert Thanaraj and Ehtisham Zaidi (G00741778)

Data Engineering Essentials, Patterns and Best Practices
Sumit Pal (G00741282)

Data and Analytics Essentials: DataOps
Robert Thanaraj (G00767464)

Operational AI Requires Data Engineering, DataOps and Data-AI Role Alignment
Robert Thanaraj and Erick Brethenoux (G00737307)

Quick Answer: How Can Executive Leaders Put Their Metadata to Work?
Robert Thanaraj and Guido De Simoni (G00758029)

Access to Gartner research is subject to entitlement. For information, please contact your Gartner representative.