Data Intelligence
Using Machine Learning & Spark to Power Data-Driven Marketing
February 13, 2018
Presented by: Joe Caserta, Max Goldbas
Co-Presented by:
Big Data Warehousing Meetup
• Knowledge Sharing: All things Data & Innovation
• 4,800+ Members
• Founded and hosted by Caserta
About Caserta
Data Intelligence and Strategic Consulting
Data Lakes, Data Warehouses, Data Laboratories
Award-winning company for Data Innovation
Data Science, Machine Learning, Artificial Intelligence
Internationally recognized workforce
Best Practices, Authors, Educators, Mentors
Strategy, Governance, Architecture, Implementation
The Customer Journey
Touchpoints shown in the journey diagram, spanning physical and digital channels: PR, Radio, TV, Print, Outdoor, Word of Mouth, Direct Mail, Customer Service, Search, Paid Content, Email, Website/Landing Pages, Social Media, Community, Chat, Call Center, Offers, Mailings, Survey, Loyalty Programs, Agents, Partners, Ads, Website, Mobile, 3rd Party Sites, Web Self-Service
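Before any attribution model can run, each customer's touchpoints have to be assembled into an ordered path-to-purchase. A minimal sketch in Python, assuming a hypothetical event feed of `(customer_id, timestamp, touchpoint)` tuples and a hypothetical `PURCHASES` lookup; neither schema comes from the deck:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical touchpoint events: (customer_id, ISO timestamp, touchpoint).
EVENTS = [
    ("c1", "2018-01-03T10:00", "Mailing"),
    ("c1", "2018-01-06T18:45", "Ad-Click"),
    ("c2", "2018-01-04T14:00", "Ad-Click"),
    ("c1", "2018-01-05T09:30", "E-mail"),
    ("c1", "2018-01-08T08:00", "E-mail"),  # after c1's purchase, so ignored
]
# Hypothetical purchase times per customer (customers absent here did not buy).
PURCHASES = {"c1": "2018-01-07T12:00"}


def build_journeys(events, purchases):
    """Return {customer_id: (ordered touchpoints, converted)}.

    Touches are sorted by time; for buyers, only touches at or before the
    purchase count toward the path-to-purchase.
    """
    touches = defaultdict(list)
    for customer, ts, touchpoint in events:
        touches[customer].append((datetime.fromisoformat(ts), touchpoint))

    journeys = {}
    for customer, items in touches.items():
        items.sort()
        bought_at = purchases.get(customer)
        if bought_at is None:
            path, converted = [t for _, t in items], False
        else:
            cutoff = datetime.fromisoformat(bought_at)
            path, converted = [t for ts, t in items if ts <= cutoff], True
        if path:  # skip customers with no touches before the outcome
            journeys[customer] = (path, converted)
    return journeys


print(build_journeys(EVENTS, PURCHASES))
# {'c1': (['Mailing', 'E-mail', 'Ad-Click'], True), 'c2': (['Ad-Click'], False)}
```

Cutting each buyer's path at the purchase time keeps post-purchase touches (such as c1's later e-mail) from receiving credit for a sale they could not have influenced.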
Learning the Path-to-Purchase
Attribution types, with an example credit split across the touchpoints Mailing, E-mail, and Ad-Click:
Single Touch: assign the credit to the first or last exposure
Example: Ad-Click 100%
- Last touch only
- Ignores the bulk of the customer journey
- Undervalues other interactions and influencers
Rules-Based: assign the credit to each interaction based on business rules
Example: Mailing 33%, E-mail 33%, Ad-Click 33%
- Subjective
- Assigns arbitrary values to each interaction
- Lacks the analytic rigor to determine weights
Statistically Driven: assign the credit to interactions based on a data-driven model
Example: Mailing 27%, E-mail 49%, Ad-Click 24%
- Looks at full behavior patterns
- Considers all touchpoints
- Can apply different models for best results
- Uses data to find correlations between touchpoints (winning combinations)
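The three attribution types can be compared on the same journeys. A minimal Python sketch, assuming hypothetical journey data: last touch stands in for single touch, an even split per interaction stands in for one simple business rule, and a first-order Markov-chain "removal effect" model stands in for the statistically driven type. The Markov model is one common data-driven choice, not necessarily the model behind the percentages above.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"

# Hypothetical journeys: (ordered touchpoints, converted?)
JOURNEYS = [
    (["Mailing", "E-mail", "Ad-Click"], True),
    (["E-mail", "Ad-Click"], True),
    (["Mailing", "Ad-Click"], False),
    (["E-mail"], True),
    (["Mailing"], False),
    (["Ad-Click"], False),
]


def normalize(credit):
    total = sum(credit.values())
    return {ch: c / total for ch, c in credit.items()}


def last_touch(journeys):
    """Single touch: each conversion credits only its final touchpoint."""
    credit = defaultdict(float)
    for path, converted in journeys:
        if converted:
            credit[path[-1]] += 1.0
    return normalize(credit)


def equal_weights(journeys):
    """Rules-based (one simple rule): split each conversion evenly across its touches."""
    credit = defaultdict(float)
    for path, converted in journeys:
        if converted:
            for ch in path:
                credit[ch] += 1.0 / len(path)
    return normalize(credit)


def transition_probs(journeys):
    """Estimate first-order transition probabilities between states."""
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = [START, *path, CONV if converted else NULL]
        for a, b in zip(states, states[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def conversion_prob(probs, removed=None, iterations=200):
    """P(reaching CONV from START); a removed channel behaves like NULL.
    A fixed iteration count is a simplification for small graphs."""
    p = defaultdict(float, {CONV: 1.0})
    for _ in range(iterations):
        for state, nxt in probs.items():
            if state != removed:  # p[removed] is never updated, so it stays 0.0
                p[state] = sum(pr * p[b] for b, pr in nxt.items())
    return p[START]


def markov_removal(journeys):
    """Data-driven: credit proportional to the drop in conversion probability
    when a channel is removed from the transition graph."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = {ch for path, _ in journeys for ch in path}
    effects = {ch: 1 - conversion_prob(probs, removed=ch) / base for ch in channels}
    return normalize(effects)


for model in (last_touch, equal_weights, markov_removal):
    print(model.__name__, {ch: round(s, 2) for ch, s in sorted(model(JOURNEYS).items())})
```

On this toy data, last touch gives Mailing no credit at all, the even split gives it about 11%, and the removal-effect model about 23%, because removing Mailing cuts off paths that later convert through E-mail and Ad-Click.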
Data Science in Practice
Source: https://www.collaberatact.com/data-science-stay/
Data Science for the Enterprise
CRISP-DM: Cross Industry Standard Process for Data Mining
1. Business Understanding
• Solve a single business problem
2. Data Understanding
• Discovery
• Data Munging
• Cleansing Requirements
3. Data Preparation
• ETL
4. Modeling
• Evaluate various models
• Iterative experimentation
5. Evaluation
• Does the model achieve business objectives?
6. Deployment
• PMML; application integration; data platform; Excel
(Diagram: the CRISP-DM cycle, Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment, arranged around the Data)
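Steps 4 and 5 (evaluate various models, then ask whether the winner achieves the business objective) can be sketched as a small selection loop. A minimal Python sketch, assuming hypothetical training and holdout data of `(last touchpoint, converted?)` pairs, log loss as the metric, and a hypothetical business rule that a model must beat a naive baseline by at least 5% before it moves to deployment:

```python
import math
from collections import Counter

# Hypothetical data: (last touchpoint, converted?)
TRAIN = ([("E-mail", True)] * 3 + [("E-mail", False)]
         + [("Ad-Click", True)] + [("Ad-Click", False)] * 3
         + [("Mailing", False)] * 2)
HOLDOUT = [("E-mail", True), ("E-mail", True), ("Ad-Click", False), ("Mailing", False)]


def fit_baseline(train):
    """Predict the overall conversion rate for everyone."""
    rate = sum(y for _, y in train) / len(train)
    return lambda touchpoint: rate


def fit_by_touchpoint(train):
    """Predict a Laplace-smoothed conversion rate per touchpoint."""
    seen = Counter(t for t, _ in train)
    converted = Counter(t for t, y in train if y)
    overall = sum(converted.values()) / len(train)

    def predict(touchpoint):
        if touchpoint not in seen:
            return overall  # unseen touchpoint: fall back to the overall rate
        return (converted[touchpoint] + 1) / (seen[touchpoint] + 2)

    return predict


def log_loss(model, data, eps=1e-9):
    total = 0.0
    for touchpoint, y in data:
        p = min(max(model(touchpoint), eps), 1 - eps)  # clamp to avoid log(0)
        total -= math.log(p if y else 1 - p)
    return total / len(data)


def evaluate(train, holdout, candidates, min_improvement=0.05):
    """Fit every candidate, pick the lowest holdout loss, and approve it only
    if it beats the 'baseline' candidate by min_improvement (relative)."""
    losses = {name: log_loss(fit(train), holdout) for name, fit in candidates.items()}
    best = min(losses, key=losses.get)
    approved = 1 - losses[best] / losses["baseline"] >= min_improvement
    return best, losses, approved


best, losses, approved = evaluate(
    TRAIN, HOLDOUT, {"baseline": fit_baseline, "by_touchpoint": fit_by_touchpoint})
print(best, {k: round(v, 3) for k, v in losses.items()}, approved)
```

Keeping a naive baseline among the candidates gives the business-objective check a concrete reference point; if no model clearly beats it, the loop sends the team back to data understanding rather than forward to deployment.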
Reference architecture (diagram): Data Sources → Ingest → Storage → ETL → Presentation → Visualization
• Data sources
- Relational datasets: OPRA, Equifax, CDS, Moody's, BlackBox
- Flat file datasets: Barclay, Eureka, Hedge Fund Intelligence, Hedge Fund Research, Lipper, Morningstar, MF Holdings, BD/ADV
- Streaming data sets: CAT
• Ingest: S/FTP, Push, Kinesis
• Storage (S3): Landing, Data Lake (Tier 1), Data Lake (Tier 2), Data Science (Ephemeral)
• ETL: Spark (Streaming* / Batch), Lambda; Clean, Match, Derive, Aggregate; MLlib, CoreNLP
• Presentation: Redshift (structured data), Metadata Repository, Data Marketplace; Prepare, Deliver
• Data Science: Python, SQL, Scala, Predictive Analytics, Text Analytics, Business Intelligence
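The ETL stage lists Clean, Match, Derive, and Aggregate steps running on Spark. The sketch below shows what each step might do to marketing touch records, written in plain Python so it runs without a cluster; the record fields, the channel-name mapping, and the CRM email-to-customer lookup are all hypothetical. In a Spark DataFrame pipeline the same steps would roughly map to `filter`/`withColumn`, `join`, `withColumn`, and `groupBy(...).count()`.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw web/campaign records and CRM lookup.
RAW = [
    {"email": " Ann@Example.com ", "channel": "ad-click", "ts": "2018-01-06T18:45"},
    {"email": "ann@example.com", "channel": "Email", "ts": "2018-01-05T09:30"},
    {"email": "bob@example.com", "channel": "Mailing", "ts": "2018-01-03T10:00"},
    {"email": "", "channel": "Ad-Click", "ts": "2018-01-04T14:00"},
]
CRM = {"ann@example.com": "c1", "bob@example.com": "c2"}
CHANNELS = {"ad-click": "Ad-Click", "email": "E-mail", "e-mail": "E-mail", "mailing": "Mailing"}


def clean(records):
    """Drop records without an email; normalize emails and channel names."""
    return [
        {**r, "email": r["email"].strip().lower(),
         "channel": CHANNELS.get(r["channel"].strip().lower(), "Other")}
        for r in records if (r.get("email") or "").strip()
    ]


def match(records, crm):
    """Attach a customer_id by joining on the normalized email."""
    return [{**r, "customer_id": crm[r["email"]]} for r in records if r["email"] in crm]


def derive(records):
    """Add a derived event date from the timestamp."""
    return [{**r, "date": datetime.fromisoformat(r["ts"]).date().isoformat()} for r in records]


def aggregate(records):
    """Count touches per (customer, channel)."""
    return Counter((r["customer_id"], r["channel"]) for r in records)


prepared = derive(match(clean(RAW), CRM))
print(aggregate(prepared))
```

Each step returns new records instead of mutating its input, which mirrors how Spark transformations produce new DataFrames and keeps every intermediate stage inspectable.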
Data Analytics Ecosystem (diagram): Campaigns, Sales, Netezza, Relational DBs, Salesforce, RESTful APIs, Cloud DBs, Adobe, Weblogs, Web Data, DMP, Streaming Data, Redshift
Governing Data Innovation
Customer Journey Dashboard
Thank You
joe@caserta.com
@Joe_Caserta
Joe Caserta
President, Caserta Concepts

Editor's Notes

  • #8: Teaching a half-day class on this at the Data Summit in Boston on May 22nd