
Experimentation Best Practices

Why Experimentation Best Practices?

Experimentation allows us to learn and deliver the right experiences to our Customers, creating better value for them.
Although experimentation seems straightforward, the risk of drawing inaccurate conclusions is high if an organization
does not follow best practices.

For example, here are a couple of commonly made missteps while conducting experiments that can lead to inaccurate
conclusions and decisions:

1. Peeking (article here): If we do not lock down the testing time period ahead of time, we end up with the peeking
problem, where checking the results and taking action before the A/B test is over introduces side effects. The more
often you look at the intermediate results of an A/B test with the readiness to make a decision, the higher the
probability that the criterion will show a statistically significant difference when there is none (see the simulation sketch after this list).
a. Two peeking sessions roughly double the effective false-positive rate;
b. Five peeking sessions increase it by a factor of 3.2.
2. Simpson’s Paradox: This can occur when we change the test group allocations disproportionately mid-flight. The
latent segments in the test groups change their proportions when we change allocation percentages, introducing error
into the results. More formally, Simpson’s Paradox is a statistical phenomenon where an association between two
variables in a population emerges, disappears, or reverses when the population is divided into subpopulations.
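For intuition, here is a minimal Monte Carlo sketch (not from the original document; the conversion rate, traffic, and peek schedule are illustrative assumptions) showing how repeated peeks at an A/A test, where there is no true difference, inflate the chance of declaring a false positive:

```python
# Simulate an A/A test (no real effect) and measure how often at least one of
# several evenly spaced interim looks is "significant" at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def false_positive_rate(n_peeks, n_sims=1000, n_per_day=500, days=20, p=0.10, alpha=0.05):
    peek_days = np.linspace(days / n_peeks, days, n_peeks).astype(int)
    hits = 0
    for _ in range(n_sims):
        a = rng.binomial(1, p, size=days * n_per_day)  # control conversions
        b = rng.binomial(1, p, size=days * n_per_day)  # "test" conversions (same rate)
        for d in peek_days:
            n = d * n_per_day
            # Two-proportion z-test at this interim look
            pooled = (a[:n].sum() + b[:n].sum()) / (2 * n)
            se = np.sqrt(pooled * (1 - pooled) * 2 / n)
            z = (b[:n].mean() - a[:n].mean()) / se
            p_val = 2 * (1 - stats.norm.cdf(abs(z)))
            if p_val <= alpha:
                hits += 1
                break  # a decision is taken at the first "significant" peek
    return hits / n_sims

for peeks in (1, 2, 5):
    print(f"{peeks} peek(s): false-positive rate ~ {false_positive_rate(peeks):.1%}")
```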

To maximize the value of our experimentation practice and reduce the risk of inaccurate decisions, we recommend
following best practices across all experiments. This document outlines the best practices to adopt.

Experimentation Planning

Hypothesis: Define the upside

Before launching an experiment:

Define the business opportunity we are trying to improve


Explain how we are planning to enhance the business opportunity and why we want to take this particular approach.
If you have supporting data/analysis, document it.
Document the control UI and flow and test wireframe/concept

Goals: Without goals, test measurement is meaningless


Define the primary metric that we want the experiment to move. This is the metric we use to define the
rollout scenario.
Ideally, this should be just one metric (2 metrics max)
Ideally, this should tie to a business KPI (e.g. greater ARR, paid user signups, etc.)
Define the expected change magnitude and direction to calculate the sample size
Decide on a one- or two-tailed test and the duration of the test, based on the sample size and the direction(s) in
which we expect the KPI to change
Example:
Test concept: Reducing the number of steps in the new user sign up process
The primary metric: number of sign-ups; ideally this would also tie to a profit-driving KPI, such as # of paid
users
Define the secondary metric(s): These are metrics that help us validate and understand in detail why and how the
primary metric was impacted

Secondary metrics help us cross-validate if the movement in the primary metric is real.
Since we use a high alpha of 5% or 10%, it is possible that the impact on the primary metric occurred by chance,
so the secondary metrics can help validate whether the change in the primary metric is real or not. As we improve
the cadence of testing, this will become more important.
Example:
Test concept: Introduce a more prominent trial sign up button in the new user registration flow.
Primary goal: D90 conversion rate. Hypothesis: as more people sign up for the trial in the new user flow,
they better understand the total value of our product, moving our D90 conversion rate by 5%.
Possible noise: If the primary metric did move by 5% but the number of new users signing up for the trial
didn't increase, then the increased D90 conversion could be noise and needs further validation.
Define guardrail metrics: These help us ensure we are not harming the business in the long run for short-term gains.
These are metrics that track the long-term impact an experiment can have.
Example:
When experimenting to increase trials during the new user sign-up flow, we might want to set up D90 conversion or
trial-to-paid conversion as a guardrail metric.
Another example comes from a social media company that defines "notification disable rate" as a guardrail metric.
The guardrail states that for every 1% lift in sessions/DAU from increased push notifications, the increase in the
"notification disable rate" must stay within X% (see the check sketch after this section).
Another example: A SaaS company might set up the 12-month churn rate as a guardrail metric and revisit the
analysis after 12 months.
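To make the notification guardrail rule concrete, here is a minimal sketch; the function name, the inputs, and the X = 0.5% budget are hypothetical:

```python
# Check the guardrail: the disable-rate increase must stay within a budget that
# scales with the session lift (assumed here as 0.5% of disable-rate lift allowed
# per 1% of session lift).
def guardrail_ok(session_lift_pct: float,
                 disable_rate_lift_pct: float,
                 budget_per_1pct_sessions: float = 0.5) -> bool:
    if session_lift_pct <= 0:
        # No session upside: any increase in the disable rate violates the guardrail.
        return disable_rate_lift_pct <= 0
    return disable_rate_lift_pct <= budget_per_1pct_sessions * session_lift_pct

# Example: a 2% session lift with a 0.8% disable-rate increase passes the check.
print(guardrail_ok(session_lift_pct=2.0, disable_rate_lift_pct=0.8))  # True
```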

Pre-analysis: do the calculations upfront


Validate the problem, opportunity, and approach using data (e.g. descriptive analysis)
Example: There may not be significant value in testing an email header copy change for a specific email type if the
volume of that email is low; we will never achieve significance. In this case, consider a pre/post analysis or picking
a higher-traffic email type for testing.
Calculate the sample size needed using one tail/two tails, alpha (5%), and power (80%). Determine the experiment time
period (how long to run the experiment) upfront. A power-calculation sketch follows this list.
Predetermine if you need to use a higher alpha for rollout decisions.
Do a cost vs. benefit analysis of conducting the experiment. When the cost of experimentation in the
organization is high and we have limited tech resources, we must be able to understand the potential upside from
each test so that we prioritize the right ideas.
Calculate the potential yearly upside from this change in terms of a company KPI. When that's not possible, define
the upside to a KPI on a log scale (e.g. 0.01%, 0.1%, 1%, 10%, 100% impact), so we can make rough
comparisons between different ideas.
Push for step-function changes vs. incremental changes
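A minimal power-calculation sketch using statsmodels, assuming a binary conversion metric; the baseline rate, expected lift, and daily traffic below are illustrative assumptions:

```python
# Compute the per-group sample size for a proportion test at alpha = 5% and 80% power.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.10                 # current conversion rate (assumed)
expected = 0.105                # expected rate under the test, a 5% relative lift (assumed)
effect_size = proportion_effectsize(expected, baseline)  # Cohen's h

n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,                 # significance level
    power=0.80,                 # 1 - beta
    alternative="larger",       # one-tailed; use "two-sided" for a two-tailed test
)
print(f"Required sample size per group: {n_per_group:,.0f}")

# Divide by the eligible daily traffic per group to get the experiment duration.
daily_traffic_per_group = 5_000  # assumed
print(f"Days to run: {n_per_group / daily_traffic_per_group:.1f}")
```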

Experimentation setup plan: How to set up the test


Determine the # of variants needed to answer the business question we want the experiment to address (e.g. an
A/B/n or multivariate (MVT) setup)
Document the analysis plan including what hypothesis question can and can’t be answered.
Define instrumentation needed (the additional tracking data you need on top of what's already available).
Specify tracking needs as critical or nice to have so developers can discuss and implement the new tracking based
on effort and performance impact.
Define the population you are testing on, including segmentation or exclusion criteria
Document when you want to run the experiment in the experiment calendar and check for conflicts
Define the launch weights and ramp-up plan if the experiment is on a critical path or impacts a large user base
When ramping a high-risk test in a critical area, go from 1% to 5%/10% and then to 50%
Beware of Simpson’s paradox and only analyze test periods where the weight allocation is proportional and
comparable.
Specify the success criteria for ramping
Example: No significant impact to Primary metric, and we could detect a change as small as 5% (i.e. sensitivity)

General Guidance for Test Group Sample Allocation Based on Risk and Critical Path

Experiment launch weight matrix based on the potential code risk of failure and whether the change touches a critical
path page or critical product feature (considering the importance of the feature and the impacted population):

| Potential code risk for failure | Critical path / feature: L | M | H |
| --- | --- | --- | --- |
| L | 50% | 10% | 5% |
| M | 10% | 10% | 5% |
| H | 10% | 5% | 1% |

Experiment Verification

Experiment Validation Pre-Launch: Check that everything is in place

After the functionality is built by the Dev and Data Engineering teams, verify that the reporting data and UI work as
intended in the Dev or pre-production environment
The recommendation is to have any two of Dev, QA, and Analyst verify tracking and reporting
The recommendation is to have any two of PM, QA, and Analyst verify UI functionality
Paste screenshots of the control and test experiences in the experiment documentation for future reference

Experiment Validation Post-Launch: Check that the results are flowing in as expected
One or two days after go-live, verify that the reporting data for the experiment is valid
Check for any skew in population assignment (a sample-ratio-mismatch check sketch follows this section)
If you launched at a 1% weight, take an initial read after 2 or 3 days and set the weights to 5% or 10% as per the
previously agreed plan.
The 1% experiment's goal is only to ensure "things don't break", not to get a read on the results.
If you launched at 5% or 10%, change the weights to 50% as per your initial plan
Beware that the customer base can differ between weekdays and weekends, so analyze results in full-week
increments in case of weekly seasonality and in full-month increments in case of monthly seasonality
The above does not apply to the 1% stage, since the 1% test is not intended to get a read on the impact created by
the experimental experience.
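A minimal sketch of the assignment-skew (sample ratio mismatch) check: a chi-square goodness-of-fit test comparing observed assignment counts against the planned split. The counts and the p < 0.001 alert threshold are illustrative assumptions:

```python
# Flag a possible sample ratio mismatch (SRM) in a 50/50 experiment.
from scipy.stats import chisquare

observed = [50_421, 49_318]        # users actually assigned to control / test (assumed)
planned_split = [0.5, 0.5]         # intended allocation
total = sum(observed)
expected = [w * total for w in planned_split]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.001:
    print(f"Possible SRM (p = {p_value:.5f}); investigate before trusting any results")
else:
    print(f"No evidence of assignment skew (p = {p_value:.3f})")
```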

Experimentation Analysis and Communication:


Ensure there is no bias in the assignment or experiment data
When possible, automate these validations, such as checks for assignment population bias
Analyze the test results using statistical significance testing
Leverage the secondary metrics to validate and better understand how and why the test won.
Analyze results by key dimensions to understand the rollout scenario and find any significant performance
difference by population cohort.
Beware of increased statistical noise when making multiple comparisons: form hypotheses for dimension-level
splits before the analysis, or use an adjusted p-value.
Analyze interactions with any pre-identified interacting experiments that were live at the same time as this
test.
Communicate results with statistical confidence (a readout sketch follows this list):
When there is a significant impact, communicate the results with a confidence interval (@80%)
When there is no significant impact, communicate the results with the sensitivity, so a general audience can
understand at what observed change the results would have been statistically significant.
Document the results in the wiki with a detailed description of the test, a ramp summary, key results, a snapshot of
metrics, insights, and next steps.
Present the results to partners (Tech and Business) and analytical peers to gather additional insights and educate
others on the learnings. Document any follow-up analysis and insights.
For PM: Ensure the experiment is rolled out or retired from code as per final result conclusions.
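A minimal readout sketch assuming a binary primary metric: a two-proportion z-test, an 80% confidence interval for the absolute lift, and the sensitivity (minimum detectable effect) to report when nothing is significant. All counts below are illustrative:

```python
import numpy as np
from scipy import stats

conv_t, n_t = 5_300, 50_000        # test conversions / users (assumed)
conv_c, n_c = 5_050, 50_000        # control conversions / users (assumed)

p_t, p_c = conv_t / n_t, conv_c / n_c
diff = p_t - p_c

# Two-proportion z-test (pooled standard error under H0)
pooled = (conv_t + conv_c) / (n_t + n_c)
se0 = np.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
p_value = 2 * (1 - stats.norm.cdf(abs(diff / se0)))

# 80% confidence interval for the absolute lift (unpooled standard error)
se = np.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
z80 = stats.norm.ppf(0.90)         # two-sided 80% interval
print(f"p = {p_value:.3f}; 80% CI for lift: [{diff - z80 * se:+.4f}, {diff + z80 * se:+.4f}]")

# Sensitivity: an approximation of the smallest absolute lift detectable at
# alpha = 5% (two-tailed) and 80% power with this sample size.
z_alpha, z_beta = stats.norm.ppf(0.975), stats.norm.ppf(0.80)
mde = (z_alpha + z_beta) * np.sqrt(2 * p_c * (1 - p_c) / n_t)
print(f"Minimum detectable absolute lift: {mde:.4f} ({mde / p_c:.1%} relative)")
```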

Institutionalize Insights from Experimentation:


Along with business partners, present the detailed results to a broader org group to spur conversations on how others
can benefit from the learnings and what actions other teams can take, or collaborations they can pursue, to maximize value.
Ensure all results are documented and searchable by function or product tag to ensure we don't repeat the same
failed ideas. This enables us to do meta-analysis from multiple experiments to gather broader insights.
Ex: In a comparable company, we leveraged data from 2 dozen past examples to understand the relationship
between increased notification type and notification disablement.
Ex. In a comparable company, using metadata from ~50 past results, we understood the session lift by increasing
the volume of different email types.
Ex: In a comparable company, we leveraged ~8 past analyses to understand the incremental value of additional
cross merchandising spots.

Experimentation Governance

Automated alert system:


Enable an automated alert system to monitor significant negative impact to primary metrics from experiments.
This helps to avoid the need for results peeking.
We use p <= 0.01 or p <= 0.05 for two consecutive days
Enable an automated alert if an experiment has consistently significant positive results for x consecutive days (we
used 3 days to reduce random wins). An alert-rule sketch follows this list.
Review long-running experiments periodically to avoid performance impact on the product.
Document any experiment failures due to setup, tracking/data failures, wrong implementation, or conflicts/experiment
interactions. This enables us to monitor the health of the experimentation platform.
Create an experiment calendar to enable conflict management and to understand the volume and velocity of experiments.
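A minimal sketch of the alert rule; the data shape (one p-value per day for the primary metric), the threshold, and the day count are illustrative assumptions:

```python
from typing import Sequence

def should_alert(daily_p_values: Sequence[float],
                 threshold: float = 0.01,
                 consecutive_days: int = 2) -> bool:
    """Alert when the last `consecutive_days` daily p-values are all at or below
    the threshold (the direction of the impact is checked separately)."""
    recent = list(daily_p_values)[-consecutive_days:]
    return len(recent) == consecutive_days and all(p <= threshold for p in recent)

# Example: significant results two days in a row trigger an alert.
print(should_alert([0.30, 0.04, 0.009, 0.006]))  # True
```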

Opportunity Action Items for Consideration


Note: Short term means hours/days' worth of effort; long term means weeks/months' worth of effort

Based on initial feedback and observations of current experiment platform capabilities, this is a list of action items I
recommend we consider. They still need to be reviewed with the impacted parties to confirm the gaps still exist and to
agree on their priority.

Standardize sample size requirement with power calculations: (Priority H)


Short term: Define a standard sample size calculator using an alpha of 5% or 10%, one/two tails, and a power of 80%
Ensure the results dashboard incorporates significance and confidence levels (Priority H)
Short term: Incorporate significance calculation and confidence interval directly into Sisense Experiment analysis
framework using Z test formulas or Python functions.
Ability to split test results by dimensions with adjusted P-value threshold: (Priority M)
Long-term: Explore alternative tools or data tracking to enable unrestricted metric/funnel/dimension evaluation for
Experiments and split analysis by dimensions and filters.
Solve for Sample size problems (Priority H)
Short term: Ensure we define experiment sample size analysis using one-tailed or two-tailed tests to better
accommodate smaller sample sizes in experiments.
Short term: Leverage secondary metrics to build confidence in directional read on primary metrics.
Short term: Use guard rail metric for long-term impact measurement by pausing the experiment after collecting
enough samples and later measuring the impact on guard rail metric.
Long-term: Leverage a Bayesian experiment analysis framework to communicate uncertainty in the data for
small-sample-size problems (a minimal sketch follows this list).
Automated bias validation of test results: (Priority M)
Short term: Explore how to automatically flag when experiments have weights or population bias in results
Long term: Implement automated bias alerts when analyzing results.
Automated alerts for winning and losing experiments: (Priority L)
Short term: Explore how to create automated alerts for losing or consistently winning experiments based on P value
and days the results remain significant.
Analyze randomization within and across experiments for any bias: (Priority M)
Short term: Explore whether we need to validate randomization bias within and across experiment allocations, given
the limitations of existing systems, and analyze accordingly.
Long-term: Build automated systems to monitor bias continuously within and across experiments.
Experiment calendar: (Priority M)
Short term: Understand the current Experiment calendar and explore automated ways to validate accuracy of
testing calendar by comparing against actual experiment assignment data.
Long term: Have a single system that records all the experiments running across the organization and their
durations. Enable automated validation against actual test assignment data.
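As a starting point for the Bayesian action item above, here is a minimal Beta-Binomial sketch for a conversion metric; the counts, priors, and the 80% credible interval are illustrative assumptions:

```python
# Report P(test > control) and a credible interval for the relative lift instead
# of a p-value, which communicates uncertainty more directly for small samples.
import numpy as np

rng = np.random.default_rng(7)

conv_t, n_t = 48, 900    # small-sample test conversions / users (assumed)
conv_c, n_c = 35, 880    # small-sample control conversions / users (assumed)

# Beta(1, 1) uniform priors updated with observed conversions and non-conversions
post_t = rng.beta(1 + conv_t, 1 + n_t - conv_t, size=200_000)
post_c = rng.beta(1 + conv_c, 1 + n_c - conv_c, size=200_000)

lift = post_t / post_c - 1
print(f"P(test > control) = {(post_t > post_c).mean():.1%}")
print(f"80% credible interval for relative lift: "
      f"[{np.percentile(lift, 10):+.1%}, {np.percentile(lift, 90):+.1%}]")
```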
