Image classification in retail
Lessons from the real world
Valentina Bono and Paul Klinger
Summary
• Data Science at Tesco
• Image classification
• Dataset generation: data sources
• Dataset generation: unanticipated situations
• Label quality & label refinement
• Imbalanced datasets
• Distribution shift
• Metric learning
We Work End-to-End Across Tesco
[diagram of business areas: WORKFORCE MANAGEMENT, SECURITY, SUPPLY CHAIN, PROPERTY, SUPPORT, FULFILMENT, STORE, STORE OPERATIONS, COMMERCIAL PLANNING, ONLINE, MARKETING, TRANSPORT & LOYALTY, FINANCE]
What? Image Classification
[figure: example IMAGES and the CLASSES they map to]
What? Image Classification
Data sources: checkout data and CCTV videos
Unanticipated situations: Empty checkout
Unanticipated situations: Covered checkout
Unanticipated situations: what class?
Unanticipated situations: loose versus packaged
Unanticipated situations: brown bag
Label quality: how to improve the labels
Look at image → Can you recognise the product?
• No → Occluded/Empty
• Yes → Is class 1 visible? Is class 2 visible?
  • Both visible → Class 1 & 2
  • Only class 1 visible → Class 1
  • Only class 2 visible → Class 2
  • Neither visible → Other
Label quality: how to improve the labels
[image panels showing current labels and the questions they raise]
• Occluded/empty: is this correct?
• Avocado: avocado or onion?
• Avocado: avocado or mango?
• Can: is this a can or a box?
• Carrot: carrot or onion?
• Pepper: pepper or tomato?
Label quality: how to improve the labels
What class is this? Can you recognise it now?
Dealing with imbalanced datasets
Real-world datasets are often very imbalanced. [chart: images per class]
• How do you train on them (oversampling / undersampling)? See the sampler sketch below.
• What distribution will the model encounter once deployed?
• How do you evaluate (against real or corrected distribution)?
• Is there an additional weighting from business requirements?
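A minimal sketch of the oversampling option, assuming PyTorch; the toy `labels` tensor and the commented-out `train_dataset` are placeholders, not the talk's actual pipeline:

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# `labels` stands in for the class index of every training image.
labels = torch.tensor([0, 0, 0, 0, 0, 1, 1, 2])           # toy example: class 0 dominates
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]                # rarer classes get larger weights

# Draw training examples with probability proportional to the weights,
# so each class appears roughly equally often per epoch.
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

# loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
```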
Recalibrating models to a new distribution
Given a model that gives calibrated probabilities for a distribution A, how do we get calibrated probabilities for a different distribution B?
Simply rescale probabilities by the ratio of the distributions and normalise!
[chart: effect of recalibration — class frequencies of apple, banana and pear for the train distribution, the test distribution, the raw model output and the transformed output]
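A minimal sketch of that rescale-and-normalise step in code (NumPy assumed; the class names and numbers below are illustrative, not real data):

```python
import numpy as np

def recalibrate(probs, prior_train, prior_test):
    """Rescale calibrated probabilities from the train prior to a new (test) prior."""
    rescaled = probs * (prior_test / prior_train)       # ratio of the two distributions
    return rescaled / rescaled.sum(axis=1, keepdims=True)

# Three classes: apple, banana, pear.
prior_train = np.array([0.5, 0.3, 0.2])                 # class frequencies seen in training
prior_test = np.array([0.2, 0.3, 0.5])                  # class frequencies expected in deployment
raw_output = np.array([[0.6, 0.3, 0.1]])                # model output, calibrated for prior_train
print(recalibrate(raw_output, prior_train, prior_test))
```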
Recalibrating models to a new distribution
This means that we can train a model on distribution A and then transform it into one adapted to the test distribution. Assuming the model was well calibrated before, this choice of transformation gives the best possible performance on the test set.
But what if it's not? We can just rescale each output probability by an arbitrary factor (and normalise). So we have number_of_classes extra free parameters to tune the model!
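If the model is not well calibrated, those number_of_classes factors can be fitted on held-out data instead of taken from the priors. A minimal sketch, assuming NumPy/SciPy; `val_probs` and `val_labels` are placeholder validation outputs and true classes:

```python
import numpy as np
from scipy.optimize import minimize

def rescale(probs, factors):
    scaled = probs * factors
    return scaled / scaled.sum(axis=1, keepdims=True)

def neg_log_likelihood(log_factors, probs, labels):
    # Optimise in log space so the per-class factors stay positive.
    p = rescale(probs, np.exp(log_factors))
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

val_probs = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]])
val_labels = np.array([0, 2, 2])

result = minimize(neg_log_likelihood, x0=np.zeros(3), args=(val_probs, val_labels))
tuned_factors = np.exp(result.x)                         # one free parameter per class
```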
Distribution invariant metrics?
In the binary classification setting, the equivalent to recalibrating is just setting a threshold. The AUC gives a metric that's invariant under setting the threshold.
For multiclass problems there is no obvious way to extend this. There are multiclass versions of AUC, but they are not independent of the rescaling!
(If you know of a metric that is independent, let me know!)
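A quick synthetic illustration of that asymmetry (NumPy and scikit-learn assumed; the data is random, purely to show the effect): binary AUC is unchanged by per-class rescaling, while one-vs-rest multiclass AUC moves with the chosen factors.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def rescale(p, factors):
    q = p * factors
    return q / q.sum(axis=1, keepdims=True)

def fake_probs(n_classes, n=2000):
    y = rng.integers(0, n_classes, size=n)
    logits = rng.normal(size=(n, n_classes)) + 1.5 * np.eye(n_classes)[y]
    return y, np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Binary: rescaling + renormalising is a monotone transform of the positive-class
# score, so the ranking (and hence the AUC) is identical.
y2, p2 = fake_probs(2)
print(roc_auc_score(y2, p2[:, 1]),
      roc_auc_score(y2, rescale(p2, np.array([1.0, 5.0]))[:, 1]))

# Multiclass (one-vs-rest): each rescaled score also depends on the other classes'
# probabilities, so the AUC shifts with the factors.
y3, p3 = fake_probs(3)
print(roc_auc_score(y3, p3, multi_class="ovr"),
      roc_auc_score(y3, rescale(p3, np.array([1.0, 5.0, 0.2])), multi_class="ovr"))
```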
Distribution shift
The obvious and the not-so-obvious
• Two models can perform similarly on an unseen test set, but differ a lot in their ability to generalise.
• Synthetic transformations (data augmentation) can help a bit, but real-world effects can be subtle (see the sketch below).
• Models that have seen a wider variety of images are better at handling distribution shift (at least if they are not fine-tuned to convergence).
The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning, Andreassen et al., arXiv:2106.15831
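For reference, a minimal sketch of the kind of synthetic transformations meant here, assuming torchvision; the specific transforms and parameters are illustrative, not the ones used in the talk:

```python
import torchvision.transforms as T

# Illustrative augmentation pipeline: crops, colour changes, flips and blur imitate
# some real-world variation, but subtler shifts (lighting, camera angle, new
# packaging) are hard to reproduce synthetically.
train_transforms = T.Compose([
    T.RandomResizedCrop(224, scale=(0.6, 1.0)),
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    T.RandomHorizontalFlip(),
    T.GaussianBlur(kernel_size=5),
    T.ToTensor(),
])
```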
Distribution shift
The obvious and the not-so-obvious
When the shift looks similar to what the model is supposed to detect, it gets tricky!
Classify all the products!
Metric learning for scalable product classification
Work by our PhD intern Charles
Goal:
• Classify every product (10s of thousands)
• Handle changes (new products added, old ones removed) without retraining
-> Metric learning
• Embed images into a "product space" such that images of the same (or similar) products cluster together.
• At test time, embed the query and compare to a stored database of embeddings (see the sketch below).
• Can change the product range by changing the database, without touching the model.
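A minimal sketch of that lookup step, assuming NumPy; `embed` (the trained embedding model), the catalogue arrays and the product ids are placeholders:

```python
import numpy as np

def normalise(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def classify(query_embedding, catalogue_ids, catalogue_embeddings):
    """Return the product id whose stored (L2-normalised) embedding is most similar."""
    sims = catalogue_embeddings @ normalise(query_embedding)   # cosine similarity
    return catalogue_ids[int(np.argmax(sims))]

# Changing the product range only touches the catalogue, never the model:
# catalogue_ids = np.append(catalogue_ids, new_product_id)
# catalogue_embeddings = np.vstack([catalogue_embeddings, normalise(embed(new_image))])
```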
Classify all the products!
Metric learning for scalable product classification
Challenges:
• Lots of occlusion, various other objects in view
• Need the model to focus on the product, not anything else (different from many academic datasets)
Training with softmax works surprisingly well; see also Classification is a Strong Baseline for Deep Metric Learning (BMVC '19), Andrew Zhai, Hao-Yu Wu.
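A minimal sketch of that softmax baseline, assuming PyTorch/torchvision; the backbone, class count and input size are assumptions, not the talk's actual model: train an ordinary classifier, then drop the classification head and use the L2-normalised penultimate features as the embedding.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

backbone = resnet50(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, 1000)   # 1000 training classes (assumed)
# ...standard cross-entropy training of `backbone` goes here (omitted)...

# At test time, keep everything up to the global pooling layer as the embedder.
embedder = nn.Sequential(*list(backbone.children())[:-1], nn.Flatten())

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)                  # placeholder batch
    embeddings = nn.functional.normalize(embedder(images), dim=1)
```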
Questions?
We are hiring!
https://www.tesco-careers.com/technology/uk/en/c/data-jobs
• Data Science
  • (Senior) Data Scientist – Machine Learning
  • (Senior) Data Scientist – Time Series Forecasting
  • (Senior) Data Scientist – Computer Vision
  • Data Science Intern
• Data Science Engineering
  • Data Science Software Development Manager
  • Lead Machine Learning Engineer – Computer Vision
  • (Senior) Machine Learning Engineer
• Analytics
  • (Senior) Data Analyst
Contact us:
• [email protected] (Recruiter)
• [email protected] (DS Leadership)
Backup Slides
Rescale probabilities formulas
Rescale from distribution A to distribution B:
$$p_B(x_i \mid x) = \frac{\dfrac{p_B(x_i)}{p_A(x_i)}\, p_A(x_i \mid x)}{\sum_{j=1}^{n} \dfrac{p_B(x_j)}{p_A(x_j)}\, p_A(x_j \mid x)}$$
General rescaling:
$$p_B(x_i \mid x) = \frac{f_i\, p_A(x_i \mid x)}{\sum_{j=1}^{n} f_j\, p_A(x_j \mid x)}$$
(for n classes).
Adjusting the Outputs of a Classifier to New a Priori Probabilities May Significantly Improve Classification Accuracy: Evidence from a Multi-Class Problem in Remote Sensing, Latinne, Saerens, Decaestecker, ICML '01 (2001).