Private AI Whitepaper - 2023


Introducing Private AI:

Elevating Data Privacy For Every Industry

About Private AI

Private AI is at the forefront of privacy solutions, providing an advanced machine learning system that identifies, redacts, and replaces personally identifiable information (PII) across a wide spectrum of file types, including text, structured data, PDFs, audio, images, and more.

Their technology can detect over 50 different types of PII, PHI, and PCI entities across more than 52 languages (and growing!). Models are deployed on-prem via container, so customer data is processed within the customer's own existing environment and is never shared with anyone, not even Private AI.

You can test their models using their web demo.
See the full list of supported entities & languages.
Visit the developer documentation.

What is PII?

Personally Identifiable Information (PII) covers a range of data points that are capable of revealing someone's identity. Common pieces of data such as names, phone numbers, and credit card numbers are known as 'direct identifiers' because they provide direct (and sometimes immediate) identification.

PII also includes 'quasi-identifiers': seemingly innocuous details that, when combined, increase the risk of re-identification. For instance, knowing that a particular customer resides in Delaware may not be very revealing on its own, but combining it with other details, such as their Buddhist faith, male gender, Dutch nationality, and use of heart medication, greatly increases the chances of identification.

What is considered PII also depends on the relevant local legislation, such as the General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA). Learn more about PII here.

Why Should You Care: Mishandling PII Has a Ripple Effect

Inappropriate handling of PII can restrict the use of data, delay revenue opportunities, reduce the efficiency of data analytics and AI/ML modelling, and damage your brand's reputation.

To avoid these problems, companies handling customer data should rely on a data privacy expert to determine whether their data has been properly de-identified. With Private AI's solution, you can dramatically speed up your de-identification process and improve its accuracy. For any organization holding personal data, automatic redaction or de-identification of PII should be a mandatory step before data is shared, whether internally or externally.

Why is PII Detection Difficult?

A successful data privacy solution must be able to identify and remove both direct and quasi-identifiers. This is easier said than done: real-world data is rarely clean-cut. It contains inconsistencies and edge cases that defy rule-based systems. Consider the inherent difficulty of classifying a name like Paris or June, or a phone number extension such as 'x324'. Even clearly defined PII can take many different forms, such as driver's licenses with different international and regional formats, or 16-digit credit card numbers that appear as 4-digit blocks intertwined with other text and data in an automatic speech recognition (ASR) transcript (e.g., "Could I have the first four digits of the card please? 4567. Thanks, the next four please? 1325", etc.). Good luck getting a regex to catch and identify those entities accurately.

1 | private-ai.com Copyright © 2023 Private AI ®

The Private AI Approach: How it Works

Private AI uses cutting-edge machine learning models that identify PII based on context, similar to how the human brain does. Their models are capable of detecting over 50 different types of direct and quasi-identifiers in 52 different languages, with more entity types and languages added in every new release.

Their models are actively developed by a team of over 20 linguists, data annotators, and privacy experts, who make informed decisions on what is and is not considered PII and continually refine the models to align with evolving global privacy regulations.

Private AI is deployed via a self-hosted container and accessed using a REST API. Unlike third-party cloud APIs, no customer data is ever transmitted to Private AI. The container comes in two versions: a CPU version that can run on any x86 CPU, and a GPU version for real-time or high-throughput deployments. Both versions rely on Private AI's neural network optimization IP and run 25 times faster than open-source reference models.

Private AI can also generate synthetic PII to replace any PII found in the input data. Powered by their proprietary generative models, the synthetic PII generation system replaces PII with entities that fit the surrounding context. This method has numerous benefits, including:

1. Preserving downstream model training integrity (e.g., sentiment analysis, named entity recognition (NER)).
2. Decreasing re-identification risk: if any personal data is missed, distinguishing between original and synthetic data is nearly impossible.

PII Detection Benchmarks

How does Private AI stack up against other services? To find out, we created a 3,000-word test dataset to compare our models against AWS Comprehend, spaCy, Microsoft Presidio, Nightfall, and Google DLP. The test data consisted of conversational text that contained sensitive health information and featured internet shorthand. Examples ranged from 120 to 512 words in length. The test was carried out with the August 2021 version of each vendor's cloud offering, together with spaCy 3.0.0 and Presidio 2.2.21. The entity types considered in this test were: condition, date, email address, location, medical process, name, occupation, organization, origin, phone number, time, and URL. Please see our documentation for descriptions of each entity.

Precision, recall, and F1-score are displayed below in Fig. 1. Metrics are calculated independently for each entity type at the word level, where a word is a whitespace-separated piece of text.

FIGURE 1: PRECISION, RECALL, AND F1-SCORES ON HELD-OUT TEST DATA
[Chart: precision, recall, and F1-score (axis from 0.80 to 1.00) for Private AI, Nightfall, Microsoft Presidio, Google DLP, AWS Comprehend, and spaCy.]
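To make the word-level scoring concrete, here is a minimal Python sketch of how per-entity precision, recall, and F1 can be computed when every whitespace-separated word carries exactly one label. The label convention ("O" for non-PII words), the function names, and the example sentence are assumptions made for illustration; this is not Private AI's evaluation toolkit.

```python
# Word-level, per-entity precision/recall/F1 (a sketch, not the actual toolkit).
# Assumption: each whitespace-separated word has exactly one label, and the
# hypothetical label "O" marks words that are not PII.

def word_level_scores(gold, pred, entity):
    """Return (precision, recall, f1) for a single entity type."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same words")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == entity and p == entity)
    fp = sum(1 for g, p in pairs if g != entity and p == entity)
    fn = sum(1 for g, p in pairs if g == entity and p != entity)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def scores_by_entity(gold, pred):
    """Score every entity type independently of the others."""
    entities = sorted({label for label in gold + pred if label != "O"})
    return {e: word_level_scores(gold, pred, e) for e in entities}


# Illustrative example (not benchmark data): the model mistakes the
# name "June" for a date.
words = "Dr. June Smith saw me in Paris on Monday".split()
gold = ["O", "NAME", "NAME", "O", "O", "O", "LOCATION", "O", "DATE"]
pred = ["O", "DATE", "NAME", "O", "O", "O", "LOCATION", "O", "DATE"]

if __name__ == "__main__":
    for entity, (p, r, f) in scores_by_entity(gold, pred).items():
        print(f"{entity:<9} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

A single mislabelled word hurts two entity types at once: it lowers recall for NAME (one name word missed) and precision for DATE (one spurious date), which is why scoring each type independently is informative.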


In addition to individual entity metrics, we also considered the amount of PII missed entirely, regardless of entity type. This is the complement of 'class-agnostic recall', which corresponds to the binary classification problem of whether a given word is PII or not. Fig. 2 below shows the percentage of PII that each service missed.

Please contact us if you would like a copy of the dataset used to test each service or the evaluation toolkit we built to compare the services.

FIGURE 2: PII MISSED IN HELD-OUT TEST DATA (AS % OF TOTAL PII). LOWER IS BETTER.
[Chart: PII missed as % of total PII (axis from 0 to 60) for Private AI, spaCy, AWS Comprehend, Microsoft Presidio, Nightfall, and Google DLP.]
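Class-agnostic recall can be sketched the same way. The minimal Python example below, again assuming one label per word with "O" for non-PII, collapses all entity types into a binary PII / not-PII decision, so a word counts as found even if it received the wrong entity type. The percentage of PII missed is then one minus this recall, times 100. Function names and data are hypothetical.

```python
# Class-agnostic recall (a sketch): every entity label is collapsed into a
# binary "PII / not PII" decision per word.
# Assumption: the hypothetical label "O" marks non-PII words.

def class_agnostic_recall(gold, pred, outside="O"):
    """Fraction of gold PII words that were flagged as any kind of PII."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same words")
    # For each gold PII word, record whether the prediction flagged it at all.
    found = [p != outside for g, p in zip(gold, pred) if g != outside]
    if not found:
        return 1.0  # no gold PII, so nothing could be missed
    return sum(found) / len(found)


def pii_missed_percent(gold, pred, outside="O"):
    """PII missed as a percentage of total PII."""
    return 100.0 * (1.0 - class_agnostic_recall(gold, pred, outside))


# Illustrative labels for "Dr. June Smith saw me in Paris on Monday":
# "June" gets the wrong type (still counts as found); "Paris" is missed.
gold = ["O", "NAME", "NAME", "O", "O", "O", "LOCATION", "O", "DATE"]
pred = ["O", "DATE", "NAME", "O", "O", "O", "O", "O", "DATE"]

if __name__ == "__main__":
    print(class_agnostic_recall(gold, pred))  # 0.75
    print(pii_missed_percent(gold, pred))     # 25.0
```

Here three of the four PII words are flagged, so class-agnostic recall is 0.75 and 25% of the PII is missed; the type error on "June" would still count against that entity's per-type scores.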

Evaluation within Proofs of Concept and Pilots

Private AI has been tested in bake-offs by multi-billion dollar companies, renowned healthcare and financial institutions, and major government agencies, and has emerged as the most accurate solution in each and every test.

"From all of the PII redaction products we've seen out there (and believe me, we've seen all of them), Private AI is the best one by far in terms of accuracy, types of data that can be redacted, and flexibility of their models. After doing a side by side comparison it quickly became clear to us that we couldn't go back to using something like AWS Comprehend."

Sebastian Jiminez
Founder, Rilla Voice

Manual Evaluation

In addition to the precision, recall, and F1 metrics presented above, we manually inspected the output of each service. Here are some things we noticed.

AWS Comprehend

• AWS Comprehend only supports a maximum input request length of 5,000 characters.
• AWS Comprehend supports a wide range of numerical PII types, but these appear to be implemented via regexes and do not perform well in real-world use.

Google DLP

• Being a DLP application, Google DLP prioritizes throughput over PII detection performance. For example, Google DLP misses even simple examples such as 'My name is Roshmi'.
• We found that Google DLP predicts "M.D." in a doctor's name as a location.

Nightfall

• Like Google DLP, Nightfall prioritizes throughput over PII detection, due to its focus on processing massive volumes of data efficiently.
• Nightfall only supports a maximum of 10 requests per second.
• Nightfall offers very limited support for Protected Health Information (PHI).
• Nightfall offers a maximum of 50 detectors, limiting its use as a general PII detector.

Research

Private AI is at the forefront of research in privacy-preserving Natural Language Processing and in studying re-identification risk within unstructured data. They frequently present and organize workshops at conferences.

For a list of their research papers and the events they participate in, please visit their website.

"We provide a speech-to-text transcription API and needed to bring our redaction of credit cards, SSNs, and other personal financial and health information up to the highest accuracy level possible. Private AI made that quick and easy – now our accuracy numbers are through the roof and our clients are happy, which has been amazing."

Dylan Fox
CEO, AssemblyAI
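The regex problem raised earlier, and echoed in the AWS Comprehend observation above, is easy to reproduce. The Python sketch below uses an illustrative 16-digit card pattern (an assumption, not any vendor's actual rule) that matches a card number written in one piece but finds nothing when the same digits are read out in 4-digit blocks across an ASR transcript. The last two digit blocks are invented to complete the example.

```python
import re

# Illustrative rule-based detector (an assumption, not any vendor's rule):
# 16 digits, optionally grouped in fours by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def find_cards(text):
    """Return every substring that looks like a 16-digit card number."""
    return CARD_PATTERN.findall(text)


clean_text = "Card number: 4567 1325 9876 5432."

# The same digits read out block by block, as in an ASR transcript.
# The last two blocks are invented to complete the example.
asr_transcript = (
    "Could I have the first four digits of the card please? 4567. "
    "Thanks, the next four please? 1325. The next four? 9876. "
    "And the last four? 5432."
)

if __name__ == "__main__":
    print(find_cards(clean_text))      # ['4567 1325 9876 5432']
    print(find_cards(asr_transcript))  # []
```

Loosening the pattern to catch isolated 4-digit blocks would instead flag years, extensions, and quantities, so a detector has to use the surrounding conversation to decide which blocks belong to a card number.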

Get Started
Get an API key
Book a demo
Try our web demo

Contact Us
[email protected]
@_PrivateAI
/private-ai

The GARTNER COOL VENDOR badge is a trademark and service mark of Gartner, Inc., and/or its affiliates, and is used herein with permission. All rights reserved. Gartner
does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest
ratings or other designation. Gartner research publications consist of the opinions of Gartner’s Research & Advisory organization and should not be construed as statements
of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
