
Private AI Whitepaper - 2023


Introducing Private AI:

Elevating Data Privacy For Every Industry

About Private AI

Private AI is at the forefront of privacy solutions, providing an advanced machine learning system that identifies, redacts, and replaces personally identifiable information (PII) across a wide spectrum of file types, including text, structured data, PDFs, audio, images, and more.

Their technology is able to detect over 50 different entities of PII, PHI and PCI across more than 52 languages (and growing!). Models are deployed on-prem via container so customer data is processed within their own existing environments, and is never shared with anyone - not even Private AI.

You can test their models using their web demo
See the full list of supported entities & languages
Visit the developer documentation

What is PII?

Personally Identifiable Information (PII) involves a range of data points that are capable of revealing someone’s identity. Common bits of data like names, phone numbers, and credit card numbers are known as ‘direct identifiers’ since they provide direct (and sometimes immediate) identification.

PII also includes ‘quasi-identifiers’, seemingly innocuous details that, when combined, increase the risk of re-identification. For instance, while knowing a particular customer resides in Delaware may not be highly revealing, combining this information with others, like their Buddhist faith, male gender, Dutch nationality, and heart medication usage certainly increases the chances of identification.

What is considered PII also depends on the relevant local legislation, such as the General Data Protection Regulation (GDPR) or California Consumer Privacy Act (CCPA). Learn more about PII here.

Why Should You Care: Mishandling PII is a Ripple Effect

Inappropriate handling of PII can restrict the use of data, delay revenue opportunities, reduce the efficiency of data analytics and AI/ML modelling, and damage your brand’s reputation.

To avoid these problems, companies handling customer data should rely on a data privacy expert to determine whether their data has been properly de-identified. With Private AI’s solution, you can dramatically speed up and enhance the accuracy of your de-identification process. For any organization holding personal data, the automatic redaction or de-identification of PII should be a mandatory step before data is shared, both internally and externally.

Why is PII Detection Difficult?

A successful data privacy solution must be able to identify and remove both direct and quasi-identifiers. This is easier said than done: real-world data is rarely clean-cut. It contains inconsistencies and edge cases that defy rule-based systems. Consider the inherent difficulties of classifying a name like Paris or June, or the phone number extension ‘x324’. Even clearly defined PII can take on many different forms, such as driver’s licenses that have different international and regional formats, or 16-digit credit card numbers that appear as 4-digit blocks intertwined with other text and data in an

1 | private-ai.com Copyright © 2023 Private AI ®


ASR transcript (i.e., “Could I have the first four digits of the card please? 4567. Thanks, the next four please? 1325” etc.). Good luck getting a regex to catch and identify those entities accurately.

The Private AI Approach: How it Works

Private AI uses cutting-edge Machine Learning models that identify PII based on context, similar to how the human brain does. Their models are capable of detecting over 50 different types of direct and quasi-identifiers in 52 different languages, with more entity types and languages added with every new release.

Their models are actively worked on by a team of over 20 linguists, data annotators, and privacy experts, who make informed decisions on what is and is not considered PII and actively refine their models to align with evolving global privacy regulations.

Private AI is deployed via a self-hosted container and accessed using a REST API. Unlike third-party cloud APIs, no customer data is ever transmitted to Private AI. The container comes in two versions: a CPU version that can run on any x86 CPU, and a GPU version for real-time or large-throughput deployments. Both versions rely on Private AI’s Neural Network optimization IP and operate 25 times faster than open-source reference models.

Private AI can also generate synthetic PII to replace any PII found in the input data. Powered by their proprietary generative models, the synthetic PII generation system replaces PII with entities that fit the surrounding context. This method has numerous benefits, including:

1. Preserving downstream model training integrity (e.g., sentiment analysis, NER).
2. Decreasing re-identification risk: if any personal data is missed, distinguishing between original and synthetic data is nearly impossible.

PII Detection Benchmarks

How does Private AI stack up against other services? To find out, we created a 3,000-word test dataset to compare our models against AWS Comprehend, spaCy, Microsoft Presidio, Nightfall, and Google DLP. The test data was conversational data that contained sensitive health information and featured internet shorthand. Example length ranged from 120 to 512 words, and the test was carried out with the August 2021 version of each vendor’s cloud offering, together with spaCy 3.0.0 and Presidio 2.2.21. The entity types considered in this test were: condition, date, email address, location, medical process, name, occupation, organization, origin, phone number, time, and url. Please see our documentation for descriptions of each entity.

Precision, recall, and F1-score are displayed below in Fig. 1. Metrics are calculated independently for each entity type at the word level, where a word is a whitespace-separated piece of text.

FIGURE 1: PRECISION, RECALL, AND F1-SCORES ON HELD-OUT TEST DATA

[Bar chart: precision, recall, and F1-score (y-axis 0.80 to 1.00) for Private AI, Nightfall, Microsoft Presidio, Google DLP, AWS Comprehend, and spaCy.]
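
The word-level scoring described above can be sketched as follows. This is a minimal illustration, assuming gold and predicted labels are already aligned one per whitespace-separated word. The function name `word_level_scores`, the `O` marker for non-PII words, and the sample sentence and labels are hypothetical and are not taken from Private AI's evaluation toolkit.

```python
from collections import Counter


def word_level_scores(gold, pred, outside="O"):
    """Per-entity precision, recall and F1 over aligned word-level labels.

    `gold` and `pred` hold one label per whitespace-separated word;
    `outside` marks words that are not PII (an assumed convention).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same words")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            if g != outside:
                tp[g] += 1          # correct entity label
        else:
            if p != outside:
                fp[p] += 1          # predicted an entity that is not there
            if g != outside:
                fn[g] += 1          # missed or mislabelled a real entity

    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Illustrative sentence and labels (made up for this sketch).
words = "My name is Roshmi and I saw Dr. Lee on Monday".split()
gold = ["O", "O", "O", "NAME", "O", "O", "O", "O", "NAME", "O", "DATE"]
pred = ["O", "O", "O", "O", "O", "O", "O", "LOCATION", "NAME", "O", "DATE"]

scores = word_level_scores(gold, pred)
for label, metrics in scores.items():
    print(label, metrics)
```

Each entity type is scored on its own, so a service that misses one of two names while tagging a title as a location loses recall on NAME and precision on LOCATION independently.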



In addition to individual entity metrics, we also considered the amount of PII missed entirely, also known as ‘class-agnostic recall’. This corresponds to the binary classification problem of whether a given word is PII or not. Fig. 2 below shows the class-agnostic recall values for each service.

Please contact us if you would like a copy of the dataset used to test each service or the evaluation toolkit we built to compare the services.

FIGURE 2: PII RECALL IN HELD-OUT TEST DATA. LOWER IS BETTER.

[Bar chart: PII missed as % of total PII (y-axis 0 to 60) for Private AI, spaCy, AWS Comprehend, Microsoft Presidio, Nightfall, and Google DLP.]
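
Class-agnostic recall, and the "PII missed" percentage plotted in Fig. 2, can be sketched as follows. This is a minimal illustration under the same assumption of one label per whitespace-separated word; the function names, the `O` marker for non-PII words, and the sample labels are hypothetical and not taken from the evaluation toolkit mentioned above.

```python
def class_agnostic_recall(gold, pred, outside="O"):
    """Share of gold PII words that were flagged as PII of any type."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same words")
    pii_total = sum(1 for g in gold if g != outside)
    if pii_total == 0:
        raise ValueError("gold labels contain no PII words")
    # A word counts as caught even if the predicted entity type is wrong,
    # because only the binary PII / not-PII decision matters here.
    caught = sum(
        1 for g, p in zip(gold, pred) if g != outside and p != outside
    )
    return caught / pii_total


def pii_missed_percent(gold, pred, outside="O"):
    """PII missed as a percentage of total PII (lower is better)."""
    return 100.0 * (1.0 - class_agnostic_recall(gold, pred, outside))


# Illustrative labels (made up for this sketch).
gold = ["O", "NAME", "NAME", "O", "CONDITION", "O", "DATE"]
pred = ["O", "NAME", "O", "O", "LOCATION", "O", "DATE"]

recall = class_agnostic_recall(gold, pred)
missed = pii_missed_percent(gold, pred)
print(f"class-agnostic recall: {recall:.2f}, PII missed: {missed:.1f}%")
```

In this sample the word labelled CONDITION is predicted as LOCATION, which hurts the per-entity metrics but still counts as caught here, since the word would be redacted either way.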

Evaluation within Proof of Concepts and Pilots

Private AI has been tested in bake-offs by multi-billion dollar companies, renowned healthcare and financial institutions, and major government agencies, and has emerged as the most accurate solution in each and every test.

“From all of the PII redaction products we’ve seen out there (and believe me, we’ve seen all of them), Private AI is the best one by far in terms of accuracy, types of data that can be redacted, and flexibility of their models. After doing a side by side comparison it quickly became clear to us that we couldn’t go back to using something like AWS Comprehend.”

Sebastian Jiminez
Founder, Rilla Voice



Manual Evaluation

In addition to the Precision, Recall, and F1 metrics presented above, we manually inspected the output of each service. Here are some things we noticed.

AWS Comprehend

• AWS Comprehend only supports a maximum input request length of 5000 characters.
• AWS Comprehend supports a wide range of numerical PII types, but these appear to be implemented via regexes and do not perform well in real-world use.

Google DLP

• Being a DLP application, Google DLP prioritizes throughput over PII detection performance. For example, Google DLP misses even simple examples such as ‘My name is Roshmi’.
• We found that Google DLP predicts “M.D.” in a doctor’s name as a location.

Nightfall

• Like Google DLP, Nightfall prioritizes throughput over PII detection, due to their focus on processing massive volumes of data efficiently.
• Nightfall only supports a maximum of 10 requests per second.
• Nightfall offers very limited support for Protected Health Information (PHI).
• Nightfall offers a maximum of 50 detectors, limiting its use as a general PII detector.

Research

Private AI is at the forefront of research in privacy-preserving Natural Language Processing and studying re-identification risk within unstructured data. They frequently present and organize workshops at conferences.

For a list of their research papers and events they participate in, please visit their website.

"We provide a speech-to-text transcription API and needed to bring our redaction of credit cards, SSNs, and other personal financial and health information up to the highest accuracy level possible. Private AI made that quick and easy – now our accuracy numbers are through the roof and our clients are happy, which has been amazing."

Dylan Fox
CEO, AssemblyAI

Get Started

Get an API key
Book a demo
Try our web demo

Contact Us

[email protected]
@_PrivateAI
/private-ai

The GARTNER COOL VENDOR badge is a trademark and service mark of Gartner, Inc., and/or its affiliates, and is used herein with permission. All rights reserved. Gartner
does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest
ratings or other designation. Gartner research publications consist of the opinions of Gartner’s Research & Advisory organization and should not be construed as statements
of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
