Accelerator Guide
Informatica Data Quality Accelerator Guide
Version 10.0
November 2015
This software and documentation contain proprietary information of Informatica LLC and are provided under a license agreement containing restrictions on use and
disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any
form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC. This Software may be protected by U.S. and/or
international Patents and other Patents Pending.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as
provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14
(ALT III), as applicable.
The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us
in writing.
Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange,
PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica
On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and
Informatica Master Data Management are trademarks or registered trademarks of Informatica LLC in the United States and in jurisdictions throughout the world. All
other company and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights
reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights
reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright © Meta
Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems
Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All
rights reserved. Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights
reserved. Copyright © Glyph & Cog, LLC. All rights reserved. Copyright © Thinkmap, Inc. All rights reserved. Copyright © Clearpace Software Limited. All rights
reserved. Copyright © Information Builders, Inc. All rights reserved. Copyright © OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved.
Copyright Cleo Communications, Inc. All rights reserved. Copyright © International Organization for Standardization 1986. All rights reserved. Copyright © ej-
technologies GmbH. All rights reserved. Copyright © Jaspersoft Corporation. All rights reserved. Copyright © International Business Machines Corporation. All rights
reserved. Copyright © yWorks GmbH. All rights reserved. Copyright © Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved.
Copyright © Daniel Veillard. All rights reserved. Copyright © Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright © MicroQuill Software Publishing, Inc. All
rights reserved. Copyright © PassMark Software Pty Ltd. All rights reserved. Copyright © LogiXML, Inc. All rights reserved. Copyright © 2003-2010 Lorenzi Davide, All
rights reserved. Copyright © Red Hat, Inc. All rights reserved. Copyright © The Board of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright
© EMC Corporation. All rights reserved. Copyright © Flexera Software. All rights reserved. Copyright © Jinfonet Software. All rights reserved. Copyright © Apple Inc. All
rights reserved. Copyright © Telerik Inc. All rights reserved. Copyright © BEA Systems. All rights reserved. Copyright © PDFlib GmbH. All rights reserved. Copyright ©
Orientation in Objects GmbH. All rights reserved. Copyright © Tanuki Software, Ltd. All rights reserved. Copyright © Ricebridge. All rights reserved. Copyright © Sencha,
Inc. All rights reserved. Copyright © Scalable Systems, Inc. All rights reserved. Copyright © jQWidgets. All rights reserved. Copyright © Tableau Software, Inc. All rights
reserved. Copyright© MaxMind, Inc. All Rights Reserved. Copyright © TMate Software s.r.o. All rights reserved. Copyright © MapR Technologies Inc. All rights reserved.
Copyright © Amazon Corporate LLC. All rights reserved. Copyright © Highsoft. All rights reserved. Copyright © Python Software Foundation. All rights reserved.
Copyright © BeOpen.com. All rights reserved. Copyright © CNRI. All rights reserved.
This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and/or other software which is licensed under various versions
of the Apache License (the "License"). You may obtain a copy of these Licenses at http://www.apache.org/licenses/. Unless required by applicable law or agreed to in
writing, software distributed under these Licenses is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the Licenses for the specific language governing permissions and limitations under the Licenses.
This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software
copyright © 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under various versions of the GNU Lesser General Public License
Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any
kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.
The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California,
Irvine, and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.
This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and
redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.
This product includes Curl software which is Copyright 1996-2013, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this
software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or
without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms
available at http://www.dom4j.org/license.html.
The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to
terms available at http://dojotoolkit.org/license.
This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations
regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.
This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at
http://www.gnu.org/software/kawa/Software-License.html.
This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project Copyright © 2002 Cable & Wireless
Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.
This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are
subject to terms available at http://www.boost.org/LICENSE_1_0.txt.
This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at
http://www.pcre.org/license.txt.
This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms
available at http://www.eclipse.org/org/documents/epl-v10.php and at http://www.eclipse.org/org/documents/edl-v10.php.
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://
www.stlport.org/doc/ license.html, http://asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://
httpunit.sourceforge.net/doc/ license.html, http://jung.sourceforge.net/license.txt , http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/
license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-
agreements/fuse-message-broker-v-5-3- license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html;
http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html; . http://www.w3.org/Consortium/Legal/
2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://
forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http://
www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html; http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://
www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html; http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/
license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt; http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://
www.jmock.org/license.html; http://xsom.java.net; http://benalman.com/about/license/; https://github.com/CreateJS/EaselJS/blob/master/src/easeljs/display/Bitmap.js;
http://www.h2database.com/html/license.html#summary; http://jsoncpp.sourceforge.net/LICENSE; http://jdbc.postgresql.org/license.html; http://
protobuf.googlecode.com/svn/trunk/src/google/protobuf/descriptor.proto; https://github.com/rantav/hector/blob/master/LICENSE; http://web.mit.edu/Kerberos/krb5-
current/doc/mitK5license.html; http://jibx.sourceforge.net/jibx-license.html; https://github.com/lyokato/libgeohash/blob/master/LICENSE; https://github.com/hjiang/jsonxx/
blob/master/LICENSE; https://code.google.com/p/lz4/; https://github.com/jedisct1/libsodium/blob/master/LICENSE; http://one-jar.sourceforge.net/index.php?
page=documents&file=license; https://github.com/EsotericSoftware/kryo/blob/master/license.txt; http://www.scala-lang.org/license.html; https://github.com/tinkerpop/
blueprints/blob/master/LICENSE.txt; http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html; https://aws.amazon.com/asl/; https://github.com/
twbs/bootstrap/blob/master/LICENSE; https://sourceforge.net/p/xmlunit/code/HEAD/tree/trunk/LICENSE.txt; https://github.com/documentcloud/underscore-contrib/blob/
master/LICENSE, and https://github.com/apache/hbase/blob/master/LICENSE.txt.
This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution
License (http://www.opensource.org/licenses/cddl1.php) the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License
Agreement Supplemental License Terms, the BSD License (http:// www.opensource.org/licenses/bsd-license.php), the new BSD License (http://opensource.org/
licenses/BSD-3-Clause), the MIT License (http://www.opensource.org/licenses/mit-license.php), the Artistic License (http://www.opensource.org/licenses/artistic-
license-1.0) and the Initial Developer’s Public License Version 1.0 (http://www.firebirdsql.org/en/initial-developer-s-public-license-version-1-0/).
This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this
software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab.
For further information please visit http://www.extreme.indiana.edu/.
This product includes software Copyright (c) 2013 Frank Balluffi and Markus Moeller. All rights reserved. Permissions and limitations regarding this software are subject
to terms of the MIT license.
DISCLAIMER: Informatica LLC provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied
warranties of noninfringement, merchantability, or use for a particular purpose. Informatica LLC does not warrant that this software or documentation is error free. The
information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is
subject to change at any time without notice.
NOTICES
This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software
Corporation ("DataDirect") which are subject to the following terms and conditions:
1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT
INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT
LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.
Table of Contents
Chapter 3: Core Data Domains Accelerator
Core Data Domains Accelerator Overview
Data Domains in Core Accelerator
Core Data Domains Column Name Rules
Core Data Domains Data Rules
France Contact Data Cleansing Rules
France Corporate Data Cleansing Rules
France General Data Cleansing Rules
France Matching and Deduplication Rules
France Composite Rules
France Demonstration Mappings
United Kingdom Composite Rules
United Kingdom Demonstration Mappings
Preface
The Informatica Data Quality Accelerator Guide is written for data quality developers. This guide assumes
that you have an understanding of data quality concepts such as standardization, parsing, labeling, and
validation.
Informatica Resources
Informatica Documentation
The Informatica Documentation team makes every effort to create accurate, usable documentation. If you
have questions, comments, or ideas about this documentation, contact the Informatica Documentation team
through email at [email protected]. We will use your feedback to improve our
documentation. Let us know if we can contact you regarding your comments.
The Documentation team updates documentation as needed. To get the latest documentation for your
product, navigate to Product Documentation from https://mysupport.informatica.com.
Informatica Web Site
You can access the Informatica corporate web site at https://www.informatica.com. The site contains
information about Informatica, its background, upcoming events, and sales offices. You will also find product
and partner information. The services area of the site includes important information about technical support,
training and education, and implementation services.
Informatica Marketplace
The Informatica Marketplace is a forum where developers and partners can share solutions that augment,
extend, or enhance data integration implementations. By leveraging any of the hundreds of solutions
available on the Marketplace, you can improve your productivity and speed up time to implementation on
your projects. You can access Informatica Marketplace at http://www.informaticamarketplace.com.
Informatica Velocity
You can access Informatica Velocity at https://mysupport.informatica.com. Developed from the real-world
experience of hundreds of data management projects, Informatica Velocity represents the collective
knowledge of our consultants who have worked with organizations from around the world to plan, develop,
deploy, and maintain successful data management solutions. If you have questions, comments, or ideas
about Informatica Velocity, contact Informatica Professional Services at [email protected].
Online Support requires a user name and password. You can request a user name and password at
http://mysupport.informatica.com.
The telephone numbers for Informatica Global Customer Support are available from the Informatica web site
at http://www.informatica.com/us/services-and-training/support-services/global-support-centers/.
CHAPTER 1
Introduction to Accelerators
This chapter includes the following topics:
• Accelerators Overview
• Accelerator Structure
• Accelerator Installation
• Accelerator Components
• Tags and Rules
• Accelerator Use in PowerCenter
Accelerators Overview
Accelerators are content bundles that address common data quality problems in a country, a region, or an
industry. An accelerator might contain mapplets that you can use to analyze and enhance the data in an
organization. An accelerator might also contain data domains that you can use to discover the types of
information that the data contains.
You add the mapplets and data domains to the Model repository. Informatica configures the mapplets and the
data domains to respond to the business rules that you might define for the organization data. The
accelerators use the terms mapplet and rule to identify the mapplets. When you import the mapplets to the
Model repository, the Developer tool creates the mapplet objects in a folder named Rules.
Informatica Data Quality includes a Core accelerator and a Core Data Domain accelerator. You can buy and
download additional accelerators from Informatica.
Accelerator Structure
An accelerator is a compressed file that contains repository metadata files and other files in a directory
structure. The directory structure depends on the type of accelerator. General accelerators contain rules,
reference data objects, demonstration mappings, and demonstration data sources. Data Domain accelerators
contain rules, reference data objects, data domains, and data domain groups.
An accelerator contains the following directories:
• Accelerator_Content
• Accelerator_Sources
Accelerator_Content Directory
The Accelerator_Content directory contains the following components:
Note: If you export a mapping that contains a rule to PowerCenter, copy the dictionary files to a directory
that the PowerCenter Integration Service can read.
Accelerator_Sources Directory
The Accelerator_Sources directory contains the demonstration data file. The demonstration data file is a
compressed file that contains the source data for the demonstration mappings. Copy the source data file to
the file system.
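The directory layout described above can be checked before you start an import. The following Python sketch is an illustration only: it assumes that the accelerator is delivered as a zip archive, which this guide does not state, and the function names are hypothetical. It lists the top-level directories in the archive and extracts only the Accelerator_Sources content to the file system.

```python
import zipfile
from pathlib import PurePosixPath


def top_level_directories(archive_path):
    """Return the sorted top-level directory names found in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
    # Entries without a "/" are files at the archive root, not directories.
    return sorted({PurePosixPath(name).parts[0] for name in names if "/" in name})


def extract_sources(archive_path, destination):
    """Extract only the Accelerator_Sources members to the destination directory."""
    with zipfile.ZipFile(archive_path) as archive:
        members = [
            name for name in archive.namelist()
            if name.startswith("Accelerator_Sources/")
        ]
        archive.extractall(destination, members=members)
    return members
```

An archive that does not report both Accelerator_Content and Accelerator_Sources might be incomplete or might use a different layout, so treat the result as a sanity check and not as validation.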
Accelerator Installation
To install an accelerator, import the repository object metadata to a Model repository project, and copy the
demonstration data files to the file system. Use the Developer tool to import the repository objects.
When you import rules and demonstration mappings, select the repository project from the Object Explorer.
When you import data domains, select the repository project from the Preferences dialog box. In each case,
the import operation prompts you to select the compressed file that contains the reference data that the XML
file specifies.
The Core accelerator includes the following metadata file:
Informatica_Core_Accelerator_961.xml
When you import the metadata file, select the following reference data file:
Informatica_Core_Accelerator_961.zip
The Core Data Domain accelerator includes the following metadata file:
Informatica_IDE_DataDomain_961.xml
When you import the metadata file, select the following reference data file:
Informatica_IDE_DataDomain_961.zip
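The Informatica_IDE_DataDomain_961 file names above suggest that a metadata file and its reference data file share a base name and differ only in the extension. The following Python sketch relies on that naming pattern as an assumption. The function name is hypothetical, and other accelerators might not follow the pattern.

```python
from pathlib import Path


def reference_data_file(metadata_file):
    """Return the expected reference data archive name for an XML metadata file.

    Assumes that the archive shares the base name of the metadata file.
    """
    path = Path(metadata_file)
    if path.suffix.lower() != ".xml":
        raise ValueError(f"not an XML metadata file: {metadata_file}")
    return path.with_suffix(".zip").name
```

For example, reference_data_file("Informatica_IDE_DataDomain_961.xml") returns the name of the compressed file that the import operation prompts you to select.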
The following image shows the data domains in the Preferences dialog box:
<alt text: The left-hand side of the Preferences dialog box shows a range of Developer tool options that you
can browse and update. Expand the Informatica option and select Data Domain Glossary. The data domains
and data domain groups appear on the right-hand side of the dialog box.>
Consider the following rules and guidelines when you install an accelerator:
• Before you import or copy files, verify that you have all privileges on the Data Integration Service, the
Content Management Service, and the Analyst Service.
• Import the accelerators to a single Model repository project. Create the project before you import the
accelerators.
• Install the Core accelerator before you install another accelerator.
• Install the Core Data Domain accelerator before you install the Extended Data Domain accelerator.
• If you import a metadata file that contains an object in common with an accelerator that you imported
earlier, replace the object in the repository.
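The ordering guidelines above amount to a small dependency graph: every accelerator depends on the Core accelerator, and the Extended Data Domain accelerator also depends on the Core Data Domain accelerator. The following Python sketch derives a valid installation order from those two guidelines. The accelerator labels and function names are illustrative and are not identifiers that the Developer tool uses.

```python
from graphlib import TopologicalSorter  # Python 3.9 or later

CORE = "Core"
CORE_DATA_DOMAIN = "Core Data Domain"
EXTENDED_DATA_DOMAIN = "Extended Data Domain"


def prerequisites(name):
    """Return the accelerators that must be installed before the named one."""
    required = set()
    if name != CORE:
        required.add(CORE)
    if name == EXTENDED_DATA_DOMAIN:
        required.add(CORE_DATA_DOMAIN)
    return required


def install_order(accelerators):
    """Return the requested accelerators and their prerequisites in a valid order."""
    graph = {}
    pending = list(accelerators)
    while pending:
        name = pending.pop()
        if name in graph:
            continue
        graph[name] = prerequisites(name)
        # Queue the prerequisites so that their own prerequisites are resolved too.
        pending.extend(graph[name])
    return list(TopologicalSorter(graph).static_order())
```

A call such as install_order(["Extended Data Domain", "France"]) places the Core accelerator first and the Core Data Domain accelerator before the Extended Data Domain accelerator. The relative order of accelerators that do not depend on each other is not significant.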
To import rules and demonstration mappings, complete the following steps:
1. In the Developer tool, connect to the Model repository that contains the destination project for the
metadata.
2. In the Object Explorer, select the destination project.
For example, select the Informatica_DQ_Content project. If required, create a project in the Model
repository.
3. Select File > Import.
4. In the Import dialog box, select Informatica > Import Object Metadata File (Advanced).
5. Click Next.
6. Browse to the XML metadata file in the accelerator directory structure, and select the file.
7. Click Open, and click Next.
8. In the Source pane, select the items that appear under the project node.
9. In the Target pane, select the destination project.
10. Click Add to Target.
• If the repository project contains an object that you want to add, the Developer tool prompts you to
merge the object with the current object. Click Yes to merge the objects.
• If the Developer tool prompts you to rename the objects, click No.
• If any object remains in the Source pane, use the pointer to move the object to the target project.
11. Click Next.
12. Browse to the compressed reference data file in the accelerator directory structure, and select the file.
13. Click Open.
14. Verify that the code page is UTF-8, and click Next.
15. In the Target Connection field, select the reference data database.
16. Click Finish.
To import data domains, complete the following steps:
1. In the Developer tool, connect to the Model repository that contains the destination project for the
metadata.
2. Select Window > Preferences.
3. In the Preferences dialog box, expand the Informatica node and select Data Domain Glossary.
4. In the repository pane, select the top-level node for the data domains or the data domain groups.
5. Click Import.
6. Browse to the XML metadata file in the accelerator directory structure, and select the file.
7. Click Open, and click Next.
8. In the Source pane, select the data domain glossary project.
9. In the Target pane, select the destination project.
10. Select the following option in the Resolution field:
Replace object in target
11. Click Add Contents to Target.
• If the Developer tool prompts you to add the objects, click Yes.
• If the Developer tool prompts you to rename the objects, click No.
12. Click Next.
13. If the import operation identifies dependencies, copy the dependent objects from the source project to
the target project.
14. Click Next.
15. Browse to the compressed reference data file in the accelerator directory structure, and select the file.
16. Click Open.
17. Verify that the code page is UTF-8, and click Next.
18. In the Target Connection field, select the reference data database.
19. Click Finish.
Accelerator Components
When you import an accelerator, the Developer tool creates folders for the rules, data domains, and other
objects that the accelerator specifies. Each folder contains subfolders that organize the objects by country
and by the type of data quality operation that they perform.
Use the Core accelerator to create the folders in a repository project. When you import additional
accelerators, you add objects and folders to the project.
The following image shows the folders in the repository project:
<alt text: The Informatica D Q Content project contains top-level folders for reference tables, rules,
demonstration mappings, and content sets. The Rules folder contains subfolders for the different data quality
operations that the rules perform. The other folders contain subfolders that identify the region or industry that
the objects apply to.>
1. Dictionaries folder
2. Domain_Discovery folder
3. Rules folder
4. Rules_Demo folder
5. Content Sets folder
The project contains the following top-level folders:
Dictionaries
The Dictionaries folder contains reference table objects. Each object refers to a table in the reference
data database.
Domain_Discovery
The Domain_Discovery folder contains the rules that define the data domains in the accelerators that
you install. The folder contains a Data_Rules folder and a Metadata_Rules folder. The rules in the
Data_Rules folder correspond to the data domains that analyze column data values. The rules in the
Metadata_Rules folder correspond to the data domains that analyze column names.
Rules
The Rules folder contains the rules that you use to analyze and enhance data.
Rules_Demo
The Rules_Demo folder contains the demonstration mappings and demonstration data sources.
Content Sets
The Content Sets folder contains reference data objects that do not specify data in the reference data
database.
Rules
The accelerator rules define a range of data analysis and data transformation operations. You can add a
single rule or a series of rules to a mapping.
Address validation
Validate and enhance the data in postal address records. The rules require address reference data files.
Data parsing
Parse information from records. Parsing rules can extract multiple types of information, including person
names, organization names, telephone numbers, dates, and identification numbers.
Data standardization
Standardize the spelling and format of data values. Standardization rules can identify and correct
multiple types of information, including person names, organization names, telephone numbers, dates,
and identification numbers.
Duplicate analysis
Find duplicate records in a data set. Duplicate analysis rules compare the records in a data set and
generate a numeric score that represents the degree of similarity between the records.
The duplicate analysis rules can read records that contain general corporate data and records that
contain identity data. The identity data rules require identity population data files.
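To make the idea of a numeric similarity score concrete, the following Python sketch compares every pair of records in a small data set and reports the pairs that meet a threshold. It uses the SequenceMatcher class from the Python standard library as a stand-in. It does not reproduce the matching algorithms in the duplicate analysis rules, and the function names and the threshold value are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(record_a, record_b):
    """Return a score from 0.0 to 1.0 for the similarity of two records.

    Each record is a sequence of field values. The fields are joined into one
    lowercase string before the comparison.
    """
    text_a = " ".join(record_a).lower()
    text_b = " ".join(record_b).lower()
    return SequenceMatcher(None, text_a, text_b).ratio()


def candidate_duplicates(records, threshold=0.85):
    """Return (index_a, index_b, score) for each record pair at or above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs
```

Comparing every pair of records scales quadratically with the number of records, so a sketch like this one is practical only for small samples.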
The import operation adds the rules to the following repository folder:
[Informatica_DQ_Content]\Rules
Find the rules that perform address validation, data parsing, and data standardization operations in the Data
Cleansing subfolders in the accelerator project. Find the rules that perform duplicate analysis in the Matching
Deduplication subfolder in the accelerator project.
If you import rules for a country or region, you add a subfolder for composite rules. A composite rule
combines multiple rules in a nested format in a single rule.
The import operation adds the mappings and data source objects to the following repository folder:
[Informatica_DQ_Content]\Rules_Demo
When you import an accelerator, the import operation adds the data source for the demonstration mappings
to the Rules_Demo folder. Copy the data source files from the Accelerator_Sources directory to the file
system.
Data Domains
A data domain describes the data values that can represent a single type of business information in a
column. Use data domains to determine the type of information in a column and to find information of a
specified type in a column. The accelerators include data domains for a range of information types, including
Social Security numbers, credit card numbers, email addresses, and job titles.
For example, a database table might contain Social Security numbers in a Comments column that any user
can read. You must identify the records that contain the Social Security numbers and delete or move the
Social Security numbers. You add the SSN data domain to a profile, and you run the profile on the
Comments column.
You can assign a data domain to one or more data domain groups. Use the data domain groups to organize
the data domains based on the type of business analysis that the data domains perform. The data domain
glossary lists the data domains and data domain groups that you add to the Model repository. Use the
Preferences menu in the Developer tool to add data domains to the data domain glossary. To update the
data definitions in a data domain, use the rules in the data domain accelerator.
Note: You cannot view the data domain objects in the Object Explorer.
Reference Tables
A reference table contains standard and alternative versions of a set of data values. Rules use reference
tables to verify that data values are accurate and correctly formatted.
The import operation adds the reference tables to the following repository folder:
[Informatica_DQ_Content]\Dictionaries
Content Sets
A content set is a reference data object that does not store data in database tables. Content sets include
character sets, pattern sets, regular expressions, token sets, probabilistic models, and classifier models.
The import operation adds the content sets to the following repository folder:
[Informatica_DQ_Content]\Content Sets
Note: To view a list of the elements in a content set, open the content set in the Developer tool and select the
Tags tab.
Tags and Rules
Accelerator rules include tags that indicate the type of data that the rule can read and the type of operation
that the rule can perform.
To view the tags that apply to a rule, open the rule in the Developer tool and click the Tags tab. You can use
the Search options in the Developer tool to find accelerators that contain a tag that you specify.
The export operation copies the reference table data to the file system. Copy the files to the PowerCenter
Integration Service host machine. The reference data file locations in the PowerCenter directory structure
must correspond to the locations of the reference tables in the Model repository folder structure.
The following path describes a sample directory structure for the reference data objects in a PowerCenter
installation:
<Informatica_installation_directory>\services\<Model_repository_project_name>
\<Model_repository_project_folder_name>
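The sample path above can be assembled programmatically when you script the copy step. The following Python sketch only illustrates the path pattern. The function name and the installation directory are hypothetical, and the project and folder names are taken from the default accelerator project layout.

```python
from pathlib import PureWindowsPath


def reference_data_dir(install_dir: str, project: str, folder: str) -> PureWindowsPath:
    """Build a reference data location that mirrors the Model repository layout."""
    # Pattern described in the guide:
    # <Informatica_installation_directory>\services\<project_name>\<project_folder_name>
    return PureWindowsPath(install_dir) / "services" / project / folder


# Hypothetical installation directory, used for illustration only.
path = reference_data_dir(r"C:\Informatica\10.0", "Informatica_DQ_Content", "Dictionaries")
print(path)
```

The sketch uses PureWindowsPath so that the backslash separators match the sample path regardless of the machine that runs the script.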
Note: If the PowerCenter product version does not match the Developer tool version, verify that the
PowerCenter environment includes the Data Quality Integration Plug-in.
For more information about Data Quality integration with PowerCenter, read the Informatica Data Quality
Integration for PowerCenter User Guide.
Core Accelerator
The Core accelerator includes rules that perform the following data quality processes:
Find the address data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Address_Data_Cleansing
The following table describes the address data cleansing rules in the Core accelerator:
Name Description
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
The following table describes the contact data cleansing rules in the Core accelerator:
Name Description
rule_Email_Parse_and_Validate Parses email addresses from data fields and validates the format
of each email address.
rule_Email_Validation Validates the format of email addresses. The rule does not verify
that the email addresses are accurate or active. The rule returns
Valid or Invalid.
rule_Identify_Suspect_Names Identifies names that might not be genuine person names. The rule
compares the input values to a reference table of names that are
unlikely to be genuine. For example, the reference table includes
the names of fictional characters.
Find the corporate data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
The following table describes the corporate data cleansing rules in the Core accelerator:
Name Description
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules in the Core accelerator:
Name Description
rule_Add_Space_Around_Hyphen Adds a space before and after all dashes and hyphens in a string.
rule_AllTrim Removes all leading and trailing spaces from the input data fields.
rule_Assign_DQ_90_ElementInputStatus_Description Assigns a description to the Element Input Status output from the
Address Validator transformation. The description corresponds to
the output from Data Quality transformations in releases prior to
Data Quality 9.0.
rule_Assign_DQ_90_ElementResultStatus_Description Assigns a description to the Element Result Status output from the
Address Validator transformation. The description corresponds to
the output from Data Quality transformations in releases prior to
Data Quality 9.0.
rule_Assign_DQ_90_Match_Code_Description Assigns a description to the Match Code output from the Address
Validator transformation. The description corresponds to the output
from Data Quality transformations in releases prior to Data Quality
9.0.
rule_Compare_Dates Calculates the difference between two dates. The mapplet uses the
following units of measure:
- Hours
- Days
- Months
- Years
Each output expresses the full difference in its own unit,
independent of the other outputs. Do not add the outputs together
to calculate the difference between the dates.
rule_Completeness Checks a single port for NULL values. Returns "Complete" if the
port contains data. Returns "Incomplete" if the port is empty or
contains a NULL value.
rule_Completeness_Multi_Port Checks multiple ports for NULL values. Returns "Complete" if all
ports contain data. Returns "Incomplete" if any port is empty or
contains a NULL value.
rule_Convert_DQ90_Match_Codes_to_IDQ_86_Codes Converts the output from the Match Code port in an Address
Validator transformation to the equivalent address validation match
code in Data Quality 8.6.
rule_CreditCard_Number_Validation Validates credit card numbers for credit cards that use the Luhn
algorithm. Validation includes, but is not limited to, the following
credit cards:
- American Express
- Diners Club Carte Blanche
- Diners Club International
- Diners Club US & Canada
- Discover Card
- JCB
- Maestro
- Master Card
- Solo
- Switch
- Visa
- Visa Electron
The rule returns "Valid" or "Invalid."
rule_Date_Complete Verifies that the input string conforms to a date format that the rule
recognizes. The rule reads the following reference data object:
- user_defined_dates_infa
rule_Date_of_Birth_Validation Checks the number of years between a date of birth and the
current date. Returns "Adult" or "Minor" in addition to "Valid" if the
number of years is 120 or lower. Returns "Invalid" if the number of
years is greater than 120.
rule_Date_Parse Parses date data from a string to a port that the rule specifies. The
rule recognizes dates in the following formats:
- dd/mm/yyyy
- mm/dd/yyyy
- yyyy/dd/mm
The rule returns a date and also returns a string that contains the
input text without the date.
rule_Days_from_Current_Date Calculates the number of days between a specified date and the
current date.
rule_GTIN_Validation Validates a Global Trade Item Number (GTIN). The rule validates
eight-digit, twelve-digit, thirteen-digit, and fourteen-digit numbers.
The rule returns "Valid" if the check digit is correct for the number
and "Invalid" if the check digit is incorrect.
rule_IsNumeric Verifies that the input data is numeric. The rule returns "True" or
"False."
rule_Luhn_Algorithm Applies the Luhn algorithm to a numeric string. The rule can
validate numeric strings, such as credit card numbers.
rule_Parse_First_Word Parses the first word in an input string to a port that the rule
specifies.
rule_Parse_Number_At_End_Of_Line Parses any number that occurs at the end of an input string to a
port that the rule specifies. The rule reads strings from left to right.
rule_Parse_Number_At_Start_Of_Line Parses any number that occurs at the start of an input string to a
port that the rule specifies. The rule reads strings from left to right.
rule_Parse_Text_Between_Parentheses Parses strings that are enclosed in parentheses to a port that the
rule specifies. The rule contains an output port for the parsed
strings and an output port for the input text without the parsed
strings.
rule_Parse_Text_in_Single_Quotes Parses strings that are enclosed in quotation marks to a port that
the rule specifies. When the input data contains multiple quoted
elements, the rule parses the final element. The rule reads the
input strings from left to right. The rule contains an output port for
the parsed strings and an output port for the input text without the
parsed strings.
rule_Past_Date_Label Determines whether an input date is earlier than the system date
or later than the system date.
rule_Personal_Company_Identification Parses person names and company names to different ports that
the rule specifies. The rule has the following outputs:
- Person name
- Company name
- Data category, such as person name or company name
- Data that the rule cannot parse
rule_Remove_All_Leading_Zeros Removes all instances of the numeric character "0" from the
beginning of a string.
rule_Remove_Apostrophe Removes apostrophes. The rule merges the text strings on either
side of the apostrophe.
rule_Remove_Control_Characters Removes control characters from text strings. The rule returns a
string that contains the control characters and a string that
contains the input text without the control characters.
rule_Remove_Extra_Spaces Replaces all consecutive spaces with a single space and trims
leading and trailing spaces.
rule_Remove_Leading_Zero Removes a single instance of the numeric character "0" from the
beginning of a string.
rule_String_Completeness Checks a string for completeness. The rule also searches the input
strings for values in the reference table string_default_values_infa.
The reference table contains values such as NA, DEFAULT, and
XX. If an input string contains a value in the reference table, the
rule identifies the string as incomplete.
rule_TitleCase Converts strings to title case. In title case strings, the first letter of
each word is capitalized.
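Several rules in the table above depend on check-digit arithmetic. The following Python sketch illustrates the two publicly documented checks that the rule descriptions name: the Luhn algorithm and the GTIN check digit. It is not the accelerator's implementation. The function names are hypothetical, and the sketch returns Boolean values instead of the "Valid" and "Invalid" strings that the rules return.

```python
def luhn_valid(number: str) -> bool:
    """Return True if a string of digits passes the Luhn check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def gtin_valid(number: str) -> bool:
    """Return True if an 8-, 12-, 13-, or 14-digit GTIN has a correct check digit."""
    if not (number.isascii() and number.isdigit()):
        return False
    if len(number) not in (8, 12, 13, 14):
        return False
    body, check_digit = number[:-1], int(number[-1])
    # Weights alternate 3, 1, 3, 1 starting from the digit next to the check digit.
    total = sum(
        int(char) * (3 if position % 2 == 0 else 1)
        for position, char in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10 == check_digit


print(luhn_valid("4111111111111111"))  # widely used test card number
print(gtin_valid("4006381333931"))     # thirteen-digit example
```

A value that passes either check is structurally plausible only. As with rule_Email_Validation, a passing value is not proof that the card or the item exists.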
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
The following table describes the matching and deduplication rules in the Core accelerator:
Name Description
Find the product data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Product_Data_Cleansing
The following table describes the product data cleansing rules in the Core accelerator:
Name Description
rule_Parse_Quantity_And_UOM Parses the first instance of a quantity and a unit of measure from a
string to a port that the rule specifies. The rule reads the string
from left to right and returns the following data:
- Quantity.
- Unit of measure.
- The input string without the quantity and unit of measure values.
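The three outputs of rule_Parse_Quantity_And_UOM can be pictured with a small Python sketch. The sketch is an assumption-laden illustration, not the rule logic: the unit list and the function name are hypothetical, and the rule's actual reference data is not shown in this guide.

```python
import re

# Hypothetical unit list for illustration only.
UNITS = ("kg", "g", "ml", "l", "oz", "lb")

# Longer units come first so that "ml" is tried before "l".
_QUANTITY_UOM = re.compile(
    r"([0-9]+(?:\.[0-9]+)?)\s*("
    + "|".join(sorted(UNITS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def parse_quantity_and_uom(text: str):
    """Return (quantity, unit, remaining text) for the first match, reading left to right."""
    match = _QUANTITY_UOM.search(text)
    if match is None:
        return None, None, text
    # Remove the matched span and normalize the spacing that is left behind.
    remainder = " ".join((text[:match.start()] + text[match.end():]).split())
    return match.group(1), match.group(2), remainder


print(parse_quantity_and_uom("Olive oil 500 ml bottle 2 l"))
```

Only the first quantity and unit pair is extracted. Any later pair stays in the remaining text, which mirrors the "first instance" behavior that the table describes.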
m_customer_data_demo
m_product_demo
Use the data domains in the Core Data Domains accelerator to discover the functional meaning of the source
columns based on column names or column data.
The Core Data Domains accelerator includes the following types of rules:
• Data rule. Finds columns with data that matches the logic defined in the rule.
• Column name rule. Finds columns with column names that match column-name logic defined in the rule.
The data domain rules return Boolean values that indicate whether the column data or column name meets
the rule criteria. The data domain rules use regular expressions or reference tables to look for specific values
or matching patterns. For example, you can use a 9-digit rule expression to identify source data that matches
the Social Security number format. When you use expressions in data domain rules, some unrelated source
data values might also meet the rule expression criteria. For example, United States ZIP codes in the source
might meet the Social Security number format. To make the data domain inference effective, you must review
the data domain discovery results for discrepancies. After you have reviewed and verified the data domain
discovery results, you can choose to associate a data domain with a column.
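The false-positive risk described above is easy to reproduce. The following Python sketch assumes a simple nine-digit expression, which is an illustration and not the expression that the SSN data rule uses. The function name and the sample values are hypothetical.

```python
import re

# Illustrative nine-digit expression; the accelerator rule logic may differ.
NINE_DIGITS = re.compile(r"[0-9]{9}")


def matches_nine_digit_rule(value: str) -> bool:
    """Return True if the whole value consists of exactly nine digits."""
    return NINE_DIGITS.fullmatch(value.strip()) is not None


# Hypothetical column values: a nine-digit identifier, a ZIP+4 code
# written without a hyphen, a five-digit ZIP code, and unrelated text.
column = ["123456789", "902101234", "90210", "12-345"]
hits = [value for value in column if matches_nine_digit_rule(value)]
print(hits)
```

Both nine-digit values meet the expression, although only one of them was entered as a Social Security number. This is why the discovery results need review before you associate a data domain with a column.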
Data Domains in Core Accelerator
Use the predefined data domains in profiles to perform data domain discovery and identify critical data
characteristics within an enterprise.
AccountNumber
Discovers column names that contain the "a*c*num" or "acc" string.
Rule types: Column name rule. Data domain groups: Account_Bank, PCI, PHI.
CompanyName
Discovers column names that contain the "company" string and identifies the column data that
matches the organization-name values in a reference table.
Rule types: Column name rule, Data rule. Data domain groups: PII, Contact.
Country
Discovers column names that contain the "iso*countr*code" string, "iso*country" string, or
"countr*" string and identifies the column data that matches country names.
Rule types: Column name rule, Data rule. Data domain groups: PII, Address.
CreditCardNumber
Discovers column names that contain the "ccn" string, "cr*ca*nu" string, or "credit*no*" string
and identifies the column data that matches the credit card number format of multiple credit
card organizations.
Rule types: Column name rule, Data rule. Data domain groups: Account_Bank, PII, PCI.
Email
Discovers column names that contain the "email" string and identifies the column data that
matches a predefined email ID format.
Rule types: Column name rule, Data rule. Data domain groups: PHI, Contact.
FirstName
Discovers column names that contain the "f*nam*" string and identifies the column data that
matches values in a reference table with a list of first names.
Rule types: Column name rule, Data rule. Data domain groups: PCI, PII, Contact.
Gender
Discovers column names that contain the "gender" string or strings such as "female" and "male"
and identifies the column data that matches the gender values in a reference table.
Rule types: Column name rule, Data rule. Data domain groups: PII, Contact.
LastName
Discovers column names that contain the "lname" string, "su*name" string, or "last*name" string
and identifies the column data that matches values in a reference table with a list of last
names.
Rule types: Column name rule, Data rule. Data domain groups: PII, PCI, Contact.
PhoneNumber
Discovers column names that contain the "phone" string or "fax" string and identifies the
column data that matches the United States phone number format.
Rule types: Column name rule, Data rule. Data domain groups: PHI, Contact.
SSN
Discovers column names that contain the "SSN" string, "social*sec*no" string, or
"social* sec*num*" string and identifies the column data that matches the Social Security
number format.
Rule types: Column name rule, Data rule. Data domain groups: PHI, NationalID.
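The column-name strings in the data domain descriptions above, such as "a*c*num", read like wildcard patterns. The following Python sketch shows one way to picture the matching. It assumes that the asterisk stands for any run of characters, that the string can occur anywhere in the column name, and that case is ignored. These assumptions and the function name are illustrative and are not taken from the rule logic.

```python
from fnmatch import fnmatchcase

# Pattern strings copied from the AccountNumber data domain description.
ACCOUNT_NUMBER_PATTERNS = ["a*c*num", "acc"]


def column_name_matches(column_name: str, patterns) -> bool:
    """Return True if the column name contains any of the wildcard strings."""
    name = column_name.lower()
    # Wrap each pattern in asterisks so that it can match anywhere in the name.
    return any(fnmatchcase(name, f"*{pattern.lower()}*") for pattern in patterns)


for name in ["ACCT_NUM", "customer_account", "first_name", "vaccine_code"]:
    print(name, column_name_matches(name, ACCOUNT_NUMBER_PATTERNS))
```

Under these assumptions, "vaccine_code" also matches because it contains "acc". Name-based discovery is therefore a hint to review, not a final classification.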
You can find the column-name rules in the following repository location:
[Informatica_DQ_Content]\Domain_Discovery\MetaData_Rules
The following table describes the column-name rules in the Core Data Domains accelerator:
Name Description
dataDomain_MetaDataRule_Age Discovers a column name that contains the "age" string or "dob"
string.
dataDomain_MetaDataRule_JobPosition Discovers a column name that contains the "title" string, "position"
string, or "designation" string.
dataDomain_MetaDataRule_PhoneNumber Discovers a column name that contains the "phone" string or "fax"
string.
dataDomain_MetaDataRule_ZipCode Discovers a column name that contains the "zip" string or "pin"
string.
The following table describes the data rules in the Core Data Domains accelerator:
Name Description
dataDomain_DataRule_Age Identifies the column data with values from 1 through 120.
dataDomain_DataRule_BirthDay Identifies the column data that matches valid birth dates. The rule
verifies the number of years between the input date and current
date. The rule returns "Adult," "Minor," or "Valid" based on the
values from 1 through 120. The rule returns "Invalid" for all other
values.
dataDomain_DataRule_CreditCardNumber Identifies the column data that matches the credit card number
format of major credit card organizations, such as American
Express, Diners Club International, and Maestro.
dataDomain_DataRule_ExpirationDate Identifies the column data that matches expired credit card dates.
The rule compares the input date to the system date for validation.
dataDomain_DataRule_FirstName Identifies the column data that matches values in a reference table
with a list of first names.
dataDomain_DataRule_Gender Identifies the column data that matches the gender values in a
reference table.
dataDomain_DataRule_LastName Identifies the column data that matches values in a reference table
with a list of last names.
dataDomain_DataRule_PhoneNumber Identifies the column data that matches the United States phone
number format.
dataDomain_DataRule_SSN Identifies the column data that matches the Social Security number
format.
dataDomain_DataRule_State Identifies the column data that matches the state names in the
United States.
dataDomain_DataRule_URL Identifies the column data that matches predefined URL formats.
dataDomain_DataRule_ZipCode Identifies the column data that matches United States ZIP codes.
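The birth date check that dataDomain_DataRule_BirthDay and rule_Date_of_Birth_Validation describe can be sketched in Python as follows. The 120-year limit comes from the rule descriptions. The adult threshold of 18 years, the treatment of future dates as invalid, and the function names are assumptions made for illustration.

```python
from datetime import date

MAX_YEARS = 120   # limit stated in the rule descriptions
ADULT_YEARS = 18  # assumption: the guide does not state the adult threshold


def years_between(born: date, today: date) -> int:
    """Return the number of completed years between two dates."""
    years = today.year - born.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (born.month, born.day):
        years -= 1
    return years


def classify_birth_date(born: date, today: date):
    """Return a (validity, label) pair in the spirit of the rule outputs."""
    years = years_between(born, today)
    # Assumption: a date of birth in the future is treated as invalid.
    if years < 0 or years > MAX_YEARS:
        return "Invalid", None
    return "Valid", "Adult" if years >= ADULT_YEARS else "Minor"


print(classify_birth_date(date(2000, 1, 1), date(2015, 11, 1)))
```

Passing the current date as an argument keeps the sketch deterministic. The accelerator rules compare against the current date at run time.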
Use the data domains in the Extended accelerator to discover the functional meaning of the columns in the
source based on column names or column data.
The Extended Data Domains accelerator includes the following types of rules:
• Data rule. Finds columns with data that matches the logic defined in the rule.
• Column name rule. Finds columns with column names that match column-name logic defined in the rule.
The data domain rules return Boolean values that indicate whether the column data or column name meets
the rule criteria. The data domain rules use regular expressions or reference tables to look for specific values
or matching patterns. For example, you can use a 9-digit rule expression to identify source data that matches
the Social Security number format. When you use expressions in data domain rules, some unrelated source
data values might also meet the rule expression criteria. For example, United States ZIP codes in the source
might meet the Social Security number format. To make the data domain inference effective, you must review
the data domain discovery results for discrepancies. After you have reviewed and verified the data domain
discovery results, you can choose to associate a data domain with a column.
Data Domains in Extended Accelerator
Use the predefined data domains in profiles to perform data domain discovery and identify critical data
characteristics within an enterprise.
The following table describes the data domains available in the Extended Data Domains accelerator
package:
AccountNumber
Discovers column names that contain the "a*c*num" or "acc" string.
Rule types: Column name rule. Data domain groups: Account_Bank, PCI, PHI.
CompanyName
Discovers column names that contain the "company" string and identifies the column data that
matches the organization-name values in a reference table.
Rule types: Column name rule, Data rule. Data domain groups: Contact, PII.
Country
Discovers column names that contain the "iso*countr*code" string, "iso*country" string, or
"countr*" string and identifies the column data that matches country names.
Rule types: Column name rule, Data rule. Data domain groups: Address, PII.
CreditCardNumber
Discovers column names that contain the "ccn" string, "cr*ca*nu" string, or "credit*no*" string
and identifies the column data that matches the credit card number format of multiple credit
card organizations.
Rule types: Column name rule, Data rule. Data domain groups: Account_Bank, PCI, PII.
Email
Discovers column names that contain the "email" string and identifies the column data that
matches a predefined email ID format.
Rule types: Column name rule, Data rule. Data domain groups: Contact, PHI.
FirstName
Discovers column names that contain the "f*nam*" string and identifies the column data that
matches values in a reference table of first names.
Rule types: Column name rule, Data rule. Data domain groups: Contact, PCI, PII.
Gender
Discovers column names that contain the "gender" string or strings such as "female" and "male"
and identifies the column data that matches the gender values in a reference table.
Rule types: Column name rule, Data rule. Data domain groups: Contact, PII.
Geocode_Latitude
Discovers column names that contain the "latitude" string and identifies the column data that
matches valid latitude coordinates.
Rule types: Column name rule, Data rule. Data domain groups: Address, General.
Geocode_LatitudeLongitude
Discovers column names that contain strings such as "latitude," "longitude," and "geocode" and
identifies column data that matches valid latitude or longitude coordinates.
Rule types: Column name rule, Data rule. Data domain groups: Address, General.
Geocode_Longitude
Discovers column names that contain the "longitude" string and identifies the column data that
matches valid longitude coordinates.
Rule types: Column name rule, Data rule. Data domain groups: Address, General.
LastName
Discovers column names that contain the "lname" string, "su*name" string, or "last*name" string
and identifies the column data that matches values in a reference table of last names.
Rule types: Column name rule, Data rule. Data domain groups: Contact, PCI, PII.
PhoneNumber
Discovers column names that contain the "phone" string or "fax" string and identifies the
column data that matches the United States phone number format.
Rule types: Column name rule, Data rule. Data domain groups: Contact, PHI.
SSN
Discovers column names that contain the "SSN" string, "social*sec*no" string, or
"social* sec*num*" string and identifies the column data that matches the Social Security
number format.
Rule types: Column name rule, Data rule. Data domain groups: NationalID, PHI.
You can find the column-name rules in the following repository location:
[Informatica_DQ_Content]\Domain_Discovery\MetaData_Rules
The following table describes the column-name rules in the Extended Data Domains accelerator:
Name Description
dataDomain_MetaDataRule_Age Discovers a column name that contains the "age" string or "dob"
string.
dataDomain_MetaDataRule_JobPosition Discovers a column name that contains the "title" string, "position"
string, or "designation" string.
dataDomain_MetaDataRule_PhoneNumber Discovers a column name that contains the "phone" string or "fax"
string.
dataDomain_MetaDataRule_ZipCode Discovers a column name that contains the "zip" string or "pin"
string.
The following table describes the data rules in the Extended Data Domains accelerator:
Name Description
dataDomain_DataRule_Account_Status Identifies the column data that matches account status values in a
reference table.
dataDomain_DataRule_AUT_NATID Identifies the column data that matches the Austrian national ID
format.
dataDomain_DataRule_BGR_NATID Identifies the column data that matches the Bulgarian national ID
format.
dataDomain_DataRule_BIC_SWIFTCode Identifies the column data that matches Bank Identifier Code (BIC)
or Society for Worldwide Interbank Financial Telecommunication
(SWIFT) code by pattern recognition and country code.
dataDomain_DataRule_BRA_IDDoc Identifies the column data that matches the number format of the
Brazilian ID card titled Registro Geral.
dataDomain_DataRule_BRA_Personal_ID Identifies the column data that matches the Brazilian personal ID
format.
dataDomain_DataRule_CHN_NATID Identifies the column data that matches the Chinese national ID
format.
dataDomain_DataRule_Computer_Address Identifies the column data that matches the format of IP addresses
and MAC addresses.
dataDomain_DataRule_CreditCard_AMEX Identifies the column data that matches the American Express
credit card number format.
dataDomain_DataRule_CreditCard_DinersCard Identifies the column data that matches the Diners Club
International credit card number format.
dataDomain_DataRule_CreditCard_DiscoverCard Identifies the column data that matches the Discover credit card
number format.
dataDomain_DataRule_CreditCard_JCB Identifies the column data that matches the JCB International
credit card number format.
dataDomain_DataRule_CreditCard_MasterCard Identifies the column data that matches the MasterCard credit card
number format.
dataDomain_DataRule_CreditCard_Visa Identifies the column data that matches the Visa credit card
number format.
dataDomain_DataRule_Date_Validation Identifies the date strings in the source data that appear in a single
format in a date column. To configure the date format that the rule
uses for validation, open the dq_ValidateDate Expression
transformation in the rule and update the In_Date_Format
expression variable. The default format is "MM/DD/YYYY." The
rule returns "Valid" or "Invalid."
dataDomain_DataRule_Date_Validation_All_Formats Identifies the date values in the column data and standardizes the
column data to a specific date format.
dataDomain_DataRule_DNK_NATID Identifies the column data that matches the Danish national ID
format.
dataDomain_DataRule_DriversLicense Identifies the column data that matches United Kingdom,
United States, and Canadian driver license numbers based on the
length and pattern requirements.
dataDomain_DataRule_DriversLicense_Canada Identifies the column data that matches Canadian driver license
numbers, except for the provinces of British Columbia, Quebec,
Manitoba, and Prince Edward Island.
dataDomain_DataRule_DriversLicense_GBR Identifies the column data that matches United Kingdom driver
license numbers.
dataDomain_DataRule_DriversLicense_USA Identifies the column data that matches the driver license numbers
of most of the states in the United States.
dataDomain_DataRule_FIN_NATID Identifies the column data that matches the Finnish national ID
format.
dataDomain_DataRule_FRA_INSEE Identifies the column data that matches the French Institut National
de la Statistique et des Études Économiques (INSEE) number
format.
dataDomain_DataRule_GBR_NINO Identifies the column data that matches the United Kingdom
National Insurance number format.
dataDomain_DataRule_GBR_Passport_Number Identifies the column data that matches the United Kingdom
passport number format.
dataDomain_DataRule_HostName Identifies the column data that matches valid host names.
dataDomain_DataRule_HRV_NATID Identifies the column data that matches the Croatian national ID
format.
dataDomain_DataRule_IBAN Identifies the column data that matches the International Bank
Account Number format of multiple European countries.
dataDomain_DataRule_IND_NATID Identifies the column data that matches the Indian Permanent
Account Number format.
dataDomain_DataRule_IND_Passport Identifies the column data that matches the Indian passport
number format.
dataDomain_DataRule_ISBN Identifies the column data that matches the International Standard
Book Number format.
dataDomain_DataRule_ItalyFiscalCode Identifies the column data that matches the Italian national ID
format.
dataDomain_DataRule_KOR_NATID Identifies the column data that matches the Korean national ID
format.
dataDomain_DataRule_Latitude Identifies the column data that matches valid latitude coordinates.
dataDomain_DataRule_LatitudeLongitude Identifies the column data that matches valid pairs of latitude and
longitude coordinates, each pair separated by a semicolon.
dataDomain_DataRule_NOR_NATID Identifies the column data that matches the Norwegian national ID
format.
dataDomain_DataRule_PostCode Identifies the column data that matches the postal codes of
multiple countries.
dataDomain_DataRule_ROU_NATID Identifies the column data that matches the Romanian national ID
format.
dataDomain_DataRule_SouthAfrica_NATID Identifies the column data that matches the South African national
ID format.
dataDomain_DataRule_SWE_NATID Identifies the column data that matches the Swedish national ID
format.
dataDomain_DataRule_TWN_NATID Identifies the column data that matches the Taiwanese national ID
format.
dataDomain_DataRule_URL Identifies the column data that matches predefined URL formats.
dataDomain_DataRule_US_Zip5 Identifies the column data that matches United States ZIP codes.
dataDomain_DataRule_USA_SSN_post_2011June Identifies the column data that matches the Social Security number
format in length, numeric values, and minimum and maximum
values of the area, group, and serial number sections. Based on
the SSN Randomization effective June 25, 2011, the rule does not
verify the issuance of a Social Security number or the group and
area number combination.
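The structural check that the last rule in the table describes can be sketched in Python. The ranges in the sketch follow the publicly documented numbering scheme: the area number is never 000, 666, or 900 through 999, the group number is never 00, and the serial number is never 0000. Whether the accelerator rule applies exactly these limits is an assumption, and the function name is hypothetical.

```python
import re

# Three-digit area, two-digit group, four-digit serial, with optional hyphens.
_SSN_LAYOUT = re.compile(r"([0-9]{3})-?([0-9]{2})-?([0-9]{4})")


def ssn_structure_valid(value: str) -> bool:
    """Check length, digits, and section ranges. Does not verify issuance."""
    match = _SSN_LAYOUT.fullmatch(value.strip())
    if match is None:
        return False
    area, group, serial = (int(part) for part in match.groups())
    # Area numbers 000, 666, and 900 through 999 are never assigned.
    if area == 0 or area == 666 or area >= 900:
        return False
    # Group 00 and serial 0000 are never assigned.
    return group != 0 and serial != 0


for sample in ["078-05-1120", "666-12-3456", "123-00-4567", "12345678"]:
    print(sample, ssn_structure_valid(sample))
```

Because numbers are assigned randomly after June 25, 2011, a check of this kind confirms structure only. It cannot confirm that a number was issued.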
Australia/New Zealand Accelerator
The Australia/New Zealand accelerator includes rules that perform the following data quality processes:
Australia/New Zealand Address Data Cleansing Rules
Use the address data cleansing rules to parse, standardize, and validate address data.
Find the address data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Address_Data_Cleansing
The following table describes the address data cleansing rules in the Australia/New Zealand accelerator:
Name Description
rule_AUS_Address_Parse_Hybrid Parses unstructured Australian addresses into address elements. The rule
does not validate the addresses. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_AUS_Address_Parse_Multiline Parses unstructured Australian addresses into address elements. The rule
does not validate the addresses. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_AUS_Address_Validation_Discrete_w_Geocoding Validates the deliverability of Australian addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Discrete input ports on the Address Validator
transformation.
rule_AUS_Address_Validation_Discrete Validates the deliverability of Australian addresses. The rule corrects errors in
the input addresses where possible. Use the rule when you can connect the
input address fields to the Discrete input ports on the Address Validator
transformation.
rule_AUS_Address_Validation_Hybrid_w_Geocoding Validates the deliverability of Australian addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_AUS_Address_Validation_Hybrid Validates the deliverability of Australian addresses. The rule corrects errors in
the input addresses where possible. Use the rule when you can connect the
input address fields to the Hybrid input ports on the Address Validator
transformation.
rule_AUS_Address_Validation_Multiline_w_Geocoding Validates the deliverability of Australian addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_AUS_Address_Validation_Multiline Validates the deliverability of Australian addresses. The rule corrects errors in
the input addresses where possible. Use the rule when you can connect the
input address fields to the Multiline input ports on the Address Validator
transformation.
rule_NZL_Address_Parse_Hybrid Parses unstructured New Zealand addresses into address elements. The rule
does not validate the addresses. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_NZL_Address_Parse_Multiline Parses unstructured New Zealand addresses into address elements. The rule
does not validate the addresses. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_NZL_Address_Validation_Discrete_w_Geocoding Validates the deliverability of New Zealand addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Discrete input ports on the Address Validator
transformation.
rule_NZL_Address_Validation_Discrete Validates the deliverability of New Zealand addresses. The rule corrects errors
in the input addresses where possible. Use the rule when you can connect the
input address fields to the Discrete input ports on the Address Validator
transformation.
rule_NZL_Address_Validation_Hybrid_w_Geocoding Validates the deliverability of New Zealand addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_NZL_Address_Validation_ Validates the deliverability of New Zealand addresses. The rule corrects errors
Hybrid in the input addresses where possible. Use the rule when you can connect the
input address fields to the Hybrid input ports on the Address Validator
transformation.
rule_NZL_Address_Validation_ Validates the deliverability of New Zealand addresses and adds latitude and
Multiline_w_Geocoding longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_NZL_Address_Validation_ Validates the deliverability of New Zealand addresses. The rule corrects errors
Multiline in the input addresses where possible. Use the rule when you can connect the
input address fields to the Multiline input ports on the Address Validator
transformation.
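The rule names in the table above follow a visible pattern: a country code, a Parse or Validation operation, the input port group, and an optional geocoding suffix. The following minimal sketch composes a rule name from those choices. The function `address_rule_name` and its parameters are hypothetical and exist only for illustration. The sketch does not call any Informatica API.

```python
# Hypothetical helper that mirrors the naming pattern in the table above.
# It only builds a string; it does not look anything up in a repository.
COUNTRIES = ("AUS", "NZL")
PORT_GROUPS = ("Discrete", "Hybrid", "Multiline")


def address_rule_name(country, port_group, validate=True, geocode=False):
    """Return the rule name that matches the requested options."""
    if country not in COUNTRIES:
        raise ValueError(f"unsupported country code: {country}")
    if port_group not in PORT_GROUPS:
        raise ValueError(f"unsupported port group: {port_group}")

    if not validate:
        # The table lists parse rules for Hybrid and Multiline ports only,
        # and parse rules do not add coordinates.
        if port_group == "Discrete":
            raise ValueError("no parse rule is listed for Discrete ports")
        if geocode:
            raise ValueError("parse rules do not add geocodes")
        return f"rule_{country}_Address_Parse_{port_group}"

    suffix = "_w_Geocoding" if geocode else ""
    return f"rule_{country}_Address_Validation_{port_group}{suffix}"


print(address_rule_name("AUS", "Hybrid", validate=False))
print(address_rule_name("NZL", "Multiline", geocode=True))
```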
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
Name Description
rule_AUS_Driver_Licence_Num Validates Australian driver's license numbers based on length and pattern
ber_Validation requirements.
rule_AUS_Gender_Assignment Assigns gender according to first names. The rule returns "M" for male names,
"F" for female names, and "U" if the gender is unknown. For example, the rule
assigns the name "John Smith" a gender of "M" for male.
rule_AUS_Multi_Person_Name Parses person name values into separate ports. The rule creates ports for
_Parse values such as title, first name, middle name, and surname.
The rule output includes a port that contains the full name of the person in the
record. You can use the full name port as an input to a Match transformation in
an identity match analysis mapping.
When the name data identifies more than one person, the rule creates an
output port for each full name. For example, the rule can read the name "John
and Jane Smith" and create output ports for "John Smith" and "Jane Smith."
rule_AUS_Phone_Number_Parse Parses an Australian telephone number from a string. The rule parses the first
telephone number in the data, reading from right to left.
The rule recognizes telephone numbers that use leading zeros, international
dialing codes, or extensions that begin with the hash symbol. The rule
processes the following punctuation symbols: the plus sign, parentheses, and
the hash symbol. Before you run the rule, remove all other punctuation,
including double spaces.
The rule returns a telephone number and also returns a string that contains the
input text with the telephone number removed.
rule_AUS_Phone_Number_Vali Validates the area code and length of Australian telephone numbers. The rule
dation returns the region of the telephone number, as well as codes that indicate if
the area code and length of a telephone number are valid.
rule_AUS_Tax_File_Number_V Validates Australian Tax File Numbers (TFN) based on the check digit in each
alidation number.
rule_NZL_Gender_Assignment Assigns gender according to New Zealand first names. The rule returns "M" for
male names, "F" for female names, and "U" if the gender is unknown. For
example, the rule assigns the name "John Smith" a gender of "M" for male.
rule_NZL_IRD_Number_Parse Parses nine-digit numeric strings as New Zealand Inland Revenue Department
numbers (IRD).
rule_NZL_IRD_Number_Validat Validates New Zealand Inland Revenue Department numbers (IRD) based on
e the check digit in each number.
rule_NZL_Phone_Number_Par Parses a New Zealand telephone number from a string. The rule parses the
se first telephone number in the data, reading from right to left.
The rule recognizes telephone numbers that use leading zeros, international
dialing codes, or extensions that begin with the hash symbol. The rule
processes the following punctuation symbols: the plus sign, parentheses, and
the hash symbol. Before you run the rule, remove all other punctuation,
including double spaces.
The rule returns a telephone number and also returns a string that contains the
input text with the telephone number removed.
rule_NZL_Phone_Number_Validation Validates the area code and length of New Zealand telephone numbers. The
rule returns the region of the telephone number, as well as codes that indicate
if the area code and length of a telephone number are valid.
rule_Prename_Assignment Generates an honorific according to the gender. You can change the
female_prename expression variable from Ms. to Mrs.
rule_Salutation_Assignment Generates formal and casual greetings from prenames and name tokens. For
example, when input data contains "Mr. John Smith," the rule generates the
formal greeting "Dear Mr. Smith," and the casual greeting "Dear John,". You
can change the prefix and punctuation by editing the variables in the
dq_Generate_Salutation Expression transformation.
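The Tax File Number and IRD number validation rules in the table above rely on check digits. The guide does not describe the calculations, so the following sketch uses the publicly documented weighted-sum schemes as an assumption: modulus 11 over nine weighted digits for a TFN, and modulus 11 with a fallback set of weights for an IRD number. The function names are hypothetical, the TFN check covers nine-digit numbers only, and the IRD check omits any range test on the number.

```python
TFN_WEIGHTS = (1, 4, 3, 7, 5, 8, 6, 9, 10)
IRD_PRIMARY = (3, 2, 7, 6, 5, 4, 3, 2)
IRD_SECONDARY = (7, 4, 3, 2, 5, 2, 7, 6)


def _digits(value):
    """Return the digits of value as integers, or None if it is not numeric."""
    cleaned = value.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return None
    return [int(c) for c in cleaned]


def is_valid_tfn(value):
    """Assumed scheme: the weighted sum of nine digits is divisible by 11."""
    digits = _digits(value)
    if digits is None or len(digits) != 9:
        return False
    return sum(w * d for w, d in zip(TFN_WEIGHTS, digits)) % 11 == 0


def is_valid_ird(value):
    """Assumed scheme: modulus 11 check digit with fallback weights."""
    digits = _digits(value)
    if digits is None or len(digits) not in (8, 9):
        return False
    digits = [0] * (9 - len(digits)) + digits  # pad eight-digit numbers
    base, check = digits[:8], digits[8]
    for weights in (IRD_PRIMARY, IRD_SECONDARY):
        remainder = sum(w * d for w, d in zip(weights, base)) % 11
        expected = 0 if remainder == 0 else 11 - remainder
        if expected != 10:
            return expected == check
        # A result of 10 cannot be a check digit, so try the next weights.
    return False


print(is_valid_tfn("123 456 782"))
print(is_valid_ird("136410132"))
```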
• rule_Email_Validation
For more information about these rules, see “Core Contact Data Cleansing Rules” on page 23.
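The description of rule_AUS_Multi_Person_Name_Parse gives the example of "John and Jane Smith" producing "John Smith" and "Jane Smith". The following sketch illustrates that one behavior under simple assumptions: names are joined by "and" or "&", and a lone first name borrows the last token of the final name as its surname. The function `split_shared_surname` is hypothetical and is far simpler than the rule, which also handles titles and middle names.

```python
import re


def split_shared_surname(full_name):
    """Split a combined name into one full name per person."""
    parts = re.split(r"\s+(?:and|&)\s+", full_name.strip(), flags=re.IGNORECASE)
    if len(parts) == 1:
        # Only one person: normalize the spacing and return the name.
        return [" ".join(parts[0].split())]

    # Assumption: the last token of the final name is the shared surname.
    surname = parts[-1].split()[-1]
    names = []
    for part in parts:
        tokens = part.split()
        if len(tokens) == 1:
            tokens.append(surname)  # lone first name borrows the surname
        names.append(" ".join(tokens))
    return names


print(split_shared_surname("John and Jane Smith"))
```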
Find the corporate data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
The following table describes the corporate data cleansing rules in the Australia/New Zealand accelerator:
Name Description
rule_AUS_Business_Number_Standardize Standardizes Australian Business Numbers (ABN) to the NN NNN NNN NNN
format. The rule requires that the input is an 11-digit string.
rule_AUS_Business_Number_ Validates Australian Business Numbers (ABN) based on the check digit in
Validation each number.
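The two ABN rules above can be illustrated with a short sketch. The NN NNN NNN NNN format and the 11-digit input requirement come from the table. The check calculation is an assumption based on the publicly documented ABN scheme (subtract 1 from the first digit, apply fixed weights, and test divisibility by 89), because the guide does not describe it. The function names are hypothetical.

```python
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def standardize_abn(value):
    """Format an 11-digit string as NN NNN NNN NNN."""
    if not (len(value) == 11 and value.isascii() and value.isdigit()):
        raise ValueError("expected an 11-digit string")
    return f"{value[:2]} {value[2:5]} {value[5:8]} {value[8:]}"


def is_valid_abn(value):
    """Assumed scheme: weighted sum of the adjusted digits is divisible by 89."""
    digits = [int(c) for c in value if c in "0123456789"]
    if len(digits) != 11:
        return False
    digits[0] -= 1  # the scheme subtracts 1 from the leading digit
    return sum(w * d for w, d in zip(ABN_WEIGHTS, digits)) % 89 == 0


print(standardize_abn("51824753556"))
print(is_valid_abn("51 824 753 556"))
```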
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
Name Description
rule_AUS_NZL_NER_Field_Identification Identifies the type of information contained in an input field. The rule
can identify names, Personal IDs, company names, dates, and
address data from Australia and New Zealand. The rule returns a
label that describes the type of input data. The rule uses probabilistic
matching techniques to identify the types of information.
• rule_Assign_DQ_90_GeocodingStatus_Description
• rule_Assign_DQ_90_Mailability_Score_Description
• rule_Assign_DQ_90_Match_Code_Descriptions
• rule_Remove_Extra_Spaces
• rule_Remove_Hyphen
• rule_Remove_Leading_Zero
• rule_Remove_Period_Parentheses
• rule_Remove_Punctuation
• rule_Remove_Punctuation_and_Space
• rule_Remove_Space
• rule_Replace_Limited_Punct_with_Space
• rule_UpperCase
For more information about these rules, see “Core General Data Cleansing Rules” on page 24.
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
Name Description
mplt_AUS_Firstname_and_TF Uses field match strategies to identify duplicate rows in Australian data based
N_Match on Tax File Numbers (TFN) and first names. The mapplet generates group
keys from the TFN data.
mplt_AUS_IMO_Company_Na Uses identity match strategies to identify duplicate rows for Australian data
me_and_Address_Match based on company names and addresses. The mapplet generates group keys
from the postal code data.
mplt_AUS_IMO_Familyname_a Uses identity match strategies to identify duplicate rows in Australian data
nd_Address_Match based on family names and addresses. The mapplet generates group keys
from the postal code data.
mplt_AUS_IMO_Individual_Na Uses identity match strategies to identify duplicate rows in Australian data
me_and_Address_Match based on person names and addresses. The mapplet generates group keys
from the postal code data.
mplt_AUS_IMO_Personal_Nam Uses identity match strategies to identify duplicate rows in Australian data
e_and_Data based on person names and personal data. The fields in the personal data
column must contain a single type of data, such as a telephone number, email,
or Tax File Number. The mapplet generates group keys from the personal
data.
mplt_AUS_Individual_Name_and_Address_Match Uses field match strategies to identify duplicate rows based on person names
and Australian address data. The mapplet uses a combination of characters
from the surname values and the postal code values to generate group keys.
mplt_AUS_Individual_Name_an Uses field match strategies to identify duplicate rows based on Australian
d_Date_Match person names and dates. The mapplet generates group keys from the date
data.
mplt_AUS_Individual_Name_an Uses field match strategies to identify duplicate rows based on email
d_Email_Match addresses and Australian person names. The mapplet generates group keys
from the email address data.
mplt_AUS_Individual_Name_an Uses field match strategies to identify duplicate rows based on Australian
d_Phone_Match person names and telephone numbers. The mapplet generates group keys
from the telephone number data.
mplt_AUS_Individual_Name_an Uses field match strategies to identify duplicate rows for Australian data based
d_TFN_Match on Tax File Numbers (TFN) and person names. The mapplet generates group
keys from the TFN data.
mplt_AUS_Individual_Name_M Uses field match strategies to identify duplicate rows based on Australian
atch person names. The mapplet generates NYSIIS codes from the surname values
and uses the NYSIIS codes as group keys.
mplt_AUS_NZL_Company_Na Uses field match strategies to identify duplicate rows based on company name
me_and_Address_Match and address data from Australia and New Zealand. The mapplet uses a
combination of characters from the company name values and the postal code
values to generate group keys.
mplt_AUS_NZL_Familyname_a Uses field match strategies to identify duplicate rows based on surname and
nd_Address_Match address data from Australia and New Zealand. The mapplet uses a
combination of characters from the surname values and the postal code values
to generate group keys.
mplt_Company_Name_Match Uses field match strategies to identify duplicate rows based on company name.
The mapplet generates Soundex codes from the company name values and
uses the Soundex codes as group keys.
mplt_NZL_Firstname_and_IRD Uses field match strategies to identify duplicate rows for New Zealand data
_Match based on Inland Revenue Department (IRD) numbers and first names. The
mapplet generates group keys from the IRD number.
mplt_NZL_IMO_Company_Na Uses identity match strategies to identify duplicate rows in New Zealand data
me_and_Address_Match based on company names and addresses. The mapplet generates group keys
from the postal code data.
mplt_NZL_IMO_Familyname_a Uses identity match strategies to identify duplicate rows in New Zealand data
nd_Address_Match based on family names and addresses. The mapplet generates group keys
from the postal code data.
mplt_NZL_IMO_Individual_Na Uses identity match strategies to identify duplicate rows in New Zealand data
me_and_Address_Match based on person names and addresses. The mapplet generates group keys
from the postal code data.
mplt_NZL_IMO_Personal_Nam Uses identity match strategies to identify duplicate rows in New Zealand data
e_and_Data based on person names and personal data. The fields in the personal data
column must contain a single type of data, such as telephone number, email,
or Inland Revenue Department number. The mapplet generates group keys
from the personal data.
mplt_NZL_Individual_Name_an Uses field match strategies to identify duplicate rows based on person names
d_Address_Match and New Zealand address data. The mapplet uses a combination of characters
from the surname values and the postal code values to generate group keys.
mplt_NZL_Individual_Name_an Uses field match strategies to identify duplicate rows based on New Zealand
d_Date_Match person names and dates. The mapplet generates group keys from the date
data.
mplt_NZL_Individual_Name_an Uses field match strategies to identify duplicate rows based on email
d_Email_Match addresses and New Zealand person names. The mapplet generates group
keys from the email address data.
mplt_NZL_Individual_Name_an Uses field match strategies to identify duplicate rows based on New Zealand
d_IRD_Match person names and Inland Revenue Department (IRD) numbers. The mapplet
generates group keys from the IRD number.
mplt_NZL_Individual_Name_an Uses field match strategies to identify duplicate rows based on New Zealand
d_Phone_Match person names and telephone numbers. The mapplet generates group keys
from the telephone number data.
mplt_NZL_Individual_Name_M Uses field match strategies to identify duplicate rows based on New Zealand
atch person names. The mapplet generates NYSIIS codes from the surname values
and uses the NYSIIS codes as group keys.
rule_AUS_NZL_Company_Nam Generates a match score based on company names and addresses from
e_and_Address_MatchScore Australia and New Zealand.
rule_AUS_NZL_Familyname_a Generates a match score based on surnames and addresses from Australia
nd_Address_MatchScore and New Zealand.
rule_AUS_NZL_Firstname_and Generates a match score based on first names and personal identification
_PID_MatchScore numbers.
rule_AUS_NZL_Individual_Nam Generates a match score based on person names and addresses from
e_and_Address_MatchScore Australia and New Zealand.
rule_AUS_NZL_Individual_Nam Generates a match score based on person names and personal identification
e_and_PID_MatchScore numbers.
rule_Individual_Name_and_Email_MatchScore Generates a match score based on person names and email addresses.
rule_Individual_Name_and_Phone_MatchScore Generates a match score based on person names and telephone numbers.
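Several mapplets in the table above generate group keys from a combination of characters from the surname values and the postal code values. The guide does not state how many characters each mapplet uses, so the following sketch assumes three surname characters and two postal code digits. The names `group_key` and `group_records` are hypothetical. Records that share a key fall into the same group, and match analysis then compares records within each group only.

```python
from collections import defaultdict


def group_key(surname, postal_code, surname_chars=3, postal_chars=2):
    """Build a key from leading surname characters and postal code digits."""
    name_part = "".join(c for c in surname.upper() if c.isalnum())[:surname_chars]
    post_part = "".join(c for c in postal_code if c.isdigit())[:postal_chars]
    return f"{name_part}{post_part}"


def group_records(records):
    """Collect records under their group key."""
    groups = defaultdict(list)
    for record in records:
        key = group_key(record["surname"], record["postal_code"])
        groups[key].append(record)
    return dict(groups)


sample = [
    {"surname": "Smith", "postal_code": "2000"},
    {"surname": "Smithe", "postal_code": "2001"},
    {"surname": "Jones", "postal_code": "2000"},
]
for key, members in group_records(sample).items():
    print(key, len(members))
```

Shorter keys produce larger groups and more comparisons. Longer keys reduce the workload but can separate true duplicates that differ in the keyed characters.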
Name Description
rule_AUS_Contact_Data Parses, standardizes, and validates Australian contact data, such as addresses,
telephone numbers, and Tax File Numbers.
The following table lists the names and repository locations of the rules in the composite rule for Australian
contact data:
Rule Location
rule_Assign_DQ_90_Mailability_Score_Description [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_AUS_Address_Validation_Hybrid [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_AUS_Company_Name_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_AUS_Gender_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_AUS_Multi_Person_Name_Parse [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_AUS_Phone_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_AUS_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_AUS_Tax_File_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Email_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Salutation_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
Name Description
rule_NZL_Contact_Data Parses, standardizes, and validates New Zealand contact data, such as addresses,
telephone numbers, and Inland Revenue Department (IRD) numbers.
The following table lists the names and repository locations of the rules in the composite rule for New
Zealand contact data:
Rule Location
rule_Assign_DQ_90_Mailability_Score_Description [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_AUS_Company_Name_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_AUS_Multi_Person_Name_Parse [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Email_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_NZL_Address_Standardization [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_NZL_Gender_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_NZL_IRD_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_NZL_IRD_Number_Validate [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_NZL_Phone_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_NZL_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Salutation_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
m_AUS_customer_data_demo
m_AUS_customer_matching_demo
Parses and standardizes identity data from Australia and New Zealand and performs identity match
analysis on the data.
The mapping analyzes the following data combinations and generates match clusters for each
combination:
Brazil Accelerator
This chapter includes the following topics:
The Brazil accelerator includes rules that perform the following data quality processes:
Find the address data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Address_Data_Cleansing
The following table describes the address data cleansing rules in the Brazil accelerator:
Name Description
rule_BRA_Address_Parse_Hyb Parses unstructured Brazilian addresses into address elements. The rule does
rid not validate the addresses. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_BRA_Address_Parse_Mult Parses unstructured Brazilian addresses into address elements. The rule does
iline not validate the addresses. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_BRA_Address_Validation_ Validates the deliverability of Brazilian addresses and adds latitude and
Discrete_w_Geocoding longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Discrete input ports on the Address Validator
transformation.
rule_BRA_Address_Validation_ Validates the deliverability of Brazilian addresses. The rule corrects errors in
Discrete the input addresses where possible. Use the rule when you can connect the
input address fields to the Discrete input ports on the Address Validator
transformation.
rule_BRA_Address_Validation_ Validates the deliverability of Brazilian addresses and adds latitude and
Hybrid_w_Geocoding longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_BRA_Address_Validation_ Validates the deliverability of Brazilian addresses. The rule corrects errors in
Hybrid the input addresses where possible. Use the rule when you can connect the
input address fields to the Hybrid input ports on the Address Validator
transformation.
rule_BRA_Address_Validation_ Validates the deliverability of Brazilian addresses and adds latitude and
Multiline_w_Geocoding longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_BRA_Address_Validation_ Validates the deliverability of Brazilian addresses. The rule corrects errors in
Multiline the input addresses where possible. Use the rule when you can connect the
input address fields to the Multiline input ports on the Address Validator
transformation.
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
Name Description
rule_BRA_Gender_Assignment Assigns gender according to first name. The rule returns "M" for male names,
"F" for female names, and "U" if the gender is unknown. For example, the rule
assigns the name "Joao Coelho" a gender of "M" for male.
rule_BRA_Personal_CPF_Validation Validates check digits for Cadastro de Pessoas Físicas (CPF) numbers.
rule_BRA_Personal_Name_Par Parses person name values into separate ports. The rule creates ports for
se_Validate values such as title, first name, middle name, and surname. The rule also
indicates if the name might be a company name and validates the spelling of
the name.
The rule output includes a port that contains the full name of the person in the
record. You can use the full name port as an input to a Match transformation in
an identity match analysis mapping.
rule_BRA_Phone_Number_Par Parses a Brazilian telephone number from a string. The rule parses the first
se telephone number in the data, reading from left to right. The rule returns a
telephone number and also returns a string that contains the input text with the
telephone number removed.
rule_BRA_Phone_Number_Sta Standardizes Brazilian telephone numbers. The rule returns the telephone
ndardization number in the following formats:
- Standard - nn nnnn nnnn
- Dashes - nn-nnnn-nnnn
- No Spaces - nnnnnnnnnn
rule_BRA_Phone_Validatation Validates the area code and length of Brazilian telephone numbers. The rule
returns codes that indicate if the area code and length of a telephone number
are valid.
rule_BRA_Prename_Assignme Generates an honorific according to the gender. You can change the
nt female_prename expression variable from "Sra" to "Sta".
rule_BRA_Salutation_Assignm Generates formal and casual greetings from prenames and name tokens. For
ent example, when input data contains "Sr. Joao Coelho," the rule generates the
formal greeting "Prezado Sr. Coelho," and the casual greeting "Prezado
Joao,". You can change the prefix and punctuation by editing the variables in
the dq_Generate_Salutation Expression transformation.
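The three output formats listed for rule_BRA_Phone_Number_Standardization can be sketched as follows. The sketch assumes that the input reduces to a ten-digit national number, which is what the format masks in the table show. The function name and the dictionary keys are hypothetical.

```python
def standardize_bra_phone(value):
    """Return the Standard, Dashes, and No Spaces forms of a ten-digit number."""
    digits = "".join(c for c in value if c in "0123456789")
    if len(digits) != 10:
        raise ValueError("expected ten digits after removing punctuation")
    area, first, last = digits[:2], digits[2:6], digits[6:]
    return {
        "standard": f"{area} {first} {last}",   # nn nnnn nnnn
        "dashes": f"{area}-{first}-{last}",     # nn-nnnn-nnnn
        "no_spaces": digits,                    # nnnnnnnnnn
    }


print(standardize_bra_phone("(11) 3456-7890"))
```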
• rule_Email_Parse_Into_Mailbox_Domain
• rule_Email_Validation
For more information about these rules, see “Core Contact Data Cleansing Rules” on page 23.
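The rule_BRA_Personal_CPF_Validation rule validates the check digits in a CPF number. The guide does not describe the calculation, so the following sketch assumes the publicly documented scheme: each of the two check digits is a modulus 11 result over the preceding digits with descending weights. The function names are hypothetical. Rejecting numbers that repeat a single digit is a common convention and is also an assumption here.

```python
def _cpf_check_digit(digits):
    """Compute one check digit from the digits that precede it."""
    weights = range(len(digits) + 1, 1, -1)  # 10..2 or 11..2
    remainder = sum(w * d for w, d in zip(weights, digits)) * 10 % 11
    return 0 if remainder == 10 else remainder


def is_valid_cpf(value):
    """Assumed scheme: the last two digits are modulus 11 check digits."""
    digits = [int(c) for c in value if c in "0123456789"]
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    first = _cpf_check_digit(digits[:9])
    second = _cpf_check_digit(digits[:9] + [first])
    return digits[9:] == [first, second]


print(is_valid_cpf("111.444.777-35"))
```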
Find the corporate data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
The following table describes the corporate data cleansing rules in the Brazil accelerator:
Name Description
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules in the Brazil accelerator:
Name Description
rule_BRA_NER_Field_Identification Identifies the type of information contained in an input field. The rule can
identify names, Personal IDs, company names, dates, and Brazilian
address data. The rule returns a label that describes the type of input data.
The rule uses reference data and probabilistic matching techniques to
identify the types of information.
• rule_Assign_DQ_90_GeocodingStatus_Description
• rule_Assign_DQ_90_Mailability_Score_Description
• rule_Assign_DQ_90_Match_Code_Descriptions
• rule_Remove_Extra_Spaces
• rule_Remove_Non_Numbers
• rule_Remove_Punctuation_and_Space
• rule_Remove_Punctuation
• rule_Replace_Limited_Punct_with_Space
• rule_TitleCase
• rule_UpperCase
For more information about these rules, see “Core General Data Cleansing Rules” on page 24.
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
The following table describes the matching and deduplication rules in the Brazil accelerator:
Name Description
mplt_BRA_Company_Name_an Uses field match strategies to identify duplicate rows based on company name
d_Address_Match and Brazilian address data. The mapplet uses a combination of characters
from the company name values and the postal code values to generate group
keys.
mplt_BRA_Familyname_and_A Uses identity match strategies to identify duplicate rows in Brazilian data
ddress_Match based on family names and addresses. The mapplet uses a combination of
characters from the surname values and the postal code values to generate
group keys.
mplt_BRA_Firstname_and_CP Uses field match strategies to identify duplicate rows based on first name and
F_Match Cadastro de Pessoas Físicas (CPF) number. The mapplet generates group
keys from the CPF number.
mplt_BRA_IMO_Company_Na Uses identity match strategies to identify duplicate rows in Brazilian data
me_and_Address_Match based on company names and addresses. The mapplet generates group keys
from the postal code data.
mplt_BRA_IMO_Familyname_a Uses identity match strategies to identify duplicate rows in Brazilian data
nd_Address_Match based on family names and addresses. The mapplet generates group keys
from the postal code data.
mplt_BRA_IMO_Individual_Na Uses identity match strategies to identify duplicate rows in Brazilian data
me_and_Address_Match based on person names and addresses. The mapplet generates group keys
from the postal code data.
mplt_BRA_IMO_Personal_Nam Uses identity match strategies to identify duplicate rows in Brazilian data
e_and_Data based on person names and personal data. The fields in the personal data
column must contain a single type of data, such as telephone number, email,
or Cadastro de Pessoas Físicas number. The mapplet generates group keys
from the personal data.
mplt_BRA_Individual_Name_an Uses field match strategies to identify duplicate rows based on person names
d_Address_Match and Brazilian address data. The mapplet uses a combination of characters
from the surname values and the postal code values to generate group keys.
mplt_BRA_Individual_Name_an Uses field match strategies to identify duplicate rows based on Brazilian
d_CPF_Match person names and Cadastro de Pessoas Físicas (CPF) numbers. The mapplet
generates group keys from the CPF number.
mplt_BRA_Individual_Name_an Uses field match strategies to identify duplicate rows based on Brazilian
d_Date_Match person names and date data. The mapplet generates group keys from the date
data.
mplt_BRA_Individual_Name_an Uses field match strategies to identify duplicate rows based on Brazilian
d_Email_Match person names and email addresses. The mapplet generates group keys from
email address data.
mplt_BRA_Individual_Name_and_Phone_Match Uses field match strategies to identify duplicate rows based on Brazilian
person names and telephone numbers. The mapplet generates group keys
from the telephone number data.
mplt_Company_Name_Match Uses field match strategies to identify duplicate rows based on company name.
The mapplet generates Soundex codes from the company name values and
uses the Soundex codes as group keys.
rule_BRA_Company_Name_an Generates a match score based on company names and Brazilian address
d_Address_MatchScore data.
rule_BRA_Familyname_and_Address_MatchScore Generates a match score based on surnames and Brazilian address data.
rule_BRA_Firstname_and_CPF_MatchScore Generates a match score based on first name and Cadastro de Pessoas
Físicas (CPF) number.
rule_BRA_Individual_Name_and_Address_MatchScore Generates a match score based on person names and Brazilian address data.
rule_BRA_Individual_Name_and_CPF_MatchScore Generates a match score based on person names and Cadastro de Pessoas
Físicas (CPF) numbers.
rule_BRA_Individual_Name_and_Phone_MatchScore Generates a match score based on person names and telephone numbers.
rule_Individual_Name_and_Email_MatchScore Generates a match score based on person names and email addresses.
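The mplt_Company_Name_Match mapplet in the table above uses Soundex codes as group keys. The following sketch implements the widely used American Soundex variant to show how names that sound alike can share a key. Whether the mapplet uses exactly this variant is an assumption, and the function `soundex` is hypothetical.

```python
def soundex(name):
    """Return a four-character American Soundex code for name."""
    codes = {}
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    ):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


for company in ("Robert", "Rupert", "Tymczak"):
    print(company, soundex(company))
```

Because the code keeps the first letter and at most three digits, long names that differ only near the end map to the same group key.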
Name Description
rule_BRA_Contact_Data Parses, standardizes, and validates Brazilian contact data, such as addresses,
telephone numbers, and Cadastro de Pessoas Físicas (CPF) numbers.
The following table lists the names and repository locations of the rules in the composite rule for Brazilian
contact data:
Rule Location
rule_Assign_DQ_90_Mailability_Score_Description [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_BRA_Address_Validation_Hybrid [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_BRA_Company_Suffix_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_BRA_Personal_CPF_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_BRA_Personal_Name_Parse_Validate [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_BRA_Phone_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_BRA_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_BRA_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_BRA_Salutation_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Email_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
m_BRA_customer_data_demo
m_BRA_customer_matching_demo
Parses and standardizes identity data from Brazil and performs identity match analysis on the data.
The Financial Services accelerator includes rules that perform the following data quality processes:
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
The following table describes the contact data cleansing rule in the Financial Services accelerator:
Name Description
Find the financial data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Financial_Data_Cleansing
The following table describes the financial data cleansing rules in the Financial Services accelerator:
Name Description
rule_Account_Status_Validation Validates the account status. The rule requires account status reference data.
rule_Accrual_Period_Validation Validates that the start date is earlier than the end date.
rule_Age_For_Account_Validat Validates the customer age for the account type. The rule uses the
ion age_per_account_infa reference table. You must update the reference table
with your own data.
rule_Beta_Coefficient_Validati Validates that the Beta coefficient string is a number. The rule indicates that
on the string is a positive number, negative number, zero, or not a number.
rule_BIC_SWIFT_Code_Valida Validates a Bank Identifier Code (BIC) or Society for Worldwide Interbank
tion Financial Telecommunication (SWIFT) code by pattern recognition and country
code validation.
rule_CAN_Transit_Number_Validation Validates the format of a Canadian transit number against the formats used in
paper and electronic fund transactions.
rule_Credit_Card_Expiry_Chec Validates a credit card expiration date. The rule compares the credit card
k expiration date to the system date and identifies expired dates. The rule
accepts a seven character string in the format MM/YYYY.
rule_Credit_Card_Security_Co Validates that the credit card security code is a whole number that contains
de_Validation three or four digits.
rule_Currency_Code_Country_ Validates that the currency code is valid for the ISO three-character country
Validation code.
rule_Currency_Code_Validatio Validates the currency code. The rule returns "Valid" or "Invalid."
n
rule_CUSIP_Validation Validates the format and length of the check digit value. The rule returns a
status that describes the validity of the check digit value and a message that
explains the status.
rule_Dividend_Yield_Validation Validates that the dividend yield string is a number greater than or equal to
zero. The rule returns whether the string is a positive number, negative
number, zero, or not a number.
rule_EAD_Drawn_Balance_Validation Validates that the amount listed in the exposure at default (EAD) is not less
than the drawn balance. The rule follows the guidelines for EAD calculation by
the Financial Services Authority in the United Kingdom.
rule_EAD_Validation Validates that the exposure at default (EAD) string is a number. The rule
returns whether the string is a positive number, negative number, zero, or not a
number.
rule_EPS_Validation Validates that the input is a number greater than or equal to zero.
rule_Ex_Dividend_Date_Validation Validates that the ex-dividend date and the record date are valid dates and that
the ex-dividend date is earlier than the record date. The rule identifies dates
with a difference of more than 15 days as not valid. The rule returns the
difference in days between the record date and the ex-dividend date.
rule_Gamma_Validation Validates that the Gamma string is a number. The rule returns whether the
string is a positive number, negative number, zero, or not a number.
rule_GBR_Bank_Account_Parse Parses eight-digit numeric strings as United Kingdom bank account numbers.
rule_GBR_Bank_Account_Validation Validates United Kingdom bank account numbers. The rule returns codes that
indicate whether the input is numeric and whether it is the correct number of
digits.
rule_GBR_Bank_Sort_Code_Parse Parses six-digit numeric strings as United Kingdom bank sort codes. The rule
parses strings of numbers in the following formats:
- Consecutive numbers (999999)
- Numbers delimited with a dash (99-99-99)
rule_GBR_Bank_Sort_Code_Standardize Standardizes a United Kingdom bank sort code to the format "NN-NN-NN."
rule_GBR_Bank_Sort_Code_Validation Validates the format and length of United Kingdom bank sort codes that are
standardized to the dash-delimited format (99-99-99). The rule returns a Status
port that describes the validity of the sort code and a Validation Note port that
explains the status. If the sort code prefix matches a known assignment for a
United Kingdom bank, the Validation Note port includes the bank name.
rule_Interest_Rate_Within_Range Validates if the decimal interest rate value is within the specified range. The
range is set by the two variable ports in the Expression transformation. The
rule returns "True" or "False."
rule_Loan_to_Value_Ratio Calculates the loan to value ratio, which is the loan amount divided by the
property value.
rule_Loss_Given_Default_Validation Validates that the string is numeric and a positive, negative, or zero value.
rule_Market_Cap_Validation Validates that the input is a number greater than or equal to zero.
rule_Maturity_Date_Validation Validates that the maturity date is greater than the system date.
rule_Price_Earnings_Ratio_Validation Validates that the price-to-earnings ratio is a positive number in the range
0 to 100.
rule_Probability_of_Default_Validation Validates that the probability of default value is numeric and indicates if it is a
positive, negative, or zero value. If the value is positive, the rule returns status messages
for values in the following ranges:
- <= .1
- > .1 and <= .5
- > .5 and <= 1
- > 1
rule_Rating_Code_Validation Validates that a rating is in the Standard & Poor's ratings scale, the Moody's
ratings scale, or in a user-defined list.
rule_Rating_Date_Validation Validates that the rating date is one year greater than the system date.
rule_SEDOL_Validation Validates a Stock Exchange Daily Official List (SEDOL) code by checking its
format and check digit.
rule_Stock_Exchange_Validation Validates most stock exchanges worldwide by name and symbol.
rule_USA_Routing_Number_Validation Validates a standard magnetic ink character recognition line (MICR) formatted
routing number. Validates the Associated Federal Reserve Bank, the structure
of the input, and the checksum calculation.
rule_Volatility_Validation Validates that the volatility value is a number greater than or equal to zero.
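Several rules in the table above (rule_CUSIP_Validation, rule_SEDOL_Validation, and rule_USA_Routing_Number_Validation) rely on a check digit or checksum. The following Python sketch illustrates the publicly documented check-digit formulas for these identifiers. It is not the accelerator's own logic: the function names are hypothetical, and the sketch covers only the arithmetic, not the reference-data lookups (for example, the Federal Reserve Bank check) that the rules may also perform.

```python
import string

# CUSIP character values: digits keep their value, A=10 ... Z=35, *=36, @=37, #=38.
_CUSIP_VALUES = {c: i for i, c in enumerate(string.digits + string.ascii_uppercase + "*@#")}

# SEDOL position weights; vowels are not used in SEDOL codes.
_SEDOL_WEIGHTS = (1, 3, 1, 7, 3, 9, 1)
_SEDOL_ALPHABET = string.digits + "BCDFGHJKLMNPQRSTVWXYZ"


def cusip_is_valid(code: str) -> bool:
    """Check a 9-character CUSIP against its final check digit."""
    code = code.strip().upper()
    if len(code) != 9 or code[8] not in string.digits:
        return False
    if any(c not in _CUSIP_VALUES for c in code[:8]):
        return False
    total = 0
    for position, char in enumerate(code[:8], start=1):
        value = _CUSIP_VALUES[char]
        if position % 2 == 0:  # every second character is doubled
            value *= 2
        total += value // 10 + value % 10  # add the digits of the value
    return (10 - total % 10) % 10 == int(code[8])


def sedol_is_valid(code: str) -> bool:
    """Check a 7-character SEDOL: the weighted sum must be a multiple of 10."""
    code = code.strip().upper()
    if len(code) != 7 or code[6] not in string.digits:
        return False
    if any(c not in _SEDOL_ALPHABET for c in code[:6]):
        return False
    # int(c, 36) maps 0-9 to 0-9 and A-Z to 10-35.
    total = sum(w * int(c, 36) for w, c in zip(_SEDOL_WEIGHTS, code))
    return total % 10 == 0


def aba_checksum_is_valid(number: str) -> bool:
    """Check the 3-7-1 weighted checksum of a 9-digit US routing number."""
    number = number.strip()
    if len(number) != 9 or any(c not in string.digits for c in number):
        return False
    d = [int(c) for c in number]
    total = 3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7]) + (d[2] + d[5] + d[8])
    return total % 10 == 0
```

For example, `cusip_is_valid("037833100")`, `sedol_is_valid("0263494")`, and `aba_checksum_is_valid("021000021")` each return `True`, and changing the last digit of any of these samples makes the check fail.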
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rule in the Financial Services accelerator:
Name Description
• rule_Remove_Punctuation
• rule_Remove_Punctuation_and_Space
• rule_Remove_Space
• rule_UpperCase
For more information about these rules, see “Core General Data Cleansing Rules” on page 24.
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
The following table describes the matching and deduplication rules in the Financial Services accelerator:
Name Description
mplt_Company_Name_and_Address_Match Identifies duplicate rows based on company name and United States address
data. The mapplet uses a combination of characters from the company name
values and the postal code values to generate group keys.
mplt_Company_Name_Match Identifies duplicate rows based on company name. The mapplet generates
Soundex codes from the company name values and uses the Soundex codes
as group keys.
mplt_Familyname_and_Address_Match Identifies duplicate rows based on surname and United States address data.
The mapplet uses a combination of characters from the surname values and
the postal code values to generate group keys.
mplt_Individual_Name_and_Address_Match Identifies duplicate rows based on person names and United States address
data. The mapplet generates NYSIIS codes from the surname values and uses
the NYSIIS codes as group keys.
mplt_Individual_Name_and_Date_Match Identifies duplicate rows based on person names and date data. The mapplet
generates group keys from the date data.
mplt_Individual_Name_and_Email_Match Identifies duplicate rows based on person names and email addresses. The
mapplet generates group keys from the email address data.
mplt_Individual_Name_and_Phone_Match Identifies duplicate rows based on person names and telephone numbers. The
mapplet generates group keys from telephone numbers.
mplt_Individual_Name_Match Identifies duplicate rows based on person names. The mapplet generates
NYSIIS codes from the surname values and uses the NYSIIS codes as group
keys.
rule_Company_Name_and_Address_MatchScore Generates a match score based on company names and United States
addresses.
rule_Familyname_and_Address_MatchScore Generates a match score based on surnames and United States addresses.
rule_Individual_Name_and_Address_MatchScore Generates a match score based on person names and United States
addresses.
rule_Individual_Name_and_Email_MatchScore Generates a match score based on person names and email addresses.
rule_Individual_Name_and_Phone_MatchScore Generates a match score based on person names and telephone numbers.
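The description of mplt_Company_Name_Match above states that Soundex codes serve as group keys. As a rough illustration of that idea, the following Python sketch computes a standard American Soundex code and groups values that share a code. The function names are hypothetical, and the sketch assumes that the whole name string is encoded; the mapplet's actual configuration may differ.

```python
from collections import defaultdict

# Standard Soundex digit assignments for consonants.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a name, or "" if it has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the previous code
    return (result + "000")[:4]


def group_by_soundex(values):
    """Place each value in a group keyed by its Soundex code."""
    groups = defaultdict(list)
    for value in values:
        groups[soundex(value)].append(value)
    return dict(groups)
```

With this sketch, `group_by_soundex(["Robert", "Rupert", "Smith"])` places "Robert" and "Rupert" in the same group (key `R163`), so only those two values would be compared with each other in a later match step.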
France Accelerator
The France accelerator includes rules that perform the following data quality processes:
Find the address data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Address_Data_Cleansing
The following table describes the address data cleansing rules in the France accelerator:
Name Description
rule_FRA_Address_Parse_Hybrid Parses unstructured French addresses into address elements. The rule does
not validate the addresses. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_FRA_Address_Parse_Multiline Parses unstructured French addresses into address elements. The rule does
not validate the addresses. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_FRA_Address_Validation_Discrete_w_Geocoding Validates the deliverability of French addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Discrete input ports on the Address Validator
transformation.
rule_FRA_Address_Validation_Discrete Validates the deliverability of French addresses. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Discrete input ports on the Address Validator
transformation.
rule_FRA_Address_Validation_Hybrid_w_Geocoding Validates the deliverability of French addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_FRA_Address_Validation_Hybrid Validates the deliverability of French addresses. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_FRA_Address_Validation_Multiline_w_Geocoding Validates the deliverability of French addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_FRA_Address_Validation_Multiline Validates the deliverability of French addresses. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
Name Description
rule_FRA_Gender_Assignment Assigns gender according to first names. The rule returns "M" for male names,
"F" for female names, and "U" if the gender is unknown. For example, the rule
assigns the name "Jean Leclerc" a gender of "M" for male.
rule_FRA_INSEE_Validation Validates the INSEE number based on the gender, date, and Code Officiel
Géographique (COG) values.
rule_FRA_Multi_Person_Name_Parse Parses person name values into separate ports. The rule creates ports for
values such as title, first name, middle name, and surname.
The rule output includes a port that contains the full name of the person in the
record. You can use the full name port as an input to a Match transformation in
an identity match analysis mapping.
When the name data identifies more than one person, the rule creates an
output port for each full name. For example, the rule can read the name "Jean
et Marianne Leclerc" and create output ports for "Jean Leclerc" and "Marianne
Leclerc."
rule_FRA_Phone_Number_Parse Parses a French telephone number from a string. The rule parses the first
telephone number in the data, reading from right to left.
The rule recognizes telephone numbers that use leading zeros, international
dialing codes, or extensions that begin with the hash symbol. The rule
processes the following punctuation symbols: the plus sign, parentheses, and
the hash symbol. Before you run the rule, remove all other punctuation,
including double spaces.
The rule returns a telephone number and also returns a string that contains the
input text with the telephone number removed.
rule_FRA_Phone_Number_Validation Validates the area code and length of French telephone numbers. The rule
returns the region of the telephone number, as well as codes that indicate if
the area code and length of a telephone number are valid.
rule_FRA_Salutation_Assignment Generates formal and casual greetings from prenames and name tokens. For
example, when input data contains "M. Jean Leclerc," the rule generates the
formal greeting "Monsieur Leclerc," and the casual greeting "Cher Jean,". You
can change the prefix and punctuation by editing the variables in the
dq_Generate_Salutation Expression transformation.
• rule_Email_Validation
For more information about these rules, see “Core Contact Data Cleansing Rules” on page 23.
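The multi-person behavior described for rule_FRA_Multi_Person_Name_Parse, where "Jean et Marianne Leclerc" yields "Jean Leclerc" and "Marianne Leclerc", can be illustrated with a short Python sketch. This is only an approximation under stated assumptions: the function name is hypothetical, the two people are assumed to be joined by a single conjunction, and the last token is assumed to be a surname that both people share. The actual rule handles titles, middle names, and other cases that the sketch ignores.

```python
def split_joint_name(full_name: str, conjunction: str = "et") -> list:
    """Split a joint name such as "Jean et Marianne Leclerc" into full names."""
    tokens = full_name.split()
    lowered = [t.lower() for t in tokens]
    if conjunction.lower() not in lowered:
        return [" ".join(tokens)]  # a single person: return the name unchanged

    index = lowered.index(conjunction.lower())
    first, second = tokens[:index], tokens[index + 1:]
    if not first or len(second) < 2:
        return [" ".join(tokens)]  # not enough tokens to split safely

    surname = second[-1]  # assumption: the shared surname is the last token
    if len(first) == 1:
        first = first + [surname]  # give the first person the shared surname
    return [" ".join(first), " ".join(second)]
```

Calling `split_joint_name("Jean et Marianne Leclerc")` returns `["Jean Leclerc", "Marianne Leclerc"]`. Each returned full name could then feed a separate full name port of the kind the rule description mentions.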
Find the corporate data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
The following table describes the corporate data cleansing rules from the France accelerator:
Name Description
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules in the France accelerator:
Name Description
rule_FRA_NER_Field_Identification Identifies the type of information contained in an input field. The rule can
identify names, personal IDs, company names, dates, and French address
data. The rule returns a label that describes the type of input data. The rule
uses reference data and probabilistic matching techniques to identify the
types of information.
The France accelerator depends on the following general data cleansing rules from the Core accelerator:
• rule_Assign_DQ_90_GeocodingStatus_Description
• rule_Assign_DQ_90_Mailability_Score_Description
• rule_Assign_DQ_90_Match_Code_Description
• rule_Luhn_Algorithm
• rule_Remove_Extra_Spaces
• rule_Remove_Parentheses
• rule_Remove_Punctuation
• rule_Remove_Punctuation_and_Space
• rule_Replace_Limited_Punct_with_Space
• rule_UpperCase
For more information about these rules, see “Core General Data Cleansing Rules” on page 24.
The matching and deduplication rules in the France accelerator install to the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
Name Description
mplt_Company_Name_Match Uses field match strategies to identify duplicate rows based on company
names. The mapplet generates Soundex codes from the company name
values and uses the Soundex codes as group keys.
mplt_FRA_Company_Name_and_Address_Match Uses field match strategies to identify duplicate rows based on company
names and addresses. The mapplet uses a combination of characters from the
company name values and the postal code values to generate group keys.
mplt_FRA_Familyname_and_Address_Match Uses field match strategies to identify duplicate rows based on family names
and addresses. The mapplet uses a combination of characters from the
surname values and the postal code values to generate group keys.
mplt_FRA_Firstname_and_INSEE_Match Uses field match strategies to identify duplicate rows based on the French
Institut National de la Statistique et des Études Économiques (INSEE) number.
The mapplet generates group keys from the INSEE number data.
mplt_FRA_Firstname_Surname_DOB_and_Postcode_Match Uses field match strategies to identify duplicate rows of personal names, date
of birth, and postal codes. The mapplet generates group keys from the postal
code data.
mplt_FRA_IMO_Company_Name_and_Address_Match Uses identity match strategies to identify duplicate rows in French data based
on company names and addresses. The mapplet generates group keys from
the postal code data.
mplt_FRA_IMO_Familyname_and_Address_Match Uses identity match strategies to identify duplicate rows in French data based
on family names and addresses. The mapplet generates group keys from the
postal code data.
mplt_FRA_IMO_Individual_Name_and_Address_Match Uses identity match strategies to identify duplicate rows in French data based
on person names and addresses. The mapplet generates group keys from the
postal code data.
mplt_FRA_IMO_Personal_Name_and_Data Uses identity match strategies to identify duplicate rows in French data based
on person names and personal data. The fields in the personal data column
must contain a single type of data, such as telephone number, email, or Institut
National de la Statistique et des Études Économiques (INSEE) number. The
mapplet generates group keys from the personal data.
mplt_FRA_Individual_Name_and_Date_Match Uses field match strategies to identify duplicate rows based on French person
names and date data. The mapplet generates group keys from the dates.
mplt_FRA_Individual_Name_and_Email_Match Uses field match strategies to identify duplicate rows based on French person
names and email addresses. The mapplet generates group keys from the
email address data.
mplt_FRA_Individual_Name_and_INSEE_Match Uses field match strategies to identify duplicate rows based on French person
names and INSEE numbers. The mapplet generates group keys
from the INSEE number data.
mplt_FRA_Individual_Name_Match Uses field match strategies to identify duplicate rows based on French person
names. The mapplet generates NYSIIS codes from the surname values and
uses the NYSIIS codes as group keys.
rule_FRA_Company_Name_and_Address_MatchScore Generates a match score based on company names and French addresses.
rule_FRA_Firstname_and_INSEE_MatchScore Generates a match score based on first names and any data in the personal
data column such as telephone number, email, or the INSEE number.
rule_FRA_Firstname_Surname_DOB_and_Postcode_MatchScore Generates a match score based on the surnames, dates of birth, and postal
codes.
rule_FRA_Individual_Name_and_INSEE_MatchScore Generates a match score based on the person names and the INSEE
numbers.
rule_Individual_Name_and_Email_MatchScore Generates a match score based on person names and email addresses.
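Several mapplets in the table above build group keys from "a combination of characters" from a name value and a postal code value. The Python sketch below shows one plausible way such a key limits the number of comparisons. The key layout (the first three name characters plus the first two postal code characters) and the function names are assumptions made for illustration; the guide does not state which characters the mapplets use.

```python
from collections import defaultdict
from itertools import combinations


def group_key(name: str, postal_code: str, name_chars: int = 3, postal_chars: int = 2) -> str:
    """Build a group key from leading name characters and leading postal code characters."""
    # Keep letters and digits only, so "S.A." and "SA" contribute the same characters.
    cleaned_name = "".join(c for c in name.upper() if c.isalnum())
    cleaned_postal = "".join(postal_code.split())
    return cleaned_name[:name_chars] + cleaned_postal[:postal_chars]


def candidate_pairs(rows):
    """Return the row ID pairs that share a group key.

    Each row is a (row_id, name, postal_code) tuple. Only rows in the same
    group are paired, which avoids comparing every row with every other row.
    """
    groups = defaultdict(list)
    for row_id, name, postal_code in rows:
        groups[group_key(name, postal_code)].append(row_id)

    pairs = []
    for row_ids in groups.values():
        pairs.extend(combinations(row_ids, 2))
    return pairs
```

Given the rows `(1, "Leclerc SA", "75001")`, `(2, "LECLERC S.A.", "75008")`, `(3, "Dupont SARL", "69002")`, and `(4, "Leclerc SA", "69002")`, only rows 1 and 2 share the key `LEC75`, so `candidate_pairs` returns the single pair `(1, 2)`. A match score rule would then be applied to that pair alone.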
The composite rules in the France accelerator install to the following repository location:
[Informatica_DQ_Content]\Rules\Composite_Rules
Name Description
rule_FRA_Contact_Data Parses, standardizes, and validates French contact data, such as addresses and
telephone numbers.
Rule Location
rule_Assign_DQ_90_Mailability_Score_Description [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_FRA_Address_Validation_Hybrid [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_FRA_Company_Name_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_FRA_Gender_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_FRA_Multi_Person_Name_Parse [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_FRA_Phone_Number_Standardize [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_FRA_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_FRA_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_FRA_Salutation_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
m_FRA_customer_data_demo
m_FRA_customer_matching_demo
Parses and standardizes identity data from France and performs identity match
analysis on the data.
The mapping analyzes the following data combinations and generates match clusters for each
combination:
Germany Accelerator
The Germany accelerator includes rules that perform the following data quality processes:
Find the address data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Address_Data_Cleansing
The following table describes the address data cleansing rules in the Germany accelerator:
Name Description
rule_DEU_Address_Parse_Hybrid Parses unstructured German addresses into address elements. The rule does
not validate the addresses. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_DEU_Address_Parse_Multiline Parses unstructured German addresses into address elements. The rule does
not validate the addresses. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_DEU_Address_Validation_Discrete_w_Geocoding Validates the deliverability of German addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Discrete input ports on the Address Validator
transformation.
rule_DEU_Address_Validation_Discrete Validates the deliverability of German addresses. The rule corrects errors in
the input addresses where possible. Use the rule when you can connect the
input address fields to the Discrete input ports on the Address Validator
transformation.
rule_DEU_Address_Validation_Hybrid_w_Geocoding Validates the deliverability of German addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator
transformation.
rule_DEU_Address_Validation_Hybrid Validates the deliverability of German addresses. The rule corrects errors in
the input addresses where possible. Use the rule when you can connect the
input address fields to the Hybrid input ports on the Address Validator
transformation.
rule_DEU_Address_Validation_Multiline_w_Geocoding Validates the deliverability of German addresses and adds latitude and
longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator
transformation.
rule_DEU_Address_Validation_Multiline Validates the deliverability of German addresses. The rule corrects errors in
the input addresses where possible. Use the rule when you can connect the
input address fields to the Multiline input ports on the Address Validator
transformation.
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
Name Description
rule_DEU_Gender_Assignment Assigns gender according to first names. The rule returns "M" for male names,
"F" for female names, and "U" if the gender is unknown. For example, the rule
assigns the name "Hans Adler" a gender of "M" for male.
rule_DEU_Multi_Person_Name_Parse Parses person name values into separate ports. The rule creates ports for
values such as title, first name, middle name, and surname.
The rule output includes a port that contains the full name of the person in the
record. You can use the full name port as an input to a Match transformation in
an identity match analysis mapping.
When the name data identifies more than one person, the rule creates an
output port for each full name. For example, the rule can read the name "Hans
und Maria Adler" and create output ports for "Hans Adler" and "Maria Adler."
rule_DEU_Phone_Number_Parse Parses a German telephone number from a string. The rule parses the first
telephone number in the data, reading from right to left.
The rule recognizes telephone numbers that use leading zeros, international
dialing codes, or extensions that begin with the hash symbol. The rule
processes the following punctuation symbols: the plus sign, parentheses, and
the hash symbol. Before you run the rule, remove all other punctuation,
including double spaces.
The rule returns a telephone number and also returns a string that contains the
input text with the telephone number removed.
rule_DEU_Phone_Number_Validation Validates the area code and length of German telephone numbers. The rule
returns the region of the telephone number, as well as codes that indicate if
the area code and length of a telephone number are valid.
rule_DEU_Salutation_Assignment Generates formal and casual greetings from prenames and name tokens. For
example, when input data contains "Herr Hans Adler," the rule generates the
formal greeting "Sehr geehrter Herr Adler," and the casual greeting "Lieber
Hans,". You can change the prefix and punctuation by editing the variables in
the dq_Generate_Salutation Expression transformation.
• rule_Email_Validation
For more information about these rules, see “Core Contact Data Cleansing Rules” on page 23.
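The greeting behavior described for rule_DEU_Salutation_Assignment can be approximated in a few lines of Python. The sketch below reproduces the documented example ("Herr Hans Adler" gives "Sehr geehrter Herr Adler," and "Lieber Hans,"). The function name, the lookup tables, and the "Frau" forms are assumptions added for illustration; in the accelerator, the prefix and punctuation are held in variables in the dq_Generate_Salutation Expression transformation.

```python
# Greeting prefixes keyed by prename. The "Herr" forms follow the example in
# the rule description; the "Frau" forms are an assumption for illustration.
FORMAL_PREFIX = {"Herr": "Sehr geehrter", "Frau": "Sehr geehrte"}
CASUAL_PREFIX = {"Herr": "Lieber", "Frau": "Liebe"}


def generate_salutations(name: str, punctuation: str = ","):
    """Return (formal, casual) greetings for "<prename> <first name> <surname>".

    Returns None when the input does not start with a known prename or does
    not contain at least a first name and a surname.
    """
    tokens = name.split()
    if len(tokens) < 3 or tokens[0] not in FORMAL_PREFIX:
        return None
    prename, first_name, surname = tokens[0], tokens[1], tokens[-1]
    formal = f"{FORMAL_PREFIX[prename]} {prename} {surname}{punctuation}"
    casual = f"{CASUAL_PREFIX[prename]} {first_name}{punctuation}"
    return formal, casual
```

For example, `generate_salutations("Herr Hans Adler")` returns `("Sehr geehrter Herr Adler,", "Lieber Hans,")`. Keeping the prefixes and the punctuation as data rather than as hard-coded strings mirrors the way the rule exposes them as editable variables.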
Find the corporate data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
The following table describes the corporate data cleansing rule in the Germany accelerator:
Name Description
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
Name Description
rule_DEU_NER_Field_Identification Identifies the type of information contained in an input field. The rule can
identify names, personal IDs, company names, dates, and German
address data. The rule returns a label that describes the type of input data.
The rule uses reference data and probabilistic matching techniques to identify
the types of information.
• rule_Assign_DQ_90_GeocodingStatus_Description
• rule_Assign_DQ_90_Mailability_Score_Description
• rule_Assign_DQ_90_Match_Code_Descriptions
• rule_Remove_Extra_Spaces
• rule_Remove_Hyphen
• rule_Remove_Leading_Zero
• rule_Remove_Parentheses
• rule_Remove_Period_Parentheses
• rule_Remove_Punctuation
• rule_Remove_Punctuation_and_Space
• rule_Remove_Space
• rule_Replace_Limited_Punct_with_Space
• rule_UpperCase
For more information about these rules, see “Core General Data Cleansing Rules” on page 24.
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
Name Description
mplt_Company_Name_Match Uses field match strategies to identify duplicate rows based on company
names. The mapplet generates Soundex codes from the company name
values and uses the Soundex codes as group keys.
mplt_DEU_Company_Name_and_Address_Match Uses field match strategies to identify duplicate rows in German data based on
company name and address data. The mapplet uses a combination of
characters from the company name values and the postal code values to
generate group keys.
mplt_DEU_Familyname_and_Address_Match Uses field match strategies to identify duplicate rows in German data based on
surname and address data. The mapplet uses a combination of characters
from the surname values and the postal code values to generate group keys.
mplt_DEU_Firstname_3CharsSurname_DOB_and_Postcode_Match Uses field match strategies to identify duplicate rows in German data based on
personal names, first three characters of the family names, date of birth, and
postal codes. The mapplet generates group keys from the postal code data.
mplt_DEU_Firstname_and_PID_Match Uses field match strategies to identify duplicate rows in German data based on
personal names and personal IDs. The mapplet generates group keys
from the personal ID data.
mplt_DEU_Firstname_Surname_2ElementsDOB_and_Postcode_Match Uses field match strategies to identify duplicate rows in German data based on
personal names, two elements of the date of birth, and postal codes. The
mapplet generates group keys from the postal code data.
mplt_DEU_Firstname_Surname_DOB_and_Postcode_Match Uses field match strategies to identify duplicate rows in German data based on
personal names, date of birth, and postal codes. The mapplet generates group
keys from the postal code data.
mplt_DEU_IMO_Company_Name_and_Address_Match Uses identity match strategies to identify duplicate rows in German data based
on company names and addresses. The mapplet generates group keys from
the postal code data.
mplt_DEU_IMO_Familyname_and_Address_Match Uses identity match strategies to identify duplicate rows in German data based
on surnames and addresses. The mapplet generates group keys from the
postal code data.
mplt_DEU_IMO_Individual_Name_and_Address_Match Uses identity match strategies to identify duplicate rows in German data based
on person names and addresses. The mapplet generates group keys from the
postal code data.
mplt_DEU_IMO_Personal_Name_and_Data Uses identity match strategies to identify duplicate rows in German data based
on person names and personal data. The fields in the personal data column
must contain a single type of data, such as telephone number, email, or
personal ID. The mapplet generates group keys from the personal data.
mplt_DEU_Individual_Name_and_Date_Match Uses field match strategies to identify duplicate rows based on person names
and date data. The mapplet generates group keys from the
date data.
mplt_DEU_Individual_Name_and_Email_Match Uses field match strategies to identify duplicate rows in German data based on
person names and email addresses. The mapplet generates group keys from
the email address data.
mplt_DEU_Individual_Name_and_Phone_Match Uses field match strategies to identify duplicate rows in German data based on
person names and telephone numbers. The mapplet generates group keys
from the telephone number data.
mplt_DEU_Individual_Name_and_PID_Match Uses field match strategies to identify duplicate rows in German data based on
person names and personal IDs. The mapplet generates group keys from
the personal ID data.
mplt_DEU_Individual_Name_Match Uses field match strategies to identify duplicate rows in German data based on
person names. The mapplet generates NYSIIS codes from the surname values
and uses the NYSIIS codes as group keys.
rule_DEU_Firstname_3CharsSurname_DOB_and_Postcode_MatchScore Generates a match score based on the first names, the first three characters of
surnames, the date of birth, and the postal codes.
rule_DEU_Firstname_and_PID_MatchScore Generates a match score based on first names and any data in the personal
data column such as telephone number, email, or personal ID.
rule_DEU_Firstname_Surname_2ElementsDOB_and_Postcode_MatchScore Generates a match score based on personal names, date of birth, and postal
codes.
Note: The input format of the date of birth is assumed to be DD/MM/YYYY.
rule_DEU_Firstname_Surname_DOB_and_Postcode_MatchScore Generates a match score based on the surnames, date of birth, and postal
codes.
rule_DEU_Individual_Name_and_Phone_MatchScore Generates a match score based on the person names and the telephone
numbers.
rule_Familyname_and_Address_MatchScore Generates a match score based on the family names and addresses.
rule_Individual_Name_and_Email_MatchScore Generates a match score based on person names and email addresses.
rule_Individual_Name_and_SSN_MatchScore Generates a match score based on the first names and any data in the
personal data column such as telephone number, email, or the SSN.
Name Description
rule_DEU_Contact_Data Parses, standardizes, and validates German contact data, such as addresses and
telephone numbers.
The following table lists the rules contained in the composite rule for German contact data and their
repository locations:
Rule Location
rule_Assign_DQ_90_Mailability_Score_Description [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_DEU_Address_Validation_Hybrid [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_DEU_Company_Name_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_DEU_Gender_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_DEU_Multi_Person_Name_Parse [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_DEU_Phone_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_DEU_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_DEU_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Email_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
m_DEU_customer_data_demo
m_DEU_customer_matching_demo
Parses and standardizes identity data from Germany and performs identity match analysis on the data.
The mapping analyzes the following data combinations and generates match clusters for each
combination:
Portugal Accelerator
This chapter includes the following topics:
The Portugal accelerator includes rules that perform the following data quality processes:
Find the address data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Address_Data_Cleansing
The following table describes the address data cleansing rules in the Portugal accelerator:
Name Description
rule_PRT_Address_Parse_Hybrid Parses unstructured Portuguese addresses into address elements. The
rule does not validate the addresses. Use the rule when you can connect the input address fields to
the Hybrid input ports on the Address Validator transformation.
rule_PRT_Address_Parse_Multiline Parses unstructured Portuguese addresses into address elements.
The rule does not validate the addresses. Use the rule when you can connect the input address fields
to the Multiline input ports on the Address Validator transformation.
rule_PRT_Address_Validation_Discrete_w_Geocoding Validates the deliverability of Portuguese
addresses and adds latitude and longitude coordinates to each output address. The rule corrects
errors in the input addresses where possible. Use the rule when you can connect the input address
fields to the Discrete input ports on the Address Validator transformation.
rule_PRT_Address_Validation_Discrete Validates the deliverability of Portuguese addresses. The rule
corrects errors in the input addresses where possible. Use the rule when you can connect the input
address fields to the Discrete input ports on the Address Validator transformation.
rule_PRT_Address_Validation_Hybrid_w_Geocoding Validates the deliverability of Portuguese addresses
and adds latitude and longitude coordinates to each output address. The rule corrects errors in the
input addresses where possible. Use the rule when you can connect the input address fields to the
Hybrid input ports on the Address Validator transformation.
rule_PRT_Address_Validation_Hybrid Validates the deliverability of Portuguese addresses. The rule
corrects errors in the input addresses where possible. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator transformation.
rule_PRT_Address_Validation_Multiline_w_Geocoding Validates the deliverability of Portuguese
addresses and adds latitude and longitude coordinates to each output address. The rule corrects
errors in the input addresses where possible. Use the rule when you can connect the input address
fields to the Multiline input ports on the Address Validator transformation.
rule_PRT_Address_Validation_Multiline Validates the deliverability of Portuguese addresses. The
rule corrects errors in the input addresses where possible. Use the rule when you can connect the
input address fields to the Multiline input ports on the Address Validator transformation.
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
Name Description
rule_PRT_Gender_Assignment Assigns gender according to first names. The rule returns "M" for male names,
"F" for female names, and "U" if the gender is unknown. For example, the rule
assigns the name "Artur Cruz" a gender of "M" for male.
rule_PRT_NIF_Parse Parses Número de Identificação Fiscal (NIF) numbers from strings. The rule
returns the ID numbers and also returns a string that contains the input text
with the ID numbers removed.
rule_PRT_NIF_Validate Validates Número de Identificação Fiscal (NIF) numbers based on the check
digit in each number. The rule requires that the input is a nine-digit numeric
string with no spaces.
rule_PRT_Personal_Name_Parse_Validate Parses person name values into separate ports. The rule
creates ports for values such as title, first name, middle name, and surname. The rule also
indicates if the name might be a company name and validates the spelling of the name.
The rule output includes a port that contains the full name of the person in the record. You can
use the full name port as an input to a Match transformation in an identity match analysis mapping.
rule_PRT_Phone_Number_Parse Parses a Portuguese telephone number from a string. The rule parses the
first telephone number in the data, reading from right to left. The rule returns a telephone number
and also returns a string that contains the input text with the telephone number removed.
rule_PRT_Phone_Number_Validation Validates the area code and length of Portuguese telephone
numbers. The rule returns the region of the telephone number, as well as codes that indicate if the
area code and length of a telephone number are valid.
rule_PRT_Prename_Assignment Generates an honorific according to the gender. You can change the
female_prename expression variable from "Sra" to "Sta".
rule_PRT_Salutation_Assignment Generates formal and casual greetings from prenames and name tokens.
For example, when input data contains "Sr. Artur Cruz," the rule generates the formal greeting
"Prezado Sr. Cruz," and the casual greeting "Prezado Artur,". You can change the prefix and
punctuation by editing the variables in the dq_Generate_Salutation Expression transformation.
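The check-digit test that rule_PRT_NIF_Validate is described as performing can be illustrated outside the product. The following Python sketch assumes the commonly published modulo-11 scheme for NIF numbers: the first eight digits are weighted 9 down to 2, and the ninth digit is the check digit. The guide does not document the internal logic of the rule, so the function name is_valid_nif and the exact steps are illustrative assumptions, not the rule's implementation.

```python
def is_valid_nif(nif: str) -> bool:
    """Return True if nif is a nine-digit string with a matching mod-11 check digit.

    Illustrative sketch only; not the logic of rule_PRT_NIF_Validate.
    """
    # The rule description requires a nine-digit numeric string with no spaces.
    if len(nif) != 9 or not (nif.isascii() and nif.isdigit()):
        return False
    digits = [int(ch) for ch in nif]
    # Weight the first eight digits 9, 8, ..., 2 and sum the products.
    total = sum(d * w for d, w in zip(digits[:8], range(9, 1, -1)))
    remainder = total % 11
    # A remainder of 0 or 1 maps to check digit 0; otherwise it is 11 - remainder.
    expected = 0 if remainder < 2 else 11 - remainder
    return digits[8] == expected
```

For example, is_valid_nif("123456789") returns True, while the same string with a space or with a different final digit returns False.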
• rule_Email_Validation
For more information about these rules, see “Core Contact Data Cleansing Rules” on page 23.
Find the corporate data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
The following table describes the corporate data cleansing rules in the Portugal accelerator:
Name Description
rule_PRT_NIPC_Parse Parses a Número de Identificação Pessoa Colectiva (NIPC). The rule returns
the NIPC and also returns a string that contains the input text with the NIPC
removed.
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules in the Portugal accelerator:
Name Description
rule_PRT_NER_Field_Identification Identifies the type of information contained in an input field. The rule can
identify names, Personal IDs, company names, dates, and Portuguese
address data. The rule returns a label that describes the type of input data.
The rule uses reference data to identify the types of information. The rule
uses probabilistic matching techniques to identify the types of information.
• rule_Assign_DQ_90_GeocodingStatus_Description
• rule_Assign_DQ_90_ElementResultStatus_Description
• rule_Assign_DQ_90_Match_Code_Descriptions
• rule_Parse_First_Word
• rule_Remove_Extra_Spaces
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
The following table describes the matching and deduplication rules in the Portugal accelerator:
Name Description
mplt_Company_Name_Match Uses field match strategies to identify duplicate rows based on company name.
The mapplet generates Soundex codes from the company name values and
uses the Soundex codes as group keys.
mplt_PRT_Company_Name_and_Address_Match Uses field match strategies to identify duplicate rows in
Portuguese data based on company name and address data. The mapplet uses a combination of characters
from the company name values and the postal code values to generate group keys.
mplt_PRT_Familyname_and_Address_Match Uses field match strategies to identify duplicate rows in
Portuguese data based on surname and address data. The mapplet uses a combination of characters from
the surname values and the postal code values to generate group keys.
mplt_PRT_Firstname_and_NIF_BI_Match Uses field match strategies to identify duplicate rows in
Portuguese data based on first name and personal identification numbers such as Número de
Identificação Fiscal (NIF) and Bilhete de Identidade (BI). The mapplet generates group keys from the
personal identification number data.
mplt_PRT_IMO_Company_Name_and_Address_Match Uses identity match strategies to identify duplicate
rows in Portuguese data based on company names and addresses. The mapplet generates group keys from
the postal code data.
mplt_PRT_IMO_Familyname_and_Address_Match Uses identity match strategies to identify duplicate rows
in Portuguese data based on family names and addresses. The mapplet generates group keys from the
postal code data.
mplt_PRT_IMO_Individual_Name_and_Address_Match Uses identity match strategies to identify duplicate
rows in Portuguese data based on person names and addresses. The mapplet generates group keys from
the postal code data.
mplt_PRT_IMO_Personal_Name_and_Data Uses identity match strategies to identify duplicate rows in
Portuguese data based on person names and personal data. The fields in the personal data column must
contain a single type of data, such as telephone number, email, or Número de Identificação Fiscal
(NIF). The mapplet generates group keys from the personal data.
mplt_PRT_Individual_Name_and_Address_Match Uses field match strategies to identify duplicate rows in
Portuguese data based on person names and address data. The mapplet uses a combination of characters
from the surname values and the postal code values to generate group keys.
mplt_PRT_Individual_Name_and_Date_Match Uses field match strategies to identify duplicate rows in
Portuguese data based on person names and date data. The mapplet generates group keys from the date
data.
mplt_PRT_Individual_Name_and_Email_Match Uses field match strategies to identify duplicate rows in
Portuguese data based on person names and email addresses. The mapplet generates group keys from the
email address data.
mplt_PRT_Individual_Name_and_Phone_Match Uses field match strategies to identify duplicate rows in
Portuguese data based on person names and telephone numbers. The mapplet generates group keys from
the telephone number data.
mplt_PRT_Individual_Name_Match Uses field match strategies to identify duplicate rows in Portuguese
data based on person names. The mapplet generates NYSIIS codes from the surname values and uses the
NYSIIS codes as group keys.
rule_PRT_Company_Name_and_Address_MatchScore Generates a match score based on company names and
Portuguese address data.
rule_PRT_Familyname_and_Address_MatchScore Generates a match score based on surnames and Portuguese
address data.
rule_PRT_Firstname_and_NIF_BI_MatchScore Generates a match score based on first name data, Número
de Identificação Fiscal (NIF), and Bilhete de Identidade (BI) numbers.
rule_PRT_Individual_Name_and_Address_MatchScore Generates a match score based on person names and
Portuguese address data.
rule_PRT_Individual_Name_and_Email_MatchScore Generates a match score based on person names and
email addresses.
rule_PRT_Individual_Name_and_Phone_MatchScore Generates a match score based on person names and
telephone numbers.
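Several mapplets in the table above derive group keys from phonetic codes, for example the Soundex codes that mplt_Company_Name_Match generates from company names. The following Python sketch shows how a phonetic group key places similar-sounding values in the same group. It implements the standard American Soundex algorithm. The guide does not state which Soundex variant the mapplet uses, so treat the functions soundex and group_by_key, and the sample company names, as illustrative assumptions.

```python
from collections import defaultdict

# Letter-to-digit table for standard American Soundex.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(value: str) -> str:
    """Return the four-character American Soundex code, or "" if there are no letters."""
    letters = [ch for ch in value.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate consonants that share a code.
            continue
        code = _CODES.get(ch, "")
        if code and code != previous:
            key += code
        # Vowels have no code, so they reset the previous code.
        previous = code
    return (key + "000")[:4]


def group_by_key(names):
    """Group values by Soundex code so that only rows in the same group are compared."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)
```

With this sketch, the hypothetical names "Silva Lda" and "Sylva Lda" share the key S414 and land in one group, while "Cruz" receives the key C620 and is never compared with them.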
Name Description
rule_PRT_Contact_Data Parses, standardizes, and validates Portuguese contact data, such as addresses,
telephone numbers, and Número de Identificação Fiscal (NIF) numbers.
The following table lists the rules contained in the composite rule for Portuguese contact data and their
repository locations:
Rule Location
rule_Assign_DQ_90_Mailability_Score_Description [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_Email_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_PRT_Address_Validation_Hybrid [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_PRT_Company_Name_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_PRT_NIF_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_PRT_NIF_Validate [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_PRT_Personal_Name_Parse_Validate [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_PRT_Phone_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_PRT_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_PRT_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_PRT_Salutation_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
m_PRT_customer_data_demo
m_PRT_customer_matching_demo
Parses and standardizes identity data from Portugal and performs identity match analysis on the data.
The mapping analyzes the following data combinations and generates match clusters for each
combination:
Spain Accelerator
This chapter includes the following topics:
The Spain accelerator includes rules that perform the following data quality processes:
Find the address data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Address_Data_Cleansing
The following table describes the address data cleansing rules in the Spain accelerator:
Name Description
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
The following table describes the contact data cleansing rules in the Spain accelerator:
Name Description
rule_ESP_Gender_Assignment Assigns gender according to first names. The rule returns "M"
for male names, "F" for female names, and "U" if the gender is
unknown. For example, the rule assigns the name "Juan
Garcia" a gender of "M" for male.
• rule_Email_Validation
For more information about these rules, see “Core Contact Data Cleansing Rules” on page 23.
The Spain accelerator depends on the following corporate data cleansing rule from the Core accelerator:
• rule_Company_Name_Standardization
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
Name Description
rule_ESP_NER_Field_Identification Identifies the type of information contained in an input field. The rule can
identify names, Personal IDs, company names, dates, and Spanish
address data. The rule returns a label that describes the type of input data.
The rule uses probabilistic matching techniques to identify the types of
information.
The Spain accelerator depends on the following general data cleansing rules from the Core accelerator:
• rule_Assign_DQ_90_ElementResultStatus_Description
• rule_Assign_DQ_90_GeocodingStatus_Description
• rule_Assign_DQ_90_Match_Code_Descriptions
• rule_Remove_Extra_Spaces
• rule_Remove_Leading_Zero
• rule_Remove_Limited_Punctuation
• rule_Remove_Non_Numbers
• rule_Remove_Punctuation_and_Space
• rule_Remove_Punctuation
• rule_Replace_limited_Punct_with_Space
• rule_Translate_Diacritic_Characters
• rule_UpperCase
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
The following table describes the matching and deduplication rules in the Spain accelerator:
Name Description
m_ESP_customer_data_demo
m_ESP_customer_matching_demo
Parses and standardizes identity data from Spain and performs identity match analysis on the data.
United Kingdom Accelerator
The United Kingdom accelerator includes rules that perform the following data quality processes:
Find the address data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Address_Data_Cleansing
The following table describes the address data cleansing rules in the United Kingdom accelerator:
Name Description
rule_GBR_Address_Parse_Hybrid Parses unstructured United Kingdom addresses into address elements.
The rule does not validate the addresses. Use the rule when you can connect the input address fields
to the Hybrid input ports on the Address Validator transformation.
rule_GBR_Address_Parse_Multiline Parses unstructured United Kingdom addresses into address elements.
The rule does not validate the addresses. Use the rule when you can connect the input address fields
to the Multiline input ports on the Address Validator transformation.
rule_GBR_Address_Validation_Discrete_w_Geocoding Validates the deliverability of United Kingdom
addresses and adds latitude and longitude coordinates to each output address. The rule corrects
errors in the input addresses where possible. Use the rule when you can connect the input address
fields to the Discrete input ports on the Address Validator transformation.
rule_GBR_Address_Validation_Discrete Validates the deliverability of United Kingdom addresses. The
rule corrects errors in the input addresses where possible. Use the rule when you can connect the
input address fields to the Discrete input ports on the Address Validator transformation.
rule_GBR_Address_Validation_Hybrid_w_Geocoding Validates the deliverability of United Kingdom
addresses and adds latitude and longitude coordinates to each output address. The rule corrects
errors in the input addresses where possible. Use the rule when you can connect the input address
fields to the Hybrid input ports on the Address Validator transformation.
rule_GBR_Address_Validation_Hybrid Validates the deliverability of United Kingdom addresses. The
rule corrects errors in the input addresses where possible. Use the rule when you can connect the
input address fields to the Hybrid input ports on the Address Validator transformation.
rule_GBR_Address_Validation_Multiline_w_Geocoding Validates the deliverability of United Kingdom
addresses and adds latitude and longitude coordinates to each output address. The rule corrects
errors in the input addresses where possible. Use the rule when you can connect the input address
fields to the Multiline input ports on the Address Validator transformation.
rule_GBR_Address_Validation_Multiline Validates the deliverability of United Kingdom addresses. The
rule corrects errors in the input addresses where possible. Use the rule when you can connect the
input address fields to the Multiline input ports on the Address Validator transformation.
rule_GBR_Postcode_Standardise Standardizes United Kingdom postal codes. The rule requires that the
input follows predefined formats.
The rule standardizes inputs that match the following patterns:
- A9 9AA
- A99 9AA
- AA9 9AA
- AA99 9AA
- A9A 9AA
- AA9A 9AA
- GIR 0AA
The letter A represents an alphabetic character and the number 9 represents a digit.
rule_GBR_Postcode_Validate Validates United Kingdom postal codes. The rule matches standardized
postal codes with valid United Kingdom postal codes. If the rule does not find a matching postal
code, it verifies whether the postal code follows the standard United Kingdom postal code pattern.
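The pattern notation in the rule_GBR_Postcode_Standardise description maps directly to regular expressions. The following Python sketch converts each listed pattern to a regular expression and returns an input in the single-space form when it matches one of the patterns. The normalization steps shown here (uppercasing and placing one space before the final three characters) are assumptions made for illustration. The guide does not describe how the rule handles case or spacing, and the function name standardize_postcode is hypothetical.

```python
import re

# Patterns copied from the rule description: A is a letter, 9 is a digit.
PATTERNS = ["A9 9AA", "A99 9AA", "AA9 9AA", "AA99 9AA", "A9A 9AA", "AA9A 9AA"]
# GIR 0AA is a literal postal code, not a pattern.
SPECIAL_CASE = "GIR 0AA"


def _to_regex(pattern):
    """Translate the A/9 notation into a compiled regular expression."""
    parts = {"A": "[A-Z]", "9": "[0-9]", " ": " "}
    return re.compile("".join(parts[ch] for ch in pattern))


_COMPILED = [_to_regex(p) for p in PATTERNS]


def standardize_postcode(raw):
    """Return the postal code in single-space form, or None if no pattern matches."""
    compact = re.sub(r"\s+", "", raw).upper()
    if len(compact) < 5:
        return None
    # Assumed normalization: one space before the final three characters.
    candidate = compact[:-3] + " " + compact[-3:]
    if candidate == SPECIAL_CASE or any(rx.fullmatch(candidate) for rx in _COMPILED):
        return candidate
    return None
```

For example, standardize_postcode("sw1a1aa") returns "SW1A 1AA", which matches the AA9A 9AA pattern, and standardize_postcode("12345") returns None. A pattern match does not confirm that the postal code exists, which is the distinction the rule_GBR_Postcode_Validate description draws.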
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
The following table describes the contact data cleansing rules in the United Kingdom accelerator:
Name Description
rule_GBR_Driver_Number_Parse Parses strings that match the format of United Kingdom driver's license
numbers.
rule_GBR_Driver_Number_Validation Validates United Kingdom driver's license numbers based on the
requirements of the United Kingdom Government Data Standards Catalogue.
rule_GBR_Gender_Assignment Assigns gender according to first names. The rule returns "M" for male names,
"F" for female names, and "U" if the gender is unknown. For example, the rule
assigns the name "John Smith" a gender of "M" for male.
rule_GBR_Multi_Person_Name_Parse Parses person name values into separate ports. The rule creates
ports for values such as title, first name, middle name, and surname.
The rule output includes a port that contains the full name of the person in the record. You can
use the full name port as an input to a Match transformation in an identity match analysis mapping.
When the name data identifies more than one person, the rule creates an output port for each full
name. For example, the rule can read the name "John and Jane Smith" and create output ports for
"John Smith" and "Jane Smith."
rule_GBR_NHS_Number_Parse Parses National Health Service (NHS) numbers from a string. The rule
returns the NHS number and also returns a string that contains the input text with the NHS number
removed.
rule_GBR_NHS_Number_Standardise Standardizes National Health Service (NHS) numbers into the standard
format (999 999 9999). The rule requires that the input is a 10-digit string.
rule_GBR_NHS_Number_Validate Validates National Health Service (NHS) numbers based on the check
digit in each number. The rule requires that the input is a 10-digit string.
rule_GBR_NINO_Conformity_Check Validates the standard pattern for a United Kingdom National
Insurance Number (NINO). The rule does not verify that a NINO is accurate or active.
rule_GBR_NINO_Parse Parses United Kingdom National Insurance Numbers (NINO) from strings. The
rule returns the NINO and also returns a string that contains the input text with
the NINO removed.
rule_GBR_NINO_Standardization Standardizes United Kingdom National Insurance Numbers (NINO) into the
two most typical formats. The rule returns the following formats, where C represents alphabetic
characters and N represents numerals:
- CC NN NN NN C
- CCNNNNNNC
The rule formats all alphabetic characters as uppercase. The rule requires that the input conforms
to the pattern of a NINO.
rule_GBR_NINO_Validation Validates a United Kingdom National Insurance Number (NINO). The rule does
not verify that a NINO is active.
rule_GBR_Passport_Number_MR_Parse Parses United Kingdom passport numbers in extended format. The
extended format is the machine readable format for passport numbers.
rule_GBR_Passport_Number_Parse Parses United Kingdom passport numbers that use the format specified
by the Government Data Standards Catalogue. The rule parses all nine-digit strings.
rule_GBR_Passport_Number_Validation Validates United Kingdom passport numbers that use the format
specified by the Government Data Standards Catalogue.
rule_GBR_Phone_Number_Parse Parses a United Kingdom telephone number from a string. The rule parses
the first telephone number in the data, reading from right to left.
The rule recognizes telephone numbers that use leading zeros, the "+44" international dialing code,
and extensions that begin with the hash symbol. The rule processes the following punctuation
symbols: the plus sign, parentheses, and the hash symbol. Before you run the rule, remove all other
punctuation, including double spaces.
The rule returns a telephone number and also returns a string that contains the input text with the
telephone number removed.
rule_GBR_Phone_Number_Validation Validates the area code and length of United Kingdom telephone
numbers. The rule returns the region of the telephone number as well as codes that indicate if the
area code and length of a telephone number are valid.
rule_Prename_Assignment Generates an honorific according to the gender. You can change the
female_prename expression variable from Ms. to Mrs.
rule_Salutation_Assignment Generates formal and casual greetings from prenames and name tokens. For
example, when input data contains "Mr. John Smith," the rule generates the
formal greeting "Dear Mr. Smith," and the casual greeting "Dear John,". You
can change the prefix and punctuation by editing the variables in the
dq_Generate_Salutation Expression transformation.
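The NHS number rules in the table above refer to a check digit and to the 999 999 9999 display format. The following Python sketch illustrates both, assuming the published modulo-11 scheme for NHS numbers: the first nine digits are weighted 10 down to 2, and a computed value of 10 means that the number is invalid. The function names are hypothetical, and the sketch is not the logic of rule_GBR_NHS_Number_Validate or rule_GBR_NHS_Number_Standardise.

```python
def nhs_check_digit(first_nine: str):
    """Return the expected check digit for nine digits, or None if no digit is valid."""
    # Weight the nine digits 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        return 0
    if check == 10:
        # No valid check digit exists for this combination of digits.
        return None
    return check


def is_valid_nhs_number(number: str) -> bool:
    """Return True if number is a 10-digit string with a matching check digit."""
    if len(number) != 10 or not (number.isascii() and number.isdigit()):
        return False
    return nhs_check_digit(number[:9]) == int(number[9])


def format_nhs_number(number: str) -> str:
    """Format a 10-digit string as 999 999 9999."""
    if len(number) != 10 or not (number.isascii() and number.isdigit()):
        raise ValueError("expected a 10-digit string")
    return f"{number[:3]} {number[3:6]} {number[6:]}"
```

For example, is_valid_nhs_number("9434765919") returns True, and format_nhs_number("9434765919") returns "943 476 5919".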
Find the financial data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Financial_Data_Cleansing
Name Description
rule_GBR_Bank_Account_Parse Parses eight-digit numeric strings as United Kingdom bank account
numbers.
rule_GBR_Bank_Account_Validation Validates United Kingdom bank account numbers. The rule returns
codes that indicate whether the input is numeric and whether it is the correct number of digits.
rule_GBR_Bank_Sort_Code_Parse Parses six-digit numeric strings as United Kingdom bank sort codes.
The rule parses strings of numbers in the following formats:
- Consecutive numbers (999999)
- Numbers delimited with a dash (99-99-99)
rule_GBR_Bank_Sort_Code_Validation Validates the format and length of United Kingdom bank sort codes
that are standardized to the dash-delimited format (99-99-99). The rule returns a Status port that
describes the validity of the sort code and a Validation Note port that explains the status. If the
sort code prefix matches a known assignment for a United Kingdom bank, the Validation Note port
includes the bank name.
rule_GBR_Bank_Sort_Code_Standardise Standardizes a United Kingdom bank sort code to the format
"NN-NN-NN."
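The two input formats listed for rule_GBR_Bank_Sort_Code_Parse and the NN-NN-NN output format of rule_GBR_Bank_Sort_Code_Standardise can be sketched in a few lines of Python. The function name standardize_sort_code is hypothetical, and the handling of surrounding whitespace is an assumption. The sketch checks format only and does not look up bank assignments.

```python
import re


def standardize_sort_code(raw: str):
    """Return the sort code as NN-NN-NN, or None if the input uses neither listed format."""
    text = raw.strip()
    # Consecutive numbers (999999): insert the dashes.
    if re.fullmatch(r"[0-9]{6}", text):
        return f"{text[:2]}-{text[2:4]}-{text[4:]}"
    # Numbers delimited with a dash (99-99-99): already in the target format.
    if re.fullmatch(r"[0-9]{2}-[0-9]{2}-[0-9]{2}", text):
        return text
    return None
```

For example, standardize_sort_code("123456") returns "12-34-56", and a partially delimited value such as "12-3456" returns None because it matches neither listed format.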
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules in the United Kingdom accelerator:
Name Description
rule_GBR_NER_Field_Identification Identifies the type of information contained in an input field. The rule can
identify names, Personal IDs, company names, dates, and United Kingdom
address data. The rule returns a label that describes the type of input data.
The rule uses reference data to identify the types of information. The rule
uses probabilistic matching techniques to identify the types of information.
The United Kingdom accelerator depends on the following general data cleansing rules from the Core
accelerator:
• rule_Assign_DQ_90_GeocodingStatus_Description
• rule_Assign_DQ_90_Mailability_Score_Description
• rule_Assign_DQ_90_Match_Code_Descriptions
• rule_Remove_Extra_Spaces
• rule_Remove_Leading_Zero
• rule_Remove_Period_Parentheses
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
The following table describes the matching and deduplication rules in the United Kingdom accelerator:
Name Description
mplt_GBR_Company_Name_Postcode_Match Uses field match strategies to identify duplicate rows in
United Kingdom data based on company name and postal code. The mapplet generates group keys from the
postal code.
mplt_GBR_Familyname_and_NINO_Match Uses field match strategies to identify duplicate rows in United
Kingdom data based on surname and National Insurance Number (NINO). The mapplet generates group keys
from the NINO data.
mplt_GBR_Familyname_and_Postcode_Match Uses field match strategies to identify duplicate rows in
United Kingdom data based on surname and United Kingdom postal code. The mapplet generates group
keys from the postal code data.
mplt_GBR_Firstname_3CharsSurname_DOB_and_Postcode_Match Uses field match strategies to identify
duplicate rows in United Kingdom data based on the following data:
- First name
- The first three characters in the surname
- Date of birth
- Postal code
The mapplet generates group keys from the postal code data.
mplt_GBR_Firstname_Surname_2ElementsDOB_and_Postcode_Match Uses field match strategies to identify
duplicate rows in United Kingdom data based on the following data:
- Person names
- Any two date of birth elements, such as month and year
- United Kingdom postal code
The mapplet generates group keys from the postal code data.
mplt_GBR_Firstname_Surname_DOB_and_Postcode_Match Uses field match strategies to identify duplicate
rows in United Kingdom data based on the following data:
- Person names
- Date of birth
- Postal code
The mapplet generates group keys from the postal code data.
mplt_GBR_IMO_Company_Name_and_Address_Match Uses identity match strategies to identify duplicate
rows in United Kingdom data based on company names and addresses. The mapplet generates group keys
from the postal code data.
mplt_GBR_IMO_Familyname_and_Address_Match Uses identity match strategies to identify duplicate rows
in United Kingdom data based on family names and addresses. The mapplet generates group keys from
the postal code data.
mplt_GBR_IMO_Individual_Name_and_Address_Match Uses identity match strategies to identify duplicate
rows in United Kingdom data based on person names and addresses. The mapplet generates group keys
from the postal code data.
mplt_GBR_IMO_Personal_Name_and_Data Uses identity match strategies to identify duplicate rows in
United Kingdom data based on person names and personal data. The fields in the personal data column
must contain a single type of data, such as telephone number, email, or National Insurance Number.
The mapplet generates group keys from the personal data.
mplt_GBR_Individual_Name_and_Date_Match Uses field match strategies to identify duplicate rows in
United Kingdom data based on person names and date data. The mapplet generates group keys from the
date data.
mplt_GBR_Individual_Name_and_Email_Match Uses field match strategies to identify duplicate rows in
United Kingdom data based on person names and the email address data. The mapplet generates group
keys from the email address data.
mplt_GBR_Individual_Name_and_NINO_Match Uses field match strategies to identify duplicate rows in
United Kingdom data based on person names and National Insurance Numbers (NINO). The mapplet
generates group keys from the NINO data.
mplt_GBR_Individual_Name_and_Phone_Match Uses field match strategies to identify duplicate rows in
United Kingdom data based on person names and telephone numbers. The mapplet generates group keys
from the telephone number data.
mplt_GBR_Individual_Name_and_Postcode_Match Uses field match strategies to identify duplicate rows
in United Kingdom data based on person names and the postal code data. The mapplet generates group
keys from the postal code data.
mplt_GBR_Individual_Name_Match Uses field match strategies to identify duplicate rows in United
Kingdom data based on person names. The mapplet generates NYSIIS codes from the surname values and
uses the NYSIIS codes as group keys.
rule_GBR_Familyname_and_NINO_MatchScore Generates a match score based on surnames and United Kingdom
National Insurance Numbers (NINO).
rule_GBR_Familyname_and_Postcode_MatchScore Generates a match score based on surnames and United
Kingdom postal codes.
rule_GBR_Firstname_Surname_DOB_and_Postcode_MatchScore Generates a match score based on person
names, date of birth, and postal code.
rule_GBR_Individual_Name_and_NINO_MatchScore Generates a match score based on person names and
United Kingdom National Insurance Numbers (NINO).
rule_GBR_Individual_Name_and_Phone_MatchScore Generates a match score based on person names and
telephone numbers.
rule_GBR_Individual_Name_and_Postcode_MatchScore Generates a match score based on person names and
United Kingdom postal codes.
rule_GBR_Company_Name_Postcode_MatchScore Generates a match score based on company name and United
Kingdom postal codes.
rule_Individual_Name_and_Email_MatchScore Generates a match score based on person names and email
addresses.
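Many of the mapplets above generate group keys from the postal code and then compare records within each group. The following Python sketch shows the general idea: records are grouped by a normalized postal code, and only pairs that share a group are scored. The scoring uses difflib.SequenceMatcher from the Python standard library as a stand-in, because the guide does not describe the algorithms behind the field match strategies. The function name match_pairs, the 0.85 threshold, and the sample records are illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def match_pairs(records, threshold=0.85):
    """Score name pairs that share a postal code group key.

    records is a list of (record_id, name, postcode) tuples.
    Returns a list of (id_a, id_b, score) tuples at or above the threshold.
    """
    # Group key: the postal code without spaces, in uppercase.
    groups = defaultdict(list)
    for record_id, name, postcode in records:
        groups[postcode.replace(" ", "").upper()].append((record_id, name))

    matches = []
    for members in groups.values():
        # Compare records only with other records in the same group.
        for (id_a, name_a), (id_b, name_b) in combinations(members, 2):
            score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
            if score >= threshold:
                matches.append((id_a, id_b, round(score, 3)))
    return matches
```

In this sketch, "John Smith" at "SW1A 1AA" and "Jon Smith" at "sw1a1aa" fall in the same group and are scored against each other, while a "John Smith" at "M1 1AE" is never compared with them. Grouping reduces the number of comparisons, at the cost of missing duplicates whose group key values differ.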
Name Description
rule_GBR_Contact_Data Parses, standardizes, and validates United Kingdom contact data, such as addresses,
telephone numbers, and National Insurance Numbers (NINO).
Name Location
rule_Assign_DQ_90_Mailability_Score_Description [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_Company_Name_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_Email_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_GBR_Address_Validation_Hybrid [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_GBR_Gender_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_GBR_Multi_Person_Name_Parse [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_GBR_NINO_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_GBR_NINO_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_GBR_Phone_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_GBR_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Salutation_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
m_GBR_customer_data_demo
m_GBR_customer_matching_demo
Parses and standardizes identity data from the United Kingdom and performs identity match analysis on
the data.
The mapping analyzes the following data combinations and generates match clusters for each
combination:
U.S./Canada Accelerator
This chapter includes the following topics:
The U.S./Canada accelerator includes rules that perform the following data quality processes:
The following table describes the address data cleansing rules in the U.S./Canada accelerator:
Name Description
rule_CAN_Address_Parse_Hybrid Parses unstructured Canadian addresses into address elements. The rule does
not validate the addresses. Use the rule when you can connect the input address fields to the Hybrid
input ports on the Address Validator transformation.
rule_CAN_Address_Parse_Multiline Parses unstructured Canadian addresses into address elements. The rule
does not validate the addresses. Use the rule when you can connect the input address fields to the
Multiline input ports on the Address Validator transformation.
rule_CAN_Address_Validation_Discrete_w_Geocoding Validates the deliverability of Canadian addresses and adds
latitude and longitude coordinates to each output address. The rule corrects errors in the input
addresses where possible. Use the rule when you can connect the input address fields to the Discrete
input ports on the Address Validator transformation.
rule_CAN_Address_Validation_Discrete Validates the deliverability of Canadian addresses. The rule corrects
errors in the input addresses where possible. Use the rule when you can connect the input address
fields to the Discrete input ports on the Address Validator transformation.
rule_CAN_Address_Validation_Hybrid_w_Geocoding Validates the deliverability of Canadian addresses and adds
latitude and longitude coordinates to each output address. The rule corrects errors in the input
addresses where possible. Use the rule when you can connect the input address fields to the Hybrid
input ports on the Address Validator transformation.
rule_CAN_Address_Validation_Hybrid Validates the deliverability of Canadian addresses. The rule corrects
errors in the input addresses where possible. Use the rule when you can connect the input address
fields to the Hybrid input ports on the Address Validator transformation.
rule_CAN_Address_Validation_Multiline_w_Geocoding Validates the deliverability of Canadian addresses and adds
latitude and longitude coordinates to each output address. The rule corrects errors in the input
addresses where possible. Use the rule when you can connect the input address fields to the Multiline
input ports on the Address Validator transformation.
rule_CAN_Address_Validation_Multiline Validates the deliverability of Canadian addresses. The rule corrects
errors in the input addresses where possible. Use the rule when you can connect the input address
fields to the Multiline input ports on the Address Validator transformation.
rule_CAN_Postcode_Validation Validates Canadian postal codes. The rule returns "Valid" or "Invalid."
rule_CAN_Province_Validation Validates Canadian province names. The rule returns "Valid" or "Invalid."
rule_USA_Address_Parse_Hybrid Parses unstructured United States addresses into address elements. The rule
does not validate the addresses. Use the rule when you can connect the input address fields to the
Hybrid input ports on the Address Validator transformation.
rule_USA_Address_Parse_Multiline Parses unstructured United States addresses into address elements. The
rule does not validate the addresses. Use the rule when you can connect the input address fields to
the Multiline input ports on the Address Validator transformation.
rule_USA_Address_Validation_Discrete_w_Geocoding Validates the deliverability of United States addresses and
adds latitude and longitude coordinates to each output address. The rule corrects errors in the input
addresses where possible. Use the rule when you can connect the input address fields to the Discrete
input ports on the Address Validator transformation.
rule_USA_Address_Validation_Discrete Validates the deliverability of United States addresses. The rule
corrects errors in the input addresses where possible. Use the rule when you can connect the input
address fields to the Discrete input ports on the Address Validator transformation.
rule_USA_Address_Validation_Hybrid_w_Geocoding Validates the deliverability of United States addresses and
adds latitude and longitude coordinates to each output address. The rule corrects errors in the input
addresses where possible. Use the rule when you can connect the input address fields to the Hybrid
input ports on the Address Validator transformation.
rule_USA_Address_Validation_Hybrid Validates the deliverability of United States addresses. The rule
corrects errors in the input addresses where possible. Use the rule when you can connect the input
address fields to the Hybrid input ports on the Address Validator transformation.
rule_USA_Address_Validation_Multiline_w_Geocoding Validates the deliverability of United States addresses and
adds latitude and longitude coordinates to each output address. The rule corrects errors in the input
addresses where possible. Use the rule when you can connect the input address fields to the Multiline
input ports on the Address Validator transformation.
rule_USA_Address_Validation_Multiline Validates the deliverability of United States addresses. The rule
corrects errors in the input addresses where possible. Use the rule when you can connect the input
address fields to the Multiline input ports on the Address Validator transformation.
rule_USA_County_Validation Validates United States county names. The rule compares input data against
county names in all states. The rule returns "Valid" or "Invalid."
rule_USA_State_Validation Validates United States state names. The rule returns "Valid" or "Invalid."
rule_USA_ZIPCode_Validation Validates five-digit United States Zone Improvement Plan (ZIP) Codes. The
rule returns "Valid" or "Invalid."
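The postal code rules above return "Valid" or "Invalid" for each input value. The following Python sketch shows a format-only check that returns the same two labels. It is an illustration, not the accelerator's logic: the function names and regular expressions are assumptions, and the accelerator rules may also compare values against reference data.

```python
import re

# Format-only patterns (assumptions for illustration).
# A five-digit ZIP Code, for example 94063.
ZIP5_PATTERN = re.compile(r"[0-9]{5}")
# A Canadian postal code in letter-digit-letter digit-letter-digit form,
# with an optional single space in the middle, for example K1A 0B1.
CAN_POSTCODE_PATTERN = re.compile(r"[A-Za-z][0-9][A-Za-z] ?[0-9][A-Za-z][0-9]")


def check_zip5(value: str) -> str:
    """Return "Valid" if the value looks like a five-digit ZIP Code."""
    return "Valid" if ZIP5_PATTERN.fullmatch(value.strip()) else "Invalid"


def check_can_postcode(value: str) -> str:
    """Return "Valid" if the value looks like a Canadian postal code."""
    return "Valid" if CAN_POSTCODE_PATTERN.fullmatch(value.strip()) else "Invalid"
```

A format check of this kind accepts any value with the right shape, so it cannot tell whether a code is actually in use.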
Find the contact data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
Name Description
rule_CAN_Gender_Assignment Assigns gender according to first names. The rule returns "M" for male names,
"F" for female names, and "U" if the gender is unknown. For example, the rule
assigns the name "John Smith" a gender of "M" for male.
rule_CAN_Given_Name_Standard Generates given names from Canadian nicknames. For example, the rule
standardizes the nickname "Bob" to the given name "Robert."
rule_CAN_Multi_Person_Name_Parse Parses person name values into separate ports. The rule creates ports for
values such as title, first name, middle name, and surname.
The rule output includes a port that contains the full name of the person in the
record. You can use the full name port as an input to a Match transformation in
an identity match analysis mapping.
When the name data identifies more than one person, the rule creates an
output port for each full name. For example, the rule can read the name "John
and Jane Smith" and create output ports for "John Smith" and "Jane Smith."
rule_CAN_Personal_Name_Parse_and_Standardize_FML Parses the values in a person name into separate ports.
The rule also standardizes the name values.
The rule creates the ports in the following sequence:
- First name, middle name, last name
The rule output also includes a port that contains the full name of the person in
the record. You can use the full name port as an input to a Match
transformation in an identity match analysis mapping.
rule_CAN_Personal_Name_Parse_and_Standardize_LFM Parses the values in a person name into separate ports.
The rule also standardizes the name values.
The rule creates the ports in the following sequence:
- Last name, first name, middle name
The rule output also includes a port that contains the full name of the person in
the record. You can use the full name port as an input to a Match
transformation in an identity match analysis mapping.
rule_CAN_Phone_Number_Parse Parses a Canadian telephone number from a string. The rule parses the first
telephone number in the data, reading from right to left. The rule returns a
telephone number and also returns a string that contains the input text with the
telephone number removed.
rule_CAN_Phone_Number_Standardization Standardizes Canadian telephone numbers. The rule returns the
telephone number in the following formats:
- Standard - (nnn) nnn-nnnn
- Dashes - nnn-nnn-nnnn
- No Spaces - nnnnnnnnnn
rule_CAN_Phone_Number_Validation Validates the area code and length of Canadian telephone numbers. The
rule returns codes that indicate telephone number type and validity. Types describe
categories such as "toll-free."
rule_CAN_SIN_Parse Parses a Canadian Social Insurance Number (SIN) from a string. The rule
returns the SIN and also returns a string that contains the input text with the
SIN removed.
rule_CAN_SIN_Standardization Standardizes Canadian Social Insurance Numbers (SIN). The rule can output
the following formats:
- No Punctuation - nnnnnnnnn
- Space - nnn nnn nnn
- Dash - nnn-nnn-nnn
To change the format, edit the SIN_format expression variable in the
dq_Format_SIN Expression transformation. Default is "No_Punctuation."
rule_CAN_SIN_Validation Validates Canadian Social Insurance Numbers (SIN). The rule uses the Luhn
algorithm to verify whether or not a SIN is valid. The rule returns "Valid" or
"Invalid."
rule_Prename_Assignment Generates an honorific according to the gender. You can change the
female_prename expression variable from Ms. to Mrs.
rule_Salutation_Assignment Generates formal and casual greetings from prenames and name tokens. For
example, when input data contains "Mr. John Smith," the rule generates the
formal greeting "Dear Mr. Smith," and the casual greeting "Dear John,". You
can change the prefix and punctuation by editing the variables in the
dq_Generate_Salutation Expression transformation.
rule_USA_Gender_Assignment Assigns gender according to first name. The rule returns "M" for male names,
"F" for female names, and "U" if the gender is unknown. For example, the rule
assigns the name "John Smith" a gender of "M" for male.
rule_USA_Given_Name_Standard Generates given names from U.S. nicknames. For example, the rule
standardizes the nickname "Bob" to the given name "Robert."
rule_USA_Multi_Person_Name_Parse Parses person name values into separate ports. The rule creates ports for
values such as title, first name, middle name, and surname.
The rule output includes a port that contains the full name of the person in the
record. You can use the full name port as an input to a Match transformation in
an identity match analysis mapping.
When the name data identifies more than one person, the rule creates an
output port for each full name. For example, the rule can read the name "John
and Jane Smith" and create output ports for "John Smith" and "Jane Smith."
rule_USA_Personal_Name_Parse_and_Standardize_FML Parses the values in a person name into separate ports.
The rule also standardizes the name values.
The rule creates the ports in the following sequence:
- First name, middle name, last name
The rule output also includes a port that contains the full name of the person in
the record. You can use the full name port as an input to a Match
transformation in an identity match analysis mapping.
rule_USA_Personal_Name_Parse_and_Standardize_LFM Parses the values in a person name into separate ports.
The rule also standardizes the name values.
The rule creates the ports in the following sequence:
- Last name, first name, middle name
The rule output also includes a port that contains the full name of the person in
the record. You can use the full name port as an input to a Match
transformation in an identity match analysis mapping.
rule_USA_Phone_Number_Parse Parses a United States telephone number from a string. The rule parses the
first telephone number in the data, reading from right to left. The rule returns a
telephone number and also returns a string that contains the input text with the
telephone number removed.
rule_USA_Phone_Number_Standardization Standardizes United States telephone numbers. The rule returns the
telephone number in the following formats:
- Standard - (nnn) nnn-nnnn
- Dashes - nnn-nnn-nnnn
- No Spaces - nnnnnnnnnn
rule_USA_Phone_Number_Validation Validates the area code and length of United States telephone numbers.
The rule returns codes that indicate if the area code and length of a telephone
number are valid.
rule_USA_SSN_Standardization Standardizes United States Social Security Numbers (SSN). The rule can
output the following formats:
- No Punctuation - nnnnnnnnn
- Space - nnn nnn nnn
- Dash - nnn-nnn-nnn
To change the format, edit the SSN_format expression variable in the
dq_SSN_Format Expression transformation. Default is "No_Punctuation."
rule_USA_SSN_Validation Validates United States Social Security Numbers (SSN). The rule validates
each SSN for length, numeric values, and known minimum and maximum
values in the Area, Group, and Serial Number sections.
The Area section comprises the first three digits of the SSN, and the Group
section comprises the fourth and fifth digits. The Serial Number section
comprises the final four digits.
If the SSN was issued prior to June 2011, the rule also verifies that the Area
value and Group value are a valid combination. The rule does not verify that
the SSN is an issued number. The rule returns "Valid" or "Invalid."
rule_USA_SSN_Validation_post_June2011 Validates United States Social Security Numbers (SSN). The rule
validates each SSN for length, numeric values, and known minimum and maximum
values in the Area, Group, and Serial Number sections.
The Area section comprises the first three digits of the SSN, and the Group
section comprises the fourth and fifth digits. The Serial Number section
comprises the final four digits.
The rule does not verify that the Area value and Group value are a valid
combination. The rule does not verify that the SSN is an issued number. The
rule returns "Valid" or "Invalid."
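The description of rule_CAN_SIN_Validation above states that the rule uses the Luhn algorithm. The following Python sketch shows one way to apply a Luhn check to a nine-digit value and return "Valid" or "Invalid". The function names and the handling of spaces and dashes are assumptions made for illustration; the sketch does not reproduce the logic inside the rule.

```python
def luhn_checksum_ok(digits: str) -> bool:
    """Return True if a string of decimal digits passes the Luhn check."""
    total = 0
    # Walk the digits from right to left and double every second digit.
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                # Same result as adding the two digits of the product.
                value -= 9
        total += value
    return total % 10 == 0


def validate_sin(value: str) -> str:
    """Return "Valid" or "Invalid" for a candidate Social Insurance Number."""
    # Assumption: spaces and dashes are the only separators to ignore.
    cleaned = value.replace(" ", "").replace("-", "")
    if len(cleaned) != 9 or any(ch not in "0123456789" for ch in cleaned):
        return "Invalid"
    return "Valid" if luhn_checksum_ok(cleaned) else "Invalid"
```

A Luhn check confirms only that the digits are internally consistent. It does not confirm that a number has been issued.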
• rule_Email_Validation
For more information about these rules, see “Core Contact Data Cleansing Rules” on page 23.
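The telephone number standardization rules in the contact data cleansing table list three output formats: Standard, Dashes, and No Spaces. The following Python sketch produces those three layouts from a ten-digit number. The function name, the style parameter, and the decision to reject input that does not contain exactly ten digits are assumptions for illustration only.

```python
def standardize_phone(value: str, style: str = "Standard") -> str:
    """Format a ten-digit telephone number in one of three layouts."""
    # Keep only the decimal digits from the input text.
    digits = "".join(ch for ch in value if ch in "0123456789")
    if len(digits) != 10:
        # Assumption: anything other than ten digits is rejected.
        raise ValueError("expected exactly ten digits")
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    formats = {
        "Standard": f"({area}) {exchange}-{line}",   # (nnn) nnn-nnnn
        "Dashes": f"{area}-{exchange}-{line}",       # nnn-nnn-nnnn
        "No Spaces": digits,                         # nnnnnnnnnn
    }
    if style not in formats:
        raise ValueError(f"unknown style: {style}")
    return formats[style]
```

For example, standardize_phone("613.555.0142", "Dashes") returns "613-555-0142".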
The U.S./Canada accelerator depends on the following corporate data cleansing rule from the Core
accelerator:
• rule_Company_Name_Standardization
Find the general data cleansing rules in the following repository location:
[Informatica_DQ_Content]\Rules\General_Data_Cleansing
The following table describes the general data cleansing rules in the U.S./Canada accelerator:
Name Description
rule_CAN_Field_Identification Identifies the type of information contained in an input field. The rule can
identify names, Personal IDs, company names, dates, and Canadian address
data. The rule returns a label that describes the type of input data. The rule
uses reference data to identify the types of information.
rule_CAN_NER_Field_Identification Identifies the type of information contained in an input field. The rule
can identify names, Personal IDs, company names, dates, and Canadian address
data. The rule returns a label that describes the type of input data. The rule
uses reference data to identify the types of information. The rule uses
probabilistic matching techniques to identify the types of information.
rule_USA_Field_Identification Identifies the type of information contained in an input field. The rule can
identify names, Personal IDs, company names, dates, and United States
address data. The rule returns a label that describes the type of input data.
The rule uses reference data to identify the types of information.
rule_USA_NER_Field_Identification Identifies the type of information contained in an input field. The rule
can identify names, Personal IDs, company names, dates, and United States
address data. The rule returns a label that describes the type of input data.
The rule uses reference data to identify the types of information. The rule uses
probabilistic matching techniques to identify the types of information.
• rule_Assign_DQ_90_GeocodinStatus_Description
• rule_Assign_DQ_90_Mailability_Score_Description
• rule_Assign_DQ_90_Match_Code_Descriptions
• rule_Date_Validation
• rule_Remove_Extra_Spaces
• rule_Remove_Punctuation
• rule_Replace_Limited_Punct_with_Space
• rule_UpperCase
For more information about these rules, see “Core General Data Cleansing Rules” on page 24.
Find the matching and deduplication rules in the following repository location:
[Informatica_DQ_Content]\Rules\Matching_Deduplication
The following table describes the matching and deduplication rules in the U.S./Canada accelerator:
Name Description
mplt_CAN_IMO_Company_Name_and_Address_Match Uses identity match strategies to identify duplicate rows in
Canadian data based on company names and addresses. The mapplet generates group keys
from the postal code data.
mplt_CAN_IMO_Familyname_and_Address_Match Uses identity match strategies to identify duplicate rows in
Canadian data based on family names and addresses. The mapplet generates group keys
from the postal code data.
mplt_CAN_IMO_Individual_Name_and_Address_Match Uses identity match strategies to identify duplicate rows in
Canadian data based on person names and addresses. The mapplet generates group keys
from the postal code data.
mplt_CAN_IMO_Personal_Name_and_Data Uses identity match strategies to identify duplicate rows in Canadian
data based on person names and personal data. The fields in the personal data
column should contain a single type of data, such as telephone number, email,
or Social Insurance Number. The mapplet generates group keys from the
personal data.
mplt_Company_Name_and_Address_Match Uses field match strategies to identify duplicate rows based on
company name and address data. The mapplet uses a combination of characters from the
company name values and the postal code values to generate group keys.
mplt_Company_Name_Match Uses field match strategies to identify duplicate rows based on company name.
The mapplet generates Soundex codes from the company name values and
uses the Soundex codes as group keys.
mplt_Familyname_and_Address_Match Uses field match strategies to identify duplicate rows based on surname
and address data. The mapplet uses a combination of characters from the surname
values and the postal code values to generate group keys.
mplt_Firstname_and_SSN_Match Uses field match strategies to identify duplicate rows based on first names
and United States Social Security numbers. The mapplet generates group keys
from the Social Security number data.
mplt_Individual_Name_and_Address_Match Uses field match strategies to identify duplicate rows based on
person names and United States address data. The mapplet uses a combination of
characters from the surname values and the postal code values to generate
group keys.
mplt_Individual_Name_and_Date_Match Uses field match strategies to identify duplicate rows based on person
names and date data. The mapplet generates group keys from the date data.
mplt_Individual_Name_and_Email_Match Uses field match strategies to identify duplicate rows based on person
names and email addresses. The mapplet generates group keys from the email
address data.
mplt_Individual_Name_and_Phone_Match Uses field match strategies to identify duplicate rows based on person
names and telephone numbers. The mapplet generates group keys from the
telephone number data.
mplt_Individual_Name_and_SSN_Match Uses field match strategies to identify duplicate rows based on person
names and United States Social Security numbers. The mapplet generates group keys
from the Social Security number data.
mplt_Individual_Name_Match Uses field match strategies to identify duplicate rows based on person names.
The mapplet generates NYSIIS codes from the surname values and uses the
NYSIIS codes as group keys.
mplt_USA_Address_Match Uses field match strategies to identify duplicate rows in United States data
based on United States address data. The mapplet generates group keys from
the postal code data.
mplt_USA_IMO_Company_Name_and_Address_Match Uses identity match strategies to identify duplicate rows in
United States data based on company names and addresses. The mapplet generates group keys
from the postal code data.
mplt_USA_IMO_Familyname_and_Address_Match Uses identity match strategies to identify duplicate rows in
United States data based on family names and addresses. The mapplet generates group keys
from the postal code data.
mplt_USA_IMO_Individual_Name_and_Address_Match Uses identity match strategies to identify duplicate rows in
United States data based on person names and addresses. The mapplet generates group keys
from the postal code data.
mplt_USA_IMO_Personal_Name_and_Data Uses identity match strategies to identify duplicate rows in United
States data based on person names and personal data. The fields in the personal data
column must contain a single type of data, such as telephone number, email,
or Social Security number. The mapplet generates group keys from the
personal data.
rule_Company_Name_and_Address_MatchScore Generates a match score based on company names and United States
address data.
rule_Familyname_and_Address_MatchScore Generates a match score based on surnames and United States address
data.
rule_Firstname_and_SSN_MatchScore Generates a match score based on first names and United States Social
Security numbers.
rule_Individual_Name_and_Address_MatchScore Generates a match score based on person names and United States
address data.
rule_Individual_Name_and_Email_MatchScore Generates a match score based on person names and email addresses.
rule_Individual_Name_and_Phone_MatchScore Generates a match score based on person names and telephone numbers.
rule_Individual_Name_and_SSN_MatchScore Generates a match score based on person names, Social Security
numbers, and identification data.
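Several mapplets in the table above generate group keys, for example from postal code data, before they compare rows. The following Python sketch illustrates the general idea of grouping: rows are compared only with other rows that share a key, which reduces the number of comparisons. The function name, field names, and sample rows are hypothetical, and the sketch does not represent how the Match transformation is implemented.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, key_func):
    """Group records by key, then pair records inside each group only."""
    groups = defaultdict(list)
    for record in records:
        groups[key_func(record)].append(record)
    pairs = []
    for members in groups.values():
        # Records in different groups are never compared with each other.
        pairs.extend(combinations(members, 2))
    return pairs


# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"id": 1, "name": "John Smith", "postcode": "94063"},
    {"id": 2, "name": "Jon Smith", "postcode": "94063"},
    {"id": 3, "name": "Jane Doe", "postcode": "10001"},
]

# Use the postal code as the group key.
pairs = candidate_pairs(rows, key_func=lambda row: row["postcode"])
pair_ids = [(a["id"], b["id"]) for a, b in pairs]
```

With three rows there are three possible pairs, but grouping on the postal code leaves only the pair of rows 1 and 2 as candidates for scoring. The trade-off is that duplicates with different key values are never compared, so the choice of key matters.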
Name Description
rule_US_Contact_Data Parses, standardizes, and validates U.S. contact data, such as addresses, telephone
numbers, and Social Security Numbers (SSN).
The following table lists the names and repository locations of the rules in the composite rule for United
States contact data:
Rule Location
rule_Assign_DQ_90_Mailability_Score_Description [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_Company_Name_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_Email_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Salutation_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_USA_Address_Validation_Hybrid [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_USA_Gender_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_USA_Multi_Person_Name_Parse [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_USA_Phone_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_USA_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_USA_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_USA_SSN_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_USA_SSN_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
Name Description
rule_CAN_Contact_Data Parses, standardizes, and validates Canadian contact data, such as addresses,
telephone numbers, and Social Insurance Numbers (SIN).
The following table lists the names and repository locations of the rules in the composite rule for Canadian
contact data:
Rule Location
rule_Assign_DQ_90_Mailability_Score_Descriptions [Informatica_DQ_Content]\Rules\General_Data_Cleansing
rule_CAN_Address_Validation_Hybrid [Informatica_DQ_Content]\Rules\Address_Data_Cleansing
rule_CAN_Company_Standardization [Informatica_DQ_Content]\Rules\Corporate_Data_Cleansing
rule_CAN_Gender_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_CAN_Multi_Person_Name_Parse [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_CAN_Phone_Number_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_CAN_Phone_Number_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_CAN_SIN_Standardization [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_CAN_SIN_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Email_Validation [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Prename_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
rule_Salutation_Assignment [Informatica_DQ_Content]\Rules\Contact_Data_Cleansing
m_customer_data_US_demo
m_customer_matching_US_demo
Parses and standardizes identity data from the United States and performs identity match analysis on
the data.
The mapping analyzes the following data combinations and generates match clusters for each
combination: