Agenda
Data Quality
• List of issues that occurred in the source system
Health Check
• How to identify the issue upfront?
Data Profiling
• How to resolve the data quality issue?
Data Quality
Types of Data Quality Considered
Schema Length Compliance
Data Type Compliance
IN2TEXT Compliance
Single Value with MV Data
MV with Sub Value Data
Issue Patterns
Pattern - 1
• This is the most common data quality issue encountered: the data field length is not aligned with the schema definition from the source system
Pattern - 2
• This is a more typical data quality validation, used to catch both occasional schema violations and casting format errors
• Data entered incorrectly into the source system (e.g. text in a number field)
• Data supplied in an incorrect format (e.g. a YYYYMMDD mask with 31-Jan-2019 data)
Pattern - 3
•IN2TEXT – as a Single Value rather than Multi Value
Pattern - 4
• SS definition is Single Value but the values arrive as Multi Value
Pattern - 5
• SS definition is Multi Value but the values arrive as Sub Value
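Patterns 1 and 2 can be illustrated with a minimal Python sketch. The schema dictionary, field names, and issue labels below are hypothetical and are not taken from Transact; the sketch only shows a length check followed by a cast check, including the YYYYMMDD mask case.

```python
from datetime import datetime

# Hypothetical schema entries, for illustration only.
SCHEMA = {
    "AMOUNT": {"type": "number", "max_length": 10},
    "VALUE_DATE": {"type": "date", "mask": "%Y%m%d", "max_length": 8},
    "NARRATIVE": {"type": "text", "max_length": 35},
}


def check_field(name, value):
    """Return the list of issue labels found for one field value."""
    rule = SCHEMA[name]
    issues = []
    # Pattern 1: the data is longer than the schema length.
    if len(value) > rule["max_length"]:
        issues.append("length")
    # Pattern 2: the data cannot be cast to the declared type or mask.
    if rule["type"] == "number":
        try:
            float(value)
        except ValueError:
            issues.append("type")
    elif rule["type"] == "date":
        try:
            datetime.strptime(value, rule["mask"])
        except ValueError:
            issues.append("format")
    return issues


print(check_field("AMOUNT", "12A"))
print(check_field("VALUE_DATE", "31-Jan-2019"))
```

A value can raise more than one issue at once: 31-Jan-2019 is both longer than an 8 character date field and incompatible with the YYYYMMDD mask.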
Pattern - 3 [IN2TEXT as Single Value]
STANDARD.SELECTION - PAYMENT.PURPOSE.CODE
RECORD REFERENCE - PAYMENT.PURPOSE.CODE - ACCT
RECORD FROM CLASSIC VIEW - PAYMENT.PURPOSE.CODE S ACCT
DATA EVENT FROM DBTOOLS
Pattern - 4 [Single Value but Multi Valued data]
APPLICATION: F.AC.EVENT
RECID: INTEREST-DEBIT-DUE
ERROR MESSAGE:
Found multi values for a single field: schema=AC_EVENT data=
{fieldNumber=2,mvIndex=1,svIndex=1,value=Handles due interest take over},
{fieldNumber=2,mvIndex=2,svIndex=1,value=Executes as part of DEBIT action },
{fieldNumber=2,mvIndex=3,svIndex=1,value=of interest property class.},
{fieldNumber=2,mvIndex=4,svIndex=1,value=Accounting (for Loans):},
{fieldNumber=2,mvIndex=5,svIndex=1,value=Dr ACC<Interest>},
{fieldNumber=2,mvIndex=6,svIndex=1,value=Cr <Capture Suspense>}
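The error message above lists each offending value with its field number, multi value index, and sub value index. A minimal Python sketch for reading such a message follows. The regular expression and function names are illustrative assumptions based only on the message text shown here, and a value that itself contains a closing brace would not parse correctly.

```python
import re

# One {fieldNumber=..,mvIndex=..,svIndex=..,value=..} entry of the message.
ENTRY = re.compile(
    r"\{fieldNumber=(\d+),mvIndex=(\d+),svIndex=(\d+),value=(.*?)\}",
    re.DOTALL,
)


def parse_entries(message):
    """Extract the data entries from a health check style error message."""
    return [
        {"field": int(f), "mv": int(m), "sv": int(s), "value": v.strip()}
        for f, m, s, v in ENTRY.findall(message)
    ]


def classify(entries):
    """Describe the deepest nesting level present in the entries."""
    if any(e["sv"] > 1 for e in entries):
        return "sub value data"
    if any(e["mv"] > 1 for e in entries):
        return "multi value data"
    return "single value data"


message = (
    "Found multi values for a single field: schema=AC_EVENT data="
    "{fieldNumber=2,mvIndex=1,svIndex=1,value=Handles due interest take over},"
    "{fieldNumber=2,mvIndex=2,svIndex=1,value=Executes as part of DEBIT action }"
)
print(classify(parse_entries(message)))
```

Any entry with mvIndex greater than 1 on a field defined as Single Value corresponds to Pattern 4.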
Pattern - 5 [Multi Value but Sub Valued data]
Incident TSR-151458 was raised by BCI for an issue with the application F.AA.PROPERTY.CLASS
Field FULL.DESC is defined as MULTI VALUE, but the value for RECID CHANGE.PRODUCT arrives with an SM marker instead of a VM marker
APPLICATION: F.AA.PROPERTY.CLASS
RECID: CHANGE.PRODUCT
ERROR MESSAGE:
Invalid multi value: schema=AA_PROPERTY_CLASS position=2 data={fieldNumber=2,mvIndex=1,svIndex=2,value=products on a running arrangement.}
CORRECT RECORD - F.AA.PROPERTY.CLASS - ACTIVITY CHARGES
FAILED RECORD - F.AA.PROPERTY.CLASS - CHANGE.PRODUCT
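The difference between a VM marker and an SM marker can be sketched in Python as follows. The code points 253 and 252 are an assumption based on the delimiters conventionally used by Pick-style databases and should be verified against the actual environment; the function names and shape labels are hypothetical.

```python
# Assumed marker code points; verify against the actual database settings.
VM = chr(253)  # value marker, separates multi values
SM = chr(252)  # sub value marker, separates sub values

SHAPES = ["single value", "multi value", "sub value"]


def shape_of(field_value):
    """Classify raw field content by the deepest marker it contains."""
    if SM in field_value:
        return "sub value"
    if VM in field_value:
        return "multi value"
    return "single value"


def violates(declared, field_value):
    """True when the data is nested deeper than the declared shape allows."""
    return SHAPES.index(shape_of(field_value)) > SHAPES.index(declared)


# A field declared as multi value that carries an SM marker (Pattern 5).
print(violates("multi value", "Changes the product" + SM + "on an arrangement"))
```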
Health Check
Overview - RR.DATA.HEALTH.CHECK
01 Utility developed to identify data in the Transact database that does not match the schema definitions
02 New service that scans all the tables configured in the BATCH record and validates the data in each
03 Dashboard that displays all the errors captured in the RR.DATA.HEALTH.CHECK application during the service execution
Configurations
❑ Batch Record – RR.HEALTH
Add the applications in the Data field of the BATCH record
❑ TSA.SERVICE Record – RR.HEALTH
Start the TSA.SERVICE
Execution – RR.HEALTH
DASHBOARD – RR.DATA.CHECK
File name is a mandatory input to execute the enquiry
DASHBOARD – RESULTS
Data Profiling
1. Data Profiling is a common solution that caters for length and datatype issues in SS across the vertical
2. Data Profiling is a utility that caters for length and datatype issues in SS from the Transact system
3. The RR.OVERRIDE.PARAMETER application was introduced in Transact to handle length and data type issues by overriding the properties at runtime
Data/Schema Profiling
Schema Compliance
• It is the most common form of data quality issue encountered
• Source system provides an application to override the schema length
• This ensures that the schema supplied through DES from the source system aligns with the data content of the dataset
Data Type Compliance
• This is another common form of data quality issue encountered
• Source system provides an application to overcome the datatype mismatch by setting the correct data type
• These exceptions will typically be encountered in online services/COB transactions
Single Value field with MV data
• The data is not aligned with the expected schema definition
MV field with Sub Value Data
• The data is not aligned with the expected schema definition
How to Configure Data Profiling?
1. Data Profiling – Length Change
2. Data Profiling – Data Type Change
3. Schema Profiling – Single Value field to MV Change
4. Schema Profiling – MV field to Sub Value Change
How to configure RR.OVERRIDE.PARAMETER for Length Change?
Add the USER application
Add the field name as ‘PASSWORD’ and validate that the SOURCE.FIELD.LENGTH and SOURCE.FIELD.TYPE values are populated
Enter a TARGET.FIELD.LENGTH greater than SOURCE.FIELD.LENGTH
Commit the record
The error message ‘Target field length or target field type should be entered’ is displayed if neither value is given, because either TARGET.FIELD.LENGTH or TARGET.FIELD.TYPE must be entered for a field
Authorise the record
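The two rules in the steps above (at least one target value must be entered, and a target length must exceed the source length) can be sketched in Python. This is only an illustration of the rules as described here, not the RR.OVERRIDE.PARAMETER implementation; the function name, the source length of 16, and the wording of the second message are hypothetical.

```python
def validate_override(source_length, target_length=None, target_type=None):
    """Return the validation errors for one overridden field."""
    errors = []
    # Rule from the slide: one of the two target values is mandatory.
    if target_length is None and target_type is None:
        errors.append("Target field length or target field type should be entered")
    # Rule from the slide: the target length must exceed the source length.
    # The message text below is a hypothetical wording.
    if target_length is not None and target_length <= source_length:
        errors.append("TARGET.FIELD.LENGTH must be greater than SOURCE.FIELD.LENGTH")
    return errors


# Hypothetical source length of 16 for the PASSWORD field.
print(validate_override(16))
print(validate_override(16, target_length=150))
```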
How to configure RR.PARAM?
Configure the F.USER table in RR.PARAM
The ‘fieldinputlen’ value for the PASSWORD field is updated to 150
Authorise the record
Verify the event created in RR.XSD.EVENTS
How to configure RR.OVERRIDE.PARAMETER for Data Type Change?
▪ The data type of an ODS/SDS field can be overridden using the RR.OVERRIDE.PARAMETER table.
▪ Possible data type conversions are listed below for different source types
In the USER record, set Field Name to ATTRIBUTES and set the Target Field
Type to NVARCHAR(MAX)
Authorise the record
After you authorise RR.OVERRIDE.PARAMETER, an event is created in RR.XSD.EVENTS, because F.USER is already configured in RR.PARAM
How to configure RR.OVERRIDE.PARAMETER for Single Value to MV?
Input RR.OVERRIDE.PARAMETER → AC.EVENT record, enter the field name as FULL.DESCRIPTION and select Multi value in the Target Single Multi.1 field
After committing the record, an event is created in RR.XSD.EVENTS if AC.EVENT is already configured in RR.PARAM
A new multi value child table is created for the FULL_DESCRIPTION column in the target database
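To picture what a multi value child table for FULL_DESCRIPTION holds, the following Python sketch splits one raw field into one row per multi value. The column names RECID and MV_INDEX, the function name, and the value marker code point 253 are assumptions for illustration; the actual child table layout in the target database may differ.

```python
VM = chr(253)  # assumed value marker code point


def to_child_rows(recid, raw_value):
    """Turn one multi valued field into one child row per value."""
    return [
        {"RECID": recid, "MV_INDEX": index, "FULL_DESCRIPTION": part}
        for index, part in enumerate(raw_value.split(VM), start=1)
    ]


raw = "Handles due interest take over" + VM + "Executes as part of DEBIT action"
for row in to_child_rows("INTEREST-DEBIT-DUE", raw):
    print(row)
```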
How to configure RR.OVERRIDE.PARAMETER for MV to Sub Value?
Input RR.OVERRIDE.PARAMETER → ACCT.BALANCE.ACTIVITY record, enter the field name as ACTIVITY.DATA and select Sub value in the Target Single Multi.1 field
After committing the record, the new child tables are created in the target data store for the sub value field