
DataQuality 4

The document outlines data quality issues, health checks, and data profiling methods within a source system. It identifies common patterns of data quality issues, such as schema length mismatches and incorrect data types, and describes a utility for health checks that scans and validates data against schema definitions. Additionally, it details configurations for data profiling to address length and type discrepancies, ensuring data integrity and compliance.

Uploaded by

Psar Khmer

Agenda

Data Quality
• List of Issues occurred in Source system

Health Check
• How to identify the issue upfront?

Data Profiling
• How to resolve the data quality issue?
Data Quality
Types of Data Quality Considered

Schema Length Compliance

Data Type Compliance

IN2TEXT Compliance

Single Value with MV Data

MV with Sub Value Data

Issue Patterns

Pattern - 1
• This is the most common data quality issue encountered: the data field length is not aligned with the schema definition from the source system.

Pattern - 2
• This is a more typical data quality validation, used to catch both occasional schema violations and casting format errors.
• Data entered incorrectly into the source system (e.g. text in a number field)
• Data supplied in an incorrect format (e.g. a YYYYMMDD mask with 31-Jan-2019 data)

Pattern - 3
• IN2TEXT – as a Single Value rather than Multi Value

Pattern - 4
• SS definition is Single Value but the values come as Multi Value

Pattern - 5
• SS definition is Multi Value but the values come as Sub Value
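
Patterns 1 and 2 can be illustrated with a minimal Python sketch. The schema layout, the field names and the `check_value` helper are assumptions made for illustration; they are not part of the source system.

```python
from datetime import datetime

# Hypothetical schema: field name -> maximum length, type and optional date mask.
SCHEMA = {
    "CUSTOMER.NAME": {"length": 10, "type": "text"},
    "AMOUNT": {"length": 12, "type": "number"},
    "VALUE.DATE": {"length": 8, "type": "date", "mask": "%Y%m%d"},  # YYYYMMDD
}

def check_value(field, value):
    """Return the list of issues ("length", "type") found for one field value."""
    rule = SCHEMA[field]
    issues = []
    # Pattern 1: the data is longer than the schema length.
    if len(value) > rule["length"]:
        issues.append("length")
    # Pattern 2: the data cannot be cast to the declared type or mask.
    if rule["type"] == "number":
        try:
            float(value)
        except ValueError:
            issues.append("type")
    elif rule["type"] == "date":
        try:
            datetime.strptime(value, rule["mask"])
        except ValueError:
            issues.append("type")
    return issues
```

With this sketch, a value of 31-Jan-2019 in a YYYYMMDD field is reported for both length and type, while 20190131 passes.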

Pattern - 3 [IN2TEXT as Single Value]
STANDARD.SELECTION – PAYMENT.PURPOSE.CODE
RECORD REFERENCE – PAYMENT.PURPOSE.CODE - ACCT
RECORD FROM CLASSIC VIEW – PAYMENT.PURPOSE.CODE S ACCT
DATA EVENT FROM DBTOOLS

Pattern - 4 [Single Value but Multi Valued data]
APPLICATION: F.AC.EVENT
RECID: INTEREST-DEBIT-DUE
ERROR MESSAGE:
Found multi values for a single field: schema=AC_EVENT data=
{fieldNumber=2,mvIndex=1,svIndex=1,value=Handles due interest take over},
{fieldNumber=2,mvIndex=2,svIndex=1,value=Executes as part of DEBIT action },
{fieldNumber=2,mvIndex=3,svIndex=1,value=of interest property class.},
{fieldNumber=2,mvIndex=4,svIndex=1,value=Accounting (for Loans):},
{fieldNumber=2,mvIndex=5,svIndex=1,value=Dr ACC<Interest>},
{fieldNumber=2,mvIndex=6,svIndex=1,value=Cr <Capture Suspense>}
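
An error message of this shape can be broken into its entries for analysis. The following Python sketch assumes the `{fieldNumber=..,mvIndex=..,svIndex=..,value=..}` layout shown above and that values contain no closing brace; `parse_entries` and `max_multi_value` are hypothetical helper names.

```python
import re

# Matches one {fieldNumber=..,mvIndex=..,svIndex=..,value=..} entry.
# Assumption: the value text itself never contains a closing brace.
ENTRY = re.compile(
    r"\{fieldNumber=(\d+),mvIndex=(\d+),svIndex=(\d+),value=(.*?)\}", re.DOTALL
)

def parse_entries(message):
    """Return one dict per entry, with wrapped whitespace in values collapsed."""
    return [
        {"field": int(f), "mv": int(m), "sv": int(s), "value": " ".join(v.split())}
        for f, m, s, v in ENTRY.findall(message)
    ]

def max_multi_value(entries, field):
    """Highest mvIndex seen for a field; above 1 means multi-valued data."""
    return max((e["mv"] for e in entries if e["field"] == field), default=0)
```

For the message above, `max_multi_value(entries, 2)` is greater than 1, which confirms that field 2 carries multi-valued data although it is defined as a single value.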

Pattern - 5 [Multi Value but Sub Valued data]
Incident TSR-151458 was raised by BCI for an issue with the application F.AA.PROPERTY.CLASS.
The field FULL.DESC is defined as Multi Value, but the value for the RECID CHANGE.PRODUCT arrives with an SM marker instead of a VM marker.
APPLICATION: F.AA.PROPERTY.CLASS
RECID: CHANGE.PRODUCT
ERROR MESSAGE:
Invalid multi value: schema=AA_PROPERTY_CLASS position=2 data={fieldNumber=2,mvIndex=1,svIndex=2,value=products on a running arrangement.}

CORRECT RECORD - F.AA.PROPERTY.CLASS – ACTIVITY CHARGES
FAILED RECORD - F.AA.PROPERTY.CLASS – CHANGE.PRODUCT
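
Patterns 4 and 5 both come down to the markers found inside a field compared with its definition. The Python sketch below assumes the conventional Pick-style delimiter characters (field mark 254, value mark 253, sub-value mark 252); the `definitions` mapping and the function names are hypothetical.

```python
# Assumption: conventional Pick-style marks for field, value and sub-value.
FM, VM, SM = chr(254), chr(253), chr(252)

# Shapes of data that each declared shape is allowed to contain.
ALLOWED = {
    "single": {"single"},
    "multi": {"single", "multi"},
    "sub": {"single", "multi", "sub"},
}

def field_shape(raw_field):
    """Classify one raw field as 'single', 'multi' or 'sub' by its markers."""
    if SM in raw_field:
        return "sub"
    if VM in raw_field:
        return "multi"
    return "single"

def find_violations(record, definitions):
    """Return (field number, declared shape, actual shape) for each mismatch.

    definitions: hypothetical mapping of 1-based field number -> declared shape;
    fields that are not listed are treated as single valued.
    """
    violations = []
    for number, raw in enumerate(record.split(FM), start=1):
        declared = definitions.get(number, "single")
        actual = field_shape(raw)
        if actual not in ALLOWED[declared]:
            violations.append((number, declared, actual))
    return violations
```

A field declared as multi value that contains an SM marker is reported as `(number, "multi", "sub")`, which corresponds to the failed CHANGE.PRODUCT record.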

Health Check
Overview - RR.DATA.HEALTH.CHECK

01 Utility developed to identify data issues in the Transact database that do not match the schema definitions.

02 New service introduced which scans all the tables configured in the BATCH record and validates each data.

03 Dashboard to display all the errors which are captured in the RR.DATA.HEALTH.CHECK application during the service execution.

Configurations
❑ Batch Record – RR.HEALTH
Add the applications to the Data field of the BATCH record.
❑ TSA.SERVICE Record – RR.HEALTH
Start the TSA.SERVICE.
Execution – RR.HEALTH
DASHBOARD – RR.DATA.CHECK
File name is a mandatory input to execute the enquiry.
DASHBOARD – RESULTS
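
The health check flow (scan the configured tables, validate each record, show the captured errors) can be sketched in Python as follows. This is not the RR.HEALTH implementation; the in-memory `tables` mapping and the `validate` callback are assumptions standing in for the BATCH configuration and the schema validation.

```python
from collections import Counter

def run_health_check(tables, validate):
    """Scan every record of every configured table and collect the errors.

    tables:   hypothetical mapping of table name -> {record id: record}
    validate: callable(table, record) returning a list of error strings
    """
    errors = []
    for table, records in tables.items():
        for rec_id, record in records.items():
            for message in validate(table, record):
                errors.append({"table": table, "rec_id": rec_id, "error": message})
    return errors

def summarise(errors):
    """Count errors per table, as a dashboard might present them."""
    return Counter(entry["table"] for entry in errors)
```

Keeping the validation in a callback lets the same scan loop apply length, type or marker checks without change.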
Data Profiling
1. Data Profiling is a common solution to cater for the length and datatype issues in SS across the vertical.

2. Data Profiling is a utility to cater for the length and datatype issues in SS from the Transact system.

3. The RR.OVERRIDE.PARAMETER application was introduced in Transact to handle the length and data type issues by overriding the properties at runtime.

Data/Schema Profiling

Schema Compliance
• It is the most common form of data quality issue encountered.
• The source system provides an application to override the schema length.
• This ensures that the schema supplied through DES from the source system aligns with the data content of the dataset.

Data Type Compliance
• This is another common form of data quality issue encountered.
• The source system provides an application to overcome the datatype mismatch and apply the right data type.
• These exceptions will typically be encountered in online services/COB transactions.

Single Value field with MV data
• The data is not aligned with the expected schema definition.

MV field with Sub Value Data
• The data is not aligned with the expected schema definition.
How to Configure Data Profiling?

1. Data Profiling – Length Change

2. Data Profiling – Data Type Change

3. Schema Profiling – Single Value field to MV Change

4. Schema Profiling – MV field to Sub Value Change


How to configure RR.OVERRIDE.PARAMETER for Length Change?
Add the USER application.

Add the field name as ‘PASSWORD’ and validate. The SOURCE.FIELD.LENGTH and SOURCE.FIELD.TYPE values are populated.

The error message ‘Target field length or target field type should be entered’ is displayed, because either TARGET.FIELD.LENGTH or TARGET.FIELD.TYPE must be entered for a field.

Enter a TARGET.FIELD.LENGTH greater than SOURCE.FIELD.LENGTH.

Commit the record.

Authorise the record.
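
The validation described in these steps can be expressed as a short Python sketch. Only the first error message comes from the text above; the second message, the function name and the example lengths are assumptions for illustration.

```python
def validate_override(source_length, target_length=None, target_type=None):
    """Return the validation errors for one field override (empty if valid)."""
    errors = []
    # Either a target length or a target type must be supplied.
    if target_length is None and target_type is None:
        errors.append("Target field length or target field type should be entered")
    # A length override only makes sense if it widens the field.
    # The wording of this message is hypothetical.
    if target_length is not None and target_length <= source_length:
        errors.append("TARGET.FIELD.LENGTH must be greater than SOURCE.FIELD.LENGTH")
    return errors
```

For example, `validate_override(75)` returns the first error, while `validate_override(75, target_length=150)` returns an empty list; the source length of 75 is an invented value.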
How to configure RR.PARAM?

Configure F.USER table in RR.PARAM

The ‘fieldinputlen’ value for the PASSWORD field is updated to 150

Authorise the record

Verify the event created in RR.XSD.EVENTS


How to configure RR.OVERRIDE.PARAMETER for Data Type Change?
▪ The data type of an ODS/SDS field can be overridden using the RR.OVERRIDE.PARAMETER table.
▪ Possible data type conversions are listed below for different source types

In the USER record, set Field Name to ATTRIBUTES and set the Target Field Type to NVARCHAR(MAX).

Authorise the record.

After you authorise RR.OVERRIDE.PARAMETER, an event is created in RR.XSD.EVENTS, because F.USER is already configured in RR.PARAM.
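
The effect of overriding properties at runtime can be sketched in Python as a merge of the override values over the source schema. The dictionary layout, the source lengths and types, and the `apply_overrides` name are assumptions; only the PASSWORD length of 150 and the ATTRIBUTES type NVARCHAR(MAX) come from the examples above.

```python
def apply_overrides(schema, overrides):
    """Return a new schema with target length and type overrides applied.

    schema:    hypothetical mapping of field name -> {"length": int, "type": str}
    overrides: hypothetical mapping of field name -> optional "target_length"
               and "target_type" values
    """
    # Copy each field definition so the source schema is left untouched.
    effective = {name: dict(props) for name, props in schema.items()}
    for name, override in overrides.items():
        if name not in effective:
            raise KeyError(f"Unknown field: {name}")
        if override.get("target_length") is not None:
            effective[name]["length"] = override["target_length"]
        if override.get("target_type") is not None:
            effective[name]["type"] = override["target_type"]
    return effective
```

Returning a copy rather than editing the schema in place keeps the source definition available for comparison with the overridden one.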
How to configure RR.OVERRIDE.PARAMETER for Single Value to MV?
Input RR.OVERRIDE.PARAMETER → AC.EVENT record, enter the field name as FULL.DESCRIPTION and select Multi value for the Target Single Multi.1 field.

After committing the record, an event is created in RR.XSD.EVENTS if AC.EVENT is already configured in RR.PARAM.

A new multi value child table is created for the FULL_DESCRIPTION column in the target database.
How to configure RR.OVERRIDE.PARAMETER for MV to Sub Value?
Input RR.OVERRIDE.PARAMETER → ACCT.BALANCE.ACTIVITY record, enter the field name as ACTIVITY.DATA and select Sub value for the Target Single Multi.1 field.

After committing the record, the new child tables are created in the target data store as a sub value field.
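
To illustrate what a child table holds, the Python sketch below flattens one field into child rows. The row layout (record id, indexes, value) and the delimiter characters (value mark 253, sub-value mark 252) are assumptions; the actual child table structure in the target data store is not described here.

```python
# Assumption: conventional Pick-style value and sub-value marks.
VM, SM = chr(253), chr(252)

def multi_value_rows(rec_id, raw_field):
    """One child row per multi value: (rec_id, mv_index, value)."""
    return [
        (rec_id, mv, value)
        for mv, value in enumerate(raw_field.split(VM), start=1)
    ]

def sub_value_rows(rec_id, raw_field):
    """One child row per sub value: (rec_id, mv_index, sv_index, value)."""
    rows = []
    for mv, multi in enumerate(raw_field.split(VM), start=1):
        for sv, value in enumerate(multi.split(SM), start=1):
            rows.append((rec_id, mv, sv, value))
    return rows
```

A field overridden from single value to multi value would map to rows of the first kind, and a field overridden from multi value to sub value to rows of the second kind.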
