Ensuring data quality 
Mapping outcomes for quality assurance & control 
Data Topics Workshop Series: Fall 2014
Meet & Greet 
• First Name
• Program or Department
• Current role in a research project
Heather Coates 
Digital Scholarship & Data Management Librarian 
Liaison to the Fairbanks School of Public Health 
hcoates@iupui.edu 
317-278-7125
Agenda
• The Big Picture
• Practical Strategies
• Activities
Timeline
• Presentation: 10 minutes
• Discussion: Defining quality
• Discussion: Mapping outcomes
• Review | Q&A
Scenario 
Four years after your article is published, a researcher in your field contacts you with questions about the integrity of the data. 
• Can you find the files supporting your published findings?
• Can you access and view the files?
• Can you justify your rationale for the procedures based on your documentation?
• Can someone pick up your research and build on it?
Goals 
• Recognize the need for quality standards.
• Begin to define quality standards for your research.
• Identify quality assurance and quality control activities.
Data Integrity 
• Data have integrity if they have been maintained without unauthorized alteration or destruction.
• Data integrity also refers to data having a complete or whole structure. (http://www.princeton.edu/~achaney/tmve/wiki100k/docs/Data_integrity.html)
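The first definition (no unauthorized alteration or destruction) can be checked with a fixity check: record a checksum for each file when it is archived, then recompute and compare later. The sketch below is one way to do this with Python's standard `hashlib` module; SHA-256 and the manifest layout (a dictionary mapping each path to its hex digest) are assumptions for illustration, not a procedure from the workshop materials.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Map each file path (as a string) to its checksum at archiving time."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def verify_manifest(manifest):
    """Return the paths whose current checksum no longer matches the manifest.

    A mismatch means the file changed after the manifest was built;
    a file that no longer exists is reported as well.
    """
    changed = []
    for path_str, expected in manifest.items():
        path = Path(path_str)
        if not path.exists() or sha256_of(path) != expected:
            changed.append(path_str)
    return changed
```

In practice the manifest would be saved next to the archived files so the check can be rerun years later, as in the opening scenario.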
Data Quality 
• Fitness for use (depends on the context of your research questions)
• Data quality is the most important aspect of data management
• Ensured by:
  • Sufficient resources and expertise
  • Paying close attention to the design of data collection instruments
  • Creating appropriate entry, validation, and reporting processes
  • Ongoing QC processes
  • Understanding the data collected
Chapman, 2005
Source: Dept. of Biostatistics – Data Management, IUSM
Data Quality Standards 
• Check data for its logical consistency.
• Check data for reasonableness.
• Ensure adherence to sound estimation methodologies.
• Ensure adherence to monetary submission standards for stolen and recovered property.
• Ensure that other statistical edit functions are processed within established parameters.
FBI: http://www.fbi.gov/about-us/cjis/ucr/data_quality_guidelines
Source: Dept. of Biostatistics – Data Management, IUSM
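The first two standards (logical consistency and reasonableness) can be written as explicit rules that run on every record. This is a minimal sketch assuming each record is a Python dictionary with hypothetical fields `age`, `birth_date`, `visit_date`, `weight_kg`, `height_m`, and `bmi`; the plausible ranges and the 0.5 kg/m² BMI tolerance are illustrative, not values from the FBI guidelines or the IUSM materials.

```python
from datetime import date


def years_between(start: date, end: date) -> int:
    """Whole years elapsed from start to end (completed birthdays)."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def consistency_errors(rec: dict) -> list:
    """Cross-field checks: do the values in one record agree with each other?"""
    errors = []
    if rec["birth_date"] > rec["visit_date"]:
        errors.append("birth_date is after visit_date")
    elif years_between(rec["birth_date"], rec["visit_date"]) != rec["age"]:
        errors.append("age does not match birth_date and visit_date")
    computed_bmi = rec["weight_kg"] / rec["height_m"] ** 2
    if abs(computed_bmi - rec["bmi"]) > 0.5:  # illustrative tolerance (hypothetical)
        errors.append("bmi does not match weight_kg / height_m^2")
    return errors


# Illustrative plausibility limits (hypothetical; each study sets its own).
REASONABLE_RANGES = {"age": (0, 120), "bmi": (10, 80), "height_m": (0.5, 2.5)}


def reasonableness_errors(rec: dict) -> list:
    """Single-field checks: is each value within a plausible range?"""
    errors = []
    for field, (low, high) in REASONABLE_RANGES.items():
        if not low <= rec[field] <= high:
            errors.append(f"{field}={rec[field]} outside [{low}, {high}]")
    return errors
```

Keeping consistency checks (do fields agree with each other?) separate from reasonableness checks (is one value plausible on its own?) makes error reports easier to act on.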
Discussion: Defining Data Quality 
Define data quality standards for the following variables: 
• Age, BMI
• Life satisfaction scale
• Number of close friends
• Blood draw, bone fossil, water sample
• Satellite image, photograph
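For the numeric variables in the first three bullets, one way to write a standard down is a small specification per variable: unit, type, hard limits that reject a value, and an optional soft limit that only flags it for review. Every name and number below (for example, a 1 to 7 life satisfaction scale, or flagging more than 50 close friends) is a hypothetical choice for discussion; specimens and images need different kinds of standards that this sketch does not cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VariableStandard:
    """A hypothetical quality standard for one numeric variable."""
    name: str
    unit: str
    integer_only: bool
    hard_min: float                   # values below this are rejected
    hard_max: float                   # values above this are rejected
    soft_max: Optional[float] = None  # values above this are flagged for review


# Hypothetical standards for the discussion variables.
STANDARDS = {
    "age": VariableStandard("age", "years", True, 0, 120),
    "bmi": VariableStandard("bmi", "kg/m^2", False, 10, 80, soft_max=60),
    "life_satisfaction": VariableStandard("life_satisfaction", "scale points (1-7)", True, 1, 7),
    "close_friends": VariableStandard("close_friends", "count", True, 0, 1000, soft_max=50),
}


def check_value(variable: str, value: float) -> str:
    """Return 'ok', 'review', or 'reject' for one value against its standard."""
    std = STANDARDS[variable]
    if std.integer_only and value != int(value):
        return "reject"
    if not std.hard_min <= value <= std.hard_max:
        return "reject"
    if std.soft_max is not None and value > std.soft_max:
        return "review"
    return "ok"
```

The distinction between reject and review matters for variables like number of close friends, where a large value is unusual but not impossible.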
Defining QA/QC 
• Strategies for preventing errors from entering a dataset
• Activities to ensure quality of data before collection
• Activities that involve monitoring and maintaining the quality of data during the study
QA/QC Before Collection 
• Define & enforce standards
  • Formats
  • Codes
  • Measurement units
  • Metadata
• Assign responsibility for data quality
  • Make sure the assigned person is trained in QA/QC
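Defining and enforcing format, code, and unit standards can be as simple as checking every incoming value against a written codebook at entry time. A minimal sketch, assuming a hypothetical record with `visit_date`, `sex`, `weight`, and `weight_unit` fields, a made-up sex codebook, and kilograms as the study's standard unit for weight:

```python
from datetime import date

# Hypothetical codebook: every allowed code and its meaning is written down once.
SEX_CODES = {1: "female", 2: "male", 9: "not reported"}

# Conversion factors to the study's (hypothetical) standard unit for weight: kilograms.
TO_KG = {"kg": 1.0, "lb": 0.45359237}


def standardize(raw: dict) -> dict:
    """Enforce format, code, and unit standards on one raw record.

    Raises ValueError on anything that does not meet the standard, so bad
    values are caught at entry instead of being discovered during analysis.
    """
    # Format: ISO 8601 date; strings like "09/15/2014" raise ValueError.
    visit = date.fromisoformat(raw["visit_date"])
    sex = int(raw["sex"])
    if sex not in SEX_CODES:
        raise ValueError(f"sex code {sex} is not in the codebook")
    unit = raw["weight_unit"]
    if unit not in TO_KG:
        raise ValueError(f"unknown weight unit {unit!r}")
    weight_kg = round(float(raw["weight"]) * TO_KG[unit], 1)
    return {"visit_date": visit, "sex": sex, "weight_kg": weight_kg}
```

Rejecting nonconforming values at entry keeps the correction close to the person who recorded the data, which is usually the cheapest place to fix it.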
Quality Assurance v. Control 
• QA: set of processes, procedures, and activities that are initiated prior to data collection to ensure the expected level of quality will be reached and data integrity will be maintained.
• QC: a system for verifying and maintaining a desired level of quality in a product or service.
http://c2.com/cgi/wiki?QualityAssuranceIsNotQualityControl
Quality Assurance in Practice 
• CRF (data collection instrument) review & validation
• System/process testing & validation
• Team training, education, and communication
• Standard Operating Procedures, Standard Operating Guidelines
• Site audits
Source: Dept. of Biostatistics – Data Management, IUSM
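System and process testing can start small: before an instrument goes live, run each entry rule against test inputs whose correct outcome is known in advance, including boundary values and typical keying mistakes. The 18 to 65 age rule below is a hypothetical inclusion criterion, and `age_is_valid` and `run_rule_tests` are illustrative names, not part of REDCap or any other data capture system.

```python
from typing import Callable, List, Tuple


# Hypothetical entry rule for one CRF field: age in whole years, 18-65 inclusive.
def age_is_valid(value: str) -> bool:
    """Accept only whole numbers between 18 and 65 inclusive."""
    cleaned = value.strip()
    if not cleaned.isdecimal():
        return False
    return 18 <= int(cleaned) <= 65


# Test cases written before data collection: (input, expected result).
AGE_TEST_CASES: List[Tuple[str, bool]] = [
    ("18", True),    # lower boundary
    ("65", True),    # upper boundary
    ("17", False),   # just below range
    ("66", False),   # just above range
    ("", False),     # blank entry
    ("4O", False),   # letter O typed instead of zero
    ("-20", False),  # negative number
]


def run_rule_tests(rule: Callable[[str], bool], cases) -> list:
    """Return the cases where the rule's result differs from the expected result."""
    return [(value, expected) for value, expected in cases if rule(value) != expected]
```

An empty result means every test case behaved as expected. An off-by-one rule such as `18 < int(v)` would be caught immediately, because the `("18", True)` case would be returned as a failure.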
Quality Control in Practice 
• Set of processes, procedures, and activities associated with monitoring, detection, and action during and after data collection.
• Examples of problems QC detects and acts on:
  • Errors in individual data fields
  • Systematic errors
  • Violation of protocol
  • Staff performance issues
  • Fraud or scientific misconduct
Source: Dept. of Biostatistics – Data Management, IUSM
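One way to tell some of these problems apart is to look at where errors cluster. A single failing value usually points to an error in an individual field, while a failure rate concentrated at one site or with one data entry person can suggest a systematic error, a protocol deviation, or a training need. The sketch assumes records are dictionaries with a hypothetical `site` field; the 10% threshold is illustrative.

```python
from collections import Counter


def error_rates_by_group(records, is_error, group_field="site"):
    """Share of records failing a check, per group (e.g., site or data entry staff)."""
    totals = Counter()
    failures = Counter()
    for rec in records:
        group = rec[group_field]
        totals[group] += 1
        if is_error(rec):
            failures[group] += 1
    return {group: failures[group] / totals[group] for group in totals}


def flag_groups(rates, threshold=0.10):
    """Groups whose error rate exceeds the (hypothetical) threshold, highest first."""
    flagged = [group for group, rate in rates.items() if rate > threshold]
    return sorted(flagged, key=lambda group: rates[group], reverse=True)
```

A flagged group is a prompt for follow-up (retraining, a site audit, a protocol review), not evidence of misconduct on its own.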
General themes: GCDMP 
• Plan, test, revise, test, revise, test…implement
• All stakeholders should be involved in designing protocol, data collection tools, data management plan, etc.
• Document, document, document
• Rule: the bigger and more complex the study (sites, data, people), the more planning you need
Relevant practices from GCDMP 
• Specify documents required for reproducible research at various levels
  • Institutional: SOP
  • Study: protocol, manual of procedures, data management plan, statistical analysis plan
• Documentation serves practical purposes, such as building a shared understanding of the project, and benefits the team immediately
• Specify roles and responsibilities from the beginning
Begin with the end in mind 
Produce report-ready outputs 
Collect data in a way that enables efficient data entry, processing, validation, analysis, and reporting
Enabled by standardized data collection tools
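Beginning with the end in mind can be checked mechanically: write down the variables each planned report table needs, then confirm the data collection form either collects each one or collects what is needed to derive it. All table names, fields, and the derivation of BMI from weight and height are hypothetical examples:

```python
# Hypothetical report specification, written before any data are collected:
# each table in the final report lists the variables it needs.
REPORT_TABLES = {
    "Table 1: Participant characteristics": ["age", "sex", "bmi"],
    "Table 2: Primary outcome": ["life_satisfaction", "visit_date"],
}

# Fields captured by the (hypothetical) standardized data collection form.
COLLECTION_FORM_FIELDS = {"age", "sex", "weight_kg", "height_m", "life_satisfaction"}

# Fields derived during processing, mapped to the collected fields they require.
DERIVED_FIELDS = {"bmi": {"weight_kg", "height_m"}}


def missing_for_report(tables, form_fields, derived):
    """List, per report table, the variables the form neither collects nor can derive."""
    gaps = {}
    for table, variables in tables.items():
        missing = [
            v for v in variables
            if v not in form_fields
            and not (v in derived and derived[v] <= form_fields)  # subset check
        ]
        if missing:
            gaps[table] = missing
    return gaps
```

With these example inputs the check reports that `visit_date` is needed for Table 2 but never collected, a gap that is cheap to fix before data collection and expensive to fix after.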
Mapping research & data outcomes 
• Review the instructions
• Review the example (on screen)
• Discussion
Resources 
1. Department of Biostatistics – Data Management Team, Indiana University School of Medicine (2013). Data Management including REDCap. (Provided via email.)
2. Chapman, A. D. (2005). Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. ISBN 87-92020-03-8. http://www.gbif.org/resources/2829
3. DataONE. DataONE Education Module: Data Quality Control and Assurance. http://www.dataone.org/sites/all/documents/L05_DataQualityControlAssurance.pptx
4. Good Clinical Data Management Practices (2013). http://www.scdm.org/sitecore/content/be-bruga/scdm/Publications/gcdmp.aspx
