DS Methodology Data Requirements

The document outlines the DS methodology for data requirements and collection, emphasizing the importance of identifying necessary data content, formats, and sources for analytical approaches. It describes the process of data collection, including assessing data quality and filling gaps, as demonstrated in a healthcare case study for congestive heart failure. Collaboration among data scientists, DBAs, and programmers is highlighted as essential for data extraction, merging, and cleaning to prepare for further analysis.

Uploaded by

phnstrial

DS Methodology: Data Requirements and Collection

Data Requirements
If the problem is the 'recipe' and data is the 'ingredient,' the data scientist must identify the
required ingredients (data), work out how to source or collect them, understand them, and
prepare them for the desired outcome. Once the problem and analytical approach are understood,
the data scientist defines the data requirements before data collection and preparation. For
decision tree classification, this means identifying the data content, formats, and sources. In
the healthcare case study, criteria were established to select a cohort of patients with
congestive heart failure.

The criteria were that a patient was admitted within the provider's service area, had a
primary diagnosis of congestive heart failure, and had been continuously enrolled for at
least six months before the primary admission. The cohort excluded patients with other
significant medical conditions to avoid skewing the results.
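
The cohort criteria above can be sketched as a simple filter. This is a minimal illustration, not the case study's actual implementation: the Admission fields, the "CHF" diagnosis label, and the use of a single enrolled_since date to stand in for continuous enrollment are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Admission:
    # All field names are hypothetical; the source does not describe a schema.
    patient_id: str
    in_service_area: bool
    primary_diagnosis: str
    admit_date: date
    enrolled_since: date  # assumed start of an unbroken enrollment period
    has_excluding_condition: bool


def months_between(start, end):
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def in_cohort(a, min_months=6):
    """Apply the four cohort criteria to one admission."""
    return (
        a.in_service_area
        and a.primary_diagnosis == "CHF"
        and months_between(a.enrolled_since, a.admit_date) >= min_months
        and not a.has_excluding_condition
    )


admissions = [
    Admission("p1", True, "CHF", date(2020, 7, 1), date(2019, 1, 1), False),
    Admission("p2", True, "CHF", date(2020, 7, 1), date(2020, 3, 1), False),  # enrolled only 4 months
    Admission("p3", False, "CHF", date(2020, 7, 1), date(2019, 1, 1), False),  # outside service area
    Admission("p4", True, "CHF", date(2020, 7, 1), date(2019, 1, 1), True),  # excluded condition
]

cohort_ids = [a.patient_id for a in admissions if in_cohort(a)]
print(cohort_ids)
```

Writing each criterion as a separate condition keeps the cohort definition easy to adjust if the data requirements change later.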

The data content needed for decision tree modeling was a complete clinical history, including
admissions, diagnoses, procedures, prescriptions, and services. To build one record per
patient, the data scientists rolled up the transactional records and created new variables,
work that required anticipating the data preparation stage.
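
As a rough sketch of what rolling up transactional records into one record per patient can look like, the example below counts events by type and derives one new variable. The event types, field names, and the n_distinct_codes variable are hypothetical choices for illustration; the source does not say which variables the team created.

```python
from collections import defaultdict

# Hypothetical event types; the source lists admissions, diagnoses,
# procedures, prescriptions, and services without naming columns.
EVENT_TYPES = ("admission", "procedure", "prescription")

transactions = [
    {"patient_id": "p1", "type": "admission", "code": "CHF"},
    {"patient_id": "p1", "type": "prescription", "code": "drug_a"},
    {"patient_id": "p1", "type": "admission", "code": "CHF"},
    {"patient_id": "p2", "type": "procedure", "code": "echo"},
]


def roll_up(rows):
    """Collapse many transactional rows into one summary record per patient."""
    records = {}
    codes = defaultdict(set)
    for row in rows:
        pid = row["patient_id"]
        # Start every patient with a zero count for each event type.
        record = records.setdefault(pid, {"n_" + e: 0 for e in EVENT_TYPES})
        record["n_" + row["type"]] += 1
        codes[pid].add(row["code"])
    for pid, record in records.items():
        # Derived variable: number of distinct codes seen for the patient.
        record["n_distinct_codes"] = len(codes[pid])
    return records


patient_records = roll_up(transactions)
print(patient_records["p1"])
```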

Data Collection
After the initial data collection, data scientists assess whether the collected data meets their
needs. Sometimes data is more difficult to obtain or costs more than expected, requiring
adjustments to the data requirements.

In the data collection stage, descriptive statistics and visualization techniques help assess
data content and quality and provide initial insights. Data gaps are identified, and decisions
are made on how to fill or substitute missing information.
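
A minimal sketch of this kind of assessment, assuming records arrive as dictionaries with None marking a missing value. The field names, sample values, and the 50% threshold for flagging a gap are illustrative assumptions, not figures from the case study.

```python
from statistics import mean, median

# Hypothetical sample; None marks a value that could not be collected.
records = [
    {"age": 71, "length_of_stay": 4, "drug_code": None},
    {"age": 64, "length_of_stay": None, "drug_code": None},
    {"age": 80, "length_of_stay": 6, "drug_code": "drug_a"},
    {"age": 58, "length_of_stay": 3, "drug_code": None},
]


def profile(rows, field):
    """Summarize one field: how much is missing, plus basic numeric statistics."""
    values = [r[field] for r in rows]
    present = [v for v in values if v is not None]
    summary = {
        "count": len(present),
        "missing_fraction": 1 - len(present) / len(values),
    }
    if present and all(isinstance(v, (int, float)) for v in present):
        summary.update(
            min=min(present),
            max=max(present),
            mean=mean(present),
            median=median(present),
        )
    return summary


fields = ["age", "length_of_stay", "drug_code"]
# Flag fields where more than half of the values are missing (assumed threshold).
gaps = [f for f in fields if profile(records, f)["missing_fraction"] > 0.5]
print(profile(records, "age"))
print(gaps)
```

A field flagged this way is a candidate for one of the decisions described above: collect more data, substitute another source, or proceed without it.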

In the case study, data was collected from various sources, including demographic, clinical,
and coverage information, as well as claims and pharmaceutical data. Some data, like drug
information, was not available initially, but the team was able to build a good model without
it. The team could later revisit missing data if needed.

Data scientists, DBAs, and programmers often collaborate to extract, merge, and clean data
from different sources, preparing it for the next stage (data understanding). Automating
data processes can improve efficiency.
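
The extract, merge, and clean steps might look roughly like the sketch below, which joins a demographic source to a claims source on a patient identifier. The source layouts, the normalize_id cleaning rule, and the total_claims variable are assumptions for illustration only.

```python
# Two hypothetical sources keyed by patient identifier.
demographics = {
    "p1": {"age": 71, "sex": "F"},
    "p2": {"age": 64, "sex": "M"},
}
claims = [
    {"patient_id": "P1 ", "amount": 1200.0},  # inconsistent case and whitespace
    {"patient_id": "p1", "amount": 300.0},
    {"patient_id": "p3", "amount": 50.0},  # no matching demographic record
]


def normalize_id(raw):
    """Clean an identifier so the same patient matches across sources."""
    return raw.strip().lower()


def merge_sources(demo, claim_rows):
    """Attach total claim amounts to each demographic record.

    Returns the merged records and the claim IDs that matched no patient.
    """
    totals = {}
    for row in claim_rows:
        pid = normalize_id(row["patient_id"])
        totals[pid] = totals.get(pid, 0.0) + row["amount"]
    merged = {
        pid: {**info, "total_claims": totals.get(pid, 0.0)}
        for pid, info in demo.items()
    }
    unmatched = sorted(set(totals) - set(demo))
    return merged, unmatched


merged, unmatched = merge_sources(demographics, claims)
print(merged)
print(unmatched)
```

Reporting unmatched identifiers rather than silently dropping them makes gaps between sources visible before the data understanding stage.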

Summary: Data Requirements and Collection
Tasks in the Data Requirements stage include identifying the necessary data content, data
formats, and data sources for the chosen analytical approach.

During the Data Collection stage, data scientists revise the data requirements where needed
and decide whether the quantity and quality of the collected data are sufficient.

Data scientists apply descriptive statistics and visualization techniques to assess the content
and quality of the collected data and to gain initial insights, identify gaps, and determine
whether to collect new data or substitute existing data.
