Using Electronically Available Inpatient Hospital Data for Research

Mandar Apte, M.B.B.S., Ms.P.H.1, Matthew Neidell, Ph.D.2, E. Yoko Furuya, M.D., M.S.3,4, David Caplan, B.S.5, Sherry Glied, Ph.D.6,*,
Elaine Larson, R.N., Ph.D.7

Abstract
Despite a push to create electronic health records and a plethora of healthcare data from disparate sources, there are no data from a single electronic source that provide a full picture of a patient’s hospital course. This paper describes a process to utilize electronically available inpatient hospital data for research. We linked several different sources of extracted data, including clinical, procedural, administrative, and accounting data, using patients’ medical record numbers to compile a cohesive, comprehensive account of patient encounters. Challenges encountered included (1) interacting with distinct administrative units to locate data elements; (2) finding a secure, central location to house the data; (3) appropriately defining health measures of interest; (4) obtaining and linking these data to create a usable format for conducting research; and (5) dealing with missing data. Although the resulting data set is incredibly rich and likely to prove useful for a wide range of clinical and comparative effectiveness research questions, there are multiple challenges associated with linking hospital data to improve the quality of patient care. Clin Trans Sci 2011; Volume 4: 338–345
Keywords: surveillance, electronic data, ICD9 codes.

1School of Nursing, Columbia University, New York, New York, USA; 2Associate Professor, Department of Health Policy and Management, Mailman School of Public Health, Columbia University, New York, New York, USA; 3Department of Medicine, Division of Infectious Diseases, Columbia University, New York, New York, USA; 4NewYork-Presbyterian Hospital, Columbia University Medical Center, New York, New York, USA; 5Information Services Division—Business Solutions Group, New York-Presbyterian Hospital, Columbia University Medical Center, USA; 6Department of Health Policy and Management, Mailman School of Public Health, Columbia University, New York, New York, USA; 7School of Nursing and Mailman School of Public Health, Columbia University, New York, New York, USA
*Dr. Glied is currently on leave at the Department of Health and Human Services (HHS), where she is Assistant Secretary for Planning and Evaluation. Her contributions to this paper were made prior to her appointment at HHS, and the paper does not reflect the official views of HHS.
Correspondence: E Larson ([email protected])
DOI: 10.1111/j.1752-8062.2011.00353.x
338 VOLUME 4 • ISSUE 5 WWW.CTSJOURNAL.COM
Apte et al. Using Electronic Inpatient Hospital Data

Introduction
For the past decade there has been a national commitment to enhance health information technology and develop electronic health records. These efforts are intended to monitor and improve evidence-based practice and quality of care and to secure patient information in a highly mobile environment. For example, the HITECH provisions of the American Recovery and Reinvestment Act of 2009 (Public Law 111–5) included $20 billion in spending to spur the adoption of electronic health records. Hospitals across the country have developed and/or adopted electronic methods to collect and store data, but electronic databases have often been designed for a specific purpose or department such as laboratory, radiology, pharmacy, patient tracking, clinician orders, central supply, or billing. Hence, many hospitals have a number of such databases that function well for one purpose but are unlinked and do not “speak to each other.” As a result, with few exceptions, such as the healthcare facilities of the Department of Veterans Affairs (http://www.ehealth.va.gov/VistA.asp), there may be a plethora of healthcare data available regarding therapies provided, test results, and costs of care, but often no single electronic source that provides a full picture of a patient’s hospital course(s).

In general, the United States has been slow to adopt electronic health records.1,2 As of 2005 only 5% of hospitals used computerized physician order entry,3 and even fewer had unified electronic health records. Hence, the current potential for using data to conduct comparative effectiveness research and to monitor and improve the quality of patient care is limited, and little is known about how these data can be used. In an ongoing NIH-funded study of healthcare-associated infections and predictors and costs of antimicrobial resistance among patients in a large hospital system (Distribution of the Costs of Antimicrobial Resistant Infections, 5R01NR10822), we found that relevant data were not readily available from a single source. A major limitation of commonly available data such as ICD-9-CM codes is that they identify only health end points that are relevant for billing purposes. In addition, several studies have shown that ICD-9-CM representations of clinical events such as infections are inadequate for clinical research because they do not match well with clinical definitions.4 Faced with this situation, we identified relevant data sources and developed algorithms to collate data from a variety of electronic sources. The purpose of this paper is to describe the process we used to combine various sources of electronically available inpatient hospital data for health services research.

Methods

Sample and setting
Data were extracted from various electronic databases from four sites in a large healthcare system in metropolitan New York City: the New York-Presbyterian Hospital (NYPH) System. NYPH is the largest hospital system in the largest metropolitan region in the United States and includes a community hospital, a pediatric hospital, and two tertiary/quaternary care hospitals that provide care to a diverse range of patients. Although the database was developed to study healthcare-associated infections, and hence this paper disproportionately focuses on these outcomes, the approach is generalizable to a wide range of clinical research topics.

Data extraction

Clinical Data Warehouse (CDW).
The four hospital sites share a CDW that enables hospital or university personnel engaged in either clinical research or activities related to hospital treatment, payment, or operations to perform analytic queries on clinical data across patients. The Warehouse integrates data from over 20 clinical electronic sources and organizes the data by subject. We extracted the following data elements from the CDW: (1) laboratory results, including microbiologic results from blood, urine, and respiratory
cultures, all cultures taken from possible surgical sites, and urine microscopy results; (2) patient location, including hospital unit, room, and bed occupied for each day of the hospital stay, as well as the patient’s home address; and (3) detailed accounts of medications administered and procedures performed, including use of central venous (CV) catheters.

Operating room data.
Data on procedures performed in the operating room were obtained from the perioperative services of each institution. Data included the date and time of entry into the operating room, commencement of and recovery from anesthesia, time of incision and closure, procedure descriptions, and type of anesthesia used.

Administrative data.
Administrative data from the admission, discharge, transfer (ADT), billing, and coding and abstraction systems included admission and discharge dates; ICD-9-CM principal and secondary diagnosis and procedure codes, with associated codes for diagnoses present on admission; and admission sources and discharge destinations.

Cost accounting data.
Financial information for each discharge was obtained from the cost accounting system, including total charges and insurance/payer information. In addition, details for each item charged to the patient’s stay were collected, including date of service, charge amount, and UB-92 revenue codes (maintained by the National Uniform Billing Committee), which identify specific accommodations.

Data from the electronic health record system.
Data on urinary catheter output were obtained through mediated queries to flowsheets in the physician and nursing order entry system (Eclipsys XA, http://www.allscripts.com/).

Linking data.
Patient information was linked across the multiple data sets using the unique account number associated with each hospital admission, where available. For data without account numbers, source records were matched to the correct hospital stay using the unique medical record number and the date/time stamps associated with the source data. Once the data sets were linked and processed, they were de-identified by replacing account numbers and medical record numbers with unique identification numbers.

Algorithms for identifying infections
To study the cost of antimicrobial resistant infections, infection outcomes needed to be defined across multiple domains and axes: the type of infection, the date an infection occurred, the causative organism, and its antimicrobial susceptibility pattern. Our team of clinicians and researchers developed electronic algorithms to identify hospital stays with any of four types of infection: blood stream infection, urinary tract infection, pneumonia, and surgical site infection. We used the surveillance definitions for healthcare-associated infections from the Centers for Disease Control and Prevention National Healthcare Safety Network (NHSN, http://www.cdc.gov/nhsn/about.html)5–7 as a starting point to identify elements of these definitions that could be mapped to available electronic data. Using a combination of microbiologic results, urine microscopy results, and ICD-9-CM diagnosis codes, we identified patients as having an infection (cases), not having an infection (controls), or not clearly categorizable (noncase, noncontrol). We separately identified cases for organisms of interest (those often associated with multidrug resistance) and for any organism.8 The Appendix provides a detailed description of the algorithms used.

Variables constructed
Using the data sources described earlier, we coded categories of variables for the final data set. A data dictionary describing all variables is available upon request from the authors. A limited set of patient demographics was also collected, namely age and zip code of residence, which could be used to link neighborhood-level characteristics from external data sets such as the decennial census of housing and population. Admission and discharge variables included the date of admission, length of hospital stay, whether the patient died in the hospital, several variants of diagnosis related groups (DRGs), and measures of risk of mortality and severity of illness based on output from 3M’s grouper software, which uses a proprietary algorithm to assign an APR-DRG to each discharge.9

Several measures of the health status of the patient were collected, including prior hospitalizations, diabetes, chronic dermatitis, trauma, burns, and history of substance abuse. ICD-9-CM diagnosis codes for conditions present on admission were used to calculate a weighted Charlson score as a measure of patients’ health status at admission.10 Several measures of procedure-based risk factors were collected, including the use of medications, CV catheterization, urinary catheterization, mechanical ventilation, cardiac catheterization, catheter angiography, vascular stenting, dialysis, surgical procedure, general anesthesia, intubation, and ICU stay. Each of these variables included both the start and end dates of the procedure. We also coded patients in whom an infection occurred, including details on the responsible organism, its antibiotic susceptibility pattern, and when the infection occurred. Financial variables collected included the total charges for the encounter and total payments received, along with information on the source of payment and a daily itemization of charges.

Given that some events varied throughout the course of a patient’s hospital stay (e.g., presence of a urinary catheter) while others were fixed throughout the stay (e.g., malignancy, diabetes), we created both time-varying and time-invariant variables. To allow for the construction of time-varying variables, the unit of analysis was the patient-day, so each patient encounter contributed one observation for each day of his or her hospital stay. This data set construction is analogous to the structure often used for discrete-time survival models, making it possible to model risk factors for infections.

Imputation
The rollout of the electronic health record (Eclipsys; H/P Technologies, Phoenix, AZ, USA) was staggered across the four hospitals during the time period of our analysis. Because this system was primarily used in our data set to record the use of CV catheters, urinary catheters, and the administration of medication, these observations were frequently missing for earlier years. Because this pattern of “missingness” was due solely to the introduction of the new system, we imputed these variables to maintain a full sample.

We used two imputation procedures. To identify whether or not one of the three events (CV catheterization, urinary
catheterization, and the administration of medication) had occurred for a patient, we used multiple imputation by chained equations11 using logistic regression with all other available variables in the data set as predictors for the three events. Once these three variables were imputed, we needed to impute the day each event started and its duration. Because start and end dates must be restricted to fall within a patient’s hospital stay (i.e., we could not predict a CV catheter to be inserted on a patient’s tenth day if the patient stayed in the hospital for only 9 days) and the distributions of start day and duration are skewed, we performed hot-deck imputation, which replaces data for the missing observations (“recipients”) with data from nonmissing observations in the same sample that have similar characteristics (“donors”).12 The “recipients” consisted of patients whose CV catheter, urinary catheter, and use of medications had just been imputed in the first step. The “donors” consisted of patients who had one of these events and who had the same length of stay as the recipient and a similar predicted probability of start day and duration. This predicted probability was obtained by estimating separate count models for start day and duration, using all other variables in the data set as predictors and only the sample in which one of the events had occurred. Similar predicted probability was defined by grouping the predicted start day and duration into deciles.

Data extraction, manipulation, and analysis were conducted using TOAD for DB2 version 3.1.1 (Quest Software, Aliso Viejo, CA, USA) and SAS version 9.1.3 (SAS Institute, Cary, NC, USA); Stata version 10.1 (Stata Corp, College Station, TX, USA) was used for imputation.

Results
Table 1 displays the summary of discharges for each hospital separately by year for all inpatient discharges from 2006 to 2008. Nearly 320,000 discharges (319,945) occurred during this time period, with relatively stable annual volumes at each hospital over the 3-year period; the largest change was an increase at the pediatric hospital. Given the different target populations, there were considerably more discharges at the two tertiary care hospitals.

Hospital/year  2006      2007      2008      Total
Community      13,706    13,515    13,570    40,791
Pediatric      16,551    18,375    19,260    54,186
Tertiary1      41,524    41,586    40,724    123,834
Tertiary2      33,547    33,926    33,661    101,134
Total          105,328   107,402   107,215   319,945
Table 1. Summary of discharges by hospital and year.

Tables 2 and 3 display the number of discharges in which patients were identified as being infected according to our algorithms, separately by site, organism, and hospital. Consistent with the number of discharges across hospitals, there were more infections at the tertiary care hospitals. Table 4 displays the summary statistics of a subset of variables in the final data set.

Hospital    BSI      UTI      PNU      SSI
Community   937      3,285    256      80
Pediatric   1,145    1,163    176      137
Tertiary1   3,024    7,728    1,101    835
Tertiary2   3,241    8,241    1,706    705
Total       8,347    20,417   3,239    1,757
Notes: BSI, blood stream infection; UTI, urinary tract infection; PNU, pneumonia; SSI, surgical site infection.
Table 2. Number of infections by any organism and hospital.

Site/organism  Acinetobacter  Enterococcus      Klebsiella  Pseudomonas  Staphylococcus  Streptococcus  Total
               baumannii      faecalis/faecium  pneumoniae  aeruginosa   aureus          pneumoniae
BSI            118            780               598         161          1,180           175            3,012
UTI            204            2,878             2,520       1,112        580             4              7,298
PNU            157            176               425         585          1,103           125            2,571
SSI            31             327               3           124          462             1              948
Table 3. Number of infections caused by one of six organisms of interest.

Variable/hospital                                                 Community  Pediatric  Tertiary1  Tertiary2  Total
Age (years)                                                       44.66      16.73      44.47      59         44.39
% Male                                                            37.41      39.09      44.23      50.77      44.56
Health status and procedures
Length of stay (days)                                             5.01       5.52       6.43       6.52       6.12
% Diabetes mellitus                                               17.87      2.45       12.51      24.89      15.40
% Malignancy                                                      3.16       6.48       13.60      14.37      11.31
% Chronic dermatitis                                              5.97       1.50       3.89       4.75       4.02
% Renal failure                                                   11.43      1.43       9.57       18.79      11.34
% Substance abuse                                                 7.76       0.65       2.34       6.80       4.15
% History of transplant                                           0.11       1.24       1.57       2.74       1.70
% Prior hospitalization                                           30.23      24.48      26.09      39.50      30.58
% History of stay at skilled nursing facility                     7.07       0.54       1.94       1.91       2.35
% Central venous catheter                                         2.16       6.34       7.22       9.00       6.99
Days with central venous catheter                                 6.52       13.09      10.56      9.01       10.16
% Urinary catheterization                                         26.55      31.40      39.69      39.25      36.47
Days with urinary catheterization                                 4.21       3.49       5.45       6.07       5.26
% ICU stay                                                        4.80       13.77      26.95      11.60      17.04
Number of days in ICU                                             0.24       1.79       1.69       0.82       1.25
% Mechanical ventilation                                          2.17       4.07       3.48       3.57       3.44
% Dialysis                                                        1.28       0.22       2.22       3.60       2.20
% Biopsy                                                          0.38       1.31       1.37       2.47       1.58
% Operating room procedure                                        9.81       21.08      23.76      28.86      23.14
% Endotracheal intubation                                         1.75       3.23       2.51       3.15       2.74
% General anesthesia                                              6.69       19.81      20.35      22.75      19.28
% Major operating room procedure (>30 min)                        8.34       20.63      21.58      27.65      21.65
% Major organ transplant                                          0.00       0.27       0.54       1.42       0.70
% Cardiac catheterization/angiography/angioplasty/vascular stent  0.20       2.21       10.61      19.23      10.6
% Feeding tube insertion                                          0.46       0.57       0.78       1.24       0.85
Table 4. Summary statistics for select variables.

As one way of assessing the validity of our imputation, we compared the distribution of nonmissing observations to the distribution of missing (imputed) observations. Figures 1 and 2 display histograms for CV catheter, with Figure 1 showing the results for imputing the first day of insertion and Figure 2 showing the results for the duration of insertion (results are comparable for urinary catheter and medication administration). In both figures, the white bars represent cases where CV catheter data were complete (observed) and the dark bars represent cases in which CV catheter data were imputed. These figures demonstrate that our imputation procedure was generally effective in replicating the distribution of these variables.

Discussion
Although a fully integrated database is essential for comparative effectiveness and outcomes research, the initial development phase of this project posed a number of challenges and required considerable time. In fact, the process required almost 2 years of work by a team including a clinician, an economist, an epidemiologist, and an experienced programmer and statistician. The major challenges we encountered, discussed below, included identifying and obtaining permission for access to data sources, limitations regarding extraction of text-based data, and technical issues regarding merging various systems across institutions.

In many healthcare systems, departments or service lines often operate independently; it is thus not surprising that silos or fiefdoms develop to facilitate getting work accomplished efficiently. Considerable effort was required in this project to first identify, within each department and across settings, the “proprietor” or steward/manager of specific data sources and then to work with them to obtain the necessary permissions to access and use the data. There were no specific protocols or guidelines in place to clarify how this should be done, and in some cases it was
difficult to determine who actually had the right to grant access to data for anyone outside their specific area. Over a period of months, we had multiple conversations with various individuals to develop our own list of individuals with the authority to grant access. Because multiple and varying electronic data collection systems had been purchased or internally developed by many individual departments or divisions, this was one of the most time-consuming tasks we encountered. To facilitate future efforts to consolidate databases, we recommend that healthcare systems begin to identify the various sources of clinical, administrative, and financial data and develop policies and procedures to access and use the data.

Natural language processing (NLP) algorithms have been used in a number of clinical applications to extract useful information for research.13–15 In this study, we considered using NLP algorithms to extract data from text-based records such as nursing notes and radiology reports. Although this was feasible, we chose not to pursue NLP for several reasons. First, a huge investment in additional time and resources would have been necessary, and we did not see sufficient added value to make the cost worth the effort. Second, even when NLP algorithms are established, they are often not sufficiently sensitive to ensure efficient and accurate retrieval of usable information.15 Most importantly, our first priority was to create a system that was potentially generalizable across institutions in which the required NLP expertise might not be available. Although our study is limited by the fact that we do not have data extracted from text notations, this is also an advantage in terms of generalizability and sustainability.

Finally, and not surprisingly, we encountered technical issues regarding merging various software and data formats across institutions. Although the four hospitals in this study were part of a single large hospital system, the institutions varied with regard to the electronic record systems used. In fact, during the study period, one of the hospitals changed electronic medical records systems and, as noted in Methods, some data elements were not available for the entire study period at all sites, necessitating the application of imputation methods. Such technical problems require considerable programming expertise.
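To give a sense of the programming involved, the linking rule described under Linking data in Methods can be sketched in a few lines of Python: a source record is attached to a stay by its account number when one is present, and otherwise by medical record number plus a date/time stamp that falls inside the admission window; identifiers are then replaced with sequential study numbers. This is a minimal, hypothetical sketch rather than the authors' implementation (which used TOAD for DB2, SAS, and Stata); the `Admission` and `SourceRecord` layouts, their field names, and both helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Admission:
    """Hypothetical ADT record: one row per hospital stay."""
    account: str
    mrn: str
    admit: datetime
    discharge: datetime


@dataclass
class SourceRecord:
    """Hypothetical departmental record (lab result, OR event, charge, ...)."""
    mrn: str
    timestamp: datetime
    value: str
    account: Optional[str] = None  # some sources carry no account number


def find_admission(rec: SourceRecord, admissions: list) -> Optional[Admission]:
    """Return the stay a source record belongs to, or None if it cannot be placed."""
    if rec.account is not None:
        # Preferred rule: the account number identifies the stay directly.
        return next((a for a in admissions if a.account == rec.account), None)
    # Fallback rule: same MRN and a time stamp inside the admission window.
    # If windows overlap, the first matching stay is returned.
    return next(
        (a for a in admissions
         if a.mrn == rec.mrn and a.admit <= rec.timestamp <= a.discharge),
        None,
    )


def link_and_deidentify(records: list, admissions: list) -> list:
    """Attach records to stays, then replace MRNs and account numbers with study IDs."""
    patient_ids: dict = {}
    stay_ids: dict = {}
    linked = []
    for rec in records:
        adm = find_admission(rec, admissions)
        if adm is None:
            continue  # unmatched records are simply skipped in this sketch
        # Sequential study IDs; neither the MRN nor the account number is kept.
        pid = patient_ids.setdefault(adm.mrn, len(patient_ids) + 1)
        sid = stay_ids.setdefault(adm.account, len(stay_ids) + 1)
        linked.append({"patient_id": pid, "stay_id": sid,
                       "timestamp": rec.timestamp, "value": rec.value})
    return linked


if __name__ == "__main__":
    adms = [Admission("A1", "M9", datetime(2007, 3, 1), datetime(2007, 3, 5))]
    recs = [
        SourceRecord("M9", datetime(2007, 3, 2, 8), "urine culture", account="A1"),
        SourceRecord("M9", datetime(2007, 3, 4, 14), "OR entry"),   # matched by MRN + time
        SourceRecord("M9", datetime(2007, 4, 1), "outpatient lab"),  # outside the stay, skipped
    ]
    for row in link_and_deidentify(recs, adms):
        print(row)
```

Trying the account number first mirrors the order given in Methods. A production version would also need explicit rules for records that match no stay or more than one stay, which this sketch handles only by skipping or taking the first match.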

Figure 1. Result of imputation for first day of central venous catheter.
Figure 2. Result of imputation for duration of central venous catheter.

Clearly, the extensive resources required to overcome such challenges are not justifiable if the database remains static for a short period of time, because the data will quickly become outdated and less relevant for research or quality monitoring. Hence, we are now in the process of incorporating the database into the institution’s Clinical Data Warehouse as a datamart and setting up automatic feeds to update the data on a continuous, ongoing basis. To date, the database has been used to examine clinical problems related to infections, such as identifying risk factors for multidrug-resistant infections, examining the relationship between short bowel syndrome and incidence of bloodstream infection, and correlating measures of glucose control with risk of surgical site infection in diabetics and nondiabetics. Additional data elements can be added to the database for investigators seeking to test other specific hypotheses. We plan to widely disseminate information regarding the availability of these data to investigators within and outside the study institutions.

Although the algorithms developed to identify infections were specific to the focus of our grant on healthcare-acquired infections, the general process described above is generalizable to a wide range of settings and studies, such as studies of the impact of various therapies or interventions on patient outcomes or of changes in trends over time. Many of the general procedures discussed above reflect the necessary “first step” for obtaining and merging data; the “second step” would require the development of algorithms specific to a particular study to improve the measurement of health outcomes. This paper simply highlights one such algorithm, though the range of outcomes that could potentially be studied extends far beyond this setting.

Conclusion
Given that it is not always possible to design randomized clinical trials to understand the impact of various clinical interventions, researchers must often instead rely on retrospectively collected data from various sources. In analyses using retrospective data, it becomes more important to account for the full range of experiences patients encounter in the healthcare system. Detailed information on these encounters is often recorded electronically, but these data are typically stored in distinct databases, thus limiting researchers’ ability to compile a cohesive, comprehensive account of patient encounters.

In this paper, we have described the steps we have taken to compile such a database from a major hospital system in New York City as part of a larger study to examine the impact of antimicrobial-resistant infections on the costs to society. Several obstacles were encountered in this process that are likely to be common across other settings, including: (1) interacting with distinct administrative units to locate data elements; (2) finding a secure, central location to house the data; (3) appropriately defining health measures of interest; (4) obtaining and linking these data to create a usable format for conducting research; and (5) dealing with missing data. Although some of the steps we have taken to address these issues are context-specific, these steps are likely to serve as a general guideline for creating such data sets in other large healthcare systems.

The resulting data set is an incredibly rich one that is likely to prove useful for a wide range of clinical research questions. Looking ahead, a major focus is maintaining the sustainability of these data to ensure they can be regularly updated to include additional years of data as they become available.


Acknowledgments
The study was funded by NIH/NINR Grant R01 NR010822, Distribution of the Costs of Antimicrobial Resistant Infections. We gratefully acknowledge the administrative support of Bevin Cohen and the statistical expertise of Jennifer Hill and Haomiao Jia.

References
1. Balfour DC 3rd, Evans S, Januska J, Lee HY, Lewis SJ, Nolan SR, Noga M, Stemple C, Thapar K. Health information technology—results of a roundtable. J Manag Care Pharm. 2009; 15(1 Suppl. A): 10–17.
2. Poon EG, Jha AK, Christino M, Honour MM, Fernandopulle R, Middleton B, Newhouse J, Leape L, Bates DW, Blumenthal D, et al. Assessing the level of health information technology in the United States: a snapshot. BMC Med Informat Decis Mak. 2006; 6: 1.
3. Jha AK, Ferris TG, Donelan K, DesRoches C, Shields A, Rosenbaum S, Blumenthal D. How common are electronic health records in the United States? A summary of the evidence. Health Aff (Millwood). 2006; 25: w496–w507.
4. Sherman ER, Heydon KH, St. John KH, Teszner E, Rettig SL, Alexander SK, Zaoutis TZ, Coffin SE. Administrative data fail to accurately identify cases of healthcare-associated infection. Infect Contr Hosp Epidemiol. 2006; 27: 332–337.
5. Stevenson KB, Khan Y, Dickman J, Gillenwater T, Kulich P, Taylor D, Santangelo J, Lundy J, Jarjoura D. Administrative coding data, compared with CDC/NHSN criteria, are poor indicators of health care-associated infections. Am J Infect Control. 2008; 36: 155–164.
6. Horan TC, Andrus M, Dudeck MA. CDC/NHSN surveillance definition of health care-associated infection and criteria for specific types of infections in the acute care setting. Am J Infect Control. 2008; 36: 309–332.
7. National Healthcare Safety Network (NHSN) [Internet]. Atlanta: Centers for Disease Control and Prevention. Available from: http://www.cdc.gov/nhsn/library.html (accessed on November 1, 2010).
8. Landers T, Apte M, Hyman S, Furuya Y, Glied S, Larson E. A comparison of methods to detect urinary tract infections using electronic data. Jt Comm J Qual Patient Saf. 2010; 36: 411–417.
9. All Patient Refined Diagnosis Related Groups (APR-DRGs). Version 20.0. Available from: http://www.hcup-us.ahrq.gov/db/nation/nis/APR-DRGsV20MethodologyOverviewandBibliography.pdf. Wallingford, CT: 3M Health Information Systems; 2003. Accessed September 29, 2011.
10. Charlson ME, Pompei P, Ales KL, MacKenzie CR. A new method of classifying prognostic comorbidity in longitudinal studies: development and validation. J Chronic Dis. 1987; 40: 373–383.
11. Van Buuren S, Brand J, Groothuis-Oudshoorn CD, Rubin D. Fully conditional specification in multivariate imputation. J Stat Comput Simul. 2006; 76: 1049–1064.
12. Allison PD. Missing Data (Quantitative Applications in the Social Sciences). Thousand Oaks, CA: Sage, 2002: 27–72.
13. Xu H, Jiang M, Oetjens M, Bowton EA, Ramirez AH, Jeff JM, Basford MA, Pulley JM, Cowan JD, Wang X, et al. Facilitating pharmacogenetic studies using electronic health records and natural language processing: a case study of warfarin. J Am Med Inform Assoc. 2011; 18: 387–391.
14. Womack JA, Scotch M, Gibert C, Chapman W, Yin M, Justice AC, Brandt C. A comparison of two approaches to text processing: facilitating chart reviews of radiology reports in electronic medical records. Perspect Health Inf Manag. 2010; 7: 1a.
15. Mendonca EA, Haas J, Shagina L, Larson E, Friedman C. Extracting information on pneumonia in infants using natural language processing of radiology reports. J Biomed Inform. 2005; 38: 314–321.


Appendix: Algorithms used to define infection outcomes

Blood stream infections

Case
  Organism of interest¹: Positive blood culture for an organism of interest AND no positive culture with the same organism at other body site(s) within 14 days prior to the positive blood culture
  Any organism: Positive blood culture with any organism² AND no positive culture with the same organism at other body site(s) within 14 days prior to the positive blood culture
Control
  Organism of interest¹: No positive blood culture for any organism
  Any organism: No positive blood culture for any organism OR only one culture with a common skin contaminant within a 2-day period
Noncase, noncontrol
  Organism of interest¹: Positive blood culture with an organism NOT of interest OR ICD-9-CM code for sepsis and no/negative blood culture OR positive culture with the same organism at other body site(s) within 14 days prior to a positive blood culture
  Any organism: ICD-9-CM code for sepsis and no/negative blood culture OR positive culture with the same organism at other body site(s) within 14 days prior to a positive blood culture
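The blood stream infection algorithm lends itself to a small rule-based classifier. The Python sketch below is illustrative only and is not the authors' implementation. It assumes a hypothetical `Culture` record (body site, organism name with "" for a negative culture, draw time in days since admission) and a boolean flag for an ICD-9-CM sepsis code. Where table rows overlap (for example, no positive culture but a sepsis code), it checks Case first, then Noncase/noncontrol, then Control. That precedence, reading "separate occasions" in note 2 as different draw times, and applying the skin-contaminant rule only in the any-organism column are this sketch's assumptions.

```python
from dataclasses import dataclass

# Organisms of interest (note 1 of the appendix).
ORGANISMS_OF_INTEREST = {
    "Staphylococcus aureus", "Klebsiella pneumoniae", "Enterococcus faecalis",
    "Enterococcus faecium", "Pseudomonas aeruginosa", "Acinetobacter baumannii",
    "Streptococcus pneumoniae",
}

# Common skin contaminants (note 2); these label strings are hypothetical.
# "Bacillus spp." here is meant to exclude B. anthracis.
SKIN_CONTAMINANTS = {
    "Corynebacterium spp.", "Bacillus spp.", "Propionibacterium spp.",
    "coagulase-negative staphylococci", "viridans group streptococci",
    "Aerococcus spp.", "Micrococcus spp.",
}


@dataclass(frozen=True)
class Culture:
    """Hypothetical culture record."""
    site: str       # e.g. "blood", "urine", "wound"
    organism: str   # "" for a negative culture
    day: float      # days since admission when drawn


def _counts_as_positive(bc, raw_positives):
    """Note 2 (assumed reading): a skin contaminant counts only if the same
    organism grows from another blood draw at a different time within 2 days."""
    if bc.organism not in SKIN_CONTAMINANTS:
        return True
    return any(o.organism == bc.organism and o.day != bc.day
               and abs(o.day - bc.day) <= 2 for o in raw_positives)


def classify_bsi(cultures, has_sepsis_code, any_organism=False):
    """Return "case", "control" or "noncase_noncontrol" for one encounter."""
    raw = [c for c in cultures if c.site == "blood" and c.organism]
    # Assumption: the contaminant rule applies only in the any-organism column.
    positives = ([c for c in raw if _counts_as_positive(c, raw)]
                 if any_organism else raw)

    def seeded_elsewhere(bc):
        # Same organism grown at another body site in the 14 days before the blood culture.
        return any(o.site != "blood" and o.organism == bc.organism
                   and 0 <= bc.day - o.day <= 14 for o in cultures)

    eligible = [bc for bc in positives
                if any_organism or bc.organism in ORGANISMS_OF_INTEREST]
    # Assumed precedence where rows overlap: case, then noncase, then control.
    if any(not seeded_elsewhere(bc) for bc in eligible):
        return "case"
    if positives or has_sepsis_code:
        return "noncase_noncontrol"
    return "control"
```

For example, an S. aureus blood culture on day 5 alone is classified as a case, but if S. aureus was also grown from urine on day 2 the encounter becomes noncase/noncontrol, since the blood culture may be secondary to the urinary source.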

Urinary tract infections

Case
  Organism of interest¹: Positive urine culture with an organism of interest at ≥10⁵ colony-forming units per mL of urine and no more than one other species of microorganism OR positive urine culture with an organism of interest at 10³–10⁵ colony-forming units per mL of urine and no more than one other species of microorganism, and pyuria (≥3 white blood cells per high-power field on urine microscopy) within ±48 hours of the positive culture
  Any organism: Positive urine culture with any organism at ≥10⁵ colony-forming units per mL of urine and no more than one other species of microorganism OR positive urine culture with any organism at 10³–10⁵ colony-forming units per mL of urine and no more than one other species of microorganism, and pyuria (≥3 white blood cells per high-power field on urine microscopy) within ±48 hours of the positive culture
Control
  Organism of interest¹: No positive urine culture with any organism AND no physician diagnosis of a urinary tract infection (ICD-9-CM coding)
  Any organism: No positive urine culture with any organism AND no physician diagnosis of a urinary tract infection (ICD-9-CM coding)
Noncase, noncontrol
  Organism of interest¹: Positive urine culture with an organism not of interest OR ICD-9-CM code for UTI and no positive urine culture with any organism
  Any organism: ICD-9-CM code for UTI and no positive urine culture with any organism
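The urinary tract infection thresholds can be checked mechanically as well. This is a hedged Python sketch, not the authors' code. It assumes a hypothetical `UrineCulture` record mapping each organism to its colony-forming units per mL, urinalysis results given as (day, white blood cells per high-power field) pairs, and a boolean flag for an ICD-9-CM UTI code. Treating any reported growth as a "positive urine culture", placing exactly 10⁵ CFU/mL in the upper band, and sending growth that misses both case thresholds to noncase/noncontrol (a situation the table does not cover explicitly) are assumptions.

```python
from dataclasses import dataclass, field

# Organisms of interest (note 1 of the appendix).
ORGANISMS_OF_INTEREST = {
    "Staphylococcus aureus", "Klebsiella pneumoniae", "Enterococcus faecalis",
    "Enterococcus faecium", "Pseudomonas aeruginosa", "Acinetobacter baumannii",
    "Streptococcus pneumoniae",
}


@dataclass
class UrineCulture:
    """Hypothetical urine culture: day drawn and organism -> CFU per mL of urine."""
    day: float  # days since admission
    cfu_per_ml: dict = field(default_factory=dict)


def _meets_case_definition(culture, urinalyses, qualifies):
    """urinalyses: list of (day, white blood cells per high-power field)."""
    if len(culture.cfu_per_ml) > 2:
        return False  # more than one other species of microorganism
    # Pyuria: >= 3 WBC per high-power field within +/-48 hours (2 days) of the culture.
    pyuria = any(abs(day - culture.day) <= 2 and wbc >= 3 for day, wbc in urinalyses)
    for organism, cfu in culture.cfu_per_ml.items():
        if not qualifies(organism):
            continue
        if cfu >= 1e5 or (1e3 <= cfu < 1e5 and pyuria):
            return True
    return False


def classify_uti(cultures, urinalyses, has_uti_code, any_organism=False):
    """Return "case", "control" or "noncase_noncontrol" for one encounter."""
    def qualifies(organism):
        return any_organism or organism in ORGANISMS_OF_INTEREST

    if any(_meets_case_definition(c, urinalyses, qualifies) for c in cultures):
        return "case"
    # Assumption: any reported growth counts as a positive urine culture.
    any_growth = any(c.cfu_per_ml for c in cultures)
    if not any_growth and not has_uti_code:
        return "control"
    # Organism not of interest, UTI code without a positive culture, or
    # (assumed) growth below the case thresholds.
    return "noncase_noncontrol"
```

A culture growing E. faecalis at 5 × 10⁴ CFU/mL on day 3 counts as a case only if a urinalysis within two days shows pyuria; without it, the sketch returns noncase/noncontrol.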

Surgical site infection

Case
  Organism of interest¹: Any NHSN³ operative procedure (per ICD-9-CM procedure code) performed AND positive wound culture for an organism of interest within 30 days of the NHSN procedure
  Any organism: Any NHSN operative procedure (per ICD-9-CM procedure code) performed AND positive wound culture for any organism within 30 days of the NHSN procedure
Control
  Organism of interest¹: NHSN operative procedure performed (per ICD-9-CM code) AND no wound culture performed
  Any organism: NHSN operative procedure performed (per ICD-9-CM code) AND no wound culture performed
Noncase, noncontrol
  Organism of interest¹: No NHSN operative procedure performed OR NHSN operative procedure performed followed by a negative wound culture within 30 days OR NHSN operative procedure performed followed by a positive wound culture with an organism other than an organism of interest OR NHSN operative procedure performed and no wound culture performed, but the encounter has an ICD-9-CM code for postoperative infection
  Any organism: No NHSN operative procedure performed OR NHSN operative procedure performed followed by a negative wound culture within 30 days OR NHSN operative procedure performed and no wound culture performed, but the encounter has an ICD-9-CM code for postoperative infection
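The surgical site infection rules can be sketched the same way. This Python example is a hedged illustration, not the authors' implementation. The hypothetical `nhsn_codes` argument stands in for the mapping from ICD-9-CM procedure codes to NHSN operative procedures, which is not reproduced here and must be supplied by the caller; `WoundCulture` and `classify_ssi` are likewise hypothetical names. It also assumes that only wound cultures drawn 0 to 30 days after an NHSN procedure count, so "no wound culture performed" is read as "none in that window".

```python
from dataclasses import dataclass

# Organisms of interest (note 1 of the appendix).
ORGANISMS_OF_INTEREST = {
    "Staphylococcus aureus", "Klebsiella pneumoniae", "Enterococcus faecalis",
    "Enterococcus faecium", "Pseudomonas aeruginosa", "Acinetobacter baumannii",
    "Streptococcus pneumoniae",
}


@dataclass(frozen=True)
class WoundCulture:
    """Hypothetical wound culture record."""
    day: float     # days since admission when drawn
    organism: str  # "" for a negative culture


def classify_ssi(procedures, wound_cultures, nhsn_codes,
                 has_postop_infection_code, any_organism=False):
    """Return "case", "control" or "noncase_noncontrol" for one encounter.

    procedures: list of (ICD-9-CM procedure code, day performed).
    nhsn_codes: caller-supplied set of codes treated as NHSN operative procedures.
    """
    nhsn_days = [day for code, day in procedures if code in nhsn_codes]
    if not nhsn_days:
        return "noncase_noncontrol"  # no NHSN operative procedure
    # Assumption: only cultures drawn 0-30 days after an NHSN procedure count.
    in_window = [c for c in wound_cultures
                 if any(0 <= c.day - d <= 30 for d in nhsn_days)]
    if any(c.organism and (any_organism or c.organism in ORGANISMS_OF_INTEREST)
           for c in in_window):
        return "case"
    if not in_window:
        # No wound culture: control unless a postoperative infection code is present.
        return "noncase_noncontrol" if has_postop_infection_code else "control"
    # Negative wound culture, or growth of an organism that is not of interest.
    return "noncase_noncontrol"
```

Passing the code set as a parameter keeps the classifier independent of any particular version of the NHSN procedure list.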


Pneumonia

Case
  Organism of interest¹: ICD-9-CM coding for pneumonia (includes all bacterial PNU codes) AND positive respiratory culture with an organism of interest
  Any organism: ICD-9-CM coding for pneumonia (includes all PNU codes) AND positive respiratory culture with any organism
Control
  Organism of interest¹: No ICD-9-CM code for pneumonia AND no respiratory culture performed or a negative respiratory culture
  Any organism: No ICD-9-CM code for pneumonia AND no respiratory culture performed or a negative respiratory culture
Noncase, noncontrol
  Organism of interest¹: ICD-9-CM code for pneumonia and positive respiratory culture for an organism NOT of interest OR ICD-9-CM code for pneumonia and no positive respiratory culture performed or negative respiratory culture OR no ICD-9-CM code for bacterial pneumonia and positive respiratory culture for any organism OR no ICD-9-CM code for bacterial pneumonia and positive urine streptococcal antigen
  Any organism: ICD-9-CM code for pneumonia and no positive respiratory culture performed OR negative respiratory culture OR no ICD-9-CM code for pneumonia and positive respiratory culture for any organism OR no ICD-9-CM code for pneumonia and positive urine streptococcal antigen
Notes:
¹ Organisms of interest are Staphylococcus aureus, Klebsiella pneumoniae, Enterococcus faecalis, Enterococcus faecium, Pseudomonas aeruginosa, Acinetobacter baumannii, and Streptococcus pneumoniae.
² A common skin contaminant must be cultured from two or more blood cultures drawn on separate occasions within 2 days of each other to count as a positive culture. Common skin contaminants in blood culture include diphtheroids (Corynebacterium spp.), Bacillus spp. (not B. anthracis), Propionibacterium spp., coagulase-negative staphylococci (including S. epidermidis), viridans group streptococci, Aerococcus spp., and Micrococcus spp.
³ NHSN, National Healthcare Safety Network.
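The pneumonia algorithm, combined with the organism list in note 1, can also be expressed as a short classifier. The Python sketch below is illustrative and is not the authors' code. It assumes respiratory culture results arrive as a list of organism names ("" for a negative culture, an empty list when no culture was performed) and that the caller has already derived two hypothetical flags from ICD-9-CM coding: any pneumonia (PNU) code, and a bacterial pneumonia code (which should imply the first). Sending combinations the table does not list, such as a non-bacterial pneumonia code with an organism of interest, to noncase/noncontrol is an assumption.

```python
# Organisms of interest (note 1 of the appendix).
ORGANISMS_OF_INTEREST = {
    "Staphylococcus aureus", "Klebsiella pneumoniae", "Enterococcus faecalis",
    "Enterococcus faecium", "Pseudomonas aeruginosa", "Acinetobacter baumannii",
    "Streptococcus pneumoniae",
}


def classify_pneumonia(resp_organisms, has_pnu_code, has_bacterial_pnu_code,
                       urine_strep_antigen_positive, any_organism=False):
    """Return "case", "control" or "noncase_noncontrol" for one encounter.

    resp_organisms: organism names from respiratory cultures ("" = negative).
    has_pnu_code / has_bacterial_pnu_code: hypothetical caller-derived flags.
    """
    # The organism-of-interest column ties the case definition to bacterial codes.
    case_code = has_pnu_code if any_organism else has_bacterial_pnu_code
    positives = [org for org in resp_organisms if org]
    qualifying = [org for org in positives
                  if any_organism or org in ORGANISMS_OF_INTEREST]
    if case_code and qualifying:
        return "case"
    if has_pnu_code:
        # Coded pneumonia without a qualifying culture (none, negative, or not of interest).
        return "noncase_noncontrol"
    if positives or urine_strep_antigen_positive:
        # Positive respiratory culture or urinary antigen without a pneumonia code.
        return "noncase_noncontrol"
    return "control"
```

Note that a positive urinary streptococcal antigen alone is enough to exclude an encounter from the control group, which is why the control branch is checked last.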

