Development and Validation of the Quick COVID-19 Severity Index: A Prognostic Tool for Early Clinical Decompensation
Study objective: The goal of this study is to create a predictive, interpretable model of early hospital respiratory failure among
emergency department (ED) patients admitted with coronavirus disease 2019 (COVID-19).
Methods: This was an observational, retrospective, cohort study from a 9-ED health system of admitted adult patients with severe
acute respiratory syndrome coronavirus 2 (COVID-19) and an oxygen requirement less than or equal to 6 L/min. We sought to
predict respiratory failure within 24 hours of admission as defined by oxygen requirement of greater than 10 L/min by low-flow
device, high-flow device, noninvasive or invasive ventilation, or death. Predictive models were compared with the Elixhauser
Comorbidity Index, quick Sequential [Sepsis-related] Organ Failure Assessment, and the CURB-65 pneumonia severity score.
Results: During the study period, from March 1 to April 27, 2020, 1,792 patients were admitted with COVID-19, 620 (35%) of
whom had respiratory failure in the ED. Of the remaining 1,172 admitted patients, 144 (12.3%) met the composite endpoint
within the first 24 hours of hospitalization. On the independent test cohort, both a novel bedside scoring system, the quick
COVID-19 Severity Index (area under receiver operating characteristic curve mean 0.81 [95% confidence interval {CI} 0.73 to 0.89]), and
a machine-learning model, the COVID-19 Severity Index (mean 0.76 [95% CI 0.65 to 0.86]), outperformed the Elixhauser mortality
index (mean 0.61 [95% CI 0.51 to 0.70]), CURB-65 (0.50 [95% CI 0.40 to 0.60]), and quick Sequential [Sepsis-related] Organ
Failure Assessment (0.59 [95% CI 0.50 to 0.68]). A low quick COVID-19 Severity Index score was associated with a less than 5%
risk of respiratory decompensation in the validation cohort.
Conclusion: A significant proportion of admitted COVID-19 patients progress to respiratory failure within 24 hours of admission.
These events are accurately predicted with bedside respiratory examination findings within a simple scoring system. [Ann Emerg
Med. 2020;76:442-453.]
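The discrimination figures in the abstract are areas under the ROC curve with 95% confidence intervals, which the Methods describe as percentiles of a bootstrap over the test set. A minimal sketch of that kind of calculation follows, in plain Python. The function names, the pairwise AUC formulation, the number of resamples, and the toy data are illustrative assumptions, not the authors' code.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Mean AUC and percentile interval from resampling with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample holds one class only
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(stats) / n_boot, lower, upper


# Hypothetical test-set outcomes (1 = decompensation) and risk scores.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
toy_scores = [0, 2, 3, 1, 5, 4, 9, 7, 2, 12]
mean_auc, ci_low, ci_high = bootstrap_auc_ci(toy_labels, toy_scores)
```

With only ten toy patients the interval is wide; the point of the sketch is the procedure, not the numbers.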
Copyright © 2020 by the American College of Emergency Physicians.
https://doi.org/10.1016/j.annemergmed.2020.07.022
Editor’s Capsule Summary
What is already known on this topic
Patients with coronavirus disease 2019 (COVID-19) can experience respiratory deterioration after hospital admission.
What question this study addressed
Data were extracted retrospectively for 1,172 COVID-19 patients admitted to the hospital from 9 emergency departments to identify factors that may predict deterioration requiring oxygen greater than 10 L/min, noninvasive ventilation, or intubation; or that leads to death within 24 hours of hospital admission.
What this study adds to our knowledge
A model (quick COVID-19 Severity Index) with 3 variables (respiratory rate, pulse oximetry, and oxygen flow rate) outperformed other models, including the quick Sequential [Sepsis-related] Organ Failure Assessment and CURB-65, on an independent validation cohort.
How this might change clinical practice
The quick COVID-19 Severity Index model may be useful to assist level-of-care decisions for admitted COVID-19 patients. It is not known how well it performs relative to physician gestalt.
[...] of greater than or equal to 50% of lungs.9,10 Critical COVID-19 exists on a spectrum with severe illness and involves organ failure, often leading to prolonged mechanical ventilation.9 In a large cohort of COVID-19 patients, severe and critical illness represented almost 20% of the studied population.10 In most institutions, dispositions for patients with critical respiratory failure (eg, those receiving ventilation or with nonrebreather masks) are largely apparent and determined by system protocols and capacity. Rapid progression from severe to critical illness, however, is a common problem and presents a prognostic challenge for ED providers determining admissions.
For this reason, we focus on patients for whom critical respiratory illness is not universally apparent in the ED; namely, those requiring nasal cannula with oxygen less than or equal to 6 L/min. In our health system, 6 L/min is typically the maximum flow rate delivered by nasal cannula. Greater than 90% of patients receiving oxygen at less than 6 L/min are admitted to the floors, but of those, greater than 10% were observed to have increased oxygen requirements within 24 hours. Conversely, among these patients admitted to higher levels of care, approximately 70% did not progress above nasal cannula oxygen at 6 L/min. These data suggest potential to improve our ability to risk stratify ED patients before admission.
Goals of This Investigation
The objective of this study was to derive a risk-stratification tool to predict 24-hour respiratory decompensation in admitted patients with COVID-19. Here, we expand on previous efforts describing the course of critical COVID-19 illness in 3 ways. First, we focused on ED prognostication by studying patient outcomes within 24 hours of admission, using data available during the first 4 hours of presentation.11 Although critical illness often occurs later in hospitalization, the relevance of these later events to ED providers is less clear. We emphasize oxygen requirements and mortality rather than ICU placement because we have observed the latter to have highly variable criteria, depending on total patient census.12 Second, to aid health care providers in assessing illness severity in COVID-19 patients, we present predictive models of early respiratory failure during hospitalization and compare them with 3 benchmarks accessible with data in the electronic health record: the Elixhauser Comorbidity Index,13 the quick Sequential [Sepsis-related] Organ Failure Assessment (qSOFA),14,15 and the CURB-65 pneumonia severity score.16 Although many clinical risk models exist, these benefit from wide clinical acceptability and relative model parsimony because they require minimal input data for calculation. The Elixhauser Comorbidity Index was derived to enable prediction of hospital death with administrative data.13 The qSOFA score was included in SEPSIS-3 guidelines and can be scored at the bedside because it includes respiratory rate, mental status, and systolic blood pressure.14 The CURB-65 pneumonia severity score has been well validated for hospital disposition, but its utility in both critical illness and COVID-19 is unclear.16,17 Third, we made the quick COVID-19 Severity Index, a prognostic tool, available to the public through a Web interface.
MATERIALS AND METHODS
Study Design and Setting
This was a retrospective observational cohort study to develop a prognostic model of early respiratory decompensation in patients admitted from the emergency department (ED) with COVID-19. The health care system is composed of a mix of suburban community (n=6),
urban community (n=2), and urban academic (n=1) EDs. Data from 8 EDs were used in the derivation and cross validation of the predictive model, whereas data from the last urban community site were withheld for independent validation. We adhered to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis checklist (Appendix E1, available online at http://www.annemergmed.com).18 This study was approved by our local institutional review board.
Data Collection and Processing
Patient demographics, summarized medical histories, vital signs, outpatient medications, chest radiograph reports, and laboratory results available during the ED encounter were extracted from our local Observational Medical Outcomes Partnership data repository and analyzed within our computational health platform.19 Data were collected into a research cohort with custom scripts in PySpark (version 2.4.5) that were reviewed by an independent analyst.
Nonphysiologic values likely related to data entry errors for vital signs were converted to missing values based on expert-guided rules (Appendix E1, available online at http://www.annemergmed.com [Table S1]). Laboratory values at minimum or maximum thresholds and encoded with “<” or “>” were converted to the numeric threshold value, and other nonnumeric values were dropped. Medical histories were generated by using diagnoses before the date of admission to exclude potential future information in modeling. Outpatient medications were mapped to the First DataBank Enhanced Therapeutic Classification System.20 Radiograph reports were manually reviewed by 2 physicians and categorized as “no opacity,” “unilateral opacity,” or “bilateral opacities.” One hundred radiograph reports were reviewed by both physicians to determine interrater agreement with weighted κ. Oxygen devices were similarly extracted from the Observational Medical Outcomes Partnership (Appendix E1, available online at http://www.annemergmed.com [Table S2]).
We defined critical respiratory illness in the setting of COVID-19 as any COVID-19 patient meeting one of the following criteria: oxygenation flow rate greater than or equal to 10 L/min, high-flow oxygenation, noninvasive ventilation, invasive ventilation, or death (Appendix E1, available online at http://www.annemergmed.com [Table S2]). We did not include ICU admission in our composite outcome because at the start of the COVID-19 pandemic, ICU admissions were protocolized to include even minimal oxygen requirements. A subset of outcomes was manually reviewed by physician members of the institutional computational health care team as part of a systemwide process to standardize outcomes for COVID-19–related research.
Data included visits from March 1, 2020, through April 27, 2020, because our institution’s first COVID-19 tests were ordered after March 1, 2020. This study included admitted COVID-19–positive patients as determined by test results ordered between 14 days before and up to 24 hours after hospital presentation. We included delayed testing because institutional guidelines initially restricted testing within the hospital to inpatient wards. Testing for COVID-19 was performed at local or reference laboratories by nucleic acid detection methods using oropharyngeal or nasopharyngeal swabs, or a combination oropharyngeal/nasopharyngeal swab. We excluded patients younger than 18 years and those who required oxygen at more than 6 L/min or otherwise met our critical illness criteria at any point within 4 hours of presentation. The latter was intended to exclude patients for whom critical illness was nearly immediately apparent to the medical provider and for whom a prediction would not be helpful. Patients who explicitly opted out of research were excluded from analysis (n<5). Data were extracted greater than 24 hours after the last included patient visit so that all outcomes could be extracted from the electronic health record.
We generated comparator models using the Elixhauser Comorbidity Index, qSOFA, and CURB-65 (Appendix E1, available online at http://www.annemergmed.com). International Statistical Classification of Diseases and Related Health Problems, 10th Revision codes from patient medical histories were mapped to Elixhauser comorbidities and indices with H-CUP Software and Tools (hcuppy package; version 0.0.7).21,22 qSOFA was calculated as the sum of the following findings, each of which was worth 1 point: Glasgow Coma Scale score less than 15, respiratory rate greater than or equal to 22 breaths/min, and systolic blood pressure less than or equal to 100 mm Hg. CURB-65 was calculated as the sum of the following findings, each of which was worth 1 point: Glasgow Coma Scale score less than 15, blood urea nitrogen level greater than 19 mg/dL, respiratory rate greater than or equal to 30 breaths/min, systolic blood pressure less than 90 mm Hg or diastolic blood pressure less than or equal to 60 mm Hg, and age 65 years or older. Baseline models were evaluated on the training and internal validation cohort, using logistic regression on the calculated scores.
Samples from 8 hospitals were used in model generation and internal validation with the remaining large, urban community hospital serving as an independent site for validation. All models were fit on patient demographic and clinical data collected during the first 4 hours of patient
presentation, and predictions are made with the most recently available data at the 4-hour point unless otherwise noted. We used an ensemble technique to identify and rank potentially important predictive variables based on their occurrence across multiple selection methods: univariate regression, random forest, logistic regression with LASSO, χ2 testing, gradient-boosting information gain, and gradient-boosting Shapley additive explanation (SHAP) interaction values (Appendix E1, available online at http://www.annemergmed.com).23-25 We counted the co-occurrences of the top 30, 40, and 50 variables of each of the methods before selecting features for a minimal scoring model (quick COVID-19 Severity Index) and machine-learning model (COVID-19 Severity Index) using gradient boosting. For the quick COVID-19 Severity Index, we used a point system guided by logistic regression (Appendix E1, available online at http://www.annemergmed.com). The gradient-boosting COVID-19 Severity Index model was fit with the XGBoost package and hyperparameters were set with Bayesian optimization with a tree-structured Parzen estimator (Appendix E1, available online at http://www.annemergmed.com).26,27 All analyses were performed in Python (version 3.8.2).
We report summary statistics of model performance in predicting the composite outcome between 4 and 24 hours of hospital arrival. We used bootstrapped logistic regression with 10-fold cross validation to generate performance benchmarks for the Elixhauser, qSOFA, CURB-65, and quick COVID-19 Severity Index models and bootstrapped gradient boosting with 10-fold cross validation for the COVID-19 Severity Index model. Where necessary, data were imputed with training set median values of bootstraps. We report area under the receiver operating characteristic (ROC), accuracy, sensitivity and specificity at Youden’s index, area under the precision-recall curve,28 Brier score, F1 score, and average precision (Appendix E1, available online at http://www.annemergmed.com). Similarly, to evaluate model performance on the independent validation cohort, means and confidence intervals were calculated from bootstrap iterations of the test set, using sampling with replacement. We report 95% confidence intervals derived from the percentiles of the bootstrapped distribution or Welch’s 2-sample t test for statistical comparisons of model performance.29
RESULTS
Characteristics of Study Subjects
Between March 1, 2020, and April 27, 2020, there were a total of 1,792 admissions for COVID-19 patients meeting our age criteria. Of these, 620 patients (35%) were excluded by meeting critical respiratory illness endpoints within 4 hours of presentation. Of the included patients, 144 (12.3%) had respiratory decompensation within the first 24 hours of hospitalization: 101 (8.6%) requiring oxygen flow at greater than 10 L/min, 112 (9.6%) with high-flow device support (Appendix E1, available online at http://www.annemergmed.com [Table S2]), 4 (0.3%) receiving noninvasive ventilation, 10 (0.8%) with invasive ventilation, and 1 (0.01%) who died. Fifty-nine patients (5%) were admitted to the ICU within the 4- to 24-hour period. Population characteristics including demographics and comorbidities for the study are shown in Table 1. Study patient flow is shown in Figure 1 and patient characteristics for the development and validation populations are shown in Appendix E1 (available online at http://www.annemergmed.com [Tables S3 to S4]).
Our full data set included 713 patient variables available during the first 4 hours of the patient encounters (Appendix E1, available online at http://www.annemergmed.com [Table S5]). These included demographics, vital signs, laboratory values, comorbidities, chief complaints, outpatient medications, tobacco use histories, and radiographs. Radiologist-evaluated radiographs were classified into 3 categories, with strong interrater agreement (κ=0.81). Associations between radiographic findings and outcomes are shown in Appendix E1 (available online at http://www.annemergmed.com [Table S6]). We preferentially selected variables available at bedside for derivation of the quick COVID-19 Severity Index. Our ensemble approach identified 3 bedside variables as consistently important across the variable selection models: nasal cannula requirement, minimum recorded pulse oximetry, and respiratory rate (Appendix E1, available online at http://www.annemergmed.com [Figure S1]). These 3 features appeared in at least 5 of the 6 variable selection methods.
We divided each of these 3 clinical variables into value ranges according to clinical experience and used logistic regression to derive weights for the quick COVID-19 Severity Index scoring system (Table 2). Normal physiology was used as the baseline category, and the logistic regression odds ratios were offset to assign normal clinical parameters zero points in the quick COVID-19 Severity Index (Appendix E1, available online at http://www.annemergmed.com). The quick COVID-19 Severity Index score ranges from 0 to 12.
We identified an additional 12 features from the predictive factor analysis for use in a machine-learning model (COVID-19 Severity Index) with gradient boosting (Table 2 and Appendix E1, available online at http://www.annemergmed.com [Figure S1]). These variables were
selected by balancing the goals of model parsimony, minimizing highly correlated features (ie, various summaries of vital signs), and predictive performance. We used SHAP methods to understand the importance of various clinical variables in the COVID-19 Severity Index (Figure 2).25,30-32 SHAP values are an extension of the game-theoretic Shapley values that seek to describe variable effects on model output, defined as the contribution of a specific variable to the prediction itself.30 The key advantage of the related SHAP values is that they add [...] boosting, which otherwise provide opaque outputs. SHAP values are dimensionless and represent the log odds of the marginal contribution a variable makes on a single prediction. In the case of our gradient-boosting COVID-19 Severity Index model, we used an [...] contributions.33
The rank order of average absolute SHAP values across all variables in a model suggests the most important variables in assigning modeled risk. For the COVID-19 Severity Index, these were flow rate by nasal cannula, [...]
Table 3. Performance characteristics for the COVID-19 Severity Index, quick COVID-19 Severity Index, and comparison models on independent validation. [Table values omitted.]
*The COVID-19 Severity Index area under the ROC was statistically greater than qSOFA and Elixhauser after testing with Welch’s t test.29,46
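Table 3 reports, among other metrics, sensitivity and specificity at Youden's index and the Brier score. The sketch below shows one way these quantities can be computed from outcomes and model outputs, using the standard definitions. The helper names and the toy inputs are assumptions made for illustration and are not taken from the study's code.

```python
def confusion_at(labels, scores, threshold):
    """Counts (tp, fp, tn, fn) when score >= threshold is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp, fp, tn, fn


def youden_threshold(labels, scores):
    """Threshold maximizing J = sensitivity + specificity - 1.

    Returns (J, threshold, sensitivity, specificity); both classes must
    be present in labels.
    """
    best = None
    for t in sorted(set(scores)):
        tp, fp, tn, fn = confusion_at(labels, scores, t)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def likelihood_ratios(sens, spec):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    return sens / (1 - spec), (1 - sens) / spec


# Hypothetical outcomes and scores, for illustration only.
toy_labels = [0, 0, 1, 1]
toy_scores = [1, 2, 3, 4]
j_stat, cutoff, sens, spec = youden_threshold(toy_labels, toy_scores)
```

A lower Brier score indicates better-calibrated probabilities; a model that always predicts 0.5 scores exactly 0.25 regardless of the outcomes.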
Figure 2. SHAP variable importance and bee swarm plots. A, Mean absolute SHAP values suggest a rank order for variable
importance in the COVID-19 Severity Index. B, In the bee swarm plot, each point corresponds to an individual person in the study.
The points’ position on the x axis shows the effect that feature has on the model’s prediction for a given patient. Color corresponds
to relative variable value.
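The additive property behind the SHAP plots can be illustrated with exact Shapley values on a tiny model. The sketch below averages each feature's marginal contribution over all feature orderings and checks that the contributions sum to the difference between the prediction and a baseline prediction. The toy log-odds function, its coefficients, the feature names, and the baseline are all hypothetical; the study itself applied SHAP to a gradient-boosting model, which this brute-force approach does not reproduce.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features are switched from their baseline value to their observed
    value one at a time, and each feature's change in model output is
    averaged over every possible ordering.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


def toy_log_odds(f):
    """Hypothetical risk model on the log-odds scale, with one interaction."""
    excess_rr = f["resp_rate"] - 16
    return -3.0 + 0.5 * f["flow_l_min"] + 0.1 * excess_rr + 0.02 * f["flow_l_min"] * excess_rr


patient = {"flow_l_min": 4, "resp_rate": 26}
reference = {"flow_l_min": 0, "resp_rate": 16}
phi = shapley_values(toy_log_odds, patient, reference)

# Additivity: contributions sum to f(patient) - f(reference).
gap = toy_log_odds(patient) - toy_log_odds(reference)

# Ranking features by absolute contribution mirrors the importance plot.
ranked = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
```

Enumerating orderings grows factorially with the number of features, so this only works for toy cases; tree-based SHAP implementations use faster algorithms.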
[...] Index and COVID-19 Severity Index scores by assigning all patients in the independent validation cohort each of the scores and comparing them with known outcomes (Figure 4 and Appendix E1, available online at http://www.annemergmed.com [Figures S2 to 3]).34 These calibration curves suggest that outcome rates increased with quick COVID-19 Severity Index and COVID-19 Severity Index scores. A quick COVID-19 Severity Index score of less
Figure 3. SHAP value plots for age (A), alanine aminotransferase (B), aspartate aminotransferase (C), and ferritin (D). Scatter plots
show the effects of variable values (x axis) on the model predictions as captured by SHAP values (y axis).
Figure 4. Calibration of quick COVID-19 Severity Index and COVID-19 Severity Index on the independent validation data set. A,
Each patient in the validation cohort was assigned a score by quick COVID-19 Severity Index, and the percentage who had a critical
respiratory illness outcome were plotted with a line plot. Patients were then grouped into risk bins by quick COVID-19 Severity Index
intervals (0 to 3, 4 to 6, 7 to 9, and 10 to 12); the percentage of patients in each group with the outcome is indicated in the bar plot.
B, Each patient in the validation cohort was assigned a COVID-19 Severity Index score, a percentage risk from 0% to 100% using
gradient boosting and isotonic regression. The percentage of patients with COVID-19 Severity Index scores of 0% to 33%, 33% to
66%, and 66% to 100% who experienced critical respiratory illness at 24 hours is shown.
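The scoring and binning described in the Figure 4 caption can be sketched as follows. The point values are an assumption: they are recalled from the published quick COVID-19 Severity Index table (Table 2, not reproduced in this text) and should be checked against it before any use. They are at least consistent with the text, which gives a 0 to 12 range and a score of 3 for SpO2 of 89% to 92%, respiratory rate of 23 to 28 breaths/min, and 2 L/min of oxygen. The function names and the toy cohort are hypothetical.

```python
def qcsi_score(resp_rate, min_spo2, flow_l_min):
    """Quick COVID-19 Severity Index from 3 bedside variables.

    ASSUMED point table (verify against the published Table 2):
      respiratory rate: <=22 -> 0, 23-28 -> 1, >28 -> 2
      minimum SpO2 (%): >92 -> 0, 89-92 -> 2, <=88 -> 5
      O2 flow (L/min):  <=2 -> 0, 3-4 -> 4, 5-6 -> 5
    """
    if resp_rate <= 22:
        rr_points = 0
    elif resp_rate <= 28:
        rr_points = 1
    else:
        rr_points = 2

    if min_spo2 > 92:
        spo2_points = 0
    elif min_spo2 >= 89:
        spo2_points = 2
    else:
        spo2_points = 5

    if flow_l_min <= 2:
        flow_points = 0
    elif flow_l_min <= 4:
        flow_points = 4
    else:
        flow_points = 5

    return rr_points + spo2_points + flow_points


# Score intervals used for the bar plot in Figure 4A.
RISK_BINS = [(0, 3), (4, 6), (7, 9), (10, 12)]


def outcome_rate_by_bin(scores, outcomes):
    """Fraction of patients with the outcome in each score interval."""
    rates = {}
    for low, high in RISK_BINS:
        in_bin = [o for s, o in zip(scores, outcomes) if low <= s <= high]
        rates[(low, high)] = sum(in_bin) / len(in_bin) if in_bin else None
    return rates


# Hypothetical patients: (respiratory rate, minimum SpO2, flow), outcome.
toy_patients = [(16, 98, 0), (26, 90, 2), (24, 91, 3), (30, 85, 6)]
toy_outcomes = [0, 0, 1, 1]
toy_scores = [qcsi_score(*p) for p in toy_patients]
toy_rates = outcome_rate_by_bin(toy_scores, toy_outcomes)
```

Empty bins are reported as `None` rather than 0 so that a bin with no patients is not mistaken for a bin with no events.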
than or equal to 3 has a sensitivity of 0.79 (0.65 to 0.93), specificity 0.78 (0.72 to 0.83), PPV 0.36 (0.25 to 0.47), NPV 0.96 (0.93 to 0.99), LR+ 3.55 (3.51 to 3.59), and LR- 0.27 (0.26 to 0.28).
The quick COVID-19 Severity Index is available at https://covidseverityindex.org. The quick COVID-19 Severity Index calculator includes selection boxes for each of the 3 variables, which are summed to generate a score and prediction as estimated with the independent validation cohort.
LIMITATIONS
The data in this study were observational and provided from a single health system, and so they may not be generalizable according to local testing and admissions practices. Our data were extracted from an electronic health record, which is associated with known limitations, including propagation of old or incomplete data. There are important markers of oxygenation that were out of the scope of our study, including alveolar-arterial gradients. Because of data availability, no signs or symptoms or provider notes were included as candidate predictor variables.
Retrospective observational studies lack control of variables, so prospective studies will be required to assess validity of the presented models and the specificity of the features we identify as important to COVID-19 progression. Because of the retrospective nature of this study and the use of electronic health records, data imputation and assumptions about missingness were required, which introduced biases into our results. We assumed a Glasgow Coma Scale score of 15 unless documented otherwise, which may underestimate severity in qSOFA and CURB-65. Likewise, comorbidities were populated from previous in-system diagnoses; patients without system visits are likely to have lower Elixhauser indices than those whose care was integrated within the health system. In the quick COVID-19 Severity Index calculations, nasal cannula flow rate was imputed if nasal cannula was documented without a flow rate. In the COVID-19 Severity Index, no specific imputations were required because gradient boosting natively handles missing values. Chest radiograph interpretation was conducted manually with radiology reports, but without reviewing the radiography, which introduces subjectivity as reflected in the interrater agreement metric.
There are limitations in model performance, with confidence intervals reflective of moderate study size. We additionally did not compare the models with unstructured provider judgment, and thus one cannot make conclusions about whether this tool has utility beyond clinical gestalt. Most significant, however, is that management of COVID-19 is evolving, so future clinical decisions may not match those standards used in the reported clinical settings.
DISCUSSION
Consistent with clinical observations, we noted a significant rate of progression to critical respiratory illness within the first 24 hours of hospitalization in COVID-19 patients. We used 6 parallel approaches to identify a subset of variables for the final quick COVID-19 Severity Index and COVID-19 Severity Index models. The quick
COVID-19 Severity Index ultimately requires only 3 variables, all of which are accessible at the bedside.
We propose that a quick COVID-19 Severity Index score of 3 or less be considered low likelihood for 24-hour respiratory critical illness, with a mean outcome rate of 4% in the independent validation cohort (Figure 4) and a LR- of 0.27 (0.26 to 0.28). This score is achievable under the following patient conditions: respiratory rate less than or equal to 28 breaths/min, minimum pulse oximetry reading of greater than or equal to 89%, and oxygen flow rate of less than or equal to 2 L/min. In the validation cohort, a quick COVID-19 Severity Index cutoff greater than 3 had a sensitivity of 0.79 (0.65 to 0.93) in predicting progression of respiratory failure. However, few patients in the validation cohort had a quick COVID-19 Severity Index score of 3 (SpO2 of 89% to 92% and respiratory rates of 23 to 28 breaths/min with oxygen requirement 2 L/min) (Appendix E1, available online at http://www.annemergmed.com [Figure S2]). In the validation cohort, patients with a quick COVID-19 Severity Index score of 4 to 6 had a 30% rate of decompensation, whereas the group with a score of 7 to 9 had a 44% rate and the group with a score of 10 to 12 had a 57% rate. A quick COVID-19 Severity Index score of greater than 9 had a specificity of 0.99 in predicting respiratory failure, with a LR of 8.36 (7.98 to 8.76). Taken together, the quick COVID-19 Severity Index provides an objective tool for planning hospital dispositions. Patients with low quick COVID-19 Severity Index scores are unlikely to have respiratory decompensation, whereas those with high scores may benefit from higher levels of care.
COVID-19 Severity Index performance on the validation cohort was not superior to that of the quick COVID-19 Severity Index. We hypothesize that this may be related to cohort differences or COVID-19 Severity Index overfitting on the development cohort. The COVID-19 Severity Index offers opportunities to examine further potential COVID-19 prognostic factors. We used gradient-boosting models rather than logistic regression because gradient boosting allowed us to better capture nonlinear relationships, such as those observed in the liver chemistries, and natively handles missing values without imputation. Lower age had higher SHAP values, suggesting potential bias in the admitted patient cohort; young admitted patients may be more acutely ill than older ones. In alignment with current hypotheses about COVID-19 severity, multiple variable selection techniques identified inflammatory markers, including C-reactive protein and ferritin, as potentially important predictors. More striking, however, was the importance of aspartate aminotransferase and alanine aminotransferase in COVID-19 Severity Index predictions as calculated with SHAP values.35,36 The transition point at which the SHAP value analysis identified model risk associated with liver chemistries was at the high end of normal, consistent with previous observations that noted normal to mild liver dysfunction among COVID-19 patients. We hypothesize that the asymptotic quality of the investigated variables with respect to COVID-19 Severity Index risk contributions reflects our moderate study size. We expect that scaling COVID-19 Severity Index training to larger cohorts will further elucidate the effects of more extreme laboratory values. Although our data set included host risk factors, including smoking history, obesity, and body mass index, these did not appear to play a prominent role in predicting acute deterioration. Here, we recognize 2 important considerations: first, that predictive factors may not be mechanistic or causative factors in disease, and second, that these factors may be related to disease severity without providing predictive value for 24-hour decompensation.
We included radiographs for 1,170 visits in this cohort. Radiographs are of significant clinical interest because previous studies have shown high rates of ground-glass opacity and consolidation.37 Chest computed tomography may have superior utility for COVID-19 investigation, but the procedure is not being widely performed at our institutions as part of risk stratification or prognostic evaluation.38,39 Radiograph reports were classified according to containing bilateral, unilateral, or no opacities or consolidations. We found high interrater agreement in this coding, but radiographs were not consistently identified by our variable selection models. A majority of patients were coded as having bilateral consolidations, limiting the specificity of the findings. Further studies using natural language processing of radiology reports or direct analysis of radiographs with tools such as convolutional neural networks will provide more evidence regarding utility of these studies in COVID-19 prognostication.40 Furthermore, we do not consider other applications of radiographs including the identification of other pulmonary findings like diagnosis of bacterial pneumonia.
The Elixhauser Comorbidity Index, qSOFA, and CURB-65 baseline models provided the opportunity to test well-known risk-stratification and prognostication tools with a COVID-19 cohort. These tools were selected, in part, for their familiarity within the medical community, and because each has been proposed as having potential utility within the COVID-19 epidemic. These metrics have relatively limited predictive performance, and there were limitations in electronic health records; none were designed to address the clinical question addressed here. We observed both a high rate of missing mental status
documentation and a significant proportion of the population without documented medical histories. In particular, we hypothesize that the CURB-65 pneumonia severity score may still have utility in determining patient disposition with respect to discharge or hospitalization.
Future studies will be required to expand on this work in a number of ways. First, external validation is needed, as is comparison with physician judgment. Second, future studies may evaluate prospective robustness and utility of this scoring metric. Third, we expect related models to be extended to patient admission decisions as well as continuous hospital monitoring.41-43 Fourth, we anticipate potential applications in stratifying patients for therapeutic interventions. Early proof-of-concept studies for the viral ribonucleic acid polymerase inhibitor remdesivir included patients with severe COVID-19 as defined by pulse oximetry level of less than or equal to 94% on ambient air or with any oxygen requirement.44,45 Given ongoing drug scarcity, improved pragmatic, prognostic tools such as the quick COVID-19 Severity Index may offer a route to expanded inclusion criteria for ongoing trials or for early identification of patients who might benefit from therapeutics.
Taken together, these data show that the quick COVID-19 Severity Index provides easily accessed risk stratification relevant to ED providers.
[...] work; AND (2) Drafting the work or revising it critically for important intellectual content; AND (3) Final approval of the version to be published; AND (4) Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Funding and support: By Annals policy, all authors are required to disclose any and all commercial, financial, and other relationships in any way related to the subject of this article as per ICMJE conflict of interest guidelines (see www.icmje.org). Dr. Wilson acknowledges funding from the National Institutes of Health R01DK113191 and P30DK079310. Dr. Schulz was an investigator for a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; is a technical consultant to HugoHealth, a personal health information platform; is cofounder of Refactor Health, an artificial intelligence–augmented data mapping platform for health care; and is a consultant for Interpace Diagnostics Group, a molecular diagnostics company.
Publication dates: Received for publication May 23, 2020. Revisions received June 17, 2020, and July 2, 2020. Accepted for publication July 13, 2020.
Funding sources had no involvement in the design of the study. Researchers are independent from funders.
REFERENCES
1. World Health Organization. Novel coronavirus (2019-nCoV) situation
Drs. Haimovich and Ravindra served as co-first authors reports; 2020. Available at: https://www.who.int/emergencies/
and contributed equally to the work. diseases/novel-coronavirus-2019/situation-reports/.
2. CDC U. Coronavirus disease 2019 (COVID-19) cases in US; 2020.
Available at: https://www.cdc.gov/coronavirus/2019-ncov/cases-
Supervising editor: Gregory J. Moran, MD. Specific detailed updates/cases-in-us.html.
information about possible conflict of interest for individual editors 3. Singer AJ, Morley EJ, Meyers K, et al. Cohort of 4404 persons under
is available at https://www.annemergmed.com/editors. investigation for COVID-19 in a NY hospital and predictors of ICU care
and ventilation. Ann Emerg Med. 2020.
Author affiliations: From the Department of Emergency Medicine 4. Haimovich A, Warner F, Young HP, et al. Patient factors associated with
(Haimovich, Stoytchev, Taylor), Department of Internal Medicine, SARS-CoV-2 in an admitted emergency department population.
Section of Cardiovascular Medicine (Ravindra, van Dijk), Available at: https://onlinelibrary.wiley.com/doi/abs/ 10.1002/emp2.
Department of Internal Medicine (Young, Wilson), Clinical and 12145.
Translational Research Accelerator, Department of Medicine 5. Chan PS, Jain R, Nallmothu BK, et al. Rapid response teams: a
systematic review and meta- analysis. Arch Intern Med.
(Wilson), Center for Medical Informatics (Schulz, Taylor), and
2010;170:18-26.
Department of Laboratory Medicine (Schulz), Yale University 6. Badawi O, Liu X, Berman I, et al. Impact of COVID-19 pandemic on
School of Medicine, New Haven, CT; the Department of Computer severity of illness and resources required during intensive care in the
Science, Yale University, New Haven, CT (Ravindra, van Dijk); and greater New York City area. Available at: https://www.medrxiv.org/
the Center for Outcomes Research and Evaluation, Yale New content/early/2020/04/14/2020.04.08.20058180.
Haven Hospital, New Haven, CT (Young, Schulz). 7. Kennedy M, Joyce N, Howell MD, et al. Identifying infected emergency
department patients admitted to the hospital ward at risk of clinical
Author contributions: AH, NGR, and RAT designed the project. NGR, deterioration and intensive care unit transfer. Acad Emerg Med.
HPY, WLS, and RAT extracted and processed the data. AH, NGR, 2010;17:1080-1085.
FPW, DvD, and RAT created the models. SS designed the Web 8. Simchen E, Sprung CL, Galai N, et al. Survival of critically ill patients
interface. All authors contributed substantially to article revisions. hospitalized in and out of intensive care. Crit Care Med.
RAT takes responsibility for the paper as a whole. 2007;35:449-457.
9. Berlin DA, Gulick RM, Martinez FJ. Severe Covid-19. N Engl J Med.
All authors attest to meeting the four ICMJE.org authorship criteria: https://doi.org/10.1056/NEJMcp2009575.
(1) Substantial contributions to the conception or design of the 10. Wu Z, McGoogan JM. Characteristics of and important
work; or the acquisition, analysis, or interpretation of data for the lessons from the coronavirus disease 2019 (COVID-19) outbreak
in China: summary of a report of 72 314 cases from the Chinese Center for Disease Control and Prevention. JAMA. 2020;323:1239-1242.
11. Horwitz LI, Green J, Bradley EH. US emergency department performance on wait time and length of visit. Ann Emerg Med. 2010;55:133-141.
12. Maves RC, Downar J, Dichter JR, et al. Triage of scarce critical care resources in COVID-19: an implementation guide for regional allocation: an expert panel report of the Task Force for Mass Critical Care and the American College of Chest Physicians. Chest. 2020.
13. van Walraven C, Austin PC, Jennings A, et al. A modification of the Elixhauser comorbidity measures into a point system for hospital death using administrative data. Med Care. 2009:626-633.
14. Singer M, Deutschman CS, Seymour CW, et al. The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). JAMA. 2016;315:801-810.
15. Ferreira M, Blin T, Collercandy N, et al. Critically ill SARS-CoV-2–infected patients are not stratified as sepsis by the qSOFA. Ann Intensive Care. 2020;10:1-3.
16. Lim WS, Van der Eerden M, Laing R, et al. Defining community acquired pneumonia severity on presentation to hospital: an international derivation and validation study. Thorax. 2003;58:377-382.
17. Ilg A, Moskowitz A, Konanki V, et al. Performance of the CURB-65 score in predicting critical care interventions in patients admitted with community-acquired pneumonia. Ann Emerg Med. 2019;74:60-68.
18. Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162:W1-W73.
19. McPadden J, Durant TJ, Bunch DR, et al. Health care and precision medicine research: analysis of a scalable data science platform. J Med Internet Res. 2019;21:e13043.
20. First DataBank. First DataBank enhanced therapeutic classification system (ETC). Available at: http://www.firstdatabank.com/Products/therapeutic-classification-system-nddf.aspx.
21. Elixhauser A, Steiner C, Harris DR, et al. Comorbidity measures for use with administrative data. Med Care. 1998:8-27.
22. Agency for Healthcare Research and Quality. HCUP Tools and Software. Healthcare Cost and Utilization Project (HCUP). Rockville, MD: Agency for Healthcare Research & Quality; 2020. Available at: http://www.hcup-us.ahrq.gov/tools_software.jsp.
23. Cohen SB, Ruppin E, Dror G. Feature Selection Based on the Shapley Value. In: IJCAI. vol. 5; 2005. p. 665-670.
24. Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3:1157-1182.
25. Lundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. 2020;2:2522-5839.
26. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016:785-794.
27. Bergstra J, Yamins D, Cox DD. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In: Proceedings of the 30th International Conference on Machine Learning, Volume 28. ICML '13. 2013:I-115-I-123.
28. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One. 2015;10.
29. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. No. 57 in Monographs on Statistics and Applied Probability. Boca Raton, FL: Chapman & Hall/CRC; 1993.
30. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems; 2017:4765-4774.
31. Lundberg SM, Nair B, Vavilala MS, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2:749.
32. Artzi NS, Shilo S, Hadar E, et al. Prediction of gestational diabetes based on nationwide electronic health records. Nat Med. 2020;26:71-76.
33. Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. In: Proceedings of the 22nd International Conference on Machine Learning; 2005:625-632.
34. Backus B, Six A, Kelder J, et al. A prospective validation of the HEART score for chest pain patients at the emergency department. Int J Cardiol. 2013;168:2153-2158.
35. Zhang C, Shi L, Wang FS. Liver injury in COVID-19: management and challenges. Lancet Gastroenterol Hepatol. 2020.
36. Cai Q, Huang D, Yu H, et al. Characteristics of liver tests in COVID-19 patients. J Hepatol. 2020.
37. Wong HYF, Lam HYS, Fong AHT, et al. Frequency and distribution of chest radiographic findings in COVID-19 positive patients. Radiology. 2020:201160.
38. Chung M, Bernheim A, Mei X, et al. CT imaging features of 2019 novel coronavirus (2019-nCoV). Radiology. 2020;295:202-207.
39. Zhang K, Liu X, Shen J, et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements and prognosis of COVID-19 pneumonia using computed tomography. Cell.
40. Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. Available at: http://arxiv.org/abs/1711.05225.
41. Henry KE, Hager DN, Pronovost PJ, et al. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med. 2015;7:299ra122.
42. Simonov M, Ugwuowo U, Moreira E, et al. A simple real-time model for predicting acute kidney injury in hospitalized patients in the US: a descriptive modeling study. PLoS Med. 2019;16.
43. Tomasev N, Glorot X, Rae JW, et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature. 2019;572:116-119.
44. Grein J, Ohmagari N, Shin D, et al. Compassionate use of remdesivir for patients with severe Covid-19. N Engl J Med. 2020.
45. Wang Y, Zhang D, Du G, et al. Remdesivir in adults with severe COVID-19: a randomised, double-blind, placebo-controlled, multicentre trial. Lancet. 2020.
46. Janssen A, Pauls T. How do bootstrap and permutation tests work? Ann Stat. 2003;31:768-806.