Comparing Care Pathways Between COVID-19 Pandemic Waves Using Electronic Health Records: A Process Mining Case Study

Georgiev, Konstantin; Fleuriot, Jacques D.; Papapanagiotou, Petros; McPeake, Joanne; Shenkin, Susan D.; Anand, Atul

doi:10.1007/s41666-024-00181-6

Comparing Care Pathways Between COVID-19 Pandemic Waves Using Electronic Health Records: A Process Mining Case Study

Research Article
Open access
Published: 23 December 2024

Volume 9, pages 41–66, (2025)
Cite this article

Download PDF

You have full access to this open access article

Journal of Healthcare Informatics Research Aims and scope Submit manuscript

Comparing Care Pathways Between COVID-19 Pandemic Waves Using Electronic Health Records: A Process Mining Case Study

Download PDF

Konstantin Georgiev¹,
Jacques D. Fleuriot²,
Petros Papapanagiotou³,
Joanne McPeake⁴,
Susan D. Shenkin⁵ &
…
Atul Anand¹

1459 Accesses
2 Citations
2 Altmetric
Explore all metrics

Abstract

The COVID-19 pandemic caused rapid shifts in the workflow of many health services, but evidence of how this affected multidisciplinary care settings is limited. In this data study, we propose a process mining approach that utilises timestamped data from electronic health records to compare care provider patterns across pandemic waves. To investigate healthcare patterns during the pandemic, we collected routine events from Scottish hospital episodes in adults with COVID-19 status, generating treatment logs based on care provider input. Conformance checking metrics were used to select the Inductive Miner infrequent (IMi) algorithm for downstream analysis. Visual diagrams from the discovered Petri Nets indicated interactions on provider- and activity-level data subsets. Measures of “cross-log conformance checking” and graph edit distance (GED) further quantified variation in care complexity in adverse subgroups. Our baseline cohort included 1153 patients with COVID-19 linked to 55,212 relevant care provider events. At the conformance checking stage, the IMi model achieved good log fitness (\(\overline{LF }\) \(=0.95\)) and generalisation (\(\overline{G }\) \(=0.89\)), but limited precision (\(\overline{PR }\) \(=0.27\)) across all granularity levels. More structured care procedures were present in Wave 1, compared to limited multidisciplinary involvement in Wave 2. Care activities differed in patients with extended stay (\(GED=348, \overline{ PR }\) \(=0.231\) vs \(GED=197, \overline{PR }\) \(=0.429\) in shorter stays). We demonstrated that process mining can be incorporated to investigate differential complexity in patients with COVID-19 and derive fine-grained evidence on shifts in healthcare practice. Future process-driven studies could use clinical oversight to understand operational adherence and driving factors behind service changes during pressured periods.

Transitional Care in Times of COVID 19: Opportunities for e-Health Interventions

COVID-19 outbreaks surveillance through text mining applied to electronic health records

Article Open access 28 March 2024

Bridging the Inferential Gaps in Healthcare

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The assessment, diagnosis, and management of unwell patients require the integrated work of multidisciplinary health and care teams. The increased workload, staff shortages, and the need for re-deployment during the COVID-19 pandemic caused an unprecedented reorganisation of hospital resources, particularly concerning older adults [1]. Older people recovering from severe COVID-19 conditions often require intensive rehabilitation services to regain physical function as a result of increased risk of infection and subsequent adverse outcomes [2]. The need for integrated treatment is particularly relevant for this population, as older patients often experience deconditioning after acute illness and may have a range of complex co-existing physical and mental conditions.

While striving to provide more efficient treatment for a COVID-19-affected population, the transition from Wave 1 to Wave 2 of the global pandemic also resulted in notable shifts in health and care services. Wave 2 was associated with a reduced risk of adverse outcomes in the hospital, including shorter stays and lower mortality risk [3]. In more complex cases, such as patients in intensive care units (ICU), there was a reduction of frequency in physiotherapy services, which had a negative effect on the time to first mobilisation and increased the need for continued rehabilitation after discharge [4]. The public health policy landscape rapidly changed with the introduction of shielding measures, access to testing sites and the vaccination rollout, and operational changes for the delivery of primary care. There is still surprisingly little description of changes in the interaction patterns of health and care providers for patients hospitalised with COVID-19 between Waves 1 and 2. More evidence is required to quantify these differences and establish a baseline for improvement of access to health services in the event of a future pandemic.

The increasing use of electronic health records (EHR) across the UK and abroad, integrated into modern healthcare information systems, has allowed the routine collection and analysis of detailed episodes of care, including recorded activity associated with nursing and rehabilitation staff. The use of big data has also progressed the development of tools for population-level analytics in COVID-19 patients [5]. Process mining (PM) techniques utilise timestamped event data and provide solutions that can observe “real-world behaviour” and visualise it in a manner that is understandable by non-specialists [6, 7]. A process model in the healthcare domain can be treated as a graphical representation of a series of operations performed by multidisciplinary teams. These events can be represented using the Business Process Modelling Notation (BPMN). BPMN diagrams are end-to-end flowcharts suitable for describing the interactions within a business model, including sequences in healthcare pathways [8, 9]. Workforce planning datasets across NHS Lothian, Scotland, now record timestamped interactions between hospital care providers and patients. This enables the use of PM techniques to automatically discover and compare the differences in healthcare interaction patterns.

In this case study, we used routine EHR data to extract COVID-19 hospital episodes and linked provider events, separated between the first two pandemic waves. We used these data as the basis for a PM approach consisting of four key stages:

1)
PM algorithm comparison and selection for downstream analysis through conformance checking measures.
2)
Development of patient subgroups for evaluating the differences in care provision for people with adverse outcomes and increased treatment complexity.
3)
Cross-log conformance checking performed across the patient subgroups, including distance measures for graph similarity between Waves 1 and 2.
4)
Visual process interpretation through the generation of BPMN diagrams discovered using the selected PM algorithm.

Through the output of this analysis, we aim to answer the following:

What are the key differences in contact patterns between Waves 1 and 2? Are they consistent across the provider-level and the activity-level?
How prevalent are these differences in patients with complex need (e.g. older people, patients with out-of-hours provision)?
Can we use PM to discover inherent patterns in the general population that can be linked to the underlying issues of the pandemic?
What challenges and considerations for further development of clinically meaningful process models can we identify in the context of COVID-19?

2 Methods

2.1 Data Sources

Two primary data sources were required to develop our event log and subpopulation for the study: the COVID-19 admission episodes and the linked provider contacts data. The healthcare contacts data were initially separated into two workforce planning datasets from the TrakCare (InterSystems, MA) EHR system. The first dataset included the timestamped points of contact linked to any routine or specialised care provider (e.g. nurse or general medical doctor). The second dataset consisted of additional metadata relevant to specific rehabilitation providers (e.g. physiotherapists and dieticians), including the intervention duration (total minutes) and details of the activity performed. The former helped generate the “provider-level metadata” for our event log, while the latter was used as an extension for providing the activities of the captured individuals, where possible. Intervention coding was based on a defined list of system input options covering care provider (e.g. physiotherapist) and their activities (e.g. mobilisation). This additionally split activities into assessments and interventions which were reviewed by a clinician to remove codes not associated with direct patient activity (e.g. administration). We also excluded activities coded for fewer than ten patients in our cohort. Data were processed and extracted within the DataLoch service (Edinburgh, UK) and analysed within their Secure Data Environment (SDE).

2.2 Data Collection and Preprocessing

Our cohort of interest consisted of first hospitalisations in adult patients (aged 18 and above) who tested positive for COVID-19. We collected patient records with acute hospital admission to one of three hospitals in Edinburgh, Scotland (NHS Lothian region), between March 2020 and March 2021. Hospital admissions were identified as COVID-19 positive when any reverse transcriptase-polymerase chain reaction (RT-PCR) test was recorded during hospital stay or within 10 days of admission from community testing. Only the first hospital admission period was included for patients meeting this criterion multiple times. For patients meeting this criterion across both Wave 1 and 2, we included only their admission episode recorded during Wave 1. The cutoff points chosen for the two waves corresponded with the state of the UK pandemic (distribution plots available in Appendix B.1):

Wave 1 (between 1st March 2020 and 31st May 2020)
Wave 2 (between 1st September 2020 and 31st March 2021)

We restricted our cohort to having at least two contacts with at least two different care provider types recorded in our dataset to limit the population to those truly admitted for ongoing care in hospital, rather than very brief attendances for assessment. We additionally excluded cases with in-hospital death, to focus on complete patterns of care in survivors to hospital discharge. To obtain a focused view of “inter-disciplinary relationships”, we excluded self-loops present in the event log, that is, any repeating consecutive entries with the same provider over time. To avoid overlapping cases between Waves 1 and 2, we limited patient follow-up in the hospital to 90 days from admission. These steps ensured that we collected hospital events representative of urgent and routine care across multiple disciplines required for recovery to a sufficient level. Emergency department medical and nursing contacts before hospital ward admission were not captured electronically, and this element of early care could not be included in the analysis.

For our subgroup analysis, we identified six categories consisting of patients with complex care needs and adverse outcomes:

> 75 years old (population median across both waves)
Admitted to ICU
Multimorbidity (presence of two or more long-term conditions at admission)
1-year all-cause mortality (from the point of discharge)
Extended hospital stay (> 30 days)
High out-of-hours care contacts (≥ 80th percentile of patients over both waves)

We defined out-of-hours care as any contact timestamped between 7 PM and 8 AM, regardless of the day of the week. The listed baseline characteristics included age, sex, and the Scottish Index of Multiple Deprivation (SIMD), representing a relative estimate of socioeconomic status summarised through deprivation quintiles [10]. The presence of long-term conditions was extracted through national administrative records (Scottish Morbidity Record 01) covering prior hospitalisation records with relevant ICD-10 coding, using code lists defined previously by the CALIBER research group [11]. In our case, the complete list of previous conditions consisted of ischaemic heart disease, myocardial infarction, stroke, diabetes, obesity, dementia, delirium, depression, asthma, and chronic obstructive pulmonary disease (COPD).

2.3 Process Mining Algorithms and Definitions

To describe how our PM approach captures the differences in the underlying care patterns in Waves 1 and 2, we must first define the representation of our event logs. Let \(L\) denote a standardised event log over the complete population, represented by a multiset of traces \(\{{t}_{1}, {t}_{2},\dots , {t}_{m}\}\in T\). Each trace \({t}_{i}\) describes a sequence of cases with an allocated timestamp from the care provider data. The number of cases captured by \({t}_{i}\) depends on the desired level of analysis and can be in the range \([2, {n}_{L}]\), where \({n}_{L}\) is the maximum amount of timestamped events for an individual patient. Process modelling is typically associated with discovering graph representations of \(L\), consisting of a series of interactions between nodes (\(N\)) and edges (\(E\)). Depending on the data structure, each edge may also include a frequency weight proportional to the prevalence of a sequence in the event log, as commonly adopted in Directly-Follows Graphs (DFGs).

The most common way to express a process model is through a Petri Net representation, applied in all described PM techniques. Petri Nets are directed bipartite graphs that can be characterised through node transitions between places (individual states identified from the event log), also known as arcs. A Petri Net \(P\) can be expressed as a tuple \(\left(x,y, z, {m}_{0}\right)\), where \(x\) and \(y\) are, respectively, the sets of places and transitions describing the care patterns data, \(z\subseteq \left(x\times y\right)\cup \left(y\times x\right)\) is the set of flow relations (or the set of graph arcs), and \({m}_{0}\) is the initial marking of \(P\). The initial marking is an input configuration defined by assigning a number of input “tokens” to a respective place. This allows observation of concurrent executions in the Petri Net and identifies place reachability. Let us explore how the transitions of a patient requiring multidisciplinary care are expressed using this notation.

We provide a visual example of this concurrent flow in Fig. 1, describing an intermediate state of the pathway \({P}_{s}\) and its follow-up sequence \({P}_{s+1}\). The initial marking of \({P}_{s}\) contains two tokens in the post-admission state (\({x}_{1}\)), one token prior to the split for mobility intervention (\({x}_{7}\)), one token after enteral tube feeding (nutritional support) intervention (\({x}_{8}\)), and one token prior to the split for general medical doctor visits (\({x}_{3}\)). At this stage, there are three enabled transitions in this part of the pathway: one post-admission (\({y}_{1}\)), one skipping initial general medical doctor input (\({y}_{4}\)), and one prior to the next transition for mobility intervention (\({y}_{3}\)). In this example, we assume that each enabled transition \({y}_{i}\) only consumes or produces one token at a time. After executing \({P}_{s}\to {P}_{s+1}\), these transitions are “fired” concurrently:

\({y}_{1}\) consumes a token from \({x}_{1}\) at the “post-admission state” and produces two tokens: one prior to eating, drinking, and swallowing assessment at \({x}_{2}\) and one at \({x}_{3}\).
\({y}_{4}\) is enabled, so it consumes a token from \({x}_{3}\) and produces one at \({x}_{10}\), skipping the general medical doctor nodes.
\({y}_{3}\) consumes the token from \({x}_{7}\) and produces a token at \({x}_{9}\), prior to mobility intervention.
The transition leading back to enteral tube feeding is disabled; therefore, \({x}_{8}\) must proceed forward, and emergency respiratory treatment is performed.

The remaining tokens can now be used to fire the enabled transitions \({y}_{4}\) and \({y}_{5}\) in the sequence \({P}_{s+1}\to {P}_{s+2}\). In these following events, \({y}_{4}\) will consume the token from \({x}_{3}\) and produce a token in \({x}_{10}\) and \({y}_{5}\) will consume a token from \({x}_{9}\) and produce a token in \({x}_{11}\) leading up to a mobility-related physiotherapy treatment. Additionally, at the current step, there are no enabled transitions after \({x}_{2}\), which indicates that an assessment for eating, drinking, and swallowing problems is executed, transferring this token over to \({x}_{4}\). Thus, the Petri Net captures these transitions in care as a dynamic system of discrete events, where any deviation from the original event log will be reflected during execution (e.g. missing tokens during an enabled transition). The BPMN diagram is a simplified version of a Petri Net that does not include transitional states but describes the overall process flow through decision nodes: AND-split, AND-join, XOR-split, and XOR-join. These nodes indicate the discovered constraints by the process model for the respective granularity level.

In our baseline performance comparison, we evaluated four process discovery algorithms: Alpha Miner (AM), Inductive Miner infrequent (IMi), Inductive Directly-Follows Miner (IDFG), and Heuristics Miner (HM). A summary of each of these algorithms and their ways of discovering a Petri Net from an input event log is given in Appendix A.1. The noise threshold \(\theta \in [\text{0,1}]\) present in IMi, IDFG, and HM (lower value indicates a stronger adjustment) is used to reduce model complexity and over-approximation of decision nodes that can result in intractable paths within the Petri Net. In the context of HM, \(\theta\) is used to prune weak causal dependencies (eventually follows relations) in the model.

2.4 Conformance Checking for Performance and Subgroup Analysis

Similarly to the example shown in Fig. 1, we use a technique known as “token-based replay” [12] to measure conformance between a Petri Net and an event log or similarity between two discovered Petri Nets. After executing the sequences across the enabled transitions in the Petri Net, a set of four values (\({n}_{m}, {n}_{c}, {n}_{p}, {n}_{r})\in {t}_{i}\) is computed for each trace, describing the number of missing, consumed, produced, and remaining tokens, respectively. These values are used to estimate the cumulative log fitness (\(LF\in [\text{0,1}]\)), where a higher value means the final marking has been reached with less remaining and missing tokens from the Petri Net. Other conformance measures include precision (\(PR\in \left[\text{0,1}\right]\)), [13] which looks at the accuracy of follow-up transitions for each marking and generalisation (\(G\in \left[\text{0,1}\right]\)), [14] which measures the average transition frequency compared to the original event log. Some well-known problems of token-based replay as an algorithm include token flooding (over-population of missing tokens for vastly different cases) and long transitions due to many identified fall-throughs severely affecting computational speed. However, we use an improved version that identifies the local shortest paths between places with enabled transitions [15]. This localisation technique allows for more robust diagnostics in model deviations when performing token-based replay. Additionally, we use the graph edit distance [16] (\(GED\in [1, \infty ))\) as a quantitative measure of similarity between two Petri Nets (\({P}_{i}, {P}_{j}\)), describing the minimum number of edit operations (insertion, deletion, or substitution of nodes/edges) required to transform \({P}_{i}\) into \({P}_{j}\). By default, each edit operation is assigned a cost of 1, which is accumulated upon traversing the graphs.

In the context of this case study, we use \({L}_{1}\) and \({L}_{2}\) to denote the sub-logs representing Waves 1 and 2 and \({P}_{1}\) and \({P}_{2}\) to denote the discovered Petri Nets of these sub-logs, respectively. When we perform performance comparison, we use token-based replay on a Petri Net against its original event log, i.e. we estimate the log fitness (\(LF\left({P}_{i}\right)\)), precision (\(PR\left({P}_{i}\right)\)), and generalisation (\(G\left({P}_{i}\right)\)) per cohort. When we perform subgroup analysis, we replay the Petri Nets against their counterpart event logs, i.e. we estimate the “cross-log fitness” (\(LF({L}_{i}, {P}_{j})\)), precision (\(PR({L}_{i}, {P}_{j})\)), and generalisation (\(G({L}_{i}, {P}_{j})\)). Further details regarding the process model definitions and estimation of the conformance checking measures are provided in Appendix A.2.

2.5 Implementation Tools and Software

The primary PM tools used for generating the analysis in this case study include bupaR, pm4py, and ProM. All preliminary data extraction, preprocessing, and process model generation were performed within the SDE using R (version 4.2.0). We used the base bupaR framework [17] (version 0.5.2) to generate the target event log, produce summary statistics, and generate the DFGs used in the IDFG models. Then, to obtain the Petri Nets (except for IDFG) and evaluate them using token-based replay, we used the R interface to the pm4py library [18] (version 1.2.7). We imported the generated Petri Net markings in Petri Net Markup Language (PNML) format to ProM [19] (version 6.13) and converted them into BPMN diagrams for visual analysis using the function “Convert Petri net to BPMN diagram”.

Additionally, we used ProM to discover the IDFG Petri Nets from the acquired DFGs. This was done using the function “Mine Petri net with Inductive Miner – directly-follows”. Afterwards, the IDFG Petri Nets were evaluated using the same approach but on the standard pm4py library (version 2.7, Python 3.10). To generate the GEDs between two input Petri Nets, we used the NetworkX (version 3.2) package in Python with a fixed timeout limit of 30 s.

2.6 Overview of the Approach

Now, we can review our overall pipeline and the steps required to produce the performance and subgroup analysis for this case study. The concept flowchart in Fig. 2 highlights these steps. We first determine the positive COVID-19 episodes linked to the EHR data and use the described metadata to extract patient outcomes and stratify the population into cohorts representing Waves 1 and 2.

We used the care provider contacts data to code and aggregate provider and activity-level events, which allowed us to generate four event log variants across the two waves. The next step involved process discovery, where each model type was evaluated using token-based replay on the generated Petri Net against its original event log. Then, the best-performing process model was selected for subgroup analysis. We additionally converted the Petri Nets to BPMN diagrams to visually compare the quality of the discovered process models.

3 Results

3.1 Patient and Event Log Summary Data

In our final population, we included 1153 patients presenting with COVID-19, with 302 (26%) in Wave 1 and 851 (74%) in Wave 2. The baseline characteristics and outcomes for the two cohorts are presented in Table 1. We can observe that the extracted episodes of care detailed predominantly older patients (76 [62, 85] years). People accessing healthcare services in Wave 1 were based in less deprived regions in Scotland (p = 0.003 across the SIMD category). We measured the differences across some of the most common admitting specialties, with respiratory care being more prevalent in Wave 1 (9% vs 2% in Wave 2). Among the reported outcomes, only death at 1-year was significantly different (19% in Wave 2 vs 13% in Wave 1, p = 0.018). Patients in Wave 1 received a higher number of “rehabilitation-specific sessions”, with 19 (6, 48) vs 15 (6, 35) contacts in Wave 2, p = 0.011 across 410 (130, 935) vs 292 (110, 710) rehabilitation minutes in Wave 2, p = 0.02.

Table 1 Baseline characteristics describing the two COVID-19 cohorts

Full size table

Our complete standardised event log consisted of 55,212 events on both the activity and provider levels (15,954 in Wave 1 and 37,912 in Wave 2). The breakdown of each coded care provider and defined activity can be observed in Table 2. Overall, we collected 11 unique provider types and 23 related activities, which were present in both waves. Unsurprisingly, the most common activity type involved standard nursing care services. The most prevalent specialist care was related to physiotherapy and occupational therapy. This set was substantially more limited as 14% of events were not identified due to missing data (labelled OTHER). A further 12% (REHAB-ANY-INT) were identified from the rehabilitation dataset but did not contain a sufficient description. Additionally, another 1% of the treatments (TRTFAILED) were not completed because the patient was too unwell or declined a rehabilitation session. Despite these limitations, the activity set provided more detailed dependencies for rehabilitation care provision. Similar to the length of stay, we also observed a slightly higher overall case duration per patient in Wave 2 (26 days vs 21 days in Wave 1).

Table 2 Summary of coded events and event log metadata

Full size table

3.2 Model Performance Comparison

Using the above-defined PM algorithms and measures of conformance checking, we applied token-based replay and computed the log fitness, precision, and generalisation of each discovered Petri Net against its original event log. We additionally tested the models with more relaxed (\(\theta =0.7\)) and strict (\(\theta =0.99\)) noise and dependency thresholds. The full results displayed in Table 3 show that, overall, there is always a tradeoff between precision and log fitness, and it is difficult to achieve good precision when the model effectively captures the complete set of states present in the event log.

Table 3 Model performance comparison in different process mining algorithms

Full size table

Models, such as AM and IDFG (dependencies highlighted in Appendix B.2), capture a higher model precision but represent a smaller subset of the event log behaviour. In contrast, the IMi and HM models with strict thresholds capture almost all state transitions but achieve poor precision. Ultimately, the best model for this case is achieved by relaxing the noise threshold \(\theta\) for the IMi model, limiting some of the captured behaviour (\(\overline{LF }\) \(=0.964\) on the provider-level and \(\overline{LF }\) \(=0.94\) on the activity-level) but representing a better fit for follow-up transitions (\(\overline{PR }\) \(=0.279\) on the provider-level and \(\overline{PR }\) \(=0.251\) on the activity-level). The IMi model generalisation was also the highest among models that achieved a \(L{F}_{i}>0.9\). It is also worth noting that upon adding this noise threshold, precision was still relatively poor for Wave 2 (particularly on the activity level) due to higher sequencing variability than Wave 1.

3.3 BPMN Diagrams of the Discovered Process Models

In the following analysis, we used the output Petri Nets for each subset of the IMi model to visualise the discovered interaction patterns. We converted them to BPMN format, which highlighted the care provider dependencies through decision nodes (“ + ” nodes indicating dependent activities (AND-join, AND-split) and “X” nodes indicating exclusive choice operations (XOR-join, XOR-split)). Figure 3 shows the discovered set of provider-level dependencies. In Wave 1, nursing care (NURSE) and occupational therapy (OT) visits occurred independently, while the middle part of the diagram included multiple levels of interaction rules. For example, physiotherapy (PT) contacts were associated with multidisciplinary care from other providers including general medical doctors (GMD), speech and language therapy (SLT), and dietician support (DT). By comparison, the Wave 2 BPMN highlighted that many providers were more independently linked to discharge, with limited multidisciplinary care. The only independent “AND-split” rule discovered by the process model was linked to interactions between dieticians (DT) and specialised nurses (S_NURSE), suggesting combined care for those requiring complex feeding support [20]. The limited involvement in multidisciplinary care during Wave 2 may be explained by lower disease severity. As seen in Table 1, overall rehabilitation contacts were fewer in this cohort. This may also mean that hospital pressures in Wave 2 resulted in more focus on single-specialty-led treatment rather than multidisciplinary care.

A similar pattern is observed in the “activity-level BPMN diagrams” (Fig. 4). In Wave 1, activities such as specialist nursing care (SP-NURSING) were executed early in the pathway with downstream dependencies for further activities that followed. In contrast, in the Wave 2 BPMN, more simultaneous activity was observed with fewer dependencies. As it might be expected, associations were present between SLT and DT activities, such as interventions for eating, drinking, and swallowing (SLT-EDS-A, SLT-EDS-INT) and food fortification (NUTR-FF-INT). With the return of hospital referrals in Wave 2, it is possible that this affected the required coordination to streamline the patient journey. As referrals were not generally provided in Wave 1 and the focus was on immediate involvement in active cases, it may be less surprising that the process models discovered more structured patterns during this time.

3.4 Comparison Between Complex Patient Subgroups

After observing the underlying care patterns through the BPMN diagrams, we now focus on the similarities of individual subgroups across waves and the differences between care provision for adverse and non-adverse cases. In this subgroup analysis, we highlighted these differences using the cross-log conformance checking method and the Petri Net GED (Table 4). Across all subgroup types, we observed relatively high log fitness (\(\overline{LF }\) \(>0.9\)), indicating that the trace subtypes were not substantially different. Precision was relatively higher on the activity-level, where model behaviour and follow-up sequences of activities were more similar between the waves. Generalisation was slightly better when replaying the log of Wave 2 against the Petri Nets of Wave 1 (\(G({L}_{2},{W}_{1})\)) compared to the opposite. On the activity-level, differences between complex and less complex cases were more prevalent than at the provider-level, as indicated by the GED and the log fitness. The most remarkable differences were in pathways in older patients (according to the GED) and pathways with high out-of-hours care (according to the log fitness). Intensive therapy was more standardised between the waves. This was expected, considering this specialty contains intervention sequences and timing guidelines. Care activities were substantially different in patients with an extended stay in hospital (\(GED=348, \overline{P }\) \(=0.231\) in the positive group and \(GED=197, \overline{P }\) \(=0.429\) in the negative group). However, 1-year mortality, multimorbidity, and out-of-hours care were not associated with significant differences between groups. Overall, there were contradictions between the estimated differences in the provider and activity levels, but the activity subsets seemed to align better between the waves regarding observed behaviour. This resulted in more pronounced distances between adverse and non-adverse groups. These included increased variation in older groups and extended stay patients and decreased variation among the rest.

Table 4 Multi-level cross-log conformance checking comparison over each patient subgroup

Full size table

To better understand the implications of the GED scores, we visualised BPMN diagrams across subgroups with high and low variability between the pandemic waves. Regarding high variability, we focused on care pathways in patients older than 75 (Fig. 5). On the activity-level, Wave 2 pathways were clearly associated with a wider range of activities across multiple disciplines in these patients.

The Wave 2 BPMN reveals more interdisciplinary relationships between nursing and rehabilitation services with possible interactions between nutrition support (NUTR-A, NUTR-ONS-INT), physiotherapy (PT-A), and palliative care providers (PC-INPUT) linked to follow-up activities of physiotherapy (PT-MOBILITY-A) and medical doctor input (GMD-INPUT). The level of multidisciplinary input for these older patients was much greater than the overall population (Fig. 4b). This may suggest that a larger portion of healthcare resources were directed towards care for older people during Wave 2. Regarding low variability, we analysed the pathways of patients in intensive therapy with similar GED scores (Fig. 6). In this case, the range of activities was similar. In both waves, routine nursing care (ST-NURSING) and medical doctor inputs (GMD-INPUT) depended on other interdisciplinary domains. Additional BPMN diagrams are provided in Appendix B.3, detailing the pathways of the remaining subgroups. Furthermore, we performed a stratified conformance checking analysis across sex and SIMD quintiles (Appendix B.4). On the activity-level, care pathways in male patients varied more than in female patients. Interestingly, substantially fewer changes in intervention patterns were detected in patients from areas with high deprivation \((GED=80)\). We additionally analysed the trace coverage rates (using the “provider-level model”) between subgroups to better understand the log fitness (Appendix B.5). These results aligned with some of the findings from the “provider-level subsets” in Table 4. The linear relationships between the conformance checking metrics and the GED were also presented.

4 Discussion

Identifying and measuring care patterns throughout the COVID-19 pandemic is challenging due to the never-before-seen demand for hospital services. We proposed a comprehensive PM approach utilising EHR data to understand the variation in healthcare provider patterns between the first two pandemic waves. Through the model performance and BPMN diagrams, we observed the ability of the IMi model to identify split and join rules across disciplines and activities. At baseline, patients were similar in age and sex, but those in Wave 2 who survived hospital discharge had a significantly higher all-cause death within the following year. Patients in Wave 2 had a longer length of stay but with fewer rehabilitation sessions and associated minutes of therapy. Fewer dependencies between care providers were discovered in Wave 2, suggesting simpler care pathways than in Wave 1, except for those older than 75 years old and those with extended hospital stays, where complexity was higher in Wave 2. In a more detailed analysis, this variation was driven at the activity-level, with greater input differences from a similar range of care providers. In contrast, care provider and activity pathways for those who received high out-of-hours care and intensive therapy were similar between waves. It is possible that the reintegration of routine care alongside COVID-19 specialised care in Wave 2 introduced limited capacity in hospital systems and promoted less coordinated multidisciplinary care.

There are several strengths to our case study. Our use of routine healthcare data, including fine-grained health contacts and associated in-hospital activities, can be used to define novel quantifiable measures of differential complexity in treatment between two cohorts. We showcased that PM can enhance data-driven analytics over comparative studies through cross-log similarity measures between care pathways. Visual BPMN diagrams supported these estimates by showcasing specific changes in healthcare patterns on a fine-grained level. We selected the most suitable process model to perform process discovery through standardised conformance checking measures. The COVID-19 pandemic initiated a period of exceptional hospital pressures, resulting in novel shifts in treatment pathways as part of infection control measures. However, there is still limited understanding towards how well hospital systems adapted to these changes and how this affected acutely ill patients. Thus, we provided a detailed investigation of care coordination during the pandemic and its impact on care complexity in adverse groups.

Wave 2 could have been more negatively impacted than Wave 1 regarding care provision for several reasons. More patients may have been placed on waiting lists due to a lack of capacity or were unsuitable for early rehabilitation treatment. Deconditioning due to the lockdown measures could have also limited service engagement [21, 22]. This may also be linked to increased vulnerability at admission. Regardless, evaluating these patient states without analysing underlying patient frailty markers is difficult. On the other hand, it is also possible that COVID-19 impacted patient function less significantly during Wave 2, and patients did not require extensive rehabilitation and care services. Care variation increased substantially in the older population, which may indicate a shift in prioritisation in health services. Vaccination also likely contributed to the relatively milder presentations in Wave 2 [23, 24]. However, this data was not routinely available for analysis in this study.

We additionally observed some potential issues within the BPMN diagrams, highlighting the limitations of the process model (Figs. 3 and 4). Firstly, the relaxed noise constraint allowed self-loops for many activities, even though these were excluded in the data preprocessing step. While the noise threshold \(\theta\) reduces model complexity and limits infrequent behaviour to discover more refined patterns, it can introduce self-loops that deviate from the event log in the current setup. For example, periodic nurse monitoring combined with physiotherapist checkups are common occurrences in practice. Limiting behaviour across each cut operation through the IMi model can indirectly result in self-loops. Wave 2 did not highlight many general aspects of multidisciplinary care typically present during a hospitalisation episode. It was surprising to see many captured events happening independently. Ultimately, we can also attribute some of the inconclusive evidence to the underlying limitations of the extracted EHR data. As patients in Wave 2 were sampled over a longer period, this could partially be attributed to an imbalance in the variation of care sequences between the two cohorts. The range of activities and care providers in the dataset might not be sufficient to derive clinical insights. In hospital services, many other events linked to categories such as laboratory testing, prescribing, bed allocation, and surgery can considerably impact the recovery trajectory.

Process mining continues to be a rapidly evolving field, and its applications on EHR data can significantly aid clinicians and stakeholders in understanding what drives changes in treatment trajectories. Process models, such as IMi, can simplify and deconstruct sequences of care patterns, identifying important interactions between disciplines or lack thereof. We can measure adherence across subgroups, treatment levels, and cohorts through conformance checking and graph similarity measures. In clinical practice, this can pinpoint bottlenecks in complex cases over small sample sizes. PM approaches can also be fine-tuned to consider existing insights from the hospital practice, incorporating them as rules before process discovery.

This case study can serve as a starting point for future research incorporating PM and clinical oversight to generate interpretable clinical pathways across many clinical domains. Operational adherence can be defined in many ways using various quantitative measurements. The appropriate metric is challenging to define, especially in episodes of care with high variability. Therefore, using human judgement might be beneficial, especially where contradictions exist between different quality measures. Designing input rules in the COVID-19 domain is challenging. However, through more well-defined pathways in other long-term conditions (e.g. hip fracture and stroke care), we can look at adherence in subgroups and evaluate COVID-19 status as a confounding variable. Future data studies could expand this set of contact events to incorporate other routine data sources such as ward and speciality movements, blood testing, and comparisons with non-COVID populations. In this study, we focused on hospital survivors, but a statistical approach to estimating competing risks between in-hospital death and resource complexity (e.g. a modified threshold of the GED) could add to understanding across all patients. An updated cohort including events from later waves, vaccination status, and identification of COVID-19 variants could also help provide more robust interpretations of these data.

5 Related Work

The use of EHR-based analytics has been prominent in patients with COVID-19. Previous large-scale studies have used routine data to observe disease trajectories, treatment progression, and clinical outcomes [25]. Recent studies have also identified a “long-COVID” population (patients with persisting post-infection symptoms) in Scotland, using linked primary and secondary care data [26]. Previous work has also compared Wave 1 and 2 outcomes in older adults, highlighting lower frailty levels and improved mortality outcomes later in the pandemic [27]. Despite this body of research, patterns of contact with health and care providers are rarely observed due to the small amount of timestamped data recording inpatient activity. Outside of COVID-19, previous studies have collected basic nursing care activities in hospitalised patients [28]. In rehabilitation medicine, studies have retrospectively collected continuous intensity measures, such as the frequency and duration of physiotherapy visits, correlating them to mobility and the likelihood of home discharge [28,29,30,31]. Although these measures can serve as proxies for care complexity, it is well-known that most patients with high rehabilitation needs require input from more than one specialist (“multidisciplinary care”), which is not captured in these studies.

In the context of COVID-19 care pathways, PM studies have adapted DFGs in combination with traditional statistical analyses to model disease progression in patients, including events associated with acute rehabilitation care [32]. However, in this instance, there are no strict rules regarding the timing or visit sequences between different care providers, so the use of weighted DFGs would be less meaningful for clinical interpretation. Additionally, filters used to simplify DFGs are known to result in misleading interpretations that deviate from the real output of the process model [33]. Instead, variants of Inductive Miner, which deals with infrequent behaviour in multiple stages of process discovery, have been the preferred method for subpopulation analysis [33,34,35]. The ability to capture more complex relationships in timestamped data throughout the stages of cut detection, log splitting, and identifying base cases ensures a good balance between model precision and log fitness. Additionally, the IMi algorithm allows for robust filtering of infrequent behaviour across these operations to increase computation speed while reducing model complexity and the risk of overestimating process deviations [36]. HM is another alternative adapted to observe hierarchical patterns and underlying causal dependencies between nodes with comparable performance [37]. Other approaches in handling comparisons between trace executions using alignment-based techniques, [38] rather than token-based replay, have generally been more prominent in practice (resulting in more efficient classification of transitions through “move” events). However, based on our experience, these were computationally challenging to perform over a large population of events with high trace variation and many subgroups.

A similar approach to our subgroup analysis was also applied in a cancer care population by Marazza et al. using an Inductive Miner variant [39]. They used the MIMIC-III dataset to discover variations in treatment pathways across patient subgroups. We provided additional measurements of the coverage rates (estimated as the percentage of fitting traces between groups 1 and 2) to distinguish how well pathways align between complex and less severe cases (Appendix B.4). GED has been previously used in several domains, including information retrieval and machine learning, for pattern recognition to merge similar embeddings or measure the classification accuracy in documents [40, 41]. To our knowledge, this is the first study that provides a comprehensive and validated PM approach on EHR data to compare multidisciplinary care patterns in patients with COVID-19, using multiple levels of granularity.

6 Conclusion

Health and care pathways in the hospital are complex for many patients, including those recovering from acute COVID-19 infection. Process mining can aid in providing visual interpretations of treatment progression in these patients and help identify changes in patterns of care. Our approach uses novel routine data from the EHR to gather care provider inputs and linked activities describing the delivered interventions. Using these data, we described the process flows and underlying patterns of care, highlighting some of the distinctions in activity trajectory between Waves 1 and 2 of the pandemic. We applied token-based replay and graph similarity measures to provide an extensive evaluation of the differences in impact on adverse and non-adverse subgroups. This approach can drive further development of novel process-driven applications for understanding treatment policy changes and provide actionable evidence for reducing risks of disruption in healthcare services.

Data Availability

The data that support the findings of this study are not openly available due to reasons of sensitivity, but summary data and visual diagrams can be provided by the corresponding author upon reasonable request. Data and models are located in a Secure Data Environment provided by the DataLoch service.

References

Joy M et al (2020) Reorganisation of primary care for older adults during COVID-19: a cross-sectional database study in the UK. Br J Gen Pract 70(697):e540–e547. https://doi.org/10.3399/bjgp20X710933
Article MATH Google Scholar
Zhou F et al (2020) Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 395(10229):1054–1062. https://doi.org/10.1016/S0140-6736(20)30566-3
Article MATH Google Scholar
Fluck D et al (2022) Comparison of characteristics and outcomes of patients admitted to hospital with COVID-19 during wave 1 and wave 2 of the current pandemic. Intern Emerg Med 17(3):675–684. https://doi.org/10.1007/s11739-021-02842-5
Article MATH Google Scholar
Weblin J, Harriman A, Butler K, Snelson C, McWilliams D (2023) Comparing rehabilitation outcomes for patients admitted to the intensive care unit with COVID-19 requiring mechanical ventilation during the first two waves of the pandemic: a service evaluation. Intensive Crit Care Nurs 75:103370. https://doi.org/10.1016/j.iccn.2022.103370
Article Google Scholar
Mehta N, Shukla S (2022) Pandemic analytics: how countries are leveraging big data analytics and artificial intelligence to fight COVID-19? SN Comput Sci 3(1):54. https://doi.org/10.1007/s42979-021-00923-y
Article MATH Google Scholar
Van Der Aalst W (2016) Process mining: data science in action, vol 2. Springer, Berlin, Heidelberg
Book MATH Google Scholar
Van Der Aalst W (2012) Process mining. Commun ACM 55(8):76–83. https://doi.org/10.1145/2240236.2240257
Article MATH Google Scholar
Scheuerlein H et al (2012) New methods for clinical pathways—Business Process Modeling Notation (BPMN) and Tangible Business Process Modeling (t.BPM). Langenbecks Arch Surg 397(5):755–761. https://doi.org/10.1007/s00423-012-0914-z
Article Google Scholar
Munoz-Gama J et al (2022) Process mining for healthcare: characteristics and challenges. J Biomed Inform 127:103994. https://doi.org/10.1016/j.jbi.2022.103994
Article MATH Google Scholar
“SIMD (Scottish Index of Multiple Deprivation),” Scottish Index of Multiple Deprivation. [Online]. Available: http://simd.scot/. Accessed 20 Feb 2023
Kuan V et al (2019) A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. Lancet Digit Health 1(2):e63–e77. https://doi.org/10.1016/S2589-7500(19)30012-3
Article MATH Google Scholar
Rozinat A, van der Aalst WMP (2008) Conformance checking of processes based on monitoring real behavior. Inf Syst 33(1):64–95. https://doi.org/10.1016/j.is.2007.07.001
Article MATH Google Scholar
Muñoz-Gama J, Carmona J (2010) A fresh look at precision in process conformance,” In: Hull R, Mendling J, Tai S (eds) Business Process Management, in Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, pp. 211–226. https://doi.org/10.1007/978-3-642-15618-2_16
Buijs JCaM, van Dongen BF, van der Aalst WMP (2014) Quality dimensions in process discovery: the importance of fitness, precision, generalization and simplicity. Int J Coop Info Syst 23(01):1440001. https://doi.org/10.1142/S0218843014400012
Article MATH Google Scholar
A. Berti and W. M. P. van der Aalst (2021) A novel token-based replay technique to speed up conformance checking and process enhancement. In: Koutny M, Kordon F, Pomello L (eds) Transactions on Petri Nets and Other Models of Concurrency XV. in Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, pp. 1–26. https://doi.org/10.1007/978-3-662-63079-2_1
Sanfeliu A, Fu K-S (1983) A distance measure between attributed relational graphs for pattern recognition. IEEE Trans Syst Man Cybern SMC-13(3):353–362. https://doi.org/10.1109/TSMC.1983.6313167
Article MATH Google Scholar
Janssenswillen G, Depaire B, Swennen M, Jans M, Vanhoof K (2019) bupaR: Enabling reproducible business process analysis. Knowl Based Syst 163. https://doi.org/10.1016/j.knosys.2018.10.018
“Interface to the PM4py Process Mining Library.” [Online]. Available: https://bupaverse.github.io/pm4py/. Accessed 25 Nov 2023
“ProM 6.13 – ProM Tools.” [Online]. Available: https://promtools.org/prom-6-13/. Accessed 25 Nov 2023
Grecco BH, Araújo-Rossi PAO, Nicoletti CF (2022) Nutritional therapy for hospitalised patients with COVID-19: a narrative and integrative review. J Intensive Med 2(4):249–256. https://doi.org/10.1016/j.jointm.2022.04.004
Article Google Scholar
Jahn K et al (2022) Exercise capacity impairment after COVID-19 pneumonia is mainly caused by deconditioning. Eur Respir J 59(1). https://doi.org/10.1183/13993003.01136-2021
Rinaldo RF et al (2021) Deconditioning as main mechanism of impaired exercise response in COVID-19 survivors. Eur Respir J 58(2). https://doi.org/10.1183/13993003.00870-2021
Tran V-T, Perrodeau E, Saldanha J, Pane I, Ravaud P (2023) Efficacy of first dose of covid-19 vaccine versus no vaccination on symptoms of patients with long COVID: target trial emulation based on ComPaRe e-cohort. BMJ Med 2(1). https://doi.org/10.1136/bmjmed-2022-000229
Notarte KI et al (2022) Impact of COVID-19 vaccination on the risk of developing long-COVID and on existing long-COVID symptoms: a systematic review. eClinicalMedicine 53. https://doi.org/10.1016/j.eclinm.2022.101624
Thygesen JH et al (2022) COVID-19 trajectories among 57 million adults in England: a cohort study using electronic health records. Lancet Digital Health 4(7):e542–e557. https://doi.org/10.1016/S2589-7500(22)00091-7
Article MATH Google Scholar
Jeffrey K et al (2023) Identifying long COVID using electronic health records: a national observational cohort study in Scotland. Rochester, NY: 4376992. https://doi.org/10.2139/ssrn.4376992
Verduri A et al (2022) Comparison between first and second wave of COVID-19 outbreak in older people: the COPE multicentre European observational cohort study. Eur J Pub Health 32(5):807–812. https://doi.org/10.1093/eurpub/ckac108
Article Google Scholar
Englebright J, Aldrich K, Taylor CR (2014) Defining and incorporating basic nursing care actions into the electronic health record. J Nurs Scholarsh 46(1):50–57. https://doi.org/10.1111/jnu.12057
Article Google Scholar
Johnson JK, Lapin B, Green K, Stilphen M (2021) Frequency of physical therapist intervention is associated with mobility status and disposition at hospital discharge for patients with COVID-19. Phys Ther 101(1):181. https://doi.org/10.1093/ptj/pzaa181
Article Google Scholar
Rich J, Coman M, Sharkey A, Church D, Pawson J, Thomas A (2022) A single center observational study of the incidence, frequency and timing of critical care physiotherapy intervention during the COVID-19 pandemic. J Intensive Care Soc 23(3):273–280. https://doi.org/10.1177/1751143721991060
Article Google Scholar
Rosen K, Patel M, Lawrence C, Mooney B (2020) Delivering telerehabilitation to COVID-19 inpatients:a retrospective chart review suggests it is a viable option. HSS J 16(Suppl 1):64–70. https://doi.org/10.1007/s11420-020-09774-4
Article Google Scholar
Dagliati A et al (2022) A process mining pipeline to characterise COVID-19 patients’ trajectories and identify relevant temporal phenotypes from EHR data. Front Public Health 10, [Online]. Available: https://doi.org/10.3389/fpubh.2022.815674 Accessed 25 Sept 2023
van der Aalst WMP (2019) A practitioner’s guide to process mining: limitations of the directly-follows graph. Procedia Comput Sci 164:321–328. https://doi.org/10.1016/j.procs.2019.12.189
Article MATH Google Scholar
Pradana MIA, Kurniati AP, Wisudiawan GAA (2022) Inductive miner implementation to improve healthcare efficiency on Indonesia national health insurance data. In: 2022 International Conference on Data Science and Its Applications (ICoDSA), Bandung, Indonesia: IEEE, 239–244. https://doi.org/10.1109/ICoDSA55874.2022.9862837
Rademaker FM, Bemthuis RH, Jayasinghe J, Bukhsh FA (2024) Analyzing sepsis treatment variations in subpopulations with process mining. In: 26th International Conference on Enterprise Information Systems, ICEIS 2024 (pp. 85–94). SCITEPRESS. https://doi.org/10.5220/0012600700003690
Leemans SJJ, Fahland D, Van Der Aalst WMP (2014) Discovering block-structured process models from event logs containing infrequent behavior. In: Lohmann N, Song M, Wohed P (eds). Business Process Management Workshops, in Lecture Notes in Business Information Processing, Cham: Springer International Publishing, vol. 171, pp. 66–78. https://doi.org/10.1007/978-3-319-06257-0_6
Tavakoli-Zaniani M, Gholamian MR, Hashemi-Golpayegani SA (2022) Improving heuristics miners for healthcare applications by discovering optimal dependency graphs. J Supercomput 78(18):19628–19661. https://doi.org/10.1007/s11227-022-04637-7
Article Google Scholar
de Leoni M, van der Aalst WMP (2013) Aligning event logs and process models for multi-perspective conformance checking: an approach based on integer linear programming. In: Daniel F, Wang J, Weber B (eds). Business Process Management, in Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, pp. 113–129. https://doi.org/10.1007/978-3-642-40176-3_10
Marazza F et al (2020) Automatic process comparison for subpopulations: application in cancer care. Int J Environ Res Public Health 17(16):5707. https://doi.org/10.3390/ijerph17165707
Article MATH Google Scholar
Wang P, Eglin V, Garcia C, Largeron C, Lladós J, Fornés A (2014) A coarse-to-fine word spotting approach for historical handwritten documents based on graph embedding and graph edit distance. In: 2014 22nd International Conference on Pattern Recognition. pp. 3074–3079. https://doi.org/10.1109/ICPR.2014.530
Stauffer M, Fischer A, Riesen K (2016) A novel graph database for handwritten word images. In: Robles-Kelly A, Loog M, Biggio B, Escolano F, Wilson R (eds) Structural, Syntactic, and Statistical Pattern Recognition, vol. 10029, in Lecture Notes in Computer Science, vol. 10029. Cham: Springer International Publishing, pp. 553–563. https://doi.org/10.1007/978-3-319-49055-7_49

Download references

Acknowledgements

For the purpose of open access, the author has applied a CC-BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Funding

KG is supported by a PhD Fellowship award from the Sir Jules Thorn Charitable Trust (21/01PhD) as part of the University of Edinburgh’s Precision Medicine PhD programme. This work was supported by DataLoch (dataloch.org), which is core-funded by the Data-Driven Innovation programme within the Edinburgh and South East Scotland City Region Deal (ddi.ac.uk), and the Chief Scientist Office, Scottish Government (cso.scot.nhs.uk).

Author information

Authors and Affiliations

BHF Centre for Cardiovascular Science, Chancellor’s Building, University of Edinburgh, Edinburgh, EH16 4TJ, UK
Konstantin Georgiev & Atul Anand
Artificial Intelligence and Its Applications Institute, School of Informatics, University of Edinburgh, Edinburgh, EH8 9BT, UK
Jacques D. Fleuriot
Independent Researcher, Edinburgh, UK
Petros Papapanagiotou
The Healthcare Improvement Studies Institute, Department of Public Health and Primary Care, University of Cambridge, Cambridge, CB1 8RN, UK
Joanne McPeake
Ageing and Health Research Group and Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, EH16 4UX, UK
Susan D. Shenkin

Authors

Konstantin Georgiev
View author publications
Search author on:PubMed Google Scholar
Jacques D. Fleuriot
View author publications
Search author on:PubMed Google Scholar
Petros Papapanagiotou
View author publications
Search author on:PubMed Google Scholar
Joanne McPeake
View author publications
Search author on:PubMed Google Scholar
Susan D. Shenkin
View author publications
Search author on:PubMed Google Scholar
Atul Anand
View author publications
Search author on:PubMed Google Scholar

Contributions

KG conceived the case study and its design. KG developed the COVID-19 cohort, standardised event logs, and the PM methodology. KG and AA had access to and verified the raw data, analysis tables, and visual diagrams. KG, AA, JF, and PP interpreted the data. JF and PP provided expert input on refining the methodology. KG drafted the manuscript. AA, JM, SS, and JF revised the manuscript critically for important intellectual content. JM and SS provided clinical support and feedback for this project. All authors provided their final approval of the version to be published. All authors are accountable for the work.

Corresponding author

Correspondence to Konstantin Georgiev.

Ethics declarations

Ethics Approval and Consent to Participate

The case study was performed with the approval of the local Research Ethics Committee and delegated Caldicott Guardian in accordance with the Declaration of Helsinki. All data was collected from EHRs and national registries previously anonymised by the DataLoch service (Edinburgh, UK) and analysed in a Secure Data Environment. Consent from patients was not required for this study.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 8262 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Georgiev, K., Fleuriot, J.D., Papapanagiotou, P. et al. Comparing Care Pathways Between COVID-19 Pandemic Waves Using Electronic Health Records: A Process Mining Case Study. J Healthc Inform Res 9, 41–66 (2025). https://doi.org/10.1007/s41666-024-00181-6

Download citation

Received: 04 February 2024
Revised: 04 October 2024
Accepted: 03 December 2024
Published: 23 December 2024
Issue Date: March 2025
DOI: https://doi.org/10.1007/s41666-024-00181-6

Keywords

Profiles

Konstantin Georgiev View author profile
Atul Anand View author profile

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Comparing Care Pathways Between COVID-19 Pandemic Waves Using Electronic Health Records: A Process Mining Case Study

Abstract

Similar content being viewed by others

Transitional Care in Times of COVID 19: Opportunities for e-Health Interventions

COVID-19 outbreaks surveillance through text mining applied to electronic health records

Bridging the Inferential Gaps in Healthcare

Explore related subjects

1 Introduction

2 Methods

2.1 Data Sources

2.2 Data Collection and Preprocessing

2.3 Process Mining Algorithms and Definitions

2.4 Conformance Checking for Performance and Subgroup Analysis

2.5 Implementation Tools and Software

2.6 Overview of the Approach

3 Results

3.1 Patient and Event Log Summary Data

3.2 Model Performance Comparison

3.3 BPMN Diagrams of the Discovered Process Models

3.4 Comparison Between Complex Patient Subgroups

4 Discussion

5 Related Work

6 Conclusion

Data Availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics Approval and Consent to Participate

Conflict of Interest

Additional information

Publisher's Note

Supplementary Information

Supplementary file1 (DOCX 8262 KB)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Profiles