
dHEALTH 2019 – FROM eHEALTH TO dHEALTH

Studies in Health Technology and Informatics
International health informatics is driven by developments in biomedical technologies and
medical informatics research that advance in parallel, forming one integrated world of
information and communication media and producing massive amounts of health data. These
components include genomics and precision medicine, machine learning, translational
informatics, intelligent systems for clinicians and patients, mobile health applications, data-
driven telecommunication and rehabilitative technology, sensors, intelligent home technology,
EHR and patient-controlled data, and Internet of Things.
The Studies in Health Technology and Informatics (HTI) series was started in 1990 in
collaboration with EU programmes that preceded Horizon 2020 to promote biomedical and
health informatics research. It has developed into a highly visible global platform for the
dissemination of original research in this field, containing more than 250 volumes of high-quality
works from all over the world.
The international Editorial Board selects publications with relevance and quality for the
field. All contributions to the volumes in the series are peer reviewed.
The HTI series is indexed by MEDLINE/PubMed; Web of Science: Conference
Proceedings Citation Index – Science (CPCI-S) and Book Citation Index – Science (BKCI-S);
Google Scholar; Scopus; EMCare.

Series Editors:
B. Blobel, O. Bodenreider, E. Borycki, M. Braunstein, C. Bühler, J.P. Christensen, R. Cooper,
R. Cornet, J. Dewen, O. Le Dour, P.C. Dykes, A. Famili, M. González-Sancho, E.J.S. Hovenga,
J.W. Jutai, Z. Kolitsi, C.U. Lehmann, J. Mantas, V. Maojo, A. Moen, J.F.M. Molenbroek,
G. de Moor, M.A. Musen, P.F. Niederer, C. Nøhr, A. Pedotti, N. Peek, O. Rienhoff, G. Riva,
W. Rouse, K. Saranto, M.J. Scherer, S. Schürer, E.R. Siegel, C. Safran, N. Sarkar,
T. Solomonides, E. Tam, J. Tenenbaum, B. Wiederhold, P. Wilson and L.H.W. van der Woude

Volume 260
Recently published in this series
Vol. 259. T. Bürkle, M. Lehmann, K. Denecke, M. Sariyar, S. Bignens, E. Zetz and J. Holm
(Eds.), Healthcare of the Future – Bridging the Information Gap – 5 April 2019,
Biel/Bienne, Switzerland
Vol. 258. A. Shabo (Shvo), I. Madsen, H.-U. Prokosch, K. Häyrinen, K.-H. Wolf, F. Martin-
Sanchez, M. Löbe and T.M. Deserno (Eds.), ICT for Health Science Research –
Proceedings of the EFMI 2019 Special Topic Conference
Vol. 257. F. Lau, J.A. Bartle-Clar, G. Bliss, E.M. Borycki, K.L. Courtney, A.M.-H. Kuo,
A. Kushniruk, H. Monkman and A.V. Roudsari (Eds.), Improving Usability, Safety
and Patient Outcomes with Health Information Technology – From Research to
Practice

ISSN 0926-9630 (print)
ISSN 1879-8365 (online)
dHealth 2019 –
From eHealth to dHealth
Proceedings of the 13th Health Informatics Meets Digital Health
Conference

Edited by
Dieter Hayn
Digital Health Information Systems, Center for Health & Bioresources, AIT
Austrian Institute of Technology GmbH, Graz, Austria

Alphons Eggerth
Digital Health Information Systems, Center for Health & Bioresources, AIT
Austrian Institute of Technology GmbH, Graz, Austria
and
Günter Schreier
Digital Health Information Systems, Center for Health & Bioresources, AIT
Austrian Institute of Technology GmbH, Graz, Austria

Amsterdam • Berlin • Washington, DC


© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.

This book is published online with Open Access and distributed under the terms of the Creative
Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).

ISBN 978-1-61499-970-6 (print)
ISBN 978-1-61499-971-3 (online)
Library of Congress Control Number: 2019941531

Publisher
IOS Press BV
Nieuwe Hemweg 6B
1013 BG Amsterdam
Netherlands
fax: +31 20 687 0019
e-mail: [email protected]

For book sales in the USA and Canada:


IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel.: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]

LEGAL NOTICE
The publisher is not responsible for the use which might be made of the following information.

PRINTED IN THE NETHERLANDS



Preface
Since its beginning in 2007, the dHealth conference series has been organized by the Austrian
Working Group of Health Informatics and eHealth. Each year, this event attracts around
300 participants from academia, industry, government and health care organizations.
In keeping with its interdisciplinary mission, the dHealth conference series provides
a platform for researchers, practitioners, decision makers and vendors to discuss
innovative health informatics and eHealth solutions to improve the quality and efficiency
of healthcare through digital technologies.
The special topic of dHealth 2019 was “from eHealth to dHealth”, stressing that
healthcare will be more and more data-driven in the future. While eHealth in general
concerns healthcare IT solutions at professional healthcare providers, dHealth addresses
broader fields of application in all areas of life, including sensors, networks, genomics
and bioinformatics, data centered solutions, machine learning, etc.
The present proceedings give insights into the state of the art of different aspects of
dHealth, including the design and evaluation of user interfaces, patient centered solutions,
electronic health/medical/patient records, machine learning in healthcare and biomedical
data analytics. These topics address the data path “from sensors to decisions”, providing
an interdisciplinary approach to digital health, including aspects of biomedical and
sensor informatics.

Dieter Hayn
Alphons Eggerth
Günter Schreier

Scientific Programme Committee


Univ.-Prof. Dr. Klaus-Peter Adlassnig, Medical University of Vienna, Austria
Prof. Dr. Josef Altmann, UAS Upper Austria, Wels, Austria
Univ.-Prof. Dr. Elske Ammenwerth, UMIT, Hall in Tirol, Austria
Univ.-Prof. Dr. Andrea Berghold, Medical University of Graz, Austria
Prof. Dr. Britta Böckmann, UAS Dortmund, Germany
Univ.-Prof. Dr. Ruth Breu, University of Innsbruck, Austria
Ass. Prof. Kim Delbaere, University of New South Wales, Sydney, Australia
Univ.-Prof. DDr. Wolfgang Dorda, Medical University of Vienna, Austria
Prof. Dr. Stephan Dreiseitl, UAS Hagenberg, Austria
Univ.-Prof. Dr. Georg Duftschmid, Medical University of Vienna, Austria
Univ.-Prof. Dr. Martin Dugas, University of Münster, Germany
Univ.-Prof. Dr. Walter Gall, Medical University of Vienna, Austria
Prof. José García Moros, Universidad de Zaragoza, Spain
Dr. Holger Gothe, UMIT, Hall in Tyrol, Austria
Prof. Dr. Martin Haag, Heilbronn University, Germany
Prof. Dr. Peter Haas, UAS Dortmund, Germany
Prof. Dr. Anke Häber, UAS Zwickau, Germany
Dr. Dieter Hayn, AIT, Graz, Austria
Dr. Mira Hercigonja-Szekeres, UAS, Zagreb, Croatia
ao. Univ.-Prof. Dr. Alexander Hörbst, UMIT, Hall in Tirol, Austria
Prof. Dr. Ursula Hübner, University of Osnabrück, Germany
Sergey Karas, Siberian State Medical University Tomsk, Russia
Prof. Dr. Guido Kempter, UAS Vorarlberg, Austria
Prof. Dr. Peter Klutke, University of Kempten, Germany
Prof. Dr. Werner Kurschl, UAS Hagenberg, Austria
Univ.-Prof. Dr. Richard Lenz, University of Erlangen-Nürnberg, Germany
Prof. Dr. Nicos Maglaveras, Aristotle University, Thessaloniki, Greece
Prof. Dr. Michael Marschollek, Medical University of Hannover, Germany
Prof. Dr. Christian Menard, UAS Carinthia, Austria
Prof. Dr. George Mihalas, University of Medicine and Pharmacy, Timisoara, Romania
Ass.-Prof. Dr. Ivana Ognjanovic, Donja Gorica University, Podgorica, Montenegro
Univ.-Prof. Dr. Karl-Peter Pfeiffer, FH Joanneum, Graz, Austria
Univ.-Prof. Dr. Ulrich Prokosch, University of Erlangen, Germany
Ass.-Prof. Stephen Redmond, University College Dublin, Ireland
Prof. Jean-Marie Rodriguez, French Institute of Health and Medical Research, France
FH-Prof. Dr. Stefan Sauermann, UAS Technikum Wien, Vienna, Austria
Prof. Dr. Paul Schmücker, University of Mannheim, Germany
Dr. Michael Shifrin, N.N. Burdenko Neurosurgical Institute, Moscow, Russia
Prof. Dr. Martin Stämmler, UAS Stralsund, Germany
Prof. Dr. Lăcrămioara Stoicu-Tivadar, Polytechnic University of Timisoara, Romania
Univ.-Prof. Dr. Zlatko Trajanoski, Medical University of Innsbruck, Austria
Prof. Elena Villalba, Universidad Politecnica de Madrid, Spain

Reviewers
We would like to thank all reviewers for their significant contributions to the
proceedings of dHealth 2019:

Josef Altmann
Elske Ammenwerth
Mihaela Crisan-Vida
Kerstin Denecke
Stephan Dreiseitl
Georg Duftschmid
Christo El Morr
Gerhard Fortwengel
Jan Gaebel
Walter Gall
José García Moros
Holger Gothe
Martin Haag
Werner O. Hackl
Dieter Hayn
Emmanuel Helm
Peter Kastner
Peter Klutke
Johannes Kriegel
Martin Kropf
Werner Kurschl
Thomas Lux
George Mihalas
Robert Mischak
Robert Modre-Osprian
Stefan Müller-Mielitz
Paul Panek
Stephen Redmond
Jean-Marie Rodrigues
Stefan Sabutsch
Stefan Sauermann
Guenter Schreier
Abbas Sheikhtaheri
Michael Shifrin
Stefan Skonetzki
Martin Staemmler
Lacramioara Stoicu-Tivadar
Sylvia Thun
Karsten Weber

Contents
Preface v
Dieter Hayn, Alphons Eggerth and Günter Schreier
Scientific Programme Committee vii
Reviewers viii

Creating Individualized Education Material for Diabetes Patients Using the eDiabetes Platform 1
Kerstin Denecke, Patrick Jolo, Burcu Sevinc and Stephan Nüssli
Towards Smart Adaptive Care Toilets 9
Peter Mayer, Florian Güldenpfennig and Paul Panek
Evaluation of Depth Cameras for Use as an Augmented Reality Emergency Ruler 17
Michael Schmucker, Christoph Igel and Martin Haag
Electronic Medical Records for Mental Disorders: What Data Elements Should
These Systems Contain? 25
Nasim Hashemi, Abbas Sheikhtaheri, Niyoosha-sadat Hashemi
and Reza Rawassizadeh
Exchanging Appointment Data Among Healthcare Institutions 33
Philip Kyburz, Sascha Gfeller, Thomas Bürkle and Kerstin Denecke
Application of Named Entity Recognition Methods to Extract Information from
Echocardiography Reports 41
Szabolcs Szekér, György Fogarassy, Károly Machalik
and Ágnes Vathy-Fogarassy
Health Care Atlases: Informing the General Public About the Situation of
the Austrian Health Care System 49
Claire Rippinger, Nadine Weibrecht, Melanie Zechmeister, Sonja Scheffel,
Christoph Urach and Florian Endel
Improving the Prediction of Emergency Department Crowding: A Time Series
Analysis Including Road Traffic Flow 57
Jens Rauch, Ursula Hübner, Mathias Denter and Birgit Babitsch
Information Adapted Machine Learning Models for Prediction in Clinical
Workflow 65
Stefanie Jauk, Diether Kramer, Franz Quehenberger,
Sai Pavan Kumar Veeranki, Dieter Hayn, Günter Schreier
and Werner Leodolter
Evaluation of Chatbot Prototypes for Taking the Virtual Patient's History 73
Andreas Reiswich and Martin Haag

Evaluation of Deep Clustering for Diarization of Aphasic Speech 81
Daniel Klischies, Christian Kohlschein, Cornelius J. Werner
and Stephan M. Jonas
Ensemble Based Approach for Time Series Classification in Metabolomics 89
Michael Netzer, Friedrich Hanser, Marc Breit, Klaus M. Weinberger,
Christian Baumgartner and Daniel Baumgarten
Achieving an Interoperable Data Format for Neurophysiology with DICOM
Waveforms 97
Silvia Winkler, Martin Huber and Tilmann Kluge
A Comprehensive FXR Signaling Atlas Derived from Pooled ChIP-seq Data 105
Emilian Jungwirth, Katrin Panzitt, Hanns-Ulrich Marschall,
Martin Wagner and Gerhard G. Thallinger
Robust Comparison of Simultaneous EEG Recordings Using Kalman Filters and
Gaussian Mixture Models 113
Niels von Stein, Jonas Schulte-Coerne, Stephan M. Jonas
and Ekaterina Kutafina
Development of a National Roadmap for Electronic Prescribing Implementation 121
Hamidreza Dehghan, Saeid Eslami, Seyed Hadi Ghasemi,
Mohammad Jahangiri, Kambiz Bahaadinbeigy, Khalil Kimiafar,
Mahdi Aghabagheri, Seyedeh Mahdieh Namayandeh and Mahdi Sargolzaei
Design and Evaluation of a Smart Medication Recommendation System for
the Electronic Prescription 128
Seyed Hadi Ghasemi, Kobra Etminani, Hamidreza Dehghan,
Saeid Eslami, Mohammad Reza Hasibian, Hasan Vakili Arki,
Mohammad Reza Saberi, Mahdi Aghabagheri
and Seyedeh Mahdieh Namayandeh
Use of ICPC-2 – Current Status, Strengths and Weaknesses of the System 136
Karin Messer-Misak
Exploratory Analysis of Motion Tracking Data in the Rehabilitation Process of
Geriatric Trauma Patients 138
Amelie Altenbuchner, Sonja Haug and Karsten Weber
Requirements for a Telemedicine Center to Monitor LVAD Patients 146
Nils Reiss, Kirby Kristin Wegner, Jan-Dirk Hoffmann,
Sebastian Schulte Eistrup, Udo Boeken, Michiel Morshuis
and Thomas Schmidt
Empowering Diabetes Patients with Interventions Based on Behaviour Change
Techniques 154
Oliver Jung, Dietmar Glachs, Felix Strohmeier, Robert Mulrenin,
Sasja Huisman, Ian Smith, Hilde van Keulen, Jacob Sont
and Manuela Ploessnig
Topics for Continuous Education in Nursing Informatics: Results of a Survey
Among 280 Austrian Nurses 162
Elske Ammenwerth and Werner O. Hackl

Pre-Navigation via Interactive Audio Tactile Maps to Promote the Wellbeing of Visually Impaired People 170
Mark Scase, Ed Griffin and Lorenzo Picinali
Socially Assistive Robots (SAR) in In-Patient Care for the Elderly 178
Johannes Kriegel, Victoria Grabner, Linda Tuttle-Weidinger
and Irmtraud Ehrenmüller
Is Regular Re-Training of a Predictive Delirium Model Necessary After
Deployment in Routine Care? 186
Sai Pavan Kumar Veeranki, Diether Kramer, Dieter Hayn, Stefanie Jauk,
Alphons Eggerth, Franz Quehenberger, Werner Leodolter
and Günter Schreier
Photographic LVAD Driveline Wound Infection Recognition Using Deep
Learning 192
Noël Lüneburg, Nils Reiss, Christina Feldmann, Pim van der Meulen,
Michiel van de Steeg, Thomas Schmidt, Regina Wendl and Sybren Jansen
Improving Information About Private Consultants Through Data Linkage 200
Melanie Zechmeister and Florian Endel
Performance of Hospitals in Protecting the Confidentiality and Information
Security of Patients in Health Information Departments 202
Abbas Sheikhtaheri, Nasim Hashemi and Niyoosha-sadat Hashemi
Patient Record Linkage for Data Quality Assessment Based on Time Series
Matching 210
Alphons Eggerth, Dieter Hayn, Karl Kreiner, Sai Veeranki,
Heimo Traninger, Robert Modre-Osprian and Günter Schreier
eHealth Service for Integrated Care and Outpatient Rehabilitation – Pilot
Application of the Tyrol Stroke Pathway 218
Kristina Reiter, Julia Runge, Stefan Welte, Theresa Geley,
Clemens Rissbacher and Peter Kastner
Expressing Patient Selection Criteria Based on HL7 V3 Templates Within
the Open-Source Tool ART-DECOR 226
Simon Ott, Christoph Rinner and Georg Duftschmid
Predictive Modelling and Its Visualization for Telehealth Data – Concept and
Implementation of an Interactive Viewer 234
Michael Sams, Alphons Eggerth, Dieter Hayn, Sai Veeranki
and Günter Schreier

Subject Index 243
Author Index 245
dHealth 2019 – From eHealth to dHealth 1
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-1

Creating Individualized Education Material for Diabetes Patients Using the eDiabetes Platform
Kerstin DENECKE a,1, Patrick JOLO a, Burcu SEVINC a and Stephan NÜSSLI a
a Bern University of Applied Sciences, Biel, Switzerland

Abstract. Diabetes mellitus (DM) is a chronic disease that affects many people in
Switzerland and around the world. Once diagnosed, a patient has to continuously
monitor blood glucose, manage medications or inject insulin. Technical skills and
competencies as well as knowledge on disease management have to be acquired
right after being diagnosed. Diabetes consultants support patients in this process and
provide educational material. While the process of generating patient-tailored
material is currently complex and time consuming, in future, the eDiabetes platform
can help. The platform developed in cooperation with the consulting section of the
Swiss Diabetes Society offers the opportunity to create individual patient
information and instructions to teach technical skills and knowledge on diabetes.
Further, an integrated forum allows exchanging information and discussing issues
regarding diabetes counselling on a secure platform. Usability tests showed that
eDiabetes is easy to use and provides benefits for diabetes consultants and patients.

Keywords. diabetes mellitus, patient education, information system, information provision

1. Introduction

Diabetes mellitus (DM) is a chronic disease that affects many people in Switzerland and
around the world. DM causes blood glucose levels to increase (hyperglycemia). A recent
forecast of the International Diabetes Federation predicts that in 2045 more than 625
million people worldwide will suffer from diabetes [1]. There is even an increasing
prevalence of type 2 DM in children and adolescents around the world [2]. Reasons for
the overall increase are an ageing society, obesity and lack of physical activity in people
[2]. As in other diseases, education plays a key role in the treatment of DM and in the
management of the condition by the patients themselves. The treatment success relies
heavily on patient accountability and awareness of the restrictions imposed by the
condition, in addition to the need for patients to manage their glucose levels. To avoid
complications and comorbidities, it is crucial that the newly diagnosed patient learns
about the disease, about the events that can occur and the therapeutic management of the
disease. There are mobile applications available, such as “mySugr”, that aim at supporting
the individual disease management, but support for diabetes consultants in creating
individualized education material is still missing.

1 Corresponding Author: Kerstin Denecke, Bern University of Applied Sciences, Quellgasse 21, 2501 Biel, Switzerland, E-Mail: [email protected]

Patient education for diabetes aims at equipping a patient with necessary information to
support self-management and avoid complications. Right after being diagnosed with
diabetes, patients have to adapt their
lifestyle and change habits. They have to absorb a lot of information in a short period of
time. In this situation, they are supported by specialists, referred to as diabetes
consultants. A diabetes consultant provides information on devices such as blood glucose
monitors, insulin pens and lancing devices, on the disease and complications. The
amount of information is often overwhelming for patients and the information is not
always presented in an adequate form regarding language and comprehensibility. To
support this process and to ensure that the patient is remembering all the information, the
diabetes consultant provides general leaflets to the patient. Different health literacies and
background knowledge, cultural contexts and languages complicate the information
provision: Often, explanations are necessary since standard material is too complex or
not written in the patient’s language, making it difficult for patients to follow the
instructions on the leaflets and instructions. On the other hand, the provision and
compilation of individualized information materials is time-consuming and cumbersome
for the consultants. Studies found that inadequate health literacy might contribute to
the disproportionate burden of diabetes-related problems among disadvantaged
populations [3]. Additional interventions are required to improve diabetes outcomes
among patients with inadequate health literacy.
To address this challenge, we developed the eDiabetes platform in cooperation with
the consulting section of the Swiss Diabetes Society to support diabetes consultants in
generating individualized information for diabetic patients to be used in everyday
diabetes counselling. Using this platform, patient-related supplements, fact sheets and
instructions can be enriched with explanations in the patient’s wordings. Studies showed
that active involvement of patients into the treatment strongly impacts their behavior and
the course of disease. For example, Rachmani et al. found in a clinical study that
“well-informed and motivated patients are more insistent to reach and maintain target
values of the main risk factors of diabetic complications” [4]. With the help of eDiabetes,
diabetes consultants can quickly compile individualized information materials for a
patient. This allows professionals to consider prior knowledge or living conditions of the
patient. In the following, we summarize the related work and describe our
process of requirements collection. Then, we introduce eDiabetes, a web-based platform
for individualizing education material, and present usability test results.

2. Related Work

Several educational interventions have been tested in patients with DM. Nevertheless, a
universally effective model for patients is still unavailable [5]. Health education is
recognized as an effective self-management capacity building tool, in which patients are
empowered to play an active role in the management of their conditions. The main four
pillars for health education are: 1) empowering individuals, 2) leadership, 3) motivation
and 4) education and information [6]. All type 1 and 20-30% of type 2 diabetes patients
require insulin via daily subcutaneous injections. The technique is not complex, but many
patients tend to forget and inject insulin incorrectly [7]. Education demands a lot from
health care providers and includes specific training, teaching skills and motivation of
patients [7]. Initiating education for patients newly diagnosed with diabetes and providing
information on self-management skills to help ensure safe post-discharge care are some
of the suggested strategies for DM patient education [8].

To support health professionals, national diabetes associations like the consulting
section of the Swiss Diabetes Society provide education material. User manuals for
glucose meters and insulin pens are offered by the manufacturer. Such general
information material often does not address the individual competencies of patients.
Studies showed that web-based care management improved glucose control in patients
with poorly controlled diabetes [9]. A study showed that both counseling and web-based
diabetes patient education improve patient outcomes; however, counseling was more
effective than a web-based education strategy [10].
Several mobile applications for diabetes self-management are available on the
market specifically designed for patients. They support documentation of blood glucose
values, of meals and weights or they support calculation of carbohydrates [11]. A review
by Chomutare et al. confirmed that personalized education is an underrepresented feature
in diabetes mobile applications [11]. They found that the four most prevalent features
of the applications available on the online market are 1) insulin and medication recording,
2) data export and communication, 3) diet recording, and 4) weight management. Recent
work confirms that mobile apps for diabetes management focus on reporting and setting
reminders, rather than providing personalized education or therapeutic support [12]. This
indicates that there is still potential in offering solutions for patient-tailored education in
diabetes counselling. In this paper, we present our concept and the web-based platform
eDiabetes to support diabetes consultants.

3. Material and Methods

In order to analyze the situation in diabetes counselling, we conducted interviews
with diabetes consultants and representatives of the Swiss Diabetes Society. Through a
literature and web search, we identified limitations of existing tools and approaches and
derived a concept for the eDiabetes platform. The system was developed in an iterative
process: Feedback on the prototype was continuously collected from experts. Our
platform aims at supporting the diabetes consultants. However, to develop a system that
supports them well, it is essential to know which information needs patients have. For
this reason, a survey with diabetes patients was conducted to
analyze the experiences of patients with user manuals of the various devices they have
to handle, such as blood glucose monitors, lancing devices and insulin pens. The results
and feedback of the patients were considered in the implementation of the tool. Eighteen
questions were prepared for the patient survey. The open and closed questions aimed
to assess the usefulness and comprehensibility of user manuals, difficulties with
domain-specific terms and challenges in understanding user manuals. In order to reach a
large audience, the Swiss Diabetes Society published the survey on their homepage.
Furthermore, the participants were confronted with leaflets generated using the
eDiabetes platform and feedback was collected.
In addition, a usability study was performed with diabetes consultants to study the
user-friendliness of our eDiabetes platform and to identify problems. The results from
this survey were prioritized and used for adapting the implementation. The usability
questionnaire was developed based on ISO 9241-10 [13]. The 27 questions concerned
comprehensibility of the pages and the navigation within the platform. In addition, the
participants were asked for their personal opinion on the platform and for suggestions
for improvement. The five test persons of the usability study are members of the
management board of the consulting section of the Swiss Diabetes Society or diabetes
consultants from a local hospital. Given the restricted time for the project (February 2018
to June 2018), only a test with a limited number of participants could be realized.
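As a sketch of how responses to such an ISO 9241-10 based questionnaire could be analyzed, the following Python fragment averages ratings per dialogue principle. The dimension names follow the standard; the rating scale and the response data are invented for illustration and are not the study's actual results.

```python
# Illustrative analysis of a usability questionnaire structured along the
# seven dialogue principles of ISO 9241-10. The response values below are
# hypothetical examples, not data from the eDiabetes study.
from statistics import mean

DIMENSIONS = [
    "suitability for the task",
    "self-descriptiveness",
    "controllability",
    "conformity with user expectations",
    "error tolerance",
    "suitability for individualisation",
    "suitability for learning",
]

# One dict per test person: dimension -> rating on a 1 (poor) to 5 (good) scale.
responses = [
    {d: 4 for d in DIMENSIONS},
    {d: 5 for d in DIMENSIONS},
    {d: 3 for d in DIMENSIONS},
]

def dimension_means(responses):
    """Average rating per ISO 9241-10 dimension across all participants."""
    return {d: mean(r[d] for r in responses) for d in DIMENSIONS}
```

With so few participants (five in the study), such averages can only indicate tendencies, which matches the authors' caveat about the limited test size.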
The eDiabetes platform integrates a community platform. To select an appropriate
platform, we performed a value benefit analysis for integrating a blog, a wiki or a forum.
We formulated 10 criteria for required functionalities as collected in the requirement
analysis. They include functionalities such as exchanging images and information,
searching for postings, creating postings, and exchanging private messages with colleagues.
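Conceptually, such a value benefit analysis is a weighted scoring of each candidate tool against the criteria. A minimal sketch follows; the criteria names, weights and fulfilment scores are hypothetical examples, not the project's actual values.

```python
# Illustrative value benefit analysis: score candidate community tools
# (blog, wiki, forum) against weighted criteria. All criteria, weights
# and scores here are invented for illustration.

CRITERIA_WEIGHTS = {
    "exchange images": 3,
    "search postings": 2,
    "create postings": 3,
    "private messages": 2,
}

# Fulfilment per tool on a 0-2 scale (0 = not supported, 2 = fully supported).
SCORES = {
    "blog":  {"exchange images": 2, "search postings": 1, "create postings": 1, "private messages": 0},
    "wiki":  {"exchange images": 2, "search postings": 2, "create postings": 1, "private messages": 0},
    "forum": {"exchange images": 2, "search postings": 2, "create postings": 2, "private messages": 2},
}

def value_benefit(tool_scores):
    """Weighted sum of criterion scores for one candidate tool."""
    return sum(CRITERIA_WEIGHTS[c] * s for c, s in tool_scores.items())

def best_tool(scores):
    """Return the candidate with the highest weighted score."""
    return max(scores, key=lambda t: value_benefit(scores[t]))

best = best_tool(SCORES)  # with these illustrative scores, the forum wins
```

With these example numbers the forum comes out on top, consistent with the selection reported in Section 4.2, but the actual decision rested on the project's own criteria and weights.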

4. Results

4.1. Requirements

4.1.1. Requirements of diabetes consultants


The consulting section of the Swiss Diabetes Society currently offers a multilingual
leaflet folder with educational material online. The members can print
the leaflets or use them as a reference book. This material should be made available
on the platform and adding individual comments should be enabled. More specifically,
a web-based platform is desired to support healthcare professionals during a consultation
with a patient allowing them to generate individualized patient education material.
Further, the system should provide an exchange opportunity in the form of a community
platform to enable diabetes consultants to exchange materials or latest information on
diabetes counselling. The requirements analysis revealed that the diabetes professionals
experience difficulties in teaching foreign-language patients about diabetes. The
existing leaflets are more text-based than image-based, which makes them hard to
understand for patients with another mother tongue. Accordingly, a system is required
that explains relevant information and processes using images and supports the
annotation of these images according to the patient's needs.

4.1.2. Information needs of patients


Six out of ten participants answered all questions and were included in the results. The
participants fall into the following age ranges: 30-39 (2 persons), 40-49 (1 person),
50-59 (2 persons), older than 60 (1 person). They originated from Turkey, Sri Lanka,
Germany and Switzerland. Five persons were diagnosed with type 2 diabetes and one
person with type 1 diabetes. All six persons were diagnosed with diabetes 3 years ago or
more.
For most of the participants, the user manuals of the blood glucose meters, lancing
devices and insulin pens are self-explanatory. The biggest difficulties with the
instructions for use arise in understanding the texts and learning the processes. Some
participants have problems because the texts are not written in their mother tongue,
which confirms the perception of the diabetes consultants (see section 4.1.1). In addition,
the pictures in the user manuals are not meaningful and it is difficult to understand the
process of operation only by reading texts. A major challenge in dealing with the disease
is the change of eating habits and the general adaptation of everyday life to the
requirements of the disease.
In order to get a better overview on the process of dealing with the different medical
devices needed for diabetes management, it would be helpful for the patients if the
individual steps were numbered. This would help them to have an accurate,
understandable manual when using blood glucose meters, insulin pens, and lancing
devices. It would also be helpful if the leaflets or instructions for use are more visual and
contain less text.

4.2. eDiabetes platform

To address the requirements of the diabetes consultants, we developed the eDiabetes
platform. It is expected to be used by a diabetes consultant while counselling a particular
patient. The platform provides two main functionalities: creating individual education
material and sharing it with a patient, and exchanging information with other diabetes
consultants through a forum (see Figures 1 and 2). These functionalities and their
realization within the platform are described in the following.
In collaboration with the patient, the education material is created dynamically using
eDiabetes. The platform enables a diabetes consultant to tailor leaflets and instructions
individually to a patient and his or her competencies, language and background.
Consultants add comments and explanations directly in the platform. This is realized by
text fields. In order to create explanations for the use of devices (glucose meter, etc.),
images that visualize the single steps can be arranged and text can be added (Fig. 3). As
a result, individualized leaflets or instructions that are better understandable for a patient
can be forwarded to the patient as PDF or printed. In case a patient is unsure about a
particular step in the instructions for a device in daily practice, he or she can always
refer to these sheets and follow the indicated steps. Content-wise the platform allows
creating individual explanations for symptoms of hypoglycemia, as well as describing
the steps to take in case of hypoglycemia and instructions on how to use the
devices.
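Conceptually, such an individualized leaflet is an ordered list of numbered steps, each pairing a manufacturer image with the consultant's patient-specific wording. The sketch below illustrates this data model; all class and field names are our own illustration, and the actual platform is implemented in PHP, so it may structure this differently.

```python
# Minimal sketch of an individualized-leaflet data model. Names and
# example content are illustrative assumptions, not the platform's code.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    number: int       # numbered steps, as requested by the surveyed patients
    image_file: str   # manufacturer image visualizing this step
    explanation: str  # consultant's wording tailored to the patient

@dataclass
class Leaflet:
    patient_language: str
    device: str
    steps: List[Step] = field(default_factory=list)

    def add_step(self, image_file: str, explanation: str) -> None:
        """Append a step; numbering follows the consultant's chosen order."""
        self.steps.append(Step(len(self.steps) + 1, image_file, explanation))

    def as_text(self) -> str:
        """Plain-text rendering; the real platform exports PDF or print."""
        lines = [f"{self.device} ({self.patient_language})"]
        lines += [f"{s.number}. {s.explanation} [{s.image_file}]" for s in self.steps]
        return "\n".join(lines)

# Example: a consultant arranges two steps for an insulin pen leaflet.
leaflet = Leaflet("de", "Insulin pen")
leaflet.add_step("pen_step1.png", "Remove the pen cap.")
leaflet.add_step("pen_step2.png", "Attach a new needle.")
```

Reordering the steps via drag and drop, as shown in Figure 3, then simply corresponds to renumbering the list before export.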

Figure 1. Use case diagram for eDiabetes.



Based on the results of the value benefit analysis, a forum was selected as the best
suited community tool to be integrated into eDiabetes. Thus, as a second functionality of
the eDiabetes platform, the open source forum software MyBB (https://mybb.com/) was
integrated. The forum allows experts to discuss with each other, exchange information,
discuss diabetes issues or ask for a second opinion. The forum is open to all members of
the Swiss Diabetes Society.

Figure 2. eDiabetes platform: For instructions on devices, original image material from the manufacturer is
required. This can be uploaded to a database that can be accessed by the eDiabetes platform.

The platform runs on a web server set up as a LAMP stack with Apache 2.4.18,
MySQL 5.7.2, PHP 7.0.22 and phpMyAdmin 4.5.4.1. The views of eDiabetes were
developed with JavaScript, CSS and PHP.
The underlying information material on devices originates from the manufacturers
of glucose meters and insulin pens. In the course of the project, 11 manufacturers were
contacted to obtain approval to use their original images in our prototype. An update of
the material is required when new devices are put on the market or existing devices are
modified. To realize this, our concept foresees that a manufacturer can upload images to
a database from which the eDiabetes platform collects the image files (Fig. 2).
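The update concept (manufacturers upload new image material, the platform always collects the current files) could be modeled, for example, as a small versioned store. The class below is a hypothetical sketch for illustration, not the actual database schema.

```python
from collections import defaultdict

class DeviceImageStore:
    """Hypothetical manufacturer-facing image store: uploads are keyed by
    device model, and the platform always fetches the newest version."""

    def __init__(self):
        self._versions = defaultdict(list)  # device model -> list of image sets

    def upload(self, device_model: str, image_files: list) -> int:
        """Called when a manufacturer uploads material for a new or changed device."""
        self._versions[device_model].append(list(image_files))
        return len(self._versions[device_model])  # version number of this upload

    def latest(self, device_model: str) -> list:
        """What the eDiabetes platform collects when building a leaflet."""
        versions = self._versions.get(device_model)
        if not versions:
            raise KeyError(f"no material for {device_model!r}")
        return versions[-1]
```

Keeping older versions around would also let already-created leaflets keep referencing the images they were built from.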

Figure 3. Screenshot of the drag and drop functionality to create a user manual specifically for a patient. The
images can be arranged in the order as needed by the patient.

4.3. Usability study

Usability in our context concerned, on the one hand, the interaction with the eDiabetes
platform and, on the other hand, the appropriateness of the individualized leaflets
generated with the platform. The feedback on the eDiabetes platform by the diabetes
consultants was very positive; they confirmed that the platform is simple, clear and easy
to use, and they felt comfortable interacting with it. However, two test persons
had difficulties interacting with the platform. Further assessment of this issue
showed that these two persons were older than the others and use the computer less
often during their work. Nevertheless, the survey showed that interacting with the
eDiabetes platform is easy to learn without external help or a user manual. Test persons
stated that they liked the platform because it provides many possibilities for extensions;
they even asked for more options to add individual descriptions.
Patients, on the other side, confirmed that all relevant information is available on the
leaflets. The personalized leaflets help them, since the image-based process descriptions
provide a clear step-by-step presentation of the instructions, and all relevant
information is summarized briefly and concisely. They stated clearly that the personalized
leaflets are better than the original instructions for use. In addition, the information sheets
are more understandable for foreign-language patients than text-based leaflets. However,
the patients still missed advice on what to pay attention to when using one of the devices.

5. Discussion and Conclusions

In this paper, we introduced a web-based platform for supporting diabetes consultants in
creating patient-individualized information material. With this platform, we provide a
solution for addressing treatment challenges caused by limited health literacy in patients.
Studies showed that low health literacy is associated with poorer diabetes outcomes [14].
Patients who cannot understand the instructions of glucose meters and insulin pens will
have problems in using the tools and thus will show more variability in their glucose levels.
Our system supports these cases and can help in teaching skills and formulating
instructions in the patient's wording together with the patient. Martin et al. showed evidence
that verbal communication between patients and clinicians results in better adherence
to the given instructions [15]. Therefore, the aim of our eDiabetes platform is to support
the verbal instruction by a diabetes consultant in an efficient and effective manner.
Gillani et al. showed in a randomized controlled trial that the provision of structured and
individualized information to people with diabetes positively influences the level of
patient activation, promotes better engagement and opens the potential to improve other
crucial diabetes outcomes [16]. Providing individualized and understandable
information at the beginning of the instruction process for newly diagnosed diabetes is
the main focus of our eDiabetes platform. Through our assessments, we gained first
hints that personalized leaflets and instructions can be useful for patients with diabetes.
According to patient opinions, material that is adapted to their individual life
circumstances will help them in better managing their diabetes. Additional studies are
necessary to prove the effectiveness of our approach for the diabetes consultants and the
positive effect on the patients' adherence to the consultants' instructions. It is also of
interest whether patients show better self-management competence of their diabetes.
Our platform in its current shape mainly provides support for diabetes consultants.
When we presented the platform at the national congress of the Swiss Diabetes Association,
the participants suggested adding functionalities for nutrition consultants in
the context of diabetes. The positive effects of patient education on disease outcomes
have already been shown [14].
In the future, contracts with manufacturers have to be established to ensure continuous
updates of the data material. The creation of individual leaflets and instructions might
also be of interest for other diseases that require the use of medical devices or need
instructions on how to deal with complications. The overall concept is transferable;
however, the content needs to be replaced.

References

[1] D. Schillinger, K. Grumbach, J. Piette, et al. Association of Health Literacy With Diabetes Outcomes.
JAMA. 2002;288(4):475–482. doi:10.1001/jama.288.4.475
[2] T. Chomutare, L. Fernandez-Luque, E. Årsand, G. Hartvigsen: Features of Mobile Diabetes Applications:
Review of the Literature and Analysis of Current Applications Compared Against Evidence-Based
Guidelines. J Med Internet Res. 2011 Jul-Sep; 13(3): e65.
[3] Thomas Reinehr: Type 2 diabetes mellitus in children and adolescents. World J Diabetes. 2013 Dec 15;
4(6): 270–281.
[4] N.H. Cho, J.E. Shaw, S. Karuranga, Y. Huang, J.D. da Rocha Fernandes, A.W. Ohlrogge, B. Malanda: IDF
Diabetes Atlas: Global estimates of diabetes prevalence for 2017 and projections for 2045. Diabetes Res
Clin Pract. 2018 Apr;138:271-281. doi: 10.1016/j.diabres.2018.02.023. Epub 2018 Feb 26.
[5] L. Haas, M. Maryniuk, J. Beck, C.E. Cox, P. Duker, L. Edwards, et al.; Standards Revision Task Force.
National standards for diabetes self-management education and support. Diabetes Care 2014;37:S144-53.
DOI: http://dx.doi.org/10.2337/dc14-S144
[6] R.C.C. Iquize, F.C.E.T. Theodoro, K.A. Carvalho, M.A. Oliveira, J.F. Barros, A.R.D. Silva: Educational
practices in diabetic patient and perspective of health professional: a systematic review. J Bras Nefrol.
2017 Apr-Jun;39(2):196-204. doi: 10.5935/0101-2800.20170034.
[7] A. Maldonato, D. Bloise, M. Ceci, E. Fraticelli, F. Fallucca. Diabetes mellitus: lessons from patient
education. Patient Educ Couns. 1995 Sep;26(1-3):57-66.
[8] A.T. Nettles: Patient education in the hospital. Diabetes Spectrum, 2005, 18(1), 44-48
[9] G.T. McMahon, H.E. Gomes, S. HicksonHohne, T.M. Hu, B.A. Levine, et al. Web-based care management
in patients with poorly controlled diabetes. Diabetes Care, 2005, 28: 1624-1629.
[10] F.A. Mersal, N.E. Mahday, N.A. Mersal. Efficiency of Web-Based Education versus Counselling on
Diabetic Patients' Outcomes. Lambert Academic Publishing, Saarbrücken, 2012
[11] R. Rachmani, Z. Levi, I. Slavachevski, M. Avin, M. Ravid. Teaching patients to monitor their risk factors
retards the progression of vascular complications in high-risk patients with Type 2 diabetes mellitus--a
randomized prospective study. Diabet Med. 2002;19(5):385–92
[12] S. Izahar, Q.Y. Lean, MA. Hameed, et al. Content Analysis of Mobile Health Applications on Diabetes
Mellitus. Front Endocrinol (Lausanne). 2017;8:318. Published 2017 Nov 27.
doi:10.3389/fendo.2017.00318
[13] J. Prümper, M. Anft: Die Evaluation von Software auf Grundlage des Entwurfs zur internationalen
Ergonomie-Norm ISO 9241 Teil 10 als Beitrag zur partizipativen Systemgestaltung - ein
Fallbeispiel. Software-Ergonomie '93, Stuttgart: Teubner, 1993
[14] J. Vandenbosch, S.V. den Broucke, L. Schinckus, et al. The impact of health literacy on diabetes self-
management education. Health Education Journal, 77(3), 2018, 349-362.
https://doi.org/10.1177/0017896917751554
[15] K. Martin, L. Carter, D. Balciunas, F. Sotoudeh, D. Moore, J. Westerfield. The impact of verbal
communication on physician prescribing patterns in hospitalized patients with diabetes. Diabetes Educ.
2003 Sep-Oct;29(5):827-36
[16] SM. Gillani, A. Nevill, BM. Singh. Provision of structured diabetes information encourages activation
amongst people with diabetes as measured by diabetes care process attainment: the WICKED Project.
Diabet Med. 2015 Jul;32(7):865-71
dHealth 2019 – From eHealth to dHealth 9
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-9

Towards Smart Adaptive Care Toilets


Peter MAYER a, Florian GÜLDENPFENNIG b and Paul PANEK a,1
a HCI Group, Institute of Visual Computing and Human-Centered Technology, TU Wien
(Vienna University of Technology), Vienna, Austria
b New Design University, St. Pölten, Austria

Abstract. Standard toilets in Western countries often do not meet the needs of
elderly and disabled people with physical limitations. While the existing concept
of barrier-free toilets and the emerging “changing places” concept offer more
space and support, the fixed height of the toilet seat still poses a major problem
during all phases of toilet use and can limit the users’ autonomy by requiring
personal assistance. Thus, in the EU project iToilet an innovative ICT-based
modular height adjustable toilet system was designed to support the autonomy,
dignity and safety of older people living at home by digital technology
enhancements adapting the toilet to their needs and preferences. The main
requirements were: double foldable handrails, height and tilt adjustment,
emergency detection and call, and ease of use. The ICT component in this
approach serves a double purpose of enhancing usability of the base assistive
technology while at the same time providing safety for independent use. A field
test of a prototype system in real environments of a day care center and a
rehabilitation clinic has been successfully finished. The application of the iToilet
concept in semi-public settings is currently being studied in the Toilet4me project.

Keywords. AAL, toileting, autonomy, care, smart toilet, robotic toilet, barrier-free
toilet

1. Introduction

Standard toilets in Western countries often do not meet the needs of elderly and
disabled people [1, 2, 3]. For individuals with physical disabilities, barrier-free
toilet concepts have therefore been introduced, which provide more room, e.g. for wheelchairs
or assistants, and a raised toilet seat of fixed height with grab bars for support during transfer
and when sitting. Recently, further improvements by a changing bench and a hoist have
been proposed by the "changing places" consortium [4], mainly in the UK. Still, such
concepts are difficult to implement in users' homes, and they do not solve one of the
main challenges, the fixed toilet height during all phases of toilet use, which might be
unsuitable and prevent fully autonomous toilet use or transfer. Therefore, in the EU
project iToilet, an innovative ICT-based modular height adjustable toilet system was
designed to support the autonomy, dignity and safety of older people living at home by
digital technology enhancements adapting the toilet to their needs and preferences.
A main motivation of Assistive Technology design has always been to support
autonomy – at least in the sense that there are alternatives between personal assistance
and technological support to choose from. Such independence can be accomplished by

1 Corresponding Author: Paul Panek, Institute of Visual Computing and Human-Centered Technology,
TU Wien, Favoritenstrasse 11/193-05b, A 1040 Vienna, Austria, E-Mail: [email protected]
10 P. Mayer et al. / Towards Smart Adaptive Care Toilets

appropriate tools, for example, by providing intelligent assistance based on digital
technologies in everyday life, as has been proposed for many years in the field of Ambient
Assisted / Active and Assisted Living (AAL). Given that toileting is a "taboo" or
"embarrassing" area for most people, not being dependent on personal assistance can
lead to a strong gain in felt autonomy for the affected people. In this way, people are
able to adhere to their own values and plans and are independent from others; that is,
they can act autonomously.
Conversely, people with physical disabilities often experience restrictions in
autonomy when certain activities of daily living require them to rely on personal
assistance; struggling to overcome such barriers on their own also poses a threat to
safety, as overexertion or sudden indisposition is not unlikely to cause accidents.
Therefore, besides the design of physical support and the usability of functions as an
alternative to personal assistance, a main emphasis in the iToilet project
(www.itoilet-project.eu) has been to also offer a safe day-by-day choice [5].
In an institutional setting, an assistive toilet tightly embedded in the care provision
process can also reduce the burden for care persons by reducing their physical
involvement during daily work in the toilet room. Digital technologies which
monitor toilet use and related parameters can deliver early hints about emerging
user/patient problems and provide reliable information for the care documentation,
which otherwise has to be collected manually.
We now take a detailed look at the steps involved in using a toilet in order to
identify potential ‘pain points’ that might hinder independent toilet use. We then use
these critical steps to infer required design features of assistive toilets like iToilet.
• Entering the toilet room. Difficulties can arise when the (self-closing) door is
not easy to open, especially when the user needs some kind of walking aid.
• Moving close to the toilet bowl with a wheelchair, or depositing walking aids.
• Undressing. This can be difficult when the hands are needed for standing upright.
A high raised toilet seat to lean on and handrails can help in maintaining a
standing position during undressing.
• Sitting down or transferring from a wheelchair. While sitting down from
standing usually is not effortful, the abrupt motion can cause discomfort and
instability. For wheelchair users the transfer should not be "uphill", i.e. the
toilet height should be slightly lower than or equal to the wheelchair seat, and
handrails should not block their way.
• Stable sitting. The most stable position is with both feet fully on the ground
(often not the case with fixed raised seats) and with support by a backrest and
armrests on both sides. A low "squatting" position is also known to be helpful
for defecating, especially for elderly people [1].
• Cleaning. The cleaning process can be supported by easily reachable toilet
paper (on both sides), not requiring the sheets to be fetched with both hands,
and by preventing the torn-off paper (or the complete paper roll) from falling
to the floor when released. For persons with restrictions in reaching behind for
cleaning, washing and drying support by a bidet function can be helpful.
• Flushing. Pressing a button behind one's back or on the wall can be difficult.
An automatic flush function or flush by command should be easier to use.
• Standing up from a low sitting position is the most difficult action for some
users. Motorized support for lifting users to a position where standing up is
possible is a main feature of iToilet. For wheelchair users the transfer should
not be "uphill", i.e. the toilet height should be slightly higher than or equal to
the wheelchair seat height.
• A raised toilet seat and armrests support the users in standing upright, again
facilitating the process of dressing.

An assistive, height adjustable toilet (in the form of an ICT enhanced robotic toilet)
should therefore support three to four different individual heights in the range between
ca. 35 and 80 cm during the toileting process. The individual settings can be retrieved
from a database, thus making it possible to use one adaptive toilet for many users, e.g.
in institutional settings with IT infrastructure.
In the remainder of the paper, chapter 2 outlines the iToilet project approach,
covering user requirement gathering, prototype development and evaluation for home
and institutional use. Chapter 3 provides an overview of the Toilet4me study project on
a possible extension of the iToilet concept from use in home and institutional settings
to the area of semi-public spaces.

2. iToilet Approach for Home or Institutional Use

2.1. Collected Requirements

In the iToilet project, presumptive users (n=41, all with mobility restrictions but able to
walk with a technical aid) [6, 7] were first asked to rank the relevance of problems
related to toilet use based on their personal experience, in order to corroborate the project
assumptions right at the start (see Table 1).

Table 1. iToilet Ranked User Requirements

Rank  High priority user requirement
1     bilateral (general stability and support), removable/foldable handrails (wheelchair)
2     height adjustment (in a wide range) and tilt adjustment
3     fall detection, emergency recognition and emergency call
4     simplicity (few, straightforward buttons on both handrails)
5     fixed toilet paper holder (on both handrails)
6     sit down and stand up support
7     custom settings (tilt and height) with user identification

Rank  Medium priority user requirement
1     self-sanitizing seat and bowl
2     shelf/tray area (to put objects)
3     upgradability, modularity
4     automatic or button operated flush
5     care documentation
6     spoken commands
7     individually formed toilet seat
8     voice guide
9     automatic dispensing of toilet paper
10    bidet with dryer
11    urine meter/analyzer

The same questions were also given to 21 secondary users (care givers) and 12 tertiary
users (managers of health care organizations and insurances). While primary users
generally gave lower scores to problems than secondary and tertiary users, the highest
ranked problems all dealt with the height of the toilet and the transfer to and from the
seat, followed by hygiene issues.
Moreover, a set of questions regarding proposed support functions of an ICT
enhanced toilet was ranked by the same users. Here, fall detection, emergency
recognition and custom settings were the most favored functions. Overall, the idea of
iToilet to provide ICT enhanced physical support for independent and safe use was
found to clearly coincide with existing problems and was appreciated by the users. The
users' ranking of iToilet functions is documented in more detail in [6, 8] and led to
prioritized features for prototyping.
While all of the high priority requirements (see Table 1) were implemented in the
iToilet prototypes, some of the medium priority items were either only tested in the
laboratory (automatic dispensing of toilet paper) or left for the design of a future
product (self-sanitizing seat and bowl, shelf/tray area, individually formed toilet seat,
etc.) where commercial solutions are already available. This approach allowed the
consortium to focus on the most important user needs when designing, implementing
and testing prototypes.

2.2. iToilet Prototypes

Based on the requirement list (Table 1), two different prototypes (PT) were developed,
both based on the existing sanitary products "Lift-WC" and "mobile toilet chair" of the
company Santis Kft. [9] and sharing the same interaction concept [10] and base functionality:
a chair-like prototype of a motorized stand-up support, "PT1" (see Fig. 1 left), which can
easily be placed over any existing toilet bowl (for single users at home, appropriate for
temporary use without complex installation, as only the seat is movable), and a
prototype "PT2" based on a wall-mounted base toilet system where the whole toilet
including the bowl is movable. PT2 needs more installation effort and might be more
suitable for multi-user settings in institutions (see Fig. 1 right).

Figure 1. Chair-like iToilet prototype PT1 (left) and wall mounted prototype PT2 (right) as installed for user
tests (see section 2.3 iToilet Field Tests).
Despite the taboo nature of toilet use and personal hygiene, considerable involvement of
users in participatory design activities could be accomplished [11, 12, 13]. In the
design of toilet paper dispensers, speech recognition, different buttons and armrests, the
users actively contributed their ideas to the development in the form of co-design
activities.

Both prototype versions offered the following features:

Base system:
• Adjustable motorized height and tilt of seat (approx. 40-75 cm height and 0-10
and 30 degrees tilt, respectively), position monitoring by sensors
• Arm rests for support on both sides, foldable for wheelchair transfer
• Manual operation by a user interface with big buttons featuring clear symbols
and tactile feedback (on both sides)
• Motor amperage draw monitoring and safety contacts for collision detection
• Wireless configuration via microcontroller, Wifi and MQTT (see next item)

Add-ons via wireless configuration:
• Hands-free operation via speech recognition (German and Hungarian tested)
• User identification, e.g. by RFID tag at the door entrance (for multi-user settings)
• Detection of users entering the room, optional audio signal greeting
• Recall of user preferences and toilet positioning for sitting down (upon entry)
• Automatic adjustment to the stored sitting position with a single user command
• Automatic adjustment to the stored stand-up position with a single user command
• Instructions and information to users by text-to-speech synthesis
• Monitoring of user presence and usage time for emergency detection
• Monitoring for falls by a dedicated (commercial) fall detector
• Storing of preferred individual toilet positions and user settings when leaving
the toilet; optional entry in the care documentation system
• Emergency call via GSM with bi-directional hands-free voice connection
upon manual activation, speech command or detected emergency

Additional features in PT2 (wall mounted system):
• Activation of flush by command or automatic mechanism
• Bidet washing and drying included in the seat, with activation by command

As captured by this list, ICT together with the mechanical base modules not only
allows hands-free control of functions via speech recognition but also enables the
system to perform actions like automatic changes of position or flushing based on user
preferences, while the emergency recognition component monitors safe use; instructions
can be given to users by the system or via the GSM voice link if needed.
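How the wireless add-ons could react to MQTT-style messages, e.g. recalling a stored sitting position upon entry, is sketched below. The topic names and payload fields are assumptions for illustration; the project's actual message schema is not documented here.

```python
import json

class ToiletController:
    """Simplified sketch of add-on message handling; not the project's code."""

    def __init__(self, settings_db):
        self.settings_db = settings_db            # user id -> stored preferences
        self.position = {"height_cm": 46, "tilt_deg": 0}
        self.current_user = None

    def on_message(self, topic: str, payload: str) -> None:
        data = json.loads(payload)
        if topic == "itoilet/entry":              # e.g. RFID tag read at the door
            self.current_user = data["user_id"]
            prefs = self.settings_db.get(self.current_user, {})
            # Recall the stored sitting position upon entry
            self.position["height_cm"] = prefs.get("sit_height_cm", 46)
            self.position["tilt_deg"] = prefs.get("tilt_deg", 0)
        elif topic == "itoilet/command":          # button press or speech command
            if data["command"] == "stand_up":
                prefs = self.settings_db.get(self.current_user, {})
                self.position["height_cm"] = prefs.get("standup_height_cm", 70)
```

In a real deployment this handler would be subscribed via an MQTT client and would drive the seat motors instead of updating a dictionary.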

2.3. iToilet Field Tests

The PT2 system was tested in autonomous use under real daily conditions by a total of
50 users, 23 clients of a multiple sclerosis day care center and 27 patients of a
rehabilitation clinic [14], during a period of at least 4 weeks, after approval by the
appropriate research boards had been granted. Testing in institutions was chosen
because of the easier access to the necessary number of users with a fitting profile and the
better support for users in case of problems.
After this four-week experience with the toilet, during the final interviews all
provided functions (cf. 2.2) were rated as useful by at least 75 to 100% of the users, except
the bidet function, which was not so well accepted at the day care center, and the
recognition of spoken commands (as an alternative to pressing buttons), which achieved
only around a 60% rating at both test sites. The reliability of the prototype functions was
rated high for the core support functions, but some additional features like the speech
recognition or the emergency monitoring were criticized because of too many false
positives (caused by the initially underestimated range of behaviors).
During the field trials, around 500 toilet visits were logged. The use patterns,
ranging from 5 minutes (on average) to sometimes 30 minutes, clearly showed that
individual, higher sit-down and stand-up heights were preferred by the majority of
users, which is in line with the good ratings from the interviews. User identification by
RFID tag was technically reliable but not always easy to use, because the
participants were required to hold the personal tag against a reader. For plain home use
by single users this will not be necessary; for institutional use, additional, more
ergonomic methods of user identification should be investigated.
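The combination of usage-time monitoring and emergency escalation could, in a much simplified form, look like the following sketch. The 30-minute threshold and the confirmation step are assumptions motivated by the reported visit durations and the false-positive problem, not the prototype's actual logic.

```python
# Hypothetical sketch of usage-time based emergency detection.
ALARM_AFTER_MIN = 30.0  # assumed threshold, near the longest observed visits

def check_for_emergency(presence_minutes: float, user_responded: bool) -> str:
    """Escalate only when a long visit coincides with no user response,
    one way to reduce false positives like those seen in the field trials."""
    if presence_minutes < ALARM_AFTER_MIN:
        return "ok"
    if user_responded:
        return "ok"          # user confirmed via button or speech prompt
    return "emergency_call"  # e.g. trigger the GSM hands-free voice connection
```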

2.4. iToilet Ethics Approach and Autonomy Study

Because of the sensitive research area, iToilet implemented comprehensive ethical
governance and monitoring throughout the project. Besides dedicated reports dealing with
safety and ethics, the MEESTAR tool [15] was successfully applied in several
consortium meetings to elaborate the ethical implications of the iToilet approach and to
raise awareness among consortium members [13].
As a specific extra effort after the field trials, the users at the day care center were
asked to engage in discussions with an independent researcher who had not been involved
in the earlier project work. The aim of this in-depth fieldwork with 15 people with
multiple sclerosis, most of them participants in the recent field trials, was to gain better
insight into the attitudes of the users towards technological assistance and preferred
autonomy support in their daily practice. In this way, the study triggered deeper
reflections on the implications of technology use for assisting people as well as for
monitoring user behavior. In these open discussions, the interviewees confirmed the
benefits of the iToilet concept of providing personalized toilet settings. Still, they also
emphasized that, depending on their "daily shape" and their physical capabilities, from
time to time they want to refrain from using assistive technology in order to train their
remaining strength [5].
These positive results of the iToilet tests, which demonstrated the benefits of
an ICT enhanced toilet, encouraged us to prepare the investigation of similar solutions
in additional areas. In the next section, we describe our ongoing study in semi-public
settings (e.g., restaurants).

3. Toilet4me for Semi-Public Use

The Toilet4me study project (www.toilet4me-project.eu), which started after the
completion of the iToilet project, addresses people of all ages with impairments/disabilities
and their needs when using a toilet outside the home in public or semi-public environments
(e.g. in community centers, shopping malls, theatres, hotels, etc.).
The main idea of Toilet4me is simple but challenging: as iToilet already
demonstrated the benefits of supporting people during toilet use at home (or in a care
institution), we now want to proceed and explore the feasibility of this type of
supportive toilet in places outside the own home. Offering the support in places which
people frequently visit, or would like to visit if appropriate toilet facilities were
available, should allow people to participate more closely in society, which should contribute
to their independence and quality of life. A service or technical solution which allows
the users to always "take their own preference settings with them in the pocket", in the
form of a digital personal use profile, can open up many new possibilities for several
user groups, inside and outside the home. Toilet4me, together with end users (older or
disabled people, their caregivers and managers of public places and hotels), will
elaborate the requirements for such a service.
It is expected that the principles of accessible toilets for home or institutional use
can also be applied in semi-public settings, but of course challenges like the costs for
installation, maintenance and service, as well as suitable methods for the safe and easy
exchange of preference data, have to be solved. The Toilet4me project shall deliver
facts for informed estimations about the chances of a successful market introduction.
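A digital personal use profile that users "take with them in the pocket" could, for instance, be a small serializable record carried on a phone, NFC tag or QR code. All field names below are assumptions for illustration; the actual profile format is exactly what Toilet4me sets out to elaborate.

```python
import json

# Hypothetical shape of a portable personal use profile; every field name
# here is an assumption, not a Toilet4me specification.
profile = {
    "profile_version": 1,
    "sit_height_cm": 50,
    "standup_height_cm": 75,
    "tilt_deg": 10,
    "language": "de",          # e.g. for voice guidance
    "emergency_contact": "",   # left for the user to fill in
}

def serialize(profile: dict) -> str:
    """Compact form suitable for an NFC tag or QR code."""
    return json.dumps(profile, separators=(",", ":"), sort_keys=True)

def deserialize(blob: str) -> dict:
    data = json.loads(blob)
    if data.get("profile_version") != 1:
        raise ValueError("unsupported profile version")
    return data
```

Versioning the record from the start would let future toilets reject or migrate profiles they do not understand, which matters once profiles travel between providers.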

4. Discussion and Conclusions

iToilet has demonstrated that ICT enhanced physical support can assist people in using
a toilet without personal help while safety is preserved. Accessible and barrier-free
toilets are important for the autonomy of older or disabled users, who would otherwise
be without choice and dependent on personal assistance in this taboo area. This affects
people both in their daily life at home and when going out and participating in social
life, as they are often guided by the availability of suitable toileting facilities.
While toilet rooms with traditional barrier-free design are a step in the right
direction and further improvements towards "changing places" lower the barriers for
many users, additional motorized support for an optimum seat height during all phases of
toilet use and for people of all body sizes can be an essential element for supporting
many old or disabled people. ICT enhancements can aid the operation, and enhanced
safety features can give users the feeling of safety even when no personal assistant is in
reach. Digital technologies can add a smart adaptive layer to the assistive base
technology which empowers the users to customize the assistive service according to
their individual needs wherever they are and to select the level of safety they prefer,
and at the same time helps providers to integrate the technology-based support smoothly
into modern (health) care services. Thereby, a sound individual balance between wanted
autonomy and personal assistance can be achieved in a location-independent way.
Well designed, "really" barrier-free, smart and adaptive care toilets, enabling
personalized assistive settings and integration with other health services, might also
open new market fields, from the economically underdeveloped core AAL market
towards accessible tourism, not only for public places like theaters or museums but also
for hotels and recreation or wellness sites. This is currently under investigation in the
Toilet4me project, as outlined above.

Acknowledgement

The iToilet project (AAL-2015-1-084) and the Toilet4me project (AAL-2018-5-101-SCP)
are both funded in the AAL Programme, in part by the EU and by national
funding agencies of the project partners (in Austria: BMVIT / FFG).

References

[1] J. F.M. Molenbroek, J. Mantas, R. De Bruin (eds.), A Friendly Rest Room: Developing toilets of the
future for disabled and elderly people, IOS press, Amsterdam, 2011.
[2] P. Chamberlain, H. Reed, M. Burton, G. Mountain, ‘Future Bathroom’, What to make? Or How to
Make? Challenges in meeting sustainable needs. In: Sustainable Intelligent manufacturing, IST Press,
Portugal, 2011, pp. 777-784.
[3] A. Kira, The Bathroom, Viking Press, New York, 1976.
[4] Changing Places Consortium, Changing Places: The Practical Guide, 2013, online: http://www.changing-places.org/ [last access: 31 Jan 2019].
[5] F. Güldenpfennig, P. Mayer, P. Panek, G. Fitzpatrick, An Autonomy-Perspective on the Design of
Assistive Technology: Experiences of People with Multiple Sclerosis, ACM CHI Conf on Human
Factors in Computing Systems (CHI 2019), May 4-9, 2019, Glasgow, Scotland, UK (to appear).
[6] T. Pilissy, A. Tóth, G. Fazekas, A. Sobják, R. Rosenthal, T. Lüftenegger, P. Panek, P. Mayer, Towards a
situation-and-user-aware multi-modal motorized toilet system to assist older adults with disabilities: A
user requirements study, 15th IEEE Intern Conf. on Rehabilitation Robotics (ICORR), QEII Centre,
London, UK, July 17-20, 2017, pp. 959-964, DOI: 10.1109/ICORR.2017.8009373.
[7] P. Panek, G. Fazekas, T. Lueftenegger, P. Mayer, T. Pilissy, M. Raffaelli, A. Rist, R. Rosenthal, A.
Savanovic, A. Sobjak, F. Sonntag, A. Toth, B. Unger, On the Prototyping of an ICT-Enhanced Toilet
System for Assisting Older Persons Living Independently and Safely at Home, Studies Health
Technology Informatics, vol. 236, IOS press, DOI 10.3233/978-1-61499-759-7-176, 2017, pp. 176-183.
[8] A. Sobják, T. Pilissy, G. Fazekas, A. Tóth, R. Rosenthal, T. Lüftenegger, P. Mayer, P. Panek, iToilet project deliverable D1.1 (public version), User Requirements Analysis showing three priority levels, 2016, http://www.itoilet-project.eu, last access: 20.3.2019.
[9] Sanitary company Santis Kft., Debrecen, Hungary, http://www.santis.org/, last access: 20.3.2019.
[10] P. Panek, P. Mayer, Initial Interaction Concept for a Robotic Toilet System, Proc of the Companion of
the ACM/IEEE Intern Conf on Human-Robot Interaction (HRI 2017), March 6-9, 2017, Vienna,
Austria, doi: 10.1145/3029798.3038420, pp. 249-250.
[11] P. Mayer, P. Panek, Involving Older and Vulnerable Persons in the Design Process of an Enhanced
Toilet System, ACM CHI Conf on Human Factors in Computing Systems (CHI 2017), Denver,
Colorado, May 6-11, 2017 doi: 10.1145/3027063.3053178, pp. 2774 – 2780.
[12] R. Rosenthal, F. Sonntag, P. Mayer, P. Panek, Partizipation als Instrument zur Optimierung der
Selbstwirksamkeit für Menschen mit der Diagnose Multiple Sklerose im Rahmen des EU Projektes
iToilet, Poster, Pflegekongress, Austria Center Wien, 30 Nov – 1 Dec, 2017.
[13] P. Panek, P. Mayer, Ethics in a Taboo-Related AAL Project, in: F. Piazolo, St. Schlögl (eds.),
Innovative solutions for an ageing society, proc of Smarter Lives 18 conf, 20 Feb 2018, Innsbruck,
Pabst Science Publishers, Lengerich, ISBN: 978-3-95853-413-1, pp. 127-133.
[14] G. Fazekas, et al., Assistive technology in the toilet – Field test of an ICT-enhanced lift-WC, accepted for 15th EFRR Congress 2019, April 15-17, 2019, Berlin, Germany (to appear).
[15] A. Manzeschke, K. Weber, E. Rother, H. Fangerau, Ergebnisse der Studie „Ethische Fragen im Bereich
Altersgerechter Assistenzsysteme“, Berlin, (VDI/VDE), 2013.
dHealth 2019 – From eHealth to dHealth 17
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-17

Evaluation of Depth Cameras for Use as an


Augmented Reality Emergency Ruler
Michael SCHMUCKERa,1, Christoph IGELb and Martin HAAGa
a
GECKO Institute, Heilbronn University of Applied Sciences, Heilbronn, Germany
b
Educational Technology Lab, DFKI, Berlin, Germany

Abstract. Children are rarely affected by medical emergencies, and the experience of doctors and paramedics with pediatric emergencies is correspondingly limited. Anatomical features and individual calculations make such an emergency much more error-prone than a comparable adult emergency. Critical errors occur time and again, particularly in dose calculations. Since these calculations are based on the child's weight, which is preclinically often derived from the size of the child, the number of errors can be minimized with an assistance service that performs all calculations based on the size. Technically, it is possible to detect the size with a depth camera, which is occasionally installed in smartphones or head-mounted displays. In order to investigate to what extent these cameras provide precise results, a study with 33 children was carried out. The children were measured with both an emergency ruler and an augmented reality app on a smartphone with a depth camera. The result is that the depth camera does not provide significantly different results than an emergency ruler. This allows further research, e.g. the automatic recognition of patients with the help of machine learning or usability studies, to be tackled.

Keywords. child, emergencies, resuscitation, mobile applications, user-computer interface

1. Introduction

Medical emergencies involving children are rare occurrences. In Germany, experts estimate that about 4,000 children are resuscitated every year, 1,000 of them preclinically. On the other hand, there are about 30,000 doctors with the additional designation "emergency medicine". Mathematically, an emergency physician would thus resuscitate a child only every 30 years [1]. The resulting lack of routine, combined with anatomical features in children, individual dose calculation and time-critical procedures, makes pediatric emergencies error-prone. In a survey of 104 emergency physicians conducted by Zink et al., 88% of those questioned stated that they had already felt anxiety or personal overexertion during work. Asked in which situation this feeling arose, 84% of the physicians answered that they had experienced it in a pediatric emergency, followed by polytraumatized patients (20%) and obstetric emergencies (18%); multiple responses were possible [2]. Due to the lack of routine, it is essential to
have the emergency guidelines [3] issued regularly by the European Resuscitation
Council (ERC) at hand. The process as well as the activities themselves differ
significantly in children and adults. According to the current guidelines, children are

1 Corresponding Author: Michael Schmucker, Heilbronn University of Applied Sciences, Max-Planck-Str. 39, 74081 Heilbronn, Germany; E-Mail: [email protected].
18 M. Schmucker et al. / Evaluation of Depth Cameras for Use as an Augmented Reality Emergency Ruler

resuscitated, for example, after five initial ventilations in a 15:2 rhythm (thoracic
compression to ventilation), while adults start in a 30:2 rhythm. Intubation is also more
difficult due to the anatomical characteristics [4]. However, the most frequent errors
occur in the dosage of medication. Children are particularly susceptible here because the
dose must be calculated or estimated individually, depending on their weight. As Young
and Korotzer have pointed out in their systematic analysis, parental estimation is the
most accurate method for determining weight, followed by the size-based method where
weight is derived from the child's height using survey data (e.g. German Health Interview
and Examination Survey for Children and Adolescents (KiGGS) [5]). Medical doctors'
weight estimates are not accurate [6]. But even if the weight is known, calculation errors occur due to nervousness or haste. In a study by Hoyle et al., 125 out of 360 prescriptions contained dosage errors. These errors happen particularly frequently in preclinical environments, probably because of the limited experience of emergency paramedics and emergency physicians with pediatric emergencies [7]. In a further study with simulated resuscitations, a tenfold overdose (1000% of the recommended dose) occurred in one of 32 prescriptions; such errors can pose a life-threatening risk [8]. Young, inexperienced physicians, who make up the majority of emergency physicians, are particularly susceptible [9]. Based on these facts, physicians
want electronic tools, such as a computer program or a calculator, because they can
demonstrably minimize calculation errors [8][10]. Such a computer program could
directly perform all necessary calculations, be it the dosage of drugs or the current
strength of the defibrillator. But the weight or height still has to be known first. Even
better, of course, would be the automatic recognition of the weight or size. This means
that calculation errors can be ruled out if the detection is error-free. At the moment, so-called emergency rulers (e.g. the Broselow Tape [11] or PediaTape [12]) are the most
important aids alongside the guidelines. An emergency ruler is placed next to the head
of a child. A color code can then be read off at the feet, which can be used to indicate
dosage recommendations or age-appropriate reactions, e.g. to the Glasgow Coma Scale.
This information is often stored in a brochure supplied with the ruler (Figure 1). The
dosage recommendations may also be printed directly on the tape.
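The color-code logic of such a ruler can be mirrored in software as a simple lookup from measured body length to a zone. The zone boundaries, colors and weight values below are purely illustrative placeholders invented for this sketch; they are not PediaTape or Broselow values and must never be used clinically:

```python
# Hypothetical zones: (lower cm, upper cm, color, illustrative weight kg).
# These numbers are invented for the example only, not clinical values.
ZONES = [
    (60.0, 74.0, "red", 6.0),
    (74.0, 92.0, "yellow", 10.0),
    (92.0, 108.0, "green", 14.0),
    (108.0, 124.0, "blue", 19.0),
]

def zone_for_length(length_cm):
    """Return (color, weight_kg) for a measured body length, else None."""
    for low, high, color, weight in ZONES:
        if low <= length_cm < high:
            return color, weight
    return None
```

A downstream assistance service would feed the returned weight into size-based dose calculations instead of requiring manual entry.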

Figure 1. PediaTape with corresponding booklet [12].

Technically, size recognition is possible with the help of depth cameras, some of
which are built into smartphones (Asus Zenfone AR, Lenovo Phab2 Pro) or head-
mounted displays (Microsoft HoloLens). These cameras are designed for augmented
reality (AR) applications. They are necessary for the most error-free placement of AR
elements in a room. These cameras can be programmed with the Tango Framework from
Google [13] among others. With the help of the depth camera and Google Tango
Framework it is possible to perform accurate measurements. The depth camera scans the

room and gets to know the environment of the user. Thus, it is possible to carry out the
measurements within one to two seconds from any point and at any angle near the object
to be measured. But the question is, how accurate are these cameras in reality? Only
exact cameras are suitable for this type of application. This paper presents a study in
which 33 children aged between 3 and 6 years were measured with an emergency ruler
and an augmented reality app on a smartphone with a depth camera. The results were
then compared. The aim was to find out to what extent the size measurement functions
solidly and thus further work in this research area is meaningful.

2. Methods

First, a systematic PubMed and Google Scholar search following the PRISMA scheme was conducted to investigate whether similar works or important preliminary works already exist.
Subsequently, a study design was developed that allows an augmented reality app on a standard smartphone with depth camera (Asus Zenfone AR) to be compared with an emergency ruler (PediaTape). For this purpose, an augmented reality app was written that uses libraries from the Google Measure App [14], an existing app for measuring lengths. Using the Google Measure App (Tango version) directly was unfortunately not possible because it often rounds the results.
Afterwards, 33 children aged between 3 and 6 years were measured one after the other, lying on the floor in an indoor room with daylight, first with the PediaTape and then with the Zenfone AR. Both values were recorded. For further research and quality assurance, the weight of each child (with clothing) was also noted. The study uses a within-group design with one independent variable (measurement device) that takes two values (emergency ruler, augmented reality app). An exemplary measurement is shown in Figure 2. For data protection reasons, the measurement in the illustration is simulated with a doll.

Figure 2. Screenshot of the re-enactment of the measurement with PediaTape and Augmented Reality App.

To compare two measurement methods of a variable and to decide whether there is a significant difference between the two methods, Bland-Altman's limits of agreement are best suited. The test is identical to the Tukey mean-difference plot [15], but became popular in medical statistics through Bland and Altman [16][17]. Thereby, the differences S1 - S2 of the individual measurements (S1, S2) are plotted against their mean (S1 + S2)/2. This results in the following point for the diagram:

S(x, y) = ((S1 + S2)/2, S1 - S2)    (1)

The upper and lower limits of agreement (LOA) are defined as d̄ ± 1.96s at a significance level of α = 0.05, where d̄ represents the mean difference and s the standard deviation of the pairwise differences. If 95% of the measurements lie within the LOA, both methods can be considered interchangeable, i.e. both methods are equally appropriate [17].
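As a sketch, the limits of agreement can be computed directly from the paired measurements. This minimal NumPy example uses invented height values, not the study data:

```python
import numpy as np

def bland_altman_limits(s1, s2):
    """Mean difference and 95% limits of agreement (Bland & Altman)."""
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    diff = s1 - s2                 # pairwise differences S1 - S2
    d_bar = diff.mean()            # mean difference
    s = diff.std(ddof=1)           # SD of the pairwise differences
    loa = (d_bar - 1.96 * s, d_bar + 1.96 * s)
    # fraction of differences falling inside the limits of agreement
    frac_inside = np.mean((diff >= loa[0]) & (diff <= loa[1]))
    return d_bar, loa, frac_inside

# Illustrative paired measurements in cm (invented, not the study data)
tape = [98.0, 104.5, 110.0, 121.0, 99.5]
app = [98.5, 104.0, 111.0, 120.5, 100.5]
d_bar, (lo, hi), inside = bland_altman_limits(tape, app)
# If at least 95% of the differences fall inside [lo, hi], the two
# methods are considered interchangeable.
```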
To check whether a Bland-Altman plot is applicable, a one-sample t-test comparing the mean of the differences (S1 - S2) to the reference value zero was executed as preliminary work. In the optimal case, the difference of the individual measurements would be zero; then both measurements would be identical. The following hypothesis is tested:

H0: There is no difference between an augmented reality app running on the Asus Zenfone AR and a PediaTape emergency ruler in the quality of measuring.

If there are large deviations, this means that there is a significant difference between the two measurement methods and a Bland-Altman plot would not be necessary.
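This preliminary check can be sketched without statistics software. The NumPy fragment below (with invented values) computes the one-sample t statistic of the differences against zero; it is then compared with the two-sided critical value for the given degrees of freedom (about 2.04 for df = 32 at α = 0.05):

```python
import numpy as np

def one_sample_t(diff, mu0=0.0):
    """t statistic and degrees of freedom for a one-sample t-test."""
    d = np.asarray(diff, float)
    n = d.size
    # t = (sample mean - mu0) / standard error of the mean
    t = (d.mean() - mu0) / (d.std(ddof=1) / np.sqrt(n))
    return t, n - 1

# Invented paired differences S1 - S2 in cm (not the study data)
diff = [-0.5, 0.5, -1.0, 0.5, -1.0, 1.0]
t, df = one_sample_t(diff)
# |t| below the critical value -> H0 (mean difference = 0) is retained.
```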

3. Results

3.1. Related Work

Although some interesting work has been done with commercially available depth cameras, such as hand gesture recognition [18], it has not been evaluated how accurate these cameras actually are. In this and other cases, however, an exact measurement is the basis for further applied research. There are also various studies that attempt to address the relevant and well-known problem of dosage errors in pediatric emergencies with the aid of an app [19][20][21], but no attempt to automate the process of size recognition, and thus the basis of all calculations, is available. Some of these apps also use the age-based formula [19], which, according to the systematic review by Young and Korotzer, is less accurate than the size-based estimate [6]. All dosage apps found have in common that the values must be entered manually; however, these metrics (age, weight or height) must first be known. There are also apps that aim to digitally map the available analog emergency guidelines [22].

3.2. Preliminary Work

Table 1 and Table 2 show the results of the one-sample t-test against the test value 0. There is no significant difference (diff) between the two measurement methods in the quality of measuring (t = -1.022; p = 0.314). The difference of the individual measurements does not deviate significantly from zero (x̄ = -0.364; s = 2.044). H0 can thus be retained.

Table 1. Difference (diff) between PediaTape and Augmented Reality App (in cm). One sample statistics
(SPSS Output).
N Mean Std. Deviation Std. Error Mean
diff 33 -0.364 2.044 0.356

Table 2. Difference (diff) between PediaTape and Augmented Reality App (in cm). One sample t-test to test
value 0 (SPSS Output).
95% Confidence Interval of the
Difference
t df Sig. (2-tailed) Mean Difference Lower Upper
diff -1.022 32 0.314 -0.364 -1.088 0.361

3.3. Bland-Altman’s limits of agreement

As can be seen graphically in Figure 3, at least 95% of the measurements are within the limits of agreement (mean ± 2 standard deviations (SD)). Thus, there is no significant difference in the quality of the two measurement methods in terms of size detection. In most cases (15 of 33; 45%) the deviation was 0 to 1 centimeter, and in 25 cases (75%) the deviation was two centimeters or less. Deviations greater than three centimeters were rare (3 of 33; 9%), and no deviation was greater than 5 centimeters.

Figure 3. Bland-Altman Plot.



To exclude a proportional bias, a linear regression can be performed. For this purpose, the coefficient of the mean (mean) is tested under hypothesis H:

H: The coefficient of mean is zero.

As can be seen in Table 3, the t value is not significant (t = -1.389; p = 0.175). Thus, H can be maintained. Within the available data, a proportional bias can be excluded.

Table 3. Linear regression to test for proportional bias for dependent variable difference (diff). SPSS Output.
Unstandardized Coefficients Standardized Coefficients
Model B Std. Error Beta t Sig.
(Constant) 6.783 5.158 1.315 0.198
mean -0.065 0.047 -0.242 -1.389 0.175
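The proportional-bias check is an ordinary least-squares regression of the differences on the pairwise means; the slope's t statistic decides hypothesis H. A NumPy-only sketch with invented values:

```python
import numpy as np

def slope_t_statistic(x, y):
    """OLS slope of y on x and the t statistic testing slope = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 least-squares fit
    resid = y - (intercept + slope * x)
    sxx = np.sum((x - x.mean()) ** 2)
    # standard error of the slope from the residual variance
    se_slope = np.sqrt(np.sum(resid ** 2) / (n - 2) / sxx)
    return slope, slope / se_slope

# Invented paired measurements in cm (not the study data)
tape = np.array([98.0, 104.5, 110.0, 121.0, 99.5, 115.0])
app = np.array([98.5, 104.0, 111.0, 120.5, 100.5, 114.0])
slope, t = slope_t_statistic((tape + app) / 2.0, tape - app)
# A non-significant t supports the absence of proportional bias.
```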

4. Discussion

Figure 3 shows graphically that there is no significant difference between the measurement of the augmented reality app and the emergency ruler at the significance level α = 0.05. This is also confirmed by the t-test carried out in advance (Table 1 and Table 2). However, it should be noted that the sample size (n = 33) is only slightly above the rule of thumb proposed by Hogg and Tanis, among others, according to which for a sample size of 25 to 30 participants the central limit theorem can be assumed to apply and the sample to be approximately normally distributed [23]. Nevertheless, the measurements vary only in a few cases. As the linear regression shows, the measurement quality does not depend on the size of the child (Table 3).
It must also be considered that even emergency rulers only give approximate weight
estimates. These vary between one and six kilograms depending on the size of the child
(PediaTape). This weight is ultimately critical to the dosage of medication and other
procedures.
This result confirms that measurement with an augmented reality app on a smartphone with a depth camera (Asus Zenfone AR) provides good results indoors and allows further applied research with this equipment. It should be noted that daylight can influence measurement with infrared sensors; this has to be tested in further studies. It should also be noted that Google discontinued its ambitious AR project Tango in March 2018 in favor of the newer technology ARCore. Due to the expensive additional hardware required (a depth camera with infrared sensor), Tango was not able to establish itself on the market [24]. However, the libraries are still available in the Google Archive [25] and the hardware is still for sale [26]. ARCore, like Apple's competing ARKit 2 [27], can be used on many current smartphones. For exact AR applications, however, the use of a depth camera is necessary.
The long-term goal is of course to provide an assistance service that can be used in
the event of a pediatric emergency. Further questions need to be clarified, for example
whether automated size recognition can take place with the help of machine learning.
The app would have to be able to classify people (children) without errors and then
calculate the size with the help of the depth camera. It is also possible to imagine that
there would then be intelligent in-situ support that would guide the user through the
acutely necessary process on the basis of the guidelines and would also, for example,
provide operating assistance for medical technology. This can then be displayed at the

right time. Such intelligent assistance services have, for example, been researched in the
project A.L.I.N.A. [28] funded by the German Federal Ministry of Education and
Research (BMBF). It would also be helpful for operation during an emergency if the
assistance service was not running on a smartphone but on a head-mounted display. This
requires further research in this area, especially in the area of usability.

References

[1] ÄrzteZeitung.de, Die Angst fährt mit zum Einsatz, https://www.aerztezeitung.de/medizin/krankheiten/herzkreislauf/article/858981/kinder-notfaelle-angst-faehrt-einsatz.html, last access: 10.01.2019.
[2] W. Zink et al., Invasive Techniken in der Notfallmedizin, Anaesthesist 2004;53:1086.
[3] I. Maconochie et al., European Resuscitation Council Guidelines for Resuscitation 2015 Section 6.
Paediatric life support, Resuscitation 2015;95:223-248.
[4] J. Lee-Jayaram, L. Yamamoto, Alternative airways for the pediatric emergency department, Pediatr Emerg
Care 2014;30(3):191-9.
[5] Robert Koch-Institut, KiGGS – Studie zur Gesundheit von Kindern und Jugendlichen in Deutschland, https://www.kiggs-studie.de/deutsch/home.html, last access: 10.01.2019.
[6] K. Young, N. Korotzer, Weight Estimation Methods in Children: A Systematic Review, Ann Emerg Med 2016;68(4):441-451.
[7] J.D. Hoyle et al., Medication dosing errors in pediatric patients treated by emergency medical services,
Prehosp Emerg Care 2012;16(1):59-66.
[8] J. Kaufmann et al., Medikamentenfehler bei Kindernotfällen – eine systematische Analyse, Dtsch Arztebl
Int 2012;109(38):609-16.
[9] M. Gordon et al., Improved junior paediatric prescribing skills after a short e-learning intervention: a
randomized controlled trial, Arch Dis Child 2011;96(12):1191-4.
[10] A.D. Stevens et al., Color-coded prefilled medication syringes decrease time to delivery and dosing errors
in simulated prehospital pediatric resuscitations: A randomized crossover trial, Resuscitation 2015;96:85-
91.
[11] M. Meguerdichian et al., The Broselow tape as an effective medication dosing instrument: a review of
the literature, J Pediatr Nurs 2012;27(4):416-20.
[12] Pediatape, PediaTape – A Better Pediatric Tape, https://pediatape.com, last access: 10.01.2019.
[13] Google Tango, Tango Concepts, https://developers.google.com/tango/overview/concepts, last access: 20.02.2018.
[14] Google Play Store, Measure – Quick Everyday Measurements, https://play.google.com/store/apps/details?id=com.google.tango.measure&hl=de, last access: 19.03.2019.
[15] W.S. Cleveland, Visualizing data, At&T Bell Laboratories, Murray Hill, N.J., 1993.
[16] D.G. Altman, J.M. Bland, Measurement in medicine: the analysis of method comparison studies, The
Statistician 1983;32:307-317.
[17] J.M. Bland, D.G. Altman, Statistical methods for assessing agreement between two methods of clinical
measurement, Lancet 1986;327(8476):307-10.
[18] Z. Ren et al., Robust hand gesture recognition based on finger-earth mover’s distance with a commodity
depth camera, In Proceedings of the 19th ACM international conference on Multimedia (MM ’11). ACM,
New York, NY, USA, 1093-1096.
[19] S. Banker, PediCalc medical app, customizable pediatric drug dosing at the touch of a button, https://www.imedicalapps.com/2012/02/pedicalc-medical-app-pediatric-drug-dosing/, last access: 12.01.2019.
[20] Eunoia Info Services, Paediatricks, https://paediatricks.com, last access: 12.01.2019.
[21] iAnesthesia, Pedi Safe, https://www.ianesthesia.org/apps/pedi-safe, last access: 12.01.2019.
[22] M. Schmucker et al., Development of an accommodative smartphone app for medical guidelines in
pediatric emergencies, Stud Health Technol Inform 2014;198:87-92.
[23] R.V. Hogg et al., Probability and Statistical Inference, Pearson Education Ltd., Harlow GB, 2015.
[24] WIRED Staff, Augmented Reality: Google macht Schluss mit Tango, https://www.wired.de/collection/tech/google-beendet-sein-augmented-reality-projekt-tango, last access: 10.01.2019.
[25] googlearchive, Project Tango Java API Example Projects, https://github.com/googlearchive/tango-examples-java, last access: 10.01.2018.
[26] Zenfone AR, Erlebe neue Welten, https://www.asus.com/de/Phone/ZenFone-AR-ZS571KL/, last access: 10.01.2018.
[27] Apple Developer, ARKit 2, https://developer.apple.com/arkit/, last access: 10.01.2019.

[28] S. Blaschke et al., Intelligent Assistance Services and personalized Learning Environments for
Knowledge-and Action support in the Interdisciplinary Emergency Room, Medizinische Klinik-
Intensivmedizin und Notfallmedizin 2016;111:366-366.
dHealth 2019 – From eHealth to dHealth 25
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-25

Electronic Medical Records for Mental Disorders: What Data Elements Should These Systems Contain?

Nasim HASHEMI a, Abbas SHEIKHTAHERI b,1, Niyoosha-sadat HASHEMI c and Reza RAWASSIZADEH d
a Iranian Social Security Organization, Tehran, Iran
b Health Management and Economics Research Center, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
c Student, Islamic Azad University, Tehran North Branch, Tehran, Iran
d Department of Computer Science, Metropolitan College, Boston University, US

Abstract. Identifying the data elements of electronic medical record systems (EMRs) is one of the essential steps for comprehensive and proper health data collection. The aim of this study was to determine the data elements required for EMRs in the field of mental disorders. We conducted a literature review and randomly selected 50 medical records of patients with mental disorders to identify a preliminary list of essential data elements for EMRs for mental disorders. Then, 33 mental health specialists were surveyed to validate the list of data elements through a questionnaire. We found that the health data elements of EMRs for patients with mental disorders can be categorized into seven classes (demographic data of patients, administrative data of physicians, administrative data of patients, history, clinical data, treatment, and financial data) and 10 subclasses. After the validation process, 140 essential data elements for EMRs for patients with mental disorders were introduced.

Keywords: data elements, minimum data set, electronic medical record systems, mental disorders

1. Introduction

Today, healthcare systems have moved towards the use of electronic technologies such as mobile health and electronic medical records (EMRs) [1]. The use of these technologies can reduce medical errors, improve the quality of health services, increase productivity, improve information quality, support clinical decision-making, reduce healthcare costs, and improve patient-physician communication and education [2-6].

1Corresponding Author: Abbas Sheikhtaheri, Health Management and Economics Research Center,
School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran,
E-Mail: [email protected]
26 N. Hashemi et al. / Electronic Medical Records for Mental Disorders

Despite the benefits of EMRs, their utilization in the mental health field is more modest compared to other areas [7]. The rapid growth of medical knowledge has led to an increasing number of clinical specialties, and, due to the nature of mental illness, more than one expert is often involved in the treatment of a patient [7]. In such an environment, many medical records may be created by many specialists in the course of the treatment process; patients' data are therefore scattered [6].
There are many barriers and challenges for developing and implementing
electronic systems such as EMRs [6, 8, 9]. In this regard, determining the appropriate
and consensus-based data elements for developing EMRs is important. Data elements
play an important role in collecting and documenting information about patients in
their EMRs [10]. The purpose of Minimum Data Set, as core health data elements, is to
standardize data items and their definitions [11]. On the other hand, for the purpose of
developing electronic medical record systems for patients with mental disorders and
customizing them according to the needs of psychiatric patients, one of the main steps
is to identify the data elements required by the electronic medical records in this area.
Many promising studies have been done on the design of data elements in different
medical fields [12-14]. Furthermore, there are many studies related to data elements of
EMRs in different fields [10, 15, 16]. In addition, some studies have been conducted on
the design of psychiatric assessment forms or mental illness registry [17, 18].
Organizations such as the UK National Health Service [19] have set a Mental Health Minimum Data Set (MHMDS), and the Australian Institute of Health and Welfare (AIHW) [20] defines 27 such elements for the medical records of patients with mental disorders. However, these data elements have not been developed and customized for electronic systems. Some promising studies have been conducted on the implementation of EMRs in mental institutions [21-23], but they have not reported data elements for EMRs. Therefore, there are few studies regarding the data elements required for EMRs in the field of mental disorders. The aim of our study was to determine the minimum data elements required for developing EMRs for patients with mental disorders.

2. Methods

This descriptive cross-sectional study was carried out in 2018. To determine the data elements for EMRs for patients with mental disorders, a literature review [17-20] was conducted. In the next step, 50 medical records of psychiatric patients were randomly selected from one of the specialized psychiatric hospitals in Tehran, Iran. All patients' data were completely anonymized, and the study was approved by the ethics committee as well. The medical records were selected based on the mental disorder diagnosis codes of the fifth chapter of the International Classification of Diseases (F00-F99). From each of the 10 blocks of this chapter, five records were randomly selected. Using a checklist, the contents of these records were extracted and qualitatively analyzed to identify the common data elements used by physicians.
Then, we aggregated and classified the data elements identified from the literature review and the medical records into data classes and subclasses. In the next phase, the data elements were validated through a survey of psychologists and psychiatrists (Figure 1). To this end, a questionnaire was designed in two parts: the first part contained the demographic data of the participants, and the second part addressed the data elements for EMRs, classified into seven data classes. The questionnaire used a two-choice scale: necessary and unnecessary. The content validity of this tool was confirmed by three relevant experts (in the fields of mental health, health information management and medical informatics), and we used the Kuder-Richardson coefficient for its reliability. The paper-based questionnaires were handed to 45 mental health specialists (psychologists and psychiatrists) with a minimum of 10 years of work experience in the relevant field. These specialists were selected from three psychiatric hospitals (15 participants from each hospital). Finally, 33 specialists participated.



Figure 1. The steps of the study (literature review and extraction of data elements from medical records → preliminary data elements → classification of data elements → validation process, with unnecessary data elements excluded → final data elements)

Data analysis was done using descriptive statistics in SPSS software, version 20.
After the expert survey, the agreement on each data element was calculated as a
percentage. All data elements with less than 75 percent agreement were considered
unnecessary and excluded [24, 25]. The remaining data elements were proposed as
the necessary data elements for electronic medical records of mental disorders. The
ethics committee of Iran University of Medical Sciences, Tehran, Iran approved this
study, and the confidentiality of patients' data was observed.
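The 75 percent agreement cutoff described above amounts to a simple filter over per-element agreement percentages. A minimal sketch, with invented element names and vote counts (not the study's data):

```python
# Filter data elements by expert agreement: elements with < 75% "necessary"
# votes are excluded, as described in the text. Vote counts are fabricated.

N_RESPONDENTS = 33
THRESHOLD = 75.0  # percent agreement required to keep an element

votes = {  # element name -> number of "necessary" votes (hypothetical)
    "Date of birth": 33,
    "Patient photo": 20,
    "History of suicide": 31,
    "Hair color": 5,
}

def split_elements(votes, n, threshold):
    """Return (kept, excluded) element lists based on percent agreement."""
    kept, excluded = [], []
    for element, necessary in votes.items():
        agreement = 100.0 * necessary / n
        (kept if agreement >= threshold else excluded).append(element)
    return kept, excluded

kept, excluded = split_elements(votes, N_RESPONDENTS, THRESHOLD)
print(kept)      # elements retained as necessary
print(excluded)  # elements dropped (< 75% agreement)
```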

3. Results

The demographic characteristics of the participants are presented in Table 1.

Table 1. Demographic status of participants in the research

Demographic characteristics                         Number   Percent
Gender                    Man                           30      90.9
                          Woman                          3       9.1
Age group (years)         Under 40                       3       9.1
                          40-50                          6      18.2
                          Over 50                       24      72.7
Type of job               Clinical                      29      87.9
                          Clinical and academic          4      12.1
Work experience (years)   Under 20                       9      27.2
                          20-30                         12      36.4
                          Over 30                       12      36.4

Table 2. Frequency distribution of data elements

Classes of data                     Subclasses           Elements    Deleted   Final
                                                         (pre-poll)  elements  elements
Demographic data of patients        -                        24          6        18
Administrative data of physicians   -                         6          1         5
Administrative data of patients     -                         8          -         8
History                             Medical                  16          -        16
                                    Social                    8          -         8
                                    Allergy                   2          -         2
                                    Family                    1          -         1
Clinical data                       Signs and symptoms       42          -        42
                                    Chief complaint           3          -         3
                                    Laboratory                8          3         5
                                    Diagnosis                 5          -         5
Treatment                           Medication               12          1        11
                                    Electroconvulsive        14          -        14
Financial data                      -                         6          4         2
Total                                                       155         15       140

In total, 155 data elements were identified and classified into seven data classes and
10 subclasses for the EMRs. Of these, 140 data elements were validated by the
participants as necessary data elements for EMRs, as presented in Table 2. The final
data elements obtained for the EMRs are shown in Table 3, and the excluded data
elements are shown in Figure 2.

Figure 2. The excluded data elements



Table 3. Data elements of the EMRs for mental disorders

Demographic data of patients (no subclasses):
First name, Last name, Medical record number, Father's name, Date of birth, Marital
status, Gender (male, female, unknown), Current status of employment, National ID,
Nationality, Language, Ethnicity, Religion, Patient address, Living arrangement (with
parents, alone, etc.), Patient phone number, Companion patient phone number,
Education level

Administrative data of physicians (no subclasses):
First name and surname of physician, Physician unique ID, Family name of
anesthesiologist, Anesthesiologist unique ID

Administrative data of patients (no subclasses):
Post-discharge status (recovery, relative improvement, discharge against medical
advice, died, follow-up), Duration of stay, Type of admission (outpatient,
hospitalization, emergency), Patient entrance (with ambulance, under anesthesia,
etc.), Frequency of admissions, Informed consent, Hospitalization place (ward,
room), Final diagnosis code based on the latest edition of ICD and DSM

History
  Medical: History of chronic diseases (diabetes, hypertension, etc.), History of
  surgery, History of head trauma, History of urinary incontinence in childhood,
  History of hospitalization in psychiatric centers, History of seizure, The most
  important event in childhood, Early childhood problems, Early adolescence
  problems, Early adulthood problems, History of suicide, ADHD in childhood, Type
  and dose of previous medications, Possible side effects of medications (type of
  complication, treatment of the complication), Source of the history (the patient,
  the patient's father, etc.)
  Social: Smoking history and its type, History of alcohol use, History of use of
  psychoactive tablets and its type, Legal history of the patient, Common habits,
  Other social histories
  Allergy: Patient's medication allergy, Patient's food allergy
  Family: History of mental illness in parents, sister, brother, uncle, etc.

Clinical data
  Signs and symptoms: Delusions and their type, Illusions and their type, Sleep
  disorders (insomnia, etc.), Serious and important changes in food habits (increase
  or decrease in appetite), Changes in daily activity (extremes in activity, etc.),
  Irrational fears or phobias and their types, Obsessive thoughts, Obsessive acts,
  Suicidal thoughts, Change in libido (increase or decrease), Self-injury,
  Self-sexual injury, Physical harm to others, Sexual harm to others, Alcoholism,
  Drug addiction, Cognitive problems, Depressed mood, Occupational dysfunction,
  Anxiety, High-risk behaviors, Status of social participation, Inability to enjoy,
  Illegal actions of patients, Feelings of guilt, Physical distress, Speech
  impairment, Feeling of safety in life, Feelings of panic and fear, Feelings of
  excessive pity for some people, Anger in dealing with others, Inability to
  concentrate, Pessimism and suspicion, Hysterical faints, Aggression (verbal or
  physical), Fast weight change (increase or decrease), Irritability, Symptoms of
  mental retardation (low IQ, etc.), Patient's vital signs
  Chief complaint: Main complaints, Starting date of the current disease, Onset of
  the current disease (suddenly, after a specific problem, unknown, etc.)
  Laboratory: Type of test (blood test, urine test, stool test, spinal cord test,
  etc.), Physician's order date, Patient status during the test, Physician's
  interpretation, Test results
  Diagnosis: Admission diagnosis (the cause of hospitalization), Main diagnosis
  (diagnosis at discharge), Final diagnosis, Post-discharge recommendations
  (follow-up, etc.), Any consultation (psychology, internal, heart, etc.)

Table 3. Continued

Treatment
  Medication: Name of the medicine (brand or generic), Drug code, Prescribed dose,
  Number or volume of the drug dispensed, Route of administration (oral, nasal
  breathing, syringe, etc.), Frequency of drug use (four times a day, three times
  per day, etc.), Date of use, Date of stopping the drug, Date of prescription,
  Time of administration of the drug, Possible side effects of the drug
  Electroconvulsive therapy: Name of specialist physician, Name of anesthesiologist,
  Symptoms of disease, Type of anesthetic, Muscle relaxant type, Type of missed
  drug, Amount of anesthetic agent, Amount of muscle relaxant, Atropine,
  Milliampere, Shock therapy result, Patient status at the end of anesthesia,
  Complications before the shock, Complications after the shock

Financial data (no subclasses):
Cost, Insurance contract

4. Discussion

The accurate documentation of patients' information is directly related to the quality
of treatment and the improvement of treatment outcomes. This requires a
comprehensive and appropriate medical record that covers all the required information
and reflects, as far as possible, the current situation of patients. This can only be
achieved by precisely identifying the appropriate data elements for documenting
patients' data in medical records. In fact, the comprehensiveness and appropriateness
of EMRs for mental disorders depend on the correct selection of the data elements
whose collection is necessary [20-22].
This study aimed to answer the question of which data elements are required for
electronic medical record systems for patients with mental disorders. After a detailed
literature review [16-19] and a survey of psychiatric experts on the data elements
designated for the electronic medical record, a total of 140 data elements for the
EMRs of patients with psychiatric disorders were identified. These elements were
classified into seven data classes (demographic data of patients, administrative data
of physicians, administrative data of patients, history, clinical data, treatment, and
financial data) and 10 subclasses. Rezai et al. [17] identified a total of 58 data
elements required to document the mental history and evaluation of patients, in 11
categories: demographic data, past history (disease history, psychiatry history,
medical history, non-psychiatry history, history of development and history of health,
history of family psychology, history of relationships with family, legal history),
current symptoms, suicidal risk assessment, behavioral or emotional conditions,
thinking process, physical examination, drug abuse assessment, safety and domestic
violence, the ability to function and daily activities, multi-axial diagnosis, and
treatment outcomes. In the UK [18], 76 data elements have been developed in four
categories: patient details, mental health care details, details of the assessment of the
care plan approach, and details of the mental health care package. In Australia [19],
27 data elements have been defined in this regard. Furthermore, in Europe, a standard
titled Patient Summary has been developed to define data elements for European
electronic medical records. The Patient Summary defines different categories of data
such as patients' attributes, patients' address books, advance directives, allergy and
intolerance, functional status, history of past illness, history of pregnancy, history of
procedures, immunizations, medical devices, medication summary, plan of care,
problems, results, social history and cross-border data [26]. However, none of the
data element sets developed in these countries relates specifically to EMRs for mental
disorders. Although valuable, they do not define specialized data elements for EMRs
for mental disorders. In our study, some data categories and data elements are similar
to those of these projects; however, more specialized data elements in the field of
mental disorders were identified for EMRs.
As this study was conducted in three hospitals in one country, our results may not
be generalizable to other countries. Additionally, in this study, we considered mental
disorders in general. Therefore, this research provides fundamental findings for
further studies. Future studies may consider specific mental disorders to identify more
specialized data elements. Lastly, developing data elements is a preliminary step
towards developing EMR systems. Further studies should focus on developing use
cases and the EMR system itself.
In conclusion, we are confident that the results of this study will be of great
assistance to mental health centers that want to implement electronic medical records.
Identifying these elements provides at least an overview that can help information
system developers and EMR vendors facilitate and accelerate the development of
such systems and reduce the likelihood of system failure. In addition, the results of
this study can be useful for mental health managers who intend to implement
electronic medical record systems, helping them to plan more accurately and increase
system effectiveness.

References

[1] N.A. Latha, B.R. Murthy, U. Sunitha, Electronic health record. International Journal of Research in
Engineering and Technology. 1(10) (2012), 1-9.
[2] A. Sheikhtaheri, N. Hashemi, N.A Hashemi, Benefits of using mobile technologies in education from the
viewpoints of medical and nursing students. Studies in Health Technology and Informatics 251 (2018),
289-292.
[3] C. Chao, H. Hu, C.O. Ung, Y. Cai, Benefits and challenges of electronic health record system on
stakeholders: a qualitative study of outpatient physicians. Journal of Medical Systems 37(4) (2013),
9960.
[4] N. Menachemi, T. Collum, Benefits and drawbacks of electronic health record systems. Risk
Management and Healthcare Policy, 4 (2011). 47-55.
[5] J. King, V. Patel, E. W. Jamoom, M.F. Furukawa, Clinical benefits of electronic health record use:
national findings. Health Services Research, 49 (1 Pt 2) (2013). 392-404.
[6] S. Malekzadeh, N. Hashemi, A. Sheikhtaheri, N. Hashemi. Barriers for implementation and use of health
information systems from the physicians' perspectives. Studies in Health Technology and Informatics,
251 (2018), 269-272.
[7] R.F. Stewart, P.J. Kroth, M. Schuyler, R. Bailey, Do electronic health records affect the patient-
psychiatrist relationship? A before and after study of psychiatric outpatients. BMC Psychiatry 10 (1)
(2010), 3.
[8] M. Jebraeily, Z. Piri, B. Rahimi, N. Ghasemzade, M. Ghasemirad, A. Mahmodi, Barriers of electronic
health records implementation. Health Information Management 8(6) (2012), 807-814.
[9] N. Mirani, H. Ayatollahi, H. Haghani, A survey on barriers to the development and adoption of electronic
health records in Iran. Journal of Health Administration. 15 (50) (2013), 65-75.
[10] W.S. Weintraub, R.P. Karlsberg, J.E Tcheng, et al, ACCF/AHA 2011 key data elements and definitions
of a base cardiovascular vocabulary for electronic health records: A report of the American College of
Cardiology Foundation/American Heart Association Task Force on clinical data standards. Journal of
the American College of Cardiology. 58 (2) (2011), 202-222.
[11] M. Abdelhak, M.A. Hanken, Health information: Managing a strategic resource. 5th edition, Elsevier.
[12] F. Sadoughi, S. Nasiri, M. Langarizadeh, Necessity for designing national minimum data set of perinatal
period in Iran: a review article. Iranian Journal of Basic Medical Sciences. 5(57) (2014), 727-737.

[13] A. Mohammadi, M. Ahmadi, A. Bashiri, Z. Nazemi, Designing the minimum data set for orthopedic
injuries, Journal of Clinical Research in Paramedical Sciences. 3(2) (2014), 75-83.
[14] F. Rafii M. Ahmadi, F. Hoseini, M. Habibi Koolaee, Nursing minimum data set: an essential need for
Iranian health care system. Iran Journal of Nursing. 24(71) (2011), 19-27.
[15] M. Darabi, A. Delpisheh, E. Gholamiparizad, M. Nematollahi. R .Sharifian. Designing the minimum
data set for Iranian children’ health records. Scientific Journal of Ilam University of Medical Sciences.
24(1) (2016).
[16] M. Ahmadi, A. Bashiri, A minimum data set of radiology reporting system for exchanging with
electronic health record system in Iran. Payavard Salamat. 8(2) (2014).
[17] H. Lotfnezhadafshar, Z. Zareh Fazlollahi, M. Khoshkalam, Comparative study of mental health registry
system of United Kingdom, Malaysia and Iran. Health Information Management. 6 (11) (2009), 1-11.
[18] A. Rezaei Ardani, L. Ahmadian, K. Kimiyafar, F. Rohani, Z. Ebnehoseini, Comparative study of data
elements in psychiatric history and assessment forms in selected countries. Journal of Health and
Biomedical Informatics. 3(1) (2016), 57-64.
[19] T. Square, B. Lane, Mental Health Minimum Data Set (MHMDS). 2012 [Accessed 4th Nov. 2018].
Available from: http://content.digital.nhs.uk/article/4865/Mental-Health-Minimum-Data-Set-MHMDS.
[20] Australian Institute of Health and Welfare. Patient care national minimum data set: national health data
dictionary, version 12. National Health Data Dictionary. Cat. no. HWI 48. Canberra: AIHW. 2003
[Accessed 4th Nov. 2018]. Available from: www.aihw.gov.au/publication-detail/?id=6442467503.
[21] S.A. von Esenwein, B.G. Druss, Using electronic health records to improve the physical healthcare
of people with serious mental illnesses: a view from the front lines. International Review of Psychiatry
26(6) (2014), 629-637.
[22] A. Takian, A. Sheikh, N. Barber, We are bitter, but we are better off: case study of the implementation
of an electronic health record system into a mental health hospital in England. BMC Health Services
Research 12 (2012), 484.
[23] D. Robotham, M. Mayhew, D. Rose, T. Wykes. Electronic personal health records for people with
severe mental illness; a feasibility study. BMC Psychiatry. 15 (2015) 192.
[24] K. Kimiafar, M. Sarbaz, A. Sheikhtaheri, A. Azizi. The impact of management factors on the success
and failure of health information systems, Indian Journal of Science and Technology 8 (2015) 1-9.
[25] A. Sheikhtaheri, F. Sadoughi, M. Ahmadi, Developing Iranian patient safety indicators: an essential
approach for improving safety of healthcare, Asian Biomedicine 7 (3) (2013) 365-373.
[26] Health informatics- The patient Summary for unscheduled, cross border care. Draft European Standard
IPS, 2018. [Accessed 18 Mar 2019]. Available from: http://www.ehealth-standards.eu/en/documents/
dHealth 2019 – From eHealth to dHealth 33
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-33

Exchanging Appointment Data Among Healthcare Institutions

Philip KYBURZ a,1, Sascha GFELLER a,1, Thomas BÜRKLE a and Kerstin DENECKE a,2
a Bern University of Applied Sciences, Biel, Switzerland

Abstract. The introduction of national electronic patient records such as the
electronic patient dossier (EPD) in Switzerland provides a new basis for digitizing
healthcare processes at a national level. One process, however, that is currently
neglected within the Swiss EPD is the scheduling process in healthcare. The
objective of this work is to analyze the appointment scheduling process and the
involved IT systems in order to develop an appointment data structure and a concept
for cross-institutional exchange of appointment data. The analysis showed that
various outpatient and inpatient information systems support appointment booking
through proprietary solutions. A true standard for appointment data exchange is
missing. We suggest an appointment data structure and a corresponding data
exchange process based on the FHIR standard. In its current implementation, the
Swiss EPD does not support this proposed appointment scheduling process. We
discuss how potential additions such as the IHE Care Services Discovery (CSD)
profile can provide better compatibility.

Keywords. appointment, scheduling, cross-institutional data exchange, FHIR

1. Introduction

Switzerland will launch a national electronic patient record named electronic patient
dossier (EPD) in 2020 [1]. The EPD supports document-based, patient-related,
cross-institutional data exchange based upon IHE (Integrating the Healthcare
Enterprise), using profiles such as XDS (Cross-Enterprise Document Sharing) and
XCA (Cross-Community Access) [2]. The EPD content will be a patient-related
collection of CDA (Clinical Document Architecture) documents. For semantic
interoperability, various Swiss CDA document types are currently being defined, e.g.
CDA-CH-EMED to support the medication process [3].
Scheduling and appointment making is an essential process in in- and outpatient care.
For inpatient care, scheduling is typically supported with HL7 V2.x messages [4] or,
within radiology departments, with DICOM and IHE. DICOM provides sophisticated
workflow management among RIS (Radiology Information System), modalities and
PACS (Picture Archiving and Communication System) using DICOM Modality
Performed Procedure Steps (DICOM MPPS) [5]. DICOM MPPS standardizes procedure
step states such as “planned”, “scheduled”, “active”, “completed” etc.

1 Contributed equally.
2 Corresponding Author: Kerstin Denecke, Bern University of Applied Sciences, Quellgasse 21, Biel,
Switzerland, E-Mail: [email protected].
34 P. Kyburz et al. / Exchanging Appointment Data Among Healthcare Institutions

In comparison, scheduling and appointment making across institutions, e.g. when the
general practitioner (GP) schedules an X-ray examination at a nearby hospital, is still
far from being standardized. Many proprietary individual solutions exist, e.g.
hospital-specific web portals [6], and these are perceived by patients as a positive
innovation [7]. There are currently no possibilities to include such scheduling data in
the EPD.
A national standardization for the digital appointment process could provide various
advantages. For example, the no-show rate (patients who do not appear for an
appointment) could be reduced [8, 9] and the workload for booking appointments in
healthcare institutions could be reduced [6]. Therefore, the aim of this work is to define
a foundation for an open and national cross-institutional standard for the exchange of
appointment data. This task splits into two parts. First, a generic data structure for
appointments is proposed and second, the process and the corresponding data exchange
methods are defined.

2. Material and Methods

Our starting point was the development of a mobile patient navigator app [10], which
enables patients to look up their current appointments; these appointments may be
altered by their healthcare professionals.
In a following step, we examined some exemplary inpatient [11] and outpatient [12]
information systems with regard to appointment data exchange with such a navigator app.
In addition, several online outpatient appointment booking tools were searched and
analyzed, namely Medicosearch.ch, Docbox.ch, Doctena.ch, Samedi.de. The objective
of this analysis was to identify the data types that are stored in these systems with respect
to an appointment. Based on the results, we could identify common data types to derive
an appointment data structure.
Next, a Medline literature review was carried out with the following search terms:
Computerized appointment scheduling, Cross institution appointment scheduling, Cross
sector AND appointment, Cross sector AND scheduling, Web based appointment
scheduling. The aim of retrieval was to identify existing solutions for cross-institutional
appointment communication. The results of the search were filtered for publications
dealing with the scheduling process. There were no restrictions on the publication date.
In a fourth step, the existing standards for scheduling in healthcare were surveyed to
check whether they are appropriate to be used in a comprehensive and cross-institutional
appointment scheduling solution. Thus, we analyzed the Appointment Resource of FHIR
[13], the HL7 V2.x SIU messages [4], and IHE profiles dealing with appointment
booking and the EPD. The derived data structure was then compared with the exchange
standards mentioned above and supplemented when necessary.

3. Results

3.1. Existing standards for appointment data

We were unable to detect any publication reporting on a nationwide standardized
electronic medical appointment booking system. The analysis showed instead that
various in- and outpatient clinical information systems support some kind of
appointment booking, but the implementation and the necessary data to be collected
varied. This was
particularly evident in the area of possible appointment types. The pre-defined values
differ in their degree of detail among systems. This can lead to difficulties in cross-
institutional communication. Links between appointments within or beyond an
institution or aggregation of appointments to a treatment episode were missing.
For inpatient care, HL7 V2 supports Scheduling Information Unsolicited Messages
(SIU) [4]. SIU supports 14 different trigger events to notify applications of appointment
changes. All events use a common message format. SIU-S12 for example is the event
for notification of a new appointment. The SCH segment contains information regarding
the appointment, such as IDs, reason, and duration. It also shows who booked the
appointment and its status. The TQ1 segment specifies the timing in more detail; an
appointment can, for instance, have a repetition, i.e. be booked weekly.
Different segments specifying patients, services, devices, rooms and service providers
for an appointment may be added.
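As a rough illustration of the SIU message structure described above, here is a simplified, fabricated SIU^S12 message with a naive pipe-delimited parse. Real SIU messages carry many more fields per segment; all identifiers below are invented.

```python
# Minimal illustration of an HL7 V2 SIU^S12 (new appointment notification)
# message. Segment content is fabricated and heavily simplified; segments are
# separated by carriage returns and fields by "|".

siu_s12 = "\r".join([
    "MSH|^~\\&|GP_SYS|GP_PRACTICE|HIS|HOSPITAL|20190401||SIU^S12|MSG0001|P|2.5.1",
    "SCH|APPT123||||||ROUTINE^Routine checkup|CHECKUP|30|min",
    "PID|1||PAT456||Muster^Hans",
    "AIL|1||ROOM12^Examination room 12",      # location resource segment
    "AIP|1||DRMEIER^Meier^Anna|ATND",          # personnel resource segment
])

# Naive parse: map each segment name to its list of fields.
segments = {line.split("|")[0]: line.split("|") for line in siu_s12.split("\r")}

print(segments["MSH"][8])   # message type: SIU^S12
print(segments["SCH"][1])   # placer appointment ID: APPT123
```

A production system would use a proper HL7 parsing library rather than string splitting, since fields can contain escaped separators and repetitions.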
Fast Healthcare Interoperability Resources (FHIR) [13] is an emerging standard
hosted by HL7.org where appointments are mapped with the appointment resource. An
appointment resource contains fields for start and end time, duration, location and the
participants of an appointment. Using additional FHIR resources such as the Slot and the
Schedule resources, the whole booking process can be addressed in FHIR. Furthermore,
through the use of the Subscription resource, FHIR supports that different participants
can be automatically notified about the change of a resource. The communication
between FHIR endpoints is realized through a REST API and the transmitted data can
be either XML or JSON formatted. Therefore, FHIR not only addresses the data structure
itself but also the communication through which the data is exchanged. However, the
FHIR standard is not a document-based standard and is therefore not directly compliant
to the EPD.
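A minimal FHIR Appointment resource along these lines might look as follows, serialized as JSON. All identifiers and references are fabricated for illustration.

```python
# A minimal FHIR Appointment resource in JSON form, covering the fields
# discussed above (status, start/end, duration, participants). References
# such as "Patient/example-patient" are fabricated.
import json

appointment = {
    "resourceType": "Appointment",
    "status": "booked",
    "description": "X-ray examination requested by GP",
    "start": "2019-05-06T09:00:00+02:00",
    "end": "2019-05-06T09:30:00+02:00",
    "minutesDuration": 30,
    "participant": [
        {"actor": {"reference": "Patient/example-patient"},
         "status": "accepted"},
        {"actor": {"reference": "Practitioner/example-radiologist"},
         "status": "accepted"},
        {"actor": {"reference": "Location/example-xray-room"},
         "status": "accepted"},
    ],
}

payload = json.dumps(appointment, indent=2)
print(payload)
```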
The so-called CDA-CH standard, a Swiss adaptation of the CDA, is used for the
EPD. CDA is part of the HL7 V3 standard. In the current CDA-CH document v.2.0.3,
the term "Appointment" is not mentioned at all [15]. CDA-CH together with the XDS.b
profile of IHE provides a document-based infrastructure for the Swiss EPD. An
appointment is currently not considered as a document.
A specific use case for the communication of event data via the XDS.b profile has
not yet been defined by IHE. The Eye Care Appointment Scheduling (ECAS) profile
demonstrates the process for scheduling appointments [16]. This profile can serve as a
possible basis for the implementation of cross-institutional appointment data
communication. However, if this profile is to be implemented across institutions, the
transactions should be adapted or redefined. Various transactions originate from the
Radiology Framework of IHE (RAD-1, RAD-12, ...) that use the standard HL7 V2.x.
This is not optimal for cross-institutional communication. Furthermore, this profile
would have to be integrated into an XDS environment without the loss of dynamic
communication. For cross-institutional communication, it is essential to detect the
appropriate service provider in order to book an appointment. With the IHE CSD Profile
[17], this search process can be supported. CSD provides a register with the available
service providers. A query could, for example, return all orthopedists in the area of Berne
in Switzerland.
The results of this analysis demonstrate that there is no off-the-shelf solution
available for cross-institutional appointment data exchange. However, existing standards
can provide some foundations.
3.2. Requirements for appointment data structure

In previous work [10], we collected requirements for an appointment format. They
comprise three mandatory criteria: the appointment format must 1) be suited for use
across institutions, 2) support outpatient and inpatient appointments, and 3) be of
benefit for patients, i.e. patient-supporting applications should be enabled by the
format. The appointment data structure should be able to map the information given in
Table 1. For the types of appointment, appointment status and prioritization, specific
catalogue values should be defined to ensure a normalized labeling. An appointment can
be linked to one or more patients, care providers, rooms, devices, documents and
services. Each of these items must also have a status that indicates whether the
appointment was accepted or rejected (e.g. whether a room could be blocked for an
appointment or a specific physician was scheduled for the appointment).
Table 1. Appointment data structure.
Name                             Data type / Pointer         Required
Appointment date with time       Date                        Yes
Appointment duration in minutes  Integer                     No
From                             Date (Time)                 Yes
To                               Date (Time)                 No
Type of appointment              String                      Yes
Appointment status               String                      Yes
Prioritization                   String                      No
Reason of visit                  String (free text)          No
Patient                          Pointer to PatientID        Yes
Care provider                    Pointer to CareProviderID   Yes
Room                             Pointer to RoomID           No
Medical device                   Pointer to DeviceID         No
Documents                        Pointer to DocumentID       No
Service                          Pointer to ServiceID        No
Institution                      Pointer to InstitutionID    Yes
Description                      String                      No
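The structure in Table 1 can be sketched as a data class. This is an illustrative mapping (the field names are ours, not part of the proposal): required fields have no default, optional fields default to None, and the ID pointers are represented as plain strings.

```python
# Illustrative Python data class mirroring the appointment data structure in
# Table 1. Field names and the example IDs are invented for this sketch.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class Appointment:
    date: datetime                       # appointment date with time (required)
    from_time: datetime                  # "From" (required)
    type_of_appointment: str             # catalogue value (required)
    status: str                          # catalogue value (required)
    patient_id: str                      # pointer to PatientID (required)
    care_provider_id: str                # pointer to CareProviderID (required)
    institution_id: str                  # pointer to InstitutionID (required)
    duration_minutes: Optional[int] = None
    to_time: Optional[datetime] = None   # "To" (optional)
    prioritization: Optional[str] = None
    reason_of_visit: Optional[str] = None
    room_id: Optional[str] = None
    device_id: Optional[str] = None
    document_ids: List[str] = field(default_factory=list)
    service_id: Optional[str] = None
    description: Optional[str] = None

appt = Appointment(
    date=datetime(2019, 5, 6, 9, 0),
    from_time=datetime(2019, 5, 6, 9, 0),
    type_of_appointment="consultation",
    status="booked",
    patient_id="PAT456",
    care_provider_id="DRMEIER",
    institution_id="HOSP01",
)
print(appt.status)
```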

According to our analysis, the FHIR Appointment resource is suitable for
implementing this data structure because it completely maps the proposed appointment
data structure for an individual appointment. In addition to the definition of an
individual appointment, two supplementary data structures should be defined through
which individual appointments can be linked with each other. These additional
structures serve the following purposes:
- illustration of an inpatient case
- mapping of an entire treatment episode
The data structure for the inpatient case should contain the following information:
date of entry, date of discharge, patient, institution, subordinated appointments, room the
patient stays in and the department. The individual appointments that take place during
the inpatient case can then be subordinated to this data structure. The second additional
data structure should include the possibility to represent a treatment episode of a patient
as for example a total hip endoprosthesis. Therefore, the data structure is a list to which
the different individual appointments and inpatient cases can be mapped.
The individual appointment, the inpatient case and the treatment episode can then
be linked with each other in a tree structure as depicted in figure 1.
Figure 1. Tree structure comprising several appointments, either individual or appointments in an inpatient
case, aggregated in a treatment episode
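The tree structure of Figure 1 can be sketched as nested records: a treatment episode aggregates individual appointments and inpatient cases, and an inpatient case in turn groups the appointments taking place during the stay. All names and IDs below are illustrative.

```python
# Sketch of the appointment / inpatient case / treatment episode tree from
# Figure 1. Appointments are referenced by (invented) IDs for brevity.
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class InpatientCase:
    patient_id: str
    institution_id: str
    entry_date: str
    discharge_date: str
    department: str
    room: str
    appointments: List[str] = field(default_factory=list)  # subordinated appointments

@dataclass
class TreatmentEpisode:
    name: str
    # an episode may contain individual appointments (IDs) and inpatient cases
    items: List[Union[str, InpatientCase]] = field(default_factory=list)

episode = TreatmentEpisode(name="Total hip endoprosthesis")
episode.items.append("APPT-GP-CONSULT")  # individual outpatient appointment
case = InpatientCase("PAT456", "HOSP01", "2019-05-06", "2019-05-12",
                     "Orthopedics", "Room 12",
                     appointments=["APPT-SURGERY", "APPT-PHYSIO-1"])
episode.items.append(case)
print(len(episode.items))
```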

3.3. Appointment booking process

This section describes a possible process for the communication of appointment data. As
already considered in the mentioned FHIR resource, during the scheduling process, the
appointment can adopt various states. These states serve as the basis for this conceptual
communication process (fig. 2).

Figure 2. The appointment booking process using the various states of an appointment

We propose a minimum of seven states for an appointment: suggested, pending,
booked, showed up, cancelled, no-show, and completed. As soon as a request is made
for an appointment, the appointment assumes the state "suggested". A provisional date
is proposed, which can still be adjusted by the other parties involved. After defining
the basic data of the appointment, its state is set to "pending". For a "pending"
appointment, all key data is defined and all participants are invited. If all required
participants agree to
the appointment, it will be changed to state “booked”. If not, the appointment will be
“cancelled”.
For “booked” appointments the date is fixed. If the patient checks in at the providing
institution, the state is altered to “showed up”. Once the appointment is finished the state
is set to “completed”. If one of the participants is no longer able to attend, he should
“cancel” in advance. The state “no-show” indicates that the appointment is scheduled,
but the patient did not appear and did not cancel the appointment in advance.
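The state model described above can be sketched as a small transition table. The allowed transitions are our reading of the prose, not a normative definition.

```python
# Sketch of the seven appointment states and their transitions as described
# in the text. Terminal states (cancelled, no-show, completed) allow none.

TRANSITIONS = {
    "suggested": {"pending", "cancelled"},
    "pending":   {"booked", "cancelled"},
    "booked":    {"showed up", "cancelled", "no-show"},
    "showed up": {"completed"},
    "cancelled": set(),
    "no-show":   set(),
    "completed": set(),
}

def advance(state, new_state):
    """Move to new_state if the transition is allowed, else raise."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state!r} -> {new_state!r}")
    return new_state

# Happy path: a suggested appointment is booked, attended, and completed.
s = "suggested"
for nxt in ("pending", "booked", "showed up", "completed"):
    s = advance(s, nxt)
print(s)  # → completed
```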
During the scheduling process, the defined conditions are monitored and influenced
by four different actors (table 2).
Table 2. Actors involved in the appointment booking process
Actor                     Description
Participant               All entities participating in an appointment. This can be a
                          person such as a patient or a health professional, but may also
                          include other entities such as an MRI or an operating room.
Requester of appointment  Participant who starts the initial appointment process by making
                          a suggestion for an appointment.
Healthcare institution    Institution where the appointment takes place.
EPD community             Refers to the community to which the healthcare provider belongs.

Communication between the various actors is divided into individual steps. Each
step represents the exchange of data between two actors. Standards and IHE profiles are
proposed for realizing the various steps. The state of the appointment is also changed in
some steps. In Figure 3, the individual communication steps are shown graphically and
explained in more detail in the following.

Figure 3. Communication among the various actors during an appointment booking process
(1) Through the IHE CSD profile, the requesting person (Service Finder) uses the
ITI-73 transaction against the EPD Community (Care Services InfoManager) to search
for an institution and receive corresponding information including the FHIR endpoint. To
realize this, the EPD functionality would have to be extended by the IHE Care Service
Discovery (CSD) profile. (2) The availability schedule of the institution and, if
necessary, of other required participants will be retrieved using the FHIR schedule and
slot resources. (3) When a free slot has been found, an appointment request will be sent.
In this step, the actual FHIR appointment resource is created and sent to the institution.
(4) All participants will be informed about the appointment request. (5) The individual
participants can confirm or reject the appointment request using the FHIR
AppointmentResponse resource. (6) In the previous steps, the appointment
communication was carried out via a system of the corresponding institution. Once the
state is set to booked, the FHIR Appointment Resource can be converted to a CDA-CH
document and uploaded to the EPD community. (7) If a “booked” appointment can no
longer be attended by a participant, it should be “cancelled”. A new version of the
appointment document with the status “cancelled” will be created and uploaded to the
corresponding community. (8) In order to notify all participants of the cancellation, the
EPD must be extended with the IHE Document Metadata Subscription (DSUB) profile.
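To make steps (3) and (5) concrete, the exchanged FHIR payloads can be sketched as plain resources (a hedged sketch: the fields follow the FHIR R4 Appointment and AppointmentResponse definitions, but all identifiers and references are hypothetical):

```python
# Step (3): the requester creates an Appointment referencing a free Slot.
# FHIR's "proposed" status corresponds to the paper's "suggested" state.
def build_appointment(slot_id: str, participant_refs: list) -> dict:
    return {
        "resourceType": "Appointment",
        "status": "proposed",
        "slot": [{"reference": f"Slot/{slot_id}"}],
        "participant": [
            {"actor": {"reference": ref}, "status": "needs-action"}
            for ref in participant_refs
        ],
    }

# Step (5): each participant confirms or rejects via an AppointmentResponse.
def build_response(appointment_id: str, actor_ref: str, accept: bool) -> dict:
    return {
        "resourceType": "AppointmentResponse",
        "appointment": {"reference": f"Appointment/{appointment_id}"},
        "actor": {"reference": actor_ref},
        "participantStatus": "accepted" if accept else "declined",
    }
```

Once every required participant has answered "accepted", the institution's system would set the Appointment status to "booked" (step 6) before converting the resource to a CDA-CH document for upload.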

4. Discussion

Our initial thinking started with the upload of scheduling documents within the EPD. We
quickly realized that this approach causes several challenges: every change in the date
of an appointment as well as every response of a participant result in a new document
version. Moreover, every appointment change requires a complete download of the
whole document and the upload of the modified version. In the current state of the EPD,
patients can only read their documents and not actively manipulate them. Thus, a patient
would be unable to suggest or confirm a date for an appointment. Furthermore, within
the Swiss EPD architecture, it is impossible to upload documents that are not directly
patient-related. Therefore, healthcare providers and other possible participants could not
provide their availability in the form of their own schedules.
For this reason, we propose the (combined) use of FHIR as a potential alternative to
the direct document-based mirroring of the scheduling process within the EPD. We are
fully aware that the additional use of FHIR implies a significant additional effort. Every
healthcare institution will be required to provide its own FHIR endpoint with the
schedules of the bookable resources. Because of this additional effort, the whole national
scheduling process in the suggested form should be considered as an optional extension
which institutions can freely choose to implement and use.
An additional effort is necessary to convert the FHIR appointment resource to a
CDA document as soon as the appointment is booked. The scheduling process itself
could be performed without the conversion of the document and its upload to the EPD.
For this reason, it should be further analyzed whether and when the appointment should be
uploaded to the EPD during this process. Once the upload is completed, the patient is no
longer able to actively manipulate the appointment because of the mentioned limitations of
the EPD. Nevertheless, possible benefits of persisting an appointment as a document in
a national patient record concern the reuse of the appointment data. Documents resulting
from an appointment could be directly linked to this visit, providing an additional
opportunity to sort and search documents of a patient. On the other hand, the total number
of EPD documents grows considerably.
The proposed inter-sectorial appointment scheduling process has been designed for
integration with the Swiss EPD. We did not explicitly examine the compatibility with
other national health record implementations. During analysis of the Swiss EPD
infrastructure and the corresponding standards it turned out that such integration requires
additions such as the integration of the IHE CSD profile and the IHE DSUB profile.
Although the IHE CSD profile would also provide the possibility to check the availability
of a service by using the transaction ITI-75 based on CalDAV (RFC 4791), we decided
to use the IHE CSD profile to retrieve the FHIR endpoint. The main reason for this
decision lies in the possibility to include further FHIR resources in the process in future
steps. For example, the Questionnaire resource could be utilized to request further
information.
This concept provides the foundation for a possible solution to digitalize the cross-
institutional appointment booking process. FHIR is still a new and emerging standard
and further work is required to demonstrate the practicability of our proposal. The next
step is the validation through experts and the implementation of a proof of concept. This
should include at least an implementation for one institution. This proof of concept can
provide further information that can be used to parametrize the FHIR resources and the
possible appointment document for a national standardization. It also should be evaluated
if there is a need for the upload of the appointment document once the appointment is
booked. In parallel, the integration of the IHE CSD profile to a national patient record
infrastructure such as the Swiss EPD should be considered.

dHealth 2019 – From eHealth to dHealth 41
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-41

Application of Named Entity Recognition Methods to Extract Information
from Echocardiography Reports

Szabolcs SZEKÉR a, György FOGARASSY b, Károly MACHALIK a
and Ágnes VATHY-FOGARASSY a,1
a Department of Computer Science and Systems Technology, University of Pannonia, Hungary
b State Hospital of Cardiology, Hungary

Abstract. As there is no consensus about how to store the results of echocardiography examinations, information extraction from them is a non-trivial task. Successful named entity recognition (NER) is key to getting access to the stored information, and the process of identification has been recognized as a bottleneck in text mining. Our goal was to develop and compare NER methods capable of achieving this task. Our practical results show that the text mining-based NER method is able to perform at a similar level in finding and identifying terms as the regular expression-based NER method. The paper highlights the advantages and disadvantages of both methods.

Keywords. data mining, text mining, named entity recognition, echocardiography, electronic medical record

1. Introduction

Most medical institutes use Electronic Medical Records (EMR) to record and store
information about their patients, including diagnostics, performed treatments and their
results. The EMR is a valuable information source for medical analysis; however, it is usually
incomplete or redundant, making data mining a difficult and challenging task. This is
especially true in the case of echocardiography reports. Generally, echocardiography reports
can be divided into two parts in terms of diagnostic content: in the first semi-structured
part diagnostic results are stored in the form of term-value pairs (e.g.: interventricular
septum: 14 mm), and in the second part results are recorded as free text written in natural
language (e.g.: mild left ventricular hypertrophy). As there is no consensus about how to
store the results of echocardiography examinations, and this varies across different
medical institutes, the processing of echocardiography reports is a non-trivial task. The present
paper focuses on how to process the first, semi-structured part of echocardiography
reports. As processing of the free-text part requires quite different methods, including
Natural Language Processing (NLP) techniques, we do not deal with it in this paper.

1 Corresponding Author: Ágnes Vathy-Fogarassy, Department of Computer Science and Systems Technology, University of Pannonia, 2. Egyetem Str., 8200 Veszprém, Hungary, E-Mail: [email protected]
42 S. Szekér et al. / Application of Named Entity Recognition Methods to Extract Information

Generally, information extraction from medical texts focuses on the following two
tasks: named-entity recognition (NER, or term extraction), and relation extraction (RE).
Named-entity recognition refers to the process of identifying particular types of names,
terminologies or symbols in documents, while relation extraction identifies the relation
between them [1]. Successful term identification is key to getting access to the stored
information and the process of identification has been recognized as a bottleneck in text
mining [2]. The process of term identification is usually done in three steps: the first step
is term recognition; the second step is term classification; and the last step is term
mapping [2].
There are two possible approaches to solve this task. The first approach is to directly
search for specific terms (e.g. aortic root, ejection fraction) in documents. Direct term
search always relies on a specialized dictionary to recognize and classify medical
terminology, and the performance of this approach heavily depends on the coverage and
quality of the dictionary. The acquisition of such knowledge is a time-consuming task.
Direct search can also be extended by pattern search, which requires a priori knowledge
about the structure of the processed text (e.g. use of colon between terms and values,
order of terms, various expletives). With this extension, it becomes possible to recognize
terms and their measured value (e.g. aortic root: 27 mm) together.
Other term extraction methods also exist which utilize classical text mining
techniques. These text mining-based solutions do not need a predefined dictionary to
extract terms from the text, but simply collect every occurrence of word sequences that
are possibly valid terms. However, these methods require a text pre-processing phase
(including text cleaning), and term candidates must be identified and mapped onto a
dictionary after term extraction.
In the literature, several international studies have been published that address
echocardiography report processing [3-10]. They are mostly based on the direct search
approach, but some of them apply text-mining methods as well. In the published studies,
the aim is typically the extraction of only one specific parameter, such as the ejection fraction
(EF). Garvin et al., Kim et al., and Xie et al. all successfully extracted this parameter
from free text documents and described practical extraction techniques [3-5]. In [6] a
natural language-based method was presented which uses a predefined dictionary, expert
rules and predefined patterns to extract echocardiography measurements from
documents. In this study, a pattern-matching algorithm was created and tested to extract
term candidates from a large set of clinical notes. The presented method relies heavily
on pattern matching, but it can also identify possible misspellings and synonyms by
iterative extraction. Wells et al. also successfully extracted a set of predefined parameters,
including wall thicknesses, chamber dimensions or flow velocities [7]. They applied
NLP to parse the most frequently measured dimensions and used outlier analysis to filter
out unrealistic values. Toepfer et al. developed and evaluated an information extraction
component with fine-grained terminology that enabled them to recognize almost all
relevant information stated in German transthoracic echocardiography reports at the
University Hospital of Würzburg [8]. Jonnalagadda et al. described an information
extraction-based approach that automatically converts unstructured text into structured
data, which is cross-referenced against eligibility criteria using a rule-based system to
determine which patients qualify for a heart failure with preserved ejection fraction
(HFpEF) clinical trial [9]. In [10], Renganathan proposed text mining techniques that
enable the extraction of unknown knowledge from unstructured documents.
As we can see, all the suggested methods report successful medical text processing
but were implemented in different ways. However, until now, there was no analysis
published that would compare the two basic approaches. The purpose of our research
was to examine how well a text mining-based solution fares against a direct term search-
based method in processing medical, especially echocardiography documents, and
whether it is able to outperform it or not. For this purpose, we implemented both
approaches, processed the same corpus of echocardiography reports with them, and
compared the results. Our results are primarily valid for the analysis of echocardiography
reports, but we believe that they might also hold for the extraction of information stored
in term–value pairs from other medical documents.
The structure of this document is as follows. In Section 2, we give a brief overview
of the challenges faced when processing echocardiography reports and present two
fundamentally different methods to extract, identify, and map terms. In Section 3, the
used dataset and the evaluation process are described and the result of the analysis is
presented. Finally, in Section 4, general experiences and future developments are
discussed.

2. Methods

The most vexatious problem with echocardiography reports is that there is no unified
process for recording patient data. The form of recorded information varies from one
medical institute to another. Furthermore, not only the location of data recording
is an influencing factor: medical assistants or doctors record the results according to
their own habits and, owing to the lack of a unified recording interface, the free text
contains many typos as well.
In our study, two methods have been realized to extract terms from the first, semi-
structured part of echocardiography reports. The first method is a general regular
expression-based method which processes raw text, meaning that there is no pre-cleaning
applied, and assumes that terms and their measurement results are separated by a colon.
The second method is based on traditional text mining methods. In this case, the raw text
is first cleaned and then the cleaned text is processed. This method searches for numerical
values and assumes that there is a term before and a unit of measurement (if needed) after
each numerical value.
The main difference between the two methods lies in the text preparation phase. The
regular expression-based method processes raw text and assumes, based on a priori
knowledge, that the term and their measurement result pairs follow a certain pattern, e.g.
they are separated by a colon, while the text mining-based method cleans and
manipulates the text in such a way that it becomes easier to process. Furthermore, the
text mining-based method does not rely on any a priori knowledge about the medical text
to process them. These two methods are introduced in detail in the following subsections.

2.1. Regular expression-based NER

The regular expression-based NER method uses regular expressions to extract terms
from echocardiography reports. A regular expression is a sequence of characters that
defines a search pattern. Usually this pattern is used by string searching algorithms for
"find" or "find and replace" operations on strings. The regular expression-based method
processes raw text, meaning that the data is processed as it is, no pre-cleaning methods
are applied. Furthermore, the regular expression-based processing method, based on a
priori knowledge, presumes that every term and the adherent value is separated by a
colon (in our case, but it can be separated by any other predefined separator character as
well) and the applied regular expressions are built upon this assumption and knowledge.
In our study, firstly simpler regular expressions have been defined on which more
complex expressions were based. These rudimentary regular expressions include
expressions for terms, values, units and extended units. Sample expressions for terms and
values are the following:
terms r'(?P<term>(?!\d)\w\D+)' (1)
values r'(?P<digits>\d[\d,.+\-x/*]*)' (2)
The first expression defines that terms cannot start with a number and one or more non-
numeric characters follow a word character. The second expression defines the values in
such a way that they can be integers (e.g. 27), decimal numbers (e.g. 12.5), ranges of
values separated by a hyphen (e.g. 25-28, 12.4-12.7) or a multi-dimensional value
specified by an "x" character (e.g. 27x13).
Using these rudimentary expressions more complex expressions can also be
constructed. For example, the measurement result is a complex expression, which is a
concatenation of values, some separating whitespace characters and a measurement unit
with affixation taken into account (measurement_result = [values][whitespace
characters][unit][affixation]). An expression for a term–measurement_result pair is the
concatenated form of the term, whitespace characters, a colon, whitespace characters
and the measurement result expressions (term–measurement_result = [term][whitespace
characters]:[whitespace characters][measurement_result]). The flexibility of the
regular expression-based NER comes from its ability to find character sequences
matching the defined patterns regardless of their position in a longer sequence.
Using the previously defined regular expression set, the raw text can be processed.
Owing to the nondeterministic nature of regular expression matching, the
echocardiography reports were processed from start to end. If a string matching an
expression was found, the string was processed, stored, and removed from the document.
These few steps were executed iteratively until no processable string was left.
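The iterative matching described above can be sketched as follows (a minimal sketch built on the paper's expressions (1) and (2); the unit pattern and the match-and-remove loop are our simplified reading, not the authors' full expression set):

```python
import re

# Rudimentary expressions from the paper: a term cannot start with a digit,
# a value may be an integer, decimal, range (25-28) or dimension (27x13).
TERM = r'(?P<term>(?!\d)\w\D+)'
VALUE = r'(?P<digits>\d[\d,.+\-x/*]*)'
UNIT = r'(?P<unit>[a-zA-Z%/]*)'  # hypothetical, simplified unit pattern

# Composite pattern: [term][whitespace]:[whitespace][measurement_result]
PAIR = re.compile(TERM + r'\s*:\s*' + VALUE + r'\s*' + UNIT)

def extract_pairs(raw_text: str) -> list:
    """Iteratively find, store, and remove term-value pairs from raw text."""
    results, text = [], raw_text
    while True:
        match = PAIR.search(text)
        if match is None:
            break
        results.append((match.group('term').strip(),
                        match.group('digits'),
                        match.group('unit')))
        # remove the processed string from the document, as described above
        text = text[:match.start()] + text[match.end():]
    return results
```

On an input such as "aortic root: 27 mm, ejection fraction: 56%", this sketch yields the pairs ("aortic root", "27", "mm") and ("ejection fraction", "56", "%").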

2.2. Text mining-based NER

The second method is a more straightforward approach which utilizes traditional data
mining and cleaning methods. It pre-processes the raw text without any a priori
knowledge about the contents of the documents. As part of the cleaning process, this
method unifies whitespaces, removes all colons, parentheses and unneeded characters. It
is important to note, however, that it does not modify any commas or dots, as these may
serve as the decimal separator depending on the localization of the
recording software. It also unifies the units of measurement based on a predefined list.
The unified and found measures are concatenated to the preceding numerical values
during the pre-processing phase.
The trick of this method is that it assumes that all measured values are numerical
and before every numerical value there is a term and after every numerical value there
can be a unit of measurement present. To remove numerical values that do not
express measurement results, the algorithm modifies, in the pre-processing phase, some
of the measures such as mm2 or cm2 to sqrmm and sqrcm, respectively.
After the text pre-processing phase, the text mining-based NER splits the cleaned
documents into "words" (sequences of characters separated by whitespaces) and searches
for the first occurrence of a "word" starting with a numerical value. The preceding n
words, if n words are present, are considered term candidates. The candidates are then
checked, marked, and stored for later usage. In our case, n = 4 was chosen as the
threshold for the number of words in candidate terms.
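A minimal sketch of this scan (the windowing between consecutive numerical values is our assumption about how overlapping candidates are avoided; the paper itself only fixes n = 4):

```python
def tm_ner_candidates(cleaned_text: str, n: int = 4) -> list:
    """Split cleaned text into words; the up to n words preceding each
    word that starts with a digit form a term candidate for that value."""
    words = cleaned_text.split()
    candidates, prev_numeric = [], -1
    for i, word in enumerate(words):
        if word[:1].isdigit():
            # look back at most n words, but not past the previous value
            start = max(prev_numeric + 1, i - n)
            candidates.append((" ".join(words[start:i]), word))
            prev_numeric = i
    return candidates
```

On pre-processed text in which units have already been concatenated to their values, tm_ner_candidates("aortic root 27mm ejection fraction 56%") returns the candidates ("aortic root", "27mm") and ("ejection fraction", "56%").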

2.3. Identification of complex terms and measurement results

The previously introduced methods are capable of recognizing term–measurement result
pairs; however, more complex sequences are also present in the echocardiography reports,
e.g. term1–term2–measurement_result1–measurement_result2 (e.g. left ventricular
diameter end-diastolic/end-systolic: 54/35 mm).
To find these kinds of expressions as well, in case of the regular expression-based
method, the execution was improved by adding more complex regular expressions. After
repeated testing, we came to the conclusion that by processing the raw text it is not
possible to find every term present in the documents and expanding the regular
expressions for every special occurrence is a difficult, time-consuming task.
The text mining-based method was extended as follows. During the execution, the
found sequence was checked whether it fits the predefined rules, e.g. simple term–
measurement_result pair (e.g. ejection fraction: 56%), term1–measurement_result1–
subterm2–measurement_result2 sequence (e.g. ejection fraction Teichholz: 56%,
Simpson: 52%), or term1–term2–measurement_result1–measurement_result2
sequence (e.g. E/A: 0.4/0.8 m/s). If one of the rules fits the word list, the terms and the
numerical values were stored. These rules were defined as IF-THEN rules. In this form
the rule definition is much easier than the creation of the previously described, more
complex regular expressions.
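One such IF-THEN rule can be sketched as follows (an illustrative sketch: it covers only the term1–measurement_result1–subterm2–measurement_result2 pattern, and the helper names are ours; the mapping of subterm2 back to its parent term is left to the dictionary-based step described below):

```python
def starts_numeric(token: str) -> bool:
    return token[:1].isdigit()

def composite_rule(tokens: list):
    """IF the token sequence is term, value, subterm, value
    THEN emit two term-value pairs; otherwise signal no match."""
    if (len(tokens) == 4
            and not starts_numeric(tokens[0]) and starts_numeric(tokens[1])
            and not starts_numeric(tokens[2]) and starts_numeric(tokens[3])):
        return [(tokens[0], tokens[1]), (tokens[2], tokens[3])]
    return None  # rule does not fit; the caller tries the next rule
```

For example, the token list ["EF", "56%", "Simpson", "52%"] matches the rule and produces the pairs ("EF", "56%") and ("Simpson", "52%").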

2.4. Dictionary-based mapping

The recognized named entities from the processed documents were checked whether
they are valid terms or not by using a dictionary of terms. This dictionary has been
created with the help of a medical expert. The dictionary contains more than 30
terms from the field of cardiology that are likely present in some form in echocardiography
reports, and over 100 synonyms have also been defined for these terms. The previously
defined n = 4 word length of terms stems from the dictionary, as the
maximum length of the terms recorded in the dictionary is 4 words.
The terms extracted in the previous phases were compared against the elements of
the dictionary. The Jaro-Winkler distance [11] was calculated for each comparison and
if the distance was lower than a specified distance threshold, the term was considered
valid and identified. This threshold parameter was defined as the lowest, non-zero intra-
distance of the terms stored in the dictionary. The Jaro-Winkler distance (d_w) can be
calculated in the following way:

d_w(s1, s2) = 1 - sim_w(s1, s2)  (3)
sim_w(s1, s2) = sim_j(s1, s2) + l * p * (1 - sim_j(s1, s2))  (4)

where sim_j is the Jaro similarity of the strings s1 and s2, l is the length of the common
prefix up to a maximum of 4 characters, and p is a constant scaling factor with a standard
value of 0.1. The Jaro similarity (sim_j) is calculated in the following way:

sim_j(s1, s2) = 0 if m = 0, otherwise (1/3) * (m/|s1| + m/|s2| + (m - t)/m)  (5)

where |s_i| is the length of s_i, m is the number of matching characters and t is half of the
number of transpositions. The concepts of matching and transpositions are detailed in [11].
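Formulas (3)-(5) can be implemented directly (a sketch: the matching-window logic follows the standard definition in [11], and the common prefix is capped at 4 characters as stated above):

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity sim_j, formula (5)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    m = 0
    for i, ch in enumerate(s1):          # count matching characters
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == ch:
                match1[i] = match2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    t, k = 0, 0                          # t = half of the transpositions
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (m / len1 + m / len2 + (m - t) / m) / 3

def jaro_winkler_distance(s1: str, s2: str, p: float = 0.1) -> float:
    """d_w = 1 - sim_w, formulas (3) and (4)."""
    sim_j = jaro(s1, s2)
    l = 0                                # common prefix length, capped at 4
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        l += 1
    return 1.0 - (sim_j + l * p * (1.0 - sim_j))
```

For the classic example pair "MARTHA"/"MARHTA" (m = 6, t = 1, prefix l = 3), the distance evaluates to roughly 0.0389; terms whose distance to a dictionary entry falls below the threshold are accepted as identified.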

3. Results

To compare the effectiveness of the previously presented regular expression-based NER
(RE-NER) and text mining-based NER (TM-NER), a corpus containing 20 089
anonymized echocardiography reports has been processed. Each document had a unique
identifier and a basic description of the diagnosis. The first, semi-structured part
contained various terms, mainly in the form [term]:[measurement_result]. Like any
other medical records stored as free text, the echocardiography reports under study also
included typing errors or deficiencies.
Results of both algorithms were evaluated in the following way: for each method–
term pair we counted the number of documents in which the method has found the
specific term. Furthermore, an important part of the evaluation was to identify the
documents in which only one method was able to identify the given term. The number
documents in which only one method was able to identify the given term. The number
of documents matched by any method (N), by RE-NER (N_RE), by TM-NER (N_TM), and
the number of documents exclusively matched by RE-NER (Ne_RE) and by TM-NER
(Ne_TM) can be seen in Table 1. To evaluate the relative success of the methods, we also
calculated the frequencies of the matched documents relative to the number of
documents matched by any method (q_RE, q_TM). Furthermore, the rates of the exclusively
matched documents were also calculated (qe_RE, qe_TM). These results are presented in
Table 2.

Table 1. The number of the most common terms identified by the RE-NER and TM-NER methods.

Term                                    |      N |   N_RE | Ne_RE |   N_TM |  Ne_TM
Left ventricular end-systolic diameter  | 19 598 | 19 464 |    42 | 19 549 |    116
Interventricular septum (end-diastolic) | 19 562 | 19 498 |   109 | 19 491 |     43
Aortic root                             | 19 537 | 19 492 |    66 | 19 476 |     21
Posterior wall (end-diastolic)          | 19 496 | 15 696 |   116 | 19 386 |  3 800
Left ventricular end-diastolic diameter | 19 240 | 19 096 |   102 | 19 147 |     81
Left atrium (M-mode)                    | 19 344 | 19 259 |   208 | 19 144 |     85
E                                       | 18 759 | 18 719 |    44 | 18 723 |     59
EF                                      | 18 768 | 18 640 |   636 | 18 135 |    131
A                                       | 18 458 | 18 421 |   977 | 17 483 |     41
Interventricular septum (end-systolic)  | 14 372 |      2 |     2 | 14 370 | 14 370
Posterior wall (end-systolic)           | 14 310 |     41 |     1 | 14 309 | 14 269
Right ventricle (M-mode)                | 10 656 | 10 448 |   239 | 10 432 |     17
2D right atrial dimensions              | 10 492 | 10 398 |   237 | 10 264 |     11
Most differences between N_RE and N_TM stem from typos, missing spaces or non-
numerical values. During testing, we found that there are cases in which spaces are missing
between some named entities (terms). As TM-NER is based on the list of words, in
such cases this method is unable to find the appropriate term. To handle this kind of failure, it
is suggested to insert separator space characters into the text (for example after the
measurements) during text cleaning. Furthermore, there were occurrences of term–
measurement_result pairs where the measurement result part was a non-numeric value.
The TM-NER method is unable to identify these kinds of results, but RE-NER may be
able to find these occurrences based on the presumption that terms and values are
separated by a colon regardless of the type of value. The biggest difference occurred during
the exploration of the terms Interventricular septum (end-systolic) and Posterior wall (end-
systolic). These are composite terms; they follow the term1–
measurement_result1–subterm2–measurement_result2 pattern. RE-NER struggles to
find and process these kinds of terms in a humanly manageable way.

Table 2. The relative occurrence of the most common terms identified by the RE-NER and TM-NER methods.

Term                                    |      N |   q_RE | qe_RE |   q_TM |  qe_TM
Left ventricular end-systolic diameter  | 19 598 | 99.32% | 0.21% | 99.75% |  0.59%
Interventricular septum (end-diastolic) | 19 562 | 99.67% | 0.56% | 99.64% |  0.22%
Aortic root                             | 19 537 | 99.77% | 0.34% | 99.69% |  0.11%
Posterior wall (end-diastolic)          | 19 496 | 80.51% | 0.59% | 99.44% | 19.49%
Left ventricular end-diastolic diameter | 19 240 | 99.25% | 0.53% | 99.52% |  0.42%
Left atrium (M-mode)                    | 19 344 | 99.56% | 1.08% | 98.97% |  0.44%
E                                       | 18 759 | 99.79% | 0.23% | 99.81% |  0.31%
EF                                      | 18 768 | 99.32% | 3.39% | 96.63% |  0.70%
A                                       | 18 458 | 99.80% | 5.29% | 94.72% |  0.22%
Interventricular septum (end-systolic)  | 14 372 |  0.01% | 0.01% | 99.99% | 99.99%
Posterior wall (end-systolic)           | 14 310 |  0.29% | 0.01% | 99.99% | 99.71%
Right ventricle (M-mode)                | 10 656 | 98.05% | 2.24% | 97.90% |  0.16%
2D right atrial dimensions              | 10 492 | 99.10% | 2.26% | 97.83% |  0.10%

4. Discussion

Information extraction from echocardiography reports stored as free text is a challenging
task in medical analysis. The success of information extraction mainly depends on the
quality of the source documents, and the algorithms have to overcome many difficulties.
The main goal of this study was to compare two types of information extraction methods
and to highlight their strengths and drawbacks. The algorithms in the study were (i) a
classical regular expression-based information extraction algorithm, which is based on
some prior knowledge about the text to be processed, and (ii) a general text mining
algorithm, which operates without any prior knowledge about the text. Our practical
results show that the text mining-based method is able to perform at a similar level in
finding and identifying terms as the regular expression-based method. Each method has
advantages over the other. The text mining-based algorithm has difficulty handling
missing space characters, and since it assumes that measurement results are stored as
numerical values, it cannot find non-numerical values. In the case of the regular
expression-based method, formulating the expression set is a difficult task, and it is even
harder to extend this set to recognize complex terms. Furthermore, not all occurrences
can be expressed with a general expression; these special cases require more and more
unique expressions to be added to the set, which increases the processing time.
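As an illustration of this approach, a minimal regular-expression extractor for colon-separated term–value pairs might look like the sketch below; the two patterns and the report snippet are hypothetical examples, not the expression set used in the study:

```python
import re

# One pattern per known term, assuming the "term: number [unit]" layout
# described above. Real reports need many more variants per term.
TERM_PATTERNS = {
    "Aortic root": re.compile(r"Aortic root\s*:\s*(\d+(?:[.,]\d+)?)\s*(mm)?"),
    "EF": re.compile(r"\bEF\s*:\s*(\d+(?:[.,]\d+)?)\s*(%)?"),
}

def extract_terms(report):
    """Return {term: (value, unit)} for every pattern that matches."""
    results = {}
    for term, pattern in TERM_PATTERNS.items():
        match = pattern.search(report)
        if match:
            results[term] = (match.group(1), match.group(2))
    return results

print(extract_terms("Aortic root: 31 mm  EF: 60 %"))
```

Every new report layout (missing spaces, composite terms, non-numerical values) forces another pattern into the set, which is exactly the maintenance burden described above.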
Our primary finding is that the text mining-based NER method is able to perform at
a similar level in finding and identifying terms as the regular expression-based method
and, in the case of extracting complex terms and their measurement results, it outperforms the
regular expression-based NER method. Information extraction can be further improved
by implementing a hybrid NER which merges the advantages and negates the
disadvantages of both methods. This hybrid NER is part of our future research.

Acknowledgment

We acknowledge the financial support of Széchenyi 2020 under EFOP-3.6.1-16-2016-00015 and UNKP-18-3 New National Excellence Program of the Ministry of Human
Capacities, and the professional support of GINOP-2.2.1-15-2016-00019 "Development
of intelligent, process-based decision support system for cardiologists".

References

[1] Wencheng Sun, Zhiping Cai, Yangyang Li, et al. Data Processing and Text Mining Technologies on
Electronic Medical Records: A Review. Journal of Healthcare Engineering, 2018; Article ID 4302425
[2] Krauthammer M, Nenadic G. Term identification in the biomedical literature. J Biomed Inform.
2004;37(6):512–526.
[3] Xie F, Zheng C, Yuh-Jer Shen A, Chen W. Extracting and analyzing ejection fraction values from
electronic echocardiography reports in a large health maintenance organization. Health Inform J.
2017;23(4):319–328.
[4] Garvin JH, DuVall SL, South BR, et al. Automated extraction of ejection fraction for quality measurement
using regular expressions in Unstructured Information Management Architecture (UIMA) for heart
failure. J Am Med Inform Assoc. 2012;19(5):859–866.
[5] Kim Y, Garvin JH, Goldstein MK, et al. Extraction of left ventricular ejection fraction information from
various types of clinical reports. J Biomed Inform. 2017;67:42–48.
[6] Patterson OV, Freiberg MS, Skanderson M, et al. Unlocking echocardiogram measurements for heart
disease research through natural language processing. BMC Cardiovasc Disord. 2017;17(1):151.
[7] Wells QS, Farber-Eger E, Crawford DC. Extraction of echocardiographic data from the electronic medical
record is a rapid and efficient method for study of cardiac structure and function. J Clin Bioinforma.
2014;4(1):12.
[8] Toepfer, M., Corovic, H., Fette, et al. Fine-grained information extraction from German transthoracic
echocardiography reports. BMC Medical Informatics and Decision Making, 2015; 15(1):91
[9] Jonnalagadda, S.R., Adupa, A.K., Garg, R.P. et al. Text Mining of the Electronic Health Record: An
Information Extraction Approach for Automated Identification and Subphenotyping of HFpEF Patients
for Clinical Trials. J. of Cardiovasc. Trans. Res. 2017; 10(3), 313–321.
[10] Renganathan, V. Text Mining in Biomedical Domain with Emphasis on Document Clustering.
Healthcare Informatics Research, 2017; 23(3), 141–146.
[11] Piskorski J., Sydow M. String Distance Metrics for Reference Matching and Search Query Correction.
In: Abramowicz W. (eds) Business Information Systems. BIS 2007. LNCS, Vol 4439. Springer, Berlin
dHealth 2019 – From eHealth to dHealth 49
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-49

Health Care Atlases: Informing the General Public About the Situation of the Austrian Health Care System
Claire RIPPINGER a,1, Nadine WEIBRECHT a, Melanie ZECHMEISTER a, Sonja SCHEFFEL b, Christoph URACH c and Florian ENDEL a
a DEXHELPP, Vienna, Austria
b Main Association of Austrian Social Security Institutions, Vienna, Austria
c dwh Simulation Services, Vienna, Austria

Abstract. A large amount of complex data exists concerning the Austrian Health
Care System. The goal was to process this data and present it to the general public
on an easily accessible information platform. The platform focuses on data about
the burden of disease of the Austrian population, the available medical care and the
services provided by physicians. Due to the vast differences in the underlying
source data, the methods used for data acquisition range from statistical linkage
over web scraping to aggregating data on the reimbursed services. The results are
published on a website and are mainly displayed with interactive graphics. Overall,
these dynamic and interactive websites provide a good overview of the situation of
the Austrian Health Care System and present the information in an intuitive and
comprehensible manner. Furthermore, the information given in the atlases can
contribute to health care planning by identifying distinctive service
provision in Austria.

Keywords. atlas, delivery of health care, data aggregation

1. Introduction

Issues concerning the health care system are mentioned by the Austrian media on a
regular basis. Recent topics include the reform of the Social Security System
and the imminent retirement of a large number of general practitioners in rural areas.
However, the media reports always focus only on a small aspect of the health care system
and fail to describe the complex overall situation.
Therefore, the Main Association of Austrian Social Security Institutions launched
the Health Care Atlases. The goal is to provide the public with a neutral, data driven
information platform which gives a good overview of the current situation of the health
care system and presents the complex data in a well comprehensible and intuitive manner.
The project was implemented by DEXHELPP, a research association developing
methods, models and technologies in order to support the analysis, planning and
controlling of the health care system.

1 Corresponding Author: Claire Rippinger, DEXHELPP, Neustiftgasse 57-59, 1070 Vienna, Austria, E-Mail: [email protected]
50 C. Rippinger et al. / Health Care Atlases

The information in the atlases covers three main issues of the health care system: the
burden of disease of the Austrian Population, the medical care available in different
regions of Austria and the services provided by the physicians in their respective field.
Considering the big difference in available source data, the Health Care Atlases have not
been implemented in one single project. Instead, they were split into three smaller
projects, which have been implemented independently: the Epidemiology Atlas, the Care
Atlas and the Services Atlas.
Similar projects have been implemented by other countries, e.g. the UK, the US and
Germany. The Environment and Health Atlas for England and Wales [1], [2] is a
collection of interactive maps depicting the relative risks for a total of 14 health
conditions, averaged over a 25-year period. The Dartmouth Atlas Project [3]–[7]
presents a variety of maps and charts displaying various information ranging from
surgical procedures to medical discharges, all based on data from Medicare, a national
health insurance program in the United States. Finally, the German Versorgungsatlas
[8]–[12] is a library of many smaller projects involving health care. These projects
include information about the number of resident physicians, vaccination rates, a
selection of health indicators, etc.

2. Methods

The implementation of the different atlases consisted of two parts:
1. data acquisition and processing
2. data visualization
For the first part, different methods were applied for each of the three atlases due to
vast differences in the quantity and quality of the source data. These methods range
from statistical linkage (Epidemiology Atlas) over web scraping (Care Atlas) to
aggregating existing data on the billed services (Services Atlas). For the second part,
all three atlases used largely the same graphical representation of the data.

2.1. Epidemiology Atlas

The Epidemiology Atlas aims to provide information on the burden of disease of the
Austrian population. This information can only be collected by indirect means, since
there is no standardized diagnostic coding in the outpatient sector in Austria. Diagnoses
are only available in relation to sick leaves or hospital stays. In this project, the following
three methods have been evaluated in order to derive disease information indirectly:
• ATHIS: Austrian Health Interview Survey 2006/2007
• ATC-ICD: predicting the ICD code (International Classification of Diseases) from the ATC code (Anatomical Therapeutic Chemical Classification System)
• Methods of Diagnosis Assignment by experts
One of these methods (Methods of Diagnosis Assignment) only considers the
prevalence of diabetes; the other two methods (ATHIS and ATC-ICD) consider multiple
diseases. In the first instance, all of these three methods were applied to diabetes in order
to compare their performance. Additionally, in the finalized Epidemiology Atlas, ATC-
ICD has been used to compute the prevalence of multiple diseases.

In ATHIS [13], a representative sample of about 15 000 Austrian residents aged 15
and over was surveyed. The participants were asked about their overall health
status. The questionnaire included, among others, inquiries about a past or present
manifestation of diabetes and about treatment or antidiabetic medication received within
the past twelve months. The results considering diabetes were then extrapolated to the
Austrian population.
ATC-ICD is a statistical method calculating prevalence probabilities. In a first step,
hospital and sick leave diagnoses, as well as data on received medication are used to
determine assignment probabilities. In a second step, these assignment probabilities are
applied to the prescription data of all persons who have used social insurance services,
giving an estimate of the diabetes prevalence in the Austrian population. The underlying
data source is the research database GAP-DRG [14] (Grundlagenforschung für
ambulante, personenbezogene “Diagnoses related Groups”) of the Association of
Austrian Social Security Institutions. It contains reimbursement data of all social health
insurance funds as well as the diagnostic data of hospital admissions and sick leaves for
the years 2006 and 2007. It is the only available data source containing information on
both the inpatient and the outpatient sector. The ATC-ICD method developed by
Weisser et al. [15] is based on the idea of Chini et al. [16] to estimate the prevalence of
certain conditions based on pharmacy data.
In the third method, experts used the GAP-DRG database to evaluate the number of
people who had at least one prescription for a diabetes related medication or who had a
hospital stay with the diagnosis diabetes.
The prevalence estimates of all three methods are normalized to the Statistics Austria
population per 10 000 inhabitants and can be filtered by age, gender and federal state.
Apart from the method comparison based on the prevalence of diabetes, the
Epidemiology Atlas also provides prevalence estimates for all diseases classified by ICD,
more precisely, the 9th and 10th revision of ICD [17]. The estimates have been calculated
by extending the ATC-ICD method to these diseases. The results were calculated for
both a normalized and a standardized population [18] and have been processed to answer
the following questions:
1. What is the prevalence of a given disease for a specific federal state, age group
and gender?
2. How are the prevalence estimates for a given disease affected by the chosen cut-off-point?
3. What are the most common diseases, ranked by their total prevalence?
For the first question, a person is considered to be affected by the disease if the
calculated probability is greater than 0.75. The influence of this threshold value (= cut-off-point) of 0.75 on the calculated prevalence estimates is examined in the second question.
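The two-step idea behind ATC-ICD and the role of the cut-off-point can be sketched as follows; the ATC codes, the probabilities and the independence assumption used to combine them are illustrative stand-ins, not the values or the exact model estimated by Weisser et al.:

```python
# Step 1 (assumed given): assignment probabilities P(ICD | ATC prescription),
# here invented for two drugs.
assignment_prob = {
    "A10BA02": {"E11": 0.90},                # metformin -> type 2 diabetes
    "C10AA05": {"E11": 0.15, "I25": 0.60},   # atorvastatin
}

def disease_probability(prescriptions, icd):
    """Step 2: combine per-drug probabilities for one person, assuming
    independence: P(disease) = 1 - prod(1 - p_i)."""
    p_not = 1.0
    for atc in prescriptions:
        p_not *= 1.0 - assignment_prob.get(atc, {}).get(icd, 0.0)
    return 1.0 - p_not

persons = {"p1": ["A10BA02", "C10AA05"], "p2": ["C10AA05"]}
cutoff = 0.75  # the cut-off-point examined in the second question
prevalent = [pid for pid, rx in persons.items()
             if disease_probability(rx, "E11") > cutoff]
print(prevalent)  # -> ['p1']
```

Lowering the cut-off-point admits persons with weaker medication evidence and raises the prevalence estimate, which is the sensitivity examined in question 2.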

2.2. Care Atlas

The aim of the Care Atlas is to provide an overview of the available medical supply in
Austria to the general public. Furthermore, this Care Atlas can contribute to and
support the general planning process of medical services by identifying distinctive
service provision in the Austrian health care sector. Consequently, this might have an
impact on improving the overall structural health care quality in the future. The
underlying data is publicly available on the websites of the Austrian Provincial Chambers
of Physicians (Landesärztekammern). For every physician, the website contains an entry,
providing information on the medical specialty, the type of social security contract, the
gender and the opening hours. Selenium, a framework which automates tasks performed
within a browser, was used to visit the single websites and to collect this information.
Regular expressions were then used to identify the information on the opening hours
provided by the physicians. The collected opening hours were stored in a database.
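The regular-expression step for the opening hours could look roughly like the sketch below; the entry format and the pattern are assumptions for illustration, since the actual chamber websites use varying layouts:

```python
import re

# Assumed layout "weekday hh:mm - hh:mm"; German weekday abbreviations.
OPENING_RE = re.compile(
    r"(Mo|Di|Mi|Do|Fr|Sa|So)\s+(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})"
)

def parse_opening_hours(text):
    """Return (weekday, start, end) tuples, times as minutes since midnight."""
    return [(day, int(h1) * 60 + int(m1), int(h2) * 60 + int(m2))
            for day, h1, m1, h2, m2 in OPENING_RE.findall(text)]

print(parse_opening_hours("Mo 08:00 - 12:00, Do 14:00 - 18:00"))
```

Entries without a specific start and end time (e.g. "by appointment") simply yield no match, so they contribute nothing to the time-dependent results.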
The collected data were subsequently processed to derive the following indicators:
• What is the number of available practices for a given federal state, medical specialty, contract type and gender of the physician?
• What is the number of available weekly hours for a given federal state, medical specialty, contract type and gender of the physician?
• What is the number of open practices by time and day of the week for a given federal state, medical specialty, contract type and gender of the physician?
For the evaluation of the time dependent results, only those practices were included
in the data set for which the opening hours indicated a specific start and end time.
All results are expressed both in absolute values and in relation to the population.

2.3. Services Atlas

The goal of the Services Atlas was to provide information on the services provided by
the physicians. The underlying data consists of the services billed to the social health
insurances by resident physicians in the years 2016 and 2017 and was provided by the
Association of Austrian Social Security Institutions. This database contains information
about the nature and number of services provided by each contracted physician, as well
as additional information on the contracted physician (i.e. insurance provider, medical
specialty, address). Each individual service is classified by an alphanumeric code, in
which the first two characters assign the service to a specific body region [19]. It is the
only data source providing a uniform encoding of the services provided in every federal
state.
It is important to note that neither services provided by non-contracted physicians
nor services provided by physicians employed in a hospital or a similar institution are
represented in this data set.
Considering the good quality of the underlying data, only standard data
preprocessing had to be applied, and the data could be aggregated to derive five
different indicators:
• What is the spectrum of the services provided: How many different services are provided by the individual medical specialties in a given year and federal state?
• What are the most billed services (grouped by the corresponding body region) for a given medical specialty, federal state and year?
• What are the most billed individual services for a given medical specialty, federal state and year?
• What is the distribution of the individual services: What percentage of the most billed services is provided by what group of medical specialties in a given year and federal state?
• How many individual services are billed in the different federal states in relation to the population in a given year?
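As an illustration, the "most billed services" indicators reduce to a simple grouped aggregation; the records, field names and service codes below are invented, with only the rule that the first two characters of a code denote the body region taken from the description above:

```python
from collections import Counter

records = [  # toy billing records, not the actual database schema
    {"code": "HZ01", "specialty": "Cardiology", "state": "Vienna", "year": 2017, "n": 120},
    {"code": "HZ02", "specialty": "Cardiology", "state": "Vienna", "year": 2017, "n": 80},
    {"code": "HZ01", "specialty": "Cardiology", "state": "Vienna", "year": 2017, "n": 50},
]

def most_billed(records, specialty, state, year, by_region=False):
    """Total billed services per code (or per body region = first two chars)."""
    totals = Counter()
    for r in records:
        if (r["specialty"], r["state"], r["year"]) == (specialty, state, year):
            totals[r["code"][:2] if by_region else r["code"]] += r["n"]
    return totals.most_common()

print(most_billed(records, "Cardiology", "Vienna", 2017))
```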
During the data processing, privacy laws have been taken into consideration; more
specifically, k-anonymity has been respected for k = 3 [20]. Hence, any information
concerning a federal state with fewer than three contracted physicians for a given
medical specialty is not displayed in the final results.
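This suppression rule can be sketched in a few lines; the field names are illustrative, not the actual schema:

```python
from collections import Counter

def suppress_small_groups(rows, k=3):
    """Drop every row whose (state, specialty) group has fewer than k
    contracted physicians, so no group with < k members is ever displayed."""
    counts = Counter((r["state"], r["specialty"]) for r in rows)
    return [r for r in rows if counts[(r["state"], r["specialty"])] >= k]

rows = [
    {"state": "Vienna", "specialty": "Cardiology"},
    {"state": "Vienna", "specialty": "Cardiology"},
    {"state": "Vienna", "specialty": "Cardiology"},
    {"state": "Tyrol", "specialty": "Cardiology"},  # only one -> suppressed
]
print(len(suppress_small_groups(rows)))  # -> 3
```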

2.4. Visualization

Since the targeted audience of the Health Care Atlases is the general public, the
information is predominantly given in graphical form, allowing a quick and intuitive
comprehension of the presented data.
The type of chart is chosen depending on the underlying data: Regional information
is displayed in a choropleth map of Austria depicting the different federal states. A single
hue progression is used to illustrate the magnitude of the values represented on the map.
Categorical data, which do not represent regional information, are mostly displayed
using bar charts, allowing a quick comparison of the depicted values. The 3-dimensional
information on the number of open practices by time and day of the week is represented
using heat maps. As in the choropleth maps, a single hue progression illustrates the
magnitude of the values.
All charts and maps have been implemented in the form of interactive graphics. This
way, the user can single-handedly browse through the results and apply several filters.
Furthermore, a number of different tooltips and popovers display more detailed
information.
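A single hue progression of the kind used in the choropleth and heat maps can be implemented as a linear interpolation from a light to a saturated shade of one hue; the concrete endpoint colours below are arbitrary illustrative choices:

```python
def single_hue(value, vmin, vmax):
    """Map value in [vmin, vmax] to a hex colour on one blue ramp:
    small values light, large values dark/saturated."""
    t = (value - vmin) / (vmax - vmin)
    r = int(230 * (1 - t))             # red channel fades out with magnitude
    g = int(240 * (1 - t) + 80 * t)    # green fades toward a mid tone
    b = 255                            # the blue hue itself stays fixed
    return f"#{r:02x}{g:02x}{b:02x}"

print(single_hue(0, 0, 10), single_hue(10, 0, 10))  # light vs dark blue
```

Keeping the hue fixed and varying only lightness makes the magnitude ordering readable at a glance, which is why the atlases use the same ramp in maps and heat maps.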

3. Results

The Health Care Atlases are publicly available on the DEXHELPP-website2. For each
atlas, the information is divided into several chapters and subchapters. Every chapter or
subchapter contains an explanation of the data presented, as well as some operating
instructions for the interactive features of the chart. Furthermore, every atlas also
provides detailed background information, additional links and the source of the
underlying data.
Since the goal of this project was to provide a neutral information platform, the
Health Care Atlases do not make an assessment on the current situation and do not draw
any conclusions from the data. However, they can be used to make some interesting
observations. These atlases provide a good overview of the Austrian health care
system and its service provision. By providing these data in an aggregated manner,
the atlases can help identify distinctive service provision in various regions of
Austria. For example, a comparison of two heat maps displaying the
number of practices by time and day of the week is incorporated in the Care Atlas. By
using a dropdown menu, the user can filter the available data and compare the number
of open practices of general practitioners in an urban and a rural area. Figure 1 shows the
results in relation to 100 000 inhabitants. In the capital city of Austria (Vienna), the
number of practices which are open in the morning is almost equal to the number of
practices with opening hours in the afternoon. In contrast, in Carinthia, the federal
state with the lowest population density, there are considerably more open practices
in the period between 8:00 and 12:00.

2 http://www.dexhelpp.at

Figure 1. Two heatmaps comparing the number of open practices of general practitioners per 100 000
inhabitants. Vienna is displayed on the left and Carinthia on the right.

Easily accessible data like this may provide additional information regarding the
public debate on the health care system: is there a general lack of physicians in a given
federal state? Is there only an apparent lack due to a temporal clustering of the opening
hours? How does the need for health care provision vary across different federal states
and regions?

4. Discussion

The three Health Care Atlases give a broad overview of the current situation of the
Austrian health care system. They give an insight into the burden of disease of the
population and the availability and type of treatment. Simply put, the three atlases answer
the following questions:
What diseases does the Austrian population suffer from?
Where and when is a treatment available?
How is the Austrian population treated?
However, the atlases do not depict the complete Austrian health care system. The
information displayed in the Epidemiology Atlas depends on estimations and the Care
Atlas and the Services Atlas only contain information on registered physicians in the
outpatient sector. The Services Atlas is even further limited to registered physicians with
a contract with the Social Security Institutions.

5. Future Work

Being aware of the limitations mentioned in Section 4, additional projects and further
analyses regarding the Health Care Atlases have already been launched. For the
Epidemiology Atlas, a new approach to the ATC-ICD method is already under
investigation and for the Care Atlas, it is planned to also include the opening hours of
outpatient departments. Currently, there are no plans to expand the information of the
Services Atlas to non-contracted physicians, since there is no data available on this
subject matter.
Furthermore, it is planned to equip each atlas with the visualization of the temporal
change of the data. Currently, it is possible to change the considered year in the
Epidemiology Atlas and in the Services Atlas. However, it may also be interesting to see the
temporal change of the data during the whole investigated period in one single chart.
Finally, it is intended to monitor anonymous usage behavior of the visitors of the
atlases. Thus, it can be investigated how often the atlases are visited and which one is
the most popular. This will give an insight into the public interest in the project.

References

[1] “Home page | The Environment and Health Atlas.” [Online]. Available:
http://www.envhealthatlas.co.uk/homepage/. [Accessed: 23-Jan-2019].
[2] A. L. Hansell et al., The Environment and Health Atlas for England and Wales, 1 edition. Oxford: OUP
Oxford, 2014.
[3] “Home,” Dartmouth Atlas of Health Care. [Online]. Available: https://www.dartmouthatlas.org/.
[Accessed: 12-Feb-2019].
[4] D. C. Goodman and A. A. Goodman, “Medical care epidemiology and unwarranted variation: the
Israeli case,” Isr. J. Health Policy Res., vol. 6, no. 1, p. 9, Feb. 2017.
[5] R. Panczak et al., “Regional Variation of Cost of Care in the Last 12 Months of Life in Switzerland:
Small-area Analysis Using Insurance Claims Data,” Med. Care, vol. 55, no. 2, p. 155, Feb. 2017.
[6] D. C. Goodman and G. A. Little, “Data Deficiency in an Era of Expanding Neonatal Intensive Care
Unit Care,” JAMA Pediatr., vol. 172, no. 1, pp. 11–12, Jan. 2018.
[7] G. P. Westert et al., “Medical practice variation: public reporting a first necessary step to spark change,”
Int. J. Qual. Health Care, vol. 30, no. 9, pp. 731–735, Nov. 2018.
[8] “versorgungsatlas.de - Der Versorgungsatlas.” [Online]. Available:
https://www.versorgungsatlas.de/der-versorgungsatlas/. [Accessed: 23-Jan-2019].
[9] C. Schmidt et al., “Integration von Sekundärdaten in die Nationale Diabetes-Surveillance: Hintergrund,
Ziele und Ergebnisse des Sekundärdaten-Workshops am Robert Koch-Institut,”
Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz, vol. 60, no. 6, pp. 656–661, Jun.
2017.
[10] M. K. Akmatov, A. Steffen, J. Holstiege, R. Hering, M. Schulz, and J. Bätzing, “Trends and regional
variations in the administrative prevalence of attention-deficit/hyperactivity disorder among children
and adolescents in Germany,” Sci. Rep., vol. 8, no. 1, Dec. 2018.
[11] H. Burchert, Ed., Fachbegriffe des Gesundheitsmanagements, 2., überarbeitete und erweiterte
Auflage. Herne: nwb STUDIUM, 2018.
[12] S. March et al., “Quo vadis Datenlinkage in Deutschland? Eine erste Bestandsaufnahme,”
Gesundheitswesen, vol. 57, no. 03, pp. e20–e31, Mar. 2018.
[13] Statistik Austria, “ATHIS.” [Online]. Available:
http://www.statistik.at/web_de/services/publikationen/4/index.html?includePage=detailedView&sectionName=Gesundheit&pubId=457. [Accessed: 23-Jan-2019].
[14] F. Endel, G. Endel, and N. Pfeffer, “PRM34 Routine Data in HTA: Record Linkage in Austrias GAP-
DRG Database,” Value Health, vol. 15, no. 7, p. A466, 2012.
[15] A. Weisser, G. Endel, P. Filzmoser, and M. Gyimesi, “ATC-> ICD–evaluating the reliability of
prognoses for ICD-10 diagnoses derived from the ATC-Code of prescriptions,” presented at the BMC
health services research, 2008, vol. 8, p. A10.
[16] F. Chini, P. Pezzotti, L. Orzella, P. Borgia, and G. Guasticchi, “Can we use the pharmacy data to
estimate the prevalence of chronic conditions? a comparison of multiple data sources,” BMC Public
Health, vol. 11, no. 1, p. 688, Sep. 2011.
[17] “ICD-10-GM.” [Online]. Available: https://www.dimdi.de/dynamic/de/klassifikationen/icd/icd-10-gm/.
[Accessed: 23-Jan-2019].
[18] Y. Dodge, Ed., The Oxford dictionary of statistical terms, First published in paperback 2006. Oxford:
Oxford Univ. Press, 2006.
[19] “Katalog ambulanter Leistungen (KAL): Entwicklung und Pilotprojekte bis inkl. 2013 |
Gesundheitssystem / Qualitätssicherung | Gesundheitssystem | Gesundheit | Sozialministerium.”
[Online]. Available:
https://www.sozialministerium.at/cms/site/gesundheit/dokument.html?channel=CH3958&doc=CMS1240821423857. [Accessed: 23-Jan-2019].
[20] L. Sweeney, “k-anonymity: A model for protecting privacy,” Int. J. Uncertain. Fuzziness Knowl.-
Based Syst., vol. 10, no. 05, pp. 557–570, 2002.
dHealth 2019 – From eHealth to dHealth 57
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-57

Improving the Prediction of Emergency Department Crowding: A Time Series Analysis Including Road Traffic Flow
Jens RAUCH a,1, Ursula HÜBNER a, Mathias DENTER b and Birgit BABITSCH c
a Health Informatics Research Group, Osnabrück University AS, Germany
b Klinikum Osnabrück, Germany
c New Public Health, University Osnabrück, Germany

Abstract. Background: Crowding in emergency departments (ED) has a negative
impact on quality of care and can be averted by allocating additional resources based
on predictive crowding models. However, there is a lack of effective external
predictors, particularly those representing public activity. Objectives: This study,
therefore, examines public activity measured by regional road traffic flow as an
external predictor of ED crowding in an urban hospital. Methods: Cross-validated
seasonal autoregressive integrated moving average (SARIMA) models were compared
with respect to their forecasting error on ED crowding data. Results: It could be shown that
inclusion of inflowing road traffic into a SARIMA model effectively reduced
prediction errors. Conclusion: The results provide evidence that circadian patterns
of medical emergencies are connected to human activity levels in the region and
could be captured by public monitoring of traffic flow. In order to corroborate this
model, data from further years and additional regions need to be considered. It
would also be interesting to study public activity by additional variables.

Keywords. emergency hospital service, patients, forecasting, regression analysis

1. Introduction

Crowding in emergency departments (ED) is associated with poor patient care, higher
mortality and a negative impact on patient safety [1]. Crowding occurs when the need for
emergency services exceeds available resources for patient care in the emergency
department, hospital, or both [2]. Since crowding is an issue of misaligned demand and
supply [3], it is vital to relieve overstrained EDs by curbing patient volume as well as by
adjusting ED resources to better meet demand during busy hours. With respect to patients,
much progress has been made in understanding their motives and patterns of frequent use
that inform increasing demand [4]. These insights can be used for countering improper ED use and
raising awareness among patients. On the other hand, it is of equal importance to tackle
resource alignment on the hospital’s side, since there is often a mismatch between staffing
rosters and patient demand [5]. The growing adoption of health IT, however,
holds the chance to integrate detailed forecasting models into ED processes and resource
management, as electronic health care records become widely available [6].

1 Corresponding Author: Jens Rauch, Osnabrück University of AS, Health Informatics Research Group, PO Box 1940, 49009 Osnabrück, Germany; E-Mail: [email protected].
58 J. Rauch et al. / Improving the Prediction of Emergency Department Crowding

Several recent studies have shown that relatively simple regression models already
give good predictions on forthcoming ED crowding measures [7,8,9]. So far, more
complex models provide only somewhat more accurate results, despite their higher
modelling capacities. This is foremost due to the lack of appropriate external covariates
[10]. While there is a large portion of emergency research devoted to the study of surges
in patient volume in case of catastrophic events, pandemic outbreaks or seasonal
fluctuations, little has been published with regard to covariates of regular variations in
intra-day ED occupancy [3]. Indeed, only a small number of indicators for upcoming ED
service demand have been studied; so far, the covariates most often used are weather and
calendric data. The use of weather data is justified by the fact that weather affects
a number of conditions, which are likely to lead to medical emergencies, e.g. certain air
masses significantly increase asthma hospital admissions [11]. There have been few attempts
to include other public data. A recent study investigated the predictive value of website
traffic on a public health portal [12]. Other studies revolve around the impact of mass gatherings
on the demand for emergency services. Only recently, first results of a systematic
examination regarding the effect of mass gatherings on medical emergency prevalence
were presented [13].
Common to all these studies is the use of variables, which are inherently connected
to public activity level. The use of calendric data is usually intended to reflect the
influence that weekends, holidays and seasons have on public activity patterns. Aside
from its physiological effects [14], weather evidently has effects on public activities, too
[15]. However, both kinds of data are only indirectly connected to overall activity levels.
It seems reasonable to include more proximate measures of public activity in
forecasting models of ED crowding. For instance, daily commuters can make up a
considerable portion of the number of people in a city during specific times and working
hours [16]. Accordingly, it can be hypothesized that they also contribute to medical
incidents subject to emergency care. Readily available data on population movement
and sojourn are given by public recordings of traffic. In this paper, we propose hourly
traffic flow as a direct and openly available measure of public activity and aim at
investigating its effect on the prediction of the overall hourly ED load.

2. Methods

We conducted a retrospective study and used historical ED data extracted from the
Electronic Health Record of Klinikum Osnabrück, an academic teaching hospital with
660 beds serving the town and region of Osnabrück, Lower Saxony, Germany. The ED
has about 40,000 cases per year and is operated 24 hours a day on 365 days a year. Since
exported data was anonymised, no ethical statement needed to be obtained. Data covered
the period from January 1 until December 15, 2017. Historical traffic data for the same
period was obtained from the German Federal Highway Research Institute
(Bundesanstalt für Straßenwesen), which collects hourly data about the direction and
number of vehicles passing by measuring stations on federal roads and motorways. There
is a total of six measuring stations in the area of Osnabrück, covering the major traffic
axes for motorised vehicles from and to Osnabrück. All data sources were mirrored in a
PostgreSQL 10 research database. Data analysis was carried out using the statistics
software R.
ED occupancy is a common measure of ED crowding [17]. Thus, the hourly total of
patients within the ED was the variable of interest, while traffic flow as a measure of
J. Rauch et al. / Improving the Prediction of Emergency Department Crowding 59

activity should act as a preceding covariate. Cross-correlation analysis was performed to
examine the correlation of traffic flow with concurrent ED occupancy.
cases. Incremental 1-hour lags of the traffic data were compared stepwise to ED
occupancy to assess the temporal preceding effect of traffic flow on ED crowding. Since
ED occupancy follows a circadian pattern, we limited the number of lags to the preceding
23 hourly traffic values. The pronounced periodicity of both time series may lead to an
overestimated linear relationship; i.e., covariate time series that follow the same
circadian pattern as ED occupancy might not improve a seasonal autocorrelative model [18].
Therefore, two SARIMA models were fitted to the ED occupancy data, one with and a
second without traffic data as external predictors. The SARIMA model parameters were
determined by inspection of the autocorrelation function (ACF) and the partial
autocorrelation function (PACF) plots of the seasonally differenced time series.
Additionally, candidate parameters were validated by fitting the parameterized models
on the time window January 1 to September 30 and comparing their performance with
respect to the remaining period. Traffic flow data from the three roads that showed the
highest cross-correlation with ED occupancy were used as predictors (Tab. 1).
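The lag search described above can be sketched as follows. The paper's analysis was carried out in R; this is an illustrative Python/numpy sketch on a purely synthetic hourly series in which occupancy echoes traffic two hours later, so the planted lead should be recovered (all names and data here are hypothetical).

```python
import numpy as np

def lagged_correlations(traffic, occupancy, max_lag=23):
    """Pearson correlation of ED occupancy with traffic shifted back by 0..max_lag hours."""
    corrs = {}
    for lag in range(max_lag + 1):
        if lag == 0:
            x, y = traffic, occupancy
        else:
            x, y = traffic[:-lag], occupancy[lag:]  # traffic leads occupancy by `lag` hours
        corrs[lag] = np.corrcoef(x, y)[0, 1]
    return corrs

# toy example: occupancy echoes traffic two hours later, plus noise
rng = np.random.default_rng(0)
t = np.arange(24 * 60)  # 60 days of hourly values
traffic = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)
occupancy = np.roll(traffic, 2) + 0.1 * rng.normal(size=t.size)

corrs = lagged_correlations(traffic, occupancy)
best_lag = max(corrs, key=corrs.get)
print(best_lag)  # → 2: the planted 2-hour lead is recovered
```

Because both series share a strong 24-hour cycle, nearby lags also correlate highly; this is exactly the overestimation risk [18] that motivates the SARIMA comparison below.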
A running cross-validation was employed to compare the performance of the models
[19]. In this procedure, each model was fitted repeatedly, beginning from a fixed starting
point of a connected time frame that was iteratively expanded in 1-hour steps. For each
iteration both models (without and with traffic as predictor) were fitted and prediction
errors were calculated for a time horizon of up to six hours, resulting in six prediction
values (one per hour). Data from the last quarter of the year (October 1 to December 15)
was selected for this procedure. Prediction accuracy was calculated as root-mean-square
error (RMSE) and mean absolute error (MAE).
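The expanding-window procedure can be sketched generically. This is a hedged Python illustration, not the authors' R code: `fit_predict` stands in for refitting a forecasting model on the history up to each origin, and the toy example uses a naive seasonal forecaster on a perfectly periodic series so the errors are zero by construction.

```python
import numpy as np

def rolling_origin_errors(series, fit_predict, start, horizon=6, step=1):
    """Expanding-window evaluation: refit at each origin, predict up to `horizon` hours ahead."""
    errors = {h: [] for h in range(1, horizon + 1)}
    for origin in range(start, len(series) - horizon, step):
        forecast = fit_predict(series[:origin], horizon)  # model refit on data up to `origin`
        for h in range(1, horizon + 1):
            errors[h].append(forecast[h - 1] - series[origin + h - 1])
    rmse = {h: float(np.sqrt(np.mean(np.square(e)))) for h, e in errors.items()}
    mae = {h: float(np.mean(np.abs(e))) for h, e in errors.items()}
    return rmse, mae

# toy "model": naive seasonal forecast repeating the value 24 hours earlier
# (valid for horizons below the 24-hour period, as used here)
naive = lambda history, horizon: [history[len(history) - 24 + h] for h in range(horizon)]
series = np.tile(np.arange(24), 10).astype(float)  # perfectly periodic series

rmse, mae = rolling_origin_errors(series, naive, start=48)
print(rmse[1], mae[1])  # perfectly periodic series → zero error
```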

Figure 1. Typical intraday variation in traffic flow and ED occupancy (data from January 20, 2017), normalised
by respective day mean (red: motorway A33 and green: federal road B51, black: ED occupancy)

3. Results

An exemplary intraday time series of ED occupancy and two selected traffic densities
spanning a 24-hour period, beginning at 4 a.m., is shown in Figure 1. Evidently, traffic
flow took a similar shape and preceded ED occupancy: increases in ED occupancy
were preceded by increases in traffic load. There was a mean of 12.7 patients (± 7.86 SD)
and a maximum of 37 patients in the emergency department during the period of
investigation. Maximum Pearson correlation coefficients from all traffic measuring
stations with their respective lags are given in Table 1 together with the approximate
distance to the city centre. We also observed that in almost all cases traffic flow towards
the city had a higher correlation than traffic flow in the opposite direction. The best
explanation of the variance (r_max) was given by traffic flow on motorway A33 at Hellern
and both stations on the federal road B51 with maximum correlation coefficients
of .73, .71 and .71 respectively, preceding ED occupancy by two hours in all three cases.
Accordingly, traffic values from these three roads were used as external predictors in the
SARIMA model.

Table 1. Maximum correlation coefficients from cross-correlation analysis of road traffic and ED occupancy.
Distance refers to driving distance from the measuring station to the Osnabrück centre. Roads included in
this study are marked with an asterisk (*).

Measuring station     Distance (km)   r_max   Lag (h)
B68 Lechtingen             6.0         .70      -3
B51 Ostercappeln *        13.2         .71      -2
B51 Glandorf *            20.9         .71      -2
A33 Hellern *              5.6         .73      -2
A33 Fledder                6.5         .70      -3
A33 Handorf               10.2         .70      -3

Fitting the SARIMA model to the training data period (January 1 to September
30) yielded optimal model parameters (1,0,0)(0,1,1)₂₄ from analysis of the ACF and PACF
plots. The time series was subjected to first-order seasonal differencing, since no trend
but strong 24-hour seasonality was present. Inspecting the ACF of the differenced time
series showed a fair amount of decay. The respective PACF showed a sharp cut-off and
positive autocorrelation at lag 1; thus, an AR term was added. Since the ACF showed a
negative correlation at lag 24, an SMA term was added. An iterative comparison of the
models by the Bayesian Information Criterion over the parameter space
(AR,MA,SMA,SAR) ∈ {0,…,5}² × {0,1,2}² confirmed that this model was indeed optimal.
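The identification steps (seasonal differencing, then reading an AR term off the PACF and a seasonal MA term off the ACF) can be illustrated numerically. This is a hedged numpy sketch on a synthetic hourly series with a 24-hour cycle and an AR(1) disturbance, not the authors' R fit; for a seasonally differenced AR(1) series, theory predicts a positive lag-1 and a negative lag-24 autocorrelation, matching the reasoning in the text.

```python
import numpy as np

def seasonal_difference(y, period=24):
    """First-order seasonal differencing: y_t - y_{t-period}."""
    return y[period:] - y[:-period]

def acf(y, lag):
    """Sample autocorrelation at a given lag."""
    y = y - y.mean()
    return float(np.dot(y[:-lag], y[lag:]) / np.dot(y, y))

# toy occupancy: a daily cycle plus an AR(1)-like disturbance (phi = 0.6)
rng = np.random.default_rng(1)
n, period = 24 * 90, 24
cycle = np.tile(10 + 5 * np.sin(2 * np.pi * np.arange(period) / period), n // period)
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.6 * noise[t - 1] + rng.normal()
y = cycle + noise

d = seasonal_difference(y, period)  # removes the 24-hour cycle
print(round(acf(d, 1), 2))   # positive lag-1 autocorrelation → suggests an AR(1) term
print(round(acf(d, 24), 2))  # negative lag-24 autocorrelation → suggests a seasonal MA term
```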
We used the same parameters for the SARIMA model with the external predictors
“road traffic on the federal road B51” and “road traffic on the motorway A33 Hellern”.
Overall cross-validated RMSE and MAE values for all six time horizons are given in
Table 2. Improvement was greater for shorter time horizons: for lag 1, the SARIMA
model with traffic had a roughly 20 % lower RMSE (4.04 vs. 3.21), and for lag 2 the RMSE
was about 10 % lower (4.20 vs. 3.77). Inclusion of external predictors improved the
prediction for all lags (Tab. 2).
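The quoted relative improvements can be checked directly from the Table 2 values:

```python
# relative RMSE reduction of the traffic-augmented model (values from Table 2)
baseline = {1: 4.04, 2: 4.20}
with_traffic = {1: 3.21, 2: 3.77}
for lag in (1, 2):
    reduction = (baseline[lag] - with_traffic[lag]) / baseline[lag]
    print(f"lag {lag}: {reduction:.1%}")
# lag 1: 20.5%, lag 2: 10.2%
```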

Table 2. Model comparison from cross-validation over the period October 1 to December 15, 2017.

Measure  Model            Lag 1  Lag 2  Lag 3  Lag 4  Lag 5  Lag 6
RMSE     SARIMA            4.04   4.20   4.22   4.23   4.23   4.23
         SARIMA+traffic    3.21   3.77   3.98   4.10   4.15   4.18
MAE      SARIMA            3.11   3.23   3.24   3.25   3.25   3.25
         SARIMA+traffic    2.32   2.88   3.05   3.14   3.18   3.21

4. Discussion

In this study, prediction of hourly ED occupancy could be substantially improved by
including traffic flow data in a seasonal autoregressive forecasting model. To the best
of our knowledge, this is the first study making use of regional traffic data as a preceding
indicator for expected ED service demand. Other studies that predicted hourly rates of
ED crowding by including external covariates resorted to variables from calendric,
weather and air pollution data sets. Model improvements by these data, however, fell
short of expectations [20]. Moreover, the effect of weather data, which is most commonly
used, proved to be highly dependent on regional and climatic features; weather is thus
unfit for generalizable assertions about its impact on ED service demand [10].
In [7], SARIMA models without external predictors were fitted to the occupancies of
different EDs; they achieved MAEs for one-hour lags between 2.4 and 5.4 patients.
Similarly, a recent study that employed deep neural networks and made use of weather
and calendric data to predict patient volume had RMSEs between 4 and 5 patients for
hourly predictions [21]. By and large, the present prediction errors are lower than in those
studies and show that public activity measured by road traffic can serve as an effective
additional predictor.
Public activity has been measured in previous studies via local mass gathering events
[22,13]. For example, [22] found an increase of about 1 ED case per 1,000 participants
in a mass gathering event, but did not include time series data. It might also be
argued that mass gatherings are a very specific predictor, i.e. one that is correlated with
specific medical conditions, e.g. alcohol intoxication. We contend that road traffic acts
as a general predictor of ED occupancy.
By contrast, the predictive value of road traffic cannot be causally attributed in the
same way, apart from an increased risk of traffic accidents. If road traffic were related only
to car accidents, its predictive value would have been much lower, as car accidents
account for only a small proportion of ED cases.
The quest for a general predictor that mimics the seasonal course of ED occupancy
leads to findings about circadian patterns, such as heart rate, blood pressure and
serotonin levels [14], which are governed by the wake-sleep rhythm and individual
activity [23]. These patterns are indeed associated with medical emergencies, a consistent finding
replicated worldwide [24]. Interestingly, not all peaks, e.g. the morning peak of stroke events,
could be explained by biological features such as presence of hypertension, dyslipidemia,

diabetes mellitus, or cigarette smoking [24]. Individual activity patterns also seem to play
a distinctive role. Notably, it was found that some medical emergencies amongst the
working population follow a pattern that differs from that of a nonworking subgroup [25].
Obviously, biological and behavioural circadian patterns are correlates of emergency
events on an individual level and thus of ED use. This study revealed circadian public
activity measured by road traffic to be a meaningful predictor of overall ED occupancy
on a regional level. These findings do not claim any causal relationship. Thus, more
research is needed to explain the underlying mechanisms.
However, indicators such as road traffic are powerful because they are available on a
regional level and are thus able to predict a regional phenomenon, i.e. ED occupancy. It is
therefore a natural next step to draw on datasets other than road traffic that are indicative
of public activity levels. In particular, mobile cellular location data might be accessible for
predicting upcoming ED demand and could further improve the results. If so, it would
corroborate the hypothesis of public activity as a correlate of ED occupancy. Apart
from exogenous factors like public activity, a comprehensive predictive model of ED
crowding evidently needs to incorporate factors that are endogenous to the hospital
processes, e.g. staffing rosters and bed occupancy. The present study, however, focussed
on the effectiveness of road traffic data as a readily and publicly available predictor. Its
predictive power can be expected to generalize to all EDs in a given region.
This study is of course limited in that data from the six measuring stations constitute
only a specific sample of traffic activity around the city, since neither public transport usage
nor side-road traffic was captured. Also, we used ED occupancy as the only measure of ED
crowding. Occupancy belongs to the central throughput measures that inform ED
workload. Yet, there are several other aspects that should also be taken into consideration,
but which were not present in our data, e.g. ED capacity and hospital efficiency.
Furthermore, it remains to be examined in what way the present findings generalize to
other regions.

5. Conclusion

To the best of our knowledge, this is the first study to examine regional traffic data
as an indicator of urban critical health events that require emergency treatment. We
showed that road traffic, as an external overall covariate, can indeed contribute to a
substantial improvement in forecasting crowding in emergency departments.
Fundamentally, the effects might be explained by an inherent relation to human activity
levels, which were previously found to be related to medical emergencies.

6. Acknowledgements

We thank Tobias Sonnenberg and the Klinikum Osnabrück for the provision of emer-
gency data and their collaboration. This work is funded by the state of Lower Saxony,
project ROSE, the learning health care system (ZN 3103).

References

[1] B.C. Sun, R.Y. Hsia, R.E. Weiss, D. Zingmond, L.-J. Liang, W. Han, H. McCreath and S.M. Asch,
Effect of Emergency Department Crowding on Outcomes of Admitted Patients, Annals of emergency
medicine 61(6) (2013), 605–611.
[2] Crowding, Annals of Emergency Medicine 47(6) (2006), 585, ISSN 0196-0644, 1097-6760.
doi:10.1016/j.annemergmed.2006.02.025.
[3] N.R. Hoot and D. Aronsky, Systematic Review of Emergency Department Crowding: Causes, Effects,
and Solutions, Annals of Emergency Medicine 52(2) (2008), 126–136, ISSN 0196-0644.
doi:10.1016/j.annemergmed.2008.03.014.
[4] J. Rauch, J. Husers, B. Babitsch and U. Hübner, Understanding the Characteristics of Frequent Users of
Emergency Departments: What Role Do Medical Conditions Play?, Studies in health technology and
informatics 253 (2018), 175–179.
[5] R. Champion, L.D. Kinsman, G.A. Lee, K.A. Masman, E.A. May, T.M. Mills, M.D. Taylor, P.R.
Thomas and R.J. Williams, Forecasting Emergency Department Presentations, Australian Health
Review 31(1) (2007), 83–90, ISSN 1449-8944. doi:10.1071/ah070083.
[6] R.S. Evans, Electronic Health Records: Then, Now, and in the Future, Yearbook of medical informatics
Suppl 1 (2016), 48–61, ISSN 0943-4747. doi:10.15265/IYS-2016-s006.
[7] M. Hertzum, Forecasting Hourly Patient Visits in the Emergency Department to Counteract Crowding,
The Ergonomics Open Journal 10(1) (2017), 1–13.
[8] L. Zhou, P. Zhao, D. Wu, C. Cheng and H. Huang, Time Series Model for Forecasting the Number of
New Admission Inpatients, BMC medical informatics and decision making 18(1) (2018), 39.
[9] W. Whitt and X. Zhang, A Data-Driven Model of an Emergency Department, Operations Research for
Health Care 12 (2017), 1–15, ISSN 2211-6923. doi:10.1016/j.orhc.2016.11.001.
[10] R. Calegari, F.S. Fogliatto, F.R. Lucini, J. Neyeloff, R.S. Kuchenbecker and B.D. Schaan, Forecasting
Daily Volume and Acuity of Patients in the Emergency Department, Computational and Mathematical
Methods in Medicine 2016 (2016).
[11] P.F. Jamason, L.S. Kalkstein and P.J. Gergen, A Synoptic Evaluation of Asthma Hospital Admissions
in New York City, American Journal of Respiratory and Critical Care Medicine 156(6) (1997), 1781–
1788, ISSN 1073-449X. doi:10.1164/ajrccm.156.6.96-05028.
[12] A. Ekstrom, L. Kurland, N. Farrokhnia, M. Castren and M. Nordberg, Forecasting Emergency
Department Visits Using Internet Data, Annals of emergency medicine 65(4) (2015), 436–442.
[13] J. Ranse, A. Hutton, T. Keene, S. Lenson, M. Luther, N. Bost, A.N. Johnston, J. Crilly, M. Cannon and
N. Jones, Health Service Impact from Mass Gatherings: A Systematic Literature Review, Prehospital
and disaster medicine 32(1) (2017), 71–77.
[14] R. Manfredini, O. La Cecilia, B. Boari, J. Steliu, V. Michelini, P. Carli, C. Zanotti, M. Bigoni and M.
Gallerani, Circadian Pattern of Emergency Calls: Implications for ED Organization, The American
journal of emergency medicine 20(4) (2002), 282–286.
[15] T. Horanont, S. Phithakkitnukoon, T.W. Leong, Y. Sekimoto and R. Shibasaki, Weather Effects on the
Patterns of People’s Everyday Activities: A Study Using GPS Traces of Mobile Phone Users, PLoS ONE
8(12) (2013), e81153.
[16] R. Patuelli, A. Reggiani, S.P. Gorman, P. Nijkamp and F.-J. Bade, Network Analysis of Commuting
Flows: A Comparative Static Approach to German Data, Networks and Spatial Economics 7(4) (2007),
315–331, ISSN 1572-9427. doi:10.1007/s11067-007-9027-6.
[17] L.I. Solberg, B.R. Asplin, R.M. Weinick and D.J. Magid, Emergency Department Crowding: Consensus
Development of Potential Measures, Annals of Emergency Medicine 42(6) (2003), 824–834, ISSN
01960644. doi:10.1016/S0196-0644(03)00816-3.
[18] R.H. Shumway and D.S. Stoffer, Time Series Regression and Exploratory Data Analysis, in: Time Series
Analysis and Its Applications, Springer, 2011, pp. 47–82.
[19] C. Bergmeir, R.J. Hyndman and B. Koo, A Note on the Validity of Cross-Validation for Evaluating
Autoregressive Time Series Prediction, Computational Statistics & Data Analysis 120 (2018), 70–83.
[20] M. Wargon, B. Guidet, T.D. Hoang and G. Hejblum, A Systematic Review of Models for Forecasting
the Number of Emergency Department Visits, Emergency Medicine Journal 26(6) (2009), 395–399.

[21] S. Jiang, K.-S. Chin and K.L. Tsui, A Universal Deep Learning Approach for Modeling the Flow of
Patients under Different Severities, Computer Methods and Programs in Biomedicine 154 (2018), 191–
203, ISSN 0169-2607. doi:10.1016/j.cmpb.2017.11.003.
[22] J. Ranse, S. Lenson, T. Keene, M. Luther, B. Burke, A. Hutton, A.N. Johnston and J. Crilly, Impacts on
In-Event, Ambulance and Emergency Department Services from Patients Presenting from a Mass
Gathering Event: A Retrospective Analysis, Emergency Medicine Australasia (2018).
[23] R. Manfredini, M. Gallerani, F. Portaluppi and C. Fersini, Relationships of the Circadian Rhythms of
Thrombotic, Ischemic, Hemorrhagic, and Arrhythmic Events to Blood Pressure Rhythms, Annals of the
New York Academy of Sciences 783(1) (1996), 141–158, ISSN 1749-6632.
doi:10.1111/j.1749-6632.1996.tb26713.x.
[24] R. Manfredini, F. Manfredini, B. Boari, R. Salmi and M. Gallerani, The Monday Peak in the Onset of
Ischemic Stroke Is Independent of Major Risk Factors, The American journal of emergency medicine
27(2) (2009), 244–246.
[25] C. Spielberg, D. Falkenhahn, S.N. Willich, K. Wegscheider and H. Voller, Circadian, Day-of-Week,
and Seasonal Variability in Myocardial Infarction: Comparison between Working and Retired Patients,
American Heart Journal 132(3) (1996), 579–585, ISSN 0002-8703.
dHealth 2019 – From eHealth to dHealth 65
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-65

Information Adapted Machine Learning


Models for Prediction in Clinical Workflow
Stefanie JAUK a,b,1, Diether KRAMER c, Franz QUEHENBERGER b, Sai Pavan Kumar
VEERANKI d, Dieter HAYN d, Günter SCHREIER d and Werner LEODOLTER c

a CBmed, Graz, Austria
b Institute for Medical Informatics, Statistics and Documentation,
Medical University of Graz, Austria
c Steiermärkische Krankenanstaltengesellschaft m.b.H. (KAGes), Graz, Austria
d AIT Austrian Institute of Technology, Graz, Austria

Abstract. Background: In a database of electronic health records, the amount of
available information varies widely between patients. In a real-time prediction
scenario, a machine learning model may receive limited information for some
patients. Objectives: Our aim was to evaluate the influence of missing data on real-
time prediction of delirium, and detect changes in prediction performance when
training separate models for patients with missing data. Methods: We compared a
model trained specifically on data with missing values to the currently
implemented model predicting delirium. Also, we simulated five test data sets with
different amounts of missing data and compared the prediction results to the
prediction on the complete data set when using the same model. Results: For patients
with missing laboratory and nursing assessment data, a model trained especially
for this scenario performed significantly better than the implemented model. The
combination of procedure data and demographic data achieved the closest results
to a prediction with a complete data set. Conclusion: An ongoing evaluation of
real-time prediction is indispensable. Additional models adapted to the information
available might improve prediction performance.

Keywords. classification, machine learning, electronic health records, delirium

1. Introduction

In recent years, various classification models for machine learning (ML) have been
published, predicting events in health care such as hospital readmission, cancer survival
or cardiovascular diseases [1–5]. Although many of them achieve good prediction
results in test and validation data, there is a lack of follow-up studies on the
performance of such models in dynamic decision-making situations [6]. For integrating
prediction models in a clinical workflow, one has to overcome many obstacles. Such
obstacles may not only include official regulations, standards for implementation, and
privacy protection, but also negative attitudes of health care professionals, or
interoperability of health systems [7–9]. Due to such obstacles, only a small percentage
of developed ML models have made their way to clinical practice and there is limited
research on the behaviour of highly complex models in real-time prediction scenarios.

1 Corresponding Author: Stefanie Jauk, CBmed GmbH – Center for Biomarker Research in Medicine,
Stiftingtalstraße 5, 8010 Graz, Austria; E-Mail: [email protected].
66 S. Jauk et al. / Information Adapted ML Models for Prediction in Clinical Workflow

1.1. Prediction Modelling Based on Electronic Health Records

Risk prediction models in healthcare are often based on electronic health records
(EHR) of patients. EHR contain a large amount of information for a single patient and
represent longitudinal patient histories. KAGes (Steiermärkische
Krankenanstaltengesellschaft m.b.H.), the regional health care provider in Styria
(Austria), hosts longitudinal health records of around 90% of all Styrian inhabitants
resulting from clinical documentation. The records include inpatient and outpatient
visits and have been stored electronically in the hospital information system (HIS) of
KAGes for the last 15 years.
We recently developed a model based on a random forest that predicts the
occurrence of delirium for patients at the time of admission in a KAGes hospital [10].
Delirium is a syndrome of acute confusional state which is common among
hospitalized elderly patients. It can cause adverse medical outcomes and is associated
with an increased mortality rate [11]. However, there is evidence that delirium can be
prevented in many cases using nonpharmacological interventions such as reorientation,
hydration, sleep strategies or hearing and vision adaptations [12].
In our study [10], we used the longitudinal EHR of more than 8,500 internal
medicine patients including demographic data, three-character categories of ICD-10
coded diagnoses, procedures, laboratory data, nursing assessment and transfer data for
modelling. The model achieved an area under the receiver-operating characteristic
curve (AUROC) of more than 0.90 in a separated test set.
An adaptation of the delirium model for a cohort of internal medicine and surgical
patients was implemented in May 2018 in the clinical workflow of a KAGes hospital.
For every patient admitted to the hospital, a risk of developing delirium is predicted at
(a) the point of hospitalization. A second prediction (b) takes place the morning after
admission. In some cases additional predictions (c) are made, e.g. due to a transfer to
another department. Within the HIS, prediction results are presented to the health care
personnel in three risk categories: low risk, high risk and very high risk. The model has
been approved for regular clinical use and is currently under evaluation.

1.2. Information Availability as a Barrier to Implementation

Before implementation of the model, we had already anticipated some limitations of real-
time prediction. One limitation is encountered when specific data is missing for the
respective patient. For instance, the determination of the laboratory parameters at the
beginning of a hospitalization is a standard procedure. However, it is possible that
technical problems may delay the availability of these parameters in the system for
some hours. Also, nursing assessment information is recorded within the first
48 hours. Hence, the information for prediction might not be complete at admission.
For building a training cohort of ML models, all data is retrospectively available.
However, when implementing a trained model in clinical workflow, data transmission
delays might influence the real-time prediction. The implementation revealed that, for
some patients, a model previously trained with a well-defined and well-prepared data
set did not represent the data we encountered in the real-time scenario.
In addition, data available at time of hospitalization (and hence at time of
prediction) is usually not missing-at-random, but there are systematically missing
feature groups. This problem may be explained by two examples from our data:

- When a patient with an existing KAGes-EHR is admitted to the hospital, most
  information is available at the time of admission, including demographic data,
  previous transfers and procedures, or previously coded diagnoses. However,
  detailed information on the current condition of a patient, such as the latest
  laboratory data and nursing assessment, might need some time for evaluation
  or data transmission.
- When a patient is admitted to a KAGes hospital for the first time, features
  based on data from previous hospitalizations are not available.

1.3. Importance of Feature Subsets for Prediction

One way to determine the influence of single features on the prediction is to
measure variable importance, which helps to understand which variables contribute most
to the prediction results. Different approaches have been
studied, but recent studies show that many methods suffer from biases, especially if the
modelling variables vary in measurement scale [13]. Also, variable importance
methods focus on the entire prediction model, and their results cannot explain the individual
prediction result for a single patient. We believe that in addition to the analysis of
variable importance of the entire model, further evaluation of individual results is
necessary.
In one of our previous studies, we showed that a model trained with demographic
information and nursing assessment data only achieved a higher AUROC than models
trained with other feature groups [14]. This indicates that nursing assessment data
contain informative features for our delirium prediction model.
However, applying the obtained results to a prediction model scenario in
clinical practice is not trivial. It is neither feasible nor efficient to train a prediction model
for every possible combination of missing informative features, as the number of such
combinations is too high. In addition, it remains unclear how a model trained with
information in all features performs for a patient with missing information in some
informative features, like nursing assessment, and how this might influence the
implementation of the model in clinical practice.

1.4. Objectives

Our first experiences with the real-time prediction of our delirium model confirmed
that recent nursing assessment data is missing at admission time for some patients, and
we are aware of the importance of nursing assessment in the prediction context. Also,
in some cases, laboratory data might become available only shortly after admission.
Therefore, the aim of this study is to evaluate a possible benefit of a model that is
trained specifically for a case of missing laboratory data and nursing assessment data at
admission. Dependent on the information available in the HIS for a patient, such an
information adapted model could be employed in addition to the prediction based on
the whole feature set.
In addition, we want to determine the feature groups that result in a risk prediction
closest to the one achieved by a complete data set when using the same prediction
model. This simulation helps to understand the effect of missing data in certain feature
groups when using a model trained on a complete data set.

2. Methods

The analysed data were extracted from the HIS of KAGes, openMEDOCS, which is
based on IS-H/i.s.h.med information systems, implemented on SAP platforms. After
extraction and anonymization, all analyses were computed in R using various packages.
For fitting the random forest models, the caret package [15] and associated
packages were used.
The study is part of a project that received approval from the Ethics Committee
of the Medical University of Graz (30-146 ex 17/18).

2.1. Development of the Implemented Model A

The random forest model implemented in May 2018 was an adapted version of the
previously published model [10]. We used the same inclusion and exclusion criteria as
in the previous model, but an extended period of admission time from 2011 till 2018.
During that period, 6,459 patients were coded with the ICD-10 three-character category
F05 (delirium due to known physiological condition). In addition to the delirium
patients we included 13,445 randomly selected controls from internal medicine and
surgical departments.
We selected variables based on literature and previous analyses. Examples of the
modelling features are shown in Table 1. As we cannot differentiate, e.g., between
diseases that were not coded and those not present, missing feature values were set to zero (i.e. not
present). This also applies to unanalysed laboratory values and missing nursing assessment items.
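The "absent means not present" convention can be sketched with pandas; the column names below are hypothetical stand-ins for diagnosis, laboratory, and nursing features:

```python
import pandas as pd

# hypothetical feature rows: NaN means the code/value was never recorded
df = pd.DataFrame({"icd_F05": [1, None], "lab_crp": [12.3, None], "nurs_sleep": [None, 1]})
df_filled = df.fillna(0)  # absent → "not present" (0), as done for diagnoses, labs, nursing items
print(df_filled.values.tolist())  # [[1.0, 12.3, 0.0], [0.0, 0.0, 1.0]]
```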
We split the cohort into a training (75%) and a test data set (25%). We trained a
random forest with up-sampling on the training data set, including 10-fold cross-
validation, and tested the model on the separate test data. The model achieved an
AUROC of 0.85 for the cohort of surgical and internal medicine patients.
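The split/up-sample/train/evaluate pipeline can be sketched roughly. The authors used R's caret; this is a hedged scikit-learn analogue on synthetic data (the feature matrix, class balance, and all names below are illustrative, not the study's data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.utils import resample

# synthetic stand-in for the EHR feature matrix (the real model used 556 features)
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=2000) > 1.2).astype(int)  # ~20% "delirium"

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)  # 75% / 25% split as in the paper

# up-sample the minority class (class 1 in this synthetic data) in the training data only
minority = y_train == 1
X_min, y_min = resample(X_train[minority], y_train[minority],
                        n_samples=int((~minority).sum()), random_state=0)
X_bal = np.vstack([X_train[~minority], X_min])
y_bal = np.concatenate([y_train[~minority], y_min])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
cv_auc = cross_val_score(clf, X_bal, y_bal, cv=10, scoring="roc_auc").mean()  # 10-fold CV
clf.fit(X_bal, y_bal)
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])  # held-out AUROC
print(round(cv_auc, 2), round(auc, 2))
```

One caveat on this simplification: caret applies up-sampling inside each resampling fold, whereas up-sampling once before cross-validation, as above, leaks duplicated cases across folds and inflates the CV score; the held-out AUROC is the more honest number.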
Although the classification of the model was binary (delirium vs. non-delirium),
we later added a second threshold in the implemented version in order to obtain three
classes: low risk, high risk and very high risk. The chosen threshold was the result of
clinical considerations to alert for patients whose delirium risk lies above the 85th percentile.
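Mapping predicted probabilities onto the three risk classes via two cut-offs can be sketched as follows; the 85th percentile comes from the text, while the second (95th percentile) cut-off and all data here are purely illustrative assumptions:

```python
import numpy as np

def risk_categories(probs, t_high, t_very_high):
    """Map predicted delirium probabilities onto three risk classes via two thresholds."""
    cats = np.full(probs.shape, "low risk", dtype=object)
    cats[probs >= t_high] = "high risk"
    cats[probs >= t_very_high] = "very high risk"  # overrides "high risk" above the second cut
    return cats

rng = np.random.default_rng(0)
probs = rng.uniform(size=1000)               # stand-in for model output probabilities
t_high = np.percentile(probs, 85)            # alert threshold at the 85th percentile (from the text)
t_very_high = np.percentile(probs, 95)       # second cut-off: purely illustrative
cats = risk_categories(probs, t_high, t_very_high)
print((cats == "low risk").mean())           # ≈ 0.85 by construction
```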

2.2. Changes in Risk Prediction Due to Missing Data

To assess the contribution of information in different feature groups, we used a
validation cohort of patients admitted between March 1, 2017 and March 1, 2018. We
extracted the EHRs of 7,514 patients, and used the already implemented

Table 1. Examples of modelling features extracted from electronic health records (n=556).

Feature group                  Examples                                                      n
Demographic data               Age, sex, mother tongue, additional private insurance        30
Diagnosis codes                ICD-10 codes (e.g. F00, E11, E78, I10, N39, I49), groups    275
                               of ICD-10 codes (e.g. F00_F09), total number of diagnoses
Procedure codes                X-ray, MRI, physiotherapy, CT scan                           95
Laboratory data                CRP, ALT, AST, cholesterol, gamma-GT, haemoglobin,           53
                               bilirubin, MCV, creatinine
Nursing protocols              Hearing impaired, vision impaired, sleeping disorder, body   96
                               mass index, catheter, communication possible, smoking
Administrative data, indices   Number of transfers, number of hospital admissions,           7
                               Charlson comorbidity index, number of procedures

random forest model (Model A) with the complete data set to predict the delirium risk
for the cohort.
To examine the influence of five available features groups, we simulated five
subsets. Every subset included informative features for demographic data (DEM) of a
patient. Additionally, each subset contained informative data in one of the following
feature groups only: Coded diagnoses (ICD), applied procedures (PROC), laboratory
data (LAB), nursing assessment data (NURS), and transfer data (TRANS). The
remaining features of every subset were set to zero, i.e. non-informative.
We predicted the risk of delirium for every patient with Model A on each subset
and compared the results to the prediction with the complete data set. We calculated the
root mean squared error (RMSE) for all five subsets. The RMSE represents the square
root of the average of the differences between the prediction with the complete data set
and each subset, meaning that a high RMSE represents a high difference in prediction.
Our aim was to evaluate the deviation of predicted risk probabilities from a complete
data set, and therefore we did not evaluate the specificity or sensitivity of the prediction.
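A minimal sketch of this subset simulation, assuming hypothetical feature indices per group: features outside the kept groups are zeroed, and the RMSE compares the model's probabilities on the subset with those on the complete data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=12, random_state=0)
model_a = RandomForestClassifier(random_state=0).fit(X, y)   # stand-in "Model A"

# Hypothetical column ranges for three of the feature groups.
feature_groups = {"DEM": range(0, 3), "ICD": range(3, 8), "LAB": range(8, 12)}

def rmse_for_subset(keep_groups):
    """Zero all features outside keep_groups and compare predictions."""
    X_sub = np.zeros_like(X)
    for group in keep_groups:
        cols = list(feature_groups[group])
        X_sub[:, cols] = X[:, cols]
    full = model_a.predict_proba(X)[:, 1]
    sub = model_a.predict_proba(X_sub)[:, 1]
    return float(np.sqrt(np.mean((full - sub) ** 2)))

rmse_lab = rmse_for_subset(["DEM", "LAB"])   # analogous to DEM + LAB in Table 2
```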

2.3. Information Adapted Model Training for a Use-Case Scenario

In order to evaluate the performance of an information adapted model, we compared
Model A to a random forest trained with the same data, but excluding features of
laboratory data and nursing assessment (Model B). As training data, we used the same
data set on which we had trained Model A.
For Model A, 556 features had been used for prediction, whereas Model B was
trained without laboratory and nursing assessment features (resulting in 407 features).
Both models were trained with a 10-fold cross validation, and the same training and
test split, which resulted in two test data sets with the same cases. For testing Model A,
we simulated missing (non-informative) laboratory data and nursing assessment data
for all cases in the test data; i.e. we set available values to zero.
We predicted the delirium risk for the simulated test data with Model A and Model
B and compared the results via plotting the receiver-operating characteristic curves
(ROC). In addition, we used the DeLong test for two dependent ROC curves to assess
differences of the two curves with an alpha-level of .05.
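The Model A vs. Model B comparison can be sketched as below. The paper uses the DeLong test for correlated ROC curves; since that test is not part of scikit-learn, a paired bootstrap of the AUROC difference is shown here as a simple stand-in, and the feature layout is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

lab_nurs = list(range(10, 20))   # hypothetical LAB + NURS columns
model_a = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
model_b = RandomForestClassifier(random_state=0).fit(
    np.delete(X_tr, lab_nurs, axis=1), y_tr)

# Simulate missing data for Model A: zero the LAB/NURS columns at test time.
X_te_zeroed = X_te.copy()
X_te_zeroed[:, lab_nurs] = 0.0
p_a = model_a.predict_proba(X_te_zeroed)[:, 1]
p_b = model_b.predict_proba(np.delete(X_te, lab_nurs, axis=1))[:, 1]

# Paired bootstrap of the AUROC difference over the shared test cases.
rng = np.random.default_rng(0)
diffs = []
for _ in range(200):
    idx = rng.integers(0, len(y_te), len(y_te))
    if len(np.unique(y_te[idx])) < 2:
        continue                 # AUROC undefined for one-class resamples
    diffs.append(roc_auc_score(y_te[idx], p_b[idx])
                 - roc_auc_score(y_te[idx], p_a[idx]))
```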

3. Results

3.1. Changes in Risk Prediction Due to Missing Data

Table 2 shows the results of the prediction on a complete data set compared to five
subsets excluding informative features of certain feature groups. The total number of
informative features per subset varied between 37 and 305. The RMSE of the subset with
information in demographic data (DEM) only was the highest, at 0.54. The DEM
subset comprised 30 features and was part of every other subset. The lowest RMSE,
and therefore the lowest deviation from the prediction with a complete data set, was
achieved by the subset including informative features of procedures (DEM + PROC). A
risk prediction with diagnoses (DEM + ICD) was furthest from the results of the
complete data set. Although in the subset of transfers (DEM + TRANS) information for
only seven further features was added to the 30 demographic features, the RMSE was
the second lowest.

Table 2. Root mean squared error (RMSE) for individual differences of 7,514 predicted risk probabilities
comparing the complete feature set with subsets. Every subset includes 30 features from the DEM feature set.
Feature group Number of features RMSE
DEM 30 0.540
DEM + ICD 305 0.491
DEM + LAB 83 0.361
DEM + NURS 126 0.235
DEM + PROC 125 0.180
DEM + TRANS 37 0.225

3.2. Information Adapted Model Training for a Use-Case Scenario

Figure 1 demonstrates the results of the prediction of two models for the same test data
set with missing information in laboratory and nursing assessment features. Model B,
trained without features of nursing assessment and laboratory, achieved better results in
delirium prediction than Model A. The AUROC for Model B was 0.830 [0.818, 0.841],
and 0.799 [0.787, 0.812] for Model A.
A DeLong test for correlated ROC curves showed a significant difference between
the curves of Model A and Model B (Z = 10.388, p < 0.001).

4. Discussion

Even though a prediction model performs well on the test data set, several obstacles
might occur during its implementation in a real clinical workflow. In May 2018, we
integrated a random forest model predicting the occurrence of delirium into the HIS of a
hospital in Austria. As one may not foresee all problems arising during implementation, a
constant evaluation of the implemented model is crucial. Our study raises awareness of
emerging limitations when a prediction model is implemented in a clinical workflow.
We also presented a way to overcome some of these limitations.
We applied our random forest model on data subsets with informative data missing
in different feature groups. The prediction using only demographic data and
information of procedures was closest to the prediction with the complete data set. A
subset with information of nursing assessment achieved the third closest prediction to
the complete data set. When showing the importance of nursing assessment in our
previous study [14], we trained separate models for combinations of feature groups.
This time, we used one model which was trained on the complete data set, and applied
it on data sets with missing information. We conclude that the performance of the same
model varies for patients with different information available, and that not all of this
variation might be explained by a global measure of variable importance.
At the time of admission, information used for prediction might be missing
not-at-random for some patients and is set to zero for prediction. For patients with missing
laboratory data and nursing assessment, a model trained specifically for that scenario
(Model B) achieved better prediction results than the currently implemented model
trained with all features (Model A). This indicates that for the implementation scenario
in KAGes two models are needed: Depending on the availability of recent laboratory
data and nursing assessment data, Model A or Model B should be employed for
prediction at admission time. This conclusion is remarkable, as it shows that the
implementation of a ML model has to be strongly adapted to different scenarios in real-time prediction.

Figure 1. ROC curves for Model A (random forest trained with all features) and Model B (random forest
trained without nursing assessment and laboratory features). Prediction was computed on the same test data
with missing values of nursing assessment and laboratory data.
Another way to overcome the problem of missing data would be the creation of a
training data set that is more similar to a real-time prediction scenario. However, it
proved to be very difficult to reconstruct the exact time points of data input: Some
diagnoses might be coded on one day and deleted on the following day. Also, manual
input errors cannot be simulated easily.
However, the results of this study do not provide a method that is generalizable to
every implementation of a ML model. We assume that the results of the same
evaluation can differ for other use-cases and other hospitals. Nevertheless, the
generated knowledge gives an idea of how to tackle the problem of information
availability when implementing prediction models with many modelling features for
real-time prediction.
To sum up, our study showed that an ongoing evaluation is indispensable in order
to understand the prediction results of a ML model implemented in a clinical workflow.
For some cases it might be beneficial to train adapted models that depend on the
available information and on systematically missing feature groups.

Acknowledgements

This work has been carried out with the K1 COMET Competence Centre CBmed,
which is funded by the Federal Ministry of Transport, Innovation and Technology
(BMVIT); the Federal Ministry of Science, Research and Economy (BMWFW); Land
Steiermark (Department 12, Business and Innovation); the Styrian Business Promotion
Agency (SFG); and the Vienna Business Agency. The COMET program is executed by
the FFG. KAGes and SAP provided significant resources, manpower and data as basis
for research and innovation.

References

[1] K. Kourou, T.P. Exarchos, K.P. Exarchos, M.V. Karamouzis, and D.I. Fotiadis, Machine learning
applications in cancer prognosis and prediction, Comput. Struct. Biotechnol. J. 13 (2015) 8–17.
doi:10.1016/j.csbj.2014.11.005.
[2] A. Wong, A.T. Young, A.S. Liang, R. Gonzales, V.C. Douglas, and D. Hadley, Development and
Validation of an Electronic Health Record–Based Machine Learning Model to Estimate Delirium Risk
in Newly Hospitalized Patients Without Known Cognitive Impairment, JAMA Netw. Open. 1 (2018)
e181018. doi:10.1001/jamanetworkopen.2018.1018.
[3] S. Hao, Y. Wang, B. Jin, A.Y. Shin, C. Zhu, M. Huang, L. Zheng, J. Luo, Z. Hu, C. Fu, D. Dai, Y.
Wang, D.S. Culver, S.T. Alfreds, T. Rogow, F. Stearns, K.G. Sylvester, E. Widen, and X.B. Ling,
Development, Validation and Deployment of a Real Time 30 Day Hospital Readmission Risk
Assessment Tool in the Maine Healthcare Information Exchange, PLOS ONE. 10 (2015) e0140271.
doi:10.1371/journal.pone.0140271.
[4] S.F. Weng, J. Reps, J. Kai, J.M. Garibaldi, and N. Qureshi, Can machine-learning improve
cardiovascular risk prediction using routine clinical data?, PLOS ONE. 12 (2017) e0174944.
doi:10.1371/journal.pone.0174944.
[5] A. Rajkomar, E. Oren, K. Chen, A.M. Dai, N. Hajaj, M. Hardt, P.J. Liu, X. Liu, J. Marcus, M. Sun, P.
Sundberg, H. Yee, K. Zhang, Y. Zhang, G. Flores, G.E. Duggan, J. Irvine, Q. Le, K. Litsch, A. Mossin,
J. Tansuwan, D. Wang, J. Wexler, J. Wilson, D. Ludwig, S.L. Volchenboum, K. Chou, M. Pearson, S.
Madabushi, N.H. Shah, A.J. Butte, M.D. Howell, C. Cui, G.S. Corrado, and J. Dean, Scalable and
accurate deep learning with electronic health records, Npj Digit. Med. 1 (2018). doi:10.1038/s41746-
018-0029-1.
[6] M. Islam, M. Hasan, X. Wang, H. Germack, and M. Noor-E-Alam, A Systematic Review on
Healthcare Analytics: Application and Theoretical Perspective of Data Mining, Healthcare. 6 (2018)
54. doi:10.3390/healthcare6020054.
[7] F. Jiang, Y. Jiang, H. Zhi, Y. Dong, H. Li, S. Ma, Y. Wang, Q. Dong, H. Shen, and Y. Wang, Artificial
intelligence in healthcare: past, present and future, Stroke Vasc. Neurol. 2 (2017) 230–243.
doi:10.1136/svn-2017-000101.
[8] E.G. Liberati, F. Ruggiero, L. Galuppo, M. Gorli, M. González-Lorenzo, M. Maraldi, P. Ruggieri, H.
Polo Friz, G. Scaratti, K.H. Kwag, R. Vespignani, and L. Moja, What hinders the uptake of
computerized decision support systems in hospitals? A qualitative study and framework for
implementation, Implement. Sci. 12 (2017). doi:10.1186/s13012-017-0644-2.
[9] R. Amarasingham, R.E. Patzer, M. Huesch, N.Q. Nguyen, and B. Xie, Implementing Electronic Health
Care Predictive Analytics: Considerations And Challenges, Health Aff. (Millwood). 33 (2014) 1148–
1154. doi:10.1377/hlthaff.2014.0352.
[10] D. Kramer, S. Veeranki, D. Hayn, F. Quehenberger, W. Leodolter, C. Jagsch, and G. Schreier,
Development and Validation of a Multivariable Prediction Model for the Occurrence of Delirium in
Hospitalized Gerontopsychiatry and Internal Medicine Patients., Stud. Health Technol. Inform. 236
(2017) 32–39.
[11] S.K. Inouye, R.G. Westendorp, and J.S. Saczynski, Delirium in elderly people, The Lancet. 383 (2014)
911–922. doi:10.1016/S0140-6736(13)60688-1.
[12] T.T. Hshieh, J. Yue, E. Oh, M. Puelle, S. Dowal, T. Travison, and S.K. Inouye, Effectiveness of
Multicomponent Nonpharmacological Delirium Interventions: A Meta-analysis, JAMA Intern. Med.
175 (2015) 512. doi:10.1001/jamainternmed.2014.7779.
[13] C. Strobl, A.-L. Boulesteix, A. Zeileis, and T. Hothorn, Bias in random forest variable importance
measures: Illustrations, sources and a solution, BMC Bioinformatics. 8 (2007). doi:10.1186/1471-2105-
8-25.
[14] S. Veeranki, D. Hayn, D. Kramer, S. Jauk, and G. Schreier, Effect of Nursing Assessment on
Predictive Delirium Models in Hospitalised Patients, Stud. Health Technol. Inform. (2018) 124–131.
doi:10.3233/978-1-61499-858-7-124.
[15] M. Kuhn, caret: Classification and Regression Training. R package version 6.0-78., 2017.
dHealth 2019 – From eHealth to dHealth 73
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-73

Evaluation of Chatbot Prototypes for Taking the Virtual Patient's History

Andreas REISWICH a,1 and Martin HAAG a
a GECKO Institute, Heilbronn University of Applied Sciences, Heilbronn, Germany

Abstract. In medical education Virtual Patients (VP) are often applied to train
students in different scenarios such as recording the patient's medical history or
deciding on a treatment option. Usually, such interactions are predefined by software
logic and databases following strict rules. At this point, Natural Language
Processing/Machine Learning (NLP/ML) algorithms could help to increase the
overall flexibility, since most of the rules can be derived directly from training data. This
would allow a more sophisticated and individual conversation between student and
VP. One type of technology that is heavily based on such algorithmic advances are
chatbots or conversational agents. Therefore, a literature review is carried out to give
insight into existing educational ideas with such agents. Besides, different
prototypes are implemented for the scenario of taking the patient’s medical history,
responding with the classified intent of a generic anamnestic question. Although the
small number of questions (n=109) leads to a high SD during evaluation, all scores
(recall, precision, f1) already reach a level above 80% (micro-averaged). This is a
first promising step towards using these prototypes for taking the medical history of a VP.

Keywords. natural language processing, machine learning, medical education, algorithms

1. Introduction

In the publication by Riemer and Abendroth [1], different approaches are presented for
how Virtual Patients (VP) can best be used in medical education. The systems listed
there are CASUS, CAMPUS and INMEDEA [1]. Typical task types in such case-based
learning systems are, for example, multiple-choice, long menu, free text, or assignment
questions [2]. In the current CAMPUS system, such an interaction element is used during
the anamnesis interview. Thereby, the students are able to select an appropriate
anamnestic question and receive the VP answer from CAMPUS. This dialog is based
upon predefined anamnestic questions, which are stored in a database. In order to build
up additional competencies among the students, a more flexible and individual
conversation should be considered. This therefore requires a change to the system, since
storing every variation of an anamnestic question is not feasible. Hence, this research paper
examines various methods from the fields of Natural Language Processing/Machine
Learning (NLP/ML) to provide the capability of dealing with unknown questions. To
this end, different approaches (prototypes) are implemented using Python, evaluated by
leave-one-out cross-validation (LOO) and compared with each other using different

1 Corresponding Author: Andreas Reiswich, Heilbronn University, Max-Planck-Straße 39, 74081 Heilbronn, Germany, E-Mail: [email protected]
74 A. Reiswich and M. Haag / Evaluation of Chatbot Prototypes

micro-averaged ML scores (recall, precision, f1). The overall aim is to determine the best
performing prototype, when considering a small number of given training data (n=109).

2. Methods

2.1. Literature Review

The literature databases MEDLINE, IEEE Digital Library2 and ACM Digital Library3
were used to conduct a systematic review of chatbots that were utilized in an educational
environment. The underlying method was derived from the PRISMA flow diagram [3].
The year range was set for all databases from 2015 to 2018. In case of IEEE the options
Full Text & Metadata and My Subscribed Content were selected for the search. For ACM,
the option ACM Full-Text Collection and Any field for search terms and lastly, the option
All Fields for PubMed, were selected. When performing the search on all three databases,
the following search string was applied (ACM result syntax output):
(+Chatbot* Training Education Apprenticeship Teaching)
In addition, the TeXMed [4] website was used to generate a BibTex file for PubMed.
All extracted results were then managed by Zotero4. Besides, an additional Excel file was
used to summarize relevant information related to a set of predefined dimensions of
interest. These dimensions comprise, for example, the technical solution such as the
usage of the Artificial Intelligence Markup Language (AIML), Machine Learning (ML)
algorithms and the implemented user interface (UI). Aspects related to an educational
concept were also considered. The final update of all literature entries was conducted on
26th January 2018.

2.2. Technical Setup

All ML prototypes were developed either directly in Python (scikit-learn [5]) or with an
adapter class for Rasa NLU [6]. Solutions that weren’t available in Python were
excluded. Python was used as it is the core language for many scientific fields including
Artificial Intelligence (AI) & ML while still offering highly readable code [7]. This
allowed a more efficient use of our own software fragments and created a uniform
approach for the overall evaluation using the inbuilt method cross_val_score [8] of
scikit-learn for the LOO cross-validation. In addition, Rasa NLU was selected as an open
source chatbot platform, which allows an execution on a private server. This can also be
a future prerequisite, for example, if chatbots build up on sensitive data from patients or
students. Therefore, external service providers such as Google or Facebook were
excluded from evaluation. In order to use Rasa for classification, all anamnestic
questions and the corresponding intents were specified in a separate file using the
required markdown language. For this, each intent was described by a heading, and the
chatbot's knowledge base by an enumeration of all anamnestic questions in plain
German. An example for a question and its intent is: “What medical complaints do you
have?” (intent: complaints_identification). No entities and no sentence modifications are
applied during this step for Rasa.
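A minimal sketch of such a training file in the legacy Rasa NLU markdown format, in which each intent heading is followed by an enumeration of its example questions; the two intent/question pairs are taken from the text (intent names normalized to lower case):

```md
## intent:complaints_identification
- What medical complaints do you have?

## intent:smoking
- How much do you currently smoke per day?
```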

2 https://ieeexplore.ieee.org/Xplore/home.jsp, last access: 29.01.2019.
3 https://dl.acm.org/, last access: 29.01.2019.
4 https://www.zotero.org/, last access: 29.01.2019.

Besides Rasa, different prototypes were also implemented in scikit-learn for
classifying the intent. Thereby, each question was transformed into a high dimensional
vector representation using the Bag of Words (BOW) approach [9] to generate the feature
matrix (rows: anamnestic questions, columns: BOW generated words as features). In
addition, methods like tf-idf, hashing and Doc2Vec were considered. However, BOW
was selected as it is simple to use while showing sufficient accuracy for an initial proof
of concept.
For the purpose of data exploration, additional visualizations were generated by
Exploratory5 and yellowbrick6. Finally, the evaluation process was conducted by the
cross_val_score method of scikit-learn, applying it on all implemented prototypes. The
specific test procedure for cross_val_score was set to LOO [8] and the individual scoring
methods (micro-averaged) recall, precision and f1 were selected [10]. Finally, all results
were bundled into a Slack7 application.

3. Results

3.1. Results of the Literature Review

The literature review led to 332 papers on IEEE, 124 papers on ACM and 2 papers on
PubMed. They were then added to Zotero using the generated BibTex files, which also
included abstracts if available. In summary, 458 papers were considered from these three
databases using standard export and import of each database provider, TeXMed and
Zotero. Nonetheless, several entries were removed before or during the screening of
each paper's title and abstract. The reasons were duplicates (5), no valuable
information (24), e.g. entries referring only to a schedule [11], or no access (1) [12].
Finally, 428 papers could be usefully screened for title and abstract. At this stage, all
chatbots were considered that were integrated in a more advanced educational concept,
e.g. the concept of a Massive Open Online Course (MOOC). Therefore, results such as
answering FAQs of a university [13] or supporting the degree program choice of a student
[14] were not further reviewed. In addition, all results were excluded that did not
focus sufficiently on a chatbot approach, e.g. only listing a chatbot as an example [15].
After screening each title and abstract, 36 papers from IEEE, 7 from ACM and 1 from
PubMed were assessed in full text, leading to 21 accepted publications
(14 IEEE and 7 ACM papers). During this step, the paper quality itself was not considered
as an additional criterion; only the chatbot context was decisive. In the following, a short
summary of the results related to the educational setting is presented.
Frequently, chatbots were integrated in a MOOC scenario [16][17][18][19]. Thereby,
Demetriadis et al. [16] focused on creating a more productive talk using transactive
questions and conceptual links to shape the relevant domain model of a task. Kloos et al.
[17] proposed a MOOC-complementary chatbot that allows learning Java in several
interaction modes, such as review and gaming. Besides MOOCs, conversational agents
were also applied in Virtual Reality (VR) [20][21][22] creating an immersive educational
environment. Tsaramirsis et al. [21], for example, simulated the experiences of a student
in a classroom, including the communication with the lecturer. If the lecturer didn’t

5 https://exploratory.io/, last access: 29.01.2019.
6 https://www.scikit-yb.org/en/latest/, last access: 20.03.2019.
7 https://slack.com/, last access: 29.01.2019.

respond to a student’s question, an inbuilt AIML Chatbot was used to generate the answer.
Other chatbot realizations covered the use case of language learning [20][23][24].
Troussas et al. [23] developed a mobile chatbot for learning vocabulary through text or
voice response. Further, gamification elements were incorporated by [25][26][27].
Pereira [25], for example, created a quiz chatbot for students in different subjects.
Thereby, a Telegram UI was applied since students were familiar with using such instant
messaging services [25]. Besides these results, Webber [28] was the only fitting VP
approach that was referenced within the literature results. However, Webber builds on a
rule-based SQL approach [29] that was published in 2005 [28]. Therefore, the intention
of this paper is to revisit the concept of a VP chatbot, considering, in addition to classical
ML methods, a modern approach (Rasa NLU) and a mobile chatting app (Slack), as
suggested by Io and Lee [30] in a recent bibliometric analysis of chatbots.

3.2. Data Description and Exploration

Several data sources were integrated to train and evaluate the different prototypes. This
included questions from the CAMPUS database and online resources [31][32][33][34].
They were used to generate data sets with the modified textcorpus-generator8 project.
Thereby, each anamnestic question was annotated with a single intent based on a personal
assumption derived from the insights gained from these resources. A typical question
from the data set is for example “How much do you currently smoke per day?” (Intent:
Smoking) or the one given in Section 2.2.
Table 1. Basic properties of the anamnestic questions corpora (dataset: AnamnesticData, n=109).
Feature                     MED / AVG / MIN / MAX       SD
Number of words             8 / 9.3 / 3 / 21            4.38
Number of chars             46 / 52.52 / 16 / 135       26.65
Proportion of stop words    28.6 / 29.3 / 0 / 57.14     0.136

The basic properties of the underlying questions corpus are described by several
key features, shown in Table 1. For each feature, the median (MED), average (AVG),
minimum (MIN) and maximum (MAX) were calculated.
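The key features of Table 1 can be computed per question roughly as follows; the stop-word list and the two questions are illustrative stand-ins, not the study corpus.

```python
import statistics

stop_words = {"wie", "viel", "sie", "pro", "haben"}   # illustrative subset

questions = [
    "Wie viel rauchen Sie derzeit pro Tag?",
    "Welche Beschwerden haben Sie?",
]

def summary(values):
    """MED / AVG / MIN / MAX as in Table 1."""
    return {"MED": statistics.median(values), "AVG": statistics.mean(values),
            "MIN": min(values), "MAX": max(values)}

words = [len(q.split()) for q in questions]
chars = [len(q) for q in questions]
stop_pct = [100 * sum(w.strip("?").lower() in stop_words for w in q.split())
            / len(q.split()) for q in questions]

word_stats = summary(words)
```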

Figure 1. Distribution of all intents in given anamnestic questions.


Additionally, an absolute frequency distribution, shown in Figure 1, gives an
insight into the number of annotated questions for each single intent. This data distribution

8 https://github.com/pagesjaunes/textcorpus-generator, last access: 09.12.2018.

represents a generally expected imbalance but does not claim to reflect the future reality.
The underlying assumption is that certain intents will not allow creating the same
number of questions because they are either more specific (e.g. Person weight) or more
generally designed (e.g. Smoking). A future study must therefore show how to define
intents in a way that avoids overlaps before ML classification.

Figure 2. Embeddings of anamnestic questions in 2D vector space after PCA.


Figure 2 indicates this overlapping issue by using a 2D representation of each single
anamnestic question. It was created by applying a Principal Component Analysis (PCA)
on the BOW transformed questions. BOW generated a vocabulary with 385 words for
all questions. For future use, it is therefore crucial to find more intents like, for example,
Alcohol and Smoking (Figure 2), which generate distinctive ML features (better
distributed vectors). This would facilitate a clear ML classification and thus allow
returning the right VP answer to the medical student.
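The projection behind Figure 2 can be sketched as a PCA of the BOW vectors; the four questions and their grouping are illustrative, not the study corpus.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer

questions = [
    "How much do you currently smoke per day?",
    "Do you smoke cigarettes or a pipe?",
    "How much alcohol do you drink per week?",
    "What medical complaints do you have?",
]
bow = CountVectorizer().fit_transform(questions).toarray()
coords = PCA(n_components=2).fit_transform(bow)  # one 2D point per question
```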

3.3. Implementing prototypes for intent classification

To determine the intent of an anamnestic question, different prototypes were developed
with scikit-learn and Rasa (Section 2.2). For scikit-learn, all anamnestic questions
(n=109) were first transformed to a high-dimensional vector space using the BOW model
(385 features) without applying any prior preprocessing. After sentence embedding
various supervised ML algorithms were applied, which were already integrated in scikit-
learn, including: Random Forest (RF), Naïve Bayes (NB), Linear Support Vector
Classification (Linear SVC) and the Logistic Regression (LR). Thereby, the
classification targets were set to the intents (n=8) while having the overall distribution
described in Figure 1.
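The four scikit-learn prototypes named above can be sketched as follows, each fitted on BOW features; the tiny question set is an illustrative stand-in for the 109 annotated questions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

questions = [
    "How much do you currently smoke per day?",
    "Do you smoke after meals?",
    "How much alcohol do you drink per week?",
    "Do you drink beer or wine?",
]
intents = ["smoking", "smoking", "alcohol", "alcohol"]

X = CountVectorizer().fit_transform(questions)   # BOW feature matrix
prototypes = {
    "RF": RandomForestClassifier(random_state=0),
    "NB": MultinomialNB(),
    "LSVC": LinearSVC(),
    "LR": LogisticRegression(),
}
predictions = {name: clf.fit(X, intents).predict(X)
               for name, clf in prototypes.items()}
```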

Figure 3. Slack integration of the anamnesis chatbot. User chooses Rasa NLU as a classifier. Input sentence
contains an intentional spelling mistake and the chatbot returns the classified intent and its confidence value.

Finally, all created prototypes were bundled into a Slack application. Figure 3
illustrates the final Slack UI using the Rasa prototype for intent classification. Thereby,
each single prototype is selectable by using the keyword: set_mode_x where x stands for
the number of the individual prototype, e.g. x=2 for Rasa (Figure 3). All additional
mappings can be listed by using the chatbot’s help command. All in all, the use of Slack
creates a UI, which allows the communication between student and VP through a
modern, well-known messaging service.

3.4. Classifier Evaluation

Table 2. Performance measures for the implemented approaches (all scores micro-averaged, in %).
Approach          recall_micro        precision_micro     f1_micro
Platform
  Rasa*           88.073 ± 32.410     88.073 ± 32.410     88.991 ± 31.300
Scikit-Learn
  RF*             85.321 ± 35.390     84.404 ± 36.282     82.569 ± 37.938
  NB              81.651 ± 38.706     81.651 ± 38.706     81.651 ± 38.706
  LSVC            88.991 ± 31.300     88.991 ± 31.300     88.991 ± 31.300
  LR              85.321 ± 35.390     85.321 ± 35.390     85.321 ± 35.390

* score value fluctuates at over 80%

Since the total number of the available data for a specific intent was small, the LOO
cross-validation method was selected as a test procedure. In the next step each scikit-
learn prototype inherited the BaseEstimator9 class and implemented the adapter methods
fit(self, X, y) and predict(self, X). Subsequently, an object of this class was passed to the
cross_val_score method to perform the final evaluation. Thereby, the following scoring
methods (micro-averaged) were applied: recall, precision and f1. The results of this
respective approach are listed in Table 2, where each value represents an average of all
n measurements. All prototypes achieve a score of over 80% for each individual
combination with Rasa and LSVC delivering the best results (RF excluded because of
high fluctuation). However, since the SD of each score is very high due to LOO, all
current prototypes (Section 3.2) should remain selectable via the UI (Section 3.3) for a
future field study. This would allow recording not only new real data for cross-validation
but also gaining feedback on the individual perception of use from each medical student.
Both insights could then be analyzed to select the final prototype for use in a case-based
learning system.
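The evaluation step can be sketched with scikit-learn's cross_val_score and LeaveOneOut, as referenced above [8]; the toy corpus is an illustrative stand-in for the real questions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

questions = [
    "How much do you currently smoke per day?",
    "Do you smoke after meals?",
    "Do you smoke in the morning?",
    "How much alcohol do you drink per week?",
    "Do you drink beer or wine?",
    "Do you drink alcohol daily?",
]
intents = ["smoking"] * 3 + ["alcohol"] * 3

clf = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(clf, questions, intents,
                         cv=LeaveOneOut(), scoring="f1_micro")
mean_f1 = scores.mean()
```

With single-element test folds, each per-fold micro-F1 is either 0 or 1, which is exactly why the standard deviations reported in Table 2 are so large.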

4. Discussion

Considering the given data shown in Table 1, both mean values indicate that the
total number of words and characters in the training data is low. In contrast, the
proportion of stop words, at about 29%, is quite high. Bearing all this in mind, the subsequent ML

9 https://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html, last access: 12.02.2019.

algorithms had only little valuable information to work with and still performed well
(>80%, but with high SD due to LOO). Further improvements could be made by ML parameter
optimizations or by increasing the data quality, e.g. by allowing medical students or experts
to ask questions and subsequently integrating their feedback for training purposes.
The results of Table 2 also show that there are only minor differences in terms of
classification performance between the Rasa NLU platform and the assembled scikit-
learn implementations. It would be interesting to investigate whether lemmatizing or
additional feature extractions, e.g. from grammatical structures, could lead to further
performance improvements for the scikit-learn prototypes.
The current version of the Slack UI could allow an unstructured anamnesis survey
between chatbot and user by replacing the intent with a VP answer. Thereby, it can be
used on the computer as well as within a mobile application. For future development, the
dialog system could be extended by storylines, e.g. by Rasa stories, making the overall
conversation more sophisticated. At this point, additional concepts like voice commands
[20] or a VR avatar [21] can also be considered. If there are no further improvements in
data quality, the interaction between user and chatbot might be facilitated by additional
UI elements as in Fadhil and Villafiorita [26]. This could help build a feedback
channel, e.g. displaying a small set of intents for user selection if the confidence of the
chatbot is not sufficiently high. As a result, the user could indicate a suitable
intent or deny the given suggestion completely. These statements could then be
forwarded to an author's Slack workspace for revision and re-added to the chatbot for
Reinforcement Learning.

References

[1] M. Riemer and M. Abendroth, Virtuelle Patienten: Wie werden sie aus Sicht von Medizinstudierenden am
besten eingesetzt?, Ger. Med. Sci. GMS E-J., (2013).
[2] M. R. Fischer et al., Virtuelle Patienten in der medizinischen Ausbildung: Vergleich verschiedener
Strategien zur curricularen Integration, Z. Für Evidenz Fortbild. Qual. Im Gesundheitswesen, 102(10),
(2008), 648–653.
[3] PRISMA, PRISMA Flow Diagram, http://prisma-statement.org/prismastatement/flowdiagram.aspx, last
access: 29.01.2019.
[4] TeXMed, TexMed – a BibTeX interface for PubMed, https://www.bioinformatics.org/texmed/, last access:
29.01.2019.
[5] F. Pedregosa et al., Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2011, 2825–2830.
[6] Rasa NLU, Rasa NLU: Language Understanding for chatbots and AI assistants, https://rasa.com/docs/nlu/,
last access: 30.01.2019.
[7] G. Rashed and R. Ahsan, Python in Computational Science: Applications and Possibilities, Int. J. Comput.
Appl., 46(20), 2012, 26-30.
[8] Scikit-learn, API Reference, https://scikit-learn.org/stable/modules/classes.html#module-
sklearn.model_selection, last access: 29.01.2019.
[9] Scikit-learn, Text feature extraction, https://scikit-learn.org/stable/modules/feature_extraction.html#text-
feature-extraction, last access: 22.01.2019.
[10] Scikit-learn, Model evaluation: quantifying the quality of predictions, https://scikit-
learn.org/stable/modules/model_evaluation.html, last access: 29.01.2019.
[11] IEEE Xplore, Schedule, 2018 Zooming Innovation in Consumer Technologies Conference (ZINC),
(2018).
[12] S. Garg et al., Clinical Integration of Digital Solutions in Health Care: An Overview of the Current
Landscape of Digital Technologies in Cancer Care, JCO Clin Cancer Inf., 2(2), 2018, 1-9.
[13] B. R. Ranoliya et al., Chatbot for university related FAQs, 2017 International Conference on Advances
in Computing, Communications and Informatics (ICACCI), 2017, 1525–1530.
[14] S. Mirri et al., User-driven and open innovation as app design tools for high school students, in 2018
IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications
(PIMRC), 2018, 6–10.
[15] C. Kuo and J. Z. Shyu, An Innovative Syndicate Medium Ecosystem, in 2018 IEEE International
Symposium on Innovation and Entrepreneurship (TEMS-ISIE), 2018, 1–5.
[16] S. Demetriadis et al., Conversational Agents as Group-Teacher Interaction Mediators in MOOCs, in
2018 Learning With MOOCS (LWMOOCS), 2018, 43–46.
[17] C. D. Kloos et al., Design of a Conversational Agent as an Educational Tool, in 2018 Learning With
MOOCS (LWMOOCS), 2018, 27–30.
[18] H. Hsu and N. Huang, Xiao-Shih: The Educational Intelligent Question Answering Bot on Chinese-Based
MOOCs, in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA),
2018, 1316–1321.
[19] A. Mitral et al., MOOC-O-Bot: Using Cognitive Technologies to Extend Knowledge Support in MOOCs,
in 2018 IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE),
2018, 69–76.
[20] A. Berns et al., Exploring the Potential of a 360° Video Application for Foreign Language Learning, in
Proceedings of the Sixth International Conference on Technological Ecosystems for Enhancing
Multiculturality, New York, 2018, 776–780.
[21] G. Tsaramirsis et al., Towards simulation of the classroom learning experience: Virtual reality approach,
in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom),
2016, 1343–1346.
[22] I. Stanica et al., VR Job Interview Simulator: Where Virtual Reality Meets Artificial Intelligence for
Education, in 2018 Zooming Innovation in Consumer Technologies Conference (ZINC), 2018, 9–12.
[23] C. Troussas et al., Integrating an Adjusted Conversational Agent into a Mobile-Assisted Language
Learning Application, in 2017 IEEE 29th International Conference on Tools with Artificial Intelligence
(ICTAI), 2017, 1153–1157.
[24] X. L. Pham et al., Chatbot As an Intelligent Personal Assistant for Mobile Language Learning, in
Proceedings of the 2018 2Nd International Conference on Education and E-Learning, New York, 2018,
16–21.
[25] J. Pereira, Leveraging Chatbots to Improve Self-guided Learning Through Conversational Quizzes, in
Proceedings of the Fourth International Conference on Technological Ecosystems for Enhancing
Multiculturality, New York, 2016, 911–918.
[26] A. Fadhil and A. Villafiorita, An Adaptive Learning with Gamification & Conversational UIs: The Rise
of CiboPoliBot, in Adjunct Publication of the 25th Conference on User Modeling, Adaptation and
Personalization, New York, 2017, 408–412.
[27] K. Katchapakirin and C. Anutariya, An Architectural Design of ScratchThAI: A Conversational Agent
for Computational Thinking Development Using Scratch, in Proceedings of the 10th International
Conference on Advances in Information Technology, New York, 2018, 7:1–7:7.
[28] G. M. Webber, Data Representation and Algorithms For Biomedical Informatics Applications, PhD
thesis, Harvard University, 2005.
[29] A. S. Lokman, J. M. Zain, F. S. Komputer and K. Perisian, Designing a Chatbot for diabetic patients,
International Conference on Software Engineering & Computer Systems, 2009.
[30] H. N. Io and C. B. Lee, Chatbots and Conversational agents: A bibliometric analysis, in IEEE
International Conference on Industrial Engineering and Engineering Management, 2017, 215-219.
[31] Jairvargas, Anamnese, https://www.slideshare.net/jairvargas/anamnese-44468748, last access:
29.01.2019.
[32] Alk-info.com, Alkoholtest mit 22 Fragen, Schnell-Test auf Alkoholgefährdung, https://www.alk-
info.com/tests/print/439-alkoholtest-mit-22-fragen-schnell-test-auf-%20alkoholgefaehrdung, last access:
21.03.2019.
[33] U. Latza et al., Erhebung, Quantifizierung und Analyse der Rauchexposition in epidemiologischen
Studien, Robert Koch Institut, 2005.
[34] Robert Koch Institut, Journal of Health Monitoring – Fragebogen zur Studie „Gesundheit in Deutschland
aktuell” (GEDA 2014/2015-EHIS),
https://www.rki.de/DE/Content/Gesundheitsmonitoring/Gesundheitsberichterstattung/GBEDownloadsJ
/Supplement/JoHM_2017_01_gesundheitliche_lage9.pdf?__blob=publicationFile, last access:
11.02.2019.
dHealth 2019 – From eHealth to dHealth 81
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-81

Evaluation of Deep Clustering for Diarization of Aphasic Speech
Daniel KLISCHIES a,1, Christian KOHLSCHEIN a, Cornelius J. WERNER b and Stephan M. JONAS c
a Institute of Information Management in Mechanical Engineering, RWTH Aachen University, Germany
b Department of Neurology, Section Interdisciplinary Geriatrics, University Hospital RWTH Aachen, Germany
c Department of Informatics, Technical University of Munich, Germany

Abstract. Speaker attribution and labeling of single-channel, multi-speaker audio
files is an area of active research, since the underlying problems have not yet been
solved satisfactorily. This especially holds true for non-standard voices and
speech, such as those of children and impaired speakers. Being able to perform
speaker labeling of pathological speech would potentially enable the development of
computer-assisted diagnosis and treatment systems and is thus a desirable research
goal. In this manuscript we investigate the applicability of embeddings of audio
signals, in the form of time- and frequency-band-based segments, into arbitrary
vector spaces for the diarization of pathological speech. We focus on modifying an
existing embedding estimator such that it can be used for diarization. This is
mainly done by clustering the time- and frequency-band-dependent vectors and
subsequently performing a majority vote over all frequency-dependent vectors of the
same time segment to assign a speaker label. The result is evaluated on recordings
of interviews between aphasia patients and language therapists. We demonstrate
general applicability, with error rates close to what has previously been achieved
in diarizing children's speech. Additionally, we propose to enhance the processing
pipeline with smoothing and a more sophisticated, energy-based voting scheme.

Keywords. diarization, expressive language disorders, machine learning, medical informatics

1. Introduction

Aphasia is a language disorder usually acquired through stroke or other causes of brain
damage. It is usually not related to motor or sensory impairments, but to a loss of the
brain's capability to formulate language. The gold standard for aphasia classification
and severity measurement in Germany is the Aachen aphasia test (AAT) [1]. It consists of
several procedures testing the patient's linguistic capabilities in scenarios like image
description, storytelling and spontaneous speech. Conducting and evaluating a complete
AAT takes up to eight hours of work by a professional speech and language therapist or
neurologist and requires that the patient is present at the clinic.

1 Corresponding Author: Daniel Klischies, Institute of Information Management in Mechanical
Engineering (IMA), RWTH Aachen University, Germany, E-Mail: [email protected].
82 D. Klischies et al. / Evaluation of Deep Clustering for Diarization of Aphasic Speech

Our long term goal is to develop a method to automatically estimate aphasia severity
and syndrome classification based on a preexisting recording of a patient interview. In
order to do this, we need to separate the therapist's speech segments from the patient's
speech segments. This process, also called diarization, has seen significant research
interest in the past. While originally conducted using clustering procedures based on
immediate features of the underlying audio signal, recent developments (such as [2, 3])
suggested that generating clustering features using neural networks yields better results.
The specific use case of aphasic speech also poses some additional challenges and
characteristics with regard to diarization. Since stuttering and extensive use of filler
words can be symptoms of aphasia, we require that these are included in the diarization
output. We also cannot base the diarization on semantic or syntactic linguistic
properties, since both capabilities might be severely reduced as an effect of aphasia.
Lastly, recordings of aphasia patients are rare and hard to obtain, because the
prevalence of aphasia within the population is relatively low and obtaining recordings
usually requires adhering to strict data protection rules. This results in significant
problems when training any machine learning based classifier, as training material is
scarce. Additionally, virtually all recordings of aphasia patients are single-source
recordings, eliminating the possibility of using multi-source diarization procedures.
This specifically holds true for a set of aphasic speech data we currently possess and
are planning to analyze, which was the original motivation to perform single-source
diarization.
In general, single source speaker diarization systems require a set of metrics that can
be used to locally cluster temporal segments of speech, such that each cluster represents
a segment of speech by the same speaker. This can be implemented either bottom up, by
first creating many small segments and subsequently merging those segments, or top
down by splitting segments as long as they are suspected to be comprised of more than
one speaker. One possibility to perform bottom up clustering is to implement the initial
splitting based on a sliding window over the audio signal and subsequently clustering
these segments. Such an approach has recently been investigated by Wang et al. [3].
Their diarization system uses a long short-term memory (LSTM) network, which is a
memory cell based recurrent neural network [4], to derive a vector space embedding of
the sliding window segments and clusters them using different procedures. The most
promising clustering procedures are k-means clustering and spectral clustering, with a
diarization error rate of roughly 12%, depending on the evaluation data set. They
compared their results to the diarization error rate of a similar system that uses Gaussian
mixture models (GMMs [5, pp. 40-42]) instead of LSTMs to derive the embeddings,
which performed worse by at least 8 percentage points.
A combination of both bottom-up and top-down clustering has been implemented by
Bozonnet et al. in [6]. Their integrated approach led to an improvement in diarization
error rate of 4 percentage points, although it generally performed worse than the system
developed by Wang et al., albeit on different evaluation data. This is most probably
because the latter uses LSTM-based embeddings, while the system by Bozonnet et al. uses
Gaussian mixture models.
In 2010, Meignier and Merlin published a paper describing the LIUM toolkit for
development of diarization systems [7]. Contrary to the other systems presented, this
system provides several building blocks for the implementation of diarization utilities,
based on agglomerative clustering. Due to its release date, this framework does not use
LSTMs but the older GMM approach.
Lastly, in 2016 Hershey et al. proposed a method to generate vector space
embeddings from speech data using LSTMs [2, 4]. These embeddings are generated
based on logarithmically scaled short-term Fourier transformations of segments
originating from the input speech audio file. Each of these segments is comprised of a
time and frequency interval of said input audio. The LSTM is subsequently used to
compute and optimize an affinity matrix, denoting which time frequency segments
belong to the same speaker. Subsequently, several clustering methods such as the
aforementioned spectral clustering and k-means clustering are used to determine which
segments belong to the same speaker. Since the clustering is not only time but also
frequency-band agnostic, this allows separating overlapping speech, which is an
inherently more complex task.
Here, we investigate how to adapt the method proposed by Hershey et al. to
speaker diarization, specifically for diarizing aphasic speech.

2. Methods

Applying deep clustering as introduced by Hershey et al. in [2] in practice raises
several details that might influence the results dramatically. These details revolve
around how much training data is required to train an estimator such that the embeddings
are sufficiently discriminative to fulfill the aforementioned criteria for successful
clusterings. Additionally, we are not interested in speech separation but in
diarization, requiring a slight alteration of Hershey's proposed algorithm. In order to
collapse the time-frequency bins into time bins, we employ a majority voting scheme: for
each set of time-frequency bins representing the same time slot t, we count how many
frequency bins have been assigned to which speaker. In the next step we assign the time
slot t of the input signal to the speaker to whom the most time-frequency bins were
assigned. Under the hypothesis that the acoustically dominant speaker of a time slot
also dominates most of the time-frequency bins of that time slot, this majority voting
allows us to diarize an input file such that the whole spectrum is always assigned to a
single speaker. This saves us
from having to deal with a signal reconstruction problem that the original separation
procedure suffers from: If one does assign time-frequency bins of a signal to different
speakers, all those frequency bands that have not been assigned to a speaker would be
missing from the output signal. Isik et al. proposed some reconstruction methods for
these parts of the signal [8], but since we are dealing with pathological speech any
reconstruction methods based on assumptions of non-pathological speech could
introduce incorrect additions and reduce the quality of a diagnosis based on the
reconstructed signal.
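A minimal sketch of this collapse from time-frequency bins to per-time-slot speaker labels; the `majority_vote` helper, array shapes and labels are illustrative, not taken from our implementation.

```python
# Collapse (time, frequency) speaker assignments into one label per time
# slot by majority vote, as described above. Shapes and labels illustrative.
import numpy as np

def majority_vote(tf_labels):
    """tf_labels: (time, freq) integer array of per-bin speaker labels.
    Returns a (time,) array with one speaker label per time slot."""
    n_speakers = int(tf_labels.max()) + 1
    votes = np.stack([(tf_labels == s).sum(axis=1) for s in range(n_speakers)], axis=1)
    return votes.argmax(axis=1)

tf = np.array([[0, 0, 1, 0],   # speaker 0 holds most frequency bins
               [1, 1, 0, 1]])  # speaker 1 holds most frequency bins
print(majority_vote(tf))  # -> [0 1]
```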
Our implementation of the deep clustering algorithm itself is based on an
implementation by Haroran Zhou using TensorFlow
(https://github.com/zhr1201/deep-clustering). We modified his work such that it
supports Python 3, resolved some minor bugs and added the frequency-band majority
voting strategy described above.
We preprocess the data by down-sampling the signal to 8 kHz and computing a short-term
Fourier transform with a window size of 256 samples, yielding 129 Fourier coefficients
per frame and Hanning windows of length 256/8000 Hz = 0.032 s. The network itself
consists of four bidirectional LSTM layers with 300 memory units per layer, followed by
a layer with a hyperbolic tangent activation function to estimate the embedding.
Finally, the embedding is normalized based on its L2 norm. We use a dropout of 50% for
the forward propagation and 20% for the recurrent propagation of errors; the estimated
embedding space has 40 dimensions.
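The described preprocessing can be sketched with SciPy (an illustrative equivalent only; our actual pipeline is TensorFlow-based): a 256-sample Hanning window at 8 kHz yields 129 one-sided frequency bins and 0.032 s frames.

```python
# Illustrative preprocessing sketch: 8 kHz signal, 256-sample Hanning
# windows -> 129 one-sided frequency bins, log-scaled magnitudes.
import numpy as np
from scipy.signal import stft

fs = 8000                        # sampling rate after down-sampling
x = np.random.randn(fs * 2)      # two seconds of dummy signal
f, t, Z = stft(x, fs=fs, window="hann", nperseg=256)
print(Z.shape[0])                # 129 frequency bins (256 // 2 + 1)
log_mag = np.log(np.abs(Z) + 1e-8)  # logarithmically scaled STFT input
```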
The classifier was trained for a week on an Nvidia Titan X (Pascal architecture), which
was sufficient for 352,000 training steps. For the training corpus, we mixed
(non-aphasic) speech files from the 360-hour LibriSpeech audio book corpus [9], such
that each training file contains two simultaneously active speakers. We did this by
combining 20 files per speaker with randomly chosen files containing another speaker.
The training thus follows the original usage of the classifier as proposed by Hershey
et al., and the resulting classifier could also be used for speech separation. Our
majority voting scheme is only applied after training is completed and the classifier
is being evaluated.

2.1. Diarization error rate

In order to evaluate the results, we use a slightly modified version of the diarization
error rate (DER), which was originally proposed by the National Institute of Standards
and Technology (NIST) (cf. [10]). Given a prediction and a ground truth set of speaker
labels, the DER quantifies the correctness of the prediction; the ultimate goal is a
diarization procedure that yields predictions with a DER of 0. While the NIST definition
measures how much of the overall recording time was incorrectly attributed, we want to
measure how many of the potential speaker labels are incorrect. This penalizes
classifiers that do not detect overlapping speech properly more strongly than the NIST
definition does: in the NIST definition, a segment that actually contains two speakers
but was classified as silence increases the DER just as much as a segment that contains
two speakers but was classified as containing one speaker. In our definition,
classifying such a segment as silence is twice as bad as classifying it as containing a
single speaker.
For an audio recording of length $T_\Sigma$ with a frame rate $B$, for which we know
that it contains $N$ speakers, we define the maximum number of possibly incorrectly
assigned labels as $E_{max} = T_\Sigma \cdot B \cdot N$. Furthermore, we define
$L \in \{0,1\}^{(T_\Sigma \cdot B) \times 2}$ to be our ground truth speaker label
matrix, where $L_{i,j} = 1$ iff the $j$th speaker is active in the $i$th frame, and
$P \in \{0,1\}^{(T_\Sigma \cdot B) \times 2}$ to be the estimated speaker label matrix.
Then we can decompose the DER of $P$ given $L$ into the following components:

$$E_{fa} = \sum_{\substack{0 \le i < T_\Sigma \cdot B \\ L_{i,\cdot} = 0}} \frac{\sum_{j=0}^{N} P_{i,j}}{E_{max}} \qquad (1)$$

$$E_{miss} = \sum_{\substack{0 \le i < T_\Sigma \cdot B \\ P_{i,\cdot} = 0}} \frac{\sum_{j=0}^{N} L_{i,j}}{E_{max}} \qquad (2)$$

$$E_{error} = \sum_{\substack{0 \le i < T_\Sigma \cdot B \\ L_{i,\cdot} \ne 0 \wedge P_{i,\cdot} \ne 0}} \frac{\sum_{j=0}^{N} |(L_{i,\cdot} - P_{i,\cdot})_j|}{E_{max}} \qquad (3)$$

$E_{fa}$ is the false alarm rate, which we define as the percentage of possible speech
labels that were marked as non-silence but were actually silence. A high false alarm
rate indicates an issue with the voice activity detector (VAD) of the diarization
procedure. Analogously, $E_{miss}$ is the percentage of speech labels that were
classified as silence but were actually speech. If this value is high, the VAD
algorithm is too restrictive, as it misses some speech. Lastly, $E_{error}$ is the
percentage of incorrectly assigned speech labels in regions where there was no silence
according to the ground truth and the diarization algorithm's VAD.
Given these components, the DER can be calculated as described in equation 4:

$$E_{DER} = E_{fa} + E_{miss} + E_{error} \qquad (4)$$
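The decomposition in equations (1)-(4) can be computed directly on binary label matrices; the `der` helper and toy matrices below are an illustrative sketch, with frames on the rows and speakers on the columns.

```python
# Hedged sketch of the modified DER (equations 1-4). L and P are binary
# (frames x speakers) matrices; E_max = frames * N.
import numpy as np

def der(L, P):
    frames, n_speakers = L.shape
    e_max = frames * n_speakers
    silent_truth = L.sum(axis=1) == 0   # ground truth: silence
    silent_pred = P.sum(axis=1) == 0    # prediction: silence
    e_fa = P[silent_truth].sum() / e_max               # eq. (1)
    e_miss = L[silent_pred].sum() / e_max              # eq. (2)
    both = ~silent_truth & ~silent_pred
    e_error = np.abs(L[both] - P[both]).sum() / e_max  # eq. (3)
    return e_fa + e_miss + e_error                     # eq. (4)

L = np.array([[1, 0], [0, 1], [0, 0], [1, 1]])  # ground truth labels
P = np.array([[1, 0], [1, 0], [1, 0], [0, 0]])  # predicted labels
print(der(L, P))  # -> 0.625
```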

2.2. Benchmarking

We developed an automatic benchmarking procedure along with a set of Python scripts
performing the benchmarks. These scripts take a list of input folders containing audio
files, where each folder represents a speaker. Each file in each folder is then mixed
alternatingly with every file of the other speakers. Information about which part
belonged to which speaker is stored in order to later compare this ground truth to the
results of the diarization. We therefore evaluate the quality of deep clustering on a
computer-generated test set. This allows us to generate many different test audio files
with different lengths of individual utterances and of the whole file, allowing more
flexible test scenarios than using the original recordings. Additionally, this allows
us to evaluate the performance of diarization algorithms on arbitrary speaker pairs,
not just those present in the test data set.
Currently our benchmarking script supports the following options:

1. Arbitrary number of evaluation speakers, but a fixed number of two speakers
per generated output file
2. Minimum/maximum length of each utterance in the output file
3. Length of a crossfaded overlap between utterances
4. Length of silence between utterances
5. Maximum number of output files to generate
6. Minimum length of output file (if two input files do not result in sufficient
length of the output file, the algorithm will append additional files from the
same speakers)
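The core of this generation step can be sketched as follows; `mix_alternating` and the dummy utterances are an illustrative simplification of the actual scripts (crossfades, silence insertion and the remaining options are omitted).

```python
# Illustrative sketch: alternate utterances of two speakers until a minimum
# output length is reached, recording frame-level ground truth labels.
import numpy as np

def mix_alternating(utts_a, utts_b, min_len):
    signal, labels = [], []
    i = 0
    while sum(len(u) for u in signal) < min_len:
        src, spk = (utts_a, 0) if i % 2 == 0 else (utts_b, 1)
        u = src[(i // 2) % len(src)]       # cycle through a speaker's utterances
        signal.append(u)
        labels.extend([spk] * len(u))      # ground truth for later DER scoring
        i += 1
    return np.concatenate(signal), np.array(labels)

a = [np.zeros(3), np.zeros(4)]  # dummy utterances of speaker A
b = [np.ones(2)]                # dummy utterance of speaker B
sig, lab = mix_alternating(a, b, min_len=8)
print(len(sig), lab.tolist())
```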

The scripts are capable of benchmarking arbitrary diarization algorithms and
implementations. Each implementation has to provide a Python 3 class with a method that
takes a file path to a sample audio file to diarize and, optionally, an integer
specifying into how many prediction frames this file should be split. Speaker vectors
generated by this method must always be two-dimensional, and the overall return value
of the method must follow the same semantics that we defined for the prediction matrix
P in our definition of the DER (see section 2.1).
After the diarization procedure has returned its predicted diarization P, the
benchmarking script compares it to the ground truth labels L, which are based on its
knowledge of the original combination of different speakers, and quantifies the result
using the DER metric presented in subsection 2.1. Lastly, the results are summarized,
and mean, minimum, maximum, standard deviation and variance are calculated over the set
of diarization error rates.

3. Results

Our evaluation data set is based on the AphasiaBank dataset [11]. AphasiaBank is a data
set composed of transcribed video recordings of semi standardized interview scenarios
between aphasia patients and therapists. We downloaded and automatically split all those
recordings into utterances labeled with information whether the therapist or the patient
is speaking in that particular utterance, based on the timestamps and speaker labels of
the transcripts. Since the therapist usually does not change between different
recordings of the same data set, we store all therapist utterances of the same
institution as if they came from one recording, while patient utterances are separated
such that each recording leads to a separate set of patient utterances.
We recombined a subset of the AphasiaBank speaker files, such that we get audio files
with a minimum length of 5 seconds and at most 3 seconds per utterance. The latter
value roughly matches average speaker durations in common evaluation data sets for
diarization of healthy speech [12]. For each speaker of the subset, we randomly chose
at least 5 utterances and combined each of them with an utterance from another,
randomly chosen speaker. If that combination was not at least 5 seconds long, we
appended additional utterances from the same speakers until 5 seconds of file length
were reached. This length requirement ensures that we get a balanced set of evaluation
data.
This is particularly relevant because, depending on the aphasia syndrome, patients tend
to speak significantly longer or shorter than the therapist. Additionally, we did not only
mix speech of patients with speech of therapists, but also with speech of other patients.
This allows us to judge the quality of the classifier for diarization of aphasic speech
in general, and not only in scenarios where exactly one speaker suffers from aphasia.
This would not be possible if we had not recombined the files, as we do not possess
recordings containing multiple patients.
The result of this process was 125 separate audio files. Diarizing them with deep
clustering led to a mean DER of 27.94% (minimum 13.9%, maximum 39.19%, standard
deviation 0.0439). Since the way we compose the input files does not allow for overlap
(no crossfade) or gaps, the "false alarm" and "miss" error rates do not play a role in
this evaluation, and we rely only on the "error" part of the diarization metric.

4. Discussion and Future Work

We conducted an evaluation using specifically remixed audio segments of the
AphasiaBank data set, ensuring a well-balanced amount of speaker duration in all
evaluation files.
Comparing the achieved result to other diarization systems is hardly possible, since
most speaker diarization systems are evaluated on professionally recorded, healthy
adult speech. An evaluation conducted by Anguera et al. in 2012 showed diarization
error rates between 7% and 49% for the RT09 data set, which consists of healthy speech
[12]. However, large parts of this error are caused by overlapping speech, a scenario
we have not evaluated yet. Incorrectly attributed speaker information accounts for 5 to
10 percentage points of the diarization error rate in this scenario. While this is
considerably better than our implementation of deep clustering, it was achieved on
healthy speech, which is probably an easier task, and on recordings made with
professional equipment.
In 2018, Cristia et al. attempted to diarize conversational speech of children [13].
Since children's speech occupies a much smaller and therefore less discriminative
frequency band than adult speech, they argue that this is a much harder problem than
diarization of adult speech. Furthermore, since children are still learning to
comprehend sentences and are generally more affected by language disorders [14],
diarization of such speech might be more comparable to aphasic speech. Their system led
to a DER of 20% to 40% for two speakers.
The diarization result turned out to be mediocre compared to diarization systems for
non-aphasic speech. This could be related to an insufficient amount of training data,
an improvable network architecture or an unsuitable voting scheme. Compared to common
diarization systems for healthy speech, our approach is worse by 10 to 15 percentage
points, but it performs roughly on the same level as diarization of children's speech,
which is prone to some of the same issues that impair diarization of aphasic speech.
The performance of the deep clustering algorithm leaves a lot of room for improvement.
Some of these improvements are general and beneficial for both pathological and healthy
speech diarization tasks, including improvements to the voice activity detector, for
example by adding a dynamic threshold to determine whether the signal energy is
sufficiently high to be speech. More advanced techniques like spectral analysis, as
proposed by Ma et al. in [15], could improve this even further and might even be able
to cope with significant background noise.
The majority voting scheme that we used to adapt the algorithm to diarization instead
of separation might also be improvable. In its current implementation, a speaker who is
assigned many low-energy frequency bands of a time slot wins over a speaker to whom
few, but high-energy, bands were assigned. Taking the energy of the frequency bands
into account would lead to a weighted voting scheme that might work better than our
naive implementation.
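Such a weighted vote could be sketched as a small change to the majority scheme; `weighted_vote` and the toy values below are illustrative, not an implemented component.

```python
# Hedged sketch of an energy-weighted voting scheme: each frequency bin
# votes with its signal energy, so few loud bins can outweigh many quiet
# ones. Values are illustrative.
import numpy as np

def weighted_vote(tf_labels, tf_energy):
    """tf_labels, tf_energy: (time, freq) speaker labels and bin energies."""
    n_speakers = int(tf_labels.max()) + 1
    scores = np.stack(
        [np.where(tf_labels == s, tf_energy, 0.0).sum(axis=1)
         for s in range(n_speakers)], axis=1)
    return scores.argmax(axis=1)

labels = np.array([[0, 0, 0, 1]])           # speaker 0 holds three bins...
energy = np.array([[0.1, 0.1, 0.1, 5.0]])   # ...but speaker 1's bin is loud
print(weighted_vote(labels, energy))  # speaker 1 wins despite fewer bins
```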
The exact parameterization of the neural network, along with the training data used, is
currently mostly arbitrary. Investigating how different parameters affect the
diarization error rate of the final classifier could enable fine-tuning the system to
pathological and especially aphasic speech. Apart from hyperparameter tuning, which
could be performed using grid search, it would be interesting to optimize the training
data set. Due to the lack of sufficient data, it is highly unlikely that it will become
possible to train the classifier exclusively on aphasic speech. It is therefore
desirable to determine a way to apply transfer learning to the classifier, such that it
is trained on healthy speech and subsequently adapted to pathological speech in a way
that does not require huge amounts of pathological speech. Unfortunately, any change to
the neural network's layout requires re-training the classifier, which takes a
significant amount of computing time.

References

[1] Walter Huber et al., Aachener Aphasie Test (AAT): Handanweisung, Verlag für Psychologie, Hogrefe,
1983.
[2] John R. Hershey et al., Deep clustering: Discriminative embeddings for segmentation and separation,
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2016), 31-35.
[3] Quan Wang et al., Speaker Diarization with LSTM, IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP) (2018), 5239-5243.
[4] Sepp Hochreiter and Jürgen Schmidhuber, Long Short-Term Memory, Neural Computation 9.8 (1997),
1735-1780.
[5] Geoffrey McLachlan and David Peel, Finite Mixture Models, John Wiley & Sons, Hoboken, 2000.
[6] Simon Bozonnet et al., An integrated top-down/bottom-up approach to speaker diarization, Eleventh
Annual Conference of the International Speech Communication Association (INTERSPEECH) (2010),
2646-2649.
[7] Sylvain Meignier and Teva Merlin, LIUM SpkDiarization: an open source toolkit for diarization, CMU
SPUD Workshop, 2010.
[8] Yusuf Isik et al., Single-channel multi-speaker separation using deep clustering, 17th Annual Conference
of the International Speech Communication Association (INTERSPEECH) (2016), 545-549.
[9] Vassil Panayotov et al., Librispeech: an ASR corpus based on public domain audio books, IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2015), 5206-5210.
[10] Jonathan G. Fiscus et al., The rich transcription 2005 spring meeting recognition evaluation, in:
International Workshop on Machine Learning for Multimodal Interaction, Springer, Heidelberg, 2005,
369-389.
[11] Brian MacWhinney et al., AphasiaBank: Methods for studying discourse, Aphasiology 25.11 (2011),
1286-1307.
[12] Xavier Anguera et al., Speaker diarization: A review of recent research, IEEE Transactions on Audio,
Speech and Language Processing 20.2 (2012), 356-370.
[13] Alejandrina Cristia et al., Talker diarization in the wild: The case of child-centered daylong audio-
recordings, 20th Annual Conference of the International Speech Communication Association
(INTERSPEECH) (2018), 2583-2587.
[14] Victoria M. Garlock et al., Age-of-acquisition, word frequency, and neighborhood density effects on
spoken word recognition by children and adults, Journal of Memory and Language 45.3 (2001), 468-492.
[15] Yanna Ma and Akinori Nishihara, Efficient voice activity detection algorithm using long-term spectral
flatness measure, EURASIP Journal on Audio, Speech, and Music Processing 2013.1 (2013), 87.
dHealth 2019 – From eHealth to dHealth 89
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-89

Ensemble Based Approach for Time Series


Classification in Metabolomics
Michael NETZERa,1, Friedrich HANSERa, Marc BREITa, Klaus M. WEINBERGERa,c,
Christian BAUMGARTNERb and Daniel BAUMGARTENa
a Institute of Electrical and Biomedical Engineering, UMIT, Austria
b Institute of Health Care Engineering, Graz University of Technology, Austria
c sAnalytiCo Ltd., Belfast, United Kingdom

Abstract. Background: Machine learning is an important application in the area of
health informatics; however, classification methods for longitudinal data are still rare.
Objectives: The aim of this work is to analyze and classify differences in metabolite
time series data between groups of individuals regarding their athletic activity.
Methods: We propose a new ensemble-based 2-tier approach to classify metabolite
time series data. The first tier uses polynomial fitting to generate a class prediction
for each metabolite. An induced classifier (k-nearest-neighbor or naïve Bayes)
combines the results to produce a final prediction. Metabolite levels of 47
individuals undergoing a cycle ergometry test were measured using mass
spectrometry. Results: In accordance with our previous work the statistical results
indicate strong changes over time. We found only small but systematic differences
between the groups. However, our proposed stacking approach obtained a mean
accuracy of 78% using 10-fold cross-validation. Conclusion: Our proposed
classification approach achieves considerable classification performance for time
series data with small differences between the groups.

Keywords. biomarkers, time series, classification, kinetics

1. Introduction

Machine learning (ML) is amongst the greatest application challenges of Health
Informatics (HI), resulting in improved medical diagnoses, disease analyses, and
pharmaceutical development [1]. Recently, Ravì et al. [2] reviewed the research
employing deep learning in health informatics including medical informatics, public
health, sensing, bioinformatics and imaging. In the area of biomedical time series
analysis, McCoy et al. [3] recently introduced a machine learning method for forecasting
hospital discharge volume using a Bayesian forecaster. In general, machine learning
methods can be categorized into unsupervised and supervised methods. Unsupervised
methods do not require labeled input and search for patterns. A popular unsupervised
method is clustering, where groups of instances with similar properties are identified. In
contrast, supervised methods require a learning input (outcome variable). Depending on
the type of the outcome variable we distinguish between regression (numeric outcome)
and classification problems (categoric outcome).

1 Corresponding Author: Michael Netzer, UMIT Hall, Eduard-Wallnöfer-Zentrum 1, 6060 Hall in Tirol,
Austria, E-Mail: [email protected].
90 M. Netzer et al. / Ensemble Based Approach for Time Series Classification in Metabolomics

Feature selection is an important step to reduce the number of variables to a smaller
set with higher discriminatory ability. The advantages include faster, more cost-effective
and better interpretable learning models [4]. In our previous work [5] we introduced a
feature selection approach to identify clusters of metabolic biomarker candidates that
considerably change over time during physical activity. In particular, biomarker
candidates were chosen by using maximum fold changes (MFCs) of metabolite levels
and P-values resulting from statistical hypothesis testing. We identified characteristic
kinetic patterns using a mathematical modeling approach. Examples for such patterns
include early, late, or other forms of kinetic response patterns. The method utilized
polynomial fitting, and clusters of metabolites with similar kinetics were identified
via cluster analysis. Finally, kinetic shape templates were identified that represent
different kinetic response patterns (e.g., sustained, early, late response). The aim of this
work is to analyze and classify differences in metabolite time series data between groups
of individuals regarding their athletic activity. In particular, we developed a new
ensemble-based approach based on the combination of class predictions using
polynomial fitting. Our main contribution is the introduction of a new classification
method for metabolic time series data. We here exemplarily distinguish athletes from
non-athletes; however, this approach can be also applied to other binary classification
problems in this context.

2. Material and Methods

2.1. Dataset

A total of 47 individuals underwent a standardized cycle ergometry experiment (starting
workload of 50 W). The study population can be categorized into two classes:
i) average physical activity (n = 24) and ii) competitive athletes (n = 23). Starting
the exercise with a workload of 50 W, the workload was increased by 25 W every 3
minutes up to the individual’s maximum physical load [5]. Due to different individual
maximum workloads, the time series data were normalized and linearly interpolated.
Based on dried blood spots of samples taken from the earlobe, levels of 110 metabolites
were measured using triple quadrupole tandem mass spectrometry (MS/MS) applying
stable isotope dilution for metabolite quantitation. Groups of quantified metabolites
include acylcarnitines, amino acids and sugars. Missing values were imputed using a k-
nearest-neighbor approach [6]. See [5] for a detailed description of data acquisition and
preparation.
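The per-subject normalization and linear interpolation described above can be sketched with numpy. This is a minimal illustration under assumptions: the workload/metabolite values, the 10-point grid, and the function name are ours, not from the paper.

```python
import numpy as np

def normalize_series(t, y, n_points=10):
    """Map each subject's time axis to [0, 1] and resample on a common grid."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    t_rel = (t - t[0]) / (t[-1] - t[0])      # relative time, 0 .. 1
    grid = np.linspace(0.0, 1.0, n_points)   # same grid for every subject
    return grid, np.interp(grid, t_rel, y)

# Two subjects reaching different maximum workloads (samples every 3 minutes)
grid, y1 = normalize_series([0, 3, 6, 9],     [1.0, 1.4, 2.0, 3.1])
_,    y2 = normalize_series([0, 3, 6, 9, 12], [1.1, 1.3, 1.8, 2.5, 3.4])
print(len(y1), len(y2))  # -> 10 10, now comparable point by point
```

After this step every series has the same length regardless of the individual's maximum workload, which is what makes pointwise group comparison possible.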

2.2. Statistical Analysis

Statistical analysis was performed using non-parametric tests for repeated measures data
[7]. The p-value was obtained using an ANOVA-type statistic. In contrast to parametric
approaches, rank-based methods allow analyzing categorical or heavily skewed data in a
systematic way [8].

2.3. Machine Learning Approach

We use a k-nearest neighbor (kNN) and a naive Bayes (NB) classifier. kNN is a popular
non-parametric method that determines the class of a new instance based on the majority
class of its k nearest neighbors. The NB model assumes independence between variables.
This assumption does not hold in general and may also be violated in our dataset; however,
NB is a popular classifier that performs well in many classification tasks [9]. The
performance of models is estimated using 10-fold cross validation summarized by micro-
average. In particular, the dataset is divided into ten partitions using nine parts for
training and the remaining subset for testing. This procedure is repeated ten times. For
every iteration the accuracy is calculated and finally summarized by calculating the mean.
However, in particular for imbalanced datasets the accuracy is inappropriate.
Consequently, we also calculate the parameter κ = (O − E) / (1 − E), where O is the
observed and E the expected accuracy, to overcome this problem.
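The chance-corrected statistic can be computed directly from the formula above; a small sketch (the numbers are illustrative, not the study's results):

```python
def kappa(observed, expected):
    """Cohen's kappa from observed accuracy O and chance-expected accuracy E."""
    return (observed - expected) / (1.0 - expected)

# For a roughly balanced two-class problem, chance accuracy E is about 0.5,
# so an observed accuracy of 0.78 yields kappa = (0.78 - 0.5) / 0.5 = 0.56.
print(round(kappa(0.78, 0.5), 2))  # -> 0.56
```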

2.4. Ensemble Learning Approach

In this work we introduce an ensemble learning approach consisting of the following
steps:
1. Training step
(a) Calculate a representative time course for each metabolite of each group. In
particular, we calculate a median curve for each group. The value for each
time point is consequently the median value of the group.
(b) Fit a polynomial (degree of 9, as used in [5]) to each class using the median
value of each time point; the fitted value at time point t is denoted ŷt.
2. Classification step of an unlabeled sample s
(a) Calculate the residual sum of squares (RSS) for each class c ∈ C of each
metabolite m ∈ M over all time points t ∈ T:

RSS(c) = Σt (ŷt − yt)²   (1)

where ŷt is the fitted value for time point t from step 1 and yt is the
actual value.
(b) Determine for each metabolite m ∈ M the class c ∈ C by selecting the class
with the smallest RSS:

cm = argminc RSS(c)   (2)

(c) Select the final class using class predictions of the previous steps (class
selection step).
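Steps 1 and 2 can be sketched for a single metabolite as follows. This is a minimal illustration under assumptions: function names and toy curves are ours, and degree 3 is used here because the toy series is short, whereas the paper fits degree 9.

```python
import numpy as np

def train(curves_by_class, degree=9):
    """Step 1: fit one polynomial per class to that class's median time course."""
    model = {}
    for c, curves in curves_by_class.items():
        median = np.median(np.asarray(curves), axis=0)  # step 1(a): median curve
        t = np.linspace(0.0, 1.0, len(median))
        model[c] = np.polyfit(t, median, degree)        # step 1(b): polynomial fit
    return model

def predict_metabolite(model, sample):
    """Steps 2(a)-(b): assign the class with the smallest RSS, Eqs. (1)-(2)."""
    t = np.linspace(0.0, 1.0, len(sample))
    rss = {c: float(np.sum((np.polyval(p, t) - np.asarray(sample)) ** 2))
           for c, p in model.items()}
    return min(rss, key=rss.get)

# Toy data for one metabolite: class 1 responds early, class 2 responds late.
t = np.linspace(0.0, 1.0, 12)
model = train({1: [t ** 0.5, t ** 0.5 + 0.01],
               2: [t ** 2,   t ** 2 - 0.01]}, degree=3)
print(predict_metabolite(model, t ** 0.5 + 0.02))  # -> 1 (early-response shape)
```

An unlabeled sample is thus compared against each class's fitted template curve, and the per-metabolite label is simply the better-fitting template.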
For the class selection step, we consider three methods:
• Majority voting: Select the majority class based on the class predictions for all
metabolites. For instance, with 5 metabolites where four metabolites select
class 1 and one metabolite selects class 2, we use class 1.
• Majority voting and feature selection: Use only a subset of ntop ranked features
for the voting step. The ranking is calculated using the area between the time
curves (ABC). The idea is that discriminatory metabolites are represented by
higher ABC values. Additionally, we weight the ABC scores by the adjusted R²
(i.e., s = ABC × R²adjusted).
• Stacking approach: We propose a 2-tier approach to predict the classes based on
the time series data. An induced classifier uses the class predictions for each
metabolite to produce a final prediction (see also Figure 1). In our experiments
we use kNN and NB as classification methods.

Figure 1. Stacking approach using the class prediction of each metabolite. The color represents the class
prediction considering each metabolite (green = class 1, red = class 2).
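The class selection step can be illustrated in a few lines. The majority-voting variant below is a sketch (function name is ours); the stacking variant replaces the vote with a classifier trained on the vectors of per-metabolite labels.

```python
from collections import Counter

def majority_vote(per_metabolite_classes):
    """Majority voting: the final class is the most frequent per-metabolite label."""
    return Counter(per_metabolite_classes).most_common(1)[0][0]

# The example from the text: four metabolites vote for class 1, one for class 2.
print(majority_vote([1, 1, 1, 1, 2]))  # -> 1

# Stacking (tier 2) would instead treat [1, 1, 1, 1, 2] as a feature vector
# and feed such vectors, with known labels, to an induced classifier (kNN or NB).
```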

3. Results

3.1. Statistical Evaluation and Clustering

Figure 2 visualizes metabolites and corresponding FDR adjusted p-values for comparing
groups (i.e., average vs. competitive athletic activity) and time (i.e., varying workload).
Table 1 depicts the p-values for change of time, group differences and interaction of both
of these variables.

Figure 2. Scaled p-values comparing groups and time (workload). The y-axis is plotted in log scale. Features
above the horizontal blue line significantly change over time (p < 0.05).
Table 1. FDR adjusted P-values for change of time, group differences (average vs. competitive athletic) and
interaction of both variables (group:time) calculated using non-parametric tests for repeated measures data [7].
The majority of metabolites significantly change over time (i.e., P-value for time < 0.05).

time group group:time

Lactate < 0.01 0.65 0.88
C2 < 0.01 0.66 0.59
Alanine < 0.01 0.53 0.87
C3 < 0.01 0.65 0.59
Arginine < 0.01 0.82 0.88
Glycine < 0.01 0.80 0.88
C4 < 0.01 0.80 0.78
Tyrosine < 0.01 0.65 0.87
Glutamic Acid < 0.01 0.80 0.88
Phenylalanine < 0.01 0.62 0.88
C3 DC M < 0.01 0.82 0.78
C5 OH < 0.01 0.78 0.78
C5 < 0.01 0.82 0.88
Glucose < 0.01 0.53 0.87
Methionine < 0.01 0.62 0.87
C18 2 < 0.01 0.85 0.78
Ornithine < 0.01 0.65 0.78
Serine < 0.01 0.31 0.88
Histidine < 0.01 0.82 0.88
C18 < 0.01 0.85 0.78
C16 < 0.01 0.78 0.64
xLeucine < 0.01 0.62 0.88
Tryptophan < 0.01 0.31 0.88
Lysine 0.02 0.59 0.86
Valine 0.04 0.59 0.88
Proline 0.05 0.62 0.88
C18 1 0.05 0.96 0.86
Threonine 0.14 0.31 0.78
C0 0.17 0.86 0.88
Aspartic Acid 0.33 0.85 0.87
Citrulline 0.41 0.88 0.88
3.2. Classification Performances

Figure 3 shows the classification performances comparing the ensemble-based methods
using 10-fold cross validation. The number of top ranked features (ntop) for the feature
selection approach (Mj. + FS) was set to a fixed fraction of the total number of
metabolites. The black lines indicate the median accuracy of each method.

Figure 3. Boxplots of accuracy (left) and kappa (right) values for predicting average vs. competitive athletic
activity using the kNN (first row) and NB (second row) classifier.

4. Discussion and Conclusion

In this work, we analyzed metabolic changes by considering change over time (i.e.,
varying Watt levels) and group differences. In summary, a total of 25 metabolites
changed significantly over time (p < 0.05). Similar to our previous work [5], the smallest
p-values were observed for lactate, alanine, acetylcarnitine (C2) and related short-chain
acylcarnitines (C3, C5). Interestingly, no significant changes were observed comparing
average vs. competitive athletic groups. The smallest p-values were observed for serine,
tryptophan, and threonine. However, considering time charts, we identified a clear trend
by observing systematically higher metabolite levels for all time points for these
metabolites. The missing significance levels may be a result of the relatively high
standard deviation due to the small sample size and heterogeneities within the groups
(e.g., different individual maximum Watt levels and individual anaerobic thresholds).
Considering these three metabolites biochemically, the carbon skeletons of serine,
threonine and tryptophan are used to form pyruvate, which is used as fuel in the
mitochondria via conversion to acetyl-CoA (TCA cycle), converted to lactate, or utilized
to produce glucose in the liver [10].
Even though the statistical approach revealed no significant metabolites when
comparing the classes, we obtain accuracy values of 75%. The highest mean accuracy of
76.83% was obtained with our stacking approach using NB as the classifier. The standard
deviations of the resulting performance values were also comparably low. Interestingly,
the proposed feature selection step did not improve the performance. Our assumption is
that the proposed feature ranking method is very prone to noise.
The degree of 9 used for polynomial fitting was based on our previous work,
however this value can be further optimized to increase accuracy values.
In summary, we introduced a new ensemble-based classification method for time
series metabolite data. For each metabolite, a class prediction is produced using
polynomial fitting. The predictions are summarized by using an induced classifier to
obtain a final classification of a new unlabeled sample. Note that this approach can be
also applied to proteomic or genomic datasets.

5. Acknowledgements

Michael Netzer was supported by the Tiroler Wissenschaftsfond. The authors thank
Prof. Dr. Elske Ammenwerth for her comments improving the paper.

References

[1] A. Holzinger, Interactive machine learning for health informatics: when do we need the human-in-the
loop?, Brain Informatics 3(2) (2016), 119–131.
[2] D. Ravì, C. Wong, F. Deligianni, M. Berthelot, J. Andreu-Perez, B. Lo and G.-Z. Yang, Deep learning
for health informatics, IEEE Journal of Biomedical and Health Informatics 21(1) (2017), 4–21.
[3] T.H. McCoy, A.M. Pellegrini and R.H. Perlis, Assessment of Time-Series Machine Learning Methods
for Forecasting Hospital Discharge Volume, JAMA Network Open 1(7) (2018), e184087.
[4] Y. Saeys, I. Inza and P. Larrañaga, A review of feature selection techniques in bioinformatics,
Bioinformatics 23(19) (2007), 2507–2517.
[5] M. Breit, M. Netzer, K.M. Weinberger and C. Baumgartner, Modeling and Classification of Kinetic
Patterns of Dynamic Metabolic Biomarkers in Physical Activity, PLoS Comput Biol 11(8) (2015),
e1004454. doi:10.1371/journal.pcbi.1004454.
[6] T. Hastie, R. Tibshirani, G. Sherlock, M. Eisen, P. Brown and D. Botstein, Imputing missing data for
gene expression arrays, Stanford University Statistics Department Technical report, 1999.
[7] K. Noguchi, Y.R. Gel, E. Brunner and F. Konietschke, nparLD: An R Software Package for the
Nonparametric Analysis of Longitudinal Data in Factorial Experiments, Journal of Statistical Software
50(12) (2012), 1–23. http://www.jstatsoft.org/v50/i12/.
[8] F. Konietschke, A.C. Bathke, L.A. Hothorn and E. Brunner, Testing and estimation of purely
nonparametric effects in repeated measures designs, Computational Statistics & Data Analysis 54(8)
(2010), 1895–1905.
[9] J. Wolfson, S. Bandyopadhyay, M. Elidrisi, G. Vazquez-Benitez, D.M. Vock, D. Musgrove, G.
Adomavicius, P.E. Johnson and P.J. O’Connor, A Naive Bayes machine learning approach to risk
prediction using censored, time-to-event data, Statistics in medicine 34(21) (2015), 2941–2957.
[10] D.M. Medeiros, R.E. Wildman et al., Advanced human nutrition, Jones & Bartlett Publishers, 2013.
dHealth 2019 – From eHealth to dHealth 97
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-97

Achieving an Interoperable Data Format


for Neurophysiology with DICOM
Waveforms
Silvia WINKLERa,1, Martin HUBERa and Tilmann KLUGEb
a Sigma Software Solutions OG, Vienna, Austria
b Austrian Institute of Technology, Vienna, Austria

Abstract. Modern healthcare faces multiple challenges: diagnosis and treatment
happen in a multidisciplinary and distributed manner. The key principle to accomplish
this is interoperability. Some disciplines like radiology are well experienced in
interoperable workflows and cross-institution data exchange; other disciplines are
only now realizing its growing importance. In this paper we analyze the situation in neurology
and give an overview of attempts made in the past to establish an interchangeable,
interoperable data format for biomedical signal data, which would be suitable for
neurology, too. Focusing on EEG data we will discuss how DICOM Waveforms
could be used to cover many of the requirements. As a result, necessary adaptations
and remaining issues are identified. With DICOM Waveforms, a specification is
available that covers most of the interoperability requirements. With only minor
adjustments, DICOM Waveforms could establish data interoperability in neurology.

Keywords. health information interoperability, electroencephalography

1. Introduction

Modern healthcare requires the exchange of data. Today, medicine acts in an
interdisciplinary and integrated way. Data have to be exchanged across departments
and even across hospitals or bigger organizations like hospital associations or
insurances. In the past, great efforts have been made to develop unified medical IT
platforms for data exchange and storage. Medical devices are increasingly connected
to these platforms, and medical data is exchanged in a secure and standardized way,
making it available to departments other than the one that recorded it. The key to
cross-department and, moreover, cross-enterprise data exchange is standardized data
formats and communication protocols. IHE is a worldwide initiative
by healthcare professionals and industry to improve the interoperability of healthcare
systems and to provide integration profiles for a broad range of clinical workflows.
Based on medical standards like HL7 and DICOM a working interoperability between
healthcare applications can be reached. Today clinical systems for administration or
workflow (e.g. HIS, RIS or LIS), report repositories and imaging archives work
together using these standards. A substantial example for the benefits of standardized
data is the Austrian public health record ELGA, which was designed according to IHE

1 Corresponding Author: Silvia Winkler, Sigma Software Solutions OG, Markhofgasse 1-9/3/338, 1030
Vienna, Austria, E-Mail: [email protected]
98 S. Winkler et al. / Achieving an Interoperable Data Format for Neurophysiology

specifications: HL7 CDA is the standardized document format for medical reports in
ELGA and DICOM is the required data format and communication protocol for
medical images and image related data like signal data or evidence documents. The
decision of the European Commission 2015/1302 of 28 July 2015 [1], which declares
27 IHE profiles as ‘eligible in public procurement’, emphasizes the legitimacy of
requiring interoperable data formats in future public tenders.
A precondition for interoperable systems is a standardized data format. In the past
different organizations provided normative standards or noncommittal format
specifications for neurophysiological time signals like EEG; a comprehensive
comparison of many of them was done by A. Schloegl [2]. Until the recent past none of
the formats found a broader acceptance by manufacturers of EEG devices. We still see
the situation, that manufacturers use proprietary formats – in most of the cases with
restricted access to the specifications. Stead et al. [3] recently discussed the urgent need
to establish a common format and proposed the use of the MEF3 format.
This paper shows that the existing, well-established DICOM standard is able to
achieve interoperability for neurophysiology data like EEG and gives an overview of
the necessary extensions to the existing DICOM Waveforms specification.

2. Methods

Starting with research on existing standards and noncommittal format specifications
developed by industry or research institutions that are suitable for encoding and
exchanging EEG signal data, we took a close look at their capabilities, their
metadata, and the supported data encodings. We summarized their advantages and
disadvantages and, finally, evaluated them in terms of being integrated into today's
healthcare ecosystems.
In comparison to these we analyzed the capabilities of the already existing DICOM
waveform specification, which is available for some different sorts of biomedical
signal data but still lacks a specification for neurophysiological recordings. This is
remarkable in view of the fact that many of the historical standards were taken into
consideration when the DICOM waveform specification was developed in 2001;
particular mention should be made of:
• ASTM E31.16 - E1467
• HL7 V2.3
• CEN TC251 PT5-007 SCP-ECG
• CEN TC251 PT5-021 VITAL
• IEEE P1073 – Medical Information Bus (MIB)

Both ASTM E1467 and its successor ACNS TS1 [1], as well as HL7 V2.x [6], are
evaluated later in this document. The two CEN standards mentioned in the DICOM
waveform specification existed only as drafts released from the CEN Technical
Committee 251 (WG 5 - Medical Device Communication in Integrated Healthcare).
SCP-ECG reached broader usage especially in cardiology use cases, today it is part of
the CEN/ISO/IEEE 11073-9x specifications. CEN VITAL and IEEE MIB became the
core standards in CEN/ISO/IEEE 11073 [7], [8].
In order to achieve interoperability for neurophysiology signal data, the evaluation
of the different standards included not only technical parameters but also focused on
their ability to be integrated into clinical workflows and existing healthcare systems.

3. Results

3.1. ACNS TS1 (ASTM E1467)

The American Clinical Neurophysiology Society released the first version of this
standard (ASTM E1467) in 1992 as a result of a joint undertaking of clinical, academic,
and vendor interests. As its successor ACNS TS1 [1] was released in 2008. Both
versions are based on NCCLS LIS 5-A [5], which was one of the fundamentals of HL7
v2.x, too.
ACNS TS1 claims support not only for EEG data but for a broad range of digital
electro-physiologic waveform data in clinical and research environment like
electromyogram (EMG), polysomnography (PSG) and evoked potentials (EP) as well.
The standard is strictly ASCII based; in the earlier version even the signal data had
to be stored as 7-bit ASCII values. The current version still supports ASCII encoded
sample data and still recommends it for short recordings. In addition, sample data
can be provided in numeric form as well, but this requires additional data files.
Annotations are supported and, like the waveform data itself, stored in result
segments. For annotations the segment category ANA is used.
Besides the waveform data and their acquisition parameters the specification
contains in-depth structured data segments, well defined data types, comprehensive
lists of allowed values and codes, and guidelines for message exchange. The message
format is similar to HL7 V2.x messages.
ACNS TS1 is a comprehensive standard for medical waveform data, which
includes administrative and acquisition context. It is notable for its broad nomenclature,
which contains defined terms for almost every single parameter.
The standard defines different levels of implementation (Level I – Waveforms
only; Level II- Waveform or Procedure Annotations or Both; Level III – Coded
Information) and suggests a “Description of Implementation”, which a system should
provide in order to declare details of its compliance.
Although the message format itself is very close to HL7 V2.x, which is used
worldwide, there is no known implementation of this standard.

3.2. HL7 V2.6

Health Level Seven (HL7) is an international organization which provides standards for
healthcare interoperability.
HL7 V2 [6] is a message based standard with focus on administrative tasks like
patient and order management, and communication of results like laboratory measures.
It is broadly used all over the world and facilitates communication between hospital
information systems and departments like radiology or laboratory. It is used for
pharmacy tasks as well as for billing and – last but not least – in electronic health
records. It is one of the base standards used in IHE integration profiles.
The standard is message-oriented: triggered by events, messages are exchanged,
usually in a request-reply manner. Unrequested sending is supported, too. Waveform
support was introduced in V2.3 (1997), for this research we looked at the specification
of V2.6 (2008).
HL7 V2 messages use 7-bit ASCII encoding as default. An alternative character
set can be defined in the message header, even multi-byte character sets or Unicode are
allowed. The message itself consists of segments built up in a defined order of required
and optional fields separated by a fixed character used as separator.
Waveform data are supported as observation content (OBX) within observation
result (ORU) messages, which have a defined structure depending on the use case. The
ORU message contains segments with information about the clinical context – i.e. the
patient, the order or the result. Signal data are included in observation segments
(OBX) with category result type WAV. Only ASCII encoding is allowed, with a defined
delimiter after each sample value. Waveform annotations are transmitted in
OBX result segments with category ANO. Annotations are coded data associated with
a given point in time during the waveform recording. Relationship to the channels is
provided via ID.
HL7 is well-known and broadly used in the healthcare domain. It is used for
messaging tasks and, as such, plays an important role in transmitting clinical
observations as well. It is focused on the message and not on the persisted object – this
could be the reason why it is not used to store medical signal recordings except within
databases. There is no defined file format and no possibility to store the data on
portable media. None of the known vendors uses this format to store EEG recordings.
There may be some implementations that send relevant signal parts enclosed in the
documentation to a hospital information system.
Another reason why there is no broad use of HL7 waveforms might be the limitation
to ASCII data: converting large sets of binary data to an ASCII representation brings
a large overhead.

3.3. EDF and EDF+

The European Data Format EDF [9] and its successor EDF+ [10] are open, non-
proprietary format specifications for signal data with ASCII-based metadata. A large
set of freely available tools exists to handle this format.
EDF+ is widely compatible with EDF. With the more recent version,
discontinuous recordings and annotations are supported. EDF and EDF+ are frequently
used for sleep data (PSG), EEG, ECG, and EMG.
A standard EDF file consists of a header record with metadata for the recording
followed by data records. The filename extension has to be .edf or .EDF. A data file
contains recordings which were acquired with the same technique and with common
amplifier settings. Different settings result in different files.
For metadata provided in the file header only a limited character set (ASCII 32-
126) is allowed. A space separated CHAR-array contains the patient data. A space
separated ASCII-array contains information for the recording. Per-channel properties
define the physical conditions of the recording. There is no information about the
recording geometry, i.e. the electrode positions.
EDF and EDF+ only support 16-bit sample values in strict chronological order.
Values have to be stored as two's complement, in little-endian byte order (low byte first).
Recorded data are split into chunks with a maximum size of 61440 bytes.
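The fixed header layout and the sample encoding described above can be sketched in a few lines of Python. The field widths follow the EDF specification; the field names, function names, and the example bytes are ours.

```python
import struct

# Fixed 256-byte EDF header: space-padded ASCII fields with these widths
# (widths per the EDF specification; field names are our own labels).
EDF_FIELDS = [
    ("version", 8), ("patient_id", 80), ("recording_id", 80),
    ("startdate", 8), ("starttime", 8), ("header_bytes", 8),
    ("reserved", 44), ("n_data_records", 8), ("record_duration_s", 8),
    ("n_signals", 4),
]

def parse_edf_header(buf):
    """Split the fixed EDF header into its ASCII fields and strip the padding."""
    fields, pos = {}, 0
    for name, width in EDF_FIELDS:
        fields[name] = buf[pos:pos + width].decode("ascii").strip()
        pos += width
    return fields

def decode_samples(raw):
    """EDF sample data: 16-bit two's-complement integers, little-endian."""
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))

print(decode_samples(b"\x01\x00\xff\xff"))  # -> [1, -1]
```

The strict 16-bit encoding keeps readers this simple, at the cost of the bit-depth limitations discussed next.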
The necessity to store sample values as 16-bit values leads to a size overhead if the
acquisition only produces 8-bit values. On the other hand, sample values with more than
16 bits are not supported and can only be stored after reducing the bit depth.
EDF does not support annotations. EDF+ supports text annotations, events and
stimuli. They are stored together with the signal data as an additional channel with a
defined channel label. The sample data for the channel then contains characters instead
of 2 byte integers.
EDF and EDF+ are without doubt the most popular formats for biomedical signal
data. They are supported by a wide range of EEG manufacturers and software vendors.
In many cases interfaces for importing or exporting EDF are available.

3.4. GDF and OENORM K 2204

The General Data Format GDF [11] was defined with the aim to overcome some of the
limitations of EDF+, for example the missing support for more than 16-bit sampling
width or the missing electrode positions. For this purpose, elements of other standards
were incorporated into the specification. An open source implementation (C++ and
MATLAB), including tools for conversion and a library for reading and writing GDF
2.x, is available at [12]. In 2015, GDF became an Austrian Standard [13].
GDF provides a comprehensive set of metadata: a fixed file header containing
patient and recording related information, and a variable header reserved for channel
specific parameters.
Signal data is stored in the data section of the .gdf file. Data is organized in records,
each containing a defined number of samples: first block for the first channel, second
block for the second channel, and so on. The sampling rate, number of samples and data
type may differ for each channel. The signal values are stored as numeric data. The data
type for each channel is defined in the variable header; the recommended data type is
32-bit integer, but 13 different data types are defined, ranging from 8-bit integer
up to 128-bit float.
Annotations are stored in a table of events after the data section. Its start address
within the .gdf file is calculable. Events have to follow a well-defined structure
containing type, position in samples, channel, duration, and time stamps. For different
types of events a code table is provided within the specification.
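Because each channel may have its own sampling rate and data type, the channel blocks inside a record can be located from the variable header alone. The sketch below (the type subset and helper names are our own, not part of the GDF specification) computes each channel's byte offset within one record and the total record size:

```python
# Byte sizes of a few GDF-style numeric data types (illustrative subset;
# the specification defines 13 types from 8-bit integer up to 128-bit float).
TYPE_SIZES = {"int8": 1, "int16": 2, "int32": 4, "float32": 4, "float64": 8}

def record_layout(channels):
    """Compute (offset, length) in bytes of each channel block in one record.

    `channels` is a list of (samples_per_record, dtype) tuples. As in GDF,
    the blocks are laid out one channel after another: first block for the
    first channel, second block for the second channel, and so on.
    """
    layout, offset = [], 0
    for samples, dtype in channels:
        length = samples * TYPE_SIZES[dtype]
        layout.append((offset, length))
        offset += length
    return layout, offset  # final offset equals the total record size

# Example: a 256-samples-per-record EEG channel stored as the recommended
# 32-bit integer, plus a 1-sample-per-record auxiliary channel as float64.
layout, record_size = record_layout([(256, "int32"), (1, "float64")])
print(layout, record_size)  # → [(0, 1024), (1024, 8)] 1032
```

The start address of the event table then follows directly from the header size plus the number of records times this record size, which is why it is calculable.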
The General Data Format GDF thus overcomes limitations of EDF+ such as the
missing electrode localization, which is realized via XYZ coordinates with the origin
in the center of the head. GDF supports any kind of signal data and is not restricted to
EEG. The software is used in some research projects (listed at [12]); there is no known
commercial use.

3.5. DICOM Waveforms

DICOM [14] is a standard for medical imaging. It is based on a well-defined
information model and contains normative specifications for both the communication
protocol and the data objects. Clinical workflow, image transmission, printing,
viewing, etc. are covered in this comprehensive standard.
102 S. Winkler et al. / Achieving an Interoperable Data Format for Neurophysiology

Figure 1. DICOM Waveform Information Model (from [14] PS3.17)

Support for different character sets is provided; the character set used is declared in
the DICOM message itself. Unicode and multi-byte character sets are supported as
well. Usage of a defined nomenclature is mandatory for many of the properties of the
acquisition system; the use of different code systems is supported.
DICOM contains Waveform objects since 2001. Object definitions exist for audio
data, different types of ECG data, hemodynamic and respiratory signal data as well.
Waveform acquisition can happen in the context of an image acquisition or
independently. DICOM handles both situations: waveforms can be stored together with
an imaging context or on their own, as separate information objects.
DICOM Waveform objects are structured like the well-known DICOM image
objects following a well-defined information model – see Figure 1. The metadata
provide the full clinical context ranging from patient data to data acquisition parameters.
The signal data itself can be stored in different formats, depending on the signal
type and on the physical parameters of data acquisition (i.e. the bit-depth of the AD-
converter). Defined terms for waveform sample types are listed in Table 1.
DICOM waveforms also support annotations. They are stored together with the
waveform in one information object. The annotation information can be free text or a
coded item. In case of a coded item this can contain a numeric measurement or a coded
concept.
Furthermore the well-known and broadly supported DICOM Structured Report
objects could be used to store evaluation results and neurology reports.

Table 1. DICOM Waveform Sample Formats (according to [14])

Bits Allocated   Waveform Sample Interpretation   Meaning
8                SB                               signed 8-bit linear
8                UB                               unsigned 8-bit linear
8                MB                               8-bit μ-law
8                AB                               8-bit A-law
16               SS                               signed 16-bit linear
16               US                               unsigned 16-bit linear
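The MB interpretation stores 8-bit μ-law companded samples. As an illustration of what such a sample format implies for a reader, here is a sketch of the standard ITU-T G.711 μ-law expansion back to linear values; the helper name is ours, and this code is not taken from the DICOM text.

```python
def mulaw_decode(code: int) -> int:
    """Expand one 8-bit mu-law code (ITU-T G.711) to a linear sample."""
    code = ~code & 0xFF            # mu-law bytes are stored bit-inverted
    sign = code & 0x80             # top bit carries the sign
    exponent = (code >> 4) & 0x07  # 3-bit segment number
    mantissa = code & 0x0F         # 4-bit step within the segment
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample

print(mulaw_decode(0xFF))  # → 0 (smallest magnitude)
print(mulaw_decode(0x80))  # → 32124 (largest positive magnitude)
```

The logarithmic segments give 8-bit codes roughly the dynamic range of 14-bit linear samples, which is why companded formats are attractive for audio channels.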

4. Discussion

In spite of various attempts to find a common data format, neurophysiology still lacks
interoperability. Although EDF is supported by many device manufacturers, it is not
supported by healthcare platforms. Extending DICOM Waveforms to neurophysiology
data would accomplish this and would bring additional advantages.
DICOM's main objective is interoperability. The standard is (besides HL7) one
of the major components of the IHE integration framework in the radiology and also in
the cardiology domain.
To add support for a General EEG Waveform Storage SOP Class, analogously to
the already existing General ECG Waveform Storage SOP Class, only some domain-
specific adaptations would be necessary. Most of them are easy to achieve, such as
deviating value ranges for recording properties like sampling frequency or scaling
factors. The main effort lies in identifying adequate nomenclatures for electrode
positions, coded acquisition context information and coded annotations.
For EEG recordings the anatomical position of the electrodes is important
information. Clinical routine EEG uses electrode positions on the surface of the skull
according to the International 10-20 or 10-10 system [16]. ISO/IEEE 11073-10101 [15]
provides standardized terms for these locations.
To achieve interoperability, DICOM supports annotations with coded content.
For this purpose, ISO/IEEE 11073-10101 [15] provides a nomenclature and codes for
neurology comprising measurement-, device-, and patient-related events.
Moreover DICOM provides mechanisms to store and to preserve the spatial or
temporal relationship of DICOM instances. Synchronization of DICOM objects is
possible even if acquired on different devices. For clinical use this could be of interest
especially for EEG synchronized to video, fMRI, PET or SPECT.
Especially video recordings are important in neurology, for example in epilepsy
monitoring or sleep studies. As DICOM supports different video formats like MPEG2,
H.264 or H.265, the videos can be stored as (separate) DICOM files. These DICOM
instances can be synchronized to the waveform object (i.e. the EEG) using the
mechanisms described above.
Compression remains an open issue because the DICOM standard currently does
not include any time-series-specific compression algorithms. Integration of
compression algorithms is possible in principle and can be done if the DICOM
committee accepts their usage. Algorithms working for different types of signal data
(ECG, EEG, MEG, pressure waveforms, etc.) would be preferred; patent protection
could be an obstacle. As long as there are no waveform compression algorithms
104 S. Winkler et al. / Achieving an Interoperable Data Format for Neurophysiology

supported by the DICOM standard, waveforms can make use of the deflated transfer
syntax (RFC 1951; i.e. the zip algorithm) which is applied to the data set as a whole.
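A raw RFC 1951 stream of this kind can be produced with zlib by requesting a negative window size, which suppresses the zlib wrapper. The sketch below (helper names and payload are our own) round-trips a repetitive waveform-like buffer:

```python
import zlib

def deflate(data: bytes) -> bytes:
    """Compress to a raw RFC 1951 deflate stream (no zlib header/trailer)."""
    c = zlib.compressobj(level=9, wbits=-15)  # negative wbits → raw deflate
    return c.compress(data) + c.flush()

def inflate(data: bytes) -> bytes:
    """Decompress a raw RFC 1951 deflate stream."""
    return zlib.decompress(data, wbits=-15)

# Repetitive, waveform-like payload compresses well
payload = bytes(range(256)) * 64
packed = deflate(payload)
assert inflate(packed) == payload
print(len(payload), "->", len(packed))
```

Because the deflated transfer syntax compresses the data set as a whole, it works without any waveform-specific knowledge, at the cost of ignoring the signal structure that a dedicated time-series codec could exploit.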
Another open issue is the missing support for (almost) real-time online transmission
of waveforms. DICOM waveform objects are designed and intended to be persisted;
the structure of the data does not permit continuous writing. Even though the DICOM
standard defines a transfer syntax to stream image data, based on the JPEG 2000
image format, no such mechanism is defined for waveform objects. The need for such
a communication mechanism is well known; the standardization bodies will have to
define streaming protocols for DICOM waveforms in future revisions of the standard.

References

[1] Commission Decision (EU) 2015/1302 of 28 July 2015 on the identification of ‘Integrating the
Healthcare Enterprise’ profiles for referencing in public procurement (Text with EEA relevance),
https://eur-lex.europa.eu/eli/dec/2015/1302/oj, last access: 21.3.2019
[2] A. Schloegl, An overview on data formats for biomedical signals. Image Processing, Biosignal
Processing, Modelling and Simulation, Biomechanics. Munich, Germany: World Congress on Medical
Physics and Biomedical Engineering, 2009; pp 1557-1560
[3] M. Stead, J.J. Halford, Proposal for a Standard Format for Neurophysiology Data Recording and
Exchange. Clin. Neurophysiol. 33(5) (2016), 403-413
[4] ASTM E1467: American Clinical Neurophysiology Society Technical Standard 1 (ACNS 1) Standard
for Transferring Digital Neurophysiological Data Between Independent Computer Systems, 2008
[5] NCCLS. Standard Specification for Transferring Clinical Observations Between Independent Computer
Systems. NCCLS document LIS5-A [ISBN 1-56238-493-7]. NCCLS, 940 West Valley Road, Suite
1400, Wayne, Pennsylvania 19087-1898 USA, 2003
[6] HL7 Messaging Standard Version 2.6, 2007
[7] ISO/IEEE 11073 – 10201 Health informatics Point-of-care medical device communication Part
10201:2004 Domain information model
[8] ISO/IEEE 11073 – 30200 Health informatics – Point-of-care medical device communication Part
30200:2004 Transport profile – Cable connected
[9] B. Kemp, A. Värri, A.C. Rosa, K.D. Nielsen and J. Gade, A simple format for exchange of digitized
polygraphic recordings. Electroencephalogr. Clin. Neurophysiol. 82(5) (1992), 391-393.
http://www.edfplus.info/specs/edf.html, last access: 8.2.2019
[10] B. Kemp, J. Olivan, European data format 'plus' (EDF+), an EDF alike standard format for the
exchange of physiological data. Clin. Neurophysiol. 114(9) (2003), 1755-61.
http://www.edfplus.info/specs/edfplus.html, last access: 8.2.2019
[11] A. Schloegl, GDF - A General Data Format for Biosignals, https://arxiv.org/abs/cs/0608052, last
access: 8.2.2019
[12] A. Schloegl, The BioSig Project, http://biosig.sourceforge.net/index.html, last access: 8.2.2019
[13] Austrian Standards Institute ÖNORM K 2204 General data format for biomedical signals, 2015
[14] NEMA PS3 / ISO 12052, Digital Imaging and Communications in Medicine (DICOM®) Standard,
National Electrical Manufacturers Association, Rosslyn, VA, USA
[15] ISO/IEEE 11073 – 10101 Health informatics – Point-of-care medical device communication Part
10101:2004 Nomenclature
[16] H.H. Jasper, The ten–twenty electrode system of the International Federation. Electroencephalogr.
Clin. Neurophysiol. 10 (1958), 371–375
[17] G.H. Klem, H.O. Luders, H.H. Jasper, C. Elger, The ten–twenty electrode system of the International
Federation. The International Federation of Clinical Neurophysiology. Electroencephalogr. Clin.
Neurophysiol. Suppl. 52 (1999), 3–6
dHealth 2019 – From eHealth to dHealth 105
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-105

A Comprehensive FXR Signaling Atlas
Derived from Pooled ChIP-seq Data

Emilian JUNGWIRTH a,b,d,e,1, Katrin PANZITT b, Hanns-Ulrich MARSCHALL c,
Martin WAGNER b,d,e and Gerhard G. THALLINGER a,d,e

a Institute of Computational Biotechnology, Graz University of Technology, Austria
b Research Unit for Translational Nuclear Receptor Research, Division of
Gastroenterology and Hepatology, Medical University Graz, Graz, Austria
c Department of Molecular and Clinical Medicine, University of Gothenburg and
Sahlgrenska University Hospital, Gothenburg, Sweden
d OMICS Center Graz, Graz, Austria
e BioTechMed-Graz, Graz, Austria

Abstract. Background: ChIP-seq is a method to identify genome-wide
transcription factor (TF) binding sites. The TF FXR is a nuclear receptor that
controls gene regulation of different metabolic pathways in the liver. Objectives:
To re-analyze, standardize and combine all publicly available FXR ChIP-seq data
sets to create a global FXR signaling atlas. Methods: All data sets were
(re-)analyzed in a standardized manner and compared on every relevant level from
raw reads to affected functional pathways. Results: Public FXR data sets were
available for mouse, rat and primary human hepatocytes in different treatment
conditions. Standardized re-analysis shows that the data sets are surprisingly
heterogeneous concerning baseline quality criteria. Combining different data sets
increased the depth of analysis and allowed the recovery of more peaks and functional
pathways. Conclusion: Published single FXR ChIP-seq data sets do not cover the
full spectrum of FXR signaling. Combining different data sets and creating a
“FXR super-signaling atlas” enhances understanding of FXR signaling capacities.
Keywords. ChIP-seq, FXR, ENCODE

1. Introduction

Transcription factors (TF) bind to distinct recognition sites on the DNA and thereby
regulate gene transcription. Chromatin immunoprecipitation sequencing (ChIP-seq) is a
method to identify genome-wide binding sites of a specific TF and to gain information
about transcriptional regulation, affected genes and pathways. Nuclear receptors (NRs)
are a class of TFs, which are directly activated/inactivated by agonistic/antagonistic
ligands. The NR farnesoid X receptor (FXR) is activated by bile acids, thereby
controlling gene regulation of different metabolic pathways mainly in the liver (e.g.
bile acid-, lipid- and glucose metabolism). FXR recently attracted attention as a novel
drug target for various metabolic liver diseases. Therefore, understanding precise
genomic FXR binding and transactivation of genes is important to fully reconstruct
FXR signaling, particularly when used as therapeutic drug.

1 Corresponding Author: Emilian Jungwirth, Medical University of Graz, A-8010 Graz,
Stiftingtalstrasse 24, E-Mail: [email protected]
106 E. Jungwirth et al. / A Comprehensive FXR Signaling Atlas Derived from Pooled ChIP-seq Data

Several FXR ChIP-seq data sets for different species, conditions and cell lines
have been reported, none so far for human liver tissue. Our aim was to re-analyze these
publicly available data sets with a standardized method and combine these data sets for
further extended downstream analysis of FXR signaling properties. In addition, we
compared the available public data sets to our own human biopsy material.

2. Methods

We searched public sources for available FXR ChIP-seq data sets to determine a
common set of generally applicable quality criteria based on the ones proposed in the
ENCODE and other authoritative ChIP-seq guidelines [1, 2]. Furthermore, we
investigated different parameter settings and control sample variants. A combined
mouse FXR ChIP-seq data set was generated by pooling mapped reads of the available
four standardized mouse data sets to gain a higher sequencing depth. Enriched regions
a.k.a. peaks represent putative FXR binding sites. A de novo motif analysis and motif
scan was performed on all called peaks. The potentially regulated genes were
determined using proximity to peaks. Those genes were used to identify enriched
pathways.

2.1. Data sets

In public repositories, FXR-ChIP-seq data sets were available for mouse, rat and a cell
line of primary human hepatocytes. We also had access to our own FXR-ChIP-seq data
set from human liver tissue (Table 1). Raw reads were available for all data sets except
“Mouse-Guo” and “Mouse-Osborne”. For the “Mouse-Osborne” data set only mapped
read tracks were available. In case of the "Mouse-Guo" data set, only the called peak
tracks were available.
Table 1. Available data sets for this study. Naming of the data sets is based on the species and the last author
of the paper where the data was first published.

Paper                                                            Samples  Name            Ref
Genome-wide tissue-specific farnesoid X receptor binding in      2        Mouse-Guo       [3]
mouse liver and intestine.
Genome-wide interrogation of hepatic FXR reveals an              1        Mouse-Osborne   [4]
asymmetric IR-1 motif and synergy with LRH-1.
Metformin interferes with bile acid homeostasis through          4        Mouse-Lefebvre  [5]
AMPK-FXR crosstalk.
Gene expression profiling in human precision cut liver slices    4        Mouse-Kersten   [6]
in response to the FXR agonist obeticholic acid.
Genomic analysis of hepatic farnesoid X receptor binding sites   4        Mouse-Kemper    [7]
reveals altered binding in obesity and direct gene repression
by farnesoid X receptor in mice.
Toxicogenomic module associations with pathogenesis: a           6        Rat-Stevens     [8]
network-based approach to understanding drug toxicity.
Genome-wide binding and transcriptome analysis of human          2        PHH-Guo         [9]
farnesoid X receptor in primary human hepatocytes.
Unpublished observation: FXR ChIP-seq in normal vs               2        Human-Wagner    -
cholestatic patients

2.2. ChIP-seq analysis

We created our own ChIP-seq analysis pipeline (Fig 1). The quality of the data samples
is assessed at relevant steps of the analysis. Most of the data processing was performed
using a locally available Galaxy [10] instance. The analysis comprises three major
steps:
Raw read handling: Most of the data sets were single-end (SE) Illumina reads.
Trimmomatic (version 0.36.5) [11] was used to trim and filter overrepresented
sequences such as Illumina adapters. In addition to the ILLUMINACLIP step, a
SLIDINGWINDOW of 4 bases with an average quality of 28 and a minimum length
of 80% of the raw read length were applied to ensure a high read quality. FastQC [12]
was used to confirm the quality.
Mapping and peaks calling: Filtered reads were mapped to the human genome
version hg19, mouse genome version mm10 and rat genome version rn6 using Bowtie
2 (version 2.3.4.2) [13, 14] with default parameters.
To determine putative FXR binding sites, model-based analysis of ChIP-seq
version 2 (MACS2, version 2.1.1) [15, 16] was used. Various parameter combinations
were evaluated for their effect on the outcome in order to determine the most reliable
combination. The parameters were: a q-value of 0.01 or 0.05; an input, IgG or no
control sample; a fixed or estimated fragment length; and the two different standard
effective genome sizes for human (2.45 and 2.7 Gbp).
Downstream analyses: For the top 500 scoring peaks a de novo motif analysis
was performed using Multiple EM for Motif Elicitation (MEME SUITE version
4.12.0.0) [17]. The sequences flanking the peak summit by 100 bp on either side were
examined. Apart from the number of motifs, which was set to 10, the default
parameters were used. Additionally, a motif scan for the canonical IR-1 FXR motif
(AGGTCAxTGACCT) [18] was performed using the tool FIMO from the MEME
SUITE. The scan was performed for the HOMER FXR motif across the narrow peaks
and wider peak regions. The wider peak region was defined as 1000 bp up- and
downstream from the peak summit.
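FIMO scores positions against a position weight matrix. As a much simpler illustration of the target, the consensus IR-1 element with its single degenerate position x can also be located with a plain regular expression; this forward-strand, exact-consensus scan is our own simplification, far cruder than the actual motif scan.

```python
import re

# Canonical FXR IR-1 consensus AGGTCAxTGACCT, with x any nucleotide
IR1 = re.compile("AGGTCA[ACGT]TGACCT")

def scan_ir1(sequence: str):
    """Return 0-based start positions of IR-1 consensus matches
    on the forward strand of `sequence`."""
    return [m.start() for m in IR1.finditer(sequence.upper())]

# Toy sequence: one lowercase match at position 4, one at position 22
seq = "ttgcAGGTCAaTGACCTggcat" + "AGGTCAGTGACCT"
print(scan_ir1(seq))  # → [4, 22]
```

A real scan would additionally search the reverse complement and tolerate mismatches via the PWM score threshold.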
Peaks were annotated to UCSC knownGenes using the R package ChIPseeker
[19]. A gene was considered potentially regulated by FXR if a peak summit is
located in the promoter (defined as +/-1 kbp around the TSS), an intron or an exon of
that gene. Genes were subjected to a REACTOME [20] pathway analysis; a q-value of
less than 0.05 was considered statistically significant.
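The peak-to-gene assignment rule can be sketched as follows. This is a simplification of the per-feature annotation done in the paper (the whole gene body stands in for the intron/exon features), and the gene names and coordinates are made up for illustration.

```python
def genes_for_summits(summits, genes, promoter=1000):
    """Assign peak summits to potentially regulated genes (simplified).

    `genes` maps gene name -> (tss, start, end) on one chromosome. A gene
    counts as a hit when a summit lies within the +/-1 kb promoter window
    around the TSS or inside the gene body.
    """
    hits = set()
    for s in summits:
        for name, (tss, start, end) in genes.items():
            if abs(s - tss) <= promoter or start <= s <= end:
                hits.add(name)
    return hits

# Hypothetical FXR target genes with made-up coordinates
genes = {"Nr0b2": (5000, 5000, 7000), "Cyp7a1": (20000, 18000, 20000)}
print(sorted(genes_for_summits([4200, 12000, 19500], genes)))
# → ['Cyp7a1', 'Nr0b2']
```

Here the summit at 4200 falls in the Nr0b2 promoter window, 19500 falls inside the Cyp7a1 gene body, and 12000 hits nothing.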

Figure 1. ChIP-seq analysis pipeline: The three major steps of a ChIP-seq analysis are (i) Read quality
control (QC), (ii) Mapping and peak calling, and (iii) Downstream-analyses such as a motif- and a pathway-
analysis.

2.3. Pooling the single data sets

A combined mouse data set “Mouse-pooled” was generated by pooling the filtered and
mapped reads of 13 individual mouse samples from 4 different mouse data sets to gain
higher sequencing depth. By pooling the samples on the read level, a summation of the
individual FXR signals is achieved. This summation allows the detection of weaker
FXR binding sites which could not be detected in the single data sets. Because the data
sets come from different laboratories, only limited summation of noise is expected to
occur. This analytic procedure, combined with the strict filtering of the raw reads, is
expected to lead to a high-quality, virtually deep-sequenced FXR ChIP-seq data set.
Subsamples were created to further investigate the saturation of FXR-related
peaks/genes. The subsamples were created by randomly selecting reads from the entire
combined data set. The subsample sizes ranged from 1/20 to 2/3 of the entire pooled
reads. For each subsample size five distinct subsamples were created.
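The subsampling scheme can be sketched like this; the helper names and the stand-in read list are our own, and real pipelines would subsample alignment records rather than integers.

```python
import random

def subsample_reads(reads, fraction, n_subsamples=5, seed=0):
    """Draw `n_subsamples` random subsamples, each `fraction` of the reads.

    Mirrors the saturation analysis: for each size (1/20 up to 2/3 of the
    pooled read set) several distinct subsamples are drawn without
    replacement, so peak calling can be repeated per subsample.
    """
    rng = random.Random(seed)  # fixed seed keeps the draws reproducible
    k = int(len(reads) * fraction)
    return [rng.sample(reads, k) for _ in range(n_subsamples)]

pooled = list(range(100_000))  # stand-in for the pooled mapped reads
for frac in (1 / 20, 1 / 4, 2 / 3):
    subs = subsample_reads(pooled, frac)
    print(frac, len(subs), len(subs[0]))
```

Calling peaks on each subsample and plotting peak counts against read counts then yields the saturation curves shown later in Figure 4.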

2.4. Comparison

The comparison between the data sets on the read and peak level was based on the
quality metrics proposed in the ENCODE and other authoritative ChIP-seq guidelines
[1, 2] (Table 2).

Table 2. Metrics used to assess the quality of the ChIP-seq samples. NSC/RSC were calculated using the
phantompeakqualtools package version 2 [21, 22].
Quality metric                                                   Abbreviation
Ratio of uniquely mapped reads to total number of reads          UMR/TNR
Ratio of uniquely mapped reads to total number of mapped reads   UMR/TMR
Non-Redundant Fraction                                           NRF
PCR Bottleneck Coefficient 1                                     PBC1
PCR Bottleneck Coefficient 2                                     PBC2
Normalized Strand Cross-correlation coefficient                  NSC
Relative Strand Cross-correlation coefficient                    RSC
Fraction of reads which are in peak regions                      FRiP
Percentage of peaks with fold change greater than 5              %fc>5
Percentage of peaks which are in DNase I HS sites                %DNase I HS
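Most of these metrics are simple ratios. As one example, FRiP can be computed as the fraction of read positions that fall inside any called peak interval; the single-chromosome sketch below uses made-up coordinates and our own helper name.

```python
from bisect import bisect_right

def frip(read_positions, peaks):
    """Fraction of Reads in Peaks (FRiP).

    `peaks` is a sorted list of non-overlapping (start, end) intervals on
    one chromosome; binary search finds the candidate interval per read.
    """
    starts = [s for s, _ in peaks]
    in_peaks = 0
    for pos in read_positions:
        i = bisect_right(starts, pos) - 1  # rightmost peak starting <= pos
        if i >= 0 and peaks[i][0] <= pos <= peaks[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions)

reads = [5, 50, 120, 180, 900]
peaks = [(40, 60), (100, 200)]
print(frip(reads, peaks))  # → 0.6
```

With the 1% threshold from Table 3, a sample passes as soon as at least one in a hundred reads lands in a peak region.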

The similarity between the various peak calling results and the corresponding
genes was determined using the Jaccard distance [23]. The pairwise Jaccard distances
were visualized with a heatmap. It was necessary to map the genes to their orthologues
in the other species to correctly estimate the similarity between different species.
Mouse and rat genes were mapped to their corresponding human genes.
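The Jaccard distance between two samples is one minus the ratio of shared to total genes. A minimal sketch, with made-up gene symbols standing in for orthologue-mapped gene sets:

```python
def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance between two gene sets: 1 - |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1 - len(a & b) / len(union)

# Hypothetical gene sets after mapping rodent genes to human orthologues
sample1 = {"NR0B2", "CYP7A1", "FGF19", "SLC51B"}
sample2 = {"NR0B2", "CYP7A1", "ABCB11"}
print(jaccard_distance(sample1, sample2))  # → 0.6
```

Computing this for every pair of samples yields the symmetric matrix that is rendered as the heatmap in Figure 2.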
A dotplot was used to illustrate enrichment of pathways across samples. Some
samples did not show any enriched pathways under the defined settings. Additional
pathway trees for each sample with enriched pathways were created to investigate the
branch and subtree differences between the samples.

3. Results

In public repositories, FXR-ChIP-seq data sets from three different species are
available: five for mice, one for rat and one for human primary hepatocytes. Most data
sets include baseline FXR binding and binding events under pharmacological treatment
(i.e. FXR activation with different ligands) or diseased conditions (i.e. diet-induced
non-alcoholic fatty liver disease, bile duct ligation induced cholestasis). No public data
sets are available for human liver tissue (Table 1). Our analysis shows that these data
sets are heterogeneous concerning baseline quality criteria (Table 3).

Table 3. Evaluation of ChIP-seq quality for the available data sets. The number of samples/analysis results
which pass the quality metric with respect to the total number of samples/analysis results is presented. Peak
calling was performed with multiple parameter combinations; thereby the number of peak calling results is a
multiple of the number of samples.

Quality      Threshold  Mouse-  Mouse-   Mouse-    Mouse-   Mouse-  Mouse-  Rat-     PHH-   Human-
metric       value      Guo     Osborne  Lefebvre  Kersten  Kemper  pooled  Stevens  Guo    Wagner
UMR/TNR      50%        -       -        4/4       4/4      4/4     -       4/6      2/2    2/2
UMR/TMR      50%        -       -        4/4       4/4      4/4     -       6/6      2/2    2/2
NRF          50%        -       -        4/4       4/4      4/4     1/1     6/6      2/2    1/2
PBC1         50%        -       1/1      4/4       4/4      2/4     1/1     6/6      2/2    1/2
PBC2         1.00       -       1/1      4/4       4/4      4/4     1/1     6/6      2/2    2/2
NSC          1.05       -       -        0/4       0/4      4/4     -       6/6      2/2    2/2
RSC          0.8        -       -        0/4       0/4      0/4     -       6/6      2/2    2/2
FRiP         1%         -       -        16/16     15/32    27/32   4/4     24/24    8/16   22/24
%fc>5        50%        -       8/8      16/16     32/32    32/32   0/4     24/24    16/16  24/24
%DNase I HS  80%        1/2     8/8      0/16      0/32     0/32    0/4     -        2/16   5/24

When analyzed with the various analysis parameters in a standardized manner, the
number of called FXR peaks in the single data sets ranges from 103 to 40,080 and the
number of associated genes from 6 to 12,873. For the combined data set, the number of
called peaks ranged from 24,747 to 59,319 and the number of associated genes from
10,038 to 13,826 across the different parameter combinations. The called peaks/genes
of the combined data set represent more than just the simple addition of binding
sites/genes from the single data sets, which can be explained by the enhancement of
weak signals after virtually increasing the sequencing depth.
The comparison of the public data sets to our human data set revealed that the
quality of the human data set (although derived from surgical tissue) is in many regards
at least as good as that of the published data sets. The human data set passed the RSC
quality criterion, which is crucial for the correct estimation of the fragment length by
MACS2.
The human data set also included an input and IgG control sample, which was critical
to analyze the impact of different control samples in ChIP-seq experiments.
The most prevalent motif identified by the de novo search within the top 500 peaks
was the canonical FXR IR-1 motif (AGGTCAxTGACCT). It was present in 2 to 54%
of narrow peaks and in 20 to 64% of wider peak regions for the different data sets.
The similarity between the samples was determined using the Jaccard distance
based on the identified genes. Samples of the same data set group together rather than
samples of the same condition/treatment from different data sets (Fig 2). Based on the
quality criteria and the pairwise Jaccard similarities, the following combination was
considered the most reliable parameter setting: a q-value of 0.05, no control sample, a
fixed fragment length (if the estimated fragment length was unrealistic) and, for the
human samples, an effective genome size of 2.7 Gbp. Only peak calling results from
these parameters were used for all further analyses.

Figure 2. Heatmap based on the pairwise Jaccard distance. The samples are colored based on the data sets.
The cluster tendency seems to be towards data sets rather than sample conditions.

Peaks were assigned to the closest annotated genes. Based on the assigned genes
enriched REACTOME pathways were identified (Fig. 3). The combined analysis
revealed additional significant pathways, which are not present in any of the single
mouse data sets. Some of those additional pathways are also present in samples of other
species. This demonstrates both a conservation of the FXR dependency of that pathway
across multiple species and validity of the additional pathways identified by the
combined data set.
Figure 3. Top enriched REACTOME pathways represented in a dotplot. The dot color relates to the q-value
and the size to the pathway coverage (number of pathway genes found / total number of genes in the
pathway). For some samples no enriched pathways were found under the defined settings.

3.1. Insights in FXR binding events revealed by the combined data set
The combined mouse data set shows many additional peaks, genes and pathways which
were not present in any of the individual samples (e.g. the ‘Translocation of SLC2A4
(GLUT4) to the plasma membrane’ pathway is one of 33 pathways which are only
present in the combined mouse data set). Similarly, some peaks, genes and pathways
present in one or more individual mouse samples are not present in the combined data
set (e.g. the ‘Tspy-ps’ gene is not present in the combined mouse data set although it is
present in 8 of the individual mouse samples). This indicates that the signal for those
peaks is not conserved across all samples. This could be explained either by a signal
that is only present under very specific conditions, which were only met in a single
sample, or by incorrectly called peaks due to noise. Peaks are more prevalent in the
vicinity of TSSs, which is expected for a TF ChIP-seq experiment.
Interestingly, over 96% of the liver FXR ChIP-seq genes from the “Mouse-Guo”
data set are present in the combined data set although the “Mouse-Guo” was not
included in the pool, because only its peak tracks were available. Furthermore, 70% of
the “Mouse-Guo” genes which are not present in any other single mouse sample are
present in the pooled data set. This indicates that the pooling of FXR signal allowed the
detection of weaker signals. Although the combined data sets revealed many new
potential FXR related binding sites, saturation appears not to be reached. This is
demonstrated by subsampling the combined data set (Fig. 4).
A B

Figure 4. The number of reads with respect to the number of peaks (A) and number of genes (B) for the
“Mouse-pooled” data set and its subsamples. The blue points represent the number of peaks/genes for either
the entire “Mouse-pooled” data set or of its subsamples. A linear (black) and exponential (red) fitting curve
was created for the data points; the exponential curve represents a much better fit.

4. Discussion

Several FXR ChIP-seq data sets are publicly available for various species and
conditions. Standard ENCODE quality criteria are usually not reported for those data
sets. We observe that the analysis results are sensitive to settings of certain analysis
parameters such as the effective genome size and most prominently to the choice of
control sample, which is generally underappreciated in most studies. A low-quality
control sample can have a significant impact on the peak calling results even if the
ChIP-seq sample is of good quality. Influences of control samples on the peak calling
results were also reported in other studies [24]. Therefore, an analysis without a control
sample should be considered. Interestingly, the human in vivo samples were more
similar to rodent in vivo samples than to in vitro human primary hepatocytes.
Individual data sets often exhibit too low a sequencing depth to identify weak/rare
binding sites; therefore, we combined all available mouse reads to create a "FXR-super-
signaling-atlas" for a profound downstream analysis of FXR signaling capacities. This
data set allowed the detection of more binding sites, genes and connected pathways.
However, even the combined data set did not reach the theoretically determined
saturation.

In conclusion, this meta-analysis of the different data sets with standardized
methods should help to provide a comprehensive and global overview of FXR binding
events, FXR binding motifs, FXR-dependent gene regulation and affected pathways
across various species. Combining standardized public data sets allows for more
profound detection of binding events and signaling capacities.

References
[1] Landt SG, Marinov GK, Kundaje A, Kheradpour P, Pauli F, Batzoglou S, et al. ChIP-seq guidelines and
practices of the ENCODE and modENCODE consortia. Genome Res 2012;22(9), 1813-31.
[2] Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis.
Quantitative Biology 2013;1(1), 54-70.
[3] Thomas AM, Hart SN, Kong B, Fang J, Zhong X, Guo GL. Genome-wide tissue-specific farnesoid X
receptor binding in mouse liver and intestine. Hepatology 2010;51(4), 1410-9.
[4] Chong HK, Infante AM, Seo Y, Jeon T, Zhang Y, Edwards PA, et al. Genome-wide interrogation of
hepatic FXR reveals an asymmetric IR-1 motif and synergy with LRH-1. Nucleic Acids Res
2010;38(18), 6007-17.
[5] Lien F, Berthier A, Bouchaert E, Gheeraert C, Alexandre J, Porez G, et al. Metformin interferes with
bile acid homeostasis through AMPK-FXR crosstalk. J Clin Invest 2014;124(3), 1037-51.
[6] Ijssennagger N, Janssen AW, Milona A, Pittol JMR, Hollman DA, Mokry M, et al. Gene expression
profiling in human precision cut liver slices in response to the FXR agonist obeticholic acid. J Hepatol
2016;64(5), 1158-66.
[7] Lee J, Seok S, Yu P, Kim K, Smith Z, Rivas-Astroza M, et al. Genomic analysis of hepatic farnesoid
X receptor binding sites reveals altered binding in obesity and direct gene repression by farnesoid X
receptor in mice. Hepatology 2012;56(1), 108-17.
[8] Sutherland J, Webster Y, Willy J, Searfoss G, Goldstein K, Irizarry A, et al. Toxicogenomic module
associations with pathogenesis: A network-based approach to understanding drug toxicity. The
Pharmacogenomics Journal 2017;18(3), 377-90.
[9] Zhan L, Liu H, Fang Y, Kong B, He Y, Zhong X, et al. Genome-wide binding and transcriptome
analysis of human farnesoid X receptor in primary human hepatocytes. PloS One 2014;9(9), e105930.
[10] Afgan E, Baker D, Van den Beek M, Blankenberg D, Bouvier D, Čech M, et al. The galaxy platform
for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res
2016;44(W1), W3-W10.
[11] Bolger AM, Lohse M, Usadel B. Trimmomatic: A flexible trimmer for illumina sequence data.
Bioinformatics 2014;30(15), 2114-20.
[12] Andrews S. FastQC A quality control tool for high throughput sequence data.
<http://www.bioinformatics.babraham.ac.uk/projects/fastqc/>. Accessed 2018 10/10.
[13] Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nature Methods 2012;9(4), 357.
[14] Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA
sequences to the human genome. Genome Biol 2009;10(3), R25.
[15] Feng J, Liu T, Qin B, Zhang Y, Liu XS. Identifying ChIP-seq enrichment using MACS. Nature
Protocols 2012;7(9), 1728.
[16] Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of
ChIP-seq (MACS). Genome Biol 2008;9(9), R137.
[17] Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in
bipolymers. Proc Int Conf Intell Syst Mol Biol. 1994;2, 28-36.
[18] Laffitte BA, Kast HR, Nguyen CM, Zavacki AM, Moore DD, Edwards PA. Identification of the DNA
binding specificity and potential target genes for the farnesoid X-activated receptor. J Biol Chem
2000;275(14), 10638-47.
[19] Yu G, Wang L, He Q. ChIPseeker: An R/bioconductor package for ChIP peak annotation, comparison
and visualization. Bioinformatics 2015;31(14), 2382-3.
[20] Fabregat A, Jupe S, Matthews L, Sidiropoulos K, Gillespie M, Garapati P, et al. The reactome pathway
knowledgebase. Nucleic Acids Res 2017;46(D1), D649-55.
[21] Kundaje A, Jung LY, Kharchenko P, et al. Assessment of ChIP-seq data quality using cross-correlation
analysis. <http://code.google.com/p/phantompeakqualtools>. Accessed 2018 08/23.
[22] Kharchenko PV, Tolstorukov MY, Park PJ. Design and analysis of ChIP-seq experiments for DNA-
binding proteins. Nat Biotechnol 2008;26(12), 1351.
[23] Jaccard P. Lois de distribution florale dans la zone alpine. Bull Soc Vaudoise Sci Nat 1902;38, 69-130.
[24] Marinov GK, Kundaje A, Park PJ, Wold BJ. Large-scale quality analysis of published ChIP-seq data.
G3 (Bethesda) 2014;4(2), 209-23.
dHealth 2019 – From eHealth to dHealth 113
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-113

Robust Comparison of Simultaneous EEG Recordings Using Kalman Filters and Gaussian Mixture Models
Niels VON STEINa, Jonas SCHULTE-COERNEa, Stephan M. JONASb and Ekaterina
KUTAFINAa,1
a Department of Medical Informatics, RWTH Aachen University, Germany
b Department of Informatics, Technical University of Munich, Germany

Abstract. In this manuscript we propose a novel method to compare simultaneously recorded electroencephalography (EEG) signals from different devices. Although
standard methods like correlation and spectral analysis give quantitative answers to
this question, these methods often penalize certain artifacts such as eye blinking too
strongly. In our analysis we instead utilize an unsupervised labeling technique to
evaluate the matching of two signals by comparing their label sequences. The
proposed method was successfully tested on artificial data, where it showed a
reduced deviation from the ground truth compared to the correlation coefficient.
Furthermore, the method was applied on a real use-case to assess the quality of a
low-cost EEG device compared to a clinical one. Here it showed more consistent
results than the correlation coefficient, while it also did not rely on outlier removal
prior to the analysis. However, the proposed method still suffers from accidental
matches of labels, so that unrelated data sets may be assigned an unexpectedly high
matching score. This paper suggests extensions to the proposed method, which
could improve this issue.

Keywords. electroencephalography, latent class analysis, unsupervised machine learning

1. Introduction

The collection of electroencephalography (EEG) data has been a very cumbersome procedure. Patients usually undergo examinations with stationary and costly medical-grade devices, and data acquisition sessions often include time-consuming setup procedures.
In contrast, the continuous development of consumer-grade health devices in recent
years has increased the supply of low-cost, mobile EEG devices. On the one hand, this
may solve the issue of availability and accessibility of medical equipment. On the other
hand, it allows researchers to think about new ways of acquiring EEG data in various
different application fields like telemedicine, brain-computer-interfacing or personal
health.
However, cost reduction usually comes with a reduction in measurement quality [1].
The signal quality of mobile EEG devices has been extensively studied among specific
devices as well as among entire device groups [2, 3]. These studies used external stimuli

1 Corresponding Author: Ekaterina Kutafina, RWTH Aachen University, Germany, E-Mail: [email protected]
114 N. Von Stein et al. / Robust Comparison of Simultaneous EEG Recordings

to record event-related potentials (ERPs) and applied statistical analyses to quantify the
signal quality.
While ERPs are used in many medical contexts, others require the recording of
resting state EEG. For example, in epilepsy diagnostics, resting state EEG is recorded
and analyzed for presence of epileptiform abnormalities. Therefore, mobile EEG devices
should also be validated for resting state EEG, which limits the usage of statistical
methods, such as averaging. In this case, methods such as cross-correlation or cross-
spectrum analysis are applied to compare the performance of EEG devices [4]. These
methods often fail to adequately reflect the matching of two EEG sequences, due to
sensitivity to strong disturbances like eye blinking, which are neglected as artifacts
during a clinical interpretation of the EEG signal. While some of these problems can be
solved through artifact removal, it is highly desirable to compare core features of the
signals. Until now, little work was reported on EEG data comparison through
mathematical modelling. One such approach was proposed by Mikkelsen et al. [5]. The
authors built linear models to investigate mutual information between scalp and ear EEG
devices.
Therefore, we propose an approach that is based on labeling the EEG signal’s
samples through clustering and comparing the label sequences. Specifically, we fit an
autoregressive model (AR) to the EEG signal by using a Kalman filter to handle the low
signal-to-noise ratio and non-stationarity. The non-stationarity results in a sequence of
AR models for each EEG time series. Subsequently, the computed AR coefficients are
clustered with a Gaussian Mixture Model, and a distance between the cluster label
sequences of two simultaneously recorded EEG time series is defined, which returns a
“matching score”. We argue that such a score offers a promising alternative to the
currently used signal-based approaches and is characterized by resistance to noise and
artifacts.

2. Methods

Our approach labels the EEG signal’s samples through clustering and compares the
resulting label sequences. The aim of labeling the signal’s samples is to express the
matching quality in terms of the matching of label sequences, which removes any
interdependencies between the scoring and the model used for labeling. One promising
approach for this is presented by Penny et al. [6]. The authors fit mixtures of Gaussians
to the hidden state vectors of stationary phases in the signal in order to distinguish
whether the test subject had performed a task with the left or the right hand.
In our approach, the hidden state vectors are the coefficients of an autoregressive
model, which is adapted by using a Kalman filter. Arnold et al. [7] investigated the
properties of several physiological signals (including EEG) by adaptive filtering. They
showed that this approach is well suited for such types of signals due to its fast
convergence to new system states.
One important property of EEG is the piecewise-stationarity of its signals.
Stationarity is given when mean and variance of a signal do not change over time. In
general, this condition does not hold for long EEG recordings but for short periods of
time, stationarity can be assumed [8]. If two EEG devices take simultaneous
measurements on the same patient and with similar electrode positions, these stationary
phases should be identical. For this purpose, we first try to identify stationary phases in
the EEG signal. Afterwards we compare the two EEG recordings regarding the matching
of these stationary phases, which gives a quantitative measure.
The stationary phases are identified by the coefficients of an autoregressive (AR)
model of the EEG signal. The piecewise-stationarity of the signal implies that the
coefficients are time-variant in general but constant during a stationary phase.
The adaptation of the AR model’s coefficients is done with a Kalman filter, which
results in a time series of coefficient sets. These sets can be clustered by training a
Gaussian Mixture Model (GMM). Applying the GMM to a time series of coefficient sets
yields a sequence of labels for the stationary phases, which is used for the comparison of
the EEG signals. Since the GMM can be trained on the coefficient sets of both EEG
recordings, we repeat the process vice versa and take the average of both comparisons.

2.1. State estimation with a Kalman Filter

Eq. 1 shows how the standard AR(p) model estimates the next value \hat{y}_n of a sequence from a linear combination of the past p values \hat{y}_{n-i} plus additive white Gaussian noise \varepsilon. The parameters of an AR model are the stochastic properties of the noise and the weighting factors a_i of the linear combination.

\hat{y}_n = \sum_{i=1}^{p} a_i \, \hat{y}_{n-i} + \varepsilon    (1)

In the stochastic interpretation, an AR model's coefficient vector represents the hidden state of the model. In our application the hidden state has to be time-variant, so the n-th hidden state X_n = (a_1, \ldots, a_p)_n^T is used to generate the n-th value \hat{y}_n as described in Eq. 1.
A Kalman filter (KF) is used to estimate the hidden state from the EEG recordings.
The KF algorithm consists of a prediction step and a correction step. In the prediction
step, the next hidden state is calculated from the previous one and the system’s dynamics.
In the correction step, the hidden state prediction is adjusted by the deviation between the
actual measurement and the output of an AR model based on the predicted hidden state.
The uncertainty of the state prediction is tracked and compared to the measurement
uncertainty during the correction step, in order to achieve robustness against noise in the
EEG recordings.
Since the dynamics according to which the hidden state changes are unknown, the transition between hidden states is modeled as a random walk, which simply adds a random offset to the previous estimate of the hidden state. Eq. 2 shows the prediction of the next hidden state \hat{X}_n from the previous estimate X_{n-1} and an additive component \omega_n that follows a zero-mean Gaussian distribution.

\hat{X}_n = X_{n-1} + \omega_n    (2)

The predicted uncertainty of the state estimate is modeled as the covariance matrix \hat{P}_n. It is computed by adding the covariance Q of \omega_n to the previous uncertainty estimate P_{n-1}, as shown in Eq. 3. Due to the random walk, the predicted hidden state is thus less certain than the previous estimate.

\hat{P}_n = P_{n-1} + Q    (3)
During the correction step, the estimate of the hidden state X_n is computed from its prediction \hat{X}_n and the error e_n of the predicted AR model's output. The factor by which the prediction error influences the state estimate is the Kalman gain K_n. It contains the ratio between the measurement uncertainty R and \hat{P}_n, which prevents noisy measurements from having an excessive influence on the state estimate. Furthermore, it contains H_n, which maps the output error e_n back to an error in the state estimate so that it can be used for the correction. The equations for the correction step are shown in Eqs. 4-6.

e_n = y_n - \hat{y}_n = y_n - H_n^T \hat{X}_n    (4)

K_n = \hat{P}_n H_n / (H_n^T \hat{P}_n H_n + R)    (5)

X_n = \hat{X}_n + K_n e_n    (6)

The final operation of the correction step, shown in Eq. 7, is the calculation of the estimated uncertainty P_n. Compared to the predicted uncertainty \hat{P}_n, it is lower due to the inclusion of the information from the measurement.

P_n = \hat{P}_n - K_n H_n^T \hat{P}_n    (7)
After initialization of X_0, P_0, Q and R, this filter will, however, not accurately approximate the hidden states over time. In the event of sudden changes of the signal's characteristics, which occur at transitions between two stationary phases, a KF with constant prediction uncertainty Q will take too long to adapt the hidden state. This can be remedied by using the Jazwinski algorithm [9] to implement a time-variant covariance matrix Q_n. The algorithm uses a prediction and update procedure similar to the steps of the KF. The prediction step in Eq. 9 uses the variance \sigma_n^2 from Eq. 8, which is the denominator of the Kalman gain. Eq. 10 shows how the smoothing parameter \alpha is used to update Q_n; contrary to the Kalman gain, \alpha is a constant hyperparameter.

\sigma_n^2 = H_n^T \hat{P}_n H_n + R    (8)

\hat{Q}_n = \max\{0, (e_n^2 - \sigma_n^2) / (H_n^T H_n)\}    (9)

Q_n = \alpha Q_{n-1} + (1 - \alpha) \hat{Q}_n    (10)
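The filtering scheme above can be sketched in a few lines of Python. This is a minimal illustration under our own choice of initial values (the variable names mirror Eqs. 1-10; keeping the process noise Q diagonal is a simplifying assumption, not the authors' implementation):

```python
import numpy as np

def kalman_ar(y, p=4, alpha=0.05, R=1.0, P0=1.0, Q0=1e-3):
    """Track time-varying AR(p) coefficients of y with a Kalman filter.

    The hidden state X holds the AR coefficients (Eq. 1); the transition is
    a random walk (Eqs. 2-3) and the process noise Q is adapted with the
    Jazwinski scheme (Eqs. 8-10). Q is kept diagonal for simplicity.
    """
    y = np.asarray(y, dtype=float)
    X = np.zeros(p)                 # initial state X_0 (a fitted AR model could seed this)
    P = np.eye(p) * P0              # initial state covariance P_0
    Q = np.eye(p) * Q0              # initial process noise covariance
    states = []
    for t in range(p, len(y)):
        H = y[t - p:t][::-1]                    # regressors: the p most recent samples
        P_pred = P + Q                          # prediction (Eqs. 2-3); X is unchanged
        e = y[t] - H @ X                        # output error (Eq. 4)
        s2 = H @ P_pred @ H + R                 # innovation variance (Eq. 8)
        K = P_pred @ H / s2                     # Kalman gain (Eq. 5)
        X = X + K * e                           # corrected state (Eq. 6)
        P = P_pred - np.outer(K, H) @ P_pred    # corrected covariance (Eq. 7)
        P = (P + P.T) / 2.0                     # keep P numerically symmetric
        Q_hat = max(0.0, (e * e - s2) / (H @ H + 1e-12))   # Eq. 9
        Q = alpha * Q + (1.0 - alpha) * Q_hat * np.eye(p)  # Eq. 10
        states.append(X.copy())
    return np.array(states)
```

On a synthetic AR(2) signal, the tracked coefficients settle near the generating values after a few hundred samples, while the Jazwinski adaptation lets the filter re-converge quickly after abrupt changes.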

2.2. Estimating prototype states

We train a Gaussian Mixture Model (GMM) on the sequence of hidden states X_n in order to find clusters. The clusters represent the piecewise-constant AR coefficients during the stationary phases, which we refer to as prototype states in the following.

The state sequence is centered and normalized with the L_2 norm per vector component to account for the scale dependency of the GMM. Moreover, we use an individual diagonal covariance matrix for each component.

During transitions between two stationary phases, the hidden state fluctuates in a way that it might not match any of the prototype states. Including these transition phases in the GMM training might therefore result in a biased model. Because of this, the training is only done with states for which the evidence is high. The evidence p(y_n | \theta_n) describes the probability of observing the measured y_n given the current state \theta_n of the KF. It is modeled according to Eq. 11 as a conditional Gaussian probability density.

p(y_n | \theta_n) = N(y_n | \hat{y}_n, \sigma_n^2)    (11)
During state transitions, in which the KF has to adapt, its current state does not correspond well to the measured EEG signal, so the evidence decreases significantly between two stationary phases. We decided to use the mean evidence as the threshold below which hidden states were excluded from the training of the GMM.
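The evidence-based filtering can be sketched with standard-library Python. This is a simplified illustration; the per-sample predictions y_hat and innovation variances s2 are assumed to come from the Kalman filter of Section 2.1:

```python
import math

def evidence_mask(y, y_hat, s2):
    """Select hidden states for GMM training by their evidence (Eq. 11).

    y:     measured samples
    y_hat: one-step KF predictions
    s2:    innovation variances (sigma_n^2)
    Returns a boolean list keeping only states with above-mean evidence,
    which drops the transition phases between stationary segments.
    """
    ev = [math.exp(-(yi - yh) ** 2 / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
          for yi, yh, s in zip(y, y_hat, s2)]
    threshold = sum(ev) / len(ev)          # mean evidence as the cut-off
    return [e >= threshold for e in ev]
```

A sample whose prediction error is large relative to its innovation variance (a transition) gets near-zero evidence and is masked out, while well-predicted samples within a stationary phase are retained.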

2.3. Comparing the EEG signals

Let L(X_n, \varphi) be the function that maps a hidden state X_n to the most probable prototype state of a trained GMM with model parameters \varphi. With this, the state sequences can be transformed into sequences of prototype state labels.
In order to reduce fluctuations of the comparison results, the label sequences of the EEG signals are segmented into non-overlapping chunks. The unilateral matching score for each chunk is the number of samples for which both EEG signals have the same label, normalized by the chunk size.
Since a GMM can be trained on the hidden state sequences of both EEG signals, the
matching score that is used to assess the similarity of the signals is computed as the
average of both unilateral matching scores.
If a single value is required for the comparison, the average of all segments’
matching scores is taken.
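A sketch of the chunked matching score under these definitions (the function name and default chunk size are illustrative; the bilateral averaging over the two trained GMMs would call this twice with the label sequences produced by each model):

```python
def matching_score(labels_a, labels_b, chunk=1000):
    """Fraction of samples per non-overlapping chunk with identical labels.

    labels_a, labels_b: prototype-state label sequences of the two
    simultaneously recorded signals (here assumed to come from one
    trained GMM). Returns the per-chunk scores and their average as a
    single summary value.
    """
    n = min(len(labels_a), len(labels_b))
    scores = []
    for start in range(0, n, chunk):
        seg_a = labels_a[start:start + chunk]
        seg_b = labels_b[start:start + chunk]
        agree = sum(1 for a, b in zip(seg_a, seg_b) if a == b)
        scores.append(agree / len(seg_a))
    return scores, sum(scores) / len(scores)
```

For example, the label sequences [0, 0, 1, 1] and [0, 1, 1, 1] with a chunk size of 2 yield the per-chunk scores [0.5, 1.0] and an overall score of 0.75.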

3. Results

3.1. Simulation

To verify our method, we generate a test set of piecewise-stationary signals with a known hidden state sequence. Additionally, we add fixed-length Gaussian windows with varying amplitude at random points to simulate disturbances of the signal. Each signal is a weighted sum of sine functions with different frequencies plus white noise; to generate piecewise-stationary phases, the weighting coefficients of the individual summands are switched. Each generated pair of signals consists of three state phases, of which either 0%, 33%, 66% or 100% of the phase sequences match. The white noise variance was set to \sigma^2 = 0.1, which corresponds to 10% of the maximal amplitude. To initialize the model, we first fit an AR(5) model, for which the choice p = 5 was determined heuristically. The estimated coefficients are then used as the initial state X_0, and we assume a low initial covariance P_0 = 1, while the measurement noise covariance R is set to the root-mean-square error (RMSE) of the AR model's prediction. This reflects the assumption of an underlying AR process and attributes the error of the fitted AR model to measurement noise. Moreover, we assume a slight smoothing of \alpha = 0.05 in the Jazwinski algorithm for stability.
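The signal generation described above can be sketched as follows. The specific sine frequencies, phase length and disturbance amplitude are our own illustrative assumptions; the paper does not specify them:

```python
import numpy as np

def make_signal(weights_per_phase, phase_len=500, fs=128.0, noise_std=0.1, seed=0):
    """Piecewise-stationary test signal: per phase a weighted sum of sines
    plus white noise, with one fixed-length Gaussian window as disturbance."""
    rng = np.random.default_rng(seed)
    freqs = [2.0, 6.0, 10.0]                   # assumed frequencies in Hz
    t = np.arange(phase_len) / fs
    parts = []
    for w in weights_per_phase:                # switching weights creates the phases
        phase = sum(wi * np.sin(2.0 * np.pi * f * t) for wi, f in zip(w, freqs))
        parts.append(phase + rng.normal(0.0, noise_std, phase_len))
    sig = np.concatenate(parts)
    center = rng.integers(0, len(sig))         # random position of the disturbance
    idx = np.arange(len(sig))
    sig += 2.0 * np.exp(-0.5 * ((idx - center) / 20.0) ** 2)  # Gaussian window
    return sig
```

Calling it with three different weight vectors produces one signal with three stationary phases; a pair with a desired matching degree is obtained by reusing or permuting weight vectors between the two signals.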
The results are displayed in Fig. 1, which illustrates that our approach reliably gives results closer to the true matching than correlation does. The average deviation of the matching score from the ground truth is 0.131, while the correlation coefficient deviates by 0.225 on average. Within groups of pairs with an identical true matching, the matching score has an average deviation of 0.114, 0.102, 0.117 and 0.190 from the ground truths of 1.00, 0.66, 0.33 and 0.00, respectively. The corresponding deviations of the correlation coefficient from the true matching are 0.414, 0.341, 0.137 and 0.007.

Figure 1. Comparison of the true matching of artificial signals with the results of the proposed method and the
correlation coefficient.

3.2. EEG Data

In order to evaluate the proposed approach on real EEG data, we analyze two simultaneously recorded, multi-channel EEG signals. One has 14 channels and is recorded by a consumer-grade mobile device (mEEG); the other has 21 channels and is recorded by a clinical device (cEEG), both in the 10-20 system. The differences in electrode numbers, electrical characteristics and reference systems required initial preprocessing of the data to obtain 28 pairs of EEG signals with a 128 Hz sampling rate. Each pair contains one signal from the mEEG and one from the cEEG, recorded simultaneously with spatially close electrodes. The preprocessing included a normalization to avoid large scale differences. Despite this processing, the data is referred to as raw in the remainder of the text, to distinguish it from the clean data, in which outliers and artifacts were removed with the intention of computing more meaningful correlation coefficients. As a reference, the Pearson correlation coefficient was computed for both the raw and the clean data, while the proposed approach was evaluated on the raw data only.
For the model we follow the same process as before, fitting an AR model and using its coefficients as the initial state X_0. We use p = 4 for the AR model and take the RMSE of its prediction as the initialization for R. We assume an initial covariance of P_0 = 1 and a smoothing of \alpha = 0.05 in the state noise covariance adaptation. Due to the length of the EEG recordings, we split the estimated state sequences into segments of 1000 samples for the cluster analysis. To determine the number of Gaussian components, we employ the estimate that EEG signals are stationary for about half a second [10], which amounts to about 16 possible stationary phases per segment. However, not all of these phases correspond to a distinct prototype state, and the length of a stationary phase is variable. We therefore tested values of three to nine components, where a choice of three resulted in the best generalization performance. The results of our analysis for the 28 pairs of EEG recordings are presented in Fig. 2, which includes the correlation coefficients for comparison. Each pair is recorded simultaneously with spatially close combinations of electrodes, so a high degree of similarity is expected.

Figure 2. Comparison of the proposed approach with the correlation coefficients for real EEG recordings.

4. Discussion

With the simulated data, the matching score of the proposed method is generally closer
to the ground truth than the correlation coefficient. The average deviation of the matching
score from the true matching is almost 42% lower than that of the correlation coefficient.
On EEG data, the proposed method’s matching score is much more consistent than
the correlation coefficient. Fig. 2 shows that it has a lower variance and that there are no
cases with an exceptionally good or bad matching score. By contrast, the correlation
coefficient is unexpectedly low for certain pairs, such as 9 and 17. That this is the case
for both the raw and the clean data shows that the low correlation cannot be explained
by artifacts and outliers. Additionally, the correlation coefficients vary strongly
between different pairs, even when comparing only the clean data.
However, an interesting observation can be made when grouping the simulated data
by the true matching degree. For higher values of the ground-truth matching, the
proposed method yields a lower matching score than expected, while the matching score
for pairs with little to no relationship in the data consistently exceeds the true matching.
The unexpectedly high matching score for unrelated pairs of signals can be
explained by the fact that a GMM always assigns a cluster label, even if the evidence
of a sample's hidden state estimate is low. This can cause random matches in the
label sequence even for completely mismatching sections and prevents matching scores
of zero for those segments. A similar behavior can be observed with the real EEG data,
in which pairs of signals from spatially remote electrodes were assigned a matching
score that was, on average, only slightly worse than that of highly related signal pairs.
Checking the overall posterior probabilities of all mixture components may help to
improve this behavior.
With perfectly matching pairs, the proposed method does not produce a matching
score of one, because the state label fluctuates even when no significant changes in the
hidden state sequence occur; in this case the hidden state is very close to a decision
boundary of a mixture component. In our model we assume independence between
stationary phases. However, for EEG signals there is a higher probability of staying in
one stationary phase than of transitioning to another [10]. Since the combination of a KF
and a GMM does not allow this aspect to be modeled explicitly, an improvement might
be achieved by using a first-order hidden Markov model.
Another point to mention is the selection of the optimal number of mixture
components of the GMM. The actual number of stationary phases varies between
different EEG recordings. A sub-optimal selection of this parameter will directly affect
the matching score, as the GMM might over- or underfit. For the EEG dataset, our
selection of this parameter was based on domain knowledge and on performance tests
on a small subset of the data. The best parameter was then used for all pairs of EEG
signals. However, the optimal choice of this parameter may vary strongly across the
different pairs, which would cause a large scoring error.

5. Conclusion

The problem at hand was to develop a method for comparing two simultaneous EEG
recordings from different devices. As standard techniques for this task, like correlation,
suffer from typical characteristics of the EEG signal, we proposed an approach that
focuses on statistical methods. Instead of comparing single values of the time series, we
interpret the EEG sequence as a stochastic process with a time-variant hidden state. The
development of the hidden state serves as a proxy and allows for a more accurate
comparison. The proposed method was successfully tested on simulated and real world
data with promising results. The method is robust against outliers and generates more
consistent matching scores for simultaneous recordings than the correlation coefficient.
Some future technical improvements, possibly leading to an even more robust model,
were proposed. However, a test set of EEG signals with known matching levels would
be necessary in order to perform a proper comparison of different methods. Since the
definition of matching strongly depends on the specific application, the design of such a
data collection should be performed together with the domain experts.

References

[1] B. Farnsworth, EEG headset prices: An overview of 15+ EEG devices, July 2017.
[2] N.A. Badcock, P. Mousikou, Y. Mahajan, P. de Lissa, J. Thie and G. McArthur, Validation of the Emotiv
EPOC EEG gaming system for measuring research quality auditory ERPs, PeerJ, February 2013.
[3] A. Melnik, P. Legkov, K. Izdebski, S.M. Kärcher, W.D. Hairston, D.P. Ferris and P. König, Systems,
subjects, sessions: To what extent do these factors influence EEG data?, Frontiers in Human
Neuroscience, March 2017.
[4] M. Lopez-Gordo, D. Sanchez-Morillo and F. Valle, Dry EEG electrodes, Sensors, July 2014.
[5] K.B. Mikkelsen, P. Kidmose and L.K. Hansen, On the keyhole hypothesis: High mutual information
between ear and scalp EEG, Frontiers in Human Neuroscience, June 2017.
[6] W.D. Penny and S.J. Roberts, Dynamic models for nonstationary signal segmentation, Computers and
Biomedical Research, 1999.
[7] M. Arnold, X.H.R. Milner, H. Witte, R. Bauer and C. Braun, Adaptive AR modeling of nonstationary
time series by means of Kalman filtering, IEEE Transactions on Biomedical Engineering, 1998.
[8] S. Sanei and J.A. Chambers, EEG Signal Processing, Wiley Online Library, 2007.
[9] A.H. Jazwinski, Adaptive filtering, Automatica, 1969.
[10] P.L. Nunez and S.J. Williamson, Neocortical dynamics and human EEG rhythms, Physics Today,
January 1996.
dHealth 2019 – From eHealth to dHealth 121
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-121

Development of a National Roadmap for Electronic Prescribing Implementation
Hamidreza DEHGHANa, Saeid ESLAMIa,1, Seyed Hadi GHASEMIa, Mohammad
JAHANGIRIa, Kambiz BAHAADINBEIGYb, Khalil KIMIAFARa, Mahdi
AGHABAGHERIc, Seyedeh Mahdieh NAMAYANDEHc, Mahdi SARGOLZAEIa
a Mashhad University of Medical Sciences, Mashhad, Iran
b Kerman University of Medical Sciences, Kerman, Iran
c Shahid Sadoughi University of Medical Sciences, Yazd, Iran

Abstract. Background: In July 2015, the Iran Food and Drug Administration convened a multi-stakeholder workgroup to help develop recommendations for electronic prescribing implementation in Iran. Objectives: In general, the consensus of the workgroup was to focus on solutions that incrementally reduce the burden on patients, providers, and payers, and require minimal rework by using national standards that have already been used for Health Information Interchange. Methods: We used a road mapping method that includes a number of systematic steps and is adapted from the standard scientific method. Medical informatics experts developed protocols for scoping reviews, systematic reviews, and a health technology assessment study, and then collected evidence from peer-reviewed scholarly journal publications and gray literature. Health insurance company representatives and electronic prescribing pilot study executives were asked to report their experiences with e-prescribing. Results: After five meetings, by comparing and contrasting the national and international evidence, the recommendations were finalized in expert panels. In this paper, we report the recommendations from this roadmap.

Keywords. electronic prescribing, national roadmap, Iran, implementation, standards

1. Introduction

According to recent scientific evidence, appropriate electronic prescribing (EP) implementation can increase patient safety and save costs [1-5].
Therefore, EP implementation is a priority for the Iran Ministry of Health and Medical
Education (MOH). In July 2015, Iran Food and Drug Administration (IFDA) convened
a multi-stakeholder workgroup (workgroup) to help develop recommendations for EP
implementation in Iran.
In general, the consensus of the workgroup was to focus on solutions that
incrementally reduce the burden on patients, providers, and payers, and require
minimal rework by using national standards that have already been used for Health
Information Interchange (HII).

1 Corresponding Author: Saeid Eslami, Mashhad University of Medical Sciences, Mashhad, Iran, E-Mail: [email protected]
122 H. Dehghan et al. / Development of a National Roadmap for Electronic Prescribing Implementation

2. Methods

2.1. Road mapping

We proceed with a vision-driven road mapping approach in order to derive a plan of actions. We used a road mapping method that includes a number of systematic steps and is adapted from the “standard” scientific method developed by Afsarmanesh et al. [6-8]. However, we adapted their method and customized the steps according to our limitations and preferences. For example, we dropped the “plan time” step because our stakeholders were not willing to have it; they preferred to have only a roadmap chart in order to characterize the inter-linking between the identified actions. Figure 1 shows the method that we used to develop our roadmap in a step-by-step manner.

Figure 1. Our road mapping method

2.2. Workgroup members

IFDA called upon the following executives to introduce expert(s): 1. MOH, 2. Medical
Council Organization, 3. IFDA Rational Drug Use Committee, 4. Electronic
Prescribing Pilot Projects Executives, 5. Main Health Insurance Companies, and 6.
Medical Informatics Experts (MIE) from Mashhad University of Medical Sciences.

2.3. International Evidence collection

In accordance with the guidelines of the workgroup, Medical Informatics Experts were
asked to find previous research, analyze its results, summarize the most important
findings, and report the most significant results to the workgroup members. We used
the Arksey and O'Malley methodology [9] for scoping reviews, PRISMA-P for
systematic reviews [10], and Core HTA [11] for the health technology assessment study
[12-13].
MIE collected evidence from peer-reviewed scholarly journal publications by
searching in major electronic databases (Medline/PubMed, Embase, Scopus and
Google Scholar). A comprehensive gray literature search was conducted to find other
national reports, recommendations, standards, and Implementation Guides.
The National Committee on Vital and Health Statistics (NCVHS)2 of the United States
of America published a report with recommendations for electronic prescribing in 2005
[14]. To prepare that report, past evidence was gathered and reviewed, and the
remaining knowledge gaps were identified. After reviewing the evidence, MIE suggested
using this report as the cornerstone, so we focused on evidence published after 2005;
evidence was reviewed up to September 2016. The second milestone was the epSOS
project3, an infrastructure for the cross-border exchange of patient summaries and
e-prescriptions in Europe. A comparative review of electronic prescription systems in
five countries (Denmark, Finland, Sweden, England, and the United States of
America) [15] was also used.

2
The NCVHS serves as the statutory [42 U.S.C. 242k(k)] public advisory body to the Secretary of
Health and Human Services (HHS) for health data, statistics, privacy, and national health information policy
and the Health Insurance Portability and Accountability Act (HIPAA). Website: https://ncvhs.hhs.gov/

2.4. National Evidence Collection

Representatives of the health insurance companies and the executives of the
electronic prescribing pilot studies were asked to report their experiences with
e-prescribing. In a scoping review, published Iranian studies were scrutinized and
then used to inform the implementation of the recommendations. We used the results of
the study "Modeling of the Outpatient Prescribing Process in Iran" [16] to describe
the current situation.

2.5. Expert Panel Meetings

Over the course of five expert panel meetings, the national and international
evidence was compared and contrasted, and the recommendations were finalized.

3. Results

The following recommendations are the result of the collaboration of the multi-
stakeholder workgroup and constitute the actions identified in our proposed
roadmap.
1. E-prescribing standards should be comprehensive and suitable for all
physicians and pharmacists, and they should provide the information
needed by insurance companies.
2. The standards should be compatible with the other MOH Health
Information Interchange (HII) standards.
3. Information security and confidentiality should be guaranteed.
4. Backward compatibility of the standards should be ensured.
5. E-prescribing implementations should support the national formulary.
National formulary data should be available via a web service.
6. Basic e-prescribing functionality should be implemented, including:
a. Creating a new prescription
b. Canceling a prescription

3
epSOS is an eHealth (electronic Health) interoperability project funded by the European Commission.
It aims at improving the medical treatment of citizens while abroad by providing Healthcare Professionals (HCP)
with the necessary electronic and safe patient data. This initiative broke new ground and generated a lot of
interest in Europe: "When the project was initiated in 2008 it involved a few stakeholders, but it gradually
grew to encompass 25 countries and about 50 beneficiaries", project coordinator Fredrik Lindén (Sweden)
and his team write in their letter
(http://epsos.eu/fileadmin/content/pdf/deliverables/epSOS_letter_to_contributors_1July2014.pdf).
"The epSOS project achieved considerable results in a range of areas. Main technical deliverables
include development of a solid basis for the eprescription and patient summary services, considering:
governance, use cases, data content, semantics, specifications, architecture, testing mechanisms, etc.".
Website: http://www.epsos.eu/

c. Refilling a prescription
d. Revising a prescription according to pharmacist consultation
e. Making the patient's medication history accessible to the
prescriber
f. Supporting prior authorization (prior authorization is done by
pharmacies in Iran)
g. Providing medication delivery feedback to the physician
7. To guarantee the security standards, the following infrastructures are
mandatory: a secure health information interchange network, digital
signature, and a PKI service.
8. Support for the health ID card should be considered.
9. Prescription delivery should be possible from a pharmacy that is not
connected to the e-prescribing network (e.g., by health card or printed
prescription).
10. The clinical workflow must remain supported during network instability.
11. E-prescribing should support the clinical workflow in offices and pharmacies.
12. The e-prescribing implementation should support claims data.
13. Information processes and data analyses should be planned from the first
step.
14. A mapping between the different coding standards should be provided.
15. Medication availability in the country or in the selected pharmacy should be
checked while prescribing.
16. Clear regulations should be established for the delivery of alternative
medications by pharmacists.
17. The standard format of the prescription should be observed.
18. Evidence shows that decision support systems reduce medical errors;
therefore, it is recommended that the e-prescribing system be equipped
with decision support systems. The following DSS features are
recommended:
a. Access to clinical guidelines
b. Drug allergy notification
c. Drug dose calculation
d. Order set recommendation
e. Feedback based on national average drug use
f. Suggestions for cheaper alternative drugs
19. The knowledge base used in the decision support system should be
supervised and guaranteed to be kept up to date.
20. Patient, provider, and pharmacist identification standards should be
implemented across the country.
21. E-prescribing should support the care of non-citizens; in this case, the
passport number can be used for patient identification.
22. Identification codes for offices, pharmacies, hospitals, clinics (health
care centers), and insurance plans should be provided.
23. A licensing process for e-prescribing solutions should be
implemented.
24. Incentives should be considered for pharmacies and physicians that use
e-prescribing.
25. E-prescribing should be integrated with EHR systems.
26. A free text field should be available for special cases.

27. Patient preferences, including the order language, should be considered.

28. An alternative plan should be available in case of crisis.

Focus areas and recommendations:

E-prescribing Standard Focus: 1. Comprehensive standard; 2. HII compatibility; 3. Security & confidentiality; 4. Backward compatibility; 5. National formulary
Functionality Focus: 6a. New prescription; 6b. Cancel prescription; 6c. Refilling prescription
Prescription Revising Focus: 6d. Pharmacist consultation; 6e. Patient history; 6f. Prior authorization; 6g. Delivery feedback
Infrastructure Focus: 7. Secure HII network; 7. Digital signature; 7. PKI service
Implementation Focus: 8. Support health ID card; 9. Prescription delivery; 10. Clinical workflow; 11. Clinical workflow; 12. Claims data; 13. Data analyses
Observation Focus: 14. Code mapping; 15. Medication availability; 16. Clear regulations; 17. Standard format
DSS Focus: 18. DSSs for e-prescribing; 19. Supervised knowledge
Identification Focus: 20. Identification standards; 20. Administrative identification; 21. Non-citizen patients; 22. Identification codes
Others: 23. Process of license issue; 24. Incentive considerations; 25. Integrated with EHR; 26. Free text field; 27. Patient preference; 28. Alternative plan
Figure 2. The proposed road map chart

According to the 7th step of the method (see Figure 1), these actions are
summarized and represented in the form of a roadmap chart. Our results are categorized
into 7 different focus areas. The transitions between the actions reveal a time-based
dependency or priority among some of the actions. Figure 2 shows the proposed roadmap chart.

4. Discussion

E-prescribing has been implemented in developed countries such as Sweden since the
1980s [17]. Over the years, the reasons for the success and failure of e-prescribing
have been investigated. Although systematic reviews and meta-analyses have shown that
e-prescribing implementation can reduce medical errors and save costs, the context of
the national health model influences the development and adoption of electronic
prescribing.
Although most of the evidence is transferable and we should learn from the
experiences of other countries, the recommendations of one country should not
be applied in another country without customization.
Few national-level roadmaps for the digitalization of health care have been published [18].
Most of them are based on expert meetings alone, whereas we collected evidence through
scoping and systematic reviews, interviews with a semi-structured questionnaire, and a
Health Technology Assessment study to support the expert panel. This method led us to some
country-specific recommendations. Because of the noticeable number of tourists and
immigrants in Iran who do not have a health ID, we recommended using the passport
number for the identification process. We noticed that herbal and traditional medicines
are important in Iran, so we recommended that e-prescribing systems support them and
that a free text field be available for special cases. Catastrophic disasters have occurred in Iran;
therefore, we recommended having an alternative plan for crises. People who speak different
languages live in Iran; therefore, we recommended that e-prescribing systems support
multilingual drug orders.
We have published our method and results in the hope that our experience will be useful
for other countries. We also hope to receive feedback from scholars to update the
recommendations. We plan to publish supplementary studies and explanations of the
recommendation items as soon as possible.

References

[1] Nguyen MR, Mosel C, Grzeskowiak LE. Interventions to reduce medication errors in neonatal care: a
systematic review. Therapeutic advances in drug safety. 2018;9(2):123-55.
[2] Deetjen U, European E-Prescriptions : Benefits and Success Factors 2016,
https://www.politics.ox.ac.uk/materials/publications/15224/workingpaperno5ulrikedeetjen.pdf, last
access: 20.3.2019.
[3] Page N, Baysari MT, Westbrook JI. A systematic review of the effectiveness of interruptive medication
prescribing alerts in hospital CPOE systems to change prescriber behavior and improve patient safety.
International journal of medical informatics. 2017;105:22-30.
[4] Stojkovic T, Marinkovic V, Manser T. Using Prospective Risk Analysis Tools to Improve Safety in
Pharmacy Settings: A Systematic Review and Critical Appraisal. Journal of patient safety. 2017.
[5] Hermanowski, T. R., Kowalczyk, M., Szafraniec-Burylo, S. I., Krancberg, A. N., & Pashos, C. L. (2013).
Current status and evidence of effects of e-prescribing implementation in United Kingdom, Italy,
Germany, Denmark, Poland and United States. Value in Health, 16(7), A462–A463.
[6] Camarinha-Matos, L. M., Afsarmanesh, H., Ferrada, F., Oliveira, A. I., & Rosas, J. (2013). A
comprehensive research roadmap for ICT and ageing. Studies in Informatics and Control, 22(2), 233–
254. http://doi.org/10.24846/v22i3y201301
[7] Afsarmanesh, H., Camarinha-Matos, L. M., & Msanjila, S. S. (2009). A well-conceived vision for
extending professional life of seniors. IFIP Advances in Information and Communication Technology,
307, 682–694. http://doi.org/10.1007/978-3-642-04568-4_70
[8] Camarinha-Matos, L. M., & Afsarmanesh, H. (2012). Collaborative networks in active ageing–a roadmap
contribution to demographic sustainability. Production Planning & Control, 23(4), 279–298.

[9] Arksey, H., & O’Malley, L. (2005). Scoping studies: towards a methodological framework. International
Journal of Social Research Methodology, 8(1), 19–32. http://doi.org/10.1080/1364557032000119616
[10] Moher, D., Shamseer, L., Clarke, M., Ghersi, D., Liberati, A., Petticrew, M., … Shekelle, P. (2015).
Preferred reporting items for systematic review and meta-analysis protocols (PRISMA-P) 2015
statement. Systematic Reviews, 4(1), 1. http://doi.org/10.1186/2046-4053-4-1
[11] Lampe, K., Mäkelä, M., Garrido, M. V., Anttila, H., Autti-Rämö, I., Hicks, N. J., … Kärki, P. (2009).
The HTA core model: a novel method for producing and reporting health technology assessments.
International Journal of Technology Assessment in Health Care, 25(S2), 9–20.
[12] Eslami, S., Dehghan, H., Namayandeh, M., Dehghani, A., Dashtaki, S. H., Gholampour, V., …
Ghasemian, S. (2018). Applied Criteria of Hospital Information Systems in Organizational Evaluation:
A Systematic Review Protocol. Internal Medicine and Medical Investigation Journal, 3(2), 52–56.
[13] Dehghan, H. R., Eslami, S., Namayandeh, M., Dehghani, A., Dashtaki, S. H., Gholampour, V., …
Barzegar, A. (2018). Criteria for Ethical Evaluation of Hospital Information Systems: A Protocol for
Systematic Review. Internal Medicine and Medical Investigation Journal, 3(4).
[14] National Committee on Vital and Health Statistics. Recommendations from Past Reports:
E-Prescribing Standards (2005). Retrieved from
http://endingthedocumentgame.gov/PDFs/ePrescribing.pdf
[15] Samadbeik, M., Ahmadi, M., Sadoughi, F., & Garavand, A. (2017). A comparative review of electronic
prescription systems: Lessons learned from developed countries. Journal of Research in Pharmacy
Practice, 6(1), 3. http://doi.org/10.4103/2279-042X.200993
[16] Ahmadi, M., Samadbeik, M., & Sadoughi, F. (2014). Modeling of the Outpatient Prescribing Process in
Iran: A Gateway toward Electronic Prescribing System. Iranian Journal of Pharmaceutical Research,
12(2), 725–738. Retrieved from
http://ijpr.sbmu.ac.ir/index.php/daru/article/view/?_action=articleInfo&article=1500
[17] Deetjen, U. (2016b). European e-prescriptions: benefits and success factors. Working Paper No. 5,
Working Paper Series of the Cyber Studies Programme, Department of International Relations,
University of Oxford.
[18] WHO. (2018). Towards a Roadmap for the Digitalization of National Health Systems in Europe, (June),
1–44. Retrieved from http://www.euro.who.int/__data/assets/pdf_file/0008/380897/DoHS-meeting-
report-eng.pdf?ua=1
128 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-128

Design and Evaluation of a Smart Medication Recommendation System for
the Electronic Prescription

Seyed Hadi GHASEMIa, Kobra ETMINANIa,1, Hamidreza DEHGHANa, Saeid
ESLAMIa, Mohammad Reza HASIBIANa, Hasan VAKILI ARKIa, Mohammad Reza
SABERIa, Mahdi AGHABAGHERIb, Seyedeh Mahdieh NAMAYANDEHb
a Department of Medical Informatics, Faculty of Medicine, Mashhad University of
Medical Sciences, Mashhad, Iran
b Shahid Sadoughi University of Medical Sciences, Yazd, Iran

Abstract. Background: Electronic prescription has been shown to have many benefits
in terms of reducing medication errors and improving patient safety, productivity, and
resource management, but it may introduce new errors and physician frustration if not
designed and implemented properly. Improving usability through user-centered design
is essential for physicians' adoption. Objectives: To enhance the efficiency of the e-
prescribing system by reducing the risk of inappropriate medication selection and to
reduce the prescribing time and effort needed to reach the desired drug. Methods:
Important data fields for predicting medications were determined through interviews
with pharmacists. Among those, the fields available in a claims dataset of
16 million prescriptions were extracted and used to develop a neural network
model for a recommender system that displays the most probable
medications at the top of the drop-down list in the e-prescription application. Results:
Offline and field evaluations both showed that this model can improve
performance. Conclusion: Smart recommender systems can improve e-prescription
usability and safety and enhance physicians' adoption.

Keywords. electronic prescribing, recommender system, usability

1. Introduction

Medical errors are the third leading cause of death in the United States [1], and about 20%
of these errors are medication errors [2, 3]. Electronic prescribing (e-
prescribing) has been offered as a solution to this problem and has been shown to have
many benefits [4-7]. On the other hand, if electronic prescribing is not implemented
properly, it can introduce new errors (e-iatrogenesis) [8-11].
Usability and User-Centered Design (UCD) [12] are critical elements in the design
and development of electronic prescription and electronic health record (EHR) systems
in general; they can enhance patient safety, encourage physicians' adoption, and
reduce dissatisfaction with these systems [13].

1
Corresponding Author: Kobra Etminani, Department of Medical Informatics, Faculty of Medicine,
Mashhad University of Medical Sciences, Mashhad, Iran, E-Mail: [email protected]
S.H. Ghasemi et al. / Design and Evaluation of a Smart Medication Recommendation System 129

Poor usability not only results in an increased level of clinician frustration but
can also lead to errors, posing serious threats to patient safety [14-16].
This study aims to enhance the efficiency of the e-prescribing system by
reducing the risk of inappropriate medication selection and by reducing
physicians' prescribing time. We propose a model that recommends the most
probable medications at the top of the drop-down list in the e-prescription
application, and we show that it can improve performance.

2. Methods

The main idea was based on the observation that pharmacists can read physicians'
handwritten prescriptions from just a few legible letters. They obviously use
complementary information, such as the physician's specialty, to narrow down the
search space of all possible alternatives for the intended medication, eventually
reaching a very probable candidate and confirming it with other clues on the prescription.
We decided to use this approach to design a smart system that "guesses" what might be in
the physician's mind when he/she has entered only a few characters from the beginning
of a drug's name, so that the application can display the search results ranked by
their probability instead of sorting them alphabetically.
In the first phase of the study, in order to extract the pharmacists' tacit knowledge
and find out which information fields they use to reach a conclusion about a specific
drug, we conducted semi-structured in-depth interviews with pharmacists. They were
provided an initial list of eight fields and asked to describe how they think, and to
discuss the information fields they may use, when facing an illegible prescription.
These fields had been identified in two brainstorming sessions and included "physician
specialty", "frequency of drug use in general", "frequency of drug use among physicians
with the same specialty", "other drugs in the prescription", etc. The pharmacists were
allowed to modify the list, add new fields, and/or remove useless fields. The
viewpoints of the interviewees were noted, and all sessions were audio-recorded
with the interviewees' permission. The recordings were rechecked by the interviewers
to ensure nothing was missing from the notes. Interviews continued until information
saturation, i.e., until no new fields were added to the list in two consecutive interviews.
A total of sixteen pharmacists were interviewed.
A checklist containing all information fields found in the interviews was
prepared and sent to twenty pharmacists via email, asking them to rank the fields by their
relative importance in reading illegible prescriptions. The fields were then sorted by their
average rank, producing the final list for the next phase.
Some of the identified fields were not available (for example, the "diagnosis" field
is not recorded in claims data), and some are not applicable in the context of
electronic prescribing, such as the quantity of the ordered drug, which is not
available to the system before the drug itself is known.
We used a drug claims database of over 16 million prescriptions containing 46
million drug items, prescribed over two consecutive years in a large province of Iran, to
build and train a model for predicting drug names in the context of an electronic
prescription system.
To build the model, we used the "lift" concept, which is commonly used in the data-
mining approach of association rule mining. Lift is defined as the ratio of two
probabilities: the probability that an event occurs under a specific condition
to the probability that the event occurs in general. We used this value to rank the
drugs matching the user's input and to sort the search results based on those ranks.
For example, when a drug name starts with the letters "Ac", it can be "Acetaminophen",
"Acetazolamide", or some other drug. In our database, there were 471,000 prescriptions
(out of 16 million) containing "ACETAMINOPHEN 325MG TAB", so the
probability of this drug being prescribed in general is 471,000/16,000,000, which equals
0.029. But when another piece of information is available, the probability may change. If
we know that the physician is an ophthalmologist, the database shows about
186,000 prescriptions from ophthalmologists, among which "ACETAMINOPHEN
325MG TAB" is prescribed 700 times, so the probability changes to 700/186,000, which
equals about 0.0038. In this example, knowing the specialty of the physician changed the
probability to about 1/7 of its previous value. On the other hand, the same calculation
for "ACETAZOLAMIDE 250MG TAB" changes its general probability of 0.00097 to
0.03614, yielding a lift of 37.25. This means that ophthalmologists prescribe this drug
about 37 times more often than physicians in general.
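The lift computation above can be sketched in a few lines (an illustrative sketch; the function and variable names are ours, and the counts are taken from, or chosen to reproduce, the probabilities in the example):

```python
def lift(cond_drug_count, cond_total, drug_count, total):
    """Ratio of the conditional prescribing probability of a drug
    (e.g., among ophthalmologists) to its general probability."""
    p_conditional = cond_drug_count / cond_total
    p_general = drug_count / total
    return p_conditional / p_general

# Acetaminophen 325 mg: common overall, but rarer among ophthalmologists.
lift_acetaminophen = lift(700, 186_000, 471_000, 16_000_000)   # about 0.13, i.e. roughly 1/7

# Acetazolamide 250 mg: counts chosen to reproduce the probabilities in the
# text (0.03614 conditional vs. 0.00097 general); the exact counts are ours.
lift_acetazolamide = lift(6_722, 186_000, 15_520, 16_000_000)  # about 37.3
```

A lift above 1 moves a drug up the ranked list for that context; a lift below 1 moves it down.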

In the same way, other pieces of available information shift the probability
upward or downward. With the available data we could access 7 information fields:
- Doctor profile
- Patient
- Specialty
- Previous drug (i.e., other drugs in the same prescription)
- Previous drug in the same specialty as the doctor
- Drug simple name (i.e., all dosage forms and strengths of the drug)
- Drug simple name in the same specialty as the doctor

To combine these effects in a weighted manner, we constructed a "simulations" table
by selecting random prescriptions from the database, simulating their prescribing,
finding all matching drugs (those beginning with the first N letters of the drug name),
and calculating the lifts of those 7 fields for all matches. The "class label" field
was set to 1 for the actually prescribed drug and to 0 for the other matching drugs.
We tested 45 different combinations of those fields and the number of matching
characters, N. In this way, the simulations table was filled with 24 million simulated
rows for the different configurations.
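Building one simulated row per matching drug could look like the following (a hedged sketch; the helper names and the toy formulary are ours, and `lifts_for` stands in for the real lift computation over the claims data):

```python
def simulate_rows(prescribed, formulary, n_letters, lifts_for):
    """Build one simulations-table row per candidate drug that matches the
    first n_letters of the actually prescribed drug: each row holds the lift
    features and a class label (1 for the prescribed drug, 0 otherwise)."""
    prefix = prescribed[:n_letters]
    rows = []
    for drug in formulary:
        if drug.startswith(prefix):
            rows.append({
                "drug": drug,
                "lifts": lifts_for(drug),  # 8 features: general probability + 7 field lifts
                "label": 1 if drug == prescribed else 0,
            })
    return rows

# Tiny toy formulary for illustration:
formulary = ["ACETAMINOPHEN 325MG TAB", "ACETAZOLAMIDE 250MG TAB", "ASPIRIN 80MG TAB"]
rows = simulate_rows("ACETAMINOPHEN 325MG TAB", formulary, 2, lambda d: [1.0] * 8)
# Two drugs match the prefix "AC"; exactly one of the two rows gets label 1.
```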
We then used the Brain.js open-source neural network library
(https://github.com/BrainJS) to set up and train a 5-layer neural network with the back-
propagation learning algorithm.
The neural network was fed with 45 sets of 10,000 randomly selected cases each
(with an almost equal distribution of rows with class labels 0 and 1; 30% for training and
70% for testing).
The trained network was exported as a JavaScript function which takes an array of 8
lifts (the 1st field was the drug's general probability and the rest were the lifts for the
7 information fields) and returns a number in the (0,1) range denoting the probability
of the drug being prescribed given those lifts.
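The shape of the exported scoring function can be sketched as a plain feed-forward pass (a minimal illustration, not the actual Brain.js export; the layer sizes and placeholder weights below are our assumptions):

```python
import math

def score(lifts, weights, biases):
    """Feed an 8-element lift vector through fully connected sigmoid layers
    and return a single value in (0, 1), mirroring the interface of the
    exported network. The weights here are placeholders, not trained values."""
    activation = lifts
    for layer_w, layer_b in zip(weights, biases):
        activation = [
            1.0 / (1.0 + math.exp(-(sum(a * w for a, w in zip(activation, neuron_w)) + b)))
            for neuron_w, b in zip(layer_w, layer_b)
        ]
    return activation[0]

# One hidden layer of 3 neurons and one output neuron, with arbitrary weights:
weights = [[[0.1] * 8, [0.2] * 8, [0.3] * 8], [[0.5, 0.5, 0.5]]]
biases = [[0.0, 0.0, 0.0], [0.0]]
probability = score([1.0] * 8, weights, biases)  # a value strictly between 0 and 1
```

In the deployed system, this per-drug score replaces alphabetical position as the sort key of the drop-down list.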

2.1. Offline Evaluation (Lab study)

2.1.1. Matching-drug level


After training the neural network, the model output was calculated for all rows in the
simulations table. To assess the model's performance, we applied ROC curve analysis
to the simulations table and set appropriate cut-off points for each configuration group.
This evaluation measures the performance of the model at its lowest level:
assigning the correct label to each drug in the search result set.

2.1.2. Selected-Drug level


We evaluated the model's performance at the "selected-drug" level by calculating the rank of
the actually prescribed drug in each set of prescription items in the simulations table
under three sorting methods: sorting by drug name alphabetically, sorting by the
drug's general frequency, and sorting by the model output. We then compared the
percentage of cases in which the actual drug was in the first rank, top 3, top 5, or
top 10 under these three sorting methods.
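The top-N comparison can be sketched as follows (the function name and toy data are ours; each case pairs a ranked candidate list with the drug actually prescribed):

```python
def topn_hit_rates(cases, ns=(1, 3, 5, 10)):
    """For each case (a ranked candidate list plus the drug actually
    prescribed), check whether the prescribed drug is among the top N
    suggestions, and return the hit rate for each N."""
    return {n: sum(1 for ranked, desired in cases if desired in ranked[:n]) / len(cases)
            for n in ns}

# Toy example with two simulated cases per sorting method:
alpha_sorted = [(["ACETAZOLAMIDE", "ACETAMINOPHEN"], "ACETAMINOPHEN"),
                (["ASCORBIC ACID", "ASPIRIN"], "ASPIRIN")]
model_sorted = [(["ACETAMINOPHEN", "ACETAZOLAMIDE"], "ACETAMINOPHEN"),
                (["ASPIRIN", "ASCORBIC ACID"], "ASPIRIN")]

alpha_rates = topn_hit_rates(alpha_sorted)  # desired drug never first alphabetically
model_rates = topn_hit_rates(model_sorted)  # model ranks the desired drug first
```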

2.2. User Evaluation (Prescription level)

For the real-user evaluation (prescription level), we implemented the recommender system
in an existing laboratory electronic prescription application. The recommender system
was based on the model exported from the neural network.
We then asked 26 physicians to write prescriptions for 10 patient scenarios
with common conditions (high blood pressure, migraine, sinusitis, etc.) in a cross-
over design. Participants were randomly assigned to two groups. In the first round, one
group worked with the unmodified software (alphabetical sort) and the other worked with
the software enhanced with model-based sorting of the search results. In the next round,
the software versions were swapped between the groups.
In this step, we had to select one of the 45 configurations. Since the patients were
not real, no patient profiles were available. The physicians who participated
in our study were also not among those whose claims data we had, so physician profiles were
not available either. For the number of typed characters, we chose 3. The neural
network model used to enhance the search results in this evaluation step was therefore based on
the physician's specialty and the co-occurrence of drugs in the same prescription, assuming that
the user has entered the first three characters of the drug's name (model code: Spec-Drug-3).
All physicians' activities in the software were logged and analyzed. The time to find the
desired drug, the number of characters typed to reach it, and the position of
the selected drug in the list were extracted from the log for analysis. Multiple regression
analysis was used to test the efficacy of the model while controlling for intervention order.

3. Results

The first phase resulted in an ordered list of 22 information fields. The most important
fields were the drug's form, asking the patient about his/her medication history, considering
other legible items in the prescription, the physician's specialty, the drug's dosage,

Table 1. Evaluation steps for the model. At the matching-drug level, the result of the model for each search
result, and its concordance with the drug that was actually prescribed, is the basis of the performance
measurement. At the selected-drug level, the rank of the desired drug within the result set determines the
performance. At the prescription level, measures of overall performance are recorded and compared between groups.

Offline evaluation (lab study), matching-drug level:
1. Select random prescriptions from the claims database.
2. For each drug item ("desired drug") in each selected prescription:
   2.1. Extract the first N letters of the drug's name => "user query"
   2.2. Find all drug names starting with the "user query" => "matched drugs"
   2.3. For each "matched drug":
      2.3.1. Calculate the lifts in the actual prescription context.
      2.3.2. Feed the lifts into the model and calculate the result => "model result"
      2.3.3. Assign a class label: if "model result" >= threshold, class = "Yes"; otherwise class = "No".
      2.3.4. Assign a "classification result": for class label "Yes", a True Positive (TP) if
             "matched drug" = "desired drug" and a False Positive (FP) otherwise; for class label "No",
             a False Negative (FN) if "matched drug" = "desired drug" and a True Negative (TN) otherwise.
3. Count the numbers of TPs, TNs, FPs, and FNs.
4. Calculate the model performance measures: Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP).

Offline evaluation (lab study), selected-drug level:
5. For each drug item ("desired drug") in each selected prescription:
   5.1. Sort the "matched drugs" by drug name alphabetically.
   5.2. Assign the "Alphabet-Top-N" label ("Yes" or "No") by checking whether the "desired drug" is
        among the top N drugs of the list (N = 1, 3, 5, 10).
   5.3. Sort the "matched drugs" by general frequency in descending order.
   5.4. Assign the "Frequency-Top-N" label ("Yes" or "No") in the same way.
   5.5. Sort the "matched drugs" by "model result" in descending order.
   5.6. Assign the "Model-Top-N" label ("Yes" or "No") in the same way.
6. Count the numbers of "Yes"s and "No"s for each sorting method, for each level of N.
7. Compare the differences between each pair of sorting methods, for each level of N, using
   statistical methods such as the Chi-square test.

User experience, prescription level:
8. Implement the model in a laboratory e-prescribing application and define test scenarios.
9. Ask physicians to prescribe drugs for the test cases and record all user activities in the
   application. Set the sorting method to "alphabet" or "model" in a cross-over design.
10. Measure the "time to find the desired drug", the "number of entered characters" to reach the
    desired drug, and "the position of the selected drug".
11. Compare the differences in these measures between the two sorting methods, using statistical
    methods such as multiple regression analysis.

asking the patient about his/her symptoms, usage instructions, and patient’s age,
respectively.
These fields were categorized into five groups: fields related to (1) the patient's profile,
(2) the physician's profile, (3) the physician's specialty, (4) the medication's properties, and
(5) the other medications in the prescription.
At the "matching-drug" level, 45 ROC curve analyses were performed. Figure 1 shows a
sample of 3 ROC curves for the configurations in which the previous drug and the
physician's profile are used and the user has entered 2, 3, or 4 letters of the drug name.
In this sample, choosing a cut-off point of 0.77 for the first curve (2 letters) results
in a sensitivity of 0.931 and a specificity of 0.921. Table 2 shows these performance measures.
At the "selected-drug" level, as shown in Figure 2, with alphabetical sorting the desired
drug is at the top of the list in only 12% of cases, but when sorting by the results of
the model based on the physician's profile, the first suggested drug is the desired one in
about two-thirds of cases. Sorting by general frequency gives better results than
alphabetical sorting, but its performance is lower than the model's. The same differences
are seen when comparing whether the desired drug is in the top 3, top 5, or top 10 suggestions.
S.H. Ghasemi et al. / Design and Evaluation of a Smart Medication Recommendation System 133

Figure 1. ROC Curves for model accuracy, when the physician's profile and previous drug in the prescription
are known, and the user has entered 2, 3 or 4 letters of the drug name. Note that for better illustration, the
horizontal axis range is changed to (0,0.3).

Table 2. Model performance measures in “matching drugs” level, for different combinations of known fields
and the number of entered characters. Minimum and maximum values for each measure are in bold.
SEN: Sensitivity, SPC: Specificity, spec: doctor’s specialty, prevdrug: previous drug in the same prescription.
2 Letters 3 Letters 4 Letters
Known Fields SEN SPC SEN SPC SEN SPC
patient 0.999 0.990 0.999 0.985 0.998 0.986
doctor 0.863 0.945 0.919 0.900 0.930 0.874
prevdrug 0.933 0.865 0.902 0.870 0.811 0.895
Spec 0.902 0.876 0.860 0.853 0.903 0.821
doctor patient 0.995 0.992 0.997 0.988 0.996 0.982
prevdrug patient 0.997 0.992 0.998 0.980 0.998 0.987
prevdrug doctor 0.925 0.942 0.919 0.919 0.917 0.902
spec doctor 0.931 0.921 0.942 0.881 0.950 0.863
spec patient 0.996 0.992 0.997 0.986 0.997 0.986
spec prevdrug 0.926 0.901 0.930 0.860 0.905 0.860
prevdrug doctor patient 0.993 0.993 0.997 0.985 0.996 0.982
spec doctor patient 0.996 0.991 0.996 0.987 0.996 0.987
spec prevdrug doctor 0.950 0.927 0.933 0.910 0.941 0.882
spec prevdrug patient 0.996 0.992 0.997 0.988 0.998 0.984
spec prevdrug doctor patient 0.997 0.991 0.996 0.987 0.993 0.987

Sorting Method    Alphabetical    Frequency    Model - Doctor
Top Result        12.677%         53.344%      65.496%
Top 3             33.739%         82.161%      91.738%
Top 5             45.265%         91.177%      96.972%
Top 10            65.484%         98.030%      99.436%

Figure 2. Percentage of cases in which the desired drug is the 1st-ranked, or among the top 3, top 5 or top 10
suggestions, comparing alphabetical sorting with sorting by frequency and sorting by model results.

Chi-square tests showed that the differences observed between each pair of sorting
methods, at all four of these rank levels and in all model configurations, were statistically
significant (p < 0.001, df=1).
In the last evaluation step (user experience, prescription level), the multiple regression
analysis showed that sorting by the model is significantly better than alphabetical sorting in
terms of less time to find the desired drug (p<0.001) and fewer entered characters (p<0.01).
The position of the selected drug did not differ significantly between the sorting methods
(p>0.05).
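The prescription-level comparison can be pictured as a regression of time-to-find on a sorting-method indicator: with a single binary predictor, the OLS slope equals the difference of the two group means. A minimal sketch with invented timings, not study data:

```python
# Sketch (invented data) of regressing "time to find the drug" on a
# sorting-method indicator: x = 0 for alphabetical, 1 for model-based.
def ols(x, y):
    """Closed-form simple OLS; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

times_alpha = [12.0, 10.5, 11.0, 13.5]   # seconds, alphabetical sort
times_model = [6.0, 7.5, 5.5, 7.0]       # seconds, model-based sort
x = [0] * len(times_alpha) + [1] * len(times_model)
intercept, slope = ols(x, times_alpha + times_model)
# intercept = mean of group 0; slope = mean difference between groups
print(round(intercept, 2), round(slope, 2))  # 11.75 -5.25
```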

4. Discussion

Recommender systems are widely used in commercial and e-commerce sites, and many
methods for implementing and evaluating these systems have been developed [17].
In this project, we used a collaborative filtering [18] method to enhance the usability of
an e-prescribing system.
In 2014, Syed-Abdul et al. proposed a smart model that recommends the most
commonly prescribed medications in the drop-down menu for a given disease. They used
the association between diagnoses and prescribed drugs to calculate the Mean Prescription
Rank (MPR) and Coverage Rate (CR) of prescriptions and developed a model to compute
a proactive medication list using these concepts. They showed that this system can shorten
the length of the medication drop-down menu in the electronic
prescription application and concluded that this could improve safety and save time.
They showed that “diagnosis” field can be used in developing recommender systems.
Our study showed that the patient’s profile, the physician’s profile, the physician’s
specialty and the other prescribed drugs can also be used, alone or in combination with
each other, to develop recommender systems for electronic prescribing.
Future research may combine these fields with the diagnosis and achieve better results.
Although we could show that a recommender system can improve usability by
reducing the time and effort needed to find the desired drug, its efficacy in enhancing
patient safety should be studied in future research in physicians’ routine practice.

References

[1] Makary, M.A. and M. Daniel, Medical error-the third leading cause of death in the US. BMJ, 2016. 353:
p. i2139.
[2] Bates, D.W., et al., Incidence of Adverse Drug Events and Potential Adverse Drug Events: Implications
for Prevention. JAMA, 1995. 274(1): p. 29-34.
[3] Tamblyn, R., et al., The medical office of the 21st century (MOXXI): effectiveness of computerized
decision-making support in reducing inappropriate prescribing in primary care. CMAJ: Canadian
Medical Association journal = journal de l'Association medicale canadienne, 2003. 169(6): p. 549-
556.
[4] Eslami, S., N.F. de Keizer, and A. Abu-Hanna, The impact of computerized physician medication order
entry in hospitalized patients--a systematic review. Int J Med Inform, 2008. 77(6): p. 365-76.
[5] Meisenberg, B.R., R.R. Wright, and C.J. Brady-Copertino, Reduction in chemotherapy order errors with
computerized physician order entry. J Oncol Pract, 2014. 10(1): p. e5-9.
[6] Porterfield, A., K. Engelbert, and A. Coustasse, Electronic prescribing: improving the efficiency and
accuracy of prescribing in the ambulatory care setting. Perspectives in Health Information
Management, 2014. 11(Spring).
[7] Reckmann, M.H., et al., Does computerized provider order entry reduce prescribing errors for hospital
inpatients? A systematic review. J Am Med Inform Assoc, 2009. 16(5): p. 613-23.
[8] Weiner, J.P., et al., “e-Iatrogenesis”: the most critical unintended consequence of CPOE and other HIT.
J Am Med Inform Assoc, 2007. 14(3): p. 387-388.
[9] Nanji, K.C., et al., Errors associated with outpatient computerized prescribing systems. J Am Med Inform
Assoc, 2011. 18(6): p. 767-73.
[10] Campbell, E.M., et al., Types of unintended consequences related to computerized provider order entry.
J Am Med Inform Assoc, 2006. 13(5): p. 547-56.
[11] Koppel, R., et al., Role of computerized physician order entry systems in facilitating medication errors.
Jama, 2005. 293(10): p. 1197-203.
[12] Ratwani, R.M., et al., Electronic health record usability: analysis of the user-centered design processes
of eleven electronic health record vendors. Journal of the American Medical Informatics
Association, 2015. 22(6): p. 1179-1182.
[13] Cohen, J.F., J.-M. Bancilhon, and M. Jones, South African physicians' acceptance of e-prescribing
technology: an empirical test of a modified UTAUT model. South African Computer Journal, 2013.
50(1): p. 43-54.
[14] Tamblyn, R., et al., The development and evaluation of an integrated electronic prescribing and drug
management system for primary care. Journal of the American Medical Informatics Association:
JAMIA, 2006. 13(2): p. 148-159.
[15] Johnson, K.B., et al., Showing Your Work: Impact of annotating electronic prescriptions with decision
support results. J Biomed Inform, 2010. 43(2): p. 321-5.
[16] Halamka, J., et al., E-Prescribing collaboration in Massachusetts: early experiences from regional
prescribing projects. Journal of the American Medical Informatics Association: JAMIA, 2006.
13(3): p. 239-244.
[17] Portugal, I., P. Alencar, and D. Cowan, The use of machine learning algorithms in recommender
systems: a systematic review. Expert Systems with Applications, 2018. 97: p. 205-227.
[18] Ekstrand, M.D., et al., Collaborative filtering recommender systems. Foundations and Trends in
Human-Computer Interaction, 2011. 4(2): p. 81-173.
136 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-136

Use of ICPC-2 – Current Status, Strengths
and Weaknesses of the System
Karin MESSER-MISAK a,1
a Fachhochschule Joanneum, Institute of eHealth, Graz, Austria

Abstract. Background: Classifications of primary care must be as interoperable as
possible with current international health terminology and classifications.
Objectives: The aim of the work was to point out the strengths and weaknesses of
the ICPC-2 coding and to work out recommendations for further dissemination from
the user's point of view. Methods: Selected studies on the experience with the use
of ICPC-2 in several countries were analyzed, a quantitative study on the prevalence
in Austria was carried out. On this basis, a qualitative study was then initiated, which
analyzes the strengths and weaknesses from the perspective of practice. Results:
Although there are recommendations and agreements from a political point of view,
the scope of application in Austria is limited. Conclusion: Due to the reorganization
of primary health care and other health economics requirements, unified
documentation, which is already common in the intramural field, will be essential.

Keywords. ICPC-2, documentation, primary health care, national health programs

1. Introduction

Conventions for the designation and ordering of the phenomena of a field of study exist
in all sciences in order to make them accessible, communicable and comparable for
systematic research [1]. Classification systems are helpful and sometimes indispensable
from a clinical, scientific, administrative and economic point of view. The documentation
effort is viewed critically by health service providers [2,4], and it is also feared that poor
coding leads to financial disadvantages [3].
Because the "countless symptoms and non-disease related conditions that occur in
primary care" [5] were inadequately classified by ICD-10, the World Organization of
National Colleges, Academies and Academic Associations of General Practitioners /
Family Physicians (WONCA) developed, issued and has continuously adapted a coding
system that specifically addresses the needs of general medical and primary care
documentation.

2. Methods

In advance, selected studies from Germany, Switzerland, the Netherlands, Norway and
Australia [6,7] were analyzed for the practical application of ICPC-2 coding. On this
basis, two studies were conducted in Austria during the period from February to

1
Corresponding Author: Karin Messer-Misak, FH Joanneum Gesellschaft mbH, Eckertstr. 30i, 8020
Graz, Austria, E-Mail: [email protected]
K. Messer-Misak / Use of ICPC-2 – Current Status, Strengths and Weaknesses of the System 137

September 2018. The quantitative survey (n=28) was carried out in cooperation with the
"Austrian Forum for Primary Care in the Health Care System" [6, pp. 44] in order to
ascertain which of the 28 institutions with primary-care character have ICPC-2 in
practical use, and to what extent it was implemented in the documentation software used.
The qualitative study, in the form of expert interviews [7, pp. 20] (n=4), had the goal of
developing a strengths-and-weaknesses analysis of the use of ICPC-2 coding in practice.

3. Results and next steps

The results of the quantitative survey [6, pp. 41-48] showed that only four of the
surveyed institutions in Austria have practical experience with documentation by means
of ICPC-2. The qualitative analysis revealed several strengths of the use of ICPC-2 [7],
such as the documentation of counseling events, episodes and counseling results at the
symptom level, the coding of specific treatment episodes, clarity due to the small number
of codes, and the possibility of documenting non-medical content (e.g., social issues).
However, there are also some significant weaknesses of the system [7, p. 68] that should
not be ignored: uncertainty about the correct application, a basic skepticism regarding the
benefits, no consistent specifications on how to code, no exact diagnostics, original
diagnostic texts that do not correspond to common usage, and unclear procedures.
In order to ensure an effective Austria-wide implementation, the following steps are
recommended: In terms of content, an extension and supplementation of the data will be
required [5, p. 20-21; 7, p. 68]. At the federal level, the organizational and legal
preparations for cross-sector coded diagnostic documentation in the entire outpatient area
are to be introduced by December 2021. And at the organizational level, specific training
is required, as well as a uniform guide on how to manage the integration into common
practice software [7, p. 68].
Due to the reorganization of primary care and other health economics requirements,
unified documentation, which is already common in the intramural field, will be
essential.

References

[1] H.-U. Wittchen, Klinische Psychologie & Psychotherapie (Lehrbuch mit Online-Materialien). 2., überarb.
und erw. Auflage. Springer, ISBN 978-3-642-13017-5, Heidelberg 2011, S. 28-53.
[2] E. Gollner, F. Schnabel, Strukturevaluation der medizinischen Dokumentation bei unterschiedlichen
Krankenhausträgern, Innovation durch Evaluation: Impulse setzen durch Evaluationsprozesse im Social-
Profit- und Public Health-Sektor, Forschungsforum der österr. Fachhochschulen. 2/2017, p. 102.
[3] S. Stark, S. Hölzer, Dokumentations- und Kodierprozesse im Spital: Herausforderungen und Massnahmen.
https://saez.ch/de/resource/jf/journal/file/view/article/saez/de/saez.2005.11404/2005-33-1410.pdf/ , CH
Ärztezeitung, 2005; Nr. 32/33, p. 86.
[4] K. Blum, U. Müller, Dokumentationsaufwand im Ärztlichen Dienst der Krankenhäuser.
Repräsentativerhebung des Deutschen Krankenhausinstituts. In: Das Krankenhaus 7/2003, p. 544-548.
[5] WONCA International Classification Committee (Hrsg.). Internationale Klassifizierung der medizinischen
Primärversorgung ICPC-2. Ein Codierungssystem der Allgemeinmedizin. ISBN 3211835504, Springer,
2001, p. 11-12.
[6] T. Kraußler, Analyse der aktuellen Umsetzung einer einheitlichen Diagnose- und Leistungserfassung
mittels ICPC-2 in Österreich. Masterarbeit. Österreich, Graz, 2018.
[7] K. Kahr, Kritische Betrachtung des Einsatzes der ICPC-2 Codierung in der Primärversorgung. Masterarbeit.
Österreich, Graz, 2018.
138 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-138

Exploratory Analysis of Motion Tracking
Data in the Rehabilitation Process of
Geriatric Trauma Patients

Amelie ALTENBUCHNER a,1, Sonja HAUG a and Karsten WEBER a
a Institut für Sozialforschung und Technikfolgenabschätzung (IST), Ostbayerische
Technische Hochschule Regensburg (OTH), Germany

Abstract. Background: This article is based on an ongoing long-term study, in
which customary motion trackers measure steps during rehabilitation of geriatric
trauma patients (Med=86 years). Objectives: Exploring steps after 28 days of
measurement. Finding similarities in the data by running cluster analysis and
formulating linear regression models to predict steps through time. Methods: Two
types of motion trackers (FitBit Alta HR and Garmin vívofit 3) have been used to
measure patients’ (N=24) steps after hip fracture in two study groups. Cluster
analysis detected three clusters for progress in number of steps that were tested for
group differences with ANOVA. Regression analysis tested models for individual
patients. Results: Three-cluster solutions showed significant differences for the
average amount of steps after 5, 14, 21 and 28 days. Regression models could predict
71 % of the individual patients’ progress in study group 2. Conclusion: The long-
term study will provide more data in the future to examine the three-cluster solution
and to find out in what stage of rehabilitation the measurement of the steps could be
used to predict individual rehabilitation.

Keywords. health monitoring, motion tracker, wearables, rehabilitation, hip fracture

1. Introduction

The concepts of socially assistive technology and self-tracking provide sensor-based
approaches to the issues of demographic change, considering the economic and
geographic situation many elderly people live in. This data-driven technology could
enable an autonomous life in the familiar environment of the elderly. Currently, however,
it is not usable by its intended users, and only a few devices are in everyday use. Most
devices are technology-driven rather than based on the needs and abilities of the intended
users [1].
In addition, it is very challenging to transfer study findings from a laboratory
environment into living rooms, especially in the geriatric cohort [2]. Once a device
reaches the market, users abandon many of the implemented applications within the first year of

1
Corresponding Author: Amelie Altenbuchner, Institut für Sozialforschung und
Technikfolgenabschätzung (IST), Ostbayerische Technische Hochschule Regensburg (OTH), Seybothstraße 2,
D-93053 Regensburg, Germany, E-Mail: [email protected]
A. Altenbuchner et al. / Exploratory Analysis of Motion Tracking Data 139

usage [3]. Exploring practical devices for measuring patients’ motion under everyday
conditions is, nevertheless, a current desideratum of gerontology [4].
Rather than developing a new technology system for geriatric patients in the
rehabilitation process after a hip fracture, this study intends to find out how to use an
existing customary device in a new context. This approach is called technology design
[5]. One study group measured gait with customary body-fixed sensors after hip fracture
in hospital, nine hours a day for one to two days and then again two weeks later. The
authors found a change in performance and suggest using the median for further analysis
when comparing groups, because of deviations [6]. For continuous step measurement, it
is useful to use a customary motion tracker similar to a wristwatch [2]. Patients are often
used to wearing their watch all the time and usually do not forget to put it back on.
Furthermore, the devices can be easily integrated into daily life. However, measuring
under everyday conditions makes it impossible to control for any confounding variables.
In this paper, we perform an exploratory data analysis (EDA) on the observed data
(steps taken). The aim is to detect clusters that show similarities in the individual
performance of patients and to design a significant linear regression model for each
patient with the aim of predicting steps through time. The analysis is part of an ongoing
long-term study that explores the possibilities of a conventional motion tracker during
rehabilitation.

2. Methods

2.1. Subjects

Subjects are patients on a geriatric trauma ward in a German hospital after surgery for
hip fracture. Patients with this condition or, where applicable, their legal guardians are
invited to join the study. Recruitment starts postoperatively by obtaining informed
consent, and data collection begins post-surgery. No medical intervention is part of the
study. The patients have a study ID, and the online registration of the motion tracker
contains a pseudonym.
There are two study groups in the prospective observational study, with two different
types of motion trackers. Study group 1 (sg1) includes all hip fracture patients who
agreed to participate (n=10) (31% of all patients) until discharge from hospital (Med=9
days), from 2017-11-06 to 2018-02-28. The first group was primarily examined to
explore conditions in the research field and to test whether patients agree to use a motion
tracker (FitBit Alta HR) at all [2]. In study group 2 (sg2), the same sampling procedure
has included 14 patients at this point and has been ongoing during the stay in
rehabilitation and in the domestic situation since 2018-06-13 (Med=45 days), with a
different motion tracker (Garmin vívofit 3).

2.2. Instruments

Sg1 uses the FitBit Alta HR motion tracker, which has a battery life of five days. Data
can be downloaded while charging the device by connecting it to a personal computer,
laptop or tablet, or, without recharging, via Bluetooth using a smartphone application.
Sg2 uses the Garmin vívofit 3 motion tracker, with a battery life of up to one year.
The validity of measurements from customary motion trackers and motion sensors is
not fully established for elderly patients [7], although some authors assume proven
validity of body-worn sensors [6] for elderly people living at home or in institutional care.

2.3. Analysis

In both study groups, the providers’ online tools allow data extraction into a .csv file.
After extraction, this file was imported into IBM SPSS Statistics 24.
Median tests compared sg1 and sg2 as sample groups. A k-means cluster analysis
identified similarities in each patient’s step data over time. The cluster groups were then
tested as factors in an ANOVA, to find out whether there are significant mean differences
between patients belonging to the clusters.
To detect individuals within the clusters for whom time is a significant predictor of
steps, a linear regression analysis was run for every patient.
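The pipeline of k-means clustering followed by an ANOVA on the resulting clusters can be sketched as follows; the step counts, the deterministic quantile initialization and the function names are illustrative assumptions, not the SPSS procedure used in the study:

```python
# Sketch of the analysis pipeline (invented step counts, not study data):
# 1-D k-means on average steps/day, then a one-way ANOVA F statistic
# testing for mean differences between the resulting clusters.
def kmeans_1d(values, k, iters=100):
    vals = sorted(values)
    # deterministic start: k evenly spaced quantiles of the data
    centers = [vals[i * (len(vals) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[min(range(k), key=lambda j: abs(v - centers[j]))].append(v)
        new = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]
        if new == centers:  # converged
            break
        centers = new
    return clusters, centers

def anova_f(groups):
    # One-way ANOVA: F = between-group mean square / within-group mean square
    n, k = sum(len(g) for g in groups), len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_b = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_w = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    return (ss_b / (k - 1)) / (ss_w / (n - k))

steps = [100, 110, 120, 500, 510, 900, 920, 930]  # steps/day, made up
clusters, centers = kmeans_1d(steps, k=3)
print(sorted(len(c) for c in clusters), anova_f(clusters) > 100)  # [2, 3, 3] True
```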

3. Results

3.1. Measurement Descriptive Statistics

Table 1 shows the descriptive statistics of the patients (N=24) in sg1 (n=10) and sg2 (n=14).
26% of the patients participating in the study are male. Patients are of geriatric age,
86 ±6.8 years on average. The median test did not show a significant difference between
the age medians of sg1 (Med=85.5) and sg2 (Med=86.0) [z=.120, p=n.s., n=24].

Table 1. Sample of sg1 and sg2.


Features                     Sg1 (FitBit Alta HR,   Sg2 (Garmin vívofit 3,   Total
                             ID 1-10)               ID 11-24)
Male 2 3 5
Female 8 11 19
Age M=85.9 ±7.7 M=86.4±6.4 M=86.2±6.8
Length of measurement in M=8.6±2.4 M=55.9±49.9 M=29.61±40.2
days
Length of hospital stay in M=15.8±3.6 M=18.1±3.6 M=16.9±4.3
days
Gap admission and M=5.3±2.7 M=12.3±1.6 M=8.0±4.3
measurement in days

The average length of measurement (M=29.6±40.2) shows a high dispersion because,
after sg1 (M=8.6±2.4), the change of instrument for sg2 (M=55.9±49.9) allows long-term
measurement of up to one year. Still, the standard deviation in sg2 is relatively high, as
the length of measurement is individual for every patient.
Due to field conditions as well as ethical and legal reasons, it is impossible to start
data collection right after surgery. The gap between admission and the beginning of step
measurement differs between sg1 (M=5.3±2.7; Med=5.5) and sg2 (M=12.3±1.6;
Med=9.5) [z=6.171, p=.013, n=24]. As sg2 is a long-term study, more time is needed to
inform and educate about the study itself and for consideration and informed consent.
Patients in sg2 stayed on average 2.3 days longer in hospital than patients in sg1, and
thus longer than the total mean (M=16.9±4.3). The median test showed that the
distributions of hospital days in sg1 (Med=16) and sg2 (Med=18) are similar [z=2.253,
p=n.s., n=23]. The mean difference in hospital days between sg1 and sg2 becomes
smaller when two outliers are dropped: one patient in sg1 deceased and another patient in
sg2 had to stay four weeks.
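The median tests reported here follow Mood's scheme: pool both groups, split at the grand median, and test the above/below counts with a 2×2 chi-square (df = 1). A sketch with invented gap values, not study data:

```python
# Sketch (invented values) of a median test comparing sg1 and sg2:
# pool both groups, split at the grand median, and run a 2x2
# chi-square on the above/below counts (df = 1, no correction).
from math import erfc, sqrt

def median_test(g1, g2):
    pooled = sorted(g1 + g2)
    n = len(pooled)
    med = (pooled[n // 2] if n % 2
           else (pooled[n // 2 - 1] + pooled[n // 2]) / 2)
    a = sum(v > med for v in g1)   # g1 above the grand median
    b = len(g1) - a
    c = sum(v > med for v in g2)   # g2 above the grand median
    d = len(g2) - c
    total = a + b + c + d
    stat = total * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d))
    return stat, erfc(sqrt(stat / 2))  # df=1 closed-form p-value

gap_sg1 = [3, 4, 5, 6, 5, 7]        # days, made up
gap_sg2 = [10, 12, 11, 13, 14, 12]  # days, made up
stat, p = median_test(gap_sg1, gap_sg2)
print(p < 0.05)  # True: the groups differ
```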

Table 3 shows the distribution of steps for the patients during individual lengths of
data collection.

3.2. Cluster Analysis

K-means cluster analysis suggests three clusters (Table 2) for the average steps per day
after five days of measurement (s5) (N=24) [F(2,21)=76.096, p≤.001].

Table 2. Cluster solutions (number of patients per cluster after … days of data collection).

Cluster    …5 days    …14 days    …21 days    …28 days
1          3 (b)      1 (a)       2 (a)       4 (b)
2          3 (a)      4 (b)       8 (c)       1 (a)
3          18 (c)     7 (c)       2 (b)       6 (c)
Total      24         12          12          11
(a) high progress, (b) middle progress, (c) low progress

Table 3. Patients’ steps.


ID    N (days)    Mean    SD    …5 days    …14 days    …21 days    …28 days
1 5 997 302 997
2 8 1182 495 1008
3 10 204 120 202
4 10 1864 676 1309
5 12 818 325 815
6 7 1361 969 721
7 10 242 124 245
8 5 67 32 67
9 11 218 130 105
10 8 560 269 511
11 59 1456 656 163
12 42 172 186 107 165 203 197
13 176 2931 2692 368 302 244 183
14 21 255 502 24 295 255
15 165 42 76 98 126 156 178
16 73 2399 1056 178 902 1371 1720
17 36 98 126 163 211 161 126
18 176 4827 3036 243 426 446 583
19 8 75 48 61
20 46 493 341 54 121 213 347
21 57 1491 1081 401 566 1044 1241
22 58 863 358 242 419 637 702
23 50 1287 896 394 514 718 791
24 34 123 144 45 96 76 81

The three-cluster solution explains 86.7% of the variation in s5 (R²adjust=.867). Games-
Howell’s post-hoc test states that cluster 3 (M=175.59 steps ±120.3) is significantly
different from cluster 1 (M=682.1 steps ±155.5) and cluster 2 (M=1104.5 steps ±176.8),
whereas clusters 1 and 2 show no significant difference from one another.

For the average steps per day after 14 days of measurement (s14) (N=12)
[F(2,9)=43.449, p≤.001], after 21 days (s21) (N=12) [F(2,9)=54.462, p≤.001] and after
28 days (s28) (N=12) [F(2,8)=34.360, p≤.001], only sg2 is considered in the analysis.
In s14, 88.5% of the variation is explained by the three-cluster solution
(R²adjust=.885). Cluster 1 (M=902.2 steps) contains only one person; cluster 2 (M=481.2
steps ±71.3) and cluster 3 (M=188 steps ±236.5) differ significantly from each other.
In s21, 90.7% of the variation is explained by the three-cluster solution
(R²adjust=.924). Games-Howell’s post-hoc test states that cluster 2 (M=219.3 steps
±107.8) and cluster 3 (M=677.6 steps ±57.7) are significantly different from each other,
while cluster 1 (M=1207.4 steps ±231.7) shows no significant difference from the other
clusters.
In s28, 89.6% of the variation is explained by the three-cluster solution
(R²adjust=.896). Cluster 2 (M=1720 steps; the same patient as in s14) contains only one
person; cluster 1 (M=829.1 steps ±287.4) and cluster 3 (M=185 steps ±90.3) differ
significantly from each other.

3.3. Linear Regression Analysis

Linear regression analysis shows that in sg1 a linear regression model that predicts steps
from the days of measurement is significant for 30% of all patients [N=3; ID=4, 6, 9].
For the other patients in sg1 this is not the case [N=7; ID=1, 2, 3, 5, 7, 8, 10]. In sg2
the same regression model is significant for 71% of all patients [N=10; ID=11, 13, 15,
16, 17, 18, 20, 21, 22, 23]. For 29% of all patients [N=4; ID=12, 14, 19, 24] the model
cannot predict the steps.
The following rows show the significant regression models for the patients.

ID 4:  F(1,10)=6.023, p=.034, R²adjust=.313, f=0.7;  y = 114.871 + .613×x
ID 6:  F(1,8)=9.309, p=.016, R²adjust=.48, f=0.9;  y = 234.709 + .733×x
ID 9:  F(1,8)=6.779, p=.031, R²adjust=.391, f=0.8;  y = 30.733 + .677×x
ID 11: F(1,17)=20.520, p≤.001, R²adjust=.520, f=1.04;  y = 87.022 + 19.21×x
ID 13: F(1,26)=29.509, p≤.001, R²adjust=.514, f=1.02;  y = -16.655 - .29×x
ID 15: F(1,26)=14.528, p=.001, R²adjust=.334, f=.7;  y = 6.686 + .599×x
ID 16: F(1,26)=63.713, p≤.001, R²adjust=.699, f=1.5;  y = 114.271 + .843×x
ID 17: F(1,26)=17.896, p≤.001, R²adjust=.385, f=.79;  y = -10.105 - .639×x
ID 18: F(1,26)=25.647, p≤.001, R²adjust=.477, f=.95;  y = 29.627 + .705×x
ID 20: F(1,26)=54.303, p≤.001, R²adjust=.664, f=1.4;  y = 33.493 + .822×x
ID 21: F(1,26)=6.864, p=.014, R²adjust=.178, f=.46;  y = 63.103 + .457×x
ID 22: F(1,26)=24.551, p≤.001, R²adjust=.466, f=.92;  y = 31.214 + .697×x
ID 23: F(1,26)=22.601, p≤.001, R²adjust=.444, f=.89;  y = 37.578 + .682×x
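The effect sizes f reported alongside each model can be recovered from the adjusted R² via Cohen's formula f = sqrt(R² / (1 − R²)); a quick check against two of the values above:

```python
# Cohen's effect size f recovered from (adjusted) R-squared,
# f = sqrt(R2 / (1 - R2)); checked against two models reported above.
from math import sqrt

def cohens_f(r2):
    return sqrt(r2 / (1 - r2))

print(round(cohens_f(0.313), 1))  # ID 4:  0.7
print(round(cohens_f(0.699), 1))  # ID 16: 1.5
```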

4. Discussion

The main goal of this article is to explore the data a motion tracker can provide for a
sample of geriatric trauma patients – a target group for whom no general statistical
record of the average number of steps exists so far. Consequently, the first aim was
to detect similarities and patterns in the number of steps individuals take. The second
aim was to find out whether it is possible to predict steps through time by formulating
linear regression models.
Overall, the results present the development of mobility after hip fracture surgery in
24 individuals. At the beginning of data collection, 75% of patients fit into the same
cluster. Through the rehabilitation process the distribution becomes wider, while the
largest number of patients remain together in one cluster continuously – the one with
low progress compared to the other clusters. This, and the fact that in s14 and s21 only
one patient defines a cluster of his or her own, may speak for a two-cluster solution. Still,
the difference between individuals is high, and 50% of patients in sg2 are continuing in
the long-term study. The cluster analysis therefore needs to be repeated in the future with
the available long-term data. The goal is to find out whether a three-cluster model fits the
data and whether this result could form a scientific hypothesis for future research on
old-age rehabilitation.
Information about the number of steps before the incident that led to the hip fracture
is unfortunately not available. Consequently, not the actual number of steps but the rate
of decline and incline could be included in further analysis in the future.
During data collection, only a few interruptions occurred. This reflects the
acceptance and usability of the motion trackers, as they are similar to a wristwatch. Still,
a great challenge is cancellation, and therefore the spread in length of measurement, due
to health conditions or other events in the patients’ or caregivers’ lives. This is why, for
29% of patients in sg2, the regression model was not significant. Who are these patients?
ID 12 deceased during the study; caregivers cancelled participation for ID 14, a patient
with a dementia diagnosis; the patient with ID 19 cancelled the study after eight days;
and the patient with ID 24 had just started the ongoing study. All other patients show
middle to high effect sizes in the regression models. For rehabilitants and their caregivers
it could be interesting to monitor a trend in physical motion. A suggestion for further
research is to focus on the statistical connection between steps and geriatric assessment
scores and to detect predictors of the quality of rehabilitation. In the ongoing project, our
research will focus on validity aspects for the target group. We plan to estimate the
validity for walking with a rollator or a walking stick.
Additionally, this leads to another important issue for patients: the effect of motion
feedback on motion. We evaluate such feedback during the ongoing study.
The feedback issue implies the imperative of data literacy [8]: On the one hand,
passive data measured by an objective wearable provides an insight into daily activities
free of subjectivity. On the other hand, the visualisation of numbers representing activity
is a skewed reflection that might be hard to interpret. The WHO recommendations
suggest almost the same level of physical activity for people over 65 as for people
between 18 and 65 years [9]. The differentiation between young, middle-old and very
old persons lies in the individual cardiac frequency [10]. For people who have never
been much into sports in their whole life, as is true of a majority of the German
population for example [11], these requirements can be hard to meet. Even for those
suffering from physical limitations, moderate activity at least three times a week is
suggested for preventing falls [12], and even mostly inactive elderly people benefit from
integrating any activity into their daily routine [12]. In addition, dementia patients benefit
from physical activity in therapy [13, 14]. By self-tracking or caregiver-tracking, the
database could provide individual information [8] about the objective state of motion.
Through that, it is possible to interpret the data even without medical expert knowledge,
by asking and answering: Is there a decline, incline or consistency over a certain period?
Does the
144 A. Altenbuchner et al. / Exploratory Analysis of Motion Tracking Data

data fit the subjective impressions? Does the data fit the general state of health? What
are the rehabilitation goals?
Digital self-tracking could lead to a loss of control and autonomy [8] and is criticised
for its economic body-optimization aspect and for the fact that providers sell the data [15].
At the same time, it is a relatively affordable tool for regaining power through knowledge
about one's own body and self, in this case about motion and mobility. In the life of the
elderly these entities are associated with rehabilitation, health, participation, quality of
life and hence autonomy [16, 4]. In the future, this project will also address quality of life
and motivation through feedback [2].
Finally, physical activity […] mitigate[s] the mortality risks [17]; indeed, any kind of
physical activity is superior to inactivity [12]. Motion-tracking data reflect motion and
lay the foundation for developing complex personalized interventions in the future
[4].

References

[1] K. Weber, Demografie, Technik, Ethik: Methoden der Demografie, Technik, Ethik: Methoden der
normativen Gestaltung technisch gestützter Pflege, Pflege & Gesellschaft 22(4) (2017), 338–352.
[2] A. Altenbuchner, S. Haug, R. Kretschmer, K. Weber, How to measure physical motion and the impact of
individualized feedback in the field of rehabilitation of geriatric trauma patients, in: Health Informatics
Meets eHealth. G. Schreier and D. Hayn, open access IOS press, 2018. pp. 226–232.
[3] K. Gurley, F.A. Norcio, A systematic review of technologies designed to improve and assist cognitive
decline for both the current and future aging populations, in: Internationalization, design and global
development: Third international conference, IDGD 2009, held as part of HCI International 2009, San
Diego, CA, USA, July 19–24, N. Aykin, Berlin, 2009. pp. 156–63.
[4] A. Barth, G. Doblhammer, Physische Mobilität und Gesundheit im Alter, in: Die transformative Macht der
Demografie. T. Mayer, Wiesbaden, 2017. pp. 207–244.
[5] G. Banse, R. Hauser, Technik und Kultur - ein Überblick, in: Technik und Kultur: Bedingungs- und
Beeinflussungsverhältnisse. A. Grunwald, G. Banse, Karlsruhe, 2010. pp. 17–39.
[6] P. Benzinger, U. Lindemann, C. Becker, K. Aminian, M. Jamour, S.E. Flick. Geriatric rehabilitation after
hip fracture. Role of body-fixed sensor measurements of physical activity. Z Gerontol Geriatr 47(3)
(2014), 236–42.
[7] B. Grimm, S. Bolink. Evaluating physical function and activity in the elderly patient using wearable motion
sensors. EFORT Open Rev 1(5) (2017), 112–120.
[8] S. Duttweiler, J.-H. Passoth, Self-Tracking als Optimierungsprojekt? in: Leben nach Zahlen – Self-
Tracking als Optimierungsprojekt? S. Duttweiler, R. Gugutzer, J.-H. Passoth, Bielefeld, 2016. pp. 9–42.
[9] World Health Organization, Global recommendations on physical activity for health, Geneva, 2010.
[10] A. Rütten, K. Abu-Omar, T. Lampert, T. Ziese, Körperliche Aktivität, Gesundheitsberichterstattung des
Bundes, Vol. 26, Berlin, 2005.
[11] World Health Organization, What is Moderate-intensity and Vigorous-intensity Physical Activity? -
Intensity of physical activity, http://www.who.int/dietphysicalactivity/physical_activity_intensity/en/,
last access: 22.01.2019.
[12] K. Pfeifer, W. Banzer, E. Füzéki, W. Geidl, C. Graf, V. Hartung, et al., Empfehlungen für Bewegung, in:
Nationale Empfehlungen für Bewegung und Bewegungsförderung. A. Rütten, K. Pfeiffer, Erlangen,
2016. pp. 17–64.
[13] L. Clare, Rehabilitation for People Living with Dementia: a Practical Framework of Positive Support,
PLoS medicine 14 (2017), e1002245.
[14] H. Bork, Rehabilitation nach hüft- und knieendoprothetischer Versorgung älterer Menschen, Orthopäde
46(1) (2017), 69–77.
[15] S. Schaupp, Wir nennen es flexible Selbstkontrolle. Self-Tracking als Selbsttechnologie des
kybernetischen Kapitalismus, in: Leben nach Zahlen – Self-Tracking als Optimierungsprojekt? S.
Duttweiler, R. Gugutzer, J.-H. Passoth, Bielefeld, 2016. pp. 63–86.

[16] S. Förch, R. Kretschmer, T. Haufe, J. Plath, E. Mayr, Orthogeriatric Combined Management of Elderly
Patients With Proximal Femoral Fracture: Results of a 1-Year Follow-Up. Geriatr Orthop Surg Rehabil
8(2) (2017), 109–114.
[17] K. M. Diaz, A.T. Duran, N. Colabianchi, S.E. Judd, V.J. Howard, S.P. Hooker, Effects on Mortality of
Replacing Sedentary Time With Short Sedentary Bouts or Physical Activity: A National Cohort Study,
American Journal of Epidemiology kwy271 (2019), 1-7. DOI:10.1093/aje/kwy271.
146 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-146

Requirements for a Telemedicine Center to Monitor LVAD Patients
Nils REISSa,1, Kirby Kristin WEGNERa, Jan-Dirk HOFFMANNa,
Sebastian SCHULTE EISTRUPa, Udo BOEKENb, Michiel MORSHUISc
and Thomas SCHMIDTa
a Schüchtermann-Klinik Bad Rothenfelde, Bad Rothenfelde, Germany
b Universitätsklinik Düsseldorf, Düsseldorf, Germany
c Herz- und Diabeteszentrum NRW, Bad Oeynhausen, Germany

Abstract. E-health, especially telemedicine, has undergone a remarkably dynamic
development over the last decade. Most experience to date has been gained in the field of
telemedical care for heart failure (HF) patients. However, HF patients with an
implanted left-ventricular assist device (LVAD) have been more or less excluded
from consistent telemonitoring until now. The majority of complications
associated with LVAD therapy occur during the post-implantation phase. Effective
outpatient management is therefore the key to improving the long-term outcome of
LVAD patients. Here, implementation of a telemedicine center for close
monitoring could play an important role, e.g. through early detection of
complications. This study provides insights into the structural, staff and spatial
requirements for a telemedicine center to monitor the special group of LVAD
patients, based on comprehensive literature research and expert interviews.

Keywords: heart failure, left-ventricular assist device, telemonitoring, remote
monitoring, disease management

1. Introduction

LVAD implantation as an alternative to heart transplantation is known to improve
survival, functional capacity, and quality of life in heart failure patients [1, 2]. Today it
is possible for LVAD patients to be discharged home. Survival following LVAD
implantation can now be 10 years or more [3].
Despite all the technological progress, a significant number of severe
complications remain, with a high rate of readmission to the implanting center in the
long-term follow-up [4–10]. The most frequent complications are renewed heart failure,
thromboembolism, hemorrhage, infection (especially driveline infections) and right-
heart failure [4–10].
Nowadays, aftercare for this special patient group usually comprises outpatient
visits every 3 months [11, 12]. Between these visits, patients are predominantly left to
their own devices and largely manage the LVAD system (Figure 1) themselves. The
quality of this self-management crucially depends on patient compliance [13]. Should

1 Corresponding Author: Nils Reiss, Schüchtermann-Klinik Bad Rothenfelde, Institute for
Cardiovascular Research, Ulmenallee 5-11, 49214 Bad Rothenfelde, Germany, E-Mail: [email protected]

patients have questions or problems, they can contact staff at the implanting center by
telephone at any time. Ultimately, however, aftercare in the post-hospital phases is
insufficient and urgently in need of improvement.

Figure 1. LVAD patient with internal and external equipment [14] (1- pump, 2- batteries, 3- driveline, 4-
controller, with permission of Abbott®).

In the field of heart failure therapy (without LVAD), initial experience with
telemonitoring approaches has been gained in recent years [15–19]. Implementation of
telemedicine centers facilitates remote medical services. Telemedicine centers improve
the efficiency of treatment and provide patients with a greater sense of safety by
assuring permanent contact with qualified medical staff. However, the approaches
applied so far do not address the specific needs of LVAD patients, and new strategies
are indispensable for this patient group [20, 21]. To date, there are no telemedicine
centers available for monitoring heart failure patients supported by an LVAD.
In the following paper, structural, staff and spatial requirements for a telemedicine
center to monitor the special group of LVAD patients are described, based on
comprehensive literature research and expert interviews.

2. Methods

2.1. Comprehensive literature research

For the systematic recording, organization and administration of all relevant literature
sources, a systematic literature search was carried out at the beginning of the study. In
order to select relevant studies, inclusion and exclusion criteria were established using
the PICO schema [22] and additional elements. Based on the PICO schema, the
following parameters were considered in more detail in order to subsequently define
corresponding inclusion and exclusion criteria (Table 1).

Table 1. Inclusion and exclusion criteria for systematic literature search.

Inclusion (I) and Exclusion (E) Criteria
I1 Population: Adult heart failure patients treated by LVAD
E1 Population: Children and adolescents < 18 years
I2 Intervention: Telemonitoring by a telemedicine center
E2 Language: Non-German and non-English language publications
I3 Type of study: Without restriction
I4 Study duration: Without restriction
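Purely as an illustration (this helper is not part of the study's tooling, and the record keys are hypothetical), the criteria in Table 1 can be expressed as a simple screening filter applied to retrieved records:

```python
def passes_screening(record):
    """Apply the Table 1 inclusion/exclusion criteria to one record.

    `record` is a dict with illustrative keys:
    'adult' (bool), 'lvad' (bool), 'telemonitoring' (bool), 'language' (str).
    """
    if not record["adult"]:                      # E1: exclude children/adolescents < 18 years
        return False
    if record["language"] not in ("en", "de"):   # E2: only English or German publications
        return False
    # I1 + I2: adult HF patients treated by LVAD, telemonitored by a center
    return bool(record["lvad"] and record["telemonitoring"])
```

Study type and duration (I3, I4) impose no restriction and therefore need no check.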

2.2. Expert interviews

A qualitative investigation based on guided interview and focus group techniques was
conducted with caregiver experts at three German heart centers. The expert interviews
were conducted as openly as possible, since the goal of this method was a
comprehensive survey of expert knowledge regarding the research topic. Guided
interviews are non-standardized interviews that work with given topics and a list of
questions, known as the guide.
In line with the research question of the present work, a guideline was developed
for the expert interviews with corresponding topics and questions. The guide was
subdivided into five subject blocks with a total of 14 principal questions and 9
subordinate questions, corresponding to 23 categories. The 5 blocks were:

- Initial questions (8 categories)
- Structural requirements (6 categories)
- Personnel requirements (3 categories)
- Spatial requirements (2 categories)
- Final questions (4 categories)

The expert interviews were intended to provide a first insight into the field of
structural, personnel and spatial requirements in order to derive conclusions for the
practical implementation of a telemedicine center.
The selection of experts is an essential decision in research design because it
determines the nature and quality of the information. Experts in this scenario are people
who, based on their work with LVAD patients, have the expertise to make statements
about the requirements of planned centers. The selected group of experts thus includes
persons who work closely with LVAD patients, who have the required experience, and
who have a comprehensive overview of the overall care situation of this patient
clientele.
These include, on the one hand, implanting physicians and physicians providing pre-
and aftercare. On the other hand, VAD coordinators and VAD nurses with
appropriate specialist training, who work intensively with LVAD patients, are also
involved.
The interviews were conducted in person, face-to-face, by the same interviewer.
Face-to-face interviews are characterized by high information content and good
controllability of the conversation. The individual expert interviews were transcribed
promptly after each interview using the transcription program f5 in order to fully
capture the information received. The transcripts of the expert interviews were
used as base material for the data analysis and were evaluated using qualitative content
analysis according to Mayring [23].

3. Results

3.1. Literature research

In summary, no published studies were available to answer the questions posed
by the research topic. Thus, the current state of research must be assessed as poor or
non-existent, underlining the crucial importance of research on this topic.
A reduced search strategy was therefore used, yielding 450 hits in total
(Table 2). By means of the reduced search strategy and the supplementary
unsystematic hand search, publications on individual elements of the search strategy
were found. These were helpful for the development of the guide and provided
information relevant to the topic.

Table 2. Research strategy PubMed.

Source: PubMed
Date: 30.04.2018
Filter: humans, English or German, adult
# Term Result
1 "left ventricular assist device*" 4,768
2 LVAD 3,460
3 #1 OR #2 5,694
4 telemonitoring 1,140
5 "remote monitoring" 1,420
6 "telemedical monitoring" 20
7 #4 OR #5 OR #6 2,470
8 "telemedical service cent*" 27
9 "telemedicine cent*" 4,985
10 "telehealth cent*" 5,286
11 #8 OR #9 OR #10 5,290
12 #3 AND #7 AND #11 (filters: humans; English; German; adult 19+ years) 0
13 #3 AND #11 (filters: humans; English; German; adult 19+ years) 0
14 #3 AND #7 (filters: humans; English; German; adult 19+ years) 450
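For reproducibility, a combined strategy such as row 14 (#3 AND #7) can be assembled programmatically before it is submitted to PubMed; a minimal sketch, with function names chosen for illustration and straight quotation marks assumed:

```python
def or_group(terms):
    """Parenthesize and OR-combine a list of search terms."""
    return "(" + " OR ".join(terms) + ")"

def and_combine(*groups):
    """AND-combine several OR groups into one PubMed query string."""
    return " AND ".join(or_group(g) for g in groups)

# Row 3: device terms; row 7: monitoring terms
device = ['"left ventricular assist device*"', 'LVAD']
monitoring = ['telemonitoring', '"remote monitoring"', '"telemedical monitoring"']

query = and_combine(device, monitoring)  # corresponds to row 14 (#3 AND #7)
print(query)
```

The resulting string can be passed unchanged to the PubMed web interface or to the NCBI E-utilities `esearch` endpoint as the `term` parameter.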

3.2. Expert interviews

The multicenter interviews were carried out at three different German clinics in Lower
Saxony and North Rhine-Westphalia. All clinics had been implanting LVAD systems
for many years and also offered aftercare for discharged patients. From the 3 hospitals,
a total of 11 experts (6 physicians and 5 VAD coordinators or VAD nurses, 91% male)
took part in the interviews. The experience of the interviewed experts with LVAD
patients can be assessed as very good, spanning 12.9 ± 7.5 years on average (Table 3).
All interviews could be conducted through to the end without interruption. Overall, the
interviews yielded a total of 04:29:09 hours of audio material, with single
interviews lasting between 00:14:15 and 00:42:32 hours (mean 00:24:28 hours).

Table 3. LVAD experiences of the individual clinics and interviewed experts.

                                                    Clinic 1   Clinic 2   Clinic 3
LVAD implantations per year                         ~45-50     ~100-120   ~30-40
LVAD outpatient treatment (pts)                     ~100       ~250       ~45
LVAD experience of the interviewed experts (yrs)    11.5       20         11

The expertise and experience in the field of telemedicine varied among the LVAD
experts interviewed, ranging from no knowledge to expert knowledge. The median was
within the range of basic knowledge of telemedicine (Figure 2).

Figure 2. Knowledge of individual experts in the field of telemonitoring (n=11). [Bar chart omitted: knowledge levels (no, theoretical, basic, good, expert knowledge) plotted against number of recipients, shown separately for physicians and VAD nurses/VAD coordinators.]

3.3. Generated hypotheses

Based on the findings from the five interview blocks mentioned, corresponding to 23
categories, 10 hypotheses were generated from the answers given by the experts
(Table 4). The first hypothesis is that each LVAD-implanting clinic should monitor its
LVAD patients via its own telemedicine center. The second hypothesis is as follows: in
order to better interpret the transmitted data and parameters, patients should be known
to the telemedicine center. In addition, based on the findings of this thematic block, a
third hypothesis is that a minimum of 50 LVAD patients is necessary to implement a
telemonitoring program. The fourth hypothesis is that telemonitoring of LVAD patients
by a telemedicine center leads to a reduction in operating expenses in other areas of
LVAD patient aftercare. Nonetheless, the fifth hypothesis is that routine outpatient
appointments cannot be completely avoided, although the sixth hypothesis indicates
that after implementation of a telemedicine center routine outpatient appointments
could be reduced.
The subject block “personnel requirements” also gave rise to extensive insights
during the expert interviews. First of all, based on the interpreted findings, it is
hypothesized that the professional groups of physicians, VAD coordinators and VAD
nurses must be involved. Physicians and VAD coordinators are called for by all the
experts, with the majority of experts also calling for VAD nurses. This consensus
makes it clear that, ideally, the occupational groups should be involved which are
currently also involved in the care and aftercare of LVAD patients. Moreover, the
hypothesis is put forward that additional staff needs to be hired to monitor LVAD
patients through a telemedicine center. Within the same thematic block,
it is also hypothesized that the telemedicine center does not have to be occupied 24/7 and
can be covered by staff on call. This hypothesis is based on the assumption that staff
can access patient data from home. Furthermore, alerts can be sent to staff members'
smartphones regardless of their whereabouts.
The thematic block “spatial requirements” gave rise to the following hypothesis:
the telemedicine center must be spatially connected to the implanting center.
In summary, all experts agreed that close telemonitoring of LVAD patients should
lead to improvements in quality of life and quality of aftercare in this special patient
group. According to this expert opinion, telemonitoring of LVAD patients should be
feasible. The overall results confirm the potential of this form of care.

Table 4. Generated hypotheses.

H1 (Structural requirements): Each LVAD implanting center should have a telemedicine center to monitor LVAD patients.
H2 (Structural requirements): For interpretation of transferred data and parameters, LVAD patients should be known to the physicians.
H3 (Structural requirements): A minimum number of 50 LVAD patients is necessary for the implementation of a telemonitoring program.
H4 (Structural requirements): Telemonitoring of LVAD patients by a telemedicine center leads to a reduction in operating expenses in other areas of LVAD patient aftercare.
H5 (Structural requirements): Outpatient appointments for LVAD patients cannot be completely avoided through telemonitoring.
H6 (Structural requirements): Outpatient appointments for LVAD patients can be reduced through telemonitoring.
H7 (Personnel requirements): Physicians, VAD coordinators and VAD nurses should all be involved in the running of a telemedicine center for LVAD patients.
H8 (Personnel requirements): For telemonitoring of LVAD patients through a telemedicine center, additional staff must be hired.
H9 (Personnel requirements): A telemedicine center does not have to be occupied 24/7 and can be covered by staff on call.
H10 (Spatial requirements): A telemedicine center must be spatially connected to the implanting center.

Qualitative investigation is a sufficient tool for learning about the requirements of users
(clinical experts: physicians, VAD coordinators and VAD nurses) when setting up a
telemedicine center. The results achieved in this study gave rise to 10 hypotheses
which could/should be taken into account when implementing telemedicine centers for
LVAD patients in the future. LVAD patients seem particularly suited to close
telemonitoring because they already carry numerous sensors as part of the implanted
devices. This means that significantly more conclusive parameters can be generated
than in the group of heart failure patients without such devices. The quality of aftercare
should be elevated to a new level as a result. Detection of potentially severe
complications at an early stage should lead to a reduction in invasive treatments and,
as a consequence, also to a reduction in treatment costs.
Nevertheless, there are still obstacles which must be overcome before telemedicine
centers can be implemented. There are many issues of concern regarding the legal and
ethical aspects of telemedicine (confidentiality and privacy). Finally, reimbursement
for care provided using a telemedicine service still requires clarification.
Future studies are needed in order to demonstrate the extent to which the
elaborated (still theoretical) requirements hold true in practice, and which aspects need
additional attention.

Acknowledgment

This project is funded by the German Federal Ministry of Education and Research
(BMBF) within the framework of the ITEA 3 Project Medolution (14003).

4. References

1. Gustafsson F, Rogers JG. Left ventricular assist device therapy in advanced heart failure: Patient
selection and outcomes. Eur J Heart Fail. 2017;19:595–602. doi:10.1002/ejhf.779.
2. Slaughter MS, Pagani FD, Rogers JG, Miller LW, Sun B, Russell SD, et al. Clinical management of
continuous-flow left ventricular assist devices in advanced heart failure. J Heart Lung Transplant.
2010;29:S1-39. doi:10.1016/j.healun.2010.01.011.
3. Pinney SP, Anyanwu AC, Lala A, Teuteberg JJ, Uriel N, Mehra MR. Left Ventricular Assist Devices
for Lifelong Support. J Am Coll Cardiol. 2017;69:2845–61. doi:10.1016/j.jacc.2017.04.031.
4. Hernandez RE, Singh SK, Hoang DT, Ali SW, Elayda MA, Mallidi HR, et al. Present-Day Hospital
Readmissions after Left Ventricular Assist Device Implantation: A Large Single-Center Study. Tex
Heart Inst J. 2015;42:419–29. doi:10.14503/THIJ-14-4971.
5. Kimura M, Nawata K, Kinoshita O, Yamauchi H, Hoshino Y, Hatano M, et al. Readmissions after
continuous flow left ventricular assist device implantation. J Artif Organs. 2017;20:311–7.
doi:10.1007/s10047-017-0975-4.
6. Smedira NG, Hoercher KJ, Lima B, Mountis MM, Starling RC, Thuita L, et al. Unplanned hospital
readmissions after HeartMate II implantation: Frequency, risk factors, and impact on resource use and
survival. JACC Heart Fail. 2013;1:31–9. doi:10.1016/j.jchf.2012.11.001.
7. Hasin T, Marmor Y, Kremers W, Topilsky Y, Severson CJ, Schirger JA, et al. Readmissions after
implantation of axial flow left ventricular assist device. J Am Coll Cardiol. 2013;61:153–63.
doi:10.1016/j.jacc.2012.09.041.
8. Akhter SA, Badami A, Murray M, Kohmoto T, Lozonschi L, Osaki S, Lushaj EB. Hospital
Readmissions After Continuous-Flow Left Ventricular Assist Device Implantation: Incidence, Causes,
and Cost Analysis. Ann Thorac Surg. 2015;100:884–9. doi:10.1016/j.athoracsur.2015.03.010.
9. Forest SJ, Bello R, Friedmann P, Casazza D, Nucci C, Shin JJ, et al. Readmissions after ventricular
assist device: Etiologies, patterns, and days out of hospital. Ann Thorac Surg. 2013;95:1276–81.
doi:10.1016/j.athoracsur.2012.12.039.

10. Haglund NA, Davis ME, Tricarico NM, Keebler ME, Maltais S. Readmissions After Continuous Flow
Left Ventricular Assist Device Implantation: Differences Observed Between Two Contemporary
Device Types. ASAIO J. 2015;61:410–6. doi:10.1097/MAT.0000000000000218.
11. Schmidt T, Reiss N, Hoffmann JD, Feldmann C, Deniz E, Roske K, et al. Post-Hospital Care in LVAD
Patients - Experiences of Two Large German Heart Centers. J Heart Lung Transplant. 2017;36:S436.
doi:10.1016/j.healun.2017.01.1248.
12. Jakovljevic DG, McDiarmid A, Hallsworth K, Seferovic PM, Ninkovic VM, Parry G, et al. Effect of
left ventricular assist device implantation and heart transplantation on habitual physical activity and
quality of life. Am J Cardiol. 2014;114:88–93. doi:10.1016/j.amjcard.2014.04.008.
13. Casida JM, Wu H-S, Abshire M, Ghosh B, Yang JJ. Cognition and adherence are self-management
factors predicting the quality of life of adults living with a left ventricular assist device. J Heart Lung
Transplant. 2017;36:325–30. doi:10.1016/j.healun.2016.08.023.
14. Abbott. HeartMate3 System. https://www.heartmate.com/app_themes/patient/images/img27.jpg.
Accessed 24 Jan 2019.
15. Hindricks G, Taborsky M, Glikson M, Heinrich U, Schumacher B, Katz A, et al. Implant-based
multiparameter telemonitoring of patients with heart failure (IN-TIME): A randomised controlled trial.
Lancet. 2014;384:583–90. doi:10.1016/S0140-6736(14)61176-4.
16. Koehler F, Koehler K, Deckwart O, Prescher S, Wegscheider K, Kirwan B-A, et al. Efficacy of
telemedical interventional management in patients with heart failure (TIM-HF2): A randomised,
controlled, parallel-group, unmasked trial. Lancet. 2018;392:1047–57. doi:10.1016/S0140-
6736(18)31880-4.
17. Böhm M, Drexler H, Oswald H, Rybak K, Bosch R, Butter C, et al. Fluid status telemedicine alerts for
heart failure: A randomized controlled trial. Eur Heart J. 2016;37:3154–63.
doi:10.1093/eurheartj/ehw099.
18. Abraham WT, Adamson PB, Bourge RC, Aaron MF, Costanzo MR, Stevenson LW, et al. Wireless
pulmonary artery haemodynamic monitoring in chronic heart failure: A randomised controlled trial.
Lancet. 2011;377:658–66. doi:10.1016/S0140-6736(11)60101-3.
19. Abraham WT, Adamson PB, Costanzo MR, Eigler N, Gold M, Klapholz M, et al. Hemodynamic
Monitoring in Advanced Heart Failure: Results from the LAPTOP-HF Trial. J Cardiac Fail.
2016;22:940. doi:10.1016/j.cardfail.2016.09.012.
20. Glitza JI, Müller-von Aschwege F, Eichelberg M, Reiss N, Schmidt T, Feldmann C, et al. Advanced
telemonitoring of Left Ventricular Assist Device patients for the early detection of thrombosis. Journal
of Network and Computer Applications. 2018;118:74–82. doi:10.1016/j.jnca.2018.04.011.
21. Reiss N, Schmidt T, Boeckelmann M, Schulte-Eistrup S, Hoffmann J-D, Feldmann C, Schmitto JD.
Telemonitoring of left-ventricular assist device patients-current status and future challenges. J Thorac
Dis. 2018;10:S1794-S1801. doi:10.21037/jtd.2018.01.158.
22. Schardt C, Adams MB, Owens T, Keitz S, Fontelo P. Utilization of the PICO framework to improve
searching PubMed for clinical questions. BMC Med Inform Decis Mak. 2007;7:16. doi:10.1186/1472-
6947-7-16.
23. Mayring P. Qualitative Inhaltsanalyse: Grundlagen und Techniken. 12th ed. Weinheim: Beltz; 2015.
154 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-154

Empowering Diabetes Patients with Interventions Based on Behaviour Change Techniques
Oliver JUNGa,1, Dietmar GLACHSa, Felix STROHMEIERa, Robert MULRENINa,
Sasja HUISMANb, Ian SMITHb, Hilde van KEULEN c, Jacob SONT b and Manuela
PLOESSNIGa
a Salzburg Research Forschungsgesellschaft mbH, Salzburg, Austria
b Leids Universitair Medisch Centrum, Leiden, The Netherlands
c Netherlands Organization for Applied Scientific Research (Department Child Health), The Hague, The Netherlands

Abstract. The number of people with diabetes is increasing in every European
country, and like all chronic diseases diabetes cannot be cured. However, patient
empowerment is an acknowledged strategy for improving patients' health
situation. This paper describes the Action Plan Engine developed as a tool for
diabetes patients in the POWER2DM project. The Action Plan Engine offers a
guided workflow based on treatment goals and activities. A periodic review
evaluates how successfully a patient has fulfilled these goals and activities. Part of the
evaluation is detailed feedback, in particular about 170 interventions based on
Behaviour Change Techniques, aimed at changing a patient's lifestyle behaviour
towards a healthier, diabetes-appropriate lifestyle. Additionally, the Action Plan
Engine offers decision trees for coping with barriers regarding glucose monitoring,
exercise, carbohydrate, insulin and stress.

Keywords. clinical decision support systems, early medical intervention, patient
participation

1. Introduction

In 2015, diabetes affected 59.8 million people in Europe aged between 20 and 79 years.
According to the International Diabetes Federation (IDF), the number of people with
diabetes is increasing in every European country, and it is estimated that the number of
people with diabetes in Europe will rise to 71.1 million by 2040 [1]. Diabetes is essentially
a life-long disease and, like all chronic diseases, it cannot be cured. Nevertheless, there
are strategies for improving patients' health situation. One key aspect is empowering
patients, putting them in a position to take better care of their diabetes. Information and
Communication Technologies can play a key role in better management of diabetes and
in patient empowerment. Patient empowerment [2] involves patients to a greater extent
in their own healthcare process, and disease management becomes an integrated part of
their daily life.

1 Corresponding Author: Oliver Jung, Salzburg Research Forschungsgesellschaft mbH, Jakob-Haringer-Straße 5/3, 5020 Salzburg, Austria, E-Mail: [email protected]

In this paper, we present the approach of the Action Plan Engine developed in the
POWER2DM project2. POWER2DM started in February 2016, and its main aim is to
develop a personalised self-management support system for Type 1 and Type 2 diabetes
patients. It offers a guided action plan for self-management by combining decision
support based on personalised results of interlinked predictive computer models,
feedback functionalities based on Behaviour Change Techniques, and real-time
collection and interpretation of personal data and self-management activities. The Action
Plan Engine is a web-based module in POWER2DM and integrates personalized
behaviour change interventions to increase patients' adherence to their care
program and improve their interaction with health professionals.
The Action Plan Engine in POWER2DM is an advancement of the self-management
support system developed in the EMPOWER project [3], achieved by means of
refactoring, the addition of interventions and advanced exercises, and modular design
approaches.

2. Overview of the Action Plan Engine

The Action Plan Engine offers a guided workflow as an iterative cycle, typically on a
weekly basis. For every cycle, the patient is encouraged to specify the tasks and activities
he or she wants to take care of in this period. These planned activities help to adhere to
medical treatment plans, e.g. measuring glucose values, but may also support the
accomplishment of personal goals such as planning exercises. If a patient specifies
activities on a weekly basis, the likelihood that these activities are realistic is higher than
when planning activities for a longer period. However, the Action Plan cycle can also be
bi-weekly, monthly or of another duration. Besides the planning of activities, the Action
Plan Engine supports the writing of diaries with respect to mood or stress.
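The periodic review at the end of each cycle can be reduced to a simple adherence computation; the following sketch is illustrative only and does not reproduce the actual POWER2DM implementation (function names and feedback categories are assumptions):

```python
def review_adherence(planned, completed):
    """Return the fraction of planned activities completed in one cycle.

    `planned` is a list of activity identifiers for the cycle,
    `completed` a set of identifiers the patient actually performed.
    Returns None when nothing was planned.
    """
    if not planned:
        return None
    done = sum(1 for activity in planned if activity in completed)
    return done / len(planned)

def feedback(rate):
    """Map an adherence rate to a coarse feedback category (illustrative)."""
    if rate is None:
        return "no activities planned"
    if rate >= 0.8:
        return "positive reinforcement"
    if rate >= 0.4:
        return "encouragement plus tips"
    return "barrier-coping intervention"
```

In a weekly cycle, `review_adherence(["walk", "glucose check"], {"walk"})` would yield 0.5, triggering the middle feedback category.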
The Action Plan Engine interacts with other POWER2DM components: (i) with the
component for the doctors (the Shared Decision Making application) for supporting the
appointment and for specifying treatment goals and activities and (ii) with the mobile
app for a convenient acquisition of patient data and integration of device data.

2.1. Conceptual Approach

Basically, the Action Plan workflow comprises four main steps (see Figure 1). In the
first step, the patient can specify long-term self-management goals based on personal
values and on the treatment plan and its goals. The patient can refine a treatment goal
in more detail (e.g. by specifying the type of exercise he or she would like to do), but
can also add further personal goals.
In the next step, based on the self-management goals, the patient specifies short-
term (e.g. weekly) activities using a calendar. Relating an activity to a goal keeps the
user aware of why he or she is performing the activity.
Next, patient data are recorded by devices as well as manually through web and/or
mobile forms. This phase supports the self-monitoring of vital data and behaviour.
Currently, the following patient data can be recorded via web forms: blood glucose,
blood pressure, body weight, exercises, meals, problems, sleep and stress.
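To make the relation between goals and planned activities concrete, the following sketch shows one possible in-memory data model; the class and field names are hypothetical assumptions for illustration and are not taken from the POWER2DM implementation, which stores data as FHIR resources in the PDS.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of goals and planned activities in an action plan.
# Names and fields are illustrative assumptions, not the actual
# POWER2DM data model.

@dataclass
class Goal:
    goal_id: str
    description: str          # e.g. "be more physically active"
    source: str = "personal"  # "treatment" if adopted from the treatment plan

@dataclass
class PlannedActivity:
    activity_id: str
    activity_type: str                      # e.g. "exercise", "glucose"
    scheduled_day: str                      # ISO date within the weekly cycle
    goal_ids: List[str] = field(default_factory=list)  # links back to goals

goal = Goal("g1", "walk regularly", source="treatment")
walk = PlannedActivity("a1", "exercise", "2019-05-06", goal_ids=["g1"])
```

Linking each activity to one or more goals is what later allows the review to aggregate performance per goal as well as per activity type.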

2 The POWER2DM project is funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 689444.
156 O. Jung et al. / Empowering Diabetes Patients with Interventions

Figure 1. Patient Empowerment Workflow through POWER2DM.

In the last step, the Action Plan Engine evaluates how successfully the patient has
fulfilled the planned goals and activities and gives feedback. This includes feedback
about the overall performance and about the performance for each concerned goal and
activity. Additionally, the Action Plan Engine provides hints and advice (i.e. interventions
for self-management) for all activities and goals. Interventions can address different
contexts, e.g. a tip for improving self-management activities, advice based on national
guidelines (e.g. the recommended duration of physical activities), a tip for coping with
daily problems (e.g. sleep problems or stress), or positive reinforcement [4] by means of
a motivational message (e.g. when the patient has successfully completed all activities
for a specific goal).
The Action Plan Engine also includes exercises such as the Energy Battery, a
three-step metaphor for mood or energy problems (e.g. in case of low mood, too much
stress, or sleeping problems); the Value Compass, a tool for reflecting on the importance
of personal values in different life areas (e.g. to support goal definition); and the
Information Material, a WordPress website with detailed articles on topics relevant
to diabetes patients.

2.2. Technical Approach

Technically, the Action Plan Engine is a web application that provides a fully
interactive, multi-lingual 3 graphical user interface accessible from both desktop PCs
and mobile devices (tablets and smartphones). Furthermore, an application
programming interface (API) is provided for third-party applications by means of REST
services exchanging JSON objects [5].
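For illustration, a glucose observation pushed to such a REST service might be encoded along the following lines; the field names are hypothetical and do not reproduce the actual POWER2DM API or FHIR schema.

```json
{
  "observationType": "blood-glucose",
  "value": 6.1,
  "unit": "mmol/L",
  "effectiveDateTime": "2019-05-06T07:30:00Z",
  "relatedActivityId": "a1"
}
```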

3 Currently supported languages are English, Spanish, Dutch and German.

Figure 2. Action Plan Engine - General Architecture.

This API is also used by the Mobile Application developed within the POWER2DM
project. Figure 2 depicts the general architecture and the main components of the Action
Plan Engine. Towards the user, the Action Plan Engine provides an HTML5/JavaScript application. In
the backend, the Action Plan Engine is implemented as a Java Servlet running inside a
secure container providing controlled access via the APIs.
Separate APIs for each service are provided, such as the management of goals,
planning of activities and observations or accessing the review over a specified period.
For providing such information, the Action Plan Engine does not store any patient data
itself, but relies on existing, secure patient data management infrastructures. In
POWER2DM, the FHIR-compatible 4 personal data store (PDS) [6] and the
POWER2DM identity service for authorization and authentication have been used. The
Action Plan Engine only transforms patient data, calculates graphs and statistics, and
creates interventions based on the action plans and results stored in the PDS. Also, all
data entered via the Action Plan is stored only in the secure PDS.
Figure 3 shows the menu structure provided to the patients after authentication.
The landing page is a “Dashboard”. From there, the patient can navigate to the “Treatment
Plan”, which contains goals defined by the care provider and the patient together during
a “Shared-Decision Making” process. Goals defined in the treatment plan can be adopted
by the patient and further detailed in their own self-management “Action Plan”. This
menu item also contains the links to further information such as the review, or exercises
like the Energy Battery and Value Compass. Daily activities can be either recorded
through the mobile application, or by using the “Journal” pages. Finally, profile and
settings are available for the patient to change personal preferences.

4 “Fast Healthcare Interoperability Resources” (http://hl7.org/fhir) is a medical standard created by the standardisation organisation HL7.

Figure 3. Action Plan Engine User Interface – Main Navigation elements.

Further technical details about the Action Plan Engine are described in a public
report on the prototype architecture of POWER2DM [7].

3. Interventions for Behavioural Changes

Most theories and determinants explain behaviour, but do not describe how to change
behaviour. In POWER2DM, the interventions of the Action Plan Engine are based on
the Behaviour Change Techniques (BCTs) of Abraham and Michie [8]. They describe
interventions to change a person’s lifestyle behaviour. A BCT is an “observable,
replicable, and irreducible component of an intervention designed to alter or redirect
causal processes that regulate behaviour”.
The Action Plan Engine provides interventions as part of the periodic review and by
suggesting interventions at the end of barrier decision trees. These interventions are
stored in an intervention table that follows a dual approach, combining a psychological
and a technical perspective. The starting point is the compliance with the planned goals
and activities. Depending on the degree of fulfilment, different types of interventions/
purposes and BCTs can be specified, e.g. positive reinforcement when a goal or activity
is completely achieved, or a question to detect a barrier when a goal or activity is only
partly achieved or not achieved at all.
Currently, the intervention table of the Action Plan Engine includes about 170
different interventions. These interventions can be of different types. They can be plain
text (e.g. positive reinforcement), they can refer to an external website (e.g. about a
detailed description of diabetes and coping with emotions), they can recommend an
exercise (e.g. an exercise for coping with low mood or energy problems) and they can
refer to a more detailed explanation in the POWER2DM information material (e.g. an
article about fear of needles).

3.1. Interventions for the Periodic Review

The periodic review collects all scheduled activities within the review period where each
scheduled activity is marked as completed whenever a corresponding observation is
present. Otherwise the scheduled event remains as planned in the review. As a result, the
number of planned activities compared to the number of completed activities denotes the
compliance or performance in completing planned activities. However, since activities
are of a particular type (monitoring glucose, doing exercises, etc.), the review
computation is also performed for each activity type and for the activities

overall. Furthermore, any activity may point to one or more related goals. Hence, the
performance computation is additionally performed for each goal and on an overall basis.
As a result, the review shows several review categories such as overall performance,
activity performance, goal performance. Besides the planned activities and goals, the
review takes additional recordings such as sleeping problems, mood or stress into
account. For these data, we use Likert Scales [9] as a basis for evaluation in the periodic
review (sleeping problem intensity, mood level, and stress level).
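The review computation described above can be sketched as follows; the data shapes are simplified in-memory records assumed for illustration, not the actual PDS/FHIR structures.

```python
from collections import Counter

# Minimal sketch of the periodic review computation: each planned activity is
# marked completed when a matching observation exists, and performance is
# computed overall, per activity type, and per related goal.
# Record shapes are illustrative assumptions.

def review(planned, observed_ids):
    planned_count = Counter()
    completed_count = Counter()
    for act in planned:
        keys = ["overall", "type:" + act["type"]] + ["goal:" + g for g in act["goals"]]
        for key in keys:
            planned_count[key] += 1
            if act["id"] in observed_ids:
                completed_count[key] += 1
    # ratio of completed to planned activities per review category
    return {k: completed_count[k] / planned_count[k] for k in planned_count}

planned = [
    {"id": "a1", "type": "glucose", "goals": ["g1"]},
    {"id": "a2", "type": "glucose", "goals": ["g1"]},
    {"id": "a3", "type": "exercise", "goals": ["g2"]},
]
performance = review(planned, observed_ids={"a1", "a3"})
# performance["overall"] is 2/3, "type:glucose" is 0.5, "goal:g2" is 1.0
```

Counting planned versus completed activities per key ("overall", per activity type, per goal) directly yields the compliance ratios that the review reports.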
From the technical point of view, the degree of the patient’s compliance in planning
activities and successfully completing them is the basis for interventions. Interventions
in this context are motivational or informative messages shown to the patient to foster
behaviour change. To support the selection of meaningful interventions, an intervention
is selected based on several rules, such as the review category (e.g. activity or goal) and
a performance rule that is compared with the review result, so that only helpful messages
are shown to the patient.
For the review, the accurate evaluation of the performance with the intervention
table is a crucial task. For this, the performance rule consists of an expression constant,
a comparator and the target value. The expression constant points to the computed
review performance result (e.g. performance of glucose monitoring activities) or to the
recorded problem or stress intensity respectively. Since the review period covers several
days or even weeks, appropriate expressions for selecting the lowest, the highest or the
average intensity values are available. The comparator expression allows the comparison
of the resolved review value with the given target value.
The performance, however, is expressed as the percentage of completed tasks compared
to the number of planned tasks. The resulting percentage is transformed into a
corresponding 4-step Likert scale outlining the degree of compliance. The values for
problem intensity and stress levels are aligned to a Likert scale as well; thus, specifying
the performance criteria for all kinds of interventions is simple and straightforward.
Finally, when computing the review and selecting the proper interventions, eligible
messages are identified by i) filtering for the review category and ii) applying the
performance rule. Whenever all rules evaluate to true, the intervention is eligible to be
shown to the patient. This ensures that the patient gets appropriate feedback based on his
achievements.
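Putting these pieces together, the percentage-to-Likert mapping and the rule-based filtering of eligible interventions might look as sketched below; the quartile thresholds, rule encoding and example messages are assumptions for illustration, not the actual intervention table.

```python
import operator

# Map a completion ratio (0.0-1.0) onto a 4-step Likert-like scale.
# The quartile thresholds are an assumption for illustration.
def to_likert(ratio):
    if ratio < 0.25:
        return 1
    if ratio < 0.5:
        return 2
    if ratio < 0.75:
        return 3
    return 4

COMPARATORS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
               ">=": operator.ge, ">": operator.gt}

# A performance rule: (expression constant, comparator, target value).
# An intervention is eligible when its category matches and its rule holds.
def eligible(interventions, review_values, category):
    selected = []
    for iv in interventions:
        if iv["category"] != category:
            continue
        expr, cmp_, target = iv["rule"]
        if COMPARATORS[cmp_](review_values[expr], target):
            selected.append(iv["message"])
    return selected

review_values = {"perf.glucose": to_likert(0.9), "stress.max": 3}
interventions = [
    {"category": "activity", "rule": ("perf.glucose", ">=", 4),
     "message": "Well done - you completed your glucose monitoring plan!"},
    {"category": "activity", "rule": ("perf.glucose", "<=", 2),
     "message": "What kept you from monitoring? Let's look for barriers."},
]
selected = eligible(interventions, review_values, "activity")
# selected contains only the positive-reinforcement message
```

Evaluating every rule against the computed review values ensures that only messages matching the patient's actual achievements are shown.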

3.2. Decision Trees for Coping with Barriers

Decision Trees are ultimately a special kind of intervention. They can be triggered by
other interventions, observations or user interaction based on defined rules. In contrast
to other interventions, they incorporate direct user feedback.
Since the rules and workflows of the Decision Trees are largely based on expert
knowledge and defined by practitioners, a dynamic “workflow tool” has been developed
using vis.js 5 and Node-RED 6. It allows non-technicians to define workflows, which can
subsequently be exported to a structured JSON format for integration into the Action
Plan Engine.
Figure 4 shows an example workflow that covers all definable node and edge types.
Nodes can be i) comments (grey) – for improving collaboration, ii) questions (dark blue)
– asked to the user for direct feedback, iii) content (purple) – redirects to new pages or
static content, iv) conditions (light blue) – for checking existing observations, v) triggers
(orange) – for manually or periodically executing the workflow, and vi) actions (red) –
for activating existing triggers. Edges can be i) answers (light blue) – for proceeding
based on user input, and ii) forwards (grey) – for directly linking nodes.

5 “vis.js” (http://visjs.org/) is a dynamic, browser-based visualization library.
6 “Node-RED” (https://nodered.org/) is a flow-based programming tool for the Internet of Things.

Figure 4. Action Plan Engine Workflow Tool – Example workflow.
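An exported workflow could, for example, be serialized along these lines; the structure shown is a hypothetical sketch, not the actual JSON schema produced by the workflow tool.

```json
{
  "nodes": [
    {"id": "t1", "type": "trigger", "label": "weekly review"},
    {"id": "q1", "type": "question", "label": "How often do you monitor your glucose?"},
    {"id": "c1", "type": "content", "label": "Overcoming fear of needles"}
  ],
  "edges": [
    {"from": "t1", "to": "q1", "type": "forward"},
    {"from": "q1", "to": "c1", "type": "answer", "label": "Too little - I dislike needles"}
  ]
}
```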

The exported JSON file is used for creating more complex user interface elements
or dialog-based select inputs dynamically generated on the front-end. Figure 5 shows an
example dialog about glucose monitoring. The user is guided step by step through the
previously defined questions and answers, while conditions are checked and content is
loaded in the background. In this example, the user states that he monitors too little
because he dislikes needles, and he is provided with a link to an information page on
how to overcome needle phobia.

Figure 5. Action Plan Engine Decision Trees – Example dialog.
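A dialog like the one in Figure 5 can be driven by a plain traversal of such an exported graph; the following sketch uses a hypothetical two-answer tree and a simulated user to illustrate the principle.

```python
# Minimal sketch of driving a barrier dialog from a decision tree.
# The node structure, texts and answers are hypothetical, for illustration only.

tree = {
    "q1": {"type": "question", "text": "Why do you monitor too little?",
           "answers": {"I dislike needles": "c1", "No time": "c2"}},
    "c1": {"type": "content", "text": "See: overcoming needle phobia"},
    "c2": {"type": "content", "text": "Tips for fitting monitoring into your day"},
}

def run_dialog(tree, start, answer_fn):
    # Follow question nodes until a content node is reached.
    node = tree[start]
    while node["type"] == "question":
        choice = answer_fn(node["text"], list(node["answers"]))
        node = tree[node["answers"][choice]]
    return node["text"]  # final content/intervention shown to the user

# Simulated user who dislikes needles:
result = run_dialog(tree, "q1", lambda question, options: "I dislike needles")
```

Passing a callback for the user's answer keeps the traversal logic independent of the front-end that renders the dialog.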

In the end, each Decision Tree can result in an intervention or trigger another
Decision Tree, allowing complex and comprehensive rules and support to be built from
easily definable components. In the POWER2DM project, we specified five decision
trees for coping with barriers regarding glucose monitoring, exercise, carbohydrates,
insulin and stress, which consist of up to 30 sub-trees.

4. Conclusions

In this paper, we presented the approach of the Action Plan Engine developed as a
component in the POWER2DM project. An early prototype of POWER2DM, including
the Action Plan Engine, is currently implemented and being evaluated in a randomised
trial with a total follow-up of 9 months and 230 patients (115 with type 1 diabetes, 115
with type 2 diabetes) in pilot applications in the Netherlands and in Spain. The trial aims at evaluating the
acceptance rate and effectiveness of the presented interventions as well as the HbA1c
levels in comparison to a reference group. Although continuous feedback from
physicians, psychologists and patients helps to improve the prototype, it is currently too
early to present reliable evaluation results. So far, patients are enthusiastic about the idea
of a holistic self-management support system, receiving support and feedback on
physiological, behavioural and psychological parameters.
The Action Plan Engine is currently part of the POWER2DM system, integrating with
the FHIR-based storage and the provided single-sign-on solution. However, due to its
modular design, the Action Plan Engine can easily be transferred to and used in other
e-health projects focussing on different diseases. Connection to and integration with
third-party health systems is also possible due to the use of standardised interfaces. This
also allows more natural input methods, such as voice recognition, to be integrated for
some exercises (e.g. decision trees).

References

[1] IDF, Across the Globe, http://www.diabetesatlas.org/across-the-globe.html, last access: 9.1.2019.
[2] Castro, E., Regenmortel, T., Vanhaecht, K., Sermeus, W., Hecke, A., Patient empowerment, patient
participation and patient-centeredness in hospital care: A concept analysis based on a literature review,
Patient Education and Counseling 99(12) (2016), 1923-1939.
[3] Schmuhl, H., Demski, H., Lamprinos, I., Dogac, A., Ploessnig, M., Hildebrand, C., Concept of knowledge-
based self-management pathways for the empowerment of diabetes patients, EJBI 10(2) (2014), 12-16.
[4] Elder, J., Ayala, G., Harris, S., Theories and intervention approaches to health-behavior change in primary
care, American Journal of Preventive Medicine 17(4) (1999), 275-284.
[5] Kumar, D., Best Practices for Building RESTful Web Services,
https://www.infosys.com/digital/insights/Documents/restful-web-services.pdf, last access: 14.1.2019.
[6] Namli, T., Köse, O., POWER2DM D4.2 Personal Data Store Service Implementation,
http://power2dm.eu/wp-content/uploads/POWER2DM-D4.2.pdf, last access: 22.3.2019.
[7] Namli, T., POWER2DM D1.3 Conceptual Design of the POWER2DM Architecture,
http://www.power2dm.eu/wp-content/uploads/Power2DM-D1.3-1.pdf , last access: 24.1.2019.
[8] Abraham, Ch., Michie, S., A Taxonomy of Behavior Change Techniques Used in Interventions by the
American Psychological Association, Health Psychology 27(3) (2008), 379–387.
[9] Joshi, A., Kale, S., Chandel, S., Pal, D., Likert Scale: Explored and Explained, British Journal of Applied
Science & Technology 7(4) (2015), 396-403.
162 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-162

Topics for Continuous Education in Nursing Informatics: Results of a Survey Among 280 Austrian Nurses

Elske AMMENWERTH a,1 and Werner O. HACKL a
a Institute of Medical Informatics, UMIT – University for Health Sciences, Medical Informatics and Technology, Hall in Tirol, Austria

Abstract. Background: Nurses are increasingly confronted with IT-based systems
as part of their daily work. However, they often lack basic competencies in
managing these complex systems. Objectives: To analyze the need for continuous
education in health informatics among Austrian nurses. Methods: Survey within five
of the largest healthcare organizations in Austria. Overall, 280 nursing practitioners
with IT responsibilities and nursing managers from middle and top management
participated. Results: Participants assessed five topics (IT project management, IT
in nursing, eHealth, nursing terminologies, and computer science basics) as
important for continuous education in health informatics. Top management rated the
importance of most topics higher than middle management did. Nursing
practitioners gave ratings in between middle and top management. Conclusion:
Austrian nursing practitioners with IT responsibilities and nursing managers see a
need for continuous education in health informatics. This supports findings of
international recommendations of nursing informatics continuous education. There
is, however, a lack of suitable opportunities for continuous education in Austria.

Keywords. nursing informatics, competency-based education, needs assessment

1. Introduction

Health care nowadays is unthinkable without the use of modern information and
communication technologies. In Austria, nurses and other health care professionals
routinely use IT-based tools such as electronic medical records, computerized physician
order entry systems, patient data management systems or mobile documentation tools in
their daily work. In the future, also health information exchange between institutions will
have an increasing impact on nursing [1]. Managing these complex socio-technical
information systems and the increasing volume of patient information is not a trivial task
[2]. Studies show challenges related to introduction and use of IT systems in nursing,
such as inefficient workflow support [3], low usability [4] and limited evidence on the
impact of nursing IT on quality of care and patient outcome [5].
These problems at least partly originate in the fact that nurses and other health care
professionals have insufficient competencies in dealing with these challenges; they seem
to have difficulties, for example, to express their requirements, to contribute to system

1 Corresponding Author: Elske Ammenwerth, Institute of Medical Informatics, UMIT – University for Health Sciences, Medical Informatics and Technology, EWZ 1, 6060 Hall in Tirol, Austria, E-Mail: [email protected]
E. Ammenwerth and W.O. Hackl / Topics for Continuous Education in Nursing Informatics 163

implementation and system testing, to prepare for IT-based workflow changes, and to
establish an adequate change management and communication policy.
In hospitals, these clinical IT projects are thus often coordinated by other professional
groups, such as health informaticians, who are adequately trained for projects of this type.
However, such projects have a higher chance to succeed with close end-user involvement.
Therefore, clinical IT projects seem not well manageable without a close cooperation
between IT staff (both in the IT department of the health care organizations and at the
vendors side) and clinical staff (such as nurses and other health care professionals).
When we look at the situation in Austria, nurses are mostly not well equipped to
contribute to system analysis, system specification, system selection, system
implementation, and system evaluation, as informatics competencies are quite limited
among Austrian nurses. Nursing informatics competencies are typically not part of
nursing education, nor do adequate continuous education opportunities exist. We see the
same situation in Germany and Switzerland [6].
In other countries, nursing informatics seems much better integrated into nursing
education and continuous education. For example, countries such as Australia [7] offer
a dedicated career path in nursing informatics. This is not the case in Austria. However,
in Austria too, more and more nurses contribute to IT projects and need additional IT-
related competencies.
We were thus interested in better understanding whether nurses with IT
responsibilities and nursing managers themselves see a need for further education in
nursing informatics, and, if so, in which topics. The objective of this study is thus to
analyze the need for continuous education in health informatics among nurses in Austria.
Our motivation was to use this information to design a tailored continuous education
program in health informatics for nurses and other health care professionals.

2. Methods

Chief nursing managers of five of the largest Austrian health care organizations (AUVA,
GESPAG, SALK, KAGES, Tirol Kliniken) were contacted. All agreed to participate. A
survey was prepared, covering 5 major topics and 52 sub-topics of nursing informatics.
The list of topics and sub-topics was developed based on a literature review of
international recommendations on health informatics education of several institutions,
including: Australian Health Informatics Education Council [7], Global Academic
Curricula Competencies for Health Information Professionals [8], Technology
Informatics Guiding Education Reform [9], Canadian Association of Schools of Nursing
[10] and International Medical Informatics Association [11].
For each topic and sub-topic, we asked the following question: “Would you be
interested in continuous education for nursing staff involved in IT projects?”
For the five major topics, a yes/no answer was possible. For the 52 sub-topics, a 4-
point scale was used to document the answer (1 = not interesting, 4 = interesting).
The survey was distributed in all five participating health care institutions using a
snowball system. It was conducted online; only one health care institution used
paper-based questionnaires. Nursing managers on different
hierarchical levels as well as nursing practitioners involved in IT projects were invited
to participate. Participation was voluntary and fully anonymous. Survey results were
analyzed using SPSS.

3. Results

Overall, 330 questionnaires were returned, with responses coming from nurses in a broad
range of professional positions. First results of this broader survey have already been
published [12]. Now, for this paper, we focus on a more detailed analysis of the responses
of nurses with additional IT responsibilities and of nurses in middle or higher
management roles. These roles were chosen, as the continuous education program that
we planned would target nurses with IT responsibilities, so we were interested in their
opinion. Also, as middle and top management have to approve such education, we were
interested in their judgment as well. Overall, 280 respondents came from these groups and
were thus included in the analysis. Table 1 shows the participants and their professional
position.

Table 1. Survey participants and their professional position

Professional position | Number of participants | %
Nurses in direct patient care with additional IT responsibilities (e.g. IT key user) | 51 | 18.2%
Middle nursing management (e.g. nursing ward manager) | 201 | 71.7%
Senior nursing management (e.g. nursing director, nursing manager) | 28 | 10.0%

For all five main topics, the overall answers showed high interest for continuous
education: IT in nursing (overall interest: 92% of all responses); IT project management
(85%); eHealth technologies (85%); nursing terminologies (83%); computer science
basics (81%).
Figure 1 shows the answer regarding the five main topics for each surveyed
professional role. All three groups show comparable support for all five topics. Nurses
with additional IT responsibilities show least interest in nursing terminologies and
computer science basics. Middle management shows least interest in computer science
basics and eHealth technologies. Top management shows least interest in computer
science basics and IT project management.

Figure 1. Interest in continuous education for five main topics in health informatics, dependent on the
professional role of survey participants (n = 280). To highlight the trends, the category “no answer” is not
presented. Only yes/no answers were possible for these five major topics.

Figure 2. Interest in continuous education for 52 sub-topics in health informatics, dependent on the
professional role of survey participants (n = 280). To highlight the trends, the category “no answer” is not
presented. Answers of 1 (interesting) and 2 (partly interesting) are combined and presented in green; answers
3 (partly uninteresting) and 4 (uninteresting) are combined and presented in red.

Figure 2 shows the answers of the 280 respondents for the 52 sub-topics. To allow
better identification of the most interesting topics, answers were classified as
“interesting” versus “not interesting”, and “no answer” responses were omitted. These
answers were comparable between the five participating health care institutions, thus
sub-group analysis is not presented.
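The dichotomization used for Figure 2 can be sketched as follows; which half of the 4-point scale counts as "interesting" is taken from the Figure 2 caption (answers 1 and 2), and "no answer" responses (here encoded as None) are omitted as described.

```python
from collections import Counter

# Collapse 4-point responses into "interesting" vs "not interesting" shares,
# dropping "no answer" (None). The cut-point between the two halves of the
# scale follows the Figure 2 caption and is an assumption for illustration.
def dichotomize(responses):
    counts = Counter()
    for r in responses:
        if r is None:
            counts["no answer"] += 1
        elif r in (1, 2):
            counts["interesting"] += 1
        else:
            counts["not interesting"] += 1
    answered = counts["interesting"] + counts["not interesting"]
    return {"interesting": counts["interesting"] / answered,
            "not interesting": counts["not interesting"] / answered}

shares = dichotomize([1, 2, 2, 4, None, 3])
# shares["interesting"] is 0.6 (3 of 5 answered responses)
```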
Results show some differences between the professional roles. For example, demand
for nursing terminologies was 80% for nurses with IT duties, 92% for nurses in middle
management and 100% for nurses in higher management. In general, as Figure 2 shows,

top management considered most of the topics to be of higher importance than middle
management did. Nurses with IT responsibilities showed greater interest in most topics
than middle management, but somewhat less than top management.
Several sub-topics reached the highest interest among all participants. The following
list shows the sub-topics with the highest support across all three groups, ordered by topic:
- IT project management: Standardization and optimization of nursing workflows; process and change management in nursing; usability of IT systems.
- IT use in nursing: Electronic patient records; electronic nursing documentation systems; electronic medication systems.
- eHealth technologies: Importance of eHealth for nursing; importance of electronic health records; mobile IT tools in nursing.
- Nursing terminologies: Legal basis for electronic patient records; development of nursing documentation systems; ensuring the quality of nursing documentation.
- Computer science basics: Creating and using small databases; legal requirements for data privacy and security.

A sub-group analysis of “no answer” responses showed some interesting differences
between the three groups. The mean percentage of “no answer” over all sub-topics was
25% for nurses with additional IT responsibilities, 19% for middle management, and
only 15% for top management. This indicates that the higher the position, the clearer
the opinion on interesting topics. Items with the highest “no answer” percentages were
“creation of an IT functional specification” for nurses with additional IT responsibilities
(38% “no answer”), “electronic surgical documentation systems” for middle
management (44%) and “programming simple computer programs” for top
management (26%), indicating topics the respective professional roles are probably not
familiar with and thus cannot judge.

4. Discussion

Our survey supported a clear interest in continuous education in the indicated topics and
sub-topics. We included only nursing practitioners with IT responsibilities as well as
middle and top nursing managers in the analysis. We focused on these groups as we see
them as the target audience and important stakeholders for a planned continuous
education program in health informatics. A survey of nursing practitioners without IT
responsibilities may certainly have yielded different results.
All groups showed high interest in most of the presented topics. Interest correlated
with the professional position: Top nursing managers mostly showed a stronger interest
in most topics than middle nursing management did. This may reflect a better understanding
of the strategic benefits and challenges of eHealth technologies in nursing, as responding
to these challenges demands a well-trained nursing workforce.
Nursing practitioners with IT responsibilities also showed high interest in most sub-
topics. Their preferences, however, partly differed from those of top nursing managers.
For example, while top management showed interest in project management and IT
specifications, nursing practitioners showed much lower interest in these topics. In turn, nursing
practitioners showed large interest in interfaces (such as HL7) which was not a topic of
interest for top managers. This may reflect the different – operational versus strategic –

perspectives on information management in nursing. All groups quite consistently
showed less interest in topics related to the basics of computer science. Here,
respondents did not consider basic computer skills necessary for dealing with complex
applications such as electronic patient records.
The strength of this survey is the relatively large number of 280 participants from
five of the largest health care organizations in Austria. As a limitation, we used a
convenience sample rather than a random sample and recruited participants via a
snowball system. Mainly participants with an interest in eHealth topics may have
volunteered to participate in the survey; this may have led to a selection bias, resulting
in a higher reported demand for continuous education. On the other hand, the answers
were comparable across all five participating institutions, indicating some validity of
the findings. Our survey participants came from larger health care organizations;
outpatient and home-care nurses were not included in this survey.
Our results are supported by a recent larger study to develop nursing informatics core
competencies for nurses in the DACH region (Germany, Austria, Switzerland) [6]. In
this study, based on a literature survey, an expert survey and expert focus groups with 87
experts, 24 core nursing informatics competencies were developed and validated for five
different nursing professional roles (such as clinical nurse or nursing manager). In this
study, highest rated competency areas for nursing managers were, among others, nursing
documentation (including terminologies), process management, and project management.
Highest rated competencies for clinical nurses were nursing documentation (including
terminologies), data protection and security, information management in nursing, and IT
and ethics. The recommended core competencies were thus quite comparable with our
results. While this DACH study was based on expert opinions, our survey was based on
opinions of nursing practitioners and nursing managers, the two surveys together thus
providing a good view of the needed competencies.
On an international scale, nursing informatics has already been recognized as an
important factor for future innovation in health care [13]. However, nursing graduates
have been found to be inadequately prepared for nursing informatics [14]. International
position papers stress the need to provide nurses with nursing informatics competencies
and propose to define the Chief Nursing Informatics Officer (CNIO) as a new nursing
management role [15]. Our survey thus seems quite timely, as it reflects the competencies
needed to build up a competent nursing informatics workforce.
In our survey, corresponding to these proposals, nursing managers showed great
interest in continuous education for their workforce. However, in Austria, there is a lack
of suitable opportunities for continuous education in nursing informatics. Table 2
summarizes some of these opportunities, with a focus on part-time programs. The master
program at UMIT has just started and was built, among other things, on the results of this survey.
All master programs typically require a bachelor degree. Nursing education in
Austria is only now moving toward academic degrees, so many nursing practitioners
without a bachelor degree cannot enroll in these master programs.
Table 2. Some opportunities for part-time continuous education in nursing informatics in Austria.

University for Health Sciences, Medical Informatics and Technology (UMIT): 3-day short introductory course "Applied Nursing Informatics". Offered since 2012, addresses nurses with interest in IT. Source: www.umit.at/pflegeinformatik

University for Health Sciences, Medical Informatics and Technology (UMIT): 2.5-year part-time master program "Health Information Management". Online-based program; targets nurses and other health care professionals with a bachelor degree, as well as graduates from technical studies. Source: www.umit.at/him

University of Applied Sciences St. Pölten: 2-year part-time master program "Digital Health". Addresses "health experts", including nurses with a bachelor degree, as well as graduates from technical studies. Source: https://www.fhstp.ac.at/de/studium-weiterbildung/medien-digitale-technologien/digital-healthcare

FH Joanneum Graz: 2-year part-time master "eHealth". Addresses, among others, medical-technical or management graduates (with IT knowledge) and graduates from technical studies. Source: https://www.fh-joanneum.at/ehealth/master

5. Conclusion

Austrian nursing practitioners and nursing managers show a strong interest in
continuous education in health informatics. This supports findings of other international
surveys. There is, however, a lack of suitable opportunities for continuous education in
Austria.
The results of the survey have been used to design a new master program in Health
Information Management at our University [16]. This master program is fully online and
thus especially suited for continuous education of health care professionals. We will
carefully monitor the participants and their professional background in the future to
determine whether this educational offer is accepted among nurses and other health care
professionals.

References

[1] G.L. Alexander, M. Rantz, C. Galambos, A. Vogelsmeier, M. Flesner, L. Popejoy, J. Mueller, S.
Shumate, and M. Elvin, Preparing Nursing Homes for the Future of Health Information Exchange,
Appl Clin Inf. 6 (2015) 248–266.
[2] E. Berner, and J. Moss, Informatics challenges for the impending patient information explosion, J
Am Med Inf. Assoc. 2 (2005) 614–7. doi:10.1197/jamia.M1873.
[3] M. Yeung, S. Lapinsky, J. Granton, D. Doran, and J. Cafazzo, Examining nursing vital signs
documentation workflow: barriers and opportunities in general internal medicine units, J Clin Nurs.
21 (2012) 975–82. doi:10.1111/j.1365-2702.2011.03937.x.
[4] J. Viitanen, A. Kuusisto, and P. Nykänen, Usability of electronic nursing record systems: definition
and results from an evaluation study in Finland, Stud Heal. Technol Inf. 164 (2011) 333–8.
[5] M. Ko, L. Wagner, and J. Spetz, Nursing Home Implementation of Health Information Technology:
Review of the Literature Finds Inadequate Investment in Preparation, Infrastructure, and Training,
Inquiry. Jan-Dec (2018). doi:10.1177/0046958018778902.
[6] N. Egbert, J. Thye, W.O. Hackl, M. Müller-Staub, E. Ammenwerth, and U. Hübner, Competencies
for nursing in a digital world. Methodology, results, and use of the DACH-recommendations for
nursing informatics core competency areas in Austria, Germany, and Switzerland, Inform. Health
Soc. Care. Aug (2018) 1–25. doi:10.1080/17538157.2018.1497635.
[7] AHIEC, Health Informatics - Scope, Careers and Competencies V1.9 (2011).
http://www.ahiec.org.au/docs/AHIEC_HI_Scope_Careers_and_Competencies_V1-9.pdf.
[8] Global Health Workforce Council, Global Academic Curricula Competencies for Health Information
Professionals (2015). http://www.ahima.org/about/~/media/AHIMA/Files/AHIMA-and-Our-
Work/AHIMA-GlobalCurricula_Final_6-30-15.ashx?la=en.
[9] TIGER, The TIGER Initiative - Technology Informatics Guiding Education Reform (2015).
http://thetigerinitiative.org.
[10] CASN, Nursing Informatics - Entry-to-Practice Competencies for Registered Nurses (2013).
http://www.casn.ca/2014/12/nursing-informatics-entry-practice-competencies-registered-nurses-2.
[11] J. Mantas, E. Ammenwerth, G. Demiris, A. Hasman, R. Haux, W. Hersh, E. Hovenga, K.C. Lun, H.
Marin, F. Martin-Sanchez, and G. Wright, Recommendations of the international medical
informatics association (IMIA) on education in biomedical and health informatics, Methods Inf. Med.
49 (2010). doi:10.3414/ME5119.
[12] W.O. Hackl, E. Ammenwerth, and R. Ranegger, Bedarf an Fort- und Weiterbildung in
Pflegeinformatik – Ergebnisse einer Umfrage, Zeitschrift Für Pflegewiss. (2016) 381–387.
doi:10.3936/1354.
[13] E. Hovenga, H. Sinnott, and J. Gogler, Operationalising the National Nursing Informatics Position
Statement, Stud Heal. Technol Inf. 250 (2018) 221–3.
[14] E. Shin, E. Cummings, and K. Ford, A qualitative study of new graduates’ readiness to use nursing
informatics in acute care settings: clinical nurse educators’ perspectives, Contemp Nurse. 51 (2018)
64–76. doi:10.1080/10376178.2017.1393317.
[15] S. Remus, and M. Kennedy, Innovation in transformative nursing leadership: nursing informatics
competencies and roles, Nurs Leadersh. 25 (2012) 14–26.
[16] E. Ammenwerth, W.O. Hackl, M. Felderer, and A. Hörbst, Developing and evaluating collaborative
online-based instructional designs in health information management, Stud Heal. Technol Inf. 243
(2017) 8–12.
170 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-170

Pre-Navigation via Interactive Audio Tactile Maps to Promote the Wellbeing of Visually Impaired People
Mark SCASEa,1, Ed GRIFFINb and Lorenzo PICINALIc
a Division of Psychology, De Montfort University, Leicester, UK
b School of Nursing and Midwifery, De Montfort University, Leicester, UK
c Dyson School of Design Engineering, Imperial College London, London, UK

Abstract. Background: Pre-navigational tools can assist visually impaired people
when navigating unfamiliar environments. Assistive technology products (e.g.
tactile maps or auditory simulations) can stimulate cognitive mapping processes to
provide navigational assistance to these people. Objectives: We compared how
well blind and visually impaired people could learn a map presented via a tablet
computer audio tactile map (ATM) in contrast to a conventional tactile map
accompanied by a text description. Methods: Performance was assessed
with a multiple choice test that quizzed participants on orientation and spatial
awareness. Semi-structured interviews explored participant experiences and
preferences. Results: A statistically significant difference was found between the
conditions, with participants using the ATM performing much better than those
who used a conventional tactile map and text description. Participants preferred
the flexibility of learning with the ATM. Conclusion: This computer-based ATM
provided an effective, easy to use and cost-effective way of enabling blind and
partially sighted people to learn a cognitive map and enhance their wellbeing.

Keywords. cognition, vision, blindness, eHealth

1. Introduction

The health and wellbeing of visually impaired and blind people can be affected by the
ability of these individuals to navigate around the world [1]. The human cognitive
system incorporates several aspects, one of which is spatial navigation. The layout and
arrangement of the environment is encoded and mapped [2-5]. Cognitive processes
underlying this mapping have attracted considerable research [6-10]. Mental models
reflect the spatial layout and locations of objects in the world [11-12] and it is possible
to generate these models by giving individuals physical maps of unknown
environments which act as pre-navigational tools.
Whilst people with visual impairments can navigate familiar places well, they can
have difficulty in new environments [13]. The processes by which both sighted and
visually impaired (including blind) people acquire and process mental imagery have
been found to be similar [14], with some evidence to suggest that blind people
compensate by having better tactile acuity [14-15], voice memory [16] and auditory
localization [17].

1 Corresponding Author: Mark Scase, Division of Psychology, De Montfort University, The Gateway, Leicester, LE1 9BH, UK, E-Mail: [email protected]
Assistive technology can be used to enhance the wellbeing of blind and visually
impaired people [18, 1]. Tactile maps, haptic navigation and global positioning
systems can promote cognitive mapping and help blind and visually impaired
individuals navigate [19-21]. Pre-navigational aids incorporating tactile components
such as Braille can provide information for a map [22] and have been received well by
visually impaired users [23]. Furthermore, representations about objects or features
can come in tactile form either via embossed paper or devices with pins and haptic
feedback [24], producing vibration when an area of interest is touched. The
disadvantage of tactile maps including Braille is that this tactile print requires more
space than conventional text. Therefore, the amount of haptic information that can be
presented for a certain map size is less than for a printed map.
An alternative to Braille is a tactile map with audio feedback or audio description
[25]. A system with audio-tactile interaction can improve blind user satisfaction [26]
and can help with non-visual navigation [27]. Combining tactile and audio stimuli can
produce a flexible learning system. Individuals with visual impairments prefer
navigation via route-like descriptions [28] particularly if they can have the flexibility to
construct their own route from a personal mental model of the cognitive map.
A paper tactile map was combined with a tablet computer by O’Sullivan et al. [22]
to produce a prototype audio tactile map (ATM). The map was printed onto swell
paper, which was then heated to create raised ridges corresponding to the map.
This paper was then overlain onto the tablet. When users touched the paper the tablet
detected the presses and could give audio feedback. Through a user-centered design
with visually impaired people this system was able to give a multimodal experience
and be a pre-navigational tool for individuals with visual impairments. This paper is an
extension of the research originally conducted by O’Sullivan et al. with the aim to
assess the effectiveness of this ATM amongst visually impaired individuals.
The aim of the current study was to compare an ATM prototype providing a
flexible learning environment with a conventional tactile map that was accompanied by
a route-based audio description. Learning effectiveness would be assessed by mixed
methods, both quantitatively and qualitatively through interviews. Since ATMs are
multimodal in nature, allowing for multiple learning styles, it was predicted that
participants using an ATM would acquire and retain knowledge of a map better than
those using a tactile map accompanied by an audio description.

2. Methods

2.1. Participants

Fourteen volunteers (eight male; six female) with congenital (n=6) or acquired (n=6)
visual impairments (two undisclosed) took part in the study. All participants reported
visual impairments ranging from mild (i.e. low vision assisted by lenses, but not
entirely corrected) to complete blindness (i.e. no perception of light). Ages ranged from
30 to 65 years (M=48.8; SD=14.4). Participants were randomly assigned to either an
experimental group (Condition 1) or a control group (Condition 2) using a matched
pairs approach. Matching was based on participant severity of visual impairment (e.g.
mild, moderate, severe, blind).
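The matched-pairs randomization described above can be sketched in a few lines. This is an illustrative sketch only; the participant labels, severity pairings, and random seed below are our invented assumptions, not the study's actual allocation:

```python
# Illustrative sketch of matched-pairs random assignment: participants are
# paired by severity of visual impairment, then one member of each pair is
# randomly allocated to each condition. All data here are hypothetical.
import random

participants = [
    ("P1", "mild"), ("P2", "mild"),
    ("P3", "moderate"), ("P4", "moderate"),
    ("P5", "severe"), ("P6", "severe"),
    ("P7", "blind"), ("P8", "blind"),
]

def assign_matched_pairs(people, rng):
    """Split a severity-sorted list into two matched groups."""
    condition1, condition2 = [], []
    # Consecutive participants share a severity label and form a pair.
    for i in range(0, len(people), 2):
        pair = [people[i], people[i + 1]]
        rng.shuffle(pair)              # randomize within the matched pair
        condition1.append(pair[0][0])
        condition2.append(pair[1][0])
    return condition1, condition2

c1, c2 = assign_matched_pairs(participants, random.Random(42))
print(c1, c2)
```

Because assignment is randomized only within each pair, both groups end up with the same distribution of impairment severity.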
2.2. Theoretical Background & Design

A mixed-methods approach was used combining quantitative and qualitative
procedures [29], where a quantitative experimental study (Phase 1) was followed by
semi-structured interviews with the same participants (Phase 2) [30]. The quantitative
component informed the qualitative strand by providing a focal point for discussion,
whilst the qualitative elements helped explain factors contributing to group differences.
In Phase 1, one group (Condition 1) was exposed to the ATM for five minutes,
whilst the other was exposed to an identical non-interactive tactile map accompanied
by a verbal description of a journey through that map. Following a 1-minute break
participants completed 20 multiple choice questions assessing orientation and spatial
awareness. Both map conditions were based upon the same fictional building of a
health club and the same information was provided to participants in both conditions.
However, the interface in which participants received the information differed.
Phase 2 involved the collection of qualitative data via semi-structured interviews.
Participants were asked a series of questions about the experiment and responses were
analyzed thematically by: 1) Data familiarization, 2) Coding, 3) Searching for themes,
4) Reviewing the themes, 5) Defining and naming themes [31].

2.3. Materials

A fictitious map depicting a health club (Figure 1) was used for both conditions
because of its distinctive rooms (e.g. Swimming Pool Room, Gym, Café, Sauna etc.)
with unique atmosphere, sound and sensory stimuli. This map was incorporated into an
ATM (Condition 1) and a conventional tactile map with an accompanying description
(Condition 2). Both maps provided the same detail and information.
In the ATM condition a tactile map was printed onto swell paper where the
internal and external walls and door spaces were embossed. The paper had a QR code
printed in one corner which, when scanned by the tablet computer camera, loaded data
into the tablet. This paper was attached to the tablet screen (9.5” x 7.3”), and the system
included sound, audio description and acoustic-click feedback to reflect the room size and
acoustics. Interaction with the ATM could be in three ways: i) Moving the finger inside
a room activated its corresponding background noises (bold text; see Figure 1); ii)
Tapping twice inside the room activated the playback of text-to-speech auditory
information about the room (italic text; see Figure 1); iii) Tapping three times inside
the room activated the acoustic-click feedback. This option simulated how a finger
clicking noise would have sounded in the room. Audio production software was used to
simulate the appropriate echo feedback for the room size (i.e. larger space = longer
delay). Reverberation was applied to the sound based on room size and the materials
typically present within the room. For example, the large swimming pool room
generated an echo with a longer delay than the smaller sauna area. The pool room also
had a higher level of reverberation than the other rooms, to simulate the effects of hard
surfaces typically found in a swimming pool area. Conversely, the small shop area had
a comparatively short delay and less reverberation due to containing items for sale.
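The three tap gestures and the size-dependent echo can be pictured as a small dispatcher. The following sketch is illustrative only: `Room`, `echo_delay_ms` and the linear delay model are our assumptions, not the prototype's actual implementation:

```python
# Hypothetical sketch of the ATM tap-gesture scheme described above:
# 1 tap -> background sounds, 2 taps -> text-to-speech description,
# 3 taps -> acoustic-click feedback with a room-size-dependent echo.
from dataclasses import dataclass

@dataclass
class Room:
    name: str
    area_m2: float  # used to scale the simulated echo delay

def echo_delay_ms(room: Room) -> float:
    """Toy linear model: larger rooms get a longer simulated echo delay."""
    return 20.0 + 2.0 * room.area_m2

def handle_taps(room: Room, tap_count: int) -> str:
    """Map a tap count inside a room to the feedback mode described above."""
    if tap_count == 1:
        return f"play background sounds of {room.name}"
    if tap_count == 2:
        return f"speak text-to-speech description of {room.name}"
    if tap_count == 3:
        return f"play acoustic click with {echo_delay_ms(room):.0f} ms echo"
    return "ignore gesture"

pool = Room("Swimming Pool Room", 80.0)
sauna = Room("Sauna", 10.0)
print(handle_taps(pool, 1))
print(handle_taps(pool, 3))  # larger room -> longer echo than the sauna
```

In the real prototype the reverberation level would additionally depend on the room's typical materials (hard pool surfaces versus absorbent shop stock), which this size-only toy model deliberately omits.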
The second condition of a tactile map with verbal description involved five
minutes’ exposure to a non-interactive paper tactile map of the same fictional health
club. A verbal description of a journey through the map included a description of the
shape and size of the room, the background sounds and the objects within the room (e.g.
‘You take the door on your left and enter a large 10 x 8 metre room containing a
rectangular swimming pool and a walkway running around its edge... You leave this
room by the door in which you entered it’). The sequential journey took participants
through all of the rooms. Participants were required to start at the elevator and trace
their journey using the tactile map.

Figure 1. A representation of the map developed for this experiment including description of rooms and
sound effects.

A series of 20 multiple-choice questions was developed to examine the
recollection and knowledge of the environment, assessing aspects of orientation and
spatial awareness. Four questions assessed aligned directional awareness using cardinal
spatial awareness. Four questions assessed aligned directional awareness using cardinal
points (‘You are standing in the Reception facing north. In what direction is the Shop?’
[North, South, East, West]). Four questions assessed aligned directional awareness
using ordinal points (‘You are standing in the centre of the Gym facing north. In what
direction is the Cafe?' [North-East, North-West, South-East, South-West]). Four
questions assessed misaligned directional awareness using subjective orientation (‘You
are in the swimming pool room facing East. In what direction is the Bar?’ [In front,
Behind, to your left, to your right]). Four questions examined map memory by asking
participants about the fewest number of doors they would need to travel through to get
from one room to another (‘You are in the Changing Facility and you want to get to the
Sauna. What is the smallest number of doors that you would need to travel through?’ [1,
2, 3, 4]). Four questions examined room size by asking participants to identify the
larger or smaller of two rooms ('Which room is the largest?' [Gym or
Jacuzzi Room]).

2.4. Procedure and Analysis

Phase 1: After five minutes’ exposure to one of the map conditions, participants
completed 20 multiple choice questions, for which scores between the conditions were
compared. As the questions had the potential effect of contributing to the participants'
learning of the map, the questions were asked in the same order to all participants.
After completing the questions and undergoing a de-briefing, participants were invited
to explore the ATM prototype prior to Phase 2 of the study.
Phase 2: After completing Phase 1 of the study and spending time with the ATM
prototype, participants were asked a series of questions on their experience and views
on the ATM. Data were analyzed thematically using node and tree-node functions.
The study was approved by the Faculty of Technology Research Ethics Committee,
De Montfort University, Leicester, UK (reference: 1415/297, chair Bernd Stahl).

3. Results

3.1. Phase 1 Analysis

A Mann-Whitney U-test identified that the overall scores for the 20 multiple-choice
questions were significantly higher for Condition 1 (Md=15, n=7) than Condition 2
(Md=13, n=7), U=11.50, z=-1.68, p=.042 (one-tailed), r=.45, indicating a medium to
large effect size using Cohen's 1988 criteria (i.e. .3=medium, .5=large).

Figure 2. A Box plot showing the median overall scores on the multiple choice test for both conditions.
Therefore, participants performed better with the ATM than with a conventional
tactile map and verbal description.
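The analysis above can be illustrated with a hand-rolled Mann-Whitney U statistic, a normal approximation for z, and the effect size r = |z| / sqrt(N) used in the paper. The two score lists below are invented placeholders (the paper reports only the summary statistics), so the printed values are illustrative and do not reproduce the study's U=11.50:

```python
# Hand-rolled Mann-Whitney U with a normal approximation for z, plus the
# effect size r = |z| / sqrt(N). The score lists are hypothetical, NOT the
# study data; only the summary statistics appear in the paper.
import math

def mann_whitney(x, y):
    """Return (U for x over y, z via normal approximation, effect size r)."""
    n1, n2 = len(x), len(y)
    # U counts pairwise wins of x over y; ties count as 0.5.
    u = sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0
            for xi in x for yi in y)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    r = abs(z) / math.sqrt(n1 + n2)  # effect size as reported in the paper
    return u, z, r

atm = [15, 16, 14, 15, 17, 13, 15]      # Condition 1 scores (hypothetical)
tactile = [13, 12, 14, 13, 11, 13, 12]  # Condition 2 scores (hypothetical)
u, z, r = mann_whitney(atm, tactile)
print(f"U={u:.1f}, z={z:.2f}, r={r:.2f}")
```

Note that the reported effect size is consistent with this formula: |z|/sqrt(N) = 1.68/sqrt(14) ≈ 0.45.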

3.2. Phase 2: Qualitative Analysis

The experience and views of participants on the ATM were explored in the second
phase. Nearly all participants used language suggesting the ATM would be useful and
beneficial to them, with three themes emerging.

3.2.1. Theme 1: The value of flexible learning.


This theme reflects a positive attitude toward the ATM's multiple approaches to
navigational assistance. Participants from both Conditions reflected positively about
the combination of stimuli available with the ATM: “I like this one better because
you’ve got lots of different ways of learning about the rooms” (Condition 2: Male);
“It’s better for my way of learning, having different options to choose from”
(Condition 2: Male); “The first one gives you a path, but this one, you can choose your
own” “If you forget where you’ve been, you can just go back” (Condition 1: Male); “It
is much better having the sound effects than having someone just reading a description,
it is really good to be able to hear the sounds from each room” (Condition 1: Male)

3.2.2. Theme 2: An intuitive and fun approach to learning


Six participants used language suggesting they found the ATM easy to use and
accessible. One participant commented on how intuitive she found the system: “It is
good fun, and it is easy to use. Having the sounds play makes it feel real and it results
in it feeling very intuitive”. (Condition 1: Female); “We’re used to tablet PCs, so
having something that uses the features makes it nice to use. I don’t feel like I have to
relearn much”. (Condition 1: Female); “This is much more fun to use than the other
one. The sounds make it more enjoyable” (Condition 2: Male)

3.2.3. Theme 3: Recommendations for developing and improving the ATM.


Whilst the ATM prototype was designed as a pre-navigation tool, two participants
asked if it was compatible with global positioning system (GPS) technology: "I would
find this really useful if it had a GPS built into it and it could tell me where I was. You
might want to think about doing that" (Condition 1: Male). There was a suggestion on
how to improve the alignment of the embossed map on the tablet: "I think a plastic clip
that could go on the front of the iPad would make it easier for us to position the map on
the screen" (Condition 1: Male). It was also suggested that objects of interest in rooms
could be activated by an alternative tap gesture. These comments link to the notion of a
flexible system accommodating the needs of individuals.

4. Discussion

Visually impaired and blind individuals can experience challenges when navigating
unfamiliar environments [13] and tactile maps can help [23], particularly with verbal or
audio feedback [25-26]. This study compared an ATM to a more conventional, verbally
annotated tactile map [25] with a sequential journey format [28]. People using the
ATM had a statistically significant higher overall score on the assessment of their
recollection and cognitive mapping of the fictitious environment. This result suggested
that the ATM was a more effective system for spatial recall than the tactile map
accompanied by a text description. A journey approach to learning an environment is
more effective than survey-based approaches among visually impaired individuals [28].
Conventional tactile maps with an audio description (like Condition 2) offer such a
method but learning is linear and in a fixed sequence. The ATM system however,
offered participants a flexible way of learning an environment allowing for both
journey and survey strategies. Participants appeared to use both strategies when
learning the environment. The map in Condition 2 contained the same information that
was presented in the ATM condition with no sound effects but rather the verbal
description of the rooms. The ATM condition provided a more multimodal approach to
learning. Blind individuals can have better memory of some auditory information [16-
17] and so the ATM might have contributed to the production of a more detailed
cognitive map.
Qualitative feedback included recommendations for improving the system which
could be incorporated into further development. The multiple approaches to learning an
environment offered by the ATM accommodated diverse learning needs of individuals.
The flexibility to learn an environment appeared important and might have been a
factor in the better recall experienced in Condition 1. Furthermore, the enjoyable
aspects of the ATM may have increased motivation and engagement among users.
An ATM system allowing multimodal learning from both survey and route
perspectives yields superior performance in the encoding and retrieval components of
cognitive mapping, suggesting that this system is an effective pre-navigation tool
among individuals with visual impairments. The provision of assistive technology has
enabled people with disabilities to be less challenged by their environment. Mobility
and navigation aids can improve the wellbeing of visually impaired people [33] and the
flexible learning approach of the ATM may be a valuable addition to future assistive
technology developers. The use of a QR code printed on the swell paper linking to
mapping data for the tablet increases flexibility and sustainability of this ATM. This
project was linked with a local organization, Vista Blind (www.vistablind.org.uk), from
which some participants were recruited. Dissemination of this ATM technique will be
promoted initially locally and could lead to further enhancement of the ATM.

Acknowledgment. The authors thank Paul Thornton for helpful discussions.

References

[1] R. Hewett, G. Douglas, S. Keil, Wellbeing of Young People with Visual Impairments. Visual Impairment
Centre for Teaching and Research, University of Birmingham, Birmingham 2015.
[2] S.M. Kosslyn, Image and Mind, Harvard University Press, Cambridge Massachusetts, 1980.
[3] B. Tversky, Spatial Mental Models, in: The Psychology of Learning and Motivation: Advances in
Research and Theory. Academic Press Inc, San Diego, 1991. pp. 109-145.
[4] B. Tversky, Distortions in memory for maps, Cognitive Psychology 13(3) (1991), 407-433.
[5] C. Campus, L. Brayda, F. De Carli, R. Chellali, F. Famà, C. Bruzzo, L. Lucagrossi, G. Rodriguez, Tactile
exploration of virtual objects for blind and sighted people: the role of beta 1 EEG band in sensory
substitution and supramodal mental mapping, Journal of Neurophysiology 107(10) (2012), 2713-2729.
[6] R. Kupers, D.R. Chebat, K.H. Madsen, O.B. Paulson, M. Ptito, Neural correlates of virtual route
recognition in congenital blindness. PNAS 107(28) (2010), 12716-12721.
[7] E.C. Tolman, Cognitive maps in rats and men, The Psychological Review 55(4) (1948), 189-208.
[8] C. Eden, Cognitive mapping, European Journal of Operational Research 36(1) (1988), 1-13.
[9] J. O’Keefe, L. Nadel, The Hippocampus as a Cognitive Map, Oxford University Press Oxford UK, 1978.
[10] N.J. Cohen, H. Eichenbaum, The theory that wouldn't die: A critical look at the spatial mapping theory
of hippocampal function, Hippocampus 1(3) (1991) 265-268.
[11] P.N. Johnson-Laird, Mental Models, in: Foundations of Cognitive Science. MIT Press, Cambridge,
Massachusetts, USA, 1989. pp. 467-499.
[12] Y. Bestgen, V. Dupont, The construction of spatial situation models during reading, Psychological
Research 67(3) (2003), 209-218.
[13] J.R. Marston, R.G. Golledge, The hidden demand for participation in activities and travel by persons
who are visually impaired, Journal of Visual Impairment & Blindness 97(8) (2003), 475-488.
[14] Z. Cattaneo, T. Vecchi, C. Cornolodi, I. Mammarella, D. Bonino, E. Ricciardi, P. Pietrini, Imagery and
spatial processes in blindness and visual impairment, Neuroscience and Biobehavioral Reviews 32(8)
(2008), 1346-1360.
[15] D. Goldreich, I.M. Kanics, Performance of blind and sighted humans on a tactile grating detection task,
Perception & Psychophysics 68(8) (2006), 1363-1371.
[16] B. Röder, H.J. Neville, Developmental functional plasticity, in: Handbook of Neuropsychology:
Plasticity and Rehabilitation. Elsevier Science, Amsterdam, 2003. pp. 231-270.
[17] B. Röder, W. Teder-Sälejärvi, A. Sterr, F. Rösler, S.A. Hillyard, H.J. Neville, Improved auditory spatial
tuning in blind humans, Nature 400 (1999), 162-166.
[18] M.A. Hersh, M.A. Johnson, Assistive Technology for Visually Impaired and Blind People, Springer-
Verlag London Ltd, London, 2008.
[19] U.R. Roentgen, G.J. Gelderblom, M. Soede, L.P. de Witte, Inventory of electronic mobility aids for
persons with visual impairments: a literature review, Journal of Visual Impairment & Blindness
102(11) (2008), 702-724.
[20] S. Ertan, C. Lee, A. Willets, H. Tan, A. Pentland, A wearable haptic navigational guidance system,
Digest of the 2nd International Symposium on Wearable Computers (1998), 164-165.
[21] R.D. Jacobson, R.M. Kitchin, GIS and people with visual impairments or blindness: Exploring the
potential for education, orientation, and navigation, Transactions in GIS, 2(4) (1997), 315-332.
[22] L. O’Sullivan, L. Picinali, A. Gerino, D. Cawthorne, A prototype audio-tactile map system with an
advanced auditory display, International Journal of Mobile Human Computer Interaction 7(4) (2015),
53-75.
[23] L. Zeng, G. Weber, ATMap: Annotated tactile maps for the visually impaired, in: Cognitive
Behavioural Systems. Lecture Notes in Computer Science, vol 7403. Springer Berlin, 2012. pp. 290-
298.
[24] C. Campus, L. Brayda, F. De Carli, R. Chellali, F. Famà, C. Bruzzo, L. Lucagrossi, G. Rodriguez,
Tactile exploration of virtual objects for blind and sighted people: the role of beta 1 EEG band in
sensory substitution and supramodal mental mapping, Journal of Neurophysiology 107 (2012), 2713-
2729.
[25] C. Graf, Verbally annotated tactile maps: Challenges and approaches, in: Spatial Cognition VII.
Lecture Notes in Computer Science, vol 6222. Springer Berlin, 2010. pp. 303-318.
[26] A. Brock, P. Truillet, B. Oriola, D. Picard, C. Jouffrais, Interactivity improves usability of geographic
maps for visually impaired people, Human-Computer Interaction 30(2) (2015), 156-194.
[27] M. Geronazzo, A. Bedin, L. Brayda, C. Campus, F. Avanzini, Interactive spatial sonification for non-
visual exploration of virtual maps, International Journal of Human-Computer Studies 85(C) (2016), 4-
15.
[28] M.L. Noordzij, S. Zuidhoek, A. Postma, The influence of visual experience on the ability to form
spatial mental models based on route and survey descriptions, Cognition 100(2) (2006) 321-342.
J.W. Creswell, Research Design: Qualitative, Quantitative, and Mixed Methods Approaches, Sage,
Thousand Oaks California, 2013.
[30] C. Teddlie, A. Tashakkori, Foundations of Mixed Methods Research: Integrating Quantitative and
Qualitative Approaches in the Social and Behavioral Sciences, Sage, Thousand Oaks California, 2009.
[31] V. Braun, V. Clarke, Using thematic analysis in psychology, Qualitative Research in Psychology 3(1)
(2006), 77-101.
[32] V. Braun, V. Clarke, Teaching thematic analysis: Over-coming challenges and developing effective
strategies for effective learning, The Psychologist 26(2) (2013), 120-123.
[33] B. Andò, S. Baglio, V. Marletta, A. Valastro, A haptic solution to assist visually impaired in mobility
tasks, IEEE Transactions on Human-Machine Systems 45(5) (2015), 641-646
178 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-178
Socially Assistive Robots (SAR) in In-Patient Care for the Elderly
Johannes KRIEGEL a,b,1, Victoria GRABNER c, Linda TUTTLE-WEIDINGER b and Irmtraud EHRENMÜLLER b
a Institute for Management and Economics in Healthcare, UMIT - University for Health Sciences, Medical Informatics & Technology, 6060 Hall i.T., Austria
b Department Gesundheits-, Sozial- und Public Management, Fachhochschule Oberösterreich, Linz, Austria
c Alten- und Pflegeheime der Kreuzschwestern GmbH, Sierning, Austria
Abstract. In-patient care of the elderly is currently being put to the test in all developed industrial nations. The aim is to make resident-centered, nursing-related care more professional. In addition to organizational and interdisciplinary orientation, the use of socially assistive robot technologies and artificial intelligence is increasingly coming to the fore. By means of literature research, expert interviews and an online survey of Upper Austrian nursing home directors, current and future challenges for the use of socially assistive robots (SAR) in in-patient care for the elderly were identified and prioritized. It becomes clear that, from the point of view of nursing home management, the technological and application-oriented maturity of SAR as well as the modular adaptation of hybrid SAR services to existing structures and processes are in the foreground. In the future, it will be increasingly important to bring the process-related and technological support of human-machine interaction through SAR to a value-adding level.

Keywords. socially assistive robots, in-patient care of the elderly, human-machine interaction, use cases, business model, Austria
1. Introduction

1.1. In-patient care of the elderly
Nursing care for the elderly in Austria is, in addition to the informal care provided by
relatives and mobile care for the elderly, characterized by in-patient care in retirement
and nursing homes. In-patient care of the elderly includes long-term residency in a
nursing home, where people's needs for care are accommodated by specialist staff
under constant supervision. In in-patient nursing homes, long-term accommodation is
usually the case. The prerequisite for this is that out-patient care or other types of care
can no longer adequately address a person's need for care. The range of services in the
in-patient care of older people includes not only the provision of hotel services (for
example, accommodation and meals) but also nursing, therapeutic and medical services,

1 Corresponding Author: Johannes Kriegel, University of Applied Sciences Upper Austria, Garnisonstraße 21, 4020 Linz, Austria, E-Mail: [email protected]
some of which are self-provided or secured by cooperation partners. In addition, the retirement and nursing home provides social services designed to ensure the desired quality of life and social participation [1].
Currently, there are about 850 homes in Austria offering nursing and in-patient
care with about 75,000 places, and the average length of stay is about 1.5 years. A
public or municipal provider runs approximately 400 of the facilities, and about 450 are
in private or church hands. In-patient care of the elderly is financed by taxes and
private funds, whereby the corresponding daily allowances are regulated by the state.
At present, the in-patient care of elderly people in Austria is determined by the
increasing needs for care and support of the residents, increasing waiting times for a
nursing home space and a lack of qualified caregiving professionals. These
developments are being addressed by increased quality assurance and the expansion of
cooperation between nursing homes and external partners (such as primary care
physicians, hospitals, service providers). The goal here is to organize and link the
nursing homes, which have previously been isolated, within the framework of regional
care networks, in a centralized, nurturing and interdisciplinary manner [2].
1.2. Residents' orientation through socially assistive robots
In order to improve the security of supply and the quality of care in the in-patient care
of older people, it is necessary to improve the division of labor and fragmented in-
patient care for the elderly by means of more comprehensive and resident-centered
provision of services. Such optimization is increasingly carried out by technologically
supported solutions. In doing so, it increasingly uses socially assisted robotic
technologies and artificial intelligence. Socially assistive robots (SAR) are
autonomously acting robots that interact and communicate with humans or other
autonomous physical agents, following social behavior and defined rules tied to their
roles and functions [3,4,5]. A key factor of influence and success is thereby the
consideration of user and resident interests. In addition to the resident's perspective
(e.g., wishes and possibilities of the residents), centering on residents also includes the
service and process perspective (e.g., comprehensive and barrier-free design of the care
processes) [6,7]. Important aspects in this regard are interlinked service provision and service design in in-patient care across professions and specialties. Increasingly, supportive
and digital technologies are used in the areas of nursing and therapy as well as
supporting services. What are the possible applications for the use of socially assistive
robots (SAR) to support and optimize the security of supply and quality in in-patient
care provided by nursing homes? Furthermore, it is necessary to identify the associated
challenges regarding the use of socially assistive robots (SAR) to support and optimize
the security and quality of care in in-patient care provided by nursing homes for the
elderly.
2. Methods

2.1. Identification of influences and challenges through literature research
The identification of possible influencing factors, requirements and applications of socially assistive robot technologies in the in-patient care of older people, as well as the associated challenges, was carried out by means of a semi-structured literature search.
For this purpose, relevant national and international databases (e.g., ScienceDirect College Edition, Emerald Collections, PubMed, Cochrane Library, Thieme Connect, SpringerLink) were searched using targeted keywords or keyword combinations (e.g., residential geriatric care, socially assistive robots, use cases, service providers, etc.).
The identified articles, studies and reports were reviewed and their contents interpreted
in the context of the research question raised. Furthermore, the results of the literature
search were incorporated into the development and design of the survey instruments
used below (online questionnaire, expert interviews).
2.2. Survey of nursing home directors' perspectives via online survey
In order to identify possible influencing factors, requirements and applications for the
use of socially assistive robots to support and optimize the security of supply and
quality in in-patient care by nursing homes, a survey of the nursing home directors'
perspectives was carried out by means of an online survey. This focused on the current
situation and the future applications of SAR-supported service provision in the in-
patient care of older people. A distinction was made between function-related and
resident-related use cases. For this purpose, a standardized online questionnaire was compiled based on the results of the literature search as well as six expert interviews (one nursing home director, two nurses, three graduate social workers for elderly work, one kitchen manager, one laundry assistant). Based on a pre-test (n=5), the number of application options and challenges to be selected and the formulation of the questions were adapted. The online survey took place from 12/18/2018 to 1/8/2019
using the Unipark survey tool [8]. To this end, 106 nursing home directors in Upper
Austria were invited to participate by e-mail. The return was n=46, corresponding to a response rate of 43.4%.
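As a quick arithmetic check of the participant counts reported above (106 invited nursing home directors, 46 returned questionnaires), sketched in Python:

```python
# Response rate check using the counts stated in Section 2.2.
invited = 106
returned = 46

response_rate = returned / invited * 100
print(f"Response rate: {response_rate:.1f}%")  # 43.4%
```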
3. Results

3.1. Factors influencing the use of SAR in in-patient care for the elderly
The use of socially assistive robot technologies in in-patient care for the elderly is
confronted with a multitude of different influencing factors, ranging from legal
regulations and access to utility services to the lack of services and mature technologies
in the context of in-patient care for the elderly. In the process of assessing them, a
systematic classification of the manifold challenges concerning the external dimensions
of environment and society, caregiving system and technological developments as well
as the internal dimensions of organization and results, information and communication,
and caregiving professionals of in-patient care for the elderly can be made (see Figure 1).
3.2. Optional use cases for SAR in in-patient care for the elderly
Socially assistive robotics aims to support in-patient care for the elderly through social interaction with human users (such as employees and residents) in order to add value to the care services and associated supportive processes. In addition to the technical, legal and
economic aspects, it is also important to consider the psychological, social and ethical
dimensions of robotics technologies.

Figure 1. Factors influencing the use of SAR in in-patient care for the elderly

According to the estimates of nursing home directors in Upper Austria, the main use cases for SAR in in-patient care for the elderly lie in the function-related support processes (e.g., transport of food, laundry, care supplies) as well as in the resident-related care processes (e.g., communication, entertainment, therapy support) (see Figure 2).

Figure 2. Prioritization of possible use cases for SAR in in-patient care for the elderly

3.3. Challenges for the use of SAR in in-patient care for the elderly
Given the intended benefits of using robotic technologies in the in-patient care of older people with regard to time savings, focus on the core business, workload management, documentation and cost savings, it is important to establish the deployment of SAR conceptually and in an application-oriented way in the future.
addition to a functioning and solution-oriented robotics technology, this also requires
its incorporation into existing or future supply and support processes. Furthermore, it is
necessary to provide the appropriate interfaces, standards and necessary infrastructures
for embedding SAR in the complex supply system. Another key success factor is the
respective acceptance on the part of employees, residents and relatives towards SAR.
Finally, a dedicated SAR services provider is required to enable and ensure hybrid
SAR services in nursing homes.
From the point of view of the surveyed nursing home directors, the embedding of
SAR technologies into the existing infrastructures and service processes as well as the
training and involvement of the employees and residents involved are critical to
success. The nursing home directors accordingly expect challenges in establishing SAR
in the nursing home, especially with regard to the integration and maintenance of
existing and required software and information technologies. In addition to the
different software requirements, the interface management between the different
software programs will be a particular challenge. In addition to the technological
realization, according to the nursing home directors, the SAR solutions must be
adapted to the respective structural and process-related circumstances and preferences.
The nursing home directors consider the publicly debated threats of data abuse, reduced
staff and monitoring by SAR and artificial intelligence to be minor challenges.
Figure 3. Challenges associated with the use of SAR in in-patient care of the elderly

4. Discussion

4.1. Required business model for modular SAR services in the nursing home
In conjunction with the development of new SAR services and innovative value-added services, customer- and solution-oriented bundles of hardware, software and service elements have to be combined and integrated into an independent, new, customer-specific business solution [40]. It is also important
to involve an active service provider in the development and provision of SAR services.
Without a service provider, there will be no SAR services in nursing homes! For this,
the development of a SAR business model is recommended, which illustrates the core
structure, the internal and external cooperations as well as the financial requirements of
the organization [41]. Furthermore, the business model represents the current and future core products or services that the organization offers or wants to offer, as well as the associated objectives. In the context of experimental research and SAR services development, the possible and identified use cases have to be considered in concrete
segments; customer relationships; communication and distribution channels; revenue
streams; value propositions; emotions; key activities; key resources; key partnerships;
cost structure; ethics; legal regulation) [42].
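As an illustration only (not taken from the cited business model literature), the twelve dimensions listed above can be carried as a simple checklist structure when screening candidate SAR use cases. The `coverage` helper and the example notes are hypothetical:

```python
# The twelve SAR business model dimensions named in the text, as a checklist.
# The scoring logic below is illustrative, not part of the cited model [42].
SAR_BUSINESS_MODEL_DIMENSIONS = [
    "customer segments",
    "customer relationships",
    "communication and distribution channels",
    "revenue streams",
    "value propositions",
    "emotions",
    "key activities",
    "key resources",
    "key partnerships",
    "cost structure",
    "ethics",
    "legal regulation",
]

def coverage(use_case_notes: dict) -> float:
    """Fraction of dimensions for which a use case has at least one note."""
    covered = sum(1 for d in SAR_BUSINESS_MODEL_DIMENSIONS if use_case_notes.get(d))
    return covered / len(SAR_BUSINESS_MODEL_DIMENSIONS)

# Hypothetical notes for a food-transport use case.
notes = {"customer segments": "nursing home residents", "ethics": "consent process"}
print(f"{coverage(notes):.0%} of dimensions addressed")  # 17%
```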
4.2. Functionality and added value for in-patient care of the elderly
The future use of socially assistive robots in tailored, comprehensive in-patient care for
the elderly will be determined on the one hand by user requirements and on the other
hand by the associated added value [43]. In the medium term, SAR services will play a
major role in support and cooperative performance processes, not only in order to cope
with the upcoming shortage of skilled workers, but also to foster qualitative and
supportive human-machine interaction. At the same time, the delegation of non-
professional, repetitive and stressful activities to SAR gives rise to the possibility of
enhancing and expanding social human-human interactions as well as the nursing
profession [44]. The use of SAR results in a measurable benefit for the elderly and
health care. However, the development of this future scenario must actively involve
caregiving professionals and residents with their corresponding requirements,
possibilities and fears and integrate the targeted SAR solutions into the actual care
processes [45]. This requires appropriately aligned experimental research and
development as well as targeted integration and project management.
References

[1] B. Grossmann, P. Schuster, Langzeitpflege in Österreich: Determinanten der staatlichen Kostenentwicklung. Fiskalrat, Wien, 2017
[2] I. Wilbacher, S. Scheffel, B. Glock, M. Zechmeister, Medizinische Versorgung in Pflegeheimen in
Österreich. Hauptverband, Wien, 2017
[3] T. Fong, I. Nourbakhsh, K. Dautenhahn, A survey of socially interactive robots. Robotics and
Autonomous Systems, 42 (2003), 143-166
[4] D. Feil-Seifer, M.J. Mataric, Defining socially assistive robotics. Proceedings of the International
Conference on Rehabilitation Robotics, 2005, 465-468
[5] R. Bemelmans, GJ. Gelderblom, P. Jonker, L. de Witte, Socially assistive robots in elderly care: a
systematic review into effects and effectiveness. J Am Med Dir Assoc, 13 (2012), 114-120
[6] Y.H. Park, H.L. Bang, G.H. Kim, J.Y. Ha, Facilitators and barriers to self-management of nursing home
residents: perspectives of health-care professionals in Korean nursing homes. Clin Interv Aging, 10
(2015), 1617-1624
[7] T. Vandemeulebroucke, B.D. de Casterlé, C. Gastmans, How do older adults experience and perceive
socially assistive robots in aged care: a systematic review of qualitative evidence. Aging Ment Health,
22 (2018), 149-167
[8] QuestBack, Enterprise Feedback Suite EFS survey. QuestBack, Köln-Hürth, 2013
[9] M. Firgo, U. Famira-Mühlberger, Ausbau der stationären Pflege in den Bundesländern. WIFO, Wien,
2014
[10] G. Dewsbury, D. Dewsbury, Securing IT infrastructure in the care home. Nursing and Residential Care,
19 (2017), 672-674
[11] F. Kohlbacher, C. Herstatt, N. Levsen, Golden opportunities for silver innovation: How demographic
changes give rise to entrepreneurial opportunities to meet the needs of older people. Technovation, 39
(2015), 73-82
[12] G. Kojima, S. Iliffe, K. Walters, Frailty index as a predictor of mortality: a systematic review and meta-
analysis. Age and Ageing, 47 (2018), 193-200
[13] F. Hoffmann, H. Kaduszkiewicz, G. Glaeske, H. van den Bussche, D. Koller, Prevalence of dementia in
nursing home and community-dwelling older adults in Germany. Aging Clin Exp Res, 26 (2014), 555-
559
[14] E. Borowiak, J. Kostka, T. Kostka, Comparative analysis of the expected demands for nursing care
services among older people from urban, rural, and institutional environments. Clin Interv Aging, 10
(2015), 405-412
[15] U. Famira-Mühlberger, Die Bedeutung der 24-Stunden-Betreuung für die Altenbetreuung in Österreich.
WIFO, Wien, 2017
[16] S.C. Miller, J.M. Teno, V. Mor, Hospice and palliative care in nursing homes. Clin Geriatr Med, 20
(2004), 717-734
[17] T. Uhrhan, M. Schaefer, Drug supply and patient safety in long-term care facilities for the elderly.
Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz, 53 (2010), 451-459
[18] H. Cramer, H. Pohlabeln, M. Habermann, Factors causing or influencing nursing errors as perceived by
nurses: findings of a cross-sectional study in German nursing homes and hospitals. Journal of Public
Health, 21 (2013), 145-153
[19] P. Khosravi, A.H. Ghapanchi, Investigating the effectiveness of technologies applied to assist seniors:
A systematic literature review. International Journal of Medical Informatics, 85 (2016), 17-26
[20] J. Pineau, M. Montemerlo, M. Pollack, N. Roy, S. Thrun, Towards robotic assistants in nursing homes:
Challenges and results. Robotics and Autonomous Systems, 42 (2003), 271-281
[21] H.H. Tsai, Y.F. Tsai, H.H. Wang, Y.C. Chang, H.H. Chu, Videoconference program enhances social
support, loneliness, and depressive status of elderly nursing home residents. Aging Ment Health, 14
(2010), 947-954
[22] J.E. Morley, Telemedicine: Coming to Nursing Homes in the Near Future. J Am Med Dir Assoc, 17
(2016), 1-3
[23] J. van Hoof, A.M.C. Dooremalen, M.H. Wetzels, H.T.G. Weffers, Exploring Technological and
Architectural Solutions for Nursing Home Residents, Care Professionals and Technical Staff - Focus
Groups With Professional Stakeholders. International Journal for Innovative Research in Science &
Technology, 1 (2014), 90-105
[24] W.Y. Louie, D. McColl, G. Nejat, Acceptance and Attitudes Toward a Human-like Socially Assistive
Robot by Older Adults. Assist Technol, 26 (2014), 140-150
[25] E. Mariani, R. Chattat, M. Vernooij-Dassen, R. Koopmans, Y. Engels, Care Plan Improvement in
Nursing Homes: An Integrative Review. J Alzheimers Dis, 55 (2017), 1621-1638
[26] D.C. Grabowski, R.J. Town, Does Information Matter? Competition, Quality, and the Impact of
Nursing Home Report Cards. Health Service Research, 46 (2011), 1698-1719
[27] N.L. Crogan, B. Evans, B. Severtsen, J.A. Shultz, Improving nursing home food service: uncovering
the meaning of food through residents' stories. J Gerontol Nurs, 30 (2004), 29-36
[28] J. Adams, H. Verbeek, S.M. Zwakhalen, The Impact of Organizational Innovations in Nursing Homes
on Staff Perceptions: A Secondary Data Analysis. J Nurs Scholarsh, 49 (2017), 54-62
[29] R. Briggs, S. Robinson, F. Martin, D. O’Neill, Standards of medical care for nursing home residents in
Europe. European Geriatric Medicine, 3 (2012), 365-367
[30] N. Carrier, G.E. West, D.J. Ouellet, Dining experience, foodservices and staffing are associated with
quality of life in elderly nursing home residents. Nutr Health Aging, 13 (2009), 565-570
[31] P. Voutilainen, A. Isola, S. Muurinen, Nursing documentation in nursing homes--state-of-the-art and implications for quality improvement. Scand J Caring Sci, 18 (2004), 72-81
[32] C.S. Kruse, M. Mileski, A.G. Vijaykumar, S.V. Viswanathan, U. Suskandla, Y. Chidambaram, Impact
of Electronic Health Records on Long-Term Care Facilities: Systematic Review. JMIR Med Inform, 5
(2017): e35. doi:10.2196/medinform.7958
[33] H.H. Tsai, Y.F Tsai, Older nursing home residents’ experiences with videoconferencing to
communicate with family members. Journal of Clinical Nursing, 19 (2010), 1538-1543
[34] C.P. Jansen, K. Claßen, K. Hauer, M. Diegelmann, H.W. Wahl, Assessing the effect of a physical
activity intervention in a nursing home ecology: a natural lab approach. BMC Geriatr, 14 (2014), 117.
doi:10.1186/1471-2318-14-117
[35] D. Schaffler-Schaden, S. Pitzer, M. Schreier, J. Dellinger, B. Brandauer-Stickler, M. Lainer, M. Flamm,
J. Osterbrink, Improving medication appropriateness in nursing home residents by enhancing
interprofessional cooperation: A study protocol. Journal of Interprofessional Care, 32 (2018), 517-520
[36] A.R. van Stenis, J. van Wingerden, I. Kolkhuis Tanke, The changing role of health care professionals in
nursing homes: A systematic literature review of a decade of change. Front Psychol, 8 (2017), 2008.
doi:10.3389/fpsyg.2017.02008
[37] A. Kurowski, B. Buchholz, L. Punnett, ProCare Research Team, A physical workload index to evaluate
a safe resident handling program for nursing home personnel. Hum Factors, 56 (2014), 669-683
[38] M. Ko, L. Wagner, J. Spetz, Nursing Home Implementation of Health Information Technology: Review
of the Literature Finds Inadequate Investment in Preparation, Infrastructure, and Training. Inquiry, 55
(2018), 46958018778902. doi: 10.1177/0046958018778902
[39] C. Donoghue, Nursing home staff turnover and retention - an analysis of national level data. Journal of
Applied Gerontology, 29 (2010), 89-106
[40] S. Glende, I. Conrad, L. Krezdorn, S. Klemcke, C. Krätzel, Increasing the acceptance of assistive robots
for older people through marketing strategies based on stakeholder needs. International Journal of
Social Robotics, 8 (2016), 355-369
[41] T. Blackman, Care robots for the supermarket shelf: a product gap in assistive technologies. Ageing
Soc, 33 (2013), 763-781
[42] J. Kriegel, L. Reckwitz, K. Auinger, L. Tuttle-Weidinger, S. Schmitt-Rueth, R. Kränzl-Nagl, New
service excellence model for e-health and AAL solutions – A framework for continuous new service
development, in: eHealth2017 - Health Informatics meets eHealth. Proceedings of eHealth2017, Vienna,
Stud Health Tech Inform, 2017. pp. 275-281
[43] M. Saborowski, I. Kollak, “How do you care for technology?” – Care professionals' experiences with
assistive technology in care of the elderly, Technological Forecasting and Social Change, 93 (2015),
133-140
[44] S.M.S. Khaksara, R. Khoslaa, M.T. Chua, F.S. Shahmehr, Service innovation using social robot to
reduce social vulnerability among older people in residential care facilities, Technological Forecasting
and Social Change, 113 (2016), 438-453
[45] T. Linner, W. Pan, C. Georgoulas, B. Georgescu, J. Güttler, T. Bock, Co-adaptation of robot systems,
processes and in-house environments for professional care assistance in an ageing society. Procedia
Engineering, 85 (2014), 328-338
186 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-186
Is Regular Re-Training of a Predictive Delirium Model Necessary After Deployment in Routine Care?
Sai Pavan Kumar VEERANKI a,1, Diether KRAMER b, Dieter HAYN a, Stefanie JAUK c, Alphons EGGERTH a, Franz QUEHENBERGER d, Werner LEODOLTER b and Günter SCHREIER a
a AIT Austrian Institute of Technology, Graz, Austria
b Steiermärkische Krankenanstaltengesellschaft m.b.H. (KAGes), Graz, Austria
c CBmed, Graz, Austria
d Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria
Abstract. The adoption of electronic medical records in hospitals generates a large amount of data, and health care professionals can easily lose sight of important insights in patients' clinical and medical history. Although machine learning algorithms have already proved their significance in healthcare research, the translation and dissemination of fully automated prediction algorithms from research to decision support at the point of care remains a challenge. In this paper, we address the effect of changes in the characteristics of data over time on the performance of deployed models for the use case of predicting delirium in hospitalised patients. We analysed the stability of models trained with subsets of data from one single year (2012, 2013, ..., 2016, respectively) and tested the models with data from 2017. Our results show that in the case of delirium prediction the models were stable over time, indicating that frequent re-training is not necessary; re-training e.g. once per year might be more than sufficient.
Keywords. machine learning, model deployment, model stability, prediction

1. Introduction
The adoption of electronic medical records (EMRs) in hospitals generates a large amount of data, and healthcare professionals can easily lose sight of important insights in patients' clinical and medical history. Machine learning algorithms have already proved their significance in healthcare research [1–3] and many other fields of medicine [4].
Therefore, the application of machine learning algorithms in real-world settings is promising [4]. Such applications might not only increase throughput, optimize resources and save costs for health care providers, but also reduce patients' stress, improve quality of life and increase safety.

1 Corresponding Author: Sai Pavan Kumar Veeranki, AIT Austrian Institute of Technology, Reininghausstraße 13, 8020 Graz, Austria, E-Mail: [email protected]
1.1. Deploying predictive models in real-world healthcare settings
Soto et al. developed web-based services for deploying risk assessment models and
decision support tools [5], which can already be used in clinical routine. However, most of these methods cannot be integrated with existing EMR data and require manual input of model parameters. The translation and dissemination of fully automated prediction algorithms from research to decision support at the point of care therefore remains a challenge.
In a previous paper, we adapted the Cross-Industry Standard Process for Data Mining (CRISP-DM). As compared to the original CRISP-DM standard, our approach
consists of two cycles: one represents the complete project (outer cycle) and the other
represents the predictive analytics cycle (inner cycle), as shown in Figure 1 [2].
Figure 1. Overview of our data-driven decision support for health and care [2].
The inner cycle in Figure 1 is a continuous process to improve the performance of the model with newly generated features and newly available data, even after deployment. The model is deployed into the clinical workflow to perform predictions on prospective data, which closes the outer cycle in Figure 1.
According to Brownson et al. [6], the translation of basic research to practice in the health business takes 17 to 20 years and millions of dollars in order to overcome several sequential hurdles. In [7], the authors described a method based on FHIR web services to deploy predictive models into healthcare routine. HIMSS Analytics developed an Adoption Model for Analytics Maturity (AMAM), which provides a score that can be used to assess the readiness of hospitals and health care providers to deliver personalised medicine across stages 0 to 7. Stage 6 indicates readiness to provide clinical risk intervention and predictive analytics. According to HIMSS, only a few hospitals have currently reached stage 6 [8].
Apart from the above-mentioned hurdles and stages, and apart from social, legal, ethical, economic, political, usability-related and organisational factors, we have observed intrinsic issues of the models and the data themselves which complicate the process of deployment. Some of these issues relate to system changes over time, such as variations in one or more of the following characteristics:
- availability of new diagnostic and therapeutic procedures
- major changes in financing and incentives
- changes in the health profiles of the patients / populations
- new processes for recording the data
- changes in data quality
- increasing amount of data
If significant changes appear after the deployment of a predictive model, it becomes necessary to re-train the model with an updated dataset. In certain circumstances it might even be necessary to re-build the model from scratch. However, it is currently hard to decide how often a model should be re-trained or re-built after an initial deployment, not least because hardly anyone has yet reached this stage. Although there are publications on different aspects of re-training models to keep them up-to-date [9], there is currently no particular evidence on how often re-training of predictive models in healthcare is necessary.
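Operationally, the re-training decision discussed above can be reduced to a simple monitoring rule. The following is a hypothetical sketch, assuming a labelled evaluation sample becomes available periodically; the tolerance and AUROC values are invented for illustration and are not from the study:

```python
# Illustrative re-training trigger: re-train when performance degrades by more
# than a tolerance relative to the AUROC measured at deployment time.
def needs_retraining(baseline_auroc: float, current_auroc: float,
                     tolerance: float = 0.02) -> bool:
    """Return True if the monitored AUROC has dropped below the tolerance band."""
    return (baseline_auroc - current_auroc) > tolerance

baseline = 0.89  # hypothetical AUROC at deployment
for year, auroc in [(2018, 0.885), (2019, 0.88), (2020, 0.86)]:
    action = "re-train" if needs_retraining(baseline, auroc) else "keep model"
    print(year, f"AUROC={auroc:.3f}", action)
```

A rule like this only detects degradation on labelled data; it does not replace checks for upstream changes such as new coding practices or data-quality shifts.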
1.2. Objective
In 2018, a delirium prediction model was deployed within a routine clinical workflow to support health care professionals [10]. In the current study, we analysed the stability of the predictive models over time. Our results provide a basis for deciding how often the deployed delirium model will need to be re-trained or re-built during operation.
2. Methods

2.1. Dataset
The data for this analysis were extracted from the Hospital Information Systems (HIS)
of “Steiermärkische Krankenanstaltengesellschaft m.b.H.” (KAGes) (i.e. EMR of
KAGes) which is the regional health care provider in Styria (one of the nine provinces
of Austria). KAGes has about 90% market share in terms of acute care hospital beds of
the region and has access to more than 2.1 million longitudinal health records. The retrospective dataset was extracted from the KAGes HIS for the years 2012 to 2017 to develop models for the use case of predicting, at the time of a patient's admission, the occurrence of delirium during hospitalisation.
For the analysis, we identified 4,596 delirium patients as the case cohort from the HIS, based on inclusion and exclusion criteria published in [1], and randomly selected 25,000 patients who had never been diagnosed with delirium at KAGes. For privacy reasons, patients with extremely rare diseases and patients with no previous records were excluded, leaving 24,972 patients in the control group.

2.2. Feature set

Our data consisted of patients’ demographics, transfers in the hospital, procedures,
nursing assessment, laboratory values and diagnoses. Procedures, laboratory values and
diagnoses were recorded according to international standards such as International
Classification of Procedures in Medicine (ICPM), International Classification of
Diseases version 10 (ICD-10) and Logical Observation Identifiers Names and Codes
(LOINC).
The complete patient history (above mentioned feature groups) was extracted from
the HIS prior to a) the date of the admission of a random hospitalisation for the control
S.P.K. Veeranki et al. / Regular Re-Training of a Predictive Delirium Model 189

group and b) the date of admission of the hospitalisation when the patient was diagnosed
with delirium for the cohort. Altogether, the feature set comprised 502 features.

2.3. Data preparation and predictive modelling

In order to analyse the stability of the models over time, we divided the data into six
subsets based on the year of the reference date. We selected a Random Forest (RF) as a
modelling method, since RF had outperformed various other methods for predicting
delirium in one of our previous works [1].
We trained one RF model for each year from 2012 to 2016 with 10-fold cross-
validation, which provided us with 10 models per year. To answer the question at stake,
we tested all 5 × 10 models on the sixth subset, i.e. the 2017 data. Area under the
receiver operating characteristic curve (AUROC) measures were calculated for each
model on the test data to compare the performance of the models.
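The per-year training and evaluation scheme can be sketched as follows. This is a minimal illustration on synthetic data, assuming scikit-learn is available; the feature matrices here are hypothetical stand-ins for the 502-feature set described above, not the actual KAGes data.

```python
# Sketch: one Random Forest per cross-validation fold and per year
# (2012-2016), each evaluated on the held-out 2017 subset via AUROC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)

def make_year_data(n=300, n_features=20):
    # Synthetic stand-in for the patients extracted for one year
    X = rng.normal(size=(n, n_features))
    y = (X[:, 0] + rng.normal(scale=2.0, size=n) > 0).astype(int)
    return X, y

train_years = {year: make_year_data() for year in range(2012, 2017)}
X_test, y_test = make_year_data()  # stands in for the 2017 subset

aurocs = {}  # year -> 10 AUROC values (one RF per fold)
for year, (X, y) in train_years.items():
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    scores = []
    for train_idx, _ in cv.split(X, y):
        rf = RandomForestClassifier(n_estimators=25, random_state=0)
        rf.fit(X[train_idx], y[train_idx])
        scores.append(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))
    aurocs[year] = scores

medians = {year: float(np.median(s)) for year, s in aurocs.items()}
```

Comparing the per-year distributions of `aurocs` (as in Figure 2) then shows whether models trained on older data degrade on the most recent test year.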

3. Results

Each boxplot in Figure 2 represents the distribution of the AUROC for the 10 models of
a single year, when tested with the data from the year 2017. We found no significant
variation or trend in the performance of the models with respect to the AUROC when
comparing the models that were trained with older data to models trained with more
recent data.

Figure 2. AUROC distribution of the models trained with data from each of the years 2012 –
2016 and tested with the data from the year 2017.

Table 1 summarises the median AUROC measures obtained when testing the models
trained with data from one year on the data of the subsequent years.

Table 1. Median AUROC of the models trained on data from one year (rows) and tested with the
data from later years (columns)
Train \ Test    2013      2014      2015      2016      2017
2012            0.8820    0.8678    0.8740    0.8800    0.8857
2013                      0.8678    0.8798    0.8812    0.8804
2014                                0.8783    0.8847    0.8828
2015                                          0.8917    0.8845
2016                                                    0.8836

4. Discussion

Our results indicate that, in the case of our delirium prediction models, re-training in
yearly intervals would not have improved model performance compared to
models trained up to five years earlier. From this finding, one might
conclude that even in the future re-training will not be necessary unless there are major
data revisions. However, the deployment of a delirium model itself might represent such
a major change. Therefore, re-training right after deployment might still be indicated.
Although our analysis shows that there is no need for frequent re-training in the case
of delirium prediction, these analytical methods should be applied to other clinical
questions with care. Additionally, the size of the dataset might play an important role in
this context. Further research would be necessary to find out whether similar results
would be obtained with smaller and/or larger datasets.
We applied 10-fold cross-validation when training each of our models. Since we
tested our models on different test datasets (stemming from different years), cross-
validation could even have been omitted. However, by the use of cross-validation we
could also compare the results from models trained with data from 2012–2016 with a
model trained with data from 2017, although this has not been investigated yet.
Additionally, cross-validation provides information concerning the effect of slight
changes in the learning data of a single year on the model performance.
We decided to look at the question at stake from a pragmatic (“top-down”) point of
view: if the model performance does not change, then re-training might not be necessary.
However, additional (“bottom-up”) analyses of the stability of the underlying features
might give valuable insights within the inner cycle depicted in Figure 1. Therefore, it
may be beneficial to monitor features during the operation of the model, in order to detect
unexpected changes in a feature’s characteristic that might have effects on the model
performance.

5. Conclusions

We found no significant decrease in model performance when applying delirium
prediction models, each trained with data of a single year as far back as 2012
(five years earlier), to the data from the year 2017, indicating that re-training of the model
in regular intervals might not be critical.
However, the models’ performance needs to be further observed during
their application to prospective data, as the characteristics of the clinical routine data
might change due to the deployment of such a predictive model. Major organizational or
legislative changes, or changes in e.g. the HIS, should be scrutinized concerning their
impact on a predictive model.

Acknowledgement

This work is part of the IICCAB project (Innovative Use of Information for Clinical Care
and Biomarker Research) within the K1 COMET Competence Centre CBmed
(http://cbmed.at), funded by the Federal Ministry of Transport, Innovation and
Technology (BMVIT); the Federal Ministry of Science, Research and Economy
(BMWFW); Land Steiermark (Department 12, Business and Innovation); the Styrian
Business Promotion Agency (SFG); and the Vienna Business Agency. The COMET
program is executed by the FFG. KAGes and SAP provided significant resources,
manpower and data as basis for research and innovation.

References

[1] D. Kramer, S. Veeranki, D. Hayn, F. Quehenberger, W. Leodolter, C. Jagsch, and G. Schreier,
Development and Validation of a Multivariable Prediction Model for the Occurrence of Delirium in
Hospitalized Gerontopsychiatry and Internal Medicine Patients, Stud. Health Technol. Inform. 236
(2017) 32–39.
[2] D. Hayn, V. Sai, K. Martin, A. Eggerth, K. Kreiner, D. Kramer, and G. Schreier, Predictive Analytics
for Data Driven Decision Support in Health and Care, IT Inf. Technol. 57 (2018).
[3] A. Eggerth, D. Hayn, S. Veeranki, J. Stieg, and G. Schreier, Prediction of Readmissions in the German
DRG System Based on §21 Datasets, Stud. Health Technol. Inform. 253 (2018) 170–174.
[4] I.A. Scott, Machine Learning and Evidence-Based Medicine, Ann Intern Med. (2018). doi:10.7326/M18-
0115.
[5] G. E. Soto, and J. A. Spertus, EPOCH® and ePRISM®: A web-based translational framework for
bridging outcomes research and clinical practice, in: 2007 Comput. Cardiol., 2007: pp. 205–208.
doi:10.1109/CIC.2007.4745457.
[6] R.C. Brownson, G.A. Colditz, and E.K. Proctor, Dissemination and Implementation Research in Health:
Translating Science to Practice, OUP USA, 2012. https://books.google.at/books?id=vrp2oqJCPIIC.
[7] M. Khalilia, M. Choi, A. Henderson, S. Iyengar, M. Braunstein, and J. Sun, Clinical Predictive Modeling
Development and Deployment through FHIR Web Services., AMIA Annu. Symp. Proc. AMIA Symp.
2015 (2015) 717–726.
[8] Validated Stage 6 & 7 Providers List, https://www.himssanalytics.org/north-
america/stage7 (accessed February 9, 2019).
[9] M. Watson, Keeping Your Machine Learning Models Up-To-Date, Medium. (2018).
https://medium.com/ibm-watson-data-lab/keeping-your-machine-learning-models-up-to-date-
f1ead546591b (accessed March 18, 2019).
[10] S. Veeranki, D. Hayn, A. Eggerth, S. Jauk, D. Kramer, W. Leodolter, and G. Schreier, On the
Representation of Machine Learning Results for Delirium Prediction in a Hospital Information System
in Routine Care, Stud. Health Technol. Inform. 251 (2018) 97–100.
192 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-192

Photographic LVAD Driveline Wound Infection Recognition Using Deep Learning
Noël LÜNEBURGa,1, Nils REISSb, Christina FELDMANNc, Pim van der MEULENa,
Michiel van de STEEGa, Thomas SCHMIDTb, Regina WENDLc, Sybren JANSENa
a Target Holding, Groningen, The Netherlands
b Schüchtermann-Schiller’sche Kliniken, Bad Rothenfelde, Germany
c Hannover Medical School, Department for Cardiothoracic, Transplantation and Vascular Surgery, Hannover, Germany

Abstract. The steady increase in the number of patients equipped with mechanical
heart support implants, such as left ventricular assist devices (LVAD), along with
virtually ubiquitous 24/7 internet connectivity, motivates the investigation and
development of remote patient monitoring. In this study we explore machine learning
approaches to infection severity recognition on driveline exit site images. We apply
a U-net convolutional neural network (CNN) for driveline tube segmentation,
resulting in a Dice score coefficient of 0.95. A classification CNN is trained to
predict the membership of one out of three infection classes in photographs. The
resulting accuracy of 67% in total is close to the measured expert level performance,
which indicates that also for human experts there may not be enough information
present in the photographs for accurate assessment. We suggest the inclusion of
thermographic image data in order to better resolve mild and severe infections.

Keywords. heart assist device, driveline infection, infection classification, convolutional neural network

1. Introduction

An increasing number of patients with heart failure classified as severe according to the
New York Heart Association (NYHA) Classification [1] are treated with a mechanical
support implant. This may either be a bridge for the period of waiting for a heart
transplantation, or a permanent solution, the so-called destination therapy [7]. A left
ventricular assist device (LVAD) is a pumping device implanted onto the heart, taking
over the main pump function of the left ventricle while the heart retains only a low
residual function. The device relies on a permanent electrical connection, through a
driveline tube, to a control module and battery pack situated outside the patient’s body.
The control module collects device operation data, which offers a valuable opportunity
for data exchange and thus early detection of problems.
The driveline exit site is a delicate location, requiring continuous wound treatment
and wound dressing, the latter typically renewed once every five days. Driveline

1 Corresponding Author: Noël Lüneburg, Target Holding, Atoomweg 6B Groningen, The Netherlands,
E-Mail: [email protected].
N. Lüneburg et al. / Photographic LVAD Driveline Wound Infection Recognition 193

infections occur frequently because the driveline exit site creates a conduit for the entry
and proliferation of bacteria. This is one of the most severe adverse events for the patient,
leading to the necessity of surgical wound revision or even the replacement of the assist
device implant [8]. Driveline infection is defined as an infection affecting the soft tissues
around the driveline outlet, accompanied by redness, warmth, and purulent discharge.
Telemonitoring of driveline exit sites can provide early detection of these symptoms
and can aid in the remote diagnosis of relevant driveline infections. The majority of
LVAD patients have positive reactions towards telemonitoring [9]. Photographs of the
driveline exit site, taken by caregivers or patients themselves with their mobile devices
during renewal of the wound dressing, are sent through a mobile application to the
physician in charge in the patient’s clinic. The image will be reviewed in combination
with any available device data, clinical data and the accompanying patient-update on
their well-being or quality of life. The aim is to prevent patients from having to travel to
their clinics for check-ups too often, or to consult their local general practitioner, but
even more so not to miss out on the early detection of an upcoming adverse event. Right
now the state of the art is that patients are seen by their clinics once every three months,
without any visual monitoring in between.
Before deep learning was widely accepted as a machine learning method in the
image processing field, the support vector machine (SVM) and multi-layered perceptron
(MLP) were popular choices for computer aided image analysis in the domain of
photographic imaging [11] as well as non-photographic medical imaging [12]. Deep
learning has been applied to skin cancer classification supported by a large data set [10]
in which a deep CNN matched (and even outperformed in certain configurations)
dermatologists in classification accuracy.
In the following sections we describe three applications of deep learning which
support the diagnosis procedure by automatically predicting the presence and severity of
driveline infections based on patient photographic data. This can be executed ‘on the fly’
and will not add significant transit time to the images. The physician-in-charge then
receives the images with a severity indication, and in particular a warning sign in case of
a recognized severe infection.

2. Methods

The data set we worked on for this study consists of 745 general photographs from a total
of 61 patients, taken and provided in pseudonymized format by Schüchtermann-
Schiller’sche Kliniken and Hannover Medical School. The photographs had been taken
and stored for documentation, without being further processed for some time.
Photographs were taken from various positions and lack consistency in lighting. In
addition, photographs can be out of focus or show signs of camera motion, and part of
the wound area can be obstructed by dressing. These conditions might apply to future
images taken by patients as well and we are prepared to handle this automatically.
732 out of the 745 photographs are labelled as belonging to one of the following
three classes: no infection, mild infection, severe infection. In regular operations, labels
are assigned by clinical experts based on features such as presence of bacteria, odour and
warmth, in addition to visual features on the surface of the wound. We intentionally only
assessed the photographic data as this will be the data available from remote patient
monitoring.

The data set is heavily imbalanced concerning the representation of the three classes,
specifically, the severe label is assigned to only 5.1% of all photographs. The distribution
for each class is listed in Table 1. The number of photographs per unique patient varies
between 1 and 38 with an average of 6.8 photographs. A severe infection case occurred
in 17 patients. For these patients on average 2.2 photographs were assigned the severe
label.

Table 1. Infection class distribution in the analysed photographic data set

Class              # samples    Percentage
No infection       483          66.0 %
Mild infection     212          29.0 %
Severe infection    37           5.1 %
Total              732         100.0 %

The processing steps used in the machine learning classification training procedure
were as follows.
1. Detection and filtering of out-of-focus photographs,
2. driveline tube segmentation,
3. prediction of region of interest,
4. classification of wound infection class.

In the following sections each of the processing steps is explained in more detail.

2.1. Detection and filtering of out-of-focus photographs

We would like to filter out highly out-of-focus data samples from the training set to
increase the quality of the training data. The aim was to automatically remove the subset
of photographs without sufficient detail to determine the infection class.
Quantification of blur in a photograph can be done by computing the sum of the
partial second derivatives of the image in both dimensions, known as the Laplacian
operator, which has an application in autofocusing for microscopes [2]. The amount of
blur is reduced to a single number by taking the variance of the Laplacian value across
all pixels in the image.
Before the out-of-focus detection algorithm was developed a set of 692 photographs
was available, which were manually classified as either out-of-focus or clear. This
allowed us to set a threshold on the variance of the Laplacian that ensures a balanced
ratio between precision and recall for out-of-focus detection.
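The variance-of-the-Laplacian focus measure can be sketched as follows. This is a minimal NumPy illustration on a synthetic grayscale image; the threshold value here is purely illustrative, not the one tuned on the 692 labelled photographs.

```python
# Sketch: blur quantification via the variance of the Laplacian.
import numpy as np

# 3x3 discrete Laplacian kernel (sum of partial second derivatives)
LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=float)

def variance_of_laplacian(img):
    # 'valid' 2-D convolution of the kernel over a grayscale image,
    # followed by the variance of the response across all pixels
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())

def is_out_of_focus(img, threshold=100.0):
    # Illustrative threshold; in practice chosen to balance precision/recall
    return variance_of_laplacian(img) < threshold

# A sharp checkerboard has large second derivatives; a flat image has none.
sharp = np.indices((64, 64)).sum(axis=0) % 2 * 255.0
flat = np.full((64, 64), 128.0)
```

A blurred photograph smooths out local intensity changes, so its Laplacian response, and hence the variance, collapses towards zero.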
Whenever a device is used to take and send a photo, the out-of-focus detection could
trigger an immediate request for a repeated photograph, sent back to the patient’s LVAD
App while they are still busy with the wound dressing renewal.

2.2. Driveline tube segmentation

Drivelines may have different visual features, from an opaque white colour to
transparent, revealing different internal cable colours, and sometimes reflecting flash
lighting on their surface. They occur in all photographs and their presence may increase
the complexity of training an infection classification network if the network itself is not
able to ignore the irrelevant tube features. This section focuses on two separate
approaches for detecting the driveline tubes which allowed us to mask and negate
features in the driveline tube area of the image during infection classification.
In the absence of annotated photographs, a first approach made use of the
Felzenszwalb unsupervised segmentation algorithm [3]. It is a greedy graph-based
algorithm which iteratively merges adjacent pixel regions based on local and global
contrast. The Felzenszwalb algorithm is sensitive to the variations within the
photographic data, requiring parameter tuning on a per sample basis for adequate
segmentation performance, which is inconvenient for practical applications.
A supervised deep learning method may be better suited to capturing the image
complexity. In order to facilitate supervised learning we set up a web-based annotation
service. Anonymous images were offered to annotators in a random sequence. Images
and annotation results were exchanged through a secure connection. LVAD experts were
able to use this service to visually annotate driveline regions and other skin coverage
(e.g. wound dressing) in photographs. A magnification tool allowed for the exact
drawing of the segmentation map with usual point and click devices.
A specific architecture of convolutional neural networks (CNN) called U-net [4] was
used for training on the annotated data. It is a type of semantic segmentation CNN which
can be used to assign a class label (‘driveline tube’ or ‘background’ in this case) to each
pixel in an image. Physicians used the annotation service to annotate 185 photographs
which we randomly split into 148 training and 37 validation samples. Data augmentation
is applied to the training set and ground truth annotations in the form of affine
transformations to artificially enrich the training set.
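The Dice score coefficient used later to evaluate these segmentation masks can be sketched as follows, for binary NumPy masks where 1 marks the ‘driveline tube’ class and 0 the ‘background’; the example masks are hypothetical.

```python
# Sketch: Dice score between a predicted and a ground-truth binary mask.
import numpy as np

def dice_score(pred, truth):
    # Dice = 2 * |pred AND truth| / (|pred| + |truth|), in the range 0-1
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(pred, truth).sum() / total)

a = np.zeros((4, 4), int); a[:2] = 1      # top half predicted as tube
b = np.zeros((4, 4), int); b[:, :2] = 1   # left half annotated as tube
```

Here the two toy masks overlap in a 2 × 2 corner, giving a Dice score of 2 · 4 / (8 + 8) = 0.5.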

2.3. Prediction of region of interest

Experiments with multiple classification preprocessing configurations showed that
selecting a rectangular area around the driveline exit site increases performance
compared to using the full image as input to the classification module. This is also due
to the wide variety of zooming at the exit sites and wound areas in the data set.
A training/validation set was created by manually annotating 745 photographs.
Similar to tube segmentation (Section 2.2) we trained a U-net on this training set to
convert image input to region of interest “blobs” as output. The blobs, which indicate a
region of interest prediction, were converted to rectangular sections using post-
processing, which is a requirement for our classification model.
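The post-processing step that turns a predicted region-of-interest "blob" into the rectangular crop required by the classification model can be sketched as follows; the binary mask here is a hypothetical stand-in for a U-net output.

```python
# Sketch: convert a binary RoI blob into a rectangular bounding-box crop.
import numpy as np

def blob_to_rect(mask):
    """Return (top, left, bottom, right) of the box enclosing all
    foreground pixels, or None if the mask is empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1

mask = np.zeros((10, 10), int)
mask[2:5, 3:8] = 1                      # blob predicted by the U-net
top, left, bottom, right = blob_to_rect(mask)
crop = mask[top:bottom, left:right]     # rectangular RoI for the classifier
```

In practice the box would be applied to the photograph itself (and possibly padded or made square), but the bounding-box extraction is the core of the conversion.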

2.4. Classification of wound infection class

While the first three steps above provide methods and tools for the preparation of the
photographs to be analysed, infection class recognition is the main contribution of the
research described in this paper. We set up a classification network that learns to identify
one of the three infection classes (none, mild, severe) based on an input image.
Experiments were set up using a variety of popular CNN classification architectures.
The best performing network on our data set was the VGG-16 architecture [5], pretrained
on ImageNet [6] and fine-tuned on the driveline photographic data. The training data was
augmented using affine transformations to indirectly increase the effectiveness of the
classifier [13].
Since the labels of our training set were initially assigned using more information
than only the visual features observed in the photographs, we initiated a blind expert
evaluation. In such an evaluation we can not only compare the performance of the
classification CNN with respect to the original labels, but also to the performance of
human experts in an identical task. The blind evaluation set consisted of 100 photographs,
containing an even division of samples from both heart clinics. The chosen class
distribution reflects the class distribution of the full data set as much as possible, while
ensuring a minimum of 15 samples per class (see Table 3). The images were drawn
randomly from the respective class’s image pool. Physicians from both clinics were
asked to provide their classification of infection class for each of the evaluation
photographs. The classification CNN was trained using the leave-one-out method to
obtain a single infection class prediction for each of the 100 photographs.
A separate experiment was set up to analyse the effects of tube segmentation on
classification performance. Segmentation masks from Section 2.2 were applied to the
photographs before feeding these to the classification CNN.

3. Results

3.1. Tube segmentation

The Felzenszwalb unsupervised segmentation algorithm was compared to the supervised
U-net semantic segmentation CNN. We measured the performance of both methods on
the annotated validation set (n=37) using the Dice score coefficient. The Dice score
quantifies pixel overlap between ground truth annotations and masks generated by
segmentation algorithms and is measured in the range of 0 – 1, where 0 is fully dissimilar
and 1 is perfect similarity. On the validation set the Felzenszwalb algorithm resulted in
a Dice score coefficient of 0.72, while U-net scored higher with a Dice score coefficient
of 0.95. An example of Felzenszwalb and U-net output can be seen in Figure 1. In this
example the Felzenszwalb algorithm predicted the area around the driveline exit as being
part of the driveline, thus masking part of the wound area.

Figure 1. Visualisation of driveline tube segmentation masks. The blue region represents the predicted
driveline tube area. Left: Felzenszwalb segmentation method (note that the non-skin background is included in
the blue region). Right: U-net segmentation method.

3.2. Prediction of the region of interest

To assess the effect of extracting a region of interest (RoI) on classification performance
we compared the classification accuracy on full images, manual RoIs and U-net
generated RoIs on a validation set. Table 2 shows the results of each configuration. We
observe that cropping RoIs either manually or generated by U-net performed slightly
better than using the full image when evaluating infection classification performance.
Manually cropped RoIs led to slightly better results than U-net RoIs.

Table 2. Infection classification accuracy and macro (unweighted) F1 score based on different types of region
of interest (RoI) extraction methods.
RoI type Accuracy (%) F1 score
None (full image) 66.7 0.472
Manual RoI 71.7 0.498
U-net RoI 69.8 0.496

3.3. Infection classification

Two LVAD experts, one from each of the two clinics involved in the study, have
assigned infection class labels to each of the 100 photographs in the blind evaluation.
For comparison, output predictions from the classification CNN have been obtained on
the same set. We compute prediction accuracy using the original labels, and the resulting
metrics are shown in Table 3. The total accuracy of all participants, humans and machine,
is between 66% and 69%. The mean accuracy is derived from the results of the three
classes, weighted by the class distribution, as shown in Table 3. The severe infection
class, which is least represented in the data and prone to under-skin processes, shows the
lowest accuracy for all participants. Since prediction performance in this class is at least
as important as in the other classes, the macro F1 score is reported for each participant
as well. We observed that due to the lower performance on the severe infection class the
macro F1 average score of the classification CNN is lower than that of the trained
physicians.
Multiple approaches for applying tube masks (generated by the U-net segmentation
CNN) to classification input photographs were explored, such as setting the driveline
tube to a solid colour and a combination of inpainting and blurring to attempt to hide the
tubes in the photographs. In every approach in which a tube segmentation mask was
applied to classification input images, the resulting classification accuracy ended up
lower than without applying the mask.

Table 3. Prediction accuracy and F1 score of each participant providing predictions on the blind evaluation set
(n=100). The macro F1 score average is reported for each candidate, which is calculated by weighing each
class equally. Numbers in bold indicate the highest scores per class.

                        Physician 1       Physician 2       Classification CNN
                        Accuracy   F1     Accuracy   F1     Accuracy   F1
No infection (n=58)     89.7       0.85   81.0       0.80   81.0       0.80
Mild infection (n=27)   44.4       0.50   51.9       0.47   66.7       0.57
Severe infection (n=15) 33.3       0.34   33.3       0.42   13.3       0.20
Total / macro (n=100)   69.0       0.56   66.0       0.56   67.0       0.52
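The accuracy and macro (unweighted) F1 metrics used to score each participant can be reproduced on toy labels as follows; the example label lists are hypothetical, and the macro F1 weighs each class equally, as in Table 3.

```python
# Sketch: accuracy and macro F1 over the three infection classes.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, classes):
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Per-class F1 = 2*TP / (2*TP + FP + FN); 0 when the class is never hit
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)  # every class weighted equally

classes = ["none", "mild", "severe"]
y_true = ["none"] * 6 + ["mild"] * 3 + ["severe"]
y_pred = ["none"] * 6 + ["mild", "none", "none", "mild"]
```

Because each class contributes equally, a poor score on the rare severe class pulls the macro F1 down even when overall accuracy looks acceptable, which is exactly the effect visible in Table 3.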

4. Discussion

In this study we explored machine learning approaches towards providing assistance in
applying photographic data for remote LVAD patient monitoring. Patients or caregivers
can provide photographs without needing clinical assistance by using a mobile device
such as a smartphone, possibly in combination with a dedicated App that might
transmit additional data, e.g. from the LVAD device itself.
We have demonstrated that a U-net architecture can achieve high driveline tube
segmentation performance (Dice score coefficient of 0.95) with only 148 training
samples. When applying the segmentation masks during infection classification, we
observe that masking the input does not improve
classification accuracy. It is possible that our attempts to hide the driveline tube distort
the image, affecting relevant features. Alternatively, the network may have learnt to use
the tube for more detailed localization of the wound area. Future work could use
segmentation masks to further improve region of interest prediction, as one can derive
the driveline exit site, and therefore region of interest, from an accurate driveline
segmentation mask (for a visual example see Figure 1).
In typical specialized machine learning applications accuracy figures of 90% or
higher on multi-class problems are common. In contrast, our classification CNN achieves
67% accuracy on the blind evaluation experiment with three classes. However, we
observe that human experts did not score significantly higher on a pure visual infection
recognition task.
We can conclude that photographic data is not in all cases sufficient to accurately
determine the infection class without additional external data. Future planned
developments include the application of thermographic imaging. This is expected to lead
to improved results as infrared images can uncover the sub-surface heat sources that are
present in infection processes. This development would thus mainly improve the
infection classification performance, as more severe infections show an increase in
temperature around the wound. Smartphone devices that allow users to simultaneously
take a picture in both visible light and infrared light have recently become available on
the market.

5. Acknowledgements

The authors of this paper would like to thank Dr. Ioannis Giotis for his valuable
knowledge on skin lesion segmentation during the start of the project. In addition, we
thank Dr. Rolf Neubert for the coordination between involved parties as well as helping
to improve the writing style of the paper.

References

[1] Specifications Manual for Joint Commission National Quality Measures, New York Heart Association
(NYHA) Classification, https://manual.jointcommission.org/releases/TJC2016A/DataElem0439.html,
last access: 12.02.2019.
[2] J. L. Pech-Pacheco, et al., Diatom autofocusing in brightfield microscopy: a comparative study, in:
Proceedings. 15th International Conference on Pattern Recognition. IEEE, 2000. pp. 314-317.
[3] P.F. Felzenszwalb, D. P. Huttenlocher, Efficient graph-based image segmentation, International journal
of computer vision, 59, 2004, 167-181.

[4] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation,
in: International Conference on Medical image computing and computer-assisted intervention. Springer,
Cham, 2015. pp. 234-241.
[5] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv
preprint arXiv:1409.1556, 2014.
[6] Stanford University, Princeton University, ImageNet, http://www.image-net.org/, last access: 22.1.2019.
[7] S.P. Pinney, A.C. Anyanwu, A. Lala, J.J. Teuteberg, N. Uriel, and M.R. Mehra, Left ventricular assist
devices for lifelong support, Journal of the American College of Cardiology 69(23) (2017), 2845–2861.
[8] A. Zierer, S.J. Melby, R.K. Voeller, T.J. Guthrie, G.A. Ewald, K. Shelton, et al., Late-onset driveline
infections: the Achilles’ heel of prolonged left ventricular assist device support, The Annals of Thoracic
Surgery 84(2) (2007), 515–520.
[9] E. Deniz, C. Feldmann, T. Schmidt, J.D. Hoffmann, J. Hanke, S.V. Rojas-Hernandez, et al., The Impact
of Telemonitoring in Patients with Ventricular Assist Device, The Thoracic and Cardiovascular Surgeon
65(S 01) (2017), ePP17.
[10] A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, and S. Thrun, Dermatologist-level
classification of skin cancer with deep neural networks, Nature 542(7639) (2017), 115.
[11] A. Masood, and A. Ali Al-Jumaily, Computer aided diagnostic support system for skin cancer: a review
of techniques and algorithms, International Journal of Biomedical Imaging 2013 (2013).
[12] M.N. Wernick, Y. Yang, J.G. Brankov, G. Yourganov, and S.C. Strother, Machine learning in medical
imaging, IEEE Signal Processing Magazine 27(4) (2010), 25–38.
[13] L. Perez, and J. Wang, The effectiveness of data augmentation in image classification using deep
learning, arXiv preprint arXiv:1712.04621, 2017.
200 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-200

Improving Information About Private Consultants Through Data Linkage

Melanie ZECHMEISTER a,1, Florian ENDEL a
a DEXHELPP, Vienna, Austria

Abstract. In Austria, there is no single source of truth holding information about all
physicians and their medical practice. Therefore, different sources have to be
combined to accumulate detailed information about doctors, identify data errors and
increase overall data quality. The aim of this project is to link two datasets from
vastly different origins utilizing reproducible and mostly automatic procedures, in
contrast to the manually acquired links of the past. As there is no global identifier,
the names and addresses of the doctors were used instead. Because of different
spellings and typos of names and addresses within and between the datasets, direct
comparison does not lead to satisfactory results. Therefore, probabilistic matching
with string metrics was applied. The utilized methods significantly improve the
linkage and allow matching of about 80 % of the private consultants in both datasets.

Keywords. data analytics, data linkage

1. Introduction

In Austria, there is no single register of all outpatient physicians and their medical
practices. Furthermore, the ambulatory outpatient healthcare system is highly
fragmented. Some physicians have contracts with some social health insurance (SHI)
institutions and get reimbursed directly. Additionally, there are non-SHI-accredited
doctors of a patient's personal choice, where out-of-pocket payment is required and
whose bills are partly reimbursed by the health insurance on request.
The "Handbuch der Sanitätsberufe" (HSB) of the publishing house Göschl contains
information on all doctors in Austria, but no information about the doctors' practice,
e.g. consultations, is included. On the other hand, the routinely collected administrative
and accounting data of the SHI institutions include rich information about reimbursements
and services rendered, but hardly any information about the physicians themselves.
The aim of the project is to link these two datasets mostly automatically in order to
get a complete picture of the non-SHI physicians. The result consists of the accounting
data of the non-SHI physicians supplemented by additional information from the HSB.

2. Methods

The datasets had to be acquired, transferred securely and transformed into a usable
format. During the initial data exploration and quality assessment it became clear that
there is often more than one entry per doctor or office. Due to the lack of global
identifiers, names and addresses had to be used for deduplication and matching. Because
of differing spellings, abbreviations and typos of names and addresses even within the
same dataset, string metrics, which provide a measure of the similarity or distance of
two texts [1], had to be applied. Visualizations and automatically generated reports
enabled direct communication of the results, of possible thresholds for the distance
measure, and of errors in the deduplication and matching procedure.

1 Corresponding Author: Melanie Zechmeister, Verein DEXHELPP, Neustiftgasse 57-59, 1070 Vienna,
Austria, E-Mail: [email protected]

M. Zechmeister and F. Endel / Improving Information About Private Consultants 201

As the datasets are updated periodically, the process had to be implemented in a way
that allows it to be rerun on the updated data with little additional effort. Therefore,
the process was implemented as a series of R files which call the matching algorithm for
each variable. Thus, the process can also be re-executed with different datasets; only
the linking variables have to be identified.
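The matching itself was done with the RecordLinkage implementation in R [2]. As an illustration of the underlying idea, the following self-contained Python sketch implements the Jaro-Winkler string comparator described by Winkler [1] and accepts a link whenever the similarity exceeds a freely chosen threshold. All names, addresses and the 0.9 cut-off below are invented for this example and are not the values used in the project.

```python
def jaro(s1, s2):
    """Jaro similarity of two strings (1.0 = identical, 0.0 = no match)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count half-transpositions: matched characters appearing in a different order.
    k, half_transpositions = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Jaro-Winkler similarity: boosts the Jaro score for a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


# Hypothetical, already lower-cased entries (name plus address per record).
hsb = ["anna maier neustiftgasse 57", "paul gruber hauptplatz 1"]
shi = ["ana maier neustiftgasse 57", "eva steiner ring 3"]
THRESHOLD = 0.9  # illustrative cut-off, not the project's value

links = []
for entry in shi:
    best = max(hsb, key=lambda other: jaro_winkler(entry, other))
    if jaro_winkler(entry, best) >= THRESHOLD:
        links.append((entry, best))
```

Lowering THRESHOLD yields more links at the cost of more false positives, which is the trade-off mentioned in the Results section.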

3. Results

Deduplication and linking of the two datasets using string metrics greatly increased the
quality of the available information and data. While there are about 27,000 entries in the
raw dataset, only about 15,000 identified physicians (about 45 %) are left after deduplication.
The string matching is essential for the linkage process as well. Exact linkage leads
to about 5,800 matches, which is about 38 % of the private consultants. After applying
the text matching, 12,414 entries (about 82 %) can be linked. Furthermore, the threshold of
the distance metric can be adapted to the requirements of a project, e.g. resulting in more
links while accepting more false positives.
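In the project, duplicates were likewise detected with string metrics. The toy Python sketch below instead uses a much cruder normalization key (lower-casing, stripping academic titles and punctuation, collapsing whitespace) purely to illustrate how multiple raw entries can collapse to a single physician; the helper names and the title list are invented for this example.

```python
import re
import unicodedata


def normalize(entry):
    """Crude canonical form of a physician entry (illustrative only)."""
    # Strip accents, lower-case, drop common academic titles and punctuation.
    s = unicodedata.normalize("NFKD", entry).encode("ascii", "ignore").decode()
    s = s.lower()
    s = re.sub(r"\b(dr|mag|prof|med|univ)\.?\b", "", s)
    s = re.sub(r"[^a-z0-9 ]", " ", s)
    return " ".join(s.split())


def deduplicate(entries):
    """Keep the first representative per normalization key."""
    seen = {}
    for e in entries:
        seen.setdefault(normalize(e), e)
    return list(seen.values())
```

A real pipeline would compare the normalized keys with a string metric rather than requiring them to be identical, since typos survive normalization.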

4. Discussion

It becomes apparent that the use of text matching leads to a substantial improvement
of the results. In particular, the automation saves a lot of time compared to manual
linkage, which had been the method of choice before.
Even so, there is potential for improvement. The applied string metric and matching
procedure [2] was chosen because it is implemented in R. A systematic evaluation of
different metrics and algorithms might improve the results.
Furthermore, the manual identification and selection of the threshold is still
time-consuming and depends on human interaction. Automatic detection of suitable
thresholds would accelerate the procedure.
In this project, all matches above the threshold were accepted and the others were
rejected, which leads to individual errors. Spot-check inspections showed that some of
these errors could be identified manually.

References

[1] W. E. Winkler. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of
record linkage. Proceedings of the Section on Survey Research Methods (1990), 354-369.
[2] Andreas Borg, Murat Sariyar (2019). RecordLinkage: Record Linkage in R. R package version 0.4-10.1.
https://CRAN.R-project.org/package=RecordLinkage
202 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-202

Performance of Hospitals in Protecting the Confidentiality and Information Security of Patients in Health Information Departments

Abbas SHEIKHTAHERI a, Nasim HASHEMI b,1 and Niyoosha-sadat HASHEMI c
a Health Management and Economics Research Center, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
b Iranian Social Security Organization, Tehran, Iran
c Student, Islamic Azad University, Tehran North Branch, Tehran, Iran

Abstract. Keeping health information confidential is an important aspect of
managing health information. This study aimed at determining the performance of
health information management departments (HIMDs) in order to identify the policies
of these hospitals and the similarities and differences in their procedures in this
respect. Managers of the departments and information disclosure and medical
record staff in 22 teaching hospitals were invited to complete a questionnaire
regarding their practices in four axes: confidentiality principles, principles of
disclosure consent, and disclosure of information to external and to internal
users. We found that there is no specific national framework and there are no
guidelines for the disclosure of health information. Hospitals follow different
approaches in this regard. In most cases, patients' consent is not considered
necessary for disclosure, and the hospital managers' or physicians' consent alone
is considered sufficient.

Keywords: health information, privacy, security, confidentiality

1. Introduction

Since health data is sensitive and private, the need for confidentiality and security of
these data is obvious [1-3]. Confidentiality refers to the protection of information
against unauthorized access or disclosure and keeping information confidential should
be conducted by controlling the access level of individuals (authorized users) in
organizations, as well as protecting information at the time of data transmission [4-6].
Therefore, one of the most important responsibilities of the Health Information
Management Departments (HIMD) in hospitals is compliance with the principles of
confidentiality and information security [7]. The HIMDs should play a significant role
in monitoring and observing laws, adhering to professional standards, and conducting
appropriate procedures for keeping health information secure and confidential [5,8].
However, research conducted in different countries indicates that HIMDs often deviate
considerably from these principles [8]. Notwithstanding, different countries have
different policies and procedures to protect the privacy and security of patient
information in hospitals [4]. For example, the USA [9], Canada [10] and Australia [11]
have enacted regulations in this regard. Furthermore, in 2016 the European Union
enacted the General Data Protection Regulation to protect the data and privacy of all
individuals within the European Union (EU) and the European Economic Area (EEA)
[12]. In Iran, there is no clear policy to maintain the security and confidentiality of
patient information. Some studies have shown that the confidentiality of medical
records was not observed appropriately [6,7]. Additionally, some previous studies
indicated that patients are concerned in this regard [13]. Considering the importance of
the observance of confidentiality rules, this study was carried out to determine the
performance of HIMDs in teaching hospitals in Iran and to identify their similarities
and differences as a basis for compiling policies on the confidentiality and security of
health information.

1 Corresponding Author: Nasim Hashemi, Iranian Social Security Organization, Tehran, Iran, E-Mail:
[email protected]

A. Sheikhtaheri et al. / Protecting the Confidentiality and Information Security of Patients 203

2. Method

This cross-sectional study was undertaken in 2018 in 22 teaching hospitals in Tehran,
Iran. All managers of HIMDs and all information disclosure and medical record
officers (N=71) were invited; finally, 53 questionnaires were returned and analyzed.
A questionnaire (32 questions) was developed based on different scenarios and
possible actions. Six questions related to demographic data and 24 questions
related to compliance with the confidentiality principles, compliance with
the principles of information disclosure consent, and compliance with the principles of
confidentiality in responding to internal and external users. In each question, the
participants were asked to describe their practices in each case. Answer options for
each question were defined based on possible actions. Respondents could choose more
than one option or mention other procedures used in their hospitals. The validity of the
questionnaire was determined based on the opinions of experts in health information
management. Reliability was tested through test-retest (r=0.8). In order to collect data,
paper-based questionnaires were distributed to the participants, who were provided
with the necessary explanations and time to complete the questionnaire. Data were
analyzed by frequency and percentage of each action using SPSS version 22.

3. Results

Most participants were women (89.3%), had a bachelor's degree (85.7%) and were
specialists in the field of HIM (96.4%). The mean age of the participants was 39.7 years
and the average working experience was 16.5 years. In most hospitals (Table 1),
patients do not have the right to review and request correction of their medical records
(66%), and most hospitals do not release patients' information to them (47.1%). Most
hospitals require a commitment from the users not to disclose the contents of the
medical records (43.4%). Regarding compliance with the principles of information
disclosure consent (Table 2), we found that obtaining permission from the hospital
administrators and authorities to disclose patient information without the patient's
consent (67.9%), and access of the hospital doctors to the medical records without the
patients' consent and only upon request from the doctor (56.6%), were considered
adequate. Disclosure of any information requested by the users without the consent of
patients (43.4%) and disclosure of patients' information to their workplace and
employers with only the permission of the hospital authorities and without the patients'
consent (52.8%) were the most common processes in these hospitals.

Table 1. Compliance with the confidentiality principles of health information in HIMDs; frequency (percentage) per policy, multiple answers possible

Ownership of medical records
    For the hospital: 40 (75.5)
    For the patients: -
    For patients and the hospital: 13 (24.5)

Ownership of the information recorded in the medical records (not the physical records)
    For the hospital: 12 (22.6)
    For the patients: 2 (3.7)
    For patients and the hospital: 39 (73.5)

Confidentiality notice and alerts to medical record users
    Receive users' commitment not to disclose information: 23 (43.4)
    Oral reminder to prevent disclosure of information: 21 (39.6)
    No oral or written notice: 4 (7.5)
    Other: 5 (9.4)

Patient's right to review and apply for correction of his/her record
    The patient has the right to review and apply for correction of his/her documents: -
    Not allowed at all: 35 (66)
    Allowed only with the doctor's permission: 13 (24.5)
    Other: 5 (9.4)

The conditions for receiving patient information by the patient when the information is harmful
    The information is provided to an appropriate person identified by the patient: 17 (32.1)
    The hospital doctor reviews the medical record to disclose the information: 6 (11.3)
    Information is disclosed in such a way as to minimize the harmful effects on the patient: -
    Information is not disclosed to the patient at all: 25 (47.1)
    Other: 4 (7.5)

Measures to disclose information in legal cases
    Checking the signature of the patient on a consent letter: 6 (11.3)
    Writing a report by the patient's doctor: 27 (50.9)
    Attaching a leaflet containing the hospital logo to the report: 2 (3.7)
    Other: 10 (16.9)

Regarding the external users (Table 3), legal requests were answered with the
order of the hospital director in 88.6% of cases, and medical records were provided to
other hospitals and external doctors with the orders of the hospital managers without
patients' consent in 81.1%. Only 64.3% of participants stated that their hospitals had a
policy for the use of medical records by researchers. Disclosure of medical record
information to lawyers and the authorities only by judiciary order was reported by
only 62.6%. Regarding the internal users (Table 4), we found that heads and managers
of hospitals have convenient and fast access to medical records without the patients'
consent (58.4%). Only 52.8% of participants declared that they had a clear policy for
the use of patients' information in educational programs.

Table 2. Compliance with principles of disclosure consent in HIMDs; frequency (percentage) per policy, multiple answers possible

Consent to release medical record information to applicants
    Consent is received from the patient: 17 (32.1)
    No consent is received from the patient: 2 (3.7)
    The doctor's permission is sufficient: 8 (15.1)
    The permission and orders of the hospital managers and authorities are sufficient: 36 (67.9)

The access of the hospital doctors to medical records
    Without the permission of the patient and only upon request from the doctor: 30 (56.6)
    Only with the written consent of the patient: 4 (7.5)
    Only with the permission of the hospital manager: 15 (28.3)
    Other: 9 (17)

Disclosure of medical records to the hospital where the patient is being transferred
    With the consent of the patient: 13 (24.5)
    With the consent of the doctor: 27 (50.9)
    Without any consent form: 8 (15.1)
    Other: 5 (9.4)

Authorized information for disclosure without the consent of the patient
    Patient's name, date of reception and patient's general condition: 11 (20.7)
    Results of laboratory tests, x-rays, and electrocardiograms: 6 (11.3)
    The current physical and psychological history of the patient: 8 (15.1)
    Any kind of information that the applicant asks for: 23 (43.4)

Conditions for providing patient information to their workplace
    Only with the consent and request of the patient: 11 (20.7)
    Just by request of the patient's workplace: 11 (20.7)
    Authorization of the hospital authorities is sufficient: 28 (52.8)
    Other: 3 (5.7)

Table 3. Compliance with confidentiality principles in responding to external users in the HIMDs; frequency (percentage) per policy, multiple answers possible

The manner of getting medical records out of the hospital
    They can be taken out of the hospital: -
    They cannot be taken out of the hospital: 13 (24.5)
    They can be taken out of the hospital at the request of judicial authorities: 39 (73.5)
    They can be taken out of the hospital to get a copy of pages: 1 (1.8)

The access of other healthcare professionals outside the hospital to patient information
    With the request of the professionals: 28 (52.8)
    With the written consent of the patient: 7 (13.2)
    With the permission of the hospital directors: 30 (56.6)
    Other: 6 (11.3)

Guideline for researchers to use medical records
    It is clear: 33 (62.3)
    Not specified: 10 (18.9)
    I do not know: 10 (18.9)

The requirements for providing medical records to other hospitals and external doctors
    The requested information is provided to the applicant: 2 (3.7)
    Only with the patient's permission: 10 (18.9)
    With the orders of the authorities and hospital directors: 43 (81.1)
    Other: 5 (9.4)

The circumstances required for providing medical records to insurance companies
    Information is disclosed without the patient's consent: 7 (13.2)
    Only with the patient's consent: 13 (24.5)
    With the order of the hospital director: 37 (69.8)

Conditions for responding to requests from external agencies and offices
    Positive response without the patient's consent: 5 (9.4)
    With the order of the hospital directors: 47 (88.6)
    With the patient's consent: 6 (11.3)
    Other: 1 (1.9)

The requirement for disclosure of medical records to lawyers and legal authorities
    Consent or authorization is not needed: 6 (11.3)
    Only with the patient's consent: 7 (13.2)
    Only by judicial decision: 33 (62.6)
    Other: 11 (20.7)

The requirement for getting the patient's information by the patient him/herself
    The patient has the right to receive his/her complete information: 15 (28.3)
    The patient does not have the right to receive his/her complete information: 5 (9.4)
    Only with the approval of the physician does the patient have the right to receive his/her information: 7 (13.2)
    Only with the approval of the authorities and the hospital directors does the patient have the right to receive his/her information: 26 (49.1)

The conditions for receiving a copy of the records by the patient him/herself
    He/she does not have the right to receive any copy at all: 2 (3.7)
    He/she can take copies of the medical records: 28 (52.8)
    He/she can receive copies only with the approval of the physicians: 4 (7.5)
    He/she can receive copies only with the permission of the hospital directors: 24 (45.2)
    Other: 4 (7.5)

Table 4. Compliance with confidentiality principles in responding to internal users; frequency (percentage) per policy, multiple answers possible

Hospital authorities' (hospital head and director) access to patients' medical records
    They can easily access the information: 31 (58.4)
    Not allowed to access information: 7 (13.2)
    Only with the patient's permission do they have the right to access the records: 4 (7.5)
    Other: 13 (24.5)

Hospital staff access to patients' medical records
    Their access is in accordance with their personal responsibility and authority in order to do hospital affairs: 34 (64.1)
    If requested, they can access records: 2 (3.7)
    They have access only with the permission of the patient's physician: 7 (13.2)
    Other: 10 (18.8)

The use of medical records to assess the quality of health care
    Only with the patient's consent: 6 (11.3)
    With the permission of the doctor: 19 (35.8)
    With the hospital managers' permission: 32 (60.4)
    Other: 6 (11.3)

The presence of a clear policy for using patients' information in hospital educational programs
    The hospital has specific policies: 28 (52.8)
    There are no specific policies: 4 (7.5)
    The hospital uses records at any time for training: 17 (32.1)
    Other: 4 (7.5)

4. Discussion

In general, the findings show that these hospitals use the same procedures in some
cases, but in many cases the current processes of the hospitals regarding the
confidentiality and disclosure of information in different circumstances are not the same.
Furthermore, in many cases, the procedures of the hospitals show that confidentiality and
security of health information is not a priority, and they provide access to patients'
information without their consent.
In the first axis, most hospitals obtain a commitment from the users not to disclose
information. In addition, in most hospitals, patients have no right to review and request
correction of their information, and if the information is harmful, no information is
given to patients. In cases where information is provided under legal conditions, few
hospitals checked patients' consent; most relied on the doctor's report or the hospital
director's order. Few hospitals stated that if a patient requests information from
his/her records, he/she has access to this information with his/her identification card.
Furthermore, in a small number of hospitals, keeping the confidentiality and privacy of
patients was mentioned in the staff job descriptions. Some hospitals only gave the
possibility to correct patients' identification data.
Additionally, it was found that in most hospitals, the doctors' access to patients'
information is possible without the permission of patients and only by a doctor's
request. For other users of the medical records, the permission of the hospital managers
and authorities is considered sufficient. In most cases, the disclosure of information to
the hospital to which a patient is transferred is undertaken only with the physicians'
permission. In most hospitals, sending information to users did not require patients'
consent. In addition, it was determined that hospital managers have access to patients'
information without their consent, but other staff have access to information only
within their scope of tasks. The use of medical records to assess the quality of health
care is also possible with the permission of the doctor. To send information outside
the hospital, most hospitals do not obtain consent from the patients, and only the
permission of the hospital authorities is considered sufficient, except in legal cases,
where the information is sent out merely upon a request from the judicial authorities.
According to the HIPAA Privacy Rule, a patient has the right to access and
control his or her health information [9]. According to Canadian law, health care
providers and centers are required to protect personal information and to be
accountable to patients for all information activities they carry out. Healthcare
organizations should provide patients with access to their health information [10]. In
Australia, laws have been developed to protect the privacy of health information and
to allow patients to access their health information [11]. The General Data Protection
Regulation has established a framework for data privacy, the rights of data subjects,
and the transfer of data for European countries [12]. In Iran, by contrast, hospitals do
not seem to have a common framework to protect health information privacy; patients
are not given sufficient access to their information and cannot control it.
According to HIPAA, patients' health information should not be released
without their consent unless there is a clear reason for it, and users should also protect
it. Patients should also be informed about what information is disclosed, to whom, and
why [9]. However, the findings showed that in Iran a patient's consent to the release
of information is not taken into consideration, and therefore patients do not
have control over who has access to and uses their information. Moreover, the use of
health information is allowed when it is permitted or required by law and the
patient has expressed his/her consent to this disclosure [14].
Use of information for the primary purposes of health information collection, and
for other purposes such as planning, providing healthcare services, allocating
resources, managing errors and risks, improving the quality of care, and training
health care providers, is allowed without patients' consent unless they have expressly
announced their disagreement [15]. This issue is partly observed in Iranian hospitals,
where the application of health information for research, evaluation of healthcare
quality and education does not depend on patients' consent. Although the educational
use of information is permitted [16,17], the identity of patients should not be released
[18], and students should be responsible for maintaining health information [19].
In the case of research, the written consent of patients is required unless, based on
the decision of the ethics committee, written consent is not required, or researchers use
a limited dataset without the patients' identification data [20]. In other words, if a
study needs identity information, the approval of the ethics committee should be
available [21]. In some cases, such as health and medical research that benefits the
community where there is no possibility of obtaining consent from patients,
appropriate mechanisms should be put in place, instead of the consent form, to protect
the privacy of health information [22]. In Iranian hospitals, researchers are allowed to
access the information with the permission of hospital managers and must have the
ethics committee's permission. Therefore, this issue is respected in our hospitals.

In summary, this study showed that in our country there are no specific national
frameworks and guidelines for the disclosure of health information and its privacy
and security, and hospitals follow different approaches in this regard. Moreover, in many
cases, international principles are not respected. Therefore, a specific framework for the
security and confidentiality of health information should be developed in order to protect
the confidentiality and security of health information in both electronic and manual
medical record systems.

References

[1] G.S. Poduri, Confidentiality and patient records. AP Journal of Psychological Medicine. 14(2)
(2013), 110-113.
[2] N. Hajrahimi, S.M. Hejazi Dehaghani, A. Sheikhtaheri, Health information security: A Case study of
three selected medical centers in Iran. Acta Informatica Medica. 21(1) (2013), 42-45.
[3] J.R. Junges, M. Recktenwald, H.D Raymundo, et al. Confidentiality and privacy of information about
patients treated by primary health care teams: a review. Revista Bioética. 23(1) (2015), 200-206.
[4] T. NaseriBooriAbad, A. Sheikhtaheri. Information privacy and pervasive health: Frameworks at a glance.
Journal of Biomedical Physics and Engineering. (2019), In press.
[5] M. Langarizadeh, A. Orooji, A. Sheikhtaheri, Effectiveness of Anonymization methods in preserving
patients’ privacy: a systematic literature review. Studies in Health Technology and Informatics. 248
(2018), 80-87.
[6] E. Mehraeen, H. Ayatollahi, M. Ahmadi, A Study of information security in hospital information systems,
Health Information Management. 10(6) (2014), 779-788.
[7] A. Hajavi, M. Khoushgam, M. Hatami, A Comparative study on regarding rate of the privacy principles
in legal issues by WHO manual at teaching hospitals. Journal of Health Administration. 33(11) (2007),
7-16.
[8] M. Farzandipour, Policies for providing medical records at hospitals. Dissertation of Health Information
Management. (2002).
[9] K.A. Wager, F.W. Lee, J.P. Glaser, Health care information systems: a practical approach for health care
management. John Wiley & Sons (2017).
[10] A. Thorogood, Protecting the privacy of Canadians’ health information in the cloud. Can J Law
Technol. 14 (2016), 173-213.
[11] New South Wales information and privacy commission, health records and information privacy Act
2002.
[12] General Data Protection Regulation (GDPR). (2016) Available from: https://gdpr-info.eu/
[13] A. Sheikhtaheri, M.S Jabali, Z.H. Dehaghi. Nurses' knowledge and performance of the patients' bill of
rights. Nursing Ethics. 23(8) (2016), 866-876.
[14] Government of Newfoundland and Labrador: Department of Health and Community Services, The
personal health information act policy development manual. (2011).
[15] Personal Health Information Protection Act. (2004) Available from:
https://www.ontario.ca/laws/statute/04p03.
[16] Uconn Health. Policy: Use of protected health information (phi) in education (POLICY NUMBER
2014-07). (2014) Available from: https://health.uconn.edu/policies/wp-content/uploads/sites/28/2015/07/policy_2014_07.pdf.
[17] UT Health HIPAA compliance program: Office of regulatory affairs and compliance: Using protected
health information (PHI) for education. (2014).
[18] M. Abdelhak, S. Grostick, M.A. Hanken, Health Information: Management of a Strategic resource.
Elsevier Health Sciences, (2014).
[19] University of Hawaii HIPAA training program, Appropriate uses of protected health information for
educational purposes. (2014).
[20] UCI Office of Research. Protected Health Information (HIPAA). (2015) Available from:
http://www.research.uci.edu/compliance/human-research-protections/researchers/protected-health-information-hipaa.html.
[21] Office of the Information and Privacy Commissioner, The health information Act: use and disclosure of
health information for research.
[22] C. O'Keefe, D. Rubin, Individual privacy versus public good: Protecting confidentiality in health
research. Statistics in Medicine. 34(23) (2015), 3081-3103.
210 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-210

Patient Record Linkage for Data Quality Assessment Based on Time Series Matching

Alphons EGGERTH a,b,1, Dieter HAYN a, Karl KREINER a, Sai VEERANKI a,b, Heimo TRANINGER c, Robert MODRE-OSPRIAN a and Günter SCHREIER a,b
a AIT Austrian Institute of Technology GmbH, Graz, Austria
b Graz University of Technology, Graz, Austria
c ZARG Zentrum für ambulante Rehabilitation GmbH, Graz, Austria

Abstract. Background: Huge amounts of data are collected by healthcare providers
and other institutions. However, there are data protection regulations which limit
their utilisation for secondary use, e.g. research. In scenarios where several data
sources are obtained without universal identifiers, record linkage methods need to
be applied to obtain a comprehensive dataset. Objectives: In this study, our
objective was to link two datasets comprising data from ergometric performance tests in
order to obtain reference values for free text annotations for assessing their data quality.
Methods: We applied an iterative, distance-based time series record linkage
algorithm to find corresponding entries in the two given datasets. Subsequently, we
assessed the resulting matching rate. The implementation was done in Matlab.
Results: The matching rate of our record linkage algorithm was 74.5% for matching
patients' records with their ergometry records. The highest rate of appropriate free
text annotations was 87.9%. Conclusion: For the given scenario, our algorithm
matched 74.5% of the patients. However, we had no gold standard for validating our
results. Most of the free text annotations contained the expected values.

Keywords. medical record linkage, data analysis, ergometry, exercise test, cardiac
rehabilitation

1. Introduction

Clinical trials are commonly limited to a specific study population, which cannot
represent the target population in full detail. Additionally, in clinical trials there is a
limited time of observation and a relatively small number of subjects. To get a more
comprehensive view, different strategies can be followed. On the one hand, researchers
are trying to reuse available datasets from various clinical trials by combining them to
bigger datasets (“record linkage”), e.g. using EUPID [1]. On the other hand, routine care
increasingly relies on information and communications technology (ICT), which leads
to huge amounts of patient data. As they document treatments and their outcomes
under real-world conditions, routine data are a highly valuable resource for research
studies (“secondary use”) [2].

1 Corresponding Author: Alphons Eggerth, AIT Austrian Institute of Technology GmbH,
Reininghausstraße 13, 8020 Graz, Austria, E-Mail: [email protected]
A. Eggerth et al. / Patient Record Linkage for Data Quality Assessment 211

Secondary use of healthcare data brings high responsibilities for the research team.
The research environment needs to meet legal and ethical requirements, which limit the
usage of the data. A very important aspect is the protection of the patients’ personal
information (e.g. GDPR for Europe [3], HIPAA for the US [4]). Along with providing
secure data storage and computing environments, data needs to be cleaned from
identifying elements. Thus, names, social security numbers, telephone numbers, etc.
must be removed from the datasets. [2]
Once all requirements are met, the question of the optimal record linkage algorithm
arises. Usually, several datasets are given, which need to be linked. In an optimal
case, a universal identifier or common patient pseudonyms exist, which can be used to
directly connect all the records. However, there are situations in which such a universal
identifier (e.g. for privacy reasons) or common patient pseudonyms (e.g. due to various
origins of the datasets) are not present. To enable linkage of de-identified datasets in such
situations, the two datasets need to share some of their fields, i.e. some information needs to
be present in both datasets (e.g. date of birth, ZIP code, sex), which together contains enough
information to obtain a unique combination of values for each patient (see the k-anonymity
concept [5]). While deterministic approaches try to match records through rule-based
algorithms, probabilistic approaches rely on statistical methods to calculate weights for
the available parameters, which are then applied for estimating matching probabilities
[6-9].
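The connection between quasi-identifier uniqueness and linkability can be illustrated with a brief sketch (the function and data are hypothetical illustrations, not part of the study): a shared field combination supports record linkage precisely when it is unique, i.e. when the k-anonymity of the dataset is 1.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size over all combinations of quasi-identifier values.
    k == 1 means at least one record is uniquely identifiable (and linkable)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Hypothetical de-identified records sharing date of birth, ZIP code and sex.
records = [
    {"dob": "1950-04-12", "zip": "8020", "sex": "m"},
    {"dob": "1950-04-12", "zip": "8020", "sex": "m"},
    {"dob": "1962-11-03", "zip": "8010", "sex": "f"},
]
k = k_anonymity(records, ["dob", "zip", "sex"])  # 1: the third record is unique
```

A dataset with k = 1 for the shared fields is what makes linkage possible in the first place, which is exactly the privacy trade-off the k-anonymity concept [5] formalises.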
In our study, we obtained data from two different sources: a) patient record data
from a manually entered database (exported as an Excel file) as well as b) raw data,
metadata and manually entered free text annotations of ergometric performance tests
recorded with the used ergometers (exported as separate XML files). We tried to link
these two data sources to validate the XML files’ free text annotations. However, there
was neither a universal identifier nor common patient pseudonyms nor was the data
format suited for commonly used record linkage algorithms. Thus, we transformed the
data and applied a distance-based time series record linkage approach, as proposed by
Nin and Torra [10].
This paper is organised as follows: For the given datasets, we first present the
application of the time series record linkage algorithm. Second, we investigate the quality
of the XML files’ free text annotations.

2. Methods

Pseudonymised ergometry data from the rehabilitation centre ZARG Zentrum für
ambulante Rehabilitation GmbH were obtained, comprising an Excel file containing
manually entered database entries from 1,538 cardiac rehabilitation patients as well as
29,876 XML files that had been recorded with ergometers and each contained the data
of one ergometric performance test. In this paper, we use “PAT file” as a notation for the
Excel file and “ERGO files” as a notation for the XML files. However, the date ranges
of the datasets were only partly overlapping. Thus, for most of our analyses, the PAT file
entries after 13.06.2017 were not used. For a detailed description of the source datasets
see Table 1. All analyses were conducted using Matlab (The MathWorks, Natick, US).
During pre-processing, two pseudonymised IDs of the ERGO files were dismissed,
as they had more than 100 performance test entries, which is very unlikely for a single
patient. Furthermore, minor typos (e.g. year = “2217” instead of “2017”), which became
obvious due to unexpected outcomes during implementation, were corrected.

As an initial step, the information from the ERGO files was parsed and transformed
into a table. After comparing the ERGO and the PAT dataset with each other, we
extracted, for all available performance tests, the date and six parameters with
identical entries in both datasets. We converted all dates and parameter values to integer
values and created a separate table for the dates and for each of the parameters, both for
PAT and ERGO: a) for the PAT file, we arranged the integers in columns with one row
for each patient (referred to as “PAT tables”); b) for the ERGO files, we arranged the
integers in columns with one row for each ID (referred to as “ERGO tables”).
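Such a transformation into integer-coded tables might be sketched as follows (a minimal illustration in Python rather than the authors' Matlab implementation; the `to_int_date` helper, all variable names and the example dates are hypothetical):

```python
from datetime import date

def to_int_date(d: date) -> int:
    """Encode a date as an integer day number (days since 1970-01-01),
    so that date comparisons become simple integer subtractions."""
    return (d - date(1970, 1, 1)).days

# Hypothetical PAT tables for one parameter: one row per patient,
# integer-coded dates and values for up to four performance tests.
pat_dates = [[to_int_date(date(2014, 3, 1)), to_int_date(date(2014, 5, 2))]]
pat_values = [[150, 165]]  # e.g. maximum heart rate per test

# Hypothetical ERGO tables: one row per pseudonymised ID with all test dates.
ergo_dates = {"id_0815": [to_int_date(date(2014, 3, 1)),
                          to_int_date(date(2014, 4, 10)),
                          to_int_date(date(2014, 5, 2))]}
```

With both datasets in this integer form, exact matches of date-value pairs reduce to equality checks on integers, which is what the algorithm below relies on.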

Table 1. Properties of the ergometry data received from the ZARG rehabilitation centre.

Content
  PAT file (= Excel file): Four sheets of manually entered patient information and
  results from ergometric performance tests. Every sheet represented one of the four
  ergometric performance tests which had been conducted during the cardiac
  rehabilitation program: start of phase 2, end of phase 2, start of phase 3,
  end of phase 3.
  ERGO files (= XML files): Each ERGO file contained the data of one performance
  test. The data comprised raw data (e.g. heart rate curves, workload step profiles,
  ECGs), metadata and sometimes annotations (e.g. free text entries denoting the
  reason for the test). Files for the same patient were linked by a pseudonymised ID.
Number of records
  PAT file: 1,538 patients
  ERGO files: 29,876 ergometric performance tests
Origin of the dataset
  PAT file: cardiac rehabilitation
  ERGO files: cardiac rehabilitation; by order of a physician; as part of a training
  program
First performance test
  PAT file: 11.02.2013
  ERGO files: 22.01.2004
Last performance test
  PAT file: 13.09.2018
  ERGO files: 13.06.2017
Identical entries in both datasets
  Sex, age, height, weight, maximum workload value of the applied step profile,
  maximum heart rate value during the performance test, date of the performance test

2.1. Record linkage algorithm

We implemented an iterative, distance-based time series record linkage algorithm to
match the de-identified patient record data of each cardiac rehabilitation patient from the
PAT file with the corresponding pseudonymised ID in the ERGO files, which he/she
had obtained during ergometry. The matching was done in up to six iterations (one for
each parameter) for each patient, one patient after the other. In every iteration, one
iteration step consisting of 5 sub-steps (see Figure 1) was done for each parameter. For
each iteration, the one parameter which resulted in the minimum number of remaining
IDs was chosen. If more than one ID remained, only those dates and values of the ERGO
tables which were related to the remaining IDs were used for the next iteration. Already
chosen parameters were omitted in further iterations. Within each iteration step, we
applied a criterion for testing equality (see Formula 1) and marked all identical date-
value pairs. Then, we counted the number of these exact matches for every ID of the
ERGO tables and kept the IDs with the maximum number.
The iterations were continued until only one single ID remained, which was
then chosen as the matched ID for the current patient. However, if in the end more
than one rehabilitation patient was linked to the same ID, the ID was only assigned to a
patient if she/he alone had the highest total number of exact matches during the last iteration.

abs(date_value1 − date_value2) + abs(value1 − value2) == 0    (1)

Figure 1. Depiction of one iteration step, which is run several times during an iteration and contains 5
sub-steps. For every iteration step, the date tables of PAT and ERGO are used along with the value tables of
PAT and ERGO for the currently evaluated parameter. In the PAT table, up to four entries are available for
each patient. In sub-step 1, a patient is selected from the PAT tables and one of her/his four date-value pairs is
selected. In sub-step 2, the ERGO tables contain dates and values in their columns, and each of their rows
represents a pseudonymised ID. The date-value pair selected in sub-step 1 is now subtracted from all the values
and dates of the ERGO tables; identical values thus result in zeros. In sub-step 3, the
minimum of the differences from sub-step 2 is calculated for each row and stored (in the same column as in
the PAT tables and in the same row as in the ERGO tables). Sub-steps 1-3 are repeated for the up to four date-
value pairs of the selected patient. In sub-step 4, the entries equal to 0 (= exact matches) are counted for each
row of the table resulting from sub-step 3 and stored in another table. For the patient selected in sub-step
1, this table then shows in each row the number of exact matches between her/his date-value pairs and the date-
value pairs of this row of the ERGO tables. In sub-step 5, the IDs of the ERGO tables’ rows with
the maximum number of exact matches are stored for the selected patient. Sub-steps 1-5 are done for each
parameter to finally choose the one that results in the fewest IDs in sub-step 5.
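The core of one iteration step (counting exact matches per ID and keeping the best candidates) could be sketched as follows. This is a simplified Python illustration of sub-steps 1-5, not the authors' Matlab code; the function names, IDs, dates and values are hypothetical:

```python
def exact_match_counts(patient_pairs, ergo_tables):
    """Sub-steps 1-4: for each pseudonymised ID, count how many of the
    patient's integer-coded (date, value) pairs occur exactly in that
    ID's ERGO records."""
    return {ergo_id: sum(1 for pair in patient_pairs if pair in ergo_pairs)
            for ergo_id, ergo_pairs in ergo_tables.items()}

def keep_best_ids(counts):
    """Sub-step 5: keep only the IDs with the maximum number of exact matches."""
    best = max(counts.values())
    return {ergo_id for ergo_id, c in counts.items() if c == best}

# Hypothetical example: two candidate IDs for one patient.
patient = [(16130, 150), (16192, 165)]            # integer-coded (date, value) pairs
ergo = {"id_a": [(16130, 150), (16192, 165)],
        "id_b": [(16130, 150), (16200, 170)]}
counts = exact_match_counts(patient, ergo)        # id_a: 2 matches, id_b: 1 match
remaining = keep_best_ids(counts)                 # only id_a survives this step
```

In the full algorithm, this step would be run once per remaining parameter, and the parameter leaving the fewest IDs would be consumed for the iteration.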

2.2. Mapping of free text classifications

At ZARG, cardiac rehabilitation patients can participate in two phases of rehabilitation,
which are denoted “phase 2” and “phase 3”. At the start and at the end of each phase, a
performance test is conducted to track the patient’s performance. While the reason for a
performance test can easily be obtained from the PAT file via its four sheets
(“start of phase 2”, “end of phase 2”, “start of phase 3” and “end of phase 3”), the ERGO
files merely contain a field with free text entries (“ReasonForStudy”).
Using the unambiguous matching results of the record linkage algorithm, we
retrieved the entries of the “ReasonForStudy” fields from the ERGO files: First, for the
matched IDs, we arranged all their date and “ReasonForStudy” values in a table. Then,
we extracted all “ReasonForStudy” entries for the four types of reasons by using the
respective dates from the PAT file together with the matched IDs.
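The retrieval of annotations via matched IDs can be sketched as follows (a hypothetical Python illustration; only the “ReasonForStudy” field name is taken from the ERGO files, everything else is assumed for the example):

```python
def reasons_for_dates(matched_id, pat_dates, ergo_reasons):
    """Look up the free text annotation for each performance test date taken
    from the PAT file, keyed by (pseudonymised ID, integer-coded date)."""
    return [ergo_reasons.get((matched_id, d)) for d in pat_dates]

# Hypothetical example: one matched ID with two annotated tests.
ergo_reasons = {("id_a", 16130): "Erstuntersuchung Phase II",
                ("id_a", 16192): "Abschlußuntersuchung Phase II"}
reasons = reasons_for_dates("id_a", [16130, 16192], ergo_reasons)
```

Collecting these entries per performance test reason then yields frequency counts like those in Table 3.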

3. Results

3.1. Record linkage algorithm

With our record linkage algorithm, we initially obtained 761 matches for the full PAT
file, which equals 49.5%. After removing all patients who had at least one date value
outside the overlapping date range of the PAT file and the ERGO files, the matching rate was
74.5%. Detailed results can be found in Table 2. For further analyses, the matches of the
overlapping date range were used.

Table 2. Results of applying our record linkage algorithm. A “matched patient” is a patient of the PAT file
who can be unambiguously linked to a single pseudonymised ID of the ERGO files. For the column “PAT
file until 13.06.2017”, all patients with values after 13.06.2017 (= date of the last performance test of the ERGO
files) were omitted.
Property Full PAT file PAT file until 13.06.2017
Number of patients 1,538 877
Matched patients 761 (49.5%) 653 (74.5%)
IDs matched to more than one 206 (13.4%) 129 (14.7%)
patient (thus rejected)
Patients without a matching ID 571 (37.1%) 95 (10.8%)

3.2. Mapping of free text classifications

As described in Section 2.2, the free text entries of the ERGO files for the reason of the
performance test could be obtained from the “ReasonForStudy” field. Thus, for the
overlapping time range, we collected these free text entries for each of the four reasons.
Table 3 shows the obtained free text entries for each of the four performance test reasons
together with their numbers of occurrence.
For “start of phase 2”, 167 free text entries were obtained through the matching IDs.
78.4% of these entries contained the expected string “Erstuntersuchung Phase II”. 15.6%
of the records contained an empty string. Thus, only very few unrelated entries remained.

Table 3. Available free text entries from the ERGO files which could be unambiguously matched to PAT
entries for the respective performance test reasons. There can be four different reasons, relating to the current
stage of the cardiac rehabilitation program (“start of phase 2”, “end of phase 2”, “start of phase 3”, “end of
phase 3”). Only entries of the PAT file within the overlapping time range of both data sources
(11.02.2013 – 13.06.2017) were considered for the matching.
Start of phase 2 (167 free text entries obtained from the ERGO files)
Free text entry from the matched ERGO files Number of occurrences
“Erstuntersuchung Phase II” 131 (78.4%)
“ ” (empty string) 26 (15.6%)
“Erstuntersuchung Phase III” 3 (1.8%)
“Abschlußuntersuchung Phase II” 1 (0.6%)
“Abschlußuntersuchung Phase III” 1 (0.6%)
“Anfangsuntersuchung ProHeart” 1 (0.6%)
“CAVE!! Hr. [name] [birthdate] Erstuntersuchu” (sic!) 1 (0.6%)
“EU II” 1 (0.6%)
“Pro-Heart 3” 1 (0.6%)
“ZU Proheart” 1 (0.6%)
End of phase 2 (174 free text entries obtained from the ERGO files)
Free text entry from the matched ERGO files Number of occurrences
“Abschlußuntersuchung Phase II” 153 (87.9%)
“ ” (empty string) 17 (9.8%)
“AU II” 1 (0.6%)
“Erstuntersuchung Phase II” 1 (0.6%)
“Proheart ZU” 1 (0.6%)
“Zwischenuntersuchung Phase III” 1 (0.6%)
Start of phase 3 (184 free text entries obtained from the ERGO files)
Free text entry from the matched ERGO files Number of occurrences
“Erstuntersuchung Phase III” 111 (60.3%)
“ ” (empty string) 49 (26.6%)
“Zwischenuntersuchung Phase III” 6 (3.3%)
“Abschlußuntersuchung Phase III” 5 (2.7%)
“Erstuntersuchung Phase II” 4 (2.2%)
“Pro-Heart 2” 2 (1.1%)
“Abschlußuntersuchung Phase II” 1 (0.5%)
“Anfangsuntersuchung ProHeart” 1 (0.5%)
“EU Phase III” 1 (0.5%)
“Eingangsuntersuchung Phase III” 1 (0.5%)
“Pro Heart” 1 (0.5%)
“Pro Heart ZU” 1 (0.5%)
“Rehaabbruch” 1 (0.5%)
End of phase 3 (194 free text entries obtained from the ERGO files)
Free text entry from the matched ERGO files Number of occurrences
“Abschlußuntersuchung Phase III” 149 (76.8%)
“ ” (empty string) 32 (16.5%)
“Pro Heart” 3 (1.5%)
“Erstuntersuchung Phase III” 2 (1.0%)
“10/10/1min” 1 (0.5%)
“Abschlußuntersuchung Phase III Verl.” 1 (0.5%)
“Anschlussuntersuchung Phase III” 1 (0.5%)
“Kontrolluntersuchung” 1 (0.5%)
“Pro Heart / Herzverband” 1 (0.5%)
“Rehaabbruch Phase III” 1 (0.5%)
“Vorzeitiger Rehaabbruch/AU PH3” 1 (0.5%)
“Zwischenuntersuchung Phase III” 1 (0.5%)

For “end of phase 2”, 174 free text entries were obtained through the matching IDs.
87.9% contained the expected string “Abschlußuntersuchung Phase II”. Only 9.8% of
the entries contained an empty string, and the total share of remaining unrelated values
was 2.3%. Thus, “end of phase 2” showed the highest rate of accurate entries along with
the fewest empty strings and the lowest number of unrelated entries. For “start of phase
3”, 184 free text entries were obtained. Only 60.3% of the performance tests were tagged
with the expected entry “Erstuntersuchung Phase III”. With 26.6%, more than a quarter
of the performance tests were annotated with an empty string. Also, the share of
unrelated values was the highest in comparison to the other phases, totalling 13%. The
final reason, “end of phase 3”, showed characteristics similar to “start of phase 2”. Of
the 194 obtained free text entries, 76.8% were expected entries containing the string
“Abschlußuntersuchung Phase III” and 16.5% were empty strings. The share of unrelated
entries was 6.7%.
While “start of phase 2” and “end of phase 3” had a similar rate of accurate entries,
the rate for “start of phase 3” was lower, and the rate of accurate entries for “end of phase
2” was comparably higher than for the other performance test reasons.

4. Discussion

Looking at the outcome of this study, the applied time series record linkage algorithm
achieved a matching rate of 74.5% and the observed free text entries were in accordance
with our expectations for up to 87.9% of the entries. However, at this time, we had no
gold standard for evaluating the accuracy of our matches. For more reliable analyses of
the resulting combined dataset, the datasets should be linked by patient pseudonyms.
Another issue was the different date ranges of the two data sources. While the PAT
file contained records ranging from 2013 to 2018, the ERGO files contained records
ranging from 2004 to 2017 only. Obviously, no matches outside the overlapping date
range were possible, and considering the full date range, only half of the patients (49.5%)
from the PAT file could be unambiguously matched to their IDs in the ERGO files.
Looking at the overlapping date range, 74.5% of the patients could be matched.
For our matching approach, we assumed that no patient had more than one ID in the
ERGO files and allowed only one single linkage between PAT file patients and ERGO
file IDs. Thus, if a patient had had two IDs in the ERGO files, one “correct”
linkage would have been dismissed.
Even though only patients of the overlapping date range were considered for this study,
the gathered knowledge can still be used for the full date range of the datasets. Up to
87.9% of the free text entries were entered correctly, which reassures us that these
entries are quite reliable.
The proposed record linkage algorithm can be used to combine de-identified datasets
into one comprehensive, de-identified dataset, which could be the basis for further insights.
However, it is not possible to identify single patients or to recreate personal information.
Formula 1 gives the criterion used for testing equality. It was chosen because our
implemented routine transformed all parameter time series to date-value pairs in integer
format for easier handling. For identical values at the same date, this criterion was
logically true, which allowed these entries to be counted as exact matches. For allowing some
distance between two values instead of only counting exact matches, adaptations would
be needed: the values’ dates would separately need to be checked for an exact match,
while the values themselves would be allowed to diverge within some boundaries (e.g.
± 5 bpm for the maximum heart rate).
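The adapted criterion could be sketched as follows (an illustrative Python generalisation of Formula 1, not part of the published Matlab implementation; names and example values are hypothetical):

```python
def is_match(date1, value1, date2, value2, tol=0):
    """Formula 1 generalised with a tolerance: dates must match exactly,
    while the values may diverge by at most `tol` (tol=0 reproduces
    the exact-match criterion of Formula 1)."""
    return date1 == date2 and abs(value1 - value2) <= tol

exact = is_match(16130, 150, 16130, 150)            # Formula 1: exact match
within = is_match(16130, 150, 16130, 153, tol=5)    # accepted within +/- 5 bpm
shifted = is_match(16130, 150, 16131, 150, tol=5)   # rejected: dates differ
```

Keeping the date comparison strict while relaxing only the value comparison preserves the anchoring of each pair to a specific performance test.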
The matching results show that, on the one hand, the proposed matching algorithm
was suitable for the given scenario, as unrelated free text entries were very rare. On the
other hand, the free text entries proved to be very accurate.

5. Conclusion

For the given scenario with two data sources containing identical entries for some of their
parameters, our iterative, distance-based time series record linkage algorithm achieved a
matching rate of 74.5%. Furthermore, the free text annotation entries in the ERGO files
were in accordance with our expectations for up to 87.9% of the entries, which showed
that inclusion of the PAT file will be unnecessary for our future analyses of this dataset.

6. Conflict of Interest

To the authors’ knowledge, there are no conflicts of interest.

7. Acknowledgement

This work was partly funded by the Austrian Research Promotion Agency (FFG) as part
of the project EPICURE under grant agreement 14270859.

References

[1] D. Hayn et al., "IT Infrastructure for Merging Data from Different Clinical Trials and Across
Independent Research Networks," (in eng), Stud Health Technol Inform, vol. 228, pp. 287-91, 2016.
[2] S. Dusetzina, S. Tyree, A. Meyer, A. Meyer, L. Green, and W. Carpenter, "Linking Data for Health
Services Research: A Framework and Instructional Guide.," University of North Carolina at Chapel
Hill, Rockville, MD, 2014, vol. AHRQ Publication.
[3] G. Chassang, "The impact of the EU general data protection regulation on scientific research," (in
eng), Ecancermedicalscience, vol. 11, p. 709, 2017.
[4] R. Nosowsky and T. J. Giordano, "The Health Insurance Portability and Accountability Act of 1996
(HIPAA) privacy rule: implications for clinical research," (in eng), Annu Rev Med, vol. 57, pp. 575-
90, 2006.
[5] K. El Emam and F. K. Dankar, "Protecting privacy using k-anonymity," (in eng), J Am Med Inform
Assoc, vol. 15, no. 5, pp. 627-37, Sep-Oct 2008.
[6] A. Sayers, Y. Ben-Shlomo, A. W. Blom, and F. Steele, "Probabilistic record linkage," (in eng), Int
J Epidemiol, vol. 45, no. 3, pp. 954-64, 06 2016.
[7] G. P. Oliveira, A. L. Bierrenbach, K. R. Camargo, C. M. Coeli, and R. S. Pinheiro, "Accuracy of
probabilistic and deterministic record linkage: the case of tuberculosis," (in eng|por), Rev Saude
Publica, vol. 50, p. 49, Aug 2016.
[8] Y. Zhu, Y. Matsuyama, Y. Ohashi, and S. Setoguchi, "When to conduct probabilistic linkage vs.
deterministic linkage? A simulation study," (in eng), J Biomed Inform, vol. 56, pp. 80-6, Aug 2015.
[9] I. P. Fellegi and A. B. Sunter, "A theory for record linkage," Journal of the American Statistical
Association, vol. 64, no. 328, pp. 1183-1210, 1969.
[10] J. Nin and V. Torra, "Distance Based Re-identification for Time Series, Analysis of Distances.," in
Privacy in Statistical Databases. PSD 2006. Lecture Notes in Computer Science, vol. 4302, J.
Domingo-Ferrer and L. Franconi, Eds. Berlin, Heidelberg: Springer, 2006.
218 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-218

eHealth Service for Integrated Care and Outpatient Rehabilitation – Pilot
Application of the Tyrol Stroke Pathway

Kristina REITER a,d,1, Julia RUNGE b, Stefan WELTE a, Theresa GELEY c,
Clemens RISSBACHER b and Peter KASTNER a

a AIT Austrian Institute of Technology GmbH, Graz, Austria
b Landesinstitut für Integrierte Versorgung, Tirol Kliniken GmbH, Innsbruck, Austria
c Tyrolean Health Fund, Tyrolean Government, Innsbruck, Austria
d University of Applied Sciences FH Joanneum, Graz, Austria

Abstract. Background: Stroke is one of the three most common causes of death and
the main cause of permanent disabilities. The Tyrol Stroke Pathway covers all steps
from stroke onset to outpatient rehabilitation. Objectives: The main objective of this
paper is to describe how the paper-based documentation in the outpatient
rehabilitation can be implemented in an eHealth service for integrated care.
Methods: First, a state analysis followed by a requirements analysis was performed.
An interactive mock-up was designed for further discussion with the stakeholders.
After the implementation of the system, the evaluation was performed in two steps:
feedback from a virtual test phase and from a pilot operation was analyzed. Results:
First experiences during the virtual test phase with key stakeholders of the therapy
pathway showed a high level of acceptance. Users reported an improvement in the
communication and documentation processes. Conclusion: Initial results illustrate
how a shift from paper-based documentation to an integrated eHealth service can
improve communication and documentation in an independent therapy network.

Keywords. integrated care, stroke, eHealth, health service, health care delivery,
patient care management

1. Introduction

In Austria, as in most other countries, stroke is one of the three most common causes of
death and the main cause of permanent disabilities [1][2]. However, compared to other
countries, Austria has a low mortality rate among stroke patients [3]. Consequently,
the demand for post-stroke care is high. The efficiency and quality of interaction and
communication in rehabilitation teams comprising physiotherapy, occupational
therapy and logopedics could be improved through better patient-oriented
interprofessional communication [4][5].
Tyrol, a state in western Austria, implemented the Tyrol Stroke Pathway, which
covers the care of stroke patients. The care of patients follows a structured rescue and
1
Corresponding Author: Kristina Reiter, AIT Austrian Institute of Technology GmbH,
Reininghausstraße 13, University of Applied Sciences FH Joanneum, Eckerstraße 30i, Graz, Austria, E-Mail:
[email protected].
K. Reiter et al. / eHealth Service for Integrated Care and Outpatient Rehabilitation 219

treatment chain, which can be divided into the following phases: the prehospital phase,
the hospital phase, the inpatient rehabilitation and the outpatient rehabilitation.
Immediate treatment in specialized centers (stroke units) and integrated care after the
inpatient treatment are recommended by several guidelines [6][7].
“It was commenced, as a long-term routine-care program and aimed to include all
patients with stroke in the survey area. During the period of implementation of the
comprehensive stroke management program, thrombolysis administration increased and
clinical outcome significantly improved” [8]. The Tyrol Stroke Pathway is now
implemented as a routine standard in all counties, the outpatient rehabilitation in seven
out of nine counties of Tyrol, and it is generally accepted [9].
Tyrol is characterized by rural and urban areas surrounded by Central Europe’s main
mountain range, which leads to several challenges, especially regarding transportation
and the organization of outpatient rehabilitation.
The outpatient rehabilitation aims to provide rehabilitation in the proximity
of the patients’ homes. Therefore, regional, multidisciplinary stroke networks have been
established, covering the hospitals and the disciplines physiotherapy, occupational
therapy and logopedics, as well as general practitioners, neurologists, nursing homes,
coordinating organizations (“Sozialsprengel”) and discharge managers of the acute care
hospital. The outpatient rehabilitation of the Tyrol Stroke Pathway is implemented
within already existing health care structures. The discharging hospital contacts the local
coordinator, who is a member of the coordinating organization, in order to get therapy
organized early enough to ensure a good transfer for the patient. The treatment plan follows
scientific standards, which are defined in the Tyrol Stroke Pathway [8]. Outpatient
rehabilitation is based on the International Classification of Functioning, Disability and
Health (ICF). Therapy sessions take place at home so that patients learn to participate in
everyday life again and to handle activities of daily living.
The Stroke Pathway is well developed, but according to the 2017 evaluation of the
project [9], network members reported an administrative burden in organizing the
outpatient stroke rehabilitation and in submitting invoices, as well as a lack of
communication. The goal is to reduce administration time and to provide information
immediately and simultaneously to the patient’s regional network members.
The aim of this paper is to describe how an eHealth service for integrated care and
outpatient rehabilitation has to be designed and implemented to support the
characteristics of the interdisciplinary network for outpatient stroke rehabilitation. The
paper further intends to show how communication and information exchange can be
enhanced and the administrative burden in the therapy network for outpatient stroke
rehabilitation reduced, and to evaluate user feedback from the virtual test phase and the
pilot operation. Our hypotheses are that the new service reduces the problems which
came along with the paper-based documentation, reduces the time between discharge
and the beginning of the treatment, and achieves acceptance throughout the treatment network.

2. Methods

Based on the annual report of the Tyrol Stroke Pathway program [9] and the results of
the gap analysis it was decided to set up an electronic, virtual communication and
documentation platform and to fully replace the paper-based documentation [10].

The first step was to analyze the current procedures, followed by a requirements
analysis. The manual for network members [10] and the set of different paper-based
forms formed the basis for the analysis. Additional requirements were identified during
regular meetings with the pathway experts of the Governmental Institute of Integrated
Healthcare (“Landesinstitut für Integrierte Versorgung”) and members of the treatment
network.
After analyzing the most important requirements, an interactive mock-up was
designed with the software “Justinmind” (www.justinmind.com, San Francisco, USA)
and discussed with the stakeholders. After a few adjustments, the implementation was
realized by the AIT Austrian Institute of Technology on the basis of the existing eHealth
platform KIOLA.
Along with the electronic documentation tool, the aim was to provide information
to network members immediately and to save time in the administrative process. First,
the management of prescriptions was supported by implementing a standardized
workflow for prescriptions in the outpatient rehabilitation, followed by the approval of
the health insurance. The first follow-up prescription is approved by a general practitioner
and the second by the neurologist, after determining the mRS (modified
Rankin Scale) in a so-called 3-month assessment.
A key indicator for quality in the treatment pathway is the time between discharge
and the beginning of the outpatient rehabilitation, indicated by the first therapy session.
Due to financial regulations, it is important that the geographically nearest therapist
attends the patient. To guarantee quick organization of an interdisciplinary team, a tool
based on Google Maps was implemented. It allows the nearest therapists to be identified
and contacted: the Google Maps Geometry Library [11] was used for the calculation of
the distances and the Places Library [12] for address search with
geographic coordinates.
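The distance calculation behind such a nearest-therapist search can be approximated with the haversine (great-circle) formula. The following Python sketch only illustrates the idea; the actual service uses the Google Maps JavaScript libraries, and the data model and coordinates shown here are hypothetical:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_therapists(patient_pos, therapists, n=3):
    """Return the n therapists closest to the patient's home
    (hypothetical data model with lat/lon per therapist)."""
    return sorted(therapists,
                  key=lambda t: haversine_km(*patient_pos, t["lat"], t["lon"]))[:n]

# Hypothetical coordinates around Innsbruck.
therapists = [{"name": "A", "lat": 47.26, "lon": 11.39},
              {"name": "B", "lat": 47.27, "lon": 11.45}]
nearest = nearest_therapists((47.2692, 11.4041), therapists, n=1)
```

Note that great-circle distance only approximates travel distance; for road distances and travel times, a routing service would be needed.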
A central element of the treatment network in the outpatient care is the interdisciplinary
meeting between the therapists (physiotherapist, logopedist, occupational therapist).
The purpose of this meeting is that the treatment is planned and
coordinated within the interdisciplinary team of therapists. The challenge in the
interdisciplinary documentation process is to ensure that every therapist is able to create
treatment goals and to evaluate these goals after the treatment process. In
addition to the interdisciplinary meeting, an outcome check is performed with SINGER
(Scores of Independence for Neurologic and Geriatric Rehabilitation,
https://www.singer-assessment.de/). The results of the SINGER assessment are
uploaded and stored as PDF documents.
Evaluation of the novel IT service was planned in two steps. In a first step, a virtual test
phase with ten dummy data sets representing patient cases in outpatient rehabilitation
was performed. All relevant system roles were used by key stakeholders to test the
documentation and process quality. In a second step, a pilot operation of the system in real
use in the western part of Tyrol was set up for a period of three months. For the
evaluation of the pilot phase, the authors will use the Information System Success Model
survey, which is based on the DeLone & McLean Information System Success
Model [13]. It consists of questions to assess the six dimensions information quality,
system quality, service quality, intention to use, user satisfaction and net benefits. The
instrument also contains open questions on benefits and possibilities for improvement.
The questionnaire has been adapted for this evaluation but is not formally validated.
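As a minimal sketch of how such survey results could be summarized (the Likert responses below are hypothetical and are not data from the study), item scores can be averaged per DeLone & McLean dimension:

```python
# Hypothetical 5-point Likert responses, grouped by the six success dimensions
responses = {
    "information quality": [4, 5, 4],
    "system quality":      [3, 4, 4],
    "service quality":     [5, 4, 5],
    "intention to use":    [4, 4, 3],
    "user satisfaction":   [5, 5, 4],
    "net benefits":        [4, 3, 4],
}

def dimension_means(resp):
    """Mean item score per success dimension, rounded to two decimals."""
    return {dim: round(sum(items) / len(items), 2) for dim, items in resp.items()}

for dim, mean in dimension_means(responses).items():
    print(f"{dim}: {mean}")
```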
K. Reiter et al. / eHealth Service for Integrated Care and Outpatient Rehabilitation 221

3. Results

Although the treatment path in the federal state of Tyrol had been in use for several years, the
status and requirements analysis showed that the paper-based documentation of the
outpatient rehabilitation needed to be replaced. The key element of the paper-based system
is a patient folder which includes different administrative, medical and therapeutic
information. The folder was used by all health care professionals to document the results of
therapy sessions together with administrative information.
The patient folder is kept by the patient so that it is readily available to all therapists
during home visits and to the physicians during office visits. During the
requirements analysis it became evident that the current paper-based system tends to
cause errors (e.g. the patient folder might get lost) and is time-consuming, especially
in the organization of appointments and the evaluation of the processes. Potential for
useful and time-saving functionalities was identified:
• Communication and coordination between health care professionals
• Assignment of therapists in close vicinity to the patients
• Enabling an easier approval process by the responsible health insurance
companies
• Enabling a correct billing process for the therapists with the health insurance
companies
• Supporting a transparent execution and documentation of the SINGER Assessment
Additionally, regional discrepancies in the processes were identified: the organization
of the prescription is managed differently from region to region.
The outpatient rehabilitation workflow is shown in Figure 1; it has not changed with the
implementation of the electronic documentation.
Further details about the outpatient rehabilitation workflow are described in [10].

Figure 1. Visualization of the outpatient rehabilitation process.



The central elements in the electronic documentation tool are the patient list
combined with a list of tasks as well as a checklist for every individual patient. The
checklist is adapted for every role (discharge manager, local coordinator, therapist) in
the system.

On the left side of Figure 2 an extract of the checklist for the local coordinator is shown.
The comprehensive checklist is divided into eight sections:
• Registration
• First prescription
• Network Management
• Interdisciplinary Meeting
• Therapy goals
• Documents and Information
• First follow up prescription
• Second follow up prescription

Figure 2. Google Maps based organization tool for identifying and assigning the nearest therapist.
The checklist is linked with compulsory entry fields and shows green buttons
whenever a process step (e.g. entry of the discharge date) is fulfilled. Notifications via
e-mail including a direct link to the required task are triggered via the electronic
documentation tool in order to ensure a fast process handling.
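One way to model such a checklist is sketched below (Python; the section and field names are illustrative, not the actual data model of the tool): a step counts as "green" once all its compulsory entry fields are filled, and the remaining open steps are the candidates for an e-mail notification with a task link.

```python
def step_status(entries, required_fields):
    """'green' once all compulsory entry fields of a step are filled."""
    return "green" if all(entries.get(f) for f in required_fields) else "open"

# Hypothetical per-section state for one patient (field names illustrative)
checklist = {
    "Registration":       {"entries": {"discharge date": "2019-02-01"},
                           "required": ["discharge date"]},
    "First prescription": {"entries": {},
                           "required": ["prescription date"]},
}

def open_tasks(cl):
    """Sections still awaiting input -> trigger an e-mail with a task link."""
    return [name for name, step in cl.items()
            if step_status(step["entries"], step["required"]) == "open"]

print(open_tasks(checklist))
```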
During the discharge process the local coordinator is responsible for organizing the
therapeutic network close to the patient’s home. In Figure 2 the organization tool for the
therapeutic network is shown. Based on the patient’s main residence the nearest
therapists are shown in the map and can be contacted.

3.1. Virtual test phase

In December 2018 we started the virtual test phase. Based on the evaluation of the test
phase we identified potential for changes. The accessibility of documentation made by
therapists was changed to avoid the parallel usage of other communication channels
such as WhatsApp or e-mail. The second request concerned the return of the
approved prescription by the health insurance. A Fax2Mail solution is planned, since the
processes in the health insurance do not allow an integrated solution. A high
acceptance in the test phase was achieved, which also became apparent in the
high participation rate in the training workshop for the pilot phase, with over 60
participants.

3.2. Pilot operation

In January 2019 the training workshops started for all participating partners in the pilot
region of Landeck and Imst, which are located in the western part of Tyrol. All
participants signed the terms of use to take part in the pilot phase, which was planned for
a duration of 3 months with up to 30 patients. The first patient was registered on the 1st of
February at the hospital in Zams. As part of an accompanying evaluation, user surveys
are being carried out according to the Information System Success Model [13]. First
results of the evaluation will be presented at the dHealth conference at the
end of May 2019.

4. Discussion

To sum up, this paper has highlighted the implementation process of an electronic
communication and documentation platform for the outpatient rehabilitation of stroke
patients. Additionally, an evaluation through a virtual test phase and a pilot phase was
performed. The interdisciplinary character of the outpatient stroke
rehabilitation workflow and the independence of the network members in particular
were challenging factors in the realization. The implemented eHealth service for integrated care
and outpatient rehabilitation was the first step in the shift from paper-based documentation

to an electronic documentation and communication workflow. Several other issues
remain to be addressed. Future work will concentrate on an electronic approval process
with interfaces to health insurances, general practitioners and neurologists. Additionally,
an electronic accounting system for the billing of provided health care services and
medical treatments between health insurances and therapists was also requested by
members of the therapy network and should be considered in a future version. With the
increasing availability of the Austrian Electronic Health Record (ELGA), the ongoing
implementation work should also address interoperability with the ELGA
infrastructure, based on the given standards, to avoid parallel documentation whenever
possible. Furthermore, the integration of rehabilitation centers will help to close the gap
between the inpatient and outpatient rehabilitation processes.
This study has gone some way towards enhancing our understanding of an
interdisciplinary therapy management network. This is beneficial for the rollout of the
intended routine system, which is expected to be used by about 500 system users in
twelve different roles, with about 50 process steps and about 1,000 patient cases per year.
This also shows the complexity and the demands of an interdisciplinary communication
and documentation platform. Our experiences could conceivably be applied to similar
therapy management workflows or networks in the outpatient area.

References

[1] Öffentliches Gesundheitsportal Österreichs, Schlaganfall, https://www.gesundheit.gv.at/krankheiten/
gehirn-nerven/schlaganfall/inhalt, last access: 26.1.2019.
[2] World Health Organization, Global Health Estimates, https://www.who.int/healthinfo/
global_burden_disease/en/, last access: 26.1.2019.
[3] A.G. Thrift, T. Thayabaranathan, G. Howard, V. J. Howard, P.M. Rothwell, V. L. Feigin, D.A. Cadilhac,
Global stroke statistics, International Journal of Stroke 12(1) (2017), 13–32.
[4] S. Franz, J. Muser, U. Thielhorn, C.W. Wallesch, J. Behrens, Inter-professional communication and
interaction in the neurological rehabilitation team: a literature review, Disability and Rehabilitation
(2018) 20, 1-9.
[5] K. K. Miller, S. H. Lin, M. Neville, From Hospital to Home to Participation: A Position Paper on Transition
Planning Poststroke, Archives of Physical Medicine and Rehabilitation (2018)
[6] Deutsche Gesellschaft für Allgemeinmedizin und Familienmedizin, Schlaganfall DEGAM-Leitlinie Nr. 8,
omikron, 2012.
[7] T. Steiner, S. Juvela, A. Unterberg, C. Jung, M. Forsting, G. Rinkel: European Stroke Organization
Guidelines for the Management of Intracranial Aneurysms and Subarachnoid Haemorrhage. Cerebrovasc
Dis, 35 (2013), 93-112. doi: 10.1159/000346087
[8] J. Willeit, T. Geley, J. Schöch, H. Rinner, A. Tür, H. Kreuzer, N. Thiemann, M. Knoflach, T. Toell, R.
Pechlander, K. Willeit, N. Klingler, S. Praxmarer, M. Baubin, G. Beck, C. Dengg, K. Engelhardt, T.
Erlacher, T. Fluckinger, W. Grander, J. Grossmann, H. Kathrein, N. Kaiser, B. Matosevic, H. Matzak,
M. Mayr, R. Perfler, W. Poewe, A. Rauter, G. Schoenherr, H.R. Schoenherr, A. Schinnler, H. Spiss, T.
Thurner, G. Vergeiner, P. Werner, P. Willeit, S. Kiechl, Thrombolysis and clinical outcome in patients
with stroke after implementation of the Tyrol Stroke Pathway: a retrospective observational study, The
Lancet Neurology 14(1)(2015), 48-56, doi: 10.1016/S1474-4422(14)70286-8. Epub 2014 Nov 28.
[9] Tiroler Gesundheitsfonds Gesundheitsplattform, Integrierter Patientenpfad. Behandlungspfad
Schlaganfall Tirol. Berichtsjahr 2017. https://www.tirol.gv.at/fileadmin/themen/gesundheit-vorsorge/
krankenanstalten/downloads/TGF/schlaganfall/downloads/integr_patientenpfad_behandlungspfad_schl
aganfall_tirol_berichtsjahr2017.pdf, last access: 12.2.2019.
[10] Tiroler Gesundheitsfonds. Reformpoolprojekt. Integrierter Patientenpfad. Behandlungspfad Schlaganfall.
Handbuch für Netzwerkpartner 2017, https://schlaganfallpfad.tirol-kliniken.at/2017/TP_4_Pdf_2017/
01_HANDBUCH_Netzwerkpartner_Ambulante%20RHEHA_2017.pdf, last access: 26.1.2019.

[11] Google Developers Site, https://developers.google.com/maps/documentation/javascript/geometry, last
access: 29.1.2019.
[12] Google Developers Site, https://developers.google.com/maps/documentation/javascript/places, last
access: 29.1.2019.
[13] W.H. Delone, E.R. McLean, The DeLone and McLean model of information systems success: a ten-year
update. Journal of Management Information Systems 19(4) (2003), 9–30.
226 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-226

Expressing Patient Selection Criteria Based on HL7 V3 Templates Within the
Open-Source Tool ART-DECOR
Simon OTT a,1, Christoph RINNER a and Georg DUFTSCHMID a
a Section for Medical Information Management, Center for Medical Statistics,
Informatics and Intelligent Systems, Medical University of Vienna, Austria

Abstract. Background: Reuse of EHR data for selecting patients who are eligible
for clinical research can substantially improve the recruitment process. ART-
DECOR is an open-source tool that is commonly used to design and publish HL7
V3 templates of national (e.g. ELGA) and international EHR initiatives. Objectives:
Extend ART-DECOR to allow the definition of criteria that may be used for patient
selection. Methods: Using the native ART-DECOR development framework we
extended existing ART-DECOR template associations by allowing conditions to be
formulated. Results: An editor for the specification of conditions was implemented.
The resulting criteria are internally translated to XPath expressions and can be
immediately applied to CDA documents. As a prototypical application of our
approach we implemented a “Trial Criteria Evaluator” tool that allows trial
eligibility criteria to be composed of our ART-DECOR criteria and have them
checked against a patient’s CDA documents. Conclusion: Referring to HL7
templates, our criteria can be applied to documents of national EHR systems such
as ELGA and hereby reach a broad patient cohort. Implementing our approach
within ART-DECOR facilitates its reuse and enhancement by other researchers.

Keywords. electronic health records, patient selection, reference standards

1. Introduction

According to a recent WHO study, almost every second member state of the EU already
has an operative national electronic health record (EHR) system in place [1]. In Austria,
the ELGA system [2] was started in 2015 and aims to finalize the rollout phase in the
outpatient sector this year.
Documents stored in EHR systems are frequently formatted according to the HL7
Clinical Document Architecture (CDA) standard [3]. Since the CDA model is very
generic, HL7 V3 templates [4] are used to specify the structure and content of particular
document types. ART-DECOR (https://art-decor.org) is an open-source tool and
methodology that is commonly used to design and publish HL7 V3 templates of EHR
systems. Templates are available for the ELGA CDA document types (http://elga.art-
decor.org/) and of various other international EHR initiatives (https://art-
decor.org/decor/services/Statistics?list=bbrs).

1 Corresponding Author: Simon Ott, Section for Medical Information Management, Center for Medical
Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Spitalgasse 23, A-1090 Vienna,
Austria, E-Mail: [email protected]
S. Ott et al. / Expressing Patient Selection Criteria Based on HL7 V3 Templates 227

Routine data recorded in EHRs have become an interesting source for clinical
research [5]. A typical step in a clinical research project is the identification of a patient
cohort with particular characteristics. This may for example be patients with a particular
diagnosis and age, who would be eligible for a particular clinical trial or patients, who
received a particular treatment and should be interviewed in the course of an outcome
study. An automatic identification of patients based on their EHR data [6] could
substantially reduce the high effort and rate of missed patients that typically characterize
manual patient recruitment [7].
ART-DECOR allows high-level information needs, such as ‘age’
or ‘diagnosis’, to be defined and mapped to a component of a CDA template that holds the
corresponding data. These so-called ART-DECOR “concepts” and “template
associations” can be stored in individual ART-DECOR project files. Currently, they are
not sufficient for the definition of fine-grained criteria to select patients, as they do not
support the specification of particular conditions that the mapped template components
would have to satisfy, such as “age ≥ 6” or “diagnosis = type 1 diabetes”.
In this paper we therefore present an extension of ART-DECOR that allows the
definition of criteria that may be used to identify patients with particular characteristics
relevant for a clinical research project. They are based on ART-DECOR template
associations and thus include a reference to those components of a CDA document that
hold the source data of the criteria. The criteria are stored in the ART-DECOR project
file and can then be applied to a patient’s CDA documents to check whether they satisfy
the criteria.

2. Methods

We will explain the requirements for our extension of ART-DECOR by means of an
example. Assume that patients need to be checked for an elevated white blood cell count
as a prerequisite to participate in a clinical trial. In the current version of ART-DECOR,
we can define a concept “white blood cell count” (see fig. 1 top) of type Quantity and
associate it with code “26464-8” of code system LOINC (OID = 2.16.840.1.113883.6.1).

Figure 1. Definition of concept “white blood cell count” with associated LOINC code in ART-DECOR (top).
Concept “white blood cell count” is mapped to element “hl7:observation” of template “Laboratory Observation”
that is used in ELGA CDA laboratory reports for storing laboratory data (bottom).

We can further map concept “white blood cell count” to a suitable HL7 V3 template
that refers to a particular component of a CDA document, which would hold the white
blood cell count value of a patient (see fig. 1 bottom).
For the definition of our desired criterion “elevated white blood cell count” the
following points are missing:
• Specification of conditions (e.g., we consider the white blood cell count to be
elevated if the observation’s “hl7:interpretationCode” element holds code “H”
or “HH”)
• Detailing of template associations if a concept is mapped to a generic template
(e.g., in fig. 2 the generic template “Laboratory Observation” will only hold a
white blood cell count value if the observation’s “hl7:code” element holds the
LOINC code “26464-8” for leucocytes)
• Executing the criterion against a CDA document to check whether the
contained data satisfy the criterion

For implementing the first two points we decided to develop an editor that allows
the specification of conditions for an ART-DECOR template association. For the third
point, we added a testing component to ART-DECOR that allows the specified criteria
to be immediately executed against an uploaded CDA document. All extensions should
be implemented within the native ART-DECOR development frameworks (i.e. Orbeon
Forms and eXist-db) to easily integrate them into an existing ART-DECOR environment
using the ART-DECOR package manager.
In order to demonstrate a potential prototypical application of our extension, we
planned to develop a “Trial Criteria Evaluator” tool. It should allow to evaluate whether
a patient may be a potential candidate for a clinical trial based on his/her CDA documents.

3. Results

3.1. Adding conditions to ART-DECOR template associations

Originating from an existing template association, our extension allows the required
“implicit” and “explicit” conditions to be added to define a criterion (see fig. 2).

Figure 2. Conditions can be added to existing template associations.

For the definition of implicit conditions, existing ART-DECOR metadata
specifications (i.e. fixed value constraints, terminology associations) are offered to the
user (see fig. 3). As an example, the fixed value constraint for the template’s element

“hl7:templateId” can be used to create the implicit condition that concept “elevated white
blood cell count” should only refer to CDA observations that hold the OID
“1.3.6.1.4.1.19376.1.3.1.6” in their “hl7:templateId” element. Further, the terminology
association defined for concept “elevated white blood cell count” can be used to create
the implicit condition that this concept should only refer to observations that hold code
“26464-8” in their “hl7:code” element.

Figure 3. Specification of implicit conditions.

Explicit conditions are defined by the user by manually specifying one or more
statements and linking them by Boolean operators (see fig. 4). Each statement is
composed of an attribute of a template element, an operator and a comparison value. All
attributes as predetermined by the template element’s datatype and its child elements are
offered for selection. As an example, element “hl7:code” is of datatype CE (Coded with
Equivalents) and thus allows attributes “code”, “codeSystem”, “codeSystemName”,
“codeSystemVersion”, “displayName”, and “nullFlavor” to be selected. Further, the
operators "=", "≠", "<", "≤", ">", "≥" and "IS NULL" (i.e., the value of the attribute is empty)
are available to formulate a statement.
All data concerning the defined conditions are stored as additional elements of the
corresponding <templateAssociation> component within the ART-DECOR project file.
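The statement structure can be pictured as follows. The sketch below (a simplified illustration, not the actual ART-DECOR implementation, which is written in Orbeon Forms/eXist-db) renders each statement — attribute of a template element, operator, comparison value — as an XPath sub-expression and joins statements with a Boolean operator:

```python
# Mapping of the offered operators to their XPath 1.0 counterparts
OPS = {"=": "=", "≠": "!=", "<": "<", "≤": "<=", ">": ">", "≥": ">="}

def statement(element, attribute, op, value=None):
    """One condition statement rendered as an XPath sub-expression."""
    target = f"{element}/@{attribute}"
    if op == "IS NULL":
        return f"not({target}) or {target}=''"
    literal = value if isinstance(value, (int, float)) else f"'{value}'"
    return f"{target}{OPS[op]}{literal}"

def combine(statements, boolean_op="and"):
    """Link statements with a Boolean operator into one predicate."""
    return "(" + f" {boolean_op} ".join(statements) + ")"

# Explicit condition of the running example: interpretation code "H" or "HH"
pred = combine([statement("hl7:interpretationCode", "code", "=", "H"),
                statement("hl7:interpretationCode", "code", "=", "HH")], "or")
print(pred)
```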

Figure 4. Specification of explicit conditions.



3.2. Checking criteria against CDA documents

In order to allow the specified criteria to be checked independently of our tool as well, we
automatically translate them to XPath expressions. Hereby, we logically link all implicit
and explicit conditions of a criterion with a Boolean AND operator and generate the
correct XML Schema Datatypes [8] (e.g., numerical comparison values are converted to
xs:double, values of HL7 datatype TS or any of its flavors are converted to xs:dateTime). The
expression derived from a criterion’s conditions is used as the predicate of the XPath.
The node-test of the XPath is the root element of the template or, in the case of multiple
root elements, a wildcard for their common parent. The generated XPath
(see fig. 7) uses the XML namespace “urn:hl7-org:v3” defined by HL7 for V3 [9].
For an immediate checking of the criteria we also implemented a testing component
within our tool. It allows the user to upload a CDA document and execute the generated
XPaths against the document. All components of the CDA document that are found by
means of the XPath and thus satisfy the criteria are displayed.
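The execution step can be reproduced outside the tool with any XPath 1.0 engine. The sketch below (Python with the third-party lxml package assumed available; the CDA fragment is a hypothetical, heavily abbreviated laboratory observation, not an actual ELGA document) checks the running example — Leucocytes (LOINC 26464-8) with interpretation code "H" or "HH" — against a document using the HL7 V3 namespace:

```python
from lxml import etree  # assumption: lxml is installed

# Hypothetical, heavily abbreviated CDA laboratory observation
CDA = b"""<observation xmlns="urn:hl7-org:v3">
  <templateId root="1.3.6.1.4.1.19376.1.3.1.6"/>
  <code code="26464-8" codeSystem="2.16.840.1.113883.6.1"/>
  <value value="15.3" unit="10*9/L"/>
  <interpretationCode code="H"/>
</observation>"""

NS = {"hl7": "urn:hl7-org:v3"}  # HL7-defined namespace for V3
XPATH = ("//hl7:observation[hl7:code/@code='26464-8' and "
         "(hl7:interpretationCode/@code='H' or "
         "hl7:interpretationCode/@code='HH')]")

doc = etree.fromstring(CDA)
hits = doc.xpath(XPATH, namespaces=NS)
print(len(hits))  # number of components satisfying the criterion
```

Note that the implicit conditions (templateId, LOINC code) and the explicit condition (interpretation code) are joined with AND inside one predicate, as the paper describes.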

Figure 5. Trial Criteria Evaluator. Criteria defined in ART-DECOR project “Test” were used to compose four
trial criteria of a diabetes trial (https://clinicaltrials.gov/ct2/show/NCT01390480). Three CDA documents were
uploaded to be checked against the trial criteria.

3.3. Trial Criteria Evaluator

The “Trial Criteria Evaluator” (see fig. 5) demonstrates in a prototypical way, how our
extension could be applied to check a patient’s eligibility for a clinical trial.
Being an Orbeon Forms application, it can be added to an existing ART-DECOR
environment by installing the corresponding package in the eXist-db. As it is completely
independent of ART-DECOR itself, it may also be installed as a standalone version
without ART-DECOR. It allows the user to load the criteria that he/she defined within
ART-DECOR and combine them with Boolean operators to form complex trial inclusion
and exclusion criteria.

Figure 6. Result screen. At the top, the total result and an overview of satisfied inclusion/exclusion criteria is
shown. Below, a detailed report is displayed that shows for each criterion and document, whether the criterion
is satisfied/not satisfied by the document’s data or whether the document does not contain the required data.
Inclusion criteria are depicted in green background color, exclusion criteria in red.

Figure 7. XPath for criterion “Hypercalcemia (>2.65 mmol/L)” of fig. 6.



The user can then upload a set of documents of a particular patient and evaluate
whether the patient could be a potential candidate for the trial. Hereby, all uploaded
documents are checked iteratively as to whether they satisfy one of the trial criteria. Our
evaluation is conservative insofar as we assume that the uploaded documents represent
only a subset of the patient’s complete medical history [10]. We thus have to expect that
there may be additional data that we are not aware of but that may nevertheless satisfy a trial
criterion. Consequently, the only safe assessment our trial criteria evaluator is
capable of is to exclude a patient if the uploaded documents satisfy one or more
exclusion criteria. Otherwise, it concludes that the patient may be eligible and displays a
report of which criteria are satisfied by the uploaded documents and for which
criteria the uploaded documents do not contain the required data (see fig. 6).
For each criterion the corresponding generated XPath can be displayed (see fig. 7).
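The conservative decision logic can be summarized in a few lines (an illustrative sketch with hypothetical names and data; in the tool each criterion is a generated XPath evaluated per document, here abstracted as a predicate):

```python
def evaluate(documents, inclusion, exclusion):
    """Exclude only on positive evidence; otherwise the patient *may* be
    eligible, since the documents may cover only part of the history."""
    if any(crit(doc) for crit in exclusion for doc in documents):
        return "excluded"
    confirmed = sum(1 for crit in inclusion
                    if any(crit(doc) for doc in documents))
    return f"may be eligible ({confirmed}/{len(inclusion)} inclusion criteria confirmed)"

# Hypothetical criteria as predicates over simplified document data
docs = [{"age": 54}, {"age": 54, "calcium_mmol_l": 2.8}]
inclusion = [lambda d: d.get("age", 0) >= 18]
exclusion = [lambda d: d.get("calcium_mmol_l", 0) > 2.65]  # hypercalcemia

print(evaluate(docs, inclusion, exclusion))
```

Missing data thus never excludes a patient; it merely leaves the corresponding inclusion criterion unconfirmed in the report.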

4. Discussion

The work presented in this paper is the result of an ongoing bachelor thesis of the first
author and is thus of preliminary nature. The final results will be presented at the
conference. We further plan to make our extensions of ART-DECOR publicly available
as open-source code by then.
Various suggestions have been made to automate the identification of patient cohorts
based on EHR data [11]. These approaches typically refer to institutional EHR systems
with proprietary data models. This limits the number of patients that can be addressed
and requires individual mappings of the selection criteria to the data model of each single
EHR system. Fernandez-Breis et al. suggest mapping the selection criteria to standardized
EHR data defined by means of openEHR archetypes [12]. Our approach is similar but is
based on the more prevalent HL7 CDA standard and associated HL7 v3 templates. We
further implemented our method within the open-source tool ART-DECOR that is used
within several national EHR system initiatives. The EHR4CR platform [13] uses a
distributed architecture, where trial criteria can be defined at a central server and
transmitted to clinical data warehouses (CDW) of participating hospitals. It requires
individual mappings of the criteria expressed in the central ECLECTIC syntax [14] to
the data models of each single CDW. A similar approach is pursued by SHRINE [15],
which allows queries to be distributed to CDWs that are based on the i2b2 [16] model.
As a prerequisite the participating CDWs have to support common i2b2 ontologies.
Compared to earlier projects, the first main contribution of our work is that our
criteria directly refer to elements of HL7 CDA documents and hereby make use of the
knowledge of the CDA structure as specified within HL7 v3 templates. They may thus
be applied to the document types of national EHR systems such as ELGA and hereby
reach a broad patient cohort. Our second main contribution is that we implemented our
approach as an extension of the open-source tool ART-DECOR, which is widely used in
the course of (inter)national EHR initiatives. This facilitates reuse and further
enhancement of our work by other researchers.
Several alternatives exist for the expression and execution of criteria, such as Arden
Syntax or SNOMED CT. Applying ontology-based semantic knowledge [17] in the
processing of criteria would also be a useful extension of our work, e.g. when searching
for patients with different types of lung diseases. A comparison with these alternative
methods is beyond the scope of the present paper but is planned for a more elaborate
version of this work.

Our tool’s functionality to generate XPath expressions for ART-DECOR template
associations could also be useful in other domains where data has to be retrieved from
CDAs. If, for example, a particular lab value of a patient’s CDA documents has to be
processed in an Arden Syntax MLM, the required XPath to be applied within the MLM’s
curly braces clause could be easily generated by our tool.
When implementing an automated identification of patients who might be eligible
for a research project, the existing legal situation with respect to data protection would
obviously have to be strictly considered.

References

[1] World Health Organization, Global eHealth survey 2015, http://portal.euro.who.int/en/data-
sources/ehealth-survey-2015/, last accessed January 10, 2019.
[2] S. Herbek, H.A. Eisl, M. Hurch, A. Schator, S. Sabutsch, G. Rauchegger, A. Kollmann, T. Philippi,
P. Dragon, E. Seitz, and S. Repas, The Electronic Health Record in Austria: a strong network between health
care and patients, European Surgery (2012), 155-163.
[3] R.H. Dolin, L. Alschuler, S. Boyer, C. Beebe, F.M. Beilen, P.V. Biron, and A. Shabo, HL7 Clinical
Document Architecture, Release 2, J Am Med Inform Assoc 13 (2006), 30-39.
[4] K.U. Heitmann, J. Curry, and L. Nelson, HL7 Templates Standard: Specification and Use of
Reusable Information Constraint Templates, Release 1, 2014.
[5] H.U. Prokosch and T. Ganslandt, Perspectives for medical informatics. Reusing the electronic
medical record for clinical research, Methods Inf Med 48 (2009), 38-44.
[6] M. Cuggia, P. Besana, and D. Glasspool, Comparing semi-automatic systems for recruitment of
patients to clinical trials, Int J Med Inform 80 (2011), 371-388.
[7] E. Fink, P.K. Kokku, S. Nikiforou, L.O. Hall, D.B. Goldgof, and J.P. Krischer, Selection of patients
for clinical trials: an interactive web-based system, Artif Intell Med 31 (2004), 241-254.
[8] D. Peterson, S. Gao, A. Malhotra, C.M. Sperberg-McQueen, and H. Thompson, W3C XML Schema
Definition Language (XSD) 1.1 Part 2: Datatypes, https://www.w3.org/TR/2012/REC-xmlschema11-2-
20120405/datatypes.html, last accessed January 3, 2019.
[9] C. McCay and M. Stephens, XML Implementation Technology Specification, Release 2,
http://hl7.ihelse.net/hl7v3/infrastructure/its_r2/its_r2Spec.html, last accessed January 5, 2019.
[10] O. Dameron, P. Besana, O. Zekri, A. Bourde, A. Burgun, and M. Cuggia, OWL model of clinical
trial eligibility criteria compatible with partially-known information, J Biomed Semantics 4 (2013), 17.
[11] F. Kopcke and H.U. Prokosch, Employing computers for the recruitment into clinical trials: a
comprehensive systematic review, J Med Internet Res 16 (2014), e161.
[12] J.T. Fernandez-Breis, J.A. Maldonado, M. Marcos, C. Legaz-Garcia Mdel, D. Moner, J. Torres-
Sospedra, A. Esteban-Gil, B. Martinez-Salvador, and M. Robles, Leveraging electronic healthcare record
standards and semantic web technologies for the identification of patient cohorts, J Am Med Inform Assoc 20
(2013), e288-296.
[13] Y. Girardeau, J. Doods, E. Zapletal, G. Chatellier, C. Daniel, A. Burgun, M. Dugas, and B. Rance,
Leveraging the EHR4CR platform to support patient inclusion in academic studies: challenges and lessons
learned, BMC Med Res Methodol 17 (2017), 36.
[14] R. Bache, S. Miles, and A. Taweel, An adaptable architecture for patient cohort identification from
diverse data sources, J Am Med Inform Assoc 20 (2013), e327-333.
[15] G.M. Weber, S.N. Murphy, A.J. McMurry, D. Macfadden, D.J. Nigrin, S. Churchill, and I.S.
Kohane, The Shared Health Research Information Network (SHRINE): a prototype federated query tool for
clinical data repositories, J Am Med Inform Assoc 16 (2009), 624-630.
[16] S.N. Murphy, G. Weber, M. Mendis, V. Gainer, H.C. Chueh, S. Churchill, and I. Kohane, Serving
the enterprise and beyond with informatics for integrating biology and the bedside (i2b2), J Am Med Inform
Assoc 17 (2010), 124-130.
[17] S. Liu, Y. Ni, J. Mei, H. Li, G. Xie, G. Hu, H. Liu, X. Hou, and Y. Pan, iSMART: Ontology-based
Semantic query of CDA documents, AMIA Annu Symp Proc 2009 (2009), 375-379.
234 dHealth 2019 – From eHealth to dHealth
D. Hayn et al. (Eds.)
© 2019 The authors, AIT Austrian Institute of Technology and IOS Press.
This article is published online with Open Access by IOS Press and distributed under the terms
of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
doi:10.3233/978-1-61499-971-3-234

Predictive Modelling and Its Visualization for Telehealth Data – Concept and Implementation of an Interactive Viewer

Michael SAMS a,b,1, Alphons EGGERTH a,b, Dieter HAYN a, Sai VEERANKI a,b and Günter SCHREIER a,b

a AIT Austrian Institute of Technology, Graz, Austria
b Institute of Neural Engineering, Graz University of Technology, Graz, Austria
Abstract. Background: Predictive modelling is becoming increasingly important in the healthcare sector. A comprehensive understanding of the obtained models and their predictions is indispensable for the development and later acceptance of such systems. Objectives: A general concept of a toolset that supports data scientists in the development of predictive models in the telehealth context had to be developed and subsequently implemented. Methods: The user requirements were determined based on surveys. The concept development was based on the data model of the 'HerzMobil Tirol' telehealth program. The implementation was conducted in MATLAB. Results: A list of requirements was identified, based on which a viewer was implemented. Conclusion: The developed viewer concept and its implementation facilitate deeper insight and a better understanding of the development process of predictive models in the telehealth context.

Keywords. telemedicine, predictive analytics, visual analytics, human-in-the-loop

1. Introduction

1.1. Background

New technologies and increasing digitalization are leading to huge amounts of data being available in the health and care sector. Affordable and portable measuring devices also enable continuous monitoring of patients at home in telehealth settings. In order to really benefit from all this data, computer-aided processing methods are increasingly applied; in particular, machine learning approaches are currently being considered in many healthcare areas [1][2][3].
While technological progress offers promising possibilities, fully automatic analysis and modelling approaches are prone to produce incomprehensible solutions. To date, human involvement is still an essential part of the modelling process [4][5]. On the one hand, specialists need to contribute their knowledge and skills during the development process of predictive models. On the other hand, the presentation of modelling results needs to be comprehensible and understandable for humans in order to achieve legitimacy and acceptance of such systems. Therefore, interactive visual analytics tools are needed to fuse human intelligence with computational processing power to achieve optimum results [6][7][8].

1 Corresponding Author: Michael Sams, Institute of Neural Engineering, Graz University of Technology, AIT Austrian Institute of Technology, Reininghausstraße 13/1, 8020 Graz, Austria, E-Mail: [email protected]

1.2. Model development and implementation

At the AIT Austrian Institute of Technology, the ‘Predictive Healthcare Information


Systems’ team is carrying out research on decision support systems and predictive
modelling in the healthcare domain. Among other activities, we are dealing with the
question of how a data-driven decision support system for health and care (DS4H) should
optimally be designed and implemented. In [9], D. Hayn et al. described healthcare-specific requirements for such a DS4H.
We postulate that the development of a DS4H should follow a two-level process with two continuously repeating cycles, as depicted in Figure 1. Cycle 1 represents the actual model development process, including data cleaning & pre-processing, feature engineering, model training, evaluation, and visualization, interpretation & validation. Cycle 2 is intended to support the surrounding processes present in a real-world scenario, i.e. objective definition, data collection & de-identification as well as deployment.

Figure 1. Two-level process of data-driven decision support in health and care [9].
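The inner cycle can be pictured as a plain sequence of processing steps applied to retrospective data. The following is an illustrative Python sketch only; the actual toolset described below is implemented in MATLAB, and all function names, the toy weight data and the 2.0 kg threshold are hypothetical:

```python
def preprocess(weights):
    # data cleaning & pre-processing: drop missing measurements
    return [w for w in weights if w is not None]

def engineer_features(weights):
    # feature engineering: weekly mean weight and weight gain over the week
    return {
        "mean_weight": sum(weights) / len(weights),
        "weekly_gain": weights[-1] - weights[0],
    }

def train_and_predict(features, threshold=2.0):
    # stand-in 'model': flag a possible admission if the weekly weight
    # gain exceeds a (hypothetical) threshold in kg
    return features["weekly_gain"] > threshold

def inner_cycle(weights):
    clean = preprocess(weights)               # data cleaning & pre-processing
    features = engineer_features(clean)       # feature engineering
    prediction = train_and_predict(features)  # model training / prediction
    # visualization, interpretation & validation of 'features' and
    # 'prediction' would follow here, with the data scientist deciding
    # how to adjust the next pass through the cycle
    return features, prediction

features, at_risk = inner_cycle([81.0, None, 81.5, 82.3, 83.6])
print(at_risk)  # weekly gain of about 2.6 kg exceeds 2.0 -> True
```

In practice each pass through the cycle is driven by the insights gained in the visualization and validation step, which is exactly where the viewer described in this paper is intended to support the data scientist.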

To implement the process depicted in Figure 1, a software package called 'Predictive Analytics Toolset for Healthcare' (PATH) is currently being developed. PATH is a MATLAB-based (The MathWorks, Natick, US) predictive modelling toolset. The fundamental idea is to move from retrospective data to prospective predictions.
In its current state, PATH is primarily utilised for the inner cycle of the process
illustrated in Figure 1. The current development focus lies on tools that support
visualization, interpretation and validation of predictive models. As an example of a
recent adaptation, PATH's "ECG viewer" was developed in the course of participating in the Computing in Cardiology Challenge (CinC) 2017 [10]. This viewer was utilised
for the development of algorithms, which are now capable of automatically detecting
cardiac anomalies with high accuracy. The viewer supports visualization and analysis on
the various involved levels, including raw data (e.g. ECG), pre-processed data (e.g.
averaged heartbeat), extracted features (e.g. QT interval), built models (e.g. classify as
normal / pathological), evaluation outcome (e.g. a false positive case), and the
assessment of the relevance of a given feature [11].
However, although the ECG viewer extension is highly suitable for ECG visualization and classification, PATH lacks tools that support the predictive modelling process in other domains, such as the telehealth domain.

1.3. Objectives

The aim of the present work was to develop and implement a tool that supports data
scientists in developing predictive models in the telehealth domain. This tool had to be
developed on the basis of the predictive toolset PATH, which was introduced in section
1.2. The tool had to be integrated into the process of model development, which
corresponds to the inner cycle of Figure 1. It had to be designed as an interactive
visualization tool, allowing users to gain better insights during the model development
process, which provides the basis for improvements and model optimization.
To this end, it was necessary to investigate which design and functionality
requirements such a tool should have in order to support the workflow of data scientists
and to cope with the special characteristics of telehealth data. Based on these
requirements, a viewer was designed and subsequently implemented in MATLAB.

2. Methods

2.1. Dataset

The development of the viewer was based on a test dataset consistent with the data model of the heart failure disease management program 'HerzMobil Tirol' [12]. In addition to
demographic data of the patients and measurements taken by the patients at home (i.e.
body weight, heart rate and blood pressure), the test dataset also included clinical notes
of healthcare professionals, information about medication compliance and prescriptions
as well as information on hospital admissions.
To suit the development of various functionalities, the test dataset was extended by artificially adding ECG recordings to obtain a further data level. Thus, for selected patients, a time series of ECG measurements was added to the existing data.
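To illustrate the resulting multi-level structure, one patient record in such a test dataset might be sketched as follows. This is not the actual 'HerzMobil Tirol' schema; all field names and values are hypothetical:

```python
# Hypothetical sketch of one patient's multi-level record; the real
# 'HerzMobil Tirol' data model differs in structure and naming.
patient = {
    "demographics": {"age": 67, "sex": "m"},
    "measurements": [
        {"date": "2019-01-07", "weight_kg": 81.5, "heart_rate": 72,
         "blood_pressure": (135, 85),
         # artificially added underlying data level for drill-down:
         "ecg_samples": [0.0, 0.4, 1.1, 0.3, -0.2]},
    ],
    "clinical_notes": ["patient reports mild dyspnoea"],
    "medication_compliance": {"2019-01-07": True},
    "admissions": ["2019-01-20"],
}

def drill_down(record, index):
    # the 'drill-down' idea: from a derived value (e.g. heart rate)
    # down to the underlying raw signal, if one exists
    return record["measurements"][index].get("ecg_samples")

print(len(drill_down(patient, 0)))  # 5 raw ECG samples
```

The artificially added `ecg_samples` level is what later allows the drill-down functionality of the viewer to be exercised during development.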

2.2. Requirements Analysis

In order to get a comprehensive basis for the requirements analysis with respect to the
special characteristics of telehealth data, first the general characteristics of health data
were examined in detail. Subsequently, this basis was extended and refined with
experiences from the concrete telehealth use case from the 'HerzMobil Tirol' program.
The procedure of gathering the viewer requirements regarding the development of
predictive models was similar. In addition to theoretical considerations, the existing
workflow of the AIT data scientists was analysed in detail and their previous experiences
regarding the development of predictive models were considered. There was also an in-
depth exchange about expectations towards such a tool and its capabilities.
On the basis of these surveys, the essential elements and functionalities of such an
interactive visualization tool were identified.

2.3. Implementation

The implementation of the viewer was carried out in MATLAB R2018a (The MathWorks, Natick, US). For the development of the graphical user interface, MATLAB's built-in tool GUIDE was utilised [13].
As a basis for the implementation, the ‘ECG viewer’ framework of the CinC 2017
challenge, which was described in section 1.2, was used. In the course of this work, this
framework was generalised using the 'HerzMobil Tirol' telehealth data model. The verification of this generalised tool was then carried out with a de-identified dataset from
the research project EPICURE, containing records of ergometric performance tests
conducted by a cohort of cardiac rehabilitation patients.

3. Results

3.1. Requirements Analysis

An important aspect of telehealth data is the variety of possible data types. The data can
thereby include a variety of measurements, various medical events (e.g. hospitalization,
procedures, medications) as well as unstructured clinical notes. Commonly, the focus of
interest lies on the progression of a patient's health over time. Thus, another essential
aspect is the temporal characteristic of the data. Especially temporal patterns are thereby
of great interest. Based on the gained knowledge concerning the characteristics of
telehealth data and the workflow of data scientists, the following essential elements and
functionalities of a visualization tool that supports the model development process were
identified (Figure 2).
Due to the temporal characteristic of telehealth data, an appropriate time series
visualization poses a core element of the requirements. Time series data typically have
to be pre-processed or transformed before they can be used in the modelling process.
Therefore, it was necessary to support an interaction of the viewer and the signal
processing functionality of the software. Selection of different available signal
processing algorithms should be supported along with a possibility to launch them
directly out of the user interface combined with automatically reloading all affected
viewer elements to keep the visualizations up-to-date. This enables a convenient
environment for testing different algorithms and for analysing different outcomes.
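The interaction between algorithm selection and automatic view reloading can be sketched as a simple callback pattern. The real viewer is a MATLAB GUI; the class and function names below are hypothetical, and the moving average stands in for an arbitrary signal processing algorithm:

```python
# Minimal sketch of 'run a selected algorithm, then reload affected views'.
class Viewer:
    def __init__(self, raw_series):
        self.raw = raw_series
        self.processed = list(raw_series)
        self.reload_count = 0  # how often the views were refreshed

    def run_algorithm(self, algorithm):
        # launch a signal processing algorithm directly from the UI ...
        self.processed = algorithm(self.raw)
        # ... and automatically refresh every affected viewer element
        self._reload()

    def _reload(self):
        self.reload_count += 1

def moving_average(series, window=3):
    # example algorithm: trailing moving average over the time series
    return [sum(series[max(0, i - window + 1): i + 1]) /
            len(series[max(0, i - window + 1): i + 1])
            for i in range(len(series))]

v = Viewer([80.0, 82.0, 87.0])
v.run_algorithm(moving_average)
print(v.processed)  # [80.0, 81.0, 83.0]
```

Keeping the reload step inside `run_algorithm` is what makes it convenient to try several algorithms in a row and immediately compare their outcomes in the visualization.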
Another key aspect of the modelling process is the set of so-called "features". These are specific attributes or properties of the data and are essential to the predictive model (e.g. classification and regression trees). To keep a good overview of the current modelling process, all features should be at hand, which can be realised, e.g., by presenting them as a list.
When it comes to visualizing the model result of a single patient, the pure listing of numerical values is not sufficient. The model result should be presented in an intuitive and comprehensible way in order to instantly provide the information required for further model improvement. For example, a threshold violation could be indicated by color coding, depending on whether it is a positive or negative outcome. Another important aspect is that, if the modelling results are related to a certain time interval (e.g. weekly intervals), the time-synchronous relation to the underlying raw data must be given.

Figure 2. Essential elements and functionalities of a visualization tool.
Typical datasets for predictive modelling consist of several data levels. Along with
raw data from measuring devices, derived data or any other kind of supplementary
information on a measurement can be available. For example, a time series of heart rate
values could have been derived from ECG recordings. Thus, while initially the actual
heart rate values are shown, there should be a functionality to go further into detail and
to take a closer look at the underlying biosignal, i.e. the ECG. Other examples would be
various biosignals, lab reports or even imaging data. Such a functionality was
implemented as a so-called ‘drill-down’ functionality and allows the user to obtain a
higher level of detail.
Another important part of the predictive modelling process is the overall model evaluation. However, the development of a predictive model does not follow a fixed path, e.g. from raw data to modelling result; rather, it is an interactive, iterative process of understanding all aspects of a given scenario and the resulting modelling outcomes. The main challenges are understanding why a given result has been obtained and finding potential issues that might lead to errors or unsatisfactory effects. Therefore, it should be possible to launch the viewer directly out of the evaluation process (e.g. by choosing an interesting cell of a confusion matrix), presenting just the chosen subselection of cases (e.g. the false positive cases only). This once again allows a closer look at the underlying data, the underlying measurements, the features that were used for the model etc., enabling and supporting the process of gaining insights.

3.2. Implementation

Figure 3 shows the interface of the ‘PATHviewer’ with its main elements.
Figure 3. MATLAB-based PATHviewer. 1: feature table, 2: modelling result panel, 3: time series panel, 4: drill-down panel.

In the feature table (1), all the features of the currently selected patient are listed. The modelling result panel (2) presents the model output. In this example, the prediction concerned whether the respective patient would be admitted to hospital or not. Time is plotted along the x-axis, in this case in 7-day intervals. For each of these intervals, the predicted probability of a hospital admission is visualized by a bar of corresponding height. The red dotted line represents a threshold: if the predicted probability exceeds the threshold, the model indicates that there will be an admission. An actual hospital admission is indicated by a grey background of the respective week. The color of the bars indicates whether the prediction was true (green) or false (red).
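The coloring logic of the modelling result panel can be sketched as follows. This is an illustrative Python fragment, not the MATLAB implementation; the 0.5 threshold and the data are hypothetical:

```python
# For each weekly interval: bar height = predicted probability; the bar is
# green if the prediction (probability vs. threshold) matches the actual
# outcome, red otherwise; actually admitted weeks get a grey background.
def color_bars(probabilities, admitted_weeks, threshold=0.5):
    bars = []
    for week, p in enumerate(probabilities):
        predicted = p > threshold          # model says: admission expected
        actual = week in admitted_weeks    # ground truth for this week
        bars.append({
            "week": week,
            "height": p,
            "color": "green" if predicted == actual else "red",
            "background": "grey" if actual else "white",
        })
    return bars

bars = color_bars([0.2, 0.7, 0.6, 0.1], admitted_weeks={1, 3})
print([b["color"] for b in bars])  # ['green', 'green', 'red', 'red']
```

Encoding correctness in the bar color and the actual outcome in the background keeps both pieces of information visible at a glance without adding extra plot elements.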
The time series panel (3) is dedicated to the visualization of time series data. This
panel is implemented in such a way that it is user-configurable. A configuration menu
allows the user to define the number of axes and their size. In a separate step it is then
possible to decide via a context menu, which parameters should be plotted on which axis.
A data cursor functionality can be used to display detailed information on the data points.
This data cursor function can also be used to display text information such as e.g. clinical
notes.
The purpose of the drill-down panel (4) is to allow the user to take a deeper look
into the underlying data. If there is a time series of measurements or events for which an
underlying data level exists, the individual data points can be selected via mouse click.
The layout and content of the drill-down panel then adapts to the respective measurement
or event type. In the case of the example shown in Figure 3, it is an ECG measurement
with the corresponding signal analysis.
There are two possibilities for the start-up procedure. On the one hand, the viewer can be called via the MATLAB 'Command Window', where all existing data can then be loaded interactively into the viewer. On the other hand, after a model run, it is possible to launch the viewer directly from the evaluation graphics (e.g. by clicking onto a cell of the confusion matrix), with a specific subselection of cases (e.g. false positive cases).
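The mapping from a clicked confusion-matrix cell to the corresponding subselection of cases can be sketched as a simple filter. All names here are hypothetical, and `launch_viewer` stands in for the MATLAB entry point:

```python
# Sketch of selecting the cases behind one confusion-matrix cell.
def confusion_cell_cases(case_ids, y_true, y_pred, actual, predicted):
    """Return the case IDs falling into one confusion-matrix cell."""
    return [cid for cid, t, p in zip(case_ids, y_true, y_pred)
            if t == actual and p == predicted]

ids = ["p01", "p02", "p03", "p04"]
y_true = [0, 0, 1, 1]   # actual admissions
y_pred = [1, 0, 1, 0]   # model predictions

# clicking the 'false positive' cell (actual 0, predicted 1) ...
false_positives = confusion_cell_cases(ids, y_true, y_pred, 0, 1)
print(false_positives)  # ['p01']
# ... would then open the viewer with just these cases, e.g.
# launch_viewer(cases=false_positives)   # hypothetical entry point
```

Starting the viewer with exactly this subselection lets the data scientist inspect the raw data, measurements and features of precisely the cases the model got wrong.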

4. Discussion

The concept development of a tool that supports data scientists in the development of
predictive models in the telehealth domain revealed many aspects that have to be
considered. A major challenge in terms of the amount and variety of available data is
finding the right balance. On the one hand, a good and comprehensive overview of the situation should be available, providing simultaneous visualization across several
data levels. On the other hand, however, care must also be taken to ensure that there is
no overload of information that overwhelms the user.
The developed tool takes the former aspect into account by simultaneously
presenting the model result, the underlying time series data as well as the features used
for modelling on a single interface. The color-coded representation of the modelling
results enables a quick overview of the model outcomes. Furthermore, the horizontally
time synchronous alignment of the time series directly underneath each other illustrates
the temporal relationship between the model output and the progression of the raw data.
This is of particular importance in order to gain a deeper understanding of the model
outcomes and to formulate new hypotheses.
The second aspect, prevention from information overload, is ensured by the
configurability of the viewer and the concept of 'details on demand'. The user has the

possibility to adapt the time series visualization according to their needs, to retrieve
detailed information by means of interactions and to go one step deeper into the data with
the help of drill-down procedures.
Although the presented viewer was developed on the basis of data from a heart failure disease management program, the basic concept and framework of the tool can also be applied to a number of other use cases in the health and care sector. Proof-of-principle for these capabilities has been obtained by applying the viewer to an analysis set of rehabilitation ergometry data. A limitation, however, is that adapting the viewer to new application areas required a certain amount of changes to its code. For an even more generic solution, further developments will be necessary.
Nevertheless, even at the current stage of development, the concept and its concrete
implementation make an important contribution to gaining a deeper understanding of the
model development process. It enables insights that can only be achieved by
simultaneously viewing the various levels of data for all development steps. This is
essential for the optimization process of such models in order to improve their
performance. At the same time, this insight and a deeper understanding are essential to make the model outcome comprehensible and explainable, which is critical for the acceptance of data-driven decision support systems in real-world applications.

Acknowledgement
This work was partly funded by the Austrian Research Promotion Agency (FFG) as part
of the project EPICURE under grant agreement 14270859.

References

[1] T. B. Murdoch and A. S. Detsky, “The Inevitable Application of Big Data to Health Care,” JAMA,
vol. 309, no. 13, pp. 1351–1352, 2013.
[2] J. Hu, A. Perer, and F. Wang, “Data Driven Analytics for Personalized Healthcare,” in Healthcare
Information Management Systems: Cases, Strategies, and Solutions, C. A. Weaver, M. J. Ball, G. R.
Kim, and J. M. Kiel, Eds. Cham: Springer International Publishing, 2016, pp. 529–554.
[3] D. Gotz and D. Borland, “Data-Driven Healthcare: Challenges and Opportunities for Interactive
Visualization,” IEEE Comput. Graph. Appl., vol. 36, no. 3, pp. 90–96, 2016.
[4] D. A. Keim, F. Mansmann, J. Schneidewind, J. Thomas, and H. Ziegler, “Visual Analytics: Scope
and Challenges,” in Visual Data Mining: Theory, Techniques and Tools for Visual Analytics, S. J.
Simoff, M. H. Böhlen, and A. Mazeika, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008,
pp. 76–90.
[5] A. Holzinger, “Interactive machine learning for health informatics: when do we need the human-in-
the-loop?,” Brain Informatics, vol. 3, no. 2, pp. 119–131, Jun. 2016.
[6] D. Keim, J. Kohlhammer, G. Ellis, and F. Mansmann, Mastering the Information Age. 2010.
[7] D. Keim, G. Andrienko, J.-D. Fekete, C. Görg, J. Kohlhammer, and G. Melançon, Visual Analytics:
Definition, Process, and Challenges. 2008.
[8] S. Liu, X. Wang, M. Liu, and J. Zhu, “Towards better analysis of machine learning models: A visual
analytics perspective,” Vis. Informatics, vol. 1, no. 1, pp. 48–56, 2017.
[9] D. Hayn et al., “Predictive analytics for data driven decision support in health and care,” it -
Information Technology, vol. 60. pp. 183–194, 2018.
[10] G. D. Clifford et al., “AF Classification from a Short Single Lead ECG Recording: the
PhysioNet/Computing in Cardiology Challenge 2017,” Comput. Cardiol. (2010)., vol. 44, p.
10.22489/CinC.2017.065-469, Sep. 2017.
[11] M. Kropf et al., “Cardiac anomaly detection based on time and frequency domain features using tree-
based classifiers,” Physiol. Meas., vol. 39, no. 11, p. 114001, 2018.
[12] A. der Heidt et al., “HerzMobil Tirol network: rationale for and design of a collaborative heart failure
disease management program in Austria,” Wien. Klin. Wochenschr., vol. 126, no. 21, pp. 734–741,
Nov. 2014.
[13] The MathWorks, Inc. Matlab, https://www.mathworks.com/products/matlab.html, last accessed:
6.2.2019.

Subject Index
AAL 9 emergency hospital service 57
algorithms 73 ENCODE 105
appointment 33 ergometry 210
atlas 49 exercise test 210
Austria 178 expressive language disorders 81
autonomy 9 FHIR 33
barrier-free toilet 9 forecasting 57
biomarkers 89 FXR 105
blindness 170 health care delivery 218
business model 178 health information 202
cardiac rehabilitation 210 health information interoperability 97
care 9 health monitoring 138
child 17 health service 218
ChIP-seq 105 heart assist device 192
classification 65, 89 heart failure 146
clinical decision support systems 154 hip fracture 138
cognition 170 human-in-the-loop 234
competency-based education 162 human-machine interaction 178
confidentiality 202 ICPC-2 136
convolutional neural network 192 implementation 121
cross-institutional data exchange 33 in-patient care of the elderly 178
data aggregation 49 infection classification 192
data analysis 210 information provision 1
data analytics 200 information system 1
data elements 25 integrated care 218
data linkage 200 Iran 121
data mining 41 kinetics 89
delirium 65 latent class analysis 113
delivery of health care 49 left-ventricular assist device 146
diabetes mellitus 1 machine learning 65, 73, 81, 186
diarization 81 medical education 73
disease management 146 medical informatics 81
documentation 136 medical record linkage 210
driveline infection 192 mental disorders 25
early medical intervention 154 minimum data set 25
echocardiography 41 mobile applications 17
eHealth 170, 218 model deployment 186
electroencephalography 97, 113 model stability 186
electronic health records 65, 226 motion tracker 138
electronic medical record 41 named entity recognition 41
electronic medical record systems 25 national health programs 136
electronic prescribing 121, 128 national roadmap 121
emergencies 17 natural language processing 73
needs assessment 162 scheduling 33


nursing informatics 162 security 202
patient care management 218 smart toilet 9
patient education 1 socially assistive robots 178
patient participation 154 standards 121
patient selection 226 stroke 218
patients 57 telemedicine 234
prediction 186 telemonitoring 146
predictive analytics 234 text mining 41
primary health care 136 time series 89
privacy 202 toileting 9
recommender system 128 unsupervised machine learning 113
reference standards 226 usability 128
regression analysis 57 use cases 178
rehabilitation 138 user-computer interface 17
remote monitoring 146 vision 170
resuscitation 17 visual analytics 234
robotic toilet 9 wearables 138

Author Index
Aghabagheri, M. 121, 128 Jauk, S. 65, 186
Altenbuchner, A. 138 Jolo, P. 1
Ammenwerth, E. 162 Jonas, S.M. 81, 113
Babitsch, B. 57 Jung, O. 154
Bahaadinbeigy, K. 121 Jungwirth, E. 105
Baumgarten, D. 89 Kastner, P. 218
Baumgartner, C. 89 Kimiafar, K. 121
Boeken, U. 146 Klischies, D. 81
Breit, M. 89 Kluge, T. 97
Bürkle, T. 33 Kohlschein, C. 81
Dehghan, H. 121, 128 Kramer, D. 65, 186
Denecke, K. 1, 33 Kreiner, K. 210
Denter, M. 57 Kriegel, J. 178
Duftschmid, G. 226 Kutafina, E. 113
Eggerth, A. v, 186, 210, 234 Kyburz, P. 33
Ehrenmüller, I. 178 Leodolter, W. 65, 186
Endel, F. 49, 200 Lüneburg, N. 192
Eslami, S. 121, 128 Machalik, K. 41
Etminani, K. 128 Marschall, H.-U. 105
Feldmann, C. 192 Mayer, P. 9
Fogarassy, G. 41 Messer-Misak, K. 136
Geley, T. 218 Modre-Osprian, R. 210
Gfeller, S. 33 Morshuis, M. 146
Ghasemi, S.H. 121, 128 Mulrenin, R. 154
Glachs, D. 154 Namayandeh, S.M. 121, 128
Grabner, V. 178 Netzer, M. 89
Griffin, E. 170 Nüssli, S. 1
Güldenpfennig, F. 9 Ott, S. 226
Haag, M. 17, 73 Panek, P. 9
Hackl, W.O. 162 Panzitt, K. 105
Hanser, F. 89 Picinali, L. 170
Hashemi, N. 25, 202 Ploessnig, M. 154
Hashemi, N.-s. 25, 202 Quehenberger, F. 65, 186
Hasibian, M.R. 128 Rauch, J. 57
Haug, S. 138 Rawassizadeh, R. 25
Hayn, D. v, 65, 186, 210, 234 Reiss, N. 146, 192
Hoffmann, J.-D. 146 Reiswich, A. 73
Huber, M. 97 Reiter, K. 218
Hübner, U. 57 Rinner, C. 226
Huisman, S. 154 Rippinger, C. 49
Igel, C. 17 Rissbacher, C. 218
Jahangiri, M. 121 Runge, J. 218
Jansen, S. 192 Saberi, M.R. 128
Sams, M. 234 Urach, C. 49


Sargolzaei, M. 121 Vakili Arki, H. 128
Scase, M. 170 van de Steeg, M. 192
Scheffel, S. 49 van der Meulen, P. 192
Schmidt, T. 146, 192 van Keulen, H. 154
Schmucker, M. 17 Vathy-Fogarassy, Á. 41
Schreier, G. v, 65, 186, 210, 234 Veeranki, S.P.K. 65, 186, 210, 234
Schulte Eistrup, S. 146 von Stein, N. 113
Schulte-Coerne, J. 113 Wagner, M. 105
Sevinc, B. 1 Weber, K. 138
Sheikhtaheri, A. 25, 202 Wegner, K.K. 146
Smith, I. 154 Weibrecht, N. 49
Sont, J. 154 Weinberger, K.M. 89
Strohmeier, F. 154 Welte, S. 218
Szekér, S. 41 Wendl, R. 192
Thallinger, G.G. 105 Werner, C.J. 81
Traninger, H. 210 Winkler, S. 97
Tuttle-Weidinger, L. 178 Zechmeister, M. 49, 200
