Special Issue Reprint

Human Activity Recognition


(HAR) in Healthcare

Edited by
Luigi Bibbò and Marley M.B.R. Vellasco

mdpi.com/journal/applsci
Human Activity Recognition (HAR) in
Healthcare
Human Activity Recognition (HAR) in
Healthcare

Editors
Luigi Bibbò
Marley M. B. R. Vellasco

Basel • Beijing • Wuhan • Barcelona • Belgrade • Novi Sad • Cluj • Manchester


Editors
Luigi Bibbò Marley M. B. R. Vellasco
University of Florence Pontifical Catholic University
Florence of Rio de Janeiro
Italy Rio de Janeiro
Brazil

Editorial Office
MDPI
St. Alban-Anlage 66
4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal
Applied Sciences (ISSN 2076-3417) (available at: https://www.mdpi.com/journal/applsci/special
issues/A1K098AX9D).

For citation purposes, cite each article independently as indicated on the article page online and as
indicated below:

Lastname, A.A.; Lastname, B.B. Article Title. Journal Name Year, Volume Number, Page Range.

ISBN 978-3-0365-9778-2 (Hbk)


ISBN 978-3-0365-9779-9 (PDF)
doi.org/10.3390/books978-3-0365-9779-9

© 2024 by the authors. Articles in this book are Open Access and distributed under the Creative
Commons Attribution (CC BY) license. The book as a whole is distributed by MDPI under the terms
and conditions of the Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND)
license.
Contents

About the Editors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Luigi Bibbò and Marley M. B. R. Vellasco


Human Activity Recognition (HAR) in Healthcare
Reprinted from: Appl. Sci. 2023, 13, 13009, doi:10.3390/app132413009 . . . . . . . . . . . . . . . . 1

Kamsiriochukwu Ojiako and Katayoun Farrahi


MLPs Are All You Need for Human Activity Recognition
Reprinted from: Appl. Sci. 2023, 13, 11154, doi:10.3390/app132011154 . . . . . . . . . . . . . . . . 10

Aitor Arribas Velasco, John McGrory and Damon Berry


An Evaluation Study on the Analysis of People’s Domestic Routines Based on Spatial, Temporal
and Sequential Aspects
Reprinted from: Appl. Sci. 2023, 13, 10608, doi:10.3390/app131910608 . . . . . . . . . . . . . . . . 28

Qian Huang, Weiliang Xie, Chang Li, Yanfang Wang and Yanwei Liu
Human Action Recognition Based on Hierarchical Multi-Scale Adaptive Conv-Long Short-Term
Memory Network
Reprinted from: Appl. Sci. 2023, 13, 10560, doi:10.3390/app131910560 . . . . . . . . . . . . . . . . 41

Sakorn Mekruksavanich, Wikanda Phaphan, Narit Hnoohom and Anuchit Jitpattanakul


Attention-Based Hybrid Deep Learning Network for Human Activity Recognition Using WiFi
Channel State Information
Reprinted from: Appl. Sci. 2023, 13, 8884, doi:10.3390/app13158884 . . . . . . . . . . . . . . . . . 63

Abı́lio Oliveira and Mónica Cruz


Virtually Connected in a Multiverse of Madness?—Perceptions of Gaming, Animation, and
Metaverse
Reprinted from: Appl. Sci. 2023, 13, 8573, doi:10.3390/app13158573 . . . . . . . . . . . . . . . . . 85

Dimitris Filos, Jomme Claes, Véronique Cornelissen, Evangelia Kouidi


and Ioanna Chouvarda
Predicting Adherence to Home-Based Cardiac Rehabilitation with Data-Driven Methods
Reprinted from: Appl. Sci. 2023, 13, 6120, doi:10.3390/app13106120 . . . . . . . . . . . . . . . . . 109

Changmin Kim and Woobeom Lee


Human Activity Recognition by the Image Type Encoding Method of 3-Axial Sensor Data
Reprinted from: Appl. Sci. 2023, 13, 4961, doi:10.3390/app13084961 . . . . . . . . . . . . . . . . . 132

Tsige Tadesse Alemayoh, Jae Hoon Lee and Shingo Okamoto


Leg-Joint Angle Estimation from a Single Inertial Sensor Attached to Various Lower-Body Links
during Walking Motion
Reprinted from: Appl. Sci. 2023, 13, 4794, doi:10.3390/app13084794 . . . . . . . . . . . . . . . . . 149

Sara Caramaschi, Gabriele Basso Papini and Enrico Gianluca Caiani


Device Orientation Independent Human Activity Recognition Model for Patient Monitoring
Based on Triaxial Acceleration
Reprinted from: Appl. Sci. 2023, 13, 4175, doi:10.3390/app13074175 . . . . . . . . . . . . . . . . . 166

Luigi Bibbo’, Francesco Cotroneo and Marley Vellasco


Emotional Health Detection in HAR: New Approach Using Ensemble SNN
Reprinted from: Appl. Sci. 2023, 13, 3259, doi:10.3390/app13053259 . . . . . . . . . . . . . . . . . 184

Héctor José Tricás-Vidal, Marı́a Concepción Vidal-Peracho, Marı́a Orosia Lucha-López,
César Hidalgo-Garcı́a, Sofı́a Monti-Ballano, Sergio Márquez-Gonzalvo and José Miguel
Tricás-Moreno
Association between Body Mass Index and the Use of Digital Platforms to Record Food Intake:
Cross-Sectional Analysis
Reprinted from: Appl. Sci. 2022, 12, 12144, doi:10.3390/app122312144 . . . . . . . . . . . . . . . . 206

About the Editors
Luigi Bibbò
Luigi Bibbò received a bachelor's degree and a master's degree in Biomedical Engineering at the University of
Naples “Federico II”, Italy, in 2006 and 2009, respectively, and a Ph.D. in Electronic Engineering
and Computer Science at the Second University of Naples, Italy, in 2014. From Sept. 2013 to May
2014, he was a Visiting Scientist at Tufts University, Boston (USA), in the Ultrafast Nonlinear
Optics and Photonics Laboratory. From April 2016 to Nov. 2018, he was a postdoc researcher
at the College of Electronics and Information Engineering of Shenzhen University (CHINA) for
research on plasmonic metamaterials. From Feb. 2019 to July 2019, he was an OAM multiplexing
research fellow at Nanophotonic Research Center (NCR). From August 2019 to August 2022, he was
a Researcher at DIIES at the University ”Mediterranea” of Reggio Calabria (Italy) and a lecturer for
the course in Electronic Bioengineering. Since March 2023, he has been a researcher at the Department
of Industrial Engineering of the University of Florence on the design, development, and validation
of Robotics, IoT, and Artificial Intelligence technologies for biomedical applications. His research
interests include computational intelligence methods and applications, including neural networks,
virtual reality, and augmented reality. He is a reviewer for numerous international journals and a
Guest Editor for Frontiers and MDPI journals.

Marley M. B. R. Vellasco
Marley M. B. R. Vellasco received bachelor’s and master’s degrees in electrical engineering from
the Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil, and a Ph.D. degree in computer
science from the University College London (UCL). She is currently the Head of the Computational
Intelligence and Robotics Laboratory (LIRA), PUC-Rio. She is the author of four books and more
than four hundred scientific papers in the area of soft computing and machine learning. She has
supervised more than 35 Ph.D. theses and 85 M.Sc. dissertations. Her research interests include
computational intelligence methods and applications, such as neural networks, fuzzy logic, hybrid
intelligent systems (neuro-fuzzy, neuro-evolutionary, and fuzzy-evolutionary models), robotics and
intelligent agents, applied to decision support systems, pattern classification, time-series forecasting,
and control, optimization, and data mining.

applied sciences
Editorial
Human Activity Recognition (HAR) in Healthcare
Luigi Bibbò 1, * and Marley M. B. R. Vellasco 2

1 BioRobotics Lab, Department of Industrial Engineering, University of Florence, 50134 Florence, Italy
2 Department of Electrical Engineering, Pontifical Catholic University of Rio de Janeiro,
Rio de Janeiro 22451-000, Brazil; [email protected]
* Correspondence: luigi.bibbo@unifi.it

1. Introduction
Developments in the medical and technological fields have led to a longer life ex-
pectancy. However, this improvement has led to an increase in the number of older people
with critical health conditions who need care. Older people who cannot care for them-
selves need special assistance during their daily care. Long-term care involves medical,
welfare, rehabilitative, and social services that significantly impact the national social and
health system and involve a growing number of caregivers who are difficult to find [1].
Advances in information and communications technology (ICT), nanotechnology, and
artificial intelligence (AI) have made it possible to develop efficient home care systems [2],
contributing to the containment of public expenditure and the improvement of the living
conditions of older adults. The creation of intelligent objects ordinarily present in the home,
the advent of the IoT, and the availability of AI algorithms have created the right conditions
for ambient intelligence (AmI) environments [3] and ambient assisted living (AAL) [4].
These systems make the home active, intelligent, and safe, making it possible to carry out
daily activities in the best possible way and with full autonomy, as well as ensuring timely
intervention in critical situations. The innovations in care for older people, introduced by
technological evolution, are evident in the creation of smartwatches [5] and fitness bracelets
for monitoring vital parameters such as blood pressure, heart rate, and physical activity;
telemedicine to remotely monitor health status and establish treatment plans [6]; and robots
to support social care [7].
The automatic detection of physical activities performed by human subjects is identi-
fied as human activity recognition (HAR). Its goal is correctly classifying data or images
into gestures, actions, and human-to-human or human–object interactions. Identification
is achieved using AI that analyzes activity data captured from different sources. Sources
range from wearable sensors [8] and smartphone sensors [9] to photographic devices or
CCTV cameras [10]. HAR is used in different fields of application ranging from video
surveillance systems, the assessment of the state of health or the analysis of patient behavior
in a natural environment by monitoring the actions carried out, or even for the detection of
anomalies predicting falls, to human–computer interaction and robotics. Depending on the
area of application, the sensors used will be different.
From a functional point of view, HAR consists of the following phases (a brief illustrative sketch follows the list):
• Automatic acquisition of data on activities performed and vital signs through wearable
sensors and sensors connected to medical equipment.
• Data pre-processing (elimination of any noise or unwanted signals).
• Feature extraction.
• Model training and testing.
• Activity recognition.
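As a minimal, purely illustrative sketch of this pipeline, the phases can be strung together as follows; the window length, hand-crafted features, and random forest classifier are assumptions for illustration, not a prescription taken from any article in this issue.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sliding_windows(signal, labels, win=128, step=64):
    # Segment a (T, C) sensor stream into fixed-length windows with 50% overlap.
    X, y = [], []
    for start in range(0, len(signal) - win + 1, step):
        X.append(signal[start:start + win])
        y.append(labels[start + win - 1])   # label the window by its last sample
    return np.asarray(X), np.asarray(y)

def simple_features(windows):
    # Per-axis mean, standard deviation, minimum, and maximum as a feature vector.
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1),
                           windows.min(axis=1), windows.max(axis=1)], axis=1)

# Hypothetical tri-axial acceleration stream with integer activity labels.
acc = np.random.randn(10000, 3)
lab = np.random.randint(0, 5, size=10000)
Xw, yw = sliding_windows(acc, lab)
clf = RandomForestClassifier(n_estimators=100).fit(simple_features(Xw), yw)
print(clf.predict(simple_features(Xw[:3])))   # recognised activity labels

In practice, the pre-processing phase would also include filtering and resampling before windowing, and deep models often replace the hand-crafted features with automatically learned ones.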
Two technologies can be used for activity recognition: recognition based on vision
or sensor-based recognition. Inertial sensors are preferred over video-based sensors that
require the installation of cameras in all rooms in a house for motion recording. In ad-
dition, they are expensive, and the accuracy of reconnaissance is affected by brightness
problems and inevitable visual disturbances, as well as violating privacy. Sensors based on
MEMS technology are miniaturized, economical, and have low power consumption [11].
Monitoring activities in the environment where older people live is relevant to evaluating
their behavioral changes. Technology can help to detect and alert healthcare professionals
or family members about a patient’s behavioral changes, preventing serious problems.
Ultimately, with the help of these systems, we can monitor the patient’s status depending
on the specific pathology, the tracking data, and the exact location.
This Research Topic aims to create a collection of articles illustrating different method-
ological approaches to the subject of HAR in an exciting scenario. It contains eleven
articles that will be briefly described below to stimulate the reader’s interest and to
expand their understanding.

2. An Overview of Published Articles


Ojiako and Farrahi (contribution 1) experimented with an innovative predictive model
of human activities (HAR). They demonstrated that the sensor-based MLP mixer archi-
tecture enables competitive performance in vision-based tasks with lower computational
costs than other deep learning techniques. The MLP mixer recently created by Google
Brain [12] does not use convolutions or self-attention mechanisms, and instead consists
entirely of MLPs. The authors compared the performance of the MLP mixer with the
existing state-of-the-art literature:
• Ensemble LSTM;
• CNN-BiGRU;
• AttenSense;
• Multi-agent attention;
• DeepConvLSTM;
• Triple attention;
• Self-attention;
• CNN;
• b-LSTM-S.
The performance was 10.1% better in the Daphnet Gait dataset, 1% better in the
PAMAP2 dataset, and 0.5% better in the Opportunity dataset.
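To make the architecture concrete, the following is a minimal sketch of a single Mixer block applied to a windowed inertial signal, written in PyTorch purely for illustration; the layer sizes and the treatment of time steps as tokens are assumptions and do not reproduce the authors' implementation.

import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    # One MLP-Mixer block: a token-mixing MLP across time steps, then a channel-mixing MLP
    # across features. No convolutions, recurrence, or attention are used.
    def __init__(self, num_tokens, dim, token_hidden=64, channel_hidden=128):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, token_hidden), nn.GELU(), nn.Linear(token_hidden, num_tokens))
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden), nn.GELU(), nn.Linear(channel_hidden, dim))

    def forward(self, x):                      # x: (batch, tokens, dim)
        y = self.norm1(x).transpose(1, 2)      # (batch, dim, tokens) so the MLP mixes tokens
        x = x + self.token_mlp(y).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x

# A sensor window of 128 time steps ("tokens") with 32 embedded channels.
block = MixerBlock(num_tokens=128, dim=32)
print(block(torch.randn(8, 128, 32)).shape)    # torch.Size([8, 128, 32])

Token mixing and channel mixing alternate across stacked blocks, and a pooled representation from the final block feeds a linear classifier.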
Velasco et al. (contribution 2) used the HAR approach to understand human behavior
by analyzing data representative of domestic routines. Their study is oriented towards
establishing a connection between the activities of daily living, the spaces in which they
take place, and the times related to the performance of the activities in a given place. Re-
search has shown that this information is helpful for healthcare professionals to assess the
health status of patients, for family members to keep track of the habits of relatives, and
for home designers to assess the architectural characteristics of home interiors for acces-
sibility and movement of residents. The authors used the knowledge discovery database
(KDD) approach with the data analyst variant as a key player in the knowledge discovery
process [13]. The KDD approach is an interactive and iterative knowledge discovery pro-
cess that identifies relationships between data that must be valid, new, potentially useful,
and understandable. The analyst gains a greater understanding of the domestic routine
with each process iteration. The parameters used for the evaluation are the sequence
of places visited, times of day at which they are visited, and average duration of visits;
the signals are acquired using PIR sensors connected to a Raspberry Pi 4, placed inside
each room of the house. Transitions between positions are detected by measuring the
RSSI power of the Bluetooth signal emitted by a BLE device worn by the subject being
monitored. The evaluation of the method was verified through workshops with seventeen
multidisciplinary participants: architects, engineers, health professionals, and caregivers.
The feedback obtained was positive, confirming the validity of the method adopted as a
source of significant information on the status of the monitored subjects.
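A minimal sketch of how room-level visits might be derived from such signals is shown below; the room names, sampling period, and the rule of assigning the wearer to the receiver reporting the strongest RSSI are illustrative assumptions, not the authors' implementation.

import numpy as np

def strongest_room(rssi_by_room):
    # Assign the wearer to the room whose receiver reports the strongest (least negative) RSSI.
    return max(rssi_by_room, key=rssi_by_room.get)

def visit_durations(room_sequence, sample_period_s=1.0):
    # Collapse consecutive identical rooms into (room, duration) visits.
    visits = []
    for room in room_sequence:
        if visits and visits[-1][0] == room:
            visits[-1][1] += sample_period_s
        else:
            visits.append([room, sample_period_s])
    return visits

# Hypothetical RSSI log (dBm) from receivers in three rooms, one reading per second.
log = [{"kitchen": -55, "bedroom": -80, "bath": -90},
       {"kitchen": -60, "bedroom": -70, "bath": -88},
       {"kitchen": -82, "bedroom": -52, "bath": -85}]
rooms = [strongest_room(r) for r in log]
print(visit_durations(rooms))   # [['kitchen', 2.0], ['bedroom', 1.0]]

The sequence of visited rooms, their times of day, and the visit durations derived this way are exactly the parameters the workshop participants evaluated.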


In the third manuscript, Huang et al. (contribution 3) proposed a new multiscale


hierarchical adaptive network structure for HAR called HMA Conv-LSTM. In this model,
there are:
• a multi-scale hierarchical convolution module (HMC) that performs finer-grained
feature extraction on the spatial information of feature vectors;
• an adaptive channel feature fusion module that can blend functionality at different
scales, improving model efficiency and removing redundant information;
• a dynamic channel-selection LSTM module, based on the attention mechanism, to
extract temporal context information.
This multi-scale convolution module uses convolutional kernels of different scales for
extracting and splicing multi-scale features in both sensory and temporal dimensions. This
strengthens the network’s ability to recognize features of different scales, improves its
adaptability, and enhances its ability to characterize features.
The diversity and duration of the actions detected by sensors placed on different
body positions dictate longer sliding window sizes for segmentation. This sizing can
result in some fine-grained subtle action processes being overlooked, thus affecting action
recognition. In contrast, the proposed hierarchical architecture can split the action window
and extract features from the sensor sequence data with finer granularity to recognize
the finer action processes effectively. To validate the efficacy of the proposed model, the
authors carried out experiments on several public HAR datasets: Opportunity, PAMAP2,
USC-HAD, and Skoda. Their model was built using Google’s open-source TensorFlow
2.9.0 deep learning framework. The proposed model achieves competitive performance
compared to several state-of-the-art approaches. The evaluation results also show that
the proposed HMA Conv-LSTM can effectively obtain the temporal context and spatial
information from sensor sequence data.
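The authors built their model with TensorFlow 2.9.0; the sketch below instead uses PyTorch only to illustrate the general idea of parallel convolutions at several kernel scales feeding an LSTM. The layer sizes and fusion by simple concatenation are assumptions and make no claim of matching the HMA Conv-LSTM modules.

import torch
import torch.nn as nn

class MultiScaleConvLSTM(nn.Module):
    # Parallel 1-D convolutions with different kernel sizes extract multi-scale features,
    # which are concatenated and passed to an LSTM that models temporal context.
    def __init__(self, in_channels, n_classes, scales=(3, 5, 7), hidden=64):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv1d(in_channels, 32, k, padding=k // 2) for k in scales])
        self.lstm = nn.LSTM(32 * len(scales), hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                        # x: (batch, time, channels)
        x = x.transpose(1, 2)                    # (batch, channels, time) for Conv1d
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        out, _ = self.lstm(feats.transpose(1, 2))
        return self.head(out[:, -1])             # classify from the last time step

model = MultiScaleConvLSTM(in_channels=9, n_classes=12)
print(model(torch.randn(4, 128, 9)).shape)       # torch.Size([4, 12])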
Again, Mekruksavanich et al. (contribution 4) used an innovative approach based
on a DL network and the nature of the data. Exploiting the potential offered by WiFi-
based detection techniques, they used channel state information (CSI) [14] rather than
the received signal strength indicator (RSSI). The authors proposed a hybrid deep learning
network called CNN-GRU-AttNet that leverages the strengths of CNN and GRU to extract
informative spatio-temporal features from raw CSI data automatically and to efficiently
classify tasks. They also integrated an attention mechanism into the network that prioritizes
important features and time steps, thereby improving recognition performance. The
network consists of five processing layers between the input and output: two CNN layers,
a GRU layer, an attention layer, and a fully connected layer. To assess the effectiveness of the
proposed model, the authors used two publicly accessible datasets, CSI-HAR and
StanWiFi. These cover seven activities: walking, running, sitting, lying down, standing up,
bending, and falling. Because these datasets did not have predefined training and test sets,
they adopted five-fold cross-validation to evaluate the model's performance.
They also performed a comparative evaluation of the performance of five core deep learning
models: CNN, LSTM, BiLSTM, GRU, and BiGRU.
The results show exceptional efficacy in the classification of HAR activities, superior to
the five baseline DL models, producing an average accuracy of 99.62%, a precision of 99.61%,
and an F1 score of 99.61% across all movements.
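A simplified sketch of a CNN-GRU network with soft attention over time steps, in the spirit of CNN-GRU-AttNet, is given below; the number of CSI subcarriers, layer widths, and the attention formulation are assumptions for illustration only, written in PyTorch rather than the authors' framework.

import torch
import torch.nn as nn

class CNNGRUAtt(nn.Module):
    # Two 1-D conv layers extract local features from CSI amplitudes, a GRU models their
    # temporal evolution, and a soft attention layer weights time steps before the classifier.
    def __init__(self, in_channels, n_classes, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, 5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, 5, padding=2), nn.ReLU())
        self.gru = nn.GRU(64, hidden, batch_first=True)
        self.att = nn.Linear(hidden, 1)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):                              # x: (batch, time, subcarriers)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.gru(h)                             # (batch, time, hidden)
        w = torch.softmax(self.att(h), dim=1)          # attention weights over time steps
        context = (w * h).sum(dim=1)                   # weighted sum of GRU states
        return self.fc(context)

model = CNNGRUAtt(in_channels=52, n_classes=7)         # 52 subcarriers is an assumed CSI size
print(model(torch.randn(2, 200, 52)).shape)            # torch.Size([2, 7])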
Kim and Lee (contribution 5), aware that some physical activities may include similar
features that lead the automatic classification phase to incorrect evaluations, proposed
a new approach to improve recognition accuracy. Their proposed method uses a smart-
phone’s three-axis acceleration and gyroscopic data to define activity patterns visually. In
particular, the method expands the sensor data into 2D and 3D images. This generates
new characteristics of human activities that cannot be detected in one-dimensional data.
These new features allow, on the one hand, the recognition of more diverse types of human
physical activity and, on the other hand, the identification of unique characteristics among
similar types of activities. The raw values from the accelerometer and gyroscope that
correspond to the breadth of the continuous data of the activities performed are used to

represent 2D image models. Each time-series value is transformed into a luminosity value,
obtaining the Brightness Intensity Distribution Model (BIDP) for each physical activity
data. Each point is expressed as a distinct brightness value based on the measured value.
This type of representation includes areas of intense and low brightness depending on the
location of the data waveform that can degrade the model’s performance. To overcome this
problem, the authors carried out a processing step to generate a standardized visual image.
The image data were used in the training phase along with the raw 1D data to increase
the precision and accuracy of the HAR. The sensor data from the triaxial accelerometer
and gyroscope used in this study came from the “WISDM Activity and Biometrics for
Smartphones and Smartwatches” published by Weiss [14]. The neural network used was of
the multidimensional convolutional type. The model achieved a 90% or higher performance
for all 18 classes of physical activity examined.
The model's HAR performance was superior to that reported in previous studies.
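A minimal sketch of the general idea of turning a windowed tri-axial signal into a normalized grayscale image is shown below; it is not the authors' BIDP encoding, and the image layout and normalization are assumptions chosen only to illustrate the mapping from amplitude to brightness.

import numpy as np

def window_to_image(window, height=64):
    # Map each sample of a (T, 3) accelerometer/gyroscope window to a brightness value
    # in [0, 255] and stack the three axes row-wise to form a small grayscale image.
    lo, hi = window.min(), window.max()
    norm = (window - lo) / (hi - lo + 1e-8)          # scale amplitudes to [0, 1]
    brightness = (norm * 255).astype(np.uint8)       # (T, 3) brightness values
    rows = [np.tile(brightness[:, axis], (height // 3, 1)) for axis in range(3)]
    return np.vstack(rows)                           # stacked image, one band per axis

window = np.random.randn(128, 3)                      # hypothetical 128-sample window
img = window_to_image(window)
print(img.shape, img.dtype)                           # (63, 128) uint8

Images generated this way can then be fed to a 2D convolutional branch alongside the raw 1D signal, which is the combination the authors report as improving accuracy.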
Caramaschi et al. (contribution 6) experimented with a model for the recognition
of human activity independent of the orientation of the worn device that classified five
predefined activities within a range of actions that could occur in a clinical setting. Their
proposal stems from the study of how changes in sensor orientation affect the classification
of deep learning (DL) human activity recognition (HAR) targeting activities such as slow
and assisted walking and wheelchair use. The HAR model is orientation-agnostic, uses data
augmentation, and is trained with acceleration measurements recorded from five sensor
positions on the participant’s trunk. The wearable sensor data augmentation approach,
first used by Ohashi et al. [15], positively affects time-series computing and potentially
improves data-driven tasks such as HAR. They used two datasets. The first is the Wearing
Position Study (WPS) acquired at Philips Research Laboratories (2022). It contains three-axis
acceleration measurements from nineteen healthy volunteers, comprising ten males and
nine females. The second is the Simulated Hospital Study (SHS) acquired at Philips
Research Laboratories (2019). It includes ten healthy male and ten female volunteers.
Five GENEActive (GA) sensors were used for monitoring: two in contact with the skin,
two dangling from the neck, and one in the pocket of the clinical gown. The implemented
HAR model is a modified version of the DNN proposed by Fridriksdottir et al. [16]. The
main difference is replacing the long short-time memory layer with a convolutional layer.
This change in architecture was introduced to simplify the model and did not generate
significantly different results from the previous DNN. The performance achieved by the
two sets was evaluated to choose the number of augmented rotation intervals to be applied
to the training data. The first set consisted of seven rotations between 0 and 90 degrees,
while the second set consisted of seven rotations between 0 and 180 degrees. In light
of this preliminary analysis, the final augmentation settings for the augmented model’s
training set consisted of ten rotations from 0 to 180 degrees, in 20-degree steps, applied to the frontal,
longitudinal, and sagittal axes separately. Five-fold cross-validation was used to train
both the base and augmented model. The cross-validation performance was used to
evaluate the augmentation approach (i.e., the range of rotations) and the effect of rotation
on the baseline model. The control data results confirmed the augmented model’s good
performance obtained during cross-validation. Testing showed that as the data increased,
the model could learn additional configurations not provided by the initial dataset.
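The rotation augmentation described above can be sketched with SciPy as follows; applying ten rotations from 0 to 180 degrees in 20-degree steps about one axis mirrors the reported setting, while the function name and the synthetic data are illustrative assumptions.

import numpy as np
from scipy.spatial.transform import Rotation as R

def augment_with_rotations(window, axis="x", max_deg=180, step_deg=20):
    # Rotate a (T, 3) acceleration window about one axis for a range of angles,
    # simulating different orientations of the worn sensor.
    augmented = []
    for deg in range(0, max_deg + 1, step_deg):
        rot = R.from_euler(axis, deg, degrees=True)
        augmented.append(window @ rot.as_matrix().T)   # apply the rotation to every sample
    return np.stack(augmented)                          # (n_rotations, T, 3)

window = np.random.randn(100, 3)                        # hypothetical acceleration window
print(augment_with_rotations(window).shape)             # (10, 100, 3)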
Adherence to cardiac rehabilitation does not currently produce the expected results,
negatively affecting the health status of patients and the use of available resources. To
improve this trend, Filos et al. (contribution 7) set up a study based on machine learning
techniques to predict the adherence of patients with cardiovascular disorders to a six-month
home cardiac telerehabilitation program. Their approach is based on the use of clinical in-
formation available before the start of a program and behavioral and cardiovascular fitness
characteristics acquired during the preliminary phase of familiarization with the program.
As a first step, the methodology applied involves classifying patients into different clusters.
Hierarchical clustering, an algorithm that groups objects with similar characteristics in a tree
hierarchy, was used for classification. The baseline data led to the formation of three groups

of patients: an active, low-risk patient group, sedentary, high-risk patients, and a group
of patients at high cardiovascular risk but who are fit and motivated. Familiarity with
exercise showed three adherence behaviors (high adherence, low adherence, and transient
adherence), while exercise sessions after the familiarization phase resulted in adherent
and non-adherent clusters. Two model types, namely repetitive decision trees (DT) and
random forest (RF), were used to predict long-term adherence. The data to develop the DT
model were patient clusters created based on baseline characteristics and clusters related
to adherence to the exercise program. Since the DT model is unstable, a slight variation
in the training dataset can lead to changes in the tree. A random forest (RF) technique,
which is more stable, was thus applied. The first model showed both high accuracy and
high recall, at 80.2 ± 19.5% and 94.4 ± 14.5%, respectively, which were better than the
performance of the second model, which displayed a precision of 71.8 ± 25.8% and a recall
of 87.7 ± 24%. Network analysis was applied to discover correlations of their characteristics
that relate to adherence. This study highlighted how important the combination of basic
clinical data with the characteristics acquired during a brief familiarization phase is for
the high-accuracy prediction of adherence to the long-term cardiac rehabilitation program. The proposed
methodology can be generalized to facilitate the identification of patients who are more
adherent to telerehabilitation programs.
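The two-stage idea of clustering baseline profiles and then predicting adherence can be sketched with scikit-learn on synthetic data as below; the feature dimensions, number of clusters, and scoring choice are assumptions and not the study's configuration.

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
baseline = rng.normal(size=(120, 8))          # hypothetical clinical/fitness features
adherent = rng.integers(0, 2, size=120)       # hypothetical adherent / non-adherent labels

# Stage 1: group patients by baseline profile with hierarchical clustering.
clusters = AgglomerativeClustering(n_clusters=3).fit_predict(baseline)

# Stage 2: predict long-term adherence from baseline features plus the cluster label,
# using the more stable random forest rather than a single decision tree.
X = np.column_stack([baseline, clusters])
rf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(rf, X, adherent, cv=5, scoring="recall").mean())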
Obesity increases the risk of many chronic diseases, especially cardiovascular disease,
and is a cause of death. Faced with the rapid increase in obesity in the population, Vidal
et al. (contribution 8) developed a cross-sectional analytical study of residents of the United
States of America (USA) who have an Instagram account to determine whether using any
meal tracking platform to record food consumption correlated with an improvement in
body mass index (BMI). The survey was conducted on a sample of current students and
graduates from Mary Hardin Baylor University, Oakland University, the University of
Kentucky, and Queens University in Charlotte. Eight hundred and ninety-six subjects with
an Instagram account signed up to participate in an anonymous online survey, of which
78.7% were women, 20.6% were men, and 0.7% were classified as others. As for generations,
11.5% belonged to Generation Z, 75.6% to the Millennials, 11.4% to Generation X, and 1.6%
to the Baby Boomers. Overall, 93.5% of the sample did not smoke, 2.3% smoked, and 4.1%
smoked occasionally. Concerning academic qualifications, 3.7% were high school graduates,
6.1% had some university credits, 0.6% had technical training, 3.2% had an associate degree,
43.2% had a bachelor’s degree, 15.1% had a master’s degree and 28.1% had a doctorate. The
information acquired through the questionnaire included the number of hours per week
dedicated to Instagram or physical activity and the intensity of physical activity performed.
In order to test the influence of using any meal tracking platform to record food intake
on BMI, they were asked if they had used any digital platform in the past month. The
chi-square test was used to study the relationships between the use of any digital platform
in the last month and gender, generation, smoking habits, highest academic degree earned,
and time spent on Instagram. The Mann–Whitney U test was adopted to compare BMI,
weekly hours spent on Instagram looking at nutrition- or physical activity-related content,
vigorous physical activity, moderate physical activity, time spent walking, and time spent
sitting between participants who did and did not use a platform to record their meals. The survey showed that the platform
was used by 34.2% of the sample. Participants who used any meal tracking platform
also had a higher BMI, invested more hours per week on Instagram looking at nutrition-
or physical activity-related content, and performed more minutes per week of vigorous
physical activity. The survey showed that participants rely on new technologies for optimal
weight without obtaining practical results. The authors believe that combining care with
digital app-based tools and support from healthcare professionals can help individuals to
effectively achieve a healthy weight.
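The two tests can be run with SciPy as in the following sketch; the contingency table and BMI samples are synthetic and purely illustrative, not the study's data.

import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

# Hypothetical 2x2 contingency table: platform use (rows) vs. gender (columns).
table = np.array([[105, 28],
                  [600, 150]])
chi2, p_chi, dof, _ = chi2_contingency(table)
print(f"chi-square={chi2:.2f}, p={p_chi:.3f}")

# Hypothetical BMI values for users and non-users of a meal-tracking platform.
bmi_users = np.random.normal(26.5, 4.0, size=300)
bmi_nonusers = np.random.normal(25.0, 4.0, size=580)
stat, p_u = mannwhitneyu(bmi_users, bmi_nonusers)
print(f"Mann-Whitney U={stat:.0f}, p={p_u:.3g}")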
In the ninth paper, Alemayoh et al. (contribution 9) proposed a gait and pose analysis
study based on estimating the angle of the lower limb joint from a single inertial sensor.
Gait analysis is critical in healthcare; it is mainly adopted for precise patient monitoring,
the identification of movement abnormalities, the evaluation of surgical findings, and

the detection of osteoarthritis of the knee and hip, and the diagnosis of Parkinson's disease. Gaits
are interpreted through three types of parameters: spatiotemporal (e.g., stride speed and
length/stride), kinematic (e.g., hip extension/flexion), or kinetic parameters (e.g., ground
reaction moments and forces). The authors used kinematic parameters, the joint angles
of the lower limb, and preferred wearable sensors for data collection. These sensors are
preferred to non-wearable ones, which generally consist of optical motion acquisition
systems with high position accuracy, as they are expensive and require longer installation
times and specific skills. Motion analysis in a real-world environment requires precise
and reliable sensors. The investigations identified the Xsens inertial sensors as the most
suitable for this purpose. The literature offers varied accounts of the number of sensors,
their positioning and estimation methods, and the analysis of movement. The authors
employed various neural network algorithms to determine the number and placement
of sensors for estimating the joint angle of both legs. To calculate the actual values of the
lower limb joint angle, seven individual Awinda sensors were mounted on the lower half
of the body of each of the sixteen subjects, in particular one on the pelvis at the height of the
anterior-superior iliac spine, another on each of the lateral thighs, two more on the upper
parts of the tibiae and finally two more on the upper anterior parts of the feet. The goal
was the estimation of leg kinematics (joint angles) from any of the sensors attached to the
body. The authors used four different neural network models for the estimation: bidirectional
long short-term memory (BLSTM), convolutional neural network, wavelet neural network,
and unidirectional LSTM. Two groups of target angles of the leg joint were examined. The
first set contained only four angles of the leg joint in the sagittal plane, while the second
included six angles of the leg joint in the sagittal plane and two angles of the leg joint in
the coronal plane. By evaluating different combinations of networks and datasets, it was
found that the BLSTM network was the best performer with both datasets, with a mean
absolute error (MAE) of between 3.02° and 4.33° for the four dominant angles of the leg joint
in the sagittal plane. The results improved with an increased number of sensors and the
introduction of biometric information. From the investigation of the placement of the single
sensor, it was found that the shin or thigh is the optimal position for estimating the angle of
the leg joint. Actual leg movement was compared to a computer-generated simulation of
leg joints, which demonstrated the possibility of estimating leg joint angles during walking
with a single inertial sensor.
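A minimal PyTorch sketch of a bidirectional LSTM regressing joint angles from a single-IMU window follows; the six input channels (tri-axial accelerometer plus gyroscope), window length, and hidden size are assumptions, not the authors' configuration.

import torch
import torch.nn as nn

class BLSTMAngleRegressor(nn.Module):
    # Bidirectional LSTM that maps a window of single-IMU readings (acc + gyro, 6 channels)
    # to a set of lower-limb joint angles for the last time step.
    def __init__(self, in_channels=6, hidden=64, n_angles=4):
        super().__init__()
        self.lstm = nn.LSTM(in_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_angles)

    def forward(self, x):                      # x: (batch, time, channels)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])           # predicted joint angles in degrees

model = BLSTMAngleRegressor()
window = torch.randn(8, 100, 6)                # hypothetical 100-sample IMU windows
print(model(window).shape)                     # torch.Size([8, 4])
loss = nn.L1Loss()(model(window), torch.zeros(8, 4))   # MAE, the metric reported in the study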
Bibbò et al. (contribution 10) developed an innovative model to detect subjects’
emotional health using a self-normalizing neural network (SNN) containing an ensemble
layer. In the context of HAR, computer vision technology can be applied to recognize
emotional states through facial expressions using facial positions such as the nose, eyes,
and lips. The recognition of facial emotions is important because, from the analysis of the
face, it is possible to detect the subject’s health status, such as anxiety, depression, stress,
malaise, and neurodegenerative disorders, making facial diagnosis possible. This is a
beneficial technique in caring for older adults; through the information provided, medical
staff can evaluate the type of intervention required to reduce the subjects’ discomfort. Some
facial manifestations can be associated with the first pathological symptoms, preventing
diseases that can degenerate. The innovation produced by the authors is the development
of an AI classifier based on a set of classifier neural networks whose outputs are directed to
an ensemble layer. In particular, the networks are self-normalizing neural networks (SNNs).
The model comprises six SNNs, each trained to identify one of six emotions (anger, disgust, fear,
happiness, sadness, and surprise). The networks cascade, and each is dedicated to detecting
the presence or absence in the input image of a single specific emotion (among the six
present in this study) assigned to and associated with it. Each neural network is trained
with its images for a specific emotion. Each network produces two outputs, among which
the first, identified with EM through a numerical enhancement (from 0 to 1), confirms the
correspondence of the emotion detected with that assigned to the network. The second,
identified with AM, similarly through a numerical enhancement (from 0 to 1), signals the
presence of a different emotion from that assigned to the specific network. These outputs

are then transferred to the ensemble layer, which provides an accurate result by analyzing
the outputs of the individual networks according to statistical logic. The dataset used was
obtained from Kaggle. The authors validated the results through a control network
in the experiments. The results showed a success rate for almost all emotions of around
80%, with a peak of 95% for the emotion “Fear”.
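The ensemble logic can be sketched as follows; SELU activations are what make each sub-network self-normalizing, the EM/AM outputs follow the description above, and the input feature size and the arg-max combination rule are assumptions standing in for the authors' statistical ensemble layer.

import torch
import torch.nn as nn

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

class EmotionSNN(nn.Module):
    # A small self-normalizing network (SELU activations) for one emotion. It outputs two
    # scores in [0, 1]: EM (assigned emotion present) and AM (a different emotion present).
    def __init__(self, in_features=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 256), nn.SELU(),
            nn.Linear(256, 64), nn.SELU(),
            nn.Linear(64, 2), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)                      # (batch, 2) -> [EM, AM]

ensemble = [EmotionSNN() for _ in EMOTIONS]     # one network per emotion

def classify(face_features):
    # Simple ensemble rule (an assumption): pick the emotion whose network reports
    # the highest EM score for the input face representation.
    em_scores = torch.stack([net(face_features)[:, 0] for net in ensemble], dim=1)
    return [EMOTIONS[int(i)] for i in em_scores.argmax(dim=1)]

print(classify(torch.randn(3, 2048)))           # hypothetical pre-extracted face features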
The exciting topic of the metaverse is addressed in the eleventh article of this collection.
One of the areas in which the metaverse is applied is digital games. Virtual reality and ani-
mation allow virtual characters to take on natural roles and generate new immersive ways
to live their lives. Oliveira et al. (contribution 11) aimed their research at understanding
the impact of the concept of the metaverse on ordinary people’s lives. The definition of the
concept of the metaverse was first postulated by Neal Stephenson in his book Snow Crash
in 1992. It was defined as a virtual world capable of reaching, interacting, and influencing
human existence [17]. There is currently no single agreed-upon definition.
The metaverse can be understood as a network of interconnected 3D virtual worlds
rendered in real time that can be experienced synchronously and persistently by an unlim-
ited number of users. This study is part of the research on the metaverse, virtual reality,
and gaming. It was produced in three focus groups with Portuguese adults who are regular
video game players. The focus group method originated in the work of the Bureau of Applied
Social Research at Columbia University in 1940. It is used in research in several disciplines.
It is a qualitative method of collecting data on a particular topic in an informal discussion
between selected people. During the discussion, information is gained about what people
think or feel and how they act. The developed investigation has the following aims:
• To verify how the metaverse is represented and characterized;
• To identify which technologies stimulate the immersion experience;
• To identify the main dimensions that influence the acceptance of the metaverse concept;
• To understand perceptions of metaverse and VR regarding socialization and well-being;
• To test perceptions of a player’s daily life regarding the concepts of the metaverse,
virtual reality, and gaming;
• To understand the impact of social representations on the concept of play;
• To understand animation’s perceived role in relation to the Metaverse, Virtual Reality,
and gaming concepts.
The data collected during the focus groups are the answers provided by the 13 partici-
pants to the twenty-eight questions distributed across the three themes: games, animation,
and metaverse. The results obtained from player responses produced accurate information
on how the metaverse is represented and characterized and relates to virtual reality and
gaming. In conclusion, the metaverse is considered a game that allows immersive experi-
ences through virtual reality technology and the style and esthetics of animation. It is also
seen as a means of socialization and communication, and a promoter of well-being.
In the future, its expansion into the world of social networks as a means of communi-
cation is foreseeable.

3. Conclusions
AI-based automated HAR monitoring systems are exceptional tools that can be inte-
grated into current practices to improve quality of life. The role of AI is essential in HAR
systems because of its ability to extract hidden information and the level of accuracy shown
in its classification activities. However, using these innovative technologies raises several
issues related to divergent considerations among stakeholders concerning security, privacy,
and health implications due to the use of these technologies. The role assigned to AI, and
where responsibility for its decisions lies, needs to be made sufficiently clear in the design
phase. It should be made explicit whether the ML model is assistive or autonomous. Assis-
tive models provide healthcare professionals with treatment, diagnosis, and management
suggestions, leaving them responsible for making decisions. Autonomous models provide
direct diagnoses without any interpretation or supervision from the doctor. Since the de-
veloper’s choice regarding the level of autonomy has clear implications for accountability,

it should be the subject of dialogue and discussion between stakeholders. Implementing


machine learning systems requires considering both clinical and ethical aspects to produce
benefits in health care, facilitate independent living, and reduce healthcare spending. One
of the biggest challenges we will see in the future is the development of increasingly high-
performance artificial intelligence models in new application domains that comply with
moral and ethical requirements [18].

Author Contributions: L.B.: Writing—original draft, Writing—review and editing. M.M.B.R.V.:


Review and editing. All authors have read and agreed to the published version of the manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
List of Contributions
1. Ojiako, K.; Farrahi, K. MLPs Are All You Need for Human Activity Recognition. Appl. Sci. 2023,
13, 11154. https://doi.org/10.3390/app132011154.
2. Arribas Velasco, A.; McGrory, J.; Berry, D. An Evaluation Study on the Analysis of People’s
Domestic Routines Based on Spatial, Temporal and Sequential Aspects. Appl. Sci. 2023, 13,
10608. https://doi.org/10.3390/app131910608.
3. Huang, Q.; Xie, W.; Li, C.; Wang, Y.; Liu, Y. Human Action Recognition Based on Hierarchical
Multi-Scale Adaptive Conv-Long Short-Term Memory Network. Appl. Sci. 2023, 13, 10560.
https://doi.org/10.3390/app131910560.
4. Mekruksavanich, S.; Phaphan, W.; Hnoohom, N.; Jitpattanakul, A. Attention-Based Hybrid
Deep Learning Network for Human Activity Recognition Using WiFi Channel State Information.
Appl. Sci. 2023, 13, 8884. https://doi.org/10.3390/app13158884.
5. Kim, C.; Lee, W. Human Activity Recognition by the Image Type Encoding Method of 3-Axial
Sensor Data. Appl. Sci. 2023, 13, 4961. https://doi.org/10.3390/app13084961.
6. Caramaschi, S.; Papini, G.; Caiani, E. Device Orientation Independent Human Activity Recog-
nition Model for Patient Monitoring Based on Triaxial Acceleration. Appl. Sci. 2023, 13, 4175.
https://doi.org/10.3390/app13074175.
7. Filos, D.; Claes, J.; Cornelissen, V.; Kouidi, E.; Chouvarda, I. Predicting Adherence to Home-Based
Cardiac Rehabilitation with Data-Driven Methods. Appl. Sci. 2023, 13, 6120. https://doi.org/10.3390/
app13106120.
8. Tricás-Vidal, H.; Vidal-Peracho, M.; Lucha-López, M.; Hidalgo-García, C.; Monti-Ballano, S.;
Márquez-Gonzalvo, S.; Tricás-Moreno, J. Association between Body Mass Index and the Use of
Digital Platforms to Record Food Intake: Cross-Sectional Analysis. Appl. Sci. 2022, 12, 12144.
https://doi.org/10.3390/app122312144.
9. Alemayoh, T.; Lee, J.; Okamoto, S. Leg-Joint Angle Estimation from a Single Inertial Sensor
Attached to Various Lower-Body Links during Walking Motion. Appl. Sci. 2023, 13, 4794.
https://doi.org/10.3390/app13084794.
10. Bibbo’, L.; Cotroneo, F.; Vellasco, M. Emotional Health Detection in HAR: New Approach Using
Ensemble SNN. Appl. Sci. 2023, 13, 3259. https://doi.org/10.3390/app13053259.
11. Oliveira, A.; Cruz, M. Virtually Connected in a Multiverse of Madness?—Perceptions of Gaming,
Animation, and Metaverse. Appl. Sci. 2023, 13, 8573. https://doi.org/10.3390/app13158573.

References
1. Un’indagine Sugli Anziani non Autosufficienti: Le Scelte delle Famiglie tra Assistenza Domiciliare e RSA. I Luoghi della Cura
Rivista Online Network Non Autosufficienza (NNA). 2022. Available online: https://www.luoghicura.it/dati-e-tendenze/2022/11
(accessed on 8 July 2021).
2. Bibbo, L.; Carotenuto, R.; Corte, F.D.; Merenda, M.; Messina, G. Home care system for the elderly and pathological conditions. In
Proceedings of the 7th International Conference on Smart and Sustainable Technologies (SpliTech), Split/Bol, Croatia, 19 August
2022; pp. 1–7. [CrossRef]
3. Gams, M.; Gu, I.Y.-H.; Härmä, A.; Muñoz, A.; Tam, V. Artificial intelligence and ambient intelligence. J. Ambient. Intell. Smart
Environ. 2019, 11, 71–86. [CrossRef]
4. Cicirelli, G.; Marani, R.; Petitti, A.; Milella, A.; D’Orazio, T. Ambient Assisted Living: A Review of Technologies, Methodologies
and Future Perspectives for Healthy Aging of Population. Sensors 2021, 21, 3549. [CrossRef] [PubMed]
5. San-Segundo, R.; Blunck, H.; Moreno-Pimentel, J.; Stisen, A.; Gil-Martín, M. Robust Human Activity Recognition using
smartwatches and smartphones. Eng. Appl. Artif. Intell. 2018, 7, 190–202. [CrossRef]
6. Şahin, E.; Yavuz Veizi, B.G.; Naharci, M.I. Telemedicine interventions for older adults: A systematic review. J. Telemed. Telecare 2021.
[CrossRef] [PubMed]

7. Bradwell, H.L.; Aguiar Noury, G.E.; Edwards, K.J.; Winnington, R.; Thill, S.; Jones, R.B. Design recommendations for socially assis-
tive robots for health and social care based on a large-scale analysis of stakeholder positions: Social robot design recommendations.
Health Policy Technol. 2021, 10, 100544. [CrossRef]
8. Uddin, M.Z.; Soylu, A. Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-
based neural structured learning. Sci. Rep. 2021, 11, 16455. [CrossRef]
9. Straczkiewicz, M.; James, P.; Onnela, J.P. A systematic review of smartphone-based human activity recognition methods for health
research. NPJ Digit. Med. 2021, 4, 148. [CrossRef]
10. Sharma, V.; Gupta, M.; Kumar Pandey, A.; Mishra, D.; Kumar, A. A Review of Deep Learning-based Human Activity Recognition
on Benchmark Video Datasets. Appl. Artif. Intell. 2022, 36, 1. [CrossRef]
11. Demrozi, F.; Pravadelli, G.; Bihorac, A.; Rashidi, P. Human Activity Recognition using Inertial, Physiological and Environmental
Sensors: A Comprehensive Survey. IEEE Access 2020, 8, 210816–210836. [CrossRef] [PubMed]
12. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.;
et al. MLP-Mixer: An all-MLP architecture for vision. In Advances in Neural Information Processing Systems, Proceedings of the 35th
Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021; Ranzato, M., Beygelzimer, A.,
Dauphin, Y., Liang, P., Wortman Vaughan, J., Eds.; Neural Information Processing Systems Foundation, Inc. (NeurIPS): Vancouver,
BC, Canada, 2021; Volume 34, pp. 24261–24272. Available online: https://proceedings.neurips.cc/paper_files/paper/2021/file/
cba0a4ee5ccd02fda0fe3f9a3e7b89fe-Paper.pdf (accessed on 8 July 2021).
13. Brachman, R.J.; Anand, T. The Process of Knowledge Discovery in Databases: A First Sketch. In Proceedings of the 1994 {AAAI}
Workshop, Seattle, WA, USA, 31 July–4 August 1994. Technical Report {WS-94-03}.
14. Weiss, G.M. UCI Machine Learning Repository: WISDM Smartphone and Smartwatch Activity and Biometrics Dataset. 2019.
Available online: https://archive.ics.uci.edu/ml/machine-learning-databases/00507/WISDM-dataset-description.pdf (accessed
on 8 July 2021).
15. Ohashi, H.; Al-Nasser, M.; Ahmed, S.; Akiyama, T.; Sato, T.; Nguyen, P.; Nakamura, K.; Dengel, A. Augmenting wearable sensor
data with physical constraint for DNN-based human-action recognition. In Proceedings of the ICML 2017 Times Series Workshop,
Sydney, NSW, Australia, 6–11 August 2017; pp. 6–11.
16. Fridriksdottir, E.; Bonomi, A.G. Accelerometer-based human activity recognition for patient monitoring using a deep neural
network. Sensors 2020, 20, 6424. [CrossRef] [PubMed]
17. Ball, M. The Metaverse: And How it Will Revolutionize Everything; W.W. Norton & CO: New York, NY, USA, 2022.
18. Siau, K.; Wang, W. Artificial Intelligence (AI) Ethics: Ethics of AI and Ethical AI. J. Database Manag. 2020, 31, 74–87. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

applied sciences
Article
MLPs Are All You Need for Human Activity Recognition
Kamsiriochukwu Ojiako * and Katayoun Farrahi

School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK;
[email protected]
* Correspondence: [email protected]

Abstract: Convolution, recurrent, and attention-based deep learning techniques have produced
the most recent state-of-the-art results in multiple sensor-based human activity recognition (HAR)
datasets. However, these techniques have high computing costs, restricting their use in low-powered
devices. Different methods have been employed to increase the efficiency of these techniques;
however, this often results in worse performance. Recently, pure multi-layer perceptron (MLP) archi-
tectures have demonstrated competitive performance in vision-based tasks with lower computation
costs than other deep-learning techniques. The MLP-Mixer is a pioneering pure-MLP architecture that
produces competitive results with state-of-the-art models in computer vision tasks. This paper shows
the viability of the MLP-Mixer in sensor-based HAR. Furthermore, experiments are performed to
gain insight into the Mixer modules essential for HAR, and a visual analysis of the Mixer’s weights is
provided, validating the Mixer’s learning capabilities. As a result, the Mixer achieves F1 scores of
97%, 84.2%, 91.2%, and 90% on the PAMAP2, Daphnet Gait, Opportunity Gestures, and Opportu-
nity Locomotion datasets, respectively, outperforming state-of-the-art models in all datasets except
Opportunity Gestures.

Keywords: human activity recognition; MLP-Mixer; efficiency

1. Introduction
The last two decades have witnessed the rapid growth of wearable devices, which
are increasingly being used for ubiquitous health monitoring. Human activity recognition
(HAR) aims at detecting simple behaviours, such as walking or gestures; more complex
behaviours, like cooking or opening a door, with various use-cases that continue to grow
as the field expands; and assistive technology, such as identifying odd behaviours in the
elderly, including falls [1], skill assessment [2], helping with rehabilitation [3], sports injury
detection, and ambient assisted living [4–6]. Accurately predicting human activities from
sensor data is difficult due to the complexity of human behaviour and the noise in the
sensor data [7].
With larger datasets and more computational power, deep learning has evolved, removing
the need for manually created features and inductive biases from models and increasing
the reliance on automatically learning features from raw labelled data [8]. Complex deep
learning techniques, such as convolutions and attention-based mechanisms, are used
increasingly with growing computational capacity. These techniques perform well with
larger models, resulting in processes that are generally more expensive computationally
and memory-wise than previous techniques. Although wearable devices and smartphones
have rapidly increased in computation efficiency over the past two decades, they are still
limited in power and storage; this prevents them from using state-of-the-art deep learning
techniques in HAR.
MLP-Mixers, recently created by Google Brain [8], are simplistic and less computationally
expensive models, yet they produce near state-of-the-art results in computer vision
tasks. Wearable devices could produce competitive results in HAR without the significant

computational demands that current state-of-the-art models impose if MLP-Mixers per-


formed similarly in HAR, which would help advance HAR toward low-powered devices.
The main contributions of this paper are as follows:
• We investigate the performance of the MLP-Mixer in multi-sensor HAR, achieving
competitive, and in some cases, state-of-the-art performance in HAR without con-
volutional, recurrent, or attention-based mechanisms in the model. The accompanying
code can be found here https://github.com/KMC07/MLPMixerHAR (accessed on 6
October 2023).
• We analyse the impact of each layer in the Mixer for HAR.
• We analyse the effect of the sliding windows on the Mixer’s performance in HAR.
• We perform a visual analysis of the Mixer’s weights to validate that the Mixer is
successfully recognising different human activities.

2. Related Work
Four main categories of deep-learning architectures have been used in HAR: convolution-
based architectures, recurrent networks, hybrid models, and attention-based models [9].
Evaluation is performed on benchmark HAR datasets, including Opportunity [10], Daphnet
Gait [11], PAMAP2 [12], Skoda Checkpoint [13], WISDM [14], MHEALTH [15], and UCI-
HAR [16].
With the recent success of CNNs in feature detection, Zeng et al. [17] first proposed
using CNNs in HAR, but they only used a basic CNN on a single accelerometer. Next,
Hammerla et al. [18] thoroughly investigated CNN use in HAR and established its viability.
However, good performance requires large CNN models; this increases the computational
cost, constraining their use on low-power devices. To solve this, Tang et al. [19] looked into
the performance and viability of an efficient CNN that uses a tiny Lego filter inspired by
Yang et al. [20]. The paper investigated a resource-constrained CNN model for HAR on
mobile and wearable devices, achieving F1 scores of 91.40% and 86.10% on the PAMAP2
and Opportunity datasets, respectively. However, this work had the drawback of having
slightly worse performance when compared to conventional CNNs when using small Lego
filters instead of traditional filters.
Recurrent networks are good at capturing long-term dependencies, and because of their
architecture, they can pick up temporal features in sequenced data. Hammerla et al. [18]
took advantage of these benefits and proposed three LSTM models: two uni-directional
LSTMs and a bi-directional LSTM model, which trains on both historical and upcoming
data. The models were trained and evaluated on the PAMAP2, Opportunity, and Daphnet
Gait datasets. This work described how to train similar recurrent networks in HAR
and introduced a brand-new regularisation method. The bi-LSTM model outperformed
state-of-the-art models in the Opportunity Gestures dataset, achieving an F1 score of
92.7%. Murad et al. [21] showcased the performance of uni-directional, bi-directional,
and cascaded LSTM models. The bi-directional LSTM performed best on the Opportunity
dataset, with an accuracy of 92.5%. The cascaded LSTM performed the best on Daphnet,
with an accuracy of 94.1%. However, the work did not evaluate the models on extensive
and complex human activities; additionally, resource efficiency was not considered when
designing the model.
CNNs effectively extract spatial features from a local area; however, these models do
not have “memory”, making it hard to learn long-term dependencies between different
samples. RNNs, on the other hand, due to their specific structure, have memory allowing
them to learn long-term dependencies; however, they are challenging to train. Researchers
have created hybrid deep learning models to address the shortcomings of both CNN and
RNN neural networks.
Recently, attention mechanisms have been applied in models to improve performance
in HAR. Attention mechanisms allow the model to learn what to focus on in the dataset and
understand the relationship between each input element. Ma et al. [22] combined attention
mechanisms with a CNN-GRU. This architecture provides the benefits of CNNs, GRUs,
and attention, enabling spatial and temporal understanding of the dataset. The model
had good performance on all the datasets explored. However, the model is unsuitable for
low-powered devices due to the computational complexity of combining all these models.
Gao et al. [23] combined temporal and sensor attention in residual networks using a novel
dual attention technique to enhance the capacity for feature learning in HAR datasets.
The temporal attention focuses on the target activity sequence and chooses where in the
sequence to concentrate, whereas the sensor attention is vital in selecting which sensor to
focus on, obtaining accuracy scores of 82.75% and 93.16% on Opportunity and PAMAP2,
respectively. Although this model performed well, it was constrained by the shortage
of labelled multimodal training samples. Additionally, this work did not consider this
model’s computation and memory requirements, which decreases its potential for use in
low-powered devices.
MLP Architectures
In a different area of study, with the arrival of the MLP-Mixer, pure deep MLP archi-
tectures have started appearing in computer vision tasks. The MLP variants have similar
structures to the MLP-Mixer, usually with only the internal layers being modified to im-
prove the model. These MLPs work by using a “token-mixing” and/or “channel-mixing”
layer to capture relevant information from the input, followed by stacking these layers N
times. The MLP-Mixer achieved competitive results in computer vision tasks; however,
CNNs and Transformer-based models such as Vision Transformers (ViT) [24] outperform
the Mixer. To overcome this, Liu et al. [25] proposed a new MLP model called gMLP that
introduces a spatial gating unit into MLP layers to enable cross-token interactions. The
gMLP performs spatial and channel projections similar to the MLP-Mixer; however, there
is no channel-mixing layer. The gMLP has 66% fewer parameters than the MLP-Mixer yet
has a 3% performance improvement.
Another method involves using only channel projections. Removing the token-mixing
layer prevents MLPs from gaining context from the input and stops the tokens from
interacting with one another. Instead, to regain context, the feature maps are spatially
interacted with using channel projections after being shifted to align them between the
various channels [24]. Yu et al. [26] proposed the S2-MLP. This model uses spatial shift
operations to communicate between patches. This method is computationally efficient
with low complexity. This model achieves high performance even with its simplicity,
outperforming the MLP-Mixer and remaining competitive with ViT. Finally, Wei et al. [27]
proposed ActiveMLP. This is a token-mixing mechanism that enables the model to learn
how to combine the current token with useful contextual information from other tokens
within the global context of the input. This mechanism allows the model to learn diverse
patterns successfully in vision-based tasks, achieving an accuracy of 82% in ImageNet-1K.
The token-mixing layer uses static operations. This prevents the token-mixing layer from adapting
to the varying content contained in the different tokens. Methods have been proposed
to add adaptability, allowing the varying information in the tokens to be mixed [24].
Tang et al. [28] try to overcome the static token-mixing layer by viewing each token as an
amplitude and phase-varying wave. The phase is a complex number that controls the
influence of how tokens and fixed weights are related in the MLP, whereas the amplitude is
a real number that represents each token’s content. The combined output of these tokens is
affected by the phase difference between them, and tokens with similar phases tend to com-
plement one another. WaveMLP limits the fully connected layers to only tokens connected
within a local window to address the issue of input resolution sensitivity; however, this
prevents the MLP from taking global context across the entire input. WaveMLP is among
the best MLP architectures, achieving 82.6% top 1-accuracy in ImageNet-1K. It achieves
competitive results with CNNs and Transformers but is still outperformed by them. To
improve on this, Wang et al. [29] proposed the DynaMixer; by considering the contents
of each set of tokens to be mixed, DynaMixer can dynamically generate mixing matrices.
The DynaMixer mixes the tokens row-wise and column-wise to improve the computation
speed. In each iteration of the DynaMixer, feature dimensionality reduction is performed to produce the
mixing matrices; additionally, substantially reducing the number of dimensions has little
impact on the performance. These feature spaces are separated into various segments for
token-mixing. The DynaMixer currently produces state-of-the-art performance among
MLP vision architectures, achieving 82.7% top-1 in Imagenet-1k.
3. Methodology
3.1. MLP-Mixer
The MLP-Mixer (Mixer) does not use convolutions or self-attention mechanisms and
is instead made up entirely of MLPs. Even with a simpler architecture than CNNs and
transformers, the Mixer produces competitive results in computer vision tasks against
state-of-the-art models. The Mixer only uses basic matrix multiplication, changes to data
layout, and scalar non-linearities, resulting in a simpler and faster model. The Mixer has
a similar architecture to the ViT; however, the Mixer’s structure has benefits in terms of
speed by allowing linear computation scaling when increasing the number of input patches
instead of quadratic scaling in the case of the ViT.
Figure 1 illustrates the MLP-Mixer architecture. The input is divided into unique
patches that do not overlap. The patches are linearly projected into an embedding space.
In contrast to the transformer and ViT, the input does not need positional embeddings
as the Mixer is sensitive to the position of the inputs in the token-mixing MLPs [8]. The
Mixer consists of two types of MLP layers: the token-mixing layer and the channel-mixing
layer. The inspiration behind this is that modern vision neural architectures, according
to [8], (1) mix their features at a given spatial location across channels and (2) mix their
features between different spatial locations. CNNs implement (1) with a convolution
layer through the 1 × 1 convolution operation; and (2) using large kernels and by adding
multiple convolution layers with pooling, which decreases the input spatially. In attention-
based models, both (1) and (2) are performed within each self-attention layer. The Mixer’s
purpose is to separate per-location operations (1) and cross-location operations (2). These
features are achieved through two layers, called “token-mixing” and “channel-mixing”,
representing the per-location and the cross-location operations, respectively.
Figure 1. Annotated MLP-Mixer architecture with token-mixing annotated on the left and channel-
mixing annotated on the right. Image from [8].
Each unique patch has identical dimensions. The number of patches is calculated by
dividing the input dimensions (H, W) by the patch resolution (P, P): S = HW/P². The
sequence of non-overlapping patches is projected into an embedding space with dimension
C, resulting in a matrix of dimensions S × C. The layers in the Mixer are all the same size
and are made up of two MLP blocks each.
• The first block is the token-mixing MLP; the input matrix is normalised and transposed
to allow the data to mix across each patch. The MLP (MLP1) acts on each column
of the input matrix, sharing its weights across the columns. The matrix is transposed
back into its original form. The overall context of the input is obtained by feeding
each patch’s data into the MLP. This token-mixing block essentially allows different
patches in the same channel to communicate.
• The second block is the channel-mixing MLP; this receives residual connections from
its pre-normalised original input to prevent information from being lost during the
training process. The result is normalised, and a different MLP (MLP2) performs
the channel-mixing with a separate set of weights. The MLP acts on each input
matrix row, and its weights are shared across the rows. A single patch’s MLP receives
data from every channel, enabling communication between the information from
various channels.
Each MLP block contains two feed-forward layers with a GELU [30] activation function
applied to each row of the input data. The Mixer layers are calculated in Equation (1) (the
layer index is not included), and the GELU function is demonstrated in Equation (2).
$$U_{*,i} = X_{*,i} + W_2\,\sigma\!\left(W_1\,\mathrm{LayerNorm}(X)_{*,i}\right), \quad \text{for } i = 1 \ldots C,$$
$$Y_{j,*} = U_{j,*} + W_4\,\sigma\!\left(W_3\,\mathrm{LayerNorm}(U)_{j,*}\right), \quad \text{for } j = 1 \ldots S. \qquad (1)$$

$$\mathrm{GELU}(x) = x\,P(X \le x) = x\,\Phi(x) \qquad (2)$$
It is intuitive to share the weights in each layer of the channel-mixing MLPs as this
offers positional invariance, a key characteristic of convolution layers in CNNs. However,
it is less intuitive to share the weights across channels in the token-mixing MLPs. For
instance, some CNNs use separable convolutions [31], which apply convolutions to each
channel independently of the others. However, these convolutions apply different filters
to each channel, in contrast to the token-mixing MLPs, which use the same filter for all
channels. Additionally, sharing weights in the token-mixing and channel-mixing layers
prevents the Mixer from growing in size quickly when the number of patches, S, or the
dimensions of the embedding space, C, increases, leading to substantial memory savings.
Furthermore, the empirical performance of this model is unaffected by this characteristic.
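As a concrete illustration of the token-mixing and channel-mixing blocks described above, the sketch below re-implements a single Mixer layer in PyTorch following Equation (1). It is a minimal rendering written for illustration rather than the authors' released code (linked in the contributions list); the example sizes are the Opportunity settings reported later in Table 2.

```python
import torch
import torch.nn as nn

class MlpBlock(nn.Module):
    """Two fully connected layers with a GELU non-linearity (Equation (2))."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x):
        return self.net(x)

class MixerLayer(nn.Module):
    """One Mixer layer: token-mixing across patches, then channel-mixing across features."""
    def __init__(self, num_patches, embed_dim, token_dim, channel_dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.token_mlp = MlpBlock(num_patches, token_dim)    # shared across channels
        self.norm2 = nn.LayerNorm(embed_dim)
        self.channel_mlp = MlpBlock(embed_dim, channel_dim)  # shared across patches

    def forward(self, x):                    # x: (batch, num_patches, embed_dim)
        y = self.norm1(x).transpose(1, 2)    # (batch, embed_dim, num_patches)
        x = x + self.token_mlp(y).transpose(1, 2)   # token-mixing + residual connection
        x = x + self.channel_mlp(self.norm2(x))     # channel-mixing + residual connection
        return x

# Example with the Opportunity configuration from Table 2: 49 patches, embedding size 512.
layer = MixerLayer(num_patches=49, embed_dim=512, token_dim=256, channel_dim=2048)
out = layer(torch.randn(8, 49, 512))   # -> torch.Size([8, 49, 512])
```

In the full model, N such layers are stacked and followed by global average pooling and a classification head, as in the original MLP-Mixer [8].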
4. Datasets
To evaluate the performance of the MLP-Mixer in classifying a variety of activities,
three datasets are used for benchmarking.
4.1. Opportunity
The opportunity dataset [10] contains complex labelled data collected from multiple
body sensors. It consists of data from four subjects recorded in a daily living scenario
designed to create multiple activities in a realistic manner. Each subject had six sets of data.
The opportunity dataset consists of all three types of human activities: recurrent, static,
and spontaneous. The subjects wore a body jacket that contained five inertial measurement
units (IMU), made up of a 3D accelerometer, a gyroscope, and a magnetic sensor; two
inertial sensors for both feet; and 12 wireless accelerometer sensors, which suffered from
data loss due to their Bluetooth connection. In this dataset, only sensor data without
packet loss was used. This included data from the inertial sensors on both feet and the
accelerometer sensors on the back and upper limbs, resulting in each sample containing
77 dimensions of sensor data when combining all the sensor data together. The sensors
recorded the data at a sampling rate of 30 Hz. The sets on which the Mixer is trained, validated, and tested
are similar to those in the previous literature [18,32–34] for consistency and fair comparison.
The Mixer was tested on ADL4 and ADL5 from subjects 2 and 3, ADL2 from subject 1 was
used as the validation set, and the rest of the ADLs and all the drill sessions were used for
training the Mixer. The Opportunity dataset has multiple benchmark HAR tasks, including:
• Opportunity Gestures: This involves successfully classifying different gestures being
performed by the subjects from both arm sensors. There are 18 different gesture
classes.
• Opportunity Locomotion: This involves accurately classifying the locomotion of the
subjects using full body sensors. There are five different locomotion classes.
4.2. PAMAP2
The PAMAP2 dataset [12] contains complex labelled data collected from chest, hand,
and ankle sensors. This consisted of data recorded from nine subjects. Each subject followed
a routine of 12 different actions and optionally performed an additional 6 activities,
resulting in 18 recorded activities each, 19 if you include the null class.
The PAMAP2, similar to the Opportunity dataset, contains all three types of human
activities. The nine subjects wore IMUs on their hands, ankles, and chest. The IMU
recorded multimodal data, which consisted of an accelerometer, gyroscope, heart rate,
temperature, and magnetic data. In total, the data contains 40 sensor recordings and
12 IMU orientation data points, resulting in each sample containing 52 dimensions of
sensor data when combined. Each sensor sampled the data at a sampling rate of 100 Hz,
and the dataset was downsampled to approximately 33.3 Hz to have a similar sampling rate
to the opportunity dataset. There were missing data present in the dataset from the packet
loss of the wireless sensors. To account for this, only the heart rate sensor was interpolated;
afterwards, samples with missing values were excluded from the dataset. The parts of the
dataset that are trained, tested, and validated are identical to the previous literature [34,35].
The Mixer was tested on subject 6 and validated on subject 5, and the rest were used for
training; however, subject 9 was dropped due to significantly less sensor data compared
to the rest of the subjects. Additionally, the orientation data points were not used as they
were unimportant for this problem, leaving the dataset with a dimension of 40 features. To
make the experiments performed on PAMAP2 comparable with the previous literature, the
optional activities and the null activities are excluded while training the Mixer, resulting in
a total of 12 classes to be classified.
4.3. Daphnet Gait
The Daphnet Gait dataset [11] contains labelled data collected from accelerometer
sensors. It consists of data collected from 10 subjects who are affected with Parkinson’s
disease (PD). The subjects are instructed to carry out three types of tasks, walking in a
straight line; walking while turning; and realistic ADL scenarios, which involve tasks
such as getting coffee. These tasks were designed to frequently induce gait freezing in the
subjects. Freezing is a common symptom of PD, which causes difficulty starting movements,
such as taking steps, for a short period of time [18]. The goal of the dataset is to detect
whether the subjects are freezing or doing the specified actions (walk, turn). This is a binary
classification problem since the specified actions are combined into one class, “No Freeze”,
and the “Null” class is excluded from the experiment.
Accelerometers were used to capture information about the subjects. They were placed
on the chest, above the ankle, and above the knee, resulting in each sample containing nine
dimensions of sensor data when combined. Each sensor sampled the data at a sampling
rate of 64 Hz, and the dataset was downsampled to 32 Hz for temporal comparison with
the other datasets. A fair comparison was maintained by splitting the dataset into training,
validation, and testing sets identical to the early literature [18]. The Mixer was tested on
data from subject 2, validated on subject 9, and trained using the rest of the information.
4.4. Sliding Windows
For the datasets to be trained and tested by the Mixer, a sliding window approach is
used on the dataset. This splits the dataset into multiple sequences with the dimensions
(D_f × S_L), where D_f is the number of features in the dataset and S_L is the sliding window
length. These 2D sequences, in the case of the Mixer, are treated as images. The length of
the sliding window maintains a fixed length throughout each separate training process
but varies across the different datasets and experiments. As mentioned in Section 3.1,
the Mixer takes an input image with dimensions (H, W) that is split into patches with
identical dimensions (P, P). This requires the patch resolution, P, to be fully divisible by
both dimensions of the input. This limits the length of the sliding window to either be
divisible by the number of features in the dataset or divisible by the patch resolution.
The Mixer outputs a prediction of the activity for every sliding window interval after
observing it; however, there would be multiple predictions in the sliding window instead
of a single ground truth prediction. There are multiple methods around this [35], which
involve using the prediction at the end of the sliding window, max-pooling all of the
sequence predictions over time, or returning the most frequent predictions. The Mixer
benefits from mixing its features at a given spatial location across channels and between
different spatial locations. In addition, the token-mixing MLP provides a global context of
the input to the model. Therefore, using the most frequent predictions as the ground truth
prediction is preferred to other methods since the Mixer learns context from the whole
input. The details of the sliding window for each dataset are briefly described below, and
the summary of their parameters is tabulated in Table 1.
• Opportunity: The dataset was fit into a sliding window with an interval of 2.57 s. This
duration represents 77 samples, which makes the input dimensions identical, allowing
the patch resolution to be a factor of 77. The dataset was normalised to account for the
wide range of sensors used in the dataset. After preprocessing the data, there were no
labels of “close drawer 2” activity in the test set (ADL4 and ADL5 from subjects 2 and 3).
• PAMAP2: Before downsampling, the dataset was fitted into a sliding window interval
of 0.84 s, which corresponds to 84 samples. The “rope-jumping” activity in subject 6
had a very small number of samples. After preprocessing, there were no labels of this
activity present in the test set (subject 6).
• Daphnet Gait: Before downsampling, a sliding window interval of 2.1 s was used to
fit the dataset; this interval corresponds to 126 samples. Daphnet Gait contains a lot of
longer activities, so a wider sliding window interval was chosen to provide the Mixer
with more information.
Table 1. The parameters used for each dataset. Note, the parameters are chosen in order to make them comparable to prior literature for a fair comparison.

Parameters                  Opportunity   PAMAP2    Daphnet Gait
Number of Activities        18            19        2
Number of Features          77            40        9
Sliding Window Length       77            84        126
Sampling Rate               30 Hz         100 Hz    64 Hz
Downsampling                1             3         2
Step Size                   3             3         3
Normalisation               True          False     False
Interpolation               False         True      False
Includes Null activities    True          False     False

Large sliding windows were used to give the Mixer access to more information and
enable the sequence to be divided into patches correctly and in an error-free manner.
Smaller step sizes were used because the Mixer tends to overfit, giving it more training
points and ensuring that there were enough data points for adequate testing on the various
activities in each dataset.
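As an illustration of this segmentation step, the sketch below builds fixed-length windows with a constant step size and assigns each window its most frequent label, the ground-truth strategy preferred above. It is a hypothetical helper written for illustration (function and variable names are our own), not the authors' preprocessing code.

```python
import numpy as np

def sliding_windows(data, labels, window_len, step):
    """Segment a (num_samples, num_features) array into overlapping windows.

    Each window is stored as (num_features, window_len) and labelled with the
    most frequent annotation inside it.
    """
    windows, window_labels = [], []
    for start in range(0, len(data) - window_len + 1, step):
        end = start + window_len
        windows.append(data[start:end].T)                 # (D_f, S_L)
        values, counts = np.unique(labels[start:end], return_counts=True)
        window_labels.append(values[np.argmax(counts)])   # most frequent label
    return np.stack(windows), np.array(window_labels)

# Random stand-in data shaped like Opportunity: 77 features, window length 77, step size 3.
data = np.random.randn(10000, 77)
labels = np.random.randint(0, 18, size=10000)
X, y = sliding_windows(data, labels, window_len=77, step=3)
print(X.shape, y.shape)   # (num_windows, 77, 77) (num_windows,)
```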
4.5. Data Sampler and Generation
A class balance sampler was applied to the training dataset to give similar probability
to the classes during training, allowing the Mixer to learn from each class equally in the
imbalanced datasets. The different samples are stored based on their labelled class. During
each batch, the sampler accesses the training samples based on their weights. The samples
are weighted based on the proportion of their class in the training dataset.
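One possible realisation of this class-balanced sampler with PyTorch's WeightedRandomSampler is sketched below. The description above only states that samples are weighted by their class proportion, so weighting by inverse class frequency is assumed here, and the tensors are random stand-ins for the windowed training data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Stand-in windowed training data and integer class labels.
X = torch.randn(1000, 1, 77, 77)
y = torch.randint(0, 18, (1000,))

# Weight each sample by the inverse frequency of its class, so minority classes
# are drawn roughly as often as majority classes during training.
class_counts = torch.bincount(y)
sample_weights = 1.0 / class_counts[y].float()

sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(y),
                                replacement=True)
loader = DataLoader(TensorDataset(X, y), batch_size=64, sampler=sampler)
```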
4.6. Patches
The MLP-Mixer requires a sequence of input patches. This layer converts the input
sensor data into separate patches. The patch resolution has to be fully divisible by both
the input height and width dimensions. The patch resolution differed between datasets,
and the resolution for each dataset is tabulated in Table 2. This was implemented using a
strided Conv2D layer in Pytorch. A strided Conv2D layer produces the same results as the
per-patch fully-connected layer used in [8]. This layer reshapes the input from number of
samples, number of channels, input height, and input width to number of samples, number
of patches, and patch-embedding dimensionality.
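The per-patch embedding can be sketched as follows, mirroring the strided Conv2D formulation just described. The Opportunity settings from Table 2 are used as example values; a single input channel is assumed for simplicity (the RGB-embedding variant discussed in Section 5.1 would use three input channels), and the axis ordering of the window is our assumption.

```python
import torch
import torch.nn as nn

patch, embed_dim = 11, 512             # Opportunity settings from Table 2

# A Conv2d with kernel size == stride == patch resolution is equivalent to splitting
# the input into non-overlapping patches and applying a shared linear projection.
to_patches = nn.Conv2d(in_channels=1, out_channels=embed_dim,
                       kernel_size=patch, stride=patch)

x = torch.randn(8, 1, 77, 77)          # (batch, channels, window length, features)
tokens = to_patches(x)                 # (8, 512, 7, 7)
tokens = tokens.flatten(2).transpose(1, 2)   # (8, 49, 512): (batch, patches, embedding)
print(tokens.shape)
```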
Table 2. Specification of the Mixer architecture for each dataset.

Specifications              Opportunity   PAMAP2   Daphnet Gait
Number of Layers            10            10       10
Patch Resolution            11            4        9
Input Sequence Length       49            210      14
Patch-Embedding Size        512           512      512
Token Dimension             256           256      256
Channel Dimension           2048          2048     512
Learnable Parameters (M)    21            21       5
5. Experimental Setup
The Mixer was trained using the Adam optimiser with the cross-entropy loss as the
criterion and hyperparameters β1 = 0.9, β2 = 0.999. The Mixer has a tendency to overfit, so
a weight decay of 1 × 10⁻³ was used. The gradient clipping at the global norm was set to 1,
and the batch size for the training and testing dataset was 64. A learning rate scheduler
was used, and the learning rate was set to 0.01. For the first 500 steps, the learning rate
scheduler used a linear warm-up rate. Then, until the training was finished, it used a
cosine decay.
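The training configuration above can be sketched as follows. The warm-up plus cosine schedule is expressed with a LambdaLR purely for illustration, the total number of training steps is an assumed placeholder, and the model is a stand-in; the authors' exact scheduler implementation may differ.

```python
import math
import torch

model = torch.nn.Linear(10, 2)         # placeholder for the Mixer model
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01,
                             betas=(0.9, 0.999), weight_decay=1e-3)

warmup_steps, total_steps = 500, 10000  # total_steps is an assumed value

def lr_lambda(step):
    # Linear warm-up for the first 500 steps, then cosine decay until training ends.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Inside the training loop, gradients are clipped at a global norm of 1:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# optimizer.step(); scheduler.step()
```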
The specifications of the Mixer architecture used to produce the main results in
Section 6 are tabulated in Table 2. The experiments were run five times with the best
specifications, and the mean of the results was taken.
5.1. Ablation Study
The Mixer is ablated to compare the importance of different design choices of the
MLP-Mixer in HAR. The different design choices involve the architecture of the Mixer
(token-mixing MLP, channel-mixing MLP) and the RGB embedding layer. The macro F1
score is used in the ablation study to assess the model. This prevents high evaluation scores
by simply choosing the majority class in imbalanced datasets and provides accurate insight
into the model’s learning capabilities across class activities.
The MLP-Mixer without RGB Embedding: The Mixer saw a slight decrease in per-
formance, which meant that this layer made some contribution to the Mixer’s learning
capabilities. This allows the sensor data to simulate the RGB channels in images. This
produces three sets of features for the Mixer to project into its embedding space instead of
a single set of features from the single sensor channel. The results are tabulated in Table 3.
Table 3. Mixer ablation study.

                               Opportunity   PAMAP2   Daphnet
Metric                         Fm            Fm       Fm
Base Mixer                     0.68          0.971    0.85
Mixer with no RGB Embedding    0.63          0.940    0.79
Mixer with no Token-Mixing     0.05          0.165    0.12
Mixer with no Channel-Mixing   0.569         0.82     0.795
The MLP-Mixer without the Token-Mixing MLPs: The model had a significant de-
crease in performance in all the datasets without the token-mixing MLPs. The Mixer uses
token-mixing to learn global context from the input and communicate information between
patches; without this layer, the Mixer cannot effectively capture the spatial and temporal in-
formation of the activities in the datasets. The results tabulated in Table 3 indicate the Mixer
loses its capabilities to learn relevant features of the dataset; hence, it can be concluded that
the token-mixing MLP is necessary for the Mixer to perform well in HAR benchmark datasets.
The MLP-Mixer without the Channel-Mixing MLPs: The channel-mixing MLPs
allow the model to communicate between channels, essentially acting as a 1 × 1 convolution.
This enables the Mixer to detect features between channels, and without it, only spatial
information between the various patches will be learned. The results tabulated in Table 3
showcase substantial performance loss, which indicates that the channel-mixing MLP is
important for HAR. However, the performance loss is lower than the performance loss in
the absence of the token-mixing MLPs. This indicates that the channel-mixing MLP is a
supplement to the token-mixing MLP, communicating the information learned from the
token-mixing layer across channels rather than capturing core features needed for accurate
prediction in HAR.
5.2. Measuring Performance
When evaluating classification problems, accuracy can be used as a metric that de-
termines the percentage of correct predictions the model made; this works very well in
most problems, but in classification problems with imbalanced datasets, this metric is no
longer as valuable. For example, in a binary classification task, the dataset could be imbal-
anced with a ratio of 1:100 for the minority and majority classes, respectively. Accurately
predicting the majority class but failing to classify all of the minority classes would still
lead to an accuracy of approximately 99%, which does not evaluate the model’s ability to
predict different classes. Fortunately, there are other metrics that can be used on imbalanced
datasets to evaluate the model’s performance. The following possibilities arise when a
model predicts classes:
• True Positive (TP): the model accurately predicts that the class is an activity.
• True Negative (TN): the model accurately predicts that the class is not an activity.
• False Positive (FP): the model inaccurately predicts that the class is an activity.
• False Negative (FN): the model inaccurately predicts that the class is not an activity.
5.2.1. Precision
Precision is the ratio of positive classification for class i over all positive predictions. It
answers the following question: How many samples recognised and predicted as class i
were correctly classified? The precision is calculated below:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (3)$$
5.2.2. Recall
Recall or the true positive rate is the ratio of positive classification prediction for class
i over all predictions of class i. It answers the following question: How many times was
class i correctly classified? The recall is calculated below.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (4)$$
5.2.3. F1-Score
The F1 score combines recall and precision to create a new accuracy-like measurement.
It is the harmonic mean of precision and recall, accounting for the false positives (precision)
and the false negatives (recall) in the different classes. The F1 score is calculated below:
$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (5)$$
In a multi-classification problem, having an F1 score for each class is not preferable to
a single score that gives insight into the overall performance of the model. This single score
is obtained using average techniques over all the F1 scores [36].
5.2.4. Macro F1-Score
The macro F1 score computes the unweighted mean of all the F1 scores. It treats all
classes equally, which is very useful in imbalanced datasets since the imbalance is not taken
into account when averaging the F1 scores.
5.2.5. Weighted F1-Score
The weighted F1 score computes the weighted mean of all the F1 scores. It weighs
each class based on the number of true occurrences (true positives and false negatives) it
has, which is very useful in imbalanced datasets where you want to give classes with more
instances in the dataset a higher weightage in the F1 score.
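Both averaging strategies are available in scikit-learn [36]; the toy example below, with made-up labels, shows how the macro and weighted scores diverge on an imbalanced label set.

```python
from sklearn.metrics import f1_score

# Imbalanced toy example: class 0 dominates, class 1 is rare and partly misclassified.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean of per-class F1
weighted = f1_score(y_true, y_pred, average="weighted")  # mean weighted by class support

print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```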
6. Results
The Mixer is compared with the following state-of-the-art architectures:
• Ensemble LSTMs [32]: combines multiple LSTMs using ensemble techniques to
produce a single LSTM.
• CNN-BiGRU [37]: CNN connected with a biGRU.
• AttenSense [22]: a CNN and GRU are combined using an attention mechanism to
learn spatial and temporal patterns.
• Multi-Agent Attention [38]: combines multi-agent collaboration with attention-based
selection.
• DeepConvLSTM [35]: combines an LSTM to learn temporal information with a CNN
to learn spatial features.
• BLSTM-RNN [33]: a bi-LSTM, with its weights and activation functions binarized.
• Triple Attention [39]: a ResNet, using a triple-attention mechanism.
• Self-Attention [40]: a self-attention-based model without any recurrent architectures.
• CNN [18]: a CNN with three layers and max pooling.
• b-LSTM-S [18]: bidirectional LSTM that uses future training data.
Table 4 shows the performance comparison between the Mixer and existing state-
of-the-art literature. It shows that the MLP-Mixer performs better than previous
techniques in the Opportunity Locomotion, PAMAP2, and Daphnet Gait datasets. Despite
the model’s shortcomings in the Opportunity Gestures dataset, it is still competitive with
most of the previously developed methods. Sliding window techniques were used in all
the previous techniques, with only the sliding window lengths and overlaps differing.
Although the Mixer beats the previous techniques in Opportunity Locomotion, most
previous work that used the Opportunity dataset for performance evaluation only focused
on the gesture classification task while disregarding the locomotion task.
The sliding window lengths used were similar to or larger than previous techniques,
allowing the model to capture more information from each interval. Therefore, it can be
concluded that the MLP-Mixer model can learn the spatial and temporal dynamics of the
sensor data more effectively than the previous models. The Mixer performs better than
existing attention and convolution-based models in PAMAP2. The macro-score of the
Mixer is slightly higher (0.97) than the triple-attention model [39] (0.96) and significantly
higher than the best convolution-based model [18] (0.937), and it performed better than the
state-of-the-art by 1%. In the Daphnet Gait dataset, the model also performed better than
convolution and recurrent models, producing a macro-score of 0.842 compared to 0.741. It
performed better than the state-of-the-art by 10.1%. However, the existing literature using
the Daphnet Gait focuses more on future prediction [41–43] instead of recognition and uses
different evaluation metrics; therefore, it cannot be directly compared to the Mixer. In the
Opportunity Gestures, the Mixer remains competitive but does not perform better than the
b-LSTM-S. The opportunity dataset was particularly challenging for the MLP-Mixer, due
to shorter activities combined with a larger sliding window necessary for the image to be
split into patches. As a result, there were several activities in the training sliding window,
making it more difficult for the Mixer to learn and harder for it to predict activities in the
test sliding window. The b-LSTM-S performed 1.7% better than the Mixer in this dataset.
Table 4. State-of-the-art comparison for MLP-Mixer scores with bold font showing the best performing cases. Mixer results in the format mean ± std. Fw is the weighted F1 score, and Fm is the F1 macro score.

                             Opportunity     Opportunity
                             Locomotion      Gestures        PAMAP2          Daphnet Gait
Metric                       Fw              Fw              Fm              Fm
Ensemble LSTMs [32]          -               0.726           0.854           -
CNN-BiGRU [37]               -               -               0.855           -
AttenSense [22]              -               -               0.893           -
Multi-Agent Attention [38]   -               -               0.899           -
DeepConvLSTM [35]            0.895           0.917           -               -
BLSTM-RNN [33]               -               -               0.93            -
Triple Attention [39]        -               -               0.932           -
Self-Attention [40]          -               -               0.96            -
CNN [18]                     -               0.894           0.937           0.684
b-LSTM-S [18]                -               0.927           0.868           0.741
MLP-Mixer                    0.90 ± 0.005    0.912 ± 0.002   0.97 ± 0.002    0.842 ± 0.007
7. Discussion
Convolutions capture the spatial information in a local area of the data. However, they
are not effective at learning long-term dependencies (temporal data) [24], unlike recurrent
networks, which specialise in long-term dependencies. The self-attention mechanism learns
the entire context of input patches. Additionally, it learns what to pay attention to based on
its weights [40], allowing it to learn the relationship between the sensors and the different
activities. The token-mixing MLPs can be considered a convolution layer that captures
information about the entire input, combining spatial information from a single channel
and distributing channel weights to increase efficiency, which allows the Mixer to perform
better than previous techniques when an adequate amount of data is provided and the
invariant features of the input are coherent.
The normalised confusion matrices of the PAMAP2, Opportunity, and Daphnet
datasets are illustrated in Figures 2–4, respectively. The model’s ability to distinguish
between activities in the PAMAP2 confusion matrix showed that it had learned the various
spatial and temporal characteristics of each activity. The model did have some trouble
distinguishing between the “ironing” and “standing” activities; this is probably because
the sensor data for these actions are similar in the chest and ankle regions but only slightly
different in the hand regions. With further inspection, standing consisted of talking while
gesticulating, further validating the possibility of similarities in the hand sensors. Further-
more, the model had a little trouble differentiating between “walking”, “vacuum cleaning”,
and “descending stairs” activities; this is understandable since it mistook these activities
for similar ones.
Figure 2. Normalised confusion matrix of the PAMAP2 dataset.

Figure 3. Normalised confusion matrix of the Opportunity dataset.

Figure 4. Normalised confusion matrix of the Daphnet Gait dataset.
It was more difficult for the model to distinguish between different activities in the
Opportunity dataset. Because there were significantly more samples of Null activities
than any other activity, the Opportunity confusion matrix, Figure 3, shows that the model
frequently mistook activities for being unrelated. Furthermore, because the activities were
short, the model had a more challenging time figuring out where a given activity began
and ended in the sliding window. The confusion matrix demonstrates that the model
could pick up on some of the “open door 2” and “close fridge” activity characteristics.
However, the model did not successfully capture features of “open drawer 1” and mistook
this activity for “close drawer 1”. Further investigation revealed that the activity, which
consisted of opening and closing the drawer, took place in a single sequence, suggesting
that the model could not determine when the activity began and, therefore, could not
correctly distinguish between the two.
There was a significant imbalance between the two activities in the Daphnet Gait
dataset, much like in the Opportunity dataset. As shown in Figure 4, the Mixer was trained
on an adequate sample size for the majority class, “No Freeze”, allowing it to correctly learn
when the participants were not freezing. However, in the minority case, there were
insufficient data for the Mixer to properly learn relevant features, resulting in the Mixer
incorrectly classifying the participants as not freezing 26% of the time.
7.1. Performance of Sliding Window Parameters
Each dataset contains a different range of activity lengths and repetition rates. The
sliding window length has a significant impact depending on how long the activities are in
the dataset. The sliding window’s parameters were altered to study its effect on the Mixer
performance. The model’s parameters were fixed, and the step size was constant instead
of using an overlap percentage of the window length to prevent the number of samples
from affecting the results. Small window intervals contain insufficient data for the Mixer to
learn from and make decisions. On the other hand, if the sliding window interval is large
relative to the activities in the window, it allows information from multiple activities to
be present in a single sliding window, making it harder for the Mixer to determine which
activity the sliding window represents among the multiple activities.
Performance generally improves with increasing overlap, but as there are more sam-
ples to train and test, the computational complexity of training the Mixer also rises. In
contrast, little to no overlap significantly reduces the sample size, particularly for larger
sliding window sizes, which causes the Mixer to over-fit on the dataset.
Figures 5–7 illustrate the changes in the Mixer’s performance when the sliding win-
dow length is changed. In datasets with more extended activities, such as PAMAP2 and
Daphnet, larger sliding windows increase the model’s capability to learn by providing
more information. On the other hand, in the Opportunity dataset, which contains shorter
activities, the model’s performance decreases with larger window lengths. The sliding win-
dow figures indicate that the sliding window has a slight effect on the Mixer’s performance,
but overall the model is not sensitive to the sliding window length.
Figure 5. Evaluation of sliding window length on the Opportunity dataset.

Figure 6. Evaluation of sliding window length on the Daphnet Gait dataset.

Figure 7. Performance evaluation of sliding window length on the PAMAP2 dataset.
7.2. Weight Visualisation
The model’s weights are visualised to provide insight into which sensors the model
considers necessary for different activities. This experiment aims to confirm that the Mixer
is capturing relevant features and to offer some interpretation of how the Mixer categorises
the activities. The analysis is performed on the PAMAP2 dataset to showcase various
simple and complex activities. Six different activities and their associated weights are
illustrated in Figure 8.
Figure 8 shows how the Mixer associates various sensors with various activities.
The Mixer not only learns which sensors are crucial but also when they are crucial as
the emphasis of the sensors changes throughout the sliding window. For example, in
ascending stairs, the hand (X, Y), chest (X), and ankle sensors have essential features that
the Mixer emphasises, typical when climbing a staircase with handrails. Cycling focuses
on the hand (Y) sensor, most likely for steering, and the chest and ankle sensors, likely for
pedalling. The Mixer prioritises the hand’s (X, Z) sensors when ironing, as expected. While
lying down, the Mixer considers all sensors important, except for the ankle (Z) and hand
(Y), which is to be expected given that the participants had complete freedom to change
their lying positions. Finally, the Mixer values the hand (X, Z) and chest (X) sensors for
vacuum cleaning and the ankles (X, Y) and chest (X) sensors for running activities, which
is consistent with common sense. This analysis concludes that the Mixer is successfully
learning the spatial and temporal characteristics of the various activities because the weight
assignments for these activities are understandable and in tune with common sense.
Figure 8. The Mixer’s weight visualisation for each accelerometer sensor in the sliding window. Each panel represents a different activity: (a) ascending stairs, (b) cycling, (c) ironing, (d) lying, (e) running, and (f) vacuum cleaning.
8. Conclusions
In this paper, the MLP-Mixer performance is investigated for HAR. The Mixer does
not use convolutions or self-attention mechanisms and instead relies solely on MLPs. It
uses token-mixing and channel mixing layers to communicate between patches and chan-
nels, learning the global context of the input and enabling excellent spatial and temporal
pattern recognition in HAR. Experiments were performed on three popular HAR datasets:
Opportunity, PAMAP2 and Daphnet Gait. The Mixer was assessed using sliding windows
on the dataset. This paper demonstrates that pure-MLP architectures can compete with
convolutional and attention-based architectures in terms of HAR viability and performance.
We demonstrate that the MLP-Mixer outperforms current state-of-the-art models in the test
benchmarks for all datasets except for Opportunity Gestures. It performs 10.1% better in
the Daphnet Gait dataset, 1% better in the PAMAP2 dataset, and 0.5% better in the Opportunity
Locomotion dataset. The Mixer was outperformed on Opportunity Gestures; however,
it remained competitive with the state-of-the-art results. To the best of our knowledge,
vision-based MLP architectures have not been applied to HAR tasks. It is interesting to
see a pure-MLP architecture outperform, or remain competitive with,
state-of-the-art models in HAR.
Author Contributions: Conceptualization, K.O. and K.F.; methodology, K.O. and K.F.; software, K.O.;
validation, K.O.; investigation, K.O.; writing—original draft preparation, K.O.; writing—review and
editing, K.O. and K.F.; supervision, K.F. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: The links to the publicly available datasets used in this paper are
provided in Section 4 of the paper (Datasets).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Parker, S.J.; Strath, S.J.; Swartz, A.M. Physical Activity Measurement in Older Adults: Relationships With Mental Health. J. Aging
Phys. Act. 2008, 16, 369–380. [CrossRef] [PubMed]
2. Kranz, M.; Möller, A.; Hammerla, N.; Diewald, S.; Plötz, T.; Olivier, P.; Roalter, L. The mobile fitness coach: Towards individualized
skill assessment using personalized mobile devices. Pervasive Mob. Comput. 2013, 9, 203–215. [CrossRef]
3. Patel, S.; Park, H.S.; Bonato, P.; Chan, L.; Rodgers, M. A Review of Wearable Sensors and Systems with Application in
Rehabilitation. J. Neuroeng. Rehabil. 2012, 9, 21. [CrossRef]
4. Cedillo, P.; Sanchez-Zhunio, C.; Bermeo, A.; Campos, K. A Systematic Literature Review on Devices and Systems for Ambient
Assisted Living: Solutions and Trends from Different User Perspectives. In 2018 International Conference on eDemocracy &
eGovernment (ICEDEG); IEEE: New York, NY, USA, 2018. [CrossRef]
5. De Leonardis, G.; Rosati, S.; Balestra, G.; Agostini, V.; Panero, E.; Gastaldi, L.; Knaflitz, M. Human Activity Recognition by
Wearable Sensors: Comparison of different classifiers for real-time applications. In Proceedings of the 2018 IEEE International
Symposium on Medical Measurements and Applications (MeMeA), Rome, Italy, 11–13 June 2018; pp. 1–6. [CrossRef]
6. Park, S.; Jayaraman, S. Enhancing the quality of life through wearable technology. IEEE Eng. Med. Biol. Mag. 2003, 22, 41–48.
[CrossRef] [PubMed]
7. Lara, O.D.; Labrador, M.A. A Survey on Human Activity Recognition using Wearable Sensors. IEEE Commun. Surv. Tutorials
2013, 15, 1192–1209. [CrossRef]
8. Tolstikhin, I.O.; Houlsby, N.; Kolesnikov, A.; Beyer, L.; Zhai, X.; Unterthiner, T.; Yung, J.; Steiner, A.; Keysers, D.; Uszkoreit, J.; et al.
MLP-Mixer: An all-mlp architecture for vision. Adv. Neural Inf. Process. Syst. 2021, 34, 24261–24272.
9. Le, V.T.; Tran-Trung, K.; Hoang, V.T. A comprehensive review of recent deep learning techniques for human activity recognition.
Comput. Intell. Neurosci. 2022, 2022, 8323962. [CrossRef]
10. Roggen, D.; Calatroni, A.; Rossi, M.; Holleczek, T.; Förster, K.; Tröster, G.; Lukowicz, P.; Bannach, D.; Pirkl, G.; Ferscha, A.;
et al. Collecting complex activity datasets in highly rich networked sensor environments. In Proceedings of the 2010 Seventh
International Conference on Networked Sensing Systems (INSS), Kassel, Germany, 15–18 June 2010; pp. 233–240. [CrossRef]
11. Bächlin, M.; Plotnik, M.; Roggen, D.; Maidan, I.; Hausdorff, J.; Giladi, N.; Troster, G. Wearable Assistant for Parkinson’s Disease
Patients with the Freezing of Gait Symptom. Inf. Technol. Biomed. IEEE Trans. 2010, 14, 436–446. [CrossRef]
12. Reiss, A.; Stricker, D. Introducing a New Benchmarked Dataset for Activity Monitoring. In Proceedings of the 2012 16th
International Symposium on Wearable Computers, Newcastle, UK, 18–22 June 2012; pp. 108–109. [CrossRef]
13. Zappi, P.; Lombriser, C.; Stiefmeier, T.; Farella, E.; Roggen, D.; Benini, L.; Tröster, G. Activity Recognition from On-Body Sensors:
Accuracy-Power Trade-Off by Dynamic Sensor Selection. In Proceedings of the Wireless Sensor Networks; Verdone, R., Ed.; Springer:
Berlin/Heidelberg, Germany, 2008; pp. 17–33.
14. Weiss, G.M.; Yoneda, K.; Hayajneh, T. Smartphone and Smartwatch-Based Biometrics Using Activities of Daily Living. IEEE
Access 2019, 7, 133190–133202. [CrossRef]
15. Banos, O.; García, R.; Holgado-Terriza, J.; Damas, M.; Pomares, H.; Rojas, I.; Saez, A.; Villalonga, C. mHealthDroid: A Novel
Framework for Agile Development of Mobile Health Applications; Proceedings 6; Springer International Publishing: Berlin/Heidelberg,
Germany, 2014; Volume 8868, pp. 91–98. [CrossRef]
16. Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A Public Domain Dataset for Human Activity Recognition using
Smartphones. In Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Computational Intelligence
and Machine Learning, Bruges, Belgium, 24–26 April 2013.
17. Zeng, M.; Nguyen, L.T.; Yu, B.; Mengshoel, O.J.; Zhu, J.; Wu, P.; Zhang, J. Convolutional Neural Networks for human activity
recognition using mobile sensors. In Proceedings of the 6th International Conference on Mobile Computing, Applications and
Services, Austin, TX, USA, 6–7 November 2014; pp. 197–205. [CrossRef]

26
Appl. Sci. 2023, 13, 11154

18. Hammerla, N.Y.; Halloran, S.; Ploetz, T. Deep, Convolutional, and Recurrent Models for Human Activity Recognition using
Wearables. arXiv 2016, arXiv:1604.08880.
19. Tang, Y.; Teng, Q.; Zhang, L.; Min, F.; He, J. Layer-Wise Training Convolutional Neural Networks with Smaller Filters for Human
Activity Recognition Using Wearable Sensors. IEEE Sens. J. 2021, 21, 581–592. [CrossRef]
20. Yang, Z.; Wang, Y.; Liu, C.; Chen, H.; Xu, C.; Shi, B.; Xu, C.; Xu, C. Legonet: Efficient convolutional neural networks with lego
filters. In Proceedings of the International Conference on Machine Learning. PMLR, Long Beach, CA, USA, 9–15 June 2019;
pp. 7005–7014.
21. Murad, A.; Pyun, J.Y. Deep Recurrent Neural Networks for Human Activity Recognition. Sensors 2017, 17, 2556. [CrossRef]
22. Ma, H.; Li, W.; Zhang, X.; Gao, S.; Lu, S. AttnSense: Multi-level Attention Mechanism For Multimodal Human Activity
Recognition. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI), Macao, China,
10–16 August 2019; pp. 3109–3115. [CrossRef]
23. Gao, W.; Zhang, L.; Teng, Q.; He, J.; Wu, H. DanHAR: Dual Attention Network for multimodal human activity recognition using
wearable sensors. Appl. Soft Comput. 2021, 111, 107728. [CrossRef]
24. Liu, R.; Li, Y.; Tao, L.; Liang, D.; Zheng, H.T. Are we ready for a new paradigm shift? A survey on visual deep MLP. Patterns 2022,
3, 100520. [CrossRef] [PubMed]
25. Liu, H.; Dai, Z.; So, D.R.; Le, Q.V. Pay Attention to MLPs. Adv. Neural Inf. Process. Syst. 2021, 34, 9204–9215.
26. Yu, T.; Li, X.; Cai, Y.; Sun, M.; Li, P. S2-MLP: Spatial-Shift MLP Architecture for Vision. In Proceedings of the 2022 IEEE/CVF
Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 3615–3624. [CrossRef]
27. Wei, G.; Zhang, Z.; Lan, C.; Lu, Y.; Chen, Z. ActiveMLP: An MLP-like Architecture with Active Token Mixer. arXiv 2022,
arXiv:2203.06108.
28. Tang, Y.; Han, K.; Guo, J.; Xu, C.; Li, Y.; Xu, C.; Wang, Y. An Image Patch is a Wave: Phase-Aware Vision MLP. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10935–10944.
29. Wang, Z.; Jiang, W.; Zhu, Y.; Yuan, L.; Song, Y.; Liu, W. DynaMixer: A Vision MLP Architecture with Dynamic Mixing. In
Proceedings of the 39th International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; Volume 162,
pp. 22691–22701.
30. Hendrycks, D.; Gimpel, K. A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv
2016, arXiv:1610.02136.
31. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
32. Guan, Y.; Ploetz, T. Ensembles of Deep LSTM Learners for Activity Recognition using Wearables. In Proceedings of the ACM
on Interactive, Mobile, Wearable and Ubiquitous Technologies; Association for Computing Machinery: New York, NY, USA, 2017;
pp. 1–28.
33. Edel, M.; Köppe, E. Binarized-BLSTM-RNN based Human Activity Recognition. In Proceedings of the 2016 International
Conference on Indoor Positioning and Indoor Navigation (IPIN), Alcala de Henares, Spain, 18–21 September 2016; pp. 1–7.
[CrossRef]
34. Moya Rueda, F.; Grzeszick, R.; Fink, G.A.; Feldhorst, S.; Ten Hompel, M. Convolutional Neural Networks for Human Activity
Recognition Using Body-Worn Sensors. Informatics 2018, 5, 26. [CrossRef]
35. Ordóñez, F.J.; Roggen, D. Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity
Recognition. Sensors 2016, 16, 115. [CrossRef]
36. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.;
et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
37. Mekruksavanich, S.; Jitpattanakul, A. Deep Convolutional Neural Network with RNNs for Complex Activity Recognition Using
Wrist-Worn Wearable Sensor Data. Electronics 2021, 10, 1685. [CrossRef]
38. Chen, K.; Yao, L.; Zhang, D.; Guo, B.; Yu, Z. Multi-agent Attentional Activity Recognition. arXiv 2019, arXiv:1905.08948.
39. Tang, Y.; Zhang, L.; Teng, Q.; Min, F.; Song, A. Triple Cross-Domain Attention on Human Activity Recognition Using Wearable
Sensors. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 6, 1–10. [CrossRef]
40. Mahmud, S.; Tonmoy, M.T.H.; Bhaumik, K.K.; Rahman, A.K.M.M.; Amin, M.A.; Shoyaib, M.; Khan, M.A.H.; Ali, A.A. Human
Activity Recognition from Wearable Sensor Data Using Self-Attention. arXiv 2020, arXiv:2003.09018.
41. Li, B.; Yao, Z.; Wang, J.; Wang, S.; Yang, X.; Sun, Y. Improved Deep Learning Technique to Detect Freezing of Gait in Parkinson’s
Disease Based on Wearable Sensors. Electronics 2020, 9, 1919. [CrossRef]
42. Thu, N.T.H.; Han, D.S. Freezing of Gait Detection Using Discrete Wavelet Transform and Hybrid Deep Learning Architecture. In
Proceedings of the 2021 Twelfth International Conference on Ubiquitous and Future Networks (ICUFN), Jeju Island, Republic of
Korea, 17–20 August 2021; pp. 448–451. [CrossRef]
43. El-ziaat, H.; El-Bendary, N.; Moawad, R. A Hybrid Deep Learning Approach for Freezing of Gait Prediction in Patients with
Parkinson’s Disease. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 766–776. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
Article
An Evaluation Study on the Analysis of People’s Domestic
Routines Based on Spatial, Temporal and Sequential Aspects
Aitor Arribas Velasco *, John McGrory and Damon Berry

School of Electrical and Electronic Engineering, Technological University Dublin, Grangegorman Lower,
D07 H6K8 Dublin, Ireland; [email protected] (J.M.); [email protected] (D.B.)
* Correspondence: [email protected]
Abstract: The concept of collecting data on people’s domestic routines is not novel. However,
the methods and processes used to decipher these raw data and transform them into useful and
appropriate information (i.e., sequence, duration, and timing derived from monitoring domestic
routines) have presented challenges and are the focus of numerous research groups. But how are
the results of the decoded transposition received, interpreted and used by the various professionals
(e.g., occupational therapists and architects) who consume the information? This paper describes the
inclusive evaluation process undertaken, which involved a selected group of stakeholders including
health carers, engineers and end-users (not the occupants themselves, but more so the care team
managing the occupant). Finally, our study suggests that making accessible key spatial and temporal
aspects derived from people’s domestic routines can be of great value to different professionals.
This sheds light on how a systematic approach for collecting, processing and mapping low-level sensor
data into higher forms and representations can be a valuable source of knowledge for improving the
domestic living experience.
Keywords: behaviour analysis; domestic environments; activities of daily living; knowledge discovery
in databases
Citation: Arribas Velasco, A.; McGrory, J.; Berry, D. An Evaluation Study on the Analysis of People’s Domestic Routines Based on Spatial, Temporal and Sequential Aspects. Appl. Sci. 2023, 13, 10608. https://doi.org/10.3390/app131910608
Academic Editors: Marley M.B.R. Vellasco and Luigi Bibbò
Received: 28 August 2023; Revised: 16 September 2023; Accepted: 22 September 2023; Published: 23 September 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction
People spend most of their time indoors; Irish people spend an average of 90% of their time indoors [1–3]. Human Activity Recognition (HAR) approaches are increasingly being employed to understand human behaviour through the analysis of data representative of residents’ domestic routines. Current research indicates that healthcare professionals, as well as family members of vulnerable older people, and professionals from the built environment, could potentially benefit from information regarding how householders transit between the different domestic spaces. The term domestic space has been used to refer to the private space of the house [4]. Based on the interaction between people and houses, this research focuses on two perspectives.
On the one hand, if we look at the design of a house, although there are generic spaces, such as an entrance/exit area to the house, a kitchen, bedrooms, bathrooms, etc., their physical characteristics differ from one another, such as the number of rooms and floors, their dimensions and orientation, and thus the way they are connected and distributed. Space syntax, a set of techniques and theories for the study of spatial configurations, is used to predict possible effects of architectural spaces on users, particularly how people make and use spatial configurations [5]. For example, space syntax has been used to assess the impact of different proposals for extending the existing layout of the Tate Britain Museum [6]. In addition, research has shown that various applications can benefit from occupant information, such as improvements in energy efficiency and indoor air quality, space utilisation and optimisation, occupants’ comfort enhancement, and healthcare systems [7]. Iweka et al. showed how information about people’s behaviour in relation to the use of domestic spaces is needed to ensure an effective transition towards optimal
energy use in private dwellings [8]. Ayalp pointed out the importance of, and the need to use,
information representative of people’s domestic behaviour when designing new homes [9].
In addition, Mahmoud noted how the interior architectural characteristics of a space impact
the accessibility and circulation of people [10].
On the other hand, domestic routines help family members to organise themselves: what they
have to do, when, in what order and how often. Basic household
activities may include bedtime routine, cooking, using the toilet, etc. The skills required
to perform these routine tasks have been measured by clinicians to assess patients’ health
status and their ability to care for themselves independently [11]. The term used in this
domain is activities of daily living (ADLs) [12]. Basic ADLs include: ambulating, feeding,
dressing, personal hygiene, continence and toileting [11]. ADLs are traditionally assessed
by healthcare professionals through face-to-face interviews with patients [13]. Although
the aim of this research is not to provide a method to replace existing ADL assessment
techniques, the focus is on the connection between these activities and the spaces of the
house in which they take place, which is of relevant interest in order to provide supporting
evidence. For example, ambulating refers to the ability of the patient to move from one
position to another and walk independently. Others, such as personal hygiene and toileting,
can be inferred based on the use of the bathroom space. Also, feeding is intrinsically related
to the amount of time the person spends in the kitchen. Bouchachia and Mohsen, who
designed a smart home approach to support caregivers working with people with dementia,
remarked that family members can use smart home information to keep track on a daily
basis of their loved one’s day to day routines, while occupational health professionals could
use this information to improve their knowledge of patients [14].
Both previously described views are characterised by temporal information derived
from people’s domestic routines, in addition to the characteristics of the spaces of the
house wherein they take place—spatial information. Spatial and temporal properties have
been used to get insights about people’s interaction with domestic spaces. Thiago and
Gershon defined a human-sensing taxonomy that includes five components that can be
measured through spatial and temporal sensing information to analyse the occupancy
of buildings and how people interact with them: presence (is there at least one person
present?), count (how many people are present?), location (where is each person?), track
(where was this person before?) and identity (who is each person?) [15]. Based on these
components, Wael et al. defined three lenses through which to analyse building occupancy:
occupancy resolution (refers to different occupancy levels, for example, resident presence
or absence), temporal resolution (refers to the frequencies over time with which events
take place) and spatial resolution (refers to the building structure, rooms, floors, and the
building as a whole) [7]. These lenses align with major components of this research:
• The movements of people as a result of household routines in domestic buildings;
• Locations as parts of the whole design of the house through which people move, and
timeliness as the times of the day, duration and the frequency of events in different
spaces of the house;
• Occupancy of buildings. This is used to refer to the presence and movements of
people indoors. The term indoor positioning can include crude binary PIR detection (i.e.,
occupancy of a space), or a finer-resolution estimate of a person’s location within the space
(i.e., positioning), especially in areas where a GPS signal is not present [16].
This paper presents a systematic approach, based on the knowledge discovery in
databases (KDD) process, which uses sensor data that reflect the transitioning between
locations in a home (e.g., moving from the bedroom to the bathroom) and provides time-
based information about the use of different rooms by a monitored resident (e.g., at 2 a.m.
moved from the bedroom to the kitchen and stayed for 5 min). The data are then transposed
to a set of data visualisations to provide supporting evidence on the following aspects of
the monitored household’s domestic routines:
• What is the frequency of the visits to the locations?
• What are the most common transitions between locations of the house?

• Which hours of the day are most representative of an activity taking place in a particu-
lar location?
• How long on average does the monitored subject spend in a location?
This information is not fully representative of activities such as brushing teeth or
preparing food, but is intended to be useful to carers or observers who need to understand
the spatial and temporal aspects of another person’s routines in their home. It can also
inform designers on space usage and areas that have the highest numbers of transitions, for
example, kitchen to dining room, so increased care can be taken during the design of these
spaces, or perhaps more wear-resistant materials can be used.
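For illustration only, a minimal Python sketch of how the first two questions above could be answered from such sensor data is given below; the file name (events.csv) and the column names (timestamp, location) are hypothetical stand-ins and not the study’s actual data layout.

# Minimal sketch (illustrative only): visit counts per location and the most
# common transitions, derived from an ordered log of location events.
from collections import Counter

import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["timestamp"]).sort_values("timestamp")

# A visit starts whenever the reported location changes from the previous event.
changed = events["location"].ne(events["location"].shift())
visits = events.loc[changed, "location"].tolist()

visit_counts = Counter(visits)                          # frequency of visits per location
transition_counts = Counter(zip(visits, visits[1:]))    # e.g., ("kitchen", "dining room")

print(visit_counts.most_common())
print(transition_counts.most_common(5))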
We present an overview of the proposed overall methodological KDD process in Section 2.
In Section 3, the evaluation study conducted to gather feedback and first impressions from
the main consumers of the information made available is described. The responses collected
through the evaluation study are analysed in Section 4. Finally, Section 5 presents the conclu-
sions based on the results of the thematic analysis carried out.

2. Proposed Method
The mapping of low-level sensor data into other forms, which may be more compact,
more abstract, or more useful, involves various steps that go beyond the computational
reasoning of the datasets. There are several questions that need to be addressed as a part of
this process, including what types of data are needed, how the data will be stored, how
the data will be processed, and how the results will be presented. In 1996, Fayyad et al.
described the knowledge discovery in databases process as the “non-trivial process of iden-
tifying novel, potentially useful, and ultimately understandable patterns or relationships
within a dataset in order to make important decisions” [17]. So, KDD is a systematic and
iterative way of uncovering structures of information, understandable patterns, from data
that can be interpreted as valid. In addition, these entities should be valid for new data
with some degree of certainty, resulting in some benefit to the end user or task [18].
As a result of this successful methodology proposed by Fayyad et al., a number of
different KDD approaches were developed, derived mainly for business uses [18]. The
five steps (Sample, Explore, Modify, Model and Assess—SEMMA) constitute the data
mining process developed by the SAS Institute for enterprises to solve different business
problems [19]. Two Crows Consulting also proposed a data mining process model very
similar to the original KDD process [20]. Anand and Buchner proposed an internet-enabled
knowledge discovery process model adapted to the web mining project [21]. Similarly,
in 1997, Cabena et al. suggested a business-oriented KDD process that included most of
the steps involved in the original KDD process [22]. Brachman and Anand introduced
an alternative perspective, a human-centred process, focusing on the data analyst as the
key actor in the overall KDD process [23]. One of the main reasons for this argument was
that the extraction of valuable knowledge requires prior background knowledge (i.e., expertise)
beyond the data and their analysis, and this background knowledge of the study
area, according to the authors, resides only in the analyst.
Depending on the KDD approach studied, the number of steps can vary; nonetheless,
the generic steps involved in KDD are: (1) developing an understanding of the end goal,
(2) collecting data, (3) selecting a target dataset, (4) cleaning and preprocessing data,
(5) creating sub-sets of interest, (6) data mining, and ultimately, (7) producing outputs for
evaluation (Figure 1).


Figure 1. Knowledge discovery in databases steps.
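Purely as an orientation aid, the generic steps can be sketched as a chain of placeholder functions; none of the names below come from the study, and each body would be filled with project-specific logic.

# Illustrative skeleton only: the generic KDD steps expressed as placeholder functions.
def collect(source): ...          # step 2: collect raw sensor records
def select(records, goal): ...    # step 3: select a target dataset
def preprocess(records): ...      # step 4: clean and preprocess the data
def transform(records): ...       # step 5: create sub-sets / features of interest
def mine(dataset): ...            # step 6: data mining (or, in this work, visualisation)
def evaluate(outputs): ...        # step 7: produce outputs for evaluation

def kdd_pipeline(source, goal):   # step 1, the end goal, parameterises the whole run
    records = preprocess(select(collect(source), goal))
    return evaluate(mine(transform(records)))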

Traditionally, the KDD process has used data mining algorithms to automate the
extraction of patterns. Generally, data mining techniques developed in the field of HAR
in domestic environments have been classified into two main groups: data-driven and
knowledge-driven approaches, as well as hybrid methods [7]. Regardless of the approach
undertaken, the activity recognition process focusses on the creation of models that accu-
rately map human activities. Reusability and scalability are the main challenges of these
approaches, as the nature of human activities involves the sequencing of events, and a
particular start time and duration for each step. Additionally, domestic environments vary
in shape, form, and materials, and these factors influence, among other things, the way
in which they are used by their inhabitants. Nonetheless, regardless of the computational
method used, the results of the data mining step need to be presented in a meaningful way
and in a form that can be dynamically adapted by an analyst through iterations so that
conclusions can be drawn.
Our KDD approach follows the idea put forward by Brachman and Anand; we pro-
pose a human-centred approach that brings the analyst’s background knowledge into the
knowledge discovery process. The aim, therefore, is to make the background knowledge in
the knowledge discovery process a key element in the elaboration of assumptions derived
from the study of the sensor data. Our KDD process mimics the scientific method, as it
offers the possibility to explore observations and answer questions. Hence, the process
starts with a question formulated by the analyst; for example, what is the resident’s night-
time routine? This leads to the formulation of a hypothesis, via deduction, perhaps that
the night-time routine of the resident includes the use of the bathroom and the bedroom,
that a minimum duration is expected for these events, and that the average duration of visits to
the bathroom should not exceed 2 min. To test the hypothesis derived from the
analyst’s background knowledge, four modes of data visualisation, described in the follow-
ing section, were adapted. These visualisations are flexible based on different parameter
modifications undertaken by the analyst to show different key spatial and temporal aspects
of the sensor data. By iteratively examining these data visualisations, a conclusion can be
drawn. Table 1 shows the comparison between the main steps of our KDD process and the
generic KDD steps previously listed.


Table 1. Comparison between our KDD steps and generic KDD steps.

Our KDD Steps                                           Generic KDD Steps

1. Identify goals                                       1. Identify goals
2. Collecting data                                      2. Collecting data
3. Selection                                            3. Selection
4. Preprocessing                                        4. Preprocessing
5. Transformation                                       5. Transformation
6. The question addressed (Developing a hypothesis)     6. Data mining
7. Testing the hypothesis using data visualisations
8. Evaluation (examining results and drawing            7. Evaluation (examining results
   conclusions)                                            and drawing conclusions)

Through each iteration of the KDD process, the analyst is expected to gain a deeper
understanding of the routine analysed. The key spatial and temporal parameters used to
analyse the daily routines include:
• Order in which locations are transited (e.g., between 1 a.m. and 7 a.m.: (1)—bedroom,
(2)—corridor, (3)—bathroom, (4)—corridor, (5)—bedroom etc.);
• Times of the day when locations are visited (e.g., between 1 a.m. and 7 a.m.: bathroom
at 1:45 a.m. and at 5:50 a.m.);
• Average duration of the visits (e.g., between 1 a.m. and 7 a.m.: average duration of
visits to the bathroom is 3 min).
This information can then be used, for example, by healthcare professionals and family
members to better understand the behavioural aspects of a monitored loved one. But also,
it can be of great value to architects seeking to understand how people use spaces, and thus
how the design of the interior affects the way people conduct their daily routines.
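As a minimal sketch of how the three parameters listed above could be derived for a time window such as 1 a.m. to 7 a.m. (assuming, hypothetically, an events.csv file with timestamp and location columns; visits that span days or the window boundary are ignored for simplicity):

# Sketch (illustrative only): order of locations, visit times and average visit
# duration within a 01:00-07:00 window, derived from raw location events.
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["timestamp"]).sort_values("timestamp")
window = events[events["timestamp"].dt.hour.between(1, 6)]           # 01:00-06:59

# Collapse consecutive events in the same location into single visits.
starts = window["location"].ne(window["location"].shift())
visits = window.loc[starts, ["timestamp", "location"]].copy()
visits["duration_min"] = (visits["timestamp"].shift(-1) - visits["timestamp"]
                          ).dt.total_seconds() / 60                  # end = next visit start

print(visits["location"].tolist())                                   # order of locations
print(visits.groupby("location")["timestamp"].apply(list))           # times of the visits
print(visits.groupby("location")["duration_min"].mean().round(1))    # average durations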
The purpose of the survey discussed in this paper was to collect feedback and first insights
from a selected group of professional stakeholders who could benefit from the information
reported at the end of the process, and thus to gauge how the approach described in this paper
can contribute to the field of HAR. The approach provides a systematic tool with which the data
describing the architectural characteristics of the house, the collected sensor data showing the
transitions of a monitored householder between the different locations of the house, and the
placement of the sensing technology can be decoupled in a reusable and structured way. This
enables the migration of these low-level data inputs into a set of data visualisations adapted
to display key spatial and temporal aspects of the use of the space by a monitored householder,
including the sequencing between the most frequently occupied areas of the house and the
duration and timing of events.
The remaining steps, including data collection, data cleaning and pre-processing
techniques, and data transformation, were not examined in this evaluation study so as to
avoid confusing the volunteer participants due to the technical nature of these steps.

3. Evaluation
The workshops were developed to engage the study participants with a prototype of the
step-by-step data analysis process, in order to address the extent to which low-level
sensor-based data could be a meaningful source of information. The evaluation study was
conducted using Google Forms and involved architects, engineers, healthcare professionals
and end-users (not the occupant, but the care team managing the occupant). The evaluation
consisted of the following sections.

3.1. Section A: Understanding the Data and Metadata


In the first part, the participant was given a brief introduction to the research and what
was expected of them in this study. They were presented with a sample of the anonymised
CSV file containing the data analysed (Figure 2). Each entry contains the date and time of
an event, the location ID corresponding to a particular space of the house, and the sensor
status, with “1” representing that the sensor was activated.

Figure 2. Sample extracted from the CSV dataset.
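For illustration, a file of this kind could be loaded and reduced to activation events as sketched below; the header names (date, time, location_id, status) are assumptions based on the description above, not the study’s actual column headers.

# Sketch (illustrative only): load the anonymised event log and keep sensor activations.
import pandas as pd

log = pd.read_csv("sensor_log.csv")
log["timestamp"] = pd.to_datetime(log["date"] + " " + log["time"])

# A status of 1 means the sensor was activated; keep only those rows.
activations = log[log["status"] == 1].sort_values("timestamp")

print(activations[["timestamp", "location_id"]].head())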

The custom-built tracker was a passive infrared (PIR) sensor attached to a Raspberry
Pi 4. This tracker device was placed in each room of the house to continuously, anony-
mously and unobtrusively monitor the transitions between locations by measuring the
RSSI (received signal strength indicator) of the Bluetooth signal emitted by a BLE device worn
by the monitored subject. The novel linking of the PIR and RSSI readings was introduced to avoid
false positives due to the proximity of the rooms and fluctuations in RSSI measurements. The indoor
tracking system and the collection of data for testing and evaluation were approved by the
Research Ethics and Integrity Committee of the TU Dublin.
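A simplified sketch of the linking idea is shown below; the threshold, field names and function are illustrative only and do not reproduce the firmware running on the trackers.

# Sketch of PIR + RSSI fusion: a room-level event is emitted only when the local PIR
# fires AND the worn beacon's BLE RSSI in that room exceeds a threshold, reducing false
# positives from adjacent rooms. All values here are assumed, not calibrated.
from datetime import datetime

RSSI_THRESHOLD_DBM = -70  # assumed cut-off; in practice it would be tuned per dwelling

def location_event(room_id, pir_triggered, rssi_dbm):
    """Return an event record, or None if the evidence is not strong enough."""
    if pir_triggered and rssi_dbm is not None and rssi_dbm >= RSSI_THRESHOLD_DBM:
        return {"timestamp": datetime.now().isoformat(),
                "location_id": room_id,
                "status": 1}
    return None

# PIR fired in the bathroom and the worn beacon is heard strongly there.
print(location_event("bathroom", pir_triggered=True, rssi_dbm=-62))
# PIR fired but the beacon is only faintly heard (likely the resident is in the next room).
print(location_event("kitchen", pir_triggered=True, rssi_dbm=-85))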
Then, they were shown a representation of the layout of the house where the monitored
subject lived. This Tube-map visualisation of the house is a digitised pseudo map designed
to make it easier to understand the possible transitions between rooms, e.g., adjacent rooms.
To this end, the rooms are represented by circles of different colours, i.e., every bathroom is
coloured pink and every bedroom green, and the possibility to move between two locations
(which we define as a transition) is represented by a straight line (which we define as
an edge), as shown in Figure 3. In addition, the average distance in metres between two
rooms is also shown, calculated as the distance from the centre point of one room to the
transitioning area, i.e., door, open wall, lift, or staircase, and from this point to the centre
of the adjacent room. Finally, the average time it would take to cover this distance for a
70- to 80-year-old person is also indicated. It was explained to the participants that this
information is obtained from the expanded Building Information Model (BIM) based on the
original BIMXML model, which is a key enabler for the reusability of the process, regardless
of the architectural characteristics of the house.
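Conceptually, the Tube-map is a small weighted graph. The sketch below illustrates this with invented rooms, distances and an assumed walking speed; none of these values come from the expanded BIM model used in the study.

# Sketch: the Tube-map as a weighted graph. Rooms are nodes; an edge means a direct
# transition is possible; weights are average centre-to-centre distances in metres.
WALKING_SPEED_M_PER_S = 0.8  # assumed typical speed for a 70- to 80-year-old person

edges = {
    ("bedroom", "corridor"): 3.5,
    ("corridor", "bathroom"): 2.8,
    ("corridor", "kitchen"): 4.2,
    ("kitchen", "living_room"): 3.0,
}

def traversal_seconds(room_a, room_b):
    """Estimated time to move between two adjacent rooms, or None if not adjacent."""
    distance = edges.get((room_a, room_b)) or edges.get((room_b, room_a))
    return None if distance is None else distance / WALKING_SPEED_M_PER_S

print(traversal_seconds("bedroom", "corridor"))   # ~4.4 s with these invented numbers
print(traversal_seconds("bedroom", "kitchen"))    # None: no direct edge between these rooms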
Based on this information, the participant was asked to rate, on a scale from 0—Very
difficult to 4—Very easy, their ability to understand the possible movements that can be
made by a resident based on the floor plan of the house.


Figure 3. Tube-map visualisation.

3.2. Section B: Adding Context to the Dataset and Establishing a Daily Routine Hypothesis
This section provided a brief explanation of the context in which the dataset analysed
was created, i.e., the age of the monitored resident, whether s/he lived independently
or with other family members, and the time over which the data were collected. Then,
the volunteer participant was asked the following question: based on your own back-
ground knowledge, could you describe how you imagine the night-time sleeping routine of
the resident being monitored? For example, what locations of the house do you think
are occupied/visited? Further, is there any timing associated, such as the time of arrival
at a specific location, or a minimum time spent in it? The answers provided by the analyst
(volunteer participant in this study) would be used as the hypothesis to be verified or
refined during the visual data exploration. Ultimately, this will help the analyst to gain a
deeper understanding of the resident’s behaviour as derived from domestic routines.
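Such a hypothesis can also be written down as a handful of machine-checkable rules. The sketch below is one possible encoding with illustrative thresholds and room names; it is not how participants expressed their hypotheses in the study.

# Sketch: encode a night-time routine hypothesis as simple, checkable rules.
# `visits` is a list of dicts like {"location": ..., "duration_min": ...} covering
# the 01:00-07:00 window; thresholds and room names are illustrative only.
def check_night_hypothesis(visits):
    rooms = {v["location"] for v in visits}
    bathroom = [v for v in visits if v["location"] == "bathroom"]
    avg_bathroom = (sum(v["duration_min"] for v in bathroom) / len(bathroom)
                    if bathroom else 0.0)
    return {
        "only_expected_rooms": rooms <= {"bedroom", "corridor", "bathroom"},
        "short_bathroom_visits": avg_bathroom < 5,   # minutes, assumed threshold
        "few_bathroom_visits": len(bathroom) <= 2,   # visits per night, assumed threshold
    }

example = [
    {"location": "bedroom", "duration_min": 240},
    {"location": "corridor", "duration_min": 1},
    {"location": "bathroom", "duration_min": 3},
    {"location": "corridor", "duration_min": 1},
    {"location": "bedroom", "duration_min": 115},
]
print(check_night_hypothesis(example))  # all three rules hold for this example night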

3.3. Section C: Understanding the Data Visualisations


This section introduced the study participant to the data visualisations selected and adapted
to enable the analysis of the data. The data visualisations used in this work were chosen to
display the key spatial and temporal aspects previously discussed.
(a) Visualisation 1: This diagram shows a summary of the average percentage of sensor
events (monitored resident visits) per time interval of 1 h in a selected location from the
dataset. Overall, this visualisation aims to provide an insight into the times of day when the
monitored subject is most likely to visit the location selected for the analysis, for example
the bedroom in Figure 4. This visualisation uses a dynamic variable, the target location
(e.g., the bathroom), which can be manually modified to adapt the information presented.
(b) Visualisation 2: This graph shows a summary of the average duration of the
sensor events (monitored resident visits) at a selected location in 5 min time intervals for a
selected time window (Figure 5). This diagram uses three dynamic variables that allow the
information presented to be manually adjusted. These variables are the target location, and
the start time and the end time of the time window requested for analysis (e.g., bathroom,
from 00:00 to 00:25).


Figure 4. Side–by–side graph proposed for analysing the average number of sensor events represent-
ing the occupancy of a selected location per hour within the 24 h of a day.

Figure 5. Side–by–side graph proposed for analysing the average duration of the sensor events
(monitored resident visits) in a selected location per 5 min within a selected time window.

The aim was to use this information in combination with the previous information
shown in visualisation 1. Thus, it is possible to determine how often on average the
monitored resident visits the selected location and how long he/she spends there on average.
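As an indicative sketch of the aggregation behind these two views (assuming a hypothetical visits.csv file with timestamp, location and duration_min columns, which is not the study’s actual data layout):

# Sketch (illustrative only): aggregations behind visualisations 1 and 2 for a chosen location.
import pandas as pd

visits = pd.read_csv("visits.csv", parse_dates=["timestamp"])
target = "bathroom"                              # dynamic variable: target location
sel = visits[visits["location"] == target]

# Visualisation 1: percentage of this location's visits that fall in each hour of the day,
# taken across all monitored days.
hourly_share = sel["timestamp"].dt.hour.value_counts(normalize=True).sort_index() * 100

# Visualisation 2: average visit duration per 5 min slot of a selected time-of-day window
# (here 00:00-00:25), averaged across all monitored days.
in_window = sel.set_index("timestamp").between_time("00:00", "00:25")
slots = in_window.index.floor("5min").time
duration_profile = in_window["duration_min"].groupby(slots).mean()

print(hourly_share)
print(duration_profile)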
(c) Visualisation 3: This graph shows the sequence or order of locations that the
monitored resident passed through across different days. In order to simplify the content
of this graph, no time information about the duration of the events is shown. The time
window over which the sequences are drawn can be manually selected by specifying the
start and end times, e.g., from 05:00:00 to 06:00:00, Figure 6.


Figure 6. Node graph proposed to display the sequence of visits to different locations of a
monitored resident.

(d) Visualisation 4: The aim of this diagram is to add further temporal information
to the previously shown sequences for a selected day. Therefore, a layer of temporal context
is added, which can be used to estimate the start time, duration and end time of each event
on a selected day (Figure 7). The date can be manually selected, e.g., 12 May 2021.

Figure 7. 24 h clock visualisation proposed for the study of temporal information associated with the
daily routine of a monitored resident.
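A minimal sketch of the two underlying queries, again with assumed column names, extracts the per-day location sequence for a time window (visualisation 3) and the start, duration and end of each event on a selected date (visualisation 4):

# Sketch (illustrative only): queries behind visualisations 3 and 4.
# A visits.csv with timestamp (visit start), end and location columns is assumed here.
import pandas as pd

visits = pd.read_csv("visits.csv", parse_dates=["timestamp", "end"])

# Visualisation 3: sequence of visited locations between 05:00 and 06:00, per day.
window = visits.set_index("timestamp").between_time("05:00", "06:00")
sequences = window.groupby(window.index.date)["location"].apply(list)
print(sequences)

# Visualisation 4: start time, duration and end time of every event on a selected day.
day = visits[visits["timestamp"].dt.date == pd.Timestamp("2021-05-12").date()].copy()
day["duration_min"] = (day["end"] - day["timestamp"]).dt.total_seconds() / 60
print(day[["location", "timestamp", "duration_min", "end"]])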

After completing the description of these visualisations, participants were asked the
following questions to assess the process and the data visualisations evaluated:
• Would you have been able to identify the identity of the monitored resident or other
people from the dataset used?
• Were the data visualisations useful in accepting or refining your preliminary hypothesis?

36
Exploring the Variety of Random
Documents with Different Content
rudos y melancólicos, ni cascadas, ni castillos roqueros de aire
amenazador; allí no hay preciosidades artísticas, ni gente muy rica,
ni gente muy pobre; todo es alegre, pequeño, sin exageración, claro,
reposado.
El campesino vasco es casi el único aldeano de Europa que tiene hoy
aspecto de campesino. Cuando se le ve trabajar en su tierra con sus
bueyes, está tan identificado con la naturaleza, que se funde con
ella. El contemplar a estos aldeanos es para mí uno de los pocos
motivos que me induce a tener respeto por ciertas formas de la
tradición.
Muchas veces, contemplando el campo, recordaba aquellos versos
de Elizamburu, el poeta de Sara, que fué capitán de granaderos de
la Guardia Imperial de Napoleón:

Icusten duzu goicean,


Arguia asten denian,
Mendito baten gañian
Eche tipitho aintzin churi bat,
Lau aitz ondoren erdian
Chacur churi bat atean
Iturriño bat aldean.
An bizi naiz ni paquean.

(¿Ves por las mañanas, cuando la luz comienza a alumbrar,


en lo alto del monte una casa chiquita, con la fachada
blanca, en medio de cuatro robles, con un perro blanco en
la puerta y una fuentecilla al lado? Allí vivo yo en paz.)

Estos versos no tenían la originalidad de los de Goethe, de los de


Víctor Hugo o de los de Heine; pero reflejaban dentro de su
medianía admirablemente el deseo de un vasco de vivir en la tierra
de los antepasados.
Elizamburu, el capitán de granaderos, que había recorrido media
Europa, había sentido al escribirlos la nostalgia de su aldea, soñando
con volver a su casa, blanca y pequeña, a la vida obscura del
campo. Yo, que no había recorrido Europa, experimentaba un anhelo
parecido.
Quizá era un anhelo intelectual, más que real, un amor por una idea,
por un concepto...
No conozco yo bien la casa campesina de otros países; no sé si es
mejor o peor; pero no creo que me entusiasme como la casa vasca.
No me ilusiona el cortijo o la masía en donde apenas se hace fuego,
ni las porcelanas, ni los azulejos, ni los suelos de ladrillo; a mí me
gusta que en el hogar haya siempre lumbre, y que una columna de
humo salga constantemente de la chimenea; me gusta que en la
cocina haya poca luz, que huela a leña quemada, que haya una
buena vieja junto al fuego y que se oiga cerca el mugido de los
bueyes...
No, seguramente Aviraneta no tenía estos ridículos accesos
sentimentales. El era en sus ideas y en sus planes más constante,
más tenaz; su personalidad estaba constituída de una substancia
homogénea; no tenía esta heterogeneidad de mi carácter, ni
tampoco este sentimentalismo mío, no sé si perruno o de capitán de
granaderos.
X.

FÍSICA

Tenía curiosidad por averiguar lo ocurrido entre Delfina y Stratford,


pero a ninguno de los dos me hubiera atrevido a preguntarles nada.
A los tres días de nuestra estancia en Jaureguía fuimos a Bayona
madama D'Aubignac, la de Saint-Allais y yo; y al llegar a casa me
encontré con una carta de Aviraneta, en la que me decía que fuese a
Bidart y buscase y copiase unos documentos en su archivo, y que
luego fuera a Sara y me enterase del giro de los asuntos de
Muñagorri.
Al día siguiente marché a Bidart y fuí a hospedarme al caserío
Ithurbide, la antigua casa de Gastón de Etchepare, donde me
encontraba muy a gusto.

LOS CARACOLES

El cuarto que me cedía madama Ithurbide (yo la llamaba así, aunque


no fuera éste su apellido) era una sala con alcoba, la principal de la
casa. Esta sala tenía un balcón corrido que daba a una duna verde
que se cortaba en el acantilado del mar.
Era una sala eminentemente marina; el papel de la habitación tenía
unas fragatas que navegaban a todo trapo.
En la chimenea, sobre el mármol, se veían dos ramilletes hechos de
conchas y metidos en fanales de cristal; en la mampara, una
estampa de color con una lancha de pescadores. Sobre una cómoda
había un barco de marfil, y sobre un velador, una caja con conchas
pegadas en la tapa, y varios caracoles, estrellas de mar, pólipos y
corales.
Tanta concha y tanto caracol daba la impresión de que se estaba en
un acuario, y que uno mismo era algún molusco o algún pólipo que
por equivocación había dejado su cueva para entrar en aquel cuarto.
El primer día registré el archivo de Aviraneta, y encontré los
documentos que me indicaba, y me puse a copiarlos.
Terminado mi trabajo paseaba por el arenal desierto de Bidart y
contemplaba el anochecer espléndido, en que el sol se iba poniendo
hacia el cabo Higuer. Luego tomé la costumbre de ir por la mañana a
la playa, a primera hora, y después, por la tarde, hacia el crepúsculo.
Este mar resplandeciente con el sol de primavera, cuando lo divisaba
desde encima de las lomas verdes, me daba una gran alegría.
En la casa me encontraba contento. Madama Ithurbide me hacía un
potaje de judías y de verdura, que comía con gusto después de un
año de comida de hotel.
Me hubiera quedado allí mucho tiempo si no hubiese sido porque
tenía que seguir mi marcha.
Uno de estos días, el tercero, al salir de mi casa, por la mañana,
para ir hacia el mar, pasé por delante de un jardín en donde una
muchacha cantaba una canción que había oído en Laguardia:

La Pisqui, la peinadora,
con excusa de peinar,
le da citas al velero,
y se van a pasear.
Me erguí un poco para mirar por la tapia. La que cantaba era una
muchacha morena, de ojos negros.
—Muy bien—la dije—, muy bien. Veo que está usted de buen humor.
—¿Y usted, no?
—Sí, también. ¿Es usted española?
—Sí.
—¿De dónde?
—De Haro. ¿Y usted?
—Yo, de Vera.
La muchacha estaba sirviendo con una señora que tenía un niño
enfermo. Allí, sola, en aquella casa próxima al mar, se aburría
soberanamente. Aquel día, la señora había ido a Bayona a casa del
médico.
Al pasar por la tarde volví a ver a la muchacha, que estaba cantando
y tendiendo ropa al sol.
—¿Por qué no viene usted a pasear conmigo?
—¿Adónde?
—Por la playa.
—Pues, vamos.
Fuimos por la playa, charlando. Me contó su vida. Era de un pueblo
próximo a Haro. Se llamaba Dolores.
Se nos obscureció. Yo estaba muy conmovido, y ella también.
Yo la abracé y la besé varias veces.
Al retornar a su casa entró ella por el jardín para ver si había vuelto
la señora; pero no había vuelto.
La soledad, la noche espléndida y tibia, el ruido del mar próximo,
una especie de aura erótica nos sobrecogió a los dos...
Por la mañana, cuando salí de allí, la muchacha lloraba.
—¡Qué locura! ¡Qué locura he hecho!—murmuró.
Ella no sabía por qué; a mí me pasaba lo mismo.
Al salir en el tílburi de Bidart a San Juan de Luz sentí un ligero
remordimiento, pero se me pasó pronto, y olvidé rápidamente a
Dolores, la riojana.
XI.

MUÑAGORRI Y SU GENTE

En San Juan de Luz visité a doña Mercedes, la madre de Corito, que


me dijo que su hija vendría pronto.
De San Juan de Luz marché a Sara.
Me encontré allí con Cazalet, el bohemio, que había ido, sin duda,
con alguna comisión para Muñagorri.
—¿Qué hace usted aquí?—le dije yo.
—¿Y usted, qué hace?
Nos echamos a reír.
—Lo mío no es ningún misterio—repliqué—: he venido a verle a
Muñagorri.
—Yo también. Yo he estado hospedado en la misma casa en donde
estuvo Don Carlos acompañado de Auguet de Saint-Silvain, titulado
por el Pretendiente el barón de los Valles.
—¡Qué honor!
Entramos en una tienda, en donde había una muchacha muy guapa,
que Cazalet conocía, y que se llamaba Pepita, Pepita Haramboure, y
allí tomamos unas copas de vino blanco con bizcochos.
Cuando se fué Cazalet le pregunté a Pepita dónde podría ver a
Muñagorri, y me dijo que tenía el campamento cerca del pueblo. Salí
de la tienda y fuí a ver si lo encontraba. Vi en el camino a varios
hombres, por su aspecto, soldados de Muñagorri. Le pregunté a uno
de ellos dónde podría encontrar al jefe, y me señaló un caserío
abandonado. Efectivamente, allí estaba, en compañía de otros dos
hombres, moviendo con una gran cuchara un caldero de habas. José
Antonio Muñagorri parecía un buen hombre. Era grueso, rechoncho,
de cabeza redonda, de nariz aguileña, ojos negros y sonrisa amable.
—¿Ya ha comido usted?—me preguntó hablando con un canto de
aldeano vascongado.
—No.
—Pues dentro de una hora comeremos aquí. Si quiere usted venir...
Le dije que Aviraneta me había enviado para que me diera ciertos
datos acerca de sus futuros planes.
—¿Conoce usted a Altuna?—me preguntó.
—No.
—Pues vaya usted a verle al pueblo. Estará ahora en la fonda de
Hoyartzábal.
Fuí a la fonda y lo encontré. Asensio Ignacio Altuna, el secretario de
la empresa Paz y Fueros, dirigida por Muñagorri, era hombre alto,
rubio, de buen color, de ojos claros, con un aire atlético.
—¿Ha comido usted?—me preguntó.
—No.
—Quédese usted a comer aquí.
—Me ha invitado también Muñagorri.
—No haga usted caso; aquí comerá usted mejor.
Me pareció poco cortés, pero, ya que el subordinado de Muñagorri
me lo decía, me quedé allí. Le expliqué a Altuna el objeto de mi
viaje; cómo venía de parte de Aviraneta, quien probablemente
pasaría mis informes al Gobierno.
—Le daré a usted mi opinión sin ambages—me dijo Altuna—.
Muñagorri es un hombre inteligente y un hombre honrado. Es un
tipo que encontrará usted aquí en el país vasco, bueno, optimista,
pero de esos a quienes se les ocurre una idea y ya no varían jamás.
Su proyecto de Paz y Fueros le parece admirable.
Yo sabía que esta idea no era originalmente de Muñagorri, pues
había sido inventada por un amigo y compañero de Aviraneta, don
Juan Olavarría, y patrocinada primero por el ministerio Bardají, y
luego por el ministerio Ofalia.
—Muñagorri no avanza—siguió diciendo Altuna—, porque en vez de
luchar por una causa vieja y tradicional tiene que defender una
causa nueva inventada por él. Para esto, no basta un talento
corriente: se necesita genio.
—¿Y él no lo tiene?—pregunté yo.
—No, no lo tiene. ¿Quién lo tiene? Él no es capaz de cambiar de
ideas, pero sí de procedimientos. En su misma vida ha cambiado:
Muñagorri antes de ser fundidor era de profesión escribano; luego
abandonó el oficio y arrendó varias ferrerías en Berastegui, con lo
que ganaba mucho y daba de comer al país. Tampoco es un
aventurero. Ha sido un hombre rico, condecorado con la cruz de
Carlos III, y ahora con su empresa se ha arruinado, y sus ferrerías
de Berastegui trabajan fundiendo cañones carlistas.
—Así, que el jefe no es malo.
—No, no es malo.
—Pues corre por el país la idea de que es un inepto.
—No, no es verdad. Lo que nos pasa a él y a los suyos, es que
tenemos muchas dificultades. Usted sabe que se organizaron en
Bayona juntas de las cuatro provincias para que influyesen en el país
y ayudasen a Muñagorri. Estas juntas no han dado resultado. El
Gobierno nos abrió un crédito de dos millones de reales en la casa
Ardoin. Este dinero ha venido mermado. ¿Quién se ha quedado con
él? Yo no lo sé. Al principio patrocinaron la idea algunos de nuestros
políticos y varios prohombres ingleses. Lord Palmerston y sir Jorge
Villiers escribieron a lord John. Hay para que nos favoreciese. Hoy ya
no se acuerda nadie de nosotros, y únicamente el general Jáuregui
nos alienta. El cónsul Gamboa trabaja contra nosotros. En Bayona,
las autoridades del Gobierno cristino nos han tratado como
criminales y desertores. El subprefecto daba noticias a los carlistas
de lo que hacía Muñagorri. Al cónsul esto le parecía muy bien.
—Es que este Gobierno español y sus empleados son de una
incapacidad tan extraña, que llega a lo ridículo—dije yo.
—Parecen agentes de los carlistas. No nos favorecen los liberales, y
los carlistas nos odian. El general Iturbe, que estaba comprometido,
se ha puesto francamente en contra de la empresa. Los carlistas han
empleado toda clase de recursos contra nosotros. El canónigo
Batanero ha pedido para Muñagorri y su gente la excomunión.
Necesitaríamos alguien que consultara con los generales cristinos y
nos indicara sus intenciones.
—Yo no puedo hacer eso.
Le dije a Altuna que, pasadas un par de semanas, tenía el proyecto
de ir a San Sebastián para enterarme allá de qué pensaban los
generales de la Reina de la empresa de Paz y Fueros.
—Escríbanos usted con detalles el resultado de su entrevista—me
dijo él.
—Lo haré, no tenga usted cuidado.
Volvimos Altuna y yo al campamento de Muñagorri.

CANCIONES
Había concluído de comer Muñagorri con quince o veinte de sus
partidarios, y un viejo cantaba una canción en honor del caudillo
fuerista, que comenzaba así:

Carlos aguertu ezquero


Provinci auyetan,
Beti bici guerade
Neque ta penetan.

(Desde que Carlos ha aparecido en estas provincias,


nosotros vivimos siempre en la fatiga y en la pena. Se nos
quita nuestros bienes y nunca se nos da nada.)

Esta canción lacrimosa me pareció muy propia de una empresa que


marchaba tan mal.
Me despedí de Muñagorri y de Altuna y tomé a caballo el camino de
San Juan de Luz. Antes de llegar a Ascaín me encontré con tres
muchachos carlistas que habían estado quince días en el
campamento de Muñagorri y que pensaban volver de nuevo a
España, al ejército de Don Carlos. Uno era guipuzcoano, el otro
navarro, y el otro francés. Se burlaban de Muñagorri y de sus planes
y me cantaron varias canciones contra él. El francés llevaba un pito,
con el que tocaba. El guipuzcoano cantó:

Estute aditzen soñu eder ori,


Saratican elduda gure Muñagorri.
Riau, riau, riau, cataplau.
Gure humoria,
Utzi al de batera,
Euscaldun gendia.

(¿No oís un hermoso sonido? De Sara ha salido nuestro


Muñagorri. Riau, riau, riau, cataplau, nuestro buen humor,
dejad a un lado, gente vasca.)

Después de esta canción cantó otra más burlona, que empezaba


diciendo:

Muñagorrien sarrera
Españiaco lurrera,
Legua guchi aurrera.

(La entrada de Muñagorri en el suelo español, pocas leguas


adentro.)

El navarro a su vez cantó:

Muñagorrien gendiac
Shutan ez dirade trebiac;
Billa litezque obiac
Seculan eztu
Gauz onic eguin.
Guizon gogoric gabiac
Gueyenac desertoriac:
Diru billa ateriac.
Aditu biarcodute beriac.

(La gente de Muñagorri no es muy lista para el fuego;


podría encontrarse fácilmente otra mejor. Nunca ha hecho
cosa buena la gente sin ganas: la mayoría desertores.
Tendrán que oír lo suyo.)

Estas canciones, mucho mejor que las palabras de Altuna, me


indicaron que la empresa de Muñagorri marchaba muy mal.
XII.

NUEVA TERTULIA

Cuando llegué a Bayona a hacer la vida ordinaria, me encontré con


algunas ligeras novedades. Se había instalado en mi mismo hotel
González Arnao, que tenía su tertulia en su cuarto.
Solían ir a ella varios españoles, entre ellos, Eugenio de Ochoa, hijo
natural del abate Miñano. Ochoa era por entonces un joven
elegante, de veintitrés a veinticuatro años, muy emperifollado, muy
culto y que hablaba perfectamente el francés.
También solía ir un pintor muy malo, Augusto Bertrand, entusiasta
de lo más ñoño de la pintura francesa, ya de por sí un tanto ñoña.
Monsieur Bertrand era gran admirador de David, de Ingres, y sobre
todo de Greuze. Fuimos al estudio del señor Bertrand, que, cuando
mostraba sus cuadros, daba una lente grande, como si se tuviera
que contemplar la fractura de algún mineral o de algún pequeño
insecto.
Otro de los contertulios fué el profesor Teinturier.
Yo, a este hombre, no le entendía. Era republicano radical,
entusiasta de Barbes, de Blanqui y de Martín Bernard y de los que
con ellos preparaban la revolución en las sociedades secretas, y al
mismo tiempo tenía una predilección marcada por Racine y los
clásicos antiguos. Sin duda aspiraba a una revolución con formas
clásicas. Esto para mí era difícil de comprender. Yo me explico que
los revolucionarios exaltados deseen la igualdad absoluta, el
comunismo y hasta la antropofagia, pero revolucionarios con versos
de Horacio y de Racine, no me caben en la cabeza. Para revolución
con formas académicas, hemos tenido la Revolución Francesa, y ya
basta.
Teinturier, después de muchos rodeos, me pidió que le presentase
en casa de madama D'Aubignac. Le dije que hacía tiempo que no la
veía a esta señora, pero que en la primera ocasión le presentaría.
Cuando fuí a casa de Delfina y se lo dije a ella, se opuso.
—De ninguna manera se le ocurra a usted traer a mi casa a ese
señor—me indicó.
No repliqué nada.
—La vista sólo de ese hombre me molesta—añadió—. ¡Tiene un tipo
tan vulgar! ¡Unas manos tan ordinarias! ¡Unos pies tan grandes!
¡Luego mira de una manera tan descarada!
—No crea usted. Es más bien la timidez. Está muy entusiasmado con
usted.
—Pues, no; no le traiga usted aquí.
¡Pobre hombre!—pensé yo—. Para eso ha estudiado tanto, para que
no lo consideren ni siquiera a la altura de uno de estos oficiales
majaderos e insolentes que se lucen en los salones.
Siempre me ha chocado la poca comprensión que tienen las mujeres
por cierta clase de hombres. Estos tipos de hombres fuertes, que se
creen más fuertes de lo que son, que ven a la mujer como un
producto débil, más débil de lo que es en realidad, este hombre
toro, que parece que debía ser el ideal de la mujer femenina, lo es
pocas veces, casi nunca.

CONVERSACIÓN CON DELFINA


Delfina me preguntó si le había vuelto a ver a Stratford. Le dije que
le había visto un momento.
—¿No le ha hablado a usted de mí?
—No.
—Estamos reñidos.
—¿De verdad?
—Sí.
—¿Y por qué?
—Yo le tengo cariño a Jorge, le tengo por un caballero, por un
hombre noble y bueno.
—Yo también.
—Yo desearía conservar con él una buena amistad, pero él no se
contenta con eso.
—El quisiera ser su amante.
—No.
—Pues entonces, ¿qué quiere?
—El quisiera que yo abandonara mi casa y fuéramos juntos los dos a
otro país.
—¿Y los hijos?
—El me decía que nos llevaríamos los hijos.
—¿Pero su marido de usted?
—A mí, ¿qué quiere usted? No me importa nada mi marido, pero lo
que no puedo sacrificar es mis hijos. Prefiero ser desgraciada.
Hablando del asunto llegué a comprender la situación respectiva de
Delfina y de Stratford. Ella le había dado a entender la posibilidad de
que él fuera su amante sin escándalo, lo que ocurría en muchos
hogares. El no aceptaba la solución. Nada de bajo adulterio,
ocultándose del marido. Afrontar la situación desde el principio y
marcharse a otro país.
—Jorge es un corazón noble y yo le admiro ahora más que antes—
dijo Delfina.
Hablamos largamente y me pidió que la primera vez que le viera a
Stratford le sondeara acerca de sus intenciones.
Al despedirme de ella, Delfina me dijo:
—Cuento con su discreción, Leguía, ¿verdad?
—Una vez he podido ser imprudente, pero dos, no.
—Así lo espero. Además, aquello era una niñería.
Cuando salí a la calle, todo lo que se me había ocurrido mientras
hablaba con Delfina se lo dije al viento:
—Señora: usted es muy alambicada y muy cuca; quiere usted
religión y libertad de pensamiento exclusiva para usted, costumbres
muy severas y al mismo tiempo facilidad en las pasiones; ser muy
honorable y tener un amante, tener un hombre enérgico y altivo y al
mismo tiempo que se doblegue a sus necesidades y a sus caprichos.
Todo esto no se encuentra mas que en Jauja o en el país de las
Gangas. Yo no diré nada, pero no seré tampoco el que intervenga en
sus asuntos.
XIII.

VUELTA POR ESPAÑA

Como quería cumplir el encargo de Altuna y dar informaciones


precisas a don Eugenio, me preparé a ir a San Sebastián; pedí
pasaportes y cartas de recomendación a González Arnao, quien me
recomendó al coronel inglés Colquhoun.
Partí de Bayona para San Juan de Luz, fuí a Socoa y salí en un
pailebote que marchaba a San Sebastián. Llegué a la ciudad
donostiarra y me vi inmediatamente con Alzate y Orbegozo. Alzate
me dijo que con quien podría enterarme bien de las intenciones
inglesas con respecto a Muñagorri, sería hablando con el coronel
Colquhoun, que estaba en aquel momento en Ategorrieta.
Seguramente la carta de González Arnao me serviría para llegar a él.
Respecto a los planes de los generales cristinos, él me daría una
carta para el general Jáuregui.
A la mañana siguiente tomé un coche y fuí a Ategorrieta. Llevaba en
aquel punto mucho tiempo acantonada la Legión inglesa. A la
entrada del barrio había un letrero con pintura negra en una pared:
Westminster Square, y en otra esquina ponía Constitution Hill (colina
o cuesta de la Constitución). Este segundo letrero duró mucho
tiempo; yo lo vi quince años después. Algunos supusieron que
quedaba porque Hill, en vascuence, quiere decir muerto, y los
campesinos vascos, en su mayoría carlistas, al leer Constitution Hill,
suponían que decía Constitución muerta.
Al acercarme al barrio me detuvo un centinela, que llamó a un cabo,
quien me condujo al Cuerpo de guardia. Cerca había una fila de
carros, caballos y cañones.
Entramos el cabo y yo en el Cuerpo de guardia británico.
Los soldados ingleses, con sus casacas rojas, se paseaban de arriba
a abajo con las manos cruzadas en el pecho, silbando o tarareando;
otros, sentados en los bancos, cosían un botón o remendaban una
ropa vieja. En la pared estaban colocados los fusiles, y en medio
había un brasero lleno de tablas ardiendo. Había un olor fuerte a
tabaco. Salió un oficial, le pregunté por el coronel Colquhoun, y me
indicó una casa próxima al camino de Pasajes.
Aquellos ingleses me parecieron gente de buen aspecto, a pesar de
que tenían mala fama como soldados. Se decía que eran
vagabundos enrolados en los muelles y en las tabernas de
Inglaterra; se añadía que desertaban a la mejor ocasión a las filas
liberales o carlistas; que robaban en los pueblos, y que se
emborrachaban siempre que podían.
A pesar de esto se habían batido como leones a las órdenes del
general Lacy-Evans en la batalla de Oriamendi.
En la casa que me indicaron como residencia del coronel Colquhoun
vi a un soldado inglés con su mujer y dos chicos en brazos. Le
pregunté si sabía si vivía allí el coronel, y me dijo que sí.
Colquhoun me recibió muy amablemente, pero me dijo que no sabía
nada; él influía con el comodoro lord John Hay para que no se
abandonara la empresa de Muñagorri, pero no conocía los planes del
Gobierno inglés.
Colquhoun me pareció un hombre amable y culto. Era matemático e
ingeniero, y por la presión de lord John se había metido a politiquear
y a intrigar, cosas para las cuales no tenía condiciones.
Volví a San Sebastián y fuí a Hernani, en donde me dijeron que
encontraría a Jáuregui. Efectivamente, le encontré; le di la carta de
Alzate, y me preguntó por mi tío Fermín, y nos hicimos muy amigos.
Tenía él que ir a Urnieta; le ofrecí mi coche; aceptó, y fuimos juntos.
Me dijo que O'Donnell y él pensaban hacer un reconocimiento en
Vera, y que le iba a ver en aquel momento al general para ponerse
de acuerdo en los detalles de la expedición.
—¿Cree usted que yo le podría hablar a O'Donnell?—le pregunté a
Jáuregui.
—¿Acerca de qué?
—Acerca de la actitud que piense tener con relación a Muñagorri.
—No le contestará a usted nada.
—¿Está usted seguro?
—Segurísimo. O'Donnell es un hombre impasible, impenetrable; le
oirá a usted muy amablemente, le preguntará lo que usted opina, le
escuchará con mucha atención, y cuando usted intente averiguar lo
que cree él de esto o de lo otro, sonreirá y pasará a otro asunto.
Además, esa cuestión de Muñagorri es un punto que no le gusta
tratar.
—Entonces no le preguntaré nada.
—¿Usted es de Vera?—me preguntó Jáuregui.
—Sí.
—¿Quiere usted venir al reconocimiento que vamos hacer en su
pueblo?
—Con mucho gusto.
—¿Dónde para usted?
Le di mis señas en San Sebastián.
—Bueno, yo le avisaré a usted.
Llegamos a Urnieta. Urnieta tenía todavía las huellas de la batalla
dada por O'Donnell el otoño pasado, que había costado el incendio
casi total del pueblo. Dejé a Jáuregui en una casa próxima a la
iglesia, y entré yo en una taberna, donde pedí una botella de sidra.
En la taberna había un hombre manco y tuerto, con una blusa larga,
que llevaba un montón de papeles bajo el brazo. Tenía el hombre
aquel cierto aire de sacristán y una voz un poco aguda. Hablamos.
—¿Viene usted aquí de paseo?—me preguntó.
—Sí; ¿y usted?
—Yo, por el comercio.
—¿Por qué comercio?
—Vendo canciones.
—¡Hombre! ¿A ver qué canciones tiene usted?
—Son canciones carlistas.
—Muy bien. Yo soy liberal, pero eso no me importa. ¡A ver, cante
usted!
El manco empezó a cantar, con su voz aguda, una canción sobre
O'Donnell y la quema del pueblo, que empezaba así:

Orra nun den Urnieta,


Ez ta besteric pareta,
Malamentian erreta.

(Ahí está Urnieta, no quedan más que las paredes,


malamente quemadas.)
O'Donnell generala
Zubela aguintzen
Fanfarroi zebillen
Etchiac erretzen.
Solamente jauna ori
Ez da gaitz itzuzen,
Chapela galdu eta;
Hernani sartu zen.

(El general O'Donnell mandaba y andaba muy fanfarrón


quemando las casas. Solamente ese señor es difícil de
asustar; perdió el sombrero y entró en Hernani.)

Chapela galdu eta


Gañera saldiya,
Beste bat artu eta
Iguesi abiya,
Guezurra gabetanic
Esango det eguiya,
Traidoria da eta
Cobarde aundiya.

(Perdido el sombrero y además el caballo, tomando otro


para correr más deprisa, sin mentira diré la verdad, porque
es traidor y cobarde.)
Santo Tomás eguneco,
Amar terdiyetan,
Etsagon atseguin
Ategorrietan.
Pechotican eztuta
Caqueguin galzetan.
Orra sein cobardiac
Dirade beltzetan.

(El día de Santo Tomás, a las diez y media, no estaba muy


tranquilo en Ategorrieta. Con el pecho oprimido y
ensuciados los calzones. Ahí se ve lo cobardes que son los
negros.)

Luego, el manco cantó otras canciones que, a pesar de ser primitivas


y bárbaras y casi siempre incoherentes, no dejaban de tener gracia.
Una de ellas, contra los extranjeros, comenzaba así:

Francesac ta inglesac berriz


Cecen icusten dabiltz
Barrera gañetic irritz,
Arriya tira escua gorde
Eguindigute bost alditz,
Au consideratzen balitz.
Baliyoco luque aunitz
Buru gogorric ez balitz.

(Los franceses y los ingleses de nuevo están viendo los


toros desde la barrera, riéndose; tiran la piedra y esconden
la mano; nos lo han hecho muchas veces. Esto lo
comprenderíamos muy bien si no tuviéramos la cabeza tan
dura.)
Este reconocimiento de la dureza de nuestra cabeza vasca me hizo
reír a carcajadas.
Después de la rabia contra los extranjeros venía el rencor contra los
castellanos y los hojalateros, que querían que continuara la guerra:

Orien votoz necazariyac,


Pasabiarcodu dieta.
Erdealdunaren copeta.
Morralac ondo beteta.
Guero iguesi lasterta.

(Por el voto de esos, los trabajadores tendrán que vivir a


dieta. ¡Qué tupé el de los forasteros! Llenan bien el morral
y luego echan a correr.)

Después de estas imprecaciones y cóleras el manco cantó una


canción filosófica que comenzaba así:

Aurten eztegu izango


Fortuna charra;
Bici galdu ezquero,
Acabo guerra.

(Este año no tendremos mala fortuna; perdiendo la vida, se


acabó la guerra.)

Le compré al cantor varias de sus canciones y volví a San Sebastián,


y esperé a que me avisara Jáuregui para ir a Vera. En tanto, pedí a
Bayona un libro que había comprado meses antes, que se titulaba
Campañas de 1813 y de 1814 sobre el Ebro, los Pirineos y el
Garona, por Eduardo Lapene. Cuando me lo mandaron leí la parte
que hablaba de combates entre franceses y aliados en el Bidasoa y
en las proximidades de Vera.
Aquellos días de lluvia charlé bastante con el antiguo amigo de
Aviraneta, el cabo de chapelgorris, Juan Larrumbide, Ganisch, en la
taberna del Globulillo, de la calle del Puerto de San Sebastián, quien
me dijo que iba a ir también en la expedición a Vera.
El día primero de abril me avisó Jáuregui y fuimos a Oyarzun.
A mí me dieron un hermoso caballo, y, como llevaba un magnífico
impermeable y un sombrero también impermeable, llegué sin
mojarme a Oyarzun.
Ganisch, que conocía todos los rincones de la provincia, me llevó a
un caserío de Arichulegui, donde comimos admirablemente y donde
dormimos igualmente bien.
Por la mañana, nos levantamos, y, a la hora de la diana, tomé yo mi
caballo, y, con mi impermeable y mi sombrero de hule, seguí a la
comitiva de Jáuregui. Nos encaminamos hacia la peña de Aya,
pasamos por la ermita y la ferrería de San Antón, por el mismo
camino por donde fueron las tropas de Wéllington y donde murieron
despeñados muchos soldados y oficiales ingleses. A media tarde
llegamos a los montes próximos a Vera, y allí se acampó.
Ganisch me llevó al barrio de Zalaín, próximo al Bidasoa, al caserío
del cabecilla Gamio.
Gamio fué el capitán de una partida liberal que, en una correría a
Zugarramurdi, mató al coronel carlista don Rafael Ibarrola. Al volver
de la expedición, el mismo día, Gamio fué visto por una patrulla
carlista cuando descansaba, a la puerta de un caserío, con sus
partidarios, y le soltaron una descarga cerrada y lo mataron. En Vera
se había confundido el hecho y se creía que la muerte de Ibarrola
era debida a mi tío Fermín Leguía, que por entonces estaba en
Cuenca.
Me recibieron muy bien en el caserío de Gamio, el hijo y las hijas del
partidario liberal. Cenamos espléndidamente, y tuvimos baile
después de cenar. Por la mañana me presenté en una chavola de
Alcayaga, en donde estaban reunidos Jáuregui, O'Donnell y otros
jefes.
—¿Qué ha hecho usted?—me preguntó Jáuregui.
Le conté cómo había pasado la noche.
—Es usted un hombre de suerte.
No acababa de decir esto cuando una granada dió en la puerta de la
chavola y la hizo polvo, y uno de los cascos pasó por encima de mi
cabeza.
Nada; no tenía duda. Era un hombre de suerte.
Los carlistas sabían ya dónde estaban los generales enemigos, y
disparaban allí.
Salimos fuera; O'Donnell, Jáuregui y los oficiales del Estado Mayor
montaron a caballo, y yo hice lo mismo, y lucí mi impermeable y mi
sombrero de hule.
El tiempo estaba malo: llovía y venteaba.
El Bidasoa venía muy crecido.
—Vamos a ver—me dijo Jáuregui—, ¿cómo pasaremos mejor el río?
—¡Supongo que no querrán ustedes forzar el puente!
—No.
—El hacerlo costó mil bajas a los franceses en 1813 y la pérdida del
general Vander-Maesen, que murió aquí.
—¡Tantas bajas hubo!—exclamó Jáuregui—. No lo sabía. Por
entonces, yo estaba herido en Cestona; por eso no pude tomar
parte en la batalla de San Marcial.
—Por lo que he leído, si no murió más gente francesa fué porque un
jefe del batallón, Lunel, se colocó en esta orilla y cañoneó esas dos
casas de enfrente y el fuerte de ese alto, llamado Casherna.
—Es curioso. Desechada la idea de forzar el puente hay que intentar
atravesar el río por otro lado. ¿Por dónde le parece a usted mejor?
—Por aquí, aguas arriba, se puede ir hasta el puente de Lesaca, por
donde pasaron los ingleses de Wéllington en 1813. El puente quizá
esté fortificado por los carlistas.
—Sí.
—¿Y el de Endarlaza?
—Lo mismo.
—Entonces, creo que lo mejor es que algunos de sus hombres vayan
a Zalaín, saquen la barca, que quizá la tengan escondida los
campesinos, y vayan pasando y fortificándose en la otra orilla.
Jáuregui conferenció con O'Donnell; decidieron esto y fué
marchando hacia Zalaín un grupo y después una compañía de
chapelgorris, que cruzó luego el río.
La situación respectiva de carlistas y liberales era ésta; ellos tenían
algunas fuerzas en el pueblo, varios tiradores en dos casas situadas
no muy lejos del puente, una de ellas llamada Dorrea, y otra que era
una antigua hospedería de peregrinos de Roncesvalles; tenían
fortified the bridge, with some companies in a small fort on a height called Casherna and patrols on the mountain of Santa Bárbara. Our men were in a quarter of Lesaca called Alcayaga, and scattered over Mount Baldrún and along the riverbank.
To distract the Carlists, a feigned attack was made on the bridge and several companies were sent toward Lesaca. Cannon fire was exchanged from one side and the other, and, at midday, the chapelgorris seized the first houses of the town.
Then more soldiers began to cross by the ferry at Zalaín and to appear and advance along the riverbank. The marksmen in the two houses, Dorrea and the pilgrims' hostel, opposed their advance; O'Donnell's guns bombarded the houses until they were cleared out.
When the Liberals occupied the two houses near the river, the Carlist marksmen at the bridge found themselves in a bad position and abandoned it. The bridge was free of enemies, but full of obstacles, and anyone who went to remove them risked being picked off.
Then Jáuregui's soldiers took two carts loaded with hay and pushed them along the bridge, and, advancing behind them, cleared away the obstacles, and our men began to cross and march into the town.
A group of twenty-odd Carlists, under the command of a sergeant, was surrounded in the square by the chapelgorris and the Cristino soldiers, and the twenty-odd men climbed the church tower and fortified themselves there. At night they came down from the tower on a rope and escaped.
At nightfall, Ganisch and I and a Liberal from the town, whom they called Laubeguicoa, went to an inn at Illecueta and had supper with some Carlists; we spent part of the night singing, and slept very well.
In the morning we returned to the square in Vera. I told Jáuregui where we had been, which still struck him as an excess of good luck.
The day after the Liberals entered the town, they held the houses, the church, and the calvary; the Carlists were on a height facing Vera, in a fort, with a cannon they fired at every turn. What amused me was that the buglers of the Carlist fort, from time to time, played the Navarrese jota, as if to show us that they were not afraid of us.
I told Ganisch to have one of our chapelgorris play Andre Madalen and Ay ay, mutilla on the bugle.
The Carlists, as if offended at hearing our music, stopped playing the jota.
Several times I rode up, on horseback, with my short cape and my top hat, to the redoubt of Casherna, and heard the bullets whistle close to my head.
In two days, by dint of heavy pounding, the enemy cannon was dismounted, the little fort reduced to rubble, and the Carlists abandoned the outskirts of Vera.
This whole action, in my own town, did not seem to me very different from a boys' stone fight. At least in ingenuity, the professional soldiers showed no great superiority over the boys. The only superiority to be found was that in this fight between soldiers there were real dead: men with holes through their chests and broken legs.
I thought several times, though naturally I did not dare say so to anyone, that this business of war, as a science, is sheer foolishness; I believe war is an instinctive thing; that is how one understands that a priest, or a schoolmaster, turned guerrilla, can hold any general in check, or that a ragged Moor can maneuver his men like the most accomplished tactician.
On April 5th, O'Donnell and Jáuregui prepared to return to their camps; I joined some French troops that had advanced on Vera from the direction of Oleta, went with them as far as the frontier, and then, alone, on to San Juan de Luz.
The action I had witnessed seemed to me a small affair, and I was confirmed in the idea that if I ever had to take part in the war I would not feel the slightest fear. My dandyism stood above the danger of bullets.
PART THREE
NEW ACQUAINTANCES

IN SAINT-MORITZ
Each new part of my book I write in a different place. Now I have come to Saint-Moritz, a fashionable spot about which I had some curiosity, but I intend to stay only a short time. This hotel, big as a barracks, with so many millionaires, has left me appalled.
The enormous building is full of Jews, Americans, Japanese married to French and English women, and even Chinese.
What decadence our continent has come to! Everywhere one sees nothing but yellow, black, and chocolate-colored faces. What a hodgepodge! Within a few years there will not be a true European left in Europe: all will be of mixed blood, and there will be a strange mingling of blood from every quarter.
Then this old Europe, which no longer has ideals, will no longer have races that are at all unmixed either, and the common human refuse will be the patrimony of its cities and its fields.
The contemplation of nature does not make up to me for the disagreeable spectacle of this monkey cage that the hotel seems to me.
It is curious how little enthusiasm I feel for Alpine nature. Accustomed to the Basque country, with its small, bright mountains, these enormous mountains tire me, oppress me, seem to me beyond the human scale and almost disagreeable.
The glare of the patches of snow on the mountains, like pieces of porcelain against the blue sky, hurts my eyes.
I do not find this grandiose nature attractive. It is a nature with a cosmic air, not at all humanized, monotonous in color, which offers itself, like a wild virgin, to the young and strong man, and which disdains weakness and weariness.
I believe the artist can find no great inspiration in these landscapes, which are made for tourism and photography rather than for literature and art.
They tell me that here one may find inspiration for something grandiose and colossal. I feel more and more antipathy for the grandiose and the colossal. I do not believe in anything colossal. Man is, as the Greek philosopher said, the measure of all things. Whatever exceeds our measure is nothing, at least for us.
I content myself with what the human measure embraces; I believe that within its limits there is matter enough to fill a man's heart and head, and I aspire to no more.
I.

PARIS AND MADRID

At the first opportunity I had, I went to Paris.


The Paris of those days was not the Paris of today, this enormous Paris cut through by great tree-lined avenues. It was still a town of narrow streets, mysterious, where everything seemed possible. There was none of the present police gridding of life, which sees to it that in an immense city like Paris, London, or Berlin, people are known house by house and room by room.
Eugenio de Ochoa served me as cicerone; but he showed me, above all, whatever might lend luster to himself. After a fortnight I returned to Bayonne.
Very shortly afterward, at the beginning of spring, Don Eugenio wrote telling me it would be advisable for me to go to Madrid.
I was very glad; I was curious to see something of the interior of Spain.
I offered my services to my Bayonne friends and acquaintances in case they wanted anything taken to Madrid. Gamboa gave me a package to deliver to the secretary of the Infante Don Francisco, Brigadier Rosales, and two letters: one for Don Ramón Gil de la Cuadra, and another for Don Martín de los Heros, political friends of his.
Eugenio de Ochoa also gave me a letter of introduction, for Usoz del Río.
In mid-May I went to Santander by boat, and from Santander, with great difficulty, to Madrid. Already on the journey I was struck by the confusion and disorder in everything, and I was astonished, on entering Castile, by the number of barren moors and wastelands we crossed.
Don Eugenio was waiting for me at the Aduana, where the stagecoach set down its passengers, and took me to a boarding house on the Calle del Lobo, where he himself lived.
Truly, Madrid seemed to me ugly and shabby. The Puerta del Sol was an unimportant crossroads; I found everything very dusty and neglected.
"The truth is that this, beside Paris," I said to Don Eugenio, "seems a poor thing."
"Ah! Are you too going to be one of those imbeciles who, because they have spent a few days in Paris, think they must look down on everything?"
I fell silent, resolved to keep my observations to myself.
It is not that I despised Madrid; on the contrary, for me it was naturally more interesting than Paris, because in Paris I could see nothing but walls and streets, while in Madrid I talked with people about things that interested me. It is true that at that time I still had that poor enthusiasm for admiring a wide, straight street or a very large monument, as if that made one any happier; but even so, as a Spaniard, I found Madrid more interesting than Paris.
I understood clearly that, measured against European life, we Spaniards counted for very little, that we carried hardly any weight. Madrid amounted to no more than a poor quarter of Paris.
And the people! What a wretched appearance! What an air of poverty, of poor nourishment!